[ 
https://issues.apache.org/jira/browse/MAHOUT-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184122#comment-15184122
 ] 

ASF GitHub Bot commented on MAHOUT-1640:
----------------------------------------

Github user smarthi commented on the pull request:

    https://github.com/apache/mahout/pull/81#issuecomment-193536262
  
    Seems like it's ASL 2.0 -
    https://github.com/vigna/fastutil/blob/master/LICENSE-2.0
    
    +1 from me, good to go.
    
    On Mon, Mar 7, 2016 at 7:21 PM, Dmitriy Lyubimov <[email protected]>
    wrote:
    
    > @vigna <https://github.com/vigna> is 0.7.2 fastutil is still the best
    > version to use? I can't immediately find the license on it?
    > @smarthi <https://github.com/smarthi> et. al. : need a few votes on
    > inclusion of fastutil as a dependency
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/mahout/pull/81#issuecomment-193522992>.
    >



> Better collections would significantly improve vector-operation speed
> ---------------------------------------------------------------------
>
>                 Key: MAHOUT-1640
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1640
>             Project: Mahout
>          Issue Type: Improvement
>          Components: collections
>         Environment: Darwin lithium.local 14.1.0 Darwin Kernel Version 
> 14.1.0: Mon Dec 22 23:10:38 PST 2014; root:xnu-2782.10.72~2/RELEASE_X86_64 
> x86_64 i386 MacBookPro10,1 Darwin
> java version "1.8.0_31"
> Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
>            Reporter: Sebastiano Vigna
>            Assignee: Suneel Marthi
>              Labels: legacy, math, scala
>         Attachments: fastutil.patch, speed-fastutil, speed-std
>
>
> The collections currently used by Mahout to implement sparse vectors are 
> extremely slow. The proposed patch (localized to RandomAccessSparseVector) 
> uses fastutil's maps and the speed improvements in vector benchmarks are very 
> significant. It would be interesting to see whether these improvements 
> percolate to high-level classes using sparse vectors.
> I had to patch two unit tests (an off-by-one bug and an overfitting bug; both 
> were exposed by the different order in which key/values were returned by 
> iterators).
> The included files speed-std and speed-fastutil show the speed improvement. 
> Some more speed might be gained by using everywhere the standard 
> java.util.Map.Entry interface instead of Element.
> DISCLAIMER: The "Times" set of tests has been run multiplying two identical 
> vectors. The standard tests multiply two random vectors, so in fact they just 
> test the speed of the underlying map remove() method, as almost all products 
> are zero. This is not very realistic and was heavily penalizing fastutil's 
> "true deletions". Better tests, with a typical overlap of nonzero entries, 
> would be even more realistic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to