No, Koloboke is bigger IIRC, to the tune of 20+ MB. It kind of matters when
this is part of an MR or Spark app that will be shipped to a bunch of
workers repeatedly. Not critical. But can be worth 'tuning' in the
assembly.
On Jan 18, 2015 2:18 PM, "Benson Margulies" <bimargul...@gmail.com> wrote:

> Sean, is Koloboke smaller? And why does, pardon me, size matter?
>
>
> On Sun, Jan 18, 2015 at 3:18 AM, Sean Owen <sro...@gmail.com> wrote:
>
> > FWIW I prefer Koloboke at this stage. Fastutil is Apache licensed, both
> > are. The only drawback is the massive size of the dependency. The
> resulting
> > assembly artifact could be slimmed to only include required classes. Yes
> at
> > this stage I would have used a library though and not made custom code.
> > On Jan 17, 2015 10:35 PM, "Sebastiano Vigna" <vi...@di.unimi.it> wrote:
> >
> > > Dear developers,
> > > I'm writing to suggest to improve significantly Mahout's speed by
> > > replacing the current, Colt-based collections with faster collections.
> > > These are results from benchmarks at java-performance.info comparing
> > > fastutil and Mahout in get operations (Mahout collections were not
> > included
> > > in the java-performance.info tests):
> > >
> > > tests.maptests.primitive.MahoutMapTest (10000) = 2176.1182139999996
> > > tests.maptests.primitive.FastUtilMapTest (10000) = 782.8528527999999
> > > tests.maptests.primitive.MahoutMapTest (100000) = 2630.1235654
> > > tests.maptests.primitive.FastUtilMapTest (100000) = 1074.9035660000002
> > > tests.maptests.primitive.MahoutMapTest (1000000) = 3969.1322968
> > > tests.maptests.primitive.FastUtilMapTest (1000000) = 1940.7466792
> > >
> > > This is with fastutil 6.6.1, which is comparable in speed to Koloboke
> or
> > > the GS collections (the java-performance.info tests use an older,
> slower
> > > version), and, I believe, faster for the purposes of Mahout. Get
> > operations
> > > in Mahout collections are 2-3x slower.
> > >
> > > I modified locally RandomAccessSparseVector to use fastutil, and run
> some
> > > of the VectorBenchmarks.
> > >
> > > 0    [main] INFO  org.apache.mahout.benchmark.VectorBenchmarks  -
> Create
> > > (copy) RandSparseVector        mean   = 12.57us;       mean   =
> 64.88us;
> > > 32935 [main] INFO  org.apache.mahout.benchmark.VectorBenchmarks  -
> Create
> > > (incrementally) RandSparseVector
> > > mean   = 31.77us;       mean   = 79.33us;
> > > 244212 [main] INFO  org.apache.mahout.benchmark.VectorBenchmarks  -
> Plus
> > > RandSparseVector
> > > mean   = 47.36us;       mean   = 101.63us;
> > >
> > > On the left you can find the fastutil timings, on the right the Mahout
> > > timings. The only case in which I saw a slowdown is for some
> dense/sparse
> > > products:
> > >
> > > 429433 [main] INFO  org.apache.mahout.benchmark.VectorBenchmarks  -
> Times
> > > Rand.fn(Dense)        mean   = 78us;  mean   = 52.47us;
> > >
> > > but I think this is due to the different way removals are handled:
> Mahout
> > > uses tombstones (and thus slows down all subsequent operations),
> whereas
> > > fastutil does true deletions, which are slightly slower at remove time,
> > but
> > > make subsequent operations faster. Also, iteration over a
> fastutil-based
> > > RandomAccessSparseVector is slowed down by having to return
> non-standard
> > > Element instances instead of Map.Entry instances (as fastutil or the
> JDK
> > > would do naturally).
> > >
> > > If you'd like to benchmark the speed at a high level, the one-file
> > drop-in
> > > is included (you'll need to add fastutil 6.6.1 as a dependency to
> > > mahout-math). As I said, things can be improved by using a standard
> > > Map.Entry (Int2DoubleMap.Entry) instead of Element. But this is a more
> > > pervasive change.
> > >
> > > Ciao,
> > >
> > >                                         seba
> > >
> > >
> > >
> > >
> > > PS: One caveat: presently fastutil does not shrink backing arrays,
> which
> > > might not be what you want. It will, however, from the next release.
> > >
> > >
> > >
> >
>

Reply via email to