I have some cross-checks offline (fastutil, pcj, colt, trove). I didn't want to publish them because a great deal depends on the test case (micro-benchmark), CPU, JVM, system architecture, memory speed... you name it. That said, if you're curious, below are results for simple bigram counting on real data, using an int->int hash map whose key is two chars concatenated (char|char). The results stay similar as the number of rounds grows. Note on "jcf" (Java Collections Framework): "jcfWithHolder" is basically HashMap<Integer, MutableInteger>. Like I said -- in most benchmarks, built-in JVM data structures do very well.
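For anyone wondering what the two JCF variants amount to, here is a minimal sketch. It is not the benchmark code behind the numbers: the class name BigramCount, the MutableInteger holder and the key layout (first char in the high 16 bits, second char in the low 16 bits) are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

final class BigramCount {
    /** Hypothetical mutable counter, standing in for the "MutableInteger" holder. */
    static final class MutableInteger {
        int value;
    }

    /** Packs two chars into one int key (assumed layout: first char high, second char low). */
    static int key(char first, char second) {
        return (first << 16) | second;
    }

    /** "jcf" style: HashMap<Integer, Integer>; each update stores a new boxed value. */
    static Map<Integer, Integer> countBoxed(CharSequence text) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int i = 0; i + 1 < text.length(); i++) {
            Integer k = key(text.charAt(i), text.charAt(i + 1));
            Integer old = counts.get(k);
            counts.put(k, old == null ? 1 : old + 1);
        }
        return counts;
    }

    /** "jcfWithHolder" style: one holder per distinct bigram, incremented in place. */
    static Map<Integer, MutableInteger> countWithHolder(CharSequence text) {
        Map<Integer, MutableInteger> counts = new HashMap<>();
        for (int i = 0; i + 1 < text.length(); i++) {
            Integer k = key(text.charAt(i), text.charAt(i + 1));
            MutableInteger holder = counts.get(k);
            if (holder == null) {
                holder = new MutableInteger();
                counts.put(k, holder);
            }
            holder.value++;
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "abracadabra";
        int ab = key('a', 'b');
        System.out.println("boxed  ab: " + countBoxed(text).get(ab));
        System.out.println("holder ab: " + countWithHolder(text).get(ab).value);
    }
}
```

The holder variant allocates one object per distinct bigram and then increments it in place, while the plain variant puts a new Integer value on every update. That may explain part of the gap between jcf and jcfWithHolder, though I have not isolated it.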
Dawid

BigramCounting.hppc: [measured 5 out of 7 rounds]
 round: 0.19 [+- 0.02], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 1.49, time.warmup: 0.56, time.bench: 0.93
BigramCounting.trove: [measured 5 out of 7 rounds]
 round: 0.64 [+- 0.01], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 4.52, time.warmup: 1.32, time.bench: 3.21
BigramCounting.fastutilOpenHashMap: [measured 5 out of 7 rounds]
 round: 0.65 [+- 0.11], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 4.24, time.warmup: 1.00, time.bench: 3.24
BigramCounting.fastutilLinkedOpenHashMap: [measured 5 out of 7 rounds]
 round: 0.52 [+- 0.01], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 3.44, time.warmup: 0.84, time.bench: 2.60
BigramCounting.pcjOpenHashMap: [measured 5 out of 7 rounds]
 round: 0.60 [+- 0.01], round.gc: 0.00 [+- 0.00], GC.calls: 1, GC.time: 0.01, time.total: 4.16, time.warmup: 1.17, time.bench: 2.99
BigramCounting.pcjChainedHashMap: [measured 5 out of 7 rounds]
 round: 0.47 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 3.41, time.warmup: 1.06, time.bench: 2.35
BigramCounting.jcf: [measured 5 out of 7 rounds]
 round: 0.70 [+- 0.05], round.gc: 0.00 [+- 0.00], GC.calls: 8, GC.time: 0.02, time.total: 5.44, time.warmup: 1.95, time.bench: 3.50
BigramCounting.jcfWithHolder: [measured 5 out of 7 rounds]
 round: 0.30 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 4, GC.time: 0.01, time.total: 2.21, time.warmup: 0.73, time.bench: 1.48

On Wed, Apr 21, 2010 at 10:10 AM, John Wang <[email protected]> wrote:
> Hi Dawid:
>
> Any performance comparisons with fastutil?
>
> Thanks
>
> -John
>
> On Mon, Apr 19, 2010 at 1:11 PM, Dawid Weiss <[email protected]> wrote:
>>
>> > Hmmm.. can anybody compare these to fastutil?
>>
>> I believe I can answer some of your questions.
>>
>> 1) HPPC is not directly Java Collections-compatible.
>> It does have an interface hierarchy, but it's not a descendant of the
>> familiar Set, Map or List. Fastutil is collections-compatible.
>>
>> 2) HPPC has open internals, so you can do anything you like once your
>> collections are created, including manipulation of internal storage
>> arrays, for instance. This was a design decision and goal. As with any
>> sharp objects, improper use may cause harm.
>>
>> 3) HPPC uses assert instead of fixed condition checks. There are no
>> attempts to detect misuse (fail-fast iterators, etc.).
>>
>> 4) fastutil is more mature, has support for more data structures
>> (sorted trees, etc.) and was written by an excellent programmer
>> (Sebastiano Vigna). HPPC was created internally for use at Carrot
>> Search and was primarily motivated by speed; we believed that in
>> certain applications direct access to collections' internals should be
>> allowed and should be beneficial. Our micro-benchmarks show that this
>> is largely true if you manipulate LOTS of data. For smaller data sets
>> even built-in Java collections with boxed types do surprisingly well
>> (due to HotSpot optimizations too).
>>
>> 5) There are subtle differences in how HPPC is written -- I use pretty
>> much normal generic classes with some pseudo-intrinsics and
>> regexp-substituted comments. Sebastiano uses the C++ preprocessor to
>> generate Java classes from templates (yes, wicked).
>>
>> I look at Lucene and SOLR source code and learn a LOT from folks
>> contributing to this project, so HPPC will hardly be any faster or
>> better compared to what Lucene already has, but if anybody finds
>> anything from HPPC useful, please take handfuls. I would love for this
>> project to be finally merged with Mahout, but I intentionally left it in
>> Carrot Search labs for a little while so that the API can stabilize
>> (through our in-house experiments mostly).
>>
>> Thanks for showing your interest!
>> Dawid
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
