Yes, sure, it is interesting. GitHub would probably be a good spot?

Dawid

On Wed, Oct 26, 2011 at 7:02 PM, mark harwood <markharw...@yahoo.co.uk> wrote:
>>> Avg lookup time slightly less than a HashSet? Interesting.
>
> Scratch that. A new dataset and revised code shows HashSets out in front (but
> still not a realistic option for very large sets): http://goo.gl/Lb4J1
>
> In this benchmark I removed the code common to all previous tests which was
> first retrieving a random key from a test query Lucene index to then look up
> in the target Set (a choice of database, hashset or a different Lucene
> index).
>
> I assumed that being common code to all tests, this initial Lucene-based
> fetch would not bias results but it was. Now the tests first load a random
> sample of 100k keys from a flat file *then* start the timer on the look-ups.
> I'm also using public domain Wikipedia data so can release the code and data
> somewhere if that's of interest.
>
> Cheers
> Mark
>
> ----- Original Message -----
> From: Dawid Weiss <dawid.we...@gmail.com>
> To: java-user@lucene.apache.org
> Cc:
> Sent: Tuesday, 25 October 2011, 23:17
> Subject: Re: Bet you didn't know Lucene can...
>
>> Lucene started out at an avg 3ms but subsequent runs took it down
>> dramatically due to OS file caching. The all-in-memory hashset
>> implementation clearly did not demonstrate the same speed ups between runs.
>
> I don't say the benchmark was wrong or anything, but this is
> surprising. I mean, the default HashSet impl. is a bucketed
> linked-list implementation. It made me wonder how the data was
> distributed. Even with OS file caching the in-memory data structure
> shouldn't fall short, at least intuitively.
>
>> I can make the code available but the data wouldn't be possible.
>> The English Wikipedia page titles are probably an equivalent size and shape
>> so I could try and package something up around that as a benchmarking tool
>> for others to play with.
>
> If you find a spare cycle, it'd be great, thanks!
>
> Dawid
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
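
For anyone following the thread, the revised timing approach Mark describes (get the sample keys into memory first, then start the timer on the look-ups only) could look roughly like the sketch below. This is not Mark's code. The class name LookupTimer, the method names, the synthetic "title-N" keys and the set size are all assumptions made for illustration; the real benchmark loads its 100k keys from a flat file and compares a database, a HashSet and a Lucene index, none of which is reproduced here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch: times HashSet look-ups with key selection kept
// outside the timed region.
class LookupTimer {

    // Counts how many of the sample keys are present in the target set.
    static int countHits(Set<String> target, List<String> keys) {
        int hits = 0;
        for (String key : keys) {
            if (target.contains(key)) {
                hits++;
            }
        }
        return hits;
    }

    // Average nanoseconds per look-up. The keys are already in memory,
    // so only the contains() calls fall inside the timed region.
    static double avgLookupNanos(Set<String> target, List<String> keys) {
        long start = System.nanoTime();
        int hits = countHits(target, keys);
        long elapsed = System.nanoTime() - start;
        // Use the result so the look-up loop cannot be optimised away.
        if (hits < 0) {
            throw new IllegalStateException("negative hit count");
        }
        return (double) elapsed / keys.size();
    }

    public static void main(String[] args) {
        // Synthetic stand-in for the real key set (assumption).
        int setSize = 200_000;
        Set<String> target = new HashSet<>();
        for (int i = 0; i < setSize; i++) {
            target.add("title-" + i);
        }

        // Draw the 100k sample keys BEFORE any timing starts. About half
        // of them will miss, because the range is twice the set size.
        Random random = new Random(42);
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            keys.add("title-" + random.nextInt(setSize * 2));
        }

        // Repeat the measurement so differences between runs are visible.
        for (int run = 1; run <= 5; run++) {
            System.out.printf("run %d: %.1f ns/lookup%n",
                    run, avgLookupNanos(target, keys));
        }
    }
}
```

Repeating the run a few times matters for the same reason it did in the thread: the first pass includes warm-up effects (JIT compilation here, OS file caching in the Lucene case), so a single measurement can mislead. A harness such as JMH would handle that more rigorously than this hand-rolled loop.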