> The times for KinoSearch and Lucene are 5-run ...
> is due to cache reassignment.) Therefore, the same command was
> issued on the command line 6 times, separated by semicolons. The
> first iter was discarded, and the rest were averaged.
> ...
> The maximum memory consumption was measured during auxiliary passes
> (i.e. not averaged in), using the crude method of eyeballing RPRVT in
> the output of top.
Marvin,

I think it is great that different implementations are being compared, and your results are interesting. However, I think that the above methodology does not work well with Java (it may work better with Perl, but might have problems there as well). In this case the difference is probably not as big as for some other tests, since the test runs were almost a minute long, i.e. no order-of-magnitude difference, but it will be noticeable.

The reason is that it is crucial NOT to run consecutive tests by restarting the JVM, unless you really want to measure one-shot, single-run command-line total times. JVM startup overhead and HotSpot warm-up essentially mean that if you ran a second indexing pass right after the first one in the same JVM, it would be significantly faster, and not just due to caching effects. Consecutive runs would have run times that converge towards sustainable long-term performance. In this case the second run may already be as fast as it will get, since each run lasts a significant amount of time (I have noticed that 30 or even 10 seconds of warm-up is often sufficient). HotSpot only compiles Java bytecode when it determines a need, and figuring that out takes a while.

So in this case, what would give more comparable results (assuming you are interested in measuring the likely server-side usage scenario, which is usually what Lucene is used for) would be to run all runs within the same JVM (or the same process, for Perl), and either take the fastest run, or discard the first one and take the median or average. Would this be possible?

I am not really concerned about "whose language is faster" here, but about the relevance of the results, using a methodology that gives realistic numbers for the usual use case. Chances are the Perl-based version would also perform better (depending on how the Perl runtime optimizes things) if the tests were run in a single process.

Anyway, the above is intended as constructive criticism, so once again thank you for doing these tests!

-+ Tatu +-

ps.
Regarding memory usage: it is also quite tricky to measure reliably, since Garbage Collection only kicks in when it has to... so Java uses as much memory as it can (without expanding heap)... plus, JVMs do not necessarily (or even usually) return unused chunks later on.
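
To make the single-JVM suggestion above a bit more concrete, here is a rough sketch of what such a harness might look like. It is only an illustration under my own assumptions: the class name InProcessBench and the stand-in workload are hypothetical (they are not taken from the Lucene or KinoSearch benchmark code), and it assumes Java 8 or later for the lambda syntax. The real indexing call would replace the placeholder Runnable.

```java
import java.util.Arrays;

// Hypothetical harness: times several runs inside ONE JVM so that
// HotSpot warm-up is paid once, not on every measured run.
class InProcessBench {

    // Written by the stand-in workload so the JIT cannot discard it.
    static volatile int sink;

    /** Runs the task `runs` times in this JVM; returns each wall-clock time in nanoseconds. */
    public static long[] timeRuns(Runnable task, int runs) {
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    /** Returns a copy without the first run, which is treated as warm-up. */
    public static long[] dropFirst(long[] nanos) {
        return Arrays.copyOfRange(nanos, 1, nanos.length);
    }

    public static long fastest(long[] nanos) {
        return Arrays.stream(nanos).min().getAsLong();
    }

    public static double mean(long[] nanos) {
        return Arrays.stream(nanos).average().getAsDouble();
    }

    public static double median(long[] nanos) {
        long[] sorted = nanos.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return (sorted.length % 2 == 1)
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        // Placeholder workload (an assumption); substitute the real indexing run.
        Runnable indexOnce = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 200_000; i++) {
                sb.append(i % 10);
            }
            sink = sb.length();
        };

        long[] all = timeRuns(indexOnce, 6);   // 6 runs, as in the original test
        long[] steady = dropFirst(all);        // discard the warm-up run

        System.out.printf("first run: %.2f ms%n", all[0] / 1e6);
        System.out.printf("fastest:   %.2f ms%n", fastest(steady) / 1e6);
        System.out.printf("median:    %.2f ms%n", median(steady) / 1e6);
        System.out.printf("mean:      %.2f ms%n", mean(steady) / 1e6);
    }
}
```

Printing the first run separately makes the warm-up cost visible next to the steady-state numbers. With a workload this small a single warm-up run may not be enough for HotSpot to settle, so for short tasks one would probably want to discard more than one run.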