Re: Ferret's changes

David Balmain Tue, 10 Oct 2006 09:07:35 -0700

On 10/11/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:

On 10/10/06, David Balmain <[EMAIL PROTECTED]> wrote:
> Given these factors and the fact that benchmarks can be a very touchy
> subject, particularly in the Java community,


OK, I'll bite!  (but I'm always too aggravated at many of the Java
design decisions to consider myself part of that community ;-)

I always see unqualified statements about Ferret being faster than
Lucene, but I only see benchmarks for Indexing to back it up:
http://rubyforge.org/forum/forum.php?forum_id=9058

Now, I'm sure that Ferret is faster for indexing, but I'm still curious if you:
 - used the non-compound format
 - optimized both indices at the end
 - set maxBufferedDocs to 1000 (or at least 100)


The start of my benchmarks is here:

http://ferret.davebalmain.com/trac/wiki/FerretVsLucene

I did set maxBufferedDocs to 1000 and optimized both indexes at the
end, but I didn't use the non-compound format. I think it is better to
use the compound file format as it is the default in both libraries and
the penalty will be similar in both cases. If you like, I can tell
you what the difference is for my tests. Please feel free to tell me
where else I can improve the Lucene benchmarker.

So is Ferret faster for searching too?  The absence of stats suggests
that it's not :-)


:-) Well, I'd like to think the absence of stats for searching has
nothing to do with Lucene being faster. For starters, the indexing
time is a lot more noticeable to the user. And benchmarking
searching is a little more difficult. There are numerous Queries,
Filters and Sorts to test and it's important to test with optimized
and unoptimized indexes. Anyway, I'll attempt to put a search
benchmark out tomorrow.

> I thought it better to
> leave any performance comparison off this list. It looks like the cat
> is out of the bag now so I'll put some benchmarks up on my Wiki and
> everyone can check that I haven't cheated or made any mistakes.

Cool, looking forward to it.  Is there something that can be used for
search benchmarking also?


Yep. I'll do that tomorrow. As I said, there are a lot of aspects to test.

Looking forward to progress on Lucy.  What is done there could
potentially be a future Lucene index format.


I think it will eventually take over Ferret as well. I'm not sure if
it will ever be as fast as Ferret because the goal of Lucy is to have
most of the code in the native language, be it Perl, Ruby or whatever.
However, I think I am going to be taking Ferret in a different
direction. Most Ferret users use Ferret in web applications to enable
full-text search on their databases. I think keeping Ferret in sync
with a database is becoming more of an unnecessary hassle as Ferret
takes on more database-like qualities. The next generation of Ferret
will probably be an object database with full-text search. This will
be a lot more useful in the environment where Ferret is currently
being used.

Anyway, that is probably of no interest to anyone here but I thought I'd
better let everyone know that Lucy development will one day go ahead.

Cheers,
Dave
