Re: Ferret's changes

Grant Ingersoll Tue, 10 Oct 2006 05:34:51 -0700

I would be interested in another survey, this time about how many people use a fixed set of Fields in their applications. The large majority of mine do. I know SOLR supports dynamic fields, but I wonder how much they are used. If there truly is a benefit to them, then perhaps we can have an implementation that can utilize them.

I would like to hear more about your merge strategy and how you do the hashing. Perhaps if we all work through it, we can figure out some ways to incorporate it. As for backwards compatibility, we have a strategy for dealing with it that I think works (deprecation). Furthermore, there is no reason we can't start working towards a new framework for indexing/searching that is interface-based and allows for using either the existing format or a newer format, as Marvin, Doug and others have suggested (in fact, we have a first attempt at it as a patch).

As for benchmarks, in my experience, the people who get all touchy are those who are so married to one way of doing things that they can't think of any other way to solve a problem. I think reasonable people who want Lucene to be better will take the benchmarks as lessons in how to improve Lucene, not as some personal attack on them. Once I get the basics of our benchmark stuff in place, it would be interesting to implement the Ferret version and see how it stacks up. So far, we have been using http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.tar.gz, but I can see about incorporating the Reuters collection, as it is much more the standard when it comes to these things.


-Grant

On Oct 10, 2006, at 5:02 AM, David Balmain wrote:

On 10/10/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
Hi,
Maybe I missed it, but I was surprised that nobody here wondered about the algorithm and data structure changes that Dave Balmain made in Ferret to make it go faster (than Java Lucene). I know I've been wondering whether/when Dave would bring those up, and what the chances are of those changes being applied to Java Lucene.
Here is an interesting and recent interview with Dave that mentions some of this stuff.
http://on-ruby.blogspot.com/2006/10/ruby-hacker-interview-dave-balmain.html
Otis
Hi Otis,

I did bring this up here:
http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200607.mbox/%[EMAIL PROTECTED]
The reason I didn't press the issue was that the changes are pretty
substantial and would break backwards compatibility in Lucene. Also, I
didn't think the major performance benefits would map back to Java
since I'm taking advantage of the fact that I have so much control
over memory allocation in C.

Given these factors and the fact that benchmarks can be a very touchy
subject, particularly in the Java community, I thought it better to
leave any performance comparison off this list. It looks like the cat
is out of the bag now so I'll put some benchmarks up on my Wiki and
everyone can check that I haven't cheated or made any mistakes. I'll
use the Reuters collection:

   http://www.daviddlewis.com/resources/testcollections/reuters21578/

If anyone thinks I should use a different corpus, please let me know.
I also have the entire Gutenberg collection here. I'll post a link
when I'm done.

Cheers,
Dave

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--------------------------
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org



