Andrzej,

I hadn't had time to look at this closely until now. I like it very much, and find it a little surprising. You use Lucene as a file sorter / b-tree-like package, rather than using SequenceFile.Sorter and MapFile. Lucene is as efficient, yet has a simpler API.

With the Lucene API you can simply:

  while (...) { writer.add(datum); }
  writer.optimize();
  ... access sorted data efficiently...

With SequenceFile you need to:

  while (...) { SequenceFile.Writer.add(datum); }
  SequenceFile.Sorter.sort();
  SequenceFile.Reader.open();
  while (...) { MapFile.Writer.add(datum); }
  ... access sorted data efficiently ...

The problem with SequenceFile is that one cannot, in a single step, add items to a data structure and then access them efficiently. Items must first be explicitly sorted, and only then added to a data structure that can efficiently access sorted data.
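
To make the contrast concrete, here is a rough analogy using only the Java standard library. It is not Lucene, SequenceFile or MapFile code, and the class and method names are made up for illustration. A TreeSet stands in for the "add, then access" style, while a list with an explicit sort and a copy stands in for the "write, sort, re-read, rewrite" style.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: stdlib stand-ins for the two workflows, not real Nutch/Lucene APIs.
public class SortedAccessSketch {

    // One-step style: the structure keeps itself ordered as items are added,
    // so it can be queried as soon as the last item is in.
    static boolean oneStepContains(List<String> data, String key) {
        TreeSet<String> index = new TreeSet<>();
        for (String datum : data) {
            index.add(datum);
        }
        return index.contains(key);
    }

    // Multi-step style: write unsorted, sort explicitly, then copy the sorted
    // items into the structure that is actually used for lookups.
    static boolean multiStepContains(List<String> data, String key) {
        List<String> unsorted = new ArrayList<>();
        for (String datum : data) {
            unsorted.add(datum);                    // 1. write in arrival order
        }
        Collections.sort(unsorted);                 // 2. explicit sort pass
        List<String> lookup = new ArrayList<>();
        for (String datum : unsorted) {
            lookup.add(datum);                      // 3. re-read and rewrite in sorted order
        }
        return Collections.binarySearch(lookup, key) >= 0;  // 4. efficient access
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("pear", "apple", "fig");
        System.out.println(oneStepContains(data, "fig"));
        System.out.println(multiStepContains(data, "fig"));
    }
}
```

Both methods answer the same question; the second just makes the caller drive every intermediate step, which is the extra work described above.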

I never would have guessed that Lucene would make such a good general-purpose database engine, given that it is only really designed to handle document search!

Doug



_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
