On Jan 15, 2006, at 3:34 PM, Robert Kirchgessner wrote:
> There was even a patch to that problem:
> http://issues.apache.org/jira/browse/LUCENE-211
This is a large and somewhat hard-to-read patch, but some of it looks
familiar. It looks like he's concatenating the field name with the
token text, which is roughly the right idea, though you need to take
some precautions for field names of differing lengths, and I didn't
immediately spot those in the patch. (KinoSearch instead uses the
field number, which corresponds to the lexically sorted field name at
index time, encoded as a big-endian 16-bit int.)
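For concreteness, here's a minimal sketch in Java (hypothetical names,
not KinoSearch's actual code, and the UTF-8 choice for the text bytes
is my assumption) of prefixing a term's text with a big-endian 16-bit
field number, so terms sort by field first and then by text regardless
of how long the field names are:

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;

    public class FieldPrefixedTerm {
        // Prepend the field number as a big-endian 16-bit int, then the
        // term text, so a plain byte-wise sort groups terms by field.
        static byte[] encode(int fieldNum, String termText) {
            byte[] text = termText.getBytes(StandardCharsets.UTF_8);
            ByteBuffer buf = ByteBuffer.allocate(2 + text.length);
            buf.putShort((short) fieldNum); // ByteBuffer is big-endian by default
            buf.put(text);
            return buf.array();
        }
    }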
The interesting thing to me is that it doesn't seem to feed an
external sorter. If I understand the concept correctly, he's feeding
a sortpool for minMergeDocuments documents, creating a small mini-
index (minMergeDocuments in size), and then falling back to the
primary merge model. Even if that isn't what the patch actually does,
the concept would still work, and it would be nice not to need an
external sorter.
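Roughly, I picture that buffer-then-flush idea looking something like
this (a hypothetical Java sketch, not the patch's actual code; the
class and method names are made up):

    import java.util.ArrayList;
    import java.util.List;

    // Buffer documents in RAM; every minMergeDocuments docs, sort the
    // buffered postings and write one small mini-index, then let the
    // primary merge model combine segments as usual -- no external sorter.
    class BufferedIndexer {
        private final int minMergeDocuments;
        private final List<String> buffered = new ArrayList<String>();

        BufferedIndexer(int minMergeDocuments) {
            this.minMergeDocuments = minMergeDocuments;
        }

        void addDocument(String doc) {
            buffered.add(doc);
            if (buffered.size() >= minMergeDocuments) {
                flushMiniIndex();
            }
        }

        private void flushMiniIndex() {
            // Sort in memory and write one small segment here.
            buffered.clear();
        }
    }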
> Yes, the binary format is fully compatible with that of Lucene, as
> is the read/write/search logic.
So...
* You use Sun's "Modified UTF-8" (not true UTF-8) to
encode character data.
* The VInt counts at the head of strings represent Java
chars, not Unicode code points or bytes (see the short
example after this list).
* You've run tests with source material containing
null bytes, Unicode characters outside the Basic
Multilingual Plane, and corrupt character data (e.g.,
broken UTF-8), and you are confident that indexes produced
by Lucene and PHPLucene from such data are mutually compatible.
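For what it's worth, here is a small self-contained Java illustration
of the first two points, using only the JDK (nothing Lucene-specific):

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    public class EncodingDemo {
        public static void main(String[] args) throws IOException {
            // U+1D11E (outside the BMP) is a surrogate pair in Java.
            String s = "a\uD834\uDD1E";
            System.out.println(s.length());                                 // 3 Java chars
            System.out.println(s.codePointCount(0, s.length()));            // 2 code points
            System.out.println(s.getBytes(StandardCharsets.UTF_8).length);  // 5 bytes of true UTF-8

            // Modified UTF-8 encodes U+0000 as the two bytes 0xC0 0x80
            // rather than a single 0x00; writeUTF also prepends a 2-byte
            // length (0x00 0x02 here).
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            new DataOutputStream(bos).writeUTF("\u0000");
            for (byte b : bos.toByteArray()) {
                System.out.printf("%02X ", b);   // prints: 00 02 C0 80
            }
            System.out.println();
        }
    }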
> By the way, though the project emerged as a Lucene implementation
> in PHP, I soon switched to writing a pure C library with a binding
> to PHP. Now it's mostly a C project.
KinoSearch has taken a similar path of late, adding more and more XS
(Perl's C API).
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/