[ http://issues.apache.org/jira/browse/LUCENE-510?page=all ]

Marvin Humphrey updated LUCENE-510:
-----------------------------------

    Attachment: SortExternal.java
                TestSortExternal.java

Greets,

I've ported KinoSearch's external sorting module to Java, along with its tests.
This class is the linchpin of the KinoSearch merge model, as it allows
serialized postings to be dumped into a sort pool of effectively unlimited size.
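
For readers unfamiliar with the technique, the following is a minimal sketch of an external merge sort. It is not the attached SortExternal.java and makes no claim about how that class is written. The class name ExternalSortSketch, its methods feed and drainTo, the maxInMemory threshold, and the line-per-item temp-file format are all hypothetical, chosen only for illustration. It assumes items are strings without line breaks, compared in natural String order.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

// Hypothetical sketch of an external merge sort; NOT the attached SortExternal.java.
// Assumption: items are strings without line breaks, in natural String order.
class ExternalSortSketch {
    private final int maxInMemory;                         // items buffered before spilling a run
    private final List<String> buffer = new ArrayList<>(); // in-memory portion of the sort pool
    private final List<Path> runs = new ArrayList<>();     // sorted runs already written to disk

    ExternalSortSketch(int maxInMemory) {
        if (maxInMemory < 1) throw new IllegalArgumentException("maxInMemory must be >= 1");
        this.maxInMemory = maxInMemory;
    }

    // Add one item; spill a sorted run to a temp file once the buffer is full.
    void feed(String item) throws IOException {
        buffer.add(item);
        if (buffer.size() >= maxInMemory) flushRun();
    }

    // Sort the buffer and write it out as one run, one item per line.
    private void flushRun() throws IOException {
        if (buffer.isEmpty()) return;
        Collections.sort(buffer);
        Path run = Files.createTempFile("sortrun", ".txt");
        run.toFile().deleteOnExit();
        Files.write(run, buffer, StandardCharsets.UTF_8);
        runs.add(run);
        buffer.clear();
    }

    // Merge all runs with a min-heap and pass items to the sink in sorted order.
    // Error handling is deliberately minimal in this sketch.
    void drainTo(Consumer<String> sink) throws IOException {
        flushRun();
        PriorityQueue<Cursor> heap = new PriorityQueue<>();
        try {
            for (Path run : runs) {
                Cursor c = new Cursor(Files.newBufferedReader(run, StandardCharsets.UTF_8));
                if (c.advance()) heap.add(c);
            }
            while (!heap.isEmpty()) {
                Cursor c = heap.poll();          // run whose head item is smallest
                sink.accept(c.current);
                if (c.advance()) heap.add(c);    // re-insert with its next item
            }
        } finally {
            for (Cursor c : heap) c.reader.close();
            for (Path run : runs) Files.deleteIfExists(run);
            runs.clear();
        }
    }

    // Tracks the current head item of one on-disk run.
    private static final class Cursor implements Comparable<Cursor> {
        final BufferedReader reader;
        String current;

        Cursor(BufferedReader reader) { this.reader = reader; }

        // Reads the next item; closes the reader and returns false at end of run.
        boolean advance() throws IOException {
            current = reader.readLine();
            if (current == null) { reader.close(); return false; }
            return true;
        }

        @Override
        public int compareTo(Cursor other) { return current.compareTo(other.current); }
    }
}
```

In this sketch, memory use during feeding is bounded by maxInMemory items, and during the merge by one item per run, which is what lets the pool grow well beyond available RAM. The sketch is single-threaded and not thread-safe.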

At some point, I'll submit patches implementing the KinoSearch merge model in
Lucene.  I'm reasonably confident that it will more than make up for the
index-time performance hit caused by using byte counts as string headers.

Thematically, this class belongs in org.apache.lucene.util, and that's where 
I've put it for now.  The classes that will use it are in 
org.apache.lucene.index, so if it stays in util, it will have to be public.  
However, it shouldn't be part of Lucene's documented public API.  The process 
by which Lucene's docs are generated is not clear to me, so access control 
advice would be appreciated.

There are a number of other areas where this code could stand review, 
especially considering my relatively limited experience using Java.  I'd single 
out exception handling and thread safety, but of course anything else is fair 
game.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


> IndexOutput.writeString() should write length in bytes
> ------------------------------------------------------
>
>          Key: LUCENE-510
>          URL: http://issues.apache.org/jira/browse/LUCENE-510
>      Project: Lucene - Java
>         Type: Improvement
>
>   Components: Store
>     Versions: 2.1
>     Reporter: Doug Cutting
>      Fix For: 2.1
>  Attachments: SortExternal.java, TestSortExternal.java, strings.diff
>
> We should change the format of strings written to indexes so that the length 
> of the string is in bytes, not Java characters.  This issue has been 
> discussed at:
> http://www.mail-archive.com/java-dev@lucene.apache.org/msg01970.html
> We must increment the file format number to indicate this change.  At least 
> the format number in the segments file should change.
> I'm targeting this for 2.1, i.e., we shouldn't commit it to trunk until 
> after 2.0 is released, to minimize incompatible changes between 1.9 and 2.0 
> (other than removal of deprecated features).
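
To illustrate the distinction the issue description draws, here is a small sketch comparing a string's length in Java characters with its length in encoded bytes. The class and method names are hypothetical, and the sketch assumes standard UTF-8 via StandardCharsets; the encoding Lucene actually writes to its index files may differ, so treat the byte counts as illustrative only.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only; not part of Lucene.
class StringLengthSketch {
    // Length as Java sees it: the number of UTF-16 code units (chars).
    static int lengthInChars(String s) {
        return s.length();
    }

    // Length of the encoded form, ASSUMING standard UTF-8.
    static int lengthInBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

For pure ASCII text the two counts agree. They diverge as soon as a character needs more than one byte: under the UTF-8 assumption above, "na\u00EFve" is 5 chars but 6 bytes. A reader given a byte count can skip over a string without decoding it, whereas a char count forces a decode to find where the string ends.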
