[ http://issues.apache.org/jira/browse/LUCENE-510?page=comments#action_12378519 ]

Marvin Humphrey commented on LUCENE-510:
----------------------------------------

The following patch...

  * Changes Lucene to use bytecounts as the prefix to all written Strings
  * Changes Lucene to write standard UTF-8 rather than Modified UTF-8 
  * Adds the new test classes MockIndexOutput and TestIndexOutput
  * Increases the number of tests in TestIndexInput
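
As a rough illustration of the first two points (not code taken from the
patch), the sketch below writes a string as a variable-length byte count
followed by standard UTF-8 bytes. The class name ByteCountStringSketch and
the helper encodeString are hypothetical, and the 7-bits-per-byte length
encoding is assumed to mirror Lucene's VInt. The sample string shows why the
byte count differs from the Java char count.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; names are illustrative and not part of the patch.
public class ByteCountStringSketch {

    // Variable-length int: 7 data bits per byte, high bit set when more
    // bytes follow (assumed to match Lucene's VInt layout).
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Writes the UTF-8 byte count as the prefix, then the UTF-8 bytes.
    static byte[] encodeString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8); // standard UTF-8
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, utf8.length);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // "cafe" with an accented e, a space, and U+1D11E (a supplementary
        // character that Java stores as a surrogate pair).
        String s = "caf\u00e9 \uD834\uDD1E";
        byte[] encoded = encodeString(s);
        System.out.println("Java chars:   " + s.length());
        System.out.println("UTF-8 bytes:  " + (encoded.length - 1));
        System.out.println("length prefix: " + encoded[0]);
    }
}
```

With a char-count prefix, a reader must decode every character to find where
the string ends; with a byte-count prefix it can skip or bulk-read the
string directly.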

It also slows Lucene down -- indexing takes a speed hit of around 20%.  A
patch with a smaller performance impact would be possible, but this one is
already over 700 lines long, and its goal is to achieve standard UTF-8
compliance and modify the definition of Lucene strings as simply and
reliably as possible.  Optimization patches that build upon this one can
now be submitted.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



> IndexOutput.writeString() should write length in bytes
> ------------------------------------------------------
>
>          Key: LUCENE-510
>          URL: http://issues.apache.org/jira/browse/LUCENE-510
>      Project: Lucene - Java
>         Type: Improvement

>   Components: Store
>     Versions: 2.1
>     Reporter: Doug Cutting
>      Fix For: 2.1

>
> We should change the format of strings written to indexes so that the length 
> of the string is in bytes, not Java characters.  This issue has been 
> discussed at:
> http://www.mail-archive.com/java-dev@lucene.apache.org/msg01970.html
> We must increment the file format number to indicate this change.  At least 
> the format number in the segments file should change.
> I'm targeting this for 2.1, i.e., we shouldn't commit it to trunk until 
> after 2.0 is released, to minimize incompatible changes between 1.9 and 2.0 
> (other than removal of deprecated features).
