[ 
https://issues.apache.org/jira/browse/LUCENE-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1241:
---------------------------------------

    Attachment: LUCENE-1241.take2.patch

Attached take2 patch.  I fixed it to apply to trunk, and I removed
0xffff entirely.  All tests pass, but...

Unfortunately, this change causes a significant net slowdown (5.9%) in
indexing throughput.  I ran this alg:

  analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
  doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker
  docs.file=/Volumes/External/lucene/wiki.txt
  doc.stored = true
  doc.term.vector = true
  doc.add.log.step=2000
  directory=FSDirectory
  autocommit=false
  compound=false
  ram.flush.mb=64
  { "Rounds"
    ResetSystemErase
    { "BuildIndex"
      - CreateIndex
      { "AddDocs" AddDoc > : 200000
      - CloseIndex
    }
    NewRound
  } : 5
  RepSumByPrefRound BuildIndex

I ran the test on an Intel quad-core Mac Pro with a 4-drive RAID 0
array.  The JVM is 1.5, run with "-Xms1024M -Xmx1024M -Xbatch -server".

Taking the best of 5 rounds, trunk gets 897.3 rec/s and the patch gets
844.3 rec/s, so the patch is 5.9% slower.
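
For reference, the 5.9% figure is the throughput drop relative to trunk.
A minimal sketch of that arithmetic follows; the class and method names
are hypothetical and not part of the Lucene benchmark package.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Lucene.
final class ThroughputComparison {

    /** Percent by which the candidate is slower than the baseline (both in rec/s). */
    static double percentSlower(double baselineRecPerSec, double candidateRecPerSec) {
        return 100.0 * (baselineRecPerSec - candidateRecPerSec) / baselineRecPerSec;
    }

    public static void main(String[] args) {
        double trunk = 897.3; // best of 5 rounds, as reported in this comment
        double patch = 844.3; // best of 5 rounds, as reported in this comment
        // Locale.ROOT keeps the decimal separator a '.' on any machine.
        System.out.println(String.format(Locale.ROOT, "%.1f%% slower",
                percentSlower(trunk, patch)));
    }
}
```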

I don't think we should commit this.

> 0xffff char is not a string terminator
> --------------------------------------
>
>                 Key: LUCENE-1241
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1241
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Hiroaki Kawai
>            Assignee: Michael McCandless
>         Attachments: ComparableCharSequence.java, LUCENE-1241.patch, 
> LUCENE-1241.take2.patch
>
>
> Current trunk index.DocumentWriter uses "\uffff" as a string terminator, but
> it should not, for several reasons. \uffff is not itself a terminator char,
> and we can't handle a string that really contains \uffff. Also, we can
> calculate the end char position in a character sequence from the string
> length, which we already know.
> However, I agree with its use for assertions, where "\uffff" is placed just
> after the end of a string in a char sequence.
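
The concern in the description can be illustrated with a small sketch.
This is not Lucene's actual code: the class, the store() helper and the
buffer layout are assumptions made only to show why a sentinel scan
misreads a term that contains \uffff, while a recorded length does not.

```java
// Hypothetical illustration; not taken from Lucene's indexing code.
final class TermEndSketch {
    static final char SENTINEL = '\uffff';

    /** Copies the term into a buffer and appends the sentinel after it. */
    static char[] store(String term) {
        char[] buf = new char[term.length() + 1];
        term.getChars(0, term.length(), buf, 0);
        buf[term.length()] = SENTINEL;
        return buf;
    }

    /** Recovers the term length by scanning forward to the first sentinel. */
    static int lengthBySentinel(char[] buf, int start) {
        int pos = start;
        while (buf[pos] != SENTINEL) {
            pos++;
        }
        return pos - start;
    }

    public static void main(String[] args) {
        String plain = "lucene";
        String tricky = "a\uffffb"; // term that really contains U+FFFF

        // The scan agrees with the known length for ordinary text.
        System.out.println(lengthBySentinel(store(plain), 0) == plain.length());
        // The scan stops at the embedded U+FFFF and reports 1 instead of 3.
        System.out.println(lengthBySentinel(store(tricky), 0) == tricky.length());
    }
}
```

Using the known length avoids the second case. The sketch says nothing
about the relative speed of the two approaches.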

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

