[jira] Issue Comment Edited: (LUCENE-2084) remove Byte/CharBuffer wrapping for collation key generation

Uwe Schindler (JIRA) Sun, 27 Dec 2009 02:03:56 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794675#action_12794675
 ]


Uwe Schindler edited comment on LUCENE-2084 at 12/27/09 10:01 AM:
------------------------------------------------------------------

bq. Therefore, limit - offset is a correct way to calculate length of the 
entire buffer, which is what this class has always used.

This is exactly wrong: The lengthz of the "valid" area inside the buffer is 
always limit()-position() [see javadocs of wrap()]. If arrayOffset()!=0, 
position() may still be 0 (because the arrayoffset is only used when wrapping 
byte arrays. The arrayOffset() defines the postion in the underlying array that 
defines the buffer's position()==0 (see javadocs).

This error is mostly no problem for dynamical allocated array buffers (as most 
users use), because arrayOffset() is always 0 (I have never seen a buffer with 
offset != 0)

A problem also occurs if you wrap an array with wrap([], offset, length). the 
offset there is *not* the arrayOffset(), it is the position()! And length is 
after wrapping remaining(). The capacity() is the array length. Therefore it is 
really a bug. You deprecated it, that's good, but it should be fixed, it does 
not work correct.

For verifying, read the source code of ByteBuffer and HeapByteBuffer from 
src.zip in your jdk distrib and of course the javadocs.

So just to repeat, the correct code would be:
- start postion in array=arrayOffset()+position()
- length=remaining()

This is how the IdentityEncode for Payloads works. So deprecate *and* fix it.
If we really want to work with buffers, the correct way to implement this would 
be that the whole class implements CharsetEncoder.

      was (Author: thetaphi):
    bq. Therefore, limit - offset is a correct way to calculate length of the 
entire buffer, which is what this class has always used.

This is exactly wrong: The lengthz of the "valid" area inside the buffer is 
always limit(). If arrayOffset()!=0, position() may still be 0 (because the 
arrayoffset is only used when wrapping byte arrays. The arrayoffset defines the 
postion in the underlying array that defines the buffer's position 0 (see 
javadocs).

This error is mostly no problem for dynamical allocated array buffers (as most 
users use), because arrayOffset() is always 0 (i have never seen a buffer with 
offset != 0)

A problem also occurs if you wrap an array with wrap([], offset, length). the 
offset there is *not* the arrayOffset(), it is the position()! And length is 
after wrapping remaining(). The capacity() is the array length. Therefore it is 
really a bug. You deprecated it, thats ood, but it should be fixed, it does not 
work correct.

For verifying, read the source code of ByteBuffer and HeapByteBuffer from 
src.zip in your jdk distrib and of course the javadocs..
  
> remove Byte/CharBuffer wrapping for collation key generation
> ------------------------------------------------------------
>
>                 Key: LUCENE-2084
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2084
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: collation.benchmark.tar.bz2, LUCENE-2084.patch, 
> LUCENE-2084.patch, LUCENE-2084.patch, TopTFWikipediaWords.tar.bz2
>
>
> We can remove the overhead of ByteBuffer and CharBuffer wrapping in 
> CollationKeyFilter and ICUCollationKeyFilter.
> this patch moves the logic in IndexableBinaryStringTools into char[],int,int 
> and byte[],int,int based methods, with the previous Byte/CharBuffer methods 
> delegating to these.
> Previously, the Byte/CharBuffer methods required a backing array anyway.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Issue Comment Edited: (LUCENE-2084) remove Byte/CharBuffer wrapping for collation key generation

Reply via email to