[jira] [Commented] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k

Michael McCandless (JIRA) Sun, 12 May 2013 05:47:19 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655526#comment-13655526
 ]


Michael McCandless commented on LUCENE-4583:
--------------------------------------------

{quote}
I'm worried about a few things:
I think the limit is ok, because in my eyes it's the limit of a single term. I
feel that anyone arguing for increasing the limit only has abuse cases (not use 
cases) in mind. I'm worried about making dv more complicated for no good reason.
{quote}

I guess I see DV binary as more like a stored field, just stored
column-stride for faster access.  Faceting (and I guess spatial)
encodes many things inside one DV binary field.

bq. I'm worried about opening up the possibility of bugs and index corruption 
(e.g. clearly MULTIPLE people on this issue don't understand why you cannot just
remove IndexWriter's limit without causing corruption).

I agree this is a concern, and we need to take it slow and add good
test coverage.

{quote}
I'm really worried about the precedent: once these abuse-case-fans have their 
way and increase this limit, they will next argue that we should do the same 
for SORTED, maybe SORTED_SET, maybe even inverted terms. They will make 
arguments that it's the same as binary, just with sorting, and why should
sorting bring in additional limits. I can easily see this all spinning out of 
control.
I think that most people hitting the limit are abusing docvalues as stored 
fields, so the limit is providing a really useful thing today actually, and 
telling them they are doing something wrong.
{quote}

I don't think we should change the limit for SORTED or SORTED_SET, nor
for terms: I think we should raise the limit ONLY for BINARY, and
declare that DV BINARY is for these "abuse" cases.  So if you really
really want sorted set with a higher limit, then you will have to
encode it yourself into DV BINARY.
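
To make "encode it yourself" concrete, here is a minimal sketch of one
way an application could pack several values into a single byte array
before handing it to a binary DV field.  It assumes a simple
length-prefixed layout; the class name PackedValues and its methods are
hypothetical and are not part of Lucene, and real faceting or spatial
code may well use a different (e.g. more compact) encoding.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a Lucene API): packs many values into one byte[]. */
final class PackedValues {

  private PackedValues() {}

  /** Concatenates the values, each preceded by its length as a 4-byte int. */
  static byte[] pack(List<byte[]> values) {
    int size = 0;
    for (byte[] v : values) {
      size += Integer.BYTES + v.length;
    }
    ByteBuffer buf = ByteBuffer.allocate(size);
    for (byte[] v : values) {
      buf.putInt(v.length); // length prefix
      buf.put(v);           // raw bytes of the value
    }
    return buf.array();
  }

  /** Reverses pack(): reads length-prefixed values until the buffer is exhausted. */
  static List<byte[]> unpack(byte[] packed) {
    ByteBuffer buf = ByteBuffer.wrap(packed);
    List<byte[]> values = new ArrayList<>();
    while (buf.hasRemaining()) {
      byte[] v = new byte[buf.getInt()];
      buf.get(v);
      values.add(v);
    }
    return values;
  }
}
```

The array returned by pack() is what the application would wrap in a
BytesRef and index as the single binary value for the document.  A
packed value like this easily grows past 32k once a document carries
many entries, which is the case a higher BINARY limit would serve.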

{quote}
The only argument I have for removing the limit is that by expanding BINARY's
possible abuse cases (in my opinion, that's pretty much all it's useful for), we
might prevent additional complexity from being added elsewhere to DV in the 
long-term.
{quote}

+1

                
> StraightBytesDocValuesField fails if bytes > 32k
> ------------------------------------------------
>
>                 Key: LUCENE-4583
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4583
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 4.0, 4.1, 5.0
>            Reporter: David Smiley
>            Priority: Critical
>             Fix For: 4.4
>
>         Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch
>
>
> I didn't observe any limitations on the size of a bytes based DocValues field 
> value in the docs.  It appears that the limit is 32k, although I didn't get 
> any friendly error telling me that was the limit.  32k is kind of small IMO; 
> I suspect this limit is unintended and as such is a bug.    The following 
> test fails:
> {code:java}
>   public void testBigDocValue() throws IOException {
>     Directory dir = newDirectory();
>     IndexWriter writer = new IndexWriter(dir, writerConfig(false));
>     Document doc = new Document();
>     BytesRef bytes = new BytesRef((4+4)*4097); // 32776 bytes; (4+4)*4096 = 32768 works
>     bytes.length = bytes.bytes.length; // byte data doesn't matter, only the length
>     doc.add(new StraightBytesDocValuesField("dvField", bytes));
>     writer.addDocument(doc);
>     writer.commit();
>     writer.close();
>     DirectoryReader reader = DirectoryReader.open(dir);
>     DocValues docValues = MultiDocValues.getDocValues(reader, "dvField");
>     //FAILS IF BYTES IS BIG!
>     docValues.getSource().getBytes(0, bytes);
>     reader.close();
>     dir.close();
>   }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

