[
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655526#comment-13655526
]
Michael McCandless commented on LUCENE-4583:
--------------------------------------------
{quote}
I'm worried about a few things:
I think the limit is ok, because in my eyes it's the limit of a single term. I
feel that anyone arguing for increasing the limit only has abuse cases (not use
cases) in mind. I'm worried about making dv more complicated for no good reason.
{quote}
I guess I see DV binary as more like a stored field, just stored
column stride for faster access. Faceting (and I guess spatial)
encode many things inside one DV binary field.
bq. I'm worried about opening up the possibility of bugs and index corruption
(e.g. clearly MULTIPLE people on this issue don't understand why you cannot just
remove IndexWriter's limit without causing corruption).
I agree this is a concern: we need to take it slow and add good
test coverage.
{quote}
I'm really worried about the precedent: once these abuse-case-fans have their
way and increase this limit, they will next argue that we should do the same
for SORTED, maybe SORTED_SET, maybe even inverted terms. They will make
arguments that it's the same as binary, just with sorting, and why should
sorting bring in additional limits. I can easily see this all spinning out of
control.
I think that most people hitting the limit are abusing docvalues as stored
fields, so the limit is providing a really useful thing today actually, and
telling them they are doing something wrong.
{quote}
I don't think we should change the limit for sorted/set or for terms: I
think we should raise the limit ONLY for BINARY, and declare that DV
BINARY is for these "abuse" cases. So if you really want a sorted
set with a higher limit, you will have to encode it yourself
into DV BINARY.
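As a rough illustration of what "encode it yourself" could look like, here is a minimal sketch that packs several values into one byte array using a 4-byte length prefix per value. The class name PackedValues, its methods, and the layout are assumptions made for this example; none of them come from Lucene or from this issue, and the sketch deliberately uses only the JDK.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT part of Lucene: packs several byte[] values
// into one blob that could be stored as a single binary value.
final class PackedValues {

    // Assumed layout: for each value, a 4-byte big-endian length, then the raw bytes.
    static byte[] encode(List<byte[]> values) {
        int total = 0;
        for (byte[] v : values) {
            total += Integer.BYTES + v.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(total);
        for (byte[] v : values) {
            buf.putInt(v.length);
            buf.put(v);
        }
        return buf.array();
    }

    // Reverses encode(): reads length-prefixed values until the blob is exhausted.
    static List<byte[]> decode(byte[] blob) {
        List<byte[]> out = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(blob);
        while (buf.hasRemaining()) {
            byte[] v = new byte[buf.getInt()];
            buf.get(v);
            out.add(v);
        }
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> values = Arrays.asList(
            "blue".getBytes(StandardCharsets.UTF_8),
            "green".getBytes(StandardCharsets.UTF_8),
            "red".getBytes(StandardCharsets.UTF_8));
        byte[] blob = encode(values);
        for (byte[] v : decode(blob)) {
            System.out.println(new String(v, StandardCharsets.UTF_8));
        }
    }
}
```

In this sketch the caller is responsible for sorting and de-duplicating the values before encoding if set semantics are wanted; the blob itself carries no ordering guarantee.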
{quote}
The only argument I have for removing the limit is that by expanding BINARY's
possible abuse cases (in my opinion, that's pretty much all it's useful for), we
might prevent additional complexity from being added elsewhere to DV in the
long-term.
{quote}
+1
> StraightBytesDocValuesField fails if bytes > 32k
> ------------------------------------------------
>
> Key: LUCENE-4583
> URL: https://issues.apache.org/jira/browse/LUCENE-4583
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/index
> Affects Versions: 4.0, 4.1, 5.0
> Reporter: David Smiley
> Priority: Critical
> Fix For: 4.4
>
> Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch
>
>
> I didn't observe any limitations on the size of a bytes based DocValues field
> value in the docs. It appears that the limit is 32k, although I didn't get
> any friendly error telling me that was the limit. 32k is kind of small IMO;
> I suspect this limit is unintended and as such is a bug. The following
> test fails:
> {code:java}
> public void testBigDocValue() throws IOException {
>   Directory dir = newDirectory();
>   IndexWriter writer = new IndexWriter(dir, writerConfig(false));
>   Document doc = new Document();
>   BytesRef bytes = new BytesRef((4 + 4) * 4097); // 4096 works
>   bytes.length = bytes.bytes.length; // byte data doesn't matter
>   doc.add(new StraightBytesDocValuesField("dvField", bytes));
>   writer.addDocument(doc);
>   writer.commit();
>   writer.close();
>   DirectoryReader reader = DirectoryReader.open(dir);
>   DocValues docValues = MultiDocValues.getDocValues(reader, "dvField");
>   // FAILS IF BYTES IS BIG!
>   docValues.getSource().getBytes(0, bytes);
>   reader.close();
>   dir.close();
> }
> {code}