[jira] [Commented] (LUCENE-4547) DocValues field broken on large indexes

Simon Willnauer (JIRA) Mon, 19 Nov 2012 05:21:03 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500213#comment-13500213
 ]


Simon Willnauer commented on LUCENE-4547:
-----------------------------------------

bq. Its really broken that things like grouping and faceting are coded to a 
separate API. This makes DV unsuccessful. 

don't get me wrong I think this adds a lot of value. BUT, if the only way to 
get a DV instance that is cached is FC then this entire thing is inconsistent. 
We don't ask people to cache TermEnum which can be heavy if you use 
MemoryPostings for instance. We neither do for StoredFields nor do we have a 
caching layer that is not used by default. All I am arguing for is that if 
somebody wants to use DV it should be simple to do so. The distinction between 
in-memory and on disk should not be in FC.

bq.  If we arent going to fix DocValues and Faceting APIs then perhaps we 
shoudl consider removing DocValues completely from lucene.

you mean remove fieldcache?
                
> DocValues field broken on large indexes
> ---------------------------------------
>
>                 Key: LUCENE-4547
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4547
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>            Priority: Blocker
>             Fix For: 4.1
>
>         Attachments: test.patch
>
>
> I tried to write a test to sanity check LUCENE-4536 (first running against 
> svn revision 1406416, before the change).
> But i found docvalues is already broken here for large indexes that have a 
> PackedLongDocValues field:
> {code}
> final int numDocs = 500000000;
> for (int i = 0; i < numDocs; ++i) {
>   if (i == 0) {
>     field.setLongValue(0L); // force > 32bit deltas
>   } else {
>     field.setLongValue(1<<33L); 
>   }
>   w.addDocument(doc);
> }
> w.forceMerge(1);
> w.close();
> dir.close(); // checkindex
> {code}
> {noformat}
> [junit4:junit4]   2> WARNING: Uncaught exception in thread: Thread[Lucene 
> Merge Thread #0,6,TGRP-Test2GBDocValues]
> [junit4:junit4]   2> org.apache.lucene.index.MergePolicy$MergeException: 
> java.lang.ArrayIndexOutOfBoundsException: -65536
> [junit4:junit4]   2>  at 
> __randomizedtesting.SeedInfo.seed([5DC54DB14FA5979]:0)
> [junit4:junit4]   2>  at 
> org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:535)
> [junit4:junit4]   2>  at 
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:508)
> [junit4:junit4]   2> Caused by: java.lang.ArrayIndexOutOfBoundsException: 
> -65536
> [junit4:junit4]   2>  at 
> org.apache.lucene.util.ByteBlockPool.deref(ByteBlockPool.java:305)
> [junit4:junit4]   2>  at 
> org.apache.lucene.codecs.lucene40.values.FixedStraightBytesImpl$FixedBytesWriterBase.set(FixedStraightBytesImpl.java:115)
> [junit4:junit4]   2>  at 
> org.apache.lucene.codecs.lucene40.values.PackedIntValues$PackedIntsWriter.writePackedInts(PackedIntValues.java:109)
> [junit4:junit4]   2>  at 
> org.apache.lucene.codecs.lucene40.values.PackedIntValues$PackedIntsWriter.finish(PackedIntValues.java:80)
> [junit4:junit4]   2>  at 
> org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:130)
> [junit4:junit4]   2>  at 
> org.apache.lucene.codecs.PerDocConsumer.merge(PerDocConsumer.java:65)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4547) DocValues field broken on large indexes

Reply via email to