[ https://issues.apache.org/jira/browse/LUCENE-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285835#comment-14285835 ]
ASF subversion and git services commented on LUCENE-6192:
---------------------------------------------------------

Commit 1653585 from [~mikemccand] in branch 'dev/branches/lucene_solr_5_0'
[ https://svn.apache.org/r1653585 ]

LUCENE-6192: don't overflow int when writing skip data for high freq terms in extremely large indices

> Long overflow in LuceneXXSkipWriter can corrupt skip data
> ---------------------------------------------------------
>
>                 Key: LUCENE-6192
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6192
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 5.0, Trunk, 4.x
>
>         Attachments: LUCENE-6192.patch
>
>
> I've been iterating with Tom on this corruption that CheckIndex detects in
> his rather large index (720 GB in a single segment):
> {noformat}
> java -Xmx16G -Xms16G -cp $JAR -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /XXXX/shards/4/core-1/data/test_index -verbose 2>&1 |tee -a shard4_reoptimizedNewJava
> Opening index @ /htsolr/lss-reindex/shards/4/core-1/data/test_index
> Segments file=segments_e numSegments=1 version=4.10.2 format= userData={commitTimeMSec=1421479358825}
>   1 of 1: name=_8m8 docCount=1130856
>     version=4.10.2
>     codec=Lucene410
>     compound=false
>     numFiles=10
>     size (MB)=719,967.32
>     diagnostics = {timestamp=1421437320935, os=Linux, os.version=2.6.18-400.1.1.el5, mergeFactor=2, source=merge, lucene.version=4.10.2, os.arch=amd64, mergeMaxNumSegments=1, java.version=1.7.0_71, java.vendor=Oracle Corporation}
>     no deletions
>     test: open reader.........OK
>     test: check integrity.....OK
>     test: check live docs.....OK
>     test: fields..............OK [80 fields]
>     test: field norms.........OK [23 fields]
>     test: terms, freq, prox...ERROR: java.lang.AssertionError: -96
> java.lang.AssertionError: -96
>         at org.apache.lucene.codecs.lucene41.ForUtil.skipBlock(ForUtil.java:228)
>         at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.skipPositions(Lucene41PostingsReader.java:925)
>         at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.nextPosition(Lucene41PostingsReader.java:955)
>         at org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:1100)
>         at org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1357)
>         at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:655)
>         at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2096)
>     test: stored fields.......OK [67472796 total field count; avg 59.665 fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>     test: docvalues...........OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SORTED; 0 SORTED_NUMERIC; 0 SORTED_SET]
> FAILED
>     WARNING: fixIndex() would remove reference to this segment; full exception:
> java.lang.RuntimeException: Term Index test failed
>         at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:670)
>         at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2096)
> WARNING: 1 broken segments (containing 1130856 documents) detected
> WARNING: would write new segments file, and 1130856 documents would be lost, if -fix were specified
> {noformat}
>
> And Rob spotted long -> int casts in our skip list writers that look like they could cause such corruption if a single high-freq term with many positions required > 2.1 GB to write its positions into .pos.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
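The failure mode behind the long -> int casts can be sketched as follows. This is a minimal illustration with hypothetical names (SkipDeltaOverflow, brokenDelta, fixedDelta are invented here, not Lucene's actual skip-writer code): when a single term's position data in the .pos file grows past Integer.MAX_VALUE bytes (~2.1 GB), narrowing a long file-pointer delta to int wraps it to a negative number, producing exactly the kind of corrupt skip entry CheckIndex trips over above.

```java
// Hypothetical sketch of the long -> int narrowing bug; not Lucene code.
public class SkipDeltaOverflow {

    // Buggy variant: the delta between two .pos file pointers is
    // narrowed to int, so any delta over Integer.MAX_VALUE bytes
    // (~2.1 GB) wraps and goes negative.
    static int brokenDelta(long curPosFP, long lastPosFP) {
        return (int) (curPosFP - lastPosFP);
    }

    // Fixed variant: keep the delta as a long end to end.
    static long fixedDelta(long curPosFP, long lastPosFP) {
        return curPosFP - lastPosFP;
    }

    public static void main(String[] args) {
        long lastPosFP = 0L;
        long curPosFP = 3L * 1024 * 1024 * 1024; // one term wrote ~3 GB of positions

        System.out.println(brokenDelta(curPosFP, lastPosFP)); // -1073741824: corrupt skip entry
        System.out.println(fixedDelta(curPosFP, lastPosFP));  // 3221225472
    }
}
```

The fix in the commit above follows the second pattern: stop casting and carry the value as a long through the skip-data write path.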