[ https://issues.apache.org/jira/browse/LUCENE-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599405#action_12599405 ]
Andreas Kohn commented on LUCENE-1282:
--------------------------------------

We've seen this bug (rarely) when indexing very large amounts of data.

Just to add some data points, attached is [^crashtest], which uses the above Crash.java to test all the Java VMs I currently have available. [^crashtest.log] contains the output. Tests were run on a loaded EM64T dual-core machine with Fedora 9/x86_64; all VMs are 64-bit. The OpenJDK is a build from yesterday's public repository contents, built using gcc 4.3 (trivial patches to make it build were added).

Some scary Solaris (SunOS 5.10 Generic_120011-14 sun4u sparc SUNW,UltraAX-i2) results as well:
{quote}
/usr/jdk/jdk1.6.0_04 (java full version "1.6.0_04-b12"):
: 0/200 failed: PASS
-server: 0/200 failed: PASS
-client: 0/200 failed: PASS
-Xbatch: 0/200 failed: PASS
-Xint: 0/200 failed: PASS

/usr/jdk/jdk1.6.0_04 (java full version "1.6.0_04-b12"):
-d64: 0/200 failed: PASS
-server -d64: 0/200 failed: PASS
-client -d64: 0/200 failed: PASS
-Xbatch -d64: 0/200 failed: PASS
-Xint -d64: 0/200 failed: PASS
{quote}

> Sun hotspot compiler bug in 1.6.0_04/05 affects Lucene
> ------------------------------------------------------
>
> Key: LUCENE-1282
> URL: https://issues.apache.org/jira/browse/LUCENE-1282
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Affects Versions: 2.3, 2.3.1
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 2.4
>
> Attachments: corrupt_merge_out15.txt, hs_err_pid27359.log
>
>
> This is not a Lucene bug. It's an as-yet not fully characterized Sun
> JRE bug, as best I can tell. I'm opening this to gather everything we
> know, to work around it in Lucene if possible, and maybe to open an
> issue with Sun if we can reduce it to a compact test case.
>
> It's hit at least 3 users:
>
> http://mail-archives.apache.org/mod_mbox/lucene-java-user/200803.mbox/[EMAIL PROTECTED]
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200804.mbox/[EMAIL PROTECTED]
> http://mail-archives.apache.org/mod_mbox/lucene-java-user/200805.mbox/[EMAIL PROTECTED]
>
> The bug is known to affect Lucene on at least JRE 1.6.0_04 and
> 1.6.0_05. 1.6.0_03 works OK, and it's unknown whether 1.6.0_06
> shows it.
>
> The bug affects bulk merging of stored fields. When it strikes, the
> segment produced by a merge is corrupt because its fdx file (stored
> fields index file) is missing one document. After iterating many
> times with the first user that hit this, adding diagnostics &
> assertions, it seems that a call to fieldsWriter.addDocument somehow
> either fails to run entirely or fails to invoke its call to
> indexStream.writeLong. It's as if, when hotspot compiles a method,
> there's some sort of race condition in cutting over to the compiled
> code whereby a single method call fails to be invoked (speculation).
>
> Unfortunately, this corruption is silent when it occurs and is only
> detected later, when a merge tries to merge the bad segment or an
> IndexReader tries to open it.
>
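The arithmetic behind that mismatch can be sketched in Java. This is only a minimal sketch, not Lucene's actual check: it assumes each document adds exactly one 8-byte entry to the fdx file (the single indexStream.writeLong call mentioned above) and that the file has no header unless headerBytes says otherwise. The class FdxLengthCheck, its methods, and the headerBytes parameter are hypothetical names introduced for illustration.

```java
import java.io.File;

public class FdxLengthCheck {
    // Assumption: one long (8 bytes) is written to the fdx file per document.
    static final long BYTES_PER_DOC = 8L;

    /** Number of documents implied by an fdx file of the given length. */
    static long docsInFdx(long fdxLength, long headerBytes) {
        long payload = fdxLength - headerBytes;
        if (payload < 0 || payload % BYTES_PER_DOC != 0) {
            throw new IllegalArgumentException(
                "fdx length " + fdxLength + " is not header + a multiple of " + BYTES_PER_DOC);
        }
        return payload / BYTES_PER_DOC;
    }

    /** True if the fdx length matches the document count the segment claims. */
    static boolean isConsistent(long fdxLength, long headerBytes, long expectedDocCount) {
        return fdxLength == headerBytes + expectedDocCount * BYTES_PER_DOC;
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.out.println("usage: java FdxLengthCheck <path-to-fdx> <expectedDocCount>");
            return;
        }
        long length = new File(args[0]).length();
        long expected = Long.parseLong(args[1]);
        // headerBytes = 0 is an assumption; adjust it if the index format has a header.
        if (isConsistent(length, 0L, expected)) {
            System.out.println("OK: fdx length matches " + expected + " documents");
        } else {
            System.out.println("MISMATCH: fdx length " + length
                + " does not match " + expected + " documents");
        }
    }
}
```

Under these assumptions, a segment that claims 16000 documents should have an fdx file of 128000 bytes, and a file of 127992 bytes corresponds to 15999 entries, that is, one skipped write.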
> Here's a typical merge exception:
>
> {code}
> Exception in thread "Thread-10" org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _3gh: fieldsReader shows 15999 but segmentInfo shows 16000
>         at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:271)
> Caused by: org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _3gh: fieldsReader shows 15999 but segmentInfo shows 16000
>         at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:313)
>         at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:262)
>         at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:221)
>         at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3099)
>         at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2834)
>         at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:240)
> {code}
>
> and here's a typical exception hit when opening a searcher:
>
> {code}
> org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _kk: fieldsReader shows 72670 but segmentInfo shows 72671
>         at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:313)
>         at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:262)
>         at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:230)
>         at org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:73)
>         at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:636)
>         at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:63)
>         at org.apache.lucene.index.IndexReader.open(IndexReader.java:209)
>         at org.apache.lucene.index.IndexReader.open(IndexReader.java:173)
>         at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:48)
> {code}
>
> Sometimes, adding -Xbatch (disables background compilation, so
> methods are compiled in the foreground) or -Xint (disables
> compilation) to the java command line works around the issue.
>
> Here are some of the OSes we've seen the failure on:
>
> {code}
> SuSE 10.0
> Linux phoebe 2.6.13-15-smp #1 SMP Tue Sep 13 14:56:15 UTC 2005 x86_64 x86_64 x86_64 GNU/Linux
>
> SuSE 8.2
> Linux phobos 2.4.20-64GB-SMP #1 SMP Mon Mar 17 17:56:03 UTC 2003 i686 unknown unknown GNU/Linux
>
> Red Hat Enterprise Linux Server release 5.1 (Tikanga)
> Linux lab8.betech.virginia.edu 2.6.18-53.1.14.el5 #1 SMP Tue Feb 19 07:18:21 EST 2008 i686 i686 i386 GNU/Linux
> {code}
>
> I've already added assertions to Lucene to detect when this bug
> strikes, but since assertions are not usually enabled, I plan to add a
> real check that catches it *before* we commit the merge to the
> index. This way we can detect & quarantine the failure and prevent
> corruption from entering the index.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
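
Since the issue names specific JRE builds, an application could warn at startup when it is running on one of them. The following is a minimal sketch under stated assumptions: it treats only 1.6.0_04 and 1.6.0_05 as affected (the issue says 1.6.0_06 is unknown), and the class AffectedJreCheck and its method are hypothetical names, not part of Lucene.

```java
public class AffectedJreCheck {
    /**
     * True if the version string names one of the two builds the issue
     * reports as affected. A build suffix such as "-b12" is ignored.
     */
    static boolean isAffectedVersion(String javaVersion) {
        if (javaVersion == null) {
            return false;
        }
        int dash = javaVersion.indexOf('-');
        String base = dash >= 0 ? javaVersion.substring(0, dash) : javaVersion;
        return base.equals("1.6.0_04") || base.equals("1.6.0_05");
    }

    public static void main(String[] args) {
        // "java.version" is a standard system property set by the JVM.
        String version = System.getProperty("java.version");
        if (isAffectedVersion(version)) {
            System.err.println("Warning: JRE " + version
                + " is reported in LUCENE-1282 to corrupt merged segments;"
                + " consider another JRE, or try -Xbatch or -Xint.");
        } else {
            System.out.println("JRE " + version + " is not on the known-affected list.");
        }
    }
}
```

A version that is not on the list is not known to be safe; the check only reflects the two builds reported so far.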