[ https://issues.apache.org/jira/browse/LUCENE-6576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir resolved LUCENE-6576.
---------------------------------
    Resolution: Invalid

Bad RAM chip after all.

> possible index corruption with java 8u45
> ----------------------------------------
>
>                 Key: LUCENE-6576
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6576
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>
> Recently, I've experienced sporadic corruptions when trying to index
> wikipedia in the benchmark. I know [~mikemccand] hit similar problems in the
> nightly benchmark, and he also has an older cpu (see below for more on this).
>
> I am using this python script (courtesy of Mike) to index wikipedia in a
> loop, tweaked for lots of threads and heavy merging so it fails faster:
> http://pastebin.com/jwpdELDe
>
> I get corruptions constantly, though sometimes it takes a few iterations.
>
> The errors look like this, where the bytes we write "seem to be fine" but the
> CRC32 itself is maybe computed incorrectly at *write time*:
> {quote}
> Exception in thread "Thread-0" java.lang.RuntimeException: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=e2b2d8f5 actual=a04da0c (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/data/corrumption_playground/index/_1p_Lucene50_0.tim")))
>     at perf.IndexThreads$IndexThread.run(IndexThreads.java:402)
> Caused by: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=e2b2d8f5 actual=a04da0c (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/data/corrumption_playground/index/_1p_Lucene50_0.tim")))
> {quote}
>
> This happens with different file extensions (.tip, .tim, .pos, .doc, .dvd,
> ...). Whenever one of these corrupted files was included in a commit point,
> I've run "the rest of CheckIndex" minus the CRC32 check and it always passes,
> but that is no guarantee that's what is happening.
>
> I think maybe the bugs are, for some reason, easier to reproduce on my CPU,
> maybe because it's older and only has AVX1, or for some other reason:
> {quote}
> model : 42
> model name : Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
> {quote}
>
> Other notes:
> * Does not need multiple threads; I used them only to make the "test" fail
>   faster. It will sometimes fail with maxBufferedDocs + SerialMergeScheduler
>   + 1 thread, which is deterministic.
> * Have not tested JDK9 in any way; this might be an already-fixed bug.
> * I've run numerous hardware tests: memory, disk, etc.
> * I've run the tests with two different SSD drives: both fail.
>
> First step: clean up this script and make it so it can be reproduced on other
> hardware. I can try on my laptop as well.
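
For readers unfamiliar with the "expected=... actual=..." failure quoted in the issue, the following is a minimal sketch of that kind of check. It is not Lucene's code and not Lucene's on-disk footer format: it assumes a simplified, hypothetical layout (payload followed by a 4-byte big-endian CRC32), and the function names append_crc32, verify_crc32 and crc32_is_stable are invented for illustration. The last helper hashes the same bytes repeatedly, a crude way to spot the sort of non-deterministic result that faulty hardware can produce.

```python
import struct
import zlib
from typing import Tuple


def append_crc32(payload: bytes) -> bytes:
    """Return payload plus a 4-byte big-endian CRC32 trailer (hypothetical layout)."""
    return payload + struct.pack(">I", zlib.crc32(payload) & 0xFFFFFFFF)


def verify_crc32(blob: bytes) -> Tuple[bool, int, int]:
    """Recompute the CRC32 of the payload and compare it with the stored trailer.

    Returns (matches, expected, actual), where expected is the stored value.
    """
    if len(blob) < 4:
        raise ValueError("blob too short to contain a CRC32 trailer")
    payload, trailer = blob[:-4], blob[-4:]
    (expected,) = struct.unpack(">I", trailer)
    actual = zlib.crc32(payload) & 0xFFFFFFFF
    return expected == actual, expected, actual


def crc32_is_stable(payload: bytes, rounds: int = 100) -> bool:
    """Hash the same bytes repeatedly; on healthy hardware every round agrees."""
    first = zlib.crc32(payload)
    return all(zlib.crc32(payload) == first for _ in range(rounds))


if __name__ == "__main__":
    blob = append_crc32(b"some index bytes")
    # Flip one bit in the payload to simulate corruption.
    damaged = bytes([blob[0] ^ 0x01]) + blob[1:]
    ok, expected, actual = verify_crc32(damaged)
    print(f"ok={ok} expected={expected:x} actual={actual:x}")
```

A mismatch alone does not say whether the payload or the stored checksum is wrong, which is why the issue also ran the remaining CheckIndex checks on the affected files.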