[
https://issues.apache.org/jira/browse/SOLR-7255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999341#comment-14999341
]
Mark Miller commented on SOLR-7255:
-----------------------------------
On trunk, there is really nothing to explain. This broken feature never should
have existed, so there is no reason to explain its absence.
It shouldn't be mentioned in the doc. Anyone can update or comment on that page
if they want.
I see this as a dupe report of that feature not working. If you want to take
ownership and change it to a doc issue, feel free. Given its age, I wasn't
getting any benefit from it anymore.
> Index Corruption on HDFS whenever online bulk indexing (from Hive)
> ------------------------------------------------------------------
>
> Key: SOLR-7255
> URL: https://issues.apache.org/jira/browse/SOLR-7255
> Project: Solr
> Issue Type: Bug
> Affects Versions: 4.10.3
> Environment: HDP 2.2 / HDP Search + LucidWorks hadoop-lws-job.jar
> Reporter: Hari Sekhon
> Priority: Blocker
>
> When running SolrCloud on HDFS and using the LucidWorks hadoop-lws-job.jar to
> index a Hive table (620M rows) to Solr, it runs for about 1500 seconds and then
> hits this exception:
> {code}
> Exception in thread "Lucene Merge Thread #2191" org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.index.CorruptIndexException: codec header mismatch: actual header=1494817490 vs expected header=1071082519 (resource: BufferedChecksumIndexInput(_r3.nvm))
>         at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:549)
>         at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:522)
> Caused by: org.apache.lucene.index.CorruptIndexException: codec header mismatch: actual header=1494817490 vs expected header=1071082519 (resource: BufferedChecksumIndexInput(_r3.nvm))
>         at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:136)
>         at org.apache.lucene.codecs.lucene49.Lucene49NormsProducer.<init>(Lucene49NormsProducer.java:75)
>         at org.apache.lucene.codecs.lucene49.Lucene49NormsFormat.normsProducer(Lucene49NormsFormat.java:112)
>         at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:127)
>         at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:108)
>         at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:145)
>         at org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:282)
>         at org.apache.lucene.index.IndexWriter._mergeInit(IndexWriter.java:3951)
>         at org.apache.lucene.index.IndexWriter.mergeInit(IndexWriter.java:3913)
>         at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3766)
>         at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:409)
>         at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:486)
> {code}
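> One way to double-check the corruption outside the merge thread is Lucene's
> CheckIndex tool. A rough sketch with placeholder paths (an HDFS-backed index
> first has to be copied to local disk):
> {code}
> # Copy the core's index directory off HDFS (both paths are placeholders)
> hadoop fs -copyToLocal /solr/collection1/core_node1/data/index /tmp/index
> # Run Lucene's CheckIndex against the local copy (read-only by default)
> java -cp lucene-core-4.10.3.jar org.apache.lucene.index.CheckIndex /tmp/index
> {code}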
> So I deleted the whole index, re-created it, and re-ran the job to send the
> Hive table contents to Solr again; it hit exactly the same exception on the
> first attempt, after sending a large number of updates to Solr.
> I then moved off HDFS to a normal dataDir backend and successfully re-indexed
> the full table in 2 hours with no index corruption.
> This implies some sort of stability issue in the HDFS DirectoryFactory
> implementation.
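> For reference, the backend switch is just the directoryFactory setting in
> solrconfig.xml; roughly along these lines (a sketch, with a placeholder HDFS
> home path):
> {code}
> <!-- HDFS-backed index (the configuration that corrupts): -->
> <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
>   <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
> </directoryFactory>
>
> <!-- Local-disk index (the configuration that works): -->
> <directoryFactory name="DirectoryFactory"
>                   class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
> {code}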
> Regards,
> Hari Sekhon
> http://www.linkedin.com/in/harisekhon