[ https://issues.apache.org/jira/browse/SOLR-8335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16630042#comment-16630042 ]
Mano Kovacs commented on SOLR-8335: ----------------------------------- If [~mihaly.toth] does not mind, I will take on the remaining work regarding this patch. I rebased the patch and opened a PR. I will remove the hamcrest dependency too. > HdfsLockFactory does not allow core to come up after a node was killed > ---------------------------------------------------------------------- > > Key: SOLR-8335 > URL: https://issues.apache.org/jira/browse/SOLR-8335 > Project: Solr > Issue Type: Bug > Affects Versions: 5.0, 5.1, 5.2, 5.2.1, 5.3, 5.3.1 > Reporter: Varun Thacker > Assignee: Mark Miller > Priority: Major > Attachments: SOLR-8335.patch > > Time Spent: 10m > Remaining Estimate: 0h > > When using HdfsLockFactory if a node gets killed instead of a graceful > shutdown the write.lock file remains in HDFS . The next time you start the > node the core doesn't load up because of LockObtainFailedException . > I was able to reproduce this in all 5.x versions of Solr . The problem wasn't > there when I tested it in 4.10.4 > Steps to reproduce this on 5.x > 1. Create directory in HDFS : {{bin/hdfs dfs -mkdir /solr}} > 2. Start Solr: {{bin/solr start -Dsolr.directoryFactory=HdfsDirectoryFactory > -Dsolr.lock.type=hdfs -Dsolr.data.dir=hdfs://localhost:9000/solr > -Dsolr.updatelog=hdfs://localhost:9000/solr}} > 3. Create core: {{./bin/solr create -c test -n data_driven}} > 4. Kill solr > 5. The lock file is there in HDFS and is called {{write.lock}} > 6. Start Solr again and you get a stack trace like this: > {code} > 2015-11-23 13:28:04.287 ERROR (coreLoadExecutor-6-thread-1) [ x:test] > o.a.s.c.CoreContainer Error creating core [test]: Index locked for write for > core 'test'. Solr now longer supports forceful unlocking via > 'unlockOnStartup'. Please verify locks manually! > org.apache.solr.common.SolrException: Index locked for write for core 'test'. > Solr now longer supports forceful unlocking via 'unlockOnStartup'. Please > verify locks manually! > at org.apache.solr.core.SolrCore.<init>(SolrCore.java:820) > at org.apache.solr.core.SolrCore.<init>(SolrCore.java:659) > at org.apache.solr.core.CoreContainer.create(CoreContainer.java:723) > at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:443) > at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:434) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:210) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.lucene.store.LockObtainFailedException: Index locked > for write for core 'test'. Solr now longer supports forceful unlocking via > 'unlockOnStartup'. Please verify locks manually! > at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:528) > at org.apache.solr.core.SolrCore.<init>(SolrCore.java:761) > ... 9 more > 2015-11-23 13:28:04.289 ERROR (coreContainerWorkExecutor-2-thread-1) [ ] > o.a.s.c.CoreContainer Error waiting for SolrCore to be created > java.util.concurrent.ExecutionException: > org.apache.solr.common.SolrException: Unable to create core [test] > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at org.apache.solr.core.CoreContainer$2.run(CoreContainer.java:472) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:210) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.solr.common.SolrException: Unable to create core [test] > at org.apache.solr.core.CoreContainer.create(CoreContainer.java:737) > at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:443) > at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:434) > ... 5 more > Caused by: org.apache.solr.common.SolrException: Index locked for write for > core 'test'. Solr now longer supports forceful unlocking via > 'unlockOnStartup'. Please verify locks manually! > at org.apache.solr.core.SolrCore.<init>(SolrCore.java:820) > at org.apache.solr.core.SolrCore.<init>(SolrCore.java:659) > at org.apache.solr.core.CoreContainer.create(CoreContainer.java:723) > ... 7 more > Caused by: org.apache.lucene.store.LockObtainFailedException: Index locked > for write for core 'test'. Solr now longer supports forceful unlocking via > 'unlockOnStartup'. Please verify locks manually! > at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:528) > at org.apache.solr.core.SolrCore.<init>(SolrCore.java:761) > ... 9 more > {code} > In 4.10.4 I saw these two differences > 1. The lock file name was different . It's something like : > {{/solr/index/HdfsDirectory@46ad6bd3 > lockFactory=org.apache.solr.store.hdfs.hdfslockfact...@4b44b5f6-write.lock}} > 2. When the node is started again after it was killed , it loaded up the core > just fine but there were two lock files in hdfs now . 4b44b5f6-write.lock is > the latest one > {code} > /solr/index/HdfsDirectory@46ad6bd3 > lockFactory=org.apache.solr.store.hdfs.hdfslockfact...@4b44b5f6-write.lock > /solr/index/HdfsDirectory@52959724 > lockFactory=org.apache.solr.store.hdfs.hdfslockfact...@9d59d3f-write.lock > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org