Without evidence of further damage, that is fine - it just means a thread was interrupted.
- Mark On Jan 30, 2013, at 10:52 AM, Erick Erickson <erickerick...@gmail.com> wrote: > FWIW, by the way, I'm getting some exceptions in the solr log, here are the > two patterns I see on a quick look (this is with my hacks, but the test of > your patch also produced some like this, I think they were the same). Your > comment about trying this with the old-style XML is getting more > compelling.... > > But do note that the stress tests still ran to completion OK, which means > that all documents sent to the server made it into the indexes.... > > Jan 30, 2013 10:20:27 AM org.apache.solr.common.SolrException log > SEVERE: java.lang.InterruptedException > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1279) > at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218) > at java.util.concurrent.FutureTask.get(FutureTask.java:83) > at > org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:597) > at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:680) > > > FWIW > > > On Wed, Jan 30, 2013 at 8:46 AM, Erick Erickson <erickerick...@gmail.com> > wrote: > No joy. It ran to completion on one of my machines for an hour, but not the > other, stack traces below. > > About running without the patch. The other thing that's different is I've > changed where the core.close happens in the patch as opposed to the old code, > which may confuse things. OTOH, it'd be interesting to try since my changes > to where I called close were based on a faulty assumption about where the > lock was occurring. I did a quick hack that I have yet to test making the > stress tester bang up old-style solr.xml setups, but haven't run it yet. I > can give it a try against an unmodified trunk if you think that would > generate useful information but I sadly fear it's an apples/oranges > comparison. > > I'm on IM or we can voice chat if you want to strategize, but I won't be able > to devote much time to this until tonight or tomorrow. I can apply patches > and run tests all day though.... > > Although before digging to deeply, I had to cobble the patch into > DefaultSolrCoreState and I might have screwed it up. Is the finally block > containing if (yielded) in the right place? > > Here's my patched code, but maybe you could just send me the whole file? I > haven't changed it outside this patch.... > > synchronized (writerPauseLock) { > // we need to wait for the Writer to fall out of use > // first lets stop it from being lent out > pauseWriter = true; > // then lets wait until its out of use > log.info("Waiting until IndexWriter is unused... core=" + coreName); > boolean yielded = false; > try { > while (!writerFree) { > // yield the commit lock > core.getUpdateHandler().yieldCommitLock(); > yielded = true; > try { > writerPauseLock.wait(100); > } catch (InterruptedException e) {} > if (closed) { > throw new RuntimeException("SolrCoreState already closed"); > } > } > } finally { > if (yielded) { > core.getUpdateHandler().getCommitLock(); > } > } > try { > if (indexWriter != null) { > if (!rollback) { > try { > > > > Stack trace for the deadlock bits, full file attached: > > Found one Java-level deadlock: > ============================= > "commitScheduler-36850-thread-1": > waiting to lock monitor 7fc2c625f7f8 (object 7429863a0, a > org.apache.solr.update.DefaultSolrCoreState), > which is held by "qtp132616134-30" > "qtp132616134-30": > waiting for ownable synchronizer 740bc1108, (a > java.util.concurrent.locks.ReentrantLock$NonfairSync), > which is held by "commitScheduler-36850-thread-1" > > Java stack information for the threads listed above: > =================================================== > "commitScheduler-36850-thread-1": > at > org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:78) > - waiting to lock <7429863a0> (a > org.apache.solr.update.DefaultSolrCoreState) > at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1359) > at > org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:561) > - locked <74066b838> (a java.lang.Object) > at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:680) > "qtp132616134-30": > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <740bc1108> (a > java.util.concurrent.locks.ReentrantLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178) > at > java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186) > at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262) > at > org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:668) > at > org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:64) > - locked <7429863a0> (a org.apache.solr.update.DefaultSolrCoreState) > at > org.apache.solr.update.DefaultSolrCoreState.close(DefaultSolrCoreState.java:280) > - locked <7429863a0> (a org.apache.solr.update.DefaultSolrCoreState) > at org.apache.solr.core.SolrCore.decrefSolrCoreState(SolrCore.java:888) > - locked <7429863a0> (a org.apache.solr.update.DefaultSolrCoreState) > at org.apache.solr.core.SolrCore.close(SolrCore.java:980) > at > org.apache.solr.core.CoreMaps.putTransientCore(CoreContainer.java:1466) > at > org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:730) > at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1138) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:190) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) > at > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) > at > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) > at org.eclipse.jetty.server.Server.handle(Server.java:365) > at > org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485) > at > org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) > at > org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937) > at > org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998) > at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:948) > at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240) > at > org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) > at > org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) > at > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) > at java.lang.Thread.run(Thread.java:680) > > Found 1 deadlock. > > > > On Wed, Jan 30, 2013 at 7:11 AM, Erick Erickson <erickerick...@gmail.com> > wrote: > bq: I don't follow this at all - of course you could rapidly load and unload > cores at the same time before this patch? > > Not quite what I was trying to say. The stress test opens and closes cores a > LOT. Of course you could open and close cores simultaneously before. In fact > given what I think is a new pattern I'm amazed that there aren't a lot more > problems, that's some damn good code. What's new is the stress test opens and > closes a core with every call. From 30 threads, 15 indexing and 15 querying. > > bq: If you cannot easily produce a test that causes deadlock without your > patch > > I don't know how you'd really write a predictable junit test, the time window > for the race here is small. I had to run my stress test for 20-30 minutes to > hit it. I think I can modify the stress test to use old-style solr.xml which > I could run against current trunk, is it worth it though after your second > look? Note that this open/closing a core every call from the stress test > wasn't really possible the same way before SOLR-1028, one of the keys is that > the transient cores have to be aged out. The bolds below are new code > (SOLR-1028+): > > at org.apache.solr.core.CoreMaps$1.removeEldestEntry(CoreContainer.java:1384) > ...... > at org.apache.solr.core.CoreMaps.putTransientCore(CoreContainer.java:1444) > > > If you still think it's worthwhile, I have some travel time tomorrow that I > could use to make this test run with old-style solr.xml. Let me know. I'm > also wondering if the stress test should go into our test suite somewhere. > It's possible to bring the stress test into junit I think, there's nothing > magic about it. But it might be better as an external test that we run on, > say, a nightly or weekly basis, is there precedent? > > bq: Because they are different locks protecting different state. > > So you're saying that synchronizing the method actually is protecting the > SolrCore that's passed as a parameter? Otherwise I don't get it, seems like > moving the synchronized block to the first line of, e.g., newIndexWriter and > removing synchronized from the method signature would be sufficient. That > said, my hack of removing the synchronized from the method signature was more > to poke that what I thought I saw than a well thought-out solution... Which > is why I'm glad you're looking too... > > Yes, the code attached to the JIRA is the latest. You've got enough on your > plate I'm sure, I'll apply your patch and let you know. I had a little > trouble with SVN applying it cleanly, but I think I reconciled it > correctly... > > > On Tue, Jan 29, 2013 at 11:43 PM, Mark Miller <markrmil...@gmail.com> wrote: > Do you have your latest work attached to the issue? If so, I'll start working > with it locally. > > For now, can you try this experimental, test patch and see what the results > are? > > Index: solr/core/src/java/org/apache/solr/update/DefaultSolrCoreState.java > =================================================================== > --- solr/core/src/java/org/apache/solr/update/DefaultSolrCoreState.java > (revision 1440275) > +++ solr/core/src/java/org/apache/solr/update/DefaultSolrCoreState.java > (working copy) > @@ -135,13 +135,24 @@ > pauseWriter = true; > // then lets wait until its out of use > log.info("Waiting until IndexWriter is unused... core=" + coreName); > + > + boolean yielded = false; > + try { > while (!writerFree) { > - try { > - writerPauseLock.wait(100); > - } catch (InterruptedException e) {} > - > - if (closed) { > - throw new RuntimeException("SolrCoreState already closed"); > + // yield the commit lock > + core.getUpdateHandler().yieldCommitLock(); > + yielded = true; > + try { > + writerPauseLock.wait(100); > + } catch (InterruptedException e) {} > + > + if (closed) { > + throw new RuntimeException("SolrCoreState already closed"); > + } > + } > + } finally { > + if (yielded) { > + core.getUpdateHandler().getCommitLock(); > } > } > > Index: solr/core/src/java/org/apache/solr/update/UpdateHandler.java > =================================================================== > --- solr/core/src/java/org/apache/solr/update/UpdateHandler.java > (revision 1440275) > +++ solr/core/src/java/org/apache/solr/update/UpdateHandler.java > (working copy) > @@ -189,4 +189,10 @@ > } > > public abstract void split(SplitIndexCommand cmd) throws IOException; > + > + > + public void getCommitLock() {} > + > + > + public void yieldCommitLock() {} > } > Index: solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java > =================================================================== > --- solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java > (revision 1440275) > +++ solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java > (working copy) > @@ -830,4 +830,13 @@ > public CommitTracker getSoftCommitTracker() { > return softCommitTracker; > } > + > + public void getCommitLock() { > + commitLock.lock(); > + } > + > + > + public void yieldCommitLock() { > + commitLock.unlock(); > + } > } > > > On Jan 29, 2013, at 11:24 PM, Mark Miller <markrmil...@gmail.com> wrote: > > > Digging into the stack traces... > > > > This shows a thread waiting for the commit lock trying to close an index > > writer. > > > > There is another thread with the commit lock that is waiting for the writer > > to be returned. > > > > That seems to be the situation - a race around the commit lock. > > > > Needs some thought. > > > > - Mark > > > > On Jan 29, 2013, at 8:31 AM, Erick Erickson <erickerick...@gmail.com> wrote: > > > >> Java stack information for the threads listed above: > >> =================================================== > >> "commitScheduler-42617-thread-1": > >> at > >> org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:78) > >> - waiting to lock <78b4aa518> (a > >> org.apache.solr.update.DefaultSolrCoreState) > >> at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1359) > >> at > >> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:561) > >> - locked <7884ca730> (a java.lang.Object) > >> at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216) > >> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) > >> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > >> at java.util.concurrent.FutureTask.run(FutureTask.java:138) > >> at > >> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98) > >> at > >> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206) > >> at > >> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > >> at > >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > >> at java.lang.Thread.run(Thread.java:680) > >> > >> ********* > >> Other thread > >> "qtp1401888126-32": > >> at sun.misc.Unsafe.park(Native Method) > >> - parking to wait for <788d73208> (a > >> java.util.concurrent.locks.ReentrantLock$NonfairSync) > >> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156) > >> at > >> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811) > >> at > >> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842) > >> at > >> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178) > >> at > >> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186) > >> at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262) > >> at > >> org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:668) > >> at > >> org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:64) > >> - locked <78b4aa518> (a org.apache.solr.update.DefaultSolrCoreState) > >> at > >> org.apache.solr.update.DefaultSolrCoreState.close(DefaultSolrCoreState.java:272) > >> - locked <78b4aa518> (a org.apache.solr.update.DefaultSolrCoreState) > >> at org.apache.solr.core.SolrCore.decrefSolrCoreState(SolrCore.java:888) > >> - locked <78b4aa518> (a org.apache.solr.update.DefaultSolrCoreState) > >> at org.apache.solr.core.SolrCore.close(SolrCore.java:980) > >> at org.apache.solr.core.CoreMaps.putTransientCore(CoreContainer.java:1465) > >> at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:730) > >> at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1137) > >> at > >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:190) > >> at > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > > > > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org