Without evidence of further damage, that is fine - it just means a thread was 
interrupted.

- Mark

On Jan 30, 2013, at 10:52 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> FWIW, by the way, I'm getting some exceptions in the solr log, here are the 
> two patterns I see on a quick look (this is with my hacks, but the test of 
> your patch also produced some like this, I think they were the same). Your 
> comment about trying this with the old-style XML is getting more 
> compelling....
> 
> But do note that the stress tests still ran to completion OK, which means 
> that all documents sent to the server made it into the indexes....
> 
> Jan 30, 2013 10:20:27 AM org.apache.solr.common.SolrException log
> SEVERE: java.lang.InterruptedException
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1279)
>       at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
>       at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>       at 
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:597)
>       at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>       at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>       at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
>       at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>       at java.lang.Thread.run(Thread.java:680)
> 
> 
> FWIW
> 
> 
> On Wed, Jan 30, 2013 at 8:46 AM, Erick Erickson <erickerick...@gmail.com> 
> wrote:
> No joy. It ran to completion on one of my machines for an hour, but not the 
> other, stack traces below.
> 
> About running without the patch. The other thing that's different is I've 
> changed where the core.close happens in the patch as opposed to the old code, 
> which may confuse things. OTOH, it'd be interesting to try since my changes 
> to where I called close were based on a faulty assumption about where the 
> lock was occurring. I did a quick hack that I have yet to test making the 
> stress tester bang up old-style solr.xml setups, but haven't run it yet. I 
> can give it a try against an unmodified trunk if you think that would 
> generate useful information but I sadly fear it's an apples/oranges 
> comparison.
> 
> I'm on IM or we can voice chat if you want to strategize, but I won't be able 
> to devote much time to this until tonight or tomorrow. I can apply patches 
> and run tests all day though....
> 
> Although before digging to deeply, I had to cobble the patch into 
> DefaultSolrCoreState and I might have screwed it up. Is the finally block  
> containing if (yielded) in the right place?
> 
> Here's my patched code, but maybe you could just send me the whole file? I 
> haven't changed it outside this patch....
> 
>     synchronized (writerPauseLock) {
>       // we need to wait for the Writer to fall out of use
>       // first lets stop it from being lent out
>       pauseWriter = true;
>       // then lets wait until its out of use
>       log.info("Waiting until IndexWriter is unused... core=" + coreName);
>       boolean yielded = false;
>       try {
>         while (!writerFree) {
>           // yield the commit lock
>           core.getUpdateHandler().yieldCommitLock();
>           yielded = true;
>           try {
>             writerPauseLock.wait(100);
>           } catch (InterruptedException e) {}
>           if (closed) {
>             throw new RuntimeException("SolrCoreState already closed");
>           }
>         }
>       } finally {
>         if (yielded) {
>         core.getUpdateHandler().getCommitLock();
>         }
>       }
>       try {
>         if (indexWriter != null) {
>           if (!rollback) {
>             try {
>  
> 
> 
> Stack trace for the deadlock bits, full file attached:
> 
> Found one Java-level deadlock:
> =============================
> "commitScheduler-36850-thread-1":
>   waiting to lock monitor 7fc2c625f7f8 (object 7429863a0, a 
> org.apache.solr.update.DefaultSolrCoreState),
>   which is held by "qtp132616134-30"
> "qtp132616134-30":
>   waiting for ownable synchronizer 740bc1108, (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync),
>   which is held by "commitScheduler-36850-thread-1"
> 
> Java stack information for the threads listed above:
> ===================================================
> "commitScheduler-36850-thread-1":
>       at 
> org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:78)
>       - waiting to lock <7429863a0> (a 
> org.apache.solr.update.DefaultSolrCoreState)
>       at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1359)
>       at 
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:561)
>       - locked <74066b838> (a java.lang.Object)
>       at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>       at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>       at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
>       at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>       at java.lang.Thread.run(Thread.java:680)
> "qtp132616134-30":
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for  <740bc1108> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
>       at 
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
>       at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
>       at 
> org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:668)
>       at 
> org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:64)
>       - locked <7429863a0> (a org.apache.solr.update.DefaultSolrCoreState)
>       at 
> org.apache.solr.update.DefaultSolrCoreState.close(DefaultSolrCoreState.java:280)
>       - locked <7429863a0> (a org.apache.solr.update.DefaultSolrCoreState)
>       at org.apache.solr.core.SolrCore.decrefSolrCoreState(SolrCore.java:888)
>       - locked <7429863a0> (a org.apache.solr.update.DefaultSolrCoreState)
>       at org.apache.solr.core.SolrCore.close(SolrCore.java:980)
>       at 
> org.apache.solr.core.CoreMaps.putTransientCore(CoreContainer.java:1466)
>       at 
> org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:730)
>       at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1138)
>       at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:190)
>       at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
>       at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
>       at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>       at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
>       at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>       at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
>       at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
>       at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>       at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
>       at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>       at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>       at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
>       at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>       at org.eclipse.jetty.server.Server.handle(Server.java:365)
>       at 
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
>       at 
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
>       at 
> org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
>       at 
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
>       at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:948)
>       at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
>       at 
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
>       at 
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
>       at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>       at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>       at java.lang.Thread.run(Thread.java:680)
> 
> Found 1 deadlock.
> 
> 
> 
> On Wed, Jan 30, 2013 at 7:11 AM, Erick Erickson <erickerick...@gmail.com> 
> wrote:
> bq: I don't follow this at all - of course you could rapidly load and unload 
> cores at the same time before this patch?
> 
> Not quite what I was trying to say. The stress test opens and closes cores a 
> LOT. Of course you could open and close cores simultaneously before. In fact 
> given what I think is a new pattern I'm amazed that there aren't a lot more 
> problems, that's some damn good code. What's new is the stress test opens and 
> closes a core with every call. From 30 threads, 15 indexing and 15 querying.
> 
> bq: If you cannot easily produce a test that causes deadlock without your 
> patch
> 
> I don't know how you'd really write a predictable junit test, the time window 
> for the race here is small. I had to run my stress test for 20-30 minutes to 
> hit it. I think I can modify the stress test to use old-style solr.xml which 
> I could run against current trunk, is it worth it though after your second 
> look? Note that this open/closing a core every call from the stress test 
> wasn't really possible the same way before SOLR-1028, one of the keys is that 
> the transient cores have to be aged out. The bolds below are new code 
> (SOLR-1028+):
> 
> at org.apache.solr.core.CoreMaps$1.removeEldestEntry(CoreContainer.java:1384)
> ......
> at org.apache.solr.core.CoreMaps.putTransientCore(CoreContainer.java:1444)
> 
> 
> If you still think it's worthwhile, I have some travel time tomorrow that I 
> could use to make this test run with old-style solr.xml. Let me know. I'm 
> also wondering if the stress test should go into our test suite somewhere. 
> It's possible to bring the stress test into junit I think, there's nothing 
> magic about it. But it might be better as an external test that we run on, 
> say, a nightly or weekly basis, is there precedent?
> 
> bq: Because they are different locks protecting different state.
> 
> So you're saying that synchronizing the method actually is protecting the 
> SolrCore that's passed as a parameter? Otherwise I don't get it, seems like 
> moving the synchronized block to the first line of, e.g.,  newIndexWriter and 
> removing synchronized from the method signature would be sufficient. That 
> said, my hack of removing the synchronized from the method signature was more 
> to poke that what I thought I saw than a well thought-out solution... Which 
> is why I'm glad you're looking too...
> 
> Yes, the code attached to the JIRA is the latest. You've got enough on your 
> plate I'm sure, I'll apply your patch and let you know. I had a little 
> trouble with SVN applying it cleanly, but  I think I reconciled it 
> correctly...
> 
> 
> On Tue, Jan 29, 2013 at 11:43 PM, Mark Miller <markrmil...@gmail.com> wrote:
> Do you have your latest work attached to the issue? If so, I'll start working 
> with it locally.
> 
> For now, can you try this experimental, test patch and see what the results 
> are?
> 
> Index: solr/core/src/java/org/apache/solr/update/DefaultSolrCoreState.java
> ===================================================================
> --- solr/core/src/java/org/apache/solr/update/DefaultSolrCoreState.java 
> (revision 1440275)
> +++ solr/core/src/java/org/apache/solr/update/DefaultSolrCoreState.java 
> (working copy)
> @@ -135,13 +135,24 @@
>        pauseWriter = true;
>        // then lets wait until its out of use
>        log.info("Waiting until IndexWriter is unused... core=" + coreName);
> +
> +      boolean yielded = false;
> +      try {
>        while (!writerFree) {
> -        try {
> -          writerPauseLock.wait(100);
> -        } catch (InterruptedException e) {}
> -
> -        if (closed) {
> -          throw new RuntimeException("SolrCoreState already closed");
> +          // yield the commit lock
> +          core.getUpdateHandler().yieldCommitLock();
> +          yielded = true;
> +          try {
> +            writerPauseLock.wait(100);
> +          } catch (InterruptedException e) {}
> +
> +          if (closed) {
> +            throw new RuntimeException("SolrCoreState already closed");
> +          }
> +        }
> +      } finally {
> +        if (yielded) {
> +          core.getUpdateHandler().getCommitLock();
>          }
>        }
> 
> Index: solr/core/src/java/org/apache/solr/update/UpdateHandler.java
> ===================================================================
> --- solr/core/src/java/org/apache/solr/update/UpdateHandler.java        
> (revision 1440275)
> +++ solr/core/src/java/org/apache/solr/update/UpdateHandler.java        
> (working copy)
> @@ -189,4 +189,10 @@
>    }
> 
>    public abstract void split(SplitIndexCommand cmd) throws IOException;
> +
> +
> +  public void getCommitLock() {}
> +
> +
> +  public void yieldCommitLock() {}
>  }
> Index: solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java
> ===================================================================
> --- solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java 
> (revision 1440275)
> +++ solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java 
> (working copy)
> @@ -830,4 +830,13 @@
>    public CommitTracker getSoftCommitTracker() {
>      return softCommitTracker;
>    }
> +
> +  public void getCommitLock() {
> +    commitLock.lock();
> +  }
> +
> +
> +  public void yieldCommitLock() {
> +    commitLock.unlock();
> +  }
>  }
> 
> 
> On Jan 29, 2013, at 11:24 PM, Mark Miller <markrmil...@gmail.com> wrote:
> 
> > Digging into the stack traces...
> >
> > This shows a thread waiting for the commit lock trying to close an index 
> > writer.
> >
> > There is another thread with the commit lock that is waiting for the writer 
> > to be returned.
> >
> > That seems to be the situation - a race around the commit lock.
> >
> > Needs some thought.
> >
> > - Mark
> >
> > On Jan 29, 2013, at 8:31 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> >> Java stack information for the threads listed above:
> >> ===================================================
> >> "commitScheduler-42617-thread-1":
> >> at 
> >> org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:78)
> >> - waiting to lock <78b4aa518> (a 
> >> org.apache.solr.update.DefaultSolrCoreState)
> >> at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1359)
> >> at 
> >> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:561)
> >> - locked <7884ca730> (a java.lang.Object)
> >> at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
> >> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> >> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >> at 
> >> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
> >> at 
> >> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
> >> at 
> >> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >> at 
> >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >> at java.lang.Thread.run(Thread.java:680)
> >>
> >> *********
> >> Other thread
> >> "qtp1401888126-32":
> >> at sun.misc.Unsafe.park(Native Method)
> >> - parking to wait for  <788d73208> (a
> >> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> >> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
> >> at 
> >> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
> >> at 
> >> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
> >> at 
> >> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
> >> at 
> >> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
> >> at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
> >> at 
> >> org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:668)
> >> at 
> >> org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:64)
> >> - locked <78b4aa518> (a org.apache.solr.update.DefaultSolrCoreState)
> >> at 
> >> org.apache.solr.update.DefaultSolrCoreState.close(DefaultSolrCoreState.java:272)
> >> - locked <78b4aa518> (a org.apache.solr.update.DefaultSolrCoreState)
> >> at org.apache.solr.core.SolrCore.decrefSolrCoreState(SolrCore.java:888)
> >> - locked <78b4aa518> (a org.apache.solr.update.DefaultSolrCoreState)
> >> at org.apache.solr.core.SolrCore.close(SolrCore.java:980)
> >> at org.apache.solr.core.CoreMaps.putTransientCore(CoreContainer.java:1465)
> >> at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:730)
> >> at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1137)
> >> at 
> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:190)
> >> at
> >
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 
> 
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to