bq: I don't follow this at all - of course you could rapidly load and
unload cores at the same time before this patch?

Not quite what I was trying to say. The stress test opens and closes cores
a LOT. Of course you could open and close cores simultaneously before. In
fact given what I think is a new pattern I'm amazed that there aren't a lot
more problems, that's some damn good code. What's new is the stress test
opens and closes a core with *every* call. From 30 threads, 15 indexing and
15 querying.

bq: If you cannot easily produce a test that causes deadlock without your
patch

I don't know how you'd really write a predictable junit test, the time
window for the race here is small. I had to run my stress test for 20-30
minutes to hit it. I think I can modify the stress test to use old-style
solr.xml which I could run against current trunk, is it worth it though
after your second look? Note that this open/closing a core every call from
the stress test wasn't really possible the same way before SOLR-1028, one
of the keys is that the transient cores have to be aged out. The bolds
below are new code (SOLR-1028+):

at org.apache.solr.core.CoreMaps$1.*removeEldestEntry*
(CoreContainer.java:1384)
......
at org.apache.solr.core.CoreMaps.*putTransientCore*(CoreContainer.java:1444)


If you still think it's worthwhile, I have some travel time tomorrow that I
could use to make this test run with old-style solr.xml. Let me know. I'm
also wondering if the stress test should go into our test suite somewhere.
It's possible to bring the stress test into junit I think, there's nothing
magic about it. But it might be better as an external test that we run on,
say, a nightly or weekly basis, is there precedent?

bq: Because they are different locks protecting different state.

So you're saying that synchronizing the method actually is protecting the
SolrCore that's passed as a parameter? Otherwise I don't get it, seems like
moving the synchronized block to the first line of, e.g.,  newIndexWriter
and removing synchronized from the method signature would be sufficient.
That said, my hack of removing the synchronized from the method signature
was more to poke that what I thought I saw than a well thought-out
solution... Which is why I'm glad you're looking too...

Yes, the code attached to the JIRA is the latest. You've got enough on your
plate I'm sure, I'll apply your patch and let you know. I had a little
trouble with SVN applying it cleanly, but  I think I reconciled it
correctly...


On Tue, Jan 29, 2013 at 11:43 PM, Mark Miller <markrmil...@gmail.com> wrote:

> Do you have your latest work attached to the issue? If so, I'll start
> working with it locally.
>
> For now, can you try this experimental, test patch and see what the
> results are?
>
> Index: solr/core/src/java/org/apache/solr/update/DefaultSolrCoreState.java
> ===================================================================
> --- solr/core/src/java/org/apache/solr/update/DefaultSolrCoreState.java
> (revision 1440275)
> +++ solr/core/src/java/org/apache/solr/update/DefaultSolrCoreState.java
> (working copy)
> @@ -135,13 +135,24 @@
>        pauseWriter = true;
>        // then lets wait until its out of use
>        log.info("Waiting until IndexWriter is unused... core=" +
> coreName);
> +
> +      boolean yielded = false;
> +      try {
>        while (!writerFree) {
> -        try {
> -          writerPauseLock.wait(100);
> -        } catch (InterruptedException e) {}
> -
> -        if (closed) {
> -          throw new RuntimeException("SolrCoreState already closed");
> +          // yield the commit lock
> +          core.getUpdateHandler().yieldCommitLock();
> +          yielded = true;
> +          try {
> +            writerPauseLock.wait(100);
> +          } catch (InterruptedException e) {}
> +
> +          if (closed) {
> +            throw new RuntimeException("SolrCoreState already closed");
> +          }
> +        }
> +      } finally {
> +        if (yielded) {
> +          core.getUpdateHandler().getCommitLock();
>          }
>        }
>
> Index: solr/core/src/java/org/apache/solr/update/UpdateHandler.java
> ===================================================================
> --- solr/core/src/java/org/apache/solr/update/UpdateHandler.java
>  (revision 1440275)
> +++ solr/core/src/java/org/apache/solr/update/UpdateHandler.java
>  (working copy)
> @@ -189,4 +189,10 @@
>    }
>
>    public abstract void split(SplitIndexCommand cmd) throws IOException;
> +
> +
> +  public void getCommitLock() {}
> +
> +
> +  public void yieldCommitLock() {}
>  }
> Index: solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java
> ===================================================================
> --- solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java
> (revision 1440275)
> +++ solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java
> (working copy)
> @@ -830,4 +830,13 @@
>    public CommitTracker getSoftCommitTracker() {
>      return softCommitTracker;
>    }
> +
> +  public void getCommitLock() {
> +    commitLock.lock();
> +  }
> +
> +
> +  public void yieldCommitLock() {
> +    commitLock.unlock();
> +  }
>  }
>
>
> On Jan 29, 2013, at 11:24 PM, Mark Miller <markrmil...@gmail.com> wrote:
>
> > Digging into the stack traces...
> >
> > This shows a thread waiting for the commit lock trying to close an index
> writer.
> >
> > There is another thread with the commit lock that is waiting for the
> writer to be returned.
> >
> > That seems to be the situation - a race around the commit lock.
> >
> > Needs some thought.
> >
> > - Mark
> >
> > On Jan 29, 2013, at 8:31 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
> >
> >> Java stack information for the threads listed above:
> >> ===================================================
> >> "commitScheduler-42617-thread-1":
> >> at
> org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:78)
> >> - waiting to lock <78b4aa518> (a
> org.apache.solr.update.DefaultSolrCoreState)
> >> at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1359)
> >> at
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:561)
> >> - locked <7884ca730> (a java.lang.Object)
> >> at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
> >> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> >> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
> >> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
> >> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >> at java.lang.Thread.run(Thread.java:680)
> >>
> >> *********
> >> Other thread
> >> "qtp1401888126-32":
> >> at sun.misc.Unsafe.park(Native Method)
> >> - parking to wait for  <788d73208> (a
> >> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> >> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
> >> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
> >> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
> >> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
> >> at
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
> >> at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
> >> at
> org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:668)
> >> at
> org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:64)
> >> - locked <78b4aa518> (a org.apache.solr.update.DefaultSolrCoreState)
> >> at
> org.apache.solr.update.DefaultSolrCoreState.close(DefaultSolrCoreState.java:272)
> >> - locked <78b4aa518> (a org.apache.solr.update.DefaultSolrCoreState)
> >> at org.apache.solr.core.SolrCore.decrefSolrCoreState(SolrCore.java:888)
> >> - locked <78b4aa518> (a org.apache.solr.update.DefaultSolrCoreState)
> >> at org.apache.solr.core.SolrCore.close(SolrCore.java:980)
> >> at
> org.apache.solr.core.CoreMaps.putTransientCore(CoreContainer.java:1465)
> >> at
> org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:730)
> >> at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1137)
> >> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:190)
> >> at
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

Reply via email to