There is a lot of complicated interplay between locks in that area of the code 
- small changes can easily get you into trouble.
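
To make that concrete, here is a minimal, self-contained sketch of the kind of lock-order inversion the quoted jstack output appears to show. It is not Solr code. The names LockOrderDemo, stateMonitor, commitPath and closePath are hypothetical stand-ins, and it assumes the commit thread already holds commitLock when it asks for the index writer, which the trace implies but does not show directly. One path takes a ReentrantLock and then an object monitor; the other takes them in the opposite order. A latch forces the bad interleaving, and the JDK's ThreadMXBean reports the resulting cycle.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderDemo {
    interface Task { void run() throws InterruptedException; }

    // Hypothetical stand-ins: stateMonitor plays the DefaultSolrCoreState monitor,
    // commitLock plays the ReentrantLock taken in DirectUpdateHandler2.
    private final Object stateMonitor = new Object();
    private final ReentrantLock commitLock = new ReentrantLock();
    // Opens once both threads hold their first lock, forcing the bad interleaving.
    private final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

    // Commit-like path (assumed order): commitLock first, then the state monitor.
    void commitPath() throws InterruptedException {
        commitLock.lock();
        try {
            bothHoldFirstLock.countDown();
            bothHoldFirstLock.await();
            synchronized (stateMonitor) {
                // a real commit would fetch the index writer here
            }
        } finally {
            commitLock.unlock();
        }
    }

    // Close-like path: state monitor first, then commitLock.
    void closePath() throws InterruptedException {
        synchronized (stateMonitor) {
            bothHoldFirstLock.countDown();
            bothHoldFirstLock.await();
            commitLock.lock();
            try {
                // a real close would shut the writer down here
            } finally {
                commitLock.unlock();
            }
        }
    }

    // Daemon threads, so the deadlocked pair cannot keep the JVM alive.
    private static Thread daemon(String name, Task task) {
        Thread t = new Thread(() -> {
            try {
                task.run();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, name);
        t.setDaemon(true);
        t.start();
        return t;
    }

    private static boolean contains(long[] ids, long id) {
        for (long candidate : ids) {
            if (candidate == id) {
                return true;
            }
        }
        return false;
    }

    // Starts both paths and polls the JVM's deadlock detector for up to five seconds.
    static boolean deadlockDetected() {
        LockOrderDemo demo = new LockOrderDemo();
        Thread commit = daemon("commit-like", demo::commitPath);
        Thread close = daemon("close-like", demo::closePath);
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        try {
            while (System.nanoTime() < deadline) {
                // Covers cycles through both monitors and ownable synchronizers.
                long[] ids = mx.findDeadlockedThreads();
                if (ids != null && contains(ids, commit.getId()) && contains(ids, close.getId())) {
                    return true;
                }
                Thread.sleep(50);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + deadlockDetected());
    }
}
```

Dropping "synchronized" from one path breaks a cycle like this, but only by giving up whatever that monitor was protecting, which is why comparing against the pre-patch code matters.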

Can you modify your test to run on the code before your patch? That would help 
in telling whether it's something existing or something you're introducing.

I still plan on looking at this issue closer this week, so perhaps I can offer 
some more help soon.

- Mark

On Jan 29, 2013, at 1:26 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> Two runs doth not a conclusion reach, but removing "synchronized" from:
> DefaultSolrCoreState.getIndexWriter (line 78)
> 
> let me run for an hour, at least twice. And my stress test succeeds,
> which fires up 15 indexing threads on 100 cores (transient core size
> is 20), indexes documents for an hour while another 15 threads fire
> off queries. At the end, it inspects each core to see if there are the
> expected number of documents.
> 
> But that's kinda a frail reed to pin my hopes on, these are
> notoriously hard to reproduce.
> 
> I'll set this up to run on an old machine for much longer later today.
> Does anyone who knows that code know whether I'm playing with fire? I
> haven't looked at the synchronization in that code to try to figure
> out the purpose, I'm hoping someone will glance at it and say "that's
> wrong".
> 
> I'll dig into it later and see how much I can figure out about whether
> it's safe.
> 
> FWIW,
> 
> On Tue, Jan 29, 2013 at 8:31 AM, Erick Erickson <erickerick...@gmail.com> 
> wrote:
>> All:
>> 
>> As part of SOLR-4196, I'm opening and closing cores at a furious rate.
>> My tests are running for 20-40 minutes then locking up quite
>> regularly. Of course the first place I'm looking is my recent code,
>> since it has a bunch of synchronized blocks.
>> 
>> The deadlock is definitely happening at a call from the new code to
>> close a Solr core, so to really look at this anyone will need to get
>> the patch I'll put up in a minute. The deadlock trace is below.
>> 
>> But without going that far, I question whether it's really anything to
>> do with new synchronizations I'm doing or whether it's just something
>> that's been lurking for a while and I'm flushing out of the woodwork.
>> One of the deadlocked threads may be called from my code, but as far
>> as I can tell none of the actual synchronization objects I'm using are
>> held. I have the full jstack output if anyone needs it...
>> 
>> Of course I'll continue looking, but at a glance I'm wondering if this
>> code has really ever been stressed this way before or whether these
>> have existed for a while. All synchronization should be approached
>> with fear and loathing IMO.....
>> 
>> One thread blocks at a synchronized method, but should this method
>> really be synchronized?
>> 
>> at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:78)
>> (here's the method)
>>  @Override
>>  public synchronized RefCounted<IndexWriter> getIndexWriter(SolrCore core)
>>      throws IOException {
>> 
>> and a little later in the method there's:
>>    synchronized (writerPauseLock) {
>>      if (core == null) {
>> 
>> 
>> 
>> and the other thread blocks at:
>> 
>> at org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:668),
>> (here's the method)
>>  // IndexWriterCloser interface method - called from solrCoreState.decref(this)
>>  @Override
>>  public void closeWriter(IndexWriter writer) throws IOException {
>>    boolean clearRequestInfo = false;
>>    commitLock.lock(); // ********** locking here!
>>    try {
>>      SolrQueryRequest req = new LocalSolrQueryRequest(core, new ModifiableSolrParams());
>>      SolrQueryResponse rsp = new SolrQueryResponse();
>>      if (SolrRequestInfo.getRequestInfo() == null) {
>>        clearRequestInfo = true;
>> 
>> 
>> 
>> 
>> Java stack information for the threads listed above:
>> ===================================================
>> "commitScheduler-42617-thread-1":
>> at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:78)
>> - waiting to lock <78b4aa518> (a org.apache.solr.update.DefaultSolrCoreState)
>> at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1359)
>> at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:561)
>> - locked <7884ca730> (a java.lang.Object)
>> at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> at java.lang.Thread.run(Thread.java:680)
>> 
>> *********
>> Other thread
>> "qtp1401888126-32":
>> at sun.misc.Unsafe.park(Native Method)
>> - parking to wait for  <788d73208> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
>> at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
>> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
>> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
>> at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
>> at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
>> at org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:668)
>> at org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:64)
>> - locked <78b4aa518> (a org.apache.solr.update.DefaultSolrCoreState)
>> at org.apache.solr.update.DefaultSolrCoreState.close(DefaultSolrCoreState.java:272)
>> - locked <78b4aa518> (a org.apache.solr.update.DefaultSolrCoreState)
>> at org.apache.solr.core.SolrCore.decrefSolrCoreState(SolrCore.java:888)
>> - locked <78b4aa518> (a org.apache.solr.update.DefaultSolrCoreState)
>> at org.apache.solr.core.SolrCore.close(SolrCore.java:980)
>> at org.apache.solr.core.CoreMaps.putTransientCore(CoreContainer.java:1465)
>> at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:730)
>> at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1137)
>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:190)
>> at
>> 
>> Thanks,
>> Erick
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 
