[jira] [Commented] (SOLR-1144) replication hang

2011-03-29 Thread Vadim Kisselmann (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012407#comment-13012407
 ] 

Vadim Kisselmann commented on SOLR-1144:


After a few tests, I think I've located the problem. It's probably the Solr 
caches.

If I deactivate the caches in solrconfig.xml, replication works fine. But if 
any of them are active, replication slows down.

Disabling the caches isn't an option for me, since query times get way too 
long.

 replication hang
 

 Key: SOLR-1144
 URL: https://issues.apache.org/jira/browse/SOLR-1144
 Project: Solr
  Issue Type: Bug
Reporter: Yonik Seeley
Assignee: Noble Paul
 Fix For: 1.4

 Attachments: stacktrace-master.txt, stacktrace-slave-1.txt, 
 stacktrace-slave-2.txt


 It seems that replication can sometimes hang.
 http://www.lucidimagination.com/search/document/403305a3fda18599

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1144) replication hang

2011-03-28 Thread Vadim Kisselmann (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012047#comment-13012047
 ] 

Vadim Kisselmann commented on SOLR-1144:


I have Solr running on one master and two slaves (load balanced) via
Solr 1.4.1 native replication.

If the load is low, both slaves replicate from the master at around 100MB/s.
After a couple of hours, replication slows down to 100KB/s.
So the problem is still there.

I tested it with both Jetty and Tomcat.
It looks like aggressive JVM options can delay the problem, but it 
starts anyway.

My index is about 100GB; I use 10GB for the JVM (24GB total).
The slaves poll every 5 minutes.




[jira] Commented: (SOLR-1144) replication hang

2010-07-02 Thread Toby Cole (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884644#action_12884644
 ] 

Toby Cole commented on SOLR-1144:
-

Just over a year since it was first spotted, I'm consistently getting the same 
symptoms as this bug.
We've got a single master with two slaves polling it, and both slaves have 
stalled at exactly the same point in the replication.

Here's the relevant section of the replication handler's 'details' response:
Node A
{code:xml}
  <str name="numFilesDownloaded">18</str>
  <str name="replicationStartTime">Fri Jul 02 10:40:00 BST 2010</str>
  <str name="timeElapsed">6683s</str>
  <str name="currentFile">_9du.prx</str>
  <str name="currentFileSize">8.17 MB</str>
  <str name="currentFileSizeDownloaded">8.17 MB</str>
  <str name="currentFileSizePercent">100.0</str>
  <str name="bytesDownloaded">40.55 MB</str>
  <str name="totalPercent">0.0</str>
  <str name="timeRemaining">8290722s</str>
  <str name="downloadSpeed">6.21 KB</str>
{code}

Node B
{code:xml}
  <str name="numFilesDownloaded">18</str>
  <str name="replicationStartTime">Fri Jul 02 10:40:00 BST 2010</str>
  <str name="timeElapsed">6752s</str>
  <str name="currentFile">_9du.prx</str>
  <str name="currentFileSize">8.17 MB</str>
  <str name="currentFileSizeDownloaded">8.17 MB</str>
  <str name="currentFileSizePercent">100.0</str>
  <str name="bytesDownloaded">40.55 MB</str>
  <str name="totalPercent">0.0</str>
  <str name="timeRemaining">8376322s</str>
  <str name="downloadSpeed">6.15 KB</str>
{code}
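
A quick check of the figures above: dividing bytesDownloaded by timeElapsed gives almost exactly the reported downloadSpeed on both nodes. This suggests, although nothing in this thread confirms it, that downloadSpeed is an average over the whole elapsed time rather than a current rate, which would fit a transfer that moved 40.55 MB early on and then stopped. The sketch below only redoes that arithmetic; the class and method names are made up for illustration, and it assumes 1 MB = 1024 KB.

```java
class ReplicationRate {

    /** Average throughput in KB/s over the whole run, assuming 1 MB = 1024 KB. */
    static double averageKBPerSecond(double megabytesDownloaded, long secondsElapsed) {
        return megabytesDownloaded * 1024.0 / secondsElapsed;
    }

    public static void main(String[] args) {
        // Values copied from the 'details' responses of Node A and Node B.
        System.out.printf("Node A: %.2f KB/s%n", averageKBPerSecond(40.55, 6683));
        System.out.printf("Node B: %.2f KB/s%n", averageKBPerSecond(40.55, 6752));
    }
}
```

Both results come out at about 6.21 KB/s and 6.15 KB/s, matching the reported values, so the falling "speed" is what one would expect if no further bytes arrive while the clock keeps running.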




[jira] Commented: (SOLR-1144) replication hang

2010-07-02 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884690#action_12884690
 ] 

Yonik Seeley commented on SOLR-1144:


Thanks for the stack traces Toby!

Interesting... seems like the commit in the slave blocked...
{code}
at org.apache.solr.common.util.ConcurrentLRUCache.getLatestAccessedItems(ConcurrentLRUCache.java:276)
{code}

So perhaps another thread locked, but didn't unlock the lock?

SOLR-1538 did fix something that could possibly lead to a deadlock, but it's 
super unlikely (a very small object allocation would have to fail at just the 
right spot).  Still, if this is easy enough to reproduce, could you try Solr 
1.4.1 and see if it's fixed?  (and if it hangs again, be sure to get stack 
traces... they are super helpful!)
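
To make the "locked, but didn't unlock" scenario concrete, here is a minimal, generic sketch. It is not the ConcurrentLRUCache code and does not claim to show what SOLR-1538 changed; the class and method names are invented for illustration. It shows how a failure between lock() and unlock() leaves the lock held for good unless the unlock sits in a finally block.

```java
import java.util.concurrent.locks.ReentrantLock;

class LockLeakDemo {

    /** Acquires the lock and runs the task WITHOUT try/finally: a failure leaks the lock. */
    static void unsafeRun(ReentrantLock lock, Runnable task) {
        lock.lock();
        task.run();      // if this throws, unlock() below is never reached
        lock.unlock();
    }

    /** Same work, but the lock is released even if the task fails. */
    static void safeRun(ReentrantLock lock, Runnable task) {
        lock.lock();
        try {
            task.run();
        } finally {
            lock.unlock();
        }
    }

    /** Returns true if a different thread can acquire the lock right now. */
    static boolean otherThreadCanLock(ReentrantLock lock) {
        boolean[] acquired = new boolean[1];
        Thread t = new Thread(() -> {
            if (lock.tryLock()) {
                acquired[0] = true;
                lock.unlock();
            }
        });
        t.start();
        try {
            t.join();    // join() makes the write to acquired[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return acquired[0];
    }

    /** Runs a failing task through either variant and reports whether the lock is still usable. */
    static boolean lockUsableAfterFailure(boolean useFinally) {
        ReentrantLock lock = new ReentrantLock();
        Runnable failing = () -> { throw new IllegalStateException("simulated failure"); };
        try {
            if (useFinally) {
                safeRun(lock, failing);
            } else {
                unsafeRun(lock, failing);
            }
        } catch (IllegalStateException expected) {
            // Only the state of the lock afterwards matters here.
        }
        return otherThreadCanLock(lock);
    }

    public static void main(String[] args) {
        System.out.println("usable without finally: " + lockUsableAfterFailure(false));
        System.out.println("usable with finally:    " + lockUsableAfterFailure(true));
    }
}
```

In the unsafe variant any other thread that later calls lock() would wait forever, which is the kind of hang a thread dump shows as a thread parked on a lock that no running code is about to release.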




[jira] Commented: (SOLR-1144) replication hang

2010-07-02 Thread Toby Cole (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884693#action_12884693
 ] 

Toby Cole commented on SOLR-1144:
-

Oh yes, should have mentioned... we're already on Solr 1.4.1 in production as 
of yesterday (we don't hang about y'know ;) ).




[jira] Commented: (SOLR-1144) replication hang

2010-07-02 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884717#action_12884717
 ] 

Yonik Seeley commented on SOLR-1144:


The odd thing is that the line numbers in the stack traces don't match up for 
either 1.4.0 or 1.4.1.
Specifically, ConcurrentLRUCache.java:276 is in the middle of markAndSweep() in 
both versions (as opposed to getLatestAccessedItems(), which your stack trace 
would suggest).

Are these stack traces from stock 1.4.0 or 1.4.1?  If so, does anyone have a 
clue why the line numbers would be off?





[jira] Commented: (SOLR-1144) replication hang

2010-07-02 Thread Toby Cole (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884719#action_12884719
 ] 

Toby Cole commented on SOLR-1144:
-

I know exactly why the line numbers would be off. I just remembered we're using 
a custom war package so we can add our own plugins in (yes, I know we can use 
solr.home/lib, but we've not got round to that yet).

The only classes we're overriding from Solr are ConcurrentLRUCache and 
FastLRUCache. This dates from before Solr 1.4, when the cache implementations 
were slowing faceting right down.
I have a feeling that if I remove those overridden classes and use the new 
(bug-free) ones, the hang may stop.

I'll give it a go now. Sorry in advance if it was my oversight that caused 
this bug to reappear.
T




[jira] Commented: (SOLR-1144) replication hang

2009-05-08 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707405#action_12707405
 ] 

Yonik Seeley commented on SOLR-1144:


bq. ReplicationHandler does not cause the hang on the master.

The slave is waiting forever, but it *could* be due to a bug on either the 
master or the slave, and it could be due to the replication handler.  It could 
also be another Solr bug somewhere, or it could be a Tomcat bug.

What is apparent is that since there is no replication stack trace on the 
master, it thinks it finished the file send (either that or got an exception), 
but the slave is still expecting more for some reason.  Perhaps if we used 
non-persistent connections for replication, the master would close the 
connection when it thought it had sent everything?





[jira] Commented: (SOLR-1144) replication hang

2009-05-08 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707421#action_12707421
 ] 

Noble Paul commented on SOLR-1144:
--

The master closes the connection once everything is written. If the download 
of a file is complete, the slave also closes the stream. The fact that the 
slave continued to wait means the file has not been downloaded completely.




[jira] Commented: (SOLR-1144) replication hang

2009-05-08 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707425#action_12707425
 ] 

Yonik Seeley commented on SOLR-1144:


bq. The master closes the connection if everything is written. 

Hmmm, that doesn't jibe with the slave hanging on a read though... seems like 
the only way read() should block is if there is no more data to read currently 
and the socket is still open.
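
The read() behaviour described here can be reproduced with plain java.net sockets. The sketch below is not Solr or SnapPuller code; the class name, method name and timeout values are made up for illustration. It runs a throwaway server on the loopback interface and does one read from the client side: if the peer has closed the connection, read() returns -1 straight away, and if the peer keeps the socket open but sends nothing, read() blocks until the read timeout (set with setSoTimeout) fires.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class ReadBlockingDemo {

    /** Result of one read attempt against the local test server. */
    enum Outcome { DATA, END_OF_STREAM, TIMED_OUT }

    /**
     * Connects to a throwaway loopback server and performs a single read().
     *
     * @param serverClosesSocket if true the server closes the connection right
     *                           after accepting it; if false it keeps the
     *                           connection open and sends nothing
     * @param timeoutMillis      read timeout applied on the client side
     */
    static Outcome readOnce(boolean serverClosesSocket, int timeoutMillis) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {

            if (serverClosesSocket) {
                accepted.close();               // sender signals "nothing more to send"
            }
            client.setSoTimeout(timeoutMillis); // without a timeout, read() may block indefinitely
            InputStream in = client.getInputStream();
            try {
                int b = in.read();
                return b == -1 ? Outcome.END_OF_STREAM : Outcome.DATA;
            } catch (SocketTimeoutException e) {
                return Outcome.TIMED_OUT;       // socket still open, but no data arrived in time
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("peer closed:    " + readOnce(true, 500));
        System.out.println("peer kept open: " + readOnce(false, 500));
    }
}
```

A read timeout only turns an indefinite wait into an error that can be handled; it does not explain why the expected bytes never arrived.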




[jira] Commented: (SOLR-1144) replication hang

2009-05-07 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706868#action_12706868
 ] 

Noble Paul commented on SOLR-1144:
--

ReplicationHandler does not cause the hang on the master. On the slave, the 
SnapPuller was waiting forever, which I hope has been fixed by SOLR-1096.




[jira] Commented: (SOLR-1144) replication hang

2009-05-05 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706199#action_12706199
 ] 

Yonik Seeley commented on SOLR-1144:


Hmmm, I had trouble finding SOLR-1096 before.
But it looks like it was used mainly for adding a timeout.  There's still an 
underlying bug somewhere, right?




[jira] Commented: (SOLR-1144) replication hang

2009-05-05 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706302#action_12706302
 ] 

Noble Paul commented on SOLR-1144:
--

The stack trace: http://markmail.org/message/ecr6m4rf4iy2d652

I suspect the following two threads are blocked:

{code}
'NioBlockingSelector.BlockPoller-2' Id=10, RUNNABLE on lock=, total cpu time=5580.ms user time=2120.ms
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at org.apache.tomcat.util.net.NioBlockingSelector$BlockPoller.run(NioBlockingSelector.java:305)
'NioBlockingSelector.BlockPoller-1' Id=9, RUNNABLE on lock=, total cpu time=333280.ms user time=107520.ms
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at org.apache.tomcat.util.net.NioBlockingSelector$BlockPoller.run(NioBlockingSelector.java:305)
{code}






[jira] Commented: (SOLR-1144) replication hang

2009-05-04 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705891#action_12705891
 ] 

Noble Paul commented on SOLR-1144:
--

Isn't this the same as SOLR-1096?
