[jira] Commented: (SOLR-1144) replication hang
[ https://issues.apache.org/jira/browse/SOLR-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12707405#action_12707405 ]
Yonik Seeley commented on SOLR-1144:
bq. ReplicationHandler does not cause the hang on the master.
The slave is waiting forever, but it *could* be due to a bug on either the master or the slave, and it could be due to the replication handler. It could also be another Solr bug somewhere, or it could be a Tomcat bug.
What is apparent is that since there is no replication stack trace on the master, it thinks it finished the file send (either that or got an exception), but the slave is still expecting more for some reason.
Perhaps if we used non-persistent connections for replication, the master would close the connection when it thought it had sent everything?
replication hang
Key: SOLR-1144
URL: https://issues.apache.org/jira/browse/SOLR-1144
Project: Solr
Issue Type: Bug
Reporter: Yonik Seeley
Fix For: 1.4
It seems that replication can sometimes hang. http://www.lucidimagination.com/search/document/403305a3fda18599
--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1144) replication hang
[ https://issues.apache.org/jira/browse/SOLR-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12707421#action_12707421 ]
Noble Paul commented on SOLR-1144:
The master closes the connection once everything is written. If the download of a file is complete, the slave also closes the stream. The fact that the slave continued to wait means the file has not been downloaded completely.
[jira] Commented: (SOLR-1144) replication hang
[ https://issues.apache.org/jira/browse/SOLR-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12707425#action_12707425 ]
Yonik Seeley commented on SOLR-1144:
bq. The master closes the connection if everything is written.
Hmmm, that doesn't jibe with the slave hanging on a read though... it seems like the only way read() should block is if there is currently no more data to read and the socket is still open.
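The read() semantics referred to in the comment above can be illustrated with plain java.net sockets. This is a minimal sketch, not Solr or Tomcat code: the class name ReadEofDemo and the method countUntilEof are hypothetical, and the loopback sender only stands in for a master that writes a file and then closes the connection. When the sender closes, the receiver's read() returns -1 instead of blocking.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class; not part of Solr.
class ReadEofDemo {

    /**
     * A sender thread writes payloadSize bytes and closes its socket.
     * The receiver counts bytes until read() reports end of stream (-1).
     */
    static int countUntilEof(int payloadSize) throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread sender = new Thread(() -> {
                // try-with-resources closes the socket after the write,
                // which is what lets the receiver see end of stream.
                try (Socket s = server.accept(); OutputStream out = s.getOutputStream()) {
                    out.write(new byte[payloadSize]);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            sender.start();

            int total = 0;
            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
                 InputStream in = client.getInputStream()) {
                byte[] buf = new byte[1024];
                int n;
                // read() blocks only while the peer is open and has sent nothing new;
                // once the peer closes, it returns -1 and the loop ends.
                while ((n = in.read(buf)) != -1) {
                    total += n;
                }
            }
            sender.join();
            return total;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("bytes received before EOF: " + countUntilEof(4096));
    }
}
```

If the sender kept its socket open after the write, the same loop would block on the next read(), which matches the symptom reported for the slave.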
[jira] Commented: (SOLR-1144) replication hang
[ https://issues.apache.org/jira/browse/SOLR-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12706868#action_12706868 ]
Noble Paul commented on SOLR-1144:
ReplicationHandler does not cause the hang on the master. On the slave, the SnapPuller was waiting forever, which I hope has been fixed by SOLR-1096.
[jira] Commented: (SOLR-1144) replication hang
[ https://issues.apache.org/jira/browse/SOLR-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12706199#action_12706199 ]
Yonik Seeley commented on SOLR-1144:
Hmmm, I had trouble finding SOLR-1096 before. But it looks like it was mainly about adding a timeout. There's still an underlying bug somewhere, right?
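To show why a timeout masks a hang without removing its cause, here is a small sketch using Socket.setSoTimeout from the JDK. It is not the SOLR-1096 patch: the class ReadTimeoutDemo, the method readAfterPartialSend, and the three-byte payload are assumptions made for illustration. The sender writes part of its data and then keeps the connection open, so the receiver's read() would block indefinitely were it not for the read timeout.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class; not part of Solr.
class ReadTimeoutDemo {

    /**
     * The sender writes 3 bytes and then holds the socket open without sending more.
     * The receiver reads with SO_TIMEOUT set and reports how the read loop ended.
     */
    static String readAfterPartialSend(int timeoutMillis) throws Exception {
        CountDownLatch release = new CountDownLatch(1);
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread sender = new Thread(() -> {
                try (Socket s = server.accept()) {
                    OutputStream out = s.getOutputStream();
                    out.write(new byte[] {1, 2, 3});
                    out.flush();
                    release.await(); // keep the connection open, sending nothing further
                } catch (IOException | InterruptedException e) {
                    // ignored: this is only a demonstration
                }
            });
            sender.start();

            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
                client.setSoTimeout(timeoutMillis); // bound each blocking read()
                InputStream in = client.getInputStream();
                int received = 0;
                try {
                    while (in.read() != -1) {
                        received++;
                    }
                    return "eof after " + received + " bytes";
                } catch (SocketTimeoutException e) {
                    // The receiver is unblocked, but the data is still incomplete.
                    return "timeout after " + received + " bytes";
                }
            } finally {
                release.countDown(); // let the sender thread finish
                sender.join();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readAfterPartialSend(300));
    }
}
```

The timeout turns an indefinite wait into an error the caller can handle, but the transfer is still short, which is the point of the question about an underlying bug.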
[jira] Commented: (SOLR-1144) replication hang
[ https://issues.apache.org/jira/browse/SOLR-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12706302#action_12706302 ]
Noble Paul commented on SOLR-1144:
The stack trace is at http://markmail.org/message/ecr6m4rf4iy2d652 . I suspect the following two threads are blocked:
{code}
'NioBlockingSelector.BlockPoller-2' Id=10, RUNNABLE on lock=, total cpu time=5580.ms user time=2120.ms
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at org.apache.tomcat.util.net.NioBlockingSelector$BlockPoller.run(NioBlockingSelector.java:305)

'NioBlockingSelector.BlockPoller-1' Id=9, RUNNABLE on lock=, total cpu time=333280.ms user time=107520.ms
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at org.apache.tomcat.util.net.NioBlockingSelector$BlockPoller.run(NioBlockingSelector.java:305)
{code}
[jira] Commented: (SOLR-1144) replication hang
[ https://issues.apache.org/jira/browse/SOLR-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12705891#action_12705891 ]
Noble Paul commented on SOLR-1144:
Isn't this the same as SOLR-1096?