[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
[ https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12762524#action_12762524 ]

Noble Paul commented on SOLR-1458:
----------------------------------

I plan to commit this shortly. Please comment if there is any concern.

Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
------------------------------------------------------------------------------------------

Key: SOLR-1458
URL: https://issues.apache.org/jira/browse/SOLR-1458
Project: Solr
Issue Type: Bug
Components: replication (java)
Affects Versions: 1.4
Environment: CentOS x64 8GB RAM
Tomcat, running with 7G max memory; memory usage is 2GB, so it's not the problem
Host a: master
Host b: slave
Multiple single core Solr instances, using JNDI.
Java replication
Reporter: Artem Russakovskii
Assignee: Yonik Seeley
Fix For: 1.4
Attachments: reserve.patch, SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SolrDeletionPolicy.patch, SolrDeletionPolicy.patch

After finally figuring out the new Java based replication, we have started both the slave and the master and issued optimize to all master Solr instances. This triggered some replication to go through just fine, but it looks like some of it is failing.
Here's what I'm getting in the slave logs, repeatedly for each shard:
{code}
SEVERE: SnapPull failed
java.lang.NullPointerException
    at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:271)
    at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:258)
    at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
{code}
If I issue an optimize again on the master to one of the shards, it then triggers a replication and replicates OK. I have a feeling that these SnapPull failures appear later on but right now I don't have enough to form a pattern.

Here's replication.properties on one of the failed slave instances.
{code}
cat data/replication.properties
#Replication details
#Wed Sep 23 19:35:30 PDT 2009
replicationFailedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
previousCycleTimeInSeconds=0
timesFailed=113
indexReplicatedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
indexReplicatedAt=1253759730020
replicationFailedAt=1253759730020
lastCycleBytesDownloaded=0
timesIndexReplicated=113
{code}
and another
{code}
cat data/replication.properties
#Replication details
#Wed Sep 23 18:42:01 PDT 2009
replicationFailedAtList=1253756490034,1253756460169
previousCycleTimeInSeconds=1
timesFailed=2
indexReplicatedAtList=1253756521284,1253756490034,1253756460169
indexReplicatedAt=1253756521284
replicationFailedAt=1253756490034
lastCycleBytesDownloaded=22932293
timesIndexReplicated=3
{code}
Some relevant configs:

In solrconfig.xml:
{code}
<!-- For docs see http://wiki.apache.org/solr/SolrReplication -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">optimize</str>
    <str name="backupAfter">optimize</str>
    <str name="commitReserveDuration">00:00:20</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <!-- url of master, from properties file -->
    <str name="masterUrl">${master.url}</str>
    <!-- how often to check master -->
    <str name="pollInterval">00:00:30</str>
  </lst>
</requestHandler>
{code}
The slave then has this in solrcore.properties:
{code}
enable.slave=true
master.url=URLOFMASTER/replication
{code}
and the master has
{code}
enable.master=true
{code}
I'd be glad to provide more details but I'm not sure
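The failure timestamps in the first replication.properties file are epoch milliseconds, and they are spaced about 30 seconds apart, which matches the slave's pollInterval of 00:00:30 (that is, every poll was failing). A minimal sketch for checking this, assuming the file content is available as a string; the class and method names here are illustrative and are not part of Solr:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of Solr: measures the spacing between
// the timestamps recorded in a replication.properties list entry.
class ReplicationGaps {

    /** Sorts comma-separated epoch-millisecond values and returns the gaps, rounded to whole seconds. */
    static List<Long> gapsInSeconds(String commaSeparatedMillis) {
        long[] ts = Arrays.stream(commaSeparatedMillis.split(","))
                .map(String::trim)
                .mapToLong(Long::parseLong)
                .sorted()
                .toArray();
        List<Long> gaps = new ArrayList<>();
        for (int i = 1; i < ts.length; i++) {
            gaps.add(Math.round((ts[i] - ts[i - 1]) / 1000.0));
        }
        return gaps;
    }

    public static void main(String[] args) throws IOException {
        // Values copied from the first replication.properties listing in the report.
        String text =
            "replicationFailedAtList=1253759730020,1253759700018,1253759670019,"
            + "1253759640018,1253759610018,1253759580022,1253759550019,"
            + "1253759520016,1253759490026,1253759460016\n"
            + "timesFailed=113\n";
        Properties props = new Properties();
        props.load(new StringReader(text));
        // Prints [30, 30, 30, 30, 30, 30, 30, 30, 30]
        System.out.println(gapsInSeconds(props.getProperty("replicationFailedAtList")));
    }
}
```

Note that in the same file indexReplicatedAtList is identical to replicationFailedAtList, so every recorded attempt in that window was also a recorded failure.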
[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
[ https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12762121#action_12762121 ]

Noble Paul commented on SOLR-1458:
----------------------------------

bq. If you rely on the replication handler to reserve it, it won't work across a reboot, right?

That is currently a problem. If replicateAfter=startup is not specified, it does not work even for commits, so that is a known problem.

Anyway, adding a method to reserve a commit point forever (until it is unreserved): IndexDeletionPolicyWrapper can have that functionality, so the underlying implementations do not have to bother, and the same functionality can be used by the backup command as well.

Key: SOLR-1458
Assignee: Yonik Seeley
Attachments: SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SolrDeletionPolicy.patch, SolrDeletionPolicy.patch
[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
[ https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12762129#action_12762129 ]

Noble Paul commented on SOLR-1458:
----------------------------------

We need to ensure that any component that is using a commit point reserves it until it is done. After that, it should be unreserved.

bq. If you rely on the replication handler to reserve it, it won't work across a reboot, right?

There is a problem with the current configuration. For example, set replicateAfter=optimize: this means that if the master is restarted soon after an optimize, the slaves will not get the new commit point.

Set replicateAfter=optimize and replicateAfter=startup: this means that replication will pick up the latest commit point after a master restart, not necessarily an optimized one.

So the solution should be: if replicateAfter=commit is absent, pick up the latest optimized commit point.

Key: SOLR-1458
Assignee: Yonik Seeley
Attachments: reserve.patch, SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SolrDeletionPolicy.patch, SolrDeletionPolicy.patch
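The reserve/unreserve idea from the comment above can be sketched as a small reference-counted registry: each component reserves a commit point while it uses it, and the commit point becomes eligible for deletion only once every holder has released it. This is an illustration under assumptions, not the code in reserve.patch or in IndexDeletionPolicyWrapper; the class name, method names, and the use of a numeric version as the key are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Solr's implementation: tracks how many
// components currently hold each commit point, keyed by its version.
class CommitReservations {
    private final Map<Long, Integer> holders = new HashMap<>();

    /** A component (replication, backup, ...) starts using this commit point. */
    synchronized void reserve(long version) {
        holders.merge(version, 1, Integer::sum);
    }

    /** The component is done; the entry is dropped when the last holder releases it. */
    synchronized void release(long version) {
        holders.computeIfPresent(version, (v, n) -> n > 1 ? n - 1 : null);
    }

    /** A deletion policy would consult this before deleting a commit point. */
    synchronized boolean mayDelete(long version) {
        return !holders.containsKey(version);
    }

    public static void main(String[] args) {
        CommitReservations r = new CommitReservations();
        r.reserve(42L);                         // a slave starts fetching files
        r.reserve(42L);                         // a backup starts on the same commit
        r.release(42L);                         // the slave finishes
        System.out.println(r.mayDelete(42L));   // false: the backup still holds it
        r.release(42L);                         // the backup finishes
        System.out.println(r.mayDelete(42L));   // true
    }
}
```

A registry like this lives in memory, so it has the limitation raised in the quoted question: reservations do not survive a restart of the master.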
[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
[ https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12761916#action_12761916 ]

Yonik Seeley commented on SOLR-1458:
------------------------------------

Seems like if you want to replicate only optimized indexes, we should recommend the user configure SolrDeletionPolicy to always keep an optimized commit point around. If you rely on the replication handler to reserve it, it won't work across a reboot, right?

Although I see no reason for custom delete policies, I'm not really against adding support for that in the replication handler as long as people are confident the changes don't introduce any new bugs.

Regardless, I think the separate count for optimized commit points that I added to SolrDeletionPolicy should remain (esp. since it fixed other bugs too).

Key: SOLR-1458
Assignee: Yonik Seeley
Attachments: SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SolrDeletionPolicy.patch, SolrDeletionPolicy.patch
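To illustrate why a separate count for optimized commit points keeps one available for replication, here is a rough sketch of such a retention rule. It is not the actual SolrDeletionPolicy code; the Commit type, the parameter names, and the choice to count the two limits independently are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not SolrDeletionPolicy itself: decides which
// commit points to keep given two independent limits.
class KeepOptimizedSketch {

    static final class Commit {
        final long generation;
        final boolean optimized;

        Commit(long generation, boolean optimized) {
            this.generation = generation;
            this.optimized = optimized;
        }
    }

    /**
     * @param commits      commit points ordered oldest to newest
     * @param maxCommits   how many of the newest commits to keep, of any kind
     * @param maxOptimized how many of the newest optimized commits to keep
     * @return generations that must not be deleted
     */
    static Set<Long> generationsToKeep(List<Commit> commits, int maxCommits, int maxOptimized) {
        Set<Long> keep = new LinkedHashSet<>();
        int total = 0;
        int optimized = 0;
        // Walk from newest to oldest so the limits apply to the most recent commits.
        for (int i = commits.size() - 1; i >= 0; i--) {
            Commit c = commits.get(i);
            if (total < maxCommits) {
                keep.add(c.generation);
                total++;
            }
            if (c.optimized && optimized < maxOptimized) {
                keep.add(c.generation);
                optimized++;
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        List<Commit> commits = Arrays.asList(
                new Commit(1, true),    // optimized
                new Commit(2, false),
                new Commit(3, false),
                new Commit(4, false));  // newest
        // Keeps the newest commit (4) and the newest optimized one (1): prints [4, 1]
        System.out.println(generationsToKeep(commits, 1, 1));
    }
}
```

With maxOptimized set to 0 in this sketch, only generation 4 would be kept, so a slave that replicates only optimized indexes would find nothing to fetch after later plain commits.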
[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
[ https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12761288#action_12761288 ]

Yonik Seeley commented on SOLR-1458:
------------------------------------

Since SolrDeletionPolicy already had the needed functionality, it seemed like the most straightforward fix.

bq. It removes an existing functionality (one can't set a custom deletion policy when replicateAfterCOmmit is set)

True - I hadn't considered that. Of course, I was confused about why we allowed custom deletion policies in the first place. It's a dangerous place to mess around, and there haven't been any identifiable use cases, right?

Key: SOLR-1458
Assignee: Yonik Seeley
Attachments: SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SolrDeletionPolicy.patch, SolrDeletionPolicy.patch
[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
[ https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12760799#action_12760799 ]

Artem Russakovskii commented on SOLR-1458:
------------------------------------------

Yonik, everything has been running for a day+ now and replication works as expected.

On a side note, I did think that the replication notes are not very clear on what replicateAfter on the master and pollInterval on the slave are for and what each does. I now understand what each is for, but I think they could be explained more clearly. Just a suggestion.

Key: SOLR-1458
Assignee: Noble Paul
Attachments: SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SolrDeletionPolicy.patch, SolrDeletionPolicy.patch
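As a rough illustration of how the two settings mentioned above divide the work: replicateAfter decides, on the master, which events make a new index version available to slaves, while pollInterval decides how often the slave asks the master whether a newer version is available. The toy model below is a sketch under that reading; all class and method names are hypothetical and none of it is Solr code.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical toy model, not Solr code: shows which side each setting acts on.
class PollSketch {
    enum Event { COMMIT, OPTIMIZE }

    static final class Master {
        private final Set<Event> replicateAfter;
        private long latestVersion = 0;      // newest index version on the master
        private long replicableVersion = 0;  // version the master offers to slaves

        Master(Set<Event> replicateAfter) {
            this.replicateAfter = replicateAfter;
        }

        /** Every event creates a new version; only configured events publish it. */
        void onEvent(Event e) {
            latestVersion++;
            if (replicateAfter.contains(e)) {
                replicableVersion = latestVersion;
            }
        }

        long replicableVersion() {
            return replicableVersion;
        }
    }

    static final class Slave {
        private long version = 0;

        /** One poll, as would run once per pollInterval; returns true if an index was fetched. */
        boolean poll(Master master) {
            if (master.replicableVersion() > version) {
                version = master.replicableVersion();
                return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        Master master = new Master(EnumSet.of(Event.OPTIMIZE)); // replicateAfter=optimize
        Slave slave = new Slave();

        master.onEvent(Event.COMMIT);
        System.out.println(slave.poll(master)); // false: commits are not published

        master.onEvent(Event.OPTIMIZE);
        System.out.println(slave.poll(master)); // true: the optimized index is fetched

        System.out.println(slave.poll(master)); // false: nothing new since the last poll
    }
}
```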
[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
[ https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12760325#action_12760325 ]

Artem Russakovskii commented on SOLR-1458:
------------------------------------------

Is the fix included in the latest nightly? 9/28/09 one.

Key: SOLR-1458
Assignee: Noble Paul
Attachments: SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SolrDeletionPolicy.patch, SolrDeletionPolicy.patch
[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
[ https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12760354#action_12760354 ]
Yonik Seeley commented on SOLR-1458:
bq. Is the fix included in the latest nightly? 9/28/09 one.
Yep.
Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
Key: SOLR-1458
URL: https://issues.apache.org/jira/browse/SOLR-1458
Project: Solr
Issue Type: Bug
Components: replication (java)
Affects Versions: 1.4
Environment: CentOS x64 8GB RAM Tomcat, running with 7G max memory; memory usage is 2GB, so it's not the problem Host a: master Host b: slave Multiple single core Solr instances, using JNDI. Java replication
Reporter: Artem Russakovskii
Assignee: Noble Paul
Fix For: 1.4
Attachments: SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SolrDeletionPolicy.patch, SolrDeletionPolicy.patch
After finally figuring out the new Java based replication, we have started both the slave and the master and issued optimize to all master Solr instances. This triggered some replication to go through just fine, but it looks like some of it is failing.
[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
[ https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759859#action_12759859 ]
Shalin Shekhar Mangar commented on SOLR-1458:
bq. I haven't changed any configs yet, and this probably doesn't come as a shock to you guys, but the master just ran out of space. Upon inspection, I found 30+ snapshot dirs sitting around in /data.
Artem, the Java replication does not make use of snapshot directories. They are generated if you have backupAfter in your configuration. That feature is only there for people who were using the script replication's snapshot directories for backup purposes. If you don't need it, just remove backupAfter.
[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
[ https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759898#action_12759898 ]
Yonik Seeley commented on SOLR-1458:
Strange stuff - the last error I just saw was a corrupted index exception from the spellchecker - couldn't load the segments_n file. But the spellcheck building code is Lucene code - Solr's deletion policy should have no effect... weird.
[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
[ https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759900#action_12759900 ]
Shalin Shekhar Mangar commented on SOLR-1458:
bq. bq. I think this should be the deletion policy that keeps around the last optimized commit point if necessary.
Yonik, shouldn't ReplicationHandler be the one to reserve commit points? Also, if we go down this way (having SolrDeletionPolicy decide these things), would a custom deletion policy play nicely with Solr?
[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
[ https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759927#action_12759927 ]
Artem Russakovskii commented on SOLR-1458:
Shalin, I've taken the backupAfter line directly from the SolrReplication wiki which talks about Java based replication: http://wiki.apache.org/solr/SolrReplication. I realize now it says in the comment above that line it's for backup only but why is it there in the first place? It threw me off a bit.
[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
[ https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759952#action_12759952 ]
Yonik Seeley commented on SOLR-1458:
Testing Update: I rebooted my ubuntu box, did a clean solr checkout, re-applied the patch, and got a much higher rate of test passes. Looks like it was Gremlins. Last night I set it up to build continuously in a loop - and got about a 25% failure rate. Problem is, I didn't have it copy out failed tests for inspection, so I don't know why it failed, and it may be as simple as a loss of internet connectivity or DNS service, or apache going down, etc (yes we have tests that rely on external networks - that's a pain). I'm re-running tests now, with a stop on a test failure so I can figure out if anything is actually related to this proposed patch!
[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
[ https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759957#action_12759957 ]
Yonik Seeley commented on SOLR-1458:
Update: OK... my failures in DirectUpdateHandlerTest turned out to be testExpungeDeletes(). I've committed simpler test code previously attached to SOLR-1275. I was initially thrown off by seeing exceptions in building the spell check index... but the actual test failure was caused by testExpungeDeletes. So - is there really a bug lurking in the spellchecker component? I'm at a loss as to how the old testExpungeDeletes code could trigger these exceptions (or if they did/do). It's also possible that these spellcheck exceptions spuriously happened before but didn't cause the test to fail.
[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
[ https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759546#action_12759546 ]

Yonik Seeley commented on SOLR-1458:

Shouldn't there be some kind of option on the deletion policy... say keepLastOptimized? Then the ReplicationHandler would only have to flip it on (if it weren't already on). It doesn't seem like the ReplicationHandler should be the one to pick which commit points to reserve forever.

Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

Key: SOLR-1458
URL: https://issues.apache.org/jira/browse/SOLR-1458
Project: Solr
Issue Type: Bug
Components: replication (java)
Affects Versions: 1.4
Environment: CentOS x64; 8GB RAM; Tomcat, running with 7G max memory (memory usage is 2GB, so it's not the problem); Host a: master; Host b: slave; multiple single core Solr instances, using JNDI; Java replication
Reporter: Artem Russakovskii
Assignee: Noble Paul
Fix For: 1.4
Attachments: SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch

After finally figuring out the new Java based replication, we have started both the slave and the master and issued optimize to all master Solr instances. This triggered some replication to go through just fine, but it looks like some of it is failing.

Here's what I'm getting in the slave logs, repeatedly for each shard:

{code}
SEVERE: SnapPull failed
java.lang.NullPointerException
    at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:271)
    at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:258)
    at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
{code}

If I issue an optimize again on the master to one of the shards, it then triggers a replication and replicates OK. I have a feeling that these SnapPull failures appear later on but right now I don't have enough to form a pattern.

Here's replication.properties on one of the failed slave instances.

{code}
cat data/replication.properties
#Replication details
#Wed Sep 23 19:35:30 PDT 2009
replicationFailedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
previousCycleTimeInSeconds=0
timesFailed=113
indexReplicatedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
indexReplicatedAt=1253759730020
replicationFailedAt=1253759730020
lastCycleBytesDownloaded=0
timesIndexReplicated=113
{code}

and another

{code}
cat data/replication.properties
#Replication details
#Wed Sep 23 18:42:01 PDT 2009
replicationFailedAtList=1253756490034,1253756460169
previousCycleTimeInSeconds=1
timesFailed=2
indexReplicatedAtList=1253756521284,1253756490034,1253756460169
indexReplicatedAt=1253756521284
replicationFailedAt=1253756490034
lastCycleBytesDownloaded=22932293
timesIndexReplicated=3
{code}

Some relevant configs:

In solrconfig.xml:

{code}
<!-- For docs see http://wiki.apache.org/solr/SolrReplication -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">optimize</str>
    <str name="backupAfter">optimize</str>
    <str name="commitReserveDuration">00:00:20</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <!-- url of master, from properties file -->
    <str name="masterUrl">${master.url}</str>
    <!-- how often to check master -->
    <str name="pollInterval">00:00:30</str>
  </lst>
</requestHandler>
{code}

The slave then has this in solrcore.properties:

{code}
enable.slave=true
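A side note on the first replication.properties dump in the issue description: consecutive entries in replicationFailedAtList are about 30 seconds apart, which lines up with the configured pollInterval of 00:00:30, so the slave appears to fail on every poll. The arithmetic can be checked with a minimal sketch like the one below. The class and method names are made up for illustration, and only the first three timestamps from the dump are used.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Illustrative helper, not part of Solr: measures the spacing between failure timestamps.
final class ReplicationPropsGaps {

    /** Parses a comma-separated list of epoch-millisecond timestamps. */
    static List<Long> parseTimestamps(String csv) {
        List<Long> out = new ArrayList<>();
        if (csv == null || csv.trim().isEmpty()) {
            return out;
        }
        for (String part : csv.split(",")) {
            out.add(Long.parseLong(part.trim()));
        }
        return out;
    }

    /** Differences between neighbouring entries; the input list is newest first. */
    static List<Long> gapsMillis(List<Long> newestFirst) {
        List<Long> gaps = new ArrayList<>();
        for (int i = 0; i + 1 < newestFirst.size(); i++) {
            gaps.add(newestFirst.get(i) - newestFirst.get(i + 1));
        }
        return gaps;
    }

    public static void main(String[] args) throws IOException {
        // First three entries of replicationFailedAtList from the dump in the issue.
        String text = "replicationFailedAtList=1253759730020,1253759700018,1253759670019\n"
                + "timesFailed=113\n";
        Properties props = new Properties();
        props.load(new StringReader(text));
        List<Long> failed = parseTimestamps(props.getProperty("replicationFailedAtList"));
        System.out.println(gapsMillis(failed)); // gaps in milliseconds between failures
    }
}
```

For the three sample timestamps the gaps are 30002 ms and 29999 ms.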
[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
[ https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759671#action_12759671 ]

Yonik Seeley commented on SOLR-1458:

Looking into this more, I think this should be the deletion policy that keeps around the last optimized commit point if necessary. Also, in checking out SolrDeletionPolicy again, it doesn't seem like the maxCommitsToKeep logic will work if keepOptimizedOnly is true. I'm going to take a whack at rewriting updateCommits().

Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

Key: SOLR-1458
URL: https://issues.apache.org/jira/browse/SOLR-1458
Project: Solr
Issue Type: Bug
Components: replication (java)
Affects Versions: 1.4
Environment: CentOS x64; 8GB RAM; Tomcat, running with 7G max memory (memory usage is 2GB, so it's not the problem); Host a: master; Host b: slave; multiple single core Solr instances, using JNDI; Java replication
Reporter: Artem Russakovskii
Assignee: Noble Paul
Fix For: 1.4
Attachments: SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch
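The idea in Yonik's comment, a deletion policy that itself keeps the last optimized commit point, can be sketched roughly as follows. This is not the SolrDeletionPolicy code or any of the attached patches. `Commit` is a hypothetical stand-in for Lucene's IndexCommit, and `selectDeletable` and its parameters are names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; the real policy works on Lucene IndexCommit objects.
final class KeepLastOptimizedSketch {

    /** Hypothetical stand-in for a commit point; this is not Lucene's IndexCommit. */
    static final class Commit {
        final long generation;
        final boolean optimized;

        Commit(long generation, boolean optimized) {
            this.generation = generation;
            this.optimized = optimized;
        }
    }

    /**
     * Returns the commits that may be deleted. The input is ordered oldest first.
     * The newest maxCommitsToKeep commits are always kept (at least one), and when
     * keepLastOptimized is set the most recent optimized commit is kept as well.
     */
    static List<Commit> selectDeletable(List<Commit> commits, int maxCommitsToKeep,
                                        boolean keepLastOptimized) {
        int keep = Math.max(1, maxCommitsToKeep);
        int firstKept = Math.max(0, commits.size() - keep);

        Commit lastOptimized = null;
        if (keepLastOptimized) {
            for (int i = commits.size() - 1; i >= 0; i--) {
                if (commits.get(i).optimized) {
                    lastOptimized = commits.get(i);
                    break;
                }
            }
        }

        List<Commit> deletable = new ArrayList<>();
        for (int i = 0; i < firstKept; i++) {
            Commit c = commits.get(i);
            if (c != lastOptimized) {
                deletable.add(c);
            }
        }
        return deletable;
    }

    public static void main(String[] args) {
        List<Commit> commits = Arrays.asList(
                new Commit(1, true), new Commit(2, false),
                new Commit(3, false), new Commit(4, false));
        for (Commit c : selectDeletable(commits, 1, true)) {
            System.out.println("deletable generation " + c.generation);
        }
    }
}
```

With one optimized commit followed by three plain commits and maxCommitsToKeep set to 1, generations 2 and 3 become deletable while the optimized generation 1 and the newest generation 4 survive, so a slave configured with replicateAfter=optimize would still find the files it needs.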
[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
[ https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759784#action_12759784 ]

Artem Russakovskii commented on SOLR-1458:

I haven't changed any configs yet, and this probably doesn't come as a shock to you guys, but the master just ran out of space. Upon inspection, I found 30+ snapshot dirs sitting around in /data.

Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

Key: SOLR-1458
URL: https://issues.apache.org/jira/browse/SOLR-1458
Project: Solr
Issue Type: Bug
Components: replication (java)
Affects Versions: 1.4
Environment: CentOS x64; 8GB RAM; Tomcat, running with 7G max memory (memory usage is 2GB, so it's not the problem); Host a: master; Host b: slave; multiple single core Solr instances, using JNDI; Java replication
Reporter: Artem Russakovskii
Assignee: Noble Paul
Fix For: 1.4
Attachments: SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SolrDeletionPolicy.patch, SolrDeletionPolicy.patch
[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
[ https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759825#action_12759825 ]

Yonik Seeley commented on SOLR-1458:

Hmmm... just happened onto this bit of odd code:

{code}
void refreshCommitpoint() {
  IndexCommit commitPoint = core.getDeletionPolicy().getLatestCommit();
  if (replicateOnCommit && !commitPoint.isOptimized()) {
    indexCommitPoint = commitPoint;
  }
  if (replicateOnOptimize && commitPoint.isOptimized()) {
    indexCommitPoint = commitPoint;
  }
}
{code}

Looks like a bug... refreshCommitpoint always updates indexCommitPoint regardless of commitPoint.

Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

Key: SOLR-1458
URL: https://issues.apache.org/jira/browse/SOLR-1458
Project: Solr
Issue Type: Bug
Components: replication (java)
Affects Versions: 1.4
Environment: CentOS x64; 8GB RAM; Tomcat, running with 7G max memory (memory usage is 2GB, so it's not the problem); Host a: master; Host b: slave; multiple single core Solr instances, using JNDI; Java replication
Reporter: Artem Russakovskii
Assignee: Noble Paul
Fix For: 1.4
Attachments: SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SolrDeletionPolicy.patch, SolrDeletionPolicy.patch
[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
[ https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759851#action_12759851 ]

Lance Norskog commented on SOLR-1458:

I reported [SOLR-1383|https://issues.apache.org/jira/browse/SOLR-1383] a few weeks ago. It is one edge case of what you're all working on. Short version: running "add 1 document/commit/replicate" continuously is a reliable way to cause the deletion policy to misfire. Try the [detailed test scenario|https://issues.apache.org/jira/browse/SOLR-1383?focusedCommentId=12749190&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12749190].

Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

Key: SOLR-1458
URL: https://issues.apache.org/jira/browse/SOLR-1458
Project: Solr
Issue Type: Bug
Components: replication (java)
Affects Versions: 1.4
Environment: CentOS x64; 8GB RAM; Tomcat, running with 7G max memory (memory usage is 2GB, so it's not the problem); Host a: master; Host b: slave; multiple single core Solr instances, using JNDI; Java replication
Reporter: Artem Russakovskii
Assignee: Noble Paul
Fix For: 1.4
Attachments: SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SolrDeletionPolicy.patch, SolrDeletionPolicy.patch
[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
[ https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759179#action_12759179 ]

Yonik Seeley commented on SOLR-1458:

The best way to eliminate some of these race conditions would seem to be to combine it into a single command: "What is your current index version? If greater than 5, please give me the list of new files since then, and please reserve them for x milliseconds."

But at this point (close to 1.4) I guess your patch is the most straightforward fix.

Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

Key: SOLR-1458
URL: https://issues.apache.org/jira/browse/SOLR-1458
Project: Solr
Issue Type: Bug
Components: replication (java)
Affects Versions: 1.4
Environment: CentOS x64; 8GB RAM; Tomcat, running with 7G max memory (memory usage is 2GB, so it's not the problem); Host a: master; Host b: slave; multiple single core Solr instances, using JNDI; Java replication
Reporter: Artem Russakovskii
Assignee: Noble Paul
Fix For: 1.4
Attachments: SOLR-1458.patch, SOLR-1458.patch
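The single-command idea from Yonik's comment (check the version, list the files, and reserve the commit in one step) might look something like the sketch below. Everything here is hypothetical: `SingleCommandSketch`, `filesIfNewer`, and the in-memory reservation map are invented for illustration and do not correspond to ReplicationHandler code. As a simplification, the reply carries all files of the current commit rather than only the files that are new since the slave's version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical master-side state; shows why one synchronized step closes the race window.
final class SingleCommandSketch {

    /** Reply to the combined request: the version that was reserved and its files. */
    static final class Reply {
        final long version;
        final List<String> files;

        Reply(long version, List<String> files) {
            this.version = version;
            this.files = files;
        }
    }

    private long currentVersion;
    private List<String> currentFiles = Collections.emptyList();
    private final Map<Long, Long> reservedUntil = new HashMap<>();

    /** Called when the master produces a new commit. */
    synchronized void publish(long version, List<String> files) {
        currentVersion = version;
        currentFiles = Collections.unmodifiableList(new ArrayList<>(files));
    }

    /**
     * Compares versions, reserves the current commit, and returns its file list in one
     * atomic step. Returns null when the slave is already up to date.
     */
    synchronized Reply filesIfNewer(long slaveVersion, long reserveMillis, long nowMillis) {
        if (currentVersion <= slaveVersion) {
            return null;
        }
        reservedUntil.put(currentVersion, nowMillis + reserveMillis);
        return new Reply(currentVersion, currentFiles);
    }

    /** A deletion policy would consult this before removing a commit. */
    synchronized boolean isReserved(long version, long nowMillis) {
        Long until = reservedUntil.get(version);
        return until != null && until > nowMillis;
    }

    public static void main(String[] args) {
        SingleCommandSketch master = new SingleCommandSketch();
        master.publish(6, Arrays.asList("_a.frq", "segments_6"));
        Reply reply = master.filesIfNewer(5, 20000, 1000);
        System.out.println(reply.version + " " + reply.files);
        System.out.println(master.isReserved(6, 5000));
    }
}
```

Because the version check and the reservation happen under the same lock, no commit can be published and cleaned up between the slave learning the version and the files being reserved, which is the gap that separate indexversion and filelist requests leave open.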
[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
[ https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759247#action_12759247 ]

Artem Russakovskii commented on SOLR-1458:

Re: hit the master with the filelist command.

First of all, this may have been a really late night for the person who wrote the wiki:

"Get list of lucene files present in the index: http://host:port/solr/replication?command=filelist&indexversion=<index-version-number> . The version number can be obtained using the indexversion calmmand"

The last word there ;-]

Now, I hit the master with the following: MASTER/replication/?command=indexversion, and get back

{code}
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst><long name="indexversion">1253136035158</long><long name="generation">4447</long>
</response>
{code}

Then I use this in the following query: MASTER/replication/?command=filelist&indexversion=1253136035158 but I get back

{code}
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst><str name="status">invalid indexversion</str>
</response>
{code}

I tried the same against an instance that doesn't have the NullPointerException replication problem and still get the same error. Suggestions?

Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

Key: SOLR-1458
URL: https://issues.apache.org/jira/browse/SOLR-1458
Project: Solr
Issue Type: Bug
Components: replication (java)
Affects Versions: 1.4
Environment: CentOS x64; 8GB RAM; Tomcat, running with 7G max memory (memory usage is 2GB, so it's not the problem); Host a: master; Host b: slave; multiple single core Solr instances, using JNDI; Java replication
Reporter: Artem Russakovskii
Assignee: Noble Paul
Fix For: 1.4
Attachments: SOLR-1458.patch, SOLR-1458.patch
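For anyone repeating the two requests from Artem's comment, here is a small sketch of how the indexversion reply feeds the filelist query string. It only shows the plumbing: the host name in `main` is a placeholder, the class and method names are invented, and nothing here explains or avoids the "invalid indexversion" reply he reports.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: extracts the version from an indexversion reply and builds the filelist URL.
final class FilelistUrlSketch {

    /** Sample reply, copied from the indexversion response quoted in the comment. */
    static final String SAMPLE = "<response><lst name=\"responseHeader\"><int name=\"status\">0</int>"
            + "<int name=\"QTime\">0</int></lst><long name=\"indexversion\">1253136035158</long>"
            + "<long name=\"generation\">4447</long></response>";

    private static final Pattern INDEX_VERSION =
            Pattern.compile("<long name=\"indexversion\">(\\d+)</long>");

    /** Pulls the indexversion value out of the reply; a regex is enough for this fixed shape. */
    static long parseIndexVersion(String xml) {
        Matcher m = INDEX_VERSION.matcher(xml);
        if (!m.find()) {
            throw new IllegalArgumentException("no indexversion element in response");
        }
        return Long.parseLong(m.group(1));
    }

    /** Joins the two query parameters with '&', as in the wiki's example URL. */
    static String filelistUrl(String baseUrl, long indexVersion) {
        return baseUrl + "/replication?command=filelist&indexversion=" + indexVersion;
    }

    public static void main(String[] args) {
        long version = parseIndexVersion(SAMPLE);
        // "http://master:8983/solr" is a placeholder base URL, not taken from the thread.
        System.out.println(filelistUrl("http://master:8983/solr", version));
    }
}
```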
[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
[ https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759290#action_12759290 ]

Artem Russakovskii commented on SOLR-1458:
--

Also, after an optimize is issued to the master, the problem goes away for a while and then starts again. I run an optimize every hour, and commits are ongoing every minute. For instance, after an optimize on the master:

{code}
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst>
<arr name="filelist">
<lst><str name="name">_1bx2.frq</str><long name="lastmodified">1253826921000</long><long name="size">67855561</long></lst>
<lst><str name="name">_1bx2.nrm</str><long name="lastmodified">1253826921000</long><long name="size">2515184</long></lst>
<lst><str name="name">_1bx2.tii</str><long name="lastmodified">1253826921000</long><long name="size">581824</long></lst>
<lst><str name="name">_1bx2.fnm</str><long name="lastmodified">1253826906000</long><long name="size">132</long></lst>
<lst><str name="name">_1bx2.fdt</str><long name="lastmodified">1253826906000</long><long name="size">7805294</long></lst>
<lst><str name="name">_1bx2.tis</str><long name="lastmodified">1253826921000</long><long name="size">43326001</long></lst>
<lst><str name="name">_1bx2.fdx</str><long name="lastmodified">1253826906000</long><long name="size">4024292</long></lst>
<lst><str name="name">_1bx2.prx</str><long name="lastmodified">1253826921000</long><long name="size">47213429</long></lst>
<lst><str name="name">segments_19jy</str><long name="lastmodified">1253826922000</long><long name="size">287</long></lst>
</arr>
</response>
{code}

Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
--
Key: SOLR-1458
URL: https://issues.apache.org/jira/browse/SOLR-1458
Project: Solr
Issue Type: Bug
Components: replication (java)
Affects Versions: 1.4
Environment: CentOS x64 8GB RAM
  Tomcat, running with 7G max memory; memory usage is 2GB, so it's not the problem
  Host a: master
  Host b: slave
  Multiple single core Solr instances, using JNDI.
  Java replication
Reporter: Artem Russakovskii
Assignee: Noble Paul
Fix For: 1.4
Attachments: SOLR-1458.patch, SOLR-1458.patch
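As a rough illustration of reading a `filelist` response like the one quoted in Artem's comment, the following Python sketch parses a shortened copy of it and totals the file sizes. The angle brackets and quotes in the sample are an assumed reconstruction of Solr's XML response format, and the helper name `parse_filelist` is made up for this example; it is not part of Solr.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the quoted response (three of the nine entries).
# The markup is an assumed reconstruction of Solr's XML response format.
SAMPLE = """<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst>
<arr name="filelist">
<lst><str name="name">_1bx2.frq</str><long name="lastmodified">1253826921000</long><long name="size">67855561</long></lst>
<lst><str name="name">_1bx2.fnm</str><long name="lastmodified">1253826906000</long><long name="size">132</long></lst>
<lst><str name="name">segments_19jy</str><long name="lastmodified">1253826922000</long><long name="size">287</long></lst>
</arr>
</response>"""


def parse_filelist(xml_text):
    """Return (name, size) pairs; an empty list if there is no filelist array."""
    root = ET.fromstring(xml_text)
    files = []
    for entry in root.findall("./arr[@name='filelist']/lst"):
        name = entry.find("./str[@name='name']").text
        size = int(entry.find("./long[@name='size']").text)
        files.append((name, size))
    return files


files = parse_filelist(SAMPLE)
total_bytes = sum(size for _, size in files)
print(len(files), "files,", total_bytes, "bytes")
```

A response that carries no `filelist` array yields an empty list here, which makes the "no files available" case easy to check for explicitly.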
[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
[ https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759361#action_12759361 ]

Noble Paul commented on SOLR-1458:
--

Thanks, Artem. You have been quite accommodating w/ my requests. I am already looking into it.

Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
--
Key: SOLR-1458
URL: https://issues.apache.org/jira/browse/SOLR-1458
Project: Solr
Issue Type: Bug
Components: replication (java)
Affects Versions: 1.4
Environment: CentOS x64 8GB RAM
  Tomcat, running with 7G max memory; memory usage is 2GB, so it's not the problem
  Host a: master
  Host b: slave
  Multiple single core Solr instances, using JNDI.
  Java replication
Reporter: Artem Russakovskii
Assignee: Noble Paul
Fix For: 1.4
Attachments: SOLR-1458.patch, SOLR-1458.patch

After finally figuring out the new Java based replication, we have started both the slave and the master and issued optimize to all master Solr instances. This triggered some replication to go through just fine, but it looks like some of it is failing.
Here's what I'm getting in the slave logs, repeatedly for each shard:

{code}
SEVERE: SnapPull failed
java.lang.NullPointerException
    at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:271)
    at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:258)
    at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
{code}

If I issue an optimize again on the master to one of the shards, it then triggers a replication and replicates OK. I have a feeling that these SnapPull failures appear later on but right now I don't have enough to form a pattern.

Here's replication.properties on one of the failed slave instances.

{code}
cat data/replication.properties
#Replication details
#Wed Sep 23 19:35:30 PDT 2009
replicationFailedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
previousCycleTimeInSeconds=0
timesFailed=113
indexReplicatedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
indexReplicatedAt=1253759730020
replicationFailedAt=1253759730020
lastCycleBytesDownloaded=0
timesIndexReplicated=113
{code}

and another

{code}
cat data/replication.properties
#Replication details
#Wed Sep 23 18:42:01 PDT 2009
replicationFailedAtList=1253756490034,1253756460169
previousCycleTimeInSeconds=1
timesFailed=2
indexReplicatedAtList=1253756521284,1253756490034,1253756460169
indexReplicatedAt=1253756521284
replicationFailedAt=1253756490034
lastCycleBytesDownloaded=22932293
timesIndexReplicated=3
{code}

Some relevant configs:

In solrconfig.xml:

{code}
<!-- For docs see http://wiki.apache.org/solr/SolrReplication -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">optimize</str>
    <str name="backupAfter">optimize</str>
    <str name="commitReserveDuration">00:00:20</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <!-- url of master, from properties file -->
    <str name="masterUrl">${master.url}</str>
    <!-- how often to check master -->
    <str name="pollInterval">00:00:30</str>
  </lst>
</requestHandler>
{code}

The slave then has this in solrcore.properties:

{code}
enable.slave=true
master.url=URLOFMASTER/replication
{code}

and the master has

{code}
enable.master=true
{code}

I'd be glad to provide more details but I'm not sure what else I can do. SOLR-926 may be relevant. Thanks.
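To make the first `replication.properties` listing in the description easier to read, here is a minimal Python sketch that parses it and converts the millisecond timestamps. The two lists are shortened to their first three entries, and the helper names `parse_properties` and `to_utc` are invented for this example. It suggests two things worth noting: the failure list and the "replicated" list hold the same timestamps, and attempts are about 30 seconds apart, which matches the configured `pollInterval` of 00:00:30.

```python
from datetime import datetime, timezone

# First listing from the description; both lists are shortened to three
# entries for readability.
SAMPLE = """#Replication details
#Wed Sep 23 19:35:30 PDT 2009
replicationFailedAtList=1253759730020,1253759700018,1253759670019
previousCycleTimeInSeconds=0
timesFailed=113
indexReplicatedAtList=1253759730020,1253759700018,1253759670019
indexReplicatedAt=1253759730020
replicationFailedAt=1253759730020
lastCycleBytesDownloaded=0
timesIndexReplicated=113
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key] = value
    return props


def to_utc(millis):
    """Convert epoch milliseconds to a UTC datetime (whole seconds only)."""
    return datetime.fromtimestamp(int(millis) // 1000, tz=timezone.utc)


props = parse_properties(SAMPLE)
failed = props["replicationFailedAtList"].split(",")
replicated = props["indexReplicatedAtList"].split(",")
# Gap between consecutive attempts, newest first, in milliseconds.
gaps_ms = [int(a) - int(b) for a, b in zip(failed, failed[1:])]

print("last failure (UTC):", to_utc(props["replicationFailedAt"]).isoformat())
print("every recorded attempt failed:", failed == replicated)
print("ms between attempts:", gaps_ms)
```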
[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
[ https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759366#action_12759366 ]

Noble Paul commented on SOLR-1458:
--

Diagnosis: commits keep happening, and the optimized commits were being removed by Lucene because normal commits came after them.

Immediate fix (probable): you may add the following snippet to your solrconfig.xml

{code:xml}
<mainIndex>
  <deletionPolicy class="solr.SolrDeletionPolicy">
    <str name="keepOptimizedOnly">true</str>
    <str name="maxCommitsToKeep">3</str>
  </deletionPolicy>
</mainIndex>
{code}

Real fix: if only replicateAfter optimize is present, ReplicationHandler should ensure that the optimized commit does not get deleted (even after normal commits happen), either by reserving it or by changing the policy.

Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
--
Key: SOLR-1458
URL: https://issues.apache.org/jira/browse/SOLR-1458
Project: Solr
Issue Type: Bug
Components: replication (java)
Affects Versions: 1.4
Environment: CentOS x64 8GB RAM
  Tomcat, running with 7G max memory; memory usage is 2GB, so it's not the problem
  Host a: master
  Host b: slave
  Multiple single core Solr instances, using JNDI.
  Java replication
Reporter: Artem Russakovskii
Assignee: Noble Paul
Fix For: 1.4
Attachments: SOLR-1458.patch, SOLR-1458.patch
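The diagnosis in this comment can be pictured with a toy model. The Python sketch below is not Solr's `SolrDeletionPolicy` code; it is a simplified retention rule written for illustration, and the default of keeping a single commit is an assumption. It shows how an optimized commit followed by normal commits can disappear, and how the suggested `keepOptimizedOnly` / `maxCommitsToKeep` settings would plausibly keep it around.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    version: int
    optimized: bool


def surviving(commits, max_commits_to_keep=1, keep_optimized_only=False):
    """Toy retention rule (an assumption, not Solr's implementation).

    Walk from newest to oldest: always keep the newest commit, then keep
    older commits until the limit is reached, skipping non-optimized ones
    when keep_optimized_only is set.
    """
    kept = []
    for position, commit in enumerate(reversed(commits)):
        if position == 0:
            kept.append(commit)
            continue
        if keep_optimized_only and not commit.optimized:
            continue
        if len(kept) < max_commits_to_keep:
            kept.append(commit)
    return list(reversed(kept))


# One optimize (version 1) followed by two ordinary commits.
history = [Commit(1, True), Commit(2, False), Commit(3, False)]

default_versions = [c.version for c in surviving(history)]
patched_versions = [c.version for c in surviving(history, 3, True)]
print("default policy keeps:", default_versions)
print("suggested settings keep:", patched_versions)
```

Under the toy default, the optimized commit that a slave configured with `replicateAfter` optimize would ask for is already gone; with the suggested settings it survives alongside the latest commit.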
[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
[ https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759392#action_12759392 ]

Artem Russakovskii commented on SOLR-1458:
--

Paul - I'm just glad you guys are so fast to respond and eager to fix. Love OSS :-]

Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
--
Key: SOLR-1458
URL: https://issues.apache.org/jira/browse/SOLR-1458
Project: Solr
Issue Type: Bug
Components: replication (java)
Affects Versions: 1.4
Environment: CentOS x64 8GB RAM
  Tomcat, running with 7G max memory; memory usage is 2GB, so it's not the problem
  Host a: master
  Host b: slave
  Multiple single core Solr instances, using JNDI.
  Java replication
Reporter: Artem Russakovskii
Assignee: Noble Paul
Fix For: 1.4
Attachments: SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch
[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
[ https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758998#action_12758998 ]

Noble Paul commented on SOLR-1458:
--

Can you hit the master with the filelist command and see the output? http://wiki.apache.org/solr/SolrReplication#line-155

Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
--
Key: SOLR-1458
URL: https://issues.apache.org/jira/browse/SOLR-1458
Project: Solr
Issue Type: Bug
Components: replication (java)
Affects Versions: 1.4
Environment: CentOS x64 8GB RAM
  Tomcat, running with 7G max memory; memory usage is 2GB, so it's not the problem
  Host a: master
  Host b: slave
  Multiple single core Solr instances, using JNDI.
  Java replication
Reporter: Artem Russakovskii
Assignee: Noble Paul
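For anyone following along, the request Noble Paul asks for can be built as in the sketch below. It only constructs the URLs and does not contact a server. The host name and the version number are placeholders, and the `indexversion` and `filelist` command names with the `indexversion` parameter follow the SolrReplication wiki page linked in the comment; treat the exact parameters as something to confirm against that page for your Solr version.

```python
from urllib.parse import urlencode

# Hypothetical master URL; substitute the value of master.url from
# solrcore.properties.
MASTER = "http://master.example.com:8983/solr/replication"


def replication_url(master_url, command, **params):
    """Build a ReplicationHandler request URL for the given command."""
    query = urlencode({"command": command, **params})
    return f"{master_url}?{query}"


# Step 1: ask the master which index version it would replicate.
version_url = replication_url(MASTER, "indexversion")

# Step 2: ask for the files of that version. 1234567890 is a placeholder,
# not a real version from this issue.
filelist_url = replication_url(MASTER, "filelist", indexversion=1234567890)

print(version_url)
print(filelist_url)
```

The two-step shape matters for this issue: the version returned in step 1 has to still exist on the master when step 2 arrives.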
[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
[ https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759001#action_12759001 ]

Yonik Seeley commented on SOLR-1458:

Why would there not be a filelist? Any idea what the underlying error is?

Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
--
Key: SOLR-1458
URL: https://issues.apache.org/jira/browse/SOLR-1458
Project: Solr
Issue Type: Bug
Components: replication (java)
Affects Versions: 1.4
Environment: CentOS x64 8GB RAM
  Tomcat, running with 7G max memory; memory usage is 2GB, so it's not the problem
  Host a: master
  Host b: slave
  Multiple single core Solr instances, using JNDI.
  Java replication
Reporter: Artem Russakovskii
Assignee: Noble Paul
Attachments: SOLR-1458.patch
[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
[ https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759003#action_12759003 ]

Noble Paul commented on SOLR-1458:
--

Isn't it possible that by the time filelist is invoked, the index commit for that version is gone? In that case no files would be available.

Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
--
Key: SOLR-1458
URL: https://issues.apache.org/jira/browse/SOLR-1458
Project: Solr
Issue Type: Bug
Components: replication (java)
Affects Versions: 1.4
Environment: CentOS x64 8GB RAM
  Tomcat, running with 7G max memory; memory usage is 2GB, so it's not the problem
  Host a: master
  Host b: slave
  Multiple single core Solr instances, using JNDI.
  Java replication
Reporter: Artem Russakovskii
Assignee: Noble Paul
Attachments: SOLR-1458.patch
[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
[ https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759004#action_12759004 ]

Noble Paul commented on SOLR-1458:
--

And we don't reserve the commit after an indexversion command. Should we reserve the commit point after an indexversion command?

Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
--
Key: SOLR-1458
URL: https://issues.apache.org/jira/browse/SOLR-1458
Project: Solr
Issue Type: Bug
Components: replication (java)
Affects Versions: 1.4
Environment: CentOS x64 8GB RAM
  Tomcat, running with 7G max memory; memory usage is 2GB, so it's not the problem
  Host a: master
  Host b: slave
  Multiple single core Solr instances, using JNDI.
  Java replication
Reporter: Artem Russakovskii
Assignee: Noble Paul
Attachments: SOLR-1458.patch
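The race raised in this comment, and the effect of reserving the commit point when `indexversion` is called, can be sketched with a small simulation. This is a toy model under stated assumptions, not the ReplicationHandler code: the class `ToyMaster`, its methods and the file names are invented, time is passed in as plain numbers, and old commits are dropped only when a new commit arrives. The 20-second reservation mirrors the `commitReserveDuration` of 00:00:20 from the reporter's config.

```python
class ToyMaster:
    """Invented stand-in for a master that may reserve commit points."""

    def __init__(self, reserve_seconds):
        self.reserve_seconds = reserve_seconds
        self.commits = {}         # version -> list of file names
        self.reserved_until = {}  # version -> time until which it is protected

    def commit(self, version, files, now):
        # A new commit drops every older commit that is not reserved.
        for old in list(self.commits):
            if self.reserved_until.get(old, 0) <= now:
                del self.commits[old]
        self.commits[version] = files

    def indexversion(self, now, reserve):
        latest = max(self.commits)
        if reserve:
            self.reserved_until[latest] = now + self.reserve_seconds
        return latest

    def filelist(self, version):
        # None models "no files available" for a vanished commit point.
        return self.commits.get(version)


def run(reserve):
    master = ToyMaster(reserve_seconds=20)
    master.commit(1, ["segments_1", "_0.frq"], now=0)
    version = master.indexversion(now=1, reserve=reserve)  # slave polls
    master.commit(2, ["segments_2", "_1.frq"], now=2)      # commit sneaks in
    return master.filelist(version)                        # slave asks for files


without_reserve = run(reserve=False)
with_reserve = run(reserve=True)
print("without reservation:", without_reserve)
print("with reservation:", with_reserve)
```

Without the reservation the slave gets nothing back for the version it was just told about, which is the kind of empty result that could lead to the NullPointerException in the report; with it, the files are still there when the slave asks.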