[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

2009-10-06 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12762524#action_12762524
 ] 

Noble Paul commented on SOLR-1458:
--

I plan to commit this shortly.
Please comment if there are any concerns.

 Java Replication error: NullPointerException SEVERE: SnapPull failed on 
 2009-09-22 nightly
 --

 Key: SOLR-1458
 URL: https://issues.apache.org/jira/browse/SOLR-1458
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
Affects Versions: 1.4
 Environment: CentOS x64
 8GB RAM
 Tomcat, running with 7G max memory; memory usage is 2GB, so it's not the 
 problem
 Host a: master
 Host b: slave
 Multiple single core Solr instances, using JNDI.
 Java replication
Reporter: Artem Russakovskii
Assignee: Yonik Seeley
 Fix For: 1.4

 Attachments: reserve.patch, SOLR-1458.patch, SOLR-1458.patch, 
 SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, 
 SolrDeletionPolicy.patch, SolrDeletionPolicy.patch


 After finally figuring out the new Java based replication, we have started 
 both the slave and the master and issued optimize to all master Solr 
 instances. This triggered some replication to go through just fine, but it 
 looks like some of it is failing.
 Here's what I'm getting in the slave logs, repeatedly for each shard:
 {code} 
 SEVERE: SnapPull failed 
 java.lang.NullPointerException
     at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:271)
     at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:258)
     at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
     at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
     at java.lang.Thread.run(Thread.java:619)
 {code} 
 If I issue an optimize again on the master to one of the shards, it then 
 triggers a replication and replicates OK. I have a feeling that these 
 SnapPull failures appear later on but right now I don't have enough to form a 
 pattern.
 Here's replication.properties on one of the failed slave instances.
 {code}
 cat data/replication.properties 
 #Replication details
 #Wed Sep 23 19:35:30 PDT 2009
 replicationFailedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
 previousCycleTimeInSeconds=0
 timesFailed=113
 indexReplicatedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
 indexReplicatedAt=1253759730020
 replicationFailedAt=1253759730020
 lastCycleBytesDownloaded=0
 timesIndexReplicated=113
 {code}
 and another
 {code}
 cat data/replication.properties 
 #Replication details
 #Wed Sep 23 18:42:01 PDT 2009
 replicationFailedAtList=1253756490034,1253756460169
 previousCycleTimeInSeconds=1
 timesFailed=2
 indexReplicatedAtList=1253756521284,1253756490034,1253756460169
 indexReplicatedAt=1253756521284
 replicationFailedAt=1253756490034
 lastCycleBytesDownloaded=22932293
 timesIndexReplicated=3
 {code}
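
The entries in replicationFailedAtList look like epoch-millisecond timestamps listed newest first. The following is a minimal sketch, assuming that interpretation, for computing the gaps between consecutive failures. The class and method names are hypothetical and are not part of Solr.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr: inspects the spacing of the
// timestamps found in replication.properties.
class ReplicationFailureGaps {

    /** Parses a comma-separated list of epoch-millisecond timestamps. */
    static List<Long> parseTimestamps(String csv) {
        List<Long> out = new ArrayList<>();
        for (String part : csv.split(",")) {
            out.add(Long.parseLong(part.trim()));
        }
        return out;
    }

    /** Returns the gaps in milliseconds between consecutive entries (assumed newest first). */
    static List<Long> gapsMillis(List<Long> newestFirst) {
        List<Long> gaps = new ArrayList<>();
        for (int i = 0; i + 1 < newestFirst.size(); i++) {
            gaps.add(newestFirst.get(i) - newestFirst.get(i + 1));
        }
        return gaps;
    }
}
```

Applied to the first file, the gaps come out at roughly 30 seconds each, which matches the pollInterval of 00:00:30 in the slave configuration quoted in this report. That suggests one failure per poll.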
 Some relevant configs:
 In solrconfig.xml:
 {code}
 <!-- For docs see http://wiki.apache.org/solr/SolrReplication -->
 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="master">
     <str name="enable">${enable.master:false}</str>
     <str name="replicateAfter">optimize</str>
     <str name="backupAfter">optimize</str>
     <str name="commitReserveDuration">00:00:20</str>
   </lst>
   <lst name="slave">
     <str name="enable">${enable.slave:false}</str>
     <!-- url of master, from properties file -->
     <str name="masterUrl">${master.url}</str>
     <!-- how often to check master -->
     <str name="pollInterval">00:00:30</str>
   </lst>
 </requestHandler>
 {code}
 The slave then has this in solrcore.properties:
 {code}
 enable.slave=true
 master.url=URLOFMASTER/replication
 {code}
 and the master has
 {code}
 enable.master=true
 {code}
 I'd be glad to provide more details but I'm not sure 

[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

2009-10-05 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12762121#action_12762121
 ] 

Noble Paul commented on SOLR-1458:
--

bq. If you rely on the replication handler to reserve it, it won't work across a 
reboot, right?

That is currently a problem. If replicateAfter=startup is not specified, it 
does not work even for commits, so that is a known problem. Anyway, I am adding 
a method to reserve a commit point indefinitely (until it is unreserved).

IndexDeletionPolicyWrapper can provide that functionality, so the underlying 
implementations do not have to handle it. The same functionality can be used 
by the backup command as well.
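
The reserve-until-unreserved idea can be sketched with a simple reference count per commit point. This is only an illustration under stated assumptions: the class name, the method names, and the use of a long identifier for a commit point are hypothetical, and it does not show what IndexDeletionPolicyWrapper actually does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Solr code: tracks which commit points are
// currently reserved so a deletion policy could skip them.
class CommitReservations {
    // Reference count per commit point, keyed by an assumed numeric identifier.
    private final Map<Long, Integer> counts = new HashMap<>();

    /** Reserves the commit point until a matching release() call. */
    synchronized void reserve(long commitId) {
        counts.merge(commitId, 1, Integer::sum);
    }

    /** Drops one reservation; the commit point becomes deletable at zero. */
    synchronized void release(long commitId) {
        Integer n = counts.get(commitId);
        if (n == null) {
            return; // nothing was reserved for this commit point
        }
        if (n <= 1) {
            counts.remove(commitId);
        } else {
            counts.put(commitId, n - 1);
        }
    }

    /** True while at least one component still holds a reservation. */
    synchronized boolean isReserved(long commitId) {
        return counts.containsKey(commitId);
    }
}
```

Counting reservations rather than storing a single flag lets replication and the backup command hold the same commit point at the same time without one release freeing it for the other.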

[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

2009-10-05 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12762129#action_12762129
 ] 

Noble Paul commented on SOLR-1458:
--

We need to ensure that any component that uses a commit point reserves it until 
it is done, and unreserves it afterwards.

bq. If you rely on the replication handler to reserve it, it won't work across a 
reboot, right?

There is a problem with the current configuration. For example, set

replicateAfter=optimize

This means that if the master is restarted soon after an optimize, the slaves 
will not get the new commit point. Alternatively, set

replicateAfter=optimize
replicateAfter=startup

This means that replication will pick up the latest commit point after a 
master restart, which is not necessarily an optimized one.

So the solution should be: if replicateAfter=commit is absent, pick up 
the latest optimized commit point.
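
The proposed rule can be sketched as follows. This is a minimal illustration, assuming a commit point can be modelled by a generation number and an optimized flag. The class, its fields, and the method are hypothetical and do not reflect the actual patch.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of the selection rule described above; not Solr code.
class StartupCommitSelector {

    /** Simplified stand-in for a commit point. */
    static final class Commit {
        final long generation;
        final boolean optimized;

        Commit(long generation, boolean optimized) {
            this.generation = generation;
            this.optimized = optimized;
        }
    }

    /**
     * Picks the commit point to expose after a master restart: the latest
     * commit if replicateAfter contains "commit", otherwise the latest
     * optimized commit. Returns empty if no commit qualifies.
     */
    static Optional<Commit> select(List<Commit> commits, Set<String> replicateAfter) {
        boolean anyCommit = replicateAfter.contains("commit");
        Commit best = null;
        for (Commit c : commits) {
            if (!anyCommit && !c.optimized) {
                continue; // only optimized commit points are eligible
            }
            if (best == null || c.generation > best.generation) {
                best = c;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

With replicateAfter set to optimize and startup, a newer non-optimized commit is skipped in favour of the most recent optimized one.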





[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

2009-10-03 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12761916#action_12761916
 ] 

Yonik Seeley commented on SOLR-1458:


Seems like if you want to replicate only optimized indexes, we should recommend 
the user configure SolrDeletionPolicy to always keep an optimized commit point 
around.
If you rely on the replication handler to reserve it, it won't work across a 
reboot, right?

Although I see no reason for custom delete policies, I'm not really against 
adding support for that in the replication handler as long as people are 
confident the changes don't introduce any new bugs.
Regardless, I think the separate count for optimized commit points that I added 
to SolrDeletionPolicy should remain (especially since it fixed other bugs too).
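
To illustrate what a separate count for optimized commit points buys, here is a simplified retention model. It is a sketch under stated assumptions, not the SolrDeletionPolicy implementation: the class is hypothetical, the parameter names are chosen for illustration, and commits are reduced to an array of optimized flags ordered oldest first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical retention model; the real SolrDeletionPolicy may differ.
class RetentionSketch {

    /**
     * @param optimizedFlags one flag per commit point, oldest first
     * @param maxCommitsToKeep number of most recent commits always kept
     * @param maxOptimizedCommitsToKeep number of additional optimized commits kept
     * @return indices of the commit points this model would delete
     */
    static List<Integer> commitsToDelete(boolean[] optimizedFlags,
                                         int maxCommitsToKeep,
                                         int maxOptimizedCommitsToKeep) {
        int n = optimizedFlags.length;
        boolean[] keep = new boolean[n];
        int kept = 0;
        int keptOptimized = 0;
        // Walk from newest to oldest so the most recent commits win.
        for (int i = n - 1; i >= 0; i--) {
            if (kept < maxCommitsToKeep) {
                keep[i] = true;
                kept++;
            } else if (optimizedFlags[i] && keptOptimized < maxOptimizedCommitsToKeep) {
                keep[i] = true;
                keptOptimized++;
            }
        }
        List<Integer> toDelete = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (!keep[i]) {
                toDelete.add(i);
            }
        }
        return toDelete;
    }
}
```

In this model an older optimized commit survives later plain commits as long as the optimized count is at least one, which is the property a slave replicating only optimized indexes depends on.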

[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

2009-10-01 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12761288#action_12761288
 ] 

Yonik Seeley commented on SOLR-1458:


Since SolrDeletionPolicy already had the needed functionality, it seemed like 
the most straightforward fix.

bq. It removes an existing functionality (one can't set a custom deletion 
policy when replicateAfterCommit is set) 

True - I hadn't considered that.  Of course I was confused why we allowed 
custom deletion policies in the first place.
It's a dangerous place to mess around, and there haven't been any identifiable 
use cases, right?


[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

2009-09-29 Thread Artem Russakovskii (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12760799#action_12760799
 ] 

Artem Russakovskii commented on SOLR-1458:
--

Yonik, everything has been running for a day+ now and replication works as 
expected.

On a side note, I did think that the replication notes are not very clear on 
what replicateAfter on the master and pollInterval on the slave are for and 
what each does. I now understand what each is for, but I think they could be 
explained more clearly. Just a suggestion.
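
As a rough illustration of how the two settings divide the work, the sketch below models replicateAfter as the set of master-side events after which an index version is offered for replication, and pollInterval as an HH:mm:ss interval at which the slave checks the master. This is a hedged reading of the configuration quoted in this issue; the class and method names are hypothetical and not part of Solr.

```java
import java.util.Set;

// Hypothetical model of the two settings; not Solr code.
class ReplicationTimingSketch {

    /** Converts an HH:mm:ss value such as "00:00:30" to milliseconds. */
    static long parsePollIntervalMillis(String hhmmss) {
        String[] parts = hhmmss.trim().split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected HH:mm:ss but got: " + hhmmss);
        }
        long hours = Long.parseLong(parts[0]);
        long minutes = Long.parseLong(parts[1]);
        long seconds = Long.parseLong(parts[2]);
        return ((hours * 60 + minutes) * 60 + seconds) * 1000L;
    }

    /**
     * Master side: true if the given event ("commit", "optimize" or
     * "startup") is one of the configured replicateAfter values.
     */
    static boolean masterOffersAfter(Set<String> replicateAfter, String event) {
        return replicateAfter.contains(event);
    }
}
```

Under this reading, a master configured with only replicateAfter=optimize offers nothing new after a plain commit, and a slave with pollInterval=00:00:30 notices a newly offered version within about 30 seconds.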

[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

2009-09-28 Thread Artem Russakovskii (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12760325#action_12760325
 ] 

Artem Russakovskii commented on SOLR-1458:
--

Is the fix included in the latest nightly? 9/28/09 one.

 Java Replication error: NullPointerException SEVERE: SnapPull failed on 
 2009-09-22 nightly
 --

 Key: SOLR-1458
 URL: https://issues.apache.org/jira/browse/SOLR-1458
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
Affects Versions: 1.4
 Environment: CentOS x64
 8GB RAM
 Tomcat, running with 7G max memory; memory usage is 2GB, so it's not the 
 problem
 Host a: master
 Host b: slave
 Multiple single core Solr instances, using JNDI.
 Java replication
Reporter: Artem Russakovskii
Assignee: Noble Paul
 Fix For: 1.4

 Attachments: SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, 
 SOLR-1458.patch, SOLR-1458.patch, SolrDeletionPolicy.patch, 
 SolrDeletionPolicy.patch


 After finally figuring out the new Java based replication, we have started 
 both the slave and the master and issued optimize to all master Solr 
 instances. This triggered some replication to go through just fine, but it 
 looks like some of it is failing.
 Here's what I'm getting in the slave logs, repeatedly for each shard:
 {code} 
 SEVERE: SnapPull failed 
 java.lang.NullPointerException
 at 
 org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:271)
 at 
 org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:258)
 at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at 
 java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)
 {code} 
 If I issue an optimize again on the master to one of the shards, it then 
 triggers a replication and replicates OK. I have a feeling that these 
 SnapPull failures appear later on but right now I don't have enough to form a 
 pattern.
 Here's replication.properties on one of the failed slave instances.
 {code}
 cat data/replication.properties 
 #Replication details
 #Wed Sep 23 19:35:30 PDT 2009
 replicationFailedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
 previousCycleTimeInSeconds=0
 timesFailed=113
 indexReplicatedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
 indexReplicatedAt=1253759730020
 replicationFailedAt=1253759730020
 lastCycleBytesDownloaded=0
 timesIndexReplicated=113
 {code}
 and another
 {code}
 cat data/replication.properties 
 #Replication details
 #Wed Sep 23 18:42:01 PDT 2009
 replicationFailedAtList=1253756490034,1253756460169
 previousCycleTimeInSeconds=1
 timesFailed=2
 indexReplicatedAtList=1253756521284,1253756490034,1253756460169
 indexReplicatedAt=1253756521284
 replicationFailedAt=1253756490034
 lastCycleBytesDownloaded=22932293
 timesIndexReplicated=3
 {code}
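 One way to read the two replication.properties files above is to compare the
 attempt list with the failure list: in the first file they are identical, in
 the second one attempt has no matching failure. The following is a minimal
 sketch of that comparison, not part of Solr. The class name ReplicationStats
 and its methods are hypothetical, and it assumes the list values are
 comma-separated epoch-millisecond timestamps, as they appear above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper, not part of Solr: summarises a replication.properties file. */
final class ReplicationStats {

    /** Parses a comma-separated list of epoch-millisecond timestamps. */
    static List<Long> parseTimestamps(String csv) {
        List<Long> out = new ArrayList<>();
        if (csv == null || csv.trim().isEmpty()) {
            return out;
        }
        for (String part : csv.split(",")) {
            out.add(Long.parseLong(part.trim()));
        }
        return out;
    }

    /** Counts recorded replication attempts that do not also appear in the failure list. */
    static int successfulCycles(Properties props) {
        List<Long> replicated = parseTimestamps(props.getProperty("indexReplicatedAtList"));
        replicated.removeAll(parseTimestamps(props.getProperty("replicationFailedAtList")));
        return replicated.size();
    }

    public static void main(String[] args) throws IOException {
        // Values copied from the second file quoted above.
        String text = "replicationFailedAtList=1253756490034,1253756460169\n"
                + "indexReplicatedAtList=1253756521284,1253756490034,1253756460169\n";
        Properties props = new Properties();
        props.load(new StringReader(text));
        for (long ts : parseTimestamps(props.getProperty("replicationFailedAtList"))) {
            System.out.println("failed at " + Instant.ofEpochMilli(ts));
        }
        System.out.println("cycles without a recorded failure: " + successfulCycles(props));
    }
}
```

 Applied to the first file, the same comparison leaves no attempt without a
 recorded failure, which is consistent with its lastCycleBytesDownloaded=0.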
 Some relevant configs:
 In solrconfig.xml:
 {code}
 <!-- For docs see http://wiki.apache.org/solr/SolrReplication -->
 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="master">
     <str name="enable">${enable.master:false}</str>
     <str name="replicateAfter">optimize</str>
     <str name="backupAfter">optimize</str>
     <str name="commitReserveDuration">00:00:20</str>
   </lst>
   <lst name="slave">
     <str name="enable">${enable.slave:false}</str>
     <!-- url of master, from properties file -->
     <str name="masterUrl">${master.url}</str>
     <!-- how often to check master -->
     <str name="pollInterval">00:00:30</str>
   </lst>
 </requestHandler>
 {code}
 The slave then has this in solrcore.properties:
 {code}
 enable.slave=true
 master.url=URLOFMASTER/replication
 {code}
 and the master has
 {code}
 enable.master=true
 {code}
 I'd be glad to provide more details but I'm not sure 

[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

2009-09-28 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12760354#action_12760354
 ] 

Yonik Seeley commented on SOLR-1458:


bq. Is the fix included in the latest nightly? 9/28/09 one. 

Yep.

 Java Replication error: NullPointerException SEVERE: SnapPull failed on 
 2009-09-22 nightly
 --

 Key: SOLR-1458
 URL: https://issues.apache.org/jira/browse/SOLR-1458
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
Affects Versions: 1.4
 Environment: CentOS x64
 8GB RAM
 Tomcat, running with 7G max memory; memory usage is 2GB, so it's not the 
 problem
 Host a: master
 Host b: slave
 Multiple single core Solr instances, using JNDI.
 Java replication
Reporter: Artem Russakovskii
Assignee: Noble Paul
 Fix For: 1.4

 Attachments: SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, 
 SOLR-1458.patch, SOLR-1458.patch, SolrDeletionPolicy.patch, 
 SolrDeletionPolicy.patch


[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

2009-09-26 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759859#action_12759859
 ] 

Shalin Shekhar Mangar commented on SOLR-1458:
-

bq. I haven't changed any configs yet, and this probably doesn't come as a 
shock to you guys, but the master just ran out of space. Upon inspection, I 
found 30+ snapshot dirs sitting around in /data.

Artem, the Java replication does not make use of snapshot directories. They are 
generated if you have backupAfter in your configuration. That feature is only 
there for people who were using the script replication's snapshot directories 
for backup purposes. If you don't need it, just remove backupAfter.
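
To check how many snapshot directories have piled up before removing 
backupAfter, something along these lines could be run against the data 
directory. This is a minimal sketch, not part of Solr: the class name 
SnapshotDirs is hypothetical, and it assumes that backups show up as 
directories whose names start with "snapshot." directly under the data 
directory, which should be verified on the actual installation.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper, not part of Solr: lists snapshot directories in a data dir. */
final class SnapshotDirs {

    /** Returns the sorted names of sub-directories of dataDir that start with "snapshot.". */
    static List<String> list(Path dataDir) throws IOException {
        List<String> names = new ArrayList<>();
        // The glob is matched against each entry's file name.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dataDir, "snapshot.*")) {
            for (Path entry : entries) {
                if (Files.isDirectory(entry)) {
                    names.add(entry.getFileName().toString());
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to ./data; pass the real data directory as the first argument.
        Path dataDir = Paths.get(args.length > 0 ? args[0] : "data");
        List<String> found = list(dataDir);
        System.out.println(found.size() + " snapshot directories under " + dataDir);
        for (String name : found) {
            System.out.println("  " + name);
        }
    }
}
```

The sketch only reports what is there; it deliberately deletes nothing.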

[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

2009-09-26 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759898#action_12759898
 ] 

Yonik Seeley commented on SOLR-1458:


Strange stuff - the last error I just saw was a corrupted index exception from 
the spellchecker - couldn't load the segments_n file.
But the spellcheck building code is Lucene code - Solr's deletion policy should 
have no effect... weird.

[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

2009-09-26 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759900#action_12759900
 ] 

Shalin Shekhar Mangar commented on SOLR-1458:
-

bq. bq. I think this should be the deletion policy that keeps around the last 
optimized commit point if necessary.

Yonik, shouldn't ReplicationHandler be the one to reserve commit points? Also, 
if we go down this way (having SolrDeletionPolicy decide these things), would a 
custom deletion policy play nicely with Solr?
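
For readers unfamiliar with the idea of reserving a commit point, the sketch 
below models it in isolation: a commit generation is marked as in use until a 
deadline, and whoever deletes old commits is expected to skip it until then. 
This is a hypothetical illustration only, not Solr or Lucene code; the class 
CommitReservations and its methods are invented for the example and say 
nothing about where the real logic lives or should live.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model, not Solr or Lucene code: tracks commit points that must not be deleted yet. */
final class CommitReservations {

    // commit generation -> time (epoch millis) until which the commit is reserved
    private final Map<Long, Long> reservedUntil = new HashMap<>();

    /** Reserves a generation until nowMillis + durationMillis, never shortening an existing reservation. */
    synchronized void reserve(long generation, long nowMillis, long durationMillis) {
        long until = nowMillis + durationMillis;
        reservedUntil.merge(generation, until, Long::max);
    }

    /** Returns true while the reservation for this generation has not expired. */
    synchronized boolean isReserved(long generation, long nowMillis) {
        Long until = reservedUntil.get(generation);
        return until != null && nowMillis < until;
    }

    public static void main(String[] args) {
        CommitReservations reservations = new CommitReservations();
        long now = System.currentTimeMillis();
        // 20 seconds, mirroring the commitReserveDuration of 00:00:20 in the reported config.
        reservations.reserve(42L, now, 20_000L);
        System.out.println("reserved now: " + reservations.isReserved(42L, now));
        System.out.println("reserved after 20 s: " + reservations.isReserved(42L, now + 20_000L));
    }
}
```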

[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

2009-09-26 Thread Artem Russakovskii (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759927#action_12759927
 ] 

Artem Russakovskii commented on SOLR-1458:
--

Shalin, I took the backupAfter line directly from the SolrReplication wiki, 
which talks about Java based replication: 
http://wiki.apache.org/solr/SolrReplication. I realize now that the comment 
above that line says it's for backup only, but why is it there in the first 
place? It threw me off a bit.

[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

2009-09-26 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759952#action_12759952
 ] 

Yonik Seeley commented on SOLR-1458:


Testing update: I rebooted my Ubuntu box, did a clean Solr checkout, re-applied 
the patch, and got a much higher rate of test passes.  Looks like it was 
Gremlins.

Last night I set it up to build continuously in a loop and got about a 25% 
failure rate.  The problem is that I didn't have it copy out failed tests for 
inspection, so I don't know why they failed. It may be as simple as a loss of 
internet connectivity or DNS service, or Apache going down, etc. (yes, we have 
tests that rely on external networks, and that's a pain).

I'm re-running tests now, with a stop on a test failure so I can figure out if 
anything is actually related to this proposed patch!

[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

2009-09-26 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759957#action_12759957
 ] 

Yonik Seeley commented on SOLR-1458:


Update: OK... my failures in DirectUpdateHandlerTest turned out to be 
testExpungeDeletes().  I've committed simpler test code previously attached to 
SOLR-1275.
I was initially thrown off by seeing exceptions in building the spell check 
index... but the actual test failure was caused by testExpungeDeletes.

So, is there really a bug lurking in the spellchecker component? I'm at a loss 
as to how the old testExpungeDeletes code could trigger these exceptions (or if 
it did/does).  It's also possible that these spellcheck exceptions occurred 
spuriously before without causing the test to fail.



[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

2009-09-25 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759546#action_12759546
 ] 

Yonik Seeley commented on SOLR-1458:


Shouldn't there be some kind of option on the deletion policy... say 
keepLastOptimized?
Then the ReplicationHandler would only have to flip it on (if it weren't 
already on).  It doesn't seem like the ReplicationHandler should be the one to 
pick which commit points to reserve forever.
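
To make the suggestion concrete, here is a minimal sketch of how a deletion policy could honour a keepLastOptimized option. It is not Solr's SolrDeletionPolicy and does not use Lucene's IndexCommit API: the Commit class, the commitsToKeep method and the way maxCommitsToKeep is applied are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class KeepLastOptimizedSketch {

    /** Hypothetical stand-in for a commit point; not Lucene's IndexCommit. */
    public static final class Commit {
        public final long generation;
        public final boolean optimized;

        public Commit(long generation, boolean optimized) {
            this.generation = generation;
            this.optimized = optimized;
        }
    }

    /**
     * Given commits ordered oldest to newest, returns the ones to retain:
     * the newest maxCommitsToKeep commits, plus the most recent optimized
     * commit when keepLastOptimized is set and it would otherwise be deleted.
     */
    public static List<Commit> commitsToKeep(List<Commit> commits,
                                             int maxCommitsToKeep,
                                             boolean keepLastOptimized) {
        List<Commit> keep = new ArrayList<Commit>();
        int firstKept = Math.max(0, commits.size() - maxCommitsToKeep);

        // Find the newest optimized commit, if there is one.
        int lastOptimized = -1;
        for (int i = commits.size() - 1; i >= 0; i--) {
            if (commits.get(i).optimized) {
                lastOptimized = i;
                break;
            }
        }

        // Retain it explicitly only if the normal window would drop it.
        if (keepLastOptimized && lastOptimized >= 0 && lastOptimized < firstKept) {
            keep.add(commits.get(lastOptimized));
        }
        keep.addAll(commits.subList(firstKept, commits.size()));
        return keep;
    }
}
```

With this shape the ReplicationHandler would only need to switch the flag on; the choice of which commit points survive stays inside the policy.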

 Java Replication error: NullPointerException SEVERE: SnapPull failed on 
 2009-09-22 nightly
 --

 Key: SOLR-1458
 URL: https://issues.apache.org/jira/browse/SOLR-1458
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
Affects Versions: 1.4
 Environment: CentOS x64
 8GB RAM
 Tomcat, running with 7G max memory; memory usage is 2GB, so it's not the 
 problem
 Host a: master
 Host b: slave
 Multiple single core Solr instances, using JNDI.
 Java replication
Reporter: Artem Russakovskii
Assignee: Noble Paul
 Fix For: 1.4

 Attachments: SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch, 
 SOLR-1458.patch


 After finally figuring out the new Java based replication, we have started 
 both the slave and the master and issued optimize to all master Solr 
 instances. This triggered some replication to go through just fine, but it 
 looks like some of it is failing.
 Here's what I'm getting in the slave logs, repeatedly for each shard:
 {code} 
 SEVERE: SnapPull failed 
 java.lang.NullPointerException
 at 
 org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:271)
 at 
 org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:258)
 at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at 
 java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)
 {code} 
 If I issue an optimize again on the master to one of the shards, it then 
 triggers a replication and replicates OK. I have a feeling that these 
 SnapPull failures appear later on but right now I don't have enough to form a 
 pattern.
 Here's replication.properties on one of the failed slave instances.
 {code}
 cat data/replication.properties 
 #Replication details
 #Wed Sep 23 19:35:30 PDT 2009
 replicationFailedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
 previousCycleTimeInSeconds=0
 timesFailed=113
 indexReplicatedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
 indexReplicatedAt=1253759730020
 replicationFailedAt=1253759730020
 lastCycleBytesDownloaded=0
 timesIndexReplicated=113
 {code}
 and another
 {code}
 cat data/replication.properties 
 #Replication details
 #Wed Sep 23 18:42:01 PDT 2009
 replicationFailedAtList=1253756490034,1253756460169
 previousCycleTimeInSeconds=1
 timesFailed=2
 indexReplicatedAtList=1253756521284,1253756490034,1253756460169
 indexReplicatedAt=1253756521284
 replicationFailedAt=1253756490034
 lastCycleBytesDownloaded=22932293
 timesIndexReplicated=3
 {code}
 Some relevant configs:
 In solrconfig.xml:
 {code}
 <!-- For docs see http://wiki.apache.org/solr/SolrReplication -->
   <requestHandler name="/replication" class="solr.ReplicationHandler" >
 <lst name="master">
 <str name="enable">${enable.master:false}</str>
 <str name="replicateAfter">optimize</str>
 <str name="backupAfter">optimize</str>
 <str name="commitReserveDuration">00:00:20</str>
 </lst>
 <lst name="slave">
 <str name="enable">${enable.slave:false}</str>
 <!-- url of master, from properties file -->
 <str name="masterUrl">${master.url}</str>
 <!-- how often to check master -->
 <str name="pollInterval">00:00:30</str>
 </lst>
   </requestHandler>
 {code}
 The slave then has this in solrcore.properties:
 {code}
 enable.slave=true
 

[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

2009-09-25 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759671#action_12759671
 ] 

Yonik Seeley commented on SOLR-1458:


Looking into this more, I think this should be the deletion policy that keeps 
around the last optimized commit point if necessary.
Also, in checking out SolrDeletionPolicy again, it doesn't seem like the 
maxCommitsToKeep logic will work if keepOptimizedOnly is true.
I'm going to take a whack at rewriting updateCommits()


[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

2009-09-25 Thread Artem Russakovskii (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759784#action_12759784
 ] 

Artem Russakovskii commented on SOLR-1458:
--

I haven't changed any configs yet, and this probably doesn't come as a shock to 
you guys, but the master just ran out of space. Upon inspection, I found 30+ 
snapshot dirs sitting around in /data.


[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

2009-09-25 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759825#action_12759825
 ] 

Yonik Seeley commented on SOLR-1458:


Hmmm... just happened onto this bit of odd code:

{code}
  void refreshCommitpoint() {
    IndexCommit commitPoint = core.getDeletionPolicy().getLatestCommit();
    if(replicateOnCommit && !commitPoint.isOptimized()){
      indexCommitPoint = commitPoint;
    }
    if(replicateOnOptimize && commitPoint.isOptimized()){
      indexCommitPoint = commitPoint;
    }
  }
{code}

Looks like a bug... refreshCommitPoint always updates indexCommitPoint 
regardless of commitPoint.
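
To see which flag combinations actually reassign indexCommitPoint, the two branches can be folded into one boolean expression and enumerated. This is only a sketch: it assumes the operator in both conditions of the quoted code is `&&`, and the class and method names are made up for illustration rather than taken from ReplicationHandler.

```java
public class RefreshCommitpointSketch {

    /**
     * Mirrors the two if-branches of the quoted refreshCommitpoint() as a
     * single expression: true when indexCommitPoint would be reassigned.
     */
    public static boolean updatesIndexCommitPoint(boolean replicateOnCommit,
                                                  boolean replicateOnOptimize,
                                                  boolean commitIsOptimized) {
        return (replicateOnCommit && !commitIsOptimized)
            || (replicateOnOptimize && commitIsOptimized);
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        // Print the full truth table for the three inputs.
        for (boolean onCommit : values) {
            for (boolean onOptimize : values) {
                for (boolean optimized : values) {
                    System.out.printf(
                        "replicateOnCommit=%b replicateOnOptimize=%b optimized=%b -> update=%b%n",
                        onCommit, onOptimize, optimized,
                        updatesIndexCommitPoint(onCommit, onOptimize, optimized));
                }
            }
        }
    }
}
```

When both flags are set, every commit point is accepted whether or not it is optimized; the other rows show how each flag behaves on its own.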



[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

2009-09-25 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759851#action_12759851
 ] 

Lance Norskog commented on SOLR-1458:
-

I reported [SOLR-1383|https://issues.apache.org/jira/browse/SOLR-1383] a few 
weeks ago. It is one edge case of what you're all working on.  

Short version: running add 1 document/commit/replicate continuously is a 
reliable way to cause the deletion policy to misfire.

Try the [detailed test 
scenario|https://issues.apache.org/jira/browse/SOLR-1383?focusedCommentId=12749190&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12749190].



[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

2009-09-24 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759179#action_12759179
 ] 

Yonik Seeley commented on SOLR-1458:


The best way to eliminate some of these race conditions would seem to be to 
combine the requests into a single command:
"What is your current index version?  If greater than 5, please give me the 
list of new files since then and please reserve them for x milliseconds."

But at this point (close to 1.4) I guess your patch is the most straightforward 
fix.
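
As a rough illustration of the single-command idea, the sketch below answers the version check, the file list and the reservation in one synchronized call, so no commit point can be deleted between separate requests. Nothing here is Solr's API: the class, its methods and the in-memory bookkeeping are hypothetical, and the clock is passed in explicitly to keep the example deterministic.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class SingleCommandSketch {
    // Index version -> names of the files that make up that commit.
    private final TreeMap<Long, Set<String>> filesByVersion = new TreeMap<Long, Set<String>>();
    // Index version -> time (ms) until which the commit must not be deleted.
    private final Map<Long, Long> reservedUntilMillis = new HashMap<Long, Long>();

    public synchronized void addCommit(long version, Set<String> files) {
        filesByVersion.put(version, new TreeSet<String>(files));
    }

    /**
     * One atomic request: if the master is ahead of slaveVersion, reserve the
     * latest commit for reserveMillis and return the files the slave lacks.
     * Returns an empty set when the slave is already up to date.
     */
    public synchronized Set<String> newFilesSince(long slaveVersion,
                                                  long reserveMillis,
                                                  long nowMillis) {
        if (filesByVersion.isEmpty() || filesByVersion.lastKey() <= slaveVersion) {
            return Collections.emptySet();
        }
        Map.Entry<Long, Set<String>> latest = filesByVersion.lastEntry();
        Set<String> result = new TreeSet<String>(latest.getValue());
        Set<String> alreadyOnSlave = filesByVersion.get(slaveVersion);
        if (alreadyOnSlave != null) {
            result.removeAll(alreadyOnSlave);
        }
        reservedUntilMillis.put(latest.getKey(), nowMillis + reserveMillis);
        return result;
    }

    public synchronized boolean isReserved(long version, long nowMillis) {
        Long until = reservedUntilMillis.get(version);
        return until != null && until > nowMillis;
    }
}
```

A deletion policy would consult isReserved before removing a commit point; that part is left out here.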


[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

2009-09-24 Thread Artem Russakovskii (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759247#action_12759247
 ] 

Artem Russakovskii commented on SOLR-1458:
--

Re: hit the master with the filelist command.

First of all, this may have been a really late night for the person who wrote 
the wiki:
Get list of lucene files present in the index: 
http://host:port/solr/replication?command=filelist&indexversion=<index-version-number>
 . The version number can be obtained using the indexversion calmmand
The last word there ;-]

Now, I hit the master with the following: 
MASTER/replication/?command=indexversion, get back
{code}
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst>
<long name="indexversion">1253136035158</long>
<long name="generation">4447</long>
</response>
{code}

Then I use this in the following query: 
MASTER/replication/?command=filelist&indexversion=1253136035158

but I get back
{code}
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst>
<str name="status">invalid indexversion</str>
</response>
{code}

I tried the same against an instance that doesn't have the NullPointerException 
replication problem and still get the same error.

Suggestions?
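
For reference, the two requests above can be sketched in Python. This is only a minimal sketch, assuming the wiki URL is meant to separate `command` and `indexversion` with `&`. The helper names `replication_url` and `parse_indexversion` and the base URL `http://master:8983/solr` are hypothetical, and the sample XML is a well-formed rendering of the indexversion response quoted above.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def replication_url(base, command, **params):
    """Build a replication handler URL; urlencode joins parameters with '&'."""
    query = urlencode({"command": command, **params})
    return f"{base}/replication?{query}"


def parse_indexversion(xml_text):
    """Return (indexversion, generation) from an indexversion response."""
    root = ET.fromstring(xml_text)
    values = {el.get("name"): int(el.text) for el in root.findall("long")}
    return values["indexversion"], values["generation"]


# Well-formed rendering of the response quoted in this comment.
SAMPLE = (
    '<response>'
    '<lst name="responseHeader"><int name="status">0</int>'
    '<int name="QTime">0</int></lst>'
    '<long name="indexversion">1253136035158</long>'
    '<long name="generation">4447</long>'
    '</response>'
)

version, generation = parse_indexversion(SAMPLE)
# The base URL is a placeholder, not a host from this issue.
filelist_url = replication_url("http://master:8983/solr", "filelist",
                               indexversion=version)
print(filelist_url)
```

Building the query string with `urlencode` avoids hand-typing the separator, which is easy to lose when copying a URL out of a wiki page.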

 Java Replication error: NullPointerException SEVERE: SnapPull failed on 
 2009-09-22 nightly
 --

 Key: SOLR-1458
 URL: https://issues.apache.org/jira/browse/SOLR-1458
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
Affects Versions: 1.4
 Environment: CentOS x64
 8GB RAM
 Tomcat, running with 7G max memory; memory usage is 2GB, so it's not the 
 problem
 Host a: master
 Host b: slave
 Multiple single core Solr instances, using JNDI.
 Java replication
Reporter: Artem Russakovskii
Assignee: Noble Paul
 Fix For: 1.4

 Attachments: SOLR-1458.patch, SOLR-1458.patch


 After finally figuring out the new Java based replication, we have started 
 both the slave and the master and issued optimize to all master Solr 
 instances. This triggered some replication to go through just fine, but it 
 looks like some of it is failing.
 Here's what I'm getting in the slave logs, repeatedly for each shard:
 {code} 
 SEVERE: SnapPull failed 
 java.lang.NullPointerException
 at 
 org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:271)
 at 
 org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:258)
 at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at 
 java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)
 {code} 
 If I issue an optimize again on the master to one of the shards, it then 
 triggers a replication and replicates OK. I have a feeling that these 
 SnapPull failures appear later on but right now I don't have enough to form a 
 pattern.
 Here's replication.properties on one of the failed slave instances.
 {code}
 cat data/replication.properties 
 #Replication details
 #Wed Sep 23 19:35:30 PDT 2009
 replicationFailedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
 previousCycleTimeInSeconds=0
 timesFailed=113
 indexReplicatedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
 indexReplicatedAt=1253759730020
 replicationFailedAt=1253759730020
 lastCycleBytesDownloaded=0
 timesIndexReplicated=113
 {code}
 and another
 {code}
 cat data/replication.properties 
 #Replication details
 #Wed Sep 23 18:42:01 PDT 2009
 replicationFailedAtList=1253756490034,1253756460169
 previousCycleTimeInSeconds=1
 timesFailed=2
 indexReplicatedAtList=1253756521284,1253756490034,1253756460169
 indexReplicatedAt=1253756521284
 replicationFailedAt=1253756490034
 lastCycleBytesDownloaded=22932293
 timesIndexReplicated=3
 {code}
 Some relevant 

[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

2009-09-24 Thread Artem Russakovskii (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759290#action_12759290
 ] 

Artem Russakovskii commented on SOLR-1458:
--

Also, after an optimize is issued to the master, the problem goes away for a 
while and then starts again. I run an optimize every hour, and commits are 
ongoing every minute.

For instance, after an optimize on the master:
{code}
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst>
<arr name="filelist">
<lst><str name="name">_1bx2.frq</str><long name="lastmodified">1253826921000</long><long name="size">67855561</long></lst>
<lst><str name="name">_1bx2.nrm</str><long name="lastmodified">1253826921000</long><long name="size">2515184</long></lst>
<lst><str name="name">_1bx2.tii</str><long name="lastmodified">1253826921000</long><long name="size">581824</long></lst>
<lst><str name="name">_1bx2.fnm</str><long name="lastmodified">1253826906000</long><long name="size">132</long></lst>
<lst><str name="name">_1bx2.fdt</str><long name="lastmodified">1253826906000</long><long name="size">7805294</long></lst>
<lst><str name="name">_1bx2.tis</str><long name="lastmodified">1253826921000</long><long name="size">43326001</long></lst>
<lst><str name="name">_1bx2.fdx</str><long name="lastmodified">1253826906000</long><long name="size">4024292</long></lst>
<lst><str name="name">_1bx2.prx</str><long name="lastmodified">1253826921000</long><long name="size">47213429</long></lst>
<lst><str name="name">segments_19jy</str><long name="lastmodified">1253826922000</long><long name="size">287</long></lst>
</arr>
</response>
{code}
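
A filelist response like the one above can be summarized with a short script. This is a minimal sketch, assuming the response is well-formed XML with one `lst` per file inside `arr name="filelist"`. The function name `parse_filelist` is hypothetical, and the sample contains only two of the entries listed above.

```python
import xml.etree.ElementTree as ET


def parse_filelist(xml_text):
    """Return a list of (name, size_in_bytes) tuples from a filelist response."""
    root = ET.fromstring(xml_text)
    files = []
    for entry in root.findall("./arr[@name='filelist']/lst"):
        name = entry.find("./str[@name='name']").text
        size = int(entry.find("./long[@name='size']").text)
        files.append((name, size))
    return files


# Two entries taken from the response above, rendered as well-formed XML.
SAMPLE = """<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst>
<arr name="filelist">
<lst><str name="name">_1bx2.fnm</str><long name="lastmodified">1253826906000</long><long name="size">132</long></lst>
<lst><str name="name">segments_19jy</str><long name="lastmodified">1253826922000</long><long name="size">287</long></lst>
</arr>
</response>"""

files = parse_filelist(SAMPLE)
for name, size in files:
    print(f"{name}\t{size}")
print("total bytes:", sum(size for _, size in files))
```

An empty result from such a parse would indicate that the master no longer has the requested commit, which is the case being investigated here.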

 Java Replication error: NullPointerException SEVERE: SnapPull failed on 
 2009-09-22 nightly
 --

 Key: SOLR-1458
 URL: https://issues.apache.org/jira/browse/SOLR-1458
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
Affects Versions: 1.4
 Environment: CentOS x64
 8GB RAM
 Tomcat, running with 7G max memory; memory usage is 2GB, so it's not the 
 problem
 Host a: master
 Host b: slave
 Multiple single core Solr instances, using JNDI.
 Java replication
Reporter: Artem Russakovskii
Assignee: Noble Paul
 Fix For: 1.4

 Attachments: SOLR-1458.patch, SOLR-1458.patch


 After finally figuring out the new Java based replication, we have started 
 both the slave and the master and issued optimize to all master Solr 
 instances. This triggered some replication to go through just fine, but it 
 looks like some of it is failing.
 Here's what I'm getting in the slave logs, repeatedly for each shard:
 {code} 
 SEVERE: SnapPull failed 
 java.lang.NullPointerException
 at 
 org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:271)
 at 
 org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:258)
 at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at 
 java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)
 {code} 
 If I issue an optimize again on the master to one of the shards, it then 
 triggers a replication and replicates OK. I have a feeling that these 
 SnapPull failures appear later on but right now I don't have enough to form a 
 pattern.
 Here's replication.properties on one of the failed slave instances.
 {code}
 cat data/replication.properties 
 #Replication details
 #Wed Sep 23 19:35:30 PDT 2009
 replicationFailedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
 previousCycleTimeInSeconds=0
 timesFailed=113
 indexReplicatedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
 indexReplicatedAt=1253759730020
 replicationFailedAt=1253759730020
 lastCycleBytesDownloaded=0
 timesIndexReplicated=113
 {code}
 and another
 {code}
 cat data/replication.properties 
 #Replication details
 #Wed Sep 23 18:42:01 PDT 2009
 replicationFailedAtList=1253756490034,1253756460169
 previousCycleTimeInSeconds=1
 timesFailed=2
 

[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

2009-09-24 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759361#action_12759361
 ] 

Noble Paul commented on SOLR-1458:
--

Thanks, Artem.
You have been quite accommodating w/ my requests. I am already looking into it.

 Java Replication error: NullPointerException SEVERE: SnapPull failed on 
 2009-09-22 nightly
 --

 Key: SOLR-1458
 URL: https://issues.apache.org/jira/browse/SOLR-1458
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
Affects Versions: 1.4
 Environment: CentOS x64
 8GB RAM
 Tomcat, running with 7G max memory; memory usage is 2GB, so it's not the 
 problem
 Host a: master
 Host b: slave
 Multiple single core Solr instances, using JNDI.
 Java replication
Reporter: Artem Russakovskii
Assignee: Noble Paul
 Fix For: 1.4

 Attachments: SOLR-1458.patch, SOLR-1458.patch


 After finally figuring out the new Java based replication, we have started 
 both the slave and the master and issued optimize to all master Solr 
 instances. This triggered some replication to go through just fine, but it 
 looks like some of it is failing.
 Here's what I'm getting in the slave logs, repeatedly for each shard:
 {code} 
 SEVERE: SnapPull failed 
 java.lang.NullPointerException
 at 
 org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:271)
 at 
 org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:258)
 at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at 
 java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)
 {code} 
 If I issue an optimize again on the master to one of the shards, it then 
 triggers a replication and replicates OK. I have a feeling that these 
 SnapPull failures appear later on but right now I don't have enough to form a 
 pattern.
 Here's replication.properties on one of the failed slave instances.
 {code}
 cat data/replication.properties 
 #Replication details
 #Wed Sep 23 19:35:30 PDT 2009
 replicationFailedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
 previousCycleTimeInSeconds=0
 timesFailed=113
 indexReplicatedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
 indexReplicatedAt=1253759730020
 replicationFailedAt=1253759730020
 lastCycleBytesDownloaded=0
 timesIndexReplicated=113
 {code}
 and another
 {code}
 cat data/replication.properties 
 #Replication details
 #Wed Sep 23 18:42:01 PDT 2009
 replicationFailedAtList=1253756490034,1253756460169
 previousCycleTimeInSeconds=1
 timesFailed=2
 indexReplicatedAtList=1253756521284,1253756490034,1253756460169
 indexReplicatedAt=1253756521284
 replicationFailedAt=1253756490034
 lastCycleBytesDownloaded=22932293
 timesIndexReplicated=3
 {code}
 Some relevant configs:
 In solrconfig.xml:
 {code}
 <!-- For docs see http://wiki.apache.org/solr/SolrReplication -->
 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="master">
     <str name="enable">${enable.master:false}</str>
     <str name="replicateAfter">optimize</str>
     <str name="backupAfter">optimize</str>
     <str name="commitReserveDuration">00:00:20</str>
   </lst>
   <lst name="slave">
     <str name="enable">${enable.slave:false}</str>
     <!-- url of master, from properties file -->
     <str name="masterUrl">${master.url}</str>
     <!-- how often to check master -->
     <str name="pollInterval">00:00:30</str>
   </lst>
 </requestHandler>
 {code}
 The slave then has this in solrcore.properties:
 {code}
 enable.slave=true
 master.url=URLOFMASTER/replication
 {code}
 and the master has
 {code}
 enable.master=true
 {code}
 I'd be glad to provide more details but I'm not sure what else I can do.  
 SOLR-926 may be relevant.
 Thanks.

-- 
This message is 

[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

2009-09-24 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759366#action_12759366
 ] 

Noble Paul commented on SOLR-1458:
--

Diagnosis: commits keep happening, and the optimized commits were getting 
removed by Lucene because normal commits followed them.

Immediate fix (probable): 

You may add the following snippet to your solrconfig.xml:

{code:xml}
<mainIndex>
  <deletionPolicy class="solr.SolrDeletionPolicy">
    <str name="keepOptimizedOnly">true</str>
    <str name="maxCommitsToKeep">3</str>
  </deletionPolicy>
</mainIndex>
{code}

Real fix:

If only replicateAfter optimize is present, ReplicationHandler should ensure 
that the optimized commit does not get deleted (even after normal commits 
happen), either by reserving it or by changing the policy.
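
The diagnosis can be illustrated with a toy model in Python. This is only a sketch of the idea and not Solr's or Lucene's actual deletion policy code. The `Commit` class, the `surviving_commits` function and its parameters are hypothetical names that loosely mirror the `keepOptimizedOnly` and `maxCommitsToKeep` settings, and the assumption that the newest commit is always retained is part of the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    version: int
    optimized: bool


def surviving_commits(commits, max_commits_to_keep=1, keep_optimized_only=False):
    """Return the commits a toy deletion policy would keep, oldest first."""
    if not commits:
        return []
    if keep_optimized_only:
        candidates = [c for c in commits if c.optimized]
    else:
        candidates = list(commits)
    # Keep at most the N most recent candidates (none when N is 0).
    kept = candidates[-max_commits_to_keep:] if max_commits_to_keep > 0 else []
    # Model assumption: the newest commit is always retained.
    newest = commits[-1]
    if newest not in kept:
        kept.append(newest)
    return kept


# One optimized commit followed by two normal commits.
history = [Commit(1, True), Commit(2, False), Commit(3, False)]

default_kept = surviving_commits(history)
workaround_kept = surviving_commits(history, max_commits_to_keep=3,
                                    keep_optimized_only=True)

print([c.version for c in default_kept])     # optimized commit 1 is missing
print([c.version for c in workaround_kept])  # optimized commit 1 is still there
```

In the first case a slave that was told to fetch the optimized commit finds nothing to download; in the second the optimized commit is still present when the slave asks for it.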



 Java Replication error: NullPointerException SEVERE: SnapPull failed on 
 2009-09-22 nightly
 --

 Key: SOLR-1458
 URL: https://issues.apache.org/jira/browse/SOLR-1458
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
Affects Versions: 1.4
 Environment: CentOS x64
 8GB RAM
 Tomcat, running with 7G max memory; memory usage is 2GB, so it's not the 
 problem
 Host a: master
 Host b: slave
 Multiple single core Solr instances, using JNDI.
 Java replication
Reporter: Artem Russakovskii
Assignee: Noble Paul
 Fix For: 1.4

 Attachments: SOLR-1458.patch, SOLR-1458.patch


 After finally figuring out the new Java based replication, we have started 
 both the slave and the master and issued optimize to all master Solr 
 instances. This triggered some replication to go through just fine, but it 
 looks like some of it is failing.
 Here's what I'm getting in the slave logs, repeatedly for each shard:
 {code} 
 SEVERE: SnapPull failed 
 java.lang.NullPointerException
 at 
 org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:271)
 at 
 org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:258)
 at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at 
 java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)
 {code} 
 If I issue an optimize again on the master to one of the shards, it then 
 triggers a replication and replicates OK. I have a feeling that these 
 SnapPull failures appear later on but right now I don't have enough to form a 
 pattern.
 Here's replication.properties on one of the failed slave instances.
 {code}
 cat data/replication.properties 
 #Replication details
 #Wed Sep 23 19:35:30 PDT 2009
 replicationFailedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
 previousCycleTimeInSeconds=0
 timesFailed=113
 indexReplicatedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
 indexReplicatedAt=1253759730020
 replicationFailedAt=1253759730020
 lastCycleBytesDownloaded=0
 timesIndexReplicated=113
 {code}
 and another
 {code}
 cat data/replication.properties 
 #Replication details
 #Wed Sep 23 18:42:01 PDT 2009
 replicationFailedAtList=1253756490034,1253756460169
 previousCycleTimeInSeconds=1
 timesFailed=2
 indexReplicatedAtList=1253756521284,1253756490034,1253756460169
 indexReplicatedAt=1253756521284
 replicationFailedAt=1253756490034
 lastCycleBytesDownloaded=22932293
 timesIndexReplicated=3
 {code}
 Some relevant configs:
 In solrconfig.xml:
 {code}
 <!-- For docs see http://wiki.apache.org/solr/SolrReplication -->
 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="master">
     <str name="enable">${enable.master:false}</str>
     <str name="replicateAfter">optimize</str>
     <str name="backupAfter">optimize</str>
     <str name="commitReserveDuration">00:00:20</str>
   </lst>
   <lst name="slave">
  

[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

2009-09-24 Thread Artem Russakovskii (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759392#action_12759392
 ] 

Artem Russakovskii commented on SOLR-1458:
--

Paul - I'm just glad you guys are so fast to respond and eager to fix. Love OSS 
:-]

 Java Replication error: NullPointerException SEVERE: SnapPull failed on 
 2009-09-22 nightly
 --

 Key: SOLR-1458
 URL: https://issues.apache.org/jira/browse/SOLR-1458
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
Affects Versions: 1.4
 Environment: CentOS x64
 8GB RAM
 Tomcat, running with 7G max memory; memory usage is 2GB, so it's not the 
 problem
 Host a: master
 Host b: slave
 Multiple single core Solr instances, using JNDI.
 Java replication
Reporter: Artem Russakovskii
Assignee: Noble Paul
 Fix For: 1.4

 Attachments: SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch


 After finally figuring out the new Java based replication, we have started 
 both the slave and the master and issued optimize to all master Solr 
 instances. This triggered some replication to go through just fine, but it 
 looks like some of it is failing.
 Here's what I'm getting in the slave logs, repeatedly for each shard:
 {code} 
 SEVERE: SnapPull failed 
 java.lang.NullPointerException
 at 
 org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:271)
 at 
 org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:258)
 at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at 
 java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)
 {code} 
 If I issue an optimize again on the master to one of the shards, it then 
 triggers a replication and replicates OK. I have a feeling that these 
 SnapPull failures appear later on but right now I don't have enough to form a 
 pattern.
 Here's replication.properties on one of the failed slave instances.
 {code}
 cat data/replication.properties 
 #Replication details
 #Wed Sep 23 19:35:30 PDT 2009
 replicationFailedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
 previousCycleTimeInSeconds=0
 timesFailed=113
 indexReplicatedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
 indexReplicatedAt=1253759730020
 replicationFailedAt=1253759730020
 lastCycleBytesDownloaded=0
 timesIndexReplicated=113
 {code}
 and another
 {code}
 cat data/replication.properties 
 #Replication details
 #Wed Sep 23 18:42:01 PDT 2009
 replicationFailedAtList=1253756490034,1253756460169
 previousCycleTimeInSeconds=1
 timesFailed=2
 indexReplicatedAtList=1253756521284,1253756490034,1253756460169
 indexReplicatedAt=1253756521284
 replicationFailedAt=1253756490034
 lastCycleBytesDownloaded=22932293
 timesIndexReplicated=3
 {code}
 Some relevant configs:
 In solrconfig.xml:
 {code}
 <!-- For docs see http://wiki.apache.org/solr/SolrReplication -->
 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="master">
     <str name="enable">${enable.master:false}</str>
     <str name="replicateAfter">optimize</str>
     <str name="backupAfter">optimize</str>
     <str name="commitReserveDuration">00:00:20</str>
   </lst>
   <lst name="slave">
     <str name="enable">${enable.slave:false}</str>
     <!-- url of master, from properties file -->
     <str name="masterUrl">${master.url}</str>
     <!-- how often to check master -->
     <str name="pollInterval">00:00:30</str>
   </lst>
 </requestHandler>
 {code}
 The slave then has this in solrcore.properties:
 {code}
 enable.slave=true
 master.url=URLOFMASTER/replication
 {code}
 and the master has
 {code}
 enable.master=true
 {code}
 I'd be glad to provide more details but I'm not sure what else I can do.  
 SOLR-926 may be relevant.
 Thanks.

-- 

[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

2009-09-23 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758998#action_12758998
 ] 

Noble Paul commented on SOLR-1458:
--

Can you hit the master with the filelist command and see the output?

http://wiki.apache.org/solr/SolrReplication#line-155

 Java Replication error: NullPointerException SEVERE: SnapPull failed on 
 2009-09-22 nightly
 --

 Key: SOLR-1458
 URL: https://issues.apache.org/jira/browse/SOLR-1458
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
Affects Versions: 1.4
 Environment: CentOS x64
 8GB RAM
 Tomcat, running with 7G max memory; memory usage is 2GB, so it's not the 
 problem
 Host a: master
 Host b: slave
 Multiple single core Solr instances, using JNDI.
 Java replication
Reporter: Artem Russakovskii
Assignee: Noble Paul

 After finally figuring out the new Java based replication, we have started 
 both the slave and the master and issued optimize to all master Solr 
 instances. This triggered some replication to go through just fine, but it 
 looks like some of it is failing.
 Here's what I'm getting in the slave logs, repeatedly for each shard:
 {code} 
 SEVERE: SnapPull failed 
 java.lang.NullPointerException
 at 
 org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:271)
 at 
 org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:258)
 at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at 
 java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)
 {code} 
 If I issue an optimize again on the master to one of the shards, it then 
 triggers a replication and replicates OK. I have a feeling that these 
 SnapPull failures appear later on but right now I don't have enough to form a 
 pattern.
 Here's replication.properties on one of the failed slave instances.
 {code}
 cat data/replication.properties 
 #Replication details
 #Wed Sep 23 19:35:30 PDT 2009
 replicationFailedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
 previousCycleTimeInSeconds=0
 timesFailed=113
 indexReplicatedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
 indexReplicatedAt=1253759730020
 replicationFailedAt=1253759730020
 lastCycleBytesDownloaded=0
 timesIndexReplicated=113
 {code}
 and another
 {code}
 cat data/replication.properties 
 #Replication details
 #Wed Sep 23 18:42:01 PDT 2009
 replicationFailedAtList=1253756490034,1253756460169
 previousCycleTimeInSeconds=1
 timesFailed=2
 indexReplicatedAtList=1253756521284,1253756490034,1253756460169
 indexReplicatedAt=1253756521284
 replicationFailedAt=1253756490034
 lastCycleBytesDownloaded=22932293
 timesIndexReplicated=3
 {code}
 Some relevant configs:
 In solrconfig.xml:
 {code}
 <!-- For docs see http://wiki.apache.org/solr/SolrReplication -->
 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="master">
     <str name="enable">${enable.master:false}</str>
     <str name="replicateAfter">optimize</str>
     <str name="backupAfter">optimize</str>
     <str name="commitReserveDuration">00:00:20</str>
   </lst>
   <lst name="slave">
     <str name="enable">${enable.slave:false}</str>
     <!-- url of master, from properties file -->
     <str name="masterUrl">${master.url}</str>
     <!-- how often to check master -->
     <str name="pollInterval">00:00:30</str>
   </lst>
 </requestHandler>
 {code}
 The slave then has this in solrcore.properties:
 {code}
 enable.slave=true
 master.url=URLOFMASTER/replication
 {code}
 and the master has
 {code}
 enable.master=true
 {code}
 I'd be glad to provide more details but I'm not sure what else I can do.  
 SOLR-926 may be relevant.
 Thanks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this 

[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

2009-09-23 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759001#action_12759001
 ] 

Yonik Seeley commented on SOLR-1458:


Why would there not be a filelist? Any idea what the underlying error is?

 Java Replication error: NullPointerException SEVERE: SnapPull failed on 
 2009-09-22 nightly
 --

 Key: SOLR-1458
 URL: https://issues.apache.org/jira/browse/SOLR-1458
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
Affects Versions: 1.4
 Environment: CentOS x64
 8GB RAM
 Tomcat, running with 7G max memory; memory usage is 2GB, so it's not the 
 problem
 Host a: master
 Host b: slave
 Multiple single core Solr instances, using JNDI.
 Java replication
Reporter: Artem Russakovskii
Assignee: Noble Paul
 Attachments: SOLR-1458.patch


 After finally figuring out the new Java based replication, we have started 
 both the slave and the master and issued optimize to all master Solr 
 instances. This triggered some replication to go through just fine, but it 
 looks like some of it is failing.
 Here's what I'm getting in the slave logs, repeatedly for each shard:
 {code} 
 SEVERE: SnapPull failed 
 java.lang.NullPointerException
 at 
 org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:271)
 at 
 org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:258)
 at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at 
 java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)
 {code} 
 If I issue an optimize again on the master to one of the shards, it then 
 triggers a replication and replicates OK. I have a feeling that these 
 SnapPull failures appear later on but right now I don't have enough to form a 
 pattern.
 Here's replication.properties on one of the failed slave instances.
 {code}
 cat data/replication.properties 
 #Replication details
 #Wed Sep 23 19:35:30 PDT 2009
 replicationFailedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
 previousCycleTimeInSeconds=0
 timesFailed=113
 indexReplicatedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
 indexReplicatedAt=1253759730020
 replicationFailedAt=1253759730020
 lastCycleBytesDownloaded=0
 timesIndexReplicated=113
 {code}
 and another
 {code}
 cat data/replication.properties 
 #Replication details
 #Wed Sep 23 18:42:01 PDT 2009
 replicationFailedAtList=1253756490034,1253756460169
 previousCycleTimeInSeconds=1
 timesFailed=2
 indexReplicatedAtList=1253756521284,1253756490034,1253756460169
 indexReplicatedAt=1253756521284
 replicationFailedAt=1253756490034
 lastCycleBytesDownloaded=22932293
 timesIndexReplicated=3
 {code}
 Some relevant configs:
 In solrconfig.xml:
 {code}
 <!-- For docs see http://wiki.apache.org/solr/SolrReplication -->
 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="master">
     <str name="enable">${enable.master:false}</str>
     <str name="replicateAfter">optimize</str>
     <str name="backupAfter">optimize</str>
     <str name="commitReserveDuration">00:00:20</str>
   </lst>
   <lst name="slave">
     <str name="enable">${enable.slave:false}</str>
     <!-- url of master, from properties file -->
     <str name="masterUrl">${master.url}</str>
     <!-- how often to check master -->
     <str name="pollInterval">00:00:30</str>
   </lst>
 </requestHandler>
 {code}
 The slave then has this in solrcore.properties:
 {code}
 enable.slave=true
 master.url=URLOFMASTER/replication
 {code}
 and the master has
 {code}
 enable.master=true
 {code}
 I'd be glad to provide more details but I'm not sure what else I can do.  
 SOLR-926 may be relevant.
 Thanks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to 

[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

2009-09-23 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759003#action_12759003
 ] 

Noble Paul commented on SOLR-1458:
--

Isn't it possible that by the time filelist is invoked, the index commit for 
that version is already gone? In that case no files would be available.
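
The suspected race can be sketched in isolation. This is a minimal Java model, not Solr's actual code: `CommitStore`, `latestVersion`, `fileList`, and `deleteCommit` are hypothetical names standing in for the master's indexversion and filelist commands and for its deletion policy, and the file names are purely illustrative. It shows how the file list lookup can come back null when the commit is deleted between the two calls, and how a null check on the slave side would turn the NullPointerException into a skipped cycle.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the master: commit version -> files in that commit.
final class CommitStore {
    private final Map<Long, List<String>> commits = new ConcurrentHashMap<>();
    private volatile long latest = -1L;

    void addCommit(long version, List<String> files) {
        commits.put(version, files);
        latest = version;
    }

    // Stands in for the deletion policy removing an old commit point.
    void deleteCommit(long version) {
        commits.remove(version);
    }

    // Stands in for the "indexversion" command.
    long latestVersion() {
        return latest;
    }

    // Stands in for the "filelist" command; returns null when the commit is gone.
    List<String> fileList(long version) {
        return commits.get(version);
    }
}

final class RaceSketch {
    // Slave-side guard: number of files to fetch, or -1 to skip this poll cycle.
    static int filesToFetch(CommitStore master, long version) {
        List<String> files = master.fileList(version);
        if (files == null) {
            return -1; // commit vanished between the two calls; retry on the next poll
        }
        return files.size();
    }

    public static void main(String[] args) {
        CommitStore master = new CommitStore();
        // Illustrative file names only.
        master.addCommit(1L, Arrays.asList("_0.fdt", "_0.fdx", "segments_1"));
        long seen = master.latestVersion();      // slave asks for the version
        master.addCommit(2L, Arrays.asList("_1.fdt", "segments_2"));
        master.deleteCommit(1L);                 // old commit removed before filelist runs
        System.out.println(filesToFetch(master, seen)); // prints -1
    }
}
```

Whether the real failure follows exactly this sequence is still an open question; the sketch only shows that the sequence is enough to produce a null file list.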

 Java Replication error: NullPointerException SEVERE: SnapPull failed on 
 2009-09-22 nightly
 --

 Key: SOLR-1458
 URL: https://issues.apache.org/jira/browse/SOLR-1458
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
Affects Versions: 1.4
 Environment: CentOS x64
 8GB RAM
 Tomcat, running with 7G max memory; memory usage is 2GB, so it's not the 
 problem
 Host a: master
 Host b: slave
 Multiple single core Solr instances, using JNDI.
 Java replication
Reporter: Artem Russakovskii
Assignee: Noble Paul
 Attachments: SOLR-1458.patch


 After finally figuring out the new Java based replication, we have started 
 both the slave and the master and issued optimize to all master Solr 
 instances. This triggered some replication to go through just fine, but it 
 looks like some of it is failing.
 Here's what I'm getting in the slave logs, repeatedly for each shard:
 {code} 
 SEVERE: SnapPull failed 
 java.lang.NullPointerException
 at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:271)
 at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:258)
 at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)
 {code} 
 If I issue an optimize again on the master to one of the shards, it then 
 triggers a replication and replicates OK. I have a feeling that these 
 SnapPull failures appear later on but right now I don't have enough to form a 
 pattern.
 Here's replication.properties on one of the failed slave instances.
 {code}
 cat data/replication.properties 
 #Replication details
 #Wed Sep 23 19:35:30 PDT 2009
 replicationFailedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
 previousCycleTimeInSeconds=0
 timesFailed=113
 indexReplicatedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
 indexReplicatedAt=1253759730020
 replicationFailedAt=1253759730020
 lastCycleBytesDownloaded=0
 timesIndexReplicated=113
 {code}
 and another
 {code}
 cat data/replication.properties 
 #Replication details
 #Wed Sep 23 18:42:01 PDT 2009
 replicationFailedAtList=1253756490034,1253756460169
 previousCycleTimeInSeconds=1
 timesFailed=2
 indexReplicatedAtList=1253756521284,1253756490034,1253756460169
 indexReplicatedAt=1253756521284
 replicationFailedAt=1253756490034
 lastCycleBytesDownloaded=22932293
 timesIndexReplicated=3
 {code}
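 The values in these files are epoch-millisecond timestamps (1253759730020 is Sep 23 19:35:30 PDT 2009, the date in the file's header comment). As a small illustration, the gaps between consecutive failures can be computed as follows; `FailureGaps` and `gapsMillis` are names made up for this example and are not part of Solr. For the first file the gaps come out at roughly 30 seconds, which matches the pollInterval of 00:00:30 in the config below.

```java
import java.util.ArrayList;
import java.util.List;

final class FailureGaps {
    // Parses a comma-separated list of epoch-millisecond timestamps (newest first)
    // and returns the gap in milliseconds between each consecutive pair.
    static List<Long> gapsMillis(String commaSeparated) {
        String[] parts = commaSeparated.split(",");
        List<Long> gaps = new ArrayList<>();
        for (int i = 0; i + 1 < parts.length; i++) {
            long newer = Long.parseLong(parts[i].trim());
            long older = Long.parseLong(parts[i + 1].trim());
            gaps.add(newer - older);
        }
        return gaps;
    }

    public static void main(String[] args) {
        // First three entries of replicationFailedAtList from the first file above.
        String failedAt = "1253759730020,1253759700018,1253759670019";
        System.out.println(gapsMillis(failedAt)); // prints [30002, 29999]
    }
}
```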
 Some relevant configs:
 In solrconfig.xml:
 {code}
 <!-- For docs see http://wiki.apache.org/solr/SolrReplication -->
 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="master">
     <str name="enable">${enable.master:false}</str>
     <str name="replicateAfter">optimize</str>
     <str name="backupAfter">optimize</str>
     <str name="commitReserveDuration">00:00:20</str>
   </lst>
   <lst name="slave">
     <str name="enable">${enable.slave:false}</str>
     <!-- url of master, from properties file -->
     <str name="masterUrl">${master.url}</str>
     <!-- how often to check master -->
     <str name="pollInterval">00:00:30</str>
   </lst>
 </requestHandler>
 {code}
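 For reference, the two HH:MM:SS values in this config can be compared numerically. A minimal sketch, assuming a hypothetical helper `toSeconds` that is not part of Solr: it shows that commitReserveDuration (20 seconds) is shorter than pollInterval (30 seconds). Whether that difference plays any part in this bug is not established here.

```java
final class DurationFields {
    // Converts an "HH:MM:SS" string, as used by pollInterval and
    // commitReserveDuration, into a number of seconds.
    static long toSeconds(String hhmmss) {
        String[] p = hhmmss.trim().split(":");
        if (p.length != 3) {
            throw new IllegalArgumentException("expected HH:MM:SS, got: " + hhmmss);
        }
        return Long.parseLong(p[0]) * 3600
             + Long.parseLong(p[1]) * 60
             + Long.parseLong(p[2]);
    }

    public static void main(String[] args) {
        long reserve = toSeconds("00:00:20"); // commitReserveDuration
        long poll = toSeconds("00:00:30");    // pollInterval
        System.out.println(reserve + " " + poll); // prints 20 30
    }
}
```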
 The slave then has this in solrcore.properties:
 {code}
 enable.slave=true
 master.url=URLOFMASTER/replication
 {code}
 and the master has
 {code}
 enable.master=true
 {code}
 I'd be glad to provide more details but I'm not sure what else I can do.  
 SOLR-926 may be relevant.
 Thanks.


[jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly

2009-09-23 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759004#action_12759004
 ] 

Noble Paul commented on SOLR-1458:
--

Also, we don't reserve the commit after an indexversion command. Should we 
reserve the commit point after an indexversion command?
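
One way to picture the reservation being proposed: a minimal sketch, assuming a hypothetical `ReservationTable` that the master would consult before deleting a commit. The class and method names and the version number are illustrative, and the explicit clock argument is only there to keep the example deterministic; this is not how Solr's deletion policy is actually implemented.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical table of commit versions that must not be deleted yet.
final class ReservationTable {
    private final Map<Long, Long> reservedUntil = new ConcurrentHashMap<>();

    // Reserves a commit until nowMillis + durationMillis.
    // An existing, longer reservation is never shortened.
    void reserve(long version, long nowMillis, long durationMillis) {
        reservedUntil.merge(version, nowMillis + durationMillis, Long::max);
    }

    // A deletion policy would skip any commit for which this returns true.
    boolean isReserved(long version, long nowMillis) {
        Long until = reservedUntil.get(version);
        return until != null && nowMillis < until;
    }
}

final class ReserveOnIndexVersion {
    // Hypothetical indexversion handler: report the version and reserve it in one step,
    // so a filelist call that follows shortly afterwards still finds the commit.
    static long indexVersion(ReservationTable table, long latestVersion,
                             long nowMillis, long reserveMillis) {
        table.reserve(latestVersion, nowMillis, reserveMillis);
        return latestVersion;
    }

    public static void main(String[] args) {
        ReservationTable table = new ReservationTable();
        long reserveMillis = 20_000L; // commitReserveDuration of 00:00:20 from the report
        long version = indexVersion(table, 42L, 0L, reserveMillis); // 42 is an arbitrary version
        System.out.println(table.isReserved(version, 10_000L)); // prints true
        System.out.println(table.isReserved(version, 25_000L)); // prints false
    }
}
```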

