RE: replication getting stuck on a file
I have seen this happen before in our 3.6.1 deployment. It seemed related to high JVM memory consumption on the server when our index got too big (i.e., we were close to getting OOMs). That is probably why restarting Solr sort of fixes it, assuming the file it is stuck on is the final file and it got 100% of it.

Thanks
Robi

-----Original Message-----
From: Rohit Harchandani [mailto:rhar...@gmail.com]
Sent: Thursday, August 01, 2013 1:55 PM
To: solr-user@lucene.apache.org
Subject: Re: replication getting stuck on a file

I am facing this problem in Solr 4.0 too. It's definitely not related to autowarming. It just gets stuck while downloading a file and there is no way to abort the replication except restarting Solr.
Re: replication getting stuck on a file
I am facing this problem in Solr 4.0 too. It's definitely not related to autowarming. It just gets stuck while downloading a file and there is no way to abort the replication except restarting Solr.

On Wed, Jul 10, 2013 at 6:10 PM, adityab wrote:
> I have seen this in 4.2.1 too.
> Once replication is finished, on Admin UI we see 100% and time and dlspeed
> information goes out of whack. Same is reflected in mbeans. But what's
> actually happening in the background is auto-warmup of caches (in my case).
> Maybe some minor stats bug.
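For what it is worth, the ReplicationHandler does expose an `abortfetch` command over HTTP, alongside `details`. Whether it actually helps once a fetch is wedged like this is not something this thread confirms. A minimal sketch of building those URLs, assuming the handler is mounted at `/replication` under the core as in the master URL quoted in this thread; the helper name and the `slave-host` name are made up for illustration:

```python
from urllib.parse import urlencode


def replication_command_url(core_url, command):
    """Build a ReplicationHandler URL for one core.

    core_url is the base URL of the core on the slave, for example
    "http://slave-host:8983/solr/1" (hypothetical host name).
    command is a ReplicationHandler command such as "details" or "abortfetch".
    """
    query = urlencode({"command": command})
    return f"{core_url.rstrip('/')}/replication?{query}"


# Example with a hypothetical slave host:
print(replication_command_url("http://slave-host:8983/solr/1", "abortfetch"))
```

Requesting the `details` URL first shows which file the slave thinks it is downloading before you try to abort anything.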
Re: replication getting stuck on a file
I have seen this in 4.2.1 too.
Once replication is finished, the Admin UI shows 100%, and the time and dlspeed information goes out of whack. The same is reflected in the mbeans. But what's actually happening in the background is auto-warmup of caches (in my case).
Maybe some minor stats bug.

--
View this message in context: http://lucene.472066.n3.nabble.com/replication-getting-stuck-on-a-file-tp4076707p4077112.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: replication getting stuck on a file
Hmmm, that is kind of funny. I know this is ugly, but what happens if you
1> stop the slave
2> completely delete the data/index directory (directory too, not just contents)
3> fire it back up?

Inelegant at best, but if it cures your problem

Erick

On Tue, Jul 9, 2013 at 5:57 PM, Petersen, Robert wrote:
> Look at the speed and time remaining on this one, pretty funny:
> [...]
> Downloading File: _34n2.prx, Downloaded: 140 bytes / 140 bytes [100.0%]
> Time Elapsed: 6203s, Estimated Time Remaining: 88091277s, Speed: 281 bytes/s
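Step 2 of Erick's suggestion (remove the index directory itself, not just its contents) can be sketched in Python as below. This is only an illustration: the function name is hypothetical, and stopping and restarting the slave (steps 1 and 3) depend on how your servlet container is managed, so they are left to the caller.

```python
import shutil
from pathlib import Path


def remove_index_dir(data_dir):
    """Delete <data_dir>/index itself, not just its contents.

    Only call this while the slave is stopped. Returns True if a
    directory was removed, False if there was nothing to delete.
    """
    index_dir = Path(data_dir) / "index"
    if not index_dir.is_dir():
        return False
    shutil.rmtree(index_dir)
    return True


# With the slave stopped, using the data directory shown in this thread:
# remove_index_dir("/var/LucidWorks/lucidworks/solr/1/data")
```

On the next start the slave has no local index, so it should pull a complete copy from the master at its next poll.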
RE: replication getting stuck on a file
Look at the speed and time remaining on this one, pretty funny:

Master http://ssbuyma01:8983/solr/1/replication
  Latest Index Version: null, Generation: null
  Replicatable Index Version: 1276893670202, Generation: 127213
Poll Interval: 00:05:00
Local Index
  Index Version: 1276893670108, Generation: 127204
  Location: /var/LucidWorks/lucidworks/solr/1/data/index
  Size: 23.13 GB
  Times Replicated Since Startup: 48874
  Previous Replication Done At: Tue Jul 09 13:12:05 PDT 2013
  Config Files Replicated At: null
  Config Files Replicated: null
  Times Config Files Replicated Since Startup: null
  Next Replication Cycle At: Tue Jul 09 13:17:04 PDT 2013
Current Replication Status
  Start Time: Tue Jul 09 13:12:04 PDT 2013
  Files Downloaded: 10 / 538
  Downloaded: 1.67 MB / 23.13 GB [0.0%]
  Downloading File: _34n2.prx, Downloaded: 140 bytes / 140 bytes [100.0%]
  Time Elapsed: 6203s, Estimated Time Remaining: 88091277s, Speed: 281 bytes/s

-----Original Message-----
From: Petersen, Robert [mailto:robert.peter...@mail.rakuten.com]
Sent: Tuesday, July 09, 2013 1:22 PM
To: solr-user@lucene.apache.org
Subject: replication getting stuck on a file

Hi

My Solr 3.6.1 slave farm is suddenly getting stuck during replication. It seems to stop on a random file on various slaves (not all) and not continue. I've tried stopping and restarting Tomcat etc., but some slaves just can't get the index pulled down. Note there is plenty of space on the hard drive. I don't get it. Everything else seems fine. Does this ring a bell for anyone? I have the slaves set for five-minute polling intervals.

Here is what I see in the admin page. It just stays on that one file and won't get past it while the speed steadily averages down to 0 KB/s:

Master http://ssbuyma01:8983/solr/1/replication
  Latest Index Version: null, Generation: null
  Replicatable Index Version: 1276893670111, Generation: 127205
Poll Interval: 00:05:00
Local Index
  Index Version: 1276893670084, Generation: 127202
  Location: /var/LucidWorks/lucidworks/solr/1/data/index
  Size: 23.06 GB
  Times Replicated Since Startup: 48903
  Previous Replication Done At: Tue Jul 09 12:55:01 EDT 2013
  Config Files Replicated At: null
  Config Files Replicated: null
  Times Config Files Replicated Since Startup: null
  Next Replication Cycle At: Tue Jul 09 13:00:00 EDT 2013
Current Replication Status
  Start Time: Tue Jul 09 12:55:00 EDT 2013
  Files Downloaded: 59 / 486
  Downloaded: 88.73 MB / 23.06 GB [0.0%]
  Downloading File: _34mt.fnm, Downloaded: 1.35 MB / 1.35 MB [100.0%]
  Time Elapsed: 691s, Estimated Time Remaining: 183204s, Speed: 131.49 KB/s

Robert (Robi) Petersen
Senior Software Engineer
Search Department
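The speed and time-remaining figures in both status dumps look like a plain running average: bytes downloaded so far divided by elapsed time, with the remaining bytes divided by that average. That is inferred from the numbers posted here, not taken from the Solr source, so treat it as an assumption. A small Python sketch (the function name is made up) that recomputes the figures from the earlier dump:

```python
MB = 1024 ** 2
GB = 1024 ** 3


def replication_stats(downloaded_bytes, total_bytes, elapsed_s):
    """Return (average speed in bytes/s, estimated seconds remaining).

    Assumes speed is simply downloaded / elapsed, which is an inference
    from the admin page numbers in this thread.
    """
    speed = downloaded_bytes / elapsed_s
    remaining_s = (total_bytes - downloaded_bytes) / speed
    return speed, remaining_s


# Figures from the first status dump: 88.73 MB of 23.06 GB after 691s.
speed, eta = replication_stats(88.73 * MB, 23.06 * GB, 691)
print(f"{speed / 1024:.2f} KB/s, roughly {eta:.0f}s remaining")
```

This lands within rounding of the reported 131.49 KB/s and 183204s. It would also explain the "funny" numbers: once a download stalls, the byte count stops growing while elapsed time keeps increasing, so the average speed drifts toward zero and the estimate balloons.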