RE: replication getting stuck on a file

2013-08-01 Thread Petersen, Robert
I have seen this happen before in our 3.6.1 deployment.  It seemed related to 
high JVM memory consumption on the server when our index got too big (i.e., we 
were close to getting OOMs).  That is probably why restarting Solr sort of 
fixes it, assuming the file it is stuck on is the final file and it got 100% of 
it.

Thanks
Robi




Re: replication getting stuck on a file

2013-08-01 Thread Rohit Harchandani
I am facing this problem in Solr 4.0 too. It's definitely not related to
autowarming. It just gets stuck while downloading a file, and there is no
way to abort the replication except restarting Solr.
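
For reference, the ReplicationHandler does document HTTP commands such as
details and abortfetch. Nobody in this thread reports whether abortfetch
actually frees a fetch that is stuck like this, so treat the following as a
rough sketch only. The host name slave01 and both helper names are made up
for illustration; the /solr/1 core path follows the URLs quoted elsewhere
in this thread.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def replication_url(core_url, command, **params):
    """Build a ReplicationHandler URL such as .../replication?command=details."""
    query = urlencode({"command": command, **params})
    return f"{core_url.rstrip('/')}/replication?{query}"


def send_command(core_url, command, timeout=10, **params):
    """Send the command to the slave and return the raw response body."""
    url = replication_url(core_url, command, **params)
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Hypothetical slave address; replace with a real one before sending anything.
    slave = "http://slave01:8983/solr/1"
    print(replication_url(slave, "details"))     # inspect the current fetch
    print(replication_url(slave, "abortfetch"))  # ask the slave to abort it
```

Calling send_command(slave, "abortfetch") would issue the request; the script
above only prints the URLs so nothing is sent by accident.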




Re: replication getting stuck on a file

2013-07-10 Thread adityab
I have seen this in 4.2.1 too.
Once replication is finished, the Admin UI shows 100%, and the time and dlspeed
information goes out of whack. The same is reflected in the MBeans. But what's
actually happening in the background is auto-warmup of caches (in my case).
Maybe it's a minor stats bug.






Re: replication getting stuck on a file

2013-07-10 Thread Erick Erickson
Hmmm, that is kind of funny. I know this is ugly, but what happens if you
1> stop the slave
2> completely delete the data/index directory (directory too, not just contents)
3> fire it back up?

Inelegant at best, but if it cures your problem...
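
Step 2 above could be scripted roughly like this. It is only a sketch: it
assumes the slave has already been stopped (step 1) and that you restart it
yourself afterwards (step 3). The function name delete_index_dir is made up
for illustration, and the path in the example is the one from the status
output quoted in this thread.

```python
import shutil
from pathlib import Path


def delete_index_dir(data_dir):
    """Remove <data_dir>/index itself, not just its contents.

    Returns True if the directory existed and was removed, False otherwise.
    Only run this while the slave is stopped.
    """
    index_dir = Path(data_dir) / "index"
    if not index_dir.is_dir():
        return False
    shutil.rmtree(index_dir)
    return True


if __name__ == "__main__":
    # Data directory taken from the thread; adjust for your own install.
    removed = delete_index_dir("/var/LucidWorks/lucidworks/solr/1/data")
    print("index directory removed" if removed else "no index directory found")
```

After the restart the slave should pull a complete copy of the index from the
master on its next poll.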

Erick



RE: replication getting stuck on a file

2013-07-09 Thread Petersen, Robert
Look at the speed and time remaining on this one, pretty funny:


Master
  URL: http://ssbuyma01:8983/solr/1/replication
  Latest Index Version: null, Generation: null
  Replicatable Index Version: 1276893670202, Generation: 127213
Poll Interval: 00:05:00
Local Index
  Index Version: 1276893670108, Generation: 127204
  Location: /var/LucidWorks/lucidworks/solr/1/data/index
  Size: 23.13 GB
  Times Replicated Since Startup: 48874
  Previous Replication Done At: Tue Jul 09 13:12:05 PDT 2013
  Config Files Replicated At: null
  Config Files Replicated: null
  Times Config Files Replicated Since Startup: null
  Next Replication Cycle At: Tue Jul 09 13:17:04 PDT 2013
Current Replication Status
  Start Time: Tue Jul 09 13:12:04 PDT 2013
  Files Downloaded: 10 / 538
  Downloaded: 1.67 MB / 23.13 GB [0.0%]
  Downloading File: _34n2.prx, Downloaded: 140 bytes / 140 bytes [100.0%]
  Time Elapsed: 6203s, Estimated Time Remaining: 88091277s, Speed: 281 bytes/s


-Original Message-
From: Petersen, Robert [mailto:robert.peter...@mail.rakuten.com] 
Sent: Tuesday, July 09, 2013 1:22 PM
To: solr-user@lucene.apache.org
Subject: replication getting stuck on a file

Hi 

My Solr 3.6.1 slave farm is suddenly getting stuck during replication.  It 
seems to stop on a random file on various slaves (not all) and not continue.  
I've tried stopping and restarting Tomcat etc., but some slaves just can't get 
the index pulled down.  Note there is plenty of space on the hard drive.  I 
don't get it.  Everything else seems fine.  Does this ring a bell for anyone?  
I have the slaves set for five-minute polling intervals.

Here is what I see in the admin page; it just stays on that one file and won't 
get past it while the speed steadily averages down to 0 KB/s:

Master
  URL: http://ssbuyma01:8983/solr/1/replication
  Latest Index Version: null, Generation: null
  Replicatable Index Version: 1276893670111, Generation: 127205
Poll Interval: 00:05:00
Local Index
  Index Version: 1276893670084, Generation: 127202
  Location: /var/LucidWorks/lucidworks/solr/1/data/index
  Size: 23.06 GB
  Times Replicated Since Startup: 48903
  Previous Replication Done At: Tue Jul 09 12:55:01 EDT 2013
  Config Files Replicated At: null
  Config Files Replicated: null
  Times Config Files Replicated Since Startup: null
  Next Replication Cycle At: Tue Jul 09 13:00:00 EDT 2013
Current Replication Status
  Start Time: Tue Jul 09 12:55:00 EDT 2013
  Files Downloaded: 59 / 486
  Downloaded: 88.73 MB / 23.06 GB [0.0%]
  Downloading File: _34mt.fnm, Downloaded: 1.35 MB / 1.35 MB [100.0%]
  Time Elapsed: 691s, Estimated Time Remaining: 183204s, Speed: 131.49 KB/s
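
As a sanity check on the figures above, here is a small Python sketch. It
assumes the admin page reports a cumulative average (bytes downloaded so far
divided by elapsed time) and derives the time remaining from that average.
That formula is an assumption inferred from the displayed numbers, not taken
from the Solr source, and the function names are made up for illustration.
Sizes are treated as binary units (1 MB = 1024 * 1024 bytes).

```python
KB, MB, GB = 1024, 1024 ** 2, 1024 ** 3


def average_speed(downloaded_bytes, elapsed_s):
    """Cumulative average speed in bytes per second."""
    return downloaded_bytes / elapsed_s


def estimated_remaining(total_bytes, downloaded_bytes, elapsed_s):
    """Seconds left if the rest of the index arrives at the average speed."""
    speed = average_speed(downloaded_bytes, elapsed_s)
    return (total_bytes - downloaded_bytes) / speed


if __name__ == "__main__":
    # Values read off the status above: 88.73 MB of 23.06 GB after 691 s.
    downloaded = 88.73 * MB
    total = 23.06 * GB
    elapsed = 691
    print(f"speed: {average_speed(downloaded, elapsed) / KB:.2f} KB/s")
    print(f"remaining: {estimated_remaining(total, downloaded, elapsed):.0f} s")
```

With these inputs the sketch lands on roughly 131.49 KB/s and an estimate
close to the 183204 s shown. If the page really does compute a cumulative
average, a stalled fetch would make the speed keep dropping and the estimate
keep growing for as long as the elapsed time climbs.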


Robert (Robi) Petersen
Senior Software Engineer
Search Department