Re: Solr 5.2.1 replication hangs possibly during segment merge after a delete operation

2019-01-29 Thread Ravi Prakash
Thanks.

I am not explicitly asking solr to optimize. I do send -commit yes in the POST 
command when I execute the delete query.

On the master-slave pair where replication is hung, I see this:

On the master:
-bash-4.1$ ls -al data/index/segments_*
-rw-rw-r--. 1 u g 1269 Jan 29 16:23 data/index/segments_13w4 
 
On the Slave:
-bash-4.1$ ls -al data/index.20181027004100961/segment*
-rw-rw-r--. 1 u g 1594 Jan 28 22:23 data/index.20181027004100961/segments_13no

And Slave replication admin page is spinning on 
Current File: segments_13no 0 bytes / 1.56 KB [0%]
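
For comparing the two listings above: Lucene names each commit file segments_N, where N is the commit generation written in base 36, so the suffixes can be turned into the Gen numbers that the replication admin page shows. The helper below is only a sketch for that conversion; the function name gen_from_segments is made up for illustration and is not part of Solr.

```shell
# Convert a segments_N file name to its decimal commit generation.
# Assumes the suffix uses lowercase base-36 digits (0-9, a-z).
gen_from_segments() {
    name=${1##*/}                 # drop any leading directory
    suffix=${name#segments_}      # keep only the base-36 suffix
    digits=0123456789abcdefghijklmnopqrstuvwxyz
    value=0
    while [ -n "$suffix" ]; do
        rest=${suffix#?}          # everything after the first character
        ch=${suffix%"$rest"}      # the first character itself
        prefix=${digits%%"$ch"*}  # digits that come before ch
        value=$(( value * 36 + ${#prefix} ))
        suffix=$rest
    done
    echo "$value"
}

gen_from_segments data/index/segments_13w4                    # master
gen_from_segments data/index.20181027004100961/segments_13no  # slave
```

With the two file names from the listings this gives 51700 for the master and 51396 for the slave, so the slave is still asking for a commit that is about 300 generations behind the master's current one.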

Usually, when the slave is unable to download a specific file from the 
master, the current replication fails (I am not sure whether it times out) 
and triggers a full copy. Or at least I am able to abort replication from 
the UI.
In this case it cannot find the segments file itself, possibly because a 
merge happened on the master and the segments file was recreated there. The 
old segments file that the slave is looking for no longer exists on the 
master, and instead of triggering a full copy, replication just sits there. 
Abort Replication does nothing. I also tried the abortfetch API, which 
returns OK, but the UI continues to spin...

A service solr stop/start makes the next replication poll succeed.

Not all master-slave setups hang this way, even though they also run the 
delete cron job once daily. That makes me believe that once the deletes 
cross some threshold, they kick off a merge that creates a new segments 
file, and that blocks replication from then on...

Ravi

On 1/29/19, 4:44 AM, "Shawn Heisey"  wrote:


On 1/28/2019 5:39 PM, Ravi Prakash wrote:
> I have a situation where I am trying to set up a once-daily cron job on 
> the master node to delete old documents from the index based on our 
> retention policy.

This reply may not do you any good.  Just wanted you to know up front 
that I might not be helpful.

> The cron job basically does this (min and max are a day range):
>  DELETE="\"started:[${MINDATE} TO ${MAXDELDATE}]\""
>   /opt/solr/bin/post -c  -type application/json -out yes -commit yes -d {delete:{query:"$DELETE"}}

That is a delete by query.

Are you possibly asking Solr to background optimize the index before you 
do the deleteByQuery?  Because if you do something that begins a merge, 
then issue a deleteByQuery while the merge is happening, the delete and 
all further changes to the index will pause until the merge is done.  An 
optimize is a forced merge and can take a very long time to complete. 
Getting around that problem involves using deleteById instead of 
deleteByQuery.

I have no idea whether replication would be affected by the blocking 
that deleteByQuery causes.  I wouldn't expect it to be affected, but 
I've been surprised by Solr's behavior before.

Thanks,
Shawn




Re: Solr 5.2.1 replication hangs possibly during segment merge after a delete operation

2019-01-29 Thread Shawn Heisey

On 1/28/2019 5:39 PM, Ravi Prakash wrote:

> I have a situation where I am trying to set up a once-daily cron job on 
> the master node to delete old documents from the index based on our 
> retention policy.


This reply may not do you any good.  Just wanted you to know up front 
that I might not be helpful.



> The cron job basically does this (min and max are a day range):
>  DELETE="\"started:[${MINDATE} TO ${MAXDELDATE}]\""
>   /opt/solr/bin/post -c  -type application/json -out yes -commit yes -d {delete:{query:"$DELETE"}}


That is a delete by query.

Are you possibly asking Solr to background optimize the index before you 
do the deleteByQuery?  Because if you do something that begins a merge, 
then issue a deleteByQuery while the merge is happening, the delete and 
all further changes to the index will pause until the merge is done.  An 
optimize is a forced merge and can take a very long time to complete. 
Getting around that problem involves using deleteById instead of 
deleteByQuery.
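
A rough sketch of that deleteById route, in case it helps: look up the ids that match the date range, then post them as a delete-by-id list to the update handler. Everything named here is an assumption for illustration: the SOLR_URL default, the function names, a uniqueKey field called id, ids that contain no whitespace, quotes or commas, and a rows cap of 10000. The collection name is left for the caller to supply.

```shell
# Hypothetical base URL; adjust to your install.
SOLR_URL=${SOLR_URL:-http://localhost:8983/solr}

# Build a JSON delete-by-id body from the ids given as arguments.
# No escaping is done, so ids must not contain quotes or backslashes.
build_delete_payload() {
    payload='{"delete":['
    sep=''
    for id in "$@"; do
        payload="${payload}${sep}\"${id}\""
        sep=','
    done
    printf '%s\n' "${payload}]}"
}

# Print the ids (one per line) of the docs matching a query.
# Assumes the uniqueKey field is named "id".
fetch_ids() {
    collection=$1
    query=$2
    curl -s -G "$SOLR_URL/$collection/select" \
        --data-urlencode "q=$query" \
        -d 'fl=id' -d 'wt=csv' -d 'csv.header=false' -d 'rows=10000'
}

# Delete the given ids and commit.
delete_by_ids() {
    collection=$1
    shift
    curl -s "$SOLR_URL/$collection/update?commit=true" \
        -H 'Content-Type: application/json' \
        -d "$(build_delete_payload "$@")"
}
```

The cron job would then call something like delete_by_ids "$COLLECTION" $(fetch_ids "$COLLECTION" "started:[${MINDATE} TO ${MAXDELDATE}]"), relying on word splitting to pass one id per argument. Whether this actually avoids the replication hang is untested.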


I have no idea whether replication would be affected by the blocking 
that deleteByQuery causes.  I wouldn't expect it to be affected, but 
I've been surprised by Solr's behavior before.


Thanks,
Shawn


Solr 5.2.1 replication hangs possibly during segment merge after a delete operation

2019-01-28 Thread Ravi Prakash
I have a situation where I am trying to set up a once-daily cron job on the 
master node to delete old documents from the index based on our retention 
policy.

I delete only one day's worth of data at a time, which, given my schema, 
removes a couple of thousand docs and no more. This is a test cluster, and 
the doc counts and index size are not very high:

Num Docs: 515727
Max Doc: 591322
Heap Memory Usage: -1
Deleted Docs: 75595

Index                 Version        Gen    Size
Master (Searching)    1548694802284  51396  969.28 MB
Master (Replicable)   1548694802284  51396  -
Slave (Searching)     1548694802284  51396  969.28 MB

Sometimes I notice that replication hangs. There are no errors, but the 
slave is trying to download a segments_* file (e.g. segments_1bnx7) and just 
sits there. Nothing is logged. I am unable to stop replication (using 
abortfetch) once it reaches this state. Disabling polling (which is set to 
60 seconds) works, but that doesn't help. The only thing that helps is a 
service solr stop/start. Then the next poll works, and the slave's 
version/gen/size/doc count/deleted count match the master's.
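
For reference, the replication commands mentioned above can be issued from the shell roughly as follows. This is only a sketch: SLAVE_CORE_URL and both function names are placeholders, and the core path has to be replaced with the real one. The command names themselves (details, disablepoll, abortfetch, fetchindex, enablepoll) are the ones the ReplicationHandler HTTP API accepts.

```shell
# Hypothetical core URL; replace with your own slave core.
SLAVE_CORE_URL=${SLAVE_CORE_URL:-http://localhost:8983/solr/mycore}

# Build the URL for one ReplicationHandler command on a core.
replication_url() {
    printf '%s/replication?command=%s&wt=json\n' "$1" "$2"
}

# Run one ReplicationHandler command and print the raw JSON response.
replication_cmd() {
    curl -s "$(replication_url "$SLAVE_CORE_URL" "$1")"
}

# Typical sequence when a fetch looks stuck (uncomment to run):
# replication_cmd details       # current file, generation, fetch progress
# replication_cmd disablepoll   # stop the 60-second polling
# replication_cmd abortfetch    # try to cancel the in-flight fetch
# replication_cmd fetchindex    # request a fresh pull from the master
# replication_cmd enablepoll    # resume polling afterwards
```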

Not every delete cron execution hangs. The segment file I notice being 
downloaded during the “hung” state is no longer available in the master. The 
master has already created a new segment* file.

The cron job basically does this (min and max are a day range):
DELETE="\"started:[${MINDATE} TO ${MAXDELDATE}]\""
 /opt/solr/bin/post -c  -type application/json -out yes -commit yes -d {delete:{query:"$DELETE"}}

Any ideas?

Thanks.
Ravi