Re: CDCR stress-test issues

2020-07-17 Thread Jörn Franke
Instead of CDCR, you could simply duplicate the indexing pipeline across both 
data centers. Then there is no need to replicate at each step of the pipeline 
(storage to storage, index to index, etc.); the two pipelines simply run in 
parallel, one in each data center.

> On 24.06.2020 at 15:46, Oakley, Craig (NIH/NLM/NCBI) [C] wrote:
> 
> In attempting to stress-test CDCR (running Solr 7.4), I am running into a 
> couple of issues.
> 
> One is that the tlog files keep accumulating for some nodes in the CDCR 
> system, particularly for the non-Leader nodes in the Source SolrCloud. No 
> quantity of hard commits seems to cause any of these tlog files to be 
> released. This can become a problem upon reboot: with hundreds of thousands 
> of tlog files, Solr fails to start, complaining that there are too many 
> open files.
> 
> The tlogs had been accumulating on all the nodes of the CDCR set of 
> SolrClouds until I added these two lines to the solrconfig.xml file (for 
> testing purposes, using numbers much lower than in the examples):
> 5
> 2
> Since then, it is mostly the non-Leader nodes of the Source SolrCloud which 
> accumulate tlog files (the Target SolrCloud does seem to have a tendency to 
> clean up its tlog files, as does the Leader of the Source SolrCloud). If I 
> use ADDREPLICAPROP and REBALANCELEADERS to change which node is the Leader, 
> and then start adding more data, the tlogs on the new Leader sometimes go 
> away, but then the old Leader begins accumulating tlog files. I am dubious 
> whether frequent reassignment of Leadership would be a practical solution.
> 
> I also have several times attempted to simulate a production environment by 
> running several loops simultaneously, each of which inserts multiple records 
> on each iteration of the loop. Several times, I end up with a dozen records 
> on (both replicas of) the Source which never make it to (either replica of) 
> the Target. The Target has thousands of records which were inserted before 
> the missing records, and thousands of records which were inserted after the 
> missing records (and all these records, the replicated and the missing, were 
> inserted by curl commands which only differed in sequential numbers 
> incorporated into the values being inserted).
> 
> I also have a question regarding SOLR-13141: the 11/Feb/19 comment says that 
> the fix for Solr 7.3 had a problem, and the header says "Affects Version/s: 
> 7.5, 7.6". Does that indicate that Solr 7.4 is not affected?
> 
> Are there any suggestions?
> 
> Thanks
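
For reference, the leadership change described in the quoted message is 
normally driven through the Solr Collections API, by marking the replica that 
should take over as preferredLeader and then rebalancing. A minimal sketch, 
with placeholder host, collection, shard, and replica names (none of them 
taken from this thread):

SOLR=http://localhost:8983/solr

# Mark the replica that should become Leader as preferredLeader (names are placeholders).
curl -G "$SOLR/admin/collections" \
     --data-urlencode action=ADDREPLICAPROP \
     --data-urlencode collection=coll1 \
     --data-urlencode shard=shard1 \
     --data-urlencode replica=core_node2 \
     --data-urlencode property=preferredLeader \
     --data-urlencode property.value=true

# Then ask Solr to move leadership to the replicas marked preferredLeader.
curl -G "$SOLR/admin/collections" \
     --data-urlencode action=REBALANCELEADERS \
     --data-urlencode collection=coll1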


RE: CDCR stress-test issues

2020-07-17 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
Yes, I saw that yesterday.

I guess that I was not the only one who noticed the unreliability after all.

-Original Message-
From: Ishan Chattopadhyaya  
Sent: Friday, July 17, 2020 1:17 AM
To: solr-user 
Subject: Re: CDCR stress-test issues

FYI, CDCR support, as it exists in Solr today, has been deprecated in 8.6.
It suffers from serious design flaws which allow exactly the kind of problems
you are observing. While there may be workarounds, it is advisable not to
rely on CDCR in production.

Thanks,
Ishan


Re: CDCR stress-test issues

2020-07-16 Thread Ishan Chattopadhyaya
FYI, CDCR support, as it exists in Solr today, has been deprecated in 8.6.
It suffers from serious design flaws which allow exactly the kind of problems
you are observing. While there may be workarounds, it is advisable not to
rely on CDCR in production.

Thanks,
Ishan


RE: CDCR stress-test issues

2020-07-01 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
For the record, it is not just Solr 7.4 which has the problem. When I start 
afresh with Solr 8.5.2, both symptoms persist.

With Solr 8.5.2, tlogs accumulate endlessly on the non-Leader nodes of the 
Source SolrCloud and are never released, regardless of the maxNumLogsToKeep 
setting.

And with Solr 8.5.2, if four scripts run simultaneously for a few minutes, each 
looping and adding a batch of 6 records to the Source SolrCloud on every 
iteration, a couple dozen records wind up on the Source without ever arriving 
at the Target SolrCloud (although the Target does have records which were added 
after the missing records).

Does anyone yet have any suggestion how to get CDCR to work properly?
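
A rough sketch of the kind of concurrent-insert stress test described above, 
for anyone trying to reproduce it: four background loops, each posting batches 
of 6 documents to the Source SolrCloud with curl. The host, collection name, 
field names, and loop counts are placeholders, not the exact commands used in 
this thread.

#!/bin/bash
# Hypothetical reproduction sketch; adjust the URL, collection, and field names.
SOURCE="http://source-host:8983/solr/coll1"

for loop in 1 2 3 4; do
  (
    for i in $(seq 1 500); do
      # Build a batch of 6 documents whose values differ only in sequence numbers.
      docs="["
      for j in $(seq 1 6); do
        docs+="{\"id\":\"stress-$loop-$i-$j\",\"value_s\":\"record $loop-$i-$j\"},"
      done
      docs="${docs%,}]"
      curl -s -H 'Content-Type: application/json' \
           --data-binary "$docs" "$SOURCE/update" > /dev/null
    done
  ) &
done
wait

# Hard commit once at the end, then compare document counts on Source and Target.
curl -s "$SOURCE/update?commit=true"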


Re: CDCR stress-test issues

2020-06-24 Thread matthew sporleder
Just going to "me too" where i've had (non cdcr) installs accumulate
tlogs until eventual rebuilds or crashes.