RE: CDCR

2020-12-01 Thread Gell-Holleron, Daniel
Hi Shalin, 

Just to add: in the 'CdcrUpdateLogSynchronizer - Caught unexpected exception' 
warning, it says this happens because the SolrCore is loading. I don't 
know if this is down to the data being quite large? 

Thanks, 

Daniel 

-Original Message-
From: Gell-Holleron, Daniel 
Sent: 01 December 2020 11:49
To: solr-user@lucene.apache.org
Subject: RE: CDCR

Hi Shalin, 

I did try that, but it didn't make any difference; the remote clusters did 
not update. Autocommit is already set on the remote clusters as follows:


<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15}</maxTime>
  <maxDocs>1</maxDocs>
  <openSearcher>true</openSearcher>
</autoCommit>

I can also see in the Solr admin pages that there is a warn message, 
'CdcrUpdateLogSynchronizer - Caught unexpected exception'. That's all I can see 
at the moment; no errors apart from that. 

Thanks, 

Daniel


-Original Message-
From: Shalin Shekhar Mangar 
Sent: 29 November 2020 05:19
To: solr-user@lucene.apache.org
Subject: Re: CDCR

EXTERNAL EMAIL - Be cautious of all links and attachments.

If you manually issue a commit operation on the remote clusters, do you see any 
updates? If yes, then you should set autoCommit on the remote clusters.
If no, then check the logs on the cluster which is receiving the indexing 
operations and see if there are any errors.

On Wed, Nov 25, 2020 at 6:11 PM Gell-Holleron, Daniel < 
daniel.gell-holle...@gb.unisys.com> wrote:

> Hello,
>
> Does anybody have advice on why CDCR would say it's forwarding updates 
> (with no errors) even though the Solr servers it's replicating to 
> aren't updating?
>
> We have just under 50 million documents, spread across 4 servers.
> Each server runs a single node.
>
> One side is updating happily, so I would think that sharding wouldn't be 
> needed at this point?
>
> We are using Solr version 7.7.1.
>
> Thanks,
>
> Daniel
>
>

--
Regards,
Shalin Shekhar Mangar.


RE: CDCR

2020-12-01 Thread Gell-Holleron, Daniel
Hi Shalin, 

I did try that, but it didn't make any difference; the remote clusters did 
not update. Autocommit is already set on the remote clusters as follows:


<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15}</maxTime>
  <maxDocs>1</maxDocs>
  <openSearcher>true</openSearcher>
</autoCommit>

I can also see in the Solr admin pages that there is a warn message, 
'CdcrUpdateLogSynchronizer - Caught unexpected exception'. That's all I can see 
at the moment; no errors apart from that. 

Thanks, 

Daniel


-Original Message-
From: Shalin Shekhar Mangar  
Sent: 29 November 2020 05:19
To: solr-user@lucene.apache.org
Subject: Re: CDCR

EXTERNAL EMAIL - Be cautious of all links and attachments.

If you manually issue a commit operation on the remote clusters, do you see any 
updates? If yes, then you should set autoCommit on the remote clusters.
If no, then check the logs on the cluster which is receiving the indexing 
operations and see if there are any errors.

On Wed, Nov 25, 2020 at 6:11 PM Gell-Holleron, Daniel < 
daniel.gell-holle...@gb.unisys.com> wrote:

> Hello,
>
> Does anybody have advice on why CDCR would say it's forwarding updates 
> (with no errors) even though the Solr servers it's replicating to 
> aren't updating?
>
> We have just under 50 million documents, spread across 4 servers.
> Each server runs a single node.
>
> One side is updating happily, so I would think that sharding wouldn't be 
> needed at this point?
>
> We are using Solr version 7.7.1.
>
> Thanks,
>
> Daniel
>
>

--
Regards,
Shalin Shekhar Mangar.


Re: CDCR

2020-11-28 Thread Shalin Shekhar Mangar
If you manually issue a commit operation on the remote clusters, do you see
any updates? If yes, then you should set autoCommit on the remote clusters.
If no, then check the logs on the cluster which is receiving the indexing
operations and see if there are any errors.
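
For reference, a manual commit can be issued against the target collection with 
a plain update request such as 
http://target-host:8983/solr/yourCollection/update?commit=true (host and 
collection name are placeholders here). A minimal autoCommit block for the 
target's solrconfig.xml, with an illustrative 60-second interval, might look 
like this:

    <autoCommit>
      <maxTime>60000</maxTime>
      <openSearcher>true</openSearcher>
    </autoCommit>

Setting openSearcher to true is what makes newly committed documents visible to 
searches on the target.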

On Wed, Nov 25, 2020 at 6:11 PM Gell-Holleron, Daniel <
daniel.gell-holle...@gb.unisys.com> wrote:

> Hello,
>
> Does anybody have advice on why CDCR would say it's forwarding updates
> (with no errors) even though the Solr servers it's replicating to aren't
> updating?
>
> We have just under 50 million documents, spread across 4 servers.
> Each server runs a single node.
>
> One side is updating happily, so I would think that sharding wouldn't be
> needed at this point?
>
> We are using Solr version 7.7.1.
>
> Thanks,
>
> Daniel
>
>

-- 
Regards,
Shalin Shekhar Mangar.


CDCR

2020-11-25 Thread Gell-Holleron, Daniel
Hello,

Does anybody have advice on why CDCR would say it's forwarding updates (with no 
errors) even though the Solr servers it's replicating to aren't updating?

We have just under 50 million documents, spread across 4 servers. Each server 
runs a single node.

One side is updating happily, so I would think that sharding wouldn't be needed 
at this point?

We are using Solr version 7.7.1.

Thanks,

Daniel



Re: UpdateProcessorChains -cdcr processor along with ignore commit processor

2020-07-18 Thread Shawn Heisey

On 7/15/2020 11:39 PM, Natarajan, Rajeswari wrote:

Resending this again as I still could not make this work. So I would like to 
know whether it is even possible to have
both solr.CdcrUpdateProcessorFactory and 
solr.IgnoreCommitOptimizeUpdateProcessorFactory in solrconfig.xml and get both 
functionalities to work.


You need to create one update chain that uses both processors.  Only one 
update chain can be applied at a time.  So create one chain with all the 
processors you need and use that.


Your config has two chains.  Only one of them can be active on each update.
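
As a minimal sketch (untested; the chain name and the statusCode value are only 
illustrative), such a merged chain could look like this, using the standard 
factories from the reference guide:

    <updateRequestProcessorChain name="cdcr-ignore-commit-chain">
      <processor class="solr.IgnoreCommitOptimizeUpdateProcessorFactory">
        <int name="statusCode">200</int>
      </processor>
      <processor class="solr.CdcrUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>

The merged chain then has to be referenced from the /update handler (for 
example through an update.chain default) so that it is actually applied to 
incoming updates.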

Thanks,
Shawn


Re: CDCR stress-test issues

2020-07-17 Thread Jörn Franke
Instead of CDCR you may simply duplicate the pipeline across both data centers. 
Then there is no need at each step of the pipeline to replicate (storage to 
storage, index to index etc.).
Instead both pipelines run in different data centers in parallel.

> Am 24.06.2020 um 15:46 schrieb Oakley, Craig (NIH/NLM/NCBI) [C] 
> :
> 
> In attempting to stress-test CDCR (running Solr 7.4), I am running into a 
> couple of issues.
> 
> One is that the tlog files keep accumulating for some nodes in the CDCR 
> system, particularly for the non-Leader nodes in the Source SolrCloud. No 
> quantity of hard commits seem to cause any of these tlog files to be 
> released. This can become a problem upon reboot if there are hundreds of 
> thousands of tlog files, and Solr fails to start (complaining that there are 
> too many open files).
> 
> The tlogs had been accumulating on all the nodes of the CDCR set of 
> SolrClouds until I added these two lines to the solrconfig.xml file (for 
> testing purposes, using numbers much lower than in the examples):
> <int name="numRecordsToKeep">5</int>
> <int name="maxNumLogsToKeep">2</int>
> Since then, it is mostly the non-Leader nodes of the Source SolrCloud which 
> accumulates tlog files (the Target SolrCloud does seem to have a tendency to 
> clean up the tlog files, as does the Leader of the Source SolrCloud). If I 
> use ADDREPLICAPROP and REBALANCELEADERS to change which node is the Leader, 
> and if I then start adding more data, the tlogs on the new Leader sometimes 
> will go away, but then the old Leader begins accumulating tlog files. I am 
> dubious whether frequent reassignment of Leadership would be a practical 
> solution.
> 
> I also have several times attempted to simulate a production environment by 
> running several loops simultaneously, each of which inserts multiple records 
> on each iteration of the loop. Several times, I end up with a dozen records 
> on (both replicas of) the Source which never make it to (either replica of) 
> the Target. The Target has thousands of records which were inserted before 
> the missing records, and thousands of records which were inserted after the 
> missing records (and all these records, the replicated and the missing, were 
> inserted by curl commands which only differed in sequential numbers 
> incorporated into the values being inserted).
> 
> I also have a question regarding SOLR-13141: the 11/Feb/19 comment says that 
> the fix for Solr 7.3 had a problem; and the header says "Affects Version/s: 
> 7.5, 7.6": does that indicate that Solr 7.4 is not affected?
> 
> Are  there any suggestions?
> 
> Thanks


RE: CDCR stress-test issues

2020-07-17 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
Yes, I saw that yesterday.

I guess that I was not the only one who noticed the unreliability after all.

-Original Message-
From: Ishan Chattopadhyaya  
Sent: Friday, July 17, 2020 1:17 AM
To: solr-user 
Subject: Re: CDCR stress-test issues

FYI, CDCR support, as it exists in Solr today, has been deprecated in 8.6.
It suffers from serious design flaws and allows the kind of issues you are
observing to happen. While there may be workarounds, it is advisable not to
rely on CDCR in production.

Thanks,
Ishan

On Thu, 2 Jul, 2020, 1:12 am Oakley, Craig (NIH/NLM/NCBI) [C],
 wrote:

> For the record, it is not just Solr7.4 which has the problem. When I start
> afresh with Solr8.5.2, both symptoms persist.
>
> With Solr8.5.2, tlogs accumulate endlessly at the non-Leader nodes of the
> Source SolrCloud and are never released regardless of maxNumLogsToKeep
> setting
>
> And with Solr8.5.2, if four scripts run simultaneously for a few minutes,
> each script running a loop each iteration of which adds batches of 6
> records to the Source SolrCloud, a couple dozen records wind up on the
> Source without ever arriving at the Target SolrCloud (although the Target
> does have records which were added after the missing records).
>
> Does anyone yet have any suggestion how to get CDCR to work properly?
>
>
> -Original Message-
> From: Oakley, Craig (NIH/NLM/NCBI) [C] 
> Sent: Wednesday, June 24, 2020 9:46 AM
> To: solr-user@lucene.apache.org
> Subject: CDCR stress-test issues
>
> In attempting to stress-test CDCR (running Solr 7.4), I am running into a
> couple of issues.
>
> One is that the tlog files keep accumulating for some nodes in the CDCR
> system, particularly for the non-Leader nodes in the Source SolrCloud. No
> quantity of hard commits seem to cause any of these tlog files to be
> released. This can become a problem upon reboot if there are hundreds of
> thousands of tlog files, and Solr fails to start (complaining that there
> are too many open files).
>
> The tlogs had been accumulating on all the nodes of the CDCR set of
> SolrClouds until I added these two lines to the solrconfig.xml file (for
> testing purposes, using numbers much lower than in the examples):
> <int name="numRecordsToKeep">5</int>
> <int name="maxNumLogsToKeep">2</int>
> Since then, it is mostly the non-Leader nodes of the Source SolrCloud
> which accumulates tlog files (the Target SolrCloud does seem to have a
> tendency to clean up the tlog files, as does the Leader of the Source
> SolrCloud). If I use ADDREPLICAPROP and REBALANCELEADERS to change which
> node is the Leader, and if I then start adding more data, the tlogs on the
> new Leader sometimes will go away, but then the old Leader begins
> accumulating tlog files. I am dubious whether frequent reassignment of
> Leadership would be a practical solution.
>
> I also have several times attempted to simulate a production environment
> by running several loops simultaneously, each of which inserts multiple
> records on each iteration of the loop. Several times, I end up with a dozen
> records on (both replicas of) the Source which never make it to (either
> replica of) the Target. The Target has thousands of records which were
> inserted before the missing records, and thousands of records which were
> inserted after the missing records (and all these records, the replicated
> and the missing, were inserted by curl commands which only differed in
> sequential numbers incorporated into the values being inserted).
>
> I also have a question regarding SOLR-13141: the 11/Feb/19 comment says
> that the fix for Solr 7.3 had a problem; and the header says "Affects
> Version/s: 7.5, 7.6": does that indicate that Solr 7.4 is not affected?
>
> Are  there any suggestions?
>
> Thanks
>


Re: CDCR stress-test issues

2020-07-16 Thread Ishan Chattopadhyaya
FYI, CDCR support, as it exists in Solr today, has been deprecated in 8.6.
It suffers from serious design flaws and allows the kind of issues you are
observing to happen. While there may be workarounds, it is advisable not to
rely on CDCR in production.

Thanks,
Ishan

On Thu, 2 Jul, 2020, 1:12 am Oakley, Craig (NIH/NLM/NCBI) [C],
 wrote:

> For the record, it is not just Solr7.4 which has the problem. When I start
> afresh with Solr8.5.2, both symptoms persist.
>
> With Solr8.5.2, tlogs accumulate endlessly at the non-Leader nodes of the
> Source SolrCloud and are never released regardless of maxNumLogsToKeep
> setting
>
> And with Solr8.5.2, if four scripts run simultaneously for a few minutes,
> each script running a loop each iteration of which adds batches of 6
> records to the Source SolrCloud, a couple dozen records wind up on the
> Source without ever arriving at the Target SolrCloud (although the Target
> does have records which were added after the missing records).
>
> Does anyone yet have any suggestion how to get CDCR to work properly?
>
>
> -Original Message-
> From: Oakley, Craig (NIH/NLM/NCBI) [C] 
> Sent: Wednesday, June 24, 2020 9:46 AM
> To: solr-user@lucene.apache.org
> Subject: CDCR stress-test issues
>
> In attempting to stress-test CDCR (running Solr 7.4), I am running into a
> couple of issues.
>
> One is that the tlog files keep accumulating for some nodes in the CDCR
> system, particularly for the non-Leader nodes in the Source SolrCloud. No
> quantity of hard commits seem to cause any of these tlog files to be
> released. This can become a problem upon reboot if there are hundreds of
> thousands of tlog files, and Solr fails to start (complaining that there
> are too many open files).
>
> The tlogs had been accumulating on all the nodes of the CDCR set of
> SolrClouds until I added these two lines to the solrconfig.xml file (for
> testing purposes, using numbers much lower than in the examples):
> <int name="numRecordsToKeep">5</int>
> <int name="maxNumLogsToKeep">2</int>
> Since then, it is mostly the non-Leader nodes of the Source SolrCloud
> which accumulates tlog files (the Target SolrCloud does seem to have a
> tendency to clean up the tlog files, as does the Leader of the Source
> SolrCloud). If I use ADDREPLICAPROP and REBALANCELEADERS to change which
> node is the Leader, and if I then start adding more data, the tlogs on the
> new Leader sometimes will go away, but then the old Leader begins
> accumulating tlog files. I am dubious whether frequent reassignment of
> Leadership would be a practical solution.
>
> I also have several times attempted to simulate a production environment
> by running several loops simultaneously, each of which inserts multiple
> records on each iteration of the loop. Several times, I end up with a dozen
> records on (both replicas of) the Source which never make it to (either
> replica of) the Target. The Target has thousands of records which were
> inserted before the missing records, and thousands of records which were
> inserted after the missing records (and all these records, the replicated
> and the missing, were inserted by curl commands which only differed in
> sequential numbers incorporated into the values being inserted).
>
> I also have a question regarding SOLR-13141: the 11/Feb/19 comment says
> that the fix for Solr 7.3 had a problem; and the header says "Affects
> Version/s: 7.5, 7.6": does that indicate that Solr 7.4 is not affected?
>
> Are  there any suggestions?
>
> Thanks
>


UpdateProcessorChains -cdcr processor along with ignore commit processor

2020-07-15 Thread Natarajan, Rajeswari
Resending this again as I still could not make this work. So I would like to 
know whether it is even possible to have
both solr.CdcrUpdateProcessorFactory and 
solr.IgnoreCommitOptimizeUpdateProcessorFactory in solrconfig.xml and get both 
functionalities to work.
Please let me know.

Thank you,
Rajeswari
 
On 7/14/20, 12:40 PM, "Natarajan, Rajeswari"  
wrote:

Hi ,

Would like to have these two processors (cdcr and ignorecommit)  in 
solrconfig.xml .

    But CDCR fails with the error below, with either the cdcr-processor-chain or 
the ignore-commit-from-client chain:

version conflict for 60d35f0850afac66 expected=1671629672447737856 
actual=-1, retry=0 commError=false errorCode=409

[solrconfig.xml excerpt; the XML markup was stripped by the mailing list 
archive. It referenced the "cdcr-processor-chain" and contained a value of 200, 
apparently the statusCode for the IgnoreCommitOptimizeUpdateProcessorFactory.]


Also tried as below, getting an error complaining about the custom processor:

[second solrconfig.xml excerpt, also stripped by the archive; it appears to 
define a chain named "custom" and again contains a value of 200.]

Is there a way these two processors can be applied together?

Thanks,
Rajeswari



UpdateProcessorChains -cdcr processor along with ignore commit processor

2020-07-14 Thread Natarajan, Rajeswari
Hi ,

Would like to have these two processors (cdcr and ignorecommit)  in 
solrconfig.xml .

But CDCR fails with the error below, with either the cdcr-processor-chain or 
the ignore-commit-from-client chain:

version conflict for 60d35f0850afac66 expected=1671629672447737856 
actual=-1, retry=0 commError=false errorCode=409

[solrconfig.xml excerpt; the XML markup was stripped by the mailing list 
archive. It referenced the "cdcr-processor-chain" and contained a value of 200, 
apparently the statusCode for the IgnoreCommitOptimizeUpdateProcessorFactory.]




Also tried as below, getting an error complaining about the custom processor:

[second solrconfig.xml excerpt, also stripped by the archive; it appears to 
define a chain named "custom" and again contains a value of 200.]

Is there a way these two processors can be applied together?

Thanks,
Rajeswari


RE: CDCR stress-test issues

2020-07-01 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
For the record, it is not just Solr7.4 which has the problem. When I start 
afresh with Solr8.5.2, both symptoms persist.

With Solr8.5.2, tlogs accumulate endlessly at the non-Leader nodes of the 
Source SolrCloud and are never released regardless of maxNumLogsToKeep setting

And with Solr8.5.2, if four scripts run simultaneously for a few minutes, each 
script running a loop each iteration of which adds batches of 6 records to the 
Source SolrCloud, a couple dozen records wind up on the Source without ever 
arriving at the Target SolrCloud (although the Target does have records which 
were added after the missing records).

Does anyone yet have any suggestion how to get CDCR to work properly?


-Original Message-
From: Oakley, Craig (NIH/NLM/NCBI) [C]  
Sent: Wednesday, June 24, 2020 9:46 AM
To: solr-user@lucene.apache.org
Subject: CDCR stress-test issues

In attempting to stress-test CDCR (running Solr 7.4), I am running into a 
couple of issues.

One is that the tlog files keep accumulating for some nodes in the CDCR system, 
particularly for the non-Leader nodes in the Source SolrCloud. No quantity of 
hard commits seem to cause any of these tlog files to be released. This can 
become a problem upon reboot if there are hundreds of thousands of tlog files, 
and Solr fails to start (complaining that there are too many open files).

The tlogs had been accumulating on all the nodes of the CDCR set of SolrClouds 
until I added these two lines to the solrconfig.xml file (for testing purposes, 
using numbers much lower than in the examples):
<int name="numRecordsToKeep">5</int>
<int name="maxNumLogsToKeep">2</int>
Since then, it is mostly the non-Leader nodes of the Source SolrCloud which 
accumulates tlog files (the Target SolrCloud does seem to have a tendency to 
clean up the tlog files, as does the Leader of the Source SolrCloud). If I use 
ADDREPLICAPROP and REBALANCELEADERS to change which node is the Leader, and if 
I then start adding more data, the tlogs on the new Leader sometimes will go 
away, but then the old Leader begins accumulating tlog files. I am dubious 
whether frequent reassignment of Leadership would be a practical solution.

I also have several times attempted to simulate a production environment by 
running several loops simultaneously, each of which inserts multiple records on 
each iteration of the loop. Several times, I end up with a dozen records on 
(both replicas of) the Source which never make it to (either replica of) the 
Target. The Target has thousands of records which were inserted before the 
missing records, and thousands of records which were inserted after the missing 
records (and all these records, the replicated and the missing, were inserted 
by curl commands which only differed in sequential numbers incorporated into 
the values being inserted).

I also have a question regarding SOLR-13141: the 11/Feb/19 comment says that 
the fix for Solr 7.3 had a problem; and the header says "Affects Version/s: 
7.5, 7.6": does that indicate that Solr 7.4 is not affected?

Are  there any suggestions?

Thanks


Re: SOLR CDCR fails with JWT authorization configuration

2020-06-26 Thread Jan Høydahl
I found this in the documentation, 
https://lucene.apache.org/solr/guide/8_5/cdcr-architecture.html#cdcr-limitations :

    CDCR doesn’t support Basic Authentication features across clusters.

The JIRA for adding this capability is 
https://issues.apache.org/jira/browse/SOLR-11959 but it went stale in 2019.
You may add a comment there and hope for some traction, but don’t hold your 
breath...

Jan

> 26. jun. 2020 kl. 06:34 skrev Phatkar, Swapnil (Contractor) 
> :
> 
> Hi,
> 
> CDCR might be deprecated really soon now --> In this case, will there be any 
> alternative to this?
> 
> However, if this turns out to be not supported or a bug, then we can file a 
> JIRA issue. --> It would be great if you raise the JIRA ticket for it, so 
> that we are clearer about how it responds to such a scenario: 1. CDCR with 
> HTTPS and JWT authentication, and the necessary settings for it, including 
> security.json.
> 
> 
> Thanks 
> Swapnil 
> 
> 
> 
> -Original Message-
> From: Jan Høydahl  
> Sent: Thursday, June 25, 2020 6:50 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SOLR CDCR fails with JWT authorization configuration
> 
> EXTERNAL SENDER:   Exercise caution with links and attachments.
> 
> I’m mostly trying to identify whether what you are trying to do is a 
> supported option at all, or if perhaps CDCR is only tested without 
> authentication in place.
> You would also be interested in the fact that CDCR might be deprecated really 
> soon now, see https://issues.apache.org/jira/browse/SOLR-11718
> CDCR is complex. JWT is complex. Combining the two might be too much 
> unknown territory for beginners.
> 
> However, if this turns out to be not supported or a bug, then we can file a 
> JIRA issue. So far I hope that someone else with CDCR can give JWT a try to 
> reproduce what you are seeing.
> 
> Jan
> 
>> 25. jun. 2020 kl. 15:06 skrev Phatkar, Swapnil (Contractor) 
>> :
>> 
>> Hi,
>> 
>> 
>> 1. Solr is relying on PKI for the request (one cluster sends PKI 
>> header to the node in the other cluster)
>> --> I have not configured anything explicitly, just followed the steps 
>> mentioned at https://lucene.apache.org/solr/guide/8_4/cdcr-config.html. 
>> Is there any additional step?
>> 
>> 2. That fails since the sending node is unknown to the receiving node 
>> since it is in another cluster
>> -->  I think that obvious because Source cluster and Target clusters 
>> --> are different. What I know is once we configure zkhost of Target 
>> --> cluster in Source cluster in solrconfig.xml it establish 
>> --> connection. But I will
>> like to know is there any other setting ?
>> 
>> 3. Have you tried BasicAuth and do you have the same issue then?
>> --> Nope . We were using  "class":"solr.JWTAuthPlugin" . Do I need to add 
>> authorization also to overcome JWT authorization ??
>> 
>> 
>> Can you please guide me considering me as newbie :) . And it will be 
>> also good to get sample security.json
>> 
>> Thanks
>> 
>> -Original Message-
>> From: Jan Høydahl 
>> Sent: Thursday, June 25, 2020 5:25 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: SOLR CDCR fails with JWT authorization configuration
>> 
>> EXTERNAL SENDER:   Exercise caution with links and attachments.
>> 
>> Sorry, there is no forwardCredentials parameter for JWT, it is implicit. 
>> 
>> But from the response we can see two things:
>> 
>> 1. Solr is relying on PKI for the request (one cluster sends PKI 
>> header to the node in the other cluster) 2. That fails since the 
>> sending node is unknown to the receiving node since it is in another 
>> clus

RE: SOLR CDCR fails with JWT authorization configuration

2020-06-25 Thread Phatkar, Swapnil (Contractor)
Hi,

CDCR might be deprecated really soon now --> In this case, will there be any 
alternative to this?

However, if this turns out to be not supported or a bug, then we can file a 
JIRA issue. --> It would be great if you raise the JIRA ticket for it, so that 
we are clearer about how it responds to such a scenario: 1. CDCR with HTTPS and 
JWT authentication, and the necessary settings for it, including security.json.


Thanks 
Swapnil 



-Original Message-
From: Jan Høydahl  
Sent: Thursday, June 25, 2020 6:50 PM
To: solr-user@lucene.apache.org
Subject: Re: SOLR CDCR fails with JWT authorization configuration

EXTERNAL SENDER:   Exercise caution with links and attachments.

I’m mostly trying to identify whether what you are trying to do is a supported 
option at all, or if perhaps CDCR is only tested without authentication in 
place.
You would also be interested in the fact that CDCR might be deprecated really 
soon now, see https://issues.apache.org/jira/browse/SOLR-11718
CDCR is complex. JWT is complex. Combining the two might be too much 
unknown territory for beginners.

However, if this turns out to be not supported or a bug, then we can file a 
JIRA issue. So far I hope that someone else with CDCR can give JWT a try to 
reproduce what you are seeing.

Jan

> 25. jun. 2020 kl. 15:06 skrev Phatkar, Swapnil (Contractor) 
> :
> 
> Hi,
> 
> 
> 1. Solr is relying on PKI for the request (one cluster sends PKI 
> header to the node in the other cluster)
> --> I have not configured anything explicitly, just followed the steps 
> mentioned at https://lucene.apache.org/solr/guide/8_4/cdcr-config.html. 
> Is there any additional step?
> 
> 2. That fails since the sending node is unknown to the receiving node 
> since it is in another cluster
> -->  I think that obvious because Source cluster and Target clusters 
> --> are different. What I know is once we configure zkhost of Target 
> --> cluster in Source cluster in solrconfig.xml it establish 
> --> connection. But I will
> like to know is there any other setting ?
> 
> 3. Have you tried BasicAuth and do you have the same issue then?
> --> Nope . We were using  "class":"solr.JWTAuthPlugin" . Do I need to add 
> authorization also to overcome JWT authorization ??
> 
> 
> Can you please guide me considering me as newbie :) . And it will be 
> also good to get sample security.json
> 
> Thanks
> 
> -Original Message-
> From: Jan Høydahl 
> Sent: Thursday, June 25, 2020 5:25 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SOLR CDCR fails with JWT authorization configuration
> 
> EXTERNAL SENDER:   Exercise caution with links and attachments.
> 
> Sorry, there is no forwardCredentials parameter for JWT, it is implicit. 
> 
> But from the response we can see two things:
> 
> 1. Solr is relying on PKI for the request (one cluster sends PKI 
> header to the node in the other cluster) 2. That fails since the 
> sending node is unknown to the receiving node since it is in another 
> cluster
> 
> I’m not familiar with the CDCR code used here. Have you tried BasicAuth and 
> do you have the same issue then?
> 
> Jan
> 
> 
>> 25. jun. 2020 kl. 13:20 skrev Phatkar, Swapnil (Contractor) 
>> :
>> 
>> 
>> 
>> Whoever is sending calls to /solr/express_shard1_replica_n3/cdcr will have 
>> to make sure to forward JWT -- How do I forward JWT from source to target 
>> server ??
>> You could try 'forwardCredentials:true' in security.json -- How can I try  
>> this ?
>> 
>> Can you suggest me sample security.json which will address my issue mention 
>> in below mail trail:
>> 
>> I have security.json as given below : ( its just the format and 
>> values are removed as per policy )
>> 
>> {
>> "authentication":{
>>   "class":"solr.JWTAuthPlugin",
>>   "blockUnknown":true,
>>  "requireIss":false,

Re: SOLR CDCR fails with JWT authorization configuration

2020-06-25 Thread Jan Høydahl
I’m mostly trying to identify whether what you are trying to do is a supported 
option at all, or if perhaps CDCR is only tested without authentication in 
place.
You would also be interested in the fact that CDCR might be deprecated really 
soon now, see https://issues.apache.org/jira/browse/SOLR-11718
CDCR is complex. JWT is complex. Combining the two might be too much unknown 
territory for beginners.

However, if this turns out to be not supported or a bug, then we can file a 
JIRA issue. So far I hope that someone else with CDCR can give JWT a try to 
reproduce what you are seeing.

Jan

> 25. jun. 2020 kl. 15:06 skrev Phatkar, Swapnil (Contractor) 
> :
> 
> Hi,
> 
> 
> 1. Solr is relying on PKI for the request (one cluster sends PKI header to 
> the node in the other cluster) 
> --> I have not configured anything explicitly, just followed the steps 
> mentioned at https://lucene.apache.org/solr/guide/8_4/cdcr-config.html. Is there 
> any additional step?
> 
> 2. That fails since the sending node is unknown to the receiving node since 
> it is in another cluster 
> -->  I think that obvious because Source cluster and Target clusters are 
> different. What I know is once we configure zkhost of Target cluster in 
> Source cluster in solrconfig.xml it establish connection. But I will 
> like to know is there any other setting ?
> 
> 3. Have you tried BasicAuth and do you have the same issue then?
> --> Nope . We were using  "class":"solr.JWTAuthPlugin" . Do I need to add 
> authorization also to overcome JWT authorization ??
> 
> 
> Can you please guide me considering me as newbie :) . And it will be also 
> good to get sample security.json
> 
> Thanks 
> 
> -Original Message-
> From: Jan Høydahl  
> Sent: Thursday, June 25, 2020 5:25 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SOLR CDCR fails with JWT authorization configuration
> 
> EXTERNAL SENDER:   Exercise caution with links and attachments.
> 
> Sorry, there is no forwardCredentials parameter for JWT, it is implicit. 
> 
> But from the response we can see two things:
> 
> 1. Solr is relying on PKI for the request (one cluster sends PKI header to 
> the node in the other cluster) 2. That fails since the sending node is 
> unknown to the receiving node since it is in another cluster
> 
> I’m not familiar with the CDCR code used here. Have you tried BasicAuth and 
> do you have the same issue then?
> 
> Jan
> 
> 
>> 25. jun. 2020 kl. 13:20 skrev Phatkar, Swapnil (Contractor) 
>> :
>> 
>> 
>> 
>> Whoever is sending calls to /solr/express_shard1_replica_n3/cdcr will have 
>> to make sure to forward JWT -- How do I forward JWT from source to target 
>> server ??
>> You could try 'forwardCredentials:true' in security.json -- How can I try  
>> this ?
>> 
>> Can you suggest me sample security.json which will address my issue mention 
>> in below mail trail:
>> 
>> I have security.json as given below : ( its just the format and values 
>> are removed as per policy )
>> 
>> {
>> "authentication":{
>>   "class":"solr.JWTAuthPlugin",
>>   "blockUnknown":true,
>>  "requireIss":false,
>>  "requireExp":false,
>>  "issuers":[
>>  {
>>  "name":
>>  "clientId":
>>  "jwk":{
>>  "kty":"RSA",
>>  "n":
>>  "e":
>>  "d":
>>  "p":
>>  "q":
>>  "dp":
>>  "dq":
>>  "qi":
>>  "alg":"RS256",
>>  "kid":
>>  "use":
>>  }
>>  }
>>  ]
>> }
>> }
>> 
>> 
>> 
>> 
>> -Original Message-
>> From: Jan Høydahl 
>> Sent: Thursday, June 25, 2020 1:19 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: SOLR CDCR fails with JWT authorization configuration
>> 
>> EXTERNAL SENDER:   Exercise caution with links and attachments.
>> 
>> Are both clusters setup with the same Identity Provider, so the

RE: SOLR CDCR fails with JWT authorization configuration

2020-06-25 Thread Phatkar, Swapnil (Contractor)
Hi,


1. Solr is relying on PKI for the request (one cluster sends PKI header to the 
node in the other cluster) 
--> I have not configured anything explicitly, just followed the steps mentioned 
at https://lucene.apache.org/solr/guide/8_4/cdcr-config.html. Is there any 
additional step?

2. That fails since the sending node is unknown to the receiving node since it 
is in another cluster 
-->  I think that obvious because Source cluster and Target clusters are 
different. What I know is once we configure zkhost of Target cluster in Source 
cluster in solrconfig.xml it establish connection. But I will 
like to know is there any other setting ?

3. Have you tried BasicAuth and do you have the same issue then?
--> Nope . We were using  "class":"solr.JWTAuthPlugin" . Do I need to add 
authorization also to overcome JWT authorization ??


Can you please guide me considering me as newbie :) . And it will be also good 
to get sample security.json

Thanks 

-Original Message-
From: Jan Høydahl  
Sent: Thursday, June 25, 2020 5:25 PM
To: solr-user@lucene.apache.org
Subject: Re: SOLR CDCR fails with JWT authorization configuration

EXTERNAL SENDER:   Exercise caution with links and attachments.

Sorry, there is no forwardCredentials parameter for JWT, it is implicit. 

But from the response we can see two things:

1. Solr is relying on PKI for the request (one cluster sends PKI header to the 
node in the other cluster) 2. That fails since the sending node is unknown to 
the receiving node since it is in another cluster

I’m not familiar with the CDCR code used here. Have you tried BasicAuth and do 
you have the same issue then?

Jan


> 25. jun. 2020 kl. 13:20 skrev Phatkar, Swapnil (Contractor) 
> :
> 
> 
> 
> Whoever is sending calls to /solr/express_shard1_replica_n3/cdcr will have to 
> make sure to forward JWT -- How do I forward JWT from source to target server 
> ??
> You could try 'forwardCredentials:true' in security.json -- How can I try  
> this ?
> 
> Can you suggest me sample security.json which will address my issue mention 
> in below mail trail:
> 
> I have security.json as given below : ( its just the format and values 
> are removed as per policy )
> 
> {
>  "authentication":{
>"class":"solr.JWTAuthPlugin",
>"blockUnknown":true,
>   "requireIss":false,
>   "requireExp":false,
>   "issuers":[
>   {
>   "name":
>   "clientId":
>   "jwk":{
>   "kty":"RSA",
>   "n":
>   "e":
>   "d":
>   "p":
>   "q":
>   "dp":
>   "dq":
>   "qi":
>   "alg":"RS256",
>   "kid":
>   "use":
>   }
>   }
>   ]
>  }
> }
> 
> 
> 
> 
> -Original Message-
> From: Jan Høydahl 
> Sent: Thursday, June 25, 2020 1:19 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SOLR CDCR fails with JWT authorization configuration
> 
> EXTERNAL SENDER:   Exercise caution with links and attachments.
> 
> Are both clusters setup with the same Identity Provider, so the same JWT 
> token would be valid for both clusters?
> 
> If so, it should be (theoretically) possible to have the clusters talk to 
> each other, if you can get them to forward the Authorization header with the 
> JWT.
> Whoever is sending calls to /solr/express_shard1_replica_n3/cdcr will have to 
> make sure to forward JWT and not just rely on PKI.
> PKI won’t work since the two clusters have different ZK and Solr by default 
> only trust PKI between nodes registered in ZK.
> 
> You could try 'forwardCredentials:true' in security.json, but I’m not sure 
> that is enough here. There may be code changes needed in CDCR components.
> 
> Jan
> 
>> 24. jun. 2020 kl. 19:42 skrev Phatkar, Swapnil (Contractor) 
>> :
>> 
>> Hi Team ,
>> 
>> I am trying to configure CDCR for SOLR 8.4.1 .
> With the provided configuration I am able to replicate the indexes from 
>> Source server to Target server. This setup even works with SSL configuration 
>> using Https protocol.
>> But the moment I have introduced JWT authorization by enforcing 
>> security.json on both the server. I g

Re: SOLR CDCR fails with JWT authorization configuration

2020-06-25 Thread Jan Høydahl
Sorry, there is no forwardCredentials parameter for JWT, it is implicit. 

But from the response we can see two things:

1. Solr is relying on PKI for the request (one cluster sends PKI header to the 
node in the other cluster)
2. That fails since the sending node is unknown to the receiving node since it 
is in another cluster

I’m not familiar with the CDCR code used here. Have you tried BasicAuth and do 
you have the same issue then?

Jan


> 25. jun. 2020 kl. 13:20 skrev Phatkar, Swapnil (Contractor) 
> :
> 
> 
> 
> Whoever is sending calls to /solr/express_shard1_replica_n3/cdcr will have to 
> make sure to forward JWT -- How do I forward JWT from source to target server 
> ??
> You could try 'forwardCredentials:true' in security.json -- How can I try  
> this ?
> 
> Can you suggest me sample security.json which will address my issue mention 
> in below mail trail:
> 
> I have security.json as given below : ( its just the format and values are 
> removed as per policy )
> 
> {
>  "authentication":{
>"class":"solr.JWTAuthPlugin",
>"blockUnknown":true,
>   "requireIss":false,
>   "requireExp":false,
>   "issuers":[
>   {
>   "name":
>   "clientId":
>   "jwk":{
>   "kty":"RSA",
>   "n":
>   "e":
>   "d":
>   "p":
>   "q":
>   "dp":
>   "dq":
>   "qi":
>   "alg":"RS256",
>   "kid":
>   "use":
>   }
>   }
>   ]
>  }
> }
> 
> 
> 
> 
> -Original Message-
> From: Jan Høydahl  
> Sent: Thursday, June 25, 2020 1:19 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SOLR CDCR fails with JWT authorization configuration
> 
> EXTERNAL SENDER:   Exercise caution with links and attachments.
> 
> Are both clusters setup with the same Identity Provider, so the same JWT 
> token would be valid for both clusters?
> 
> If so, it should be (theoretically) possible to have the clusters talk to 
> each other, if you can get them to forward the Authorization header with the 
> JWT.
> Whoever is sending calls to /solr/express_shard1_replica_n3/cdcr will have to 
> make sure to forward JWT and not just rely on PKI.
> PKI won’t work since the two clusters have different ZK and Solr by default 
> only trust PKI between nodes registered in ZK.
> 
> You could try 'forwardCredentials:true' in security.json, but I’m not sure 
> that is enough here. There may be code changes needed in CDCR components.
> 
> Jan
> 
>> 24. jun. 2020 kl. 19:42 skrev Phatkar, Swapnil (Contractor) 
>> :
>> 
>> Hi Team ,
>> 
>> I am trying to configure CDCR for SOLR 8.4.1 .
> With the provided configuration I am able to replicate the indexes from 
>> Source server to Target server. This setup even works with SSL configuration 
>> using Https protocol.
>> But the moment I have introduced JWT authorization by enforcing 
>> security.json on both the server. I got an error at Target server side as 
>> shown below.
>> Due to which the index were not getting replicated at target server.
>> 
>> ERROR :
>> 
>> 0200623 12:29:55.956 [ERROR] {qtp892083096-82} [   ] 
>> [org.apache.solr.security.PKIAuthenticationPlugin, 119] |
>> Could not decipher a header :8983_solr $$$. No principal 
>> set
>> 
>> Caused by: java.util.concurrent.ExecutionException: 
>> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
>> Error from server at 
>> https://:8983/solr/express_shard1_replica_n3: Expected mime 
>> type application/octet-stream but got text/html.
>> Error 401 Require authentication
>> HTTP ERROR 401
>> Problem accessing /solr/express_shard1_replica_n3/cdcr. Reason:
>> Require authentication
>> 
>> 
>> 
>> 
>> Caused by: 
>> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
>> Error from server at
>> https://:8983/solr/express_shard1_replica_n3: Expected mime 
>> type a

RE: SOLR CDCR fails with JWT authorization configuration

2020-06-25 Thread Phatkar, Swapnil (Contractor)


Whoever is sending calls to /solr/express_shard1_replica_n3/cdcr will have to 
make sure to forward JWT -- How do I forward JWT from source to target server ??
You could try 'forwardCredentials:true' in security.json -- How can I try  this 
?

Can you suggest me sample security.json which will address my issue mention in 
below mail trail:

I have security.json as given below : ( its just the format and values are 
removed as per policy )

{
  "authentication":{
"class":"solr.JWTAuthPlugin",
"blockUnknown":true,
"requireIss":false,
"requireExp":false,
"issuers":[
{
"name":
"clientId":
"jwk":{
"kty":"RSA",
"n":
"e":
"d":
"p":
"q":
"dp":
"dq":
"qi":
"alg":"RS256",
"kid":
"use":
}
}
]
  }
}




-Original Message-
From: Jan Høydahl  
Sent: Thursday, June 25, 2020 1:19 PM
To: solr-user@lucene.apache.org
Subject: Re: SOLR CDCR fails with JWT authorization configuration

EXTERNAL SENDER:   Exercise caution with links and attachments.

Are both clusters setup with the same Identity Provider, so the same JWT token 
would be valid for both clusters?

If so, it should be (theoretically) possible to have the clusters talk to each 
other, if you can get them to forward the Authorization header with the JWT.
Whoever is sending calls to /solr/express_shard1_replica_n3/cdcr will have to 
make sure to forward JWT and not just rely on PKI.
PKI won’t work since the two clusters have different ZK and Solr by default 
only trust PKI between nodes registered in ZK.

You could try 'forwardCredentials:true' in security.json, but I’m not sure that 
is enough here. There may be code changes needed in CDCR components.

Jan

> 24. jun. 2020 kl. 19:42 skrev Phatkar, Swapnil (Contractor) 
> :
> 
> Hi Team ,
> 
> I am trying to configure CDCR for SOLR 8.4.1 .
> With the provided configuration I am able to replicate the indexes from 
> Source server to Target server. This setup even works with SSL configuration 
> using Https protocol.
> But the moment I have introduced JWT authorization by enforcing security.json 
> on both the server. I got an error at Target server side as shown below.
> Due to which the index were not getting replicated at target server.
> 
> ERROR :
> 
> 0200623 12:29:55.956 [ERROR] {qtp892083096-82} [   ] 
> [org.apache.solr.security.PKIAuthenticationPlugin, 119] |
> Could not decipher a header :8983_solr $$$. No principal 
> set
> 
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
> Error from server at 
> https://:8983/solr/express_shard1_replica_n3: Expected mime 
> type application/octet-stream but got text/html.
> Error 401 Require authentication
> HTTP ERROR 401
> Problem accessing /solr/express_shard1_replica_n3/cdcr. Reason:
> Require authentication
> 
> 
> 
> 
> Caused by: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Error from server at
> https://:8983/solr/express_shard1_replica_n3: Expected mime 
> type application/octet-stream but got text/html.
> Error 401 Require authentication
> HTTP ERROR 401
> Problem accessing /solr/express_shard1_replica_n3/cdcr. Reason:
> Require authentication
> 
> 
> 
>at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:629)
>at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:265)
>at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248)
>at 
> org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1290)
>at 
> org.apache.solr.handler.CdcrRequestHandler$SliceCheckpointCallable.call(CdcrRequestHandler.java:868)
>at 
> org.apache.solr.handler.CdcrRequestHandler$SliceCheckpointCallable.cal
> l(CdcrRequestHandler.java:845)
> 
> 
> Thanks and Regards,
> Swapnil Phatkar
> 9167320216
> 



Re: SOLR CDCR fails with JWT authorization configuration

2020-06-25 Thread Jan Høydahl
Are both clusters setup with the same Identity Provider, so the same JWT token 
would be valid for both clusters?

If so, it should be (theoretically) possible to have the clusters talk to each 
other, if you can get them to forward the Authorization header with the JWT.
Whoever is sending calls to /solr/express_shard1_replica_n3/cdcr will have to 
make sure to forward JWT and not just rely on PKI.
PKI won’t work since the two clusters have different ZK and Solr by default 
only trust PKI between nodes registered in ZK.

You could try 'forwardCredentials:true' in security.json, but I’m not sure that 
is enough here. There may be code changes needed in CDCR components.

Jan

> 24. jun. 2020 kl. 19:42 skrev Phatkar, Swapnil (Contractor) 
> :
> 
> Hi Team ,
> 
> I am trying to configure CDCR for SOLR 8.4.1 .
> With the provided configuration I am able to replicate the indexes from 
> Source server to Target server. This setup even works with SSL configuration 
> using Https protocol.
> But the moment I have introduced JWT authorization by enforcing security.json 
> on both the server. I got an error at Target server side as shown below.
> Due to which the index were not getting replicated at target server.
> 
> ERROR :
> 
> 0200623 12:29:55.956 [ERROR] {qtp892083096-82} [   ] 
> [org.apache.solr.security.PKIAuthenticationPlugin, 119] |
> Could not decipher a header :8983_solr
> $$$. No principal set
> 
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
> Error from server at https://:8983/solr/express_shard1_replica_n3: 
> Expected mime type application/octet-stream but got text/html. 
> 
> 
> Error 401 Require authentication
> 
> HTTP ERROR 401
> Problem accessing /solr/express_shard1_replica_n3/cdcr. Reason:
> Require authentication
> 
> 
> 
> 
> Caused by: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
> from server at
> https://:8983/solr/express_shard1_replica_n3: Expected mime type 
> application/octet-stream but got text/html. 
> 
> 
> Error 401 Require authentication
> 
> HTTP ERROR 401
> Problem accessing /solr/express_shard1_replica_n3/cdcr. Reason:
> Require authentication
> 
> 
> 
>at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:629)
>at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:265)
>at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248)
>at 
> org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1290)
>at 
> org.apache.solr.handler.CdcrRequestHandler$SliceCheckpointCallable.call(CdcrRequestHandler.java:868)
>at 
> org.apache.solr.handler.CdcrRequestHandler$SliceCheckpointCallable.call(CdcrRequestHandler.java:845)
> 
> 
> Thanks and Regards,
> Swapnil Phatkar
> 9167320216
> 



SOLR CDCR fails with JWT authorization configuration

2020-06-24 Thread Phatkar, Swapnil (Contractor)
Hi Team ,

I am trying to configure CDCR for SOLR 8.4.1 .
With the provided configuration I am able to replicate the indexes from Source 
server to Target server. This setup even works with SSL configuration using 
Https protocol.
But the moment I introduced JWT authorization by enforcing security.json on 
both servers, I got an error on the Target server side, as shown below, due to 
which the indexes were not getting replicated on the target server.

ERROR :

0200623 12:29:55.956 [ERROR] {qtp892083096-82} [   ] 
[org.apache.solr.security.PKIAuthenticationPlugin, 119] |
Could not decipher a header :8983_solr
$$$. No principal set

 Caused by: java.util.concurrent.ExecutionException: 
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
Error from server at https://:8983/solr/express_shard1_replica_n3: 
Expected mime type application/octet-stream but got text/html. 


Error 401 Require authentication

HTTP ERROR 401
Problem accessing /solr/express_shard1_replica_n3/cdcr. Reason:
Require authentication




Caused by: 
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at
https://:8983/solr/express_shard1_replica_n3: Expected mime type 
application/octet-stream but got text/html. 


Error 401 Require authentication

HTTP ERROR 401
Problem accessing /solr/express_shard1_replica_n3/cdcr. Reason:
Require authentication



at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:629)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:265)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248)
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1290)
at 
org.apache.solr.handler.CdcrRequestHandler$SliceCheckpointCallable.call(CdcrRequestHandler.java:868)
at 
org.apache.solr.handler.CdcrRequestHandler$SliceCheckpointCallable.call(CdcrRequestHandler.java:845)


Thanks and Regards,
Swapnil Phatkar
9167320216



Re: CDCR stress-test issues

2020-06-24 Thread matthew sporleder
On Wed, Jun 24, 2020 at 9:46 AM Oakley, Craig (NIH/NLM/NCBI) [C]
 wrote:
>
> In attempting to stress-test CDCR (running Solr 7.4), I am running into a 
> couple of issues.
>
> One is that the tlog files keep accumulating for some nodes in the CDCR 
> system, particularly for the non-Leader nodes in the Source SolrCloud. No 
> quantity of hard commits seem to cause any of these tlog files to be 
> released. This can become a problem upon reboot if there are hundreds of 
> thousands of tlog files, and Solr fails to start (complaining that there are 
> too many open files).
>
> The tlogs had been accumulating on all the nodes of the CDCR set of 
> SolrClouds until I added these two lines to the solrconfig.xml file (for 
> testing purposes, using numbers much lower than in the examples):
> <int name="numRecordsToKeep">5</int>
> <int name="maxNumLogsToKeep">2</int>
> Since then, it is mostly the non-Leader nodes of the Source SolrCloud which 
> accumulates tlog files (the Target SolrCloud does seem to have a tendency to 
> clean up the tlog files, as does the Leader of the Source SolrCloud). If I 
> use ADDREPLICAPROP and REBALANCELEADERS to change which node is the Leader, 
> and if I then start adding more data, the tlogs on the new Leader sometimes 
> will go away, but then the old Leader begins accumulating tlog files. I am 
> dubious whether frequent reassignment of Leadership would be a practical 
> solution.
>
> I also have several times attempted to simulate a production environment by 
> running several loops simultaneously, each of which inserts multiple records 
> on each iteration of the loop. Several times, I end up with a dozen records 
> on (both replicas of) the Source which never make it to (either replica of) 
> the Target. The Target has thousands of records which were inserted before 
> the missing records, and thousands of records which were inserted after the 
> missing records (and all these records, the replicated and the missing, were 
> inserted by curl commands which only differed in sequential numbers 
> incorporated into the values being inserted).
>
> I also have a question regarding SOLR-13141: the 11/Feb/19 comment says that 
> the fix for Solr 7.3 had a problem; and the header says "Affects Version/s: 
> 7.5, 7.6": does that indicate that Solr 7.4 is not affected?
>
> Are  there any suggestions?
>
> Thanks

Just going to "me too" where i've had (non cdcr) installs accumulate
tlogs until eventual rebuilds or crashes.


CDCR stress-test issues

2020-06-24 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
In attempting to stress-test CDCR (running Solr 7.4), I am running into a 
couple of issues.

One is that the tlog files keep accumulating for some nodes in the CDCR system, 
particularly for the non-Leader nodes in the Source SolrCloud. No quantity of 
hard commits seem to cause any of these tlog files to be released. This can 
become a problem upon reboot if there are hundreds of thousands of tlog files, 
and Solr fails to start (complaining that there are too many open files).

The tlogs had been accumulating on all the nodes of the CDCR set of SolrClouds 
until I added these two lines to the solrconfig.xml file (for testing purposes, 
using numbers much lower than in the examples):
<int name="numRecordsToKeep">5</int>
<int name="maxNumLogsToKeep">2</int>
Since then, it is mostly the non-Leader nodes of the Source SolrCloud which 
accumulates tlog files (the Target SolrCloud does seem to have a tendency to 
clean up the tlog files, as does the Leader of the Source SolrCloud). If I use 
ADDREPLICAPROP and REBALANCELEADERS to change which node is the Leader, and if 
I then start adding more data, the tlogs on the new Leader sometimes will go 
away, but then the old Leader begins accumulating tlog files. I am dubious 
whether frequent reassignment of Leadership would be a practical solution.
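
(For context, these two parameters live inside the updateLog section of the 
updateHandler in solrconfig.xml; a minimal sketch with the test values above, 
assuming the CdcrUpdateLog class used for CDCR, would be:

    <updateLog class="solr.CdcrUpdateLog">
      <str name="dir">${solr.ulog.dir:}</str>
      <int name="numRecordsToKeep">5</int>
      <int name="maxNumLogsToKeep">2</int>
    </updateLog>
)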

I also have several times attempted to simulate a production environment by 
running several loops simultaneously, each of which inserts multiple records on 
each iteration of the loop. Several times, I end up with a dozen records on 
(both replicas of) the Source which never make it to (either replica of) the 
Target. The Target has thousands of records which were inserted before the 
missing records, and thousands of records which were inserted after the missing 
records (and all these records, the replicated and the missing, were inserted 
by curl commands which only differed in sequential numbers incorporated into 
the values being inserted).

I also have a question regarding SOLR-13141: the 11/Feb/19 comment says that 
the fix for Solr 7.3 had a problem; and the header says "Affects Version/s: 
7.5, 7.6": does that indicate that Solr 7.4 is not affected?

Are  there any suggestions?

Thanks


RE: CDCR behaviour

2020-06-08 Thread Gell-Holleron, Daniel
HI Jason, 

Thanks for this. Without screenshots this is what I get:
Site A
Last Modified:less than a minute ago
Num Docs:5455
Max Doc:5524
Heap Memory Usage:-1
Deleted Docs:69
Version:699
Segment Count:3
Current: Y

Site B
Last Modified:3 days ago
Num Docs:5454
Max Doc:5523
Heap Memory Usage:-1
Deleted Docs:69
Version:640
Segment Count:3
Current: N

I noticed that if I run the command 
http://hostname:8983/solr/SiteB-Collection/update/?commit=true the index would 
then be current. 

I've messed around with auto commit settings in the solrconfig.xml file but had 
no success.

Any help would be greatly appreciated. 

Thanks, 

Daniel 

-Original Message-
From: Jason Gerlowski  
Sent: 05 June 2020 12:18
To: solr-user@lucene.apache.org
Subject: Re: CDCR behaviour

Hi Daniel,

Just a heads up that attachments and images are stripped pretty aggressively by 
the mailing list - none of your images made it through.
You might have more success linking to the images in Dropbox or some other 
online storage medium.

Best,

Jason

On Thu, Jun 4, 2020 at 10:55 AM Gell-Holleron, Daniel < 
daniel.gell-holle...@gb.unisys.com> wrote:

> Hi,
>
>
>
> Looking for some advice; I have sent a few questions on CDCR over the last 
> couple of days.
>
>
>
> I just want to see if this is expected behavior from Solr or not?
>
>
>
> When a document is added to Site A, it is then supposed to replicate 
> across, however in the statistics page I see the following:
>
>
>
> Site A
>
>
>
>
> Site B
>
>
>
>
>
> When I perform a search on Site B through the Solr admin page, I do 
> get results (which I find strange). The only way to get the num docs 
> parameter to match is to restart Solr; I then get the below:
>
>
>
>
>
> I just want to know whether this behavior is expected or is a bug? My 
> expectation is that the data will always be current between the two sites.
>
>
>
> Thanks,
>
> Daniel
>
>
>


Re: CDCR behaviour

2020-06-05 Thread Jason Gerlowski
Hi Daniel,

Just a heads up that attachments and images are stripped pretty
aggressively by the mailing list - none of your images made it through.
You might have more success linking to the images in Dropbox or some other
online storage medium.

Best,

Jason

On Thu, Jun 4, 2020 at 10:55 AM Gell-Holleron, Daniel <
daniel.gell-holle...@gb.unisys.com> wrote:

> Hi,
>
>
>
> Looking for some advice; I've sent a few questions on CDCR over the last couple of
> days.
>
>
>
> I just want to see if this is expected behavior from Solr or not?
>
>
>
> When a document is added to Site A, it is then supposed to replicate
> across, however in the statistics page I see the following:
>
>
>
> Site A
>
>
>
>
> Site B
>
>
>
>
>
> When I perform a search on Site B through the Solr admin page, I do get
> results (which I find strange). The only way to get the num docs parameter
> to match is to restart Solr; I then get the below:
>
>
>
>
>
> I just want to know whether this behavior is expected or is a bug? My
> expectation is that the data will always be current between the two sites.
>
>
>
> Thanks,
>
> Daniel
>
>
>


CDCR behaviour

2020-06-04 Thread Gell-Holleron, Daniel
Hi,

Looking for some advice; I've sent a few questions on CDCR over the last couple of days.

I just want to see if this is expected behavior from Solr or not?

When a document is added to Site A, it is then supposed to replicate across, 
however in the statistics page I see the following:

Site A

[cid:image001.png@01D63A88.8AFB3310]

Site B

[cid:image002.png@01D63A88.8AFB3310]

When I perform a search on Site B through the Solr admin page, I do get results 
(which I find strange). The only way to get the num docs parameter to match 
is to restart Solr; I then get the below:

[cid:image003.png@01D63A88.8AFB3310]

I just want to know whether this behavior is expected or is a bug? My 
expectation is that the data will always be current between the two sites.

Thanks,

Daniel



Bi-Directional CDCR

2020-06-03 Thread Gell-Holleron, Daniel
Hi there,

I need some advice on how Bi-Directional CDCR is properly configured.

I've created a collection on Site A (3 Solr nodes, 5 ZooKeepers). I've also 
created a collection on site B (3 Solr nodes, 5 ZooKeepers). These both have 
the same number of shards (not sure if that is a factor or not?)

I've configured the solrconfig.xml file as below on SiteA. I've then done the 
same on SiteB, where zkHosts are siteA's and the source and target have 
switched around. Once these were done I then ran the config update to ZooKeeper 
on both sites.


${solr.ulog.dir:}
${solr.ulog.numVersionBuckets:65536}
  


  
  
cdcr-processor-chain
  



  
  



  
siteA-zook01:2181,siteA-zook02:2181,siteA-zook03:2181,siteA-zook04:2181,siteA-zook05:2181

siteACollection
siteBCollection
  

  
8
1000
128
  

  
1000
  

After this I then did the following:


  *   Start ZooKeeper on Site A
  *   Start ZooKeeper on Site B
  *   Start SolrCloud on Site A
  *   Start SolrCloud on Site B
  *   I then activated the CDCR on Site A and Site B using the CDCR API
  *   I then disabled the buffer on Site A and Site B using the Disable Buffer API (example calls for these last two steps are sketched below)
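
For reference, the CDCR API and buffer calls in the last two steps looked roughly like this (the Solr host names are placeholders):

# start CDCR on both sides
curl "http://siteA-solr01:8983/solr/siteACollection/cdcr?action=START"
curl "http://siteB-solr01:8983/solr/siteBCollection/cdcr?action=START"
# disable the update log buffer on both sides
curl "http://siteA-solr01:8983/solr/siteACollection/cdcr?action=DISABLEBUFFER"
curl "http://siteB-solr01:8983/solr/siteBCollection/cdcr?action=DISABLEBUFFER"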

When started up, all documents on Site A appeared to synchronise across to Site 
B and their corresponding shards. However, when I create a new document, it is 
indexed on Site A but doesn't make it across to Site B. Site B will however recognise 
that its number of documents isn't current.

Not sure if I have missed something along the way here? I'm using Solr version 
7.7.1 on a Windows Server 2016 OS.

Thanks,

Daniel



Solr TLS for CDCR

2020-05-14 Thread ChienHuaWang
Does anyone have experience setting up TLS for Solr CDCR?

I read the documentation:
https://lucene.apache.org/solr/guide/7_6/enabling-ssl.html
Would this apply to CDCR once enabled, or do we need additional configuration
for CDCR?

Appreciate any feedback




--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Tlogs are not purged when CDCR is enabled

2020-01-08 Thread Louis
Another finding: no matter how I try to disable the buffer with the
following setup on the target node, it is always enabled at first.


  
  
disabled
  


Once I call the CDCR API to disable the buffer, it does become disabled. I wonder if
https://issues.apache.org/jira/browse/SOLR-11652 is related to this issue.

How can I make the default state of the buffer disabled if this setup doesn't
work?
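
For reference, this is the kind of call I have to make after startup to get it disabled (host and collection names are placeholders):

# check the current CDCR/buffer state, then disable the buffer explicitly
curl "http://target-host:8983/solr/myTargetCollection/cdcr?action=STATUS"
curl "http://target-host:8983/solr/myTargetCollection/cdcr?action=DISABLEBUFFER"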



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


tlogs are not purged when CDCR is enabled

2020-01-08 Thread Louis
Using Solr 7.7.3-snapshot, 1 shard + 3 replicas on source and target cluster

When unidirectional CDCR enabled and buffer disabled, my understanding is,
when data is successfully forwarded to target and committed, tlogs on both
source and target should be purged.

However, the source node doesn't purge tlogs no matter what I try (manual
commits as well), while tlogs on the target are purged. (If I turn off CDCR and
import data, the tlogs are cleaned up nicely.)
 
So I tested with some queries.. and there are no errors. queue size is 0,
and the last processed version is not -1 either.

I also double-checked that the CDCR buffer is disabled on both source and target, and
CDCR (unidirectional) data replication is working fine (except for the fact that
tlogs keep growing).

What am I missing and what else should I check next?

$ curl -k
https://localhost:8983/solr/tbh_manuals_uni_shard1_replica_n2/cdcr?action=QUEUES
{
  "responseHeader":{
"status":0,
"QTime":0},
  "queues":[
"host1:8981,host2:8981,host3:8981/solr",[
  "tbh_manuals_uni",[
"queueSize",0,
"lastTimestamp","2020-01-08T23:16:26.899Z"]]],
  "tlogTotalSize":503573,
  "tlogTotalCount":278,
  "updateLogSynchronizer":"stopped"}

$ curl -k
https://localhost:8983/solr/tbh_manuals_uni_shard1_replica_n2/cdcr?action=ERRORS
{
  "responseHeader":{
"status":0,
"QTime":1},
  "errors":[
"host1:8981,host2:8981,host3:8981/solr",[
  "tbh_manuals_uni",[
"consecutiveErrors",0,
"bad_request",0,
"internal",0,
"last",[}

$ curl -k
https://localhost:8983/solr/tbh_manuals_uni_shard1_replica_n2/cdcr?action=LASTPROCESSEDVERSION
{
  "responseHeader":{
"status":0,
"QTime":0},
  "lastProcessedVersion":1655203836093005824}



I actually see some errors in the zookeeper.out file, only on the target's leader
node, as follows. Honestly, however, I don't know what they mean.



2020-01-08 15:11:42,740 [myid:2] - INFO  [ProcessThread(sid:2
cport:-1)::PrepRequestProcessor@653] - Got user-level KeeperException when
processing sessionid:0x301d2ecaf590008 type:create cxid:0xd2
zxid:0x300b4 txntype:-1 reqpath:n/a Error Path:/solr/collections
Error:KeeperErrorCode = NodeExists for /solr/collections
2020-01-08 15:11:42,742 [myid:2] - INFO  [ProcessThread(sid:2
cport:-1)::PrepRequestProcessor@653] - Got user-level KeeperException when
processing sessionid:0x301d2ecaf590008 type:create cxid:0xd3
zxid:0x300b5 txntype:-1 reqpath:n/a Error
Path:/solr/collections/tbh_manuals_uni Error:KeeperErrorCode = NodeExists
for /solr/collections/tbh_manuals_uni
2020-01-08 15:11:42,744 [myid:2] - INFO  [ProcessThread(sid:2
cport:-1)::PrepRequestProcessor@653] - Got user-level KeeperException when
processing sessionid:0x301d2ecaf590008 type:create cxid:0xd4
zxid:0x300b6 txntype:-1 reqpath:n/a Error
Path:/solr/collections/tbh_manuals_uni/terms Error:KeeperErrorCode =
NodeExists for /solr/collections/tbh_manuals_uni/terms
2020-01-08 15:11:42,745 [myid:2] - INFO  [ProcessThread(sid:2
cport:-1)::PrepRequestProcessor@653] - Got user-level KeeperException when
processing sessionid:0x301d2ecaf590008 type:create cxid:0xd5
zxid:0x300b7 txntype:-1 reqpath:n/a Error
Path:/solr/collections/tbh_manuals_uni/terms/shard1 Error:KeeperErrorCode =
NodeExists for /solr/collections/tbh_manuals_uni/terms/shard1
2020-01-08 15:11:42,821 [myid:2] - INFO  [ProcessThread(sid:2
cport:-1)::PrepRequestProcessor@653] - Got user-level KeeperException when
processing sessionid:0x301d2ecaf590005 type:create cxid:0x23c
zxid:0x300ba txntype:-1 reqpath:n/a Error Path:/solr/collections
Error:KeeperErrorCode = NodeExists for /solr/collections
2020-01-08 15:11:42,823 [myid:2] - INFO  [ProcessThread(sid:2
cport:-1)::PrepRequestProcessor@653] - Got user-level KeeperException when
processing sessionid:0x301d2ecaf590005 type:create cxid:0x23d
zxid:0x300bb txntype:-1 reqpath:n/a Error
Path:/solr/collections/tbh_manuals_uni Error:KeeperErrorCode = NodeExists
for /solr/collections/tbh_manuals_uni
2020-01-08 15:11:42,825 [myid:2] - INFO  [ProcessThread(sid:2
cport:-1)::PrepRequestProcessor@653] - Got user-level KeeperException when
processing sessionid:0x301d2ecaf590005 type:create cxid:0x23e
zxid:0x300bc txntype:-1 reqpath:n/a Error
Path:/solr/collections/tbh_manuals_uni/terms Error:KeeperErrorCode =
NodeExists for /solr/collections/tbh_manuals_uni/terms
2020-01-08 15:11:42,827 [myid:2] - INFO  [ProcessThread(sid:2
cport:-1)::PrepRequestProcessor@653] - Got user-level KeeperException when
processing sessionid:0x301d2ecaf590005 type:create cxid:0x23f
zxid:0x300bd txntype:-1 reqpath:n/a Error
Path:/solr/collections/tbh_manuals_uni/terms/shard1 Error:KeeperErrorCode =
NodeExists for /solr/collections/tbh_

source cluster sends incorrect recovery request to target cluster when CDCR is enabled

2020-01-07 Thread alwaysbluesky
Hi,

Running Solr 7.7.2, cluster with 3 replicas

When CDCR is enabled, one of the target nodes gets an incorrect recovery
request.

Below is the content of the state.json file from the zookeeper.

"shards":{"shard1":{
"range":"8000-7fff",
"state":"active",
"replicas":{
  "core_node3":{
"core":"tbh_manuals_test_bi2_shard1_replica_n1",
"base_url":"https://host1:8983/solr",
"node_name":"host1:8983_solr",
"state":"active",
"type":"NRT",
"force_set_state":"false"},
  "core_node5":{
"core":"tbh_manuals_test_bi2_shard1_replica_n2",
"base_url":"https://host2:8983/solr",
"node_name":"host2:8983_solr",
"state":"active",
"type":"NRT",
"force_set_state":"false",
"leader":"true"},
  "core_node6":{
"core":"tbh_manuals_test_bi2_shard1_replica_n4",
"base_url":"https://host3:8983/solr",
"node_name":"host3:8983_solr",
"state":"active",
"type":"NRT",
"force_set_state":"false"}}

As we see, host1 doesn't have tbh_manuals_test_bi2_shard1_replica_n4.
However, host1 is receiving the request that
tbh_manuals_test_bi2_shard1_replica_n4 be recovered, which causes the
"unable to locate core" error.

Below is the entire error message of host1 on target cluster

2020-01-08 03:05:52.355 INFO  (zkCallback-7-thread-14) [   ]
o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent
state:SyncConnected type:NodeDataChanged
path:/collections/tbh_manuals_test_bi2/state.json] for collection
[tbh_manuals_test_bi2] has occurred - updating... (live nodes size: [3])
2020-01-08 03:05:52.355 INFO  (zkCallback-7-thread-15) [   ]
o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent
state:SyncConnected type:NodeDataChanged
path:/collections/tbh_manuals_test_bi2/state.json] for collection
[tbh_manuals_test_bi2] has occurred - updating... (live nodes size: [3])
2020-01-08 03:05:52.378 INFO  (qtp1155769010-87) [  
x:tbh_manuals_test_bi2_shard1_replica_n4] o.a.s.h.a.CoreAdminOperation It
has been requested that we recover:
core=tbh_manuals_test_bi2_shard1_replica_n4
2020-01-08 03:05:52.379 ERROR (qtp1155769010-87) [  
x:tbh_manuals_test_bi2_shard1_replica_n4] o.a.s.h.RequestHandlerBase
org.apache.solr.common.SolrException: Unable to locate core
tbh_manuals_test_bi2_shard1_replica_n4
at
org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$5(CoreAdminOperation.java:167)
at
org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:360)
at
org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:396)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:180)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
at 
org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:736)
at
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:717)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:395)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:341)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588)
at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
at
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1557)
 

solr unable to locate core on CDCR

2020-01-07 Thread alwaysbluesky
Whenever I create a new collection on a Solr 7.7.2 cluster with CDCR
(unidirectional),

and once I disable the buffer on both source and target nodes and start the CDCR
process on the source node, I encounter the error message "solr unable
to locate core..." on one of the target nodes.

On source node, CDCR bootstrap failed because of the target node's failure
to locate core.

The exact moment the error message occurs is when I trigger the CDCR process
on the source node.

Is there any bug on solr 7.7.2 CDCR?


Following is the steps to reproduce.

1) create collection on both source and target nodes

2) disable buffer on the source and target

3) enable CDCR on source nodes

then the target node can't locate the core, and the source node fails to start
the CDCR bootstrap.




--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Three questions about huge tlog problem and CDCR

2019-12-20 Thread alwaysbluesky
sure.

I disabled buffer and started cdcr by calling api on both side.

And when I do indexing, I see that the size of the tlog folder stays within 1MB while
the size of the index folder is increasing.

So I imagined that the tlog was being consumed by the target node and cleared, and
that data was being forwarded to the target node, but when I actually checked the
target node, the index on the target nodes was still empty and data had been loaded
only on the source node.



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Three questions about huge tlog problem and CDCR

2019-12-19 Thread damienk
Did you run /cdcr?action=DISABLEBUFFER on both sides?


On Fri, 20 Dec 2019 at 05:22, alwaysbluesky 
wrote:

> Thank you for the advice.
>
> By the way, when I upload a new collection configuration to ZooKeeper and
> enable bidirectional CDCR for the collections on both the prod and dr
> side (/cdcr?action=START), and reload the collections, CDCR
> usually didn't work. So when I restarted all the nodes in the cluster on both
> prod and dr, CDCR started working.
>
> Should I normally restart Solr after enabling/disabling CDCR? Is reloading
> the collections without a Solr restart not enough to apply the CDCR
> change?
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Three questions about huge tlog problem and CDCR

2019-12-19 Thread alwaysbluesky
Thank you for the advice.

By the way, when I upload a new collection configuration to ZooKeeper and
enable bidirectional CDCR for the collections on both the prod and dr
side (/cdcr?action=START), and reload the collections, CDCR
usually didn't work. So when I restarted all the nodes in the cluster on both
prod and dr, CDCR started working.

Should I normally restart Solr after enabling/disabling CDCR? Is reloading
the collections without a Solr restart not enough to apply the CDCR change?



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Three questions about huge tlog problem and CDCR

2019-12-19 Thread Erick Erickson
This usually indicates that the connection between DCs is broken and one or the 
other is falling behind.

Note: “bidirectional” does _not_ mean that you can index to both DCs 
simultaneously, but rather that you can switch from indexing in one DC to the 
other….

Best,
Erick

> On Dec 19, 2019, at 1:01 AM, alwaysbluesky  wrote:
> 
> found a typo. correcting "updateLogSynchronizer" is set to 6(1 min), not
> 1 hour
> 
> 
> 
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Three questions about huge tlog problem and CDCR

2019-12-18 Thread alwaysbluesky
found a typo. correcting "updateLogSynchronizer" is set to 6(1 min), not
1 hour



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Three questions about huge tlog problem and CDCR

2019-12-18 Thread Louis
* Environment: Solr Cloud 7.7.0, 3 nodes / CDCR bidirectional / CDCR buffer
disabled

Hello All,

I have a problem with tlogs. They are getting bigger and bigger...

They don't seem to be deleted at all, even after a hard commit, so now the
total size of the tlog files is more than 21GB.

Actually I see multiple tlog folders like,

 2.5GB tlog/
 6.7GB tlog.20190815170021077/ 
 6.7GB tlog.20190316225613751/
 ...

Are they all necessary for recovery? What are the tlog.2019 folders?


Based on my understanding, tlog files are for recovery when a graceful
shutdown has failed.

1) As long as I stop entire nodes gracefully, is it safe to delete tlog
files manually by using rm -rf ./tlogs?

2) I think that the reason why tlog files are not deleted is because of CDCR
not working properly.. So tlogs just stay forever until being synchronized..
And synchronization never happened and tlogs keep increasing.. Does my
theory make sense? 

3) Actually, we set our replicator element's schedule to 1 hour and the
updateLogSynchronizer element to 1 hour as well. Could this be the reason
why CDCR is not working, because the interval is too long?



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: CDCR cpu usage 100% with some errors

2019-10-28 Thread Louis
I just saw this article.
https://issues.apache.org/jira/browse/SOLR-13349

Can my issue be related to this?



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


CDCR cpu usage 100% with some errors

2019-10-28 Thread Louis
* Solr Version 7.7. Using Cloud with CDCR
* 3 replicas 1 shard on production and disaster recovery

Hi,

Last week, I posted a question about tlogs -
https://lucene.472066.n3.nabble.com/tlogs-are-not-deleted-td4451323.html#a4451430

I disabled buffer based on the advice, but still, tlogs in "production" are
not being deleted. (tlogs in "disaster recovery" nodes are cleaned.) 

And there is another issue, which I suspect is related to the problem
that I previously posted about.

I am getting tons of logs from our "disaster recovery" nodes. The log files
are building up at an incredibly fast rate with the messages below, forever,
and CPU usage is always at 100% every day ("production" nodes' CPU usage is
normal).

It looks like it is replicating from the production servers to disaster recovery,
but it never actually ends.

Is this high CPU usage on the disaster recovery nodes normal?
And are the tlogs that are not being cleaned up properly on the production nodes
related to the high CPU usage on the DR nodes?


*These are sample messages from the tons of logs on the disaster recovery nodes:*

2019-10-28 18:25:09.817 INFO  (qtp404214852-90778) [c:test_collection
s:shard1 r:core_node3 x:test_collection_shard1_replica_n1] o.a.s.c.S.Request
[test_collection1_shard1_replica_n1]  webapp=/solr path=/cdcr
params={action=LASTPROCESSEDVERSION=javabin=2} status=0 QTime=0
2019-10-28 18:25:09.817 INFO  (qtp404214852-90778) [c:test_collection
s:shard1 r:core_node3 x:test_collection_shard1_replica_n1] o.a.s.c.S.Request
[test_collection2_shard1_replica_n1]  webapp=/solr path=/cdcr
params={action=LASTPROCESSEDVERSION=javabin=2} status=0 QTime=0
2019-10-28 18:25:09.817 INFO  (qtp404214852-90778) [c:test_collection
s:shard1 r:core_node3 x:test_collection_shard1_replica_n1] o.a.s.c.S.Request
[test_collection3_shard1_replica_n1]  webapp=/solr path=/cdcr
params={action=LASTPROCESSEDVERSION=javabin=2} status=0 QTime=0
2019-10-28 18:18:11.729 INFO  (cdcr-replicator-378-thread-1) [   ]
o.a.s.h.CdcrReplicator Forwarded 0 updates to target test_collection1
2019-10-28 18:18:11.730 INFO  (cdcr-replicator-282-thread-1) [   ]
o.a.s.h.CdcrReplicator Forwarded 0 updates to target test_collection2
2019-10-28 18:18:11.730 INFO  (cdcr-replicator-332-thread-1) [   ]
o.a.s.h.CdcrReplicator Forwarded 0 updates to target test_collection3
...


*And in the middle of logs, I see the following exception for some of the
collections.*


2019-10-28 18:18:11.732 WARN  (cdcr-replicator-404-thread-1) [   ]
o.a.s.h.CdcrReplicator Failed to forward update request to target:
collection_steps
java.lang.ClassCastException: java.lang.Long cannot be cast to
java.util.List
at
org.apache.solr.update.CdcrUpdateLog$CdcrLogReader.getVersion(CdcrUpdateLog.java:732)
~[solr-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 -
jimczi - 2019-02-04 23:23:46]
at
org.apache.solr.update.CdcrUpdateLog$CdcrLogReader.next(CdcrUpdateLog.java:635)
~[solr-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 -
jimczi - 2019-02-04 23:23:46]
at
org.apache.solr.handler.CdcrReplicator.run(CdcrReplicator.java:77)
~[solr-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 -
jimczi - 2019-02-04 23:23:46]
at
org.apache.solr.handler.CdcrReplicatorScheduler.lambda$null$0(CdcrReplicatorScheduler.java:81)
~[solr-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 -
jimczi - 2019-02-04 23:23:46]
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
~[solr-solrj-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 -
jimczi - 2019-02-04 23:23:50]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[?:1.8.0_181]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[?:1.8.0_181]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: cdcr replicator NPE errors

2019-10-25 Thread Jay Potharaju
Thanks Shawn!
Can any of the committers comment about the CDCR error that I posted above?

Thanks
Jay



On Fri, Oct 25, 2019 at 2:56 PM Shawn Heisey  wrote:

> On 10/25/2019 3:22 PM, Jay Potharaju wrote:
> > Is there a solr slack channel?
>
> People with @apache.org email addresses can readily join the ASF
> workspace, I do not know whether it is possible for others.  That
> workspace might be only for ASF members.
>
> https://the-asf.slack.com
>
> In that workspace, there is a lucene-solr channel and a solr-dev channel.
>
> Thanks,
> Shawn
>


Re: cdcr replicator NPE errors

2019-10-25 Thread Shawn Heisey

On 10/25/2019 3:22 PM, Jay Potharaju wrote:

Is there a solr slack channel?


People with @apache.org email addresses can readily join the ASF 
workspace, I do not know whether it is possible for others.  That 
workspace might be only for ASF members.


https://the-asf.slack.com

In that workspace, there is a lucene-solr channel and a solr-dev channel.

Thanks,
Shawn


Re: cdcr replicator NPE errors

2019-10-25 Thread Jay Potharaju
Is there a solr slack channel?
Thanks
Jay Potharaju



On Fri, Oct 25, 2019 at 9:00 AM Jay Potharaju  wrote:

> Hi,
> I am frequently seeing cdcr-replicator null pointer exception errors in
> the logs.
> Any suggestions on how to address this?
> *Solr version: 7.7.2*
>
> ExecutorUtil
> Uncaught exception java.lang.NullPointerException thrown by thread:
> cdcr-replicator-773-thread-3
> java.lang.Exception: Submitter stack trace
> at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.execute(ExecutorUtil.java:184)
> at
> org.apache.solr.handler.CdcrReplicatorScheduler.lambda$start$1(CdcrReplicatorScheduler.java:76)
> at
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at
> java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
> at
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)
>
> Thanks
> Jay
>
>


cdcr replicator NPE errors

2019-10-25 Thread Jay Potharaju
Hi,
I am frequently seeing cdcr-replicator null pointer exception errors in the
logs.
Any suggestions on how to address this?
*Solr version: 7.7.2*

ExecutorUtil
Uncaught exception java.lang.NullPointerException thrown by thread:
cdcr-replicator-773-thread-3
java.lang.Exception: Submitter stack trace
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.execute(ExecutorUtil.java:184)
at
org.apache.solr.handler.CdcrReplicatorScheduler.lambda$start$1(CdcrReplicatorScheduler.java:76)
at
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at
java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at
java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)

Thanks
Jay


RE: CDCR tlog corruption leads to infinite loop

2019-09-11 Thread Webster Homer
We also see an accumulation of tlog files on the target Solrs. One of our 
production clouds crashed due to too many open files:
2019-09-11 15:59:39.570 ERROR (qtp1355531311-81540) 
[c:bioreliance-catalog-testarticle-20190713 s:shard2 r:core_node8 
x:bioreliance-catalog-testarticle-20190713_shard2_replica_n6] 
o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: 
java.io.FileNotFoundException: 
/var/solr/data/bioreliance-catalog-testarticle-20190713_shard2_replica_n6/data/tlog/tlog.0005307.1642472809370222592
 (Too many open files)

We found 9106 open files. 
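
For what it is worth, this is roughly how the open handles were counted (the start.jar process lookup and the tlog path pattern are assumptions about our install; adjust as needed):

SOLR_PID=$(pgrep -f start.jar | head -n 1)
lsof -p "$SOLR_PID" | grep -c '/data/tlog/'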

This is our update request handler


 


  ${solr.ulog.dir:}



  
   ${solr.autoCommit.maxTime:6} 
   false 
 

  
   ${solr.autoSoftCommit.maxTime:3000} 
 

  

solr.autoSoftCommit.maxTime is set to 3000
solr.autoCommit.maxTime is set to 6

-Original Message-
From: Webster Homer  
Sent: Monday, September 09, 2019 4:17 PM
To: solr-user@lucene.apache.org
Subject: CDCR tlog corruption leads to infinite loop

We are running Solr 7.2.0

Our configuration has several collections that are loaded into a solr cloud 
which is set to replicate using CDCR to 3 different solrclouds. All of our 
target collections have 2 shards with two replicas per shard. Our source 
collection has 2 shards, and 1 replica per shard.

Frequently we start to see errors where the target collections are out of date, 
and the cdcr action=errors endpoint shows large numbers of errors. For example:
{"responseHeader": {
"status": 0,
"QTime": 0},
"errors": [
"uc1f-ecom-mzk01:2181,uc1f-ecom-mzk02:2181,uc1f-ecom-mzk03:2181/solr",
["sial-catalog-product-20190824",
[
"consecutiveErrors",
700357,
"bad_request",
0,
"internal",
700357,
"last",
[
"2019-09-09T19:17:57.453Z",
"internal",
"2019-09-09T19:17:56.949Z",
"internal",
"2019-09-09T19:17:56.448Z"
,"internal",...

We have found that one or more tlogs have become corrupt. It appears that the 
CDCR keeps trying to send data, but cannot read the data from the tlog, and then 
it retries, forever.
How does this happen?  It seems to be very frequent, on a weekly basis, and 
difficult to troubleshoot. Today we had it happen with one of our collections. 
Here is the listing for the tlog files:

$ ls -alht
total 604M
drwxr-xr-x 2 apache apache  44K Sep  9 14:27 .
-rw-r--r-- 1 apache apache 6.7M Sep  6 19:44 
tlog.766.1643975309914013696
-rw-r--r-- 1 apache apache  35M Sep  6 19:43 
tlog.765.1643975245907886080
-rw-r--r-- 1 apache apache  30M Sep  6 19:42 
tlog.764.1643975182924120064
-rw-r--r-- 1 apache apache  37M Sep  6 19:41 
tlog.763.1643975118316109824
-rw-r--r-- 1 apache apache  19M Sep  6 19:40 
tlog.762.1643975053918863360
-rw-r--r-- 1 apache apache  21M Sep  6 19:39 
tlog.761.1643974989726089216
-rw-r--r-- 1 apache apache  21M Sep  6 19:38 
tlog.760.1643974926010417152
-rw-r--r-- 1 apache apache  29M Sep  6 19:37 
tlog.759.1643974862567374848
-rw-r--r-- 1 apache apache 6.2M Sep  6 19:10 
tlog.758.1643973174027616256
-rw-r--r-- 1 apache apache 228K Sep  5 19:48 
tlog.757.1643885009483857920
-rw-r--r-- 1 apache apache  27M Sep  5 19:48 
tlog.756.1643884946565103616
-rw-r--r-- 1 apache apache  35M Sep  5 19:47 
tlog.755.1643884877912735744
-rw-r--r-- 1 apache apache  30M Sep  5 19:46 
tlog.754.1643884812724862976
-rw-r--r-- 1 apache apache  25M Sep  5 19:45 
tlog.753.1643884748976685056
-rw-r--r-- 1 apache apache  18M Sep  5 19:44 
tlog.752.1643884685794738176
-rw-r--r-- 1 apache apache  21M Sep  5 19:43 
tlog.751.1643884621330382848
-rw-r--r-- 1 apache apache  16M Sep  5 19:42 
tlog.750.1643884558054064128
-rw-r--r-- 1 apache apache  26M Sep  5 19:41 
tlog.749.1643884494725316608
-rw-r--r-- 1 apache apache 5.8M Sep  5 19:12 
tlog.748.1643882681969147904
-rw-r--r-- 1 apache apache  31M Sep  4 19:56 
tlog.747.1643794877229563904
-rw-r--r-- 1 apache apache  31M Sep  4 19:55 
tlog.746.1643794813706829824
-rw-r--r-- 1 apache apache  30M Sep  4 19:54 
tlog.745.1643794749615767552
-rw-r--r-- 1 apache apache  22M Sep  4 19:53 
tlog.744.1643794686253465600
-rw-r--r-- 1 apache apache  18M Sep  4 19:52 
tlog.743.1643794622319689728
-rw-r--r-- 1 apache apache  21M Sep  4 19:51 
tlog.742.1643794558055612416
-rw-r--r-- 1 apache apache  15M Sep  4 19:50 
tlog.741.1643794493330161664
-rw-r--r-- 1 apache apache  26M Sep  4 19:49 
tlog.740.1643794428790308864
-rw-r--r-- 1 apache apac

CDCR tlog corruption leads to infinite loop

2019-09-09 Thread Webster Homer
We are running Solr 7.2.0

Our configuration has several collections that are loaded into a solr cloud 
which is set to replicate using CDCR to 3 different solrclouds. All of our 
target collections have 2 shards with two replicas per shard. Our source 
collection has 2 shards, and 1 replica per shard.

Frequently we start to see errors where the target collections are out of date, 
and the cdcr action=errors endpoint shows large numbers of errors
For example:
{"responseHeader": {
"status": 0,
"QTime": 0},
"errors": [
"uc1f-ecom-mzk01:2181,uc1f-ecom-mzk02:2181,uc1f-ecom-mzk03:2181/solr",
["sial-catalog-product-20190824",
[
"consecutiveErrors",
700357,
"bad_request",
0,
"internal",
700357,
"last",
[
"2019-09-09T19:17:57.453Z",
"internal",
"2019-09-09T19:17:56.949Z",
"internal",
"2019-09-09T19:17:56.448Z"
,"internal",...

We have found that one or more tlogs have become corrupt. It appears that the 
CDCR keeps trying to send data, but cannot read the data from the tlog, and then 
it retries, forever.
How does this happen?  It seems to be very frequent, on a weekly basis, and 
difficult to troubleshoot.
Today we had it happen with one of our collections. Here is the listing for the 
tlog files:

$ ls -alht
total 604M
drwxr-xr-x 2 apache apache  44K Sep  9 14:27 .
-rw-r--r-- 1 apache apache 6.7M Sep  6 19:44 
tlog.766.1643975309914013696
-rw-r--r-- 1 apache apache  35M Sep  6 19:43 
tlog.765.1643975245907886080
-rw-r--r-- 1 apache apache  30M Sep  6 19:42 
tlog.764.1643975182924120064
-rw-r--r-- 1 apache apache  37M Sep  6 19:41 
tlog.763.1643975118316109824
-rw-r--r-- 1 apache apache  19M Sep  6 19:40 
tlog.762.1643975053918863360
-rw-r--r-- 1 apache apache  21M Sep  6 19:39 
tlog.761.1643974989726089216
-rw-r--r-- 1 apache apache  21M Sep  6 19:38 
tlog.760.1643974926010417152
-rw-r--r-- 1 apache apache  29M Sep  6 19:37 
tlog.759.1643974862567374848
-rw-r--r-- 1 apache apache 6.2M Sep  6 19:10 
tlog.758.1643973174027616256
-rw-r--r-- 1 apache apache 228K Sep  5 19:48 
tlog.757.1643885009483857920
-rw-r--r-- 1 apache apache  27M Sep  5 19:48 
tlog.756.1643884946565103616
-rw-r--r-- 1 apache apache  35M Sep  5 19:47 
tlog.755.1643884877912735744
-rw-r--r-- 1 apache apache  30M Sep  5 19:46 
tlog.754.1643884812724862976
-rw-r--r-- 1 apache apache  25M Sep  5 19:45 
tlog.753.1643884748976685056
-rw-r--r-- 1 apache apache  18M Sep  5 19:44 
tlog.752.1643884685794738176
-rw-r--r-- 1 apache apache  21M Sep  5 19:43 
tlog.751.1643884621330382848
-rw-r--r-- 1 apache apache  16M Sep  5 19:42 
tlog.750.1643884558054064128
-rw-r--r-- 1 apache apache  26M Sep  5 19:41 
tlog.749.1643884494725316608
-rw-r--r-- 1 apache apache 5.8M Sep  5 19:12 
tlog.748.1643882681969147904
-rw-r--r-- 1 apache apache  31M Sep  4 19:56 
tlog.747.1643794877229563904
-rw-r--r-- 1 apache apache  31M Sep  4 19:55 
tlog.746.1643794813706829824
-rw-r--r-- 1 apache apache  30M Sep  4 19:54 
tlog.745.1643794749615767552
-rw-r--r-- 1 apache apache  22M Sep  4 19:53 
tlog.744.1643794686253465600
-rw-r--r-- 1 apache apache  18M Sep  4 19:52 
tlog.743.1643794622319689728
-rw-r--r-- 1 apache apache  21M Sep  4 19:51 
tlog.742.1643794558055612416
-rw-r--r-- 1 apache apache  15M Sep  4 19:50 
tlog.741.1643794493330161664
-rw-r--r-- 1 apache apache  26M Sep  4 19:49 
tlog.740.1643794428790308864
-rw-r--r-- 1 apache apache  11M Sep  4 14:58 
tlog.737.1643701398824550400
drwxr-xr-x 5 apache apache   53 Aug 21 06:30 ..
[apache@dfw-pauth-msc01 tlog]$ ls -alht 
tlog.757.1643885009483857920
-rw-r--r-- 1 apache apache 228K Sep  5 19:48 
tlog.757.1643885009483857920
$ date
Mon Sep  9 14:27:31 CDT 2019
$ pwd
/var/solr/data/sial-catalog-product-20190824_shard1_replica_n1/data/tlog

CDCR started replicating after we deleted the oldest tlog file and restarted 
CDCR
tlog.737.1643701398824550400

About the same time I found a number of errors in the solr logs like this:
2019-09-04 19:58:01.393 ERROR 
(recoveryExecutor-162-thread-1-processing-n:dfw-pauth-msc01:8983_solr 
x:sial-catalog-product-20190824_shard1_replica_n1 s:shard1 
c:sial-catalog-product-20190824 r:core_node3) [c:sial-catalog-product-20190824 
s:shard1 r:core_node3 x:sial-catalog-product-20190824_shard1_replica_n1] 
o.a.s.u.UpdateLog java.lang.ClassCastException

This was the most common error at the time, I saw it for all of our collections
2019-09-04 19:57:46.572

Re: Turn off CDCR for only selected target clusters

2019-08-28 Thread Arnold Bronley
@Shawn: You are right. In my case, the collection name is the same as the
configuration name and that is why it works. Do you know if there is some
other property that I can use that refers to the collection name instead?

On Wed, Aug 28, 2019 at 3:52 PM Shawn Heisey  wrote:

> On 8/28/2019 1:42 PM, Arnold Bronley wrote:
> > I have configured the SolrCloud collection-wise only and there is no
> other
> > way. The way you have defined 3 zkHosts (comma separated values for
> zkHost
> > property), I tried that one before as it was more intuitive. But it did
> not
> > work for me. I had to use 3 different replica elements each for one of
> the
> > 3 SolrCloud clusters. source and target properties mention the same
> > collection name in my case. Instead of hardcoding it, I am using the
> > collection.configName variable which gets replaced by the collection name
> > to which this solrconfig.xml belongs to.
>
> I am pretty sure that ${collection.configName} refers to the
> configuration name stored in zookeeper, NOT the collection name.  There
> is nothing at all in Solr that requires those names to be the same, and
> for many SolrCloud installs, they are not the same.  If this is working
> for you, then you're probably naming your configs the same as the
> collection.  If you were to ever use the same config on multiple
> collections, that would probably stop working.
>
> I do not know if there is a property with the collection name.  There
> probably is.
>
> Thanks,
> Shawn
>


Re: Turn off CDCR for only selected target clusters

2019-08-28 Thread Shawn Heisey

On 8/28/2019 1:42 PM, Arnold Bronley wrote:

I have configured the SolrCloud collection-wise only and there is no other
way. The way you have defined 3 zkHosts (comma separated values for zkHost
property), I tried that one before as it was more intuitive. But it did not
work for me. I had to use 3 different replica elements each for one of the
3 SolrCloud clusters. source and target properties mention the same
collection name in my case. Instead of hardcoding it, I am using the
collection.configName variable which gets replaced by the collection name
to which this solrconfig.xml belongs to.


I am pretty sure that ${collection.configName} refers to the 
configuration name stored in zookeeper, NOT the collection name.  There 
is nothing at all in Solr that requires those names to be the same, and 
for many SolrCloud installs, they are not the same.  If this is working 
for you, then you're probably naming your configs the same as the 
collection.  If you were to ever use the same config on multiple 
collections, that would probably stop working.


I do not know if there is a property with the collection name.  There 
probably is.


Thanks,
Shawn


Re: Turn off CDCR for only selected target clusters

2019-08-28 Thread Arnold Bronley
Hi Erick,

I have configured the SolrCloud collection-wise only and there is no other
way. The way you have defined 3 zkHosts (comma separated values for zkHost
property), I tried that one before as it was more intuitive. But it did not
work for me. I had to use 3 different replica elements each for one of the
3 SolrCloud clusters. source and target properties mention the same
collection name in my case. Instead of hardcoding it, I am using the
collection.configName variable which gets replaced by the collection name
to which this solrconfig.xml belongs to.

If I follow your configuration (which does not work in my case, and I have
tested it), my question was how to NOT send CDCR updates to targetZkHost2
and targetZkHost3 while still sending them to targetZkHost1?

On Tue, Aug 13, 2019 at 3:23 PM Erick Erickson 
wrote:

> You configure CDCR by _collection_, so this question really makes no
> sense.
> You’d never mention collection.configName. So what I suspect is that you’re
> misreading the docs.
>
> 
> ${targetZkHost1},${targetZkHost2},${targetZkHost3}
> sourceCollection_on_local_cluster
> targetCollection_on_targetZkHost1 2 and 3
> 
>
> “Turning off CDCR” selective for ZooKeeper instances really makes no sense
> as the
> point of ZK ensembles is to keep running even if one goes away.
>
> So can you rephrase the question? Or state the problem you’re trying to
> solve another way?
>
> Best,
> Erick
>
> > On Aug 13, 2019, at 1:57 PM, Arnold Bronley 
> wrote:
> >
> > Hi,
> >
> > Is there a way to turn off the CDCR for only selected target clusters.
> >
> > Say, I have a configuration like following. I have 3 target clusters
> > targetZkHost1, targetZkHost2 and targetZkHost3. Is it possible to turn
> off
> > the CDCR for targetZkHost2 and targetZkHost3 but keep it on for
> > targetZkHost1?
> >
> > E.g.
> >
> >  
> > 
> > ${targetZkHost1}
> > ${collection.configName}
> > ${collection.configName}
> > 
> >
> > 
> > ${targetZkHost2}
> > ${collection.configName}
> > ${collection.configName}
> > 
> >
> > 
> > ${targetZkHost3}
> > ${collection.configName}
> > ${collection.configName}
> > 
> >
> > 
> > 8
> > 1000
> > 128
> > 
> >
> > 
> > 1000
> > 
> >
> > 
> > disabled
> > 
> >  
>
>


Re: Turn off CDCR for only selected target clusters

2019-08-13 Thread Erick Erickson
You configure CDCR by _collection_, so this question really makes no sense. 
You’d never mention collection.configName. So what I suspect is that you’re
misreading the docs. 


${targetZkHost1},${targetZkHost2},${targetZkHost3}
sourceCollection_on_local_cluster
targetCollection_on_targetZkHost1 2 and 3


“Turning off CDCR” selective for ZooKeeper instances really makes no sense as 
the
point of ZK ensembles is to keep running even if one goes away.

So can you rephrase the question? Or state the problem you’re trying to solve 
another way?

Best,
Erick

> On Aug 13, 2019, at 1:57 PM, Arnold Bronley  wrote:
> 
> Hi,
> 
> Is there a way to turn off the CDCR for only selected target clusters.
> 
> Say, I have a configuration like following. I have 3 target clusters
> targetZkHost1, targetZkHost2 and targetZkHost3. Is it possible to turn off
> the CDCR for targetZkHost2 and targetZkHost3 but keep it on for
> targetZkHost1?
> 
> E.g.
> 
>  
> 
> ${targetZkHost1}
> ${collection.configName}
> ${collection.configName}
> 
> 
> 
> ${targetZkHost2}
> ${collection.configName}
> ${collection.configName}
> 
> 
> 
> ${targetZkHost3}
> ${collection.configName}
> ${collection.configName}
> 
> 
> 
> 8
> 1000
> 128
> 
> 
> 
> 1000
> 
> 
> 
> disabled
> 
>  



Turn off CDCR for only selected target clusters

2019-08-13 Thread Arnold Bronley
Hi,

Is there a way to turn off the CDCR for only selected target clusters.

Say, I have a configuration like following. I have 3 target clusters
targetZkHost1, targetZkHost2 and targetZkHost3. Is it possible to turn off
the CDCR for targetZkHost2 and targetZkHost3 but keep it on for
targetZkHost1?

E.g.

  
 
${targetZkHost1}
${collection.configName}
${collection.configName}
 

 
${targetZkHost2}
${collection.configName}
${collection.configName}
 

 
${targetZkHost3}
${collection.configName}
${collection.configName}
 

 
8
1000
128
 

 
1000
 

 
disabled
 
  


Re: [CAUTION] Re: [CAUTION] Re: CDCR Queues API invocation with CloudSolrclient

2019-07-25 Thread Natarajan, Rajeswari
I tried Shawn's suggestion to use a SolrQuery object instead of QT, but it is still 
the same issue.

Regards,
Rajeswari

On 7/24/19, 4:54 PM, "Natarajan, Rajeswari"  
wrote:

Please look at the below test  which tests CDCR OPS Api. This has 
"BadApple" annotation (meaning the test fails intermittently)

https://github.com/apache/lucene-solr/blob/master/solr/core/src/test/org/apache/solr/cloud/cdcr/CdcrOpsAndBoundariesTest.java#L73
This also  is because of  sometimes the Cloudsolrclient gets the value and 
sometimes not. This OPS api also needs to talk to core. OK indeed this issue 
looks like a bug

Thanks,
Rajeswari

On 7/24/19, 4:18 PM, "Natarajan, Rajeswari"  
wrote:

Btw , the code is copied from solr 7.6 source code.

Thanks,
Rajeswari

On 7/24/19, 4:12 PM, "Natarajan, Rajeswari" 
 wrote:

Thanks Shawn for the reply. I am not saying it is bug. I just would 
like to know how to get the "lastTimestamp" by invoking CluodSolrClient 
reliabily.

Regards,
Rajeswari

On 7/24/19, 3:14 PM, "Shawn Heisey"  wrote:

On 7/24/2019 3:50 PM, Natarajan, Rajeswari wrote:
> Hi,
> 
> With the below API , the QueryResponse , sometimes have the 
"lastTimestamp" , sometimes not.
> protected static QueryResponse getCdcrQueue(CloudSolrClient 
client) throws SolrServerException, IOException {
>  ModifiableSolrParams params = new ModifiableSolrParams();
>  params.set(CommonParams.QT, "/cdcr");
>  params.set(CommonParams.ACTION, CdcrParams.QUEUES);
>  return client.query(params);
>}

Side note:  Setting the handler path with the qt parameter was 
deprecated in Solr 3.6, which was released seven years ago.  
I'm 
surprised it even still works.

Use a SolrQuery object instead of ModifiableSolrParams, and 
call its 
setRequestHandler method to set the request handler.

> Invoking 
http://:/solr//cdcr?action=QUEUES  has the same 
issue
> 
> But if invoked as 
http://:/solr//cdcr?action=QUEUES always gets the " 
lastTimestamp" value. Would like to know
> How to get the cdcr queues always return " lastTimestamp" 
value reliabily by CloudSolrClient.

This part I really have no idea about.  The API documentation 
does say 
that monitoring actions are done at the core level and control 
actions 
are done at the collection level, so this might not be 
considered a bug. 
  Someone who knows CDCR really well will need to comment.

https://lucene.apache.org/solr/guide/8_1/cdcr-api.html

Thanks,
Shawn










Re: [CAUTION] Re: CDCR Queues API invocation with CloudSolrclient

2019-07-24 Thread Natarajan, Rajeswari
Please look at the test below, which tests the CDCR OPS API. It has the "BadApple" 
annotation (meaning the test fails intermittently):
https://github.com/apache/lucene-solr/blob/master/solr/core/src/test/org/apache/solr/cloud/cdcr/CdcrOpsAndBoundariesTest.java#L73
This is also because sometimes the CloudSolrClient gets the value and 
sometimes not. The OPS API also needs to talk to a core. OK, indeed this issue 
looks like a bug.

Thanks,
Rajeswari

On 7/24/19, 4:18 PM, "Natarajan, Rajeswari"  
wrote:

Btw , the code is copied from solr 7.6 source code.

Thanks,
Rajeswari

On 7/24/19, 4:12 PM, "Natarajan, Rajeswari"  
wrote:

Thanks Shawn for the reply. I am not saying it is bug. I just would 
like to know how to get the "lastTimestamp" by invoking CluodSolrClient 
reliabily.

Regards,
Rajeswari

On 7/24/19, 3:14 PM, "Shawn Heisey"  wrote:

On 7/24/2019 3:50 PM, Natarajan, Rajeswari wrote:
> Hi,
> 
> With the below API , the QueryResponse , sometimes have the 
"lastTimestamp" , sometimes not.
> protected static QueryResponse getCdcrQueue(CloudSolrClient 
client) throws SolrServerException, IOException {
>  ModifiableSolrParams params = new ModifiableSolrParams();
>  params.set(CommonParams.QT, "/cdcr");
>  params.set(CommonParams.ACTION, CdcrParams.QUEUES);
>  return client.query(params);
>}

Side note:  Setting the handler path with the qt parameter was 
deprecated in Solr 3.6, which was released seven years ago.  I'm 
surprised it even still works.

Use a SolrQuery object instead of ModifiableSolrParams, and call 
its 
setRequestHandler method to set the request handler.

> Invoking 
http://:/solr//cdcr?action=QUEUES  has the same 
issue
> 
> But if invoked as 
http://:/solr//cdcr?action=QUEUES always gets the " 
lastTimestamp" value. Would like to know
> How to get the cdcr queues always return " lastTimestamp" value 
reliabily by CloudSolrClient.

This part I really have no idea about.  The API documentation does 
say 
that monitoring actions are done at the core level and control 
actions 
are done at the collection level, so this might not be considered a 
bug. 
  Someone who knows CDCR really well will need to comment.

https://lucene.apache.org/solr/guide/8_1/cdcr-api.html

Thanks,
Shawn








Re: CDCR Queues API invocation with CloudSolrclient

2019-07-24 Thread Natarajan, Rajeswari
Btw, the code is copied from the Solr 7.6 source code.

Thanks,
Rajeswari

On 7/24/19, 4:12 PM, "Natarajan, Rajeswari"  
wrote:

Thanks Shawn for the reply. I am not saying it is bug. I just would like to 
know how to get the "lastTimestamp" by invoking CluodSolrClient reliabily.

Regards,
Rajeswari

On 7/24/19, 3:14 PM, "Shawn Heisey"  wrote:

On 7/24/2019 3:50 PM, Natarajan, Rajeswari wrote:
> Hi,
> 
> With the below API , the QueryResponse , sometimes have the 
"lastTimestamp" , sometimes not.
> protected static QueryResponse getCdcrQueue(CloudSolrClient client) 
throws SolrServerException, IOException {
>  ModifiableSolrParams params = new ModifiableSolrParams();
>  params.set(CommonParams.QT, "/cdcr");
>  params.set(CommonParams.ACTION, CdcrParams.QUEUES);
>  return client.query(params);
>}

Side note:  Setting the handler path with the qt parameter was 
deprecated in Solr 3.6, which was released seven years ago.  I'm 
surprised it even still works.

Use a SolrQuery object instead of ModifiableSolrParams, and call its 
setRequestHandler method to set the request handler.

> Invoking 
http://:/solr//cdcr?action=QUEUES  has the same 
issue
> 
> But if invoked as 
http://:/solr//cdcr?action=QUEUES always gets the " 
lastTimestamp" value. Would like to know
> How to get the cdcr queues always return " lastTimestamp" value 
reliabily by CloudSolrClient.

This part I really have no idea about.  The API documentation does say 
that monitoring actions are done at the core level and control actions 
    are done at the collection level, so this might not be considered a 
bug. 
  Someone who knows CDCR really well will need to comment.

https://lucene.apache.org/solr/guide/8_1/cdcr-api.html

Thanks,
Shawn






Re: CDCR Queues API invocation with CloudSolrclient

2019-07-24 Thread Natarajan, Rajeswari
Thanks Shawn for the reply. I am not saying it is a bug. I just would like to 
know how to get the "lastTimestamp" reliably by invoking CloudSolrClient.

Regards,
Rajeswari

On 7/24/19, 3:14 PM, "Shawn Heisey"  wrote:

On 7/24/2019 3:50 PM, Natarajan, Rajeswari wrote:
> Hi,
> 
> With the below API , the QueryResponse , sometimes have the 
"lastTimestamp" , sometimes not.
> protected static QueryResponse getCdcrQueue(CloudSolrClient client) 
throws SolrServerException, IOException {
>  ModifiableSolrParams params = new ModifiableSolrParams();
>  params.set(CommonParams.QT, "/cdcr");
>  params.set(CommonParams.ACTION, CdcrParams.QUEUES);
>  return client.query(params);
>}

Side note:  Setting the handler path with the qt parameter was 
deprecated in Solr 3.6, which was released seven years ago.  I'm 
surprised it even still works.

Use a SolrQuery object instead of ModifiableSolrParams, and call its 
setRequestHandler method to set the request handler.

> Invoking 
http://:/solr//cdcr?action=QUEUES  has the same 
issue
> 
> But if invoked as 
http://:/solr//cdcr?action=QUEUES always gets the " 
lastTimestamp" value. Would like to know
> How to get the cdcr queues always return " lastTimestamp" value reliabily 
by CloudSolrClient.

This part I really have no idea about.  The API documentation does say 
that monitoring actions are done at the core level and control actions 
    are done at the collection level, so this might not be considered a bug. 
  Someone who knows CDCR really well will need to comment.

https://lucene.apache.org/solr/guide/8_1/cdcr-api.html

Thanks,
Shawn




Re: CDCR Queues API invocation with CloudSolrclient

2019-07-24 Thread Shawn Heisey

On 7/24/2019 3:50 PM, Natarajan, Rajeswari wrote:

Hi,

With the below API , the QueryResponse , sometimes have the "lastTimestamp" , 
sometimes not.
protected static QueryResponse getCdcrQueue(CloudSolrClient client) throws 
SolrServerException, IOException {
 ModifiableSolrParams params = new ModifiableSolrParams();
 params.set(CommonParams.QT, "/cdcr");
 params.set(CommonParams.ACTION, CdcrParams.QUEUES);
 return client.query(params);
   }


Side note:  Setting the handler path with the qt parameter was 
deprecated in Solr 3.6, which was released seven years ago.  I'm 
surprised it even still works.


Use a SolrQuery object instead of ModifiableSolrParams, and call its 
setRequestHandler method to set the request handler.



Invoking http://<host>:<port>/solr/<collection>/cdcr?action=QUEUES has 
the same issue

But if invoked as http://<host>:<port>/solr/<core>/cdcr?action=QUEUES it always gets 
the "lastTimestamp" value. Would like to know
how to get the cdcr queues to always return the "lastTimestamp" value reliably via 
CloudSolrClient.


This part I really have no idea about.  The API documentation does say 
that monitoring actions are done at the core level and control actions 
are done at the collection level, so this might not be considered a bug. 
 Someone who knows CDCR really well will need to comment.


https://lucene.apache.org/solr/guide/8_1/cdcr-api.html

Thanks,
Shawn


CDCR Queues API invocation with CloudSolrclient

2019-07-24 Thread Natarajan, Rajeswari
Hi,

With the below API , the QueryResponse , sometimes have the "lastTimestamp" , 
sometimes not.
protected static QueryResponse getCdcrQueue(CloudSolrClient client) throws 
SolrServerException, IOException {
ModifiableSolrParams params = new ModifiableSolrParams();
params.set(CommonParams.QT, "/cdcr");
params.set(CommonParams.ACTION, CdcrParams.QUEUES);
return client.query(params);
  }

Invoking http://<host>:<port>/solr/<collection>/cdcr?action=QUEUES has 
the same issue.

But if invoked as http://<host>:<port>/solr/<core>/cdcr?action=QUEUES it 
always gets the "lastTimestamp" value. I would like to know
how to get the cdcr queues to always return the "lastTimestamp" value reliably via 
CloudSolrClient.

Thank you,
Rajeswari
 



Re: [CAUTION] CDCR Monitoring - To figure out the latency between source and target replication delay

2019-06-18 Thread Natarajan, Rajeswari
I see the below in the CDCR Queues API documentation:

The output is composed of a list “queues” which contains a list of (ZooKeeper) 
Target hosts, themselves containing a list of Target collections. For each 
collection, the current size of the queue and the timestamp of the last update 
operation successfully processed is provided. The timestamp of the update 
operation is the original timestamp, i.e., the time this operation was 
processed on the Source SolrCloud. This allows an estimate of the latency of the 
replication process.

The timestamp of the update operation on the source SolrCloud is given; how 
does that help to figure out the latency of replication? Can someone please 
explain, am I missing something obvious? We want to generate an alert if there 
is a huge latency, and are looking to see how this can be done.
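
If the last update successfully forwarded to the target was originally processed on 
the source at "lastTimestamp", then "now minus lastTimestamp" is a rough upper bound 
on how far the target lags while the queue is non-empty. A hedged sketch of that 
calculation (it assumes the nested response shape shown in the ref guide's QUEUES 
example and an ISO-8601 timestamp; the method name is made up):

static long estimateLagMillis(QueryResponse rsp) {
  NamedList<?> queues = (NamedList<?>) rsp.getResponse().get("queues");
  NamedList<?> targetHost = (NamedList<?>) queues.getVal(0);   // first target cluster
  NamedList<?> stats = (NamedList<?>) targetHost.getVal(0);    // first target collection
  String lastTimestamp = (String) stats.get("lastTimestamp");  // source-side time of last forwarded op
  return System.currentTimeMillis() - java.time.Instant.parse(lastTimestamp).toEpochMilli();
}

A monitoring job could call this periodically and raise an alert once the value 
crosses a threshold.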

Thank you.
Rajeswari

On 5/30/19, 9:47 AM, "Natarajan, Rajeswari"  
wrote:

Hi,

Is there a way to  monitor the replication delay between Primary/Secondary 
Cluster for CDCR  and raise alerts ,if it exceeds above some threshold.

I see below API’s for monitoring.

- core/cdcr?action=QUEUES: Fetches statistics about the queue
  (https://lucene.apache.org/solr/guide/7_6/cdcr-api.html#queues) for each
  replica and about the update logs.
- core/cdcr?action=OPS: Fetches statistics about the replication performance
  (https://lucene.apache.org/solr/guide/7_6/cdcr-api.html#ops), i.e. operations
  per second, for each replica.
- core/cdcr?action=ERRORS: Fetches statistics and other information about
  replication errors (https://lucene.apache.org/solr/guide/7_6/cdcr-api.html#errors)
  for each replica.

These report the stats, performance and errors.
Thanks,
Rajeswari





Re: bi-directional CDCR

2019-06-18 Thread Natarajan, Rajeswari
We are using bidirectional CDCR with Solr 7.6 and it works for us. Did you look 
at the logs to see if there are any errors?

"Both Cluster 1 and Cluster 2 can act as Source and Target at any given
point of time but a cluster cannot be both Source and Target at the same
time."

The above means the publishing can take place on one cluster only at any point. 
Publishing cannot happen simultaneously on both clusters.

Hope this helps
Rajeswari

On 6/11/19, 7:13 PM, "Susheel Kumar"  wrote:

Hello,

What does the below mean? How do we set which cluster will act as
source or target at a given time?

Both Cluster 1 and Cluster 2 can act as Source and Target at any given
point of time but a cluster cannot be both Source and Target at the same
time.
Also, following the directions mentioned in this page doesn't make CDCR
work. No data flows from cluster 1 to cluster 2. The Solr version is 7.7.1. Is
there something missing?

https://lucene.apache.org/solr/guide/7_7/cdcr-config.html#bi-directional-updates




bi-directional CDCR

2019-06-11 Thread Susheel Kumar
Hello,

What does the below mean? How do we set which cluster will act as
source or target at a given time?

Both Cluster 1 and Cluster 2 can act as Source and Target at any given
point of time but a cluster cannot be both Source and Target at the same
time.
Also, following the directions mentioned in this page doesn't make CDCR
work. No data flows from cluster 1 to cluster 2. The Solr version is 7.7.1. Is
there something missing?
https://lucene.apache.org/solr/guide/7_7/cdcr-config.html#bi-directional-updates


CDCR Monitoring

2019-05-30 Thread Natarajan, Rajeswari
Hi,

Is there a way to  monitor the replication delay between Primary/Secondary 
Cluster for CDCR  and raise alerts ,if it exceeds above some threshold.

I see below API’s for monitoring.

- core/cdcr?action=QUEUES: Fetches statistics about the queue
  (https://lucene.apache.org/solr/guide/7_6/cdcr-api.html#queues) for each
  replica and about the update logs.
- core/cdcr?action=OPS: Fetches statistics about the replication performance
  (https://lucene.apache.org/solr/guide/7_6/cdcr-api.html#ops), i.e. operations
  per second, for each replica.
- core/cdcr?action=ERRORS: Fetches statistics and other information about
  replication errors (https://lucene.apache.org/solr/guide/7_6/cdcr-api.html#errors)
  for each replica.

These report the stats, performance and errors.
Thanks,
Rajeswari



Re: [CDCR]Unable to locate core

2019-05-19 Thread Natarajan, Rajeswari
Thanks Amrit. Created a bug:
https://issues.apache.org/jira/browse/SOLR-13481

Regards,
Rajeswari

On 5/19/19, 3:44 PM, "Amrit Sarkar"  wrote:

Sounds legit to me.

Can you create a Jira and list down the problem statement and design
solution there. I am confident it will attract committers' attention and
they can review the design and provide feedback.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2


On Mon, May 20, 2019 at 3:59 AM Natarajan, Rajeswari <
rajeswari.natara...@sap.com> wrote:

> Thanks Amrit for creating a patch. But the code in
> LBHttpSolrClient.java needs to be fixed too if the for loop is to work as
> intended.
> Regards
> Rajeswari
>
> public Rsp request(Req req) throws SolrServerException, IOException {
> Rsp rsp = new Rsp();
> Exception ex = null;
> boolean isNonRetryable = req.request instanceof IsUpdateRequest ||
> ADMIN_PATHS.contains(req.request.getPath());
> List<ServerWrapper> skipped = null;
>
> final Integer numServersToTry = req.getNumServersToTry();
> int numServersTried = 0;
>
> boolean timeAllowedExceeded = false;
> long timeAllowedNano = getTimeAllowedInNanos(req.getRequest());
> long timeOutTime = System.nanoTime() + timeAllowedNano;
> for (String serverStr : req.getServers()) {
>   if (timeAllowedExceeded = isTimeExceeded(timeAllowedNano,
> timeOutTime)) {
> break;
>   }
>
>   serverStr = normalize(serverStr);
>   // if the server is currently a zombie, just skip to the next one
>   ServerWrapper wrapper = zombieServers.get(serverStr);
>   if (wrapper != null) {
> // System.out.println("ZOMBIE SERVER QUERIED: " + serverStr);
> final int numDeadServersToTry = req.getNumDeadServersToTry();
> if (numDeadServersToTry > 0) {
>   if (skipped == null) {
> skipped = new ArrayList<>(numDeadServersToTry);
> skipped.add(wrapper);
>   }
>   else if (skipped.size() < numDeadServersToTry) {
> skipped.add(wrapper);
>   }
> }
> continue;
>   }
>   try {
> MDC.put("LBHttpSolrClient.url", serverStr);
>
> if (numServersToTry != null && numServersTried >
> numServersToTry.intValue()) {
>   break;
> }
>
> HttpSolrClient client = makeSolrClient(serverStr);
>
> ++numServersTried;
> ex = doRequest(client, req, rsp, isNonRetryable, false, null);
> if (ex == null) {
>   return rsp; // SUCCESS
> }
>   } finally {
> MDC.remove("LBHttpSolrClient.url");
>   }
> }
>
> // try the servers we previously skipped
> if (skipped != null) {
>   for (ServerWrapper wrapper : skipped) {
> if (timeAllowedExceeded = isTimeExceeded(timeAllowedNano,
> timeOutTime)) {
>   break;
> }
>
> if (numServersToTry != null && numServersTried >
> numServersToTry.intValue()) {
>   break;
> }
>
> try {
>   MDC.put("LBHttpSolrClient.url", wrapper.client.getBaseURL());
>   ++numServersTried;
>   ex = doRequest(wrapper.client, req, rsp, isNonRetryable, true,
> wrapper.getKey());
>   if (ex == null) {
> return rsp; // SUCCESS
>   }
> } finally {
>   MDC.remove("LBHttpSolrClient.url");
> }
>   }
> }
>
>
> final String solrServerExceptionMessage;
> if (timeAllowedExceeded) {
>   solrServerExceptionMessage = "Time allowed to handle this request
> exceeded";
> } else {
>   if (numServersToTry != null && numServersTried >
> numServersToTry.intValue()) {
> solrServerExceptionMessage = "No live SolrServers available to
> handle this request:"
> + " numServersTried="+numServersTried
> + " numServersToTry="+numServersToTry.intValue();
>   } else {
> solrServerExceptionMessage = "No live SolrServers available to
> handle this request";
>   }
> }
> if (ex == null) {
>   throw new SolrServerException(solrServerExceptionMessage);
> } else {
>   throw new SolrServerException(solrServerExceptionMessage+":" +
> zombieServers.keySet(), ex);
> }
>
>   }
>
> On 5/19/19, 3:12 PM, "Amrit Sarkar"  wrote:
>
> >
> > Thanks Natrajan,
  

Re: [CDCR]Unable to locate core

2019-05-19 Thread Amrit Sarkar
Sounds legit to me.

Can you create a Jira and list down the problem statement and design
solution there. I am confident it will attract committers' attention and
they can review the design and provide feedback.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2


On Mon, May 20, 2019 at 3:59 AM Natarajan, Rajeswari <
rajeswari.natara...@sap.com> wrote:

> Thanks Amrit for creating a patch. But the code in
> LBHttpSolrClient.java needs to be fixed too if the for loop is to work as
> intended.
> Regards
> Rajeswari
>
> public Rsp request(Req req) throws SolrServerException, IOException {
> Rsp rsp = new Rsp();
> Exception ex = null;
> boolean isNonRetryable = req.request instanceof IsUpdateRequest ||
> ADMIN_PATHS.contains(req.request.getPath());
> List<ServerWrapper> skipped = null;
>
> final Integer numServersToTry = req.getNumServersToTry();
> int numServersTried = 0;
>
> boolean timeAllowedExceeded = false;
> long timeAllowedNano = getTimeAllowedInNanos(req.getRequest());
> long timeOutTime = System.nanoTime() + timeAllowedNano;
> for (String serverStr : req.getServers()) {
>   if (timeAllowedExceeded = isTimeExceeded(timeAllowedNano,
> timeOutTime)) {
> break;
>   }
>
>   serverStr = normalize(serverStr);
>   // if the server is currently a zombie, just skip to the next one
>   ServerWrapper wrapper = zombieServers.get(serverStr);
>   if (wrapper != null) {
> // System.out.println("ZOMBIE SERVER QUERIED: " + serverStr);
> final int numDeadServersToTry = req.getNumDeadServersToTry();
> if (numDeadServersToTry > 0) {
>   if (skipped == null) {
> skipped = new ArrayList<>(numDeadServersToTry);
> skipped.add(wrapper);
>   }
>   else if (skipped.size() < numDeadServersToTry) {
> skipped.add(wrapper);
>   }
> }
> continue;
>   }
>   try {
> MDC.put("LBHttpSolrClient.url", serverStr);
>
> if (numServersToTry != null && numServersTried >
> numServersToTry.intValue()) {
>   break;
> }
>
> HttpSolrClient client = makeSolrClient(serverStr);
>
> ++numServersTried;
> ex = doRequest(client, req, rsp, isNonRetryable, false, null);
> if (ex == null) {
>   return rsp; // SUCCESS
> }
>   } finally {
> MDC.remove("LBHttpSolrClient.url");
>   }
> }
>
> // try the servers we previously skipped
> if (skipped != null) {
>   for (ServerWrapper wrapper : skipped) {
> if (timeAllowedExceeded = isTimeExceeded(timeAllowedNano,
> timeOutTime)) {
>   break;
> }
>
> if (numServersToTry != null && numServersTried >
> numServersToTry.intValue()) {
>   break;
> }
>
> try {
>   MDC.put("LBHttpSolrClient.url", wrapper.client.getBaseURL());
>   ++numServersTried;
>   ex = doRequest(wrapper.client, req, rsp, isNonRetryable, true,
> wrapper.getKey());
>   if (ex == null) {
> return rsp; // SUCCESS
>   }
> } finally {
>   MDC.remove("LBHttpSolrClient.url");
> }
>   }
> }
>
>
> final String solrServerExceptionMessage;
> if (timeAllowedExceeded) {
>   solrServerExceptionMessage = "Time allowed to handle this request
> exceeded";
> } else {
>   if (numServersToTry != null && numServersTried >
> numServersToTry.intValue()) {
> solrServerExceptionMessage = "No live SolrServers available to
> handle this request:"
> + " numServersTried="+numServersTried
> + " numServersToTry="+numServersToTry.intValue();
>   } else {
> solrServerExceptionMessage = "No live SolrServers available to
> handle this request";
>   }
> }
> if (ex == null) {
>   throw new SolrServerException(solrServerExceptionMessage);
> } else {
>   throw new SolrServerException(solrServerExceptionMessage+":" +
> zombieServers.keySet(), ex);
> }
>
>   }
>
> On 5/19/19, 3:12 PM, "Amrit Sarkar"  wrote:
>
> >
> > Thanks Natrajan,
> >
> > Solid analysis and I saw the issue being reported by multiple users
> in
> > past few months and unfortunately I baked an incomplete code.
> >
> > I think the correct way of solving this issue is to identify the
> correct
> > base-url for the respective core we need to trigger REQUESTRECOVERY
> to and
> > create a local HttpSolrClient instead of using CloudSolrClient from
> > CdcrReplicatorState. This will avoid unnecessary retry which will be
> > redundant in our case.
> >
> > I baked a small patch few weeks back and will upload it on the
> SOLR-11724
> > .
> >
>
>
>


Re: [CDCR]Unable to locate core

2019-05-19 Thread Natarajan, Rajeswari
Thanks Amrit for creating a patch. But the code in LBHttpSolrClient.java 
needs to be fixed too if the for loop is to work as intended.
Regards
Rajeswari

public Rsp request(Req req) throws SolrServerException, IOException {
Rsp rsp = new Rsp();
Exception ex = null;
boolean isNonRetryable = req.request instanceof IsUpdateRequest || 
ADMIN_PATHS.contains(req.request.getPath());
List<ServerWrapper> skipped = null;

final Integer numServersToTry = req.getNumServersToTry();
int numServersTried = 0;

boolean timeAllowedExceeded = false;
long timeAllowedNano = getTimeAllowedInNanos(req.getRequest());
long timeOutTime = System.nanoTime() + timeAllowedNano;
for (String serverStr : req.getServers()) {
  if (timeAllowedExceeded = isTimeExceeded(timeAllowedNano, timeOutTime)) {
break;
  }
  
  serverStr = normalize(serverStr);
  // if the server is currently a zombie, just skip to the next one
  ServerWrapper wrapper = zombieServers.get(serverStr);
  if (wrapper != null) {
// System.out.println("ZOMBIE SERVER QUERIED: " + serverStr);
final int numDeadServersToTry = req.getNumDeadServersToTry();
if (numDeadServersToTry > 0) {
  if (skipped == null) {
skipped = new ArrayList<>(numDeadServersToTry);
skipped.add(wrapper);
  }
  else if (skipped.size() < numDeadServersToTry) {
skipped.add(wrapper);
  }
}
continue;
  }
  try {
MDC.put("LBHttpSolrClient.url", serverStr);

if (numServersToTry != null && numServersTried > 
numServersToTry.intValue()) {
  break;
}

HttpSolrClient client = makeSolrClient(serverStr);

++numServersTried;
ex = doRequest(client, req, rsp, isNonRetryable, false, null);
if (ex == null) {
  return rsp; // SUCCESS
}
  } finally {
MDC.remove("LBHttpSolrClient.url");
  }
}

// try the servers we previously skipped
if (skipped != null) {
  for (ServerWrapper wrapper : skipped) {
if (timeAllowedExceeded = isTimeExceeded(timeAllowedNano, timeOutTime)) 
{
  break;
}

if (numServersToTry != null && numServersTried > 
numServersToTry.intValue()) {
  break;
}

try {
  MDC.put("LBHttpSolrClient.url", wrapper.client.getBaseURL());
  ++numServersTried;
  ex = doRequest(wrapper.client, req, rsp, isNonRetryable, true, 
wrapper.getKey());
  if (ex == null) {
return rsp; // SUCCESS
  }
} finally {
  MDC.remove("LBHttpSolrClient.url");
}
  }
}


final String solrServerExceptionMessage;
if (timeAllowedExceeded) {
  solrServerExceptionMessage = "Time allowed to handle this request 
exceeded";
} else {
  if (numServersToTry != null && numServersTried > 
numServersToTry.intValue()) {
solrServerExceptionMessage = "No live SolrServers available to handle 
this request:"
+ " numServersTried="+numServersTried
+ " numServersToTry="+numServersToTry.intValue();
  } else {
solrServerExceptionMessage = "No live SolrServers available to handle 
this request";
  }
}
if (ex == null) {
  throw new SolrServerException(solrServerExceptionMessage);
} else {
  throw new SolrServerException(solrServerExceptionMessage+":" + 
zombieServers.keySet(), ex);
}

  }

On 5/19/19, 3:12 PM, "Amrit Sarkar"  wrote:

>
> Thanks Natrajan,
>
> Solid analysis and I saw the issue being reported by multiple users in
> past few months and unfortunately I baked an incomplete code.
>
> I think the correct way of solving this issue is to identify the correct
> base-url for the respective core we need to trigger REQUESTRECOVERY to and
> create a local HttpSolrClient instead of using CloudSolrClient from
> CdcrReplicatorState. This will avoid unnecessary retry which will be
> redundant in our case.
>
> I baked a small patch few weeks back and will upload it on the SOLR-11724
> .
>




Re: CDCR one source multiple targets

2019-05-19 Thread Amrit Sarkar
Thanks, Arnold,

Is the documentation not clear about the way multiple CDCR targets can be
configured?

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2


On Thu, Apr 11, 2019 at 2:59 AM Arnold Bronley 
wrote:

> This had a very simple solution if anybody else is wondering about the same
> issue. I had to define separate replica elements inside cdcr. Following is
> an example.
>
>   <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
>     <lst name="replica">
>       <str name="zkHost">target1:2181</str>
>       <str name="source">techproducts</str>
>       <str name="target">techproducts</str>
>     </lst>
>     <lst name="replica">
>       <str name="zkHost">target2:2181</str>
>       <str name="source">techproducts</str>
>       <str name="target">techproducts</str>
>     </lst>
>     <lst name="replicator">
>       <str name="threadPoolSize">8</str>
>       <str name="schedule">1000</str>
>       <str name="batchSize">128</str>
>     </lst>
>     <lst name="updateLogSynchronizer">
>       <str name="schedule">1000</str>
>     </lst>
>     <lst name="buffer">
>       <str name="defaultState">disabled</str>
>     </lst>
>   </requestHandler>
>
> On Thu, Mar 21, 2019 at 10:40 AM Arnold Bronley 
> wrote:
>
> > I see a similar question asked but no answers there too.
> >
> http://lucene.472066.n3.nabble.com/CDCR-Replication-from-one-source-to-multiple-targets-td4308717.html
> > OP there is using multiple cdcr request handlers but in my case I am
> using
> > multiple zkhost strings. It will be pretty limiting if we cannot use cdcr
> > for one source- multiple target cluster situation.
> > Can somebody please confirm whether this is even supported?
> >
> >
> > On Wed, Mar 20, 2019 at 1:12 PM Arnold Bronley 
> > wrote:
> >
> >> Hi,
> >>
> >> is it possible to use CDCR with one source SolrCloud cluster and
> multiple
> >> target SolrCloud clusters? I tried to edit the zkHost setting in source
> >> cluster's solrconfig file by adding multiple comma separated values for
> >> target zkhosts for multuple target clusters. But the CDCR replication
> >> happens only to one of the zkhosts and not all. If this is not supported
> >> then how should I go about implementing something like this?
> >>
> >>
> >
>


Re: CDCR - shards not in sync

2019-05-19 Thread Amrit Sarkar
Hi Jay,

Can you look at the logs and identify whether there are any exceptions occurring
on the particular Solr nodes where the lagging shard is hosted?

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2


On Mon, Apr 15, 2019 at 8:33 PM Jay Potharaju  wrote:

> Hi,
> I have a collection with 8 shards. 6 out of the shards are in sync but the
> other 2 are lagging behind by more than 10 plus hours. The tlog is only 0.5
> GB in size. I have tried stopping and starting CDCR number of times but it
> has not helped.
> From what i have noticed there is always a shard that is slower than
> others.
>
> Solr version: 7.7.0
> CDCR config
>
>   <lst name="replicator">
>     <str name="threadPoolSize">2</str>
>     <str name="schedule">10</str>
>     <str name="batchSize">4500</str>
>   </lst>
>
>   <lst name="updateLogSynchronizer">
>     <str name="schedule">6</str>
>   </lst>
>
>
> Thanks
> Jay
>


[CDCR]Unable to locate core

2019-05-19 Thread Amrit Sarkar
>
> Thanks Natrajan,
>
> Solid analysis and I saw the issue being reported by multiple users in
> past few months and unfortunately I baked an incomplete code.
>
> I think the correct way of solving this issue is to identify the correct
> base-url for the respective core we need to trigger REQUESTRECOVERY to and
> create a local HttpSolrClient instead of using CloudSolrClient from
> CdcrReplicatorState. This will avoid unnecessary retry which will be
> redundant in our case.
>
> I baked a small patch few weeks back and will upload it on the SOLR-11724
> .
>
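
A rough sketch of that approach, assuming the caller has already resolved the
follower replica's base URL and core name from the cluster state (the method and
parameter names below are illustrative, not the actual SOLR-11724 patch):

private NamedList<Object> requestRecoveryOnFollower(String followerBaseUrl,
                                                    String followerCoreName)
    throws SolrServerException, IOException {
  // talk directly to the node that hosts the follower core, instead of letting a
  // load-balanced client pick an arbitrary live node for a core-level admin request
  try (HttpSolrClient client = new HttpSolrClient.Builder(followerBaseUrl).build()) {
    CoreAdminRequest.RequestRecovery cmd = new CoreAdminRequest.RequestRecovery();
    cmd.setAction(CoreAdminParams.CoreAdminAction.REQUESTRECOVERY);
    cmd.setCoreName(followerCoreName);
    return client.request(cmd);
  }
}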


Re: [CDCR]Unable to locate core

2019-05-19 Thread Natarajan, Rajeswari
Here is my close analysis:


The SolrClient request goes to the request() method below in the class 
LBHttpSolrClient.java.
There is a for loop to try the different live servers, but when the doRequest method 
(inside the request method below) throws an exception there is no catch, so the next 
retry is not attempted. To solve this issue, there should be a catch around 
doRequest so that the next iteration retries the request against another server 
(a sketch of the proposed catch follows the method below). But in case there are 
multiple live servers, the request might also time out. This needs to be fixed to 
make CDCR bootstrap work reliably; if not, it sometimes works and sometimes does 
not. I can work on this patch if this is agreed.


public Rsp request(Req req) throws SolrServerException, IOException {
Rsp rsp = new Rsp();
Exception ex = null;
boolean isNonRetryable = req.request instanceof IsUpdateRequest || 
ADMIN_PATHS.contains(req.request.getPath());
List<ServerWrapper> skipped = null;

final Integer numServersToTry = req.getNumServersToTry();
int numServersTried = 0;

boolean timeAllowedExceeded = false;
long timeAllowedNano = getTimeAllowedInNanos(req.getRequest());
long timeOutTime = System.nanoTime() + timeAllowedNano;
for (String serverStr : req.getServers()) {
  if (timeAllowedExceeded = isTimeExceeded(timeAllowedNano, timeOutTime)) {
break;
  }
  
  serverStr = normalize(serverStr);
  // if the server is currently a zombie, just skip to the next one
  ServerWrapper wrapper = zombieServers.get(serverStr);
  if (wrapper != null) {
// System.out.println("ZOMBIE SERVER QUERIED: " + serverStr);
final int numDeadServersToTry = req.getNumDeadServersToTry();
if (numDeadServersToTry > 0) {
  if (skipped == null) {
skipped = new ArrayList<>(numDeadServersToTry);
skipped.add(wrapper);
  }
  else if (skipped.size() < numDeadServersToTry) {
skipped.add(wrapper);
  }
}
continue;
  }
  try {
MDC.put("LBHttpSolrClient.url", serverStr);

if (numServersToTry != null && numServersTried > 
numServersToTry.intValue()) {
  break;
} 

HttpSolrClient client = makeSolrClient(serverStr);

++numServersTried;
ex = doRequest(client, req, rsp, isNonRetryable, false, null);
if (ex == null) {
  return rsp; // SUCCESS
}
   //NO CATCH HERE ,  SO IT FAILS
  } finally {
MDC.remove("LBHttpSolrClient.url");
  }
}

// try the servers we previously skipped
if (skipped != null) {
  for (ServerWrapper wrapper : skipped) {
if (timeAllowedExceeded = isTimeExceeded(timeAllowedNano, timeOutTime)) 
{
  break;
}

if (numServersToTry != null && numServersTried > 
numServersToTry.intValue()) {
  break;
}

try {
  MDC.put("LBHttpSolrClient.url", wrapper.client.getBaseURL());
  ++numServersTried;
  ex = doRequest(wrapper.client, req, rsp, isNonRetryable, true, 
wrapper.getKey());
  if (ex == null) {
return rsp; // SUCCESS
  }
} finally {
  MDC.remove("LBHttpSolrClient.url");
}
  }
}


final String solrServerExceptionMessage;
if (timeAllowedExceeded) {
  solrServerExceptionMessage = "Time allowed to handle this request 
exceeded";
} else {
  if (numServersToTry != null && numServersTried > 
numServersToTry.intValue()) {
solrServerExceptionMessage = "No live SolrServers available to handle 
this request:"
+ " numServersTried="+numServersTried
+ " numServersToTry="+numServersToTry.intValue();
  } else {
solrServerExceptionMessage = "No live SolrServers available to handle 
this request";
  }
}
if (ex == null) {
  throw new SolrServerException(solrServerExceptionMessage);
} else {
  throw new SolrServerException(solrServerExceptionMessage+":" + 
zombieServers.keySet(), ex);
}

  }
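
A sketch of the catch proposed above, applied to the first loop in request(); it
reuses the variables from the method as shown and is illustrative only, not a
reviewed patch. The idea is simply to record the failure and let the loop move on
to the next live server instead of aborting:

  try {
    MDC.put("LBHttpSolrClient.url", serverStr);

    if (numServersToTry != null && numServersTried > numServersToTry.intValue()) {
      break;
    }

    HttpSolrClient client = makeSolrClient(serverStr);

    ++numServersTried;
    ex = doRequest(client, req, rsp, isNonRetryable, false, null);
    if (ex == null) {
      return rsp; // SUCCESS
    }
  } catch (SolrException | SolrServerException | IOException e) {
    // remember the failure and fall through so the next live server is tried
    ex = e;
  } finally {
    MDC.remove("LBHttpSolrClient.url");
  }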


Thanks,
Rajeswari


On 5/19/19, 9:39 AM, "Natarajan, Rajeswari"  
wrote:

Hi

We are using solr 7.6 and trying out bidirectional CDCR and I also hit this 
issue. 

Stacktrace

INFO  (cdcr-bootstrap-status-17-thread-1) [   ] 
o.a.s.h.CdcrReplicatorManager CDCR bootstrap successful in 3 seconds
   
INFO  (cdcr-bootstrap-status-17-thread-1) [   ] 
o.a.s.h.CdcrReplicatorManager Create new update log reader for target abcd_ta 
with checkpoint -1 @ abcd_ta:shard1
ERROR (cdcr-bootstrap-status-17-thread-1) [   ] 
o.a.s.h.CdcrReplicatorManager Unable to bootstrap the target collection abcd_ta 
shard: shard1

Re: [CDCR]Unable to locate core

2019-05-19 Thread Natarajan, Rajeswari
Hi

We are using solr 7.6 and trying out bidirectional CDCR and I also hit this 
issue. 

Stacktrace

INFO  (cdcr-bootstrap-status-17-thread-1) [   ] o.a.s.h.CdcrReplicatorManager 
CDCR bootstrap successful in 3 seconds  
 
INFO  (cdcr-bootstrap-status-17-thread-1) [   ] o.a.s.h.CdcrReplicatorManager 
Create new update log reader for target abcd_ta with checkpoint -1 @ 
abcd_ta:shard1
ERROR (cdcr-bootstrap-status-17-thread-1) [   ] o.a.s.h.CdcrReplicatorManager 
Unable to bootstrap the target collection abcd_ta shard: shard1 

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at 
http://10.169.50.182:8983/solr: Unable to locate core kanna_ta_shard1_replica_n1
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:643) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
    at org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:483) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
    at org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:413) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
    at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1107) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
    at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:884) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]


I stepped through the code

private NamedList sendRequestRecoveryToFollower(SolrClient client, String 
coreName) throws SolrServerException, IOException {
CoreAdminRequest.RequestRecovery recoverRequestCmd = new 
CoreAdminRequest.RequestRecovery();

recoverRequestCmd.setAction(CoreAdminParams.CoreAdminAction.REQUESTRECOVERY);
recoverRequestCmd.setCoreName(coreName);
return client.request(recoverRequestCmd);
  }

In the above method, the recovery request is an admin command and it is 
specific to a core. In the SolrClient.request logic the code gets the 
live servers and executes the command in a loop, but since this is an admin command 
it is non-retriable. Depending on which live server the code picks and where 
the core actually lives, the recovery request command might succeed or 
fail. So I think there is a problem with this code trying to send the core 
command to whatever live server is available; the code should instead find the 
correct server on which the core lives and send the request there.

Regards,
Rajeswari

On 5/15/19, 10:59 AM, "Natarajan, Rajeswari"  
wrote:

I am also facing this issue. Any resolution found on this issue, Please 
update. Thanks

On 2/7/19, 10:42 AM, "Tim"  wrote:

So it looks like I'm having an issue with this fix:
https://issues.apache.org/jira/browse/SOLR-11724

So I've messed around with this for a while and every time the leader to
leader replica portion works fine. But the Recovery portion 
(implemented as
part of the fix above) fails. 

I've run a few tests and every time the recovery portion kicks off, it 
sends
the recovery command to the node which has the leader for a given 
replica
instead of the follower. 
I've recreated the collection several times so that replicas are on
different nodes with the same results each time. It seems to be assumed 
that
the follower is on the same solr node as the leader. 
 
For example, if s3r10 (shard 3, replica 10) is the leader and is on 
node1,
while the follower s3r8 is on node2, then the core recovery command 
meant
for s3r8 is being sent to node1 instead of node2.





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html






Re: [CDCR]Unable to locate core

2019-05-15 Thread Natarajan, Rajeswari
I am also facing this issue. Any resolution found on this issue, Please update. 
Thanks

On 2/7/19, 10:42 AM, "Tim"  wrote:

So it looks like I'm having an issue with this fix:
https://issues.apache.org/jira/browse/SOLR-11724

So I've messed around with this for a while and every time the leader to
leader replica portion works fine. But the Recovery portion (implemented as
part of the fix above) fails. 

I've run a few tests and every time the recovery portion kicks off, it sends
the recovery command to the node which has the leader for a given replica
instead of the follower. 
I've recreated the collection several times so that replicas are on
different nodes with the same results each time. It seems to be assumed that
the follower is on the same solr node as the leader. 
 
For example, if s3r10 (shard 3, replica 10) is the leader and is on node1,
while the follower s3r8 is on node2, then the core recovery command meant
for s3r8 is being sent to node1 instead of node2.





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html




Re: Seeking advice on SolrCloud production architecture with CDCR

2019-05-14 Thread Shawn Heisey

On 5/14/2019 4:55 PM, Cody Burleson wrote:

I’m worried, for example, about spreading the ZooKeeper cluster between the two 
data centers because of potential latency across the pond. Maybe we keep the ZK 
ensemble on one side of the pond only? I imagined, for instance,  2 ZK nodes on 
one server, and one on the other (in at least one data center). But maybe we 
need 5 ZKs, with 1 on each server in the other data center? Then how about the 
Solr nodes, shards, and replicas? If anybody has done some remotely similar 
setup for production purposes, I would be grateful for any tips (and down-right 
giddy for a diagram).


If you're planning a geographically diverse ZooKeeper setup, you cannot 
do it with only two datacenters.  You need at least three.  This is 
inherent to the design of ZK and cannot be changed.  With two data 
centers, you will always have one DC that if it goes down, ZK loses 
quorum.  When ZK loses quorum, SolrCloud loses the ability to react to 
failures and goes read-only.


You mentioned CDCR.  This involves two completely separate SolrCloud 
clusters -- a full ZK ensemble in each location.  So you would have 3 ZK 
servers and at least two Solr servers in one data center, and 3 ZK 
servers plus at least two Solr servers in the other data center.


Thanks,
Shawn


Seeking advice on SolrCloud production architecture with CDCR

2019-05-14 Thread Cody Burleson
Hi, all. We’re upgrading an old Solr 3.5 setup (master/slave replication) to 
SolrCloud (v7 or v8) and with the addition of a new data center (for dual data 
centers). I’ve done a lot of homework, but could still use some advice. While 
documentation explains ZooKeeper and SolrCloud pretty well, I don’t get a 
comfortable sense for how to lay everything out physically in the architecture.

At present, we have planned the same physical hardware as what we had for our 
master/slave setup (basically, 2 servers). Now, however, we’re going to 
duplicate that so that we also have the same in another data center: US and 
Europe. For this, the Cross Data Center Replication (CDCR; bi-directional) 
seems appropriate, but I’m not confident. Also, for the best fault tolerance 
and high-availability, I’m not really sure how to lay out my ZooKeeper nodes and my 
Solr instances/shards/replicas physically across the servers. I’d like to start 
with the simplest possible setup and scale up only if necessary. Our index size 
is relatively small, I guess: ~150,000 documents.

I’m worried, for example, about spreading the ZooKeeper cluster between the two 
data centers because of potential latency across the pond. Maybe we keep the ZK 
ensemble on one side of the pond only? I imagined, for instance,  2 ZK nodes on 
one server, and one on the other (in at least one data center). But maybe we 
need 5 ZKs, with 1 on each server in the other data center? Then how about the 
Solr nodes, shards, and replicas? If anybody has done some remotely similar 
setup for production purposes, I would be grateful for any tips (and down-right 
giddy for a diagram).

I know I’m probably not even providing enough information to begin with, but 
perhaps someone will entertain a conversation?

Thanks, in advance, for sharing some of your valuable time and experience.

Cody


CDCR - shards not in sync

2019-04-15 Thread Jay Potharaju
Hi,
I have a collection with 8 shards. 6 out of the 8 shards are in sync but the
other 2 are lagging behind by more than 10 hours. The tlog is only 0.5
GB in size. I have tried stopping and starting CDCR a number of times but it
has not helped.
From what I have noticed there is always a shard that is slower than the others.

Solr version: 7.7.0
CDCR config:

  <lst name="replicator">
    <str name="threadPoolSize">2</str>
    <str name="schedule">10</str>
    <str name="batchSize">4500</str>
  </lst>

  <lst name="updateLogSynchronizer">
    <str name="schedule">6</str>
  </lst>


Thanks
Jay


Re: CDCR one source multiple targets

2019-04-10 Thread Arnold Bronley
This had a very simple solution if anybody else is wondering about the same
issue. I had to define separate replica elements inside cdcr. Following is
an example.

  <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
    <lst name="replica">
      <str name="zkHost">target1:2181</str>
      <str name="source">techproducts</str>
      <str name="target">techproducts</str>
    </lst>
    <lst name="replica">
      <str name="zkHost">target2:2181</str>
      <str name="source">techproducts</str>
      <str name="target">techproducts</str>
    </lst>
    <lst name="replicator">
      <str name="threadPoolSize">8</str>
      <str name="schedule">1000</str>
      <str name="batchSize">128</str>
    </lst>
    <lst name="updateLogSynchronizer">
      <str name="schedule">1000</str>
    </lst>
    <lst name="buffer">
      <str name="defaultState">disabled</str>
    </lst>
  </requestHandler>

On Thu, Mar 21, 2019 at 10:40 AM Arnold Bronley 
wrote:

> I see a similar question asked but no answers there too.
> http://lucene.472066.n3.nabble.com/CDCR-Replication-from-one-source-to-multiple-targets-td4308717.html
> OP there is using multiple cdcr request handlers but in my case I am using
> multiple zkhost strings. It will be pretty limiting if we cannot use cdcr
> for one source- multiple target cluster situation.
> Can somebody please confirm whether this is even supported?
>
>
> On Wed, Mar 20, 2019 at 1:12 PM Arnold Bronley 
> wrote:
>
>> Hi,
>>
>> is it possible to use CDCR with one source SolrCloud cluster and multiple
>> target SolrCloud clusters? I tried to edit the zkHost setting in source
>> cluster's solrconfig file by adding multiple comma separated values for
>> target zkhosts for multuple target clusters. But the CDCR replication
>> happens only to one of the zkhosts and not all. If this is not supported
>> then how should I go about implementing something like this?
>>
>>
>


Re: CDCR issues

2019-03-24 Thread Gus Heck
This sounds worthy of a jira. Especially if you can cite steps to reproduce.

On Fri, Mar 22, 2019, 10:51 PM Jay Potharaju  wrote:

> This might be causing the high CPU in 7.7.x.
>
>
> https://github.com/apache/lucene-solr/commit/eb652b84edf441d8369f5188cdd5e3ae2b151434#diff-e54b251d166135a1afb7938cfe152bb5
> That is related to this JDK bug
> https://bugs.openjdk.java.net/browse/JDK-8129861.
>
>
> Thanks
> Jay Potharaju
>
>
>
> On Thu, Mar 21, 2019 at 10:20 PM Jay Potharaju 
> wrote:
>
> > Hi,
> > I just enabled CDCR for one  collection. I am seeing high CPU usage and
> > the high number of tlog files and increasing.
> > The collection does not have lot of data , just started reindexing of
> > data.
> > .
> > Solr 7.7.0 , implicit sharding 8 shards
> > I have enabled buffer on source side and disabled buffer on target side.
> > The number of replicators is set to 4.
> >  Any suggestions on how to tackle high cpu and growing tlog. The tlog are
> > small in size but for the one shard I checked there were about 100 of
> them.
> >
> > Thanks
> > Jay
>


Re: CDCR issues

2019-03-22 Thread Jay Potharaju
This might be causing the high CPU in 7.7.x.

https://github.com/apache/lucene-solr/commit/eb652b84edf441d8369f5188cdd5e3ae2b151434#diff-e54b251d166135a1afb7938cfe152bb5
That is related to this JDK bug
https://bugs.openjdk.java.net/browse/JDK-8129861.


Thanks
Jay Potharaju



On Thu, Mar 21, 2019 at 10:20 PM Jay Potharaju 
wrote:

> Hi,
> I just enabled CDCR for one  collection. I am seeing high CPU usage and
> the high number of tlog files and increasing.
> The collection does not have lot of data , just started reindexing of
> data.
> .
> Solr 7.7.0 , implicit sharding 8 shards
> I have enabled buffer on source side and disabled buffer on target side.
> The number of replicators is set to 4.
>  Any suggestions on how to tackle high cpu and growing tlog. The tlog are
> small in size but for the one shard I checked there were about 100 of them.
>
> Thanks
> Jay


CDCR issues

2019-03-21 Thread Jay Potharaju
Hi,
I just enabled CDCR for one collection. I am seeing high CPU usage and a high 
and increasing number of tlog files.
The collection does not have a lot of data; I just started reindexing the data.
Solr 7.7.0, implicit sharding, 8 shards.
I have enabled the buffer on the source side and disabled the buffer on the target side. 
The number of replicators is set to 4.
Any suggestions on how to tackle the high CPU and the growing tlogs? The tlogs are small 
in size, but for the one shard I checked there were about 100 of them. 

Thanks
Jay
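
For reference, the buffer state mentioned above is toggled per collection through
the CDCR request handler; a hedged sketch of the call (host and collection names
are placeholders, and ENABLEBUFFER is the counterpart action):

try (HttpSolrClient solr = new HttpSolrClient.Builder("http://sourcehost:8983/solr").build()) {
  SolrQuery q = new SolrQuery();
  q.setRequestHandler("/cdcr");
  q.set(CommonParams.ACTION, "DISABLEBUFFER");   // or ENABLEBUFFER
  solr.query("mycollection", q);
}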

Re: CDCR one source multiple targets

2019-03-21 Thread Arnold Bronley
I see a similar question asked but no answers there too.
http://lucene.472066.n3.nabble.com/CDCR-Replication-from-one-source-to-multiple-targets-td4308717.html
The OP there is using multiple cdcr request handlers, but in my case I am using
multiple zkHost strings. It will be pretty limiting if we cannot use CDCR
for a one-source, multiple-target cluster situation.
Can somebody please confirm whether this is even supported?


On Wed, Mar 20, 2019 at 1:12 PM Arnold Bronley 
wrote:

> Hi,
>
> is it possible to use CDCR with one source SolrCloud cluster and multiple
> target SolrCloud clusters? I tried to edit the zkHost setting in source
> cluster's solrconfig file by adding multiple comma separated values for
> target zkhosts for multuple target clusters. But the CDCR replication
> happens only to one of the zkhosts and not all. If this is not supported
> then how should I go about implementing something like this?
>
>


CDCR one source multiple targets

2019-03-20 Thread Arnold Bronley
Hi,

is it possible to use CDCR with one source SolrCloud cluster and multiple
target SolrCloud clusters? I tried to edit the zkHost setting in source
cluster's solrconfig file by adding multiple comma separated values for
target zkhosts for multiple target clusters. But the CDCR replication
happens only to one of the zkhosts and not all. If this is not supported
then how should I go about implementing something like this?


Re: Bidirectional CDCR not working

2019-03-14 Thread Arnold Bronley
Thanks, Nish. It turned out to be another issue. I had not restarted one of
the nodes in the cluster, which had become leader in the meantime.
It is good to know, though, that there is malformed XML in the example. I
will try to submit a documentation fix soon.

On Thu, Mar 14, 2019 at 5:37 PM Nish Karve  wrote:

> Arnold,
>
> Have you copied the configuration from the Solr docs? The bi directional
> cluster configuration (for cluster 1) has a malformed XML. It is missing
> the closing tag for the updateLogSynchronizer under the request handler
> configuration.
>
> Please disregard if you have already considered that in your configuration.
> I had a lot of issues trying to figure out the issue when I realized that
> it was a documentation error.
>
> Thanks
> Nishant
>
>
> On Thu, Mar 14, 2019, 2:54 PM Arnold Bronley  wrote:
>
> > Configuration is almost identical for both clusters in terms of cdcr
> except
> > for zkHost parameter configuration.
> >
> > On Thu, Mar 14, 2019 at 3:45 PM Arnold Bronley 
> > wrote:
> >
> > > Exactly. I have it defined in both clusters. I am following the
> > > instructions from here .
> > >
> >
> https://lucene.apache.org/solr/guide/7_7/cdcr-config.html#bi-directional-updates
> > >
> > > On Thu, Mar 14, 2019 at 3:40 PM Amrit Sarkar 
> > > wrote:
> > >
> > >> Hi Arnold,
> > >>
> > >> You need "cdcr-processor-chain" definitions in solrconfig.xml on both
> > >> clusters' collections. Both clusters need to act as source and target.
> > >>
> > >> Amrit Sarkar
> > >> Search Engineer
> > >> Lucidworks, Inc.
> > >> 415-589-9269
> > >> www.lucidworks.com
> > >> Twitter http://twitter.com/lucidworks
> > >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> > >> Medium: https://medium.com/@sarkaramrit2
> > >>
> > >>
> > >> On Fri, Mar 15, 2019 at 1:03 AM Arnold Bronley <
> arnoldbron...@gmail.com
> > >
> > >> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > I used unidirectional CDCR in SolrCloud (7.7.1) without any issues.
> > But
> > >> > after setting up bidirectional cdcr configuration, I am not able to
> > >> index a
> > >> > document.
> > >> >
> > >> > Following is the error that I am getting:
> > >> >
> > >> > Async exception during distributed update: Error from server at
> > >> > http://host1:8983/solr/techproducts_shard2_replica_n6: Bad Request
> > >> > request:
> > >> > http://host1
> > >> >
> > >> >
> > >>
> >
> :8983/solr/techproducts_shard2_replica_n6/update?update.chain=cdcr-processor-chain=TOLEADER=
> > >> >
> > >>
> >
> http://host2:8983/solr/techproducts_shard1_replica_n1=javabin=2
> > >> > Remote error message: unknown UpdateRequestProcessorChain:
> > >> > cdcr-processor-chain
> > >> >
> > >> > Do you know why I might be getting this error?
> > >> >
> > >>
> > >
> >
>


Re: Bidirectional CDCR not working

2019-03-14 Thread Nish Karve
Arnold,

Have you copied the configuration from the Solr docs? The bi-directional
cluster configuration (for cluster 1) has malformed XML. It is missing
the closing tag for the updateLogSynchronizer under the request handler
configuration.

Please disregard if you have already considered that in your configuration.
I had a lot of issues trying to figure out the issue when I realized that
it was a documentation error.

Thanks
Nishant


On Thu, Mar 14, 2019, 2:54 PM Arnold Bronley  wrote:

> Configuration is almost identical for both clusters in terms of cdcr except
> for zkHost parameter configuration.
>
> On Thu, Mar 14, 2019 at 3:45 PM Arnold Bronley 
> wrote:
>
> > Exactly. I have it defined in both clusters. I am following the
> > instructions from here .
> >
> https://lucene.apache.org/solr/guide/7_7/cdcr-config.html#bi-directional-updates
> >
> > On Thu, Mar 14, 2019 at 3:40 PM Amrit Sarkar 
> > wrote:
> >
> >> Hi Arnold,
> >>
> >> You need "cdcr-processor-chain" definitions in solrconfig.xml on both
> >> clusters' collections. Both clusters need to act as source and target.
> >>
> >> Amrit Sarkar
> >> Search Engineer
> >> Lucidworks, Inc.
> >> 415-589-9269
> >> www.lucidworks.com
> >> Twitter http://twitter.com/lucidworks
> >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> >> Medium: https://medium.com/@sarkaramrit2
> >>
> >>
> >> On Fri, Mar 15, 2019 at 1:03 AM Arnold Bronley  >
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > I used unidirectional CDCR in SolrCloud (7.7.1) without any issues.
> But
> >> > after setting up bidirectional cdcr configuration, I am not able to
> >> index a
> >> > document.
> >> >
> >> > Following is the error that I am getting:
> >> >
> >> > Async exception during distributed update: Error from server at
> >> > http://host1:8983/solr/techproducts_shard2_replica_n6: Bad Request
> >> > request:
> >> > http://host1
> >> >
> >> >
> >>
> :8983/solr/techproducts_shard2_replica_n6/update?update.chain=cdcr-processor-chain=TOLEADER=
> >> >
> >>
> http://host2:8983/solr/techproducts_shard1_replica_n1=javabin=2
> >> > Remote error message: unknown UpdateRequestProcessorChain:
> >> > cdcr-processor-chain
> >> >
> >> > Do you know why I might be getting this error?
> >> >
> >>
> >
>


Re: Bidirectional CDCR not working

2019-03-14 Thread Arnold Bronley
Configuration is almost identical for both clusters in terms of cdcr except
for zkHost parameter configuration.

On Thu, Mar 14, 2019 at 3:45 PM Arnold Bronley 
wrote:

> Exactly. I have it defined in both clusters. I am following the
> instructions from here .
> https://lucene.apache.org/solr/guide/7_7/cdcr-config.html#bi-directional-updates
>
> On Thu, Mar 14, 2019 at 3:40 PM Amrit Sarkar 
> wrote:
>
>> Hi Arnold,
>>
>> You need "cdcr-processor-chain" definitions in solrconfig.xml on both
>> clusters' collections. Both clusters need to act as source and target.
>>
>> Amrit Sarkar
>> Search Engineer
>> Lucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> Medium: https://medium.com/@sarkaramrit2
>>
>>
>> On Fri, Mar 15, 2019 at 1:03 AM Arnold Bronley 
>> wrote:
>>
>> > Hi,
>> >
>> > I used unidirectional CDCR in SolrCloud (7.7.1) without any issues. But
>> > after setting up bidirectional cdcr configuration, I am not able to
>> index a
>> > document.
>> >
>> > Following is the error that I am getting:
>> >
>> > Async exception during distributed update: Error from server at
>> > http://host1:8983/solr/techproducts_shard2_replica_n6: Bad Request
>> > request:
>> > http://host1:8983/solr/techproducts_shard2_replica_n6/update?update.chain=cdcr-processor-chain&update.distrib=TOLEADER&distrib.from=http://host2:8983/solr/techproducts_shard1_replica_n1/&wt=javabin&version=2
>> > Remote error message: unknown UpdateRequestProcessorChain:
>> > cdcr-processor-chain
>> >
>> > Do you know why I might be getting this error?
>> >
>>
>


Re: Bidirectional CDCR not working

2019-03-14 Thread Arnold Bronley
Exactly. I have it defined in both clusters. I am following the
instructions from here .
https://lucene.apache.org/solr/guide/7_7/cdcr-config.html#bi-directional-updates

On Thu, Mar 14, 2019 at 3:40 PM Amrit Sarkar  wrote:

> Hi Arnold,
>
> You need "cdcr-processor-chain" definitions in solrconfig.xml on both
> clusters' collections. Both clusters need to act as source and target.
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
>
>
> On Fri, Mar 15, 2019 at 1:03 AM Arnold Bronley 
> wrote:
>
> > Hi,
> >
> > I used unidirectional CDCR in SolrCloud (7.7.1) without any issues. But
> > after setting up bidirectional cdcr configuration, I am not able to
> index a
> > document.
> >
> > Following is the error that I am getting:
> >
> > Async exception during distributed update: Error from server at
> > http://host1:8983/solr/techproducts_shard2_replica_n6: Bad Request
> > request:
> > http://host1:8983/solr/techproducts_shard2_replica_n6/update?update.chain=cdcr-processor-chain&update.distrib=TOLEADER&distrib.from=http://host2:8983/solr/techproducts_shard1_replica_n1/&wt=javabin&version=2
> > Remote error message: unknown UpdateRequestProcessorChain:
> > cdcr-processor-chain
> >
> > Do you know why I might be getting this error?
> >
>


Re: Bidirectional CDCR not working

2019-03-14 Thread Amrit Sarkar
Hi Arnold,

You need "cdcr-processor-chain" definitions in solrconfig.xml on both
clusters' collections. Both clusters need to act as source and target.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2


On Fri, Mar 15, 2019 at 1:03 AM Arnold Bronley 
wrote:

> Hi,
>
> I used unidirectional CDCR in SolrCloud (7.7.1) without any issues. But
> after setting up bidirectional cdcr configuration, I am not able to index a
> document.
>
> Following is the error that I am getting:
>
> Async exception during distributed update: Error from server at
> http://host1:8983/solr/techproducts_shard2_replica_n6: Bad Request
> request:
> http://host1:8983/solr/techproducts_shard2_replica_n6/update?update.chain=cdcr-processor-chain&update.distrib=TOLEADER&distrib.from=http://host2:8983/solr/techproducts_shard1_replica_n1/&wt=javabin&version=2
> Remote error message: unknown UpdateRequestProcessorChain:
> cdcr-processor-chain
>
> Do you know why I might be getting this error?
>


Bidirectional CDCR not working

2019-03-14 Thread Arnold Bronley
Hi,

I used unidirectional CDCR in SolrCloud (7.7.1) without any issues. But
after setting up bidirectional cdcr configuration, I am not able to index a
document.

Following is the error that I am getting:

Async exception during distributed update: Error from server at
http://host1:8983/solr/techproducts_shard2_replica_n6: Bad Request request:
http://host1:8983/solr/techproducts_shard2_replica_n6/update?update.chain=cdcr-processor-chain&update.distrib=TOLEADER&distrib.from=http://host2:8983/solr/techproducts_shard1_replica_n1/&wt=javabin&version=2
Remote error message: unknown UpdateRequestProcessorChain:
cdcr-processor-chain

Do you know why I might be getting this error?


Re: [CDCR]Unable to locate core

2019-02-07 Thread Tim
So it looks like I'm having an issue with this fix:
https://issues.apache.org/jira/browse/SOLR-11724

So I've messed around with this for a while and every time the leader to
leader replica portion works fine. But the Recovery portion (implemented as
part of the fix above) fails. 

I've run a few tests and every time the recovery portion kicks off, it sends
the recovery command to the node which has the leader for a given replica
instead of the follower. 
I've recreated the collection several times so that replicas are on
different nodes with the same results each time. It seems to be assumed that
the follower is on the same solr node as the leader. 
 
For example, if s3r10 (shard 3, replica 10) is the leader and is on node1,
while the follower s3r8 is on node2, then the core recovery command meant
for s3r8 is being sent to node1 instead of node2.





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


RE: [EXTERNAL] Re: [CDCR]Unable to locate core

2019-02-02 Thread Timothy Springsteen
Thank you for the reply. Sorry I did not include more information in the first 
post.

So maybe there's some confusion here from my end. So both the target and source 
clusters are running in cloud mode. So I think you're correct that it is a 
different issue. So it looks like the source leader to target leader is 
successful but the target leader is then unsuccessful in replicating to its 
followers.

The "unable to locate core" message is originally coming from the target 
cluster.
Here are the logs being generated from the source for reference:
2019-02-02 20:10:19.551 INFO  
(cdcr-bootstrap-status-81-thread-1-processing-n:sourcehost001.com:30100_solr 
x:testcollection_shard3_replica_n10 c:testcollection s:shard3 r:core_node12) 
[c:testcollection s:shard3 r:core_node12 x:testcollection_shard3_replica_n10] 
o.a.s.h.CdcrReplicatorManager CDCR bootstrap successful in 3 seconds
2019-02-02 20:10:19.564 INFO  
(cdcr-bootstrap-status-81-thread-1-processing-n:sourcehost001.com:30100_solr 
x:testcollection_shard3_replica_n10 c:testcollection s:shard3 r:core_node12) 
[c:testcollection s:shard3 r:core_node12 x:testcollection_shard3_replica_n10] 
o.a.s.h.CdcrReplicatorManager Create new update log reader for target 
testcollection with checkpoint 1624389130873995265 @ testcollection:shard3
2019-02-02 20:10:19.568 ERROR 
(cdcr-bootstrap-status-81-thread-1-processing-n:sourcehost001.com:30100_solr 
x:testcollection_shard3_replica_n10 c:testcollection s:shard3 r:core_node12) 
[c:testcollection s:shard3 r:core_node12 x:testcollection_shard3_replica_n10] 
o.a.s.h.CdcrReplicatorManager Unable to bootstrap the target collection 
testcollection shard: shard3
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at http://targethost001.com:30100/solr: Unable to locate core 
testcollection_shard2_replica_n4
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:643)
 ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi 
- 2018-09-18 13:07:58]
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
 ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi 
- 2018-09-18 13:07:58]
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
 ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi 
- 2018-09-18 13:07:58]
at 
org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:483)
 ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi 
- 2018-09-18 13:07:58]
at 
org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:413)
 ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi 
- 2018-09-18 13:07:58]
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1107)
 ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi 
- 2018-09-18 13:07:58]
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:884)
 ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi 
- 2018-09-18 13:07:58]
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:817)
 ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi 
- 2018-09-18 13:07:58]
at 
org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219) 
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi 
- 2018-09-18 13:07:58]
at 
org.apache.solr.handler.CdcrReplicatorManager.sendRequestRecoveryToFollower(CdcrReplicatorManager.java:439)
 ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi 
- 2018-09-18 13:07:55]
at 
org.apache.solr.handler.CdcrReplicatorManager.sendRequestRecoveryToFollowers(CdcrReplicatorManager.java:428)
 ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi 
- 2018-09-18 13:07:55]
at 
org.apache.solr.handler.CdcrReplicatorManager.access$300(CdcrReplicatorManager.java:63)
 ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi 
- 2018-09-18 13:07:55]
at 
org.apache.solr.handler.CdcrReplicatorManager$BootstrapStatusRunnable.run(CdcrReplicatorManager.java:306)
 ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi 
- 2018-09-18 13:07:55]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[?:1.8.0_192]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[?:1.8.0_192]
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
 ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi 
- 2018-09-1

Re: [CDCR]Unable to locate core

2019-02-02 Thread Tim
Thank you for the reply. Sorry I did not include more information in the
first post. 

So maybe there's some confusion here from my end. So both the target and
source clusters are running in cloud mode. So I think you're correct that it
is a different issue. So it looks like the source leader to target leader is
successful but the target leader is then unsuccessful in replicating to its
followers.

The "unable to locate core" message is originally coming from the target
cluster. 
*Here are the logs being generated from the source for reference:*
2019-02-02 20:10:19.551 INFO 
(cdcr-bootstrap-status-81-thread-1-processing-n:sourcehost001.com:30100_solr
x:testcollection_shard3_replica_n10 c:testcollection s:shard3 r:core_node12)
[c:testcollection s:shard3 r:core_node12
x:testcollection_shard3_replica_n10] o.a.s.h.CdcrReplicatorManager CDCR
bootstrap successful in 3 seconds
2019-02-02 20:10:19.564 INFO 
(cdcr-bootstrap-status-81-thread-1-processing-n:sourcehost001.com:30100_solr
x:testcollection_shard3_replica_n10 c:testcollection s:shard3 r:core_node12)
[c:testcollection s:shard3 r:core_node12
x:testcollection_shard3_replica_n10] o.a.s.h.CdcrReplicatorManager Create
new update log reader for target testcollection with checkpoint
1624389130873995265 @ testcollection:shard3
2019-02-02 20:10:19.568 ERROR
(cdcr-bootstrap-status-81-thread-1-processing-n:sourcehost001.com:30100_solr
x:testcollection_shard3_replica_n10 c:testcollection s:shard3 r:core_node12)
[c:testcollection s:shard3 r:core_node12
x:testcollection_shard3_replica_n10] o.a.s.h.CdcrReplicatorManager Unable to
bootstrap the target collection testcollection shard: shard3
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://targethost001.com:30100/solr: Unable to locate core
testcollection_shard2_replica_n4
at
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:643)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:483)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:413)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1107)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:884)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:817)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.handler.CdcrReplicatorManager.sendRequestRecoveryToFollower(CdcrReplicatorManager.java:439)
~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:55]
at
org.apache.solr.handler.CdcrReplicatorManager.sendRequestRecoveryToFollowers(CdcrReplicatorManager.java:428)
~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:55]
at
org.apache.solr.handler.CdcrReplicatorManager.access$300(CdcrReplicatorManager.java:63)
~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:55]
at
org.apache.solr.handler.CdcrReplicatorManager$BootstrapStatusRunnable.run(CdcrReplicatorManager.java:306)
~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:55]
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
~[?:1.8.0_192]
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
~[?:1.8.0_192]
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[?

Re: [CDCR]Unable to locate core

2019-02-02 Thread Erick Erickson
CDCR does _not_ replicate to followers; it is a leader<->leader replication
of the raw document.

Once the document has been forwarded to the target's leader, the
leader on the target system should forward it to followers on that
system just like any other update.

The Solr JIRA is unlikely to be the problem from what you describe.

1> are you sure you are _committing_ on the target system?
2> "unable to locate core" comes from where? The source? Target?
   CDCR?
3> is your target collection properly set up? Because it sounds
   a bit like your target cluster isn't running in SolrCloud mode.

Best,
Erick

On Fri, Feb 1, 2019 at 12:48 PM Tim  wrote:
>
> After some more investigation it seems that we're running into the  same bug
> found here <https://issues.apache.org/jira/browse/SOLR-11724>  .
>
> However if my understanding is correct that bug in 7.3 was patched out.
> Unfortunately we're running into the same behavior in 7.5
>
> CDCR is replicating successfully to the leader node but is not replicating
> to the followers.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
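
A quick way to test point 1> above is to issue an explicit commit against the
target collection and compare document counts before and after. The sketch
below is a minimal SolrJ example, not taken from the thread; the base URL and
collection name reuse the placeholders from the logs above and must be
replaced with real values.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class TargetCommitCheck {
  public static void main(String[] args) throws Exception {
    // Placeholder target node URL and collection name from this thread.
    try (HttpSolrClient target =
             new HttpSolrClient.Builder("http://targethost001.com:30100/solr").build()) {
      String collection = "testcollection";

      long before = target.query(collection, new SolrQuery("*:*")).getResults().getNumFound();
      target.commit(collection);  // explicit hard commit, opens a new searcher
      long after = target.query(collection, new SolrQuery("*:*")).getResults().getNumFound();

      // If the count only moves after the manual commit, CDCR is delivering
      // documents and the target simply needs autoCommit (with openSearcher)
      // or autoSoftCommit configured.
      System.out.printf("numFound before commit: %d, after: %d%n", before, after);
    }
  }
}

If the count does not move even after a manual commit, the documents are not
reaching the target at all, and the "Unable to locate core" error on the
target is the place to dig.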


Re: [CDCR]Unable to locate core

2019-02-01 Thread Tim
After some more investigation, it seems that we're running into the same bug
found here <https://issues.apache.org/jira/browse/SOLR-11724>.

However, if my understanding is correct, that bug was patched in 7.3.
Unfortunately, we're running into the same behavior in 7.5.

CDCR is replicating successfully to the leader node but is not replicating
to the followers.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


[CDCR]Unable to locate core

2019-01-30 Thread Tim
I'm trying to set up CDCR, but I'm running into an issue where one or two of
the six shards/replicas will not be replicated while the rest will.

The only error that appears in the logs is: "Unable to locate core".

Occasionally restarting the instance will fix this, but the issue repeats
itself the next time there is an update to the source collection, and not
necessarily on the same core.

Has anyone run into an error such as this before? 




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: CDCR "all" collections

2019-01-24 Thread Erick Erickson
Bram:

Hmmm You can't do that OOB right now, but it might not be a hard thing to add.

The current configuration allows the source collection to have a
different name than the
target collection so if you could make the assumption that the two
collections always had
the same name, it might be trivial.

WARNING! this is something that just occurred to me. I have NOT
thought it through,
but if it works it'd be very cool ;)

How brave do you feel? This _might_ be totally trivial. I'm looking at
the current trunk, and
in CdcrReplicatorManager, line 97 looks like this:

String targetCollection = params.get(CdcrParams.TARGET_COLLECTION_PARAM);

It _might_ (and again, I have NOT explored this in detail) be as
simple as adding
after that line:

if (targetCollection == null) {
  targetCollection = params.get(CdcrParams.SOURCE_COLLECTION_PARAM);
}

or similar. Then leave

<str name="target">collection1</str>

out of the solrconfig file.

While the code change is trivial, the work is in verifying that it
works, and I'm afraid
I don't personally have the time to do that verification, but I'd be
glad to commit it
if someone else does the verification and submits a patch, including at least one unit test.

The tricky parts would be ensuring nothing bad happens if, for
instance, the target
collection never got created, making sure the tlogs didn't grow, that
kind of thing.

Best,
Erick

On Thu, Jan 24, 2019 at 3:51 AM Bram Van Dam  wrote:
>
> Hey folks,
>
> Is there any way to set up CDCR for *all* collections, including any
> newly created ones? Having to modify the solrconfig in ZK every time a
> collection is added is a bit of a pain, especially because I'm assuming
> it requires a restart to activate the config?
>
> Basically if I have DC Src and DC Tgt, I want every collection from Src
> to be replicated to Tgt. Even when I create a new collection on Src.
>
> Thanks,
>
>  - Bram


CDCR "all" collections

2019-01-24 Thread Bram Van Dam
Hey folks,

Is there any way to set up CDCR for *all* collections, including any
newly created ones? Having to modify the solrconfig in ZK every time a
collection is added is a bit of a pain, especially because I'm assuming
it requires a restart to activate the config?

Basically if I have DC Src and DC Tgt, I want every collection from Src
to be replicated to Tgt. Even when I create a new collection on Src.

Thanks,

 - Bram


CDCR Replication sensitive to network problems

2018-12-07 Thread Webster Homer
We are using Solr 7.2. We have two SolrClouds that are hosted on Google Cloud.
These are targets for an on-prem SolrCloud where we run our ETL loads and
have CDCR replicate the data to the Google clusters. This mostly works pretty well.
However, networks can fail. When the network has a brief outage we frequently
see corrupted tlog files afterwards: often zero-length tlog files or files
that appear to be truncated. When this happens we see lots of CDCR errors. If
there is a corrupt tlog, we delete it and things go back to normal.
The frequency of the errors is troubling. CDCR needs to be more robust against
networking issues. I don't know how tlogs get corrupted in this scenario, but
they obviously do.

Today we started seeing lots of CdcrReplicator errors but could not find a 
corrupt tlog. This is a trace from the logs
java.io.EOFException
at 
org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:168)
at 
org.apache.solr.common.util.JavaBinCodec.readStr(JavaBinCodec.java:863)
at 
org.apache.solr.common.util.JavaBinCodec.readStr(JavaBinCodec.java:857)
at 
org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:266)
at 
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at 
org.apache.solr.common.util.JavaBinCodec.readSolrInputDocument(JavaBinCodec.java:603)
at 
org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:315)
at 
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at 
org.apache.solr.common.util.JavaBinCodec.readArray(JavaBinCodec.java:747)
at 
org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:272)
at 
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at 
org.apache.solr.update.TransactionLog$LogReader.next(TransactionLog.java:690)
at 
org.apache.solr.update.CdcrTransactionLog$CdcrLogReader.next(CdcrTransactionLog.java:304)
at 
org.apache.solr.update.CdcrUpdateLog$CdcrLogReader.next(CdcrUpdateLog.java:633)
at 
org.apache.solr.handler.CdcrReplicator.run(CdcrReplicator.java:77)
at 
org.apache.solr.handler.CdcrReplicatorScheduler.lambda$null$0(CdcrReplicatorScheduler.java:81)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:188)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Our admins restarted the source solr servers and that seems to have helped.
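
Since the symptom described above is usually a zero-length or truncated tlog,
a small scan of the data directory can locate the offending files before they
are deleted. This is only a sketch, under the assumption that each core keeps
its transaction logs in a "tlog" directory below its data dir; the
/var/solr/data path is a placeholder.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class FindEmptyTlogs {
  public static void main(String[] args) throws IOException {
    // Placeholder Solr data root; pass the real one as the first argument.
    Path solrData = Paths.get(args.length > 0 ? args[0] : "/var/solr/data");
    try (Stream<Path> files = Files.walk(solrData)) {
      files.filter(p -> p.getParent() != null
                && "tlog".equals(p.getParent().getFileName().toString())
                && p.getFileName().toString().startsWith("tlog."))
           .forEach(p -> {
             try {
               if (Files.size(p) == 0) {
                 // Zero-length tlogs are the usual culprit behind the
                 // CdcrReplicator errors; inspect them (and typically delete
                 // them with the node stopped).
                 System.out.println("EMPTY  " + p);
               }
             } catch (IOException e) {
               System.err.println("Could not stat " + p + ": " + e.getMessage());
             }
           });
    }
  }
}

Truncated but non-empty tlogs (like the one behind the EOFException above)
won't be caught by a size check; those still have to be traced from the
replicator error itself.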


Re: Negative CDCR Queue Size?

2018-11-09 Thread Amrit Sarkar
Hi Webster,

The queue size "*-1*" suggests the target is not initialized, and you
should see a "WARN" in the logs indicating something bad happened at the
respective target. I am also posting the source code for reference.

Any chance you can look for WARN entries in the logs, or check at the
respective source and target that CDCR is configured and was running OK,
without any manual intervention?

Also, you mentioned there are a number of intermittent issues with CDCR; I
see you have reported a few JIRAs. I would be grateful if you could report the
rest.

Code:

> for (CdcrReplicatorState state : replicatorManager.getReplicatorStates()) {
>   NamedList queueStats = new NamedList();
>   CdcrUpdateLog.CdcrLogReader logReader = state.getLogReader();
>   if (logReader == null) {
> String collectionName = 
> req.getCore().getCoreDescriptor().getCloudDescriptor().getCollectionName();
> String shard = 
> req.getCore().getCoreDescriptor().getCloudDescriptor().getShardId();
> log.warn("The log reader for target collection {} is not initialised @ 
> {}:{}",
> state.getTargetCollection(), collectionName, shard);
> queueStats.add(CdcrParams.QUEUE_SIZE, -1l);
>   } else {
> queueStats.add(CdcrParams.QUEUE_SIZE, 
> logReader.getNumberOfRemainingRecords());
>   }
>   queueStats.add(CdcrParams.LAST_TIMESTAMP, 
> state.getTimestampOfLastProcessedOperation());
>   if (hosts.get(state.getZkHost()) == null) {
> hosts.add(state.getZkHost(), new NamedList());
>   }
>   ((NamedList) hosts.get(state.getZkHost())).add(state.getTargetCollection(), 
> queueStats);
> }
> rsp.add(CdcrParams.QUEUES, hosts);
>
>
Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2


On Wed, Nov 7, 2018 at 12:47 AM Webster Homer <
webster.ho...@milliporesigma.com> wrote:

> I'm sorry I should have included that. We are running Solr 7.2. We use
> CDCR for almost all of our collections. We have experienced several
> intermittent problems with CDCR, this one seems to be new, at least I
> hadn't seen it before
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Tuesday, November 06, 2018 12:36 PM
> To: solr-user 
> Subject: Re: Negative CDCR Queue Size?
>
> What version of Solr? CDCR has changed quite a bit in the 7x  code line so
> it's important to know the version.
>
> On Tue, Nov 6, 2018 at 10:32 AM Webster Homer <
> webster.ho...@milliporesigma.com> wrote:
> >
> > Several times I have noticed that the CDCR action=QUEUES will return a
> negative queueSize. When this happens we seem to be missing data in the
> target collection. How can this happen? What does a negative Queue size
> mean? The timestamp is an empty string.
> >
> > We have two targets for a source. One looks like this, with a negative
> > queue size
> > queues":
> > ["uc1f-ecom-mzk01.sial.com:2181,uc1f-ecom-mzk02.sial.com:2181,uc1f-eco
> > m-mzk03.sial.com:2181/solr",["ucb-catalog-material-180317",["queueSize
> > ",-1,"lastTimestamp",""]],
> >
> > The other is healthy
> > "ae1b-ecom-mzk01.sial.com:2181,ae1b-ecom-mzk02.sial.com:2181,ae1b-ecom
> > -mzk03.sial.com:2181/solr",["ucb-catalog-material-180317",["queueSize"
> > ,246980,"lastTimestamp","2018-11-06T16:21:53.265Z"]]
> >
> > We are not seeing CDCR errors.
> >
> > What could cause this behavior?
>
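
For monitoring, the handler code above translates into a simple client-side
check: call the QUEUES action on the source and flag any target whose
queueSize is -1. A SolrJ sketch, assuming a source node at localhost:8983 and
reusing the collection name from this thread purely as a placeholder:

import java.util.Map;

import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.GenericSolrRequest;
import org.apache.solr.client.solrj.response.SimpleSolrResponse;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.util.NamedList;

public class CdcrQueueCheck {
  @SuppressWarnings("unchecked")
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient source =
             new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
      ModifiableSolrParams params = new ModifiableSolrParams();
      params.set("action", "QUEUES");
      GenericSolrRequest req =
          new GenericSolrRequest(SolrRequest.METHOD.GET, "/cdcr", params);
      SimpleSolrResponse rsp = req.process(source, "ucb-catalog-material-180317");

      // Response layout (see the handler code above):
      // queues -> zkHost -> target collection -> { queueSize, lastTimestamp }
      NamedList<Object> queues = (NamedList<Object>) rsp.getResponse().get("queues");
      for (Map.Entry<String, Object> host : queues) {
        for (Map.Entry<String, Object> target : (NamedList<Object>) host.getValue()) {
          NamedList<Object> stats = (NamedList<Object>) target.getValue();
          long queueSize = ((Number) stats.get("queueSize")).longValue();
          if (queueSize < 0) {
            System.out.println("Log reader not initialised for " + target.getKey()
                + " @ " + host.getKey());
          }
        }
      }
    }
  }
}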


RE: Negative CDCR Queue Size?

2018-11-06 Thread Webster Homer
I'm sorry, I should have included that. We are running Solr 7.2. We use CDCR for 
almost all of our collections. We have experienced several intermittent 
problems with CDCR; this one seems to be new, at least I hadn't seen it before.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, November 06, 2018 12:36 PM
To: solr-user 
Subject: Re: Negative CDCR Queue Size?

What version of Solr? CDCR has changed quite a bit in the 7x  code line so it's 
important to know the version.

On Tue, Nov 6, 2018 at 10:32 AM Webster Homer 
 wrote:
>
> Several times I have noticed that the CDCR action=QUEUES will return a 
> negative queueSize. When this happens we seem to be missing data in the 
> target collection. How can this happen? What does a negative Queue size mean? 
> The timestamp is an empty string.
>
> We have two targets for a source. One looks like this, with a negative 
> queue size
> queues": 
> ["uc1f-ecom-mzk01.sial.com:2181,uc1f-ecom-mzk02.sial.com:2181,uc1f-eco
> m-mzk03.sial.com:2181/solr",["ucb-catalog-material-180317",["queueSize
> ",-1,"lastTimestamp",""]],
>
> The other is healthy
> "ae1b-ecom-mzk01.sial.com:2181,ae1b-ecom-mzk02.sial.com:2181,ae1b-ecom
> -mzk03.sial.com:2181/solr",["ucb-catalog-material-180317",["queueSize"
> ,246980,"lastTimestamp","2018-11-06T16:21:53.265Z"]]
>
> We are not seeing CDCR errors.
>
> What could cause this behavior?


Re: Negative CDCR Queue Size?

2018-11-06 Thread Erick Erickson
What version of Solr? CDCR has changed quite a bit in the 7x  code
line so it's important to know the version.

On Tue, Nov 6, 2018 at 10:32 AM Webster Homer
 wrote:
>
> Several times I have noticed that the CDCR action=QUEUES will return a 
> negative queueSize. When this happens we seem to be missing data in the 
> target collection. How can this happen? What does a negative Queue size mean? 
> The timestamp is an empty string.
>
> We have two targets for a source. One looks like this, with a negative queue 
> size
> queues": 
> ["uc1f-ecom-mzk01.sial.com:2181,uc1f-ecom-mzk02.sial.com:2181,uc1f-ecom-mzk03.sial.com:2181/solr",["ucb-catalog-material-180317",["queueSize",-1,"lastTimestamp",""]],
>
> The other is healthy
> "ae1b-ecom-mzk01.sial.com:2181,ae1b-ecom-mzk02.sial.com:2181,ae1b-ecom-mzk03.sial.com:2181/solr",["ucb-catalog-material-180317",["queueSize",246980,"lastTimestamp","2018-11-06T16:21:53.265Z"]]
>
> We are not seeing CDCR errors.
>
> What could cause this behavior?


Negative CDCR Queue Size?

2018-11-06 Thread Webster Homer
Several times I have noticed that the CDCR action=QUEUES will return a negative 
queueSize. When this happens we seem to be missing data in the target 
collection. How can this happen? What does a negative Queue size mean? The 
timestamp is an empty string.

We have two targets for a source. One looks like this, with a negative queue 
size
queues": 
["uc1f-ecom-mzk01.sial.com:2181,uc1f-ecom-mzk02.sial.com:2181,uc1f-ecom-mzk03.sial.com:2181/solr",["ucb-catalog-material-180317",["queueSize",-1,"lastTimestamp",""]],

The other is healthy
"ae1b-ecom-mzk01.sial.com:2181,ae1b-ecom-mzk02.sial.com:2181,ae1b-ecom-mzk03.sial.com:2181/solr",["ucb-catalog-material-180317",["queueSize",246980,"lastTimestamp","2018-11-06T16:21:53.265Z"]]

We are not seeing CDCR errors.

What could cause this behavior?


Re: SolrCloud CDCR with 3+ DCs

2018-09-07 Thread Amrit Sarkar
Yeah, I am not sure how the authentication band-aid from the Stack Overflow
link mentioned earlier will work. It is about time we include basic
authentication support in CDCR.

On Thu, 6 Sep 2018, 8:41 pm cdatta,  wrote:

> Hi Amrit, Thanks for your response.
>
> We wiped out our complete installation and started a fresh one. Now the
> multi-direction replication is working but we are seeing errors related to
> the authentication sporadically.
>
> Thanks & Regards,
> Chandi Datta
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Solr CDCR replication not working

2018-09-07 Thread Amrit Sarkar
Basic Authentication in clusters is not supported as of today in CDCR.

On Fri, 7 Sep 2018, 4:53 pm Mrityunjaya Pathak, 
wrote:

> I have setup two solr cloud instances in two different Datacenters Target
> solr cloud machine is copy of source machine with basicAuth enabled on
> them. I am unable to see any replication on target.
>
> Solr Version :6.6.3
>
> I have done config changes as suggested on
> https://lucene.apache.org/solr/guide/6_6/cross-data-center-replication-cdcr.html
>
> Source Config Changes
>
> <config>
>   ...
>   <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
>     <lst name="replica">
>       <str name="zkHost">serverIP:2181,serverIP:2182,serverIP:2183</str>
>       <str name="source">sitecore_master_index</str>
>       <str name="target">sitecore_master_index</str>
>     </lst>
>
>     <lst name="replicator">
>       <str name="threadPoolSize">8</str>
>       <str name="schedule">1000</str>
>       <str name="batchSize">128</str>
>     </lst>
>
>     <lst name="updateLogSynchronizer">
>       <str name="schedule">1000</str>
>     </lst>
>   </requestHandler>
>
>   <updateHandler class="solr.DirectUpdateHandler2">
>     <updateLog class="solr.CdcrUpdateLog">
>       <str name="dir">${solr.ulog.dir:}</str>
>       <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
>     </updateLog>
>
>     <autoCommit>
>       <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>       <openSearcher>false</openSearcher>
>     </autoCommit>
>
>     <autoSoftCommit>
>       <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
>     </autoSoftCommit>
>   </updateHandler>
>
>   ...
> </config>
>
> Target Config Changes
>
> <config>
>   ...
>   <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
>     <lst name="buffer">
>       <str name="defaultState">disabled</str>
>     </lst>
>   </requestHandler>
>
>   <updateRequestProcessorChain name="cdcr-proc-chain">
>     <processor class="solr.CdcrUpdateProcessorFactory"/>
>     <processor class="solr.RunUpdateProcessorFactory"/>
>   </updateRequestProcessorChain>
>
>   <requestHandler name="/update" class="solr.UpdateRequestHandler">
>     <lst name="defaults">
>       <str name="update.chain">cdcr-proc-chain</str>
>     </lst>
>   </requestHandler>
>
>   <updateHandler class="solr.DirectUpdateHandler2">
>     <updateLog class="solr.CdcrUpdateLog">
>       <str name="dir">${solr.ulog.dir:}</str>
>       <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
>     </updateLog>
>
>     <autoCommit>
>       <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>       <openSearcher>false</openSearcher>
>     </autoCommit>
>
>     <autoSoftCommit>
>       <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
>     </autoSoftCommit>
>   </updateHandler>
>
>   ...
> </config>
>
> Below are logs from Source target.
>
> ERROR (zkCallback-4-thread-2-processing-n:sourceIP:8983_solr) [   ]
> o.a.s.c.s.i.CloudSolrClient Request to collection collection1 failed due to
> (510) org.apache.solr.common.SolrException: Could not find a healthy node
> to handle the request., retry? 5
> 2018-09-07 10:36:14.295 WARN
> (zkCallback-4-thread-2-processing-n:sourceIP:8983_solr) [   ]
> o.a.s.h.CdcrReplicatorManager Unable to instantiate the log reader for
> target collection collection1
> org.apache.solr.common.SolrException: Could not find a healthy node to
> handle the request.
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1377)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1134)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1237)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1237)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1237)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1237)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1237)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:1073)
> at
> org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
> at
> org.apache.solr.handler.CdcrReplicatorManager.getCheckpoint(CdcrReplicatorManager.java:196)
> at
> org.apache.solr.handler.CdcrReplicatorManager.initLogReaders(CdcrReplicatorManager.java:159)
> at
> org.apache.solr.handler.CdcrReplicatorManager.stateUpdate(CdcrReplicatorManager.java:134)
> at
> org.apache.solr.handler.CdcrStateManager.callback(CdcrStateManager.java:36)
> at
> org.apache.solr.handler.CdcrLeaderStateManager.setAmILeader(CdcrLeaderStateManager.java:108)
> at
> org.apache.solr.handler.CdcrLeaderStateManager.checkIfIAmLeader(CdcrLeaderStateManager.java:95)
> at
> org.apache.solr.handler.CdcrLeaderStateManager.access$400(CdcrLeaderStateManager.java:40)
> at
> org.apache.solr.handler.CdcrLeaderStateManager$LeaderStateWatcher.process(CdcrLeaderStateManager.java:150)
> at
> org.apache.solr.common.cloud.SolrZkClient$3.lambda$process$0(SolrZkClient.java:269)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-09-07 10:36:14.310 INFO
> (coreLoadExecutor-8-thre
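
As noted above, CDCR's internal requests carry no credentials, so a
basicAuth-protected target will reject them. From a client you can at least
confirm the target cluster itself is healthy by attaching credentials
explicitly. A SolrJ sketch with placeholder URL, collection and credentials;
it does not make CDCR work with authentication, it only isolates the symptom.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;

public class TargetAuthProbe {
  public static void main(String[] args) throws Exception {
    // Placeholder target URL, collection and credentials; substitute your own.
    try (HttpSolrClient target =
             new HttpSolrClient.Builder("http://targetIP:8983/solr").build()) {
      QueryRequest req = new QueryRequest(new SolrQuery("*:*"));
      req.setBasicAuthCredentials("solr", "SolrRocks");
      long numFound = req.process(target, "sitecore_master_index")
                         .getResults().getNumFound();
      System.out.println("Authenticated query OK, numFound=" + numFound);
      // The same query without setBasicAuthCredentials should be rejected if
      // the target requires authentication, which is what CDCR's forwarded
      // requests run into.
    }
  }
}
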

Solr CDCR replication not working

2018-09-07 Thread Mrityunjaya Pathak
I have set up two Solr cloud instances in two different datacenters. The target 
Solr cloud machine is a copy of the source machine, with basicAuth enabled on 
both. I am unable to see any replication on the target.

Solr version: 6.6.3

I have made the config changes suggested at 
https://lucene.apache.org/solr/guide/6_6/cross-data-center-replication-cdcr.html

Source Config Changes



<config>
  ...
  <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
    <lst name="replica">
      <str name="zkHost">serverIP:2181,serverIP:2182,serverIP:2183</str>
      <str name="source">sitecore_master_index</str>
      <str name="target">sitecore_master_index</str>
    </lst>

    <lst name="replicator">
      <str name="threadPoolSize">8</str>
      <str name="schedule">1000</str>
      <str name="batchSize">128</str>
    </lst>

    <lst name="updateLogSynchronizer">
      <str name="schedule">1000</str>
    </lst>
  </requestHandler>

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog class="solr.CdcrUpdateLog">
      <str name="dir">${solr.ulog.dir:}</str>
      <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
    </updateLog>

    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>

    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
    </autoSoftCommit>
  </updateHandler>

  ...
</config>

Target Config Changes



<config>
  ...
  <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
    <lst name="buffer">
      <str name="defaultState">disabled</str>
    </lst>
  </requestHandler>

  <updateRequestProcessorChain name="cdcr-proc-chain">
    <processor class="solr.CdcrUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

  <requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">cdcr-proc-chain</str>
    </lst>
  </requestHandler>

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog class="solr.CdcrUpdateLog">
      <str name="dir">${solr.ulog.dir:}</str>
      <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
    </updateLog>

    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>

    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
    </autoSoftCommit>
  </updateHandler>

  ...
</config>

Below are the logs from the source cluster.

ERROR (zkCallback-4-thread-2-processing-n:sourceIP:8983_solr) [   ] 
o.a.s.c.s.i.CloudSolrClient Request to collection collection1 failed due to 
(510) org.apache.solr.common.SolrException: Could not find a healthy node to 
handle the request., retry? 5
2018-09-07 10:36:14.295 WARN  
(zkCallback-4-thread-2-processing-n:sourceIP:8983_solr) [   ] 
o.a.s.h.CdcrReplicatorManager Unable to instantiate the log reader for target 
collection collection1
org.apache.solr.common.SolrException: Could not find a healthy node to handle 
the request.
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1377)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1134)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1237)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1237)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1237)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1237)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1237)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:1073)
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
at 
org.apache.solr.handler.CdcrReplicatorManager.getCheckpoint(CdcrReplicatorManager.java:196)
at 
org.apache.solr.handler.CdcrReplicatorManager.initLogReaders(CdcrReplicatorManager.java:159)
at 
org.apache.solr.handler.CdcrReplicatorManager.stateUpdate(CdcrReplicatorManager.java:134)
at 
org.apache.solr.handler.CdcrStateManager.callback(CdcrStateManager.java:36)
at 
org.apache.solr.handler.CdcrLeaderStateManager.setAmILeader(CdcrLeaderStateManager.java:108)
at 
org.apache.solr.handler.CdcrLeaderStateManager.checkIfIAmLeader(CdcrLeaderStateManager.java:95)
at 
org.apache.solr.handler.CdcrLeaderStateManager.access$400(CdcrLeaderStateManager.java:40)
at 
org.apache.solr.handler.CdcrLeaderStateManager$LeaderStateWatcher.process(CdcrLeaderStateManager.java:150)
at 
org.apache.solr.common.cloud.SolrZkClient$3.lambda$process$0(SolrZkClient.java:269)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2018-09-07 10:36:14.310 INFO  
(coreLoadExecutor-8-thread-3-processing-n:sourceIP:8983_solr) [   ] 
o.a.s.c.SolrConfig Using Lucene MatchVersion: 6.6.3
2018-09-07 10:36:14.315 INFO  
(zkCallback-4-thread-1-processing-n:sourceIP:8983_solr) [   ] 
o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent 
state:SyncConnected type:NodeDataChanged 
path:/collections/collection1/state.json] for collection [sitecore] has 
occurred - updating... (live nodes size: [1])
2018-09-07 10:36:14.343 WARN  
(cdcr-replicator-211-thread-1-processing-n:sourceIP:8983_solr) [   ] 
o.a.s.h.CdcrReplicator Log reader for target collection1 is not initialised, it 
will be ignored.

I am unable to see anything on the target. It would be great if someone could 
help me with this.
Regards
Mrityunjaya

