Re: [CDCR]Unable to locate core

2019-05-19 Thread Natarajan, Rajeswari
Thanks, Amrit. I created a bug report:
https://issues.apache.org/jira/browse/SOLR-13481

Regards,
Rajeswari


Re: [CDCR]Unable to locate core

2019-05-19 Thread Amrit Sarkar
That sounds right to me.

Could you create a Jira issue and describe the problem statement and the
proposed design there? I am confident it will attract committers' attention,
and they can review the design and provide feedback.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2




Re: [CDCR]Unable to locate core

2019-05-19 Thread Natarajan, Rajeswari
Thanks, Amrit, for creating a patch. However, the code in LBHttpSolrClient.java
needs to be fixed too if the for loop is to work as intended.

Regards,
Rajeswari

  public Rsp request(Req req) throws SolrServerException, IOException {
    Rsp rsp = new Rsp();
    Exception ex = null;
    boolean isNonRetryable = req.request instanceof IsUpdateRequest
        || ADMIN_PATHS.contains(req.request.getPath());
    List<ServerWrapper> skipped = null;

    final Integer numServersToTry = req.getNumServersToTry();
    int numServersTried = 0;

    boolean timeAllowedExceeded = false;
    long timeAllowedNano = getTimeAllowedInNanos(req.getRequest());
    long timeOutTime = System.nanoTime() + timeAllowedNano;
    for (String serverStr : req.getServers()) {
      if (timeAllowedExceeded = isTimeExceeded(timeAllowedNano, timeOutTime)) {
        break;
      }

      serverStr = normalize(serverStr);
      // if the server is currently a zombie, just skip to the next one
      ServerWrapper wrapper = zombieServers.get(serverStr);
      if (wrapper != null) {
        // System.out.println("ZOMBIE SERVER QUERIED: " + serverStr);
        final int numDeadServersToTry = req.getNumDeadServersToTry();
        if (numDeadServersToTry > 0) {
          if (skipped == null) {
            skipped = new ArrayList<>(numDeadServersToTry);
            skipped.add(wrapper);
          }
          else if (skipped.size() < numDeadServersToTry) {
            skipped.add(wrapper);
          }
        }
        continue;
      }
      try {
        MDC.put("LBHttpSolrClient.url", serverStr);

        if (numServersToTry != null && numServersTried > numServersToTry.intValue()) {
          break;
        }

        HttpSolrClient client = makeSolrClient(serverStr);

        ++numServersTried;
        ex = doRequest(client, req, rsp, isNonRetryable, false, null);
        if (ex == null) {
          return rsp; // SUCCESS
        }
      } finally {
        MDC.remove("LBHttpSolrClient.url");
      }
    }

    // try the servers we previously skipped
    if (skipped != null) {
      for (ServerWrapper wrapper : skipped) {
        if (timeAllowedExceeded = isTimeExceeded(timeAllowedNano, timeOutTime)) {
          break;
        }

        if (numServersToTry != null && numServersTried > numServersToTry.intValue()) {
          break;
        }

        try {
          MDC.put("LBHttpSolrClient.url", wrapper.client.getBaseURL());
          ++numServersTried;
          ex = doRequest(wrapper.client, req, rsp, isNonRetryable, true, wrapper.getKey());
          if (ex == null) {
            return rsp; // SUCCESS
          }
        } finally {
          MDC.remove("LBHttpSolrClient.url");
        }
      }
    }

    final String solrServerExceptionMessage;
    if (timeAllowedExceeded) {
      solrServerExceptionMessage = "Time allowed to handle this request exceeded";
    } else {
      if (numServersToTry != null && numServersTried > numServersToTry.intValue()) {
        solrServerExceptionMessage = "No live SolrServers available to handle this request:"
            + " numServersTried=" + numServersTried
            + " numServersToTry=" + numServersToTry.intValue();
      } else {
        solrServerExceptionMessage = "No live SolrServers available to handle this request";
      }
    }
    if (ex == null) {
      throw new SolrServerException(solrServerExceptionMessage);
    } else {
      throw new SolrServerException(solrServerExceptionMessage + ":" + zombieServers.keySet(), ex);
    }
  }





[CDCR]Unable to locate core

2019-05-19 Thread Amrit Sarkar
>
> Thanks, Natarajan.
>
> Solid analysis. I have seen this issue reported by multiple users over the
> past few months; unfortunately, the code I contributed was incomplete.
>
> I think the correct way to solve this issue is to identify the correct
> base URL for the core to which we need to send REQUESTRECOVERY, and to
> create a local HttpSolrClient instead of using the CloudSolrClient from
> CdcrReplicatorState. This avoids the retries, which are redundant in
> our case.
>
> I put together a small patch a few weeks back and will upload it to
> SOLR-11724.
>


Re: [CDCR]Unable to locate core

2019-05-19 Thread Natarajan, Rajeswari
Here is my closer analysis:


The SolrClient request goes to the "request" method in LBHttpSolrClient.java
(shown below). That method has a for loop that tries the different live
servers, but when the doRequest call inside it throws an exception there is no
catch, so the next server is never tried. To solve this, there should be a
catch around doRequest, so that the loop moves on and retries the request
against the next server. If there are multiple live servers, though, the
request might also time out. This needs to be fixed to make CDCR bootstrap work
reliably; otherwise it will sometimes work and sometimes fail. I can work on a
patch if this is agreed.


  public Rsp request(Req req) throws SolrServerException, IOException {
    Rsp rsp = new Rsp();
    Exception ex = null;
    boolean isNonRetryable = req.request instanceof IsUpdateRequest
        || ADMIN_PATHS.contains(req.request.getPath());
    List<ServerWrapper> skipped = null;

    final Integer numServersToTry = req.getNumServersToTry();
    int numServersTried = 0;

    boolean timeAllowedExceeded = false;
    long timeAllowedNano = getTimeAllowedInNanos(req.getRequest());
    long timeOutTime = System.nanoTime() + timeAllowedNano;
    for (String serverStr : req.getServers()) {
      if (timeAllowedExceeded = isTimeExceeded(timeAllowedNano, timeOutTime)) {
        break;
      }

      serverStr = normalize(serverStr);
      // if the server is currently a zombie, just skip to the next one
      ServerWrapper wrapper = zombieServers.get(serverStr);
      if (wrapper != null) {
        // System.out.println("ZOMBIE SERVER QUERIED: " + serverStr);
        final int numDeadServersToTry = req.getNumDeadServersToTry();
        if (numDeadServersToTry > 0) {
          if (skipped == null) {
            skipped = new ArrayList<>(numDeadServersToTry);
            skipped.add(wrapper);
          }
          else if (skipped.size() < numDeadServersToTry) {
            skipped.add(wrapper);
          }
        }
        continue;
      }
      try {
        MDC.put("LBHttpSolrClient.url", serverStr);

        if (numServersToTry != null && numServersTried > numServersToTry.intValue()) {
          break;
        }

        HttpSolrClient client = makeSolrClient(serverStr);

        ++numServersTried;
        ex = doRequest(client, req, rsp, isNonRetryable, false, null);
        if (ex == null) {
          return rsp; // SUCCESS
        }
        // NO CATCH HERE, SO IT FAILS
      } finally {
        MDC.remove("LBHttpSolrClient.url");
      }
    }

    // try the servers we previously skipped
    if (skipped != null) {
      for (ServerWrapper wrapper : skipped) {
        if (timeAllowedExceeded = isTimeExceeded(timeAllowedNano, timeOutTime)) {
          break;
        }

        if (numServersToTry != null && numServersTried > numServersToTry.intValue()) {
          break;
        }

        try {
          MDC.put("LBHttpSolrClient.url", wrapper.client.getBaseURL());
          ++numServersTried;
          ex = doRequest(wrapper.client, req, rsp, isNonRetryable, true, wrapper.getKey());
          if (ex == null) {
            return rsp; // SUCCESS
          }
        } finally {
          MDC.remove("LBHttpSolrClient.url");
        }
      }
    }

    final String solrServerExceptionMessage;
    if (timeAllowedExceeded) {
      solrServerExceptionMessage = "Time allowed to handle this request exceeded";
    } else {
      if (numServersToTry != null && numServersTried > numServersToTry.intValue()) {
        solrServerExceptionMessage = "No live SolrServers available to handle this request:"
            + " numServersTried=" + numServersTried
            + " numServersToTry=" + numServersToTry.intValue();
      } else {
        solrServerExceptionMessage = "No live SolrServers available to handle this request";
      }
    }
    if (ex == null) {
      throw new SolrServerException(solrServerExceptionMessage);
    } else {
      throw new SolrServerException(solrServerExceptionMessage + ":" + zombieServers.keySet(), ex);
    }
  }
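
The catch-and-continue idea described above can be sketched in isolation. This
is only a simplified model, not the Solr code and not a proposed patch: the
class FailoverSketch, the method requestWithFailover, and the node URLs are
hypothetical names chosen for illustration, and the sketch deliberately ignores
zombie servers, time limits, and the question of whether a non-retryable admin
request should be retried at all.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class FailoverSketch {

  /**
   * Tries each server in order. A thrown exception is remembered and the loop
   * moves on to the next server instead of aborting the whole request.
   */
  public static <R> R requestWithFailover(List<String> servers, Function<String, R> send) {
    RuntimeException last = null;
    for (String server : servers) {
      try {
        return send.apply(server); // SUCCESS
      } catch (RuntimeException e) {
        last = e; // remember the failure and try the next server
      }
    }
    throw new IllegalStateException("No live servers available to handle this request", last);
  }

  public static void main(String[] args) {
    // Hypothetical node URLs: node1 does not host the core, node2 does.
    List<String> servers = Arrays.asList("http://node1:8983/solr", "http://node2:8983/solr");
    String result = requestWithFailover(servers, url -> {
      if (url.contains("node1")) {
        throw new RuntimeException("Unable to locate core");
      }
      return "recovered via " + url;
    });
    System.out.println(result); // prints "recovered via http://node2:8983/solr"
  }
}
```

With a catch in place, the failure on the first node no longer ends the
request; the price is that a request can be sent to several nodes before one
accepts it, which is where the timeout concern mentioned above comes in.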


Thanks,
Rajeswari


Re: [CDCR]Unable to locate core

2019-05-19 Thread Natarajan, Rajeswari
Hi

We are using Solr 7.6, trying out bidirectional CDCR, and I hit this issue as
well.

Stack trace:

INFO  (cdcr-bootstrap-status-17-thread-1) [   ] o.a.s.h.CdcrReplicatorManager CDCR bootstrap successful in 3 seconds
INFO  (cdcr-bootstrap-status-17-thread-1) [   ] o.a.s.h.CdcrReplicatorManager Create new update log reader for target abcd_ta with checkpoint -1 @ abcd_ta:shard1
ERROR (cdcr-bootstrap-status-17-thread-1) [   ] o.a.s.h.CdcrReplicatorManager Unable to bootstrap the target collection abcd_ta shard: shard1
olrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://10.169.50.182:8983/solr: Unable to locate core kanna_ta_shard1_replica_n1
lr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:643) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
lr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
lr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
lr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:483) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
lr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:413) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
lr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1107) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
lr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:884) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]


I stepped through the code:

  private NamedList sendRequestRecoveryToFollower(SolrClient client, String coreName)
      throws SolrServerException, IOException {
    CoreAdminRequest.RequestRecovery recoverRequestCmd = new CoreAdminRequest.RequestRecovery();
    recoverRequestCmd.setAction(CoreAdminParams.CoreAdminAction.REQUESTRECOVERY);
    recoverRequestCmd.setCoreName(coreName);
    return client.request(recoverRequestCmd);
  }

In the method above, the recovery request is an admin command and is specific
to a core. The SolrClient request logic gets the live servers and executes the
command in a loop, but because this is an admin command it is not retryable.
Depending on which live server the code picks and where the core lives, the
recovery request may succeed or fail. So I think the problem is that this code
tries to send a core-level command to any available live server; instead, it
should find the server that hosts the core and send the request there.
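
The routing idea can be modelled without any Solr classes. In the sketch below,
the class CoreRoutingSketch, its methods, and the node URLs are hypothetical
and stand in for a lookup against the cluster state; the core names s3r10 and
s3r8 are borrowed from the example reported earlier in this thread. It is meant
only to show the shape of "resolve the hosting node first, then send the core
command there", not how CdcrReplicatorManager is or should be implemented.

```java
import java.util.HashMap;
import java.util.Map;

public class CoreRoutingSketch {

  // Hypothetical stand-in for cluster state: core name -> base URL of its node.
  private final Map<String, String> coreToBaseUrl = new HashMap<>();

  public void register(String coreName, String baseUrl) {
    coreToBaseUrl.put(coreName, baseUrl);
  }

  /** Returns the base URL of the node hosting the core, or fails if it is unknown. */
  public String baseUrlFor(String coreName) {
    String url = coreToBaseUrl.get(coreName);
    if (url == null) {
      throw new IllegalArgumentException("Unable to locate core " + coreName);
    }
    return url;
  }

  public static void main(String[] args) {
    CoreRoutingSketch clusterState = new CoreRoutingSketch();
    clusterState.register("s3r10", "http://node1:8983/solr"); // leader
    clusterState.register("s3r8", "http://node2:8983/solr");  // follower

    // The recovery command for the follower goes to the follower's node,
    // not to whichever live node happens to be first in the list.
    System.out.println(clusterState.baseUrlFor("s3r8")); // prints "http://node2:8983/solr"
  }
}
```

A core-specific client built against the resolved base URL would then carry the
request, so no load-balanced retry across unrelated nodes is involved.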

Regards,
Rajeswari



Re: [CDCR]Unable to locate core

2019-05-15 Thread Natarajan, Rajeswari
I am also facing this issue. If a resolution has been found, please post an
update. Thanks.


Re: [CDCR]Unable to locate core

2019-02-07 Thread Tim
So it looks like I'm having an issue with this fix:
https://issues.apache.org/jira/browse/SOLR-11724

I've experimented with this for a while, and every time the leader-to-leader
replication portion works fine. But the recovery portion (implemented as
part of the fix above) fails.

I've run a few tests, and every time the recovery portion kicks off it sends
the recovery command to the node that hosts the leader for a given replica
instead of to the node that hosts the follower.
I've recreated the collection several times so that replicas are on
different nodes, with the same result each time. The code seems to assume
that the follower is on the same Solr node as the leader.

For example, if s3r10 (shard 3, replica 10) is the leader and is on node1,
while the follower s3r8 is on node2, then the core recovery command meant
for s3r8 is sent to node1 instead of node2.





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


RE: [EXTERNAL] Re: [CDCR]Unable to locate core

2019-02-02 Thread Timothy Springsteen



Re: [CDCR]Unable to locate core

2019-02-02 Thread Tim
Thank you for the reply. Sorry I did not include more information in the
first post.

There may be some confusion on my end. Both the target and source clusters
are running in cloud mode, so I think you're correct that it is a different
issue. It looks like replication from the source leader to the target leader
succeeds, but the target leader then fails to replicate to its followers.

The "unable to locate core" message originally comes from the target
cluster.
*Here are the logs generated on the source, for reference:*
2019-02-02 20:10:19.551 INFO  (cdcr-bootstrap-status-81-thread-1-processing-n:sourcehost001.com:30100_solr x:testcollection_shard3_replica_n10 c:testcollection s:shard3 r:core_node12) [c:testcollection s:shard3 r:core_node12 x:testcollection_shard3_replica_n10] o.a.s.h.CdcrReplicatorManager CDCR bootstrap successful in 3 seconds
2019-02-02 20:10:19.564 INFO  (cdcr-bootstrap-status-81-thread-1-processing-n:sourcehost001.com:30100_solr x:testcollection_shard3_replica_n10 c:testcollection s:shard3 r:core_node12) [c:testcollection s:shard3 r:core_node12 x:testcollection_shard3_replica_n10] o.a.s.h.CdcrReplicatorManager Create new update log reader for target testcollection with checkpoint 1624389130873995265 @ testcollection:shard3
2019-02-02 20:10:19.568 ERROR (cdcr-bootstrap-status-81-thread-1-processing-n:sourcehost001.com:30100_solr x:testcollection_shard3_replica_n10 c:testcollection s:shard3 r:core_node12) [c:testcollection s:shard3 r:core_node12 x:testcollection_shard3_replica_n10] o.a.s.h.CdcrReplicatorManager Unable to bootstrap the target collection testcollection shard: shard3
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://targethost001.com:30100/solr: Unable to locate core testcollection_shard2_replica_n4
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:643) ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:07:58]
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255) ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:07:58]
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244) ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:07:58]
    at org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:483) ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:07:58]
    at org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:413) ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:07:58]
    at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1107) ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:07:58]
    at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:884) ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:07:58]
    at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:817) ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:07:58]
    at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219) ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:07:58]
    at org.apache.solr.handler.CdcrReplicatorManager.sendRequestRecoveryToFollower(CdcrReplicatorManager.java:439) ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:07:55]
    at org.apache.solr.handler.CdcrReplicatorManager.sendRequestRecoveryToFollowers(CdcrReplicatorManager.java:428) ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:07:55]
    at org.apache.solr.handler.CdcrReplicatorManager.access$300(CdcrReplicatorManager.java:63) ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:07:55]
    at org.apache.solr.handler.CdcrReplicatorManager$BootstrapStatusRunnable.run(CdcrReplicatorManager.java:306) ~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:07:55]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_192]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_192]
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209) ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:07:58]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_192]
  

Re: [CDCR]Unable to locate core

2019-02-02 Thread Erick Erickson
CDCR does _not_ replicate to followers, it is a leader<->leader replication
of the raw document.

Once the document has been forwarded to the target's leader, then the
leader on the target system should forward it to followers on that
system just like any other update.

The Solr JIRA is unlikely to be the problem, from what you describe.

1> are you sure you are _committing_ on the target system?
2> "unable to locate core" comes from where? The source? Target?
   CDCR?
3> is your target collection properly set up? Because it sounds
   a bit like your target cluster isn't running in SolrCloud mode.

Best,
Erick



Re: [CDCR]Unable to locate core

2019-02-01 Thread Tim
After some more investigation, it seems that we're running into the same bug
found here: <https://issues.apache.org/jira/browse/SOLR-11724>.

However, if my understanding is correct, that bug was patched in 7.3.
Unfortunately, we're seeing the same behavior in 7.5.

CDCR is replicating successfully to the leader node but is not replicating
to the followers.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


[CDCR]Unable to locate core

2019-01-30 Thread Tim
I'm trying to set up CDCR, but I'm running into an issue where one or two of
the six cores (shards/replicas) are not replicated while the rest are.

The only error that appears in the logs is: "Unable to locate core". 

Occasionally restarting the instance will fix this but then the issue will
repeat itself next time there is an update to the source collection. But it
will not necessarily happen to the same core again.

Has anyone run into an error such as this before? 




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html