Re: Replication issue with version 0 index in SOLR 7.5

2019-06-26 Thread Patrick Bordelon
One other question related to this.

I know the change was made to address a specific problem, but has it caused a
problem like mine for anyone else?

We're looking at changing the second 'if' statement to add an extra
conditional so that the "deleteAll" operation is only performed when
explicitly requested.

The idea is to reuse skipCommitOnMasterVersionZero and set it so that the if
statement can never be true when the primary comes up with a new, empty
generation.
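
Roughly, the guard we have in mind is the following. This is a sketch only,
reusing the variable names from the 7.5 IndexFetcher excerpt quoted below;
note that in stock 7.5 the flag only picks the commit path after deleteAll
has already run, so it has to be hoisted into the condition itself:

  if (latestVersion == 0L) {
    // Proposed: only wipe the local index when explicitly requested.
    // skipCommitOnMasterVersionZero is reused as the opt-out: when it is
    // set on the replica, this branch can never fire against a fresh,
    // empty primary.
    if (commit.getGeneration() != 0 && !skipCommitOnMasterVersionZero) {
      // ... existing deleteAll/commit block from the 7.5 excerpt below ...
    }
  }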

We're going to try some modifications to our polling strategy as a temporary
solution while we test changes to that section of the IndexFetcher.
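
For the polling side, we're thinking of something along these lines on the
replicas (a sketch; the masterUrl value is hypothetical, and pollInterval is
the standard slave-side HH:MM:SS setting):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- hypothetical URL; ours points at the primary's load balancer -->
    <str name="masterUrl">http://primary-lb.example.com/solr/core1</str>
    <!-- poll less often, so a freshly rebuilt and still-empty primary
         is less likely to be seen before it has a full index -->
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>

Pausing polling around a rebuild via the replication handler's
command=disablepoll, then re-enabling it with command=enablepoll once the
primary has a full index, is also on the table.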





Re: Replication issue with version 0 index in SOLR 7.5

2019-06-25 Thread Patrick Bordelon
I removed the replicate-after-startup setting from our solrconfig.xml file.
However, that didn't solve the issue: when I rebuilt the primary, the
associated replicas all went to 0 documents.
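
For reference, the master section after that change looked like this (the
same handler quoted below in the thread, minus the startup trigger):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${replication.enable.master:false}</str>
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>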







Re: Replication issue with version 0 index in SOLR 7.5

2019-06-25 Thread Patrick Bordelon
We are currently replicating after commit and startup:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${replication.enable.master:false}</str>
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>





Replication issue with version 0 index in SOLR 7.5

2019-06-24 Thread Patrick Bordelon
Hi,

We recently upgraded to SOLR 7.5 in AWS; we had previously been running SOLR
6.5. In our current configuration our applications are split into a
single-instance primary environment and a multi-instance replica environment,
each behind its own load balancer.

Until recently we've been able to reload the primary without the replicas
updating until there was a full index. However, after we upgraded to 7.5 we
started noticing that after terminating and rebuilding a primary instance,
the associated replicas would all start showing 0 documents in all indexes.
After some research we believe we've tracked the issue down to SOLR-11293.

SOLR-11293 changes: https://issues.apache.org/jira/browse/SOLR-11293

This fix changed the check the replication handler performs before updating a
replica when the primary has an empty index, whether that's from deleting the
old index or from terminating the instance.

This is the code as it was in the 6.5 replication handler:

  if (latestVersion == 0L) {
    if (forceReplication && commit.getGeneration() != 0) {
      // since we won't get the files for an empty index,
      // we just clear ours and commit
      RefCounted<IndexWriter> iw = solrCore.getUpdateHandler().getSolrCoreState().getIndexWriter(solrCore);
      try {
        iw.get().deleteAll();
      } finally {
        iw.decref();
      }
      SolrQueryRequest req = new LocalSolrQueryRequest(solrCore, new ModifiableSolrParams());
      solrCore.getUpdateHandler().commit(new CommitUpdateCommand(req, false));
    }
  }


Without forced replication the replica won't perform the deleteAll operation
and will keep the old index until a new index version is created.

However, in 7.5 the code was changed to this:

  if (latestVersion == 0L) {
    if (commit.getGeneration() != 0) {
      // since we won't get the files for an empty index,
      // we just clear ours and commit
      log.info("New index in Master. Deleting mine...");
      RefCounted<IndexWriter> iw = solrCore.getUpdateHandler().getSolrCoreState().getIndexWriter(solrCore);
      try {
        iw.get().deleteAll();
      } finally {
        iw.decref();
      }
      assert TestInjection.injectDelayBeforeSlaveCommitRefresh();
      if (skipCommitOnMasterVersionZero) {
        openNewSearcherAndUpdateCommitPoint();
      } else {
        SolrQueryRequest req = new LocalSolrQueryRequest(solrCore, new ModifiableSolrParams());
        solrCore.getUpdateHandler().commit(new CommitUpdateCommand(req, false));
      }
    }
  }

With the removal of the forceReplication check, we believe the replica always
deletes its index when it detects that a new version 0 index has been created
on the primary.

This is a problem, as we can't afford to have active replicas with 0
documents on them in the event of a primary failure. Since we can't control
the termination of AWS instances, any primary outage has a chance of
jeopardizing the replicas' viability.

Is there a way to restore this functionality in a current or future release?
We are willing to upgrade to a later version, including the latest, if that
will help resolve this problem.

If you're going to suggest a load balancer health check to prevent this: we
already use one. However, the load balancer type we are using (an AWS
application load balancer) fails open, allowing traffic through when all
instances under it are failing. This bypasses our health check and still
allows the replicas to poll the primary even when it's not fully loaded. We
can't change load balancer types, as we rely on other features of this one
that we can't give up right now.



