[ 
https://issues.apache.org/jira/browse/SOLR-8709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-8709:
---------------------------------
    Description: 
Currently the TopicStream can miss documents if version numbers are received 
out-of-order. The TopicStream sorts on version number so it will only miss 
out-of-order versions that span commit boundaries. *Stress testing was not able 
to create a missed document scenario* (see comment below), but code review 
points to the possibility of this happening.

In order to resolve this issue we can adopt an approach that keeps a checksum 
of the version numbers for a sliding time window. This checksum can be checked 
on each run, and if the checksums don't match, the documents from the time 
window can be resent. As long as the time window is longer than the softCommit 
interval, this will guarantee delivery of all documents for the Topic. This 
won't guarantee *one time delivery* but should provide a reasonable 
expectation of one time delivery.
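
The checksum idea can be illustrated with a small, self-contained sketch. 
This is not the TopicStream implementation and does not use any Solr API: the 
names Doc, window_checksum and TopicReader are hypothetical, the per-document 
timestamp is an assumed field, and a sum modulo 2^64 stands in for whatever 
checksum would actually be chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    version: int      # stands in for the document's version number
    timestamp: float  # assumed field: when the document was indexed (seconds)


def window_checksum(docs, window_start):
    """Order-independent checksum of the versions inside the time window."""
    checksum = 0
    for doc in docs:
        if doc.timestamp >= window_start:
            checksum = (checksum + doc.version) % 2**64
    return checksum


class TopicReader:
    """Hypothetical reader: checkpoint on version plus a window checksum."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.checkpoint = 0   # highest version delivered so far
        self.delivered = {}   # version -> Doc, for docs still inside the window

    def run(self, visible_docs, now):
        window_start = now - self.window_seconds
        # Docs at or below the checkpoint should already have been delivered.
        already_due = [d for d in visible_docs if d.version <= self.checkpoint]
        batch = [d for d in visible_docs if d.version > self.checkpoint]

        # If the index's view of the window differs from what was delivered,
        # an out-of-order version became visible late: resend the window.
        index_sum = window_checksum(already_due, window_start)
        seen_sum = window_checksum(self.delivered.values(), window_start)
        if index_sum != seen_sum:
            batch += [d for d in already_due if d.timestamp >= window_start]

        batch.sort(key=lambda d: d.version)
        for doc in batch:
            self.delivered[doc.version] = doc
        # Forget delivered docs that have slid out of the window.
        self.delivered = {
            v: d for v, d in self.delivered.items() if d.timestamp >= window_start
        }
        if batch:
            self.checkpoint = max(self.checkpoint, batch[-1].version)
        return batch
```

In this sketch, if versions 1 and 3 are visible on the first run and version 2 
only becomes visible after a later commit, the second run sees a checksum 
mismatch and resends the whole window. Version 2 is then delivered, while 
versions 1 and 3 are delivered a second time, which is why the approach does 
not give strict one time delivery.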

  was:
Currently the TopicStream can miss documents if version numbers are received 
out-of-order. The TopicStream sorts on version number so it will only miss 
out-of-order versions that span commit boundaries. *Stress testing was not able 
create a missed document scenario* (see comment below), but code review points 
to the possibility of this happening.

In order to resolve this issue we can adopt an approach that keeps a checksum 
of the version numbers for a sliding time window. This checksum can be checked 
each run and if the checksums don't match the documents from the time window 
can be resent. As long as the time window is longer then the softCommit 
interval, this will guarantee delivery of all documents for the Topic. This 
won't guarantee *one time delivery* but should be provide a reasonable 
expectation of one time delivery.


> Add checksum to the TopicStream to ensure delivery of all documents within a 
> Topic
> ----------------------------------------------------------------------------------
>
>                 Key: SOLR-8709
>                 URL: https://issues.apache.org/jira/browse/SOLR-8709
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Joel Bernstein
>
> Currently the TopicStream can miss documents if version numbers are received 
> out-of-order. The TopicStream sorts on version number so it will only miss 
> out-of-order versions that span commit boundaries. *Stress testing was not 
> able to create a missed document scenario* (see comment below), but code 
> review points to the possibility of this happening.
> In order to resolve this issue we can adopt an approach that keeps a checksum 
> of the version numbers for a sliding time window. This checksum can be 
> checked on each run, and if the checksums don't match, the documents from the 
> time window can be resent. As long as the time window is longer than the 
> softCommit interval, this will guarantee delivery of all documents for the 
> Topic. This won't guarantee *one time delivery* but should provide a 
> reasonable expectation of one time delivery.



