[ 
https://issues.apache.org/jira/browse/SOLR-9207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212621#comment-17212621
 ] 

Evgeny Ivanskiy commented on SOLR-9207:
---------------------------------------

Hi [~praste], [~shalin].

We are seeing an intermittent issue where some of our collections fail to elect 
a leader after a restart. The logs show that hosts fail to become leader 
because of a sync failure, and that the sync fails because it does not receive 
the expected number of updates.
Investigation shows that there are duplicate versions in the tlogs.
So, in this case: 
If we get versions *1,1,2,2,3,3*, we then request the updates in range *1...3*. 
As a result we get *3 updates*, but *totalRequestedUpdates is 6*, and the sync 
fails.
Is there an assumption that getVersions should return distinct values, or 
is this a bug in PeerSync.handleVersionsWithRanges, which doesn't take 
duplicate versions into account?
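
To make the mismatch concrete, here is a minimal sketch. It is not Solr's actual code: the class {{DuplicateVersionCount}} and the helper {{distinctCount}} are hypothetical names used only for illustration. It assumes a peer returns one update per distinct version, so a version list containing duplicates produces a requested count larger than the number of updates that can come back.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DuplicateVersionCount {
    // Hypothetical helper, not part of Solr: the number of updates a peer can
    // return for a version list is assumed to equal the number of distinct versions.
    static int distinctCount(List<Long> versions) {
        Set<Long> distinct = new LinkedHashSet<>(versions);
        return distinct.size();
    }

    public static void main(String[] args) {
        // The version list from the example above, with every version duplicated.
        List<Long> versions = Arrays.asList(1L, 1L, 2L, 2L, 3L, 3L);
        int requested = versions.size();           // counts duplicates
        int returnable = distinctCount(versions);  // counts each version once
        System.out.println("requested=" + requested + " returnable=" + returnable);
    }
}
```

With the list above, the requested count is 6 while only 3 distinct versions exist, which is the gap described in this comment.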

 

 

> PeerSync recovery fails if number of updates requested is high
> --------------------------------------------------------------
>
>                 Key: SOLR-9207
>                 URL: https://issues.apache.org/jira/browse/SOLR-9207
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 5.1, 6.0
>            Reporter: Pushkar Raste
>            Assignee: Shalin Shekhar Mangar
>            Priority: Minor
>             Fix For: 6.2, 7.0
>
>         Attachments: SOLR-9207.patch, SOLR-9207.patch, SOLR-9207.patch_updated
>
>
> {{PeerSync}} recovery fails if we request more than ~99K updates. 
> We updated solrconfig to retain more {{tlogs}} to leverage 
> https://issues.apache.org/jira/browse/SOLR-6359
> During our testing we found that recovery using {{PeerSync}} fails if we 
> ask for more than ~99K updates, with the following error:
> {code}
>  WARN  PeerSync [RecoveryThread] - PeerSync: core=hold_shard1 url=<shardUrl>
> exception talking to <leaderUrl>, failed
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: 
> Expected mime type application/octet-stream but got application/xml. 
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="error"><str name="msg">application/x-www-form-urlencoded content 
> length (4761994 bytes) exceeds upload limit of 2048 KB</str>
> <int name="code">400</int></lst>
> </response>
> {code}
> We arrived at ~99K with the following math:
> * max_version_number = Long.MAX_VALUE = 9223372036854775807  
> * bytes per version number = 20 (on the wire, as the POST request sends the 
> version number as a string)
> * additional byte for the separator (,) = 1
> * max_versions_in_single_request = 2MB/21 = ~99864
> I could think of 2 ways to fix it:
> 1. Ask for updates in chunks of 90K inside {{PeerSync.requestUpdates()}}
> 2. Use application/octet-stream encoding 
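
The arithmetic and the first proposed fix in the quoted description can be sketched as follows. This is an illustrative sketch only, not the patch attached to the issue: {{VersionChunker}}, {{maxVersionsPerRequest}} and {{chunk}} are hypothetical names, the 2048 KB limit is taken from the error message, and the 20-bytes-per-version figure is the reporter's estimate.

```java
import java.util.ArrayList;
import java.util.List;

public class VersionChunker {
    // 2048 KB upload limit, as reported in the quoted error message.
    static final long UPLOAD_LIMIT_BYTES = 2048L * 1024L;
    // Reporter's estimate: 20 bytes per version string plus 1 byte for the separator.
    static final int BYTES_PER_VERSION = 20 + 1;

    // Upper bound on how many versions fit in one form-encoded request.
    static long maxVersionsPerRequest() {
        return UPLOAD_LIMIT_BYTES / BYTES_PER_VERSION;
    }

    // Hypothetical helper: splits a list into consecutive chunks of at most chunkSize items.
    static <T> List<List<T>> chunk(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println("max versions per request: " + maxVersionsPerRequest());

        // Simulate a large version list and split it into chunks of 90K.
        List<Long> versions = new ArrayList<>();
        for (long v = 1; v <= 200_000; v++) {
            versions.add(v);
        }
        for (List<Long> part : chunk(versions, 90_000)) {
            System.out.println("chunk of " + part.size() + " versions");
        }
    }
}
```

Each chunk would then be sent as its own request, keeping every request body under the upload limit; a chunk size of 90K leaves some headroom below the computed bound.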


