[ https://issues.apache.org/jira/browse/SOLR-9207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212621#comment-17212621 ]
Evgeny Ivanskiy commented on SOLR-9207:
---------------------------------------

Hi [~praste], [~shalin].

We are seeing an intermittent issue where some of our collections fail to elect a leader after a restart. The logs show that hosts fail to become leader because of a sync failure, and that the sync failure is caused by not receiving the expected number of updates.

Investigation shows that there are duplicate versions in the tlogs. For example, if we get the versions *1,1,2,2,3,3*, we then request the updates in the range *1...3*. As a result we get *3 updates*, but *totalRequestedUpdates is 6*, and the sync fails.

Is there an assumption that {{getVersions}} should return distinct values, or is this a bug in {{PeerSync.handleVersionsWithRanges}}, which doesn't take duplicate versions into account?

> PeerSync recovery fails if number of updates requested is high
> --------------------------------------------------------------
>
>                 Key: SOLR-9207
>                 URL: https://issues.apache.org/jira/browse/SOLR-9207
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 5.1, 6.0
>            Reporter: Pushkar Raste
>            Assignee: Shalin Shekhar Mangar
>            Priority: Minor
>             Fix For: 6.2, 7.0
>
>         Attachments: SOLR-9207.patch, SOLR-9207.patch, SOLR-9207.patch_updated
>
>
> {{PeerSync}} recovery fails if we request more than ~99K updates.
> We updated solrconfig to retain more {{tlogs}} in order to leverage
> https://issues.apache.org/jira/browse/SOLR-6359
> During our testing we found that recovery using {{PeerSync}} fails if we
> ask for more than ~99K updates, with the following error:
> {code}
> WARN  PeerSync [RecoveryThread] - PeerSync: core=hold_shard1 url=<shardUrl>
> exception talking to <leaderUrl>, failed
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
> Expected mime type application/octet-stream but got application/xml.
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="error"><str name="msg">application/x-www-form-urlencoded content
> length (4761994 bytes) exceeds upload limit of 2048 KB</str><int name="code">400</int></lst>
> </response>
> {code}
> We arrived at ~99K with the following math:
> * max_version_number = Long.MAX_VALUE = 9223372036854775807
> * bytes per version number = 20 (on the wire, because the POST request sends each version number as a string)
> * 1 additional byte for the separator (,)
> * max_versions_in_single_request = 2MB/21 = ~99864
>
> I can think of two ways to fix it:
> 1. Request updates in chunks of about 90K inside {{PeerSync.requestUpdates()}}
> 2. Use application/octet-stream encoding
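The ~99K arithmetic and the first proposed fix (chunked requests) can be sketched in Java. This is a minimal illustration only, not the actual {{PeerSync}} code. The class {{VersionChunker}}, its methods, and the sample workload are hypothetical. The sketch assumes each version goes over the wire as a decimal string of at most 20 characters (the length of Long.MIN_VALUE, including the minus sign) plus one separator byte, and that the limit is the 2048 KB upload limit from the error above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not Solr API): request updates in bounded chunks. */
public class VersionChunker {

    // Upload limit reported in the error message: 2048 KB.
    static final long UPLOAD_LIMIT_BYTES = 2048L * 1024L;

    // Worst-case decimal length of a long: "-9223372036854775808" is 20 chars.
    static final int MAX_VERSION_CHARS = String.valueOf(Long.MIN_VALUE).length();

    /** Upper bound on versions per request, counting one separator byte per version. */
    static long maxVersionsPerRequest() {
        return UPLOAD_LIMIT_BYTES / (MAX_VERSION_CHARS + 1);
    }

    /** Splits the requested versions into consecutive chunks of at most chunkSize entries. */
    static List<List<Long>> chunk(List<Long> versions, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<Long>> chunks = new ArrayList<>();
        for (int start = 0; start < versions.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, versions.size());
            chunks.add(new ArrayList<>(versions.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(maxVersionsPerRequest()); // 99864

        // Hypothetical workload: 250,000 versions to fetch from the leader.
        List<Long> versions = new ArrayList<>();
        for (long v = 1; v <= 250_000; v++) {
            versions.add(v);
        }

        // 90K per chunk stays safely under the ~99864 bound computed above.
        List<List<Long>> chunks = chunk(versions, 90_000);
        System.out.println(chunks.size()); // 3 (90K, 90K, 70K)
    }
}
```

A chunk size somewhat below the computed bound leaves headroom for the other form parameters in the same POST body.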
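The duplicate-version mismatch described in the comment (versions *1,1,2,2,3,3*, 6 requested, 3 received) can be modelled in a few lines of Java. This does not reproduce {{PeerSync.handleVersionsWithRanges}}. The class and method names are hypothetical, and the sketch assumes that the requested count is taken as the size of the version list while the leader returns one update per distinct version in the requested range. Under those assumptions, counting distinct versions makes the two numbers agree.

```java
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical model (not Solr code) of the count mismatch caused by duplicate tlog versions. */
public class DuplicateVersionCount {

    /** Naive count: every entry in the list, duplicates included. */
    static int naiveRequestedCount(List<Long> versions) {
        return versions.size();
    }

    /** Count of distinct versions only. */
    static int distinctRequestedCount(List<Long> versions) {
        return new LinkedHashSet<>(versions).size();
    }

    /** Simulated leader (assumption): one update per distinct version within [lo, hi]. */
    static long updatesReturned(List<Long> leaderVersions, long lo, long hi) {
        return leaderVersions.stream().filter(v -> v >= lo && v <= hi).distinct().count();
    }

    public static void main(String[] args) {
        List<Long> versions = List.of(1L, 1L, 2L, 2L, 3L, 3L);
        long received = updatesReturned(versions, 1L, 3L);

        System.out.println(naiveRequestedCount(versions) == received);    // false: 6 vs 3
        System.out.println(distinctRequestedCount(versions) == received); // true: 3 vs 3
    }
}
```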