[ https://issues.apache.org/jira/browse/SOLR-9207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15346944#comment-15346944 ]
Pushkar Raste commented on SOLR-9207:
-------------------------------------

Here is a high-level description.

PeerSync currently computes the versions the recovering node is missing and then sends all of those version numbers to a replica to get the corresponding updates. When a node under recovery is missing too many updates, the payload of {{getUpdates}} goes above 2MB and Jetty rejects the request.

The problem can be solved using one of the following techniques:
# Increase the Jetty payload limit. This may solve the problem, but we would still be sending a lot of data over the network, which might not be needed.
# Stream versions to the replica while asking for updates.
# Request versions in chunks of about 90K versions at a time.
# gzip the versions and unzip them on the other side.
# Ask for versions using version ranges instead of sending individual versions.

Approaches 1-3 require sending a lot of data over the wire. Approach #3 also requires making multiple calls. Additionally, #3 might not be feasible considering how the current code works, by submitting requests to {{shardHandler}} and calling {{handleResponse}}. #4 may work, but looks a little inelegant. Hence I settled on approach #5 (suggested by Ramkumar).
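As a rough illustration of why approach #5 keeps the request small, here is a minimal sketch that collapses a list of missing versions into a single range and compares it with the comma-separated form. The class and method names are hypothetical, the sample versions are only for illustration, and the {{lo...hi}} text form is an assumption; this is not the code in the patch, and a real implementation may well need more than one range.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not Solr's PeerSync code.
class VersionRangeSketch {

    // Collapse missing versions into one "lo...hi" range.
    // Deletes carry negative versions, so bounds use the absolute value.
    static String toRange(List<Long> missing) {
        if (missing.isEmpty()) {
            throw new IllegalArgumentException("no missing versions");
        }
        long lo = Long.MAX_VALUE;
        long hi = 0L;
        for (long v : missing) {
            long abs = Math.abs(v);
            lo = Math.min(lo, abs);
            hi = Math.max(hi, abs);
        }
        return lo + "..." + hi;
    }

    // The per-version form: every version sent as a decimal string.
    static String toList(List<Long> missing) {
        StringBuilder sb = new StringBuilder();
        for (long v : missing) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(v);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Long> missing = Arrays.asList(10L, -11L, 12L, 13L, 15L, 18L);
        System.out.println(toList(missing));  // grows with the number of versions
        System.out.println(toRange(missing)); // stays the same size
    }
}
```

The list form grows linearly with the number of missing versions, while the range form stays constant in size, which is the property that avoids the upload limit.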
Here is how it works:
* Let's say the replica has versions [1, 2, 3, 4, 5, 6] and the leader has versions [1, 2, 3, 4, 5, 6, 10, -11, 12, 13, 15, 18].
* While recovering using the {{PeerSync}} strategy, the replica computes that the range it is missing is {{10...18}}.
* The replica now requests versions by specifying the range {{10...18}} instead of sending all the individual versions (namely 10, -11, 12, 13, 15, 18).
* I have made the use of version ranges for PeerSync configurable by introducing the following configuration section:
{code}
<peerSync>
  <str name="useRangeVersions">${solr.peerSync.useRangeVersions:false}</str>
</peerSync>
{code}
* Further, I have kept it backwards compatible: a recovering node will use version ranges only if the node it asks for updates can process version ranges.

> PeerSync recovery failes if number of updates requested is high
> ---------------------------------------------------------------
>
>                 Key: SOLR-9207
>                 URL: https://issues.apache.org/jira/browse/SOLR-9207
>             Project: Solr
>          Issue Type: Bug
>   Affects Versions: 5.1, 6.0
>            Reporter: Pushkar Raste
>            Priority: Minor
>         Attachments: SOLR-9207.patch
>
>
> {{PeerSync}} recovery fails if we request more than ~99K updates.
> After updating solrconfig to retain more {{tlogs}} to leverage
> https://issues.apache.org/jira/browse/SOLR-6359
> we found during our testing that recovery using {{PeerSync}} fails if we
> ask for more than ~99K updates, with the following error:
> {code}
> WARN PeerSync [RecoveryThread] - PeerSync: core=hold_shard1 url=<shardUrl>
> exception talking to <leaderUrl>, failed
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
> Expected mime type application/octet-stream but got application/xml.
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="error"><str name="msg">application/x-www-form-urlencoded content length (4761994 bytes) exceeds upload limit of 2048 KB</str><int name="code">400</int></lst>
> </response>
> {code}
> We arrived at ~99K with the following math:
> * max_version_number = Long.MAX_VALUE = 9223372036854775807
> * bytes per version number = 20 (on the wire, as the POST request sends the
> version number as a string)
> * 1 additional byte for the separator ,
> * max_versions_in_single_request = 2MB/21 = ~99864
> I could think of 2 ways to fix it:
> 1. Ask for updates in chunks of about 90K inside {{PeerSync.requestUpdates()}}
> 2. Use application/octet-stream encoding
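
The ~99K estimate in the issue description can be checked with a short calculation. The sketch below only restates that arithmetic (2048 KB limit, 20 bytes per version plus 1 separator byte); the class and method names are hypothetical and not part of Solr.

```java
// Hypothetical helper for illustration only; restates the arithmetic from the issue.
class UploadLimitSketch {

    // How many versions fit in one request, given the size limit
    // and the worst-case bytes used per version on the wire.
    static long maxVersionsPerRequest(long limitBytes, int bytesPerVersion) {
        return limitBytes / bytesPerVersion;
    }

    public static void main(String[] args) {
        long limitBytes = 2048L * 1024L; // upload limit of 2048 KB from the error message
        int bytesPerVersion = 20 + 1;    // 20 bytes per version string + 1 separator byte
        System.out.println(maxVersionsPerRequest(limitBytes, bytesPerVersion)); // 99864
    }
}
```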