[ 
https://issues.apache.org/jira/browse/SOLR-9207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15346944#comment-15346944
 ] 

Pushkar Raste commented on SOLR-9207:
-------------------------------------

Here is a high-level description.

PeerSync currently computes the versions that the recovering node is missing and then 
sends all of those version numbers to a replica to get the corresponding updates. When a 
node under recovery is missing too many updates, the payload of {{getUpdates}} 
exceeds 2MB and Jetty rejects the request. The problem can be solved using 
one of the following techniques:

# Increasing the Jetty payload limit may solve this problem. We would still be 
sending a lot of data over the network, which might not be needed.
# Stream versions to the replica while asking for updates.
# Request versions in chunks of about 90K versions at a time.
# gzip the versions and unzip them on the other side.
# Ask for versions using version ranges instead of sending individual versions.

Approaches #1-#3 require sending a lot of data over the wire. 
Approach #3 also requires making multiple calls. Additionally, #3 might not be 
feasible considering how the current code works, by submitting requests to 
{{shardHandler}} and calling {{handleResponse}}.
Approach #4 may work, but looks a little inelegant. 

Hence I settled on approach #5 (suggested by Ramkumar). Here is how it works:
* Let's say the replica has versions [1, 2, 3, 4, 5, 6] and the leader has versions [1, 
2, 3, 4, 5, 6, 10, -11, 12, 13, 15, 18]
* While recovering using the {{PeerSync}} strategy, the replica computes that the range it 
is missing is {{10...18}}
* The replica now requests versions by specifying the range {{10...18}} instead of 
sending all the individual versions (namely 10, -11, 12, 13, 15, 18)
* I have made the use of version ranges for PeerSync configurable by introducing 
the following configuration section
{code}
  <peerSync>
    <str name="useRangeVersions">${solr.peerSync.useRangeVersions:false}</str>
  </peerSync>
{code}
* Further, I have kept it backwards compatible: a recovering node will use version 
ranges only if the node it asks for updates can process version ranges
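
The range idea above can be illustrated with a small Python sketch. This is not the code from the patch: the function names are hypothetical, the {{...}} separator only mirrors the notation used in this comment (the actual wire format may differ), and comparing negative versions by their absolute value is an assumption made for the example.

```python
def missing_range(own_versions, peer_versions):
    """Return (low, high) bounding the versions the peer has and we lack, or None.

    Assumption: a negative version is compared by its absolute value.
    """
    own = set(own_versions)
    missing = [abs(v) for v in peer_versions if v not in own]
    if not missing:
        return None
    return min(missing), max(missing)


def versions_in_range(versions, low, high):
    """Peer side: select every version whose absolute value lies in [low, high]."""
    return [v for v in versions if low <= abs(v) <= high]


# The example from the comment above.
replica = [1, 2, 3, 4, 5, 6]
leader = [1, 2, 3, 4, 5, 6, 10, -11, 12, 13, 15, 18]

low, high = missing_range(replica, leader)

# Request parameter when asking by range vs. listing every version.
range_param = f"{low}...{high}"
list_param = ",".join(str(v) for v in versions_in_range(leader, low, high))

print(range_param)  # 10...18
print(list_param)   # 10,-11,12,13,15,18
```

The range form stays the same size no matter how many updates fall inside it, whereas the explicit list grows by roughly one version string per missing update.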

> PeerSync recovery fails if number of updates requested is high
> --------------------------------------------------------------
>
>                 Key: SOLR-9207
>                 URL: https://issues.apache.org/jira/browse/SOLR-9207
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 5.1, 6.0
>            Reporter: Pushkar Raste
>            Priority: Minor
>         Attachments: SOLR-9207.patch
>
>
> {{PeerSync}} recovery fails if we request more than ~99K updates. 
> This can happen when solrconfig is updated to retain more {{tlogs}} to leverage 
> https://issues.apache.org/jira/browse/SOLR-6359
> During our testing we found that recovery using {{PeerSync}} fails if we 
> ask for more than ~99K updates, with the following error
> {code}
>  WARN  PeerSync [RecoveryThread] - PeerSync: core=hold_shard1 url=<shardUrl>
> exception talking to <leaderUrl>, failed
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: 
> Expected mime type application/octet-stream but got application/xml. 
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="error"><str name="msg">application/x-www-form-urlencoded content 
> length (4761994 bytes) exceeds upload limit of 2048 KB</str>
> <int name="code">400</int></lst>
> </response>
> {code}
> We arrived at ~99K with the following math
> * max_version_number = Long.MAX_VALUE = 9223372036854775807  
> * bytes per version number = 20 (on the wire, as the POST request sends the version 
> number as a string)
> * 1 additional byte for the separator {{,}}
> * max_versions_in_single_request = 2MB/21 = ~99864
> I could think of 2 ways to fix it
> 1. Ask for updates in chunks of about 90K inside {{PeerSync.requestUpdates()}}
> 2. Use application/octet-stream encoding 
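
The ~99K estimate in the issue description can be reproduced with a few lines of Python. The constant names are hypothetical. Reading the 20 bytes per version as 19 digits plus a sign character is an interpretation, not something the issue states.

```python
UPLOAD_LIMIT_BYTES = 2048 * 1024   # "upload limit of 2048 KB" from the error message
MAX_VERSION = 2**63 - 1            # Long.MAX_VALUE = 9223372036854775807

# Assumption: the worst case is a negative 19-digit version sent as a string.
bytes_per_version = len(str(-MAX_VERSION))   # 19 digits + '-' = 20
SEPARATOR_BYTES = 1                          # the ',' between versions

max_versions = UPLOAD_LIMIT_BYTES // (bytes_per_version + SEPARATOR_BYTES)
print(bytes_per_version)  # 20
print(max_versions)       # 99864
```

Requests carrying shorter version strings would fit more entries, so this figure is a worst-case bound, not an exact threshold.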


