[ https://issues.apache.org/jira/browse/SOLR-18098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SOLR-18098:
----------------------------------
    Labels: pull-request-available  (was: )

> Replication fails with EOFException for files with sizes that are exact 
> multiples of PACKET_SZ (1 MB)
> -----------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-18098
>                 URL: https://issues.apache.org/jira/browse/SOLR-18098
>             Project: Solr
>          Issue Type: Bug
>          Components: replication (java)
>    Affects Versions: 9.7, 9.8, 9.8.1, 9.10, 9.9.0, 9.10.1
>            Reporter: Shubham Ranjan
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> h2. Problem
> Replication fails with {{EOFException}} when transferring files whose sizes 
> are exact multiples of 1 MB (e.g., 1 MB, 2 MB, 63 MB).
> {code:java}
>   ERROR org.apache.solr.handler.IndexFetcher File _5xc54.cfs downloaded in ERROR, downloaded 66060288 of 66060288 bytes
>   Caused by: java.io.EOFException
>       at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchPackets(IndexFetcher.java:1767)
>   {code}
> File size: 66,060,288 bytes = 63 × 1,048,576 bytes (exactly 63 MB)
> h2. Root Cause
> Packet protocol mismatch between leader (sender) and follower (receiver):
>   - Leader sends files in 1 MB packets with checksums
>   - For files that are exact MB multiples, leader sends a final zero-length 
> packet WITH an 8-byte checksum
>   - Follower's bug: when it reads {{packetSize = 0}}, it skips to the next 
> iteration WITHOUT consuming the checksum
>   - This misaligns the stream: the next read interprets the checksum bytes 
> as a packet size and then fails
>   Buggy code in {{IndexFetcher.java}} lines 1760-1761:
>   {code:java}
>   if (packetSize <= 0) {
>       continue;  // BUG: Does not consume 8-byte checksum, misaligns stream
>   }
>   {code}
> h2. Impact
> Replicas cannot recover while an affected file is part of the index.
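
The misalignment described under Root Cause can be reproduced in isolation. The following is a minimal sketch, not the Solr code: the class PacketFramingSketch, its send/receive methods, the 4-byte PACKET_SZ (standing in for 1 MB), the frame order (4-byte size, 8-byte checksum, payload), the use of Adler32, and a reader loop that runs until the stream is drained are all assumptions made for illustration. It contrasts a reader that skips a zero-length packet without reading its checksum against one that consumes the checksum first.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.zip.Adler32;

// Hypothetical model of the packet framing; names and layout are assumptions.
class PacketFramingSketch {
  static final int PACKET_SZ = 4; // tiny stand-in for the 1 MB packet size

  // Sender: frames the file as [int size][long checksum][payload] packets.
  // A packet shorter than PACKET_SZ ends the file, so a file whose length is
  // an exact multiple of PACKET_SZ ends with a zero-length packet + checksum.
  static byte[] send(byte[] file) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bos);
    Adler32 checksum = new Adler32();
    int offset = 0;
    while (true) {
      int len = Math.min(PACKET_SZ, file.length - offset);
      checksum.reset();
      checksum.update(file, offset, len);
      out.writeInt(len);
      out.writeLong(checksum.getValue());
      out.write(file, offset, len);
      offset += len;
      if (len < PACKET_SZ) {
        break;
      }
    }
    return bos.toByteArray();
  }

  // Receiver: returns the number of payload bytes read from the wire.
  // consumeChecksumOnEmpty=false models the reported bug,
  // consumeChecksumOnEmpty=true models one possible fix.
  static int receive(byte[] wire, boolean consumeChecksumOnEmpty) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
    Adler32 checksum = new Adler32();
    int downloaded = 0;
    while (in.available() > 0) {
      int packetSize = in.readInt();
      if (packetSize <= 0) {
        if (consumeChecksumOnEmpty) {
          in.readLong(); // keep the stream aligned on the next packet header
        }
        continue;
      }
      long expected = in.readLong();
      byte[] buf = new byte[packetSize];
      in.readFully(buf);
      checksum.reset();
      checksum.update(buf, 0, packetSize);
      if (checksum.getValue() != expected) {
        throw new IOException("checksum mismatch");
      }
      downloaded += packetSize;
    }
    return downloaded;
  }

  public static void main(String[] args) throws IOException {
    byte[] wire = send(new byte[2 * PACKET_SZ]); // exact multiple of PACKET_SZ
    System.out.println("fixed reader downloaded " + receive(wire, true) + " bytes");
    try {
      receive(wire, false);
      System.out.println("buggy reader finished without error");
    } catch (EOFException e) {
      System.out.println("buggy reader hit EOFException after the last payload byte");
    }
  }
}
```

In this model the buggy reader has already received every payload byte when it fails, which is consistent with the "downloaded 66060288 of 66060288 bytes" line in the log above. Files whose length is not a multiple of PACKET_SZ never produce a zero-length packet here, so both readers handle them identically.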



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
