shubhamranjan opened a new pull request, #4205: URL: https://github.com/apache/solr/pull/4205
https://issues.apache.org/jira/browse/SOLR-18098

# Description

Replication fails with `EOFException` when transferring files whose size is an exact multiple of `PACKET_SZ` (1 MB). For example, replicating a file that is exactly 1 MB, 2 MB, etc. causes the follower to crash.

# Solution

The root cause is in `IndexFetcher.FileFetcher.fetchPackets()`. The replication packet protocol has three packet types:

1. **Data packet**: `int(size) + long(checksum) + byte[size]`
2. **Zero-length data packet**: `int(0) + long(checksum)`, sent when the last chunk fills exactly `PACKET_SZ`
3. **EOF marker**: `int(0)`, with no checksum following

The old code treated *any* `packetSize == 0` as a loop-continue, so it never read the checksum of a zero-length data packet (type 2 above). Those 8 unread checksum bytes were then interpreted as the start of the next packet size → garbage value → `EOFException`.

The fix reorders `fetchPackets()` to:

1. Detect the EOF marker (`size=0` and `fis.peek() == -1`)
2. Read the checksum for **all** data packets, including zero-length ones
3. Skip zero-length data packets only after consuming their checksum

**AI Disclosure:** Claude (Anthropic) was used as an aid during diagnosis and development, specifically for analyzing the packet protocol interaction between `DirectoryFileStream.write()` and `fetchPackets()`, reasoning through the checksum read misalignment, and drafting test cases. All changes were reviewed, verified, and refined by a human (me) before submission.
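For readers who want to see the read order from the Solution section in isolation, here is a minimal standalone sketch. It is not the Solr code: the class `PacketProtocolSketch`, the methods `send` and `receive`, the 4-byte `PACKET_SZ`, the use of `Adler32` as the checksum, and the `PushbackInputStream` standing in for `fis.peek()` are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.util.Arrays;
import java.util.zip.Adler32;

// Hypothetical model of the three packet types; names are not taken from Solr.
public class PacketProtocolSketch {
  // Tiny packet size so exact multiples are easy to exercise (the PR's PACKET_SZ is 1 MB).
  static final int PACKET_SZ = 4;

  // Assumption: Adler32 stands in for whatever checksum the real sender uses.
  static long checksum(byte[] buf, int off, int len) {
    Adler32 adler = new Adler32();
    adler.update(buf, off, len);
    return adler.getValue();
  }

  // Sender model: data packets, then a zero-length data packet if the last
  // chunk filled PACKET_SZ exactly, then the bare EOF marker.
  static byte[] send(byte[] file) throws IOException {
    ByteArrayOutputStream wire = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(wire);
    int off = 0;
    int lastLen = 0;
    while (off < file.length) {
      int len = Math.min(PACKET_SZ, file.length - off);
      out.writeInt(len);                        // int(size)
      out.writeLong(checksum(file, off, len));  // long(checksum)
      out.write(file, off, len);                // byte[size]
      off += len;
      lastLen = len;
    }
    if (lastLen == PACKET_SZ) {
      out.writeInt(0);                          // zero-length data packet
      out.writeLong(checksum(file, 0, 0));      // its checksum still goes on the wire
    }
    out.writeInt(0);                            // EOF marker, no checksum
    return wire.toByteArray();
  }

  // Receiver model following the reordered steps described in the PR.
  static byte[] receive(byte[] wire) throws IOException {
    PushbackInputStream raw = new PushbackInputStream(new ByteArrayInputStream(wire));
    DataInputStream in = new DataInputStream(raw);
    ByteArrayOutputStream file = new ByteArrayOutputStream();
    while (true) {
      int size = in.readInt();
      if (size < 0) {
        throw new IOException("invalid packet size: " + size);
      }
      if (size == 0) {
        int next = raw.read();                  // 1. peek: nothing left means EOF marker
        if (next == -1) {
          break;
        }
        raw.unread(next);
      }
      long expected = in.readLong();            // 2. always consume the checksum
      if (size == 0) {
        continue;                               // 3. skip only after consuming it
      }
      byte[] buf = new byte[size];
      in.readFully(buf);
      if (checksum(buf, 0, size) != expected) {
        throw new IOException("checksum mismatch");
      }
      file.write(buf, 0, size);
    }
    return file.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    // Sizes around and at exact multiples of PACKET_SZ.
    for (int n : new int[] {0, 1, 3, 4, 5, 8, 9}) {
      byte[] file = new byte[n];
      for (int i = 0; i < n; i++) {
        file[i] = (byte) (i * 31);
      }
      byte[] got = receive(send(file));
      System.out.println(n + " bytes, round trip equal: " + Arrays.equals(file, got));
    }
  }
}
```

In this model, moving the `size == 0` `continue` above the `readLong()` call reproduces the misalignment described above: for an 8-byte input, the 8 checksum bytes of the zero-length packet get read as the next packet size.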
# Tests

Added `IndexFetcherPacketProtocolTest` with 18 unit tests that exercise the packet protocol between `DirectoryFileStream` (sender) and `FileFetcher.fetchPackets` (receiver) in isolation:

- **Exact multiples of PACKET_SZ**: 1 MB, 2 MB, 3 MB, 63 MB
- **Non-multiples**: empty, 1 byte, 100 bytes, 100 KB, 512 KB
- **Boundary cases**: PACKET_SZ ± 1 byte, 1.5× PACKET_SZ, 2× PACKET_SZ ± 1
- **Error handling**: checksum mismatch detection
- **Buffer resize**: large multi-packet file (5 MB + 12345 bytes)
- **Successive transfers**: multiple exact-size files in sequence

Run with: `./gradlew :solr:core:test --tests "IndexFetcherPacketProtocolTest"`

# Checklist

Please review the following and check all that apply:

- [x] I have reviewed the guidelines for [How to Contribute](https://github.com/apache/solr/blob/main/CONTRIBUTING.md) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
- [x] I have developed this patch against the `main` branch.
- [x] I have run `./gradlew check`.
- [x] I have added tests for my changes.
