sarum90 opened a new pull request, #609: URL: https://github.com/apache/arrow-rs-object-store/pull/609
# Which issue does this PR close? N/A - no existing issue. Happy to create one if preferred. # Rationale for this change If a reqwest timeout occurs after consuming all bytes from an S3 response body but before receiving the EOF signal, `retry_stream` attempts to retry with a `Range` header like `bytes=5-4` (where 5 is the file size). This is an invalid/backwards range. Per RFC 7233, servers MUST ignore syntactically invalid Range headers. S3 follows this spec and returns `200 OK` with the **full file** instead of `206 Partial Content` or `416 Range Not Satisfiable`. Without validation, `retry_stream` would read the full file again, causing data duplication (e.g., `"hellohello"` instead of `"hello"`). # What changes are included in this PR? 1. **Early EOF check**: If `range.start >= range.end` before retrying, return EOF immediately rather than sending an invalid range request. 2. **206 status validation**: Verify the retry response is `206 Partial Content`. If the server returns `200 OK` (indicating it ignored our Range header), fail the retry and return the original error. 3. **Content-Range validation**: Verify the `Content-Range` header matches the requested range. This catches cases where the server returns a different range than requested. 4. **Warning logs**: Added `warn!` logs for the new validation failures to aid debugging. # Are there any user-facing changes? No breaking changes. Users may see different error behavior in edge cases where retries were previously succeeding incorrectly (by silently duplicating data). # Test Coverage Automated test coverage for this specific bug is limited for two reasons: 1. **Requires real S3**: LocalStack returns `416 Range Not Satisfiable` for invalid ranges, which is actually non-compliant with RFC 7233. Real S3 correctly ignores the invalid range and returns `200 OK` with the full file, which is what triggers the bug. 2. **Hard to trigger**: The bug requires a timeout (or other error) from reqwest *after* all data is read but *before* EOF is signaled. This is timing-dependent and would require integration tests with long sleeps to force a timeout. A manual reproduction script is available in this gist: https://gist.github.com/sarum90/cb95633750390db690ef701f08aa20ed The script demonstrates the bug against real S3 and shows the data duplication. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
