[ https://issues.apache.org/jira/browse/HADOOP-19098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833590#comment-17833590 ]
ASF GitHub Bot commented on HADOOP-19098: ----------------------------------------- steveloughran opened a new pull request, #6700: URL: https://github.com/apache/hadoop/pull/6700 Clarifies behaviour of VectorIO methods with contract tests as well as specification. * Add precondition range checks to all implementations * Identify and fix bug where direct buffer reads was broken (HADOOP-19101; this surfaced in ABFS contract tests) * Logging in VectoredReadUtils. * TestVectoredReadUtils verifies validation logic. * FileRangeImpl toString() improvements * CombinedFileRange tracks bytes in range which are wanted; toString() output logs this. HDFS * Add test TestHDFSContractVectoredRead ABFS * Add test ITestAbfsFileSystemContractVectoredRead S3A * checks for vector IO being stopped in all iterative vector operations, including draining * maps read() returning -1 to failure * passes in file length to validation * Error reporting to only completeExceptionally() those ranges which had not yet read data in. * Improved logging. readVectored() * made synchronized. This is only for the invocation; the actual async retrieves are unsynchronized. * closes input stream on invocation * switches to random IO, so avoids keeping any long-lived connection around. + AbstractSTestS3AHugeFiles enhancements. + ADDENDUM: test fix in ITestS3AContractVectoredRead Contains: HADOOP-19101. Vectored Read into off-heap buffer broken in fallback implementation Contributed by Steve Loughran ### For code changes: - [ ] Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')? - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files? > Vector IO: consistent specified rejection of overlapping ranges > --------------------------------------------------------------- > > Key: HADOOP-19098 > URL: https://issues.apache.org/jira/browse/HADOOP-19098 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs, fs/s3 > Affects Versions: 3.3.6 > Reporter: Steve Loughran > Assignee: Steve Loughran > Priority: Major > Labels: pull-request-available > Fix For: 3.5.0 > > > Related to PARQUET-2171 q: "how do you deal with overlapping ranges?" > I believe s3a rejects this, but the other impls may not. > Proposed > FS spec to say > * "overlap triggers IllegalArgumentException". > * special case: 0 byte ranges may be short circuited to return empty buffer > even without checking file length etc. > Contract tests to validate this > (+ common helper code to do this). > I'll copy the validation stuff into the parquet PR for consistency with older > releases -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org