Thomas created HADOOP-14535:
-------------------------------
Summary: Support for random access and seek of block blobs
Key: HADOOP-14535
URL: https://issues.apache.org/jira/browse/HADOOP-14535
Project: Hadoop Common
Issue Type: Improvement
Components: fs/azure
Reporter: Thomas
Fix For: 2.9.0, 3.0.0-alpha4
This change adds a seek-able stream for reading block blobs to the wasb:// file
system.
If seek() is not used or if only forward seek() is used, the behavior of read()
is unchanged.
That is, the stream is optimized for sequential reads by reading chunks (over
the network) in the size specified by "fs.azure.read.request.size" (default is
4 megabytes).
If reverse seek() is used, the behavior of read() changes in favor of reading
the actual number of bytes requested in the call to read(), with some
constraints. If the size requested is smaller than 16 kilobytes and cannot be
satisfied by the internal buffer, the network read will be 16 kilobytes. If the
size requested is greater than 4 megabytes, it will be satisfied by sequential
4 megabyte reads over the network.
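
The sizing rules above can be sketched roughly as follows. This is an
illustration only, not the code in the patch: the class name, method name and
constants are hypothetical, the 4 megabyte value assumes the default
"fs.azure.read.request.size", and the sketch assumes the internal buffer could
not satisfy the request.

```java
// Hypothetical sketch of the read-size policy described in this issue.
// Names and structure are assumptions, not taken from the actual patch.
class ReadSizeSketch {
    // Smallest network read issued once a reverse seek() has been seen.
    static final int MIN_RANDOM_READ_SIZE = 16 * 1024;
    // Assumed default of fs.azure.read.request.size.
    static final int READ_REQUEST_SIZE = 4 * 1024 * 1024;

    /**
     * Returns the size of the next network read for a read() call asking
     * for 'requested' bytes that the internal buffer cannot satisfy.
     */
    static int nextNetworkReadSize(int requested, boolean reverseSeekSeen) {
        if (!reverseSeekSeen) {
            // Sequential mode: always fetch a full chunk.
            return READ_REQUEST_SIZE;
        }
        if (requested < MIN_RANDOM_READ_SIZE) {
            // Small random reads are rounded up to 16 kilobytes.
            return MIN_RANDOM_READ_SIZE;
        }
        // Otherwise read what was asked for, capped at one chunk; larger
        // requests are served by repeated chunk-sized reads.
        return Math.min(requested, READ_REQUEST_SIZE);
    }
}
```

Under these assumptions, a 100 byte read after a reverse seek() would fetch 16
kilobytes, a 1 megabyte read would fetch exactly 1 megabyte, and a 10 megabyte
read would begin with a 4 megabyte fetch.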
This change improves the performance of FSInputStream.seek() by not closing and
re-opening the stream, which for block blobs also involves a network operation
to read the blob metadata. Now NativeAzureFsInputStream.seek() checks if the
stream is seek-able and moves the read position.
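
As a rough illustration of why moving the read position is cheaper than
re-opening, the sketch below tracks the position in memory and only records
whether a reverse seek() has happened. The class and its members are
hypothetical and are not the NativeAzureFsInputStream implementation; bounds
handling in particular is an assumption.

```java
// Hypothetical sketch: seek() as a position update instead of close/re-open.
class SeekSketch {
    private final long length;            // blob length, fetched once at open
    private long pos = 0;                 // offset of the next byte to read
    private boolean reverseSeekSeen = false;

    SeekSketch(long length) {
        this.length = length;
    }

    void seek(long target) {
        if (target < 0 || target > length) {
            throw new IllegalArgumentException("Cannot seek to " + target);
        }
        if (target < pos) {
            // A backward move switches later reads to random-access sizing.
            reverseSeekSeen = true;
        }
        // No stream re-open and no metadata round trip: just move the cursor.
        pos = target;
    }

    long getPos() {
        return pos;
    }

    boolean isRandomAccess() {
        return reverseSeekSeen;
    }
}
```

In this sketch a forward seek() leaves the stream in sequential mode, while the
first backward seek() flips it to random-access mode for subsequent reads.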
[^attachment-name.zip]
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)