HDFS File API should be extended to include positional read
-----------------------------------------------------------
Key: HADOOP-519
URL: http://issues.apache.org/jira/browse/HADOOP-519
Project: Hadoop
Issue Type: New Feature
Components: dfs
Affects Versions: 0.6.0
Environment: All
Reporter: Milind Bhandarkar
Assigned To: Milind Bhandarkar
Fix For: 0.7.0
HDFS input streams should support positional read. A positional read (such as
the pread syscall on Linux) reads from a specified offset without affecting
the current file offset. Since the stream's current position is never touched,
pread can be used efficiently in multi-threaded programs.
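To illustrate the semantics only (this is not HDFS code), here is a minimal
sketch using the JDK's FileChannel.read(ByteBuffer, long), which reads at an
absolute position and leaves the channel's own position alone. The class name
PreadSketch, the helpers readAt and reader, and the temp file are hypothetical
and exist only for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class PreadSketch {
    /** Reads up to length bytes starting at offset; the channel position is not moved. */
    static String readAt(FileChannel ch, long offset, int length) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(length);
        while (buf.hasRemaining()) {
            // Absolute read: the position argument replaces the channel's own cursor.
            int n = ch.read(buf, offset + buf.position());
            if (n < 0) {
                break; // end of file reached before the buffer was filled
            }
        }
        buf.flip();
        return StandardCharsets.US_ASCII.decode(buf).toString();
    }

    /** Builds a thread that performs one positional read on the shared channel. */
    static Thread reader(FileChannel ch, long offset, int length) {
        return new Thread(() -> {
            try {
                System.out.println("offset " + offset + ": " + readAt(ch, offset, length));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("pread-sketch", ".txt");
        Files.write(tmp, "0123456789abcdef".getBytes(StandardCharsets.US_ASCII));
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            // Two threads share one channel and need no locking or seek/restore.
            Thread t1 = reader(ch, 0, 4);
            Thread t2 = reader(ch, 8, 4);
            t1.start();
            t2.start();
            t1.join();
            t2.join();
            System.out.println("shared position: " + ch.position());
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The two reads may print in either order, but the shared position is still 0
afterwards, which is the property the proposed HDFS API is meant to offer.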
Here is how I plan to implement it.
Provide a PositionedReadable interface with the following methods:
    int read(long position, byte[] buffer, int offset, int length);
    void readFully(long position, byte[] buffer, int offset, int length);
    void readFully(long position, byte[] buffer);
The abstract class FSInputStream would provide a default implementation of the
above methods using the getPos(), seek() and read() methods. The default
implementation is inefficient in multi-threaded programs because it locks the
object while seeking, reading, and restoring the old position.
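A rough sketch of what such a default implementation could look like follows.
It is not the forthcoming patch. SeekableStreamSketch is a hypothetical
stand-in for FSInputStream reduced to getPos(), seek() and read(),
ByteArrayStreamSketch is an in-memory stream used only to exercise it, and the
"throws IOException" clauses are an assumption, since the signatures above do
not show them.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Interface as listed in the issue; the throws clauses are assumed. */
interface PositionedReadable {
    int read(long position, byte[] buffer, int offset, int length) throws IOException;
    void readFully(long position, byte[] buffer, int offset, int length) throws IOException;
    void readFully(long position, byte[] buffer) throws IOException;
}

/** Hypothetical stand-in for FSInputStream. */
abstract class SeekableStreamSketch implements PositionedReadable {
    abstract long getPos() throws IOException;
    abstract void seek(long pos) throws IOException;
    abstract int read(byte[] buffer, int offset, int length) throws IOException;

    @Override
    public int read(long position, byte[] buffer, int offset, int length) throws IOException {
        // The lock covers save, seek, read and restore, so concurrent callers serialize here.
        synchronized (this) {
            long oldPos = getPos();
            try {
                seek(position);
                return read(buffer, offset, length);
            } finally {
                seek(oldPos); // restore the caller-visible position
            }
        }
    }

    @Override
    public void readFully(long position, byte[] buffer, int offset, int length) throws IOException {
        int done = 0;
        while (done < length) {
            int n = read(position + done, buffer, offset + done, length - done);
            if (n < 0) {
                throw new EOFException("End of stream reached before reading fully");
            }
            done += n;
        }
    }

    @Override
    public void readFully(long position, byte[] buffer) throws IOException {
        readFully(position, buffer, 0, buffer.length);
    }
}

/** In-memory stream, used only to try the sketch out. */
class ByteArrayStreamSketch extends SeekableStreamSketch {
    private final byte[] data;
    private int pos;

    ByteArrayStreamSketch(byte[] data) { this.data = data; }

    @Override long getPos() { return pos; }
    @Override void seek(long p) { pos = (int) p; }

    @Override
    int read(byte[] b, int off, int len) {
        if (pos >= data.length) {
            return -1;
        }
        int n = Math.min(len, data.length - pos);
        System.arraycopy(data, pos, b, off, n);
        pos += n;
        return n;
    }
}

class DefaultPreadDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = "hello positional read".getBytes(StandardCharsets.US_ASCII);
        ByteArrayStreamSketch in = new ByteArrayStreamSketch(data);
        in.seek(3);
        byte[] buf = new byte[10];
        in.readFully(6, buf);
        System.out.println(new String(buf, StandardCharsets.US_ASCII) + " / pos=" + in.getPos());
    }
}
```

The demo reads "positional" from offset 6 while getPos() still reports 3.
Every positional read holds the object's lock for the whole
save/seek/read/restore sequence, which is the cost that an implementation
without shared position state can avoid.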
DFSClient.DFSInputStream, which extends FSInputStream, will provide an efficient
non-synchronized implementation of the above calls.
In addition, FSDataInputStream, which is a wrapper around FSInputStream, will
provide wrapper methods for the above read methods as well.
Patch forthcoming early next week.