[ 
http://issues.apache.org/jira/browse/HADOOP-519?page=comments#action_12433910 ] 
            
Doug Cutting commented on HADOOP-519:
-------------------------------------

So, to be clear, you will modify both FSInputStream and FSDataInputStream to 
implement the new PositionedReadable interface, right?  (FSDataInputStream 
should also implement Seekable: it already has the methods, but does not yet 
declare the interface.)

Only the primitive FSInputStream.read(long,byte[],int,int) method needs to be 
synchronized.  The others can be unsynchronized and implemented only in base 
classes, inherited by optimized subclasses.

An optimized implementation of read(long,byte[],int,int) can be provided in 
both DFSInputStream and LocalFSInputStream (the latter using nio's 
FileChannel.read(ByteBuffer,long)).  It might be simpler if the 
PositionedReadable API were instead read(ByteBuffer, long), so that the client 
can manage ByteBuffer allocation.
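
To make the local-file case concrete, here is a minimal sketch of a positioned 
read on top of FileChannel.read(ByteBuffer,long).  The class name 
LocalPositionedReader and its helper channelPosition() are hypothetical, not 
part of any patch, and it uses the newer java.nio.file API only to keep the 
example short.  It shows both the byte[] form and the ByteBuffer form of the 
call.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical stand-in for the local-file stream; illustrative only.
final class LocalPositionedReader implements AutoCloseable {
  private final FileChannel channel;

  LocalPositionedReader(Path path) throws IOException {
    channel = FileChannel.open(path, StandardOpenOption.READ);
  }

  // byte[] form: wrap the caller's array and read at an absolute position.
  // FileChannel.read(ByteBuffer,long) does not move the channel's own position.
  int read(long position, byte[] buffer, int offset, int length) throws IOException {
    return channel.read(ByteBuffer.wrap(buffer, offset, length), position);
  }

  // ByteBuffer form: the caller owns buffer allocation and reuse.
  int read(ByteBuffer dst, long position) throws IOException {
    return channel.read(dst, position);
  }

  // Exposed only so the example can show that the position is untouched.
  long channelPosition() throws IOException {
    return channel.position();
  }

  @Override
  public void close() throws IOException {
    channel.close();
  }
}
```

Like any read, either form may return fewer bytes than requested, or -1 at end 
of file, so a readFully would still loop over it.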

Finally, we should change implementations of read(byte[],int,int) and 
seek(long) to be synchronized.  This won't hurt, since they're not currently 
thread safe, and it will make the positioned-read methods thread-safe.
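
The layering described above might look roughly like the following sketch. 
SketchFSInputStream and ByteArrayFSInputStream are hypothetical names used 
only for illustration; the interface follows the method list in the issue 
description.  Only the primitive positioned read takes the lock, and because 
seek(long) and read(byte[],int,int) are synchronized on the same object, the 
seek/read/restore sequence cannot interleave with ordinary reads.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Method list taken from the issue description.
interface PositionedReadable {
  int read(long position, byte[] buffer, int offset, int length) throws IOException;
  void readFully(long position, byte[] buffer, int offset, int length) throws IOException;
  void readFully(long position, byte[] buffer) throws IOException;
}

// Hypothetical base class; a sketch of the default behavior, not the real patch.
abstract class SketchFSInputStream extends InputStream implements PositionedReadable {
  public abstract void seek(long pos) throws IOException;
  public abstract long getPos() throws IOException;

  // The one primitive: holds the lock while it seeks, reads, and restores.
  public synchronized int read(long position, byte[] buffer, int offset, int length)
      throws IOException {
    long oldPos = getPos();
    try {
      seek(position);
      return read(buffer, offset, length);
    } finally {
      seek(oldPos);
    }
  }

  // Unsynchronized: built only on the primitive, so subclasses that override
  // the primitive get an optimized readFully for free.
  public void readFully(long position, byte[] buffer, int offset, int length)
      throws IOException {
    int done = 0;
    while (done < length) {
      int n = read(position + done, buffer, offset + done, length - done);
      if (n < 0) {
        throw new EOFException("end of stream before buffer was filled");
      }
      done += n;
    }
  }

  public void readFully(long position, byte[] buffer) throws IOException {
    readFully(position, buffer, 0, buffer.length);
  }
}

// Hypothetical in-memory subclass, with synchronized seek and read.
class ByteArrayFSInputStream extends SketchFSInputStream {
  private final byte[] data;
  private int pos;

  ByteArrayFSInputStream(byte[] data) {
    this.data = data;
  }

  public synchronized void seek(long p) throws IOException {
    if (p < 0 || p > data.length) {
      throw new IOException("seek out of range: " + p);
    }
    pos = (int) p;
  }

  public synchronized long getPos() {
    return pos;
  }

  public synchronized int read() {
    return pos < data.length ? (data[pos++] & 0xff) : -1;
  }

  public synchronized int read(byte[] b, int off, int len) {
    if (len == 0) return 0;
    if (pos >= data.length) return -1;
    int n = Math.min(len, data.length - pos);
    System.arraycopy(data, pos, b, off, n);
    pos += n;
    return n;
  }
}
```

An optimized subclass would override only read(long,byte[],int,int), without 
touching the stream position, and inherit both readFully methods unchanged.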

> HDFS File API should be extended to include positional read
> -----------------------------------------------------------
>
>                 Key: HADOOP-519
>                 URL: http://issues.apache.org/jira/browse/HADOOP-519
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>    Affects Versions: 0.6.0
>         Environment: All
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.7.0
>
>
> HDFS input streams should support positional read. Positional read (such as 
> the pread syscall on Linux) allows reading from a specified offset without 
> affecting the current file offset. Since the underlying file state is not 
> touched, pread can be used efficiently in multi-threaded programs.
> Here is how I plan to implement it.
> Provide PositionedReadable interface, with the following methods:
> int read(long position, byte[] buffer, int offset, int length);
> void readFully(long position, byte[] buffer, int offset, int length);
> void readFully(long position, byte[] buffer);
> Abstract class FSInputStream would provide default implementation of the 
> above methods using getPos(), seek() and read() methods. The default 
> implementation is inefficient in multi-threaded programs since it locks the 
> object while seeking, reading, and restoring to old state.
> DFSClient.DFSInputStream, which extends FSInputStream, will provide an 
> efficient non-synchronized implementation for the above calls.
> In addition, FSDataInputStream, which is a wrapper around FSInputStream, will 
> provide wrapper methods for above read methods as well.
> Patch forthcoming early next week.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
