[ http://issues.apache.org/jira/browse/HADOOP-519?page=comments#action_12433929 ]

Milind Bhandarkar commented on HADOOP-519:
------------------------------------------

>>So, to be clear, you will modify both FSInputStream and FSDataInputStream to
>>implement the new PositionedReadable interface, right?

Yes.

>>Only the primitive FSInputStream.read(long,byte[],int,int) method needs to be
>>synchronized. The others can be unsynchronized and implemented only in base
>>classes, inherited by optimized subclasses.

That's right.

>>An optimized implementation of read(long,byte[],int,int) can be provided in
>>both DFSInputStream and LocalFSInputStream (the latter using nio's
>>FileChannel.read(ByteBuffer,long)). It might be simpler if the
>>PositionedReadable API were instead read(ByteBuffer, long), so that the
>>client can manage ByteBuffer allocation.

I was going to have LocalFSInputStream use the default synchronized implementation. I will study the solution you have suggested and see whether it offers any tangible benefits.

>>Finally, we should change implementations of read(byte[],int,int) and
>>seek(long) to be synchronized. This won't hurt, since they're not currently
>>thread safe, and it will make the positioned-read methods thread-safe.

Yes.

> HDFS File API should be extended to include positional read
> -----------------------------------------------------------
>
>                 Key: HADOOP-519
>                 URL: http://issues.apache.org/jira/browse/HADOOP-519
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>    Affects Versions: 0.6.0
>         Environment: All
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.7.0
>
>
> HDFS Input streams should support positional read. Positional read (such as
> the pread syscall on Linux) allows reading from a specified offset without
> affecting the current file offset. Since the underlying file state is not
> touched, pread can be used efficiently in multi-threaded programs.
> Here is how I plan to implement it.
> Provide a PositionedReadable interface, with the following methods:
>   int read(long position, byte[] buffer, int offset, int length);
>   void readFully(long position, byte[] buffer, int offset, int length);
>   void readFully(long position, byte[] buffer);
> Abstract class FSInputStream would provide a default implementation of the
> above methods using the getPos(), seek() and read() methods. The default
> implementation is inefficient in multi-threaded programs since it locks the
> object while seeking, reading, and restoring the old state.
> DFSClient.DFSInputStream, which extends FSInputStream, will provide an
> efficient non-synchronized implementation for the above calls.
> In addition, FSDataInputStream, which is a wrapper around FSInputStream, will
> provide wrapper methods for the above read methods as well.
> Patch forthcoming early next week.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
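
For illustration only, here is a rough sketch of how the default implementation described in the issue could look: a synchronized primitive read(long,byte[],int,int) built from getPos(), seek() and read(), with the two readFully variants layered on top without extra locking. This is not the forthcoming patch. SketchInputStream and ByteArraySketchStream are hypothetical stand-ins (not Hadoop classes), and the exception signatures are assumptions.

```java
import java.io.EOFException;
import java.io.IOException;

// The three methods listed in the issue; "throws IOException" is an assumption.
interface PositionedReadable {
    int read(long position, byte[] buffer, int offset, int length) throws IOException;
    void readFully(long position, byte[] buffer, int offset, int length) throws IOException;
    void readFully(long position, byte[] buffer) throws IOException;
}

// Hypothetical stand-in for FSInputStream: only the pieces needed for the sketch.
abstract class SketchInputStream implements PositionedReadable {
    abstract long getPos() throws IOException;
    abstract void seek(long pos) throws IOException;
    abstract int read(byte[] b, int off, int len) throws IOException;

    // Primitive positioned read: holds the object's lock while it seeks,
    // reads, and restores the old offset, so concurrent callers serialize.
    public synchronized int read(long position, byte[] buffer, int offset, int length)
            throws IOException {
        long oldPos = getPos();
        try {
            seek(position);
            return read(buffer, offset, length);
        } finally {
            seek(oldPos); // leave the stream's own offset untouched
        }
    }

    // Unsynchronized: loops over the primitive until 'length' bytes arrive.
    public void readFully(long position, byte[] buffer, int offset, int length)
            throws IOException {
        int done = 0;
        while (done < length) {
            int n = read(position + done, buffer, offset + done, length - done);
            if (n < 0) {
                throw new EOFException("End of stream reached before reading fully");
            }
            done += n;
        }
    }

    public void readFully(long position, byte[] buffer) throws IOException {
        readFully(position, buffer, 0, buffer.length);
    }
}

// Hypothetical in-memory stream, used only to exercise the default methods.
class ByteArraySketchStream extends SketchInputStream {
    private final byte[] data;
    private int pos;

    ByteArraySketchStream(byte[] data) {
        this.data = data;
    }

    synchronized long getPos() {
        return pos;
    }

    synchronized void seek(long p) throws IOException {
        if (p < 0 || p > data.length) {
            throw new IOException("Seek out of range: " + p);
        }
        pos = (int) p;
    }

    synchronized int read(byte[] b, int off, int len) {
        if (pos >= data.length) {
            return -1; // end of stream
        }
        int n = Math.min(len, data.length - pos);
        System.arraycopy(data, pos, b, off, n);
        pos += n;
        return n;
    }
}
```

With this shape, a subclass such as DFSInputStream would only need to override the primitive read(long,byte[],int,int) with a non-locking version; the readFully methods would be inherited unchanged.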
