This kind of partial read is how the OS returns control to your
application as soon as possible when reading more data would block, so that
you can start computing on whatever has already arrived. Not every
application can use that, but when you can work on partial data it lets the
OS overlap IO with your computation and improves throughput. I think
FSDataInputStream stops at the block boundary for the same reason.
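
Either way, the fix on your side is what Raghu describes below: loop until
you have all the bytes you asked for, or use readFully(). Here is a rough,
untested sketch along the lines of your snippet (the class name is just for
illustration):

  import java.io.IOException;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class ReadLoopExample {
    public static void main(String[] args) throws IOException {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      FSDataInputStream fin = fs.open(new Path(args[0]));
      fin.seek(64L * 1024 * 1024 - 10);

      byte[] buffer = new byte[32 * 1024];
      int total = 0;
      // read() may return fewer bytes than requested (e.g. at a block
      // boundary), so keep reading until the buffer is full or EOF.
      while (total < buffer.length) {
        int n = fin.read(buffer, total, buffer.length - total);
        if (n < 0) {
          break; // hit end of file before filling the buffer
        }
        total += n;
      }
      System.out.println(total);
      fin.close();
    }
  }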

On Sun, Jun 28, 2009 at 11:16 AM, Raghu Angadi <rang...@yahoo-inc.com> wrote:

>
> This seems to be the case. I don't think there is any specific reason not
> to read across the block boundary...
>
> Even if HDFS does read across the blocks, it is still not a good idea to
> ignore the JavaDoc for read(). If you want all the bytes read, then you
> should have a while loop or use one of the readFully() variants. For
> example, if you later change your code by wrapping a BufferedInputStream
> around 'in', you would still get partial reads even if HDFS reads all the data.
>
> Raghu.
>
>
> forbbs forbbs wrote:
>
>> The hadoop version is 0.19.0.
>> My file is larger than 64MB, and the block size is 64MB.
>>
>> The output of the code below is '10'. Can I read across the block
>> boundary, or should I use 'while (left..){}' style code?
>>
>>  public static void main(String[] args) throws IOException
>>  {
>>    Configuration conf = new Configuration();
>>    FileSystem fs = FileSystem.get(conf);
>>    FSDataInputStream fin = fs.open(new Path(args[0]));
>>
>>    fin.seek(64*1024*1024 - 10);
>>    byte[] buffer = new byte[32*1024];
>>    int len = fin.read(buffer);
>>    //int len = fin.read(buffer, 0, 128);
>>    System.out.println(len);
>>
>>    fin.close();
>>  }
>>
>
>
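
P.S. For completeness: the readFully() variants Raghu mentions do that retry
loop internally and throw EOFException if the stream ends first. An untested
sketch of both forms:

  byte[] buffer = new byte[32 * 1024];
  fin.readFully(buffer);                          // inherited from DataInputStream
  // or the positioned variant, which leaves the stream's own offset alone:
  fin.readFully(64L * 1024 * 1024 - 10, buffer);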
