On Sun, Jun 28, 2009 at 3:01 PM, Matei Zaharia <ma...@cloudera.com> wrote:
> This kind of partial read is often used by the OS to return to your
> application as soon as possible if trying to read more data would block,
> in case you can begin computing on the partial data. In some applications
> it's not useful, but when you can begin computing on partial data, it
> allows the OS to overlap IO with your computation, improving throughput.
> I think FSDataInputStream returns at the block boundary for the same
> reason.

It is very unusual, nay, unexpected to the point of bizarre, for the OS to
do so on a regular file. Typically only seen on network fds.

> On Sun, Jun 28, 2009 at 11:16 AM, Raghu Angadi <rang...@yahoo-inc.com> wrote:
>
> > This seems to be the case. I don't think there is any specific reason
> > not to read across the block boundary...
> >
> > Even if HDFS does read across the blocks, it is still not a good idea
> > to ignore the JavaDoc for read(). If you want all the bytes read, then
> > you should have a while loop or use one of the readFully() variants.
> > For example, if you later change your code by wrapping a
> > BufferedInputStream around 'in', you would still get partial reads even
> > if HDFS reads all the data.
> >
> > Raghu.
> >
> > forbbs forbbs wrote:
> >
> >> The hadoop version is 0.19.0.
> >> My file is larger than 64MB, and the block size is 64MB.
> >>
> >> The output of the code below is '10'. May I read across the block
> >> boundary? Or should I use 'while (left..){}' style code?
> >>
> >> public static void main(String[] args) throws IOException
> >> {
> >>     Configuration conf = new Configuration();
> >>     FileSystem fs = FileSystem.get(conf);
> >>     FSDataInputStream fin = fs.open(new Path(args[0]));
> >>
> >>     fin.seek(64*1024*1024 - 10);
> >>     byte[] buffer = new byte[32*1024];
> >>     int len = fin.read(buffer);
> >>     //int len = fin.read(buffer, 0, 128);
> >>     System.out.println(len);
> >>
> >>     fin.close();
> >> }
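For reference, a minimal sketch of the while-loop / readFully() approach
Raghu describes, written against the same 0.19-era FileSystem API as the
code above (the class name ReadAcrossBlock is just for illustration):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadAcrossBlock
{
    public static void main(String[] args) throws IOException
    {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        FSDataInputStream fin = fs.open(new Path(args[0]));

        fin.seek(64*1024*1024 - 10);
        byte[] buffer = new byte[32*1024];

        // Loop until the buffer is full or EOF is hit: a single read()
        // may legally return fewer bytes than requested.
        int total = 0;
        while (total < buffer.length)
        {
            int n = fin.read(buffer, total, buffer.length - total);
            if (n == -1)
                break;      // end of file
            total += n;
        }
        System.out.println(total);

        // Alternatively, readFully() (inherited from DataInputStream)
        // fills the whole buffer, but throws EOFException if the stream
        // ends before buffer.length bytes are available:
        // fin.readFully(buffer, 0, buffer.length);

        fin.close();
    }
}

With the loop, the reads continue past the 64MB block boundary; the point
of Raghu's advice is simply that a single read() call is never guaranteed
to fill the buffer.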