This seems to be the case. I don't think there is any specific reason
not to read across the block boundary...
Even if HDFS does read across the blocks, it is still not a good idea to
ignore the JavaDoc for read(). If you want all the bytes read, you
should use a while loop or one of the readFully() variants. For example,
if you later change your code by wrapping a BufferedInputStream around
'in', you could still get partial reads even if HDFS reads all the data.
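For illustration, the read-until-full loop could look something like
this (a minimal sketch against plain java.io.InputStream; the helper
name below is hypothetical, not part of the Hadoop API):

import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: keep calling read() until 'len' bytes have
// arrived or the stream ends. Returns the number of bytes actually read.
static int readUntilFull(InputStream in, byte[] buf, int off, int len)
        throws IOException {
    int total = 0;
    while (total < len) {
        int n = in.read(buf, off + total, len - total);
        if (n < 0) {        // end of stream before 'len' bytes
            break;
        }
        total += n;
    }
    return total;
}

Alternatively, since FSDataInputStream extends DataInputStream,
fin.readFully(buffer, 0, n) does the same loop for you, but it throws
EOFException if the stream ends before n bytes have been read.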
Raghu.
forbbs forbbs wrote:
The Hadoop version is 0.19.0.
My file is larger than 64MB, and the block size is 64MB.
The output of the code below is '10'. May I read across the block
boundary? Or should I use 'while (left..){}' style code?
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public static void main(String[] args) throws IOException
{
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    FSDataInputStream fin = fs.open(new Path(args[0]));
    // Seek to 10 bytes before the 64MB block boundary.
    fin.seek(64*1024*1024 - 10);
    byte[] buffer = new byte[32*1024];
    // A single read() may return fewer bytes than the buffer size.
    int len = fin.read(buffer);
    //int len = fin.read(buffer, 0, 128);
    System.out.println(len);
    fin.close();
}