2011/9/7 kang hua <kanghua...@msn.com>:
>
> Hi friends:
> I have two questions.
>
> The first one is:
> I use libhdfs's hflush to flush my data to a file; in the same process
> context I can read it. But I find that the file appears unchanged if I check
> from the hadoop shell ---- its length is zero (checked by "hadoop fs -ls xxx"
> or by reading it in a program). However, when I restart HDFS, I can read the
> file content that I flushed. Why?
If we were to update the file metadata on hflush, it would be very expensive,
since the metadata lives in the NameNode. If you do "hadoop fs -cat xxx", you
should see the entirety of the flushed data.

> Can I hflush data to a file without closing it, and at the same time read the
> flushed data from another process?

Yes.

> The second one is:
> Once an HDFS file is closed, is the last written block untouched? If I open
> that file in append mode, will the namenode allocate a new block for the
> appended data?

No, it reopens the last block of the existing file for append.

> I find that if I close a file and reopen it in append mode again and again,
> the hdfs report shows "used space much more than the file's logical size".

Not sure I follow what you mean by this. Can you give more detail?

> btw: I use cloudera ch2

The actual "append()" function has some bugs in all of the 0.20 releases,
including Cloudera's. The hflush/sync() API is fine to use, but I would
recommend against using append().

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
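[Editorial addendum] The hflush pattern discussed above can be sketched in C with libhdfs. This is a minimal illustration, not tested against a live cluster: it assumes a 0.20-era libhdfs where hdfsFlush() provides the hflush/sync semantics, a reachable "default" filesystem, and an illustrative path.

```c
/* Sketch of write + hflush with libhdfs (assumptions: 0.20-era API,
 * "default" fs URI, illustrative path). Requires hdfs.h and a running
 * HDFS cluster to actually execute. */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>   /* O_WRONLY */
#include "hdfs.h"    /* libhdfs header from the Hadoop client install */

int main(void) {
    hdfsFS fs = hdfsConnect("default", 0);
    if (!fs) { fprintf(stderr, "connect failed\n"); return 1; }

    const char *path = "/tmp/flush-demo.txt";  /* illustrative path */
    hdfsFile out = hdfsOpenFile(fs, path, O_WRONLY, 0, 0, 0);
    if (!out) { fprintf(stderr, "open failed\n"); return 1; }

    const char *msg = "hello, hdfs\n";
    hdfsWrite(fs, out, (void *)msg, strlen(msg));

    /* Push the buffered bytes to the datanodes so other readers can see
     * them. As noted above, this does NOT update the length reported by
     * "hadoop fs -ls"; NameNode metadata is only updated on close. A
     * reader such as "hadoop fs -cat" will still see the flushed data. */
    if (hdfsFlush(fs, out) != 0) {
        fprintf(stderr, "flush failed\n");
        return 1;
    }

    hdfsCloseFile(fs, out);
    hdfsDisconnect(fs);
    return 0;
}
```

After the hdfsFlush() call and before the close, "hadoop fs -cat /tmp/flush-demo.txt" from another process should print the flushed data even though "hadoop fs -ls" still reports a length of zero.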