Thanks, my friend! Please allow me to ask some more detailed questions.

1. Yes, I can use hadoop fs -tail or -cat xxx to see that file's content. But how can another process get the file's real size if the NameNode metadata has not changed? What I really want is to read the data at the tail of that file.

2. Why is it that after I reboot HDFS, "hadoop fs -ls xxx" again shows the content that I flushed?

3. In append mode, if I close the file and reopen it for append again and again, the real data size grows normally, but the NameNode reports DFS used space increasing much faster. Is that a bug?

4. In which version of HDFS is append free of bugs?
Thanks again.
kanghua

> From: t...@cloudera.com
> Date: Wed, 7 Sep 2011 14:17:10 -0700
> Subject: Re: Question about hdfs close * hflush behavior
> To: hdfs-user@hadoop.apache.org
>
> 2011/9/7 kang hua <kanghua...@msn.com>:
> >
> > Hi friends:
> > I have two questions.
> > First one:
> > I use libhdfs's hflush to flush my data to a file; in the same process
> > context I can read it. But I find the file appears unchanged if I check from
> > the hadoop shell: its length is zero (checked by "hadoop fs -ls xxx" or by
> > reading it in a program). However, when I reboot HDFS, I can read the
> > content that I flushed again. Why?
>
> If we were to update the file metadata on hflush, it would be very
> expensive, since the metadata lives in the NameNode.
>
> If you do hadoop fs -cat xxx, you should see the entirety of the flushed data.
>
> > Can I hflush data to a file without closing it, and at the same time read
> > the flushed data from another process?
>
> Yes.
>
> > Second one:
> > Once an HDFS file is closed, is the last written block left untouched? That
> > is, if I open the file in append mode, will the NameNode allocate a new
> > block for the appended data?
>
> No, it reopens the last block of the existing file for append.
>
> > I find that if I close a file and reopen it in append mode again and again,
> > the hdfs report shows "used space much more than the file's logical size".
>
> Not sure I follow what you mean by this. Can you give more detail?
>
> > btw: I use cloudera ch2
>
> The actual "append()" function has some bugs in all of the 0.20
> releases, including Cloudera's. The hflush/sync() API is fine to use,
> but I would recommend against using append().
>
> -Todd
> --
> Todd Lipcon
> Software Engineer, Cloudera
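The visibility behavior Todd describes can be sketched with the Java FileSystem API, which is what libhdfs wraps. This is an untested illustration, not a definitive program: it assumes a reachable HDFS cluster in the default Configuration, and the path name is made up. On 0.20-era releases (including CDH2) the hflush() call shown here was spelled sync() on FSDataOutputStream.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HflushVisibility {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path p = new Path("/tmp/hflush-demo.txt"); // hypothetical path

        // Write some data and flush it to the datanodes WITHOUT closing.
        FSDataOutputStream out = fs.create(p);
        out.writeBytes("flushed but not closed\n");
        out.hflush(); // spelled sync() on 0.20 / CDH2

        // A reader (this could be a separate process) can open the file
        // and see the flushed bytes, as in "hadoop fs -cat".
        FSDataInputStream in = fs.open(p);
        byte[] buf = new byte[128];
        int n = in.read(buf);
        System.out.println("reader saw " + n + " bytes");
        in.close();

        // But the length reported by the NameNode (what "hadoop fs -ls"
        // prints) is not updated by hflush; it typically still shows 0
        // until the file is closed or the block is otherwise completed.
        System.out.println("NameNode length: " + fs.getFileStatus(p).getLen());

        out.close(); // close completes the block and fixes the length
    }
}
```

In other words, to read the real tail of a file that another writer is still flushing, a reader cannot trust getFileStatus().getLen(); it has to open the file and read until the stream is exhausted, which is effectively what "hadoop fs -cat" and "-tail" do.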