Thanks, my friend! Please allow me to ask some more detailed questions.

1. Yes, I can use hadoop fs -tail or -cat xxx to see that file's content. But how can another process get the file's real size if the NameNode metadata has not changed? What I really want is to read the data at the tail of that file.

2. Why is it that after I reboot HDFS, "hadoop fs -ls xxx" again shows the content that I flushed?

3. In append mode, if I close the file and reopen it for append again and again, the real data size grows normally, but the NameNode reports DFS used space increasing much faster. Is that a bug?

4. In which version of HDFS is append free of bugs?
Thanks again.
kanghua

> From: t...@cloudera.com
> Date: Wed, 7 Sep 2011 14:17:10 -0700
> Subject: Re: Question about hdfs close * hflush behavior
> To: hdfs-user@hadoop.apache.org
>
> 2011/9/7 kang hua <kanghua...@msn.com>:
> >
> > Hi friends:
> > I have two questions.
> > First one:
> > I use libhdfs's hflush to flush my data to a file; in the same process
> > context I can read it. But I find the file appears unchanged if I check from
> > the hadoop shell: its length is zero (checked by "hadoop fs -ls xxx" or by
> > reading it in a program). However, when I reboot HDFS, I can read the
> > content that I flushed again. Why?
>
> If we were to update the file metadata on hflush, it would be very
> expensive, since the metadata lives in the NameNode.
>
> If you do hadoop fs -cat xxx, you should see the entirety of the flushed data.
>
> > Can I hflush data to a file without closing it, and at the same time read
> > the flushed data from another process?
>
> Yes.
>
> > Second one:
> > Once an HDFS file is closed, is the last written block left untouched? That
> > is, if I open the file in append mode, will the NameNode allocate a new
> > block for the appended data?
>
> No, it reopens the last block of the existing file for append.
>
> > I find that if I close a file and reopen it in append mode again and again,
> > the hdfs report shows "used space much more than the file's logical size".
>
> Not sure I follow what you mean by this. Can you give more detail?
>
> > btw: I use cloudera ch2
>
> The actual "append()" function has some bugs in all of the 0.20
> releases, including Cloudera's. The hflush/sync() API is fine to use,
> but I would recommend against using append().
>
> -Todd
> --
> Todd Lipcon
> Software Engineer, Cloudera
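The visibility behavior Todd describes can be sketched with the Java FileSystem API, which is what libhdfs wraps. This is an untested illustration, not a definitive program: it assumes a reachable HDFS cluster in the default Configuration, and the path name is made up. On 0.20-era releases (including CDH2) the hflush() call shown here was spelled sync() on FSDataOutputStream.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HflushVisibility {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path p = new Path("/tmp/hflush-demo.txt"); // hypothetical path

        // Write some data and flush it to the datanodes WITHOUT closing.
        FSDataOutputStream out = fs.create(p);
        out.writeBytes("flushed but not closed\n");
        out.hflush(); // spelled sync() on 0.20 / CDH2

        // A reader (this could be a separate process) can open the file
        // and see the flushed bytes, as in "hadoop fs -cat".
        FSDataInputStream in = fs.open(p);
        byte[] buf = new byte[128];
        int n = in.read(buf);
        System.out.println("reader saw " + n + " bytes");
        in.close();

        // But the length reported by the NameNode (what "hadoop fs -ls"
        // prints) is not updated by hflush; it typically still shows 0
        // until the file is closed or the block is otherwise completed.
        System.out.println("NameNode length: " + fs.getFileStatus(p).getLen());

        out.close(); // close completes the block and fixes the length
    }
}
```

In other words, to read the real tail of a file that another writer is still flushing, a reader cannot trust getFileStatus().getLen(); it has to open the file and read until the stream is exhausted, which is effectively what "hadoop fs -cat" and "-tail" do.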