In Hadoop 1.0 (from the 0.20-append branch), there's just a single
"sync(…)" output-stream call. It updates the NameNode metadata to
persist the data already written to the under-construction block and
flushes the open block file at the DNs, but it does *not* fsync the
file descriptor at the OS level
(http://linux.die.net/man/2/fsync).

In Hadoop 2.0 (what is 0.23.x today), there are two APIs - hflush and
hsync. The former is akin to the old sync(…) call above, while the
latter is designed to go one step further and invoke the fsync
syscall (http://linux.die.net/man/2/fsync) to ensure the data is
really persisted to disk. However, as of 0.23.2 at least, hsync()
isn't completely implemented and just calls hflush() instead, so the
behavior is currently the same.
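To make the distinction concrete at the local-filesystem level (this is an analogy, not the HDFS API itself - class and file names here are made up for illustration): a plain write leaves bytes in the OS page cache, visible to other readers but not crash-safe, while FileChannel.force(true) issues the fsync(2) that the completed hsync() is meant to provide.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class FlushVsSync {
    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("flush-vs-sync", ".txt");
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap("hello\n".getBytes(StandardCharsets.UTF_8)));
            // At this point the bytes sit in the OS page cache: other
            // processes can read them (roughly hflush-like visibility),
            // but a power failure could still lose them.
            ch.force(true); // fsync(2): push data and metadata to stable
                            // storage -- the durability hsync() aims for.
        }
        System.out.println(Files.readAllLines(path).get(0));
        Files.delete(path);
    }
}
```

In HDFS terms: use hflush when you only need new readers to see the data, and hsync (once HDFS-744 lands) when you need it to survive a cluster-wide power loss.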

See https://issues.apache.org/jira/browse/HDFS-265 for all the
discussion around this change between 1.0 and 2.0.

Also see the javadocs for both:
http://hadoop.apache.org/common/docs/r0.23.1/api/org/apache/hadoop/fs/FSDataOutputStream.html#hflush()
and 
http://hadoop.apache.org/common/docs/r0.23.1/api/org/apache/hadoop/fs/FSDataOutputStream.html#hsync()

Ticket https://issues.apache.org/jira/browse/HDFS-744 tracks
completion of the hsync() feature.

On Thu, Apr 12, 2012 at 3:59 PM, Inder Pall <inder.p...@gmail.com> wrote:
> Folks,
>
> Can some one shed out more technical details than what the javadoc talks
> about.
> Also, which one should be used when?
>
> --
> Thanks,
> - Inder
>   Tech Platforms @Inmobi
>   Linkedin - http://goo.gl/eR4Ub



-- 
Harsh J
