Thanks for correcting me on the syncFs call, Luke. I seem to have missed that method when searching the branch-1 code.
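For anyone who finds this thread in the archives, here is a rough sketch of the approach on 1.0: append records, call syncFs() every N records, and roll to a new file once it crosses a size threshold. The path, key/value types, and thresholds below are placeholders, not recommendations.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class ClickLogWriter {
  // Placeholder thresholds; tune both to your durability/file-count needs.
  private static final long ROLL_BYTES = 64L * 1024 * 1024; // roll at ~64 MB
  private static final int SYNC_EVERY = 1000;               // flush every 1000 records

  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    SequenceFile.Writer writer = null;
    long fileIndex = 0;
    int sinceSync = 0;

    for (long i = 0; i < 1000000; i++) { // stand-in for the incoming click events
      if (writer == null) {
        Path path = new Path("/clicks/clicks-" + fileIndex + ".seq"); // placeholder path
        writer = SequenceFile.createWriter(fs, conf, path,
            LongWritable.class, Text.class);
      }

      writer.append(new LongWritable(i), new Text("click-event-" + i));

      // Periodically push buffered bytes to the datanodes so a writer
      // crash loses at most the last SYNC_EVERY records.
      if (++sinceSync >= SYNC_EVERY) {
        writer.syncFs();
        sinceSync = 0;
      }

      // Once the file passes the size threshold, close it cleanly and
      // start a new one; closed files are safe regardless of later crashes.
      if (writer.getLength() >= ROLL_BYTES) {
        writer.close();
        writer = null;
        fileIndex++;
      }
    }
    if (writer != null) {
      writer.close();
    }
  }
}

Keep in mind that syncFs() gives you hflush semantics, as Luke notes below, so the tail of the current file can still be lost in a datacenter-wide power outage; size your sync interval around how many records you can afford to replay.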
On Thu, May 31, 2012 at 6:54 AM, Luke Lu <l...@apache.org> wrote:
>
> SequenceFile.Writer#syncFs is in Hadoop 1.0.0 (actually since
> 0.20.205), which calls the underlying FSDataOutputStream#sync, which is
> actually hflush semantically (data is not durable in the case of a
> datacenter-wide power outage). The hsync implementation is not yet in
> 2.0; HDFS-744 just brought hsync into trunk.
>
> __Luke
>
> On Fri, May 25, 2012 at 9:30 AM, Harsh J <ha...@cloudera.com> wrote:
> > Mohit,
> >
> > Not if you call sync (or hflush/hsync in 2.0) periodically to persist
> > your changes to the file. SequenceFile doesn't currently have a sync
> > API built into it (in 1.0 at least), but you can call sync on the
> > underlying output stream instead for the moment. This is possible to
> > do in 1.0 (just own the output stream).
> >
> > Your use case also sounds like you may simply want to use Apache Flume
> > (Incubating) [http://incubator.apache.org/flume/], which already
> > provides these features and the WAL-like reliability you seek.
> >
> > On Fri, May 25, 2012 at 8:24 PM, Mohit Anchlia <mohitanch...@gmail.com>
> > wrote:
> >> We get click data through API calls. I now need to send this data to
> >> our Hadoop environment. I am wondering if I could open one sequence
> >> file and write to it until it's of a certain size. Once it's over the
> >> specified size I can close that file and open a new one. Is this a
> >> good approach?
> >>
> >> The only thing I worry about is what happens if the server crashes
> >> before I am able to cleanly close the file. Would I lose all the
> >> previous data?
> >
> > --
> > Harsh J

--
Harsh J