Technically, yes, you can run MR jobs on non-closed files (the reader behaves the same way as your -cat does), but you will only be able to read up to the last complete block, or up to the point where sync() was last called on the output stream.
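The visibility rule above can be sketched with plain local files as an analogy (this is not the HDFS API, just an illustration): a reader that opens a file mid-write only sees bytes the writer has already flushed, much like an MR reader on a non-closed HDFS file only sees data up to the last complete block or the last sync().

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "stream.log")

# Buffered writer, standing in for an open HDFS output stream.
writer = open(path, "w", buffering=4096)
writer.write("record-1\n")  # sits in the writer's buffer, not yet visible

# A concurrent reader sees nothing yet: the data has not been flushed.
with open(path) as reader:
    assert reader.read() == ""

# flush() plays the role of sync(): it publishes a consistency point.
writer.flush()

with open(path) as reader:
    assert reader.read() == "record-1\n"

writer.close()
```

In HDFS the same idea applies at the sync-point granularity rather than the OS buffer, which is why calling sync() at record boundaries matters for concurrent readers.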
It is better if your file writer uses the sync() API judiciously, marking sync points after a reasonable number of records, so that your MR readers in tasks read up to whole-record boundaries rather than arbitrary block boundaries. For a description of the sync() API, see the section 'Coherency Model' in Tom White's "Hadoop: The Definitive Guide" (O'Reilly), page 68.

On Sun, Mar 4, 2012 at 8:07 PM, Niv Mizrahi <n...@taykey.com> wrote:
> hi all,
>
> we are looking for a way to map-reduce on non-closed files.
> we are currently able to run
> hadoop fs -cat <non-closed-file>
>
> non-closed files - files that are currently being written, and have not been
> closed yet.
>
> is there any way to run map-reduce on non-closed files?
>
> 10x in advance for any answer
> --
> Niv Mizrahi
> Taykey | www.taykey.com

--
Harsh J