Technically, yes, you can run MR jobs on non-closed files (the job will
run the reader the same way your -cat does), but you would only be able
to read up to the last complete block, or up to the point at which
sync() was called on the output stream.

It is better if your file writer uses the sync() API judiciously to
mark sync points after a considerable number of records, so that the
MR readers in your tasks read up to whole-record boundaries rather
than stopping at arbitrary block boundaries.

For a description of the sync() API, read the section 'Coherency Model' in
Tom White's "Hadoop: The Definitive Guide" (O'Reilly), page 68.

On Sun, Mar 4, 2012 at 8:07 PM, Niv Mizrahi <n...@taykey.com> wrote:
> hi all,
>
>  we are looking for a way to run map-reduce on non-closed files.
>  we are currently able to run
> hadoop fs -cat <non-closed-file>
>
> non-closed files - files that are currently being written and have not been
> closed yet.
>
> is there any way to run map-reduce on non-closed files?
>
>
> 10x in advance for any answer
> --
> Niv Mizrahi
> Taykey | www.taykey.com
>



-- 
Harsh J
