I don't think binary input works with streaming, because streaming assumes one record per line and raw binary data can contain a newline byte anywhere.
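
To make the problem concrete, here is a rough Python sketch (not something from the streaming code itself). It packs three floats as raw 32-bit values, shows that splitting on newlines cuts the record in two, and then shows one possible workaround: hex-encoding each record so that it is newline-free text. The helper names (pack_floats, encode_record, decode_record) and the choice of little-endian float32 are my own assumptions for illustration.

```python
import struct


def pack_floats(values):
    """Pack floats as raw little-endian 32-bit binary (assumed layout)."""
    return struct.pack("<%df" % len(values), *values)


def encode_record(values):
    """Hex-encode one record so it contains no newline or tab bytes."""
    return pack_floats(values).hex()


def decode_record(line):
    """Reverse encode_record; tolerates the trailing newline of a text line."""
    raw = bytes.fromhex(line.strip())
    return list(struct.unpack("<%df" % (len(raw) // 4), raw))


# A float32 whose binary form happens to contain the byte 0x0A ('\n').
tricky = struct.unpack("<f", b"\x0a\x00\x00\x40")[0]
record = [1.5, tricky, -3.25]

# Splitting the raw bytes on newlines breaks the 12-byte record apart.
raw = pack_floats(record)
pieces = raw.split(b"\n")

# The hex form survives a line-oriented round trip unchanged.
line = encode_record(record)
restored = decode_record(line + "\n")
```

The hex form doubles the size of the data, so whether it beats plain decimal text is something you would have to measure. Decoding it in a C mapper should at least avoid full text-to-float parsing.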

If you want to script map-reduce programs, would you be open to a Groovy implementation that avoids these problems?

On 4/7/08 6:42 AM, "John Menzer" <[EMAIL PROTECTED]> wrote:

> hi,
>
> i would like to use binary input and output data in combination with hadoop
> streaming.
>
> the reason why i want to use binary data is, that parsing text to float
> seems to consume a big lot of time compared to directly reading the binary
> floats.
>
> i am using a C-coded mapper (getting streaming data from stdin and writing
> to stdout) and no reducer.
>
> so my question is: how do i implement binary input output in this context?
> as far as i understand i need to put an '\n' char at the end of each
> binary-'line'. so hadoop knows how to split/distribute the input data among
> the nodes and how to collect it for output(??)
>
> is this approach reasonable?
>
> thanks,
> john