Hi, I have a question about the <key, values> that the reducer gets in Hadoop Streaming.
I wrote simple mapper.sh and reducer.sh scripts.

mapper.sh:

    #!/bin/bash
    while read data
    do
        # tokenize the line and emit <word, 1> pairs
        echo $data | awk '{token=0; while(++token<=NF) print $token"\t1"}'
    done

reducer.sh:

    #!/bin/bash
    while read data
    do
        echo -e $data
    done

The mapper tokenizes each line of input and writes <word, 1> pairs to standard output. The reducer just echoes whatever it reads from standard input.

I have a simple input file:

    cat in the hat ate my mat the

I was expecting the final output to be something like:

    the    1 1 1
    cat    1
    ...

but instead each word gets its own line, which makes me think the reducer is being handed individual <key, value> pairs rather than a <key, values> group, which is the default in normal (Java) Hadoop, right?

    the    1
    the    1
    the    1
    cat    1

Is there any way to get <key, values> in the reducer instead of a stream of <key, value> pairs? I looked into the -reducer aggregate option, but there doesn't seem to be any way to customize what the aggregate reducer does with the <key, values> beyond built-in functions like max and min.

Thanks.

--
View this message in context: http://www.nabble.com/hadoop-streaming-reducer-values-tp23514523p23514523.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
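P.S. For the record, here is the kind of grouping I was hoping to avoid writing by hand. It is only a sketch, relying on the fact that streaming sorts the mapper output so identical keys arrive on consecutive lines; the `group_values` function name is just something I made up:

```shell
#!/bin/bash
# Sketch of a reducer that builds <key, values> itself by accumulating
# the values of consecutive lines that share the same (tab-separated) key.
group_values() {
    local prev="" vals=""
    while IFS=$'\t' read -r key val; do
        if [ "$key" = "$prev" ]; then
            # same key as the previous line: append the value
            vals="$vals $val"
        else
            # key changed: flush the previous group, start a new one
            [ -n "$prev" ] && printf '%s\t%s\n' "$prev" "$vals"
            prev="$key"
            vals="$val"
        fi
    done
    # flush the last group
    [ -n "$prev" ] && printf '%s\t%s\n' "$prev" "$vals"
}

# demo on sorted mapper output like the example above:
printf 'cat\t1\nthe\t1\nthe\t1\nthe\t1\n' | group_values
# -> cat<TAB>1
#    the<TAB>1 1 1
```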