[ https://issues.apache.org/jira/browse/HADOOP-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579027#action_12579027 ]
Hadoop QA commented on HADOOP-2806:
-----------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12377872/patch-2806.txt
against trunk revision 619744.
@author +1. The patch does not contain any @author tags.
tests included -1. The patch doesn't appear to include any new or modified
tests.
Please justify why no tests are needed for this patch.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new javac compiler
warnings.
release audit +1. The applied patch does not generate any new release
audit warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1971/testReport/
Findbugs warnings:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1971/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1971/artifact/trunk/build/test/checkstyle-errors.html
Console output:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1971/console
This message is automatically generated.
> Streaming has no way to force entire record (or null) as key
> ------------------------------------------------------------
>
> Key: HADOOP-2806
> URL: https://issues.apache.org/jira/browse/HADOOP-2806
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/streaming
> Reporter: Marco Nicosia
> Assignee: Amareshwari Sriramadasu
> Priority: Minor
> Fix For: 0.17.0
>
> Attachments: patch-2806.txt
>
>
> I think perhaps streaming needs a "-allkey" or "-nullkey" option? Otherwise,
> I'm concerned there is a subtle streaming documentation problem.
> These two docs:
> http://hadoop.apache.org/core/docs/current/streaming.html
> http://wiki.apache.org/hadoop/HadoopStreaming (Should be merged with above?)
> ... seem to ignore that streaming, by default, splits key/value on TAB. Sure,
> they mention it, but in all the simple (no separator) examples, they don't
> seem to take into account that streaming may inconsistently decide whether
> the whole line is the key, or just up to the first tab, should one occur.
> This means that some records might be sorted differently from others,
> depending on whether or not they contain a tab?
> Here's a very simple pair of examples that, to the naive reader, should
> produce the same output, but do not:
> > [hod] (marco) >> run dfs -fs local -cat str-tabs
> > a 1
> > b 3
> > a 4
> >
> > [hod] (marco) >> run dfs -put str-tabs str-tabs
> >
> > [hod] (marco) >> run jar hadoop-streaming.jar -input str-tabs -output
> > str-tabs.out -mapper /bin/cat -reducer /bin/cat
> > [blah blah blah]
> >
> > [hod] (marco) >> run dfs -cat str-tabs.out/part-00000
> > a 4
> > a 1
> > b 3
> Compare to this negative test:
> > [hod] (marco) >> run dfs -fs local -cat str-notabs
> > a 1
> > b 3
> > a 4
> >
> > [hod] (marco) >> run dfs -put str-notabs str-notabs
> >
> > [hod] (marco) >> run jar hadoop-streaming.jar -input str-notabs -output
> > str-notabs.out -mapper /bin/cat -reducer /bin/cat
> > [blah blah blah]
> >
> > [hod] (marco) >> run dfs -cat str-notabs.out/part-00000
> > a 1
> > a 4
> > b 3
> >
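The difference between the two runs quoted above can be modeled with a short Python sketch. It is not Hadoop's code; it only mimics the behavior described in the issue, where a record is split into key and value at the first TAB and a line with no TAB becomes the key in its entirety. The function names split_key_value and sorted_by_key are hypothetical, and the sample records assume that the fields in str-tabs are separated by a tab and those in str-notabs by a space.

```python
def split_key_value(line, separator="\t"):
    """Split a record at the first separator.

    If the separator does not occur, the whole line is the key and the
    value is empty, as the issue describes for streaming's default.
    """
    key, found, value = line.partition(separator)
    return key, (value if found else "")


def sorted_by_key(lines):
    # Sort on the key only. Python's sort is stable, so values under the
    # same key keep their input order here; a real job makes no such
    # promise, which is why "a 4" can precede "a 1" in the quoted output.
    return sorted(lines, key=lambda line: split_key_value(line)[0])


with_tabs = ["a\t1", "b\t3", "a\t4"]   # assumed contents of str-tabs
without_tabs = ["a 1", "b 3", "a 4"]   # assumed contents of str-notabs

for record in with_tabs + without_tabs:
    print(repr(record), "->", split_key_value(record))

print(sorted_by_key(with_tabs))
print(sorted_by_key(without_tabs))
```

With tabs, both "a" records share the key "a", so only the key order is fixed and the order of the values within that key is left open. Without tabs, each whole line is its own key, so the lines come out fully sorted.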
--
This message is automatically generated by JIRA.