[
https://issues.apache.org/jira/browse/HADOOP-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552003
]
arkady borkovsky commented on HADOOP-1327:
------------------------------------------
0. Assume that this is the only tutorial a new Hadoop user needs to read (once
she knows hdfs -ls and -cat, and knows the URL for the JobTracker)
1. In FAQ section: "Can I use UNIX pipes? For example, will -mapper "cut -f1 |
sed s/foo/bar/g" work?"
use a script for this (i.e., put the command into a file, and use that file
as the streaming command)
2. In FAQ section: "How do I process files, one per map?"
the answer does not really give a solution for streaming, but rather points to
some Java class
3. -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
does this work?
last time I tried, it was inserting unnecessary keys.
If it is not fixed yet, it should not be in the example
4. Describe the task env variables -- many of them are extremely useful
5. FAQ to answer:
how can I make sure that each input file goes to a single mapper, without
splitting?
(the answer is to set maxsplit parameter)
6. FAQ to answer:
how can I make sure that my mapper gets the input exactly as it is stored in DFS?
and
in case of compressed input, how can I make sure that my mapper gets as input
exactly what comes out of the decompression?
7. Describe the use of compression in more detail
-- for the input
-- different compression formats for output
-- if the input is a compressed representation of multiple files, how can I get
each uncompressed file to go to a separate mapper? (assuming that the files
are large enough)
8. "Field selection" and "Aggregate package", although very nice and useful, may
be put into a separate page, as they are not fundamental for Streaming
(while "secondary sort, the -partitioner" is fundamental -- I'd recommend
using "split by field 1, sort by field 2" as the default for most users).
9. It would be very convenient to have section numbers.
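To illustrate item 1, here is a rough sketch of what "put the command into a
file" could look like when the file is a Python script rather than a shell
pipeline. It is only meant to mimic "cut -f1 | sed s/foo/bar/g"; the file name
mapper.py and the function name transform are made up for this example, and the
tutorial may well prefer a plain shell script instead.

```python
#!/usr/bin/env python
# mapper.py (hypothetical name): a stand-in for "cut -f1 | sed s/foo/bar/g".
import sys


def transform(line):
    """Keep the first tab-separated field, then replace every 'foo' with 'bar'."""
    # Like cut -f1: a line without a tab is passed through whole.
    first_field = line.rstrip("\n").split("\t", 1)[0]
    # Like sed s/foo/bar/g: replace all occurrences, not just the first.
    return first_field.replace("foo", "bar")


def main():
    # Streaming feeds input records on stdin and reads results from stdout.
    for line in sys.stdin:
        sys.stdout.write(transform(line) + "\n")


if __name__ == "__main__":
    main()
```

Assuming the script is shipped with the job (for example via the -file
option), it would then be named as the streaming command with something like
-mapper mapper.py, in place of the quoted pipeline.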
> Doc on Streaming
> ----------------
>
> Key: HADOOP-1327
> URL: https://issues.apache.org/jira/browse/HADOOP-1327
> Project: Hadoop
> Issue Type: Improvement
> Components: documentation
> Reporter: Runping Qi
> Assignee: Rob Weltman
> Fix For: 0.15.2
>
> Attachments: HADOOP-1327.patch, site.xml, streaming-doc.patch,
> streaming.html, streaming.xml
>
>