[ https://issues.apache.org/jira/browse/HADOOP-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552003 ]
arkady borkovsky commented on HADOOP-1327:
------------------------------------------

0. Assume that this is the only tutorial a new Hadoop user needs to read (once she knows hdfs -ls and -cat, and knows the URLs for the Job tracker).

1. In the FAQ section: "Can I use UNIX pipes? For example, will -mapper "cut -f1 | sed s/foo/bar/g" work?"
   Use a script for this (i.e., put the command into a file, and use that file as the streaming command).

2. In the FAQ section: "How do I process files, one per map?"
   The answer does not really give a solution for streaming, but rather points to some Java class.

3. -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
   Does this work? Last time I tried, it was inserting unnecessary keys. If it is not fixed yet, it should not be in the example.

4. Describe the task env variables -- many of them are extremely useful.

5. FAQ to answer: how can I make sure that each input file goes to a single mapper, without splitting? (The answer is to set the maxsplit parameter.)

6. FAQ to answer: how can I make sure that my mapper gets the input exactly as it is stored in DFS? And in the case of compressed input, how can I make sure that my mapper gets as input exactly what comes out of the decompressing?

7. Describe the use of compression in more detail:
   -- for the input
   -- different compression formats for output
   -- if the input is a compressed representation of multiple files, how can I get each uncompressed file to go to a separate mapper? (assuming that the files are large enough)

8. "Field selection" and "Aggregate package", although very nice and useful, may be put into a separate page, as they are not fundamental for Streaming (while "secondary sort, the -partitioner" is fundamental -- I'd recommend using "split by field 1, sort by field 2" as the default for most users).

9. It would be very convenient to have section numbers.
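To illustrate point 1, here is a minimal sketch of moving the pipeline out of the -mapper string and into a file. It uses a Python script instead of a shell file, which is a substitution made for this example only; a shell script containing the original pipeline would serve the same purpose. The file name first_field.py and the function name transform are hypothetical. The script is meant to reproduce what "cut -f1 | sed s/foo/bar/g" does with tab-separated input.

```python
#!/usr/bin/env python3
"""Hypothetical streaming mapper, intended to match: cut -f1 | sed s/foo/bar/g"""
import sys


def transform(line):
    # Keep only the first tab-separated field (the cut -f1 step).
    # A line with no tab is passed through whole, as cut does.
    first_field = line.rstrip("\n").split("\t", 1)[0]
    # Replace every occurrence of "foo" with "bar" (the sed s/foo/bar/g step).
    return first_field.replace("foo", "bar")


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Streaming feeds input records on stdin, one per line,
    # and reads the mapper's output from stdout.
    for line in stdin:
        stdout.write(transform(line) + "\n")


if __name__ == "__main__":
    main()
```

Assuming the script is made executable, it would be named in -mapper and shipped with the job using the -file option; the rest of the command line depends on the installation and is not shown here.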
> Doc on Streaming
> ----------------
>
>                 Key: HADOOP-1327
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1327
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: documentation
>            Reporter: Runping Qi
>            Assignee: Rob Weltman
>             Fix For: 0.15.2
>
>         Attachments: HADOOP-1327.patch, site.xml, streaming-doc.patch, streaming.html, streaming.xml
>
>

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.