[ https://issues.apache.org/jira/browse/MAPREDUCE-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871041#action_12871041 ]
Ravi Gummadi commented on MAPREDUCE-1697:
-----------------------------------------

Latest patch looks good. +1

> Document the behavior of -file option in streaming
> --------------------------------------------------
>
>                 Key: MAPREDUCE-1697
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1697
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/streaming, documentation
>    Affects Versions: 0.20.1
>            Reporter: Amareshwari Sriramadasu
>             Fix For: 0.21.0, 0.22.0
>
>         Attachments: patch-1697-1.txt, patch-1697.txt
>
>
> The behavior of the -file option in streaming is not documented anywhere.
> The behavior of -file is as follows:
> 1) All the files passed through the -file option are packaged into job.jar.
> 2) If the -file option is used for .class or .jar files, they are unjarred on
> the tasktracker and placed in
> ${mapred.local.dir}/taskTracker/jobcache/job_ID/jars/classes or /lib,
> respectively. Symlinks to the directories classes and lib are created from
> the cwd of the task. The names of the symlinks are "classes" and "lib", so
> the file names of the .class or .jar files do not appear in the cwd of the
> task.
> Paths to these files are automatically added to the classpath. The tricky
> part is that the hadoop framework can pick up the .class or .jar files
> through the classpath, but the actual mapper script cannot. If you'd like to
> access these .class or .jar files inside a script, please do something like
> 'java -cp "lib/*:classes" <ClassName>' (on Unix-like systems classpath
> entries are separated by ":", and a directory of .class files is listed by
> name rather than with a wildcard).
> 3) If the -file option is used for files other than .class or .jar (e.g.,
> .txt or .pl), these files are unjarred into
> ${mapred.local.dir}/taskTracker/jobcache/job_ID/jars/. Symlinks to these
> files are created from the cwd of the task. The names of these symlinks are
> the file names themselves.
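
For illustration only, the placement rules in the issue description can be modeled with a small Python sketch. Nothing here is Hadoop code: file_placement and script_classpath are hypothetical helpers, the use of the base name of the -file argument is an assumption, and the classpath helper assumes a Unix-like tasktracker, where entries are separated by ":" and a directory of .class files is named directly.

```python
import posixpath

# Hypothetical helpers (not part of Hadoop) that mirror the rules in the
# issue description for files shipped with the streaming -file option.

DEFAULT_JOB_DIR = "${mapred.local.dir}/taskTracker/jobcache/job_ID"


def file_placement(path, job_dir=DEFAULT_JOB_DIR):
    """Return (unjarred location, symlink name in the task cwd)."""
    # Assumption: only the base name of the -file argument is kept.
    name = posixpath.basename(path)
    jars_dir = posixpath.join(job_dir, "jars")
    if name.endswith(".class"):
        # Rule 2: .class files go under jars/classes; the cwd gets "classes".
        return posixpath.join(jars_dir, "classes", name), "classes"
    if name.endswith(".jar"):
        # Rule 2: .jar files go under jars/lib; the cwd gets "lib".
        return posixpath.join(jars_dir, "lib", name), "lib"
    # Rule 3: any other file is symlinked under its own name.
    return posixpath.join(jars_dir, name), name


def script_classpath():
    """Classpath a mapper script could pass to `java -cp` from the task cwd."""
    # "lib/*" is the JVM wildcard for the jars in lib; "classes" is a directory.
    return ":".join(["lib/*", "classes"])


if __name__ == "__main__":
    for f in ["mapper.pl", "Helper.class", "util.jar"]:
        print(f, "->", file_placement(f))
    print("java -cp", script_classpath(), "<ClassName>")
```

With these rules, a script shipped as mapper.pl can be invoked as ./mapper.pl from the task cwd, while Helper.class and util.jar are only reachable through the classes and lib symlinks.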