[ https://issues.apache.org/jira/browse/PIG-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Warrington updated PIG-1702:
---------------------------------

    Attachment: PIG-1702-0.patch

Here is a patch that fixes the header output by reading the split information 
(the path, start offset, and length) from the FileSplit. 
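
For readers without the patch at hand, here is a minimal sketch of the idea, 
not the patch itself. It has no Hadoop dependency: SplitInfo is a hypothetical 
stand-in for the three values a FileSplit exposes through getPath(), 
getStart(), and getLength(), and DebugHeaderSketch is an illustrative name, 
not a Pig class. When no split is available it falls back to the null and -1 
values seen in the reported output.

```java
import java.util.Date;

// Hypothetical stand-in for the values a Hadoop FileSplit provides via
// getPath(), getStart(), and getLength(). Not a Pig or Hadoop class.
final class SplitInfo {
    final String path;
    final long start;
    final long length;

    SplitInfo(String path, long start, long length) {
        this.path = path;
        this.start = start;
        this.length = length;
    }
}

// Illustrative helper that builds the header text shown in the issue.
final class DebugHeaderSketch {
    static String format(String command, Date startTime, SplitInfo split) {
        // Fall back to null / -1 when there is no file-based split.
        String file = (split == null) ? null : split.path;
        long start = (split == null) ? -1L : split.start;
        long length = (split == null) ? -1L : split.length;

        StringBuilder sb = new StringBuilder();
        sb.append("===== Task Information Header =====\n");
        sb.append("Command: ").append(command).append('\n');
        sb.append("Start time: ").append(startTime).append('\n');
        sb.append("Input-split file: ").append(file).append('\n');
        sb.append("Input-split start-offset: ").append(start).append('\n');
        sb.append("Input-split length: ").append(length).append('\n');
        return sb.toString();
    }
}
```

In real code the SplitInfo values would come from the cast FileSplit of the 
current map context, after checking that the input split is in fact a 
FileSplit.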

One potential issue with this code is that it has to obtain a reference to the 
current MapContext, which it takes from PigMapReduce.sJobContext. If Pig is 
running in local mode, there may be a race condition. PIG-1831 solved a similar 
issue with the configuration. Would it be wise to use a thread-local variable 
in PigMapReduce for the context as well?
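
To make the question concrete, here is a rough sketch of what a thread-local 
holder could look like. ContextHolderSketch and its methods are hypothetical 
names, not existing Pig code, and the context is typed as Object only to keep 
the example free of Hadoop dependencies; a real version would hold the 
MapContext. The demo() method starts two threads that each store and read back 
their own value, which is the isolation that a single shared reference does 
not give.

```java
// Hypothetical holder; not part of Pig. A real version would store the
// Hadoop MapContext instead of a plain Object.
final class ContextHolderSketch {
    // Each thread sees only the value it stored itself.
    private static final ThreadLocal<Object> CONTEXT = new ThreadLocal<>();

    static void set(Object context) {
        CONTEXT.set(context);
    }

    static Object get() {
        return CONTEXT.get();
    }

    // Remove the value when the task finishes so reused threads start clean.
    static void clear() {
        CONTEXT.remove();
    }

    // Two threads each set a context and read it back; neither can observe
    // the other's value.
    static String[] demo() {
        final String[] seen = new String[2];
        Thread[] threads = new Thread[2];
        for (int i = 0; i < threads.length; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                set("context-" + id);
                Thread.yield(); // give the other thread a chance to run
                seen[id] = (String) get();
                clear();
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return seen;
    }
}
```

One trade-off with this approach is that any code running on a different 
thread than the one that called set() gets null from get(), so the value has 
to be set on the thread that later reads it.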

> Streaming debug output outputs null input-split information
> -----------------------------------------------------------
>
>                 Key: PIG-1702
>                 URL: https://issues.apache.org/jira/browse/PIG-1702
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Adam Warrington
>            Priority: Minor
>         Attachments: PIG-1702-0.patch
>
>
> Within the Pig streaming command execution, debug information is printed 
> to stderr that specifies the input file as well as the split information. 
> The function is 
> org.apache.pig.backend.hadoop.streaming.HadoopExecutableManager.writeDebugHeader().
> Pig 0.7 outputs null for the split file, and -1 for the split start-offset 
> and split length. Example output:
> ===== Task Information Header =====
> Command: test.pl 
> (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)
> Start time: Mon Oct 25 21:24:45 EDT 2010
> Input-split file: null
> Input-split start-offset: -1
> Input-split length: -1
> Within the writeDebugHeader() function, the input file information is 
> obtained by querying for the "map.input.file" configuration variable. This 
> configuration variable was set by the old Hadoop MapReduce API, but not by 
> the 0.20 API, which Pig 0.7 now uses. The new way to get this information is 
> with something like: ((FileSplit) context.getInputSplit()).getPath(). See 
> HADOOP-5973.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
