Hi Eduardo,

The log file to which the Oozie Pig action's output is written is a local file in the current working directory of the map task, not a file on HDFS. Passing a custom path with "pig -logfile <HDFS_PATH>" is also not allowed. Can you try passing an extra argument at the end of your Pig action's argument list as a redirection to a file of your choice?
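In workflow.xml terms, that extra argument might look like the sketch below. This is untested (as noted), the action name and file names are placeholders, and because Oozie passes arguments to Pig directly rather than through a shell, the redirection may end up treated as a literal argument unless a shell wrapper is involved:

```xml
<!-- Hypothetical sketch only: "my-pig-action", "myscript.pig" and
     "myfile.txt" are placeholders, not names from an actual workflow. -->
<action name="my-pig-action">
  <pig>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <script>myscript.pig</script>
    <param>xyz=1000</param>
    <!-- Extra trailing argument intended as an output redirection; untested. -->
    <argument>&amp;>myfile.txt</argument>
  </pig>
  <ok to="end"/>
  <error to="fail"/>
</action>
```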
E.g. trying to recreate something like the following; note the redirection to 'myfile.txt' at the end:

pig -param $xyz=1000 myscript.pig &>myfile.txt

I haven't tried this myself, but it would be helpful to find out whether it works. Otherwise, Virag's suggestion below can help you access specific stats-related information.

Regards,
--
Mona Chitnis

On 8/30/12 11:59 AM, "Virag Kothari" <[email protected]> wrote:

>Hi,
>
>From 3.2 onwards, counters and Hadoop job IDs for Pig and Map-Reduce
>actions can be accessed through the API or an EL function.
>
>First, the following should be set in the workflow configuration. This
>will store the Pig/MR-related statistics in the DB:
>
><property>
>  <name>oozie.action.external.stats.write</name>
>  <value>true</value>
></property>
>
>Then, the stats and job IDs can be accessed using the verbose API:
>
>oozie job -info <jobId> -verbose
>
>The Hadoop job IDs can also be retrieved for a Pig action through the
>EL function:
>
>wf:actionData(<pig-action-name>)["hadoopJobs"]
>
>Detailed docs at
>http://incubator.apache.org/oozie/docs/3.2.0-incubating/docs/WorkflowFunctionalSpec.html.
>Look under "4.2.5 Hadoop EL Functions".
>
>Thanks,
>Virag
>
>On 8/30/12 10:31 AM, "Eduardo Afonso Ferreira" <[email protected]> wrote:
>
>>Hi there,
>>
>>I have a Pig script that runs periodically via an Oozie coordinator
>>with a set frequency.
>>I want to capture the Pig script's output because I need to look at
>>some information in the results to keep track of several things.
>>I know I can look at the output through a whole series of clicks
>>starting at the Oozie web console, as follows:
>>
>>- Open the Oozie web console (e.g. http://localhost:11000/oozie/)
>>- Find and click the specific job under "Workflow Jobs"
>>- Click the Pig action in the window that pops up
>>- Click the magnifying-glass icon in the "Console URL" field
>>- Click the Map of the launcher job
>>- Click the task ID
>>- Click "All" under "Task Logs"
>>
>>My question is: how can I find the exact name and location of that log
>>file in HDFS so I can programmatically retrieve it, parse it, and look
>>for what I need?
>>
>>Is this something I can determine ahead of time, e.g. by passing a
>>parameter/argument to the action/Pig script so that it stores the log
>>where I want, with the file name I want?
>>
>>Thanks in advance for your help.
>>Eduardo.
>
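For the EL-function route Virag describes, the value returned for wf:actionData(<pig-action-name>)["hadoopJobs"] can be captured (for example, passed into a downstream action's configuration) and split into individual job IDs. A minimal sketch, assuming the value is a comma-separated string of Hadoop job IDs; the sample IDs below are made up for illustration:

```python
def parse_hadoop_job_ids(hadoop_jobs_value):
    """Split a comma-separated "hadoopJobs" value into a list of job ids."""
    return [job_id.strip()
            for job_id in hadoop_jobs_value.split(",")
            if job_id.strip()]

# Made-up sample value for illustration.
sample = "job_201208300001_0001, job_201208300001_0002"
print(parse_hadoop_job_ids(sample))
# → ['job_201208300001_0001', 'job_201208300001_0002']
```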
