Hi Eduardo,

You could run the Pig command as an inline script and capture the output into a log file.
Example:

pig -f pigscript.pig > pig_log.log

Kamal Hakim
American Express Big Data Architecture
Phone: 602-537-6819

________________________________________
From: Eduardo Afonso Ferreira [[email protected]]
Sent: Tuesday, September 04, 2012 02:24 PM
To: [email protected]
Subject: Re: Capturing Pig action output

Hey, thanks for the response.

What I need is not the log file that Pig creates when there's an error, you know, the one that lists the stack trace and things like that. I need the output that Pig sends to stdout/stderr. This is the output that includes information about the Hadoop jobs created, started/finished timestamps, success/failure, etc. Here's an example of part of that output:

________________________________
HadoopVersion    PigVersion       UserId   StartedAt            FinishedAt           Features
0.20.2-cdh3u3    0.11.0-SNAPSHOT  mapred   2012-09-04 17:00:56  2012-09-04 17:06:26  GROUP_BY,DISTINCT,FILTER

Success!

Job Stats (time in seconds):
JobId                    Maps  Reduces  MaxMapTime  MinMapTIme  AvgMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  Alias                   Feature            Outputs
job_201206281058_812355  14    4        20          11          16          250            235            246            B,esi,filt1,keys,slots  DISTINCT
job_201206281058_812414  4     4        7           6           7           26             22             24             1-4,C,E                 GROUP_BY,DISTINCT  hbase://active_video_plays,

Input(s):
Successfully read 144190 records (5079 bytes) from: "hbase://events_sessions"

Output(s):
Successfully stored 20835 records in: "hbase://active_video_plays"

Counters:
Total records written : 20835
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
________________________________

Virag,

We're currently using Oozie 2.3.2 (2.3.2-cdh3u3) and I guess the wf configuration you mentioned is not available in this version.

Eduardo.
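One caveat with the redirection Kamal suggests: the summary table Eduardo quotes is emitted through Pig's logging, which typically goes to stderr, so `>` alone may miss it; redirecting both streams (`2>&1`) is safer. A minimal sketch of capturing and then mining the log (file names follow Kamal's example; the seeded log lines are taken from Eduardo's sample output purely for illustration):

```shell
# The pig run itself would look like this (capturing stderr too, since
# the job-stats summary arrives via Pig's logging on stderr):
#
#   pig -f pigscript.pig > pig_log.log 2>&1
#
# For illustration only, seed pig_log.log with two lines from the
# sample output quoted in the thread:
cat > pig_log.log <<'EOF'
Successfully read 144190 records (5079 bytes) from: "hbase://events_sessions"
Successfully stored 20835 records in: "hbase://active_video_plays"
EOF

# Pull the read/stored record counts out of the captured log:
grep -E 'Successfully (read|stored) [0-9]+ records' pig_log.log |
  sed -E 's/.*Successfully (read|stored) ([0-9]+) records.*/\1 \2/'
# -> read 144190
#    stored 20835
```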
________________________________
From: Mona Chitnis <[email protected]>
To: "[email protected]" <[email protected]>; Eduardo Afonso Ferreira <[email protected]>
Sent: Thursday, August 30, 2012 4:01 PM
Subject: Re: Capturing Pig action output

Hi Eduardo,

The log file to which the Oozie Pig action's output is written is a local file in the current working directory of the map task, not a file on HDFS. Also, passing a custom path using "pig -logfile <HDFS_PATH>" is not allowed.

Can you try passing another argument at the end of the arguments list of your Pig action, as a redirection to some file of your choice? E.g., trying to recreate something like the following; note the redirection to 'myfile.txt' at the end:

pig -param $xyz=1000 myscript.pig &>myfile.txt

I haven't tried this out myself, but it would be helpful to find out. Otherwise, Virag's suggestion can help you access specific stats-related information.

Regards,
--
Mona Chitnis

On 8/30/12 11:59 AM, "Virag Kothari" <[email protected]> wrote:

>Hi,
>
>From 3.2 onwards, counters and Hadoop job IDs for Pig and map-reduce
>actions can be accessed through the API or an EL function.
>
>First, the following should be set in the wf configuration. This will
>store the Pig/MR-related statistics in the DB.
>
><property>
>  <name>oozie.action.external.stats.write</name>
>  <value>true</value>
></property>
>
>Then, the stats and job IDs can be accessed using the verbose API:
>
>oozie job -info <jobId> -verbose
>
>Also, the Hadoop job IDs can be retrieved for a Pig action through the
>EL function:
>
>wf:actionData(<pig-action-name>)["hadoopJobs"]
>
>Detailed docs at
>http://incubator.apache.org/oozie/docs/3.2.0-incubating/docs/WorkflowFunctionalSpec.html.
>Look under "4.2.5 Hadoop EL Functions".
>
>Thanks,
>Virag
>
>On 8/30/12 10:31 AM, "Eduardo Afonso Ferreira" <[email protected]> wrote:
>
>>Hi there,
>>
>>I have a Pig script that is run periodically by Oozie via a coordinator
>>with a set frequency.
>>I wanted to capture the Pig script output because I need to look at some
>>information in the results to keep track of several things.
>>I know I can look at the output by doing a whole bunch of clicks starting
>>at the Oozie web console, as follows:
>>
>>- Open the Oozie web console (ex.: http://localhost:11000/oozie/)
>>- Find and click the specific job under "Workflow Jobs"
>>- Select (click) the Pig action in the window that pops up
>>- Click the magnifying glass icon in the "Console URL" field
>>- Click the Map of the launcher job
>>- Click the task ID
>>- Click All under "Task Logs"
>>
>>My question is: how can I know the exact name and location of that log
>>file in HDFS, so I can programmatically retrieve the file from HDFS and
>>parse it, looking for what I need?
>>
>>Is this something I can determine ahead of time, like passing a
>>parameter/argument to the action/Pig so that it will store the log where
>>I want, with the file name I want?
>>
>>Thanks in advance for your help.
>>Eduardo.

American Express made the following annotations on Tue Sep 04 2012 14:40:20
******************************************************************************
"This message and any attachments are solely for the intended recipient and may contain confidential or privileged information. If you are not the intended recipient, any disclosure, copying, use, or distribution of the information included in this message and any attachments is prohibited. If you have received this communication in error, please notify us by reply e-mail and immediately and permanently delete this message and any attachments. Thank you."
******************************************************************************
-------------------------------------------------------------------------------
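[Editor's sketch on Virag's suggestion above: the emails do not show where `oozie.action.external.stats.write` is placed in the workflow definition. A plausible placement, assuming it is set per action inside the Pig action's `<configuration>` block; the action name, script name, and ok/error transitions are placeholders:]

```xml
<action name="pig-node">
  <pig>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
      <!-- Persist Pig/MR stats in the Oozie DB (per Virag's email;
           requires Oozie 3.2+). -->
      <property>
        <name>oozie.action.external.stats.write</name>
        <value>true</value>
      </property>
    </configuration>
    <script>myscript.pig</script>
  </pig>
  <ok to="end"/>
  <error to="fail"/>
</action>
```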
