You can write a nightly cron that runs the JobHistoryLoader job and stores the parsed scripts to HDFS...
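
Something along these lines could do it (a rough, untested sketch; the PigScriptArchiver name, the /mapred/history/done and /pig/scripts paths, and the assumption that "pig.script" is stored base64-encoded are all mine and would need adjusting for your cluster and Pig version):

// Untested sketch: walk the job history conf files, pull out "pig.script",
// and archive the decoded script to an HDFS directory. The two paths and the
// base64 assumption are placeholders; adjust for your cluster and Pig version.
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.commons.codec.binary.Base64;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PigScriptArchiver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path historyDir = new Path("/mapred/history/done"); // wherever job confs land
    Path archiveDir = new Path("/pig/scripts");         // where to keep the scripts

    for (FileStatus status : fs.listStatus(historyDir)) {
      String name = status.getPath().getName();
      if (!name.endsWith("_conf.xml")) {
        continue; // only interested in the job.xml copies
      }

      // Read the job conf and look for the serialized script
      Configuration jobConf = new Configuration(false);
      InputStream in = fs.open(status.getPath());
      jobConf.addResource(in);
      String encoded = jobConf.get("pig.script"); // forces the resource to be parsed
      in.close();

      if (encoded == null || encoded.length() == 0) {
        continue; // not a Pig job
      }
      String script = new String(Base64.decodeBase64(encoded), "UTF-8");

      // Name the archived copy after the conf file it came from
      OutputStream out = fs.create(new Path(archiveDir, name + ".pig"), true);
      out.write(script.getBytes("UTF-8"));
      out.close();
    }
  }
}

Run it from the nightly cron (e.g. via hadoop jar) and that covers the post-processing route.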
D

On Wed, Jun 6, 2012 at 5:16 PM, Prashant Kommireddi <[email protected]> wrote:

> I think that would be more of a post-process vs. having Pig write the same
> to an HDFS location. That would avoid having to parse it from job.xml.
>
> On Wed, Jun 6, 2012 at 4:19 PM, Daniel Dai <[email protected]> wrote:
>
>> One existing solution is the "pig.script" entry inside job.xml; it is the
>> serialized Pig script. JobHistoryLoader can load job.xml files and grab
>> those entries. Does that solve your problem?
>>
>> Daniel
>>
>> On Wed, Jun 6, 2012 at 3:52 PM, Prashant Kommireddi <[email protected]> wrote:
>>
>> > Hi All,
>> >
>> > What do you guys think about adding a feature to be able to persist the
>> > script (file, or cache in the case of grunt) on HDFS or locally, based on
>> > an admin setting (pig.properties)? This will help infrastructure/ops teams
>> > analyze the nature of Pig scripts and make certain decisions based on it
>> > (optimizing data storage based on access patterns, etc.). This is actually
>> > something we want to do, but the challenge is that there is no central
>> > place where we can track user scripts.
>> >
>> > It could be a config param "pig.persist.script=/pig/". The script could be
>> > stored with a configurable name -> "${mapred.job.name}+${user.name}+timestamp",
>> > either on HDFS or locally, based on the configuration setting.
>> >
>> > Thanks,
>> > Prashant
