Ok, I'm sold. :)

On Wed, Oct 10, 2012 at 11:00 AM, Prashant Kommireddi <[email protected]> wrote:
> Thanks Bill.
>
> The rationale behind providing a List is that it simply provides a lot
> more methods than an iterator. You are right that one could do that in
> the caller code, but I have a feeling providing this helper in the API
> would be beneficial. For example, a framework that is used by clients could
> initiate several Pig scripts/store commands at once. At the framework layer,
> you might want to determine the total number of MR jobs spawned by these
> multiple scripts and query stats on those. That's just one use case; there
> could be other methods on List that a user would be interested in.
>
> -Prashant
>
>
> On Wed, Oct 10, 2012 at 10:28 AM, Bill Graham <[email protected]> wrote:
>
>> Hi Prashant,
>>
>> [Replying to the dev list to get others' take on these...]
>>
>> Just curious, why do you prefer a List of JobStats over the already
>> existing iterator? I hesitate to add one-liner methods if it's something
>> that can be a one-liner by the caller, unless the use case is very common.
>>
>> Making getSuccessfulJobs() and getFailedJobs() public seems reasonable to
>> me.
>>
>> I'm not sure about the rationale behind the differences between
>> registerScript() and store(). store() and registerQuery() are able to
>> add to the DAG manually as statements come in, but registerScript() needs
>> parsing for execution. That's probably why execution is delegated to the
>> GruntParser. The resulting DAG for a single-store script should be the same
>> though. It seems like registerScript() should be able to return a list of
>> ExecJobs.
>>
>> thanks,
>> Bill
>>
>>
>> On Tue, Oct 9, 2012 at 11:22 PM, Prashant Kommireddi <[email protected]> wrote:
>>
>>> Hi Bill,
>>>
>>> I am looking at PigStats and JobGraph, and am thinking of adding some
>>> functions. Let me know what you think.
>>>
>>> *getJobList()* returns a List representation of the iterator.
>>>
>>>     public List<JobStats> getJobList() {
>>>         return IteratorUtils.toList(iterator());
>>>     }
>>>
>>> What do you think about making getSuccessfulJobs() and getFailedJobs()
>>> public and exposing them in the API? Currently they are package-private,
>>> right?
>>>
>>> I had another question: it seems like the execution flow for
>>> PigServer.registerScript()/registerQuery() is different from
>>> PigServer.store(). Was there a reason to make these different? store()
>>> returns an ExecJob, which is great for getting info about the runs, but
>>> registerScript() calls the GruntParser for execution, which I think is a
>>> different flow?
>>>
>>> Thanks,
>>> Prashant
>>>
>>>
>>> On Thu, Oct 4, 2012 at 6:05 PM, Bill Graham <[email protected]> wrote:
>>>
>>>> Makes sense to me. We could return a PigStats object.
>>>>
>>>> On Thu, Oct 4, 2012 at 1:49 PM, Prashant Kommireddi <[email protected]> wrote:
>>>>
>>>> > Hi All,
>>>> >
>>>> > I am looking at PigServer methods for running scripts/queries, and it
>>>> > seems like their return type is currently void, which does not tell
>>>> > much about job completion.
>>>> >
>>>> >     public void registerScript(InputStream in, Map<String,String> params,
>>>> >             List<String> paramsFiles) throws IOException {
>>>> >         try {
>>>> >             String substituted = doParamSubstitution(in, params, paramsFiles);
>>>> >             GruntParser grunt = new GruntParser(new StringReader(substituted));
>>>> >             grunt.setInteractive(false);
>>>> >             grunt.setParams(this);
>>>> >             grunt.parseStopOnError(true);
>>>> >         } catch (org.apache.pig.tools.pigscript.parser.ParseException e) {
>>>> >             log.error(e.getLocalizedMessage());
>>>> >             throw new IOException(e.getCause());
>>>> >         }
>>>> >     }
>>>> >
>>>> > We do have a handle on the number of jobs that succeeded/failed as part
>>>> > of the job run, so is that something we should add as the return type?
>>>> >
>>>> > Thanks,
>>>> > Prashant
>>>> >
>>>>
>>>> --
>>>> *Note that I'm no longer using my Yahoo! email address. Please email me
>>>> at [email protected] going forward.*
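Bill's point is that turning the existing iterator into a List can be a one-liner in the caller, while Prashant's use case is a framework that tallies the MR jobs spawned by several scripts. A minimal caller-side sketch of both, assuming Java 8+ and no commons-collections: JobStats here is a hypothetical stand-in holding only a job id and a success flag (not Pig's org.apache.pig.tools.pigstats.JobStats), each script run is assumed to hand back an Iterator<JobStats> like the iterator discussed in the thread, and toList, allJobs and countFailed are illustrative helper names, not Pig API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class JobListSketch {

    /** Hypothetical stand-in for Pig's JobStats: only a job id and a success flag. */
    static final class JobStats {
        final String jobId;
        final boolean successful;

        JobStats(String jobId, boolean successful) {
            this.jobId = jobId;
            this.successful = successful;
        }

        boolean isSuccessful() {
            return successful;
        }
    }

    /** Caller-side "one-liner": drain any iterator into a List, preserving order (Java 8+). */
    static <T> List<T> toList(Iterator<T> it) {
        List<T> list = new ArrayList<>();
        it.forEachRemaining(list::add);
        return list;
    }

    /** Framework-level view: flatten the jobs of several script runs into one List. */
    static List<JobStats> allJobs(List<Iterator<JobStats>> perScriptJobs) {
        List<JobStats> all = new ArrayList<>();
        for (Iterator<JobStats> jobs : perScriptJobs) {
            all.addAll(toList(jobs));
        }
        return all;
    }

    /** Count the jobs whose success flag is false. */
    static long countFailed(List<JobStats> jobs) {
        return jobs.stream().filter(j -> !j.isSuccessful()).count();
    }

    public static void main(String[] args) {
        // Two hypothetical script runs: the first spawned two MR jobs, the second one.
        Iterator<JobStats> script1 = Arrays.asList(
                new JobStats("job_1", true), new JobStats("job_2", false)).iterator();
        Iterator<JobStats> script2 = Arrays.asList(
                new JobStats("job_3", true)).iterator();

        List<JobStats> all = allJobs(Arrays.asList(script1, script2));
        System.out.println("total MR jobs: " + all.size());        // total MR jobs: 3
        System.out.println("failed MR jobs: " + countFailed(all)); // failed MR jobs: 1
    }
}
```

Whether such a helper belongs in the API or in caller code is the trade-off debated in the thread; the sketch only shows that the caller-side version stays short.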

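Bill's suggestion of returning a PigStats object (or a list of ExecJobs) from registerScript(), combined with making getSuccessfulJobs()/getFailedJobs() public, could look roughly like the sketch below. Everything in it is hypothetical: ScriptStats, ScriptRunner and this JobStats are illustrative stand-ins, not Pig's PigStats, PigServer or JobStats, and the runner is faked with canned results so the example runs on its own. Only the parameter list of registerScript() is taken from the snippet quoted in the thread.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class ScriptStatsSketch {

    /** Hypothetical per-job record; Pig's real JobStats carries far more detail. */
    public static final class JobStats {
        private final String jobId;
        private final boolean successful;

        public JobStats(String jobId, boolean successful) {
            this.jobId = jobId;
            this.successful = successful;
        }

        public String getJobId() { return jobId; }

        public boolean isSuccessful() { return successful; }
    }

    /** Hypothetical return value for a script run; the success/failure views are public. */
    public static final class ScriptStats {
        private final List<JobStats> jobs;

        public ScriptStats(List<JobStats> jobs) {
            // Defensive, read-only copy so callers cannot mutate the recorded run.
            this.jobs = Collections.unmodifiableList(new ArrayList<>(jobs));
        }

        public List<JobStats> getJobList() { return jobs; }

        public List<JobStats> getSuccessfulJobs() { return filter(true); }

        public List<JobStats> getFailedJobs() { return filter(false); }

        /** A script counts as successful only if none of its jobs failed. */
        public boolean isSuccessful() { return getFailedJobs().isEmpty(); }

        private List<JobStats> filter(boolean wantSuccess) {
            List<JobStats> out = new ArrayList<>();
            for (JobStats j : jobs) {
                if (j.isSuccessful() == wantSuccess) {
                    out.add(j);
                }
            }
            return out;
        }
    }

    /** Hypothetical runner: same parameters as the quoted registerScript(), but non-void. */
    public interface ScriptRunner {
        ScriptStats registerScript(InputStream in, Map<String, String> params,
                                   List<String> paramsFiles) throws IOException;
    }

    public static void main(String[] args) throws IOException {
        // Fake runner that ignores its input and reports two canned jobs, one of which failed.
        ScriptRunner runner = (in, params, paramsFiles) -> new ScriptStats(Arrays.asList(
                new JobStats("job_1", true), new JobStats("job_2", false)));

        ScriptStats stats = runner.registerScript(
                new ByteArrayInputStream(new byte[0]), Collections.emptyMap(), Collections.emptyList());
        System.out.println("jobs: " + stats.getJobList().size());        // jobs: 2
        System.out.println("failed: " + stats.getFailedJobs().size());   // failed: 1
        System.out.println("script succeeded: " + stats.isSuccessful()); // script succeeded: false
    }
}
```

Deriving script-level success from the failed-job list keeps a single source of truth instead of storing a separate flag that could drift out of sync with the per-job stats.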