True, that would serve the purpose. However, I feel the abstraction could be at a lower level, so that callers of other functions such as store() could use it too.

On Thu, Oct 11, 2012 at 12:27 PM, Dmitriy Ryaboy <[email protected]> wrote:

> Doesn't executeBatch() return exactly what you want?
>
> On Thu, Oct 11, 2012 at 2:12 AM, Prashant Kommireddi <[email protected]> wrote:
>
> > I knew I had those negotiation skills :)
> >
> > Patch is available, please review. It's a minor one.
> > https://issues.apache.org/jira/browse/PIG-2964
> >
> > -Prashant
> >
> > On Wed, Oct 10, 2012 at 5:54 PM, Bill Graham <[email protected]> wrote:
> >
> >> Ok, I'm sold. :)
> >>
> >> On Wed, Oct 10, 2012 at 11:00 AM, Prashant Kommireddi <[email protected]> wrote:
> >>
> >>> Thanks Bill.
> >>>
> >>> The rationale behind providing a List is that it simply provides a lot
> >>> more methods than an iterator. You are right in saying one could do that
> >>> in the caller code, but I have a feeling providing this helper in the API
> >>> would be beneficial. For example, a framework that is used by clients
> >>> could initiate several pig scripts/store commands at once. At the
> >>> framework layer, you might want to be able to determine the total number
> >>> of MR jobs spawned by these multiple scripts and query stats on those.
> >>> That's just one use case; there could be more methods on List that a
> >>> user could be interested in.
> >>>
> >>> -Prashant
> >>>
> >>> On Wed, Oct 10, 2012 at 10:28 AM, Bill Graham <[email protected]> wrote:
> >>>
> >>>> Hi Prashant,
> >>>>
> >>>> [Replying to the dev list to get others' take on these...]
> >>>>
> >>>> Just curious, why do you prefer a List of JobStats over the already
> >>>> existing iterator? I hesitate to add one-liner methods if it's something
> >>>> that can be a one-liner by the caller, unless the use case is very
> >>>> common.
> >>>>
> >>>> Making getSuccessfulJobs() and getFailedJobs() public seems reasonable
> >>>> to me.
> >>>>
> >>>> I'm not sure about the rationale behind the differences between
> >>>> registerScript() and store(). store() and registerQuery() are able to
> >>>> manually add to the DAG as statements come in, but registerScript()
> >>>> needs parsing for execution. That's probably why execution is delegated
> >>>> to the GruntParser. The resulting DAG for a single-store script should
> >>>> be the same though. It seems like registerScript() should be able to
> >>>> return a list of ExecJobs.
> >>>>
> >>>> thanks,
> >>>> Bill
> >>>>
> >>>> On Tue, Oct 9, 2012 at 11:22 PM, Prashant Kommireddi <[email protected]> wrote:
> >>>>
> >>>>> Hi Bill,
> >>>>>
> >>>>> I am looking at PigStats and JobGraph, and am thinking of adding some
> >>>>> functions. Let me know what you think.
> >>>>>
> >>>>> *getJobList()* returns a List representation of the iterator.
> >>>>>
> >>>>>     public List<JobStats> getJobList() {
> >>>>>         return IteratorUtils.toList(iterator());
> >>>>>     }
> >>>>>
> >>>>> What do you think about making getSuccessfulJobs() and getFailedJobs()
> >>>>> public and exposing them to the API? Currently they are package-private?
> >>>>>
> >>>>> Had another question: it seems like the execution flow for
> >>>>> PigServer.registerScript/Query is different from PigServer.store(). Was
> >>>>> there a reason to make these different? The function store() returns an
> >>>>> ExecJob, which is great for getting info regarding the runs, but
> >>>>> registerScript() calls the GruntParser for execution, which I think is
> >>>>> a different flow?
> >>>>>
> >>>>> Thanks,
> >>>>> Prashant
> >>>>>
> >>>>> On Thu, Oct 4, 2012 at 6:05 PM, Bill Graham <[email protected]> wrote:
> >>>>>
> >>>>>> Makes sense to me. We could return a PigStats object.
> >>>>>>
> >>>>>> On Thu, Oct 4, 2012 at 1:49 PM, Prashant Kommireddi <[email protected]> wrote:
> >>>>>>
> >>>>>> > Hi All,
> >>>>>> >
> >>>>>> > I am looking at PigServer methods for running scripts/queries and it
> >>>>>> > seems like currently their return type is void, which does not tell
> >>>>>> > much about job completion.
> >>>>>> >
> >>>>>> >     public void registerScript(InputStream in, Map<String,String> params,
> >>>>>> >             List<String> paramsFiles) throws IOException {
> >>>>>> >         try {
> >>>>>> >             String substituted = doParamSubstitution(in, params, paramsFiles);
> >>>>>> >             GruntParser grunt = new GruntParser(new StringReader(substituted));
> >>>>>> >             grunt.setInteractive(false);
> >>>>>> >             grunt.setParams(this);
> >>>>>> >             grunt.parseStopOnError(true);
> >>>>>> >         } catch (org.apache.pig.tools.pigscript.parser.ParseException e) {
> >>>>>> >             log.error(e.getLocalizedMessage());
> >>>>>> >             throw new IOException(e.getCause());
> >>>>>> >         }
> >>>>>> >     }
> >>>>>> >
> >>>>>> > We do have a handle on the number of jobs succeeded/failed as part of
> >>>>>> > the job run, so that is something we should add as the return type?
> >>>>>> >
> >>>>>> > Thanks,
> >>>>>> > Prashant
> >>>>>>
> >>>>>> --
> >>>>>> *Note that I'm no longer using my Yahoo! email address. Please email
> >>>>>> me at [email protected] going forward.*
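
For reference, the getJobList() idea from the thread can be sketched without any Pig dependencies. This is only an illustration under stated assumptions: JobInfo is a hypothetical stand-in for JobStats, and toList(), countFailed(), totalJobs() and the job IDs are made-up names for this sketch, not part of the Pig API. The thread's own proposal used IteratorUtils.toList() from Commons Collections; a hand-written loop is swapped in here so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class JobListSketch {

    /** Hypothetical stand-in for Pig's JobStats: just an id and an outcome. */
    static final class JobInfo {
        final String jobId;
        final boolean successful;

        JobInfo(String jobId, boolean successful) {
            this.jobId = jobId;
            this.successful = successful;
        }
    }

    /** Drains an iterator into a new list (what getJobList() would do). */
    static <T> List<T> toList(Iterator<T> it) {
        List<T> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    /** Counts failed jobs in a list; easy once a List is available. */
    static int countFailed(List<JobInfo> jobs) {
        int failed = 0;
        for (JobInfo job : jobs) {
            if (!job.successful) {
                failed++;
            }
        }
        return failed;
    }

    /** Total number of jobs spawned across several script runs. */
    static int totalJobs(List<List<JobInfo>> runs) {
        int total = 0;
        for (List<JobInfo> run : runs) {
            total += run.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Two imaginary script runs, each exposing its jobs as an iterator.
        Iterator<JobInfo> first = Arrays.asList(
                new JobInfo("job_1", true), new JobInfo("job_2", false)).iterator();
        Iterator<JobInfo> second = Arrays.asList(
                new JobInfo("job_3", true)).iterator();

        List<List<JobInfo>> runs = new ArrayList<>();
        runs.add(toList(first));
        runs.add(toList(second));

        System.out.println("total jobs: " + totalJobs(runs));
        System.out.println("failed in first run: " + countFailed(runs.get(0)));
    }
}
```

The framework-layer use case mentioned above (counting jobs across several scripts) is where a List is more convenient than an iterator, since size() and repeated traversal come for free.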
