Backwards compatibility is an issue.

On Thu, Oct 11, 2012 at 12:54 PM, Prashant Kommireddi <[email protected]> wrote:
> True, that would serve the purpose. However, I feel the abstraction
> could be at a lower level so callers of other functions such as
> "store" could use it too.
>
> On Thu, Oct 11, 2012 at 12:27 PM, Dmitriy Ryaboy <[email protected]> wrote:
>
>> Doesn't executeBatch() return exactly what you want?
>>
>> On Thu, Oct 11, 2012 at 2:12 AM, Prashant Kommireddi <[email protected]> wrote:
>> > I knew I had those negotiation skills :)
>> >
>> > Patch is available, please review. It's a minor one
>> > https://issues.apache.org/jira/browse/PIG-2964
>> >
>> > -Prashant
>> >
>> > On Wed, Oct 10, 2012 at 5:54 PM, Bill Graham <[email protected]> wrote:
>> >
>> >> Ok, I'm sold. :)
>> >>
>> >> On Wed, Oct 10, 2012 at 11:00 AM, Prashant Kommireddi <[email protected]> wrote:
>> >>
>> >>> Thanks Bill.
>> >>>
>> >>> The rationale behind providing a List is that it simply provides a
>> >>> lot more methods than an iterator. You are right in saying one
>> >>> could do that in the caller code, but I have a feeling providing
>> >>> this helper in the API would be beneficial. For example, a
>> >>> framework that is used by clients could initiate several pig
>> >>> scripts/store commands at once. At the framework layer, you might
>> >>> want to be able to determine the number of MR jobs in total
>> >>> spawned by these multiple scripts and query stats on those. That's
>> >>> just one use-case, there could be more methods on List that a user
>> >>> could be interested in.
>> >>>
>> >>> -Prashant
>> >>>
>> >>> On Wed, Oct 10, 2012 at 10:28 AM, Bill Graham <[email protected]> wrote:
>> >>>
>> >>>> Hi Prashant,
>> >>>>
>> >>>> [Replying to the dev list to get others' take on these...]
>> >>>>
>> >>>> Just curious, why do you prefer a List of JobStats over the
>> >>>> already existing iterator? I hesitate to add one-liner methods if
>> >>>> it's something that can be a one-liner by the caller, unless the
>> >>>> use case is very common.
>> >>>>
>> >>>> Making getSuccessfulJobs() and getFailedJobs() public seems
>> >>>> reasonable to me.
>> >>>>
>> >>>> I'm not sure about the rationale behind the differences between
>> >>>> registerScript() and store(). store() and registerQuery() are
>> >>>> able to manually add to the DAG as statements come in, but
>> >>>> registerScript() needs parsing for execution. That's probably why
>> >>>> execution is delegated to the GruntParser. The resulting DAG for
>> >>>> a single-store script should be the same though. It seems like
>> >>>> registerScript() should be able to return a list of ExecJobs.
>> >>>>
>> >>>> thanks,
>> >>>> Bill
>> >>>>
>> >>>> On Tue, Oct 9, 2012 at 11:22 PM, Prashant Kommireddi <[email protected]> wrote:
>> >>>>
>> >>>>> Hi Bill,
>> >>>>>
>> >>>>> I am looking at PigStats and JobGraph, and am thinking of adding
>> >>>>> some functions. Let me know what you think.
>> >>>>>
>> >>>>> *getJobList()* returns a List representation of the iterator.
>> >>>>>
>> >>>>> public List<JobStats> getJobList() {
>> >>>>>     return IteratorUtils.toList(iterator());
>> >>>>> }
>> >>>>>
>> >>>>> What do you think about making getSuccessfulJobs() and
>> >>>>> getFailedJobs() public and exposing them in the API? Currently
>> >>>>> they are package-private.
>> >>>>>
>> >>>>> I had another question: it seems like the execution flow for
>> >>>>> PigServer.registerScript/Query is different from
>> >>>>> PigServer.store(). Was there a reason to make these different?
>> >>>>> The function store() returns an ExecJob, which is great for
>> >>>>> getting info regarding the runs, but registerScript() calls the
>> >>>>> GruntParser for execution, which I think is a different flow?
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Prashant
>> >>>>>
>> >>>>> On Thu, Oct 4, 2012 at 6:05 PM, Bill Graham <[email protected]> wrote:
>> >>>>>
>> >>>>>> Makes sense to me. We could return a PigStats object.
>> >>>>>>
>> >>>>>> On Thu, Oct 4, 2012 at 1:49 PM, Prashant Kommireddi <[email protected]> wrote:
>> >>>>>>
>> >>>>>> > Hi All,
>> >>>>>> >
>> >>>>>> > I am looking at PigServer methods for running scripts/queries
>> >>>>>> > and it seems like currently their return type is void, which
>> >>>>>> > does not tell much about job completion.
>> >>>>>> >
>> >>>>>> > public void registerScript(InputStream in, Map<String,String> params,
>> >>>>>> >         List<String> paramsFiles) throws IOException {
>> >>>>>> >     try {
>> >>>>>> >         String substituted = doParamSubstitution(in, params, paramsFiles);
>> >>>>>> >         GruntParser grunt = new GruntParser(new StringReader(substituted));
>> >>>>>> >         grunt.setInteractive(false);
>> >>>>>> >         grunt.setParams(this);
>> >>>>>> >         grunt.parseStopOnError(true);
>> >>>>>> >     } catch (org.apache.pig.tools.pigscript.parser.ParseException e) {
>> >>>>>> >         log.error(e.getLocalizedMessage());
>> >>>>>> >         throw new IOException(e.getCause());
>> >>>>>> >     }
>> >>>>>> > }
>> >>>>>> >
>> >>>>>> > We do have a handle on number of jobs succeeded/failed as part
>> >>>>>> > of the job run, so that is something we should add as return
>> >>>>>> > type?
>> >>>>>> >
>> >>>>>> > Thanks,
>> >>>>>> > Prashant
>> >>>>>>
>> >>>>>> --
>> >>>>>> *Note that I'm no longer using my Yahoo! email address. Please
>> >>>>>> email me at [email protected] going forward.*
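
For anyone skimming the thread, here is a rough sketch of the getJobList() idea and the "framework layer" use case described above, written against the plain JDK so it runs without Pig or commons-collections on the classpath. JobInfo is a hypothetical stand-in for Pig's JobStats, and toList() and countFailed() are illustrative helpers, not methods that exist in Pig; the actual patch is the one attached to PIG-2964.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class JobListSketch {

    // Hypothetical stand-in for Pig's JobStats; only what the sketch needs.
    static final class JobInfo {
        final String jobId;
        final boolean successful;

        JobInfo(String jobId, boolean successful) {
            this.jobId = jobId;
            this.successful = successful;
        }
    }

    // Drains an iterator into a new list. This is the same effect the
    // proposed getJobList() gets from IteratorUtils.toList(iterator()).
    static <T> List<T> toList(Iterator<T> it) {
        List<T> out = new ArrayList<T>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    // Counts failed jobs across the job lists of several scripts, the kind
    // of aggregate a framework launching many scripts might want.
    static int countFailed(List<List<JobInfo>> perScript) {
        int failed = 0;
        for (List<JobInfo> jobs : perScript) {
            for (JobInfo job : jobs) {
                if (!job.successful) {
                    failed++;
                }
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Pretend each iterator came from one script's job graph.
        Iterator<JobInfo> scriptA = Arrays.asList(
                new JobInfo("job_1", true),
                new JobInfo("job_2", false)).iterator();
        Iterator<JobInfo> scriptB = Arrays.asList(
                new JobInfo("job_3", true)).iterator();

        List<List<JobInfo>> perScript = new ArrayList<List<JobInfo>>();
        perScript.add(toList(scriptA));
        perScript.add(toList(scriptB));

        int total = 0;
        for (List<JobInfo> jobs : perScript) {
            total += jobs.size(); // size() is one thing a List adds over an Iterator
        }
        System.out.println("total=" + total + " failed=" + countFailed(perScript));
    }
}
```

The point of returning a List in this sketch is only that size() and repeated traversal come for free; a caller holding just the iterator can get the same result with the few lines in toList().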
