Re: PigServer API

Dmitriy Ryaboy Thu, 11 Oct 2012 14:28:49 -0700

Backwards compatibility is an issue.
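
One way a return-type change can break existing callers, assuming that is the concern here: in Java the return type is part of the method descriptor that compiled callers link against, so changing a method from void to something else is source-compatible for most callers but not binary-compatible. The class and method names below are hypothetical and only illustrate the descriptor difference; they are not Pig code.

```java
import java.lang.invoke.MethodType;

// Hypothetical sketch, not part of Pig: shows that the JVM method descriptor
// changes when only the return type of a method changes.
final class ReturnTypeCompatSketch {

    // Builds the JVM descriptor string for a method with the given return
    // type and parameter types.
    static String descriptor(Class<?> returnType, Class<?>... params) {
        return MethodType.methodType(returnType, params).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // A void method taking one String, versus the same method returning Object.
        String before = descriptor(void.class, String.class);
        String after = descriptor(Object.class, String.class);

        System.out.println(before); // (Ljava/lang/String;)V
        System.out.println(after);  // (Ljava/lang/String;)Ljava/lang/Object;

        // Callers compiled against the first descriptor will not link against
        // the second without recompilation.
        System.out.println(before.equals(after)); // false
    }
}
```

Already-compiled client jars would typically fail with NoSuchMethodError until rebuilt, which is one argument for adding a new method rather than changing an existing signature.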


On Thu, Oct 11, 2012 at 12:54 PM, Prashant Kommireddi
<[email protected]> wrote:
> True, that would serve the purpose. However, I feel the
> abstraction could be at a lower level so callers of other functions such as
> "store" could use it too.
>
> On Thu, Oct 11, 2012 at 12:27 PM, Dmitriy Ryaboy <[email protected]> wrote:
>
>> Doesn't executeBatch() return exactly what you want?
>>
>>
>>
>> On Thu, Oct 11, 2012 at 2:12 AM, Prashant Kommireddi
>> <[email protected]> wrote:
>> > I knew I had those negotiation skills :)
>> >
>> > Patch is available, please review. It's a minor one
>> > https://issues.apache.org/jira/browse/PIG-2964
>> >
>> > -Prashant
>> >
>> > On Wed, Oct 10, 2012 at 5:54 PM, Bill Graham <[email protected]> wrote:
>> >
>> >> Ok, I'm sold. :)
>> >>
>> >>
>> >> On Wed, Oct 10, 2012 at 11:00 AM, Prashant Kommireddi <[email protected]> wrote:
>> >>
>> >>> Thanks Bill.
>> >>>
>> >>> The rationale behind providing a List is that it simply provides a lot
>> >>> more methods than an iterator. You are right that one could do that in
>> >>> the caller code, but I have a feeling providing this helper in the API
>> >>> would be beneficial. For example, a framework that is used by clients
>> >>> could initiate several Pig scripts/store commands at once. At the
>> >>> framework layer, you might want to determine the total number of MR
>> >>> jobs spawned by these multiple scripts and query stats on those. That's
>> >>> just one use case; there could be more methods on List that a user
>> >>> could be interested in.
>> >>>
>> >>> -Prashant
>> >>>
>> >>>
>> >>> On Wed, Oct 10, 2012 at 10:28 AM, Bill Graham <[email protected]> wrote:
>> >>>
>> >>>> Hi Prashant,
>> >>>>
>> >>>> [Replying to the dev list to get others' take on these...]
>> >>>>
>> >>>> Just curious, why do you prefer a List of JobStats over the already
>> >>>> existing iterator? I hesitate to add one-liner methods if it's
>> >>>> something that can be a one-liner by the caller, unless the use case
>> >>>> is very common.
>> >>>>
>> >>>> Making getSuccessfulJobs() and getFailedJobs() public seems reasonable
>> >>>> to me.
>> >>>>
>> >>>> I'm not sure about the rationale behind the differences between
>> >>>> registerScript() and store(). store() and registerQuery() are able to
>> >>>> manually add to the DAG as statements come in, but registerScript()
>> >>>> needs parsing for execution. That's probably why execution is
>> >>>> delegated to the GruntParser. The resulting DAG for a single-store
>> >>>> script should be the same though. It seems like registerScript()
>> >>>> should be able to return a list of ExecJobs.
>> >>>>
>> >>>> thanks,
>> >>>> Bill
>> >>>>
>> >>>>
>> >>>> On Tue, Oct 9, 2012 at 11:22 PM, Prashant Kommireddi <[email protected]> wrote:
>> >>>>
>> >>>>> Hi Bill,
>> >>>>>
>> >>>>> I am looking at PigStats and JobGraph, and am thinking of adding some
>> >>>>> functions. Let me know what you think.
>> >>>>>
>> >>>>> *getJobList()* returns a List representation of the iterator.
>> >>>>>
>> >>>>> public List<JobStats> getJobList() {
>> >>>>>     return IteratorUtils.toList(iterator());
>> >>>>> }
>> >>>>>
>> >>>>> What do you think about making getSuccessfulJobs() and
>> >>>>> getFailedJobs() public and exposing them in the API? Currently they
>> >>>>> are package-private.
>> >>>>>
>> >>>>> I had another question: it seems the execution flow for
>> >>>>> PigServer.registerScript/Query is different from PigServer.store().
>> >>>>> Was there a reason to make these different? The function store()
>> >>>>> returns an ExecJob, which is great for getting info about the runs,
>> >>>>> but registerScript() calls the GruntParser for execution, which I
>> >>>>> think is a different flow.
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Prashant
>> >>>>>
>> >>>>>
>> >>>>> On Thu, Oct 4, 2012 at 6:05 PM, Bill Graham <[email protected]> wrote:
>> >>>>>
>> >>>>>> Makes sense to me. We could return a PigStats object.
>> >>>>>>
>> >>>>>> On Thu, Oct 4, 2012 at 1:49 PM, Prashant Kommireddi <[email protected]> wrote:
>> >>>>>>
>> >>>>>> > Hi All,
>> >>>>>> >
>> >>>>>> > I am looking at PigServer methods for running scripts/queries, and
>> >>>>>> > it seems that currently their return type is void, which does not
>> >>>>>> > tell much about job completion.
>> >>>>>> >
>> >>>>>> >     public void registerScript(InputStream in, Map<String,String> params, List<String> paramsFiles) throws IOException {
>> >>>>>> >         try {
>> >>>>>> >             String substituted = doParamSubstitution(in, params, paramsFiles);
>> >>>>>> >             GruntParser grunt = new GruntParser(new StringReader(substituted));
>> >>>>>> >             grunt.setInteractive(false);
>> >>>>>> >             grunt.setParams(this);
>> >>>>>> >             grunt.parseStopOnError(true);
>> >>>>>> >         } catch (org.apache.pig.tools.pigscript.parser.ParseException e) {
>> >>>>>> >             log.error(e.getLocalizedMessage());
>> >>>>>> >             throw new IOException(e.getCause());
>> >>>>>> >         }
>> >>>>>> >     }
>> >>>>>> >
>> >>>>>> >
>> >>>>>> > We do have a handle on the number of jobs succeeded/failed as part
>> >>>>>> > of the job run, so is that something we should add as the return type?
>> >>>>>> >
>> >>>>>> > Thanks,
>> >>>>>> > Prashant
>> >>>>>> >
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> --
>> >>>>>> *Note that I'm no longer using my Yahoo! email address. Please email
>> >>>>>> me at [email protected] going forward.*
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>
>> >>
>> >>
>>
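
For anyone skimming the thread, here is a minimal sketch of the List-versus-iterator point discussed above, using stand-in types. JobStats and JobGraph below are hypothetical simplified classes written for this example, not Pig's real ones, and totalJobs/failedJobs are made-up helper names. It only shows the kind of framework-level query (total job count, failed jobs across several scripts) that a getJobList()-style helper would make a little easier.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins for illustration only; NOT Pig's JobStats/JobGraph.
final class JobListSketch {

    /** Minimal stand-in for per-job statistics. */
    static final class JobStats {
        final String jobId;
        final boolean successful;

        JobStats(String jobId, boolean successful) {
            this.jobId = jobId;
            this.successful = successful;
        }
    }

    /** Stand-in for a job graph that natively exposes only an iterator. */
    static final class JobGraph implements Iterable<JobStats> {
        private final List<JobStats> jobs = new ArrayList<>();

        JobGraph add(String jobId, boolean successful) {
            jobs.add(new JobStats(jobId, successful));
            return this;
        }

        @Override
        public Iterator<JobStats> iterator() {
            return jobs.iterator();
        }

        /** The proposed helper: copy the iterator's elements into a List. */
        List<JobStats> getJobList() {
            List<JobStats> out = new ArrayList<>();
            for (JobStats job : this) {
                out.add(job);
            }
            return out;
        }
    }

    /** Total number of jobs spawned across several scripts' graphs. */
    static int totalJobs(List<JobGraph> graphs) {
        int total = 0;
        for (JobGraph graph : graphs) {
            total += graph.getJobList().size();
        }
        return total;
    }

    /** All failed jobs across several scripts' graphs. */
    static List<JobStats> failedJobs(List<JobGraph> graphs) {
        List<JobStats> failed = new ArrayList<>();
        for (JobGraph graph : graphs) {
            for (JobStats job : graph.getJobList()) {
                if (!job.successful) {
                    failed.add(job);
                }
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<JobGraph> graphs = new ArrayList<>();
        graphs.add(new JobGraph().add("job_1", true).add("job_2", false));
        graphs.add(new JobGraph().add("job_3", true));

        System.out.println("total jobs: " + totalJobs(graphs));
        System.out.println("failed jobs: " + failedJobs(graphs).size());
    }
}
```

As Bill notes, the same copy can be done in caller code; the sketch is only meant to make the trade-off concrete.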
