There is related work that overlaps, though with (slightly) different
goals and implementations:

http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper37.pdf
http://www.cidrdb.org/cidr2011/Talks/CIDR11_Ikeda.ppt

Ashutosh

On Mon, Feb 14, 2011 at 15:48, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> Thanks for that, arvind.
>
> Y! folks, is there any public documentation for Penny?
> Is there overlap there with the error handling proposal?
>
> Also: think error handling can make it into 0.9 or are we thinking 0.10?
>
> D
>
> On Mon, Feb 14, 2011 at 12:55 PM, arv...@cloudera.com
> <arv...@cloudera.com> wrote:
>
>> Hi,
>>
>> Sorry for the delay in sending this. Following are the notes from the last
>> developer's meeting.
>>
>> Arvind
>> -----------
>> *Attendees*
>>
>>   - From Y!: Alan, Santhosh, Romain, Daniel, Richard, Ashutosh, Ben, Julien
>>   - From Cloudera: Arvind
>>
>> *Agenda*
>>
>>   - Error Handling
>>   - Brainstorming Ideas For 0.9
>>   - Brainstorming Ideas Beyond 0.9
>>
>> *Error Handling Suggestions/Proposal Discussion:*
>>
>>   - Allow each statement to declare an ONERROR clause with a UDF that
>>   takes control in case of an error.
>>      - This would be better than the current behavior of exiting on error.
>>   - Alternatively, allow ONERROR to be declared for an entire
>>   script/session which would allow individual statements to override and
>>   provide a more specialized UDF for error handling.
>>   - Yet another alternative - allow the specification of a threshold number
>>   of errors that Pig ignores before exiting.
>>   - Key idea is to ensure that the error handling is focused on data error
>>   handling and not control-flow.
>>   - Action Item: Post the key proposal on the Wiki.
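
To make the ONERROR and threshold alternatives above concrete, here is a minimal Python sketch. It is not Pig Latin syntax and not Pig's API: `apply_with_onerror`, `TooManyErrors`, `on_error` and `max_errors` are hypothetical names chosen for illustration. It assumes the UDF is called once per record, hands each bad record to an optional handler, and aborts only once the error count exceeds the threshold.

```python
from typing import Callable, Iterable, Iterator, List, Optional


class TooManyErrors(RuntimeError):
    """Raised once the number of bad records exceeds the allowed threshold."""


def apply_with_onerror(
    records: Iterable[str],
    udf: Callable[[str], int],
    on_error: Optional[Callable[[str, Exception], None]] = None,
    max_errors: int = 0,
) -> Iterator[int]:
    """Apply udf to each record, tolerating up to max_errors bad records."""
    errors = 0
    for record in records:
        try:
            result = udf(record)
        except Exception as exc:
            errors += 1
            # Data-error handling only: the handler sees the bad record,
            # it does not redirect the control flow of the script.
            if on_error is not None:
                on_error(record, exc)
            if errors > max_errors:
                raise TooManyErrors(
                    f"{errors} bad records (limit {max_errors})"
                ) from exc
            continue
        yield result


# Usage: one malformed record is tolerated and collected by the handler.
rejected: List[str] = []
good = list(
    apply_with_onerror(
        ["1", "x", "3"],
        int,
        on_error=lambda record, exc: rejected.append(record),
        max_errors=1,
    )
)
```

With `max_errors=0` the same input raises `TooManyErrors` on the first bad record, which corresponds to the current exit-on-error behavior described in the notes.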
>>
>> *Brainstorming Ideas For 0.9:*
>>
>>   - Internal development done by March
>>   - Release tentatively by May
>>   - Support for ILLUSTRATE.
>>   - Current status:
>>      - Parser rewrite almost complete
>>      - Working on load data according to schema - support for padding
>>      missing values
>>      - No support for Boolean type planned yet.
>>   - Big features in 0.9
>>      - Parser change
>>      - Macro support
>>      - Jython/Script support
>>      - Penny (formerly Inspector Gadget): framework to instrument scripts.
>>      Allows detection of bad records that cause failures and the
>>      implementation of constraints.
>>         - Works by integrating with the optimizer to produce wrappers for
>>         key UDFs of interest.
>>         - Agents can be added in different parts of the query
>>         - Prepackaged agents available, but framework allows the creation
>>         of custom agents as needed.
>>         - Pending work - implementation of unit tests, and turning this
>>         into a patch.
>>
>> *Brainstorming Ideas Beyond 0.9:*
>>
>>   - Support for different backends for Pig (MR, Piranha, Local, Oozie)
>>      - Execution engine that can generate plans specific to the underlying
>>      architecture and allow controlling routines to rewrite/re-optimize
>>      the plan mid-execution.
>>   - Thread safety when running local jobs - to allow better embedding of
>>   Pig as a light-weight tool in web-applications and other multi-threaded
>>   environments.
>>      - Work includes making UDF context thread-safe and removing statics
>>      from the implementation.
>>      - Will benefit Oozie and other systems that embed Pig without having
>>      to worry about side-effects.
>>   - Allow execution to resume from where it left off after a runtime
>>   failure.
>>      - May be done by allowing Oozie as a backend where the plan is
>>      converted into an Oozie workflow.
>>      - Alternatively Pig could delegate blocks of execution to Oozie.
>>   - Scalability: Pig should support users who may not know the intricate
>>   details of the job/architecture. Things such as memory allocation and
>>   skew handling should happen automatically, without user involvement.
>>   - Allow pig to kill jobs already submitted if the shell exits due to a
>>   Control+C or other failures.
>>   - UDF 2.0 - simplify UDF interfaces, along with support for multiple
>>   versions of the UDF at the same time.
>>
>>
>> *General*
>>
>>   - Loops in Pig: No direct support, but available indirectly by
>>   integration with scripting environments.
>>   - Would be good to allow Pig to be provisioned across the cluster for
>>   faster job startup.
>>   - Pig-pen: not under active development and not supported.
>>
>>
>> On Fri, Feb 11, 2011 at 6:30 PM, Santhosh Srinivasan <s...@yahoo-inc.com>
>> wrote:
>>
>> > Arvind from Cloudera took excellent notes. You should see it next week
>> > after Alan gets a chance to review them.
>> >
>> > Santhosh
>> >
>> > -----Original Message-----
>> > From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com]
>> > Sent: Friday, February 11, 2011 5:34 PM
>> > To: dev@pig.apache.org
>> > Subject: Re: REMINDER: Pig developer meeting in February
>> >
>> > Hi folks,
>> > Any chance someone took notes? :)
>> >
>> > D
>> >
>> > On Tue, Feb 8, 2011 at 9:38 PM, Dmitriy Ryaboy <dvrya...@gmail.com>
>> > wrote:
>> > > Hi All,
>> > > I got sick and won't be able to make it. Would love to see some notes
>> > > after the meeting :).
>> > >
>> > > D
>> > >
>> > > On Tue, Feb 8, 2011 at 10:29 AM, Olga Natkovich <ol...@yahoo-inc.com>
>> > > wrote:
>> > >> Hi Guys,
>> > >>
>> > >> We are looking forward to seeing you tomorrow at 4 pm at Yahoo campus in
>> > >> Sunnyvale.
>> > >>
>> > >> Yahoo address is
>> > >>
>> > >> 701 First Ave.
>> > >> Sunnyvale, CA 94089
>> > >>
>> > >> We are in building E. Please, ask for Alan or me at the reception.
>> > >>
>> > >> Olga
>> > >>
>> > >> -----Original Message-----
>> > >> From: Olga Natkovich [mailto:ol...@yahoo-inc.com]
>> > >> Sent: Thursday, February 03, 2011 10:42 AM
>> > >> To: dev@pig.apache.org
>> > >> Subject: REMINDER: Pig developer meeting in February
>> > >>
>> > >> Hi guys,
>> > >>
>> > >> This is just a reminder that the meeting will be held next Wednesday,
>> > >> 2/9 4-6 pm at Yahoo campus.
>> > >>
>> > >> If you have not yet responded but are planning to attend, please let me
>> > >> know.
>> > >>
>> > >> Olga
>> > >>
>> > >> -----Original Message-----
>> > >> From: Santhosh Srinivasan [mailto:s...@yahoo-inc.com]
>> > >> Sent: Friday, January 28, 2011 3:36 PM
>> > >> To: dev@pig.apache.org
>> > >> Subject: RE: Pig developer meeting in February
>> > >>
>> > >> I am planning to attend.
>> > >>
>> > >> -----Original Message-----
>> > >> From: Olga Natkovich [mailto:ol...@yahoo-inc.com]
>> > >> Sent: Friday, January 28, 2011 12:58 PM
>> > >> To: dev@pig.apache.org
>> > >> Subject: RE: Pig developer meeting in February
>> > >>
>> > >> I believe we have critical mass so the meeting is on!
>> > >>
>> > >> If you have not responded yet but are planning to attend, please let me
>> > >> know.
>> > >>
>> > >> Thanks,
>> > >>
>> > >> Olga
>> > >>
>> > >> -----Original Message-----
>> > >> From: Julien Le Dem [mailto:led...@yahoo-inc.com]
>> > >> Sent: Thursday, January 27, 2011 5:21 PM
>> > >> To: dev@pig.apache.org
>> > >> Subject: Re: Pig developer meeting in February
>> > >>
>> > >> Me too.
>> > >> Julien
>> > >>
>> > >>
>> > >> On 1/27/11 4:09 PM, "Dmitriy Ryaboy" <dvrya...@gmail.com> wrote:
>> > >>
>> > >> Ok yeah I'll come :).
>> > >>
>> > >>
>> > >>
>> > >> On Thu, Jan 27, 2011 at 3:17 PM, Olga Natkovich <ol...@yahoo-inc.com>
>> > >> wrote:
>> > >>
>> > >>> While there is a lively discussion on this thread, I have not
>> > >>> actually gotten any responses to having the meeting, with the
>> > >>> exception of 1 person :).
>> > >>>
>> > >>> Please let me know by the end of the week if you are planning to
>> > >>> attend. If we don't get at least a few more responses I suggest we
>> > >>> postpone the meeting.
>> > >>>
>> > >>> Thanks,
>> > >>>
>> > >>> Olga
>> > >>>
>> > >>> -----Original Message-----
>> > >>> From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com]
>> > >>> Sent: Wednesday, January 26, 2011 6:04 PM
>> > >>> To: dev@pig.apache.org
>> > >>> Subject: Re: Pig developer meeting in February
>> > >>>
>> > >>> Right, we do partition filtering, but not true predicate pushdown.
>> > >>>
>> > >>> On Wed, Jan 26, 2011 at 5:59 PM, Daniel Dai <jiany...@yahoo-inc.com>
>> > >>> wrote:
>> > >>>
>> > >>> > Are you talking about LoadMetadata.setPartitionFilter?
>> > >>> > PartitionFilterOptimizer will do that.
>> > >>> >
>> > >>> > Daniel
>> > >>> >
>> > >>> >
>> > >>> > Dmitriy Ryaboy wrote:
>> > >>> >
>> > >>> >> I may be wrong but I think predicate pushdown is designed for,
>> > >>> >> but not actually implemented in, the current LoadPushdown
>> > >>> >> interface (you can only push projections). If I am wrong, that's
>> > >>> >> great, but if not, that would be an important feature to add, as
>> > >>> >> people are trying to connect Pig to "smart" storage systems like
>> > >>> >> RDBMSes, HBase, and Cassandra more and more. I think we only
>> > >>> >> kind of simulate this with partition key info, which is not
>> > >>> >> always sufficient.
>> > >>> >> D
>> > >>> >>
>> > >>> >> On Wed, Jan 26, 2011 at 2:41 PM, Julien Le Dem
>> > >>> >> <led...@yahoo-inc.com>
>> > >>> >> wrote:
>> > >>> >>
>> > >>> >>
>> > >>> >>
>> > >>> >>> If making Pig thread-safe (i.e., two threads each running a
>> > >>> >>> different Pig script) is important, then we need to change some
>> > >>> >>> of the APIs from static singleton access to a dependency
>> > >>> >>> injection pattern. In that case, this should probably be done
>> > >>> >>> before 1.0.
>> > >>> >>>
>> > >>> >>> For example, UDFContext should be passed to the UDF after
>> > >>> >>> construction (similar to the ServletContext in Servlet or the way
>> > >>> >>> Hadoop passes the context to tasks).
>> > >>> >>>
>> > >>> >>> Also, a clearly separated API that does not depend on the Pig
>> > >>> >>> implementation would help. For example, UDFContext is in
>> > >>> >>> org.apache.pig.impl.util when it would be better in
>> > >>> >>> org.apache.pig.api (or at least an interface defining it).
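
The dependency injection pattern suggested above can be illustrated with a small Python sketch. None of these names come from Pig: `UdfContext`, `UpperCaseUdf`, `set_context` and `run_script` are hypothetical stand-ins. The point is only that each script builds its own context and passes it to the UDF after construction, so two threads never share a static singleton.

```python
import threading
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class UdfContext:
    """Per-script settings; a hypothetical stand-in for a UDF context."""
    properties: Dict[str, str] = field(default_factory=dict)


class UpperCaseUdf:
    """A UDF that receives its context by injection, not via a static getter."""

    def __init__(self) -> None:
        self._context: Optional[UdfContext] = None

    def set_context(self, context: UdfContext) -> None:
        # Called by the runtime after construction.
        self._context = context

    def exec(self, value: str) -> str:
        if self._context is None:
            raise RuntimeError("context was not injected")
        return value.upper() + self._context.properties.get("suffix", "")


def run_script(name: str, suffix: str, results: Dict[str, str]) -> None:
    # Each "script" owns a private context and a private UDF instance.
    context = UdfContext({"suffix": suffix})
    udf = UpperCaseUdf()
    udf.set_context(context)
    results[name] = udf.exec("pig")


results: Dict[str, str] = {}
threads = [
    threading.Thread(target=run_script, args=(name, suffix, results))
    for name, suffix in [("a", "-1"), ("b", "-2")]
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Because no state lives in a class-level or module-level variable, the two runs cannot observe each other's settings regardless of how the threads interleave.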
>> > >>> >>>
>> > >>> >>> Julien
>> > >>> >>>
>> > >>> >>> On 1/24/11 10:14 AM, "Olga Natkovich" <ol...@yahoo-inc.com>
>> > >>> >>> wrote:
>> > >>> >>>
>> > >>> >>> Hi Guys,
>> > >>> >>>
>> > >>> >>> I think it is time for us to have another meeting. Yahoo would
>> > >>> >>> be happy to host if this works for everybody. How about
>> > >>> >>> Wednesday, 2/9 4-6 pm? Please let us know if you are planning to
>> > >>> >>> attend and if the date/time works for you.
>> > >>> >>>
>> > >>> >>> Things that come to mind to discuss, and as always feel free to
>> > >>> >>> suggest others:
>> > >>> >>>
>> > >>> >>> - Error handling proposal - this might be easier to finalize
>> > >>> >>>   face-to-face
>> > >>> >>> - Pig 0.9 plan
>> > >>> >>> - Pig Roadmap beyond 0.9
>> > >>> >>>    o What do we want to do in Pig.next?
>> > >>> >>>    o Are we ready for Pig 1.0?
>> > >>> >>>
>> > >>> >>> Olga
>> > >>> >>>
>> > >>> >>>
>> > >>> >>>
>> > >>> >>>
>> > >>> >>
>> > >>> >
>> > >>>
>> > >>
>> > >>
>> > >
>> >
>>
>
