Hi,
Sorry for the delay in sending this. Below are the notes from the last
developers' meeting.
Arvind
-----------
*Attendees*
- From Y!: Alan, Santhosh, Romain, Daniel, Richard, Ashutosh, Ben, Julien
- From Cloudera: Arvind
*Agenda*
- Error Handling
- Brainstorming Ideas For 0.9
- Brainstorming Ideas Beyond 0.9
*Error Handling Suggestions/Proposal Discussion:*
- Allow each statement to declare an ONERROR clause naming a UDF that takes
control when an error occurs.
- This would be better than the current behavior of exiting on error.
- Alternatively, allow ONERROR to be declared for an entire
script/session which would allow individual statements to override and
provide a more specialized UDF for error handling.
- Yet another alternative - allow the specification of a threshold number
of errors that Pig ignores before exiting.
- Key idea is to ensure that the error handling is focused on data error
handling and not control-flow.
- Action Item: Post the key proposal on the Wiki.
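
To make the threshold and ONERROR ideas above concrete, here is a minimal
sketch in plain Python. It is not Pig code and not the proposal itself: the
names apply_with_onerror, on_error, max_errors and ErrorThresholdExceeded are
all hypothetical, made up for illustration. It only shows the intended
behavior: a per-record handler deals with data errors, and the run aborts
once more than a set number of records have failed.

```python
from typing import Callable, Iterable, Iterator, Optional


class ErrorThresholdExceeded(RuntimeError):
    """Raised once more records have failed than the threshold allows."""


def apply_with_onerror(
    records: Iterable[str],
    udf: Callable[[str], int],
    on_error: Callable[[str, Exception], Optional[int]],
    max_errors: int,
) -> Iterator[int]:
    """Apply udf to each record and route failures to on_error.

    on_error (hypothetical handler) may return a substitute value,
    or None to drop the bad record. Processing stops once more than
    max_errors records have failed.
    """
    errors = 0
    for record in records:
        try:
            value = udf(record)
        except Exception as exc:  # a data error in a single record
            errors += 1
            if errors > max_errors:
                raise ErrorThresholdExceeded(
                    f"{errors} bad records exceeds threshold of {max_errors}"
                ) from exc
            substitute = on_error(record, exc)
            if substitute is not None:
                yield substitute
        else:
            yield value


# Usage: tolerate one bad record and drop it.
rows = ["1", "2", "oops", "4"]
cleaned = list(apply_with_onerror(rows, int, lambda rec, exc: None, max_errors=1))
```

The handler only sees the failing record and the exception, which keeps it
about data errors rather than control flow, as discussed in the meeting.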
*Brainstorming Ideas For 0.9:*
- Internal development done by March
- Release tentatively by May
- Support for ILLUSTRATE.
- Current status:
- Parser rewrite almost complete
- Working on loading data according to schema, including support for
padding missing values
- No support for Boolean type planned yet.
- Big features in 0.9
- Parser change
- Macro support
- Jython/Script support
- Penny (formerly Inspector Gadget): framework to instrument scripts.
Allows detection of bad records that cause failures and implementation of
constraints.
- Works by integrating with the optimizer to produce wrappers for
key UDFs of interest.
- Agents can be added in different parts of the query
- Prepackaged agents available, but framework allows the creation
of custom agents as needed.
- Pending work - implementation of unit tests, and turning this
into a patch.
*Brainstorming Ideas Beyond 0.9:*
- Support for different backends for Pig (MR, Piranha, Local, Oozie)
- Execution engine that can generate plans specific to the underlying
architecture and allow controlling routines to rewrite/re-optimize the
plan mid-execution.
- Thread safety when running local jobs - to allow better embedding of
Pig as a light-weight tool in web-applications and other multi-threaded
environments.
- Work includes making UDF context thread-safe and removing statics
from the implementation.
- Will benefit Oozie and other systems that embed Pig without having
to worry about side-effects.
- Allow execution to resume from where it left off after a runtime
failure.
- May be done by allowing Oozie as a backend where the plan is
converted into an Oozie workflow.
- Alternatively Pig could delegate blocks of execution to Oozie.
- Scalability: Pig should support users who may not know the intricate
details of the job/architecture. Things such as memory allocation and skew
handling should happen automatically, without user involvement.
- Allow Pig to kill jobs already submitted if the shell exits due to a
Control+C or other failures.
- UDF 2.0 - simplify UDF interfaces, along with support for multiple
versions of the UDF at the same time.
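
As a rough illustration of the thread-safety item above (making the UDF
context thread-safe and removing statics), here is a small sketch in plain
Python rather than Pig's Java code. UdfContext, TagWithScript, set_context
and run_script are hypothetical names, not Pig APIs. The point is only the
pattern: each running script builds its own context and hands it to the UDF
after construction, so two scripts in the same process share no static state.

```python
import threading
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UdfContext:
    """Hypothetical per-script context; a stand-in, not Pig's UDFContext."""
    script_name: str
    properties: Dict[str, str] = field(default_factory=dict)


class TagWithScript:
    """Hypothetical UDF that is given its context instead of reading a static."""

    def set_context(self, context: UdfContext) -> None:
        self._context = context

    def exec(self, value: str) -> str:
        return f"{self._context.script_name}:{value}"


def run_script(name: str, rows: List[str], out: Dict[str, List[str]]) -> None:
    udf = TagWithScript()
    udf.set_context(UdfContext(script_name=name))  # injected after construction
    out[name] = [udf.exec(row) for row in rows]


# Two "scripts" run concurrently, each with its own context object.
results: Dict[str, List[str]] = {}
threads = [
    threading.Thread(target=run_script, args=(name, ["a", "b"], results))
    for name in ("script1", "script2")
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With a process-wide static context, the second script could overwrite what
the first one reads; with injection, each UDF instance only ever sees the
context it was handed.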
*General*
- Loops in Pig: No direct support, but available indirectly by
integration with scripting environments.
- Would be good to allow Pig to be provisioned across the cluster for
faster job startup.
- Pig-pen: not under active development and not supported.
On Fri, Feb 11, 2011 at 6:30 PM, Santhosh Srinivasan <[email protected]> wrote:
> Arvind from Cloudera took excellent notes. You should see it next week
> after Alan gets a chance to review them.
>
> Santhosh
>
> -----Original Message-----
> From: Dmitriy Ryaboy [mailto:[email protected]]
> Sent: Friday, February 11, 2011 5:34 PM
> To: [email protected]
> Subject: Re: REMINDER: Pig developer meeting in February
>
> Hi folks,
> Any chance someone took notes? :)
>
> D
>
> On Tue, Feb 8, 2011 at 9:38 PM, Dmitriy Ryaboy <[email protected]> wrote:
> > Hi All,
> > I got sick and won't be able to make it. Would love to see some notes
> > after the meeting :).
> >
> > D
> >
> > On Tue, Feb 8, 2011 at 10:29 AM, Olga Natkovich <[email protected]>
> wrote:
> >> Hi Guys,
> >>
> >> We are looking forward to see you tomorrow at 4 pm at Yahoo campus in
> Sunnyvale.
> >>
> >> Yahoo address is
> >>
> >> 701 First Ave.
> >> Sunnyvale, CA 94089
> >>
> >> We are in building E. Please, ask for Alan or me at the reception.
> >>
> >> Olga
> >>
> >> -----Original Message-----
> >> From: Olga Natkovich [mailto:[email protected]]
> >> Sent: Thursday, February 03, 2011 10:42 AM
> >> To: [email protected]
> >> Subject: REMINDER: Pig developer meeting in February
> >>
> >> Hi guys,
> >>
> >> This is just a reminder that the meeting will be held next Wednesday,
> 2/9 4-6 pm at Yahoo campus.
> >>
> >> If you have not yet responded but planning to attend, please, let me
> know.
> >>
> >> Olga
> >>
> >> -----Original Message-----
> >> From: Santhosh Srinivasan [mailto:[email protected]]
> >> Sent: Friday, January 28, 2011 3:36 PM
> >> To: [email protected]
> >> Subject: RE: Pig developer meeting in February
> >>
> >> I am planning to attend.
> >>
> >> -----Original Message-----
> >> From: Olga Natkovich [mailto:[email protected]]
> >> Sent: Friday, January 28, 2011 12:58 PM
> >> To: [email protected]
> >> Subject: RE: Pig developer meeting in February
> >>
> >> I believe we have critical mass so the meeting is on!
> >>
> >> If you have not responded yet but planning to attend, please, let me
> know.
> >>
> >> Thanks,
> >>
> >> Olga
> >>
> >> -----Original Message-----
> >> From: Julien Le Dem [mailto:[email protected]]
> >> Sent: Thursday, January 27, 2011 5:21 PM
> >> To: [email protected]
> >> Subject: Re: Pig developer meeting in February
> >>
> >> Me too.
> >> Julien
> >>
> >>
> >> On 1/27/11 4:09 PM, "Dmitriy Ryaboy" <[email protected]> wrote:
> >>
> >> Ok yeah I'll come :).
> >>
> >>
> >>
> >> On Thu, Jan 27, 2011 at 3:17 PM, Olga Natkovich <[email protected]>
> wrote:
> >>
> >>> While there is a lively discussion on this thread, I have not
> >>> actually gotten any responses to having the meeting with exception of 1
> person :).
> >>>
> >>> Please, let me know by the end of the week if you are planning to
> attend.
> >>> If we don't get at least a few more responses I suggest we postpone
> >>> the meeting.
> >>>
> >>> Thanks,
> >>>
> >>> Olga
> >>>
> >>> -----Original Message-----
> >>> From: Dmitriy Ryaboy [mailto:[email protected]]
> >>> Sent: Wednesday, January 26, 2011 6:04 PM
> >>> To: [email protected]
> >>> Subject: Re: Pig developer meeting in February
> >>>
> >>> Right, we do partition filtering, but not true predicate pushdown.
> >>>
> >>> On Wed, Jan 26, 2011 at 5:59 PM, Daniel Dai <[email protected]>
> >>> wrote:
> >>>
> >>> > Are you talking about LoadMetadata.setPartitionFilter?
> >>> > PartitionFilterOptimizer will do that.
> >>> >
> >>> > Daniel
> >>> >
> >>> >
> >>> > Dmitriy Ryaboy wrote:
> >>> >
> >>> >> I may be wrong but I think predicate pushdown is designed for,
> >>> >> but not actually implemented in the current LoadPushdown
> >>> >> interface (you can only push projections). If I am wrong, that's
> >>> >> great.. but if not, that would
> >>> be
> >>> >> an important feature to add, as people are trying to connect Pig
> >>> >> to "smart"
> >>> >> storage systems like rdbmses, HBase, and Cassandra more and more.
> >>> >> I
> >>> think
> >>> >> we only kind of simulate this with partition keys info, which is
> >>> >> not always sufficient
> >>> >>
> >>> >> D
> >>> >>
> >>> >> On Wed, Jan 26, 2011 at 2:41 PM, Julien Le Dem
> >>> >> <[email protected]>
> >>> >> wrote:
> >>> >>
> >>> >>
> >>> >>
> >>> >>> If making Pig Thread safe (i.e.: two threads running a different
> >>> >>> pig
> >>> >>> script) is important then we need to change some of the APIs
> >>> >>> from
> >>> static
> >>> >>> singleton access to a dependency injection pattern.
> >>> >>> In that case, this should probably be done before 1.0 For example:
> >>> >>> UDFContext should be passed to the UDF after construction
> >>> >>> (similar to the ServletContext in Servlet or the way Hadoop
> >>> >>> passes the context to tasks) Also a clearly separated API that
> >>> >>> does not depend on the Pig implementation would help.
> >>> >>> For example UDFContext is in org.apache.pig.impl.util when it
> >>> >>> would be better in org.apache.pig.api (Or at least an interface
> >>> >>> defining it)
> >>> >>>
> >>> >>> Julien
> >>> >>>
> >>> >>> On 1/24/11 10:14 AM, "Olga Natkovich" <[email protected]> wrote:
> >>> >>>
> >>> >>> Hi Guys,
> >>> >>>
> >>> >>> I think it is time for us to have another meeting. Yahoo would
> >>> >>> be happy to host if this works for everybody. How about
> >>> >>> Wednesday,
> >>> >>> 2/9 4-6 pm.
> >>> >>> Please,
> >>> >>> let us know if you are planning to attend and if the date/time
> >>> >>> works
> >>> for
> >>> >>> you.
> >>> >>>
> >>> >>> Things that come to mind to discuss and as always feel free to
> >>> >>> suggest
> >>> >>> others:
> >>> >>>
> >>> >>> - Error handling proposal - this might be easier to
> >>> >>> finalize face-to-face
> >>> >>> - Pig 0.9 plan
> >>> >>> - Pig Roadmap beyond 0.9 o What do we want to do
> >>> >>> in Pig.next?
> >>> >>> o Are we ready for Pig 1.0
> >>> >>>
> >>> >>> Olga
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>
> >>> >
> >>>
> >>
> >>
> >
>