Hey, there are slides from Chris Olston's talk:
http://infolab.stanford.edu/infoseminar/olston.txt
http://infolab.stanford.edu/infoseminar/olston-slides.pdf

But more formal documentation about Penny/InspectorGadget (cool name btw) would be awesome.

Renato M.

2011/2/14 Olga Natkovich <ol...@yahoo-inc.com>

> We do not have anything public about Penny yet - still trying to figure
> out when/if it is going out. Don't think there is a whole lot of interaction
> with the error handling proposal but I will let Alan comment on that.
>
> Given that the error handling proposal is still not finalized and 0.9
> already has lots of changes and little time left, I would suggest delaying
> it to the release after 0.9.
>
> Olga
>
> -----Original Message-----
> From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com]
> Sent: Monday, February 14, 2011 3:49 PM
> To: dev@pig.apache.org
> Subject: Re: REMINDER: Pig developer meeting in February
>
> Thanks for that, arvind.
>
> Y! folks, is there any public documentation for Penny?
> Is there overlap there with the error handling proposal?
>
> Also: think error handling can make it into 0.9 or are we thinking 0.10?
>
> D
>
> On Mon, Feb 14, 2011 at 12:55 PM, arv...@cloudera.com
> <arv...@cloudera.com> wrote:
>
> > Hi,
> >
> > Sorry for the delay in sending this. Following are the notes from the last
> > developer's meeting.
> >
> > Arvind
> > -----------
> > *Attendees*
> >
> > - From Y!: Alan, Santhosh, Romain, Daniel, Richard, Ashutosh, Ben, Julien
> > - From Cloudera: Arvind
> >
> > *Agenda*
> >
> > - Error Handling
> > - Brainstorming Ideas For 0.9
> > - Brainstorming Ideas Beyond 0.9
> >
> > *Error Handling Suggestions/Proposal Discussion:*
> >
> > - Allow each statement to declare an ONERROR clause with a UDF to handle
> >   the control in case of error.
> >   - This would be better than the current behavior of exiting on error.
> > - Alternatively, allow ONERROR to be declared for an entire
> >   script/session, which would allow individual statements to override it
> >   and provide a more specialized UDF for error handling.
> > - Yet another alternative - allow the specification of a threshold number
> >   of errors that Pig ignores before exiting.
> > - Key idea is to ensure that the error handling is focused on data error
> >   handling and not control-flow.
> > - Action Item: Post the key proposal on the Wiki.
> >
> > *Brainstorming Ideas For 0.9:*
> >
> > - Internal development done by March
> > - Release tentatively by May
> > - Support for ILLUSTRATE.
> > - Current status:
> >   - Parser rewrite almost complete
> >   - Working on loading data according to schema - support for padding
> >     missing values
> >   - No support for Boolean type planned yet.
> > - Big features in 0.9
> >   - Parser change
> >   - Macro support
> >   - Jython/Script support
> >   - Penny (formerly Inspector Gadget): framework to instrument scripts.
> >     Allows detection of bad records that cause failures, and
> >     implementation of constraints.
> >     - Works by integrating with the optimizer to produce wrappers for
> >       key UDFs of interest.
> >     - Agents can be added in different parts of the query
> >     - Prepackaged agents available, but the framework allows the creation
> >       of custom agents as needed.
> >     - Pending work - implementation of unit tests, and turning this
> >       into a patch.
> >
> > *Brainstorming Ideas Beyond 0.9:*
> >
> > - Support for different backends for Pig (MR, Piranha, Local, Oozie)
> >   - Execution engine that can generate plans specific to the underlying
> >     architecture and allow controlling routines to rewrite/re-optimize
> >     the plan mid-execution.
> > - Thread safety when running local jobs - to allow better embedding of
> >   Pig as a light-weight tool in web-applications and other multi-threaded
> >   environments.
> >   - Work includes making UDF context thread-safe and removing statics
> >     from the implementation.
> >   - Will benefit Oozie and other systems that embed Pig without having
> >     to worry about side-effects.
> > - Allow execution to resume from where it left off after a runtime
> >   failure.
> >   - May be done by allowing Oozie as a backend where the plan is
> >     converted into an Oozie workflow.
> >   - Alternatively Pig could delegate blocks of execution to Oozie.
> > - Scalability: Pig should support users who may not know the intricate
> >   details of the job/architecture. Things such as memory allocation and
> >   skew handling should happen automatically, without user involvement.
> > - Allow Pig to kill jobs already submitted if the shell exits due to a
> >   Control+C or other failures.
> > - UDF 2.0 - simplify UDF interfaces, along with support for multiple
> >   versions of the UDF at the same time.
> >
> >
> > *General*
> >
> > - Loops in Pig: No direct support, but available indirectly by
> >   integration with scripting environments.
> > - Would be good to allow Pig to be provisioned across the cluster for
> >   faster job startup.
> > - Pig-pen: not under active development and not supported.
> >
> >
> > On Fri, Feb 11, 2011 at 6:30 PM, Santhosh Srinivasan <s...@yahoo-inc.com> wrote:
> >
> > > Arvind from Cloudera took excellent notes. You should see them next week
> > > after Alan gets a chance to review them.
> > >
> > > Santhosh
> > >
> > > -----Original Message-----
> > > From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com]
> > > Sent: Friday, February 11, 2011 5:34 PM
> > > To: dev@pig.apache.org
> > > Subject: Re: REMINDER: Pig developer meeting in February
> > >
> > > Hi folks,
> > > Any chance someone took notes? :)
> > >
> > > D
> > >
> > > On Tue, Feb 8, 2011 at 9:38 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> > > > Hi All,
> > > > I got sick and won't be able to make it. Would love to see some notes
> > > > after the meeting :).
> > > >
> > > > D
> > > >
> > > > On Tue, Feb 8, 2011 at 10:29 AM, Olga Natkovich <ol...@yahoo-inc.com> wrote:
> > > >> Hi Guys,
> > > >>
> > > >> We are looking forward to seeing you tomorrow at 4 pm at Yahoo campus in Sunnyvale.
> > > >>
> > > >> Yahoo address is
> > > >>
> > > >> 701 First Ave.
> > > >> Sunnyvale, CA 94089
> > > >>
> > > >> We are in building E. Please, ask for Alan or me at the reception.
> > > >>
> > > >> Olga
> > > >>
> > > >> -----Original Message-----
> > > >> From: Olga Natkovich [mailto:ol...@yahoo-inc.com]
> > > >> Sent: Thursday, February 03, 2011 10:42 AM
> > > >> To: dev@pig.apache.org
> > > >> Subject: REMINDER: Pig developer meeting in February
> > > >>
> > > >> Hi guys,
> > > >>
> > > >> This is just a reminder that the meeting will be held next Wednesday,
> > > >> 2/9 4-6 pm at Yahoo campus.
> > > >>
> > > >> If you have not yet responded but planning to attend, please, let me know.
> > > >>
> > > >> Olga
> > > >>
> > > >> -----Original Message-----
> > > >> From: Santhosh Srinivasan [mailto:s...@yahoo-inc.com]
> > > >> Sent: Friday, January 28, 2011 3:36 PM
> > > >> To: dev@pig.apache.org
> > > >> Subject: RE: Pig developer meeting in February
> > > >>
> > > >> I am planning to attend.
> > > >>
> > > >> -----Original Message-----
> > > >> From: Olga Natkovich [mailto:ol...@yahoo-inc.com]
> > > >> Sent: Friday, January 28, 2011 12:58 PM
> > > >> To: dev@pig.apache.org
> > > >> Subject: RE: Pig developer meeting in February
> > > >>
> > > >> I believe we have critical mass so the meeting is on!
> > > >>
> > > >> If you have not responded yet but planning to attend, please, let me know.
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Olga
> > > >>
> > > >> -----Original Message-----
> > > >> From: Julien Le Dem [mailto:led...@yahoo-inc.com]
> > > >> Sent: Thursday, January 27, 2011 5:21 PM
> > > >> To: dev@pig.apache.org
> > > >> Subject: Re: Pig developer meeting in February
> > > >>
> > > >> Me too.
> > > >> Julien
> > > >>
> > > >>
> > > >> On 1/27/11 4:09 PM, "Dmitriy Ryaboy" <dvrya...@gmail.com> wrote:
> > > >>
> > > >> Ok yeah I'll come :).
> > > >>
> > > >> On Thu, Jan 27, 2011 at 3:17 PM, Olga Natkovich <ol...@yahoo-inc.com> wrote:
> > > >>
> > > >>> While there is a lively discussion on this thread, I have not
> > > >>> actually gotten any responses to having the meeting with exception of
> > > >>> 1 person :).
> > > >>>
> > > >>> Please, let me know by the end of the week if you are planning to attend.
> > > >>> If we don't get at least a few more responses I suggest we postpone
> > > >>> the meeting.
> > > >>>
> > > >>> Thanks,
> > > >>>
> > > >>> Olga
> > > >>>
> > > >>> -----Original Message-----
> > > >>> From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com]
> > > >>> Sent: Wednesday, January 26, 2011 6:04 PM
> > > >>> To: dev@pig.apache.org
> > > >>> Subject: Re: Pig developer meeting in February
> > > >>>
> > > >>> Right, we do partition filtering, but not true predicate pushdown.
> > > >>>
> > > >>> On Wed, Jan 26, 2011 at 5:59 PM, Daniel Dai <jiany...@yahoo-inc.com> wrote:
> > > >>>
> > > >>> > Are you talking about LoadMetadata.setPartitionFilter?
> > > >>> > PartitionFilterOptimizer will do that.
> > > >>> >
> > > >>> > Daniel
> > > >>> >
> > > >>> >
> > > >>> > Dmitriy Ryaboy wrote:
> > > >>> >
> > > >>> >> I may be wrong but I think predicate pushdown is designed for,
> > > >>> >> but not actually implemented in the current LoadPushdown
> > > >>> >> interface (you can only push projections). If I am wrong, that's
> > > >>> >> great.. but if not, that would be
> > > >>> >> an important feature to add, as people are trying to connect Pig
> > > >>> >> to "smart"
> > > >>> >> storage systems like rdbmses, HBase, and Cassandra more and more.
> > > >>> >> I think
> > > >>> >> we only kind of simulate this with partition keys info, which is
> > > >>> >> not always sufficient
> > > >>> >>
> > > >>> >> D
> > > >>> >>
> > > >>> >> On Wed, Jan 26, 2011 at 2:41 PM, Julien Le Dem
> > > >>> >> <led...@yahoo-inc.com> wrote:
> > > >>> >>
> > > >>> >>> If making Pig thread safe (i.e., two threads running different
> > > >>> >>> Pig scripts) is important, then we need to change some of the APIs
> > > >>> >>> from static singleton access to a dependency injection pattern.
> > > >>> >>> In that case, this should probably be done before 1.0. For example:
> > > >>> >>> UDFContext should be passed to the UDF after construction
> > > >>> >>> (similar to the ServletContext in Servlet or the way Hadoop
> > > >>> >>> passes the context to tasks). Also, a clearly separated API that
> > > >>> >>> does not depend on the Pig implementation would help.
> > > >>> >>> For example, UDFContext is in org.apache.pig.impl.util when it
> > > >>> >>> would be better in org.apache.pig.api (or at least an interface
> > > >>> >>> defining it).
> > > >>> >>>
> > > >>> >>> Julien
> > > >>> >>>
> > > >>> >>> On 1/24/11 10:14 AM, "Olga Natkovich" <ol...@yahoo-inc.com> wrote:
> > > >>> >>>
> > > >>> >>> Hi Guys,
> > > >>> >>>
> > > >>> >>> I think it is time for us to have another meeting. Yahoo would
> > > >>> >>> be happy to host if this works for everybody. How about
> > > >>> >>> Wednesday, 2/9 4-6 pm?
> > > >>> >>> Please let us know if you are planning to attend and if the
> > > >>> >>> date/time works for you.
> > > >>> >>>
> > > >>> >>> Things that come to mind to discuss and as always feel free to
> > > >>> >>> suggest others:
> > > >>> >>>
> > > >>> >>> - Error handling proposal - this might be easier to
> > > >>> >>>   finalize face-to-face
> > > >>> >>> - Pig 0.9 plan
> > > >>> >>> - Pig Roadmap beyond 0.9
> > > >>> >>>   o What do we want to do in Pig.next?
> > > >>> >>>   o Are we ready for Pig 1.0
> > > >>> >>>
> > > >>> >>> Olga
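
A side note on the error-threshold idea from the meeting notes ("a threshold number of errors that Pig ignores before exiting"): below is a minimal sketch of that behavior in plain Python. It is not Pig code and not the proposal's actual design. The names `apply_with_error_threshold`, `ErrorThresholdExceeded`, `max_errors`, and `on_error` are all hypothetical, made up for illustration, and `on_error` only loosely stands in for the ONERROR handler UDF discussed in the notes.

```python
class ErrorThresholdExceeded(RuntimeError):
    """Raised once more records have failed than the threshold allows."""


def apply_with_error_threshold(records, func, max_errors, on_error=None):
    """Apply func to each record, tolerating up to max_errors failures.

    Hypothetical illustration only: bad records are skipped (and handed to
    on_error, if given) instead of aborting the whole run on the first error.
    Returns (results, error_count).
    """
    results = []
    errors = 0
    for record in records:
        try:
            results.append(func(record))
        except Exception as exc:
            errors += 1
            if on_error is not None:
                # Data-error handling only: the handler sees the bad record
                # and the exception, but does not steer control flow.
                on_error(record, exc)
            if errors > max_errors:
                raise ErrorThresholdExceeded(
                    f"{errors} bad records exceeds threshold of {max_errors}"
                ) from exc
    return results, errors


if __name__ == "__main__":
    bad_records = []
    rows = ["1", "2", "oops", "4"]
    parsed, n_errors = apply_with_error_threshold(
        rows, int, max_errors=1,
        on_error=lambda rec, exc: bad_records.append(rec),
    )
    print(parsed, n_errors, bad_records)
```

With `max_errors=0` the same call would fail on the first bad record, which roughly corresponds to the current exit-on-error behavior described in the notes.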