There is related work that overlaps with this, though with (slightly) different goals and implementation:
http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper37.pdf
http://www.cidrdb.org/cidr2011/Talks/CIDR11_Ikeda.ppt

Ashutosh

On Mon, Feb 14, 2011 at 15:48, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> Thanks for that, Arvind.
>
> Y! folks, is there any public documentation for Penny?
> Is there overlap there with the error handling proposal?
>
> Also: do you think error handling can make it into 0.9, or are we thinking 0.10?
>
> D
>
> On Mon, Feb 14, 2011 at 12:55 PM, arv...@cloudera.com
> <arv...@cloudera.com> wrote:
>
>> Hi,
>>
>> Sorry for the delay in sending this. Following are the notes from the last
>> developers' meeting.
>>
>> Arvind
>> -----------
>> *Attendees*
>>
>> - From Y!: Alan, Santosh, Romain, Daniel, Richard, Ashutosh, Ben, Julian
>> - From Cloudera: Arvind
>>
>> *Agenda*
>>
>> - Error Handling
>> - Brainstorming Ideas For 0.9
>> - Brainstorming Ideas Beyond 0.9
>>
>> *Error Handling Suggestions/Proposal Discussion:*
>>
>> - Allow each statement to declare an ONERROR clause with a UDF that takes
>> control in case of error.
>> - This would be better than the current behavior of exiting on error.
>> - Alternatively, allow ONERROR to be declared for an entire
>> script/session, with individual statements able to override it and
>> provide a more specialized UDF for error handling.
>> - Yet another alternative: allow the specification of a threshold number
>> of errors that Pig ignores before exiting.
>> - Key idea is to ensure that the error handling is focused on data error
>> handling and not control flow.
>> - Action Item: Post the key proposal on the Wiki.
>>
>> *Brainstorming Ideas For 0.9:*
>>
>> - Internal development done by March
>> - Release tentatively by May
>> - Support for ILLUSTRATE.
>> - Current status:
>> - Parser rewrite almost complete
>> - Working on loading data according to schema - support for padding
>> missing values
>> - No support for Boolean type planned yet.
>> - Big features in 0.9
>> - Parser change
>> - Macro support
>> - Jython/Script support
>> - Penny (formerly Inspector Gadget): framework to instrument scripts.
>> Allows detection of bad records that cause failures, and implementation
>> of constraints.
>> - Works by integrating with the optimizer to produce wrappers for
>> key UDFs of interest.
>> - Agents can be added in different parts of the query.
>> - Prepackaged agents are available, but the framework allows the
>> creation of custom agents as needed.
>> - Pending work: implementation of unit tests, and turning this
>> into a patch.
>>
>> *Brainstorming Ideas Beyond 0.9:*
>>
>> - Support for different backends for Pig (MR, Piranha, Local, Oozie)
>> - Execution engine that can generate plans specific to the underlying
>> architecture and allow controlling routines to rewrite/re-optimize the
>> plan mid-execution.
>> - Thread safety when running local jobs - to allow better embedding of
>> Pig as a light-weight tool in web applications and other multi-threaded
>> environments.
>> - Work includes making UDF context thread-safe and removing statics
>> from the implementation.
>> - Will benefit Oozie and other systems that embed Pig, so they do not
>> have to worry about side effects.
>> - Allow execution to resume from where it left off after a runtime
>> failure.
>> - May be done by allowing Oozie as a backend where the plan is
>> converted into an Oozie workflow.
>> - Alternatively, Pig could delegate blocks of execution to Oozie.
>> - Scalability: Pig should support users who may not know the intricate
>> details of the job/architecture, handling things such as memory
>> allocation and skew automatically without user involvement.
>> - Allow Pig to kill jobs already submitted if the shell exits due to
>> Control+C or other failures.
>> - UDF 2.0 - simplify UDF interfaces, along with support for multiple
>> versions of the UDF at the same time.
>>
>> *General*
>>
>> - Loops in Pig: No direct support, but available indirectly through
>> integration with scripting environments.
>> - Would be good to allow Pig to be provisioned across the cluster for
>> faster job startup.
>> - Pig-pen: not under active development and not supported.
>>
>>
>> On Fri, Feb 11, 2011 at 6:30 PM, Santhosh Srinivasan <s...@yahoo-inc.com> wrote:
>>
>> > Arvind from Cloudera took excellent notes. You should see them next week
>> > after Alan gets a chance to review them.
>> >
>> > Santhosh
>> >
>> > -----Original Message-----
>> > From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com]
>> > Sent: Friday, February 11, 2011 5:34 PM
>> > To: dev@pig.apache.org
>> > Subject: Re: REMINDER: Pig developer meeting in February
>> >
>> > Hi folks,
>> > Any chance someone took notes? :)
>> >
>> > D
>> >
>> > On Tue, Feb 8, 2011 at 9:38 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
>> > > Hi All,
>> > > I got sick and won't be able to make it. Would love to see some notes
>> > > after the meeting :).
>> > >
>> > > D
>> > >
>> > > On Tue, Feb 8, 2011 at 10:29 AM, Olga Natkovich <ol...@yahoo-inc.com> wrote:
>> > >> Hi Guys,
>> > >>
>> > >> We are looking forward to seeing you tomorrow at 4 pm at the Yahoo
>> > >> campus in Sunnyvale.
>> > >>
>> > >> Yahoo address is
>> > >>
>> > >> 701 First Ave.
>> > >> Sunnyvale, CA 94089
>> > >>
>> > >> We are in building E. Please ask for Alan or me at the reception.
>> > >>
>> > >> Olga
>> > >>
>> > >> -----Original Message-----
>> > >> From: Olga Natkovich [mailto:ol...@yahoo-inc.com]
>> > >> Sent: Thursday, February 03, 2011 10:42 AM
>> > >> To: dev@pig.apache.org
>> > >> Subject: REMINDER: Pig developer meeting in February
>> > >>
>> > >> Hi guys,
>> > >>
>> > >> This is just a reminder that the meeting will be held next Wednesday,
>> > >> 2/9 4-6 pm at the Yahoo campus.
>> > >>
>> > >> If you have not yet responded but are planning to attend, please let
>> > >> me know.
>> > >>
>> > >> Olga
>> > >>
>> > >> -----Original Message-----
>> > >> From: Santhosh Srinivasan [mailto:s...@yahoo-inc.com]
>> > >> Sent: Friday, January 28, 2011 3:36 PM
>> > >> To: dev@pig.apache.org
>> > >> Subject: RE: Pig developer meeting in February
>> > >>
>> > >> I am planning to attend.
>> > >>
>> > >> -----Original Message-----
>> > >> From: Olga Natkovich [mailto:ol...@yahoo-inc.com]
>> > >> Sent: Friday, January 28, 2011 12:58 PM
>> > >> To: dev@pig.apache.org
>> > >> Subject: RE: Pig developer meeting in February
>> > >>
>> > >> I believe we have critical mass, so the meeting is on!
>> > >>
>> > >> If you have not responded yet but are planning to attend, please let
>> > >> me know.
>> > >>
>> > >> Thanks,
>> > >>
>> > >> Olga
>> > >>
>> > >> -----Original Message-----
>> > >> From: Julien Le Dem [mailto:led...@yahoo-inc.com]
>> > >> Sent: Thursday, January 27, 2011 5:21 PM
>> > >> To: dev@pig.apache.org
>> > >> Subject: Re: Pig developer meeting in February
>> > >>
>> > >> Me too.
>> > >> Julien
>> > >>
>> > >> On 1/27/11 4:09 PM, "Dmitriy Ryaboy" <dvrya...@gmail.com> wrote:
>> > >>
>> > >> Ok yeah I'll come :).
>> > >>
>> > >> On Thu, Jan 27, 2011 at 3:17 PM, Olga Natkovich <ol...@yahoo-inc.com> wrote:
>> > >>
>> > >>> While there is a lively discussion on this thread, I have not
>> > >>> actually gotten any responses about having the meeting, with the
>> > >>> exception of 1 person :).
>> > >>>
>> > >>> Please let me know by the end of the week if you are planning to
>> > >>> attend. If we don't get at least a few more responses, I suggest we
>> > >>> postpone the meeting.
>> > >>>
>> > >>> Thanks,
>> > >>>
>> > >>> Olga
>> > >>>
>> > >>> -----Original Message-----
>> > >>> From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com]
>> > >>> Sent: Wednesday, January 26, 2011 6:04 PM
>> > >>> To: dev@pig.apache.org
>> > >>> Subject: Re: Pig developer meeting in February
>> > >>>
>> > >>> Right, we do partition filtering, but not true predicate pushdown.
>> > >>>
>> > >>> On Wed, Jan 26, 2011 at 5:59 PM, Daniel Dai <jiany...@yahoo-inc.com> wrote:
>> > >>>
>> > >>> > Are you talking about LoadMetadata.setPartitionFilter?
>> > >>> > PartitionFilterOptimizer will do that.
>> > >>> >
>> > >>> > Daniel
>> > >>> >
>> > >>> > Dmitriy Ryaboy wrote:
>> > >>> >
>> > >>> >> I may be wrong, but I think predicate pushdown is designed for,
>> > >>> >> but not actually implemented in, the current LoadPushdown
>> > >>> >> interface (you can only push projections). If I am wrong, that's
>> > >>> >> great, but if not, that would be an important feature to add, as
>> > >>> >> people are trying to connect Pig to "smart" storage systems like
>> > >>> >> RDBMSes, HBase, and Cassandra more and more. I think we only kind
>> > >>> >> of simulate this with partition keys info, which is not always
>> > >>> >> sufficient.
>> > >>> >>
>> > >>> >> D
>> > >>> >>
>> > >>> >> On Wed, Jan 26, 2011 at 2:41 PM, Julien Le Dem <led...@yahoo-inc.com>
>> > >>> >> wrote:
>> > >>> >>
>> > >>> >>> If making Pig thread-safe (i.e., two threads each running a
>> > >>> >>> different Pig script) is important, then we need to change some
>> > >>> >>> of the APIs from static singleton access to a dependency
>> > >>> >>> injection pattern.
>> > >>> >>> In that case, this should probably be done before 1.0. For
>> > >>> >>> example, UDFContext should be passed to the UDF after
>> > >>> >>> construction (similar to the ServletContext in Servlet, or the
>> > >>> >>> way Hadoop passes the context to tasks). Also, a clearly
>> > >>> >>> separated API that does not depend on the Pig implementation
>> > >>> >>> would help. For example, UDFContext is in org.apache.pig.impl.util
>> > >>> >>> when it would be better in org.apache.pig.api (or at least an
>> > >>> >>> interface defining it).
>> > >>> >>>
>> > >>> >>> Julien
>> > >>> >>>
>> > >>> >>> On 1/24/11 10:14 AM, "Olga Natkovich" <ol...@yahoo-inc.com> wrote:
>> > >>> >>>
>> > >>> >>> Hi Guys,
>> > >>> >>>
>> > >>> >>> I think it is time for us to have another meeting. Yahoo would
>> > >>> >>> be happy to host if this works for everybody. How about
>> > >>> >>> Wednesday, 2/9 4-6 pm? Please let us know if you are planning
>> > >>> >>> to attend and if the date/time works for you.
>> > >>> >>>
>> > >>> >>> Things that come to mind to discuss (as always, feel free to
>> > >>> >>> suggest others):
>> > >>> >>>
>> > >>> >>> - Error handling proposal - this might be easier to
>> > >>> >>> finalize face-to-face
>> > >>> >>> - Pig 0.9 plan
>> > >>> >>> - Pig roadmap beyond 0.9
>> > >>> >>>   o What do we want to do in Pig.next?
>> > >>> >>>   o Are we ready for Pig 1.0?
>> > >>> >>>
>> > >>> >>> Olga
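The three error-handling options in the meeting notes (a per-statement ONERROR UDF, a script/session-wide ONERROR that individual statements can override, and a threshold of ignored errors) can be combined in one small sketch. This is a hypothetical Java illustration, not Pig's API and not the proposal that was to be posted on the Wiki: `ErrorHandler`, `ErrorPolicy` and `ErrorPolicyDemo` are invented names, a "data error" is modeled as a `RuntimeException` thrown while processing one record, and a handler returns a replacement record or `null` to drop it.

```java
import java.util.function.Function;

// Hypothetical handler that an ONERROR clause could name (not a Pig interface).
// Returns a replacement record, or null to drop the offending record.
interface ErrorHandler {
    String handle(String record, RuntimeException cause);
}

// Hypothetical policy combining the options from the notes: a statement-level
// handler, a script/session-level fallback, and a threshold of tolerated errors.
final class ErrorPolicy {
    private final ErrorHandler scriptHandler;    // script/session-wide ONERROR, may be null
    private final ErrorHandler statementHandler; // per-statement ONERROR, may be null
    private final long maxErrors;                // errors ignored before the job gives up
    private long errorCount = 0;

    ErrorPolicy(ErrorHandler scriptHandler, ErrorHandler statementHandler, long maxErrors) {
        this.scriptHandler = scriptHandler;
        this.statementHandler = statementHandler;
        this.maxErrors = maxErrors;
    }

    // A statement-level handler overrides the script-level one.
    private ErrorHandler effectiveHandler() {
        return statementHandler != null ? statementHandler : scriptHandler;
    }

    // Applies fn to one record. On a runtime error the error is counted; once the
    // count exceeds maxErrors the job aborts, otherwise the handler decides.
    String apply(String record, Function<String, String> fn) {
        try {
            return fn.apply(record);
        } catch (RuntimeException e) {
            errorCount++;
            if (errorCount > maxErrors) {
                throw new IllegalStateException(
                        "error threshold exceeded: " + errorCount + " errors", e);
            }
            ErrorHandler handler = effectiveHandler();
            // Threshold-only mode (no handler): the bad record is ignored (dropped).
            return handler == null ? null : handler.handle(record, e);
        }
    }

    long errorCount() {
        return errorCount;
    }
}

class ErrorPolicyDemo {
    public static void main(String[] args) {
        // No script-level handler; the statement tags bad records; tolerate 1 error.
        ErrorPolicy policy = new ErrorPolicy(null, (rec, e) -> "BAD:" + rec, 1);
        Function<String, String> doubler = s -> String.valueOf(Integer.parseInt(s) * 2);
        for (String rec : new String[] {"1", "x", "3"}) {
            System.out.println(policy.apply(rec, doubler));
        }
    }
}
```

Running `ErrorPolicyDemo` should print `2`, `BAD:x` and `6`; a second bad record would push the count past `maxErrors = 1` and abort with an `IllegalStateException`. Checking the threshold before calling the handler is one possible design choice: it keeps a handler from silently absorbing an unbounded number of bad records.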
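Julien's suggestion, and the thread-safety item in the meeting notes, come down to replacing static singleton access with a context that the framework injects into each UDF after construction. Below is a minimal Java sketch under stated assumptions: `UdfContext`, `MapUdfContext`, `ContextAwareUdf`, `PrefixUdf`, `ScriptRunner` and `UdfContextDemo` are hypothetical names, not Pig classes, and the context is reduced to a read-only string map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical implementation-independent interface (the kind of thing the
// thread suggests could live in an API package rather than in impl.util).
interface UdfContext {
    String get(String key);
}

// One context object per running script, instead of a process-wide static singleton.
final class MapUdfContext implements UdfContext {
    private final Map<String, String> props;

    MapUdfContext(Map<String, String> props) {
        this.props = new HashMap<>(props); // defensive copy, never mutated afterwards
    }

    @Override
    public String get(String key) {
        return props.get(key);
    }
}

// Hypothetical UDF base class: the framework calls setContext() after construction,
// much as a servlet container supplies a ServletContext.
abstract class ContextAwareUdf {
    private UdfContext context;

    final void setContext(UdfContext context) {
        this.context = context;
    }

    protected final UdfContext context() {
        return context;
    }

    abstract String exec(String input);
}

// Example UDF that reads a per-script setting from its injected context.
final class PrefixUdf extends ContextAwareUdf {
    @Override
    String exec(String input) {
        return context().get("prefix") + input;
    }
}

// Hypothetical framework entry point: construct the UDF, then inject this script's context.
final class ScriptRunner {
    static String run(Map<String, String> scriptProps, String input) {
        PrefixUdf udf = new PrefixUdf();
        udf.setContext(new MapUdfContext(scriptProps));
        return udf.exec(input);
    }
}

// Two "scripts" run on separate threads, each with its own context.
class UdfContextDemo {
    public static void main(String[] args) throws Exception {
        Map<String, String> propsA = new HashMap<>();
        propsA.put("prefix", "A:");
        Map<String, String> propsB = new HashMap<>();
        propsB.put("prefix", "B:");

        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<String> a = pool.submit(() -> ScriptRunner.run(propsA, "row"));
            Future<String> b = pool.submit(() -> ScriptRunner.run(propsB, "row"));
            System.out.println(a.get() + " " + b.get()); // prints "A:row B:row"
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each `ScriptRunner.run` call builds its own context instance, the two concurrently running "scripts" in `UdfContextDemo` cannot see each other's settings; with a single static context they would share, and potentially overwrite, the same state.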