Re: requirements for Pig 1.0?

Alan Gates Wed, 24 Jun 2009 10:03:11 -0700

Integration with Owl is something we want for 1.0. I am hopeful that by Pig's 1.0 Owl will have flown the coop and become either a subproject or found a home in Hadoop's common, since it will hopefully be used by multiple other subprojects.


Alan.


On Jun 23, 2009, at 11:42 PM, Russell Jurney wrote:

For 1.0 - complete Owl?

http://wiki.apache.org/pig/Metadata

Russell Jurney
rjur...@cloudstenography.com


On Jun 23, 2009, at 4:40 PM, Alan Gates wrote:
I don't believe there's a solid list of want to haves for 1.0. The big issue I see is that there are too many interfaces that are still shifting, such as:
1) Data input/output formats. The way we do slicing (that is, user provided InputFormats) and the equivalent outputs aren't yet solid. They are still too tied to load and store functions. We need to break those out and understand how they will be expressed in the language. Related to this is the semantics of how Pig interacts with non-file based inputs and outputs. We have a suggestion of moving to URLs, but we haven't finished test driving this to see if it will really be what we want.
2) The memory model. While technically the choices we make on how to represent things in memory are internal, the reality is that these changes may affect the way we read and write tuples and bags, which in turn may affect our load, store, eval, and filter functions.
3) SQL. We're working on introducing SQL soon, and it will take it a few releases to be fully baked.
4) Much better error messages. In 0.2 our error messages made a leap forward, but before we can claim to be 1.0 I think they need to make 2 more leaps: 1) they need to be written in a way end users can understand them instead of in a way engineers can understand them, including having sufficient error documentation with suggested courses of action, etc.; 2) they need to be much better at tying errors back to where they happened in the script, right now if one of the MR jobs associated with a Pig Latin script fails there is no way to know what part of the script it is associated with.
There are probably others, but those are the ones I can think of off the top of my head. The summary from my viewpoint is we still have several 0.x releases before we're ready to consider 1.0. It would be nice to be 1.0 not too long after Hadoop is, which still gives us at least 6-9 months.
Alan.


On Jun 22, 2009, at 10:58 AM, Dmitriy Ryaboy wrote:
I know there was some discussion of making the types release (0.2) a "Pig 1" release, but that got nixed. There wasn't a similar discussion on 0.3.
Has the list of want-to-haves for Pig 1.0 been discussed since?
