Wonderful, Dmitriy. It's a pity I missed the contributor meeting. Were any slides (ppt) shared?
On Wed, Aug 25, 2010 at 8:32 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:

> Twitter hosted this month's Pig contributor meeting.
> Developers from Yahoo, Twitter, LinkedIn, RichRelevance, and Cloudera were
> present.
>
> 1. Howl
> First, Alan Gates demoed Howl, a project whose goal is to provide a table
> management service for all of Hadoop. The vision is that ultimately you will
> be able to write data using regular MR, Pig, or Hive, and read it
> using any of those three, with full support of a partition-aware metadata
> store that will tell you what data is available, what its schema is, etc.,
> reusing a single table abstraction.
>
> Currently, tables are created using (a restricted subset of) Hive DDL
> statements; a Howl CLI for this will be created, which will enforce the
> restricted subset.
> Writing to the table using Pig or MapReduce is supported. Reading can
> already be done using all three.
>
> At the moment, a single Pig store statement can only store into a single
> partition; adding the ability to "spray" across partitions is on the roadmap.
> This, and a good API for interacting with the metastore, are the two areas
> that were identified as good opportunities for the wider developer community
> to get involved with the project. The source code is on GitHub, and is at
> the moment synchronized with the development trunk manually; Yahoo folks
> will look into changing this.
>
> Security is a concern, and Yahoo will be working on it. Making it possible
> for Hive to write to the tables is at the moment not as high a priority as
> the others listed; it would basically involve just writing a Hive SerDe (an
> equivalent of Pig's StoreFunc).
>
> 2. Azkaban presentation
> Russel Jurney and Richard Park from LinkedIn presented the workflow
> management tool open-sourced by LinkedIn, called Azkaban. It allows you to
> declare job dependencies, has a web interface for launching and monitoring
> jobs, etc.
> It has a special exec mode for Pig that lets you set some
> Pig-specific options on a per-job basis. It does not currently have
> triggering or job-instance parameter substitution (it does have job-level
> parameter substitution). When asked what Pig could do to make life
> easier for Azkaban, the two things Richard identified were registering jars
> through the grunt command line and a way to monitor the running job -- both
> of these are already in trunk, so we're in pretty good shape for 0.8.
>
> 3. Piggybank discussion
> Kevin Weil led a discussion of the piggybank. There are a few problems with
> it -- it's released on the Pig schedule, and has quite a few barriers to
> submission that are, anecdotally at least, preventing people from
> contributing. Several options were discussed, with the group finally
> settling on starting a community-curated GitHub project for piggybank. It
> will have a number of committers from different companies, and will aim to
> make it easy for folks to contribute (all contribs will still have to have
> tests, and be Apache 2.0-licensed). More details will be forthcoming as we
> figure them out. Initially this project will be seeded with the current
> Piggybank functions some time after 0.8 is branched. The initial list of
> committers is Kevin Weil (Twitter), Dmitriy Ryaboy (Twitter), Carl Steinbach
> (Cloudera), and Russel Jurney (LinkedIn). Yahoo will also nominate someone.
> Please send us any thoughts you might have on this subject. It was suggested
> that a lot of common code might be shared with Hive UDFs, which have the
> same problems as Piggybank does, and that perhaps the project can be another
> collaboration point between the projects. It's not clear how that would work;
> Carl will talk to other Hive people.
>
> Pig 0.9
> So far the items on the list for 0.9 are: a better type propagation /
> resolution story and documentation, perhaps a different parser (ANTLR?), some
> performance tweaks, and map types with fixed-type values. Much is still to be
> decided.
>
> The next contributor meeting will be hosted by LinkedIn in October.
>
> -Dmitriy
>

--
Best Regards
Jeff Zhang