Slides about Azkaban and Pig:
http://www.slideshare.net/rjurney/azkaban-pig-5057793

On Thu, Aug 26, 2010 at 12:55 AM, Jeff Zhang <zjf...@gmail.com> wrote:

> Wonderful, Dmitriy. It's a pity I missed the contributor meeting.
> Were any slides shared?
>
>
>
> On Wed, Aug 25, 2010 at 8:32 PM, Dmitriy Ryaboy <dvrya...@gmail.com>
> wrote:
> > Twitter hosted this month's Pig contributor meeting.
> > Developers from Yahoo, Twitter, LinkedIn, RichRelevance, and Cloudera
> > were present.
> >
> > 1. Howl
> > First, Alan Gates demoed Howl, a project whose goal is to provide a
> > table management service for all of Hadoop. The vision is that
> > ultimately you will be able to write data using regular MR, Pig, or
> > Hive, and read it using any of those three, with full support of a
> > partition-aware metadata store that will tell you what data is
> > available, what its schema is, etc., reusing a single table abstraction.
> >
> > Currently, tables are created using (a restricted subset of) Hive DDL
> > statements; a Howl CLI for this will be created, which will enforce the
> > restricted subset.
> > Writing to the table using Pig or MapReduce is supported. Reading can
> > already be done using all three.
> >
> > At the moment, a single Pig store statement can only store into a single
> > partition; adding the ability to "spray" across partitions is on the
> > roadmap. This, and a good API for interacting with the metastore, are
> > the two areas that were identified as good opportunities for the wider
> > developer community to get involved with the project. The source code is
> > on GitHub, and is at the moment synchronized with the development trunk
> > manually; Yahoo folks will look into changing this.
> >
> > Security is a concern, and Yahoo will be working on it. Making it
> > possible for Hive to write to the tables is at the moment not as high a
> > priority as the others listed; it would basically involve just writing a
> > Hive SerDe (an equivalent of Pig's StoreFunc).
> >
> > 2. Azkaban presentation
> > Russell Jurney and Richard Park from LinkedIn presented the workflow
> > management tool open-sourced by LinkedIn, called Azkaban. It allows you
> > to declare job dependencies, has a web interface for launching and
> > monitoring jobs, etc. It has a special exec mode for Pig that lets you
> > set some Pig-specific options on a per-job basis. It does not currently
> > have triggering or job-instance parameter substitution (it does have
> > job-level parameter substitution). When asked what Pig could do to make
> > life easier for Azkaban, the two things Richard identified were
> > registering jars through the Grunt command line and a way to monitor the
> > running job -- both of these are already in trunk, so we're in pretty
> > good shape for 0.8.
> >
> > 3. Piggybank discussion
> > Kevin Weil led a discussion of the piggybank. There are a few problems
> > with it -- it's released on the Pig schedule, and has quite a few
> > barriers to submission that are, anecdotally at least, preventing people
> > from contributing. Several options were discussed, with the group
> > finally settling on starting a community-curated GitHub project for
> > piggybank. It will have a number of committers from different companies,
> > and will aim to make it easy for folks to contribute (all contribs will
> > still have to have tests, and be Apache 2.0-licensed). More details will
> > be forthcoming as we figure them out. Initially this project will be
> > seeded with the current Piggybank functions some time after 0.8 is
> > branched. The initial list of committers is Kevin Weil (Twitter),
> > Dmitriy Ryaboy (Twitter), Carl Steinbach (Cloudera), and Russell Jurney
> > (LinkedIn). Yahoo will also nominate someone. Please send us any
> > thoughts you might have on this subject. It was suggested that a lot of
> > common code might be shared with Hive UDFs, which have the same problems
> > as Piggybank does, and that perhaps the project can be another
> > collaboration point between the projects. It is not clear how that would
> > work; Carl will talk to other Hive people.
> >
> > 4. Pig 0.9
> > So far the items on the list for 0.9 are: a better type propagation /
> > resolution story and documentation, perhaps a different parser (ANTLR?),
> > some performance tweaks, and map types with fixed-type values. Much is
> > still to be decided.
> >
> > The next contributor meeting will be hosted by LinkedIn in October.
> >
> > -Dmitriy
> >
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>
