The reason for a branch is purely the fair number of improvements
we are planning for Zebra and our desire to have a stable Zebra
implementation for users to use along with PIG on Hadoop-0.20.
New features planned (JIRAs will be filed soon):
* Column security (different permissions for different columns)
* Ability to drop columns
* Ability to address "column groups" by name
* Support for sorted tables, map-side joins
* ...
Many of these changes involve changes to table metadata, schema syntax,
and the on-disk format of the metadata (all of these will be backward
compatible).
If Zebra were a project of its own, one would have made a 0.1.0 branch
and worked on new features in the trunk. The proposed branch achieves
the same thing while keeping PIG and a stable Zebra together. The PIG
0.4.0 branch will be made when it is appropriate for PIG. Generally, a
contrib project should not influence that decision.
Is there an alternative to creating a branch? Would you prefer we commit
new features to a line that is being used by users?
Raghu.
Milind A Bhandarkar wrote:
IANAC, but my (non-binding) vote is also -1. I think all the improvements
and feature additions to Zebra should be available through PIG trunk. The
codebase is not big enough to justify creating a branch. If the reason is
PIG's dependence on a checked-in Hadoop jar, the shims proposal by Dmitry
should be taken up ASAP, so that those who want to use Zebra can use PIG
trunk with Hadoop 0.20.
- milind
On 8/17/09 5:14 PM, "Yiping Han" <y...@yahoo-inc.com> wrote:
+1
On 8/18/09 7:11 AM, "Olga Natkovich" <ol...@yahoo-inc.com> wrote:
+1
-----Original Message-----
From: Raghu Angadi [mailto:rang...@yahoo-inc.com]
Sent: Monday, August 17, 2009 4:06 PM
To: pig-dev@hadoop.apache.org
Subject: Proposal to create a branch for contrib project Zebra
Thanks to the PIG team, the first version of contrib project Zebra
(PIG-833) is committed to PIG trunk.
In short, Zebra is a table storage layer built for use in PIG and other
Hadoop applications.
While we are stabilizing the current version (V1) in the trunk, we plan
to add more new features to it. We would like to create an svn branch for
the new features. We will be responsible for managing Zebra in PIG trunk
and in the new branch. We will merge the branch when it is ready. We
expect the changes to affect only the 'contrib/zebra' directory.
As a regular contributor to Hadoop, I will be the initial committer for
Zebra. As more patches are contributed by other Zebra developers, more
committers might be added through the normal Hadoop/Apache procedure.
I would like to create a branch called 'zebra-v2' with approval from the
PIG team.
Thanks,
Raghu.
----------------------
Yiping Han
F-3140
(408)349-4403
y...@yahoo-inc.com