I also agree with this goal.

As such, I think we should first see the proposal (JIRA?) for the
storage-api refactoring and the other work related to Orc becoming a TLP
before the actual separation happens, to make sure the separation is not
done in a way that takes us further from this goal.  It may very well be
that this refactoring moves us closer to the goal, but seeing the proposal
first would give a lot of clarity.

Thanks
Szehon

On Thu, Apr 2, 2015 at 10:20 PM, Edward Capriolo <edlinuxg...@gmail.com>
wrote:

> To reiterate, one thing I want to avoid is having Hive rely on code that
> sits in several tiny silos across Apache projects, or Apache Licensed but
> not ASF projects. Hive is a mature TLP with a large number of committers
> and it would not be a good situation if work often gets bottlenecked
> because changes have to be made across two projects simultaneously to commit
> a feature. Especially if the two projects do not share the same committer
> list.
>
> I think if it could be done perfectly, things like ORC, Parquet, whatever would
> be <provided> scope dependencies, meaning the project can be built without
> a particular piece but as a whole the project still works. (That might be
> easier said than done :)
>
> On Wed, Apr 1, 2015 at 2:51 PM, Nick Dimiduk <ndimi...@gmail.com> wrote:
>
> > I think the storage-api would be very helpful for HBase integration as
> > well.
> >
> > On Wed, Apr 1, 2015 at 11:22 AM, Owen O'Malley <omal...@apache.org>
> > wrote:
> >
> > >
> > >
> > > On Wed, Apr 1, 2015 at 10:10 AM, Alan Gates <alanfga...@gmail.com>
> > > wrote:
> > >
> > >>
> > >>
> > >>   Carl Steinbach <cwsteinb...@gmail.com>
> > >>  April 1, 2015 at 0:01
> > >>
> > >> Hi Owen,
> > >>
> > >> I think you're referring to the following questions I asked last week on
> > >> the PMC mailing list:
> > >>
> > >> 1) How much, if any, of the code for vectorization/sargs/ACID will migrate
> > >> over to the new ORC project?
> > >>
> > >> 2) Will Hive contributors encounter situations where they are required to
> > >> make changes to ORC in order to complete work on projects related to
> > >> vectorization/sargs/ACID or other Hive features?
> > >>
> > >>  What I'd like to see here is well-defined interfaces in Hive so that any
> > >> storage format that wants to can implement them.  Hopefully that means things
> > >> like interfaces and utility classes for acid, sargs, and vectorization move
> > >> into this new Hive module storage-api.  Then Orc, Parquet, etc. can depend
> > >> on this module without needing to pull in all of Hive.
> > >>
> > >> Then Hive contributors would only be forced to make changes in Orc when
> > >> they want to implement something in Orc.
> > >>
> > >
> > > Agreed. The goal of the new module is to keep a clean separation between the
> > > code for ORC and Hive so that vectorization, sargs, and acid are kept in
> > > Hive and are not moved to or duplicated in the ORC project.
> > >
> > > .. Owen
> > >
> >
>
