To reiterate, one thing I want to avoid is having Hive rely on code that sits in several tiny silos across Apache projects, or in Apache-licensed but non-ASF projects. Hive is a mature TLP with a large number of committers, and it would not be a good situation if work often gets bottlenecked because changes have to be made across two projects simultaneously to commit a feature, especially if the two projects do not share the same committer list.
I think that, if it could be done perfectly, things like ORC, Parquet, or whatever would be <provided> scope dependencies, meaning the project can be built without a particular piece but as a whole the project still works. (That might be easier said than done :)

On Wed, Apr 1, 2015 at 2:51 PM, Nick Dimiduk <ndimi...@gmail.com> wrote:

> I think the storage-api would be very helpful for HBase integration as
> well.
>
> On Wed, Apr 1, 2015 at 11:22 AM, Owen O'Malley <omal...@apache.org> wrote:
>
> > On Wed, Apr 1, 2015 at 10:10 AM, Alan Gates <alanfga...@gmail.com> wrote:
> >
> >> Carl Steinbach <cwsteinb...@gmail.com>
> >> April 1, 2015 at 0:01
> >>
> >> Hi Owen,
> >>
> >> I think you're referring to the following questions I asked last week on
> >> the PMC mailing list:
> >>
> >> 1) How much if any of the code for vectorization/sargs/ACID will migrate
> >> over to the new ORC project.
> >>
> >> 2) Will Hive contributors encounter situations where they are required to
> >> make changes to ORC in order to complete work on projects related to
> >> vectorization/sargs/ACID or other Hive features?
> >>
> >> What I'd like to see here is well defined interfaces in Hive so that any
> >> storage format that wants can implement them. Hopefully that means things
> >> like interfaces and utility classes for acid, sargs, and vectorization move
> >> into this new Hive module storage-api. Then Orc, Parquet, etc. can depend
> >> on this module without needing to pull in all of Hive.
> >>
> >> Then Hive contributors would only be forced to make changes in Orc when
> >> they want to implement something in Orc.
> >>
> >
> > Agreed. The goal of the new module keep a clean separation between the
> > code for ORC and Hive so that vectorization, sargs, and acid are kept in
> > Hive and are not moved to or duplicated in the ORC project.
> >
> > .. Owen
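
For what it's worth, here is a rough sketch in Java of the "build without a particular piece, but the whole still works" idea. Everything named here (StorageFormat, TextFormat, the org.example class name) is hypothetical and is not an existing Hive or storage-api class. It only illustrates looking up an optional format at runtime and falling back when its jar is absent.

```java
import java.util.Optional;

class OptionalFormatSketch {

    /** Hypothetical interface that a shared storage-api module might expose. */
    public interface StorageFormat {
        String name();
    }

    /** Hypothetical built-in format that always ships with the core build. */
    public static final class TextFormat implements StorageFormat {
        @Override
        public String name() {
            return "text";
        }
    }

    /**
     * Try to load an optional format by class name. Returns empty when the
     * class is not on the runtime classpath or cannot be instantiated.
     */
    public static Optional<StorageFormat> tryLoad(String className) {
        try {
            Class<?> cls = Class.forName(className);
            Object instance = cls.getDeclaredConstructor().newInstance();
            if (instance instanceof StorageFormat) {
                return Optional.of((StorageFormat) instance);
            }
            return Optional.empty();
        } catch (ReflectiveOperationException | LinkageError e) {
            // The optional piece is missing or unusable; the caller falls back.
            return Optional.empty();
        }
    }

    /** Use the optional format when present, otherwise the built-in one. */
    public static StorageFormat formatOrDefault(String className) {
        return tryLoad(className).orElseGet(TextFormat::new);
    }

    public static void main(String[] args) {
        // Made-up class name standing in for a format jar that is not installed.
        System.out.println(formatOrDefault("org.example.MissingOrcFormat").name());
    }
}
```

The core code only ever compiles against the small interface, so a contributor would need to touch the format's own project only when changing that format's implementation.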