I also agree with this goal. As such, I think we should first see the proposal (JIRA?) for the storage-api refactoring and the other work related to ORC separating into a TLP before the actual separation happens, to make sure the separation is not done in a way that takes us further from this goal. It may very well be that this refactoring moves us closer to the goal, but seeing the proposal first would give a lot of clarity.

Thanks
Szehon

On Thu, Apr 2, 2015 at 10:20 PM, Edward Capriolo <[email protected]> wrote:

> To reiterate, one thing I want to avoid is having Hive rely on code that
> sits in several tiny silos across Apache projects, or Apache-licensed but
> non-ASF projects. Hive is a mature TLP with a large number of committers,
> and it would not be a good situation if work often gets bottlenecked
> because changes have to be made across two projects simultaneously to
> commit a feature, especially if the two projects do not share the same
> committer list.
>
> I think if it could be done perfectly, things like ORC, Parquet, whatever
> would be <provided> scope dependencies, meaning the project can be built
> without a particular piece but as a whole the project still works. (That
> might be easier said than done :)
>
> On Wed, Apr 1, 2015 at 2:51 PM, Nick Dimiduk <[email protected]> wrote:
>
> > I think the storage-api would be very helpful for HBase integration as
> > well.
> >
> > On Wed, Apr 1, 2015 at 11:22 AM, Owen O'Malley <[email protected]> wrote:
> >
> > > On Wed, Apr 1, 2015 at 10:10 AM, Alan Gates <[email protected]> wrote:
> > >
> > >> Carl Steinbach <[email protected]>
> > >> April 1, 2015 at 0:01
> > >>
> > >> Hi Owen,
> > >>
> > >> I think you're referring to the following questions I asked last week
> > >> on the PMC mailing list:
> > >>
> > >> 1) How much, if any, of the code for vectorization/sargs/ACID will
> > >> migrate over to the new ORC project?
> > >>
> > >> 2) Will Hive contributors encounter situations where they are required
> > >> to make changes to ORC in order to complete work on projects related to
> > >> vectorization/sargs/ACID or other Hive features?
> > >>
> > >> What I'd like to see here is well-defined interfaces in Hive so that
> > >> any storage format that wants can implement them. Hopefully that means
> > >> things like interfaces and utility classes for acid, sargs, and
> > >> vectorization move into this new Hive module storage-api. Then Orc,
> > >> Parquet, etc. can depend on this module without needing to pull in all
> > >> of Hive.
> > >>
> > >> Then Hive contributors would only be forced to make changes in Orc
> > >> when they want to implement something in Orc.
> > >
> > > Agreed. The goal of the new module is to keep a clean separation
> > > between the code for ORC and Hive so that vectorization, sargs, and
> > > acid are kept in Hive and are not moved to or duplicated in the ORC
> > > project.
> > >
> > > .. Owen
