If I understood Alan's comment #2 correctly, we are moving the existing ORC code out of Hive and making it a separate project, which I definitely missed. Since the existing Hive PMC has governance over the code, I would expect that to remain the case even after the spinoff. Obviously the proposal doesn't reflect this.

Thanks,
Xuefu

On Fri, Apr 3, 2015 at 12:51 PM, Alan Gates <alanfga...@gmail.com> wrote:

> A couple of points:
>
> 1) ORC isn't going into the incubator. The proposal before the board is
> for it to go straight to TLP. There's no graduation to depend on.
> 2) As currently proposed Hive would not depend on ORC to build. Hive
> users who wished to use ORC would obviously need to pull in ORC artifacts
> in addition to Hive. Given this I don't think it makes any sense to fork
> ORC and have it in both places. This actually seems the worse outcome, as
> the two will inevitably diverge.
>
> Alan.
>
> Xuefu Zhang <xzh...@cloudera.com>
> April 3, 2015 at 6:41
> I actually have a different thought to share along the same line.
>
> ORC is not a subproject in Hive. I'm not sure the best we can do is to
> perform surgery on Hive in order to make ORC a TLP. Not only may this
> bring instability to Hive, but it also makes Hive depend on an incubating
> project. Not every project graduates (though I do wish ORC success as a
> TLP); some of them fail.
>
> Instead, I like the idea of forking Hive ORC as a TLP while Hive keeps
> whatever it has. This way, the new project can do whatever it wants, and
> the Hive community probably doesn't care and has no say in it. Once ORC
> as a TLP graduates, the Hive community can decide whether to go along
> with it and, if so, how to integrate with it.
>
> I think this will calm the current controversy, help ORC proceed faster
> as a TLP, and leave the decision to the near future.
>
> Thanks,
> Xuefu
>
> Szehon Ho <sze...@cloudera.com>
> April 2, 2015 at 23:54
> I also agree with this goal.
>
> As such, I think we should first see the proposal (JIRA?) for the
> storage-api refactoring and other work related to ORC separating as a TLP
> before the actual separation happens, to make sure the separation is not
> done in a way that takes us further from this goal. It may very well be
> that this refactoring moves us closer to the goal, but seeing the proposal
> first would give a lot of clarity.
>
> Thanks
> Szehon
>
> On Thu, Apr 2, 2015 at 10:20 PM, Edward Capriolo <edlinuxg...@gmail.com>
>
> Edward Capriolo <edlinuxg...@gmail.com>
> April 2, 2015 at 22:20
> To reiterate, one thing I want to avoid is having Hive rely on code that
> sits in several tiny silos across Apache projects, or in projects that are
> Apache Licensed but not ASF projects. Hive is a mature TLP with a large
> number of committers, and it would not be a good situation if work often
> gets bottlenecked because changes had to be made across two projects
> simultaneously to commit a feature. Especially if the two projects do not
> share the same committer list.
>
> I think if it could be done perfectly, things like ORC, Parquet, whatever
> would be <provided> scope dependencies, meaning the project can be built
> without a particular piece but as a whole the project still works. (That
> might be easier said than done :)
>
> Nick Dimiduk <ndimi...@gmail.com>
> April 1, 2015 at 11:51
> I think the storage-api would be very helpful for HBase integration as
> well.
>
> Owen O'Malley <omal...@apache.org>
> April 1, 2015 at 11:22
>
>> What I'd like to see here is well defined interfaces in Hive so that any
>> storage format that wants can implement them. Hopefully that means things
>> like interfaces and utility classes for acid, sargs, and vectorization move
>> into this new Hive module storage-api. Then Orc, Parquet, etc. can depend
>> on this module without needing to pull in all of Hive.
>>
>> Then Hive contributors would only be forced to make changes in Orc when
>> they want to implement something in Orc.
>
> Agreed. The goal of the new module is to keep a clean separation between
> the code for ORC and Hive so that vectorization, sargs, and acid are kept
> in Hive and are not moved to or duplicated in the ORC project.
>
> .. Owen