A couple of points:

1) ORC isn't going into the incubator. The proposal before the board is for it to go straight to TLP, so there's no graduation to depend on.
2) As currently proposed, Hive would not depend on ORC to build. Hive users who wished to use ORC would need to pull in ORC artifacts in addition to Hive.

Given this, I don't think it makes any sense to fork ORC and have it in both places. That actually seems the worse outcome, as the two will inevitably diverge.

Alan.

Xuefu Zhang <mailto:xzh...@cloudera.com>
April 3, 2015 at 6:41
I actually have a different thought to share along the same line.

ORC is not a subproject in Hive. I'm not sure that performing surgery on
Hive in order to make ORC a TLP is the best we can do. Not only may this
bring instability to Hive, but it also makes Hive depend on an incubating
project. Not every project graduates (though I do wish ORC success as a
TLP); some of them fail.

Instead, I like the idea of forking Hive's ORC as a TLP while Hive keeps
whatever it has. This way, the new project can do whatever it wants, and the
Hive community probably doesn't care and has no say in it. Once ORC as a TLP
graduates, the Hive community can decide whether to go along with it and, if
so, how to integrate with it.

I think this will calm the current controversy, help ORC proceed faster
as a TLP, and leave the decision to the near future.

Thanks,
Xuefu


Szehon Ho <mailto:sze...@cloudera.com>
April 2, 2015 at 23:54
I also agree with this goal.

As such, I think we should first see the proposal (JIRA?) for the
storage-api refactoring and other work related to separating ORC as a TLP
before the actual separation happens, to make sure the separation is not
done in a way that takes us further from this goal. It may very well be that
this refactoring moves us closer to the goal, but seeing the proposal first
would give a lot of clarity.

Thanks
Szehon

Edward Capriolo <mailto:edlinuxg...@gmail.com>
April 2, 2015 at 22:20
To reiterate, one thing I want to avoid is having Hive rely on code that
sits in several tiny silos across Apache projects, or in Apache-licensed but
non-ASF projects. Hive is a mature TLP with a large number of committers,
and it would not be a good situation if work often gets bottlenecked
because changes have to be made across two projects simultaneously to commit
a feature, especially if the two projects do not share the same committer
list.

I think if it could be done perfectly, things like ORC, Parquet, whatever would
be <provided> scope dependencies, meaning the project can be built without
a particular piece but as a whole the project still works. (That might be
easier said than done :)
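
As a rough illustration of the idea (a sketch, not code from Hive): a project can treat a format as optional by probing for it at runtime instead of linking against it unconditionally. The class name `org.example.orc.OrcReader` below is hypothetical and stands in for whatever entry point a separately shipped format would provide; `java.lang.String` stands in for something that is always on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OptionalFormats {
    // Hypothetical class names, for illustration only.
    static final Map<String, String> CANDIDATES = new LinkedHashMap<>();
    static {
        CANDIDATES.put("orc", "org.example.orc.OrcReader"); // assumed to be absent here
        CANDIDATES.put("builtin", "java.lang.String");      // always present
    }

    // Returns true if the named class can be found, without initializing it.
    static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, OptionalFormats.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Maps each format name to whether its entry point is on the classpath.
    static Map<String, Boolean> probe() {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> c : CANDIDATES.entrySet()) {
            result.put(c.getKey(), isAvailable(c.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Formats that are missing are reported, not treated as fatal.
        probe().forEach((name, present) ->
            System.out.println(name + ": " + (present ? "enabled" : "not installed")));
    }
}
```

The project as a whole keeps working when the optional artifact is missing; only the feature that needs it is disabled.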


Nick Dimiduk <mailto:ndimi...@gmail.com>
April 1, 2015 at 11:51
I think the storage-api would be very helpful for HBase integration as well.


Owen O'Malley <mailto:omal...@apache.org>
April 1, 2015 at 11:22




    What I'd like to see here is well defined interfaces in Hive so
    that any storage format that wants can implement them.  Hopefully
    that means things like interfaces and utility classes for acid,
    sargs, and vectorization move into this new Hive module
    storage-api.  Then Orc, Parquet, etc. can depend on this module
    without needing to pull in all of Hive.

    Then Hive contributors would only be forced to make changes in Orc
    when they want to implement something in Orc.


Agreed. The goal of the new module is to keep a clean separation between the code for ORC and Hive, so that vectorization, sargs, and acid are kept in Hive and are not moved to or duplicated in the ORC project.
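
A minimal sketch of that kind of separation, assuming the shared module exposes only interfaces. All names here (`RowReader`, `InMemoryFormat`, `scan`) are hypothetical and are not Hive or ORC APIs; the predicate stands in loosely for a pushed-down filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class StorageApiSketch {
    // Would live in the shared interface module (hypothetical).
    interface RowReader {
        // Returns only the rows accepted by the filter.
        List<Integer> read(Predicate<Integer> filter);
    }

    // Would live in a storage-format project; depends only on RowReader.
    static class InMemoryFormat implements RowReader {
        private final List<Integer> rows;

        InMemoryFormat(List<Integer> rows) {
            this.rows = rows;
        }

        @Override
        public List<Integer> read(Predicate<Integer> filter) {
            List<Integer> out = new ArrayList<>();
            for (Integer row : rows) {
                if (filter.test(row)) {
                    out.add(row);
                }
            }
            return out;
        }
    }

    // Would live in the query engine; it also sees only RowReader.
    static List<Integer> scan(RowReader reader, int min) {
        return reader.read(v -> v >= min);
    }

    public static void main(String[] args) {
        RowReader reader = new InMemoryFormat(Arrays.asList(1, 5, 10));
        System.out.println(scan(reader, 5));
    }
}
```

Neither side references the other's classes directly, so a change in the engine does not force a change in the format unless the shared interface itself changes.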

.. Owen
