Speaking of the C++ ORC reader and writer, could they be included in the Hive project or do they have to be separate because they aren't Java code?
By the way, gmail thwarts adding [DISCUSS] to the subject line. It shows up in the mail archives, although pre- & post-DISCUSS threads are separate.

-- Lefty

On Fri, Apr 10, 2015 at 11:56 PM, Gopal Vijayaraghavan <gop...@apache.org> wrote:

> On 4/10/15, 8:05 PM, "Xuefu Zhang" <xzh...@cloudera.com> wrote:
>
> >To Owen's explanation - Thanks. I guess my major concern is that we
> >seemingly are breaking apart Hive's integrity and making it hard to
> >release and maintain due to increasing number of external dependents.
> >Let's say that Hive depends on a certain version of ORC (as TLP) and
> >it's found that ORC has a bug that seriously impacts Hive users. We
> >cannot release Hive as fast as we can, since doing so would need ORC
> >community to fix the problem and make a release, for which Hive PMC
> >has no control. On the contrary, Hive community can quickly fix the
> >problem and make a release without waiting for other projects to make
> >a release. I'm not sure this move (ORC as TLP) will be beneficial to
> >vast Hive users.
>
> You need to understand exactly what this brings about for Hive, in fact to
> those who do not use ORC today.
>
> With the proposed changes, competing formats like Parquet might be able to
> compete with ORC in terms of hive features.
>
> That is the direct impact of standardization of a Storage-API
> implementation.
>
> As an independent project, new ORC features cannot use the fact that it is
> included in the ql/ source to introduce circular dependencies between
> ql.exec -> orc -> ql.exec.vector classes.
>
> As far as your concern for risks go, I would ask for a comparison against
> the bugs/release cycles of "STORED AS PARQUET".
>
> As a Hive contributor, I'm certain that if I find a core issue in Parquet,
> my patches would be welcome there.
>
> That should be beneficial to the Parquet community, but might not be
> aligned entirely along employer lines, since my patch might be good, but
> my intention would be to migrate warehouses with
> parquet.hive.DeprecatedParquetInputFormat Impala tables to Hive.
>
> Resolving that conflict should be ideally left to the Parquet IPMC & the
> ASF rather than the Hive PMC (or let's do a bias check *to* Hive?).
>
> Now - reverse that argument and replay it, except instead we're talking
> about the C++ ORC reader plus a non-ASF SQL competitor to Hive.
>
> >If this is not convincing, let me propose that we spin off metastore
> >also as TLP tomorrow!
>
> http://incubator.apache.org/projects/hcatalog.html
>
> Cheers,
> Gopal