Re: [DISCUSS] Making storage-api a separately released artifact

Alan Gates Wed, 17 Aug 2016 10:47:50 -0700

+1 for making the API clean and easy for other projects to work with.  A few 
questions:


1) Would this also make it easier for Parquet and others to implement Hive’s 
ACID interfaces?

2) Would we make any attempt to coordinate version numbers between Hive and the 
storage module, or would a given version of Hive just depend on a given version 
of the storage module?

Alan.

> On Aug 15, 2016, at 17:01, Owen O'Malley <[email protected]> wrote:
> 
> All,
> 
> As part of moving ORC out of Hive, we pulled all of the vectorization
> storage and sarg classes into a separate module, which is named
> storage-api.  Although it is currently only used by ORC, it could be used
> by Parquet or Avro if they wanted to make a fast vectorized reader that
> reads directly into Hive's VectorizedRowBatch without needing a shim or
> data copy. Note that this is in many ways similar to pulling the Arrow
> project out of Drill.
> 
> This unfortunately still leaves us with a circular dependency between Hive
> and ORC. I'd hoped that storage-api would stay fairly stable, but it keeps
> changing. As a result, ORC ends up shipping its own
> fork of storage-api.
> 
> Although we could make a new project for just the storage-api, I think it
> would be better to make it a subproject of Hive that is released
> independently.
> 
> What do others think?
> 
>   Owen
