Hi,
You are right, I will add an option to use your own compiled class or a
dynamic message.
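
For illustration, the read side could then be configured roughly like this
(a sketch only; the config key and the MyRecord class are assumptions until
the change is actually committed):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import parquet.proto.ProtoParquetInputFormat;

    Configuration conf = new Configuration();
    // Hypothetical option: force deserialization into a compiled class
    // instead of deriving one from the Parquet file footer. Without it,
    // the reader would fall back to a dynamic message built from the
    // schema stored in the footer.
    conf.set("parquet.proto.class", "com.example.MyRecord");

    Job job = Job.getInstance(conf);
    job.setInputFormatClass(ProtoParquetInputFormat.class);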

Lukas

On Sun, Oct 26, 2014 at 8:27 PM, Chen Song <[email protected]> wrote:

> Hi,
>
> I am new to Parquet and we have a complicated use case in which we want to
> adopt Parquet as our storage format.
>
> Current:
>
>    - The data is stored in Sequence files as Protobuf.
>    - We have map reduce jobs to write the data. Hive tables were created
>    with the elephant-bird Protobuf SerDe so people can query the data via
>    Hive.
>    - We enhanced elephant-bird with our own serializer so one can also
>    write data into the table via Hive, with the data stored in Sequence
>    files as Protobuf.
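>
> For context, the write side today looks roughly like this (simplified; the
> MyRecord class is a placeholder for our actual Protobuf type):
>
>     import com.twitter.elephantbird.mapreduce.io.ProtobufWritable;
>     import org.apache.hadoop.io.NullWritable;
>     import org.apache.hadoop.mapreduce.Job;
>     import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
>
>     Job job = Job.getInstance(conf);
>     // Values are Protobuf messages wrapped in elephant-bird's
>     // ProtobufWritable so they can be stored in Sequence files.
>     job.setOutputFormatClass(SequenceFileOutputFormat.class);
>     job.setOutputKeyClass(NullWritable.class);
>     job.setOutputValueClass(ProtobufWritable.class);
>
>     // Inside the reducer:
>     ProtobufWritable<MyRecord> writable =
>         ProtobufWritable.newInstance(MyRecord.class);
>     writable.set(record);
>     context.write(NullWritable.get(), writable);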
>
>
> Future:
> We want to use Parquet as the underlying storage format without losing the
> Protobuf abstraction at the application layer. After a bit of research and
> experimentation, I have a few questions.
>
>    - Say a Hive table is created as a Parquet table, and data is written
>    via Hive.
>    - If I want to read the data in map reduce jobs as Protobuf records,
>    can I use ProtoParquetInputFormat from
> https://github.com/Parquet/parquet-mr/blob/master/parquet-protobuf/src/main/java/parquet/proto/ProtoParquetInputFormat.java
>    ? After looking at the API, it doesn't seem possible to specify the
>    Protobuf class for the input path. Instead, ProtoParquetInputFormat
>    derives the class from the footer of the underlying data. Is it fair
>    to say that ProtoParquetInputFormat will only read data written by
>    ProtoParquetOutputFormat? Is there a way to work around this? (See the
>    sketch after this list.)
>      - If not, is there an out-of-the-box Hive output format I can use to
>      piggyback on ProtoParquetOutputFormat?
>    - If data is written by a map reduce job with ProtoParquetOutputFormat,
>    will read queries in Hive work automatically?
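>
> To make the read/write questions concrete, this is roughly how I expected
> to wire up a map reduce job (a sketch; MyRecord and the paths are
> placeholders for our actual types and locations):
>
>     import org.apache.hadoop.fs.Path;
>     import org.apache.hadoop.mapreduce.Job;
>     import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
>     import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>     import parquet.proto.ProtoParquetInputFormat;
>     import parquet.proto.ProtoParquetOutputFormat;
>
>     Job job = Job.getInstance(conf);
>
>     // Read side: there is no setter for the Protobuf class, so the
>     // record type is derived from the Parquet footer, which seems to
>     // require data written by ProtoParquetOutputFormat.
>     job.setInputFormatClass(ProtoParquetInputFormat.class);
>     FileInputFormat.addInputPath(job, new Path("/data/in"));
>
>     // Write side: declare the Protobuf class so the Parquet schema and
>     // footer metadata can be generated from its descriptor.
>     job.setOutputFormatClass(ProtoParquetOutputFormat.class);
>     ProtoParquetOutputFormat.setProtobufClass(job, MyRecord.class);
>     FileOutputFormat.setOutputPath(job, new Path("/data/out"));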
>
> Thanks a lot in advance. Any suggestions would be appreciated.
>
> --
> Chen Song
>
