[
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842804#comment-13842804
]
Carl Steinbach commented on HIVE-5783:
--------------------------------------
[~brocknoland] Up to this point we have reserved first-class support for data
formats in Hive (i.e. changing the grammar) to formats that are implemented
natively in the Hive source repository. I think we should maintain this
convention. There are a couple of options available if we feel that it's important
for users to be able to create Parquet formatted tables using the abbreviated
syntax:
# Add a format registry feature to Hive that allows admins to register
third-party SerDe implementations and associate them with a format keyword that
users can reference in a DDL statement.
# Maintain two copies of the Parquet SerDe implementation -- one in Hive and
one in the parquet-mr repository -- and backport patches between these
repositories as necessary. If users want to use the parquet-mr version of the
SerDe with Hive, they may do so by referencing the third-party package name in
their DDL, as sketched below.
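For illustration, a rough sketch of the two DDL forms (the abbreviated keyword assumes the grammar change under discussion; the third-party class names and jar path are only illustrative of the parquet-mr packaging and may not match a specific release):
{code:sql}
-- Abbreviated syntax, which requires first-class (grammar-level) support:
CREATE TABLE events_parquet (id BIGINT, name STRING)
STORED AS PARQUET;

-- Explicit syntax referencing the third-party classes directly; no grammar
-- change needed, only the jar on the classpath (names are illustrative):
ADD JAR /path/to/parquet-hive-bundle.jar;
CREATE TABLE events_parquet (id BIGINT, name STRING)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS
  INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
  OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';
{code}
A format registry (option 1) would essentially let an admin bind a keyword such as PARQUET to a SerDe/InputFormat/OutputFormat triple like the one above, so users get the abbreviated syntax without the SerDe code living in the Hive repository.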
On a side note, I think the ticket summary "Native Parquet Support in Hive" is
misleading. Users who see this description in the release notes will conclude
that the Parquet SerDe code lives in Hive when the exact opposite is true.
> Native Parquet Support in Hive
> ------------------------------
>
> Key: HIVE-5783
> URL: https://issues.apache.org/jira/browse/HIVE-5783
> Project: Hive
> Issue Type: New Feature
> Reporter: Justin Coffey
> Assignee: Justin Coffey
> Priority: Minor
> Fix For: 0.11.0
>
> Attachments: HIVE-5783.patch, hive-0.11-parquet.patch
>
>
> Problem Statement:
> Hive would be easier to use if it had native Parquet support. Our
> organization, Criteo, uses Hive extensively. Therefore we built the Parquet
> Hive integration and would now like to contribute that integration to Hive.
> About Parquet:
> Parquet is a columnar storage format for Hadoop and integrates with many
> Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading,
> Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native
> Parquet integration.
> Change Details:
> Parquet was built with dependency management in mind and therefore only a
> single Parquet jar will be added as a dependency.