I read through the ticket, patch and documentation and would like to
suggest some changes.

As far as I can tell this basically adds parquet SerDes to hive, but the
file format remains external to hive. There is no way for hive devs to
make changes, fix bugs, add or change datatypes, or add features to parquet
itself.

So:

- I suggest we document it as one of the built-in SerDes and not as a
native format like here:
https://cwiki.apache.org/confluence/display/Hive/Parquet (and here:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual)
- I vote for the jira to say "Add parquet SerDes to Hive" and not "Native
support"
- I think we should revert the grammar change that allows "STORED AS
PARQUET" until we have a mechanism to do that for all SerDes, i.e., until
someone picks up HIVE-5976. (I also don't think this actually works properly
unless we bundle parquet in hive-exec, which I don't think we want.)
- We should revert the deprecated classes (at least, I don't understand why
a first drop needs to add deprecated classes)

In general though, I'm also confused about why adding this SerDe to the hive
code base is beneficial. It seems to me that it just makes upgrading
Parquet, bug fixing, etc. more difficult by tying a SerDe release to a Hive
release. To me that outweighs the benefit of a slightly simpler setup
of Hive + SerDe in the cluster.

Thanks,
Gunther.
