I read through the ticket, patch and documentation and would like to suggest some changes.
As far as I can tell, this basically adds Parquet SerDes to Hive, but the file format itself remains external to Hive. There is no way for Hive devs to make changes, fix bugs, add or change datatypes, or add features to Parquet itself. So:
- I suggest we document it as one of the built-in SerDes and not as a native format like here: https://cwiki.apache.org/confluence/display/Hive/Parquet (and here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual)
- I vote for the JIRA to say "Add parquet SerDes to Hive" and not "Native support"
- I think we should revert the change to the grammar that allows "STORED AS PARQUET" until we have a mechanism to do that for all SerDes, i.e. until someone picks up HIVE-5976. (I also don't think this actually works properly unless we bundle Parquet in hive-exec, which I don't think we want.)
- We should revert the deprecated classes (at least, I don't understand why a first drop needs to add deprecated stuff).

In general, though, I'm also confused about why adding this SerDe to the Hive code base is beneficial. It seems to me that it just makes upgrading Parquet, fixing bugs, etc. more difficult by tying a SerDe release to a Hive release. To me, that outweighs the benefit of a slightly more involved setup of Hive + SerDe in the cluster.

Thanks,
Gunther.