Hi, Lian:
Thanks for the information. It works as expected in Spark with this setting.
Yong
Subject: Re: Is this a Spark issue or Hive issue that Spark cannot read the
string type data in the Parquet generated by Hive
To: java8...@hotmail.com; user@spark.apache.org
From: lian.cs@gmail.com
Please set the SQL option spark.sql.parquet.binaryAsString to true
when reading Parquet files containing strings generated by Hive.
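For reference, a minimal sketch of applying this option from PySpark. It assumes a Spark 1.x SQLContext is passed in; the helper name read_hive_parquet and the path argument are hypothetical and only for illustration.

```python
def read_hive_parquet(sql_context, path):
    """Read a Parquet file written by an older Hive, treating binary columns as strings.

    `sql_context` is assumed to be an existing Spark SQLContext (or HiveContext);
    `path` is a placeholder for wherever the Hive-generated files live.
    """
    # Ask Spark SQL to interpret binary columns that lack a UTF8 annotation as strings.
    sql_context.setConf("spark.sql.parquet.binaryAsString", "true")
    # Load the Parquet data as a DataFrame with the option in effect.
    return sql_context.read.parquet(path)
```

With a live context this would be called as read_hive_parquet(sqlContext, "/path/to/hive_table"), after which the affected columns should show up as string rather than binary in printSchema().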
This is actually a bug in parquet-hive. When generating the Parquet schema
for a string field, Parquet requires a "UTF8" annotation, something like:
message
BTW, I just checked, and this bug should be fixed as of Hive 0.14.0.
So the SQL option I mentioned is mostly useful for reading legacy
Parquet files generated by older versions of Hive.
Cheng
On 9/25/15 2:42 PM, Cheng Lian wrote:
Please set the SQL option