RE: Is this a Spark issue or Hive issue that Spark cannot read the string type data in the Parquet generated by Hive

2015-09-28 Thread java8964

Hi, Lian:

Thanks for the information. It works as expected in Spark with this setting.

Yong

Subject: Re: Is this a Spark issue or Hive issue that Spark cannot read the string type data in the Parquet generated by Hive
To: java8...@hotmail.com; user@spark.apache.org
From: lian.cs@gmail.com

Re: Is this a Spark issue or Hive issue that Spark cannot read the string type data in the Parquet generated by Hive

2015-09-25 Thread Cheng Lian

Please set the SQL option spark.sql.parquet.binaryAsString to true when reading Parquet files containing strings generated by Hive. This is actually a bug in parquet-hive. When generating the Parquet schema for a string field, Parquet requires a "UTF8" annotation, something like: message

Re: Is this a Spark issue or Hive issue that Spark cannot read the string type data in the Parquet generated by Hive

2015-09-25 Thread Cheng Lian

BTW, I just checked, and this bug should be fixed as of Hive 0.14.0. So the SQL option I mentioned is mostly useful for reading legacy Parquet files generated by older versions of Hive.

Cheng

On 9/25/15 2:42 PM, Cheng Lian wrote: Please set the SQL option
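For readers following the thread, the behaviour described above can be sketched in plain Python. This is only an illustration of the decision, not Spark's or Parquet's actual code: the `ParquetField` class and the `sql_type_for` function are hypothetical names, and the model covers BINARY columns only. It assumes that a BINARY column carrying the "UTF8" annotation is read as a string, while an unannotated BINARY column (as written by older Hive versions) is read as raw binary unless a binaryAsString-style flag is turned on.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParquetField:
    """Hypothetical, simplified stand-in for one BINARY column in a Parquet schema."""
    name: str
    annotation: Optional[str] = None  # "UTF8" when annotated, None when missing


def sql_type_for(field: ParquetField, binary_as_string: bool = False) -> str:
    """Choose how a BINARY column is exposed to SQL (illustrative model only)."""
    if field.annotation == "UTF8":
        # Properly annotated string column: always read as a string.
        return "string"
    # No annotation: fall back to the compatibility flag.
    return "string" if binary_as_string else "binary"


# Column as written by an older Hive version (annotation missing).
legacy_hive = ParquetField("name")
# Column as written with the annotation in place.
annotated = ParquetField("name", "UTF8")

print(sql_type_for(legacy_hive))                         # flag off
print(sql_type_for(legacy_hive, binary_as_string=True))  # flag on
print(sql_type_for(annotated))                           # annotation present
```

In Spark 1.x itself, the option is typically set with `sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")` before the Parquet files are read.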