Hello,

Do you use df.write, or do you do it with hiveContext.sql("insert into ...")?
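
For example, roughly (a sketch only; df, the path and the table name are
illustrative):

    // Option 1: write Parquet files directly, partitioned by brand
    df.write.mode("overwrite").partitionBy("brand").parquet("hdfs:///warehouse/tablename")

    // Option 2: insert through the HiveContext and let Hive manage the files
    df.registerTempTable("staging")
    hiveContext.sql("insert into table tablename partition (brand) select * from staging")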

Angel.

On Jun 12, 2017 at 11:07 PM, "Yong Zhang" <java8...@hotmail.com> wrote:

> We are using Spark *1.6.2* as an ETL process to generate Parquet files for one
> dataset, partitioned by "brand" (a string column representing the brand in this
> dataset).
>
>
> After the partition folders (like "brand=a") are generated in HDFS, we add the
> partitions in Hive.
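>
> (Concretely, the registration step looks roughly like this; the table name,
> path and brand value are only illustrative:
>
>     hiveContext.sql(
>       "ALTER TABLE tablename ADD IF NOT EXISTS PARTITION (brand='a') " +
>       "LOCATION 'hdfs:///warehouse/tablename/brand=a'")
>
> or the equivalent ALTER TABLE statement run directly in the Hive CLI.)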
>
>
> The Hive version is *1.2.1* (in fact, we are using HDP 2.5.0).
>
>
> Now the problem is that for 2 brand partitions we cannot query the data
> generated by Spark, but it works fine for the rest of the partitions.
>
>
> Below is the error I get in the Hive CLI and hive.log if I query one of the bad
> partitions, like "select * from tablename where brand='*BrandA*' limit 3;"
>
>
> Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.LongWritable
>
>
> Caused by: java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.LongWritable
>     at org.apache.hadoop.hive.ql.io.parquet.serde.primitive.ParquetStringInspector.getPrimitiveWritableObject(ParquetStringInspector.java:52)
>     at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:222)
>     at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:307)
>     at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:262)
>     at org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.serializeField(DelimitedJSONSerDe.java:72)
>     at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:246)
>     at org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:50)
>     at org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:71)
>     at org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:40)
>     at org.apache.hadoop.hive.ql.exec.ListSinkOperator.process(ListSinkOperator.java:90)
>     ... 22 more
>
> There is not much I can find by googling this error message, but it points to
> the schema in Hive being different from the schema in the Parquet files. But
> this is a very strange case, as the same schema works fine for the other
> brands, which are just other values of the partition column and share the same
> Hive schema as above.
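>
> One way I can think of to check this (a sketch, with illustrative paths) is to
> compare the schema Spark sees in the bad partition's files with the schema
> Hive holds for the table:
>
>     // schema actually written into the Parquet files of the bad partition
>     sqlContext.read.parquet("hdfs:///warehouse/tablename/brand=BrandA").printSchema()
>
>     // schema Hive believes the table has
>     hiveContext.sql("describe formatted tablename").show(200, false)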
>
> If I query like "select * from tablename where brand='*BrandB*' limit 3;",
> everything works fine.
>
> So is this really caused by a mismatch between the Hive schema and the Parquet
> files generated by Spark, or by the data within the different partition keys,
> or is it really a compatibility issue between Spark and Hive?
>
> Thanks
>
> Yong
>
>
>
