Hi, I have an Impala-created table with the following I/O format and serde:
  inputFormat: parquet.hive.DeprecatedParquetInputFormat
  outputFormat: parquet.hive.DeprecatedParquetOutputFormat
  serdeInfo: SerDeInfo(name:null, serializationLib:parquet.hive.serde.ParquetHiveSerDe, parameters:{})

I am trying to read this table from Spark SQL 1.3 to see if caching improves my query latency, but I am getting an exception:

  java.lang.ClassNotFoundException: Class parquet.hive.serde.ParquetHiveSerDe not found

I understand that in Hive 0.13 (which I am using) parquet.hive.serde.ParquetHiveSerDe is deprecated, but it seems Impala still used it to write the table. I also tried providing the bundle jar, which contains org.apache.parquet.hive.serde.ParquetHiveSerDe, via the --jars option to the Spark 1.3 shell / SQL, but I am confused about how to configure the serde in SQLContext.

A table with the following I/O format and serde can be read fine by Spark SQL 1.3:

  inputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
  outputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
  serializationLib: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe

Thanks.
Deb

On Sat, Jun 20, 2015 at 12:21 AM, Debasish Das <debasish.da...@gmail.com> wrote:
> Hi,
>
> I have some Impala-created Parquet tables which Hive 0.13.2 can read fine.
>
> Now when I try to read the same tables using Spark SQL 1.3, I get a
> ClassNotFoundException saying that parquet.hive.serde.ParquetHiveSerDe was
> not found.
>
> I am assuming that Hive puts parquet-hive-bundle.jar on the Hive classpath
> somewhere, but I tried adding parquet-hive-bundle.jar as an auxiliary jar
> through spark-1.3/conf/hive-site.xml and even that did not work.
>
> Any input on fixing this will be really helpful.
>
> Thanks.
> Deb
>
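
P.S. One workaround I am considering (not yet tested) is rewriting the table's metadata in Hive so it points at the non-deprecated classes that Spark SQL can already read. A sketch, assuming Hive's ALTER TABLE ... SET FILEFORMAT / SET SERDE syntax and a hypothetical table name my_table:

```sql
-- Point the table at the org.apache.hadoop.hive.ql.io.parquet classes
-- (the combination the mail above says Spark SQL 1.3 reads fine).
-- Table name "my_table" is a placeholder.
ALTER TABLE my_table
  SET FILEFORMAT
    INPUTFORMAT  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
    SERDE        'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe';

-- Or, to change only the serde and leave the input/output formats alone:
ALTER TABLE my_table
  SET SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe';
```

This only changes the metastore entry, not the Parquet data files, so I would expect Impala to still be able to read the table afterward, but I would welcome confirmation.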