I think you might need to set spark.sql.hive.convertMetastoreParquet to false, if I understand that flag correctly.
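
If it helps, here is a sketch of how that flag could be applied before the query (assuming it is run in the same Spark SQL session, e.g. via sqlContext.sql in 1.2.0, and using the `test` table from your mail):

```sql
-- Sketch: with spark.sql.hive.convertMetastoreParquet left at its default
-- (true), Spark SQL swaps the Hive SerDe/input format of metastore Parquet
-- tables for its own native ParquetTableScan. Setting it to false should
-- make the query go through HiveTableScan and the declared input format.
SET spark.sql.hive.convertMetastoreParquet=false;
SELECT * FROM test;
```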


-------- Original message --------
From: Xiaoyu Wang <wangxy...@gmail.com>
Date: 01/16/2015 5:09 AM (GMT-05:00)
To: user@spark.apache.org
Subject: Why custom parquet format hive table execute "ParquetTableScan" physical plan, not "HiveTableScan"?

Hi all!

In Spark SQL 1.2.0, I created a Hive table with a custom Parquet input format and output format, like this:
CREATE TABLE test (
  id string,
  msg string)
CLUSTERED BY (id)
SORTED BY (id ASC)
INTO 10 BUCKETS
ROW FORMAT SERDE
  'com.a.MyParquetHiveSerDe'
STORED AS INPUTFORMAT
  'com.a.MyParquetInputFormat'
OUTPUTFORMAT
  'com.a.MyParquetOutputFormat';

And in the spark shell, the plan for "select * from test" is:

[== Physical Plan ==]
[!OutputFaker [id#5,msg#6]]
[ ParquetTableScan [id#12,msg#13], (ParquetRelation 
hdfs://hadoop/user/hive/warehouse/test.db/test, Some(Configuration: 
core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, 
yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml), 
org.apache.spark.sql.hive.HiveContext@6d15a113, []), []]
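
(For anyone reproducing this: a physical plan like the one above can be printed with Spark SQL's EXPLAIN from the same session, e.g.:

```sql
EXPLAIN SELECT * FROM test;
```
)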

Not HiveTableScan!!!
So it doesn't execute my custom input format!
Why? How can I make it execute my custom input format?

Thanks!
