[ https://issues.apache.org/jira/browse/SPARK-24766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539420#comment-16539420 ]
Yuming Wang edited comment on SPARK-24766 at 7/11/18 1:24 AM:
--------------------------------------------------------------
It works after upgrading the built-in Hive to 2.3.2 and Parquet to 1.10.0:
{noformat}
java -jar ./parquet-tools/target/parquet-tools-1.10.1-SNAPSHOT.jar meta file:/tmp/spark/parquet/dir/part-00000-89b0c998-1d46-41a1-a241-4b5d1689b709-c000
file:        file:/tmp/spark/parquet/dir/part-00000-89b0c998-1d46-41a1-a241-4b5d1689b709-c000
creator:     parquet-mr version 1.10.0 (build 031a6654009e3b82020012a18434c582bd74c73a)

file schema: hive_schema
--------------------------------------------------------------------------------
decimal1:    OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1

row group 1: RC:1 TS:60 OFFSET:4
--------------------------------------------------------------------------------
decimal1:     FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:4 SZ:62/60/0.97 VC:1 ENC:PLAIN,BIT_PACKED,RLE ST:[min: 1, max: 1, num_nulls: 0]
{noformat}

was (Author: q79969786):
It works after upgrading the built-in Hive to 2.3.2 and Parquet to 1.10.0.
> CreateHiveTableAsSelect and InsertIntoHiveDir won't generate decimal column
> stats
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-24766
>                 URL: https://issues.apache.org/jira/browse/SPARK-24766
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> How to reproduce:
> {code:java}
> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/spark/parquet/dir' STORED AS parquet
> select cast(1 as decimal) as decimal1;
> {code}
> {code:java}
> create table test_parquet stored as parquet as
> select cast(1 as decimal) as decimal1;
> {code}
> {noformat}
> $ java -jar ./parquet-tools/target/parquet-tools-1.10.1-SNAPSHOT.jar meta file:/tmp/spark/parquet/dir/part-00000-cb96a617-4759-4b21-a222-2153ca0e8951-c000
> file:        file:/tmp/spark/parquet/dir/part-00000-cb96a617-4759-4b21-a222-2153ca0e8951-c000
> creator:     parquet-mr version 1.6.0 (build 6aa21f8776625b5fa6b18059cfebe7549f2e00cb)
>
> file schema: hive_schema
> --------------------------------------------------------------------------------
> decimal1:    OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
>
> row group 1: RC:1 TS:46 OFFSET:4
> --------------------------------------------------------------------------------
> decimal1:     FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:4 SZ:48/46/0.96 VC:1 ENC:BIT_PACKED,PLAIN,RLE ST:[no stats for this column]
> {noformat}
> This happens because Spark still uses com.twitter:parquet-hadoop-bundle:1.6.0.
> Maybe we should refactor {{CreateHiveTableAsSelectCommand}} and
> {{InsertIntoHiveDirCommand}}, or [upgrade the built-in
> Hive|https://issues.apache.org/jira/browse/SPARK-23710].
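For background on why the stats hinge on the writer version: the decimal column above is stored as FIXED_LEN_BYTE_ARRAY, i.e. the unscaled integer value in big-endian two's complement, padded to a fixed width derived from the declared precision, and min/max statistics are comparisons over those bytes. A minimal stdlib-only Python sketch of that encoding (the function names are illustrative, not parquet-mr APIs; the width formula mirrors the sizing rule in the Parquet format spec):

```python
import math
from decimal import Decimal

def flba_width(precision: int) -> int:
    """Minimum bytes that can hold any signed unscaled value of the
    given decimal precision (largest magnitude is 10^precision - 1,
    plus one bit for the sign)."""
    return math.ceil((math.log2(10 ** precision - 1) + 1) / 8)

def decimal_to_flba(value: Decimal, precision: int, scale: int) -> bytes:
    """Encode a decimal as its unscaled integer (value * 10^scale) in
    big-endian two's complement, padded to the fixed width -- the
    FIXED_LEN_BYTE_ARRAY representation used for Parquet DECIMAL."""
    unscaled = int(value.scaleb(scale))
    return unscaled.to_bytes(flba_width(precision), "big", signed=True)

# Hive's `cast(1 as decimal)` is decimal(10, 0): 5 bytes per value on disk
print(flba_width(10))                      # 5
print(decimal_to_flba(Decimal(1), 10, 0))  # b'\x00\x00\x00\x00\x01'
```

Parquet 1.6.0 deliberately dropped statistics for this physical type (unsigned lexicographic byte comparison sorts negative decimals incorrectly), which is why the 1.6.0 output shows "no stats for this column" while 1.10.0, with its corrected ordering logic, writes min/max.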