Stephane Maarek created SPARK-14586:
---------------------------------------
             Summary: SparkSQL doesn't parse decimal like Hive
                 Key: SPARK-14586
                 URL: https://issues.apache.org/jira/browse/SPARK-14586
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.5.1
            Reporter: Stephane Maarek

Create a test_data.csv with the following content (the space before the 2 is intentional):

{code:none}
a, 2.0
,3.0
{code}

Copy the test_data.csv to hdfs:///spark_testing_2, then go into Hive and run the following statements:

{code}
CREATE SCHEMA IF NOT EXISTS spark_testing;

DROP TABLE IF EXISTS spark_testing.test_csv_2;

CREATE EXTERNAL TABLE `spark_testing.test_csv_2`(
  column_1 varchar(10),
  column_2 decimal(4,2))
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/spark_testing_2'
TBLPROPERTIES('serialization.null.format'='');

select * from spark_testing.test_csv_2;
OK
a	2
NULL	3
{code}

As you can see, the value " 2.0" gets parsed correctly to 2.

Now onto spark-shell:

{code:java}
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("select * from spark_testing.test_csv_2").show()

+--------+--------+
|column_1|column_2|
+--------+--------+
|       a|    null|
|    null|    3.00|
+--------+--------+
{code}

As you can see, the " 2.0" got parsed to null. Therefore Hive and Spark have different parsing behaviors for decimals. I wouldn't say it is a bug per se, but it looks like a necessary improvement for the two engines to converge.
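For what it's worth, here is a minimal sketch of what I suspect is the root cause (this is an assumption on my part, not something I traced through the Spark source): java.math.BigDecimal rejects leading whitespace, so any code path that hands the raw field straight to BigDecimal yields null, while a trimmed parse succeeds the way Hive's SerDe does. Runnable in any Scala REPL:

{code:java}
import scala.util.Try

// " 2.0" is the raw field from the CSV, leading space included
val raw = " 2.0"

// Strict parse: BigDecimal does not tolerate leading whitespace,
// so this throws NumberFormatException and we get None
val strict = Try(new java.math.BigDecimal(raw)).toOption

// Trimmed parse succeeds, matching what Hive returns
val trimmed = Try(new java.math.BigDecimal(raw.trim)).toOption

println(s"strict=$strict trimmed=$trimmed")
// strict=None trimmed=Some(2.0)
{code}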
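And a possible workaround sketch until the engines converge (the table name test_csv_2_raw is hypothetical, it is not part of the setup above): expose the raw fields as strings and do the trim and cast on the Spark side, which sidesteps the strict decimal parse:

{code:java}
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

// Hypothetical string-typed twin of the table above, same HDFS location
sqlContext.sql(
  """CREATE EXTERNAL TABLE IF NOT EXISTS spark_testing.test_csv_2_raw(
    |  column_1 string,
    |  column_2 string)
    |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    |STORED AS TEXTFILE
    |LOCATION '/spark_testing_2'""".stripMargin)

// trim() strips the leading space so the cast to decimal succeeds
sqlContext.sql(
  "select column_1, cast(trim(column_2) as decimal(4,2)) as column_2 " +
  "from spark_testing.test_csv_2_raw").show()
{code}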