SHAILENDRA SHAHANE created SPARK-24496:
------------------------------------------

Summary: CLONE - JSON data source fails to infer floats as decimal when precision is bigger than 38 or scale is bigger than precision.
Key: SPARK-24496
URL: https://issues.apache.org/jira/browse/SPARK-24496
Project: Spark
Issue Type: Bug
Components: SQL
Reporter: SHAILENDRA SHAHANE
Assignee: Hyukjin Kwon
Fix For: 2.0.0


Currently, the JSON data source supports the {{floatAsBigDecimal}} option, which reads floats as {{DecimalType}}. However, Spark's {{DecimalType}} has the following restrictions:

1. The precision cannot be bigger than 38.
2. The scale cannot be bigger than the precision.

With the option above, the type is inferred directly from the {{BigDecimal}} value that is read, and its precision and scale do not necessarily satisfy these conditions. For example, {{0.01}} read as a {{BigDecimal}} has precision 1 and scale 2. This can be observed as below:

{code}
import org.apache.spark.rdd.RDD

def simpleFloats: RDD[String] =
  sqlContext.sparkContext.parallelize(
    """{"a": 0.01}""" ::
    """{"a": 0.02}""" :: Nil)

val jsonDF = sqlContext.read
  .option("floatAsBigDecimal", "true")
  .json(simpleFloats)

jsonDF.printSchema()
{code}

This throws the exception below:

{code}
org.apache.spark.sql.AnalysisException: Decimal scale (2) cannot be greater than precision (1).;
    at org.apache.spark.sql.types.DecimalType.<init>(DecimalType.scala:44)
    at org.apache.spark.sql.execution.datasources.json.InferSchema$.org$apache$spark$sql$execution$datasources$json$InferSchema$$inferField(InferSchema.scala:144)
    at org.apache.spark.sql.execution.datasources.json.InferSchema$.org$apache$spark$sql$execution$datasources$json$InferSchema$$inferField(InferSchema.scala:108)
    at org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1$$anonfun$apply$3.apply(InferSchema.scala:59)
    at org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1$$anonfun$apply$3.apply(InferSchema.scala:57)
    at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2249)
    at org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1.apply(InferSchema.scala:57)
    at org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1.apply(InferSchema.scala:55)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:396)
    at scala.collection.Iterator$class.foreach(Iterator.scala:742)
    ...
{code}

Since the JSON data source falls back to {{StringType}} when it fails to infer a type, such values should probably be inferred as {{StringType}}, or perhaps simply as {{DoubleType}}.
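The proposed fallback can be illustrated with a minimal, standalone Scala sketch. It is not Spark's actual {{InferSchema}} code: {{InferredType}}, {{DecimalT}}, {{DoubleT}} and {{FloatInference}} are hypothetical stand-ins for Spark's types, and the sketch assumes the float literal is available as a string. It checks both {{DecimalType}} restrictions against the precision and scale reported by {{java.math.BigDecimal}} and falls back to a double type instead of throwing.

```scala
import java.math.{BigDecimal => JBigDecimal}

// Hypothetical stand-ins for the Spark SQL types involved; these are NOT
// Spark's classes, just enough structure to illustrate the check.
sealed trait InferredType
final case class DecimalT(precision: Int, scale: Int) extends InferredType
case object DoubleT extends InferredType

object FloatInference {
  // Upper bound on DecimalType precision, as stated in the issue.
  val MaxPrecision = 38

  // True when (precision, scale) satisfies both restrictions listed in the
  // issue: precision <= 38 and scale <= precision.
  def isValidDecimal(precision: Int, scale: Int): Boolean =
    precision <= MaxPrecision && scale <= precision

  // Infers a type for a JSON float literal read as a BigDecimal. Instead of
  // building an invalid decimal type (which is what throws in the report),
  // it falls back to a double type, one of the options suggested above.
  def inferFloat(literal: String): InferredType = {
    val d = new JBigDecimal(literal)
    if (isValidDecimal(d.precision, d.scale)) DecimalT(d.precision, d.scale)
    else DoubleT
  }
}
```

With this sketch, {{inferFloat("123.45")}} yields {{DecimalT(5, 2)}}, while {{inferFloat("0.01")}} yields {{DoubleT}} because its precision (1) is smaller than its scale (2). Note that this sends even short fractions such as {{0.01}} to the fallback; widening the precision to {{max(precision, scale)}} before the check would be one alternative that keeps them as decimals.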