[ https://issues.apache.org/jira/browse/SPARK-30100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17107365#comment-17107365 ]
Rafael edited comment on SPARK-30100 at 5/14/20, 2:48 PM: ---------------------------------------------------------- Hey guys, I encountered an issue related to the precision issues. Now the code expects the for the Decimal type we need to have in JDBC metadata precision and scale. [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L402-L414] I found out that in the OracleDB it is valid to have Decimal without these data. When I do a query read metadata for such column I'm getting DATA_PRECISION = Null, and DATA_SCALE = Null. Then when I run the `spark-sql` I'm getting such error: {code:java} java.lang.IllegalArgumentException: requirement failed: Decimal precision 45 exceeds max precision 38 at scala.Predef$.require(Predef.scala:224) at org.apache.spark.sql.types.Decimal.set(Decimal.scala:114) at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:465) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$3$$anonfun$12.apply(JdbcUtils.scala:407) {code} Do you have a work around how spark-sql can work with such cases? was (Author: kyrdan): Hey guys, I encountered an issue related to the precision issues. Now the code expects the for the Decimal type we need to have in JDBC metadata precision and scale. [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L402-L414] I found out that in the OracleDB it is valid to have Decimal without these data. When I do a query read metadata for such column I'm getting DATA_PRECISION = Null, and DATA_SCALE = Null. Then when I run the `spark-sql` I'm getting such error: {code:java} java.lang.IllegalArgumentException: requirement failed: Decimal precision 45 exceeds max precision 38 at scala.Predef$.require(Predef.scala:224) at org.apache.spark.sql.types.Decimal.set(Decimal.scala:114) at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:465) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$3$$anonfun$12.apply(JdbcUtils.scala:407) {code} Do you have a work around how spark-sql can work with such cases? * [|https://issues.apache.org/jira/secure/AddComment!default.jspa?id=13001869] > Decimal Precision Inferred from JDBC via Spark > ---------------------------------------------- > > Key: SPARK-30100 > URL: https://issues.apache.org/jira/browse/SPARK-30100 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL > Affects Versions: 2.4.0 > Reporter: Joby Joje > Priority: Major > > When trying to load data from JDBC(Oracle) into Spark, there seems to be > precision loss in the decimal field, as per my understanding Spark supports > *DECIMAL(38,18)*. The field from the Oracle is DECIMAL(38,14), whereas Spark > rounds off the last four digits making it a precision of DECIMAL(38,10). This > is happening to few fields in the dataframe where the column is fetched using > a CASE statement whereas in the same query another field populates the right > schema. > Tried to pass the > {code:java} > spark.sql.decimalOperations.allowPrecisionLoss=false{code} > conf in the Spark-submit though didn't get the desired results. > {code:java} > jdbcDF = spark.read \ > .format("jdbc") \ > .option("url", "ORACLE") \ > .option("dbtable", "QUERY") \ > .option("user", "USERNAME") \ > .option("password", "PASSWORD") \ > .load(){code} > So considering that the Spark infers the schema from a sample records, how > does this work here? Does it use the results of the query i.e (SELECT * FROM > TABLE_NAME JOIN ...) or does it take a different route to guess the schema > for itself? Can someone throw some light on this and advise how to achieve > the right decimal precision on this regards without manipulating the query as > doing a CAST on the query does solve the issue, but would prefer to get some > alternatives. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org