[jira] [Comment Edited] (SPARK-30100) Decimal Precision Inferred from JDBC via Spark

Rafael (Jira) Thu, 14 May 2020 07:49:08 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-30100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17107365#comment-17107365
 ]


Rafael edited comment on SPARK-30100 at 5/14/20, 2:48 PM:
----------------------------------------------------------

Hey guys,
I ran into a problem related to this precision issue.

Currently the code expects the JDBC metadata to supply a precision and scale 
for the Decimal type.

[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L402-L414]

 

I found out that in Oracle DB it is valid to define a Decimal column without 
this information. When I query the metadata for such a column, I get 
DATA_PRECISION = Null and DATA_SCALE = Null.

Then when I run `spark-sql`, I get the following error:
{code:java}
java.lang.IllegalArgumentException: requirement failed: Decimal precision 45 exceeds max precision 38
        at scala.Predef$.require(Predef.scala:224)
        at org.apache.spark.sql.types.Decimal.set(Decimal.scala:114)
        at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:465)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$3$$anonfun$12.apply(JdbcUtils.scala:407)
{code}
Is there a workaround that lets spark-sql handle such cases?
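
A small sketch of where the 45 in the message above may come from, assuming the column was mapped to DECIMAL(38,10) as described in the original report (the thread does not confirm the mapping for this table). Python's `decimal` module stands in for Spark's Decimal here, and `precision_after_rescale`, `fits` and the sample value are hypothetical names used only for illustration.

```python
from decimal import Decimal, Context, ROUND_HALF_UP

MAX_PRECISION = 38  # the limit named in the error message


def precision_after_rescale(value: Decimal, scale: int) -> int:
    """Count the digits needed to hold `value` once rounded to `scale` decimal places."""
    # A wide context so quantize() itself never fails on large inputs.
    ctx = Context(prec=200, rounding=ROUND_HALF_UP)
    rescaled = value.quantize(Decimal(1).scaleb(-scale), context=ctx)
    return len(rescaled.as_tuple().digits)


def fits(value: Decimal, scale: int, max_precision: int = MAX_PRECISION) -> bool:
    """True if the rescaled value stays within the maximum precision."""
    return precision_after_rescale(value, scale) <= max_precision


# Hypothetical value with 35 integer digits, which fits in 38 digits on its own
# but needs 35 + 10 = 45 digits once it is padded to scale 10.
sample = Decimal("9" * 35)
needed = precision_after_rescale(sample, 10)  # 45
```

Under that assumption, any value with more than 28 integer digits would not fit once a scale of 10 is forced on it, even though the unscaled number is within 38 digits.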


(Author: kyrdan)

> Decimal Precision Inferred from JDBC via Spark
> ----------------------------------------------
>
>                 Key: SPARK-30100
>                 URL: https://issues.apache.org/jira/browse/SPARK-30100
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 2.4.0
>            Reporter: Joby Joje
>            Priority: Major
>
> When loading data from JDBC (Oracle) into Spark, there seems to be precision 
> loss in the decimal field. As I understand it, Spark supports 
> *DECIMAL(38,18)*. The field in Oracle is DECIMAL(38,14), whereas Spark 
> rounds off the last four digits, making it DECIMAL(38,10). This 
> happens for a few fields in the dataframe where the column is fetched using 
> a CASE statement, whereas another field in the same query gets the right 
> schema.
> I tried passing the
> {code:java}
> spark.sql.decimalOperations.allowPrecisionLoss=false{code}
> conf to spark-submit, but that did not give the desired result.
> {code:java}
> jdbcDF = spark.read \
> .format("jdbc") \
> .option("url", "ORACLE") \
> .option("dbtable", "QUERY") \
> .option("user", "USERNAME") \
> .option("password", "PASSWORD") \
> .load(){code}
> Considering that Spark infers the schema from sample records, how 
> does this work here? Does it use the results of the query, i.e. (SELECT * FROM 
> TABLE_NAME JOIN ...), or does it take a different route to guess the schema 
> itself? Can someone shed some light on this and advise how to get 
> the right decimal precision without manipulating the query? Doing a CAST in 
> the query does solve the issue, but I would prefer some 
> alternatives.
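
One alternative to a CAST that may be worth trying is the JDBC reader's `customSchema` option, which lets the caller state the Spark type per column. The sketch below only builds the option string. `build_custom_schema` and the column name `AMOUNT` are hypothetical, and the thread does not confirm that this resolves either case reported here.

```python
def build_custom_schema(decimal_columns):
    """Render {'COL': (precision, scale)} as a customSchema string."""
    parts = []
    for name, (precision, scale) in decimal_columns.items():
        # Spark decimals allow at most 38 digits, with scale no larger than precision.
        if not (1 <= precision <= 38 and 0 <= scale <= precision):
            raise ValueError(f"unsupported DECIMAL({precision},{scale}) for {name}")
        parts.append(f"{name} DECIMAL({precision}, {scale})")
    return ", ".join(parts)


# Hypothetical column name; the report does not name the affected fields.
custom_schema = build_custom_schema({"AMOUNT": (38, 14)})

# Intended use with the reader from the report (needs a live Oracle connection):
# jdbcDF = (spark.read.format("jdbc")
#           .option("url", "ORACLE")
#           .option("dbtable", "QUERY")
#           .option("customSchema", custom_schema)
#           .option("user", "USERNAME")
#           .option("password", "PASSWORD")
#           .load())
```

Columns not listed in the string keep the type Spark derives from the JDBC metadata, so only the affected decimal fields need to be spelled out.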



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
