[ https://issues.apache.org/jira/browse/SPARK-30100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joby Joje updated SPARK-30100:
------------------------------
    Description: 
When trying to load data from JDBC (Oracle) into Spark, there seems to be 
precision loss in the decimal fields. As per my understanding, Spark supports 
*DECIMAL(38,18)*. The field in Oracle is DECIMAL(38,14), whereas Spark rounds 
off the last four decimal digits, turning it into DECIMAL(38,10). This happens 
to a few fields in the dataframe where the column is fetched using a CASE 
statement, whereas another field in the same query comes back with the right 
schema.
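For illustration, this is roughly how the mismatch shows up when inspecting 
the schema of the dataframe loaded below (the column names here are 
hypothetical):
{code:python}
# Hypothetical columns: AMOUNT is read directly from a DECIMAL(38,14) column,
# while CASE_AMOUNT comes from a CASE expression in the same query.
jdbcDF.printSchema()
# root
#  |-- AMOUNT: decimal(38,14) (nullable = true)        <-- scale preserved
#  |-- CASE_AMOUNT: decimal(38,10) (nullable = true)   <-- last 4 digits rounded
{code}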

Tried passing the following conf via spark-submit, though it didn't produce 
the desired result:
{code:java}
spark.sql.decimalOperations.allowPrecisionLoss=false{code}
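Equivalently, the same conf can be set when building the session; as far as I 
can tell this flag governs the precision of decimal *arithmetic* results 
rather than the schema inferred over JDBC, which may be why it has no effect 
here. A minimal sketch:
{code:python}
from pyspark.sql import SparkSession

# Same setting as the spark-submit --conf flag, applied programmatically.
spark = SparkSession.builder \
    .appName("oracle-decimal-precision") \
    .config("spark.sql.decimalOperations.allowPrecisionLoss", "false") \
    .getOrCreate()
{code}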
The dataframe is loaded as follows:
{code:python}
jdbcDF = spark.read \
    .format("jdbc") \
    .option("url", "ORACLE") \
    .option("dbtable", "QUERY") \
    .option("user", "USERNAME") \
    .option("password", "PASSWORD") \
    .load()
{code}
So, considering that Spark infers the schema from a sample of records, how 
does this work here? Does it use the result of the query, i.e. (SELECT * FROM 
TABLE_NAME JOIN ...), or does it take a different route to determine the 
schema on its own? Can someone shed some light on this and advise how to get 
the right decimal precision without manipulating the query? Doing a CAST 
inside the query does solve the issue, but I would prefer some alternatives.
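For what it's worth, here is one way to probe what schema gets inferred for 
the query without fetching any rows (a sketch reusing the same placeholder 
connection options):
{code:python}
# Sketch: Spark appears to derive the JDBC schema from the driver's
# result-set metadata rather than from sampled rows, so a zero-row probe
# of the same query shows the inferred types without moving any data.
probeDF = spark.read \
    .format("jdbc") \
    .option("url", "ORACLE") \
    .option("dbtable", "(SELECT * FROM (QUERY) WHERE 1=0)") \
    .option("user", "USERNAME") \
    .option("password", "PASSWORD") \
    .load()
probeDF.printSchema()
{code}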



> Decimal Precision Inferred from JDBC via Spark
> ----------------------------------------------
>
>                 Key: SPARK-30100
>                 URL: https://issues.apache.org/jira/browse/SPARK-30100
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.0
>            Reporter: Joby Joje
>            Priority: Major
>


