[ https://issues.apache.org/jira/browse/SPARK-30100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joby Joje updated SPARK-30100:
------------------------------
Description:

When trying to load data from JDBC (Oracle) into Spark, there seems to be precision loss in a decimal field. As per my understanding, Spark supports *DECIMAL(38,18)*. The field in Oracle is DECIMAL(38,14), whereas Spark rounds off the last four digits, making its precision DECIMAL(38,10). This happens to a few fields in the dataframe where the column is fetched using a CASE statement, whereas another field in the same query gets the right schema. Tried to pass the

{code:java}
spark.sql.decimalOperations.allowPrecisionLoss=false
{code}

conf in spark-submit, though it didn't give the desired results.

{code:java}
jdbcDF = spark.read \
    .format("jdbc") \
    .option("url", "ORACLE") \
    .option("dbtable", "QUERY") \
    .option("user", "USERNAME") \
    .option("password", "PASSWORD") \
    .load()
{code}

So, considering that Spark infers the schema from sample records, how does that work here? Does it use the results of the query, i.e. (SELECT * FROM TABLE_NAME JOIN ...), or does it take a different route to guess the schema for itself? Can someone throw some light on this and advise how to achieve the right decimal precision without manipulating the query? Doing a CAST in the query does solve the issue, but I would prefer some alternatives.

> Decimal Precision Inferred from JDBC via Spark
> ----------------------------------------------
>
>                 Key: SPARK-30100
>                 URL: https://issues.apache.org/jira/browse/SPARK-30100
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.0
>            Reporter: Joby Joje
>            Priority: Major

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
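[Editor's note] On the schema question: Spark's JDBC source does not sample rows at all; it asks the driver for the query's ResultSetMetaData and maps the reported types through a dialect. In Spark 2.4, OracleDialect maps NUMBER columns whose precision/scale Oracle does not declare (as commonly happens with computed columns such as a CASE expression) to DECIMAL(38,10), which would explain the behaviour described above. A minimal sketch of two workarounds, assuming a hypothetical table `accounts` with a computed column `AMT`: pushing the CAST into the pushdown query, or keeping the query as-is and overriding the inferred type with the `customSchema` reader option (available since Spark 2.3). The URL and credentials below are placeholders.

```python
# Hedged sketch; the table (accounts), column (AMT), and connection details
# are hypothetical placeholders, not taken from the original report.

# Workaround 1: cast inside the pushdown query, so Oracle itself reports
# an explicit precision/scale in the result-set metadata.
pushdown_query = """(
    SELECT CAST(CASE WHEN status = 'A' THEN amt ELSE 0 END
                AS DECIMAL(38,14)) AS amt
    FROM accounts
) q"""

# Workaround 2: leave the query untouched and override the type that
# Spark inferred, using the customSchema JDBC reader option.
reader_options = {
    "url": "jdbc:oracle:thin:@//dbhost:1521/service",  # placeholder URL
    "dbtable": pushdown_query,
    "user": "USERNAME",
    "password": "PASSWORD",
    "customSchema": "AMT DECIMAL(38,14)",
}

# With a live SparkSession `spark`, the read would then be:
# jdbcDF = spark.read.format("jdbc").options(**reader_options).load()
```

Either approach pins the column to DECIMAL(38,14) without relying on Spark's dialect default; the customSchema route has the advantage of leaving the SQL untouched.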