[jira] [Comment Edited] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER
[ https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17366173#comment-17366173 ]

Gejun Shen edited comment on SPARK-20427 at 6/20/21, 12:16 PM:
---------------------------------------------------------------

When you use a custom schema, the problem is solved, according to my test. But Spark's maximum precision of 38 still leaves a gap with Oracle: for Oracle there can be 39-40 significant digits, at least as of version 10g. From the Oracle documentation:

_p is the precision, or the maximum number of significant decimal digits, where the most significant digit is the left-most nonzero digit, and the least significant digit is the right-most known digit. Oracle guarantees the portability of numbers with precision of up to 20 base-100 digits, which is equivalent to 39 or 40 decimal digits depending on the position of the decimal point._

was (Author: sgejun):
When you use a custom schema, the problem is solved, according to my test. But Spark's maximum precision of 38 still leaves a gap with the Oracle standard: for Oracle there can be 39-40 significant digits, at least as of version 10g. From the Oracle documentation:

_p is the precision, or the maximum number of significant decimal digits, where the most significant digit is the left-most nonzero digit, and the least significant digit is the right-most known digit. Oracle guarantees the portability of numbers with precision of up to 20 base-100 digits, which is equivalent to 39 or 40 decimal digits depending on the position of the decimal point._

> Issue with Spark interpreting Oracle datatype NUMBER
> ----------------------------------------------------
>
>                 Key: SPARK-20427
>                 URL: https://issues.apache.org/jira/browse/SPARK-20427
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Alexander Andrushenko
>            Assignee: Yuming Wang
>            Priority: Major
>             Fix For: 2.3.0
>
>
> Oracle has the data type NUMBER. When defining a field of type NUMBER in a table, the field has two components, precision and scale.
> For example, NUMBER(p,s) has precision p and scale s.
> Precision can range from 1 to 38.
> Scale can range from -84 to 127.
> When reading such a field, Spark can create numbers with a precision exceeding 38. In our case it created fields with precision 44, calculated as the sum of the precision (in our case 34 digits) and the scale (10):
> "...java.lang.IllegalArgumentException: requirement failed: Decimal precision 44 exceeds max precision 38...".
> The result was that a data frame could be read from a table in one schema but could not be inserted into the identical table in another schema.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
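The custom schema mentioned above is the JDBC reader's {{customSchema}} option, which overrides the inferred Catalyst type per column. A minimal sketch of the workaround, assuming a hypothetical table with an unconstrained NUMBER column WIDE_COL (connection details are placeholders); reading it as STRING keeps every digit, while DOUBLE keeps the range but only about 15-17 significant digits:

{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class OracleNumberWorkaround {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("oracle-number-workaround")
        .getOrCreate();

    Dataset<Row> df = spark.read()
        .format("jdbc")
        .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL") // placeholder
        .option("dbtable", "MY_SCHEMA.MY_TABLE")               // placeholder
        .option("user", "scott")                               // placeholder
        .option("password", "tiger")                           // placeholder
        // Override the inferred DecimalType for the problematic column.
        // STRING preserves all digits; "WIDE_COL DOUBLE" would instead
        // trade exactness for usable numeric operations.
        .option("customSchema", "WIDE_COL STRING")
        .load();

    df.printSchema(); // WIDE_COL now arrives as string instead of an over-wide decimal
  }
}
{code}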
[jira] [Comment Edited] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER
[ https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17366173#comment-17366173 ]

Gejun Shen edited comment on SPARK-20427 at 6/20/21, 12:16 PM:
---------------------------------------------------------------

When you use a custom schema, the problem is solved, according to my test. But Spark's maximum precision of 38 still leaves a gap with the Oracle standard: for Oracle there can be 39-40 significant digits, at least as of version 10g. From the Oracle documentation:

_p is the precision, or the maximum number of significant decimal digits, where the most significant digit is the left-most nonzero digit, and the least significant digit is the right-most known digit. Oracle guarantees the portability of numbers with precision of up to 20 base-100 digits, which is equivalent to 39 or 40 decimal digits depending on the position of the decimal point._

was (Author: sgejun):
When you use a custom schema, the problem is solved, according to my test.
[jira] [Comment Edited] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER
[ https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17366173#comment-17366173 ]

Gejun Shen edited comment on SPARK-20427 at 6/20/21, 12:09 PM:
---------------------------------------------------------------

When you use a custom schema, the problem is solved, according to my test.

was (Author: sgejun):
When you use a custom schema, I believe precision would also be lost. For example, 123456789012345678901234567890123456789111 would become something like 1.2345678901234568E41. Is this really what we want?
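To make the precision loss described in the earlier version of this comment concrete, here is a small plain-JDK check (no Spark required) of how many digits survive a conversion to double; the printed values are what Java's shortest round-trip formatting produces:

{code:java}
import java.math.BigDecimal;

public class DoublePrecisionLoss {
  public static void main(String[] args) {
    // A 42-digit value, wider than both Spark's Decimal(38) and a double mantissa.
    BigDecimal exact = new BigDecimal("123456789012345678901234567890123456789111");

    double lossy = exact.doubleValue(); // nearest double, ~15-17 significant digits
    System.out.println(lossy);          // 1.2345678901234568E41

    // Converting back shows which trailing digits were lost.
    System.out.println(new BigDecimal(lossy).toBigInteger());
  }
}
{code}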
[jira] [Comment Edited] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER
[ https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107376#comment-17107376 ]

Rafael edited comment on SPARK-20427 at 5/18/20, 7:27 AM:
----------------------------------------------------------

Hey guys,

I encountered a related precision issue. The code now expects the Decimal type to carry precision and scale in the JDBC metadata:

[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L402-L414]

I found out that in Oracle it is valid to have a NUMBER column without these attributes. When I query the metadata for such a column, I get DATA_PRECISION = NULL and DATA_SCALE = NULL. Then, when I run spark-sql, I get this error:

{code:java}
java.lang.IllegalArgumentException: requirement failed: Decimal precision 45 exceeds max precision 38
	at scala.Predef$.require(Predef.scala:224)
	at org.apache.spark.sql.types.Decimal.set(Decimal.scala:114)
	at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:465)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$3$$anonfun$12.apply(JdbcUtils.scala:407)
{code}

Is there a workaround so that spark-sql can handle such cases?

UPDATE: Solved with a custom schema.

was (Author: kyrdan):
Hey guys,

I encountered a related precision issue. The code now expects the Decimal type to carry precision and scale in the JDBC metadata:

[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L402-L414]

I found out that in Oracle it is valid to have a NUMBER column without these attributes. When I query the metadata for such a column, I get DATA_PRECISION = NULL and DATA_SCALE = NULL. Then, when I run spark-sql, I get this error:

{code:java}
java.lang.IllegalArgumentException: requirement failed: Decimal precision 45 exceeds max precision 38
	at scala.Predef$.require(Predef.scala:224)
	at org.apache.spark.sql.types.Decimal.set(Decimal.scala:114)
	at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:465)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$3$$anonfun$12.apply(JdbcUtils.scala:407)
{code}

Is there a workaround so that spark-sql can handle such cases?
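For anyone hitting the same NULL-precision case: the metadata the comment above refers to lives in Oracle's ALL_TAB_COLUMNS dictionary view, so the offending columns can be listed up front before deciding what to put in the custom schema. A sketch with hypothetical owner/table names and placeholder credentials:

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class FindUnconstrainedNumbers {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection(
             "jdbc:oracle:thin:@//dbhost:1521/ORCL", "scott", "tiger"); // placeholders
         PreparedStatement ps = conn.prepareStatement(
             "SELECT column_name, data_precision, data_scale"
           + " FROM all_tab_columns"
           + " WHERE owner = ? AND table_name = ? AND data_type = 'NUMBER'")) {
      ps.setString(1, "MY_SCHEMA"); // placeholder
      ps.setString(2, "MY_TABLE");  // placeholder
      try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
          // A NUMBER declared without precision/scale reports NULL here,
          // which is what trips Spark's DecimalType inference.
          System.out.printf("%s precision=%s scale=%s%n",
              rs.getString(1), rs.getString(2), rs.getString(3));
        }
      }
    }
  }
}
{code}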
[jira] [Comment Edited] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER
[ https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054482#comment-17054482 ]

Sunayan Saikia edited comment on SPARK-20427 at 3/9/20, 3:58 AM:
-----------------------------------------------------------------

[~yumwang] It seems this fix broke the way we could get the column name via the _'name'_ key of the MetadataBuilder map inside getCatalystType(). Is there a way I can still get the column name while overriding the getCatalystType() method? Please check the Java code below to see what broke:

{code:java}
public Option<DataType> getCatalystType(int sqlJdbcType, String typeName, int size, MetadataBuilder md) {
    // Worked before the fix; the "name" key is no longer present in md.
    String columnName = String.valueOf(md.getMap().get("name").get());
    ...
}
{code}

was (Author: sunayansaikia):
It seems this fix broke the way we could get the column name via the _'name'_ key of the MetadataBuilder map inside getCatalystType(). Is there a way I can still get the column name while overriding the getCatalystType() method? Please check the Java code below to see what broke:

{code:java}
public Option<DataType> getCatalystType(int sqlJdbcType, String typeName, int size, MetadataBuilder md) {
    // Worked before the fix; the "name" key is no longer present in md.
    String columnName = String.valueOf(md.getMap().get("name").get());
    ...
}
{code}
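For reference, overriding the type mapping without relying on column names can be done with a custom {{JdbcDialect}}. A hedged Java sketch (the {{size == 0}} test for "NUMBER declared without precision" is an assumption about what the Oracle driver reports, not something confirmed in this ticket):

{code:java}
import org.apache.spark.sql.jdbc.JdbcDialect;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.MetadataBuilder;
import scala.Option;

public class WideNumberDialect extends JdbcDialect {
  @Override
  public boolean canHandle(String url) {
    return url.startsWith("jdbc:oracle");
  }

  @Override
  public Option<DataType> getCatalystType(
      int sqlType, String typeName, int size, MetadataBuilder md) {
    // Assumption: an unconstrained Oracle NUMBER arrives as NUMERIC with size 0.
    if (sqlType == java.sql.Types.NUMERIC && size == 0) {
      return Option.apply(DataTypes.DoubleType); // avoid an over-wide DecimalType
    }
    return Option.empty(); // fall back to Spark's default mapping
  }
}
{code}

Registered once at startup via {{JdbcDialects.registerDialect(new WideNumberDialect())}}, it takes precedence over the built-in Oracle dialect for matching URLs.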
[jira] [Comment Edited] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER
[ https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942090#comment-16942090 ]

Paul Wu edited comment on SPARK-20427 at 10/1/19 4:13 PM:
----------------------------------------------------------

Someone asked me about this problem months ago and I found a solution for him, but I had forgotten it when another person on my team asked me again yesterday. I had to spend several hours on this since her query was quite complex. For the record and my own reference, I would like to put the solution here (inspired by [~sobusiak] and [~yumwang]):

Add the customSchema option after read(), specifying all potential troublemakers as Double types. This can probably resolve most cases in real applications, provided one is not particularly concerned about the exact significant digits.

{code:java}
.read()
.option("customSchema", "col1 Double, col2 Double") // col1, col2, ... are the columns that could cause the trouble
{code}

Also, some may think their issue comes from the .write() operation, but it in fact comes from the .read() operation. The col1, col2, ... column names are not necessarily from the original tables; they can be calculated fields produced for output by the query. One can easily bark up the wrong tree when trying to fix this, as the sketch after this comment illustrates.

was (Author: zwu@gmail.com):
Someone asked me about this problem months ago and I found a solution for him, but I had forgotten it when another person on my team asked me again yesterday. I had to spend several hours on this since her query was quite complex. For the record and my own reference, I would like to put the solution here (inspired by [~sobusiak] and [~yumwang]):

Add the customSchema option after read(), specifying all potential troublemakers as Double types. This can probably resolve most cases in real applications, provided one is not particularly concerned about the exact significant digits.

{code:java}
.read()
.option("customSchema", "col1 Double, col2 Double") // col1, col2, ... are the columns that could cause the trouble
{code}
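To illustrate the last point, the overridden column can be one that exists only in the query's output. A sketch with placeholder connection details and a hypothetical derived column RATIO:

{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CustomSchemaOnDerivedColumn {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("demo").getOrCreate();

    // RATIO exists only in the query output; A / B can need more than
    // 38 digits of precision even when A and B individually fit.
    Dataset<Row> df = spark.read()
        .format("jdbc")
        .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")    // placeholder
        .option("dbtable", "(SELECT a / b AS ratio, c FROM t) q") // placeholder query
        .option("user", "scott")                                  // placeholder
        .option("password", "tiger")                              // placeholder
        // The fix targets the derived output column on the read side,
        // even though the error may only surface during a later write.
        .option("customSchema", "RATIO DOUBLE")
        .load();

    df.write().mode("overwrite").parquet("/tmp/out"); // would fail without the override
  }
}
{code}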
[jira] [Comment Edited] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER
[ https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156829#comment-16156829 ]

Krzysztof Sobusiak edited comment on SPARK-20427 at 9/7/17 12:30 PM:
---------------------------------------------------------------------

You can have very big and very small numbers at the same time in Oracle's NUMBER if precision and scale are not specified. The Oracle documentation says:

{quote}
The following numbers can be stored in a NUMBER column:
* Positive numbers in the range 1 x 10^-130 to 9.99...9 x 10^125 with up to 38 significant digits
* Negative numbers from -1 x 10^-130 to 9.99...99 x 10^125 with up to 38 significant digits
{quote}

As already noted, Spark currently throws an exception for very big numbers (like "Decimal precision 61 exceeds max precision 38" for 1E+50). What was not noted is that it also truncates very small numbers (like 1E-50) to 0. As far as I understand, you cannot fit all of these numbers into a {{DecimalType}} at the same time, whatever precision and scale you set. I believe the default Spark type for Oracle's NUMBER should be {{DoubleType}}.

Last but not least, this issue is not Oracle-specific! I have confirmed that the very same problems occur with PostgreSQL's NUMERIC. Incidentally, the PostgreSQL documentation states explicitly:

{quote}
Specifying NUMERIC without any precision or scale creates a column in which numeric values of any precision and scale can be stored, up to the implementation limit on precision. A column of this kind will not coerce input values to any particular scale, whereas numeric columns with a declared scale will coerce input values to that scale.
{quote}

So basically NUMBER/NUMERIC *without* precision and scale is very different from NUMBER/NUMERIC *with* precision and scale.

was (Author: sobusiak):
You can have very big and very small numbers at the same time in Oracle's NUMBER if precision and scale are not specified. The Oracle documentation says:

{quote}
The following numbers can be stored in a NUMBER column:
* Positive numbers in the range 1 x 10^-130 to 9.99...9 x 10^125 with up to 38 significant digits
* Negative numbers from -1 x 10^-130 to 9.99...99 x 10^125 with up to 38 significant digits
{quote}

As already noted, Spark currently throws an exception for very big numbers (like "Decimal precision 61 exceeds max precision 38" for 1E+50). What was not noted is that it also truncates very small numbers (like 1E-50) to 0. As far as I understand, you cannot fit all of these numbers into a {{DecimalType}} at the same time, whatever precision and scale you set. I believe the default Spark type for Oracle's NUMBER should be {{DoubleType}}.

Last but not least, this issue is not Oracle-specific! I have confirmed that the very same problems occur with PostgreSQL's NUMERIC. Incidentally, the PostgreSQL documentation states explicitly:

{quote}
Specifying NUMERIC without any precision or scale creates a column in which numeric values of any precision and scale can be stored, up to the implementation limit on precision. A column of this kind will not coerce input values to any particular scale, whereas numeric columns with a declared scale will coerce input values to that scale.
{quote}

So basically NUMBER/NUMERIC *is not* {{DecimalType}}.
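A plain-JDK illustration of the "no single precision and scale fits" argument above: assuming the DecimalType(38,10) default that Spark applied to such columns at the time (an assumption, though it is consistent with the 61 in the quoted error), one extreme overflows and the other truncates to zero, while a double holds both:

{code:java}
import java.math.BigDecimal;
import java.math.RoundingMode;

public class WhyNoSingleScaleFits {
  public static void main(String[] args) {
    BigDecimal big   = new BigDecimal("1E+50");
    BigDecimal small = new BigDecimal("1E-50");

    // At the assumed default scale of 10, 1E+50 needs precision 61 > 38 ...
    System.out.println(big.setScale(10).precision());             // 61
    // ... and 1E-50 rounds away to zero.
    System.out.println(small.setScale(10, RoundingMode.HALF_UP)); // 0E-10

    // A double keeps both extremes (range ~4.9E-324 to 1.8E308),
    // at the cost of exactness beyond ~15-17 significant digits.
    System.out.println(big.doubleValue());   // 1.0E50
    System.out.println(small.doubleValue()); // 1.0E-50
  }
}
{code}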