[jira] [Commented] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER
[ https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156966#comment-16156966 ] Krzysztof Sobusiak commented on SPARK-20427:

As far as Oracle is concerned, this is the code to blame:
{code:scala}
private case object OracleDialect extends JdbcDialect {

  override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle")

  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
    if (sqlType == Types.NUMERIC) {
      val scale = if (null != md) md.build().getLong("scale") else 0L
      size match {
        // Handle NUMBER fields that have no precision/scale in special way
        // because JDBC ResultSetMetaData converts this to 0 precision and -127 scale
        // For more details, please see
        // https://github.com/apache/spark/pull/8780#issuecomment-145598968
        // and
        // https://github.com/apache/spark/pull/8780#issuecomment-144541760
        case 0 => Option(DecimalType(DecimalType.MAX_PRECISION, 10))
        // Handle FLOAT fields in a special way because JDBC ResultSetMetaData converts
        // this to NUMERIC with -127 scale
        // Not sure if there is a more robust way to identify the field as a float (or other
        // numeric types that do not specify a scale).
        case _ if scale == -127L => Option(DecimalType(DecimalType.MAX_PRECISION, 10))
        case _ => None
      }
    } else {
      None
    }
  }
  // ... (remaining members of the dialect omitted)
}
{code}

> Issue with Spark interpreting Oracle datatype NUMBER
>
> Key: SPARK-20427
> URL: https://issues.apache.org/jira/browse/SPARK-20427
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.0
> Reporter: Alexander Andrushenko
>
> Oracle has a data type called NUMBER. When defining a field of type NUMBER in a table, the field has two components, precision and scale.
> For example, NUMBER(p,s) has precision p and scale s.
> Precision can range from 1 to 38.
> Scale can range from -84 to 127.
> When reading such a field, Spark can create numbers with precision exceeding 38. In our case it has created fields with precision 44,
> calculated as the sum of the precision (in our case 34 digits) and the scale (10):
> "...java.lang.IllegalArgumentException: requirement failed: Decimal precision 44 exceeds max precision 38...".
> The result was that a data frame read from a table in one schema could not be inserted into the identical table in another schema.
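For anyone who needs to unblock this locally, one possible workaround is to register a custom dialect that takes precedence over the built-in one. The sketch below only illustrates the idea (the dialect name is made up, and mapping unconstrained NUMBER to {{DoubleType}} is one possible choice, not the project's decision):
{code:scala}
import java.sql.Types

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
import org.apache.spark.sql.types._

// Hypothetical workaround dialect (name made up): map Oracle NUMBER columns
// declared without precision/scale -- reported by the driver as precision 0
// and/or scale -127 -- to DoubleType instead of DecimalType(38, 10).
// This trades exact representation of very large integers for the full range
// an unconstrained NUMBER can hold.
object OracleUnboundedNumberDialect extends JdbcDialect {

  override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle")

  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
    val scale = if (md != null) md.build().getLong("scale") else 0L
    if (sqlType == Types.NUMERIC && (size == 0 || scale == -127L)) {
      Some(DoubleType)
    } else {
      None // defer to the built-in dialect for everything else
    }
  }
}

// Registered dialects take precedence over the built-in ones where they
// return a type, so this only changes the unconstrained-NUMBER case.
JdbcDialects.registerDialect(OracleUnboundedNumberDialect)
{code}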
[jira] [Comment Edited] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER
[ https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156829#comment-16156829 ] Krzysztof Sobusiak edited comment on SPARK-20427 at 9/7/17 12:30 PM:

You can have very big and very small numbers at the same time in Oracle's NUMBER if precision and scale are not specified. Oracle documentation says:
{quote}
The following numbers can be stored in a NUMBER column:
* Positive numbers in the range 1 x 10^-130 to 9.99...9 x 10^125 with up to 38 significant digits
* Negative numbers from -1 x 10^-130 to 9.99...99 x 10^125 with up to 38 significant digits
{quote}
As already noted, Spark currently throws an exception for very big numbers (like "Decimal precision 61 exceeds max precision 38" for 1E+50). What has not been noted is that it also silently truncates very small numbers (like 1E-50) to 0.

As far as I understand, you cannot fit all of these numbers in {{DecimalType}} at the same time, whatever precision and scale you set. I believe the default Spark type for Oracle's NUMBER should be {{DoubleType}}.

Last but not least, this issue is not Oracle-specific! I have confirmed that the very same problems occur for PostgreSQL's NUMERIC. BTW, the PostgreSQL documentation states explicitly:
{quote}
Specifying NUMERIC without any precision or scale creates a column in which numeric values of any precision and scale can be stored, up to the implementation limit on precision. A column of this kind will not coerce input values to any particular scale, whereas numeric columns with a declared scale will coerce input values to that scale.
{quote}
So basically NUMBER/NUMERIC *without* precision and scale is very different from NUMBER/NUMERIC *with* precision and scale.

was (Author: sobusiak):
You can have very big and very small numbers at the same time in Oracle's NUMBER if precision and scale are not specified. Oracle documentation says:
{quote}
The following numbers can be stored in a NUMBER column:
* Positive numbers in the range 1 x 10^-130 to 9.99...9 x 10^125 with up to 38 significant digits
* Negative numbers from -1 x 10^-130 to 9.99...99 x 10^125 with up to 38 significant digits
{quote}
As already noted, Spark currently throws an exception for very big numbers (like "Decimal precision 61 exceeds max precision 38" for 1E+50). What has not been noted is that it also silently truncates very small numbers (like 1E-50) to 0.

As far as I understand, you cannot fit all of these numbers in {{DecimalType}} at the same time, whatever precision and scale you set. I believe the default Spark type for Oracle's NUMBER should be {{DoubleType}}.

Last but not least, this issue is not Oracle-specific! I have confirmed that the very same problems occur for PostgreSQL's NUMERIC. BTW, the PostgreSQL documentation states explicitly:
{quote}
Specifying NUMERIC without any precision or scale creates a column in which numeric values of any precision and scale can be stored, up to the implementation limit on precision. A column of this kind will not coerce input values to any particular scale, whereas numeric columns with a declared scale will coerce input values to that scale.
{quote}
So basically NUMBER/NUMERIC *is not* {{DecimalType}}.
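A small plain-Scala illustration of the point above that no single precision/scale fits both extremes, using the current default mapping {{DecimalType(38, 10)}} (only {{BigDecimal}} arithmetic, no Spark needed):
{code:scala}
import scala.math.BigDecimal.RoundingMode

val big   = BigDecimal("1E+50")
val small = BigDecimal("1E-50")

// Representing 1E+50 with scale 10 needs 61 significant digits, which is
// exactly the "Decimal precision 61 exceeds max precision 38" error above.
println(big.setScale(10).precision)               // 61

// Representing 1E-50 with scale 10 rounds it away entirely: silent truncation to 0.
println(small.setScale(10, RoundingMode.HALF_UP)) // 0E-10
{code}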
[jira] [Commented] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER
[ https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156829#comment-16156829 ] Krzysztof Sobusiak commented on SPARK-20427:

You can have very big and very small numbers at the same time in Oracle's NUMBER if precision and scale are not specified. Oracle documentation says:
{quote}
The following numbers can be stored in a NUMBER column:
* Positive numbers in the range 1 x 10^-130 to 9.99...9 x 10^125 with up to 38 significant digits
* Negative numbers from -1 x 10^-130 to 9.99...99 x 10^125 with up to 38 significant digits
{quote}
As already noted, Spark currently throws an exception for very big numbers (like "Decimal precision 61 exceeds max precision 38" for 1E+50). What has not been noted is that it also silently truncates very small numbers (like 1E-50) to 0.

As far as I understand, you cannot fit all of these numbers in {{DecimalType}} at the same time, whatever precision and scale you set. I believe the default Spark type for Oracle's NUMBER should be {{DoubleType}}.

Last but not least, this issue is not Oracle-specific! I have confirmed that the very same problems occur for PostgreSQL's NUMERIC. BTW, the PostgreSQL documentation states explicitly:
{quote}
Specifying NUMERIC without any precision or scale creates a column in which numeric values of any precision and scale can be stored, up to the implementation limit on precision. A column of this kind will not coerce input values to any particular scale, whereas numeric columns with a declared scale will coerce input values to that scale.
{quote}
So basically NUMBER/NUMERIC *is not* {{DecimalType}}.
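As a side note for anyone reading this later: newer Spark releases (2.3+, if I remember correctly) also expose a {{customSchema}} JDBC read option that overrides the inferred type for individual columns, which can serve as a per-query workaround until the default mapping changes. A sketch with made-up connection details and column name:
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("oracle-number-workaround").getOrCreate()

// AMOUNT is a hypothetical NUMBER column declared without precision/scale.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
  .option("dbtable", "SOME_SCHEMA.SOME_TABLE")
  .option("user", "scott")
  .option("password", "tiger")
  // Read AMOUNT as DOUBLE instead of the default DecimalType(38, 10);
  // columns not listed here keep the default type mapping.
  .option("customSchema", "AMOUNT DOUBLE")
  .load()

df.printSchema()
{code}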