[jira] [Commented] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER

2017-09-07 Thread Krzysztof Sobusiak (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156966#comment-16156966
 ] 

Krzysztof Sobusiak commented on SPARK-20427:


As far as Oracle is concerned, this is the code to blame:

{code:scala}
import java.sql.Types

import org.apache.spark.sql.types._

private case object OracleDialect extends JdbcDialect {

  override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle")

  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
    if (sqlType == Types.NUMERIC) {
      val scale = if (null != md) md.build().getLong("scale") else 0L
      size match {
        // Handle NUMBER fields that have no precision/scale in a special way
        // because JDBC ResultSetMetaData converts this to 0 precision and -127 scale.
        // For more details, please see
        // https://github.com/apache/spark/pull/8780#issuecomment-145598968
        // and
        // https://github.com/apache/spark/pull/8780#issuecomment-144541760
        case 0 => Option(DecimalType(DecimalType.MAX_PRECISION, 10))
        // Handle FLOAT fields in a special way because JDBC ResultSetMetaData converts
        // this to NUMERIC with -127 scale.
        // Not sure if there is a more robust way to identify the field as a float (or other
        // numeric types that do not specify a scale).
        case _ if scale == -127L => Option(DecimalType(DecimalType.MAX_PRECISION, 10))
        case _ => None
      }
    } else {
      None
    }
  }

  // getJDBCType and the remaining overrides are omitted here
}
{code}
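
Note that both special cases above map to {{DecimalType(DecimalType.MAX_PRECISION, 10)}}, i.e. DECIMAL(38, 10), so every NUMBER declared without precision/scale (and, per the comments, FLOAT as well) comes back with that type. A minimal way to see this (only a sketch: the connection URL, credentials and table name below are placeholders, and {{spark}} is an existing {{SparkSession}}):

{code:scala}
// Hypothetical reproduction sketch -- any Oracle table with a plain NUMBER column will do.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")  // placeholder connection URL
  .option("dbtable", "SOME_SCHEMA.SOME_TABLE")            // placeholder table name
  .option("user", "scott")                                // placeholder credentials
  .option("password", "tiger")
  .load()

// The unconstrained NUMBER column is reported as decimal(38,10), which is where the
// precision/scale problems described in the comments below come from.
df.printSchema()
{code}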


> Issue with Spark interpreting Oracle datatype NUMBER
> 
>
> Key: SPARK-20427
> URL: https://issues.apache.org/jira/browse/SPARK-20427
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Alexander Andrushenko
>
> Oracle has a data type NUMBER. When defining a field of type NUMBER in a table,
> the field has two components, precision and scale.
> For example, NUMBER(p,s) has precision p and scale s.
> Precision can range from 1 to 38.
> Scale can range from -84 to 127.
> When reading such a field, Spark can create numbers with precision exceeding 38.
> In our case it created fields with precision 44,
> calculated as the sum of the precision (in our case 34 digits) and the scale (10):
> "...java.lang.IllegalArgumentException: requirement failed: Decimal precision
> 44 exceeds max precision 38...".
> The result was that a data frame read from a table in one schema
> could not be inserted into the identical table in another schema.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER

2017-09-07 Thread Krzysztof Sobusiak (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156829#comment-16156829
 ] 

Krzysztof Sobusiak edited comment on SPARK-20427 at 9/7/17 12:30 PM:
-

You can have very big and very small numbers at the same time in Oracle's NUMBER if precision and scale are not specified.
Oracle documentation says:
{quote}
The following numbers can be stored in a NUMBER column:
* Positive numbers in the range 1 x 10^-130 to 9.99...9 x 10^125 with up to 38 
significant digits
* Negative numbers from -1 x 10^-130 to -9.99...99 x 10^125 with up to 38 significant digits
{quote}

As already noted, Spark currently throws an exception for very big numbers (e.g. "Decimal precision 61 exceeds max precision 38" for 1E+50).
What was not noted is that it also silently truncates very small numbers (like 1E-50) to 0.
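
Both effects follow directly from forcing such values into DECIMAL(38, 10), which is what the dialect code quoted in the previous comment does. A minimal sketch with plain {{java.math.BigDecimal}} (no database involved; 1E+50 and 1E-50 are just the example values from above):

{code:scala}
import java.math.{BigDecimal => JBigDecimal}
import java.math.RoundingMode

// Rescaling 1E+50 to 10 fractional digits needs 61 digits of precision,
// which is where "Decimal precision 61 exceeds max precision 38" comes from.
val tooBig = new JBigDecimal("1E+50").setScale(10, RoundingMode.HALF_UP)
println(tooBig.precision)  // 61

// Rescaling 1E-50 to 10 fractional digits rounds the value away entirely,
// so it is silently read back as 0.
val tooSmall = new JBigDecimal("1E-50").setScale(10, RoundingMode.HALF_UP)
println(tooSmall)  // 0E-10
{code}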

As far as I understand, you cannot fit all these numbers into a {{DecimalType}} at the same time, whatever precision and scale you set. I believe the default Spark type for Oracle's NUMBER (without precision and scale) should be {{DoubleType}}.
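
Until this is settled, a possible workaround is to register a custom dialect that takes precedence over the built-in one. This is only a sketch of that idea, not an official fix; the object name is made up, and it simply maps the "no precision/scale" case (size 0 or scale -127, mirroring the checks in the built-in dialect) to {{DoubleType}}:

{code:scala}
import java.sql.Types

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
import org.apache.spark.sql.types.{DataType, DoubleType, MetadataBuilder}

// Hypothetical workaround: read Oracle NUMBER columns declared without
// precision/scale as DoubleType instead of DecimalType(38, 10).
object OracleUnboundedNumberAsDouble extends JdbcDialect {

  override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle")

  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
    val scale = if (md != null) md.build().getLong("scale") else 0L
    if (sqlType == Types.NUMERIC && (size == 0 || scale == -127L)) {
      Some(DoubleType)  // full exponent range, at the cost of exactness
    } else {
      None  // defer to Spark's default handling
    }
  }
}

// Dialects registered by the user are consulted before the built-in ones.
JdbcDialects.registerDialect(OracleUnboundedNumberAsDouble)
{code}

The trade-off is that a double keeps only about 15-16 significant decimal digits, so exact 38-digit values are no longer exact; that is essentially the choice being discussed here.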

Last but not least, this issue is not Oracle-specific! I have confirmed that the very same problems occur for PostgreSQL's NUMERIC. BTW, the PostgreSQL documentation states explicitly:
{quote}
Specifying NUMERIC without any precision or scale creates a column in which 
numeric values of any precision and scale can be stored, up to the 
implementation limit on precision. A column of this kind will not coerce input 
values to any particular scale, whereas numeric columns with a declared scale 
will coerce input values to that scale.
{quote}
So basically NUMBER/NUMERIC *without* precision and scale is very different 
from NUMBER/NUMERIC *with* precision and scale.


was (Author: sobusiak):
You can have very big and very small numbers at the same time in Oracle's NUMBER if precision and scale are not specified.
Oracle documentation says:
{quote}
The following numbers can be stored in a NUMBER column:
* Positive numbers in the range 1 x 10^-130 to 9.99...9 x 10^125 with up to 38 
significant digits
* Negative numbers from -1 x 10^-130 to -9.99...99 x 10^125 with up to 38 significant digits
{quote}

As already noted, Spark currently throws an exception for very big numbers (e.g. "Decimal precision 61 exceeds max precision 38" for 1E+50).
What was not noted is that it also silently truncates very small numbers (like 1E-50) to 0.

As far as I understand, you cannot fit all these numbers into a {{DecimalType}} at the same time, whatever precision and scale you set. I believe the default Spark type for Oracle's NUMBER (without precision and scale) should be {{DoubleType}}.

Last but not least, this issue is not Oracle-specific! I have confirmed that the very same problems occur for PostgreSQL's NUMERIC. BTW, the PostgreSQL documentation states explicitly:
{quote}
Specifying NUMERIC without any precision or scale creates a column in which 
numeric values of any precision and scale can be stored, up to the 
implementation limit on precision. A column of this kind will not coerce input 
values to any particular scale, whereas numeric columns with a declared scale 
will coerce input values to that scale.
{quote}
So basically NUMBER/NUMERIC *is not* {{DecimalType}}.
