[jira] [Comment Edited] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER

2021-06-20 Thread Gejun Shen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17366173#comment-17366173
 ] 

Gejun Shen edited comment on SPARK-20427 at 6/20/21, 12:16 PM:
---

When you use a custom schema, the problem is solved, according to my test.

But Spark's maximum precision of 38 still seems to leave a gap with Oracle. For Oracle, I believe it can be 39-40 digits for the 10g version (see the small check after the quoted documentation below).

From the Oracle documentation:

_p is the precision, or the maximum number of significant decimal digits, where 
the most significant digit is the left-most nonzero digit, and the least 
significant digit is the right-most known digit. Oracle guarantees the 
portability of numbers with precision of up to 20 base-100 digits, which is 
equivalent to 39 or 40 decimal digits depending on the position of the decimal 
point._
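
For reference, a small check of the cap Spark enforces (a minimal Java sketch; the class name is mine, and the exact exception message can vary by Spark version):
{code:java}
import org.apache.spark.sql.types.DecimalType;

public class PrecisionCap {
  public static void main(String[] args) {
    // Spark's hard ceiling on decimal precision.
    System.out.println(DecimalType.MAX_PRECISION());  // 38

    try {
      // Oracle can effectively carry 39-40 significant digits; Spark rejects anything above 38.
      new DecimalType(39, 0);
    } catch (Exception e) {
      System.out.println(e.getMessage());  // complains that precision can only go up to 38
    }
  }
}
{code}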


was (Author: sgejun):
When you use a custom schema, the problem is solved, according to my test.

But Spark's maximum precision of 38 still seems to leave a gap with the Oracle standard. For Oracle, I believe it can be 39-40 digits for the 10g version.

From the Oracle documentation:

_p is the precision, or the maximum number of significant decimal digits, where 
the most significant digit is the left-most nonzero digit, and the least 
significant digit is the right-most known digit. Oracle guarantees the 
portability of numbers with precision of up to 20 base-100 digits, which is 
equivalent to 39 or 40 decimal digits depending on the position of the decimal 
point._

> Issue with Spark interpreting Oracle datatype NUMBER
> 
>
> Key: SPARK-20427
> URL: https://issues.apache.org/jira/browse/SPARK-20427
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Alexander Andrushenko
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 2.3.0
>
>
> Oracle has a data type NUMBER. When defining a field of type NUMBER in a table,
> the field has two components, precision and scale.
> For example, NUMBER(p,s) has precision p and scale s.
> Precision can range from 1 to 38.
> Scale can range from -84 to 127.
> When reading such a field, Spark can create numbers with precision exceeding
> 38. In our case it created fields with precision 44,
> calculated as the sum of the precision (in our case 34 digits) and the scale (10):
> "...java.lang.IllegalArgumentException: requirement failed: Decimal precision
> 44 exceeds max precision 38...".
> The result was that a data frame read from a table on one schema
> could not be inserted into the identical table on another schema.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER

2021-06-20 Thread Gejun Shen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17366173#comment-17366173
 ] 

Gejun Shen edited comment on SPARK-20427 at 6/20/21, 12:16 PM:
---

When you use a custom schema, the problem is solved, according to my test.

But Spark's maximum precision of 38 still seems to leave a gap with the Oracle standard. For Oracle, I believe it can be 39-40 digits for the 10g version.

From the Oracle documentation:

_p is the precision, or the maximum number of significant decimal digits, where 
the most significant digit is the left-most nonzero digit, and the least 
significant digit is the right-most known digit. Oracle guarantees the 
portability of numbers with precision of up to 20 base-100 digits, which is 
equivalent to 39 or 40 decimal digits depending on the position of the decimal 
point._


was (Author: sgejun):
When you use a custom schema, the problem is solved, according to my test.

> Issue with Spark interpreting Oracle datatype NUMBER
> 
>
> Key: SPARK-20427
> URL: https://issues.apache.org/jira/browse/SPARK-20427
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Alexander Andrushenko
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 2.3.0
>
>
> Oracle has a data type NUMBER. When defining a field of type NUMBER in a table,
> the field has two components, precision and scale.
> For example, NUMBER(p,s) has precision p and scale s.
> Precision can range from 1 to 38.
> Scale can range from -84 to 127.
> When reading such a field, Spark can create numbers with precision exceeding
> 38. In our case it created fields with precision 44,
> calculated as the sum of the precision (in our case 34 digits) and the scale (10):
> "...java.lang.IllegalArgumentException: requirement failed: Decimal precision
> 44 exceeds max precision 38...".
> The result was that a data frame read from a table on one schema
> could not be inserted into the identical table on another schema.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER

2021-06-20 Thread Gejun Shen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17366173#comment-17366173
 ] 

Gejun Shen edited comment on SPARK-20427 at 6/20/21, 12:09 PM:
---

When you use a custom schema, the problem is solved, according to my test.


was (Author: sgejun):
When you use a custom schema, I believe precision would also be lost.

For example, 123456789012345678901234567890123456789111 will become something like 1.2531261878596697E40. Is this really what we want?

> Issue with Spark interpreting Oracle datatype NUMBER
> 
>
> Key: SPARK-20427
> URL: https://issues.apache.org/jira/browse/SPARK-20427
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Alexander Andrushenko
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 2.3.0
>
>
> Oracle has a data type NUMBER. When defining a field of type NUMBER in a table,
> the field has two components, precision and scale.
> For example, NUMBER(p,s) has precision p and scale s.
> Precision can range from 1 to 38.
> Scale can range from -84 to 127.
> When reading such a field, Spark can create numbers with precision exceeding
> 38. In our case it created fields with precision 44,
> calculated as the sum of the precision (in our case 34 digits) and the scale (10):
> "...java.lang.IllegalArgumentException: requirement failed: Decimal precision
> 44 exceeds max precision 38...".
> The result was that a data frame read from a table on one schema
> could not be inserted into the identical table on another schema.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER

2020-05-18 Thread Rafael (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107376#comment-17107376
 ] 

Rafael edited comment on SPARK-20427 at 5/18/20, 7:27 AM:
--

Hey guys,
 I encountered an issue related to precision.

Currently the code expects the Decimal type to carry precision and scale in the JDBC metadata.

[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L402-L414]

 

I found out that in Oracle it is valid to have a Decimal column without this information. When I query the metadata for such a column, I get DATA_PRECISION = Null and DATA_SCALE = Null.
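
For context, a minimal Java sketch of how one might inspect the declared precision/scale through Spark's JDBC source; the connection URL, credentials, and table name are placeholders, and the query option assumes Spark 2.4+:
{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class OracleColumnMetadata {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("oracle-number-metadata").getOrCreate();

    Dataset<Row> cols = spark.read()
        .format("jdbc")
        .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1")   // placeholder URL
        .option("user", "scott")                                     // placeholder credentials
        .option("password", "tiger")
        // NUMBER columns declared without (p,s) report DATA_PRECISION/DATA_SCALE as NULL here,
        // which is the case described above.
        .option("query", "SELECT column_name, data_type, data_precision, data_scale "
            + "FROM all_tab_columns WHERE table_name = 'MY_TABLE'")  // placeholder table
        .load();

    cols.show(false);
  }
}
{code}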

Then when I run `spark-sql` I get this error:
{code:java}
java.lang.IllegalArgumentException: requirement failed: Decimal precision 45 
exceeds max precision 38
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.sql.types.Decimal.set(Decimal.scala:114)
at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:465)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$3$$anonfun$12.apply(JdbcUtils.scala:407)
{code}
Do you have a workaround so that spark-sql can handle such cases?

 

UPDATE:

Solved with the custom schema.


was (Author: kyrdan):
Hey guys,
I encountered an issue related to precision.

Currently the code expects the Decimal type to carry precision and scale in the JDBC metadata.

[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L402-L414]

 

I found out that in Oracle it is valid to have a Decimal column without this information. When I query the metadata for such a column, I get DATA_PRECISION = Null and DATA_SCALE = Null.

Then when I run `spark-sql` I get this error:
{code:java}
java.lang.IllegalArgumentException: requirement failed: Decimal precision 45 
exceeds max precision 38
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.sql.types.Decimal.set(Decimal.scala:114)
at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:465)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$3$$anonfun$12.apply(JdbcUtils.scala:407)
{code}
Do you have a workaround so that spark-sql can handle such cases?

> Issue with Spark interpreting Oracle datatype NUMBER
> 
>
> Key: SPARK-20427
> URL: https://issues.apache.org/jira/browse/SPARK-20427
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Alexander Andrushenko
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 2.3.0
>
>
> Oracle has a data type NUMBER. When defining a field of type NUMBER in a table,
> the field has two components, precision and scale.
> For example, NUMBER(p,s) has precision p and scale s.
> Precision can range from 1 to 38.
> Scale can range from -84 to 127.
> When reading such a field, Spark can create numbers with precision exceeding
> 38. In our case it created fields with precision 44,
> calculated as the sum of the precision (in our case 34 digits) and the scale (10):
> "...java.lang.IllegalArgumentException: requirement failed: Decimal precision
> 44 exceeds max precision 38...".
> The result was that a data frame read from a table on one schema
> could not be inserted into the identical table on another schema.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER

2020-03-08 Thread Sunayan Saikia (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054482#comment-17054482
 ] 

Sunayan Saikia edited comment on SPARK-20427 at 3/9/20, 3:58 AM:
-

[~yumwang] 
It seems this fix broke the way we could get the column name via the _'name'_ key in the MetadataBuilder map inside getCatalystType().
 Is there a way I can still get the column name while overriding the getCatalystType() method?

Please check the Java code below, which is what broke.
 public Option<DataType> getCatalystType(int sqlJdbcType, String typeName, int size, MetadataBuilder md) {
 String columnName = String.valueOf(md.getMap().get("name").get());
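
For reference, a self-contained sketch of such an override (illustrative Java; the dialect class, its registration, and the fallback behavior are my assumptions, not the reporter's actual code):
{code:java}
import org.apache.spark.sql.jdbc.JdbcDialect;
import org.apache.spark.sql.jdbc.JdbcDialects;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.MetadataBuilder;
import scala.Option;

public class ColumnNameAwareDialect extends JdbcDialect {
  @Override
  public boolean canHandle(String url) {
    return url.toLowerCase().startsWith("jdbc:oracle");
  }

  @Override
  public Option<DataType> getCatalystType(int sqlType, String typeName, int size, MetadataBuilder md) {
    Metadata meta = md.build();
    // Whether a "name" entry is still present here depends on the Spark version;
    // that is the behavior change this comment is asking about.
    String columnName = meta.contains("name") ? meta.getString("name") : "<unknown>";
    System.out.println("mapping column " + columnName + " with JDBC type " + sqlType);
    return Option.empty();  // fall back to Spark's default type mapping
  }

  public static void register() {
    JdbcDialects.registerDialect(new ColumnNameAwareDialect());
  }
}
{code}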


was (Author: sunayansaikia):
It seems this fix broke the way we could get the column name via the _'name'_ key in the MetadataBuilder map inside getCatalystType().
Is there a way I can still get the column name while overriding the getCatalystType() method?

Please check the Java code below, which is what broke.
public Option<DataType> getCatalystType(int sqlJdbcType, String typeName, int size, MetadataBuilder md) {
  String columnName = String.valueOf(md.getMap().get("name").get());

> Issue with Spark interpreting Oracle datatype NUMBER
> 
>
> Key: SPARK-20427
> URL: https://issues.apache.org/jira/browse/SPARK-20427
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Alexander Andrushenko
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 2.3.0
>
>
> Oracle has a data type NUMBER. When defining a field of type NUMBER in a table,
> the field has two components, precision and scale.
> For example, NUMBER(p,s) has precision p and scale s.
> Precision can range from 1 to 38.
> Scale can range from -84 to 127.
> When reading such a field, Spark can create numbers with precision exceeding
> 38. In our case it created fields with precision 44,
> calculated as the sum of the precision (in our case 34 digits) and the scale (10):
> "...java.lang.IllegalArgumentException: requirement failed: Decimal precision
> 44 exceeds max precision 38...".
> The result was that a data frame read from a table on one schema
> could not be inserted into the identical table on another schema.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER

2019-10-01 Thread Paul Wu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942090#comment-16942090
 ] 

Paul Wu edited comment on SPARK-20427 at 10/1/19 4:13 PM:
--

Someone asked me about this problem months ago and I found a solution for him, but I had forgotten it when someone else on my team asked me again yesterday. I had to spend several hours on this since her query was quite complex. For the record and my own reference, I would like to put the solution here (inspired by [~sobusiak] and [~yumwang]): add the customSchema option after read(), specifying all potential troublemakers as Double types. This can probably resolve most cases in real applications. Of course, it assumes that one is not particularly concerned about the exact significant digits in the application.
{code:java}
.read()
.option("customSchema", "col1 Double, col2 Double") //where col1, col2... are 
columns that could cause the trouble.

{code}
Also, some may think their issues come from the .write() operation, but they in fact come from the .read() operation. The col1, col2... column names are not necessarily from the original tables; they could be calculated fields produced by the queries. One could mistakenly look in the wrong place when trying to fix the issue.
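
A fuller, self-contained sketch of the workaround (Java; the connection URL, credentials, table name, and the column names AMT and RATE are placeholders, not from the original report):
{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class OracleNumberAsDouble {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("oracle-number-as-double").getOrCreate();

    Dataset<Row> df = spark.read()
        .format("jdbc")
        .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1")  // placeholder URL
        .option("user", "scott")                                    // placeholder credentials
        .option("password", "tiger")
        .option("dbtable", "MY_SCHEMA.MY_TABLE")                    // placeholder table
        // Read the troublesome NUMBER columns as doubles instead of letting Spark infer
        // a DecimalType that can exceed the precision-38 limit.
        .option("customSchema", "AMT DOUBLE, RATE DOUBLE")
        .load();

    df.printSchema();  // AMT and RATE now come back as double
  }
}
{code}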

 


was (Author: zwu@gmail.com):
Someone asked me about this problem months ago and I found a solution for him, but I had forgotten it when someone else on my team asked me again yesterday. I had to spend several hours on this since her query was quite complex. For the record and my own reference, I would like to put the solution here (inspired by [~sobusiak] and [~yumwang]): add the customSchema option after read(), specifying all potential troublemakers as Double types. This can probably resolve most cases in real applications. Of course, it assumes that one is not particularly concerned about the exact significant digits in the application.
{code:java}
.read()
.option("customSchema", "col1 Double, col2 Double") //where col1, col2... are 
columns that could cause the trouble.

{code}
 

 

> Issue with Spark interpreting Oracle datatype NUMBER
> 
>
> Key: SPARK-20427
> URL: https://issues.apache.org/jira/browse/SPARK-20427
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Alexander Andrushenko
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 2.3.0
>
>
> Oracle has a data type NUMBER. When defining a field of type NUMBER in a table,
> the field has two components, precision and scale.
> For example, NUMBER(p,s) has precision p and scale s.
> Precision can range from 1 to 38.
> Scale can range from -84 to 127.
> When reading such a field, Spark can create numbers with precision exceeding
> 38. In our case it created fields with precision 44,
> calculated as the sum of the precision (in our case 34 digits) and the scale (10):
> "...java.lang.IllegalArgumentException: requirement failed: Decimal precision
> 44 exceeds max precision 38...".
> The result was that a data frame read from a table on one schema
> could not be inserted into the identical table on another schema.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER

2019-10-01 Thread Paul Wu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942090#comment-16942090
 ] 

Paul Wu edited comment on SPARK-20427 at 10/1/19 3:49 PM:
--

Someone asked me about this problem months ago and I found a solution for him, but I had forgotten it when someone else on my team asked me again yesterday. I had to spend several hours on this since her query was quite complex. For the record and my own reference, I would like to put the solution here (inspired by [~sobusiak] and [~yumwang]): add the customSchema option after read(), specifying all potential troublemakers as Double types. This can probably resolve most cases in real applications. Of course, it assumes that one is not particularly concerned about the exact significant digits in the application.
{code:java}
.read()
.option("customSchema", "col1 Double, col2 Double") //where col1, col2... are 
columns that could cause the trouble.

{code}
 

 


was (Author: zwu@gmail.com):
Someone asked me about this problem months ago and I found a solution for him, but I had forgotten it when someone else on my team asked me again yesterday. I had to spend several hours on this since her query was quite complex. For the record and my own reference, I would like to put the solution here (inspired by [~sobusiak] and [~yumwang]): add the customSchema option after read(), specifying all potential troublemakers as Double types. This can probably resolve most cases in real applications. Of course, it assumes that one is not particularly concerned about the exact significant digits in the application.
{code:java}

.read()
.option("customSchema", "col1 Double, col2 Double") //where col1, col2... are 
columns that could cause the trouble.

{code}
 

 

> Issue with Spark interpreting Oracle datatype NUMBER
> 
>
> Key: SPARK-20427
> URL: https://issues.apache.org/jira/browse/SPARK-20427
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Alexander Andrushenko
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 2.3.0
>
>
> Oracle has a data type NUMBER. When defining a field of type NUMBER in a table,
> the field has two components, precision and scale.
> For example, NUMBER(p,s) has precision p and scale s.
> Precision can range from 1 to 38.
> Scale can range from -84 to 127.
> When reading such a field, Spark can create numbers with precision exceeding
> 38. In our case it created fields with precision 44,
> calculated as the sum of the precision (in our case 34 digits) and the scale (10):
> "...java.lang.IllegalArgumentException: requirement failed: Decimal precision
> 44 exceeds max precision 38...".
> The result was that a data frame read from a table on one schema
> could not be inserted into the identical table on another schema.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER

2017-09-07 Thread Krzysztof Sobusiak (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156829#comment-16156829
 ] 

Krzysztof Sobusiak edited comment on SPARK-20427 at 9/7/17 12:30 PM:
-

You can have very big and very small numbers at the same time in Oracle's NUMBER if precision and scale are not specified.
The Oracle documentation says:
{quote}
The following numbers can be stored in a NUMBER column:
* Positive numbers in the range 1 x 10^-130 to 9.99...9 x 10^125 with up to 38 
significant digits
* Negative numbers from -1 x 10^-130 to -9.99...99 x 10^125 with up to 38 significant digits
{quote}

As was already noted before, currently Spark throws an exception for very big 
numbers (like "Decimal precision 61 exceeds max precision 38" for 1E+50).
What was not noted is that it also truncates very small numbers to 0 (like 
1E-50).

As far as I understand, you cannot fit all of these numbers into a single {{DecimalType}}, whatever precision and scale you set. I believe the default Spark type for Oracle's NUMBER should be {{DoubleType}}.
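
A tiny illustration (plain Java, using the bounds quoted above; the class name is mine) of why those extremes cannot share one {{DecimalType}}:
{code:java}
import java.math.BigDecimal;

public class NumberRangeDemo {
  public static void main(String[] args) {
    BigDecimal tiny = new BigDecimal("1E-50");
    BigDecimal huge = new BigDecimal("1E+125");

    System.out.println(tiny.scale());                     // 50 digits needed right of the decimal point
    System.out.println(huge.precision() - huge.scale());  // 126 digits needed left of the decimal point
    // Any DecimalType covering both would need precision >= 176, far beyond Spark's cap of 38.
    // A double holds both extremes, at the cost of exact significant digits:
    System.out.println(1e-50);
    System.out.println(1e125);
  }
}
{code}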

Last but not least, this issue is not Oracle-specific! I have confirmed that the very same problems occur for PostgreSQL's NUMERIC. By the way, the PostgreSQL documentation states explicitly:
{quote}
Specifying NUMERIC without any precision or scale creates a column in which 
numeric values of any precision and scale can be stored, up to the 
implementation limit on precision. A column of this kind will not coerce input 
values to any particular scale, whereas numeric columns with a declared scale 
will coerce input values to that scale.
{quote}
So basically NUMBER/NUMERIC *without* precision and scale is very different 
from NUMBER/NUMERIC *with* precision and scale.


was (Author: sobusiak):
You can have very big and very small numbers at the same time in Oracle's NUMBER if precision and scale are not specified.
The Oracle documentation says:
{quote}
The following numbers can be stored in a NUMBER column:
* Positive numbers in the range 1 x 10^-130 to 9.99...9 x 10^125 with up to 38 
significant digits
* Negative numbers from -1 x 10^-130 to -9.99...99 x 10^125 with up to 38 significant digits
{quote}

As was already noted before, currently Spark throws an exception for very big 
numbers (like "Decimal precision 61 exceeds max precision 38" for 1E+50).
What was not noted is that it also truncates very small numbers to 0 (like 
1E-50).

As far as I understand, you cannot fit all of these numbers into a single {{DecimalType}}, whatever precision and scale you set. I believe the default Spark type for Oracle's NUMBER should be {{DoubleType}}.

Last but not least, this issue is not Oracle-specific! I have confirmed that the very same problems occur for PostgreSQL's NUMERIC. By the way, the PostgreSQL documentation states explicitly:
{quote}
Specifying NUMERIC without any precision or scale creates a column in which 
numeric values of any precision and scale can be stored, up to the 
implementation limit on precision. A column of this kind will not coerce input 
values to any particular scale, whereas numeric columns with a declared scale 
will coerce input values to that scale.
{quote}
So basically NUMBER/NUMERIC *is not* {{DecimalType}}.

> Issue with Spark interpreting Oracle datatype NUMBER
> 
>
> Key: SPARK-20427
> URL: https://issues.apache.org/jira/browse/SPARK-20427
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Alexander Andrushenko
>
> Oracle has a data type NUMBER. When defining a field of type NUMBER in a table,
> the field has two components, precision and scale.
> For example, NUMBER(p,s) has precision p and scale s.
> Precision can range from 1 to 38.
> Scale can range from -84 to 127.
> When reading such a field, Spark can create numbers with precision exceeding
> 38. In our case it created fields with precision 44,
> calculated as the sum of the precision (in our case 34 digits) and the scale (10):
> "...java.lang.IllegalArgumentException: requirement failed: Decimal precision
> 44 exceeds max precision 38...".
> The result was that a data frame read from a table on one schema
> could not be inserted into the identical table on another schema.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org