[jira] [Commented] (SPARK-15348) Hive ACID

2018-05-14 Thread Harleen Singh Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475265#comment-16475265
 ] 

Harleen Singh Mann commented on SPARK-15348:


Agreed with Arvind. This means I either don't use Hive ACID tables at all, or break that 
part of my pipeline out of Spark and run it with HQL instead.
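
For what it's worth, a minimal sketch of that second option - keeping the reads in Spark but pushing the ACID statements through HiveServer2 over JDBC. Everything in it (connection URL, credentials, table, partition, predicates) is a made-up placeholder, not something from this issue:

{code:scala}
import java.sql.DriverManager

// Hypothetical workaround sketch: run the ACID DML and the compaction command
// directly against HiveServer2 over JDBC instead of through Spark.
// The URL, credentials, table name, partition and predicates are placeholders.
object HiveAcidViaJdbc {
  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection(
      "jdbc:hive2://hive-host:10000/default", "etl_user", "secret")
    val stmt = conn.createStatement()
    try {
      // DML on a transactional table, which Spark cannot issue itself
      stmt.execute("UPDATE orders SET status = 'CANCELLED' WHERE order_id = 42")
      stmt.execute("DELETE FROM orders WHERE status = 'OBSOLETE'")
      // Major compaction so that readers see merged base files instead of deltas
      stmt.execute("ALTER TABLE orders PARTITION (ds = '2018-05-14') COMPACT 'major'")
    } finally {
      stmt.close()
      conn.close()
    }
  }
}
{code}

The hive-jdbc driver has to be on the classpath for the connection above to resolve.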

> Hive ACID
> -
>
> Key: SPARK-15348
> URL: https://issues.apache.org/jira/browse/SPARK-15348
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 1.6.3, 2.0.2, 2.1.2, 2.2.0, 2.3.0
>Reporter: Ran Haim
>Priority: Major
>
> Spark does not support any features of Hive's transactional tables:
> you cannot use Spark to delete from or update such a table, and it also has problems 
> reading the aggregated data (the uncompacted delta files) when no compaction has been done.
> Compaction also does not seem to be supported - alter table ... partition 
>  COMPACT 'major'



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23370) Spark receives a size of 0 for an Oracle Number field and defaults the field type to be BigDecimal(30,10) instead of the actual precision and scale

2018-02-11 Thread Harleen Singh Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360257#comment-16360257
 ] 

Harleen Singh Mann commented on SPARK-23370:


# Yes, querying the all_tab_columns view would mean a non-trivial performance impact.
 # It works for all tables that the JDBC user has access to. For more 
information, refer to 
[https://docs.oracle.com/cd/B19306_01/server.102/b14237/statviews_2094.htm]

This is very similar to the INFORMATION_SCHEMA.COLUMNS table in MySQL.
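
To make that concrete, here is a rough sketch of such a metadata lookup done with Spark's existing JDBC source. The connection URL, credentials, owner and table name are made-up placeholders, and the query is wrapped as a dbtable subquery so it stays valid on Spark 2.2:

{code:scala}
import org.apache.spark.sql.SparkSession

// Illustrative only: read the precision/scale that Oracle records in
// ALL_TAB_COLUMNS for a table's NUMBER columns, via Spark's JDBC source.
// URL, credentials, owner and table name are placeholders.
object InspectAllTabColumns {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("inspect-all_tab_columns").getOrCreate()

    // Subquery alias keeps this a valid "dbtable" value for Spark 2.2
    val metadataQuery =
      """(SELECT column_name, data_type, data_precision, data_scale
        |   FROM all_tab_columns
        |  WHERE owner = 'SCOTT' AND table_name = 'ORDERS') t""".stripMargin

    spark.read
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCL")
      .option("user", "scott")
      .option("password", "tiger")
      .option("dbtable", metadataQuery)
      .load()
      .show(truncate = false)
  }
}
{code}

Note that DATA_PRECISION and DATA_SCALE are NULL for an unconstrained NUMBER, so any mapping built on top of this still needs a fallback type.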

> Spark receives a size of 0 for an Oracle Number field and defaults the field 
> type to be BigDecimal(30,10) instead of the actual precision and scale
> ---
>
> Key: SPARK-23370
> URL: https://issues.apache.org/jira/browse/SPARK-23370
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.1
> Environment: Spark 2.2
> Oracle 11g
> JDBC ojdbc6.jar
>Reporter: Harleen Singh Mann
>Priority: Minor
> Attachments: Oracle KB Document 1266785.pdf
>
>
> Currently, on a JDBC read Spark obtains the schema of a table using 
> {color:#654982}resultSet.getMetaData.getColumnType{color}
> This works 99.99% of the time, except when a column of Number type is added 
> to an Oracle table using an alter statement. This is essentially an Oracle 
> DB + JDBC bug that has been documented in an Oracle KB article, and patches exist. 
> [oracle 
> KB|https://support.oracle.com/knowledge/Oracle%20Database%20Products/1266785_1.html]
> {color:#ff0000}As a result of the above-mentioned issue, Spark receives a 
> size of 0 for the field and defaults the field type to be BigDecimal(30,10) 
> instead of what it actually should be. This is done in OracleDialect.scala. 
> This may cause issues in downstream applications, where relevant 
> information may be lost due to the changed precision and scale.{color}
> _The versions that are affected are:_ 
>  _JDBC - Version: 11.2.0.1 and later   [Release: 11.2 and later ]_
>  _Oracle Server - Enterprise Edition - Version: 11.1.0.6 to 11.2.0.1_  
> _[Release: 11.1 to 11.2]_ 
> +Proposed approach:+
> There is another way of fetching the schema information in Oracle: through 
> the all_tab_columns view. If we use this view to fetch the 
> precision and scale of Number columns, the above issue is mitigated.
>  
> {color:#14892c}I can implement the changes, but require some 
> inputs on the approach from the gatekeepers here.{color}
>  {color:#14892c}PS. This is also my first Jira issue and my first fork of 
> Spark, so I will need some guidance along the way. (Yes, I am a newbie to 
> this.) Thanks...{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23372) Writing empty struct in parquet fails during execution. It should fail earlier during analysis.

2018-02-11 Thread Harleen Singh Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360252#comment-16360252
 ] 

Harleen Singh Mann commented on SPARK-23372:


Perhaps a dumb question: why are you editing 
[OrcUtils.scala|https://github.com/apache/spark/pull/20579/files#diff-3fb8426b690ab771c4f67f9cad336498]?

> Writing empty struct in parquet fails during execution. It should fail 
> earlier during analysis.
> ---
>
> Key: SPARK-23372
> URL: https://issues.apache.org/jira/browse/SPARK-23372
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Dilip Biswal
>Priority: Minor
>
> *Running*
> spark.emptyDataFrame.write.format("parquet").mode("overwrite").save(path)
> *Results in*
> {code:java}
>  org.apache.parquet.schema.InvalidSchemaException: Cannot write a schema with 
> an empty group: message spark_schema {
>  }
> at org.apache.parquet.schema.TypeUtil$1.visit(TypeUtil.java:27)
>  at org.apache.parquet.schema.TypeUtil$1.visit(TypeUtil.java:37)
>  at org.apache.parquet.schema.MessageType.accept(MessageType.java:58)
>  at org.apache.parquet.schema.TypeUtil.checkValidWriteSchema(TypeUtil.java:23)
>  at 
> org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:225)
>  at 
> org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:342)
>  at 
> org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:302)
>  at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:37)
>  at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:151)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.newOutputWriter(FileFormatWriter.scala:376)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:387)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:278)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:276)
>  at 
> org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1411)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:281)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:206)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:205)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>  at org.apache.spark.scheduler.Task.run(Task.scala:109)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.
>  {code}
> We should detect this earlier in the processing and raise the error.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23370) Spark receives a size of 0 for an Oracle Number field and defaults the field type to be BigDecimal(30,10) instead of the actual precision and scale

2018-02-10 Thread Harleen Singh Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16359788#comment-16359788
 ] 

Harleen Singh Mann commented on SPARK-23370:


This is as far as I understand it: 
 * JDBC driver: once we create the result set object using the JDBC driver, it 
will contain all of the actual data as well as the metadata for the concerned DB 
table. 
 * Querying an additional view (all_tab_columns): this would entail creating another 
result set that captures the metadata for the concerned DB table as data 
(rows). Overhead:
 ** Connection: none, since it will use connection pooling.
 ** Retrieving the result: low impact, since we will push the predicate down to the 
DB and only fetch the rows for the concerned table.

I believe the all_tab_columns view should be queried on the driver and the result 
broadcast to the executors (a rough sketch follows below). Does this make sense?

Can we get some inputs from someone else as well?
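
To illustrate the driver-side half of that idea, a rough sketch that maps the fetched precision/scale into a Spark schema over plain JDBC. The DecimalType(38, 10) fallback and all of the names here are my own placeholders, not the actual OracleDialect logic:

{code:scala}
import java.sql.DriverManager
import org.apache.spark.sql.types._

// Illustrative sketch: build the Spark schema for a table's NUMBER columns
// from ALL_TAB_COLUMNS on the driver, instead of trusting a 0-sized
// ResultSetMetaData entry. Names and the fallback type are placeholders.
object DriverSideOracleSchema {
  def numberColumnSchema(url: String, user: String, password: String,
                         owner: String, table: String): StructType = {
    val conn = DriverManager.getConnection(url, user, password)
    try {
      val ps = conn.prepareStatement(
        """SELECT column_name, data_precision, data_scale, nullable
          |FROM all_tab_columns
          |WHERE owner = ? AND table_name = ? AND data_type = 'NUMBER'""".stripMargin)
      ps.setString(1, owner)
      ps.setString(2, table)
      val rs = ps.executeQuery()
      var fields = Vector.empty[StructField]
      while (rs.next()) {
        val name = rs.getString("column_name")
        val p = rs.getInt("data_precision")
        val precision = if (rs.wasNull()) 38 else p   // NULL => unconstrained NUMBER
        val s = rs.getInt("data_scale")
        val scale = if (rs.wasNull()) 10 else s       // placeholder fallback
        val nullable = rs.getString("nullable") == "Y"
        fields = fields :+ StructField(name, DecimalType(precision, scale), nullable)
      }
      StructType(fields)
    } finally {
      conn.close()
    }
  }
}
{code}

If the schema is resolved on the driver like this, it travels to the executors as part of the plan anyway, so an explicit broadcast of the metadata rows may not even be necessary.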

> Spark receives a size of 0 for an Oracle Number field and defaults the field 
> type to be BigDecimal(30,10) instead of the actual precision and scale
> ---
>
> Key: SPARK-23370
> URL: https://issues.apache.org/jira/browse/SPARK-23370
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.1
> Environment: Spark 2.2
> Oracle 11g
> JDBC ojdbc6.jar
>Reporter: Harleen Singh Mann
>Priority: Minor
> Attachments: Oracle KB Document 1266785.pdf
>
>
> Currently, on a JDBC read Spark obtains the schema of a table using 
> {color:#654982}resultSet.getMetaData.getColumnType{color}
> This works 99.99% of the time, except when a column of Number type is added 
> to an Oracle table using an alter statement. This is essentially an Oracle 
> DB + JDBC bug that has been documented in an Oracle KB article, and patches exist. 
> [oracle 
> KB|https://support.oracle.com/knowledge/Oracle%20Database%20Products/1266785_1.html]
> {color:#ff0000}As a result of the above-mentioned issue, Spark receives a 
> size of 0 for the field and defaults the field type to be BigDecimal(30,10) 
> instead of what it actually should be. This is done in OracleDialect.scala. 
> This may cause issues in downstream applications, where relevant 
> information may be lost due to the changed precision and scale.{color}
> _The versions that are affected are:_ 
>  _JDBC - Version: 11.2.0.1 and later   [Release: 11.2 and later ]_
>  _Oracle Server - Enterprise Edition - Version: 11.1.0.6 to 11.2.0.1_  
> _[Release: 11.1 to 11.2]_ 
> +Proposed approach:+
> There is another way of fetching the schema information in Oracle: through 
> the all_tab_columns view. If we use this view to fetch the 
> precision and scale of Number columns, the above issue is mitigated.
>  
> {color:#14892c}I can implement the changes, but require some 
> inputs on the approach from the gatekeepers here.{color}
>  {color:#14892c}PS. This is also my first Jira issue and my first fork of 
> Spark, so I will need some guidance along the way. (Yes, I am a newbie to 
> this.) Thanks...{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23372) Writing empty struct in parquet fails during execution. It should fail earlier during analysis.

2018-02-09 Thread Harleen Singh Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16359202#comment-16359202
 ] 

Harleen Singh Mann commented on SPARK-23372:


[~dkbiswal] How will it throw the error at compile time?

With reference to your statement 
_"We should detect this earlier and failed during compilation of the query."_ - I 
think the use of "compilation" in that sentence is probably incorrect. I would 
suggest changing it to "while preparing/executing the query".
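
Purely to illustrate where such a pre-execution check could sit, a minimal sketch follows; this is not the actual fix, and the names are made up:

{code:scala}
import org.apache.spark.sql.DataFrame

// Illustrative sketch only: reject an empty schema before the write is planned,
// so the failure surfaces while the query is being prepared rather than from
// inside a running Parquet write task. Inside Spark itself this kind of check
// would be raised as an AnalysisException during analysis.
object EmptySchemaGuard {
  def validateBeforeWrite(df: DataFrame): Unit = {
    require(df.schema.nonEmpty,
      "Parquet does not support writing a DataFrame with zero columns " +
        "(e.g. spark.emptyDataFrame)")
  }
}
{code}

Calling EmptySchemaGuard.validateBeforeWrite(spark.emptyDataFrame) fails immediately, before any task is launched.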

 
 
 

> Writing empty struct in parquet fails during execution. It should fail 
> earlier during analysis.
> ---
>
> Key: SPARK-23372
> URL: https://issues.apache.org/jira/browse/SPARK-23372
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Dilip Biswal
>Priority: Minor
>
> *Running*
> spark.emptyDataFrame.write.format("parquet").mode("overwrite").save(path)
> *Results in*
> {code:java}
>  org.apache.parquet.schema.InvalidSchemaException: Cannot write a schema with 
> an empty group: message spark_schema {
>  }
> at org.apache.parquet.schema.TypeUtil$1.visit(TypeUtil.java:27)
>  at org.apache.parquet.schema.TypeUtil$1.visit(TypeUtil.java:37)
>  at org.apache.parquet.schema.MessageType.accept(MessageType.java:58)
>  at org.apache.parquet.schema.TypeUtil.checkValidWriteSchema(TypeUtil.java:23)
>  at 
> org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:225)
>  at 
> org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:342)
>  at 
> org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:302)
>  at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:37)
>  at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:151)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.newOutputWriter(FileFormatWriter.scala:376)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:387)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:278)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:276)
>  at 
> org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1411)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:281)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:206)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:205)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>  at org.apache.spark.scheduler.Task.run(Task.scala:109)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.
>  {code}
> We should detect this earlier and failed during compilation of the query.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23370) Spark receives a size of 0 for an Oracle Number field and defaults the field type to be BigDecimal(30,10) instead of the actual precision and scale

2018-02-09 Thread Harleen Singh Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16359200#comment-16359200
 ] 

Harleen Singh Mann commented on SPARK-23370:


[~srowen] Yes, this should be implementable in the Oracle JDBC dialect. I want to 
start working on it once we agree it adds value.

Do you mean overhead for Spark, for the Oracle DB, or for the developer? haha

> Spark receives a size of 0 for an Oracle Number field and defaults the field 
> type to be BigDecimal(30,10) instead of the actual precision and scale
> ---
>
> Key: SPARK-23370
> URL: https://issues.apache.org/jira/browse/SPARK-23370
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.1
> Environment: Spark 2.2
> Oracle 11g
> JDBC ojdbc6.jar
>Reporter: Harleen Singh Mann
>Priority: Minor
> Attachments: Oracle KB Document 1266785.pdf
>
>
> Currently, on a JDBC read Spark obtains the schema of a table using 
> {color:#654982}resultSet.getMetaData.getColumnType{color}
> This works 99.99% of the time, except when a column of Number type is added 
> to an Oracle table using an alter statement. This is essentially an Oracle 
> DB + JDBC bug that has been documented in an Oracle KB article, and patches exist. 
> [oracle 
> KB|https://support.oracle.com/knowledge/Oracle%20Database%20Products/1266785_1.html]
> {color:#ff0000}As a result of the above-mentioned issue, Spark receives a 
> size of 0 for the field and defaults the field type to be BigDecimal(30,10) 
> instead of what it actually should be. This is done in OracleDialect.scala. 
> This may cause issues in downstream applications, where relevant 
> information may be lost due to the changed precision and scale.{color}
> _The versions that are affected are:_ 
>  _JDBC - Version: 11.2.0.1 and later   [Release: 11.2 and later ]_
>  _Oracle Server - Enterprise Edition - Version: 11.1.0.6 to 11.2.0.1_  
> _[Release: 11.1 to 11.2]_ 
> +Proposed approach:+
> There is another way of fetching the schema information in Oracle: through 
> the all_tab_columns view. If we use this view to fetch the 
> precision and scale of Number columns, the above issue is mitigated.
>  
> {color:#14892c}I can implement the changes, but require some 
> inputs on the approach from the gatekeepers here.{color}
>  {color:#14892c}PS. This is also my first Jira issue and my first fork of 
> Spark, so I will need some guidance along the way. (Yes, I am a newbie to 
> this.) Thanks...{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23370) Spark receives a size of 0 for an Oracle Number field and defaults the field type to be BigDecimal(30,10) instead of the actual precision and scale

2018-02-09 Thread Harleen Singh Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358495#comment-16358495
 ] 

Harleen Singh Mann commented on SPARK-23370:


[~q79969786] Your suggestion would work, but only if one knows in advance that 
the Oracle table has a column of Number type that was added using an alter 
table statement. This information is seldom available to developers.

[~srowen] True, it is an Oracle issue. If everyone agrees that Spark has 
nothing to do with it, we may close this issue as is.

However, I feel there may be merit in evaluating the way Spark fetches 
schema information over JDBC - i.e. resultSet.getMetaData.getColumnType vs. 
all_tab_columns.

 

Thanks.

> Spark receives a size of 0 for an Oracle Number field and defaults the field 
> type to be BigDecimal(30,10) instead of the actual precision and scale
> ---
>
> Key: SPARK-23370
> URL: https://issues.apache.org/jira/browse/SPARK-23370
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.1
> Environment: Spark 2.2
> Oracle 11g
> JDBC ojdbc6.jar
>Reporter: Harleen Singh Mann
>Priority: Minor
> Attachments: Oracle KB Document 1266785.pdf
>
>
> Currently, on a JDBC read Spark obtains the schema of a table using 
> {color:#654982}resultSet.getMetaData.getColumnType{color}
> This works 99.99% of the time, except when a column of Number type is added 
> to an Oracle table using an alter statement. This is essentially an Oracle 
> DB + JDBC bug that has been documented in an Oracle KB article, and patches exist. 
> [oracle 
> KB|https://support.oracle.com/knowledge/Oracle%20Database%20Products/1266785_1.html]
> {color:#ff0000}As a result of the above-mentioned issue, Spark receives a 
> size of 0 for the field and defaults the field type to be BigDecimal(30,10) 
> instead of what it actually should be. This is done in OracleDialect.scala. 
> This may cause issues in downstream applications, where relevant 
> information may be lost due to the changed precision and scale.{color}
> _The versions that are affected are:_ 
>  _JDBC - Version: 11.2.0.1 and later   [Release: 11.2 and later ]_
>  _Oracle Server - Enterprise Edition - Version: 11.1.0.6 to 11.2.0.1_  
> _[Release: 11.1 to 11.2]_ 
> +Proposed approach:+
> There is another way of fetching the schema information in Oracle: through 
> the all_tab_columns view. If we use this view to fetch the 
> precision and scale of Number columns, the above issue is mitigated.
>  
> {color:#14892c}I can implement the changes, but require some 
> inputs on the approach from the gatekeepers here.{color}
>  {color:#14892c}PS. This is also my first Jira issue and my first fork of 
> Spark, so I will need some guidance along the way. (Yes, I am a newbie to 
> this.) Thanks...{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23372) Writing empty struct in parquet fails during execution. It should fail earlier during analysis.

2018-02-09 Thread Harleen Singh Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358294#comment-16358294
 ] 

Harleen Singh Mann commented on SPARK-23372:


What is your proposal for fixing this?

> Writing empty struct in parquet fails during execution. It should fail 
> earlier during analysis.
> ---
>
> Key: SPARK-23372
> URL: https://issues.apache.org/jira/browse/SPARK-23372
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Dilip Biswal
>Priority: Minor
>
> *Running*
> spark.emptyDataFrame.write.format("parquet").mode("overwrite").save(path)
> *Results in*
> {code:java}
>  org.apache.parquet.schema.InvalidSchemaException: Cannot write a schema with 
> an empty group: message spark_schema {
>  }
> at org.apache.parquet.schema.TypeUtil$1.visit(TypeUtil.java:27)
>  at org.apache.parquet.schema.TypeUtil$1.visit(TypeUtil.java:37)
>  at org.apache.parquet.schema.MessageType.accept(MessageType.java:58)
>  at org.apache.parquet.schema.TypeUtil.checkValidWriteSchema(TypeUtil.java:23)
>  at 
> org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:225)
>  at 
> org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:342)
>  at 
> org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:302)
>  at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:37)
>  at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:151)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.newOutputWriter(FileFormatWriter.scala:376)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:387)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:278)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:276)
>  at 
> org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1411)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:281)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:206)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:205)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>  at org.apache.spark.scheduler.Task.run(Task.scala:109)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.
>  {code}
> We should detect this earlier and failed during compilation of the query.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23370) Spark receives a size of 0 for an Oracle Number field and defaults the field type to be BigDecimal(30,10) instead of the actual precision and scale

2018-02-09 Thread Harleen Singh Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harleen Singh Mann updated SPARK-23370:
---
Shepherd: Sean Owen  (was: Xiangrui Meng)

> Spark receives a size of 0 for an Oracle Number field and defaults the field 
> type to be BigDecimal(30,10) instead of the actual precision and scale
> ---
>
> Key: SPARK-23370
> URL: https://issues.apache.org/jira/browse/SPARK-23370
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1
> Environment: Spark 2.2
> Oracle 11g
> JDBC ojdbc6.jar
>Reporter: Harleen Singh Mann
>Priority: Major
> Attachments: Oracle KB Document 1266785.pdf
>
>
> Currently, on a JDBC read Spark obtains the schema of a table using 
> {color:#654982}resultSet.getMetaData.getColumnType{color}
> This works 99.99% of the time, except when a column of Number type is added 
> to an Oracle table using an alter statement. This is essentially an Oracle 
> DB + JDBC bug that has been documented in an Oracle KB article, and patches exist. 
> [oracle 
> KB|https://support.oracle.com/knowledge/Oracle%20Database%20Products/1266785_1.html]
> {color:#ff0000}As a result of the above-mentioned issue, Spark receives a 
> size of 0 for the field and defaults the field type to be BigDecimal(30,10) 
> instead of what it actually should be. This is done in OracleDialect.scala. 
> This may cause issues in downstream applications, where relevant 
> information may be lost due to the changed precision and scale.{color}
> _The versions that are affected are:_ 
>  _JDBC - Version: 11.2.0.1 and later   [Release: 11.2 and later ]_
>  _Oracle Server - Enterprise Edition - Version: 11.1.0.6 to 11.2.0.1_  
> _[Release: 11.1 to 11.2]_ 
> +Proposed approach:+
> There is another way of fetching the schema information in Oracle: through 
> the all_tab_columns view. If we use this view to fetch the 
> precision and scale of Number columns, the above issue is mitigated.
>  
> {color:#14892c}I can implement the changes, but require some 
> inputs on the approach from the gatekeepers here.{color}
>  {color:#14892c}PS. This is also my first Jira issue and my first fork of 
> Spark, so I will need some guidance along the way. (Yes, I am a newbie to 
> this.) Thanks...{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23370) Spark receives a size of 0 for an Oracle Number field and defaults the field type to be BigDecimal(30,10) instead of the actual precision and scale

2018-02-09 Thread Harleen Singh Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harleen Singh Mann updated SPARK-23370:
---
Shepherd: Xiangrui Meng

> Spark receives a size of 0 for an Oracle Number field and defaults the field 
> type to be BigDecimal(30,10) instead of the actual precision and scale
> ---
>
> Key: SPARK-23370
> URL: https://issues.apache.org/jira/browse/SPARK-23370
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1
> Environment: Spark 2.2
> Oracle 11g
> JDBC ojdbc6.jar
>Reporter: Harleen Singh Mann
>Priority: Major
> Attachments: Oracle KB Document 1266785.pdf
>
>
> Currently, on a JDBC read Spark obtains the schema of a table using 
> {color:#654982}resultSet.getMetaData.getColumnType{color}
> This works 99.99% of the time, except when a column of Number type is added 
> to an Oracle table using an alter statement. This is essentially an Oracle 
> DB + JDBC bug that has been documented in an Oracle KB article, and patches exist. 
> [oracle 
> KB|https://support.oracle.com/knowledge/Oracle%20Database%20Products/1266785_1.html]
> {color:#ff0000}As a result of the above-mentioned issue, Spark receives a 
> size of 0 for the field and defaults the field type to be BigDecimal(30,10) 
> instead of what it actually should be. This is done in OracleDialect.scala. 
> This may cause issues in downstream applications, where relevant 
> information may be lost due to the changed precision and scale.{color}
> _The versions that are affected are:_ 
>  _JDBC - Version: 11.2.0.1 and later   [Release: 11.2 and later ]_
>  _Oracle Server - Enterprise Edition - Version: 11.1.0.6 to 11.2.0.1_  
> _[Release: 11.1 to 11.2]_ 
> +Proposed approach:+
> There is another way of fetching the schema information in Oracle: through 
> the all_tab_columns view. If we use this view to fetch the 
> precision and scale of Number columns, the above issue is mitigated.
>  
> {color:#14892c}I can implement the changes, but require some 
> inputs on the approach from the gatekeepers here.{color}
>  {color:#14892c}PS. This is also my first Jira issue and my first fork of 
> Spark, so I will need some guidance along the way. (Yes, I am a newbie to 
> this.) Thanks...{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23370) Spark receives a size of 0 for an Oracle Number field and defaults the field type to be BigDecimal(30,10) instead of the actual precision and scale

2018-02-09 Thread Harleen Singh Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harleen Singh Mann updated SPARK-23370:
---
Description: 
Currently, on a JDBC read Spark obtains the schema of a table using 
{color:#654982}resultSet.getMetaData.getColumnType{color}

This works 99.99% of the time, except when a column of Number type is added 
to an Oracle table using an alter statement. This is essentially an Oracle DB 
+ JDBC bug that has been documented in an Oracle KB article, and patches exist. [oracle 
KB|https://support.oracle.com/knowledge/Oracle%20Database%20Products/1266785_1.html]

{color:#ff0000}As a result of the above-mentioned issue, Spark receives a size 
of 0 for the field and defaults the field type to be BigDecimal(30,10) instead 
of what it actually should be. This is done in OracleDialect.scala. This may 
cause issues in downstream applications, where relevant information may be 
lost due to the changed precision and scale.{color}

_The versions that are affected are:_ 
 _JDBC - Version: 11.2.0.1 and later   [Release: 11.2 and later ]_
 _Oracle Server - Enterprise Edition - Version: 11.1.0.6 to 11.2.0.1_  
_[Release: 11.1 to 11.2]_ 

+Proposed approach:+

There is another way of fetching the schema information in Oracle: through the 
all_tab_columns view. If we use this view to fetch the precision 
and scale of Number columns, the above issue is mitigated.

 

{color:#14892c}I can implement the changes, but require some 
inputs on the approach from the gatekeepers here.{color}


 {color:#14892c}PS. This is also my first Jira issue and my first fork of 
Spark, so I will need some guidance along the way. (Yes, I am a newbie to this.) 
Thanks...{color}

  was:
Currently, on a JDBC read Spark obtains the schema of a table using 
{color:#654982}resultSet.getMetaData.getColumnType{color}

This works 99.99% of the time, except when a column of Number type is added 
to an Oracle table using an alter statement. This is essentially an Oracle DB 
+ JDBC bug that has been documented in an Oracle KB article, and patches exist. [oracle 
KB|https://support.oracle.com/knowledge/Oracle%20Database%20Products/1266785_1.html]

{color:#ff0000}As a result of the above-mentioned issue, Spark receives a size 
of 0 for the field and defaults the field type to be BigDecimal(30,10) instead 
of what it actually should be. This is done in OracleDialect.scala. This may 
cause issues in downstream applications, where relevant information may be 
lost due to the changed precision and scale.{color}

_The versions that are affected are:_ 
_JDBC - Version: 11.2.0.1 and later   [Release: 11.2 and later ]_
_Oracle Server - Enterprise Edition - Version: 11.1.0.6 to 11.2.0.1_  
_[Release: 11.1 to 11.2]_ 

+Proposed approach:+

There is another way of fetching the schema information in Oracle: through the 
all_tab_columns view. If we use this view to fetch the precision 
and scale of Number columns, the above issue is mitigated.

 

{color:#14892c}I can implement the changes, but require some inputs on the 
approach from the gatekeepers here.{color}
{color:#14892c}PS. This is also my first Jira issue and my first fork of 
Spark, so I will need some guidance along the way. (Yes, I am a newbie to this.) 
Thanks...{color}


> Spark receives a size of 0 for an Oracle Number field and defaults the field 
> type to be BigDecimal(30,10) instead of the actual precision and scale
> ---
>
> Key: SPARK-23370
> URL: https://issues.apache.org/jira/browse/SPARK-23370
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1
> Environment: Spark 2.2
> Oracle 11g
> JDBC ojdbc6.jar
>Reporter: Harleen Singh Mann
>Priority: Major
> Attachments: Oracle KB Document 1266785.pdf
>
>
> Currently, on a JDBC read Spark obtains the schema of a table using 
> {color:#654982}resultSet.getMetaData.getColumnType{color}
> This works 99.99% of the time, except when a column of Number type is added 
> to an Oracle table using an alter statement. This is essentially an Oracle 
> DB + JDBC bug that has been documented in an Oracle KB article, and patches exist. 
> [oracle 
> KB|https://support.oracle.com/knowledge/Oracle%20Database%20Products/1266785_1.html]
> {color:#ff0000}As a result of the above-mentioned issue, Spark receives a 
> size of 0 for the field and defaults the field type to be BigDecimal(30,10) 
> instead of what it actually should be. This is done in OracleDialect.scala. 
> This may cause issues in downstream applications, where relevant 
> information may be lost due to the changed precision and scale.{color}
> _The versions that are affected are:_ 
>  _JDBC - Version: 1

[jira] [Updated] (SPARK-23370) Spark receives a size of 0 for an Oracle Number field and defaults the field type to be BigDecimal(30,10) instead of the actual precision and scale

2018-02-09 Thread Harleen Singh Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harleen Singh Mann updated SPARK-23370:
---
Summary: Spark receives a size of 0 for an Oracle Number field and defaults 
the field type to be BigDecimal(30,10) instead of the actual precision and 
scale  (was: Spark receives a size of 0 for an Oracle Number field defaults the 
field type to be BigDecimal(30,10))

> Spark receives a size of 0 for an Oracle Number field and defaults the field 
> type to be BigDecimal(30,10) instead of the actual precision and scale
> ---
>
> Key: SPARK-23370
> URL: https://issues.apache.org/jira/browse/SPARK-23370
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1
> Environment: Spark 2.2
> Oracle 11g
> JDBC ojdbc6.jar
>Reporter: Harleen Singh Mann
>Priority: Major
> Attachments: Oracle KB Document 1266785.pdf
>
>
> Currently, on a JDBC read Spark obtains the schema of a table using 
> {color:#654982}resultSet.getMetaData.getColumnType{color}
> This works 99.99% of the time, except when a column of Number type is added 
> to an Oracle table using an alter statement. This is essentially an Oracle 
> DB + JDBC bug that has been documented in an Oracle KB article, and patches exist. 
> [oracle 
> KB|https://support.oracle.com/knowledge/Oracle%20Database%20Products/1266785_1.html]
> {color:#ff0000}As a result of the above-mentioned issue, Spark receives a 
> size of 0 for the field and defaults the field type to be BigDecimal(30,10) 
> instead of what it actually should be. This is done in OracleDialect.scala. 
> This may cause issues in downstream applications, where relevant 
> information may be lost due to the changed precision and scale.{color}
> _The versions that are affected are:_ 
> _JDBC - Version: 11.2.0.1 and later   [Release: 11.2 and later ]_
> _Oracle Server - Enterprise Edition - Version: 11.1.0.6 to 11.2.0.1_  
> _[Release: 11.1 to 11.2]_ 
> +Proposed approach:+
> There is another way of fetching the schema information in Oracle: through 
> the all_tab_columns view. If we use this view to fetch the 
> precision and scale of Number columns, the above issue is mitigated.
>  
> {color:#14892c}I can implement the changes, but require some 
> inputs on the approach from the gatekeepers here.{color}
>  {color:#14892c}PS. This is also my first Jira issue and my first fork of 
> Spark, so I will need some guidance along the way. (Yes, I am a newbie to 
> this.) Thanks...{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23370) Spark receives a size of 0 for an Oracle Number field defaults the field type to be BigDecimal(30,10)

2018-02-09 Thread Harleen Singh Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harleen Singh Mann updated SPARK-23370:
---
Attachment: Oracle KB Document 1266785.pdf

> Spark receives a size of 0 for an Oracle Number field defaults the field type 
> to be BigDecimal(30,10)
> -
>
> Key: SPARK-23370
> URL: https://issues.apache.org/jira/browse/SPARK-23370
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1
> Environment: Spark 2.2
> Oracle 11g
> JDBC ojdbc6.jar
>Reporter: Harleen Singh Mann
>Priority: Major
> Attachments: Oracle KB Document 1266785.pdf
>
>
> Currently, on a JDBC read Spark obtains the schema of a table using 
> {color:#654982}resultSet.getMetaData.getColumnType{color}
> This works 99.99% of the time, except when a column of Number type is added 
> to an Oracle table using an alter statement. This is essentially an Oracle 
> DB + JDBC bug that has been documented in an Oracle KB article, and patches exist. 
> [oracle 
> KB|https://support.oracle.com/knowledge/Oracle%20Database%20Products/1266785_1.html]
> {color:#ff0000}As a result of the above-mentioned issue, Spark receives a 
> size of 0 for the field and defaults the field type to be BigDecimal(30,10) 
> instead of what it actually should be. This is done in OracleDialect.scala. 
> This may cause issues in downstream applications, where relevant 
> information may be lost due to the changed precision and scale.{color}
> _The versions that are affected are:_ 
> _JDBC - Version: 11.2.0.1 and later   [Release: 11.2 and later ]_
> _Oracle Server - Enterprise Edition - Version: 11.1.0.6 to 11.2.0.1_  
> _[Release: 11.1 to 11.2]_ 
> +Proposed approach:+
> There is another way of fetching the schema information in Oracle: through 
> the all_tab_columns view. If we use this view to fetch the 
> precision and scale of Number columns, the above issue is mitigated.
>  
> {color:#14892c}I can implement the changes, but require some 
> inputs on the approach from the gatekeepers here.{color}
>  {color:#14892c}PS. This is also my first Jira issue and my first fork of 
> Spark, so I will need some guidance along the way. (Yes, I am a newbie to 
> this.) Thanks...{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23370) Spark receives a size of 0 for an Oracle Number field defaults the field type to be BigDecimal(30,10)

2018-02-09 Thread Harleen Singh Mann (JIRA)
Harleen Singh Mann created SPARK-23370:
--

 Summary: Spark receives a size of 0 for an Oracle Number field 
defaults the field type to be BigDecimal(30,10)
 Key: SPARK-23370
 URL: https://issues.apache.org/jira/browse/SPARK-23370
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.2.1
 Environment: Spark 2.2

Oracle 11g

JDBC ojdbc6.jar
Reporter: Harleen Singh Mann


Currently, on a JDBC read Spark obtains the schema of a table using 
{color:#654982}resultSet.getMetaData.getColumnType{color}

This works 99.99% of the time, except when a column of Number type is added 
to an Oracle table using an alter statement. This is essentially an Oracle DB 
+ JDBC bug that has been documented in an Oracle KB article, and patches exist. [oracle 
KB|https://support.oracle.com/knowledge/Oracle%20Database%20Products/1266785_1.html]

{color:#ff0000}As a result of the above-mentioned issue, Spark receives a size 
of 0 for the field and defaults the field type to be BigDecimal(30,10) instead 
of what it actually should be. This is done in OracleDialect.scala. This may 
cause issues in downstream applications, where relevant information may be 
lost due to the changed precision and scale.{color}

_The versions that are affected are:_ 
_JDBC - Version: 11.2.0.1 and later   [Release: 11.2 and later ]_
_Oracle Server - Enterprise Edition - Version: 11.1.0.6 to 11.2.0.1_  
_[Release: 11.1 to 11.2]_ 

+Proposed approach:+

There is another way of fetching the schema information in Oracle: through the 
all_tab_columns view. If we use this view to fetch the precision 
and scale of Number columns, the above issue is mitigated.

 

{color:#14892c}I can implement the changes, but require some inputs on the 
approach from the gatekeepers here.{color}
{color:#14892c}PS. This is also my first Jira issue and my first fork of 
Spark, so I will need some guidance along the way. (Yes, I am a newbie to this.) 
Thanks...{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org