[jira] [Commented] (SPARK-33600) Group exception messages in execution/datasources/v2
[ https://issues.apache.org/jira/browse/SPARK-33600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1720#comment-1720 ] Apache Spark commented on SPARK-33600: -- User 'karenfeng' has created a pull request for this issue: https://github.com/apache/spark/pull/31619 > Group exception messages in execution/datasources/v2 > > > Key: SPARK-33600 > URL: https://issues.apache.org/jira/browse/SPARK-33600 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Priority: Major > > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2' > || Filename || Count || > | AlterTableExec.scala | 1 | > | CreateNamespaceExec.scala| 1 | > | CreateTableExec.scala| 1 | > | DataSourceRDD.scala | 2 | > | DataSourceV2Strategy.scala | 9 | > | DropNamespaceExec.scala | 2 | > | DropTableExec.scala | 1 | > | EmptyPartitionReader.scala | 1 | > | FileDataSourceV2.scala | 1 | > | FilePartitionReader.scala| 2 | > | FilePartitionReaderFactory.scala | 1 | > | ReplaceTableExec.scala | 3 | > | TableCapabilityCheck.scala | 2 | > | V1FallbackWriters.scala | 1 | > | V2SessionCatalog.scala | 14 | > | WriteToDataSourceV2Exec.scala| 10 | > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc' > || Filename || Count || > | JDBCTableCatalog.scala | 3 | > '/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2' > || Filename|| Count || > | DataSourceV2Implicits.scala | 3 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33600) Group exception messages in execution/datasources/v2
[ https://issues.apache.org/jira/browse/SPARK-33600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33600: Assignee: Apache Spark > Group exception messages in execution/datasources/v2 > > > Key: SPARK-33600 > URL: https://issues.apache.org/jira/browse/SPARK-33600 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Assignee: Apache Spark >Priority: Major > > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2' > || Filename || Count || > | AlterTableExec.scala | 1 | > | CreateNamespaceExec.scala| 1 | > | CreateTableExec.scala| 1 | > | DataSourceRDD.scala | 2 | > | DataSourceV2Strategy.scala | 9 | > | DropNamespaceExec.scala | 2 | > | DropTableExec.scala | 1 | > | EmptyPartitionReader.scala | 1 | > | FileDataSourceV2.scala | 1 | > | FilePartitionReader.scala| 2 | > | FilePartitionReaderFactory.scala | 1 | > | ReplaceTableExec.scala | 3 | > | TableCapabilityCheck.scala | 2 | > | V1FallbackWriters.scala | 1 | > | V2SessionCatalog.scala | 14 | > | WriteToDataSourceV2Exec.scala| 10 | > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc' > || Filename || Count || > | JDBCTableCatalog.scala | 3 | > '/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2' > || Filename|| Count || > | DataSourceV2Implicits.scala | 3 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33600) Group exception messages in execution/datasources/v2
[ https://issues.apache.org/jira/browse/SPARK-33600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33600: Assignee: (was: Apache Spark) > Group exception messages in execution/datasources/v2 > > > Key: SPARK-33600 > URL: https://issues.apache.org/jira/browse/SPARK-33600 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Priority: Major > > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2' > || Filename || Count || > | AlterTableExec.scala | 1 | > | CreateNamespaceExec.scala| 1 | > | CreateTableExec.scala| 1 | > | DataSourceRDD.scala | 2 | > | DataSourceV2Strategy.scala | 9 | > | DropNamespaceExec.scala | 2 | > | DropTableExec.scala | 1 | > | EmptyPartitionReader.scala | 1 | > | FileDataSourceV2.scala | 1 | > | FilePartitionReader.scala| 2 | > | FilePartitionReaderFactory.scala | 1 | > | ReplaceTableExec.scala | 3 | > | TableCapabilityCheck.scala | 2 | > | V1FallbackWriters.scala | 1 | > | V2SessionCatalog.scala | 14 | > | WriteToDataSourceV2Exec.scala| 10 | > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc' > || Filename || Count || > | JDBCTableCatalog.scala | 3 | > '/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2' > || Filename|| Count || > | DataSourceV2Implicits.scala | 3 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
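The refactoring behind this sub-task (part of the error-message umbrella SPARK-33539) moves inline exception strings into centralized error helpers. The actual Spark change is Scala, but the grouping pattern can be sketched in plain Python; all names below (the error class and helper functions) are hypothetical illustrations, not Spark APIs:

```python
# Sketch of the "group exception messages" pattern: every user-facing error
# message is constructed by a helper in one module, so wording, error types,
# and parameters stay consistent across many call sites.

class QueryExecutionError(RuntimeError):
    """Hypothetical stand-in for a grouped error class."""

def unsupported_table_change_error(change):
    # Helper owns the message wording; call sites never inline strings.
    return QueryExecutionError(f"Unsupported table change: {change!r}")

def table_already_exists_error(ident):
    return QueryExecutionError(f"Table {ident} already exists")

# A call site (e.g. a CreateTableExec-like operation) raises via the helpers:
def create_table(catalog, ident):
    if ident in catalog:
        raise table_already_exists_error(ident)
    catalog[ident] = {}
```

The payoff is that auditing or rewording all errors in `execution/datasources/v2` becomes a change to one module rather than a sweep over 16 files.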
[jira] [Created] (SPARK-34504) avoid unnecessary view resolving and remove the `performCheck` flag
Linhong Liu created SPARK-34504: --- Summary: avoid unnecessary view resolving and remove the `performCheck` flag Key: SPARK-34504 URL: https://issues.apache.org/jira/browse/SPARK-34504 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.1 Reporter: Linhong Liu In SPARK-34490, I added a `performCheck` flag to skip the analysis check when resolving views, because some view resolutions are unnecessary. We can avoid those unnecessary view resolutions and then remove the `performCheck` flag. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34503) Use zstd for spark.eventLog.compression.codec by default
[ https://issues.apache.org/jira/browse/SPARK-34503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34503: Assignee: (was: Apache Spark) > Use zstd for spark.eventLog.compression.codec by default > > > Key: SPARK-34503 > URL: https://issues.apache.org/jira/browse/SPARK-34503 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: releasenotes > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34503) Use zstd for spark.eventLog.compression.codec by default
[ https://issues.apache.org/jira/browse/SPARK-34503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288850#comment-17288850 ] Apache Spark commented on SPARK-34503: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/31618 > Use zstd for spark.eventLog.compression.codec by default > > > Key: SPARK-34503 > URL: https://issues.apache.org/jira/browse/SPARK-34503 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: releasenotes > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34503) Use zstd for spark.eventLog.compression.codec by default
[ https://issues.apache.org/jira/browse/SPARK-34503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288851#comment-17288851 ] Apache Spark commented on SPARK-34503: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/31618 > Use zstd for spark.eventLog.compression.codec by default > > > Key: SPARK-34503 > URL: https://issues.apache.org/jira/browse/SPARK-34503 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: releasenotes > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34503) Use zstd for spark.eventLog.compression.codec by default
[ https://issues.apache.org/jira/browse/SPARK-34503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34503: Assignee: Apache Spark > Use zstd for spark.eventLog.compression.codec by default > > > Key: SPARK-34503 > URL: https://issues.apache.org/jira/browse/SPARK-34503 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > Labels: releasenotes > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34503) Use zstd for spark.eventLog.compression.codec by default
[ https://issues.apache.org/jira/browse/SPARK-34503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-34503: -- Labels: releasenotes (was: ) > Use zstd for spark.eventLog.compression.codec by default > > > Key: SPARK-34503 > URL: https://issues.apache.org/jira/browse/SPARK-34503 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: releasenotes > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34503) Use zstd for spark.eventLog.compression.codec by default
Dongjoon Hyun created SPARK-34503: - Summary: Use zstd for spark.eventLog.compression.codec by default Key: SPARK-34503 URL: https://issues.apache.org/jira/browse/SPARK-34503 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.2.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
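The behavior this ticket makes the default can already be enabled explicitly (and pinned once the default flips) through existing event-log settings in `spark-defaults.conf`; `zstd` is one of the codec short names Spark accepts:

```properties
# Enable event logging and compress logs with zstd
# instead of relying on the previous lz4 default.
spark.eventLog.enabled              true
spark.eventLog.compress             true
spark.eventLog.compression.codec    zstd
```

Setting the codec explicitly also keeps history-server behavior stable across Spark upgrades regardless of which default ships.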
[jira] [Commented] (SPARK-34502) Remove unused parameters in join methods
[ https://issues.apache.org/jira/browse/SPARK-34502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288845#comment-17288845 ] Apache Spark commented on SPARK-34502: -- User 'huaxingao' has created a pull request for this issue: https://github.com/apache/spark/pull/31617 > Remove unused parameters in join methods > > > Key: SPARK-34502 > URL: https://issues.apache.org/jira/browse/SPARK-34502 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Huaxin Gao >Priority: Trivial > > Remove unused parameters in some join methods -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34502) Remove unused parameters in join methods
[ https://issues.apache.org/jira/browse/SPARK-34502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288844#comment-17288844 ] Apache Spark commented on SPARK-34502: -- User 'huaxingao' has created a pull request for this issue: https://github.com/apache/spark/pull/31617 > Remove unused parameters in join methods > > > Key: SPARK-34502 > URL: https://issues.apache.org/jira/browse/SPARK-34502 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Huaxin Gao >Priority: Trivial > > Remove unused parameters in some join methods -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34502) Remove unused parameters in join methods
[ https://issues.apache.org/jira/browse/SPARK-34502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34502: Assignee: (was: Apache Spark) > Remove unused parameters in join methods > > > Key: SPARK-34502 > URL: https://issues.apache.org/jira/browse/SPARK-34502 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Huaxin Gao >Priority: Trivial > > Remove unused parameters in some join methods -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34502) Remove unused parameters in join methods
[ https://issues.apache.org/jira/browse/SPARK-34502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34502: Assignee: Apache Spark > Remove unused parameters in join methods > > > Key: SPARK-34502 > URL: https://issues.apache.org/jira/browse/SPARK-34502 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Huaxin Gao >Assignee: Apache Spark >Priority: Trivial > > Remove unused parameters in some join methods -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34502) Remove unused parameters in join methods
Huaxin Gao created SPARK-34502: -- Summary: Remove unused parameters in join methods Key: SPARK-34502 URL: https://issues.apache.org/jira/browse/SPARK-34502 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Huaxin Gao Remove unused parameters in some join methods -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25075) Build and test Spark against Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-25075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288814#comment-17288814 ] Seth Tisue commented on SPARK-25075: Scala 2.13.5 is now available. > Build and test Spark against Scala 2.13 > --- > > Key: SPARK-25075 > URL: https://issues.apache.org/jira/browse/SPARK-25075 > Project: Spark > Issue Type: Umbrella > Components: Build, MLlib, Project Infra, Spark Core, SQL >Affects Versions: 3.0.0 >Reporter: Guillaume Massé >Priority: Major > > This umbrella JIRA tracks the requirements for building and testing Spark > against the current Scala 2.13 milestone. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34501) Support of DELETE/UPDATE operation for Database (MySQL, Postgres etc.) level
Nishant Ranjan created SPARK-34501: -- Summary: Support of DELETE/UPDATE operation for Database (MySQL, Postgres etc.) level Key: SPARK-34501 URL: https://issues.apache.org/jira/browse/SPARK-34501 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.0.2 Reporter: Nishant Ranjan Spark 3.0+ SQL started supporting the DELETE operation, but there is still no way to push a DELETE down to the underlying database (MySQL, Postgres, etc.). The documentation mentions "deleteWhere(..)", but it is unclear whether it can be used with the JDBC data source. If it can, please help with an example. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-34448) Binary logistic regression incorrectly computes the intercept and coefficients when data is not centered
[ https://issues.apache.org/jira/browse/SPARK-34448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288806#comment-17288806 ] Sean R. Owen commented on SPARK-34448: -- So one coarse response is - I'm surprised if the initialization should matter _that_ much? Starting the intercept at this value is kind of like starting it at the mean of the response in linear regression - probably the best a priori guess. That's why I am wondering about convergence (but sounds like it converges?) or what scikit does. Given the small data set, is the answer under-determined in this case? I'd have to actually look at your test case to answer those questions, but that's what I'm thinking here. Maybe you have already thought it through. What's a better initial value of the intercept? > Binary logistic regression incorrectly computes the intercept and > coefficients when data is not centered > > > Key: SPARK-34448 > URL: https://issues.apache.org/jira/browse/SPARK-34448 > Project: Spark > Issue Type: Bug > Components: ML, MLlib >Affects Versions: 2.4.5, 3.0.0 >Reporter: Yakov Kerzhner >Priority: Major > Labels: correctness > > I have written up a fairly detailed gist that includes code to reproduce the > bug, as well as the output of the code and some commentary: > [https://gist.github.com/ykerzhner/51358780a6a4cc33266515f17bf98a96] > To summarize: under certain conditions, the minimization that fits a binary > logistic regression contains a bug that pulls the intercept value towards the > log(odds) of the target data. This is mathematically only correct when the > data comes from distributions with zero means. In general, this gives > incorrect intercept values, and consequently incorrect coefficients as well. > As I am not so familiar with the spark code base, I have not been able to > find this bug within the spark code itself. 
A hint to this bug is here: > [https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L894-L904] > based on the code, I don't believe that the features have zero means at this > point, and so this heuristic is incorrect. But an incorrect starting point > does not explain this bug. The minimizer should drift to the correct place. > I was not able to find the code of the actual objective function that is > being minimized. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
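Sean's analogy above can be checked numerically: initializing the intercept at the log-odds of the labels is exactly the value whose sigmoid reproduces the mean of the response, the way the response mean is the natural zero-coefficient start in linear regression. A self-contained sketch in plain Python (not the Spark code path, which uses a label histogram):

```python
import math

# Toy binary-label sample.
labels = [0, 0, 0, 1, 1]
p = sum(labels) / len(labels)   # empirical P(y = 1), here 0.4

# Log-odds initialization of the intercept, analogous to Spark's
# histogram-based starting point for binary logistic regression.
b0 = math.log(p / (1 - p))

# With all coefficients at zero, the model predicts sigmoid(b0),
# which recovers the label mean exactly.
pred = 1.0 / (1.0 + math.exp(-b0))
print(abs(pred - p) < 1e-12)  # True
```

This shows why it is a sensible a priori guess; the ticket's claim is that the *objective*, not just the start, behaves as if features were zero-mean, which this sketch does not address.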
[jira] [Updated] (SPARK-34448) Binary logistic regression incorrectly computes the intercept and coefficients when data is not centered
[ https://issues.apache.org/jira/browse/SPARK-34448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-34448: - Priority: Major (was: Critical) > Binary logistic regression incorrectly computes the intercept and > coefficients when data is not centered > > > Key: SPARK-34448 > URL: https://issues.apache.org/jira/browse/SPARK-34448 > Project: Spark > Issue Type: Bug > Components: ML, MLlib >Affects Versions: 2.4.5, 3.0.0 >Reporter: Yakov Kerzhner >Priority: Major > Labels: correctness > > I have written up a fairly detailed gist that includes code to reproduce the > bug, as well as the output of the code and some commentary: > [https://gist.github.com/ykerzhner/51358780a6a4cc33266515f17bf98a96] > To summarize: under certain conditions, the minimization that fits a binary > logistic regression contains a bug that pulls the intercept value towards the > log(odds) of the target data. This is mathematically only correct when the > data comes from distributions with zero means. In general, this gives > incorrect intercept values, and consequently incorrect coefficients as well. > As I am not so familiar with the spark code base, I have not been able to > find this bug within the spark code itself. A hint to this bug is here: > [https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L894-L904] > based on the code, I don't believe that the features have zero means at this > point, and so this heuristic is incorrect. But an incorrect starting point > does not explain this bug. The minimizer should drift to the correct place. > I was not able to find the code of the actual objective function that is > being minimized. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34499) improve the catalyst expression dsl internal APIs
[ https://issues.apache.org/jira/browse/SPARK-34499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-34499. - Resolution: Won't Fix > improve the catalyst expression dsl internal APIs > - > > Key: SPARK-34499 > URL: https://issues.apache.org/jira/browse/SPARK-34499 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33863) Pyspark UDF wrongly changes timestamps to UTC
[ https://issues.apache.org/jira/browse/SPARK-33863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288802#comment-17288802 ] Nasir Ali commented on SPARK-33863: --- [~hyukjin.kwon] and [~viirya] any update on this issue? > Pyspark UDF wrongly changes timestamps to UTC > - > > Key: SPARK-33863 > URL: https://issues.apache.org/jira/browse/SPARK-33863 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.0.1 > Environment: MAC/Linux > Standalone cluster / local machine >Reporter: Nasir Ali >Priority: Major > > *Problem*: > I have a dataframe with a ts (timestamp) column in UTC. If I create a new > column using udf, pyspark udf wrongly changes timestamps into UTC time. ts > (timestamp) column is already in UTC time. Therefore, pyspark udf should not > convert ts (timestamp) column into UTC timestamp. > I have used following configs to let spark know the timestamps are in UTC: > > {code:java} > --conf spark.driver.extraJavaOptions=-Duser.timezone=UTC > --conf spark.executor.extraJavaOptions=-Duser.timezone=UTC > --conf spark.sql.session.timeZone=UTC > {code} > Below is a code snippet to reproduce the error: > > {code:java} > from pyspark.sql import SparkSession > from pyspark.sql import functions as F > from pyspark.sql.types import StringType, TimestampType > import datetime > spark = SparkSession.builder.config("spark.sql.session.timeZone", > "UTC").getOrCreate() > df = spark.createDataFrame([("usr1",17.00, "2018-02-10T15:27:18+00:00"), > ("usr1",13.00, "2018-02-11T12:27:18+00:00"), > ("usr1",25.00, "2018-02-12T11:27:18+00:00"), > ("usr1",20.00, "2018-02-13T15:27:18+00:00"), > ("usr1",17.00, "2018-02-14T12:27:18+00:00"), > ("usr2",99.00, "2018-02-15T11:27:18+00:00"), > ("usr2",156.00, "2018-02-22T11:27:18+00:00") > ], >["user","id", "ts"]) > df = df.withColumn('ts', df.ts.cast('timestamp')) > df.show(truncate=False) > def some_time_udf(i): > if datetime.time(5, 0)<=i.time() < datetime.time(12, 0): > 
tmp= "Morning: " + str(i) > elif datetime.time(12, 0)<=i.time() < datetime.time(17, 0): > tmp= "Afternoon: " + str(i) > elif datetime.time(17, 0)<=i.time() < datetime.time(21, 0): > tmp= "Evening" > elif datetime.time(21, 0)<=i.time() < datetime.time(0, 0): > tmp= "Night" > elif datetime.time(0, 0)<=i.time() < datetime.time(5, 0): > tmp= "Night" > return tmp > udf = F.udf(some_time_udf,StringType()) > df.withColumn("day_part", udf(df.ts)).show(truncate=False) > {code} > > Below is the output of the above code: > {code:java} > ++-+---++ > |user|id |ts |day_part| > ++-+---++ > |usr1|17.0 |2018-02-10 15:27:18|Morning: 2018-02-10 09:27:18| > |usr1|13.0 |2018-02-11 12:27:18|Morning: 2018-02-11 06:27:18| > |usr1|25.0 |2018-02-12 11:27:18|Morning: 2018-02-12 05:27:18| > |usr1|20.0 |2018-02-13 15:27:18|Morning: 2018-02-13 09:27:18| > |usr1|17.0 |2018-02-14 12:27:18|Morning: 2018-02-14 06:27:18| > |usr2|99.0 |2018-02-15 11:27:18|Morning: 2018-02-15 05:27:18| > |usr2|156.0|2018-02-22 11:27:18|Morning: 2018-02-22 05:27:18| > ++-+---++ > {code} > Above output is incorrect. You can see ts and day_part columns don't have > same timestamps. Below is the output I would expect: > > {code:java} > ++-+---++ > |user|id |ts |day_part| > ++-+---++ > |usr1|17.0 |2018-02-10 15:27:18|Afternoon: 2018-02-10 15:27:18| > |usr1|13.0 |2018-02-11 12:27:18|Afternoon: 2018-02-11 12:27:18| > |usr1|25.0 |2018-02-12 11:27:18|Morning: 2018-02-12 11:27:18| > |usr1|20.0 |2018-02-13 15:27:18|Afternoon: 2018-02-13 15:27:18| > |usr1|17.0 |2018-02-14 12:27:18|Afternoon: 2018-02-14 12:27:18| > |usr2|99.0 |2018-02-15 11:27:18|Morning: 2018-02-15 11:27:18| > |usr2|156.0|2018-02-22 11:27:18|Morning: 2018-02-22 11:27:18| > ++-+---++{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34500) Replace symbol literals with $"" in examples and documents
[ https://issues.apache.org/jira/browse/SPARK-34500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-34500. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31615 [https://github.com/apache/spark/pull/31615] > Replace symbol literals with $"" in examples and documents > -- > > Key: SPARK-34500 > URL: https://issues.apache.org/jira/browse/SPARK-34500 > Project: Spark > Issue Type: Improvement > Components: Documentation, Examples >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > Fix For: 3.2.0 > > > The Scala community seems to deprecate Symbol in the future so let's replace > symbol literals in user facing examples and documents. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11844) can not read class org.apache.parquet.format.PageHeader: don't know what type: 13
[ https://issues.apache.org/jira/browse/SPARK-11844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288693#comment-17288693 ] Sungwon commented on SPARK-11844: - I'm seeing the same issue on spark-2.4 I'm able to read the parquet files just fine and load them into DataFrames, but using any order related operations leads them to crash (min, max, orderBy) {code:java} org.apache.iceberg.exceptions.RuntimeIOException: java.io.IOException: can not read class org.apache.iceberg.shaded.org.apache.parquet.format.PageHeader: don't know what type: 13 at org.apache.iceberg.parquet.ParquetReader$FileIterator.advance(ParquetReader.java:133) at org.apache.iceberg.parquet.ParquetReader$FileIterator.next(ParquetReader.java:110) at org.apache.iceberg.spark.source.BaseDataReader.next(BaseDataReader.java:69) at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:49) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at org.apache.spark.util.random.SamplingUtils$.reservoirSampleAndCount(SamplingUtils.scala:41) at org.apache.spark.RangePartitioner$$anonfun$13.apply(Partitioner.scala:306) at org.apache.spark.RangePartitioner$$anonfun$13.apply(Partitioner.scala:304) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853) at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:123) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} And as mentioned above, it is accompanied by 'uncompressed_page_size' was not found in serialized data! error: {code:java} org.apache.iceberg.exceptions.RuntimeIOException: java.io.IOException: can not read class org.apache.iceberg.shaded.org.apache.parquet.format.PageHeader: Required field 'uncompressed_page_size' was not found in serialized data! 
Struct: org.apache.iceberg.shaded.org.apache.parquet.format.PageHeader$PageHeaderStandardScheme@bb8d7d1 at org.apache.iceberg.parquet.ParquetReader$FileIterator.advance(ParquetReader.java:133) at org.apache.iceberg.parquet.ParquetReader$FileIterator.next(ParquetReader.java:110) at org.apache.iceberg.spark.source.BaseDataReader.next(BaseDataReader.java:69) at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:49) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at org.apache.spark.util.random.SamplingUtils$.reservoirSampleAndCount(SamplingUtils.scala:41) at org.apache.spark.RangePartitioner$$anonfun$13.apply(Partitioner.scala:306) at org.apache.spark.RangePartitioner$$anonfun$13.apply(Partitioner.scala:304) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitions
[jira] [Assigned] (SPARK-34500) Replace symbol literals with $"" in examples and documents
[ https://issues.apache.org/jira/browse/SPARK-34500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34500: Assignee: Kousuke Saruta (was: Apache Spark) > Replace symbol literals with $"" in examples and documents > -- > > Key: SPARK-34500 > URL: https://issues.apache.org/jira/browse/SPARK-34500 > Project: Spark > Issue Type: Improvement > Components: Documentation, Examples >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > The Scala community seems to deprecate Symbol in the future so let's replace > symbol literals in user facing examples and documents. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34500) Replace symbol literals with $"" in examples and documents
[ https://issues.apache.org/jira/browse/SPARK-34500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288615#comment-17288615 ] Apache Spark commented on SPARK-34500: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/31615 > Replace symbol literals with $"" in examples and documents > -- > > Key: SPARK-34500 > URL: https://issues.apache.org/jira/browse/SPARK-34500 > Project: Spark > Issue Type: Improvement > Components: Documentation, Examples >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > The Scala community seems to deprecate Symbol in the future so let's replace > symbol literals in user facing examples and documents. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34500) Replace symbol literals with $"" in examples and documents
[ https://issues.apache.org/jira/browse/SPARK-34500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288616#comment-17288616 ] Apache Spark commented on SPARK-34500: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/31615 > Replace symbol literals with $"" in examples and documents > -- > > Key: SPARK-34500 > URL: https://issues.apache.org/jira/browse/SPARK-34500 > Project: Spark > Issue Type: Improvement > Components: Documentation, Examples >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > The Scala community seems to deprecate Symbol in the future so let's replace > symbol literals in user facing examples and documents. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34500) Replace symbol literals with $"" in examples and documents
[ https://issues.apache.org/jira/browse/SPARK-34500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34500: Assignee: Apache Spark (was: Kousuke Saruta) > Replace symbol literals with $"" in examples and documents > -- > > Key: SPARK-34500 > URL: https://issues.apache.org/jira/browse/SPARK-34500 > Project: Spark > Issue Type: Improvement > Components: Documentation, Examples >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Apache Spark >Priority: Minor > > The Scala community seems to deprecate Symbol in the future so let's replace > symbol literals in user facing examples and documents. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34500) Replace symbol literals with $"" in examples and documents
Kousuke Saruta created SPARK-34500: -- Summary: Replace symbol literals with $"" in examples and documents Key: SPARK-34500 URL: https://issues.apache.org/jira/browse/SPARK-34500 Project: Spark Issue Type: Improvement Components: Documentation, Examples Affects Versions: 3.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta The Scala community seems to deprecate Symbol in the future so let's replace symbol literals in user facing examples and documents. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
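For context on the migration SPARK-34500 describes: Scala symbol literals (`'age`) are deprecated syntax in Scala 2.13, which is why user-facing Spark examples are moving to the `$""` string interpolator. Below is a minimal pure-Scala sketch of the literal-free `Symbol` form; the Spark-side equivalents appear only in comments, since they need a live `SparkSession` and `spark.implicits._` (the column name `age` is hypothetical):

```scala
// Symbol literals ('age) are deprecated in Scala 2.13; Symbol("age") is the
// literal-free form that this JIRA's migration avoids in user-facing docs
// by switching Spark examples to the $"" interpolator instead.
object SymbolMigration {
  val oldStyle: Symbol = Symbol("age") // what the literal 'age desugars to

  def main(args: Array[String]): Unit = {
    // In Spark code the analogous change is (requires spark.implicits._):
    //   df.select('age)      // symbol literal, deprecated syntax
    //   df.select($"age")    // preferred string interpolator
    println(oldStyle.name)
  }
}
```

This sketch compiles on both Scala 2.13 and Scala 3, where the literal syntax is removed entirely.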
[jira] [Commented] (SPARK-33602) Group exception messages in execution/datasources
[ https://issues.apache.org/jira/browse/SPARK-33602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288591#comment-17288591 ] Allison Wang commented on SPARK-33602: -- [~beliefer] Let's put AnalysisException in QueryCompilationErrors for now. > Group exception messages in execution/datasources > - > > Key: SPARK-33602 > URL: https://issues.apache.org/jira/browse/SPARK-33602 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Priority: Major > > '/core/src/main/scala/org/apache/spark/sql/execution/datasources' > || Filename|| Count || > | DataSource.scala| 9 | > | DataSourceStrategy.scala| 1 | > | DataSourceUtils.scala | 2 | > | FileFormat.scala| 1 | > | FileFormatWriter.scala | 3 | > | FileScanRDD.scala | 2 | > | InsertIntoHadoopFsRelationCommand.scala | 2 | > | PartitioningAwareFileIndex.scala| 1 | > | PartitioningUtils.scala | 3 | > | RecordReaderIterator.scala | 1 | > | rules.scala | 4 | > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/binaryfile' > || Filename || Count || > | BinaryFileFormat.scala | 2 | > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc' > || Filename || Count || > | JDBCOptions.scala | 2 | > | JdbcUtils.scala | 6 | > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc' > || Filename || Count || > | OrcDeserializer.scala | 1 | > | OrcFilters.scala | 1 | > | OrcSerializer.scala | 1 | > | OrcUtils.scala| 2 | > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet' > || Filename || Count || > | ParquetFileFormat.scala | 2 | > | ParquetReadSupport.scala | 1 | > | ParquetSchemaConverter.scala | 6 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
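The grouping pattern these sub-tasks describe centralizes exception construction in one errors object so message text lives in a single place. A sketch of the shape, not Spark's actual code: the exception class below is a simplified stand-in for Spark's `AnalysisException`, and the method name is illustrative (the real `QueryCompilationErrors` object defines its own set of methods):

```scala
// Sketch of the error-grouping pattern: instead of scattering
// `throw new AnalysisException("...")` across datasource files, each call
// site delegates to a named method on a central errors object.
class AnalysisException(message: String) extends Exception(message)

object QueryCompilationErrors {
  // One method per error condition; the message text lives only here.
  // (Hypothetical method name, for illustration.)
  def unsupportedTableOperationError(table: String, op: String): AnalysisException =
    new AnalysisException(s"Table $table does not support $op.")
}

object GroupedErrorsDemo {
  def main(args: Array[String]): Unit = {
    // A call site raises the error through the grouped API:
    val e = QueryCompilationErrors.unsupportedTableOperationError("t1", "truncate")
    println(e.getMessage)
  }
}
```

This keeps wording consistent across the files counted in the tables above and makes the messages auditable in one pass.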
[jira] [Commented] (SPARK-33600) Group exception messages in execution/datasources/v2
[ https://issues.apache.org/jira/browse/SPARK-33600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288584#comment-17288584 ] Karen Feng commented on SPARK-33600: I'm working on this. > Group exception messages in execution/datasources/v2 > > > Key: SPARK-33600 > URL: https://issues.apache.org/jira/browse/SPARK-33600 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Priority: Major > > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2' > || Filename || Count || > | AlterTableExec.scala | 1 | > | CreateNamespaceExec.scala| 1 | > | CreateTableExec.scala| 1 | > | DataSourceRDD.scala | 2 | > | DataSourceV2Strategy.scala | 9 | > | DropNamespaceExec.scala | 2 | > | DropTableExec.scala | 1 | > | EmptyPartitionReader.scala | 1 | > | FileDataSourceV2.scala | 1 | > | FilePartitionReader.scala| 2 | > | FilePartitionReaderFactory.scala | 1 | > | ReplaceTableExec.scala | 3 | > | TableCapabilityCheck.scala | 2 | > | V1FallbackWriters.scala | 1 | > | V2SessionCatalog.scala | 14 | > | WriteToDataSourceV2Exec.scala| 10 | > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc' > || Filename || Count || > | JDBCTableCatalog.scala | 3 | > '/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2' > || Filename|| Count || > | DataSourceV2Implicits.scala | 3 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27790) Support SQL INTERVAL types
[ https://issues.apache.org/jira/browse/SPARK-27790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288527#comment-17288527 ] Apache Spark commented on SPARK-27790: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/31614 > Support SQL INTERVAL types > -- > > Key: SPARK-27790 > URL: https://issues.apache.org/jira/browse/SPARK-27790 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > SQL standard defines 2 interval types: > # year-month interval contains a YEAR field or a MONTH field or both > # day-time interval contains DAY, HOUR, MINUTE, and SECOND (possibly fraction > of seconds) > Need to add 2 new internal types YearMonthIntervalType and > DayTimeIntervalType, support operations defined by SQL standard as well as > INTERVAL literals. > The java.time.Period and java.time.Duration can be supported as external type > for YearMonthIntervalType and DayTimeIntervalType. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27790) Support SQL INTERVAL types
[ https://issues.apache.org/jira/browse/SPARK-27790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27790: Assignee: Apache Spark > Support SQL INTERVAL types > -- > > Key: SPARK-27790 > URL: https://issues.apache.org/jira/browse/SPARK-27790 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > > SQL standard defines 2 interval types: > # year-month interval contains a YEAR field or a MONTH field or both > # day-time interval contains DAY, HOUR, MINUTE, and SECOND (possibly fraction > of seconds) > Need to add 2 new internal types YearMonthIntervalType and > DayTimeIntervalType, support operations defined by SQL standard as well as > INTERVAL literals. > The java.time.Period and java.time.Duration can be supported as external type > for YearMonthIntervalType and DayTimeIntervalType. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27790) Support SQL INTERVAL types
[ https://issues.apache.org/jira/browse/SPARK-27790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288526#comment-17288526 ] Apache Spark commented on SPARK-27790: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/31614 > Support SQL INTERVAL types > -- > > Key: SPARK-27790 > URL: https://issues.apache.org/jira/browse/SPARK-27790 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > SQL standard defines 2 interval types: > # year-month interval contains a YEAR field or a MONTH field or both > # day-time interval contains DAY, HOUR, MINUTE, and SECOND (possibly fraction > of seconds) > Need to add 2 new internal types YearMonthIntervalType and > DayTimeIntervalType, support operations defined by SQL standard as well as > INTERVAL literals. > The java.time.Period and java.time.Duration can be supported as external type > for YearMonthIntervalType and DayTimeIntervalType. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27790) Support SQL INTERVAL types
[ https://issues.apache.org/jira/browse/SPARK-27790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27790: Assignee: (was: Apache Spark) > Support SQL INTERVAL types > -- > > Key: SPARK-27790 > URL: https://issues.apache.org/jira/browse/SPARK-27790 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > SQL standard defines 2 interval types: > # year-month interval contains a YEAR field or a MONTH field or both > # day-time interval contains DAY, HOUR, MINUTE, and SECOND (possibly fraction > of seconds) > Need to add 2 new internal types YearMonthIntervalType and > DayTimeIntervalType, support operations defined by SQL standard as well as > INTERVAL literals. > The java.time.Period and java.time.Duration can be supported as external type > for YearMonthIntervalType and DayTimeIntervalType. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27790) Support SQL INTERVAL types
[ https://issues.apache.org/jira/browse/SPARK-27790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-27790: --- Affects Version/s: (was: 3.1.0) 3.2.0 > Support SQL INTERVAL types > -- > > Key: SPARK-27790 > URL: https://issues.apache.org/jira/browse/SPARK-27790 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > SQL standard defines 2 interval types: > # year-month interval contains a YEAR field or a MONTH field or both > # day-time interval contains DAY, HOUR, MINUTE, and SECOND (possibly fraction > of seconds) > Need to add 2 new internal types YearMonthIntervalType and > DayTimeIntervalType, support operations defined by SQL standard as well as > INTERVAL literals. > The java.time.Period and java.time.Duration can be supported as external type > for YearMonthIntervalType and DayTimeIntervalType. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
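The ticket proposes `java.time.Period` as the external type for year-month intervals and `java.time.Duration` for day-time intervals. The sketch below exercises only those JVM types, not the (then-unreleased) `YearMonthIntervalType`/`DayTimeIntervalType` catalyst types; the sample interval values are hypothetical:

```scala
import java.time.{Duration, Period}

// External-type mapping proposed by SPARK-27790:
//   year-month interval  -> java.time.Period  (YEAR and/or MONTH fields)
//   day-time interval    -> java.time.Duration (DAY..SECOND fields)
object IntervalExternalTypes {
  // INTERVAL '1-2' YEAR TO MONTH would surface as a Period of 1 year, 2 months.
  val yearMonth: Period = Period.of(1, 2, 0)

  // INTERVAL '1 10:30:00' DAY TO SECOND would surface as a Duration.
  val dayTime: Duration = Duration.ofDays(1).plusHours(10).plusMinutes(30)

  def main(args: Array[String]): Unit = {
    println(yearMonth.toTotalMonths) // 14 months
    println(dayTime.getSeconds)      // 124200 seconds
  }
}
```

Note that `Period` also carries a days field, which has no slot in a SQL year-month interval, so a real conversion would need to reject or normalize nonzero days.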
[jira] [Commented] (SPARK-34499) improve the catalyst expression dsl internal APIs
[ https://issues.apache.org/jira/browse/SPARK-34499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288483#comment-17288483 ] Apache Spark commented on SPARK-34499: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/31612 > improve the catalyst expression dsl internal APIs > - > > Key: SPARK-34499 > URL: https://issues.apache.org/jira/browse/SPARK-34499 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34499) improve the catalyst expression dsl internal APIs
[ https://issues.apache.org/jira/browse/SPARK-34499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34499: Assignee: Apache Spark > improve the catalyst expression dsl internal APIs > - > > Key: SPARK-34499 > URL: https://issues.apache.org/jira/browse/SPARK-34499 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34499) improve the catalyst expression dsl internal APIs
[ https://issues.apache.org/jira/browse/SPARK-34499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34499: Assignee: (was: Apache Spark) > improve the catalyst expression dsl internal APIs > - > > Key: SPARK-34499 > URL: https://issues.apache.org/jira/browse/SPARK-34499 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34476) Duplicate referenceNames are given for ambiguousReferences
[ https://issues.apache.org/jira/browse/SPARK-34476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34476: Assignee: (was: Apache Spark) > Duplicate referenceNames are given for ambiguousReferences > -- > > Key: SPARK-34476 > URL: https://issues.apache.org/jira/browse/SPARK-34476 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Ted Yu >Priority: Major > > When running test with Spark extension that converts custom function to json > path expression, I saw the following in test output: > {code} > 2021-02-19 21:57:24,550 (Time-limited test) [INFO - > org.yb.loadtest.TestSpark3Jsonb.testJsonb(TestSpark3Jsonb.java:102)] plan is > == Physical Plan == > org.apache.spark.sql.AnalysisException: Reference > 'phone->'key'->1->'m'->2->>'b'' is ambiguous, could be: > mycatalog.test.person.phone->'key'->1->'m'->2->>'b', > mycatalog.test.person.phone->'key'->1->'m'->2->>'b'.; line 1 pos 8 > {code} > Please note the candidates following 'could be' are the same. > Here is the physical plan for a working query where phone is a jsonb column: > {code} > TakeOrderedAndProject(limit=2, orderBy=[id#6 ASC NULLS FIRST], > output=[id#6,address#7,key#0]) > +- *(1) Project [id#6, address#7, phone->'key'->1->'m'->2->'b'#12 AS key#0] >+- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->'b'#12] Cassandra > Scan: test.person > - Cassandra Filters: [[phone->'key'->1->'m'->2->>'b' >= ?, 100]] > - Requested Columns: [id,address,phone->'key'->1->'m'->2->'b'] > {code} > The difference for the failed query is that it tries to use > {code}phone->'key'->1->'m'->2->>'b'{code} in the projection (which works as > part of filter). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34476) Duplicate referenceNames are given for ambiguousReferences
[ https://issues.apache.org/jira/browse/SPARK-34476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34476: Assignee: Apache Spark > Duplicate referenceNames are given for ambiguousReferences > -- > > Key: SPARK-34476 > URL: https://issues.apache.org/jira/browse/SPARK-34476 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Ted Yu >Assignee: Apache Spark >Priority: Major > > When running test with Spark extension that converts custom function to json > path expression, I saw the following in test output: > {code} > 2021-02-19 21:57:24,550 (Time-limited test) [INFO - > org.yb.loadtest.TestSpark3Jsonb.testJsonb(TestSpark3Jsonb.java:102)] plan is > == Physical Plan == > org.apache.spark.sql.AnalysisException: Reference > 'phone->'key'->1->'m'->2->>'b'' is ambiguous, could be: > mycatalog.test.person.phone->'key'->1->'m'->2->>'b', > mycatalog.test.person.phone->'key'->1->'m'->2->>'b'.; line 1 pos 8 > {code} > Please note the candidates following 'could be' are the same. > Here is the physical plan for a working query where phone is a jsonb column: > {code} > TakeOrderedAndProject(limit=2, orderBy=[id#6 ASC NULLS FIRST], > output=[id#6,address#7,key#0]) > +- *(1) Project [id#6, address#7, phone->'key'->1->'m'->2->'b'#12 AS key#0] >+- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->'b'#12] Cassandra > Scan: test.person > - Cassandra Filters: [[phone->'key'->1->'m'->2->>'b' >= ?, 100]] > - Requested Columns: [id,address,phone->'key'->1->'m'->2->'b'] > {code} > The difference for the failed query is that it tries to use > {code}phone->'key'->1->'m'->2->>'b'{code} in the projection (which works as > part of filter). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34476) Duplicate referenceNames are given for ambiguousReferences
[ https://issues.apache.org/jira/browse/SPARK-34476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288482#comment-17288482 ] Apache Spark commented on SPARK-34476: -- User 'tedyu' has created a pull request for this issue: https://github.com/apache/spark/pull/31613 > Duplicate referenceNames are given for ambiguousReferences > -- > > Key: SPARK-34476 > URL: https://issues.apache.org/jira/browse/SPARK-34476 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Ted Yu >Priority: Major > > When running test with Spark extension that converts custom function to json > path expression, I saw the following in test output: > {code} > 2021-02-19 21:57:24,550 (Time-limited test) [INFO - > org.yb.loadtest.TestSpark3Jsonb.testJsonb(TestSpark3Jsonb.java:102)] plan is > == Physical Plan == > org.apache.spark.sql.AnalysisException: Reference > 'phone->'key'->1->'m'->2->>'b'' is ambiguous, could be: > mycatalog.test.person.phone->'key'->1->'m'->2->>'b', > mycatalog.test.person.phone->'key'->1->'m'->2->>'b'.; line 1 pos 8 > {code} > Please note the candidates following 'could be' are the same. > Here is the physical plan for a working query where phone is a jsonb column: > {code} > TakeOrderedAndProject(limit=2, orderBy=[id#6 ASC NULLS FIRST], > output=[id#6,address#7,key#0]) > +- *(1) Project [id#6, address#7, phone->'key'->1->'m'->2->'b'#12 AS key#0] >+- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->'b'#12] Cassandra > Scan: test.person > - Cassandra Filters: [[phone->'key'->1->'m'->2->>'b' >= ?, 100]] > - Requested Columns: [id,address,phone->'key'->1->'m'->2->'b'] > {code} > The difference for the failed query is that it tries to use > {code}phone->'key'->1->'m'->2->>'b'{code} in the projection (which works as > part of filter).
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
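The symptom reported above is that both "could be" candidates in the ambiguous-reference error render as the same string. One formatting-side mitigation, a sketch under that assumption and not Spark's actual patch, is to deduplicate the candidate names before building the message:

```scala
// Sketch: collapse duplicate qualified names before rendering an
// "ambiguous reference" message, so identical candidates are not
// printed twice. Not Spark's actual fix for SPARK-34476.
object AmbiguousRefMessage {
  def format(ref: String, candidates: Seq[String]): String = {
    val distinctNames = candidates.distinct // drop repeated qualified names
    s"Reference '$ref' is ambiguous, could be: ${distinctNames.mkString(", ")}."
  }

  def main(args: Array[String]): Unit = {
    // Two candidates that stringify identically, as in the report:
    val dup = Seq(
      "mycatalog.test.person.phone->'key'->1->'m'->2->>'b'",
      "mycatalog.test.person.phone->'key'->1->'m'->2->>'b'")
    println(format("phone->'key'->1->'m'->2->>'b'", dup))
  }
}
```

Deduplicating only hides the deeper question the bug raises, namely why two genuinely distinct attributes stringify identically, so the message-side change is a mitigation rather than a root-cause fix.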
[jira] [Created] (SPARK-34499) improve the catalyst expression dsl internal APIs
Wenchen Fan created SPARK-34499: --- Summary: improve the catalyst expression dsl internal APIs Key: SPARK-34499 URL: https://issues.apache.org/jira/browse/SPARK-34499 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7768) Make user-defined type (UDT) API public
[ https://issues.apache.org/jira/browse/SPARK-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288423#comment-17288423 ] Sean R. Owen commented on SPARK-7768: - To be clear I just opened it. [~viirya] did much more actual work to improve the API - and it is possible it still changes later. Use at your own risk. But this keeps the status quo for Java 9+ > Make user-defined type (UDT) API public > --- > > Key: SPARK-7768 > URL: https://issues.apache.org/jira/browse/SPARK-7768 > Project: Spark > Issue Type: New Feature > Components: SQL >Reporter: Xiangrui Meng >Assignee: Sean R. Owen >Priority: Critical > Fix For: 3.2.0 > > > As the demand for UDTs increases beyond sparse/dense vectors in MLlib, it > would be nice to make the UDT API public in 1.5. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7768) Make user-defined type (UDT) API public
[ https://issues.apache.org/jira/browse/SPARK-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288388#comment-17288388 ] Simeon H.K. Fitch commented on SPARK-7768: -- [~srowen] Thanks so much for all your excellent work on this! > Make user-defined type (UDT) API public > --- > > Key: SPARK-7768 > URL: https://issues.apache.org/jira/browse/SPARK-7768 > Project: Spark > Issue Type: New Feature > Components: SQL >Reporter: Xiangrui Meng >Assignee: Sean R. Owen >Priority: Critical > Fix For: 3.2.0 > > > As the demand for UDTs increases beyond sparse/dense vectors in MLlib, it > would be nice to make the UDT API public in 1.5. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34497) JDBC connection provider is not removing kerberos credentials from JVM security context
[ https://issues.apache.org/jira/browse/SPARK-34497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288387#comment-17288387 ] Gabor Somogyi commented on SPARK-34497: --- Working on this. > JDBC connection provider is not removing kerberos credentials from JVM > security context > --- > > Key: SPARK-34497 > URL: https://issues.apache.org/jira/browse/SPARK-34497 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.2, 3.1.0 >Reporter: Gabor Somogyi >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34498) fix the remaining problems in SPARK-34432
[ https://issues.apache.org/jira/browse/SPARK-34498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288386#comment-17288386 ] Kevin Pis commented on SPARK-34498: --- I will fix the remaining problems. > fix the remaining problems in SPARK-34432 > - > > Key: SPARK-34498 > URL: https://issues.apache.org/jira/browse/SPARK-34498 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.0.1 >Reporter: Kevin Pis >Priority: Minor > > the remaining problems : > 1. we don't need to implement SessionConfigSupport in simple writable table > data source tests. remove it from both the Scala and Java versions. > 2. change the schema of `SimpleWritableDataSource`, to match `TestingV2Source` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34498) fix the remaining problems in SPARK-34432
[ https://issues.apache.org/jira/browse/SPARK-34498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Pis updated SPARK-34498: -- Description: the remaining problems : 1. we don't need to implement SessionConfigSupport in simple writable table data source tests. remove it from both the Scala and Java versions. 2. change the schema of `SimpleWritableDataSource`, to match `TestingV2Source` was: the remaining problems : 1. don't > fix the remaining problems in SPARK-34432 > - > > Key: SPARK-34498 > URL: https://issues.apache.org/jira/browse/SPARK-34498 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.0.1 >Reporter: Kevin Pis >Priority: Minor > > the remaining problems : > 1. we don't need to implement SessionConfigSupport in simple writable table > data source tests. remove it from both the Scala and Java versions. > 2. change the schema of `SimpleWritableDataSource`, to match `TestingV2Source` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34498) fix the remaining problems in SPARK-34432
[ https://issues.apache.org/jira/browse/SPARK-34498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Pis updated SPARK-34498: -- Summary: fix the remaining problems in SPARK-34432 (was: don't implement SessionConfigSupport in simple writable table data source) > fix the remaining problems in SPARK-34432 > - > > Key: SPARK-34498 > URL: https://issues.apache.org/jira/browse/SPARK-34498 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.0.1 >Reporter: Kevin Pis >Priority: Minor > > the remaining problems : > 1. don't -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34498) don't implement SessionConfigSupport in simple writable table data source
Kevin Pis created SPARK-34498: - Summary: don't implement SessionConfigSupport in simple writable table data source Key: SPARK-34498 URL: https://issues.apache.org/jira/browse/SPARK-34498 Project: Spark Issue Type: Sub-task Components: SQL, Tests Affects Versions: 3.0.1 Reporter: Kevin Pis the remaining problems : 1. don't -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34497) JDBC connection provider is not removing kerberos credentials from JVM security context
Gabor Somogyi created SPARK-34497: - Summary: JDBC connection provider is not removing kerberos credentials from JVM security context Key: SPARK-34497 URL: https://issues.apache.org/jira/browse/SPARK-34497 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.2, 3.1.0 Reporter: Gabor Somogyi -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34496) Upgrade ZSTD-JNI to 1.4.8-5 for API compatibility
[ https://issues.apache.org/jira/browse/SPARK-34496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-34496: Assignee: Dongjoon Hyun > Upgrade ZSTD-JNI to 1.4.8-5 for API compatibility > > > Key: SPARK-34496 > URL: https://issues.apache.org/jira/browse/SPARK-34496 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34496) Upgrade ZSTD-JNI to 1.4.8-5 for API compatibility
[ https://issues.apache.org/jira/browse/SPARK-34496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-34496. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31609 [https://github.com/apache/spark/pull/31609] > Upgrade ZSTD-JNI to 1.4.8-5 for API compatibility > > > Key: SPARK-34496 > URL: https://issues.apache.org/jira/browse/SPARK-34496 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34473) avoid NPE in DataFrameReader.schema(StructType)
[ https://issues.apache.org/jira/browse/SPARK-34473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-34473:
-----------------------------------

    Assignee: Wenchen Fan

> avoid NPE in DataFrameReader.schema(StructType)
> -----------------------------------------------
>
>                 Key: SPARK-34473
>                 URL: https://issues.apache.org/jira/browse/SPARK-34473
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
>            Priority: Major
[jira] [Resolved] (SPARK-34473) avoid NPE in DataFrameReader.schema(StructType)
[ https://issues.apache.org/jira/browse/SPARK-34473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-34473.
---------------------------------
    Fix Version/s: 3.1.2
                   3.2.0
       Resolution: Fixed

Issue resolved by pull request 31593
[https://github.com/apache/spark/pull/31593]

> avoid NPE in DataFrameReader.schema(StructType)
> -----------------------------------------------
>
>                 Key: SPARK-34473
>                 URL: https://issues.apache.org/jira/browse/SPARK-34473
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
>            Priority: Major
>          Fix For: 3.2.0, 3.1.2
[jira] [Assigned] (SPARK-34488) Support task Metrics Distributions and executor Metrics Distributions in the REST API call for a specified stage
[ https://issues.apache.org/jira/browse/SPARK-34488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-34488:
------------------------------------

    Assignee: (was: Apache Spark)

> Support task Metrics Distributions and executor Metrics Distributions in the REST API call for a specified stage
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-34488
>                 URL: https://issues.apache.org/jira/browse/SPARK-34488
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.0.2
>            Reporter: Ron Hu
>            Priority: Major
>         Attachments: executorMetricsDistributions.json, taskMetricsDistributions.json
>
> For a specific stage, it is useful to show the task metrics in percentile distribution. This information can help users know whether or not there is a skew/bottleneck among the tasks in a given stage. We list an example in [^taskMetricsDistributions.json]
> Similarly, it is useful to show the executor metrics in percentile distribution for a specific stage. This information can show whether or not there is a skewed load on some executors. We list an example in [^executorMetricsDistributions.json]
>
> We define the withSummaries query parameter in the REST API for a specific stage as:
> applications///?withSummaries=[true|false]
> When withSummaries=true, both the task metrics and the executor metrics in percentile distribution are included in the REST API output. The default value of withSummaries is false, i.e. no metrics percentile distribution will be included in the REST API output.
[jira] [Commented] (SPARK-34488) Support task Metrics Distributions and executor Metrics Distributions in the REST API call for a specified stage
[ https://issues.apache.org/jira/browse/SPARK-34488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288312#comment-17288312 ]

Apache Spark commented on SPARK-34488:
--------------------------------------

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/31611

> Support task Metrics Distributions and executor Metrics Distributions in the REST API call for a specified stage
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-34488
>                 URL: https://issues.apache.org/jira/browse/SPARK-34488
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.0.2
>            Reporter: Ron Hu
>            Priority: Major
>         Attachments: executorMetricsDistributions.json, taskMetricsDistributions.json
>
> For a specific stage, it is useful to show the task metrics in percentile distribution. This information can help users know whether or not there is a skew/bottleneck among the tasks in a given stage. We list an example in [^taskMetricsDistributions.json]
> Similarly, it is useful to show the executor metrics in percentile distribution for a specific stage. This information can show whether or not there is a skewed load on some executors. We list an example in [^executorMetricsDistributions.json]
>
> We define the withSummaries query parameter in the REST API for a specific stage as:
> applications///?withSummaries=[true|false]
> When withSummaries=true, both the task metrics and the executor metrics in percentile distribution are included in the REST API output. The default value of withSummaries is false, i.e. no metrics percentile distribution will be included in the REST API output.
[jira] [Assigned] (SPARK-34488) Support task Metrics Distributions and executor Metrics Distributions in the REST API call for a specified stage
[ https://issues.apache.org/jira/browse/SPARK-34488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-34488:
------------------------------------

    Assignee: Apache Spark

> Support task Metrics Distributions and executor Metrics Distributions in the REST API call for a specified stage
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-34488
>                 URL: https://issues.apache.org/jira/browse/SPARK-34488
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.0.2
>            Reporter: Ron Hu
>            Assignee: Apache Spark
>            Priority: Major
>         Attachments: executorMetricsDistributions.json, taskMetricsDistributions.json
>
> For a specific stage, it is useful to show the task metrics in percentile distribution. This information can help users know whether or not there is a skew/bottleneck among the tasks in a given stage. We list an example in [^taskMetricsDistributions.json]
> Similarly, it is useful to show the executor metrics in percentile distribution for a specific stage. This information can show whether or not there is a skewed load on some executors. We list an example in [^executorMetricsDistributions.json]
>
> We define the withSummaries query parameter in the REST API for a specific stage as:
> applications///?withSummaries=[true|false]
> When withSummaries=true, both the task metrics and the executor metrics in percentile distribution are included in the REST API output. The default value of withSummaries is false, i.e. no metrics percentile distribution will be included in the REST API output.
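The withSummaries parameter described above can be sketched against Spark's documented monitoring REST API, whose stage endpoint takes the form /api/v1/applications/[app-id]/stages/[stage-id]. The host, port, application ID, and stage ID below are hypothetical placeholders, not values from this issue:

```python
# Sketch: build (and optionally fetch) a stage-metrics URL with the proposed
# withSummaries query parameter. All IDs below are hypothetical placeholders.
import urllib.parse

base = "http://localhost:18080/api/v1"        # history server, assumed default port
app_id = "app-20210222000000-0001"            # hypothetical application ID
stage_id = 3                                  # hypothetical stage ID

# withSummaries=true requests the task/executor metrics percentile
# distributions; the default (false) omits them, per the issue description.
query = urllib.parse.urlencode({"withSummaries": "true"})
url = f"{base}/applications/{app_id}/stages/{stage_id}?{query}"
print(url)

# To actually fetch the JSON (requires a running history server):
# import json, urllib.request
# stage = json.load(urllib.request.urlopen(url))
```

Only the query-string handling here is taken from the issue; the response shape would follow the attached taskMetricsDistributions.json / executorMetricsDistributions.json examples.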
[jira] [Resolved] (SPARK-27500) Add tests for built-in Hive 2.3
[ https://issues.apache.org/jira/browse/SPARK-27500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang resolved SPARK-27500.
---------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

> Add tests for built-in Hive 2.3
> -------------------------------
>
>                 Key: SPARK-27500
>                 URL: https://issues.apache.org/jira/browse/SPARK-27500
>             Project: Spark
>          Issue Type: Umbrella
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Yuming Wang
>            Assignee: Yuming Wang
>            Priority: Major
>          Fix For: 3.0.0
>
> Spark will use some of the new features and bug fixes of Hive 2.3, and we should add tests for these. This is an umbrella JIRA for tracking this.
[jira] [Assigned] (SPARK-27500) Add tests for built-in Hive 2.3
[ https://issues.apache.org/jira/browse/SPARK-27500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang reassigned SPARK-27500:
-----------------------------------

    Assignee: Yuming Wang

> Add tests for built-in Hive 2.3
> -------------------------------
>
>                 Key: SPARK-27500
>                 URL: https://issues.apache.org/jira/browse/SPARK-27500
>             Project: Spark
>          Issue Type: Umbrella
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Yuming Wang
>            Assignee: Yuming Wang
>            Priority: Major
>
> Spark will use some of the new features and bug fixes of Hive 2.3, and we should add tests for these. This is an umbrella JIRA for tracking this.
[jira] [Resolved] (SPARK-34432) add a java implementation for the simple writable data source
[ https://issues.apache.org/jira/browse/SPARK-34432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-34432.
---------------------------------
    Fix Version/s: 3.2.0
       Resolution: Fixed

Issue resolved by pull request 31560
[https://github.com/apache/spark/pull/31560]

> add a java implementation for the simple writable data source
> -------------------------------------------------------------
>
>                 Key: SPARK-34432
>                 URL: https://issues.apache.org/jira/browse/SPARK-34432
>             Project: Spark
>          Issue Type: Test
>          Components: SQL, Tests
>    Affects Versions: 3.0.1
>            Reporter: Kevin Pis
>            Priority: Minor
>          Fix For: 3.2.0
>
> This is a followup of https://github.com/apache/spark/pull/19269
> In #19269, there is only a Scala implementation of the simple writable data source in `DataSourceV2Suite`.
> This PR adds a Java implementation of it.
[jira] [Assigned] (SPARK-34450) Unify v1 and v2 ALTER TABLE .. RENAME tests
[ https://issues.apache.org/jira/browse/SPARK-34450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-34450:
-----------------------------------

    Assignee: Maxim Gekk

> Unify v1 and v2 ALTER TABLE .. RENAME tests
> -------------------------------------------
>
>                 Key: SPARK-34450
>                 URL: https://issues.apache.org/jira/browse/SPARK-34450
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Maxim Gekk
>            Assignee: Maxim Gekk
>            Priority: Major
>
> Extract the ALTER TABLE .. RENAME tests to a common place so they run against both the v1 and v2 datasources. Some tests can be placed in v1- and v2-specific test suites.
[jira] [Resolved] (SPARK-34450) Unify v1 and v2 ALTER TABLE .. RENAME tests
[ https://issues.apache.org/jira/browse/SPARK-34450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-34450.
---------------------------------
    Fix Version/s: 3.2.0
       Resolution: Fixed

Issue resolved by pull request 31575
[https://github.com/apache/spark/pull/31575]

> Unify v1 and v2 ALTER TABLE .. RENAME tests
> -------------------------------------------
>
>                 Key: SPARK-34450
>                 URL: https://issues.apache.org/jira/browse/SPARK-34450
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Maxim Gekk
>            Assignee: Maxim Gekk
>            Priority: Major
>          Fix For: 3.2.0
>
> Extract the ALTER TABLE .. RENAME tests to a common place so they run against both the v1 and v2 datasources. Some tests can be placed in v1- and v2-specific test suites.