[jira] [Assigned] (SPARK-38780) PySpark docs build should fail when there is warning.
[ https://issues.apache.org/jira/browse/SPARK-38780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-38780: Assignee: Haejoon Lee > PySpark docs build should fail when there is warning. > - > > Key: SPARK-38780 > URL: https://issues.apache.org/jira/browse/SPARK-38780 > Project: Spark > Issue Type: Test > Components: Documentation, PySpark, Tests >Affects Versions: 3.3.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > The PySpark docs build with the `make clean html` command currently passes even > when Sphinx detects warnings. > The build should fail when the docs violate Sphinx rules, so that rendering > issues are caught. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38780) PySpark docs build should fail when there is warning.
[ https://issues.apache.org/jira/browse/SPARK-38780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38780. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 36058 [https://github.com/apache/spark/pull/36058] > PySpark docs build should fail when there is warning. > - > > Key: SPARK-38780 > URL: https://issues.apache.org/jira/browse/SPARK-38780 > Project: Spark > Issue Type: Test > Components: Documentation, PySpark, Tests >Affects Versions: 3.3.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.3.0 > > > The PySpark docs build with the `make clean html` command currently passes even > when Sphinx detects warnings. > The build should fail when the docs violate Sphinx rules, so that rendering > issues are caught. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38780) PySpark docs build should fail when there is warning.
[ https://issues.apache.org/jira/browse/SPARK-38780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516615#comment-17516615 ] Apache Spark commented on SPARK-38780: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/36058 > PySpark docs build should fail when there is warning. > - > > Key: SPARK-38780 > URL: https://issues.apache.org/jira/browse/SPARK-38780 > Project: Spark > Issue Type: Test > Components: Documentation, PySpark, Tests >Affects Versions: 3.3.0 >Reporter: Haejoon Lee >Priority: Major > > The PySpark docs build with the `make clean html` command currently passes even > when Sphinx detects warnings. > The build should fail when the docs violate Sphinx rules, so that rendering > issues are caught. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38780) PySpark docs build should fail when there is warning.
[ https://issues.apache.org/jira/browse/SPARK-38780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516614#comment-17516614 ] Apache Spark commented on SPARK-38780: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/36058 > PySpark docs build should fail when there is warning. > - > > Key: SPARK-38780 > URL: https://issues.apache.org/jira/browse/SPARK-38780 > Project: Spark > Issue Type: Test > Components: Documentation, PySpark, Tests >Affects Versions: 3.3.0 >Reporter: Haejoon Lee >Priority: Major > > The PySpark docs build with the `make clean html` command currently passes even > when Sphinx detects warnings. > The build should fail when the docs violate Sphinx rules, so that rendering > issues are caught. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38780) PySpark docs build should fail when there is warning.
[ https://issues.apache.org/jira/browse/SPARK-38780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38780: Assignee: Apache Spark > PySpark docs build should fail when there is warning. > - > > Key: SPARK-38780 > URL: https://issues.apache.org/jira/browse/SPARK-38780 > Project: Spark > Issue Type: Test > Components: Documentation, PySpark, Tests >Affects Versions: 3.3.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > > The PySpark docs build with the `make clean html` command currently passes even > when Sphinx detects warnings. > The build should fail when the docs violate Sphinx rules, so that rendering > issues are caught. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38780) PySpark docs build should fail when there is warning.
[ https://issues.apache.org/jira/browse/SPARK-38780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38780: Assignee: (was: Apache Spark) > PySpark docs build should fail when there is warning. > - > > Key: SPARK-38780 > URL: https://issues.apache.org/jira/browse/SPARK-38780 > Project: Spark > Issue Type: Test > Components: Documentation, PySpark, Tests >Affects Versions: 3.3.0 >Reporter: Haejoon Lee >Priority: Major > > The PySpark docs build with the `make clean html` command currently passes even > when Sphinx detects warnings. > The build should fail when the docs violate Sphinx rules, so that rendering > issues are caught. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38780) PySpark docs build should fail when there is warning.
Haejoon Lee created SPARK-38780: --- Summary: PySpark docs build should fail when there is warning. Key: SPARK-38780 URL: https://issues.apache.org/jira/browse/SPARK-38780 Project: Spark Issue Type: Test Components: Documentation, PySpark, Tests Affects Versions: 3.3.0 Reporter: Haejoon Lee The PySpark docs build with the `make clean html` command currently passes even when Sphinx detects warnings. The build should fail when the docs violate Sphinx rules, so that rendering issues are caught. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36571) Optimized FileOutputCommitter with StagingDir
[ https://issues.apache.org/jira/browse/SPARK-36571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516609#comment-17516609 ] Apache Spark commented on SPARK-36571: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/36056 > Optimized FileOutputCommitter with StagingDir > - > > Key: SPARK-36571 > URL: https://issues.apache.org/jira/browse/SPARK-36571 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38779) Unify the pushed operator checking between FileSource test suite and JDBC test suite
[ https://issues.apache.org/jira/browse/SPARK-38779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-38779. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36050 [https://github.com/apache/spark/pull/36050] > Unify the pushed operator checking between FileSource test suite and JDBC > test suite > > > Key: SPARK-38779 > URL: https://issues.apache.org/jira/browse/SPARK-38779 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0, 3.4.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Minor > Fix For: 3.4.0 > > > In JDBCV2Suite, we use checkPushedInfo to check the pushed down operators. > Will do the same for FileSourceAggregatePushDownSuite > {code:java} > private def checkPushedInfo(df: DataFrame, expectedPlanFragment: String): > Unit = { > df.queryExecution.optimizedPlan.collect { > case _: DataSourceV2ScanRelation => > checkKeywordsExistsInExplain(df, expectedPlanFragment) > } > } > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38779) Unify the pushed operator checking between FileSource test suite and JDBC test suite
[ https://issues.apache.org/jira/browse/SPARK-38779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-38779: - Assignee: Huaxin Gao > Unify the pushed operator checking between FileSource test suite and JDBC > test suite > > > Key: SPARK-38779 > URL: https://issues.apache.org/jira/browse/SPARK-38779 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0, 3.4.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Minor > > In JDBCV2Suite, we use checkPushedInfo to check the pushed down operators. > Will do the same for FileSourceAggregatePushDownSuite > {code:java} > private def checkPushedInfo(df: DataFrame, expectedPlanFragment: String): > Unit = { > df.queryExecution.optimizedPlan.collect { > case _: DataSourceV2ScanRelation => > checkKeywordsExistsInExplain(df, expectedPlanFragment) > } > } > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34863) Support nested column in Spark Parquet vectorized readers
[ https://issues.apache.org/jira/browse/SPARK-34863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516602#comment-17516602 ] Apache Spark commented on SPARK-34863: -- User 'sunchao' has created a pull request for this issue: https://github.com/apache/spark/pull/36055 > Support nested column in Spark Parquet vectorized readers > - > > Key: SPARK-34863 > URL: https://issues.apache.org/jira/browse/SPARK-34863 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Cheng Su >Assignee: Apache Spark >Priority: Minor > Fix For: 3.3.0 > > > The task is to support nested column type in Spark Parquet vectorized reader. > Currently Parquet vectorized reader does not support nested column type > (struct, array and map). We implemented nested column vectorized reader for > FB-ORC in our internal fork of Spark. We are seeing performance improvement > compared to non-vectorized reader when reading nested columns. In addition, > this can also help improve the non-nested column performance when reading > non-nested and nested columns together in one query. > > Parquet: > [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L173] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
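For readers who want to try the nested-column vectorized path described above, here is a minimal, hedged Scala sketch. It assumes Spark 3.3+, a local SparkSession, and the configuration key spark.sql.parquet.enableNestedColumnVectorizedReader; that key name is an assumption based on this work and is not quoted from the ticket.

{code:scala}
// Hedged sketch: write a small Parquet file with a nested struct column and read a
// nested field back with the vectorized reader enabled. The nested-column config key
// below is assumed from the SPARK-34863 work and may differ in your Spark version.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("nested-parquet-demo").getOrCreate()
import spark.implicits._

spark.conf.set("spark.sql.parquet.enableVectorizedReader", "true")
spark.conf.set("spark.sql.parquet.enableNestedColumnVectorizedReader", "true") // assumed key

// A two-row DataFrame whose "nested" column is a struct with fields _1 and _2.
Seq((1, ("a", 10)), (2, ("b", 20))).toDF("id", "nested")
  .write.mode("overwrite").parquet("/tmp/nested_parquet_demo")

spark.read.parquet("/tmp/nested_parquet_demo").select($"id", $"nested._1").show()
{code}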
[jira] [Commented] (SPARK-38779) Unify the pushed operator checking between FileSource test suite and JDBC test suite
[ https://issues.apache.org/jira/browse/SPARK-38779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516596#comment-17516596 ] Apache Spark commented on SPARK-38779: -- User 'huaxingao' has created a pull request for this issue: https://github.com/apache/spark/pull/36050 > Unify the pushed operator checking between FileSource test suite and JDBC > test suite > > > Key: SPARK-38779 > URL: https://issues.apache.org/jira/browse/SPARK-38779 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0, 3.4.0 >Reporter: Huaxin Gao >Priority: Minor > > In JDBCV2Suite, we use checkPushedInfo to check the pushed down operators. > Will do the same for FileSourceAggregatePushDownSuite > {code:java} > private def checkPushedInfo(df: DataFrame, expectedPlanFragment: String): > Unit = { > df.queryExecution.optimizedPlan.collect { > case _: DataSourceV2ScanRelation => > checkKeywordsExistsInExplain(df, expectedPlanFragment) > } > } > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38779) Unify the pushed operator checking between FileSource test suite and JDBC test suite
[ https://issues.apache.org/jira/browse/SPARK-38779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38779: Assignee: Apache Spark > Unify the pushed operator checking between FileSource test suite and JDBC > test suite > > > Key: SPARK-38779 > URL: https://issues.apache.org/jira/browse/SPARK-38779 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0, 3.4.0 >Reporter: Huaxin Gao >Assignee: Apache Spark >Priority: Minor > > In JDBCV2Suite, we use checkPushedInfo to check the pushed down operators. > Will do the same for FileSourceAggregatePushDownSuite > {code:java} > private def checkPushedInfo(df: DataFrame, expectedPlanFragment: String): > Unit = { > df.queryExecution.optimizedPlan.collect { > case _: DataSourceV2ScanRelation => > checkKeywordsExistsInExplain(df, expectedPlanFragment) > } > } > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38779) Unify the pushed operator checking between FileSource test suite and JDBC test suite
[ https://issues.apache.org/jira/browse/SPARK-38779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38779: Assignee: (was: Apache Spark) > Unify the pushed operator checking between FileSource test suite and JDBC > test suite > > > Key: SPARK-38779 > URL: https://issues.apache.org/jira/browse/SPARK-38779 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0, 3.4.0 >Reporter: Huaxin Gao >Priority: Minor > > In JDBCV2Suite, we use checkPushedInfo to check the pushed down operators. > Will do the same for FileSourceAggregatePushDownSuite > {code:java} > private def checkPushedInfo(df: DataFrame, expectedPlanFragment: String): > Unit = { > df.queryExecution.optimizedPlan.collect { > case _: DataSourceV2ScanRelation => > checkKeywordsExistsInExplain(df, expectedPlanFragment) > } > } > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38779) Unify the pushed operator checking between FileSource test suite and JDBC test suite
Huaxin Gao created SPARK-38779: -- Summary: Unify the pushed operator checking between FileSource test suite and JDBC test suite Key: SPARK-38779 URL: https://issues.apache.org/jira/browse/SPARK-38779 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0, 3.4.0 Reporter: Huaxin Gao In JDBCV2Suite, we use checkPushedInfo to check the pushed down operators. Will do the same for FileSourceAggregatePushDownSuite {code:java} private def checkPushedInfo(df: DataFrame, expectedPlanFragment: String): Unit = { df.queryExecution.optimizedPlan.collect { case _: DataSourceV2ScanRelation => checkKeywordsExistsInExplain(df, expectedPlanFragment) } } {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
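To make the helper's intent concrete, here is a minimal, hypothetical sketch of the kind of assertion the unified suites would make. The catalog, table, and expected plan fragment are illustrative only; the sketch assumes a SparkSession named spark with a JDBC V2 catalog registered as h2.

{code:scala}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.ExplainMode

// Stand-in for checkPushedInfo without the QueryTest dependency: render the plan text
// and assert that the expected pushdown marker appears in it.
def assertPushedInfo(df: DataFrame, expectedPlanFragment: String): Unit = {
  val planText = df.queryExecution.explainString(ExplainMode.fromString("extended"))
  assert(planText.contains(expectedPlanFragment),
    s"expected plan fragment not found: $expectedPlanFragment")
}

// Illustrative query against an assumed JDBC V2 catalog named `h2`.
val df = spark.sql("SELECT dept, MAX(salary) FROM h2.test.employee GROUP BY dept")
assertPushedInfo(df, "PushedAggregates: [MAX(SALARY)]")
{code}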
[jira] [Commented] (SPARK-38776) Flaky test: ALSSuite.'ALS validate input dataset'
[ https://issues.apache.org/jira/browse/SPARK-38776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516582#comment-17516582 ] Apache Spark commented on SPARK-38776: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/36054 > Flaky test: ALSSuite.'ALS validate input dataset' > - > > Key: SPARK-38776 > URL: https://issues.apache.org/jira/browse/SPARK-38776 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > > - https://github.com/apache/spark/runs/5803714260?check_suite_focus=true > {code} > [info] ALSSuite: > ... > [info] - ALS validate input dataset *** FAILED *** (2 seconds, 449 > milliseconds) > [info] Invalid Long: out of range "Job aborted due to stage failure: Task 0 > in stage 100.0 failed 1 times, most recent failure: Lost task 0.0 in stage > 100.0 (TID 348) (localhost executor driver): > org.apache.spark.SparkArithmeticException: Casting 12310 to int > causes overflow. To return NULL instead, use 'try_cast'. If necessary set > spark.sql.ansi.enabled to false to bypass this error. > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38776) Flaky test: ALSSuite.'ALS validate input dataset'
[ https://issues.apache.org/jira/browse/SPARK-38776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516583#comment-17516583 ] Apache Spark commented on SPARK-38776: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/36054 > Flaky test: ALSSuite.'ALS validate input dataset' > - > > Key: SPARK-38776 > URL: https://issues.apache.org/jira/browse/SPARK-38776 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > > - https://github.com/apache/spark/runs/5803714260?check_suite_focus=true > {code} > [info] ALSSuite: > ... > [info] - ALS validate input dataset *** FAILED *** (2 seconds, 449 > milliseconds) > [info] Invalid Long: out of range "Job aborted due to stage failure: Task 0 > in stage 100.0 failed 1 times, most recent failure: Lost task 0.0 in stage > 100.0 (TID 348) (localhost executor driver): > org.apache.spark.SparkArithmeticException: Casting 12310 to int > causes overflow. To return NULL instead, use 'try_cast'. If necessary set > spark.sql.ansi.enabled to false to bypass this error. > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38446) Deadlock between ExecutorClassLoader and FileDownloadCallback caused by Log4j
[ https://issues.apache.org/jira/browse/SPARK-38446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-38446: - Assignee: Kent Yao > Deadlock between ExecutorClassLoader and FileDownloadCallback caused by Log4j > - > > Key: SPARK-38446 > URL: https://issues.apache.org/jira/browse/SPARK-38446 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.2, 3.2.1, 3.03 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > > {code:java} > files-client-8-1 > PRIORITY : 5 > THREAD ID : 0X7FBFFC5EE000 > NATIVE ID : 0X14903 > NATIVE ID (DECIMAL) : 84227 > STATE : BLOCKED > stackTrace: > java.lang.Thread.State: BLOCKED (on object monitor) > at java.lang.ClassLoader.loadClass(ClassLoader.java:398) > - waiting to lock <0x0003c0753f88> (a > org.apache.spark.repl.ExecutorClassLoader) > at java.lang.ClassLoader.loadClass(ClassLoader.java:351) > at org.apache.logging.log4j.util.LoaderUtil.loadClass(LoaderUtil.java:169) > at > org.apache.logging.log4j.core.impl.ThrowableProxyHelper.loadClass(ThrowableProxyHelper.java:214) > at > org.apache.logging.log4j.core.impl.ThrowableProxyHelper.toExtendedStackTrace(ThrowableProxyHelper.java:112) > at org.apache.logging.log4j.core.impl.ThrowableProxy.(ThrowableProxy.java:113) > at org.apache.logging.log4j.core.impl.ThrowableProxy.(ThrowableProxy.java:97) > at > org.apache.logging.log4j.core.impl.Log4jLogEvent.getThrownProxy(Log4jLogEvent.java:629) > at > org.apache.logging.log4j.core.pattern.ExtendedThrowablePatternConverter.format(ExtendedThrowablePatternConverter.java:63) > at > org.apache.logging.log4j.core.layout.PatternLayout$NoFormatPatternSerializer.toSerializable(PatternLayout.java:342) > at > org.apache.logging.log4j.core.layout.PatternLayout.toText(PatternLayout.java:240) > at > org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:225) > at > org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:59) > at > org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.directEncodeEvent(AbstractOutputStreamAppender.java:215) > at > org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutputStreamAppender.java:208) > at > org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:199) > at > org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:161) > at > org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:134) > at > org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:125) > at > org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:89) > at > org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:675) > at > org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:633) > at > org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:616) > at > org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:552) > at > org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:82) > at org.apache.logging.log4j.core.Logger.log(Logger.java:161) > at > org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2205) > at > org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2159) > at > 
org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2142) > at > org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2017) > at > org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1983) > at org.apache.logging.slf4j.Log4jLogger.debug(Log4jLogger.java:139) > at org.apache.spark.internal.Logging.logDebug(Logging.scala:82) > at org.apache.spark.internal.Logging.logDebug$(Logging.scala:81) > at org.apache.spark.rpc.netty.NettyRpcEnv.logDebug(NettyRpcEnv.scala:45) > at > org.apache.spark.rpc.netty.NettyRpcEnv$FileDownloadCallback.onFailure(NettyRpcEnv.scala:454) > at > org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:260) > {code} > while the class loading lock 0x0003c0753f88 is held by > ExecutorClassLoader, which is downloading remote classes/jars through it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38446) Deadlock between ExecutorClassLoader and FileDownloadCallback caused by Log4j
[ https://issues.apache.org/jira/browse/SPARK-38446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-38446. --- Fix Version/s: 3.3.0 3.2.2 3.1.3 Resolution: Fixed Issue resolved by pull request 35765 [https://github.com/apache/spark/pull/35765] > Deadlock between ExecutorClassLoader and FileDownloadCallback caused by Log4j > - > > Key: SPARK-38446 > URL: https://issues.apache.org/jira/browse/SPARK-38446 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.2, 3.2.1, 3.03 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.3.0, 3.2.2, 3.1.3 > > > {code:java} > files-client-8-1 > PRIORITY : 5 > THREAD ID : 0X7FBFFC5EE000 > NATIVE ID : 0X14903 > NATIVE ID (DECIMAL) : 84227 > STATE : BLOCKED > stackTrace: > java.lang.Thread.State: BLOCKED (on object monitor) > at java.lang.ClassLoader.loadClass(ClassLoader.java:398) > - waiting to lock <0x0003c0753f88> (a > org.apache.spark.repl.ExecutorClassLoader) > at java.lang.ClassLoader.loadClass(ClassLoader.java:351) > at org.apache.logging.log4j.util.LoaderUtil.loadClass(LoaderUtil.java:169) > at > org.apache.logging.log4j.core.impl.ThrowableProxyHelper.loadClass(ThrowableProxyHelper.java:214) > at > org.apache.logging.log4j.core.impl.ThrowableProxyHelper.toExtendedStackTrace(ThrowableProxyHelper.java:112) > at org.apache.logging.log4j.core.impl.ThrowableProxy.(ThrowableProxy.java:113) > at org.apache.logging.log4j.core.impl.ThrowableProxy.(ThrowableProxy.java:97) > at > org.apache.logging.log4j.core.impl.Log4jLogEvent.getThrownProxy(Log4jLogEvent.java:629) > at > org.apache.logging.log4j.core.pattern.ExtendedThrowablePatternConverter.format(ExtendedThrowablePatternConverter.java:63) > at > org.apache.logging.log4j.core.layout.PatternLayout$NoFormatPatternSerializer.toSerializable(PatternLayout.java:342) > at > org.apache.logging.log4j.core.layout.PatternLayout.toText(PatternLayout.java:240) > at > org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:225) > at > org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:59) > at > org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.directEncodeEvent(AbstractOutputStreamAppender.java:215) > at > org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutputStreamAppender.java:208) > at > org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:199) > at > org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:161) > at > org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:134) > at > org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:125) > at > org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:89) > at > org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:675) > at > org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:633) > at > org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:616) > at > org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:552) > at > org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:82) > at org.apache.logging.log4j.core.Logger.log(Logger.java:161) > at > 
org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2205) > at > org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2159) > at > org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2142) > at > org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2017) > at > org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1983) > at org.apache.logging.slf4j.Log4jLogger.debug(Log4jLogger.java:139) > at org.apache.spark.internal.Logging.logDebug(Logging.scala:82) > at org.apache.spark.internal.Logging.logDebug$(Logging.scala:81) > at org.apache.spark.rpc.netty.NettyRpcEnv.logDebug(NettyRpcEnv.scala:45) > at > org.apache.spark.rpc.netty.NettyRpcEnv$FileDownloadCallback.onFailure(NettyRpcEnv.scala:454) > at > org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:260) > {code} > while the class loading lock 0x0003c0753f88 is held by > ExecutorClassLoader, which is downloading remote classes/jars through it. -- This message was sent by Atlassian Jira (v8.20.1#820001) --
[jira] [Resolved] (SPARK-38776) Flaky test: ALSSuite.'ALS validate input dataset'
[ https://issues.apache.org/jira/browse/SPARK-38776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-38776. --- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 36051 [https://github.com/apache/spark/pull/36051] > Flaky test: ALSSuite.'ALS validate input dataset' > - > > Key: SPARK-38776 > URL: https://issues.apache.org/jira/browse/SPARK-38776 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > > - https://github.com/apache/spark/runs/5803714260?check_suite_focus=true > {code} > [info] ALSSuite: > ... > [info] - ALS validate input dataset *** FAILED *** (2 seconds, 449 > milliseconds) > [info] Invalid Long: out of range "Job aborted due to stage failure: Task 0 > in stage 100.0 failed 1 times, most recent failure: Lost task 0.0 in stage > 100.0 (TID 348) (localhost executor driver): > org.apache.spark.SparkArithmeticException: Casting 12310 to int > causes overflow. To return NULL instead, use 'try_cast'. If necessary set > spark.sql.ansi.enabled to false to bypass this error. > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
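The failure above is ANSI-mode cast overflow surfacing through the test data. The following hedged Scala sketch reproduces the same behavior outside the suite; it is not the fix applied in pull request 36051 and assumes a local SparkSession named spark (for example, the spark-shell).

{code:scala}
// Hedged illustration of the reported failure mode, not the actual fix:
// with ANSI mode on, casting a Long outside Int range throws SparkArithmeticException.
import spark.implicits._
import org.apache.spark.sql.functions.col

spark.conf.set("spark.sql.ansi.enabled", "true")
val df = Seq(123100000000L).toDF("user")        // value larger than Int.MaxValue

// Throws org.apache.spark.SparkArithmeticException: casting ... to int causes overflow.
// df.select(col("user").cast("int")).collect()

// Disabling ANSI mode (or using try_cast in SQL) avoids the error, as the message suggests.
spark.conf.set("spark.sql.ansi.enabled", "false")
df.select(col("user").cast("int")).show()       // silently truncates instead of failing
{code}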
[jira] [Commented] (SPARK-38778) Replace http with https for project url in pom
[ https://issues.apache.org/jira/browse/SPARK-38778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516537#comment-17516537 ] Apache Spark commented on SPARK-38778: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/36053 > Replace http with https for project url in pom > -- > > Key: SPARK-38778 > URL: https://issues.apache.org/jira/browse/SPARK-38778 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.3.0, 3.4.0 >Reporter: Kent Yao >Priority: Major > > Replace http with https for project url in pom -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38778) Replace http with https for project url in pom
[ https://issues.apache.org/jira/browse/SPARK-38778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38778: Assignee: (was: Apache Spark) > Replace http with https for project url in pom > -- > > Key: SPARK-38778 > URL: https://issues.apache.org/jira/browse/SPARK-38778 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.3.0, 3.4.0 >Reporter: Kent Yao >Priority: Major > > Replace http with https for project url in pom -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38778) Replace http with https for project url in pom
[ https://issues.apache.org/jira/browse/SPARK-38778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38778: Assignee: Apache Spark > Replace http with https for project url in pom > -- > > Key: SPARK-38778 > URL: https://issues.apache.org/jira/browse/SPARK-38778 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.3.0, 3.4.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Major > > Replace http with https for project url in pom -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38778) Replace http with https for project url in pom
Kent Yao created SPARK-38778: Summary: Replace http with https for project url in pom Key: SPARK-38778 URL: https://issues.apache.org/jira/browse/SPARK-38778 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.3.0, 3.4.0 Reporter: Kent Yao Replace http with https for project url in pom -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38777) Add `bin/spark-submit --kill / --status` support for yarn
[ https://issues.apache.org/jira/browse/SPARK-38777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516529#comment-17516529 ] Apache Spark commented on SPARK-38777: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/36052 > Add `bin/spark-submit --kill / --status` support for yarn > - > > Key: SPARK-38777 > URL: https://issues.apache.org/jira/browse/SPARK-38777 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 3.4.0 >Reporter: Kent Yao >Priority: Major > > Add this feature to the YARN resource manager, matching what is already > supported for standalone, Mesos, and Kubernetes. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38777) Add `bin/spark-submit --kill / --status` support for yarn
[ https://issues.apache.org/jira/browse/SPARK-38777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38777: Assignee: (was: Apache Spark) > Add `bin/spark-submit --kill / --status` support for yarn > - > > Key: SPARK-38777 > URL: https://issues.apache.org/jira/browse/SPARK-38777 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 3.4.0 >Reporter: Kent Yao >Priority: Major > > Add this feature to the YARN resource manager, matching what is already > supported for standalone, Mesos, and Kubernetes. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38777) Add `bin/spark-submit --kill / --status` support for yarn
[ https://issues.apache.org/jira/browse/SPARK-38777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516528#comment-17516528 ] Apache Spark commented on SPARK-38777: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/36052 > Add `bin/spark-submit --kill / --status` support for yarn > - > > Key: SPARK-38777 > URL: https://issues.apache.org/jira/browse/SPARK-38777 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 3.4.0 >Reporter: Kent Yao >Priority: Major > > Add this feature to the YARN resource manager, matching what is already > supported for standalone, Mesos, and Kubernetes. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38777) Add `bin/spark-submit --kill / --status` support for yarn
[ https://issues.apache.org/jira/browse/SPARK-38777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38777: Assignee: Apache Spark > Add `bin/spark-submit --kill / --status` support for yarn > - > > Key: SPARK-38777 > URL: https://issues.apache.org/jira/browse/SPARK-38777 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 3.4.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Major > > Add this feature to the YARN resource manager, matching what is already > supported for standalone, Mesos, and Kubernetes. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38777) Add `bin/spark-submit --kill / --status` support for yarn
Kent Yao created SPARK-38777: Summary: Add `bin/spark-submit --kill / --status` support for yarn Key: SPARK-38777 URL: https://issues.apache.org/jira/browse/SPARK-38777 Project: Spark Issue Type: Improvement Components: YARN Affects Versions: 3.4.0 Reporter: Kent Yao Add this feature to the YARN resource manager, matching what is already supported for standalone, Mesos, and Kubernetes. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28090) Spark hangs when an execution plan has many projections on nested structs
[ https://issues.apache.org/jira/browse/SPARK-28090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-28090. --- Fix Version/s: 3.4.0 Assignee: Peter Toth Resolution: Fixed This is resolved via https://github.com/apache/spark/pull/35382 > Spark hangs when an execution plan has many projections on nested structs > - > > Key: SPARK-28090 > URL: https://issues.apache.org/jira/browse/SPARK-28090 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 > Environment: Tried in > * Spark 2.2.1, Spark 2.4.3 in local mode on Linux, MasOS and Windows > * Spark 2.4.3 / Yarn on a Linux cluster >Reporter: Ruslan Yushchenko >Assignee: Peter Toth >Priority: Major > Fix For: 3.4.0 > > > This was already posted (#28016), but the provided example didn't always > reproduce the error. This example consistently reproduces the issue. > Spark applications freeze on execution plan optimization stage (Catalyst) > when a logical execution plan contains a lot of projections that operate on > nested struct fields. > The code listed below demonstrates the issue. > To reproduce the Spark App does the following: > * A small dataframe is created from a JSON example. > * Several nested transformations (negation of a number) are applied on > struct fields and each time a new struct field is created. > * Once more than 9 such transformations are applied the Catalyst optimizer > freezes on optimizing the execution plan. > * You can control the freezing by choosing different upper bound for the > Range. E.g. it will work file if the upper bound is 5, but will hang is the > bound is 10. > {code:java} > package com.example > import org.apache.spark.sql._ > import org.apache.spark.sql.functions._ > import org.apache.spark.sql.types.{StructField, StructType} > import scala.collection.mutable.ListBuffer > object SparkApp1IssueSelfContained { > // A sample data for a dataframe with nested structs > val sample: List[String] = > """ { "numerics": {"num1": 101, "num2": 102, "num3": 103, "num4": 104, > "num5": 105, "num6": 106, "num7": 107, "num8": 108, "num9": 109, "num10": > 110, "num11": 111, "num12": 112, "num13": 113, "num14": 114, "num15": 115} } > """ :: > """ { "numerics": {"num1": 201, "num2": 202, "num3": 203, "num4": 204, > "num5": 205, "num6": 206, "num7": 207, "num8": 208, "num9": 209, "num10": > 210, "num11": 211, "num12": 212, "num13": 213, "num14": 214, "num15": 215} } > """ :: > """ { "numerics": {"num1": 301, "num2": 302, "num3": 303, "num4": 304, > "num5": 305, "num6": 306, "num7": 307, "num8": 308, "num9": 309, "num10": > 310, "num11": 311, "num12": 312, "num13": 313, "num14": 314, "num15": 315} } > """ :: > Nil > /** > * Transforms a column inside a nested struct. The transformed value will > be put into a new field of that nested struct > * > * The output column name can omit the full path as the field will be > created at the same level of nesting as the input column. > * > * @param inputColumnName A column name for which to apply the > transformation, e.g. `company.employee.firstName`. > * @param outputColumnName The output column name. The path is optional, > e.g. you can use `transformedName` instead of > `company.employee.transformedName`. > * @param expression A function that applies a transformation to a > column as a Spark expression. > * @return A dataframe with a new field that contains transformed values. 
> */ > def transformInsideNestedStruct(df: DataFrame, > inputColumnName: String, > outputColumnName: String, > expression: Column => Column): DataFrame = { > def mapStruct(schema: StructType, path: Seq[String], parentColumn: > Option[Column] = None): Seq[Column] = { > val mappedFields = new ListBuffer[Column]() > def handleMatchedLeaf(field: StructField, curColumn: Column): > Seq[Column] = { > val newColumn = expression(curColumn).as(outputColumnName) > mappedFields += newColumn > Seq(curColumn) > } > def handleMatchedNonLeaf(field: StructField, curColumn: Column): > Seq[Column] = { > // Non-leaf columns need to be further processed recursively > field.dataType match { > case dt: StructType => Seq(struct(mapStruct(dt, path.tail, > Some(curColumn)): _*).as(field.name)) > case _ => throw new IllegalArgumentException(s"Field > '${field.name}' is not a struct type.") > } > } > val fieldName = path.head > val isLeaf = path.lengthCompare(2) < 0 > val newColumns = schema.fields.flatMap(fie
[jira] [Updated] (SPARK-28090) Spark hangs when an execution plan has many projections on nested structs
[ https://issues.apache.org/jira/browse/SPARK-28090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28090: -- Labels: (was: bulk-closed) > Spark hangs when an execution plan has many projections on nested structs > - > > Key: SPARK-28090 > URL: https://issues.apache.org/jira/browse/SPARK-28090 > Project: Spark > Issue Type: Bug > Components: Optimizer, SQL >Affects Versions: 2.4.3 > Environment: Tried in > * Spark 2.2.1, Spark 2.4.3 in local mode on Linux, MasOS and Windows > * Spark 2.4.3 / Yarn on a Linux cluster >Reporter: Ruslan Yushchenko >Priority: Major > > This was already posted (#28016), but the provided example didn't always > reproduce the error. This example consistently reproduces the issue. > Spark applications freeze on execution plan optimization stage (Catalyst) > when a logical execution plan contains a lot of projections that operate on > nested struct fields. > The code listed below demonstrates the issue. > To reproduce the Spark App does the following: > * A small dataframe is created from a JSON example. > * Several nested transformations (negation of a number) are applied on > struct fields and each time a new struct field is created. > * Once more than 9 such transformations are applied the Catalyst optimizer > freezes on optimizing the execution plan. > * You can control the freezing by choosing different upper bound for the > Range. E.g. it will work file if the upper bound is 5, but will hang is the > bound is 10. > {code:java} > package com.example > import org.apache.spark.sql._ > import org.apache.spark.sql.functions._ > import org.apache.spark.sql.types.{StructField, StructType} > import scala.collection.mutable.ListBuffer > object SparkApp1IssueSelfContained { > // A sample data for a dataframe with nested structs > val sample: List[String] = > """ { "numerics": {"num1": 101, "num2": 102, "num3": 103, "num4": 104, > "num5": 105, "num6": 106, "num7": 107, "num8": 108, "num9": 109, "num10": > 110, "num11": 111, "num12": 112, "num13": 113, "num14": 114, "num15": 115} } > """ :: > """ { "numerics": {"num1": 201, "num2": 202, "num3": 203, "num4": 204, > "num5": 205, "num6": 206, "num7": 207, "num8": 208, "num9": 209, "num10": > 210, "num11": 211, "num12": 212, "num13": 213, "num14": 214, "num15": 215} } > """ :: > """ { "numerics": {"num1": 301, "num2": 302, "num3": 303, "num4": 304, > "num5": 305, "num6": 306, "num7": 307, "num8": 308, "num9": 309, "num10": > 310, "num11": 311, "num12": 312, "num13": 313, "num14": 314, "num15": 315} } > """ :: > Nil > /** > * Transforms a column inside a nested struct. The transformed value will > be put into a new field of that nested struct > * > * The output column name can omit the full path as the field will be > created at the same level of nesting as the input column. > * > * @param inputColumnName A column name for which to apply the > transformation, e.g. `company.employee.firstName`. > * @param outputColumnName The output column name. The path is optional, > e.g. you can use `transformedName` instead of > `company.employee.transformedName`. > * @param expression A function that applies a transformation to a > column as a Spark expression. > * @return A dataframe with a new field that contains transformed values. 
> */ > def transformInsideNestedStruct(df: DataFrame, > inputColumnName: String, > outputColumnName: String, > expression: Column => Column): DataFrame = { > def mapStruct(schema: StructType, path: Seq[String], parentColumn: > Option[Column] = None): Seq[Column] = { > val mappedFields = new ListBuffer[Column]() > def handleMatchedLeaf(field: StructField, curColumn: Column): > Seq[Column] = { > val newColumn = expression(curColumn).as(outputColumnName) > mappedFields += newColumn > Seq(curColumn) > } > def handleMatchedNonLeaf(field: StructField, curColumn: Column): > Seq[Column] = { > // Non-leaf columns need to be further processed recursively > field.dataType match { > case dt: StructType => Seq(struct(mapStruct(dt, path.tail, > Some(curColumn)): _*).as(field.name)) > case _ => throw new IllegalArgumentException(s"Field > '${field.name}' is not a struct type.") > } > } > val fieldName = path.head > val isLeaf = path.lengthCompare(2) < 0 > val newColumns = schema.fields.flatMap(field => { > // This is the original column (struct field) we want to process > val curColumn = parentColumn match { > case None => new Column(fi
[jira] [Reopened] (SPARK-28090) Spark hangs when an execution plan has many projections on nested structs
[ https://issues.apache.org/jira/browse/SPARK-28090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reopened SPARK-28090: --- > Spark hangs when an execution plan has many projections on nested structs > - > > Key: SPARK-28090 > URL: https://issues.apache.org/jira/browse/SPARK-28090 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 > Environment: Tried in > * Spark 2.2.1, Spark 2.4.3 in local mode on Linux, MasOS and Windows > * Spark 2.4.3 / Yarn on a Linux cluster >Reporter: Ruslan Yushchenko >Priority: Major > > This was already posted (#28016), but the provided example didn't always > reproduce the error. This example consistently reproduces the issue. > Spark applications freeze on execution plan optimization stage (Catalyst) > when a logical execution plan contains a lot of projections that operate on > nested struct fields. > The code listed below demonstrates the issue. > To reproduce the Spark App does the following: > * A small dataframe is created from a JSON example. > * Several nested transformations (negation of a number) are applied on > struct fields and each time a new struct field is created. > * Once more than 9 such transformations are applied the Catalyst optimizer > freezes on optimizing the execution plan. > * You can control the freezing by choosing different upper bound for the > Range. E.g. it will work file if the upper bound is 5, but will hang is the > bound is 10. > {code:java} > package com.example > import org.apache.spark.sql._ > import org.apache.spark.sql.functions._ > import org.apache.spark.sql.types.{StructField, StructType} > import scala.collection.mutable.ListBuffer > object SparkApp1IssueSelfContained { > // A sample data for a dataframe with nested structs > val sample: List[String] = > """ { "numerics": {"num1": 101, "num2": 102, "num3": 103, "num4": 104, > "num5": 105, "num6": 106, "num7": 107, "num8": 108, "num9": 109, "num10": > 110, "num11": 111, "num12": 112, "num13": 113, "num14": 114, "num15": 115} } > """ :: > """ { "numerics": {"num1": 201, "num2": 202, "num3": 203, "num4": 204, > "num5": 205, "num6": 206, "num7": 207, "num8": 208, "num9": 209, "num10": > 210, "num11": 211, "num12": 212, "num13": 213, "num14": 214, "num15": 215} } > """ :: > """ { "numerics": {"num1": 301, "num2": 302, "num3": 303, "num4": 304, > "num5": 305, "num6": 306, "num7": 307, "num8": 308, "num9": 309, "num10": > 310, "num11": 311, "num12": 312, "num13": 313, "num14": 314, "num15": 315} } > """ :: > Nil > /** > * Transforms a column inside a nested struct. The transformed value will > be put into a new field of that nested struct > * > * The output column name can omit the full path as the field will be > created at the same level of nesting as the input column. > * > * @param inputColumnName A column name for which to apply the > transformation, e.g. `company.employee.firstName`. > * @param outputColumnName The output column name. The path is optional, > e.g. you can use `transformedName` instead of > `company.employee.transformedName`. > * @param expression A function that applies a transformation to a > column as a Spark expression. > * @return A dataframe with a new field that contains transformed values. 
> */ > def transformInsideNestedStruct(df: DataFrame, > inputColumnName: String, > outputColumnName: String, > expression: Column => Column): DataFrame = { > def mapStruct(schema: StructType, path: Seq[String], parentColumn: > Option[Column] = None): Seq[Column] = { > val mappedFields = new ListBuffer[Column]() > def handleMatchedLeaf(field: StructField, curColumn: Column): > Seq[Column] = { > val newColumn = expression(curColumn).as(outputColumnName) > mappedFields += newColumn > Seq(curColumn) > } > def handleMatchedNonLeaf(field: StructField, curColumn: Column): > Seq[Column] = { > // Non-leaf columns need to be further processed recursively > field.dataType match { > case dt: StructType => Seq(struct(mapStruct(dt, path.tail, > Some(curColumn)): _*).as(field.name)) > case _ => throw new IllegalArgumentException(s"Field > '${field.name}' is not a struct type.") > } > } > val fieldName = path.head > val isLeaf = path.lengthCompare(2) < 0 > val newColumns = schema.fields.flatMap(field => { > // This is the original column (struct field) we want to process > val curColumn = parentColumn match { > case None => new Column(field.name) > case Some(col) => co
[jira] [Updated] (SPARK-28090) Spark hangs when an execution plan has many projections on nested structs
[ https://issues.apache.org/jira/browse/SPARK-28090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28090: -- Component/s: (was: Optimizer) > Spark hangs when an execution plan has many projections on nested structs > - > > Key: SPARK-28090 > URL: https://issues.apache.org/jira/browse/SPARK-28090 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 > Environment: Tried in > * Spark 2.2.1, Spark 2.4.3 in local mode on Linux, MasOS and Windows > * Spark 2.4.3 / Yarn on a Linux cluster >Reporter: Ruslan Yushchenko >Priority: Major > > This was already posted (#28016), but the provided example didn't always > reproduce the error. This example consistently reproduces the issue. > Spark applications freeze on execution plan optimization stage (Catalyst) > when a logical execution plan contains a lot of projections that operate on > nested struct fields. > The code listed below demonstrates the issue. > To reproduce the Spark App does the following: > * A small dataframe is created from a JSON example. > * Several nested transformations (negation of a number) are applied on > struct fields and each time a new struct field is created. > * Once more than 9 such transformations are applied the Catalyst optimizer > freezes on optimizing the execution plan. > * You can control the freezing by choosing different upper bound for the > Range. E.g. it will work file if the upper bound is 5, but will hang is the > bound is 10. > {code:java} > package com.example > import org.apache.spark.sql._ > import org.apache.spark.sql.functions._ > import org.apache.spark.sql.types.{StructField, StructType} > import scala.collection.mutable.ListBuffer > object SparkApp1IssueSelfContained { > // A sample data for a dataframe with nested structs > val sample: List[String] = > """ { "numerics": {"num1": 101, "num2": 102, "num3": 103, "num4": 104, > "num5": 105, "num6": 106, "num7": 107, "num8": 108, "num9": 109, "num10": > 110, "num11": 111, "num12": 112, "num13": 113, "num14": 114, "num15": 115} } > """ :: > """ { "numerics": {"num1": 201, "num2": 202, "num3": 203, "num4": 204, > "num5": 205, "num6": 206, "num7": 207, "num8": 208, "num9": 209, "num10": > 210, "num11": 211, "num12": 212, "num13": 213, "num14": 214, "num15": 215} } > """ :: > """ { "numerics": {"num1": 301, "num2": 302, "num3": 303, "num4": 304, > "num5": 305, "num6": 306, "num7": 307, "num8": 308, "num9": 309, "num10": > 310, "num11": 311, "num12": 312, "num13": 313, "num14": 314, "num15": 315} } > """ :: > Nil > /** > * Transforms a column inside a nested struct. The transformed value will > be put into a new field of that nested struct > * > * The output column name can omit the full path as the field will be > created at the same level of nesting as the input column. > * > * @param inputColumnName A column name for which to apply the > transformation, e.g. `company.employee.firstName`. > * @param outputColumnName The output column name. The path is optional, > e.g. you can use `transformedName` instead of > `company.employee.transformedName`. > * @param expression A function that applies a transformation to a > column as a Spark expression. > * @return A dataframe with a new field that contains transformed values. 
> */ > def transformInsideNestedStruct(df: DataFrame, > inputColumnName: String, > outputColumnName: String, > expression: Column => Column): DataFrame = { > def mapStruct(schema: StructType, path: Seq[String], parentColumn: > Option[Column] = None): Seq[Column] = { > val mappedFields = new ListBuffer[Column]() > def handleMatchedLeaf(field: StructField, curColumn: Column): > Seq[Column] = { > val newColumn = expression(curColumn).as(outputColumnName) > mappedFields += newColumn > Seq(curColumn) > } > def handleMatchedNonLeaf(field: StructField, curColumn: Column): > Seq[Column] = { > // Non-leaf columns need to be further processed recursively > field.dataType match { > case dt: StructType => Seq(struct(mapStruct(dt, path.tail, > Some(curColumn)): _*).as(field.name)) > case _ => throw new IllegalArgumentException(s"Field > '${field.name}' is not a struct type.") > } > } > val fieldName = path.head > val isLeaf = path.lengthCompare(2) < 0 > val newColumns = schema.fields.flatMap(field => { > // This is the original column (struct field) we want to process > val curColumn = parentColumn match { > case None => new Column(field.na
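The reproducer quoted above is long and truncated by the mail archive. A much-condensed, hypothetical sketch of the same pattern, repeatedly projecting new fields into a nested struct, is shown below; it assumes a local SparkSession named spark and is not the reporter's exact code.

{code:scala}
// Condensed, hypothetical sketch of the reported pattern: each iteration rebuilds the
// nested struct with one extra derived field, which inflates the projections Catalyst
// has to optimize. The report says roughly 10 steps were enough to hang older versions.
import spark.implicits._
import org.apache.spark.sql.functions._

val jsonData = Seq("""{"numerics": {"num1": 101, "num2": 102}}""").toDS()
var df = spark.read.json(jsonData)

for (i <- 1 to 10) {
  df = df.withColumn("numerics",
    struct(col("numerics.*"), (-col("numerics.num1")).as(s"negated$i")))
}

df.explain(true)   // plan optimization is where the reported hang occurred
{code}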
[jira] [Assigned] (SPARK-38776) Flaky test: ALSSuite.'ALS validate input dataset'
[ https://issues.apache.org/jira/browse/SPARK-38776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-38776: - Assignee: Dongjoon Hyun > Flaky test: ALSSuite.'ALS validate input dataset' > - > > Key: SPARK-38776 > URL: https://issues.apache.org/jira/browse/SPARK-38776 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > > - https://github.com/apache/spark/runs/5803714260?check_suite_focus=true > {code} > [info] ALSSuite: > ... > [info] - ALS validate input dataset *** FAILED *** (2 seconds, 449 > milliseconds) > [info] Invalid Long: out of range "Job aborted due to stage failure: Task 0 > in stage 100.0 failed 1 times, most recent failure: Lost task 0.0 in stage > 100.0 (TID 348) (localhost executor driver): > org.apache.spark.SparkArithmeticException: Casting 12310 to int > causes overflow. To return NULL instead, use 'try_cast'. If necessary set > spark.sql.ansi.enabled to false to bypass this error. > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38776) Flaky test: ALSSuite.'ALS validate input dataset'
[ https://issues.apache.org/jira/browse/SPARK-38776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516463#comment-17516463 ] Apache Spark commented on SPARK-38776: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/36051 > Flaky test: ALSSuite.'ALS validate input dataset' > - > > Key: SPARK-38776 > URL: https://issues.apache.org/jira/browse/SPARK-38776 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > > - https://github.com/apache/spark/runs/5803714260?check_suite_focus=true > {code} > [info] ALSSuite: > ... > [info] - ALS validate input dataset *** FAILED *** (2 seconds, 449 > milliseconds) > [info] Invalid Long: out of range "Job aborted due to stage failure: Task 0 > in stage 100.0 failed 1 times, most recent failure: Lost task 0.0 in stage > 100.0 (TID 348) (localhost executor driver): > org.apache.spark.SparkArithmeticException: Casting 12310 to int > causes overflow. To return NULL instead, use 'try_cast'. If necessary set > spark.sql.ansi.enabled to false to bypass this error. > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38776) Flaky test: ALSSuite.'ALS validate input dataset'
[ https://issues.apache.org/jira/browse/SPARK-38776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38776: Assignee: (was: Apache Spark) > Flaky test: ALSSuite.'ALS validate input dataset' > - > > Key: SPARK-38776 > URL: https://issues.apache.org/jira/browse/SPARK-38776 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > > - https://github.com/apache/spark/runs/5803714260?check_suite_focus=true > {code} > [info] ALSSuite: > ... > [info] - ALS validate input dataset *** FAILED *** (2 seconds, 449 > milliseconds) > [info] Invalid Long: out of range "Job aborted due to stage failure: Task 0 > in stage 100.0 failed 1 times, most recent failure: Lost task 0.0 in stage > 100.0 (TID 348) (localhost executor driver): > org.apache.spark.SparkArithmeticException: Casting 12310 to int > causes overflow. To return NULL instead, use 'try_cast'. If necessary set > spark.sql.ansi.enabled to false to bypass this error. > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38776) Flaky test: ALSSuite.'ALS validate input dataset'
[ https://issues.apache.org/jira/browse/SPARK-38776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38776: Assignee: Apache Spark > Flaky test: ALSSuite.'ALS validate input dataset' > - > > Key: SPARK-38776 > URL: https://issues.apache.org/jira/browse/SPARK-38776 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > > - https://github.com/apache/spark/runs/5803714260?check_suite_focus=true > {code} > [info] ALSSuite: > ... > [info] - ALS validate input dataset *** FAILED *** (2 seconds, 449 > milliseconds) > [info] Invalid Long: out of range "Job aborted due to stage failure: Task 0 > in stage 100.0 failed 1 times, most recent failure: Lost task 0.0 in stage > 100.0 (TID 348) (localhost executor driver): > org.apache.spark.SparkArithmeticException: Casting 12310 to int > causes overflow. To return NULL instead, use 'try_cast'. If necessary set > spark.sql.ansi.enabled to false to bypass this error. > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38776) Flaky test: ALSSuite.'ALS validate input dataset'
[ https://issues.apache.org/jira/browse/SPARK-38776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516462#comment-17516462 ] Apache Spark commented on SPARK-38776: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/36051 > Flaky test: ALSSuite.'ALS validate input dataset' > - > > Key: SPARK-38776 > URL: https://issues.apache.org/jira/browse/SPARK-38776 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > > - https://github.com/apache/spark/runs/5803714260?check_suite_focus=true > {code} > [info] ALSSuite: > ... > [info] - ALS validate input dataset *** FAILED *** (2 seconds, 449 > milliseconds) > [info] Invalid Long: out of range "Job aborted due to stage failure: Task 0 > in stage 100.0 failed 1 times, most recent failure: Lost task 0.0 in stage > 100.0 (TID 348) (localhost executor driver): > org.apache.spark.SparkArithmeticException: Casting 12310 to int > causes overflow. To return NULL instead, use 'try_cast'. If necessary set > spark.sql.ansi.enabled to false to bypass this error. > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org