[jira] [Assigned] (SPARK-38780) PySpark docs build should fail when there is warning.
[ https://issues.apache.org/jira/browse/SPARK-38780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-38780: Assignee: Haejoon Lee > PySpark docs build should fail when there is warning. > - > > Key: SPARK-38780 > URL: https://issues.apache.org/jira/browse/SPARK-38780 > Project: Spark > Issue Type: Test > Components: Documentation, PySpark, Tests >Affects Versions: 3.3.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > The PySpark docs build with the `make clean html` command currently passes even > when Sphinx detects warnings. > The build should fail when the docs violate Sphinx rules, so that rendering > issues are caught. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38780) PySpark docs build should fail when there is warning.
[ https://issues.apache.org/jira/browse/SPARK-38780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38780. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 36058 [https://github.com/apache/spark/pull/36058] > PySpark docs build should fail when there is warning. > - > > Key: SPARK-38780 > URL: https://issues.apache.org/jira/browse/SPARK-38780 > Project: Spark > Issue Type: Test > Components: Documentation, PySpark, Tests >Affects Versions: 3.3.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.3.0 > > > The PySpark docs build with the `make clean html` command currently passes even > when Sphinx detects warnings. > The build should fail when the docs violate Sphinx rules, so that rendering > issues are caught. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38780) PySpark docs build should fail when there is warning.
[ https://issues.apache.org/jira/browse/SPARK-38780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516615#comment-17516615 ] Apache Spark commented on SPARK-38780: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/36058 > PySpark docs build should fail when there is warning. > - > > Key: SPARK-38780 > URL: https://issues.apache.org/jira/browse/SPARK-38780 > Project: Spark > Issue Type: Test > Components: Documentation, PySpark, Tests >Affects Versions: 3.3.0 >Reporter: Haejoon Lee >Priority: Major > > The PySpark docs build with the `make clean html` command currently passes even > when Sphinx detects warnings. > The build should fail when the docs violate Sphinx rules, so that rendering > issues are caught. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38780) PySpark docs build should fail when there is warning.
[ https://issues.apache.org/jira/browse/SPARK-38780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516614#comment-17516614 ] Apache Spark commented on SPARK-38780: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/36058 > PySpark docs build should fail when there is warning. > - > > Key: SPARK-38780 > URL: https://issues.apache.org/jira/browse/SPARK-38780 > Project: Spark > Issue Type: Test > Components: Documentation, PySpark, Tests >Affects Versions: 3.3.0 >Reporter: Haejoon Lee >Priority: Major > > The PySpark docs build with the `make clean html` command currently passes even > when Sphinx detects warnings. > The build should fail when the docs violate Sphinx rules, so that rendering > issues are caught. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38780) PySpark docs build should fail when there is warning.
[ https://issues.apache.org/jira/browse/SPARK-38780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38780: Assignee: Apache Spark > PySpark docs build should fail when there is warning. > - > > Key: SPARK-38780 > URL: https://issues.apache.org/jira/browse/SPARK-38780 > Project: Spark > Issue Type: Test > Components: Documentation, PySpark, Tests >Affects Versions: 3.3.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > > The PySpark docs build with the `make clean html` command currently passes even > when Sphinx detects warnings. > The build should fail when the docs violate Sphinx rules, so that rendering > issues are caught. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38780) PySpark docs build should fail when there is warning.
[ https://issues.apache.org/jira/browse/SPARK-38780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38780: Assignee: (was: Apache Spark) > PySpark docs build should fail when there is warning. > - > > Key: SPARK-38780 > URL: https://issues.apache.org/jira/browse/SPARK-38780 > Project: Spark > Issue Type: Test > Components: Documentation, PySpark, Tests >Affects Versions: 3.3.0 >Reporter: Haejoon Lee >Priority: Major > > The PySpark docs build with the `make clean html` command currently passes even > when Sphinx detects warnings. > The build should fail when the docs violate Sphinx rules, so that rendering > issues are caught. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38780) PySpark docs build should fail when there is warning.
Haejoon Lee created SPARK-38780: --- Summary: PySpark docs build should fail when there is warning. Key: SPARK-38780 URL: https://issues.apache.org/jira/browse/SPARK-38780 Project: Spark Issue Type: Test Components: Documentation, PySpark, Tests Affects Versions: 3.3.0 Reporter: Haejoon Lee The PySpark docs build with the `make clean html` command currently passes even when Sphinx detects warnings. The build should fail when the docs violate Sphinx rules, so that rendering issues are caught. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36571) Optimized FileOutputCommitter with StagingDir
[ https://issues.apache.org/jira/browse/SPARK-36571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516609#comment-17516609 ] Apache Spark commented on SPARK-36571: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/36056 > Optimized FileOutputCommitter with StagingDir > - > > Key: SPARK-36571 > URL: https://issues.apache.org/jira/browse/SPARK-36571 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38779) Unify the pushed operator checking between FileSource test suite and JDBC test suite
[ https://issues.apache.org/jira/browse/SPARK-38779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-38779. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36050 [https://github.com/apache/spark/pull/36050] > Unify the pushed operator checking between FileSource test suite and JDBC > test suite > > > Key: SPARK-38779 > URL: https://issues.apache.org/jira/browse/SPARK-38779 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0, 3.4.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Minor > Fix For: 3.4.0 > > > In JDBCV2Suite, we use checkPushedInfo to check the pushed down operators. > Will do the same for FileSourceAggregatePushDownSuite > {code:java} > private def checkPushedInfo(df: DataFrame, expectedPlanFragment: String): > Unit = { > df.queryExecution.optimizedPlan.collect { > case _: DataSourceV2ScanRelation => > checkKeywordsExistsInExplain(df, expectedPlanFragment) > } > } > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38779) Unify the pushed operator checking between FileSource test suite and JDBC test suite
[ https://issues.apache.org/jira/browse/SPARK-38779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-38779: - Assignee: Huaxin Gao > Unify the pushed operator checking between FileSource test suite and JDBC > test suite > > > Key: SPARK-38779 > URL: https://issues.apache.org/jira/browse/SPARK-38779 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0, 3.4.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Minor > > In JDBCV2Suite, we use checkPushedInfo to check the pushed down operators. > Will do the same for FileSourceAggregatePushDownSuite > {code:java} > private def checkPushedInfo(df: DataFrame, expectedPlanFragment: String): > Unit = { > df.queryExecution.optimizedPlan.collect { > case _: DataSourceV2ScanRelation => > checkKeywordsExistsInExplain(df, expectedPlanFragment) > } > } > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34863) Support nested column in Spark Parquet vectorized readers
[ https://issues.apache.org/jira/browse/SPARK-34863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516602#comment-17516602 ] Apache Spark commented on SPARK-34863: -- User 'sunchao' has created a pull request for this issue: https://github.com/apache/spark/pull/36055 > Support nested column in Spark Parquet vectorized readers > - > > Key: SPARK-34863 > URL: https://issues.apache.org/jira/browse/SPARK-34863 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Cheng Su >Assignee: Apache Spark >Priority: Minor > Fix For: 3.3.0 > > > The task is to support nested column type in Spark Parquet vectorized reader. > Currently Parquet vectorized reader does not support nested column type > (struct, array and map). We implemented nested column vectorized reader for > FB-ORC in our internal fork of Spark. We are seeing performance improvement > compared to non-vectorized reader when reading nested columns. In addition, > this can also help improve the non-nested column performance when reading > non-nested and nested columns together in one query. > > Parquet: > [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L173] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
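For readers who want to try the nested-column vectorized path described above, here is a minimal, hedged Scala sketch. It assumes Spark 3.3+, a local SparkSession, and the configuration key spark.sql.parquet.enableNestedColumnVectorizedReader; that key name is an assumption based on this work and is not quoted from the ticket.

{code:scala}
// Hedged sketch: write a small Parquet file with a nested struct column and read a
// nested field back with the vectorized reader enabled. The nested-column config key
// below is assumed from the SPARK-34863 work and may differ in your Spark version.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("nested-parquet-demo").getOrCreate()
import spark.implicits._

spark.conf.set("spark.sql.parquet.enableVectorizedReader", "true")
spark.conf.set("spark.sql.parquet.enableNestedColumnVectorizedReader", "true") // assumed key

// A two-row DataFrame whose "nested" column is a struct with fields _1 and _2.
Seq((1, ("a", 10)), (2, ("b", 20))).toDF("id", "nested")
  .write.mode("overwrite").parquet("/tmp/nested_parquet_demo")

spark.read.parquet("/tmp/nested_parquet_demo").select($"id", $"nested._1").show()
{code}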
[jira] [Commented] (SPARK-38779) Unify the pushed operator checking between FileSource test suite and JDBC test suite
[ https://issues.apache.org/jira/browse/SPARK-38779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516596#comment-17516596 ] Apache Spark commented on SPARK-38779: -- User 'huaxingao' has created a pull request for this issue: https://github.com/apache/spark/pull/36050 > Unify the pushed operator checking between FileSource test suite and JDBC > test suite > > > Key: SPARK-38779 > URL: https://issues.apache.org/jira/browse/SPARK-38779 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0, 3.4.0 >Reporter: Huaxin Gao >Priority: Minor > > In JDBCV2Suite, we use checkPushedInfo to check the pushed down operators. > Will do the same for FileSourceAggregatePushDownSuite > {code:java} > private def checkPushedInfo(df: DataFrame, expectedPlanFragment: String): > Unit = { > df.queryExecution.optimizedPlan.collect { > case _: DataSourceV2ScanRelation => > checkKeywordsExistsInExplain(df, expectedPlanFragment) > } > } > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38779) Unify the pushed operator checking between FileSource test suite and JDBC test suite
[ https://issues.apache.org/jira/browse/SPARK-38779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38779: Assignee: Apache Spark > Unify the pushed operator checking between FileSource test suite and JDBC > test suite > > > Key: SPARK-38779 > URL: https://issues.apache.org/jira/browse/SPARK-38779 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0, 3.4.0 >Reporter: Huaxin Gao >Assignee: Apache Spark >Priority: Minor > > In JDBCV2Suite, we use checkPushedInfo to check the pushed down operators. > Will do the same for FileSourceAggregatePushDownSuite > {code:java} > private def checkPushedInfo(df: DataFrame, expectedPlanFragment: String): > Unit = { > df.queryExecution.optimizedPlan.collect { > case _: DataSourceV2ScanRelation => > checkKeywordsExistsInExplain(df, expectedPlanFragment) > } > } > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38779) Unify the pushed operator checking between FileSource test suite and JDBC test suite
[ https://issues.apache.org/jira/browse/SPARK-38779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38779: Assignee: (was: Apache Spark) > Unify the pushed operator checking between FileSource test suite and JDBC > test suite > > > Key: SPARK-38779 > URL: https://issues.apache.org/jira/browse/SPARK-38779 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0, 3.4.0 >Reporter: Huaxin Gao >Priority: Minor > > In JDBCV2Suite, we use checkPushedInfo to check the pushed down operators. > Will do the same for FileSourceAggregatePushDownSuite > {code:java} > private def checkPushedInfo(df: DataFrame, expectedPlanFragment: String): > Unit = { > df.queryExecution.optimizedPlan.collect { > case _: DataSourceV2ScanRelation => > checkKeywordsExistsInExplain(df, expectedPlanFragment) > } > } > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38779) Unify the pushed operator checking between FileSource test suite and JDBC test suite
Huaxin Gao created SPARK-38779: -- Summary: Unify the pushed operator checking between FileSource test suite and JDBC test suite Key: SPARK-38779 URL: https://issues.apache.org/jira/browse/SPARK-38779 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0, 3.4.0 Reporter: Huaxin Gao In JDBCV2Suite, we use checkPushedInfo to check the pushed down operators. Will do the same for FileSourceAggregatePushDownSuite {code:java} private def checkPushedInfo(df: DataFrame, expectedPlanFragment: String): Unit = { df.queryExecution.optimizedPlan.collect { case _: DataSourceV2ScanRelation => checkKeywordsExistsInExplain(df, expectedPlanFragment) } } {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
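To make the helper's intent concrete, here is a minimal, hypothetical sketch of the kind of assertion the unified suites would make. The catalog, table, and expected plan fragment are illustrative only; the sketch assumes a SparkSession named spark with a JDBC V2 catalog registered as h2.

{code:scala}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.ExplainMode

// Stand-in for checkPushedInfo without the QueryTest dependency: render the plan text
// and assert that the expected pushdown marker appears in it.
def assertPushedInfo(df: DataFrame, expectedPlanFragment: String): Unit = {
  val planText = df.queryExecution.explainString(ExplainMode.fromString("extended"))
  assert(planText.contains(expectedPlanFragment),
    s"expected plan fragment not found: $expectedPlanFragment")
}

// Illustrative query against an assumed JDBC V2 catalog named `h2`.
val df = spark.sql("SELECT dept, MAX(salary) FROM h2.test.employee GROUP BY dept")
assertPushedInfo(df, "PushedAggregates: [MAX(SALARY)]")
{code}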
[jira] [Commented] (SPARK-38776) Flaky test: ALSSuite.'ALS validate input dataset'
[ https://issues.apache.org/jira/browse/SPARK-38776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516582#comment-17516582 ] Apache Spark commented on SPARK-38776: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/36054 > Flaky test: ALSSuite.'ALS validate input dataset' > - > > Key: SPARK-38776 > URL: https://issues.apache.org/jira/browse/SPARK-38776 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > > - https://github.com/apache/spark/runs/5803714260?check_suite_focus=true > {code} > [info] ALSSuite: > ... > [info] - ALS validate input dataset *** FAILED *** (2 seconds, 449 > milliseconds) > [info] Invalid Long: out of range "Job aborted due to stage failure: Task 0 > in stage 100.0 failed 1 times, most recent failure: Lost task 0.0 in stage > 100.0 (TID 348) (localhost executor driver): > org.apache.spark.SparkArithmeticException: Casting 12310 to int > causes overflow. To return NULL instead, use 'try_cast'. If necessary set > spark.sql.ansi.enabled to false to bypass this error. > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38776) Flaky test: ALSSuite.'ALS validate input dataset'
[ https://issues.apache.org/jira/browse/SPARK-38776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516583#comment-17516583 ] Apache Spark commented on SPARK-38776: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/36054 > Flaky test: ALSSuite.'ALS validate input dataset' > - > > Key: SPARK-38776 > URL: https://issues.apache.org/jira/browse/SPARK-38776 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > > - https://github.com/apache/spark/runs/5803714260?check_suite_focus=true > {code} > [info] ALSSuite: > ... > [info] - ALS validate input dataset *** FAILED *** (2 seconds, 449 > milliseconds) > [info] Invalid Long: out of range "Job aborted due to stage failure: Task 0 > in stage 100.0 failed 1 times, most recent failure: Lost task 0.0 in stage > 100.0 (TID 348) (localhost executor driver): > org.apache.spark.SparkArithmeticException: Casting 12310 to int > causes overflow. To return NULL instead, use 'try_cast'. If necessary set > spark.sql.ansi.enabled to false to bypass this error. > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38446) Deadlock between ExecutorClassLoader and FileDownloadCallback caused by Log4j
[ https://issues.apache.org/jira/browse/SPARK-38446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-38446: - Assignee: Kent Yao > Deadlock between ExecutorClassLoader and FileDownloadCallback caused by Log4j > - > > Key: SPARK-38446 > URL: https://issues.apache.org/jira/browse/SPARK-38446 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.2, 3.2.1, 3.03 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > > {code:java} > files-client-8-1 > PRIORITY : 5 > THREAD ID : 0X7FBFFC5EE000 > NATIVE ID : 0X14903 > NATIVE ID (DECIMAL) : 84227 > STATE : BLOCKED > stackTrace: > java.lang.Thread.State: BLOCKED (on object monitor) > at java.lang.ClassLoader.loadClass(ClassLoader.java:398) > - waiting to lock <0x0003c0753f88> (a > org.apache.spark.repl.ExecutorClassLoader) > at java.lang.ClassLoader.loadClass(ClassLoader.java:351) > at org.apache.logging.log4j.util.LoaderUtil.loadClass(LoaderUtil.java:169) > at > org.apache.logging.log4j.core.impl.ThrowableProxyHelper.loadClass(ThrowableProxyHelper.java:214) > at > org.apache.logging.log4j.core.impl.ThrowableProxyHelper.toExtendedStackTrace(ThrowableProxyHelper.java:112) > at org.apache.logging.log4j.core.impl.ThrowableProxy.(ThrowableProxy.java:113) > at org.apache.logging.log4j.core.impl.ThrowableProxy.(ThrowableProxy.java:97) > at > org.apache.logging.log4j.core.impl.Log4jLogEvent.getThrownProxy(Log4jLogEvent.java:629) > at > org.apache.logging.log4j.core.pattern.ExtendedThrowablePatternConverter.format(ExtendedThrowablePatternConverter.java:63) > at > org.apache.logging.log4j.core.layout.PatternLayout$NoFormatPatternSerializer.toSerializable(PatternLayout.java:342) > at > org.apache.logging.log4j.core.layout.PatternLayout.toText(PatternLayout.java:240) > at > org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:225) > at > org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:59) > at > org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.directEncodeEvent(AbstractOutputStreamAppender.java:215) > at > org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutputStreamAppender.java:208) > at > org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:199) > at > org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:161) > at > org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:134) > at > org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:125) > at > org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:89) > at > org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:675) > at > org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:633) > at > org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:616) > at > org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:552) > at > org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:82) > at org.apache.logging.log4j.core.Logger.log(Logger.java:161) > at > org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2205) > at > org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2159) > at > 
org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2142) > at > org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2017) > at > org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1983) > at org.apache.logging.slf4j.Log4jLogger.debug(Log4jLogger.java:139) > at org.apache.spark.internal.Logging.logDebug(Logging.scala:82) > at org.apache.spark.internal.Logging.logDebug$(Logging.scala:81) > at org.apache.spark.rpc.netty.NettyRpcEnv.logDebug(NettyRpcEnv.scala:45) > at > org.apache.spark.rpc.netty.NettyRpcEnv$FileDownloadCallback.onFailure(NettyRpcEnv.scala:454) > at > org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:260) > {code} > while the class loading lock 0x0003c0753f88 is held by > ExecutorClassLoader, which is downloading remote classes/jars through it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38446) Deadlock between ExecutorClassLoader and FileDownloadCallback caused by Log4j
[ https://issues.apache.org/jira/browse/SPARK-38446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-38446. --- Fix Version/s: 3.3.0 3.2.2 3.1.3 Resolution: Fixed Issue resolved by pull request 35765 [https://github.com/apache/spark/pull/35765] > Deadlock between ExecutorClassLoader and FileDownloadCallback caused by Log4j > - > > Key: SPARK-38446 > URL: https://issues.apache.org/jira/browse/SPARK-38446 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.2, 3.2.1, 3.03 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.3.0, 3.2.2, 3.1.3 > > > {code:java} > files-client-8-1 > PRIORITY : 5 > THREAD ID : 0X7FBFFC5EE000 > NATIVE ID : 0X14903 > NATIVE ID (DECIMAL) : 84227 > STATE : BLOCKED > stackTrace: > java.lang.Thread.State: BLOCKED (on object monitor) > at java.lang.ClassLoader.loadClass(ClassLoader.java:398) > - waiting to lock <0x0003c0753f88> (a > org.apache.spark.repl.ExecutorClassLoader) > at java.lang.ClassLoader.loadClass(ClassLoader.java:351) > at org.apache.logging.log4j.util.LoaderUtil.loadClass(LoaderUtil.java:169) > at > org.apache.logging.log4j.core.impl.ThrowableProxyHelper.loadClass(ThrowableProxyHelper.java:214) > at > org.apache.logging.log4j.core.impl.ThrowableProxyHelper.toExtendedStackTrace(ThrowableProxyHelper.java:112) > at org.apache.logging.log4j.core.impl.ThrowableProxy.(ThrowableProxy.java:113) > at org.apache.logging.log4j.core.impl.ThrowableProxy.(ThrowableProxy.java:97) > at > org.apache.logging.log4j.core.impl.Log4jLogEvent.getThrownProxy(Log4jLogEvent.java:629) > at > org.apache.logging.log4j.core.pattern.ExtendedThrowablePatternConverter.format(ExtendedThrowablePatternConverter.java:63) > at > org.apache.logging.log4j.core.layout.PatternLayout$NoFormatPatternSerializer.toSerializable(PatternLayout.java:342) > at > org.apache.logging.log4j.core.layout.PatternLayout.toText(PatternLayout.java:240) > at > org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:225) > at > org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:59) > at > org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.directEncodeEvent(AbstractOutputStreamAppender.java:215) > at > org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutputStreamAppender.java:208) > at > org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:199) > at > org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:161) > at > org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:134) > at > org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:125) > at > org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:89) > at > org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:675) > at > org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:633) > at > org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:616) > at > org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:552) > at > org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:82) > at org.apache.logging.log4j.core.Logger.log(Logger.java:161) > at > 
org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2205) > at > org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2159) > at > org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2142) > at > org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2017) > at > org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1983) > at org.apache.logging.slf4j.Log4jLogger.debug(Log4jLogger.java:139) > at org.apache.spark.internal.Logging.logDebug(Logging.scala:82) > at org.apache.spark.internal.Logging.logDebug$(Logging.scala:81) > at org.apache.spark.rpc.netty.NettyRpcEnv.logDebug(NettyRpcEnv.scala:45) > at > org.apache.spark.rpc.netty.NettyRpcEnv$FileDownloadCallback.onFailure(NettyRpcEnv.scala:454) > at > org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:260) > {code} > while the class loading lock 0x0003c0753f88 is held by > ExecutorClassLoader, which is downloading remote classes/jars through it. -- This message was sent by Atlassian Jira (v8.20.1#820001) --
[jira] [Resolved] (SPARK-38776) Flaky test: ALSSuite.'ALS validate input dataset'
[ https://issues.apache.org/jira/browse/SPARK-38776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-38776. --- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 36051 [https://github.com/apache/spark/pull/36051] > Flaky test: ALSSuite.'ALS validate input dataset' > - > > Key: SPARK-38776 > URL: https://issues.apache.org/jira/browse/SPARK-38776 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > > - https://github.com/apache/spark/runs/5803714260?check_suite_focus=true > {code} > [info] ALSSuite: > ... > [info] - ALS validate input dataset *** FAILED *** (2 seconds, 449 > milliseconds) > [info] Invalid Long: out of range "Job aborted due to stage failure: Task 0 > in stage 100.0 failed 1 times, most recent failure: Lost task 0.0 in stage > 100.0 (TID 348) (localhost executor driver): > org.apache.spark.SparkArithmeticException: Casting 12310 to int > causes overflow. To return NULL instead, use 'try_cast'. If necessary set > spark.sql.ansi.enabled to false to bypass this error. > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
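The failure above is ANSI-mode cast overflow surfacing through the test data. The following hedged Scala sketch reproduces the same behavior outside the suite; it is not the fix applied in pull request 36051 and assumes a local SparkSession named spark (for example, the spark-shell).

{code:scala}
// Hedged illustration of the reported failure mode, not the actual fix:
// with ANSI mode on, casting a Long outside Int range throws SparkArithmeticException.
import spark.implicits._
import org.apache.spark.sql.functions.col

spark.conf.set("spark.sql.ansi.enabled", "true")
val df = Seq(123100000000L).toDF("user")        // value larger than Int.MaxValue

// Throws org.apache.spark.SparkArithmeticException: casting ... to int causes overflow.
// df.select(col("user").cast("int")).collect()

// Disabling ANSI mode (or using try_cast in SQL) avoids the error, as the message suggests.
spark.conf.set("spark.sql.ansi.enabled", "false")
df.select(col("user").cast("int")).show()       // silently truncates instead of failing
{code}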
[jira] [Commented] (SPARK-38778) Replace http with https for project url in pom
[ https://issues.apache.org/jira/browse/SPARK-38778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516537#comment-17516537 ] Apache Spark commented on SPARK-38778: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/36053 > Replace http with https for project url in pom > -- > > Key: SPARK-38778 > URL: https://issues.apache.org/jira/browse/SPARK-38778 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.3.0, 3.4.0 >Reporter: Kent Yao >Priority: Major > > Replace http with https for project url in pom -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38778) Replace http with https for project url in pom
[ https://issues.apache.org/jira/browse/SPARK-38778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38778: Assignee: (was: Apache Spark) > Replace http with https for project url in pom > -- > > Key: SPARK-38778 > URL: https://issues.apache.org/jira/browse/SPARK-38778 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.3.0, 3.4.0 >Reporter: Kent Yao >Priority: Major > > Replace http with https for project url in pom -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38778) Replace http with https for project url in pom
[ https://issues.apache.org/jira/browse/SPARK-38778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38778: Assignee: Apache Spark > Replace http with https for project url in pom > -- > > Key: SPARK-38778 > URL: https://issues.apache.org/jira/browse/SPARK-38778 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.3.0, 3.4.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Major > > Replace http with https for project url in pom -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38778) Replace http with https for project url in pom
Kent Yao created SPARK-38778: Summary: Replace http with https for project url in pom Key: SPARK-38778 URL: https://issues.apache.org/jira/browse/SPARK-38778 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.3.0, 3.4.0 Reporter: Kent Yao Replace http with https for project url in pom -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38777) Add `bin/spark-submit --kill / --status` support for yarn
[ https://issues.apache.org/jira/browse/SPARK-38777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516529#comment-17516529 ] Apache Spark commented on SPARK-38777: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/36052 > Add `bin/spark-submit --kill / --status` support for yarn > - > > Key: SPARK-38777 > URL: https://issues.apache.org/jira/browse/SPARK-38777 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 3.4.0 >Reporter: Kent Yao >Priority: Major > > Add this feature to the YARN resource manager, matching what is already > supported for standalone, Mesos, and Kubernetes. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38777) Add `bin/spark-submit --kill / --status` support for yarn
[ https://issues.apache.org/jira/browse/SPARK-38777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38777: Assignee: (was: Apache Spark) > Add `bin/spark-submit --kill / --status` support for yarn > - > > Key: SPARK-38777 > URL: https://issues.apache.org/jira/browse/SPARK-38777 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 3.4.0 >Reporter: Kent Yao >Priority: Major > > Add this feature to the YARN resource manager, matching what is already > supported for standalone, Mesos, and Kubernetes. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38777) Add `bin/spark-submit --kill / --status` support for yarn
[ https://issues.apache.org/jira/browse/SPARK-38777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516528#comment-17516528 ] Apache Spark commented on SPARK-38777: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/36052 > Add `bin/spark-submit --kill / --status` support for yarn > - > > Key: SPARK-38777 > URL: https://issues.apache.org/jira/browse/SPARK-38777 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 3.4.0 >Reporter: Kent Yao >Priority: Major > > Add this feature to the YARN resource manager, matching what is already > supported for standalone, Mesos, and Kubernetes. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38777) Add `bin/spark-submit --kill / --status` support for yarn
[ https://issues.apache.org/jira/browse/SPARK-38777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38777: Assignee: Apache Spark > Add `bin/spark-submit --kill / --status` support for yarn > - > > Key: SPARK-38777 > URL: https://issues.apache.org/jira/browse/SPARK-38777 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 3.4.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Major > > Add this feature to the YARN resource manager, matching what is already > supported for standalone, Mesos, and Kubernetes. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38777) Add `bin/spark-submit --kill / --status` support for yarn
Kent Yao created SPARK-38777: Summary: Add `bin/spark-submit --kill / --status` support for yarn Key: SPARK-38777 URL: https://issues.apache.org/jira/browse/SPARK-38777 Project: Spark Issue Type: Improvement Components: YARN Affects Versions: 3.4.0 Reporter: Kent Yao Add this feature to the YARN resource manager, matching what is already supported for standalone, Mesos, and Kubernetes. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28090) Spark hangs when an execution plan has many projections on nested structs
[ https://issues.apache.org/jira/browse/SPARK-28090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-28090. --- Fix Version/s: 3.4.0 Assignee: Peter Toth Resolution: Fixed This is resolved via https://github.com/apache/spark/pull/35382 > Spark hangs when an execution plan has many projections on nested structs > - > > Key: SPARK-28090 > URL: https://issues.apache.org/jira/browse/SPARK-28090 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 > Environment: Tried in > * Spark 2.2.1, Spark 2.4.3 in local mode on Linux, MasOS and Windows > * Spark 2.4.3 / Yarn on a Linux cluster >Reporter: Ruslan Yushchenko >Assignee: Peter Toth >Priority: Major > Fix For: 3.4.0 > > > This was already posted (#28016), but the provided example didn't always > reproduce the error. This example consistently reproduces the issue. > Spark applications freeze on execution plan optimization stage (Catalyst) > when a logical execution plan contains a lot of projections that operate on > nested struct fields. > The code listed below demonstrates the issue. > To reproduce the Spark App does the following: > * A small dataframe is created from a JSON example. > * Several nested transformations (negation of a number) are applied on > struct fields and each time a new struct field is created. > * Once more than 9 such transformations are applied the Catalyst optimizer > freezes on optimizing the execution plan. > * You can control the freezing by choosing different upper bound for the > Range. E.g. it will work file if the upper bound is 5, but will hang is the > bound is 10. > {code:java} > package com.example > import org.apache.spark.sql._ > import org.apache.spark.sql.functions._ > import org.apache.spark.sql.types.{StructField, StructType} > import scala.collection.mutable.ListBuffer > object SparkApp1IssueSelfContained { > // A sample data for a dataframe with nested structs > val sample: List[String] = > """ { "numerics": {"num1": 101, "num2": 102, "num3": 103, "num4": 104, > "num5": 105, "num6": 106, "num7": 107, "num8": 108, "num9": 109, "num10": > 110, "num11": 111, "num12": 112, "num13": 113, "num14": 114, "num15": 115} } > """ :: > """ { "numerics": {"num1": 201, "num2": 202, "num3": 203, "num4": 204, > "num5": 205, "num6": 206, "num7": 207, "num8": 208, "num9": 209, "num10": > 210, "num11": 211, "num12": 212, "num13": 213, "num14": 214, "num15": 215} } > """ :: > """ { "numerics": {"num1": 301, "num2": 302, "num3": 303, "num4": 304, > "num5": 305, "num6": 306, "num7": 307, "num8": 308, "num9": 309, "num10": > 310, "num11": 311, "num12": 312, "num13": 313, "num14": 314, "num15": 315} } > """ :: > Nil > /** > * Transforms a column inside a nested struct. The transformed value will > be put into a new field of that nested struct > * > * The output column name can omit the full path as the field will be > created at the same level of nesting as the input column. > * > * @param inputColumnName A column name for which to apply the > transformation, e.g. `company.employee.firstName`. > * @param outputColumnName The output column name. The path is optional, > e.g. you can use `transformedName` instead of > `company.employee.transformedName`. > * @param expression A function that applies a transformation to a > column as a Spark expression. > * @return A dataframe with a new field that contains transformed values. 
> */ > def transformInsideNestedStruct(df: DataFrame, > inputColumnName: String, > outputColumnName: String, > expression: Column => Column): DataFrame = { > def mapStruct(schema: StructType, path: Seq[String], parentColumn: > Option[Column] = None): Seq[Column] = { > val mappedFields = new ListBuffer[Column]() > def handleMatchedLeaf(field: StructField, curColumn: Column): > Seq[Column] = { > val newColumn = expression(curColumn).as(outputColumnName) > mappedFields += newColumn > Seq(curColumn) > } > def handleMatchedNonLeaf(field: StructField, curColumn: Column): > Seq[Column] = { > // Non-leaf columns need to be further processed recursively > field.dataType match { > case dt: StructType => Seq(struct(mapStruct(dt, path.tail, > Some(curColumn)): _*).as(field.name)) > case _ => throw new IllegalArgumentException(s"Field > '${field.name}' is not a struct type.") > } > } > val fieldName = path.head > val isLeaf = path.lengthCompare(2) < 0 > val newColumns = schema.fields.flatMap(fie
[jira] [Updated] (SPARK-28090) Spark hangs when an execution plan has many projections on nested structs
[ https://issues.apache.org/jira/browse/SPARK-28090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28090: -- Labels: (was: bulk-closed) > Spark hangs when an execution plan has many projections on nested structs > - > > Key: SPARK-28090 > URL: https://issues.apache.org/jira/browse/SPARK-28090 > Project: Spark > Issue Type: Bug > Components: Optimizer, SQL >Affects Versions: 2.4.3 > Environment: Tried in > * Spark 2.2.1, Spark 2.4.3 in local mode on Linux, MasOS and Windows > * Spark 2.4.3 / Yarn on a Linux cluster >Reporter: Ruslan Yushchenko >Priority: Major > > This was already posted (#28016), but the provided example didn't always > reproduce the error. This example consistently reproduces the issue. > Spark applications freeze on execution plan optimization stage (Catalyst) > when a logical execution plan contains a lot of projections that operate on > nested struct fields. > The code listed below demonstrates the issue. > To reproduce the Spark App does the following: > * A small dataframe is created from a JSON example. > * Several nested transformations (negation of a number) are applied on > struct fields and each time a new struct field is created. > * Once more than 9 such transformations are applied the Catalyst optimizer > freezes on optimizing the execution plan. > * You can control the freezing by choosing different upper bound for the > Range. E.g. it will work file if the upper bound is 5, but will hang is the > bound is 10. > {code:java} > package com.example > import org.apache.spark.sql._ > import org.apache.spark.sql.functions._ > import org.apache.spark.sql.types.{StructField, StructType} > import scala.collection.mutable.ListBuffer > object SparkApp1IssueSelfContained { > // A sample data for a dataframe with nested structs > val sample: List[String] = > """ { "numerics": {"num1": 101, "num2": 102, "num3": 103, "num4": 104, > "num5": 105, "num6": 106, "num7": 107, "num8": 108, "num9": 109, "num10": > 110, "num11": 111, "num12": 112, "num13": 113, "num14": 114, "num15": 115} } > """ :: > """ { "numerics": {"num1": 201, "num2": 202, "num3": 203, "num4": 204, > "num5": 205, "num6": 206, "num7": 207, "num8": 208, "num9": 209, "num10": > 210, "num11": 211, "num12": 212, "num13": 213, "num14": 214, "num15": 215} } > """ :: > """ { "numerics": {"num1": 301, "num2": 302, "num3": 303, "num4": 304, > "num5": 305, "num6": 306, "num7": 307, "num8": 308, "num9": 309, "num10": > 310, "num11": 311, "num12": 312, "num13": 313, "num14": 314, "num15": 315} } > """ :: > Nil > /** > * Transforms a column inside a nested struct. The transformed value will > be put into a new field of that nested struct > * > * The output column name can omit the full path as the field will be > created at the same level of nesting as the input column. > * > * @param inputColumnName A column name for which to apply the > transformation, e.g. `company.employee.firstName`. > * @param outputColumnName The output column name. The path is optional, > e.g. you can use `transformedName` instead of > `company.employee.transformedName`. > * @param expression A function that applies a transformation to a > column as a Spark expression. > * @return A dataframe with a new field that contains transformed values. 
> */ > def transformInsideNestedStruct(df: DataFrame, > inputColumnName: String, > outputColumnName: String, > expression: Column => Column): DataFrame = { > def mapStruct(schema: StructType, path: Seq[String], parentColumn: > Option[Column] = None): Seq[Column] = { > val mappedFields = new ListBuffer[Column]() > def handleMatchedLeaf(field: StructField, curColumn: Column): > Seq[Column] = { > val newColumn = expression(curColumn).as(outputColumnName) > mappedFields += newColumn > Seq(curColumn) > } > def handleMatchedNonLeaf(field: StructField, curColumn: Column): > Seq[Column] = { > // Non-leaf columns need to be further processed recursively > field.dataType match { > case dt: StructType => Seq(struct(mapStruct(dt, path.tail, > Some(curColumn)): _*).as(field.name)) > case _ => throw new IllegalArgumentException(s"Field > '${field.name}' is not a struct type.") > } > } > val fieldName = path.head > val isLeaf = path.lengthCompare(2) < 0 > val newColumns = schema.fields.flatMap(field => { > // This is the original column (struct field) we want to process > val curColumn = parentColumn match { > case None => new Column(fi
[jira] [Reopened] (SPARK-28090) Spark hangs when an execution plan has many projections on nested structs
[ https://issues.apache.org/jira/browse/SPARK-28090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reopened SPARK-28090: --- > Spark hangs when an execution plan has many projections on nested structs > - > > Key: SPARK-28090 > URL: https://issues.apache.org/jira/browse/SPARK-28090 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 > Environment: Tried in > * Spark 2.2.1, Spark 2.4.3 in local mode on Linux, MasOS and Windows > * Spark 2.4.3 / Yarn on a Linux cluster >Reporter: Ruslan Yushchenko >Priority: Major > > This was already posted (#28016), but the provided example didn't always > reproduce the error. This example consistently reproduces the issue. > Spark applications freeze on execution plan optimization stage (Catalyst) > when a logical execution plan contains a lot of projections that operate on > nested struct fields. > The code listed below demonstrates the issue. > To reproduce the Spark App does the following: > * A small dataframe is created from a JSON example. > * Several nested transformations (negation of a number) are applied on > struct fields and each time a new struct field is created. > * Once more than 9 such transformations are applied the Catalyst optimizer > freezes on optimizing the execution plan. > * You can control the freezing by choosing different upper bound for the > Range. E.g. it will work file if the upper bound is 5, but will hang is the > bound is 10. > {code:java} > package com.example > import org.apache.spark.sql._ > import org.apache.spark.sql.functions._ > import org.apache.spark.sql.types.{StructField, StructType} > import scala.collection.mutable.ListBuffer > object SparkApp1IssueSelfContained { > // A sample data for a dataframe with nested structs > val sample: List[String] = > """ { "numerics": {"num1": 101, "num2": 102, "num3": 103, "num4": 104, > "num5": 105, "num6": 106, "num7": 107, "num8": 108, "num9": 109, "num10": > 110, "num11": 111, "num12": 112, "num13": 113, "num14": 114, "num15": 115} } > """ :: > """ { "numerics": {"num1": 201, "num2": 202, "num3": 203, "num4": 204, > "num5": 205, "num6": 206, "num7": 207, "num8": 208, "num9": 209, "num10": > 210, "num11": 211, "num12": 212, "num13": 213, "num14": 214, "num15": 215} } > """ :: > """ { "numerics": {"num1": 301, "num2": 302, "num3": 303, "num4": 304, > "num5": 305, "num6": 306, "num7": 307, "num8": 308, "num9": 309, "num10": > 310, "num11": 311, "num12": 312, "num13": 313, "num14": 314, "num15": 315} } > """ :: > Nil > /** > * Transforms a column inside a nested struct. The transformed value will > be put into a new field of that nested struct > * > * The output column name can omit the full path as the field will be > created at the same level of nesting as the input column. > * > * @param inputColumnName A column name for which to apply the > transformation, e.g. `company.employee.firstName`. > * @param outputColumnName The output column name. The path is optional, > e.g. you can use `transformedName` instead of > `company.employee.transformedName`. > * @param expression A function that applies a transformation to a > column as a Spark expression. > * @return A dataframe with a new field that contains transformed values. 
> */ > def transformInsideNestedStruct(df: DataFrame, > inputColumnName: String, > outputColumnName: String, > expression: Column => Column): DataFrame = { > def mapStruct(schema: StructType, path: Seq[String], parentColumn: > Option[Column] = None): Seq[Column] = { > val mappedFields = new ListBuffer[Column]() > def handleMatchedLeaf(field: StructField, curColumn: Column): > Seq[Column] = { > val newColumn = expression(curColumn).as(outputColumnName) > mappedFields += newColumn > Seq(curColumn) > } > def handleMatchedNonLeaf(field: StructField, curColumn: Column): > Seq[Column] = { > // Non-leaf columns need to be further processed recursively > field.dataType match { > case dt: StructType => Seq(struct(mapStruct(dt, path.tail, > Some(curColumn)): _*).as(field.name)) > case _ => throw new IllegalArgumentException(s"Field > '${field.name}' is not a struct type.") > } > } > val fieldName = path.head > val isLeaf = path.lengthCompare(2) < 0 > val newColumns = schema.fields.flatMap(field => { > // This is the original column (struct field) we want to process > val curColumn = parentColumn match { > case None => new Column(field.name) > case Some(col) => co
[jira] [Updated] (SPARK-28090) Spark hangs when an execution plan has many projections on nested structs
[ https://issues.apache.org/jira/browse/SPARK-28090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28090: -- Component/s: (was: Optimizer) > Spark hangs when an execution plan has many projections on nested structs > - > > Key: SPARK-28090 > URL: https://issues.apache.org/jira/browse/SPARK-28090 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 > Environment: Tried in > * Spark 2.2.1, Spark 2.4.3 in local mode on Linux, MasOS and Windows > * Spark 2.4.3 / Yarn on a Linux cluster >Reporter: Ruslan Yushchenko >Priority: Major > > This was already posted (#28016), but the provided example didn't always > reproduce the error. This example consistently reproduces the issue. > Spark applications freeze on execution plan optimization stage (Catalyst) > when a logical execution plan contains a lot of projections that operate on > nested struct fields. > The code listed below demonstrates the issue. > To reproduce the Spark App does the following: > * A small dataframe is created from a JSON example. > * Several nested transformations (negation of a number) are applied on > struct fields and each time a new struct field is created. > * Once more than 9 such transformations are applied the Catalyst optimizer > freezes on optimizing the execution plan. > * You can control the freezing by choosing different upper bound for the > Range. E.g. it will work file if the upper bound is 5, but will hang is the > bound is 10. > {code:java} > package com.example > import org.apache.spark.sql._ > import org.apache.spark.sql.functions._ > import org.apache.spark.sql.types.{StructField, StructType} > import scala.collection.mutable.ListBuffer > object SparkApp1IssueSelfContained { > // A sample data for a dataframe with nested structs > val sample: List[String] = > """ { "numerics": {"num1": 101, "num2": 102, "num3": 103, "num4": 104, > "num5": 105, "num6": 106, "num7": 107, "num8": 108, "num9": 109, "num10": > 110, "num11": 111, "num12": 112, "num13": 113, "num14": 114, "num15": 115} } > """ :: > """ { "numerics": {"num1": 201, "num2": 202, "num3": 203, "num4": 204, > "num5": 205, "num6": 206, "num7": 207, "num8": 208, "num9": 209, "num10": > 210, "num11": 211, "num12": 212, "num13": 213, "num14": 214, "num15": 215} } > """ :: > """ { "numerics": {"num1": 301, "num2": 302, "num3": 303, "num4": 304, > "num5": 305, "num6": 306, "num7": 307, "num8": 308, "num9": 309, "num10": > 310, "num11": 311, "num12": 312, "num13": 313, "num14": 314, "num15": 315} } > """ :: > Nil > /** > * Transforms a column inside a nested struct. The transformed value will > be put into a new field of that nested struct > * > * The output column name can omit the full path as the field will be > created at the same level of nesting as the input column. > * > * @param inputColumnName A column name for which to apply the > transformation, e.g. `company.employee.firstName`. > * @param outputColumnName The output column name. The path is optional, > e.g. you can use `transformedName` instead of > `company.employee.transformedName`. > * @param expression A function that applies a transformation to a > column as a Spark expression. > * @return A dataframe with a new field that contains transformed values. 
> */ > def transformInsideNestedStruct(df: DataFrame, > inputColumnName: String, > outputColumnName: String, > expression: Column => Column): DataFrame = { > def mapStruct(schema: StructType, path: Seq[String], parentColumn: > Option[Column] = None): Seq[Column] = { > val mappedFields = new ListBuffer[Column]() > def handleMatchedLeaf(field: StructField, curColumn: Column): > Seq[Column] = { > val newColumn = expression(curColumn).as(outputColumnName) > mappedFields += newColumn > Seq(curColumn) > } > def handleMatchedNonLeaf(field: StructField, curColumn: Column): > Seq[Column] = { > // Non-leaf columns need to be further processed recursively > field.dataType match { > case dt: StructType => Seq(struct(mapStruct(dt, path.tail, > Some(curColumn)): _*).as(field.name)) > case _ => throw new IllegalArgumentException(s"Field > '${field.name}' is not a struct type.") > } > } > val fieldName = path.head > val isLeaf = path.lengthCompare(2) < 0 > val newColumns = schema.fields.flatMap(field => { > // This is the original column (struct field) we want to process > val curColumn = parentColumn match { > case None => new Column(field.na
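The reproducer quoted above is long and truncated by the mail archive. A much-condensed, hypothetical sketch of the same pattern, repeatedly projecting new fields into a nested struct, is shown below; it assumes a local SparkSession named spark and is not the reporter's exact code.

{code:scala}
// Condensed, hypothetical sketch of the reported pattern: each iteration rebuilds the
// nested struct with one extra derived field, which inflates the projections Catalyst
// has to optimize. The report says roughly 10 steps were enough to hang older versions.
import spark.implicits._
import org.apache.spark.sql.functions._

val jsonData = Seq("""{"numerics": {"num1": 101, "num2": 102}}""").toDS()
var df = spark.read.json(jsonData)

for (i <- 1 to 10) {
  df = df.withColumn("numerics",
    struct(col("numerics.*"), (-col("numerics.num1")).as(s"negated$i")))
}

df.explain(true)   // plan optimization is where the reported hang occurred
{code}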
[jira] [Assigned] (SPARK-38776) Flaky test: ALSSuite.'ALS validate input dataset'
[ https://issues.apache.org/jira/browse/SPARK-38776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-38776: - Assignee: Dongjoon Hyun > Flaky test: ALSSuite.'ALS validate input dataset' > - > > Key: SPARK-38776 > URL: https://issues.apache.org/jira/browse/SPARK-38776 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > > - https://github.com/apache/spark/runs/5803714260?check_suite_focus=true > {code} > [info] ALSSuite: > ... > [info] - ALS validate input dataset *** FAILED *** (2 seconds, 449 > milliseconds) > [info] Invalid Long: out of range "Job aborted due to stage failure: Task 0 > in stage 100.0 failed 1 times, most recent failure: Lost task 0.0 in stage > 100.0 (TID 348) (localhost executor driver): > org.apache.spark.SparkArithmeticException: Casting 12310 to int > causes overflow. To return NULL instead, use 'try_cast'. If necessary set > spark.sql.ansi.enabled to false to bypass this error. > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38776) Flaky test: ALSSuite.'ALS validate input dataset'
[ https://issues.apache.org/jira/browse/SPARK-38776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516463#comment-17516463 ] Apache Spark commented on SPARK-38776: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/36051 > Flaky test: ALSSuite.'ALS validate input dataset' > - > > Key: SPARK-38776 > URL: https://issues.apache.org/jira/browse/SPARK-38776 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > > - https://github.com/apache/spark/runs/5803714260?check_suite_focus=true > {code} > [info] ALSSuite: > ... > [info] - ALS validate input dataset *** FAILED *** (2 seconds, 449 > milliseconds) > [info] Invalid Long: out of range "Job aborted due to stage failure: Task 0 > in stage 100.0 failed 1 times, most recent failure: Lost task 0.0 in stage > 100.0 (TID 348) (localhost executor driver): > org.apache.spark.SparkArithmeticException: Casting 12310 to int > causes overflow. To return NULL instead, use 'try_cast'. If necessary set > spark.sql.ansi.enabled to false to bypass this error. > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38776) Flaky test: ALSSuite.'ALS validate input dataset'
[ https://issues.apache.org/jira/browse/SPARK-38776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38776: Assignee: (was: Apache Spark) > Flaky test: ALSSuite.'ALS validate input dataset' > - > > Key: SPARK-38776 > URL: https://issues.apache.org/jira/browse/SPARK-38776 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > > - https://github.com/apache/spark/runs/5803714260?check_suite_focus=true > {code} > [info] ALSSuite: > ... > [info] - ALS validate input dataset *** FAILED *** (2 seconds, 449 > milliseconds) > [info] Invalid Long: out of range "Job aborted due to stage failure: Task 0 > in stage 100.0 failed 1 times, most recent failure: Lost task 0.0 in stage > 100.0 (TID 348) (localhost executor driver): > org.apache.spark.SparkArithmeticException: Casting 12310 to int > causes overflow. To return NULL instead, use 'try_cast'. If necessary set > spark.sql.ansi.enabled to false to bypass this error. > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38776) Flaky test: ALSSuite.'ALS validate input dataset'
[ https://issues.apache.org/jira/browse/SPARK-38776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38776: Assignee: Apache Spark > Flaky test: ALSSuite.'ALS validate input dataset' > - > > Key: SPARK-38776 > URL: https://issues.apache.org/jira/browse/SPARK-38776 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > > - https://github.com/apache/spark/runs/5803714260?check_suite_focus=true > {code} > [info] ALSSuite: > ... > [info] - ALS validate input dataset *** FAILED *** (2 seconds, 449 > milliseconds) > [info] Invalid Long: out of range "Job aborted due to stage failure: Task 0 > in stage 100.0 failed 1 times, most recent failure: Lost task 0.0 in stage > 100.0 (TID 348) (localhost executor driver): > org.apache.spark.SparkArithmeticException: Casting 12310 to int > causes overflow. To return NULL instead, use 'try_cast'. If necessary set > spark.sql.ansi.enabled to false to bypass this error. > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38776) Flaky test: ALSSuite.'ALS validate input dataset'
[ https://issues.apache.org/jira/browse/SPARK-38776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516462#comment-17516462 ] Apache Spark commented on SPARK-38776: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/36051 > Flaky test: ALSSuite.'ALS validate input dataset' > - > > Key: SPARK-38776 > URL: https://issues.apache.org/jira/browse/SPARK-38776 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > > - https://github.com/apache/spark/runs/5803714260?check_suite_focus=true > {code} > [info] ALSSuite: > ... > [info] - ALS validate input dataset *** FAILED *** (2 seconds, 449 > milliseconds) > [info] Invalid Long: out of range "Job aborted due to stage failure: Task 0 > in stage 100.0 failed 1 times, most recent failure: Lost task 0.0 in stage > 100.0 (TID 348) (localhost executor driver): > org.apache.spark.SparkArithmeticException: Casting 12310 to int > causes overflow. To return NULL instead, use 'try_cast'. If necessary set > spark.sql.ansi.enabled to false to bypass this error. > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org