[jira] [Commented] (SPARK-32253) Make readability better in the test result logs
[ https://issues.apache.org/jira/browse/SPARK-32253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164148#comment-17164148 ] Apache Spark commented on SPARK-32253: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/29219 > Make readability better in the test result logs > --- > > Key: SPARK-32253 > URL: https://issues.apache.org/jira/browse/SPARK-32253 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 2.4.6, 3.0.0, 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.1.0 > > > Currently, the readability in the logs is not really good. For example, see > https://pipelines.actions.githubusercontent.com/gik0C3if0ep5i8iNpgFlcJRQk9UyifmoD6XvJANMVttkEP5xje/_apis/pipelines/1/runs/564/signedlogcontent/4?urlExpires=2020-07-09T14%3A05%3A52.5110439Z=HMACV1=gMGczJ8vtNPeQFE0GpjMxSS1BGq14RJLXUfjsLnaX7s%3D > We should have a way to easily see the failed test cases. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32408) Enable crossPaths back to prevent side effects
[ https://issues.apache.org/jira/browse/SPARK-32408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164135#comment-17164135 ] Apache Spark commented on SPARK-32408: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/29218 > Enable crossPaths back to prevent side effects > -- > > Key: SPARK-32408 > URL: https://issues.apache.org/jira/browse/SPARK-32408 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > After SPARK-32245, crossPaths was disabled in SBT build to run the JUnit > tests per project properly. > This is a correct change since we're not doing the cross build in SBT. > Now, the intermediate classes are placed without Scala version directory in > SBT build specifically. > It seems to be causing side effects that are dependent on that path. See, for > example, {{git grep \-r "target/scala\-"}}. > To minimise the side effects, we should enable crossPaths back for now. > This is actually an issue in Jenkins jobs as well. See > https://github.com/apache/spark/pull/29205 for the analysis made.
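The crossPaths behaviour discussed above can be sketched as an sbt fragment. This is a hypothetical, minimal build definition for illustration only, not Spark's actual SparkBuild; the project name and Scala version are assumptions:

```scala
// Hedged sketch of the sbt setting under discussion, not Spark's real build.
// With crossPaths := false, sbt places compiled classes in target/classes
// rather than target/scala-2.12/classes, which is what broke tooling that
// greps for "target/scala-" paths. Re-enabling the default restores the
// Scala-version directory.
lazy val example = (project in file("."))
  .settings(
    scalaVersion := "2.12.10",
    crossPaths := true  // sbt's default; SPARK-32245 had set this to false
  )
```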
[jira] [Resolved] (SPARK-32308) Move by-name resolution logic of unionByName from API code to analysis phase
[ https://issues.apache.org/jira/browse/SPARK-32308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-32308. - Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 29107 [https://github.com/apache/spark/pull/29107] > Move by-name resolution logic of unionByName from API code to analysis phase > > > Key: SPARK-32308 > URL: https://issues.apache.org/jira/browse/SPARK-32308 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Fix For: 3.1.0 > > > Currently the by-name resolution logic of unionByName is put in API code. We > should move the logic to analysis phase.
[jira] [Updated] (SPARK-32372) "Resolved attribute(s) XXX missing" after dedup conflict references
[ https://issues.apache.org/jira/browse/SPARK-32372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-32372: Fix Version/s: 2.4.7 > "Resolved attribute(s) XXX missing" after dedup conflict references > --- > > Key: SPARK-32372 > URL: https://issues.apache.org/jira/browse/SPARK-32372 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.2, 2.3.4, 2.4.6, 3.0.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > Fix For: 2.4.7, 3.0.1, 3.1.0 > > > {code:java} > // case class Person(id: Int, name: String, age: Int) > sql("SELECT name, avg(age) as avg_age FROM person GROUP BY > name").createOrReplaceTempView("person_a") > sql("SELECT p1.name, p2.avg_age FROM person p1 JOIN person_a p2 ON p1.name = > p2.name").createOrReplaceTempView("person_b") > sql("SELECT * FROM person_a UNION SELECT * FROM person_b") > .createOrReplaceTempView("person_c") > sql("SELECT p1.name, p2.avg_age FROM person_c p1 JOIN person_c p2 ON p1.name > = p2.name").show > {code} > error: > {code:java} > [info] Failed to analyze query: org.apache.spark.sql.AnalysisException: > Resolved attribute(s) avg_age#235 missing from name#233,avg_age#231 in > operator !Project [name#233, avg_age#235]. Attribute(s) with the same name > appear in the operation: avg_age. Please check if the right attribute(s) are > used.;; > ...{code}
[jira] [Updated] (SPARK-32280) AnalysisException thrown when query contains several JOINs
[ https://issues.apache.org/jira/browse/SPARK-32280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-32280: Fix Version/s: 2.4.7 > AnalysisException thrown when query contains several JOINs > -- > > Key: SPARK-32280 > URL: https://issues.apache.org/jira/browse/SPARK-32280 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.5 >Reporter: David Lindelöf >Assignee: wuyi >Priority: Major > Fix For: 2.4.7, 3.0.1, 3.1.0 > > > I've come across a curious {{AnalysisException}} thrown in one of my SQL > queries, even though the SQL appears legitimate. I was able to reduce it to > this example: > {code:python} > from pyspark.sql import SparkSession > spark = SparkSession.builder.getOrCreate() > spark.sql('SELECT 1 AS id').createOrReplaceTempView('A') > spark.sql(''' > SELECT id, > 'foo' AS kind > FROM A''').createOrReplaceTempView('B') > spark.sql(''' > SELECT l.id > FROM B AS l > JOIN B AS r > ON l.kind = r.kind''').createOrReplaceTempView('C') > spark.sql(''' > SELECT 0 > FROM ( >SELECT * >FROM B >JOIN C >USING (id)) > JOIN ( >SELECT * >FROM B >JOIN C >USING (id)) > USING (id)''') > {code} > Running this yields the following error: > {code} > py4j.protocol.Py4JJavaError: An error occurred while calling o20.sql. > : org.apache.spark.sql.AnalysisException: Resolved attribute(s) kind#11 > missing from id#10,kind#2,id#7,kind#5 in operator !Join Inner, (kind#11 = > kind#5). Attribute(s) with the same name appear in the operation: kind. 
> Please check if the right attribute(s) are used.;; > Project [0 AS 0#15] > +- Project [id#0, kind#2, kind#11] >+- Join Inner, (id#0 = id#14) > :- SubqueryAlias `__auto_generated_subquery_name` > : +- Project [id#0, kind#2] > : +- Project [id#0, kind#2] > :+- Join Inner, (id#0 = id#9) > : :- SubqueryAlias `b` > : : +- Project [id#0, foo AS kind#2] > : : +- SubqueryAlias `a` > : :+- Project [1 AS id#0] > : : +- OneRowRelation > : +- SubqueryAlias `c` > : +- Project [id#9] > : +- Join Inner, (kind#2 = kind#5) > ::- SubqueryAlias `l` > :: +- SubqueryAlias `b` > :: +- Project [id#9, foo AS kind#2] > ::+- SubqueryAlias `a` > :: +- Project [1 AS id#9] > :: +- OneRowRelation > :+- SubqueryAlias `r` > : +- SubqueryAlias `b` > : +- Project [id#7, foo AS kind#5] > : +- SubqueryAlias `a` > :+- Project [1 AS id#7] > : +- OneRowRelation > +- SubqueryAlias `__auto_generated_subquery_name` > +- Project [id#14, kind#11] > +- Project [id#14, kind#11] >+- Join Inner, (id#14 = id#10) > :- SubqueryAlias `b` > : +- Project [id#14, foo AS kind#11] > : +- SubqueryAlias `a` > :+- Project [1 AS id#14] > : +- OneRowRelation > +- SubqueryAlias `c` > +- Project [id#10] > +- !Join Inner, (kind#11 = kind#5) >:- SubqueryAlias `l` >: +- SubqueryAlias `b` >: +- Project [id#10, foo AS kind#2] >:+- SubqueryAlias `a` >: +- Project [1 AS id#10] >: +- OneRowRelation >+- SubqueryAlias `r` > +- SubqueryAlias `b` > +- Project [id#7, foo AS kind#5] > +- SubqueryAlias `a` >+- Project [1 AS id#7] > +- OneRowRelation > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:43) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:95) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:369) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:86) > at >
[jira] [Updated] (SPARK-32408) Enable crossPaths back to prevent side effects
[ https://issues.apache.org/jira/browse/SPARK-32408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-32408: - Description: After SPARK-32245, crossPaths was disabled in SBT build to run the Junit tests per project properly. This is correct change since we're not doing the cross build in SBT. Now, the intermediate classes are placed without Scala version directory in SBT build specifically. It seems causing side effects that are dependent on that path. See, for example, {{git grep \-r "target/scala\-"}}. To minimise the side effects, we should enable crossPaths back for now. This is actually an issue in Jenkins jobs as well. See was: After SPARK-32245, crossPaths was disabled in SBT build to run the Junit tests per project properly. This is correct change since we're not doing the cross build in SBT. Now, the intermediate classes are placed without Scala version directory in SBT build specifically. It seems causing side effects that are dependent on that path. See, for example, {{git grep \-r "target/scala\-"}}. To minimise the side effects, we should disable crossPaths only in GitHub Actions build for now. > Enable crossPaths back to prevent side effects > -- > > Key: SPARK-32408 > URL: https://issues.apache.org/jira/browse/SPARK-32408 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > After SPARK-32245, crossPaths was disabled in SBT build to run the Junit > tests per project properly. > This is correct change since we're not doing the cross build in SBT. > Now, the intermediate classes are placed without Scala version directory in > SBT build specifically. > It seems causing side effects that are dependent on that path. See, for > example, {{git grep \-r "target/scala\-"}}. > To minimise the side effects, we should enable crossPaths back for now. > This is actually an issue in Jenkins jobs as well. 
See
[jira] [Updated] (SPARK-32408) Enable crossPaths back to prevent side effects
[ https://issues.apache.org/jira/browse/SPARK-32408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-32408: - Description: After SPARK-32245, crossPaths was disabled in SBT build to run the Junit tests per project properly. This is correct change since we're not doing the cross build in SBT. Now, the intermediate classes are placed without Scala version directory in SBT build specifically. It seems causing side effects that are dependent on that path. See, for example, {{git grep \-r "target/scala\-"}}. To minimise the side effects, we should enable crossPaths back for now. This is actually an issue in Jenkins jobs as well. See https://github.com/apache/spark/pull/29205 for the analysis made. was: After SPARK-32245, crossPaths was disabled in SBT build to run the Junit tests per project properly. This is correct change since we're not doing the cross build in SBT. Now, the intermediate classes are placed without Scala version directory in SBT build specifically. It seems causing side effects that are dependent on that path. See, for example, {{git grep \-r "target/scala\-"}}. To minimise the side effects, we should enable crossPaths back for now. This is actually an issue in Jenkins jobs as well. See > Enable crossPaths back to prevent side effects > -- > > Key: SPARK-32408 > URL: https://issues.apache.org/jira/browse/SPARK-32408 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > After SPARK-32245, crossPaths was disabled in SBT build to run the Junit > tests per project properly. > This is correct change since we're not doing the cross build in SBT. > Now, the intermediate classes are placed without Scala version directory in > SBT build specifically. > It seems causing side effects that are dependent on that path. See, for > example, {{git grep \-r "target/scala\-"}}. > To minimise the side effects, we should enable crossPaths back for now. 
> This is actually an issue in Jenkins jobs as well. See > https://github.com/apache/spark/pull/29205 for the analysis made.
[jira] [Updated] (SPARK-32408) Enable crossPaths back to prevent side effects
[ https://issues.apache.org/jira/browse/SPARK-32408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-32408: - Summary: Enable crossPaths back to prevent side effects (was: Disable crossPaths only in GitHub Actions to prevent side effects) > Enable crossPaths back to prevent side effects > -- > > Key: SPARK-32408 > URL: https://issues.apache.org/jira/browse/SPARK-32408 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > After SPARK-32245, crossPaths was disabled in SBT build to run the JUnit > tests per project properly. > This is a correct change since we're not doing the cross build in SBT. > Now, the intermediate classes are placed without Scala version directory in > SBT build specifically. > It seems to be causing side effects that are dependent on that path. See, for > example, {{git grep \-r "target/scala\-"}}. > To minimise the side effects, we should disable crossPaths only in GitHub > Actions build for now.
[jira] [Created] (SPARK-32422) Don't skip pandas UDF tests in IntegratedUDFTestUtils
Hyukjin Kwon created SPARK-32422: Summary: Don't skip pandas UDF tests in IntegratedUDFTestUtils Key: SPARK-32422 URL: https://issues.apache.org/jira/browse/SPARK-32422 Project: Spark Issue Type: Sub-task Components: Tests Affects Versions: 3.1.0 Reporter: Hyukjin Kwon Currently, pandas UDF test cases are being skipped as below: {code} [info] - udf/postgreSQL/udf-case.sql - Scalar Pandas UDF is skipped because pyspark,pandas and/or pyarrow were not available in [python3.6]. !!! IGNORED !!! [info] - udf/postgreSQL/udf-select_having.sql - Scala UDF (2 seconds, 327 milliseconds) [info] - udf/postgreSQL/udf-select_having.sql - Regular Python UDF (3 seconds, 656 milliseconds) [info] - udf/postgreSQL/udf-select_having.sql - Scalar Pandas UDF is skipped because pyspark,pandas and/or pyarrow were not available in [python3.6]. !!! IGNORED !!! [info] - udf/postgreSQL/udf-select_implicit.sql - Scala UDF (6 seconds, 769 milliseconds) [info] - udf/postgreSQL/udf-select_implicit.sql - Regular Python UDF (10 seconds, 487 milliseconds) [info] - udf/postgreSQL/udf-select_implicit.sql - Scalar Pandas UDF is skipped because pyspark,pandas and/or pyarrow were not available in [python3.6]. !!! IGNORED !!! [info] - udf/postgreSQL/udf-aggregates_part3.sql - Scala UDF (119 milliseconds) [info] - udf/postgreSQL/udf-aggregates_part3.sql - Regular Python UDF (229 milliseconds) [info] - udf/postgreSQL/udf-aggregates_part3.sql - Scalar Pandas UDF is skipped because pyspark,pandas and/or pyarrow were not available in [python3.6]. !!! IGNORED !!! [info] - udf/postgreSQL/udf-aggregates_part2.sql - Scala UDF (2 seconds, 376 milliseconds) [info] - udf/postgreSQL/udf-aggregates_part2.sql - Regular Python UDF (2 seconds, 449 milliseconds) [info] - udf/postgreSQL/udf-aggregates_part2.sql - Scalar Pandas UDF is skipped because pyspark,pandas and/or pyarrow were not available in [python3.6]. !!! IGNORED !!! 
[info] - udf/postgreSQL/udf-aggregates_part1.sql - Scala UDF (3 seconds, 634 milliseconds) [info] - udf/postgreSQL/udf-aggregates_part1.sql - Regular Python UDF (5 seconds, 899 milliseconds) [info] - udf/postgreSQL/udf-aggregates_part1.sql - Scalar Pandas UDF is skipped because pyspark,pandas and/or pyarrow were not available in [python3.6]. !!! IGNORED !!! {code} in GitHub Actions. We should test them.
[jira] [Assigned] (SPARK-32422) Don't skip pandas UDF tests in IntegratedUDFTestUtils
[ https://issues.apache.org/jira/browse/SPARK-32422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32422: Assignee: (was: Apache Spark) > Don't skip pandas UDF tests in IntegratedUDFTestUtils > - > > Key: SPARK-32422 > URL: https://issues.apache.org/jira/browse/SPARK-32422 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > Currently, pandas UDF test cases are being skipped as below: > {code} > [info] - udf/postgreSQL/udf-case.sql - Scalar Pandas UDF is skipped because > pyspark,pandas and/or pyarrow were not available in [python3.6]. !!! IGNORED > !!! > [info] - udf/postgreSQL/udf-select_having.sql - Scala UDF (2 seconds, 327 > milliseconds) > [info] - udf/postgreSQL/udf-select_having.sql - Regular Python UDF (3 > seconds, 656 milliseconds) > [info] - udf/postgreSQL/udf-select_having.sql - Scalar Pandas UDF is skipped > because pyspark,pandas and/or pyarrow were not available in [python3.6]. !!! > IGNORED !!! > [info] - udf/postgreSQL/udf-select_implicit.sql - Scala UDF (6 seconds, 769 > milliseconds) > [info] - udf/postgreSQL/udf-select_implicit.sql - Regular Python UDF (10 > seconds, 487 milliseconds) > [info] - udf/postgreSQL/udf-select_implicit.sql - Scalar Pandas UDF is > skipped because pyspark,pandas and/or pyarrow were not available in > [python3.6]. !!! IGNORED !!! > [info] - udf/postgreSQL/udf-aggregates_part3.sql - Scala UDF (119 > milliseconds) > [info] - udf/postgreSQL/udf-aggregates_part3.sql - Regular Python UDF (229 > milliseconds) > [info] - udf/postgreSQL/udf-aggregates_part3.sql - Scalar Pandas UDF is > skipped because pyspark,pandas and/or pyarrow were not available in > [python3.6]. !!! IGNORED !!! 
> [info] - udf/postgreSQL/udf-aggregates_part2.sql - Scala UDF (2 seconds, 376 > milliseconds) > [info] - udf/postgreSQL/udf-aggregates_part2.sql - Regular Python UDF (2 > seconds, 449 milliseconds) > [info] - udf/postgreSQL/udf-aggregates_part2.sql - Scalar Pandas UDF is > skipped because pyspark,pandas and/or pyarrow were not available in > [python3.6]. !!! IGNORED !!! > [info] - udf/postgreSQL/udf-aggregates_part1.sql - Scala UDF (3 seconds, 634 > milliseconds) > [info] - udf/postgreSQL/udf-aggregates_part1.sql - Regular Python UDF (5 > seconds, 899 milliseconds) > [info] - udf/postgreSQL/udf-aggregates_part1.sql - Scalar Pandas UDF is > skipped because pyspark,pandas and/or pyarrow were not available in > [python3.6]. !!! IGNORED !!! > {code} > in GitHub Actions. We should test them.
[jira] [Assigned] (SPARK-32422) Don't skip pandas UDF tests in IntegratedUDFTestUtils
[ https://issues.apache.org/jira/browse/SPARK-32422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32422: Assignee: Apache Spark > Don't skip pandas UDF tests in IntegratedUDFTestUtils > - > > Key: SPARK-32422 > URL: https://issues.apache.org/jira/browse/SPARK-32422 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > > Currently, pandas UDF test cases are being skipped as below: > {code} > [info] - udf/postgreSQL/udf-case.sql - Scalar Pandas UDF is skipped because > pyspark,pandas and/or pyarrow were not available in [python3.6]. !!! IGNORED > !!! > [info] - udf/postgreSQL/udf-select_having.sql - Scala UDF (2 seconds, 327 > milliseconds) > [info] - udf/postgreSQL/udf-select_having.sql - Regular Python UDF (3 > seconds, 656 milliseconds) > [info] - udf/postgreSQL/udf-select_having.sql - Scalar Pandas UDF is skipped > because pyspark,pandas and/or pyarrow were not available in [python3.6]. !!! > IGNORED !!! > [info] - udf/postgreSQL/udf-select_implicit.sql - Scala UDF (6 seconds, 769 > milliseconds) > [info] - udf/postgreSQL/udf-select_implicit.sql - Regular Python UDF (10 > seconds, 487 milliseconds) > [info] - udf/postgreSQL/udf-select_implicit.sql - Scalar Pandas UDF is > skipped because pyspark,pandas and/or pyarrow were not available in > [python3.6]. !!! IGNORED !!! > [info] - udf/postgreSQL/udf-aggregates_part3.sql - Scala UDF (119 > milliseconds) > [info] - udf/postgreSQL/udf-aggregates_part3.sql - Regular Python UDF (229 > milliseconds) > [info] - udf/postgreSQL/udf-aggregates_part3.sql - Scalar Pandas UDF is > skipped because pyspark,pandas and/or pyarrow were not available in > [python3.6]. !!! IGNORED !!! 
> [info] - udf/postgreSQL/udf-aggregates_part2.sql - Scala UDF (2 seconds, 376 > milliseconds) > [info] - udf/postgreSQL/udf-aggregates_part2.sql - Regular Python UDF (2 > seconds, 449 milliseconds) > [info] - udf/postgreSQL/udf-aggregates_part2.sql - Scalar Pandas UDF is > skipped because pyspark,pandas and/or pyarrow were not available in > [python3.6]. !!! IGNORED !!! > [info] - udf/postgreSQL/udf-aggregates_part1.sql - Scala UDF (3 seconds, 634 > milliseconds) > [info] - udf/postgreSQL/udf-aggregates_part1.sql - Regular Python UDF (5 > seconds, 899 milliseconds) > [info] - udf/postgreSQL/udf-aggregates_part1.sql - Scalar Pandas UDF is > skipped because pyspark,pandas and/or pyarrow were not available in > [python3.6]. !!! IGNORED !!! > {code} > in GitHub Actions. We should test them.
[jira] [Commented] (SPARK-32422) Don't skip pandas UDF tests in IntegratedUDFTestUtils
[ https://issues.apache.org/jira/browse/SPARK-32422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164125#comment-17164125 ] Apache Spark commented on SPARK-32422: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/29217 > Don't skip pandas UDF tests in IntegratedUDFTestUtils > - > > Key: SPARK-32422 > URL: https://issues.apache.org/jira/browse/SPARK-32422 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > Currently, pandas UDF test cases are being skipped as below: > {code} > [info] - udf/postgreSQL/udf-case.sql - Scalar Pandas UDF is skipped because > pyspark,pandas and/or pyarrow were not available in [python3.6]. !!! IGNORED > !!! > [info] - udf/postgreSQL/udf-select_having.sql - Scala UDF (2 seconds, 327 > milliseconds) > [info] - udf/postgreSQL/udf-select_having.sql - Regular Python UDF (3 > seconds, 656 milliseconds) > [info] - udf/postgreSQL/udf-select_having.sql - Scalar Pandas UDF is skipped > because pyspark,pandas and/or pyarrow were not available in [python3.6]. !!! > IGNORED !!! > [info] - udf/postgreSQL/udf-select_implicit.sql - Scala UDF (6 seconds, 769 > milliseconds) > [info] - udf/postgreSQL/udf-select_implicit.sql - Regular Python UDF (10 > seconds, 487 milliseconds) > [info] - udf/postgreSQL/udf-select_implicit.sql - Scalar Pandas UDF is > skipped because pyspark,pandas and/or pyarrow were not available in > [python3.6]. !!! IGNORED !!! > [info] - udf/postgreSQL/udf-aggregates_part3.sql - Scala UDF (119 > milliseconds) > [info] - udf/postgreSQL/udf-aggregates_part3.sql - Regular Python UDF (229 > milliseconds) > [info] - udf/postgreSQL/udf-aggregates_part3.sql - Scalar Pandas UDF is > skipped because pyspark,pandas and/or pyarrow were not available in > [python3.6]. !!! IGNORED !!! 
> [info] - udf/postgreSQL/udf-aggregates_part2.sql - Scala UDF (2 seconds, 376 > milliseconds) > [info] - udf/postgreSQL/udf-aggregates_part2.sql - Regular Python UDF (2 > seconds, 449 milliseconds) > [info] - udf/postgreSQL/udf-aggregates_part2.sql - Scalar Pandas UDF is > skipped because pyspark,pandas and/or pyarrow were not available in > [python3.6]. !!! IGNORED !!! > [info] - udf/postgreSQL/udf-aggregates_part1.sql - Scala UDF (3 seconds, 634 > milliseconds) > [info] - udf/postgreSQL/udf-aggregates_part1.sql - Regular Python UDF (5 > seconds, 899 milliseconds) > [info] - udf/postgreSQL/udf-aggregates_part1.sql - Scalar Pandas UDF is > skipped because pyspark,pandas and/or pyarrow were not available in > [python3.6]. !!! IGNORED !!! > {code} > in GitHub Actions. We should test them.
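The skip messages quoted above are driven by whether pyspark, pandas and pyarrow can be imported by the Python executable under test. A minimal sketch of such an availability probe is below; the function name and exact command are assumptions, not IntegratedUDFTestUtils' actual implementation:

```scala
import scala.sys.process._
import scala.util.Try

// Hedged sketch: shell out to the Python under test and check whether the
// required packages import cleanly. If this returns false, the Scalar Pandas
// UDF variants of the SQL tests would be marked IGNORED, as in the logs above.
def pandasUdfDepsAvailable(python: String = "python3"): Boolean =
  Try(Seq(python, "-c", "import pyspark, pandas, pyarrow").! == 0)
    .getOrElse(false) // missing interpreter counts as unavailable
```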
[jira] [Resolved] (SPARK-32237) Cannot resolve column when put hint in the views of common table expression
[ https://issues.apache.org/jira/browse/SPARK-32237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-32237. - Fix Version/s: 3.1.0 3.0.1 Resolution: Fixed Issue resolved by pull request 29201 [https://github.com/apache/spark/pull/29201] > Cannot resolve column when put hint in the views of common table expression > --- > > Key: SPARK-32237 > URL: https://issues.apache.org/jira/browse/SPARK-32237 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 > Environment: Hadoop-2.7.7 > Hive-2.3.6 > Spark-3.0.0 >Reporter: Kernel Force >Assignee: Lantao Jin >Priority: Major > Fix For: 3.0.1, 3.1.0 > > Original Estimate: 168h > Remaining Estimate: 168h > > Suppose we have a table: > {code:sql} > CREATE TABLE DEMO_DATA ( > ID VARCHAR(10), > NAME VARCHAR(10), > BATCH VARCHAR(10), > TEAM VARCHAR(1) > ) STORED AS PARQUET; > {code} > and some data in it: > {code:sql} > 0: jdbc:hive2://HOSTNAME:1> SELECT T.* FROM DEMO_DATA T; > +---+-+-+-+ > | t.id | t.name | t.batch | t.team | > +---+-+-+-+ > | 1 | mike| 2020-07-08 | A | > | 2 | john| 2020-07-07 | B | > | 3 | rose| 2020-07-06 | B | > | | > +---+-+-+-+ > {code} > If I put query hint in va or vb and run it in spark-shell: > {code:sql} > sql(""" > WITH VA AS > (SELECT T.ID, T.NAME, T.BATCH, T.TEAM > FROM DEMO_DATA T WHERE T.TEAM = 'A'), > VB AS > (SELECT /*+ REPARTITION(3) */ T.ID, T.NAME, T.BATCH, T.TEAM > FROM VA T) > SELECT T.ID, T.NAME, T.BATCH, T.TEAM > FROM VB T > """).show > {code} > In Spark-2.4.4 it works fine. 
> But in Spark-3.0.0, it throws AnalysisException with Unrecognized hint > warning: > {code:scala} > 20/07/09 13:51:14 WARN analysis.HintErrorLogger: Unrecognized hint: > REPARTITION(3) > org.apache.spark.sql.AnalysisException: cannot resolve '`T.ID`' given input > columns: [T.BATCH, T.ID, T.NAME, T.TEAM]; line 8 pos 7; > 'Project ['T.ID, 'T.NAME, 'T.BATCH, 'T.TEAM] > +- SubqueryAlias T >+- SubqueryAlias VB > +- Project [ID#0, NAME#1, BATCH#2, TEAM#3] > +- SubqueryAlias T > +- SubqueryAlias VA >+- Project [ID#0, NAME#1, BATCH#2, TEAM#3] > +- Filter (TEAM#3 = A) > +- SubqueryAlias T > +- SubqueryAlias spark_catalog.default.demo_data >+- Relation[ID#0,NAME#1,BATCH#2,TEAM#3] parquet > at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:143) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:140) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:333) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:333) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUp$1(QueryPlan.scala:106) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:118) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:118) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:129) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:134) > at > 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:134) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:139) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:237) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:139) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:106) > at >
[jira] [Assigned] (SPARK-32237) Cannot resolve column when put hint in the views of common table expression
[ https://issues.apache.org/jira/browse/SPARK-32237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-32237: --- Assignee: Lantao Jin > Cannot resolve column when put hint in the views of common table expression > --- > > Key: SPARK-32237 > URL: https://issues.apache.org/jira/browse/SPARK-32237 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 > Environment: Hadoop-2.7.7 > Hive-2.3.6 > Spark-3.0.0 >Reporter: Kernel Force >Assignee: Lantao Jin >Priority: Major > Original Estimate: 168h > Remaining Estimate: 168h > > Suppose we have a table: > {code:sql} > CREATE TABLE DEMO_DATA ( > ID VARCHAR(10), > NAME VARCHAR(10), > BATCH VARCHAR(10), > TEAM VARCHAR(1) > ) STORED AS PARQUET; > {code} > and some data in it: > {code:sql} > 0: jdbc:hive2://HOSTNAME:1> SELECT T.* FROM DEMO_DATA T; > +------+--------+-------------+--------+ > | t.id | t.name | t.batch | t.team | > +------+--------+-------------+--------+ > | 1 | mike | 2020-07-08 | A | > | 2 | john | 2020-07-07 | B | > | 3 | rose | 2020-07-06 | B | > +------+--------+-------------+--------+ > {code} > If I put query hint in va or vb and run it in spark-shell: > {code:sql} > sql(""" > WITH VA AS > (SELECT T.ID, T.NAME, T.BATCH, T.TEAM > FROM DEMO_DATA T WHERE T.TEAM = 'A'), > VB AS > (SELECT /*+ REPARTITION(3) */ T.ID, T.NAME, T.BATCH, T.TEAM > FROM VA T) > SELECT T.ID, T.NAME, T.BATCH, T.TEAM > FROM VB T > """).show > {code} > In Spark-2.4.4 it works fine. 
> But in Spark-3.0.0, it throws AnalysisException with Unrecognized hint > warning: > {code:scala} > 20/07/09 13:51:14 WARN analysis.HintErrorLogger: Unrecognized hint: > REPARTITION(3) > org.apache.spark.sql.AnalysisException: cannot resolve '`T.ID`' given input > columns: [T.BATCH, T.ID, T.NAME, T.TEAM]; line 8 pos 7; > 'Project ['T.ID, 'T.NAME, 'T.BATCH, 'T.TEAM] > +- SubqueryAlias T >+- SubqueryAlias VB > +- Project [ID#0, NAME#1, BATCH#2, TEAM#3] > +- SubqueryAlias T > +- SubqueryAlias VA >+- Project [ID#0, NAME#1, BATCH#2, TEAM#3] > +- Filter (TEAM#3 = A) > +- SubqueryAlias T > +- SubqueryAlias spark_catalog.default.demo_data >+- Relation[ID#0,NAME#1,BATCH#2,TEAM#3] parquet > at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:143) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:140) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:333) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:333) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUp$1(QueryPlan.scala:106) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:118) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:118) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:129) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:134) > at > 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:134) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:139) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:237) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:139) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:106) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:140) > at >
[jira] [Assigned] (SPARK-32420) Add handling for unique key in non-codegen hash join
[ https://issues.apache.org/jira/browse/SPARK-32420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32420: Assignee: (was: Apache Spark) > Add handling for unique key in non-codegen hash join > > > Key: SPARK-32420 > URL: https://issues.apache.org/jira/browse/SPARK-32420 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Cheng Su >Priority: Trivial > > `HashRelation` has two separate code paths for unique key look up and > non-unique key look up E.g. in its subclass > `UnsafeHashedRelation`([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L144-L177]), > unique key look up is more efficient as it does not have extra > `Iterator[UnsafeRow].hasNext()/next()` overhead per row. > `BroadcastHashJoinExec` has handled unique key vs non-unique key separately > in code-gen path > ([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala#L289-L321]). > But the non-codegen path for broadcast hash join and shuffled hash join do > not separate it yet, so adding the support here. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32420) Add handling for unique key in non-codegen hash join
[ https://issues.apache.org/jira/browse/SPARK-32420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32420: Assignee: Apache Spark > Add handling for unique key in non-codegen hash join > > > Key: SPARK-32420 > URL: https://issues.apache.org/jira/browse/SPARK-32420 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Cheng Su >Assignee: Apache Spark >Priority: Trivial > > `HashRelation` has two separate code paths for unique key look up and > non-unique key look up E.g. in its subclass > `UnsafeHashedRelation`([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L144-L177]), > unique key look up is more efficient as it does not have extra > `Iterator[UnsafeRow].hasNext()/next()` overhead per row. > `BroadcastHashJoinExec` has handled unique key vs non-unique key separately > in code-gen path > ([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala#L289-L321]). > But the non-codegen path for broadcast hash join and shuffled hash join do > not separate it yet, so adding the support here. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32420) Add handling for unique key in non-codegen hash join
[ https://issues.apache.org/jira/browse/SPARK-32420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164113#comment-17164113 ] Apache Spark commented on SPARK-32420: -- User 'c21' has created a pull request for this issue: https://github.com/apache/spark/pull/29216 > Add handling for unique key in non-codegen hash join > > > Key: SPARK-32420 > URL: https://issues.apache.org/jira/browse/SPARK-32420 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Cheng Su >Priority: Trivial > > `HashRelation` has two separate code paths for unique key look up and > non-unique key look up E.g. in its subclass > `UnsafeHashedRelation`([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L144-L177]), > unique key look up is more efficient as it does not have extra > `Iterator[UnsafeRow].hasNext()/next()` overhead per row. > `BroadcastHashJoinExec` has handled unique key vs non-unique key separately > in code-gen path > ([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala#L289-L321]). > But the non-codegen path for broadcast hash join and shuffled hash join do > not separate it yet, so adding the support here. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32421) Add code-gen for shuffled hash join
Cheng Su created SPARK-32421: Summary: Add code-gen for shuffled hash join Key: SPARK-32421 URL: https://issues.apache.org/jira/browse/SPARK-32421 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.0 Reporter: Cheng Su We added shuffled hash join codegen internally in our fork, and we are seeing a clear improvement in benchmarks compared to the current non-codegen code path. Creating this Jira to add this support. Shuffled hash join codegen is very similar to broadcast hash join codegen, so this is a simple change. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32421) Add code-gen for shuffled hash join
[ https://issues.apache.org/jira/browse/SPARK-32421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164108#comment-17164108 ] Cheng Su commented on SPARK-32421: -- Will raise a PR in a couple of days. > Add code-gen for shuffled hash join > --- > > Key: SPARK-32421 > URL: https://issues.apache.org/jira/browse/SPARK-32421 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Cheng Su >Priority: Trivial > > We added shuffled hash join codegen internally in our fork, and seeing > obvious improvement in benchmark compared to current non-codegen code path. > Creating this Jira to add this support. Shuffled hash join codegen is very > similar to broadcast hash join codegen. So this is a simple change. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32420) Add handling for unique key in non-codegen hash join
Cheng Su created SPARK-32420: Summary: Add handling for unique key in non-codegen hash join Key: SPARK-32420 URL: https://issues.apache.org/jira/browse/SPARK-32420 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.0 Reporter: Cheng Su `HashedRelation` has two separate code paths for unique-key lookup and non-unique-key lookup. For example, in its subclass `UnsafeHashedRelation` ([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L144-L177]), unique-key lookup is more efficient because it avoids the extra `Iterator[UnsafeRow].hasNext()/next()` overhead per row. `BroadcastHashJoinExec` already handles unique keys and non-unique keys separately in the code-gen path ([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala#L289-L321]). But the non-codegen paths for broadcast hash join and shuffled hash join do not make this distinction yet, so this issue adds that support. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
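The unique-key versus non-unique-key distinction described above can be sketched in plain Python. This is only an illustrative analogy under assumed simplifications: plain dicts stand in for Spark's `HashedRelation`, and the class and method names below are invented for the sketch, not Spark APIs. The point it shows is that the non-unique path must hand back an iterator and pay per-element iteration cost even when exactly one row matches, while the unique path is a single map probe.

```python
# Illustrative sketch of the two probe paths in a hashed join relation.
# Plain-Python analogy only; not Spark's actual HashedRelation API.

class MultiKeyRelation:
    """Non-unique join keys: each probe returns an iterator of matches."""
    def __init__(self, rows):
        # rows: iterable of (key, value) pairs, keys may repeat
        self.table = {}
        for key, value in rows:
            self.table.setdefault(key, []).append(value)

    def get(self, key):
        # Caller must drain an iterator even when there is one match,
        # which is the hasNext()/next() overhead the issue refers to.
        return iter(self.table.get(key, ()))


class UniqueKeyRelation:
    """Unique join keys: each probe returns at most one row directly."""
    def __init__(self, rows):
        self.table = dict(rows)  # assumes all keys are distinct

    def get_value(self, key):
        # Single lookup, no per-row iterator to construct or drain.
        return self.table.get(key)


build_side = [(1, "mike"), (2, "john"), (3, "rose")]

multi = MultiKeyRelation(build_side)
assert list(multi.get(2)) == ["john"]   # iterator path
assert list(multi.get(99)) == []        # no match -> empty iterator

unique = UniqueKeyRelation(build_side)
assert unique.get_value(2) == "john"    # direct path
assert unique.get_value(99) is None     # no match -> None
```

The codegen path in `BroadcastHashJoinExec` already specializes on this distinction; the issue proposes doing the same in the interpreted (non-codegen) hash joins.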
[jira] [Updated] (SPARK-31547) Upgrade Genjavadoc to 0.16
[ https://issues.apache.org/jira/browse/SPARK-31547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31547: -- Parent: SPARK-25075 Issue Type: Sub-task (was: Improvement) > Upgrade Genjavadoc to 0.16 > -- > > Key: SPARK-31547 > URL: https://issues.apache.org/jira/browse/SPARK-31547 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32363) Flaky pip installation test in Jenkins
[ https://issues.apache.org/jira/browse/SPARK-32363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164102#comment-17164102 ] Apache Spark commented on SPARK-32363: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/29215 > Flaky pip installation test in Jenkins > -- > > Key: SPARK-32363 > URL: https://issues.apache.org/jira/browse/SPARK-32363 > Project: Spark > Issue Type: Test > Components: PySpark >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.1.0 > > > Currently pip packaging test is flaky in Jenkins: > {code} > Installing collected packages: py4j, pyspark > Attempting uninstall: py4j > Found existing installation: py4j 0.10.9 > Uninstalling py4j-0.10.9: > Successfully uninstalled py4j-0.10.9 > Attempting uninstall: pyspark > Found existing installation: pyspark 3.1.0.dev0 > ERROR: Exception: > Traceback (most recent call last): > File > "/tmp/tmp.GX6lHKLHZK/3.6/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", > line 188, in _main > status = self.run(options, args) > File > "/tmp/tmp.GX6lHKLHZK/3.6/lib/python3.6/site-packages/pip/_internal/cli/req_command.py", > line 185, in wrapper > return func(self, options, args) > File > "/tmp/tmp.GX6lHKLHZK/3.6/lib/python3.6/site-packages/pip/_internal/commands/install.py", > line 407, in run > use_user_site=options.use_user_site, > File > "/tmp/tmp.GX6lHKLHZK/3.6/lib/python3.6/site-packages/pip/_internal/req/__init__.py", > line 64, in install_given_reqs > auto_confirm=True > File > "/tmp/tmp.GX6lHKLHZK/3.6/lib/python3.6/site-packages/pip/_internal/req/req_install.py", > line 675, in uninstall > uninstalled_pathset = UninstallPathSet.from_dist(dist) > File > "/tmp/tmp.GX6lHKLHZK/3.6/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", > line 545, in from_dist > link_pointer, dist.project_name, dist.location) > AssertionError: Egg-link > 
/home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7-hive-2.3/python does > not match installed location of pyspark (at > /home/jenkins/workspace/SparkPullRequestBuilder@2/python) > Cleaning up temporary directory - /tmp/tmp.GX6lHKLHZK > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32264) More resources in Github Actions
[ https://issues.apache.org/jira/browse/SPARK-32264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164097#comment-17164097 ] Dongjoon Hyun commented on SPARK-32264: --- Thank you, [~holden] and [~hyukjin.kwon]. > More resources in Github Actions > > > Key: SPARK-32264 > URL: https://issues.apache.org/jira/browse/SPARK-32264 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 2.4.6, 3.0.0, 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Holden Karau >Priority: Major > > We are currently using free version of Github Actions which only allows 20 > concurrent jobs. This is not enough in the heavy development in Apache spark. > We should have a way to allocate more resources. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31525) Inconsistent result of df.head(1) and df.head()
[ https://issues.apache.org/jira/browse/SPARK-31525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164086#comment-17164086 ] Apache Spark commented on SPARK-31525: -- User 'tianshizz' has created a pull request for this issue: https://github.com/apache/spark/pull/29214 > Inconsistent result of df.head(1) and df.head() > --- > > Key: SPARK-31525 > URL: https://issues.apache.org/jira/browse/SPARK-31525 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.4.6, 3.0.0 >Reporter: Joshua Hendinata >Priority: Minor > Original Estimate: 24h > Remaining Estimate: 24h > > In this line > [https://github.com/apache/spark/blob/master/python/pyspark/sql/dataframe.py#L1339], > if you are calling `df.head()` and dataframe is empty, it will return *None* > but if you are calling `df.head(1)` and dataframe is empty, it will return > *empty list* instead. > This particular behaviour is not consistent and can create confusion. > Especially when you are calling `len(df.head())` which will throw an > exception for empty dataframe -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31525) Inconsistent result of df.head(1) and df.head()
[ https://issues.apache.org/jira/browse/SPARK-31525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31525: Assignee: (was: Apache Spark) > Inconsistent result of df.head(1) and df.head() > --- > > Key: SPARK-31525 > URL: https://issues.apache.org/jira/browse/SPARK-31525 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.4.6, 3.0.0 >Reporter: Joshua Hendinata >Priority: Minor > Original Estimate: 24h > Remaining Estimate: 24h > > In this line > [https://github.com/apache/spark/blob/master/python/pyspark/sql/dataframe.py#L1339], > if you are calling `df.head()` and dataframe is empty, it will return *None* > but if you are calling `df.head(1)` and dataframe is empty, it will return > *empty list* instead. > This particular behaviour is not consistent and can create confusion. > Especially when you are calling `len(df.head())` which will throw an > exception for empty dataframe -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31525) Inconsistent result of df.head(1) and df.head()
[ https://issues.apache.org/jira/browse/SPARK-31525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164084#comment-17164084 ] Apache Spark commented on SPARK-31525: -- User 'tianshizz' has created a pull request for this issue: https://github.com/apache/spark/pull/29214 > Inconsistent result of df.head(1) and df.head() > --- > > Key: SPARK-31525 > URL: https://issues.apache.org/jira/browse/SPARK-31525 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.4.6, 3.0.0 >Reporter: Joshua Hendinata >Priority: Minor > Original Estimate: 24h > Remaining Estimate: 24h > > In this line > [https://github.com/apache/spark/blob/master/python/pyspark/sql/dataframe.py#L1339], > if you are calling `df.head()` and dataframe is empty, it will return *None* > but if you are calling `df.head(1)` and dataframe is empty, it will return > *empty list* instead. > This particular behaviour is not consistent and can create confusion. > Especially when you are calling `len(df.head())` which will throw an > exception for empty dataframe -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31525) Inconsistent result of df.head(1) and df.head()
[ https://issues.apache.org/jira/browse/SPARK-31525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31525: Assignee: Apache Spark > Inconsistent result of df.head(1) and df.head() > --- > > Key: SPARK-31525 > URL: https://issues.apache.org/jira/browse/SPARK-31525 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.4.6, 3.0.0 >Reporter: Joshua Hendinata >Assignee: Apache Spark >Priority: Minor > Original Estimate: 24h > Remaining Estimate: 24h > > In this line > [https://github.com/apache/spark/blob/master/python/pyspark/sql/dataframe.py#L1339], > if you are calling `df.head()` and dataframe is empty, it will return *None* > but if you are calling `df.head(1)` and dataframe is empty, it will return > *empty list* instead. > This particular behaviour is not consistent and can create confusion. > Especially when you are calling `len(df.head())` which will throw an > exception for empty dataframe -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
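The asymmetry reported above can be reproduced in miniature without Spark. The sketch below mirrors the shape of the logic at the cited line of `dataframe.py` using a hypothetical stand-in class; `FakeDataFrame` and its `take` are invented for illustration and are not PySpark code.

```python
# Minimal stand-in reproducing the head() vs head(1) asymmetry.
# FakeDataFrame is hypothetical, not the pyspark.sql.DataFrame source.

class FakeDataFrame:
    def __init__(self, rows):
        self._rows = list(rows)

    def take(self, n):
        return self._rows[:n]

    def head(self, n=None):
        if n is None:
            rs = self.head(1)
            return rs[0] if rs else None   # empty frame -> None
        return self.take(n)                # empty frame -> []


empty = FakeDataFrame([])
assert empty.head() is None    # head() returns None on empty data
assert empty.head(1) == []     # head(1) returns an empty list
# len(empty.head()) would raise TypeError, since None has no len()

df = FakeDataFrame([("a", 1), ("b", 2)])
assert df.head() == ("a", 1)   # non-empty: first row, not a list
assert df.head(1) == [("a", 1)]
```

So a caller writing `len(df.head())` works on non-empty data but raises on an empty frame, while `len(df.head(1))` simply yields 0, which is the confusion the report describes.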
[jira] [Commented] (SPARK-14820) Reduce shuffle data by pushing filter toward storage
[ https://issues.apache.org/jira/browse/SPARK-14820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164074#comment-17164074 ] Yuming Wang commented on SPARK-14820: - It seems the issue fixed by SPARK-31705. > Reduce shuffle data by pushing filter toward storage > > > Key: SPARK-14820 > URL: https://issues.apache.org/jira/browse/SPARK-14820 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.1 >Reporter: Ali Tootoonchian >Priority: Trivial > Labels: bulk-closed > Attachments: Reduce Shuffle Data by pushing filter toward storage.pdf > > > SQL query planner can have intelligence to push down filter commands towards > the storage layer. If we optimize the query planner such that the IO to the > storage is reduced at the cost of running multiple filters (i.e., compute), > this should be desirable when the system is IO bound. > Proven analysis and example is attached. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32264) More resources in Github Actions
[ https://issues.apache.org/jira/browse/SPARK-32264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164070#comment-17164070 ] Hyukjin Kwon commented on SPARK-32264: -- Thank you [~holden]! > More resources in Github Actions > > > Key: SPARK-32264 > URL: https://issues.apache.org/jira/browse/SPARK-32264 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 2.4.6, 3.0.0, 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Holden Karau >Priority: Major > > We are currently using free version of Github Actions which only allows 20 > concurrent jobs. This is not enough in the heavy development in Apache spark. > We should have a way to allocate more resources. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32419) Leverage Conda environment at pip packaging test in GitHub Actions
[ https://issues.apache.org/jira/browse/SPARK-32419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32419: Assignee: Apache Spark > Leverage Conda environment at pip packaging test in GitHub Actions > -- > > Key: SPARK-32419 > URL: https://issues.apache.org/jira/browse/SPARK-32419 > Project: Spark > Issue Type: Sub-task > Components: Build, PySpark >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > > If you take a close look for GitHub Actions log: > {code:java} > Installing dist into virtual env > Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz > Collecting py4j==0.10.9 > Downloading py4j-0.10.9-py2.py3-none-any.whl (198 kB) > Using legacy setup.py install for pyspark, since package 'wheel' is not > installed. > Installing collected packages: py4j, pyspark > Running setup.py install for pyspark: started > Running setup.py install for pyspark: finished with status 'done' > Successfully installed py4j-0.10.9 pyspark-3.1.0.dev0 > ... > Installing dist into virtual env > Obtaining file:///home/runner/work/spark/spark/python > Collecting py4j==0.10.9 > Downloading py4j-0.10.9-py2.py3-none-any.whl (198 kB) > Installing collected packages: py4j, pyspark > Attempting uninstall: py4j > Found existing installation: py4j 0.10.9 > Uninstalling py4j-0.10.9: > Successfully uninstalled py4j-0.10.9 > Attempting uninstall: pyspark > Found existing installation: pyspark 3.1.0.dev0 > Uninstalling pyspark-3.1.0.dev0: > Successfully uninstalled pyspark-3.1.0.dev0 > Running setup.py develop for pyspark > Successfully installed py4j-0.10.9 pyspark > {code} > It looks not properly using conda as it removes and re-installs again. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32419) Leverage Conda environment at pip packaging test in GitHub Actions
[ https://issues.apache.org/jira/browse/SPARK-32419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32419: Assignee: (was: Apache Spark) > Leverage Conda environment at pip packaging test in GitHub Actions > -- > > Key: SPARK-32419 > URL: https://issues.apache.org/jira/browse/SPARK-32419 > Project: Spark > Issue Type: Sub-task > Components: Build, PySpark >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > If you take a close look for GitHub Actions log: > {code:java} > Installing dist into virtual env > Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz > Collecting py4j==0.10.9 > Downloading py4j-0.10.9-py2.py3-none-any.whl (198 kB) > Using legacy setup.py install for pyspark, since package 'wheel' is not > installed. > Installing collected packages: py4j, pyspark > Running setup.py install for pyspark: started > Running setup.py install for pyspark: finished with status 'done' > Successfully installed py4j-0.10.9 pyspark-3.1.0.dev0 > ... > Installing dist into virtual env > Obtaining file:///home/runner/work/spark/spark/python > Collecting py4j==0.10.9 > Downloading py4j-0.10.9-py2.py3-none-any.whl (198 kB) > Installing collected packages: py4j, pyspark > Attempting uninstall: py4j > Found existing installation: py4j 0.10.9 > Uninstalling py4j-0.10.9: > Successfully uninstalled py4j-0.10.9 > Attempting uninstall: pyspark > Found existing installation: pyspark 3.1.0.dev0 > Uninstalling pyspark-3.1.0.dev0: > Successfully uninstalled pyspark-3.1.0.dev0 > Running setup.py develop for pyspark > Successfully installed py4j-0.10.9 pyspark > {code} > It looks not properly using conda as it removes and re-installs again. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32419) Leverage Conda environment at pip packaging test in GitHub Actions
[ https://issues.apache.org/jira/browse/SPARK-32419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164067#comment-17164067 ] Apache Spark commented on SPARK-32419: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/29212 > Leverage Conda environment at pip packaging test in GitHub Actions > -- > > Key: SPARK-32419 > URL: https://issues.apache.org/jira/browse/SPARK-32419 > Project: Spark > Issue Type: Sub-task > Components: Build, PySpark >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > If you take a close look for GitHub Actions log: > {code:java} > Installing dist into virtual env > Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz > Collecting py4j==0.10.9 > Downloading py4j-0.10.9-py2.py3-none-any.whl (198 kB) > Using legacy setup.py install for pyspark, since package 'wheel' is not > installed. > Installing collected packages: py4j, pyspark > Running setup.py install for pyspark: started > Running setup.py install for pyspark: finished with status 'done' > Successfully installed py4j-0.10.9 pyspark-3.1.0.dev0 > ... > Installing dist into virtual env > Obtaining file:///home/runner/work/spark/spark/python > Collecting py4j==0.10.9 > Downloading py4j-0.10.9-py2.py3-none-any.whl (198 kB) > Installing collected packages: py4j, pyspark > Attempting uninstall: py4j > Found existing installation: py4j 0.10.9 > Uninstalling py4j-0.10.9: > Successfully uninstalled py4j-0.10.9 > Attempting uninstall: pyspark > Found existing installation: pyspark 3.1.0.dev0 > Uninstalling pyspark-3.1.0.dev0: > Successfully uninstalled pyspark-3.1.0.dev0 > Running setup.py develop for pyspark > Successfully installed py4j-0.10.9 pyspark > {code} > It looks not properly using conda as it removes and re-installs again. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32264) More resources in Github Actions
[ https://issues.apache.org/jira/browse/SPARK-32264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164066#comment-17164066 ] Holden Karau commented on SPARK-32264: -- It's being routed inside of GitHub as of my last contact with them (9 days ago). I'll follow up end of the month if we don't hear back. > More resources in Github Actions > > > Key: SPARK-32264 > URL: https://issues.apache.org/jira/browse/SPARK-32264 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 2.4.6, 3.0.0, 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Holden Karau >Priority: Major > > We are currently using free version of Github Actions which only allows 20 > concurrent jobs. This is not enough in the heavy development in Apache spark. > We should have a way to allocate more resources. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32419) Leverage Conda environment at pip packaging test in GitHub Actions
[ https://issues.apache.org/jira/browse/SPARK-32419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-32419: - Description: If you take a close look at the GitHub Actions log: {code:java} Installing dist into virtual env Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz Collecting py4j==0.10.9 Downloading py4j-0.10.9-py2.py3-none-any.whl (198 kB) Using legacy setup.py install for pyspark, since package 'wheel' is not installed. Installing collected packages: py4j, pyspark Running setup.py install for pyspark: started Running setup.py install for pyspark: finished with status 'done' Successfully installed py4j-0.10.9 pyspark-3.1.0.dev0 ... Installing dist into virtual env Obtaining file:///home/runner/work/spark/spark/python Collecting py4j==0.10.9 Downloading py4j-0.10.9-py2.py3-none-any.whl (198 kB) Installing collected packages: py4j, pyspark Attempting uninstall: py4j Found existing installation: py4j 0.10.9 Uninstalling py4j-0.10.9: Successfully uninstalled py4j-0.10.9 Attempting uninstall: pyspark Found existing installation: pyspark 3.1.0.dev0 Uninstalling pyspark-3.1.0.dev0: Successfully uninstalled pyspark-3.1.0.dev0 Running setup.py develop for pyspark Successfully installed py4j-0.10.9 pyspark {code} It looks like conda is not being used properly, as the packages are removed and re-installed. was: If you take a close look for GitHub Actions log {code:java} Installing dist into virtual env Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz Collecting py4j==0.10.9 Downloading py4j-0.10.9-py2.py3-none-any.whl (198 kB) Using legacy setup.py install for pyspark, since package 'wheel' is not installed. Installing collected packages: py4j, pyspark Running setup.py install for pyspark: started Running setup.py install for pyspark: finished with status 'done' Successfully installed py4j-0.10.9 pyspark-3.1.0.dev0 ... 
Installing dist into virtual env Obtaining file:///home/runner/work/spark/spark/python Collecting py4j==0.10.9 Downloading py4j-0.10.9-py2.py3-none-any.whl (198 kB) Installing collected packages: py4j, pyspark Attempting uninstall: py4j Found existing installation: py4j 0.10.9 Uninstalling py4j-0.10.9: Successfully uninstalled py4j-0.10.9 Attempting uninstall: pyspark Found existing installation: pyspark 3.1.0.dev0 Uninstalling pyspark-3.1.0.dev0: Successfully uninstalled pyspark-3.1.0.dev0 Running setup.py develop for pyspark Successfully installed py4j-0.10.9 pyspark{code} > Leverage Conda environment at pip packaging test in GitHub Actions > -- > > Key: SPARK-32419 > URL: https://issues.apache.org/jira/browse/SPARK-32419 > Project: Spark > Issue Type: Sub-task > Components: Build, PySpark >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > If you take a close look at the GitHub Actions log: > {code:java} > Installing dist into virtual env > Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz > Collecting py4j==0.10.9 > Downloading py4j-0.10.9-py2.py3-none-any.whl (198 kB) > Using legacy setup.py install for pyspark, since package 'wheel' is not > installed. > Installing collected packages: py4j, pyspark > Running setup.py install for pyspark: started > Running setup.py install for pyspark: finished with status 'done' > Successfully installed py4j-0.10.9 pyspark-3.1.0.dev0 > ... 
> Installing dist into virtual env > Obtaining file:///home/runner/work/spark/spark/python > Collecting py4j==0.10.9 > Downloading py4j-0.10.9-py2.py3-none-any.whl (198 kB) > Installing collected packages: py4j, pyspark > Attempting uninstall: py4j > Found existing installation: py4j 0.10.9 > Uninstalling py4j-0.10.9: > Successfully uninstalled py4j-0.10.9 > Attempting uninstall: pyspark > Found existing installation: pyspark 3.1.0.dev0 > Uninstalling pyspark-3.1.0.dev0: > Successfully uninstalled pyspark-3.1.0.dev0 > Running setup.py develop for pyspark > Successfully installed py4j-0.10.9 pyspark > {code} > It looks like conda is not being used properly, as the packages are removed and re-installed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32419) Leverage Conda environment at pip packaging test in GitHub Actions
Hyukjin Kwon created SPARK-32419: Summary: Leverage Conda environment at pip packaging test in GitHub Actions Key: SPARK-32419 URL: https://issues.apache.org/jira/browse/SPARK-32419 Project: Spark Issue Type: Sub-task Components: Build, PySpark Affects Versions: 3.1.0 Reporter: Hyukjin Kwon If you take a close look at the GitHub Actions log: {code:java} Installing dist into virtual env Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz Collecting py4j==0.10.9 Downloading py4j-0.10.9-py2.py3-none-any.whl (198 kB) Using legacy setup.py install for pyspark, since package 'wheel' is not installed. Installing collected packages: py4j, pyspark Running setup.py install for pyspark: started Running setup.py install for pyspark: finished with status 'done' Successfully installed py4j-0.10.9 pyspark-3.1.0.dev0 ... Installing dist into virtual env Obtaining file:///home/runner/work/spark/spark/python Collecting py4j==0.10.9 Downloading py4j-0.10.9-py2.py3-none-any.whl (198 kB) Installing collected packages: py4j, pyspark Attempting uninstall: py4j Found existing installation: py4j 0.10.9 Uninstalling py4j-0.10.9: Successfully uninstalled py4j-0.10.9 Attempting uninstall: pyspark Found existing installation: pyspark 3.1.0.dev0 Uninstalling pyspark-3.1.0.dev0: Successfully uninstalled pyspark-3.1.0.dev0 Running setup.py develop for pyspark Successfully installed py4j-0.10.9 pyspark{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
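The log above shows pyspark being uninstalled and re-installed in a plain virtualenv, which suggests the intended conda environment is not active. One quick way to diagnose this kind of problem is to check which interpreter and environment the test is actually running in. The sketch below is a hypothetical diagnostic helper, not part of Spark's test scripts; the `CONDA_PREFIX` check relies on the variable that `conda activate` exports.

```python
import os
import sys

def describe_python_env():
    """Report which interpreter and environment pip would install into.

    Handy for checking whether a packaging test really runs inside the
    intended conda (or virtualenv) environment.
    """
    info = {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # CONDA_PREFIX is exported by `conda activate`; absent otherwise.
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
    }
    # Inside an activated conda env, sys.prefix matches CONDA_PREFIX.
    info["inside_conda_env"] = (
        info["conda_prefix"] is not None
        and info["prefix"] == info["conda_prefix"]
    )
    return info

env = describe_python_env()
print(env["executable"], env["inside_conda_env"])
```

Running this at the top of the packaging test would show immediately whether the `pip install` lines in the log target the conda environment or a separately created virtualenv.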
[jira] [Commented] (SPARK-32264) More resources in Github Actions
[ https://issues.apache.org/jira/browse/SPARK-32264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164062#comment-17164062 ] Hyukjin Kwon commented on SPARK-32264: -- I'll assign this to [~holden] since she's the contact point for now. > More resources in Github Actions > > > Key: SPARK-32264 > URL: https://issues.apache.org/jira/browse/SPARK-32264 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 2.4.6, 3.0.0, 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Holden Karau >Priority: Major > > We are currently using the free version of GitHub Actions, which only allows 20 > concurrent jobs. This is not enough for the heavy development activity in Apache Spark. > We should have a way to allocate more resources. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32264) More resources in Github Actions
[ https://issues.apache.org/jira/browse/SPARK-32264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-32264: Assignee: Holden Karau > More resources in Github Actions > > > Key: SPARK-32264 > URL: https://issues.apache.org/jira/browse/SPARK-32264 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 2.4.6, 3.0.0, 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Holden Karau >Priority: Major > > We are currently using the free version of GitHub Actions, which only allows 20 > concurrent jobs. This is not enough for the heavy development activity in Apache Spark. > We should have a way to allocate more resources. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-32415) Enable JSON tests for the allowNonNumericNumbers option
[ https://issues.apache.org/jira/browse/SPARK-32415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-32415. -- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 29207 [https://github.com/apache/spark/pull/29207] > Enable JSON tests for the allowNonNumericNumbers option > --- > > Key: SPARK-32415 > URL: https://issues.apache.org/jira/browse/SPARK-32415 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.1.0 > > > Currently, 2 tests in JsonParsingOptionsSuite for the allowNonNumericNumbers > option are ignored. The tests can be enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32415) Enable JSON tests for the allowNonNumericNumbers option
[ https://issues.apache.org/jira/browse/SPARK-32415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-32415: Assignee: Maxim Gekk > Enable JSON tests for the allowNonNumericNumbers option > --- > > Key: SPARK-32415 > URL: https://issues.apache.org/jira/browse/SPARK-32415 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > > Currently, 2 tests in JsonParsingOptionsSuite for the allowNonNumericNumbers > option are ignored. The tests can be enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-32398) Upgrade to scalatest 3.2.0 for Scala 2.13.3 compatibility
[ https://issues.apache.org/jira/browse/SPARK-32398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-32398. --- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 29196 [https://github.com/apache/spark/pull/29196] > Upgrade to scalatest 3.2.0 for Scala 2.13.3 compatibility > - > > Key: SPARK-32398 > URL: https://issues.apache.org/jira/browse/SPARK-32398 > Project: Spark > Issue Type: Sub-task > Components: ML, Spark Core, SQL, Structured Streaming, Tests >Affects Versions: 3.0.0 >Reporter: Sean R. Owen >Assignee: Sean R. Owen >Priority: Major > Fix For: 3.1.0 > > > We'll need to update to scalatest 3.2.0 in order to pick up the fix here, > which fixes an incompatibility with Scala 2.13.3: > https://github.com/scalatest/scalatest/commit/7c89416aa9f3e7f2730a343ad6d3bdcff65809de > That's a big change unfortunately - 3.1 / 3.2 reorganized many classes. > Fortunately it's mostly just import updates across 100 files. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32364) Use CaseInsensitiveMap for DataFrameReader/Writer options
[ https://issues.apache.org/jira/browse/SPARK-32364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-32364: -- Fix Version/s: 2.4.7 > Use CaseInsensitiveMap for DataFrameReader/Writer options > - > > Key: SPARK-32364 > URL: https://issues.apache.org/jira/browse/SPARK-32364 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.6, 3.0.0 >Reporter: Girish A Pandit >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 2.4.7, 3.0.1, 3.1.0 > > > When a user has multiple options like path, paTH, and PATH for the same key > path, option/options is non-deterministic because extraOptions is a HashMap. > This issue aims to use *CaseInsensitiveMap* instead of *HashMap* to fix this > bug fundamentally. > {code} > spark.read > .option("paTh", "1") > .option("PATH", "2") > .option("Path", "3") > .option("patH", "4") > .load("5") > ... > org.apache.spark.sql.AnalysisException: > Path does not exist: file:/.../1; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
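The intent of the fix above is that option keys compare case-insensitively and the most recently set value wins deterministically. A minimal Python sketch of that behavior (a hypothetical illustration, not Spark's Scala `CaseInsensitiveMap` implementation):

```python
class CaseInsensitiveOptions:
    """Minimal case-insensitive option map: keys are compared
    ignoring case, and the most recently set value wins, mirroring
    the intent of using CaseInsensitiveMap for reader/writer options."""

    def __init__(self):
        self._data = {}  # lowercase key -> value

    def set(self, key, value):
        self._data[key.lower()] = value
        return self  # allow chaining, like repeated .option(...) calls

    def get(self, key):
        return self._data[key.lower()]

opts = CaseInsensitiveOptions()
opts.set("paTh", "1").set("PATH", "2").set("Path", "3").set("patH", "4")
print(opts.get("path"))  # → "4": the last value set, regardless of key casing
```

With a plain hash map keyed on the raw strings, the four `.option(...)` calls in the report would store four distinct entries, and which one a later lookup sees depends on lookup casing and map iteration order, hence the non-deterministic `Path does not exist: file:/.../1` failure.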
[jira] [Created] (SPARK-32418) Flaky test: org.apache.spark.DistributedSuite.caching in memory, serialized, replicated (encryption = off)
Jungtaek Lim created SPARK-32418: Summary: Flaky test: org.apache.spark.DistributedSuite.caching in memory, serialized, replicated (encryption = off) Key: SPARK-32418 URL: https://issues.apache.org/jira/browse/SPARK-32418 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.1.0 Reporter: Jungtaek Lim https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126432/testReport/ {noformat} org.apache.spark.DistributedSuite.caching in memory, serialized, replicated (encryption = off) Error Details org.scalatest.exceptions.TestFailedException: 9 did not equal 10 Stack Trace sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 9 did not equal 10 at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530) at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529) at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503) at org.apache.spark.DistributedSuite.testCaching(DistributedSuite.scala:181) at org.apache.spark.DistributedSuite.$anonfun$testCaching$1(DistributedSuite.scala:162) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:157) at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286) at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196) at 
org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178) at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:59) at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221) at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214) at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:59) at org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229) at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393) at scala.collection.immutable.List.foreach(List.scala:392) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381) at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:376) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:458) at org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229) at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228) at org.scalatest.FunSuite.runTests(FunSuite.scala:1560) at org.scalatest.Suite.run(Suite.scala:1124) at org.scalatest.Suite.run$(Suite.scala:1106) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560) at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233) at org.scalatest.SuperEngine.runImpl(Engine.scala:518) at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233) at org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232) at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:59) at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:59) at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317) at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510) at sbt.ForkMain$Run$2.call(ForkMain.java:296) at 
sbt.ForkMain$Run$2.call(ForkMain.java:286) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail:
[jira] [Commented] (SPARK-31693) Investigate AmpLab Jenkins server network issue
[ https://issues.apache.org/jira/browse/SPARK-31693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163987#comment-17163987 ] Wing Yew Poon commented on SPARK-31693: --- I'm seeing a problem with the .m2 cache on amp-jenkins-worker-06. In https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126370/console {noformat} [EnvInject] - Loading node environment variables. Building remotely on amp-jenkins-worker-06 (centos spark-test) in workspace /home/jenkins/workspace/SparkPullRequestBuilder ... Running build tests exec: curl -s -L https://downloads.lightbend.com/zinc/0.3.15/zinc-0.3.15.tgz exec: curl -s -L https://downloads.lightbend.com/scala/2.12.10/scala-2.12.10.tgz Using `mvn` from path: /home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.6.3/bin/mvn Using `mvn` from path: /home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.6.3/bin/mvn Performing Maven install for hadoop-2.7-hive-1.2 Using `mvn` from path: /home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.6.3/bin/mvn [ERROR] Failed to execute goal org.apache.maven.plugins:maven-install-plugin:3.0.0-M1:install (default-cli) on project spark-yarn_2.12: ArtifactInstallerException: Failed to install metadata org.apache.spark:spark-yarn_2.12/maven-metadata.xml: Could not parse metadata /home/jenkins/.m2/repository/org/apache/spark/spark-yarn_2.12/maven-metadata-local.xml: in epilog non whitespace content is not allowed but got t (position: END_TAG seen ...\nt... @13:2) -> [Help 1] {noformat} > Investigate AmpLab Jenkins server network issue > --- > > Key: SPARK-31693 > URL: https://issues.apache.org/jira/browse/SPARK-31693 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 2.4.6, 3.0.0, 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Shane Knapp >Priority: Critical > > Given the series of failures in the Spark packaging Jenkins job, it seems that > there is a network issue in the AmpLab Jenkins cluster. 
> - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/ > - The node failed to talk to GitBox. (SPARK-31687) -> GitHub is okay. > - The node failed to download the maven mirror. (SPARK-31691) -> The primary > host is okay. > - The node failed to communicate repository.apache.org. (Current master > branch Jenkins job failure) > {code} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-deploy-plugin:3.0.0-M1:deploy (default-deploy) > on project spark-parent_2.12: ArtifactDeployerException: Failed to retrieve > remote metadata > org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml: Could > not transfer metadata > org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml from/to > apache.snapshots.https > (https://repository.apache.org/content/repositories/snapshots): Transfer > failed for > https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-parent_2.12/3.1.0-SNAPSHOT/maven-metadata.xml: > Connect to repository.apache.org:443 [repository.apache.org/207.244.88.140] > failed: Connection timed out (Connection timed out) -> [Help 1] > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31197) Exit the executor once all tasks & migrations are finished
[ https://issues.apache.org/jira/browse/SPARK-31197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163983#comment-17163983 ] Apache Spark commented on SPARK-31197: -- User 'holdenk' has created a pull request for this issue: https://github.com/apache/spark/pull/29211 > Exit the executor once all tasks & migrations are finished > -- > > Key: SPARK-31197 > URL: https://issues.apache.org/jira/browse/SPARK-31197 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes, Spark Core >Affects Versions: 3.1.0 >Reporter: Holden Karau >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31197) Exit the executor once all tasks & migrations are finished
[ https://issues.apache.org/jira/browse/SPARK-31197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163981#comment-17163981 ] Apache Spark commented on SPARK-31197: -- User 'holdenk' has created a pull request for this issue: https://github.com/apache/spark/pull/29211 > Exit the executor once all tasks & migrations are finished > -- > > Key: SPARK-31197 > URL: https://issues.apache.org/jira/browse/SPARK-31197 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes, Spark Core >Affects Versions: 3.1.0 >Reporter: Holden Karau >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32054) Flaky test: org.apache.spark.sql.connector.FileDataSourceV2FallBackSuite.Fallback Parquet V2 to V1
[ https://issues.apache.org/jira/browse/SPARK-32054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163976#comment-17163976 ] Wing Yew Poon commented on SPARK-32054: --- org.apache.spark.sql.connector.FileDataSourceV2FallBackSuite.Fallback Parquet V2 to V1 failed in https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126425; however, earlier, it passed in https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126354/ for the same PR (no changes between the runs). > Flaky test: > org.apache.spark.sql.connector.FileDataSourceV2FallBackSuite.Fallback Parquet > V2 to V1 > -- > > Key: SPARK-32054 > URL: https://issues.apache.org/jira/browse/SPARK-32054 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 3.1.0 >Reporter: Gabor Somogyi >Priority: Major > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124364/testReport/org.apache.spark.sql.connector/FileDataSourceV2FallBackSuite/Fallback_Parquet_V2_to_V1/ > {code:java} > Error Message > org.scalatest.exceptions.TestFailedException: > ArrayBuffer((collect,Relation[id#387495L] parquet ), > (save,InsertIntoHadoopFsRelationCommand > file:/home/jenkins/workspace/SparkPullRequestBuilder@3/target/tmp/spark-fe4d8028-b7c5-406d-9c5a-59c96e98f776, > false, Parquet, Map(path -> > /home/jenkins/workspace/SparkPullRequestBuilder@3/target/tmp/spark-fe4d8028-b7c5-406d-9c5a-59c96e98f776), > ErrorIfExists, [id] +- Range (0, 10, step=1, splits=Some(2)) )) had length 2 > instead of expected length 1 > Stacktrace > sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: > ArrayBuffer((collect,Relation[id#387495L] parquet > ), (save,InsertIntoHadoopFsRelationCommand > file:/home/jenkins/workspace/SparkPullRequestBuilder@3/target/tmp/spark-fe4d8028-b7c5-406d-9c5a-59c96e98f776, > false, Parquet, Map(path -> > /home/jenkins/workspace/SparkPullRequestBuilder@3/target/tmp/spark-fe4d8028-b7c5-406d-9c5a-59c96e98f776), > ErrorIfExists, [id] 
> +- Range (0, 10, step=1, splits=Some(2)) > )) had length 2 instead of expected length 1 > at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530) > at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503) > at > org.apache.spark.sql.connector.FileDataSourceV2FallBackSuite.$anonfun$new$22(FileDataSourceV2FallBackSuite.scala:180) > at > org.apache.spark.sql.connector.FileDataSourceV2FallBackSuite.$anonfun$new$22$adapted(FileDataSourceV2FallBackSuite.scala:176) > at > org.apache.spark.sql.catalyst.plans.SQLHelper.withTempPath(SQLHelper.scala:69) > at > org.apache.spark.sql.catalyst.plans.SQLHelper.withTempPath$(SQLHelper.scala:66) > at org.apache.spark.sql.QueryTest.withTempPath(QueryTest.scala:34) > at > org.apache.spark.sql.connector.FileDataSourceV2FallBackSuite.$anonfun$new$21(FileDataSourceV2FallBackSuite.scala:176) > at > org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54) > at > org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38) > at > org.apache.spark.sql.connector.FileDataSourceV2FallBackSuite.org$apache$spark$sql$test$SQLTestUtilsBase$$super$withSQLConf(FileDataSourceV2FallBackSuite.scala:85) > at > org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf(SQLTestUtils.scala:246) > at > org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf$(SQLTestUtils.scala:244) > at > org.apache.spark.sql.connector.FileDataSourceV2FallBackSuite.withSQLConf(FileDataSourceV2FallBackSuite.scala:85) > at > org.apache.spark.sql.connector.FileDataSourceV2FallBackSuite.$anonfun$new$20(FileDataSourceV2FallBackSuite.scala:158) > at > org.apache.spark.sql.connector.FileDataSourceV2FallBackSuite.$anonfun$new$20$adapted(FileDataSourceV2FallBackSuite.scala:157) > at scala.collection.immutable.List.foreach(List.scala:392) > at 
> org.apache.spark.sql.connector.FileDataSourceV2FallBackSuite.$anonfun$new$19(FileDataSourceV2FallBackSuite.scala:157) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at
[jira] [Created] (SPARK-32417) Flaky test: BlockManagerDecommissionIntegrationSuite.verify that an already running task which is going to cache data succeeds on a decommissioned executor
Gabor Somogyi created SPARK-32417: - Summary: Flaky test: BlockManagerDecommissionIntegrationSuite.verify that an already running task which is going to cache data succeeds on a decommissioned executor Key: SPARK-32417 URL: https://issues.apache.org/jira/browse/SPARK-32417 Project: Spark Issue Type: Bug Components: Spark Core, Tests Affects Versions: 3.1.0 Reporter: Gabor Somogyi https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126424/testReport/ {code:java} Error Message org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 2759 times over 30.001772248 seconds. Last failure message: Map() was empty We should have a block that has been on multiple BMs in rdds: ArrayBuffer(SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(1, localhost, 37968, None),rdd_1_2,StorageLevel(memory, deserialized, 1 replicas),56,0)), SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(0, localhost, 42041, None),rdd_1_1,StorageLevel(memory, deserialized, 1 replicas),56,0)), SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(1, localhost, 37968, None),rdd_1_0,StorageLevel(memory, deserialized, 1 replicas),56,0))) from: ArrayBuffer(SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(driver, amp-jenkins-worker-05.amp, 45854, None),broadcast_1_piece0,StorageLevel(memory, 1 replicas),2695,0)), SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(1, localhost, 37968, None),broadcast_1_piece0,StorageLevel(memory, 1 replicas),2695,0)), SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(2, localhost, 42805, None),broadcast_1_piece0,StorageLevel(memory, 1 replicas),2695,0)), SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(0, localhost, 42041, None),broadcast_1_piece0,StorageLevel(memory, 1 replicas),2695,0)), SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(1, localhost, 37968, None),rdd_1_2,StorageLevel(memory, deserialized, 1 replicas),56,0)), 
SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(0, localhost, 42041, None),rdd_1_1,StorageLevel(memory, deserialized, 1 replicas),56,0)), SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(1, localhost, 37968, None),rdd_1_0,StorageLevel(memory, deserialized, 1 replicas),56,0))) but instead we got: Map(rdd_1_0 -> 1, rdd_1_2 -> 1, rdd_1_1 -> 1). Stacktrace sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 2759 times over 30.001772248 seconds. Last failure message: Map() was empty We should have a block that has been on multiple BMs in rdds: ArrayBuffer(SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(1, localhost, 37968, None),rdd_1_2,StorageLevel(memory, deserialized, 1 replicas),56,0)), SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(0, localhost, 42041, None),rdd_1_1,StorageLevel(memory, deserialized, 1 replicas),56,0)), SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(1, localhost, 37968, None),rdd_1_0,StorageLevel(memory, deserialized, 1 replicas),56,0))) from: ArrayBuffer(SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(driver, amp-jenkins-worker-05.amp, 45854, None),broadcast_1_piece0,StorageLevel(memory, 1 replicas),2695,0)), SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(1, localhost, 37968, None),broadcast_1_piece0,StorageLevel(memory, 1 replicas),2695,0)), SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(2, localhost, 42805, None),broadcast_1_piece0,StorageLevel(memory, 1 replicas),2695,0)), SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(0, localhost, 42041, None),broadcast_1_piece0,StorageLevel(memory, 1 replicas),2695,0)), SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(1, localhost, 37968, None),rdd_1_2,StorageLevel(memory, deserialized, 1 replicas),56,0)), SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(0, localhost, 42041, 
None),rdd_1_1,StorageLevel(memory, deserialized, 1 replicas),56,0)), SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(1, localhost, 37968, None),rdd_1_0,StorageLevel(memory, deserialized, 1 replicas),56,0))) but instead we got: Map(rdd_1_0 -> 1, rdd_1_2 -> 1, rdd_1_1 -> 1). at org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:432) at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:439) at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:391) at org.apache.spark.storage.BlockManagerDecommissionIntegrationSuite.eventually(BlockManagerDecommissionIntegrationSuite.scala:33) at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:308) at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:307) at
[jira] [Created] (SPARK-32416) Flaky test: SparkContextSuite.Cancelling stages/jobs with custom reasons
Gabor Somogyi created SPARK-32416: - Summary: Flaky test: SparkContextSuite.Cancelling stages/jobs with custom reasons Key: SPARK-32416 URL: https://issues.apache.org/jira/browse/SPARK-32416 Project: Spark Issue Type: Bug Components: Spark Core, Tests Affects Versions: 3.1.0 Reporter: Gabor Somogyi [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126424/testReport/] {code:java} Error Message org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 1293 times over 20.01121311 seconds. Last failure message: 1 did not equal 0. Stacktrace sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 1293 times over 20.01121311 seconds. Last failure message: 1 did not equal 0. at org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:432) at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:439) at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:391) at org.apache.spark.SparkContextSuite.eventually(SparkContextSuite.scala:49) at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:337) at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:336) at org.apache.spark.SparkContextSuite.eventually(SparkContextSuite.scala:49) at org.apache.spark.SparkContextSuite.$anonfun$new$58(SparkContextSuite.scala:607) at org.apache.spark.SparkContextSuite.$anonfun$new$58$adapted(SparkContextSuite.scala:566) at scala.collection.immutable.List.foreach(List.scala:392) at org.apache.spark.SparkContextSuite.$anonfun$new$57(SparkContextSuite.scala:566) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at 
org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:157) at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286) at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196) at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178) at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:59) at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221) at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214) at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:59) at org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229) at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393) at scala.collection.immutable.List.foreach(List.scala:392) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381) at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:376) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:458) at org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229) at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228) at org.scalatest.FunSuite.runTests(FunSuite.scala:1560) at org.scalatest.Suite.run(Suite.scala:1124) at org.scalatest.Suite.run$(Suite.scala:1106) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560) at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233) at org.scalatest.SuperEngine.runImpl(Engine.scala:518) at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233) at org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232) at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:59) at 
org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:59) at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317) at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510) at
[jira] [Commented] (SPARK-24497) ANSI SQL: Recursive query
[ https://issues.apache.org/jira/browse/SPARK-24497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163864#comment-17163864 ] Apache Spark commented on SPARK-24497: -- User 'peter-toth' has created a pull request for this issue: https://github.com/apache/spark/pull/29210 > ANSI SQL: Recursive query > - > > Key: SPARK-24497 > URL: https://issues.apache.org/jira/browse/SPARK-24497 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > h3. *Examples* > Here is an example for {{WITH RECURSIVE}} clause usage. Table "department" > represents the structure of an organization as an adjacency list. > {code:sql} > CREATE TABLE department ( > id INTEGER PRIMARY KEY, -- department ID > parent_department INTEGER REFERENCES department, -- upper department ID > name TEXT -- department name > ); > INSERT INTO department (id, parent_department, "name") > VALUES > (0, NULL, 'ROOT'), > (1, 0, 'A'), > (2, 1, 'B'), > (3, 2, 'C'), > (4, 2, 'D'), > (5, 0, 'E'), > (6, 4, 'F'), > (7, 5, 'G'); > -- department structure represented here is as follows: > -- > -- ROOT-+->A-+->B-+->C > -- | | > -- | +->D-+->F > -- +->E-+->G > {code} > > To extract all departments under A, you can use the following recursive > query: > {code:sql} > WITH RECURSIVE subdepartment AS > ( > -- non-recursive term > SELECT * FROM department WHERE name = 'A' > UNION ALL > -- recursive term > SELECT d.* > FROM > department AS d > JOIN > subdepartment AS sd > ON (d.parent_department = sd.id) > ) > SELECT * > FROM subdepartment > ORDER BY name; > {code} > More details: > [http://wiki.postgresql.org/wiki/CTEReadme] > [https://info.teradata.com/htmlpubs/DB_TTU_16_00/index.html#page/SQL_Reference/B035-1141-160K/lqe1472241402390.html] > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
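The fixpoint that the {{WITH RECURSIVE}} query above computes can be traced in plain Python against the same department table. This is an illustrative sketch of the semantics (non-recursive seed, then repeated joins until nothing new is produced), not Spark or PostgreSQL code:

```python
# Adjacency list from the department table in the issue: (id, parent_department, name)
departments = [
    (0, None, 'ROOT'), (1, 0, 'A'), (2, 1, 'B'), (3, 2, 'C'),
    (4, 2, 'D'), (5, 0, 'E'), (6, 4, 'F'), (7, 5, 'G'),
]

def subdepartments(root_name):
    """Iterate to a fixpoint, mirroring the non-recursive and recursive terms."""
    # non-recursive term: SELECT * FROM department WHERE name = root_name
    result = {d for d in departments if d[2] == root_name}
    while True:
        ids = {d[0] for d in result}
        # recursive term: join children of anything already in the working set
        new = {d for d in departments if d[1] in ids} - result
        if not new:
            return sorted(result, key=lambda d: d[2])  # ORDER BY name
        result |= new

print([d[2] for d in subdepartments('A')])  # → ['A', 'B', 'C', 'D', 'F']
```

The loop terminates because each pass either adds at least one new row or stops, which is exactly why a recursive CTE over a finite acyclic table is guaranteed to finish.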
[jira] [Commented] (SPARK-32364) Use CaseInsensitiveMap for DataFrameReader/Writer options
[ https://issues.apache.org/jira/browse/SPARK-32364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163857#comment-17163857 ] Apache Spark commented on SPARK-32364: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/29209 > Use CaseInsensitiveMap for DataFrameReader/Writer options > - > > Key: SPARK-32364 > URL: https://issues.apache.org/jira/browse/SPARK-32364 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.6, 3.0.0 >Reporter: Girish A Pandit >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.0.1, 3.1.0 > > > When a user have multiple options like path, paTH, and PATH for the same key > path, option/options is non-deterministic because extraOptions is HashMap. > This issue aims to use *CaseInsensitiveMap* instead of *HashMap* to fix this > bug fundamentally. > {code} > spark.read > .option("paTh", "1") > .option("PATH", "2") > .option("Path", "3") > .option("patH", "4") > .load("5") > ... > org.apache.spark.sql.AnalysisException: > Path does not exist: file:/.../1; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
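The fix makes option keys compare case-insensitively, so the last value set deterministically wins regardless of casing. A minimal Python sketch of the intended semantics (not Spark's actual CaseInsensitiveMap implementation):

```python
class CaseInsensitiveOptions:
    """Option map where keys compare case-insensitively and the last write wins."""
    def __init__(self):
        self._data = {}  # lowercased key -> value

    def option(self, key, value):
        self._data[key.lower()] = value
        return self  # allow chaining, as DataFrameReader.option does

    def get(self, key):
        return self._data.get(key.lower())

# The failing sequence from the issue: four casings of the same "path" key
opts = CaseInsensitiveOptions()
opts.option("paTh", "1").option("PATH", "2").option("Path", "3").option("patH", "4")
print(opts.get("path"))  # → 4
```

With a plain HashMap, "paTh", "PATH", "Path", and "patH" are four distinct keys, and which one the reader picks up depends on iteration order; folding them onto one key removes the non-determinism.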
[jira] [Commented] (SPARK-17053) Spark ignores hive.exec.drop.ignorenonexistent=true option
[ https://issues.apache.org/jira/browse/SPARK-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163821#comment-17163821 ] Jeffrey E Rodriguez commented on SPARK-17053: -- [~rxin] as esoteric as the "hive.exec.drop.ignorenonexistent" configuration may look, there is much code written by Hive developers that would not work in Spark, and this will inhibit migration to Spark via Spark SQL. Given the choice between touching their Hive code to move to Spark and staying with Hive, some Hive developers would choose not to touch their code and stay with Hive. [~dongjoon]'s fix looks good and fixes the issue; it is my opinion as an Apache committer that it should get a chance to make it in. > Spark ignores hive.exec.drop.ignorenonexistent=true option > -- > > Key: SPARK-17053 > URL: https://issues.apache.org/jira/browse/SPARK-17053 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Gokhan Civan >Priority: Major > > In version 1.6.1, the following does not throw an exception: > create table a as select 1; drop table a; drop table a; > In version 2.0.0, the second drop fails; this is not compatible with Hive. > The same problem exists for views. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31418) Blacklisting feature aborts Spark job without retrying for max num retries in case of Dynamic allocation
[ https://issues.apache.org/jira/browse/SPARK-31418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-31418. --- Fix Version/s: 3.1.0 Assignee: Venkata krishnan Sowrirajan Resolution: Fixed > Blacklisting feature aborts Spark job without retrying for max num retries in > case of Dynamic allocation > > > Key: SPARK-31418 > URL: https://issues.apache.org/jira/browse/SPARK-31418 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.0, 2.4.5 >Reporter: Venkata krishnan Sowrirajan >Assignee: Venkata krishnan Sowrirajan >Priority: Major > Fix For: 3.1.0 > > > With Spark blacklisting, if a task fails on an executor, the executor gets > blacklisted for the task. In order to retry the task, the scheduler checks whether > there is an idle blacklisted executor that can be killed and replaced; if not, > it aborts the job without attempting the maximum number of retries. > In the context of dynamic allocation this could be handled better: instead of > killing an idle blacklisted executor (it is possible there is no idle blacklisted > executor), request an additional executor and retry the task. > This can be easily reproduced with a simple job like the one below. The example > should fail eventually; it is shown only to demonstrate that the task is not retried > spark.task.maxFailures times: > {code:java} > def test(a: Int) = { a.asInstanceOf[String] } > sc.parallelize(1 to 10, 10).map(x => test(x)).collect > {code} > with dynamic allocation enabled and min executors set to 1. There are > various other cases where this can fail as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
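The scheduling decision described above, including the improvement proposed for dynamic allocation, can be sketched as a small decision function. The names below are hypothetical and only model the choice being made; this is not Spark's scheduler code:

```python
def decide_retry(idle_blacklisted_executors, dynamic_allocation):
    """Model the scheduler's next action when a task is blacklisted everywhere.

    Hypothetical model of the behavior described in this issue, not Spark code.
    """
    if idle_blacklisted_executors > 0:
        # current behavior: kill an idle blacklisted executor and replace it
        return "kill_and_replace"
    if dynamic_allocation:
        # proposed improvement: request an additional executor and retry
        return "request_new_executor"
    # otherwise the job aborts without reaching spark.task.maxFailures retries
    return "abort_job"

print(decide_retry(idle_blacklisted_executors=0, dynamic_allocation=True))
```

The bug is the third branch firing under dynamic allocation, where requesting a fresh executor is cheap and would let the task use its remaining retries.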
[jira] [Closed] (SPARK-32413) Guidance for my project
[ https://issues.apache.org/jira/browse/SPARK-32413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler closed SPARK-32413. > Guidance for my project > > > Key: SPARK-32413 > URL: https://issues.apache.org/jira/browse/SPARK-32413 > Project: Spark > Issue Type: Brainstorming > Components: PySpark, Spark Core, SparkR >Affects Versions: 3.0.0 >Reporter: Suat Toksoz >Priority: Minor > > Hi, > I am planning to read an Elasticsearch index continuously, put that data > into a DataFrame, group and search over it, and create an alert. I would like to > write my code in Python. > For this purpose, what should I use: Spark, Jupyter, PySpark...? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-32413) Guidance for my project
[ https://issues.apache.org/jira/browse/SPARK-32413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-32413. -- Resolution: Not A Problem Hi [~stoksoz], this type of discussion is more appropriate for the mailing list; see [https://spark.apache.org/community.html] for how to subscribe > Guidance for my project > > > Key: SPARK-32413 > URL: https://issues.apache.org/jira/browse/SPARK-32413 > Project: Spark > Issue Type: Brainstorming > Components: PySpark, Spark Core, SparkR >Affects Versions: 3.0.0 >Reporter: Suat Toksoz >Priority: Minor > > Hi, > I am planning to read an Elasticsearch index continuously, put that data > into a DataFrame, group and search over it, and create an alert. I would like to > write my code in Python. > For this purpose, what should I use: Spark, Jupyter, PySpark...? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32372) "Resolved attribute(s) XXX missing" after dudup conflict references
[ https://issues.apache.org/jira/browse/SPARK-32372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163781#comment-17163781 ] Apache Spark commented on SPARK-32372: -- User 'Ngone51' has created a pull request for this issue: https://github.com/apache/spark/pull/29208 > "Resolved attribute(s) XXX missing" after dudup conflict references > --- > > Key: SPARK-32372 > URL: https://issues.apache.org/jira/browse/SPARK-32372 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.2, 2.3.4, 2.4.6, 3.0.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > Fix For: 3.0.1, 3.1.0 > > > {code:java} > // case class Person(id: Int, name: String, age: Int) > sql("SELECT name, avg(age) as avg_age FROM person GROUP BY > name").createOrReplaceTempView("person_a") > sql("SELECT p1.name, p2.avg_age FROM person p1 JOIN person_a p2 ON p1.name = > p2.name").createOrReplaceTempView("person_b") > sql("SELECT * FROM person_a UNION SELECT * FROM person_b") > .createOrReplaceTempView("person_c") > sql("SELECT p1.name, p2.avg_age FROM person_c p1 JOIN person_c p2 ON p1.name > = p2.name").show > {code} > error: > {code:java} > [info] Failed to analyze query: org.apache.spark.sql.AnalysisException: > Resolved attribute(s) avg_age#235 missing from name#233,avg_age#231 in > operator !Project [name#233, avg_age#235]. Attribute(s) with the same name > appear in the operation: avg_age. Please check if the right attribute(s) are > used.;; > ...{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32280) AnalysisException thrown when query contains several JOINs
[ https://issues.apache.org/jira/browse/SPARK-32280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163780#comment-17163780 ] Apache Spark commented on SPARK-32280: -- User 'Ngone51' has created a pull request for this issue: https://github.com/apache/spark/pull/29208 > AnalysisException thrown when query contains several JOINs > -- > > Key: SPARK-32280 > URL: https://issues.apache.org/jira/browse/SPARK-32280 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.5 >Reporter: David Lindelöf >Assignee: wuyi >Priority: Major > Fix For: 3.0.1, 3.1.0 > > > I've come across a curious {{AnalysisException}} thrown in one of my SQL > queries, even though the SQL appears legitimate. I was able to reduce it to > this example: > {code:python} > from pyspark.sql import SparkSession > spark = SparkSession.builder.getOrCreate() > spark.sql('SELECT 1 AS id').createOrReplaceTempView('A') > spark.sql(''' > SELECT id, > 'foo' AS kind > FROM A''').createOrReplaceTempView('B') > spark.sql(''' > SELECT l.id > FROM B AS l > JOIN B AS r > ON l.kind = r.kind''').createOrReplaceTempView('C') > spark.sql(''' > SELECT 0 > FROM ( >SELECT * >FROM B >JOIN C >USING (id)) > JOIN ( >SELECT * >FROM B >JOIN C >USING (id)) > USING (id)''') > {code} > Running this yields the following error: > {code} > py4j.protocol.Py4JJavaError: An error occurred while calling o20.sql. > : org.apache.spark.sql.AnalysisException: Resolved attribute(s) kind#11 > missing from id#10,kind#2,id#7,kind#5 in operator !Join Inner, (kind#11 = > kind#5). Attribute(s) with the same name appear in the operation: kind. 
> Please check if the right attribute(s) are used.;; > Project [0 AS 0#15] > +- Project [id#0, kind#2, kind#11] >+- Join Inner, (id#0 = id#14) > :- SubqueryAlias `__auto_generated_subquery_name` > : +- Project [id#0, kind#2] > : +- Project [id#0, kind#2] > :+- Join Inner, (id#0 = id#9) > : :- SubqueryAlias `b` > : : +- Project [id#0, foo AS kind#2] > : : +- SubqueryAlias `a` > : :+- Project [1 AS id#0] > : : +- OneRowRelation > : +- SubqueryAlias `c` > : +- Project [id#9] > : +- Join Inner, (kind#2 = kind#5) > ::- SubqueryAlias `l` > :: +- SubqueryAlias `b` > :: +- Project [id#9, foo AS kind#2] > ::+- SubqueryAlias `a` > :: +- Project [1 AS id#9] > :: +- OneRowRelation > :+- SubqueryAlias `r` > : +- SubqueryAlias `b` > : +- Project [id#7, foo AS kind#5] > : +- SubqueryAlias `a` > :+- Project [1 AS id#7] > : +- OneRowRelation > +- SubqueryAlias `__auto_generated_subquery_name` > +- Project [id#14, kind#11] > +- Project [id#14, kind#11] >+- Join Inner, (id#14 = id#10) > :- SubqueryAlias `b` > : +- Project [id#14, foo AS kind#11] > : +- SubqueryAlias `a` > :+- Project [1 AS id#14] > : +- OneRowRelation > +- SubqueryAlias `c` > +- Project [id#10] > +- !Join Inner, (kind#11 = kind#5) >:- SubqueryAlias `l` >: +- SubqueryAlias `b` >: +- Project [id#10, foo AS kind#2] >:+- SubqueryAlias `a` >: +- Project [1 AS id#10] >: +- OneRowRelation >+- SubqueryAlias `r` > +- SubqueryAlias `b` > +- Project [id#7, foo AS kind#5] > +- SubqueryAlias `a` >+- Project [1 AS id#7] > +- OneRowRelation > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:43) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:95) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:369) > at >
[jira] [Commented] (SPARK-32411) GPU Cluster Fail
[ https://issues.apache.org/jira/browse/SPARK-32411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163742#comment-17163742 ] L. C. Hsieh commented on SPARK-32411: - I think it is because of the configs: "spark.task.resource.gpu.amount 2" means each task requires 2 GPUs, but "spark.executor.resource.gpu.amount 1" specifies that each executor has only 1 GPU, so the task scheduler cannot find an executor that meets the task requirement. > GPU Cluster Fail > > > Key: SPARK-32411 > URL: https://issues.apache.org/jira/browse/SPARK-32411 > Project: Spark > Issue Type: Bug > Components: PySpark, Web UI >Affects Versions: 3.0.0 > Environment: I have an Apache Spark 3.0 cluster consisting of machines > with multiple nvidia-gpus and I connect my jupyter notebook to the cluster > using pyspark, >Reporter: Vinh Tran >Priority: Major > > I'm having a difficult time getting a GPU cluster started on Apache Spark > 3.0. It was hard to find documentation on this, but I stumbled on an NVIDIA > github page for Rapids which suggested the following additional edits to the > spark-defaults.conf: > {code:java} > spark.task.resource.gpu.amount 0.25 > spark.executor.resource.gpu.discoveryScript > ./usr/local/spark/getGpusResources.sh{code} > I have an Apache Spark 3.0 cluster consisting of machines with multiple > nvidia-gpus and I connect my jupyter notebook to the cluster using pyspark, > however it results in the following error: > {code:java} > Py4JJavaError: An error occurred while calling > None.org.apache.spark.api.java.JavaSparkContext. 
> : org.apache.spark.SparkException: You must specify an amount for gpu > at > org.apache.spark.resource.ResourceUtils$.$anonfun$parseResourceRequest$1(ResourceUtils.scala:142) > at scala.collection.immutable.Map$Map1.getOrElse(Map.scala:119) > at > org.apache.spark.resource.ResourceUtils$.parseResourceRequest(ResourceUtils.scala:142) > at > org.apache.spark.resource.ResourceUtils$.$anonfun$parseAllResourceRequests$1(ResourceUtils.scala:159) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:75) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.resource.ResourceUtils$.parseAllResourceRequests(ResourceUtils.scala:159) > at > org.apache.spark.SparkContext$.checkResourcesPerTask$1(SparkContext.scala:2773) > at > org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2884) > at org.apache.spark.SparkContext.(SparkContext.scala:528) > at > org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:58) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247) > at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) > at py4j.Gateway.invoke(Gateway.java:238) > at > py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80) > at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69) > at py4j.GatewayConnection.run(GatewayConnection.java:238) > at 
java.lang.Thread.run(Thread.java:748) > {code} > After this, I then tried adding another line to the conf per the instructions > which results in no errors, however when I log in to the Web UI at > localhost:8080, under Running Applications, the state remains at waiting. > {code:java} > spark.task.resource.gpu.amount 2 > spark.executor.resource.gpu.discoveryScript > ./usr/local/spark/getGpusResources.sh > spark.executor.resource.gpu.amount 1 > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
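The config mismatch diagnosed in the comment (each task asking for 2 GPUs while each executor advertises only 1) can be checked mechanically before submitting. This is a hedged sketch of that sanity check, not Spark's actual validation code:

```python
def check_gpu_configs(conf):
    """Flag a per-task GPU request that no single executor can ever satisfy."""
    task_gpus = float(conf.get("spark.task.resource.gpu.amount", 0))
    executor_gpus = float(conf.get("spark.executor.resource.gpu.amount", 0))
    if task_gpus > executor_gpus:
        return ("task requires %g gpu(s) but each executor has only %g; "
                "no executor can meet the task requirement"
                % (task_gpus, executor_gpus))
    return None  # configuration is satisfiable

# The reporter's configuration that left the application stuck in WAITING:
print(check_gpu_configs({
    "spark.task.resource.gpu.amount": "2",
    "spark.executor.resource.gpu.amount": "1",
}))
```

A fractional task amount such as 0.25 with one GPU per executor is fine: it simply means up to four tasks share that GPU.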
[jira] [Resolved] (SPARK-32411) GPU Cluster Fail
[ https://issues.apache.org/jira/browse/SPARK-32411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh resolved SPARK-32411. - Resolution: Not A Problem > GPU Cluster Fail > > > Key: SPARK-32411 > URL: https://issues.apache.org/jira/browse/SPARK-32411 > Project: Spark > Issue Type: Bug > Components: PySpark, Web UI >Affects Versions: 3.0.0 > Environment: Ihave a Apache Spark 3.0 cluster consisting of machines > with multiple nvidia-gpus and I connect my jupyter notebook to the cluster > using pyspark, >Reporter: Vinh Tran >Priority: Major > > I'm having a difficult time getting a GPU cluster started on Apache Spark > 3.0. It was hard to find documentation on this, but I stumbled on a NVIDIA > github page for Rapids which suggested the following additional edits to the > spark-defaults.conf: > {code:java} > spark.task.resource.gpu.amount 0.25 > spark.executor.resource.gpu.discoveryScript > ./usr/local/spark/getGpusResources.sh{code} > I have a Apache Spark 3.0 cluster consisting of machines with multiple > nvidia-gpus and I connect my jupyter notebook to the cluster using pyspark, > however it results in the following error: > {code:java} > Py4JJavaError: An error occurred while calling > None.org.apache.spark.api.java.JavaSparkContext. 
> : org.apache.spark.SparkException: You must specify an amount for gpu > at > org.apache.spark.resource.ResourceUtils$.$anonfun$parseResourceRequest$1(ResourceUtils.scala:142) > at scala.collection.immutable.Map$Map1.getOrElse(Map.scala:119) > at > org.apache.spark.resource.ResourceUtils$.parseResourceRequest(ResourceUtils.scala:142) > at > org.apache.spark.resource.ResourceUtils$.$anonfun$parseAllResourceRequests$1(ResourceUtils.scala:159) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:75) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.resource.ResourceUtils$.parseAllResourceRequests(ResourceUtils.scala:159) > at > org.apache.spark.SparkContext$.checkResourcesPerTask$1(SparkContext.scala:2773) > at > org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2884) > at org.apache.spark.SparkContext.(SparkContext.scala:528) > at > org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:58) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247) > at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) > at py4j.Gateway.invoke(Gateway.java:238) > at > py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80) > at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69) > at py4j.GatewayConnection.run(GatewayConnection.java:238) > at 
java.lang.Thread.run(Thread.java:748) > {code} > After this, I then tried adding another line to the conf per the instructions > which results in no errors, however when I log in to the Web UI at > localhost:8080, under Running Applications, the state remains at waiting. > {code:java} > spark.task.resource.gpu.amount 2 > spark.executor.resource.gpu.discoveryScript > ./usr/local/spark/getGpusResources.sh > spark.executor.resource.gpu.amount 1 > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32415) Enable JSON tests for the allowNonNumericNumbers option
[ https://issues.apache.org/jira/browse/SPARK-32415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32415: Assignee: Apache Spark > Enable JSON tests for the allowNonNumericNumbers option > --- > > Key: SPARK-32415 > URL: https://issues.apache.org/jira/browse/SPARK-32415 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > > Currently, 2 tests in JsonParsingOptionsSuite for the allowNonNumericNumbers > option are ignored. The tests can be enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32415) Enable JSON tests for the allowNonNumericNumbers option
[ https://issues.apache.org/jira/browse/SPARK-32415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32415: Assignee: (was: Apache Spark) > Enable JSON tests for the allowNonNumericNumbers option > --- > > Key: SPARK-32415 > URL: https://issues.apache.org/jira/browse/SPARK-32415 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > Currently, 2 tests in JsonParsingOptionsSuite for the allowNonNumericNumbers > option are ignored. The tests can be enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32415) Enable JSON tests for the allowNonNumericNumbers option
[ https://issues.apache.org/jira/browse/SPARK-32415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163720#comment-17163720 ] Apache Spark commented on SPARK-32415: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/29207 > Enable JSON tests for the allowNonNumericNumbers option > --- > > Key: SPARK-32415 > URL: https://issues.apache.org/jira/browse/SPARK-32415 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > Currently, 2 tests in JsonParsingOptionsSuite for the allowNonNumericNumbers > option are ignored. The tests can be enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32415) Enable JSON tests for the allowNonNumericNumbers option
[ https://issues.apache.org/jira/browse/SPARK-32415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-32415: --- Summary: Enable JSON tests for the allowNonNumericNumbers option (was: Enable JSON tests from the allowNonNumericNumbers option) > Enable JSON tests for the allowNonNumericNumbers option > --- > > Key: SPARK-32415 > URL: https://issues.apache.org/jira/browse/SPARK-32415 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > Currently, 2 tests in JsonParsingOptionsSuite for the allowNonNumericNumbers > option are ignored. The tests can be enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32415) Enable JSON tests from the allowNonNumericNumbers option
Maxim Gekk created SPARK-32415: -- Summary: Enable JSON tests from the allowNonNumericNumbers option Key: SPARK-32415 URL: https://issues.apache.org/jira/browse/SPARK-32415 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk Currently, 2 tests in JsonParsingOptionsSuite for the allowNonNumericNumbers option are ignored. The tests can be enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
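allowNonNumericNumbers controls whether the JSON parser accepts tokens such as NaN and Infinity that strict JSON forbids. Python's json module exposes the same knob via parse_constant, which makes a convenient analogue for what the ignored tests cover; this illustrates the option's semantics, not Spark's JSON datasource:

```python
import json
import math

# Permissive parsing (the analogue of allowNonNumericNumbers=true): NaN and
# Infinity tokens are accepted and mapped to float values.
permissive = json.loads('{"value": NaN, "inf": Infinity}')
assert math.isnan(permissive["value"]) and math.isinf(permissive["inf"])

def reject(token):
    # Strict parsing (the analogue of allowNonNumericNumbers=false): refuse
    # any non-numeric number token the parser encounters.
    raise ValueError("non-numeric number not allowed: %s" % token)

try:
    json.loads('{"value": NaN}', parse_constant=reject)
except ValueError as exc:
    print("strict mode rejected:", exc)
```

The ignored JsonParsingOptionsSuite tests exercise exactly this permissive/strict split for the Spark option.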
[jira] [Updated] (SPARK-32414) pyspark crashes in cluster mode with kafka structured streaming
[ https://issues.apache.org/jira/browse/SPARK-32414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cyrille cazenave updated SPARK-32414: - Attachment: spark.py > pyspark crashes in cluster mode with kafka structured streaming > --- > > Key: SPARK-32414 > URL: https://issues.apache.org/jira/browse/SPARK-32414 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.0.0 > Environment: * spark version 3.0.0 from mac brew > * kubernetes Kind 18+ > * kafka cluster: strimzi/kafka:0.18.0-kafka-2.5.0 > * kafka package: org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0 >Reporter: cyrille cazenave >Priority: Major > Attachments: fulllogs.txt, spark.py > > > Hello, > {{I have been trying to run a pyspark script on Spark on Kubernetes and I > have this error that crashed the application:}} > {{java.lang.invoke.SerializedLambda to field > org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance > of org.apache.spark.rdd.MapPartitionsRDD)}} > > I followed those steps: > * for spark on kubernetes: > [https://spark.apache.org/docs/latest/running-on-kubernetes.html] (that > include building the image using docker-image-tool.sh on mac with -p flag) > * Tried to use the image by the dev on > GoogleCloudPlatform/spark-on-k8s-operator > (gcr.io/spark-operator/spark-py:v3.0.0) and have the same issue > * for kafka streaming: > [https://spark.apache.org/docs/3.0.0/structured-streaming-kafka-integration.html#deploying] > * {{When running the script manually in a jupyter notebook > (jupyter/pyspark-notebook:latest, version 3.0.0) in local mode (with > PYSPARK_SUBMIT_ARGS=--packages > org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0 pyspark-shell) it ran > without issue}} > * the command ran from the laptop is: > spark-submit --master > k8s://[https://127.0.0.1:53979|https://127.0.0.1:53979/] --name spark-pi > --deploy-mode cluster --packages > org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0 --conf > 
spark.kubernetes.container.image=fifoosab/pytest:3.0.0.dev0 --conf > spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf > spark.kubernetes.executor.request.cores=1 --conf > spark.kubernetes.driver.request.cores=1 --conf > spark.kubernetes.container.image.pullPolicy=Always local:///usr/bin/spark.py > > {{full logs on the error in the attachments}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32414) pyspark crashes in cluster mode with kafka structured streaming
[ https://issues.apache.org/jira/browse/SPARK-32414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cyrille cazenave updated SPARK-32414: - Description: Hello, {{I have been trying to run a pyspark script on Spark on Kubernetes and I have this error that crashed the application:}} {{java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD)}} I followed those steps: * for spark on kubernetes: [https://spark.apache.org/docs/latest/running-on-kubernetes.html] (that include building the image using docker-image-tool.sh on mac with -p flag) * Tried to use the image by the dev on GoogleCloudPlatform/spark-on-k8s-operator (gcr.io/spark-operator/spark-py:v3.0.0) and have the same issue * for kafka streaming: [https://spark.apache.org/docs/3.0.0/structured-streaming-kafka-integration.html#deploying] * {{When running the script manually in a jupyter notebook (jupyter/pyspark-notebook:latest, version 3.0.0) in local mode (with PYSPARK_SUBMIT_ARGS=--packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0 pyspark-shell) it ran without issue}} * the command ran from the laptop is: spark-submit --master k8s://[https://127.0.0.1:53979|https://127.0.0.1:53979/] --name spark-pi --deploy-mode cluster --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0 --conf spark.kubernetes.container.image=fifoosab/pytest:3.0.0.dev0 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.executor.request.cores=1 --conf spark.kubernetes.driver.request.cores=1 --conf spark.kubernetes.container.image.pullPolicy=Always local:///usr/bin/spark.py {{full logs on the error in the attachements}} was: Hello, {{I have been trying to run a pyspark script on Spark on Kubernetes and I have this error that crashed the application:}} {{java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in 
[jira] [Updated] (SPARK-32414) pyspark crashes in cluster mode with kafka structured streaming
[ https://issues.apache.org/jira/browse/SPARK-32414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cyrille cazenave updated SPARK-32414: - Attachment: fulllogs.txt > pyspark crashes in cluster mode with kafka structured streaming > --- > > Key: SPARK-32414 > URL: https://issues.apache.org/jira/browse/SPARK-32414 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.0.0 > Environment: * spark version 3.0.0 from mac brew > * kubernetes Kind 18+ > * kafka cluster: strimzi/kafka:0.18.0-kafka-2.5.0 > * kafka package: org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0 >Reporter: cyrille cazenave >Priority: Major > Attachments: fulllogs.txt > > > Hello, > {{I have been trying to run a pyspark script on Spark on Kubernetes and I > have this error that crashed the application:}} > {{java.lang.invoke.SerializedLambda to field > org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance > of org.apache.spark.rdd.MapPartitionsRDD)}} > > I followed those steps: > * for spark on kubernetes: > [https://spark.apache.org/docs/latest/running-on-kubernetes.html] (that > include building the image using docker-image-tool.sh on mac with -p flag) > * Tried to use the image by the dev on > GoogleCloudPlatform/spark-on-k8s-operator > (gcr.io/spark-operator/spark-py:v3.0.0) and have the same issue > * for kafka streaming: > [https://spark.apache.org/docs/3.0.0/structured-streaming-kafka-integration.html#deploying] > * {{When running the script manually in a jupyter notebook > (jupyter/pyspark-notebook:latest, version 3.0.0) in local mode (with > PYSPARK_SUBMIT_ARGS=--packages > org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0 pyspark-shell) it ran > without issue}} > * the command ran from the laptop is: > spark-submit --master k8s://https://127.0.0.1:53979 --name spark-pi > --deploy-mode cluster --packages > org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0 --conf > 
spark.kubernetes.container.image=fifoosab/pytest:3.0.0.dev0 --conf > spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf > spark.kubernetes.executor.request.cores=1 --conf > spark.kubernetes.driver.request.cores=1 --conf > spark.kubernetes.container.image.pullPolicy=Always local:///usr/bin/spark.py > > {{more logs on the error:}} > \{{}} > {{20/07/23 14:26:08 INFO TaskSetManager: Lost task 1.3 in stage 1.0 (TID 11) > on 10.244.3.7, executor 1: java.lang.ClassCastException (cannot assign > instance of java.lang.invoke.SerializedLambda to field > org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance > of org.apache.spark.rdd.MapPartitionsRDD) [duplicate 11]}} > {{20/07/23 14:26:08 ERROR TaskSetManager: Task 1 in stage 1.0 failed 4 > times; aborting job}} > {{20/07/23 14:26:08 INFO TaskSchedulerImpl: Cancelling stage 1}} > {{20/07/23 14:26:08 INFO TaskSchedulerImpl: Killing all running tasks in > stage 1: Stage cancelled}} > {{20/07/23 14:26:08 INFO TaskSchedulerImpl: Stage 1 was cancelled}} > {{20/07/23 14:26:08 INFO TaskSetManager: Lost task 3.3 in stage 1.0 (TID 13) > on 10.244.3.7, executor 1: java.lang.ClassCastException (cannot assign > instance of java.lang.invoke.SerializedLambda to field > org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance > of org.apache.spark.rdd.MapPartitionsRDD) [duplicate 12]}} > {{20/07/23 14:26:08 INFO DAGScheduler: ResultStage 1 (start at > NativeMethodAccessorImpl.java:0) failed in 20.352 s due to Job aborted due to > stage failure: Task 1 in stage 1.0 failed 4 times, most recent failure: Lost > task 1.3 in stage 1.0 (TID 11, 10.244.3.7, executor 1): > java.lang.ClassCastException: cannot assign instance of > java.lang.invoke.SerializedLambda to field > org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance > of org.apache.spark.rdd.MapPartitionsRDD}} > \{{ at > java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)}} 
> \{{ at > java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)}} > \{{ at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2350)}} > \{{ at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268)}} > \{{ at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126)}} > \{{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)}} > \{{ at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2344)}} > \{{ at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268)}} > \{{ at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126)}} > \{{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)}} > \{{
[jira] [Updated] (SPARK-32414) pyspark crashes in cluster mode with kafka structured streaming
[ https://issues.apache.org/jira/browse/SPARK-32414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cyrille cazenave updated SPARK-32414: - Description: Hello, {{I have been trying to run a pyspark script on Spark on Kubernetes and I have this error that crashed the application:}} {{java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD)}} I followed those steps: * for spark on kubernetes: [https://spark.apache.org/docs/latest/running-on-kubernetes.html] (that include building the image using docker-image-tool.sh on mac with -p flag) * Tried to use the image by the dev on GoogleCloudPlatform/spark-on-k8s-operator (gcr.io/spark-operator/spark-py:v3.0.0) and have the same issue * for kafka streaming: [https://spark.apache.org/docs/3.0.0/structured-streaming-kafka-integration.html#deploying] * {{When running the script manually in a jupyter notebook (jupyter/pyspark-notebook:latest, version 3.0.0) in local mode (with PYSPARK_SUBMIT_ARGS=--packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0 pyspark-shell) it ran without issue}} * the command ran from the laptop is: spark-submit --master k8s://https://127.0.0.1:53979 --name spark-pi --deploy-mode cluster --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0 --conf spark.kubernetes.container.image=fifoosab/pytest:3.0.0.dev0 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.executor.request.cores=1 --conf spark.kubernetes.driver.request.cores=1 --conf spark.kubernetes.container.image.pullPolicy=Always local:///usr/bin/spark.py {{more logs on the error:}} \{{}} {{20/07/23 14:26:08 INFO TaskSetManager: Lost task 1.3 in stage 1.0 (TID 11) on 10.244.3.7, executor 1: java.lang.ClassCastException (cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of 
org.apache.spark.rdd.MapPartitionsRDD) [duplicate 11]}} {{20/07/23 14:26:08 ERROR TaskSetManager: Task 1 in stage 1.0 failed 4 times; aborting job}} {{20/07/23 14:26:08 INFO TaskSchedulerImpl: Cancelling stage 1}} {{20/07/23 14:26:08 INFO TaskSchedulerImpl: Killing all running tasks in stage 1: Stage cancelled}} {{20/07/23 14:26:08 INFO TaskSchedulerImpl: Stage 1 was cancelled}} {{20/07/23 14:26:08 INFO TaskSetManager: Lost task 3.3 in stage 1.0 (TID 13) on 10.244.3.7, executor 1: java.lang.ClassCastException (cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD) [duplicate 12]}} {{20/07/23 14:26:08 INFO DAGScheduler: ResultStage 1 (start at NativeMethodAccessorImpl.java:0) failed in 20.352 s due to Job aborted due to stage failure: Task 1 in stage 1.0 failed 4 times, most recent failure: Lost task 1.3 in stage 1.0 (TID 11, 10.244.3.7, executor 1): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD}} \{{ at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)}} \{{ at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)}} \{{ at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2350)}} \{{ at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268)}} \{{ at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126)}} \{{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)}} \{{ at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2344)}} \{{ at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268)}} \{{ at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126)}} \{{ at 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)}} \{{ at java.io.ObjectInputStream.readObject(ObjectInputStream.java:465)}} \{{ at java.io.ObjectInputStream.readObject(ObjectInputStream.java:423)}} \{{ at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:488)}} \{{ at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)}} \{{ at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)}} \{{ at java.lang.reflect.Method.invoke(Method.java:498)}} \{{ at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1184)}} \{{ at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2235)}} \{{ at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126)}} \{{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)}} \{{ at
[jira] [Commented] (SPARK-30648) Support filters pushdown in JSON datasource
[ https://issues.apache.org/jira/browse/SPARK-30648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163690#comment-17163690 ] Apache Spark commented on SPARK-30648: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/29206 > Support filters pushdown in JSON datasource > --- > > Key: SPARK-30648 > URL: https://issues.apache.org/jira/browse/SPARK-30648 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.1.0 > > > * Implement the `SupportsPushDownFilters` interface in `JsonScanBuilder` > * Apply filters in JacksonParser > * Change API JacksonParser - return Option[InternalRow] from > `convertObject()` for root JSON fields. > * Update JSONBenchmark -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
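The pushdown described in the issue above can be sketched conceptually in plain Python (this is not Spark's Jackson-based parser; the point is only what moving the predicate into the parse loop buys):

```python
import json

# Conceptual sketch of filter pushdown in a JSON reader (SPARK-30648):
# evaluating the pushed predicate while each record is parsed, so
# non-matching rows are dropped immediately instead of being fully
# materialized and filtered in a later stage.
lines = ['{"id": 1, "v": 10}', '{"id": 2, "v": 3}', '{"id": 3, "v": 42}']

def predicate(row):
    # Stand-in for a pushed-down filter such as df.filter("v > 5").
    return row["v"] > 5

# Without pushdown: parse every line, then filter the parsed rows.
parsed_all = [json.loads(line) for line in lines]
no_pushdown = [row for row in parsed_all if predicate(row)]

# With pushdown: the predicate runs inside the parse loop, so rejected
# rows never leave it.
with_pushdown = []
for line in lines:
    row = json.loads(line)
    if predicate(row):
        with_pushdown.append(row)

assert no_pushdown == with_pushdown
print([row["id"] for row in with_pushdown])  # [1, 3]
```

Both paths return the same rows; the pushdown path simply avoids keeping non-matching rows around, which is the motivation for implementing `SupportsPushDownFilters` in `JsonScanBuilder` and applying filters inside `JacksonParser`.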
[jira] [Updated] (SPARK-32414) pyspark crashes in cluster mode with kafka structured streaming
[ https://issues.apache.org/jira/browse/SPARK-32414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cyrille cazenave updated SPARK-32414: - Description: Hello, {{I have been trying to run a pyspark script on Spark on Kubernetes and I have this error that crashed the application:}} {{java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD)}} I followed those steps: * for spark on kubernetes: [https://spark.apache.org/docs/latest/running-on-kubernetes.html] (that include building the image using docker-image-tool.sh on mac with -p flag) * Tried to use the image by the dev on GoogleCloudPlatform/spark-on-k8s-operator (gcr.io/spark-operator/spark-py:v3.0.0) and have the same issue * for kafka streaming: [https://spark.apache.org/docs/3.0.0/structured-streaming-kafka-integration.html#deploying] * {{When running the script manually in a jupyter notebook (jupyter/pyspark-notebook:latest, version 3.0.0) in local mode (with PYSPARK_SUBMIT_ARGS=--packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0 pyspark-shell) it ran without issue}} * the command ran from the laptop is: {{spark-submit --master k8s://[https://127.0.0.1:53979|https://127.0.0.1:53979/] --name spark-pi --deploy-mode cluster --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0 --conf spark.kubernetes.container.image=fifoosab/pytest:latest --conf spark.jars.ivy=/tmp --conf spark.kubernetes.driver.volumes.emptyDir.ivy.mount.path=/opt/spark/ivy --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image.pullPolicy=Always local:///usr/bin/spark.py}} {{more logs on the error:}} \{{}} {{20/07/23 14:26:08 INFO TaskSetManager: Lost task 1.3 in stage 1.0 (TID 11) on 10.244.3.7, executor 1: java.lang.ClassCastException (cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type 
scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD) [duplicate 11]}} {{20/07/23 14:26:08 ERROR TaskSetManager: Task 1 in stage 1.0 failed 4 times; aborting job}} {{20/07/23 14:26:08 INFO TaskSchedulerImpl: Cancelling stage 1}} {{20/07/23 14:26:08 INFO TaskSchedulerImpl: Killing all running tasks in stage 1: Stage cancelled}} {{20/07/23 14:26:08 INFO TaskSchedulerImpl: Stage 1 was cancelled}} {{20/07/23 14:26:08 INFO TaskSetManager: Lost task 3.3 in stage 1.0 (TID 13) on 10.244.3.7, executor 1: java.lang.ClassCastException (cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD) [duplicate 12]}} {{20/07/23 14:26:08 INFO DAGScheduler: ResultStage 1 (start at NativeMethodAccessorImpl.java:0) failed in 20.352 s due to Job aborted due to stage failure: Task 1 in stage 1.0 failed 4 times, most recent failure: Lost task 1.3 in stage 1.0 (TID 11, 10.244.3.7, executor 1): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD}} \{{ at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)}} \{{ at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)}} \{{ at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2350)}} \{{ at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268)}} \{{ at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126)}} \{{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)}} \{{ at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2344)}} \{{ at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268)}} \{{ at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126)}} \{{ at 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)}} \{{ at java.io.ObjectInputStream.readObject(ObjectInputStream.java:465)}} \{{ at java.io.ObjectInputStream.readObject(ObjectInputStream.java:423)}} \{{ at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:488)}} \{{ at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)}} \{{ at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)}} \{{ at java.lang.reflect.Method.invoke(Method.java:498)}} \{{ at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1184)}} \{{ at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2235)}} \{{ at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126)}} \{{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)}} \{{ at
[jira] [Created] (SPARK-32414) pyspark crashes in cluster mode with kafka structured streaming
cyrille cazenave created SPARK-32414: Summary: pyspark crashes in cluster mode with kafka structured streaming Key: SPARK-32414 URL: https://issues.apache.org/jira/browse/SPARK-32414 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 3.0.0 Environment: * spark version 3.0.0 from mac brew * kubernetes Kind 18+ * kafka cluster: strimzi/kafka:0.18.0-kafka-2.5.0 * kafka package: org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0 Reporter: cyrille cazenave {{Hello, }} {{I have been trying to run a pyspark script on Spark on Kubernetes and I have this error that crashed the application:}} {{java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD)}} I followed those steps: * for spark on kubernetes: [https://spark.apache.org/docs/latest/running-on-kubernetes.html] (that include building the image using docker-image-tool.sh on mac with -p flag) * Tried to use the image by the dev on GoogleCloudPlatform/spark-on-k8s-operator (gcr.io/spark-operator/spark-py:v3.0.0) and have the same issue * for kafka streaming: [https://spark.apache.org/docs/3.0.0/structured-streaming-kafka-integration.html#deploying] * {{When running the script manually in a jupyter notebook (jupyter/pyspark-notebook:latest, version 3.0.0) in local mode (with PYSPARK_SUBMIT_ARGS=--packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0 pyspark-shell) it ran without issue}} * the command ran from the laptop is: {{spark-submit --master k8s://https://127.0.0.1:53979 --name spark-pi --deploy-mode cluster --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0 --conf spark.kubernetes.container.image=fifoosab/pytest:latest --conf spark.jars.ivy=/tmp --conf spark.kubernetes.driver.volumes.emptyDir.ivy.mount.path=/opt/spark/ivy --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image.pullPolicy=Always 
local:///usr/bin/spark.py}} {{more logs on the error:}} {{}} {{20/07/23 14:26:08 INFO TaskSetManager: Lost task 1.3 in stage 1.0 (TID 11) on 10.244.3.7, executor 1: java.lang.ClassCastException (cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD) [duplicate 11]}} {{20/07/23 14:26:08 ERROR TaskSetManager: Task 1 in stage 1.0 failed 4 times; aborting job}} {{20/07/23 14:26:08 INFO TaskSchedulerImpl: Cancelling stage 1}} {{20/07/23 14:26:08 INFO TaskSchedulerImpl: Killing all running tasks in stage 1: Stage cancelled}} {{20/07/23 14:26:08 INFO TaskSchedulerImpl: Stage 1 was cancelled}} {{20/07/23 14:26:08 INFO TaskSetManager: Lost task 3.3 in stage 1.0 (TID 13) on 10.244.3.7, executor 1: java.lang.ClassCastException (cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD) [duplicate 12]}} {{20/07/23 14:26:08 INFO DAGScheduler: ResultStage 1 (start at NativeMethodAccessorImpl.java:0) failed in 20.352 s due to Job aborted due to stage failure: Task 1 in stage 1.0 failed 4 times, most recent failure: Lost task 1.3 in stage 1.0 (TID 11, 10.244.3.7, executor 1): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD}} {{ at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)}} {{ at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)}} {{ at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2350)}} {{ at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268)}} {{ at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126)}} {{ at 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)}} {{ at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2344)}} {{ at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268)}} {{ at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126)}} {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)}} {{ at java.io.ObjectInputStream.readObject(ObjectInputStream.java:465)}} {{ at java.io.ObjectInputStream.readObject(ObjectInputStream.java:423)}} {{ at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:488)}} {{ at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)}} {{ at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)}} {{ at java.lang.reflect.Method.invoke(Method.java:498)}}
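The `ClassCastException` on `java.lang.invoke.SerializedLambda` in the report above is a classic symptom of the driver and executors running different Spark/Scala builds (e.g. a local 3.0.0 client submitting against an executor image built from another version). A hypothetical client-side sanity check, sketched below, compares the local Spark version against the executor image tag before submitting; `versions_compatible` is not a Spark API, just an illustration:

```python
# Hypothetical helper: check that the container image tag used for
# spark.kubernetes.container.image matches the client's Spark
# major.minor version, since a mismatch commonly produces the
# SerializedLambda ClassCastException seen in this issue.
def versions_compatible(client_version: str, image_ref: str) -> bool:
    """True when the image tag starts with the client's major.minor version.

    image_ref is a container reference like 'fifoosab/pytest:3.0.0.dev0'.
    """
    major_minor = ".".join(client_version.split(".")[:2])
    tag = image_ref.rsplit(":", 1)[-1].lstrip("v")  # tolerate 'v3.0.0'-style tags
    return tag.startswith(major_minor)

print(versions_compatible("3.0.0", "fifoosab/pytest:3.0.0.dev0"))          # True
print(versions_compatible("3.0.0", "gcr.io/spark-operator/spark-py:v3.0.0"))  # True
print(versions_compatible("2.4.6", "fifoosab/pytest:3.0.0.dev0"))          # False
```

Note the check is heuristic: image tags like `latest` (used in the updated command above) carry no version information at all, which is one reason pinning explicit version tags is generally preferred for `spark.kubernetes.container.image`.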
[jira] [Resolved] (SPARK-32386) Fix temp view leaking in Structured Streaming tests
[ https://issues.apache.org/jira/browse/SPARK-32386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuanjian Li resolved SPARK-32386. - Resolution: Won't Fix In various suites, we need this temp view for checking results, e.g KafkaDontFailOnDataLossSuite. > Fix temp view leaking in Structured Streaming tests > --- > > Key: SPARK-32386 > URL: https://issues.apache.org/jira/browse/SPARK-32386 > Project: Spark > Issue Type: Bug > Components: Structured Streaming, Tests >Affects Versions: 3.0.0 >Reporter: Yuanjian Li >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-32372) "Resolved attribute(s) XXX missing" after dudup conflict references
[ https://issues.apache.org/jira/browse/SPARK-32372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-32372. - Fix Version/s: 3.1.0 3.0.1 Assignee: wuyi Resolution: Fixed > "Resolved attribute(s) XXX missing" after dudup conflict references > --- > > Key: SPARK-32372 > URL: https://issues.apache.org/jira/browse/SPARK-32372 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.2, 2.3.4, 2.4.6, 3.0.0 >Reporter: wuyi >Assignee: wuyi >Priority: Blocker > Fix For: 3.0.1, 3.1.0 > > > {code:java} > // case class Person(id: Int, name: String, age: Int) > sql("SELECT name, avg(age) as avg_age FROM person GROUP BY > name").createOrReplaceTempView("person_a") > sql("SELECT p1.name, p2.avg_age FROM person p1 JOIN person_a p2 ON p1.name = > p2.name").createOrReplaceTempView("person_b") > sql("SELECT * FROM person_a UNION SELECT * FROM person_b") > .createOrReplaceTempView("person_c") > sql("SELECT p1.name, p2.avg_age FROM person_c p1 JOIN person_c p2 ON p1.name > = p2.name").show > {code} > error: > {code:java} > [info] Failed to analyze query: org.apache.spark.sql.AnalysisException: > Resolved attribute(s) avg_age#235 missing from name#233,avg_age#231 in > operator !Project [name#233, avg_age#235]. Attribute(s) with the same name > appear in the operation: avg_age. Please check if the right attribute(s) are > used.;; > ...{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32374) Disallow setting properties when creating temporary views
[ https://issues.apache.org/jira/browse/SPARK-32374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-32374: --- Assignee: Terry Kim (was: Apache Spark) > Disallow setting properties when creating temporary views > - > > Key: SPARK-32374 > URL: https://issues.apache.org/jira/browse/SPARK-32374 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Major > Fix For: 3.1.0 > > > Currently, you can specify properties when creating a temporary view. > However, they are not used and SHOW TBLPROPERTIES always returns an empty > result on temporary views. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
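The behavior change described above — properties passed to CREATE TEMPORARY VIEW were accepted but silently ignored, so SHOW TBLPROPERTIES on a temporary view always returned an empty result — amounts to rejecting the properties up front. A hedged sketch of that check (`validate_create_view` is a hypothetical stand-in, not Spark's analyzer code):

```python
# Hypothetical stand-in for the analyzer-side check SPARK-32374 adds:
# temporary views must not carry TBLPROPERTIES, because they would be
# silently discarded anyway.
def validate_create_view(is_temporary: bool, properties: dict) -> None:
    if is_temporary and properties:
        raise ValueError("CREATE TEMPORARY VIEW does not accept TBLPROPERTIES")

# Persistent views may carry properties; temporary ones may not.
validate_create_view(is_temporary=False, properties={"comment": "ok"})
try:
    validate_create_view(is_temporary=True, properties={"comment": "ok"})
except ValueError as err:
    print(err)  # CREATE TEMPORARY VIEW does not accept TBLPROPERTIES
```

Failing fast here surfaces the unsupported option at creation time instead of leaving users to discover the empty SHOW TBLPROPERTIES output later.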
[jira] [Updated] (SPARK-32372) "Resolved attribute(s) XXX missing" after dudup conflict references
[ https://issues.apache.org/jira/browse/SPARK-32372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-32372: Priority: Major (was: Critical) > "Resolved attribute(s) XXX missing" after dudup conflict references > --- > > Key: SPARK-32372 > URL: https://issues.apache.org/jira/browse/SPARK-32372 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.2, 2.3.4, 2.4.6, 3.0.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > Fix For: 3.0.1, 3.1.0 > > > {code:java} > // case class Person(id: Int, name: String, age: Int) > sql("SELECT name, avg(age) as avg_age FROM person GROUP BY > name").createOrReplaceTempView("person_a") > sql("SELECT p1.name, p2.avg_age FROM person p1 JOIN person_a p2 ON p1.name = > p2.name").createOrReplaceTempView("person_b") > sql("SELECT * FROM person_a UNION SELECT * FROM person_b") > .createOrReplaceTempView("person_c") > sql("SELECT p1.name, p2.avg_age FROM person_c p1 JOIN person_c p2 ON p1.name > = p2.name").show > {code} > error: > {code:java} > [info] Failed to analyze query: org.apache.spark.sql.AnalysisException: > Resolved attribute(s) avg_age#235 missing from name#233,avg_age#231 in > operator !Project [name#233, avg_age#235]. Attribute(s) with the same name > appear in the operation: avg_age. Please check if the right attribute(s) are > used.;; > ...{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32372) "Resolved attribute(s) XXX missing" after dudup conflict references
[ https://issues.apache.org/jira/browse/SPARK-32372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-32372: Priority: Critical (was: Blocker) > "Resolved attribute(s) XXX missing" after dudup conflict references > --- > > Key: SPARK-32372 > URL: https://issues.apache.org/jira/browse/SPARK-32372 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.2, 2.3.4, 2.4.6, 3.0.0 >Reporter: wuyi >Assignee: wuyi >Priority: Critical > Fix For: 3.0.1, 3.1.0 > > > {code:java} > // case class Person(id: Int, name: String, age: Int) > sql("SELECT name, avg(age) as avg_age FROM person GROUP BY > name").createOrReplaceTempView("person_a") > sql("SELECT p1.name, p2.avg_age FROM person p1 JOIN person_a p2 ON p1.name = > p2.name").createOrReplaceTempView("person_b") > sql("SELECT * FROM person_a UNION SELECT * FROM person_b") > .createOrReplaceTempView("person_c") > sql("SELECT p1.name, p2.avg_age FROM person_c p1 JOIN person_c p2 ON p1.name > = p2.name").show > {code} > error: > {code:java} > [info] Failed to analyze query: org.apache.spark.sql.AnalysisException: > Resolved attribute(s) avg_age#235 missing from name#233,avg_age#231 in > operator !Project [name#233, avg_age#235]. Attribute(s) with the same name > appear in the operation: avg_age. Please check if the right attribute(s) are > used.;; > ...{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-32374) Disallow setting properties when creating temporary views
[ https://issues.apache.org/jira/browse/SPARK-32374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-32374. - Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 29167 [https://github.com/apache/spark/pull/29167] > Disallow setting properties when creating temporary views > - > > Key: SPARK-32374 > URL: https://issues.apache.org/jira/browse/SPARK-32374 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Terry Kim >Assignee: Apache Spark >Priority: Major > Fix For: 3.1.0 > > > Currently, you can specify properties when creating a temporary view. > However, they are not used and SHOW TBLPROPERTIES always returns an empty > result on temporary views. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32280) AnalysisException thrown when query contains several JOINs
[ https://issues.apache.org/jira/browse/SPARK-32280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-32280: --- Assignee: wuyi > AnalysisException thrown when query contains several JOINs > -- > > Key: SPARK-32280 > URL: https://issues.apache.org/jira/browse/SPARK-32280 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.5 >Reporter: David Lindelöf >Assignee: wuyi >Priority: Major > Fix For: 3.0.1, 3.1.0 > > > I've come across a curious {{AnalysisException}} thrown in one of my SQL > queries, even though the SQL appears legitimate. I was able to reduce it to > this example: > {code:python} > from pyspark.sql import SparkSession > spark = SparkSession.builder.getOrCreate() > spark.sql('SELECT 1 AS id').createOrReplaceTempView('A') > spark.sql(''' > SELECT id, > 'foo' AS kind > FROM A''').createOrReplaceTempView('B') > spark.sql(''' > SELECT l.id > FROM B AS l > JOIN B AS r > ON l.kind = r.kind''').createOrReplaceTempView('C') > spark.sql(''' > SELECT 0 > FROM ( >SELECT * >FROM B >JOIN C >USING (id)) > JOIN ( >SELECT * >FROM B >JOIN C >USING (id)) > USING (id)''') > {code} > Running this yields the following error: > {code} > py4j.protocol.Py4JJavaError: An error occurred while calling o20.sql. > : org.apache.spark.sql.AnalysisException: Resolved attribute(s) kind#11 > missing from id#10,kind#2,id#7,kind#5 in operator !Join Inner, (kind#11 = > kind#5). Attribute(s) with the same name appear in the operation: kind. 
> Please check if the right attribute(s) are used.;; > Project [0 AS 0#15] > +- Project [id#0, kind#2, kind#11] >+- Join Inner, (id#0 = id#14) > :- SubqueryAlias `__auto_generated_subquery_name` > : +- Project [id#0, kind#2] > : +- Project [id#0, kind#2] > :+- Join Inner, (id#0 = id#9) > : :- SubqueryAlias `b` > : : +- Project [id#0, foo AS kind#2] > : : +- SubqueryAlias `a` > : :+- Project [1 AS id#0] > : : +- OneRowRelation > : +- SubqueryAlias `c` > : +- Project [id#9] > : +- Join Inner, (kind#2 = kind#5) > ::- SubqueryAlias `l` > :: +- SubqueryAlias `b` > :: +- Project [id#9, foo AS kind#2] > ::+- SubqueryAlias `a` > :: +- Project [1 AS id#9] > :: +- OneRowRelation > :+- SubqueryAlias `r` > : +- SubqueryAlias `b` > : +- Project [id#7, foo AS kind#5] > : +- SubqueryAlias `a` > :+- Project [1 AS id#7] > : +- OneRowRelation > +- SubqueryAlias `__auto_generated_subquery_name` > +- Project [id#14, kind#11] > +- Project [id#14, kind#11] >+- Join Inner, (id#14 = id#10) > :- SubqueryAlias `b` > : +- Project [id#14, foo AS kind#11] > : +- SubqueryAlias `a` > :+- Project [1 AS id#14] > : +- OneRowRelation > +- SubqueryAlias `c` > +- Project [id#10] > +- !Join Inner, (kind#11 = kind#5) >:- SubqueryAlias `l` >: +- SubqueryAlias `b` >: +- Project [id#10, foo AS kind#2] >:+- SubqueryAlias `a` >: +- Project [1 AS id#10] >: +- OneRowRelation >+- SubqueryAlias `r` > +- SubqueryAlias `b` > +- Project [id#7, foo AS kind#5] > +- SubqueryAlias `a` >+- Project [1 AS id#7] > +- OneRowRelation > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:43) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:95) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:369) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:86) > at >
[jira] [Resolved] (SPARK-32280) AnalysisException thrown when query contains several JOINs
[ https://issues.apache.org/jira/browse/SPARK-32280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-32280. - Fix Version/s: 3.1.0 3.0.1 Resolution: Fixed Issue resolved by pull request 29166 [https://github.com/apache/spark/pull/29166] > AnalysisException thrown when query contains several JOINs > -- > > Key: SPARK-32280 > URL: https://issues.apache.org/jira/browse/SPARK-32280 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.5 >Reporter: David Lindelöf >Priority: Major > Fix For: 3.0.1, 3.1.0 > > > I've come across a curious {{AnalysisException}} thrown in one of my SQL > queries, even though the SQL appears legitimate. I was able to reduce it to > this example: > {code:python} > from pyspark.sql import SparkSession > spark = SparkSession.builder.getOrCreate() > spark.sql('SELECT 1 AS id').createOrReplaceTempView('A') > spark.sql(''' > SELECT id, > 'foo' AS kind > FROM A''').createOrReplaceTempView('B') > spark.sql(''' > SELECT l.id > FROM B AS l > JOIN B AS r > ON l.kind = r.kind''').createOrReplaceTempView('C') > spark.sql(''' > SELECT 0 > FROM ( >SELECT * >FROM B >JOIN C >USING (id)) > JOIN ( >SELECT * >FROM B >JOIN C >USING (id)) > USING (id)''') > {code} > Running this yields the following error: > {code} > py4j.protocol.Py4JJavaError: An error occurred while calling o20.sql. > : org.apache.spark.sql.AnalysisException: Resolved attribute(s) kind#11 > missing from id#10,kind#2,id#7,kind#5 in operator !Join Inner, (kind#11 = > kind#5). Attribute(s) with the same name appear in the operation: kind. 
> Please check if the right attribute(s) are used.;; > Project [0 AS 0#15] > +- Project [id#0, kind#2, kind#11] >+- Join Inner, (id#0 = id#14) > :- SubqueryAlias `__auto_generated_subquery_name` > : +- Project [id#0, kind#2] > : +- Project [id#0, kind#2] > :+- Join Inner, (id#0 = id#9) > : :- SubqueryAlias `b` > : : +- Project [id#0, foo AS kind#2] > : : +- SubqueryAlias `a` > : :+- Project [1 AS id#0] > : : +- OneRowRelation > : +- SubqueryAlias `c` > : +- Project [id#9] > : +- Join Inner, (kind#2 = kind#5) > ::- SubqueryAlias `l` > :: +- SubqueryAlias `b` > :: +- Project [id#9, foo AS kind#2] > ::+- SubqueryAlias `a` > :: +- Project [1 AS id#9] > :: +- OneRowRelation > :+- SubqueryAlias `r` > : +- SubqueryAlias `b` > : +- Project [id#7, foo AS kind#5] > : +- SubqueryAlias `a` > :+- Project [1 AS id#7] > : +- OneRowRelation > +- SubqueryAlias `__auto_generated_subquery_name` > +- Project [id#14, kind#11] > +- Project [id#14, kind#11] >+- Join Inner, (id#14 = id#10) > :- SubqueryAlias `b` > : +- Project [id#14, foo AS kind#11] > : +- SubqueryAlias `a` > :+- Project [1 AS id#14] > : +- OneRowRelation > +- SubqueryAlias `c` > +- Project [id#10] > +- !Join Inner, (kind#11 = kind#5) >:- SubqueryAlias `l` >: +- SubqueryAlias `b` >: +- Project [id#10, foo AS kind#2] >:+- SubqueryAlias `a` >: +- Project [1 AS id#10] >: +- OneRowRelation >+- SubqueryAlias `r` > +- SubqueryAlias `b` > +- Project [id#7, foo AS kind#5] > +- SubqueryAlias `a` >+- Project [1 AS id#7] > +- OneRowRelation > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:43) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:95) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:369) > at >
[jira] [Commented] (SPARK-32408) Disable crossPaths only in GitHub Actions to prevent side effects
[ https://issues.apache.org/jira/browse/SPARK-32408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163595#comment-17163595 ] Apache Spark commented on SPARK-32408: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/29205 > Disable crossPaths only in GitHub Actions to prevent side effects > - > > Key: SPARK-32408 > URL: https://issues.apache.org/jira/browse/SPARK-32408 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > After SPARK-32245, crossPaths was disabled in SBT build to run the Junit > tests per project properly. > This is correct change since we're not doing the cross build in SBT. > Now, the intermediate classes are placed without Scala version directory in > SBT build specifically. > It seems causing side effects that are dependent on that path. See, for > example, {{git grep \-r "target/scala\-"}}. > To minimise the side effects, we should disable crossPaths only in GitHub > Actions build for now. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
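For reference, scoping an sbt setting to GitHub Actions can be sketched as below. This is an illustrative build.sbt fragment, not necessarily the exact change in the linked PR; the assumption is a gate on the GITHUB_ACTIONS environment variable, which GitHub Actions sets for its runners.

```scala
// Illustrative build.sbt fragment (assumption: gating on GITHUB_ACTIONS,
// an environment variable GitHub Actions sets to "true" on its runners).
// crossPaths controls whether sbt places class files under a
// Scala-version-specific directory such as target/scala-2.12; disabling it
// drops that directory level, which is what caused the side effects above.
ThisBuild / crossPaths := !sys.env.contains("GITHUB_ACTIONS")
```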
[jira] [Assigned] (SPARK-32408) Disable crossPaths only in GitHub Actions to prevent side effects
[ https://issues.apache.org/jira/browse/SPARK-32408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32408: Assignee: (was: Apache Spark) > Disable crossPaths only in GitHub Actions to prevent side effects > - > > Key: SPARK-32408 > URL: https://issues.apache.org/jira/browse/SPARK-32408 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > After SPARK-32245, crossPaths was disabled in SBT build to run the Junit > tests per project properly. > This is correct change since we're not doing the cross build in SBT. > Now, the intermediate classes are placed without Scala version directory in > SBT build specifically. > It seems causing side effects that are dependent on that path. See, for > example, {{git grep \-r "target/scala\-"}}. > To minimise the side effects, we should disable crossPaths only in GitHub > Actions build for now. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32408) Disable crossPaths only in GitHub Actions to prevent side effects
[ https://issues.apache.org/jira/browse/SPARK-32408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32408: Assignee: Apache Spark > Disable crossPaths only in GitHub Actions to prevent side effects > - > > Key: SPARK-32408 > URL: https://issues.apache.org/jira/browse/SPARK-32408 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > > After SPARK-32245, crossPaths was disabled in SBT build to run the Junit > tests per project properly. > This is correct change since we're not doing the cross build in SBT. > Now, the intermediate classes are placed without Scala version directory in > SBT build specifically. > It seems causing side effects that are dependent on that path. See, for > example, {{git grep \-r "target/scala\-"}}. > To minimise the side effects, we should disable crossPaths only in GitHub > Actions build for now. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32334) Investigate commonizing Columnar and Row data transformations
[ https://issues.apache.org/jira/browse/SPARK-32334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163576#comment-17163576 ] Robert Joseph Evans commented on SPARK-32334: - Row to columnar and columnar to row is mostly figured out. There are some performance improvements that we could probably make in the row-to-columnar transition. The issue is going to be with a columnar-to-columnar transition. Copying data from one columnar format to another in a performant way is a solvable problem, but we might need to special-case a few things or do code generation if we cannot come up with a good common API. The issue is going to be with the desired batch size. Parquet and ORC output a batch size of 4096 rows by default, but each has a separate config. In-memory columnar storage wants 10,000 rows by default, but also has a hard-coded soft limit of 4MB compressed. The Arrow config, though, is for a maximum of 10,000 rows by default. So I am thinking that we want `SparkPlan` to optionally specify a maximum batch size instead of a target size. The row-to-columnar transition would just build up a batch until it hits the target size or the end of the input iterator. The columnar-to-columnar transition is a little more complicated. It would have to copy out a range of rows from one batch into another batch. This could mean, in the worst case, that one batch comes in in Arrow format but we need to copy it to another batch so that we can split it up into the target size. This should cover the use case for basic map-like UDFs. For UDFs like `FlatMapCoGroupsInPandasExec` there is no fixed batch size, and in fact it takes two iterators as input that are co-grouped together. If we wanted an operator like this to do columnar processing we would have to be able to replicate all of that processing, but for columnar Arrow-formatted data. 

This is starting to go beyond what I see as the scope of this JIRA and I would prefer to stick with just `MapInPandasExec`, `MapPartitionsInRWithArrowExec`, and `ArrowEvalPythonExec` for now. In follow on work we can start to look at what it would take to support an ArrowBatchedGroupedIterator, and an ArrowBatchedCoGroupedIterator. > Investigate commonizing Columnar and Row data transformations > -- > > Key: SPARK-32334 > URL: https://issues.apache.org/jira/browse/SPARK-32334 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Priority: Major > > We introduced more Columnar Support with SPARK-27396. > With that we recognized that there is code that is doing very similar > transformations from ColumnarBatch or Arrow into InternalRow and vice versa. > For instance: > [https://github.com/apache/spark/blob/a4382f7fe1c36a51c64f460c6cb91e93470e0825/sql/core/src/main/scala/org/apache/spark/sql/execution/Columnar.scala#L56-L58] > [https://github.com/apache/spark/blob/a4382f7fe1c36a51c64f460c6cb91e93470e0825/sql/core/src/main/scala/org/apache/spark/sql/execution/Columnar.scala#L389] > We should investigate if we can commonize that code. > We are also looking at making the internal caching serialization pluggable to > allow for different cache implementations. > ([https://github.com/apache/spark/pull/29067]). > It was recently brought up that we should investigate if using the data > source v2 api makes sense and is feasible for some of these transformations > to allow it to be easily extended. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
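The row-to-columnar batching idea described in the comment above — accumulate rows until a maximum batch size is reached or the input iterator ends — can be sketched in plain Python. This is an illustrative sketch only, not Spark's implementation; the function name and row representation are made up for the example.

```python
from typing import Any, Iterator, List

def rows_to_batches(rows: Iterator[Any], max_batch_size: int) -> Iterator[List[Any]]:
    """Sketch of a row-to-columnar transition's batching loop: build up a
    batch until it hits the maximum size or the end of the input iterator."""
    batch: List[Any] = []
    for row in rows:
        batch.append(row)
        if len(batch) >= max_batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch at end of input
        yield batch

batches = list(rows_to_batches(iter(range(10)), 4))
# batches -> [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

A columnar-to-columnar transition would follow the same loop but copy ranges of rows between batches instead of appending single rows, which is where the format-specific copying cost comes in.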
[jira] [Updated] (SPARK-32364) Use CaseInsensitiveMap for DataFrameReader/Writer options
[ https://issues.apache.org/jira/browse/SPARK-32364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-32364: -- Reporter: Girish A Pandit (was: Dongjoon Hyun) > Use CaseInsensitiveMap for DataFrameReader/Writer options > - > > Key: SPARK-32364 > URL: https://issues.apache.org/jira/browse/SPARK-32364 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.6, 3.0.0 >Reporter: Girish A Pandit >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.0.1, 3.1.0 > > > When a user have multiple options like path, paTH, and PATH for the same key > path, option/options is non-deterministic because extraOptions is HashMap. > This issue aims to use *CaseInsensitiveMap* instead of *HashMap* to fix this > bug fundamentally. > {code} > spark.read > .option("paTh", "1") > .option("PATH", "2") > .option("Path", "3") > .option("patH", "4") > .load("5") > ... > org.apache.spark.sql.AnalysisException: > Path does not exist: file:/.../1; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
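The behavior the fix above aims for — the last option set wins regardless of key casing — can be illustrated with a minimal Python sketch. This is not Spark's Scala CaseInsensitiveMap (which preserves the original key casing); the class below simply normalizes keys to lower case to show the deterministic-lookup idea.

```python
class CaseInsensitiveDict(dict):
    """Minimal sketch of case-insensitive option storage: keys are
    normalized to lower case so paTh/PATH/Path all address one entry,
    and later writes overwrite earlier ones deterministically."""

    def __setitem__(self, key: str, value: str) -> None:
        super().__setitem__(key.lower(), value)

    def __getitem__(self, key: str) -> str:
        return super().__getitem__(key.lower())

opts = CaseInsensitiveDict()
opts["paTh"] = "1"
opts["PATH"] = "2"
opts["Path"] = "3"
# opts["path"] -> "3": one entry, last write wins, regardless of casing
```

With a plain HashMap, the four casings would be four distinct keys and the option actually read back would depend on hash-iteration order, which is exactly the non-determinism described in the issue.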
[jira] [Assigned] (SPARK-32412) Unify error handling for spark thrift server operations
[ https://issues.apache.org/jira/browse/SPARK-32412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32412: Assignee: Apache Spark > Unify error handling for spark thrift server operations > --- > > Key: SPARK-32412 > URL: https://issues.apache.org/jira/browse/SPARK-32412 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Major > > Log error only once at the server-side for all kinds of operations in both > async and sync mode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32412) Unify error handling for spark thrift server operations
[ https://issues.apache.org/jira/browse/SPARK-32412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32412: Assignee: (was: Apache Spark) > Unify error handling for spark thrift server operations > --- > > Key: SPARK-32412 > URL: https://issues.apache.org/jira/browse/SPARK-32412 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Kent Yao >Priority: Major > > Log error only once at the server-side for all kinds of operations in both > async and sync mode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32412) Unify error handling for spark thrift server operations
[ https://issues.apache.org/jira/browse/SPARK-32412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163539#comment-17163539 ] Apache Spark commented on SPARK-32412: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/29204 > Unify error handling for spark thrift server operations > --- > > Key: SPARK-32412 > URL: https://issues.apache.org/jira/browse/SPARK-32412 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Kent Yao >Priority: Major > > Log error only once at the server-side for all kinds of operations in both > async and sync mode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32413) Guidance for my project
Suat Toksoz created SPARK-32413: --- Summary: Guidance for my project Key: SPARK-32413 URL: https://issues.apache.org/jira/browse/SPARK-32413 Project: Spark Issue Type: Brainstorming Components: PySpark, Spark Core, SparkR Affects Versions: 3.0.0 Reporter: Suat Toksoz Hi, I am planning to read an Elasticsearch index continuously, put that data into a DataFrame, and then group it, search it, and create an alert. I would like to write my code in Python. For this purpose, what should I use: Spark, Jupyter, PySpark...? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32412) Unify error handling for spark thrift server operations
Kent Yao created SPARK-32412: Summary: Unify error handling for spark thrift server operations Key: SPARK-32412 URL: https://issues.apache.org/jira/browse/SPARK-32412 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.0 Reporter: Kent Yao Log error only once at the server-side for all kinds of operations in both async and sync mode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32411) GPU Cluster Fail
Vinh Tran created SPARK-32411: - Summary: GPU Cluster Fail Key: SPARK-32411 URL: https://issues.apache.org/jira/browse/SPARK-32411 Project: Spark Issue Type: Bug Components: PySpark, Web UI Affects Versions: 3.0.0 Environment: I have an Apache Spark 3.0 cluster consisting of machines with multiple nvidia-gpus and I connect my jupyter notebook to the cluster using pyspark. Reporter: Vinh Tran I'm having a difficult time getting a GPU cluster started on Apache Spark 3.0. It was hard to find documentation on this, but I stumbled on an NVIDIA GitHub page for RAPIDS which suggested the following additional edits to spark-defaults.conf: {code:java} spark.task.resource.gpu.amount 0.25 spark.executor.resource.gpu.discoveryScript ./usr/local/spark/getGpusResources.sh{code} I have an Apache Spark 3.0 cluster consisting of machines with multiple nvidia-gpus and I connect my jupyter notebook to the cluster using pyspark; however, it results in the following error: {code:java} Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext. 
: org.apache.spark.SparkException: You must specify an amount for gpu at org.apache.spark.resource.ResourceUtils$.$anonfun$parseResourceRequest$1(ResourceUtils.scala:142) at scala.collection.immutable.Map$Map1.getOrElse(Map.scala:119) at org.apache.spark.resource.ResourceUtils$.parseResourceRequest(ResourceUtils.scala:142) at org.apache.spark.resource.ResourceUtils$.$anonfun$parseAllResourceRequests$1(ResourceUtils.scala:159) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:75) at scala.collection.TraversableLike.map(TraversableLike.scala:238) at scala.collection.TraversableLike.map$(TraversableLike.scala:231) at scala.collection.AbstractTraversable.map(Traversable.scala:108) at org.apache.spark.resource.ResourceUtils$.parseAllResourceRequests(ResourceUtils.scala:159) at org.apache.spark.SparkContext$.checkResourcesPerTask$1(SparkContext.scala:2773) at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2884) at org.apache.spark.SparkContext.(SparkContext.scala:528) at org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:58) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:238) at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80) at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69) at py4j.GatewayConnection.run(GatewayConnection.java:238) at java.lang.Thread.run(Thread.java:748) {code} After this, I then tried adding 
another line to the conf per the instructions, which results in no errors; however, when I log in to the Web UI at localhost:8080, under Running Applications, the state remains at WAITING. {code:java} spark.task.resource.gpu.amount 2 spark.executor.resource.gpu.discoveryScript ./usr/local/spark/getGpusResources.sh spark.executor.resource.gpu.amount 1 {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32408) Disable crossPaths only in GitHub Actions to prevent side effects
[ https://issues.apache.org/jira/browse/SPARK-32408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-32408: - Parent: SPARK-32244 Issue Type: Sub-task (was: Improvement) > Disable crossPaths only in GitHub Actions to prevent side effects > - > > Key: SPARK-32408 > URL: https://issues.apache.org/jira/browse/SPARK-32408 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > After SPARK-32245, crossPaths was disabled in SBT build to run the Junit > tests per project properly. > This is correct change since we're not doing the cross build in SBT. > Now, the intermediate classes are placed without Scala version directory in > SBT build specifically. > It seems causing side effects that are dependent on that path. See, for > example, {{git grep -r "target/scala-"}}. > To minimise the side effects, we should disable crossPaths only in GitHub > Actions build for now. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32408) Disable crossPaths only in GitHub Actions to prevent side effects
[ https://issues.apache.org/jira/browse/SPARK-32408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-32408: - Description: After SPARK-32245, crossPaths was disabled in SBT build to run the Junit tests per project properly. This is correct change since we're not doing the cross build in SBT. Now, the intermediate classes are placed without Scala version directory in SBT build specifically. It seems causing side effects that are dependent on that path. See, for example, {{git grep \-r "target/scala\-"}}. To minimise the side effects, we should disable crossPaths only in GitHub Actions build for now. was: After SPARK-32245, crossPaths was disabled in SBT build to run the Junit tests per project properly. This is correct change since we're not doing the cross build in SBT. Now, the intermediate classes are placed without Scala version directory in SBT build specifically. It seems causing side effects that are dependent on that path. See, for example, {{git grep -r "target/scala-"}}. To minimise the side effects, we should disable crossPaths only in GitHub Actions build for now. > Disable crossPaths only in GitHub Actions to prevent side effects > - > > Key: SPARK-32408 > URL: https://issues.apache.org/jira/browse/SPARK-32408 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > After SPARK-32245, crossPaths was disabled in SBT build to run the Junit > tests per project properly. > This is correct change since we're not doing the cross build in SBT. > Now, the intermediate classes are placed without Scala version directory in > SBT build specifically. > It seems causing side effects that are dependent on that path. See, for > example, {{git grep \-r "target/scala\-"}}. > To minimise the side effects, we should disable crossPaths only in GitHub > Actions build for now. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32408) Disable crossPaths only in GitHub Actions to prevent side effects
[ https://issues.apache.org/jira/browse/SPARK-32408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-32408: - Issue Type: Test (was: Improvement) > Disable crossPaths only in GitHub Actions to prevent side effects > - > > Key: SPARK-32408 > URL: https://issues.apache.org/jira/browse/SPARK-32408 > Project: Spark > Issue Type: Test > Components: Build >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > After SPARK-32245, crossPaths was disabled in SBT build to run the Junit > tests per project properly. > This is correct change since we're not doing the cross build in SBT. > Now, the intermediate classes are placed without Scala version directory in > SBT build specifically. > It seems causing side effects that are dependent on that path. See, for > example, {{git grep -r "target/scala-"}}. > To minimise the side effects, we should disable crossPaths only in GitHub > Actions build for now. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32408) Disable crossPaths only in GitHub Actions to prevent side effects
[ https://issues.apache.org/jira/browse/SPARK-32408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-32408: - Issue Type: Improvement (was: Test) > Disable crossPaths only in GitHub Actions to prevent side effects > - > > Key: SPARK-32408 > URL: https://issues.apache.org/jira/browse/SPARK-32408 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > After SPARK-32245, crossPaths was disabled in SBT build to run the Junit > tests per project properly. > This is correct change since we're not doing the cross build in SBT. > Now, the intermediate classes are placed without Scala version directory in > SBT build specifically. > It seems causing side effects that are dependent on that path. See, for > example, {{git grep -r "target/scala-"}}. > To minimise the side effects, we should disable crossPaths only in GitHub > Actions build for now. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32408) Disable crossPaths only in GitHub Actions to prevent side effects
[ https://issues.apache.org/jira/browse/SPARK-32408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-32408: - Description: After SPARK-32245, crossPaths was disabled in SBT build to run the Junit tests per project properly. This is correct change since we're not doing the cross build in SBT. Now, the intermediate classes are placed without Scala version directory in SBT build specifically. It seems causing side effects that are dependent on that path. See, for example, {{git grep -r "target/scala-"}}. To minimise the side effects, we should disable crossPaths only in GitHub Actions build for now. was: After SPARK-32245, crossPaths was disabled in SBT build to run the Junit tests per project properly. This is correct change Now, the intermediate classes are placed without Scala version directory in SBT build specifically. We should reflect this changes in particular about classpathes. SBT assembly does not get affected so it is mostly just test-only. > Disable crossPaths only in GitHub Actions to prevent side effects > - > > Key: SPARK-32408 > URL: https://issues.apache.org/jira/browse/SPARK-32408 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > After SPARK-32245, crossPaths was disabled in SBT build to run the Junit > tests per project properly. > This is correct change since we're not doing the cross build in SBT. > Now, the intermediate classes are placed without Scala version directory in > SBT build specifically. > It seems causing side effects that are dependent on that path. See, for > example, {{git grep -r "target/scala-"}}. > To minimise the side effects, we should disable crossPaths only in GitHub > Actions build for now. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32408) Disable crossPaths only in GitHub Actions to prevent side effects
[ https://issues.apache.org/jira/browse/SPARK-32408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-32408: - Description: After SPARK-32245, crossPaths was disabled in SBT build to run the Junit tests per project properly. This is correct change Now, the intermediate classes are placed without Scala version directory in SBT build specifically. We should reflect this changes in particular about classpathes. SBT assembly does not get affected so it is mostly just test-only. was: After SPARK-32245, crossPaths was disabled in SBT build to run the Junit tests per project properly. Now, the intermediate classes are placed without Scala version directory in SBT build specifically. We should reflect this changes in particular about classpathes. SBT assembly does not get affected so it is mostly just test-only. > Disable crossPaths only in GitHub Actions to prevent side effects > - > > Key: SPARK-32408 > URL: https://issues.apache.org/jira/browse/SPARK-32408 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > After SPARK-32245, crossPaths was disabled in SBT build to run the Junit > tests per project properly. > This is correct change > Now, the intermediate classes are placed without Scala version directory in > SBT build specifically. > We should reflect this changes in particular about classpathes. > SBT assembly does not get affected so it is mostly just test-only. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32408) Disable crossPaths only in GitHub Actions to prevent side effects
[ https://issues.apache.org/jira/browse/SPARK-32408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-32408: - Summary: Disable crossPaths only in GitHub Actions to prevent side effects (was: Reflect the removed Scala version directory to the classpaths) > Disable crossPaths only in GitHub Actions to prevent side effects > - > > Key: SPARK-32408 > URL: https://issues.apache.org/jira/browse/SPARK-32408 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > After SPARK-32245, crossPaths was disabled in SBT build to run the Junit > tests per project properly. > Now, the intermediate classes are placed without Scala version directory in > SBT build specifically. > We should reflect this changes in particular about classpathes. > SBT assembly does not get affected so it is mostly just test-only. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-32389) Add all hive.execution suites in the parallel test group
[ https://issues.apache.org/jira/browse/SPARK-32389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-32389. -- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 28977 [https://github.com/apache/spark/pull/28977] > Add all hive.execution suites in the parallel test group > > > Key: SPARK-32389 > URL: https://issues.apache.org/jira/browse/SPARK-32389 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.0.0 >Reporter: Yuanjian Li >Assignee: Yuanjian Li >Priority: Major > Fix For: 3.1.0 > > > Similar to SPARK-27460, we add an extra parallel test group for all > `hive.execution` suites to reduce the Jenkins testing time. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32389) Add all hive.execution suites in the parallel test group
[ https://issues.apache.org/jira/browse/SPARK-32389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-32389: Assignee: Yuanjian Li > Add all hive.execution suites in the parallel test group > > > Key: SPARK-32389 > URL: https://issues.apache.org/jira/browse/SPARK-32389 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.0.0 >Reporter: Yuanjian Li >Assignee: Yuanjian Li >Priority: Major > > Similar to SPARK-27460, we add an extra parallel test group for all > `hive.execution` suites to reduce the Jenkins testing time. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org