[jira] [Commented] (SPARK-31736) Nested column pruning for other operators
[ https://issues.apache.org/jira/browse/SPARK-31736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17109341#comment-17109341 ]

Apache Spark commented on SPARK-31736:
--------------------------------------

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/28556

> Nested column pruning for other operators
> -----------------------------------------
>
>                 Key: SPARK-31736
>                 URL: https://issues.apache.org/jira/browse/SPARK-31736
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: L. C. Hsieh
>            Assignee: L. C. Hsieh
>            Priority: Major
>
> Currently we only push nested column pruning through a few operators such as
> LIMIT, SAMPLE, etc. This is the ticket for supporting other operators for
> nested column pruning.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31736) Nested column pruning for other operators
[ https://issues.apache.org/jira/browse/SPARK-31736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-31736:
------------------------------------
    Assignee: L. C. Hsieh  (was: Apache Spark)

> Nested column pruning for other operators
> -----------------------------------------
>
>                 Key: SPARK-31736
>                 URL: https://issues.apache.org/jira/browse/SPARK-31736
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: L. C. Hsieh
>            Assignee: L. C. Hsieh
>            Priority: Major
>
> Currently we only push nested column pruning through a few operators such as
> LIMIT, SAMPLE, etc. This is the ticket for supporting other operators for
> nested column pruning.
[jira] [Assigned] (SPARK-31736) Nested column pruning for other operators
[ https://issues.apache.org/jira/browse/SPARK-31736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-31736:
------------------------------------
    Assignee: Apache Spark  (was: L. C. Hsieh)

> Nested column pruning for other operators
> -----------------------------------------
>
>                 Key: SPARK-31736
>                 URL: https://issues.apache.org/jira/browse/SPARK-31736
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: L. C. Hsieh
>            Assignee: Apache Spark
>            Priority: Major
>
> Currently we only push nested column pruning through a few operators such as
> LIMIT, SAMPLE, etc. This is the ticket for supporting other operators for
> nested column pruning.
[jira] [Commented] (SPARK-31736) Nested column pruning for other operators
[ https://issues.apache.org/jira/browse/SPARK-31736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17109339#comment-17109339 ]

Apache Spark commented on SPARK-31736:
--------------------------------------

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/28556

> Nested column pruning for other operators
> -----------------------------------------
>
>                 Key: SPARK-31736
>                 URL: https://issues.apache.org/jira/browse/SPARK-31736
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: L. C. Hsieh
>            Assignee: L. C. Hsieh
>            Priority: Major
>
> Currently we only push nested column pruning through a few operators such as
> LIMIT, SAMPLE, etc. This is the ticket for supporting other operators for
> nested column pruning.
[jira] [Updated] (SPARK-31736) Nested column pruning for other operators
[ https://issues.apache.org/jira/browse/SPARK-31736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

L. C. Hsieh updated SPARK-31736:
--------------------------------
        Parent: SPARK-25603
    Issue Type: Sub-task  (was: Improvement)

> Nested column pruning for other operators
> -----------------------------------------
>
>                 Key: SPARK-31736
>                 URL: https://issues.apache.org/jira/browse/SPARK-31736
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: L. C. Hsieh
>            Assignee: L. C. Hsieh
>            Priority: Major
>
> Currently we only push nested column pruning through a few operators such as
> LIMIT, SAMPLE, etc. This is the ticket for supporting other operators for
> nested column pruning.
[jira] [Created] (SPARK-31736) Nested column pruning for other operators
L. C. Hsieh created SPARK-31736:
-----------------------------------

             Summary: Nested column pruning for other operators
                 Key: SPARK-31736
                 URL: https://issues.apache.org/jira/browse/SPARK-31736
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.1.0
            Reporter: L. C. Hsieh
            Assignee: L. C. Hsieh

Currently we only push nested column pruning through a few operators such as
LIMIT, SAMPLE, etc. This is the ticket for supporting other operators for
nested column pruning.
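As a rough illustration of what "nested column pruning" means (a plain-Python sketch, not Spark's implementation — the `prune` helper and the dotted-path convention are invented for this example): when a query only touches `name.first`, the reader should materialize just that field instead of the whole struct.

```python
# Illustrative sketch of nested column pruning on dict-shaped rows.
# `prune` and the dotted-path format are hypothetical, for explanation only.
def prune(row, paths):
    """Keep only the dotted field paths in `paths` from a nested dict."""
    out = {}
    for path in paths:
        src, dst = row, out
        parts = path.split(".")
        for part in parts[:-1]:
            src = src[part]                 # descend into the nested struct
            dst = dst.setdefault(part, {})  # mirror the structure in the output
        dst[parts[-1]] = src[parts[-1]]     # copy only the requested leaf
    return out

row = {"name": {"first": "Ada", "last": "Lovelace"}, "age": 36}
pruned = prune(row, ["name.first"])  # only name.first survives
```

In Spark this pruning is pushed down through plan operators so the file scan never reads the unused struct fields; the ticket above extends the set of operators it can be pushed through.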
[jira] [Commented] (SPARK-25841) Redesign window function rangeBetween API
[ https://issues.apache.org/jira/browse/SPARK-25841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17109313#comment-17109313 ]

Shyam commented on SPARK-25841:
-------------------------------

[~rxin] Is this fixed in the latest version, i.e. 2.4.3? Is this issue still persisting?

> Redesign window function rangeBetween API
> -----------------------------------------
>
>                 Key: SPARK-25841
>                 URL: https://issues.apache.org/jira/browse/SPARK-25841
>             Project: Spark
>          Issue Type: Umbrella
>          Components: SQL
>    Affects Versions: 2.3.2, 2.4.0
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>            Priority: Major
>
> As I was reviewing the Spark API changes for 2.4, I found that through
> organic, ad-hoc evolution the current API for window functions in Scala is
> pretty bad.
>
> To illustrate the problem, we have two rangeBetween functions in the Window
> class:
>
> {code:java}
> class Window {
>   def unboundedPreceding: Long
>   ...
>   def rangeBetween(start: Long, end: Long): WindowSpec
>   def rangeBetween(start: Column, end: Column): WindowSpec
> }{code}
>
> The Column version of rangeBetween was added in Spark 2.3 because the
> previous version (Long) could only support integral values and not time
> intervals. Now, in order to support specifying unboundedPreceding in the
> rangeBetween(Column, Column) API, we added an unboundedPreceding that
> returns a Column in functions.scala.
>
> There are a few issues I have with the API:
>
> 1. To the end user, this can be just super confusing. Why are there two
> unboundedPreceding functions, in different classes, that are named the same
> but return different types?
>
> 2. Using Column as the parameter signature implies this can be an actual
> Column, but in practice rangeBetween can only accept literal values.
>
> 3. We added the new APIs to support intervals, but they don't actually work,
> because in the implementation we try to validate that the start is less than
> the end, but calendar interval types are not comparable, and as a result we
> throw a type mismatch exception at runtime: scala.MatchError:
> CalendarIntervalType (of class org.apache.spark.sql.types.CalendarIntervalType$)
>
> 4. In order to make intervals work, users need to create an interval using
> CalendarInterval, which is an internal class that has no documentation and
> no stable API.
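Point 3 of the description above can be sketched in a few lines of plain Python (this is an analogy, not Spark code — `CalendarInterval` here is a stand-in class and `validate_frame` an invented helper): a "start must be less than end" check is fine for integral bounds but blows up at runtime the moment the bound type has no defined ordering, which is exactly the shape of the MatchError described.

```python
# Stand-in for Spark's internal CalendarInterval: months/days/microseconds
# fields with no defined ordering, so instances are not comparable.
class CalendarInterval:
    def __init__(self, months, days, microseconds):
        self.months = months
        self.days = days
        self.microseconds = microseconds

def validate_frame(start, end):
    # Mirrors the problematic check: works for integral bounds, but raises
    # at runtime for bound types that do not support ordering.
    if not start < end:
        raise ValueError("start must be less than end")

validate_frame(-5, 5)  # integral bounds: the check passes

try:
    validate_frame(CalendarInterval(0, -1, 0), CalendarInterval(0, 1, 0))
except TypeError:
    # '<' is undefined between intervals -- the analogue of the
    # scala.MatchError on CalendarIntervalType described above.
    pass
```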
[jira] [Resolved] (SPARK-31085) Amend Spark's Semantic Versioning Policy
[ https://issues.apache.org/jira/browse/SPARK-31085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-31085.
---------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

> Amend Spark's Semantic Versioning Policy
> ----------------------------------------
>
>                 Key: SPARK-31085
>                 URL: https://issues.apache.org/jira/browse/SPARK-31085
>             Project: Spark
>          Issue Type: Umbrella
>          Components: PySpark, Spark Core, SQL
>    Affects Versions: 3.0.0
>            Reporter: Michael Armbrust
>            Assignee: Michael Armbrust
>            Priority: Blocker
>             Fix For: 3.0.0
>
> This issue tracks all the activity for the following discussion and vote:
> - https://lists.apache.org/thread.html/r82f99ad8c2798629eed66d65f2cddc1ed196dddf82e8e9370f3b7d32%40%3Cdev.spark.apache.org%3E
> - https://lists.apache.org/thread.html/r683dbb0481adb1944461b6e1a60aafc44a66423c6e9fa2bab24a07db%40%3Cdev.spark.apache.org%3E
[jira] [Resolved] (SPARK-31404) file source backward compatibility after calendar switch
[ https://issues.apache.org/jira/browse/SPARK-31404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-31404.
---------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

> file source backward compatibility after calendar switch
> --------------------------------------------------------
>
>                 Key: SPARK-31404
>                 URL: https://issues.apache.org/jira/browse/SPARK-31404
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Wenchen Fan
>            Priority: Blocker
>             Fix For: 3.0.0
>
> In Spark 3.0, we switch to the Proleptic Gregorian calendar by using the
> Java 8 datetime APIs. This makes Spark follow the ISO and SQL standard, but
> introduces some backward compatibility problems:
> 1. may read wrong data from the data files written by Spark 2.4
> 2. may have perf regression
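The calendar switch above can be demonstrated without Spark at all: Python's `datetime`, like the Java 8 `java.time` APIs, uses the proleptic Gregorian calendar, so dates that exist in the hybrid Julian/Gregorian calendar used by Spark 2.4 may simply not exist after the switch. A small example (plain Python, offered as an analogy to the compatibility problem, not Spark behavior):

```python
from datetime import date

# 1500 is a leap year in the Julian calendar (divisible by 4), but NOT in the
# proleptic Gregorian calendar (divisible by 100 and not by 400). A file
# written under the old hybrid calendar could therefore contain a date the
# new calendar considers invalid or shifted.
try:
    date(1500, 2, 29)
    exists = True
except ValueError:
    exists = False
# exists is False: Feb 29, 1500 has no proleptic Gregorian representation.
```

This is why the sub-tasks under this umbrella add rebase logic (and fail-by-default behavior) when reading datetime values written by older Spark versions.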
[jira] [Assigned] (SPARK-31405) fail by default when read/write datetime values and not sure if they need rebase or not
[ https://issues.apache.org/jira/browse/SPARK-31405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-31405:
-----------------------------------
    Assignee: Wenchen Fan

> fail by default when read/write datetime values and not sure if they need
> rebase or not
> -------------------------------------------------------------------------
>
>                 Key: SPARK-31405
>                 URL: https://issues.apache.org/jira/browse/SPARK-31405
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
>            Priority: Major
[jira] [Resolved] (SPARK-31405) fail by default when read/write datetime values and not sure if they need rebase or not
[ https://issues.apache.org/jira/browse/SPARK-31405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-31405.
---------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 28526
[https://github.com/apache/spark/pull/28526]

> fail by default when read/write datetime values and not sure if they need
> rebase or not
> -------------------------------------------------------------------------
>
>                 Key: SPARK-31405
>                 URL: https://issues.apache.org/jira/browse/SPARK-31405
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
>            Priority: Major
>             Fix For: 3.0.0
[jira] [Assigned] (SPARK-31707) Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax
[ https://issues.apache.org/jira/browse/SPARK-31707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-31707:
-----------------------------------
    Assignee: Jungtaek Lim

> Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-31707
>                 URL: https://issues.apache.org/jira/browse/SPARK-31707
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Blocker
>
> According to the latest status of discussion in the dev@ mailing list,
> [[DISCUSS] Resolve ambiguous parser rule between two "create
> table"s|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Resolve-ambiguous-parser-rule-between-two-quot-create-table-quot-s-td29051i20.html],
> we'd want to revert the change of SPARK-30098 first to unblock Spark 3.0.0.
> This issue tracks the effort of the revert.
[jira] [Resolved] (SPARK-31707) Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax
[ https://issues.apache.org/jira/browse/SPARK-31707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-31707.
---------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 28517
[https://github.com/apache/spark/pull/28517]

> Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-31707
>                 URL: https://issues.apache.org/jira/browse/SPARK-31707
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Blocker
>             Fix For: 3.0.0
>
> According to the latest status of discussion in the dev@ mailing list,
> [[DISCUSS] Resolve ambiguous parser rule between two "create
> table"s|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Resolve-ambiguous-parser-rule-between-two-quot-create-table-quot-s-td29051i20.html],
> we'd want to revert the change of SPARK-30098 first to unblock Spark 3.0.0.
> This issue tracks the effort of the revert.
[jira] [Resolved] (SPARK-31725) Set America/Los_Angeles time zone and Locale.US by default
[ https://issues.apache.org/jira/browse/SPARK-31725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-31725.
---------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 28548
[https://github.com/apache/spark/pull/28548]

> Set America/Los_Angeles time zone and Locale.US by default
> ----------------------------------------------------------
>
>                 Key: SPARK-31725
>                 URL: https://issues.apache.org/jira/browse/SPARK-31725
>             Project: Spark
>          Issue Type: Test
>          Components: Spark Core, SQL
>    Affects Versions: 3.1.0
>            Reporter: Maxim Gekk
>            Assignee: Maxim Gekk
>            Priority: Major
>             Fix For: 3.0.0
>
> Move default time zone and locale settings to SparkFunSuite and set:
> # America/Los_Angeles as the default time zone
> # Locale.US as the default locale
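The motivation for pinning a default time zone in a test base class, as SPARK-31725 does for SparkFunSuite, can be sketched in plain Python (an analogy, not the Spark/Scala change itself; note `time.tzset()` is POSIX-only):

```python
import os
import time

# Pin the process-default time zone so time-dependent test output does not
# vary with the host machine's configuration -- the same idea as setting
# America/Los_Angeles in a shared test suite base class.
os.environ["TZ"] = "America/Los_Angeles"
time.tzset()  # POSIX-only: re-reads the TZ variable for this process

# With the zone pinned, formatting the Unix epoch is deterministic:
epoch_zone = time.strftime("%Z", time.localtime(0))
```

Without such pinning, a test that formats or parses local times can pass on one CI host and fail on another; pinning both the zone and the locale removes that source of flakiness.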
[jira] [Assigned] (SPARK-31725) Set America/Los_Angeles time zone and Locale.US by default
[ https://issues.apache.org/jira/browse/SPARK-31725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-31725:
-----------------------------------
    Assignee: Maxim Gekk

> Set America/Los_Angeles time zone and Locale.US by default
> ----------------------------------------------------------
>
>                 Key: SPARK-31725
>                 URL: https://issues.apache.org/jira/browse/SPARK-31725
>             Project: Spark
>          Issue Type: Test
>          Components: Spark Core, SQL
>    Affects Versions: 3.1.0
>            Reporter: Maxim Gekk
>            Assignee: Maxim Gekk
>            Priority: Major
>
> Move default time zone and locale settings to SparkFunSuite and set:
> # America/Los_Angeles as the default time zone
> # Locale.US as the default locale
[jira] [Commented] (SPARK-25985) Verify the SPARK-24613 Cache with UDF could not be matched with subsequent dependent caches
[ https://issues.apache.org/jira/browse/SPARK-25985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17109201#comment-17109201 ]

Nick Afshartous commented on SPARK-25985:
-----------------------------------------

[~smilegator] Can you please comment on whether this task is still relevant?
If so, I'd like to look into it, if you could please elaborate on "works well"
in the description.

> Verify the SPARK-24613 Cache with UDF could not be matched with subsequent
> dependent caches
> --------------------------------------------------------------------------
>
>                 Key: SPARK-25985
>                 URL: https://issues.apache.org/jira/browse/SPARK-25985
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL, Tests
>    Affects Versions: 3.0.0
>            Reporter: Xiao Li
>            Priority: Major
>              Labels: starter
>
> Verify whether recacheByCondition works well when the cached data is with a
> UDF. This is a follow-up of https://github.com/apache/spark/pull/21602
[jira] [Assigned] (SPARK-31735) Include all columns in the summary report
[ https://issues.apache.org/jira/browse/SPARK-31735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-31735:
------------------------------------
    Assignee:  (was: Apache Spark)

> Include all columns in the summary report
> -----------------------------------------
>
>                 Key: SPARK-31735
>                 URL: https://issues.apache.org/jira/browse/SPARK-31735
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, SQL
>    Affects Versions: 2.4.5
>            Reporter: Fokko Driesprong
>            Priority: Major
>
> Dates and other columns are excluded:
>
> {code:python}
> from datetime import datetime, timedelta, timezone
> from pyspark.sql import types as T
> from pyspark.sql import Row
> from pyspark.sql import functions as F
>
> START = datetime(2014, 1, 1, tzinfo=timezone.utc)
> n_days = 22
> date_range = [Row(date=(START + timedelta(days=n))) for n in range(0, n_days)]
> schema = T.StructType([T.StructField(name="date", dataType=T.DateType(), nullable=False)])
> rdd = spark.sparkContext.parallelize(date_range)
> df = spark.createDataFrame(data=rdd, schema=schema)
> df.agg(F.max("date")).show()
> df.summary().show()
> {code}
>
> {code}
> +-------+
> |summary|
> +-------+
> |  count|
> |   mean|
> | stddev|
> |    min|
> |    25%|
> |    50%|
> |    75%|
> |    max|
> +-------+
> {code}
[jira] [Commented] (SPARK-31735) Include all columns in the summary report
[ https://issues.apache.org/jira/browse/SPARK-31735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17109148#comment-17109148 ]

Apache Spark commented on SPARK-31735:
--------------------------------------

User 'Fokko' has created a pull request for this issue:
https://github.com/apache/spark/pull/28554

> Include all columns in the summary report
> -----------------------------------------
>
>                 Key: SPARK-31735
>                 URL: https://issues.apache.org/jira/browse/SPARK-31735
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, SQL
>    Affects Versions: 2.4.5
>            Reporter: Fokko Driesprong
>            Priority: Major
[jira] [Assigned] (SPARK-31735) Include all columns in the summary report
[ https://issues.apache.org/jira/browse/SPARK-31735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-31735:
------------------------------------
    Assignee: Apache Spark

> Include all columns in the summary report
> -----------------------------------------
>
>                 Key: SPARK-31735
>                 URL: https://issues.apache.org/jira/browse/SPARK-31735
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, SQL
>    Affects Versions: 2.4.5
>            Reporter: Fokko Driesprong
>            Assignee: Apache Spark
>            Priority: Major
[jira] [Commented] (SPARK-31734) add weight support in ClusteringEvaluator
[ https://issues.apache.org/jira/browse/SPARK-31734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17109149#comment-17109149 ]

Apache Spark commented on SPARK-31734:
--------------------------------------

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/28553

> add weight support in ClusteringEvaluator
> -----------------------------------------
>
>                 Key: SPARK-31734
>                 URL: https://issues.apache.org/jira/browse/SPARK-31734
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, PySpark
>    Affects Versions: 3.1.0
>            Reporter: Huaxin Gao
>            Priority: Major
>
> add weight support in ClusteringEvaluator
[jira] [Assigned] (SPARK-31734) add weight support in ClusteringEvaluator
[ https://issues.apache.org/jira/browse/SPARK-31734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-31734:
------------------------------------
    Assignee: Apache Spark

> add weight support in ClusteringEvaluator
> -----------------------------------------
>
>                 Key: SPARK-31734
>                 URL: https://issues.apache.org/jira/browse/SPARK-31734
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, PySpark
>    Affects Versions: 3.1.0
>            Reporter: Huaxin Gao
>            Assignee: Apache Spark
>            Priority: Major
>
> add weight support in ClusteringEvaluator
[jira] [Commented] (SPARK-31734) add weight support in ClusteringEvaluator
[ https://issues.apache.org/jira/browse/SPARK-31734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17109147#comment-17109147 ]

Apache Spark commented on SPARK-31734:
--------------------------------------

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/28553

> add weight support in ClusteringEvaluator
> -----------------------------------------
>
>                 Key: SPARK-31734
>                 URL: https://issues.apache.org/jira/browse/SPARK-31734
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, PySpark
>    Affects Versions: 3.1.0
>            Reporter: Huaxin Gao
>            Priority: Major
>
> add weight support in ClusteringEvaluator
[jira] [Assigned] (SPARK-31734) add weight support in ClusteringEvaluator
[ https://issues.apache.org/jira/browse/SPARK-31734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-31734:
------------------------------------
    Assignee:  (was: Apache Spark)

> add weight support in ClusteringEvaluator
> -----------------------------------------
>
>                 Key: SPARK-31734
>                 URL: https://issues.apache.org/jira/browse/SPARK-31734
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, PySpark
>    Affects Versions: 3.1.0
>            Reporter: Huaxin Gao
>            Priority: Major
>
> add weight support in ClusteringEvaluator
[jira] [Created] (SPARK-31735) Include all columns in the summary report
Fokko Driesprong created SPARK-31735:
------------------------------------

             Summary: Include all columns in the summary report
                 Key: SPARK-31735
                 URL: https://issues.apache.org/jira/browse/SPARK-31735
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core, SQL
    Affects Versions: 2.4.5
            Reporter: Fokko Driesprong

Dates and other columns are excluded:

{code:python}
from datetime import datetime, timedelta, timezone
from pyspark.sql import types as T
from pyspark.sql import Row
from pyspark.sql import functions as F

START = datetime(2014, 1, 1, tzinfo=timezone.utc)
n_days = 22
date_range = [Row(date=(START + timedelta(days=n))) for n in range(0, n_days)]
schema = T.StructType([T.StructField(name="date", dataType=T.DateType(), nullable=False)])
rdd = spark.sparkContext.parallelize(date_range)
df = spark.createDataFrame(data=rdd, schema=schema)
df.agg(F.max("date")).show()
df.summary().show()
{code}

{code}
+-------+
|summary|
+-------+
|  count|
|   mean|
| stddev|
|    min|
|    25%|
|    50%|
|    75%|
|    max|
+-------+
{code}
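The issue description above boils down to this: dates carry a total order, so count/min/max and the percentile rows of a summary are well-defined for a date column even though mean and stddev are not. A Spark-free sketch of the statistics that could be reported (the `percentile` helper is an invented nearest-rank simplification, not Spark's algorithm):

```python
from datetime import date, timedelta

# Same 22-day range as in the issue's reproduction, but as plain Python dates.
dates = [date(2014, 1, 1) + timedelta(days=n) for n in range(22)]

def percentile(sorted_vals, p):
    # Nearest-rank percentile over a sorted list; a simplification of
    # whatever interpolation summary() actually uses.
    k = max(0, min(len(sorted_vals) - 1, round(p * (len(sorted_vals) - 1))))
    return sorted_vals[k]

s = sorted(dates)
summary = {
    "count": len(s),
    "min": s[0],
    "25%": percentile(s, 0.25),
    "50%": percentile(s, 0.50),
    "75%": percentile(s, 0.75),
    "max": s[-1],
}
```

Every entry in this dict is computable purely from ordering and counting, which is why excluding date columns from the summary report entirely (rather than reporting the orderable statistics and leaving mean/stddev null) loses information.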
[jira] [Created] (SPARK-31734) add weight support in ClusteringEvaluator
Huaxin Gao created SPARK-31734:
--------------------------------

             Summary: add weight support in ClusteringEvaluator
                 Key: SPARK-31734
                 URL: https://issues.apache.org/jira/browse/SPARK-31734
             Project: Spark
          Issue Type: Improvement
          Components: ML, PySpark
    Affects Versions: 3.1.0
            Reporter: Huaxin Gao

add weight support in ClusteringEvaluator
[jira] [Commented] (SPARK-31235) Separates different categories of applications
[ https://issues.apache.org/jira/browse/SPARK-31235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17109088#comment-17109088 ]

Apache Spark commented on SPARK-31235:
--------------------------------------

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/28552

> Separates different categories of applications
> ----------------------------------------------
>
>                 Key: SPARK-31235
>                 URL: https://issues.apache.org/jira/browse/SPARK-31235
>             Project: Spark
>          Issue Type: Improvement
>          Components: YARN
>    Affects Versions: 3.0.0
>            Reporter: wangzhun
>            Assignee: wangzhun
>            Priority: Minor
>             Fix For: 3.1.0
>
> The current application defaults to the SPARK type. In fact, different
> types of applications have different characteristics and are suitable for
> different scenarios, for example: SPARK-SQL, SPARK-STREAMING. I recommend
> distinguishing them by the parameter `spark.yarn.applicationType` so that
> we can more easily manage and maintain different types of applications.
[jira] [Created] (SPARK-31733) Make YarnClient.`specify a more specific type for the application` pass in Hadoop-3.2
Dongjoon Hyun created SPARK-31733:
---------------------------------

             Summary: Make YarnClient.`specify a more specific type for the application` pass in Hadoop-3.2
                 Key: SPARK-31733
                 URL: https://issues.apache.org/jira/browse/SPARK-31733
             Project: Spark
          Issue Type: Bug
          Components: Tests
    Affects Versions: 3.0.0
            Reporter: Dongjoon Hyun
[jira] [Resolved] (SPARK-31732) disable some flaky test
[ https://issues.apache.org/jira/browse/SPARK-31732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-31732.
-----------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 28547
[https://github.com/apache/spark/pull/28547]

> disable some flaky tests
> ------------------------
>
>                 Key: SPARK-31732
>                 URL: https://issues.apache.org/jira/browse/SPARK-31732
>             Project: Spark
>          Issue Type: Test
>          Components: Tests
>    Affects Versions: 3.0.0
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
>            Priority: Major
>             Fix For: 3.0.0
>
> https://issues.apache.org/jira/browse/SPARK-31722
> https://issues.apache.org/jira/browse/SPARK-31723
> https://issues.apache.org/jira/browse/SPARK-31729
> https://issues.apache.org/jira/browse/SPARK-31728
> https://issues.apache.org/jira/browse/SPARK-31730
> https://issues.apache.org/jira/browse/SPARK-31731
[jira] [Resolved] (SPARK-31289) Flaky test: org.apache.spark.sql.hive.thriftserver.CliSuite
[ https://issues.apache.org/jira/browse/SPARK-31289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-31289.
---------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 28055
[https://github.com/apache/spark/pull/28055]

> Flaky test: org.apache.spark.sql.hive.thriftserver.CliSuite
> -----------------------------------------------------------
>
>                 Key: SPARK-31289
>                 URL: https://issues.apache.org/jira/browse/SPARK-31289
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Kent Yao
>            Assignee: Kent Yao
>            Priority: Minor
>             Fix For: 3.0.0
>
> {code:java}
> Caused by: MetaException(message:Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=/home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/spark-17bf6d71-1e68-4e56-b656-f1b2fd2e15fb;create=true, username = APP. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
> 2020-03-27 09:20:11.949 - stderr> java.sql.SQLException: Failed to create database '/home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/spark-17bf6d71-1e68-4e56-b656-f1b2fd2e15fb', see the next exception for details.
> {code}
>
> {code:java}
> Caused by: ERROR XBM0A: The database directory '/home/jenkins/workspace/SparkPullRequestBuilder@4/target/tmp/spark-84c8ff0e-214f-416c-9d44-ab19f864a79b' exists. However, it does not contain the expected 'service.properties' file. Perhaps Derby was brought down in the middle of creating this database. You may want to delete this directory and try creating the database again.
> {code}
[jira] [Assigned] (SPARK-31289) Flaky test: org.apache.spark.sql.hive.thriftserver.CliSuite
[ https://issues.apache.org/jira/browse/SPARK-31289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-31289: --- Assignee: Kent Yao > Flaky test: org.apache.spark.sql.hive.thriftserver.CliSuite > --- > > Key: SPARK-31289 > URL: https://issues.apache.org/jira/browse/SPARK-31289 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Minor > > {code:java} > Caused by: MetaException(message:Unable to open a test connection to the > given database. JDBC url = > jdbc:derby:;databaseName=/home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/spark-17bf6d71-1e68-4e56-b656-f1b2fd2e15fb;create=true, > username = APP. Terminating connection pool (set lazyInit to true if you > expect to start your database after your app). Original Exception: -- > 2020-03-27 09:20:11.949 - stderr> java.sql.SQLException: Failed to create > database > '/home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/spark-17bf6d71-1e68-4e56-b656-f1b2fd2e15fb', > see the next exception for details. > {code} > {code:java} > Caused by: ERROR XBM0A: The database directory > '/home/jenkins/workspace/SparkPullRequestBuilder@4/target/tmp/spark-84c8ff0e-214f-416c-9d44-ab19f864a79b' > exists. However, it does not contain the expected 'service.properties' file. > Perhaps Derby was brought down in the middle of creating this database. You > may want to delete this directory and try creating the database again. > {code}
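The XBM0A error quoted above describes a half-created Derby database directory: it exists on disk but lacks the 'service.properties' marker, so Derby refuses to reuse it. A minimal sketch of the cleanup the error message itself suggests (a hypothetical helper, not part of Spark's test utilities): delete the stale directory before the next attempt to create the database.

```scala
import java.io.File
import java.nio.file.Files

// Recursively delete a file or directory tree.
def deleteRecursively(f: File): Unit = {
  Option(f.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
  f.delete()
}

// Hypothetical cleanup helper: if a Derby database directory exists but is
// missing the 'service.properties' marker, Derby was likely brought down in
// the middle of creating it, so remove the directory and report true.
// A directory with the marker is assumed healthy and is left alone.
def cleanStaleDerbyDir(path: String): Boolean = {
  val dir = new File(path)
  if (dir.isDirectory && !new File(dir, "service.properties").exists()) {
    deleteRecursively(dir)
    true
  } else {
    false
  }
}
```

Running a cleanup like this in test setup would keep a crashed prior run from poisoning the next one's Derby metastore creation.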
[jira] [Assigned] (SPARK-31732) disable some flaky test
[ https://issues.apache.org/jira/browse/SPARK-31732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31732: Assignee: Apache Spark (was: Wenchen Fan) > disable some flaky test > --- > > Key: SPARK-31732 > URL: https://issues.apache.org/jira/browse/SPARK-31732 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major > > https://issues.apache.org/jira/browse/SPARK-31722 > https://issues.apache.org/jira/browse/SPARK-31723 > https://issues.apache.org/jira/browse/SPARK-31729 > https://issues.apache.org/jira/browse/SPARK-31728 > https://issues.apache.org/jira/browse/SPARK-31730 > https://issues.apache.org/jira/browse/SPARK-31731
[jira] [Assigned] (SPARK-31732) disable some flaky test
[ https://issues.apache.org/jira/browse/SPARK-31732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31732: Assignee: Wenchen Fan (was: Apache Spark) > disable some flaky test > --- > > Key: SPARK-31732 > URL: https://issues.apache.org/jira/browse/SPARK-31732 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > > https://issues.apache.org/jira/browse/SPARK-31722 > https://issues.apache.org/jira/browse/SPARK-31723 > https://issues.apache.org/jira/browse/SPARK-31729 > https://issues.apache.org/jira/browse/SPARK-31728 > https://issues.apache.org/jira/browse/SPARK-31730 > https://issues.apache.org/jira/browse/SPARK-31731
[jira] [Commented] (SPARK-31732) disable some flaky test
[ https://issues.apache.org/jira/browse/SPARK-31732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108882#comment-17108882 ] Apache Spark commented on SPARK-31732: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/28547 > disable some flaky test > --- > > Key: SPARK-31732 > URL: https://issues.apache.org/jira/browse/SPARK-31732 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > > https://issues.apache.org/jira/browse/SPARK-31722 > https://issues.apache.org/jira/browse/SPARK-31723 > https://issues.apache.org/jira/browse/SPARK-31729 > https://issues.apache.org/jira/browse/SPARK-31728 > https://issues.apache.org/jira/browse/SPARK-31730 > https://issues.apache.org/jira/browse/SPARK-31731
[jira] [Updated] (SPARK-31732) disable some flaky test
[ https://issues.apache.org/jira/browse/SPARK-31732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-31732: Description: https://issues.apache.org/jira/browse/SPARK-31722 https://issues.apache.org/jira/browse/SPARK-31723 https://issues.apache.org/jira/browse/SPARK-31729 https://issues.apache.org/jira/browse/SPARK-31728 https://issues.apache.org/jira/browse/SPARK-31730 https://issues.apache.org/jira/browse/SPARK-31731 was: https://issues.apache.org/jira/browse/SPARK-31722 https://issues.apache.org/jira/browse/SPARK-31723 https://issues.apache.org/jira/browse/SPARK-31729 https://issues.apache.org/jira/browse/SPARK-31728 https://issues.apache.org/jira/browse/SPARK-31730 > disable some flaky test > --- > > Key: SPARK-31732 > URL: https://issues.apache.org/jira/browse/SPARK-31732 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > > https://issues.apache.org/jira/browse/SPARK-31722 > https://issues.apache.org/jira/browse/SPARK-31723 > https://issues.apache.org/jira/browse/SPARK-31729 > https://issues.apache.org/jira/browse/SPARK-31728 > https://issues.apache.org/jira/browse/SPARK-31730 > https://issues.apache.org/jira/browse/SPARK-31731
[jira] [Updated] (SPARK-31732) disable some flaky test
[ https://issues.apache.org/jira/browse/SPARK-31732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-31732: Description: https://issues.apache.org/jira/browse/SPARK-31722 https://issues.apache.org/jira/browse/SPARK-31723 https://issues.apache.org/jira/browse/SPARK-31729 https://issues.apache.org/jira/browse/SPARK-31728 https://issues.apache.org/jira/browse/SPARK-31730 was: https://issues.apache.org/jira/browse/SPARK-31728 > disable some flaky test > --- > > Key: SPARK-31732 > URL: https://issues.apache.org/jira/browse/SPARK-31732 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > > https://issues.apache.org/jira/browse/SPARK-31722 > https://issues.apache.org/jira/browse/SPARK-31723 > https://issues.apache.org/jira/browse/SPARK-31729 > https://issues.apache.org/jira/browse/SPARK-31728 > https://issues.apache.org/jira/browse/SPARK-31730
[jira] [Updated] (SPARK-31732) disable some flaky test
[ https://issues.apache.org/jira/browse/SPARK-31732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-31732: Description: https://issues.apache.org/jira/browse/SPARK-31728 > disable some flaky test > --- > > Key: SPARK-31732 > URL: https://issues.apache.org/jira/browse/SPARK-31732 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > > https://issues.apache.org/jira/browse/SPARK-31728
[jira] [Created] (SPARK-31731) flaky test: org.apache.spark.sql.kafka010.KafkaMicroBatchV1SourceSuite
Wenchen Fan created SPARK-31731: --- Summary: flaky test: org.apache.spark.sql.kafka010.KafkaMicroBatchV1SourceSuite Key: SPARK-31731 URL: https://issues.apache.org/jira/browse/SPARK-31731 Project: Spark Issue Type: Test Components: Tests Affects Versions: 3.0.0 Reporter: Wenchen Fan https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-sbt-hadoop-2.7-hive-1.2/668/testReport/ KafkaMicroBatchV1SourceSuite.subscribing topic by pattern with topic deletions {code} sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: Timed out waiting for stream: The code passed to eventually never returned normally. Attempted 304 times over 1.000842521668 minutes. Last failure message: KafkaTestUtils.this.zkClient.isTopicMarkedForDeletion(topic) was true topic is still marked for deletion. org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:432) org.scalatest.concurrent.Eventually.eventually(Eventually.scala:439) org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:391) org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:479) org.scalatest.concurrent.Eventually.eventually(Eventually.scala:308) org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:307) org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:479) org.apache.spark.sql.kafka010.KafkaTestUtils.verifyTopicDeletionWithRetries(KafkaTestUtils.scala:618) org.apache.spark.sql.kafka010.KafkaTestUtils.deleteTopic(KafkaTestUtils.scala:410) org.apache.spark.sql.kafka010.KafkaMicroBatchSourceSuiteBase.$anonfun$new$20(KafkaMicroBatchSourceSuite.scala:379) Caused by: KafkaTestUtils.this.zkClient.isTopicMarkedForDeletion(topic) was true topic is still marked for deletion org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530) org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529) org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1389) 
org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503) org.apache.spark.sql.kafka010.KafkaTestUtils.verifyTopicDeletion(KafkaTestUtils.scala:590) org.apache.spark.sql.kafka010.KafkaTestUtils.$anonfun$verifyTopicDeletionWithRetries$1(KafkaTestUtils.scala:620) scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) org.scalatest.concurrent.Eventually.makeAValiantAttempt$1(Eventually.scala:395) org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:409) org.scalatest.concurrent.Eventually.eventually(Eventually.scala:439) == Progress == AssertOnQuery(, ) AddKafkaData(topics = Set(topic-31-seems), data = WrappedArray(1, 2, 3), message = ) CheckAnswer: [2],[3],[4] => Assert(, ) AddKafkaData(topics = Set(topic-31-bad), data = WrappedArray(4, 5, 6), message = ) CheckAnswer: [2],[3],[4],[5],[6],[7] == Stream == Output Mode: Append Stream state: {KafkaSourceV1[SubscribePattern[topic-31-.*]]: {}} Thread state: alive Thread stack trace: java.lang.Thread.sleep(Native Method) org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$1(MicroBatchExecution.scala:241) org.apache.spark.sql.execution.streaming.MicroBatchExecution$$Lambda$2829/1543669599.apply$mcZ$sp(Unknown Source) org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:57) org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:185) org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:333) org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:244) == Sink == 0: 1: [2] 2: [4] [3] 3: == Plan == == Parsed Logical Plan == WriteToDataSourceV2 org.apache.spark.sql.execution.streaming.sources.MicroBatchWrite@2f31f781 +- SerializeFromObject [input[0, int, false] AS value#8108] +- MapElements 
org.apache.spark.sql.kafka010.KafkaMicroBatchSourceSuiteBase$$Lambda$5466/109510938@420a5093, class scala.Tuple2, [StructField(_1,StringType,true), StructField(_2,StringType,true)], obj#8107: int +- DeserializeToObject newInstance(class scala.Tuple2), obj#8106: scala.Tuple2 +- Project [cast(key#8082 as string) AS key#8096, cast(value#8083 as string) AS value#8097] +- Project [key#8183 AS key#8082, value#8184 AS value#8083, topic#8185 AS topic#8084, partition#8186 AS partition#8085, offset#8187L AS offset#8086L, timestamp#8188 AS timestamp#8087, timestampType#8189 AS timestampType#8088] +-
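The SPARK-31731 failure above comes from `KafkaTestUtils.verifyTopicDeletionWithRetries`, which wraps a topic-deletion check in scalatest's `eventually`: re-run the assertion until it stops throwing or a timeout expires. A minimal, dependency-free sketch of that retry pattern (assumed behavior of `eventually`, not scalatest's actual implementation):

```scala
// Re-evaluate `check` until it succeeds or `timeoutMs` elapses, sleeping
// `intervalMs` between attempts. On timeout, rethrow the last failure --
// which is how messages like "topic is still marked for deletion" surface
// as the test's final error.
def eventually[T](timeoutMs: Long, intervalMs: Long)(check: => T): T = {
  val deadline = System.currentTimeMillis() + timeoutMs
  var result: Option[T] = None
  while (result.isEmpty) {
    try {
      result = Some(check)
    } catch {
      case e: Throwable =>
        if (System.currentTimeMillis() >= deadline) throw e
        Thread.sleep(intervalMs)
    }
  }
  result.get
}
```

In the failing test, the check asserts `!zkClient.isTopicMarkedForDeletion(topic)`; the flake is simply that deletion sometimes takes longer than the configured timeout, so every retry throws until the deadline passes.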
[jira] [Created] (SPARK-31732) disable some flaky test
Wenchen Fan created SPARK-31732: --- Summary: disable some flaky test Key: SPARK-31732 URL: https://issues.apache.org/jira/browse/SPARK-31732 Project: Spark Issue Type: Test Components: Tests Affects Versions: 3.0.0 Reporter: Wenchen Fan Assignee: Wenchen Fan
[jira] [Created] (SPARK-31730) flaky test: org.apache.spark.scheduler.BarrierTaskContextSuite
Wenchen Fan created SPARK-31730: --- Summary: flaky test: org.apache.spark.scheduler.BarrierTaskContextSuite Key: SPARK-31730 URL: https://issues.apache.org/jira/browse/SPARK-31730 Project: Spark Issue Type: Test Components: Tests Affects Versions: 3.0.0 Reporter: Wenchen Fan https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122655/testReport/ https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-sbt-hadoop-2.7-hive-1.2/668/testReport/ BarrierTaskContextSuite.support multiple barrier() call within a single task {code} sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 1031 was not less than or equal to 1000 at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530) at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529) at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503) at org.apache.spark.scheduler.BarrierTaskContextSuite.$anonfun$new$15(BarrierTaskContextSuite.scala:157) at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:151) at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286) at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196) at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178) at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:58) at 
org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221) at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214) at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:58) at org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229) at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393) at scala.collection.immutable.List.foreach(List.scala:392) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381) at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:376) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:458) at org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229) at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228) at org.scalatest.FunSuite.runTests(FunSuite.scala:1560) at org.scalatest.Suite.run(Suite.scala:1124) at org.scalatest.Suite.run$(Suite.scala:1106) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560) at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233) at org.scalatest.SuperEngine.runImpl(Engine.scala:518) at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233) at org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232) at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:58) at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:58) at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317) at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510) at sbt.ForkMain$Run$2.call(ForkMain.java:296) at sbt.ForkMain$Run$2.call(ForkMain.java:286) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} BarrierTaskContextSuite.global sync by barrier() call {code} sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 1049 was not less than or equal to 1000 at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530) at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529) at
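The BarrierTaskContextSuite failures above ("1031 was not less than or equal to 1000", "1049 was not less than or equal to 1000") report the same pattern: tasks record when they leave a `barrier()` call, and the test asserts those timestamps fall within a fixed tolerance. An illustrative sketch of that assertion (hypothetical helper, not Spark's actual test code; the 1000 ms tolerance is taken from the failure messages):

```scala
// Assert that all barrier-exit timestamps are clustered within `toleranceMs`.
// A slow CI machine can push the spread past the tolerance, which makes the
// check inherently timing-sensitive -- hence the flakiness being tracked here.
def assertBarrierSync(exitTimesMs: Seq[Long], toleranceMs: Long = 1000L): Unit = {
  require(exitTimesMs.nonEmpty, "need at least one task timestamp")
  val spread = exitTimesMs.max - exitTimesMs.min
  assert(spread <= toleranceMs,
    s"$spread was not less than or equal to $toleranceMs")
}
```

Under load, a spread of 1031 ms or 1049 ms trips the 1000 ms bound even though the barrier itself worked, which is why these tests were candidates for disabling in SPARK-31732.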