[jira] [Commented] (SPARK-31736) Nested column pruning for other operators

2020-05-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17109341#comment-17109341
 ] 

Apache Spark commented on SPARK-31736:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/28556

> Nested column pruning for other operators
> -
>
> Key: SPARK-31736
> URL: https://issues.apache.org/jira/browse/SPARK-31736
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> Currently we push nested column pruning through only a few operators, such as 
> LIMIT and SAMPLE. This ticket tracks adding nested column pruning support for 
> other operators.
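For context, a minimal sketch of the optimization (the path and schema below are
made up): when pruning is pushed through an operator, the scan reads only the
nested leaf that is actually referenced.

{code:java}
// Hypothetical input: name struct<first: string, last: string>, address string
val contacts = spark.read.parquet("/tmp/contacts")

// Pruning is already pushed through LIMIT: the scan's ReadSchema keeps only
// name.first rather than the whole name struct.
contacts.limit(10).select("name.first").explain()
{code}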






[jira] [Assigned] (SPARK-31736) Nested column pruning for other operators

2020-05-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31736:


Assignee: L. C. Hsieh  (was: Apache Spark)

> Nested column pruning for other operators
> -
>
> Key: SPARK-31736
> URL: https://issues.apache.org/jira/browse/SPARK-31736
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> Currently we push nested column pruning through only a few operators, such as 
> LIMIT and SAMPLE. This ticket tracks adding nested column pruning support for 
> other operators.






[jira] [Assigned] (SPARK-31736) Nested column pruning for other operators

2020-05-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31736:


Assignee: Apache Spark  (was: L. C. Hsieh)

> Nested column pruning for other operators
> -
>
> Key: SPARK-31736
> URL: https://issues.apache.org/jira/browse/SPARK-31736
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: L. C. Hsieh
>Assignee: Apache Spark
>Priority: Major
>
> Currently we push nested column pruning through only a few operators, such as 
> LIMIT and SAMPLE. This ticket tracks adding nested column pruning support for 
> other operators.






[jira] [Commented] (SPARK-31736) Nested column pruning for other operators

2020-05-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17109339#comment-17109339
 ] 

Apache Spark commented on SPARK-31736:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/28556

> Nested column pruning for other operators
> -
>
> Key: SPARK-31736
> URL: https://issues.apache.org/jira/browse/SPARK-31736
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> Currently we push nested column pruning through only a few operators, such as 
> LIMIT and SAMPLE. This ticket tracks adding nested column pruning support for 
> other operators.






[jira] [Updated] (SPARK-31736) Nested column pruning for other operators

2020-05-16 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh updated SPARK-31736:

Parent: SPARK-25603
Issue Type: Sub-task  (was: Improvement)

> Nested column pruning for other operators
> -
>
> Key: SPARK-31736
> URL: https://issues.apache.org/jira/browse/SPARK-31736
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> Currently we push nested column pruning through only a few operators, such as 
> LIMIT and SAMPLE. This ticket tracks adding nested column pruning support for 
> other operators.






[jira] [Created] (SPARK-31736) Nested column pruning for other operators

2020-05-16 Thread L. C. Hsieh (Jira)
L. C. Hsieh created SPARK-31736:
---

 Summary: Nested column pruning for other operators
 Key: SPARK-31736
 URL: https://issues.apache.org/jira/browse/SPARK-31736
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.0
Reporter: L. C. Hsieh
Assignee: L. C. Hsieh


Currently we push nested column pruning through only a few operators, such as 
LIMIT and SAMPLE. This ticket tracks adding nested column pruning support for 
other operators.






[jira] [Commented] (SPARK-25841) Redesign window function rangeBetween API

2020-05-16 Thread Shyam (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17109313#comment-17109313
 ] 

Shyam commented on SPARK-25841:
---

[~rxin] Is this fixed in the latest version, i.e. 2.4.3? Or does this issue 
still persist?

> Redesign window function rangeBetween API
> -
>
> Key: SPARK-25841
> URL: https://issues.apache.org/jira/browse/SPARK-25841
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>Priority: Major
>
> As I was reviewing the Spark API changes for 2.4, I found that through 
> organic, ad-hoc evolution the current API for window functions in Scala is 
> pretty bad.
>   
>  To illustrate the problem, we have two rangeBetween functions in Window 
> class:
>   
> {code:java}
> class Window {
>  def unboundedPreceding: Long
>  ...
>  def rangeBetween(start: Long, end: Long): WindowSpec
>  def rangeBetween(start: Column, end: Column): WindowSpec
> }{code}
>  
>  The Column version of rangeBetween was added in Spark 2.3 because the 
> previous version (Long) could only support integral values and not time 
> intervals. Now in order to support specifying unboundedPreceding in the 
> rangeBetween(Column, Column) API, we added an unboundedPreceding that returns 
> a Column in functions.scala.
>   
>  There are a few issues I have with the API:
>   
>  1. To the end user, this can be just super confusing. Why are there two 
> unboundedPreceding functions, in different classes, that are named the same 
> but return different types?
>   
>  2. Using Column as the parameter signature implies this can be an actual 
> Column, but in practice rangeBetween can only accept literal values.
>   
>  3. We added the new APIs to support intervals, but they don't actually work, 
> because in the implementation we try to validate the start is less than the 
> end, but calendar interval types are not comparable, and as a result we throw 
> a type mismatch exception at runtime: scala.MatchError: CalendarIntervalType 
> (of class org.apache.spark.sql.types.CalendarIntervalType$)
>   
>  4. In order to make interval work, users need to create an interval using 
> CalendarInterval, which is an internal class that has no documentation and no 
> stable API.
>   
>   
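To make issue 1 concrete, a small sketch against the 2.3/2.4 API described
above (the column name is illustrative):

{code:java}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions

// Long-based API: Window.unboundedPreceding is a Long
val byLong = Window.orderBy("ts")
  .rangeBetween(Window.unboundedPreceding, Window.currentRow)

// Column-based API added in 2.3: functions.unboundedPreceding() returns a
// Column, yet rangeBetween only accepts literal values here in practice.
val byColumn = Window.orderBy("ts")
  .rangeBetween(functions.unboundedPreceding(), functions.currentRow())
{code}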






[jira] [Resolved] (SPARK-31085) Amend Spark's Semantic Versioning Policy

2020-05-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-31085.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

> Amend Spark's Semantic Versioning Policy
> 
>
> Key: SPARK-31085
> URL: https://issues.apache.org/jira/browse/SPARK-31085
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark, Spark Core, SQL
>Affects Versions: 3.0.0
>Reporter: Michael Armbrust
>Assignee: Michael Armbrust
>Priority: Blocker
> Fix For: 3.0.0
>
>
> This issue tracks all the activity for the following discussion and vote.
> - 
> https://lists.apache.org/thread.html/r82f99ad8c2798629eed66d65f2cddc1ed196dddf82e8e9370f3b7d32%40%3Cdev.spark.apache.org%3E
> - 
> https://lists.apache.org/thread.html/r683dbb0481adb1944461b6e1a60aafc44a66423c6e9fa2bab24a07db%40%3Cdev.spark.apache.org%3E






[jira] [Resolved] (SPARK-31404) file source backward compatibility after calendar switch

2020-05-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-31404.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

> file source backward compatibility after calendar switch
> 
>
> Key: SPARK-31404
> URL: https://issues.apache.org/jira/browse/SPARK-31404
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Blocker
> Fix For: 3.0.0
>
>
> In Spark 3.0, we switched to the Proleptic Gregorian calendar by using the Java 
> 8 datetime APIs. This makes Spark follow the ISO and SQL standards, but 
> introduces some backward compatibility problems:
> 1. Spark 3.0 may read wrong data from data files written by Spark 2.4.
> 2. There may be a performance regression.
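As a sketch of how this can be addressed (the conf names below are the ones
that shipped in 3.0, but treat them as an assumption here), the reader can be
told explicitly whether to rebase old files:

{code:java}
// Assumed modes: EXCEPTION (fail, the default), LEGACY (rebase), CORRECTED (no rebase)
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "LEGACY")
spark.read.parquet("/path/written/by/spark-2.4").show() // path illustrative
{code}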






[jira] [Assigned] (SPARK-31405) fail by default when read/write datetime values and not sure if they need rebase or not

2020-05-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-31405:
---

Assignee: Wenchen Fan

> fail by default when read/write datetime values and not sure if they need 
> rebase or not
> ---
>
> Key: SPARK-31405
> URL: https://issues.apache.org/jira/browse/SPARK-31405
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>







[jira] [Resolved] (SPARK-31405) fail by default when read/write datetime values and not sure if they need rebase or not

2020-05-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-31405.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 28526
[https://github.com/apache/spark/pull/28526]

> fail by default when read/write datetime values and not sure if they need 
> rebase or not
> ---
>
> Key: SPARK-31405
> URL: https://issues.apache.org/jira/browse/SPARK-31405
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.0.0
>
>







[jira] [Assigned] (SPARK-31707) Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax

2020-05-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-31707:
---

Assignee: Jungtaek Lim

> Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax
> -
>
> Key: SPARK-31707
> URL: https://issues.apache.org/jira/browse/SPARK-31707
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Blocker
>
> According to the latest status of the discussion on the dev@ mailing list, 
> [[DISCUSS] Resolve ambiguous parser rule between two "create 
> table"s|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Resolve-ambiguous-parser-rule-between-two-quot-create-table-quot-s-td29051i20.html],
> we want to revert the SPARK-30098 change first to unblock Spark 3.0.0.
> This issue tracks the revert effort.
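A sketch of the ambiguity behind the revert (table names illustrative): only
the clause-less form is affected.

{code:java}
// Ambiguous form: a Hive serde table in 2.4; SPARK-30098 made it a default
// datasource table. The revert restores the 2.4 behavior.
spark.sql("CREATE TABLE t (id INT)")

// Explicit forms are unambiguous either way.
spark.sql("CREATE TABLE t_native (id INT) USING parquet")
spark.sql("CREATE TABLE t_hive (id INT) STORED AS parquet")
{code}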






[jira] [Resolved] (SPARK-31707) Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax

2020-05-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-31707.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 28517
[https://github.com/apache/spark/pull/28517]

> Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax
> -
>
> Key: SPARK-31707
> URL: https://issues.apache.org/jira/browse/SPARK-31707
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Blocker
> Fix For: 3.0.0
>
>
> According to the latest status of the discussion on the dev@ mailing list, 
> [[DISCUSS] Resolve ambiguous parser rule between two "create 
> table"s|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Resolve-ambiguous-parser-rule-between-two-quot-create-table-quot-s-td29051i20.html],
> we want to revert the SPARK-30098 change first to unblock Spark 3.0.0.
> This issue tracks the revert effort.






[jira] [Resolved] (SPARK-31725) Set America/Los_Angeles time zone and Locale.US by default

2020-05-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-31725.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 28548
[https://github.com/apache/spark/pull/28548]

> Set America/Los_Angeles time zone and Locale.US by default
> --
>
> Key: SPARK-31725
> URL: https://issues.apache.org/jira/browse/SPARK-31725
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core, SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.0.0
>
>
> Move default time zone and locale settings to SparkFunSuite and set:
> # America/Los_Angeles as the default time zone
> # Locale.US as the default locale
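A minimal sketch of what pinning those defaults in a shared test base class
might look like (the exact placement inside SparkFunSuite is an assumption):

{code:java}
import java.util.{Locale, TimeZone}

// Pin JVM-wide defaults so test output is stable across machines.
TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"))
Locale.setDefault(Locale.US)
{code}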






[jira] [Assigned] (SPARK-31725) Set America/Los_Angeles time zone and Locale.US by default

2020-05-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-31725:
---

Assignee: Maxim Gekk

> Set America/Los_Angeles time zone and Locale.US by default
> --
>
> Key: SPARK-31725
> URL: https://issues.apache.org/jira/browse/SPARK-31725
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core, SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>
> Move default time zone and locale settings to SparkFunSuite and set:
> # America/Los_Angeles as the default time zone
> # Locale.US as the default locale






[jira] [Commented] (SPARK-25985) Verify the SPARK-24613 Cache with UDF could not be matched with subsequent dependent caches

2020-05-16 Thread Nick Afshartous (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17109201#comment-17109201
 ] 

Nick Afshartous commented on SPARK-25985:
-

[~smilegator] Could you please comment on whether this task is still relevant? 
If so, I'd like to look into it, though I'd appreciate it if you could 
elaborate on what "works well" means in the description.

> Verify the SPARK-24613 Cache with UDF could not be matched with subsequent 
> dependent caches
> ---
>
> Key: SPARK-25985
> URL: https://issues.apache.org/jira/browse/SPARK-25985
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Priority: Major
>  Labels: starter
>
> Verify whether recacheByCondition works well when the cached data is computed 
> with a UDF. This is a follow-up of https://github.com/apache/spark/pull/21602
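For reference, a sketch of the scenario to verify (the UDF and data are made
up): a cached plan built with a UDF should still be matched by a dependent
cache built on top of it.

{code:java}
import org.apache.spark.sql.functions.udf
import spark.implicits._

val plusOne = udf((x: Long) => x + 1)
val base = spark.range(10).select(plusOne($"id").as("v")).cache()

// The dependent cache should reuse the UDF-backed cache above, and
// recacheByCondition should refresh both consistently.
val dependent = base.filter($"v" > 3).cache()
dependent.count()
{code}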






[jira] [Assigned] (SPARK-31735) Include all columns in the summary report

2020-05-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31735:


Assignee: (was: Apache Spark)

> Include all columns in the summary report
> -
>
> Key: SPARK-31735
> URL: https://issues.apache.org/jira/browse/SPARK-31735
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 2.4.5
>Reporter: Fokko Driesprong
>Priority: Major
>
> Dates and other columns are excluded:
>  
> {code:python}
> from datetime import datetime, timedelta, timezone
> from pyspark.sql import types as T
> from pyspark.sql import Row
> from pyspark.sql import functions as F
>
> START = datetime(2014, 1, 1, tzinfo=timezone.utc)
> n_days = 22
> date_range = [Row(date=(START + timedelta(days=n))) for n in range(0, n_days)]
> schema = T.StructType([T.StructField(name="date", dataType=T.DateType(), nullable=False)])
>
> rdd = spark.sparkContext.parallelize(date_range)
> df = spark.createDataFrame(data=rdd, schema=schema)
>
> df.agg(F.max("date")).show()
> df.summary().show()
> {code}
> {code}
> +-------+
> |summary|
> +-------+
> | count |
> | mean  |
> | stddev|
> | min   |
> | 25%   |
> | 50%   |
> | 75%   |
> | max   |
> +-------+
> {code}






[jira] [Commented] (SPARK-31735) Include all columns in the summary report

2020-05-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17109148#comment-17109148
 ] 

Apache Spark commented on SPARK-31735:
--

User 'Fokko' has created a pull request for this issue:
https://github.com/apache/spark/pull/28554

> Include all columns in the summary report
> -
>
> Key: SPARK-31735
> URL: https://issues.apache.org/jira/browse/SPARK-31735
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 2.4.5
>Reporter: Fokko Driesprong
>Priority: Major
>
> Dates and other columns are excluded:
>  
> {code:python}
> from datetime import datetime, timedelta, timezone
> from pyspark.sql import types as T
> from pyspark.sql import Row
> from pyspark.sql import functions as F
>
> START = datetime(2014, 1, 1, tzinfo=timezone.utc)
> n_days = 22
> date_range = [Row(date=(START + timedelta(days=n))) for n in range(0, n_days)]
> schema = T.StructType([T.StructField(name="date", dataType=T.DateType(), nullable=False)])
>
> rdd = spark.sparkContext.parallelize(date_range)
> df = spark.createDataFrame(data=rdd, schema=schema)
>
> df.agg(F.max("date")).show()
> df.summary().show()
> {code}
> {code}
> +-------+
> |summary|
> +-------+
> | count |
> | mean  |
> | stddev|
> | min   |
> | 25%   |
> | 50%   |
> | 75%   |
> | max   |
> +-------+
> {code}






[jira] [Assigned] (SPARK-31735) Include all columns in the summary report

2020-05-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31735:


Assignee: Apache Spark

> Include all columns in the summary report
> -
>
> Key: SPARK-31735
> URL: https://issues.apache.org/jira/browse/SPARK-31735
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 2.4.5
>Reporter: Fokko Driesprong
>Assignee: Apache Spark
>Priority: Major
>
> Dates and other columns are excluded:
>  
> {code:python}
> from datetime import datetime, timedelta, timezone
> from pyspark.sql import types as T
> from pyspark.sql import Row
> from pyspark.sql import functions as F
>
> START = datetime(2014, 1, 1, tzinfo=timezone.utc)
> n_days = 22
> date_range = [Row(date=(START + timedelta(days=n))) for n in range(0, n_days)]
> schema = T.StructType([T.StructField(name="date", dataType=T.DateType(), nullable=False)])
>
> rdd = spark.sparkContext.parallelize(date_range)
> df = spark.createDataFrame(data=rdd, schema=schema)
>
> df.agg(F.max("date")).show()
> df.summary().show()
> {code}
> {code}
> +-------+
> |summary|
> +-------+
> | count |
> | mean  |
> | stddev|
> | min   |
> | 25%   |
> | 50%   |
> | 75%   |
> | max   |
> +-------+
> {code}






[jira] [Commented] (SPARK-31734) add weight support in ClusteringEvaluator

2020-05-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17109149#comment-17109149
 ] 

Apache Spark commented on SPARK-31734:
--

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/28553

> add weight support in ClusteringEvaluator
> -
>
> Key: SPARK-31734
> URL: https://issues.apache.org/jira/browse/SPARK-31734
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Affects Versions: 3.1.0
>Reporter: Huaxin Gao
>Priority: Major
>
> add weight support in ClusteringEvaluator
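Hypothetical usage once the support lands (`setWeightCol` is an assumed setter
name, mirroring the other ML evaluators):

{code:java}
import org.apache.spark.ml.evaluation.ClusteringEvaluator

// predictions: a DataFrame with features, prediction, and per-row weight columns
val evaluator = new ClusteringEvaluator()
  .setFeaturesCol("features")
  .setPredictionCol("prediction")
  .setWeightCol("weight") // assumed new param from this ticket

val silhouette = evaluator.evaluate(predictions)
{code}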






[jira] [Assigned] (SPARK-31734) add weight support in ClusteringEvaluator

2020-05-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31734:


Assignee: Apache Spark

> add weight support in ClusteringEvaluator
> -
>
> Key: SPARK-31734
> URL: https://issues.apache.org/jira/browse/SPARK-31734
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Affects Versions: 3.1.0
>Reporter: Huaxin Gao
>Assignee: Apache Spark
>Priority: Major
>
> add weight support in ClusteringEvaluator






[jira] [Commented] (SPARK-31734) add weight support in ClusteringEvaluator

2020-05-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17109147#comment-17109147
 ] 

Apache Spark commented on SPARK-31734:
--

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/28553

> add weight support in ClusteringEvaluator
> -
>
> Key: SPARK-31734
> URL: https://issues.apache.org/jira/browse/SPARK-31734
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Affects Versions: 3.1.0
>Reporter: Huaxin Gao
>Priority: Major
>
> add weight support in ClusteringEvaluator






[jira] [Assigned] (SPARK-31734) add weight support in ClusteringEvaluator

2020-05-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31734:


Assignee: (was: Apache Spark)

> add weight support in ClusteringEvaluator
> -
>
> Key: SPARK-31734
> URL: https://issues.apache.org/jira/browse/SPARK-31734
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Affects Versions: 3.1.0
>Reporter: Huaxin Gao
>Priority: Major
>
> add weight support in ClusteringEvaluator






[jira] [Created] (SPARK-31735) Include all columns in the summary report

2020-05-16 Thread Fokko Driesprong (Jira)
Fokko Driesprong created SPARK-31735:


 Summary: Include all columns in the summary report
 Key: SPARK-31735
 URL: https://issues.apache.org/jira/browse/SPARK-31735
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Affects Versions: 2.4.5
Reporter: Fokko Driesprong


Dates and other columns are excluded:

 

{code:python}
from datetime import datetime, timedelta, timezone
from pyspark.sql import types as T
from pyspark.sql import Row
from pyspark.sql import functions as F

START = datetime(2014, 1, 1, tzinfo=timezone.utc)
n_days = 22
date_range = [Row(date=(START + timedelta(days=n))) for n in range(0, n_days)]
schema = T.StructType([T.StructField(name="date", dataType=T.DateType(), nullable=False)])

rdd = spark.sparkContext.parallelize(date_range)
df = spark.createDataFrame(data=rdd, schema=schema)

df.agg(F.max("date")).show()
df.summary().show()
{code}
{code}
+-------+
|summary|
+-------+
| count |
| mean  |
| stddev|
| min   |
| 25%   |
| 50%   |
| 75%   |
| max   |
+-------+
{code}






[jira] [Created] (SPARK-31734) add weight support in ClusteringEvaluator

2020-05-16 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-31734:
--

 Summary: add weight support in ClusteringEvaluator
 Key: SPARK-31734
 URL: https://issues.apache.org/jira/browse/SPARK-31734
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Affects Versions: 3.1.0
Reporter: Huaxin Gao


add weight support in ClusteringEvaluator






[jira] [Commented] (SPARK-31235) Separates different categories of applications

2020-05-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17109088#comment-17109088
 ] 

Apache Spark commented on SPARK-31235:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/28552

> Separates different categories of applications
> --
>
> Key: SPARK-31235
> URL: https://issues.apache.org/jira/browse/SPARK-31235
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: wangzhun
>Assignee: wangzhun
>Priority: Minor
> Fix For: 3.1.0
>
>
> The current application type defaults to SPARK. 
> In fact, different types of applications have different characteristics and 
> are suitable for different scenarios, for example SPARK-SQL and SPARK-STREAMING.
> I recommend distinguishing them via the parameter `spark.yarn.applicationType` 
> so that we can more easily manage and maintain different types of 
> applications.
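A sketch of tagging a job at submit time (the value is illustrative;
`spark.yarn.applicationType` is the parameter this ticket proposes):

{code:java}
import org.apache.spark.SparkConf

// Tag this application so YARN can group SQL jobs apart from streaming jobs.
val conf = new SparkConf()
  .set("spark.yarn.applicationType", "SPARK-SQL")
{code}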






[jira] [Created] (SPARK-31733) Make YarnClient.`specify a more specific type for the application` pass in Hadoop-3.2

2020-05-16 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-31733:
-

 Summary: Make YarnClient.`specify a more specific type for the 
application` pass in Hadoop-3.2
 Key: SPARK-31733
 URL: https://issues.apache.org/jira/browse/SPARK-31733
 Project: Spark
  Issue Type: Bug
  Components: Tests
Affects Versions: 3.0.0
Reporter: Dongjoon Hyun









[jira] [Resolved] (SPARK-31732) disable some flaky test

2020-05-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-31732.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 28547
[https://github.com/apache/spark/pull/28547]

> disable some flaky test
> ---
>
> Key: SPARK-31732
> URL: https://issues.apache.org/jira/browse/SPARK-31732
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.0.0
>
>
> https://issues.apache.org/jira/browse/SPARK-31722
> https://issues.apache.org/jira/browse/SPARK-31723
> https://issues.apache.org/jira/browse/SPARK-31729
> https://issues.apache.org/jira/browse/SPARK-31728
> https://issues.apache.org/jira/browse/SPARK-31730
> https://issues.apache.org/jira/browse/SPARK-31731






[jira] [Resolved] (SPARK-31289) Flaky test: org.apache.spark.sql.hive.thriftserver.CliSuite

2020-05-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-31289.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 28055
[https://github.com/apache/spark/pull/28055]

> Flaky test: org.apache.spark.sql.hive.thriftserver.CliSuite
> ---
>
> Key: SPARK-31289
> URL: https://issues.apache.org/jira/browse/SPARK-31289
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
> Fix For: 3.0.0
>
>
> {code:java}
> Caused by: MetaException(message:Unable to open a test connection to the 
> given database. JDBC url = 
> jdbc:derby:;databaseName=/home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/spark-17bf6d71-1e68-4e56-b656-f1b2fd2e15fb;create=true,
>  username = APP. Terminating connection pool (set lazyInit to true if you 
> expect to start your database after your app). Original Exception: --
> 2020-03-27 09:20:11.949 - stderr> java.sql.SQLException: Failed to create 
> database 
> '/home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/spark-17bf6d71-1e68-4e56-b656-f1b2fd2e15fb',
>  see the next exception for details.
> {code}
> {code:java}
> Caused by: ERROR XBM0A: The database directory 
> '/home/jenkins/workspace/SparkPullRequestBuilder@4/target/tmp/spark-84c8ff0e-214f-416c-9d44-ab19f864a79b'
>  exists. However, it does not contain the expected 'service.properties' file. 
> Perhaps Derby was brought down in the middle of creating this database. You 
> may want to delete this directory and try creating the database again.
> {code}






[jira] [Assigned] (SPARK-31289) Flaky test: org.apache.spark.sql.hive.thriftserver.CliSuite

2020-05-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-31289:
---

Assignee: Kent Yao

> Flaky test: org.apache.spark.sql.hive.thriftserver.CliSuite
> ---
>
> Key: SPARK-31289
> URL: https://issues.apache.org/jira/browse/SPARK-31289
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
>
> {code:java}
> Caused by: MetaException(message:Unable to open a test connection to the 
> given database. JDBC url = 
> jdbc:derby:;databaseName=/home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/spark-17bf6d71-1e68-4e56-b656-f1b2fd2e15fb;create=true,
>  username = APP. Terminating connection pool (set lazyInit to true if you 
> expect to start your database after your app). Original Exception: --
> 2020-03-27 09:20:11.949 - stderr> java.sql.SQLException: Failed to create 
> database 
> '/home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/spark-17bf6d71-1e68-4e56-b656-f1b2fd2e15fb',
>  see the next exception for details.
> {code}
> {code:java}
> Caused by: ERROR XBM0A: The database directory 
> '/home/jenkins/workspace/SparkPullRequestBuilder@4/target/tmp/spark-84c8ff0e-214f-416c-9d44-ab19f864a79b'
>  exists. However, it does not contain the expected 'service.properties' file. 
> Perhaps Derby was brought down in the middle of creating this database. You 
> may want to delete this directory and try creating the database again.
> {code}






[jira] [Assigned] (SPARK-31732) disable some flaky test

2020-05-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31732:


Assignee: Apache Spark  (was: Wenchen Fan)

> disable some flaky test
> ---
>
> Key: SPARK-31732
> URL: https://issues.apache.org/jira/browse/SPARK-31732
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>
> https://issues.apache.org/jira/browse/SPARK-31722
> https://issues.apache.org/jira/browse/SPARK-31723
> https://issues.apache.org/jira/browse/SPARK-31729
> https://issues.apache.org/jira/browse/SPARK-31728
> https://issues.apache.org/jira/browse/SPARK-31730
> https://issues.apache.org/jira/browse/SPARK-31731






[jira] [Assigned] (SPARK-31732) disable some flaky test

2020-05-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31732:


Assignee: Wenchen Fan  (was: Apache Spark)

> disable some flaky test
> ---
>
> Key: SPARK-31732
> URL: https://issues.apache.org/jira/browse/SPARK-31732
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>
> https://issues.apache.org/jira/browse/SPARK-31722
> https://issues.apache.org/jira/browse/SPARK-31723
> https://issues.apache.org/jira/browse/SPARK-31729
> https://issues.apache.org/jira/browse/SPARK-31728
> https://issues.apache.org/jira/browse/SPARK-31730
> https://issues.apache.org/jira/browse/SPARK-31731






[jira] [Commented] (SPARK-31732) disable some flaky test

2020-05-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108882#comment-17108882
 ] 

Apache Spark commented on SPARK-31732:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/28547

> disable some flaky test
> ---
>
> Key: SPARK-31732
> URL: https://issues.apache.org/jira/browse/SPARK-31732
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>
> https://issues.apache.org/jira/browse/SPARK-31722
> https://issues.apache.org/jira/browse/SPARK-31723
> https://issues.apache.org/jira/browse/SPARK-31729
> https://issues.apache.org/jira/browse/SPARK-31728
> https://issues.apache.org/jira/browse/SPARK-31730
> https://issues.apache.org/jira/browse/SPARK-31731






[jira] [Updated] (SPARK-31732) disable some flaky test

2020-05-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-31732:

Description: 
https://issues.apache.org/jira/browse/SPARK-31722
https://issues.apache.org/jira/browse/SPARK-31723
https://issues.apache.org/jira/browse/SPARK-31729
https://issues.apache.org/jira/browse/SPARK-31728
https://issues.apache.org/jira/browse/SPARK-31730
https://issues.apache.org/jira/browse/SPARK-31731

  was:
https://issues.apache.org/jira/browse/SPARK-31722
https://issues.apache.org/jira/browse/SPARK-31723
https://issues.apache.org/jira/browse/SPARK-31729
https://issues.apache.org/jira/browse/SPARK-31728
https://issues.apache.org/jira/browse/SPARK-31730


> disable some flaky test
> ---
>
> Key: SPARK-31732
> URL: https://issues.apache.org/jira/browse/SPARK-31732
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>
> https://issues.apache.org/jira/browse/SPARK-31722
> https://issues.apache.org/jira/browse/SPARK-31723
> https://issues.apache.org/jira/browse/SPARK-31729
> https://issues.apache.org/jira/browse/SPARK-31728
> https://issues.apache.org/jira/browse/SPARK-31730
> https://issues.apache.org/jira/browse/SPARK-31731






[jira] [Updated] (SPARK-31732) disable some flaky test

2020-05-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-31732:

Description: 
https://issues.apache.org/jira/browse/SPARK-31722
https://issues.apache.org/jira/browse/SPARK-31723
https://issues.apache.org/jira/browse/SPARK-31729
https://issues.apache.org/jira/browse/SPARK-31728
https://issues.apache.org/jira/browse/SPARK-31730

  was:

https://issues.apache.org/jira/browse/SPARK-31728


> disable some flaky test
> ---
>
> Key: SPARK-31732
> URL: https://issues.apache.org/jira/browse/SPARK-31732
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>
> https://issues.apache.org/jira/browse/SPARK-31722
> https://issues.apache.org/jira/browse/SPARK-31723
> https://issues.apache.org/jira/browse/SPARK-31729
> https://issues.apache.org/jira/browse/SPARK-31728
> https://issues.apache.org/jira/browse/SPARK-31730






[jira] [Updated] (SPARK-31732) disable some flaky test

2020-05-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-31732:

Description: 

https://issues.apache.org/jira/browse/SPARK-31728

> disable some flaky test
> ---
>
> Key: SPARK-31732
> URL: https://issues.apache.org/jira/browse/SPARK-31732
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>
> https://issues.apache.org/jira/browse/SPARK-31728






[jira] [Created] (SPARK-31731) flaky test: org.apache.spark.sql.kafka010.KafkaMicroBatchV1SourceSuite

2020-05-16 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-31731:
---

 Summary: flaky test: 
org.apache.spark.sql.kafka010.KafkaMicroBatchV1SourceSuite
 Key: SPARK-31731
 URL: https://issues.apache.org/jira/browse/SPARK-31731
 Project: Spark
  Issue Type: Test
  Components: Tests
Affects Versions: 3.0.0
Reporter: Wenchen Fan


https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-sbt-hadoop-2.7-hive-1.2/668/testReport/

KafkaMicroBatchV1SourceSuite.subscribing topic by pattern with topic deletions
{code}
sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 
Timed out waiting for stream: The code passed to eventually never returned 
normally. Attempted 304 times over 1.000842521668 minutes. Last failure 
message: KafkaTestUtils.this.zkClient.isTopicMarkedForDeletion(topic) was true 
topic is still marked for deletion.
org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:432)
org.scalatest.concurrent.Eventually.eventually(Eventually.scala:439)
org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:391)
org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:479)
org.scalatest.concurrent.Eventually.eventually(Eventually.scala:308)
org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:307)
org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:479)

org.apache.spark.sql.kafka010.KafkaTestUtils.verifyTopicDeletionWithRetries(KafkaTestUtils.scala:618)

org.apache.spark.sql.kafka010.KafkaTestUtils.deleteTopic(KafkaTestUtils.scala:410)

org.apache.spark.sql.kafka010.KafkaMicroBatchSourceSuiteBase.$anonfun$new$20(KafkaMicroBatchSourceSuite.scala:379)

Caused by:  
KafkaTestUtils.this.zkClient.isTopicMarkedForDeletion(topic) was true topic is 
still marked for deletion

org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)

org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)

org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1389)

org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503)

org.apache.spark.sql.kafka010.KafkaTestUtils.verifyTopicDeletion(KafkaTestUtils.scala:590)

org.apache.spark.sql.kafka010.KafkaTestUtils.$anonfun$verifyTopicDeletionWithRetries$1(KafkaTestUtils.scala:620)

scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)

org.scalatest.concurrent.Eventually.makeAValiantAttempt$1(Eventually.scala:395)

org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:409)

org.scalatest.concurrent.Eventually.eventually(Eventually.scala:439)


== Progress ==
   AssertOnQuery(, )
   AddKafkaData(topics = Set(topic-31-seems), data = WrappedArray(1, 2, 3), 
message = )
   CheckAnswer: [2],[3],[4]
=> Assert(, )
   AddKafkaData(topics = Set(topic-31-bad), data = WrappedArray(4, 5, 6), 
message = )
   CheckAnswer: [2],[3],[4],[5],[6],[7]

== Stream ==
Output Mode: Append
Stream state: {KafkaSourceV1[SubscribePattern[topic-31-.*]]: {}}
Thread state: alive
Thread stack trace: java.lang.Thread.sleep(Native Method)
org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$1(MicroBatchExecution.scala:241)
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$Lambda$2829/1543669599.apply$mcZ$sp(Unknown
 Source)
org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:57)
org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:185)
org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:333)
org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:244)


== Sink ==
0: 
1: [2]
2: [4] [3]
3: 


== Plan ==
== Parsed Logical Plan ==
WriteToDataSourceV2 
org.apache.spark.sql.execution.streaming.sources.MicroBatchWrite@2f31f781
+- SerializeFromObject [input[0, int, false] AS value#8108]
   +- MapElements 
org.apache.spark.sql.kafka010.KafkaMicroBatchSourceSuiteBase$$Lambda$5466/109510938@420a5093,
 class scala.Tuple2, [StructField(_1,StringType,true), 
StructField(_2,StringType,true)], obj#8107: int
  +- DeserializeToObject newInstance(class scala.Tuple2), obj#8106: 
scala.Tuple2
 +- Project [cast(key#8082 as string) AS key#8096, cast(value#8083 as 
string) AS value#8097]
+- Project [key#8183 AS key#8082, value#8184 AS value#8083, 
topic#8185 AS topic#8084, partition#8186 AS partition#8085, offset#8187L AS 
offset#8086L, timestamp#8188 AS timestamp#8087, timestampType#8189 AS 
timestampType#8088]
   +- 

[jira] [Created] (SPARK-31732) disable some flaky test

2020-05-16 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-31732:
---

 Summary: disable some flaky test
 Key: SPARK-31732
 URL: https://issues.apache.org/jira/browse/SPARK-31732
 Project: Spark
  Issue Type: Test
  Components: Tests
Affects Versions: 3.0.0
Reporter: Wenchen Fan
Assignee: Wenchen Fan









[jira] [Created] (SPARK-31730) flaky test: org.apache.spark.scheduler.BarrierTaskContextSuite

2020-05-16 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-31730:
---

 Summary: flaky test: 
org.apache.spark.scheduler.BarrierTaskContextSuite
 Key: SPARK-31730
 URL: https://issues.apache.org/jira/browse/SPARK-31730
 Project: Spark
  Issue Type: Test
  Components: Tests
Affects Versions: 3.0.0
Reporter: Wenchen Fan


https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122655/testReport/
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-sbt-hadoop-2.7-hive-1.2/668/testReport/

BarrierTaskContextSuite.support multiple barrier() call within a single task
{code}
sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 1031 was 
not less than or equal to 1000
at 
org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
at 
org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
at 
org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
at 
org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503)
at 
org.apache.spark.scheduler.BarrierTaskContextSuite.$anonfun$new$15(BarrierTaskContextSuite.scala:157)
at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:151)
at 
org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286)
at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
at 
org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:58)
at 
org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221)
at 
org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214)
at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:58)
at 
org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229)
at 
org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381)
at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:376)
at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:458)
at org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229)
at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228)
at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
at org.scalatest.Suite.run(Suite.scala:1124)
at org.scalatest.Suite.run$(Suite.scala:1106)
at 
org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233)
at org.scalatest.SuperEngine.runImpl(Engine.scala:518)
at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233)
at org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232)
at 
org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:58)
at 
org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:58)
at 
org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
at 
org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
at sbt.ForkMain$Run$2.call(ForkMain.java:296)
at sbt.ForkMain$Run$2.call(ForkMain.java:286)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}

BarrierTaskContextSuite.global sync by barrier() call
{code}
sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 1049 was 
not less than or equal to 1000
at 
org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
at 
org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
at