[jira] [Updated] (SPARK-30531) Duplicate query plan on Spark UI SQL page

2020-01-20 Thread Enrico Minack (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enrico Minack updated SPARK-30531:
--
Affects Version/s: 2.4.4

> Duplicate query plan on Spark UI SQL page
> -
>
> Key: SPARK-30531
> URL: https://issues.apache.org/jira/browse/SPARK-30531
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Enrico Minack
>Priority: Major
>
> When you save a Spark UI SQL query page to disk and then open the HTML file 
> in your browser, the query plan is rendered a second time. This change avoids 
> rendering the plan visualization when it already exists.
>  
> !https://user-images.githubusercontent.com/44700269/72543429-fcb8d980-3885-11ea-82aa-c0b3638847e5.png!
> The fix does not call {{renderPlanViz()}} when the plan exists already:
> !https://user-images.githubusercontent.com/44700269/72543641-57523580-3886-11ea-8cdf-5fb0cdffa983.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30593) Revert interval ISO/ANSI SQL Standard output since we decide not to follow ANSI and no round trip.

2020-01-20 Thread Kent Yao (Jira)
Kent Yao created SPARK-30593:


 Summary: Revert interval ISO/ANSI SQL Standard output since we 
decide not to follow ANSI and no round trip.
 Key: SPARK-30593
 URL: https://issues.apache.org/jira/browse/SPARK-30593
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Kent Yao


Revert the interval ISO/ANSI SQL standard output, since we decided not to
follow ANSI and there is no round trip.






[jira] [Resolved] (SPARK-30021) set ownerName and owner type as reserved properties to support v2 catalog

2020-01-20 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-30021.
--
Resolution: Won't Fix

> set ownerName and owner type as reserved properties to support v2 catalog
> -
>
> Key: SPARK-30021
> URL: https://issues.apache.org/jira/browse/SPARK-30021
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Kent Yao
>Priority: Major
>
> ownerName and ownerType properties should be reserved for security and can 
> only be modified by the SET OWNER syntax.






[jira] [Created] (SPARK-30592) to_csv should not support output intervals as same as using CSV file format

2020-01-20 Thread Kent Yao (Jira)
Kent Yao created SPARK-30592:


 Summary: to_csv should not support output intervals as same as 
using CSV file format
 Key: SPARK-30592
 URL: https://issues.apache.org/jira/browse/SPARK-30592
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Kent Yao


to_csv should not support outputting intervals, just as intervals are not 
supported when using the CSV file format.
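
As an editorial illustration only (the struct and column names here are made up, 
and the expected failure is inferred from the ticket's intent, not verified 
behavior), a spark-shell sketch of the call this change would reject:

{code:scala}
// Build a struct containing a calendar interval and try to serialize it with
// to_csv; per this ticket it should be rejected, matching the CSV file format.
import org.apache.spark.sql.functions.{col, expr, struct, to_csv}

val df = spark.range(1).select(struct(expr("INTERVAL 1 day").as("i")).as("s"))
df.select(to_csv(col("s"))).show()   // expected: AnalysisException (assumption)
{code}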






[jira] [Assigned] (SPARK-30587) Run test suites for CSV v1 and JSON v1

2020-01-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-30587:
---

Assignee: Maxim Gekk

> Run test suites for CSV v1 and JSON v1
> --
>
> Key: SPARK-30587
> URL: https://issues.apache.org/jira/browse/SPARK-30587
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>
> Currently, CSVSuite and JSONSuite test only CSV and JSON version 2, but not 
> version 1, which is still supported and can be enabled by users. This ticket 
> aims to add CSVv1Suite and JsonV1Suite, similar to AvroSuite.






[jira] [Resolved] (SPARK-30587) Run test suites for CSV v1 and JSON v1

2020-01-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-30587.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 27294
[https://github.com/apache/spark/pull/27294]

> Run test suites for CSV v1 and JSON v1
> --
>
> Key: SPARK-30587
> URL: https://issues.apache.org/jira/browse/SPARK-30587
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, CSVSuite and JSONSuite test only CSV and JSON version 2, but not 
> version 1, which is still supported and can be enabled by users. This ticket 
> aims to add CSVv1Suite and JsonV1Suite, similar to AvroSuite.






[jira] [Resolved] (SPARK-30475) File source V2: Push data filters for file listing

2020-01-20 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-30475.

  Assignee: Guy Khazma
Resolution: Fixed

This issue is resolved by https://github.com/apache/spark/pull/27157

> File source V2: Push data filters for file listing
> --
>
> Key: SPARK-30475
> URL: https://issues.apache.org/jira/browse/SPARK-30475
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Guy Khazma
>Assignee: Guy Khazma
>Priority: Major
> Fix For: 3.0.0
>
>
> Follow-up to [SPARK-30428|https://github.com/apache/spark/pull/27112], which 
> added support for partition pruning in File source V2.
>  We should also pass the {{dataFilters}} to the {{listFiles}} method.
> Datasources such as {{csv}} and {{json}} do not implement the 
> {{SupportsPushDownFilters}} trait. In order to support data skipping 
> uniformly for all file-based data sources, one can override the {{listFiles}} 
> method in a {{FileIndex}} implementation and use the {{dataFilters}} and 
> {{partitionFilters}} to consult external metadata and prune the list of files.
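
Below is an editorial sketch of the shape described above, not the code from the 
PR: the class name and the pruning step are assumptions; only the idea of 
overriding {{listFiles}} with both filter sets comes from the description.

{code:scala}
// Sketch: a FileIndex wrapper whose listFiles receives partitionFilters and
// dataFilters and could consult external metadata before returning the listing.
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.execution.datasources.{FileIndex, PartitionDirectory}
import org.apache.spark.sql.types.StructType

class MetadataPruningFileIndex(delegate: FileIndex) extends FileIndex {
  override def listFiles(
      partitionFilters: Seq[Expression],
      dataFilters: Seq[Expression]): Seq[PartitionDirectory] = {
    val listed = delegate.listFiles(partitionFilters, dataFilters)
    // A real implementation would prune `listed` here using external metadata
    // keyed by both filter sets; the pruning itself is datasource-specific.
    listed
  }
  override def rootPaths: Seq[Path] = delegate.rootPaths
  override def inputFiles: Array[String] = delegate.inputFiles
  override def refresh(): Unit = delegate.refresh()
  override def sizeInBytes: Long = delegate.sizeInBytes
  override def partitionSchema: StructType = delegate.partitionSchema
}
{code}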






[jira] [Assigned] (SPARK-30568) Invalidate interval type as a field table schema

2020-01-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-30568:
---

Assignee: Kent Yao

> Invalidate interval type as a field table schema
> 
>
> Key: SPARK-30568
> URL: https://issues.apache.org/jira/browse/SPARK-30568
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>
> After this commit 
> https://github.com/apache/spark/commit/d67b98ea016e9b714bef68feaac108edd08159c9,
>  we are able to create or alter tables with interval column types if the 
> external catalog accepts them, which deviates from the interval type's 
> intended internal-only usage.
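
For illustration (the table name is made up, and the exact error is an 
assumption based on the ticket), the kind of statement this change disallows:

{code:scala}
// Declaring an interval column in a table schema should now fail analysis
// instead of being forwarded to the external catalog.
spark.sql("CREATE TABLE t (i INTERVAL) USING parquet")  // expected: AnalysisException
{code}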






[jira] [Resolved] (SPARK-30568) Invalidate interval type as a field table schema

2020-01-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-30568.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 27277
[https://github.com/apache/spark/pull/27277]

> Invalidate interval type as a field table schema
> 
>
> Key: SPARK-30568
> URL: https://issues.apache.org/jira/browse/SPARK-30568
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.0.0
>
>
> After this commit 
> https://github.com/apache/spark/commit/d67b98ea016e9b714bef68feaac108edd08159c9,
>  we are able to create or alter tables with interval column types if the 
> external catalog accepts them, which deviates from the interval type's 
> intended internal-only usage.






[jira] [Created] (SPARK-30591) Remove the nonstandard SET OWNER syntax for namespaces

2020-01-20 Thread Kent Yao (Jira)
Kent Yao created SPARK-30591:


 Summary: Remove the nonstandard SET OWNER syntax for namespaces
 Key: SPARK-30591
 URL: https://issues.apache.org/jira/browse/SPARK-30591
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Kent Yao


The current implementation of the ALTER DATABASE SET OWNER syntax is not ANSI 
standard.
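
For context, an editorial example of the non-standard form being removed, 
modeled on the table-level syntax quoted under SPARK-30019 later in this 
digest; the database and user names are made up:

{code:scala}
// Hypothetical example of the syntax this ticket removes:
spark.sql("ALTER DATABASE db1 SET OWNER USER user1")
{code}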






[jira] [Assigned] (SPARK-30019) Support ALTER TABLE SET OWNER syntax

2020-01-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-30019:
---

Assignee: Kent Yao

> Support ALTER TABLE SET OWNER syntax
> 
>
> Key: SPARK-30019
> URL: https://issues.apache.org/jira/browse/SPARK-30019
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>
> [https://jira.apache.org/jira/browse/HIVE-18762]
> To support syntax like the following for changing the ownership of a table:
>  
> {code:java}
> alter table tb1 set owner user user1;
> alter table tb1 set owner role role1;
> {code}
>  






[jira] [Resolved] (SPARK-30019) Support ALTER TABLE SET OWNER syntax

2020-01-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-30019.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 27249
[https://github.com/apache/spark/pull/27249]

> Support ALTER TABLE SET OWNER syntax
> 
>
> Key: SPARK-30019
> URL: https://issues.apache.org/jira/browse/SPARK-30019
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.0.0
>
>
> [https://jira.apache.org/jira/browse/HIVE-18762]
> To support syntax like the following for changing the ownership of a table:
>  
> {code:java}
> alter table tb1 set owner user user1;
> alter table tb1 set owner role role1;
> {code}
>  






[jira] [Updated] (SPARK-30590) can't use more than five type-safe user-defined aggregation in select statement

2020-01-20 Thread Daniel Mantovani (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Mantovani updated SPARK-30590:
-
Description: 
 How to reproduce:
{code:scala}
val df = Seq((1,2,3,4,5,6)).toDF("a","b","c","d","e","f")

import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.Encoder
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.Row

case class FooAgg(s:Int) extends Aggregator[Row, Int, Int] {
  def zero:Int = s
  def reduce(b: Int, r: Row): Int = b + r.getAs[Int](0)
  def merge(b1: Int, b2: Int): Int = b1 + b2
  def finish(b: Int): Int = b
  def bufferEncoder: Encoder[Int] = Encoders.scalaInt
  def outputEncoder: Encoder[Int] = Encoders.scalaInt
}

val fooAgg = (i:Int) => FooAgg(i).toColumn.name(s"foo_agg_$i")

scala> df.select(fooAgg(1),fooAgg(2),fooAgg(3),fooAgg(4),fooAgg(5)).show
+---------+---------+---------+---------+---------+
|foo_agg_1|foo_agg_2|foo_agg_3|foo_agg_4|foo_agg_5|
+---------+---------+---------+---------+---------+
|        3|        5|        7|        9|       11|
+---------+---------+---------+---------+---------+

{code}
With 6 arguments we get an error:
{code:scala}
scala> 
df.select(fooAgg(1),fooAgg(2),fooAgg(3),fooAgg(4),fooAgg(5),fooAgg(6)).show

org.apache.spark.sql.AnalysisException: unresolved operator 'Aggregate 
[fooagg(FooAgg(1), None, None, None, input[0, int, false] AS value#114, 
assertnotnull(cast(value#114 as int)), input[0, int, false] AS value#113, 
IntegerType, IntegerType, false) AS foo_agg_1#116, fooagg(FooAgg(2), None, 
None, None, input[0, int, false] AS value#119, assertnotnull(cast(value#119 as 
int)), input[0, int, false] AS value#118, IntegerType, IntegerType, false) AS 
foo_agg_2#121, fooagg(FooAgg(3), None, None, None, input[0, int, false] AS 
value#124, assertnotnull(cast(value#124 as int)), input[0, int, false] AS 
value#123, IntegerType, IntegerType, false) AS foo_agg_3#126, fooagg(FooAgg(4), 
None, None, None, input[0, int, false] AS value#129, 
assertnotnull(cast(value#129 as int)), input[0, int, false] AS value#128, 
IntegerType, IntegerType, false) AS foo_agg_4#131, fooagg(FooAgg(5), None, 
None, None, input[0, int, false] AS value#134, assertnotnull(cast(value#134 as 
int)), input[0, int, false] AS value#133, IntegerType, IntegerType, false) AS 
foo_agg_5#136, fooagg(FooAgg(6), None, None, None, input[0, int, false] AS 
value#139, assertnotnull(cast(value#139 as int)), input[0, int, false] AS 
value#138, IntegerType, IntegerType, false) AS foo_agg_6#141];;
'Aggregate [fooagg(FooAgg(1), None, None, None, input[0, int, false] AS 
value#114, assertnotnull(cast(value#114 as int)), input[0, int, false] AS 
value#113, IntegerType, IntegerType, false) AS foo_agg_1#116, fooagg(FooAgg(2), 
None, None, None, input[0, int, false] AS value#119, 
assertnotnull(cast(value#119 as int)), input[0, int, false] AS value#118, 
IntegerType, IntegerType, false) AS foo_agg_2#121, fooagg(FooAgg(3), None, 
None, None, input[0, int, false] AS value#124, assertnotnull(cast(value#124 as 
int)), input[0, int, false] AS value#123, IntegerType, IntegerType, false) AS 
foo_agg_3#126, fooagg(FooAgg(4), None, None, None, input[0, int, false] AS 
value#129, assertnotnull(cast(value#129 as int)), input[0, int, false] AS 
value#128, IntegerType, IntegerType, false) AS foo_agg_4#131, fooagg(FooAgg(5), 
None, None, None, input[0, int, false] AS value#134, 
assertnotnull(cast(value#134 as int)), input[0, int, false] AS value#133, 
IntegerType, IntegerType, false) AS foo_agg_5#136, fooagg(FooAgg(6), None, 
None, None, input[0, int, false] AS value#139, assertnotnull(cast(value#139 as 
int)), input[0, int, false] AS value#138, IntegerType, IntegerType, false) AS 
foo_agg_6#141]
+- Project [_1#6 AS a#13, _2#7 AS b#14, _3#8 AS c#15, _4#9 AS d#16, _5#10 AS 
e#17, _6#11 AS F#18]
 +- LocalRelation [_1#6, _2#7, _3#8, _4#9, _5#10, _6#11]

at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:43)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:95)
 at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$3.apply(CheckAnalysis.scala:431)
 at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$3.apply(CheckAnalysis.scala:430)
 at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
 at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:430)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:95)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:108)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:105)
 at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
 at 
org.apache.spark.sql.catalyst.analysis.

[jira] [Created] (SPARK-30590) can't use more than five type-safe user-defined aggregation in select statement

2020-01-20 Thread Daniel Mantovani (Jira)
Daniel Mantovani created SPARK-30590:


 Summary: can't use more than five type-safe user-defined 
aggregation in select statement
 Key: SPARK-30590
 URL: https://issues.apache.org/jira/browse/SPARK-30590
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.4.4, 2.3.4, 2.2.3
Reporter: Daniel Mantovani


 How to reproduce:
{code:scala}
val df = Seq((1,2,3,4,5,6)).toDF("a","b","c","d","e","f")

import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.Encoder
import org.apache.spark.sql.Encoders

import org.apache.spark.sql.Row

case class FooAgg(s:Int) extends Aggregator[Row, Int, Int] {
  def zero:Int = s
  def reduce(b: Int, r: Row): Int = b + r.getAs[Int](0)
  def merge(b1: Int, b2: Int): Int = b1 + b2
  def finish(b: Int): Int = b
  def bufferEncoder: Encoder[Int] = Encoders.scalaInt
  def outputEncoder: Encoder[Int] = Encoders.scalaInt
}

val fooAgg = (i:Int) => FooAgg(i).toColumn.name(s"foo_agg_$i")

scala> df.select(fooAgg(1),fooAgg(2),fooAgg(3),fooAgg(4),fooAgg(5)).show
+---------+---------+---------+---------+---------+
|foo_agg_1|foo_agg_2|foo_agg_3|foo_agg_4|foo_agg_5|
+---------+---------+---------+---------+---------+
|        3|        5|        7|        9|       11|
+---------+---------+---------+---------+---------+

{code}
With 6 arguments we get an error:
{code:scala}
scala> 
df.select(fooAgg(1),fooAgg(2),fooAgg(3),fooAgg(4),fooAgg(5),fooAgg(6)).show

org.apache.spark.sql.AnalysisException: unresolved operator 'Aggregate 
[fooagg(FooAgg(1), None, None, None, input[0, int, false] AS value#114, 
assertnotnull(cast(value#114 as int)), input[0, int, false] AS value#113, 
IntegerType, IntegerType, false) AS foo_agg_1#116, fooagg(FooAgg(2), None, 
None, None, input[0, int, false] AS value#119, assertnotnull(cast(value#119 as 
int)), input[0, int, false] AS value#118, IntegerType, IntegerType, false) AS 
foo_agg_2#121, fooagg(FooAgg(3), None, None, None, input[0, int, false] AS 
value#124, assertnotnull(cast(value#124 as int)), input[0, int, false] AS 
value#123, IntegerType, IntegerType, false) AS foo_agg_3#126, fooagg(FooAgg(4), 
None, None, None, input[0, int, false] AS value#129, 
assertnotnull(cast(value#129 as int)), input[0, int, false] AS value#128, 
IntegerType, IntegerType, false) AS foo_agg_4#131, fooagg(FooAgg(5), None, 
None, None, input[0, int, false] AS value#134, assertnotnull(cast(value#134 as 
int)), input[0, int, false] AS value#133, IntegerType, IntegerType, false) AS 
foo_agg_5#136, fooagg(FooAgg(6), None, None, None, input[0, int, false] AS 
value#139, assertnotnull(cast(value#139 as int)), input[0, int, false] AS 
value#138, IntegerType, IntegerType, false) AS foo_agg_6#141];;
'Aggregate [fooagg(FooAgg(1), None, None, None, input[0, int, false] AS 
value#114, assertnotnull(cast(value#114 as int)), input[0, int, false] AS 
value#113, IntegerType, IntegerType, false) AS foo_agg_1#116, fooagg(FooAgg(2), 
None, None, None, input[0, int, false] AS value#119, 
assertnotnull(cast(value#119 as int)), input[0, int, false] AS value#118, 
IntegerType, IntegerType, false) AS foo_agg_2#121, fooagg(FooAgg(3), None, 
None, None, input[0, int, false] AS value#124, assertnotnull(cast(value#124 as 
int)), input[0, int, false] AS value#123, IntegerType, IntegerType, false) AS 
foo_agg_3#126, fooagg(FooAgg(4), None, None, None, input[0, int, false] AS 
value#129, assertnotnull(cast(value#129 as int)), input[0, int, false] AS 
value#128, IntegerType, IntegerType, false) AS foo_agg_4#131, fooagg(FooAgg(5), 
None, None, None, input[0, int, false] AS value#134, 
assertnotnull(cast(value#134 as int)), input[0, int, false] AS value#133, 
IntegerType, IntegerType, false) AS foo_agg_5#136, fooagg(FooAgg(6), None, 
None, None, input[0, int, false] AS value#139, assertnotnull(cast(value#139 as 
int)), input[0, int, false] AS value#138, IntegerType, IntegerType, false) AS 
foo_agg_6#141]
+- Project [_1#6 AS a#13, _2#7 AS b#14, _3#8 AS c#15, _4#9 AS d#16, _5#10 AS 
e#17, _6#11 AS F#18]
 +- LocalRelation [_1#6, _2#7, _3#8, _4#9, _5#10, _6#11]

at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:43)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:95)
 at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$3.apply(CheckAnalysis.scala:431)
 at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$3.apply(CheckAnalysis.scala:430)
 at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
 at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:430)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:95)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:108)
 at 
org.apache.spark

[jira] [Commented] (SPARK-24915) Calling SparkSession.createDataFrame with schema can throw exception

2020-01-20 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019770#comment-17019770
 ] 

Bryan Cutler commented on SPARK-24915:
--

[~jhereth] since there is already a lot of discussion on that PR I would leave 
it open until there is a conclusion on patching 2.4 or not. If so, then you 
could rebase or open a new PR against branch-2.4.

> Calling SparkSession.createDataFrame with schema can throw exception
> 
>
> Key: SPARK-24915
> URL: https://issues.apache.org/jira/browse/SPARK-24915
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.1
> Environment: Python 3.6.3
> PySpark 2.3.1 (installed via pip)
> OSX 10.12.6
>Reporter: Stephen Spencer
>Priority: Major
>
> There seems to be a bug in PySpark when using the PySparkSQL session to 
> create a dataframe with a pre-defined schema.
> Code to reproduce the error:
> {code:java}
> from pyspark import SparkConf, SparkContext
> from pyspark.sql import SparkSession
> from pyspark.sql.types import StructType, StructField, StringType, Row
> conf = SparkConf().setMaster("local").setAppName("repro") 
> context = SparkContext(conf=conf) 
> session = SparkSession(context)
> # Construct schema (the order of fields is important)
> schema = StructType([
> StructField('field2', StructType([StructField('sub_field', StringType(), 
> False)]), False),
> StructField('field1', StringType(), False),
> ])
> # Create data to populate data frame
> data = [
> Row(field1="Hello", field2=Row(sub_field='world'))
> ]
> # Attempt to create the data frame supplying the schema
> # this will throw a ValueError
> df = session.createDataFrame(data, schema=schema)
> df.show(){code}
> Running this throws a ValueError
> {noformat}
> Traceback (most recent call last):
> File "schema_bug.py", line 18, in <module>
> df = session.createDataFrame(data, schema=schema)
> File 
> "/Users/stephenspencer/benevolent/ai/neat/rex/.env/lib/python3.6/site-packages/pyspark/sql/session.py",
>  line 691, in createDataFrame
> rdd, schema = self._createFromLocal(map(prepare, data), schema)
> File 
> "/Users/stephenspencer/benevolent/ai/neat/rex/.env/lib/python3.6/site-packages/pyspark/sql/session.py",
>  line 423, in _createFromLocal
> data = [schema.toInternal(row) for row in data]
> File 
> "/Users/stephenspencer/benevolent/ai/neat/rex/.env/lib/python3.6/site-packages/pyspark/sql/session.py",
>  line 423, in <listcomp>
> data = [schema.toInternal(row) for row in data]
> File 
> "/Users/stephenspencer/benevolent/ai/neat/rex/.env/lib/python3.6/site-packages/pyspark/sql/types.py",
>  line 601, in toInternal
> for f, v, c in zip(self.fields, obj, self._needConversion))
> File 
> "/Users/stephenspencer/benevolent/ai/neat/rex/.env/lib/python3.6/site-packages/pyspark/sql/types.py",
>  line 601, in <genexpr>
> for f, v, c in zip(self.fields, obj, self._needConversion))
> File 
> "/Users/stephenspencer/benevolent/ai/neat/rex/.env/lib/python3.6/site-packages/pyspark/sql/types.py",
>  line 439, in toInternal
> return self.dataType.toInternal(obj)
> File 
> "/Users/stephenspencer/benevolent/ai/neat/rex/.env/lib/python3.6/site-packages/pyspark/sql/types.py",
>  line 619, in toInternal
> raise ValueError("Unexpected tuple %r with StructType" % obj)
> ValueError: Unexpected tuple 'Hello' with StructType{noformat}
> The problem seems to be here:
> https://github.com/apache/spark/blob/3d5c61e5fd24f07302e39b5d61294da79aa0c2f9/python/pyspark/sql/types.py#L603
> specifically the bit
> {code:java}
> zip(self.fields, obj, self._needConversion)
> {code}
> This zip statement seems to assume that obj and self.fields are ordered in 
> the same way, so that the elements of obj will correspond to the right fields 
> in the schema. However, this is not true: a Row orders its elements 
> alphabetically, but the fields in the schema are in whatever order they are 
> specified. In this example field2 is being initialised with the field1 
> element 'Hello'. If you re-order the fields in the schema to (field1, 
> field2), the given example works without error.
> The schema in the repro is specifically designed to elicit the problem: the 
> fields are out of alphabetical order and one field is a StructType, making 
> schema._needSerializeAnyField == True. However, we encountered this in real use.






[jira] [Created] (SPARK-30589) Document DISTRIBUTE BY Clause of SELECT statement in SQL Reference.

2020-01-20 Thread Dilip Biswal (Jira)
Dilip Biswal created SPARK-30589:


 Summary: Document DISTRIBUTE BY Clause of SELECT statement in SQL 
Reference.
 Key: SPARK-30589
 URL: https://issues.apache.org/jira/browse/SPARK-30589
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, SQL
Affects Versions: 2.4.3
Reporter: Dilip Biswal









[jira] [Created] (SPARK-30588) Document CLUSTER BY Clause of SELECT statement in SQL Reference.

2020-01-20 Thread Dilip Biswal (Jira)
Dilip Biswal created SPARK-30588:


 Summary: Document CLUSTER BY Clause of SELECT statement in SQL 
Reference.
 Key: SPARK-30588
 URL: https://issues.apache.org/jira/browse/SPARK-30588
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, SQL
Affects Versions: 2.4.3
Reporter: Dilip Biswal









[jira] [Resolved] (SPARK-30584) Non-synchronized methods override Synchronized methods

2020-01-20 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-30584.
--
Resolution: Not A Problem

> Non-synchronized methods override Synchronized methods
> --
>
> Key: SPARK-30584
> URL: https://issues.apache.org/jira/browse/SPARK-30584
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Aman Omer
>Priority: Minor
>







[jira] [Created] (SPARK-30587) Run test suites for CSV v1 and JSON v1

2020-01-20 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30587:
--

 Summary: Run test suites for CSV v1 and JSON v1
 Key: SPARK-30587
 URL: https://issues.apache.org/jira/browse/SPARK-30587
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


Currently, CSVSuite and JSONSuite test only CSV and JSON version 2, but not 
version 1, which is still supported and can be enabled by users. This ticket 
aims to add CSVv1Suite and JsonV1Suite, similar to AvroSuite.
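
An editorial sketch of what such a suite could look like; the conf key 
{{spark.sql.sources.useV1SourceList}} and the subclass-the-suite approach are 
assumptions about the eventual implementation rather than the merged code:

{code:scala}
// Re-run the whole CSVSuite with the CSV source forced back to v1.
import org.apache.spark.SparkConf

class CSVv1Suite extends CSVSuite {
  override protected def sparkConf: SparkConf =
    super.sparkConf.set("spark.sql.sources.useV1SourceList", "csv")
}
{code}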






[jira] [Assigned] (SPARK-30578) Explicitly set conf to use datasource v2 for v2.3/OrcFilterSuite

2020-01-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-30578:
---

Assignee: wuyi

> Explicitly set conf to use datasource v2 for v2.3/OrcFilterSuite
> 
>
> Key: SPARK-30578
> URL: https://issues.apache.org/jira/browse/SPARK-30578
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
>
> v2.3/OrcFilterSuite intentionally tests the DSv2 OrcTable, so we should 
> explicitly set the conf to use DSv2 for it.
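
An editorial sketch of one way to pin a suite to DSv2; the subclass name is 
made up and clearing {{spark.sql.sources.useV1SourceList}} is an assumption 
about how the conf would be set, not the merged patch:

{code:scala}
// Force the ORC source to DSv2 for this suite by clearing the v1 source list.
import org.apache.spark.SparkConf

class OrcV2FilterSuite extends OrcFilterSuite {
  override protected def sparkConf: SparkConf =
    super.sparkConf.set("spark.sql.sources.useV1SourceList", "")
}
{code}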






[jira] [Resolved] (SPARK-30578) Explicitly set conf to use datasource v2 for v2.3/OrcFilterSuite

2020-01-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-30578.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 27285
[https://github.com/apache/spark/pull/27285]

> Explicitly set conf to use datasource v2 for v2.3/OrcFilterSuite
> 
>
> Key: SPARK-30578
> URL: https://issues.apache.org/jira/browse/SPARK-30578
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
> Fix For: 3.0.0
>
>
> v2.3/OrcFilterSuite intentionally tests the DSv2 OrcTable, so we should 
> explicitly set the conf to use DSv2 for it.






[jira] [Resolved] (SPARK-30535) Migrate ALTER TABLE commands to the new resolution framework

2020-01-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-30535.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 27243
[https://github.com/apache/spark/pull/27243]

> Migrate ALTER TABLE commands to the new resolution framework
> 
>
> Key: SPARK-30535
> URL: https://issues.apache.org/jira/browse/SPARK-30535
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Major
> Fix For: 3.0.0
>
>
> Migrate ALTER TABLE commands to the new resolution framework introduced in 
> SPARK-30214






[jira] [Assigned] (SPARK-30535) Migrate ALTER TABLE commands to the new resolution framework

2020-01-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-30535:
---

Assignee: Terry Kim

> Migrate ALTER TABLE commands to the new resolution framework
> 
>
> Key: SPARK-30535
> URL: https://issues.apache.org/jira/browse/SPARK-30535
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Major
>
> Migrate ALTER TABLE commands to the new resolution framework introduced in 
> SPARK-30214






[jira] [Updated] (SPARK-30586) NPE in LiveRDDDistribution (AppStatusListener)

2020-01-20 Thread Jan Van den bosch (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Van den bosch updated SPARK-30586:
--
Description: 
We've been noticing a great amount of NullPointerExceptions in our long-running 
Spark job driver logs:
{noformat}
20/01/17 23:40:12 ERROR AsyncEventQueue: Listener AppStatusListener threw an 
exception
java.lang.NullPointerException
at 
org.spark_project.guava.base.Preconditions.checkNotNull(Preconditions.java:191)
at 
org.spark_project.guava.collect.MapMakerInternalMap.putIfAbsent(MapMakerInternalMap.java:3507)
at 
org.spark_project.guava.collect.Interners$WeakInterner.intern(Interners.java:85)
at 
org.apache.spark.status.LiveEntityHelpers$.weakIntern(LiveEntity.scala:603)
at 
org.apache.spark.status.LiveRDDDistribution.toApi(LiveEntity.scala:486)
at 
org.apache.spark.status.LiveRDD$$anonfun$2.apply(LiveEntity.scala:548)
at 
org.apache.spark.status.LiveRDD$$anonfun$2.apply(LiveEntity.scala:548)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139)
at 
scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139)
at 
scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap$$anon$2.foreach(HashMap.scala:139)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.spark.status.LiveRDD.doUpdate(LiveEntity.scala:548)
at org.apache.spark.status.LiveEntity.write(LiveEntity.scala:49)
at 
org.apache.spark.status.AppStatusListener.org$apache$spark$status$AppStatusListener$$update(AppStatusListener.scala:991)
at 
org.apache.spark.status.AppStatusListener.org$apache$spark$status$AppStatusListener$$maybeUpdate(AppStatusListener.scala:997)
at 
org.apache.spark.status.AppStatusListener$$anonfun$onExecutorMetricsUpdate$2.apply(AppStatusListener.scala:764)
at 
org.apache.spark.status.AppStatusListener$$anonfun$onExecutorMetricsUpdate$2.apply(AppStatusListener.scala:764)
at 
scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139)
at 
scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139)
at 
scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap$$anon$2.foreach(HashMap.scala:139)
at 
org.apache.spark.status.AppStatusListener.org$apache$spark$status$AppStatusListener$$flush(AppStatusListener.scala:788)
at 
org.apache.spark.status.AppStatusListener.onExecutorMetricsUpdate(AppStatusListener.scala:764)
at 
org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:59)
at 
org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at 
org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at 
org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:91)
at 
org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$super$postToAll(AsyncEventQueue.scala:92)
at 
org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply$mcJ$sp(AsyncEventQueue.scala:92)
at 
org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
at 
org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at 
org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:87)
at 
org.apache.spark.scheduler.AsyncEventQueue$$anon$1$$anonfun$run$1.apply$mcV$sp(AsyncEventQueue.scala:83)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1302)
at 
org.apache.spark.scheduler.AsyncEventQueue$$anon$1.run(AsyncEventQueue.scala:82)
{noformat}
Symptoms of a Spark app that made us investigate the logs in the first place 
include:
 * slower execution of submitted jobs
 * jobs remaining "Active Jobs" in the Spark UI even though they should have 
completed days ago
 * these jobs could not be killed from the Spark UI (the page refreshes but the 
jobs rem

[jira] [Created] (SPARK-30586) NPE in LiveRDDDistribution (AppStatusListener)

2020-01-20 Thread Jan Van den bosch (Jira)
Jan Van den bosch created SPARK-30586:
-

 Summary: NPE in LiveRDDDistribution (AppStatusListener)
 Key: SPARK-30586
 URL: https://issues.apache.org/jira/browse/SPARK-30586
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.4.4
 Environment: A Hadoop cluster consisting of Centos 7.4 machines.
Reporter: Jan Van den bosch


We've been noticing a great amount of NullPointerExceptions in our long-running 
Spark job driver logs:

{noformat}
20/01/17 23:40:12 ERROR AsyncEventQueue: Listener AppStatusListener threw an 
exception
java.lang.NullPointerException
at 
org.spark_project.guava.base.Preconditions.checkNotNull(Preconditions.java:191)
at 
org.spark_project.guava.collect.MapMakerInternalMap.putIfAbsent(MapMakerInternalMap.java:3507)
at 
org.spark_project.guava.collect.Interners$WeakInterner.intern(Interners.java:85)
at 
org.apache.spark.status.LiveEntityHelpers$.weakIntern(LiveEntity.scala:603)
at 
org.apache.spark.status.LiveRDDDistribution.toApi(LiveEntity.scala:486)
at 
org.apache.spark.status.LiveRDD$$anonfun$2.apply(LiveEntity.scala:548)
at 
org.apache.spark.status.LiveRDD$$anonfun$2.apply(LiveEntity.scala:548)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139)
at 
scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139)
at 
scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap$$anon$2.foreach(HashMap.scala:139)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.spark.status.LiveRDD.doUpdate(LiveEntity.scala:548)
at org.apache.spark.status.LiveEntity.write(LiveEntity.scala:49)
at 
org.apache.spark.status.AppStatusListener.org$apache$spark$status$AppStatusListener$$update(AppStatusListener.scala:991)
at 
org.apache.spark.status.AppStatusListener.org$apache$spark$status$AppStatusListener$$maybeUpdate(AppStatusListener.scala:997)
at 
org.apache.spark.status.AppStatusListener$$anonfun$onExecutorMetricsUpdate$2.apply(AppStatusListener.scala:764)
at 
org.apache.spark.status.AppStatusListener$$anonfun$onExecutorMetricsUpdate$2.apply(AppStatusListener.scala:764)
at 
scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139)
at 
scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139)
at 
scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap$$anon$2.foreach(HashMap.scala:139)
at 
org.apache.spark.status.AppStatusListener.org$apache$spark$status$AppStatusListener$$flush(AppStatusListener.scala:788)
at 
org.apache.spark.status.AppStatusListener.onExecutorMetricsUpdate(AppStatusListener.scala:764)
at 
org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:59)
at 
org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at 
org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at 
org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:91)
at 
org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$super$postToAll(AsyncEventQueue.scala:92)
at 
org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply$mcJ$sp(AsyncEventQueue.scala:92)
at 
org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
at 
org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at 
org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:87)
at 
org.apache.spark.scheduler.AsyncEventQueue$$anon$1$$anonfun$run$1.apply$mcV$sp(AsyncEventQueue.scala:83)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1302)
at 
org.apache.spark.scheduler.AsyncEventQueue$$anon$1.run(AsyncEventQueue.scala:82)
{noformat}

Symptoms of a Spark app that made us investigat

[jira] [Comment Edited] (SPARK-30542) Two Spark structured streaming jobs cannot write to same base path

2020-01-20 Thread Sivakumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017820#comment-17017820
 ] 

Sivakumar edited comment on SPARK-30542 at 1/20/20 12:36 PM:
-

Hi Jungtaek,

I think this might be a feature that should be added to Structured Streaming.

Earlier, with Spark DStreams, two jobs could share the same base path.

But with Spark Structured Streaming I don't have that flexibility. I guess this 
should be a feature that Structured Streaming supports.

Also, please let me know if you have any workaround for this.


was (Author: sparksiva):
Hi Jungtaek,

I thought this might be a feature that should be added to structured streaming. 

Earlier with Spark Dstreams two jobs can have a same base path.

But with Spark structured streaming I don't have that flexibility. I guess this 
should be a feature that structured streaming should support.

Also Please lemme know If you have any work around for this.

> Two Spark structured streaming jobs cannot write to same base path
> --
>
> Key: SPARK-30542
> URL: https://issues.apache.org/jira/browse/SPARK-30542
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Sivakumar
>Priority: Major
>
> Hi All,
> Spark Structured Streaming doesn't allow two structured streaming jobs to 
> write data to the same base directory, which is possible with DStreams.
> Because a _spark_metadata directory is created by default for one job, the 
> second job cannot use the same directory as its base path: the 
> _spark_metadata directory already created by the other job makes it throw an 
> exception.
> Is there any workaround for this, other than creating separate base paths 
> for both jobs?
> Is it possible to create the _spark_metadata directory elsewhere or disable 
> it without any data loss?
> If I had to change the base path for both jobs, then my whole framework 
> would be impacted, so I don't want to do that.
>  
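
To make the scenario concrete, an editorial sketch of two file-sink queries 
sharing one base path; {{stream1}}/{{stream2}} stand for arbitrary streaming 
DataFrames, the paths are made up, and the failure mode is taken from the 
report above rather than verified:

{code:scala}
// The first query creates <base>/_spark_metadata; per the report, the second
// query then cannot use the same directory as its base path.
val q1 = stream1.writeStream
  .format("parquet")
  .option("checkpointLocation", "/tmp/checkpoints/job1")
  .start("/data/shared-base")

val q2 = stream2.writeStream
  .format("parquet")
  .option("checkpointLocation", "/tmp/checkpoints/job2")
  .start("/data/shared-base")   // conflicts with q1's _spark_metadata
{code}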






[jira] [Commented] (SPARK-30585) scalatest fails for Apache Spark SQL project

2020-01-20 Thread Rashmi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019425#comment-17019425
 ] 

Rashmi commented on SPARK-30585:


- cte.sql
- datetime.sql
- describe-table-column.sql
03:48:33.567 WARN org.apache.spark.sql.execution.command.DropTableCommand: 
org.apache.spark.sql.AnalysisException: Table or view not found: default.t; 
line 1 pos 14
org.apache.spark.sql.AnalysisException: Table or view not found: default.t; 
line 1 pos 14
 at 
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:47)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:733)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:685)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:715)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:708)
 at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
 at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
 at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
 at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:89)
 at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
 at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
 at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
 at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
 at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$1.apply(AnalysisHelper.scala:87)
 at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$1.apply(AnalysisHelper.scala:87)
 at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:326)
 at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
 at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:324)
 at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:87)
 at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
 at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
 at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
 at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:708)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:654)
 at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
 at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84)
 at 
scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
 at scala.collection.immutable.List.foldLeft(List.scala:84)
 at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:84)
 at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:76)
 at scala.collection.immutable.List.foreach(List.scala:392)
 at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:76)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:127)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$35.apply(Analyzer.scala:699)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$35.apply(Analyzer.scala:692)
 at 
org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withAnalysisContext(Analyzer.scala:87)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:692)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:703)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:715)
 at 
org.apache.spark.

[jira] [Resolved] (SPARK-30470) Uncache table in tempViews if needed on session closed

2020-01-20 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro resolved SPARK-30470.
--
Resolution: Duplicate

> Uncache table in tempViews if needed on session closed
> --
>
> Key: SPARK-30470
> URL: https://issues.apache.org/jira/browse/SPARK-30470
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.2
>Reporter: liupengcheng
>Priority: Major
>
> Currently, Spark will not clean up cached tables in temp views produced by 
> SQL like the following:
> `CACHE TABLE table1 as SELECT `
> There is a risk that `UNCACHE TABLE` is not called because the session closed 
> unexpectedly or the user closed it manually. These temp views are then lost, 
> and we cannot access them in other sessions, but the cached plan still exists 
> in the `CacheManager`.
> Moreover, the leaks may cause subsequent queries to fail; one failure we 
> encountered in our production environment is shown below:
> {code:java}
> Caused by: java.io.FileNotFoundException: File does not exist: 
> /user//xx/data__db60e76d_91b8_42f3_909d_5c68692ecdd4Caused by: 
> java.io.FileNotFoundException: File does not exist: 
> /user//xx/data__db60e76d_91b8_42f3_909d_5c68692ecdd4It is possible the 
> underlying files have been updated. You can explicitly invalidate the cache 
> in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating 
> the Dataset/DataFrame involved. at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:131)
>  at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:182)
>  at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
>  at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage0.scan_nextBatch_0$(Unknown
>  Source) at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage0.processNext(Unknown
>  Source) at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>  at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
>  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at 
> scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439) at 
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at 
> scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
> {code}
> The above exception happens when the user updates the data of the table, but 
> Spark still uses the old cached plan.
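
To restate the lifecycle being described, a small editorial example; the table 
names are made up and the statements mirror the (truncated) SQL quoted above:

{code:scala}
// A cached temp view created via CACHE TABLE ... AS SELECT. If the session ends
// before UNCACHE TABLE runs, the cached plan can linger in the CacheManager.
spark.sql("CACHE TABLE table1 AS SELECT * FROM source_table")  // source_table is hypothetical
// ... queries that reuse table1 ...
spark.sql("UNCACHE TABLE table1")  // the cleanup this ticket wants tied to session close
{code}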






[jira] [Created] (SPARK-30585) scalatest fails for Apache Spark SQL project

2020-01-20 Thread Rashmi (Jira)
Rashmi created SPARK-30585:
--

 Summary: scalatest fails for Apache Spark SQL project
 Key: SPARK-30585
 URL: https://issues.apache.org/jira/browse/SPARK-30585
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 2.4.0
Reporter: Rashmi


Error logs:-

23:36:49.039 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in 
stage 3.0 (TID 6, localhost, executor driver): TaskKilled (Stage cancelled)
23:36:49.039 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 1.0 in 
stage 3.0 (TID 7, localhost, executor driver): TaskKilled (Stage cancelled)
23:36:51.354 WARN 
org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor: Current batch 
is falling behind. The trigger interval is 100 milliseconds, but spent 1854 
milliseconds
23:36:51.381 WARN 
org.apache.spark.sql.execution.streaming.continuous.ContinuousQueuedDataReader$DataReaderThread:
 data reader thread failed
org.apache.spark.SparkException: Exception thrown in awaitResult:
 at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
 at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
 at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
 at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76)
 at 
org.apache.spark.sql.execution.streaming.sources.ContinuousMemoryStreamInputPartitionReader.getRecord(ContinuousMemoryStream.scala:195)
 at 
org.apache.spark.sql.execution.streaming.sources.ContinuousMemoryStreamInputPartitionReader.next(ContinuousMemoryStream.scala:181)
 at 
org.apache.spark.sql.execution.streaming.continuous.ContinuousQueuedDataReader$DataReaderThread.run(ContinuousQueuedDataReader.scala:143)
Caused by: org.apache.spark.SparkException: Could not find 
ContinuousMemoryStreamRecordEndpoint-f7d4460c-9f4e-47ee-a846-258b34964852-9.
 at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:160)
 at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:135)
 at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:229)
 at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:523)
 at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:91)
 ... 4 more
23:36:51.389 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 1.0 in 
stage 4.0 (TID 9, localhost, executor driver): TaskKilled (Stage cancelled)
23:36:51.390 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in 
stage 4.0 (TID 8, localhost, executor driver): TaskKilled (Stage cancelled)
- flatMap
23:36:51.754 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 1.0 in 
stage 5.0 (TID 11, localhost, executor driver): TaskKilled (Stage cancelled)
23:36:51.754 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in 
stage 5.0 (TID 10, localhost, executor driver): TaskKilled (Stage cancelled)
23:36:52.248 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 1.0 in 
stage 6.0 (TID 13, localhost, executor driver): TaskKilled (Stage cancelled)
23:36:52.249 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in 
stage 6.0 (TID 12, localhost, executor driver): TaskKilled (Stage cancelled)
- filter
23:36:52.611 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in 
stage 7.0 (TID 14, localhost, executor driver): TaskKilled (Stage cancelled)
23:36:52.611 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 1.0 in 
stage 7.0 (TID 15, localhost, executor driver): TaskKilled (Stage cancelled)
- deduplicate
- timestamp
23:36:53.015 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in 
stage 8.0 (TID 16, localhost, executor driver): TaskKilled (Stage cancelled)
23:36:53.015 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 1.0 in 
stage 8.0 (TID 17, localhost, executor driver): TaskKilled (Stage cancelled)
- subquery alias
23:36:53.572 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 1.0 in 
stage 9.0 (TID 19, localhost, executor driver): TaskKilled (Stage cancelled)
23:36:53.572 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in 
stage 9.0 (TID 18, localhost, executor driver): TaskKilled (Stage cancelled)
23:36:53.953 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 1.0 in 
stage 10.0 (TID 21, localhost, executor driver): TaskKilled (Stage cancelled)
23:36:53.953 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in 
stage 10.0 (TID 20, localhost, executor driver): TaskKilled (Stage cancelled)
23:36:54.552 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 1.0 in 
stage 11.0 (TID 23, localhost, executor driver): TaskKilled (Stage cancelled)
23:36:54.552 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in 
stage 11.0 (TID 22, localhost, executor driver): TaskKilled (Stage cancelled)
- repeatedly restart
23:36:54.591 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 1.0 in 
stage 1

[jira] [Commented] (SPARK-30585) scalatest fails for Apache Spark SQL project

2020-01-20 Thread Rashmi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019335#comment-17019335
 ] 

Rashmi commented on SPARK-30585:


This failure occurs while trying to build Apache Spark on IBM Power.

> scalatest fails for Apache Spark SQL project
> 
>
> Key: SPARK-30585
> URL: https://issues.apache.org/jira/browse/SPARK-30585
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.0
>Reporter: Rashmi
>Priority: Blocker
>
> Error logs:-
> 23:36:49.039 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in 
> stage 3.0 (TID 6, localhost, executor driver): TaskKilled (Stage cancelled)
> 23:36:49.039 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 1.0 in 
> stage 3.0 (TID 7, localhost, executor driver): TaskKilled (Stage cancelled)
> 23:36:51.354 WARN 
> org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor: Current 
> batch is falling behind. The trigger interval is 100 milliseconds, but spent 
> 1854 milliseconds
> 23:36:51.381 WARN 
> org.apache.spark.sql.execution.streaming.continuous.ContinuousQueuedDataReader$DataReaderThread:
>  data reader thread failed
> org.apache.spark.SparkException: Exception thrown in awaitResult:
>  at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
>  at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
>  at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
>  at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76)
>  at 
> org.apache.spark.sql.execution.streaming.sources.ContinuousMemoryStreamInputPartitionReader.getRecord(ContinuousMemoryStream.scala:195)
>  at 
> org.apache.spark.sql.execution.streaming.sources.ContinuousMemoryStreamInputPartitionReader.next(ContinuousMemoryStream.scala:181)
>  at 
> org.apache.spark.sql.execution.streaming.continuous.ContinuousQueuedDataReader$DataReaderThread.run(ContinuousQueuedDataReader.scala:143)
> Caused by: org.apache.spark.SparkException: Could not find 
> ContinuousMemoryStreamRecordEndpoint-f7d4460c-9f4e-47ee-a846-258b34964852-9.
>  at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:160)
>  at 
> org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:135)
>  at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:229)
>  at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:523)
>  at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:91)
>  ... 4 more
> 23:36:51.389 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 1.0 in 
> stage 4.0 (TID 9, localhost, executor driver): TaskKilled (Stage cancelled)
> 23:36:51.390 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in 
> stage 4.0 (TID 8, localhost, executor driver): TaskKilled (Stage cancelled)
> - flatMap
> 23:36:51.754 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 1.0 in 
> stage 5.0 (TID 11, localhost, executor driver): TaskKilled (Stage cancelled)
> 23:36:51.754 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in 
> stage 5.0 (TID 10, localhost, executor driver): TaskKilled (Stage cancelled)
> 23:36:52.248 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 1.0 in 
> stage 6.0 (TID 13, localhost, executor driver): TaskKilled (Stage cancelled)
> 23:36:52.249 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in 
> stage 6.0 (TID 12, localhost, executor driver): TaskKilled (Stage cancelled)
> - filter
> 23:36:52.611 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in 
> stage 7.0 (TID 14, localhost, executor driver): TaskKilled (Stage cancelled)
> 23:36:52.611 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 1.0 in 
> stage 7.0 (TID 15, localhost, executor driver): TaskKilled (Stage cancelled)
> - deduplicate
> - timestamp
> 23:36:53.015 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in 
> stage 8.0 (TID 16, localhost, executor driver): TaskKilled (Stage cancelled)
> 23:36:53.015 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 1.0 in 
> stage 8.0 (TID 17, localhost, executor driver): TaskKilled (Stage cancelled)
> - subquery alias
> 23:36:53.572 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 1.0 in 
> stage 9.0 (TID 19, localhost, executor driver): TaskKilled (Stage cancelled)
> 23:36:53.572 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in 
> stage 9.0 (TID 18, localhost, executor driver): TaskKilled (Stage cancelled)
> 23:36:53.953 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 1.0 in 
> stage 10.0 (TID 21, localhost, executor driver): TaskKilled (Stage cancelled)
> 23:36:53.953 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in 
> stage 10.0 (TID 20, localhost, executor driver): TaskKilled (Stage cancelled)

[jira] [Created] (SPARK-30584) Non-synchronized methods override Synchronized methods

2020-01-20 Thread Aman Omer (Jira)
Aman Omer created SPARK-30584:
-

 Summary: Non-synchronized methods override Synchronized methods
 Key: SPARK-30584
 URL: https://issues.apache.org/jira/browse/SPARK-30584
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Aman Omer
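
The description is empty; for context, here is a minimal Scala sketch of the 
general pitfall the summary refers to (class and method names are made up for 
illustration and are not the actual Spark classes affected):

{code:scala}
// Illustrative only: `synchronized` is not inherited by an override, so a
// non-synchronized override silently drops the locking of the parent method.
class Counter {
  private var count = 0
  def increment(): Unit = synchronized { count += 1 } // holds the monitor
  def value: Int = synchronized { count }
}

class UnsafeCounter extends Counter {
  private var fast = 0
  // Overriding without `synchronized` means this method no longer takes the
  // object's monitor, re-introducing a potential race with other callers.
  override def increment(): Unit = { fast += 1 }
}
{code}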






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30583) Document LIMIT Clause of SELECT statement in SQL Reference.

2020-01-20 Thread Dilip Biswal (Jira)
Dilip Biswal created SPARK-30583:


 Summary:  Document LIMIT Clause of SELECT statement in SQL 
Reference.
 Key: SPARK-30583
 URL: https://issues.apache.org/jira/browse/SPARK-30583
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, SQL
Affects Versions: 2.4.3
Reporter: Dilip Biswal
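
For reference, a minimal sketch of the clause to be documented (the table and 
column names below are illustrative only): LIMIT constrains the number of rows 
returned by the enclosing SELECT.

{code:scala}
import org.apache.spark.sql.SparkSession

object LimitExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("limit-example").master("local[*]").getOrCreate()
    import spark.implicits._

    // Illustrative data set; names are not taken from the SQL Reference.
    Seq(("Anil", 27), ("Bala", 18), ("Chen", 45))
      .toDF("name", "age")
      .createOrReplaceTempView("person")

    // LIMIT 2 returns at most two rows of the ordered result.
    spark.sql("SELECT name, age FROM person ORDER BY age LIMIT 2").show()

    spark.stop()
  }
}
{code}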






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30582) Spark UI is not showing Aggregated Metrics by Executor in stage page

2020-01-20 Thread Saurabh Chawla (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saurabh Chawla updated SPARK-30582:
---
Attachment: SparkUIStagePage.mov

> Spark UI is not showing Aggregated Metrics by Executor in stage page
> 
>
> Key: SPARK-30582
> URL: https://issues.apache.org/jira/browse/SPARK-30582
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: Saurabh Chawla
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: SparkUIStagePage.mov
>
>
> There are scenarios where the Spark History Server sits behind a VPC. When 
> the stage page calls the REST API to fetch the executor summary 
> (allexecutors), the response can be delayed; in the meantime 
> "stage-page-template.html" has already been rendered, so the executor 
> summary is never merged into the template.
> As a result, the "Aggregated Metrics by Executor" table on the stage page 
> is shown blank.
> This is easy to hit when a proxy server is responsible for forwarding 
> requests and responses to the Spark History Server; it can be reproduced 
> with Knox or in-house proxy servers that front the Spark History Server.
> An alternative way to reproduce it: open the Spark UI with the browser 
> developer tools and add a breakpoint in stagepage.js; the delayed response 
> again leaves "Aggregated Metrics by Executor" on the stage page blank.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30582) Spark UI is not showing Aggregated Metrics by Executor in stage page

2020-01-20 Thread Saurabh Chawla (Jira)
Saurabh Chawla created SPARK-30582:
--

 Summary: Spark UI is not showing Aggregated Metrics by Executor in 
stage page
 Key: SPARK-30582
 URL: https://issues.apache.org/jira/browse/SPARK-30582
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 3.0.0
Reporter: Saurabh Chawla
 Fix For: 3.0.0
 Attachments: SparkUIStagePage.mov

There are scenarios where the Spark History Server sits behind a VPC. When the 
stage page calls the REST API to fetch the executor summary (allexecutors), the 
response can be delayed; in the meantime "stage-page-template.html" has already 
been rendered, so the executor summary is never merged into the template.

As a result, the "Aggregated Metrics by Executor" table on the stage page is 
shown blank.

This is easy to hit when a proxy server is responsible for forwarding requests 
and responses to the Spark History Server; it can be reproduced with Knox or 
in-house proxy servers that front the Spark History Server.

An alternative way to reproduce it: open the Spark UI with the browser 
developer tools and add a breakpoint in stagepage.js; the delayed response 
again leaves "Aggregated Metrics by Executor" on the stage page blank.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30581) Document SORT BY Clause of SELECT statement in SQL Reference.

2020-01-20 Thread Dilip Biswal (Jira)
Dilip Biswal created SPARK-30581:


 Summary: Document SORT BY Clause of SELECT statement in SQL 
Reference.
 Key: SPARK-30581
 URL: https://issues.apache.org/jira/browse/SPARK-30581
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, SQL
Affects Versions: 2.4.3
Reporter: Dilip Biswal
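
For reference, a minimal sketch of the clause to be documented (names below 
are illustrative only): unlike ORDER BY, SORT BY only orders rows within each 
partition and does not guarantee a total ordering of the output.

{code:scala}
import org.apache.spark.sql.SparkSession

object SortByExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("sortby-example").master("local[*]").getOrCreate()
    import spark.implicits._

    // Illustrative data set spread over two partitions; names are not taken
    // from the SQL Reference.
    Seq(("Anil", 27), ("Bala", 18), ("Chen", 45), ("Devi", 33))
      .toDF("name", "age")
      .repartition(2)
      .createOrReplaceTempView("person")

    // Each partition is sorted by age on its own; rows from different
    // partitions may still interleave in the overall output.
    spark.sql("SELECT name, age FROM person SORT BY age").show()

    spark.stop()
  }
}
{code}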






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30580) Why can PySpark persist data only in serialised format?

2020-01-20 Thread Francesco Cavrini (Jira)
Francesco Cavrini created SPARK-30580:
-

 Summary: Why can PySpark persist data only in serialised format?
 Key: SPARK-30580
 URL: https://issues.apache.org/jira/browse/SPARK-30580
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 2.4.0
Reporter: Francesco Cavrini


The storage levels in PySpark allow persisting data only in serialised format. 
There is also [a 
comment|https://github.com/apache/spark/blob/master/python/pyspark/storagelevel.py#L28]
 explicitly stating that "Since the data is always serialized on the Python 
side, all the constants use the serialized formats." While that makes total 
sense for RDDs, it is not clear to me why it is not possible to persist data 
without serialisation when using the DataFrame/Dataset APIs. In theory, in such 
cases, persist would only be a directive and the data would never leave the 
JVM, thus allowing for un-serialised persistence, correct? Many thanks for the 
feedback!
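
As a point of comparison (not part of the original report), the JVM-side 
Dataset API does expose a deserialised storage level; a minimal Scala sketch, 
with illustrative names:

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object PersistLevels {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("persist-levels").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq(1, 2, 3).toDF("id")

    // On the JVM side, MEMORY_ONLY keeps deserialised objects in memory,
    // whereas MEMORY_ONLY_SER keeps serialised bytes; the PySpark
    // StorageLevel constants only expose the serialised variants.
    df.persist(StorageLevel.MEMORY_ONLY)
    df.count() // materialise the cache

    spark.stop()
  }
}
{code}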



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org