[jira] [Commented] (SPARK-33847) Replace None of elseValue inside CaseWhen if all branches are FalseLiteral

2020-12-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255855#comment-17255855
 ] 

Apache Spark commented on SPARK-33847:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/30960

> Replace None of elseValue inside CaseWhen if all branches are FalseLiteral
> --
>
> Key: SPARK-33847
> URL: https://issues.apache.org/jira/browse/SPARK-33847
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> {code:scala}
> spark.sql("create table t1 using parquet as select id from range(10)")
> spark.sql("select id from t1 where (CASE WHEN id = 1 THEN 'a' WHEN id = 3 
> THEN 'b' end) = 'c' ").explain()
> {code}
> Before:
> {noformat}
> == Physical Plan ==
> *(1) Filter CASE WHEN (id#1L = 1) THEN false WHEN (id#1L = 3) THEN false END
> +- *(1) ColumnarToRow
>+- FileScan parquet default.t1[id#1L] Batched: true, DataFilters: [CASE 
> WHEN (id#1L = 1) THEN false WHEN (id#1L = 3) THEN false END], Format: 
> Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF...,
>  PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:bigint>
> {noformat}
> After:
> {noformat}
> == Physical Plan ==
> LocalTableScan <empty>, [id#1L]
> {noformat}
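
For context, a minimal self-contained sketch of the rewrite this ticket asks for. The types below ({{Expr}}, {{FalseLit}}, {{CaseWhenExpr}}) are made-up placeholders, not the real Catalyst classes; the point is only that a CASE WHEN whose branches all yield false and which has no ELSE can never be true inside a predicate, so it may be folded to a false literal (which is what lets the Filter above collapse to an empty LocalTableScan).

{code:scala}
// Simplified model only (placeholder types, not Spark's Catalyst API).
sealed trait Expr
case object FalseLit extends Expr
case class CaseWhenExpr(branches: Seq[(Expr, Expr)], elseValue: Option[Expr]) extends Expr

def simplify(e: Expr): Expr = e match {
  // No ELSE and every branch returns false: the expression yields false or null,
  // and in a boolean predicate both behave as "not true", so fold it to false.
  case CaseWhenExpr(branches, None) if branches.forall(_._2 == FalseLit) => FalseLit
  case other => other
}
{code}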



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org




[jira] [Assigned] (SPARK-33928) Flaky o.a.s.ExecutorAllocationManagerSuite - "SPARK-23365 Don't update target num executors when killing idle executors"

2020-12-28 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33928:
---

Assignee: wuyi

> Flaky o.a.s.ExecutorAllocationManagerSuite - "SPARK-23365 Don't update target 
> num executors when killing idle executors"
> 
>
> Key: SPARK-33928
> URL: https://issues.apache.org/jira/browse/SPARK-33928
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 2.4.0, 3.0.0, 3.0.1, 3.1.0, 3.2.0
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
>
> [info] - SPARK-23365 Don't update target num executors when killing idle 
> executors *** FAILED *** (126 milliseconds)
> [info] 1 did not equal 2 (ExecutorAllocationManagerSuite.scala:1617)
> [info] org.scalatest.exceptions.TestFailedException:
> [info] at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
> [info] at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
> [info] at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
> [info] at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503)
> [info] at 
> org.apache.spark.ExecutorAllocationManagerSuite.$anonfun$new$84(ExecutorAllocationManagerSuite.scala:1617)
> [info] at 
> org.apache.spark.SparkFunSuite.$anonfun$test$1(SparkFunSuite.scala:423)
> [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info] at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info] at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info] at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
> [info] at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:246)
> [info] at 
> org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
> [info] at 
> org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
> [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286)
> [info] at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
> [info] at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
> [info] at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:74)
> [info] at 
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221)
> [info] at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214)
> [info] at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:74)
> [info] at 
> org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229)
> [info] at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393)
> [info] at scala.collection.immutable.List.foreach(List.scala:392)
> [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381)
> [info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:376)
> [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:458)
> [info] at org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229)
> [info] at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228)
> [info] at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
> [info] at org.scalatest.Suite.run(Suite.scala:1124)
> [info] at org.scalatest.Suite.run$(Suite.scala:1106)
> [info] at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
> [info] at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233)
> [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:518)
> [info] at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233)
> [info] at org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232)
> [info] at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:74)
> [info] at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
> [info] at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
> [info] at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
> [info] at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:433)
> [info] at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
> [info] at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
> [info] at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:413)
> [info] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [info] at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [info] at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [info] at 

[jira] [Resolved] (SPARK-33928) Flaky o.a.s.ExecutorAllocationManagerSuite - "SPARK-23365 Don't update target num executors when killing idle executors"

2020-12-28 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33928.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30956
[https://github.com/apache/spark/pull/30956]

> Flaky o.a.s.ExecutorAllocationManagerSuite - "SPARK-23365 Don't update target 
> num executors when killing idle executors"
> 
>
> Key: SPARK-33928
> URL: https://issues.apache.org/jira/browse/SPARK-33928
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 2.4.0, 3.0.0, 3.0.1, 3.1.0, 3.2.0
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
> Fix For: 3.1.0
>
>
> [info] - SPARK-23365 Don't update target num executors when killing idle 
> executors *** FAILED *** (126 milliseconds)
> [info] 1 did not equal 2 (ExecutorAllocationManagerSuite.scala:1617)
> [info] org.scalatest.exceptions.TestFailedException:
> [info] at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
> [info] at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
> [info] at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
> [info] at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503)
> [info] at 
> org.apache.spark.ExecutorAllocationManagerSuite.$anonfun$new$84(ExecutorAllocationManagerSuite.scala:1617)
> [info] at 
> org.apache.spark.SparkFunSuite.$anonfun$test$1(SparkFunSuite.scala:423)
> [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info] at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info] at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info] at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
> [info] at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:246)
> [info] at 
> org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
> [info] at 
> org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
> [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286)
> [info] at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
> [info] at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
> [info] at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:74)
> [info] at 
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221)
> [info] at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214)
> [info] at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:74)
> [info] at 
> org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229)
> [info] at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393)
> [info] at scala.collection.immutable.List.foreach(List.scala:392)
> [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381)
> [info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:376)
> [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:458)
> [info] at org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229)
> [info] at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228)
> [info] at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
> [info] at org.scalatest.Suite.run(Suite.scala:1124)
> [info] at org.scalatest.Suite.run$(Suite.scala:1106)
> [info] at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
> [info] at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233)
> [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:518)
> [info] at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233)
> [info] at org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232)
> [info] at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:74)
> [info] at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
> [info] at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
> [info] at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
> [info] at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:433)
> [info] at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
> [info] at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
> [info] at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:413)
> [info] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [info] at 
> 

[jira] [Assigned] (SPARK-33884) Simplify CaseWhen clauses with (true and false) and (false and true)

2020-12-28 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33884:
---

Assignee: Yuming Wang

> Simplify CaseWhen clauses with (true and false) and (false and true)
> ---
>
> Key: SPARK-33884
> URL: https://issues.apache.org/jira/browse/SPARK-33884
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>
> Simplify CaseWhen clauses with (true and false) and (false and true):
> ||Expression||After simplify||
> |case when cond then true else false end|cond|
> |case when cond then false else true end|!cond|
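
As a usage-level illustration (assuming a spark-shell session where {{spark}} is a SparkSession; the view name and predicate are made up), this is how the simplification would surface in a query plan:

{code:scala}
// Illustrative only: with the rule applied, the Filter should show the bare
// predicate (id > 5) rather than the CASE WHEN wrapper.
spark.range(10).createOrReplaceTempView("t")
spark.sql("SELECT * FROM t WHERE (CASE WHEN id > 5 THEN true ELSE false END)").explain()
{code}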



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33884) Simplify CaseWhen clauses with (true and false) and (false and true)

2020-12-28 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33884.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 30898
[https://github.com/apache/spark/pull/30898]

> Simplify CaseWhen clauses with (true and false) and (false and true)
> ---
>
> Key: SPARK-33884
> URL: https://issues.apache.org/jira/browse/SPARK-33884
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> Simplify CaseWhen clauses with (true and false) and (false and true):
> ||Expression||After simplify||
> |case when cond then true else false end|cond|
> |case when cond then false else true end|!cond|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33924) v2 INSERT INTO .. PARTITION drops the partition location

2020-12-28 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33924.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 30952
[https://github.com/apache/spark/pull/30952]

> v2 INSERT INTO .. PARTITION drops the partition location
> 
>
> Key: SPARK-33924
> URL: https://issues.apache.org/jira/browse/SPARK-33924
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.2.0
>
>
> See the test "with location" in v2.AlterTableRenamePartitionSuite:
> {code:scala}
> val loc = "location1"
> sql(s"ALTER TABLE $t ADD PARTITION (id = 2) LOCATION '$loc'")
> checkLocation(t, Map("id" -> "2"), loc)
> sql(s"INSERT INTO $t PARTITION (id = 2) SELECT 'def'")
> checkLocation(t, Map("id" -> "2"), loc)
> {code}
> The second check must not fail.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33931) Recover GitHub Action

2020-12-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33931:
--
Priority: Blocker  (was: Major)

> Recover GitHub Action
> -
>
> Key: SPARK-33931
> URL: https://issues.apache.org/jira/browse/SPARK-33931
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.4.8, 3.0.1, 3.1.0, 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33931) Recover GitHub Action

2020-12-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33931:
--
Target Version/s: 3.1.0, 3.2.0

> Recover GitHub Action
> -
>
> Key: SPARK-33931
> URL: https://issues.apache.org/jira/browse/SPARK-33931
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.4.8, 3.0.1, 3.1.0, 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-33931) Recover GitHub Action

2020-12-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33931:
--
Comment: was deleted

(was: User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/30959)

> Recover GitHub Action
> -
>
> Key: SPARK-33931
> URL: https://issues.apache.org/jira/browse/SPARK-33931
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.4.8, 3.0.1, 3.1.0, 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33931) Recover GitHub Action

2020-12-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33931:


Assignee: (was: Apache Spark)

> Recover GitHub Action
> -
>
> Key: SPARK-33931
> URL: https://issues.apache.org/jira/browse/SPARK-33931
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.4.8, 3.0.1, 3.1.0, 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33931) Recover GitHub Action

2020-12-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255825#comment-17255825
 ] 

Apache Spark commented on SPARK-33931:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/30959

> Recover GitHub Action
> -
>
> Key: SPARK-33931
> URL: https://issues.apache.org/jira/browse/SPARK-33931
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.4.8, 3.0.1, 3.1.0, 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33931) Recover GitHub Action

2020-12-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33931:


Assignee: Apache Spark

> Recover GitHub Action
> -
>
> Key: SPARK-33931
> URL: https://issues.apache.org/jira/browse/SPARK-33931
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.4.8, 3.0.1, 3.1.0, 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org





[jira] [Created] (SPARK-33931) Recover GitHub Action

2020-12-28 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-33931:
-

 Summary: Recover GitHub Action
 Key: SPARK-33931
 URL: https://issues.apache.org/jira/browse/SPARK-33931
 Project: Spark
  Issue Type: Bug
  Components: Project Infra
Affects Versions: 3.0.1, 2.4.8, 3.1.0, 3.2.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33930) Spark SQL no serde row format field delimit default is '\u0001'

2020-12-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255819#comment-17255819
 ] 

Apache Spark commented on SPARK-33930:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/30958

> Spark SQL no serde row format field delimit default is '\u0001'
> ---
>
> Key: SPARK-33930
> URL: https://issues.apache.org/jira/browse/SPARK-33930
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> For the same SQL:
> {code:java}
> SELECT TRANSFORM(a, b, c, null)
> ROW FORMAT DELIMITED
> USING 'cat' 
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '&'
> FROM (select 1 as a, 2 as b, 3  as c) t
> {code}
> !image-2020-12-29-13-11-31-336.png!
>  
> !image-2020-12-29-13-11-45-734.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33930) Spark SQL no serde row format field delimit default is '\u0001'

2020-12-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33930:


Assignee: (was: Apache Spark)

> Spark SQL no serde row format field delimit default is '\u0001'
> ---
>
> Key: SPARK-33930
> URL: https://issues.apache.org/jira/browse/SPARK-33930
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> For the same SQL:
> {code:java}
> SELECT TRANSFORM(a, b, c, null)
> ROW FORMAT DELIMITED
> USING 'cat' 
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '&'
> FROM (select 1 as a, 2 as b, 3  as c) t
> {code}
> !image-2020-12-29-13-11-31-336.png!
>  
> !image-2020-12-29-13-11-45-734.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33930) Spark SQL no serde row format field delimit default is '\u0001'

2020-12-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33930:


Assignee: Apache Spark

> Spark SQL no serde row format field delimit default is '\u0001'
> ---
>
> Key: SPARK-33930
> URL: https://issues.apache.org/jira/browse/SPARK-33930
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: Apache Spark
>Priority: Major
>
> For the same SQL:
> {code:java}
> SELECT TRANSFORM(a, b, c, null)
> ROW FORMAT DELIMITED
> USING 'cat' 
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '&'
> FROM (select 1 as a, 2 as b, 3  as c) t
> {code}
> !image-2020-12-29-13-11-31-336.png!
>  
> !image-2020-12-29-13-11-45-734.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org





[jira] [Commented] (SPARK-31937) Support processing array/map/struct type using spark noserde mode

2020-12-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255815#comment-17255815
 ] 

Apache Spark commented on SPARK-31937:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/30957

> Support processing array/map/struct type using spark noserde mode
> -
>
> Key: SPARK-31937
> URL: https://issues.apache.org/jira/browse/SPARK-31937
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: angerszhu
>Priority: Major
>
> Currently, using a script (e.g. Python) to process array or map types is not 
> supported; it fails with messages like:
>  {{org.apache.spark.sql.catalyst.expressions.UnsafeArrayData cannot be cast 
> to [Ljava.lang.Object}}
>  {{org.apache.spark.sql.catalyst.expressions.UnsafeMapData cannot be cast to 
> java.util.Map}}
> To support this, see:
> https://github.com/apache/spark/pull/29085/commits/43d0f24f2c769dc270cf7e5fa2c5c13c32d0a631?file-filters%5B%5D=.scala
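
A hedged repro sketch of the unsupported case (the table and column names are made up, and depending on the Spark build TRANSFORM may require Hive support to be enabled):

{code:scala}
// Illustrative only: a TRANSFORM over an array column, the kind of query this
// ticket wants to support. Before the change it is reported to fail with
// "UnsafeArrayData cannot be cast to [Ljava.lang.Object;".
spark.sql("SELECT 1 AS id, array(1, 2, 3) AS arr").createOrReplaceTempView("t")
spark.sql("SELECT TRANSFORM(id, arr) USING 'cat' AS (id STRING, arr STRING) FROM t").show()
{code}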



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31937) Support processing array/map/struct type using spark noserde mode

2020-12-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31937:


Assignee: (was: Apache Spark)

> Support processing array/map/struct type using spark noserde mode
> -
>
> Key: SPARK-31937
> URL: https://issues.apache.org/jira/browse/SPARK-31937
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: angerszhu
>Priority: Major
>
> Currently, using a script (e.g. Python) to process array or map types is not 
> supported; it fails with messages like:
>  {{org.apache.spark.sql.catalyst.expressions.UnsafeArrayData cannot be cast 
> to [Ljava.lang.Object}}
>  {{org.apache.spark.sql.catalyst.expressions.UnsafeMapData cannot be cast to 
> java.util.Map}}
> To support this, see:
> https://github.com/apache/spark/pull/29085/commits/43d0f24f2c769dc270cf7e5fa2c5c13c32d0a631?file-filters%5B%5D=.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org





[jira] [Assigned] (SPARK-31937) Support processing array/map/struct type using spark noserde mode

2020-12-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31937:


Assignee: Apache Spark

> Support processing array/map/struct type using spark noserde mode
> -
>
> Key: SPARK-31937
> URL: https://issues.apache.org/jira/browse/SPARK-31937
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: angerszhu
>Assignee: Apache Spark
>Priority: Major
>
> Currently, using a script (e.g. Python) to process array or map types is not 
> supported; it fails with messages like:
>  {{org.apache.spark.sql.catalyst.expressions.UnsafeArrayData cannot be cast 
> to [Ljava.lang.Object}}
>  {{org.apache.spark.sql.catalyst.expressions.UnsafeMapData cannot be cast to 
> java.util.Map}}
> To support this, see:
> https://github.com/apache/spark/pull/29085/commits/43d0f24f2c769dc270cf7e5fa2c5c13c32d0a631?file-filters%5B%5D=.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31937) Support processing array/map/struct type using spark noserde mode

2020-12-28 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-31937:
--
Summary: Support processing array/map/struct type using spark noserde mode  
(was: Support processing array and map type using spark noserde mode)

> Support processing array/map/struct type using spark noserde mode
> -
>
> Key: SPARK-31937
> URL: https://issues.apache.org/jira/browse/SPARK-31937
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: angerszhu
>Priority: Major
>
> Currently, using a script (e.g. Python) to process array or map types is not 
> supported; it fails with messages like:
>  {{org.apache.spark.sql.catalyst.expressions.UnsafeArrayData cannot be cast 
> to [Ljava.lang.Object}}
>  {{org.apache.spark.sql.catalyst.expressions.UnsafeMapData cannot be cast to 
> java.util.Map}}
> To support this, see:
> https://github.com/apache/spark/pull/29085/commits/43d0f24f2c769dc270cf7e5fa2c5c13c32d0a631?file-filters%5B%5D=.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33930) Spark SQL no serde row format field delimit default is '\u0001'

2020-12-28 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255813#comment-17255813
 ] 

angerszhu commented on SPARK-33930:
---

Will raise a PR soon.

> Spark SQL no serde row format field delimit default is '\u0001'
> ---
>
> Key: SPARK-33930
> URL: https://issues.apache.org/jira/browse/SPARK-33930
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> For the same SQL:
> {code:java}
> SELECT TRANSFORM(a, b, c, null)
> ROW FORMAT DELIMITED
> USING 'cat' 
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '&'
> FROM (select 1 as a, 2 as b, 3  as c) t
> {code}
> !image-2020-12-29-13-11-31-336.png!
>  
> !image-2020-12-29-13-11-45-734.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33930) Spark SQL no serde row format field delimit default is '\u0001'

2020-12-28 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-33930:
--
Parent: SPARK-31936
Issue Type: Sub-task  (was: Improvement)

> Spark SQL no serde row format field delimit default is '\u0001'
> ---
>
> Key: SPARK-33930
> URL: https://issues.apache.org/jira/browse/SPARK-33930
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> For the same SQL:
> {code:java}
> SELECT TRANSFORM(a, b, c, null)
> ROW FORMAT DELIMITED
> USING 'cat' 
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '&'
> FROM (select 1 as a, 2 as b, 3  as c) t
> {code}
> !image-2020-12-29-13-11-31-336.png!
>  
> !image-2020-12-29-13-11-45-734.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33930) Spark SQL no serde row format field delimit default is '\u0001'

2020-12-28 Thread angerszhu (Jira)
angerszhu created SPARK-33930:
-

 Summary: Spark SQL no serde row format field delimit default is 
'\u0001'
 Key: SPARK-33930
 URL: https://issues.apache.org/jira/browse/SPARK-33930
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: angerszhu


For the same SQL:
{code:java}
SELECT TRANSFORM(a, b, c, null)
ROW FORMAT DELIMITED
USING 'cat' 
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '&'
FROM (select 1 as a, 2 as b, 3  as c) t

{code}

!image-2020-12-29-13-11-31-336.png!

 

!image-2020-12-29-13-11-45-734.png!
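
A hedged sketch of sidestepping the default delimiter (the query mirrors the shape above; 'cat' simply echoes its input, and whether the '\u0001' default applies depends on the serde mode):

{code:scala}
// Illustrative only: spell out the field delimiter on both the input and output
// side of the script instead of relying on the no-serde default discussed here.
spark.sql(
  """SELECT TRANSFORM(a, b, c)
    |  ROW FORMAT DELIMITED FIELDS TERMINATED BY '&'
    |  USING 'cat'
    |  ROW FORMAT DELIMITED FIELDS TERMINATED BY '&'
    |FROM (SELECT 1 AS a, 2 AS b, 3 AS c) t""".stripMargin
).show()
{code}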



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33929) Spark-submit with --package deequ doesn't pull all jars

2020-12-28 Thread Dustin Smith (Jira)
Dustin Smith created SPARK-33929:


 Summary: Spark-submit with --package deequ doesn't pull all jars
 Key: SPARK-33929
 URL: https://issues.apache.org/jira/browse/SPARK-33929
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 2.3.4, 2.3.3, 2.3.2, 2.3.1, 2.3.0
Reporter: Dustin Smith


This issue was marked as solved in 
[SPARK-24074|https://issues.apache.org/jira/browse/SPARK-24074?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel];
 however, another user, [~hyukjin.kwon], pointed out in the comments that version 
2.4.x was experiencing the same problem.

The problem exists in the 2.3.x releases as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33929) Spark-submit with --package deequ doesn't pull all jars

2020-12-28 Thread Dustin Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Smith updated SPARK-33929:
-
Description: 
This issue was marked as solved in SPARK-24074; however, another user, [~hyukjin.kwon], 
pointed out in the comments that version 2.4.x was experiencing the same 
problem when using Amazon Deequ.

The problem exists in the 2.3.x releases as well for Deequ.

  was:
This issue was marked as solved 
[SPARK-24074|https://issues.apache.org/jira/browse/SPARK-24074?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel];
 however, another [~hyukjin.kwon] pointed out in the comments that version 2.4x 
was experiencing this same problem. 

This problem exist in 2.3.x ecosystem as well.


> Spark-submit with --package deequ doesn't pull all jars
> ---
>
> Key: SPARK-33929
> URL: https://issues.apache.org/jira/browse/SPARK-33929
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4
>Reporter: Dustin Smith
>Priority: Major
>
> This issue was marked as solved in SPARK-24074; however, another user, [~hyukjin.kwon], 
> pointed out in the comments that version 2.4.x was experiencing the same 
> problem when using Amazon Deequ.
> The problem exists in the 2.3.x releases as well for Deequ.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33928) Flaky o.a.s.ExecutorAllocationManagerSuite - "SPARK-23365 Don't update target num executors when killing idle executors"

2020-12-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33928:


Assignee: Apache Spark

> Flaky o.a.s.ExecutorAllocationManagerSuite - "SPARK-23365 Don't update target 
> num executors when killing idle executors"
> 
>
> Key: SPARK-33928
> URL: https://issues.apache.org/jira/browse/SPARK-33928
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 2.4.0, 3.0.0, 3.0.1, 3.1.0, 3.2.0
>Reporter: wuyi
>Assignee: Apache Spark
>Priority: Major
>
> [info] - SPARK-23365 Don't update target num executors when killing idle 
> executors *** FAILED *** (126 milliseconds)
> [info] 1 did not equal 2 (ExecutorAllocationManagerSuite.scala:1617)
> [info] org.scalatest.exceptions.TestFailedException:
> [info] at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
> [info] at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
> [info] at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
> [info] at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503)
> [info] at 
> org.apache.spark.ExecutorAllocationManagerSuite.$anonfun$new$84(ExecutorAllocationManagerSuite.scala:1617)
> [info] at 
> org.apache.spark.SparkFunSuite.$anonfun$test$1(SparkFunSuite.scala:423)
> [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info] at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info] at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info] at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
> [info] at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:246)
> [info] at 
> org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
> [info] at 
> org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
> [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286)
> [info] at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
> [info] at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
> [info] at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:74)
> [info] at 
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221)
> [info] at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214)
> [info] at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:74)
> [info] at 
> org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229)
> [info] at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393)
> [info] at scala.collection.immutable.List.foreach(List.scala:392)
> [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381)
> [info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:376)
> [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:458)
> [info] at org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229)
> [info] at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228)
> [info] at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
> [info] at org.scalatest.Suite.run(Suite.scala:1124)
> [info] at org.scalatest.Suite.run$(Suite.scala:1106)
> [info] at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
> [info] at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233)
> [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:518)
> [info] at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233)
> [info] at org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232)
> [info] at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:74)
> [info] at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
> [info] at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
> [info] at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
> [info] at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:433)
> [info] at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
> [info] at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
> [info] at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:413)
> [info] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [info] at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [info] at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [info] 

[jira] [Commented] (SPARK-33928) Flaky o.a.s.ExecutorAllocationManagerSuite - "SPARK-23365 Don't update target num executors when killing idle executors"

2020-12-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255798#comment-17255798
 ] 

Apache Spark commented on SPARK-33928:
--

User 'Ngone51' has created a pull request for this issue:
https://github.com/apache/spark/pull/30956

> Flaky o.a.s.ExecutorAllocationManagerSuite - "SPARK-23365 Don't update target 
> num executors when killing idle executors"
> 
>
> Key: SPARK-33928
> URL: https://issues.apache.org/jira/browse/SPARK-33928
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 2.4.0, 3.0.0, 3.0.1, 3.1.0, 3.2.0
>Reporter: wuyi
>Priority: Major
>
> [info] - SPARK-23365 Don't update target num executors when killing idle 
> executors *** FAILED *** (126 milliseconds)
> [info] 1 did not equal 2 (ExecutorAllocationManagerSuite.scala:1617)
> [info] org.scalatest.exceptions.TestFailedException:
> [info] at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
> [info] at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
> [info] at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
> [info] at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503)
> [info] at 
> org.apache.spark.ExecutorAllocationManagerSuite.$anonfun$new$84(ExecutorAllocationManagerSuite.scala:1617)
> [info] at 
> org.apache.spark.SparkFunSuite.$anonfun$test$1(SparkFunSuite.scala:423)
> [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info] at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info] at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info] at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
> [info] at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:246)
> [info] at 
> org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
> [info] at 
> org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
> [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286)
> [info] at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
> [info] at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
> [info] at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:74)
> [info] at 
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221)
> [info] at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214)
> [info] at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:74)
> [info] at 
> org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229)
> [info] at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393)
> [info] at scala.collection.immutable.List.foreach(List.scala:392)
> [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381)
> [info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:376)
> [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:458)
> [info] at org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229)
> [info] at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228)
> [info] at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
> [info] at org.scalatest.Suite.run(Suite.scala:1124)
> [info] at org.scalatest.Suite.run$(Suite.scala:1106)
> [info] at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
> [info] at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233)
> [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:518)
> [info] at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233)
> [info] at org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232)
> [info] at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:74)
> [info] at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
> [info] at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
> [info] at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
> [info] at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:433)
> [info] at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
> [info] at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
> [info] at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:413)
> [info] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [info] at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [info] at 
> 

[jira] [Assigned] (SPARK-33928) Flaky o.a.s.ExecutorAllocationManagerSuite - "SPARK-23365 Don't update target num executors when killing idle executors"

2020-12-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33928:


Assignee: (was: Apache Spark)

> Flaky o.a.s.ExecutorAllocationManagerSuite - "SPARK-23365 Don't update target 
> num executors when killing idle executors"
> 
>
> Key: SPARK-33928
> URL: https://issues.apache.org/jira/browse/SPARK-33928
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 2.4.0, 3.0.0, 3.0.1, 3.1.0, 3.2.0
>Reporter: wuyi
>Priority: Major
>
> [info] - SPARK-23365 Don't update target num executors when killing idle 
> executors *** FAILED *** (126 milliseconds)
> [info] 1 did not equal 2 (ExecutorAllocationManagerSuite.scala:1617)
> [info] org.scalatest.exceptions.TestFailedException:
> [info] at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
> [info] at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
> [info] at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
> [info] at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503)
> [info] at 
> org.apache.spark.ExecutorAllocationManagerSuite.$anonfun$new$84(ExecutorAllocationManagerSuite.scala:1617)
> [info] at 
> org.apache.spark.SparkFunSuite.$anonfun$test$1(SparkFunSuite.scala:423)
> [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info] at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info] at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info] at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
> [info] at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:246)
> [info] at 
> org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
> [info] at 
> org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
> [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286)
> [info] at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
> [info] at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
> [info] at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:74)
> [info] at 
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221)
> [info] at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214)
> [info] at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:74)
> [info] at 
> org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229)
> [info] at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393)
> [info] at scala.collection.immutable.List.foreach(List.scala:392)
> [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381)
> [info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:376)
> [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:458)
> [info] at org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229)
> [info] at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228)
> [info] at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
> [info] at org.scalatest.Suite.run(Suite.scala:1124)
> [info] at org.scalatest.Suite.run$(Suite.scala:1106)
> [info] at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
> [info] at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233)
> [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:518)
> [info] at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233)
> [info] at org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232)
> [info] at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:74)
> [info] at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
> [info] at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
> [info] at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
> [info] at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:433)
> [info] at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
> [info] at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
> [info] at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:413)
> [info] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [info] at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [info] at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [info] at 

[jira] [Commented] (SPARK-23365) DynamicAllocation with failure in straggler task can lead to a hung spark job

2020-12-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255799#comment-17255799
 ] 

Apache Spark commented on SPARK-23365:
--

User 'Ngone51' has created a pull request for this issue:
https://github.com/apache/spark/pull/30956

> DynamicAllocation with failure in straggler task can lead to a hung spark job
> -
>
> Key: SPARK-23365
> URL: https://issues.apache.org/jira/browse/SPARK-23365
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler, Spark Core
>Affects Versions: 2.1.2, 2.2.1, 2.3.0
>Reporter: Imran Rashid
>Assignee: Imran Rashid
>Priority: Major
> Fix For: 2.3.1, 2.4.0
>
>
> Dynamic Allocation can lead to a Spark app getting stuck with 0 executors 
> requested when the executors running the last tasks of a taskset fail (e.g. with an 
> OOM).
> This happens when {{ExecutorAllocationManager}}'s internal target number of 
> executors gets out of sync with {{CoarseGrainedSchedulerBackend}}'s target 
> number.  {{EAM}} updates the {{CGSB}} in two ways: (1) it tracks how many 
> tasks are active or pending in submitted stages, and computes how many 
> executors would be needed for them.  And as tasks finish, it will actively 
> decrease that count, informing the {{CGSB}} along the way.  (2) When it 
> decides executors are inactive for long enough, then it requests that 
> {{CGSB}} kill the executors -- this also tells the {{CGSB}} to update its 
> target number of executors: 
> https://github.com/apache/spark/blob/4df84c3f818aa536515729b442601e08c253ed35/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L622
> So when there is just one task left, you could have the following sequence of 
> events:
> (1) the {{EAM}} sets the desired number of executors to 1, and updates the 
> {{CGSB}} too
> (2) while that final task is still running, the other executors cross the 
> idle timeout, and the {{EAM}} requests the {{CGSB}} kill them
> (3) now the {{EAM}} has a target of 1 executor, and the {{CGSB}} has a target 
> of 0 executors
> If the final task completed normally now, everything would be OK; the next 
> taskset would get submitted, the {{EAM}} would increase the target number of 
> executors and it would update the {{CGSB}}.
> But if the executor for that final task failed (e.g. an OOM), then the {{EAM}} 
> thinks it [doesn't need to update 
> anything|https://github.com/apache/spark/blob/4df84c3f818aa536515729b442601e08c253ed35/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L384-L386],
>  because its target is already 1, which is all it needs for that final task; 
> and the {{CGSB}} doesn't update anything either since its target is 0.
> I think you can determine if this is the cause of a stuck app by looking for
> {noformat}
> yarn.YarnAllocator: Driver requested a total number of 0 executor(s).
> {noformat}
> in the logs of the ApplicationMaster (at least on yarn).
> You can reproduce this with this test app, run with {{--conf 
> "spark.dynamicAllocation.minExecutors=1" --conf 
> "spark.dynamicAllocation.maxExecutors=5" --conf 
> "spark.dynamicAllocation.executorIdleTimeout=5s"}}
> {code}
> import org.apache.spark.SparkEnv
> sc.setLogLevel("INFO")
> sc.parallelize(1 to 1, 1000).count()
> val execs = sc.parallelize(1 to 1000, 1000).map { _ => 
> SparkEnv.get.executorId}.collect().toSet
> val badExec = execs.head
> println("will kill exec " + badExec)
> sc.parallelize(1 to 5, 5).mapPartitions { itr =>
>   val exec = SparkEnv.get.executorId
>   if (exec == badExec) {
> Thread.sleep(20000) // long enough that all the other tasks finish, and 
> the executors cross the idle timeout
> // now cause the executor to oom
> var buffers = Seq[Array[Byte]]()
> while(true) {
>   buffers :+= new Array[Byte](1e8.toInt)
> }
> itr
>   } else {
> itr
>   }
> }.collect()
> {code}
> *EDIT*: I adjusted the repro to cause an OOM on the bad executor, since 
> {{sc.killExecutor}} doesn't play nice with dynamic allocation in other ways.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33928) Flaky o.a.s.ExecutorAllocationManagerSuite - "SPARK-23365 Don't update target num executors when killing idle executors"

2020-12-28 Thread wuyi (Jira)
wuyi created SPARK-33928:


 Summary: Flaky o.a.s.ExecutorAllocationManagerSuite - "SPARK-23365 
Don't update target num executors when killing idle executors"
 Key: SPARK-33928
 URL: https://issues.apache.org/jira/browse/SPARK-33928
 Project: Spark
  Issue Type: Test
  Components: Spark Core
Affects Versions: 3.0.1, 3.0.0, 2.4.0, 3.1.0, 3.2.0
Reporter: wuyi


[info] - SPARK-23365 Don't update target num executors when killing idle 
executors *** FAILED *** (126 milliseconds)
[info] 1 did not equal 2 (ExecutorAllocationManagerSuite.scala:1617)
[info] org.scalatest.exceptions.TestFailedException:
[info] at 
org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
[info] at 
org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
[info] at 
org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
[info] at 
org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503)
[info] at 
org.apache.spark.ExecutorAllocationManagerSuite.$anonfun$new$84(ExecutorAllocationManagerSuite.scala:1617)
[info] at 
org.apache.spark.SparkFunSuite.$anonfun$test$1(SparkFunSuite.scala:423)
[info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
[info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
[info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info] at org.scalatest.Transformer.apply(Transformer.scala:22)
[info] at org.scalatest.Transformer.apply(Transformer.scala:20)
[info] at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
[info] at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:246)
[info] at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
[info] at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
[info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286)
[info] at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
[info] at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
[info] at 
org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:74)
[info] at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221)
[info] at 
org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214)
[info] at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:74)
[info] at org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229)
[info] at 
org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393)
[info] at scala.collection.immutable.List.foreach(List.scala:392)
[info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381)
[info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:376)
[info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:458)
[info] at org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229)
[info] at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228)
[info] at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
[info] at org.scalatest.Suite.run(Suite.scala:1124)
[info] at org.scalatest.Suite.run$(Suite.scala:1106)
[info] at 
org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
[info] at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233)
[info] at org.scalatest.SuperEngine.runImpl(Engine.scala:518)
[info] at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233)
[info] at org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232)
[info] at 
org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:74)
[info] at 
org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
[info] at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
[info] at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
[info] at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:433)
[info] at 
org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
[info] at 
org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
[info] at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:413)
[info] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[info] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info] at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info] at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33910) Simplify/Optimize conditional expressions

2020-12-28 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-33910:

Description: 
1. Push down the foldable expressions through CaseWhen/If
2. Simplify conditional in predicate
3. Push the UnaryExpression into (if / case) branches
4. Simplify CaseWhen if elseValue is None
5. Simplify CaseWhen clauses with (true and false) and (false and true)

Common use cases are:
{code:sql}
create table t1 using parquet as select * from range(100);
create table t2 using parquet as select * from range(200);

create temp view v1 as
select 'a' as event_type, * from t1   
union all 
select CASE WHEN id = 1 THEN 'b' ELSE 'c' end as event_type, * from t2
{code}

1. Avoid reading the whole table.
{noformat}
explain select * from v1 where event_type = 'a';
Before simplify:
== Physical Plan ==
Union
:- *(1) Project [a AS event_type#7, id#9L]
:  +- *(1) ColumnarToRow
: +- FileScan parquet default.t1[id#9L] Batched: true, DataFilters: [], 
Format: Parquet, PartitionFilters: [], PushedFilters: [], ReadSchema: 
struct
+- *(2) Project [CASE WHEN (id#10L = 1) THEN b ELSE c END AS event_type#8, 
id#10L]
   +- *(2) Filter (CASE WHEN (id#10L = 1) THEN b ELSE c END = a)
  +- *(2) ColumnarToRow
 +- FileScan parquet default.t2[id#10L] Batched: true, DataFilters: 
[(CASE WHEN (id#10L = 1) THEN b ELSE c END = a)], Format: Parquet, 
PartitionFilters: [], PushedFilters: [], ReadSchema: struct

After simplify:
== Physical Plan ==
*(1) Project [a AS event_type#8, id#4L]
+- *(1) ColumnarToRow
   +- FileScan parquet default.t1[id#4L] Batched: true, DataFilters: [], 
Format: Parquet
{noformat}

2. Push down the conditional expressions to the data source.
{noformat}
explain select * from v1 where event_type = 'b';
Before simplify:
== Physical Plan ==
Union
:- LocalTableScan , [event_type#7, id#9L]
+- *(1) Project [CASE WHEN (id#10L = 1) THEN b ELSE c END AS event_type#8, 
id#10L]
   +- *(1) Filter (CASE WHEN (id#10L = 1) THEN b ELSE c END = b)
  +- *(1) ColumnarToRow
 +- FileScan parquet default.t2[id#10L] Batched: true, DataFilters: 
[(CASE WHEN (id#10L = 1) THEN b ELSE c END = b)], Format: Parquet, 
PartitionFilters: [], PushedFilters: [], ReadSchema: struct

After simplify:
== Physical Plan ==
*(1) Project [CASE WHEN (id#5L = 1) THEN b ELSE c END AS event_type#8, id#5L AS 
id#4L]
+- *(1) Filter (isnotnull(id#5L) AND (id#5L = 1))
   +- *(1) ColumnarToRow
  +- FileScan parquet default.t2[id#5L] Batched: true, DataFilters: 
[isnotnull(id#5L), (id#5L = 1)], Format: Parquet, PartitionFilters: [], 
PushedFilters: [IsNotNull(id), EqualTo(id,1)], ReadSchema: struct
{noformat}

3. Reduce the amount of calculation.
{noformat}
Before simplify:
explain select event_type = 'e' from v1;
== Physical Plan ==
Union
:- *(1) Project [false AS (event_type = e)#37]
:  +- *(1) ColumnarToRow
: +- FileScan parquet default.t1[] Batched: true, DataFilters: [], Format: 
Parquet, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<>
+- *(2) Project [(CASE WHEN (id#21L = 1) THEN b ELSE c END = e) AS (event_type 
= e)#38]
   +- *(2) ColumnarToRow
  +- FileScan parquet default.t2[id#21L] Batched: true, DataFilters: [], 
Format: Parquet, PartitionFilters: [], PushedFilters: [], ReadSchema: 
struct

After simplify:
== Physical Plan ==
Union
:- *(1) Project [false AS (event_type = e)#10]
:  +- *(1) ColumnarToRow
: +- FileScan parquet default.t1[] Batched: true, DataFilters: [], Format: 
Parquet,
+- *(2) Project [false AS (event_type = e)#14]
   +- *(2) ColumnarToRow
  +- FileScan parquet default.t2[] Batched: true, DataFilters: [], Format: 
Parquet, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<>

{noformat}



  was:
1. Push down the foldable expressions through CaseWhen/If
2. Simplify conditional in predicate
3. Push the UnaryExpression into (if / case) branches
4. Simplify CaseWhen if elseValue is None
5. Simplify conditional if all branches are foldable boolean type

Common use cases are:
{code:sql}
create table t1 using parquet as select * from range(100);
create table t2 using parquet as select * from range(200);

create temp view v1 as
select 'a' as event_type, * from t1   
union all 
select CASE WHEN id = 1 THEN 'b' ELSE 'c' end as event_type, * from t2
{code}

1. Avoid reading the whole table.
{noformat}
explain select * from v1 where event_type = 'a';
Before simplify:
== Physical Plan ==
Union
:- *(1) Project [a AS event_type#7, id#9L]
:  +- *(1) ColumnarToRow
: +- FileScan parquet default.t1[id#9L] Batched: true, DataFilters: [], 
Format: Parquet, PartitionFilters: [], PushedFilters: [], ReadSchema: 

[jira] [Commented] (SPARK-33848) Push the UnaryExpression into (if / case) branches

2020-12-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255790#comment-17255790
 ] 

Apache Spark commented on SPARK-33848:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/30955

> Push the UnaryExpression into (if / case) branches
> --
>
> Key: SPARK-33848
> URL: https://issues.apache.org/jira/browse/SPARK-33848
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> Push the cast into (if / case) branches. The use case is:
> {code:sql}
> create table t1 using parquet as select id from range(10);
> explain select id from t1 where (CASE WHEN id = 1 THEN '1' WHEN id = 3 THEN 
> '2' end) > 3;
> {code}
> Before this pr:
> {noformat}
> == Physical Plan ==
> *(1) Filter (cast(CASE WHEN (id#1L = 1) THEN 1 WHEN (id#1L = 3) THEN 2 END as 
> int) > 3)
> +- *(1) ColumnarToRow
>+- FileScan parquet default.t1[id#1L] Batched: true, DataFilters: 
> [(cast(CASE WHEN (id#1L = 1) THEN 1 WHEN (id#1L = 3) THEN 2 END as int) > 
> 3)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF...,
>  PartitionFilters: [], PushedFilters: [], ReadSchema: struct
> {noformat}
> After this pr:
> {noformat}
> == Physical Plan ==
> LocalTableScan , [id#1L]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33927) Fix Spark Release image

2020-12-28 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255777#comment-17255777
 ] 

Dongjoon Hyun commented on SPARK-33927:
---

cc [~hyukjin.kwon]

> Fix Spark Release image
> ---
>
> Key: SPARK-33927
> URL: https://issues.apache.org/jira/browse/SPARK-33927
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>
> The release script seems to be broken. This is a blocker for Apache Spark 
> 3.1.0 release.
> {code}
> $ cd dev/create-release/spark-rm
> $ docker build -t spark-rm .
> ...
> exit code: 1
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33927) Fix Spark Release image

2020-12-28 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-33927:
-

 Summary: Fix Spark Release image
 Key: SPARK-33927
 URL: https://issues.apache.org/jira/browse/SPARK-33927
 Project: Spark
  Issue Type: Bug
  Components: Project Infra
Affects Versions: 3.1.0
Reporter: Dongjoon Hyun


The release script seems to be broken. This is a blocker for Apache Spark 3.1.0 
release.
{code}
$ cd dev/create-release/spark-rm
$ docker build -t spark-rm .
...
exit code: 1
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33925) Remove unused SecurityManager in Utils.fetchFile

2020-12-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33925.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 30945
[https://github.com/apache/spark/pull/30945]

> Remove unused SecurityManager in Utils.fetchFile
> 
>
> Key: SPARK-33925
> URL: https://issues.apache.org/jira/browse/SPARK-33925
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 3.2.0
>
>
> The last usage of {{SecurityManager}} in {{Utils.fetchFile}} was removed in 
> SPARK-27004. We don't need to pass it around anymore.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33925) Remove unused SecurityManager in Utils.fetchFile

2020-12-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33925:
-

Assignee: Hyukjin Kwon

> Remove unused SecurityManager in Utils.fetchFile
> 
>
> Key: SPARK-33925
> URL: https://issues.apache.org/jira/browse/SPARK-33925
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>
> The last usage of {{SecurityManager}} in {{Utils.fetchFile}} was removed in 
> SPARK-27004. We don't need to pass it around anymore.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33916) Fix fallback storage offset and improve compression codec test coverage

2020-12-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33916:
-

Assignee: Dongjoon Hyun

> Fix fallback storage offset and improve compression codec test coverage
> ---
>
> Key: SPARK-33916
> URL: https://issues.apache.org/jira/browse/SPARK-33916
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33916) Fix fallback storage offset and improve compression codec test coverage

2020-12-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33916.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30934
[https://github.com/apache/spark/pull/30934]

> Fix fallback storage offset and improve compression codec test coverage
> ---
>
> Key: SPARK-33916
> URL: https://issues.apache.org/jira/browse/SPARK-33916
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33871) Cannot access to column after left semi join and left join

2020-12-28 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh resolved SPARK-33871.
-
Resolution: Not A Problem

> Cannot access to column after left semi join  and left join
> ---
>
> Key: SPARK-33871
> URL: https://issues.apache.org/jira/browse/SPARK-33871
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Evgenii Samusenko
>Priority: Minor
>
> Cannot access to column after left semi join and left join
> {code}
> val col = "c1"
> val df = Seq((1, "a"),(2, "a"),(3, "a"),(4, "a")).toDF(col, "c2")
> val df2 = Seq(1).toDF(col)
> val semiJoin = df.join(df2, df(col) === df2(col), "left_semi")
> val left = df.join(semiJoin, df(col) === semiJoin(col), "left")
> left.show
> +---+---+----+----+
> | c1| c2|  c1|  c2|
> +---+---+----+----+
> |  1|  a|   1|   a|
> |  2|  a|null|null|
> |  3|  a|null|null|
> |  4|  a|null|null|
> +---+---+----+----+
> left.select(semiJoin(col))
> +---+
> | c1|
> +---+
> |  1|
> |  2|
> |  3|
> |  4|
> +---+
> left.select(df(col))
> +---+
> | c1|
> +---+
> |  1|
> |  2|
> |  3|
> |  4|
> +---+
> {code}
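> A common way to keep the two sides distinguishable (an illustrative sketch, not part 
> of the original report) is to alias the DataFrames and then select alias-qualified 
> columns:
> {code}
> import org.apache.spark.sql.functions.col
> val aliased = df.alias("d").join(semiJoin.alias("s"), col("d.c1") === col("s.c1"), "left")
> aliased.select(col("s.c1")).show()  // values that came through the semi join (null for non-matches)
> aliased.select(col("d.c1")).show()  // values from the original df
> {code}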
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33883) Can repeat "where" twice without error in spark sql

2020-12-28 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh resolved SPARK-33883.
-
Resolution: Invalid

> Can repeat "where" twice without error in spark sql
> ---
>
> Key: SPARK-33883
> URL: https://issues.apache.org/jira/browse/SPARK-33883
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: Stu
>Priority: Minor
> Attachments: image-2020-12-28-18-24-18-395.png, 
> image-2020-12-28-18-32-25-960.png
>
>
> the following sql code works, despite having bad syntax ("where" is mentioned 
> twice):
> {code:java}
> select * from table
> where where field is not null{code}
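> A plausible reading of why this parses (offered here only as an illustration): since 
> {{where}} is a non-reserved keyword in the default parser mode, the first {{where}} can 
> be taken as a table alias, making the statement equivalent to:
> {code:java}
> select * from table as `where`
> where field is not null
> {code}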



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24632) Allow 3rd-party libraries to use pyspark.ml abstractions for Java wrappers for persistence

2020-12-28 Thread Bryan Cutler (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255712#comment-17255712
 ] 

Bryan Cutler commented on SPARK-24632:
--

Ping [~huaxingao] in case you have some time to look into this.

> Allow 3rd-party libraries to use pyspark.ml abstractions for Java wrappers 
> for persistence
> --
>
> Key: SPARK-24632
> URL: https://issues.apache.org/jira/browse/SPARK-24632
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Affects Versions: 3.1.0
>Reporter: Joseph K. Bradley
>Priority: Major
>
> This is a follow-up for [SPARK-17025], which allowed users to implement 
> Python PipelineStages in 3rd-party libraries, include them in Pipelines, and 
> use Pipeline persistence.  This task is to make it easier for 3rd-party 
> libraries to have PipelineStages written in Java and then to use pyspark.ml 
> abstractions to create wrappers around those Java classes.  This is currently 
> possible, except that users hit bugs around persistence.
> I spent a bit of time thinking about this and wrote up thoughts and a proposal in the 
> doc linked below.  Summary of the proposal:
> Require that 3rd-party libraries whose Java classes have Python wrappers 
> implement a trait which provides the corresponding Python classpath in some 
> field:
> {code}
> trait PythonWrappable {
>   def pythonClassPath: String = …
> }
> class MyJavaType extends PythonWrappable
> {code}
> This will not be required for MLlib wrappers, which we can handle specially.
> One issue for this task will be that we may have trouble writing unit tests.  
> They would ideally test a Java class + Python wrapper class pair sitting 
> outside of pyspark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33029) Standalone mode blacklist executors page UI marks driver as blacklisted

2020-12-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33029:


Assignee: Apache Spark

> Standalone mode blacklist executors page UI marks driver as blacklisted
> ---
>
> Key: SPARK-33029
> URL: https://issues.apache.org/jira/browse/SPARK-33029
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Thomas Graves
>Assignee: Apache Spark
>Priority: Major
> Attachments: Screen Shot 2020-09-29 at 1.52.09 PM.png, Screen Shot 
> 2020-09-29 at 1.53.37 PM.png
>
>
> I am running a spark shell on a 1 node standalone cluster.  I noticed that 
> the executors page ui was marking the driver as blacklisted for the stage 
> that is running.  Attached a screen shot.
> Also, in my case one of the executors died and it doesn't seem like the 
> scheduler picked up the new one.  It doesn't show up on the stages page and 
> just shows it as active but none of the tasks ran there.
>  
> You can reproduce this by starting a master and slave on a single node, then 
> launch a shell where you will get multiple executors (in this case I got 
> 3)
> $SPARK_HOME/bin/spark-shell --master spark://yourhost:7077 --executor-cores 4 
> --conf spark.blacklist.enabled=true
>  
> From shell run:
> {code:java}
> import org.apache.spark.TaskContext
> val rdd = sc.makeRDD(1 to 1000, 5).mapPartitions { it =>
>  val context = TaskContext.get()
>  if (context.attemptNumber() < 2) {
>  throw new Exception("test attempt num")
>  }
>  it
> }{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33029) Standalone mode blacklist executors page UI marks driver as blacklisted

2020-12-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255703#comment-17255703
 ] 

Apache Spark commented on SPARK-33029:
--

User 'baohe-zhang' has created a pull request for this issue:
https://github.com/apache/spark/pull/30954

> Standalone mode blacklist executors page UI marks driver as blacklisted
> ---
>
> Key: SPARK-33029
> URL: https://issues.apache.org/jira/browse/SPARK-33029
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Thomas Graves
>Priority: Major
> Attachments: Screen Shot 2020-09-29 at 1.52.09 PM.png, Screen Shot 
> 2020-09-29 at 1.53.37 PM.png
>
>
> I am running a spark shell on a 1 node standalone cluster.  I noticed that 
> the executors page ui was marking the driver as blacklisted for the stage 
> that is running.  Attached a screen shot.
> Also, in my case one of the executors died and it doesn't seem like the 
> scheduler picked up the new one.  It doesn't show up on the stages page and 
> just shows it as active but none of the tasks ran there.
>  
> You can reproduce this by starting a master and slave on a single node, then 
> launch a shell where you will get multiple executors (in this case I got 
> 3)
> $SPARK_HOME/bin/spark-shell --master spark://yourhost:7077 --executor-cores 4 
> --conf spark.blacklist.enabled=true
>  
> From shell run:
> {code:java}
> import org.apache.spark.TaskContext
> val rdd = sc.makeRDD(1 to 1000, 5).mapPartitions { it =>
>  val context = TaskContext.get()
>  if (context.attemptNumber() < 2) {
>  throw new Exception("test attempt num")
>  }
>  it
> }{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33029) Standalone mode blacklist executors page UI marks driver as blacklisted

2020-12-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255702#comment-17255702
 ] 

Apache Spark commented on SPARK-33029:
--

User 'baohe-zhang' has created a pull request for this issue:
https://github.com/apache/spark/pull/30954

> Standalone mode blacklist executors page UI marks driver as blacklisted
> ---
>
> Key: SPARK-33029
> URL: https://issues.apache.org/jira/browse/SPARK-33029
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Thomas Graves
>Priority: Major
> Attachments: Screen Shot 2020-09-29 at 1.52.09 PM.png, Screen Shot 
> 2020-09-29 at 1.53.37 PM.png
>
>
> I am running a spark shell on a 1 node standalone cluster.  I noticed that 
> the executors page ui was marking the driver as blacklisted for the stage 
> that is running.  Attached a screen shot.
> Also, in my case one of the executors died and it doesn't seem like the 
> scheduler picked up the new one.  It doesn't show up on the stages page and 
> just shows it as active but none of the tasks ran there.
>  
> You can reproduce this by starting a master and slave on a single node, then 
> launch a shell where you will get multiple executors (in this case I got 
> 3)
> $SPARK_HOME/bin/spark-shell --master spark://yourhost:7077 --executor-cores 4 
> --conf spark.blacklist.enabled=true
>  
> From shell run:
> {code:java}
> import org.apache.spark.TaskContext
> val rdd = sc.makeRDD(1 to 1000, 5).mapPartitions { it =>
>  val context = TaskContext.get()
>  if (context.attemptNumber() < 2) {
>  throw new Exception("test attempt num")
>  }
>  it
> }{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33029) Standalone mode blacklist executors page UI marks driver as blacklisted

2020-12-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33029:


Assignee: (was: Apache Spark)

> Standalone mode blacklist executors page UI marks driver as blacklisted
> ---
>
> Key: SPARK-33029
> URL: https://issues.apache.org/jira/browse/SPARK-33029
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Thomas Graves
>Priority: Major
> Attachments: Screen Shot 2020-09-29 at 1.52.09 PM.png, Screen Shot 
> 2020-09-29 at 1.53.37 PM.png
>
>
> I am running a spark shell on a 1 node standalone cluster.  I noticed that 
> the executors page ui was marking the driver as blacklisted for the stage 
> that is running.  Attached a screen shot.
> Also, in my case one of the executors died and it doesn't seem like the 
> scheduler picked up the new one.  It doesn't show up on the stages page and 
> just shows it as active but none of the tasks ran there.
>  
> You can reproduce this by starting a master and slave on a single node, then 
> launch a shell where you will get multiple executors (in this case I got 
> 3)
> $SPARK_HOME/bin/spark-shell --master spark://yourhost:7077 --executor-cores 4 
> --conf spark.blacklist.enabled=true
>  
> From shell run:
> {code:java}
> import org.apache.spark.TaskContext
> val rdd = sc.makeRDD(1 to 1000, 5).mapPartitions { it =>
>  val context = TaskContext.get()
>  if (context.attemptNumber() < 2) {
>  throw new Exception("test attempt num")
>  }
>  it
> }{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33845) Improve SimplifyConditionals

2020-12-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255688#comment-17255688
 ] 

Apache Spark commented on SPARK-33845:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/30953

>  Improve SimplifyConditionals
> -
>
> Key: SPARK-33845
> URL: https://issues.apache.org/jira/browse/SPARK-33845
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> Simplify If(cond, TrueLiteral, FalseLiteral) to cond.
> Simplify If(cond, FalseLiteral, TrueLiteral) to Not(cond).
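> A minimal query-level illustration (indicative only; the exact plan text depends on 
> the Spark version):
> {code:scala}
> // The projection contains IF(id > 5, true, false); with the simplification above it
> // can collapse to the bare predicate `id > 5` in the optimized plan.
> spark.range(10).selectExpr("IF(id > 5, true, false) AS flag").explain(true)
> {code}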



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33899) v1 SHOW TABLES fails with assert on spark_catalog

2020-12-28 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-33899:

Fix Version/s: (was: 3.2.0)
   3.1.0

> v1 SHOW TABLES fails with assert on spark_catalog
> -
>
> Key: SPARK-33899
> URL: https://issues.apache.org/jira/browse/SPARK-33899
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.1.0
>
>
> The v1 SHOW TABLES, SHOW TABLE EXTENDED and SHOW VIEWS fail with internal 
> assert when a database is not specified:
> {code:sql}
> spark-sql> show tables in spark_catalog;
> 20/12/24 11:19:46 ERROR SparkSQLDriver: Failed in [show tables in 
> spark_catalog]
> java.lang.AssertionError: assertion failed
>   at scala.Predef$.assert(Predef.scala:208)
>   at 
> org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog$$anonfun$apply$1.applyOrElse(ResolveSessionCatalog.scala:366)
>   at 
> org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog$$anonfun$apply$1.applyOrElse(ResolveSessionCatalog.scala:49)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$3(AnalysisHelper.scala:90)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33031) scheduler with blacklisting doesn't appear to pick up new executor added

2020-12-28 Thread Baohe Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255685#comment-17255685
 ] 

Baohe Zhang commented on SPARK-33031:
-

New tasks won't be scheduled because the node is marked as blacklisted after 2 
executors on that node are blacklisted. The behavior seems correct if the 
experiment is done on a single node.
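
For reference (config names and defaults as of Spark 3.0): a node is blacklisted for a 
stage or for the application once spark.blacklist.stage.maxFailedExecutorsPerNode 
(resp. spark.blacklist.application.maxFailedExecutorsPerNode) executors on it have been 
blacklisted, and both default to 2. Raising those thresholds in the single-node repro 
should keep the node schedulable, e.g.:

$SPARK_HOME/bin/spark-shell --master spark://yourhost:7077 --executor-cores 4 \
  --conf spark.blacklist.enabled=true \
  --conf spark.blacklist.stage.maxFailedExecutorsPerNode=3 \
  --conf spark.blacklist.application.maxFailedExecutorsPerNode=3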

> scheduler with blacklisting doesn't appear to pick up new executor added
> 
>
> Key: SPARK-33031
> URL: https://issues.apache.org/jira/browse/SPARK-33031
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Thomas Graves
>Priority: Critical
>
> I was running a test with blacklisting  standalone mode and all the executors 
> were initially blacklisted.  Then one of the executors died and we got 
> allocated another one. The scheduler did not appear to pick up the new one 
> and try to schedule on it though.
> You can reproduce this by starting a master and slave on a single node, then 
> launch a shell where you will get multiple executors (in this case I got 
> 3)
> $SPARK_HOME/bin/spark-shell --master spark://yourhost:7077 --executor-cores 4 
> --conf spark.blacklist.enabled=true
> From shell run:
> {code:java}
> import org.apache.spark.TaskContext
> val rdd = sc.makeRDD(1 to 1000, 5).mapPartitions { it =>
>  val context = TaskContext.get()
>  if (context.attemptNumber() < 2) {
>  throw new Exception("test attempt num")
>  }
>  it
> }
> rdd.collect(){code}
>  
> Note that I tried both with and without dynamic allocation enabled.
>  
> You can see screen shot related on 
> https://issues.apache.org/jira/browse/SPARK-33029



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33029) Standalone mode blacklist executors page UI marks driver as blacklisted

2020-12-28 Thread Baohe Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255682#comment-17255682
 ] 

Baohe Zhang commented on SPARK-33029:
-

With the blacklist feature enabled, a node is excluded by default once 2 executors on 
that node have been excluded. In this case the node is excluded, so we mark all 
executors on that node as excluded. Since we are running standalone mode on a single 
node, the driver and all executors share the same hostname, and the driver gets marked 
as excluded in AppStatusListener when it handles the "SparkListenerNodeExcludedForStage" 
event. We can fix it by filtering out the driver entity when handling this event, so 
the UI won't show the driver as excluded.
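
A minimal sketch of that idea (simplified stand-ins for the AppStatusListener 
internals, not the actual patch):

{code:scala}
// When handling the node-excluded-for-stage event, skip the driver entry so the
// per-stage executor summaries (and thus the UI) never flag the driver.
// "driver" is the well-known driver executor id (SparkContext.DRIVER_IDENTIFIER).
def markExecutorsExcludedForStage(
    executorIdsOnNode: Seq[String],        // executor ids located on the excluded node
    markExcluded: String => Unit): Unit =  // stand-in for updating a per-stage summary
  executorIdsOnNode
    .filterNot(_ == "driver")
    .foreach(markExcluded)
{code}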

> Standalone mode blacklist executors page UI marks driver as blacklisted
> ---
>
> Key: SPARK-33029
> URL: https://issues.apache.org/jira/browse/SPARK-33029
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Thomas Graves
>Priority: Major
> Attachments: Screen Shot 2020-09-29 at 1.52.09 PM.png, Screen Shot 
> 2020-09-29 at 1.53.37 PM.png
>
>
> I am running a spark shell on a 1 node standalone cluster.  I noticed that 
> the executors page ui was marking the driver as blacklisted for the stage 
> that is running.  Attached a screen shot.
> Also, in my case one of the executors died and it doesn't seem like the 
> scheduler picked up the new one.  It doesn't show up on the stages page and 
> just shows it as active but none of the tasks ran there.
>  
> You can reproduce this by starting a master and slave on a single node, then 
> launch a shell where you will get multiple executors (in this case I got 
> 3)
> $SPARK_HOME/bin/spark-shell --master spark://yourhost:7077 --executor-cores 4 
> --conf spark.blacklist.enabled=true
>  
> From shell run:
> {code:java}
> import org.apache.spark.TaskContext
> val rdd = sc.makeRDD(1 to 1000, 5).mapPartitions { it =>
>  val context = TaskContext.get()
>  if (context.attemptNumber() < 2) {
>  throw new Exception("test attempt num")
>  }
>  it
> }{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33920) We cannot pass schema to a createDataFrame function in scala, however we can do this in python.

2020-12-28 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255681#comment-17255681
 ] 

L. C. Hsieh commented on SPARK-33920:
-

Could you elaborate more on the data and the datatype you want to assign explicitly? 
Unlike Python, the Scala API might not allow you to arbitrarily assign a different 
datatype to the input data. For example, for DecimalType the input data must be a 
Decimal-related JVM type such as BigDecimal, Decimal, Java's BigDecimal, or Java's 
BigInteger. If you assign the Decimal datatype to a float input, the converter cannot 
convert it.
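
For reference, a minimal sketch of passing an explicit schema in Scala (the column 
names and types here are only an example):

{code:scala}
import java.math.BigDecimal

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("amount", DecimalType(10, 2), nullable = true)))

// The JVM types must line up with the declared DataTypes, e.g. java.math.BigDecimal
// (or Scala BigDecimal) for DecimalType.
val rows = Seq(Row(1, new BigDecimal("12.34")), Row(2, null))
val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
df.printSchema()
{code}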

> We cannot pass schema to a createDataFrame function in scala, however we can 
> do this in python.
> ---
>
> Key: SPARK-33920
> URL: https://issues.apache.org/jira/browse/SPARK-33920
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Abdul Rafay Abdul Rafay
>Priority: Major
> Attachments: Screenshot 2020-12-28 at 2.23.13 PM.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> {{spark.createDataFrame(data, schema)}}
> I am able to pass a schema as a parameter to the createDataFrame function in 
> Python but cannot do this in Scala for static data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33915) Allow json expression to be pushable column

2020-12-28 Thread Ted Yu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255631#comment-17255631
 ] 

Ted Yu commented on SPARK-33915:


[~codingcat] [~XuanYuan][~viirya][~Alex Herman]
Can you provide your comment ?

Thanks

> Allow json expression to be pushable column
> ---
>
> Key: SPARK-33915
> URL: https://issues.apache.org/jira/browse/SPARK-33915
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Priority: Major
>
> Currently PushableColumnBase provides no support for json / jsonb expressions.
> Example of a json expression:
> {code}
> get_json_object(phone, '$.code') = '1200'
> {code}
> If a non-string literal is part of the expression, the presence of cast() further 
> complicates the situation.
> The implication is that an implementation of SupportsPushDownFilters doesn't get a 
> chance to perform pushdown even if the third-party DB engine supports json 
> expression pushdown.
> This issue is for discussion and implementation of Spark core changes which would 
> allow a json expression to be recognized as a pushable column.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33924) v2 INSERT INTO .. PARTITION drops the partition location

2020-12-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255617#comment-17255617
 ] 

Apache Spark commented on SPARK-33924:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30952

> v2 INSERT INTO .. PARTITION drops the partition location
> 
>
> Key: SPARK-33924
> URL: https://issues.apache.org/jira/browse/SPARK-33924
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> See the test "with location" in v2.AlterTableRenamePartitionSuite:
> {code:scala}
> val loc = "location1"
> sql(s"ALTER TABLE $t ADD PARTITION (id = 2) LOCATION '$loc'")
> checkLocation(t, Map("id" -> "2"), loc)
> sql(s"INSERT INTO $t PARTITION (id = 2) SELECT 'def'")
> checkLocation(t, Map("id" -> "2"), loc)
> {code}
> The second check must not fail.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33924) v2 INSERT INTO .. PARTITION drops the partition location

2020-12-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33924:


Assignee: Apache Spark

> v2 INSERT INTO .. PARTITION drops the partition location
> 
>
> Key: SPARK-33924
> URL: https://issues.apache.org/jira/browse/SPARK-33924
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
>
> See the test "with location" in v2.AlterTableRenamePartitionSuite:
> {code:scala}
> val loc = "location1"
> sql(s"ALTER TABLE $t ADD PARTITION (id = 2) LOCATION '$loc'")
> checkLocation(t, Map("id" -> "2"), loc)
> sql(s"INSERT INTO $t PARTITION (id = 2) SELECT 'def'")
> checkLocation(t, Map("id" -> "2"), loc)
> {code}
> The second check must not fail.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33924) v2 INSERT INTO .. PARTITION drops the partition location

2020-12-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33924:


Assignee: (was: Apache Spark)

> v2 INSERT INTO .. PARTITION drops the partition location
> 
>
> Key: SPARK-33924
> URL: https://issues.apache.org/jira/browse/SPARK-33924
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> See the test "with location" in v2.AlterTableRenamePartitionSuite:
> {code:scala}
> val loc = "location1"
> sql(s"ALTER TABLE $t ADD PARTITION (id = 2) LOCATION '$loc'")
> checkLocation(t, Map("id" -> "2"), loc)
> sql(s"INSERT INTO $t PARTITION (id = 2) SELECT 'def'")
> checkLocation(t, Map("id" -> "2"), loc)
> {code}
> The second check must not fail.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33924) v2 INSERT INTO .. PARTITION drops the partition location

2020-12-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33924:


Assignee: Apache Spark

> v2 INSERT INTO .. PARTITION drops the partition location
> 
>
> Key: SPARK-33924
> URL: https://issues.apache.org/jira/browse/SPARK-33924
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
>
> See the test "with location" in v2.AlterTableRenamePartitionSuite:
> {code:scala}
> val loc = "location1"
> sql(s"ALTER TABLE $t ADD PARTITION (id = 2) LOCATION '$loc'")
> checkLocation(t, Map("id" -> "2"), loc)
> sql(s"INSERT INTO $t PARTITION (id = 2) SELECT 'def'")
> checkLocation(t, Map("id" -> "2"), loc)
> {code}
> The second check must not fail.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33883) Can repeat "where" twice without error in spark sql

2020-12-28 Thread Stu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255603#comment-17255603
 ] 

Stu commented on SPARK-33883:
-

that makes so much sense, thanks!

> Can repeat "where" twice without error in spark sql
> ---
>
> Key: SPARK-33883
> URL: https://issues.apache.org/jira/browse/SPARK-33883
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: Stu
>Priority: Minor
> Attachments: image-2020-12-28-18-24-18-395.png, 
> image-2020-12-28-18-32-25-960.png
>
>
> the following sql code works, despite having bad syntax ("where" is mentioned 
> twice):
> {code:java}
> select * from table
> where where field is not null{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33775) Suppress unimportant compilation warnings in Scala 2.13

2020-12-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255581#comment-17255581
 ] 

Apache Spark commented on SPARK-33775:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/30951

> Suppress unimportant compilation warnings in Scala 2.13 
> 
>
> Key: SPARK-33775
> URL: https://issues.apache.org/jira/browse/SPARK-33775
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.1.0
>
>
> There are too many compilation warnings in Scala 2.13; add some `-Wconf:msg=regex` 
> rules to suppress the unimportant ones.
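> An illustrative sketch of the mechanism (the filter patterns below are placeholders, 
> not the exact rules added to the Spark build):
> {code:scala}
> // e.g. appended to scalacOptions under the Scala 2.13 profile
> scalacOptions ++= Seq(
>   "-Wconf:cat=deprecation&site=org\\.apache\\.spark\\..*:s",  // silence by category and site
>   "-Wconf:msg=deprecated:s"                                   // silence by message regex (placeholder)
> )
> {code}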



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33775) Suppress unimportant compilation warnings in Scala 2.13

2020-12-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255580#comment-17255580
 ] 

Apache Spark commented on SPARK-33775:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/30951

> Suppress unimportant compilation warnings in Scala 2.13 
> 
>
> Key: SPARK-33775
> URL: https://issues.apache.org/jira/browse/SPARK-33775
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.1.0
>
>
> There are too many compilation warnings in Scala 2.13; add some `-Wconf:msg=regex` 
> rules to suppress the unimportant ones.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33899) v1 SHOW TABLES fails with assert on spark_catalog

2020-12-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255567#comment-17255567
 ] 

Apache Spark commented on SPARK-33899:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30950

> v1 SHOW TABLES fails with assert on spark_catalog
> -
>
> Key: SPARK-33899
> URL: https://issues.apache.org/jira/browse/SPARK-33899
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> The v1 SHOW TABLES, SHOW TABLE EXTENDED and SHOW VIEWS fail with internal 
> assert when a database is not specified:
> {code:sql}
> spark-sql> show tables in spark_catalog;
> 20/12/24 11:19:46 ERROR SparkSQLDriver: Failed in [show tables in 
> spark_catalog]
> java.lang.AssertionError: assertion failed
>   at scala.Predef$.assert(Predef.scala:208)
>   at 
> org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog$$anonfun$apply$1.applyOrElse(ResolveSessionCatalog.scala:366)
>   at 
> org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog$$anonfun$apply$1.applyOrElse(ResolveSessionCatalog.scala:49)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$3(AnalysisHelper.scala:90)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33899) v1 SHOW TABLES fails with assert on spark_catalog

2020-12-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255566#comment-17255566
 ] 

Apache Spark commented on SPARK-33899:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30950

> v1 SHOW TABLES fails with assert on spark_catalog
> -
>
> Key: SPARK-33899
> URL: https://issues.apache.org/jira/browse/SPARK-33899
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> The v1 SHOW TABLES, SHOW TABLE EXTENDED and SHOW VIEWS fail with internal 
> assert when a database is not specified:
> {code:sql}
> spark-sql> show tables in spark_catalog;
> 20/12/24 11:19:46 ERROR SparkSQLDriver: Failed in [show tables in 
> spark_catalog]
> java.lang.AssertionError: assertion failed
>   at scala.Predef$.assert(Predef.scala:208)
>   at 
> org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog$$anonfun$apply$1.applyOrElse(ResolveSessionCatalog.scala:366)
>   at 
> org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog$$anonfun$apply$1.applyOrElse(ResolveSessionCatalog.scala:49)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$3(AnalysisHelper.scala:90)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33639) The external table does not specify a location

2020-12-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1725#comment-1725
 ] 

Apache Spark commented on SPARK-33639:
--

User 'guixiaowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/30949

>  The external table does not specify a location
> ---
>
> Key: SPARK-33639
> URL: https://issues.apache.org/jira/browse/SPARK-33639
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: guihuawen
>Priority: Major
>
> If the location of the external table is not specified, an error will be 
> reported. If the partition is not specified, the default is the same as the 
> specified partition of the internal table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33639) The external table does not specify a location

2020-12-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255549#comment-17255549
 ] 

Apache Spark commented on SPARK-33639:
--

User 'guixiaowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/30949

>  The external table does not specify a location
> ---
>
> Key: SPARK-33639
> URL: https://issues.apache.org/jira/browse/SPARK-33639
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: guihuawen
>Priority: Major
>
> If the location of the external table is not specified, an error will be 
> reported. If the partition is not specified, the default is the same as the 
> specified partition of the internal table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33637) Spark sql drop non-existent table will not report an error after failure

2020-12-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255541#comment-17255541
 ] 

Apache Spark commented on SPARK-33637:
--

User 'guixiaowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/30948

> Spark sql drop non-existent table will not report an error after failure
> 
>
> Key: SPARK-33637
> URL: https://issues.apache.org/jira/browse/SPARK-33637
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: guihuawen
>Priority: Major
>
> In the migration between Spark SQL and Hive SQL, in order to reduce changes for 
> users: Spark SQL reports an error when dropping a table that does not exist, but 
> Hive SQL does not. To ensure that such statements can still execute normally, no 
> error should be reported when dropping a non-existent table.
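> (For reference, independent of this proposal, the error can already be avoided 
> explicitly with the IF EXISTS clause:)
> {code:sql}
> DROP TABLE IF EXISTS non_existent_table;
> {code}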



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33926) Improve the error message in resolving of DSv1 multi-part identifiers

2020-12-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33926:


Assignee: Apache Spark

> Improve the error message in resolving of DSv1 multi-part identifiers
> -
>
> Key: SPARK-33926
> URL: https://issues.apache.org/jira/browse/SPARK-33926
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
>
> This is a follow up of 
> https://github.com/apache/spark/pull/30915#discussion_r549240857



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33926) Improve the error message in resolving of DSv1 multi-part identifiers

2020-12-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33926:


Assignee: (was: Apache Spark)

> Improve the error message in resolving of DSv1 multi-part identifiers
> -
>
> Key: SPARK-33926
> URL: https://issues.apache.org/jira/browse/SPARK-33926
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> This is a follow up of 
> https://github.com/apache/spark/pull/30915#discussion_r549240857



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33926) Improve the error message in resolving of DSv1 multi-part identifiers

2020-12-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255524#comment-17255524
 ] 

Apache Spark commented on SPARK-33926:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30947

> Improve the error message in resolving of DSv1 multi-part identifiers
> -
>
> Key: SPARK-33926
> URL: https://issues.apache.org/jira/browse/SPARK-33926
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> This is a follow up of 
> https://github.com/apache/spark/pull/30915#discussion_r549240857



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33926) Improve the error message in resolving of DSv1 multi-part identifiers

2020-12-28 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33926:
--

 Summary: Improve the error message in resolving of DSv1 multi-part 
identifiers
 Key: SPARK-33926
 URL: https://issues.apache.org/jira/browse/SPARK-33926
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


This is a follow up of 
https://github.com/apache/spark/pull/30915#discussion_r549240857



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-32679) update "no-serde" in the codebase in other TRANSFORM PRs.

2020-12-28 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu closed SPARK-32679.
-

>  update "no-serde" in the codebase in other TRANSFORM PRs.
> --
>
> Key: SPARK-32679
> URL: https://issues.apache.org/jira/browse/SPARK-32679
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: angerszhu
>Priority: Major
>
> https://github.com/apache/spark/pull/29500#discussion_r474476579



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32679) update "no-serde" in the codebase in other TRANSFORM PRs.

2020-12-28 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu resolved SPARK-32679.
---
Resolution: Not A Problem

>  update "no-serde" in the codebase in other TRANSFORM PRs.
> --
>
> Key: SPARK-32679
> URL: https://issues.apache.org/jira/browse/SPARK-32679
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: angerszhu
>Priority: Major
>
> https://github.com/apache/spark/pull/29500#discussion_r474476579



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32679) update "no-serde" in the codebase in other TRANSFORM PRs.

2020-12-28 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255516#comment-17255516
 ] 

angerszhu commented on SPARK-32679:
---

No PR uses the default serde now, so closing this issue.

>  update "no-serde" in the codebase in other TRANSFORM PRs.
> --
>
> Key: SPARK-32679
> URL: https://issues.apache.org/jira/browse/SPARK-32679
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: angerszhu
>Priority: Major
>
> https://github.com/apache/spark/pull/29500#discussion_r474476579



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33883) Can repeat "where" twice without error in spark sql

2020-12-28 Thread Liu Neng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255515#comment-17255515
 ] 

Liu Neng commented on SPARK-33883:
--

!image-2020-12-28-18-24-18-395.png!

The first {{where}} is parsed as a table alias.

You can verify this with: select where.* from person where where name is not null.

You can set spark.sql.ansi.enabled=true to raise an exception in this case.

!image-2020-12-28-18-32-25-960.png!

So I think this is not an issue.
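
As a minimal sketch, assuming a toy table named person (the names below are only illustrative):

{code:scala}
// The first `where` is parsed as a table alias, so the statement is accepted.
spark.sql("CREATE TABLE person(name STRING) USING parquet")
spark.sql("SELECT where.* FROM person where WHERE name IS NOT NULL").show()

// With ANSI mode enabled, `where` is treated as a reserved keyword and the
// same statement fails with a parse error instead of being silently accepted.
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("SELECT where.* FROM person where WHERE name IS NOT NULL").show()
{code}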

> Can repeat "where" twice without error in spark sql
> ---
>
> Key: SPARK-33883
> URL: https://issues.apache.org/jira/browse/SPARK-33883
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: Stu
>Priority: Minor
> Attachments: image-2020-12-28-18-24-18-395.png, 
> image-2020-12-28-18-32-25-960.png
>
>
> the following sql code works, despite having bad syntax ("where" is mentioned 
> twice):
> {code:java}
> select * from table
> where where field is not null{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33883) Can repeat "where" twice without error in spark sql

2020-12-28 Thread Liu Neng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Neng updated SPARK-33883:
-
Attachment: image-2020-12-28-18-32-25-960.png

> Can repeat "where" twice without error in spark sql
> ---
>
> Key: SPARK-33883
> URL: https://issues.apache.org/jira/browse/SPARK-33883
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: Stu
>Priority: Minor
> Attachments: image-2020-12-28-18-24-18-395.png, 
> image-2020-12-28-18-32-25-960.png
>
>
> the following sql code works, despite having bad syntax ("where" is mentioned 
> twice):
> {code:java}
> select * from table
> where where field is not null{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33883) Can repeat "where" twice without error in spark sql

2020-12-28 Thread Liu Neng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Neng updated SPARK-33883:
-
Attachment: image-2020-12-28-18-24-18-395.png

> Can repeat "where" twice without error in spark sql
> ---
>
> Key: SPARK-33883
> URL: https://issues.apache.org/jira/browse/SPARK-33883
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: Stu
>Priority: Minor
> Attachments: image-2020-12-28-18-24-18-395.png
>
>
> the following sql code works, despite having bad syntax ("where" is mentioned 
> twice):
> {code:java}
> select * from table
> where where field is not null{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31685) Spark structured streaming with Kafka fails with HDFS_DELEGATION_TOKEN expiration issue

2020-12-28 Thread Kent Yao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255502#comment-17255502
 ] 

Kent Yao commented on SPARK-31685:
--

Thanks [~dongjoon] for the quick response.

This issue is not SS-specific; any long-running application may run into it. 
For example, in the project [Kyuubi|https://github.com/yaooqinn/kyuubi], we 
always try to keep the Spark application alive as long as possible while users 
keep pushing SQL statements to the server. The HadoopFSDelegationTokenProvider 
was YARN-specific in old Spark versions and has now moved to core.

I took a quick glance at this part of the master branch, and based on my 
understanding of HDFS tokens, it seems to me that the problem still exists.

> Spark structured streaming with Kafka fails with HDFS_DELEGATION_TOKEN 
> expiration issue
> ---
>
> Key: SPARK-31685
> URL: https://issues.apache.org/jira/browse/SPARK-31685
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.4.4
> Environment: spark-2.4.4-bin-hadoop2.7
>Reporter: Rajeev Kumar
>Priority: Major
>
> I am facing issue for spark-2.4.4-bin-hadoop2.7. I am using spark structured 
> streaming with Kafka. Reading the stream from Kafka and saving it to HBase.
> I get this error on the driver after 24 hours.
>  
> {code:java}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 6972072 for ) is expired
> at org.apache.hadoop.ipc.Client.call(Client.java:1475)
> at org.apache.hadoop.ipc.Client.call(Client.java:1412)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
> at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108)
> at org.apache.hadoop.fs.Hdfs.getFileStatus(Hdfs.java:130)
> at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1169)
> at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1165)
> at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
> at 
> org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1171)
> at org.apache.hadoop.fs.FileContext$Util.exists(FileContext.java:1630)
> at 
> org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.exists(CheckpointFileManager.scala:326)
> at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.get(HDFSMetadataLog.scala:142)
> at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.add(HDFSMetadataLog.scala:110)
> at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$apply$mcZ$sp$3.apply$mcV$sp(MicroBatchExecution.scala:382)
> at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$apply$mcZ$sp$3.apply(MicroBatchExecution.scala:381)
> at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$apply$mcZ$sp$3.apply(MicroBatchExecution.scala:381)
> at 
> org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
> at 
> org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
> at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply$mcZ$sp(MicroBatchExecution.scala:381)
> at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply(MicroBatchExecution.scala:337)
> at 
> 

[jira] [Updated] (SPARK-33920) We cannot pass schema to a createDataFrame function in scala, however we can do this in python.

2020-12-28 Thread Abdul Rafay Abdul Rafay (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abdul Rafay Abdul Rafay updated SPARK-33920:

Attachment: (was: Screenshot 2020-12-28 at 2.23.13 PM.png)

> We cannot pass schema to a createDataFrame function in scala, however we can 
> do this in python.
> ---
>
> Key: SPARK-33920
> URL: https://issues.apache.org/jira/browse/SPARK-33920
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Abdul Rafay Abdul Rafay
>Priority: Major
> Attachments: Screenshot 2020-12-28 at 2.23.13 PM.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> ~spark.createDataFrame(data, schema)~
> ~I am able to pass schema as a parameter to a function createDataFrame in 
> python but cannot pass this in scala for static data.~



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33920) We cannot pass schema to a createDataFrame function in scala, however we can do this in python.

2020-12-28 Thread Abdul Rafay Abdul Rafay (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abdul Rafay Abdul Rafay updated SPARK-33920:

Attachment: Screenshot 2020-12-28 at 2.23.13 PM.png

> We cannot pass schema to a createDataFrame function in scala, however we can 
> do this in python.
> ---
>
> Key: SPARK-33920
> URL: https://issues.apache.org/jira/browse/SPARK-33920
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Abdul Rafay Abdul Rafay
>Priority: Major
> Attachments: Screenshot 2020-12-28 at 2.23.13 PM.png, Screenshot 
> 2020-12-28 at 2.23.13 PM.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> ~spark.createDataFrame(data, schema)~
> ~I am able to pass schema as a parameter to a function createDataFrame in 
> python but cannot pass this in scala for static data.~



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33920) We cannot pass schema to a createDataFrame function in scala, however we can do this in python.

2020-12-28 Thread Abdul Rafay Abdul Rafay (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abdul Rafay Abdul Rafay updated SPARK-33920:

Attachment: Screenshot 2020-12-28 at 2.23.13 PM.png

> We cannot pass schema to a createDataFrame function in scala, however we can 
> do this in python.
> ---
>
> Key: SPARK-33920
> URL: https://issues.apache.org/jira/browse/SPARK-33920
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Abdul Rafay Abdul Rafay
>Priority: Major
> Attachments: Screenshot 2020-12-28 at 2.23.13 PM.png, Screenshot 
> 2020-12-28 at 2.23.13 PM.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> ~spark.createDataFrame(data, schema)~
> ~I am able to pass schema as a parameter to a function createDataFrame in 
> python but cannot pass this in scala for static data.~



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33920) We cannot pass schema to a createDataFrame function in scala, however we can do this in python.

2020-12-28 Thread Abdul Rafay Abdul Rafay (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255481#comment-17255481
 ] 

Abdul Rafay Abdul Rafay commented on SPARK-33920:
-

[~viirya] you're right, and I know the Scala API uses Scala reflection to infer 
the schema of the given Product. But when I want to assign a data type explicitly, 
such as DecimalType or FloatType, to a DataFrame created from a static sequence of 
rows, it becomes a problem; in PySpark it does not.
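
For reference, a minimal sketch of one way to pass an explicit schema from Scala, using the Row-list overload of createDataFrame (the column names and types below are only an example):

{code:scala}
import java.util.Arrays

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Hypothetical example: an explicit schema with a DecimalType column applied
// to a static sequence of rows.
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("amount", DecimalType(10, 2), nullable = true)))

val rows = Arrays.asList(
  Row(1, new java.math.BigDecimal("10.50")),
  Row(2, new java.math.BigDecimal("3.25")))

// createDataFrame(java.util.List[Row], StructType) takes the schema explicitly.
val df = spark.createDataFrame(rows, schema)
df.printSchema()
{code}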

> We cannot pass schema to a createDataFrame function in scala, however we can 
> do this in python.
> ---
>
> Key: SPARK-33920
> URL: https://issues.apache.org/jira/browse/SPARK-33920
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Abdul Rafay Abdul Rafay
>Priority: Major
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> ~spark.createDataFrame(data, schema)~
> ~I am able to pass schema as a parameter to a function createDataFrame in 
> python but cannot pass this in scala for static data.~



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32684) Scrip transform hive serde/default-serde mode null value keep same with hive as '\\N'

2020-12-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255482#comment-17255482
 ] 

Apache Spark commented on SPARK-32684:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/30946

> Scrip transform hive serde/default-serde mode null value keep same with hive 
> as '\\N'
> -
>
> Key: SPARK-32684
> URL: https://issues.apache.org/jira/browse/SPARK-32684
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: angerszhu
>Priority: Major
>
> Hive serde default NULL value is '\N'
> {code:java}
> String nullString = tbl.getProperty(
> serdeConstants.SERIALIZATION_NULL_FORMAT, "\\N");
> nullSequence = new Text(nullString);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32684) Scrip transform hive serde/default-serde mode null value keep same with hive as '\\N'

2020-12-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255485#comment-17255485
 ] 

Apache Spark commented on SPARK-32684:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/30946

> Scrip transform hive serde/default-serde mode null value keep same with hive 
> as '\\N'
> -
>
> Key: SPARK-32684
> URL: https://issues.apache.org/jira/browse/SPARK-32684
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: angerszhu
>Priority: Major
>
> Hive serde default NULL value is '\N'
> {code:java}
> String nullString = tbl.getProperty(
> serdeConstants.SERIALIZATION_NULL_FORMAT, "\\N");
> nullSequence = new Text(nullString);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33925) Remove unused SecurityManager in Utils.fetchFile

2020-12-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33925:


Assignee: (was: Apache Spark)

> Remove unused SecurityManager in Utils.fetchFile
> 
>
> Key: SPARK-33925
> URL: https://issues.apache.org/jira/browse/SPARK-33925
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> The last usage of {{SecurityManager}} in {{Utils.fetchFile}} was removed in 
> SPARK-27004. We don't need to pass it around anymore.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33925) Remove unused SecurityManager in Utils.fetchFile

2020-12-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255479#comment-17255479
 ] 

Apache Spark commented on SPARK-33925:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/30945

> Remove unused SecurityManager in Utils.fetchFile
> 
>
> Key: SPARK-33925
> URL: https://issues.apache.org/jira/browse/SPARK-33925
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> The last usage of {{SecurityManager}} in {{Utils.fetchFile}} was removed in 
> SPARK-27004. We don't need to pass it around anymore.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33925) Remove unused SecurityManager in Utils.fetchFile

2020-12-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33925:


Assignee: Apache Spark

> Remove unused SecurityManager in Utils.fetchFile
> 
>
> Key: SPARK-33925
> URL: https://issues.apache.org/jira/browse/SPARK-33925
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Minor
>
> The last usage of {{SecurityManager}} in {{Utils.fetchFile}} was removed in 
> SPARK-27004. We don't need to pass it around anymore.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33899) v1 SHOW TABLES fails with assert on spark_catalog

2020-12-28 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33899.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 30915
[https://github.com/apache/spark/pull/30915]

> v1 SHOW TABLES fails with assert on spark_catalog
> -
>
> Key: SPARK-33899
> URL: https://issues.apache.org/jira/browse/SPARK-33899
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> The v1 SHOW TABLES, SHOW TABLE EXTENDED and SHOW VIEWS fail with internal 
> assert when a database is not specified:
> {code:sql}
> spark-sql> show tables in spark_catalog;
> 20/12/24 11:19:46 ERROR SparkSQLDriver: Failed in [show tables in 
> spark_catalog]
> java.lang.AssertionError: assertion failed
>   at scala.Predef$.assert(Predef.scala:208)
>   at 
> org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog$$anonfun$apply$1.applyOrElse(ResolveSessionCatalog.scala:366)
>   at 
> org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog$$anonfun$apply$1.applyOrElse(ResolveSessionCatalog.scala:49)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$3(AnalysisHelper.scala:90)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33899) v1 SHOW TABLES fails with assert on spark_catalog

2020-12-28 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33899:
---

Assignee: Maxim Gekk

> v1 SHOW TABLES fails with assert on spark_catalog
> -
>
> Key: SPARK-33899
> URL: https://issues.apache.org/jira/browse/SPARK-33899
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>
> The v1 SHOW TABLES, SHOW TABLE EXTENDED and SHOW VIEWS fail with internal 
> assert when a database is not specified:
> {code:sql}
> spark-sql> show tables in spark_catalog;
> 20/12/24 11:19:46 ERROR SparkSQLDriver: Failed in [show tables in 
> spark_catalog]
> java.lang.AssertionError: assertion failed
>   at scala.Predef$.assert(Predef.scala:208)
>   at 
> org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog$$anonfun$apply$1.applyOrElse(ResolveSessionCatalog.scala:366)
>   at 
> org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog$$anonfun$apply$1.applyOrElse(ResolveSessionCatalog.scala:49)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$3(AnalysisHelper.scala:90)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32684) Scrip transform hive serde/default-serde mode null value keep same with hive as '\\N'

2020-12-28 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255475#comment-17255475
 ] 

angerszhu commented on SPARK-32684:
---

I made a mistake; this doesn't need to be fixed.

> Scrip transform hive serde/default-serde mode null value keep same with hive 
> as '\\N'
> -
>
> Key: SPARK-32684
> URL: https://issues.apache.org/jira/browse/SPARK-32684
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: angerszhu
>Priority: Major
>
> Hive serde default NULL value is '\N'
> {code:java}
> String nullString = tbl.getProperty(
> serdeConstants.SERIALIZATION_NULL_FORMAT, "\\N");
> nullSequence = new Text(nullString);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32684) Scrip transform hive serde/default-serde mode null value keep same with hive as '\\N'

2020-12-28 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu resolved SPARK-32684.
---
Resolution: Not A Bug

> Scrip transform hive serde/default-serde mode null value keep same with hive 
> as '\\N'
> -
>
> Key: SPARK-32684
> URL: https://issues.apache.org/jira/browse/SPARK-32684
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: angerszhu
>Priority: Major
>
> Hive serde default NULL value is '\N'
> {code:java}
> String nullString = tbl.getProperty(
> serdeConstants.SERIALIZATION_NULL_FORMAT, "\\N");
> nullSequence = new Text(nullString);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33586) BisectingKMeansModel save and load implementation in pyspark

2020-12-28 Thread RISHAV DUTTA (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255471#comment-17255471
 ] 

RISHAV DUTTA commented on SPARK-33586:
--

Can you point me to the location where these save and load functions are 
implemented?

> BisectingKMeansModel save and load implementation in pyspark
> 
>
> Key: SPARK-33586
> URL: https://issues.apache.org/jira/browse/SPARK-33586
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, PySpark
>Affects Versions: 3.0.1
> Environment: Spark 3.0.1 with Hadoop 2.7
>Reporter: Iman Kermani
>Priority: Minor
>
> BisectingKMeansModel save and load functions are implemented in Java and 
> Scala.
> It would be nice if they were implemented in PySpark too.
> Thanks in advance
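
For reference, a minimal sketch of the Scala-side persistence being referred to, assuming the RDD-based API in org.apache.spark.mllib.clustering (the data and path below are only illustrative):

{code:scala}
import org.apache.spark.mllib.clustering.{BisectingKMeans, BisectingKMeansModel}
import org.apache.spark.mllib.linalg.Vectors

// Hypothetical toy data; `sc` is an existing SparkContext.
val data = sc.parallelize(Seq(
  Vectors.dense(0.0, 0.0), Vectors.dense(1.0, 1.0),
  Vectors.dense(9.0, 8.0), Vectors.dense(8.0, 9.0)))

val model = new BisectingKMeans().setK(2).run(data)

// Scala exposes save/load; the ticket asks for the same in pyspark.
model.save(sc, "/tmp/bisecting-kmeans-model")
val loaded = BisectingKMeansModel.load(sc, "/tmp/bisecting-kmeans-model")
{code}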



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33925) Remove unused SecurityManager in Utils.fetchFile

2020-12-28 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255468#comment-17255468
 ] 

Hyukjin Kwon commented on SPARK-33925:
--

It's an improvement, but I will set the affected versions to 3.0.1+ to make it 
easier to maintain.

> Remove unused SecurityManager in Utils.fetchFile
> 
>
> Key: SPARK-33925
> URL: https://issues.apache.org/jira/browse/SPARK-33925
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> The last usage of {{SecurityManager}} in {{Utils.fetchFile}} was removed in 
> SPARK-27004. We don't need to pass it around anymore.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33925) Remove unused SecurityManager in Utils.fetchFile

2020-12-28 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-33925:


 Summary: Remove unused SecurityManager in Utils.fetchFile
 Key: SPARK-33925
 URL: https://issues.apache.org/jira/browse/SPARK-33925
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.0.1, 3.1.0, 3.2.0
Reporter: Hyukjin Kwon


The last usage of {{SecurityManager}} in {{Utils.fetchFile}} was removed in 
SPARK-27004. We don't need to pass it around anymore.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33924) v2 INSERT INTO .. PARTITION drops the partition location

2020-12-28 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33924:
--

 Summary: v2 INSERT INTO .. PARTITION drops the partition location
 Key: SPARK-33924
 URL: https://issues.apache.org/jira/browse/SPARK-33924
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


See the test "with location" in v2.AlterTableRenamePartitionSuite:
{code:scala}
val loc = "location1"
sql(s"ALTER TABLE $t ADD PARTITION (id = 2) LOCATION '$loc'")
checkLocation(t, Map("id" -> "2"), loc)

sql(s"INSERT INTO $t PARTITION (id = 2) SELECT 'def'")
checkLocation(t, Map("id" -> "2"), loc)
{code}
The second check should not fail, but it currently does because the INSERT drops the partition location.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32685) Script transform hive serde default field.delimit is '\t'

2020-12-28 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-32685.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 30942
[https://github.com/apache/spark/pull/30942]

> Script transform hive serde default field.delimit is '\t'
> -
>
> Key: SPARK-32685
> URL: https://issues.apache.org/jira/browse/SPARK-32685
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.2.0
>
>
>  
> {code:java}
> select split(value, "\t") from (
> SELECT TRANSFORM(a, b, c, null)
> USING 'cat' 
> FROM (select 1 as a, 2 as b, 3  as c) t
> ) temp;
> result is :
> _c0
> ["2","3","\\N"]{code}
>  
> {code:java}
> select split(value, "\t") from (
> SELECT TRANSFORM(a, b, c, null)
>   ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> USING 'cat' 
>   ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
>   WITH SERDEPROPERTIES (
>'serialization.last.column.takes.rest' = 'true'
>   )
> FROM (select 1 as a, 2 as b, 3  as c) t
> ) temp;
> result is :
> _c0
> ["2","3","\\N"]{code}
>  
>  
>  
> {code:java}
> select split(value, "\t") from (
> SELECT TRANSFORM(a, b, c, null)
>   ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> USING 'cat' 
>   ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> FROM (select 1 as a, 2 as b, 3  as c) t
> ) temp;
> result is :
> _c0 
> ["2"]
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32685) Script transform hive serde default field.delimit is '\t'

2020-12-28 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-32685:
---

Assignee: angerszhu

> Script transform hive serde default field.delimit is '\t'
> -
>
> Key: SPARK-32685
> URL: https://issues.apache.org/jira/browse/SPARK-32685
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>
>  
> {code:java}
> select split(value, "\t") from (
> SELECT TRANSFORM(a, b, c, null)
> USING 'cat' 
> FROM (select 1 as a, 2 as b, 3  as c) t
> ) temp;
> result is :
> _c0
> ["2","3","\\N"]{code}
>  
> {code:java}
> select split(value, "\t") from (
> SELECT TRANSFORM(a, b, c, null)
>   ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> USING 'cat' 
>   ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
>   WITH SERDEPROPERTIES (
>'serialization.last.column.takes.rest' = 'true'
>   )
> FROM (select 1 as a, 2 as b, 3  as c) t
> ) temp;
> result is :
> _c0
> ["2","3","\\N"]{code}
>  
>  
>  
> {code:java}
> select split(value, "\t") from (
> SELECT TRANSFORM(a, b, c, null)
>   ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> USING 'cat' 
>   ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> FROM (select 1 as a, 2 as b, 3  as c) t
> ) temp;
> result is :
> _c0 
> ["2"]
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31685) Spark structured streaming with Kafka fails with HDFS_DELEGATION_TOKEN expiration issue

2020-12-28 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255462#comment-17255462
 ] 

L. C. Hsieh commented on SPARK-31685:
-

Is this particularly related to SS? The issue described for 
{{HadoopFSDelegationTokenProvider}} does not look SS-specific.

BTW, the tokens obtained in {{getTokenRenewalInterval}} look like they are only 
used to compute the renewal interval. Why would their expiration cause the 
application to fail? Spark should only use the first tokens obtained in 
{{obtainDelegationTokens}}.

> Spark structured streaming with Kafka fails with HDFS_DELEGATION_TOKEN 
> expiration issue
> ---
>
> Key: SPARK-31685
> URL: https://issues.apache.org/jira/browse/SPARK-31685
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.4.4
> Environment: spark-2.4.4-bin-hadoop2.7
>Reporter: Rajeev Kumar
>Priority: Major
>
> I am facing issue for spark-2.4.4-bin-hadoop2.7. I am using spark structured 
> streaming with Kafka. Reading the stream from Kafka and saving it to HBase.
> I get this error on the driver after 24 hours.
>  
> {code:java}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 6972072 for ) is expired
> at org.apache.hadoop.ipc.Client.call(Client.java:1475)
> at org.apache.hadoop.ipc.Client.call(Client.java:1412)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
> at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108)
> at org.apache.hadoop.fs.Hdfs.getFileStatus(Hdfs.java:130)
> at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1169)
> at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1165)
> at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
> at 
> org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1171)
> at org.apache.hadoop.fs.FileContext$Util.exists(FileContext.java:1630)
> at 
> org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.exists(CheckpointFileManager.scala:326)
> at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.get(HDFSMetadataLog.scala:142)
> at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.add(HDFSMetadataLog.scala:110)
> at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$apply$mcZ$sp$3.apply$mcV$sp(MicroBatchExecution.scala:382)
> at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$apply$mcZ$sp$3.apply(MicroBatchExecution.scala:381)
> at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$apply$mcZ$sp$3.apply(MicroBatchExecution.scala:381)
> at 
> org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
> at 
> org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
> at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply$mcZ$sp(MicroBatchExecution.scala:381)
> at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply(MicroBatchExecution.scala:337)
> at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply(MicroBatchExecution.scala:337)
> at 
> 

[jira] [Updated] (SPARK-33923) Fix some tests with AQE enabled

2020-12-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33923:
--
Parent: SPARK-33828
Issue Type: Sub-task  (was: Improvement)

> Fix some tests with AQE enabled
> ---
>
> Key: SPARK-33923
> URL: https://issues.apache.org/jira/browse/SPARK-33923
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
> Fix For: 3.1.0
>
>
> e.g.,
> DataFrameAggregateSuite
> DataFrameJoinSuite
> JoinSuite
> PlannerSuite
> BucketedReadSuite



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33923) Fix some tests with AQE enabled

2020-12-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33923:
-

Assignee: wuyi

> Fix some tests with AQE enabled
> ---
>
> Key: SPARK-33923
> URL: https://issues.apache.org/jira/browse/SPARK-33923
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0, 3.2.0
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
>
> e.g.,
> DataFrameAggregateSuite
> DataFrameJoinSuite
> JoinSuite
> PlannerSuite
> BucketedReadSuite



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


