[jira] [Comment Edited] (SPARK-29890) Unable to fill na with 0 with duplicate columns

2021-01-17 Thread Peter Toth (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267059#comment-17267059
 ] 

Peter Toth edited comment on SPARK-29890 at 1/18/21, 7:57 AM:
--

[~imback82], yes, that's a good example. `fill` didn't throw any exception 
before this ticket. 


was (Author: petertoth):
[~imback82], yes, that's a good example. `fill` didn't throw any exception 
before this PR. 

> Unable to fill na with 0 with duplicate columns
> ---
>
> Key: SPARK-29890
> URL: https://issues.apache.org/jira/browse/SPARK-29890
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.3, 2.4.3
>Reporter: sandeshyapuram
>Assignee: Terry Kim
>Priority: Major
> Fix For: 2.4.5, 3.0.0
>
>
> Trying to fill NA values with 0.
> {noformat}
> scala> :paste
> // Entering paste mode (ctrl-D to finish)
> val parent = 
> spark.sparkContext.parallelize(Seq((1,2),(3,4),(5,6))).toDF("nums", "abc")
> val c1 = parent.filter(lit(true))
> val c2 = parent.filter(lit(true))
> c1.join(c2, Seq("nums"), "left")
> .na.fill(0).show{noformat}
> {noformat}
> 9/11/14 04:24:24 ERROR org.apache.hadoop.security.JniBasedUnixGroupsMapping: 
> error looking up the name of group 820818257: No such file or directory
> org.apache.spark.sql.AnalysisException: Reference 'abc' is ambiguous, could 
> be: abc, abc.;
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:213)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:117)
>   at org.apache.spark.sql.Dataset.resolve(Dataset.scala:220)
>   at org.apache.spark.sql.Dataset.col(Dataset.scala:1246)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions.org$apache$spark$sql$DataFrameNaFunctions$$fillCol(DataFrameNaFunctions.scala:443)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions$$anonfun$7.apply(DataFrameNaFunctions.scala:500)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions$$anonfun$7.apply(DataFrameNaFunctions.scala:492)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions.fillValue(DataFrameNaFunctions.scala:492)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:171)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:155)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:134)
>   ... 54 elided{noformat}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29890) Unable to fill na with 0 with duplicate columns

2021-01-17 Thread Peter Toth (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267059#comment-17267059
 ] 

Peter Toth commented on SPARK-29890:


[~imback82], yes, that's a good example. `fill` didn't throw any exception 
before this PR. 

> Unable to fill na with 0 with duplicate columns
> ---
>
> Key: SPARK-29890
> URL: https://issues.apache.org/jira/browse/SPARK-29890
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.3, 2.4.3
>Reporter: sandeshyapuram
>Assignee: Terry Kim
>Priority: Major
> Fix For: 2.4.5, 3.0.0
>
>
> Trying to fill NA values with 0.
> {noformat}
> scala> :paste
> // Entering paste mode (ctrl-D to finish)
> val parent = 
> spark.sparkContext.parallelize(Seq((1,2),(3,4),(5,6))).toDF("nums", "abc")
> val c1 = parent.filter(lit(true))
> val c2 = parent.filter(lit(true))
> c1.join(c2, Seq("nums"), "left")
> .na.fill(0).show{noformat}
> {noformat}
> 9/11/14 04:24:24 ERROR org.apache.hadoop.security.JniBasedUnixGroupsMapping: 
> error looking up the name of group 820818257: No such file or directory
> org.apache.spark.sql.AnalysisException: Reference 'abc' is ambiguous, could 
> be: abc, abc.;
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:213)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:117)
>   at org.apache.spark.sql.Dataset.resolve(Dataset.scala:220)
>   at org.apache.spark.sql.Dataset.col(Dataset.scala:1246)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions.org$apache$spark$sql$DataFrameNaFunctions$$fillCol(DataFrameNaFunctions.scala:443)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions$$anonfun$7.apply(DataFrameNaFunctions.scala:500)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions$$anonfun$7.apply(DataFrameNaFunctions.scala:492)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions.fillValue(DataFrameNaFunctions.scala:492)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:171)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:155)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:134)
>   ... 54 elided{noformat}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34150) Strip Null literal.sql in resolve alias

2021-01-17 Thread ulysses you (Jira)
ulysses you created SPARK-34150:
---

 Summary: Strip Null literal.sql in resolve alias
 Key: SPARK-34150
 URL: https://issues.apache.org/jira/browse/SPARK-34150
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: ulysses you


We convert `Literal(null)` to the target data type during analysis. The 
generated alias name then includes something like `CAST(NULL AS INT)` instead of 
`NULL`.
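
A minimal spark-shell sketch of the symptom (illustrative only; the exact generated name may vary between Spark versions):

{code:scala}
// Type coercion casts the null literal to int during analysis, and the cast
// then shows up in the auto-generated column alias.
spark.sql("SELECT if(true, null, 1)").columns
// e.g. Array((IF(true, CAST(NULL AS INT), 1)))  -- proposal: render the literal as NULL instead
{code}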



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34149) DSv2: `ALTER TABLE .. ADD PARTITION` does not refresh table cache

2021-01-17 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-34149:
---
Description: 
For example, the test below:
{code:scala}
  test("SPARK-X: refresh cache in partition adding") {
withNamespaceAndTable("ns", "tbl") { t =>
  sql(s"CREATE TABLE $t (part int) $defaultUsing PARTITIONED BY (part)")
  sql(s"ALTER TABLE $t ADD PARTITION (part=0)")
  assert(!spark.catalog.isCached(t))
  sql(s"CACHE TABLE $t")
  assert(spark.catalog.isCached(t))
  checkAnswer(sql(s"SELECT * FROM $t"), Row(0))

  sql(s"ALTER TABLE $t ADD PARTITION (part=1)")
  assert(spark.catalog.isCached(t))
  checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0), Row(1)))
}
  }
{code}
fails with:
{code}
!== Correct Answer - 2 ==   == Spark Answer - 1 ==
!struct<>   struct
 [0][0]
![1]

   
ScalaTestFailureLocation: org.apache.spark.sql.QueryTest$ at 
(QueryTest.scala:243)
{code}
because the command doesn't refresh the cache.

> DSv2: `ALTER TABLE .. ADD PARTITION` does not refresh table cache
> -
>
> Key: SPARK-34149
> URL: https://issues.apache.org/jira/browse/SPARK-34149
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> For example, the test below:
> {code:scala}
>   test("SPARK-X: refresh cache in partition adding") {
> withNamespaceAndTable("ns", "tbl") { t =>
>   sql(s"CREATE TABLE $t (part int) $defaultUsing PARTITIONED BY (part)")
>   sql(s"ALTER TABLE $t ADD PARTITION (part=0)")
>   assert(!spark.catalog.isCached(t))
>   sql(s"CACHE TABLE $t")
>   assert(spark.catalog.isCached(t))
>   checkAnswer(sql(s"SELECT * FROM $t"), Row(0))
>   sql(s"ALTER TABLE $t ADD PARTITION (part=1)")
>   assert(spark.catalog.isCached(t))
>   checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0), Row(1)))
> }
>   }
> {code}
> fails with:
> {code}
> !== Correct Answer - 2 ==   == Spark Answer - 1 ==
> !struct<>   struct
>  [0][0]
> ![1]
> 
>
> ScalaTestFailureLocation: org.apache.spark.sql.QueryTest$ at 
> (QueryTest.scala:243)
> {code}
> because the command doesn't refresh the cache.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34149) DSv2: `ALTER TABLE .. ADD PARTITION` does not refresh table cache

2021-01-17 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34149:
--

 Summary: DSv2: `ALTER TABLE .. ADD PARTITION` does not refresh 
table cache
 Key: SPARK-34149
 URL: https://issues.apache.org/jira/browse/SPARK-34149
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34064) Broadcast job is not aborted even the SQL statement canceled

2021-01-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267055#comment-17267055
 ] 

Apache Spark commented on SPARK-34064:
--

User 'LantaoJin' has created a pull request for this issue:
https://github.com/apache/spark/pull/31227

> Broadcast job is not aborted even the SQL statement canceled
> 
>
> Key: SPARK-34064
> URL: https://issues.apache.org/jira/browse/SPARK-34064
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.2.0, 3.1.1
>Reporter: Lantao Jin
>Priority: Minor
> Attachments: Screen Shot 2021-01-11 at 12.03.13 PM.png
>
>
> SPARK-27036 introduced a runId for BroadcastExchangeExec to resolve the 
> problem that a broadcast job is not aborted when broadcast timeout happens. 
> Since the runId is a random UUID, when a SQL statement is cancelled, these 
> broadcast sub-jobs are still not canceled as a whole.
>  !Screen Shot 2021-01-11 at 12.03.13 PM.png|width=100%! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33354) New explicit cast syntax rules in ANSI mode

2021-01-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-33354:
-
Fix Version/s: (was: 3.1.1)
   3.1.0

> New explicit cast syntax rules in ANSI mode
> ---
>
> Key: SPARK-33354
> URL: https://issues.apache.org/jira/browse/SPARK-33354
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.1.0
>
>
> In section 6.13 of the ANSI SQL standard, there are syntax rules for valid 
> combinations of the source and target data types.
> To make Spark's ANSI mode more ANSI SQL compatible, I propose to disallow the 
> following castings in ANSI mode:
> {code:java}
> TimeStamp <=> Boolean
> Date <=> Boolean
> Numeric <=> Timestamp
> Numeric <=> Date
> Numeric <=> Binary
> String <=> Array
> String <=> Map
> String <=> Struct
> {code}
> The following castings are considered invalid in the ANSI SQL standard, but they 
> are quite straightforward. Let's allow them for now:
> {code:java}
> Numeric <=> Boolean
> String <=> Boolean
> String <=> Binary
> {code}
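
A rough spark-shell sketch of what the proposed rules would mean in practice (assumptions: ANSI mode is toggled via spark.sql.ansi.enabled, and the exact error is illustrative rather than the final behavior):

{code:scala}
spark.conf.set("spark.sql.ansi.enabled", "true")

// Under the proposed syntax rules, a Timestamp <=> Boolean cast would be
// rejected at analysis time.
spark.sql("SELECT CAST(TIMESTAMP '2021-01-17 00:00:00' AS BOOLEAN)")
// expected (after this change): AnalysisException for an invalid explicit cast

// Numeric <=> Boolean stays allowed even though ANSI SQL disallows it.
spark.sql("SELECT CAST(1 AS BOOLEAN)").show()
{code}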



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33354) New explicit cast syntax rules in ANSI mode

2021-01-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-33354:
-
Fix Version/s: (was: 3.1.0)
   3.1.1

> New explicit cast syntax rules in ANSI mode
> ---
>
> Key: SPARK-33354
> URL: https://issues.apache.org/jira/browse/SPARK-33354
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.1.1
>
>
> In section 6.13 of the ANSI SQL standard, there are syntax rules for valid 
> combinations of the source and target data types.
> To make Spark's ANSI mode more ANSI SQL compatible, I propose to disallow the 
> following castings in ANSI mode:
> {code:java}
> TimeStamp <=> Boolean
> Date <=> Boolean
> Numeric <=> Timestamp
> Numeric <=> Date
> Numeric <=> Binary
> String <=> Array
> String <=> Map
> String <=> Struct
> {code}
> The following castings are considered invalid in the ANSI SQL standard, but they 
> are quite straightforward. Let's allow them for now:
> {code:java}
> Numeric <=> Boolean
> String <=> Boolean
> String <=> Binary
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33819) SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be `package private`

2021-01-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-33819:
-
Fix Version/s: (was: 3.1.0)
   3.1.1

> SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be 
> `package private`
> ---
>
> Key: SPARK-33819
> URL: https://issues.apache.org/jira/browse/SPARK-33819
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.2, 3.1.0, 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: releasenotes
> Fix For: 3.0.2, 3.2.0, 3.1.1
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34069) Kill barrier tasks should respect SPARK_JOB_INTERRUPT_ON_CANCEL

2021-01-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-34069:
-
Fix Version/s: (was: 3.1.0)
   3.1.1

> Kill barrier tasks should respect SPARK_JOB_INTERRUPT_ON_CANCEL
> ---
>
> Key: SPARK-34069
> URL: https://issues.apache.org/jira/browse/SPARK-34069
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: ulysses you
>Assignee: ulysses you
>Priority: Major
> Fix For: 3.1.1
>
>
> We should interrupt the task thread if the user sets the local property 
> `SPARK_JOB_INTERRUPT_ON_CANCEL` to true.
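
A hedged sketch of the setting in question (assumption: `SPARK_JOB_INTERRUPT_ON_CANCEL` is the SparkContext constant for the local property key "spark.job.interruptOnCancel"):

{code:scala}
// Equivalent ways to request interruption on cancel; both set the local property
// behind SPARK_JOB_INTERRUPT_ON_CANCEL (assumed key: "spark.job.interruptOnCancel").
sc.setJobGroup("my-job-group", "barrier job", interruptOnCancel = true)
// or: sc.setLocalProperty("spark.job.interruptOnCancel", "true")

// ... run the barrier job ...

// With this fix, cancelling the group should also interrupt the barrier task threads.
sc.cancelJobGroup("my-job-group")
{code}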



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34103) Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x

2021-01-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-34103:
-
Fix Version/s: (was: 3.1.0)
   3.1.1

> Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x
> --
>
> Key: SPARK-34103
> URL: https://issues.apache.org/jira/browse/SPARK-34103
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.0.0, 3.0.1, 3.1.0, 3.2.0, 3.1.1
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.2, 3.1.1
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34041) Miscellaneous cleanup for new PySpark documentation

2021-01-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-34041:
-
Fix Version/s: (was: 3.1.0)
   3.1.1

> Miscellaneous cleanup for new PySpark documentation
> ---
>
> Key: SPARK-34041
> URL: https://issues.apache.org/jira/browse/SPARK-34041
> Project: Spark
>  Issue Type: Sub-task
>  Components: docs
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.1.1
>
>
> 1. Add a link of quick start in PySpark docs into "Programming Guides" in 
> Spark main docs
> 2. ML MLlib -> MLlib (DataFrame-based)" and "MLlib (RDD-based)"
> 3. Mention MLlib user guide 
> (https://dist.apache.org/repos/dist/dev/spark/v3.1.0-rc1-docs/_site/ml-guide.html)
> 4. Mention other migration guides as well because PySpark can be affected by 
> them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33100) Support parse the sql statements with c-style comments

2021-01-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-33100:
-
Fix Version/s: (was: 3.1.0)
   3.1.1

> Support parse the sql statements with c-style comments
> --
>
> Key: SPARK-33100
> URL: https://issues.apache.org/jira/browse/SPARK-33100
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: feiwang
>Assignee: feiwang
>Priority: Minor
> Fix For: 3.0.2, 3.2.0, 3.1.1
>
>
> Currently, spark-sql does not support parsing SQL statements with C-style 
> comments.
> For the SQL statements:
> {code:java}
> /* SELECT 'test'; */
> SELECT 'test';
> {code}
> They would be split into two statements:
> The first: "/* SELECT 'test'"
> The second: "*/ SELECT 'test'"
> Then it would throw an exception because the first one is illegal.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33796) Show hidden text from the left menu of Spark Doc

2021-01-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-33796:
-
Fix Version/s: (was: 3.1.0)
   3.1.1

> Show hidden text from the left menu of Spark Doc
> 
>
> Key: SPARK-33796
> URL: https://issues.apache.org/jira/browse/SPARK-33796
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.0.0, 3.0.1, 3.1.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.1.1
>
>
> If the text in the left menu of the Spark docs is too long, it will be hidden. We 
> should fix it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30681) Add higher order functions API to PySpark

2021-01-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-30681:
-
Fix Version/s: (was: 3.1.1)
   3.1.0

> Add higher order functions API to PySpark
> -
>
> Key: SPARK-30681
> URL: https://issues.apache.org/jira/browse/SPARK-30681
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Maciej Szymkiewicz
>Assignee: Maciej Szymkiewicz
>Priority: Major
> Fix For: 3.1.0
>
>
> As of 3.0.0 higher order functions are available in SQL and Scala, but not in 
> PySpark, forcing Python users to invoke these through {{expr}}, 
> {{selectExpr}} or {{sql}}.
> This is error prone and not well documented. Spark should provide 
> {{pyspark.sql}} wrappers that accept plain Python functions (of course within 
> limits of {{(*Column) -> Column}}) as arguments.
> {code:python}
> df.select(transform("values", lambda c: trim(upper(c))))
>
> def increment_values(k: Column, v: Column) -> Column:
>     return v + 1
>
> df.select(transform_values("data", increment_values))
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30681) Add higher order functions API to PySpark

2021-01-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-30681:
-
Fix Version/s: (was: 3.1.0)
   3.1.1

> Add higher order functions API to PySpark
> -
>
> Key: SPARK-30681
> URL: https://issues.apache.org/jira/browse/SPARK-30681
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Maciej Szymkiewicz
>Assignee: Maciej Szymkiewicz
>Priority: Major
> Fix For: 3.1.1
>
>
> As of 3.0.0 higher order functions are available in SQL and Scala, but not in 
> PySpark, forcing Python users to invoke these through {{expr}}, 
> {{selectExpr}} or {{sql}}.
> This is error prone and not well documented. Spark should provide 
> {{pyspark.sql}} wrappers that accept plain Python functions (of course within 
> limits of {{(*Column) -> Column}}) as arguments.
> {code:python}
> df.select(transform("values", lambda c: trim(upper(c))))
>
> def increment_values(k: Column, v: Column) -> Column:
>     return v + 1
>
> df.select(transform_values("data", increment_values))
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34021) Fix hyper links in SparkR documentation for CRAN submission

2021-01-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-34021:
-
Fix Version/s: (was: 3.1.0)
   3.1.1

> Fix hyper links in SparkR documentation for CRAN submission
> ---
>
> Key: SPARK-34021
> URL: https://issues.apache.org/jira/browse/SPARK-34021
> Project: Spark
>  Issue Type: Task
>  Components: SparkR
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Blocker
> Fix For: 3.1.1
>
>
> CRAN submission fails due to:
> {code}
>Found the following (possibly) invalid URLs:
>  URL: http://jsonlines.org/ (moved to https://jsonlines.org/)
>From: man/read.json.Rd
>  man/write.json.Rd
>Status: 200
>Message: OK
>  URL: https://dl.acm.org/citation.cfm?id=1608614 (moved to
> https://dl.acm.org/doi/10.1109/MC.2009.263)
>From: inst/doc/sparkr-vignettes.html
>Status: 200
>Message: OK
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34139) UnresolvedRelation should retain SQL text position

2021-01-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-34139:
---

Assignee: Terry Kim

> UnresolvedRelation should retain SQL text position
> --
>
> Key: SPARK-34139
> URL: https://issues.apache.org/jira/browse/SPARK-34139
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Major
>
> UnresolvedRelation should retain SQL text position. The following commands 
> will be handled:
> {code:java}
> CACHE TABLE unknown
> UNCACHE TABLE unknown
> DELETE FROM unknown
> UPDATE unknown SET name='abc'
> MERGE INTO unknown1 AS target USING unknown2 AS source ON target.col = 
> source.col WHEN MATCHED THEN DELETE
> INSERT INTO TABLE unknown SELECT 1
> INSERT OVERWRITE TABLE unknown VALUES (1, 'a')
> {code}
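
For illustration, a hedged sketch of what retaining the position could look like for one of these commands (the exact message text is an assumption, not the committed behavior):

{code:scala}
spark.sql("CACHE TABLE unknown")
// expected (after this change): the AnalysisException points at the position of
// `unknown` in the original SQL text, e.g. "Table or view not found: unknown; line 1 pos 12"
{code}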



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-34139) UnresolvedRelation should retain SQL text position

2021-01-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-34139.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31209
[https://github.com/apache/spark/pull/31209]

> UnresolvedRelation should retain SQL text position
> --
>
> Key: SPARK-34139
> URL: https://issues.apache.org/jira/browse/SPARK-34139
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Major
> Fix For: 3.2.0
>
>
> UnresolvedRelation should retain SQL text position. The following commands 
> will be handled:
> {code:java}
> CACHE TABLE unknown
> UNCACHE TABLE unknown
> DELETE FROM unknown
> UPDATE unknown SET name='abc'
> MERGE INTO unknown1 AS target USING unknown2 AS source ON target.col = 
> source.col WHEN MATCHED THEN DELETE
> INSERT INTO TABLE unknown SELECT 1
> INSERT OVERWRITE TABLE unknown VALUES (1, 'a')
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33696) Upgrade built-in Hive to 2.3.8

2021-01-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33696:
-

Assignee: Yuming Wang

> Upgrade built-in Hive to 2.3.8
> --
>
> Key: SPARK-33696
> URL: https://issues.apache.org/jira/browse/SPARK-33696
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>
> Hive 2.3.8 changes:
>  HIVE-19662: Upgrade Avro to 1.8.2
>  HIVE-24324: Remove deprecated API usage from Avro
>  HIVE-23980: Shade Guava from hive-exec in Hive 2.3
>  HIVE-24436: Fix Avro NULL_DEFAULT_VALUE compatibility issue
>  HIVE-24512: Exclude calcite in packaging.
>  HIVE-22708: Fix for HttpTransport to replace String.equals



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33696) Upgrade built-in Hive to 2.3.8

2021-01-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33696.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 30657
[https://github.com/apache/spark/pull/30657]

> Upgrade built-in Hive to 2.3.8
> --
>
> Key: SPARK-33696
> URL: https://issues.apache.org/jira/browse/SPARK-33696
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> Hive 2.3.8 changes:
>  HIVE-19662: Upgrade Avro to 1.8.2
>  HIVE-24324: Remove deprecated API usage from Avro
>  HIVE-23980: Shade Guava from hive-exec in Hive 2.3
>  HIVE-24436: Fix Avro NULL_DEFAULT_VALUE compatibility issue
>  HIVE-24512: Exclude calcite in packaging.
>  HIVE-22708: Fix for HttpTransport to replace String.equals



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33913) Upgrade Kafka to 2.7.0

2021-01-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267028#comment-17267028
 ] 

Apache Spark commented on SPARK-33913:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/31223

> Upgrade Kafka to 2.7.0
> --
>
> Key: SPARK-33913
> URL: https://issues.apache.org/jira/browse/SPARK-33913
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, DStreams
>Affects Versions: 3.2.0
>Reporter: dengziming
>Priority: Major
>
>  
> The Apache Kafka community has released Apache Kafka 2.7.0. Some features 
> are useful, for example KAFKA-9893
> (configurable TCP connection timeout). More details: 
> https://downloads.apache.org/kafka/2.7.0/RELEASE_NOTES.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33913) Upgrade Kafka to 2.7.0

2021-01-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267027#comment-17267027
 ] 

Apache Spark commented on SPARK-33913:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/31223

> Upgrade Kafka to 2.7.0
> --
>
> Key: SPARK-33913
> URL: https://issues.apache.org/jira/browse/SPARK-33913
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, DStreams
>Affects Versions: 3.2.0
>Reporter: dengziming
>Priority: Major
>
>  
> The Apache Kafka community has released Apache Kafka 2.7.0. Some features 
> are useful, for example KAFKA-9893
> (configurable TCP connection timeout). More details: 
> https://downloads.apache.org/kafka/2.7.0/RELEASE_NOTES.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30682) Add higher order functions API to SparkR

2021-01-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266998#comment-17266998
 ] 

Apache Spark commented on SPARK-30682:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/31226

> Add higher order functions API to SparkR
> 
>
> Key: SPARK-30682
> URL: https://issues.apache.org/jira/browse/SPARK-30682
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR, SQL
>Affects Versions: 3.0.0
>Reporter: Maciej Szymkiewicz
>Assignee: Maciej Szymkiewicz
>Priority: Major
> Fix For: 3.1.0
>
>
> As of 3.0.0 higher order functions are available in SQL and Scala, but not in 
> SparkR, forcing R users to invoke these through {{expr}}, {{selectExpr}} or 
> {{sql}}.
> It would be great if Spark provided high level wrappers that accept plain R 
> functions operating on SQL expressions. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30682) Add higher order functions API to SparkR

2021-01-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266997#comment-17266997
 ] 

Apache Spark commented on SPARK-30682:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/31226

> Add higher order functions API to SparkR
> 
>
> Key: SPARK-30682
> URL: https://issues.apache.org/jira/browse/SPARK-30682
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR, SQL
>Affects Versions: 3.0.0
>Reporter: Maciej Szymkiewicz
>Assignee: Maciej Szymkiewicz
>Priority: Major
> Fix For: 3.1.0
>
>
> As of 3.0.0 higher order functions are available in SQL and Scala, but not in 
> SparkR, forcing R users to invoke these through {{expr}}, {{selectExpr}} or 
> {{sql}}.
> It would be great if Spark provided high level wrappers that accept plain R 
> functions operating on SQL expressions. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33819) SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be `package private`

2021-01-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266989#comment-17266989
 ] 

Apache Spark commented on SPARK-33819:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/31225

> SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be 
> `package private`
> ---
>
> Key: SPARK-33819
> URL: https://issues.apache.org/jira/browse/SPARK-33819
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.2, 3.1.0, 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: releasenotes
> Fix For: 3.0.2, 3.1.0, 3.2.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33819) SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be `package private`

2021-01-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266988#comment-17266988
 ] 

Apache Spark commented on SPARK-33819:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/31224

> SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be 
> `package private`
> ---
>
> Key: SPARK-33819
> URL: https://issues.apache.org/jira/browse/SPARK-33819
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.2, 3.1.0, 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: releasenotes
> Fix For: 3.0.2, 3.1.0, 3.2.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33819) SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be `package private`

2021-01-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266987#comment-17266987
 ] 

Apache Spark commented on SPARK-33819:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/31224

> SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be 
> `package private`
> ---
>
> Key: SPARK-33819
> URL: https://issues.apache.org/jira/browse/SPARK-33819
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.2, 3.1.0, 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: releasenotes
> Fix For: 3.0.2, 3.1.0, 3.2.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31168) Upgrade Scala to 2.12.13

2021-01-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-31168:
--
Parent: SPARK-33772
Issue Type: Sub-task  (was: Improvement)

> Upgrade Scala to 2.12.13
> 
>
> Key: SPARK-31168
> URL: https://issues.apache.org/jira/browse/SPARK-31168
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> h2. Highlights
>  * Performance improvements in the collections library: algorithmic 
> improvements and changes to avoid unnecessary allocations ([list of 
> PRs|https://github.com/scala/scala/pulls?q=is%3Apr+milestone%3A2.12.11+is%3Aclosed+sort%3Acreated-desc+label%3Alibrary%3Acollections+label%3Aperformance])
>  * Performance improvements in the compiler ([list of 
> PRs|https://github.com/scala/scala/pulls?q=is%3Apr+milestone%3A2.12.11+is%3Aclosed+sort%3Acreated-desc+-label%3Alibrary%3Acollections+label%3Aperformance+],
>  minor [effects in our 
> benchmarks|https://scala-ci.typesafe.com/grafana/dashboard/db/scala-benchmark?orgId=1&from=1567985515850&to=1584355915694&var-branch=2.12.x&var-source=All&var-bench=HotScalacBenchmark.compile&var-host=scalabench@scalabench@])
>  * Improvements to {{-Yrepl-class-based}}, an alternative internal REPL 
> encoding that avoids deadlocks (details on 
> [#8712|https://github.com/scala/scala/pull/8712])
>  * A new {{-Yrepl-use-magic-imports}} flag that avoids deep class nesting in 
> the REPL, which can lead to deteriorating performance in long sessions 
> ([#8576|https://github.com/scala/scala/pull/8576])
>  * Fix some {{toX}} methods that could expose the underlying mutability of a 
> {{ListBuffer}}-generated collection 
> ([#8674|https://github.com/scala/scala/pull/8674])
> h3. JDK 9+ support
>  * ASM was upgraded to 7.3.1, allowing the optimizer to run on JDK 13+ 
> ([#8676|https://github.com/scala/scala/pull/8676])
>  * {{:javap}} in the REPL now works on JDK 9+ 
> ([#8400|https://github.com/scala/scala/pull/8400])
> h3. Other changes
>  * Support new labels for creating durations for consistency: 
> {{Duration("1m")}}, {{Duration("3 hrs")}} 
> ([#8325|https://github.com/scala/scala/pull/8325], 
> [#8450|https://github.com/scala/scala/pull/8450])
>  * Fix memory leak in runtime reflection's {{TypeTag}} caches 
> ([#8470|https://github.com/scala/scala/pull/8470]) and some thread safety 
> issues in runtime reflection 
> ([#8433|https://github.com/scala/scala/pull/8433])
>  * When using compiler plugins, the ordering of compiler phases may change 
> due to [#8427|https://github.com/scala/scala/pull/8427]
> For more details, see [https://github.com/scala/scala/releases/tag/v2.12.11].
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29890) Unable to fill na with 0 with duplicate columns

2021-01-17 Thread Terry Kim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266970#comment-17266970
 ] 

Terry Kim commented on SPARK-29890:
---

[~petertoth] Could you share the example and the behavior change? Are you 
referring to something like the following:

{code:java}
scala> Seq(1).toDF("i").na.fill(0, Seq("j"))
org.apache.spark.sql.AnalysisException: Cannot resolve column name "j" among (i)
{code}
, which seems fine to me.

> Unable to fill na with 0 with duplicate columns
> ---
>
> Key: SPARK-29890
> URL: https://issues.apache.org/jira/browse/SPARK-29890
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.3, 2.4.3
>Reporter: sandeshyapuram
>Assignee: Terry Kim
>Priority: Major
> Fix For: 2.4.5, 3.0.0
>
>
> Trying to fill NA values with 0.
> {noformat}
> scala> :paste
> // Entering paste mode (ctrl-D to finish)
> val parent = 
> spark.sparkContext.parallelize(Seq((1,2),(3,4),(5,6))).toDF("nums", "abc")
> val c1 = parent.filter(lit(true))
> val c2 = parent.filter(lit(true))
> c1.join(c2, Seq("nums"), "left")
> .na.fill(0).show{noformat}
> {noformat}
> 9/11/14 04:24:24 ERROR org.apache.hadoop.security.JniBasedUnixGroupsMapping: 
> error looking up the name of group 820818257: No such file or directory
> org.apache.spark.sql.AnalysisException: Reference 'abc' is ambiguous, could 
> be: abc, abc.;
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:213)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:117)
>   at org.apache.spark.sql.Dataset.resolve(Dataset.scala:220)
>   at org.apache.spark.sql.Dataset.col(Dataset.scala:1246)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions.org$apache$spark$sql$DataFrameNaFunctions$$fillCol(DataFrameNaFunctions.scala:443)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions$$anonfun$7.apply(DataFrameNaFunctions.scala:500)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions$$anonfun$7.apply(DataFrameNaFunctions.scala:492)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions.fillValue(DataFrameNaFunctions.scala:492)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:171)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:155)
>   at 
> org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:134)
>   ... 54 elided{noformat}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31168) Upgrade Scala to 2.12.13

2021-01-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266969#comment-17266969
 ] 

Apache Spark commented on SPARK-31168:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/31223

> Upgrade Scala to 2.12.13
> 
>
> Key: SPARK-31168
> URL: https://issues.apache.org/jira/browse/SPARK-31168
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> h2. Highlights
>  * Performance improvements in the collections library: algorithmic 
> improvements and changes to avoid unnecessary allocations ([list of 
> PRs|https://github.com/scala/scala/pulls?q=is%3Apr+milestone%3A2.12.11+is%3Aclosed+sort%3Acreated-desc+label%3Alibrary%3Acollections+label%3Aperformance])
>  * Performance improvements in the compiler ([list of 
> PRs|https://github.com/scala/scala/pulls?q=is%3Apr+milestone%3A2.12.11+is%3Aclosed+sort%3Acreated-desc+-label%3Alibrary%3Acollections+label%3Aperformance+],
>  minor [effects in our 
> benchmarks|https://scala-ci.typesafe.com/grafana/dashboard/db/scala-benchmark?orgId=1&from=1567985515850&to=1584355915694&var-branch=2.12.x&var-source=All&var-bench=HotScalacBenchmark.compile&var-host=scalabench@scalabench@])
>  * Improvements to {{-Yrepl-class-based}}, an alternative internal REPL 
> encoding that avoids deadlocks (details on 
> [#8712|https://github.com/scala/scala/pull/8712])
>  * A new {{-Yrepl-use-magic-imports}} flag that avoids deep class nesting in 
> the REPL, which can lead to deteriorating performance in long sessions 
> ([#8576|https://github.com/scala/scala/pull/8576])
>  * Fix some {{toX}} methods that could expose the underlying mutability of a 
> {{ListBuffer}}-generated collection 
> ([#8674|https://github.com/scala/scala/pull/8674])
> h3. JDK 9+ support
>  * ASM was upgraded to 7.3.1, allowing the optimizer to run on JDK 13+ 
> ([#8676|https://github.com/scala/scala/pull/8676])
>  * {{:javap}} in the REPL now works on JDK 9+ 
> ([#8400|https://github.com/scala/scala/pull/8400])
> h3. Other changes
>  * Support new labels for creating durations for consistency: 
> {{Duration("1m")}}, {{Duration("3 hrs")}} 
> ([#8325|https://github.com/scala/scala/pull/8325], 
> [#8450|https://github.com/scala/scala/pull/8450])
>  * Fix memory leak in runtime reflection's {{TypeTag}} caches 
> ([#8470|https://github.com/scala/scala/pull/8470]) and some thread safety 
> issues in runtime reflection 
> ([#8433|https://github.com/scala/scala/pull/8433])
>  * When using compiler plugins, the ordering of compiler phases may change 
> due to [#8427|https://github.com/scala/scala/pull/8427]
> For more details, see [https://github.com/scala/scala/releases/tag/v2.12.11].
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31168) Upgrade Scala to 2.12.13

2021-01-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266968#comment-17266968
 ] 

Apache Spark commented on SPARK-31168:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/31223

> Upgrade Scala to 2.12.13
> 
>
> Key: SPARK-31168
> URL: https://issues.apache.org/jira/browse/SPARK-31168
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> h2. Highlights
>  * Performance improvements in the collections library: algorithmic 
> improvements and changes to avoid unnecessary allocations ([list of 
> PRs|https://github.com/scala/scala/pulls?q=is%3Apr+milestone%3A2.12.11+is%3Aclosed+sort%3Acreated-desc+label%3Alibrary%3Acollections+label%3Aperformance])
>  * Performance improvements in the compiler ([list of 
> PRs|https://github.com/scala/scala/pulls?q=is%3Apr+milestone%3A2.12.11+is%3Aclosed+sort%3Acreated-desc+-label%3Alibrary%3Acollections+label%3Aperformance+],
>  minor [effects in our 
> benchmarks|https://scala-ci.typesafe.com/grafana/dashboard/db/scala-benchmark?orgId=1&from=1567985515850&to=1584355915694&var-branch=2.12.x&var-source=All&var-bench=HotScalacBenchmark.compile&var-host=scalabench@scalabench@])
>  * Improvements to {{-Yrepl-class-based}}, an alternative internal REPL 
> encoding that avoids deadlocks (details on 
> [#8712|https://github.com/scala/scala/pull/8712])
>  * A new {{-Yrepl-use-magic-imports}} flag that avoids deep class nesting in 
> the REPL, which can lead to deteriorating performance in long sessions 
> ([#8576|https://github.com/scala/scala/pull/8576])
>  * Fix some {{toX}} methods that could expose the underlying mutability of a 
> {{ListBuffer}}-generated collection 
> ([#8674|https://github.com/scala/scala/pull/8674])
> h3. JDK 9+ support
>  * ASM was upgraded to 7.3.1, allowing the optimizer to run on JDK 13+ 
> ([#8676|https://github.com/scala/scala/pull/8676])
>  * {{:javap}} in the REPL now works on JDK 9+ 
> ([#8400|https://github.com/scala/scala/pull/8400])
> h3. Other changes
>  * Support new labels for creating durations for consistency: 
> {{Duration("1m")}}, {{Duration("3 hrs")}} 
> ([#8325|https://github.com/scala/scala/pull/8325], 
> [#8450|https://github.com/scala/scala/pull/8450])
>  * Fix memory leak in runtime reflection's {{TypeTag}} caches 
> ([#8470|https://github.com/scala/scala/pull/8470]) and some thread safety 
> issues in runtime reflection 
> ([#8433|https://github.com/scala/scala/pull/8433])
>  * When using compiler plugins, the ordering of compiler phases may change 
> due to [#8427|https://github.com/scala/scala/pull/8427]
> For more details, see [https://github.com/scala/scala/releases/tag/v2.12.11].
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34080) Add UnivariateFeatureSelector to deprecate existing selectors

2021-01-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266966#comment-17266966
 ] 

Apache Spark commented on SPARK-34080:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/31222

> Add UnivariateFeatureSelector to deprecate existing selectors
> -
>
> Key: SPARK-34080
> URL: https://issues.apache.org/jira/browse/SPARK-34080
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 3.2.0, 3.1.1
>Reporter: Xiangrui Meng
>Assignee: Huaxin Gao
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
>
> In SPARK-26111, we introduced a few univariate feature selectors, which share 
> a common set of params. And they are named after the underlying test, which 
> requires users to understand the test to find the matched scenarios. It would 
> be nice if we introduce a single class called UnivariateFeatureSelector that 
> accepts a selection criterion and a score method (string names). Then we can 
> deprecate all other univariate selectors.
> For the params, instead of asking users to provide which score function to use, 
> it is more friendly to ask users to specify the feature and label types 
> (continuous or categorical), and we set a default score function for each 
> combo. We can also detect the types from feature metadata if given. Advanced 
> users can overwrite it (if there are multiple score functions that are 
> compatible with the feature type and label type combo). Example (param names 
> are not finalized):
> {code}
> selector = UnivariateFeatureSelector(featureCols=["x", "y", "z"], 
> labelCol=["target"], featureType="categorical", labelType="continuous", 
> select="bestK", k=100)
> {code}
> cc: [~huaxingao] [~ruifengz] [~weichenxu123]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34080) Add UnivariateFeatureSelector to deprecate existing selectors

2021-01-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266964#comment-17266964
 ] 

Apache Spark commented on SPARK-34080:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/31222

> Add UnivariateFeatureSelector to deprecate existing selectors
> -
>
> Key: SPARK-34080
> URL: https://issues.apache.org/jira/browse/SPARK-34080
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 3.2.0, 3.1.1
>Reporter: Xiangrui Meng
>Assignee: Huaxin Gao
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
>
> In SPARK-26111, we introduced a few univariate feature selectors, which share 
> a common set of params. And they are named after the underlying test, which 
> requires users to understand the test to find the matched scenarios. It would 
> be nice if we introduce a single class called UnivariateFeatureSelector that 
> accepts a selection criterion and a score method (string names). Then we can 
> deprecate all other univariate selectors.
> For the params, instead of asking users to provide which score function to use, 
> it is more friendly to ask users to specify the feature and label types 
> (continuous or categorical), and we set a default score function for each 
> combo. We can also detect the types from feature metadata if given. Advanced 
> users can overwrite it (if there are multiple score functions that are 
> compatible with the feature type and label type combo). Example (param names 
> are not finalized):
> {code}
> selector = UnivariateFeatureSelector(featureCols=["x", "y", "z"], 
> labelCol=["target"], featureType="categorical", labelType="continuous", 
> select="bestK", k=100)
> {code}
> cc: [~huaxingao] [~ruifengz] [~weichenxu123]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34142) Support Fallback Storage Cleanup during stopping SparkContext

2021-01-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-34142:
-

Assignee: Dongjoon Hyun

> Support Fallback Storage Cleanup during stopping SparkContext
> -
>
> Key: SPARK-34142
> URL: https://issues.apache.org/jira/browse/SPARK-34142
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>
> SPARK-33545 added `Support Fallback Storage during worker decommission` for 
> the managed cloud storage with TTL support. This issue aims to add an additional 
> clean-up step when stopping SparkContext, to save some cost before the TTL expires 
> or for other HDFS-compatible storage which doesn't have TTL support.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-34142) Support Fallback Storage Cleanup during stopping SparkContext

2021-01-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-34142.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31215
[https://github.com/apache/spark/pull/31215]

> Support Fallback Storage Cleanup during stopping SparkContext
> -
>
> Key: SPARK-34142
> URL: https://issues.apache.org/jira/browse/SPARK-34142
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.2.0
>
>
> SPARK-33545 added `Support Fallback Storage during worker decommission` for 
> the managed cloud storage with TTL support. This issue aims to add an additional 
> clean-up step when stopping SparkContext, to save some cost before the TTL expires 
> or for other HDFS-compatible storage which doesn't have TTL support.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33730) Standardize warning types

2021-01-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-33730.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 30985
[https://github.com/apache/spark/pull/30985]

> Standardize warning types
> -
>
> Key: SPARK-33730
> URL: https://issues.apache.org/jira/browse/SPARK-33730
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Maciej Bryński
>Priority: Major
> Fix For: 3.2.0
>
>
> We should use warnings properly per 
> [https://docs.python.org/3/library/warnings.html#warning-categories]
> In particular,
>  - we should use {{FutureWarning}} instead of {{DeprecationWarning}} in the 
> places where warnings should be shown to end users by default.
>  - we should __maybe__ think about customizing stacklevel 
> ([https://docs.python.org/3/library/warnings.html#warnings.warn]) like pandas 
> does.
>  - ...
> Current warnings are a bit messy and somewhat arbitrary.
> To be more explicit, we'll have to fix:
> {code:java}
> pyspark/context.py:warnings.warn(
> pyspark/context.py:warnings.warn(
> pyspark/ml/classification.py:warnings.warn("weightCol is 
> ignored, "
> pyspark/ml/clustering.py:warnings.warn("Deprecated in 3.0.0. It will 
> be removed in future versions. Use "
> pyspark/mllib/classification.py:warnings.warn(
> pyspark/mllib/feature.py:warnings.warn("Both withMean and withStd 
> are false. The model does nothing.")
> pyspark/mllib/regression.py:warnings.warn(
> pyspark/mllib/regression.py:warnings.warn(
> pyspark/mllib/regression.py:warnings.warn(
> pyspark/rdd.py:warnings.warn("mapPartitionsWithSplit is deprecated; "
> pyspark/rdd.py:warnings.warn(
> pyspark/shell.py:warnings.warn("Failed to initialize Spark session.")
> pyspark/shuffle.py:warnings.warn("Please install psutil to have 
> better "
> pyspark/sql/catalog.py:warnings.warn(
> pyspark/sql/catalog.py:warnings.warn(
> pyspark/sql/column.py:warnings.warn(
> pyspark/sql/column.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/dataframe.py:warnings.warn(
> pyspark/sql/dataframe.py:warnings.warn("to_replace is a dict 
> and value is not None. value will be ignored.")
> pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use degrees 
> instead.", DeprecationWarning)
> pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use radians 
> instead.", DeprecationWarning)
> pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use 
> approx_count_distinct instead.", DeprecationWarning)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/functions.py:warnings.warn(
> pyspark/sql/pandas/group_ops.py:warnings.warn(
> pyspark/sql/session.py:warnings.warn("Fall back to non-hive 
> support because failing to access HiveConf, "
> {code}
> PySpark also prints warnings via plain {{print}} in some places. We should 
> see whether those should be switched to {{warnings.warn}} as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-34125) Make EventLoggingListener.codecMap thread-safe

2021-01-17 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-34125.
--
Fix Version/s: 2.4.8
   Resolution: Fixed

Issue resolved by pull request 31194
[https://github.com/apache/spark/pull/31194]

> Make EventLoggingListener.codecMap thread-safe
> --
>
> Key: SPARK-34125
> URL: https://issues.apache.org/jira/browse/SPARK-34125
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.7
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Trivial
> Fix For: 2.4.8
>
> Attachments: jstack.png, top.png
>
>
> In the 2.x version of the history server, EventLoggingListener.codecMap is a 
> mutable.HashMap, which is not thread safe.
>  This can cause the history server to suddenly get stuck and stop working.
> In the 3.x version, the equivalent EventLogFileReader.codecMap was changed to 
> a ConcurrentHashMap, so it does not have this problem (-SPARK-28869-).
> PID 117049 0x1c939
> !top.png!
>  
> !jstack.png!
>  
>  
>  
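
As an aside, the fix pattern here is small; below is a minimal sketch (not the actual Spark source, names simplified) of replacing a shared mutable.HashMap with a ConcurrentHashMap-backed map:

{code:scala}
// Minimal sketch of the thread-safety fix pattern (not the actual Spark
// source; names are simplified). A scala.collection.mutable.HashMap shared
// across replay threads can be corrupted by concurrent updates, which is one
// way the history server can end up stuck.
import java.util.concurrent.ConcurrentHashMap
import scala.collection.JavaConverters._
import scala.collection.concurrent.{Map => CMap}

// Unsafe when updated from multiple threads:
val unsafeCodecMap = scala.collection.mutable.HashMap.empty[String, String]

// Safe: concurrent updates can no longer corrupt the underlying table.
val codecMap: CMap[String, String] =
  new ConcurrentHashMap[String, String]().asScala

def loadCodec(name: String): String = name.toLowerCase // placeholder loader

def codecFor(name: String): String =
  codecMap.getOrElseUpdate(name, loadCodec(name))
{code}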



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34125) Make EventLoggingListener.codecMap thread-safe

2021-01-17 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-34125:


Assignee: dzcxzl

> Make EventLoggingListener.codecMap thread-safe
> --
>
> Key: SPARK-34125
> URL: https://issues.apache.org/jira/browse/SPARK-34125
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.7
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Trivial
> Attachments: jstack.png, top.png
>
>
> In the 2.x version of the history server, EventLoggingListener.codecMap is a 
> mutable.HashMap, which is not thread safe.
>  This can cause the history server to suddenly get stuck and stop working.
> In the 3.x version, the equivalent EventLogFileReader.codecMap was changed to 
> a ConcurrentHashMap, so it does not have this problem (-SPARK-28869-).
> PID 117049 0x1c939
> !top.png!
>  
> !jstack.png!
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34148) Move general StateStore tests to StateStoreSuiteBase

2021-01-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266938#comment-17266938
 ] 

Apache Spark commented on SPARK-34148:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/31219

> Move general StateStore tests to StateStoreSuiteBase
> 
>
> Key: SPARK-34148
> URL: https://issues.apache.org/jira/browse/SPARK-34148
> Project: Spark
>  Issue Type: Test
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> There are some general StateStore tests in StateStoreSuite, which is an 
> HDFSBackedStateStoreProvider-specific test suite. We should move the general 
> tests into StateStoreSuiteBase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34148) Move general StateStore tests to StateStoreSuiteBase

2021-01-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34148:


Assignee: Apache Spark  (was: L. C. Hsieh)

> Move general StateStore tests to StateStoreSuiteBase
> 
>
> Key: SPARK-34148
> URL: https://issues.apache.org/jira/browse/SPARK-34148
> Project: Spark
>  Issue Type: Test
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: L. C. Hsieh
>Assignee: Apache Spark
>Priority: Major
>
> There are some general StateStore tests in StateStoreSuite, which is an 
> HDFSBackedStateStoreProvider-specific test suite. We should move the general 
> tests into StateStoreSuiteBase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34148) Move general StateStore tests to StateStoreSuiteBase

2021-01-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34148:


Assignee: L. C. Hsieh  (was: Apache Spark)

> Move general StateStore tests to StateStoreSuiteBase
> 
>
> Key: SPARK-34148
> URL: https://issues.apache.org/jira/browse/SPARK-34148
> Project: Spark
>  Issue Type: Test
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> There are some general StateStore tests in StateStoreSuite, which is an 
> HDFSBackedStateStoreProvider-specific test suite. We should move the general 
> tests into StateStoreSuiteBase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34148) Move general StateStore tests to StateStoreSuiteBase

2021-01-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266937#comment-17266937
 ] 

Apache Spark commented on SPARK-34148:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/31219

> Move general StateStore tests to StateStoreSuiteBase
> 
>
> Key: SPARK-34148
> URL: https://issues.apache.org/jira/browse/SPARK-34148
> Project: Spark
>  Issue Type: Test
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> There are some general StateStore tests in StateStoreSuite, which is an 
> HDFSBackedStateStoreProvider-specific test suite. We should move the general 
> tests into StateStoreSuiteBase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34148) Move general StateStore tests to StateStoreSuiteBase

2021-01-17 Thread L. C. Hsieh (Jira)
L. C. Hsieh created SPARK-34148:
---

 Summary: Move general StateStore tests to StateStoreSuiteBase
 Key: SPARK-34148
 URL: https://issues.apache.org/jira/browse/SPARK-34148
 Project: Spark
  Issue Type: Test
  Components: Structured Streaming
Affects Versions: 3.2.0
Reporter: L. C. Hsieh
Assignee: L. C. Hsieh


There are some general StateStore tests in StateStoreSuite, which is an 
HDFSBackedStateStoreProvider-specific test suite. We should move the general 
tests into StateStoreSuiteBase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-34123) Faster way to display/render entries in HistoryPage (Spark history server summary page)

2021-01-17 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-34123.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31191
[https://github.com/apache/spark/pull/31191]

> Faster way to display/render entries in HistoryPage (Spark history server 
> summary page)
> ---
>
> Key: SPARK-34123
> URL: https://issues.apache.org/jira/browse/SPARK-34123
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.2.0
>Reporter: Mohanad Elsafty
>Assignee: Mohanad Elsafty
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: Screenshot 2021-01-15 at 1.21.40 PM.png
>
>
> For a long time my team/company has suffered from the history server being 
> very slow to display/search entries, especially once they grow past 50k 
> entries; the page already has pagination, but it is still very slow to 
> display the entries.
>   
> Currently *Mustache Js* is used to render the entries and *datatables* is 
> used to manipulate them (sort by column and search).
>  
> Getting rid of *Mustache* (no longer rendering the entries with it) and 
> letting *datatables* display them proved to be faster.
>  
> Displaying > 100k entries (my case):
> The existing page takes at least 30 to 40 seconds to display the entries, 
> searching takes at least 20 seconds, and the page stops responding until it 
> finishes.
> The improved page takes ~3 seconds to display the entries, searching is very 
> fast, and the page stays responsive.
> *(These numbers will differ for others since the JS is executed in your 
> browser.)*
>  
> I am not sure why *Mustache* is used to display the data since *datatables* 
> can do the job.
> [~ajbozarth] [~sowen] could you please elaborate: what is the reason for 
> using *Mustache*, and what are the drawbacks if it is no longer used to 
> display the entries (only this part)?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34123) Faster way to display/render entries in HistoryPage (Spark history server summary page)

2021-01-17 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-34123:


Assignee: Mohanad Elsafty

> Faster way to display/render entries in HistoryPage (Spark history server 
> summary page)
> ---
>
> Key: SPARK-34123
> URL: https://issues.apache.org/jira/browse/SPARK-34123
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.2.0
>Reporter: Mohanad Elsafty
>Assignee: Mohanad Elsafty
>Priority: Major
> Attachments: Screenshot 2021-01-15 at 1.21.40 PM.png
>
>
> For a long time my team/company has suffered from the history server being 
> very slow to display/search entries, especially once they grow past 50k 
> entries; the page already has pagination, but it is still very slow to 
> display the entries.
>   
> Currently *Mustache Js* is used to render the entries and *datatables* is 
> used to manipulate them (sort by column and search).
>  
> Getting rid of *Mustache* (no longer rendering the entries with it) and 
> letting *datatables* display them proved to be faster.
>  
> Displaying > 100k entries (my case):
> The existing page takes at least 30 to 40 seconds to display the entries, 
> searching takes at least 20 seconds, and the page stops responding until it 
> finishes.
> The improved page takes ~3 seconds to display the entries, searching is very 
> fast, and the page stays responsive.
> *(These numbers will differ for others since the JS is executed in your 
> browser.)*
>  
> I am not sure why *Mustache* is used to display the data since *datatables* 
> can do the job.
> [~ajbozarth] [~sowen] could you please elaborate: what is the reason for 
> using *Mustache*, and what are the drawbacks if it is no longer used to 
> display the entries (only this part)?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34147) Keep data partitioning in TPCDSQueryBenchmark when CBO is enabled

2021-01-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266889#comment-17266889
 ] 

Apache Spark commented on SPARK-34147:
--

User 'peter-toth' has created a pull request for this issue:
https://github.com/apache/spark/pull/31218

> Keep data partitioning in TPCDSQueryBenchmark when CBO is enabled 
> --
>
> Key: SPARK-34147
> URL: https://issues.apache.org/jira/browse/SPARK-34147
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.2.0
>Reporter: Peter Toth
>Priority: Major
>
> {{--cbo}} should not change partitioning.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34147) Keep data partitioning in TPCDSQueryBenchmark when CBO is enabled

2021-01-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34147:


Assignee: (was: Apache Spark)

> Keep data partitioning in TPCDSQueryBenchmark when CBO is enabled 
> --
>
> Key: SPARK-34147
> URL: https://issues.apache.org/jira/browse/SPARK-34147
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.2.0
>Reporter: Peter Toth
>Priority: Major
>
> {{--cbo}} should not change partitioning.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34147) Keep data partitioning in TPCDSQueryBenchmark when CBO is enabled

2021-01-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34147:


Assignee: Apache Spark

> Keep data partitioning in TPCDSQueryBenchmark when CBO is enabled 
> --
>
> Key: SPARK-34147
> URL: https://issues.apache.org/jira/browse/SPARK-34147
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.2.0
>Reporter: Peter Toth
>Assignee: Apache Spark
>Priority: Major
>
> {{--cbo}} should not change partitioning.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34147) Keep data partitioning in TPCDSQueryBenchmark when CBO is enabled

2021-01-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266888#comment-17266888
 ] 

Apache Spark commented on SPARK-34147:
--

User 'peter-toth' has created a pull request for this issue:
https://github.com/apache/spark/pull/31218

> Keep data partitioning in TPCDSQueryBenchmark when CBO is enabled 
> --
>
> Key: SPARK-34147
> URL: https://issues.apache.org/jira/browse/SPARK-34147
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.2.0
>Reporter: Peter Toth
>Priority: Major
>
> {{--cbo}} should not change partitioning.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34147) Keep data partitioning in TPCDSQueryBenchmark when CBO is enabled

2021-01-17 Thread Peter Toth (Jira)
Peter Toth created SPARK-34147:
--

 Summary: Keep data partitioning in TPCDSQueryBenchmark when CBO is 
enabled 
 Key: SPARK-34147
 URL: https://issues.apache.org/jira/browse/SPARK-34147
 Project: Spark
  Issue Type: Improvement
  Components: SQL, Tests
Affects Versions: 3.2.0
Reporter: Peter Toth


{{--cbo}} should not change partitioning.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34146) "*" should not throw Exception in SparkGetSchemasOperation

2021-01-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266848#comment-17266848
 ] 

Apache Spark commented on SPARK-34146:
--

User 'pan3793' has created a pull request for this issue:
https://github.com/apache/spark/pull/31217

> "*" should not throw Exception in SparkGetSchemasOperation
> --
>
> Key: SPARK-34146
> URL: https://issues.apache.org/jira/browse/SPARK-34146
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.1
>Reporter: Cheng Pan
>Priority: Minor
>
> HiveServer2 treats "*" as "list all databases", but Spark will throw an 
> `Exception` when handling the global temp view since "" is not a valid regex.
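
For context, the failure mode is just the JVM regex engine rejecting a bare quantifier; a minimal illustration (not Spark's actual fix) is sketched below:

{code:scala}
// Minimal illustration (not Spark's actual code): a bare "*" is a dangling
// quantifier, so compiling it as a java.util.regex pattern throws.
import java.util.regex.{Pattern, PatternSyntaxException}

try {
  Pattern.compile("*") // "Dangling meta character '*' near index 0"
} catch {
  case e: PatternSyntaxException =>
    println(s"Invalid pattern: ${e.getMessage}")
}

// HiveServer2 clients send "*" as a glob meaning "all databases"; converting
// the glob to a regex first (e.g. '*' -> '.*') compiles fine:
Pattern.compile("*".replace("*", ".*"))
{code}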



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34146) "*" should not throw Exception in SparkGetSchemasOperation

2021-01-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34146:


Assignee: (was: Apache Spark)

> "*" should not throw Exception in SparkGetSchemasOperation
> --
>
> Key: SPARK-34146
> URL: https://issues.apache.org/jira/browse/SPARK-34146
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.1
>Reporter: Cheng Pan
>Priority: Minor
>
> HiveServer2 treats "*" as "list all databases", but Spark will throw an 
> `Exception` when handling the global temp view since "" is not a valid regex.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34146) "*" should not throw Exception in SparkGetSchemasOperation

2021-01-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266847#comment-17266847
 ] 

Apache Spark commented on SPARK-34146:
--

User 'pan3793' has created a pull request for this issue:
https://github.com/apache/spark/pull/31217

> "*" should not throw Exception in SparkGetSchemasOperation
> --
>
> Key: SPARK-34146
> URL: https://issues.apache.org/jira/browse/SPARK-34146
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.1
>Reporter: Cheng Pan
>Priority: Minor
>
> HiveServer2 treats "*" as "list all databases", but Spark will throw an 
> `Exception` when handling the global temp view since "" is not a valid regex.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34146) "*" should not throw Exception in SparkGetSchemasOperation

2021-01-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34146:


Assignee: Apache Spark

> "*" should not throw Exception in SparkGetSchemasOperation
> --
>
> Key: SPARK-34146
> URL: https://issues.apache.org/jira/browse/SPARK-34146
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.1
>Reporter: Cheng Pan
>Assignee: Apache Spark
>Priority: Minor
>
> HiveServer2 treats "*" as "list all databases", but Spark will throw an 
> `Exception` when handling the global temp view since "" is not a valid regex.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34146) "*" should not crash in SparkGetSchemasOperation

2021-01-17 Thread Cheng Pan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Pan updated SPARK-34146:
--
Description: HiveServer2 treat "*" as list all databases, but spark will 
throw `Exception` when handle global temp view since "" is not a valid regex.  
(was: HiveServer2 treat "*" as list all databases, but spark will crashed when 
handle global temp view since "" is not a valid regex.)

> "*" should not crash in SparkGetSchemasOperation
> 
>
> Key: SPARK-34146
> URL: https://issues.apache.org/jira/browse/SPARK-34146
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.1
>Reporter: Cheng Pan
>Priority: Minor
>
> HiveServer2 treats "*" as "list all databases", but Spark will throw an 
> `Exception` when handling the global temp view since "" is not a valid regex.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34146) "*" should not throw Exception in SparkGetSchemasOperation

2021-01-17 Thread Cheng Pan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Pan updated SPARK-34146:
--
Summary: "*" should not throw Exception in SparkGetSchemasOperation  (was: 
"*" should not crash in SparkGetSchemasOperation)

> "*" should not throw Exception in SparkGetSchemasOperation
> --
>
> Key: SPARK-34146
> URL: https://issues.apache.org/jira/browse/SPARK-34146
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.1
>Reporter: Cheng Pan
>Priority: Minor
>
> HiveServer2 treats "*" as "list all databases", but Spark will throw an 
> `Exception` when handling the global temp view since "" is not a valid regex.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34146) "*" should not crash in SparkGetSchemasOperation

2021-01-17 Thread Cheng Pan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Pan updated SPARK-34146:
--
Description: HiveServer2 treat "*" as list all databases, but spark will 
crashed when handle global temp view since "" is not a valid regex.  (was: 
HiveServer2 treat "*" as list all databases, but spark will crashed when handle 
global temp view since "*" is not a valid regex.)

> "*" should not crash in SparkGetSchemasOperation
> 
>
> Key: SPARK-34146
> URL: https://issues.apache.org/jira/browse/SPARK-34146
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.1
>Reporter: Cheng Pan
>Priority: Minor
>
> HiveServer2 treats "*" as "list all databases", but Spark will crash when 
> handling the global temp view since "" is not a valid regex.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34146) "*" should not crash in SparkGetSchemasOperation

2021-01-17 Thread Cheng Pan (Jira)
Cheng Pan created SPARK-34146:
-

 Summary: "*" should not crash in SparkGetSchemasOperation
 Key: SPARK-34146
 URL: https://issues.apache.org/jira/browse/SPARK-34146
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.1, 3.1.1
Reporter: Cheng Pan


HiveServer2 treats "*" as "list all databases", but Spark will crash when 
handling the global temp view since "*" is not a valid regex.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34145) Combine scalar subqueries

2021-01-17 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-34145:
---

 Summary: Combine scalar subqueries
 Key: SPARK-34145
 URL: https://issues.apache.org/jira/browse/SPARK-34145
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: Yuming Wang


We can add a rule to combine scalar subqueries if they are from the same table, 
to improve query performance. For example:
{code:sql}
-- TPC-DS q9.sql
SELECT
  CASE WHEN (SELECT count(*)
  FROM store_sales
  WHERE ss_quantity BETWEEN 1 AND 20) > 62316685
THEN (SELECT avg(ss_ext_discount_amt)
FROM store_sales
WHERE ss_quantity BETWEEN 1 AND 20)
  ELSE (SELECT avg(ss_net_paid)
  FROM store_sales
  WHERE ss_quantity BETWEEN 1 AND 20) END bucket1,
  CASE WHEN (SELECT count(*)
  FROM store_sales
  WHERE ss_quantity BETWEEN 21 AND 40) > 19045798
THEN (SELECT avg(ss_ext_discount_amt)
FROM store_sales
WHERE ss_quantity BETWEEN 21 AND 40)
  ELSE (SELECT avg(ss_net_paid)
  FROM store_sales
  WHERE ss_quantity BETWEEN 21 AND 40) END bucket2,
  CASE WHEN (SELECT count(*)
  FROM store_sales
  WHERE ss_quantity BETWEEN 41 AND 60) > 365541424
THEN (SELECT avg(ss_ext_discount_amt)
FROM store_sales
WHERE ss_quantity BETWEEN 41 AND 60)
  ELSE (SELECT avg(ss_net_paid)
  FROM store_sales
  WHERE ss_quantity BETWEEN 41 AND 60) END bucket3,
  CASE WHEN (SELECT count(*)
  FROM store_sales
  WHERE ss_quantity BETWEEN 61 AND 80) > 216357808
THEN (SELECT avg(ss_ext_discount_amt)
FROM store_sales
WHERE ss_quantity BETWEEN 61 AND 80)
  ELSE (SELECT avg(ss_net_paid)
  FROM store_sales
  WHERE ss_quantity BETWEEN 61 AND 80) END bucket4,
  CASE WHEN (SELECT count(*)
  FROM store_sales
  WHERE ss_quantity BETWEEN 81 AND 100) > 184483884
THEN (SELECT avg(ss_ext_discount_amt)
FROM store_sales
WHERE ss_quantity BETWEEN 81 AND 100)
  ELSE (SELECT avg(ss_net_paid)
  FROM store_sales
  WHERE ss_quantity BETWEEN 81 AND 100) END bucket5
FROM reason
WHERE r_reason_sk = 1
{code}

We can rewrite it to:
{code:sql}
WITH bucket_result AS (
SELECT
  CASE WHEN (count(ss_quantity) FILTER (WHERE ss_quantity BETWEEN 1 AND 20)) > 62316685
    THEN (avg(ss_ext_discount_amt) FILTER (WHERE ss_quantity BETWEEN 1 AND 20))
    ELSE (avg(ss_net_paid) FILTER (WHERE ss_quantity BETWEEN 1 AND 20)) END bucket1,
  CASE WHEN (count(ss_quantity) FILTER (WHERE ss_quantity BETWEEN 21 AND 40)) > 19045798
    THEN (avg(ss_ext_discount_amt) FILTER (WHERE ss_quantity BETWEEN 21 AND 40))
    ELSE (avg(ss_net_paid) FILTER (WHERE ss_quantity BETWEEN 21 AND 40)) END bucket2,
  CASE WHEN (count(ss_quantity) FILTER (WHERE ss_quantity BETWEEN 41 AND 60)) > 365541424
    THEN (avg(ss_ext_discount_amt) FILTER (WHERE ss_quantity BETWEEN 41 AND 60))
    ELSE (avg(ss_net_paid) FILTER (WHERE ss_quantity BETWEEN 41 AND 60)) END bucket3,
  CASE WHEN (count(ss_quantity) FILTER (WHERE ss_quantity BETWEEN 61 AND 80)) > 216357808
    THEN (avg(ss_ext_discount_amt) FILTER (WHERE ss_quantity BETWEEN 61 AND 80))
    ELSE (avg(ss_net_paid) FILTER (WHERE ss_quantity BETWEEN 61 AND 80)) END bucket4,
  CASE WHEN (count(ss_quantity) FILTER (WHERE ss_quantity BETWEEN 81 AND 100)) > 184483884
    THEN (avg(ss_ext_discount_amt) FILTER (WHERE ss_quantity BETWEEN 81 AND 100))
    ELSE (avg(ss_net_paid) FILTER (WHERE ss_quantity BETWEEN 81 AND 100)) END bucket5
  FROM store_sales
)
SELECT
  (SELECT bucket1 FROM bucket_result) as bucket1,
  (SELECT bucket2 FROM bucket_result) as bucket2,
  (SELECT bucket3 FROM bucket_result) as bucket3,
  (SELECT bucket4 FROM bucket_result) as bucket4,
  (SELECT bucket5 FROM bucket_result) as bucket5
FROM reason
WHERE r_reason_sk = 1;

{code}





--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34144) java.time.Instant and java.time.LocalDate not handled when writing to tables

2021-01-17 Thread Cristi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cristi updated SPARK-34144:
---
Summary: java.time.Instant and java.time.LocalDate not handled when writing 
to tables  (was: java.time.Instant and java.time.LocalDate not handled not 
handled when writing to tables)

> java.time.Instant and java.time.LocalDate not handled when writing to tables
> 
>
> Key: SPARK-34144
> URL: https://issues.apache.org/jira/browse/SPARK-34144
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0, 3.1.1
>Reporter: Cristi
>Priority: Major
>
> When using the new Java time API (spark.sql.datetime.java8API.enabled=true), 
> LocalDate and Instant aren't handled in 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils#makeSetter, so 
> Instant and LocalDate are cast to Timestamp and Date when attempting to write 
> values to a table, which fails with a ClassCastException.
> Driver stacktrace: at 
> org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2059)
>  at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2008)
>  at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2007)
>  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) 
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) 
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2007) 
> at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:973)
>  at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:973)
>  at scala.Option.foreach(Option.scala:407) at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:973)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2239)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2188)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2177)
>  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at 
> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:775) at 
> org.apache.spark.SparkContext.runJob(SparkContext.scala:2099) at 
> org.apache.spark.SparkContext.runJob(SparkContext.scala:2120) at 
> org.apache.spark.SparkContext.runJob(SparkContext.scala:2139) at 
> org.apache.spark.SparkContext.runJob(SparkContext.scala:2164) at 
> org.apache.spark.rdd.RDD.$anonfun$foreachPartition$1(RDD.scala:994) at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>  at org.apache.spark.rdd.RDD.withScope(RDD.scala:388) at 
> org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:992) at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.saveTable(JdbcUtils.scala:856)
>  at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:68)
>  at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
>  at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175)
>  at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210) at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171) at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:122)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:121) 
> at 
> org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:963)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
>  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764) at 
> org.apache.spark.sq

[jira] [Updated] (SPARK-34144) java.time.Instant and java.time.LocalDate not handled not handled when writing to tables

2021-01-17 Thread Cristi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cristi updated SPARK-34144:
---
Description: 
When using the new Java time API (spark.sql.datetime.java8API.enabled=true), 
LocalDate and Instant aren't handled in 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils#makeSetter, so Instant 
and LocalDate are cast to Timestamp and Date when attempting to write values to 
a table, which fails with a ClassCastException.

Driver stacktrace: at 
org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2059)
 at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2008)
 at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2007)
 at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at 
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at 
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2007) at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:973)
 at 
org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:973)
 at scala.Option.foreach(Option.scala:407) at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:973)
 at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2239)
 at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2188)
 at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2177)
 at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at 
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:775) at 
org.apache.spark.SparkContext.runJob(SparkContext.scala:2099) at 
org.apache.spark.SparkContext.runJob(SparkContext.scala:2120) at 
org.apache.spark.SparkContext.runJob(SparkContext.scala:2139) at 
org.apache.spark.SparkContext.runJob(SparkContext.scala:2164) at 
org.apache.spark.rdd.RDD.$anonfun$foreachPartition$1(RDD.scala:994) at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) 
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) 
at org.apache.spark.rdd.RDD.withScope(RDD.scala:388) at 
org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:992) at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.saveTable(JdbcUtils.scala:856)
 at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:68)
 at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
 at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175)
 at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) 
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210) 
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171) at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:122)
 at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:121) 
at 
org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:963)
 at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
 at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
 at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
 at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764) at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
 at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:963) 
at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:415) 
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:399)

Caused by: java.lang.ClassCastException: class java.time.LocalDate cannot be 
cast to class java.sql.Date (java.time.LocalDate is in module java.base of 
loader 'bootstrap'; java.sql.Date is in module java.sql of loader 'platform') 
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeSetter$11(JdbcUtils.scala:573)
 at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeSetter$11$adapted(JdbcUtils.scala:572)
 at 
org.ap
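
A minimal reproduction sketch follows; only the spark.sql.datetime.java8API.enabled config comes from this report, while the JDBC URL, table name, and credentials are hypothetical placeholders:

{code:scala}
// Hedged reproduction sketch: with the Java 8 time API enabled, rows carry
// java.time.LocalDate / java.time.Instant, while the JDBC write path
// (JdbcUtils#makeSetter) expects java.sql.Date / java.sql.Timestamp, which is
// where the ClassCastException above comes from.
import java.time.LocalDate
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.sql.datetime.java8API.enabled", "true")
  .getOrCreate()
import spark.implicits._

val df = Seq((1, LocalDate.of(2021, 1, 17))).toDF("id", "day")

df.write
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost/testdb") // hypothetical URL
  .option("dbtable", "dates_table")                     // hypothetical table
  .option("user", "test")
  .option("password", "test")
  .save()                                               // fails per this report
{code}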

[jira] [Commented] (SPARK-34115) Long runtime on many environment variables

2021-01-17 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266764#comment-17266764
 ] 

Hyukjin Kwon commented on SPARK-34115:
--

I think you can try and see if it works when we switch isTesting to a lazy val 
as you proposed. But I have to say theoretically both should have constant 
lookup time which should not affect performance heavily. 

> Long runtime on many environment variables
> --
>
> Key: SPARK-34115
> URL: https://issues.apache.org/jira/browse/SPARK-34115
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.4.0, 2.4.7, 3.0.1
> Environment: Spark 2.4.0 local[2] on a Kubernetes Pod
>Reporter: Norbert Schultz
>Priority: Major
> Attachments: spark-bug-34115.tar.gz
>
>
> I am not sure if this is a bug report or a feature request. The code is 
> the same in current versions of Spark and maybe this ticket saves someone 
> some time when debugging.
> We migrated some older code to Spark 2.4.0, and suddenly the integration 
> tests on our build machine were much slower than expected.
> On local machines it was running perfectly.
> In the end it turned out that Spark was wasting CPU cycles during DataFrame 
> analysis in the following functions:
>  * AnalysisHelper.assertNotAnalysisRule, which calls
>  * Utils.isTesting
> Utils.isTesting traverses all environment variables.
> The offending build machine was a Kubernetes Pod which automatically exposed 
> all services as environment variables, so it had more than 3000 environment 
> variables.
> Utils.isTesting is called very often through 
> AnalysisHelper.assertNotAnalysisRule (via AnalysisHelper.transformDown and 
> transformUp), so the cost adds up.
>  
> Of course we will restrict the number of environment variables; on the other 
> hand, Utils.isTesting could also use a lazy val for
>  
> {code:java}
> sys.env.contains("SPARK_TESTING") {code}
>  
> to not make it that expensive.
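
The proposed change is small; a simplified sketch (not the actual Utils source) is below:

{code:scala}
// Simplified sketch of the proposal (not the actual Utils source).
// scala.sys.env is a def that rebuilds an immutable Map from the whole
// process environment on every call, which is what gets expensive with
// 3000+ environment variables.
object Utils {
  // Current shape (simplified): the environment is re-materialized per call.
  def isTestingEager: Boolean = sys.env.contains("SPARK_TESTING")

  // Proposed shape: the lookup happens once; later calls read a cached Boolean.
  lazy val isTesting: Boolean = sys.env.contains("SPARK_TESTING")
}
{code}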



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34144) java.time.Instant and java.time.LocalDate not handled not handled when writing to tables

2021-01-17 Thread Cristi (Jira)
Cristi created SPARK-34144:
--

 Summary: java.time.Instant and java.time.LocalDate not handled not 
handled when writing to tables
 Key: SPARK-34144
 URL: https://issues.apache.org/jira/browse/SPARK-34144
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.1, 3.1.0, 3.1.1
Reporter: Cristi






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34121) Intersect operator missing rowCount when CBO enabled

2021-01-17 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266763#comment-17266763
 ] 

Hyukjin Kwon commented on SPARK-34121:
--

[~yumwang] mind filling the PR description?

> Intersect operator missing rowCount when CBO enabled
> 
>
> Key: SPARK-34121
> URL: https://issues.apache.org/jira/browse/SPARK-34121
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34126) SQL running error, spark does not exit, resulting in data quality problems

2021-01-17 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266762#comment-17266762
 ] 

Hyukjin Kwon commented on SPARK-34126:
--

[~shikui] can you provide a self-contained reproducible step?

> SQL running error, spark does not exit, resulting in data quality problems
> --
>
> Key: SPARK-34126
> URL: https://issues.apache.org/jira/browse/SPARK-34126
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
> Environment: spark3.0.1 on yarn 
>Reporter: shikui ye
>Priority: Major
>
> Spark SQL executes a SQL file containing multiple SQL segments. When one of 
> the SQL segments fails to run, the Spark driver / SparkContext does not 
> exit, so the table that segment should have written ends up empty or left 
> with old data. Any subsequent SQL that depends on this problematic table 
> then has data quality problems even if it runs successfully.
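
A minimal fail-fast sketch for a driver program that runs such a SQL file is shown below; it illustrates the workaround rather than Spark behaviour, and the statement splitting is naive:

{code:scala}
// Hedged sketch: run each ';'-separated segment and abort the whole job on the
// first failure, so downstream segments never read an empty or stale table.
// The naive split does not handle ';' inside string literals.
import scala.io.Source
import org.apache.spark.sql.SparkSession

object RunSqlFile {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("run-sql-file").getOrCreate()
    val statements = Source.fromFile(args(0)).mkString
      .split(";").map(_.trim).filter(_.nonEmpty)
    try {
      statements.foreach(stmt => spark.sql(stmt))
    } catch {
      case e: Exception =>
        System.err.println(s"SQL segment failed, aborting: ${e.getMessage}")
        spark.stop()
        sys.exit(1) // non-zero exit so the scheduler marks the run as failed
    }
    spark.stop()
  }
}
{code}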



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34131) NPE when driver.podTemplateFile defines no containers

2021-01-17 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266760#comment-17266760
 ] 

Hyukjin Kwon commented on SPARK-34131:
--

cc [~holdenkarau] and [~dongjoon] FYI

> NPE when driver.podTemplateFile defines no containers
> -
>
> Key: SPARK-34131
> URL: https://issues.apache.org/jira/browse/SPARK-34131
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.1
>Reporter: Jacek Laskowski
>Priority: Minor
>
> An empty pod template leads to the following NPE:
> {code}
> 21/01/15 18:44:32 ERROR KubernetesUtils: Encountered exception while 
> attempting to load initial pod spec from file
> java.lang.NullPointerException
>   at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.selectSparkContainer(KubernetesUtils.scala:108)
>   at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.loadPodFromTemplate(KubernetesUtils.scala:88)
>   at 
> org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$1(KubernetesDriverBuilder.scala:36)
>   at scala.Option.map(Option.scala:230)
>   at 
> org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:32)
>   at 
> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:98)
>   at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4(KubernetesClientApplication.scala:221)
>   at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4$adapted(KubernetesClientApplication.scala:215)
>   at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2539)
>   at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:215)
>   at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:188)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}
> {code:java}
> $> cat empty-template.yml
> spec:
> {code}
> {code}
> $> ./bin/run-example \
>   --master k8s://$K8S_SERVER \
>   --deploy-mode cluster \
>   --conf spark.kubernetes.driver.podTemplateFile=empty-template.yml \
>   --name $POD_NAME \
>   --jars local:///opt/spark/examples/jars/spark-examples_2.12-3.0.1.jar \
>   --conf spark.kubernetes.container.image=spark:v3.0.1 \
>   --conf spark.kubernetes.driver.pod.name=$POD_NAME \
>   --conf spark.kubernetes.namespace=spark-demo \
>   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
>   --verbose \
>SparkPi 10
> {code}
> It appears that the implicit requirement is that there's at least one 
> well-defined container of any name (not necessarily 
> {{spark.kubernetes.driver.podTemplateContainerName}}).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33245) Add built-in UDF - GETBIT

2021-01-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-33245.
--
Resolution: Won't Fix

> Add built-in UDF - GETBIT 
> --
>
> Key: SPARK-33245
> URL: https://issues.apache.org/jira/browse/SPARK-33245
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>
> Teradata, Impala, Snowflake and Yellowbrick support this function:
> https://docs.teradata.com/reader/kmuOwjp1zEYg98JsB8fu_A/PK1oV1b2jqvG~ohRnOro9w
> https://docs.cloudera.com/runtime/7.2.0/impala-sql-reference/topics/impala-bit-functions.html#bit_functions__getbit
> https://docs.snowflake.com/en/sql-reference/functions/getbit.html
> https://www.yellowbrick.com/docs/2.2/ybd_sqlref/getbit.html
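
Since this was resolved as Won't Fix, for reference the same result can be obtained with existing built-ins; a minimal sketch (the table and column names are made up):

{code:scala}
// Hedged sketch: emulate GETBIT(x, n) -- the n-th bit of x, counting from the
// least significant bit -- with the existing shiftright function and bitwise &.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(11L).toDF("x") // 11 = 0b1011

df.selectExpr(
  "x",
  "shiftright(x, 0) & 1 AS bit0", // 1
  "shiftright(x, 2) & 1 AS bit2"  // 0
).show()
{code}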



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34143) Adding partitions to fully partitioned v2 table

2021-01-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266742#comment-17266742
 ] 

Apache Spark commented on SPARK-34143:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/31216

> Adding partitions to fully partitioned v2 table
> ---
>
> Key: SPARK-34143
> URL: https://issues.apache.org/jira/browse/SPARK-34143
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The test below fails:
> {code:scala}
> withNamespaceAndTable("ns", "tbl") { t =>
>   sql(s"CREATE TABLE $t (p0 INT, p1 STRING) $defaultUsing PARTITIONED BY 
> (p0, p1)")
>   sql(s"ALTER TABLE $t ADD PARTITION (p0 = 0, p1 = 'abc')")
>   checkPartitions(t, Map("p0" -> "0", "p1" -> "abc"))
>   checkAnswer(sql(s"SELECT * FROM $t"), Row(0, "abc"))
> }
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34143) Adding partitions to fully partitioned v2 table

2021-01-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266741#comment-17266741
 ] 

Apache Spark commented on SPARK-34143:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/31216

> Adding partitions to fully partitioned v2 table
> ---
>
> Key: SPARK-34143
> URL: https://issues.apache.org/jira/browse/SPARK-34143
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The test below fails:
> {code:scala}
> withNamespaceAndTable("ns", "tbl") { t =>
>   sql(s"CREATE TABLE $t (p0 INT, p1 STRING) $defaultUsing PARTITIONED BY 
> (p0, p1)")
>   sql(s"ALTER TABLE $t ADD PARTITION (p0 = 0, p1 = 'abc')")
>   checkPartitions(t, Map("p0" -> "0", "p1" -> "abc"))
>   checkAnswer(sql(s"SELECT * FROM $t"), Row(0, "abc"))
> }
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34143) Adding partitions to fully partitioned v2 table

2021-01-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34143:


Assignee: (was: Apache Spark)

> Adding partitions to fully partitioned v2 table
> ---
>
> Key: SPARK-34143
> URL: https://issues.apache.org/jira/browse/SPARK-34143
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The test below fails:
> {code:scala}
> withNamespaceAndTable("ns", "tbl") { t =>
>   sql(s"CREATE TABLE $t (p0 INT, p1 STRING) $defaultUsing PARTITIONED BY 
> (p0, p1)")
>   sql(s"ALTER TABLE $t ADD PARTITION (p0 = 0, p1 = 'abc')")
>   checkPartitions(t, Map("p0" -> "0", "p1" -> "abc"))
>   checkAnswer(sql(s"SELECT * FROM $t"), Row(0, "abc"))
> }
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34143) Adding partitions to fully partitioned v2 table

2021-01-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34143:


Assignee: Apache Spark

> Adding partitions to fully partitioned v2 table
> ---
>
> Key: SPARK-34143
> URL: https://issues.apache.org/jira/browse/SPARK-34143
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
>
> The test below fails:
> {code:scala}
> withNamespaceAndTable("ns", "tbl") { t =>
>   sql(s"CREATE TABLE $t (p0 INT, p1 STRING) $defaultUsing PARTITIONED BY 
> (p0, p1)")
>   sql(s"ALTER TABLE $t ADD PARTITION (p0 = 0, p1 = 'abc')")
>   checkPartitions(t, Map("p0" -> "0", "p1" -> "abc"))
>   checkAnswer(sql(s"SELECT * FROM $t"), Row(0, "abc"))
> }
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34143) Adding partitions to fully partitioned v2 table

2021-01-17 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-34143:
--

 Summary: Adding partitions to fully partitioned v2 table
 Key: SPARK-34143
 URL: https://issues.apache.org/jira/browse/SPARK-34143
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


The test below fails:
{code:scala}
withNamespaceAndTable("ns", "tbl") { t =>
  sql(s"CREATE TABLE $t (p0 INT, p1 STRING) $defaultUsing PARTITIONED BY 
(p0, p1)")
  sql(s"ALTER TABLE $t ADD PARTITION (p0 = 0, p1 = 'abc')")
  checkPartitions(t, Map("p0" -> "0", "p1" -> "abc"))
  checkAnswer(sql(s"SELECT * FROM $t"), Row(0, "abc"))
}
{code}

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34111) Deconflict the jars jakarta.servlet-api-4.0.3.jar and javax.servlet-api-3.1.0.jar

2021-01-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-34111:


Assignee: Kent Yao

> Deconflict the jars jakarta.servlet-api-4.0.3.jar and 
> javax.servlet-api-3.1.0.jar
> -
>
> Key: SPARK-34111
> URL: https://issues.apache.org/jira/browse/SPARK-34111
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Kent Yao
>Priority: Critical
>
> After SPARK-33705, we now happen to have two jars in the release artifact 
> with Hadoop 3:
> {{dev/deps/spark-deps-hadoop-3.2-hive-2.3}}:
> {code}
> ...
> jakarta.servlet-api/4.0.3//jakarta.servlet-api-4.0.3.jar
> ...
> javax.servlet-api/3.1.0//javax.servlet-api-3.1.0.jar
> ...
> {code}
> It can potentially cause an issue, and we had better remove 
> {{javax.servlet-api-3.1.0.jar}}, which is apparently only required for YARN 
> tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34111) Deconflict the jars jakarta.servlet-api-4.0.3.jar and javax.servlet-api-3.1.0.jar

2021-01-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-34111:
-
Priority: Critical  (was: Blocker)

> Deconflict the jars jakarta.servlet-api-4.0.3.jar and 
> javax.servlet-api-3.1.0.jar
> -
>
> Key: SPARK-34111
> URL: https://issues.apache.org/jira/browse/SPARK-34111
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Critical
>
> After SPARK-33705, we now happen to have two jars in the release artifact 
> with Hadoop 3:
> {{dev/deps/spark-deps-hadoop-3.2-hive-2.3}}:
> {code}
> ...
> jakarta.servlet-api/4.0.3//jakarta.servlet-api-4.0.3.jar
> ...
> javax.servlet-api/3.1.0//javax.servlet-api-3.1.0.jar
> ...
> {code}
> It can potentially cause an issue, and we had better remove 
> {{javax.servlet-api-3.1.0.jar}}, which is apparently only required for YARN 
> tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org