[jira] [Resolved] (SPARK-40992) Support toDF(columnNames) in Connect DSL

2022-11-08 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-40992.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38475
[https://github.com/apache/spark/pull/38475]

> Support toDF(columnNames) in Connect DSL
> 
>
> Key: SPARK-40992
> URL: https://issues.apache.org/jira/browse/SPARK-40992
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: 3.4.0
>
>
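
For context: `toDF(columnNames)` in the Connect DSL mirrors the existing `Dataset.toDF` renaming API. A minimal sketch of that behavior on the regular Scala API (the data and column names below are made-up examples, not from this ticket):

{code:scala}
// Sketch of the Dataset.toDF(columnNames) behavior the Connect DSL mirrors;
// "width"/"height" and the sample rows are illustrative only.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq((1, 2), (3, 4)).toDF("a", "b")
val renamed = df.toDF("width", "height")  // renames all columns positionally
renamed.printSchema()
{code}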







[jira] [Assigned] (SPARK-40992) Support toDF(columnNames) in Connect DSL

2022-11-08 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-40992:
---

Assignee: Rui Wang

> Support toDF(columnNames) in Connect DSL
> 
>
> Key: SPARK-40992
> URL: https://issues.apache.org/jira/browse/SPARK-40992
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-41056) Fix new R_LIBS_SITE behavior introduced in R 4.2

2022-11-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41056:


Assignee: Hyukjin Kwon

> Fix new R_LIBS_SITE behavior introduced in R 4.2
> 
>
> Key: SPARK-41056
> URL: https://issues.apache.org/jira/browse/SPARK-41056
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> R 4.2
> {code}
> # R
> > Sys.getenv("R_LIBS_SITE")
> [1] 
> "/usr/local/lib/R/site-library/:/usr/lib/R/site-library:/usr/lib/R/library'"
> {code}
> {code}
> # R --vanilla
> > Sys.getenv("R_LIBS_SITE")
> [1] "/usr/lib/R/site-library"
> {code}
> R 4.1
> {code}
> # R
> > Sys.getenv("R_LIBS_SITE")
> [1] "/usr/local/lib/R/site-library:/usr/lib/R/site-library:/usr/lib/R/library"
> {code}
> {code}
> # R --vanilla
> > Sys.getenv("R_LIBS_SITE")
> [1] "/usr/local/lib/R/site-library:/usr/lib/R/site-library:/usr/lib/R/library"
> {code}






[jira] [Resolved] (SPARK-41056) Fix new R_LIBS_SITE behavior introduced in R 4.2

2022-11-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41056.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38570
[https://github.com/apache/spark/pull/38570]

> Fix new R_LIBS_SITE behavior introduced in R 4.2
> 
>
> Key: SPARK-41056
> URL: https://issues.apache.org/jira/browse/SPARK-41056
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>
> R 4.2
> {code}
> # R
> > Sys.getenv("R_LIBS_SITE")
> [1] 
> "/usr/local/lib/R/site-library/:/usr/lib/R/site-library:/usr/lib/R/library'"
> {code}
> {code}
> # R --vanilla
> > Sys.getenv("R_LIBS_SITE")
> [1] "/usr/lib/R/site-library"
> {code}
> R 4.1
> {code}
> # R
> > Sys.getenv("R_LIBS_SITE")
> [1] "/usr/local/lib/R/site-library:/usr/lib/R/site-library:/usr/lib/R/library"
> {code}
> {code}
> # R --vanilla
> > Sys.getenv("R_LIBS_SITE")
> [1] "/usr/local/lib/R/site-library:/usr/lib/R/site-library:/usr/lib/R/library"
> {code}






[jira] [Created] (SPARK-41073) Spark ThriftServer generates huge amounts of DelegationTokens

2022-11-08 Thread zhengchenyu (Jira)
zhengchenyu created SPARK-41073:
---

 Summary: Spark ThriftServer generates huge amounts of DelegationTokens
 Key: SPARK-41073
 URL: https://issues.apache.org/jira/browse/SPARK-41073
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.1
Reporter: zhengchenyu









[jira] [Assigned] (SPARK-41058) Removing unused code in connect

2022-11-08 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-41058:
---

Assignee: Deng Ziming

> Removing unused code in connect
> ---
>
> Key: SPARK-41058
> URL: https://issues.apache.org/jira/browse/SPARK-41058
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Deng Ziming
>Assignee: Deng Ziming
>Priority: Minor
> Fix For: 3.4.0
>
>
> There is some unused code in the `connect` module, for example an unused import in 
> commands.proto and unused code in SparkConnectStreamHandler.scala.






[jira] [Resolved] (SPARK-41058) Removing unused code in connect

2022-11-08 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-41058.
-
Resolution: Fixed

Issue resolved by pull request 38491
[https://github.com/apache/spark/pull/38491]

> Removing unused code in connect
> ---
>
> Key: SPARK-41058
> URL: https://issues.apache.org/jira/browse/SPARK-41058
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Deng Ziming
>Assignee: Deng Ziming
>Priority: Minor
> Fix For: 3.4.0
>
>
> There is some unused code in the `connect` module, for example an unused import in 
> commands.proto and unused code in SparkConnectStreamHandler.scala.






[jira] [Commented] (SPARK-41064) Implement `DataFrame.crosstab` and `DataFrame.stat.crosstab`

2022-11-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630780#comment-17630780
 ] 

Apache Spark commented on SPARK-41064:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/38578

> Implement `DataFrame.crosstab` and `DataFrame.stat.crosstab`
> 
>
> Key: SPARK-41064
> URL: https://issues.apache.org/jira/browse/SPARK-41064
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>
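
For reference, the Connect/PySpark work mirrors the existing Scala `DataFrameStatFunctions.crosstab`. A minimal sketch of that existing API (column names and data are made-up examples):

{code:scala}
// Sketch of DataFrame.stat.crosstab on the existing Scala API; the columns
// "gender" and "dept" are illustrative only.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(("m", "eng"), ("f", "eng"), ("f", "hr")).toDF("gender", "dept")
// Produces one row per distinct "gender" and one column per distinct "dept",
// with cells holding pair frequencies.
df.stat.crosstab("gender", "dept").show()
{code}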







[jira] [Assigned] (SPARK-41064) Implement `DataFrame.crosstab` and `DataFrame.stat.crosstab`

2022-11-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41064:


Assignee: Ruifeng Zheng  (was: Apache Spark)

> Implement `DataFrame.crosstab` and `DataFrame.stat.crosstab`
> 
>
> Key: SPARK-41064
> URL: https://issues.apache.org/jira/browse/SPARK-41064
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>







[jira] [Commented] (SPARK-41064) Implement `DataFrame.crosstab` and `DataFrame.stat.crosstab`

2022-11-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630779#comment-17630779
 ] 

Apache Spark commented on SPARK-41064:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/38578

> Implement `DataFrame.crosstab` and `DataFrame.stat.crosstab`
> 
>
> Key: SPARK-41064
> URL: https://issues.apache.org/jira/browse/SPARK-41064
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>







[jira] [Assigned] (SPARK-41064) Implement `DataFrame.crosstab` and `DataFrame.stat.crosstab`

2022-11-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41064:


Assignee: Apache Spark  (was: Ruifeng Zheng)

> Implement `DataFrame.crosstab` and `DataFrame.stat.crosstab`
> 
>
> Key: SPARK-41064
> URL: https://issues.apache.org/jira/browse/SPARK-41064
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Created] (SPARK-41072) Convert the internal error about failed stream to user-facing error

2022-11-08 Thread Max Gekk (Jira)
Max Gekk created SPARK-41072:


 Summary: Convert the internal error about failed stream to 
user-facing error
 Key: SPARK-41072
 URL: https://issues.apache.org/jira/browse/SPARK-41072
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Max Gekk


Assign an error class to the following internal error, since it is a 
user-facing error:

{code}
java.lang.Exception: org.apache.spark.sql.streaming.StreamingQueryException:
Query cloudtrail_pipeline [id = 5a3758c3-3b3a-47ff-843a-23292cde3b4f,
runId = c1a90694-daa2-4929-b749-82b8a43fa2b1] terminated with exception:
[INTERNAL_ERROR] Execution of the stream cloudtrail_pipeline failed. Please,
fill a bug report in, and provide the full stack trace.
  at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:403)
  at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.$anonfun$run$4(StreamExecution.scala:269)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at com.databricks.unity.EmptyHandle$.runWithAndClose(UCSHandle.scala:42)
  at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:269)
Caused by: java.lang.Exception: org.apache.spark.SparkException:
[INTERNAL_ERROR] Execution of the stream cloudtrail_pipeline failed. Please,
fill a bug report in, and provide the full stack trace.
  at ...
{code}
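
A rough sketch of the intended direction, not the actual patch (the error-class name STREAM_FAILED and the exception shape are assumptions made here for illustration):

{code:scala}
// Hypothetical sketch: surface a dedicated, user-facing error class instead
// of [INTERNAL_ERROR] when a stream fails. All names are illustrative only.
case class StreamFailedException(streamName: String, cause: Throwable)
  extends Exception(
    s"[STREAM_FAILED] Execution of the stream $streamName failed.", cause)

def runStream(name: String)(body: => Unit): Unit =
  try body
  catch {
    // Previously this bubbled up as an internal error telling users to file
    // a bug report; a named error class makes the failure actionable.
    case e: Exception => throw StreamFailedException(name, e)
  }
{code}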







[jira] [Commented] (SPARK-41071) Metaspace OOM when Local run dev/make-distribution.sh

2022-11-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630772#comment-17630772
 ] 

Apache Spark commented on SPARK-41071:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/38577

> Metaspace OOM when Local run dev/make-distribution.sh 
> --
>
> Key: SPARK-41071
> URL: https://issues.apache.org/jira/browse/SPARK-41071
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> Run
> {code:java}
> dev/make-distribution.sh --tgz -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn 
> -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive  
> {code}
> {code:java}
> [ERROR] ## Exception when compiling 19 sources to 
> /Users/yangjie01/SourceCode/git/spark-mine-12/connector/avro/target/scala-2.12/classes
> java.lang.OutOfMemoryError: Metaspace
> java.lang.ClassLoader.defineClass1(Native Method)
> java.lang.ClassLoader.defineClass(ClassLoader.java:757)
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> java.net.URLClassLoader.defineClass(URLClassLoader.java:473)
> java.net.URLClassLoader.access$100(URLClassLoader.java:74)
> java.net.URLClassLoader$1.run(URLClassLoader.java:369)
> java.net.URLClassLoader$1.run(URLClassLoader.java:363)
> java.security.AccessController.doPrivileged(Native Method)
> java.net.URLClassLoader.findClass(URLClassLoader.java:362)
> java.lang.ClassLoader.loadClass(ClassLoader.java:419)
> scala_maven.ScalaCompilerLoader.loadClass(ScalaCompilerLoader.java:44)
> java.lang.ClassLoader.loadClass(ClassLoader.java:352)
> scala.collection.immutable.Set$Set2.$plus(Set.scala:170)
> scala.collection.immutable.Set$Set2.$plus(Set.scala:164)
> scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:28)
> scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:24)
> scala.collection.TraversableLike$grouper$1$.apply(TraversableLike.scala:467)
> scala.collection.TraversableLike$grouper$1$.apply(TraversableLike.scala:455)
> scala.collection.mutable.HashMap$$anon$1.$anonfun$foreach$2(HashMap.scala:153)
> scala.collection.mutable.HashMap$$anon$1$$Lambda$68804/294594059.apply(Unknown
>  Source)
> scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
> scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
> scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
> scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:153)
> scala.collection.TraversableLike.groupBy(TraversableLike.scala:524)
> scala.collection.TraversableLike.groupBy$(TraversableLike.scala:454)
> scala.collection.AbstractTraversable.groupBy(Traversable.scala:108)
> scala.tools.nsc.settings.Warnings.$init$(Warnings.scala:91)
> scala.tools.nsc.settings.MutableSettings.<init>(MutableSettings.scala:28)
> scala.tools.nsc.Settings.<init>(Settings.scala:19)
> scala.tools.nsc.Settings.<init>(Settings.scala:20)
> xsbt.CachedCompiler0.<init>(CompilerBridge.scala:79) {code}






[jira] [Assigned] (SPARK-41071) Metaspace OOM when Local run dev/make-distribution.sh

2022-11-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41071:


Assignee: (was: Apache Spark)

> Metaspace OOM when Local run dev/make-distribution.sh 
> --
>
> Key: SPARK-41071
> URL: https://issues.apache.org/jira/browse/SPARK-41071
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> Run
> {code:java}
> dev/make-distribution.sh --tgz -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn 
> -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive  
> {code}
> {code:java}
> [ERROR] ## Exception when compiling 19 sources to 
> /Users/yangjie01/SourceCode/git/spark-mine-12/connector/avro/target/scala-2.12/classes
> java.lang.OutOfMemoryError: Metaspace
> java.lang.ClassLoader.defineClass1(Native Method)
> java.lang.ClassLoader.defineClass(ClassLoader.java:757)
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> java.net.URLClassLoader.defineClass(URLClassLoader.java:473)
> java.net.URLClassLoader.access$100(URLClassLoader.java:74)
> java.net.URLClassLoader$1.run(URLClassLoader.java:369)
> java.net.URLClassLoader$1.run(URLClassLoader.java:363)
> java.security.AccessController.doPrivileged(Native Method)
> java.net.URLClassLoader.findClass(URLClassLoader.java:362)
> java.lang.ClassLoader.loadClass(ClassLoader.java:419)
> scala_maven.ScalaCompilerLoader.loadClass(ScalaCompilerLoader.java:44)
> java.lang.ClassLoader.loadClass(ClassLoader.java:352)
> scala.collection.immutable.Set$Set2.$plus(Set.scala:170)
> scala.collection.immutable.Set$Set2.$plus(Set.scala:164)
> scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:28)
> scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:24)
> scala.collection.TraversableLike$grouper$1$.apply(TraversableLike.scala:467)
> scala.collection.TraversableLike$grouper$1$.apply(TraversableLike.scala:455)
> scala.collection.mutable.HashMap$$anon$1.$anonfun$foreach$2(HashMap.scala:153)
> scala.collection.mutable.HashMap$$anon$1$$Lambda$68804/294594059.apply(Unknown
>  Source)
> scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
> scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
> scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
> scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:153)
> scala.collection.TraversableLike.groupBy(TraversableLike.scala:524)
> scala.collection.TraversableLike.groupBy$(TraversableLike.scala:454)
> scala.collection.AbstractTraversable.groupBy(Traversable.scala:108)
> scala.tools.nsc.settings.Warnings.$init$(Warnings.scala:91)
> scala.tools.nsc.settings.MutableSettings.<init>(MutableSettings.scala:28)
> scala.tools.nsc.Settings.<init>(Settings.scala:19)
> scala.tools.nsc.Settings.<init>(Settings.scala:20)
> xsbt.CachedCompiler0.<init>(CompilerBridge.scala:79) {code}






[jira] [Assigned] (SPARK-41071) Metaspace OOM when Local run dev/make-distribution.sh

2022-11-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41071:


Assignee: Apache Spark

> Metaspace OOM when Local run dev/make-distribution.sh 
> --
>
> Key: SPARK-41071
> URL: https://issues.apache.org/jira/browse/SPARK-41071
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>
> Run
> {code:java}
> dev/make-distribution.sh --tgz -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn 
> -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive  
> {code}
> {code:java}
> [ERROR] ## Exception when compiling 19 sources to 
> /Users/yangjie01/SourceCode/git/spark-mine-12/connector/avro/target/scala-2.12/classes
> java.lang.OutOfMemoryError: Metaspace
> java.lang.ClassLoader.defineClass1(Native Method)
> java.lang.ClassLoader.defineClass(ClassLoader.java:757)
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> java.net.URLClassLoader.defineClass(URLClassLoader.java:473)
> java.net.URLClassLoader.access$100(URLClassLoader.java:74)
> java.net.URLClassLoader$1.run(URLClassLoader.java:369)
> java.net.URLClassLoader$1.run(URLClassLoader.java:363)
> java.security.AccessController.doPrivileged(Native Method)
> java.net.URLClassLoader.findClass(URLClassLoader.java:362)
> java.lang.ClassLoader.loadClass(ClassLoader.java:419)
> scala_maven.ScalaCompilerLoader.loadClass(ScalaCompilerLoader.java:44)
> java.lang.ClassLoader.loadClass(ClassLoader.java:352)
> scala.collection.immutable.Set$Set2.$plus(Set.scala:170)
> scala.collection.immutable.Set$Set2.$plus(Set.scala:164)
> scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:28)
> scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:24)
> scala.collection.TraversableLike$grouper$1$.apply(TraversableLike.scala:467)
> scala.collection.TraversableLike$grouper$1$.apply(TraversableLike.scala:455)
> scala.collection.mutable.HashMap$$anon$1.$anonfun$foreach$2(HashMap.scala:153)
> scala.collection.mutable.HashMap$$anon$1$$Lambda$68804/294594059.apply(Unknown
>  Source)
> scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
> scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
> scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
> scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:153)
> scala.collection.TraversableLike.groupBy(TraversableLike.scala:524)
> scala.collection.TraversableLike.groupBy$(TraversableLike.scala:454)
> scala.collection.AbstractTraversable.groupBy(Traversable.scala:108)
> scala.tools.nsc.settings.Warnings.$init$(Warnings.scala:91)
> scala.tools.nsc.settings.MutableSettings.<init>(MutableSettings.scala:28)
> scala.tools.nsc.Settings.<init>(Settings.scala:19)
> scala.tools.nsc.Settings.<init>(Settings.scala:20)
> xsbt.CachedCompiler0.<init>(CompilerBridge.scala:79) {code}






[jira] [Commented] (SPARK-41071) Metaspace OOM when Local run dev/make-distribution.sh

2022-11-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630768#comment-17630768
 ] 

Apache Spark commented on SPARK-41071:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/38577

> Metaspace OOM when Local run dev/make-distribution.sh 
> --
>
> Key: SPARK-41071
> URL: https://issues.apache.org/jira/browse/SPARK-41071
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> Run
> {code:java}
> dev/make-distribution.sh --tgz -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn 
> -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive  
> {code}
> {code:java}
> [ERROR] ## Exception when compiling 19 sources to 
> /Users/yangjie01/SourceCode/git/spark-mine-12/connector/avro/target/scala-2.12/classes
> java.lang.OutOfMemoryError: Metaspace
> java.lang.ClassLoader.defineClass1(Native Method)
> java.lang.ClassLoader.defineClass(ClassLoader.java:757)
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> java.net.URLClassLoader.defineClass(URLClassLoader.java:473)
> java.net.URLClassLoader.access$100(URLClassLoader.java:74)
> java.net.URLClassLoader$1.run(URLClassLoader.java:369)
> java.net.URLClassLoader$1.run(URLClassLoader.java:363)
> java.security.AccessController.doPrivileged(Native Method)
> java.net.URLClassLoader.findClass(URLClassLoader.java:362)
> java.lang.ClassLoader.loadClass(ClassLoader.java:419)
> scala_maven.ScalaCompilerLoader.loadClass(ScalaCompilerLoader.java:44)
> java.lang.ClassLoader.loadClass(ClassLoader.java:352)
> scala.collection.immutable.Set$Set2.$plus(Set.scala:170)
> scala.collection.immutable.Set$Set2.$plus(Set.scala:164)
> scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:28)
> scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:24)
> scala.collection.TraversableLike$grouper$1$.apply(TraversableLike.scala:467)
> scala.collection.TraversableLike$grouper$1$.apply(TraversableLike.scala:455)
> scala.collection.mutable.HashMap$$anon$1.$anonfun$foreach$2(HashMap.scala:153)
> scala.collection.mutable.HashMap$$anon$1$$Lambda$68804/294594059.apply(Unknown
>  Source)
> scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
> scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
> scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
> scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:153)
> scala.collection.TraversableLike.groupBy(TraversableLike.scala:524)
> scala.collection.TraversableLike.groupBy$(TraversableLike.scala:454)
> scala.collection.AbstractTraversable.groupBy(Traversable.scala:108)
> scala.tools.nsc.settings.Warnings.$init$(Warnings.scala:91)
> scala.tools.nsc.settings.MutableSettings.<init>(MutableSettings.scala:28)
> scala.tools.nsc.Settings.<init>(Settings.scala:19)
> scala.tools.nsc.Settings.<init>(Settings.scala:20)
> xsbt.CachedCompiler0.<init>(CompilerBridge.scala:79) {code}






[jira] [Commented] (SPARK-41071) Metaspace OOM when Local run dev/make-distribution.sh

2022-11-08 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630761#comment-17630761
 ] 

Yang Jie commented on SPARK-41071:
--

cc [~hyukjin.kwon]  [~yumwang] 

Running
{code:java}
dev/make-distribution.sh --tgz -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn 
-Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive   
{code}
on the master branch always fails.

 

also cc [~pancheng] 

 

> Metaspace OOM when Local run dev/make-distribution.sh 
> --
>
> Key: SPARK-41071
> URL: https://issues.apache.org/jira/browse/SPARK-41071
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> Run
> {code:java}
> dev/make-distribution.sh --tgz -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn 
> -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive  
> {code}
> {code:java}
> [ERROR] ## Exception when compiling 19 sources to 
> /Users/yangjie01/SourceCode/git/spark-mine-12/connector/avro/target/scala-2.12/classes
> java.lang.OutOfMemoryError: Metaspace
> java.lang.ClassLoader.defineClass1(Native Method)
> java.lang.ClassLoader.defineClass(ClassLoader.java:757)
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> java.net.URLClassLoader.defineClass(URLClassLoader.java:473)
> java.net.URLClassLoader.access$100(URLClassLoader.java:74)
> java.net.URLClassLoader$1.run(URLClassLoader.java:369)
> java.net.URLClassLoader$1.run(URLClassLoader.java:363)
> java.security.AccessController.doPrivileged(Native Method)
> java.net.URLClassLoader.findClass(URLClassLoader.java:362)
> java.lang.ClassLoader.loadClass(ClassLoader.java:419)
> scala_maven.ScalaCompilerLoader.loadClass(ScalaCompilerLoader.java:44)
> java.lang.ClassLoader.loadClass(ClassLoader.java:352)
> scala.collection.immutable.Set$Set2.$plus(Set.scala:170)
> scala.collection.immutable.Set$Set2.$plus(Set.scala:164)
> scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:28)
> scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:24)
> scala.collection.TraversableLike$grouper$1$.apply(TraversableLike.scala:467)
> scala.collection.TraversableLike$grouper$1$.apply(TraversableLike.scala:455)
> scala.collection.mutable.HashMap$$anon$1.$anonfun$foreach$2(HashMap.scala:153)
> scala.collection.mutable.HashMap$$anon$1$$Lambda$68804/294594059.apply(Unknown
>  Source)
> scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
> scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
> scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
> scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:153)
> scala.collection.TraversableLike.groupBy(TraversableLike.scala:524)
> scala.collection.TraversableLike.groupBy$(TraversableLike.scala:454)
> scala.collection.AbstractTraversable.groupBy(Traversable.scala:108)
> scala.tools.nsc.settings.Warnings.$init$(Warnings.scala:91)
> scala.tools.nsc.settings.MutableSettings.<init>(MutableSettings.scala:28)
> scala.tools.nsc.Settings.<init>(Settings.scala:19)
> scala.tools.nsc.Settings.<init>(Settings.scala:20)
> xsbt.CachedCompiler0.<init>(CompilerBridge.scala:79) {code}






[jira] [Commented] (SPARK-41062) Rename UNSUPPORTED_CORRELATED_REFERENCE to CORRELATED_REFERENCE

2022-11-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630760#comment-17630760
 ] 

Apache Spark commented on SPARK-41062:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/38576

> Rename UNSUPPORTED_CORRELATED_REFERENCE to CORRELATED_REFERENCE
> ---
>
> Key: SPARK-41062
> URL: https://issues.apache.org/jira/browse/SPARK-41062
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should use a clear and brief name for every error class.
> This sub-error class duplicates the main class.






[jira] [Assigned] (SPARK-41062) Rename UNSUPPORTED_CORRELATED_REFERENCE to CORRELATED_REFERENCE

2022-11-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41062:


Assignee: (was: Apache Spark)

> Rename UNSUPPORTED_CORRELATED_REFERENCE to CORRELATED_REFERENCE
> ---
>
> Key: SPARK-41062
> URL: https://issues.apache.org/jira/browse/SPARK-41062
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should use a clear and brief name for every error class.
> This sub-error class duplicates the main class.






[jira] [Assigned] (SPARK-41062) Rename UNSUPPORTED_CORRELATED_REFERENCE to CORRELATED_REFERENCE

2022-11-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41062:


Assignee: Apache Spark

> Rename UNSUPPORTED_CORRELATED_REFERENCE to CORRELATED_REFERENCE
> ---
>
> Key: SPARK-41062
> URL: https://issues.apache.org/jira/browse/SPARK-41062
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>
> We should use a clear and brief name for every error class.
> This sub-error class duplicates the main class.






[jira] [Updated] (SPARK-41071) Metaspace OOM when Local run dev/make-distribution.sh

2022-11-08 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-41071:
-
Summary: Metaspace OOM when Local run dev/make-distribution.sh   (was: 
Metaspace OOm when Local run dev/make-distribution.sh )

> Metaspace OOM when Local run dev/make-distribution.sh 
> --
>
> Key: SPARK-41071
> URL: https://issues.apache.org/jira/browse/SPARK-41071
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> Run
> {code:java}
> dev/make-distribution.sh --tgz -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn 
> -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive  
> {code}
> {code:java}
> [ERROR] ## Exception when compiling 19 sources to 
> /Users/yangjie01/SourceCode/git/spark-mine-12/connector/avro/target/scala-2.12/classes
> java.lang.OutOfMemoryError: Metaspace
> java.lang.ClassLoader.defineClass1(Native Method)
> java.lang.ClassLoader.defineClass(ClassLoader.java:757)
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> java.net.URLClassLoader.defineClass(URLClassLoader.java:473)
> java.net.URLClassLoader.access$100(URLClassLoader.java:74)
> java.net.URLClassLoader$1.run(URLClassLoader.java:369)
> java.net.URLClassLoader$1.run(URLClassLoader.java:363)
> java.security.AccessController.doPrivileged(Native Method)
> java.net.URLClassLoader.findClass(URLClassLoader.java:362)
> java.lang.ClassLoader.loadClass(ClassLoader.java:419)
> scala_maven.ScalaCompilerLoader.loadClass(ScalaCompilerLoader.java:44)
> java.lang.ClassLoader.loadClass(ClassLoader.java:352)
> scala.collection.immutable.Set$Set2.$plus(Set.scala:170)
> scala.collection.immutable.Set$Set2.$plus(Set.scala:164)
> scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:28)
> scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:24)
> scala.collection.TraversableLike$grouper$1$.apply(TraversableLike.scala:467)
> scala.collection.TraversableLike$grouper$1$.apply(TraversableLike.scala:455)
> scala.collection.mutable.HashMap$$anon$1.$anonfun$foreach$2(HashMap.scala:153)
> scala.collection.mutable.HashMap$$anon$1$$Lambda$68804/294594059.apply(Unknown
>  Source)
> scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
> scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
> scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
> scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:153)
> scala.collection.TraversableLike.groupBy(TraversableLike.scala:524)
> scala.collection.TraversableLike.groupBy$(TraversableLike.scala:454)
> scala.collection.AbstractTraversable.groupBy(Traversable.scala:108)
> scala.tools.nsc.settings.Warnings.$init$(Warnings.scala:91)
> scala.tools.nsc.settings.MutableSettings.<init>(MutableSettings.scala:28)
> scala.tools.nsc.Settings.<init>(Settings.scala:19)
> scala.tools.nsc.Settings.<init>(Settings.scala:20)
> xsbt.CachedCompiler0.<init>(CompilerBridge.scala:79) {code}






[jira] [Created] (SPARK-41071) Metaspace OOm when Local run dev/make-distribution.sh

2022-11-08 Thread Yang Jie (Jira)
Yang Jie created SPARK-41071:


 Summary: Metaspace OOm when Local run dev/make-distribution.sh 
 Key: SPARK-41071
 URL: https://issues.apache.org/jira/browse/SPARK-41071
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.4.0
Reporter: Yang Jie


Run
{code:java}
dev/make-distribution.sh --tgz -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn 
-Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive  
{code}
{code:java}
[ERROR] ## Exception when compiling 19 sources to 
/Users/yangjie01/SourceCode/git/spark-mine-12/connector/avro/target/scala-2.12/classes
java.lang.OutOfMemoryError: Metaspace
java.lang.ClassLoader.defineClass1(Native Method)
java.lang.ClassLoader.defineClass(ClassLoader.java:757)
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
java.net.URLClassLoader.defineClass(URLClassLoader.java:473)
java.net.URLClassLoader.access$100(URLClassLoader.java:74)
java.net.URLClassLoader$1.run(URLClassLoader.java:369)
java.net.URLClassLoader$1.run(URLClassLoader.java:363)
java.security.AccessController.doPrivileged(Native Method)
java.net.URLClassLoader.findClass(URLClassLoader.java:362)
java.lang.ClassLoader.loadClass(ClassLoader.java:419)
scala_maven.ScalaCompilerLoader.loadClass(ScalaCompilerLoader.java:44)
java.lang.ClassLoader.loadClass(ClassLoader.java:352)
scala.collection.immutable.Set$Set2.$plus(Set.scala:170)
scala.collection.immutable.Set$Set2.$plus(Set.scala:164)
scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:28)
scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:24)
scala.collection.TraversableLike$grouper$1$.apply(TraversableLike.scala:467)
scala.collection.TraversableLike$grouper$1$.apply(TraversableLike.scala:455)
scala.collection.mutable.HashMap$$anon$1.$anonfun$foreach$2(HashMap.scala:153)
scala.collection.mutable.HashMap$$anon$1$$Lambda$68804/294594059.apply(Unknown 
Source)
scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:153)
scala.collection.TraversableLike.groupBy(TraversableLike.scala:524)
scala.collection.TraversableLike.groupBy$(TraversableLike.scala:454)
scala.collection.AbstractTraversable.groupBy(Traversable.scala:108)
scala.tools.nsc.settings.Warnings.$init$(Warnings.scala:91)
scala.tools.nsc.settings.MutableSettings.<init>(MutableSettings.scala:28)
scala.tools.nsc.Settings.<init>(Settings.scala:19)
scala.tools.nsc.Settings.<init>(Settings.scala:20)
xsbt.CachedCompiler0.<init>(CompilerBridge.scala:79) {code}






[jira] [Updated] (SPARK-41070) Performance issue when Spark SQL connects with TeraData

2022-11-08 Thread Ramakrishna (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramakrishna updated SPARK-41070:

Description: 
We are connecting to Teradata from Spark SQL with the below API:

{color:#ff8b00}Dataset<Row> jdbcDF = spark.read().jdbc(connectionUrl, 
tableQuery, connectionProperties);{color}

We are facing an issue when we execute the above logic on a large table with 
a million rows: every time, we see the below extra query being executed, 
which results in a performance hit on the DB.

We got the below information from the DBA; we don't have any logs on the 
Spark SQL side.

SELECT 1 FROM ONE_MILLION_ROWS_TABLE;
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|

Can you please clarify why this query is executed, or is there any chance 
that this type of query is executed from our own code while checking the row 
count of the DataFrame?

Please provide your inputs on this.

 

  was:
We are connecting Tera data from spark SQL with below API

{color:#ff8b00}Dataset jdbcDF = spark.read().jdbc(connectionUrl, 
tableQuery, connectionProperties);{color}

 

We are facing one issue when we execute above logic on large table with million 
rows every time we are seeing below extra query is executing every time as this 
resulting performance hit on DB.

This below information we got from DBA. We dont have any logs on SPARK SQL.

SELECT 1 FROM ONE_MILLION_ROWS_TABLE;
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|

 

Can you please clarify why this query is executing or is there any chance that 
this query is executing from our code it self while check for rows count from  
dataframe.

 

Please provide me your inputs on this.

 


> Performance issue when Spark SQL connects with TeraData 
> 
>
> Key: SPARK-41070
> URL: https://issues.apache.org/jira/browse/SPARK-41070
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.4
>Reporter: Ramakrishna
>Priority: Major
>
> We are connecting to Teradata from Spark SQL with the below API:
> {color:#ff8b00}Dataset<Row> jdbcDF = spark.read().jdbc(connectionUrl, 
> tableQuery, connectionProperties);{color}
> We are facing an issue when we execute the above logic on a large table 
> with a million rows: every time, we see the below extra query being 
> executed, which results in a performance hit on the DB.
> We got the below information from the DBA; we don't have any logs on the 
> Spark SQL side.
> SELECT 1 FROM ONE_MILLION_ROWS_TABLE;
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
>  
> Can you please clarify why this query is executed, or is there any chance 
> that this type of query is executed from our own code while checking the 
> row count of the DataFrame?
>  
> Please provide your inputs on this.
>  
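
One plausible explanation, offered here as a sketch rather than a confirmed diagnosis: when an action needs no columns from a JDBC relation (a plain count(), for example), Spark's JDBC source prunes the projection down to a constant and issues exactly this kind of SELECT 1 query, which still scans every row on the database side. A minimal sketch, with placeholder values standing in for the names from the report:

{code:scala}
// Sketch: connectionUrl/tableQuery/connectionProperties are hypothetical
// placeholders for the values used in the report above.
import java.util.Properties
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
val connectionUrl = "jdbc:teradata://host"  // made-up URL
val tableQuery = "ONE_MILLION_ROWS_TABLE"
val connectionProperties = new Properties()

val jdbcDF = spark.read.jdbc(connectionUrl, tableQuery, connectionProperties)
// count() needs no columns, so the JDBC source can emit a pruned scan like
// "SELECT 1 FROM ONE_MILLION_ROWS_TABLE" against the database.
val rows = jdbcDF.count()
{code}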






[jira] [Updated] (SPARK-41070) Performance issue when Spark SQL connects with TeraData

2022-11-08 Thread Ramakrishna (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramakrishna updated SPARK-41070:

Description: 
We are connecting to Teradata from Spark SQL with the below API:

{color:#ff8b00}Dataset<Row> jdbcDF = spark.read().jdbc(connectionUrl, 
tableQuery, connectionProperties);{color}

We are facing an issue when we execute the above logic on a large table with 
a million rows: every time, we see the below extra query being executed, 
which results in a performance hit on the DB.

We got the below information from the DBA; we don't have any logs on the 
Spark SQL side.

SELECT 1 FROM ONE_MILLION_ROWS_TABLE;
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|

Can you please clarify why this query is executed, or is there any chance 
that this query is executed from our own code while checking the row count 
of the DataFrame?

Please provide your inputs on this.

 

  was:
We are connecting Tera data from spark SQL with below API

Dataset jdbcDF = spark.read().jdbc(connectionUrl, tableQuery, 
connectionProperties);

 

We are facing one issue when we execute this logic on large table with million 
rows every time we are seeing below extra query is executing every times as 
this resulting performance hit on DB.

This below information we got from DBA. We dont have any logs on SPARK SQL.

SELECT 1 FROM ONE_MILLION_ROWS_TABLE;
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|

 

Can you please clarify why this query is executing or is there any chance that 
this query is executing from our code it self while check for rows count from  
dataframe.

 

Please provide me your inputs on this.

 


> Performance issue when Spark SQL connects with TeraData 
> 
>
> Key: SPARK-41070
> URL: https://issues.apache.org/jira/browse/SPARK-41070
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.4
>Reporter: Ramakrishna
>Priority: Major
>
> We are connecting to Teradata from Spark SQL with the below API:
> {color:#ff8b00}Dataset<Row> jdbcDF = spark.read().jdbc(connectionUrl, 
> tableQuery, connectionProperties);{color}
>  
> We are facing an issue when we execute the above logic on a large table 
> with a million rows: every time, we see the below extra query being 
> executed, which results in a performance hit on the DB.
> We got the below information from the DBA; we don't have any logs on the 
> Spark SQL side.
> SELECT 1 FROM ONE_MILLION_ROWS_TABLE;
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
> |1|
>  
> Can you please clarify why this query is executed, or is there any chance 
> that this query is executed from our own code while checking the row count 
> of the DataFrame?
>  
> Please provide your inputs on this.
>  






[jira] [Created] (SPARK-41070) Performance issue when Spark SQL connects with TeraData

2022-11-08 Thread Ramakrishna (Jira)
Ramakrishna created SPARK-41070:
---

 Summary: Performance issue when Spark SQL connects with TeraData 
 Key: SPARK-41070
 URL: https://issues.apache.org/jira/browse/SPARK-41070
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.4.4
Reporter: Ramakrishna


We are connecting to Teradata from Spark SQL with the below API:

Dataset<Row> jdbcDF = spark.read().jdbc(connectionUrl, tableQuery, 
connectionProperties);

We are facing an issue when we execute this logic on a large table with a 
million rows: every time, we see the below extra query being executed, 
which results in a performance hit on the DB.

We got the below information from the DBA; we don't have any logs on the 
Spark SQL side.

SELECT 1 FROM ONE_MILLION_ROWS_TABLE;
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|
|1|

Can you please clarify why this query is executed, or is there any chance 
that this query is executed from our own code while checking the row count 
of the DataFrame?

Please provide your inputs on this.

 






[jira] [Commented] (SPARK-40948) Introduce new error class: PATH_NOT_FOUND

2022-11-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630757#comment-17630757
 ] 

Apache Spark commented on SPARK-40948:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/38575

> Introduce new error class: PATH_NOT_FOUND
> -
>
> Key: SPARK-40948
> URL: https://issues.apache.org/jira/browse/SPARK-40948
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Recently we added many error classes named LEGACY_ERROR_TEMP_.
> We should update them to use proper error class names.






[jira] [Commented] (SPARK-40948) Introduce new error class: PATH_NOT_FOUND

2022-11-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630758#comment-17630758
 ] 

Apache Spark commented on SPARK-40948:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/38575

> Introduce new error class: PATH_NOT_FOUND
> -
>
> Key: SPARK-40948
> URL: https://issues.apache.org/jira/browse/SPARK-40948
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Recently we added many error classes named LEGACY_ERROR_TEMP_.
> We should update them to use proper error class names.






[jira] [Updated] (SPARK-41060) Spark Submitter generates a ConfigMap with the same name

2022-11-08 Thread Serhii Nesterov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serhii Nesterov updated SPARK-41060:

Affects Version/s: 3.1.2

> Spark Submitter generates a ConfigMap with the same name
> 
>
> Key: SPARK-41060
> URL: https://issues.apache.org/jira/browse/SPARK-41060
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.1.2, 3.3.0, 3.3.1
>Reporter: Serhii Nesterov
>Priority: Major
>
> There's a problem with submitting Spark jobs to a K8s cluster: the library 
> generates and reuses the same name for config maps (for drivers and 
> executors). Ideally, two config maps should be created for each job: one 
> for the driver and one for the executors. However, the library creates only 
> one driver config map for all jobs (in some cases it generates only one 
> executor map for all jobs in the same manner). So, if I run 5 jobs, only 
> one driver config map will be generated and used for every job. During 
> those runs we experience issues when deleting pods from the cluster: 
> executor pods are endlessly created and immediately terminated, overloading 
> cluster resources.
> This problem occurs because of the *KubernetesClientUtils* class, in which 
> *configMapNameExecutor* and *configMapNameDriver* are constants. This seems 
> incorrect and should be fixed urgently. I've prepared some changes for 
> review to fix the issue (tested in our project's cluster).
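
A hypothetical sketch of the fix idea described above (the object name and suffix scheme are assumptions for illustration, not the actual patch):

{code:scala}
// Hypothetical sketch: derive ConfigMap names per submission instead of
// sharing one constant, so concurrent jobs cannot collide.
import java.util.UUID

object ConfigMapNames {
  private def uniqueSuffix(): String = UUID.randomUUID().toString.take(8)

  // was (conceptually): a single shared constant such as "spark-drv-conf-map"
  def driver(): String   = s"spark-drv-conf-map-${uniqueSuffix()}"
  def executor(): String = s"spark-exec-conf-map-${uniqueSuffix()}"
}
{code}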






[jira] [Commented] (SPARK-41063) `hive-thriftserver` module compilation deadlock

2022-11-08 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630744#comment-17630744
 ] 

Hyukjin Kwon commented on SPARK-41063:
--

Is this from our CI?

> `hive-thriftserver` module compilation deadlock
> ---
>
> Key: SPARK-41063
> URL: https://issues.apache.org/jira/browse/SPARK-41063
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> [https://pipelines.actions.githubusercontent.com/serviceHosts/03398d36-4378-4d47-a936-fba0a5e8ccb9/_apis/pipelines/1/runs/194220/signedlogcontent/20?urlExpires=2022-11-09T03%3A31%3A51.5657953Z&urlSigningMethod=HMACV1&urlSignature=Jnn4uML8U79K6MF%2F%2BRUrrUbaOqxsqMA0DL0%2BQtrlBpM%3D]
>  
> I have seen it when compiling with Maven locally, but I haven't investigated it yet.
>  
>  
>  






[jira] [Commented] (SPARK-41063) `hive-thriftserver` module compilation deadlock

2022-11-08 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630742#comment-17630742
 ] 

Hyukjin Kwon commented on SPARK-41063:
--

I am aware of this issue but I don't exactly know how to fix it. As a 
workaround, you can try {{git clean -fxd}} (see also 
https://github.com/sbt/sbt/issues/6183)

> `hive-thriftserver` module compilation deadlock
> ---
>
> Key: SPARK-41063
> URL: https://issues.apache.org/jira/browse/SPARK-41063
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> [https://pipelines.actions.githubusercontent.com/serviceHosts/03398d36-4378-4d47-a936-fba0a5e8ccb9/_apis/pipelines/1/runs/194220/signedlogcontent/20?urlExpires=2022-11-09T03%3A31%3A51.5657953Z&urlSigningMethod=HMACV1&urlSignature=Jnn4uML8U79K6MF%2F%2BRUrrUbaOqxsqMA0DL0%2BQtrlBpM%3D]
>  
> I have seen it when compiling with Maven locally, but I haven't investigated it yet.
>  
>  
>  






[jira] [Created] (SPARK-41069) Implement `DataFrame.approxQuantile` and `DataFrame.stat.approxQuantile`

2022-11-08 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-41069:
-

 Summary: Implement `DataFrame.approxQuantile` and 
`DataFrame.stat.approxQuantile`
 Key: SPARK-41069
 URL: https://issues.apache.org/jira/browse/SPARK-41069
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng
Assignee: Ruifeng Zheng
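
This ticket and the sibling tickets below (stat.corr, stat.cov, sampleBy, freqItems) mirror the existing Scala DataFrameStatFunctions. A minimal sketch of that existing API, with made-up column names and data:

{code:scala}
// Sketch of the existing Scala DataFrameStatFunctions that these
// Connect/PySpark tickets mirror; columns and values are illustrative only.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq((1, 10.0, 2.0), (2, 20.0, 3.5), (3, 30.0, 5.0)).toDF("k", "x", "y")

val medians = df.stat.approxQuantile("x", Array(0.5), 0.01) // approximate median of x
val corr    = df.stat.corr("x", "y")                        // Pearson correlation
val cov     = df.stat.cov("x", "y")                         // sample covariance
val freq    = df.stat.freqItems(Array("k"), 0.4)            // frequent items in k
val sampled = df.stat.sampleBy("k", Map(1 -> 0.5, 2 -> 1.0), seed = 7L)
{code}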









[jira] [Created] (SPARK-41068) Implement `DataFrame.stat.corr`

2022-11-08 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-41068:
-

 Summary: Implement `DataFrame.stat.corr`
 Key: SPARK-41068
 URL: https://issues.apache.org/jira/browse/SPARK-41068
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng
Assignee: Ruifeng Zheng









[jira] [Created] (SPARK-41067) Implement `DataFrame.stat.cov`

2022-11-08 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-41067:
-

 Summary: Implement `DataFrame.stat.cov`
 Key: SPARK-41067
 URL: https://issues.apache.org/jira/browse/SPARK-41067
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng
Assignee: Ruifeng Zheng









[jira] [Updated] (SPARK-41060) Spark Submitter generates a ConfigMap with the same name

2022-11-08 Thread Serhii Nesterov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serhii Nesterov updated SPARK-41060:

Description: 
There's a problem with submitting spark jobs to K8s cluster: the library 
generates and reuses the same name for config maps (for drivers and executors). 
Ideally, for each job 2 config maps should be created: for a driver and an 
executor. However, the library creates only one driver config map for all jobs 
(in some cases it generates only one executor map for all jobs in the same 
manner). So, if I run 5 jobs, then only one driver config map will be generated 
and used for every job.  During those runs we experience issues when deleting 
pods from the cluster: executors pods are endlessly created and immediately 
terminated overloading cluster resources.

This problem occurs because of the *KubernetesClientUtils* class in which we 
have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems 
to be incorrect and should be urgently fixed. I've prepared some changes for 
review to fix the issue (tested in the cluster of our project).

  was:
There's a problem with submitting spark jobs to K8s cluster: the library 
generates and reuses the same name for config maps (for drivers and executors). 
Ideally, for each job 2 config maps should created: for a driver and an 
executor. However, the library creates only one driver config map for all jobs 
(in some cases it generates only one executor map for all jobs). So, if I run 5 
jobs, then only one driver config map will be generated and used for every job. 
 During those runs we experience issues when deleting pods from the cluster: 
executors pods are endlessly created and immediately terminated overloading 
cluster resources.

This problem occurs because of the *KubernetesClientUtils* class in which we 
have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems 
to be incorrect and should be urgently fixed. I've prepared some changes for 
review to fix the issue (tested in the cluster of our project).


> Spark Submitter generates a ConfigMap with the same name
> 
>
> Key: SPARK-41060
> URL: https://issues.apache.org/jira/browse/SPARK-41060
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.0, 3.3.1
>Reporter: Serhii Nesterov
>Priority: Major
>
> There's a problem with submitting Spark jobs to a K8s cluster: the library 
> generates and reuses the same name for config maps (for drivers and 
> executors). Ideally, two config maps should be created for each job: one for 
> the driver and one for the executor. However, the library creates only one 
> driver config map for all jobs (in some cases it also generates only one 
> executor config map for all jobs in the same manner). So, if I run 5 jobs, 
> only one driver config map will be generated and used for every job. During 
> those runs we experience issues when deleting pods from the cluster: 
> executor pods are endlessly created and immediately terminated, overloading 
> cluster resources.
> This problem occurs because of the *KubernetesClientUtils* class in which we 
> have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems 
> to be incorrect and should be urgently fixed. I've prepared some changes for 
> review to fix the issue (tested in the cluster of our project).
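
A minimal sketch of the direction such a fix could take, assuming a unique 
per-submission suffix replaces the shared constant (the helper below is 
hypothetical, not the actual patch):

{code:scala}
import java.util.UUID

// Hypothetical naming helper: deriving the ConfigMap name from the
// application ID plus a random suffix means concurrent submissions no
// longer overwrite each other's driver and executor ConfigMaps.
object UniqueConfigMapNames {
  private def suffix(): String = UUID.randomUUID().toString.take(8)

  def driver(appId: String): String =
    s"spark-drv-$appId-${suffix()}-conf-map"

  def executor(appId: String): String =
    s"spark-exec-$appId-${suffix()}-conf-map"
}
{code}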



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41066) Implement `DataFrame.sampleBy` and `DataFrame.stat.sampleBy`

2022-11-08 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-41066:
-

 Summary: Implement `DataFrame.sampleBy` and 
`DataFrame.stat.sampleBy`
 Key: SPARK-41066
 URL: https://issues.apache.org/jira/browse/SPARK-41066
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng
Assignee: Ruifeng Zheng
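
For context, this mirrors the existing (non-Connect) DataFrame API, which 
performs stratified sampling given per-stratum fractions. A minimal usage 
sketch in Scala:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq((0, "a"), (0, "b"), (1, "c"), (1, "d")).toDF("key", "value")

// Keep ~10% of rows with key 0 and ~50% of rows with key 1, seeded for
// reproducibility.
val sampled = df.stat.sampleBy("key", Map(0 -> 0.1, 1 -> 0.5), seed = 36L)
{code}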






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41065) Implement `DataFrame.freqItems` and `DataFrame.stat.freqItems`

2022-11-08 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-41065:
-

 Summary: Implement `DataFrame.freqItems` and 
`DataFrame.stat.freqItems`
 Key: SPARK-41065
 URL: https://issues.apache.org/jira/browse/SPARK-41065
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng
Assignee: Ruifeng Zheng
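
For context, this mirrors the existing (non-Connect) DataFrame API, an 
approximate algorithm for finding frequent items per column (the result may 
contain false positives). A minimal usage sketch in Scala:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq((1, "a"), (1, "b"), (2, "a"), (1, "a")).toDF("num", "letter")

// Items appearing in at least 40% of rows, per column; returns a
// single-row DataFrame of arrays.
val freq = df.stat.freqItems(Array("num", "letter"), 0.4)
{code}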






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41064) Implement `DataFrame.crosstab` and `DataFrame.stat.crosstab`

2022-11-08 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-41064:
-

Assignee: Ruifeng Zheng

> Implement `DataFrame.crosstab` and `DataFrame.stat.crosstab`
> 
>
> Key: SPARK-41064
> URL: https://issues.apache.org/jira/browse/SPARK-41064
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41064) Implement `DataFrame.crosstab` and `DataFrame.stat.crosstab`

2022-11-08 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-41064:
-

 Summary: Implement `DataFrame.crosstab` and 
`DataFrame.stat.crosstab`
 Key: SPARK-41064
 URL: https://issues.apache.org/jira/browse/SPARK-41064
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng
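
For context, this mirrors the existing (non-Connect) DataFrame API, which 
computes a pairwise frequency (contingency) table of two columns. A minimal 
usage sketch in Scala:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq((1, "a"), (1, "b"), (2, "a")).toDF("num", "letter")

// One row per distinct value of "num", one column per distinct value of
// "letter", with cells holding co-occurrence counts.
val ct = df.stat.crosstab("num", "letter")
{code}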






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40917) Add a dedicated logical plan for `Summary`

2022-11-08 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-40917:
-

Assignee: Ruifeng Zheng

> Add a dedicated logical plan for `Summary`
> --
>
> Key: SPARK-40917
> URL: https://issues.apache.org/jira/browse/SPARK-40917
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>
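
For context, the user-facing operation behind this plan is `Dataset.summary`; 
a minimal usage sketch in Scala:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(1.0, 2.0, 3.0, 4.0).toDF("v")

// Computes the requested statistics for each numeric and string column.
df.summary("count", "mean", "min", "max").show()
{code}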




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40917) Add a dedicated logical plan for `Summary`

2022-11-08 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-40917.
---
Resolution: Workaround

> Add a dedicated logical plan for `Summary`
> --
>
> Key: SPARK-40917
> URL: https://issues.apache.org/jira/browse/SPARK-40917
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41060) Spark Submitter generates a ConfigMap with the same name

2022-11-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630737#comment-17630737
 ] 

Apache Spark commented on SPARK-41060:
--

User '19Serhii99' has created a pull request for this issue:
https://github.com/apache/spark/pull/38574

> Spark Submitter generates a ConfigMap with the same name
> 
>
> Key: SPARK-41060
> URL: https://issues.apache.org/jira/browse/SPARK-41060
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.0, 3.3.1
>Reporter: Serhii Nesterov
>Priority: Major
>
> There's a problem with submitting Spark jobs to a K8s cluster: the library 
> generates and reuses the same name for config maps (for drivers and 
> executors). Ideally, two config maps should be created for each job: one for 
> the driver and one for the executor. However, the library creates only one 
> driver config map for all jobs (in some cases it generates only one executor 
> config map for all jobs). So, if I run 5 jobs, only one driver config map 
> will be generated and used for every job. During those runs we experience 
> issues when deleting pods from the cluster: executor pods are endlessly 
> created and immediately terminated, overloading cluster resources.
> This problem occurs because of the *KubernetesClientUtils* class in which we 
> have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems 
> to be incorrect and should be urgently fixed. I've prepared some changes for 
> review to fix the issue (tested in the cluster of our project).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41060) Spark Submitter generates a ConfigMap with the same name

2022-11-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41060:


Assignee: (was: Apache Spark)

> Spark Submitter generates a ConfigMap with the same name
> 
>
> Key: SPARK-41060
> URL: https://issues.apache.org/jira/browse/SPARK-41060
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.0, 3.3.1
>Reporter: Serhii Nesterov
>Priority: Major
>
> There's a problem with submitting Spark jobs to a K8s cluster: the library 
> generates and reuses the same name for config maps (for drivers and 
> executors). Ideally, two config maps should be created for each job: one for 
> the driver and one for the executor. However, the library creates only one 
> driver config map for all jobs (in some cases it generates only one executor 
> config map for all jobs). So, if I run 5 jobs, only one driver config map 
> will be generated and used for every job. During those runs we experience 
> issues when deleting pods from the cluster: executor pods are endlessly 
> created and immediately terminated, overloading cluster resources.
> This problem occurs because of the *KubernetesClientUtils* class in which we 
> have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems 
> to be incorrect and should be urgently fixed. I've prepared some changes for 
> review to fix the issue (tested in the cluster of our project).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41060) Spark Submitter generates a ConfigMap with the same name

2022-11-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630736#comment-17630736
 ] 

Apache Spark commented on SPARK-41060:
--

User '19Serhii99' has created a pull request for this issue:
https://github.com/apache/spark/pull/38574

> Spark Submitter generates a ConfigMap with the same name
> 
>
> Key: SPARK-41060
> URL: https://issues.apache.org/jira/browse/SPARK-41060
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.0, 3.3.1
>Reporter: Serhii Nesterov
>Priority: Major
>
> There's a problem with submitting Spark jobs to a K8s cluster: the library 
> generates and reuses the same name for config maps (for drivers and 
> executors). Ideally, two config maps should be created for each job: one for 
> the driver and one for the executor. However, the library creates only one 
> driver config map for all jobs (in some cases it generates only one executor 
> config map for all jobs). So, if I run 5 jobs, only one driver config map 
> will be generated and used for every job. During those runs we experience 
> issues when deleting pods from the cluster: executor pods are endlessly 
> created and immediately terminated, overloading cluster resources.
> This problem occurs because of the *KubernetesClientUtils* class in which we 
> have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems 
> to be incorrect and should be urgently fixed. I've prepared some changes for 
> review to fix the issue (tested in the cluster of our project).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41060) Spark Submitter generates a ConfigMap with the same name

2022-11-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41060:


Assignee: Apache Spark

> Spark Submitter generates a ConfigMap with the same name
> 
>
> Key: SPARK-41060
> URL: https://issues.apache.org/jira/browse/SPARK-41060
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.0, 3.3.1
>Reporter: Serhii Nesterov
>Assignee: Apache Spark
>Priority: Major
>
> There's a problem with submitting Spark jobs to a K8s cluster: the library 
> generates and reuses the same name for config maps (for drivers and 
> executors). Ideally, two config maps should be created for each job: one for 
> the driver and one for the executor. However, the library creates only one 
> driver config map for all jobs (in some cases it generates only one executor 
> config map for all jobs). So, if I run 5 jobs, only one driver config map 
> will be generated and used for every job. During those runs we experience 
> issues when deleting pods from the cluster: executor pods are endlessly 
> created and immediately terminated, overloading cluster resources.
> This problem occurs because of the *KubernetesClientUtils* class in which we 
> have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems 
> to be incorrect and should be urgently fixed. I've prepared some changes for 
> review to fix the issue (tested in the cluster of our project).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40513) SPIP: Support Docker Official Image for Spark

2022-11-08 Thread Yikun Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630735#comment-17630735
 ] 

Yikun Jiang commented on SPARK-40513:
-

Enable GitHub Autolink references for spark-docker
https://issues.apache.org/jira/browse/INFRA-23789

> SPIP: Support Docker Official Image for Spark
> -
>
> Key: SPARK-40513
> URL: https://issues.apache.org/jira/browse/SPARK-40513
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes, PySpark, SparkR
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Priority: Major
>  Labels: SPIP
>
> This SPIP proposes adding [Docker Official 
> Image(DOI)|https://github.com/docker-library/official-images] support to 
> ensure the Spark Docker images meet the quality standards for Docker images, 
> and to provide these Docker images for users who want to use Apache Spark 
> via Docker images.
> There are also several [Apache projects that release Docker Official 
> Images|https://hub.docker.com/search?q=apache&image_filter=official], such 
> as [flink|https://hub.docker.com/_/flink], 
> [storm|https://hub.docker.com/_/storm], [solr|https://hub.docker.com/_/solr], 
> [zookeeper|https://hub.docker.com/_/zookeeper], and 
> [httpd|https://hub.docker.com/_/httpd] (with 50M+ to 1B+ downloads each). 
> The huge download statistics show the real demand from users, and the 
> support from other Apache projects suggests we should be able to do it as 
> well.
> After support:
>  * The Dockerfile will still be maintained by the Apache Spark community and 
> reviewed by Docker.
>  * The images will be maintained by the Docker community to ensure they meet 
> the Docker community's quality standards for Docker images.
> It will also reduce the extra Docker image maintenance effort (such as 
> frequent rebuilding and image security updates) for the Apache Spark 
> community.
>  
> SPIP DOC: 
> [https://docs.google.com/document/d/1nN-pKuvt-amUcrkTvYAQ-bJBgtsWb9nAkNoVNRM2S2o]
> DISCUSS: [https://lists.apache.org/thread/l1793y5224n8bqkp3s6ltgkykso4htb3]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40513) SPIP: Support Docker Official Image for Spark

2022-11-08 Thread Yikun Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630733#comment-17630733
 ] 

Yikun Jiang commented on SPARK-40513:
-

Add secrets.DOCKER_USER and secrets.DOCKER_TOKEN for apache/spark-docker
https://issues.apache.org/jira/browse/INFRA-23882

> SPIP: Support Docker Official Image for Spark
> -
>
> Key: SPARK-40513
> URL: https://issues.apache.org/jira/browse/SPARK-40513
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes, PySpark, SparkR
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Priority: Major
>  Labels: SPIP
>
> This SPIP proposes adding [Docker Official 
> Image(DOI)|https://github.com/docker-library/official-images] support to 
> ensure the Spark Docker images meet the quality standards for Docker images, 
> and to provide these Docker images for users who want to use Apache Spark 
> via Docker images.
> There are also several [Apache projects that release Docker Official 
> Images|https://hub.docker.com/search?q=apache&image_filter=official], such 
> as [flink|https://hub.docker.com/_/flink], 
> [storm|https://hub.docker.com/_/storm], [solr|https://hub.docker.com/_/solr], 
> [zookeeper|https://hub.docker.com/_/zookeeper], and 
> [httpd|https://hub.docker.com/_/httpd] (with 50M+ to 1B+ downloads each). 
> The huge download statistics show the real demand from users, and the 
> support from other Apache projects suggests we should be able to do it as 
> well.
> After support:
>  * The Dockerfile will still be maintained by the Apache Spark community and 
> reviewed by Docker.
>  * The images will be maintained by the Docker community to ensure they meet 
> the Docker community's quality standards for Docker images.
> It will also reduce the extra Docker image maintenance effort (such as 
> frequent rebuilding and image security updates) for the Apache Spark 
> community.
>  
> SPIP DOC: 
> [https://docs.google.com/document/d/1nN-pKuvt-amUcrkTvYAQ-bJBgtsWb9nAkNoVNRM2S2o]
> DISCUSS: [https://lists.apache.org/thread/l1793y5224n8bqkp3s6ltgkykso4htb3]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41060) Spark Submitter generates a ConfigMap with the same name

2022-11-08 Thread Serhii Nesterov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serhii Nesterov updated SPARK-41060:

Description: 
There's a problem with submitting Spark jobs to a K8s cluster: the library 
generates and reuses the same name for config maps (for drivers and executors). 
Ideally, two config maps should be created for each job: one for the driver and 
one for the executor. However, the library creates only one driver config map 
for all jobs (in some cases it generates only one executor config map for all 
jobs). So, if I run 5 jobs, only one driver config map will be generated and 
used for every job. During those runs we experience issues when deleting pods 
from the cluster: executor pods are endlessly created and immediately 
terminated, overloading cluster resources.

This problem occurs because of the *KubernetesClientUtils* class in which we 
have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems 
to be incorrect and should be urgently fixed. I've prepared some changes for 
review to fix the issue (tested in the cluster of our project).

  was:
There's a problem with submitting a Spark job to a K8s cluster: the library 
generates and reuses the same name for the config map (for drivers and 
executors). So, if we run 5 jobs sequentially or in parallel, one config map 
will be created and then overwritten 4 times, which means this config map will 
be applied to all 5 jobs instead of creating one config map for each job. 
During those runs we experience issues when deleting pods from the cluster: 
executor pods are endlessly created and immediately terminated, overloading 
cluster resources.

This problem occurs because of the *KubernetesClientUtils* class in which we 
have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems 
to be incorrect and should be fixed. I've prepared some changes for review to 
fix the issue.


> Spark Submitter generates a ConfigMap with the same name
> 
>
> Key: SPARK-41060
> URL: https://issues.apache.org/jira/browse/SPARK-41060
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.0, 3.3.1
>Reporter: Serhii Nesterov
>Priority: Major
>
> There's a problem with submitting Spark jobs to a K8s cluster: the library 
> generates and reuses the same name for config maps (for drivers and 
> executors). Ideally, two config maps should be created for each job: one for 
> the driver and one for the executor. However, the library creates only one 
> driver config map for all jobs (in some cases it generates only one executor 
> config map for all jobs). So, if I run 5 jobs, only one driver config map 
> will be generated and used for every job. During those runs we experience 
> issues when deleting pods from the cluster: executor pods are endlessly 
> created and immediately terminated, overloading cluster resources.
> This problem occurs because of the *KubernetesClientUtils* class in which we 
> have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems 
> to be incorrect and should be urgently fixed. I've prepared some changes for 
> review to fix the issue (tested in the cluster of our project).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41063) `hive-thriftserver` module compilation deadlock

2022-11-08 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630731#comment-17630731
 ] 

Yang Jie commented on SPARK-41063:
--

cc [~hyukjin.kwon] 

> `hive-thriftserver` module compilation deadlock
> ---
>
> Key: SPARK-41063
> URL: https://issues.apache.org/jira/browse/SPARK-41063
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> [https://pipelines.actions.githubusercontent.com/serviceHosts/03398d36-4378-4d47-a936-fba0a5e8ccb9/_apis/pipelines/1/runs/194220/signedlogcontent/20?urlExpires=2022-11-09T03%3A31%3A51.5657953Z&urlSigningMethod=HMACV1&urlSignature=Jnn4uML8U79K6MF%2F%2BRUrrUbaOqxsqMA0DL0%2BQtrlBpM%3D]
>  
> I have seen it when compiling with Maven locally, but I haven't investigated 
> it yet.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41051) Optimize ProcfsMetrics file acquisition

2022-11-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630730#comment-17630730
 ] 

Apache Spark commented on SPARK-41051:
--

User 'Narcasserun' has created a pull request for this issue:
https://github.com/apache/spark/pull/38563

> Optimize ProcfsMetrics file acquisition
> ---
>
> Key: SPARK-41051
> URL: https://issues.apache.org/jira/browse/SPARK-41051
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.1
> Environment: spark-master
>Reporter: sur
>Priority: Minor
> Fix For: 3.4.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> When obtaining the Procfs file, variables are created but not used, and 
> there is duplicated code. We should reduce such cases.
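
A minimal sketch of the kind of cleanup implied, assuming the goal is reading 
a procfs file without unused intermediates (illustrative only, not the actual 
patch):

{code:scala}
import scala.io.Source
import scala.util.Using

// Read /proc/<pid>/stat directly into the fields we need, avoiding
// intermediate values that are created but never used.
def readProcStat(pid: Long): Array[String] =
  Using.resource(Source.fromFile(s"/proc/$pid/stat")) { src =>
    src.getLines().next().split(" ")
  }
{code}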



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41063) `hive-thriftserver` module compilation deadlock

2022-11-08 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630729#comment-17630729
 ] 

Yang Jie commented on SPARK-41063:
--

 

It keeps printing similar logs like these in a loop:

{code:java}
2022-11-09T01:01:42.8632147Z [info] Note: /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/thrift/ThriftHttpServlet.java uses or overrides a deprecated API.
2022-11-09T01:01:42.8632717Z [info] Note: Recompile with -Xlint:deprecation for details.
2022-11-09T01:01:42.8633082Z [info] done compiling
2022-11-09T01:01:42.8633618Z [info] compiling 22 Scala sources and 24 Java sources to /home/runner/work/spark/spark/sql/hive-thriftserver/target/scala-2.13/classes ...
2022-11-09T01:01:42.8634133Z [info] Note: Some input files use or override a deprecated API.
2022-11-09T01:01:42.8634581Z [info] Note: Recompile with -Xlint:deprecation for details.
2022-11-09T01:01:42.8634941Z [info] done compiling
2022-11-09T01:01:42.8635478Z [info] compiling 22 Scala sources and 9 Java sources to /home/runner/work/spark/spark/sql/hive-thriftserver/target/scala-2.13/classes ...
{code}

> `hive-thriftserver` module compilation deadlock
> ---
>
> Key: SPARK-41063
> URL: https://issues.apache.org/jira/browse/SPARK-41063
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> [https://pipelines.actions.githubusercontent.com/serviceHosts/03398d36-4378-4d47-a936-fba0a5e8ccb9/_apis/pipelines/1/runs/194220/signedlogcontent/20?urlExpires=2022-11-09T03%3A31%3A51.5657953Z&urlSigningMethod=HMACV1&urlSignature=Jnn4uML8U79K6MF%2F%2BRUrrUbaOqxsqMA0DL0%2BQtrlBpM%3D]
>  
> I have seen it when compiling with Maven locally, but I haven't investigated 
> it yet.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41063) `hive-thriftserver` module compilation deadlock

2022-11-08 Thread Yang Jie (Jira)
Yang Jie created SPARK-41063:


 Summary: `hive-thriftserver` module compilation deadlock
 Key: SPARK-41063
 URL: https://issues.apache.org/jira/browse/SPARK-41063
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.4.0
Reporter: Yang Jie


[https://pipelines.actions.githubusercontent.com/serviceHosts/03398d36-4378-4d47-a936-fba0a5e8ccb9/_apis/pipelines/1/runs/194220/signedlogcontent/20?urlExpires=2022-11-09T03%3A31%3A51.5657953Z&urlSigningMethod=HMACV1&urlSignature=Jnn4uML8U79K6MF%2F%2BRUrrUbaOqxsqMA0DL0%2BQtrlBpM%3D]

 

I have seen it when compiling with Maven locally, but I haven't investigated 
it yet.

 

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-41051) Optimize ProcfsMetrics file acquisition

2022-11-08 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan resolved SPARK-41051.
-
Fix Version/s: 3.4.0
   (was: 3.3.2)
   Resolution: Fixed

Issue resolved by pull request 38563
[https://github.com/apache/spark/pull/38563]

> Optimize ProcfsMetrics file acquisition
> ---
>
> Key: SPARK-41051
> URL: https://issues.apache.org/jira/browse/SPARK-41051
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.1
> Environment: spark-master
>Reporter: sur
>Priority: Minor
> Fix For: 3.4.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> When obtaining the Procfs file, variables are created but not used, and 
> there is duplicated code. We should reduce such cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41060) Spark Submitter generates a ConfigMap with the same name

2022-11-08 Thread Serhii Nesterov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serhii Nesterov updated SPARK-41060:

Description: 
There's a problem with submitting a Spark job to a K8s cluster: the library 
generates and reuses the same name for the config map (for drivers and 
executors). So, if we run 5 jobs sequentially or in parallel, one config map 
will be created and then overwritten 4 times, which means this config map will 
be applied to all 5 jobs instead of creating one config map for each job. 
During those runs we experience issues when deleting pods from the cluster: 
executor pods are endlessly created and immediately terminated, overloading 
cluster resources.

This problem occurs because of the *KubernetesClientUtils* class in which we 
have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems 
to be incorrect and should be fixed. I've prepared some changes for review to 
fix the issue.

  was:There's a problem with submitting a Spark job to a K8s cluster: the 
library generates and reuses the same name for the Config Map (for drivers and 
executors). So, if we run 5 jobs sequentially or in parallel, one Config Map 
will be created and then overwritten 4 times. During those runs we experience 
issues when deleting pods from the cluster: executor pods are endlessly 
created and immediately terminated, overloading cluster resources.


> Spark Submitter generates a ConfigMap with the same name
> 
>
> Key: SPARK-41060
> URL: https://issues.apache.org/jira/browse/SPARK-41060
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.0, 3.3.1
>Reporter: Serhii Nesterov
>Priority: Major
>
> There's a problem with submitting a Spark job to a K8s cluster: the library 
> generates and reuses the same name for the config map (for drivers and 
> executors). So, if we run 5 jobs sequentially or in parallel, one config map 
> will be created and then overwritten 4 times, which means this config map 
> will be applied to all 5 jobs instead of creating one config map for each 
> job. During those runs we experience issues when deleting pods from the 
> cluster: executor pods are endlessly created and immediately terminated, 
> overloading cluster resources.
> This problem occurs because of the *KubernetesClientUtils* class in which we 
> have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems 
> to be incorrect and should be fixed. I've prepared some changes for review to 
> fix the issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41062) Rename UNSUPPORTED_CORRELATED_REFERENCE to CORRELATED_REFERENCE

2022-11-08 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-41062:
---

 Summary: Rename UNSUPPORTED_CORRELATED_REFERENCE to 
CORRELATED_REFERENCE
 Key: SPARK-41062
 URL: https://issues.apache.org/jira/browse/SPARK-41062
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Haejoon Lee


We should use a clear and brief name for every error class.

This sub-error class duplicates the main class.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41062) Rename UNSUPPORTED_CORRELATED_REFERENCE to CORRELATED_REFERENCE

2022-11-08 Thread Haejoon Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630724#comment-17630724
 ] 

Haejoon Lee commented on SPARK-41062:
-

I'm working on it

> Rename UNSUPPORTED_CORRELATED_REFERENCE to CORRELATED_REFERENCE
> ---
>
> Key: SPARK-41062
> URL: https://issues.apache.org/jira/browse/SPARK-41062
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should use a clear and brief name for every error class.
> This sub-error class duplicates the main class.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41061) Support SelectExpr which apply Projection by expressions in Strings in Connect DSL

2022-11-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630723#comment-17630723
 ] 

Apache Spark commented on SPARK-41061:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/38573

> Support SelectExpr which apply Projection by expressions in Strings in 
> Connect DSL
> --
>
> Key: SPARK-41061
> URL: https://issues.apache.org/jira/browse/SPARK-41061
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41060) Spark Submitter generates a ConfigMap with the same name

2022-11-08 Thread Serhii Nesterov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serhii Nesterov updated SPARK-41060:

Description: There's a problem with submitting a Spark job to a K8s cluster: 
the library generates and reuses the same name for the Config Map (for drivers 
and executors). So, if we run 5 jobs sequentially or in parallel, one Config 
Map will be created and then overwritten 4 times. During those runs we 
experience issues when deleting pods from the cluster: executor pods are 
endlessly created and immediately terminated, overloading cluster resources.

> Spark Submitter generates a ConfigMap with the same name
> 
>
> Key: SPARK-41060
> URL: https://issues.apache.org/jira/browse/SPARK-41060
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.0, 3.3.1
>Reporter: Serhii Nesterov
>Priority: Major
>
> There's a problem with submitting a Spark job to a K8s cluster: the library 
> generates and reuses the same name for the Config Map (for drivers and 
> executors). So, if we run 5 jobs sequentially or in parallel, one Config Map 
> will be created and then overwritten 4 times. During those runs we 
> experience issues when deleting pods from the cluster: executor pods are 
> endlessly created and immediately terminated, overloading cluster resources.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41061) Support SelectExpr which apply Projection by expressions in Strings in Connect DSL

2022-11-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41061:


Assignee: Apache Spark

> Support SelectExpr which apply Projection by expressions in Strings in 
> Connect DSL
> --
>
> Key: SPARK-41061
> URL: https://issues.apache.org/jira/browse/SPARK-41061
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41061) Support SelectExpr which apply Projection by expressions in Strings in Connect DSL

2022-11-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630720#comment-17630720
 ] 

Apache Spark commented on SPARK-41061:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/38573

> Support SelectExpr which apply Projection by expressions in Strings in 
> Connect DSL
> --
>
> Key: SPARK-41061
> URL: https://issues.apache.org/jira/browse/SPARK-41061
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41061) Support SelectExpr which apply Projection by expressions in Strings in Connect DSL

2022-11-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41061:


Assignee: (was: Apache Spark)

> Support SelectExpr which apply Projection by expressions in Strings in 
> Connect DSL
> --
>
> Key: SPARK-41061
> URL: https://issues.apache.org/jira/browse/SPARK-41061
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41059) Rename _LEGACY_ERROR_TEMP_2420 to NESTED_AGGREGATE_FUNCTION

2022-11-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41059:


Assignee: (was: Apache Spark)

> Rename _LEGACY_ERROR_TEMP_2420 to NESTED_AGGREGATE_FUNCTION
> ---
>
> Key: SPARK-41059
> URL: https://issues.apache.org/jira/browse/SPARK-41059
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should rename all _LEGACY errors to properly named error classes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41059) Rename _LEGACY_ERROR_TEMP_2420 to NESTED_AGGREGATE_FUNCTION

2022-11-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41059:


Assignee: Apache Spark

> Rename _LEGACY_ERROR_TEMP_2420 to NESTED_AGGREGATE_FUNCTION
> ---
>
> Key: SPARK-41059
> URL: https://issues.apache.org/jira/browse/SPARK-41059
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>
> We should rename all _LEGACY errors to properly named error classes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41059) Rename _LEGACY_ERROR_TEMP_2420 to NESTED_AGGREGATE_FUNCTION

2022-11-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630718#comment-17630718
 ] 

Apache Spark commented on SPARK-41059:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/38572

> Rename _LEGACY_ERROR_TEMP_2420 to NESTED_AGGREGATE_FUNCTION
> ---
>
> Key: SPARK-41059
> URL: https://issues.apache.org/jira/browse/SPARK-41059
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should rename all _LEGACY errors to properly named error classes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41059) Rename _LEGACY_ERROR_TEMP_2420 to NESTED_AGGREGATE_FUNCTION

2022-11-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41059:


Assignee: Apache Spark

> Rename _LEGACY_ERROR_TEMP_2420 to NESTED_AGGREGATE_FUNCTION
> ---
>
> Key: SPARK-41059
> URL: https://issues.apache.org/jira/browse/SPARK-41059
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>
> We should rename all _LEGACY errors to properly named error classes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41059) Rename _LEGACY_ERROR_TEMP_2420 to NESTED_AGGREGATE_FUNCTION

2022-11-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630719#comment-17630719
 ] 

Apache Spark commented on SPARK-41059:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/38572

> Rename _LEGACY_ERROR_TEMP_2420 to NESTED_AGGREGATE_FUNCTION
> ---
>
> Key: SPARK-41059
> URL: https://issues.apache.org/jira/browse/SPARK-41059
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should rename all _LEGACY errors to properly named error classes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41061) Support SelectExpr which apply Projection by expressions in Strings in Connect DSL

2022-11-08 Thread Rui Wang (Jira)
Rui Wang created SPARK-41061:


 Summary: Support SelectExpr which apply Projection by expressions 
in Strings in Connect DSL
 Key: SPARK-41061
 URL: https://issues.apache.org/jira/browse/SPARK-41061
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Rui Wang
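
For context, this mirrors the existing DataFrame API, where each string is 
parsed as a SQL expression and becomes an output column. A minimal usage 
sketch in Scala:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq((1, -2), (3, -4)).toDF("a", "b")

// Each expression string is parsed and projected as a column.
val projected = df.selectExpr("a", "abs(b) AS abs_b", "a + b AS sum")
{code}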






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41060) Spark Submitter generates a ConfigMap with the same name

2022-11-08 Thread Serhii Nesterov (Jira)
Serhii Nesterov created SPARK-41060:
---

 Summary: Spark Submitter generates a ConfigMap with the same name
 Key: SPARK-41060
 URL: https://issues.apache.org/jira/browse/SPARK-41060
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 3.3.1, 3.3.0
Reporter: Serhii Nesterov






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41059) Rename _LEGACY_ERROR_TEMP_2420 to NESTED_AGGREGATE_FUNCTION

2022-11-08 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-41059:
---

 Summary: Rename _LEGACY_ERROR_TEMP_2420 to 
NESTED_AGGREGATE_FUNCTION
 Key: SPARK-41059
 URL: https://issues.apache.org/jira/browse/SPARK-41059
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Haejoon Lee


We should rename all _LEGACY errors to properly named error classes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41059) Rename _LEGACY_ERROR_TEMP_2420 to NESTED_AGGREGATE_FUNCTION

2022-11-08 Thread Haejoon Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630716#comment-17630716
 ] 

Haejoon Lee commented on SPARK-41059:
-

I'm working on it

> Rename _LEGACY_ERROR_TEMP_2420 to NESTED_AGGREGATE_FUNCTION
> ---
>
> Key: SPARK-41059
> URL: https://issues.apache.org/jira/browse/SPARK-41059
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should rename all _LEGACY errors to properly named error classes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37555) spark-sql should pass last unclosed comment to backend and execute throw a exception

2022-11-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630715#comment-17630715
 ] 

Apache Spark commented on SPARK-37555:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/38571

> spark-sql should pass last unclosed comment to backend and execute throw a 
> exception
> 
>
> Key: SPARK-37555
> URL: https://issues.apache.org/jira/browse/SPARK-37555
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.3.0
>
>
> In current spark-sql, if the last statement is an unclosed comment, the SQL 
> won't be executed, which is not correct. We should pass it through and let 
> ANTLR throw an exception.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-41039) Upgrade `scala-parallel-collections` to 1.0.4 for Scala 2.13

2022-11-08 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-41039.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38550
[https://github.com/apache/spark/pull/38550]

> Upgrade `scala-parallel-collections` to 1.0.4 for Scala 2.13
> 
>
> Key: SPARK-41039
> URL: https://issues.apache.org/jira/browse/SPARK-41039
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 3.4.0
>
>
> Starting from version 1.0.4, scala-parallel-collections verifies Java 17 
> compatibility through CI:
>  * 
> [https://github.com/scala/scala-parallel-collections/compare/v1.0.3...v1.0.4]
>  * https://github.com/scala/scala-parallel-collections/releases/tag/v1.0.4
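
For reference, the dependency in sbt coordinates (assuming the standard Maven 
Central artifact):

{code:scala}
// build.sbt -- Scala 2.13 only; parallel collections were part of the
// standard library up to Scala 2.12.
libraryDependencies +=
  "org.scala-lang.modules" %% "scala-parallel-collections" % "1.0.4"
{code}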



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41039) Upgrade `scala-parallel-collections` to 1.0.4 for Scala 2.13

2022-11-08 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-41039:


Assignee: Yang Jie

> Upgrade `scala-parallel-collections` to 1.0.4 for Scala 2.13
> 
>
> Key: SPARK-41039
> URL: https://issues.apache.org/jira/browse/SPARK-41039
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>
> Starting from version 1.0.4, scala-parallel-collections verifies Java 17 
> compatibility through CI:
>  * 
> [https://github.com/scala/scala-parallel-collections/compare/v1.0.3...v1.0.4]
>  * https://github.com/scala/scala-parallel-collections/releases/tag/v1.0.4



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41039) Upgrade `scala-parallel-collections` to 1.0.4 for Scala 2.13

2022-11-08 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-41039:
-
Priority: Trivial  (was: Major)

> Upgrade `scala-parallel-collections` to 1.0.4 for Scala 2.13
> 
>
> Key: SPARK-41039
> URL: https://issues.apache.org/jira/browse/SPARK-41039
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Trivial
> Fix For: 3.4.0
>
>
> Starting from version 1.0.4, scala-parallel-collections verifies Java 17 
> compatibility through CI:
>  * 
> [https://github.com/scala/scala-parallel-collections/compare/v1.0.3...v1.0.4]
>  * https://github.com/scala/scala-parallel-collections/releases/tag/v1.0.4



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41057) Support other data type conversion in the DataTypeProtoConverter

2022-11-08 Thread Deng Ziming (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630711#comment-17630711
 ] 

Deng Ziming commented on SPARK-41057:
-

Thank you [~amaliujia], I'm willing to give it a try. 🤝

> Support other data type conversion in the DataTypeProtoConverter
> 
>
> Key: SPARK-41057
> URL: https://issues.apache.org/jira/browse/SPARK-41057
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Priority: Major
>
> In 
> https://github.com/apache/spark/blob/master/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/DataTypeProtoConverter.scala#L34
>  we only support INT, STRING and STRUCT type conversion to and from catalyst 
> and connect proto.
> We should be able to support all the types defined by 
> https://github.com/apache/spark/blob/master/connector/connect/src/main/protobuf/spark/connect/types.proto
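
A minimal sketch of what extending the conversion could look like (simplified 
names; the real converter would match on the generated proto classes rather 
than strings):

{code:scala}
import org.apache.spark.sql.types._

// Hypothetical mapping from a proto type tag to a Catalyst DataType.
def toCatalystType(kind: String): DataType = kind match {
  case "i8"     => ByteType
  case "i16"    => ShortType
  case "i32"    => IntegerType
  case "i64"    => LongType
  case "fp32"   => FloatType
  case "fp64"   => DoubleType
  case "bool"   => BooleanType
  case "string" => StringType
  case "binary" => BinaryType
  case other    =>
    throw new UnsupportedOperationException(s"Unsupported type: $other")
}
{code}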



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-41057) Support other data type conversion in the DataTypeProtoConverter

2022-11-08 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630706#comment-17630706
 ] 

Rui Wang edited comment on SPARK-41057 at 11/9/22 2:55 AM:
---

[~dengziming]
Are you interested in this JIRA?


was (Author: amaliujia):
@dengziming

Are you interested in this JIRA?

> Support other data type conversion in the DataTypeProtoConverter
> 
>
> Key: SPARK-41057
> URL: https://issues.apache.org/jira/browse/SPARK-41057
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Priority: Major
>
> In 
> https://github.com/apache/spark/blob/master/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/DataTypeProtoConverter.scala#L34
>  we only support INT, STRING and STRUCT type conversion to and from catalyst 
> and connect proto.
> We should be able to support all the types defined by 
> https://github.com/apache/spark/blob/master/connector/connect/src/main/protobuf/spark/connect/types.proto



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41058) Removing unused code in connect

2022-11-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41058:


Assignee: (was: Apache Spark)

> Removing unused code in connect
> ---
>
> Key: SPARK-41058
> URL: https://issues.apache.org/jira/browse/SPARK-41058
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Deng Ziming
>Priority: Minor
> Fix For: 3.4.0
>
>
> There is some unused code in the `connect` module, for example an unused 
> import in commands.proto and unused code in SparkConnectStreamHandler.scala.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41058) Removing unused code in connect

2022-11-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41058:


Assignee: Apache Spark

> Removing unused code in connect
> ---
>
> Key: SPARK-41058
> URL: https://issues.apache.org/jira/browse/SPARK-41058
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Deng Ziming
>Assignee: Apache Spark
>Priority: Minor
> Fix For: 3.4.0
>
>
> There is some unused code in the `connect` module, for example an unused 
> import in commands.proto and unused code in SparkConnectStreamHandler.scala.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41058) Removing unused code in connect

2022-11-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630709#comment-17630709
 ] 

Apache Spark commented on SPARK-41058:
--

User 'dengziming' has created a pull request for this issue:
https://github.com/apache/spark/pull/38491

> Removing unused code in connect
> ---
>
> Key: SPARK-41058
> URL: https://issues.apache.org/jira/browse/SPARK-41058
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Deng Ziming
>Priority: Minor
> Fix For: 3.4.0
>
>
> There is some unused code in the `connect` module, for example an unused 
> import in commands.proto and unused code in SparkConnectStreamHandler.scala.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41056) Fix new R_LIBS_SITE behavior introduced in R 4.2

2022-11-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41056:


Assignee: Apache Spark

> Fix new R_LIBS_SITE behavior introduced in R 4.2
> 
>
> Key: SPARK-41056
> URL: https://issues.apache.org/jira/browse/SPARK-41056
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> R 4.2
> {code}
> # R
> > Sys.getenv("R_LIBS_SITE")
> [1] 
> "/usr/local/lib/R/site-library/:/usr/lib/R/site-library:/usr/lib/R/library'"
> {code}
> {code}
> # R --vanilla
> > Sys.getenv("R_LIBS_SITE")
> [1] "/usr/lib/R/site-library"
> {code}
> R 4.1
> {code}
> # R
> > Sys.getenv("R_LIBS_SITE")
> [1] "/usr/local/lib/R/site-library:/usr/lib/R/site-library:/usr/lib/R/library"
> {code}
> {code}
> # R --vanilla
> > Sys.getenv("R_LIBS_SITE")
> [1] "/usr/local/lib/R/site-library:/usr/lib/R/site-library:/usr/lib/R/library"
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41056) Fix new R_LIBS_SITE behavior introduced in R 4.2

2022-11-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630708#comment-17630708
 ] 

Apache Spark commented on SPARK-41056:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/38570

> Fix new R_LIBS_SITE behavior introduced in R 4.2
> 
>
> Key: SPARK-41056
> URL: https://issues.apache.org/jira/browse/SPARK-41056
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> R 4.2
> {code}
> # R
> > Sys.getenv("R_LIBS_SITE")
> [1] 
> "/usr/local/lib/R/site-library/:/usr/lib/R/site-library:/usr/lib/R/library'"
> {code}
> {code}
> # R --vanilla
> > Sys.getenv("R_LIBS_SITE")
> [1] "/usr/lib/R/site-library"
> {code}
> R 4.1
> {code}
> # R
> > Sys.getenv("R_LIBS_SITE")
> [1] "/usr/local/lib/R/site-library:/usr/lib/R/site-library:/usr/lib/R/library"
> {code}
> {code}
> # R --vanilla
> > Sys.getenv("R_LIBS_SITE")
> [1] "/usr/local/lib/R/site-library:/usr/lib/R/site-library:/usr/lib/R/library"
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41056) Fix new R_LIBS_SITE behavior introduced in R 4.2

2022-11-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41056:


Assignee: (was: Apache Spark)

> Fix new R_LIBS_SITE behavior introduced in R 4.2
> 
>
> Key: SPARK-41056
> URL: https://issues.apache.org/jira/browse/SPARK-41056
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> R 4.2
> {code}
> # R
> > Sys.getenv("R_LIBS_SITE")
> [1] 
> "/usr/local/lib/R/site-library/:/usr/lib/R/site-library:/usr/lib/R/library'"
> {code}
> {code}
> # R --vanilla
> > Sys.getenv("R_LIBS_SITE")
> [1] "/usr/lib/R/site-library"
> {code}
> R 4.1
> {code}
> # R
> > Sys.getenv("R_LIBS_SITE")
> [1] "/usr/local/lib/R/site-library:/usr/lib/R/site-library:/usr/lib/R/library"
> {code}
> {code}
> # R --vanilla
> > Sys.getenv("R_LIBS_SITE")
> [1] "/usr/local/lib/R/site-library:/usr/lib/R/site-library:/usr/lib/R/library"
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41058) Removing unused code in connect

2022-11-08 Thread Deng Ziming (Jira)
Deng Ziming created SPARK-41058:
---

 Summary: Removing unused code in connect
 Key: SPARK-41058
 URL: https://issues.apache.org/jira/browse/SPARK-41058
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Deng Ziming
 Fix For: 3.4.0


There is some unused code in the `connect` module, for example an unused 
import in commands.proto and unused code in SparkConnectStreamHandler.scala.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41057) Support other data type conversion in the DataTypeProtoConverter

2022-11-08 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630706#comment-17630706
 ] 

Rui Wang commented on SPARK-41057:
--

@dengziming

Are you interested in this JIRA?

> Support other data type conversion in the DataTypeProtoConverter
> 
>
> Key: SPARK-41057
> URL: https://issues.apache.org/jira/browse/SPARK-41057
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Priority: Major
>
> In 
> https://github.com/apache/spark/blob/master/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/DataTypeProtoConverter.scala#L34
>  we only support INT, STRING, and STRUCT type conversion between Catalyst 
> and the Connect proto.
> We should be able to support all the types defined by 
> https://github.com/apache/spark/blob/master/connector/connect/src/main/protobuf/spark/connect/types.proto



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41057) Support other data type conversion in the DataTypeProtoConverter

2022-11-08 Thread Rui Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated SPARK-41057:
-
Description: 
In 
https://github.com/apache/spark/blob/master/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/DataTypeProtoConverter.scala#L34
 we only support INT, STRING, and STRUCT type conversion between Catalyst 
and the Connect proto.

We should be able to support all the types defined by 
https://github.com/apache/spark/blob/master/connector/connect/src/main/protobuf/spark/connect/types.proto
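
As a rough illustration of the shape of this work, the Scala sketch below extends a pattern-match converter with a few more Catalyst types. The object name and the string stand-ins for proto messages are hypothetical; only DataTypeProtoConverter itself and the INT/STRING/STRUCT starting point come from this ticket:

{code}
// Hypothetical sketch, not the real DataTypeProtoConverter API: a pattern
// match over Catalyst types, extended beyond the INT/STRING/STRUCT cases.
import org.apache.spark.sql.types._

object TypeConverterSketch {
  // String labels stand in for the Connect proto type messages.
  def toProtoLabel(dt: DataType): String = dt match {
    case IntegerType => "i32"     // already supported
    case StringType  => "string"  // already supported
    case LongType    => "i64"     // newly covered
    case DoubleType  => "double"  // newly covered
    case BooleanType => "bool"    // newly covered
    case ArrayType(elementType, _) => s"array<${toProtoLabel(elementType)}>"
    case StructType(fields) =>
      fields.map(f => s"${f.name}: ${toProtoLabel(f.dataType)}")
        .mkString("struct<", ", ", ">")
    case other =>
      throw new UnsupportedOperationException(s"Unsupported type: $other")
  }
}
{code}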

> Support other data type conversion in the DataTypeProtoConverter
> 
>
> Key: SPARK-41057
> URL: https://issues.apache.org/jira/browse/SPARK-41057
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Priority: Major
>
> In 
> https://github.com/apache/spark/blob/master/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/DataTypeProtoConverter.scala#L34
>  we only support INT, STRING, and STRUCT type conversion between Catalyst 
> and the Connect proto.
> We should be able to support all the types defined by 
> https://github.com/apache/spark/blob/master/connector/connect/src/main/protobuf/spark/connect/types.proto



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41057) Support other data type conversion in the DataTypeProtoConverter

2022-11-08 Thread Rui Wang (Jira)
Rui Wang created SPARK-41057:


 Summary: Support other data type conversion in the 
DataTypeProtoConverter
 Key: SPARK-41057
 URL: https://issues.apache.org/jira/browse/SPARK-41057
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Rui Wang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41056) Fix new R_LIBS_SITE behavior introduced in R 4.2

2022-11-08 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-41056:


 Summary: Fix new R_LIBS_SITE behavior introduced in R 4.2
 Key: SPARK-41056
 URL: https://issues.apache.org/jira/browse/SPARK-41056
 Project: Spark
  Issue Type: Bug
  Components: SparkR
Affects Versions: 3.4.0
Reporter: Hyukjin Kwon


R 4.2

{code}
# R
> Sys.getenv("R_LIBS_SITE")
[1] "/usr/local/lib/R/site-library/:/usr/lib/R/site-library:/usr/lib/R/library'"
{code}

{code}
# R --vanilla
> Sys.getenv("R_LIBS_SITE")
[1] "/usr/lib/R/site-library"
{code}

R 4.1

{code}
# R
> Sys.getenv("R_LIBS_SITE")
[1] "/usr/local/lib/R/site-library:/usr/lib/R/site-library:/usr/lib/R/library"
{code}

{code}
# R --vanilla
> Sys.getenv("R_LIBS_SITE")
[1] "/usr/local/lib/R/site-library:/usr/lib/R/site-library:/usr/lib/R/library"
{code}
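
One way to see what a fix has to contend with: under R 4.2 the vanilla session no longer inherits the site-library entries, so any launcher that shells out to R --vanilla has to pin the value itself. The Scala sketch below is a hedged illustration of that idea only; it is not the SparkR change, and the command lines and forwarding approach are assumptions:

{code}
// Hedged sketch: resolve R_LIBS_SITE from a plain R session, then pin it
// in the environment of the `R --vanilla` child so R 4.2 matches R 4.1.
import scala.sys.process._

object RLibsSiteForward {
  def main(args: Array[String]): Unit = {
    // A non-vanilla Rscript still reports the full site-library path.
    val resolved = Seq("Rscript", "-e", "cat(Sys.getenv('R_LIBS_SITE'))").!!.trim
    // Launch the vanilla session with the resolved value set explicitly.
    Process(
      Seq("R", "--vanilla", "-e", "Sys.getenv('R_LIBS_SITE')"),
      None,
      "R_LIBS_SITE" -> resolved).!
  }
}
{code}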




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41055) Rename _LEGACY_ERROR_TEMP_2424 to GROUP_BY_AGGREGATE

2022-11-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630699#comment-17630699
 ] 

Apache Spark commented on SPARK-41055:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/38569

> Rename _LEGACY_ERROR_TEMP_2424 to GROUP_BY_AGGREGATE
> 
>
> Key: SPARK-41055
> URL: https://issues.apache.org/jira/browse/SPARK-41055
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Give the legacy error class a proper name.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41055) Rename _LEGACY_ERROR_TEMP_2424 to GROUP_BY_AGGREGATE

2022-11-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630698#comment-17630698
 ] 

Apache Spark commented on SPARK-41055:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/38569

> Rename _LEGACY_ERROR_TEMP_2424 to GROUP_BY_AGGREGATE
> 
>
> Key: SPARK-41055
> URL: https://issues.apache.org/jira/browse/SPARK-41055
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Give the legacy error class a proper name.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41055) Rename _LEGACY_ERROR_TEMP_2424 to GROUP_BY_AGGREGATE

2022-11-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41055:


Assignee: Apache Spark

> Rename _LEGACY_ERROR_TEMP_2424 to GROUP_BY_AGGREGATE
> 
>
> Key: SPARK-41055
> URL: https://issues.apache.org/jira/browse/SPARK-41055
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>
> Give the legacy error class a proper name.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41055) Rename _LEGACY_ERROR_TEMP_2424 to GROUP_BY_AGGREGATE

2022-11-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41055:


Assignee: (was: Apache Spark)

> Rename _LEGACY_ERROR_TEMP_2424 to GROUP_BY_AGGREGATE
> 
>
> Key: SPARK-41055
> URL: https://issues.apache.org/jira/browse/SPARK-41055
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Give the legacy error class a proper name.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41046) Support CreateView in Connect DSL

2022-11-08 Thread Deng Ziming (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630696#comment-17630696
 ] 

Deng Ziming commented on SPARK-41046:
-

[~amaliujia] Aha, I'll review your code when I'm free.

> Support CreateView in Connect DSL
> -
>
> Key: SPARK-41046
> URL: https://issues.apache.org/jira/browse/SPARK-41046
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41055) Rename _LEGACY_ERROR_TEMP_2424 to GROUP_BY_AGGREGATE

2022-11-08 Thread Haejoon Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630694#comment-17630694
 ] 

Haejoon Lee commented on SPARK-41055:
-

I'm working on it

> Rename _LEGACY_ERROR_TEMP_2424 to GROUP_BY_AGGREGATE
> 
>
> Key: SPARK-41055
> URL: https://issues.apache.org/jira/browse/SPARK-41055
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Give the legacy error class a proper name.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41055) Rename _LEGACY_ERROR_TEMP_2424 to GROUP_BY_AGGREGATE

2022-11-08 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-41055:
---

 Summary: Rename _LEGACY_ERROR_TEMP_2424 to GROUP_BY_AGGREGATE
 Key: SPARK-41055
 URL: https://issues.apache.org/jira/browse/SPARK-41055
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Haejoon Lee


Give the legacy error class a proper name.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41035) Incorrect results or NPE when a literal is reused across distinct aggregations

2022-11-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41035:


Assignee: Bruce Robbins

> Incorrect results or NPE when a literal is reused across distinct aggregations
> --
>
> Key: SPARK-41035
> URL: https://issues.apache.org/jira/browse/SPARK-41035
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.2, 3.4.0, 3.3.1
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Major
>  Labels: correctness
>
> This query produces incorrect results:
> {noformat}
> select a, count(distinct 100) as cnt1, count(distinct b, 100) as cnt2
> from values (1, 2), (4, 5) as data(a, b)
> group by a;
> +---+++
> |a  |cnt1|cnt2|
> +---+++
> |1  |1   |0   |
> |4  |1   |0   |
> +---+++
> {noformat}
> The values for {{cnt2}} should be 1 and 1 (not 0 and 0).
> If you change the literal used in the first aggregate function, the second 
> aggregate function now works correctly:
> {noformat}
> select a, count(distinct 101) as cnt1, count(distinct b, 100) as cnt2
> from values (1, 2), (4, 5) as data(a, b)
> group by a;
> +---+++
> |a  |cnt1|cnt2|
> +---+++
> |1  |1   |1   |
> |4  |1   |1   |
> +---+++
> {noformat}
> The same bug causes the following query to get a NullPointerException:
> {noformat}
> select a, count(distinct 1), count_min_sketch(distinct b, 0.5d, 0.5d, 1)
> from values (1, 2), (4, 5) as data(a, b)
> group by a;
> {noformat}
> If you change the literal used in the first aggregation, then the query 
> succeeds:
> {noformat}
> select a, count(distinct 2), count_min_sketch(distinct b, 0.5d, 0.5d, 1)
> from values (1, 2), (4, 5) as data(a, b)
> group by a;
> +---+-+-+
> |a  |count(DISTINCT 2)|count_min_sketch(DISTINCT b, 0.5, 0.5, 1)  
>   
> |
> +---+-+-+
> |1  |1|[00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 01 00 00 
> 00 04 00 00 00 00 5D 8D 6A B9 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00]|
> |4  |1|[00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 01 00 00 
> 00 04 00 00 00 00 5D 8D 6A B9 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00]|
> +---+-+-+
> {noformat}
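
For anyone following along in a spark-shell rather than spark-sql, the first query above can be driven through spark.sql directly; this is only a restatement of the reproduction in the report, not new analysis:

{code}
// Reproduction of the first query above from a spark-shell.
// Affected versions print cnt2 = 0; fixed builds print cnt2 = 1.
val df = spark.sql(
  """select a, count(distinct 100) as cnt1, count(distinct b, 100) as cnt2
    |from values (1, 2), (4, 5) as data(a, b)
    |group by a""".stripMargin)
df.show()
{code}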



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-41035) Incorrect results or NPE when a literal is reused across distinct aggregations

2022-11-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41035.
--
Fix Version/s: 3.3.2
   3.2.3
   3.4.0
   Resolution: Fixed

Issue resolved by pull request 38565
[https://github.com/apache/spark/pull/38565]

> Incorrect results or NPE when a literal is reused across distinct aggregations
> --
>
> Key: SPARK-41035
> URL: https://issues.apache.org/jira/browse/SPARK-41035
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.2, 3.4.0, 3.3.1
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Major
>  Labels: correctness
> Fix For: 3.3.2, 3.2.3, 3.4.0
>
>
> This query produces incorrect results:
> {noformat}
> select a, count(distinct 100) as cnt1, count(distinct b, 100) as cnt2
> from values (1, 2), (4, 5) as data(a, b)
> group by a;
> +---+++
> |a  |cnt1|cnt2|
> +---+++
> |1  |1   |0   |
> |4  |1   |0   |
> +---+++
> {noformat}
> The values for {{cnt2}} should be 1 and 1 (not 0 and 0).
> If you change the literal used in the first aggregate function, the second 
> aggregate function now works correctly:
> {noformat}
> select a, count(distinct 101) as cnt1, count(distinct b, 100) as cnt2
> from values (1, 2), (4, 5) as data(a, b)
> group by a;
> +---+++
> |a  |cnt1|cnt2|
> +---+++
> |1  |1   |1   |
> |4  |1   |1   |
> +---+++
> {noformat}
> The same bug causes the following query to get a NullPointerException:
> {noformat}
> select a, count(distinct 1), count_min_sketch(distinct b, 0.5d, 0.5d, 1)
> from values (1, 2), (4, 5) as data(a, b)
> group by a;
> {noformat}
> If you change the literal used in the first aggregation, then the query 
> succeeds:
> {noformat}
> select a, count(distinct 2), count_min_sketch(distinct b, 0.5d, 0.5d, 1)
> from values (1, 2), (4, 5) as data(a, b)
> group by a;
> +---+-+-+
> |a  |count(DISTINCT 2)|count_min_sketch(DISTINCT b, 0.5, 0.5, 1)  
>   
> |
> +---+-+-+
> |1  |1|[00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 01 00 00 
> 00 04 00 00 00 00 5D 8D 6A B9 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00]|
> |4  |1|[00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 01 00 00 
> 00 04 00 00 00 00 5D 8D 6A B9 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00]|
> +---+-+-+
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40569) Add smoke test in standalone cluster for spark-docker

2022-11-08 Thread Yikun Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yikun Jiang resolved SPARK-40569.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 21
[https://github.com/apache/spark-docker/pull/21]

> Add smoke test in standalone cluster for spark-docker
> -
>
> Key: SPARK-40569
> URL: https://issues.apache.org/jira/browse/SPARK-40569
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Qian Sun
>Assignee: Qian Sun
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41051) Optimize ProcfsMetrics file acquisition

2022-11-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630685#comment-17630685
 ] 

Apache Spark commented on SPARK-41051:
--

User 'Narcasserun' has created a pull request for this issue:
https://github.com/apache/spark/pull/38568

> Optimize ProcfsMetrics file acquisition
> ---
>
> Key: SPARK-41051
> URL: https://issues.apache.org/jira/browse/SPARK-41051
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.1
> Environment: spark-master
>Reporter: sur
>Priority: Minor
> Fix For: 3.3.2
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> When obtaining the Procfs file, variables are created but never used, and 
> there is duplicated code. We should reduce such cases.
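
To make the cleanup concrete, here is a hedged Scala sketch of the kind of shared helper the description asks for; the object and method names are illustrative, not the actual ProcfsMetricsGetter code:

{code}
// Illustrative only: a single helper for every procfs read, so call sites
// neither duplicate the open/read logic nor create unused intermediates.
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

object ProcfsRead {
  def readProcFile(pid: Long, name: String): Option[String] = {
    val path = Paths.get("/proc", pid.toString, name)
    if (Files.isReadable(path))
      Some(new String(Files.readAllBytes(path), StandardCharsets.UTF_8))
    else None
  }
}
{code}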



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40852) Implement `DataFrame.summary`

2022-11-08 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-40852.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38318
[https://github.com/apache/spark/pull/38318]

> Implement `DataFrame.summary`
> -
>
> Key: SPARK-40852
> URL: https://issues.apache.org/jira/browse/SPARK-40852
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41047) round function with negative scale value runs failed

2022-11-08 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630676#comment-17630676
 ] 

Hyukjin Kwon commented on SPARK-41047:
--

cc [~wuyi] [~cloud_fan]. This seems to stem from 
https://github.com/apache/spark/commit/ff39c9271ca04951b045c5d9fca2128a82d50b46 
(https://issues.apache.org/jira/browse/SPARK-30252).

> round function with negative scale value runs failed
> 
>
> Key: SPARK-41047
> URL: https://issues.apache.org/jira/browse/SPARK-41047
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.2
>Reporter: liyang
>Priority: Major
>
> Run this SQL in spark-sql: select round(1.233, -1);
> Error: org.apache.spark.sql.AnalysisException: Negative scale is not allowed: 
> -1. You can use spark.sql.legacy.allowNegativeScaleOfDecimal=true to enable 
> legacy mode to allow it.; 
> But the documentation of the round function implies that a negative scale 
> argument is allowed.
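
Until the behavior question is settled, the workaround named in the error message above can be tried from a spark-shell. This is a hedged sketch, assuming the legacy conf can be set at runtime (some versions may require setting it at session start):

{code}
// Hedged sketch of the workaround suggested by the error message above.
// Assumption: spark.sql.legacy.allowNegativeScaleOfDecimal is settable at runtime.
spark.conf.set("spark.sql.legacy.allowNegativeScaleOfDecimal", "true")
spark.sql("select round(1.233, -1)").show()
// With negative scale allowed, 1.233 rounded to the tens place yields 0.
{code}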



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41050) Upgrade scalafmt from 3.5.9 to 3.6.1

2022-11-08 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-41050:
-
Priority: Trivial  (was: Minor)

>  Upgrade scalafmt from 3.5.9 to 3.6.1
> -
>
> Key: SPARK-41050
> URL: https://issues.apache.org/jira/browse/SPARK-41050
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Trivial
> Fix For: 3.4.0
>
>
> v3.6.1 release notes:
> https://github.com/scalameta/scalafmt/releases



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


