[jira] [Resolved] (SPARK-40992) Support toDF(columnNames) in Connect DSL
[ https://issues.apache.org/jira/browse/SPARK-40992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-40992. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38475 [https://github.com/apache/spark/pull/38475] > Support toDF(columnNames) in Connect DSL > > > Key: SPARK-40992 > URL: https://issues.apache.org/jira/browse/SPARK-40992 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40992) Support toDF(columnNames) in Connect DSL
[ https://issues.apache.org/jira/browse/SPARK-40992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-40992: --- Assignee: Rui Wang > Support toDF(columnNames) in Connect DSL > > > Key: SPARK-40992 > URL: https://issues.apache.org/jira/browse/SPARK-40992 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
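For context on SPARK-40992, the Connect DSL change mirrors the existing toDF(columnNames) DataFrame API. A minimal PySpark sketch of the semantics, on made-up data:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(3).selectExpr("id", "id * 2 AS doubled")

# Rename all columns positionally in one call.
renamed = df.toDF("x", "y")
renamed.printSchema()
{code}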
[jira] [Assigned] (SPARK-41056) Fix new R_LIBS_SITE behavior introduced in R 4.2
[ https://issues.apache.org/jira/browse/SPARK-41056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41056: Assignee: Hyukjin Kwon > Fix new R_LIBS_SITE behavior introduced in R 4.2 > > > Key: SPARK-41056 > URL: https://issues.apache.org/jira/browse/SPARK-41056 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > R 4.2 > {code} > # R > > Sys.getenv("R_LIBS_SITE") > [1] > "/usr/local/lib/R/site-library/:/usr/lib/R/site-library:/usr/lib/R/library'" > {code} > {code} > # R --vanilla > > Sys.getenv("R_LIBS_SITE") > [1] "/usr/lib/R/site-library" > {code} > R 4.1 > {code} > # R > > Sys.getenv("R_LIBS_SITE") > [1] "/usr/local/lib/R/site-library:/usr/lib/R/site-library:/usr/lib/R/library" > {code} > {code} > # R --vanilla > > Sys.getenv("R_LIBS_SITE") > [1] "/usr/local/lib/R/site-library:/usr/lib/R/site-library:/usr/lib/R/library" > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41056) Fix new R_LIBS_SITE behavior introduced in R 4.2
[ https://issues.apache.org/jira/browse/SPARK-41056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41056. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38570 [https://github.com/apache/spark/pull/38570] > Fix new R_LIBS_SITE behavior introduced in R 4.2 > > > Key: SPARK-41056 > URL: https://issues.apache.org/jira/browse/SPARK-41056 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > > R 4.2 > {code} > # R > > Sys.getenv("R_LIBS_SITE") > [1] > "/usr/local/lib/R/site-library/:/usr/lib/R/site-library:/usr/lib/R/library'" > {code} > {code} > # R --vanilla > > Sys.getenv("R_LIBS_SITE") > [1] "/usr/lib/R/site-library" > {code} > R 4.1 > {code} > # R > > Sys.getenv("R_LIBS_SITE") > [1] "/usr/local/lib/R/site-library:/usr/lib/R/site-library:/usr/lib/R/library" > {code} > {code} > # R --vanilla > > Sys.getenv("R_LIBS_SITE") > [1] "/usr/local/lib/R/site-library:/usr/lib/R/site-library:/usr/lib/R/library" > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41073) Spark ThriftServer generates huge numbers of DelegationTokens
zhengchenyu created SPARK-41073: --- Summary: Spark ThriftServer generates huge numbers of DelegationTokens Key: SPARK-41073 URL: https://issues.apache.org/jira/browse/SPARK-41073 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.1 Reporter: zhengchenyu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41058) Removing unused code in connect
[ https://issues.apache.org/jira/browse/SPARK-41058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-41058: --- Assignee: Deng Ziming > Removing unused code in connect > --- > > Key: SPARK-41058 > URL: https://issues.apache.org/jira/browse/SPARK-41058 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Deng Ziming >Assignee: Deng Ziming >Priority: Minor > Fix For: 3.4.0 > > > There is some unused code in the `connect` module, for example an unused import in > commands.proto and unused code in SparkConnectStreamHandler.scala -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41058) Removing unused code in connect
[ https://issues.apache.org/jira/browse/SPARK-41058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-41058. - Resolution: Fixed Issue resolved by pull request 38491 [https://github.com/apache/spark/pull/38491] > Removing unused code in connect > --- > > Key: SPARK-41058 > URL: https://issues.apache.org/jira/browse/SPARK-41058 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Deng Ziming >Assignee: Deng Ziming >Priority: Minor > Fix For: 3.4.0 > > > There is some unused code in the `connect` module, for example an unused import in > commands.proto and unused code in SparkConnectStreamHandler.scala -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41064) Implement `DataFrame.crosstab` and `DataFrame.stat.crosstab`
[ https://issues.apache.org/jira/browse/SPARK-41064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630780#comment-17630780 ] Apache Spark commented on SPARK-41064: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/38578 > Implement `DataFrame.crosstab` and `DataFrame.stat.crosstab` > > > Key: SPARK-41064 > URL: https://issues.apache.org/jira/browse/SPARK-41064 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41064) Implement `DataFrame.crosstab` and `DataFrame.stat.crosstab`
[ https://issues.apache.org/jira/browse/SPARK-41064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41064: Assignee: Ruifeng Zheng (was: Apache Spark) > Implement `DataFrame.crosstab` and `DataFrame.stat.crosstab` > > > Key: SPARK-41064 > URL: https://issues.apache.org/jira/browse/SPARK-41064 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41064) Implement `DataFrame.crosstab` and `DataFrame.stat.crosstab`
[ https://issues.apache.org/jira/browse/SPARK-41064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630779#comment-17630779 ] Apache Spark commented on SPARK-41064: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/38578 > Implement `DataFrame.crosstab` and `DataFrame.stat.crosstab` > > > Key: SPARK-41064 > URL: https://issues.apache.org/jira/browse/SPARK-41064 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41064) Implement `DataFrame.crosstab` and `DataFrame.stat.crosstab`
[ https://issues.apache.org/jira/browse/SPARK-41064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41064: Assignee: Apache Spark (was: Ruifeng Zheng) > Implement `DataFrame.crosstab` and `DataFrame.stat.crosstab` > > > Key: SPARK-41064 > URL: https://issues.apache.org/jira/browse/SPARK-41064 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
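For reference on SPARK-41064, a minimal sketch of the existing PySpark API being implemented for Connect; the sample data is illustrative only:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a"), (1, "b"), (2, "a")], ["k", "v"])

# Contingency table of k vs. v; DataFrame.crosstab and
# DataFrame.stat.crosstab call the same implementation.
df.crosstab("k", "v").show()
df.stat.crosstab("k", "v").show()
{code}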
[jira] [Created] (SPARK-41072) Convert the internal error about failed stream to user-facing error
Max Gekk created SPARK-41072: Summary: Convert the internal error about failed stream to user-facing error Key: SPARK-41072 URL: https://issues.apache.org/jira/browse/SPARK-41072 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: Max Gekk Assign an error class to the following internal error since it is a user-facing error: {code} java.lang.Exception: org.apache.spark.sql.streaming.StreamingQueryException: Query cloudtrail_pipeline [id = 5a3758c3-3b3a-47ff-843a-23292cde3b4f, runId = c1a90694-daa2-4929-b749-82b8a43fa2b1] terminated with exception: [INTERNAL_ERROR] Execution of the stream cloudtrail_pipeline failed. Please, fill a bug report in, and provide the full stack trace. 2 at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:403) 3 at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.$anonfun$run$4(StreamExecution.scala:269) 4 at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) 5 at com.databricks.unity.EmptyHandle$.runWithAndClose(UCSHandle.scala:42) 6 at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:269) 7Caused by: java.lang.Exception: org.apache.spark.SparkException: [INTERNAL_ERROR] Execution of the stream cloudtrail_pipeline failed. Please, fill a bug report in, and provide the full stack trace. 8 at {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
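Illustrative only: a local-session sketch of the kind of stream failure whose message SPARK-41072 makes user-facing. The rate source and the deliberate RuntimeError are stand-ins for any failing user code inside a stream:

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.utils import StreamingQueryException

spark = SparkSession.builder.master("local[2]").getOrCreate()

def fail(batch_df, batch_id):
    raise RuntimeError("failure in user code")  # any user error inside the stream

query = (spark.readStream.format("rate").load()
         .writeStream.foreachBatch(fail).start())
try:
    query.awaitTermination()
except StreamingQueryException as e:
    # Before this ticket, failures like this surfaced wrapped in the
    # generic [INTERNAL_ERROR] message quoted above.
    print(e)
{code}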
[jira] [Commented] (SPARK-41071) Metaspace OOM when Local run dev/make-distribution.sh
[ https://issues.apache.org/jira/browse/SPARK-41071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630772#comment-17630772 ] Apache Spark commented on SPARK-41071: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/38577 > Metaspace OOM when Local run dev/make-distribution.sh > -- > > Key: SPARK-41071 > URL: https://issues.apache.org/jira/browse/SPARK-41071 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > Run > {code:java} > dev/make-distribution.sh --tgz -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn > -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive > {code} > {code:java} > [ERROR] ## Exception when compiling 19 sources to > /Users/yangjie01/SourceCode/git/spark-mine-12/connector/avro/target/scala-2.12/classes > java.lang.OutOfMemoryError: Metaspace > java.lang.ClassLoader.defineClass1(Native Method) > java.lang.ClassLoader.defineClass(ClassLoader.java:757) > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) > java.net.URLClassLoader.defineClass(URLClassLoader.java:473) > java.net.URLClassLoader.access$100(URLClassLoader.java:74) > java.net.URLClassLoader$1.run(URLClassLoader.java:369) > java.net.URLClassLoader$1.run(URLClassLoader.java:363) > java.security.AccessController.doPrivileged(Native Method) > java.net.URLClassLoader.findClass(URLClassLoader.java:362) > java.lang.ClassLoader.loadClass(ClassLoader.java:419) > scala_maven.ScalaCompilerLoader.loadClass(ScalaCompilerLoader.java:44) > java.lang.ClassLoader.loadClass(ClassLoader.java:352) > scala.collection.immutable.Set$Set2.$plus(Set.scala:170) > scala.collection.immutable.Set$Set2.$plus(Set.scala:164) > scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:28) > scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:24) > scala.collection.TraversableLike$grouper$1$.apply(TraversableLike.scala:467) > scala.collection.TraversableLike$grouper$1$.apply(TraversableLike.scala:455) > scala.collection.mutable.HashMap$$anon$1.$anonfun$foreach$2(HashMap.scala:153) > scala.collection.mutable.HashMap$$anon$1$$Lambda$68804/294594059.apply(Unknown > Source) > scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237) > scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230) > scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44) > scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:153) > scala.collection.TraversableLike.groupBy(TraversableLike.scala:524) > scala.collection.TraversableLike.groupBy$(TraversableLike.scala:454) > scala.collection.AbstractTraversable.groupBy(Traversable.scala:108) > scala.tools.nsc.settings.Warnings.$init$(Warnings.scala:91) > scala.tools.nsc.settings.MutableSettings.(MutableSettings.scala:28) > scala.tools.nsc.Settings.(Settings.scala:19) > scala.tools.nsc.Settings.(Settings.scala:20) > xsbt.CachedCompiler0.(CompilerBridge.scala:79) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41071) Metaspace OOM when Local run dev/make-distribution.sh
[ https://issues.apache.org/jira/browse/SPARK-41071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41071: Assignee: (was: Apache Spark) > Metaspace OOM when Local run dev/make-distribution.sh > -- > > Key: SPARK-41071 > URL: https://issues.apache.org/jira/browse/SPARK-41071 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > Run > {code:java} > dev/make-distribution.sh --tgz -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn > -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive > {code} > {code:java} > [ERROR] ## Exception when compiling 19 sources to > /Users/yangjie01/SourceCode/git/spark-mine-12/connector/avro/target/scala-2.12/classes > java.lang.OutOfMemoryError: Metaspace > java.lang.ClassLoader.defineClass1(Native Method) > java.lang.ClassLoader.defineClass(ClassLoader.java:757) > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) > java.net.URLClassLoader.defineClass(URLClassLoader.java:473) > java.net.URLClassLoader.access$100(URLClassLoader.java:74) > java.net.URLClassLoader$1.run(URLClassLoader.java:369) > java.net.URLClassLoader$1.run(URLClassLoader.java:363) > java.security.AccessController.doPrivileged(Native Method) > java.net.URLClassLoader.findClass(URLClassLoader.java:362) > java.lang.ClassLoader.loadClass(ClassLoader.java:419) > scala_maven.ScalaCompilerLoader.loadClass(ScalaCompilerLoader.java:44) > java.lang.ClassLoader.loadClass(ClassLoader.java:352) > scala.collection.immutable.Set$Set2.$plus(Set.scala:170) > scala.collection.immutable.Set$Set2.$plus(Set.scala:164) > scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:28) > scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:24) > scala.collection.TraversableLike$grouper$1$.apply(TraversableLike.scala:467) > scala.collection.TraversableLike$grouper$1$.apply(TraversableLike.scala:455) > scala.collection.mutable.HashMap$$anon$1.$anonfun$foreach$2(HashMap.scala:153) > scala.collection.mutable.HashMap$$anon$1$$Lambda$68804/294594059.apply(Unknown > Source) > scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237) > scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230) > scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44) > scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:153) > scala.collection.TraversableLike.groupBy(TraversableLike.scala:524) > scala.collection.TraversableLike.groupBy$(TraversableLike.scala:454) > scala.collection.AbstractTraversable.groupBy(Traversable.scala:108) > scala.tools.nsc.settings.Warnings.$init$(Warnings.scala:91) > scala.tools.nsc.settings.MutableSettings.(MutableSettings.scala:28) > scala.tools.nsc.Settings.(Settings.scala:19) > scala.tools.nsc.Settings.(Settings.scala:20) > xsbt.CachedCompiler0.(CompilerBridge.scala:79) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41071) Metaspace OOM when Local run dev/make-distribution.sh
[ https://issues.apache.org/jira/browse/SPARK-41071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41071: Assignee: Apache Spark > Metaspace OOM when Local run dev/make-distribution.sh > -- > > Key: SPARK-41071 > URL: https://issues.apache.org/jira/browse/SPARK-41071 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > > Run > {code:java} > dev/make-distribution.sh --tgz -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn > -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive > {code} > {code:java} > [ERROR] ## Exception when compiling 19 sources to > /Users/yangjie01/SourceCode/git/spark-mine-12/connector/avro/target/scala-2.12/classes > java.lang.OutOfMemoryError: Metaspace > java.lang.ClassLoader.defineClass1(Native Method) > java.lang.ClassLoader.defineClass(ClassLoader.java:757) > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) > java.net.URLClassLoader.defineClass(URLClassLoader.java:473) > java.net.URLClassLoader.access$100(URLClassLoader.java:74) > java.net.URLClassLoader$1.run(URLClassLoader.java:369) > java.net.URLClassLoader$1.run(URLClassLoader.java:363) > java.security.AccessController.doPrivileged(Native Method) > java.net.URLClassLoader.findClass(URLClassLoader.java:362) > java.lang.ClassLoader.loadClass(ClassLoader.java:419) > scala_maven.ScalaCompilerLoader.loadClass(ScalaCompilerLoader.java:44) > java.lang.ClassLoader.loadClass(ClassLoader.java:352) > scala.collection.immutable.Set$Set2.$plus(Set.scala:170) > scala.collection.immutable.Set$Set2.$plus(Set.scala:164) > scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:28) > scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:24) > scala.collection.TraversableLike$grouper$1$.apply(TraversableLike.scala:467) > scala.collection.TraversableLike$grouper$1$.apply(TraversableLike.scala:455) > scala.collection.mutable.HashMap$$anon$1.$anonfun$foreach$2(HashMap.scala:153) > scala.collection.mutable.HashMap$$anon$1$$Lambda$68804/294594059.apply(Unknown > Source) > scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237) > scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230) > scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44) > scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:153) > scala.collection.TraversableLike.groupBy(TraversableLike.scala:524) > scala.collection.TraversableLike.groupBy$(TraversableLike.scala:454) > scala.collection.AbstractTraversable.groupBy(Traversable.scala:108) > scala.tools.nsc.settings.Warnings.$init$(Warnings.scala:91) > scala.tools.nsc.settings.MutableSettings.(MutableSettings.scala:28) > scala.tools.nsc.Settings.(Settings.scala:19) > scala.tools.nsc.Settings.(Settings.scala:20) > xsbt.CachedCompiler0.(CompilerBridge.scala:79) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41071) Metaspace OOM when Local run dev/make-distribution.sh
[ https://issues.apache.org/jira/browse/SPARK-41071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630768#comment-17630768 ] Apache Spark commented on SPARK-41071: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/38577 > Metaspace OOM when Local run dev/make-distribution.sh > -- > > Key: SPARK-41071 > URL: https://issues.apache.org/jira/browse/SPARK-41071 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > Run > {code:java} > dev/make-distribution.sh --tgz -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn > -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive > {code} > {code:java} > [ERROR] ## Exception when compiling 19 sources to > /Users/yangjie01/SourceCode/git/spark-mine-12/connector/avro/target/scala-2.12/classes > java.lang.OutOfMemoryError: Metaspace > java.lang.ClassLoader.defineClass1(Native Method) > java.lang.ClassLoader.defineClass(ClassLoader.java:757) > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) > java.net.URLClassLoader.defineClass(URLClassLoader.java:473) > java.net.URLClassLoader.access$100(URLClassLoader.java:74) > java.net.URLClassLoader$1.run(URLClassLoader.java:369) > java.net.URLClassLoader$1.run(URLClassLoader.java:363) > java.security.AccessController.doPrivileged(Native Method) > java.net.URLClassLoader.findClass(URLClassLoader.java:362) > java.lang.ClassLoader.loadClass(ClassLoader.java:419) > scala_maven.ScalaCompilerLoader.loadClass(ScalaCompilerLoader.java:44) > java.lang.ClassLoader.loadClass(ClassLoader.java:352) > scala.collection.immutable.Set$Set2.$plus(Set.scala:170) > scala.collection.immutable.Set$Set2.$plus(Set.scala:164) > scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:28) > scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:24) > scala.collection.TraversableLike$grouper$1$.apply(TraversableLike.scala:467) > scala.collection.TraversableLike$grouper$1$.apply(TraversableLike.scala:455) > scala.collection.mutable.HashMap$$anon$1.$anonfun$foreach$2(HashMap.scala:153) > scala.collection.mutable.HashMap$$anon$1$$Lambda$68804/294594059.apply(Unknown > Source) > scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237) > scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230) > scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44) > scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:153) > scala.collection.TraversableLike.groupBy(TraversableLike.scala:524) > scala.collection.TraversableLike.groupBy$(TraversableLike.scala:454) > scala.collection.AbstractTraversable.groupBy(Traversable.scala:108) > scala.tools.nsc.settings.Warnings.$init$(Warnings.scala:91) > scala.tools.nsc.settings.MutableSettings.(MutableSettings.scala:28) > scala.tools.nsc.Settings.(Settings.scala:19) > scala.tools.nsc.Settings.(Settings.scala:20) > xsbt.CachedCompiler0.(CompilerBridge.scala:79) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41071) Metaspace OOM when Local run dev/make-distribution.sh
[ https://issues.apache.org/jira/browse/SPARK-41071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630761#comment-17630761 ] Yang Jie commented on SPARK-41071: -- cc [~hyukjin.kwon] [~yumwang] Running {code:java} dev/make-distribution.sh --tgz -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive {code} on the master branch always fails. Also cc [~pancheng] > Metaspace OOM when Local run dev/make-distribution.sh > -- > > Key: SPARK-41071 > URL: https://issues.apache.org/jira/browse/SPARK-41071 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > Run > {code:java} > dev/make-distribution.sh --tgz -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn > -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive > {code} > {code:java} > [ERROR] ## Exception when compiling 19 sources to > /Users/yangjie01/SourceCode/git/spark-mine-12/connector/avro/target/scala-2.12/classes > java.lang.OutOfMemoryError: Metaspace > java.lang.ClassLoader.defineClass1(Native Method) > java.lang.ClassLoader.defineClass(ClassLoader.java:757) > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) > java.net.URLClassLoader.defineClass(URLClassLoader.java:473) > java.net.URLClassLoader.access$100(URLClassLoader.java:74) > java.net.URLClassLoader$1.run(URLClassLoader.java:369) > java.net.URLClassLoader$1.run(URLClassLoader.java:363) > java.security.AccessController.doPrivileged(Native Method) > java.net.URLClassLoader.findClass(URLClassLoader.java:362) > java.lang.ClassLoader.loadClass(ClassLoader.java:419) > scala_maven.ScalaCompilerLoader.loadClass(ScalaCompilerLoader.java:44) > java.lang.ClassLoader.loadClass(ClassLoader.java:352) > scala.collection.immutable.Set$Set2.$plus(Set.scala:170) > scala.collection.immutable.Set$Set2.$plus(Set.scala:164) > scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:28) > scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:24) > scala.collection.TraversableLike$grouper$1$.apply(TraversableLike.scala:467) > scala.collection.TraversableLike$grouper$1$.apply(TraversableLike.scala:455) > scala.collection.mutable.HashMap$$anon$1.$anonfun$foreach$2(HashMap.scala:153) > scala.collection.mutable.HashMap$$anon$1$$Lambda$68804/294594059.apply(Unknown > Source) > scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237) > scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230) > scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44) > scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:153) > scala.collection.TraversableLike.groupBy(TraversableLike.scala:524) > scala.collection.TraversableLike.groupBy$(TraversableLike.scala:454) > scala.collection.AbstractTraversable.groupBy(Traversable.scala:108) > scala.tools.nsc.settings.Warnings.$init$(Warnings.scala:91) > scala.tools.nsc.settings.MutableSettings.(MutableSettings.scala:28) > scala.tools.nsc.Settings.(Settings.scala:19) > scala.tools.nsc.Settings.(Settings.scala:20) > xsbt.CachedCompiler0.(CompilerBridge.scala:79) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41062) Rename UNSUPPORTED_CORRELATED_REFERENCE to CORRELATED_REFERENCE
[ https://issues.apache.org/jira/browse/SPARK-41062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630760#comment-17630760 ] Apache Spark commented on SPARK-41062: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/38576 > Rename UNSUPPORTED_CORRELATED_REFERENCE to CORRELATED_REFERENCE > --- > > Key: SPARK-41062 > URL: https://issues.apache.org/jira/browse/SPARK-41062 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > We should use a clear and brief name for every error class. > This sub-error class duplicates the main class. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41062) Rename UNSUPPORTED_CORRELATED_REFERENCE to CORRELATED_REFERENCE
[ https://issues.apache.org/jira/browse/SPARK-41062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41062: Assignee: (was: Apache Spark) > Rename UNSUPPORTED_CORRELATED_REFERENCE to CORRELATED_REFERENCE > --- > > Key: SPARK-41062 > URL: https://issues.apache.org/jira/browse/SPARK-41062 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > We should use a clear and brief name for every error class. > This sub-error class duplicates the main class. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41062) Rename UNSUPPORTED_CORRELATED_REFERENCE to CORRELATED_REFERENCE
[ https://issues.apache.org/jira/browse/SPARK-41062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41062: Assignee: Apache Spark > Rename UNSUPPORTED_CORRELATED_REFERENCE to CORRELATED_REFERENCE > --- > > Key: SPARK-41062 > URL: https://issues.apache.org/jira/browse/SPARK-41062 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > > We should use a clear and brief name for every error class. > This sub-error class duplicates the main class. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41071) Metaspace OOM when Local run dev/make-distribution.sh
[ https://issues.apache.org/jira/browse/SPARK-41071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-41071: - Summary: Metaspace OOM when Local run dev/make-distribution.sh (was: Metaspace OOm when Local run dev/make-distribution.sh ) > Metaspace OOM when Local run dev/make-distribution.sh > -- > > Key: SPARK-41071 > URL: https://issues.apache.org/jira/browse/SPARK-41071 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > Run > {code:java} > dev/make-distribution.sh --tgz -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn > -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive > {code} > {code:java} > [ERROR] ## Exception when compiling 19 sources to > /Users/yangjie01/SourceCode/git/spark-mine-12/connector/avro/target/scala-2.12/classes > java.lang.OutOfMemoryError: Metaspace > java.lang.ClassLoader.defineClass1(Native Method) > java.lang.ClassLoader.defineClass(ClassLoader.java:757) > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) > java.net.URLClassLoader.defineClass(URLClassLoader.java:473) > java.net.URLClassLoader.access$100(URLClassLoader.java:74) > java.net.URLClassLoader$1.run(URLClassLoader.java:369) > java.net.URLClassLoader$1.run(URLClassLoader.java:363) > java.security.AccessController.doPrivileged(Native Method) > java.net.URLClassLoader.findClass(URLClassLoader.java:362) > java.lang.ClassLoader.loadClass(ClassLoader.java:419) > scala_maven.ScalaCompilerLoader.loadClass(ScalaCompilerLoader.java:44) > java.lang.ClassLoader.loadClass(ClassLoader.java:352) > scala.collection.immutable.Set$Set2.$plus(Set.scala:170) > scala.collection.immutable.Set$Set2.$plus(Set.scala:164) > scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:28) > scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:24) > scala.collection.TraversableLike$grouper$1$.apply(TraversableLike.scala:467) > scala.collection.TraversableLike$grouper$1$.apply(TraversableLike.scala:455) > scala.collection.mutable.HashMap$$anon$1.$anonfun$foreach$2(HashMap.scala:153) > scala.collection.mutable.HashMap$$anon$1$$Lambda$68804/294594059.apply(Unknown > Source) > scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237) > scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230) > scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44) > scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:153) > scala.collection.TraversableLike.groupBy(TraversableLike.scala:524) > scala.collection.TraversableLike.groupBy$(TraversableLike.scala:454) > scala.collection.AbstractTraversable.groupBy(Traversable.scala:108) > scala.tools.nsc.settings.Warnings.$init$(Warnings.scala:91) > scala.tools.nsc.settings.MutableSettings.(MutableSettings.scala:28) > scala.tools.nsc.Settings.(Settings.scala:19) > scala.tools.nsc.Settings.(Settings.scala:20) > xsbt.CachedCompiler0.(CompilerBridge.scala:79) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41071) Metaspace OOm when Local run dev/make-distribution.sh
Yang Jie created SPARK-41071: Summary: Metaspace OOm when Local run dev/make-distribution.sh Key: SPARK-41071 URL: https://issues.apache.org/jira/browse/SPARK-41071 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.4.0 Reporter: Yang Jie Run {code:java} dev/make-distribution.sh --tgz -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive {code} {code:java} [ERROR] ## Exception when compiling 19 sources to /Users/yangjie01/SourceCode/git/spark-mine-12/connector/avro/target/scala-2.12/classes java.lang.OutOfMemoryError: Metaspace java.lang.ClassLoader.defineClass1(Native Method) java.lang.ClassLoader.defineClass(ClassLoader.java:757) java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) java.net.URLClassLoader.defineClass(URLClassLoader.java:473) java.net.URLClassLoader.access$100(URLClassLoader.java:74) java.net.URLClassLoader$1.run(URLClassLoader.java:369) java.net.URLClassLoader$1.run(URLClassLoader.java:363) java.security.AccessController.doPrivileged(Native Method) java.net.URLClassLoader.findClass(URLClassLoader.java:362) java.lang.ClassLoader.loadClass(ClassLoader.java:419) scala_maven.ScalaCompilerLoader.loadClass(ScalaCompilerLoader.java:44) java.lang.ClassLoader.loadClass(ClassLoader.java:352) scala.collection.immutable.Set$Set2.$plus(Set.scala:170) scala.collection.immutable.Set$Set2.$plus(Set.scala:164) scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:28) scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:24) scala.collection.TraversableLike$grouper$1$.apply(TraversableLike.scala:467) scala.collection.TraversableLike$grouper$1$.apply(TraversableLike.scala:455) scala.collection.mutable.HashMap$$anon$1.$anonfun$foreach$2(HashMap.scala:153) scala.collection.mutable.HashMap$$anon$1$$Lambda$68804/294594059.apply(Unknown Source) scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237) scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230) scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44) scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:153) scala.collection.TraversableLike.groupBy(TraversableLike.scala:524) scala.collection.TraversableLike.groupBy$(TraversableLike.scala:454) scala.collection.AbstractTraversable.groupBy(Traversable.scala:108) scala.tools.nsc.settings.Warnings.$init$(Warnings.scala:91) scala.tools.nsc.settings.MutableSettings.(MutableSettings.scala:28) scala.tools.nsc.Settings.(Settings.scala:19) scala.tools.nsc.Settings.(Settings.scala:20) xsbt.CachedCompiler0.(CompilerBridge.scala:79) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41070) Performance issue when Spark SQL connects with Teradata
[ https://issues.apache.org/jira/browse/SPARK-41070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramakrishna updated SPARK-41070: Description: We are connecting to Teradata from Spark SQL with the below API: {color:#ff8b00}Dataset<Row> jdbcDF = spark.read().jdbc(connectionUrl, tableQuery, connectionProperties);{color} We are facing one issue: when we execute the above logic on a large table with a million rows, we see the below extra query executing every time, resulting in a performance hit on the DB. We got the below information from the DBA; we don't have any logs on Spark SQL. SELECT 1 FROM ONE_MILLION_ROWS_TABLE; |1| |1| |1| |1| |1| |1| |1| |1| |1| |1| Can you please clarify why this query is executing, or is there any chance that this type of query is executing from our own code while checking the row count of the dataframe? Please provide your inputs on this. was: We are connecting to Teradata from Spark SQL with the below API: {color:#ff8b00}Dataset<Row> jdbcDF = spark.read().jdbc(connectionUrl, tableQuery, connectionProperties);{color} We are facing one issue: when we execute the above logic on a large table with a million rows, we see the below extra query executing every time, resulting in a performance hit on the DB. We got the below information from the DBA; we don't have any logs on Spark SQL. SELECT 1 FROM ONE_MILLION_ROWS_TABLE; |1| |1| |1| |1| |1| |1| |1| |1| |1| |1| Can you please clarify why this query is executing, or is there any chance that this query is executing from our own code while checking the row count of the dataframe? Please provide your inputs on this. > Performance issue when Spark SQL connects with Teradata > > > Key: SPARK-41070 > URL: https://issues.apache.org/jira/browse/SPARK-41070 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.4 >Reporter: Ramakrishna >Priority: Major > > We are connecting to Teradata from Spark SQL with the below API: > {color:#ff8b00}Dataset<Row> jdbcDF = spark.read().jdbc(connectionUrl, > tableQuery, connectionProperties);{color} > We are facing one issue: when we execute the above logic on a large table > with a million rows, we see the below extra query executing every time, > resulting in a performance hit on the DB. > We got the below information from the DBA; we don't have any logs on Spark SQL. > SELECT 1 FROM ONE_MILLION_ROWS_TABLE; > |1| > |1| > |1| > |1| > |1| > |1| > |1| > |1| > |1| > |1| > > Can you please clarify why this query is executing, or is there any chance > that this type of query is executing from our own code while checking the > row count of the dataframe? > > Please provide your inputs on this. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41070) Performance issue when Spark SQL connects with Teradata
[ https://issues.apache.org/jira/browse/SPARK-41070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramakrishna updated SPARK-41070: Description: We are connecting to Teradata from Spark SQL with the below API: {color:#ff8b00}Dataset<Row> jdbcDF = spark.read().jdbc(connectionUrl, tableQuery, connectionProperties);{color} We are facing one issue: when we execute the above logic on a large table with a million rows, we see the below extra query executing every time, resulting in a performance hit on the DB. We got the below information from the DBA; we don't have any logs on Spark SQL. SELECT 1 FROM ONE_MILLION_ROWS_TABLE; |1| |1| |1| |1| |1| |1| |1| |1| |1| |1| Can you please clarify why this query is executing, or is there any chance that this query is executing from our own code while checking the row count of the dataframe? Please provide your inputs on this. was: We are connecting to Teradata from Spark SQL with the below API: Dataset<Row> jdbcDF = spark.read().jdbc(connectionUrl, tableQuery, connectionProperties); We are facing one issue: when we execute this logic on a large table with a million rows, we see the below extra query executing every time, resulting in a performance hit on the DB. We got the below information from the DBA; we don't have any logs on Spark SQL. SELECT 1 FROM ONE_MILLION_ROWS_TABLE; |1| |1| |1| |1| |1| |1| |1| |1| |1| |1| Can you please clarify why this query is executing, or is there any chance that this query is executing from our own code while checking the row count of the dataframe? Please provide your inputs on this. > Performance issue when Spark SQL connects with Teradata > > > Key: SPARK-41070 > URL: https://issues.apache.org/jira/browse/SPARK-41070 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.4 >Reporter: Ramakrishna >Priority: Major > > We are connecting to Teradata from Spark SQL with the below API: > {color:#ff8b00}Dataset<Row> jdbcDF = spark.read().jdbc(connectionUrl, > tableQuery, connectionProperties);{color} > > We are facing one issue: when we execute the above logic on a large table > with a million rows, we see the below extra query executing every time, > resulting in a performance hit on the DB. > We got the below information from the DBA; we don't have any logs on Spark SQL. > SELECT 1 FROM ONE_MILLION_ROWS_TABLE; > |1| > |1| > |1| > |1| > |1| > |1| > |1| > |1| > |1| > |1| > > Can you please clarify why this query is executing, or is there any chance > that this query is executing from our own code while checking the row count > of the dataframe? > > Please provide your inputs on this. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41070) Performance issue when Spark SQL connects with Teradata
Ramakrishna created SPARK-41070: --- Summary: Performance issue when Spark SQL connects with Teradata Key: SPARK-41070 URL: https://issues.apache.org/jira/browse/SPARK-41070 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.4 Reporter: Ramakrishna We are connecting to Teradata from Spark SQL with the below API: Dataset<Row> jdbcDF = spark.read().jdbc(connectionUrl, tableQuery, connectionProperties); We are facing one issue: when we execute this logic on a large table with a million rows, we see the below extra query executing every time, resulting in a performance hit on the DB. We got the below information from the DBA; we don't have any logs on Spark SQL. SELECT 1 FROM ONE_MILLION_ROWS_TABLE; |1| |1| |1| |1| |1| |1| |1| |1| |1| |1| Can you please clarify why this query is executing, or is there any chance that this query is executing from our own code while checking the row count of the dataframe? Please provide your inputs on this. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
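For illustration, a hypothetical PySpark equivalent of the Java call in SPARK-41070; the URL, table, and credentials are placeholders, not values from the report, and the Teradata JDBC driver is assumed to be on the classpath. One thing worth checking on the reporter's side: every DataFrame action re-runs the JDBC read, so counting rows without caching issues a fresh query against the source database each time.

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.read.jdbc(
    url="jdbc:teradata://db-host/DATABASE=mydb",  # placeholder URL
    table="ONE_MILLION_ROWS_TABLE",
    properties={"user": "user", "password": "password"},
)

# Triggers a query against the database; cache() the DataFrame first if it
# will be reused for both counting and further processing.
print(df.count())
{code}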
[jira] [Commented] (SPARK-40948) Introduce new error class: PATH_NOT_FOUND
[ https://issues.apache.org/jira/browse/SPARK-40948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630757#comment-17630757 ] Apache Spark commented on SPARK-40948: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/38575 > Introduce new error class: PATH_NOT_FOUND > - > > Key: SPARK-40948 > URL: https://issues.apache.org/jira/browse/SPARK-40948 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > Recently we added many error classes with names like LEGACY_ERROR_TEMP_. > We should update them to use proper error class names. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40948) Introduce new error class: PATH_NOT_FOUND
[ https://issues.apache.org/jira/browse/SPARK-40948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630758#comment-17630758 ] Apache Spark commented on SPARK-40948: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/38575 > Introduce new error class: PATH_NOT_FOUND > - > > Key: SPARK-40948 > URL: https://issues.apache.org/jira/browse/SPARK-40948 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > Recently we added many error classes with names like LEGACY_ERROR_TEMP_. > We should update them to use proper error class names. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
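A minimal sketch of where the error class from SPARK-40948 surfaces, assuming a local session and a deliberately missing path: reading a nonexistent location raises an AnalysisException, which this ticket renames from a LEGACY_ERROR_TEMP_ placeholder to PATH_NOT_FOUND.

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.utils import AnalysisException

spark = SparkSession.builder.getOrCreate()

try:
    spark.read.parquet("/no/such/path")  # hypothetical missing path
except AnalysisException as e:
    print(e)  # "Path does not exist: ...", reported under the new error class
{code}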
[jira] [Updated] (SPARK-41060) Spark Submitter generates a ConfigMap with the same name
[ https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Serhii Nesterov updated SPARK-41060: Affects Version/s: 3.1.2 > Spark Submitter generates a ConfigMap with the same name > > > Key: SPARK-41060 > URL: https://issues.apache.org/jira/browse/SPARK-41060 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.1.2, 3.3.0, 3.3.1 >Reporter: Serhii Nesterov >Priority: Major > > There's a problem with submitting Spark jobs to a K8s cluster: the library > generates and reuses the same name for config maps (for drivers and > executors). Ideally, for each job 2 config maps should be created: for a > driver and an executor. However, the library creates only one driver config > map for all jobs (in some cases it generates only one executor map for all > jobs in the same manner). So, if I run 5 jobs, then only one driver config > map will be generated and used for every job. During those runs we > experience issues when deleting pods from the cluster: executor pods are > endlessly created and immediately terminated, overloading cluster resources. > This problem occurs because of the *KubernetesClientUtils* class, in which we > have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems > to be incorrect and should be urgently fixed. I've prepared some changes for > review to fix the issue (tested in the cluster of our project). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41063) `hive-thriftserver` module compilation deadlock
[ https://issues.apache.org/jira/browse/SPARK-41063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630744#comment-17630744 ] Hyukjin Kwon commented on SPARK-41063: -- Is this from our CI? > `hive-thriftserver` module compilation deadlock > --- > > Key: SPARK-41063 > URL: https://issues.apache.org/jira/browse/SPARK-41063 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > [https://pipelines.actions.githubusercontent.com/serviceHosts/03398d36-4378-4d47-a936-fba0a5e8ccb9/_apis/pipelines/1/runs/194220/signedlogcontent/20?urlExpires=2022-11-09T03%3A31%3A51.5657953Z&urlSigningMethod=HMACV1&urlSignature=Jnn4uML8U79K6MF%2F%2BRUrrUbaOqxsqMA0DL0%2BQtrlBpM%3D] > > I have seen it when compiling with Maven locally, but I haven't investigated it yet. > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41063) `hive-thriftserver` module compilation deadlock
[ https://issues.apache.org/jira/browse/SPARK-41063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630742#comment-17630742 ] Hyukjin Kwon commented on SPARK-41063: -- I am aware of this issue but I don't exactly know how to fix it. As a workaround, you can try {{git clean -fxd}} (see also https://github.com/sbt/sbt/issues/6183) > `hive-thriftserver` module compilation deadlock > --- > > Key: SPARK-41063 > URL: https://issues.apache.org/jira/browse/SPARK-41063 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > [https://pipelines.actions.githubusercontent.com/serviceHosts/03398d36-4378-4d47-a936-fba0a5e8ccb9/_apis/pipelines/1/runs/194220/signedlogcontent/20?urlExpires=2022-11-09T03%3A31%3A51.5657953Z&urlSigningMethod=HMACV1&urlSignature=Jnn4uML8U79K6MF%2F%2BRUrrUbaOqxsqMA0DL0%2BQtrlBpM%3D] > > I have seen it when compiling with Maven locally, but I haven't investigated it yet. > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41069) Implement `DataFrame.approxQuantile` and `DataFrame.stat.approxQuantile`
Ruifeng Zheng created SPARK-41069: - Summary: Implement `DataFrame.approxQuantile` and `DataFrame.stat.approxQuantile` Key: SPARK-41069 URL: https://issues.apache.org/jira/browse/SPARK-41069 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.4.0 Reporter: Ruifeng Zheng Assignee: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
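For reference on SPARK-41069, the existing PySpark API being brought to Connect, sketched on synthetic data:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(1000)

# Approximate 25th/50th/75th percentiles with 5% relative error;
# DataFrame.stat.approxQuantile is the equivalent entry point.
print(df.approxQuantile("id", [0.25, 0.5, 0.75], 0.05))
{code}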
[jira] [Created] (SPARK-41068) Implement `DataFrame.stat.corr`
Ruifeng Zheng created SPARK-41068: - Summary: Implement `DataFrame.stat.corr` Key: SPARK-41068 URL: https://issues.apache.org/jira/browse/SPARK-41068 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.4.0 Reporter: Ruifeng Zheng Assignee: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41067) Implement `DataFrame.stat.cov`
Ruifeng Zheng created SPARK-41067: - Summary: Implement `DataFrame.stat.cov` Key: SPARK-41067 URL: https://issues.apache.org/jira/browse/SPARK-41067 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.4.0 Reporter: Ruifeng Zheng Assignee: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
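For reference, minimal sketches of the two stat APIs from the tickets above (SPARK-41067 and SPARK-41068), on made-up data:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1.0, 2.0), (2.0, 4.1), (3.0, 6.2)], ["x", "y"])

print(df.stat.cov("x", "y"))   # sample covariance of the two columns
print(df.stat.corr("x", "y"))  # Pearson correlation coefficient
{code}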
[jira] [Updated] (SPARK-41060) Spark Submitter generates a ConfigMap with the same name
[ https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Serhii Nesterov updated SPARK-41060: Description: There's a problem with submitting Spark jobs to a K8s cluster: the library generates and reuses the same name for config maps (for drivers and executors). Ideally, for each job 2 config maps should be created: for a driver and an executor. However, the library creates only one driver config map for all jobs (in some cases it generates only one executor map for all jobs in the same manner). So, if I run 5 jobs, then only one driver config map will be generated and used for every job. During those runs we experience issues when deleting pods from the cluster: executor pods are endlessly created and immediately terminated, overloading cluster resources. This problem occurs because of the *KubernetesClientUtils* class, in which we have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems to be incorrect and should be urgently fixed. I've prepared some changes for review to fix the issue (tested in the cluster of our project). was: There's a problem with submitting Spark jobs to a K8s cluster: the library generates and reuses the same name for config maps (for drivers and executors). Ideally, for each job 2 config maps should be created: for a driver and an executor. However, the library creates only one driver config map for all jobs (in some cases it generates only one executor map for all jobs). So, if I run 5 jobs, then only one driver config map will be generated and used for every job. During those runs we experience issues when deleting pods from the cluster: executor pods are endlessly created and immediately terminated, overloading cluster resources. This problem occurs because of the *KubernetesClientUtils* class, in which we have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems to be incorrect and should be urgently fixed. I've prepared some changes for review to fix the issue (tested in the cluster of our project). > Spark Submitter generates a ConfigMap with the same name > > > Key: SPARK-41060 > URL: https://issues.apache.org/jira/browse/SPARK-41060 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.3.0, 3.3.1 >Reporter: Serhii Nesterov >Priority: Major > > There's a problem with submitting Spark jobs to a K8s cluster: the library > generates and reuses the same name for config maps (for drivers and > executors). Ideally, for each job 2 config maps should be created: for a > driver and an executor. However, the library creates only one driver config > map for all jobs (in some cases it generates only one executor map for all > jobs in the same manner). So, if I run 5 jobs, then only one driver config > map will be generated and used for every job. During those runs we > experience issues when deleting pods from the cluster: executor pods are > endlessly created and immediately terminated, overloading cluster resources. > This problem occurs because of the *KubernetesClientUtils* class, in which we > have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems > to be incorrect and should be urgently fixed. I've prepared some changes for > review to fix the issue (tested in the cluster of our project). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41066) Implement `DataFrame.sampleBy ` and `DataFrame.stat.sampleBy `
Ruifeng Zheng created SPARK-41066: - Summary: Implement `DataFrame.sampleBy ` and `DataFrame.stat.sampleBy ` Key: SPARK-41066 URL: https://issues.apache.org/jira/browse/SPARK-41066 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.4.0 Reporter: Ruifeng Zheng Assignee: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
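A minimal sketch of the existing PySpark API that SPARK-41066 brings to Connect; the key column and fractions are illustrative:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(100).selectExpr("id", "id % 2 AS key")

# Stratified sample: keep roughly 10% of key=0 rows and 50% of key=1 rows.
df.stat.sampleBy("key", fractions={0: 0.1, 1: 0.5}, seed=42).show()
{code}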
[jira] [Created] (SPARK-41065) Implement `DataFrame.freqItems ` and `DataFrame.stat.freqItems `
Ruifeng Zheng created SPARK-41065: - Summary: Implement `DataFrame.freqItems ` and `DataFrame.stat.freqItems ` Key: SPARK-41065 URL: https://issues.apache.org/jira/browse/SPARK-41065 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.4.0 Reporter: Ruifeng Zheng Assignee: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
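A minimal sketch of the existing PySpark API that SPARK-41065 brings to Connect; the data and support threshold are illustrative:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a"), (1, "b"), (1, "a"), (2, "a")], ["k", "v"])

# Returns one row whose array columns hold items appearing in >= 40% of rows.
df.stat.freqItems(["k", "v"], support=0.4).show()
{code}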
[jira] [Assigned] (SPARK-41064) Implement `DataFrame.crosstab` and `DataFrame.stat.crosstab`
[ https://issues.apache.org/jira/browse/SPARK-41064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-41064: - Assignee: Ruifeng Zheng > Implement `DataFrame.crosstab` and `DataFrame.stat.crosstab` > > > Key: SPARK-41064 > URL: https://issues.apache.org/jira/browse/SPARK-41064 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41064) Implement `DataFrame.crosstab` and `DataFrame.stat.crosstab`
Ruifeng Zheng created SPARK-41064: - Summary: Implement `DataFrame.crosstab` and `DataFrame.stat.crosstab` Key: SPARK-41064 URL: https://issues.apache.org/jira/browse/SPARK-41064 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.4.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
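For reference again, the JVM `crosstab` that the Connect/PySpark implementation would mirror builds a pairwise frequency (contingency) table of two columns. A small Scala sketch with assumed toy data:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Distinct values of "key" become rows, distinct values of "value"
// become columns, and each cell holds the pair count.
val df = Seq(("a", 1), ("a", 2), ("b", 1), ("b", 1)).toDF("key", "value")
df.stat.crosstab("key", "value").show()
{code}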
[jira] [Assigned] (SPARK-40917) Add a dedicated logical plan for `Summary`
[ https://issues.apache.org/jira/browse/SPARK-40917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-40917: - Assignee: Ruifeng Zheng > Add a dedicated logical plan for `Summary` > -- > > Key: SPARK-40917 > URL: https://issues.apache.org/jira/browse/SPARK-40917 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40917) Add a dedicated logical plan for `Summary`
[ https://issues.apache.org/jira/browse/SPARK-40917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-40917. --- Resolution: Workaround > Add a dedicated logical plan for `Summary` > -- > > Key: SPARK-40917 > URL: https://issues.apache.org/jira/browse/SPARK-40917 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41060) Spark Submitter generates a ConfigMap with the same name
[ https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630737#comment-17630737 ] Apache Spark commented on SPARK-41060: -- User '19Serhii99' has created a pull request for this issue: https://github.com/apache/spark/pull/38574 > Spark Submitter generates a ConfigMap with the same name > > > Key: SPARK-41060 > URL: https://issues.apache.org/jira/browse/SPARK-41060 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.3.0, 3.3.1 >Reporter: Serhii Nesterov >Priority: Major > > There's a problem with submitting Spark jobs to a K8s cluster: the library > generates and reuses the same name for config maps (for drivers and > executors). Ideally, for each job 2 config maps should be created: for a driver > and an executor. However, the library creates only one driver config map for > all jobs (in some cases it generates only one executor map for all jobs). So, > if I run 5 jobs, then only one driver config map will be generated and used > for every job. During those runs we experience issues when deleting pods > from the cluster: executor pods are endlessly created and immediately > terminated, overloading cluster resources. > This problem occurs because of the *KubernetesClientUtils* class in which we > have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems > to be incorrect and should be urgently fixed. I've prepared some changes for > review to fix the issue (tested in the cluster of our project). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41060) Spark Submitter generates a ConfigMap with the same name
[ https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41060: Assignee: (was: Apache Spark) > Spark Submitter generates a ConfigMap with the same name > > > Key: SPARK-41060 > URL: https://issues.apache.org/jira/browse/SPARK-41060 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.3.0, 3.3.1 >Reporter: Serhii Nesterov >Priority: Major > > There's a problem with submitting Spark jobs to a K8s cluster: the library > generates and reuses the same name for config maps (for drivers and > executors). Ideally, for each job 2 config maps should be created: for a driver > and an executor. However, the library creates only one driver config map for > all jobs (in some cases it generates only one executor map for all jobs). So, > if I run 5 jobs, then only one driver config map will be generated and used > for every job. During those runs we experience issues when deleting pods > from the cluster: executor pods are endlessly created and immediately > terminated, overloading cluster resources. > This problem occurs because of the *KubernetesClientUtils* class in which we > have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems > to be incorrect and should be urgently fixed. I've prepared some changes for > review to fix the issue (tested in the cluster of our project). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41060) Spark Submitter generates a ConfigMap with the same name
[ https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630736#comment-17630736 ] Apache Spark commented on SPARK-41060: -- User '19Serhii99' has created a pull request for this issue: https://github.com/apache/spark/pull/38574 > Spark Submitter generates a ConfigMap with the same name > > > Key: SPARK-41060 > URL: https://issues.apache.org/jira/browse/SPARK-41060 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.3.0, 3.3.1 >Reporter: Serhii Nesterov >Priority: Major > > There's a problem with submitting Spark jobs to a K8s cluster: the library > generates and reuses the same name for config maps (for drivers and > executors). Ideally, for each job 2 config maps should be created: for a driver > and an executor. However, the library creates only one driver config map for > all jobs (in some cases it generates only one executor map for all jobs). So, > if I run 5 jobs, then only one driver config map will be generated and used > for every job. During those runs we experience issues when deleting pods > from the cluster: executor pods are endlessly created and immediately > terminated, overloading cluster resources. > This problem occurs because of the *KubernetesClientUtils* class in which we > have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems > to be incorrect and should be urgently fixed. I've prepared some changes for > review to fix the issue (tested in the cluster of our project). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41060) Spark Submitter generates a ConfigMap with the same name
[ https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41060: Assignee: Apache Spark > Spark Submitter generates a ConfigMap with the same name > > > Key: SPARK-41060 > URL: https://issues.apache.org/jira/browse/SPARK-41060 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.3.0, 3.3.1 >Reporter: Serhii Nesterov >Assignee: Apache Spark >Priority: Major > > There's a problem with submitting Spark jobs to a K8s cluster: the library > generates and reuses the same name for config maps (for drivers and > executors). Ideally, for each job 2 config maps should be created: for a driver > and an executor. However, the library creates only one driver config map for > all jobs (in some cases it generates only one executor map for all jobs). So, > if I run 5 jobs, then only one driver config map will be generated and used > for every job. During those runs we experience issues when deleting pods > from the cluster: executor pods are endlessly created and immediately > terminated, overloading cluster resources. > This problem occurs because of the *KubernetesClientUtils* class in which we > have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems > to be incorrect and should be urgently fixed. I've prepared some changes for > review to fix the issue (tested in the cluster of our project). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40513) SPIP: Support Docker Official Image for Spark
[ https://issues.apache.org/jira/browse/SPARK-40513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630735#comment-17630735 ] Yikun Jiang commented on SPARK-40513: - Enable Github Autolink references for spark-docker https://issues.apache.org/jira/browse/INFRA-23789 > SPIP: Support Docker Official Image for Spark > - > > Key: SPARK-40513 > URL: https://issues.apache.org/jira/browse/SPARK-40513 > Project: Spark > Issue Type: New Feature > Components: Kubernetes, PySpark, SparkR >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Priority: Major > Labels: SPIP > > This SPIP proposes adding a [Docker Official > Image (DOI)|https://github.com/docker-library/official-images] to ensure that the > Spark Docker images meet the quality standards for Docker images, and to provide > these images for users who want to use Apache Spark via a Docker image. > Several [Apache projects already release Docker Official > Images|https://hub.docker.com/search?q=apache&image_filter=official], such > as: [flink|https://hub.docker.com/_/flink], > [storm|https://hub.docker.com/_/storm], [solr|https://hub.docker.com/_/solr], > [zookeeper|https://hub.docker.com/_/zookeeper], > [httpd|https://hub.docker.com/_/httpd] (with 50M+ to 1B+ downloads each). > The huge download statistics show the real demand from users, and the > support from other Apache projects suggests we should be able to do it as well. > After support: > * The Dockerfile will still be maintained by the Apache Spark community and > reviewed by Docker. > * The images will be maintained by the Docker community to ensure they meet the > Docker community's quality standards for Docker images. > It will also reduce the extra Docker image maintenance effort (such as > frequent rebuilding and image security updates) for the Apache Spark community. > > SPIP DOC: > [https://docs.google.com/document/d/1nN-pKuvt-amUcrkTvYAQ-bJBgtsWb9nAkNoVNRM2S2o] > DISCUSS: [https://lists.apache.org/thread/l1793y5224n8bqkp3s6ltgkykso4htb3] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40513) SPIP: Support Docker Official Image for Spark
[ https://issues.apache.org/jira/browse/SPARK-40513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630733#comment-17630733 ] Yikun Jiang commented on SPARK-40513: - Add secrets.DOCKER_USER and secrets.DOCKER_TOKEN for apache/spark-docker https://issues.apache.org/jira/browse/INFRA-23882 > SPIP: Support Docker Official Image for Spark > - > > Key: SPARK-40513 > URL: https://issues.apache.org/jira/browse/SPARK-40513 > Project: Spark > Issue Type: New Feature > Components: Kubernetes, PySpark, SparkR >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Priority: Major > Labels: SPIP > > This SPIP proposes adding a [Docker Official > Image (DOI)|https://github.com/docker-library/official-images] to ensure that the > Spark Docker images meet the quality standards for Docker images, and to provide > these images for users who want to use Apache Spark via a Docker image. > Several [Apache projects already release Docker Official > Images|https://hub.docker.com/search?q=apache&image_filter=official], such > as: [flink|https://hub.docker.com/_/flink], > [storm|https://hub.docker.com/_/storm], [solr|https://hub.docker.com/_/solr], > [zookeeper|https://hub.docker.com/_/zookeeper], > [httpd|https://hub.docker.com/_/httpd] (with 50M+ to 1B+ downloads each). > The huge download statistics show the real demand from users, and the > support from other Apache projects suggests we should be able to do it as well. > After support: > * The Dockerfile will still be maintained by the Apache Spark community and > reviewed by Docker. > * The images will be maintained by the Docker community to ensure they meet the > Docker community's quality standards for Docker images. > It will also reduce the extra Docker image maintenance effort (such as > frequent rebuilding and image security updates) for the Apache Spark community. > > SPIP DOC: > [https://docs.google.com/document/d/1nN-pKuvt-amUcrkTvYAQ-bJBgtsWb9nAkNoVNRM2S2o] > DISCUSS: [https://lists.apache.org/thread/l1793y5224n8bqkp3s6ltgkykso4htb3] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41060) Spark Submitter generates a ConfigMap with the same name
[ https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Serhii Nesterov updated SPARK-41060: Description: There's a problem with submitting Spark jobs to a K8s cluster: the library generates and reuses the same name for config maps (for drivers and executors). Ideally, for each job 2 config maps should be created: for a driver and an executor. However, the library creates only one driver config map for all jobs (in some cases it generates only one executor map for all jobs). So, if I run 5 jobs, then only one driver config map will be generated and used for every job. During those runs we experience issues when deleting pods from the cluster: executor pods are endlessly created and immediately terminated, overloading cluster resources. This problem occurs because of the *KubernetesClientUtils* class in which we have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems to be incorrect and should be urgently fixed. I've prepared some changes for review to fix the issue (tested in the cluster of our project). was: There's a problem with submitting spark job to K8s cluster: the library generates and reuses the same name for config map (for drivers and executors). So, if we run 5 jobs sequantially or in parallel, then one config map will be created and then overwritten 4 times which means this config map will be applied / used for all 5 jobs instead of creating one config map for each job. During those runs we experience issues when deleting pods from the cluster: executors pods are endlessly created and immediately terminated overloading cluster resources. This problem occurs because of the *KubernetesClientUtils* class in which we have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems to be incorrect and should be fixed. I've prepared some changes for review to fix the issue. > Spark Submitter generates a ConfigMap with the same name > > > Key: SPARK-41060 > URL: https://issues.apache.org/jira/browse/SPARK-41060 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.3.0, 3.3.1 >Reporter: Serhii Nesterov >Priority: Major > > There's a problem with submitting Spark jobs to a K8s cluster: the library > generates and reuses the same name for config maps (for drivers and > executors). Ideally, for each job 2 config maps should be created: for a driver > and an executor. However, the library creates only one driver config map for > all jobs (in some cases it generates only one executor map for all jobs). So, > if I run 5 jobs, then only one driver config map will be generated and used > for every job. During those runs we experience issues when deleting pods > from the cluster: executor pods are endlessly created and immediately > terminated, overloading cluster resources. > This problem occurs because of the *KubernetesClientUtils* class in which we > have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems > to be incorrect and should be urgently fixed. I've prepared some changes for > review to fix the issue (tested in the cluster of our project). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41063) `hive-thriftserver` module compilation deadlock
[ https://issues.apache.org/jira/browse/SPARK-41063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630731#comment-17630731 ] Yang Jie commented on SPARK-41063: -- cc [~hyukjin.kwon] > `hive-thriftserver` module compilation deadlock > --- > > Key: SPARK-41063 > URL: https://issues.apache.org/jira/browse/SPARK-41063 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > [https://pipelines.actions.githubusercontent.com/serviceHosts/03398d36-4378-4d47-a936-fba0a5e8ccb9/_apis/pipelines/1/runs/194220/signedlogcontent/20?urlExpires=2022-11-09T03%3A31%3A51.5657953Z&urlSigningMethod=HMACV1&urlSignature=Jnn4uML8U79K6MF%2F%2BRUrrUbaOqxsqMA0DL0%2BQtrlBpM%3D] > > I have seen it when compiling with Maven locally, but I haven't investigated > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41051) Optimize ProcfsMetrics file acquisition
[ https://issues.apache.org/jira/browse/SPARK-41051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630730#comment-17630730 ] Apache Spark commented on SPARK-41051: -- User 'Narcasserun' has created a pull request for this issue: https://github.com/apache/spark/pull/38563 > Optimize ProcfsMetrics file acquisition > --- > > Key: SPARK-41051 > URL: https://issues.apache.org/jira/browse/SPARK-41051 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.1 > Environment: spark-master >Reporter: sur >Priority: Minor > Fix For: 3.4.0 > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > When obtaining the Procfs file, variables are created but not used, and there > is duplicate code. We should reduce such situations. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
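The cleanup described here is generic dead-code and duplication removal; a purely illustrative Scala sketch of the pattern follows (not the actual ProcfsMetricsGetter code): one shared helper replaces repeated open/read/close blocks, so no intermediate variables are created and then left unused.

{code:scala}
import scala.io.Source

object ProcfsReadSketch {
  // A single helper for reading a /proc file, reused by every metric,
  // instead of duplicated read logic at each call site.
  def readProcFile(pid: Long, name: String): List[String] = {
    val source = Source.fromFile(s"/proc/$pid/$name")
    try source.getLines().toList finally source.close()
  }
}
{code}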
[jira] [Commented] (SPARK-41063) `hive-thriftserver` module compilation deadlock
[ https://issues.apache.org/jira/browse/SPARK-41063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630729#comment-17630729 ] Yang Jie commented on SPARK-41063: -- Always printing similar logs in a loop {code:java}
2022-11-09T01:01:42.8632147Z [info] Note: /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/thrift/ThriftHttpServlet.java uses or overrides a deprecated API.
2022-11-09T01:01:42.8632717Z [info] Note: Recompile with -Xlint:deprecation for details.
2022-11-09T01:01:42.8633082Z [info] done compiling
2022-11-09T01:01:42.8633618Z [info] compiling 22 Scala sources and 24 Java sources to /home/runner/work/spark/spark/sql/hive-thriftserver/target/scala-2.13/classes ...
2022-11-09T01:01:42.8634133Z [info] Note: Some input files use or override a deprecated API.
2022-11-09T01:01:42.8634581Z [info] Note: Recompile with -Xlint:deprecation for details.
2022-11-09T01:01:42.8634941Z [info] done compiling
2022-11-09T01:01:42.8635478Z [info] compiling 22 Scala sources and 9 Java sources to /home/runner/work/spark/spark/sql/hive-thriftserver/target/scala-2.13/classes ...
{code} > `hive-thriftserver` module compilation deadlock > --- > > Key: SPARK-41063 > URL: https://issues.apache.org/jira/browse/SPARK-41063 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > [https://pipelines.actions.githubusercontent.com/serviceHosts/03398d36-4378-4d47-a936-fba0a5e8ccb9/_apis/pipelines/1/runs/194220/signedlogcontent/20?urlExpires=2022-11-09T03%3A31%3A51.5657953Z&urlSigningMethod=HMACV1&urlSignature=Jnn4uML8U79K6MF%2F%2BRUrrUbaOqxsqMA0DL0%2BQtrlBpM%3D] > > I have seen it when compiling with Maven locally, but I haven't investigated > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41063) `hive-thriftserver` module compilation deadlock
Yang Jie created SPARK-41063: Summary: `hive-thriftserver` module compilation deadlock Key: SPARK-41063 URL: https://issues.apache.org/jira/browse/SPARK-41063 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.4.0 Reporter: Yang Jie [https://pipelines.actions.githubusercontent.com/serviceHosts/03398d36-4378-4d47-a936-fba0a5e8ccb9/_apis/pipelines/1/runs/194220/signedlogcontent/20?urlExpires=2022-11-09T03%3A31%3A51.5657953Z&urlSigningMethod=HMACV1&urlSignature=Jnn4uML8U79K6MF%2F%2BRUrrUbaOqxsqMA0DL0%2BQtrlBpM%3D] I have seen it when compiling with Maven locally, but I haven't investigated -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41051) Optimize ProcfsMetrics file acquisition
[ https://issues.apache.org/jira/browse/SPARK-41051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan resolved SPARK-41051. - Fix Version/s: 3.4.0 (was: 3.3.2) Resolution: Fixed Issue resolved by pull request 38563 [https://github.com/apache/spark/pull/38563] > Optimize ProcfsMetrics file acquisition > --- > > Key: SPARK-41051 > URL: https://issues.apache.org/jira/browse/SPARK-41051 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.1 > Environment: spark-master >Reporter: sur >Priority: Minor > Fix For: 3.4.0 > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > When obtaining the Procfs file, variables are created but not used, and there > is duplicate code. We should reduce such situations. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41060) Spark Submitter generates a ConfigMap with the same name
[ https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Serhii Nesterov updated SPARK-41060: Description: There's a problem with submitting a Spark job to a K8s cluster: the library generates and reuses the same name for the config map (for drivers and executors). So, if we run 5 jobs sequentially or in parallel, then one config map will be created and then overwritten 4 times, which means this config map will be applied / used for all 5 jobs instead of creating one config map for each job. During those runs we experience issues when deleting pods from the cluster: executor pods are endlessly created and immediately terminated, overloading cluster resources. This problem occurs because of the *KubernetesClientUtils* class in which we have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems to be incorrect and should be fixed. I've prepared some changes for review to fix the issue. was: There's a problem with submitting spark job to K8s cluster: the library generates and reuses the same name for Config Map (for drivers and executors). So, if we run 5 jobs sequantially or in parallel, then one Config Map will be created and then overwritten 4 times. During those runs we experience issues when deleting pods from the cluster: executors pods are endlessly created and immediately terminated overloading cluster resources. > Spark Submitter generates a ConfigMap with the same name > > > Key: SPARK-41060 > URL: https://issues.apache.org/jira/browse/SPARK-41060 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.3.0, 3.3.1 >Reporter: Serhii Nesterov >Priority: Major > > There's a problem with submitting a Spark job to a K8s cluster: the library > generates and reuses the same name for the config map (for drivers and > executors). So, if we run 5 jobs sequentially or in parallel, then one config > map will be created and then overwritten 4 times, which means this config map > will be applied / used for all 5 jobs instead of creating one config map for > each job. During those runs we experience issues when deleting pods from the > cluster: executor pods are endlessly created and immediately terminated, > overloading cluster resources. > This problem occurs because of the *KubernetesClientUtils* class in which we > have *configMapNameExecutor* and *configMapNameDriver* as constants. It seems > to be incorrect and should be fixed. I've prepared some changes for review to > fix the issue. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41062) Rename UNSUPPORTED_CORRELATED_REFERENCE to CORRELATED_REFERENCE
Haejoon Lee created SPARK-41062: --- Summary: Rename UNSUPPORTED_CORRELATED_REFERENCE to CORRELATED_REFERENCE Key: SPARK-41062 URL: https://issues.apache.org/jira/browse/SPARK-41062 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: Haejoon Lee We should use a clear and brief name for every error class. This sub-error class duplicates the main class. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41062) Rename UNSUPPORTED_CORRELATED_REFERENCE to CORRELATED_REFERENCE
[ https://issues.apache.org/jira/browse/SPARK-41062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630724#comment-17630724 ] Haejoon Lee commented on SPARK-41062: - I'm working on it > Rename UNSUPPORTED_CORRELATED_REFERENCE to CORRELATED_REFERENCE > --- > > Key: SPARK-41062 > URL: https://issues.apache.org/jira/browse/SPARK-41062 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > We should use a clear and brief name for every error class. > This sub-error class duplicates the main class. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41061) Support SelectExpr which apply Projection by expressions in Strings in Connect DSL
[ https://issues.apache.org/jira/browse/SPARK-41061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630723#comment-17630723 ] Apache Spark commented on SPARK-41061: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/38573 > Support SelectExpr which apply Projection by expressions in Strings in > Connect DSL > -- > > Key: SPARK-41061 > URL: https://issues.apache.org/jira/browse/SPARK-41061 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41060) Spark Submitter generates a ConfigMap with the same name
[ https://issues.apache.org/jira/browse/SPARK-41060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Serhii Nesterov updated SPARK-41060: Description: There's a problem with submitting a Spark job to a K8s cluster: the library generates and reuses the same name for the Config Map (for drivers and executors). So, if we run 5 jobs sequentially or in parallel, then one Config Map will be created and then overwritten 4 times. During those runs we experience issues when deleting pods from the cluster: executor pods are endlessly created and immediately terminated, overloading cluster resources. > Spark Submitter generates a ConfigMap with the same name > > > Key: SPARK-41060 > URL: https://issues.apache.org/jira/browse/SPARK-41060 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.3.0, 3.3.1 >Reporter: Serhii Nesterov >Priority: Major > > There's a problem with submitting a Spark job to a K8s cluster: the library > generates and reuses the same name for the Config Map (for drivers and > executors). So, if we run 5 jobs sequentially or in parallel, then one Config > Map will be created and then overwritten 4 times. During those runs we > experience issues when deleting pods from the cluster: executor pods are > endlessly created and immediately terminated, overloading cluster resources. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41061) Support SelectExpr which apply Projection by expressions in Strings in Connect DSL
[ https://issues.apache.org/jira/browse/SPARK-41061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41061: Assignee: Apache Spark > Support SelectExpr which apply Projection by expressions in Strings in > Connect DSL > -- > > Key: SPARK-41061 > URL: https://issues.apache.org/jira/browse/SPARK-41061 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41061) Support SelectExpr which apply Projection by expressions in Strings in Connect DSL
[ https://issues.apache.org/jira/browse/SPARK-41061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630720#comment-17630720 ] Apache Spark commented on SPARK-41061: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/38573 > Support SelectExpr which apply Projection by expressions in Strings in > Connect DSL > -- > > Key: SPARK-41061 > URL: https://issues.apache.org/jira/browse/SPARK-41061 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41061) Support SelectExpr which apply Projection by expressions in Strings in Connect DSL
[ https://issues.apache.org/jira/browse/SPARK-41061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41061: Assignee: (was: Apache Spark) > Support SelectExpr which apply Projection by expressions in Strings in > Connect DSL > -- > > Key: SPARK-41061 > URL: https://issues.apache.org/jira/browse/SPARK-41061 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41059) Rename _LEGACY_ERROR_TEMP_2420 to NESTED_AGGREGATE_FUNCTION
[ https://issues.apache.org/jira/browse/SPARK-41059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41059: Assignee: (was: Apache Spark) > Rename _LEGACY_ERROR_TEMP_2420 to NESTED_AGGREGATE_FUNCTION > --- > > Key: SPARK-41059 > URL: https://issues.apache.org/jira/browse/SPARK-41059 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > We should rename all _LEGACY errors to properly named error classes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41059) Rename _LEGACY_ERROR_TEMP_2420 to NESTED_AGGREGATE_FUNCTION
[ https://issues.apache.org/jira/browse/SPARK-41059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41059: Assignee: Apache Spark > Rename _LEGACY_ERROR_TEMP_2420 to NESTED_AGGREGATE_FUNCTION > --- > > Key: SPARK-41059 > URL: https://issues.apache.org/jira/browse/SPARK-41059 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > > We should rename all _LEGACY errors to properly named error classes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41059) Rename _LEGACY_ERROR_TEMP_2420 to NESTED_AGGREGATE_FUNCTION
[ https://issues.apache.org/jira/browse/SPARK-41059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630718#comment-17630718 ] Apache Spark commented on SPARK-41059: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/38572 > Rename _LEGACY_ERROR_TEMP_2420 to NESTED_AGGREGATE_FUNCTION > --- > > Key: SPARK-41059 > URL: https://issues.apache.org/jira/browse/SPARK-41059 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > We should rename all _LEGACY errors to properly named error classes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41059) Rename _LEGACY_ERROR_TEMP_2420 to NESTED_AGGREGATE_FUNCTION
[ https://issues.apache.org/jira/browse/SPARK-41059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41059: Assignee: Apache Spark > Rename _LEGACY_ERROR_TEMP_2420 to NESTED_AGGREGATE_FUNCTION > --- > > Key: SPARK-41059 > URL: https://issues.apache.org/jira/browse/SPARK-41059 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > > We should rename all _LEGACY errors to properly named error classes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41059) Rename _LEGACY_ERROR_TEMP_2420 to NESTED_AGGREGATE_FUNCTION
[ https://issues.apache.org/jira/browse/SPARK-41059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630719#comment-17630719 ] Apache Spark commented on SPARK-41059: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/38572 > Rename _LEGACY_ERROR_TEMP_2420 to NESTED_AGGREGATE_FUNCTION > --- > > Key: SPARK-41059 > URL: https://issues.apache.org/jira/browse/SPARK-41059 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > We should rename all _LEGACY errors to properly named error classes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41061) Support SelectExpr which apply Projection by expressions in Strings in Connect DSL
Rui Wang created SPARK-41061: Summary: Support SelectExpr which apply Projection by expressions in Strings in Connect DSL Key: SPARK-41061 URL: https://issues.apache.org/jira/browse/SPARK-41061 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
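As background, `selectExpr` is the existing Dataset variant of `select` that parses each string argument as a SQL expression; the Connect DSL support tracked here would expose the same behavior. A small Scala usage sketch (local session and toy data assumed):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq((1, -2), (3, -4)).toDF("a", "b")
// Each string is parsed as a SQL expression, so functions and
// aliases work exactly as they would in a SELECT list.
df.selectExpr("a", "abs(b) AS b_abs", "a + b AS total").show()
{code}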
[jira] [Created] (SPARK-41060) Spark Submitter generates a ConfigMap with the same name
Serhii Nesterov created SPARK-41060: --- Summary: Spark Submitter generates a ConfigMap with the same name Key: SPARK-41060 URL: https://issues.apache.org/jira/browse/SPARK-41060 Project: Spark Issue Type: Bug Components: Kubernetes Affects Versions: 3.3.1, 3.3.0 Reporter: Serhii Nesterov -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41059) Rename _LEGACY_ERROR_TEMP_2420 to NESTED_AGGREGATE_FUNCTION
Haejoon Lee created SPARK-41059: --- Summary: Rename _LEGACY_ERROR_TEMP_2420 to NESTED_AGGREGATE_FUNCTION Key: SPARK-41059 URL: https://issues.apache.org/jira/browse/SPARK-41059 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: Haejoon Lee We should rename all _LEGACY errors to properly named error classes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41059) Rename _LEGACY_ERROR_TEMP_2420 to NESTED_AGGREGATE_FUNCTION
[ https://issues.apache.org/jira/browse/SPARK-41059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630716#comment-17630716 ] Haejoon Lee commented on SPARK-41059: - I'm working on it > Rename _LEGACY_ERROR_TEMP_2420 to NESTED_AGGREGATE_FUNCTION > --- > > Key: SPARK-41059 > URL: https://issues.apache.org/jira/browse/SPARK-41059 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > We should rename all _LEGACY errors to properly named error classes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37555) spark-sql should pass last unclosed comment to backend and execute throw a exception
[ https://issues.apache.org/jira/browse/SPARK-37555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630715#comment-17630715 ] Apache Spark commented on SPARK-37555: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/38571 > spark-sql should pass last unclosed comment to backend and execute throw a > exception > > > Key: SPARK-37555 > URL: https://issues.apache.org/jira/browse/SPARK-37555 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Fix For: 3.3.0 > > > In current spark-sql, if the last statement is an unclosed comment, the SQL > won't be executed, which is not correct. We should pass it to the backend and > throw an exception via ANTLR. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41039) Upgrade `scala-parallel-collections` to 1.0.4 for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-41039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-41039. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38550 [https://github.com/apache/spark/pull/38550] > Upgrade `scala-parallel-collections` to 1.0.4 for Scala 2.13 > > > Key: SPARK-41039 > URL: https://issues.apache.org/jira/browse/SPARK-41039 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.4.0 > > > Starting from version 1.0.4, scala-parallel-collections verifies Java 17 > compatibility through CI: > * > [https://github.com/scala/scala-parallel-collections/compare/v1.0.3...v1.0.4] > * https://github.com/scala/scala-parallel-collections/releases/tag/v1.0.4 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41039) Upgrade `scala-parallel-collections` to 1.0.4 for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-41039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-41039: Assignee: Yang Jie > Upgrade `scala-parallel-collections` to 1.0.4 for Scala 2.13 > > > Key: SPARK-41039 > URL: https://issues.apache.org/jira/browse/SPARK-41039 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > > Starting from version 1.0.4, scala-parallel-collections verifies Java 17 > compatibility through CI: > * > [https://github.com/scala/scala-parallel-collections/compare/v1.0.3...v1.0.4] > * https://github.com/scala/scala-parallel-collections/releases/tag/v1.0.4 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41039) Upgrade `scala-parallel-collections` to 1.0.4 for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-41039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-41039: - Priority: Trivial (was: Major) > Upgrade `scala-parallel-collections` to 1.0.4 for Scala 2.13 > > > Key: SPARK-41039 > URL: https://issues.apache.org/jira/browse/SPARK-41039 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Trivial > Fix For: 3.4.0 > > > Starting from version 1.0.4, scala-parallel-collections verifies Java 17 > compatibility through CI: > * > [https://github.com/scala/scala-parallel-collections/compare/v1.0.3...v1.0.4] > * https://github.com/scala/scala-parallel-collections/releases/tag/v1.0.4 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41057) Support other data type conversion in the DataTypeProtoConverter
[ https://issues.apache.org/jira/browse/SPARK-41057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630711#comment-17630711 ] Deng Ziming commented on SPARK-41057: - Thank you [~amaliujia], I'm willing to give it a try. 🤝 > Support other data type conversion in the DataTypeProtoConverter > > > Key: SPARK-41057 > URL: https://issues.apache.org/jira/browse/SPARK-41057 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > > In > https://github.com/apache/spark/blob/master/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/DataTypeProtoConverter.scala#L34 > we only support INT, STRING and STRUCT type conversion to and from catalyst > and connect proto. > We should be able to support all the types defined by > https://github.com/apache/spark/blob/master/connector/connect/src/main/protobuf/spark/connect/types.proto -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-41057) Support other data type conversion in the DataTypeProtoConverter
[ https://issues.apache.org/jira/browse/SPARK-41057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630706#comment-17630706 ] Rui Wang edited comment on SPARK-41057 at 11/9/22 2:55 AM: --- [~dengziming] Are you interested in this JIRA? was (Author: amaliujia): @dengziming Are you interested in this JIRA? > Support other data type conversion in the DataTypeProtoConverter > > > Key: SPARK-41057 > URL: https://issues.apache.org/jira/browse/SPARK-41057 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > > In > https://github.com/apache/spark/blob/master/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/DataTypeProtoConverter.scala#L34 > we only support INT, STRING and STRUCT type conversion to and from catalyst > and connect proto. > We should be able to support all the types defined by > https://github.com/apache/spark/blob/master/connector/connect/src/main/protobuf/spark/connect/types.proto -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41058) Removing unused code in connect
[ https://issues.apache.org/jira/browse/SPARK-41058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41058: Assignee: (was: Apache Spark) > Removing unused code in connect > --- > > Key: SPARK-41058 > URL: https://issues.apache.org/jira/browse/SPARK-41058 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Deng Ziming >Priority: Minor > Fix For: 3.4.0 > > > There is some unused code in the `connect` module, for example an unused import in > commands.proto and unused code in SparkConnectStreamHandler.scala -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41058) Removing unused code in connect
[ https://issues.apache.org/jira/browse/SPARK-41058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41058: Assignee: Apache Spark > Removing unused code in connect > --- > > Key: SPARK-41058 > URL: https://issues.apache.org/jira/browse/SPARK-41058 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Deng Ziming >Assignee: Apache Spark >Priority: Minor > Fix For: 3.4.0 > > > There is some unused code in the `connect` module, for example an unused import in > commands.proto and unused code in SparkConnectStreamHandler.scala -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41058) Removing unused code in connect
[ https://issues.apache.org/jira/browse/SPARK-41058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630709#comment-17630709 ] Apache Spark commented on SPARK-41058: -- User 'dengziming' has created a pull request for this issue: https://github.com/apache/spark/pull/38491 > Removing unused code in connect > --- > > Key: SPARK-41058 > URL: https://issues.apache.org/jira/browse/SPARK-41058 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Deng Ziming >Priority: Minor > Fix For: 3.4.0 > > > There is some unused code in the `connect` module, for example an unused import in > commands.proto and unused code in SparkConnectStreamHandler.scala -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41056) Fix new R_LIBS_SITE behavior introduced in R 4.2
[ https://issues.apache.org/jira/browse/SPARK-41056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41056: Assignee: Apache Spark > Fix new R_LIBS_SITE behavior introduced in R 4.2 > > > Key: SPARK-41056 > URL: https://issues.apache.org/jira/browse/SPARK-41056 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > > R 4.2 > {code} > # R > > Sys.getenv("R_LIBS_SITE") > [1] > "/usr/local/lib/R/site-library/:/usr/lib/R/site-library:/usr/lib/R/library'" > {code} > {code} > # R --vanilla > > Sys.getenv("R_LIBS_SITE") > [1] "/usr/lib/R/site-library" > {code} > R 4.1 > {code} > # R > > Sys.getenv("R_LIBS_SITE") > [1] "/usr/local/lib/R/site-library:/usr/lib/R/site-library:/usr/lib/R/library" > {code} > {code} > # R --vanilla > > Sys.getenv("R_LIBS_SITE") > [1] "/usr/local/lib/R/site-library:/usr/lib/R/site-library:/usr/lib/R/library" > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41056) Fix new R_LIBS_SITE behavior introduced in R 4.2
[ https://issues.apache.org/jira/browse/SPARK-41056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630708#comment-17630708 ] Apache Spark commented on SPARK-41056: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/38570 > Fix new R_LIBS_SITE behavior introduced in R 4.2 > > > Key: SPARK-41056 > URL: https://issues.apache.org/jira/browse/SPARK-41056 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > R 4.2 > {code} > # R > > Sys.getenv("R_LIBS_SITE") > [1] > "/usr/local/lib/R/site-library/:/usr/lib/R/site-library:/usr/lib/R/library'" > {code} > {code} > # R --vanilla > > Sys.getenv("R_LIBS_SITE") > [1] "/usr/lib/R/site-library" > {code} > R 4.1 > {code} > # R > > Sys.getenv("R_LIBS_SITE") > [1] "/usr/local/lib/R/site-library:/usr/lib/R/site-library:/usr/lib/R/library" > {code} > {code} > # R --vanilla > > Sys.getenv("R_LIBS_SITE") > [1] "/usr/local/lib/R/site-library:/usr/lib/R/site-library:/usr/lib/R/library" > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41056) Fix new R_LIBS_SITE behavior introduced in R 4.2
[ https://issues.apache.org/jira/browse/SPARK-41056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41056: Assignee: (was: Apache Spark) > Fix new R_LIBS_SITE behavior introduced in R 4.2 > > > Key: SPARK-41056 > URL: https://issues.apache.org/jira/browse/SPARK-41056 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > R 4.2 > {code} > # R > > Sys.getenv("R_LIBS_SITE") > [1] > "/usr/local/lib/R/site-library/:/usr/lib/R/site-library:/usr/lib/R/library'" > {code} > {code} > # R --vanilla > > Sys.getenv("R_LIBS_SITE") > [1] "/usr/lib/R/site-library" > {code} > R 4.1 > {code} > # R > > Sys.getenv("R_LIBS_SITE") > [1] "/usr/local/lib/R/site-library:/usr/lib/R/site-library:/usr/lib/R/library" > {code} > {code} > # R --vanilla > > Sys.getenv("R_LIBS_SITE") > [1] "/usr/local/lib/R/site-library:/usr/lib/R/site-library:/usr/lib/R/library" > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41058) Removing unused code in connect
Deng Ziming created SPARK-41058: --- Summary: Removing unused code in connect Key: SPARK-41058 URL: https://issues.apache.org/jira/browse/SPARK-41058 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Deng Ziming Fix For: 3.4.0 There is some unused code in the `connect` module, for example an unused import in commands.proto and unused code in SparkConnectStreamHandler.scala -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41057) Support other data type conversion in the DataTypeProtoConverter
[ https://issues.apache.org/jira/browse/SPARK-41057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630706#comment-17630706 ] Rui Wang commented on SPARK-41057: -- @dengziming Are you interested in this JIRA? > Support other data type conversion in the DataTypeProtoConverter > > > Key: SPARK-41057 > URL: https://issues.apache.org/jira/browse/SPARK-41057 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > > In > https://github.com/apache/spark/blob/master/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/DataTypeProtoConverter.scala#L34 > we only support INT, STRING and STRUCT type conversion to and from catalyst > and connect proto. > We should be able to support all the types defined by > https://github.com/apache/spark/blob/master/connector/connect/src/main/protobuf/spark/connect/types.proto -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41057) Support other data type conversion in the DataTypeProtoConverter
[ https://issues.apache.org/jira/browse/SPARK-41057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang updated SPARK-41057: - Description: In https://github.com/apache/spark/blob/master/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/DataTypeProtoConverter.scala#L34 we only support INT, STRING and STRUCT type conversion to and from catalyst and connect proto. We should be able to support all the types defined by https://github.com/apache/spark/blob/master/connector/connect/src/main/protobuf/spark/connect/types.proto > Support other data type conversion in the DataTypeProtoConverter > > > Key: SPARK-41057 > URL: https://issues.apache.org/jira/browse/SPARK-41057 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > > In > https://github.com/apache/spark/blob/master/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/DataTypeProtoConverter.scala#L34 > we only support INT, STRING and STRUCT type conversion to and from catalyst > and connect proto. > We should be able to support all the types defined by > https://github.com/apache/spark/blob/master/connector/connect/src/main/protobuf/spark/connect/types.proto -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
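To make the gap concrete: the converter is essentially a pair of pattern matches between catalyst `DataType`s and the proto type kinds, and supporting another type means adding one case in each direction. A schematic Scala sketch follows; `ProtoDataType` and its case objects are placeholders standing in for the generated `spark.connect` proto messages, whose real builder API is not shown here.

{code:scala}
import org.apache.spark.sql.types._

// Placeholder for the generated spark.connect DataType proto message;
// the real converter matches on the proto's oneof kind instead.
sealed trait ProtoDataType
case object ProtoBoolean extends ProtoDataType
case object ProtoI32 extends ProtoDataType
case object ProtoI64 extends ProtoDataType
case object ProtoDouble extends ProtoDataType
case object ProtoString extends ProtoDataType

object DataTypeConverterSketch {
  // Proto -> Catalyst: one case per supported kind.
  def toCatalystType(t: ProtoDataType): DataType = t match {
    case ProtoBoolean => BooleanType
    case ProtoI32     => IntegerType
    case ProtoI64     => LongType
    case ProtoDouble  => DoubleType
    case ProtoString  => StringType
  }

  // Catalyst -> Proto: the inverse direction, failing fast on
  // anything not yet supported.
  def toConnectProtoType(t: DataType): ProtoDataType = t match {
    case BooleanType => ProtoBoolean
    case IntegerType => ProtoI32
    case LongType    => ProtoI64
    case DoubleType  => ProtoDouble
    case StringType  => ProtoString
    case other =>
      throw new UnsupportedOperationException(s"Unsupported type: $other")
  }
}
{code}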
[jira] [Created] (SPARK-41057) Support other data type conversion in the DataTypeProtoConverter
Rui Wang created SPARK-41057: Summary: Support other data type conversion in the DataTypeProtoConverter Key: SPARK-41057 URL: https://issues.apache.org/jira/browse/SPARK-41057 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41056) Fix new R_LIBS_SITE behavior introduced in R 4.2
Hyukjin Kwon created SPARK-41056: Summary: Fix new R_LIBS_SITE behavior introduced in R 4.2 Key: SPARK-41056 URL: https://issues.apache.org/jira/browse/SPARK-41056 Project: Spark Issue Type: Bug Components: SparkR Affects Versions: 3.4.0 Reporter: Hyukjin Kwon R 4.2 {code} # R > Sys.getenv("R_LIBS_SITE") [1] "/usr/local/lib/R/site-library/:/usr/lib/R/site-library:/usr/lib/R/library'" {code} {code} # R --vanilla > Sys.getenv("R_LIBS_SITE") [1] "/usr/lib/R/site-library" {code} R 4.1 {code} # R > Sys.getenv("R_LIBS_SITE") [1] "/usr/local/lib/R/site-library:/usr/lib/R/site-library:/usr/lib/R/library" {code} {code} # R --vanilla > Sys.getenv("R_LIBS_SITE") [1] "/usr/local/lib/R/site-library:/usr/lib/R/site-library:/usr/lib/R/library" {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41055) Rename _LEGACY_ERROR_TEMP_2424 to GROUP_BY_AGGREGATE
[ https://issues.apache.org/jira/browse/SPARK-41055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630699#comment-17630699 ] Apache Spark commented on SPARK-41055: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/38569 > Rename _LEGACY_ERROR_TEMP_2424 to GROUP_BY_AGGREGATE > > > Key: SPARK-41055 > URL: https://issues.apache.org/jira/browse/SPARK-41055 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > Name the legacy error class properly. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41055) Rename _LEGACY_ERROR_TEMP_2424 to GROUP_BY_AGGREGATE
[ https://issues.apache.org/jira/browse/SPARK-41055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41055: Assignee: Apache Spark > Rename _LEGACY_ERROR_TEMP_2424 to GROUP_BY_AGGREGATE > > > Key: SPARK-41055 > URL: https://issues.apache.org/jira/browse/SPARK-41055 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > > Name the legacy error class properly. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41055) Rename _LEGACY_ERROR_TEMP_2424 to GROUP_BY_AGGREGATE
[ https://issues.apache.org/jira/browse/SPARK-41055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41055: Assignee: (was: Apache Spark) > Rename _LEGACY_ERROR_TEMP_2424 to GROUP_BY_AGGREGATE > > > Key: SPARK-41055 > URL: https://issues.apache.org/jira/browse/SPARK-41055 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > Name the legacy error class properly. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41046) Support CreateView in Connect DSL
[ https://issues.apache.org/jira/browse/SPARK-41046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630696#comment-17630696 ] Deng Ziming commented on SPARK-41046: - [~amaliujia] Aha, I'll review your code when I'm free. > Support CreateView in Connect DSL > - > > Key: SPARK-41046 > URL: https://issues.apache.org/jira/browse/SPARK-41046 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41055) Rename _LEGACY_ERROR_TEMP_2424 to GROUP_BY_AGGREGATE
[ https://issues.apache.org/jira/browse/SPARK-41055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630694#comment-17630694 ] Haejoon Lee commented on SPARK-41055: - I'm working on it. > Rename _LEGACY_ERROR_TEMP_2424 to GROUP_BY_AGGREGATE > > > Key: SPARK-41055 > URL: https://issues.apache.org/jira/browse/SPARK-41055 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > Name the legacy error class properly. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41055) Rename _LEGACY_ERROR_TEMP_2424 to GROUP_BY_AGGREGATE
Haejoon Lee created SPARK-41055: --- Summary: Rename _LEGACY_ERROR_TEMP_2424 to GROUP_BY_AGGREGATE Key: SPARK-41055 URL: https://issues.apache.org/jira/browse/SPARK-41055 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: Haejoon Lee Name the legacy error class properly. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41035) Incorrect results or NPE when a literal is reused across distinct aggregations
[ https://issues.apache.org/jira/browse/SPARK-41035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41035: Assignee: Bruce Robbins > Incorrect results or NPE when a literal is reused across distinct aggregations > -- > > Key: SPARK-41035 > URL: https://issues.apache.org/jira/browse/SPARK-41035 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.2, 3.4.0, 3.3.1 >Reporter: Bruce Robbins >Assignee: Bruce Robbins >Priority: Major > Labels: correctness > > This query produces incorrect results: > {noformat} > select a, count(distinct 100) as cnt1, count(distinct b, 100) as cnt2 > from values (1, 2), (4, 5) as data(a, b) > group by a; > +---+++ > |a |cnt1|cnt2| > +---+++ > |1 |1 |0 | > |4 |1 |0 | > +---+++ > {noformat} > The values for {{cnt2}} should be 1 and 1 (not 0 and 0). > If you change the literal used in the first aggregate function, the second > aggregate function now works correctly: > {noformat} > select a, count(distinct 101) as cnt1, count(distinct b, 100) as cnt2 > from values (1, 2), (4, 5) as data(a, b) > group by a; > +---+++ > |a |cnt1|cnt2| > +---+++ > |1 |1 |1 | > |4 |1 |1 | > +---+++ > {noformat} > The same bug causes the following query to get a NullPointerException: > {noformat} > select a, count(distinct 1), count_min_sketch(distinct b, 0.5d, 0.5d, 1) > from values (1, 2), (4, 5) as data(a, b) > group by a; > {noformat} > If you change the literal used in the first aggregation, then the query > succeeds: > {noformat} > select a, count(distinct 2), count_min_sketch(distinct b, 0.5d, 0.5d, 1) > from values (1, 2), (4, 5) as data(a, b) > group by a; > +---+-+-+ > |a |count(DISTINCT 2)|count_min_sketch(DISTINCT b, 0.5, 0.5, 1) > > | > +---+-+-+ > |1 |1|[00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 01 00 00 > 00 04 00 00 00 00 5D 8D 6A B9 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00]| > |4 |1|[00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 01 00 00 > 00 04 00 00 00 00 5D 8D 6A B9 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00]| > +---+-+-+ > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41035) Incorrect results or NPE when a literal is reused across distinct aggregations
[ https://issues.apache.org/jira/browse/SPARK-41035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41035. -- Fix Version/s: 3.3.2 3.2.3 3.4.0 Resolution: Fixed Issue resolved by pull request 38565 [https://github.com/apache/spark/pull/38565] > Incorrect results or NPE when a literal is reused across distinct aggregations > -- > > Key: SPARK-41035 > URL: https://issues.apache.org/jira/browse/SPARK-41035 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.2, 3.4.0, 3.3.1 >Reporter: Bruce Robbins >Assignee: Bruce Robbins >Priority: Major > Labels: correctness > Fix For: 3.3.2, 3.2.3, 3.4.0 > > > This query produces incorrect results: > {noformat} > select a, count(distinct 100) as cnt1, count(distinct b, 100) as cnt2 > from values (1, 2), (4, 5) as data(a, b) > group by a; > +---+++ > |a |cnt1|cnt2| > +---+++ > |1 |1 |0 | > |4 |1 |0 | > +---+++ > {noformat} > The values for {{cnt2}} should be 1 and 1 (not 0 and 0). > If you change the literal used in the first aggregate function, the second > aggregate function now works correctly: > {noformat} > select a, count(distinct 101) as cnt1, count(distinct b, 100) as cnt2 > from values (1, 2), (4, 5) as data(a, b) > group by a; > +---+++ > |a |cnt1|cnt2| > +---+++ > |1 |1 |1 | > |4 |1 |1 | > +---+++ > {noformat} > The same bug causes the following query to get a NullPointerException: > {noformat} > select a, count(distinct 1), count_min_sketch(distinct b, 0.5d, 0.5d, 1) > from values (1, 2), (4, 5) as data(a, b) > group by a; > {noformat} > If you change the literal used in the first aggregation, then the query > succeeds: > {noformat} > select a, count(distinct 2), count_min_sketch(distinct b, 0.5d, 0.5d, 1) > from values (1, 2), (4, 5) as data(a, b) > group by a; > +---+-+-+ > |a |count(DISTINCT 2)|count_min_sketch(DISTINCT b, 0.5, 0.5, 1) > > | > +---+-+-+ > |1 |1|[00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 01 00 00 > 00 04 00 00 00 00 5D 8D 6A B9 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00]| > |4 |1|[00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 01 00 00 > 00 04 00 00 00 00 5D 8D 6A B9 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00]| > +---+-+-+ > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
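The report above can be reproduced directly from spark-shell; a minimal sketch, assuming the usual `spark` SparkSession is in scope:

{code}
// Reproduction sketch for SPARK-41035 (first query from the report).
spark.sql("""
  select a, count(distinct 100) as cnt1, count(distinct b, 100) as cnt2
  from values (1, 2), (4, 5) as data(a, b)
  group by a
""").show()
// Affected versions (3.2.2, 3.3.1): cnt2 = 0 for both groups (incorrect).
// Fixed versions (3.2.3, 3.3.2, 3.4.0): cnt2 = 1 for both groups.
{code}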
[jira] [Resolved] (SPARK-40569) Add smoke test in standalone cluster for spark-docker
[ https://issues.apache.org/jira/browse/SPARK-40569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yikun Jiang resolved SPARK-40569. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 21 [https://github.com/apache/spark-docker/pull/21] > Add smoke test in standalone cluster for spark-docker > - > > Key: SPARK-40569 > URL: https://issues.apache.org/jira/browse/SPARK-40569 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Qian Sun >Assignee: Qian Sun >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41051) Optimize ProcfsMetrics file acquisition
[ https://issues.apache.org/jira/browse/SPARK-41051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630685#comment-17630685 ] Apache Spark commented on SPARK-41051: -- User 'Narcasserun' has created a pull request for this issue: https://github.com/apache/spark/pull/38568 > Optimize ProcfsMetrics file acquisition > --- > > Key: SPARK-41051 > URL: https://issues.apache.org/jira/browse/SPARK-41051 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.1 > Environment: spark-master >Reporter: sur >Priority: Minor > Fix For: 3.3.2 > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > When obtaining the Procfs file, variables are created but not used, and there > is duplicate code. We should reduce such situations. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40852) Implement `DataFrame.summary`
[ https://issues.apache.org/jira/browse/SPARK-40852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-40852. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38318 [https://github.com/apache/spark/pull/38318] > Implement `DataFrame.summary` > - > > Key: SPARK-40852 > URL: https://issues.apache.org/jira/browse/SPARK-40852 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
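DataFrame.summary already exists in the Scala and Python DataFrame APIs; this ticket exposes it through Connect. A small usage sketch of the API surface being implemented, assuming a SparkSession named `spark`:

{code}
// Usage sketch of the existing DataFrame.summary API.
val df = spark.range(1, 101).toDF("value")

// No arguments: count, mean, stddev, min, 25%/50%/75% percentiles, max.
df.summary().show()

// Specific statistics can also be requested by name.
df.summary("count", "min", "max").show()
{code}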
[jira] [Commented] (SPARK-41047) round function with negative scale value runs failed
[ https://issues.apache.org/jira/browse/SPARK-41047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630676#comment-17630676 ] Hyukjin Kwon commented on SPARK-41047: -- cc [~wuyi] [~cloud_fan]. This seems to come from https://github.com/apache/spark/commit/ff39c9271ca04951b045c5d9fca2128a82d50b46 https://issues.apache.org/jira/browse/SPARK-30252 > round function with negative scale value runs failed > > > Key: SPARK-41047 > URL: https://issues.apache.org/jira/browse/SPARK-41047 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.2 >Reporter: liyang >Priority: Major > > Run this SQL in spark-sql: select round(1.233, -1); > Error: org.apache.spark.sql.AnalysisException: Negative scale is not allowed: > -1. You can use spark.sql.legacy.allowNegativeScaleOfDecimal=true to enable > legacy mode to allow it.; > But the documentation of the round function implies that a negative scale > argument is allowed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
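The error message above names its own escape hatch. A workaround sketch follows, with one stated assumption: spark.sql.legacy.allowNegativeScaleOfDecimal is treated here as a static SQL conf, so it is passed at session-build time rather than set at runtime.

{code}
import org.apache.spark.sql.SparkSession

// Workaround sketch based on the flag named in the error message.
// Assumption: the flag is static, so it must be set before the
// session is created; spark.conf.set at runtime would be rejected.
val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.sql.legacy.allowNegativeScaleOfDecimal", "true")
  .getOrCreate()

// With legacy mode on, a negative scale rounds left of the decimal
// point: round(1.233, -1) rounds to the nearest ten, giving 0.
spark.sql("select round(1.233, -1)").show()
{code}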
[jira] [Updated] (SPARK-41050) Upgrade scalafmt from 3.5.9 to 3.6.1
[ https://issues.apache.org/jira/browse/SPARK-41050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-41050: - Priority: Trivial (was: Minor) > Upgrade scalafmt from 3.5.9 to 3.6.1 > - > > Key: SPARK-41050 > URL: https://issues.apache.org/jira/browse/SPARK-41050 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Trivial > Fix For: 3.4.0 > > > v3.6.1 release notes: > https://github.com/scalameta/scalafmt/releases -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org