[jira] [Updated] (SPARK-34882) RewriteDistinctAggregates can cause a bug if the aggregator does not ignore NULLs
[ https://issues.apache.org/jira/browse/SPARK-34882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-34882: - Affects Version/s: 3.0.3 3.1.2 2.4.8 > RewriteDistinctAggregates can cause a bug if the aggregator does not ignore > NULLs > - > > Key: SPARK-34882 > URL: https://issues.apache.org/jira/browse/SPARK-34882 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.8, 3.2.0, 3.1.2, 3.0.3 >Reporter: Tanel Kiis >Priority: Major > Labels: correctness > > {code:title=group-by.sql} > SELECT > first(DISTINCT a), last(DISTINCT a), > first(a), last(a), > first(DISTINCT b), last(DISTINCT b), > first(b), last(b) > FROM testData WHERE a IS NOT NULL AND b IS NOT NULL;{code} > {code:title=group-by.sql.out} > -- !query schema > struct<first(DISTINCT a):int,last(DISTINCT a):int,first(a):int,last(a):int,first(DISTINCT > b):int,last(DISTINCT b):int,first(b):int,last(b):int> > -- !query output > NULL 1 1 3 1 NULL 1 2 > {code} > The results should not be NULL, because NULL inputs are filtered out. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
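The failure mode described in the ticket can be simulated in plain Python. This is a hypothetical sketch, not Spark source: the rewrite for multiple DISTINCT aggregates expands each input row once per distinct group, nulling out the columns that belong to the other groups, and an aggregator such as first() that does not skip NULLs can then observe an injected NULL even though the raw input contained none. All names below are made up for illustration.

```python
# Input rows (a, b); note there are no NULLs in the input.
rows = [(1, 10), (2, 20)]

# Simulated expand: one copy of each row per distinct group ("gid"),
# with the other group's column replaced by None (Spark's NULL).
expanded = []
for a, b in rows:
    expanded.append({"gid": 2, "a": None, "b": b})  # group for DISTINCT b
    expanded.append({"gid": 1, "a": a, "b": None})  # group for DISTINCT a

def first_including_nulls(values):
    # A first() that does NOT ignore NULLs, the problematic behaviour.
    return values[0] if values else None

# If first(a) is evaluated over all expanded rows rather than only the
# rows carrying real `a` values, it picks up an injected NULL:
a_values = [r["a"] for r in expanded]
print(first_including_nulls(a_values))  # None, although `a` was never NULL
```

The sketch shows why filtering the input for NULLs does not help: the NULLs are introduced by the rewrite itself, after the WHERE clause has run.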
[jira] [Created] (SPARK-34904) Old import of LZ4 package inside CompressionCodec.scala
Michal Zeman created SPARK-34904: Summary: Old import of LZ4 package inside CompressionCodec.scala Key: SPARK-34904 URL: https://issues.apache.org/jira/browse/SPARK-34904 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.1.1, 2.4.0 Reporter: Michal Zeman This commit was supposed to upgrade the version of the LZ4 package: [https://github.com/apache/spark/commit/b78cf13bf05f0eadd7ae97df84b6e1505dc5ff9f] The dependency was changed. However, inside the file [CompressionCodec.scala|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/io/CompressionCodec.scala] the old import referencing net.jpountz.lz4 (where versions up to 1.3 are held) remains. Probably for backward compatibility, the newer org.lz4 package still contains net.jpountz.lz4 ([https://github.com/lz4/lz4-java/tree/master/src/java/net/jpountz/lz4]), so this import does not cause problems at first sight.
[jira] [Commented] (SPARK-34900) Some `spark-submit` commands used to run benchmarks in the user's guide is wrong
[ https://issues.apache.org/jira/browse/SPARK-34900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311418#comment-17311418 ] Apache Spark commented on SPARK-34900: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/32002 > Some `spark-submit` commands used to run benchmarks in the user's guide is > wrong > - > > Key: SPARK-34900 > URL: https://issues.apache.org/jira/browse/SPARK-34900 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 3.2.0 >Reporter: Yang Jie >Priority: Trivial > Fix For: 3.2.0 > > > For example, the guide for running JoinBenchmark is as follows: > > {code:java} > /** > * Benchmark to measure performance for joins. > * To run this benchmark: > * {{{ > * 1. without sbt: > * bin/spark-submit --class <this class> --jars <spark core test jar> > <spark sql test jar> > * 2. build/sbt "sql/test:runMain <this class>" > * 3. generate result: > * SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>" > * Results will be written to "benchmarks/JoinBenchmark-results.txt". 
> * }}} > */ > object JoinBenchmark extends SqlBasedBenchmark { > {code} > > > but if we run JoinBenchmark with the command > > {code:java} > bin/spark-submit --class > org.apache.spark.sql.execution.benchmark.JoinBenchmark --jars > spark-core_2.12-3.2.0-SNAPSHOT-tests.jar > spark-sql_2.12-3.2.0-SNAPSHOT-tests.jar > {code} > > the following exception will be thrown: > > {code:java} > Exception in thread "main" java.lang.NoClassDefFoundError: > org/apache/spark/sql/catalyst/plans/SQLHelper > at java.lang.ClassLoader.defineClass1(Native Method) > at java.lang.ClassLoader.defineClass(ClassLoader.java:756) > at > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) > at java.net.URLClassLoader.defineClass(URLClassLoader.java:468) > at java.net.URLClassLoader.access$100(URLClassLoader.java:74) > at java.net.URLClassLoader$1.run(URLClassLoader.java:369){code} > > because the SqlBasedBenchmark trait extends BenchmarkBase and SQLHelper, and > SQLHelper is defined in spark-catalyst-tests.jar. >
[jira] [Commented] (SPARK-34900) Some `spark-submit` commands used to run benchmarks in the user's guide is wrong
[ https://issues.apache.org/jira/browse/SPARK-34900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311423#comment-17311423 ] Apache Spark commented on SPARK-34900: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/32003 > Some `spark-submit` commands used to run benchmarks in the user's guide is > wrong > - > > Key: SPARK-34900 > URL: https://issues.apache.org/jira/browse/SPARK-34900 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 3.2.0 >Reporter: Yang Jie >Priority: Trivial > Fix For: 3.2.0 > > > For example, the guide for running JoinBenchmark is as follows: > > {code:java} > /** > * Benchmark to measure performance for joins. > * To run this benchmark: > * {{{ > * 1. without sbt: > * bin/spark-submit --class <this class> --jars <spark core test jar> > <spark sql test jar> > * 2. build/sbt "sql/test:runMain <this class>" > * 3. generate result: > * SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>" > * Results will be written to "benchmarks/JoinBenchmark-results.txt". 
> * }}} > */ > object JoinBenchmark extends SqlBasedBenchmark { > {code} > > > but if we run JoinBenchmark with the command > > {code:java} > bin/spark-submit --class > org.apache.spark.sql.execution.benchmark.JoinBenchmark --jars > spark-core_2.12-3.2.0-SNAPSHOT-tests.jar > spark-sql_2.12-3.2.0-SNAPSHOT-tests.jar > {code} > > the following exception will be thrown: > > {code:java} > Exception in thread "main" java.lang.NoClassDefFoundError: > org/apache/spark/sql/catalyst/plans/SQLHelper > at java.lang.ClassLoader.defineClass1(Native Method) > at java.lang.ClassLoader.defineClass(ClassLoader.java:756) > at > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) > at java.net.URLClassLoader.defineClass(URLClassLoader.java:468) > at java.net.URLClassLoader.access$100(URLClassLoader.java:74) > at java.net.URLClassLoader$1.run(URLClassLoader.java:369){code} > > because the SqlBasedBenchmark trait extends BenchmarkBase and SQLHelper, and > SQLHelper is defined in spark-catalyst-tests.jar. >
[jira] [Created] (SPARK-34905) Enable ANSI intervals in SQLQueryTestSuite
Max Gekk created SPARK-34905: Summary: Enable ANSI intervals in SQLQueryTestSuite Key: SPARK-34905 URL: https://issues.apache.org/jira/browse/SPARK-34905 Project: Spark Issue Type: Sub-task Components: Tests Affects Versions: 3.2.0 Reporter: Max Gekk Remove the following code from SQLQueryTestSuite: {code:java} localSparkSession.conf.set(SQLConf.LEGACY_INTERVAL_ENABLED.key, true) {code} and use ANSI intervals where possible.
[jira] [Updated] (SPARK-34905) Enable ANSI intervals in SQLQueryTestSuite
[ https://issues.apache.org/jira/browse/SPARK-34905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-34905: - Description: Remove the following code from SQLQueryTestSuite: {code:java} localSparkSession.conf.set(SQLConf.LEGACY_INTERVAL_ENABLED.key, true) {code} and use the ANSI interval where it is possible. Probably, this depends on casting intervals to strings. was: Remove the following code from SQLQueryTestSuite: {code:java} localSparkSession.conf.set(SQLConf.LEGACY_INTERVAL_ENABLED.key, true) {code} and use the ANSI interval where it is possible. > Enable ANSI intervals in SQLQueryTestSuite > -- > > Key: SPARK-34905 > URL: https://issues.apache.org/jira/browse/SPARK-34905 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > Remove the following code from SQLQueryTestSuite: > {code:java} > localSparkSession.conf.set(SQLConf.LEGACY_INTERVAL_ENABLED.key, true) > {code} > and use the ANSI interval where it is possible. > Probably, this depends on casting intervals to strings.
[jira] [Assigned] (SPARK-33308) support CUBE(...) and ROLLUP(...), GROUPING SETS(...) as group by expr in parser level
[ https://issues.apache.org/jira/browse/SPARK-33308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-33308: --- Assignee: angerszhu > support CUBE(...) and ROLLUP(...), GROUPING SETS(...) as group by expr in > parser level > -- > > Key: SPARK-33308 > URL: https://issues.apache.org/jira/browse/SPARK-33308 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > > support CUBE(...) and ROLLUP(...), GROUPING SETS(...) as group by expr in > parser level
[jira] [Assigned] (SPARK-34884) Improve dynamic partition pruning evaluation
[ https://issues.apache.org/jira/browse/SPARK-34884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-34884: --- Assignee: Yuming Wang > Improve dynamic partition pruning evaluation > > > Key: SPARK-34884 > URL: https://issues.apache.org/jira/browse/SPARK-34884 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > > Fast fail if filtering side can not build broadcast by size.
[jira] [Resolved] (SPARK-34884) Improve dynamic partition pruning evaluation
[ https://issues.apache.org/jira/browse/SPARK-34884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-34884. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31984 [https://github.com/apache/spark/pull/31984] > Improve dynamic partition pruning evaluation > > > Key: SPARK-34884 > URL: https://issues.apache.org/jira/browse/SPARK-34884 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.2.0 > > > Fast fail if filtering side can not build broadcast by size.
[jira] [Resolved] (SPARK-33308) support CUBE(...) and ROLLUP(...), GROUPING SETS(...) as group by expr in parser level
[ https://issues.apache.org/jira/browse/SPARK-33308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-33308. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 30212 [https://github.com/apache/spark/pull/30212] > support CUBE(...) and ROLLUP(...), GROUPING SETS(...) as group by expr in > parser level > -- > > Key: SPARK-33308 > URL: https://issues.apache.org/jira/browse/SPARK-33308 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Fix For: 3.2.0 > > > support CUBE(...) and ROLLUP(...), GROUPING SETS(...) as group by expr in > parser level
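The equivalence behind this feature, CUBE and ROLLUP as shorthand for GROUPING SETS, can be sketched in plain Python (an illustration of the standard SQL semantics, not Spark's Scala parser code):

```python
from itertools import combinations

def rollup(cols):
    # ROLLUP(a, b) is equivalent to GROUPING SETS((a, b), (a), ())
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]

def cube(cols):
    # CUBE(a, b) is equivalent to GROUPING SETS over every subset of {a, b}
    return [s for r in range(len(cols), -1, -1) for s in combinations(cols, r)]

print(rollup(["a", "b"]))  # [('a', 'b'), ('a',), ()]
print(cube(["a", "b"]))    # [('a', 'b'), ('a',), ('b',), ()]
```

Accepting CUBE(...)/ROLLUP(...)/GROUPING SETS(...) directly as GROUP BY expressions in the parser, as this ticket does, is purely syntactic; the expansion above is what the aggregation ultimately evaluates.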
[jira] [Commented] (SPARK-26404) set spark.pyspark.python or PYSPARK_PYTHON doesn't work in k8s client-cluster mode.
[ https://issues.apache.org/jira/browse/SPARK-26404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311505#comment-17311505 ] Vincenzo Eduardo Padulano commented on SPARK-26404: --- I found the same issue with this very basic setup: my default python version is Python 3 and I also have a Python 2 environment with pyspark installed. I've written this simple script based on the Pi estimation at [https://spark.apache.org/examples.html] : {code:python} import sys import random import pyspark confdict = {"spark.app.name": "spark_pi", "spark.master": "local[4]", "spark.pyspark.python": sys.executable} sparkconf = pyspark.SparkConf().setAll(confdict.items()) sparkcontext = pyspark.SparkContext(conf=sparkconf) def inside(p): x, y = random.random(), random.random() return x * x + y * y < 1 num_samples = 1e4 num_partitions = 256 count = sparkcontext.parallelize(range(int(num_samples)), num_partitions).filter(inside).count() print("Pi is roughly %.4f" % (4.0 * count / num_samples)) {code} If I run the script with my default python executable, all is good: {code:bash} $: python spark_pi.py 21/03/30 14:55:39 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). Pi is roughly 3.1292 {code} But when I use Python 2, it doesn't pick up the `spark.pyspark.python` configuration option I set in my `SparkConf` object: {code:bash} $: python2 spark_pi.py 21/03/30 14:56:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). 
For SparkR, use setLogLevel(newLevel). /home/vpadulan/.local/lib/python2.7/site-packages/pyspark/context.py:227: DeprecationWarning: Support for Python 2 and Python 3 prior to version 3.6 is deprecated as of Spark 3.0. See also the plan for dropping Python 2 support at https://spark.apache.org/news/plan-for-dropping-python-2-support.html. DeprecationWarning) 21/03/30 14:57:03 ERROR Executor: Exception in task 3.0 in stage 0.0 (TID 3)256] org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/home/vpadulan/.local/lib/python2.7/site-packages/pyspark/python/lib/pyspark.zip/pyspark/worker.py", line 473, in main raise Exception(("Python in worker has different version %s than that in " + Exception: Python in worker has different version 3.8 than that in driver 2.7, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set. {code} I can't understand how this issue was resolved in the first place: the [official documentation|https://spark.apache.org/docs/latest/configuration.html#environment-variables] states that `spark.pyspark.python` should take precedence over `PYSPARK_PYTHON`: {noformat} Python binary executable to use for PySpark in both driver and workers (default is python3 if available, otherwise python). Property spark.pyspark.python take precedence if it is set {noformat} > set spark.pyspark.python or PYSPARK_PYTHON doesn't work in k8s client-cluster > mode. > --- > > Key: SPARK-26404 > URL: https://issues.apache.org/jira/browse/SPARK-26404 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Core >Affects Versions: 2.4.0 >Reporter: Dongqing Liu >Priority: Major > > Neither > conf.set("spark.executorEnv.PYSPARK_PYTHON", "/opt/pythonenvs/bin/python") > nor > conf.set("spark.pyspark.python", "/opt/pythonenvs/bin/python") > works. > Looks like the executor always picks python from PATH. 
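A common mitigation for the version mismatch in the comment above is to pin the interpreter through the environment before the context is created. This is a hedged workaround sketch, an assumption rather than the official fix for this ticket: it relies on `PYSPARK_PYTHON`/`PYSPARK_DRIVER_PYTHON` being read by the forked workers, instead of relying on `spark.pyspark.python` being honoured in local mode.

```python
import os
import sys

# Export the interpreter via environment variables *before* the
# SparkContext is created, so the driver and its local workers agree
# on the same Python executable.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

# ...then build the SparkConf and SparkContext exactly as in the script above.
```

Setting these variables in the shell that launches the script (rather than inside it) achieves the same effect and is less order-sensitive.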
[jira] [Commented] (SPARK-34856) ANSI mode: Allow casting complex types as string type
[ https://issues.apache.org/jira/browse/SPARK-34856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311506#comment-17311506 ] Apache Spark commented on SPARK-34856: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/32004 > ANSI mode: Allow casting complex types as string type > - > > Key: SPARK-34856 > URL: https://issues.apache.org/jira/browse/SPARK-34856 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.2.0 > > > Currently, complex types are not allowed to cast as string type. This breaks > the Dataset.show() API. E.g. > {code:java} > scala> sql("select array(1, 2, 2)").show(false) > org.apache.spark.sql.AnalysisException: cannot resolve 'CAST(`array(1, 2, 2)` > AS STRING)' due to data type mismatch: > cannot cast array to string with ANSI mode on. > {code}
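The kind of rendering this ticket enables can be illustrated in plain Python. This is only a rough sketch of how show() needs complex values rendered as strings, not Spark's actual cast implementation; the function name and the exact formatting are assumptions for illustration.

```python
def to_spark_string(value):
    """Render a value roughly the way a cast-to-string for show() would."""
    if value is None:
        return "NULL"
    if isinstance(value, list):
        # Arrays render as "[elem, elem, ...]", recursing into elements.
        return "[" + ", ".join(to_spark_string(v) for v in value) + "]"
    if isinstance(value, dict):
        # Maps render as "{key -> value, ...}".
        return "{" + ", ".join(
            f"{k} -> {to_spark_string(v)}" for k, v in value.items()) + "}"
    return str(value)

print(to_spark_string([1, 2, 2]))    # [1, 2, 2]
print(to_spark_string([[1], None]))  # [[1], NULL]
```

Under strict ANSI rules such a cast is disallowed, which is why show() on a complex column failed; the fix carves out string as a permitted target for complex types.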
[jira] [Updated] (SPARK-34906) Refactor TreeNode's children handling methods into specialized traits
[ https://issues.apache.org/jira/browse/SPARK-34906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ali Afroozeh updated SPARK-34906: - Description: Spark query plan node hierarchy has specialized traits (or abstract classes) for handling nodes with fixed number of children, for example `UnaryExpression`, `UnaryNode` and `UnaryExec` for representing an expression, a logical plan and a physical plan with child, respectively. This PR refactors the `TreeNode` hierarchy by extracting the children handling functionality into the following traits. The former nodes such as `UnaryExpression` now extend the corresponding new trait: {{trait LeafLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ override final def children: Seq[T] = Nil}} {{}}} {{trait UnaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ def child: T}} {{ @transient override final lazy val children: Seq[T] = child :: Nil}} {{}}} {{trait BinaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ def left: T}} {{ def right: T}} {{ @transient override final lazy val children: Seq[T] = left :: right :: Nil}} {{}}} {{trait TernaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ def first: T}} {{ def second: T}} {{ def third: T}} {{ @transient override final lazy val children: Seq[T] = first :: second :: third :: Nil}} {{}}} This refactoring, which is part of a bigger effort to make tree transformations in Spark more efficient, has two benefits: * It moves the children handling to a single place, instead of being spread in specific subclasses, which will help the future optimizations for tree traversals. * It allows to mix in these traits with some concrete node types that could not extend the previous classes. For example, expressions with one child that extend `AggregateFunction` cannot extend `UnaryExpression` as `AggregateFunction` defines the `foldable` method final while `UnaryExpression` defines it as non final. 
With the new traits, we can directly extend the concrete class from `UnaryLike` in these cases. Classes with more specific child handling will make tree traversal methods faster. In this PR we have also updated many concrete node types to extend these traits to benefit from more specific child handling. was: Spark query plan node hierarchy has specialized traits (or abstract classes) for handling nodes with fixed number of children, for example `UnaryExpression`, `UnaryNode` and `UnaryExec` for representing an expression, a logical plan and a physical plan with child, respectively. This PR refactors the `TreeNode` hierarchy by extracting the children handling functionality into the following traits. The former nodes such as `UnaryExpression` now extend the corresponding new trait: {{trait LeafLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ override final def children: Seq[T] = Nil}} {{}}} {{trait UnaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ def child: T}} {{ @transient override final lazy val children: Seq[T] = child :: Nil}} {{}}} {{trait BinaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ def left: T}} {{ def right: T}} {{ @transient override final lazy val children: Seq[T] = left :: right :: Nil}} {{}}} {{trait TernaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ def first: T}} {{ def second: T}} {{ def third: T}} {{ @transient override final lazy val children: Seq[T] = first :: second :: third :: Nil}} {{}}} * This refactoring, which is part of a bigger effort to make tree transformations in Spark more efficient, has two benefits: It moves the children handling to a single place, instead of being spread in specific subclasses, which will help the future optimizations for tree traversals. * It allows to mix in these traits with some concrete node types that could not extend the previous classes. 
For example, expressions with one child that extend `AggregateFunction` cannot extend `UnaryExpression` as `AggregateFunction` defines the `foldable` method final while `UnaryExpression` defines it as non final. With the new traits, we can directly extend the concrete class from `UnaryLike` in these cases. Classes with more specific child handling will make tree traversal methods faster. In this PR we have also updated many concrete node types to extend these traits to benefit from more specific child handling. > Refactor TreeNode's children handling methods into specialized traits > - > > Key: SPARK-34906 > URL: https://issues.apache.org/jira/browse/SPARK-34906 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.1 >Reporter: Ali Afroozeh >Priority: Major > > Spark query plan node hierarchy has specialized traits (or abstract classes) > for handling nodes with fixed number of
[jira] [Created] (SPARK-34906) Refactor TreeNode's children handling methods into specialized traits
Ali Afroozeh created SPARK-34906: Summary: Refactor TreeNode's children handling methods into specialized traits Key: SPARK-34906 URL: https://issues.apache.org/jira/browse/SPARK-34906 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.1 Reporter: Ali Afroozeh Spark query plan node hierarchy has specialized traits (or abstract classes) for handling nodes with fixed number of children, for example `UnaryExpression`, `UnaryNode` and `UnaryExec` for representing an expression, a logical plan and a physical plan with child, respectively. This PR refactors the `TreeNode` hierarchy by extracting the children handling functionality into the following traits. The former nodes such as `UnaryExpression` now extend the corresponding new trait: {{trait LeafLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ override final def children: Seq[T] = Nil}} {{}}} {{trait UnaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ def child: T}} {{ @transient override final lazy val children: Seq[T] = child :: Nil}} {{}}} {{trait BinaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ def left: T}} {{ def right: T}} {{ @transient override final lazy val children: Seq[T] = left :: right :: Nil}} {{}}} {{trait TernaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} {{ def first: T}} {{ def second: T}} {{ def third: T}} {{ @transient override final lazy val children: Seq[T] = first :: second :: third :: Nil}} {{}}} * This refactoring, which is part of a bigger effort to make tree transformations in Spark more efficient, has two benefits: It moves the children handling to a single place, instead of being spread in specific subclasses, which will help the future optimizations for tree traversals. * It allows to mix in these traits with some concrete node types that could not extend the previous classes. 
For example, expressions with one child that extend `AggregateFunction` cannot extend `UnaryExpression` as `AggregateFunction` defines the `foldable` method final while `UnaryExpression` defines it as non final. With the new traits, we can directly extend the concrete class from `UnaryLike` in these cases. Classes with more specific child handling will make tree traversal methods faster. In this PR we have also updated many concrete node types to extend these traits to benefit from more specific child handling.
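The trait layout described above can be mirrored in a small Python analogue (illustrative only; Spark's real code is the Scala shown in the ticket): each mixin fixes how `children` is derived from the node's named fields, so generic traversal code is written once against `children`.

```python
class LeafLike:
    @property
    def children(self):
        return []

class UnaryLike:
    @property
    def children(self):
        return [self.child]

class BinaryLike:
    @property
    def children(self):
        return [self.left, self.right]

# Concrete node types only declare their named fields and pick a mixin.
class Literal(LeafLike):
    def __init__(self, value):
        self.value = value

class Add(BinaryLike):
    def __init__(self, left, right):
        self.left, self.right = left, right

def count_nodes(node):
    # A generic traversal that never needs to know each node's arity.
    return 1 + sum(count_nodes(c) for c in node.children)

print(count_nodes(Add(Literal(1), Add(Literal(2), Literal(3)))))  # 5
```

As in the Scala version, the payoff is that arity-specific handling lives in one place, and node classes that cannot share a common base class (the `AggregateFunction` case above) can still mix in the child-handling behaviour.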
[jira] [Updated] (SPARK-34906) Refactor TreeNode's children handling methods into specialized traits
[ https://issues.apache.org/jira/browse/SPARK-34906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ali Afroozeh updated SPARK-34906:
-
Description:

Spark's query plan node hierarchy has specialized traits (or abstract classes) for handling nodes with a fixed number of children, for example `UnaryExpression`, `UnaryNode` and `UnaryExec` for representing an expression, a logical plan and a physical plan with only one child, respectively. This PR refactors the `TreeNode` hierarchy by extracting the children-handling functionality into the following traits. Nodes such as `UnaryExpression` now extend the corresponding new trait:

{code:scala}
trait LeafLike[T <: TreeNode[T]] { self: TreeNode[T] =>
  override final def children: Seq[T] = Nil
}

trait UnaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>
  def child: T
  @transient override final lazy val children: Seq[T] = child :: Nil
}

trait BinaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>
  def left: T
  def right: T
  @transient override final lazy val children: Seq[T] = left :: right :: Nil
}

trait TernaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>
  def first: T
  def second: T
  def third: T
  @transient override final lazy val children: Seq[T] = first :: second :: third :: Nil
}
{code}

This refactoring, which is part of a bigger effort to make tree transformations in Spark more efficient, has two benefits:
* It moves children handling to a single place instead of spreading it across specific subclasses, which will help future optimizations of tree traversals.
* It allows these traits to be mixed into concrete node types that could not extend the previous classes. For example, expressions with one child that extend `AggregateFunction` cannot extend `UnaryExpression`, because `AggregateFunction` defines the `foldable` method as final while `UnaryExpression` defines it as non-final. With the new traits, such concrete classes can extend `UnaryLike` directly.

Classes with more specific child handling make tree traversal methods faster. This PR also updates many concrete node types to extend these traits so that they benefit from the more specific child handling.

> Refactor TreeNode's children handling methods into specialized traits
> ----------------------------------------------------------------------
>
> Key: SPARK-34906
> URL: https://issues.apache.org/jira/browse/SPARK-34906
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.1
> Reporter: Ali Afroozeh
> Priority: Major
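To make the trait mechanics concrete, here is a minimal, self-contained sketch of the pattern described above (simplified: the real `TreeNode` has many more members, and `Literal`/`Negate` are invented example nodes, not Spark classes). A concrete node only declares its child fields; `children` is inherited from the mixed-in trait:

```scala
// Simplified stand-in for Spark's TreeNode.
abstract class TreeNode[T <: TreeNode[T]] {
  def children: Seq[T]
}

trait LeafLike[T <: TreeNode[T]] { self: TreeNode[T] =>
  override final def children: Seq[T] = Nil
}

trait UnaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>
  def child: T
  @transient override final lazy val children: Seq[T] = child :: Nil
}

// Concrete nodes only declare their child fields; `children` comes from the trait.
abstract class Expression extends TreeNode[Expression]
case class Literal(value: Int) extends Expression with LeafLike[Expression]
case class Negate(child: Expression) extends Expression with UnaryLike[Expression]

object TraitDemo {
  def main(args: Array[String]): Unit = {
    val e = Negate(Literal(1))
    println(e.children)          // List(Literal(1))
    println(Literal(1).children) // List() -- always empty for leaf nodes
  }
}
```

Because `children` is a `final lazy val` in the trait, every unary node gets the same caching behavior in one place instead of each subclass rebuilding the sequence.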
[jira] [Commented] (SPARK-34906) Refactor TreeNode's children handling methods into specialized traits
[ https://issues.apache.org/jira/browse/SPARK-34906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311535#comment-17311535 ] Apache Spark commented on SPARK-34906:
--
User 'dbaliafroozeh' has created a pull request for this issue: https://github.com/apache/spark/pull/31932
--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34906) Refactor TreeNode's children handling methods into specialized traits
[ https://issues.apache.org/jira/browse/SPARK-34906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34906:
Assignee: (was: Apache Spark)
[jira] [Assigned] (SPARK-34906) Refactor TreeNode's children handling methods into specialized traits
[ https://issues.apache.org/jira/browse/SPARK-34906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34906:
Assignee: Apache Spark
[jira] [Updated] (SPARK-34906) Refactor TreeNode's children handling methods into specialized traits
[ https://issues.apache.org/jira/browse/SPARK-34906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ali Afroozeh updated SPARK-34906:
-
Description: (minor wording tweaks; otherwise identical to the description quoted above)
[jira] [Updated] (SPARK-34906) Refactor TreeNode's children handling methods into specialized traits
[ https://issues.apache.org/jira/browse/SPARK-34906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ali Afroozeh updated SPARK-34906:
-
Description: (minor wording tweaks; otherwise identical to the description quoted above)
[jira] [Assigned] (SPARK-34899) Use origin plan if can not coalesce shuffle partition
[ https://issues.apache.org/jira/browse/SPARK-34899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-34899:
---
Assignee: ulysses you

> Use origin plan if can not coalesce shuffle partition
> ------------------------------------------------------
>
> Key: SPARK-34899
> URL: https://issues.apache.org/jira/browse/SPARK-34899
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: ulysses you
> Assignee: ulysses you
> Priority: Minor
>
> `CoalesceShufflePartitions` cannot coalesce such cases when the total size of the mappers' shuffle partitions is large enough. It is then confusing to use a `CustomShuffleReaderExec` that is marked as `coalesced` but has no effect on the partition number.
[jira] [Resolved] (SPARK-34899) Use origin plan if can not coalesce shuffle partition
[ https://issues.apache.org/jira/browse/SPARK-34899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-34899.
-
Fix Version/s: 3.2.0
Resolution: Fixed
Issue resolved by pull request 31994 [https://github.com/apache/spark/pull/31994]
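The situation SPARK-34899 describes can be sketched numerically. The following is a hedged, simplified model, not Spark's actual `CoalesceShufflePartitions` code; `coalescedPartitionCount` and the 64 MB advisory size are illustrative assumptions. Greedy size-based coalescing merges adjacent partitions until a bucket reaches the advisory size, so when every partition already exceeds it, the count stays the same and the "coalesced" reader changes nothing:

```scala
object CoalesceSketch {
  // Greedily pack adjacent shuffle partitions into buckets of at most
  // advisoryTargetBytes (a bucket always takes at least one partition).
  def coalescedPartitionCount(partitionBytes: Seq[Long], advisoryTargetBytes: Long): Int = {
    var count = 0
    var bucket = 0L
    for (bytes <- partitionBytes) {
      if (bucket > 0 && bucket + bytes > advisoryTargetBytes) {
        count += 1 // close the current bucket
        bucket = 0L
      }
      bucket += bytes
    }
    if (bucket > 0) count += 1
    count
  }

  def main(args: Array[String]): Unit = {
    // Small partitions: 8 x 10 MB coalesce into 2 buckets of <= 64 MB.
    println(coalescedPartitionCount(Seq.fill(8)(10L << 20), 64L << 20)) // 2
    // Large partitions: each 100 MB already exceeds the target, so the
    // count stays at 8 -- coalescing is a no-op, as in the issue.
    println(coalescedPartitionCount(Seq.fill(8)(100L << 20), 64L << 20)) // 8
  }
}
```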
[jira] [Updated] (SPARK-34906) Refactor TreeNode's children handling methods into specialized traits
[ https://issues.apache.org/jira/browse/SPARK-34906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ali Afroozeh updated SPARK-34906:
-
Description: (minor wording tweaks; otherwise identical to the description quoted above)
[jira] [Commented] (SPARK-34668) Support casting of day-time intervals to strings
[ https://issues.apache.org/jira/browse/SPARK-34668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311544#comment-17311544 ] angerszhu commented on SPARK-34668: --- [~maxgekk] Should we support cast String to DayTimeIntervalType too. > Support casting of day-time intervals to strings > > > Key: SPARK-34668 > URL: https://issues.apache.org/jira/browse/SPARK-34668 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > Extend the Cast expression and support DayTimeIntervalType in casting to > StringType. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-34668) Support casting of day-time intervals to strings
[ https://issues.apache.org/jira/browse/SPARK-34668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311544#comment-17311544 ] angerszhu edited comment on SPARK-34668 at 3/30/21, 2:15 PM:
-
[~maxgekk] Should we support cast String to DayTimeIntervalType too ?

was (Author: angerszhuuu): [~maxgekk] Should we support cast String to DayTimeIntervalType too.
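For context on what "casting a day-time interval to a string" produces, here is a hedged sketch (not Spark's actual `Cast` implementation; the object name and the exact ANSI-style rendering are assumptions) of formatting an interval stored as a microsecond count in the `INTERVAL '<days> <hh>:<mm>:<ss>.<ffffff>' DAY TO SECOND` style:

```scala
object DayTimeIntervalToString {
  private val MicrosPerSecond = 1000000L
  private val MicrosPerDay    = 24L * 3600 * MicrosPerSecond

  // Render a signed microsecond count as an ANSI-style day-time interval literal.
  def toSqlString(micros: Long): String = {
    val sign = if (micros < 0) "-" else ""
    val abs  = math.abs(micros)
    val days = abs / MicrosPerDay
    var rest = abs % MicrosPerDay
    val hours   = rest / (3600L * MicrosPerSecond); rest %= 3600L * MicrosPerSecond
    val minutes = rest / (60L * MicrosPerSecond);   rest %= 60L * MicrosPerSecond
    val seconds  = rest / MicrosPerSecond
    val fraction = rest % MicrosPerSecond
    f"INTERVAL '$sign$days $hours%02d:$minutes%02d:$seconds%02d.$fraction%06d' DAY TO SECOND"
  }

  def main(args: Array[String]): Unit = {
    // 1 day, 2h 3m 4.123456s expressed in microseconds:
    println(toSqlString(93784123456L)) // INTERVAL '1 02:03:04.123456' DAY TO SECOND
  }
}
```

The reverse direction raised in the comment (string to `DayTimeIntervalType`) would be the corresponding parse of this literal form.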
[jira] [Commented] (SPARK-32027) EventLoggingListener threw java.util.ConcurrentModificationException
[ https://issues.apache.org/jira/browse/SPARK-32027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311545#comment-17311545 ] Kristopher Kane commented on SPARK-32027: - Possibly related and fixed with https://issues.apache.org/jira/browse/SPARK-34731 > EventLoggingListener threw java.util.ConcurrentModificationException > - > > Key: SPARK-32027 > URL: https://issues.apache.org/jira/browse/SPARK-32027 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > {noformat} > 20/06/18 20:22:25 ERROR AsyncEventQueue: Listener EventLoggingListener threw > an exception > java.util.ConcurrentModificationException > at java.util.Hashtable$Enumerator.next(Hashtable.java:1387) > at > scala.collection.convert.Wrappers$JPropertiesWrapper$$anon$6.next(Wrappers.scala:424) > at > scala.collection.convert.Wrappers$JPropertiesWrapper$$anon$6.next(Wrappers.scala:420) > at scala.collection.Iterator.foreach(Iterator.scala:941) > at scala.collection.Iterator.foreach$(Iterator.scala:941) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1429) > at scala.collection.IterableLike.foreach(IterableLike.scala:74) > at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at org.apache.spark.util.JsonProtocol$.mapToJson(JsonProtocol.scala:568) > at > org.apache.spark.util.JsonProtocol$.$anonfun$propertiesToJson$1(JsonProtocol.scala:574) > at scala.Option.map(Option.scala:230) > at > org.apache.spark.util.JsonProtocol$.propertiesToJson(JsonProtocol.scala:573) > at > org.apache.spark.util.JsonProtocol$.jobStartToJson(JsonProtocol.scala:159) > at > 
org.apache.spark.util.JsonProtocol$.sparkEventToJson(JsonProtocol.scala:81) > at > org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:97) > at > org.apache.spark.scheduler.EventLoggingListener.onJobStart(EventLoggingListener.scala:159) > at > org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:37) > at > org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) > at > org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) > at > org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) > at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115) > at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99) > at > org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105) > at > org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105) > at > scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) > at > org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100) > at > org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96) > at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1319) > at > org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96) > 20/06/18 20:22:25 ERROR AsyncEventQueue: Listener EventLoggingListener threw > an exception > java.util.ConcurrentModificationException > at java.util.Hashtable$Enumerator.next(Hashtable.java:1387) > at > scala.collection.convert.Wrappers$JPropertiesWrapper$$anon$6.next(Wrappers.scala:424) > at > scala.collection.convert.Wrappers$JPropertiesWrapper$$anon$6.next(Wrappers.scala:420) > at scala.collection.Iterator.foreach(Iterator.scala:941) > at 
scala.collection.Iterator.foreach$(Iterator.scala:941) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1429) > at scala.collection.IterableLike.foreach(IterableLike.scala:74) > at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at org.apache.spark.util.JsonProtocol$.mapToJson(JsonPr
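The failure above comes from walking a java.util.Properties (a synchronized Hashtable) through its fail-fast iterator while another thread mutates it. A minimal standalone sketch, unrelated to Spark's actual listener code, reproduces the exception deterministically in a single thread and shows the snapshot-style defense (the approach SPARK-34731 takes — iterate a clone, not the live object):

```java
import java.util.ConcurrentModificationException;
import java.util.Properties;

public class CmeSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        for (int i = 0; i < 100; i++) {
            props.setProperty("spark.prop." + i, "v");
        }

        boolean sawCme = false;
        try {
            for (Object key : props.keySet()) {
                // Structural mutation while the fail-fast iterator is live; this
                // stands in for another thread modifying the job's Properties
                // while the event-logging listener serializes them to JSON.
                props.setProperty("mutated.during.iteration", "v");
            }
        } catch (ConcurrentModificationException e) {
            sawCme = true;
        }
        if (!sawCme) throw new AssertionError("expected ConcurrentModificationException");

        // Defensive fix: snapshot first, then iterate. Mutations to the original
        // can no longer invalidate the iterator.
        Properties snapshot = (Properties) props.clone();
        int n = 0;
        for (Object key : snapshot.keySet()) {
            props.setProperty("extra." + n++, "v"); // safe: we iterate the snapshot
        }
        System.out.println("iterated " + n + " keys from snapshot");
    }
}
```

Note that the one-thread reproduction above is deliberately deterministic; in the reported bug the mutation races in from a concurrent thread, so the exception appears only intermittently.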
[jira] [Updated] (SPARK-34900) Some `spark-submit` commands used to run benchmarks in the user's guide are wrong
[ https://issues.apache.org/jira/browse/SPARK-34900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-34900: - Fix Version/s: 3.0.3 3.1.2 > Some `spark-submit` commands used to run benchmarks in the user's guide are > wrong > - > > Key: SPARK-34900 > URL: https://issues.apache.org/jira/browse/SPARK-34900 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 3.2.0 >Reporter: Yang Jie >Priority: Trivial > Fix For: 3.2.0, 3.1.2, 3.0.3 > > > For example, the guide for running JoinBenchmark is as follows: > > {code:java} > /** > * Benchmark to measure performance for joins. > * To run this benchmark: > * {{{ > * 1. without sbt: > * bin/spark-submit --class <this class> --jars <spark core test jar> > * 2. build/sbt "sql/test:runMain <this class>" > * 3. generate result: > * SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>" > * Results will be written to "benchmarks/JoinBenchmark-results.txt". > * }}} > */ > object JoinBenchmark extends SqlBasedBenchmark { > {code} > > > but if we run JoinBenchmark with the command > > {code:java} > bin/spark-submit --class > org.apache.spark.sql.execution.benchmark.JoinBenchmark --jars > spark-core_2.12-3.2.0-SNAPSHOT-tests.jar > spark-sql_2.12-3.2.0-SNAPSHOT-tests.jar > {code} > > the following exception will be thrown: > > {code:java} > Exception in thread "main" java.lang.NoClassDefFoundError: > org/apache/spark/sql/catalyst/plans/SQLHelper > at java.lang.ClassLoader.defineClass1(Native Method) > at java.lang.ClassLoader.defineClass(ClassLoader.java:756) > at > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) > at java.net.URLClassLoader.defineClass(URLClassLoader.java:468) > at java.net.URLClassLoader.access$100(URLClassLoader.java:74) > at java.net.URLClassLoader$1.run(URLClassLoader.java:369){code} > > because the SqlBasedBenchmark trait extends BenchmarkBase and SQLHelper, > and SQLHelper is defined in the spark-catalyst tests jar, which is missing from the --jars list. 
> -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34828) YARN Shuffle Service: Support configurability of aux service name and service-specific config overrides
[ https://issues.apache.org/jira/browse/SPARK-34828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-34828. --- Fix Version/s: 3.2.0 Assignee: Erik Krogen Resolution: Fixed > YARN Shuffle Service: Support configurability of aux service name and > service-specific config overrides > --- > > Key: SPARK-34828 > URL: https://issues.apache.org/jira/browse/SPARK-34828 > Project: Spark > Issue Type: Improvement > Components: Shuffle, YARN >Affects Versions: 3.1.1 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: 3.2.0 > > > In some cases it may be desirable to run multiple instances of the Spark > Shuffle Service which are using different versions of Spark. This can be > helpful, for example, when running a YARN cluster with a mixed workload of > applications running multiple Spark versions, since a given version of the > shuffle service is not always compatible with other versions of Spark. (See > SPARK-27780 for more detail on this) > YARN versions since 2.9.0 support the ability to run shuffle services within > an isolated classloader (see YARN-4577), meaning multiple Spark versions can > coexist within a single NodeManager. > To support this from the Spark side, we need to make two enhancements: > * Make the name of the shuffle service configurable. Currently it is > hard-coded to be {{spark_shuffle}} on both the client and server side. The > server-side name is not actually used anywhere, as it is the value within the > {{yarn.nodemanager.aux-services}} which is considered by the NodeManager to > be the definitive name. However, if you change this in the configs, the > hard-coded name within the client will no longer match. So, this needs to be > configurable. > * Add a way to separately configure the two shuffle service instances. Since > the configurations such as the port number are taken from the NodeManager > config, they will both try to use the same port, which obviously won't work. 
> So, we need to provide a way to selectively configure the two shuffle service > instances. I will go into details on my proposal for how to achieve this > within the PR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34907) Add main class that runs all benchmarks
Hyukjin Kwon created SPARK-34907: Summary: Add main class that runs all benchmarks Key: SPARK-34907 URL: https://issues.apache.org/jira/browse/SPARK-34907 Project: Spark Issue Type: Improvement Components: Tests Affects Versions: 3.2.0 Reporter: Hyukjin Kwon This is related to SPARK-31471. It would be good if we had an automatic way to run all benchmarks. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31471) Add a script to run multiple benchmarks
[ https://issues.apache.org/jira/browse/SPARK-31471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-31471. -- Resolution: Duplicate > Add a script to run multiple benchmarks > --- > > Key: SPARK-31471 > URL: https://issues.apache.org/jira/browse/SPARK-31471 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: Max Gekk >Priority: Minor > > Add a python script to run multiple benchmarks. The script can be taken from > [https://github.com/apache/spark/pull/27078] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34907) Add main class that runs all benchmarks
[ https://issues.apache.org/jira/browse/SPARK-34907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311643#comment-17311643 ] Apache Spark commented on SPARK-34907: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/32005 > Add main class that runs all benchmarks > --- > > Key: SPARK-34907 > URL: https://issues.apache.org/jira/browse/SPARK-34907 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Priority: Major > > This is related to SPARK-31471. It should be good if we can have an automatic > way to do it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34907) Add main class that runs all benchmarks
[ https://issues.apache.org/jira/browse/SPARK-34907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34907: Assignee: (was: Apache Spark) > Add main class that runs all benchmarks > --- > > Key: SPARK-34907 > URL: https://issues.apache.org/jira/browse/SPARK-34907 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Priority: Major > > This is related to SPARK-31471. It should be good if we can have an automatic > way to do it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34907) Add main class that runs all benchmarks
[ https://issues.apache.org/jira/browse/SPARK-34907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34907: Assignee: Apache Spark > Add main class that runs all benchmarks > --- > > Key: SPARK-34907 > URL: https://issues.apache.org/jira/browse/SPARK-34907 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > > This is related to SPARK-31471. It should be good if we can have an automatic > way to do it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34908) Add test cases for char and varchar with functions
Kent Yao created SPARK-34908: Summary: Add test cases for char and varchar with functions Key: SPARK-34908 URL: https://issues.apache.org/jira/browse/SPARK-34908 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Kent Yao Add test cases for char and varchar with functions to show the behavior -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34908) Add test cases for char and varchar with functions
[ https://issues.apache.org/jira/browse/SPARK-34908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-34908: - Priority: Minor (was: Major) > Add test cases for char and varchar with functions > -- > > Key: SPARK-34908 > URL: https://issues.apache.org/jira/browse/SPARK-34908 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kent Yao >Priority: Minor > > Add test cases for char and varchar with functions to show the behavior -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34909) conv() does not convert negative inputs to unsigned correctly
Tim Armstrong created SPARK-34909: - Summary: conv() does not convert negative inputs to unsigned correctly Key: SPARK-34909 URL: https://issues.apache.org/jira/browse/SPARK-34909 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.0 Reporter: Tim Armstrong {noformat} scala> spark.sql("select conv('-10', 11, 7)").show(20, 150) +---+ | conv(-10, 11, 7)| +---+ |4501202152252313413456| +---+ scala> spark.sql("select hex(conv('-10', 11, 7))").show(20, 150) +--+ | hex(conv(-10, 11, 7))| +--+ |3435303132303231353232353233313334313334353600| +--+ {noformat} The correct result is 45012021522523134134555. The above output has an incorrect second-to-last digit (6 instead of 5) and the last digit is a non-printing character, the null byte. I tracked the bug down to NumberConverter.unsignedLongDiv returning incorrect results. I tried replacing it with java.lang.Long.divideUnsigned and that fixed it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
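The report's suggested fix can be checked outside Spark: '-10' in base 11 is -11 decimal, conv() reinterprets that 64-bit pattern as the unsigned value 2^64 - 11, and dividing with java.lang.Long.divideUnsigned reproduces the expected base-7 digits. The toBase7Unsigned helper below is an illustrative sketch, not Spark's NumberConverter:

```java
public class ConvSketch {
    // Convert a 64-bit value, interpreted as *unsigned*, to a base-7 string,
    // using the JDK primitives the report suggests as a replacement for
    // NumberConverter.unsignedLongDiv.
    static String toBase7Unsigned(long v) {
        if (v == 0) return "0";
        StringBuilder sb = new StringBuilder();
        while (v != 0) {
            sb.append(Long.remainderUnsigned(v, 7)); // least-significant digit first
            v = Long.divideUnsigned(v, 7);
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        // -11L has the bit pattern of the unsigned value 2^64 - 11.
        String result = toBase7Unsigned(-11L);
        if (!result.equals("45012021522523134134555")) {
            throw new AssertionError(result);
        }
        // Plain signed division gets this badly wrong: -11 / 7 truncates to -1.
        if (-11L / 7L != -1L) throw new AssertionError();
        System.out.println(result);
    }
}
```

The assertion matches the correct result stated in the report, which is why divideUnsigned/remainderUnsigned are a natural drop-in for the buggy helper.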
[jira] [Assigned] (SPARK-34909) conv() does not convert negative inputs to unsigned correctly
[ https://issues.apache.org/jira/browse/SPARK-34909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34909: Assignee: Apache Spark > conv() does not convert negative inputs to unsigned correctly > - > > Key: SPARK-34909 > URL: https://issues.apache.org/jira/browse/SPARK-34909 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Tim Armstrong >Assignee: Apache Spark >Priority: Major > > {noformat} > scala> spark.sql("select conv('-10', 11, 7)").show(20, 150) > +---+ > | conv(-10, 11, 7)| > +---+ > |4501202152252313413456| > +---+ > scala> spark.sql("select hex(conv('-10', 11, 7))").show(20, 150) > +--+ > | hex(conv(-10, 11, 7))| > +--+ > |3435303132303231353232353233313334313334353600| > +--+ > {noformat} > The correct result is 45012021522523134134555. The above output has an > incorrect second-to-last digit (6 instead of 5) and the last digit is a > non-printing character the null byte. > I tracked the bug down to NumberConverter.unsignedLongDiv returning incorrect > results. I tried replacing with java.lang.Long.divideUnsigned and that fixed > it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34909) conv() does not convert negative inputs to unsigned correctly
[ https://issues.apache.org/jira/browse/SPARK-34909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311699#comment-17311699 ] Apache Spark commented on SPARK-34909: -- User 'timarmstrong' has created a pull request for this issue: https://github.com/apache/spark/pull/32006 > conv() does not convert negative inputs to unsigned correctly > - > > Key: SPARK-34909 > URL: https://issues.apache.org/jira/browse/SPARK-34909 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Tim Armstrong >Priority: Major > > {noformat} > scala> spark.sql("select conv('-10', 11, 7)").show(20, 150) > +---+ > | conv(-10, 11, 7)| > +---+ > |4501202152252313413456| > +---+ > scala> spark.sql("select hex(conv('-10', 11, 7))").show(20, 150) > +--+ > | hex(conv(-10, 11, 7))| > +--+ > |3435303132303231353232353233313334313334353600| > +--+ > {noformat} > The correct result is 45012021522523134134555. The above output has an > incorrect second-to-last digit (6 instead of 5) and the last digit is a > non-printing character the null byte. > I tracked the bug down to NumberConverter.unsignedLongDiv returning incorrect > results. I tried replacing with java.lang.Long.divideUnsigned and that fixed > it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34909) conv() does not convert negative inputs to unsigned correctly
[ https://issues.apache.org/jira/browse/SPARK-34909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34909: Assignee: (was: Apache Spark) > conv() does not convert negative inputs to unsigned correctly > - > > Key: SPARK-34909 > URL: https://issues.apache.org/jira/browse/SPARK-34909 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Tim Armstrong >Priority: Major > > {noformat} > scala> spark.sql("select conv('-10', 11, 7)").show(20, 150) > +---+ > | conv(-10, 11, 7)| > +---+ > |4501202152252313413456| > +---+ > scala> spark.sql("select hex(conv('-10', 11, 7))").show(20, 150) > +--+ > | hex(conv(-10, 11, 7))| > +--+ > |3435303132303231353232353233313334313334353600| > +--+ > {noformat} > The correct result is 45012021522523134134555. The above output has an > incorrect second-to-last digit (6 instead of 5) and the last digit is a > non-printing character the null byte. > I tracked the bug down to NumberConverter.unsignedLongDiv returning incorrect > results. I tried replacing with java.lang.Long.divideUnsigned and that fixed > it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34909) conv() does not convert negative inputs to unsigned correctly
[ https://issues.apache.org/jira/browse/SPARK-34909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311700#comment-17311700 ] Apache Spark commented on SPARK-34909: -- User 'timarmstrong' has created a pull request for this issue: https://github.com/apache/spark/pull/32006 > conv() does not convert negative inputs to unsigned correctly > - > > Key: SPARK-34909 > URL: https://issues.apache.org/jira/browse/SPARK-34909 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Tim Armstrong >Priority: Major > > {noformat} > scala> spark.sql("select conv('-10', 11, 7)").show(20, 150) > +---+ > | conv(-10, 11, 7)| > +---+ > |4501202152252313413456| > +---+ > scala> spark.sql("select hex(conv('-10', 11, 7))").show(20, 150) > +--+ > | hex(conv(-10, 11, 7))| > +--+ > |3435303132303231353232353233313334313334353600| > +--+ > {noformat} > The correct result is 45012021522523134134555. The above output has an > incorrect second-to-last digit (6 instead of 5) and the last digit is a > non-printing character the null byte. > I tracked the bug down to NumberConverter.unsignedLongDiv returning incorrect > results. I tried replacing with java.lang.Long.divideUnsigned and that fixed > it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34906) Refactor TreeNode's children handling methods into specialized traits
[ https://issues.apache.org/jira/browse/SPARK-34906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell resolved SPARK-34906. --- Fix Version/s: 3.2.0 Assignee: Ali Afroozeh Resolution: Fixed > Refactor TreeNode's children handling methods into specialized traits > - > > Key: SPARK-34906 > URL: https://issues.apache.org/jira/browse/SPARK-34906 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.1 >Reporter: Ali Afroozeh >Assignee: Ali Afroozeh >Priority: Major > Fix For: 3.2.0 > > > Spark query plan node hierarchy has specialized traits (or abstract classes) > for handling nodes with fixed number of children, for example > UnaryExpression, UnaryNode and UnaryExec for representing an expression, a > logical plan and a physical plan with only one child, respectively. This PR > refactors the TreeNode hierarchy by extracting the children handling > functionality into the following traits. UnaryExpression` and other similar > classes now extend the corresponding new trait: > {{trait LeafLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} > {{ override final def children: Seq[T] = Nil}} > {{}}} > {{trait UnaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} > {{ def child: T}} > {{ @transient override final lazy val children: Seq[T] = child :: Nil}} > {{}}} > {{trait BinaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} > {{ def left: T}} > {{ def right: T}} > {{ @transient override final lazy val children: Seq[T] = left :: right :: > Nil}} > {{}}} > {{trait TernaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>}} > {{ def first: T}} > {{ def second: T}} > {{ def third: T}} > {{ @transient override final lazy val children: Seq[T] = first :: second :: > third :: Nil}} > {{}}} > > This refactoring, which is part of a bigger effort to make tree > transformations in Spark more efficient, has two benefits: > * It moves the children handling to a single place, instead of being spread > in specific subclasses, which 
will help the future optimizations for tree > traversals. > * It allows to mix in these traits with some concrete node types that could > not extend the previous classes. For example, expressions with one child that > extend AggregateFunction cannot extend UnaryExpression as AggregateFunction > defines the foldable method final while UnaryExpression defines it as non > final. With the new traits, we can directly extend the concrete class from > UnaryLike in these cases. Classes with more specific child handling will make > tree traversal methods faster. > In this PR we have also updated many concrete node types to extend these > traits to benefit from more specific child handling. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23977) Add commit protocol binding to Hadoop 3.1 PathOutputCommitter mechanism
[ https://issues.apache.org/jira/browse/SPARK-23977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311818#comment-17311818 ] Daniel Zhi commented on SPARK-23977: [~ste...@apache.org] Thanks for the info. Below are the related (key, value) pairs we used: # spark.hadoop.fs.s3a.committer.name --- partitioned # spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a --- org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory # spark.sql.sources.commitProtocolClass --- org.apache.spark.internal.io.cloud.PathOutputCommitProtocol # spark.sql.parquet.output.committer.class --- org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter 3 & 4 appear to be necessary to ensure the S3A committers are used by Spark for parquet outputs, except that "INSERT OVERWRITE" is blocked by the dynamicPartitionOverwrite exception. It would be helpful and appreciated if you could elaborate on the proper way to "use the partitioned committer and configure it to do the right thing ..." in Spark. > Add commit protocol binding to Hadoop 3.1 PathOutputCommitter mechanism > --- > > Key: SPARK-23977 > URL: https://issues.apache.org/jira/browse/SPARK-23977 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Fix For: 3.0.0 > > > Hadoop 3.1 adds a mechanism for job-specific and store-specific committers > (MAPREDUCE-6823, MAPREDUCE-6956), and one key implementation, S3A committers, > HADOOP-13786 > These committers deliver high-performance output of MR and spark jobs to S3, > and offer the key semantics which Spark depends on: no visible output until > job commit, a failure of a task at any stage, including partway through task > commit, can be handled by executing and committing another task attempt. 
> In contrast, the FileOutputFormat commit algorithms on S3 have issues: > * Awful performance because files are copied by rename > * FileOutputFormat v1: weak task commit failure recovery semantics as the > (v1) expectation: "directory renames are atomic" doesn't hold. > * S3 metadata eventual consistency can cause rename to miss files or fail > entirely (SPARK-15849) > Note also that FileOutputFormat "v2" commit algorithm doesn't offer any of > the commit semantics w.r.t observability of or recovery from task commit > failure, on any filesystem. > The S3A committers address these by way of uploading all data to the > destination through multipart uploads, uploads which are only completed in > job commit. > The new {{PathOutputCommitter}} factory mechanism allows applications to work > with the S3A committers and any other, by adding a plugin mechanism into the > MRv2 FileOutputFormat class, where the job config and filesystem configuration > options can dynamically choose the output committer. > Spark can use these with some binding classes to > # Add a subclass of {{HadoopMapReduceCommitProtocol}} which uses the MRv2 > classes and {{PathOutputCommitterFactory}} to create the committers. > # Add a {{BindingParquetOutputCommitter extends ParquetOutputCommitter}} > to wire up Parquet output even when code requires the committer to be a > subclass of {{ParquetOutputCommitter}} > This patch builds on SPARK-23807 for setting up the dependencies. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34910) Add an option for different stride orders
Jason Yarbrough created SPARK-34910: --- Summary: Add an option for different stride orders Key: SPARK-34910 URL: https://issues.apache.org/jira/browse/SPARK-34910 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Jason Yarbrough Currently, the JDBCRelation columnPartition function orders the strides in ascending order, starting from the lower bound and working its way towards the upper bound. I'm proposing leaving that as the default, but adding an option (such as strideOrder) in JDBCOptions. Since it will default to the current behavior, this will keep people's current code working as expected. However, people who may have data skew closer to the upper bound might appreciate being able to have the strides in descending order, thus filling up the first partition with the last stride and so forth. Also, people with nondeterministic data skew or sporadic data density might be able to benefit from a random ordering of the strides. I have the code created to implement this, and it creates a pattern that can be used to add other algorithms that people may want to add (such as counting the rows and ranking each stride, and then ordering from most dense to least). The two options I have coded so far are 'descending' and 'random.' The original idea was to create something closer to Spark's hash partitioner, but for JDBC and pushed down to the database engine for efficiency. However, that would require adding hashing algorithms for each dialect, and the performance cost of those algorithms may outweigh the benefit. The method I'm proposing in this ticket avoids those complexities while still giving some of the benefit (in the case of random ordering). I'll put a PR in if others feel this is a good idea. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
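The proposal can be sketched independently of Spark. The stride computation below only loosely mirrors the shape of JDBCRelation.columnPartition (it is a simplification, not the real implementation), and the 'strideOrder' values 'descending' and 'random' are the ticket's proposed names, not an existing option:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class StrideOrderSketch {
    // Build WHERE-clause predicates for numPartitions strides over [lower, upper),
    // then reorder them according to the requested stride order.
    static List<String> strides(String col, long lower, long upper,
                                int numPartitions, String order) {
        long stride = (upper - lower) / numPartitions;
        List<String> preds = new ArrayList<>();
        long start = lower;
        for (int i = 0; i < numPartitions; i++) {
            long end = (i == numPartitions - 1) ? upper : start + stride;
            if (i == 0) {
                preds.add(col + " < " + end + " OR " + col + " IS NULL");
            } else if (i == numPartitions - 1) {
                preds.add(col + " >= " + start);
            } else {
                preds.add(col + " >= " + start + " AND " + col + " < " + end);
            }
            start = end;
        }
        switch (order) {
            case "descending": Collections.reverse(preds); break;
            case "random": Collections.shuffle(preds, new Random(42)); break; // seeded for reproducibility
            default: break; // current (ascending) behavior
        }
        return preds;
    }

    public static void main(String[] args) {
        List<String> desc = strides("id", 0, 100, 4, "descending");
        // With descending order, the first partition gets the last stride.
        if (!desc.get(0).equals("id >= 75")) throw new AssertionError(desc.get(0));
        if (!desc.get(3).equals("id < 25 OR id IS NULL")) throw new AssertionError(desc.get(3));
        System.out.println(desc);
    }
}
```

Because only the ordering of the predicate list changes, the set of rows each query covers is untouched; the option merely controls which stride lands in which partition index.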
[jira] [Updated] (SPARK-34910) JDBC - Add an option for different stride orders
[ https://issues.apache.org/jira/browse/SPARK-34910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Yarbrough updated SPARK-34910: Summary: JDBC - Add an option for different stride orders (was: Add an option for different stride orders) > JDBC - Add an option for different stride orders > > > Key: SPARK-34910 > URL: https://issues.apache.org/jira/browse/SPARK-34910 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Jason Yarbrough >Priority: Trivial > > Currently, the JDBCRelation columnPartition function orders the strides in > ascending order, starting from the lower bound and working its way towards > the upper bound. > I'm proposing leaving that as the default, but adding an option (such as > strideOrder) in JDBCOptions. Since it will default to the current behavior, > this will keep people's current code working as expected. However, people who > may have data skew closer to the upper bound might appreciate being able to > have the strides in descending order, thus filling up the first partition > with the last stride and so forth. Also, people with nondeterministic data > skew or sporadic data density might be able to benefit from a random ordering > of the strides. > I have the code created to implement this, and it creates a pattern that can > be used to add other algorithms that people may want to add (such as counting > the rows and ranking each stride, and then ordering from most dense to > least). The current two options I have coded is 'descending' and 'random.' > The original idea was to create something closer to Spark's hash partitioner, > but for JDBC and pushed down to the database engine for efficiency. However, > that would require adding hashing algorithms for each dialect, and the > performance from those algorithms may outweigh the benefit. 
The method I'm > proposing in this ticket avoids those complexities while still giving some of > the benefit (in the case of random ordering). > I'll put a PR in if others feel this is a good idea. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34464) `first` function is sorting the dataset while sometimes it is used to get "any value"
[ https://issues.apache.org/jira/browse/SPARK-34464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311845#comment-17311845 ] Pablo Langa Blanco commented on SPARK-34464: Hi [~lfruleux], Here is a link that explains well when the different types of aggregation are applied: [https://www.waitingforcode.com/apache-spark-sql/aggregations-execution-apache-spark-sql/read] In the case you describe, two things make the aggregation fall back to SortAggregate. The first is that the aggregation buffer types are not mutable primitive types (which HashAggregate requires). The next fallback is ObjectHashAggregate, but the first function is not supported there because it is not a TypedImperativeAggregate, so it falls back to SortAggregate. I don't know if there is a reason for this; I'm going to look into whether first can be made a TypedImperativeAggregate so it can use ObjectHashAggregate. Thanks! > `first` function is sorting the dataset while sometimes it is used to get > "any value" > - > > Key: SPARK-34464 > URL: https://issues.apache.org/jira/browse/SPARK-34464 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Louis Fruleux >Priority: Minor > > When one wants to groupBy and take any value (not necessarily the first), one > usually uses > [first|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L485] > aggregation function. > Unfortunately, this method uses a `SortAggregate` for some data types, which > is not always necessary and might impact performance. Is this the desired > behavior? 
> > > {code:java} > Current behavior: > val df = Seq((0, "value")).toDF("key", "value") > df.groupBy("key").agg(first("value")).explain() > /* > == Physical Plan == > SortAggregate(key=key#342, functions=first(value#343, false)) > +- *(2) Sort key#342 ASC NULLS FIRST, false, 0 > +- Exchange hashpartitioning(key#342, 200) > +- SortAggregate(key=key#342, functions=partial_first(value#343, > false)) > +- *(1) Sort key#342 ASC NULLS FIRST, false, 0 > +- LocalTableScan key#342, value#343 > */ > {code} > > My understanding of the source code does not allow me to fully understand why > this is the current behavior. > The solution might be to implement a new aggregate function. But the code > would be highly similar to the first one. And if I don't fully understand why > this > [createAggregate|https://github.com/apache/spark/blob/3a299aa6480ac22501512cd0310d31a441d7dfdc/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala#L45] > method falls back to SortAggregate. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34860) Multinomial Logistic Regression with intercept support centering
[ https://issues.apache.org/jira/browse/SPARK-34860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-34860. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31985 [https://github.com/apache/spark/pull/31985] > Multinomial Logistic Regression with intercept support centering > > > Key: SPARK-34860 > URL: https://issues.apache.org/jira/browse/SPARK-34860 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 3.2.0 >Reporter: zhengruifeng >Assignee: zhengruifeng >Priority: Major > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34860) Multinomial Logistic Regression with intercept support centering
[ https://issues.apache.org/jira/browse/SPARK-34860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-34860: Assignee: zhengruifeng > Multinomial Logistic Regression with intercept support centering > > > Key: SPARK-34860 > URL: https://issues.apache.org/jira/browse/SPARK-34860 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 3.2.0 >Reporter: zhengruifeng >Assignee: zhengruifeng >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34795) Adds a new job in GitHub Actions to check the output of TPC-DS queries
[ https://issues.apache.org/jira/browse/SPARK-34795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-34795. -- Fix Version/s: 3.2.0 Assignee: Takeshi Yamamuro Resolution: Fixed Resolved by https://github.com/apache/spark/pull/31886 > Adds a new job in GitHub Actions to check the output of TPC-DS queries > -- > > Key: SPARK-34795 > URL: https://issues.apache.org/jira/browse/SPARK-34795 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.2.0 >Reporter: Takeshi Yamamuro >Assignee: Takeshi Yamamuro >Priority: Major > Fix For: 3.2.0 > > > This ticket aims at adding a new job in GitHub Actions to check the output of > TPC-DS queries. There are some cases where we noticed runtime-related bugs > after merging commits (e.g. SPARK-33822). Therefore, I think it is worth > adding a new job in GitHub Actions to check the query output of TPC-DS (sf=1). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33350) Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data
[ https://issues.apache.org/jira/browse/SPARK-33350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33350: Assignee: Apache Spark > Add support to DiskBlockManager to create merge directory and to get the > local shuffle merged data > -- > > Key: SPARK-33350 > URL: https://issues.apache.org/jira/browse/SPARK-33350 > Project: Spark > Issue Type: Sub-task > Components: Shuffle >Affects Versions: 3.1.0 >Reporter: Chandni Singh >Assignee: Apache Spark >Priority: Major > > DiskBlockManager should be able to create the {{merge_manager}} directory, > where the push-based merged shuffle files are written and also create > sub-dirs under it. > It should also be able to serve the local merged shuffle data/index/meta > files. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33350) Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data
[ https://issues.apache.org/jira/browse/SPARK-33350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311942#comment-17311942 ] Apache Spark commented on SPARK-33350: -- User 'zhouyejoe' has created a pull request for this issue: https://github.com/apache/spark/pull/32007 > Add support to DiskBlockManager to create merge directory and to get the > local shuffle merged data > -- > > Key: SPARK-33350 > URL: https://issues.apache.org/jira/browse/SPARK-33350 > Project: Spark > Issue Type: Sub-task > Components: Shuffle >Affects Versions: 3.1.0 >Reporter: Chandni Singh >Priority: Major > > DiskBlockManager should be able to create the {{merge_manager}} directory, > where the push-based merged shuffle files are written and also create > sub-dirs under it. > It should also be able to serve the local merged shuffle data/index/meta > files. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33350) Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data
[ https://issues.apache.org/jira/browse/SPARK-33350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33350: Assignee: (was: Apache Spark) > Add support to DiskBlockManager to create merge directory and to get the > local shuffle merged data > -- > > Key: SPARK-33350 > URL: https://issues.apache.org/jira/browse/SPARK-33350 > Project: Spark > Issue Type: Sub-task > Components: Shuffle >Affects Versions: 3.1.0 >Reporter: Chandni Singh >Priority: Major > > DiskBlockManager should be able to create the {{merge_manager}} directory, > where the push-based merged shuffle files are written and also create > sub-dirs under it. > It should also be able to serve the local merged shuffle data/index/meta > files. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34911) Fix code close issue in monitoring.md
angerszhu created SPARK-34911: - Summary: Fix code close issue in monitoring.md Key: SPARK-34911 URL: https://issues.apache.org/jira/browse/SPARK-34911 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: angerszhu Fix code close issue in monitoring.md -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34911) Fix code close issue in monitoring.md
[ https://issues.apache.org/jira/browse/SPARK-34911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34911: Assignee: (was: Apache Spark) > Fix code close issue in monitoring.md > - > > Key: SPARK-34911 > URL: https://issues.apache.org/jira/browse/SPARK-34911 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > > Fix code close issue in monitoring.md -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34911) Fix code close issue in monitoring.md
[ https://issues.apache.org/jira/browse/SPARK-34911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34911: Assignee: Apache Spark > Fix code close issue in monitoring.md > - > > Key: SPARK-34911 > URL: https://issues.apache.org/jira/browse/SPARK-34911 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: Apache Spark >Priority: Major > > Fix code close issue in monitoring.md -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34911) Fix code close issue in monitoring.md
[ https://issues.apache.org/jira/browse/SPARK-34911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311964#comment-17311964 ] Apache Spark commented on SPARK-34911: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/32008 > Fix code close issue in monitoring.md > - > > Key: SPARK-34911 > URL: https://issues.apache.org/jira/browse/SPARK-34911 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > > Fix code close issue in monitoring.md -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34911) Fix code close issue in monitoring.md
[ https://issues.apache.org/jira/browse/SPARK-34911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311965#comment-17311965 ] Apache Spark commented on SPARK-34911: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/32008 > Fix code close issue in monitoring.md > - > > Key: SPARK-34911 > URL: https://issues.apache.org/jira/browse/SPARK-34911 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > > Fix code close issue in monitoring.md -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34911) Fix code close issue in monitoring.md
[ https://issues.apache.org/jira/browse/SPARK-34911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-34911: -- Component/s: (was: SQL) Spark Core > Fix code close issue in monitoring.md > - > > Key: SPARK-34911 > URL: https://issues.apache.org/jira/browse/SPARK-34911 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > > Fix code close issue in monitoring.md -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34912) Start spark shell to read file and report an error
czh created SPARK-34912: --- Summary: Start spark shell to read file and report an error Key: SPARK-34912 URL: https://issues.apache.org/jira/browse/SPARK-34912 Project: Spark Issue Type: Bug Components: Spark Shell Affects Versions: 1.6.0 Reporter: czh Starting spark-shell to read an external file fails with: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found. Spark version is 1.6, Hadoop version is 2.6.0; aws-java-sdk-1.7.4.jar and hadoop-aws-2.6.0.jar have already been copied into Spark's lib directory. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34913) Start spark shell to read file and report an error
czh created SPARK-34913: --- Summary: Start spark shell to read file and report an error Key: SPARK-34913 URL: https://issues.apache.org/jira/browse/SPARK-34913 Project: Spark Issue Type: Bug Components: Spark Shell Affects Versions: 1.6.0 Reporter: czh Starting spark-shell to read an external file fails with: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found. Spark version is 1.6, Hadoop version is 2.6.0; aws-java-sdk-1.7.4.jar and hadoop-aws-2.6.0.jar have already been copied into Spark's lib directory. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
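Neither report includes a resolution. As a hedged sketch (the jar paths are illustrative, and the hadoop-aws/aws-java-sdk versions must match the cluster's Hadoop build), the usual way to surface the S3A filesystem class is to put the connector jars on the driver and executor classpath explicitly rather than relying on the lib directory:

```shell
# Illustrative paths; jar versions must pair with the Hadoop build (2.6.0 here).
./bin/spark-shell \
  --jars /path/to/hadoop-aws-2.6.0.jar,/path/to/aws-java-sdk-1.7.4.jar \
  --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
```

Setting `fs.s3a.impl` explicitly is an assumption about the reporter's setup (older Hadoop releases may not map the `s3a://` scheme by default), not a confirmed fix.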
[jira] [Commented] (SPARK-34510) .foreachPartition command hangs when run inside Python package but works when run from Python file outside the package on EMR
[ https://issues.apache.org/jira/browse/SPARK-34510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311978#comment-17311978 ] Yuriy commented on SPARK-34510: --- Sean, the code is not actually performing any S3 operations; if you take a closer look at s3_repo.py, all it is doing is running .foreachPartition on a data frame that contains 3 records and then printing out the results. It works locally for me, just not when it's deployed to EMR. > .foreachPartition command hangs when run inside Python package but works when > run from Python file outside the package on EMR > - > > Key: SPARK-34510 > URL: https://issues.apache.org/jira/browse/SPARK-34510 > Project: Spark > Issue Type: Bug > Components: EC2, PySpark >Affects Versions: 3.0.0 >Reporter: Yuriy >Priority: Minor > Attachments: Code.zip > > > I'm running PySpark 3.0.0 on EMR, with the project structure below; process.py is > what controls the flow of the application and calls code inside the > _file_processor_ package. The command hangs when the .foreachPartition code > that is located inside _s3_repo.py_ is called by _process.py_. When the same > .foreachPartition code is moved from _s3_repo.py_ and placed inside > _process.py_, it runs just fine.
> {code:java} > process.py > file_processor > config > spark.py > repository > s3_repo.py > structure > table_creator.py > {code} > *process.py* > {code:java} > from file_processor.structure import table_creator > from file_processor.repository import s3_repo > def process(): > table_creator.create_table() > s3_repo.save_to_s3() > if __name__ == '__main__': > process() > {code} > *spark.py* > {code:java} > from pyspark.sql import SparkSession > spark_session = SparkSession.builder.appName("Test").getOrCreate() > {code} > *s3_repo.py* > {code:java} > from file_processor.config.spark import spark_session > def save_to_s3(): > spark_session.sql('SELECT * FROM > rawFileData').toJSON().foreachPartition(_save_to_s3) > def _save_to_s3(iterator): > for record in iterator: > print(record) > {code} > *table_creator.py* > {code:java} > from file_processor.config.spark import spark_session > from pyspark.sql import Row > def create_table(): > file_contents = [ > {'line_num': 1, 'contents': 'line 1'}, > {'line_num': 2, 'contents': 'line 2'}, > {'line_num': 3, 'contents': 'line 3'} > ] > spark_session.createDataFrame(Row(**row) for row in > file_contents).cache().createOrReplaceTempView("rawFileData") > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
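One hedged way to reason about the hang (an assumption, not a confirmed diagnosis): `_save_to_s3` lives in a module whose import chain creates a `SparkSession` at import time, so when executors deserialize the function and re-import `file_processor.config.spark`, they may try to bootstrap a session themselves. Keeping the partition function free of such module-level state sidesteps that. A minimal, Spark-free sketch of the serialization round-trip the driver performs:

```python
import pickle

def save_partition(iterator):
    # A top-level function with no module-level SparkSession import:
    # it pickles as a simple module/name reference and can be rebuilt
    # on a worker without import-time side effects.
    return [record.upper() for record in iterator]

payload = pickle.dumps(save_partition)   # roughly what the driver ships
restored = pickle.loads(payload)         # roughly what the worker rebuilds
result = restored(iter(["line 1", "line 2"]))
```

This only models the pickling step, not Spark's full cloudpickle machinery; the point is that unpickling re-imports the defining module, so any session-creating import in that module runs on the worker too.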
[jira] [Created] (SPARK-34914) Local scheduler backend support update token
ulysses you created SPARK-34914: --- Summary: Local scheduler backend support update token Key: SPARK-34914 URL: https://issues.apache.org/jira/browse/SPARK-34914 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.2.0 Reporter: ulysses you `LocalSchedulerBackend` doesn't extend `CoarseGrainedSchedulerBackend`, so in local mode we don't support token updates. In the proxy-user case, with the following command we get an exception: {code:java} ./bin/spark-shell --master local --proxy-user user_name > spark.sql("show tables") {code} {code:java} javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:477) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:285) at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:70) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34914) Local scheduler backend support update token
[ https://issues.apache.org/jira/browse/SPARK-34914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34914: Assignee: (was: Apache Spark) > Local scheduler backend support update token > > > Key: SPARK-34914 > URL: https://issues.apache.org/jira/browse/SPARK-34914 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: ulysses you >Priority: Minor > > `LocalSchedulerBackend` doesn't extend `CoarseGrainedSchedulerBackend` so in > local mode, we don't support update token. > In proxy use case with follow cmd, we will get exception > {code:java} > ./bin/spark-shell --master local --proxy-user user_name > > spark.sql("show tables") > {code} > {code:java} > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)] > at > com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) > at > org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) > at > org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) > at > org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) > at > org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) > at > org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746) > at > org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:477) > at > 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:285) > at > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:70) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34914) Local scheduler backend support update token
[ https://issues.apache.org/jira/browse/SPARK-34914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34914: Assignee: Apache Spark > Local scheduler backend support update token > > > Key: SPARK-34914 > URL: https://issues.apache.org/jira/browse/SPARK-34914 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: ulysses you >Assignee: Apache Spark >Priority: Minor > > `LocalSchedulerBackend` doesn't extend `CoarseGrainedSchedulerBackend` so in > local mode, we don't support update token. > In proxy use case with follow cmd, we will get exception > {code:java} > ./bin/spark-shell --master local --proxy-user user_name > > spark.sql("show tables") > {code} > {code:java} > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)] > at > com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) > at > org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) > at > org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) > at > org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) > at > org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) > at > org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746) > at > org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:477) > at > 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:285) > at > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:70) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34914) Local scheduler backend support update token
[ https://issues.apache.org/jira/browse/SPARK-34914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311989#comment-17311989 ] Apache Spark commented on SPARK-34914: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/32009 > Local scheduler backend support update token > > > Key: SPARK-34914 > URL: https://issues.apache.org/jira/browse/SPARK-34914 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: ulysses you >Priority: Minor > > `LocalSchedulerBackend` doesn't extend `CoarseGrainedSchedulerBackend` so in > local mode, we don't support update token. > In proxy use case with follow cmd, we will get exception > {code:java} > ./bin/spark-shell --master local --proxy-user user_name > > spark.sql("show tables") > {code} > {code:java} > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)] > at > com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) > at > org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) > at > org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) > at > org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) > at > org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) > at > org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746) > at > org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) > at > 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:477) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:285) > at > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:70) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34913) Start spark shell to read file and report an error
[ https://issues.apache.org/jira/browse/SPARK-34913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] czh updated SPARK-34913: Attachment: 微信截图_20210331110125.png > Start spark shell to read file and report an error > --- > > Key: SPARK-34913 > URL: https://issues.apache.org/jira/browse/SPARK-34913 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.6.0 >Reporter: czh >Priority: Major > Attachments: 微信截图_20210331110125.png > > > Starting spark-shell to read an external file fails with: > Class org.apache.hadoop.fs.s3a.S3AFileSystem not found > > Spark version is 1.6, Hadoop version is 2.6.0; aws-java-sdk-1.7.4.jar > and hadoop-aws-2.6.0.jar have already been copied into Spark's lib directory -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34911) Fix code close issue in monitoring.md
[ https://issues.apache.org/jira/browse/SPARK-34911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-34911: - Priority: Trivial (was: Major) > Fix code close issue in monitoring.md > - > > Key: SPARK-34911 > URL: https://issues.apache.org/jira/browse/SPARK-34911 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Trivial > > Fix code close issue in monitoring.md -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34911) Fix code close issue in monitoring.md
[ https://issues.apache.org/jira/browse/SPARK-34911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17312000#comment-17312000 ] Sean R. Owen commented on SPARK-34911: -- This could have just been a follow up to the other JIRA(s) too > Fix code close issue in monitoring.md > - > > Key: SPARK-34911 > URL: https://issues.apache.org/jira/browse/SPARK-34911 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Trivial > > Fix code close issue in monitoring.md -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34912) Start spark shell to read file and report an error
[ https://issues.apache.org/jira/browse/SPARK-34912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] czh updated SPARK-34912: Description: Starting spark-shell to read an external file fails with: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found. Spark version is 1.6, Hadoop version is 2.6.0; aws-java-sdk-1.7.4.jar and hadoop-aws-2.6.0.jar have already been copied into Spark's lib directory. was: 启动spark-shell读取外部文件报错 Class org.apache.hadoop.fs.s3a.S3AFileSystem not found spark版本1.6，hadoop版本2.6.0，已经把aws-java-sdk-1.7.4.jar和hadoop-aws-2.6.0.jar复制到spark下lib包中 > Start spark shell to read file and report an error > --- > > Key: SPARK-34912 > URL: https://issues.apache.org/jira/browse/SPARK-34912 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.6.0 >Reporter: czh >Priority: Major > > Starting spark-shell to read an external file fails with: > Class org.apache.hadoop.fs.s3a.S3AFileSystem not found > > Spark version is 1.6, Hadoop version is 2.6.0; aws-java-sdk-1.7.4.jar > and hadoop-aws-2.6.0.jar have already been copied into Spark's lib directory -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34913) Start spark shell to read file and report an error
[ https://issues.apache.org/jira/browse/SPARK-34913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] czh updated SPARK-34913: Description: Starting spark-shell to read an external file fails with: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found. Spark version is 1.6, Hadoop version is 2.6.0; aws-java-sdk-1.7.4.jar and hadoop-aws-2.6.0.jar have already been copied into Spark's lib directory. was: 启动spark-shell读取外部文件报错 Class org.apache.hadoop.fs.s3a.S3AFileSystem not found spark版本1.6，hadoop版本2.6.0，已经把aws-java-sdk-1.7.4.jar和hadoop-aws-2.6.0.jar复制到spark下lib包中 > Start spark shell to read file and report an error > --- > > Key: SPARK-34913 > URL: https://issues.apache.org/jira/browse/SPARK-34913 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.6.0 >Reporter: czh >Priority: Major > Attachments: 微信截图_20210331110125.png > > > Starting spark-shell to read an external file fails with: > Class org.apache.hadoop.fs.s3a.S3AFileSystem not found > > Spark version is 1.6, Hadoop version is 2.6.0; aws-java-sdk-1.7.4.jar > and hadoop-aws-2.6.0.jar have already been copied into Spark's lib directory -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34913) Start spark shell to read file and report an error
[ https://issues.apache.org/jira/browse/SPARK-34913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] czh updated SPARK-34913: Summary: Start spark shell to read file and report an error (was: 启动spark-shell读取文件报错) > Start spark shell to read file and report an error > -- > > Key: SPARK-34913 > URL: https://issues.apache.org/jira/browse/SPARK-34913 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.6.0 >Reporter: czh >Priority: Major > Attachments: 微信截图_20210331110125.png > > > Starting spark-shell to read an external file fails with: > Class org.apache.hadoop.fs.s3a.S3AFileSystem not found > > Spark version is 1.6, Hadoop version is 2.6.0; aws-java-sdk-1.7.4.jar > and hadoop-aws-2.6.0.jar have already been copied into Spark's lib directory -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34912) Start spark shell to read file and report an error
[ https://issues.apache.org/jira/browse/SPARK-34912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] czh updated SPARK-34912: Summary: Start spark shell to read file and report an error (was: 启动spark-shell读取文件报错) > Start spark shell to read file and report an error > -- > > Key: SPARK-34912 > URL: https://issues.apache.org/jira/browse/SPARK-34912 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 1.6.0 >Reporter: czh >Priority: Major > > Starting spark-shell to read an external file fails with: > Class org.apache.hadoop.fs.s3a.S3AFileSystem not found > > Spark version is 1.6, Hadoop version is 2.6.0; aws-java-sdk-1.7.4.jar > and hadoop-aws-2.6.0.jar have already been copied into Spark's lib directory -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34908) Add test cases for char and varchar with functions
[ https://issues.apache.org/jira/browse/SPARK-34908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34908: Assignee: (was: Apache Spark) > Add test cases for char and varchar with functions > -- > > Key: SPARK-34908 > URL: https://issues.apache.org/jira/browse/SPARK-34908 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kent Yao >Priority: Minor > > Add test cases for char and varchar with functions to show the behavior -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34908) Add test cases for char and varchar with functions
[ https://issues.apache.org/jira/browse/SPARK-34908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17312035#comment-17312035 ] Apache Spark commented on SPARK-34908: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/32010 > Add test cases for char and varchar with functions > -- > > Key: SPARK-34908 > URL: https://issues.apache.org/jira/browse/SPARK-34908 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kent Yao >Priority: Minor > > Add test cases for char and varchar with functions to show the behavior -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34908) Add test cases for char and varchar with functions
[ https://issues.apache.org/jira/browse/SPARK-34908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34908: Assignee: Apache Spark > Add test cases for char and varchar with functions > -- > > Key: SPARK-34908 > URL: https://issues.apache.org/jira/browse/SPARK-34908 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Minor > > Add test cases for char and varchar with functions to show the behavior -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34907) Add main class that runs all benchmarks
[ https://issues.apache.org/jira/browse/SPARK-34907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-34907. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32005 [https://github.com/apache/spark/pull/32005] > Add main class that runs all benchmarks > --- > > Key: SPARK-34907 > URL: https://issues.apache.org/jira/browse/SPARK-34907 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.2.0 > > > This is related to SPARK-31471. It would be good to have an automatic > way to do it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34907) Add main class that runs all benchmarks
[ https://issues.apache.org/jira/browse/SPARK-34907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-34907: Assignee: Hyukjin Kwon > Add main class that runs all benchmarks > --- > > Key: SPARK-34907 > URL: https://issues.apache.org/jira/browse/SPARK-34907 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > This is related to SPARK-31471. It would be good to have an automatic > way to do it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34915) Cache Maven, SBT and Scala in all jobs that use them
Hyukjin Kwon created SPARK-34915: Summary: Cache Maven, SBT and Scala in all jobs that use them Key: SPARK-34915 URL: https://issues.apache.org/jira/browse/SPARK-34915 Project: Spark Issue Type: Improvement Components: Project Infra Affects Versions: 3.1.1, 3.0.2, 3.2.0 Reporter: Hyukjin Kwon We should cache SBT, Maven and Scala for all jobs that use them. This is currently missing in some jobs such as https://github.com/apache/spark/blob/master/.github/workflows/build_and_test.yml#L411-L430 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34916) Reduce tree traversals in transform/resolve function families
Yingyi Bu created SPARK-34916: - Summary: Reduce tree traversals in transform/resolve function families Key: SPARK-34916 URL: https://issues.apache.org/jira/browse/SPARK-34916 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.1 Reporter: Yingyi Bu Fix For: 3.2.0 Transform/resolve functions are called ~280k times per query on average for TPC-DS queries, which is far more than necessary. We can reduce those calls with early-exit information and conditions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34915) Cache Maven, SBT and Scala in all jobs that use them
[ https://issues.apache.org/jira/browse/SPARK-34915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34915: Assignee: (was: Apache Spark) > Cache Maven, SBT and Scala in all jobs that use them > > > Key: SPARK-34915 > URL: https://issues.apache.org/jira/browse/SPARK-34915 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.0.2, 3.2.0, 3.1.1 >Reporter: Hyukjin Kwon >Priority: Minor > > We should cache SBT, Maven and Scala for all jobs that use them. This is > currently missing in some jobs such as > https://github.com/apache/spark/blob/master/.github/workflows/build_and_test.yml#L411-L430 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34915) Cache Maven, SBT and Scala in all jobs that use them
[ https://issues.apache.org/jira/browse/SPARK-34915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34915: Assignee: Apache Spark > Cache Maven, SBT and Scala in all jobs that use them > > > Key: SPARK-34915 > URL: https://issues.apache.org/jira/browse/SPARK-34915 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.0.2, 3.2.0, 3.1.1 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Minor > > We should cache SBT, Maven and Scala for all jobs that use them. This is > currently missing in some jobs such as > https://github.com/apache/spark/blob/master/.github/workflows/build_and_test.yml#L411-L430 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34915) Cache Maven, SBT and Scala in all jobs that use them
[ https://issues.apache.org/jira/browse/SPARK-34915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17312055#comment-17312055 ] Apache Spark commented on SPARK-34915: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/32011 > Cache Maven, SBT and Scala in all jobs that use them > > > Key: SPARK-34915 > URL: https://issues.apache.org/jira/browse/SPARK-34915 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.0.2, 3.2.0, 3.1.1 >Reporter: Hyukjin Kwon >Priority: Minor > > We should cache SBT, Maven and Scala for all jobs that use them. This is > currently missing in some jobs such as > https://github.com/apache/spark/blob/master/.github/workflows/build_and_test.yml#L411-L430 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34916) Reduce tree traversals in transform/resolve function families
[ https://issues.apache.org/jira/browse/SPARK-34916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingyi Bu updated SPARK-34916: -- Fix Version/s: (was: 3.2.0) Shepherd: Herman van Hövell Target Version/s: 3.2.0 > Reduce tree traversals in transform/resolve function families > - > > Key: SPARK-34916 > URL: https://issues.apache.org/jira/browse/SPARK-34916 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.1 >Reporter: Yingyi Bu >Priority: Major > > Transform/resolve functions are called ~280k times per query on average for > TPC-DS queries, which is far more than necessary. We can reduce those calls > with early-exit information and conditions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34915) Cache Maven, SBT and Scala in all jobs that use them
[ https://issues.apache.org/jira/browse/SPARK-34915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17312056#comment-17312056 ] Apache Spark commented on SPARK-34915: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/32011 > Cache Maven, SBT and Scala in all jobs that use them > > > Key: SPARK-34915 > URL: https://issues.apache.org/jira/browse/SPARK-34915 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.0.2, 3.2.0, 3.1.1 >Reporter: Hyukjin Kwon >Priority: Minor > > We should cache SBT, Maven and Scala for all jobs that use them. This is > currently missing in some jobs such as > https://github.com/apache/spark/blob/master/.github/workflows/build_and_test.yml#L411-L430 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34909) conv() does not convert negative inputs to unsigned correctly
[ https://issues.apache.org/jira/browse/SPARK-34909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-34909: --- Assignee: Tim Armstrong > conv() does not convert negative inputs to unsigned correctly > - > > Key: SPARK-34909 > URL: https://issues.apache.org/jira/browse/SPARK-34909 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > > {noformat} > scala> spark.sql("select conv('-10', 11, 7)").show(20, 150) > +---+ > | conv(-10, 11, 7)| > +---+ > |4501202152252313413456| > +---+ > scala> spark.sql("select hex(conv('-10', 11, 7))").show(20, 150) > +--+ > | hex(conv(-10, 11, 7))| > +--+ > |3435303132303231353232353233313334313334353600| > +--+ > {noformat} > The correct result is 45012021522523134134555. The above output has an > incorrect second-to-last digit (6 instead of 5) and the last digit is a > non-printing character, the null byte. > I tracked the bug down to NumberConverter.unsignedLongDiv returning incorrect > results. I tried replacing it with java.lang.Long.divideUnsigned and that fixed > it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34909) conv() does not convert negative inputs to unsigned correctly
[ https://issues.apache.org/jira/browse/SPARK-34909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-34909. - Fix Version/s: 3.0.3 3.1.2 2.4.8 3.2.0 Resolution: Fixed Issue resolved by pull request 32006 [https://github.com/apache/spark/pull/32006] > conv() does not convert negative inputs to unsigned correctly > - > > Key: SPARK-34909 > URL: https://issues.apache.org/jira/browse/SPARK-34909 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Fix For: 3.2.0, 2.4.8, 3.1.2, 3.0.3 > > > {noformat} > scala> spark.sql("select conv('-10', 11, 7)").show(20, 150) > +---+ > | conv(-10, 11, 7)| > +---+ > |4501202152252313413456| > +---+ > scala> spark.sql("select hex(conv('-10', 11, 7))").show(20, 150) > +--+ > | hex(conv(-10, 11, 7))| > +--+ > |3435303132303231353232353233313334313334353600| > +--+ > {noformat} > The correct result is 45012021522523134134555. The above output has an > incorrect second-to-last digit (6 instead of 5) and the last digit is a > non-printing character, the null byte. > I tracked the bug down to NumberConverter.unsignedLongDiv returning incorrect > results. I tried replacing it with java.lang.Long.divideUnsigned and that fixed > it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
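The intended unsigned semantics of conv() can be sketched outside Spark. The following is a hypothetical Python model (not Spark's NumberConverter implementation): a negative input is reinterpreted as its unsigned 64-bit bit pattern, as arithmetic built on java.lang.Long.divideUnsigned would treat it, and then re-encoded in the target base. It reproduces the value the reporter identifies as correct.

```python
# Illustrative sketch of conv()'s unsigned semantics; NOT Spark source.
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
MASK64 = (1 << 64) - 1  # conv() operates on unsigned 64-bit values

def conv_sketch(num: str, from_base: int, to_base: int) -> str:
    # Parse in the source base, then mask a negative value into its
    # unsigned 64-bit bit pattern (e.g. -11 becomes 2**64 - 11).
    value = int(num, from_base) & MASK64
    if value == 0:
        return "0"
    out = []
    while value:
        value, digit = divmod(value, to_base)
        out.append(DIGITS[digit])
    return "".join(reversed(out))

print(conv_sketch("-10", 11, 7))  # 45012021522523134134555
```

Here "-10" in base 11 is -11, whose unsigned 64-bit pattern is 2^64 - 11; writing that value in base 7 yields the 23-digit result from the issue, with no stray null byte.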
[jira] [Updated] (SPARK-34916) Reduce tree traversals in transform/resolve function families
[ https://issues.apache.org/jira/browse/SPARK-34916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingyi Bu updated SPARK-34916: -- Description: Transform/resolve functions are called ~280k times on average per TPC-DS query, which is far more than necessary. We can reduce those calls with early-exit information and conditions. (was: Transform/resolve functions are called ~280k times per query on average for TPC-DS queries, which is far more than necessary. We can reduce those calls with early-exit information and conditions.) > Reduce tree traversals in transform/resolve function families > - > > Key: SPARK-34916 > URL: https://issues.apache.org/jira/browse/SPARK-34916 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.1 >Reporter: Yingyi Bu >Priority: Major > > Transform/resolve functions are called ~280k times on average per > TPC-DS query, which is far more than necessary. We can reduce those calls > with early-exit information and conditions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34896) Return day-time interval from dates subtraction
[ https://issues.apache.org/jira/browse/SPARK-34896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-34896. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31996 [https://github.com/apache/spark/pull/31996] > Return day-time interval from dates subtraction > --- > > Key: SPARK-34896 > URL: https://issues.apache.org/jira/browse/SPARK-34896 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.2.0 > > > # Add SQL config to switch between new ANSI intervals and CalendarIntervalType > # Modify SubtractDates to return DayTimeIntervalType when the config is > enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34917) Create SQL syntax document for CAST
Gengliang Wang created SPARK-34917: -- Summary: Create SQL syntax document for CAST Key: SPARK-34917 URL: https://issues.apache.org/jira/browse/SPARK-34917 Project: Spark Issue Type: Task Components: Documentation Affects Versions: 3.2.0 Reporter: Gengliang Wang Documentation for the behavior of CAST, including valid conversion type combinations, the results of integral overflow/string parsing errors, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34918) Create SQL syntax document for TRY_CAST
Gengliang Wang created SPARK-34918: -- Summary: Create SQL syntax document for TRY_CAST Key: SPARK-34918 URL: https://issues.apache.org/jira/browse/SPARK-34918 Project: Spark Issue Type: Task Components: Documentation Affects Versions: 3.2.0 Reporter: Gengliang Wang Documentation for the behavior of TRY_CAST, including valid conversion type combinations, the results of integral overflow/string parsing errors, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34568) enableHiveSupport should ignore if SparkContext is created
[ https://issues.apache.org/jira/browse/SPARK-34568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-34568: --- Assignee: angerszhu > enableHiveSupport should ignore if SparkContext is created > -- > > Key: SPARK-34568 > URL: https://issues.apache.org/jira/browse/SPARK-34568 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > > If SparkContext is created, > SparkSession.builder.enableHiveSupport().getOrCreate() won't load hive > metadata. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34568) enableHiveSupport should ignore if SparkContext is created
[ https://issues.apache.org/jira/browse/SPARK-34568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-34568. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31680 [https://github.com/apache/spark/pull/31680] > enableHiveSupport should ignore if SparkContext is created > -- > > Key: SPARK-34568 > URL: https://issues.apache.org/jira/browse/SPARK-34568 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Fix For: 3.2.0 > > > If SparkContext is created, > SparkSession.builder.enableHiveSupport().getOrCreate() won't load hive > metadata. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34354) CostBasedJoinReorder can fail on self-join
[ https://issues.apache.org/jira/browse/SPARK-34354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-34354: --- Assignee: wuyi > CostBasedJoinReorder can fail on self-join > -- > > Key: SPARK-34354 > URL: https://issues.apache.org/jira/browse/SPARK-34354 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > > > For example: > {code:java} > test("join reorder with self-join") { > val plan = t2.join(t1, Inner, Some(nameToAttr("t1.k-1-2") === > nameToAttr("t2.k-1-5"))) > .select(nameToAttr("t1.v-1-10")) > .join(t2, Inner, Some(nameToAttr("t1.v-1-10") === nameToAttr("t2.k-1-5"))) > // this can fail > Optimize.execute(plan.analyze) > } > {code} > error: > {code:java} > [info] java.lang.AssertionError: assertion failed > [info] at scala.Predef$.assert(Predef.scala:208) > [info] at > org.apache.spark.sql.catalyst.optimizer.JoinReorderDP$.search(CostBasedJoinReorder.scala:178) > [info] at > org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder$.org$apache$spark$sql$catalyst$optimizer$CostBasedJoinReorder$$reorder(CostBasedJoinReorder.scala:64) > [info] at > org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder$$anonfun$1.applyOrElse(CostBasedJoinReorder.scala:45) > [info] at > org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder$$anonfun$1.applyOrElse(CostBasedJoinReorder.scala:41) > [info] at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:317) > [info] at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73) > [info] at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:317) > [info] at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29) > [info] at > 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:171) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:169) > [info] at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29) > [info] at > org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder$.apply(CostBasedJoinReorder.scala:41) > [info] at > org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder$.apply(CostBasedJoinReorder.scala:35) > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34354) CostBasedJoinReorder can fail on self-join
[ https://issues.apache.org/jira/browse/SPARK-34354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-34354. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31470 [https://github.com/apache/spark/pull/31470] > CostBasedJoinReorder can fail on self-join > -- > > Key: SPARK-34354 > URL: https://issues.apache.org/jira/browse/SPARK-34354 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > Fix For: 3.2.0 > > > > For example: > {code:java} > test("join reorder with self-join") { > val plan = t2.join(t1, Inner, Some(nameToAttr("t1.k-1-2") === > nameToAttr("t2.k-1-5"))) > .select(nameToAttr("t1.v-1-10")) > .join(t2, Inner, Some(nameToAttr("t1.v-1-10") === nameToAttr("t2.k-1-5"))) > // this can fail > Optimize.execute(plan.analyze) > } > {code} > error: > {code:java} > [info] java.lang.AssertionError: assertion failed > [info] at scala.Predef$.assert(Predef.scala:208) > [info] at > org.apache.spark.sql.catalyst.optimizer.JoinReorderDP$.search(CostBasedJoinReorder.scala:178) > [info] at > org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder$.org$apache$spark$sql$catalyst$optimizer$CostBasedJoinReorder$$reorder(CostBasedJoinReorder.scala:64) > [info] at > org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder$$anonfun$1.applyOrElse(CostBasedJoinReorder.scala:45) > [info] at > org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder$$anonfun$1.applyOrElse(CostBasedJoinReorder.scala:41) > [info] at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:317) > [info] at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73) > [info] at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:317) > [info] at > 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:171) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:169) > [info] at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29) > [info] at > org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder$.apply(CostBasedJoinReorder.scala:41) > [info] at > org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder$.apply(CostBasedJoinReorder.scala:35) > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org