[jira] [Resolved] (SPARK-31270) Expose executor memory metrics at the task detail, in the Stages tab
[ https://issues.apache.org/jira/browse/SPARK-31270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

angerszhu resolved SPARK-31270.
-------------------------------
    Resolution: Won't Fix

> Expose executor memory metrics at the task detail, in the Stages tab
>
>                 Key: SPARK-31270
>                 URL: https://issues.apache.org/jira/browse/SPARK-31270
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.1.0
>            Reporter: angerszhu
>            Priority: Major

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26399) Add new stage-level REST APIs and parameters
[ https://issues.apache.org/jira/browse/SPARK-26399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17257669#comment-17257669 ]

angerszhu commented on SPARK-26399:
-----------------------------------

working on this PR

> Add new stage-level REST APIs and parameters
>
>                 Key: SPARK-26399
>                 URL: https://issues.apache.org/jira/browse/SPARK-26399
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Edward Lu
>            Priority: Major
>
> Add the peak values for the metrics to the stages REST API. Also add a new
> executorSummary REST API, which will return executor summary metrics for a
> specified stage:
> {code:java}
> curl http://:18080/api/v1/applications/ id>// attempt>/executorSummary{code}
> Add parameters to the stages REST API to specify:
> * filtering for task status, and returning tasks that match (for example,
> FAILED tasks)
> * task metric quantiles, and adding the task summary if specified
> * executor metric quantiles, and adding the executor summary if specified
[jira] [Commented] (SPARK-33915) Allow json expression to be pushable column
[ https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17257647#comment-17257647 ]

Ted Yu commented on SPARK-33915:
--------------------------------

Here is sample code for capturing the column and fields in a downstream PredicatePushDown.scala:

{code}
private val JSONCapture = "`GetJsonObject\\((.*),(.*)\\)`".r

private def transformGetJsonObject(p: Predicate): Predicate = {
  val eq = p.asInstanceOf[sources.EqualTo]
  eq.attribute match {
    case JSONCapture(column, field) =>
      val colName = column.toString.split("#")(0)
      val names = field.toString.split("\\.").foldLeft(List[String]()) { (z, n) =>
        z :+ "->'" + n + "'"
      }
      sources.EqualTo(colName + names.slice(1, names.size).mkString(""), eq.value)
        .asInstanceOf[Predicate]
    case _ =>
      sources.EqualTo("foo", "bar").asInstanceOf[Predicate]
  }
}
{code}

> Allow json expression to be pushable column
>
>                 Key: SPARK-33915
>                 URL: https://issues.apache.org/jira/browse/SPARK-33915
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.1
>            Reporter: Ted Yu
>            Assignee: Apache Spark
>            Priority: Major
>
> Currently PushableColumnBase provides no support for json / jsonb expressions.
> Example of a json expression:
> {code}
> get_json_object(phone, '$.code') = '1200'
> {code}
> If a non-string literal is part of the expression, the presence of cast() would
> complicate the situation.
> The implication is that an implementation of SupportsPushDownFilters doesn't
> have a chance to perform pushdown even if the third-party DB engine supports
> json expression pushdown.
> This issue is for discussion and implementation of Spark core changes which
> would allow a json expression to be recognized as a pushable column.
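The sample above is Scala against Spark's internal `sources` types, so it is not runnable on its own. As a rough, self-contained illustration of the same parsing idea, here is a Python sketch; the attribute format `GetJsonObject(col#id,$.path)` and the `->'...'` rendering are assumptions taken from the comment, not a Spark or database API:

```python
import re
from typing import Optional

# Illustrative re-implementation of the parsing idea in Ted Yu's comment:
# split an attribute string like "GetJsonObject(phone#5, $.code)" into a
# column name and a JSON path, then render it as a chain of ->'...' lookups
# that a JSON-capable database might accept for pushdown.
JSON_CAPTURE = re.compile(r"GetJsonObject\((.*),(.*)\)")

def to_pushable_expr(attribute: str) -> Optional[str]:
    m = JSON_CAPTURE.fullmatch(attribute)
    if m is None:
        return None                      # not a JSON expression; leave untouched
    column, field = m.group(1), m.group(2)
    col_name = column.split("#")[0]      # strip the expression-ID suffix, e.g. phone#5
    # "$.code.area" -> ["$", "code", "area"]; drop the leading "$"
    parts = field.strip().split(".")[1:]
    return col_name + "".join("->'" + p + "'" for p in parts)

print(to_pushable_expr("GetJsonObject(phone#5, $.code)"))  # phone->'code'
```

The `case _` fallback in the Scala sample (returning a dummy `EqualTo("foo", "bar")`) corresponds to the `None` branch here; a real implementation would leave the predicate unchanged instead.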
[jira] [Issue Comment Deleted] (SPARK-33915) Allow json expression to be pushable column
[ https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated SPARK-33915:
---------------------------
    Comment: was deleted

(was: Opened https://github.com/apache/spark/pull/30984)

> Allow json expression to be pushable column
>
>                 Key: SPARK-33915
>                 URL: https://issues.apache.org/jira/browse/SPARK-33915
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.1
>            Reporter: Ted Yu
>            Assignee: Apache Spark
>            Priority: Major
[jira] [Commented] (SPARK-26399) Add new stage-level REST APIs and parameters
[ https://issues.apache.org/jira/browse/SPARK-26399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17257639#comment-17257639 ]

Ron Hu commented on SPARK-26399:
--------------------------------

Hi [~Baohe Zhang], this ticket proposes a new REST API:
http://:18080/api/v1/applicationsexecutorSummary
It is meant to display the percentile distribution of peak memory metrics among the executors used in a given stage, which can help Spark users debug/monitor a stage's bottleneck.

SPARK-32446 (https://issues.apache.org/jira/browse/SPARK-32446) proposed a REST API that displays the percentile distribution of peak memory metrics for all executors used in an application:
http://:18080/api/v1/applications///executorSummary

Hence this ticket displays an executorSummary for a given stage inside an application, while SPARK-32446 displays an executorSummary for the entire application. They are different.

> Add new stage-level REST APIs and parameters
>
>                 Key: SPARK-26399
>                 URL: https://issues.apache.org/jira/browse/SPARK-26399
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Edward Lu
>            Priority: Major
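The "percentile distribution of peak memory metrics" Ron Hu describes can be pictured with a toy sketch. Nothing below is Spark code; the metric names and the 25th/50th/75th quantile choice are illustrative assumptions:

```python
# Toy sketch of the kind of per-stage executor summary the proposed REST API
# could return: quantiles of each peak memory metric across the executors used
# in a stage. NOT Spark's implementation; names and quantiles are assumptions.
from statistics import quantiles

def executor_summary(peaks_by_executor):
    """peaks_by_executor: {executor_id: {metric_name: peak_value}}"""
    summary = {}
    metrics = {m for peaks in peaks_by_executor.values() for m in peaks}
    for metric in sorted(metrics):
        values = sorted(p[metric] for p in peaks_by_executor.values() if metric in p)
        # statistics.quantiles with n=4 yields the 25th/50th/75th percentiles
        q25, q50, q75 = quantiles(values, n=4, method="inclusive")
        summary[metric] = {"0.25": q25, "0.5": q50, "0.75": q75}
    return summary

peaks = {
    "exec-1": {"JVMHeapMemory": 512, "OnHeapExecutionMemory": 128},
    "exec-2": {"JVMHeapMemory": 768, "OnHeapExecutionMemory": 256},
    "exec-3": {"JVMHeapMemory": 1024, "OnHeapExecutionMemory": 64},
}
print(executor_summary(peaks)["JVMHeapMemory"])
```

A stage-scoped summary like this differs from the application-scoped one in SPARK-32446 only in which executors (and which time window) contribute the peak values.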
[jira] [Assigned] (SPARK-33963) `isCached` returns `false` for cached Hive table
[ https://issues.apache.org/jira/browse/SPARK-33963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-33963:
------------------------------------

    Assignee: Maxim Gekk

> `isCached` returns `false` for cached Hive table
>
>                 Key: SPARK-33963
>                 URL: https://issues.apache.org/jira/browse/SPARK-33963
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.0, 3.2.0
>            Reporter: Maxim Gekk
>            Assignee: Maxim Gekk
>            Priority: Major
>
> The same works in Spark 3.0 but fails in Spark 3.2.0-SNAPSHOT (and probably 3.1.0):
> *Spark 3.0:*
> {code:scala}
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/ '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 3.0.1
>       /_/
>
> scala> sql("CREATE TABLE tbl (col int)")
> res2: org.apache.spark.sql.DataFrame = []
>
> scala> spark.catalog.isCached("tbl")
> res3: Boolean = false
>
> scala> sql("CACHE TABLE tbl")
> res4: org.apache.spark.sql.DataFrame = []
>
> scala> spark.catalog.isCached("tbl")
> res5: Boolean = true
> {code}
> *Spark master:*
> {code:scala}
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/ '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 3.2.0-SNAPSHOT
>       /_/
>
> scala> sql("CREATE TABLE tbl (col int)")
> res1: org.apache.spark.sql.DataFrame = []
>
> scala> spark.catalog.isCached("tbl")
> res2: Boolean = false
>
> scala> sql("CACHE TABLE tbl")
> res3: org.apache.spark.sql.DataFrame = []
>
> scala> spark.catalog.isCached("tbl")
> res4: Boolean = false
> {code}
[jira] [Resolved] (SPARK-33963) `isCached` returns `false` for cached Hive table
[ https://issues.apache.org/jira/browse/SPARK-33963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-33963.
----------------------------------
    Fix Version/s: 3.0.2
                   3.1.0
       Resolution: Fixed

Issue resolved by pull request 30995
[https://github.com/apache/spark/pull/30995]

> `isCached` returns `false` for cached Hive table
>
>                 Key: SPARK-33963
>                 URL: https://issues.apache.org/jira/browse/SPARK-33963
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.0, 3.2.0
>            Reporter: Maxim Gekk
>            Assignee: Maxim Gekk
>            Priority: Major
>             Fix For: 3.1.0, 3.0.2
[jira] [Resolved] (SPARK-33959) Improve the statistics estimation of the Tail
[ https://issues.apache.org/jira/browse/SPARK-33959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-33959.
----------------------------------
    Fix Version/s: 3.2.0
       Resolution: Fixed

Issue resolved by pull request 30991
[https://github.com/apache/spark/pull/30991]

> Improve the statistics estimation of the Tail
>
>                 Key: SPARK-33959
>                 URL: https://issues.apache.org/jira/browse/SPARK-33959
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Yuming Wang
>            Assignee: Yuming Wang
>            Priority: Major
>             Fix For: 3.2.0
>
> {code:scala}
> spark.sql("set spark.sql.cbo.enabled=true")
> spark.range(100).selectExpr("id as a", "id as b", "id as c", "id as e").write.saveAsTable("t1")
> println(Tail(Literal(5), spark.sql("SELECT * FROM t1").queryExecution.logical).queryExecution.explainString(org.apache.spark.sql.execution.CostMode))
> {code}
> Current:
> {noformat}
> == Optimized Logical Plan ==
> Tail 5, Statistics(sizeInBytes=3.8 KiB)
> +- Relation[a#24L,b#25L,c#26L,e#27L] parquet, Statistics(sizeInBytes=3.8 KiB)
> {noformat}
> Expected:
> {noformat}
> == Optimized Logical Plan ==
> Tail 5, Statistics(sizeInBytes=200.0 B, rowCount=5)
> +- Relation[a#24L,b#25L,c#26L,e#27L] parquet, Statistics(sizeInBytes=3.8 KiB)
> {noformat}
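The expected figure lines up arithmetically: `Tail(5)` caps the row count at 5, and under the size-per-row convention of 8 bytes of row overhead plus the size of each column (8 bytes for a LongType; the overhead figure is an assumption based on Spark's EstimationUtils), the four bigint columns of `t1` give 8 + 4 × 8 = 40 bytes per row, hence 5 × 40 = 200 B. A sketch of that arithmetic:

```python
# Reproduce the expected Tail estimate sizeInBytes=200.0 B, rowCount=5 for a
# table with four LongType columns. Assumes 8 bytes of per-row overhead plus
# 8 bytes per long column, per Spark's EstimationUtils convention.
ROW_OVERHEAD = 8
LONG_SIZE = 8

def tail_size_in_bytes(limit: int, num_long_cols: int) -> int:
    size_per_row = ROW_OVERHEAD + num_long_cols * LONG_SIZE
    return limit * size_per_row

print(tail_size_in_bytes(5, 4))  # 200
```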
[jira] [Assigned] (SPARK-33959) Improve the statistics estimation of the Tail
[ https://issues.apache.org/jira/browse/SPARK-33959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-33959:
------------------------------------

    Assignee: Yuming Wang

> Improve the statistics estimation of the Tail
>
>                 Key: SPARK-33959
>                 URL: https://issues.apache.org/jira/browse/SPARK-33959
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Yuming Wang
>            Assignee: Yuming Wang
>            Priority: Major
[jira] [Resolved] (SPARK-33960) LimitPushDown support Sort
[ https://issues.apache.org/jira/browse/SPARK-33960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang resolved SPARK-33960.
---------------------------------
    Resolution: Not A Problem

> LimitPushDown support Sort
>
>                 Key: SPARK-33960
>                 URL: https://issues.apache.org/jira/browse/SPARK-33960
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Yuming Wang
>            Assignee: Apache Spark
>            Priority: Major
>
> LimitPushDown support Sort.
[jira] [Commented] (SPARK-33922) Fix error test SparkLauncherSuite.testSparkLauncherGetError
[ https://issues.apache.org/jira/browse/SPARK-33922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17257620#comment-17257620 ]

Hyukjin Kwon commented on SPARK-33922:
--------------------------------------

Would you mind sharing the full logs? Meanwhile, it might be helpful to take a look at https://spark.apache.org/developer-tools.html for how to run a test.

> Fix error test SparkLauncherSuite.testSparkLauncherGetError
>
>                 Key: SPARK-33922
>                 URL: https://issues.apache.org/jira/browse/SPARK-33922
>             Project: Spark
>          Issue Type: Improvement
>          Components: Tests
>    Affects Versions: 3.0.1
>            Reporter: dengziming
>            Priority: Minor
>
> org.apache.spark.launcher.SparkLauncherSuite.testSparkLauncherGetError fails
> every time it is executed; note that it is not a flaky test, because it fails
> every time.
> ```
> java.lang.AssertionError
>     at org.junit.Assert.fail(Assert.java:87)
>     at org.junit.Assert.assertTrue(Assert.java:42)
>     at org.junit.Assert.assertTrue(Assert.java:53)
>     at org.apache.spark.launcher.SparkLauncherSuite.testSparkLauncherGetError
> ```
[jira] [Updated] (SPARK-33961) Upgrade SBT to 1.4.6
[ https://issues.apache.org/jira/browse/SPARK-33961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-33961:
----------------------------------
    Affects Version/s:     (was: 3.2.0)

> Upgrade SBT to 1.4.6
>
>                 Key: SPARK-33961
>                 URL: https://issues.apache.org/jira/browse/SPARK-33961
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 3.1.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Minor
>             Fix For: 3.1.0
[jira] [Assigned] (SPARK-33961) Upgrade SBT to 1.4.6
[ https://issues.apache.org/jira/browse/SPARK-33961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-33961:
-------------------------------------

    Assignee: Dongjoon Hyun

> Upgrade SBT to 1.4.6
>
>                 Key: SPARK-33961
>                 URL: https://issues.apache.org/jira/browse/SPARK-33961
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 3.1.0, 3.2.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Minor
[jira] [Resolved] (SPARK-33961) Upgrade SBT to 1.4.6
[ https://issues.apache.org/jira/browse/SPARK-33961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-33961.
-----------------------------------
    Fix Version/s: 3.1.0
       Resolution: Fixed

Issue resolved by pull request 30993
[https://github.com/apache/spark/pull/30993]

> Upgrade SBT to 1.4.6
>
>                 Key: SPARK-33961
>                 URL: https://issues.apache.org/jira/browse/SPARK-33961
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 3.1.0, 3.2.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Minor
>             Fix For: 3.1.0
[jira] [Commented] (SPARK-33964) Combine distinct unions in more cases
[ https://issues.apache.org/jira/browse/SPARK-33964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17257499#comment-17257499 ]

Apache Spark commented on SPARK-33964:
--------------------------------------

User 'tanelk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30996

> Combine distinct unions in more cases
>
>                 Key: SPARK-33964
>                 URL: https://issues.apache.org/jira/browse/SPARK-33964
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Tanel Kiis
>            Priority: Major
>
> In several TPCDS queries the CombineUnions rule does not manage to combine
> unions, because they have noop Projects between them.
> The Projects will be removed by RemoveNoopOperators, but by then
> ReplaceDistinctWithAggregate has been applied and there are aggregates
> between the unions.
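The interaction the description mentions can be pictured with a toy model: a rule that flattens nested unions can only merge a child that is itself a union, so any intermediate node (a no-op Project, or the Aggregate that ReplaceDistinctWithAggregate introduces) blocks the merge. The node classes below are illustrative, not Spark's LogicalPlan:

```python
# Toy model of why CombineUnions misses unions separated by other operators.
class Node:
    def __init__(self, *children):
        self.children = list(children)

class Union(Node): pass
class Project(Node): pass   # stands in for a noop Project between unions
class Leaf(Node): pass

def flatten_unions(plan):
    """Merge directly-nested Union children into one flat Union."""
    if not isinstance(plan, Union):
        return plan
    flat = []
    for child in plan.children:
        child = flatten_unions(child)
        if isinstance(child, Union):
            flat.extend(child.children)   # adjacent unions are combined
        else:
            flat.append(child)            # any other node blocks combining
    return Union(*flat)

direct = Union(Union(Leaf(), Leaf()), Leaf())
blocked = Union(Project(Union(Leaf(), Leaf())), Leaf())
print(len(flatten_unions(direct).children),
      len(flatten_unions(blocked).children))  # 3 2
```

In the `blocked` case the inner union survives behind the Project, which is the shape the TPCDS queries end up in by the time RemoveNoopOperators finally removes the Project.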
[jira] [Assigned] (SPARK-33964) Combine distinct unions in more cases
[ https://issues.apache.org/jira/browse/SPARK-33964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33964:
------------------------------------

    Assignee: Apache Spark

> Combine distinct unions in more cases
>
>                 Key: SPARK-33964
>                 URL: https://issues.apache.org/jira/browse/SPARK-33964
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Tanel Kiis
>            Assignee: Apache Spark
>            Priority: Major
[jira] [Commented] (SPARK-33964) Combine distinct unions in more cases
[ https://issues.apache.org/jira/browse/SPARK-33964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17257498#comment-17257498 ]

Apache Spark commented on SPARK-33964:
--------------------------------------

User 'tanelk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30996

> Combine distinct unions in more cases
>
>                 Key: SPARK-33964
>                 URL: https://issues.apache.org/jira/browse/SPARK-33964
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Tanel Kiis
>            Priority: Major
[jira] [Assigned] (SPARK-33964) Combine distinct unions in more cases
[ https://issues.apache.org/jira/browse/SPARK-33964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33964:
------------------------------------

    Assignee:     (was: Apache Spark)

> Combine distinct unions in more cases
>
>                 Key: SPARK-33964
>                 URL: https://issues.apache.org/jira/browse/SPARK-33964
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Tanel Kiis
>            Priority: Major
[jira] [Updated] (SPARK-33964) Combine distinct unions in more cases
[ https://issues.apache.org/jira/browse/SPARK-33964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tanel Kiis updated SPARK-33964:
-------------------------------
    Description:
In several TPCDS queries the CombineUnions rule does not manage to combine unions, because they have noop Projects between them.

The Projects will be removed by RemoveNoopOperators, but by then ReplaceDistinctWithAggregate has been applied and there are aggregates between the unions.

> Combine distinct unions in more cases
>
>                 Key: SPARK-33964
>                 URL: https://issues.apache.org/jira/browse/SPARK-33964
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Tanel Kiis
>            Priority: Major
[jira] [Created] (SPARK-33964) Combine distinct unions in more cases
Tanel Kiis created SPARK-33964:
----------------------------------

             Summary: Combine distinct unions in more cases
                 Key: SPARK-33964
                 URL: https://issues.apache.org/jira/browse/SPARK-33964
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Tanel Kiis
[jira] [Commented] (SPARK-33963) `isCached` returns `false` for cached Hive table
[ https://issues.apache.org/jira/browse/SPARK-33963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17257490#comment-17257490 ]

Apache Spark commented on SPARK-33963:
--------------------------------------

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30995

> `isCached` returns `false` for cached Hive table
>
>                 Key: SPARK-33963
>                 URL: https://issues.apache.org/jira/browse/SPARK-33963
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.0, 3.2.0
>            Reporter: Maxim Gekk
>            Priority: Major
[jira] [Assigned] (SPARK-33963) `isCached` returns `false` for cached Hive table
[ https://issues.apache.org/jira/browse/SPARK-33963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33963:
------------------------------------

    Assignee: Apache Spark

> `isCached` returns `false` for cached Hive table
>
>                 Key: SPARK-33963
>                 URL: https://issues.apache.org/jira/browse/SPARK-33963
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.0, 3.2.0
>            Reporter: Maxim Gekk
>            Assignee: Apache Spark
>            Priority: Major
[jira] [Assigned] (SPARK-33963) `isCached` returns `false` for cached Hive table
[ https://issues.apache.org/jira/browse/SPARK-33963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33963:
------------------------------------

    Assignee:     (was: Apache Spark)

> `isCached` returns `false` for cached Hive table
>
>                 Key: SPARK-33963
>                 URL: https://issues.apache.org/jira/browse/SPARK-33963
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.0, 3.2.0
>            Reporter: Maxim Gekk
>            Priority: Major
[jira] [Commented] (SPARK-33963) `isCached` returns `false` for cached Hive table
[ https://issues.apache.org/jira/browse/SPARK-33963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17257489#comment-17257489 ]

Apache Spark commented on SPARK-33963:
--------------------------------------

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30995

> `isCached` returns `false` for cached Hive table
>
>                 Key: SPARK-33963
>                 URL: https://issues.apache.org/jira/browse/SPARK-33963
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.0, 3.2.0
>            Reporter: Maxim Gekk
>            Priority: Major
[jira] [Commented] (SPARK-33963) `isCached` returns `false` for cached Hive table
[ https://issues.apache.org/jira/browse/SPARK-33963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17257483#comment-17257483 ]

Maxim Gekk commented on SPARK-33963:
------------------------------------

I am working on a bug fix.

> `isCached` returns `false` for cached Hive table
>
>                 Key: SPARK-33963
>                 URL: https://issues.apache.org/jira/browse/SPARK-33963
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.0, 3.2.0
>            Reporter: Maxim Gekk
>            Priority: Major
[jira] [Created] (SPARK-33963) `isCached` return `false` for cached Hive table
Maxim Gekk created SPARK-33963: -- Summary: `isCached` return `false` for cached Hive table Key: SPARK-33963 URL: https://issues.apache.org/jira/browse/SPARK-33963 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.0, 3.2.0 Reporter: Maxim Gekk The same works in Spark 3.0 but fails in Spark 3.2.0-SNAPSHOT (and 3.1.0 probably): *Spark 3.0:* {code:scala} Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 3.0.1 /_/ scala> sql("CREATE TABLE tbl (col int)") res2: org.apache.spark.sql.DataFrame = [] scala> spark.catalog.isCached("tbl") res3: Boolean = false scala> sql("CACHE TABLE tbl") res4: org.apache.spark.sql.DataFrame = [] scala> spark.catalog.isCached("tbl") res5: Boolean = true {code} *Spark master:* {code:scala} Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 3.2.0-SNAPSHOT /_/ scala> sql("CREATE TABLE tbl (col int)") res1: org.apache.spark.sql.DataFrame = [] scala> spark.catalog.isCached("tbl") res2: Boolean = false scala> sql("CACHE TABLE tbl") res3: org.apache.spark.sql.DataFrame = [] scala> spark.catalog.isCached("tbl") res4: Boolean = false {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33962) Fix incorrect min partition condition in getRanges
[ https://issues.apache.org/jira/browse/SPARK-33962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh updated SPARK-33962: Issue Type: Bug (was: Improvement) > Fix incorrect min partition condition in getRanges > -- > > Key: SPARK-33962 > URL: https://issues.apache.org/jira/browse/SPARK-33962 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Minor > > When calculating offset ranges, we consider minPartitions configuration. If > minPartitions is not set or is less than or equal the size of given ranges, > it means there are enough partitions at Kafka so we don't need to split > offsets to satisfy min partition requirement. But the current condition is > offsetRanges.size > minPartitions.get and is not correct. Currently getRanges > will split offsets in unnecessary case. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33962) Fix incorrect min partition condition in getRanges
[ https://issues.apache.org/jira/browse/SPARK-33962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33962: Assignee: Apache Spark (was: L. C. Hsieh) > Fix incorrect min partition condition in getRanges > -- > > Key: SPARK-33962 > URL: https://issues.apache.org/jira/browse/SPARK-33962 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: Apache Spark >Priority: Minor > > When calculating offset ranges, we consider minPartitions configuration. If > minPartitions is not set or is less than or equal the size of given ranges, > it means there are enough partitions at Kafka so we don't need to split > offsets to satisfy min partition requirement. But the current condition is > offsetRanges.size > minPartitions.get and is not correct. Currently getRanges > will split offsets in unnecessary case. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33962) Fix incorrect min partition condition in getRanges
[ https://issues.apache.org/jira/browse/SPARK-33962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257466#comment-17257466 ] Apache Spark commented on SPARK-33962: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/30994 > Fix incorrect min partition condition in getRanges > -- > > Key: SPARK-33962 > URL: https://issues.apache.org/jira/browse/SPARK-33962 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Minor > > When calculating offset ranges, we consider minPartitions configuration. If > minPartitions is not set or is less than or equal the size of given ranges, > it means there are enough partitions at Kafka so we don't need to split > offsets to satisfy min partition requirement. But the current condition is > offsetRanges.size > minPartitions.get and is not correct. Currently getRanges > will split offsets in unnecessary case. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33962) Fix incorrect min partition condition in getRanges
[ https://issues.apache.org/jira/browse/SPARK-33962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33962: Assignee: L. C. Hsieh (was: Apache Spark) > Fix incorrect min partition condition in getRanges > -- > > Key: SPARK-33962 > URL: https://issues.apache.org/jira/browse/SPARK-33962 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Minor > > When calculating offset ranges, we consider minPartitions configuration. If > minPartitions is not set or is less than or equal the size of given ranges, > it means there are enough partitions at Kafka so we don't need to split > offsets to satisfy min partition requirement. But the current condition is > offsetRanges.size > minPartitions.get and is not correct. Currently getRanges > will split offsets in unnecessary case. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33962) Fix incorrect min partition condition in getRanges
L. C. Hsieh created SPARK-33962: --- Summary: Fix incorrect min partition condition in getRanges Key: SPARK-33962 URL: https://issues.apache.org/jira/browse/SPARK-33962 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 3.2.0 Reporter: L. C. Hsieh Assignee: L. C. Hsieh When calculating offset ranges, we consider the minPartitions configuration. If minPartitions is not set, or is less than or equal to the size of the given ranges, there are already enough partitions in Kafka, so we don't need to split offsets to satisfy the minimum partition requirement. But the current condition is offsetRanges.size > minPartitions.get, which is not correct: getRanges currently splits offsets in cases where no split is necessary. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
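The off-by-one described in the report can be sketched in a few lines. This is an illustrative Python sketch, not Spark's actual getRanges code; the function names are hypothetical:

```python
from typing import Optional

def needs_split(num_ranges: int, min_partitions: Optional[int]) -> bool:
    # Corrected predicate: split offset ranges only when minPartitions is set
    # AND Kafka provides strictly fewer ranges than requested.
    return min_partitions is not None and num_ranges < min_partitions

def needs_split_buggy(num_ranges: int, min_partitions: Optional[int]) -> bool:
    # The reported condition treats "enough partitions" as
    # offsetRanges.size > minPartitions.get, so the equal case still splits.
    return min_partitions is not None and not (num_ranges > min_partitions)
```

The two predicates diverge exactly when the number of ranges equals minPartitions: the buggy one splits even though there are already enough partitions.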
[jira] [Updated] (SPARK-33961) Upgrade SBT to 1.4.6
[ https://issues.apache.org/jira/browse/SPARK-33961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33961: -- Affects Version/s: 3.1.0 > Upgrade SBT to 1.4.6 > > > Key: SPARK-33961 > URL: https://issues.apache.org/jira/browse/SPARK-33961 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.1.0, 3.2.0 >Reporter: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33961) Upgrade SBT to 1.4.6
[ https://issues.apache.org/jira/browse/SPARK-33961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257459#comment-17257459 ] Apache Spark commented on SPARK-33961: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/30993 > Upgrade SBT to 1.4.6 > > > Key: SPARK-33961 > URL: https://issues.apache.org/jira/browse/SPARK-33961 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.1.0, 3.2.0 >Reporter: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-33961) Upgrade SBT to 1.4.6
[ https://issues.apache.org/jira/browse/SPARK-33961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33961: -- Comment: was deleted (was: User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/30993) > Upgrade SBT to 1.4.6 > > > Key: SPARK-33961 > URL: https://issues.apache.org/jira/browse/SPARK-33961 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.1.0, 3.2.0 >Reporter: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33961) Upgrade SBT to 1.4.6
[ https://issues.apache.org/jira/browse/SPARK-33961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33961: Assignee: Apache Spark > Upgrade SBT to 1.4.6 > > > Key: SPARK-33961 > URL: https://issues.apache.org/jira/browse/SPARK-33961 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33961) Upgrade SBT to 1.4.6
[ https://issues.apache.org/jira/browse/SPARK-33961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33961: Assignee: (was: Apache Spark) > Upgrade SBT to 1.4.6 > > > Key: SPARK-33961 > URL: https://issues.apache.org/jira/browse/SPARK-33961 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33961) Upgrade SBT to 1.4.6
[ https://issues.apache.org/jira/browse/SPARK-33961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257458#comment-17257458 ] Apache Spark commented on SPARK-33961: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/30993 > Upgrade SBT to 1.4.6 > > > Key: SPARK-33961 > URL: https://issues.apache.org/jira/browse/SPARK-33961 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33961) Upgrade SBT to 1.4.6
Dongjoon Hyun created SPARK-33961: - Summary: Upgrade SBT to 1.4.6 Key: SPARK-33961 URL: https://issues.apache.org/jira/browse/SPARK-33961 Project: Spark Issue Type: Bug Components: Build Affects Versions: 3.2.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33961) Upgrade SBT to 1.4.6
[ https://issues.apache.org/jira/browse/SPARK-33961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33961: -- Priority: Minor (was: Major) > Upgrade SBT to 1.4.6 > > > Key: SPARK-33961 > URL: https://issues.apache.org/jira/browse/SPARK-33961 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33933) Broadcast timeout happened unexpectedly in AQE
[ https://issues.apache.org/jira/browse/SPARK-33933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257437#comment-17257437 ] Dongjoon Hyun commented on SPARK-33933: --- Thank you for reporting a bug, [~zhongyu09]. I converted this to a subtask of SPARK-33828. > Broadcast timeout happened unexpectedly in AQE > --- > > Key: SPARK-33933 > URL: https://issues.apache.org/jira/browse/SPARK-33933 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0, 3.0.1 >Reporter: Yu Zhong >Priority: Major > > In Spark 3.0, when AQE is enabled, broadcast timeouts often occur in normal > queries, as below. > > {code:java} > Could not execute broadcast in 300 secs. You can increase the timeout for > broadcasts via spark.sql.broadcastTimeout or disable broadcast join by > setting spark.sql.autoBroadcastJoinThreshold to -1 > {code} > > This usually happens when a broadcast join (with or without a hint) follows a > long-running shuffle (more than 5 minutes); disabling AQE makes the issue > disappear. > Increasing spark.sql.broadcastTimeout works around it, but since the data to > broadcast is very small, that doesn't make sense. > After investigation, the root cause appears to be the following: with AQE enabled, in > getFinalPhysicalPlan, Spark traverses the physical plan bottom up, creates query > stages for the materialized parts via createQueryStages, and materializes those > newly created query stages to submit map stages or broadcasts. When a > ShuffleQueryStage materializes before a BroadcastQueryStage, the map job and the > broadcast job are submitted almost at the same time, but the map job holds all > the computing resources. If the map job runs slowly (when lots of data needs to > be processed and resources are limited), the broadcast job cannot be started > (and finished) before spark.sql.broadcastTimeout, which causes the whole job to > fail (introduced in SPARK-31475).
> Code to reproduce: > > {code:java} > import java.util.UUID > import scala.util.Random > import org.apache.spark.sql.functions._ > import org.apache.spark.sql.SparkSession > val spark = SparkSession.builder() > .master("local[2]") > .appName("Test Broadcast").getOrCreate() > import spark.implicits._ > spark.conf.set("spark.sql.adaptive.enabled", "true") > val sc = spark.sparkContext > sc.setLogLevel("INFO") > val uuid = UUID.randomUUID > val df = sc.parallelize(Range(0, 1), 1).flatMap(x => { > for (i <- Range(0, 1 + Random.nextInt(1))) > yield (x % 26, x, Random.nextInt(10), UUID.randomUUID.toString) > }).toDF("index", "part", "pv", "uuid") > .withColumn("md5", md5($"uuid")) > val dim_data = Range(0, 26).map(x => (('a' + x).toChar.toString, x)) > val dim = dim_data.toDF("name", "index") > val result = df.groupBy("index") > .agg(sum($"pv").alias("pv"), countDistinct("uuid").alias("uv")) > .join(dim, Seq("index")) > .collect(){code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33933) Broadcast timeout happened unexpectedly in AQE
[ https://issues.apache.org/jira/browse/SPARK-33933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33933: -- Parent: SPARK-33828 Issue Type: Sub-task (was: Bug) > Broadcast timeout happened unexpectedly in AQE > --- > > Key: SPARK-33933 > URL: https://issues.apache.org/jira/browse/SPARK-33933 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0, 3.0.1 >Reporter: Yu Zhong >Priority: Major > > In Spark 3.0, when AQE is enabled, broadcast timeouts often occur in normal > queries, as below. > > {code:java} > Could not execute broadcast in 300 secs. You can increase the timeout for > broadcasts via spark.sql.broadcastTimeout or disable broadcast join by > setting spark.sql.autoBroadcastJoinThreshold to -1 > {code} > > This usually happens when a broadcast join (with or without a hint) follows a > long-running shuffle (more than 5 minutes); disabling AQE makes the issue > disappear. > Increasing spark.sql.broadcastTimeout works around it, but since the data to > broadcast is very small, that doesn't make sense. > After investigation, the root cause appears to be the following: with AQE enabled, in > getFinalPhysicalPlan, Spark traverses the physical plan bottom up, creates query > stages for the materialized parts via createQueryStages, and materializes those > newly created query stages to submit map stages or broadcasts. When a > ShuffleQueryStage materializes before a BroadcastQueryStage, the map job and the > broadcast job are submitted almost at the same time, but the map job holds all > the computing resources. If the map job runs slowly (when lots of data needs to > be processed and resources are limited), the broadcast job cannot be started > (and finished) before spark.sql.broadcastTimeout, which causes the whole job to > fail (introduced in SPARK-31475).
> Code to reproduce: > > {code:java} > import java.util.UUID > import scala.util.Random > import org.apache.spark.sql.functions._ > import org.apache.spark.sql.SparkSession > val spark = SparkSession.builder() > .master("local[2]") > .appName("Test Broadcast").getOrCreate() > import spark.implicits._ > spark.conf.set("spark.sql.adaptive.enabled", "true") > val sc = spark.sparkContext > sc.setLogLevel("INFO") > val uuid = UUID.randomUUID > val df = sc.parallelize(Range(0, 1), 1).flatMap(x => { > for (i <- Range(0, 1 + Random.nextInt(1))) > yield (x % 26, x, Random.nextInt(10), UUID.randomUUID.toString) > }).toDF("index", "part", "pv", "uuid") > .withColumn("md5", md5($"uuid")) > val dim_data = Range(0, 26).map(x => (('a' + x).toChar.toString, x)) > val dim = dim_data.toDF("name", "index") > val result = df.groupBy("index") > .agg(sum($"pv").alias("pv"), countDistinct("uuid").alias("uv")) > .join(dim, Seq("index")) > .collect(){code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33956) Add rowCount for Range operator
[ https://issues.apache.org/jira/browse/SPARK-33956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-33956: - Assignee: Yuming Wang > Add rowCount for Range operator > --- > > Key: SPARK-33956 > URL: https://issues.apache.org/jira/browse/SPARK-33956 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > > {code:scala} > spark.sql("set spark.sql.cbo.enabled=true") > spark.sql("select id from range(100)").explain("cost") > {code} > Current: > {noformat} > == Optimized Logical Plan == > Range (0, 100, step=1, splits=None), Statistics(sizeInBytes=800.0 B) > {noformat} > Expected: > {noformat} > == Optimized Logical Plan == > Range (0, 100, step=1, splits=None), Statistics(sizeInBytes=800.0 B, > rowCount=100) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33956) Add rowCount for Range operator
[ https://issues.apache.org/jira/browse/SPARK-33956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33956. --- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 30989 [https://github.com/apache/spark/pull/30989] > Add rowCount for Range operator > --- > > Key: SPARK-33956 > URL: https://issues.apache.org/jira/browse/SPARK-33956 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.2.0 > > > {code:scala} > spark.sql("set spark.sql.cbo.enabled=true") > spark.sql("select id from range(100)").explain("cost") > {code} > Current: > {noformat} > == Optimized Logical Plan == > Range (0, 100, step=1, splits=None), Statistics(sizeInBytes=800.0 B) > {noformat} > Expected: > {noformat} > == Optimized Logical Plan == > Range (0, 100, step=1, splits=None), Statistics(sizeInBytes=800.0 B, > rowCount=100) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
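The rowCount that SPARK-33956 adds for the Range operator is simple arithmetic over the range bounds. A minimal sketch of that computation (illustrative Python, assuming Spark's half-open [start, end) range semantics; not Spark's actual estimation code):

```python
import math

def range_row_count(start: int, end: int, step: int) -> int:
    # Number of rows produced by Range(start, end, step): the count of values
    # start, start+step, ... strictly before end, clamped at zero when the
    # step points away from end.
    if step == 0:
        raise ValueError("step must be non-zero")
    return max(0, math.ceil((end - start) / step))
```

For the example in the description, `range(100)` gives 100 rows, which is the `rowCount=100` the expected plan shows.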
[jira] [Assigned] (SPARK-33960) LimitPushDown support Sort
[ https://issues.apache.org/jira/browse/SPARK-33960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33960: Assignee: Apache Spark > LimitPushDown support Sort > -- > > Key: SPARK-33960 > URL: https://issues.apache.org/jira/browse/SPARK-33960 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Assignee: Apache Spark >Priority: Major > > LimitPushDown support Sort. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33960) LimitPushDown support Sort
[ https://issues.apache.org/jira/browse/SPARK-33960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33960: Assignee: Apache Spark > LimitPushDown support Sort > -- > > Key: SPARK-33960 > URL: https://issues.apache.org/jira/browse/SPARK-33960 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Assignee: Apache Spark >Priority: Major > > LimitPushDown support Sort. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33960) LimitPushDown support Sort
[ https://issues.apache.org/jira/browse/SPARK-33960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257426#comment-17257426 ] Apache Spark commented on SPARK-33960: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/30992 > LimitPushDown support Sort > -- > > Key: SPARK-33960 > URL: https://issues.apache.org/jira/browse/SPARK-33960 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > LimitPushDown support Sort. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33960) LimitPushDown support Sort
[ https://issues.apache.org/jira/browse/SPARK-33960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33960: Assignee: (was: Apache Spark) > LimitPushDown support Sort > -- > > Key: SPARK-33960 > URL: https://issues.apache.org/jira/browse/SPARK-33960 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > LimitPushDown support Sort. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33960) LimitPushDown support Sort
Yuming Wang created SPARK-33960: --- Summary: LimitPushDown support Sort Key: SPARK-33960 URL: https://issues.apache.org/jira/browse/SPARK-33960 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Yuming Wang LimitPushDown support Sort. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
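Letting LimitPushDown see through a Sort turns "sort everything, then take k" into a top-k computation, which is the kind of win the rule is after. A minimal sketch of why the rewrite is safe (plain Python standing in for the two query shapes; not Spark's optimizer code):

```python
import heapq

def sort_then_limit(rows, k):
    # Unoptimized shape: fully sort all rows, then keep the first k.
    return sorted(rows)[:k]

def limit_pushed_into_sort(rows, k):
    # Optimized shape: a top-k selection; only k candidates need to be kept.
    return heapq.nsmallest(k, rows)
```

Both shapes produce the same k smallest rows in order, so pushing the limit down preserves the query result while avoiding a full sort.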
[jira] [Commented] (SPARK-33888) JDBC SQL TIME type represents incorrectly as TimestampType, it should be physical Int in millis
[ https://issues.apache.org/jira/browse/SPARK-33888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257370#comment-17257370 ] Duc Hoa Nguyen commented on SPARK-33888: Seems like the PR is accepted. Is anything else needed before this can be merged? > JDBC SQL TIME type represents incorrectly as TimestampType, it should be > physical Int in millis > --- > > Key: SPARK-33888 > URL: https://issues.apache.org/jira/browse/SPARK-33888 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3, 3.0.0, 3.0.1 >Reporter: Duc Hoa Nguyen >Assignee: Apache Spark >Priority: Minor > > Currently, for JDBC, the SQL TIME type is incorrectly represented as Spark's > TimestampType. It should be represented as a physical int in millis: a time of day, > with no reference to a particular calendar, time zone or date, with a precision of > one millisecond, stored as the number of milliseconds after midnight, 00:00:00.000. > We encountered this as the Avro logical type `TimeMillis` not being converted > correctly to the Spark `Timestamp` struct type by the `SchemaConverters`; it > converts to a regular `int` instead. Reproducible by ingesting data from a MySQL > table with a column of TIME type: the Spark JDBC dataframe gets the Timestamp type, > but enforcing our Avro schema (`{"type": "int", "logicalType": "time-millis"}`) > externally fails to apply, with the following exception: > {{java.lang.RuntimeException: java.sql.Timestamp is not a valid external type > for schema of int}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
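The encoding the issue asks for (Avro's time-millis) is just milliseconds after midnight packed into an int. A sketch of that conversion (illustrative Python helper; not part of Spark's or Avro's API):

```python
from datetime import time

def time_to_millis(t: time) -> int:
    # Encode a time of day as an int of milliseconds after midnight
    # (00:00:00.000), i.e. the physical representation of Avro time-millis.
    return ((t.hour * 60 + t.minute) * 60 + t.second) * 1000 + t.microsecond // 1000
```

The valid range is 0 (midnight) through 86,399,999 (23:59:59.999), which fits comfortably in a 32-bit int.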
[jira] [Commented] (SPARK-33958) spark sql DoubleType(0 * (-1)) return "-0.0"
[ https://issues.apache.org/jira/browse/SPARK-33958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257347#comment-17257347 ] Yuming Wang commented on SPARK-33958: - PostgreSQL also returns -0: {noformat} postgres=# create table test_zjg(a float8); CREATE TABLE postgres=# insert into test_zjg values(-1.0); INSERT 0 1 postgres=# select a*0 from test_zjg; ?column? -- -0 (1 row) {noformat} > spark sql DoubleType(0 * (-1)) return "-0.0" > - > > Key: SPARK-33958 > URL: https://issues.apache.org/jira/browse/SPARK-33958 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.2, 2.4.5, 3.0.0 >Reporter: Zhang Jianguo >Priority: Minor > > spark version: 2.3.2 > {code:java} > create table test_zjg(a double); > insert into test_zjg values(-1.0); > select a*0 from test_zjg > {code} > After the select operation, *{color:#de350b}we get -0.0 where 0.0 was expected:{color}* > {noformat} > +------------------------+ > |(a * CAST(0 AS DOUBLE))| > +------------------------+ > |-0.0 | > +------------------------+ > {noformat} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
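As the PostgreSQL comparison suggests, this is IEEE 754 signed-zero behavior rather than a Spark-specific quirk: multiplying a negative value by zero yields -0.0, and -0.0 compares equal to 0.0 even though it renders with a minus sign. A quick demonstration in Python (any IEEE 754 language behaves the same way):

```python
import math

product = -1.0 * 0.0

# -0.0 compares equal to 0.0 under IEEE 754 ...
assert product == 0.0
# ... but its sign bit is set, which is why it renders as -0.0.
assert math.copysign(1.0, product) == -1.0
assert str(product) == "-0.0"
```

So the computed value is numerically "correct"; the surprise is only in how the sign bit is displayed.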
[jira] [Commented] (SPARK-33959) Improve the statistics estimation of the Tail
[ https://issues.apache.org/jira/browse/SPARK-33959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257329#comment-17257329 ] Apache Spark commented on SPARK-33959: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/30991 > Improve the statistics estimation of the Tail > - > > Key: SPARK-33959 > URL: https://issues.apache.org/jira/browse/SPARK-33959 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > {code:scala} > spark.sql("set spark.sql.cbo.enabled=true") > spark.range(100).selectExpr("id as a", "id as b", "id as c", "id as > e").write.saveAsTable("t1") > println(Tail(Literal(5), spark.sql("SELECT * FROM > t1").queryExecution.logical).queryExecution.explainString(org.apache.spark.sql.execution.CostMode)) > {code} > Current: > {noformat} > == Optimized Logical Plan == > Tail 5, Statistics(sizeInBytes=3.8 KiB) > +- Relation[a#24L,b#25L,c#26L,e#27L] parquet, Statistics(sizeInBytes=3.8 KiB) > {noformat} > Expected: > {noformat} > == Optimized Logical Plan == > Tail 5, Statistics(sizeInBytes=200.0 B, rowCount=5) > +- Relation[a#24L,b#25L,c#26L,e#27L] parquet, Statistics(sizeInBytes=3.8 KiB) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33959) Improve the statistics estimation of the Tail
[ https://issues.apache.org/jira/browse/SPARK-33959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33959: Assignee: (was: Apache Spark) > Improve the statistics estimation of the Tail > - > > Key: SPARK-33959 > URL: https://issues.apache.org/jira/browse/SPARK-33959 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > {code:scala} > spark.sql("set spark.sql.cbo.enabled=true") > spark.range(100).selectExpr("id as a", "id as b", "id as c", "id as > e").write.saveAsTable("t1") > println(Tail(Literal(5), spark.sql("SELECT * FROM > t1").queryExecution.logical).queryExecution.explainString(org.apache.spark.sql.execution.CostMode)) > {code} > Current: > {noformat} > == Optimized Logical Plan == > Tail 5, Statistics(sizeInBytes=3.8 KiB) > +- Relation[a#24L,b#25L,c#26L,e#27L] parquet, Statistics(sizeInBytes=3.8 KiB) > {noformat} > Expected: > {noformat} > == Optimized Logical Plan == > Tail 5, Statistics(sizeInBytes=200.0 B, rowCount=5) > +- Relation[a#24L,b#25L,c#26L,e#27L] parquet, Statistics(sizeInBytes=3.8 KiB) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33959) Improve the statistics estimation of the Tail
[ https://issues.apache.org/jira/browse/SPARK-33959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257330#comment-17257330 ] Apache Spark commented on SPARK-33959: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/30991 > Improve the statistics estimation of the Tail > - > > Key: SPARK-33959 > URL: https://issues.apache.org/jira/browse/SPARK-33959 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > {code:scala} > spark.sql("set spark.sql.cbo.enabled=true") > spark.range(100).selectExpr("id as a", "id as b", "id as c", "id as > e").write.saveAsTable("t1") > println(Tail(Literal(5), spark.sql("SELECT * FROM > t1").queryExecution.logical).queryExecution.explainString(org.apache.spark.sql.execution.CostMode)) > {code} > Current: > {noformat} > == Optimized Logical Plan == > Tail 5, Statistics(sizeInBytes=3.8 KiB) > +- Relation[a#24L,b#25L,c#26L,e#27L] parquet, Statistics(sizeInBytes=3.8 KiB) > {noformat} > Expected: > {noformat} > == Optimized Logical Plan == > Tail 5, Statistics(sizeInBytes=200.0 B, rowCount=5) > +- Relation[a#24L,b#25L,c#26L,e#27L] parquet, Statistics(sizeInBytes=3.8 KiB) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33959) Improve the statistics estimation of the Tail
[ https://issues.apache.org/jira/browse/SPARK-33959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33959:
------------------------------------

    Assignee: Apache Spark

> Improve the statistics estimation of the Tail
> ---------------------------------------------
>
>                 Key: SPARK-33959
>                 URL: https://issues.apache.org/jira/browse/SPARK-33959
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Yuming Wang
>            Assignee: Apache Spark
>            Priority: Major
>
> {code:scala}
> spark.sql("set spark.sql.cbo.enabled=true")
> spark.range(100).selectExpr("id as a", "id as b", "id as c", "id as e").write.saveAsTable("t1")
> println(Tail(Literal(5), spark.sql("SELECT * FROM t1").queryExecution.logical).queryExecution.explainString(org.apache.spark.sql.execution.CostMode))
> {code}
> Current:
> {noformat}
> == Optimized Logical Plan ==
> Tail 5, Statistics(sizeInBytes=3.8 KiB)
> +- Relation[a#24L,b#25L,c#26L,e#27L] parquet, Statistics(sizeInBytes=3.8 KiB)
> {noformat}
> Expected:
> {noformat}
> == Optimized Logical Plan ==
> Tail 5, Statistics(sizeInBytes=200.0 B, rowCount=5)
> +- Relation[a#24L,b#25L,c#26L,e#27L] parquet, Statistics(sizeInBytes=3.8 KiB)
> {noformat}
[jira] [Updated] (SPARK-33959) Improve the statistics estimation of the Tail
[ https://issues.apache.org/jira/browse/SPARK-33959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-33959:
--------------------------------

    Description: 
{code:scala}
spark.sql("set spark.sql.cbo.enabled=true")
spark.range(100).selectExpr("id as a", "id as b", "id as c", "id as e").write.saveAsTable("t1")
println(Tail(Literal(5), spark.sql("SELECT * FROM t1").queryExecution.logical).queryExecution.explainString(org.apache.spark.sql.execution.CostMode))
{code}
Current:
{noformat}
== Optimized Logical Plan ==
Tail 5, Statistics(sizeInBytes=3.8 KiB)
+- Relation[a#24L,b#25L,c#26L,e#27L] parquet, Statistics(sizeInBytes=3.8 KiB)
{noformat}
Expected:
{noformat}
== Optimized Logical Plan ==
Tail 5, Statistics(sizeInBytes=200.0 B, rowCount=5)
+- Relation[a#24L,b#25L,c#26L,e#27L] parquet, Statistics(sizeInBytes=3.8 KiB)
{noformat}

> Improve the statistics estimation of the Tail
> ---------------------------------------------
>
>                 Key: SPARK-33959
>                 URL: https://issues.apache.org/jira/browse/SPARK-33959
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> {code:scala}
> spark.sql("set spark.sql.cbo.enabled=true")
> spark.range(100).selectExpr("id as a", "id as b", "id as c", "id as e").write.saveAsTable("t1")
> println(Tail(Literal(5), spark.sql("SELECT * FROM t1").queryExecution.logical).queryExecution.explainString(org.apache.spark.sql.execution.CostMode))
> {code}
> Current:
> {noformat}
> == Optimized Logical Plan ==
> Tail 5, Statistics(sizeInBytes=3.8 KiB)
> +- Relation[a#24L,b#25L,c#26L,e#27L] parquet, Statistics(sizeInBytes=3.8 KiB)
> {noformat}
> Expected:
> {noformat}
> == Optimized Logical Plan ==
> Tail 5, Statistics(sizeInBytes=200.0 B, rowCount=5)
> +- Relation[a#24L,b#25L,c#26L,e#27L] parquet, Statistics(sizeInBytes=3.8 KiB)
> {noformat}
[jira] [Created] (SPARK-33959) Improve the statistics estimation of the Tail
Yuming Wang created SPARK-33959:
--------------------------------

             Summary: Improve the statistics estimation of the Tail
                 Key: SPARK-33959
                 URL: https://issues.apache.org/jira/browse/SPARK-33959
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Yuming Wang
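The expected numbers in this report follow directly from Spark's size-estimation heuristic: each output row of four LongType columns is costed at 8 bytes of row overhead plus 8 bytes per column, and Tail keeps at most `limit` rows, so the estimate becomes min(limit, childRowCount) * rowSize = 5 * 40 = 200 B. A minimal Java sketch of that arithmetic (the 8-byte overhead and per-column size are assumptions mirroring Spark's row-size heuristic, not a quote of the actual patch):

```java
// Sketch of the sizeInBytes/rowCount a Tail(limit) node could report under CBO.
// Assumption: per-row size = 8 bytes of overhead + 8 bytes per LongType column,
// in the spirit of Spark's stats-estimation row-size heuristic.
public class TailStatsSketch {
    static long sizePerRow(int numLongColumns) {
        return 8L + 8L * numLongColumns; // row header + fixed-width columns
    }

    static long tailRowCount(long limit, long childRowCount) {
        return Math.min(limit, childRowCount); // Tail keeps at most `limit` rows
    }

    static long tailSizeInBytes(long limit, long childRowCount, int numLongColumns) {
        return tailRowCount(limit, childRowCount) * sizePerRow(numLongColumns);
    }

    public static void main(String[] args) {
        // Tail 5 over a 100-row relation with columns a, b, c, e (all LongType):
        long size = tailSizeInBytes(5, 100, 4);
        long rows = tailRowCount(5, 100);
        System.out.println("Statistics(sizeInBytes=" + size + " B, rowCount=" + rows + ")");
    }
}
```

With these constants the sketch reproduces the "Expected" plan line above, Statistics(sizeInBytes=200.0 B, rowCount=5), instead of blindly inheriting the child's 3.8 KiB.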
[jira] [Updated] (SPARK-33958) spark sql DoubleType(0 * (-1)) return "-0.0"
[ https://issues.apache.org/jira/browse/SPARK-33958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhang Jianguo updated SPARK-33958:
----------------------------------

    Description: 
spark version: 2.3.2
{code:java}
create table test_zjg(a double);
insert into test_zjg values(-1.0);
select a*0 from test_zjg
{code}
After the select, *{color:#de350b}we get -0.0 where 0.0 is expected:{color}*
{noformat}
+-------------------------+
| (a * CAST(0 AS DOUBLE)) |
+-------------------------+
| -0.0                    |
+-------------------------+
{noformat}

  was:
spark version: 2.3.2
{code:java}
create table test_zjg(a double);
insert into test_zjg values(-1.0);
select a*0 from test_zjg
{code}
After the select, *{color:#de350b}we get -0.0 where 0.0 is expected:{color}*
{noformat}
+-------------------------+
| (a * CAST(0 AS DOUBLE)) |
+-------------------------+
| -0.0                    |
+-------------------------+
{noformat}

> spark sql DoubleType(0 * (-1)) return "-0.0"
> --------------------------------------------
>
>                 Key: SPARK-33958
>                 URL: https://issues.apache.org/jira/browse/SPARK-33958
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.2, 2.4.5, 3.0.0
>            Reporter: Zhang Jianguo
>            Priority: Minor
>
> spark version: 2.3.2
> {code:java}
> create table test_zjg(a double);
> insert into test_zjg values(-1.0);
> select a*0 from test_zjg
> {code}
> After the select, *{color:#de350b}we get -0.0 where 0.0 is expected:{color}*
> {noformat}
> +-------------------------+
> | (a * CAST(0 AS DOUBLE)) |
> +-------------------------+
> | -0.0                    |
> +-------------------------+
> {noformat}
[jira] [Updated] (SPARK-33958) spark sql DoubleType(0 * (-1)) return "-0.0"
[ https://issues.apache.org/jira/browse/SPARK-33958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhang Jianguo updated SPARK-33958:
----------------------------------

    Description: 
spark version: 2.3.2
{code:java}
create table test_zjg(a double);
insert into test_zjg values(-1.0);
select a*0 from test_zjg
{code}
After the select, *{color:#de350b}we get -0.0 where 0.0 is expected:{color}*
{noformat}
+-------------------------+
| (a * CAST(0 AS DOUBLE)) |
+-------------------------+
| -0.0                    |
+-------------------------+
{noformat}

  was:
spark version: 2.3.2
{code:java}
create table test_zjg(a double);
insert into test_zjg values(-1.0);
select a*0 from test_zjg
{code}
After the select, we get -0.0 where 0.0 is expected:
{noformat}
+-------------------------+
| (a * CAST(0 AS DOUBLE)) |
+-------------------------+
| -0.0                    |
+-------------------------+
{noformat}

> spark sql DoubleType(0 * (-1)) return "-0.0"
> --------------------------------------------
>
>                 Key: SPARK-33958
>                 URL: https://issues.apache.org/jira/browse/SPARK-33958
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.2, 2.4.5, 3.0.0
>            Reporter: Zhang Jianguo
>            Priority: Minor
>
> spark version: 2.3.2
> {code:java}
> create table test_zjg(a double);
> insert into test_zjg values(-1.0);
> select a*0 from test_zjg
> {code}
> After the select, *{color:#de350b}we get -0.0 where 0.0 is expected:{color}*
> {noformat}
> +-------------------------+
> | (a * CAST(0 AS DOUBLE)) |
> +-------------------------+
> | -0.0                    |
> +-------------------------+
> {noformat}
[jira] [Updated] (SPARK-33958) spark sql DoubleType(0 * (-1)) return "-0.0"
[ https://issues.apache.org/jira/browse/SPARK-33958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhang Jianguo updated SPARK-33958:
----------------------------------

    Description: 
spark version: 2.3.2
{code:java}
create table test_zjg(a double);
insert into test_zjg values(-1.0);
select a*0 from test_zjg
{code}
After the select, we get -0.0 where 0.0 is expected:
{noformat}
+-------------------------+
| (a * CAST(0 AS DOUBLE)) |
+-------------------------+
| -0.0                    |
+-------------------------+
{noformat}

  was:
spark version: 2.3.2
```sql
create table test_zjg(a double);
insert into test_zjg values(-1.0);
select a*0 from test_zjg
```
After the select, we get -0.0 where 0.0 is expected:
{noformat}
+-------------------------+
| (a * CAST(0 AS DOUBLE)) |
+-------------------------+
| -0.0                    |
+-------------------------+
{noformat}

> spark sql DoubleType(0 * (-1)) return "-0.0"
> --------------------------------------------
>
>                 Key: SPARK-33958
>                 URL: https://issues.apache.org/jira/browse/SPARK-33958
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.2, 2.4.5, 3.0.0
>            Reporter: Zhang Jianguo
>            Priority: Minor
>
> spark version: 2.3.2
> {code:java}
> create table test_zjg(a double);
> insert into test_zjg values(-1.0);
> select a*0 from test_zjg
> {code}
> After the select, we get -0.0 where 0.0 is expected:
> {noformat}
> +-------------------------+
> | (a * CAST(0 AS DOUBLE)) |
> +-------------------------+
> | -0.0                    |
> +-------------------------+
> {noformat}
[jira] [Updated] (SPARK-33958) spark sql DoubleType(0 * (-1)) return "-0.0"
[ https://issues.apache.org/jira/browse/SPARK-33958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhang Jianguo updated SPARK-33958:
----------------------------------

    Description: 
spark version: 2.3.2
```sql
create table test_zjg(a double);
insert into test_zjg values(-1.0);
select a*0 from test_zjg
```
After the select, we get -0.0 where 0.0 is expected:
{noformat}
+-------------------------+
| (a * CAST(0 AS DOUBLE)) |
+-------------------------+
| -0.0                    |
+-------------------------+
{noformat}

  was:
spark version: 2.3.2
```sql
create table test_zjg(a double);
insert into test_zjg values(-1.0);
select a*0 from test_zjg
```
After the select, we get -0.0 where 0.0 is expected:
{noformat}
+-------------------------+
| (a * CAST(0 AS DOUBLE)) |
+-------------------------+
| -0.0                    |
+-------------------------+
{noformat}

> spark sql DoubleType(0 * (-1)) return "-0.0"
> --------------------------------------------
>
>                 Key: SPARK-33958
>                 URL: https://issues.apache.org/jira/browse/SPARK-33958
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.2, 2.4.5, 3.0.0
>            Reporter: Zhang Jianguo
>            Priority: Minor
>
> spark version: 2.3.2
> ```sql
> create table test_zjg(a double);
> insert into test_zjg values(-1.0);
> select a*0 from test_zjg
> ```
> After the select, we get -0.0 where 0.0 is expected:
> {noformat}
> +-------------------------+
> | (a * CAST(0 AS DOUBLE)) |
> +-------------------------+
> | -0.0                    |
> +-------------------------+
> {noformat}
[jira] [Updated] (SPARK-33958) spark sql DoubleType(0 * (-1)) return "-0.0"
[ https://issues.apache.org/jira/browse/SPARK-33958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhang Jianguo updated SPARK-33958:
----------------------------------

    Description: 
spark version: 2.3.2
```sql
create table test_zjg(a double);
insert into test_zjg values(-1.0);
select a*0 from test_zjg
```
After the select, we get -0.0 where 0.0 is expected:
{noformat}
+-------------------------+
| (a * CAST(0 AS DOUBLE)) |
+-------------------------+
| -0.0                    |
+-------------------------+
{noformat}

  was:
spark version: 2.3.2
```sql
create table test_zjg(a double);
insert into test_zjg values(-1.0);
select a*0 from test_zjg
```
After the select, we get -0.0 where 0.0 is expected:
{noformat}
+-------------------------+
| (a * CAST(0 AS DOUBLE)) |
+-------------------------+
| -0.0                    |
+-------------------------+
{noformat}

> spark sql DoubleType(0 * (-1)) return "-0.0"
> --------------------------------------------
>
>                 Key: SPARK-33958
>                 URL: https://issues.apache.org/jira/browse/SPARK-33958
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.2, 2.4.5, 3.0.0
>            Reporter: Zhang Jianguo
>            Priority: Minor
>
> spark version: 2.3.2
> ```sql
> create table test_zjg(a double);
> insert into test_zjg values(-1.0);
> select a*0 from test_zjg
> ```
> After the select, we get -0.0 where 0.0 is expected:
> {noformat}
> +-------------------------+
> | (a * CAST(0 AS DOUBLE)) |
> +-------------------------+
> | -0.0                    |
> +-------------------------+
> {noformat}
[jira] [Created] (SPARK-33958) spark sql DoubleType(0 * (-1)) return "-0.0"
Zhang Jianguo created SPARK-33958:
----------------------------------

             Summary: spark sql DoubleType(0 * (-1)) return "-0.0"
                 Key: SPARK-33958
                 URL: https://issues.apache.org/jira/browse/SPARK-33958
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.0.0, 2.4.5, 2.3.2
            Reporter: Zhang Jianguo

spark version: 2.3.2
```sql
create table test_zjg(a double);
insert into test_zjg values(-1.0);
select a*0 from test_zjg
```
After the select, we get -0.0 where 0.0 is expected:
{noformat}
+-------------------------+
| (a * CAST(0 AS DOUBLE)) |
+-------------------------+
| -0.0                    |
+-------------------------+
{noformat}
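The behavior reported above is standard IEEE 754 arithmetic on the JVM: multiplying a negative value by positive zero yields negative zero, which prints as -0.0 even though it compares equal to 0.0 under ==. A small self-contained Java illustration (the normalization at the end is just one possible way to present 0.0 to users, not a claim about how Spark resolves this):

```java
public class NegativeZeroDemo {
    public static void main(String[] args) {
        double a = -1.0;
        double product = a * 0.0;             // IEEE 754: (-1.0) * (+0.0) == -0.0

        System.out.println(product);           // prints -0.0
        System.out.println(product == 0.0);    // true: -0.0 and 0.0 compare equal with ==
        // ...but they are distinct bit patterns, and some operations expose that:
        System.out.println(1.0 / product);     // -Infinity (it would be +Infinity for +0.0)
        System.out.println(Double.compare(product, 0.0)); // -1: compare() orders -0.0 below 0.0

        // One possible display-side fix: normalize negative zero before formatting.
        double normalized = (product == 0.0) ? 0.0 : product;
        System.out.println(normalized);        // 0.0
    }
}
```

The Double.compare distinction also matters beyond display: hash-based grouping or sorting on a double column can treat -0.0 and 0.0 as different keys even though == says they are equal, which is why engines often normalize the value rather than only fixing the printed string.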