[jira] [Updated] (SPARK-33351) WithColumn should add a column with specific position
[ https://issues.apache.org/jira/browse/SPARK-33351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

karl wang updated SPARK-33351:
------------------------------
    Description: In a `Dataset`, withColumn usually adds the new column at the end of the DataFrame, but sometimes users want to add the new column at a specific position.

> WithColumn should add a column with specific position
> -----------------------------------------------------
>
>                 Key: SPARK-33351
>                 URL: https://issues.apache.org/jira/browse/SPARK-33351
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: karl wang
>            Priority: Major
>
> In a `Dataset`, withColumn usually adds the new column at the end of the DataFrame,
> but sometimes users want to add the new column at a specific position.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
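Until such a feature lands, the requested behavior can be emulated by appending the column with withColumn and then re-selecting the columns in the desired order. A minimal pure-Python sketch of the ordering logic (the helper name is hypothetical, not part of the Spark API):

```python
def columns_with_insert(existing, new_col, position):
    """Return the column order that places new_col at the given index.

    With the current API one could call df.withColumn(new_col, expr),
    which appends, and then df.select(*columns_with_insert(...)).
    """
    if not 0 <= position <= len(existing):
        raise IndexError(f"position {position} out of range")
    order = list(existing)
    order.insert(position, new_col)
    return order

# e.g. a DataFrame with columns a, b, c; put the new column second:
print(columns_with_insert(["a", "b", "c"], "x", 1))  # ['a', 'x', 'b', 'c']
```

The proposed feature would fold this reordering into withColumn itself instead of requiring an extra select.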
[jira] [Created] (SPARK-33351) WithColumn should add a column with specific position
karl wang created SPARK-33351:
------------------------------
             Summary: WithColumn should add a column with specific position
                 Key: SPARK-33351
                 URL: https://issues.apache.org/jira/browse/SPARK-33351
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 3.1.0
            Reporter: karl wang
[jira] [Resolved] (SPARK-32542) Add an optimizer rule to split an Expand into multiple Expands for aggregates
[ https://issues.apache.org/jira/browse/SPARK-32542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

karl wang resolved SPARK-32542.
-------------------------------
    Resolution: Fixed

> Add an optimizer rule to split an Expand into multiple Expands for aggregates
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-32542
>                 URL: https://issues.apache.org/jira/browse/SPARK-32542
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: karl wang
>            Priority: Major
>
> Split an Expand into several smaller Expands, each containing the specified
> number of projections.
> For instance, consider this SQL: select a, b, c, d, count(1) from table1
> group by a, b, c, d with cube. It can expand the data to 2^4 times its
> original size.
> If we set spark.sql.optimizer.projections.size=4, the Expand will be split
> into 2^4/4 = 4 smaller Expands. This reduces shuffle pressure and improves
> performance in multidimensional analysis when data is huge.
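The arithmetic behind the 2^4 and 2^4/4 figures can be sketched in a few lines of pure Python (the helper names are hypothetical; the real optimizer rule operates on the Expand node's projection list):

```python
from itertools import combinations

def cube_grouping_sets(cols):
    # CUBE over n columns expands each input row into 2^n projections,
    # one per subset of the grouping columns
    return [tuple(c) for r in range(len(cols) + 1)
            for c in combinations(cols, r)]

def split_expand(projections, size):
    # split one big Expand into ceil(len/size) smaller Expands, each
    # carrying at most `size` projections; their outputs are unioned
    return [projections[i:i + size]
            for i in range(0, len(projections), size)]

sets = cube_grouping_sets(["a", "b", "c", "d"])
small = split_expand(sets, 4)
print(len(sets), len(small))  # 16 4
```

Each smaller Expand multiplies the input by at most `size` rows instead of 2^n, which is what limits the shuffle volume per stage.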
[jira] [Reopened] (SPARK-32542) Add an optimizer rule to split an Expand into multiple Expands for aggregates
[ https://issues.apache.org/jira/browse/SPARK-32542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

karl wang reopened SPARK-32542:
-------------------------------
[jira] [Resolved] (SPARK-32542) Add an optimizer rule to split an Expand into multiple Expands for aggregates
[ https://issues.apache.org/jira/browse/SPARK-32542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

karl wang resolved SPARK-32542.
-------------------------------
    Resolution: Fixed
[jira] [Commented] (SPARK-32542) Add an optimizer rule to split an Expand into multiple Expands for aggregates
[ https://issues.apache.org/jira/browse/SPARK-32542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17179578#comment-17179578 ]

karl wang commented on SPARK-32542:
-----------------------------------

[~maropu] Hi, could you please take a look at this PR and leave any comments? Thank you.
[jira] [Commented] (SPARK-32542) Add an optimizer rule to split an Expand into multiple Expands for aggregates
[ https://issues.apache.org/jira/browse/SPARK-32542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17178477#comment-17178477 ]

karl wang commented on SPARK-32542:
-----------------------------------

ok
[jira] [Updated] (SPARK-32542) Add an optimizer rule to split an Expand into multiple Expands for aggregates
[ https://issues.apache.org/jira/browse/SPARK-32542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

karl wang updated SPARK-32542:
------------------------------
    Target Version/s: 3.0.0
[jira] [Updated] (SPARK-32542) Add an optimizer rule to split an Expand into multiple Expands for aggregates
[ https://issues.apache.org/jira/browse/SPARK-32542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

karl wang updated SPARK-32542:
------------------------------
    Shepherd: karl wang
[jira] [Updated] (SPARK-32542) Add an optimizer rule to split an Expand into multiple Expands for aggregates
[ https://issues.apache.org/jira/browse/SPARK-32542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

karl wang updated SPARK-32542:
------------------------------
    Shepherd:  (was: karl wang)
[jira] [Updated] (SPARK-32542) add a batch for optimizing logicalPlan
[ https://issues.apache.org/jira/browse/SPARK-32542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

karl wang updated SPARK-32542:
------------------------------
    Description:
Split an Expand into several smaller Expands, each containing the specified number of projections.
For instance, consider this SQL: select a, b, c, d, count(1) from table1 group by a, b, c, d with cube. It can expand the data to 2^4 times its original size.
If we set spark.sql.optimizer.projections.size=4, the Expand will be split into 2^4/4 = 4 smaller Expands. This reduces shuffle pressure and improves performance in multidimensional analysis when data is huge.

  was: the same description, followed by the benchmark code and results quoted in full in the earlier "add a batch for optimizing logicalPlan" update notification.
[jira] [Updated] (SPARK-32542) Add an optimizer rule to split an Expand into multiple Expands for aggregates
[ https://issues.apache.org/jira/browse/SPARK-32542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

karl wang updated SPARK-32542:
------------------------------
    Summary: Add an optimizer rule to split an Expand into multiple Expands for aggregates  (was: add a batch for optimizing logicalPlan)
[jira] [Updated] (SPARK-32542) add a batch for optimizing logicalPlan
[ https://issues.apache.org/jira/browse/SPARK-32542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

karl wang updated SPARK-32542:
------------------------------
    Description:
Split an Expand into several smaller Expands, each containing the specified number of projections.
For instance, consider this SQL: select a, b, c, d, count(1) from table1 group by a, b, c, d with cube. It can expand the data to 2^4 times its original size.
If we set spark.sql.optimizer.projections.size=4, the Expand will be split into 2^4/4 = 4 smaller Expands. This reduces shuffle pressure and improves performance in multidimensional analysis when data is huge.

Benchmark:

    runBenchmark("cube multianalysis agg") {
      val N = 20 << 20
      val benchmark = new Benchmark("cube multianalysis agg", N, output = output)

      def f(): Unit = {
        val df = spark.range(N).cache()
        df.selectExpr(
            "id",
            "(id & 1023) as k1",
            "cast(id & 1023 as string) as k2",
            "cast(id & 1023 as int) as k3",
            "cast(id & 1023 as double) as k4",
            "cast(id & 1023 as float) as k5")
          .cube("k1", "k2", "k3", "k4", "k5")
          .sum()
          .noop()
        df.unpersist()
      }

      benchmark.addCase("grouping = F") { _ =>
        withSQLConf(
          SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true",
          SQLConf.GROUPING_WITH_UNION.key -> "false") {
          f()
        }
      }

      benchmark.addCase("grouping = T projectionSize= 16") { _ =>
        withSQLConf(
          SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true",
          SQLConf.GROUPING_WITH_UNION.key -> "true",
          SQLConf.GROUPING_EXPAND_PROJECTIONS.key -> "16") {
          f()
        }
      }

      benchmark.addCase("grouping = T projectionSize= 8") { _ =>
        withSQLConf(
          SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true",
          SQLConf.GROUPING_WITH_UNION.key -> "true",
          SQLConf.GROUPING_EXPAND_PROJECTIONS.key -> "8") {
          f()
        }
      }

      benchmark.run()
    }

Running benchmark: cube multianalysis agg : cube 5 fields k1, k2, k3, k4, k5
Running case: GROUPING_WITH_UNION off
Running case: GROUPING_WITH_UNION on and GROUPING_EXPAND_PROJECTIONS = 16
Running case: GROUPING_WITH_UNION on and GROUPING_EXPAND_PROJECTIONS = 8
Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.15
Intel(R) Core(TM) i5-7267U CPU @ 3.10GHz

    cube multianalysis agg:           Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
    grouping = F                              54329         54931        852        0.4       2590.6      1.0X
    grouping = T projectionSize= 16           44584         44781        278        0.5       2125.9      1.2X
    grouping = T projectionSize= 8            42764         43272        718        0.5       2039.1      1.3X

Running benchmark: cube multianalysis agg : cube 6 fields k1, k2, k3, k4, k5, k6
Running case: GROUPING_WITH_UNION off
Running case: GROUPING_WITH_UNION on and GROUPING_EXPAND_PROJECTIONS = 32
Running case: GROUPING_WITH_UNION on and GROUPING_EXPAND_PROJECTIONS = 16
Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.15
Intel(R) Core(TM) i5-7267U CPU @ 3.10GHz

    cube multianalysis agg:           Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
    grouping = F                             141607        143424       2569        0.1       6752.4      1.0X
    grouping = T projectionSize= 32          109465        109603        196        0.2       5219.7      1.3X
    grouping = T projectionSize= 16           99752        100411        933        0.2       4756.5      1.4X

Running benchmark: cube multianalysis agg : cube 7 fields k1, k2, k3, k4, k5, k6, k7
Running case: GROUPING_WITH_UNION off
Running case: GROUPING_WITH_UNION on and GROUPING_EXPAND_PROJECTIONS = 64
Running case: GROUPING_WITH_UNION on and GROUPING_EXPAND_PROJECTIONS = 32
Running case: GROUPING_WITH_UNION on and GROUPING_EXPAND_PROJECTIONS = 16
Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.15
Intel(R) Core(TM) i5-7267U CPU @ 3.10GHz

    cube multianalysis agg:           Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
    grouping = F                             516941        519658        NaN        0.0      24649.7      1.0X
    grouping = T projectionSize= 64          267170        267547        533        0.1      12739.6      1.9X
    grouping = T project
[jira] [Updated] (SPARK-32542) add a batch for optimizing logicalPlan
[ https://issues.apache.org/jira/browse/SPARK-32542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

karl wang updated SPARK-32542:
------------------------------
    Priority: Major  (was: Minor)
[jira] [Updated] (SPARK-32542) add a batch for optimizing logicalPlan
[ https://issues.apache.org/jira/browse/SPARK-32542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

karl wang updated SPARK-32542:
------------------------------
    Description: Split an Expand into several smaller Expands, each containing the specified number of projections. For instance, consider this SQL: select a, b, c, d, count(1) from table1 group by a, b, c, d with cube. It can expand the data to 2^4 times its original size. If we set spark.sql.optimizer.projections.size=4, the Expand will be split into 2^4/4 = 4 smaller Expands. This reduces shuffle pressure and improves performance in multidimensional analysis when data is huge.
[jira] [Created] (SPARK-32542) add a batch for optimizing logicalPlan
karl wang created SPARK-32542:
------------------------------
             Summary: add a batch for optimizing logicalPlan
                 Key: SPARK-32542
                 URL: https://issues.apache.org/jira/browse/SPARK-32542
             Project: Spark
          Issue Type: Improvement
          Components: Optimizer
    Affects Versions: 3.0.0
            Reporter: karl wang
             Fix For: 3.0.0