[jira] [Updated] (SPARK-33351) WithColumn should add a column with specific position

2020-11-04 Thread karl wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

karl wang updated SPARK-33351:
--
Description: 
In `Dataset`, `withColumn` always adds the new column at the end of the DataFrame.

But sometimes users want to add a new column at a specific position.
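
Until such an API exists, the requested behavior can be approximated with select; a minimal Spark/Scala sketch (the helper name and position handling are illustrative, not an existing API):

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.col

// Hypothetical helper: add `name` as a new column and place it at index `pos`
// instead of at the end, by reordering the columns with select.
def withColumnAt(df: DataFrame, name: String, value: Column, pos: Int): DataFrame = {
  val cols = df.columns.toSeq
  val reordered = (cols.take(pos) :+ name) ++ cols.drop(pos)
  df.withColumn(name, value).select(reordered.map(col): _*)
}

// e.g. withColumnAt(df, "flag", lit(0), 1) would place "flag" as the second column
```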

> WithColumn should add a column with specific position
> -
>
> Key: SPARK-33351
> URL: https://issues.apache.org/jira/browse/SPARK-33351
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: karl wang
>Priority: Major
>
> In `Dataset`, `withColumn` always adds the new column at the end of the DataFrame.
> But sometimes users want to add a new column at a specific position.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33351) WithColumn should add a column with specific position

2020-11-04 Thread karl wang (Jira)
karl wang created SPARK-33351:
-

 Summary: WithColumn should add a column with specific position
 Key: SPARK-33351
 URL: https://issues.apache.org/jira/browse/SPARK-33351
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.1.0
Reporter: karl wang









[jira] [Resolved] (SPARK-32542) Add an optimizer rule to split an Expand into multiple Expands for aggregates

2020-09-12 Thread karl wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

karl wang resolved SPARK-32542.
---
Resolution: Fixed

> Add an optimizer rule to split an Expand into multiple Expands for aggregates
> -
>
> Key: SPARK-32542
> URL: https://issues.apache.org/jira/browse/SPARK-32542
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: karl wang
>Priority: Major
>
> Split an Expand into several smaller Expands, each containing the specified
> number of projections.
> For instance, consider this SQL: select a, b, c, d, count(1) from table1
> group by a, b, c, d with cube. It can expand the original data size 2^4 times.
> If we set spark.sql.optimizer.projections.size=4, the Expand will be split
> into 2^4/4 = 4 smaller Expands. This reduces shuffle pressure and improves
> performance in multidimensional analysis when data is huge.
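
For intuition, the arithmetic behind the proposal can be sketched in plain Scala (names are illustrative; this is not the optimizer rule itself):

```scala
// A CUBE over n grouping columns produces 2^n grouping sets, i.e. 2^n
// projections inside a single Expand node.
val groupCols = Set("a", "b", "c", "d")
val projections = groupCols.subsets().toSeq        // 2^4 = 16 grouping sets
// Chunking the projections into groups of `size` yields 2^4 / 4 = 4 smaller
// Expands, whose outputs the rule would union together.
val size = 4                                       // spark.sql.optimizer.projections.size
val smallExpands = projections.grouped(size).toSeq // 4 chunks of 4 projections
```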






[jira] [Reopened] (SPARK-32542) Add an optimizer rule to split an Expand into multiple Expands for aggregates

2020-08-31 Thread karl wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

karl wang reopened SPARK-32542:
---

> Add an optimizer rule to split an Expand into multiple Expands for aggregates
> -
>
> Key: SPARK-32542
> URL: https://issues.apache.org/jira/browse/SPARK-32542
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: karl wang
>Priority: Major
>
> Split an Expand into several smaller Expands, each containing the specified
> number of projections.
> For instance, consider this SQL: select a, b, c, d, count(1) from table1
> group by a, b, c, d with cube. It can expand the original data size 2^4 times.
> If we set spark.sql.optimizer.projections.size=4, the Expand will be split
> into 2^4/4 = 4 smaller Expands. This reduces shuffle pressure and improves
> performance in multidimensional analysis when data is huge.






[jira] [Resolved] (SPARK-32542) Add an optimizer rule to split an Expand into multiple Expands for aggregates

2020-08-31 Thread karl wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

karl wang resolved SPARK-32542.
---
Resolution: Fixed

> Add an optimizer rule to split an Expand into multiple Expands for aggregates
> -
>
> Key: SPARK-32542
> URL: https://issues.apache.org/jira/browse/SPARK-32542
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: karl wang
>Priority: Major
>
> Split an Expand into several smaller Expands, each containing the specified
> number of projections.
> For instance, consider this SQL: select a, b, c, d, count(1) from table1
> group by a, b, c, d with cube. It can expand the original data size 2^4 times.
> If we set spark.sql.optimizer.projections.size=4, the Expand will be split
> into 2^4/4 = 4 smaller Expands. This reduces shuffle pressure and improves
> performance in multidimensional analysis when data is huge.






[jira] [Commented] (SPARK-32542) Add an optimizer rule to split an Expand into multiple Expands for aggregates

2020-08-18 Thread karl wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17179578#comment-17179578
 ] 

karl wang commented on SPARK-32542:
---

[~maropu] Hi, could you please take a look at this PR and leave any comments? Thank
you.

> Add an optimizer rule to split an Expand into multiple Expands for aggregates
> -
>
> Key: SPARK-32542
> URL: https://issues.apache.org/jira/browse/SPARK-32542
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: karl wang
>Priority: Major
>
> Split an Expand into several smaller Expands, each containing the specified
> number of projections.
> For instance, consider this SQL: select a, b, c, d, count(1) from table1
> group by a, b, c, d with cube. It can expand the original data size 2^4 times.
> If we set spark.sql.optimizer.projections.size=4, the Expand will be split
> into 2^4/4 = 4 smaller Expands. This reduces shuffle pressure and improves
> performance in multidimensional analysis when data is huge.






[jira] [Commented] (SPARK-32542) Add an optimizer rule to split an Expand into multiple Expands for aggregates

2020-08-16 Thread karl wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17178477#comment-17178477
 ] 

karl wang commented on SPARK-32542:
---

ok

> Add an optimizer rule to split an Expand into multiple Expands for aggregates
> -
>
> Key: SPARK-32542
> URL: https://issues.apache.org/jira/browse/SPARK-32542
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: karl wang
>Priority: Major
>
> Split an Expand into several smaller Expands, each containing the specified
> number of projections.
> For instance, consider this SQL: select a, b, c, d, count(1) from table1
> group by a, b, c, d with cube. It can expand the original data size 2^4 times.
> If we set spark.sql.optimizer.projections.size=4, the Expand will be split
> into 2^4/4 = 4 smaller Expands. This reduces shuffle pressure and improves
> performance in multidimensional analysis when data is huge.






[jira] [Updated] (SPARK-32542) Add an optimizer rule to split an Expand into multiple Expands for aggregates

2020-08-15 Thread karl wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

karl wang updated SPARK-32542:
--
Target Version/s: 3.0.0

> Add an optimizer rule to split an Expand into multiple Expands for aggregates
> -
>
> Key: SPARK-32542
> URL: https://issues.apache.org/jira/browse/SPARK-32542
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer
>Affects Versions: 3.0.0
>Reporter: karl wang
>Priority: Major
> Fix For: 3.0.0
>
>
> Split an Expand into several smaller Expands, each containing the specified
> number of projections.
> For instance, consider this SQL: select a, b, c, d, count(1) from table1
> group by a, b, c, d with cube. It can expand the original data size 2^4 times.
> If we set spark.sql.optimizer.projections.size=4, the Expand will be split
> into 2^4/4 = 4 smaller Expands. This reduces shuffle pressure and improves
> performance in multidimensional analysis when data is huge.






[jira] [Updated] (SPARK-32542) Add an optimizer rule to split an Expand into multiple Expands for aggregates

2020-08-15 Thread karl wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

karl wang updated SPARK-32542:
--
Shepherd: karl wang

> Add an optimizer rule to split an Expand into multiple Expands for aggregates
> -
>
> Key: SPARK-32542
> URL: https://issues.apache.org/jira/browse/SPARK-32542
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer
>Affects Versions: 3.0.0
>Reporter: karl wang
>Priority: Major
> Fix For: 3.0.0
>
>
> Split an Expand into several smaller Expands, each containing the specified
> number of projections.
> For instance, consider this SQL: select a, b, c, d, count(1) from table1
> group by a, b, c, d with cube. It can expand the original data size 2^4 times.
> If we set spark.sql.optimizer.projections.size=4, the Expand will be split
> into 2^4/4 = 4 smaller Expands. This reduces shuffle pressure and improves
> performance in multidimensional analysis when data is huge.






[jira] [Updated] (SPARK-32542) Add an optimizer rule to split an Expand into multiple Expands for aggregates

2020-08-15 Thread karl wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

karl wang updated SPARK-32542:
--
Shepherd:   (was: karl wang)

> Add an optimizer rule to split an Expand into multiple Expands for aggregates
> -
>
> Key: SPARK-32542
> URL: https://issues.apache.org/jira/browse/SPARK-32542
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer
>Affects Versions: 3.0.0
>Reporter: karl wang
>Priority: Major
> Fix For: 3.0.0
>
>
> Split an Expand into several smaller Expands, each containing the specified
> number of projections.
> For instance, consider this SQL: select a, b, c, d, count(1) from table1
> group by a, b, c, d with cube. It can expand the original data size 2^4 times.
> If we set spark.sql.optimizer.projections.size=4, the Expand will be split
> into 2^4/4 = 4 smaller Expands. This reduces shuffle pressure and improves
> performance in multidimensional analysis when data is huge.






[jira] [Updated] (SPARK-32542) add a batch for optimizing logicalPlan

2020-08-14 Thread karl wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

karl wang updated SPARK-32542:
--
Description: 
Split an Expand into several smaller Expands, each containing the specified
number of projections.
For instance, consider this SQL: select a, b, c, d, count(1) from table1 group by
a, b, c, d with cube. It can expand the original data size 2^4 times.
If we set spark.sql.optimizer.projections.size=4, the Expand will be split into
2^4/4 = 4 smaller Expands. This reduces shuffle pressure and improves
performance in multidimensional analysis when data is huge.


  was:
Split an Expand into several smaller Expands, each containing the specified
number of projections.
For instance, consider this SQL: select a, b, c, d, count(1) from table1 group by
a, b, c, d with cube. It can expand the original data size 2^4 times.
If we set spark.sql.optimizer.projections.size=4, the Expand will be split into
2^4/4 = 4 smaller Expands. This reduces shuffle pressure and improves
performance in multidimensional analysis when data is huge.

runBenchmark("cube multianalysis agg") {
  val N = 20 << 20

  val benchmark = new Benchmark("cube multianalysis agg", N, output = output)

  def f(): Unit = {
val df = spark.range(N).cache()
df.selectExpr(
  "id",
  "(id & 1023) as k1",
  "cast(id & 1023 as string) as k2",
  "cast(id & 1023 as int) as k3",
  "cast(id & 1023 as double) as k4",
  "cast(id & 1023 as float) as k5")
  .cube("k1", "k2", "k3", "k4", "k5")
  .sum()
  .noop()
df.unpersist()

  }

  benchmark.addCase("grouping = F") { _ =>
withSQLConf(
  SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true",
  SQLConf.GROUPING_WITH_UNION.key -> "false") {
  f()
}
  }

  benchmark.addCase("grouping = T projectionSize= 16") { _ =>
withSQLConf(
  SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true",
  SQLConf.GROUPING_WITH_UNION.key -> "true",
  SQLConf.GROUPING_EXPAND_PROJECTIONS.key -> "16") {
  f()
}
  }

  benchmark.addCase("grouping = T projectionSize= 8") { _ =>
withSQLConf(
  SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true",
  SQLConf.GROUPING_WITH_UNION.key -> "true",
  SQLConf.GROUPING_EXPAND_PROJECTIONS.key -> "8") {
  f()
}
  }
  benchmark.run()
}
Running benchmark: cube multianalysis agg :
cube 5 fields k1, k2, k3, k4, k5
Running case: GROUPING_WITH_UNION off 
Running case: GROUPING_WITH_UNION on and GROUPING_EXPAND_PROJECTIONS = 16 
Running case: GROUPING_WITH_UNION on and GROUPING_EXPAND_PROJECTIONS = 8

Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.15
Intel(R) Core(TM) i5-7267U CPU @ 3.10GHz
cube multianalysis agg:            Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
grouping = F                               54329         54931        852        0.4       2590.6      1.0X
grouping = T projectionSize= 16            44584         44781        278        0.5       2125.9      1.2X
grouping = T projectionSize= 8             42764         43272        718        0.5       2039.1      1.3X
Running benchmark: cube multianalysis agg :
cube 6 fields k1, k2, k3, k4, k5, k6
Running case: GROUPING_WITH_UNION off 
Running case: GROUPING_WITH_UNION on and GROUPING_EXPAND_PROJECTIONS = 32
Running case: GROUPING_WITH_UNION on and GROUPING_EXPAND_PROJECTIONS = 16

Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.15
Intel(R) Core(TM) i5-7267U CPU @ 3.10GHz
cube multianalysis agg:            Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
grouping = F                              141607        143424       2569        0.1       6752.4      1.0X
grouping = T projectionSize= 32           109465        109603        196        0.2       5219.7      1.3X
grouping = T projectionSize= 16            99752        100411        933        0.2       4756.5      1.4X
Running benchmark: cube multianalysis agg :
cube 7 fields k1, k2, k3, k4, k5, k6, k7
Running case: GROUPING_WITH_UNION off 
Running case: GROUPING_WITH_UNION on and GROUPING_EXPAND_PROJECTIONS = 64
Running case: GROUPING_WITH_UNION on and GROUPING_EXPAND_PROJECTIONS = 32
Running case: GROUPING_WITH_UNION on and GROUPING_EXPAND_PROJECTIONS = 16

Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.15
Intel(R) Core(TM) i5-7267U CPU @ 3.10GHz
cube multianalysis agg: 

[jira] [Updated] (SPARK-32542) Add an optimizer rule to split an Expand into multiple Expands for aggregates

2020-08-14 Thread karl wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

karl wang updated SPARK-32542:
--
Summary: Add an optimizer rule to split an Expand into multiple Expands for 
aggregates  (was: add a batch for optimizing logicalPlan)

> Add an optimizer rule to split an Expand into multiple Expands for aggregates
> -
>
> Key: SPARK-32542
> URL: https://issues.apache.org/jira/browse/SPARK-32542
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer
>Affects Versions: 3.0.0
>Reporter: karl wang
>Priority: Major
> Fix For: 3.0.0
>
>
> Split an Expand into several smaller Expands, each containing the specified
> number of projections.
> For instance, consider this SQL: select a, b, c, d, count(1) from table1
> group by a, b, c, d with cube. It can expand the original data size 2^4 times.
> If we set spark.sql.optimizer.projections.size=4, the Expand will be split
> into 2^4/4 = 4 smaller Expands. This reduces shuffle pressure and improves
> performance in multidimensional analysis when data is huge.






[jira] [Updated] (SPARK-32542) add a batch for optimizing logicalPlan

2020-08-14 Thread karl wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

karl wang updated SPARK-32542:
--
Description: 
Split an Expand into several smaller Expands, each containing the specified
number of projections.
For instance, consider this SQL: select a, b, c, d, count(1) from table1 group by
a, b, c, d with cube. It can expand the original data size 2^4 times.
If we set spark.sql.optimizer.projections.size=4, the Expand will be split into
2^4/4 = 4 smaller Expands. This reduces shuffle pressure and improves
performance in multidimensional analysis when data is huge.

runBenchmark("cube multianalysis agg") {
  val N = 20 << 20

  val benchmark = new Benchmark("cube multianalysis agg", N, output = output)

  def f(): Unit = {
val df = spark.range(N).cache()
df.selectExpr(
  "id",
  "(id & 1023) as k1",
  "cast(id & 1023 as string) as k2",
  "cast(id & 1023 as int) as k3",
  "cast(id & 1023 as double) as k4",
  "cast(id & 1023 as float) as k5")
  .cube("k1", "k2", "k3", "k4", "k5")
  .sum()
  .noop()
df.unpersist()

  }

  benchmark.addCase("grouping = F") { _ =>
withSQLConf(
  SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true",
  SQLConf.GROUPING_WITH_UNION.key -> "false") {
  f()
}
  }

  benchmark.addCase("grouping = T projectionSize= 16") { _ =>
withSQLConf(
  SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true",
  SQLConf.GROUPING_WITH_UNION.key -> "true",
  SQLConf.GROUPING_EXPAND_PROJECTIONS.key -> "16") {
  f()
}
  }

  benchmark.addCase("grouping = T projectionSize= 8") { _ =>
withSQLConf(
  SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true",
  SQLConf.GROUPING_WITH_UNION.key -> "true",
  SQLConf.GROUPING_EXPAND_PROJECTIONS.key -> "8") {
  f()
}
  }
  benchmark.run()
}
Running benchmark: cube multianalysis agg :
cube 5 fields k1, k2, k3, k4, k5
Running case: GROUPING_WITH_UNION off 
Running case: GROUPING_WITH_UNION on and GROUPING_EXPAND_PROJECTIONS = 16 
Running case: GROUPING_WITH_UNION on and GROUPING_EXPAND_PROJECTIONS = 8

Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.15
Intel(R) Core(TM) i5-7267U CPU @ 3.10GHz
cube multianalysis agg:            Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
grouping = F                               54329         54931        852        0.4       2590.6      1.0X
grouping = T projectionSize= 16            44584         44781        278        0.5       2125.9      1.2X
grouping = T projectionSize= 8             42764         43272        718        0.5       2039.1      1.3X
Running benchmark: cube multianalysis agg :
cube 6 fields k1, k2, k3, k4, k5, k6
Running case: GROUPING_WITH_UNION off 
Running case: GROUPING_WITH_UNION on and GROUPING_EXPAND_PROJECTIONS = 32
Running case: GROUPING_WITH_UNION on and GROUPING_EXPAND_PROJECTIONS = 16

Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.15
Intel(R) Core(TM) i5-7267U CPU @ 3.10GHz
cube multianalysis agg:            Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
grouping = F                              141607        143424       2569        0.1       6752.4      1.0X
grouping = T projectionSize= 32           109465        109603        196        0.2       5219.7      1.3X
grouping = T projectionSize= 16            99752        100411        933        0.2       4756.5      1.4X
Running benchmark: cube multianalysis agg :
cube 7 fields k1, k2, k3, k4, k5, k6, k7
Running case: GROUPING_WITH_UNION off 
Running case: GROUPING_WITH_UNION on and GROUPING_EXPAND_PROJECTIONS = 64
Running case: GROUPING_WITH_UNION on and GROUPING_EXPAND_PROJECTIONS = 32
Running case: GROUPING_WITH_UNION on and GROUPING_EXPAND_PROJECTIONS = 16

Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.15
Intel(R) Core(TM) i5-7267U CPU @ 3.10GHz
cube multianalysis agg:            Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
grouping = F                              516941        519658        NaN        0.0      24649.7      1.0X
grouping = T projectionSize= 64           267170        267547        533        0.1      12739.6      1.9X
grouping = T project

[jira] [Updated] (SPARK-32542) add a batch for optimizing logicalPlan

2020-08-14 Thread karl wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

karl wang updated SPARK-32542:
--
Priority: Major  (was: Minor)

> add a batch for optimizing logicalPlan
> --
>
> Key: SPARK-32542
> URL: https://issues.apache.org/jira/browse/SPARK-32542
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer
>Affects Versions: 3.0.0
>Reporter: karl wang
>Priority: Major
> Fix For: 3.0.0
>
>
> Split an Expand into several smaller Expands, each containing the specified
> number of projections.
> For instance, consider this SQL: select a, b, c, d, count(1) from table1
> group by a, b, c, d with cube. It can expand the original data size 2^4 times.
> If we set spark.sql.optimizer.projections.size=4, the Expand will be split
> into 2^4/4 = 4 smaller Expands. This reduces shuffle pressure and improves
> performance in multidimensional analysis when data is huge.






[jira] [Updated] (SPARK-32542) add a batch for optimizing logicalPlan

2020-08-05 Thread karl wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

karl wang updated SPARK-32542:
--
Description: 
Split an Expand into several smaller Expands, each containing the specified
number of projections.
For instance, consider this SQL: select a, b, c, d, count(1) from table1 group by
a, b, c, d with cube. It can expand the original data size 2^4 times.
If we set spark.sql.optimizer.projections.size=4, the Expand will be split into
2^4/4 = 4 smaller Expands. This reduces shuffle pressure and improves
performance in multidimensional analysis when data is huge.

> add a batch for optimizing logicalPlan
> --
>
> Key: SPARK-32542
> URL: https://issues.apache.org/jira/browse/SPARK-32542
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer
>Affects Versions: 3.0.0
>Reporter: karl wang
>Priority: Minor
> Fix For: 3.0.0
>
>
> Split an Expand into several smaller Expands, each containing the specified
> number of projections.
> For instance, consider this SQL: select a, b, c, d, count(1) from table1
> group by a, b, c, d with cube. It can expand the original data size 2^4 times.
> If we set spark.sql.optimizer.projections.size=4, the Expand will be split
> into 2^4/4 = 4 smaller Expands. This reduces shuffle pressure and improves
> performance in multidimensional analysis when data is huge.






[jira] [Created] (SPARK-32542) add a batch for optimizing logicalPlan

2020-08-05 Thread karl wang (Jira)
karl wang created SPARK-32542:
-

 Summary: add a batch for optimizing logicalPlan
 Key: SPARK-32542
 URL: https://issues.apache.org/jira/browse/SPARK-32542
 Project: Spark
  Issue Type: Improvement
  Components: Optimizer
Affects Versions: 3.0.0
Reporter: karl wang
 Fix For: 3.0.0





