[jira] [Assigned] (SPARK-21351) Update nullability based on children's output in optimized logical plan

2019-01-10 Thread Apache Spark (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-21351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-21351:


Assignee: Takeshi Yamamuro  (was: Apache Spark)

> Update nullability based on children's output in optimized logical plan
> ---
>
> Key: SPARK-21351
> URL: https://issues.apache.org/jira/browse/SPARK-21351
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.2, 2.3.2
>Reporter: Takeshi Yamamuro
>Assignee: Takeshi Yamamuro
>Priority: Minor
>
> In master, optimized plans do not reflect the tightened nullability that a
> `Filter` containing an `IsNotNull` predicate guarantees.
> This generates unnecessary NULL-check code. For example:
> {code}
> scala> val df = Seq((Some(1), Some(2))).toDF("a", "b")
> scala> val bIsNotNull = df.where($"b" =!= 2).select($"b")
> scala> val targetQuery = bIsNotNull.distinct
> scala> targetQuery.queryExecution.optimizedPlan.output(0).nullable
> res5: Boolean = true
> scala> targetQuery.debugCodegen
> Found 2 WholeStageCodegen subtrees.
> == Subtree 1 / 2 ==
> *HashAggregate(keys=[b#19], functions=[], output=[b#19])
> +- Exchange hashpartitioning(b#19, 200)
>    +- *HashAggregate(keys=[b#19], functions=[], output=[b#19])
>       +- *Project [_2#16 AS b#19]
>          +- *Filter isnotnull(_2#16)
>             +- LocalTableScan [_1#15, _2#16]
> Generated code:
> ...
> /* 124 */   protected void processNext() throws java.io.IOException {
> ...
> /* 132 */ // output the result
> /* 133 */
> /* 134 */ while (agg_mapIter.next()) {
> /* 135 */   wholestagecodegen_numOutputRows.add(1);
> /* 136 */   UnsafeRow agg_aggKey = (UnsafeRow) agg_mapIter.getKey();
> /* 137 */   UnsafeRow agg_aggBuffer = (UnsafeRow) agg_mapIter.getValue();
> /* 138 */
> /* 139 */   boolean agg_isNull4 = agg_aggKey.isNullAt(0);
> /* 140 */   int agg_value4 = agg_isNull4 ? -1 : (agg_aggKey.getInt(0));
> /* 141 */   agg_rowWriter1.zeroOutNullBytes();
> /* 142 */
> // We don't need this NULL check because NULL is filtered out by `$"b" =!= 2`
> /* 143 */   if (agg_isNull4) {
> /* 144 */ agg_rowWriter1.setNullAt(0);
> /* 145 */   } else {
> /* 146 */ agg_rowWriter1.write(0, agg_value4);
> /* 147 */   }
> /* 148 */   append(agg_result1);
> /* 149 */
> /* 150 */   if (shouldStop()) return;
> /* 151 */ }
> /* 152 */
> /* 153 */ agg_mapIter.close();
> /* 154 */ if (agg_sorter == null) {
> /* 155 */   agg_hashMap.free();
> /* 156 */ }
> /* 157 */   }
> /* 158 */
> /* 159 */ }
> {code}
> At line 143 above, this NULL check is unnecessary because NULL has already
> been filtered out by `$"b" =!= 2`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21351) Update nullability based on children's output in optimized logical plan

2019-01-10 Thread Apache Spark (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-21351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-21351:


Assignee: Apache Spark  (was: Takeshi Yamamuro)







[jira] [Assigned] (SPARK-21351) Update nullability based on children's output in optimized logical plan

2018-04-04 Thread Wenchen Fan (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-21351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-21351:
---

Assignee: Takeshi Yamamuro

> Update nullability based on children's output in optimized logical plan
> ---
>
> Key: SPARK-21351
> URL: https://issues.apache.org/jira/browse/SPARK-21351
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.1
>Reporter: Takeshi Yamamuro
>Assignee: Takeshi Yamamuro
>Priority: Minor
> Fix For: 2.4.0
>
>






[jira] [Assigned] (SPARK-21351) Update nullability based on children's output in optimized logical plan

2017-07-09 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-21351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-21351:


Assignee: (was: Apache Spark)






[jira] [Assigned] (SPARK-21351) Update nullability based on children's output in optimized logical plan

2017-07-09 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-21351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-21351:


Assignee: Apache Spark



