[spark] branch branch-3.0 updated: [SPARK-32635][SQL] Fix foldable propagation

2020-09-17 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new ecc2f5d  [SPARK-32635][SQL] Fix foldable propagation
ecc2f5d is described below

commit ecc2f5d9e227b62f418d65708f516ffe8e690f96
Author: Peter Toth 
AuthorDate: Fri Sep 18 08:17:23 2020 +0900

[SPARK-32635][SQL] Fix foldable propagation

### What changes were proposed in this pull request?
This PR rewrites `FoldablePropagation` rule to replace attribute references 
in a node with foldables coming only from the node's children.

Before this PR in the case of this example (with 
setting`spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation`):
```scala
val a = Seq("1").toDF("col1").withColumn("col2", lit("1"))
val b = Seq("2").toDF("col1").withColumn("col2", lit("2"))
val aub = a.union(b)
val c = aub.filter($"col1" === "2").cache()
val d = Seq("2").toDF( "col4")
val r = d.join(aub, $"col2" === $"col4").select("col4")
val l = c.select("col2")
val df = l.join(r, $"col2" === $"col4", "LeftOuter")
df.show()
```
foldable propagation happens incorrectly:
```
 Join LeftOuter, (col2#6 = col4#34) 
 Join LeftOuter, (col2#6 = col4#34)
!:- Project [col2#6]
 :- Project [1 AS col2#6]
 :  +- InMemoryRelation [col1#4, col2#6], StorageLevel(disk, memory, 
deserialized, 1 replicas)   :  +- InMemoryRelation [col1#4, col2#6], 
StorageLevel(disk, memory, deserialized, 1 replicas)
 :+- Union  
 :+- Union
 :   :- *(1) Project [value#1 AS col1#4, 1 AS col2#6]   
 :   :- *(1) Project [value#1 AS col1#4, 1 AS 
col2#6]
 :   :  +- *(1) Filter (isnotnull(value#1) AND (value#1 = 2))   
 :   :  +- *(1) Filter (isnotnull(value#1) AND 
(value#1 = 2))
 :   : +- *(1) LocalTableScan [value#1] 
 :   : +- *(1) LocalTableScan [value#1]
 :   +- *(2) Project [value#10 AS col1#13, 2 AS col2#15]
 :   +- *(2) Project [value#10 AS col1#13, 2 AS 
col2#15]
 :  +- *(2) Filter (isnotnull(value#10) AND (value#10 = 2)) 
 :  +- *(2) Filter (isnotnull(value#10) AND 
(value#10 = 2))
 : +- *(2) LocalTableScan [value#10]
 : +- *(2) LocalTableScan [value#10]
 +- Project [col4#34]   
 +- Project [col4#34]
+- Join Inner, (col2#6 = col4#34)   
+- Join Inner, (col2#6 = col4#34)
   :- Project [value#31 AS col4#34] 
   :- Project [value#31 AS col4#34]
   :  +- LocalRelation [value#31]   
   :  +- LocalRelation [value#31]
   +- Project [col2#6]  
   +- Project [col2#6]
  +- Union false, false 
  +- Union false, false
 :- Project [1 AS col2#6]   
 :- Project [1 AS col2#6]
 :  +- LocalRelation [value#1]  
 :  +- LocalRelation [value#1]
 +- Project [2 AS col2#15]  
 +- Project [2 AS col2#15]
+- LocalRelation [value#10] 
+- LocalRelation [value#10]

```
and so the result is wrong:
```
+++
|col2|col4|
+++
|   1|null|
+++
```

After this PR foldable propagation will not happen incorrectly and the 
result is correct:
```
+++
|col2|col4|
+++
|   2|   2|
+++
```

### Why are the changes needed?
To fix a correctness issue.

### Does this PR introduce _any_ user-facing change?
Yes, fixes a correctness issue.

### How was this patch tested?
Existing and new UTs.

Closes #29771 from peter-toth/SPARK-32635-fix-foldable-propagation.

Authored-by: Peter Toth 
Signed-off-by: Takeshi 

[spark] branch branch-3.0 updated: [SPARK-32635][SQL] Fix foldable propagation

2020-09-17 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new ecc2f5d  [SPARK-32635][SQL] Fix foldable propagation
ecc2f5d is described below

commit ecc2f5d9e227b62f418d65708f516ffe8e690f96
Author: Peter Toth 
AuthorDate: Fri Sep 18 08:17:23 2020 +0900

[SPARK-32635][SQL] Fix foldable propagation

### What changes were proposed in this pull request?
This PR rewrites `FoldablePropagation` rule to replace attribute references 
in a node with foldables coming only from the node's children.

Before this PR in the case of this example (with 
setting`spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation`):
```scala
val a = Seq("1").toDF("col1").withColumn("col2", lit("1"))
val b = Seq("2").toDF("col1").withColumn("col2", lit("2"))
val aub = a.union(b)
val c = aub.filter($"col1" === "2").cache()
val d = Seq("2").toDF( "col4")
val r = d.join(aub, $"col2" === $"col4").select("col4")
val l = c.select("col2")
val df = l.join(r, $"col2" === $"col4", "LeftOuter")
df.show()
```
foldable propagation happens incorrectly:
```
 Join LeftOuter, (col2#6 = col4#34) 
 Join LeftOuter, (col2#6 = col4#34)
!:- Project [col2#6]
 :- Project [1 AS col2#6]
 :  +- InMemoryRelation [col1#4, col2#6], StorageLevel(disk, memory, 
deserialized, 1 replicas)   :  +- InMemoryRelation [col1#4, col2#6], 
StorageLevel(disk, memory, deserialized, 1 replicas)
 :+- Union  
 :+- Union
 :   :- *(1) Project [value#1 AS col1#4, 1 AS col2#6]   
 :   :- *(1) Project [value#1 AS col1#4, 1 AS 
col2#6]
 :   :  +- *(1) Filter (isnotnull(value#1) AND (value#1 = 2))   
 :   :  +- *(1) Filter (isnotnull(value#1) AND 
(value#1 = 2))
 :   : +- *(1) LocalTableScan [value#1] 
 :   : +- *(1) LocalTableScan [value#1]
 :   +- *(2) Project [value#10 AS col1#13, 2 AS col2#15]
 :   +- *(2) Project [value#10 AS col1#13, 2 AS 
col2#15]
 :  +- *(2) Filter (isnotnull(value#10) AND (value#10 = 2)) 
 :  +- *(2) Filter (isnotnull(value#10) AND 
(value#10 = 2))
 : +- *(2) LocalTableScan [value#10]
 : +- *(2) LocalTableScan [value#10]
 +- Project [col4#34]   
 +- Project [col4#34]
+- Join Inner, (col2#6 = col4#34)   
+- Join Inner, (col2#6 = col4#34)
   :- Project [value#31 AS col4#34] 
   :- Project [value#31 AS col4#34]
   :  +- LocalRelation [value#31]   
   :  +- LocalRelation [value#31]
   +- Project [col2#6]  
   +- Project [col2#6]
  +- Union false, false 
  +- Union false, false
 :- Project [1 AS col2#6]   
 :- Project [1 AS col2#6]
 :  +- LocalRelation [value#1]  
 :  +- LocalRelation [value#1]
 +- Project [2 AS col2#15]  
 +- Project [2 AS col2#15]
+- LocalRelation [value#10] 
+- LocalRelation [value#10]

```
and so the result is wrong:
```
+++
|col2|col4|
+++
|   1|null|
+++
```

After this PR foldable propagation will not happen incorrectly and the 
result is correct:
```
+++
|col2|col4|
+++
|   2|   2|
+++
```

### Why are the changes needed?
To fix a correctness issue.

### Does this PR introduce _any_ user-facing change?
Yes, fixes a correctness issue.

### How was this patch tested?
Existing and new UTs.

Closes #29771 from peter-toth/SPARK-32635-fix-foldable-propagation.

Authored-by: Peter Toth 
Signed-off-by: Takeshi 

[spark] branch branch-3.0 updated: [SPARK-32635][SQL] Fix foldable propagation

2020-09-17 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new ecc2f5d  [SPARK-32635][SQL] Fix foldable propagation
ecc2f5d is described below

commit ecc2f5d9e227b62f418d65708f516ffe8e690f96
Author: Peter Toth 
AuthorDate: Fri Sep 18 08:17:23 2020 +0900

[SPARK-32635][SQL] Fix foldable propagation

### What changes were proposed in this pull request?
This PR rewrites `FoldablePropagation` rule to replace attribute references 
in a node with foldables coming only from the node's children.

Before this PR in the case of this example (with 
setting`spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation`):
```scala
val a = Seq("1").toDF("col1").withColumn("col2", lit("1"))
val b = Seq("2").toDF("col1").withColumn("col2", lit("2"))
val aub = a.union(b)
val c = aub.filter($"col1" === "2").cache()
val d = Seq("2").toDF( "col4")
val r = d.join(aub, $"col2" === $"col4").select("col4")
val l = c.select("col2")
val df = l.join(r, $"col2" === $"col4", "LeftOuter")
df.show()
```
foldable propagation happens incorrectly:
```
 Join LeftOuter, (col2#6 = col4#34) 
 Join LeftOuter, (col2#6 = col4#34)
!:- Project [col2#6]
 :- Project [1 AS col2#6]
 :  +- InMemoryRelation [col1#4, col2#6], StorageLevel(disk, memory, 
deserialized, 1 replicas)   :  +- InMemoryRelation [col1#4, col2#6], 
StorageLevel(disk, memory, deserialized, 1 replicas)
 :+- Union  
 :+- Union
 :   :- *(1) Project [value#1 AS col1#4, 1 AS col2#6]   
 :   :- *(1) Project [value#1 AS col1#4, 1 AS 
col2#6]
 :   :  +- *(1) Filter (isnotnull(value#1) AND (value#1 = 2))   
 :   :  +- *(1) Filter (isnotnull(value#1) AND 
(value#1 = 2))
 :   : +- *(1) LocalTableScan [value#1] 
 :   : +- *(1) LocalTableScan [value#1]
 :   +- *(2) Project [value#10 AS col1#13, 2 AS col2#15]
 :   +- *(2) Project [value#10 AS col1#13, 2 AS 
col2#15]
 :  +- *(2) Filter (isnotnull(value#10) AND (value#10 = 2)) 
 :  +- *(2) Filter (isnotnull(value#10) AND 
(value#10 = 2))
 : +- *(2) LocalTableScan [value#10]
 : +- *(2) LocalTableScan [value#10]
 +- Project [col4#34]   
 +- Project [col4#34]
+- Join Inner, (col2#6 = col4#34)   
+- Join Inner, (col2#6 = col4#34)
   :- Project [value#31 AS col4#34] 
   :- Project [value#31 AS col4#34]
   :  +- LocalRelation [value#31]   
   :  +- LocalRelation [value#31]
   +- Project [col2#6]  
   +- Project [col2#6]
  +- Union false, false 
  +- Union false, false
 :- Project [1 AS col2#6]   
 :- Project [1 AS col2#6]
 :  +- LocalRelation [value#1]  
 :  +- LocalRelation [value#1]
 +- Project [2 AS col2#15]  
 +- Project [2 AS col2#15]
+- LocalRelation [value#10] 
+- LocalRelation [value#10]

```
and so the result is wrong:
```
+++
|col2|col4|
+++
|   1|null|
+++
```

After this PR foldable propagation will not happen incorrectly and the 
result is correct:
```
+++
|col2|col4|
+++
|   2|   2|
+++
```

### Why are the changes needed?
To fix a correctness issue.

### Does this PR introduce _any_ user-facing change?
Yes, fixes a correctness issue.

### How was this patch tested?
Existing and new UTs.

Closes #29771 from peter-toth/SPARK-32635-fix-foldable-propagation.

Authored-by: Peter Toth 
Signed-off-by: Takeshi 

[spark] branch branch-3.0 updated: [SPARK-32635][SQL] Fix foldable propagation

2020-09-17 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new ecc2f5d  [SPARK-32635][SQL] Fix foldable propagation
ecc2f5d is described below

commit ecc2f5d9e227b62f418d65708f516ffe8e690f96
Author: Peter Toth 
AuthorDate: Fri Sep 18 08:17:23 2020 +0900

[SPARK-32635][SQL] Fix foldable propagation

### What changes were proposed in this pull request?
This PR rewrites `FoldablePropagation` rule to replace attribute references 
in a node with foldables coming only from the node's children.

Before this PR in the case of this example (with 
setting`spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation`):
```scala
val a = Seq("1").toDF("col1").withColumn("col2", lit("1"))
val b = Seq("2").toDF("col1").withColumn("col2", lit("2"))
val aub = a.union(b)
val c = aub.filter($"col1" === "2").cache()
val d = Seq("2").toDF( "col4")
val r = d.join(aub, $"col2" === $"col4").select("col4")
val l = c.select("col2")
val df = l.join(r, $"col2" === $"col4", "LeftOuter")
df.show()
```
foldable propagation happens incorrectly:
```
 Join LeftOuter, (col2#6 = col4#34) 
 Join LeftOuter, (col2#6 = col4#34)
!:- Project [col2#6]
 :- Project [1 AS col2#6]
 :  +- InMemoryRelation [col1#4, col2#6], StorageLevel(disk, memory, 
deserialized, 1 replicas)   :  +- InMemoryRelation [col1#4, col2#6], 
StorageLevel(disk, memory, deserialized, 1 replicas)
 :+- Union  
 :+- Union
 :   :- *(1) Project [value#1 AS col1#4, 1 AS col2#6]   
 :   :- *(1) Project [value#1 AS col1#4, 1 AS 
col2#6]
 :   :  +- *(1) Filter (isnotnull(value#1) AND (value#1 = 2))   
 :   :  +- *(1) Filter (isnotnull(value#1) AND 
(value#1 = 2))
 :   : +- *(1) LocalTableScan [value#1] 
 :   : +- *(1) LocalTableScan [value#1]
 :   +- *(2) Project [value#10 AS col1#13, 2 AS col2#15]
 :   +- *(2) Project [value#10 AS col1#13, 2 AS 
col2#15]
 :  +- *(2) Filter (isnotnull(value#10) AND (value#10 = 2)) 
 :  +- *(2) Filter (isnotnull(value#10) AND 
(value#10 = 2))
 : +- *(2) LocalTableScan [value#10]
 : +- *(2) LocalTableScan [value#10]
 +- Project [col4#34]   
 +- Project [col4#34]
+- Join Inner, (col2#6 = col4#34)   
+- Join Inner, (col2#6 = col4#34)
   :- Project [value#31 AS col4#34] 
   :- Project [value#31 AS col4#34]
   :  +- LocalRelation [value#31]   
   :  +- LocalRelation [value#31]
   +- Project [col2#6]  
   +- Project [col2#6]
  +- Union false, false 
  +- Union false, false
 :- Project [1 AS col2#6]   
 :- Project [1 AS col2#6]
 :  +- LocalRelation [value#1]  
 :  +- LocalRelation [value#1]
 +- Project [2 AS col2#15]  
 +- Project [2 AS col2#15]
+- LocalRelation [value#10] 
+- LocalRelation [value#10]

```
and so the result is wrong:
```
+++
|col2|col4|
+++
|   1|null|
+++
```

After this PR foldable propagation will not happen incorrectly and the 
result is correct:
```
+++
|col2|col4|
+++
|   2|   2|
+++
```

### Why are the changes needed?
To fix a correctness issue.

### Does this PR introduce _any_ user-facing change?
Yes, fixes a correctness issue.

### How was this patch tested?
Existing and new UTs.

Closes #29771 from peter-toth/SPARK-32635-fix-foldable-propagation.

Authored-by: Peter Toth 
Signed-off-by: Takeshi 

[spark] branch branch-3.0 updated: [SPARK-32635][SQL] Fix foldable propagation

2020-09-17 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new ecc2f5d  [SPARK-32635][SQL] Fix foldable propagation
ecc2f5d is described below

commit ecc2f5d9e227b62f418d65708f516ffe8e690f96
Author: Peter Toth 
AuthorDate: Fri Sep 18 08:17:23 2020 +0900

[SPARK-32635][SQL] Fix foldable propagation

### What changes were proposed in this pull request?
This PR rewrites `FoldablePropagation` rule to replace attribute references 
in a node with foldables coming only from the node's children.

Before this PR in the case of this example (with 
setting`spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation`):
```scala
val a = Seq("1").toDF("col1").withColumn("col2", lit("1"))
val b = Seq("2").toDF("col1").withColumn("col2", lit("2"))
val aub = a.union(b)
val c = aub.filter($"col1" === "2").cache()
val d = Seq("2").toDF( "col4")
val r = d.join(aub, $"col2" === $"col4").select("col4")
val l = c.select("col2")
val df = l.join(r, $"col2" === $"col4", "LeftOuter")
df.show()
```
foldable propagation happens incorrectly:
```
 Join LeftOuter, (col2#6 = col4#34) 
 Join LeftOuter, (col2#6 = col4#34)
!:- Project [col2#6]
 :- Project [1 AS col2#6]
 :  +- InMemoryRelation [col1#4, col2#6], StorageLevel(disk, memory, 
deserialized, 1 replicas)   :  +- InMemoryRelation [col1#4, col2#6], 
StorageLevel(disk, memory, deserialized, 1 replicas)
 :+- Union  
 :+- Union
 :   :- *(1) Project [value#1 AS col1#4, 1 AS col2#6]   
 :   :- *(1) Project [value#1 AS col1#4, 1 AS 
col2#6]
 :   :  +- *(1) Filter (isnotnull(value#1) AND (value#1 = 2))   
 :   :  +- *(1) Filter (isnotnull(value#1) AND 
(value#1 = 2))
 :   : +- *(1) LocalTableScan [value#1] 
 :   : +- *(1) LocalTableScan [value#1]
 :   +- *(2) Project [value#10 AS col1#13, 2 AS col2#15]
 :   +- *(2) Project [value#10 AS col1#13, 2 AS 
col2#15]
 :  +- *(2) Filter (isnotnull(value#10) AND (value#10 = 2)) 
 :  +- *(2) Filter (isnotnull(value#10) AND 
(value#10 = 2))
 : +- *(2) LocalTableScan [value#10]
 : +- *(2) LocalTableScan [value#10]
 +- Project [col4#34]   
 +- Project [col4#34]
+- Join Inner, (col2#6 = col4#34)   
+- Join Inner, (col2#6 = col4#34)
   :- Project [value#31 AS col4#34] 
   :- Project [value#31 AS col4#34]
   :  +- LocalRelation [value#31]   
   :  +- LocalRelation [value#31]
   +- Project [col2#6]  
   +- Project [col2#6]
  +- Union false, false 
  +- Union false, false
 :- Project [1 AS col2#6]   
 :- Project [1 AS col2#6]
 :  +- LocalRelation [value#1]  
 :  +- LocalRelation [value#1]
 +- Project [2 AS col2#15]  
 +- Project [2 AS col2#15]
+- LocalRelation [value#10] 
+- LocalRelation [value#10]

```
and so the result is wrong:
```
+++
|col2|col4|
+++
|   1|null|
+++
```

After this PR foldable propagation will not happen incorrectly and the 
result is correct:
```
+++
|col2|col4|
+++
|   2|   2|
+++
```

### Why are the changes needed?
To fix a correctness issue.

### Does this PR introduce _any_ user-facing change?
Yes, fixes a correctness issue.

### How was this patch tested?
Existing and new UTs.

Closes #29771 from peter-toth/SPARK-32635-fix-foldable-propagation.

Authored-by: Peter Toth 
Signed-off-by: Takeshi