[spark] branch branch-2.4 updated: [SPARK-32635][SQL][2.4] Fix foldable propagation

2020-09-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 62708db  [SPARK-32635][SQL][2.4] Fix foldable propagation
62708db is described below

commit 62708db4f90f652cd9bc73998ac5f1e949bd41ac
Author: Peter Toth 
AuthorDate: Fri Sep 18 10:28:30 2020 -0700

[SPARK-32635][SQL][2.4] Fix foldable propagation

### What changes were proposed in this pull request?
This PR rewrites `FoldablePropagation` rule to replace attribute references 
in a node with foldables coming only from the node's children.

Before this PR in the case of this example (with 
setting`spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation`):
```scala
val a = Seq("1").toDF("col1").withColumn("col2", lit("1"))
val b = Seq("2").toDF("col1").withColumn("col2", lit("2"))
val aub = a.union(b)
val c = aub.filter($"col1" === "2").cache()
val d = Seq("2").toDF( "col4")
val r = d.join(aub, $"col2" === $"col4").select("col4")
val l = c.select("col2")
val df = l.join(r, $"col2" === $"col4", "LeftOuter")
df.show()
```
foldable propagation happens incorrectly:
```
 Join LeftOuter, (col2#6 = col4#34) 
 Join LeftOuter, (col2#6 = col4#34)
!:- Project [col2#6]
 :- Project [1 AS col2#6]
 :  +- InMemoryRelation [col1#4, col2#6], StorageLevel(disk, memory, 
deserialized, 1 replicas)   :  +- InMemoryRelation [col1#4, col2#6], 
StorageLevel(disk, memory, deserialized, 1 replicas)
 :+- Union  
 :+- Union
 :   :- *(1) Project [value#1 AS col1#4, 1 AS col2#6]   
 :   :- *(1) Project [value#1 AS col1#4, 1 AS 
col2#6]
 :   :  +- *(1) Filter (isnotnull(value#1) AND (value#1 = 2))   
 :   :  +- *(1) Filter (isnotnull(value#1) AND 
(value#1 = 2))
 :   : +- *(1) LocalTableScan [value#1] 
 :   : +- *(1) LocalTableScan [value#1]
 :   +- *(2) Project [value#10 AS col1#13, 2 AS col2#15]
 :   +- *(2) Project [value#10 AS col1#13, 2 AS 
col2#15]
 :  +- *(2) Filter (isnotnull(value#10) AND (value#10 = 2)) 
 :  +- *(2) Filter (isnotnull(value#10) AND 
(value#10 = 2))
 : +- *(2) LocalTableScan [value#10]
 : +- *(2) LocalTableScan [value#10]
 +- Project [col4#34]   
 +- Project [col4#34]
+- Join Inner, (col2#6 = col4#34)   
+- Join Inner, (col2#6 = col4#34)
   :- Project [value#31 AS col4#34] 
   :- Project [value#31 AS col4#34]
   :  +- LocalRelation [value#31]   
   :  +- LocalRelation [value#31]
   +- Project [col2#6]  
   +- Project [col2#6]
  +- Union false, false 
  +- Union false, false
 :- Project [1 AS col2#6]   
 :- Project [1 AS col2#6]
 :  +- LocalRelation [value#1]  
 :  +- LocalRelation [value#1]
 +- Project [2 AS col2#15]  
 +- Project [2 AS col2#15]
+- LocalRelation [value#10] 
+- LocalRelation [value#10]

```
and so the result is wrong:
```
+++
|col2|col4|
+++
|   1|null|
+++
```

After this PR foldable propagation will not happen incorrectly and the 
result is correct:
```
+++
|col2|col4|
+++
|   2|   2|
+++
```

### Why are the changes needed?
To fix a correctness issue.

### Does this PR introduce _any_ user-facing change?
Yes, fixes a correctness issue.

### How was this patch tested?
Existing and new UTs.

Closes #29805 from peter-toth/SPARK-32635-fix-foldable-propagation-2.4.

Authored-by: Peter Toth 
Signed-off-by: Do

[spark] branch branch-2.4 updated: [SPARK-32635][SQL][2.4] Fix foldable propagation

2020-09-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 62708db  [SPARK-32635][SQL][2.4] Fix foldable propagation
62708db is described below

commit 62708db4f90f652cd9bc73998ac5f1e949bd41ac
Author: Peter Toth 
AuthorDate: Fri Sep 18 10:28:30 2020 -0700

[SPARK-32635][SQL][2.4] Fix foldable propagation

### What changes were proposed in this pull request?
This PR rewrites `FoldablePropagation` rule to replace attribute references 
in a node with foldables coming only from the node's children.

Before this PR in the case of this example (with 
setting`spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation`):
```scala
val a = Seq("1").toDF("col1").withColumn("col2", lit("1"))
val b = Seq("2").toDF("col1").withColumn("col2", lit("2"))
val aub = a.union(b)
val c = aub.filter($"col1" === "2").cache()
val d = Seq("2").toDF( "col4")
val r = d.join(aub, $"col2" === $"col4").select("col4")
val l = c.select("col2")
val df = l.join(r, $"col2" === $"col4", "LeftOuter")
df.show()
```
foldable propagation happens incorrectly:
```
 Join LeftOuter, (col2#6 = col4#34) 
 Join LeftOuter, (col2#6 = col4#34)
!:- Project [col2#6]
 :- Project [1 AS col2#6]
 :  +- InMemoryRelation [col1#4, col2#6], StorageLevel(disk, memory, 
deserialized, 1 replicas)   :  +- InMemoryRelation [col1#4, col2#6], 
StorageLevel(disk, memory, deserialized, 1 replicas)
 :+- Union  
 :+- Union
 :   :- *(1) Project [value#1 AS col1#4, 1 AS col2#6]   
 :   :- *(1) Project [value#1 AS col1#4, 1 AS 
col2#6]
 :   :  +- *(1) Filter (isnotnull(value#1) AND (value#1 = 2))   
 :   :  +- *(1) Filter (isnotnull(value#1) AND 
(value#1 = 2))
 :   : +- *(1) LocalTableScan [value#1] 
 :   : +- *(1) LocalTableScan [value#1]
 :   +- *(2) Project [value#10 AS col1#13, 2 AS col2#15]
 :   +- *(2) Project [value#10 AS col1#13, 2 AS 
col2#15]
 :  +- *(2) Filter (isnotnull(value#10) AND (value#10 = 2)) 
 :  +- *(2) Filter (isnotnull(value#10) AND 
(value#10 = 2))
 : +- *(2) LocalTableScan [value#10]
 : +- *(2) LocalTableScan [value#10]
 +- Project [col4#34]   
 +- Project [col4#34]
+- Join Inner, (col2#6 = col4#34)   
+- Join Inner, (col2#6 = col4#34)
   :- Project [value#31 AS col4#34] 
   :- Project [value#31 AS col4#34]
   :  +- LocalRelation [value#31]   
   :  +- LocalRelation [value#31]
   +- Project [col2#6]  
   +- Project [col2#6]
  +- Union false, false 
  +- Union false, false
 :- Project [1 AS col2#6]   
 :- Project [1 AS col2#6]
 :  +- LocalRelation [value#1]  
 :  +- LocalRelation [value#1]
 +- Project [2 AS col2#15]  
 +- Project [2 AS col2#15]
+- LocalRelation [value#10] 
+- LocalRelation [value#10]

```
and so the result is wrong:
```
+++
|col2|col4|
+++
|   1|null|
+++
```

After this PR foldable propagation will not happen incorrectly and the 
result is correct:
```
+++
|col2|col4|
+++
|   2|   2|
+++
```

### Why are the changes needed?
To fix a correctness issue.

### Does this PR introduce _any_ user-facing change?
Yes, fixes a correctness issue.

### How was this patch tested?
Existing and new UTs.

Closes #29805 from peter-toth/SPARK-32635-fix-foldable-propagation-2.4.

Authored-by: Peter Toth 
Signed-off-by: Do

[spark] branch branch-2.4 updated: [SPARK-32635][SQL][2.4] Fix foldable propagation

2020-09-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 62708db  [SPARK-32635][SQL][2.4] Fix foldable propagation
62708db is described below

commit 62708db4f90f652cd9bc73998ac5f1e949bd41ac
Author: Peter Toth 
AuthorDate: Fri Sep 18 10:28:30 2020 -0700

[SPARK-32635][SQL][2.4] Fix foldable propagation

### What changes were proposed in this pull request?
This PR rewrites `FoldablePropagation` rule to replace attribute references 
in a node with foldables coming only from the node's children.

Before this PR in the case of this example (with 
setting`spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation`):
```scala
val a = Seq("1").toDF("col1").withColumn("col2", lit("1"))
val b = Seq("2").toDF("col1").withColumn("col2", lit("2"))
val aub = a.union(b)
val c = aub.filter($"col1" === "2").cache()
val d = Seq("2").toDF( "col4")
val r = d.join(aub, $"col2" === $"col4").select("col4")
val l = c.select("col2")
val df = l.join(r, $"col2" === $"col4", "LeftOuter")
df.show()
```
foldable propagation happens incorrectly:
```
 Join LeftOuter, (col2#6 = col4#34) 
 Join LeftOuter, (col2#6 = col4#34)
!:- Project [col2#6]
 :- Project [1 AS col2#6]
 :  +- InMemoryRelation [col1#4, col2#6], StorageLevel(disk, memory, 
deserialized, 1 replicas)   :  +- InMemoryRelation [col1#4, col2#6], 
StorageLevel(disk, memory, deserialized, 1 replicas)
 :+- Union  
 :+- Union
 :   :- *(1) Project [value#1 AS col1#4, 1 AS col2#6]   
 :   :- *(1) Project [value#1 AS col1#4, 1 AS 
col2#6]
 :   :  +- *(1) Filter (isnotnull(value#1) AND (value#1 = 2))   
 :   :  +- *(1) Filter (isnotnull(value#1) AND 
(value#1 = 2))
 :   : +- *(1) LocalTableScan [value#1] 
 :   : +- *(1) LocalTableScan [value#1]
 :   +- *(2) Project [value#10 AS col1#13, 2 AS col2#15]
 :   +- *(2) Project [value#10 AS col1#13, 2 AS 
col2#15]
 :  +- *(2) Filter (isnotnull(value#10) AND (value#10 = 2)) 
 :  +- *(2) Filter (isnotnull(value#10) AND 
(value#10 = 2))
 : +- *(2) LocalTableScan [value#10]
 : +- *(2) LocalTableScan [value#10]
 +- Project [col4#34]   
 +- Project [col4#34]
+- Join Inner, (col2#6 = col4#34)   
+- Join Inner, (col2#6 = col4#34)
   :- Project [value#31 AS col4#34] 
   :- Project [value#31 AS col4#34]
   :  +- LocalRelation [value#31]   
   :  +- LocalRelation [value#31]
   +- Project [col2#6]  
   +- Project [col2#6]
  +- Union false, false 
  +- Union false, false
 :- Project [1 AS col2#6]   
 :- Project [1 AS col2#6]
 :  +- LocalRelation [value#1]  
 :  +- LocalRelation [value#1]
 +- Project [2 AS col2#15]  
 +- Project [2 AS col2#15]
+- LocalRelation [value#10] 
+- LocalRelation [value#10]

```
and so the result is wrong:
```
+++
|col2|col4|
+++
|   1|null|
+++
```

After this PR foldable propagation will not happen incorrectly and the 
result is correct:
```
+++
|col2|col4|
+++
|   2|   2|
+++
```

### Why are the changes needed?
To fix a correctness issue.

### Does this PR introduce _any_ user-facing change?
Yes, fixes a correctness issue.

### How was this patch tested?
Existing and new UTs.

Closes #29805 from peter-toth/SPARK-32635-fix-foldable-propagation-2.4.

Authored-by: Peter Toth 
Signed-off-by: Do

[spark] branch branch-2.4 updated: [SPARK-32635][SQL][2.4] Fix foldable propagation

2020-09-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 62708db  [SPARK-32635][SQL][2.4] Fix foldable propagation
62708db is described below

commit 62708db4f90f652cd9bc73998ac5f1e949bd41ac
Author: Peter Toth 
AuthorDate: Fri Sep 18 10:28:30 2020 -0700

[SPARK-32635][SQL][2.4] Fix foldable propagation

### What changes were proposed in this pull request?
This PR rewrites `FoldablePropagation` rule to replace attribute references 
in a node with foldables coming only from the node's children.

Before this PR in the case of this example (with 
setting`spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation`):
```scala
val a = Seq("1").toDF("col1").withColumn("col2", lit("1"))
val b = Seq("2").toDF("col1").withColumn("col2", lit("2"))
val aub = a.union(b)
val c = aub.filter($"col1" === "2").cache()
val d = Seq("2").toDF( "col4")
val r = d.join(aub, $"col2" === $"col4").select("col4")
val l = c.select("col2")
val df = l.join(r, $"col2" === $"col4", "LeftOuter")
df.show()
```
foldable propagation happens incorrectly:
```
 Join LeftOuter, (col2#6 = col4#34) 
 Join LeftOuter, (col2#6 = col4#34)
!:- Project [col2#6]
 :- Project [1 AS col2#6]
 :  +- InMemoryRelation [col1#4, col2#6], StorageLevel(disk, memory, 
deserialized, 1 replicas)   :  +- InMemoryRelation [col1#4, col2#6], 
StorageLevel(disk, memory, deserialized, 1 replicas)
 :+- Union  
 :+- Union
 :   :- *(1) Project [value#1 AS col1#4, 1 AS col2#6]   
 :   :- *(1) Project [value#1 AS col1#4, 1 AS 
col2#6]
 :   :  +- *(1) Filter (isnotnull(value#1) AND (value#1 = 2))   
 :   :  +- *(1) Filter (isnotnull(value#1) AND 
(value#1 = 2))
 :   : +- *(1) LocalTableScan [value#1] 
 :   : +- *(1) LocalTableScan [value#1]
 :   +- *(2) Project [value#10 AS col1#13, 2 AS col2#15]
 :   +- *(2) Project [value#10 AS col1#13, 2 AS 
col2#15]
 :  +- *(2) Filter (isnotnull(value#10) AND (value#10 = 2)) 
 :  +- *(2) Filter (isnotnull(value#10) AND 
(value#10 = 2))
 : +- *(2) LocalTableScan [value#10]
 : +- *(2) LocalTableScan [value#10]
 +- Project [col4#34]   
 +- Project [col4#34]
+- Join Inner, (col2#6 = col4#34)   
+- Join Inner, (col2#6 = col4#34)
   :- Project [value#31 AS col4#34] 
   :- Project [value#31 AS col4#34]
   :  +- LocalRelation [value#31]   
   :  +- LocalRelation [value#31]
   +- Project [col2#6]  
   +- Project [col2#6]
  +- Union false, false 
  +- Union false, false
 :- Project [1 AS col2#6]   
 :- Project [1 AS col2#6]
 :  +- LocalRelation [value#1]  
 :  +- LocalRelation [value#1]
 +- Project [2 AS col2#15]  
 +- Project [2 AS col2#15]
+- LocalRelation [value#10] 
+- LocalRelation [value#10]

```
and so the result is wrong:
```
+++
|col2|col4|
+++
|   1|null|
+++
```

After this PR foldable propagation will not happen incorrectly and the 
result is correct:
```
+++
|col2|col4|
+++
|   2|   2|
+++
```

### Why are the changes needed?
To fix a correctness issue.

### Does this PR introduce _any_ user-facing change?
Yes, fixes a correctness issue.

### How was this patch tested?
Existing and new UTs.

Closes #29805 from peter-toth/SPARK-32635-fix-foldable-propagation-2.4.

Authored-by: Peter Toth 
Signed-off-by: Do

[spark] branch branch-2.4 updated: [SPARK-32635][SQL][2.4] Fix foldable propagation

2020-09-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 62708db  [SPARK-32635][SQL][2.4] Fix foldable propagation
62708db is described below

commit 62708db4f90f652cd9bc73998ac5f1e949bd41ac
Author: Peter Toth 
AuthorDate: Fri Sep 18 10:28:30 2020 -0700

[SPARK-32635][SQL][2.4] Fix foldable propagation

### What changes were proposed in this pull request?
This PR rewrites `FoldablePropagation` rule to replace attribute references 
in a node with foldables coming only from the node's children.

Before this PR in the case of this example (with 
setting`spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation`):
```scala
val a = Seq("1").toDF("col1").withColumn("col2", lit("1"))
val b = Seq("2").toDF("col1").withColumn("col2", lit("2"))
val aub = a.union(b)
val c = aub.filter($"col1" === "2").cache()
val d = Seq("2").toDF( "col4")
val r = d.join(aub, $"col2" === $"col4").select("col4")
val l = c.select("col2")
val df = l.join(r, $"col2" === $"col4", "LeftOuter")
df.show()
```
foldable propagation happens incorrectly:
```
 Join LeftOuter, (col2#6 = col4#34) 
 Join LeftOuter, (col2#6 = col4#34)
!:- Project [col2#6]
 :- Project [1 AS col2#6]
 :  +- InMemoryRelation [col1#4, col2#6], StorageLevel(disk, memory, 
deserialized, 1 replicas)   :  +- InMemoryRelation [col1#4, col2#6], 
StorageLevel(disk, memory, deserialized, 1 replicas)
 :+- Union  
 :+- Union
 :   :- *(1) Project [value#1 AS col1#4, 1 AS col2#6]   
 :   :- *(1) Project [value#1 AS col1#4, 1 AS 
col2#6]
 :   :  +- *(1) Filter (isnotnull(value#1) AND (value#1 = 2))   
 :   :  +- *(1) Filter (isnotnull(value#1) AND 
(value#1 = 2))
 :   : +- *(1) LocalTableScan [value#1] 
 :   : +- *(1) LocalTableScan [value#1]
 :   +- *(2) Project [value#10 AS col1#13, 2 AS col2#15]
 :   +- *(2) Project [value#10 AS col1#13, 2 AS 
col2#15]
 :  +- *(2) Filter (isnotnull(value#10) AND (value#10 = 2)) 
 :  +- *(2) Filter (isnotnull(value#10) AND 
(value#10 = 2))
 : +- *(2) LocalTableScan [value#10]
 : +- *(2) LocalTableScan [value#10]
 +- Project [col4#34]   
 +- Project [col4#34]
+- Join Inner, (col2#6 = col4#34)   
+- Join Inner, (col2#6 = col4#34)
   :- Project [value#31 AS col4#34] 
   :- Project [value#31 AS col4#34]
   :  +- LocalRelation [value#31]   
   :  +- LocalRelation [value#31]
   +- Project [col2#6]  
   +- Project [col2#6]
  +- Union false, false 
  +- Union false, false
 :- Project [1 AS col2#6]   
 :- Project [1 AS col2#6]
 :  +- LocalRelation [value#1]  
 :  +- LocalRelation [value#1]
 +- Project [2 AS col2#15]  
 +- Project [2 AS col2#15]
+- LocalRelation [value#10] 
+- LocalRelation [value#10]

```
and so the result is wrong:
```
+++
|col2|col4|
+++
|   1|null|
+++
```

After this PR foldable propagation will not happen incorrectly and the 
result is correct:
```
+++
|col2|col4|
+++
|   2|   2|
+++
```

### Why are the changes needed?
To fix a correctness issue.

### Does this PR introduce _any_ user-facing change?
Yes, fixes a correctness issue.

### How was this patch tested?
Existing and new UTs.

Closes #29805 from peter-toth/SPARK-32635-fix-foldable-propagation-2.4.

Authored-by: Peter Toth 
Signed-off-by: Do