[GitHub] spark issue #22277: [SPARK-25276] Redundant constrains when using alias

2018-09-06 Thread ajithme
Github user ajithme commented on the issue:

https://github.com/apache/spark/pull/22277
  
Attaching an SQL file to reproduce the issue and to see the effect of the PR:
[test.txt](https://github.com/apache/spark/files/2356468/test.txt)

### Without patch: 
```
spark-2.3.1-bin-hadoop2.7/bin # ./spark-sql -f test.txt
Time taken: 3.405 seconds
Time taken: 0.373 seconds
Time taken: 0.202 seconds
Time taken: 0.024 seconds
18/09/06 11:29:49 WARN HiveMetaStore: Location: file:/user/hive/warehouse/table11 specified for non-external table:table11
Time taken: 0.541 seconds
18/09/06 11:29:49 WARN HiveMetaStore: Location: file:/user/hive/warehouse/table22 specified for non-external table:table22
Time taken: 0.115 seconds
18/09/06 11:29:50 WARN HiveMetaStore: Location: file:/user/hive/warehouse/table33 specified for non-external table:table33
Time taken: 6.075 seconds
18/09/06 11:31:38 ERROR SparkSQLDriver: Failed in [
create table table44 as
select a.*
from
(
select
(concat(
case when a1 is null then '' else cast(a1 as string) end,'|~|',
case when a2 is null then '' else cast(a2 as string) end,'|~|',
case when a3 is null then '' else cast(a3 as string) end,'|~|',
case when a4 is null then '' else cast(a4 as string) end,'|~|',
case when a5 is null then '' else cast(a5 as string) end,'|~|',
case when a6 is null then '' else cast(a6 as string) end,'|~|',
case when a7 is null then '' else cast(a7 as string) end,'|~|',
case when a8 is null then '' else cast(a8 as string) end,'|~|',
case when a9 is null then '' else cast(a9 as string) end,'|~|',
case when a10 is null then '' else cast(a10 as string) end,'|~|',
case when a11 is null then '' else cast(a11 as string) end,'|~|',
case when a12 is null then '' else cast(a12 as string) end,'|~|',
case when a13 is null then '' else cast(a13 as string) end,'|~|',
case when a14 is null then '' else cast(a14 as string) end,'|~|',
case when a15 is null then '' else cast(a15 as string) end,'|~|',
case when a16 is null then '' else cast(a16 as string) end,'|~|',
case when a17 is null then '' else cast(a17 as string) end,'|~|',
case when a18 is null then '' else cast(a18 as string) end,'|~|',
case when a19 is null then '' else cast(a19 as string) end
)) as KEY_ID ,
case when a1 is null then '' else cast(a1 as string) end as a1,
case when a2 is null then '' else cast(a2 as string) end as a2,
case when a3 is null then '' else cast(a3 as string) end as a3,
case when a4 is null then '' else cast(a4 as string) end as a4,
case when a5 is null then '' else cast(a5 as string) end as a5,
case when a6 is null then '' else cast(a6 as string) end as a6,
case when a7 is null then '' else cast(a7 as string) end as a7,
case when a8 is null then '' else cast(a8 as string) end as a8,
case when a9 is null then '' else cast(a9 as string) end as a9,
case when a10 is null then '' else cast(a10 as string) end as a10,
case when a11 is null then '' else cast(a11 as string) end as a11,
case when a12 is null then '' else cast(a12 as string) end as a12,
case when a13 is null then '' else cast(a13 as string) end as a13,
case when a14 is null then '' else cast(a14 as string) end as a14,
case when a15 is null then '' else cast(a15 as string) end as a15,
case when a16 is null then '' else cast(a16 as string) end as a16,
case when a17 is null then '' else cast(a17 as string) end as a17,
case when a18 is null then '' else cast(a18 as string) end as a18,
case when a19 is null then '' else cast(a19 as string) end as a19
from table22
) A
left join table11 B ON A.KEY_ID = B.KEY_ID
where b.KEY_ID is null]
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.lang.Class.copyConstructors(Class.java:3130)
	at java.lang.Class.getConstructors(Class.java:1651)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$makeCopy$1.apply(TreeNode.scala:387)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$makeCopy$1.apply(TreeNode.scala:385)
	at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
	at org.apache.spark.sql.catalyst.trees.TreeNode.makeCopy(TreeNode.scala:385)
	at org.apache.spark.sql.catalyst.trees.TreeNode.withNewChildren(TreeNode.scala:244)
	at org.apache.spark.sql.catalyst.expressions.Expression.canonicalized$lzycompute(Expression.scala:190)
	at org.apache.spark.sql.catalyst.expressions.Expression.canonicalized(Expression.scala:188)
	at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$1.apply(Expression.scala:189)
	at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$1.apply(Expression.scala:189)
	at
```

[GitHub] spark issue #22277: [SPARK-25276] Redundant constrains when using alias

2018-09-06 Thread ajithme
Github user ajithme commented on the issue:

https://github.com/apache/spark/pull/22277
  
I see. But the code modified in this PR only handles the case where the alias is part of the projection. The query you mention does not seem to hit the current alias logic in __org.apache.spark.sql.catalyst.plans.logical.UnaryNode#getAliasedConstraints__, because for the outer query __a__ and __c__ are not aliases but rather AttributeReferences.

Do you mean we should cover the scenario where an alias is referenced in a filter as part of this PR?
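For context, the blow-up reported above can be modeled without Spark. Below is a toy sketch (Python, with made-up names; this is not Spark's actual `getAliasedConstraints` code) of how substituting every alias into every constraint grows the constraint set exponentially in the number of aliased attributes that a single expression mentions, which matches the OOM seen with the 19-column `concat`:

```python
# Toy model of how alias substitution can blow up a constraint set.
# When one expression (like the concat over a1..a19 in the repro) references
# many aliased attributes, rewriting each constraint once per alias choice
# grows the set multiplicatively. Names are illustrative, not Spark's API.
from itertools import product

def aliased_constraints(constraints, aliases):
    """For each constraint, emit one variant per combination of
    keep-the-attribute / substitute-its-alias choices it admits."""
    out = set()
    for c in constraints:
        # aliased attributes mentioned by this constraint (substring match
        # is good enough for this string-based toy)
        mentioned = [a for a in aliases if a in c]
        for choice in product(*[[a, aliases[a]] for a in mentioned]):
            variant = c
            for attr, repl in zip(mentioned, choice):
                variant = variant.replace(attr, repl)
            out.add(variant)
    return out

# one constraint mentioning n aliased attributes -> 2**n variants
aliases = {f"col{i:02d}": f"k{i:02d}" for i in range(1, 11)}
big = "isnotnull(" + ",".join(aliases) + ")"   # touches all 10 attributes
print(len(aliased_constraints({big}, aliases)))  # 1024 == 2**10
```

With 19 aliased columns as in the reproduction, the same scheme would yield on the order of 2^19 variants of a single constraint, which is consistent with the GC-overhead failure above.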


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22277: [SPARK-25276] Redundant constrains when using alias

2018-09-05 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/22277
  
You can have `select * from (select a, a as c from table1 where a > 10) t 
where a > c`
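To illustrate why propagating constraints to aliases matters for a query like this, here is a small sketch (Python with illustrative names, not Spark code): given the subquery's constraint `a > 10` and the alias `c = a`, closing the constraint set under alias substitution also yields `c > 10`, which the optimizer can then use together with the outer filter `a > c`.

```python
# Toy constraint propagation through aliases: from the subquery constraint
# "a > 10" and the alias c = a, derive "c > 10" for the outer query.
# Purely illustrative; Spark represents constraints as Expression trees.

def propagate(constraints, aliases):
    """Add, for every alias -> source attribute, a copy of each constraint
    with the source attribute replaced by the alias."""
    derived = set(constraints)
    for alias, source in aliases.items():
        derived |= {c.replace(source, alias) for c in constraints}
    return derived

inner = {"a > 10", "isnotnull(a)"}
print(sorted(propagate(inner, {"c": "a"})))
# the derived set contains "c > 10" and "isnotnull(c)" alongside the originals
```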


---




[GitHub] spark issue #22277: [SPARK-25276] Redundant constrains when using alias

2018-09-05 Thread ajithme
Github user ajithme commented on the issue:

https://github.com/apache/spark/pull/22277
  
@jiangxb1987 Thank you for the feedback. A couple of points:
1. If we introduce a predicate which refers to the alias (as you mentioned, `a > z`), it will throw an error:
```
spark-sql> create table table1 (a int);
18/09/05 13:00:28 WARN HiveMetaStore: Location: file:/user/hive/warehouse/table1 specified for non-external table:table1
Time taken: 0.152 seconds

spark-sql> select a, a as c from table1 where a > 10 and a > c;
18/09/05 13:01:04 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Error in query: cannot resolve '`c`' given input columns: [table1.a]; line 1 pos 50;
'Project ['a, 'a AS c#6]
+- 'Filter ((a#7 > 10) && (a#7 > 'c))
   +- SubqueryAlias table1
      +- HiveTableRelation `default`.`table1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#7]
```
So I think `a > z` is an invalid scenario; please correct me if I am wrong.

2. If we add a self-referring predicate like __a > a__ instead of __a > z__, the PR still produces a valid constraint list:
```
  (x#5 > x#5),(b#1 <=> y#6),(x#5 > 10),(z#7 <=> x#5),isnotnull(x#5)
```


---




[GitHub] spark issue #22277: [SPARK-25276] Redundant constrains when using alias

2018-09-04 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/22277
  
Thank you for your interest in this issue; however, I don't think the changes proposed in this PR are valid. Consider that you have another predicate like `a > z`; it is surely desired to infer a new constraint `z > z`. Please correct me if I'm wrong about this.


---




[GitHub] spark issue #22277: [SPARK-25276] Redundant constrains when using alias

2018-08-31 Thread ajithme
Github user ajithme commented on the issue:

https://github.com/apache/spark/pull/22277
  
@gatorsmile and @jiangxb1987 any inputs?


---




[GitHub] spark issue #22277: [SPARK-25276] Redundant constrains when using alias

2018-08-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22277
  
Can one of the admins verify this patch?


---
