[GitHub] spark issue #16998: [SPARK-19665][SQL] Improve constraint propagation

2017-03-31 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16998
  
Now that we've added the flag, this issue is no longer urgent. I'll close it for now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16998: [SPARK-19665][SQL] Improve constraint propagation

2017-03-06 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16998
  
> Not really. Constraint propagation will still be enabled by default in 
Spark. The flag would just be a hammer to quickly get around issues like this 
and SPARK-17733.

Yeah, of course. I meant that when you turn the flag off, you lose the 
optimizations that rely on constraint propagation.

I will create another PR for this option.

> I'll take a closer look at this patch but given that this PR is primarily 
introducing a data structure that keeps track of aliased constraints, is there 
a fundamental reason for changing the underlying behavior (and restricting the 
optimization space)? Can there be a simpler alternative where we can still keep 
the old semantics?

I haven't found an alternative fix that keeps the old semantics without 
changing the propagation structure while also substantially improving 
performance.

#16785 keeps the old semantics and doesn't change the propagation structure, 
but it only cuts the benchmark's running time in half.

Adding the flag is a simpler option.






[GitHub] spark issue #16998: [SPARK-19665][SQL] Improve constraint propagation

2017-03-06 Thread sameeragarwal
Github user sameeragarwal commented on the issue:

https://github.com/apache/spark/pull/16998
  
@viirya I'll take a closer look at this patch but given that this PR is 
primarily introducing a data structure that keeps track of aliased constraints, 
is there a fundamental reason for changing the underlying behavior (and 
restricting the optimization space)? Can there be a simpler alternative where 
we can still keep the old semantics?





[GitHub] spark issue #16998: [SPARK-19665][SQL] Improve constraint propagation

2017-03-06 Thread sameeragarwal
Github user sameeragarwal commented on the issue:

https://github.com/apache/spark/pull/16998
  
> As we use constraints in optimization, if we turn off constraint 
inference/propagation, wouldn't it miss optimization chance for query plans?

Not really. Constraint propagation will still be enabled by default in 
Spark. The flag would just be a hammer to quickly get around issues like this 
and SPARK-17733.
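A sketch of how such an escape hatch would be used, assuming the configuration 
is named `spark.sql.constraintPropagation.enabled` (the name such a flag took 
in later Spark releases; treat the key as an assumption here, not something 
stated in this thread). This is a session-config fragment, not a standalone 
program:

```scala
// Escape hatch: turn off constraint propagation for a session when a
// query hits pathological planning time (e.g. this issue or SPARK-17733).
// Key name assumed; constraint propagation stays on by default.
spark.conf.set("spark.sql.constraintPropagation.enabled", "false")

// ... run the problematic query ...

// Re-enable afterwards to get the constraint-based optimizations back.
spark.conf.set("spark.sql.constraintPropagation.enabled", "true")
```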





[GitHub] spark issue #16998: [SPARK-19665][SQL] Improve constraint propagation

2017-03-04 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16998
  
@hvanhovell Do you have any thoughts on this already? Please let me know. 
Thanks!





[GitHub] spark issue #16998: [SPARK-19665][SQL] Improve constraint propagation

2017-03-01 Thread chriso
Github user chriso commented on the issue:

https://github.com/apache/spark/pull/16998
  
I just ran into the same issue with `spark-2.1.0`

Here's a minimal test case:

```scala
// Assumes a spark-shell / SparkSession context named `spark`.
import org.apache.spark.sql.functions.{col, lit}
import spark.implicits._

val max = 12 // try increasing this

val filter = (for (i <- 0 to max) yield col("value") <=> i) reduce (_ || _)
val projections = for (i <- 0 to max) yield (col("value") <=> i).as(s"value_$i")

val otherFilter = lit(true) // this can be anything

val df = Seq.empty[Int].toDF
val result = df.filter(otherFilter)
  .select(col("value") +: projections: _*)
  .filter(filter).filter(otherFilter)

result.explain
```

The final `explain` call hangs 
[here](https://github.com/apache/spark/blob/db0ddce523bb823cba996e92ef36ceca31492d2c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala#L325-L328).
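The hang is consistent with combinatorial constraint inference: each projected 
alias of `col("value")` multiplies the candidate constraints. A toy model in 
plain Scala (hypothetical function, not Spark's actual implementation) of that 
growth:

```scala
// Toy model of the blow-up: if an expression has `aliases` equivalent
// forms and appears in `positions` slots of a predicate, substituting
// every alias into every slot yields aliases^positions candidates.
def inferredVariants(aliases: Int, positions: Int): BigInt =
  BigInt(aliases).pow(positions)
```

With `max = 12` above there are 13 aliases of `col("value")`, so even a few 
occurrences per predicate yields millions of candidates, which matches 
`explain` appearing to hang.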





[GitHub] spark issue #16998: [SPARK-19665][SQL] Improve constraint propagation

2017-02-28 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16998
  
@hvanhovell Do you have time to review this?

