[jira] [Created] (SPARK-49819) Disable CollapseProject for correlated subqueries in projection over aggregate correctly
Nick Young created SPARK-49819:
----------------------------------

             Summary: Disable CollapseProject for correlated subqueries in projection over aggregate correctly
                 Key: SPARK-49819
                 URL: https://issues.apache.org/jira/browse/SPARK-49819
             Project: Spark
          Issue Type: Bug
          Components: Optimizer
    Affects Versions: 4.0.0
            Reporter: Nick Young

CollapseProject should block collapsing with an aggregate if any correlated subquery is present. Correlated subqueries other than ScalarSubquery exist and are not currently accounted for.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
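The guard described in SPARK-49819 can be illustrated with a toy expression model. This is only a sketch: the types below (`Expr`, `ScalarSub`, `ExistsSub`, `InSub`) and the `CollapseGuard` object are hypothetical stand-ins, not Catalyst classes, and the "scalar-only" check reflects what the description implies about the current behavior rather than the actual Spark source.

```scala
// Toy expression model. These types are illustrative and are NOT Catalyst classes.
sealed trait Expr { def children: Seq[Expr] }
case class Column(name: String) extends Expr { val children: Seq[Expr] = Nil }
case class Add(left: Expr, right: Expr) extends Expr { val children: Seq[Expr] = Seq(left, right) }
// Subquery expressions; `outerRefs` lists the outer columns they reference.
case class ScalarSub(outerRefs: Seq[String]) extends Expr { val children: Seq[Expr] = Nil }
case class ExistsSub(outerRefs: Seq[String]) extends Expr { val children: Seq[Expr] = Nil }
case class InSub(value: Expr, outerRefs: Seq[String]) extends Expr { val children: Seq[Expr] = Seq(value) }

object CollapseGuard {
  // Assumed narrow check: only scalar subqueries with outer references count.
  def hasCorrelatedScalarSubquery(e: Expr): Boolean = e match {
    case ScalarSub(refs) => refs.nonEmpty
    case other           => other.children.exists(hasCorrelatedScalarSubquery)
  }

  // Broader check: any kind of subquery with outer references counts.
  def hasCorrelatedSubquery(e: Expr): Boolean = e match {
    case ScalarSub(refs) => refs.nonEmpty
    case ExistsSub(refs) => refs.nonEmpty
    case InSub(v, refs)  => refs.nonEmpty || hasCorrelatedSubquery(v)
    case other           => other.children.exists(hasCorrelatedSubquery)
  }

  // Collapsing a projection into an aggregate is allowed only when no
  // expression in the project list contains a correlated subquery.
  def canCollapseOverAggregate(projectList: Seq[Expr]): Boolean =
    !projectList.exists(hasCorrelatedSubquery)
}
```

In this model a correlated `ExistsSub` nested in a projection passes the narrow check unnoticed but is caught by the broader one, which is the gap the issue describes.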
[jira] [Commented] (SPARK-49699) Disable unsuitable Optimizer rules for streaming and side-effect subplans
[ https://issues.apache.org/jira/browse/SPARK-49699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882771#comment-17882771 ]

Nick Young commented on SPARK-49699:
------------------------------------

Assign to [~n-young-db]

> Disable unsuitable Optimizer rules for streaming and side-effect subplans
> -------------------------------------------------------------------------
>
>                 Key: SPARK-49699
>                 URL: https://issues.apache.org/jira/browse/SPARK-49699
>             Project: Spark
>          Issue Type: Story
>          Components: Optimizer
>    Affects Versions: 4.0.0
>            Reporter: Nick Young
>            Priority: Major
>
> Various optimizer rules are unsuitable for streaming or side-effect
> subplans. Disable them, and roll out the change carefully so as not to
> break existing queries.
[jira] [Created] (SPARK-49699) Disable unsuitable Optimizer rules for streaming and side-effect subplans
Nick Young created SPARK-49699:
----------------------------------

             Summary: Disable unsuitable Optimizer rules for streaming and side-effect subplans
                 Key: SPARK-49699
                 URL: https://issues.apache.org/jira/browse/SPARK-49699
             Project: Spark
          Issue Type: Story
          Components: Optimizer
    Affects Versions: 4.0.0
            Reporter: Nick Young

Various optimizer rules are unsuitable for streaming or side-effect subplans. Disable them, and roll out the change carefully so as not to break existing queries.
[jira] [Created] (SPARK-48915) Add inequality (!=, <, <=, >, >=) predicates for correlation in GeneratedSubquerySuite
Nick Young created SPARK-48915:
----------------------------------

             Summary: Add inequality (!=, <, <=, >, >=) predicates for correlation in GeneratedSubquerySuite
                 Key: SPARK-48915
                 URL: https://issues.apache.org/jira/browse/SPARK-48915
             Project: Spark
          Issue Type: Improvement
          Components: Optimizer
    Affects Versions: 4.0.0
            Reporter: Nick Young

{{GeneratedSubquerySuite}} is a test suite that generates SQL with variations of subqueries. Currently, the supported operators are Joins, Set Operations, Aggregate (with/without group by) and Limit. Implementing inequality (!=, <, <=, >, >=) predicates for correlation would extend coverage along one additional axis and should be simple to implement.
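As a rough illustration of what the extra axis could look like, the sketch below enumerates one correlated query per comparison operator. It does not use the real `GeneratedSubquerySuite` helpers; the object name, the table names `outer_t` and `inner_t`, and the query shape are all assumptions made for the example.

```scala
object CorrelationPredicates {
  // Equality plus the inequality operators listed in the issue summary.
  val operators: Seq[String] = Seq("=", "!=", "<", "<=", ">", ">=")

  // Builds one query whose scalar subquery is correlated with the outer
  // table through the given comparison operator (hypothetical tables).
  def query(op: String): String =
    s"SELECT a, (SELECT COUNT(*) FROM inner_t WHERE inner_t.b $op outer_t.a) FROM outer_t"

  // One generated query per operator: the additional coverage axis.
  def allQueries: Seq[String] = operators.map(query)
}
```

Crossing this operator list with the existing axes (joins, set operations, aggregates, limit) would multiply the number of generated queries by the number of operators.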
[jira] [Created] (SPARK-48656) ArrayIndexOutOfBoundsException in CartesianRDD getPartitions
Nick Young created SPARK-48656:
----------------------------------

             Summary: ArrayIndexOutOfBoundsException in CartesianRDD getPartitions
                 Key: SPARK-48656
                 URL: https://issues.apache.org/jira/browse/SPARK-48656
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 4.0.0
            Reporter: Nick Young

```
val rdd1 = spark.sparkContext.parallelize(Seq(1, 2, 3), numSlices = 65536)
val rdd2 = spark.sparkContext.parallelize(Seq(1, 2, 3), numSlices = 65536)
rdd2.cartesian(rdd1).partitions
```

Throws `ArrayIndexOutOfBoundsException: 0` at CartesianRDD.scala:69 because `s1.index * numPartitionsInRdd2 + s2.index` overflows and wraps to 0.

We should provide a better error message that indicates that the number of partitions overflows, so that it is easier for the user to debug.
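The arithmetic behind the report can be reproduced without Spark: 65536 * 65536 is 2^32, which does not fit in a 32-bit `Int` and wraps to 0. The sketch below contrasts the silent wrap with a checked multiplication using `java.lang.Math.multiplyExact`. The `PartitionCount` object and the wording of the error message are hypothetical, shown only as one possible shape for a clearer error, not as the fix adopted in Spark.

```scala
object PartitionCount {
  // Plain Int multiplication wraps silently on overflow.
  def unchecked(n1: Int, n2: Int): Int = n1 * n2

  // Checked multiplication: rethrows overflow as a descriptive error.
  def checked(n1: Int, n2: Int): Int =
    try java.lang.Math.multiplyExact(n1, n2)
    catch {
      case _: ArithmeticException =>
        throw new IllegalArgumentException(
          s"Cartesian product of $n1 x $n2 partitions exceeds Int.MaxValue (${Int.MaxValue})")
    }
}
```

Calling `PartitionCount.unchecked(65536, 65536)` yields 0, while `PartitionCount.checked(65536, 65536)` fails immediately with a message that names both partition counts.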
[jira] [Created] (SPARK-46693) Inject LocalLimitExec when matching OffsetAndLimit or LimitAndOffset
Nick Young created SPARK-46693:
----------------------------------

             Summary: Inject LocalLimitExec when matching OffsetAndLimit or LimitAndOffset
                 Key: SPARK-46693
                 URL: https://issues.apache.org/jira/browse/SPARK-46693
             Project: Spark
          Issue Type: Improvement
          Components: Optimizer
    Affects Versions: 3.5.0
            Reporter: Nick Young

For queries containing both a LIMIT and an OFFSET in a subquery, physical planning mistakenly drops the `LocalLimit` node planned in the optimizer stage; this manifests as larger-than-necessary shuffle sizes for `GlobalLimitExec`. The fix is to keep this node.
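The effect on shuffle size can be sketched with plain Scala collections standing in for partitions. This is a simulation, not Spark code: the `LimitOffsetSketch` object is hypothetical, and it assumes the local limit keeps `limit + offset` rows per partition before the rows are gathered for the global step.

```scala
object LimitOffsetSketch {
  // Rows sent to the global step when no per-partition limit is applied.
  def shuffledWithoutLocalLimit(parts: Seq[Seq[Int]]): Seq[Int] =
    parts.flatten

  // Rows sent to the global step when each partition first keeps
  // at most limit + offset rows (the assumed local limit).
  def shuffledWithLocalLimit(parts: Seq[Seq[Int]], limit: Int, offset: Int): Seq[Int] =
    parts.flatMap(_.take(limit + offset))

  // Global step: skip `offset` rows, then keep `limit` rows.
  def globalLimit(rows: Seq[Int], limit: Int, offset: Int): Seq[Int] =
    rows.drop(offset).take(limit)
}
```

With three partitions of 100 rows each, a limit of 5 and an offset of 2, the version without a local limit moves 300 rows and the version with it moves 21, while the global step returns the same rows in both cases.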