[jira] [Assigned] (SPARK-34682) Regression in "operating on canonicalized plan" check in CustomShuffleReaderExec

2021-03-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-34682:


Assignee: Andy Grove

> Regression in "operating on canonicalized plan" check in 
> CustomShuffleReaderExec
> 
>
> Key: SPARK-34682
> URL: https://issues.apache.org/jira/browse/SPARK-34682
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Minor
> Fix For: 3.2.0, 3.1.2
>
>
> In Spark 3.0.2, attempting to execute a canonicalized version of 
> CustomShuffleReaderExec fails with the error "operating on canonicalized 
> plan", as expected.
> There is a regression in Spark 3.1.1: this check can never be reached, 
> because a new call to sendDriverMetrics was added before it. That method 
> fails when operating on a canonicalized plan, because it assumes the 
> existence of metrics that are not populated for a canonicalized plan.
> {code:java}
> private lazy val shuffleRDD: RDD[_] = {
>   sendDriverMetrics()
>   shuffleStage.map { stage =>
>     stage.shuffle.getShuffleRDD(partitionSpecs.toArray)
>   }.getOrElse {
>     throw new IllegalStateException("operating on canonicalized plan")
>   }
> }{code}
> The specific error looks like this:
> {code:java}
> java.util.NoSuchElementException: key not found: numPartitions
>   at scala.collection.immutable.Map$EmptyMap$.apply(Map.scala:101)
>   at scala.collection.immutable.Map$EmptyMap$.apply(Map.scala:99)
>   at org.apache.spark.sql.execution.adaptive.CustomShuffleReaderExec.sendDriverMetrics(CustomShuffleReaderExec.scala:122)
>   at org.apache.spark.sql.execution.adaptive.CustomShuffleReaderExec.shuffleRDD$lzycompute(CustomShuffleReaderExec.scala:182)
>   at org.apache.spark.sql.execution.adaptive.CustomShuffleReaderExec.shuffleRDD(CustomShuffleReaderExec.scala:181)
>   at org.apache.spark.sql.execution.adaptive.CustomShuffleReaderExec.doExecuteColumnar(CustomShuffleReaderExec.scala:196)
> {code}
> I think the fix is simply to skip the sendDriverMetrics call when the plan 
> is canonicalized; I am planning to open a PR for this.
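> One possible shape of that fix is sketched below. This is a hypothetical 
> illustration, not the actual patch: it assumes (as the getOrElse branch 
> above implies) that shuffleStage is None for a canonicalized plan, and it 
> moves the metrics call inside the map so it only runs for a real stage.
> {code:java}
> private lazy val shuffleRDD: RDD[_] = {
>   shuffleStage.map { stage =>
>     // shuffleStage is defined only for a non-canonicalized plan, so it is
>     // safe to report driver metrics here; the metrics map is populated.
>     sendDriverMetrics()
>     stage.shuffle.getShuffleRDD(partitionSpecs.toArray)
>   }.getOrElse {
>     // Canonicalized plans have no shuffle stage and must not be executed.
>     throw new IllegalStateException("operating on canonicalized plan")
>   }
> }{code}
> An alternative with the same effect would be to guard the existing call, 
> e.g. {{if (!isCanonicalizedPlan) sendDriverMetrics()}} (the guard name here 
> is assumed, not taken from the Spark source).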



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34682) Regression in "operating on canonicalized plan" check in CustomShuffleReaderExec

2021-03-09 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34682:


Assignee: Apache Spark




[jira] [Assigned] (SPARK-34682) Regression in "operating on canonicalized plan" check in CustomShuffleReaderExec

2021-03-09 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34682:


Assignee: (was: Apache Spark)
