[ 
https://issues.apache.org/jira/browse/SPARK-32615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leanken.Lin updated SPARK-32615:
--------------------------------
    Description: 
{code:java}
// Reproduce Step
sql/test-only org.apache.spark.sql.execution.adaptive.AdaptiveQueryExecSuite -- -z "SPARK-32573: Eliminate NAAJ when BuildSide is EmptyHashedRelationWithAllNullKeys"
{code}
{code:java}
// Error Message
14:40:44.089 ERROR org.apache.spark.util.Utils: Uncaught exception in thread element-tracking-store-worker
14:40:44.089 ERROR org.apache.spark.util.Utils: Uncaught exception in thread element-tracking-store-worker
java.util.NoSuchElementException: key not found: 12
  at scala.collection.immutable.Map$Map1.apply(Map.scala:114)
  at org.apache.spark.sql.execution.ui.SQLAppStatusListener.$anonfun$aggregateMetrics$11(SQLAppStatusListener.scala:257)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
  at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
  at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
  at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
  at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
  at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
  at scala.collection.TraversableLike.map(TraversableLike.scala:238)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
  at scala.collection.AbstractTraversable.map(Traversable.scala:108)
  at org.apache.spark.sql.execution.ui.SQLAppStatusListener.aggregateMetrics(SQLAppStatusListener.scala:256)
  at org.apache.spark.sql.execution.ui.SQLAppStatusListener.$anonfun$onExecutionEnd$2(SQLAppStatusListener.scala:365)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at org.apache.spark.util.Utils$.tryLog(Utils.scala:1971)
  at org.apache.spark.status.ElementTrackingStore$$anon$1.run(ElementTrackingStore.scala:117)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
[info] - SPARK-32573: Eliminate NAAJ when BuildSide is EmptyHashedRelationWithAllNullKeys (2 seconds, 14 milliseconds)
{code}
This issue occurs because, during AQE, a sub-plan can change while the query 
is running, and the recorded metric types are overwritten by those of the new 
plan. In this UT, for example, a BroadcastHashJoinExec is replaced by a 
LocalTableScanExec. The onExecutionEnd action then tries to aggregate all 
metrics collected during the execution, including ones from the old plan. 
Because metricsType has already been updated for the rewritten plan, looking 
up an old metric ID throws NoSuchElementException. So we need to filter out 
those outdated metrics.

  was:
{code:java}
// Reproduce Step
sql/test-only org.apache.spark.sql.execution.adaptive.AdaptiveQueryExecSuite -- -z "SPARK-32573: Eliminate NAAJ when BuildSide is EmptyHashedRelationWithAllNullKeys"
{code}
{code:java}
// Error Message
14:40:44.089 ERROR org.apache.spark.util.Utils: Uncaught exception in thread element-tracking-store-worker
14:40:44.089 ERROR org.apache.spark.util.Utils: Uncaught exception in thread element-tracking-store-worker
java.util.NoSuchElementException: key not found: 12
  at scala.collection.immutable.Map$Map1.apply(Map.scala:114)
  at org.apache.spark.sql.execution.ui.SQLAppStatusListener.$anonfun$aggregateMetrics$11(SQLAppStatusListener.scala:257)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
  at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
  at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
  at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
  at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
  at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
  at scala.collection.TraversableLike.map(TraversableLike.scala:238)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
  at scala.collection.AbstractTraversable.map(Traversable.scala:108)
  at org.apache.spark.sql.execution.ui.SQLAppStatusListener.aggregateMetrics(SQLAppStatusListener.scala:256)
  at org.apache.spark.sql.execution.ui.SQLAppStatusListener.$anonfun$onExecutionEnd$2(SQLAppStatusListener.scala:365)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at org.apache.spark.util.Utils$.tryLog(Utils.scala:1971)
  at org.apache.spark.status.ElementTrackingStore$$anon$1.run(ElementTrackingStore.scala:117)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
[info] - SPARK-32573: Eliminate NAAJ when BuildSide is EmptyHashedRelationWithAllNullKeys (2 seconds, 14 milliseconds)
{code}
This issue occurs because, during AQE, a sub-plan can change while the query 
is running, and the metrics update is overwritten. In this UT, for example, a 
BroadcastHashJoinExec is replaced by a LocalTableScanExec, and the 
onExecutionEnd action then tries to aggregate all metrics collected during the 
execution, which causes a NoSuchElementException.


> Fix AQE aggregateMetrics java.util.NoSuchElementException
> ---------------------------------------------------------
>
>                 Key: SPARK-32615
>                 URL: https://issues.apache.org/jira/browse/SPARK-32615
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Leanken.Lin
>            Priority: Minor
>
> {code:java}
> // Reproduce Step
> sql/test-only org.apache.spark.sql.execution.adaptive.AdaptiveQueryExecSuite -- -z "SPARK-32573: Eliminate NAAJ when BuildSide is EmptyHashedRelationWithAllNullKeys"
> {code}
> {code:java}
> // Error Message
> 14:40:44.089 ERROR org.apache.spark.util.Utils: Uncaught exception in thread element-tracking-store-worker
> 14:40:44.089 ERROR org.apache.spark.util.Utils: Uncaught exception in thread element-tracking-store-worker
> java.util.NoSuchElementException: key not found: 12
>   at scala.collection.immutable.Map$Map1.apply(Map.scala:114)
>   at org.apache.spark.sql.execution.ui.SQLAppStatusListener.$anonfun$aggregateMetrics$11(SQLAppStatusListener.scala:257)
>   at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>   at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
>   at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
>   at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
>   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
>   at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at org.apache.spark.sql.execution.ui.SQLAppStatusListener.aggregateMetrics(SQLAppStatusListener.scala:256)
>   at org.apache.spark.sql.execution.ui.SQLAppStatusListener.$anonfun$onExecutionEnd$2(SQLAppStatusListener.scala:365)
>   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>   at org.apache.spark.util.Utils$.tryLog(Utils.scala:1971)
>   at org.apache.spark.status.ElementTrackingStore$$anon$1.run(ElementTrackingStore.scala:117)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> [info] - SPARK-32573: Eliminate NAAJ when BuildSide is EmptyHashedRelationWithAllNullKeys (2 seconds, 14 milliseconds)
> {code}
> This issue occurs because, during AQE, a sub-plan can change while the query 
> is running, and the recorded metric types are overwritten by those of the new 
> plan. In this UT, for example, a BroadcastHashJoinExec is replaced by a 
> LocalTableScanExec. The onExecutionEnd action then tries to aggregate all 
> metrics collected during the execution, including ones from the old plan. 
> Because metricsType has already been updated for the rewritten plan, looking 
> up an old metric ID throws NoSuchElementException. So we need to filter out 
> those outdated metrics.



