[jira] [Commented] (SPARK-31923) Event log cannot be generated when some internal accumulators use unexpected types

2021-04-12 Thread Jean-Yves STEPHAN (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17319652#comment-17319652
 ] 

Jean-Yves STEPHAN commented on SPARK-31923:
---

After further investigation, I confirmed this is a red herring - our end user 
was using Spark 3.0.0 after all. Thanks for checking. 

> Event log cannot be generated when some internal accumulators use unexpected 
> types
> --
>
> Key: SPARK-31923
> URL: https://issues.apache.org/jira/browse/SPARK-31923
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 
> 2.4.3, 2.4.4, 2.4.5, 2.4.6
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Major
> Fix For: 2.4.7, 3.0.1, 3.1.0
>
>
> A user may use internal accumulators by adding the "internal.metrics." prefix 
> to the accumulator name to hide sensitive information from the UI (all 
> accumulators except internal ones are shown in the Spark UI).
> However, *org.apache.spark.util.JsonProtocol.accumValueToJson* assumes an 
> internal accumulator has only 3 possible types: int, long, and 
> java.util.List[(BlockId, BlockStatus)]. When an internal accumulator uses an 
> unexpected type, it will crash.
> An event that contains such an accumulator will be dropped from the event log 
> because it cannot be converted to JSON, and this causes weird UI issues when 
> rendering in the Spark History Server. For example, if `SparkListenerTaskEnd` 
> is dropped because of this issue, the user will see the task as still running 
> even though it has finished.
> It's better to make *accumValueToJson* more robust.
> 
> How to reproduce it:
> - Enable Spark event log
> - Run the following command:
> {code}
> scala> val accu = sc.doubleAccumulator("internal.metrics.foo")
> accu: org.apache.spark.util.DoubleAccumulator = DoubleAccumulator(id: 0, 
> name: Some(internal.metrics.foo), value: 0.0)
> scala> sc.parallelize(1 to 1, 1).foreach { _ => accu.add(1.0) }
> 20/06/06 16:11:27 ERROR AsyncEventQueue: Listener EventLoggingListener threw 
> an exception
> java.lang.ClassCastException: java.lang.Double cannot be cast to 
> java.util.List
>   at 
> org.apache.spark.util.JsonProtocol$.accumValueToJson(JsonProtocol.scala:330)
>   at 
> org.apache.spark.util.JsonProtocol$$anonfun$accumulableInfoToJson$3.apply(JsonProtocol.scala:306)
>   at 
> org.apache.spark.util.JsonProtocol$$anonfun$accumulableInfoToJson$3.apply(JsonProtocol.scala:306)
>   at scala.Option.map(Option.scala:146)
>   at 
> org.apache.spark.util.JsonProtocol$.accumulableInfoToJson(JsonProtocol.scala:306)
>   at 
> org.apache.spark.util.JsonProtocol$$anonfun$accumulablesToJson$2.apply(JsonProtocol.scala:299)
>   at 
> org.apache.spark.util.JsonProtocol$$anonfun$accumulablesToJson$2.apply(JsonProtocol.scala:299)
>   at scala.collection.immutable.List.map(List.scala:284)
>   at 
> org.apache.spark.util.JsonProtocol$.accumulablesToJson(JsonProtocol.scala:299)
>   at 
> org.apache.spark.util.JsonProtocol$.taskInfoToJson(JsonProtocol.scala:291)
>   at 
> org.apache.spark.util.JsonProtocol$.taskEndToJson(JsonProtocol.scala:145)
>   at 
> org.apache.spark.util.JsonProtocol$.sparkEventToJson(JsonProtocol.scala:76)
>   at 
> org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:138)
>   at 
> org.apache.spark.scheduler.EventLoggingListener.onTaskEnd(EventLoggingListener.scala:158)
>   at 
> org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:45)
>   at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
>   at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
>   at 
> org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:91)
>   at 
> org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$super$postToAll(AsyncEventQueue.scala:92)
>   at 
> org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply$mcJ$sp(AsyncEventQueue.scala:92)
>   at 
> org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
>   at 
> org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
>   at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
>   at 
> org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:87)
>   at 
> 
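The quoted description asks for a more robust *accumValueToJson*. Below is a 
minimal, hypothetical sketch of the defensive shape such a method could take; 
it is NOT the actual patch from the linked pull requests, the name 
safeAccumValueToJson is made up, and the string fallback is an assumption for 
illustration only. It can be pasted into spark-shell, where json4s is already 
on the classpath.

{code}
// Hypothetical sketch only -- NOT the actual Spark fix.
// Keeps the three expected cases and adds a safe fallback instead of an
// unchecked cast, so an unexpected value type (e.g. a Double or a
// java.util.Collections$SynchronizedSet) no longer throws ClassCastException.
import org.json4s.JsonAST._
import scala.collection.JavaConverters._

def safeAccumValueToJson(value: Any): JValue = value match {
  case v: Int  => JInt(v)
  case v: Long => JInt(v)
  case v: java.util.List[_] =>
    // Real Spark renders (BlockId, BlockStatus) pairs here; this sketch just
    // stringifies the elements to stay self-contained.
    JArray(v.asScala.toList.map(e => JString(String.valueOf(e))))
  case other =>
    // Assumed fallback for illustration: stringify rather than crash.
    JString(String.valueOf(other))
}
{code}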

[jira] [Commented] (SPARK-31923) Event log cannot be generated when some internal accumulators use unexpected types

2021-04-12 Thread Shixiong Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17319631#comment-17319631
 ] 

Shixiong Zhu commented on SPARK-31923:
--

Do you have a reproduction? It's weird to see 
`java.util.Collections$SynchronizedSet cannot be cast to java.util.List` since 
before calling `v.asScala.toList`, the pattern match `case v: 
java.util.List[_]` should not accept `java.util.Collections$SynchronizedSet`.
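
A quick standalone check in plain Scala (not Spark code) illustrates the point: 
the runtime pattern match on `java.util.List[_]` rejects a SynchronizedSet, 
because that class implements java.util.Set, not java.util.List.

{code}
// Standalone check: a java.util.Collections$SynchronizedSet does not match
// a `case _: java.util.List[_]` branch.
import java.util.Collections

object SynchronizedSetMatchCheck {
  def main(args: Array[String]): Unit = {
    val value: Any = Collections.synchronizedSet(new java.util.HashSet[String]())
    val result = value match {
      case _: java.util.List[_] => "matched the List branch"
      case other                => s"fell through: ${other.getClass.getName}"
    }
    // Prints: fell through: java.util.Collections$SynchronizedSet
    println(result)
  }
}
{code}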


[jira] [Commented] (SPARK-31923) Event log cannot be generated when some internal accumulators use unexpected types

2021-04-12 Thread Jean-Yves STEPHAN (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17319622#comment-17319622
 ] 

Jean-Yves STEPHAN commented on SPARK-31923:
---

Hi [~zsxwing] :) 



Despite your patch, we're running into the same issue while using Spark 3.0.1. 
The stack trace is informative (for the line numbers, refer to this file: 
https://github.com/apache/spark/blob/v3.0.1/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala#L354).

The problem is that we're given a value of class 
java.util.Collections$SynchronizedSet that enters the branch
{code:java}
case v: java.util.List[_] =>
{code}
but the cast performed by v.asScala.toList on the next line fails. 

 

{code}
21/04/12 15:11:40 ERROR AsyncEventQueue: Listener EventLoggingListener threw an exception
java.lang.ClassCastException: java.util.Collections$SynchronizedSet cannot be cast to java.util.List
  at org.apache.spark.util.JsonProtocol$.accumValueToJson(JsonProtocol.scala:355)
  at org.apache.spark.util.JsonProtocol$.$anonfun$accumulableInfoToJson$4(JsonProtocol.scala:331)
  at scala.Option.map(Option.scala:230)
  at org.apache.spark.util.JsonProtocol$.accumulableInfoToJson(JsonProtocol.scala:331)
  at org.apache.spark.util.JsonProtocol$.$anonfun$accumulablesToJson$3(JsonProtocol.scala:324)
  at scala.collection.immutable.List.map(List.scala:290)
  at org.apache.spark.util.JsonProtocol$.accumulablesToJson(JsonProtocol.scala:324)
  at org.apache.spark.util.JsonProtocol$.taskInfoToJson(JsonProtocol.scala:316)
  at org.apache.spark.util.JsonProtocol$.taskEndToJson(JsonProtocol.scala:151)
  at org.apache.spark.util.JsonProtocol$.sparkEventToJson(JsonProtocol.scala:79)
  at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:97)
  at org.apache.spark.scheduler.EventLoggingListener.onTaskEnd(EventLoggingListener.scala:119)
  at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45)
  at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
  at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
  at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
  at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115)
  at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99)
  at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
  at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
  at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
  at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
  at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
  at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
  at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1319)
  at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
{code}
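
For reference, the reported message is exactly what a plain unchecked cast 
produces. The snippet below is a standalone, REPL-style illustration only (it 
is not the JsonProtocol source); it throws the same ClassCastException seen 
above.

{code}
// Illustration only: casting a SynchronizedSet to java.util.List throws the
// same ClassCastException reported in the stack trace above.
import java.util.Collections

val value: Any = Collections.synchronizedSet(new java.util.HashSet[String]())
// java.lang.ClassCastException: java.util.Collections$SynchronizedSet
//   cannot be cast to java.util.List
val asList = value.asInstanceOf[java.util.List[_]]
{code}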

 


[jira] [Commented] (SPARK-31923) Event log cannot be generated when some internal accumulators use unexpected types

2020-06-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17128639#comment-17128639
 ] 

Apache Spark commented on SPARK-31923:
--

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/28758


[jira] [Commented] (SPARK-31923) Event log cannot be generated when some internal accumulators use unexpected types

2020-06-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17127444#comment-17127444
 ] 

Apache Spark commented on SPARK-31923:
--

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/28744


[jira] [Commented] (SPARK-31923) Event log cannot be generated when some internal accumulators use unexpected types

2020-06-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17127443#comment-17127443
 ] 

Apache Spark commented on SPARK-31923:
--

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/28744
