[ https://issues.apache.org/jira/browse/SPARK-28770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922771#comment-16922771 ]
Imran Rashid commented on SPARK-28770:
--------------------------------------

I'm not sure I follow ... the metrics from the driver will be in the log file if and only if {{spark.eventLog.logStageExecutorMetrics.enabled=true}} and there is a stage completed event. The same criteria should apply on replay, so the EventMonster would also "log" the metrics update (meaning, record it in its internal buffer) if and only if the event log file also contained the stage completed event.

[~bzhaoopenstack] since you can reproduce this consistently, could you share the event log file from a failure? I think you just need to delete this block in ReplayListenerSuite:

{code}
after {
  Utils.deleteRecursively(testDir)
}
{code}

and maybe print out that dir somewhere so you can grab the file.

> Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression
> failed
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-28770
>                 URL: https://issues.apache.org/jira/browse/SPARK-28770
>             Project: Spark
>          Issue Type: Test
>          Components: Spark Core
>    Affects Versions: 2.4.3
>         Environment: Community jenkins and our arm testing instance.
>            Reporter: huangtianhua
>            Priority: Major
>
> Test
> org.apache.spark.scheduler.ReplayListenerSuite.End-to-end replay with
> compression is failed see
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2/267/testReport/junit/org.apache.spark.scheduler/ReplayListenerSuite/End_to_end_replay_with_compression/]
>
> And also the test is failed on arm instance, I sent email to spark-dev
> before, and we suspect there is something related with the commit
> [https://github.com/apache/spark/pull/23767], we tried to revert it and the
> tests are passed:
> ReplayListenerSuite:
> - ...
> - End-to-end replay *** FAILED ***
>   "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
> - End-to-end replay with compression *** FAILED ***
>   "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>
> Not sure what's wrong, hope someone can help to figure it out, thanks very
> much.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)