[jira] [Comment Edited] (SPARK-37209) YarnShuffleIntegrationSuite and other two similar cases in `resource-managers` test failed
[ https://issues.apache.org/jira/browse/SPARK-37209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444502#comment-17444502 ]

Yang Jie edited comment on SPARK-37209 at 11/16/21, 1:31 PM:

After some investigation, I found that this issue may be related to `hadoop-3.x`. When using the `hadoop-2.7` profile, the above test succeeds:
{code:java}
mvn clean install -DskipTests -pl resource-managers/yarn -Pyarn -Phadoop-2.7 -am
mvn test -pl resource-managers/yarn -Pyarn -Phadoop-2.7 -Dtest=none -DwildcardSuites=org.apache.spark.deploy.yarn.YarnShuffleIntegrationSuite

Discovery starting.
Discovery completed in 259 milliseconds.
Run starting. Expected test count is: 1
YarnShuffleIntegrationSuite:
- external shuffle service
Run completed in 30 seconds, 765 milliseconds.
Total number of tests run: 1
Suites: completed 2, aborted 0
Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
All tests passed.{code}
It seems that when testing with hadoop-2.7, `Utils.isTesting` returns true, which lets the test case ignore the `NoClassDefFoundError`, but when testing with hadoop-3.2, `Utils.isTesting` returns false. I have not yet investigated the root cause with hadoop-3.2.

cc [~hyukjin.kwon] [~dongjoon] [~srowen]

was (Author: luciferyang):
After some investigation, I found that this issue may be related to `hadoop-3.x`. When using the `hadoop-2.7` profile, the above test succeeds:
{code:java}
mvn clean install -DskipTests -pl resource-managers/yarn -Pyarn -Phadoop-2.7 -am
mvn test -pl resource-managers/yarn -Pyarn -Phadoop-2.7 -Dtest=none -DwildcardSuites=org.apache.spark.deploy.yarn.YarnShuffleIntegrationSuite

Discovery starting.
Discovery completed in 259 milliseconds.
Run starting. Expected test count is: 1
YarnShuffleIntegrationSuite:
- external shuffle service
Run completed in 30 seconds, 765 milliseconds.
Total number of tests run: 1
Suites: completed 2, aborted 0
Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
All tests passed.{code}
It seems that when testing with hadoop-2.7, `Utils.isTesting` returns true on the executor side, which lets the test case ignore the `NoClassDefFoundError`, but when testing with hadoop-3.2, `Utils.isTesting` returns false on the executor side. I have not yet investigated the root cause with hadoop-3.2.

cc [~hyukjin.kwon] [~dongjoon] [~srowen]

> YarnShuffleIntegrationSuite and other two similar cases in `resource-managers` test failed
> ---
>
> Key: SPARK-37209
> URL: https://issues.apache.org/jira/browse/SPARK-37209
> Project: Spark
> Issue Type: Bug
> Components: Tests, YARN
> Affects Versions: 3.3.0
> Reporter: Yang Jie
> Priority: Minor
> Attachments: failed-unit-tests.log, success-unit-tests.log
>
>
> Execute :
> # build/mvn clean package -DskipTests -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive
> # build/mvn test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pscala-2.13 -pl resource-managers/yarn
> The test will be successful.
>
> Execute :
> # build/mvn clean -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive
> # build/mvn clean test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pscala-2.13 -pl resource-managers/yarn
> The test will fail.
>
> Execute :
> # build/mvn clean package -DskipTests -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive
> # Delete assembly/target/scala-2.12/jars manually
> # build/mvn test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pscala-2.13 -pl resource-managers/yarn
> The test will fail.
>
> The error stack is :
> {code:java}
> 21/11/04 19:48:52.159 main ERROR Client: Application diagnostics message: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times,
> most recent failure: Lost task 0.3 in stage 0.0 (TID 6) (localhost executor 1): java.lang.NoClassDefFoundError: breeze/linalg/Matrix
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at org.apache.spark.util.Utils$.classForName(Utils.scala:216)
> a
[jira] [Comment Edited] (SPARK-37209) YarnShuffleIntegrationSuite and other two similar cases in `resource-managers` test failed
[ https://issues.apache.org/jira/browse/SPARK-37209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444502#comment-17444502 ]

Yang Jie edited comment on SPARK-37209 at 11/16/21, 12:10 PM:

After some investigation, I found that this issue may be related to `hadoop-3.x`. When using the `hadoop-2.7` profile, the above test succeeds:
{code:java}
mvn clean install -DskipTests -pl resource-managers/yarn -Pyarn -Phadoop-2.7 -am
mvn test -pl resource-managers/yarn -Pyarn -Phadoop-2.7 -Dtest=none -DwildcardSuites=org.apache.spark.deploy.yarn.YarnShuffleIntegrationSuite

Discovery starting.
Discovery completed in 259 milliseconds.
Run starting. Expected test count is: 1
YarnShuffleIntegrationSuite:
- external shuffle service
Run completed in 30 seconds, 765 milliseconds.
Total number of tests run: 1
Suites: completed 2, aborted 0
Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
All tests passed.{code}
It seems that when testing with hadoop-2.7, `Utils.isTesting` returns true on the executor side, which lets the test case ignore the `NoClassDefFoundError`, but when testing with hadoop-3.2, `Utils.isTesting` returns false on the executor side. I have not yet investigated the root cause with hadoop-3.2.

cc [~hyukjin.kwon] [~dongjoon] [~srowen]

was (Author: luciferyang):
After some investigation, I found that this issue may be related to `hadoop-3.x`. When using the `hadoop-2.7` profile, the above test succeeds:
{code:java}
mvn clean install -DskipTests -pl resource-managers/yarn -Pyarn -Phadoop-2.7 -am
mvn test -pl resource-managers/yarn -Pyarn -Phadoop-2.7 -Dtest=none -DwildcardSuites=org.apache.spark.deploy.yarn.YarnShuffleIntegrationSuite

Discovery starting.
Discovery completed in 259 milliseconds.
Run starting. Expected test count is: 1
YarnShuffleIntegrationSuite:
- external shuffle service
Run completed in 30 seconds, 765 milliseconds.
Total number of tests run: 1
Suites: completed 2, aborted 0
Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
All tests passed.{code}
It seems that when testing with hadoop-2.7, `Utils.isTesting` returns true on the executor side, which lets the test case ignore the `NoClassDefFoundError`, but when testing with hadoop-3.2, `Utils.isTesting` returns false on the executor side. I have not yet investigated the root cause with hadoop-3.2.

> YarnShuffleIntegrationSuite and other two similar cases in `resource-managers` test failed
> ---
>
> Key: SPARK-37209
> URL: https://issues.apache.org/jira/browse/SPARK-37209
> Project: Spark
> Issue Type: Bug
> Components: Tests, YARN
> Affects Versions: 3.3.0
> Reporter: Yang Jie
> Priority: Minor
> Attachments: failed-unit-tests.log, success-unit-tests.log
>
>
> Execute :
> # build/mvn clean package -DskipTests -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive
> # build/mvn test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pscala-2.13 -pl resource-managers/yarn
> The test will be successful.
>
> Execute :
> # build/mvn clean -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive
> # build/mvn clean test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pscala-2.13 -pl resource-managers/yarn
> The test will fail.
>
> Execute :
> # build/mvn clean package -DskipTests -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive
> # Delete assembly/target/scala-2.12/jars manually
> # build/mvn test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pscala-2.13 -pl resource-managers/yarn
> The test will fail.
>
> The error stack is :
> {code:java}
> 21/11/04 19:48:52.159 main ERROR Client: Application diagnostics message: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times,
> most recent failure: Lost task 0.3 in stage 0.0 (TID 6) (localhost executor 1): java.lang.NoClassDefFoundError: breeze/linalg/Matrix
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at org.apache.spark.util.Utils$.classForName(Utils.scala:216)
> at
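
The comment above attributes the difference between the two profiles to `Utils.isTesting`: when it is true, the `NoClassDefFoundError` is ignored, and when it is false the error surfaces and the task fails. As a rough illustration of that kind of guard, here is a minimal Java sketch. It is not Spark's code: the class `TestingGuardSketch`, the interface `ClassLoaderFn`, and the method `registerOptional` are hypothetical names, and the testing flag is passed in explicitly rather than read from the environment.

```java
/** Illustrative only, not Spark code. All names in this class are hypothetical. */
final class TestingGuardSketch {

    /** Hypothetical abstraction over class loading so that a failure can be simulated. */
    interface ClassLoaderFn {
        Class<?> load(String name) throws ClassNotFoundException;
    }

    /**
     * Tries to load an optional class.
     * Returns true if the class loaded, false if it was skipped.
     * A NoClassDefFoundError is swallowed only when isTesting is true;
     * otherwise it propagates to the caller.
     */
    static boolean registerOptional(ClassLoaderFn loader, String name, boolean isTesting) {
        try {
            loader.load(name);
            return true;
        } catch (ClassNotFoundException e) {
            // In this sketch, a plainly absent class is always tolerated.
            return false;
        } catch (NoClassDefFoundError e) {
            if (isTesting) {
                // Test mode: ignore the missing dependency.
                return false;
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        // Simulated loader that always fails the way the stack trace above does.
        ClassLoaderFn broken = n -> { throw new NoClassDefFoundError(n); };

        System.out.println(registerOptional(Class::forName, "java.lang.String", false));
        System.out.println(registerOptional(broken, "breeze/linalg/Matrix", true));
        try {
            registerOptional(broken, "breeze/linalg/Matrix", false);
        } catch (NoClassDefFoundError e) {
            System.out.println("propagated: " + e.getMessage());
        }
    }
}
```

Under these assumptions, the same missing class is harmless when the flag is true and fatal when it is false, which matches the behaviour reported for the hadoop-2.7 and hadoop-3.2 profiles respectively. Why the flag differs between the profiles remains open, as the comment says.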