[jira] [Comment Edited] (SPARK-37209) YarnShuffleIntegrationSuite and other two similar cases in `resource-managers` test failed

2021-11-16 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444502#comment-17444502
 ] 

Yang Jie edited comment on SPARK-37209 at 11/16/21, 1:31 PM:
-

After some investigation, I found that this issue may be related to 
`hadoop-3.x`. With the `hadoop-2.7` profile, the above test passes:

 
{code:java}
mvn clean install -DskipTests -pl resource-managers/yarn -Pyarn -Phadoop-2.7 -am
mvn test -pl resource-managers/yarn -Pyarn -Phadoop-2.7 -Dtest=none -DwildcardSuites=org.apache.spark.deploy.yarn.YarnShuffleIntegrationSuite

Discovery starting.
Discovery completed in 259 milliseconds.
Run starting. Expected test count is: 1
YarnShuffleIntegrationSuite:
- external shuffle service
Run completed in 30 seconds, 765 milliseconds.
Total number of tests run: 1
Suites: completed 2, aborted 0
Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
All tests passed.{code}
It seems that when testing with hadoop-2.7, `Utils.isTesting` returns true, 
which lets the test case ignore the `NoClassDefFoundError`, but when testing 
with hadoop-3.2, `Utils.isTesting` returns false.
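
To make the suspected mechanism concrete, here is a minimal sketch in Java. It is not Spark's actual code. It assumes that test mode is signalled by a `SPARK_TESTING` environment variable or a `spark.testing` system property, and that the class-loading path swallows `NoClassDefFoundError` only in test mode. The class and method names (`TestingGuardSketch`, `isTesting`, `loadIfAvailable`) are hypothetical and exist only for illustration.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Properties;

public class TestingGuardSketch {

    // Assumption: test mode is detected from the SPARK_TESTING environment
    // variable or the spark.testing system property.
    static boolean isTesting(Map<String, String> env, Properties props) {
        return env.containsKey("SPARK_TESTING")
            || props.getProperty("spark.testing") != null;
    }

    // Try to load a class by name. A class that is simply absent yields an
    // empty result. A NoClassDefFoundError is ignored only when `testing`
    // is true; otherwise it propagates and fails the caller.
    static Optional<Class<?>> loadIfAvailable(String name, boolean testing) {
        try {
            Class<?> cls = Class.forName(name);
            return Optional.<Class<?>>of(cls);
        } catch (ClassNotFoundException e) {
            return Optional.empty();
        } catch (NoClassDefFoundError e) {
            if (testing) {
                return Optional.empty();
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        boolean testing = isTesting(System.getenv(), System.getProperties());
        System.out.println("testing mode: " + testing);
        System.out.println("breeze.linalg.Matrix loadable: "
            + loadIfAvailable("breeze.linalg.Matrix", testing).isPresent());
    }
}
```

Under these assumptions, if the flag does not reach the JVM that performs the class lookup, the `NoClassDefFoundError` is rethrown there instead of being ignored, which would match the failure seen with hadoop-3.2.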

 

However, I have not yet investigated the root cause with hadoop-3.2.

 

cc [~hyukjin.kwon] [~dongjoon] [~srowen] 

 

 


was (Author: luciferyang):
After some investigation, I found that this issue may be related to 
`hadoop-3.x`. With the `hadoop-2.7` profile, the above test passes:

 
{code:java}
mvn clean install -DskipTests -pl resource-managers/yarn -Pyarn -Phadoop-2.7 -am
mvn test -pl resource-managers/yarn -Pyarn -Phadoop-2.7 -Dtest=none -DwildcardSuites=org.apache.spark.deploy.yarn.YarnShuffleIntegrationSuite

Discovery starting.
Discovery completed in 259 milliseconds.
Run starting. Expected test count is: 1
YarnShuffleIntegrationSuite:
- external shuffle service
Run completed in 30 seconds, 765 milliseconds.
Total number of tests run: 1
Suites: completed 2, aborted 0
Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
All tests passed.{code}
It seems that when testing with hadoop-2.7, `Utils.isTesting` returns true on 
the executor side, which lets the test case ignore the `NoClassDefFoundError`, 
but when testing with hadoop-3.2, `Utils.isTesting` returns false on the 
executor side.

 

However, I have not yet investigated the root cause with hadoop-3.2.

 

cc [~hyukjin.kwon] [~dongjoon] [~srowen] 

 

 

> YarnShuffleIntegrationSuite and other two similar cases in 
> `resource-managers` test failed
> ---
>
> Key: SPARK-37209
> URL: https://issues.apache.org/jira/browse/SPARK-37209
> Project: Spark
> Issue Type: Bug
> Components: Tests, YARN
> Affects Versions: 3.3.0
> Reporter: Yang Jie
> Priority: Minor
> Attachments: failed-unit-tests.log, success-unit-tests.log
>
>
> Execute:
>  # build/mvn clean package -DskipTests -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud 
> -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl 
> -Pkubernetes -Phive
>  # build/mvn test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn 
> -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive 
> -Pscala-2.13 -pl resource-managers/yarn
> The tests succeed.
>  
> Execute:
>  # build/mvn clean -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn 
> -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive
>  # build/mvn clean test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn 
> -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive 
> -Pscala-2.13 -pl resource-managers/yarn 
> The tests fail.
>  
> Execute:
>  # build/mvn clean package -DskipTests -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud 
> -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl 
> -Pkubernetes -Phive
>  # Delete assembly/target/scala-2.12/jars manually
>  # build/mvn test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn 
> -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive 
> -Pscala-2.13 -pl resource-managers/yarn 
> The tests fail.
>  
> The error stack is:
> {code:java}
> 21/11/04 19:48:52.159 main ERROR Client: Application diagnostics message: 
> User class threw exception: org.apache.spark.SparkException: Job aborted due 
> to stage failure: Task 0 in stage 0.0 failed 4 times,
>  most recent failure: Lost task 0.3 in stage 0.0 (TID 6) (localhost executor 
> 1): java.lang.NoClassDefFoundError: breeze/linalg/Matrix
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at org.apache.spark.util.Utils$.classForName(Utils.scala:216)
> a

[jira] [Comment Edited] (SPARK-37209) YarnShuffleIntegrationSuite and other two similar cases in `resource-managers` test failed

2021-11-16 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444502#comment-17444502
 ] 

Yang Jie edited comment on SPARK-37209 at 11/16/21, 12:10 PM:
--

After some investigation, I found that this issue may be related to 
`hadoop-3.x`. With the `hadoop-2.7` profile, the above test passes:

 
{code:java}
mvn clean install -DskipTests -pl resource-managers/yarn -Pyarn -Phadoop-2.7 -am
mvn test -pl resource-managers/yarn -Pyarn -Phadoop-2.7 -Dtest=none -DwildcardSuites=org.apache.spark.deploy.yarn.YarnShuffleIntegrationSuite

Discovery starting.
Discovery completed in 259 milliseconds.
Run starting. Expected test count is: 1
YarnShuffleIntegrationSuite:
- external shuffle service
Run completed in 30 seconds, 765 milliseconds.
Total number of tests run: 1
Suites: completed 2, aborted 0
Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
All tests passed.{code}
It seems that when testing with hadoop-2.7, `Utils.isTesting` returns true on 
the executor side, which lets the test case ignore the `NoClassDefFoundError`, 
but when testing with hadoop-3.2, `Utils.isTesting` returns false on the 
executor side.

 

However, I have not yet investigated the root cause with hadoop-3.2.

 

cc [~hyukjin.kwon] [~dongjoon] [~srowen] 

 

 


