[ https://issues.apache.org/jira/browse/SPARK-29106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953858#comment-16953858 ]

zhao bo edited comment on SPARK-29106 at 10/17/19 3:46 PM:
-----------------------------------------------------------

Hi [~shaneknapp] Shane,
 As the whole test run takes quite some time, I could not send this summary 
out yesterday.

First note: the ARM VM is only tested with java8; java11 has not been tested 
before, so we plan to add it in the future and start with java8.
 * As you mentioned, the ansible dependencies have been installed.
 * The dependencies that require sudo privileges have been installed by user 
'root'. The 'jenkins' user doesn't have sudo or any root-level access.
 * Source code location: /home/jenkins/spark – 2019/10/16 master branch
 * All the ansible test scripts are stored in 
/home/jenkins/ansible_test_scripts. You can run the tests with the ansible 
commands below.
 * After finishing the whole test run on the target ARM VM, we took a full VM 
snapshot of it.

Now I have finished the following tests on the ARM VM:
 1. maven test - Spark Build and UT
 =======================
 env: java8 (javac 1.8.0_222)
 spark: master branch
 TEST STEPS: 
 - ./build/mvn -B -e clean install -DskipTests -Phadoop-2.7 -Pyarn -Phive 
-Phive-thriftserver -Pkinesis-asl -Pmesos 
 - ./build/mvn -B -e test -Phadoop-2.7 -Pyarn -Phive -Phive-thriftserver 
-Pkinesis-asl -Pmesos
 TEST ANSIBLE CMD:  
 - ansible-playbook -i /home/jenkins/ansible_test_scripts/inventory 
/home/jenkins/ansible_test_scripts/maven_unittest.yml
 TEST LOGS (full logs; the tests passed in the end):  
   - /home/jenkins/ansible_test_scripts/test_logs/spark_build.log
   - /home/jenkins/ansible_test_scripts/test_logs/spark_test_original.log
   - /home/jenkins/ansible_test_scripts/test_logs/spark_test.log
     - About spark_test.log: due to a mistake on my part, an error occurred in 
the middle of the maven UT run and stopped the test (it seems something else I 
was doing allocated RAM and triggered a "not enough RAM" failure during the 
test). So I split the log into 2 files: one is 
test_logs/spark_test.log_before_test_fail, the other is 
test_logs/spark_test_including_fail_and_following.log (a rerun of the failed 
test plus the following tests that did not run in the first log file). The 
main reason is that the maven tests take so much time, and all the tests 
passed in the end, so I think it's better not to waste too much time here, so 
that we can move the integration process forward quickly.
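
A side note on such reruns: when a maven run dies partway through, one way to 
avoid repeating the modules that already passed is maven's resume flag. A 
minimal sketch, assuming the reactor failed in some module (":spark-sql_2.12" 
below is purely illustrative, not the module that actually failed):

  # Resume the maven reactor from the failed module, keeping the same profiles.
  # ":spark-sql_2.12" is an illustrative module name only.
  ./build/mvn -B -e test -Phadoop-2.7 -Pyarn -Phive -Phive-thriftserver \
    -Pkinesis-asl -Pmesos -rf :spark-sql_2.12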

2. Pyspark and SparkR test
 =======================
 env: python2.7  python3.6  for PySpark test
      R 3.6.1 for SparkR test
 TEST STEPS:
   - python/run-tests --python-executables=python2.7,python3.6
   - ./R/run-tests.sh
 TEST ANSIBLE CMD: 
   - ansible-playbook -i /home/jenkins/ansible_test_scripts/inventory 
/home/jenkins/ansible_test_scripts/pyspark_sparkr_test.yml
 TEST LOGS (full logs; the tests passed in the end):
   - /home/jenkins/ansible_test_scripts/test_logs/pyspark_test.log
   - /home/jenkins/ansible_test_scripts/test_logs/sparkr_test.log
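
As a side note, if only a quick subset check is needed, python/run-tests can 
restrict the run to specific test modules; a minimal sketch (the module chosen 
here is just an example of narrowing the run):

  # Run only the pyspark-sql module against a single interpreter; the module
  # name is an example, not a required choice.
  python/run-tests --python-executables=python3.6 --modules=pyspark-sql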

Finally, based on the real test runs on the ARM VM, we honestly want to show 
you the time cost of testing on ARM.
 Test cost summary:
 The whole test run takes a very long time.
 * Spark build by maven – the first build took 1h42m; after that it takes 
about 1h29m (this may be affected by the load on the VM host at the time, so 
the actual cost may be shorter than what we measured).
 * Spark UT test by maven – about 8h-9h for the whole suite.
 * PySpark test – 20-23 mins.
 * SparkR test – 15-20 mins.
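
Adding those up: roughly 1.5h (build) + 8.5h (maven UT) + ~0.4h (PySpark) + 
~0.3h (SparkR) comes to about 10.7h, which is where the "nearly 11h" figure 
below comes from.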

Given the above time costs for the different test jobs, we can choose between 
several ways to run them as periodic test jobs.
 * Split them up and test them one by one.
   - For example, if we just want to test PySpark, we add a periodic job that 
includes only the Spark build and the PySpark tests; that costs about 2h per 
run. But if we want to test SparkR, we still need to run the Spark build 
first. In other words, each test type must run after a Spark build.
 * Test all of them each time.

In that case we run all of them in one periodic test job and run the Spark 
build only once, but it costs nearly 11h. A rough sketch of what such a 
combined job could execute follows.
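
This is only a sketch, assuming a single periodic job drives everything on the 
ARM VM with the commands already listed above; the script itself and any 
scheduling around it are assumptions, not an existing job definition:

  #!/bin/bash
  # Hypothetical nightly ARM job: build Spark once, then run every suite.
  set -e
  cd /home/jenkins/spark
  # Spark build (~1.5h)
  ./build/mvn -B -e clean install -DskipTests -Phadoop-2.7 -Pyarn -Phive \
    -Phive-thriftserver -Pkinesis-asl -Pmesos
  # Maven unit tests (~8-9h)
  ./build/mvn -B -e test -Phadoop-2.7 -Pyarn -Phive -Phive-thriftserver \
    -Pkinesis-asl -Pmesos
  # PySpark tests (~20-23 min)
  python/run-tests --python-executables=python2.7,python3.6
  # SparkR tests (~15-20 min)
  ./R/run-tests.sh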
 Either way is fine with us; you can choose how to add the periodic testing 
for ARM. If you want to discuss this or learn more, please feel free to 
contact us.


> Add jenkins arm test for spark
> ------------------------------
>
>                 Key: SPARK-29106
>                 URL: https://issues.apache.org/jira/browse/SPARK-29106
>             Project: Spark
>          Issue Type: Test
>          Components: Tests
>    Affects Versions: 3.0.0
>            Reporter: huangtianhua
>            Priority: Minor
>
> Add arm test jobs to amplab jenkins for spark.
> Till now we have made two periodic ARM test jobs for spark in OpenLab: one 
> is based on master with hadoop 2.7 (similar to the QA test of amplab 
> jenkins), the other is based on a new branch we cut on 09-09, see 
> [http://status.openlabtesting.org/builds/job/spark-master-unit-test-hadoop-2.7-arm64]
> and 
> [http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64].
> We only have to care about the first one when integrating the ARM test with 
> amplab jenkins.
> About the k8s test on ARM: we have tested it, see 
> [https://github.com/theopenlab/spark/pull/17]; maybe we can integrate it 
> later.
> We also plan to test other stable branches, and we can integrate them into 
> amplab when they are ready.
> We have provided an ARM instance and sent the info to shane knapp; thanks 
> shane for adding the first ARM job to amplab jenkins :)
> The other important thing is about leveldbjni 
> [https://github.com/fusesource/leveldbjni|https://github.com/fusesource/leveldbjni/issues/80]:
> spark depends on leveldbjni-all-1.8 
> [https://mvnrepository.com/artifact/org.fusesource.leveldbjni/leveldbjni-all/1.8],
> which has no arm64 support. So we built an arm64-supporting release of 
> leveldbjni, see 
> [https://mvnrepository.com/artifact/org.openlabtesting.leveldbjni/leveldbjni-all/1.8].
> But we can't modify the spark pom.xml directly with something like a 
> 'property'/'profile' to choose the correct jar package on the arm or x86 
> platform, because spark depends on some hadoop packages like hadoop-hdfs, 
> and those packages depend on leveldbjni-all-1.8 too, unless hadoop releases 
> with a new arm-supporting leveldbjni jar. For now we download the 
> leveldbjni-all-1.8 of openlabtesting and 'mvn install' it when testing 
> spark on ARM.
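> A minimal sketch of that local install step, assuming the arm64 jar has 
> already been downloaded to the working directory; the file name, and the 
> choice to install it under the original fusesource coordinates so the 
> unmodified pom resolves it, are assumptions, not a confirmed procedure:
>
>   # Install the arm64 leveldbjni jar into the local maven repository under
>   # the coordinates spark's dependency tree expects (assumed coordinates).
>   mvn install:install-file -Dfile=leveldbjni-all-1.8.jar \
>     -DgroupId=org.fusesource.leveldbjni -DartifactId=leveldbjni-all \
>     -Dversion=1.8 -Dpackaging=jar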
> PS: The issues found and fixed:
>  SPARK-28770
>  [https://github.com/apache/spark/pull/25673]
>   
>  SPARK-28519
>  [https://github.com/apache/spark/pull/25279]
>   
>  SPARK-28433
>  [https://github.com/apache/spark/pull/25186]
>  
> SPARK-28467
> [https://github.com/apache/spark/pull/25864]
>  
> SPARK-29286
> [https://github.com/apache/spark/pull/26021]
>  
>  


