[ https://issues.apache.org/jira/browse/SPARK-29106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959124#comment-16959124 ]

Shane Knapp commented on SPARK-29106:
-------------------------------------

> First I want to share the details of what we have done in the OpenLab test env.

this is an extremely basic python installation, and doesn't include important 
things that pyspark needs to test against, like pandas and pyarrow.

> 1) If we can't use Anaconda, how about managing the packages via ansible too, 
> just for ARM for now? For example, for py27 we need to know which packages to 
> install from pip/elsewhere and which to install manually (for manually 
> installed packages, we could do something like leveldbjni on maven and 
> provide a public/official way to fetch and install the ARM packages). For 
> now, I personally think it's very difficult to use Anaconda: there aren't 
> many package management platforms for ARM, even if we got Anaconda running 
> on ARM. If we did that, we would need to close all the gaps, which is a huge 
> project.

a few things here:

* i am already using ansible to set up and deploy python via anaconda (and pip) 
on the x86 workers
* we can't use anaconda for ARM, period.  we have to use python virtual envs
* i still haven't had the cycles to dive in to trying to recreate the 3 python 
envs on ARM yet
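
for reference, a minimal sketch of recreating one such env with the stdlib venv module instead of anaconda — the path and package set here are assumptions for illustration, not the actual ansible role:

```shell
# create one python test env with venv instead of anaconda (anaconda
# isn't usable on ARM); env path is an assumed example
set -e
ENVDIR="$HOME/pyenvs/py3"
python3 -m venv "$ENVDIR"
# pandas/pyarrow are among the things pyspark tests against; on arm64
# they may need to be built from source if no wheels are published
# (network-dependent step, shown commented out):
# "$ENVDIR/bin/pip" install pandas pyarrow
"$ENVDIR/bin/python" -c "import sys; print(sys.prefix)"
```

the same commands would be one env's worth of an ansible task; repeat per python version (py27, py36, pypy) to mirror the three x86 envs.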

> 2) For multiple python versions (py27, py34, py36 and pypy), venvs are the 
> right choice now. But how about supporting only some of them as a first 
> step, such as just one or two python versions, since py27 and py36 testing 
> has already passed? Note that the ARM ecosystem is very limited right now.

yeah, i was planning on doing one at a time.

> 3) Since the following integration work is on your side, we can't know much 
> detail about the problems you hit. So please feel free to tell us how we can 
> help you; we are looking forward to working with you.

that's the plan!  :)

> To test SparkR more quickly, I installed it manually on the ARM jenkins 
> worker, because the R installation also takes a lot of time, including the 
> deb libraries and R itself. I noticed the amplab jenkins job also manages 
> the R installation before the real spark test execution? Does that happen in 
> each build?

no, R is set up via ansible and not modified by the build.
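
a hedged sketch of what such a provision-time ansible task might look like (the package list here is an assumption; the real role is in the attached R-ansible.yml):

```
# install R and its build deps once, at worker provision time, not per build
- name: install R and build dependencies
  apt:
    name:
      - r-base
      - r-base-dev
      - libcurl4-openssl-dev
    state: present
  become: true
```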

> I think the current maven UT test could run once per day, and 
> pyspark/sparkR once per day. Even though they could run simultaneously, we 
> can trigger the two jobs in different time periods, such as the maven UT 
> test from 0:00 am to 12:00 pm and pyspark/sparkR from 1:00 pm to 10:00 pm.

sure, sounds like a plan once we/i get those two parts set up on the worker in 
an atomic and reproducible way.
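
for example, the two jobs could get non-overlapping daily jenkins cron triggers along these lines (the exact windows are illustrative; `H` lets jenkins spread the start time within the range):

```
# maven UT job: once daily, somewhere in the 00:00-11:59 window
H H(0-11) * * *

# pyspark/sparkR job: once daily, somewhere in the 13:00-21:59 window
H H(13-21) * * *
```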

> Add jenkins arm test for spark
> ------------------------------
>
>                 Key: SPARK-29106
>                 URL: https://issues.apache.org/jira/browse/SPARK-29106
>             Project: Spark
>          Issue Type: Test
>          Components: Tests
>    Affects Versions: 3.0.0
>            Reporter: huangtianhua
>            Priority: Minor
>         Attachments: R-ansible.yml, R-libs.txt
>
>
> Add arm test jobs to amplab jenkins for spark.
> Till now we have made two periodic arm test jobs for spark in OpenLab: one 
> is based on master with hadoop 2.7 (similar to the QA test of amplab 
> jenkins), the other is based on a new branch we cut on 09-09, see 
> [http://status.openlabtesting.org/builds/job/spark-master-unit-test-hadoop-2.7-arm64]
>  and 
> [http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64].
> We only have to care about the first one when integrating the arm test with 
> amplab jenkins.
> About the k8s test on arm: we have tested it, see 
> [https://github.com/theopenlab/spark/pull/17]; maybe we can integrate it 
> later.
> We also plan to test other stable branches, and we can integrate them into 
> amplab when they are ready.
> We have offered an arm instance and sent the info to shane knapp; thanks 
> shane for adding the first arm job to amplab jenkins :) 
> The other important thing is about leveldbjni 
> [https://github.com/fusesource/leveldbjni]: spark depends on 
> leveldbjni-all-1.8 
> [https://mvnrepository.com/artifact/org.fusesource.leveldbjni/leveldbjni-all/1.8],
>  which has no arm64 support. So we built an arm64-supporting release of 
> leveldbjni, see 
> [https://mvnrepository.com/artifact/org.openlabtesting.leveldbjni/leveldbjni-all/1.8],
>  but we can't simply modify the spark pom.xml with something like a 
> 'property'/'profile' to choose the correct jar on arm or x86, because spark 
> depends on some hadoop packages like hadoop-hdfs, which depend on 
> leveldbjni-all-1.8 too, unless hadoop releases with the new arm-supporting 
> leveldbjni jar. For now we download the openlabtesting leveldbjni-all-1.8 
> and 'mvn install' it when testing spark on arm.
> PS: The issues found and fixed:
>  SPARK-28770 [https://github.com/apache/spark/pull/25673]
>  SPARK-28519 [https://github.com/apache/spark/pull/25279]
>  SPARK-28433 [https://github.com/apache/spark/pull/25186]
>  SPARK-28467 [https://github.com/apache/spark/pull/25864]
>  SPARK-29286 [https://github.com/apache/spark/pull/26021]
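
(editorial note: the 'mvn install' workaround described in the quoted ticket could be sketched roughly as below — the local-repo path is an assumption, and these are not the exact commands OpenLab runs:

```
# fetch the openlabtesting build of leveldbjni-all 1.8, then install it
# into the local maven repo under the coordinates spark and hadoop expect
mvn dependency:get \
    -Dartifact=org.openlabtesting.leveldbjni:leveldbjni-all:1.8
mvn install:install-file \
    -Dfile=$HOME/.m2/repository/org/openlabtesting/leveldbjni/leveldbjni-all/1.8/leveldbjni-all-1.8.jar \
    -DgroupId=org.fusesource.leveldbjni \
    -DartifactId=leveldbjni-all \
    -Dversion=1.8 \
    -Dpackaging=jar
```

this shadows the fusesource artifact locally, so the transitive hadoop-hdfs dependency resolves to the arm64-capable jar without touching the spark pom.)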



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
