[ https://issues.apache.org/jira/browse/SPARK-29106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958651#comment-16958651 ]
zhao bo commented on SPARK-29106:
---------------------------------

Hi [~shaneknapp],

Thanks very much for sharing so many details with us.

> For pyspark conversation:

First, I want to share the details of what we have done in the OpenLab test env: [https://github.com/theopenlab/spark/pull/32/files#diff-ff133db31a4c2f724e9edfb0f70d243dR33]

Anaconda package management is very good for the Python and R ecosystems, but I think it really only serves the most popular architectures (x86 and a few other widespread ones), even though it claims to be "cross platform". We hit the same issue: we have to compile, install, and test the dependency libraries on ARM ourselves, and that is why we want to improve the ARM ecosystem.

1) If we cannot use Anaconda, how about managing the packages via ansible too, just for ARM for now? For example, for py27 we would record which packages to install from pip/elsewhere and which must be installed manually. (For the manually installed packages, if possible, we can do something like we did with leveldbjni on Maven: provide a public/official way to download and install the ARM packages.) For now, I personally think it is very difficult to use Anaconda, as there are not many package management platforms for ARM; even if we stood Anaconda up on ARM, we would have to close all the gaps, and that is a very big project.

2) For multiple Python versions (py27, py34, py36, and pypy), venv is the right choice now. But how about supporting only part of them as a first step? For example, just 1 or 2 Python versions for now, as we have already passed the py27 and py36 testing; a rough sketch of that setup is at the end of this comment. Keep in mind that the ARM ecosystem is very limited right now. ;)

3) As the following integration work is on your side, we cannot know many details about the problems you hit. So please feel free to tell us how we can help you; we are looking forward to working with you. ;)

> For sparkR conversation:

I will also share the details of what we did in the OpenLab test env: [https://github.com/theopenlab/spark/pull/28/files#diff-ff133db31a4c2f724e9edfb0f70d243dR4]

To test SparkR more quickly, I installed R manually on the ARM jenkins worker, because the R installation also takes a lot of time, including the deb libraries and R itself. I noticed that the amplab jenkins job also manages the R installation before the real spark test execution; does that happen on every build?

> For more jenkins jobs conversation:

I think the current maven UT test could run once per day, and pyspark/sparkR could also run once per day. Even if they would otherwise run simultaneously, we can make the two jobs trigger in different time windows, such as the maven UT test from 0:00 to 12:00 and pyspark/sparkR from 13:00 to 22:00.
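To make 1) and 2) more concrete, here is a rough sketch of the venv setup we have in mind for the ARM worker, starting with only py27 and py36 (the paths and the package list are just examples, not a final proposal):

    # one virtualenv per Python version on the ARM worker (illustrative paths;
    # assumes the virtualenv package is already installed for python2.7)
    virtualenv --python=python2.7 /home/jenkins/venv-py27
    python3.6 -m venv /home/jenkins/venv-py36

    # install the test dependencies from pip; any package without an arm64
    # wheel gets compiled from source at this step
    /home/jenkins/venv-py27/bin/pip install numpy coverage
    /home/jenkins/venv-py36/bin/pip install numpy coverage

    # run the pyspark tests against only the interpreters we support so far
    ./python/run-tests --python-executables=/home/jenkins/venv-py27/bin/python,/home/jenkins/venv-py36/bin/python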
> Add jenkins arm test for spark
> ------------------------------
>
> Key: SPARK-29106
> URL: https://issues.apache.org/jira/browse/SPARK-29106
> Project: Spark
> Issue Type: Test
> Components: Tests
> Affects Versions: 3.0.0
> Reporter: huangtianhua
> Priority: Minor
> Attachments: R-ansible.yml, R-libs.txt
>
> Add arm test jobs to amplab jenkins for spark.
>
> So far we have made two periodic ARM test jobs for spark in OpenLab: one is
> based on master with hadoop 2.7 (similar to the QA test of amplab jenkins),
> and the other is based on a new branch which we cut on 09-09; see
> [http://status.openlabtesting.org/builds/job/spark-master-unit-test-hadoop-2.7-arm64]
> and
> [http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64].
> We only have to care about the first one when integrating the ARM test with
> amplab jenkins.
> About the k8s test on ARM, we have already tried it, see
> [https://github.com/theopenlab/spark/pull/17]; maybe we can integrate it
> later.
> We also plan to test other stable branches, and we can integrate them into
> amplab when they are ready.
> We have offered an ARM instance and sent the details to shane knapp; thanks
> to shane for adding the first ARM job to amplab jenkins :)
> The other important thing is leveldbjni
> ([https://github.com/fusesource/leveldbjni], see
> [https://github.com/fusesource/leveldbjni/issues/80]): spark depends on
> leveldbjni-all-1.8
> [https://mvnrepository.com/artifact/org.fusesource.leveldbjni/leveldbjni-all/1.8],
> which has no arm64 support. So we built an arm64-supporting release of
> leveldbjni, see
> [https://mvnrepository.com/artifact/org.openlabtesting.leveldbjni/leveldbjni-all/1.8].
> We cannot simply modify the spark pom.xml with something like a
> 'property'/'profile' to choose the correct jar on ARM or x86, because spark
> depends on some hadoop packages like hadoop-hdfs, and those packages depend
> on leveldbjni-all-1.8 too, unless hadoop releases with a new ARM-supporting
> leveldbjni jar. For now we download the leveldbjni-all-1.8 of openlabtesting
> and 'mvn install' it before the ARM testing for spark; a sketch of those
> steps follows this description.
> PS: The issues found and fixed:
> SPARK-28770 [https://github.com/apache/spark/pull/25673]
> SPARK-28519 [https://github.com/apache/spark/pull/25279]
> SPARK-28433 [https://github.com/apache/spark/pull/25186]
> SPARK-28467 [https://github.com/apache/spark/pull/25864]
> SPARK-29286 [https://github.com/apache/spark/pull/26021]
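To illustrate the leveldbjni workaround described in the quoted issue above, the steps look roughly like this (a sketch only; the download URL and the idea of re-installing the jar under the original fusesource coordinates are assumptions based on the mvnrepository links, not the exact commands we run):

    # fetch the arm64-enabled build published under org.openlabtesting
    # (URL assumed from the mvnrepository entry; adjust to the real location)
    wget https://repo1.maven.org/maven2/org/openlabtesting/leveldbjni/leveldbjni-all/1.8/leveldbjni-all-1.8.jar

    # install it into the local Maven repository under the original
    # org.fusesource.leveldbjni coordinates, so that hadoop-hdfs and spark
    # resolve it without any pom.xml changes
    mvn install:install-file \
      -Dfile=leveldbjni-all-1.8.jar \
      -DgroupId=org.fusesource.leveldbjni \
      -DartifactId=leveldbjni-all \
      -Dversion=1.8 \
      -Dpackaging=jar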