Re: Ask for ARM CI for spark

2019-09-18 Thread Dongjoon Hyun
Hi, Tianhua.

Could you summarize the detail on the JIRA once more?
It will be very helpful for the community. Also, I've been waiting on that
JIRA. :)

Bests,
Dongjoon.


On Mon, Sep 16, 2019 at 11:48 PM Tianhua huang wrote:

> @shane knapp thank you very much. I opened an issue for this,
> https://issues.apache.org/jira/browse/SPARK-29106, and we can discuss the
> details there :)
> We will prepare an ARM instance today and send the details to your email
> later.
>
> On Tue, Sep 17, 2019 at 4:40 AM Shane Knapp wrote:
>
>> @Tianhua huang sure, i think we can get
>> something sorted for the short-term.
>>
>> all we need is ssh access (i can provide an ssh key), and i can then have
>> our jenkins master launch a remote worker on that instance.
>>
>> instance setup, etc, will be up to you.  my support for the time being
>> will be to create the job and 'best effort' for everything else.
>>
>> this should get us up and running asap.
>>
>> is there an open JIRA for jenkins/arm test support?  we can move the
>> technical details about this idea there.
>>
>> On Sun, Sep 15, 2019 at 9:03 PM Tianhua huang wrote:
>>
>>> @Sean Owen, sorry for the late reply; we had the Mid-Autumn holiday :)
>>>
>>> If you would like to integrate ARM CI into amplab jenkins, we can offer
>>> the ARM instance, and the ARM job would then run alongside the other x86
>>> jobs. Is there a guideline for doing this? @shane knapp, would you help
>>> us?
>>>
>>> On Thu, Sep 12, 2019 at 9:36 PM Sean Owen wrote:
>>>
>>>> I don't know what's involved in actually accepting or operating those
>>>> machines, so can't comment there, but in the meantime it's good that you
>>>> are running these tests and can help report changes needed to keep it
>>>> working with ARM. I would continue with that for now.
>>>>
>>>> On Wed, Sep 11, 2019 at 10:06 PM Tianhua huang <huangtianhua...@gmail.com> wrote:

> Hi all,
>
> Regarding the overall process for Spark ARM CI, we want to clarify two
> things.
>
> The first thing is:
> For Spark ARM CI we now have two periodic jobs. One job[1] is based on
> commit[2], which already includes the fix for the failing replay tests[3];
> we made a new test branch from it as of 09-09-2019. The other job[4] is
> based on Spark master.
>
> The first job tests the fixed branch to show that our ARM CI is good and
> stable.
> The second job checks Spark master every day, so we can see whether the
> latest commits affect the ARM CI. The build history and results show that
> some problems are easier to find on ARM, such as SPARK-28770, and that we
> are making the effort to trace and resolve them. So far we have found and
> fixed several problems[5][6][7]; thanks to everyone in the community :).
> We believe that ARM CI is very necessary, right?
>
> The second thing is:
> We plan to run the jobs for a period of time, and you can see the results
> and logs in the 'build history' of the jobs console. If everything goes
> well for one or two weeks, could the community accept the ARM CI? If not,
> how long should the periodic jobs run before the community has enough
> confidence to accept it? As you suggested before, it would be good to
> integrate ARM CI into amplab jenkins. We agree, and we can donate the ARM
> instances and then maintain the ARM-related test jobs together with the
> community. Any thoughts?
>
> Thank you all!
>
> [1]
> http://status.openlabtesting.org/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64
> [2]
> https://github.com/apache/spark/commit/0ed9fae45769d4b06b8cf8128f462f09ff3d9a72
> [3] https://issues.apache.org/jira/browse/SPARK-28770
> [4]
> http://status.openlabtesting.org/builds?job_name=spark-master-unit-test-hadoop-2.7-arm64
> [5] https://github.com/apache/spark/pull/25186
> [6] https://github.com/apache/spark/pull/25279
> [7] https://github.com/apache/spark/pull/25673
>
>
>
> On Fri, Aug 16, 2019 at 11:24 PM Sean Owen wrote:
>
>> Yes, I think it's just local caching. After you run the build you
>> should find lots of stuff cached at ~/.m2/repository and it won't
>> download every time.
>>
>> On Fri, Aug 16, 2019 at 3:01 AM bo zhaobo <bzhaojyathousa...@gmail.com> wrote:
>>
>>> Hi Sean,
>>> Thanks for the reply, and apologies for the confusion.
>>> I know the dependencies will be downloaded by SBT or Maven. But the
>>> Spark QA job also executes "mvn clean package", so why didn't the log
>>> print "downloading some jar from Maven central" [1], and why does it
>>> build so fast? Is the reason that Spark Jenkins builds the Spark jars on
>>> physical machines and doesn't destroy the test env after the job is
>>> finished? Then the other job
>>> build Spark 

Re: FYI - filed bunch of issues for flaky tests in recent CI builds

2019-09-18 Thread Gabor Somogyi
Had a look at the Kafka test (SPARK-29136) and commented.

BR,
G


On Wed, Sep 18, 2019 at 7:54 AM Jungtaek Lim wrote:

> Hi devs,
>
> Yesterday I found a bunch of intermittent test failures in both the CI
> build for the master branch and the PR builder (I only checked my own PR).
> I filed issues for the ones I observed, but I guess there are more, since
> I only checked my PR.
>
> https://issues.apache.org/jira/browse/SPARK-29129
> https://issues.apache.org/jira/browse/SPARK-29130
> https://issues.apache.org/jira/browse/SPARK-29131
> https://issues.apache.org/jira/browse/SPARK-29132
> https://issues.apache.org/jira/browse/SPARK-29133
> https://issues.apache.org/jira/browse/SPARK-29134
> https://issues.apache.org/jira/browse/SPARK-29135
> https://issues.apache.org/jira/browse/SPARK-29136
> https://issues.apache.org/jira/browse/SPARK-29137
> https://issues.apache.org/jira/browse/SPARK-29138
> https://issues.apache.org/jira/browse/SPARK-29139
> https://issues.apache.org/jira/browse/SPARK-29140
>
> Other than that, there are lots of other failures with the message below:
>
>> java.util.concurrent.ExecutionException: java.lang.IllegalStateException:
>> Cannot call methods on a stopped SparkContext.
>
>
> Some of the issues listed above might be affected by this as well.
>
> I couldn't check whether these issues have already been resolved (this
> applies to the master build; the PR builder failures were only yesterday),
> so any help is appreciated.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>