Re: Ask for ARM CI for spark

2019-08-15 Thread Sean Owen
I'm not sure what you mean. The dependencies are downloaded by SBT and
Maven like in any other project, and nothing about it is specific to Spark.
The worker machines cache artifacts that are downloaded from these, but
this is a function of Maven and SBT, not Spark. You may find that the
initial download takes a long time.
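For reference, the caches in question live at fixed default locations; this is a sketch of how to inspect them (paths are the standard Maven and SBT/Ivy defaults, not anything Spark-specific, and CI images may relocate them):

```shell
# The caches Maven and SBT populate on first download (default paths).
MAVEN_CACHE="${HOME}/.m2/repository"   # Maven's local artifact cache
IVY_CACHE="${HOME}/.ivy2/cache"        # SBT's (Ivy) artifact cache
for dir in "$MAVEN_CACHE" "$IVY_CACHE"; do
  if [ -d "$dir" ]; then
    echo "warm cache: $dir ($(du -sh "$dir" | cut -f1))"
  else
    echo "cold cache: $dir (first build will download from Maven Central)"
  fi
done
# With a warm cache a build can even run fully offline:
#   mvn --offline -DskipTests package
```

Once the cache is warm, subsequent builds on the same worker fetch nothing from the network, which is why only a fresh machine sees the long download.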

On Thu, Aug 15, 2019 at 9:02 PM bo zhaobo 
wrote:


Re: Ask for ARM CI for spark

2019-08-15 Thread bo zhaobo
Hi Sean,

Thanks very much for pointing out the roadmap. ;-). Then I think we will
continue to focus on our test environment.

Regarding the networking problems: I mean that we can access Maven Central,
and our jobs can download the required jar packages at high network speed.
What we want to know is why the Spark QA test job logs [1] show that the
job script/Maven build doesn't seem to download any jar packages; could you
tell us why? Thank you. We raised the "networking problems" because of
something we observed during testing: if we execute "mvn clean package" in
a fresh test environment (in our setup the test VMs are destroyed after
each job finishes), Maven downloads the dependency jars from Maven Central,
but in the job "spark-master-test-maven-hadoop" [2] the log shows no jar
downloads at all. What is the reason for that?
Also, when we build the Spark jar and download dependencies from Maven
Central, it takes almost an hour, while [2] takes only about 10 minutes.
But if we run "mvn package" in a VM that has already run "mvn package"
before, it takes about 14 minutes, very close to [2]. So we suspect that
downloading the jar packages is what costs so much time. For the goal of
ARM CI, we expect the performance of the new ARM CI to be close to the
existing x86 CI, so that users can accept it more easily.

[1] https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/
[2]
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.6-ubuntu-testing/lastBuild/consoleFull
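The cold/warm gap described above is a rough estimate of pure download time. A small sketch, using the numbers from this mail and the standard `-Dmaven.repo.local` knob to force a cold cache (commands are illustrative, run inside a Spark checkout):

```shell
# Timings reported above: ~60 min cold build, ~14 min warm build.
cold_min=60
warm_min=14
echo "estimated dependency-download time: $((cold_min - warm_min)) min"
# prints: estimated dependency-download time: 46 min

# To reproduce the comparison, point Maven at an empty local repository:
#   time mvn -Dmaven.repo.local=/tmp/fresh-repo -DskipTests clean package  # cold
#   time mvn -Dmaven.repo.local=/tmp/fresh-repo -DskipTests clean package  # warm rerun
```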

Best regards

ZhaoBo





Sean Owen  于2019年8月15日周四 下午9:58写道:


Re: Ask for ARM CI for spark

2019-08-15 Thread Tianhua huang
@Sean Owen, thanks for your reply.
I basically agree with you; two points I have to make :)
First, maybe I didn't express it clearly enough: we currently download from
Maven Central in our test system, but the community Jenkins CI tests never
seem to download jar packages from the Maven Central repo. Our question is
whether there is an internal Maven repo in the community Jenkins.
Second, about the failed tests: of course we will continue to figure them
out, and we hope someone can help or join us :) But I'm afraid we can't
simply wait for everything to be "stable" (maybe you mean no failed
tests?). The ReplayListenerSuite failures mentioned in the last mail used
to pass; we suspect they were introduced by
https://github.com/apache/spark/pull/23767. We reverted that code and the
tests passed, so we hope someone can help us look deeper into it. The
tests we run are based on master, so if a modification introduces errors
the tests will fail; I think this is one reason we need ARM CI.

Thank you all :)
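One way to chase a regression like this without reading the whole patch is `git bisect` over the suspect range, rerunning only the failing suite at each step. A sketch with placeholder SHAs (you would take real values from `git log`; the Maven incantation is the usual single-suite form, adjust as needed):

```shell
# Placeholders (take real values from 'git log'):
GOOD_SHA="sha-before-23767"   # last commit where ReplayListenerSuite passed
BAD_SHA="sha-after-23767"     # a commit containing the PR where it fails
# Rerun only the failing suite at each bisect step:
#   git bisect start "$BAD_SHA" "$GOOD_SHA"
#   git bisect run build/mvn -pl core -Dtest=none \
#       -DwildcardSuites=org.apache.spark.scheduler.ReplayListenerSuite test
# Or just confirm the suspicion by reverting the merge commit (<merge-sha>):
#   git revert --no-commit <merge-sha> && build/mvn -pl core -Dtest=none \
#       -DwildcardSuites=org.apache.spark.scheduler.ReplayListenerSuite test
echo "bisect range: $GOOD_SHA..$BAD_SHA"
```

Because `git bisect run` uses the exit code of the test command, this narrows a large patch series to a single offending commit automatically.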

On Thu, Aug 15, 2019 at 9:58 PM Sean Owen  wrote:


Re: [build system] colo maintenance & outage tomorrow, 10am-2pm PDT

2019-08-15 Thread Shane Knapp
a couple of workers needed a bit more time to finish booting up, so no need
for my excursion tomorrow.  :)

builds be building, things look happy.

On Thu, Aug 15, 2019 at 6:46 PM Shane Knapp  wrote:



-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: [build system] colo maintenance & outage tomorrow, 10am-2pm PDT

2019-08-15 Thread Shane Knapp
it's back up!  some of the workers didn't come back cleanly, so i'll have
to hit up the colo tomorrow and persuade them in person.

On Thu, Aug 15, 2019 at 6:45 PM Wenchen Fan  wrote:


-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: [build system] colo maintenance & outage tomorrow, 10am-2pm PDT

2019-08-15 Thread Wenchen Fan
Thanks for tracking it Shane!

On Fri, Aug 16, 2019 at 7:41 AM Shane Knapp  wrote:



Re: [build system] colo maintenance & outage tomorrow, 10am-2pm PDT

2019-08-15 Thread Shane Knapp
just got an update:

there was a problem w/the replacement part, and they're trying to fix it.
if that's successful, they expect to have power restored within the hour.

if that doesn't work, a new (new) replacement part is scheduled to arrive
at 8am tomorrow.

shane

On Thu, Aug 15, 2019 at 2:07 PM Shane Knapp  wrote:



-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: [build system] colo maintenance & outage tomorrow, 10am-2pm PDT

2019-08-15 Thread Shane Knapp
quick update:

it's been 4 hours, the colo is still down, and i haven't gotten any news
yet as to when they're planning on getting power restored.

once i hear something i will let everyone know what's up.

On Wed, Aug 14, 2019 at 10:22 AM Shane Knapp  wrote:

> the berkeley colo had a major power distribution breaker fail, and they've
> scheduled an emergency repair for tomorrow (thursday) @ 10am.
>
> they expect this to take ~4 hours.
>
> i will be shutting down the machines (again) ~9am, and bringing them all
> back up once i get the all-clear.
>
> sorry for the inconvenience...
>
> shane
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>


-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: Release Apache Spark 2.4.4

2019-08-15 Thread Dongjoon Hyun
+1 for that.

Kazuaki volunteered for 2.3.4 release last month. AFAIK, he has been
preparing that.

-
https://lists.apache.org/thread.html/6fafeefb7715e8764ccfe5d30c90d7444378b5f4f383ec95e2f1d7de@%3Cdev.spark.apache.org%3E

I believe we can handle them after 2.4.4 RC1 (or concurrently.)

Hi, Kazuaki.
Could you start a separate email thread for 2.3.4 release?

Bests,
Dongjoon.


On Thu, Aug 15, 2019 at 8:43 AM Sean Owen  wrote:



Re: Release Apache Spark 2.4.4

2019-08-15 Thread Sean Owen
While we're on the topic:

In theory, branch 2.3 is meant to be unsupported as of right about now.

There are 69 fixes in branch 2.3 since 2.3.3 was released in February:
https://issues.apache.org/jira/projects/SPARK/versions/12344844

Some look moderately important.

Should we also, or first, cut 2.3.4 to end the 2.3.x line?

On Tue, Aug 13, 2019 at 6:16 PM Dongjoon Hyun  wrote:
>
> Hi, All.
>
> Spark 2.4.3 was released three months ago (8th May).
> As of today (13th August), there are 112 commits (75 JIRAs) in `branch-24` 
> since 2.4.3.
>
> It would be great if we can have Spark 2.4.4.
> Shall we start `2.4.4 RC1` next Monday (19th August)?
>
> Last time, there was a request for K8s issue and now I'm waiting for 
> SPARK-27900.
> Please let me know if there is another issue.
>
> Thanks,
> Dongjoon.




Re: Ask for ARM CI for spark

2019-08-15 Thread Sean Owen
I think the right goal is to fix the remaining issues first. If we set up
CI/CD it will only tell us there are still some test failures. If it's
stable, and not hard to add to the existing CI/CD, yes it could be done
automatically later. You can continue to test on ARM independently for now.

It sounds indeed like there are some networking problems in the test system
if you're not able to download from Maven Central. That rarely takes
significant time, and there aren't project-specific mirrors here. You might
be able to point at a closer public mirror, depending on where you are.
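Pointing Maven at a closer mirror is a small `settings.xml` change. A sketch (the mirror id and URL are illustrative; substitute a public mirror near your region):

```shell
# Write an example settings file with a <mirror> for Maven Central
# (id and URL are placeholders; pick a mirror close to your CI).
cat > /tmp/settings-mirror.xml <<'EOF'
<settings>
  <mirrors>
    <mirror>
      <id>nearby-central-mirror</id>
      <mirrorOf>central</mirrorOf>
      <url>https://repo.maven.apache.org/maven2</url>
    </mirror>
  </mirrors>
</settings>
EOF
# Use it per-build without touching the global config:
#   mvn -s /tmp/settings-mirror.xml -DskipTests package
grep -c '<mirror>' /tmp/settings-mirror.xml
# prints: 1
```

`<mirrorOf>central</mirrorOf>` redirects only Maven Central traffic, so other repositories in the build remain untouched.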

On Thu, Aug 15, 2019 at 5:43 AM Tianhua huang 
wrote:



Re: Ask for ARM CI for spark

2019-08-15 Thread Tianhua huang
Hi all,

I want to discuss Spark ARM CI again. We ran some tests on an ARM instance
based on master; the jobs include
https://github.com/theopenlab/spark/pull/13 and the k8s integration
https://github.com/theopenlab/spark/pull/17/. There are several things I
want to talk about:

First, about the failed tests:
1. We have fixed some problems, e.g.
https://github.com/apache/spark/pull/25186 and
https://github.com/apache/spark/pull/25279; thanks to Sean Owen and others
for helping us.
2. We tried the k8s integration test on ARM and hit an error: apk fetch
hangs. The tests passed after adding the '--network host' option to the
`docker build` command, see:
https://github.com/theopenlab/spark/pull/17/files#diff-5b731b14068240d63a93c393f6f9b1e8R176
The solution refers to
https://github.com/gliderlabs/docker-alpine/issues/307, and I don't know
whether this has ever happened in the community CI; maybe we should submit
a PR to pass '--network host' to `docker build`?
3. We found two tests failing after the commit
https://github.com/apache/spark/pull/23767:
   ReplayListenerSuite:
   - ...
   - End-to-end replay *** FAILED ***
 "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
   - End-to-end replay with compression *** FAILED ***
 "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)

We reverted the commit and the tests passed. The patch is quite big and,
sorry, we haven't found the root cause yet; if you are interested please
try it, and we would appreciate it if someone could help us figure it out.

Second, about the test time: we increased the flavor of the ARM instance to
16U16G, but there was no significant improvement. The k8s integration test
took about an hour and a half, and the QA test (like the
spark-master-test-maven-hadoop-2.7 community Jenkins job) took about
seventeen hours (too long :(). We suspect the reasons are performance and
network. After splitting the jobs by project (sql, core, and so on), the
time decreased to about seven hours, see
https://github.com/theopenlab/spark/pull/19. We noticed that the Spark QA
tests (https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/)
never seem to download jar packages from the Maven Central repo (such as
https://repo.maven.apache.org/maven2/org/opencypher/okapi-api/0.4.2/okapi-api-0.4.2.jar).
So we want to know how the Jenkins jobs do that; is there an internal
Maven repo? Maybe we can do the same thing to avoid the network cost of
downloading the dependency jars.

Third, and most important, ARM CI for Spark: we believe it is necessary,
right? You can see we have really made a lot of effort; the basic ARM
build/test jobs are OK now, so we suggest adding ARM jobs to the community.
We can set them to non-voting first and improve/enrich the jobs step by
step. Generally, there are two ways in our mind to integrate ARM CI for
Spark:
 1) We introduce the OpenLab ARM CI into Spark as a custom CI system. We
provide human resources and test ARM VMs, and we will focus on ARM-related
issues in Spark. We will push the PRs into the community.
 2) We donate ARM VM resources to the existing amplab Jenkins. We still
provide human resources, focus on ARM-related issues in Spark, and push
the PRs into the community.
With either option we will provide human resources for maintenance, and of
course it would be great if we can work together. So please tell us which
option you would like, and let's move forward. Waiting for your reply,
thank you very much.

On Wed, Aug 14, 2019 at 10:30 AM Tianhua huang 
wrote:

> OK, thanks.
>
> On Tue, Aug 13, 2019 at 8:37 PM Sean Owen  wrote:
>
>> -dev@ -- it's better not to send to the whole list to discuss specific
>> changes or issues from here. You can reply on the pull request.
>> I don't know what the issue is either at a glance.
>>
>> On Tue, Aug 13, 2019 at 2:54 AM Tianhua huang 
>> wrote:
>>
>>> Hi all,
>>>
>>> About the arm test of spark, recently we found two tests failed after
>>> the commit https://github.com/apache/spark/pull/23767:
>>>ReplayListenerSuite:
>>>- ...
>>>- End-to-end replay *** FAILED ***
>>>  "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>>>- End-to-end replay with compression *** FAILED ***
>>>  "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>>>
>>> We tried to revert the commit and then the tests passed, the patch is
>>> too big and so sorry we can't find the reason till now, if you are
>>> interesting please try it, and it will be very appreciate  if
>>> someone can help us to figure it out.
>>>
>>> On Tue, Aug 6, 2019 at 9:08 AM bo zhaobo 
>>> wrote:
>>>
 Hi shane,
 Thanks for your reply. I will wait for you back. ;-)

 Thanks,
 Best regards
 ZhaoBo




Re: [DISCUSS] Migrate development scripts under dev/ from Python2 to Python 3

2019-08-15 Thread Hyukjin Kwon
Yeah, we will probably drop Python 2 entirely after 3.0.0. Python 2 is
already deprecated.

On Thu, 15 Aug 2019, 18:25 Driesprong, Fokko,  wrote:

> Sorry for the late reply, was a bit busy lately, but I still would like to
> share my thoughts on this.
>
> For Apache Airflow we're dropping support for Python 2 in the next major
> release. We're now supporting Python 3.5+. Mostly because:
>
>- Easier to maintain and test, and less if/else constructions for the
>different Python versions. Also, not having to test against Python 2.x
>reduces the build matrix.
>- Python 3 has support for typing. From Python 3.5 you can include
>provisional type hints. An excellent presentation by Guido himself:
>https://www.youtube.com/watch?v=2wDvzy6Hgxg. From Python 3.5 it is
>still provisional, but it is a really good idea. From Airflow we've noticed
>that using mypy is catching bugs early:
>   - This will put less stress on the (boring part of the) reviewing
>   process since a lot of this stuff is checked automatically.
>   - For new developers, it is easier to read the code because of the
>   annotations.
>   - Can be used as an input for generated documentation (or check if
>   it still in sync with the docstrings)
>   - Easier to extend the code since you know what kind of types you
>   can expect, and your IDE will also pick up the hinting.
>- Python 2.x will be EOL end this year
>
> I have a strong preference to migrate everything to Python 3.
>
> Cheers, Fokko
>
>
> Op wo 7 aug. 2019 om 12:14 schreef Weichen Xu :
>
>> All right we could support both Python 2 and Python 3 for spark 3.0.
>>
>> On Wed, Aug 7, 2019 at 6:10 PM Hyukjin Kwon  wrote:
>>
>>> We didn't drop Python 2 yet although it's deprecated. So I think It
>>> should support both Python 2 and Python 3 at the current status.
>>>
>>> 2019년 8월 7일 (수) 오후 6:54, Weichen Xu 님이 작성:
>>>
 Hi all,

 I would like to discuss the compatibility for dev scripts. Because we
 already decided to deprecate python2 in spark 3.0, for development scripts
 under dev/ , we have two choice:
 1) Migration from Python 2 to Python 3
 2) Support both Python 2 and Python 3

 I tend to option (2) which is more friendly to maintenance.

 Regards,
 Weichen

>>>
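Option (2) above, keeping the dev/ scripts runnable under both
interpreters, usually comes down to a few defensive idioms. A minimal
sketch of the pattern (an illustrative helper, not an actual dev/ script):

```python
# Sketch of Python 2/3-compatible idioms for a dev/ helper script.
# Illustrative only; not taken from the Spark code base.
from __future__ import print_function  # print() works on Python 2 too

import sys

# Branch on the interpreter only where it is truly unavoidable.
PY2 = sys.version_info[0] == 2

def to_text(value):
    """Return a unicode string on both Python 2 and Python 3."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return value

if __name__ == "__main__":
    print(to_text(b"spark"))  # prints: spark
```

Scripts written this way keep running unchanged if Python 2 support is
later dropped, which makes option (2) a low-cost stepping stone to (1).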


Re: [DISCUSS] Migrate development scripts under dev/ from Python2 to Python 3

2019-08-15 Thread Hyukjin Kwon
I mean python 2 _will be_ deprecated in Spark 3.

On Thu, 15 Aug 2019, 18:37 Hyukjin Kwon,  wrote:

> Yeah, we will probably drop Python 2 entirely after 3.0.0. Python 2 is
> already deprecated.



Re: [DISCUSS] Migrate development scripts under dev/ from Python2 to Python 3

2019-08-15 Thread Driesprong, Fokko
Sorry for the late reply, was a bit busy lately, but I still would like to
share my thoughts on this.

For Apache Airflow we're dropping support for Python 2 in the next major
release. We're now supporting Python 3.5+. Mostly because:

   - Easier to maintain and test, and less if/else constructions for the
   different Python versions. Also, not having to test against Python 2.x
   reduces the build matrix.
   - Python 3 has support for typing. From Python 3.5 you can include
   provisional type hints. An excellent presentation by Guido himself:
   https://www.youtube.com/watch?v=2wDvzy6Hgxg. From Python 3.5 it is still
   provisional, but it is a really good idea. From Airflow we've noticed that
   using mypy is catching bugs early:
  - This will put less stress on the (boring part of the) reviewing
  process since a lot of this stuff is checked automatically.
  - For new developers, it is easier to read the code because of the
  annotations.
  - Can be used as an input for generated documentation (or to check if it
  is still in sync with the docstrings)
  - Easier to extend the code since you know what kind of types you can
  expect, and your IDE will also pick up the hinting.
   - Python 2.x will be EOL at the end of this year

I have a strong preference to migrate everything to Python 3.

Cheers, Fokko
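The typing point above can be made concrete with a small sketch (a
hypothetical function, not from the Spark or Airflow code bases) using the
comment-style annotations that Python 3.5 supports provisionally:

```python
# Illustrative PEP 484 type hints of the kind mypy can check;
# the function below is hypothetical, not from Spark or Airflow.
from typing import Dict, List

def count_by_module(test_names):
    # type: (List[str]) -> Dict[str, int]
    """Count test names per top-level module, e.g. 'sql.ColumnSuite' -> 'sql'."""
    counts = {}  # type: Dict[str, int]
    for name in test_names:
        module = name.split(".", 1)[0]
        counts[module] = counts.get(module, 0) + 1
    return counts

# mypy would flag count_by_module(42) as an error before any review.
print(count_by_module(["sql.ColumnSuite", "core.RDDSuite", "sql.RowSuite"]))
```

A mistyped call is caught by `mypy` at check time rather than in review,
which is exactly the "less stress on the boring part of reviewing" point.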

