Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-02 Thread Felix Cheung
I’m very hesitant about this.

I don’t want to vote -1, because I personally think it’s important to do, but 
I’d like to see more discussion points addressed rather than voting purely on 
the spirit of it.

First, the SPIP doesn’t match the SPIP format that was proposed and agreed on. 
(Maybe this is a minor point, and perhaps we should also vote to update the 
SPIP format.)

Second, there are multiple PDFs/Google Docs and JIRAs. I think, for example, 
the design sketch does not cover the same points as the updated SPIP doc. It 
would help to align them before moving forward.

Third, the proposal touches on some fairly core and sensitive components, like 
the scheduler, and I think more discussions are necessary. We have a few 
comments there and in the JIRA.




From: Marco Gaido 
Sent: Saturday, March 2, 2019 4:18 AM
To: Weichen Xu
Cc: Yinan Li; Tom Graves; dev; Xingbo Jiang
Subject: Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

+1, a critical feature for AI/DL!

On Sat, Mar 2, 2019 at 05:14, Weichen Xu <weichen...@databricks.com> wrote:
+1, nice feature!

On Sat, Mar 2, 2019 at 6:11 AM Yinan Li <liyinan...@gmail.com> wrote:
+1

On Fri, Mar 1, 2019 at 12:37 PM Tom Graves wrote:
+1 for the SPIP.

Tom

On Friday, March 1, 2019, 8:14:43 AM CST, Xingbo Jiang <jiangxb1...@gmail.com> 
wrote:


Hi all,

I want to call for a vote on SPARK-24615. It improves Spark by making it aware 
of GPUs exposed by cluster managers, so that Spark can match GPU resources with 
user task requests properly. The proposal and product doc were made available 
on dev@ to collect input. You can also find a design sketch at SPARK-27005.

The vote will be up for the next 72 hours. Please reply with your vote:

+1: Yeah, let's go forward and implement the SPIP.
+0: Don't really care.
-1: I don't think this is a good idea because of the following technical 
reasons.

Thank you!

Xingbo


Re: [build system] VERY IMPORTANT: please file JIRAs for issues w/jenkins

2019-03-02 Thread Xiao Li
Thank you, Shane!

Xiao

On Sat, Mar 2, 2019 at 4:28 PM, shane knapp wrote:

> adding new k8s functionality?
>
> something needs upgrading in jenkins?
>
> are logs not being archived?
>
> odd build failure (and i mean *odd*)?
>
> PLEASE FILE A JIRA!  :)
>
> adding an @shaneknapp mention to a github PR is no longer working for me as
> the volume and complexity of requests have been increasing.  i will now consider
> these mentions a 'best effort' problem, and they will generally gravitate
> towards the bottom of my queue.
>
> i will, however, pay attention to JIRA.  please file issues accordingly,
> and either assign or @ me so i can investigate and triage.
>
> also, please add the 'jenkins' component to any new issues.
>
> i'm aware that primarily working through github has been the preferred
> method to get my undivided attention for a number of years, and re-training
> ourselves might take a while.  to that end, i will update the spark
> website's dev page w/these instructions and reply w/gentle reminders when i
> get mentioned on JIRA-worthy things on github.
>
> on a related note:  i'm working on getting a bit more in the way of resources on my
> end, particularly on the back-end (ubuntu 18, jenkins 2.x upgrade,
> finishing ubuntu port of builds) but it will take some time for spin-up,
> testing and eventual deployment.  one of my team will start helping me out
> w/that stuff soon, but only part-time...  however, i have faith that he can
> bang away at the myriad parallel projects i've had going on for the past
> couple of years and in a few months we'll be in a much, much better spot.
>
> thanks in advance,
>
> shane
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>


[build system] VERY IMPORTANT: please file JIRAs for issues w/jenkins

2019-03-02 Thread shane knapp
adding new k8s functionality?

something needs upgrading in jenkins?

are logs not being archived?

odd build failure (and i mean *odd*)?

PLEASE FILE A JIRA!  :)

adding an @shaneknapp mention to a github PR is no longer working for me as
the volume and complexity of requests have been increasing.  i will now consider
these mentions a 'best effort' problem, and they will generally gravitate
towards the bottom of my queue.

i will, however, pay attention to JIRA.  please file issues accordingly, and
either assign or @ me so i can investigate and triage.

also, please add the 'jenkins' component to any new issues.

i'm aware that primarily working through github has been the preferred
method to get my undivided attention for a number of years, and re-training
ourselves might take a while.  to that end, i will update the spark
website's dev page w/these instructions and reply w/gentle reminders when i
get mentioned on JIRA-worthy things on github.

on a related note:  i'm working on getting a bit more in the way of resources on my
end, particularly on the back-end (ubuntu 18, jenkins 2.x upgrade,
finishing ubuntu port of builds) but it will take some time for spin-up,
testing and eventual deployment.  one of my team will start helping me out
w/that stuff soon, but only part-time...  however, i have faith that he can
bang away at the myriad parallel projects i've had going on for the past
couple of years and in a few months we'll be in a much, much better spot.

thanks in advance,

shane
-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: SPIP: Accelerator-aware Scheduling

2019-03-02 Thread Felix Cheung
+1 on mesos - what Sean says


From: Andrew Melo 
Sent: Friday, March 1, 2019 9:19 AM
To: Xingbo Jiang
Cc: Sean Owen; Xiangrui Meng; dev
Subject: Re: SPIP: Accelerator-aware Scheduling

Hi,

On Fri, Mar 1, 2019 at 9:48 AM Xingbo Jiang wrote:
>
> Hi Sean,
>
> To support GPU scheduling on a YARN cluster, we have to update the Hadoop 
> version to 3.1.2+. However, if we decide not to upgrade Hadoop to that 
> version or later for Spark 3.0, then we just have to disable GPU scheduling 
> (or fall back) on YARN. Users will still be able to use the feature on a 
> Standalone or Kubernetes cluster.
>
> We didn't include Mesos support in the current SPIP because we didn't receive 
> use cases that require GPU scheduling on a Mesos cluster. However, we can 
> still add Mesos support in the future if we see valid use cases.

First time caller, long time listener. We have GPUs in our Mesos-based
Spark cluster, and it would be nice to use them with Spark-based
GPU-enabled frameworks (our use case is deep learning applications).

Cheers
Andrew

>
> Thanks!
>
> Xingbo
>
> On Fri, Mar 1, 2019 at 10:39 PM, Sean Owen wrote:
>>
>> Two late breaking questions:
>>
>> This basically requires Hadoop 3.1 for YARN support?
>> Mesos support is listed as a non goal but it already has support for 
>> requesting GPUs in Spark. That would be 'harmonized' with this 
>> implementation even if it's not extended?
>>
>> On Fri, Mar 1, 2019, 7:48 AM Xingbo Jiang wrote:
>>>
>>> I think we are aligned on the commitment, I'll start a vote thread for this 
>>> shortly.
>>>
>>> On Wed, Feb 27, 2019 at 6:47 AM, Xiangrui Meng wrote:

 In case there are issues accessing the Google Docs, I attached PDF files to 
 the JIRA.

 On Tue, Feb 26, 2019 at 7:41 AM Xingbo Jiang wrote:
>
> Hi all,
>
> I want to send a revised SPIP on implementing Accelerator (GPU)-aware 
> Scheduling. It improves Spark by making it aware of GPUs exposed by 
> cluster managers, so that Spark can match GPU resources with user task 
> requests properly. If you have scenarios that need to run workloads 
> (DL/ML/signal processing, etc.) on a Spark cluster with GPU nodes, 
> please help review it and check how it fits your use cases. Your 
> feedback would be greatly appreciated!
>
> # Links to SPIP and Product doc:
>
> * Jira issue for the SPIP: 
> https://issues.apache.org/jira/browse/SPARK-24615
> * Google Doc: 
> https://docs.google.com/document/d/1C4J_BPOcSCJc58HL7JfHtIzHrjU0rLRdQM3y7ejil64/edit?usp=sharing
> * Product Doc: 
> https://docs.google.com/document/d/12JjloksHCdslMXhdVZ3xY5l1Nde3HRhIrqvzGnK_bNE/edit?usp=sharing
>
> Thank you!
>
> Xingbo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-02 Thread Marco Gaido
+1, a critical feature for AI/DL!

On Sat, Mar 2, 2019 at 05:14, Weichen Xu <weichen...@databricks.com> wrote:

> +1, nice feature!
>
> On Sat, Mar 2, 2019 at 6:11 AM Yinan Li wrote:
>
>> +1
>>
>> On Fri, Mar 1, 2019 at 12:37 PM Tom Graves wrote:
>>
>>> +1 for the SPIP.
>>>
>>> Tom
>>>
>>> On Friday, March 1, 2019, 8:14:43 AM CST, Xingbo Jiang <
>>> jiangxb1...@gmail.com> wrote:
>>>
>>>
>>> Hi all,
>>>
>>> I want to call for a vote on SPARK-24615. It improves Spark by making it
>>> aware of GPUs exposed by cluster managers, so that Spark can match GPU
>>> resources with user task requests properly. The proposal and product doc
>>> were made available on dev@ to collect input. You can also find a design
>>> sketch at SPARK-27005.
>>>
>>> The vote will be up for the next 72 hours. Please reply with your vote:
>>>
>>> +1: Yeah, let's go forward and implement the SPIP.
>>> +0: Don't really care.
>>> -1: I don't think this is a good idea because of the following technical
>>> reasons.
>>>
>>> Thank you!
>>>
>>> Xingbo
>>>
>>