Hello,

the migration has just been completed and we're now running our UNIX-based slaves on CUDA 9.1 with CuDNN 7. The commit is available at https://github.com/apache/incubator-mxnet/commit/b0a6760efa141aeca87b03ecf34dae924bd1af46.
No jobs have been interrupted by this migration. If you encounter any errors, please reach out to me.

Best regards,
Marco

On Tue, Mar 20, 2018 at 11:20 PM, Marco de Abreu <[email protected]> wrote:

Hello,

the results of this vote are as follows:

+1:
Jun
Anirudh
Hao
Marco

0:
Chris

-1:
Naveen (veto recalled as of https://lists.apache.org/thread.html/242db72a0c96349ef6e0ff1d3b1fe0dc7f7a9082532724c3293666c5@%3Cdev.mxnet.apache.org%3E)

Under the constraint that we will use CUDA 8 on Windows and CUDA 9.1 on UNIX slaves, and work on integration tests for CUDA 8 in the long term, this vote counts as PASSED.

The PR for this change is available at https://github.com/apache/incubator-mxnet/pull/10108. I have developed and tested the new slaves in our test environment and everything looks promising so far. The plan is as follows:

1. Get https://github.com/apache/incubator-mxnet/pull/10108 approved to allow self-merge – CI can't pass until the slaves have been upgraded.
2. Replace all existing slaves with new, upgraded slaves.
3. Retrigger https://github.com/apache/incubator-mxnet/pull/10108 to merge the necessary changes into master.

IMPORTANT: The migration will happen tomorrow, so please expect some delay in job execution - the CI website will be unaffected. Ideally, no jobs should fail - in case they do, please feel free to retrigger them by using an empty commit. In case of any errors appearing after the upgrade, don't hesitate to contact me!

Best regards,
Marco

On Tue, Mar 20, 2018 at 1:39 AM, Naveen Swamy <[email protected]> wrote:

Yes, for the short term.

On Monday, March 19, 2018, Chris Olivier <[email protected]> wrote:

In the short term, Naveen, are you ok with Linux running CUDA 9 and Windows CUDA 8 in order to get CUDA version coverage?

On 2018/03/16 21:09:09, Marco de Abreu <[email protected]> wrote:

Thanks for your input.
How would you propose to proceed in terms of a timeline in case this vote succeeds? I don't really have time to work on a nightly setup right now. Would anybody in the community be able to help me out here, or shall we wait with the migration until a nightly setup for CUDA 8 is up?

-Marco

On Fri, Mar 16, 2018 at 9:55 PM, Bhavin Thaker <[email protected]> wrote:

+1 to the suggestion of testing CUDA8 in a few nightly instances and using CUDA9 for most instances in CI.

Bhavin Thaker.

On Fri, Mar 16, 2018 at 12:37 PM Naveen Swamy <[email protected]> wrote:

I think it's best to add support for CUDA 9.0 while retaining the existing support for CUDA 8; the code might regress when you remove it, creating more work to add CUDA 8 support back.

On Fri, Mar 16, 2018 at 9:29 AM, Marco de Abreu <[email protected]> wrote:

Yeah, sorry Chris, mixed up the names.

@Naveen: Would you be fine with doing the switch now and adding integration tests later, or is this a hard constraint for you?

On Wed, Mar 14, 2018 at 6:39 PM, Chris Olivier <[email protected]> wrote:

Isn't the Titan V the Volta and not the Tesla?

On Wed, Mar 14, 2018 at 10:36 AM, Naveen Swamy <[email protected]> wrote:

Marco,
My -1 vote is for dropping support for CUDA 8, not for adding CUDA 9.
CUDA 9.0 support for MXNet was added on Oct 30, 2017; I think not all users may have switched to CUDA 9.0 yet.

Please look at the earlier discussion on the same topic:
https://lists.apache.org/thread.html/27b84e4fc0e0728f2e4ad8b6827d7f996635021a5a4d47b5d3f4dbfb@%3Cdev.mxnet.apache.org%3E

On Wed, Mar 14, 2018 at 10:14 AM, Marco de Abreu <[email protected]> wrote:

Right, the code changes would not be validated against CUDA 8.0 as part of the PR process.

I don't have any numbers, but it's pretty unlikely that anybody is still using CUDA 8.0. According to https://en.wikipedia.org/wiki/CUDA#GPUs_supported, the devices which are not supported by CUDA 9 are under the Fermi architecture, which was released in April 2010. These GPUs are way too old, so I think we're safe not covering them specifically - this does not mean we're entirely deprecating them.

One thing to note here is that we're not testing CUDA 9 as of now. Considering that the Tesla architecture (Titan V, V100) requires at least CUDA 9, and those are probably the most widely used GPUs for deep learning, we'd probably be covering a wider user base in comparison to CUDA 8 if we make that switch.
-Marco

On Wed, Mar 14, 2018 at 5:59 PM, Naveen Swamy <[email protected]> wrote:

Does this mean that MXNet users who use CUDA 8.0 will not be supported (since you are stopping testing of CUDA 8.0)? I suggest we at least have nightly tests for CUDA 8.0.

Do you have a sense of how many users are using CUDA 8.0/9.0?

-1

On Wed, Mar 14, 2018 at 9:50 AM, Chris Olivier <[email protected]> wrote:

+0

On Wed, Mar 14, 2018 at 9:45 AM, Jin, Hao <[email protected]> wrote:

+1

On 3/14/18, 9:04 AM, "Anirudh" <[email protected]> wrote:

+1

On Mar 14, 2018 8:56 AM, "Wu, Jun" <[email protected]> wrote:

+1

On 3/14/18, 8:52 AM, "Marco de Abreu" <[email protected]> wrote:

Hello,

this is a vote to upgrade our CI environment from the current CUDA 8.0 with CuDNN 5.0 to CUDA 9.1 with CuDNN 7.0.
The reason is that NVCC under CUDA 8 does not support the Volta GPUs used in AWS P3 instances, thus limiting our test capabilities. More details are available at [1].

In order to introduce support for quantization [2], I'd like to perform a system-wide upgrade. This should have no negative impact on our users, but rather makes sure that we're actually testing with the latest versions. The PR is available at [3].

This means that we would stop verifying CUDA 8 and CuDNN 5.0 as part of our PR process. At a later point in time, this could be picked up as a candidate for an integration test as part of the nightly suite.

This is a lazy vote, ending on the 17th of March, 2018 at 17:00 (UTC+1).
Best regards,
Marco

[1]: https://issues.apache.org/jira/browse/MXNET-99
[2]: https://github.com/apache/incubator-mxnet/pull/9552
[3]: https://github.com/apache/incubator-mxnet/pull/10108
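[Editor's note] The thread suggests retriggering failed CI jobs with an empty commit. A minimal sketch of how that works, run in a throwaway repository so it is self-contained (the commit messages and the branch name mentioned in the final comment are placeholders, not from the thread):

```shell
# Work in a throwaway repository to show the effect of an empty commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

# A normal first commit, then an empty one: --allow-empty records a new
# commit without any file changes, which CI treats as a fresh push.
git commit -q --allow-empty -m "initial commit"
git commit -q --allow-empty -m "Trigger CI (empty commit)"

git log --oneline
# On a real PR branch you would then push it: git push origin <branch>
```

The empty commit changes no files, so it is safe to add to a pull request purely to get a new CI run.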
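[Editor's note] The compatibility trade-off argued in the thread (Fermi cards end at CUDA 8, while Volta cards such as the Titan V and the V100 in AWS P3 instances need CUDA 9) can be sketched as a small lookup. This is an illustrative snippet based on the compatibility table Marco cites; the names `ARCH_CUDA_SUPPORT` and `cuda_supports` are made up for the example:

```python
# Supported CUDA toolkit range per GPU architecture, as (min, max);
# None means no bound on that side. Values follow the table cited in
# the thread (en.wikipedia.org/wiki/CUDA#GPUs_supported).
ARCH_CUDA_SUPPORT = {
    "Fermi":  (None, 8.0),  # released 2010, dropped by CUDA 9
    "Pascal": (8.0, None),
    "Volta":  (9.0, None),  # Titan V, V100 (AWS P3 instances)
}

def cuda_supports(arch, cuda_version):
    """True if the given CUDA toolkit version can target the architecture."""
    low, high = ARCH_CUDA_SUPPORT[arch]
    return (low is None or cuda_version >= low) and \
           (high is None or cuda_version <= high)

# The two sides of the vote:
print(cuda_supports("Volta", 8.0))  # -> False: CI on CUDA 8 can't test P3 GPUs
print(cuda_supports("Fermi", 9.1))  # -> False: CUDA 9.1 drops Fermi-era cards
```

Both checks return False, which is exactly the tension in the vote: staying on CUDA 8 blocks Volta testing, while moving to 9.1 drops only Fermi-era hardware.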
