Hello,

the migration has just been completed and we're now running our UNIX-based slaves on CUDA 9.1 with CuDNN 7. The commit is available at https://github.com/apache/incubator-mxnet/commit/b0a6760efa141aeca87b03ecf34dae924bd1af46.
No jobs have been interrupted by this migration. If you encounter any errors, please reach out to me.

Best regards,
Marco

On Tue, Mar 20, 2018 at 11:20 PM, Marco de Abreu <[email protected]> wrote:

Hello,

the results of this vote are as follows:

+1:
Jun
Anirudh
Hao
Marco

0:
Chris

-1:
Naveen (veto recalled as of https://lists.apache.org/thread.html/242db72a0c96349ef6e0ff1d3b1fe0dc7f7a9082532724c3293666c5@%3Cdev.mxnet.apache.org%3E)

Under the constraint that we will use CUDA 8 on Windows and CUDA 9.1 on UNIX slaves, and work on integration tests for CUDA 8 in the long term, this vote counts as PASSED.

The PR for this change is available at https://github.com/apache/incubator-mxnet/pull/10108. I have developed and tested the new slaves in our test environment and everything looks promising so far. The plan is as follows:

1. Get https://github.com/apache/incubator-mxnet/pull/10108 approved to allow self-merge – CI can't pass until the slaves have been upgraded.
2. Replace all existing slaves with new, upgraded slaves.
3. Retrigger https://github.com/apache/incubator-mxnet/pull/10108 to merge the necessary changes into master.

IMPORTANT: The migration will happen tomorrow, so please expect some delay in job execution - the CI website will be unaffected. Ideally, no jobs should fail - in case they do, please feel free to retrigger them by using an empty commit. In case of any errors appearing after the upgrade, don't hesitate to contact me!

Best regards,
Marco

On Tue, Mar 20, 2018 at 1:39 AM, Naveen Swamy <[email protected]> wrote:

Yes, for the short term.

On Monday, March 19, 2018, Chris Olivier <[email protected]> wrote:

In the short term, Naveen, are you ok with Linux running CUDA 9 and Windows CUDA 8 in order to get CUDA version coverage?

On 2018/03/16 21:09:09, Marco de Abreu <[email protected]> wrote:

Thanks for your input.
How would you propose to proceed in terms of a timeline in case this vote succeeds? I don't really have time to work on a nightly setup right now. Would anybody in the community be able to help me out here, or shall we wait with the migration until a nightly setup for CUDA 8 is up?

-Marco

On Fri, Mar 16, 2018 at 9:55 PM, Bhavin Thaker <[email protected]> wrote:

+1 to the suggestion of testing CUDA8 in a few nightly instances and using CUDA9 for most instances in CI.

Bhavin Thaker.

On Fri, Mar 16, 2018 at 12:37 PM Naveen Swamy <[email protected]> wrote:

I think it's best to add support for CUDA 9.0 while retaining the existing support for CUDA 8; the code might regress when you remove it, creating more work to add CUDA 8 support back.

On Fri, Mar 16, 2018 at 9:29 AM, Marco de Abreu <[email protected]> wrote:

Yeah, sorry Chris, mixed up the names.

@Naveen: Would you be fine with doing the switch now and adding integration tests later, or is this a hard constraint for you?

On Wed, Mar 14, 2018 at 6:39 PM, Chris Olivier <[email protected]> wrote:

Isn't the Titan V the Volta and not the Tesla?

On Wed, Mar 14, 2018 at 10:36 AM, Naveen Swamy <[email protected]> wrote:

Marco,
My -1 vote is for dropping support for CUDA 8, not for adding CUDA 9.
CUDA 9.0 support for MXNet was added on Oct 30, 2017; I think not all users may have switched to CUDA 9.0 yet.

Please look at the earlier discussion on the same topic:
https://lists.apache.org/thread.html/27b84e4fc0e0728f2e4ad8b6827d7f996635021a5a4d47b5d3f4dbfb@%3Cdev.mxnet.apache.org%3E

On Wed, Mar 14, 2018 at 10:14 AM, Marco de Abreu <[email protected]> wrote:

Right, the code changes would not be validated against CUDA 8.0 as part of the PR process.

I don't have any numbers, but it's pretty unlikely that anybody is still using CUDA 8.0. According to https://en.wikipedia.org/wiki/CUDA#GPUs_supported, the devices which are not supported by CUDA 9 are under the Fermi architecture, which was released in April 2010. These GPUs are way too old, so I think we're safe not covering them specifically - this does not mean we're entirely deprecating them.

One thing to note here is that we're not testing CUDA 9 as of now. Considering that the Tesla architecture (Titan V, V100) requires at least CUDA 9, and those are probably the most widely used GPUs for deep learning, we'd probably be covering a wider user base in comparison to CUDA 8 if we make that switch.
-Marco

On Wed, Mar 14, 2018 at 5:59 PM, Naveen Swamy <[email protected]> wrote:

Does this mean that MXNet users who use CUDA 8.0 will not be supported (since you are stopping testing of CUDA 8.0)? I suggest we at least have nightly tests for CUDA 8.0.

Do you have a sense of how many users are using CUDA 8.0/9.0?

-1

On Wed, Mar 14, 2018 at 9:50 AM, Chris Olivier <[email protected]> wrote:

+0

On Wed, Mar 14, 2018 at 9:45 AM, Jin, Hao <[email protected]> wrote:

+1

On 3/14/18, 9:04 AM, "Anirudh" <[email protected]> wrote:

+1

On Mar 14, 2018 8:56 AM, "Wu, Jun" <[email protected]> wrote:

+1

On 3/14/18, 8:52 AM, "Marco de Abreu" <[email protected]> wrote:

Hello,

this is a vote to upgrade our CI environment from the current CUDA 8.0 with CuDNN 5.0 to CUDA 9.1 with CuDNN 7.0.
The reason is that NVCC under CUDA 8 does not support the Volta GPUs used in AWS P3 instances, thus limiting our test capabilities. More details are available at [1].

In order to introduce support for quantization [2], I'd like to perform a system-wide upgrade. This should have no negative impact on our users, but rather makes sure that we're actually testing with the latest versions. The PR is available at [3].

This means that we would stop verifying CUDA 8 and CuDNN 5.0 as part of our PR process. At a later point in time, this could be picked up as a candidate for an integration test as part of the nightly suite.

This is a lazy vote, ending on the 17th of March, 2018 at 17:00 (UTC+1).
Best regards,
Marco

[1]: https://issues.apache.org/jira/browse/MXNET-99
[2]: https://github.com/apache/incubator-mxnet/pull/9552
[3]: https://github.com/apache/incubator-mxnet/pull/10108
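[Editor's note] The thread suggests retriggering failed CI jobs with an empty commit. A minimal sketch of how that works, run in a throwaway repository so it is self-contained (the commit messages and the branch name mentioned in the final comment are placeholders, not from the thread):

```shell
# Work in a throwaway repository to show the effect of an empty commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

# A normal first commit, then an empty one: --allow-empty records a new
# commit without any file changes, which CI treats as a fresh push.
git commit -q --allow-empty -m "initial commit"
git commit -q --allow-empty -m "Trigger CI (empty commit)"

git log --oneline
# On a real PR branch you would then push it: git push origin <branch>
```

The empty commit changes no files, so it is safe to add to a pull request purely to get a new CI run.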
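[Editor's note] The compatibility trade-off argued in the thread (Fermi cards end at CUDA 8, while Volta cards such as the Titan V and the V100 in AWS P3 instances need CUDA 9) can be sketched as a small lookup. This is an illustrative snippet based on the compatibility table Marco cites; the names `ARCH_CUDA_SUPPORT` and `cuda_supports` are made up for the example:

```python
# Supported CUDA toolkit range per GPU architecture, as (min, max);
# None means no bound on that side. Values follow the table cited in
# the thread (en.wikipedia.org/wiki/CUDA#GPUs_supported).
ARCH_CUDA_SUPPORT = {
    "Fermi":  (None, 8.0),  # released 2010, dropped by CUDA 9
    "Pascal": (8.0, None),
    "Volta":  (9.0, None),  # Titan V, V100 (AWS P3 instances)
}

def cuda_supports(arch, cuda_version):
    """True if the given CUDA toolkit version can target the architecture."""
    low, high = ARCH_CUDA_SUPPORT[arch]
    return (low is None or cuda_version >= low) and \
           (high is None or cuda_version <= high)

# The two sides of the vote:
print(cuda_supports("Volta", 8.0))  # -> False: CI on CUDA 8 can't test P3 GPUs
print(cuda_supports("Fermi", 9.1))  # -> False: CUDA 9.1 drops Fermi-era cards
```

Both checks return False, which is exactly the tension in the vote: staying on CUDA 8 blocks Volta testing, while moving to 9.1 drops only Fermi-era hardware.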
