Re: CUDA Support [DISCUSS]

Bhavin Thaker Sat, 06 Jan 2018 09:28:28 -0800

Hi Kellen,

Here is my opinion and stand on this:


I see no need to test on CUDA8 in Apache MXNet CI, especially when CUDA9 is
backward compatible with earlier Nvidia hardware generations. There is a time
and resource cost to maintaining the various combinations in CI, so I am NOT
in favor of running CUDA8 in CI unless there is a technical
reason/requirement for it. This approach encourages users to move to the
latest CUDA version and thus keeps the open-source community’s maintenance
cost low for the generic option of CUDA9.

For example: If a user opens a GitHub issue for a problem with Apache MXNet
and CUDA8, I would ask the user to test it with CUDA9. If the problem happens
only on CUDA8, then a volunteer in the community may work on it. If the
problem happens on CUDA9 as well, then, in my humble opinion, this problem
must be fixed by the community. In short, I propose that the MXNet CI run
tests only with the latest CUDA9 version and NOT CUDA8.

I am eager to hear alternate viewpoints/corrections from folks other than
Kellen and me.

Bhavin Thaker.

On Sat, Jan 6, 2018 at 8:24 AM kellen sunderland <
kellen.sunderl...@gmail.com> wrote:

> Thanks for the thoughts Bhavin, supporting the latest release would also be
> an option, and it would be easier from a support point of view.
>
> "2) I think your question probably is what should be tested by the Apache
> MXNet CI and NOT what is supported by Apache MXNet, correct?"
>
> I view these two things as being closely related, if not equivalent.  If we
> don't run at least basic tests of old versions of CUDA I think there will
> be issues that slip through.  That being said we can rely on users to
> report these issues, and chances are we'll be able to provide backwards
> compatible patches.  At a minimum I'd recommend we should run tests on all
> supported CUDA versions before a release.
>
> -Kellen
>
>
> On Sat, Jan 6, 2018 at 5:05 PM, Bhavin Thaker <bhavintha...@gmail.com>
> wrote:
>
> > Hi Kellen,
> >
> > 1) Does Apache MXNet (Incubating) have a support matrix? I think the
> > answer is no, because I don’t know where it is documented. One of the
> > mentors told me earlier that the community uses and modifies the
> > open-source project as per their individual requirements or those of the
> > community. As far as I know, there is no single entity that is
> > responsible for supporting something in MXNet; corrections to my
> > understanding are welcome.
> >
> > 2) I think your question probably is what should be tested by the Apache
> > MXNet CI and NOT what is supported by Apache MXNet, correct?
> >
> > If yes, I propose testing only the latest CUDA9 and the respective latest
> > cuDNN version in the MXNet CI since CUDA9 is backward compatible with
> > earlier Nvidia hardware generations.
> >
> > I would like to hear reasons why this would not work.
> >
> > I have commented on the github issue as well:
> > https://github.com/apache/incubator-mxnet/issues/8805
> >
> > Bhavin Thaker.
> >
> > On Sat, Jan 6, 2018 at 3:30 AM kellen sunderland <
> > kellen.sunderl...@gmail.com> wrote:
> >
> > > Hello all, I'd like to propose that we nail down exactly which versions
> > > of CUDA we're supporting.  We can then ensure that we've got good test
> > > coverage for those specific versions in CI.  At the moment it's
> > > ambiguous what our current policy is.  I.e. when do we drop support for
> > > old versions?  As a result we potentially cut a release promising to
> > > support a certain version of CUDA, then retroactively drop support after
> > > we find an issue.
> > >
> > > I'd like to propose that we officially support N, and N-1 versions of
> > > CUDA, where N is the most recent major version release.  In addition we
> > > can do our best to support libraries that are available for download for
> > > those versions.  Supporting these CUDA versions would also dictate which
> > > hardware we support in terms of compute capability (of course resource
> > > constraints would also play some role in our ability to support some
> > > hardware).
> > >
> > > As an example this would mean that currently we'd officially support
> > > CUDA 9.* and 8.  This would imply we support CUDNN 5.1 through 7, as
> > > those libraries are available for CUDA 8, and 9.  It would also mean we
> > > support 3.0-7.x (Kepler, Maxwell, Pascal, Volta) taking the more
> > > restrictive hardware requirements of CUDA 9 into account.
> > >
> > > What do you all think?  Would this be a reasonable support strategy?
> > > Are these the versions you'd like to see covered in CI?
> > >
> > > -Kellen
> > >
> > > A relevant issue:
> > > https://github.com/apache/incubator-mxnet/issues/8805
> > >
> >
>
