> We should support it, but it still seems hypothetical and not urgent.

FWIW, cuBLAS only just added 64-bit int support with CUDA 12 (naturally, with a completely separate API).
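To illustrate the "completely separate API" point, here is a minimal sketch of what dealing with the two interfaces looks like. It assumes the _64-suffixed entry points CUDA 12 documents for cuBLAS; the version-macro gate and the wrapper name are purely illustrative, not anything PETSc has today.

#include <cublas_v2.h>
#include <stdint.h>

/* Sketch: run an axpy on n entries, where n may exceed 2^31-1.
   The classic cuBLAS API takes 32-bit int lengths; the _64 variants
   added in CUDA 12 take int64_t and are a parallel set of symbols. */
static cublasStatus_t axpy_any_size(cublasHandle_t handle, int64_t n,
                                    const double *alpha,
                                    const double *x, double *y)
{
#if defined(CUBLAS_VERSION) && CUBLAS_VERSION >= 120000
  return cublasDaxpy_64(handle, n, alpha, x, 1, y, 1);   /* 64-bit counts */
#else
  if (n > INT32_MAX) return CUBLAS_STATUS_NOT_SUPPORTED; /* would overflow */
  return cublasDaxpy(handle, (int)n, alpha, x, 1, y, 1); /* 32-bit counts */
#endif
}

The annoying part is exactly that: since the 64-bit functions are separate symbols rather than a widening of the existing ones, every call site needs either a wrapper like this or a duplicated code path.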
More generally, it would be interesting to know the breakdown of installed CUDA versions among users. Unlike compilers etc., I suspect that cluster admins (and those running on local machines) are much more likely to update their CUDA toolkits to the latest versions, since new releases often contain critical performance improvements. That would help us decide on the minimum version to support. We don't have any real idea of the current minimum version; last time it was estimated to be CUDA 7 IIRC?

Best regards,

Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)

> On Aug 11, 2023, at 15:38, Jed Brown <j...@jedbrown.org> wrote:
>
> Rohan Yadav <roh...@alumni.cmu.edu> writes:
>
>> With modern GPU sizes, for example A100's with 80GB of memory, a vector of
>> length 2^31 is not that much memory -- one could conceivably run a CG solve
>> with local vectors > 2^31.
>
> Yeah, each vector would be 8 GB (single precision) or 16 GB (double). You
> can't store a matrix of this size, and probably not a "mesh", but it's
> possible to create such a problem if everything is matrix-free (possibly with
> matrix-free geometric multigrid). This is more likely to show up in a
> benchmark than any real science or engineering problem. We should support it,
> but it still seems hypothetical and not urgent.
>
>> Thanks Junchao, I might look into that. However, I currently am not trying
>> to solve such a large problem -- these questions just came from wondering
>> why the cuSPARSE kernel PETSc was calling was running faster than mine.
>
> Hah, bandwidth doesn't lie. ;-)
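For anyone skimming the thread, the sizes quoted above check out; a quick back-of-the-envelope in plain C (nothing PETSc-specific, just the arithmetic):

#include <stdio.h>
#include <stdint.h>

/* A vector of 2^31 entries: 2^31 * 4 bytes = 8 GiB in single precision,
   2^31 * 8 bytes = 16 GiB in double, and the length itself no longer
   fits in a 32-bit int. */
int main(void)
{
  const int64_t n = INT64_C(1) << 31;
  printf("single precision: %.1f GiB\n", (double)(n * (int64_t)sizeof(float))  / (1 << 30));
  printf("double precision: %.1f GiB\n", (double)(n * (int64_t)sizeof(double)) / (1 << 30));
  printf("fits in 32-bit int: %s\n", n <= INT32_MAX ? "yes" : "no");
  return 0;
}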