>>>>> Ben Bolker
>>>>>     on Wed, 9 Jun 2021 21:11:18 -0400 writes:

> Nice!

Indeed -- and thanks a lot, Dario (and Martin Morgan!) for getting down to the root problem.

So, indeed a bug in Matrix (though "far away" from 'dgTMatrix').

Thank you once more!
Martin Maechler

> On 6/9/21 9:00 PM, Dario Strbenac via R-devel wrote:
>> Good day,
>>
>> Thanks to handy hints from Martin Morgan, I ran R under gdb and checked for any numeric overflow. We pinpointed the cause:
>>
>> (gdb) info locals
>> i = 0
>> j = 10738
>> m = 200000
>> n = 50000
>> ans = 0x55555b332790
>> aa = 0x55555b3327c0
>>
>> There is a line of C code in dgeMatrix.c
>>
>>     for (i = 0; i < m; i++) aa[i] += xx[i + j * m];
>>
>> i + j * m are all int, and overflow
>>
>> (lldb) print 0 + 10738 * 200000
>> (int) $5 = -2147367296
>>
>> So, either the code should check that this doesn't occur, or be adjusted to allow for large indexes.
>>
>> If anyone is interested, this is in the context of single-cell ATAC-seq data, which typically has about 200000 genomic regions (rows) and perhaps 100000 biological cells (columns).
>>
>> --------------------------------------
>> Dario Strbenac
>> University of Sydney
>> Camperdown NSW 2050
>> Australia
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
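
For illustration only, here is a minimal C sketch of the two options Dario mentions: checking that the offset i + j * m fits in an int, or computing it in a wider type. This is not the actual Matrix code or its eventual fix; the function names offset_fits_int() and row_sums() are hypothetical, and the use of size_t for the column offset is an assumption (R's own C code would more likely use R_xlen_t).

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the column-major offset i + j * m
 * is representable as an int, 0 otherwise. The arithmetic is done in
 * 64 bits, so the check itself cannot overflow. */
int offset_fits_int(int i, int j, int m)
{
    int64_t off = (int64_t) i + (int64_t) j * (int64_t) m;
    return off >= 0 && off <= INT_MAX;
}

/* Hypothetical helper: row sums of an m x n column-major matrix xx,
 * written into aa (length m). The column offset j * m is computed in
 * size_t rather than int, so it stays valid when m * n exceeds INT_MAX
 * (assuming a 64-bit size_t). */
void row_sums(const double *xx, int m, int n, double *aa)
{
    for (int i = 0; i < m; i++)
        aa[i] = 0.0;

    for (int j = 0; j < n; j++) {
        /* Widen before multiplying; j * m in int is what overflowed. */
        const double *col = xx + (size_t) j * (size_t) m;
        for (int i = 0; i < m; i++)
            aa[i] += col[i];
    }
}
```

With the values from the debugger session above (i = 0, j = 10738, m = 200000), the true offset is 2147600000, which is larger than INT_MAX (2147483647) on platforms with 32-bit int, so offset_fits_int() reports 0; for j = 10737 the offset is 2147400000 and still fits.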