Nice!

On 6/9/21 9:00 PM, Dario Strbenac via R-devel wrote:
Good day,

Thanks to handy hints from Martin Morgan, I ran R under gdb and checked for any 
numeric overflow. We pinpointed the cause:

(gdb) info locals
i = 0
j = 10738
m = 200000
n = 50000
ans = 0x55555b332790
aa = 0x55555b3327c0

There is a line of C code in dgeMatrix.c:

for (i = 0; i < m; i++) aa[i] += xx[i + j * m];

i, j and m are all int, so the index i + j * m is computed in int and overflows:
(lldb) print 0 + 10738 * 200000
(int) $5 = -2147367296

So, either the code should check that this doesn't occur, or be adjusted to 
allow for large indexes.
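
To illustrate both options, here is a minimal sketch in C. It is not the code in dgeMatrix.c. The function names index_overflows_int and row_sums_wide are made up for this example, and the sketch assumes a platform where int is 32 bits and size_t is 64 bits. The first function does the check in 64-bit arithmetic so that the check itself cannot overflow. The second computes the column offset in size_t before adding i, so the index never passes through int.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if i + j * m does not fit in an int,
   0 otherwise. All arithmetic is done in int64_t, so no signed
   overflow can occur while checking. */
static int index_overflows_int(int i, int j, int m)
{
    int64_t idx = (int64_t) i + (int64_t) j * (int64_t) m;
    return idx > INT_MAX || idx < INT_MIN;
}

/* Hypothetical replacement loop: row sums of a column-major m x n
   matrix xx, written into aa[0..m-1]. The offset j * m is computed
   in size_t (assumed 64-bit here), so it stays valid past 2^31 - 1. */
static void row_sums_wide(const double *xx, int m, int n, double *aa)
{
    assert(m >= 0 && n >= 0);
    for (int i = 0; i < m; i++)
        aa[i] = 0.0;
    for (int j = 0; j < n; j++) {
        size_t offset = (size_t) j * (size_t) m;  /* widened before multiplying */
        for (int i = 0; i < m; i++)
            aa[i] += xx[offset + (size_t) i];
    }
}
```

With the values from the debugger session, index_overflows_int(0, 10738, 200000) reports an overflow, while j = 10737 is still in range. Inside R itself the index type would more likely be R_xlen_t than size_t, but that choice is left open here.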

If anyone is interested, this is in the context of single-cell ATAC-seq data, 
which typically has about 200000 genomic regions (rows) and perhaps 100000 
biological cells (columns).

--------------------------------------
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

