Currently, the class for dense symmetric matrices in packed storage, "dspMatrix", inherits its subset (i.e., `[`) methods from the "Matrix" class. As a result, subsetting "dspMatrix" requires coercion to "matrix". If memory cannot be allocated for this "matrix", then an error results.

n <- 30000L
x <- as.double(seq_len(choose(n + 1L, 2L)))
S <- new("dspMatrix", uplo = "L", x = x, Dim = c(n, n))

S[, 1L]
# Error: vector memory exhausted (limit reached?)

traceback()
# 10: .dsy2mat(dsp2dsy(from))
# 9: asMethod(object)
# 8: as(x, "matrix")
# 7: x[, j = j, drop = TRUE]
# 6: x[, j = j, drop = TRUE]
# 5: eval(call, parent.frame())
# 4: eval(call, parent.frame())
# 3: callGeneric(x, , j = j, drop = TRUE)
# 2: S[, 1L]
# 1: S[, 1L]

This seems entirely avoidable, given that there is a relatively simple formula for converting 2-ary indices [i,j] of S to 1-ary indices k of S[lower.tri(S, TRUE)]:

k <- i + round(0.5 * (2L * n - j) * (j - 1L)) # for i >= j

Has the implementation of `[` for class "dspMatrix" been discussed already elsewhere? If not, shall I report it to Bugzilla as a wishlist item?

FWIW, I encountered this problem while trying to coerce a large "dist" object to "dspMatrix", expecting that I would be able to safely subset the result:

setAs(from = "dist", to = "dspMatrix", function(from) {
  p <- length(from)
  if (p > 0L) {
    n <- as.integer(round(0.5 * (1 + sqrt(1 + 8 * p)))) # p = n*(n-1)/2
    x <- double(p + n)
    i <- 1L + c(0L, cumsum(n:2L))
    x[-i] <- from
  } else {
    n <- 1L
    x <- 0
  }
  new("dspMatrix", uplo = "L", x = x, Dim = c(n, n))
})

But that attempt failed for a entirely different reason. For large enough n, I would hit a memory limit at the line:

x[-i] <- from

So I guess my second question is: what is problematic about this benign-seeming subset-assignment?

Mikael

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply via email to