On 1 April 2013 at 17:04, Ramon Diaz-Uriarte wrote:
|
| On Mon, 1 Apr 2013 08:15:48 -0500, Dirk Eddelbuettel <e...@debian.org> wrote:
|
| > On 1 April 2013 at 14:48, Ramon Diaz-Uriarte wrote:
| > |
| > | Dear All,
| > |
| > | I am confused about creating Rcpp Numeric Matrices larger than
| > | .Machine$integer.max. The code below illustrates some of the points
| > | (probably with too much detail ;-). These are some things that puzzle me:
|
| > Which R version did you use?
|
| Ooops, sorry.
|
| > version
|                _
| platform       x86_64-pc-linux-gnu
| arch           x86_64
| os             linux-gnu
| system         x86_64, linux-gnu
| status         Patched
| major          2
| minor          15.3

I think you can't really expect this to work. R, up to this version, has the
very famous 2^31 - 1 index limit.

| year           2013
| month          03
| day            03
| svn rev        62150
| language       R
| version.string R version 2.15.3 Patched (2013-03-03 r62150)
| nickname       Security Blanket
|
| > Does what you attempt work _in straight C code
| > bypassing Rcpp_ ?
|
| In straight C++, using std::vector, this works (though not, as I tried it,
| in naive straight C, as shown in the comments). It will use ~ 35 GB of
| memory:

Sure, but "does not matter" as it is outside of R. In R, you can do this _if
you go the route of outside memory management_ as e.g. bigmemory and ff do.

| #include <iostream>
| #include <vector>
| #include <iterator>
|
| int main() {
|
|   // double v1[500000L * 9000L]; // this segfaults
|   // double v1[4300000000]; // this segfaults
|
|   std::vector<double> v2(500000L * 9000L);
|   std::cout << " Max size v2: " << v2.max_size() << std::endl;
|   std::cout << " Current size v2: " << v2.size() << std::endl;
|
|   double tt = 0;
|   for(size_t t = 0; t < v2.size(); ++t)
|     v2[t] = ++tt;
|   std::cout << "\n Assigned to vector" << std::endl;
|   std::cout << "\n Last value is " << v2[(500000L * 9000L) - 1] << std::endl;
|   return 0;
| }
|
| Anyway, I guess the example is not really relevant for this case.

Agreed.

| > If you used R 2.*, then the attempt makes little sense AFAICT.
|
| Sorry, I was not clear. I was not (consciously) _attempting_ to do
| that. In my "for real" code the dimensions of the object are set almost at
| the end of a long simulation and in a few cases those numbers were much
| larger than I expected (I did not realize how big until I started looking
| into the segfaults and the errors).

I understand. But I think you should consider writing some sort of "reducers"
so that you do not have to swallow that whole object.

| What I found confusing was the segmentation fault, because the behavior
| seems inconsistent.
| Sometimes there was no segfault because the error
| ("negative length vectors are not allowed (...)") was triggered. But
| sometimes the object seemed to have been created (and thus I assumed sizes
| were OK ---yes, before looking at the actual sizes) and then the segfault
| took place later.

<insert Oscar Wilde quote about consistency being ... just kidding>

I think we simply see an error condition for undefined behaviour.

Dirk

|
| R.
|
| > If you used R 3.0.0, then you may have noticed that R is ahead of us, and you
| > are welcome to help close the gap :)
|
| > Dirk
|
| > | 1. For some values of number of rows and columns, creating the matrix is
| > | not allowed, with the message "negative length vectors are not allowed",
| > | but with other values the creation of the matrix proceeds without
| > | (apparent) troubles, even when the total size is >> 2^31 - 1.
| > |
| > | 1.a. Is this intended?
| > |
| > | 1.b. I understand the error message is coming from R (not Rcpp) and thus
| > | this is not something that can be made easier to understand?
| > |
| > | 2. The part I found confusing is that the same problem (number of cells >
| > | 2^31 - 1) is sometimes caught at object creation, but sometimes manifests
| > | itself much later (either in the C++ code or later in R).
| > |
| > | I was expecting (maybe the problem is my expectations) an error early on,
| > | when creating the matrix; if the creation proceeds without trouble, I was
| > | not expecting a segfault (as I think all cells are initialized to zero).
| > |
| > | Is the recommended procedure to check if the product of dimensions is <
| > | 2^31 - 1 before creation? (But then, this will change in R-3.0 in 64 bit
| > | systems?).
| > |
| > | Best,
| > |
| > | R.
| > |
| > | // Beginning of file max-size.cpp
| > |
| > | #include <Rcpp.h>
| > |
| > | using namespace Rcpp;
| > |
| > | // [[Rcpp::export]]
| > | NumericMatrix f1(IntegerVector nr, IntegerVector nc,
| > |                  IntegerVector sf = 0) {
| > |   int nrow = as<int>(nr);
| > |   int ncol = as<int>(nc);
| > |   int segf = as<int>(sf);
| > |
| > |   NumericMatrix outM(nrow, ncol);
| > |   std::cout << " After creating outM" << std::endl;
| > |   outM(nrow - 1, 0) = 1;
| > |   std::cout << " After assigning to last row, first column"
| > |             << std::endl;
| > |
| > |   std::cout << " Some other value: 1, 0: "
| > |             << outM(1, 0) << std::endl;
| > |
| > |   if( (nrow > 1) && (ncol > 3) )
| > |     std::cout << " Some other value: nrow - 1, ncol - 3: "
| > |               << outM(nrow - 1, ncol - 3) << std::endl;
| > |
| > |   outM(nrow - 1, ncol - 1) = 1;
| > |   std::cout << " After assigning something to last cell"
| > |             << std::endl;
| > |
| > |   std::cout << " Try to return the last assignment: "
| > |             << outM(nrow - 1, ncol - 1) << std::endl;
| > |
| > |   if((nrow >= 500000) && segf) {
| > |     std::cout << "\n Assign a few around/beyond 2^31 - 1. Should segfault\n";
| > |     for(int i = 4290; i < 4300; ++i) {
| > |       std::cout << " i = " << i << std::endl;
| > |       outM(nrow - 1, i) = 0;
| > |     }
| > |   }
| > |
| > |   return wrap(outM);
| > | }
| > |
| > | // End of file max-size.cpp
| > |
| > |
| > | ################################################
| > | library(Rcpp)
| > | sourceCpp("max-size.cpp", verbose = TRUE)
| > |
| > | (tmp <- f1(4, 5))
| > |
| > | 4294967 * 500 > .Machine$integer.max
| > | tmp <- f1(4294967, 500)
| > | object.size(tmp)/(4294967 * 500) ## ~ 8
| > |
| > | 4294967 * 501 > .Machine$integer.max
| > | tmp <- f1(4294967, 501) ## negative length vectors
| > |
| > | 500000 * 9000 > .Machine$integer.max
| > | tmp <- f1(500000, 9000) ## sometimes segfaults
| > | tmp[500000, 9000]
| > | object.size(tmp) ## things are missing
| > | prod(dim(tmp)) > .Machine$integer.max
| > |
| > | ## using either of these usually leads to segfault
| > |
| > | for(i in (4290:4300)) print(tmp[500000, i])
| > |
| > | f1(500000, 9000, 1)
| > |
| > | #####################################################
| > |
| > | --
| > | Ramon Diaz-Uriarte
| > | Department of Biochemistry, Lab B-25
| > | Facultad de Medicina
| > | Universidad Autónoma de Madrid
| > | Arzobispo Morcillo, 4
| > | 28029 Madrid
| > | Spain
| > |
| > | Phone: +34-91-497-2412
| > |
| > | Email: rdia...@gmail.com
| > |        ramon.d...@iib.uam.es
| > |
| > | http://ligarto.org/rdiaz
| > |
| > | _______________________________________________
| > | Rcpp-devel mailing list
| > | Rcpp-devel@lists.r-forge.r-project.org
| > | https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
| > --
| > Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com
| --
| Ramon Diaz-Uriarte
| Department of Biochemistry, Lab B-25
| Facultad de Medicina
| Universidad Autónoma de Madrid
| Arzobispo Morcillo, 4
| 28029 Madrid
| Spain
|
| Phone: +34-91-497-2412
|
| Email: rdia...@gmail.com
|        ramon.d...@iib.uam.es
|
| http://ligarto.org/rdiaz
|

--
Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com
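
On the question raised in the thread (whether to check the product of the dimensions against 2^31 - 1 before creating the matrix), a minimal sketch of such a check in plain C++ might look like the following. The helper names fits_in_int_length and wrapped_product are hypothetical and are not part of Rcpp or R. The second function only illustrates what a 32-bit signed product would wrap to; that the matrix length was actually computed in a 32-bit int is an assumption here, not something the thread confirms.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical helper (not part of Rcpp or R): returns true when an
// nrow x ncol matrix has at most INT_MAX (2^31 - 1) cells.
bool fits_in_int_length(int nrow, int ncol) {
    if (nrow < 0 || ncol < 0) return false;
    // Widen to 64 bits before multiplying so the product cannot overflow.
    const std::int64_t cells = static_cast<std::int64_t>(nrow) * ncol;
    return cells <= static_cast<std::int64_t>(INT_MAX);
}

// Illustration only: the value a 32-bit signed product would wrap to.
// The multiplication is done in unsigned 64-bit arithmetic to avoid
// signed-overflow undefined behaviour; the final narrowing conversion is
// modular on mainstream compilers (and guaranteed to be so since C++20).
std::int32_t wrapped_product(int nrow, int ncol) {
    assert(nrow >= 0 && ncol >= 0);
    const std::uint64_t cells =
        static_cast<std::uint64_t>(nrow) * static_cast<std::uint64_t>(ncol);
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(cells));
}
```

Applied to the three calls in the R script: 4294967 x 500 fits; 4294967 x 501 does not fit and would wrap to a negative value, which would be consistent with the "negative length vectors" error; 500000 x 9000 does not fit either, but would wrap to the positive value 205032704, which would be consistent with a matrix that seems to be created and only fails later. In an Rcpp function, a check like fits_in_int_length(nrow, ncol) could be placed before the NumericMatrix is constructed, raising an error when it returns false.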