Re: [Rcpp-devel] clang does not like lists with a DataFrame inside?
On Tue, 28-November-2017, at 00:24:33, Dirk Eddelbuettel <e...@debian.org> wrote: > On 28 November 2017 at 00:03, Ramon Diaz-Uriarte wrote: > | Hi Dirk, > | > | > | On Mon, 27-November-2017, at 14:14:09, Dirk Eddelbuettel <e...@debian.org> > wrote: > | > Ramon, > | > > | > On 27 November 2017 at 10:43, Ramon Diaz-Uriarte wrote: > | > | Dear All, > | > | > | > | I am trying to pass to C++ an R list that contains a data frame. Using > gcc > | > | the following > | > | > | > | # > | > | > | > | #include > | > | using namespace Rcpp; > | > | > | > | // [[Rcpp::export]] > | > | void withDF(Rcpp::List bl) { > | > | Rcpp::DataFrame df1 = bl["the_df"]; > | > > | > I think every existing example uses a template when extracting items from > a > | > list as we cannot know at compile time which run-time SEXP we will be > handed. > | > | Aha, understood. I did not think much about it, tried with gcc, it worked, > | and then was surprised to see it not work with clang. > > These slight differences annoyed me at the beginning, but the advantage of > getting older is that you just get generally more forgetful. "Just assume > nothing works unless you are explicit about it" seems to be my current motto > :) I am going to store this safely to remind myself of the (many) advantages of no longer being so young ;-) Best, R. > > as<>() and wrap() are pretty magic. We get by with fewer very explicit > statement than we used to years ago, but some remain, notably > > -- return statements better not be compound > -- assignment from list or data.frame by name > -- [ your issue here, space for rent upon request ] > > [...] > > | > I don't think so. See eg > | > > | > > https://github.com/eddelbuettel/rcppexamples/blob/master/src/ListExample.cpp > | > | Thanks for that. Bookmarked for future reference :-) > > It's old and could be expanded but better than nothing. > > Dirk -- Ramon Diaz-Uriarte Department of Biochemistry, Lab B-25 Facultad de Medicina Universidad Autónoma de Madrid Arzobispo Morcillo, 4 28029 Madrid Spain Phone: +34-91-497-2412 Email: rdia...@gmail.com ramon.d...@iib.uam.es http://ligarto.org/rdiaz ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
Re: [Rcpp-devel] clang does not like lists with a DataFrame inside?
Hi Dirk, On Mon, 27-November-2017, at 14:14:09, Dirk Eddelbuettel <e...@debian.org> wrote: > Ramon, > > On 27 November 2017 at 10:43, Ramon Diaz-Uriarte wrote: > | Dear All, > | > | I am trying to pass to C++ an R list that contains a data frame. Using gcc > | the following > | > | # > | > | #include > | using namespace Rcpp; > | > | // [[Rcpp::export]] > | void withDF(Rcpp::List bl) { > | Rcpp::DataFrame df1 = bl["the_df"]; > > I think every existing example uses a template when extracting items from a > list as we cannot know at compile time which run-time SEXP we will be handed. Aha, understood. I did not think much about it, tried with gcc, it worked, and then was surprised to see it not work with clang. > > | Rcpp::Rcout << " rows are " << df1.size() << std::endl; > | } > | > | > | > | works fine. > | > | Using clang (version 3.8 in a system with Debian, but I think same issues > | in Mac with clang 4) I get > | > | > | f5.cpp:12:19: error: conversion from 'NameProxy' (aka > 'generic_name_proxy<19>') to 'Rcpp::DataFrame' (aka > | 'DataFrame_Impl') is ambiguous > | Rcpp::DataFrame df1 = bl["the_df"]; > | ^ > | > /home/ramon/Sources/R-3.5.0-73698/library/Rcpp/include/Rcpp/vector/proxy.h:165:3: > note: candidate function [with T = > | Rcpp::DataFrame_Impl] > | operator T() const { > | ^ > | > /home/ramon/Sources/R-3.5.0-73698/library/Rcpp/include/Rcpp/DataFrame.h:51:9: > note: candidate constructor [with T = > | Rcpp::internal::generic_name_proxy<19>] > | DataFrame_Impl( const T& obj ) ; > | ^ > | 1 error generated. > | /home/ramon/Sources/R-3.5.0-73698/etc/Makeconf:166: recipe for target > 'f5.o' failed > | make: *** [f5.o] Error 1 > | Error in sourceCpp("f5.cpp", verbose = TRUE, rebuild = TRUE) : > | Error 1 occurred building shared library. > | > | > | > | Note that the following, treating the data frame as a list, works fine with > | both gcc and clang > | > | > | #include > | using namespace Rcpp; > | > | // [[Rcpp::export]] > | void withList(Rcpp::List bl) { > | Rcpp::List df1 = bl["the_df"]; > | Rcpp::Rcout << " size is " << df1.size() << std::endl; > | Rcpp::IntegerVector c1 = df1["first"]; > | Rcpp::Rcout << " rows are " << c1.size() << std::endl; > | } > | > | > | Is this bug? > > I don't think so. See eg > > https://github.com/eddelbuettel/rcppexamples/blob/master/src/ListExample.cpp Thanks for that. Bookmarked for future reference :-) Best, R. > > Dirk > > | > | Best, > | > | > | R. > | > | -- > | Ramon Diaz-Uriarte > | Department of Biochemistry, Lab B-25 > | Facultad de Medicina > | Universidad Autónoma de Madrid > | Arzobispo Morcillo, 4 > | 28029 Madrid > | Spain > | > | Phone: +34-91-497-2412 > | > | Email: rdia...@gmail.com > |ramon.d...@iib.uam.es > | > | http://ligarto.org/rdiaz > | ___ > | Rcpp-devel mailing list > | Rcpp-devel@lists.r-forge.r-project.org > | https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel -- Ramon Diaz-Uriarte Department of Biochemistry, Lab B-25 Facultad de Medicina Universidad Autónoma de Madrid Arzobispo Morcillo, 4 28029 Madrid Spain Phone: +34-91-497-2412 Email: rdia...@gmail.com ramon.d...@iib.uam.es http://ligarto.org/rdiaz ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
[Rcpp-devel] clang does not like lists with a DataFrame inside?
Dear All, I am trying to pass to C++ an R list that contains a data frame. Using gcc the following # #include using namespace Rcpp; // [[Rcpp::export]] void withDF(Rcpp::List bl) { Rcpp::DataFrame df1 = bl["the_df"]; Rcpp::Rcout << " rows are " << df1.size() << std::endl; } works fine. Using clang (version 3.8 in a system with Debian, but I think same issues in Mac with clang 4) I get f5.cpp:12:19: error: conversion from 'NameProxy' (aka 'generic_name_proxy<19>') to 'Rcpp::DataFrame' (aka 'DataFrame_Impl') is ambiguous Rcpp::DataFrame df1 = bl["the_df"]; ^ /home/ramon/Sources/R-3.5.0-73698/library/Rcpp/include/Rcpp/vector/proxy.h:165:3: note: candidate function [with T = Rcpp::DataFrame_Impl] operator T() const { ^ /home/ramon/Sources/R-3.5.0-73698/library/Rcpp/include/Rcpp/DataFrame.h:51:9: note: candidate constructor [with T = Rcpp::internal::generic_name_proxy<19>] DataFrame_Impl( const T& obj ) ; ^ 1 error generated. /home/ramon/Sources/R-3.5.0-73698/etc/Makeconf:166: recipe for target 'f5.o' failed make: *** [f5.o] Error 1 Error in sourceCpp("f5.cpp", verbose = TRUE, rebuild = TRUE) : Error 1 occurred building shared library. Note that the following, treating the data frame as a list, works fine with both gcc and clang #include using namespace Rcpp; // [[Rcpp::export]] void withList(Rcpp::List bl) { Rcpp::List df1 = bl["the_df"]; Rcpp::Rcout << " size is " << df1.size() << std::endl; Rcpp::IntegerVector c1 = df1["first"]; Rcpp::Rcout << " rows are " << c1.size() << std::endl; } Is this bug? Best, R. -- Ramon Diaz-Uriarte Department of Biochemistry, Lab B-25 Facultad de Medicina Universidad Autónoma de Madrid Arzobispo Morcillo, 4 28029 Madrid Spain Phone: +34-91-497-2412 Email: rdia...@gmail.com ramon.d...@iib.uam.es http://ligarto.org/rdiaz ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
Re: [Rcpp-devel] mixing R's and C++'s RNGs and distributions
On Wed, 24-06-2015, at 15:22, Matt D. mat...@gmail.com wrote: On 6/24/2015 15:07, Ramon Diaz-Uriarte wrote: Hi Matt, Thanks a lot for the details and the work. That is great! There is a problem, though: in my particular case, I am uploading my package to BioConcutor, and there the compiler for Win is 4.6.3 so I am restricted to that. Including randutils will lead to an error during building the package in Windows. Yeah, in this case I think replacing the source of entropy with something else may be the compromise choice (it's a range eventually passed to `mix_entropy`, so I'd investigate the effects on that). That all being in the meantime / while waiting for the toolchain to catch up, of course... But I wonder if it is worth the effort. Best, Matt Best, R. On Wed, 24-06-2015, at 14:55, Matt D. mat...@gmail.com wrote: On 6/22/2015 12:31, Ramon Diaz-Uriarte wrote: Actually, I just noticed that things will not work if you need your package to run on Windoze: Rtools uses gcc 4.6.3 there, and this will not work with gcc 4.6 (neither in Linux nor Windows) with flag -std=c++0x or -std=gnu++0x. I guess this should be fixable, but I do not know enough to do it. Hi again! I've just tried with the work-in-progress, _experimental_ version available at the following location: https://rawgit.com/kevinushey/RToolsToolchainUpdate/master/mingwnotes.html // In particular, I've used Windows native compiler for 64 bit Windows output mingw32mingw64_gcc-4.9.2.toolchain.tar.gz. This works better -- the only missing part is thread support :-( However, it is not exactly essential here, in that it's used solely in one place -- to get some extra entropy; that's all. After temporarily removing dependence on `std::this_thread::get_id()`, the example -- available at http://www.pcg-random.org/posts/ease-of-use-without-loss-of-power.html -- compiles and runs successfully. Incidentally, one can get threads support for MinGW using MSYS2, which gives: Thread model: posix gcc version 4.9.2 (Rev5, Built by MSYS2 project) However, the MinGW that comes with Rtools uses the following: Thread model: win32 gcc version 4.9.2 (GCC) I presume there must be a reason for that. There are certainly trade-offs present: https://wiki.qt.io/MinGW-64-bit#GCC_Threading_model_.28posix_vs_win32.29 What hits us here is the no C++11 thread, mutex, or future (C11 appears to be a typo) part for the win32 choice. // See also: https://stackoverflow.com/questions/17242516/mingw-w64-threads-posix-vs-win32 Note that there's nothing multithreading-specific about the library, so even though I've also tested it with https://github.com/meganz/mingw-std-threads (warning: just something I've ran across while searching for multithreading support for MinGW built w/ win32 threading model, quality and license unknown) and also made it compile work after a small patch (adding the std::hash specialization), it's probably possible to use another source of entropy here. I presume another idea would be to use a different GCC version, but that's rather tedious -- https://stackoverflow.com/questions/25455829/using-a-different-gcc-version-from-that-included-with-rtools-with-rcpp-on-window Other than the above, I'm wondering myself what's the official recommendation for the C++11 threading support w/ Rcpp on Windows. Best, Matt -- Ramon Diaz-Uriarte Department of Biochemistry, Lab B-25 Facultad de Medicina Universidad Autónoma de Madrid Arzobispo Morcillo, 4 28029 Madrid Spain Phone: +34-91-497-2412 Email: rdia...@gmail.com ramon.d...@iib.uam.es http://ligarto.org/rdiaz ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
Re: [Rcpp-devel] bool from list: unexpected result and differences between compilers/OSs
Hi Dirk, On Mon, 22-06-2015, at 13:21, Dirk Eddelbuettel e...@debian.org wrote: On 22 June 2015 at 10:25, Ramon Diaz-Uriarte wrote: | | Dear All, | | Accessing an element of a list that should be TRUE/FALSE does not recognize | the boolean properly in Windows/gcc-4.6.3 unless I use as. It does the | same surprising thing in Debian Linux with gcc-4.6.4. | | But it does do what I expect in Linux, with both gcc (4.9.2) and | clang++-libc++ (3.5.2-1). In Windows, I am using Rtools33.exe, the latest | Rtools.exe (downloaded last night). | | | | Given the above, is it the proper way to proceed to always use asbool? | Or, just being extra careful, should we aswhatever (double, int, etc)? R has TRUE, FALSE and NA in a logical -- three values. That can breaks Ooops, I had read (and even known) that in the past. automagic conversion to bool. So I think asbool is a safe bet here. In general, in does not hurt to help the compiler -- but you don't always have Aha. I'll start being more explicit. Still, are the differences between compilers expected/known? Best, R. to do it. Also, for bool, you can test explicitly for FALSE (or TRUE, or NA) though. Dirk -- Ramon Diaz-Uriarte Department of Biochemistry, Lab B-25 Facultad de Medicina Universidad Autónoma de Madrid Arzobispo Morcillo, 4 28029 Madrid Spain Phone: +34-91-497-2412 Email: rdia...@gmail.com ramon.d...@iib.uam.es http://ligarto.org/rdiaz ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
[Rcpp-devel] bool from list: unexpected result and differences between compilers/OSs
Dear All, Accessing an element of a list that should be TRUE/FALSE does not recognize the boolean properly in Windows/gcc-4.6.3 unless I use as. It does the same surprising thing in Debian Linux with gcc-4.6.4. But it does do what I expect in Linux, with both gcc (4.9.2) and clang++-libc++ (3.5.2-1). In Windows, I am using Rtools33.exe, the latest Rtools.exe (downloaded last night). Given the above, is it the proper way to proceed to always use asbool? Or, just being extra careful, should we aswhatever (double, int, etc)? Best, R. / #include Rcpp.h using namespace Rcpp; // [[Rcpp::export]] bool readBool(bool boolR) { bool boolC = boolR; Rcpp::Rcout boolC boolC std::endl; return boolC; } // Does not do what I expect in windows // [[Rcpp::export]] bool readList(Rcpp::List theList) { bool boolC = theList[theBool]; Rcpp::Rcout boolC boolC std::endl; return boolC; } // The following works OK in the three systems // [[Rcpp::export]] bool readList2(Rcpp::List theList) { bool boolC = asbool(theList[theBool]); Rcpp::Rcout boolC boolC std::endl; return boolC; } /***R readBool(TRUE) readBool(FALSE) lTrue - list(a = 99, theBool = TRUE) lFalse - list(a = 99, theBool = FALSE) readList(lTrue) == lTrue[[theBool]] readList(lFalse) == lFalse[[theBool]] ## WRONG! should be 0 readList2(lTrue) == lTrue[[theBool]] readList2(lFalse) == lFalse[[theBool]] ## OK now: should be 0 */ -- Ramon Diaz-Uriarte Department of Biochemistry, Lab B-25 Facultad de Medicina Universidad Autónoma de Madrid Arzobispo Morcillo, 4 28029 Madrid Spain Phone: +34-91-497-2412 Email: rdia...@gmail.com ramon.d...@iib.uam.es http://ligarto.org/rdiaz ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
Re: [Rcpp-devel] mixing R's and C++'s RNGs and distributions
As a follow up to the thread below, using randutils has turned out to be very simple (and provides some additional nice features, such as pick, etc). Directly using it in an R package makes R CMD check complain about the usage of the Exit function (only its address is used, not the function itself). That can be solved by changing the line auto exit_func = hash(_Exit); by, say auto getenv_func = hash(getenv); and making the corresponding change a little bit further below. Best, R. On Fri, 19-06-2015, at 00:13, Matt D. mat...@gmail.com wrote: On 6/18/2015 14:34, Ramon Diaz-Uriarte wrote: Dear All, Sometimes I use both R's and C++11's RNGs and distributions in the same code base (I am not using OpenMP or similar). Although this might not be very elegant, I find it convenient (use C++'s or R's, depending on which one fits my problem better ---in particular, many distributions are not readily available from C++). In C++ I tend to use std::mt19937, often with a seed generated in R as seed - as.integer(round(runif(1, min = 0, max = 2^16))) and passed to the C++ code. Seeding the Mersenne Twister PRNG can a bit subtle: I've ran across a series of posts a while ago that have brought up a couple of issues I haven't considered before -- perhaps you'll also find them of interest. The discussion threads (involving the author) may also be worth a look. (There seems to be a connectivity issue with www.pcg-random.org at the moment, so also providing the archive.is URLs to be on the safe side.) http://www.pcg-random.org/posts/cpp-seeding-surprises.html // http://archive.is/http://www.pcg-random.org/posts/cpp-seeding-surprises.html Discussions: http://www.reddit.com/r/cpp/comments/32u4m7/the_behavior_of_rng_seeding_in_c_may_surprise/ http://www.reddit.com/r/programming/comments/32uo1p/c_seeding_surprises/ https://www.pcg-random.org/posts/developing-a-seed_seq-alternative.html // http://archive.is/http://www.pcg-random.org/posts/developing-a-seed_seq-alternative.html http://www.pcg-random.org/posts/simple-portable-cpp-seed-entropy.html http://archive.is/http://www.pcg-random.org/posts/simple-portable-cpp-seed-entropy.html http://www.pcg-random.org/posts/ease-of-use-without-loss-of-power.html // http://archive.is/http://www.pcg-random.org/posts/ease-of-use-without-loss-of-power.html In particular, `randutils::auto_seed_128` and `randutils::auto_seed_256` from the last post may be a choice worth consideration: https://gist.github.com/imneme/540829265469e673d045#file-randutils-hpp Discussion: http://www.reddit.com/r/cpp/comments/34yqxa/announcing_randutils_a_single_portable/ One other thing useful to know (in the context of, say, reproducibility) is that (in contrast with the PRNGs themselves) the distributions' algorithms are not mandated, so implementations can vary: http://www.reddit.com/r/cpp/comments/30w7cs/inconsistency_in_c_random/ It is my understanding that similar ideas (seeding C++'s RNG from R and combining C++ with R's RNG) have been used before in much more complex settings (e.g., http://lists.r-forge.r-project.org/pipermail/rcpp-devel/2014-April/007510.html and http://lists.r-forge.r-project.org/pipermail/rcpp-devel/2014-April/007512.html), but I wonder if there are problems I cannot think of. A silly example follows below. The links seems to be referring to a parallel computing context. Given that, I'd actually consider using Random123: http://www.thesalmons.org/john/random123/ https://github.com/DEShawResearch/Random123-Boost // a nice, brief overview of the advantages in the parallel computing context (but also potentially applicable elsewhere) Best, Matt ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel -- Ramon Diaz-Uriarte Department of Biochemistry, Lab B-25 Facultad de Medicina Universidad Autónoma de Madrid Arzobispo Morcillo, 4 28029 Madrid Spain Phone: +34-91-497-2412 Email: rdia...@gmail.com ramon.d...@iib.uam.es http://ligarto.org/rdiaz ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
Re: [Rcpp-devel] mixing R's and C++'s RNGs and distributions
Dear Matt, On Fri, 19-06-2015, at 00:13, Matt D. mat...@gmail.com wrote: On 6/18/2015 14:34, Ramon Diaz-Uriarte wrote: Dear All, Sometimes I use both R's and C++11's RNGs and distributions in the same code base (I am not using OpenMP or similar). Although this might not be very elegant, I find it convenient (use C++'s or R's, depending on which one fits my problem better ---in particular, many distributions are not readily available from C++). In C++ I tend to use std::mt19937, often with a seed generated in R as seed - as.integer(round(runif(1, min = 0, max = 2^16))) and passed to the C++ code. Seeding the Mersenne Twister PRNG can a bit subtle: I've ran across a series of posts a while ago that have brought up a couple of issues I haven't considered before -- perhaps you'll also find them of interest. Thanks a lot! Yes, certainly interesting, a lot. I felt embarrassed by my way of seeding the PRNG from R. I had seen randutils.hpp mentioned before, but had never payed any attention. After your links below (and thanks for the archive URLs ---last night those were the only ones I could access) I think I'll be incorporating it into my work. Not only because of the seeding, but also because of the additional facilities (choose, pick, etc). The discussion threads (involving the author) may also be worth a look. Yes, they were. Thanks! (There seems to be a connectivity issue with www.pcg-random.org at the moment, so also providing the archive.is URLs to be on the safe side.) http://www.pcg-random.org/posts/cpp-seeding-surprises.html // http://archive.is/http://www.pcg-random.org/posts/cpp-seeding-surprises.html Discussions: http://www.reddit.com/r/cpp/comments/32u4m7/the_behavior_of_rng_seeding_in_c_may_surprise/ http://www.reddit.com/r/programming/comments/32uo1p/c_seeding_surprises/ https://www.pcg-random.org/posts/developing-a-seed_seq-alternative.html // http://archive.is/http://www.pcg-random.org/posts/developing-a-seed_seq-alternative.html http://www.pcg-random.org/posts/simple-portable-cpp-seed-entropy.html http://archive.is/http://www.pcg-random.org/posts/simple-portable-cpp-seed-entropy.html http://www.pcg-random.org/posts/ease-of-use-without-loss-of-power.html // http://archive.is/http://www.pcg-random.org/posts/ease-of-use-without-loss-of-power.html In particular, `randutils::auto_seed_128` and `randutils::auto_seed_256` from the last post may be a choice worth consideration: https://gist.github.com/imneme/540829265469e673d045#file-randutils-hpp Discussion: http://www.reddit.com/r/cpp/comments/34yqxa/announcing_randutils_a_single_portable/ One other thing useful to know (in the context of, say, reproducibility) is that (in contrast with the PRNGs themselves) the distributions' algorithms are not mandated, so implementations can vary: http://www.reddit.com/r/cpp/comments/30w7cs/inconsistency_in_c_random/ I wasn't aware of this. It is my understanding that similar ideas (seeding C++'s RNG from R and combining C++ with R's RNG) have been used before in much more complex settings (e.g., http://lists.r-forge.r-project.org/pipermail/rcpp-devel/2014-April/007510.html and http://lists.r-forge.r-project.org/pipermail/rcpp-devel/2014-April/007512.html), but I wonder if there are problems I cannot think of. A silly example follows below. The links seems to be referring to a parallel computing context. Given Yes; I figured that if what I understood were similar approaches (mixing R's and C++s' PRNG and seeding from R) worked in more complex scenarios, they should work on mine. that, I'd actually consider using Random123: http://www.thesalmons.org/john/random123/ https://github.com/DEShawResearch/Random123-Boost // a nice, brief overview of the advantages in the parallel computing context (but also potentially applicable elsewhere) Nice; I'll remember this for if/when I need them in a parallel computing context. Thanks, R. Best, Matt ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel -- Ramon Diaz-Uriarte Department of Biochemistry, Lab B-25 Facultad de Medicina Universidad Autónoma de Madrid Arzobispo Morcillo, 4 28029 Madrid Spain Phone: +34-91-497-2412 Email: rdia...@gmail.com ramon.d...@iib.uam.es http://ligarto.org/rdiaz ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
[Rcpp-devel] routine registration
Dear All, Does it make sense to use C routine registration (http://cran.r-project.org/doc/manuals/r-release/R-exts.html#Registering-native-routines) in a package that has C++ code and uses Rcpp? What are the pros and cons? All of my code is C++. Writing the relevant code for routine registration is not hard (it is actually automagic if one uses https://github.com/kevinushey/Kmisc/blob/master/R/registerFunctions.R). But I wonder if it really makes sense to do this. I've googled around and I haven't been able to find much (maybe a problem of my google-fu). I've looked at packages in CRAN and BioConductor that import Rcpp and almost none, of those I've looked at, use registration (except for a few that use Kmisc's registerFunctions). Thanks, R. -- Ramon Diaz-Uriarte Department of Biochemistry, Lab B-25 Facultad de Medicina Universidad Autónoma de Madrid Arzobispo Morcillo, 4 28029 Madrid Spain Phone: +34-91-497-2412 Email: rdia...@gmail.com ramon.d...@iib.uam.es http://ligarto.org/rdiaz ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
Re: [Rcpp-devel] Largest size of a NumericMatrix, segfaults and error messages
On Mon, 1 Apr 2013 10:13:54 -0500,Dirk Eddelbuettel e...@debian.org wrote: On 1 April 2013 at 17:04, Ramon Diaz-Uriarte wrote: | | | | On Mon, 1 Apr 2013 08:15:48 -0500,Dirk Eddelbuettel e...@debian.org wrote: | | On 1 April 2013 at 14:48, Ramon Diaz-Uriarte wrote: | | | | Dear All, | | | | I am confused about creating Rcpp Numeric Matrices larger than | | .Machine$integer.max. The code below illustrates some of the points | | (probably with too much detail ;-). These are some things that puzzle me: | | Which R version did you use? | | Ooops, sorry. | | version |_ | platform x86_64-pc-linux-gnu | arch x86_64 | os linux-gnu | system x86_64, linux-gnu | status Patched | major 2 | minor 15.3 I think you can't really expect this to work. R, up to this version, has the very famous 2^31 - 1 index limit. | year 2013 | month 03 | day03 | svn rev62150 | language R | version.string R version 2.15.3 Patched (2013-03-03 r62150) | nickname Security Blanket | | | | Does what you attempt work _in straight C code | bypassing Rcpp_ ? | | In straight C++, using std::vector, this works (though not, as I tried it, | in naive straight C, as shown in the comments). It will use ~ 35 GB of | memory: Sure, but does not matter as it is outside of R. In R, you can do this _if you go the route of outside memory management_ as eg bigmemory and ff do. Thanks! However, for the current stuff I definitely want the output to stay well within the 2^32 limit. | #include iostream | #include vector | #include iterator | | int main() { | | // double v1[50L * 9000L]; // this segfaults | // double v1[43]; // this segfaults | | std::vectordouble v2(50L * 9000L); | std::cout Max size v2: v2.max_size() std::endl; | std::cout Current size v2: v2.size() std::endl; | | double tt = 0; | for(size_t t = 0; t v2.size(); ++t) | v2[t] = ++tt; | std::cout \n Assigned to vector std::endl; | std::cout \n Last value is v2[(50L * 9000L) - 1] std::endl; | return 0; | } | | Anyway, I guess the example is not really relevant for this case. Agreed. | If you used R 2.*, then the attempt makes little sense AFAICT. | | Sorry, I was not clear. I was not (consciously) _attempting_ to do | that. In my for real code the dimensions of the object are set almost at | the end of a long simulation and in a few cases those numbers were much | larger than I expected (I did not realize how big until I started looking | into the segfaults and the errors). I understand. But I think you should consider writing some sort of reducers to not require to swallow that whole object. Yes, agreed; that is what I'm trying now. | What I found confusing was the segmentation fault, because the behavior | seems inconsistent. Sometimes there was no segfault because the error | (negative length vectors are not allowed (...)) was triggered. But | sometimes the object seemed to have been created (and thus I assumed sizes | were OK ---yes, before looking at the actual sizes) and then the segfault | took place later. insert Oscar Wilde quote about conistency being ... just kidding C++ is still way to big for me to try the imaginative route; for now, I'll stay inside the box ;-). R. I think we simply see an error condition for undefined behaviour. Dirk | | | | | R. | | | If you used R 3.0.0, then you may have noticed that R is ahead of us, and you | are welcome to help close the gap :) | | Dirk | | | | 1. For some values of number of rows and columns, creating the matrix is | | not allowed, with the message negative length vectors are not allowed, | | but with other values the creation of the matrix proceeds without | | (apparent) troubles, even when the total size is 2^31 - 1. | | | | 1.a. Is this intended? | | | | 1.b. I understand the error message is coming from R (not Rcpp) and thus | | this is not something that can be made easier to understand? | | | | | | 2. The part I found confusing is that the same problem (number of cells | | 2^32 - 1) is sometimes caught at object creation, but sometimes manifests | | itself much later (either in the C++ code or later in R). | | | | I
[Rcpp-devel] Largest size of a NumericMatrix, segfaults and error messages
Dear All, I am confused about creating Rcpp Numeric Matrices larger than .Machine$integer.max. The code below illustrates some of the points (probably with too much detail ;-). These are some things that puzzle me: 1. For some values of number of rows and columns, creating the matrix is not allowed, with the message negative length vectors are not allowed, but with other values the creation of the matrix proceeds without (apparent) troubles, even when the total size is 2^31 - 1. 1.a. Is this intended? 1.b. I understand the error message is coming from R (not Rcpp) and thus this is not something that can be made easier to understand? 2. The part I found confusing is that the same problem (number of cells 2^32 - 1) is sometimes caught at object creation, but sometimes manifests itself much later (either in the C++ code or later in R). I was expecting (maybe the problem are my expectations) an error early on, when creating the matrix; if the creation proceeds without trouble, I was not expecting a segfault (as I think all cells are initialized to cero). Is the recommended procedure to check if the product of dimensions is 2^31 - 1 before creation? (But then, this will change in R-3.0 in 64 bit systems?). Best, R. // Beginning of file max-size.cpp #include Rcpp.h using namespace Rcpp; // [[Rcpp::export]] NumericMatrix f1(IntegerVector nr, IntegerVector nc, IntegerVector sf = 0) { int nrow = asint(nr); int ncol = asint(nc); int segf = asint(sf); NumericMatrix outM(nrow, ncol); std::cout After creating outM std::endl; outM(nrow - 1, 0) = 1; std::cout After asigning to last row, first column std::endl; std::cout Some other value: 1, 0: outM(1, 0) std::endl; if( (nrow 1) (ncol 3) ) std::cout Some other value: nrow - 1, ncol - 3: outM(nrow - 1, ncol - 3) std::endl; outM(nrow - 1, ncol - 1) = 1; std::cout After asigning something to last cell std::endl; std::cout Try to return the last assignment: outM(nrow - 1, ncol - 1) std::endl; if((nrow = 50) segf) { std::cout \n Assign a few around/beyond 2^32 - 1. Should segfault\n; for(int i = 4290; i 4300; ++i) { std::cout i = i std::endl; outM(nrow - 1, i) = 0; } } return wrap(outM); } // End of file max-size.cpp library(Rcpp) sourceCpp(max-size.cpp, verbose = TRUE) (tmp - f1(4, 5)) 4294967 * 500 .Machine$integer.max tmp - f1(4294967, 500) object.size(tmp)/(4294967 * 500) ## ~ 8 4294967 * 501 .Machine$integer.max tmp - f1(4294967, 501) ## negative length vectors 50 * 9000 .Machine$integer.max tmp - f1(50, 9000) ## sometimes segfaults tmp[50, 9000] object.size(tmp) ## things are missing prod(dim(tmp)) .Machine$integer.max ## using either of these usually leads to segfault for(i in (4290:4300)) print(tmp[50, i]) f1(50, 9000, 1) # -- Ramon Diaz-Uriarte Department of Biochemistry, Lab B-25 Facultad de Medicina Universidad Autónoma de Madrid Arzobispo Morcillo, 4 28029 Madrid Spain Phone: +34-91-497-2412 Email: rdia...@gmail.com ramon.d...@iib.uam.es http://ligarto.org/rdiaz ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
Re: [Rcpp-devel] Largest size of a NumericMatrix, segfaults and error messages
On Mon, 1 Apr 2013 08:15:48 -0500,Dirk Eddelbuettel e...@debian.org wrote: On 1 April 2013 at 14:48, Ramon Diaz-Uriarte wrote: | | Dear All, | | I am confused about creating Rcpp Numeric Matrices larger than | .Machine$integer.max. The code below illustrates some of the points | (probably with too much detail ;-). These are some things that puzzle me: Which R version did you use? Ooops, sorry. version _ platform x86_64-pc-linux-gnu arch x86_64 os linux-gnu system x86_64, linux-gnu status Patched major 2 minor 15.3 year 2013 month 03 day03 svn rev62150 language R version.string R version 2.15.3 Patched (2013-03-03 r62150) nickname Security Blanket Does what you attempt work _in straight C code bypassing Rcpp_ ? In straight C++, using std::vector, this works (though not, as I tried it, in naive straight C, as shown in the comments). It will use ~ 35 GB of memory: #include iostream #include vector #include iterator int main() { // double v1[50L * 9000L]; // this segfaults // double v1[43]; // this segfaults std::vectordouble v2(50L * 9000L); std::cout Max size v2: v2.max_size() std::endl; std::cout Current size v2: v2.size() std::endl; double tt = 0; for(size_t t = 0; t v2.size(); ++t) v2[t] = ++tt; std::cout \n Assigned to vector std::endl; std::cout \n Last value is v2[(50L * 9000L) - 1] std::endl; return 0; } Anyway, I guess the example is not really relevant for this case. If you used R 2.*, then the attempt makes little sense AFAICT. Sorry, I was not clear. I was not (consciously) _attempting_ to do that. In my for real code the dimensions of the object are set almost at the end of a long simulation and in a few cases those numbers were much larger than I expected (I did not realize how big until I started looking into the segfaults and the errors). What I found confusing was the segmentation fault, because the behavior seems inconsistent. Sometimes there was no segfault because the error (negative length vectors are not allowed (...)) was triggered. But sometimes the object seemed to have been created (and thus I assumed sizes were OK ---yes, before looking at the actual sizes) and then the segfault took place later. R. If you used R 3.0.0, then you may have noticed that R is ahead of us, and you are welcome to help close the gap :) Dirk | 1. For some values of number of rows and columns, creating the matrix is | not allowed, with the message negative length vectors are not allowed, | but with other values the creation of the matrix proceeds without | (apparent) troubles, even when the total size is 2^31 - 1. | | 1.a. Is this intended? | | 1.b. I understand the error message is coming from R (not Rcpp) and thus | this is not something that can be made easier to understand? | | | 2. The part I found confusing is that the same problem (number of cells | 2^32 - 1) is sometimes caught at object creation, but sometimes manifests | itself much later (either in the C++ code or later in R). | | I was expecting (maybe the problem are my expectations) an error early on, | when creating the matrix; if the creation proceeds without trouble, I was | not expecting a segfault (as I think all cells are initialized to cero). | | Is the recommended procedure to check if the product of dimensions is | 2^31 - 1 before creation? (But then, this will change in R-3.0 in 64 bit | systems?). | | | Best, | | R. | | | | // Beginning of file max-size.cpp | | #include Rcpp.h | | using namespace Rcpp; | | | // [[Rcpp::export]] | | NumericMatrix f1(IntegerVector nr, IntegerVector nc, | IntegerVector sf = 0) { | int nrow = asint(nr); | int ncol = asint(nc); | int segf = asint(sf); | | NumericMatrix outM(nrow, ncol); | std::cout After creating outM std::endl; | outM(nrow - 1, 0) = 1; | std::cout After asigning to last row, first column | std::endl; | | std::cout Some other value: 1, 0: | outM(1, 0) std::endl; | | if( (nrow 1) (ncol 3) ) | std::cout Some other value: nrow - 1, ncol - 3: |outM(nrow - 1, ncol - 3) std::endl; | | outM(nrow - 1, ncol - 1) = 1; | std::cout After asigning something to last cell
[Rcpp-devel] Rf_ReplIteration and profiling output from google perftools
Dear All, I am using google perftools to profile C++ called with Rcpp (following the very neat blog http://pj.freefaculty.org/blog/?p=140 by Paul E. Johnson). This is an example of the output I get from the google perftools (after sorting the output), when I focus on function Algorithm5M (the one called from R). 12025 7.0% 83.5%12028 7.0% __copy_m (inline) 9283 5.4% 95.6%13687 8.0% std::vector::size (inline) 78442 45.8% 45.8%82709 48.2% std::vector::operator[] (inline) 0 0.0% 100.0% 152024 88.7% Rf_ReplIteration 0 0.0% 100.0% 171437 100.0% bcEval 0 0.0% 100.0% 171437 100.0% do_begin 0 0.0% 100.0% 171437 100.0% do_dotcall 0 0.0% 100.0% 171437 100.0% do_eval 0 0.0% 100.0% 171437 100.0% do_set 0 0.0% 100.0% 171437 100.0% do_subset 0 0.0% 100.0% 171437 100.0% forcePromise 0 0.0% 100.0% 171437 100.0% FORCE_PROMISE (inline) 0 0.0% 100.0% 171437 100.0% getvar (inline) 0 0.0% 100.0% 171437 100.0% Rf_applyClosure 0 0.0% 100.0% 171437 100.0% Rf_DispatchOrEval 0 0.0% 100.0% 171437 100.0% Rf_eval 52626 30.7% 76.5% 171437 100.0% Algorithm5M I guess that things such as Rf_eval, Rf_applyClosure, etc, should show in 100% of the profiling samples (the 5th column). In other words, in terms of improving my code, I can forget about all those 100% lines. However, I was surprised by Rf_ReplIteration. I was expecting it to either show up in 100% of the samples, or show up in a lot fewer. Am I doing something silly when using Rcpp and calling my function (i.e., that 87% reflects that I could be using it more efficiently), or is this something I should expect (I am happy to provide full examples, etc, but I didn't want to fill up the list with unnecessary stuff, especially since I think my confusion is due to my not understanding something about Rf_ReplIteration). Best, R. -- Ramon Diaz-Uriarte Department of Biochemistry, Lab B-25 Facultad de Medicina Universidad Autónoma de Madrid Arzobispo Morcillo, 4 28029 Madrid Spain Phone: +34-91-497-2412 Email: rdia...@gmail.com ramon.d...@iib.uam.es http://ligarto.org/rdiaz ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
[Rcpp-devel] RcppArmadillo: resize vs. insert_cols
Dear All, I am using RcppArmadillo, and I am creating matrices in the C++ code (i.e., these are not matrices passed from R). The sizes of these matrices might need to increase dynamically during run time, with elements being added at the end. What is the recommended way of doing this: a) Just insert elements as needed. E.g.: Y.insert_cols(Y.n_cols, the_new_thing); b) Create initial object. Double size of object (using resize) each time I run out of space. Copy the_new_thing to the appropriate place. I thought b) was a reasonable manual approach (e.g., Skiena, The algorithm design manual, 2nd ed, p. 67), but I think I've read that for std::vector, a) (with push_back), is a better choice than b) (using reserve, not resize). For me, a) involves writing less code. Thanks, R. -- Ramon Diaz-Uriarte Department of Biochemistry, Lab B-25 Facultad de Medicina Universidad Autónoma de Madrid Arzobispo Morcillo, 4 28029 Madrid Spain Phone: +34-91-497-2412 Email: rdia...@gmail.com ramon.d...@iib.uam.es http://ligarto.org/rdiaz ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
[Rcpp-devel] RcppArmadillo: passing matrix columns by reference
Dear All, I am trying to pass columns from an Armadillo matrix to a function, but I'd like to pass just a reference to the column, not a copy of the column and I do not seem to be able to do it elegantly. The code below (function f1) I think shows that passing X.col to a function creates a copy (X.unsafe_col does too). We can pass X as argument, and the index of the column, as in f2. And that will not create a copy. But I think this is not the right way of doing what I want to do (to begin with, I'd rather not pass the column index to the function). What am I getting wrong? Best, // [[Rcpp::depends(RcppArmadillo)]] #include Rcpp.h #include RcppArmadillo.h using namespace Rcpp; double f1(arma::umat Z) { Z(0, 0) = 111; Z(9, 0) = 111; std::cout f1, this is Z std::endl Z std::endl; return 33.3; } double f2(arma::umat Z, const int c1) { Z(1, 0) = 222; return 66.6; } double f3(arma::umat Z) { Z(1, 0) = 223; return 99.9; } // [[Rcpp::export]] List f0(IntegerVector s1_, IntegerVector c1_){ const int s1 = asint(s1_); const int c1 = asint(c1_); arma::umat X(10, s1); for(int j = 0; j s1; ++j) { for(int i = 0; i 10; ++i) { X(i, j) = i * 10 + j; } } // in both cases, a copy seems to be made //double fitness = f1(X.col(c1)); double outf1 = f1(X.unsafe_col(c1)); std::cout f0, this is X after f1 std::endl X std::endl; double outf2 = f2(X, c1); std::cout f0, this is X after f2 std::endl X std::endl; // double outf3 = f3(X.unsafe_col(c1)); //will not work //double outf3 = f3(X.col(c1)); //will not work return List::create(Named(X) = wrap(X), Named(of1) = outf1, Named(of2) = outf2); } -- Ramon Diaz-Uriarte Department of Biochemistry, Lab B-25 Facultad de Medicina Universidad Autónoma de Madrid Arzobispo Morcillo, 4 28029 Madrid Spain Phone: +34-91-497-2412 Email: rdia...@gmail.com ramon.d...@iib.uam.es http://ligarto.org/rdiaz ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
Re: [Rcpp-devel] RcppArmadillo: passing matrix columns by reference
On Mon, 10 Dec 2012 17:06:06 +0100,Romain Francois rom...@r-enthusiasts.com wrote: the .col method gives you a subview_col : arma_inline subview_coleT col(const uword col_num); arma_inline const subview_coleT col(const uword col_num) const; not an umat. The name of the class implies that it is a view class, so a way to look a data from another class. hence, no data of its own, so cheap copy. double f3(arma::subview_colunsigned int Z) { Z(1, 0) = 223; return 99.9; } Aha! Great, thanks! Also, including RcppArmadillo.h after Rcpp.h is wrong. You should only include RcppArmadillo.h. Oooops. Will not do that again. Best, R. I should do something so that the compiler tells you this. Romain Le 10/12/12 16:49, Ramon Diaz-Uriarte a écrit : Dear All, I am trying to pass columns from an Armadillo matrix to a function, but I'd like to pass just a reference to the column, not a copy of the column and I do not seem to be able to do it elegantly. The code below (function f1) I think shows that passing X.col to a function creates a copy (X.unsafe_col does too). We can pass X as argument, and the index of the column, as in f2. And that will not create a copy. But I think this is not the right way of doing what I want to do (to begin with, I'd rather not pass the column index to the function). What am I getting wrong? Best, // [[Rcpp::depends(RcppArmadillo)]] #include Rcpp.h #include RcppArmadillo.h using namespace Rcpp; double f1(arma::umat Z) { Z(0, 0) = 111; Z(9, 0) = 111; std::cout f1, this is Z std::endl Z std::endl; return 33.3; } double f2(arma::umat Z, const int c1) { Z(1, 0) = 222; return 66.6; } double f3(arma::umat Z) { Z(1, 0) = 223; return 99.9; } // [[Rcpp::export]] List f0(IntegerVector s1_, IntegerVector c1_){ const int s1 = asint(s1_); const int c1 = asint(c1_); arma::umat X(10, s1); for(int j = 0; j s1; ++j) { for(int i = 0; i 10; ++i) { X(i, j) = i * 10 + j; } } // in both cases, a copy seems to be made //double fitness = f1(X.col(c1)); double outf1 = f1(X.unsafe_col(c1)); std::cout f0, this is X after f1 std::endl X std::endl; double outf2 = f2(X, c1); std::cout f0, this is X after f2 std::endl X std::endl; // double outf3 = f3(X.unsafe_col(c1)); //will not work //double outf3 = f3(X.col(c1)); //will not work return List::create(Named(X) = wrap(X), Named(of1) = outf1, Named(of2) = outf2); } -- Romain Francois Professional R Enthusiast +33(0) 6 28 91 30 30 R Graph Gallery: http://gallery.r-enthusiasts.com blog:http://romainfrancois.blog.free.fr |- http://bit.ly/RE6sYH : OOP with Rcpp modules `- http://bit.ly/Thw7IK : Rcpp modules more flexible -- Ramon Diaz-Uriarte Department of Biochemistry, Lab B-25 Facultad de Medicina Universidad Autónoma de Madrid Arzobispo Morcillo, 4 28029 Madrid Spain Phone: +34-91-497-2412 Email: rdia...@gmail.com ramon.d...@iib.uam.es http://ligarto.org/rdiaz ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
[Rcpp-devel] const not allowed for Matrix arguments?
More on const arguments: the first will compile and work, the second will not compile, even if we do not modify anything. As far as I can tell, in this case the culprit is related to having the reference to the Column, because f3 does compile and work. I guess the problem is the *this, returned by Column; is there another way of making sure the caller does not modify the matrix, while enjoying the usage of Column (and Row)? Best, R. // [[Rcpp::export]] int f1(Rcpp::IntegerMatrix m1) { const Rcpp::IntegerMatrix::Column c1 = m1(_, 1); int dd = c1[0] * 2; return dd; } // [[Rcpp::export]] int f2(const Rcpp::IntegerMatrix m1) { const Rcpp::IntegerMatrix::Column c1 = m1(_, 1); int dd = c1[0] * 2; return dd; } // [[Rcpp::export]] int f3(const Rcpp::IntegerMatrix m1) { int dd = m1(0, 0) * 2; return dd; } -- Ramon Diaz-Uriarte Department of Biochemistry, Lab B-25 Facultad de Medicina Universidad Autónoma de Madrid Arzobispo Morcillo, 4 28029 Madrid Spain Phone: +34-91-497-2412 Email: rdia...@gmail.com ramon.d...@iib.uam.es http://ligarto.org/rdiaz ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
[Rcpp-devel] Rcpp 0.10
Dear Dirk, Romain, and JJ, Version 0.10 is really amazing. In particular, the Rcpp Attributes are awesome, as well as some of the additions to sugar!! If I may, I'd like to make a suggestion and a question about the documentation for Rcpp Attributes. 1. Suggestion: it might be appropriate to say (probably before 5.1) that Makevars, etc, are still needed. And, thus, point the reader to vignette(Rcpp-package). Otherwise it is easy (or it was easy for me) to think that the former mechanism was somehow superseded or included (though, of course, it does not say so anywhere). 2. Question: I find it much harder to follow what is happening in the .h and RCppExports.R files with compileAttributes. I mean, the .h and *.R files generated with Rcpp.package.skeleton seem to have a simple structure that I find easy to replicate and modify as I keep adding stuff to my *.cpp file. (For instance, the placement of RccpExport before functions intended to be exported, the usage of SEXP, etc). However, I do not clearly see how to go around the files generated by compileAttributes and how to mix that with .R and *.h files previously generated with Rcpp.package.skeleton. Any hints that can be provided here? Thanks again for the package and the new additions to it. Best, R. -- Ramon Diaz-Uriarte Department of Biochemistry, Lab B-25 Facultad de Medicina Universidad Autónoma de Madrid Arzobispo Morcillo, 4 28029 Madrid Spain Phone: +34-91-497-2412 Email: rdia...@gmail.com ramon.d...@iib.uam.es http://ligarto.org/rdiaz ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
Re: [Rcpp-devel] Matrix: column-major storage?
On Thu, 8 Nov 2012 06:35:10 -0700,Christian Gunning icos.atr...@gmail.com wrote: [1 text/plain; UTF-8 (quoted-printable)] Interesting thread. I had no idea the speed differences were that significant. I'm curious -- are there obvious use-cases where staying in column-major is *not* possible via careful setup? I am sure staying with column-major is possible. Fortran also has column-major order and a lot of things are (anything is ? ;-) possible with Fortran. But when writing loops and traversing matrices and structs I (so this might just be me) am used to thinking about row-major order. I realize it's a minimal test-case, but the cost of memory allocation in the above example dominates. Hummm... interesting. Thanks for the example. I'll play with it for my stuff. Best, R. Compare with: byRow_inplace - cxxfunction(signature(xm_ = numeric), body = ' //int m = asint(dm); //NumericMatrix xm(m, m); NumericMatrix xm(xm_); int m = xm.nrow(); int i; int j; for(i = 0; i m; i++) { for(j = 0; j m; j++) { xm(i, j) = i * m + j; } } ', plugin = RcppArmadillo, verbose = TRUE) nn - 1 aa - matrix(nn, nrow=nn, ncol=nn) res - benchmark(byRow(nn), byRow_inplace(aa), columns=c(test, replications, elapsed, relative, user.self, sys.self), order=relative, replications=10) best, Christian I thought that, for any of the Matrix types available, if we have x(i, j) then x(i, j + 1) would be contiguous in memory (whereas x(i + 1, j) could be extremely far away, depending on the size of the matrix). However, it seems that with Matrix objects it is the other way around, so that x(i, j) is next to x(i + 1, j), not to x(i, j + 1): -- A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal – Panama! [2 text/html; UTF-8 (quoted-printable)] -- Ramon Diaz-Uriarte Department of Biochemistry, Lab B-25 Facultad de Medicina Universidad Autónoma de Madrid Arzobispo Morcillo, 4 28029 Madrid Spain Phone: +34-91-497-2412 Email: rdia...@gmail.com ramon.d...@iib.uam.es http://ligarto.org/rdiaz ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
[Rcpp-devel] Matrix: column-major storage?
Dear All, I've recently started using Rcpp, so maybe this is a trivial question, but I have not been able to find an answer in previous posts or the docs. I thought that, for any of the Matrix types available, if we have x(i, j) then x(i, j + 1) would be contiguous in memory (whereas x(i + 1, j) could be extremely far away, depending on the size of the matrix). However, it seems that with Matrix objects it is the other way around, so that x(i, j) is next to x(i + 1, j), not to x(i, j + 1): # showThePointers - cxxfunction(signature(dm = integer), body = ' int m = asint(dm); IntegerMatrix xm(m, m); int *p; int i; int j; std::cout std::endl * Row-major access * std::endl; for(i = 0; i m; i++) { for(j = 0; j m; j++) { xm(i, j) = i * m + j; p = xm(i, j); std::cout i = i ; j = j ; xm(i, j) = xm(i, j) ; p p std::endl; } } std::cout std::endl ** Column-major access * std::endl; for(j = 0; j m; j++) { for(i = 0; i m; i++) { xm(i, j) = i * m + j; p = xm(i, j); std::cout i = i ; j = j ; xm(i, j) = xm(i, j) ; p p std::endl; } } ', plugin = Rcpp, verbose = TRUE) showThePointers(3) # I noticed that because if we time functions that access and modify in row-major order, they are a lot slower than in column-major order (in my laptop, about four times slower in the examples below, about nine times if I then return the object to R). # byRow - cxxfunction(signature(dm = integer), body = ' int m = asint(dm); IntegerMatrix xm(m, m); int i; int j; for(i = 0; i m; i++) { for(j = 0; j m; j++) { xm(i, j) = i * m + j; } } ', plugin = Rcpp, verbose = TRUE) byCol - cxxfunction(signature(dm = integer), body = ' int m = asint(dm); IntegerMatrix xm(m, m); int i; int j; for(j = 0; j m; j++) { for(i = 0; i m; i++) { xm(i, j) = i * m + j; } } ', plugin = Rcpp, verbose = TRUE) nn - 1 benchmark(byRow(nn), byCol(nn), columns=c(test, replications, elapsed, relative, user.self, sys.self), order=relative, replications=10) # I understand it is not possible to store things in row-major order with Rcpp Matrix types? Best, R. -- Ramon Diaz-Uriarte Department of Biochemistry, Lab B-25 Facultad de Medicina Universidad Autónoma de Madrid Arzobispo Morcillo, 4 28029 Madrid Spain Phone: +34-91-497-2412 Email: rdia...@gmail.com ramon.d...@iib.uam.es http://ligarto.org/rdiaz ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
Re: [Rcpp-devel] Matrix: column-major storage?
Dear Dirk, Doug, and Romain, Thanks a lot for your (very fast ---didn't even have time to have dinner ;-) responses. I was aware of the column-major order in R, but for no reason I was expecting row-major in C++ (but then, I was expecting nrow and ncol to work as they did). I'll play around with RcppArmadillo and RcppEigen, and also see if I can easily adapt my algorithms to use column-major order efficiently. Thanks again. Best, R. On Wed, 7 Nov 2012 13:43:52 -0600,Dirk Eddelbuettel e...@debian.org wrote: Hi Ramon, And welcome! On 7 November 2012 at 20:25, Ramon Diaz-Uriarte wrote: | I understand it is not possible to store things in row-major order with | Rcpp Matrix types? In short: No. In more detail, and allow me to quote from Section 5.11.2 Calculating numerical derivatives in Writing R Extensions: Now, we compute the `i''th column of the gradient matrix. Note how it is accessed: R stores matrices by column (like FORTRAN). (which was the first suitable hit I found in the manual). A *lot* of the power and convenience in Rcpp comes from the fact that we do provide so-called 'proxy objects' that are fairly thing wrappers (with added C++ magic) on top of the original R types. So whatever it is in R, it is in Rcpp (holds as a general rule, void where prohibited and you have a money-back guarantee). If you want different matrix representations, the excellent package RcppArmadillo and RcppEigen are willing and able to help. Getting data to and from them is easy too. Hope this helps, Dirk -- Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com -- Ramon Diaz-Uriarte Department of Biochemistry, Lab B-25 Facultad de Medicina Universidad Autónoma de Madrid Arzobispo Morcillo, 4 28029 Madrid Spain Phone: +34-91-497-2412 Email: rdia...@gmail.com ramon.d...@iib.uam.es http://ligarto.org/rdiaz ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel