Dear all, I had some problems figuring out how to write code that iterates through the values of a run-length-encoded factor (Rle). I have more or less made it work, but I am not sure that the code does exactly what I expect. My questions are about both Rcpp and C++; tell me if this is not the right place to ask them.
The function I am writing should iterate through an object of formal class 'Rle' (from the "IRanges" package), which is structured like this:

1. It has two slots, 'values' and 'lengths'. They have the same length; values is a factor and lengths is an integer vector.
2. values is a factor: an integer vector with an associated character vector (the attribute "levels"), and the integers point to elements of the character vector.

For instance, the factor

    f = factor(c('a','a','a','a','b','c','c'))

when it is run-length-encoded with rle = Rle(f), looks like this:

    rle@values                    ~ c(1, 2, 3)
    attributes(rle@values)$levels ~ c("a", "b", "c")
    rle@lengths                   ~ c(4, 1, 2)

To make things a bit more complicated, in my situation this Rle object is contained in a GRanges object 'gr':

    rle = gr@seqnames

I wanted to write a class that encapsulates the iteration through such an object (maybe that's a bit Java-style). This was my first version that compiled:

    #include <Rcpp.h>
    using namespace Rcpp;

    class rleIter {
        int run;
        int rlen;
        int rpos;
        // should I declare them references if I don't want any unnecessary copying?
        IntegerVector rlens;
        IntegerVector values;
        std::vector<std::string> names;

    public:
        rleIter(RObject& rle):
            rlens(as<IntegerVector>(rle.slot("lengths"))),  // is the vector copied here?
            values(as<IntegerVector>(rle.slot("values"))),
            names(as<std::vector<std::string> >(values.attr("levels"))),
            rlen(rlens[0]),  // <--- THIS CAUSES A SEGFAULT!!!!
            run(0),
            rpos(0)
        {}

        bool next(){
            ++rpos;
            if (rpos == rlens[run]){
                // end of the run, go to the next
                ++run;
                rpos = 0;
                if (run == rlens.length()) return false;
            }
            return true;
        }

        const std::string& getValue(){
            return names[values[run]-1];
        }
    };

    void readRle(RObject gr){  // passed in by value (it was a mistake)
        RObject rle = as<RObject>(gr.slot("seqnames"));  // <- is this vector copied here?
        rleIter iter(rle);
        bool finished = false;
        for (; !finished; finished = !iter.next()){
            Rcout << iter.getValue() << std::endl;
        }
    }

    // [[Rcpp::export]]
    void test(RObject gr){
        readRle(gr);
    }

In R:

    library(GenomicRanges)
    gr <- GRanges(seqnames=c("chr1", "chr1", "chr2"),
                  ranges=IRanges(start=c(1, 10, 7), end=c(10, 101, 74)))
    library(my_package_under_development_with_the_rcpp_code_shown_above)
    test(gr)

Result: SEGFAULT.

Questions:

1. This code gives a segfault at the point that I indicated. Why? Maybe in the initializer list I am pointing to areas of memory that are allocated and filled in by that same initializer list, and maybe this is forbidden?

2. If I change the signature of the function readRle and pass the gr object by reference, the segfault disappears. Why? If I copy the gr object, the copy should be identical, so why do they behave differently?

3. I don't understand whether doing

       RObject rle = as<RObject>(gr.slot("seqnames"));

   causes the vector rle to be copied. What is worse, I have no idea what resources to look up to find this out, or what reasoning/principles to apply, other than posting on this mailing list or trying to read the source code for hours...

4. If I replace the line above with

       RObject& rle = as<RObject>(gr.slot("seqnames"));

   so that I am sure that the vector is not copied, the compiler complains that as<RObject>(gr.slot("seqnames")) is an rvalue, and that if I want to bind a reference to it, the reference must be const. How do I create a non-const reference to a slot of an S4 object, then?

If you made it to the end of this very long and boring email and could give me some help, I would be extremely grateful.

Ale

--
Alessandro Mammana, PhD Student
Max Planck Institute for Molecular Genetics
Ihnestraße 63-73
D-14195 Berlin, Germany

_______________________________________________
Rcpp-devel mailing list
Rcpp-devel@lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
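Here is a minimal sketch in plain C++ of the two language rules behind questions 1 and 4. It uses no Rcpp, so it compiles without R. RleIterSketch, makeLengths, totalLength and firstRunLength are hypothetical names, and std::vector stands in for IntegerVector.

First, non-static data members are initialized in the order they are declared in the class, not the order they are written in the constructor's initializer list. In rleIter, rlen is declared before rlens, so rlen(rlens[0]) runs before rlens has been constructed. That is undefined behaviour, which probably also explains question 2: undefined behaviour can appear to "work" after an unrelated change such as passing gr by reference. GCC and Clang can warn about a mismatched order (-Wreorder). The sketch simply declares the containers first.

Second, a non-const lvalue reference cannot bind to a temporary such as the value returned by as<RObject>(...). A const reference can bind to it, and doing so extends the temporary's lifetime.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical plain-C++ stand-in for rleIter: std::vector replaces
// Rcpp::IntegerVector so the sketch compiles without R.
class RleIterSketch {
    // Members are initialized in DECLARATION order, so the containers are
    // declared first: `rlen` below is initialized from `rlens`, which by then
    // already exists. In rleIter the declaration order is the other way round.
    std::vector<int> rlens;          // run lengths (like rle@lengths)
    std::vector<int> values;         // 1-based level codes (like rle@values)
    std::vector<std::string> names;  // factor levels
    int run;                         // index of the current run
    int rpos;                        // position inside the current run
    int rlen;                        // length of the first run

public:
    // Initializer list written in the same order as the declarations.
    // rlens.at(0) throws std::out_of_range on an empty input instead of
    // reading invalid memory.
    RleIterSketch(std::vector<int> lengths, std::vector<int> codes,
                  std::vector<std::string> levels)
        : rlens(std::move(lengths)),
          values(std::move(codes)),
          names(std::move(levels)),
          run(0),
          rpos(0),
          rlen(rlens.at(0)) {}

    int firstRunLength() const { return rlen; }

    // Same logic as rleIter::next(): returns false after the last element.
    bool next() {
        ++rpos;
        if (rpos == rlens[run]) {  // end of the run, go to the next
            ++run;
            rpos = 0;
            if (run == static_cast<int>(rlens.size())) return false;
        }
        return true;
    }

    const std::string& getValue() const {
        assert(run < static_cast<int>(rlens.size()));
        return names[values[run] - 1];  // codes are 1-based, as in R factors
    }
};

// Returns a temporary (an rvalue), just as as<RObject>(...) returns by value.
std::vector<int> makeLengths() { return {2, 1}; }

int totalLength() {
    // std::vector<int>& lens = makeLengths();  // error: a non-const lvalue
    //                                          // reference cannot bind to a temporary
    const std::vector<int>& lens = makeLengths();  // OK: lifetime is extended
    int n = 0;
    for (int len : lens) n += len;
    return n;
}
```

Suppose the sketch is fed lengths {2, 1}, codes {1, 2} and levels {"chr1", "chr2"}, which is the shape of gr@seqnames in the R example. The loop from readRle then visits chr1, chr1, chr2.

On question 3, Rcpp classes such as RObject and IntegerVector wrap a pointer to the underlying R object (a SEXP). Copy-constructing one therefore shares the same data rather than duplicating the vector. Rcpp::clone() exists for when an explicit deep copy is wanted.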