Re: [Rcpp-devel] When calling same Rcpp function several times different results are returned
It's because several instances of your object will be created (via the splitting constructor) and then joined together using the join method. If all instances share an output variable then they will overwrite each others' output, giving them their own allows the split/join logic to work correctly. On Wed, Jul 15, 2015 at 8:37 AM, Vaidas Zemlys mpik...@gmail.com wrote: Hi, I’ve finally managed to produce code that works. I hope. The key was to leave output variable out of constructor: #include Rcpp.h using namespace Rcpp; // [[Rcpp::depends(RcppParallel)]] #include RcppParallel.h using namespace RcppParallel; struct SumsInGroups5: public Worker { const RVectordouble v; const RVectorint g; std::vectordouble s; SumsInGroups5(const NumericVector v, const IntegerVector g): v(v), g(g), s(*std::max_element(g.begin(), g.end()) + 1, 0.0){ } SumsInGroups5(const SumsInGroups5 p, Split): v(p.v), g(p.g), s(*std::max_element(g.begin(), g.end()) + 1, 0.0) {} void operator()(std::size_t begin, std::size_t end) { for (std::size_t i = begin; i end; ++i) { s[g[i]]+=v[i]; } } void join(const SumsInGroups5 rhs) { for(std::size_t i = 0; i s.size(); i++) { s[i] += rhs.s[i]; } } }; // [[Rcpp::export]] NumericVector sg5(NumericVector v, IntegerVector g) { SumsInGroups5 p(v, g); parallelReduce(0, v.length(), p); return wrap(p.s); } /*** R a - 1:10 g - c(rep(0,5),rep(1,5)) bb - lapply(1:1,function(x)sg5(a,g)) cc-do.call(rbind,bb) table(cc[,1]) table(cc[,2]) */ Looking at the example http://gallery.rcpp.org/articles/parallel-distance-matrix/, the ouput is initialized in constructor and everything works. So if anybody can explain why in this case leaving out the output from the constructor made the code work I would be all ears. Vaidotas Zemlys Le mar. 14 juil. 2015 à 15:22, Danas Zuokas danas.zuo...@gmail.com a écrit : I have tried this with no luck. 2015-07-14 14:34 GMT+03:00 Romain Francois rom...@r-enthusiasts.com: Or just use a std::vectordouble for sg. That should do it. Romain Envoyé de mon iPhone Le 14 juil. 2015 à 13:21, JJ Allaire jj.alla...@gmail.com a écrit : The examples in the RcppParallel documentation assume that access to vectors and matrixes are *aligned* (i.e. fall into neat buckets whereby reading and writing doesn't overlap between worker instances). Your example appears to access arbitrary elements of sg (depending on what's passed in gi) which probably creates overlapping reads/writes. You should also study the documentation for join carefully. There's nothing incorrect about RcppParallel's behavior here, rather you need to think more carefully about the access patterns of your data and how they might conflict. You may need to introduce locking to overcome the conflicts, which in turn could kill the performance benefit you gain from parallelism. No easy answers here :-\ On Tue, Jul 14, 2015 at 7:15 AM, Danas Zuokas danas.zuo...@gmail.com wrote: Yes it is the same question on SO and I did consider RHertel's comments. But this problem (sums by group id) is not parallelFor it is parallelReduce: I split vector, calculate sums and then aggregate those sums. Please correct me if I am wrong. 2015-07-14 13:54 GMT+03:00 Dirk Eddelbuettel e...@debian.org: On 14 July 2015 at 09:25, Danas Zuokas wrote: | I have written parallel implementation of sums in groups using RcppParallel. Isn't this the same question as http://stackoverflow.com/questions/31318419/when-calling-same-rcpp-function-several-times-different-results-are-returned You got some excellent comments there by SO user 'RHertel'. Did you consider those? Dirk | // [[Rcpp::depends(RcppParallel)]] | #include Rcpp.h | #include RcppParallel.h | using namespace Rcpp; | using namespace RcppParallel; | | struct SumsG: public Worker | { | const RVectordouble v; | const RVectorint gi; | | RVectordouble sg; | | SumsG(const NumericVector v, const IntegerVector gi, NumericVector sg): v(v), gi(gi), sg(sg) {} | SumsG(const SumsG p, Split): v(p.v), gi(p.gi), sg(p.sg) {} | | void operator()(std::size_t begin, std::size_t end) { | for (std::size_t i = begin; i end; i++) { | sg[gi[i]] += v[i]; | } | } | | void join(const SumsG p) { | for(std::size_t i = 0; i sg.length(); i++) { | sg[i] += p.sg[i]; | } | } | }; | | // [[Rcpp::export]] | List sumsingroups(NumericVector v, IntegerVector gi, int ni) { | NumericVector sg(ni); | SumsG p(v, gi, sg); | parallelReduce(0, v.length(), p); | | return List::create(_[sg] = p.sg); | } | | It compiles using Rcpp::sourceCpp. Now when I call it from R sumsingroups(1:10, | rep(0:1, each = 5), 2) several times I get the right answer (15 40) and
Re: [Rcpp-devel] When calling same Rcpp function several times different results are returned
On 14 July 2015 at 09:25, Danas Zuokas wrote: | I have written parallel implementation of sums in groups using RcppParallel. Isn't this the same question as http://stackoverflow.com/questions/31318419/when-calling-same-rcpp-function-several-times-different-results-are-returned You got some excellent comments there by SO user 'RHertel'. Did you consider those? Dirk | // [[Rcpp::depends(RcppParallel)]] | #include Rcpp.h | #include RcppParallel.h | using namespace Rcpp; | using namespace RcppParallel; | | struct SumsG: public Worker | { | const RVectordouble v; | const RVectorint gi; | | RVectordouble sg; | | SumsG(const NumericVector v, const IntegerVector gi, NumericVector sg): v(v), gi(gi), sg(sg) {} | SumsG(const SumsG p, Split): v(p.v), gi(p.gi), sg(p.sg) {} | | void operator()(std::size_t begin, std::size_t end) { | for (std::size_t i = begin; i end; i++) { | sg[gi[i]] += v[i]; | } | } | | void join(const SumsG p) { | for(std::size_t i = 0; i sg.length(); i++) { | sg[i] += p.sg[i]; | } | } | }; | | // [[Rcpp::export]] | List sumsingroups(NumericVector v, IntegerVector gi, int ni) { | NumericVector sg(ni); | SumsG p(v, gi, sg); | parallelReduce(0, v.length(), p); | | return List::create(_[sg] = p.sg); | } | | It compiles using Rcpp::sourceCpp. Now when I call it from R sumsingroups(1:10, | rep(0:1, each = 5), 2) several times I get the right answer (15 40) and then | something different (usually some multiplicative of the right answer). Running | | | res - sumsingroups(1:10, rep(0:1, each = 5), 2) | for(i in 1:1000) { | tmp - sumsingroups(1:10, rep(0:1, each = 5), 2) | if(res[[1]][1] != tmp[[1]][1]) break | Sys.sleep(0.1) | } | | breaks at random iteration returning | | $sg | [1] 60 160 | | or | | $sg | [1] 30 80 | | I am new to Rcpp and RcppParallel and do not know what could cause such | behavior. | | Things that did not help: | | 1. Added for (std::size_t i = 0; i sg.length(); i++) sg[i] = 0; to both of | constructors. | 2. Changed names so that they are different in Worker definition and in | function implementation. | | ___ | Rcpp-devel mailing list | Rcpp-devel@lists.r-forge.r-project.org | https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel -- http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
Re: [Rcpp-devel] When calling same Rcpp function several times different results are returned
Would that still solve the problem if there are overlapping indexes in gi ? (i.e. if the same index appeared more than one time and was utilized from more than one thread at a time) On Tue, Jul 14, 2015 at 7:21 AM, JJ Allaire jj.alla...@gmail.com wrote: The examples in the RcppParallel documentation assume that access to vectors and matrixes are *aligned* (i.e. fall into neat buckets whereby reading and writing doesn't overlap between worker instances). Your example appears to access arbitrary elements of sg (depending on what's passed in gi) which probably creates overlapping reads/writes. You should also study the documentation for join carefully. There's nothing incorrect about RcppParallel's behavior here, rather you need to think more carefully about the access patterns of your data and how they might conflict. You may need to introduce locking to overcome the conflicts, which in turn could kill the performance benefit you gain from parallelism. No easy answers here :-\ On Tue, Jul 14, 2015 at 7:15 AM, Danas Zuokas danas.zuo...@gmail.com wrote: Yes it is the same question on SO and I did consider RHertel's comments. But this problem (sums by group id) is not parallelFor it is parallelReduce: I split vector, calculate sums and then aggregate those sums. Please correct me if I am wrong. 2015-07-14 13:54 GMT+03:00 Dirk Eddelbuettel e...@debian.org: On 14 July 2015 at 09:25, Danas Zuokas wrote: | I have written parallel implementation of sums in groups using RcppParallel. Isn't this the same question as http://stackoverflow.com/questions/31318419/when-calling-same-rcpp-function-several-times-different-results-are-returned You got some excellent comments there by SO user 'RHertel'. Did you consider those? Dirk | // [[Rcpp::depends(RcppParallel)]] | #include Rcpp.h | #include RcppParallel.h | using namespace Rcpp; | using namespace RcppParallel; | | struct SumsG: public Worker | { | const RVectordouble v; | const RVectorint gi; | | RVectordouble sg; | | SumsG(const NumericVector v, const IntegerVector gi, NumericVector sg): v(v), gi(gi), sg(sg) {} | SumsG(const SumsG p, Split): v(p.v), gi(p.gi), sg(p.sg) {} | | void operator()(std::size_t begin, std::size_t end) { | for (std::size_t i = begin; i end; i++) { | sg[gi[i]] += v[i]; | } | } | | void join(const SumsG p) { | for(std::size_t i = 0; i sg.length(); i++) { | sg[i] += p.sg[i]; | } | } | }; | | // [[Rcpp::export]] | List sumsingroups(NumericVector v, IntegerVector gi, int ni) { | NumericVector sg(ni); | SumsG p(v, gi, sg); | parallelReduce(0, v.length(), p); | | return List::create(_[sg] = p.sg); | } | | It compiles using Rcpp::sourceCpp. Now when I call it from R sumsingroups(1:10, | rep(0:1, each = 5), 2) several times I get the right answer (15 40) and then | something different (usually some multiplicative of the right answer). Running | | | res - sumsingroups(1:10, rep(0:1, each = 5), 2) | for(i in 1:1000) { | tmp - sumsingroups(1:10, rep(0:1, each = 5), 2) | if(res[[1]][1] != tmp[[1]][1]) break | Sys.sleep(0.1) | } | | breaks at random iteration returning | | $sg | [1] 60 160 | | or | | $sg | [1] 30 80 | | I am new to Rcpp and RcppParallel and do not know what could cause such | behavior. | | Things that did not help: | | 1. Added for (std::size_t i = 0; i sg.length(); i++) sg[i] = 0; to both of | constructors. | 2. Changed names so that they are different in Worker definition and in | function implementation. | | ___ | Rcpp-devel mailing list | Rcpp-devel@lists.r-forge.r-project.org | https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel -- http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
Re: [Rcpp-devel] When calling same Rcpp function several times different results are returned
I have tried this with no luck. 2015-07-14 14:34 GMT+03:00 Romain Francois rom...@r-enthusiasts.com: Or just use a std::vectordouble for sg. That should do it. Romain Envoyé de mon iPhone Le 14 juil. 2015 à 13:21, JJ Allaire jj.alla...@gmail.com a écrit : The examples in the RcppParallel documentation assume that access to vectors and matrixes are *aligned* (i.e. fall into neat buckets whereby reading and writing doesn't overlap between worker instances). Your example appears to access arbitrary elements of sg (depending on what's passed in gi) which probably creates overlapping reads/writes. You should also study the documentation for join carefully. There's nothing incorrect about RcppParallel's behavior here, rather you need to think more carefully about the access patterns of your data and how they might conflict. You may need to introduce locking to overcome the conflicts, which in turn could kill the performance benefit you gain from parallelism. No easy answers here :-\ On Tue, Jul 14, 2015 at 7:15 AM, Danas Zuokas danas.zuo...@gmail.com wrote: Yes it is the same question on SO and I did consider RHertel's comments. But this problem (sums by group id) is not parallelFor it is parallelReduce: I split vector, calculate sums and then aggregate those sums. Please correct me if I am wrong. 2015-07-14 13:54 GMT+03:00 Dirk Eddelbuettel e...@debian.org: On 14 July 2015 at 09:25, Danas Zuokas wrote: | I have written parallel implementation of sums in groups using RcppParallel. Isn't this the same question as http://stackoverflow.com/questions/31318419/when-calling-same-rcpp-function-several-times-different-results-are-returned You got some excellent comments there by SO user 'RHertel'. Did you consider those? Dirk | // [[Rcpp::depends(RcppParallel)]] | #include Rcpp.h | #include RcppParallel.h | using namespace Rcpp; | using namespace RcppParallel; | | struct SumsG: public Worker | { | const RVectordouble v; | const RVectorint gi; | | RVectordouble sg; | | SumsG(const NumericVector v, const IntegerVector gi, NumericVector sg): v(v), gi(gi), sg(sg) {} | SumsG(const SumsG p, Split): v(p.v), gi(p.gi), sg(p.sg) {} | | void operator()(std::size_t begin, std::size_t end) { | for (std::size_t i = begin; i end; i++) { | sg[gi[i]] += v[i]; | } | } | | void join(const SumsG p) { | for(std::size_t i = 0; i sg.length(); i++) { | sg[i] += p.sg[i]; | } | } | }; | | // [[Rcpp::export]] | List sumsingroups(NumericVector v, IntegerVector gi, int ni) { | NumericVector sg(ni); | SumsG p(v, gi, sg); | parallelReduce(0, v.length(), p); | | return List::create(_[sg] = p.sg); | } | | It compiles using Rcpp::sourceCpp. Now when I call it from R sumsingroups(1:10, | rep(0:1, each = 5), 2) several times I get the right answer (15 40) and then | something different (usually some multiplicative of the right answer). Running | | | res - sumsingroups(1:10, rep(0:1, each = 5), 2) | for(i in 1:1000) { | tmp - sumsingroups(1:10, rep(0:1, each = 5), 2) | if(res[[1]][1] != tmp[[1]][1]) break | Sys.sleep(0.1) | } | | breaks at random iteration returning | | $sg | [1] 60 160 | | or | | $sg | [1] 30 80 | | I am new to Rcpp and RcppParallel and do not know what could cause such | behavior. | | Things that did not help: | | 1. Added for (std::size_t i = 0; i sg.length(); i++) sg[i] = 0; to both of | constructors. | 2. Changed names so that they are different in Worker definition and in | function implementation. | | ___ | Rcpp-devel mailing list | Rcpp-devel@lists.r-forge.r-project.org | https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel -- http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
Re: [Rcpp-devel] When calling same Rcpp function several times different results are returned
Or just use a std::vectordouble for sg. That should do it. Romain Envoyé de mon iPhone Le 14 juil. 2015 à 13:21, JJ Allaire jj.alla...@gmail.com a écrit : The examples in the RcppParallel documentation assume that access to vectors and matrixes are *aligned* (i.e. fall into neat buckets whereby reading and writing doesn't overlap between worker instances). Your example appears to access arbitrary elements of sg (depending on what's passed in gi) which probably creates overlapping reads/writes. You should also study the documentation for join carefully. There's nothing incorrect about RcppParallel's behavior here, rather you need to think more carefully about the access patterns of your data and how they might conflict. You may need to introduce locking to overcome the conflicts, which in turn could kill the performance benefit you gain from parallelism. No easy answers here :-\ On Tue, Jul 14, 2015 at 7:15 AM, Danas Zuokas danas.zuo...@gmail.com wrote: Yes it is the same question on SO and I did consider RHertel's comments. But this problem (sums by group id) is not parallelFor it is parallelReduce: I split vector, calculate sums and then aggregate those sums. Please correct me if I am wrong. 2015-07-14 13:54 GMT+03:00 Dirk Eddelbuettel e...@debian.org: On 14 July 2015 at 09:25, Danas Zuokas wrote: | I have written parallel implementation of sums in groups using RcppParallel. Isn't this the same question as http://stackoverflow.com/questions/31318419/when-calling-same-rcpp-function-several-times-different-results-are-returned You got some excellent comments there by SO user 'RHertel'. Did you consider those? Dirk | // [[Rcpp::depends(RcppParallel)]] | #include Rcpp.h | #include RcppParallel.h | using namespace Rcpp; | using namespace RcppParallel; | | struct SumsG: public Worker | { | const RVectordouble v; | const RVectorint gi; | | RVectordouble sg; | | SumsG(const NumericVector v, const IntegerVector gi, NumericVector sg): v(v), gi(gi), sg(sg) {} | SumsG(const SumsG p, Split): v(p.v), gi(p.gi), sg(p.sg) {} | | void operator()(std::size_t begin, std::size_t end) { | for (std::size_t i = begin; i end; i++) { | sg[gi[i]] += v[i]; | } | } | | void join(const SumsG p) { | for(std::size_t i = 0; i sg.length(); i++) { | sg[i] += p.sg[i]; | } | } | }; | | // [[Rcpp::export]] | List sumsingroups(NumericVector v, IntegerVector gi, int ni) { | NumericVector sg(ni); | SumsG p(v, gi, sg); | parallelReduce(0, v.length(), p); | | return List::create(_[sg] = p.sg); | } | | It compiles using Rcpp::sourceCpp. Now when I call it from R sumsingroups(1:10, | rep(0:1, each = 5), 2) several times I get the right answer (15 40) and then | something different (usually some multiplicative of the right answer). Running | | | res - sumsingroups(1:10, rep(0:1, each = 5), 2) | for(i in 1:1000) { | tmp - sumsingroups(1:10, rep(0:1, each = 5), 2) | if(res[[1]][1] != tmp[[1]][1]) break | Sys.sleep(0.1) | } | | breaks at random iteration returning | | $sg | [1] 60 160 | | or | | $sg | [1] 30 80 | | I am new to Rcpp and RcppParallel and do not know what could cause such | behavior. | | Things that did not help: | | 1. Added for (std::size_t i = 0; i sg.length(); i++) sg[i] = 0; to both of | constructors. | 2. Changed names so that they are different in Worker definition and in | function implementation. | | ___ | Rcpp-devel mailing list | Rcpp-devel@lists.r-forge.r-project.org | https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel -- http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
Re: [Rcpp-devel] When calling same Rcpp function several times different results are returned
The examples in the RcppParallel documentation assume that access to vectors and matrixes are *aligned* (i.e. fall into neat buckets whereby reading and writing doesn't overlap between worker instances). Your example appears to access arbitrary elements of sg (depending on what's passed in gi) which probably creates overlapping reads/writes. You should also study the documentation for join carefully. There's nothing incorrect about RcppParallel's behavior here, rather you need to think more carefully about the access patterns of your data and how they might conflict. You may need to introduce locking to overcome the conflicts, which in turn could kill the performance benefit you gain from parallelism. No easy answers here :-\ On Tue, Jul 14, 2015 at 7:15 AM, Danas Zuokas danas.zuo...@gmail.com wrote: Yes it is the same question on SO and I did consider RHertel's comments. But this problem (sums by group id) is not parallelFor it is parallelReduce: I split vector, calculate sums and then aggregate those sums. Please correct me if I am wrong. 2015-07-14 13:54 GMT+03:00 Dirk Eddelbuettel e...@debian.org: On 14 July 2015 at 09:25, Danas Zuokas wrote: | I have written parallel implementation of sums in groups using RcppParallel. Isn't this the same question as http://stackoverflow.com/questions/31318419/when-calling-same-rcpp-function-several-times-different-results-are-returned You got some excellent comments there by SO user 'RHertel'. Did you consider those? Dirk | // [[Rcpp::depends(RcppParallel)]] | #include Rcpp.h | #include RcppParallel.h | using namespace Rcpp; | using namespace RcppParallel; | | struct SumsG: public Worker | { | const RVectordouble v; | const RVectorint gi; | | RVectordouble sg; | | SumsG(const NumericVector v, const IntegerVector gi, NumericVector sg): v(v), gi(gi), sg(sg) {} | SumsG(const SumsG p, Split): v(p.v), gi(p.gi), sg(p.sg) {} | | void operator()(std::size_t begin, std::size_t end) { | for (std::size_t i = begin; i end; i++) { | sg[gi[i]] += v[i]; | } | } | | void join(const SumsG p) { | for(std::size_t i = 0; i sg.length(); i++) { | sg[i] += p.sg[i]; | } | } | }; | | // [[Rcpp::export]] | List sumsingroups(NumericVector v, IntegerVector gi, int ni) { | NumericVector sg(ni); | SumsG p(v, gi, sg); | parallelReduce(0, v.length(), p); | | return List::create(_[sg] = p.sg); | } | | It compiles using Rcpp::sourceCpp. Now when I call it from R sumsingroups(1:10, | rep(0:1, each = 5), 2) several times I get the right answer (15 40) and then | something different (usually some multiplicative of the right answer). Running | | | res - sumsingroups(1:10, rep(0:1, each = 5), 2) | for(i in 1:1000) { | tmp - sumsingroups(1:10, rep(0:1, each = 5), 2) | if(res[[1]][1] != tmp[[1]][1]) break | Sys.sleep(0.1) | } | | breaks at random iteration returning | | $sg | [1] 60 160 | | or | | $sg | [1] 30 80 | | I am new to Rcpp and RcppParallel and do not know what could cause such | behavior. | | Things that did not help: | | 1. Added for (std::size_t i = 0; i sg.length(); i++) sg[i] = 0; to both of | constructors. | 2. Changed names so that they are different in Worker definition and in | function implementation. | | ___ | Rcpp-devel mailing list | Rcpp-devel@lists.r-forge.r-project.org | https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel -- http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel