Hello,

I was merely pointing out the problem. People who maintain and contribute to Rcpp will tell you what they expect. I am no longer one of them, so I don’t really care either way, unless the change introduces a bug that causes issues for software I’m involved with that still has to depend on Rcpp for reasons outside my control.

On a general note, I’d argue that it makes sense to submit the pull request anyway, as it creates a dedicated place where you can discuss the proposal, and it triggers continuous testing, so that Travis will tell you if you break something.

Romain

On 7 June 2014, at 14:35, Dmitry Nesterov <dmitry.neste...@gmail.com> wrote:

> Hello Romain,
> maybe then another function, like force_create() could be available? Or some
> checks for equal number of elements in each vector.
> One of the main Rcpp advantages to the user is its flexibility and speed,
> compared to the plain R code.
> I am not sure at this point what solution would be the best, but having fast
> methods in Rcpp would be really great.
> Should I wait then before submitting the pull request?
> Dmitry
>
> On Jun 7, 2014, at 7:21 AM, Romain François <rom...@r-enthusiasts.com> wrote:
>
>>
>> On 7 June 2014, at 03:27, Dmitry Nesterov <dmitry.neste...@gmail.com> wrote:
>>
>>> Hello,
>>> Here I report the slowness in creation of Rcpp DataFrame objects and
>>> proposed change to speed it up.
>>> For system information, here is output from sessionInfo():
>>> R version 3.1.0 (2014-04-10)
>>> Platform: x86_64-apple-darwin13.1.0 (64-bit)
>>> ...
>>> other attached packages:
>>> [1] microbenchmark_1.3-0 Rcpp_0.11.1
>>>
>>> I am using Rcpp package to port my old functions written with R's C
>>> interface to a more convenient style of Rcpp.
>>> While writing code that creates data.frame’s, I noticed that the Rcpp-based
>>> code was running quite a bit slower (using microbenchmark package) than my
>>> old implementation. The difference was approximately 40(!) times slower for
>>> data frame 50x2 (row x col)
>>>
>>> I have narrowed the speed difference down to the following call:
>>>
>>>    return Rcpp::DataFrame::create(Rcpp::Named("xdata")=x,
>>>                                   Rcpp::Named("ydata")=y);
>>>
>>> Where x and y are Rcpp::NumericVector objects.
>>> By debugging through the code and Rcpp, I noticed that during the creation
>>> Rcpp uses “as.data.frame” conversion on the vector list that contained x, y
>>> vectors and their names “xdata” and “ydata”, while this step was not
>>> necessary in my previous code using C interface.
>>
>> Well, how then do you guarantee that the data frame is not corrupt ?
>>
>> Consider this code:
>>
>> #include <Rcpp.h>
>> using namespace Rcpp ;
>>
>> // [[Rcpp::export]]
>> DataFrame test(){
>>     NumericVector x = NumericVector::create( 1, 2, 3, 4 ) ;
>>     NumericVector y = NumericVector::create( 1, 2 ) ;
>>     return DataFrame::create(_["x"] = x, _["y"] = y ) ;
>> }
>>
>> The benefit of calling as.data.frame is that it would handle recycling y
>> correctly.
>>
>> Just setting the class attribute to "data.frame" by brute force would make a
>> corrupt data frame. Perhaps you can get your suggestion approved on the
>> basis of being consistent with other ways to get corrupt data frames in
>> Rcpp.
>> https://github.com/RcppCore/Rcpp/issues/144
>>
>> The basic idea is valid, but this would need more work and understanding of
>> the conceptual requirements of a data frame.
>>
>> Romain
>>
>>
>>> In Rcpp/DataFrame.h:87
>>>    static DataFrame_Impl from_list( Parent obj ){
>>> This in turn calls on line 104:
>>>    return DataFrame_Impl(obj) ;
>>> and which ultimately calls on line 78:
>>>    void set__(SEXP x){
>>>        if( ::Rf_inherits( x, "data.frame" )){
>>>            Parent::set__( x ) ;
>>>        } else{
>>>            SEXP y = internal::convert_using_rfunction( x, "as.data.frame" ) ;
>>>            Parent::set__( y ) ;
>>>        }
>>>    }
>>> Since the DataFrame::create() function has not set a class attribute to
>>> “data.frame” by far, the conversion “as.data.frame” takes place and slows
>>> down the creation of the final object.
>>> I propose to make change on line 103 to set class attribute to
>>> “data.frame”, so no further conversion will take place:
>>>    if( use_default_strings_as_factors ) {
>>>        Rf_setAttrib(obj, R_ClassSymbol, Rf_mkString("data.frame"));
>>>        return DataFrame_Impl(obj) ;
>>>    }
>>>
>>> I tested it and it brought the speed of execution of the function to about
>>> the same as it was before with plain C API.
>>> Please let me know if it makes sense or maybe I should use
>>> DataFrame::create() function differently.
>>>
>>> Best,
>>> Dmitry
>>>
>>> _______________________________________________
>>> Rcpp-devel mailing list
>>> Rcpp-devel@lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>
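
The trade-off discussed in this thread (skipping as.data.frame for speed versus keeping the columns consistent) can be illustrated with a small standalone sketch. It is plain C++ rather than Rcpp, so that it compiles on its own, and every name in it (Column, Frame, make_frame_strict, make_frame_recycled) is hypothetical and not part of Rcpp. The strict variant corresponds to the "checks for equal number of elements" idea. The recycling variant only approximates, in simplified form, what as.data.frame does for the x/y example above, assuming shorter columns are repeated only when the longest length is a whole multiple of theirs.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for a numeric column and a data frame.
using Column = std::vector<double>;

struct Frame {
    std::vector<std::string> names;
    std::vector<Column> columns;
    std::size_t nrow;
};

// Fast path: no conversion step, but columns of unequal length are refused,
// so the result can never be an inconsistent frame.
Frame make_frame_strict(std::vector<std::string> names,
                        std::vector<Column> columns) {
    if (names.size() != columns.size())
        throw std::invalid_argument("one name per column is required");
    const std::size_t nrow = columns.empty() ? 0 : columns.front().size();
    for (const Column& col : columns)
        if (col.size() != nrow)
            throw std::invalid_argument("columns differ in length");
    return Frame{std::move(names), std::move(columns), nrow};
}

// Slower path: repeat shorter columns up to the longest length, but only
// when that length is a whole multiple of the column's own length.
Frame make_frame_recycled(std::vector<std::string> names,
                          std::vector<Column> columns) {
    std::size_t nrow = 0;
    for (const Column& col : columns)
        if (col.size() > nrow) nrow = col.size();

    for (Column& col : columns) {
        const std::size_t len = col.size();
        if (len == nrow) continue;
        if (len == 0 || nrow % len != 0)
            throw std::invalid_argument("column length does not divide row count");
        col.reserve(nrow);
        for (std::size_t i = len; i < nrow; ++i) {
            const double value = col[i % len];  // copy before appending
            col.push_back(value);
        }
    }
    // All columns now have nrow elements; reuse the strict check.
    return make_frame_strict(std::move(names), std::move(columns));
}
```

With the example from the thread (x of length 4, y of length 2), make_frame_strict throws, while make_frame_recycled repeats y until it matches the length of x.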