On 7 June 2014 at 03:27, Dmitry Nesterov <dmitry.neste...@gmail.com> wrote:

> Hello,
> Here I report the slowness in creation of Rcpp DataFrame objects and proposed
> change to speed it up.
> For system information, here is output from sessionInfo():
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-apple-darwin13.1.0 (64-bit)
> ...
> other attached packages:
> [1] microbenchmark_1.3-0 Rcpp_0.11.1
>
> I am using Rcpp package to port my old functions written with R's C interface
> to a more convenient style of Rcpp.
> While writing code that creates data.frames, I noticed that the Rcpp-based
> code was running quite a bit slower (using microbenchmark package) than my
> old implementation. The difference was approximately 40(!) times slower for
> data frame 50x2 (row x col)
>
> I have narrowed the speed difference down to the following call:
>
> return Rcpp::DataFrame::create(Rcpp::Named("xdata")=x,
>                                Rcpp::Named("ydata")=y);
>
> Where x and y are Rcpp::NumericVector objects.
> By debugging through the code and Rcpp, I noticed that during the creation
> Rcpp uses "as.data.frame" conversion on the vector list that contained x, y
> vectors and their names "xdata" and "ydata", while this step was not
> necessary in my previous code using C interface.

Well, how then do you guarantee that the data frame is not corrupt?

Consider this code:

    #include <Rcpp.h>
    using namespace Rcpp ;

    // [[Rcpp::export]]
    DataFrame test(){
        NumericVector x = NumericVector::create( 1, 2, 3, 4 ) ;
        NumericVector y = NumericVector::create( 1, 2 ) ;
        return DataFrame::create(_["x"] = x, _["y"] = y ) ;
    }

The benefit of calling as.data.frame is that it would handle recycling y
correctly. Just setting the class attribute to "data.frame" by brute force
would make a corrupt data frame.

Perhaps you can get your suggestion approved on the basis of being consistent
with other ways to get corrupt data frames in Rcpp.
https://github.com/RcppCore/Rcpp/issues/144

The basic idea is valid, but this would need more work and understanding of
the conceptual requirements of a data frame.
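
To make the recycling point concrete, here is a minimal sketch in plain C++ (no Rcpp) of the row-count reconciliation that has to happen before a list of columns can be called a data frame. The names Column, is_rectangular and recycle_columns are hypothetical helpers invented for this illustration; they are not part of Rcpp or R. The rule modelled here (a column is recycled only when its length divides the longest length evenly, and anything else is an error) is an assumption about the behaviour being relied on, not a copy of R's implementation.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a numeric column; not an Rcpp type.
using Column = std::vector<double>;

// True when every column already has the same length, i.e. the only case
// in which merely tagging the list as a data frame could be safe.
bool is_rectangular(const std::vector<Column>& cols) {
    for (const Column& c : cols) {
        if (c.size() != cols.front().size()) return false;
    }
    return true;
}

// Assumed recycling rule: stretch each column to the longest length by
// repeating its values, and reject lengths that do not divide evenly.
std::vector<Column> recycle_columns(const std::vector<Column>& cols) {
    std::size_t n = 0;  // target row count = longest column
    for (const Column& c : cols) {
        if (c.size() > n) n = c.size();
    }

    std::vector<Column> out;
    out.reserve(cols.size());
    for (const Column& c : cols) {
        const bool mismatch = c.empty() ? (n != 0) : (n % c.size() != 0);
        if (mismatch) {
            throw std::invalid_argument("arguments imply differing number of rows");
        }
        Column recycled(n);
        for (std::size_t i = 0; i < n; ++i) {
            recycled[i] = c[i % c.size()];  // wrap around the source column
        }
        out.push_back(recycled);
    }
    return out;
}
```

With x = {1, 2, 3, 4} and y = {1, 2} as in the example above, is_rectangular is false and recycle_columns expands y to 1, 2, 1, 2. A fast path that only sets the class attribute would be safe only when is_rectangular already holds; every other input still needs the slower, checked route.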

Romain

> In Rcpp/DataFrame.h:87
>     static DataFrame_Impl from_list( Parent obj ){
> This in turn calls on line 104:
>     return DataFrame_Impl(obj) ;
> and which ultimately calls on line 78:
>     void set__(SEXP x){
>         if( ::Rf_inherits( x, "data.frame" )){
>             Parent::set__( x ) ;
>         } else{
>             SEXP y = internal::convert_using_rfunction( x, "as.data.frame" ) ;
>             Parent::set__( y ) ;
>         }
>     }
> Since the DataFrame::create() function has not set a class attribute to
> "data.frame" by that point, the conversion "as.data.frame" takes place and
> slows down the creation of the final object.
> I propose to make a change on line 103 to set the class attribute to
> "data.frame", so no further conversion will take place:
>     if( use_default_strings_as_factors ) {
>         Rf_setAttrib(obj, R_ClassSymbol, Rf_mkString("data.frame"));
>         return DataFrame_Impl(obj) ;
>     }
>
> I tested it and it brought the speed of execution of the function to about
> the same as it was before with plain C API.
> Please let me know if it makes sense or maybe I should use
> DataFrame::create() function differently.
>
> Best,
> Dmitry
>
> _______________________________________________
> Rcpp-devel mailing list
> Rcpp-devel@lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel