Re: [Rd] How to create data frame column name in a function
Hello,

pvshankar wrote:

> Hello all, I have a data frame with column names s1, s2, s3, ..., s11. I
> have a function that gets two parameters: one is used as a subscript for
> the column names and the other is used as an index into the chosen
> column. For example:
>
>     my_func <- function(subscr, index) {
>         if (subscr == 1) {
>             df$s1[index] <- some_value
>         }
>     }
>
> The problem is, I do not want to create a bunch of if statements (one for
> each of the 11 column names). Instead, I want to create the column name at
> run time based on the subscr value. I tried
>
>     eval(as.name(paste("df$s", subscr, sep = "")))[index] <- some_value
>
> and it complains that object "df$s1" is not found. Could someone please
> help me with this? (Needless to say, I have just started programming in
> R.) Thanks, Shankar

Instead of the operator '$', use the function `[<-` with the right indexes:

    cname <- paste("s", subscr, sep = "")
    DF[index, cname] <- value

See ?"[<-.data.frame". (And df is the name of an R function; use something else, it can be confusing.)

Hope this helps,
Rui Barradas

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
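Rui's suggestion can be sketched end to end as follows. This is a minimal illustration, not from the thread itself: the data frame `DF` and the helper `set_cell` are invented names for demonstration.

```r
## Hypothetical data frame with columns s1, s2, s3 (invented for illustration)
DF <- data.frame(s1 = 1:3, s2 = 4:6, s3 = 7:9)

set_cell <- function(DF, subscr, index, value) {
    cname <- paste("s", subscr, sep = "")  # build the column name at run time
    DF[index, cname] <- value              # dispatches to "[<-.data.frame"
    DF                                     # return the modified copy
}

DF <- set_cell(DF, 2, 1, 99)
DF$s2[1]   # 99
```

Note that R's copy-on-modify semantics mean the function must return the modified data frame; assigning inside the function does not change the caller's copy.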
[Rd] Proposal: model.data
Greetings:

I'm still working on functions to make it easier for students to interpret regression predictions. I am working out a scheme to more easily create newdata objects (for use in predict functions). This has been done before in packages like Zelig, Effects, rms, and others. Nothing is exactly right, though. Stata users keep flashing their predicted probability tables at me and I'd like something like that to use with R.

I'm proposing here a function model.data that receives a regression and creates a data frame of raw data from which newdata objects can be constructed. This follows a suggestion that Bill Dunlap made to me in response to a question I posted in r-help.

While studying the termplot code, I saw the "carrier" function approach to deducing the raw predictors. However, it does not always work. Here is one problem: termplot mistakes the 10 in log(10 + x1) for a variable.

Example:

    dat <- data.frame(x1 = rnorm(100), x2 = rpois(100, lambda = 7))
    STDE <- 10
    dat$y <- 1.2 * log(10 + dat$x1) + 2.3 * dat$x2 + rnorm(100, sd = STDE)
    m1 <- lm(y ~ log(10 + x1) + x2, data = dat)
    termplot(m1)
    ## See the trouble? termplot thinks 10 is the term to plot.

Another problem is that predict(type = "terms") does not always behave sensibly. Terms on the RHS of the formula that involve nonlinear transformations are misunderstood as separate terms.

    ## Example:
    dat$y2 <- 1.2 * log(10 + dat$x1) + 2.3 * dat$x1^2 + rnorm(100, sd = STDE)
    m2 <- lm(y2 ~ log(10 + x1) + sin(x1), data = dat)
    summary(m2)
    predict(m2, type = "terms")
    ## Output:
    ##   log(10 + x1)     sin(x1)
    ## 1   1.50051781 -2.04871711
    ## 2  -0.14707391  0.31131124

What I wish would happen instead is one correct prediction for each value of x1. This should be the output:

    predict(m2, newdata = data.frame(x1 = dat$x1))
    ##        1        2        3        4        5        6        7        8
    ## 17.78563 18.49806 17.50719 19.70093 17.45071 19.69718 18.84137 18.89971

The fix I'm testing now is the following new function, model.data,
which tries to re-create the data object that would be consistent with a fitted model. This follows a suggestion from Bill Dunlap in r-help on 2012-04-22.

    ##' Creates a raw (UNTRANSFORMED) data frame equivalent
    ##' to the input data that would be required to fit the given model.
    ##'
    ##' Unlike model.frame and model.matrix, this does not return transformed
    ##' variables.
    ##'
    ##' @param model A fitted regression model in which the data argument
    ##' is specified. This function will fail if the model was not fit
    ##' with the data option.
    ##' @return A data frame
    ##' @export
    ##' @author Paul E. Johnson pauljohn@@ku.edu
    ##' @example inst/examples/model.data-ex.R
    model.data <- function(model){
        fmla <- formula(model)
        allnames <- all.vars(fmla) ## all variable names
        ## indep variables, includes d in poly(x, d)
        ivnames <- all.vars(formula(delete.response(terms(model))))
        ## datOrig: original data frame
        datOrig <- eval(model$call$data, environment(formula(model)))
        if (is.null(datOrig)) stop("model.data: input model has no data frame")
        ## dat: almost right, but includes d in poly(x, d)
        dat <- get_all_vars(fmla, datOrig)
        ## Get rid of d and other "non variable" variable names that are not in datOrig:
        keepnames <- intersect(names(dat), names(datOrig))
        ## Keep only rows actually used in the model fit, and the correct columns
        dat <- dat[row.names(model$model), keepnames]
        ## keep ivnames that exist in datOrig
        attr(dat, "ivnames") <- intersect(ivnames, names(datOrig))
        invisible(dat)
    }

This works for the test cases like log(10 + x) and so forth:

    ## Examples:
    head(m1.data <- model.data(m1))
    head(m2.data <- model.data(m2))

    ## head(m1.data <- model.data(m1))
    ##          y          x1 x2
    ## 1 18.53846  0.46176539  8
    ## 2 28.24759  0.09720934  7
    ## 3 23.88184  0.67602556  9
    ## 4 23.50130 -0.74877054  8
    ## 5 25.81714  1.02555255  5
    ## 6 24.75052 -0.69659539  6
    ## head(m2.data <- model.data(m2))
    ##          y          x1
    ## 1 18.53846  0.46176539
    ## 2 28.24759  0.09720934
    ## 3 23.88184  0.67602556
    ## 4 23.50130 -0.74877054
    ## 5 25.81714  1.02555255
    ## 6 24.75052 -0.69659539

    d <- 2
    m4 <- lm(y ~ poly(x1, d), data = dat)
    head(m4.data <- model.data(m4))
    ##          y          x1
    ## 1 18.53846  0.46176539
    ## 2 28.24759  0.09720934
    ## 3 23.88184  0.67602556

Another strength of this approach is that the return object has an attribute "ivnames". If R's termplot used model.data instead of the carrier functions, this would make for a much tighter set of code.

What flaws do you see in this? One flaw is that I did not know how to re-construct data from the parent environment, so I insist the regression model has to have a data argument. Is this necessary, or can one of the R experts help?

Another possible flaw: I'm keeping the columns from the data frame that are needed to re-construct the model.frame, and to match the rows, I'm using row.names for the model.frame. Are there other formulae that
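The key step in the proposal above is get_all_vars(). A small self-contained illustration of why it recovers the raw predictors where model.frame() does not (simulated data as in the original example, so fitted values vary, but the column names do not):

```r
## Simulated data, mirroring the m1 example in the proposal
dat <- data.frame(x1 = rnorm(100), x2 = rpois(100, lambda = 7))
dat$y <- 1.2 * log(10 + dat$x1) + 2.3 * dat$x2 + rnorm(100, sd = 10)
m1 <- lm(y ~ log(10 + x1) + x2, data = dat)

## model.frame() stores the *transformed* term log(10 + x1) ...
colnames(model.frame(m1))              # "y" "log(10 + x1)" "x2"

## ... while get_all_vars() returns the raw, untransformed variables:
colnames(get_all_vars(formula(m1), dat))   # "y" "x1" "x2"
```

This is the distinction that lets model.data() hand back a data frame from which sensible newdata objects can be built.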
Re: [Rd] Proposal: model.data
On Thu, 2012-05-03 at 10:51 -0500, Paul Johnson wrote:
> If somebody in R Core would like this and think about putting it, or
> something like it, into the base, then many chores involving predicted
> values would become much easier.

Why does this need to be in base? Implement it in a package. If it works, and is additive, people will use it. Look at 'reshape' or 'xts' or 'Matrix', just to name a few examples of widely used packages.

Regards,
- Brian
Re: [Rd] Proposal: model.data
Greetings:

On Thu, May 3, 2012 at 11:36 AM, Brian G. Peterson br...@braverock.com wrote:
> On Thu, 2012-05-03 at 10:51 -0500, Paul Johnson wrote:
> > If somebody in R Core would like this and think about putting it, or
> > something like it, into the base, then many chores involving predicted
> > values would become much easier.
>
> Why does this need to be in base? Implement it in a package. If it works,
> and is additive, people will use it. Look at 'reshape' or 'xts' or
> 'Matrix', just to name a few examples of widely used packages.

I can't use it to fix termplot unless it is in base. Or are you suggesting I create my own termplot replacement?

> Regards,
> - Brian

--
Paul E. Johnson
Professor, Political Science        Assoc. Director
1541 Lilac Lane, Room 504           Center for Research Methods
University of Kansas                University of Kansas
http://pj.freefaculty.org           http://quant.ku.edu
Re: [Rd] Proposal: model.data
On Thu, 2012-05-03 at 12:09 -0500, Paul Johnson wrote:
> On Thu, May 3, 2012 at 11:36 AM, Brian G. Peterson br...@braverock.com wrote:
> > Why does this need to be in base? Implement it in a package. If it
> > works, and is additive, people will use it. Look at 'reshape' or 'xts'
> > or 'Matrix', just to name a few examples of widely used packages.
>
> I can't use it to fix termplot unless it is in base. Or are you suggesting
> I create my own termplot replacement?

I was suggesting that you create a package that has all the features that you think it needs.

If you have a *patch* for termplot that would fix what you perceive to be its problems, and not break existing code, then the usual method would be to propose that. It seems, though, that you are proposing more significant changes to functionality, and it seems as though that would run a risk of breaking backwards compatibility, which is usually a bad idea.

Regards,
- Brian

--
Brian G. Peterson
http://braverock.com/brian/
Ph: 773-459-4973
IM: bgpbraverock
[Rd] loading multiple CSV files into a single data frame
Sometimes I have hundreds of CSV files scattered in a directory tree, resulting from experiments' executions. For instance, to give an example from my field, I may want to collect the performance of a processor for several design parameters such as cache size (possible values: 2, 4, 8 and 16) and cache associativity (possible values: direct-mapped, 4-way, fully-associative). The results of all these experiments will be stored in a directory tree like:

    results
    |-- direct-mapped
    |   |-- 2 -- data.csv
    |   |-- 4 -- data.csv
    |   |-- 8 -- data.csv
    |   |-- 16 -- data.csv
    |-- 4-way
    |   |-- 2 -- data.csv
    |   |-- 4 -- data.csv
    |   ...
    |-- fully-associative
    |   |-- 2 -- data.csv
    |   |-- 4 -- data.csv
    |   ...

I am developing a package that would allow me to gather all those CSV files into a single data frame. Currently, I just need to execute the following statement:

    dframe <- gather("results/@ASSOC@/@SIZE@/data.csv")

and this command returns a data frame containing the columns ASSOC, SIZE and all the remaining columns inside the CSV files (in my case the processor performance), effectively loading all the CSV files into a single data frame. So, I would get something like:

    ASSOC, SIZE, PERF
    direct-mapped, 2, 1.4
    direct-mapped, 4, 1.6
    direct-mapped, 8, 1.7
    direct-mapped, 16, 1.7
    4-way, 2, 1.4
    4-way, 4, 1.5
    ...

I would like to ask whether there is any similar functionality already implemented in R. If so, there is no need to reinvent the wheel :) If it is not implemented and the R community believes that this feature would be useful, I would be glad to contribute my code.

Thank you,
Victor

P.S.: I was not sure whether to submit this question to R-devel or R-help, but since it may lead to some programming discussion I decided to post it to R-devel. Please let me know if it is better to move it to the other list.
Re: [Rd] loading multiple CSV files into a single data frame
On Thu, May 3, 2012 at 2:07 PM, victor jimenez betaband...@gmail.com wrote:
> Sometimes I have hundreds of CSV files scattered in a directory tree,
> resulting from experiments' executions. [...]
> I would like to ask whether there is any similar functionality already
> implemented in R. If so, there is no need to reinvent the wheel :) If it is
> not implemented and the R community believes that this feature would be
> useful, I would be glad to contribute my code.

If your csv files all have the same columns and represent time series, then read.zoo in the zoo package can read multiple csv files in at once using a single read.zoo command, producing a single zoo object.

    library(zoo)
    ?read.zoo
    vignette("zoo-read")

Also see the other zoo vignettes and help files.

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [Rd] The constant part of the log-likelihood in StructTS
On Thu, May 3, 2012 at 3:36 AM, Mark Leeds marklee...@gmail.com wrote:
> Hi Ravi: As far as I know (well, really read), and Bert et al. can say
> more, the AIC is not dependent on the models being nested as long as the
> sample sizes used are the same when comparing. In some cases, say
> comparing MA(2) and AR(1), you have to be careful with sample size usage,
> but there is no nesting requirement for AIC at least, I'm pretty sure.

This is only partly true. The expected value of the AIC will behave correctly even if models are non-nested, but there is no general guarantee that the standard deviation is small, so AIC need not even asymptotically lead to optimal model choice for prediction in arbitrary non-nested models.

Having said that, 'nearly' nested models like these are probably ok. I believe it's sufficient that all your models are nested in a common model, with a bound on the degrees of freedom difference, but my copy of Claeskens & Hjort's book on model selection and model averaging is currently with a student, so I can't be definitive.

-thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland
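The sample-size caveat raised above can be made concrete with a small sketch: fit the two non-nested models mentioned (AR(1) and MA(2)) to the same simulated series, so both likelihoods condition on the same observations and the AIC values are comparable. The simulation itself is invented for illustration, not taken from the thread.

```r
## Invented example: one simulated series, two non-nested ARMA fits
set.seed(1)
y <- arima.sim(model = list(ar = 0.5), n = 200)

fit.ar1 <- arima(y, order = c(1, 0, 0))  # AR(1)
fit.ma2 <- arima(y, order = c(0, 0, 2))  # MA(2)

## Both fits use the full maximum-likelihood of all 200 observations,
## so these AIC values can be compared directly:
c(AR1 = AIC(fit.ar1), MA2 = AIC(fit.ma2))
```

Had one model been fit by conditional sum of squares on a shortened series (dropping initial observations), the likelihoods would cover different samples and the comparison would be invalid; that is the "careful with sample size usage" point.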
Re: [Rd] loading multiple CSV files into a single data frame
First of all, thank you for the answers. I did not know about zoo.

However, it seems that none of these approaches can do exactly what I want (please correct me if I am wrong). Probably it was not clear in my original question: the CSV files only contain the performance values. The other two columns (ASSOC and SIZE) are obtained from the values present in the directory tree. So, in my opinion, none of the proposed solutions would work unless every single data.csv file contained all three columns (ASSOC, SIZE and PERF).

In my case, my experimentation framework basically outputs a CSV with some values read from the processor's performance counters (PMCs). For each cache size and associativity I conduct an experiment, creating a CSV file and placing that file into its own directory. I could modify the experimentation framework so that it also outputs the cache size and associativity, but that may not be ideal in some circumstances, and I also have a significant amount of old results that I want to keep using without manually fixing the CSV files.

Has anyone else faced such a situation? Any good solutions?

Thank you,
Victor

On Thu, May 3, 2012 at 8:54 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:
> If your csv files all have the same columns and represent time series
> then read.zoo in the zoo package can read multiple csv files in at once
> using a single read.zoo command, producing a single zoo object. [...]
Re: [Rd] loading multiple CSV files into a single data frame
Victor,

I understand you as follows: the first two columns of the desired combined data frame are the last two levels of the pathname to the csv file, and the columns in all the data.csv files are the same; namely, there is only one column, and it is named PERF. If so, the following should work (on unix):

    do.call(rbind,
            lapply(Sys.glob('results/*/*/data.csv'),
                   function(path) {
                       within(read.csv(path), {
                           SIZE <- basename(dirname(path))
                           ASSOC <- basename(dirname(dirname(path)))
                       })
                   }))

On 5/3/12 4:40 PM, victor jimenez betaband...@gmail.com wrote:
> First of all, thank you for the answers. I did not know about zoo.
> However, it seems that none of the proposed solutions would work, unless
> every single data.csv file contained all three columns (ASSOC, SIZE and
> PERF). [...]
> Has anyone else faced such a situation? Any good solutions?
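For readers who find the one-liner above dense, here is the same technique unrolled step by step. The directory layout is mocked in a temporary directory so the sketch is self-contained; the helper name `read_one` is invented for illustration.

```r
## Mock one branch of the results tree in a temp dir (invented data)
root <- file.path(tempdir(), "results")
dir.create(file.path(root, "4-way", "2"), recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(PERF = 1.4),
          file.path(root, "4-way", "2", "data.csv"), row.names = FALSE)

## Read one file and attach the two path components as columns
read_one <- function(path) {
    d <- read.csv(path)
    d$SIZE  <- basename(dirname(path))            # e.g. "2"
    d$ASSOC <- basename(dirname(dirname(path)))   # e.g. "4-way"
    d
}

## Glob all data.csv files and stack them into one data frame
dframe <- do.call(rbind,
                  lapply(Sys.glob(file.path(root, "*", "*", "data.csv")),
                         read_one))
dframe
##   PERF SIZE ASSOC
## 1  1.4    2 4-way
```

The design choice here is that the metadata lives only in the pathname, so `basename(dirname(path))` peels off one directory level per call; this is exactly what the within()-based one-liner does in compressed form.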
Re: [Rd] loading multiple CSV files into a single data frame
On May 3, 2012, at 5:40 PM, victor jimenez wrote:
> First of all, thank you for the answers. I did not know about zoo.
> However, it seems that none of the proposed solutions would work, unless
> every single data.csv file contained all three columns (ASSOC, SIZE and
> PERF). [...]
> I could modify the experimentation framework, so that it also outputs the
> cache size and associativity, but that may not be ideal in some
> circumstances and I also have a significant amount of old results and I
> want to keep using them without manually fixing the CSV files.

You don't need to touch the CSV files, simply add the values at load time - this is all easily doable in one line ;)

    do.call(rbind, lapply(Sys.glob("*/*/data.csv"), function(d)
        cbind(read.csv(d), as.data.frame(t(strsplit(d, "/")[[1]])))))

      A B V1 V2       V3
    1 1 2  1  a data.csv
    2 3 4  1  a data.csv
    3 1 2  1  b data.csv
    4 3 4  1  b data.csv
    5 1 2  2  a data.csv
    6 3 4  2  a data.csv

> Has anyone else faced such a situation? Any good solutions?
> Thank you,
> Victor
[Rd] Setting up a windows system for rcpp
I am running into a wall getting my system to work with Rcpp and inline. Following Dirk's advice on stackoverflow, I hope someone is able to help me. My steps were to install MinGW 32-bit first, then to install Rtools; I disabled MinGW's entry in the PATH.

I am trying to get the following code to work:

    library(Rcpp)
    library(inline)

    body <- '
        NumericVector xx(x);
        return wrap( std::accumulate( xx.begin(), xx.end(), 0.0));'

    add <- cxxfunction(signature(x = "numeric"), body, plugin = "Rcpp",
                       verbose = TRUE)

    x <- 1
    y <- 2
    res <- add(c(x, y))
    res

I get the following error messages:

    setting environment variables:
    PKG_LIBS = C:/Users/Owe/Documents/R/win-library/2.15/Rcpp/lib/x64/libRcpp.a

    LinkingTo : Rcpp
    CLINK_CPPFLAGS = -IC:/Users/Owe/Documents/R/win-library/2.15/Rcpp/include

    Program source :

     1 :
     2 : // includes from the plugin
     3 :
     4 : #include <Rcpp.h>
     5 :
     6 :
     7 : #ifndef BEGIN_RCPP
     8 : #define BEGIN_RCPP
     9 : #endif
    10 :
    11 : #ifndef END_RCPP
    12 : #define END_RCPP
    13 : #endif
    14 :
    15 : using namespace Rcpp;
    16 :
    17 :
    18 : // user includes
    19 :
    20 :
    21 : // declarations
    22 : extern "C" {
    23 : SEXP file10bc7da0783e( SEXP x) ;
    24 : }
    25 :
    26 : // definition
    27 :
    28 : SEXP file10bc7da0783e( SEXP x ){
    29 : BEGIN_RCPP
    30 :
    31 : NumericVector xx(x);
    32 : return wrap( std::accumulate( xx.begin(), xx.end(), 0.0));
    33 : END_RCPP
    34 : }
    35 :
    36 :

    Compilation argument:
    C:/R_curr/R_2_15_0/bin/x64/R CMD SHLIB file10bc7da0783e.cpp 2> file10bc7da0783e.cpp.err.txt
    g++ -m64 -IC:/R_curr/R_2_15_0/include -DNDEBUG -IC:/Users/Owe/Documents/R/win-library/2.15/Rcpp/include -Id:/RCompile/CRANpkg/extralibs64/local/include -O2 -Wall -mtune=core2 -c file10bc7da0783e.cpp -o file10bc7da0783e.o
    g++ -m64 -shared -s -static-libgcc -o file10bc7da0783e.dll tmp.def file10bc7da0783e.o C:/Users/Owe/Documents/R/win-library/2.15/Rcpp/lib/x64/libRcpp.a -Ld:/RCompile/CRANpkg/extralibs64/local/lib/x64 -Ld:/RCompile/CRANpkg/extralibs64/local/lib -LC:/R_curr/R_2_15_0/bin/x64 -lR
    cygwin warning:
      MS-DOS style path detected: C:/R_curr/R_2_15_0/etc/x64/Makeconf
      Preferred POSIX equivalent is: /cygdrive/c/R_curr/R_2_15_0/etc/x64/Makeconf
      CYGWIN environment variable option "nodosfilewarning" turns off this warning.
      Consult the user's guide for more details about POSIX paths:
        http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
    Cannot export Rcpp::Vector<14>::update(): symbol not defined
    Cannot export Rcpp::Vector<14>::~Vector(): symbol not defined
    Cannot export Rcpp::Vector<14>::~Vector(): symbol not defined
    Cannot export typeinfo for Rcpp::VectorBase<14, true, Rcpp::Vector<14> >: symbol not defined
    Cannot export typeinfo for Rcpp::Vector<14>: symbol not defined
    Cannot export typeinfo for Rcpp::traits::expands_to_logical__impl<14>: symbol not defined
    Cannot export typeinfo for Rcpp::RObject: symbol not defined
    Cannot export typeinfo for Rcpp::internal::eval_methods<14>: symbol not defined
    Cannot export typeinfo for std::exception: symbol not defined
    Cannot export typeinfo name for Rcpp::VectorBase<14, true, Rcpp::Vector<14> >: symbol not defined
    Cannot export typeinfo name for Rcpp::Vector<14>: symbol not defined
    Cannot export typeinfo name for Rcpp::traits::expands_to_logical__impl<14>: symbol not defined
    Cannot export typeinfo name for Rcpp::RObject: symbol not defined
    Cannot export typeinfo name for Rcpp::internal::eval_methods<14>: symbol not defined
    Cannot export typeinfo name for std::exception: symbol not defined
    Cannot export vtable for Rcpp::Vector<14>: symbol not defined
    Cannot export _file10bc7da0783e: symbol not defined
    file10bc7da0783e.o:file10bc7da0783e.cpp:(.text+0x1a4): undefined reference to `SEXPREC* Rcpp::internal::r_true_cast<14>(SEXPREC*)'
    file10bc7da0783e.o:file10bc7da0783e.cpp:(.text+0x1c9): undefined reference to `Rcpp::RObject::setSEXP(SEXPREC*)'
    file10bc7da0783e.o:file10bc7da0783e.cpp:(.text+0x244): undefined reference to `double* Rcpp::internal::r_vector_start<14, double>(SEXPREC*)'
    file10bc7da0783e.o:file10bc7da0783e.cpp:(.text+0x27c): undefined reference to `Rcpp::RObject::~RObject()'
    file10bc7da0783e.o:file10bc7da0783e.cpp:(.text+0x389): undefined reference to `Rcpp::RObject::~RObject()'
    file10bc7da0783e.o:file10bc7da0783e.cpp:(.text+0x420): undefined reference to `forward_exception_to_r(std::exception const&)'
    file10bc7da0783e.o:file10bc7da0783e.cpp:(.text$_ZN4Rcpp6VectorILi14EED1Ev[Rcpp::Vector<14>::~Vector()]+0x13): undefined reference to `Rcpp::RObject::~RObject()'
    file10bc7da0783e.o:file10bc7da0783e.cpp:(.text$_ZN4Rcpp6VectorILi14EE6updateEv[Rcpp::Vector<14>::update()]+0xd): undefined reference to `double* Rcpp::internal::r_vector_start<14, double>(SEXPREC*)'
    file10bc7da0783e.o:file10bc7da0783e.cpp:(.text$_ZN4Rcpp6VectorILi14EED0Ev[Rcpp::Vector<14>::~Vector()]+0x13): undefined reference to `Rcpp::RObject::~RObject()'
    collect2: ld returned 1 exit status
    ERROR(s) during compilation:
Re: [Rd] Setting up a windows system for rcpp
On 4 May 2012 at 00:07, Owe Jessen wrote: | I am running into a wall getting my system to work with rcpp and inline. | Following Dirk's advice on stackoverflow, I hope someone is able to help | me. There is a dedicated mailing list for Rcpp: rcpp-devel. Please let us try to continue the discussion over there. Subscription is required as on some other R lists, so please subscribe before posting. In general, you need Rtools correctly set up. If and when you compile a basic R package (also containing C or C++ files) from sources, you should be fine. A decent 60+ page tutorial is available at: http://howtomakeanrpackage.pbworks.com/f/How_To_Make_An_R_Package-v1.14-01-11-10.pdf Once you have that sorted out, working with Rcpp and inline should just work as it does on other operating systems. | My steps were to install MinGW 32 bit first, then installing Rtools, I | disabled MinGW's entry in the PATH. What do you mean by MinGW's path entry disabled ? You need mingw. | I am trying to get the following code to work: | | library(Rcpp) | library(inline) | | body - ' | NumericVector xx(x); | return wrap( std::accumulate( xx.begin(), xx.end(), 0.0));' | | add - cxxfunction(signature(x = numeric), body, plugin = Rcpp, | verbose=T) | | x - 1 | y - 2 | res - add(c(x, y)) | res | | | I get the following error messages: | | setting environment variables: | PKG_LIBS = C:/Users/Owe/Documents/R/win-library/2.15/Rcpp/lib/x64/libRcpp.a | | LinkingTo : Rcpp | CLINK_CPPFLAGS = -IC:/Users/Owe/Documents/R/win-library/2.15/Rcpp/include | | Program source : | | 1 : | 2 : // includes from the plugin | 3 : | 4 : #includeRcpp.h | 5 : | 6 : | 7 : #ifndef BEGIN_RCPP | 8 : #define BEGIN_RCPP | 9 : #endif |10 : |11 : #ifndef END_RCPP |12 : #define END_RCPP |13 : #endif |14 : |15 : using namespace Rcpp; |16 : |17 : |18 : // user includes |19 : |20 : |21 : // declarations |22 : extern C { |23 : SEXP file10bc7da0783e( SEXP x) ; |24 : } |25 : |26 : // definition |27 : |28 : SEXP file10bc7da0783e( SEXP x 
){ |29 : BEGIN_RCPP |30 : |31 : NumericVector xx(x); |32 : return wrap( std::accumulate( xx.begin(), xx.end(), 0.0)); |33 : END_RCPP |34 : } |35 : |36 : | Compilation argument: | C:/R_curr/R_2_15_0/bin/x64/R CMD SHLIB file10bc7da0783e.cpp 2> file10bc7da0783e.cpp.err.txt | g++ -m64 -IC:/R_curr/R_2_15_0/include -DNDEBUG -IC:/Users/Owe/Documents/R/win-library/2.15/Rcpp/include -Id:/RCompile/CRANpkg/extralibs64/local/include -O2 -Wall -mtune=core2 -c file10bc7da0783e.cpp -o file10bc7da0783e.o Looks like compilation worked. | g++ -m64 -shared -s -static-libgcc -o file10bc7da0783e.dll tmp.def file10bc7da0783e.o C:/Users/Owe/Documents/R/win-library/2.15/Rcpp/lib/x64/libRcpp.a -Ld:/RCompile/CRANpkg/extralibs64/local/lib/x64 -Ld:/RCompile/CRANpkg/extralibs64/local/lib -LC:/R_curr/R_2_15_0/bin/x64 -lR | cygwin warning: |MS-DOS style path detected: C:/R_curr/R_2_15_0/etc/x64/Makeconf |Preferred POSIX equivalent is: /cygdrive/c/R_curr/R_2_15_0/etc/x64/Makeconf |CYGWIN environment variable option nodosfilewarning turns off this warning. |Consult the user's guide for more details about POSIX paths: | http://cygwin.com/cygwin-ug-net/using.html#using-pathnames That is just noise and can be ignored.
The rest is bad: | Cannot export Rcpp::Vector<14>::update(): symbol not defined | Cannot export Rcpp::Vector<14>::~Vector(): symbol not defined | Cannot export Rcpp::Vector<14>::~Vector(): symbol not defined | Cannot export typeinfo for Rcpp::VectorBase<14, true, Rcpp::Vector<14> >: symbol not defined | Cannot export typeinfo for Rcpp::Vector<14>: symbol not defined | Cannot export typeinfo for Rcpp::traits::expands_to_logical__impl<14>: symbol not defined | Cannot export typeinfo for Rcpp::RObject: symbol not defined | Cannot export typeinfo for Rcpp::internal::eval_methods<14>: symbol not defined | Cannot export typeinfo for std::exception: symbol not defined | Cannot export typeinfo name for Rcpp::VectorBase<14, true, Rcpp::Vector<14> >: symbol not defined | Cannot export typeinfo name for Rcpp::Vector<14>: symbol not defined | Cannot export typeinfo name for Rcpp::traits::expands_to_logical__impl<14>: symbol not defined | Cannot export typeinfo name for Rcpp::RObject: symbol not defined | Cannot export typeinfo name for Rcpp::internal::eval_methods<14>: symbol not defined | Cannot export typeinfo name for std::exception: symbol not defined | Cannot export vtable for Rcpp::Vector<14>: symbol not defined | Cannot export _file10bc7da0783e: symbol not defined | file10bc7da0783e.o:file10bc7da0783e.cpp:(.text+0x1a4): undefined reference to `SEXPREC* Rcpp::internal::r_true_cast<14>(SEXPREC*)' | file10bc7da0783e.o:file10bc7da0783e.cpp:(.text+0x1c9): undefined reference to `Rcpp::RObject::setSEXP(SEXPREC*)' |
Re: [Rd] The constant part of the log-likelihood in StructTS
Thanks, Tom, for the reply as well as for the reference to Claeskens & Hjort. Ravi From: Thomas Lumley [tlum...@uw.edu] Sent: Thursday, May 03, 2012 4:41 PM To: Mark Leeds Cc: Ravi Varadhan; r-devel@r-project.org Subject: Re: [Rd] The constant part of the log-likelihood in StructTS On Thu, May 3, 2012 at 3:36 AM, Mark Leeds marklee...@gmail.com wrote: Hi Ravi: As far as I know (well, really, have read), and Bert et al. can say more, the AIC does not depend on the models being nested as long as the sample sizes used are the same when comparing. In some cases, say comparing MA(2) and AR(1), you have to be careful with sample size usage, but there is no nesting requirement for AIC at least, I'm pretty sure. This is only partly true. The expected value of the AIC will behave correctly even if models are non-nested, but there is no general guarantee that the standard deviation is small, so AIC need not even asymptotically lead to optimal model choice for prediction in arbitrary non-nested models. Having said that, 'nearly' nested models like these are probably ok. I believe it's sufficient that all your models are nested in a common model, with a bound on the degrees-of-freedom difference, but my copy of Claeskens & Hjort's book on model selection and model averaging is currently with a student, so I can't be definitive. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] fast version of split.data.frame or conversion from data.frame to list of its rows
A bit late and possibly tangential. The mmap package has something called struct(), which is really a row-wise array of heterogeneous columns. As Simon and others have pointed out, R has no way to handle this natively, but mmap does provide a very measurable performance gain by orienting rows together in memory (mapped memory, to be specific). Since it is all outside of R, so to speak, mmap even supports many non-native types, from bit vectors to 64-bit ints, with conversion caveats applicable. example(struct) shows some performance gains with this approach. There are even some crude methods to convert data.frames as-is to mmap struct objects directly (hint: as.mmap). Again, likely not enough to shoehorn into your effort, but worth a look to see if it might be useful, and/or to see the C design underlying it. Best, Jeff Jeffrey Ryan | Founder | jeffrey.r...@lemnica.com www.lemnica.com On May 1, 2012, at 1:44 PM, Antonio Piccolboni anto...@piccolboni.info wrote: On Tue, May 1, 2012 at 11:29 AM, Simon Urbanek simon.urba...@r-project.org wrote: On May 1, 2012, at 1:26 PM, Antonio Piccolboni anto...@piccolboni.info wrote: It seems like people need to hear more context; happy to provide it. I am implementing a serialization format (typedbytes, HADOOP-1722 if people want the gory details) to make R and Hadoop interoperate better (RHadoop project, package rmr). It is a row-first format and it's already implemented as a C extension for R for lists and atomic vectors, where each element of a vector is a row. I need to extend it to accept data frames, and I was wondering if I can reuse the existing C code by converting a data frame to a list of its rows. It sounds like the answer is that it is not a good idea. Just think about it -- data frames are lists of *columns* because the type of each column is fixed. Treating them row-wise is extremely inefficient, because you can't use any vector type to represent such a thing (other than a generic vector containing vectors of length 1).
Thanks, let's say this, together with the experiments and other converging opinions, lays the question to rest. That's helpful too, in a way, because it restricts the options. I thought I might be missing a simple primitive, like a t() for data frames (one that doesn't coerce to matrix). See above - I think you are misunderstanding data frames - t() makes no sense for data frames. I think you are misunderstanding my use of t(). Thanks Antonio Cheers, Simon On Tue, May 1, 2012 at 5:46 AM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote: On 01/05/2012 00:28, Antonio Piccolboni wrote: Hi, I was wondering if there is anything more efficient than split to do the kind of conversion in the subject. If I create a data frame as in system.time({fd = data.frame(x = 1:2000, y = rnorm(2000), id = paste("x", 1:2000, sep = ""))}) user system elapsed 0.004 0.000 0.004 and then I try to split it system.time(split(fd, 1:nrow(fd))) user system elapsed 0.333 0.031 0.415 You will be quick to notice the roughly two orders of magnitude difference in time between creation and conversion. Granted, it's not written anywhere Unsurprising when you create three orders of magnitude more data frames, is it? That's a list of 2000 data frames. Try system.time(for(i in 1:2000) data.frame(x = i, y = rnorm(1), id = paste0("x", i))) that they should be similar, but the latter seems interpreter-slow to me (split is implemented with an lapply in the data frame case). There is also a memory issue when I hit about 2 elements (allocating 3GB when interrupted). So before I resort to Rcpp, despite the electrifying feeling of approaching the bare metal, and for the sake of getting things done, I thought I would ask the experts. Thanks You need to re-think your data structures: 1-row data frames are not sensible. Antonio [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Brian D.
Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel