Re: [R] How to apply a function to subsets of a data frame *and* obtain a data frame again?
You might want to look at package plyr and use ddply.

HTH,

Nick Sabbe
--
ping: nick.sa...@ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36
-- Do Not Disapprove

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Marius Hofert
Sent: Wednesday 17 August 2011 12:42
To: Help R
Subject: [R] How to apply a function to subsets of a data frame *and* obtain a data frame again?

Dear all,

First, let's create some data to play around:

set.seed(1)
(df <- data.frame(Group=rep(c("Group1","Group2","Group3"), each=10),
                  Value=c(rexp(10, 1), rexp(10, 4), rexp(10, 10)))[sample(1:30,30),])

## Now we need the empirical distribution function:
edf <- function(x) ecdf(x)(x) # empirical distribution function evaluated at x

## The big question is how one can apply the empirical distribution function to
## each subset of df determined by Group, so how to apply it to Group1, then
## to Group2, and finally to Group3. You might suggest (?) to use tapply:
(edf. <- tapply(df$Value, df$Group, FUN=edf))

## That's correct. But typically, one would like to obtain not only the values,
## but a data.frame containing the original information and the new (edf-)values.
## What's a simple way to get this? (one would be required to first sort df
## according to Group, then paste the values computed by edf to the sorted df;
## seems a bit tedious).
## A solution I have is the following (but I would like to know if there is a
## simpler one):
(edf.. <- do.call(rbind, lapply(unique(df$Group), function(strg){
    subdata <- subset(df, Group==strg) # sub-data
    subdata <- cbind(subdata, edf=edf(subdata$Value))
})) )

Cheers,
Marius

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
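For the question in this thread, one base-R possibility (an alternative to the plyr route suggested in the reply) is ave(), which applies a function within groups and hands the results back in the original row order. A minimal sketch, reusing the poster's data and edf(); the new column name edf is only an illustrative choice:

```r
set.seed(1)
df <- data.frame(Group = rep(c("Group1", "Group2", "Group3"), each = 10),
                 Value = c(rexp(10, 1), rexp(10, 4), rexp(10, 10)))[sample(1:30, 30), ]

# empirical distribution function evaluated at its own sample points
edf <- function(x) ecdf(x)(x)

# ave() splits Value by Group, applies edf to each piece and puts the
# results back in the original row order, so no sorting or rbind is needed
df$edf <- ave(df$Value, df$Group, FUN = edf)
head(df)
```

With plyr, something along the lines of ddply(df, "Group", transform, edf = edf(Value)) should give a comparable data frame, grouped by Group.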
Re: [R] glmnet
Hi Andra.

I wonder how you came to be using LASSO without knowing what lambda is. I'd advise you to read up on it. In the help (?glmnet) you can find several paper references, but for a gentler introduction, you can read http://www-stat.stanford.edu/~tibs/ElemStatLearn/

In a nutshell, though: lambda is the parameter that sets the weight given to the penalty. The bigger it is, the more 'pressure' there is on the coefficients to be small (or better yet: to disappear).

The way you use LASSO is: you look at a reasonable set of lambda values (this is e.g. done by glmnet) and calculate some measure of success for each lambda value (e.g. misclassification, AUC, ...), generally by using cross-validation (as provided by cv.glmnet: read its help). Having this measure of success (say the AUC) for each lambda in your reasonable set allows you to pick the best one (lambda.min) or, to avoid happenstance peaks, a more conservative and parsimonious one (lambda.1se). After that you can rerun your lasso with the selected lambda on the full dataset, to find the variables in your model. Finally, to avoid downward bias, you could run a normal glm with only the variables selected in the previous step.

Good luck!

Nick Sabbe
--
ping: nick.sa...@ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36
-- Do Not Disapprove

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Andra Isan
Sent: Wednesday 10 August 2011 5:59
To: r-help@r-project.org
Subject: [R] glmnet

Hi All,

I have been trying to use the glmnet package to do LASSO linear regression. My x data is a matrix of n_row by n_col and y is a vector of size n_row corresponding to the rows of x. n_col is much larger than n_row. I do the following:

fits = glmnet(x, y, family="multinomial")

I have been following this article: http://cran.r-project.org/web/packages/glmnet/glmnet.pdf page 8, but there are some unclear parts that I don't understand. The lambda variable only returns 100 and I don't know exactly what lambda represents. So, basically, I would like to know how to get the coefficient weights and what exactly lambda is. How can I see the difference between predicted values and observed values? If there is sample code that helps me to understand how to use these, that would be great.

Thanks a lot,
Andra
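The workflow described in the reply can be sketched roughly as follows. This is only an illustration on simulated data, assuming a continuous response (family = "gaussian" rather than the "multinomial" used in the question); the data, the seed and the variable names are made up, while cv.glmnet(), coef() and predict() with s = "lambda.1se" are the calls documented by glmnet:

```r
library(glmnet)

set.seed(42)
n <- 50; p <- 200                        # many more columns than rows
x <- matrix(rnorm(n * p), n, p)
y <- 2 * x[, 1] - 3 * x[, 2] + rnorm(n)  # only the first two columns matter

# cross-validation over glmnet's own lambda sequence (alpha = 1 is the lasso)
cvfit <- cv.glmnet(x, y, family = "gaussian", alpha = 1)
cvfit$lambda.min   # lambda with the best cross-validated error
cvfit$lambda.1se   # larger, more conservative lambda

# coefficients at the chosen lambda; most of them are exactly zero
b <- as.matrix(coef(cvfit, s = "lambda.1se"))
selected <- setdiff(rownames(b)[b[, 1] != 0], "(Intercept)")
selected

# predicted versus observed values
pred <- predict(cvfit, newx = x, s = "lambda.1se")
head(cbind(observed = y, predicted = as.vector(pred)))
```

The 100 values the poster saw are most likely the default length of the lambda sequence that glmnet fits, one model per value.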
Re: [R] counting columns that fulfill specific criteria
Hello Paul.

You could try something like

perc <- apply(pwdiff, 1, function(currow){
    mean(abs(currow) > t, na.rm=TRUE)*100
})

I haven't tested this, as you did not provide a sample pwdiff. You should probably check ?apply for more info.

Two suggestions: first, it is probably best not to name any variable t, as this is also the function for transposing a matrix, which could end up being confusing at the least. Second: for most practical purposes, it's better to leave out the *100.

Good luck,

Nick Sabbe
--
ping: nick.sa...@ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36
-- Do Not Disapprove

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of pguilha
Sent: Friday 24 June 2011 13:15
To: r-help@r-project.org
Subject: [R] counting columns that fulfill specific criteria

Hi,

I have a matrix (pwdiff in the example below) with ~48 rows and 780 columns. For each row, I want to get the percentage of columns that have an absolute value above a certain threshold t. I then want to allocate that percentage to matrix 'perc' in the corresponding row. Below is my attempt at doing this, but it does not work: I get 'replacement has length zero'. Any help would be much appreciated!!

perc<-matrix(c(1:nrow(pwdiff)))
for (x in 1:nrow(pwdiff))
    perc[x]<-(((ncol(pwdiff[,abs(pwdiff[x,]>=t)]))/ncol(pwdiff))*100)

I should add that my data has NAs in some rows and not others (but I do not want to just ignore rows that have NAs).

Thanks!
Paul

--
View this message in context: http://r.789695.n4.nabble.com/counting-columns-that-fulfill-specific-criteria-tp3622265p3622265.html
Sent from the R help mailing list archive at Nabble.com.
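Since the poster supplied no data, here is a small self-contained sketch of the same idea on a made-up matrix. The names thr, frac and frac2 and the simulated pwdiff are assumptions for illustration; rowMeans() is a vectorised base-R alternative to the apply() call in the reply:

```r
set.seed(1)
pwdiff <- matrix(rnorm(48 * 780), nrow = 48)   # made-up stand-in for the poster's matrix
pwdiff[sample(length(pwdiff), 500)] <- NA      # sprinkle in some NAs
thr <- 1.5                                     # threshold, deliberately not called t

# abs(pwdiff) > thr is a logical matrix; the row mean of a logical is the
# fraction of TRUEs, and na.rm = TRUE leaves the NAs out of the denominator
frac <- rowMeans(abs(pwdiff) > thr, na.rm = TRUE)

# the apply() form gives the same numbers
frac2 <- apply(pwdiff, 1, function(currow) mean(abs(currow) > thr, na.rm = TRUE))
all.equal(frac, frac2)
```

Note that with na.rm = TRUE the fraction is taken over the non-missing entries of each row, not over all 780 columns; divide sums by ncol(pwdiff) instead if the latter is wanted.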
Re: [R] problem (and solution) to rle on vector with NA values
Hello Cormac.

I have not thoroughly checked whether your code actually works, but the behavior of rle you describe is the documented one (check the details of ?rle) and it makes sense, as the missingness could have different reasons. As such, changing this behavior would probably break a lot of existing code that is built on top of rle.

There are other peculiarities and debatable choices in some base R functions (the order of the arguments for sample trips me up every time), but unless the argument is really strong or it is a downright bug, I doubt people will be willing to change this. Perhaps making the new behavior optional (through a new parameter na.action or similar, with the original behavior as the default) is an option?

Feel free to run your own version of rle in any case. I suggest you rename it, though, as masking the base function may cause problems for some packages.

Nick Sabbe
--
ping: nick.sa...@ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36
-- Do Not Disapprove

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Cormac Long
Sent: Thursday 23 June 2011 15:44
To: r-help@r-project.org
Subject: [R] problem (and solution) to rle on vector with NA values

Hello there R-help,

I'm not sure if this should be posted here - so apologies if this is the case. I've found a problem while using rle and am proposing a solution to the issue.

Description:
I ran into a niggle with rle today when working with vectors with NA values (using R 2.31.0 on Windows 7 x64). It transpires that a run of NA values is not encoded in the same way as a run of other values. See the following example as an illustration:

Example:
The example
    rv<-c(1,1,NA,NA,3,3,3);rle(rv)
Returns
    Run Length Encoding
      lengths: int [1:4] 2 1 1 3
      values : num [1:4] 1 NA NA 3
not
    Run Length Encoding
      lengths: int [1:3] 2 2 3
      values : num [1:3] 1 NA 3
as I expected. This caused my code to fail later (unsurprising).

Analysis:
The problem stems from the test
    y <- x[-1L] != x[-n]
in line 7 of the rle function body. In this test, NA values return logical NA values, not TRUE/FALSE (again, unsurprising).

Resolution:
I modified the rle function code as included below. As far as I tested, this modification appears safe. The convoluted construction of naMaskVal should guarantee that the NA masking value is always different from any value in the vector and should be safe regardless of the input vector form (a raw vector is not handled since the NA values do not apply here).

rle<-function (x)
{
    if (!is.vector(x) && !is.list(x))
        stop("'x' must be an atomic vector")
    n <- length(x)
    if (n == 0L)
        return(structure(list(lengths = integer(), values = x),
            class = "rle"))

    # BEGIN NEW SECTION PART 1
    naRepFlag<-F
    IS_LOGIC<-F   # set here as well, so the test in part 2 also works when x has no NA
    if(any(is.na(x))){
        naRepFlag<-T
        IS_LOGIC<-ifelse(typeof(x)=="logical",T,F)

        if(typeof(x)=="logical"){
            x<-as.integer(x)
            naMaskVal<-2
        }else if(typeof(x)=="character"){
            naMaskVal<-paste(sample(c(letters,LETTERS,0:9),32,replace=T),collapse="")
        }else{
            naMaskVal<-max(0,abs(x[!is.infinite(x)]),na.rm=T)+1
        }

        x[which(is.na(x))]<-naMaskVal
    }
    # END NEW SECTION PART 1

    y <- x[-1L] != x[-n]
    i <- c(which(y), n)

    # BEGIN NEW SECTION PART 2
    if(naRepFlag)
        x[which(x==naMaskVal)]<-NA

    if(IS_LOGIC)
        x<-as.logical(x)
    # END NEW SECTION PART 2

    structure(list(lengths = diff(c(0L, i)), values = x[i]),
        class = "rle")
}

Conclusion:
I think that the proposed code modification is an improvement on the existing implementation of rle. Is it impertinent to suggest this R-modification to the gurus at R?

Best wishes (in flame-war trepidation),
Dr. Cormac Long.
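In the spirit of the suggestion to keep a separately named function, here is one possible sketch that avoids a masking value altogether by comparing the NA status of neighbours directly. The name rle_na is hypothetical, and this is neither the poster's code nor anything in base R; note that it also lumps NaN together with NA, since is.na() is TRUE for both:

```r
rle_na <- function(x) {
  if (!is.vector(x) && !is.list(x))
    stop("'x' must be an atomic vector")
  n <- length(x)
  if (n == 0L)
    return(structure(list(lengths = integer(), values = x), class = "rle"))
  na <- is.na(x)
  # a run ends where the NA status changes, or where two non-NA neighbours differ;
  # FALSE & NA is FALSE, so the comparison never leaks an NA into brk
  brk <- (na[-1L] != na[-n]) | (!na[-1L] & !na[-n] & x[-1L] != x[-n])
  i <- c(which(brk), n)
  structure(list(lengths = diff(c(0L, i)), values = x[i]), class = "rle")
}

rv <- c(1, 1, NA, NA, 3, 3, 3)
r <- rle_na(rv)
r
inverse.rle(r)   # reconstructs rv
```

Because the result keeps class "rle", the existing print and inverse.rle methods work on it unchanged.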
Re: [R] Rcpp and Object Factories
You might want to send this message to the Rcpp mailing list at:

Rcpp-devel mailing list
rcpp-de...@lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel

It will improve your chances of getting a swift (if not helpful) reply.

Nick Sabbe
--
ping: nick.sa...@ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36
-- Do Not Disapprove

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Michael King
Sent: Thursday 9 June 2011 21:03
To: r-help@r-project.org
Subject: [R] Rcpp and Object Factories

Hello,

I'm not exactly sure how to ask this question, but let me give it a shot... Is it possible (easy) to use Rcpp Modules in conjunction with object factories? For example what I am trying to do is something like this:

// c++ classes
class Foo {
public:
    void do_something() {};
};

class Foo_Factory {
public:
    Foo * create_foo() { return new Foo(); }
};

## R Code
library(Rcpp)
ff <- Module("Foo_Factory")
foo <- ff$create_foo()
foo$do_something()

It appears after scouring some message boards that it is doable via boost python, but I'm not literate enough yet about how this works to know if the same logic holds for R.

Thanks for the help.
-Mike King
Re: [R] Simple order() data frame question.
Try

(df1[order(-df1[,2]),])

Putting the minus inside the [, as in df1[,-2], leaves out that column (in this case column 2) instead of negating its values. See ?"[".

HTH.

Nick Sabbe
--
ping: nick.sa...@ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36
-- Do Not Disapprove

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of John Kane
Sent: Thursday 12 May 2011 14:33
To: R R-help
Subject: [R] Simple order() data frame question.

Clearly, I don't understand what order() is doing and as usual the help for order seems to only confuse me more. For some reason I just don't follow the examples there. I must be missing something about the data frame sort there but what?

I originally wanted to reverse-order my data frame df1 (see below) by aa (a factor) but since this was not working I decided to simplify and order by bb to see what was happening!!

I'm obviously doing something stupid but what?

(df1 <- data.frame(aa=letters[1:10], bb=rnorm(10)))

# Order in ascending order by bb
(df1[order(df1[,2]),] ) # seems to work fine

# Order in descending order by bb.
(df1[order(df1[,-2]),]) # does not seem to work

===
sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C LC_TIME=English_Canada.1252

attached base packages:
[1] grid grDevices datasets splines graphics stats tcltk utils methods base

other attached packages:
[1] ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4 plyr_1.5.2 svSocket_0.9-51 TinnR_1.0.3 R2HTML_2.2
[8] Hmisc_3.8-3 survival_2.36-9

loaded via a namespace (and not attached):
[1] cluster_1.13.3 lattice_0.19-26 svMisc_0.9-61 tools_2.13.0
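A short sketch of both cases from this thread, the numeric column and the factor the poster originally wanted to sort on. The data are regenerated here with an arbitrary seed, and the names by_bb, by_aa and by_aa2 are only illustrative:

```r
set.seed(1)
df1 <- data.frame(aa = factor(letters[1:10]), bb = rnorm(10))

# descending by the numeric column: negate the values, not the column index
by_bb <- df1[order(-df1$bb), ]

# a factor cannot be negated directly; use decreasing = TRUE,
# or negate its integer sort key obtained with xtfrm()
by_aa  <- df1[order(df1$aa, decreasing = TRUE), ]
by_aa2 <- df1[order(-xtfrm(df1$aa)), ]

by_bb
by_aa
```

The xtfrm() form is handy when several sort keys with mixed directions are passed to a single order() call.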
Re: [R] Looping over graphs in igraph
Hi Danielle.

You appear to have two problems:

1) getting the data into R

Because I don't have the file at hand, I'm going to simulate reading it through a text connection:

orgdata<-textConnection("Graph ID | Vertex1 | Vertex2 | weight\n1 | Alice | Bob | 2\n1 | Alice | Chris | 1\n1 | Alice | Jane | 2\n1 | Bob | Jane | 2\n1 | Chris | Jane | 3\n2 | Alice | Tom | 2\n2 | Alice | Kate | 1\n2 | Kate | Tom | 3\n2 | Tom | Mike | 2")
dfr <-read.table(orgdata, header=TRUE, sep="|", as.is=TRUE, strip.white=TRUE)

For you, this would probably be more like

dfr <-read.table("somepath/fileOfInterest.csv", header=TRUE, sep="|", as.is=TRUE, strip.white=TRUE)

2) performing actions per graph id

require(igraph)
result<-sapply(unique(dfr$Graph.ID), function(curID){
    #There may be more elegant ways of creating the graphs per ID, but it works
    curDfr<- dfr[dfr$Graph.ID==curID,]
    g<-graph.edgelist(as.matrix(curDfr[,c("Vertex1", "Vertex2")]))
    g<-set.edge.attribute(g, "weight", value= curDfr$weight)
    #return whatever information you're interested in, based on graph object g
    #for now I'm just returning edge and vertex counts
    return(c(v=vcount(g), e=ecount(g)))
})
colnames(result)<-unique(dfr$Graph.ID)
print(result)

HTH,

Nick Sabbe
--
ping: nick.sa...@ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36
-- Do Not Disapprove

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Danielle Li
Sent: Thursday 5 May 2011 22:25
To: r-help@r-project.org
Subject: [R] Looping over graphs in igraph

Hi,

I'm trying to do some basic social network analysis with igraph in R, but I'm new to R and haven't been able to find documentation on a couple basic things: I want to run igraph's community detection algorithms on a couple thousand small graphs but don't know how to automate igraph looking at multiple graphs described in a single csv file.
My data look like something in ncol format, but with an additional column that has an ID for which graph the edge belongs in:

Graph ID | Vertex1 | Vertex2 | weight
1 | Alice | Bob | 2
1 | Alice | Chris | 1
1 | Alice | Jane | 2
1 | Bob | Jane | 2
1 | Chris | Jane | 3
2 | Alice | Tom | 2
2 | Alice | Kate | 1
2 | Kate | Tom | 3
2 | Tom | Mike | 2

so on and so forth for about 2000 graph IDs, each with about 20-40 vertices. I've tried using the split command but it doesn't recognize my graph id: (object 'graphid' not found)--this may just be because I don't know how to classify a column of a csv as an object.

Ultimately, I want to run community detection on each graph separately--to look only at the edges when the graph identifier is 1, make calculations on that graph, then do it again for 2 and so forth. I suspect that this isn't related to igraph specifically--I just don't know the equivalent command in R for what in pseudo Stata code would read as:

forvalues i of 1/N {
    temp_graph=subrows of the main csv file for which graphid==`i'
    cs`i' = leading.eigenvector.community.step(temp_graph)
    convert cs`i'$membership into a column in the original csv
}

I want the output to look something like:

Graph ID | Vertex1 | Vertex2 | weight | Vertex 1 membership | Vertex 2 membership | # of communities in the graph
1 | Alice | Bob | 2 | A | B | 2
1 | Alice | Chris | 1 | A | B | 2
1 | Alice | Jane | 2 | A | B | 2
1 | Bob | Jane | 2 | B | B | 2
1 | Chris | Jane | 3 | B | B | 2
2 | Alice | Tom | 2 | A | B | 3
2 | Alice | Kate | 1 | A | C | 3
2 | Kate | Tom | 3 | C | B | 3
2 | Tom | Mike | 2 | B | C | 3

Here, the graphs are treated completely separately so that community A in graph 1 need not have anything to do with community A in graph 2.

I would really appreciate any ideas you guys have. Thank you!

Danielle
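The "object 'graphid' not found" error mentioned in the question typically arises because a column has to be reached through its data frame (dfr$Graph.ID) rather than used as a free-standing object. A base-R sketch of the split-then-apply pattern, without igraph, using the sample rows from the post; the summary columns vertices and edges are stand-ins for whatever per-graph computation is wanted:

```r
txt <- "Graph ID | Vertex1 | Vertex2 | weight
1 | Alice | Bob | 2
1 | Alice | Chris | 1
1 | Alice | Jane | 2
1 | Bob | Jane | 2
1 | Chris | Jane | 3
2 | Alice | Tom | 2
2 | Alice | Kate | 1
2 | Kate | Tom | 3
2 | Tom | Mike | 2"
dfr <- read.table(text = txt, header = TRUE, sep = "|", as.is = TRUE, strip.white = TRUE)

# the grouping column is referred to through the data frame;
# read.table has turned the header "Graph ID" into the name Graph.ID
perGraph <- split(dfr, dfr$Graph.ID)

# one summary row per graph; a community detection call on a graph built
# from `edges` would go in the same place
res <- do.call(rbind, lapply(perGraph, function(edges) {
  data.frame(Graph.ID = edges$Graph.ID[1],
             vertices = length(unique(c(edges$Vertex1, edges$Vertex2))),
             edges    = nrow(edges))
}))
res
```

Per-graph results can then be attached to the original edge rows with merge(dfr, res, by = "Graph.ID").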
Re: [R] Lasso with Categorical Variables
For performance reasons, I advise using the following function instead of model.matrix:

factorsToDummyVariables<-function(dfr, betweenColAndLevel="")
{
    nc<-dim(dfr)[2]
    firstRow<-dfr[1,]
    coln<-colnames(dfr)
    retval<-do.call(cbind, lapply(seq(nc), function(ci){
        if(is.factor(firstRow[,ci]))
        {
            lvls<-levels(firstRow[,ci])[-1]
            stretchedcols<-sapply(lvls, function(lvl){
                rv<-dfr[,ci]==lvl
                mode(rv)<-"integer"
                return(rv)
            })
            if(!is.matrix(stretchedcols)) stretchedcols<-matrix(stretchedcols, nrow=1)
            colnames(stretchedcols)<-paste(coln[ci], lvls, sep=betweenColAndLevel)
            return(stretchedcols)
        }
        else
        {
            curcol<-matrix(dfr[,ci], ncol=1)
            colnames(curcol)<-coln[ci]
            return(curcol)
        }
    }))
    rownames(retval)<-rownames(dfr)
    return(retval)
}

Just for comparison: here is my old version of the same function, using model.matrix:

factorsToDummyVariables.old<-function(dfrPredictors, form=paste("~",paste(colnames(dfrPredictors), collapse="+"), sep=""))
{
    #note: this function seems to operate quite slowly!
    #Because it is used often, it may be worth improving its speed
    dfrTmp<-model.frame(dfrPredictors, na.action=na.pass)
    frm<-as.formula(form)
    mm<-model.matrix(frm, data=dfrTmp)
    retval<-as.matrix(mm)[,-1]
    return(retval)
}

In a testcase with a reasonably big dataset, I compared the speeds:

#system.time(tmp.fd.convds.full.man<-manualFactorsToDummyVariables(ds))
##   user  system elapsed
##   9.44    0.00    9.48
#system.time(tmp.fd.convds.full<-factorsToDummyVariables.old(ds))
##   user  system elapsed
##  15.49    0.00   15.64
#system.time(invisible(factorsToDummyVariables (ds[10,])))
##   user  system elapsed
##   0.36    0.00    0.36
#system.time(invisible(factorsToDummyVariables.old (ds[10,])))
##   user  system elapsed
##   2.18    0.00    2.20
#system.time(invisible(factorsToDummyVariables (ds[20:30,])))
##   user  system elapsed
##   0.34    0.00    0.38
#system.time(invisible(factorsToDummyVariables.old (ds[20:30,])))
##   user  system elapsed
##   2.11    0.00    2.15

If you have to do this quite often, the difference surely adds up... More improvements may be possible.
This function only works if you don't include interactions, though.

Nick Sabbe
--
ping: nick.sa...@ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36
-- Do Not Disapprove

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David Winsemius
Sent: Monday 2 May 2011 20:48
To: Steve Lianoglou
Cc: r-help@r-project.org
Subject: Re: [R] Lasso with Categorical Variables

On May 2, 2011, at 10:51 AM, Steve Lianoglou wrote:

Hi,

On Mon, May 2, 2011 at 12:45 PM, Clemontina Alexander ckale...@ncsu.edu wrote:

Hi! This is my first time posting. I've read the general rules and guidelines, but please bear with me if I make some fatal error in posting. Anyway, I have a continuous response and 29 predictors made up of continuous variables and nominal and ordinal categorical variables. I'd like to do lasso on these, but I get an error. The way I am using lars doesn't allow for the factors. Is there a special option or some other method in order to do lasso with cat. variables?

Here is an example (considering ordinal variables as just nominal):

set.seed(1)
Y <- rnorm(10,0,1)
X1 <- factor(sample(x=LETTERS[1:4], size=10, replace = TRUE))
X2 <- factor(sample(x=LETTERS[5:10], size=10, replace = TRUE))
X3 <- sample(x=30:55, size=10, replace=TRUE) # think age
X4 <- rchisq(10, df=4, ncp=0)
X <- data.frame(X1,X2,X3,X4)
str(X)
'data.frame': 10 obs. of 4 variables:
 $ X1: Factor w/ 4 levels "A","B","C","D": 4 1 3 1 2 2 1 2 4 2
 $ X2: Factor w/ 5 levels "E","F","G","H",..: 3 4 3 2 5 5 5 1 5 3
 $ X3: int 51 46 50 44 43 50 30 42 49 48
 $ X4: num 2.86 1.55 1.94 2.45 2.75 ...

I'd like to do:

obj <- lars(x=X, y=Y, type = "lasso")

Instead, what I have been doing is converting all data to continuous but I think this is really bad!

Yeah, it is. Check out the Categorical Predictor Variables section here for a way to handle such predictor vars: http://www.psychstat.missouristate.edu/multibook/mlt08m.html

Steve's citation is somewhat helpful, but not sufficient to take the next steps. You can find details regarding the mechanics of typical linear regression in R on the ?lm page where you find
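For reference, the model.matrix route that this thread compares against can be sketched as follows on data shaped like the poster's example. This is an illustration only; the name Xmat is made up, and whether the resulting columns are then passed to lars or another lasso routine is left open:

```r
set.seed(1)
X <- data.frame(X1 = factor(sample(LETTERS[1:4], 10, replace = TRUE)),
                X2 = factor(sample(LETTERS[5:10], 10, replace = TRUE)),
                X3 = sample(30:55, 10, replace = TRUE),
                X4 = rchisq(10, df = 4))

# treatment-coded 0/1 dummy columns for every factor, numeric columns kept
# as they are; [, -1] drops the intercept column that model.matrix adds
Xmat <- model.matrix(~ ., data = X)[, -1]
dim(Xmat)
colnames(Xmat)
```

A numeric matrix like Xmat is what the x argument in the poster's lars(x=..., y=Y, type = "lasso") call needs. Keep in mind that a plain lasso then selects individual dummy columns rather than whole factors.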
Re: [R] Reference variables by string in for loop
Hi Michael.

This is a classic :-)

ObjectsOfInterest<- list(one_df, two_df, three_df)
for(namedf in ObjectsOfInterest){...}

or probably even better

sapply(ObjectsOfInterest, function(namedf){...})

hth.

Nick Sabbe
--
ping: nick.sa...@ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36
-- Do Not Disapprove

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Michael Bach
Sent: Friday 29 April 2011 12:03
To: r-help@r-project.org
Subject: [R] Reference variables by string in for loop

Dear R Users,

I am trying to get the following to work better:

namevec <- c("one", "two", "three")
for (name in namevec) {
    namedf <- eval(parse(text=paste(name, "_df", sep="")))
    ...
    ...
}

The rationale behind it being that I created variables with names one_df, two_df and three_df earlier in the same script which I want to reference inside the for loop. Is there a more elegant way to do this?

Best Regards,
Michael Bach
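If the objects really must be looked up by name, as in the original question, base R's get() and mget() do that without eval(parse()). A small sketch with made-up data frames; dfs and sums are illustrative names:

```r
one_df   <- data.frame(x = 1:3)
two_df   <- data.frame(x = 4:6)
three_df <- data.frame(x = 7:9)

namevec <- c("one", "two", "three")

# get() looks up one object by name; mget() fetches several into a named list
dfs <- mget(paste0(namevec, "_df"), envir = environment())
names(dfs) <- namevec

# work on the list elements instead of on separately named variables
sums <- sapply(dfs, function(namedf) sum(namedf$x))
sums
```

Keeping the data frames in a list from the start, as in the reply, avoids the lookup step entirely.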
[R] abline outside of plot region
Hi R people. I ran into this problem: I created a plot with errbars, like this: errbar(x=c(1,2,3,4), y=c(2,1,3,3), yminus=c(1.5,0.5,2.5,2.5), yplus=c(2.5,1.5,3.5,3.5)) Next, I wanted to accentuate some x value with an abline, like this: abline(v=2) In one of my R sessions (which admittedly I have had open for quite a while now), the abline draws outside of the plotting region of errbars (till the edge of my plotting window at least). I tested for the cause by opening another session (clean) of the same version of R (2.13), and running the same set of commands. In this session, I do not have this behavior. Conclusion: I must have changed some graphical parameter in my original session, but I don't know which one. Do you? As an addendum: I also want to add a few specific axis ticks besides the standard ones in my graph. I used axis for this, and it works. I set col.ticks to match the color of my abline (in the nonsimplified code), and this works too, but unfortunately, the label below the tick is not in this color, and a parameter for this is not present in axis. Suggestions for either? Note: I'm on windows 7 with R 2.13. Nick Sabbe -- ping: nick.sa...@ugent.be link: http://biomath.ugent.be/ http://biomath.ugent.be wink: A1.056, Coupure Links 653, 9000 Gent ring: 09/264.59.36 -- Do Not Disapprove [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
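One graphical parameter that produces exactly this symptom is xpd: when it is TRUE or NA, drawing is no longer clipped to the plot region. Whether that was the cause in the session described cannot be known from the post, so the following is only a sketch of how to check and reset it. It uses a plain plot() instead of errbar() to stay within base graphics, and it also shows col.axis, which axis() accepts for the colour of the tick labels:

```r
pdf(NULL)                      # throwaway device so the sketch runs anywhere
plot(1:4, c(2, 1, 3, 3), ylim = c(0, 4))

xpd_now <- par("xpd")          # FALSE means drawing is clipped to the plot region
xpd_now

op <- par(xpd = FALSE)         # force clipping in case it had been changed to TRUE or NA
abline(v = 2, col = "red")
par(op)                        # put the previous setting back

# col.ticks colours the tick mark, col.axis colours the tick label
axis(1, at = 2.5, labels = "2.5", col.ticks = "red", col.axis = "red")
invisible(dev.off())
```

In an interactive session, comparing par("xpd") between the misbehaving session and the clean one would confirm or rule out this explanation.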
Re: [R] Assignments inside lapply
No, that does not work. An assignment to Powermap inside the function you pass to (l)apply only changes a local copy, and the same holds inside any other function: the Powermap in the calling environment is left untouched. Let the function return the values instead, and assign the result of lapply afterwards.

Nick Sabbe
--
ping: nick.sa...@ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36
-- Do Not Disapprove

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Alaios
Sent: Wednesday 27 April 2011 11:37
To: R-help@r-project.org
Subject: [R] Assignments inside lapply

Dear all,

I would like to ask you if an assignment can be done inside a lapply statement. For example, I would like to convert a double nested for loop

for (i in c(1:dimx)){
    for (j in c(1:dimy)){
        Powermap[i,j] <- Pr(c(i,j),c(PRX,PRY),f)
    }
}

to something like this:

ij<-expand.grid(i=seq(1:dimx),j=(1:dimy))
unlist(lapply(1:nrow(ij),function(rowId) {
    return (Powermap[i,j]<-Pr(c(ij$i[rowId],ij$j[rowId]),c(PRX,PRY),f))
}))

As you can see, lapply does not return anything, as the assignment is done inside the function. Would that work correctly? In which cases will such a statement malfunction?

I would like to thank you in advance for your help.

Best Regards
Alex
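To make the point concrete, here is a small sketch. Pr() below is a hypothetical one-argument stand-in for the poster's function, and the dimensions are made up. The first part shows that the matrix outside the function is unchanged; the second builds the matrix from returned values:

```r
dimx <- 3; dimy <- 4
Pr <- function(pos) pos[1] * 10 + pos[2]   # hypothetical stand-in for the poster's Pr()

Powermap <- matrix(0, dimx, dimy)
invisible(lapply(1:dimx, function(i) Powermap[i, 1] <- 99))
unchanged <- all(Powermap == 0)   # TRUE: only a local copy was modified
unchanged

# instead, let the function return the value and assign the whole result once
ij <- expand.grid(i = 1:dimx, j = 1:dimy)
vals <- mapply(function(i, j) Pr(c(i, j)), ij$i, ij$j)
# expand.grid varies i fastest, which matches R's column-major matrix fill
Powermap <- matrix(vals, nrow = dimx, ncol = dimy)
Powermap
```

The superassignment operator <<- can modify an object outside the function, but returning values and assigning once is usually easier to reason about.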
Re: [R] programming: telling a function where to look for the entered variables
See the warning in ?subset. Passing the name of the column lvar is not the same as passing the 'contextual column' (as I coin it in these circumstances). You can indeed solve it by using [] instead.

For my own comfort, here is the relevant line from your original function:

Data.tmp <- subset(Fulldf, lvar==subgroup, select=c(xvar,yvar))

Which should become something like (untested but should be close):

Data.tmp <- Fulldf[Fulldf[,lvar]==subgroup, c(xvar,yvar)]

This should be a lot easier to translate to column names passed as arguments, as the column names are now used as such.

HTH,

Nick Sabbe
--
ping: nick.sa...@ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36
-- Do Not Disapprove

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of E Hofstadler
Sent: Friday 1 April 2011 13:09
To: r-help@r-project.org
Subject: [R] programming: telling a function where to look for the entered variables

Hi there,

Could someone help me with the following programming problem..?

I have written a function that works for my intended purpose, but it is quite closely tied to a particular dataframe and the names of the variables in this dataframe. However, I'd like to use the same function for different dataframes and variables. My problem is that I'm not quite sure how to tell my function in which dataframe the entered variables are located.

Here's some reproducible data and the function:

# create reproducible data
set.seed(124)
xvar <- sample(0:3, 1000, replace = T)
yvar <- sample(0:1, 1000, replace=T)
zvar <- rnorm(100)
lvar <- sample(0:1, 1000, replace=T)
Fulldf <- as.data.frame(cbind(xvar,yvar,zvar,lvar))
Fulldf$xvar <- factor(xvar, labels=c("blue","green","red","yellow"))
Fulldf$yvar <- factor(yvar, labels=c("area1","area2"))
Fulldf$lvar <- factor(lvar, labels=c("yes","no"))

and here's the function in the form that it currently works: from a subset of the dataframe Fulldf, a contingency table is created (in my actual data, several other operations are then performed on that contingency table, but these are not relevant for the problem in question, therefore I've deleted them).

# function as it currently works: tailored to a particular dataframe (Fulldf)
myfunct <- function(subgroup){
    # enter a particular subgroup for which the contingency table should be
    # calculated (i.e. a particular value of the factor lvar)
    Data.tmp <- subset(Fulldf, lvar==subgroup, select=c(xvar,yvar))
    # restrict dataframe to given subgroup and two columns of the original dataframe
    Data.tmp <- na.omit(Data.tmp) # exclude missing values
    indextable <- table(Data.tmp$xvar, Data.tmp$yvar) # make contingency table
    return(indextable)
}

Since I need to use the function with different dataframes and variable names, I'd like to be able to tell my function the name of the dataframe and variables it should use for calculating the index. This is how I tried to modify the first part of the function, but it didn't work:

# function as I would like it to work: independent of any particular
# dataframe or variable names (doesn't work)
myfunct.better <- function(subgroup, lvarname, yvarname, dataframe){
    # enter the subgroup, the variable names to be used and the dataframe
    # in which they are found
    Data.tmp <- subset(dataframe, lvarname==subgroup, select=c(xvar, deparse(substitute(yvarname))))
    # trying to subset the given dataframe for the given subgroup of the given
    # variable. The variable xvar happens to have the same name in all
    # dataframes, but the variable yvarname has different names in the
    # different dataframes
    Data.tmp <- na.omit(Data.tmp)
    indextable <- table(Data.tmp$xvar, Data.tmp$yvarname)
    # create the contingency table on the basis of the entered variables
    return(indextable)
}

Calling

myfunct.better("yes", lvarname=lvar, yvarname=yvar, dataframe=Fulldf)

results in the following error:

Error in `[.data.frame`(x, r, vars, drop = drop) :
  undefined columns selected

My feeling is that R doesn't know where to look for the entered variables (lvar, yvar), but I'm not sure how to solve this problem. I tried using with() and even attach() within the function, but that didn't work.

Any help is greatly appreciated.

Best,
Esther

P.S.: Are there books that elaborate on programming in R for beginners -- and I mean things like how to best use vectorization instead of loops and general best practice tips for programming. Most of the books I've been looking at focus on applying R for particular statistical analyses, and only comparatively briefly deal with more general programming aspects. I was wondering if there are any books or tutorials out there that cover the latter aspects in a more elaborate and systematic way...?
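The difference between the two styles can be seen on a tiny made-up data frame. In the sketch below, d, col, n_subset and by_name are illustrative names only. With subset(), a variable holding a column name is compared as a plain string, whereas bracket indexing uses it as a name:

```r
d <- data.frame(g = c("yes", "no", "yes"), x = 1:3, stringsAsFactors = FALSE)
col <- "g"   # the column name, held in a variable as it would be inside a function

# subset() evaluates the condition inside d; `col` is not a column there, so it
# is found outside as the string "g", and "g" == "yes" is simply FALSE
n_subset <- nrow(subset(d, col == "yes"))
n_subset                      # no rows selected

# bracket indexing uses the string as a column name
by_name <- d[d[[col]] == "yes", ]
by_name
```

This is why functions that receive column names as strings are usually written with [ and [[ rather than subset() or $.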
Re: [R] programming: telling a function where to look for the entered variables
This should be a version that does what you want. Because you named the variable lvarname, I assumed you were already passing "lvar" (a character string) instead of trying to pass lvar (without the quotes), which is in no way a 'name'.

    myfunct.better <- function(subgroup, lvarname, xvarname, yvarname, dataframe) {
      # enter the subgroup, the variable names to be used and the dataframe
      # in which they are found
      Data.tmp <- dataframe[dataframe[, lvarname] == subgroup, c(xvarname, yvarname)]
      Data.tmp <- na.omit(Data.tmp)
      # create the contingency table on the basis of the entered variables
      indextable <- table(Data.tmp[, xvarname], Data.tmp[, yvarname])
      # actually, if I remember well, you could simply use
      # indextable <- table(Data.tmp) here; that would allow for some more
      # simplifications (replace xvarname and yvarname by columnsOfInterest or
      # similar, and pass that instead of c(xvarname, yvarname))
      return(indextable)
    }

    myfunct.better("yes", lvarname="lvar", xvarname="xvar", yvarname="yvar", dataframe=Fulldf)

HTH,
Nick Sabbe

-----Original Message-----
From: irene.p...@googlemail.com [mailto:irene.p...@googlemail.com] On Behalf Of E Hofstadler
Sent: Friday 1 April 2011 14:28
To: Nick Sabbe
Cc: r-help@r-project.org
Subject: Re: [R] programming: telling a function where to look for the entered variables

Thanks Nick and Juan for your replies.

Nick, thanks for pointing out the warning in subset(). I'm not sure though that I understand the example you provided -- because despite using subset() rather than bracket notation, the original function (myfunct) does what is expected of it. The problem I have is with the second function (myfunct.better), where variable names + dataframe are not fixed within the function but passed to the function when calling it -- and even with bracket notation I don't quite manage to tell R where to look for the columns that relate to the entered column names. (But then perhaps I misunderstood you.)

This is what I tried (using bracket notation):

    myfunct.better <- function(dataframe, subgroup, lvarname, yvarname){
      Data.tmp <- dataframe[dataframe[, deparse(substitute(lvarname))] == subgroup,
                            c("xvar", deparse(substitute(yvarname)))]
    }

but this creates an empty contingency table only -- perhaps because my use of deparse() is flawed (I think what is converted into a string is lvarname and yvarname, rather than the column names that these two function-variables represent in the dataframe)?

2011/4/1 Nick Sabbe nick.sa...@ugent.be:

See the warning in ?subset. Passing the column name of lvar is not the same as passing the 'contextual column' (as I coin it in these circumstances). You can solve it by indeed using [] instead.

For my own comfort, here is the relevant line from your original function:

    Data.tmp <- subset(Fulldf, lvar==subgroup, select=c(xvar,yvar))

Which should become something like (untested but should be close):

    Data.tmp <- Fulldf[Fulldf[,"lvar"]==subgroup, c("xvar","yvar")]

This should be a lot easier to translate based on column names, as the column names are now used as such.

HTH,
Nick Sabbe

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of E Hofstadler
Sent: Friday 1 April 2011 13:09
To: r-help@r-project.org
Subject: [R] programming: telling a function where to look for the entered variables

Hi there,

Could someone help me with the following programming problem..? I have written a function that works for my intended purpose, but it is quite closely tied to a particular dataframe and the names of the variables in this dataframe. However, I'd like to use the same function for different dataframes and variables. My problem is that I'm not quite sure how to tell my function in which dataframe the entered variables are located.
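The approach discussed in this thread can be sketched as follows: every column name is passed as a character string and used with `[[`, so the function does not depend on non-standard evaluation or on any particular dataframe. The function name `contingency_by_group` and its argument order are illustrative assumptions, not something from the original posts.

```r
# Hypothetical helper: all column names arrive as character strings.
contingency_by_group <- function(dataframe, subgroup, lvarname, xvarname, yvarname) {
  # rows belonging to the requested subgroup (NA in the grouping column is dropped)
  keep <- !is.na(dataframe[[lvarname]]) & dataframe[[lvarname]] == subgroup
  data_tmp <- na.omit(dataframe[keep, c(xvarname, yvarname)])
  table(data_tmp[[xvarname]], data_tmp[[yvarname]])
}

# Data similar to the example in the thread.
set.seed(124)
Fulldf <- data.frame(
  xvar = factor(sample(0:3, 1000, replace = TRUE),
                labels = c("blue", "green", "red", "yellow")),
  yvar = factor(sample(0:1, 1000, replace = TRUE), labels = c("area1", "area2")),
  lvar = factor(sample(0:1, 1000, replace = TRUE), labels = c("yes", "no"))
)

tab <- contingency_by_group(Fulldf, "yes", "lvar", "xvar", "yvar")
print(tab)
```

Because the names are plain strings, the same call works for a dataframe whose columns are named differently; only the strings change.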
Re: [R] Graph many points without hiding some
Hi.

You could also turn it into a 3D plot with some variation on the function below:

    plot4d <- function(x, y, z, u, main="", xlab="", ylab="", zlab="", ulab="") {
      require(rgl)  # may need to install this package first
      # standard trick to get some intensity colors
      uLim <- range(u)
      uLen <- uLim[2] - uLim[1] + 1
      colorlut <- terrain.colors(uLen)
      col <- colorlut[u - uLim[1] + 1]
      open3d()  # open new device
      points3d(x=x, y=y, z=z, col=col)
      aspect3d(x=1, y=1, z=1)  # ensure bounding box is in cube-form (scaling variables)
      # note: if you want to flip an axis, use -1 in the statement above
      axes3d()  # show axes
      title3d(main = main, sub = paste("Green is low", ulab, ", red is high"),
              xlab = xlab, ylab = ylab, zlab = zlab)
    }

HTH,
Nick Sabbe

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Peter Langfelder
Sent: Thursday 31 March 2011 9:26
To: Samuel Dennis
Cc: R-help@r-project.org
Subject: Re: [R] Graph many points without hiding some

On Wed, Mar 30, 2011 at 10:04 PM, Samuel Dennis sjdenn...@gmail.com wrote:

I have a very large dataset with three variables that I need to graph using a scatterplot. However I find that the first variable gets masked by the other two, so the graph looks entirely different depending on the order of variables. Does anyone have any suggestions how to manage this?

This code is an illustration of what I am dealing with:

    x <- 1
    plot(rnorm(x,mean=20),rnorm(x),col=1,xlim=c(16,24))
    points(rnorm(x,mean=21),rnorm(x),col=2)
    points(rnorm(x,mean=19),rnorm(x),col=3)

gives an entirely different looking graph to:

    x <- 1
    plot(rnorm(x,mean=19),rnorm(x),col=3,xlim=c(16,24))
    points(rnorm(x,mean=20),rnorm(x),col=1)
    points(rnorm(x,mean=21),rnorm(x),col=2)

despite being identical in all respects except for the order in which the variables are plotted.

I have tried using pch=".", however the colours are very difficult to discern. I have experimented with a number of other symbols with no real solution.

The only way that appears to work is to iterate the plot with a for loop, and progressively add a few numbers from each variable, as below. However, although I can do this simply with random numbers as I have done here, this is an extremely cumbersome method to use with real datasets.

    plot(1,1,xlim=c(16,24),ylim=c(-4,4),col="white")
    x <- 100
    for (i in 1:100) {
      points(rnorm(x,mean=19),rnorm(x),col=3)
      points(rnorm(x,mean=20),rnorm(x),col=1)
      points(rnorm(x,mean=21),rnorm(x),col=2)
    }

Is there some function in R that could solve this through automatically iterating my data as above, using transparent symbols, or something else? Is there some other way of solving this issue that I haven't thought of?

[Peter Langfelder's reply:]

Assume you are plotting variables y1, y2, y3 of the same length against a common x, and you would like to assign colors, say c(1,2,3). You can automate the randomization of order as follows:

    n = length(y1)
    y = c(y1, y2, y3)
    xx = rep(x, 3)
    colors = rep(c(1,2,3), c(n, n, n))
    order = sample(c(1:(3*n)))
    plot(xx[order], y[order], col = colors[order])

I basically turn the y's into a single vector y, with the corresponding values of x stored in xx and the plotting colors in colors, then randomize the order using the sample function.

HTH,
Peter
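The two ideas raised in this thread (randomising the drawing order and using transparent symbols) can be combined in base R. The sketch below uses simulated data; the group sizes, the alpha value of 0.3 and the variable names are arbitrary assumptions chosen for illustration.

```r
# Illustrative data: three groups of points that overlap heavily.
set.seed(42)
n <- 2000
pts <- data.frame(
  x   = c(rnorm(n, mean = 19), rnorm(n, mean = 20), rnorm(n, mean = 21)),
  y   = rnorm(3 * n),
  grp = rep(1:3, each = n)
)

# Shuffle the rows so that no group is systematically drawn last.
shuffled <- pts[sample(nrow(pts)), ]

# Semi-transparent versions of palette colours 3, 1 and 2.
cols <- adjustcolor(c(3, 1, 2), alpha.f = 0.3)

pdf(NULL)  # null device so the sketch runs non-interactively; drop for on-screen use
plot(shuffled$x, shuffled$y, col = cols[shuffled$grp], pch = 16, cex = 0.5,
     xlim = c(16, 24), xlab = "x", ylab = "y")
invisible(dev.off())
```

Transparency depends on the graphics device supporting alpha; most current devices do, but that is worth checking on older setups.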
Re: [R] choosing best 'match' for given factor
Hi Murali.

I haven't compared, but this is what I would do:

    bestMatch <- function(searchVector, matchMat) {
      searchRow <- unique(sort(match(searchVector, colnames(matchMat))))
      # if you're sure, you could drop unique
      cat("Original row indices:")
      print(searchRow)
      matchMat <- matchMat[, -searchRow, drop=FALSE]  # avoid duplicates altogether
      cat("Corrected Matrix:\n")
      print(matchMat)
      correctedRows <- searchRow - seq_along(searchRow) + 1  # works because of the sort above
      cat("Corrected row indices:")
      print(correctedRows)
      sapply(correctedRows, function(cr){
        lookWhere <- matchMat[cr, seq(cr-1)]
        cat("Will now look into:\n")
        print(lookWhere)
        cc <- which.max(lookWhere)
        cat("Max at position", cc, "\n")
        colnames(matchMat)[cc]
      })
    }

I don't think there's that much difference. Depending on specific sizes, it may be more or less costly to first shrink the search matrix like I do. And, similarly depending, it may be better still if you remove the rows that you're not interested in as well (some more but similar index trickery is required then).

HTH,
Nick Sabbe

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of murali.me...@avivainvestors.com
Sent: Thursday 31 March 2011 16:46
To: r-help@r-project.org
Subject: [R] choosing best 'match' for given factor

Folks,

I have a 'matching' matrix between variables A, X, L, O:

    a <- structure(c(1, 0.41, 0.58, 0.75, 0.41, 1, 0.6, 0.86, 0.58, 0.6, 1, 0.83,
                     0.75, 0.86, 0.83, 1), .Dim = c(4L, 4L),
                   .Dimnames = list(c("A", "X", "L", "O"), c("A", "X", "L", "O")))

    > a
         A    X    L    O
    A 1.00 0.41 0.58 0.75
    X 0.41 1.00 0.60 0.86
    L 0.58 0.60 1.00 0.83
    O 0.75 0.86 0.83 1.00

And I have a search vector of variables

    v <- c("X", "O")

I want to write a function bestMatch(searchvector, matchMat) such that for each variable in searchvector, I get the variable that it has the highest match to -- but searching only among variables to the left of it in the 'matching' matrix, and not matching with any variable in searchvector itself.

So in the above example, although X has the highest match (0.86) with O, I can't choose O as it's to the right of X (and also because O is in the searchvector v already); I'll have to choose A. For O, I will choose L, the variable it's best matched with -- as it can't match X, which is already in the search vector.

My function bestMatch(v, a) will then return c("A", "L").

My matrix a is quite large, and I have a long list of search vectors v, so I need an efficient method. I wrote this:

    bestMatch <- function(searchvector, matchMat) {
      sapply(searchvector, function(cc) {
        y <- matchMat[!(rownames(matchMat) %in% searchvector) &
                        (index(rownames(matchMat)) < match(cc, rownames(matchMat))),
                      cc, drop = FALSE]
        rownames(y)[which.max(y)]
      })
    }

Any advice?

Thanks,
Murali
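For reference, the rule stated in the question (look only at variables to the left, skip anything in the search vector) can be written compactly in base R. This is a sketch, not either poster's code; the name `best_match` and the choice to return NA when no candidate exists are assumptions.

```r
a <- structure(c(1, 0.41, 0.58, 0.75, 0.41, 1, 0.6, 0.86, 0.58, 0.6, 1, 0.83,
                 0.75, 0.86, 0.83, 1), .Dim = c(4L, 4L),
               .Dimnames = list(c("A", "X", "L", "O"), c("A", "X", "L", "O")))
v <- c("X", "O")

# Hypothetical helper implementing the rule from the question.
best_match <- function(search_vector, match_mat) {
  vars <- colnames(match_mat)
  vapply(search_vector, function(sv) {
    pos <- match(sv, vars)
    # variables to the left of sv that are not themselves being searched for
    candidates <- setdiff(vars[seq_len(pos - 1)], search_vector)
    if (length(candidates) == 0) return(NA_character_)
    candidates[which.max(match_mat[candidates, sv])]
  }, character(1))
}

res <- best_match(v, a)
print(res)
```

The sketch assumes every element of the search vector is a column name of the matrix; a name that is missing would need separate handling.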
Re: [R] a for loop to lapply
Hello Alex.

A few issues:

* you want seq(dimx) instead of seq(1:dimx) (d'oh)
* I think you have problems with your dimensions in the original code as well: you use i, which runs up to dimx, as an indexer for your third dimension, which has size dimmaps. If dimx differs from dimmaps, you're in for unexpected results.
* basic idea of the apply-style functions (nicknamed *apply below):
  - first argument = a collection of items to run over. Could be a list or a vector
  - second argument = a function that can take any of the items in the collection as its first argument
  - other arguments: either tuning parameters (like simplify) for *apply itself, or passed on as extra arguments to the function
  - each item from the collection is fed, one after the other, as the first argument to the function; the extra arguments (always the same) are passed along with it
  - normally, the results of each call are collected into a list, where the names of the list items refer to your original collection. In more elaborate versions (sapply) and under some circumstances, this list is transformed into a simpler structure.
* your test case is rather complicated: I don't think there is a way to make lapply or one of its cousins return a three-dimensional array just like that. With sapply (and simplify=TRUE, the default), if the result for each item of your collection has the same length, the result is coerced into a two-dimensional array with one column for each item in your collection.
* on the other hand, for your example, you probably don't want to use *apply functions nor loops: it can be done with some clever use of seq and rep and dim, for sure.

All in all, it seems you may need to get your basics up to speed first, then shift to *apply (and use a simpler example to get started, like: given a matrix with two columns, create a vector holding the differences and the sums of the columns -- I know this can be done without *apply as well, but apart from that it is a more attainable exercise).

Good luck to you on that!

HTH,
Nick Sabbe

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Alaios
Sent: Wednesday 30 March 2011 8:31
To: R-help@r-project.org
Subject: [R] a for loop to lapply

Dear all,

I am trying to learn lapply. I would like, as a test case, to try the lapply alternative for the following:

    Shadowlist <- array(data=NA, dim=c(dimx,dimy,dimmaps))
    for (i in c(1:dimx)){
      Shadowlist[,,i] <- i
    }

---so I wrote the following---

    returni <- function(i,ShadowMatrix) {ShadowMatrix <- i}
    lapply(seq(1:dimx),Shadowlist[,,seq(1:dimx)],returni)

So far I do not get the same results with both ways. Could you please help me understand what might be wrong?

Regards
Alex
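The points made in the reply can be illustrated with a small base-R sketch. The dimensions below are arbitrary assumptions, and the loop indexes the third dimension by its own size (dimmaps), which is presumably what was intended. Strictly speaking `vapply` can return a three-dimensional array when each call returns a matrix, so it is shown next to the loop-free `rep` version.

```r
dimx <- 3; dimy <- 2; dimmaps <- 4

# Loop version, indexing the third dimension by its own length.
shadow_loop <- array(NA, dim = c(dimx, dimy, dimmaps))
for (i in seq_len(dimmaps)) {
  shadow_loop[, , i] <- i
}

# vapply version: each call returns one dimx-by-dimy slice and
# vapply stacks the slices along a third dimension.
shadow_vapply <- vapply(seq_len(dimmaps),
                        function(i) matrix(i, nrow = dimx, ncol = dimy),
                        FUN.VALUE = matrix(0L, nrow = dimx, ncol = dimy))

# No loop at all: repeat each slice value dimx * dimy times.
shadow_rep <- array(rep(seq_len(dimmaps), each = dimx * dimy),
                    dim = c(dimx, dimy, dimmaps))

# The simpler exercise suggested in the reply: differences and sums of two columns.
m <- cbind(c(5, 7, 9), c(1, 2, 3))
diffs_and_sums <- apply(m, 1, function(r) c(diff = r[1] - r[2], sum = r[1] + r[2]))
print(diffs_and_sums)
```

For the two-column exercise, `m[, 1] - m[, 2]` and `m[, 1] + m[, 2]` give the same numbers without any *apply call, which is the vectorised form the reply alludes to.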
Re: [R] A question on glmnet analysis
I haven't read all of your code, but at first read, it seems right. With regard to your questions:

1. Am I doing it correctly or not?

Seems OK, as I said. You could use some more standard code to convert your data to a matrix, but essentially the results should be the same. Also, lambda.min may be a tad too optimistic: to correct for the reuse of data in cross-validation, one normally uses the 'one standard error' trick (I think this is described in the help file for cv.glmnet, and it is also present in the cv.glmnet return value: lambda.1se, if I'm not mistaken).

2. Which model, I mean lasso or elastic net, should be selected? and why? Both models chose the same variables but different coefficient values.

You may want to read 'The Elements of Statistical Learning' to find some info on how ridge, lasso and elastic net compare. Lasso should work fine in this relatively low-dimensional setting, although it depends on the correlation structure of your covariates. Depending on your goals, you may want to refit a standard logistic regression with only the variables selected by the lasso: this avoids the downward bias that is present in (just about) every penalized regression.

3. Is it O.K. to calculate odds ratio by exp(coefficients)? And how can you calculate 95% confidence interval of odds ratio? Or 95%CI is meaningless in this kind of analysis?

At this time, confidence intervals for lasso/elastic net in GLM settings are an open problem (the reason being that the L1 penalty is not differentiable). Some 'solutions' exist (bootstrap, for one), but they have all been shown to have (statistical) properties that make them -- at the least -- doubtful. I know, because I'm working on this. Short answer: there is no way to do this (at this time).

HTH (and hang on there in Japan),
Nick Sabbe

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of
Sent: Friday 25 March 2011 14:04
To: r-h...@stat.math.ethz.ch
Subject: [R] A question on glmnet analysis

Hi,

I am trying to do logistic regression for data of 104 patients, which have one outcome (yes or no) and 15 variables (9 categorical factors [yes or no] and 6 continuous variables). The number of yes outcomes is 25. Twenty-five events and 15 variables mean that the number of events per variable is much less than 10. Therefore, I tried to analyze the data with a penalized regression method. I would like to ask some of the experts here to help me.

First of all, I standardized all 6 continuous variables by scale() with the center=TRUE and scale=TRUE options. The nine categorical variables and the one outcome variable were re-coded as 0 or 1. Then, I used glmnet with the standardize=FALSE option because of the presence of categorical variables.

    x15std <- matrix(c(x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15), 104, 15)
    y <- outcome
    library(glmnet)
    fit.1 <- glmnet(x15std, y, family="binomial", standardize=FALSE)
    fit.1cv <- cv.glmnet(x15std, y, family="binomial", standardize=FALSE)

The default is alpha=1, so this should be the lasso penalty.

    Coefficients.fit1 <- coef(fit.1, s=fit.1cv$lambda.min)
    Active.Index.fit1 <- which(Coefficients.fit1 != 0)
    Active.Coefficients.fit1 <- Coefficients.fit1[Active.Index.fit1]

    > Active.Index.fit1
    [1]  1  5  9 10 16
    > Active.Coefficients.fit1
    [1] -1.28774827  0.01420395  0.70444865 -0.27726625  0.18455926

My optimal model chose 5 active covariates, including the intercept as the first one.

Second, I did the same things with the alpha=0.5 option to do an elastic net analysis.

    fit.2 <- glmnet(x15std, y, family="binomial", standardize=FALSE, alpha=0.5)
    fit.2cv <- cv.glmnet(x15std, y, family="binomial", standardize=FALSE, alpha=0.5)
    Coefficients.fit2 <- coef(fit.2, s=fit.2cv$lambda.min)
    Active.Index.fit2 <- which(Coefficients.fit2 != 0)
    Active.Coefficients.fit2 <- Coefficients.fit2[Active.Index.fit2]

    > Active.Index.fit2
    [1]  1  5  9 10 16
    > Active.Coefficients.fit2
    [1] -1.3286190  0.1410739  0.6315108 -0.2668022  0.2292459

This model chose the same 5 active covariates as the first one with the lasso penalty.

My questions are the following:

1. Am I doing it correctly or not?
2. Which model, I mean lasso or elastic net, should be selected? and why? Both models chose the same variables but different coefficient values.
3. Is it O.K. to calculate odds ratio by exp(coefficients)? And how can you calculate 95% confidence interval of odds ratio? Or 95%CI is meaningless in this kind of analysis?

I would appreciate your help in advance.

KH
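The refit suggested in the reply (an ordinary logistic regression on only the selected variables) can be sketched in base R. The data below are simulated stand-ins, not the poster's data; the active indices are the ones reported in the question, where position 1 is the intercept, so predictor columns are the index minus one. The resulting odds ratios are naive: as the reply stresses, they ignore the selection step and do not come with valid confidence intervals.

```r
set.seed(1)
n <- 104
# Simulated stand-in for the 104 x 15 predictor matrix (an assumption).
x15std <- matrix(rnorm(n * 15), n, 15, dimnames = list(NULL, paste0("x", 1:15)))
y <- rbinom(n, 1, plogis(-1.3 + 0.7 * x15std[, 8]))

# Indices as reported in the question; position 1 is the intercept.
active_index <- c(1, 5, 9, 10, 16)
selected <- active_index[-1] - 1   # predictor columns 4, 8, 9, 15

# Unpenalized logistic regression on the selected columns only.
refit_data <- data.frame(y = y, x15std[, selected, drop = FALSE])
refit <- glm(y ~ ., data = refit_data, family = binomial)

# Naive odds ratios (no adjustment for the preceding variable selection).
odds_ratios <- exp(coef(refit)[-1])
print(odds_ratios)
```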
Re: [R] One to One Matching multiple vectors
Hello Vincy.

You probably want

    y[match(z, x)]

Or, more instructional:

    whereAreZInX <- match(z, x)
    y[whereAreZInX]

HTH,
Nick Sabbe

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Vincy Pyne
Sent: Wednesday 16 March 2011 10:42
To: r-help@r-project.org
Subject: [R] One to One Matching multiple vectors

Dear R helpers

Suppose,

    x = c(0, 1, 2, 3)
    y = c("A", "B", "C", "D")
    z = c(1, 3)

For given values of z, I need to get the corresponding values of y. So I should get "B" and "D".

I tried doing y[x][z] but it gives

    > y[x][z]
    [1] "A" "C"

Kindly guide.

Regards
Vincy
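A short worked version of this answer, which also shows why `y[x][z]` gave a different result: `x` is being used as a set of positions there, and position 0 is silently dropped. The named-vector variant at the end is an alternative added here for illustration, not something proposed in the thread.

```r
x <- c(0, 1, 2, 3)
y <- c("A", "B", "C", "D")
z <- c(1, 3)

# match() gives, for each element of z, its position in x.
pos <- match(z, x)
matched <- y[pos]
print(matched)

# Why y[x][z] differs: y[x] indexes by position, and index 0 is dropped,
# so y[x] has only three elements; [z] then picks its 1st and 3rd.
print(y[x])
print(y[x][z])

# Alternative: a lookup vector named by the values of x.
lookup <- setNames(y, x)
print(unname(lookup[as.character(z)]))
```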
[R] Generic mixup?
Hello list.

This is from an R session (admittedly, I'm still using R 2.11.1):

    > print
    function (x, ...)
    UseMethod("print")
    <environment: namespace:base>
    > showMethods("print")

    Function "print":
     <not a generic function>

Don't the two results contradict each other? Or do I have a terrible misunderstanding of what comprises a generic function?

Thx,
Nick Sabbe
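One way to read the output above: R has two separate systems of generic functions. `print` is an S3 generic (its body calls `UseMethod`), whereas `showMethods` and `isGeneric` come from the methods package and only know about S4 generics. The sketch below uses made-up generics (`describe`, `area`) and a made-up class (`Square`) purely for illustration.

```r
library(methods)

# An S3 generic is an ordinary function whose body calls UseMethod().
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "something else"
describe.numeric <- function(x, ...) "a number"

print(describe(1))     # dispatches to describe.numeric
print(describe("a"))   # falls back to describe.default

# methods() lists the S3 methods known for a generic such as print().
print(head(methods("print")))

# An S4 generic is created with setGeneric(); only these count as
# "generic" for isGeneric() and showMethods().
setGeneric("area", function(shape) standardGeneric("area"))
setClass("Square", representation(side = "numeric"))
setMethod("area", "Square", function(shape) shape@side^2)

print(isGeneric("area"))       # S4 generic
print(isGeneric("describe"))   # S3 generic, so not an S4 generic
showMethods("area")
```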
Re: [R] speed up process
Simply avoiding the for loops by using lapply (I may have missed a bracket here or there because I did this without opening R)... Haven't checked the speed-up, though.

    lapply(seq.yvar, function(k){
      plot(mydata1[[k]]~mydata1[[ind.xvar]], type="p",
           xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k])
      lapply(seq_along(mydata_list), function(j){
        foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j,
                pos=mypos[j], name.dat=names(mydata_list)[j])
        return(NULL)
      })
      invisible(NULL)
    })

HTH,
Nick Sabbe

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ivan Calandra
Sent: Friday 25 February 2011 11:20
To: r-help
Subject: [R] speed up process

Dear users,

I have a double for loop that does exactly what I want, but is quite slow. It is not so much with this simplified example, but IRL it is slow. Can anyone help me improve it?

The data and code for foo_reg() are available at the end of the email; I preferred going directly into the problematic part. Here is the code (I tried to simplify it, but I cannot do it too much or else it wouldn't represent my problem). It might also look too complex for what it is intended to do, but my colleagues who are also supposed to use it don't know much about R. So I wrote it so that they don't have to modify the critical parts to run the script for their needs.

    #column indexes for function
    ind.xvar <- 2
    seq.yvar <- 3:4
    #position vector for legend(), stupid positioning but it doesn't matter here
    mypos <- c("topleft", "topright", "bottomleft")

    #run the function for columns 3 and 4 as y (seq.yvar) with column 2 as x
    #(ind.xvar) for all 3 datasets (mydata_list)
    par(mfrow=c(2,1))
    for (i in seq_along(seq.yvar)){
      k <- seq.yvar[i]
      plot(mydata1[[k]]~mydata1[[ind.xvar]], type="p",
           xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k])
      for (j in seq_along(mydata_list)){
        foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j,
                pos=mypos[j], name.dat=names(mydata_list)[j])
      }
    }

I tried with lapply() or mapply() but couldn't manage to pass the arguments for names() and col= correctly, e.g. for the 2nd loop:

    lapply(mydata_list, FUN=function(x){foo_reg(dat=x, xvar=ind.xvar, yvar=k,
           col1=1:3, pos=mypos[1:3], name.dat=names(x)[1:3])})
    mapply(FUN=function(x) {foo_reg(dat=x, name.dat=names(x)[1:3])}, mydata_list,
           col1=1:3, pos=mypos, MoreArgs=list(xvar=ind.xvar, yvar=k))

Thanks in advance for any hints.
Ivan

    #create data (it looks horrible with these datasets but it doesn't matter here)
    mydata1 <- structure(list(
      species = structure(1:8, .Label = c("alsen", "gogor", "loalb", "mafas",
                                          "pacyn", "patro", "poabe", "thgel"),
                          class = "factor"),
      fruit = c(0.52, 0.45, 0.43, 0.82, 0.35, 0.9, 0.68, 0),
      Asfc = c(207.463765, 138.5533755, 70.4391735, 160.9742745, 41.455809,
               119.155109, 26.241441, 148.337377),
      Tfv = c(47068.1437773483, 43743.8087431582, 40323.5209129239,
              23420.9455581495, 29382.6947428651, 50460.2202192311,
              21810.1456510625, 41747.6053810881)),
      .Names = c("species", "fruit", "Asfc", "Tfv"),
      row.names = c(NA, 8L), class = "data.frame")
    mydata2 <- mydata1[!(mydata1$species %in% c("thgel","alsen")),]
    mydata3 <- mydata1[!(mydata1$species %in% c("thgel","alsen","poabe")),]
    mydata_list <- list(mydata1=mydata1, mydata2=mydata2, mydata3=mydata3)

    #function for regression
    library(WRS)
    foo_reg <- function(dat, xvar, yvar, mycol, pos, name.dat){
      tsts <- tstsreg(dat[[xvar]], dat[[yvar]])
      tsts_inter <- signif(tsts$coef[1], digits=3)
      tsts_slope <- signif(tsts$coef[2], digits=3)
      abline(tsts$coef, lty=1, col=mycol)
      legend(x=pos,
             legend=c(paste("TSTS ",name.dat,": Y=",tsts_inter,"+",tsts_slope,"X",sep="")),
             lty=1, col=mycol)
    }

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
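The difficulty described in the question (passing the colour, legend position and dataset name alongside each dataset) is what `mapply` is for: vectors given through `...` are walked in parallel, and constants go in `MoreArgs`. The sketch below is an assumption-laden stand-in: `fit_one` replaces `foo_reg`, an ordinary `lm` fit replaces `WRS::tstsreg` so that only base R is needed, and the data are simulated. Note that replacing a loop with an *apply call mostly changes readability; if the regression itself dominates the run time, little speed-up should be expected.

```r
# Stand-in for foo_reg(): returns the fit instead of drawing it.
# lm() replaces WRS::tstsreg() here (an assumption for illustration).
fit_one <- function(dat, xvar, yvar, mycol, pos, name.dat) {
  cf <- coef(lm(dat[[yvar]] ~ dat[[xvar]]))
  data.frame(name = name.dat, col = mycol, pos = pos,
             intercept = unname(cf[1]), slope = unname(cf[2]),
             stringsAsFactors = FALSE)
}

set.seed(1)
mydata1 <- data.frame(id = 1:8, fruit = runif(8), Asfc = rnorm(8, 100, 30))
mydata_list <- list(mydata1 = mydata1,
                    mydata2 = mydata1[-1, ],
                    mydata3 = mydata1[-(1:2), ])
mypos <- c("topleft", "topright", "bottomleft")

# The four vectors are traversed in step; MoreArgs holds the constants.
res_list <- mapply(FUN = fit_one,
                   dat = mydata_list,
                   mycol = seq_along(mydata_list),
                   pos = mypos,
                   name.dat = names(mydata_list),
                   MoreArgs = list(xvar = 2, yvar = 3),
                   SIMPLIFY = FALSE)
res <- do.call(rbind, res_list)
print(res)
```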
Re: [R] glmnet with binary predictors
Hello Sambit.

Step 1:
Create a matrix out of your predictor data, having a column for every predictor, coding 1 for yes and 0 for no. The matrix should have a row for each observation (called pred.mat below). Besides that, you need a vector with the outcome variable for each observation (best if this is a factor with 2 levels) (called out.v below).

Step 2:
Because you are working with categorical variables, don't forget to always use standardize = FALSE in any call to the glmnet functions (see the docs).

Step 3:
To see how the predictor coefficients move over different values of your penalization parameter, simply do something like

    myLognet <- glmnet(x=pred.mat, y=out.v, standardize = FALSE, family="binomial")

and then

    plot(myLognet, xvar="lambda", label = TRUE)

Note: the labels in the plot indicate column numbers in pred.mat.

Step 4:
To find the 'best' value of the penalization parameter, use cv.glmnet with the same parameters plus a type (see ?cv.glmnet). Note: if the criterion you want is not provided 'out of the box', it will take you quite a bit of coding, so if you can, take one of the provided ones. Visually, you can select the 'best' value for the penalization parameter from the plot (see ?plot.cv.glmnet), or you can use some numerical argument to find the reasonable extreme value for the criterion.

Really boilerplate, I guess.

Good luck.
Nick Sabbe

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of sambit rath
Sent: Thursday 3 February 2011 10:58
To: r-help@r-project.org
Subject: [R] glmnet with binary predictors

Hi Everybody!

I must start with a declaration that I am a sparse user of R. I am creating a credit scorecard using a dataset which has a variable depicting actual credit history (good/bad) and 41 other variables of yes/no type. The procedure I am asked to follow is to use a penalized logistic procedure for variable selection. I have located the package glmnet, which gives the complete elastic net regularization path for logistic models. I want some help in setting up the process. Can someone point out the basic steps?

Thanks
Sambit
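The four steps above can be sketched end to end. The data frame `credit` and its column names are simulated assumptions; the names `pred.mat`, `out.v` and `myLognet` follow the reply. The glmnet part runs only if the package is installed, and it uses `type.measure = "class"` as one example of the criterion the reply refers to.

```r
set.seed(7)
n <- 200
# Illustrative yes/no predictors and a good/bad outcome (simulated).
credit <- data.frame(
  history       = factor(sample(c("good", "bad"), n, replace = TRUE),
                         levels = c("good", "bad")),
  owns_home     = sample(c("yes", "no"), n, replace = TRUE),
  has_job       = sample(c("yes", "no"), n, replace = TRUE),
  prior_default = sample(c("yes", "no"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Step 1: one column per predictor, 1 for "yes" and 0 for "no".
pred.mat <- 1 * (as.matrix(credit[, -1]) == "yes")
out.v <- credit$history

# Steps 2 to 4, run only when glmnet is available.
if (requireNamespace("glmnet", quietly = TRUE)) {
  myLognet <- glmnet::glmnet(x = pred.mat, y = out.v,
                             family = "binomial", standardize = FALSE)
  plot(myLognet, xvar = "lambda", label = TRUE)

  cvfit <- glmnet::cv.glmnet(x = pred.mat, y = out.v, family = "binomial",
                             standardize = FALSE, type.measure = "class")
  plot(cvfit)
  print(coef(cvfit, s = "lambda.1se"))
}
```

With purely random data as simulated here, the cross-validated fit will usually select few or no predictors; the point of the sketch is the sequence of calls, not the result.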
[R] Preparing dataset for glmnet: factors to dummies
Hello list.

For some reason, the makers of glmnet do not accept a dataframe as input. They expect the input to be a matrix, where the dummies are already precoded.

Now I have created a sample dataset with

* 11 factor columns with two levels
* 4 factor columns with three levels
* 135 continuous columns (from a standard normal)
* 100 observations (rows)

Say this dataframe is in dfrPredictors. What I do now is use the following code:

    form <- paste("~", paste(colnames(dfrPredictors), collapse="+"), sep="")
    dfrTmp <- model.frame(dfrPredictors, na.action=na.pass)
    result <- as.matrix(model.matrix(as.formula(form), data=dfrTmp))[,-1]

This works (although admittedly, I don't understand everything of it). However, I notice that for this rather limited dataset, this conversion takes around 0.1 seconds user/elapsed time (on a relatively speedy laptop).

For my current work, I need to do this a lot of times on very similar dataframes (in fact, they are multiply imputed from the same 'original' dataframe), so I need all the speed I can get. Does anybody know of a way that is quicker than the above?

Note: because of other uses of the dataframe, I don't have the option to do this conversion before the imputation, so I really need the conversion itself to work quickly.

Thanks,
Nick Sabbe
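For readers puzzling over what the three lines above do, here is the same conversion on a small simulated dataframe, written with `~ .` instead of a pasted formula string. Column names and sizes are assumptions. Whether this form is any faster than the original has not been measured here; `system.time()` on realistic data would be the way to check.

```r
set.seed(3)
n <- 10
dfrPredictors <- data.frame(
  f2a = factor(rep(c("a", "b"), length.out = n)),
  f2b = factor(rep(c("u", "v"), each = n / 2)),
  f3  = factor(rep(c("lo", "mid", "hi"), length.out = n)),
  z1  = rnorm(n),
  z2  = rnorm(n)
)
dfrPredictors$z1[2] <- NA  # a missing value, to show that na.pass keeps the row

# `~ .` expands to all columns of the data; na.pass prevents rows with NA
# from being dropped when the model frame is built.
dfrTmp <- model.frame(~ ., data = dfrPredictors, na.action = na.pass)

# Dummy-code the factors (treatment contrasts by default), then drop the intercept.
result <- model.matrix(attr(dfrTmp, "terms"), data = dfrTmp)[, -1]
print(dim(result))
print(colnames(result))
```

Each two-level factor yields one dummy column and the three-level factor yields two, so five input columns become six matrix columns.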
[R] plot not generic
Hello list.

I was trying to see some of the code for plot.glmnet in package glmnet (this function name is in the documentation). After loading the library, I tried the obvious: typing in the name. But I received a message telling me it could not be found.

So I fiddled around a little, and noticed that R does not recognize 'plot' as a generic function, and as such, showMethods does not work. This seems to conflict with the documentation for plot.

So 2 questions:

* How can I find the code of plot.glmnet?
* Why is plot not seen as generic?

Thx.
Nick Sabbe
[R] get type functions (was: RE: plot not generic)
Thanks for that, Vito.

Somehow, I often get lost in the whole slew of similar methods:
* get
* getMethod
* showMethods
* getAnywhere
* methods

Does anybody have a simple list of when to use which one? And maybe I'm even missing some variants?

Thx.
Nick.

-----Original Message-----
From: Vito Muggeo (UniPa) [mailto:vito.mug...@unipa.it]
Sent: Friday 28 January 2011 14:42
To: Nick Sabbe
Cc: r-help@r-project.org
Subject: Re: [R] plot not generic

dear Nick,

getAnywhere(plot.glmnet)

Note the message you get when you type methods(plot):
... Non-visible functions are asterisked

--
Vito M.R. Muggeo
Dip.to Sc Statist e Matem `Vianelli'
Università di Palermo
viale delle Scienze, edificio 13
90128 Palermo - ITALY
tel: 091 23895240
fax: 091 485726/485612
http://dssm.unipa.it/vmuggeo
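As a rough orientation for the lookup functions listed in this thread, the sketch below uses plot.lm from the stats package as a stand-in for plot.glmnet (an assumption made only so the example runs without glmnet installed). In short: get() finds objects visible on the search path, getAnywhere() also searches namespaces and registered S3 methods, methods() lists the S3 methods of a generic, getS3method() fetches one specific S3 method, and showMethods()/getMethod() deal with S4 generics, which is why they report nothing useful for an S3 generic such as plot.

```r
# List the S3 methods known for the generic 'plot'.
m <- utils::methods("plot")

# Find a method even when it is not exported from its namespace.
hidden <- utils::getAnywhere("plot.lm")
hidden$where          # where the object was found

# Fetch the function itself by generic and class.
fn <- utils::getS3method("plot", "lm")
is.function(fn)
```

With glmnet loaded, the same calls with "plot.glmnet" (or generic "plot" and class "glmnet") should behave analogously.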
[R] Injecting code in a package?
Dear list,

I've had this a few times now, and wonder if this is possible: I'm using a package, often for plotting something, but I want to tune the way the plotting goes, in a way that was not foreseen by the maker of the package. Now, most of the time, these kinds of R functions (say pkg::plot.something) call into other R functions (say pkg::plot.something.internal), and it is these that I want to tinker with.

So, my question is: can I replace an R function in a package with a version of my own, without having to somehow rebuild the package? I don't just want a copy of the function that is not bound to the package. I want to make sure that when I call pkg::plot.something, this works as before, but when pkg::plot.something.internal is called from within this function, it calls _my_ version of it.

Any takes?

Nick Sabbe
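The mechanism being asked about can be illustrated without a real package. The sketch below builds a plain environment that plays the role of a package namespace (the names pkg, plot.something and plot.something.internal are the hypothetical ones from the question) and shows that rebinding the internal function changes what the outer function calls. For an installed package, utils::assignInNamespace() is the function usually mentioned for this kind of replacement. Treat it as a debugging aid rather than a supported interface.

```r
# A plain environment standing in for a package namespace (illustration only).
pkg <- new.env()
local({
  plot.something.internal <- function() "original"
  plot.something <- function() plot.something.internal()
}, envir = pkg)

before <- pkg$plot.something()

# Rebind the internal helper inside the same environment.
assign("plot.something.internal", function() "patched", envir = pkg)

after <- pkg$plot.something()
c(before = before, after = after)

# For a real package the analogous call would look like (not run here):
# utils::assignInNamespace("plot.something.internal", myVersion, ns = "pkg")
```

The outer function picks up the new version because it looks up the helper by name in its enclosing environment each time it is called.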
[R] Using a list as multidimensional indexer
Hello list.

Another 'puzzle' for which I don't have a clean solution. Say I have a multidimensional object, e.g.:

Mm <- matrix(1:6, nrow=2, dimnames=list(c("a","b"), c("g","h","i")))

And on the other hand I have a list

Ind <- list("b","g")

This holds, for each dimension, an indexer for that dimension. Now I would like to get the element pointed at by the list. The obvious solutions don't seem to work, and I can't seem to get do.call to call the indexer ('[') on my multidimensional object.

Any suggestions?

Thanks in advance,
Nick Sabbe
Re: [R] Using a list as multidimensional indexer
Hm. I got somewhat further:

Ind2 <- list(Mm,"b","g")
do.call("[", Ind2)

Seems to work. However, now I need it one step beyond: in fact, my actual multidimensional object holds one dimension more than my list holds indexes, i.e. I want the equivalent of Mm["a",]. I tried some variants of

Ind3 <- list(Mm,"b",NULL)
do.call("[", Ind3)

But all of these return integer(0). So the actual new question is: how do I pass a 'missing' argument through a do.call?

Thanks for any pointers,
Nick Sabbe
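One possible way around the 'missing argument' problem, offered as a sketch rather than as the answer given on the list: instead of a truly missing index, pass TRUE for the dimension that should be kept whole. A logical TRUE is recycled over the dimension and so selects everything, which makes it usable inside the argument list for do.call.

```r
Mm  <- matrix(1:6, nrow = 2, dimnames = list(c("a", "b"), c("g", "h", "i")))
Ind <- list("b", "g")

# One index per dimension: prepend the object to the index list.
elem <- do.call("[", c(list(Mm), Ind))

# One index too few: fill the remaining dimension with TRUE ("take all").
row_b <- do.call("[", list(Mm, "b", TRUE))

elem
row_b
```

The same padding idea should extend to higher-dimensional arrays by appending one TRUE per unindexed dimension.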
[R] expand.grid
Hello list.

I feel like an idiot. There exists a method called expand.grid which, from the documentation, appears to do just what I want, but then it doesn't, and I can't get it to behave. Given a data frame

dfr <- data.frame(c1=c("a", "b", NA, "a", "a"), c2=c("d", NA, "d", "e", "e"), c3=c("g", "h", "i", "j", "k"))

I would like to have a data frame with all (unique) combinations of all the factors present. In fact, I would like a simple solution for these two cases: given the three factor columns above, I would like both all _possible_ combinations of the factor levels, and all _present_ combinations of the factor levels (e.g. if I would do this for the first 4 rows of dfr, it would contain no combinations with c3=k). It would also be nice to be able to choose whether or not NAs are included.

I'm convinced that some package holds a ready-made solution, and I'm trying to switch from always writing my own stuff (get the number of levels per column, then use some apply magic) to using what is there, so thanks for any hints,

Nick Sabbe
Re: [R] expand.grid
(slaps self in forehead)

I appear to have misinterpreted the help: considering that it explicitly makes note of factors, I wrongly assumed that it would use the levels of a factor automatically. My bad.

For completeness' sake, my final solution:

getLevels <- function(vec, includeNA=FALSE, onlyOccurring=FALSE)
{
  if(onlyOccurring)
  {
    rv <- levels(factor(vec))
  }
  else
  {
    rv <- levels(vec)
  }
  #cat("levels so far: ", rv, "\n")
  if(includeNA && any(is.na(vec)))
  {
    rv <- c(rv, NA)
  }
  #cat("levels with na: ", rv, "\n")
  return(rv)
}

expand.combs <- function(dfr, includeNA=FALSE, onlyOccurring=FALSE)
{
  expand.grid(lapply(dfr, getLevels, includeNA, onlyOccurring))
}

Thx.
Nick Sabbe

-----Original Message-----
From: Berwin A Turlach [mailto:ber...@maths.uwa.edu.au]
Sent: Wednesday 19 January 2011 11:04
To: Nick Sabbe
Cc: r-help@r-project.org
Subject: Re: [R] expand.grid

G'day Nick,

On Wed, 19 Jan 2011 09:43:56 +0100 Nick Sabbe <nick.sa...@ugent.be> wrote:

> Given a dataframe
> dfr <- data.frame(c1=c("a", "b", NA, "a", "a"), c2=c("d", NA, "d", "e", "e"), c3=c("g", "h", "i", "j", "k"))
> I would like to have a dataframe with all (unique) combinations of all the factors present.

Easy:

R> expand.grid(lapply(dfr, levels))
   c1 c2 c3
1   a  d  g
2   b  d  g
3   a  e  g
4   b  e  g
5   a  d  h
6   b  d  h
7   a  e  h
8   b  e  h
9   a  d  i
10  b  d  i
11  a  e  i
12  b  e  i
13  a  d  j
14  b  d  j
15  a  e  j
16  b  e  j
17  a  d  k
18  b  d  k
19  a  e  k
20  b  e  k

> In fact, I would like a simple solution for these two cases: given the three factor columns above, I would like both all _possible_ combinations of the factor levels, and all _present_ combinations of the factor levels (e.g. if I would do this for the first 4 rows of dfr, it would contain no combinations with c3=k).

R> dfrpart <- lapply(dfr[1:4,], factor)
R> expand.grid(lapply(dfrpart, levels))
   c1 c2 c3
1   a  d  g
2   b  d  g
3   a  e  g
4   b  e  g
5   a  d  h
6   b  d  h
7   a  e  h
8   b  e  h
9   a  d  i
10  b  d  i
11  a  e  i
12  b  e  i
13  a  d  j
14  b  d  j
15  a  e  j
16  b  e  j

> It would also be nice to be able to choose whether or not NA's are included.

R> expand.grid(lapply(dfrpart, function(x) c(levels(x),
+      if(any(is.na(x))) NA else NULL)))
     c1   c2 c3
1     a    d  g
2     b    d  g
3  <NA>    d  g
4     a    e  g
5     b    e  g
6  <NA>    e  g
7     a <NA>  g
8     b <NA>  g
9  <NA> <NA>  g
10    a    d  h
11    b    d  h

HTH.

Cheers,
Berwin

== Full address ==
Berwin A Turlach
School of Maths and Stats (M019)
The University of Western Australia
35 Stirling Highway
Crawley WA 6009, Australia
Tel.: +61 (8) 6488 3338 (secr), +61 (8) 6488 3383 (self)
FAX : +61 (8) 6488 1028
e-mail: ber...@maths.uwa.edu.au
http://www.maths.uwa.edu.au/~berwin
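A compact, runnable restatement of the two main cases from this thread follows. One assumption is made explicit: stringsAsFactors = TRUE is passed to data.frame(), because the thread relies on the columns being factors, and character columns are no longer converted automatically in R 4.0.0 and later.

```r
dfr <- data.frame(
  c1 = c("a", "b", NA, "a", "a"),
  c2 = c("d", NA, "d", "e", "e"),
  c3 = c("g", "h", "i", "j", "k"),
  stringsAsFactors = TRUE   # the thread assumes factor columns
)

# All possible combinations of the declared factor levels.
allPossible <- expand.grid(lapply(dfr, levels))

# Restrict to the levels that actually occur in the first four rows:
# calling factor() again drops unused levels such as c3 = "k".
first4 <- lapply(dfr[1:4, ], factor)
occurring <- expand.grid(lapply(first4, levels))

nrow(allPossible)
nrow(occurring)
```

The row counts are simply the products of the number of levels per column.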
Re: [R] Repeating value occurence
It is not exactly clear from your message what you want.

If you want n random values holding either -1, 0 or 1, use

sample(c(-1,0,1), 10, replace=TRUE)

or also

sample(3, 10, replace=TRUE)-2

If you want n values following the pattern -1, 0, 1, 0 as your example seems to follow, use

n <- 10
pattern <- c(-1,0,1,0)
rep(pattern, ceiling(n/length(pattern)))[1:n]

If you want a sequence of random real numbers between -1 and 1, use

runif(10, min=-1, max=1)

Here's hoping I haven't just solved your homework...

Nick Sabbe

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Rustamali Manesiya
Sent: Thursday 13 January 2011 5:12
To: r-help@r-project.org
Subject: [R] Repeating value occurence

How can I achieve this in R using the seq or rep function:

c(-1,0,1,0,-1,0,1,0,-1,0)

The range value is between -1 and 1, and I want it such that there could be n number of points between -1 and 1.

Anyone? Please help.

Thanks
Rusty
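For the repeating-pattern case, rep() has a length.out argument that does the truncation directly. The sketch below should give the same vector as the ceiling-and-subset version in the reply.

```r
n <- 10
pattern <- c(-1, 0, 1, 0)

# Recycle the pattern until exactly n values have been produced.
viaLengthOut <- rep(pattern, length.out = n)

# The version from the reply, for comparison.
viaSubset <- rep(pattern, ceiling(n / length(pattern)))[1:n]

viaLengthOut
```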
Re: [R] Numbers in a string
Hi Felipe,

gsub("[^0123456789]", "", "AB15E9SDF654VKBN?dvb.65")

results in "15965465". Would that be what you are looking for?

Nick Sabbe

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Rainer Schuermann
Sent: Wednesday 15 December 2010 11:19
To: r-help@r-project.org
Subject: Re: [R] Numbers in a string

If your OS is Linux, you might want to look at sed or gawk. They are very good and efficient for such tasks. Do you need it once or as part of a program? Some samples would be helpful...

Rgds,
Rainer

-------- Original Message --------
Date: Wed, 15 Dec 2010 16:55:26 +0800
From: Luis Felipe Parra <felipe.pa...@quantil.com.co>
To: r-help <r-help@r-project.org>
Subject: [R] Numbers in a string

Hello, I have strings which have all sorts of characters (numbers, letters, punctuation marks, etc.). I would like to keep only the numbers in them. Does somebody know how to do this?

Thank you
Felipe Parra
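The same idea works on a whole character vector at once, and the character class can be written as a range. A small sketch (the second input string is made up for illustration):

```r
x <- c("AB15E9SDF654VKBN?dvb.65", "no digits here")

# Remove every character that is not a digit; gsub is vectorised over x.
digitsOnly <- gsub("[^0-9]", "", x)
digitsOnly

# The result is still character; convert explicitly if numbers are needed.
asNumber <- as.numeric(digitsOnly[1])
```

Strings without any digits come back as empty strings, which as.numeric() would turn into NA.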
Re: [R] problem accessing complex list data frames
Hello Germán.

You probably want something like:

sapply(vmat, function(curMat){ curMat[,999] != 0 })

Or if you want the indices, just surround this with a which.

HTH.
Nick Sabbe

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Germán Sanchis
Sent: Wednesday 8 December 2010 11:50
To: r-help@r-project.org
Subject: [R] problem accessing complex list data frames

Hi all.

I am currently attempting to build a list of sparse matrices. That I have already achieved, by

vmat <- list()
for (i in 1:n) {
  vmat <- c(vmat, sparseMatrix(i,j,x=data))
}

Now I am trying to select those elements from the list where the column e.g. 999 is not null. I can do this for one of the sparse matrices with

which(vmat[[1]][,999] != 0)

which returns the rows where such column is non-zero. However, my purpose is to obtain the list indices of the sparse matrices with such non-zero elements. I tried things like

which(vmat[[]][,999] != 0)
which(vmat[,,999] != 0)
sapply(vmat, which, [,999] != 0)

but none worked... any help will be appreciated!!

Cheers,
German
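If the goal is one index per list element (which matrices have any non-zero entry in the column), reducing each comparison with any() before calling which() may be closer to what was asked. The sketch below uses small dense base-R matrices and column 2 purely so that it runs without the Matrix package. The assumption is that the same pattern carries over to sparse matrices.

```r
# Three small matrices standing in for the sparse ones (illustration only).
vmat <- list(
  matrix(c(0, 0, 1, 0), nrow = 2),  # column 2 has a non-zero entry
  matrix(0, nrow = 2, ncol = 2),    # all zeros
  matrix(c(0, 0, 0, 5), nrow = 2)   # column 2 has a non-zero entry
)
col <- 2

# One TRUE/FALSE per matrix, then the positions of the TRUEs in the list.
hasNonZero <- vapply(vmat, function(curMat) any(curMat[, col] != 0), logical(1))
hits <- which(hasNonZero)
hits
```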
[R] Dataframe from list of similar lists: not _a_ way, but _the best_ way
Hi All.

I often find myself in this situation:
* Based on some vector (or list) of values, I need to calculate a few new values for each of them, where some of the new values are numbers, but some are more of a descriptive nature (so: character strings)
* So I use e.g. sapply, passing a custom function that returns a list with all the calculated values
* The result of this is: a list (= the return value of sapply) of lists, that all have the same kind of named values

A silly example:

list.of.lists <- sapply(1:10, function(nr){list(org=nr, chr=as.character(nr))})

It seems rather obvious that the result would be better structured as a data frame. Now I know a few ways to do this (using do.call), but I fear most of these perform rather badly: I suspect all the data is being copied repeatedly, which may be slow.

So, my question to the specialists:
* Is the above way of working reasonable for this kind of problem? Or would you suggest otherwise?
* What would be the best (as in: quickest) way of transforming this list of lists to a data frame? The answer to this is probably based upon knowledge of the inner workings of R? Or is there any way in which this depends on the specifics of my function (for nontrivial functions and list sizes)?

Thanks!

Nick Sabbe
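Two common ways of getting from a list of lists to a data frame are sketched below. This is an illustration, not a benchmark: the column-wise version extracts each field once with vapply(), while the row-wise version builds one small data frame per element and binds them, which tends to involve more copying. Note that lapply() is used here instead of sapply(), so the intermediate result really is a plain list of lists.

```r
lol <- lapply(1:10, function(nr) list(org = nr, chr = as.character(nr)))

# Column-wise: pull each field out across all elements, then assemble.
dfCols <- data.frame(
  org = vapply(lol, function(el) el$org, integer(1)),
  chr = vapply(lol, function(el) el$chr, character(1)),
  stringsAsFactors = FALSE
)

# Row-wise: one-row data frames bound together.
dfRows <- do.call(rbind, lapply(lol, as.data.frame, stringsAsFactors = FALSE))

str(dfCols)
```

For large lists, timing both on representative data with system.time() is the safest way to decide.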
Re: [R] small problem in coding
Hello Mike.

I'm not clear on why you would want to state your g parameter (in fact, I don't even know for sure what you mean by that) with

lamda <- c(g=0.2)

If you want a variable (vector) g containing 0.2, why don't you simply do:

g <- 0.2

If you need that lamda thing for some reason later on, you can always do

lamda <- c(g=g)

afterwards to get the same effect.

If you have some reason not to do this: with your statement, you create a vector lamda, with one item in it, and that first and only item is named g. So from your statement, you can access g by lamda["g"], as in:

Q <- exp(lamda["g"])

It looks like you've got a misunderstanding of how R variables work, but maybe I just misunderstood your question...

HTH,
Nick Sabbe

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Mike Gibson
Sent: Friday 26 November 2010 9:13
To: r-help@r-project.org
Subject: [R] small problem in coding

I must be missing something. I first state my g parameter with:

lamda <- c(g=0.2)

However, when I do the next step R is telling me "object g not found". Here is my next step:

Q <- exp(g)

??? Any help would be greatly appreciated.

Mike
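A short sketch of the distinction made in this reply: naming an element inside c() does not create a variable of that name, so the element has to be reached through the vector. Using double brackets returns the bare value without the name attached.

```r
lamda <- c(g = 0.2)      # a length-one numeric vector whose element is named "g"

names(lamda)             # the name belongs to the element, not the workspace

Qnamed <- exp(lamda["g"])    # result keeps the name "g"
Q      <- exp(lamda[["g"]])  # result is an unnamed number

Qnamed
Q
```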
Re: [R] Simple Function
Hi Nikos.

There is quite a bit going on here, both in the code and in your terminology. You should really consider reading "An Introduction to R" that comes with your R installation.

A few pointers though:
* In R speak, you have nowhere declared 2 global matrices: it is not completely clear why you use code like y <- c(NA) to try to achieve such a thing, but if I'm not mistaken, this creates a logical vector of length 1. Surely not a matrix.
* Operator <- only looks for variables in the environment in which they are evaluated, as does = (note: I would advise you to use <- in R as an assignment operator instead of =). If you want to change variables in other environments, particularly the global environment, you need to use <<- (?<<- does not seem to work to get you to its help page, but open R help, then find the search page and search for <<-, for more information).
* Apart from that: you may want to avoid the for loop here altogether:

y[i:10] <- (i:10)+1
f[i:10] <- y[(i-1):9]/2

gives you the same result, but more in the R fashion (in general, you want to avoid explicit for loops in R).

HTH,
Nick Sabbe

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of rnick
Sent: Wednesday 10 November 2010 7:51
To: r-help@r-project.org
Subject: [R] Simple Function

Hi guys,

Very new to R and your help would be highly appreciated for the following problem. I am trying to create a simple function which registers values within an array through a for loop. These are the steps I have followed:
1) Declared 2 global matrices
2) Created function mat() with i as an input
3) Constructed the for loop
4) Called mat(2)

The problem is that when I try to get y[4] and f[5] the output is:

[1] NA

My concern is that I am not addressing any of the following topics:
1) definition of global variable
2) the argument does not go through the for loop
3) the matrices definition is not correct
4) other

Please check my code below:

y=c(NA)
f=c(NA)
mat<-function(i)
{
  for (k in i:10)
  {
    y[k]=k+1
    f[k]=y[k-1]/2
  }
}
mat(2)

Any thoughts or recommendations would be highly appreciated.

Thanks in advance,
N

--
View this message in context: http://r.789695.n4.nabble.com/Simple-Function-tp3035572p3035572.html
Sent from the R help mailing list archive at Nabble.com.
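An alternative to modifying global variables from inside the function, sketched here as one possible design rather than the one proposed in the thread: let the function build the vectors locally and return them, and assign the result at the call site. The argument n and the list return value are additions for illustration, and the sketch assumes i is at least 2.

```r
mat <- function(i, n = 10) {
  # Local vectors; positions before i stay NA, as in the original loop.
  y <- rep(NA_real_, n)
  f <- rep(NA_real_, n)
  y[i:n] <- (i:n) + 1
  f[i:n] <- y[(i - 1):(n - 1)] / 2
  list(y = y, f = f)     # return both results to the caller
}

res <- mat(2)
res$y[4]
res$f[5]
```

This keeps the function free of side effects, so nothing outside it changes unless the caller assigns the returned values.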
Re: [R] Help with Iterator
I guess what you want is:
- change the line
  xnew <- f(xold,data)
  into
  xnew <- f(xold,data, itel)
- change your mat function to take itel as an extra parameter:
  mat <- function (x, data=NULL, itel) {return (1+x^itel)}

That should do the trick (though I haven't checked whether the rest of your code is OK).

Nick Sabbe

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of zhiji19
Sent: Tuesday 9 November 2010 9:02
To: r-help@r-project.org
Subject: [R] Help with Iterator

Dear Experts,

The following is my Iterator. When I try to write a new function with itel, I get an error. This is what I have:

supDist <- function(x,y) return(max(abs(x-y)))

myIterator <- function(xinit,f,data=NULL,eps=1e-6,itmax=5,verbose=FALSE) {
  xold <- xinit
  itel <- 0
  repeat {
    xnew <- f(xold,data)
    if (verbose) {
      cat(
        "Iteration: ",formatC(itel,width=3, format="d"),
        "xold: ",formatC(xold,digits=8,width=12,format="f"),
        "xnew: ",formatC(xnew,digits=8,width=12,format="f"),
        "\n"
      )
    }
    if ((supDist(xold,xnew) < eps) || (itel == itmax)) {
      return(xnew)
    }
    xold <- xnew; itel <- itel+1
  }
}

mat <- function (x, data=NULL) {return (1+x^itel)}

myIterator(3, f=mat, verbose=TRUE)
Error in f(xold, data) : object 'itel' not found

Can anyone please help me to fix the error?

--
View this message in context: http://r.789695.n4.nabble.com/Help-with-Iterator-tp3033254p3033254.html
Sent from the R help mailing list archive at Nabble.com.
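Putting the two suggested changes together gives something like the trimmed sketch below (the verbose printing is left out for brevity, and itmax = 3 is chosen only to keep the numbers small). The error arises because itel is a local variable of myIterator and is not visible inside mat unless it is passed in as an argument.

```r
supDist <- function(x, y) max(abs(x - y))

myIterator <- function(xinit, f, data = NULL, eps = 1e-6, itmax = 5) {
  xold <- xinit
  itel <- 0
  repeat {
    xnew <- f(xold, data, itel)          # pass the iteration counter along
    if ((supDist(xold, xnew) < eps) || (itel == itmax)) {
      return(xnew)
    }
    xold <- xnew
    itel <- itel + 1
  }
}

# The update function now receives itel explicitly.
mat <- function(x, data = NULL, itel) 1 + x^itel

res <- myIterator(3, f = mat, itmax = 3)
res
```

With this particular update function the iterates grow quickly (3, 2, 3, 10, 1001, ...), so the run ends by hitting itmax rather than by converging.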
Re: [R] How to eliminate this for loop ?
I doubt this to be true. Try this in R:

dmy <- rep(1,5)
dmy[2:5] <- dmy[1:4]+1

This is equivalent to what you propose (even simpler), but it does not, as OP seems to have wanted, fill dmy with 1,2,3,4,5, but, as I had expected, with 1,2,2,2,2. I would be interested in knowing what exactly the difference between my example above and the one you suggest is.

As others have suggested: another way is to use actual recursive calls, but I seriously doubt these to be more efficient. You should probably only use it if you really hate to type the word 'for' (-: Though I would also like to see an example where they prove to be the better way to go (by any criteria, but preferably speed or perhaps other resource usage).

Nick Sabbe

-----Original Message-----
From: David Winsemius [mailto:dwinsem...@comcast.net]
Sent: Monday 8 November 2010 15:04
To: Nick Sabbe
Cc: 'PLucas'; r-help@r-project.org
Subject: Re: [R] How to eliminate this for loop ?

On Nov 8, 2010, at 4:30 AM, Nick Sabbe wrote:

> Whenever you use a recursion (that cannot be expressed otherwise), you always need a (for) loop.

Not necessarily true ... assuming "a" is of length "n":

a[2:n] <- a[1:(n-1)]*b + cc[1:(n-1)]

# might work if b and n were numeric vectors of length 1 and cc had length = n. (Never use "c" as a vector name.)
# it won't work if there are no values for the nth element at the beginning and you are building up "a" element by element.

And you always need to use operations that are appropriate to the object type. So if "a" really is a list, this will always fail since arithmetic does not work on list elements. If on the other hand, the OP were incorrect in calling this a list and "a" were a numeric vector, there might be a chance of success if the rules of indexing were adhered to. The devil is in the details and the OP has not supplied enough code to tell what might happen.

--
David.

David Winsemius, MD
West Hartford, CT
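To make the disagreement in this thread concrete, the sketch below first reproduces the 1,2,2,2,2 result of the vectorised assignment, and then shows one way to write the OP's recurrence without typing 'for': Reduce() with accumulate = TRUE, which carries the previous result forward. The values of b and cc are made up for illustration, and Reduce still iterates internally, so this is about style rather than a claim of better speed.

```r
# Vectorised assignment uses the OLD values on the right-hand side.
dmy <- rep(1, 5)
dmy[2:5] <- dmy[1:4] + 1
dmy

# The recurrence a[i] = a[i-1] * b - cc[i-1], with hypothetical inputs.
b  <- 0.5
cc <- c(1, 2, 3, 4)
N  <- length(cc) + 1

# Reference: the explicit loop from the original question.
aLoop <- numeric(N)
aLoop[1] <- 1
for (i in 2:N) aLoop[i] <- aLoop[i - 1] * b - cc[i - 1]

# Same recurrence via Reduce, starting from the initial value 1.
aReduce <- Reduce(function(prev, ci) prev * b - ci, cc, accumulate = TRUE, 1)

rbind(aLoop, aReduce)
```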
Re: [R] How to eliminate this for loop ?
Whenever you use a recursion (that cannot be expressed otherwise), you always need a (for) loop. Apply and the like do not allow you to use the intermediary results (i.e. a[i-1] to calculate a[i]). So: no, it cannot be avoided in your case, I guess.

Nick Sabbe

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of PLucas
Sent: Monday 8 November 2010 10:26
To: r-help@r-project.org
Subject: [R] How to eliminate this for loop ?

Hi, I would like to create a list recursively and eliminate my for loop:

a <- c()
a[1] <- 1; # initial value
for(i in 2:N) {
  a[i] <- a[i-1]*b - c[i-1] # b is a value, c is another vector
}

Is it possible?

Thanks

--
View this message in context: http://r.789695.n4.nabble.com/How-to-eliminate-this-for-loop-tp3031667p3031667.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Random Integer Number in Uniform Distribution
Check ?sample.

Nick Sabbe

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Gundala Viswanath
Sent: Monday 25 October 2010 8:38
To: r-h...@stat.math.ethz.ch
Subject: [R] Random Integer Number in Uniform Distribution

Is there a way to do it? At best what I can achieve is non-integer:

> runif(10, min=1, max=100)
 [1] 51.959151 56.654146 63.630251  3.172794  4.073018 11.977437 86.601869
 [8] 75.788618 11.734361  6.770962

-G.V.
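Spelled out, the pointer to ?sample amounts to something like the following sketch. replace = TRUE is the assumption that makes the draws independent, so that the same integer can come up more than once.

```r
set.seed(1)   # only so that the example is reproducible

# Ten integers drawn uniformly from 1..100, with replacement.
draws <- sample(1:100, 10, replace = TRUE)
draws
```

Dropping replace = TRUE would instead give ten distinct integers, which is a different distribution for the sample as a whole.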
Re: [R] if statement and truncated distribution
What I guess you want is something like (this is for zero-truncation):

rZeroTruncNormal1d <- function(mu, sig, invalidSign)
#sig holds standard deviation!
{
  val <- rnorm(1, mu, sig)
  while(val * invalidSign > 0)
  {
    val <- rnorm(1, mu, sig)
  }
  return(val)
}

Nick Sabbe

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Sally Luo
Sent: Monday 25 October 2010 2:01
To: r-help@r-project.org
Subject: [R] if statement and truncated distribution

Hi R helpers,

I am trying to use the if statement to generate a truncated random variable as follows:

if (y[i]==0) { v[i] ~ rnorm(1,0,1) | (-inf, 0) }
if (y[i]==1) { v[i] ~ rnorm(1,0,1) | (0, inf) }

I guess I cannot use | ( , ) to restrict the range of a variable in R. Could you let me know how to write the code correctly in R?

Many thanks for your help.

Maomao
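As a complement to the rejection loop in the reply, here is a sketch of a different technique, inverse-CDF sampling, which draws all values in one vectorised call. The function name rTruncNormal and its arguments are made up for this illustration. For truncation regions far out in the tails, numerical precision of pnorm/qnorm becomes a concern, so this is not a drop-in replacement in every case.

```r
# Draw n values from N(mu, sig^2) restricted to the interval (lower, upper).
rTruncNormal <- function(n, mu = 0, sig = 1, lower = -Inf, upper = Inf) {
  pLow  <- pnorm(lower, mu, sig)   # probability mass below the interval
  pHigh <- pnorm(upper, mu, sig)   # probability mass up to the upper end
  u <- runif(n, pLow, pHigh)       # uniform draws within that mass range
  qnorm(u, mu, sig)                # map back through the normal quantile function
}

set.seed(42)
y <- c(0, 1, 1, 0, 1)

# Negative draws where y == 0, positive draws where y == 1.
v <- numeric(length(y))
v[y == 0] <- rTruncNormal(sum(y == 0), upper = 0)
v[y == 1] <- rTruncNormal(sum(y == 1), lower = 0)
v
```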
Re: [R] Find index of a string inside a string?
For simple searches, use regexpr with fixed=TRUE: it returns the position of the match within the string, whereas grep only reports which elements of a vector contain a match. Both are documented under ?grep.

Nick Sabbe
--
ping: nick.sa...@ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36

-- Do Not Disapprove

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of yoav baranan
Sent: Monday 25 October 2010 13:27
To: r-help@r-project.org
Subject: [R] Find index of a string inside a string?

Hi,

I am searching for the equivalent of the function Index from SAS. In SAS, index("abcd", "bcd") will return 2 because "bcd" starts at the 2nd position of the "abcd" string. The equivalent in R should do this:

    myIndex <- foo("abcd", "bcd") # return 2

What is the function that I am looking for? I want to use the return value in substr, like I do in SAS.

thanks,
y. baranan.
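The ?grep help page also documents regexpr, which reports where in a string a match starts. A minimal sketch of the SAS-style lookup from the question, assuming a literal (non-regex) search; the names haystack and needle are illustrative only:

```r
haystack <- "abcd"
needle   <- "bcd"

# regexpr() gives the start position of the first match, or -1 if none;
# fixed = TRUE treats the pattern as a literal string, not a regex.
myIndex <- as.integer(regexpr(needle, haystack, fixed = TRUE))
print(myIndex)

# Use the position in substr(), as the question intends
found <- substr(haystack, myIndex, myIndex + nchar(needle) - 1)
print(found)

# A pattern that does not occur yields -1 rather than 0 or NA
noMatch <- as.integer(regexpr("xyz", haystack, fixed = TRUE))
print(noMatch)
```

One difference from SAS worth keeping in mind: a failed search gives -1 here, so code that tests the result against 0 would need adjusting.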
Re: [R] printing a variable during a loop
At least in the Windows version, there is a menu option that might resolve your issue: in Rgui, under Misc, there is the option Buffered Output, which is checked by default. Unchecking it seems to make sure that messages, print statements and cat output are rendered immediately. A likely consequence is that your code will run somewhat slower. If you use output as 'progress control', you definitely want to turn the option off.

Nick Sabbe
--
ping: nick.sa...@ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36

-- Do Not Disapprove

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of j.delashe...@ed.ac.uk
Sent: Friday 22 October 2010 13:00
To: Joshua Wiley
Cc: R-help
Subject: Re: [R] printing a variable during a loop

Thank you for this! I had also wanted to do this in the past, and ended up writing dummy files with informative names to a folder I set up to collect these messages, so I'd check the folder to see the new files being generated... It did the job, and at the same time I could see how long it took for my program to reach certain points (file creation time), but it was not the most elegant! I didn't know about flush.console().

However, I used that approach to generate some diagnostic files, so that if a complex process broke (sometimes it involved several system calls to external programs) I had good information about what the program was doing and at what stage it failed. I created a vector to store the names of all the files being generated, so they could be removed automatically afterwards.

Not what the OP wanted, but this strategy may be useful for certain tasks.

Jose

Quoting Joshua Wiley <jwiley.ps...@gmail.com>:

On Thu, Oct 21, 2010 at 12:03 PM, David Winsemius <dwinsem...@comcast.net> wrote:

On Oct 21, 2010, at 8:58 PM, Antonio Olinto wrote:

Thanks Adrienne, but I am still in doubt. The behavior of print and message looks the same. Nothing is displayed on the screen after minutes of routine processing. All values of i are displayed only when I press the stop button (I'm under Windows) or when i reaches the maximum value.

In the past people have needed to use flush.console() to get output to the screen. Unable to test since A) I'm not running your OS, and B) no reproducible example offered.

I am running your OS (though it would also be nice if you reported the results of sessionInfo()). In any case, this worked for me on R 2.12.0 (i386-pc-mingw32):

    for(i in 1:6) {Sys.sleep(3); print(i); flush.console()}

For your problem, I imagine something like (though untested because no data):

    for (i in 1:23194) {
      dat.stat[i,c(2:8)] <- quantile(dat.bat[BL==block[i],2],prob=c(0,0.025,0.25,0.5,0.75,0.975,1))
      print(i)
      flush.console()
    }

Thanks again,
Antônio Olinto

Quoting Adrienne Wootten <amwoo...@ncsu.edu>:

Instead of print, use this:

    message(i)

The message command is used for things like this, and it will print the value of i as you are looping through. You can also do this:

    message("Counter value is: ", i)

which returns, for i = 20 for example:

    Counter value is: 20

For more, check out the message help page:

    ?message

Adrienne Wootten
NCSU

On Thu, Oct 21, 2010 at 2:05 PM, Antonio Olinto <aolint...@bignet.com.br> wrote:

Hello,

About looping, consider the example:

    for (i in 1:23194) {
      dat.stat[i,c(2:8)] <- quantile(dat.bat[BL==block[i],2],prob=c(0,0.025,0.25,0.5,0.75,0.975,1))
      print(i)
    }

I'd like to have the value of i printed for each loop (step). As far as I could see, the values of i are shown on screen only after all the work is done.

Thanks in advance for any suggestion.

Best regards,
Antonio

--
David Winsemius, MD
West Hartford, CT

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

--
Dr. Jose I. de las Heras
Email: j.delashe...@ed.ac.uk
The Wellcome Trust Centre for Cell Biology
Phone: +44 (0)131 6507095
Institute for Cell & Molecular Biology
Fax: +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
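For reference, the progress-reporting approach discussed in this thread can be sketched as follows. This is only an illustration: the iteration count n, the counter done and the Sys.sleep() call standing in for real work are assumptions, not the original poster's data or code.

```r
n    <- 5   # hypothetical number of iterations
done <- 0   # counts completed iterations

for (i in seq_len(n)) {
  Sys.sleep(0.1)                        # stand-in for the real computation
  done <- done + 1
  message("Iteration ", i, " of ", n)   # message() writes to stderr
  flush.console()                       # push buffered output to the console now
}
```

Calling flush.console() on every iteration keeps the output current without changing the Rgui buffering option; on platforms where console output is not buffered, the call simply has no visible effect. For very long loops, printing only every k-th iteration would reduce the overhead.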