Re: [R] Ignoring missing elements in data.frame()
Hi,

One possible way to get around it is the following idea:

    X1 <- rnorm(10)
    X2 <- rnorm(10)
    Names <- c("X1", "X2", "X3")
    Names <- Names[Names %in% ls()]   # keep only the objects that exist
    n <- length(Names)
    p <- 10   # length of each object
    output <- matrix(NA, ncol = n, nrow = p)
    for (i in 1:n) {
      output[, i] <- get(Names[i])
    }
    output <- as.data.frame(output)
    names(output) <- Names

You can also use an eval-parse construct like this:

    ## Alternative
    Names <- c("X1", "X2", "X3")
    Names <- Names[Names %in% ls()]
    Names <- paste(Names, collapse = ",")
    expr = paste("output <- data.frame(", Names, ")", sep = "")
    eval(parse(text = expr))

Neither is really an optimal solution, but both work. It would be better to create a list or matrix beforehand and store the result of each calculation in it whenever the calculation actually gives a result.

Cheers
Joris

On Sat, Jun 5, 2010 at 1:23 AM, Scott Chamberlain scham...@rice.edu wrote:

Hello,

I am trying to make a data frame from many elements after running a function which creates many elements, some of which may not end up being real elements due to errors or missing data. For example, I have the following three elements p1s, p2s, and p3s. p9s did not generate the same data as there was an error in the function for some reason. I currently have to delete p9s from the data.frame() command to get the data.frame to work. How can I make a data frame by somehow ignoring elements (e.g., p9s) that do not exist, without having to delete each missing element from data.frame()? Below is an example of the code.
    p1s
         statistic parameter      p.value
    [1,]  3.606518       153 0.0004195377
    p2s
         statistic parameter     p.value
    [1,] -3.412436         8 0.009190015
    p3s
         statistic parameter   p.value
    [1,]  1.543685       599 0.1231928

    t(data.frame(t(p1s),t(p2s),t(p3s),t(p9s)))
    Error in t(p9s) : object 'p9s' not found

Thanks,
Scott Chamberlain
Rice University
Houston, TX

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
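The advice given in the reply in this thread (fill a list as results come in, instead of creating separate objects) could be sketched roughly as follows. This is only an illustration: run_test() is a hypothetical stand-in for the poster's function, and the failure for "p9s" is simulated.

```r
# Hypothetical stand-in for the poster's function: returns a 1 x 3 matrix,
# or fails for some inputs (here the input called "p9s").
run_test <- function(name) {
  if (name == "p9s") stop("simulated failure")
  matrix(c(1.5, 10, 0.05), nrow = 1,
         dimnames = list(NULL, c("statistic", "parameter", "p.value")))
}

ids <- c("p1s", "p2s", "p3s", "p9s")
results <- list()
for (id in ids) {
  # tryCatch() turns an error into NULL, so a failed run is simply skipped
  res <- tryCatch(run_test(id), error = function(e) NULL)
  if (!is.null(res)) results[[id]] <- res
}

# Stack whatever succeeded; no object names have to be typed by hand
out <- do.call(rbind, results)
rownames(out) <- names(results)
print(out)
```

With this layout a missing result never has to be deleted from a data.frame() call, because it never enters the list in the first place.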
Re: [R] Wilcoxon test output as a table
# not tested

    out <- rbind(as.numeric(Wnew), as.numeric(P))
    rownames(out) <- c("Wnew", "P")

Cheers

On Sat, Jun 5, 2010 at 11:18 PM, Iurie Malai iurie.ma...@gmail.com wrote:

Hi!

Some time ago I searched for a way to get the Wilcoxon test results as a more or less formatted table. Nobody suggested a solution and I found nothing on the Internet. Recently I came across this link (http://myowelt.blogspot.com/2008/04/beautiful-correlation-tables-in-r.html), which helped me to find a solution. Here's the solution (I'm using R Commander):

    W <- as.matrix(lapply(Dataset[2:11], function(x) wilcox.test(x ~ GrFac, alternative="two.sided", data=Dataset)$statistic))
    P <- as.matrix(lapply(Dataset[2:11], function(x) wilcox.test(x ~ GrFac, alternative="two.sided", data=Dataset)$p.value))
    W <- format(W, digits = 5, nsmall = 2)
    P <- format(P, digits = 1, nsmall = 3)
    Wnew <- matrix(paste(W), ncol=ncol(Dataset[2:11]))
    colnames(Wnew) <- paste(colnames(Dataset[2:11]))
    Wnew
    P

This is the output (excerpt):

    Wnew
         X1      X2      X3      X4      X5      X6      X7      X8      IA      IV
    [1,] 4582.50 4335.50 4610.50 4008.50 6409.50 6064.50 5126.50 6861.50 4305.50 5769.00
    P
     [1] 0.301 0.100 0.336 0.013 5e-04 0.008 0.756 4e-06 0.089 0.059

Can anyone share their views or propose an improvement? For example, how can I make it appear as a table, not as separate rows? How do I remove the quotes? How do I show the row names W and P?

Regards,
Iurie Malai, Senior Lecturer
Department of Psychology
Faculty of Psychology and Special Education
Ion Creanga Moldova Pedagogical State University - www.upsm.md
http://en.wikipedia.org/wiki/Ion_Creang%C4%83_Pedagogical_State_University
Re: [R] (no subject)
OK, as you're new:

1) This is a list about R, not about statistics.
2) It looks an awful lot like a homework assignment. People here tend not to be keen on solving those.
3) READ THE POSTING GUIDELINES. Seriously, read them. http://www.R-project.org/posting-guide.html

As a tip: go through the archives and search for "zero inflated negative binomial" or "ZINB". You'll find tons of discussions about the code, including very recent ones.

Cheers
Joris

On Sat, Jun 5, 2010 at 3:19 PM, cahyo kristiono cahyo_kristi...@yahoo.com wrote:

Dear Sirs,

First let me introduce myself. My name is Kristiono. I want to ask you to help me with ZINB (Zero Inflated Negative Binomial) regression modeling, step by step. I have trouble working out:

1. How to get the log likelihood function of ZINB (step by step)
2. How to get the first and second derivatives to obtain the MLE by Newton-Raphson (step by step)
3. The program syntax

Please help me to solve the problems above, because I really need it soon. I will thank you a lot for your help.

Sincerely,
Kristiono
Re: [R] R2HTML problem
Tinn-R uses the R2HTML package itself for communication with R. You could ask JC Faria, who wrote Tinn-R, what exactly is going on there. You might get more help here: http://sourceforge.net/projects/tinn-r/support

Personally, I'd just use a different editor in this case. I love Tinn-R, but it's no use programming in an editor that interferes with your code.

Cheers
Joris

On Sat, Jun 5, 2010 at 4:57 PM, RGtk2User iagoco...@gmail.com wrote:

I'm developing an application with R and Gtk+. It's just a simple GUI which helps new users to interact with R. The thing is, when you do a statistical analysis, I also want to provide an HTML report, but HTMLStart doesn't work properly when executed from Tinn-R. It does create the file but not empty. I've tried some examples from different websites, and it's always the same: it works if I execute it from the R prompt, but not when I execute it from Tinn-R.

So, any ideas? I'm trying to split up the internal code of the HTMLStart function to find out where it crashes, but I haven't found it yet.

Thanks in advance
Re: [R] Wilcoxon test output as a table
I can't reproduce those warnings with your code and your dataset. I also noticed some other unwanted behaviour when using as.numeric(): it changes the formatting again. You won't get rid of the "" that indicates it's a character, and you won't be able to format the numbers per row, as every column in a data frame or a matrix has a single format.

If you want to generate output for a function or so, you can play around with cat() (see ?cat). If it's for a report, think about using LaTeX or HTML and the xtable package. There are other options, but that requires a bit more info.

And your code is not very optimal:

    setwd("c:/Temp")
    Dataset <- read.table("Dataset.txt", header=T, sep=",")
    W <- apply(Dataset[2:11], 2, function(x) wilcox.test(x ~ GrFac, alternative="two.sided", data=Dataset)$statistic)
    P <- apply(Dataset[2:11], 2, function(x) wilcox.test(x ~ GrFac, alternative="two.sided", data=Dataset)$p.value)
    W <- format(W, digits = 5, nsmall = 2)
    P <- format(P, digits = 1, nsmall = 3)
    out <- rbind(W, P)
    rownames(out) <- c("W", "P")
    colnames(out) <- colnames(Dataset[2:11])

If you know LaTeX, you can use the xtable package to get:

    library(xtable)
    xtable(out)   # latex output
    # html output
    outtable <- xtable(out)
    print(outtable, type="html")

On Sat, Jun 5, 2010 at 11:35 PM, Iurie Malai iurie.ma...@gmail.com wrote:

Thank you, Joris! I received two identical warnings:

    [14] WARNING: Warning in if (nchar(cmd) <= width) return(cmd) : the condition has length > 1 and only the first element will be used
    [15] WARNING: Warning in if (nchar(cmd) <= width) return(cmd) : the condition has length > 1 and only the first element will be used

2010/6/6 Joris Meys jorism...@gmail.com

    # not tested
    out <- rbind(as.numeric(Wnew), as.numeric(P))
    rownames(out) <- c("Wnew", "P")
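For readers who want to try the table-building idea from this thread without the original Dataset.txt, here is a minimal self-contained sketch. The data are simulated and the column names (GrFac, X1 to X3) are placeholders, so treat it as an illustration rather than a reproduction of the poster's analysis. It keeps the numbers numeric and rounds only when printing, which sidesteps the quote problem discussed above.

```r
set.seed(123)
# Hypothetical data: a two-level grouping factor and three numeric scores
Dataset <- data.frame(GrFac = factor(rep(c("A", "B"), each = 20)),
                      X1 = rnorm(40), X2 = rnorm(40), X3 = rnorm(40))
vars <- c("X1", "X2", "X3")

# Run each test once and keep the whole result object
tests <- lapply(Dataset[vars], function(x)
  wilcox.test(x ~ Dataset$GrFac, alternative = "two.sided"))

# One numeric matrix with W and P as row names
out <- rbind(W = sapply(tests, function(h) unname(h$statistic)),
             P = sapply(tests, function(h) h$p.value))
print(round(out, 3))
```

Running each test once and extracting both the statistic and the p-value from the stored result also avoids computing every test twice.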
Re: [R] Creating a maxtrix from conditional prints
Use rbind? Not the most optimal solution, but it should get the job done.

# not tested
Code example:

    out <- c()
    for (x in 1:10) {
      for (y in 1:10) {
        qui <- ifelse((mac[,1] == x) & (mac[,5] == y) | (mac[,1] == y) & (mac[,5] == x), 1, NA)
        quo <- cbind(mac, qui)
        qua <- subset(quo, qui == 1)
        if (nrow(qua) == 2) {
          print(qua)
          out <- rbind(out, qua)   # append only the two-row matches
        }
      }
    }

On Fri, Jun 4, 2010 at 9:08 PM, EM evilmas...@gmail.com wrote:

Hi guys :)

I'm dealing with this problem, perhaps conceptually not that complex, but still, I'm stuck. Two columns, values from 1 to 10, only integers. I want to check when the first column's index is identical to the second's (and vice versa). If that's true, I want to add a further column with value 1 (if true) or NA (if false). Thus, I obtain 100 matrices (for each column I will have 1-1, 1-2, 1-3 etc). Now, I want R to consider only those matrices whose new column has value = 1 and whose total number of rows is equal to 2. I can get R to print this result inside the for cycle, yet I can't manage to build a single matrix to store all the results together, which is what I really want.
Code example:

    for (x in 1:10) {
      for (y in 1:10) {
        qui <- ifelse((mac[,1] == x) & (mac[,5] == y) | (mac[,1] == y) & (mac[,5] == x), 1, NA)
        quo <- cbind(mac, qui)
        qua <- subset(quo, qui == 1)
        if (nrow(qua) == 2) print(qua)
      }
    }

result (wrong, now):

         ricevente genere_r abo_r classieta_r donatore genere_d abo_d classieta_d    eta_d mismatch pra comp       mum qui
    [1,]         8        0     1           3        9        1     1           4 56.17437        2   1    1 -6.645437   1
    [2,]         9        1     1           2        8        0     1           3 48.77579        2   1    1 -5.905579   1
         ricevente genere_r abo_r classieta_r donatore genere_d abo_d classieta_d    eta_d mismatch pra comp       mum qui
    [1,]         8        0     1           3       10        0     0           3 48.77579        2   1    1 -5.905579   1
    [2,]        10        0     2           5        8        0     1           3 48.77579        1   1    1 -5.391579   1
         ricevente genere_r abo_r classieta_r donatore genere_d abo_d classieta_d    eta_d mismatch pra comp       mum qui
    [1,]         8        0     1           3        9        1     1           4 56.17437        2   1    1 -6.645437   1
    [2,]         9        1     1           2        8        0     1           3 48.77579        2   1    1 -5.905579   1
         ricevente genere_r abo_r classieta_r donatore genere_d abo_d classieta_d    eta_d mismatch pra comp       mum qui
    [1,]         9        1     1           2       10        0     0           3 48.77579        0   1    1 -4.877579   1
    [2,]        10        0     2           5        9        1     1           4 56.17437        0   1    1 -5.617437   1

what I'd like to get:

         ricevente genere_r abo_r classieta_r donatore genere_d abo_d classieta_d    eta_d mismatch pra comp       mum qui
    [1,]         8        0     1           3        9        1     1           4 56.17437        2   1    1 -6.645437   1
    [2,]         9        1     1           2        8        0     1           3 48.77579        2   1    1 -5.905579   1
    [3,]         8        0     1           3       10        0     0           3 48.77579        2   1    1 -5.905579   1
    [4,]        10        0     2           5        8        0     1           3 48.77579        1   1    1 -5.391579   1
    [5,]         8        0     1           3        9        1     1           4 56.17437        2   1    1 -6.645437   1
    [6,]         9        1     1           2        8        0     1           3 48.77579        2   1    1 -5.905579   1
    [7,]         9        1     1           2       10        0     0           3 48.77579        0   1    1 -4.877579   1
    [8,]        10        0     2           5        9        1     1           4 56.17437        0   1    1 -5.617437   1

(don't mind the value names, this is just a small part of a longer algorithm)

Thanks in advance for your help :)
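A common alternative to growing a matrix with rbind() inside the loop is to collect the pieces in a list and bind them once at the end. The sketch below assumes, as in the poster's code, that column 1 holds the recipient and column 5 the donor; the toy matrix mac and its filler columns a, b, c are made up for illustration.

```r
# Hypothetical toy data: column 1 = recipient id, column 5 = donor id
mac <- cbind(ricevente = c(8, 9, 8, 10, 1),
             a = 0, b = 0, c = 0,
             donatore  = c(9, 8, 10, 8, 2))

pieces <- list()
for (x in 1:10) {
  for (y in 1:10) {
    hit <- (mac[, 1] == x & mac[, 5] == y) | (mac[, 1] == y & mac[, 5] == x)
    qua <- mac[hit, , drop = FALSE]          # drop = FALSE keeps a matrix
    if (nrow(qua) == 2) {
      pieces[[length(pieces) + 1]] <- cbind(qua, qui = 1)
    }
  }
}

# Bind all two-row matches into one matrix
out <- do.call(rbind, pieces)
print(out)
```

As in the desired output shown in the original post, every reciprocal pair appears twice (once for x-y and once for y-x); letting the inner loop run over x:10 instead of 1:10 would list each pair only once.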
Re: [R] Error Bar Issues
You can't refer to one argument from another within a function call: liw=uiw is looked up where you call plotCI(), and no object called uiw exists there. Try:

    uiw <- Saline[,3]
    plotCI(x=Saline[,1], y=Saline[,2], uiw=uiw, liw=uiw, err="y", pch=21, pt.bg=par("bg"), cex=1.5, lty=1, type="o", gap=0, sfrac=0.005, xlim=c(-21,340), xaxp=c(-20,320,11), xlab="Time (min)", ylim=c(0,12), yaxp=c(0,12,11), ylab="Arterial Plasma Acetaminophen (µg/mL)", las=1, font.lab=2, add=TRUE)

Cheers
Joris

On Sat, Jun 5, 2010 at 6:09 PM, beloitstudent schu...@beloit.edu wrote:

Hello all,

I am an undergraduate student who is having syntax issues trying to get error bars on my graph. This is the data, which I assigned the name Saline to.

       Time  Average       SEM
    1   -20     0.00     0.000
    2
    3    30     0.00     0.000
    4    45 3.227902 0.7462524
    5    60     5.04 1.1623944
    6    80 6.107491 1.5027762
    7   110 6.968231 1.3799637
    8   140 7.325713 1.2282053
    9   200 7.875194 1.1185175
    10  260 6.513927 0.5386359
    11  320 4.204342 0.6855906

This is the command that I typed in to get my error bars.

    plotCI(x=Saline [,1], y=Saline [,2], uiw=Saline [,3], liw=uiw, err=y, pch=21, pt.bg=par("bg"), cex=1.5, lty=1, type="o", gap=0, sfrac=0.005, xlim=c(-21,340), xaxp=c(-20,320,11), xlab="Time (min)", ylim=c(0,12), yaxp=c(0,12,11), ylab="Arterial Plasma Acetaminophen (µg/mL)", las=1, font.lab=2, add=TRUE)

And this is the error message I keep getting:

    Error in plotCI(x = Saline[, 1], y = Saline[, 2], uiw = Saline[, 3], liw = uiw, :
      object 'uiw' not found
    In addition: Warning message:
    In if (err == "y") z <- y else z <- x :
      the condition has length > 1 and only the first element will be used

Now, to me, the command seems correct. I want the error bars to show up where the points on my graph are, so the x coordinates should be my time (aka Saline [1]), the y coordinates should be my averages (aka Saline [2]), and the upper and lower limits of my confidence interval should be the SEM from Saline [3]. But something is wrong with this and I cannot figure out what it is. If anyone has suggestions I would be very grateful.

Thanks for your help!
beloitstudent
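The point of the reply in this thread (a supplied argument cannot refer to another argument of the same call) can be shown without any plotting package. In this sketch f() is a made-up function with a default of liw = uiw, chosen only to mimic the situation; it is not plotCI() itself.

```r
# Hypothetical function whose second argument defaults to the first
f <- function(uiw, liw = uiw) c(uiw = uiw, liw = liw)

# A default is evaluated inside the function, where uiw is known
ok <- f(uiw = 2)

# A supplied argument is evaluated where the call is made; no object
# named uiw exists there, so this fails with "object 'uiw' not found"
res <- tryCatch(f(uiw = 2, liw = uiw), error = function(e) "error")

print(ok)
print(res)
```

If plotCI() declares liw = uiw as its default (the error message in the thread suggests a signature along those lines, but check ?plotCI for the package you use), simply leaving out liw would be another way around the problem.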
Re: [R] ordinal variables
Hi,

If you look around a bit, there is some great material on the web about the powers and quirks of R. I've taught myself most of what I know about R through reading a lot and trying it out on the console. The help list is also a darn fine source of efficient code for a set of general problems.

It won't help any more this year, but I'm working on a guide for R to bundle valuable information I got from the help list and the internet. It should be ready in a couple of months, and it will be available for all to use.

In any case, Owen's guide is of great value for an introduction to the command line and basic statistics:
http://cran.r-project.org/doc/contrib/Owen-TheRGuide.pdf

Also the introduction to R is a must-read for all our students:
http://cran.r-project.org/doc/manuals/R-intro.pdf

Next to that, a couple of websites are great additional sources of code:

Quick-R, a guide for those who come over from SAS/SPSS/Stata. It contains tons of examples for statistical analyses in about every field. If you didn't know it yet, you'll love it for sure:
http://www.statmethods.net/

The R graph gallery, to show what exactly can be done with the graphical power of R:
http://addictedtor.free.fr/graphiques/

The R Graphics gallery, doing the same:
http://research.stowers-institute.org/efg/R/

There are many more to be found; a whole community of users is contributing to the information in various ways. We give the sources mentioned here to our students, with the message that they should never underestimate the power of Google.

Last but not least, there is a specific mailing list regarding teaching statistics using R:
https://stat.ethz.ch/mailman/listinfo/r-sig-teaching
You might want to take a look at their archives as well.

Cheers
Joris

On Fri, Jun 4, 2010 at 6:39 AM, Iasonas Lamprianou lampria...@yahoo.com wrote:

Thanks, I'll have a go and will let you know. I guess that the success has to do with how efficiently I help them to demonstrate the efficiency of code over menus.
So part of the issue is how I teach them as well...

Dr. Iasonas Lamprianou
Assistant Professor (Educational Research and Evaluation)
Department of Education Sciences
European University-Cyprus
P.O. Box 22006
1516 Nicosia
Cyprus
Tel.: +357-22-713178
Fax: +357-22-590539

Honorary Research Fellow
Department of Education
The University of Manchester
Oxford Road, Manchester M13 9PL, UK
Tel. 0044 161 275 3485
iasonas.lampria...@manchester.ac.uk

--- On Thu, 3/6/10, S Ellison s.elli...@lgc.co.uk wrote:

From: S Ellison s.elli...@lgc.co.uk
Subject: Re: [R] ordinal variables
To: Joris Meys jorism...@gmail.com, Iasonas Lamprianou lampria...@yahoo.com
Cc: r-help@r-project.org
Date: Thursday, 3 June, 2010, 15:44

If you set them a problem that has them doing the same sort of thing five times, and compare the time it takes with code pasted from an editor (e.g. Tinn-R) with the time it takes via menus, you may have more luck convincing them. A command line sequence is harder than menus the first two times but easier for any n iterations thereafter.

Steve Ellison

Iasonas Lamprianou lampria...@yahoo.com 03/06/2010 14:51

Thank you Joris, I'll have a look into the commands you sent me. They look convincing. I hope my students will also see them in a positive way (although I can force them to pretend that they have a positive attitude)!

--- On Thu, 3/6/10, Joris Meys jorism...@gmail.com wrote:

From: Joris Meys jorism...@gmail.com
Subject: Re: [R] ordinal variables
To: Iasonas Lamprianou lampria...@yahoo.com
Cc: r-help@r-project.org
Date: Thursday, 3 June, 2010, 14:35

see ?factor and ?as.factor.
On ordered factors you can technically do a Spearman test without problem, apart from the fact that a Spearman test by definition cannot give exact p-values with ties present.

    x <- sample(c("a","b","c","d","e"), 100, replace=T)
    y <- sample(c("a","b","c","d","e"), 100, replace=T)
    x.ordered <- factor(x, levels=c("e","b","a","d","c"), ordered=T)
    x.ordered
    y.ordered <- factor(y, levels=c("e","b","a","d","c"), ordered=T)
    y.ordered
    cor.test(x.ordered, y.ordered, method="spearman")
    require(pspearman)
    spearman.test(x.ordered, y.ordered)

R Commander has some menu options to deal with factors. R Commander also provides a scripting window. Please do your students a favor, and show them how to use those commands.

Cheers
Joris

On Thu, Jun 3, 2010 at 2:25 PM, Iasonas Lamprianou lampria...@yahoo.com wrote:

Dear colleagues, I teach statistics using SPSS. I want to use R instead. I hit on one problem and I
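A self-contained variant of the ordered-factor example in this thread, using base R only, might look as follows. Two things here are my own choices rather than part of the original post: the factors are converted to their integer codes with as.numeric(), since recent versions of cor.test() may insist on numeric input, and exact = FALSE is set explicitly because ties rule out an exact p-value anyway.

```r
set.seed(1)
lev <- c("e", "b", "a", "d", "c")   # the order of the levels defines the ranking

x <- sample(lev, 100, replace = TRUE)
y <- sample(lev, 100, replace = TRUE)

x.ordered <- factor(x, levels = lev, ordered = TRUE)
y.ordered <- factor(y, levels = lev, ordered = TRUE)

# as.numeric() on an ordered factor returns the position of each level
res <- cor.test(as.numeric(x.ordered), as.numeric(y.ordered),
                method = "spearman", exact = FALSE)
print(res$estimate)
print(res$p.value)
```

The integer codes follow the levels argument, not alphabetical order, so "e" is the lowest category here and "c" the highest.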
Re: [R] Handling of par() with variables
I think you misunderstand how par() works. When you set new parameters, par() returns the old values, so you can store them at the same time. Take a look at:

    par(no.readonly=T)
    oldpar <- par(mar=c(1,1,1,1), tck=0.02)
    par(no.readonly=T)
    par(oldpar)
    par(no.readonly=T)

So your line:

    newpar <- par(mar=c(3.1,3.1,0.1,0.1),  # margin for figure area
                  oma=c(0,0,0,0),          # margin for outer figure area
                  cex.axis=0.9,            # font size axis
                  mgp=c(2,0.6,0),          # distance of axis
                  tck=0.02                 # major ticks inside
                  )

actually stores the OLD parameters in newpar, and not the new ones. If you want to set them using a variable, you'll need something like:

    newmar <- c(3.1,3.1,1.0,1.0)   # store the mar values in a variable
    oldpar <- par(mar=newmar)      # set the mar and store the old values
    ...
    par(oldpar)                    # back to the old parameters

Cheers
Joris

On Fri, Jun 4, 2010 at 11:40 AM, Steffen Uhlig steffen.uh...@htw-saarland.de wrote:

Hello!

In order to plot multiple graphs with the same setup I use the following code structure:

    ###
    # storing old parameter set
    oldpar <- par(no.readonly=T)
    # copying old parameter set
    newpar <- par(no.readonly=T)
    # adjusting parameters
    newpar <- par(mar=c(3.1,3.1,0.1,0.1),  # margin for figure area
                  oma=c(0,0,0,0),          # margin for outer figure area
                  cex.axis=0.9,            # font size axis
                  mgp=c(2,0.6,0),          # distance of axis
                  tck=0.02                 # major ticks inside
                  )
    ...
    ...
    postscript(...)
    par(newpar)
    ...
    dev.off()
    ###

Calling the variable newpar delivers the old parameter set only (from the code line newpar <- par(no.readonly=T)). If the code segment newpar <- par(mar=... runs a second time, the correct parameter set is stored, however, just the 5 parameters adjusted and not the full list.

My question is, why must the code segment newpar <- par(mar...) run twice? Is there a better way to handle the graphics output? I would be grateful for a pointer to a FAQ section or to an older discussion thread in this group!

Thank you very much in advance!
Regards,
/steffen

--
Steffen Uhlig, PhD
Mechatronik und Sensortechnik
HTW des Saarlandes
Goebenstraße 40
66117 Saarbrücken
Tel.: +49 (0) 681 58 67 274
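The behaviour discussed in this thread (par() returns the previous values of whatever it changes) is easy to verify. The sketch below opens a null PDF device, which is my own addition so that it can run in a non-interactive session without producing a file or a window.

```r
pdf(NULL)                      # null device: nothing is written to disk

before <- par("mar")           # margins as they are now
oldpar <- par(mar = c(3.1, 3.1, 0.1, 0.1), tck = 0.02)
during <- par("mar")           # the new margins are active

# oldpar holds the OLD values, and only for the parameters that were set
print(names(oldpar))
print(oldpar$mar)

par(oldpar)                    # restore
after <- par("mar")

dev.off()
```

This also explains the observation in the question that only five parameters come back: the returned list contains just the parameters named in that particular call.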
Re: [R] save in for loop
On a side note:

On Thu, May 20, 2010 at 9:43 AM, Ivan Calandra ivan.calan...@uni-hamburg.de wrote:

Thanks to all of you for your answers! ... Tao, I don't understand why you have backslashes before "file" and after ".rda". I guess it's something about regular expressions, but I'm still very new to it.

    eval(parse(text=paste("save(file", i, ", file=\"file", i, ".rda\")", sep="")))

Very simple: you need to give the command as a string. In the save command, you have to put quotation marks around the filename. Now within the paste function, a plain quotation mark would make R believe the string to paste ends there, and you don't want that. So you escape the " by typing \"; then R knows you want to add the " symbol to the string instead of ending it:

    > paste("save(file", i, ", file=\"file", i, ".rda\")", sep="")
    [1] "save(file2, file=\"file2.rda\")"
    > parse(text=paste("save(file", i, ", file=\"file", i, ".rda\")", sep=""))
    expression(save(file2, file="file2.rda"))
    attr(,"srcfile")
    <text>
    > paste("save(file", i, ", file="file", i, ".rda")", sep="")
    Error: unexpected symbol in "paste("save(file", i, ", file="file"

Hope it's a bit more clear now.

Cheers
Joris
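To see the escaping at work, and for comparison a way to get the same effect without eval(parse()), here is a small sketch. The object file2 is a made-up example, and writing to tempdir() is my choice so that the working directory is left untouched; the list argument of save() accepts object names as character strings.

```r
i <- 2
file2 <- data.frame(a = 1:3)          # hypothetical object to be saved

# The escaped quotes end up as ordinary quotes inside the command string
cmd <- paste("save(file", i, ", file=\"file", i, ".rda\")", sep = "")
cat(cmd, "\n")

# Same effect without building a command string: pass the object NAME
# through the list argument of save()
target <- file.path(tempdir(), paste0("file", i, ".rda"))
save(list = paste0("file", i), file = target)
```

Inside a for loop over i, the last two lines would be all that is needed, which avoids the quoting problem altogether.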
Re: [R] ps-output and LaTeX/DVIPS/PS2PDF - Greek letters disappear
That's a problem of LaTeX and Ubuntu, not R:
https://bugs.launchpad.net/ubuntu/+source/poppler/+bug/319495

You'll have more luck on an Ubuntu list or forum.

Cheers
Joris

On Fri, Jun 4, 2010 at 11:47 AM, Steffen Uhlig steffen.uh...@htw-saarland.de wrote:

Hello!

My graphs are produced using the postscript-option in R (R version 2.10.1 (2009-12-14)). When Greek letters are used on the axis, everything looks fine in the *.ps-file. If included in a LaTeX-file and (on Ubuntu 10.04, fresh install), the Greek letters appear in the DVI- and PS-output, however, if converted with ps2pdf they suddenly disappear.

Could anyone suggest a solution?

Best regards,
/steffen
Re: [R] R with Emacs
Emacs ESS: http://ess.r-project.org/

Cheers
Joris

On Fri, Jun 4, 2010 at 12:55 PM, dhanush dhana...@gmail.com wrote:

I want to know how Emacs works with R. Can anyone provide me a link or manual to read? Thank you
Re: [R] Tinn-R keyboard problem
Tinn-R works with SDI. Make sure you have both the settings in R and the Rprofile.site correct. If the bug persists with the latest version of Tinn-R, look for help on:
http://sourceforge.net/projects/tinn-r/support

Cheers
Joris

On Fri, Jun 4, 2010 at 11:56 AM, dhidh23061972 carsten.giess...@gmx.net wrote:

I have the same problem. I also installed the older stable version (1.17.2.4, compatible version with MDI), but with no success. The keyboard worked fine before. I use Windows XP. Is there any solution?

Many thanks,
Carsten
[R] package mgcv inconsistency in help files? cyclic P-spline cs not cyclic?
Dear all,

I'm a bit stunned by the behaviour of a gam model using cyclic P-spline smoothers. I cannot provide the data, as I have about 61.000 observations from a time series. I use the following model:

    testgam <- gam(NO~s(x)+s(y,bs="cs")+s(DD,bs="cs")+s(TT), data=Final)

The problem lies with the cyclic smoother I use for seasonal trends. The variable Final$y is a numerical variable, going from 1 to 366, representing the day of the year. I have hourly data from 2003 until 2009, so each day is represented 168 times in the dataset (apart from 366, that one only 48). DD is the wind direction, going from 1 to 3600, and is also modeled with the same cyclic smoother.

Yet, if I check the predictions, the smoother for y is far from cyclic. I checked the help file ?smooth.terms, and found, about 10 lines apart:

    bs="cs" specifies a shrinkage version of "cr".
    bs="cs" gives a cyclic version of a P-spline.

When I use the bs="cc" option, I get the results as I want them, so I will stay with the cyclic cubic splines for now. Yet, I find the behaviour of bs="cs" puzzling, and I'm wondering whether I missed something, or if this really is an inconsistency in the package. I currently run mgcv 1.6-1 on R 2.10.1.

A small example showing what I experience.
Mind you that here x is in fact NOT cyclic, whereas in my data I'm sure it has to be:

    y <- rep(1:20,200)
    x <- 1:4000
    DD <- sample(1:360,4000,replace=T)
    TT <- sample(-10:10,4000,replace=T)
    NO <- TT^2 + (10-y+2)^2 + 10*sin(DD*2*pi/360) - 0.002*sqrt(x) + rnorm(4000,0,100)

    model <- gam(NO~s(x)+s(y,bs="cs")+s(DD,bs="cs")+s(TT))
    plot(model)

    model <- gam(NO~s(x)+s(y,bs="cc")+s(DD,bs="cc")+s(TT))
    plot(model)

Cheers
Joris
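As a quick way to check whether a fitted smooth really is cyclic, one can compare its predictions at the two ends of the cycle. The sketch below uses bs="cc", the option reported to work in this thread, on simulated day-of-year data. Supplying the two end points through the knots argument is a commonly used idiom rather than something taken from the original post, and the variable names are made up.

```r
library(mgcv)
set.seed(42)

# Hypothetical seasonal signal observed on days 1..366, four times over
day <- rep(1:366, 4)
NO  <- 10 * sin(2 * pi * day / 366) + rnorm(length(day))

# Cyclic cubic spline; the two knots given are taken as the end points,
# so that day 366 and day 1 are neighbours rather than the same point
fit <- gam(NO ~ s(day, bs = "cc"), knots = list(day = c(0.5, 366.5)))

# For a cyclic smooth the fitted values at both ends of the cycle agree
ends <- predict(fit, newdata = data.frame(day = c(0.5, 366.5)))
print(ends)
```

Running the same comparison on a model fitted with bs="cs" would be one way to confirm the non-cyclic behaviour described above.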
Re: [R] function
This is what you asked for.

    Prod2007 <- 1:10
    Prod2006 <- Prod2007/1 + c(0, diff(Prod2007))
    Prod2005 <- Prod2006 + (1 + c(0, diff(Prod2006)))
    Prod2004 <- Prod2005 + (1 + c(0, diff(Prod2005)))

    Prod2006
     [1]  1  3  4  5  6  7  8  9 10 11
    Prod2005
     [1]  2  6  6  7  8  9 10 11 12 13
    Prod2004
     [1]  3 11  7  9 10 11 12 13 14 15

Sure that's what you want?

--
Joris Meys

On Thu, Jun 3, 2010 at 12:30 PM, n.via...@libero.it <n.via...@libero.it> wrote:
> Dear list,
> I would like to ask you a question. I'm trying to build the production time series with the Divisia index. The final step would require the following calculations:
>
>   a) PROD(2006)=PROD(2007)/1+[DELTA_PROD(2007)]
>   b) PROD(2005)=PROD(2006)+[1+DELTA_PROD(2006)]
>   c) PROD(2004)=PROD(2005)+[1+DELTA_PROD(2005)]
>
> My question is: how can I tell R to take the value generated in the previous step (for example, the production of 2005 needs the value of the production of 2006) in order to generate the production time series? (PS: my data.frame is not set as a time series)
> Thanks for your attention!!
Re: [R] gam error
Data?

--
Joris Meys

On Thu, Jun 3, 2010 at 1:24 PM, natalieh <fbs...@leeds.ac.uk> wrote:
> Hi all,
> I'm trying to use a gam (mgcv package) to analyse some data with a roughly U shaped curve. My model is very simple with just one explanatory variable:
>
>   m1 <- gam(CoT ~ s(incline))
>
> However I just keep getting the error message
>
>   Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) :
>     A term has fewer unique covariate combinations than specified maximum degrees of freedom
>
> Just wondering if anyone had come across this before / could offer any advice on where the problem might lie.
> Many thanks,
> Natalie
Re: [R] Continous variables with implausible transformation?
    x <- rnorm(100,10,1)
    sqrtx <- sqrt(x)
    y <- rbinom(100,1,0.5)
    lrm(y ~ x + sqrtx)

works. What's the problem? But you wrote linear + square. Don't you mean:

    lrm(Y ~ x + x^2)

Cheers

--
Joris Meys

On Thu, Jun 3, 2010 at 6:34 AM, zhu yao <mailzhu...@gmail.com> wrote:
> Dear r users
> I have a question in coding continuous variables in logistic regression. When rcs is used in transforming variables, sometimes it gives implausible associations with the outcome although the model x2 is high. So what are your tips and tricks in coding continuous variables?
> P.S. How to code variables as linear+square in the formula such as lrm. lrm(y~x+sqrt(x)) can't work.
> Many thanks.
> Yao Zhu.
Re: [R] compare results of glms
Mailing this twice ain't going to help you. Reading a course on statistics might.

The test you want to do is answering the following hypothesis: the mean predicted value of a specific model differs when different datasets are used to fit it. Seems likely to me if the datasets are not almost identical. Why test it?

About that Z-test: in your field of research it should be used to test 2 proportions that are not too close to 0 or 1 and that originate from a binomial distribution with large enough n. Suggesting to use it for comparing a number of series of around 20 logit-transformed predicted probabilities is plain shocking.

In case you are interested in the difference of the intercept for these specific trials, add trial as a fixed effect to your model and do the appropriate testing. If you want to know whether the relation between state and days differs in slope, add an interaction term and again use the appropriate testing. To know what the appropriate testing is, see line 1.

Cheers
Joris

On Thu, Jun 3, 2010 at 10:31 AM, Sacha Viquerat <sacha.v...@googlemail.com> wrote:
> dear list!
> i have run several glm analyses to estimate a mean rate of dung decay for independent trials. i would like to compare these results statistically but can't find any solution. the glm calls are:
>
>   dung.glm1 <- glm(STATE~DAYS, data=o_cov, family=binomial(link="logit"))
>   dung.glm2 <- glm(STATE~DAYS, data=o_cov_T12, family=binomial(link="logit"))
>
> as all the trials have different sample sizes (around 20 each), anova(dung.glm1, dung.glm2) is not applicable. has anyone an idea?
> thanks in advance!
> ps: my advisor urges me to use the z-test (the common test statistic in my field of research), but i reject that due to the small sample size.
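The advice above (stack the trials, add trial as a fixed effect, test an interaction for a difference in slope) could be sketched roughly as follows. The data are simulated stand-ins for o_cov and o_cov_T12, make_trial is a hypothetical helper, and the likelihood-ratio test is only one common choice of "appropriate testing", not something the thread prescribes.

```r
set.seed(1)

# Hypothetical stand-ins for the two trials (o_cov and o_cov_T12 in the thread)
make_trial <- function(n, slope, label) {
  DAYS  <- sample(1:60, n, replace = TRUE)
  STATE <- rbinom(n, 1, plogis(2 - slope * DAYS))
  data.frame(STATE = STATE, DAYS = DAYS, trial = label)
}
dung <- rbind(make_trial(20, 0.08, "T1"), make_trial(20, 0.12, "T12"))
dung$trial <- factor(dung$trial)

# Common slope, trial-specific intercept
fit_common <- glm(STATE ~ DAYS + trial, data = dung, family = binomial(link = "logit"))
# Trial-specific slope via the DAYS:trial interaction
fit_inter  <- glm(STATE ~ DAYS * trial, data = dung, family = binomial(link = "logit"))

# Likelihood-ratio test of the interaction: do the slopes differ between trials?
lrt <- anova(fit_common, fit_inter, test = "Chisq")
print(lrt)
```

Fitting one model to the stacked data sidesteps the problem that anova() cannot compare two models fitted to different datasets.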
Re: [R] ordinal variables
See ?factor and ?as.factor. On ordered factors you can technically do a spearman without problem, apart from the fact that a spearman test by definition cannot give exact p-values with ties present.

    x <- sample(c("a","b","c","d","e"), 100, replace=TRUE)
    y <- sample(c("a","b","c","d","e"), 100, replace=TRUE)
    x.ordered <- factor(x, levels=c("e","b","a","d","c"), ordered=TRUE)
    x.ordered
    y.ordered <- factor(y, levels=c("e","b","a","d","c"), ordered=TRUE)
    y.ordered
    cor.test(x.ordered, y.ordered, method="spearman")
    require(pspearman)
    spearman.test(x.ordered, y.ordered)

R commander has some menu options to deal with factors. R commander also provides a scripting window. Please do your students a favor, and show them how to use those commands.

Cheers
Joris

On Thu, Jun 3, 2010 at 2:25 PM, Iasonas Lamprianou <lampria...@yahoo.com> wrote:
> Dear colleagues,
> I teach statistics using SPSS. I want to use R instead. I hit on one problem and I need some quick advice. When I want to work with ordinal variables, in SPSS I can compute the median or create a barchart or compute a spearman correlation with no problems. In R, if I read the ordinal variable as numeric, then I cannot do a barplot because I miss the category names. If I read the variables as characters, then I cannot run a spearman. How can I read a variable as numeric, still have the chance to assign value labels, and be able to get a table of frequencies etc.? I want to be able to do all these things in R commander. My students will probably be scared away if I try anything else other than R commander (just writing commands will not make them happy). I hope I am not asking for too much. Hopefully there is a way
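As a rough illustration of the ordered-factor approach, here is a base-R sketch covering the three things the question asks for: a labelled frequency table, a median category, and a Spearman correlation. The category labels are invented, and converting to integer codes with as.integer() is an assumption made here in case a function insists on numeric input.

```r
set.seed(1)
lev <- c("disagree", "neutral", "agree")   # hypothetical answer categories, low to high
x <- factor(sample(lev, 100, replace = TRUE), levels = lev, ordered = TRUE)
y <- factor(sample(lev, 100, replace = TRUE), levels = lev, ordered = TRUE)

# Frequency table: keeps the labels, in level order
freq <- table(x)
print(freq)
# barplot(freq) would draw the bar chart with the category names on the axis

# Median category from the integer codes; type = 1 always returns an observed code
med <- levels(x)[quantile(as.integer(x), probs = 0.5, type = 1)]

# Spearman correlation computed on the integer codes of the ordered factors
rho <- cor(as.integer(x), as.integer(y), method = "spearman")
print(med)
print(rho)
```

The factor stores integer codes underneath and the labels on top, which is the combination of "numeric with value labels" the question is after.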
Re: [R] Problems using gamlss to model zero-inflated and overdispersed count data: the global deviance is increasing
= NBII)

    GAMLSS-RS iteration 1: Global Deviance = 284.5993
    GAMLSS-RS iteration 2: Global Deviance = 281.9548
    ##..##
    GAMLSS-RS iteration 5: Global Deviance = 280.7311
    GAMLSS-RS iteration 15: Global Deviance = 280.6343

    model_ZINBI <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) + cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3), data=data, family=ZINBI)

    GAMLSS-RS iteration 1: Global Deviance = 1672.234
    GAMLSS-RS iteration 2: Global Deviance = 544.742
    GAMLSS-RS iteration 3: Global Deviance = 598.9939
    Error in RS() : The global deviance is increasing
     Try different steps for the parameters or the model maybe inappropriate

Thus, in this case, only the Poisson (PO) and Negative Binomial type I (NBI) converge, whereas all other models fail.

My first approach was to omit the smoothing factors for each model, or further reduce the number of variables, but this does not solve the problem and most models fail, often yielding an "Error in RS() : The global deviance is increasing" message.

I would think that, given the fact that the dependent variable is zero-inflated and overdispersed, the Zero-Inflated Negative Binomial (ZINBI) distribution would be the best fit, but the ZINBI even fails in the following very simple examples.

    model_ZINBI <- gamlss(duck ~ cs(LFAP200,df=3), data=data, family=ZINBI)

    GAMLSS-RS iteration 1: Global Deviance = 3508.533
    GAMLSS-RS iteration 2: Global Deviance = 1117.121
    GAMLSS-RS iteration 3: Global Deviance = 652.5771
    GAMLSS-RS iteration 4: Global Deviance = 632.8885
    GAMLSS-RS iteration 5: Global Deviance = 645.1169
    Error in RS() : The global deviance is increasing
     Try different steps for the parameters or the model maybe inappropriate

    model_ZINBI <- gamlss(duck ~ LFAP200, data=data, family=ZINBI)

    GAMLSS-RS iteration 1: Global Deviance = 3831.864
    GAMLSS-RS iteration 2: Global Deviance = 1174.605
    GAMLSS-RS iteration 3: Global Deviance = 562.5428
    GAMLSS-RS iteration 4: Global Deviance = 344.0637
    GAMLSS-RS iteration 5: Global Deviance = 1779.018
    Error in RS() : The global deviance is increasing
     Try different steps for the parameters or the model maybe inappropriate

Any suggestions on how to proceed with this?

Many thanks in advance,
Diederik

Diederik Strubbe
Evolutionary Ecology Group
Department of Biology
University of Antwerp
Groenenborgerlaan 171
2020 Antwerpen, Belgium

--
Dr. Gavin Simpson
ECRC, UCL Geography, Pearson Building, Gower Street, London, UK. WC1E 6BT.

--
Joris Meys
Re: [R] function
That's a bit more clear.

    Prod2007 = 2
    Delta = c(4,3,5)
    Delta <- 1 + Delta/100
    Series <- Prod2007 + cumsum(Delta)
    Series
    [1] 3.04 4.07 5.12

--
Joris Meys

On Thu, Jun 3, 2010 at 1:21 PM, n.via...@libero.it <n.via...@libero.it> wrote:
> What I would like to do is for example: suppose that I have the following values
>
>   a) PROD(2006)=PROD(2007)/1+[DELTA_PROD(2007)]
>   b) PROD(2005)=PROD(2006)+[1+DELTA_PROD(2006)]
>   c) PROD(2004)=PROD(2005)+[1+DELTA_PROD(2005)]
>
> where
>
>   prod(2007)=2
>   DELTA_PROD(2007)=4
>   DELTA_PROD(2006)=3
>   DELTA_PROD(2005)=5
>
> so prod(2007) is like the starting value of production from which the construction of the time series starts. So:
>
>   prod(2006)=2+[1+4/100]
>
> which is equal to 3.04, so I will have:
>
>   prod(2005)=3.04+[1+3/100]
>
> and so on
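The cumsum() answer works because each step only adds a term. For recursions where the next value depends on the previous one in a less convenient way, a sketch with base R's Reduce() may be easier to adapt. The function name step is made up for illustration, and the numbers are the ones from the question.

```r
Prod2007 <- 2
Delta    <- c(4, 3, 5)   # DELTA_PROD for 2007, 2006, 2005, in percent

# One step of the recursion: previous production plus (1 + delta/100)
step <- function(prev, delta) prev + (1 + delta / 100)

# accumulate = TRUE keeps every intermediate value, starting from init
Series <- Reduce(step, Delta, init = Prod2007, accumulate = TRUE)
names(Series) <- 2007:2004
print(Series)
```

The first element is the starting value for 2007; the remaining three reproduce the 3.04, 4.07 and 5.12 obtained with cumsum().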
Re: [R] gam error
This doesn't tell us much either. What does the variable incline represent, and what does the variable CoT represent? I could guess your data looks something like:

    CoT   Incline
    x1    -90
    x2    -60
    x3    -30
    x4      0
    x5     30
    x6     60
    x7     90
    x8    -90
    ...   ...

Or incline could be the number of the sample (going from 1 to 7). No way to know what you did. Please, read the posting guide and take the hints given there into consideration.

This said, you very likely just have not enough data to use a thin plate regression spline without limiting k. See ?choose.k and ?null.space.dimension

Cheers
Joris

On Thu, Jun 3, 2010 at 2:49 PM, natalieh <fbs...@leeds.ac.uk> wrote:
>> Data?
> The data are measures of energy use (continuous variable) for running on 7 inclines between -90 and +90 degrees (n=7-21).
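A small sketch of what "limiting k" could look like for data of this shape, assuming mgcv is installed. The data are simulated (7 inclines, 3 replicates each), and k = 5 is an arbitrary illustrative choice, not a recommendation.

```r
library(mgcv)

set.seed(1)
incline <- rep(seq(-90, 90, by = 30), each = 3)                     # 7 unique inclines, n = 21
CoT     <- 5 + (incline / 30)^2 + rnorm(length(incline), sd = 0.5)  # simulated U shape

# The default basis dimension asks for more coefficients than there are
# unique incline values, so gam() stops with the error from the thread
default_fit <- try(gam(CoT ~ s(incline)), silent = TRUE)

# Restricting k to fewer than the 7 unique inclines lets the model fit
m1 <- gam(CoT ~ s(incline, k = 5))
print(summary(m1))
```

With only 7 distinct covariate values, k can be at most 7; anything above that cannot be estimated however many replicates there are per incline.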
Re: [R] Continous variables with implausible transformation?
You're right, it is the same. Using I() won't work for the same reason sqrt doesn't, so:

    x2 <- x^2
    lrm(y ~ x + x2)

Thx for the correction.

Cheers
Joris

On Thu, Jun 3, 2010 at 6:14 PM, Bert Gunter <gunter.ber...@gene.com> wrote:
> Below.
> -- Bert
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
>> But you wrote linear + square. Don't you mean:
>>   lrm(Y ~ x + x^2)
>
> I believe this is the same as lrm(Y ~ x). You must protect the x^2 via
>   lrm(Y ~ x + I(x^2))
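Why x^2 disappears can be seen from how R expands a formula. The sketch below uses only base R (terms() and glm()), not lrm, so it says nothing about whether lrm accepts I(); it only shows the formula expansion and that a precomputed x2 column gives the same fit as I(x^2) in glm(). The names lab_plain, lab_I, fit_i and fit_x2 are made up for illustration.

```r
set.seed(1)
x <- rnorm(100, 10, 1)
y <- rbinom(100, 1, 0.5)

# In a formula, ^ means "cross the terms up to this order", so x^2 is just x
lab_plain <- attr(terms(y ~ x + x^2), "term.labels")
# I() protects the arithmetic meaning of ^
lab_I     <- attr(terms(y ~ x + I(x^2)), "term.labels")
print(lab_plain)
print(lab_I)

# Workaround from the thread: compute the square before fitting
x2 <- x^2
fit_i  <- glm(y ~ x + I(x^2), family = binomial)
fit_x2 <- glm(y ~ x + x2,     family = binomial)
```

Precomputing the column is the more portable route, since it does not rely on how a particular modelling function handles functions inside a formula.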
Re: [R] Problems using gamlss to model zero-inflated and overdispersed count data: the global deviance is increasing
See below.

On Thu, Jun 3, 2010 at 5:35 PM, Gavin Simpson <gavin.simp...@ucl.ac.uk> wrote:
> On Thu, 2010-06-03 at 17:00 +0200, Joris Meys wrote:
>> On Thu, Jun 3, 2010 at 9:27 AM, Gavin Simpson <gavin.simp...@ucl.ac.uk> wrote:
>>> vegan is probably not too useful here as the response is univariate; counts of ducks.
>>
>> If we assume that only one species is counted and of interest for the whole research. I (probably wrongly) assumed that data for multiple species was available. Without knowledge about the whole research setup it is difficult to say which method is the best, or even which methods are appropriate. VGAM is indeed a powerful tool, but:
>>
>>   proportion_non_zero <- (sum(ifelse(data$duck == 0,0,1))/182)
>>
>> means 182 observations in the dataset
>>
>>   model_NBI <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) + cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3), data=data, family=NBI)
>>
>> is 5 splines with 3df, an intercept, that's a lot of df for only 182 observations. Using VGAM ain't going to help here.
>
> How do you know?

I don't. I thought it would be like that because you use essentially the same splines, and I overlooked the fact that the OP tried to reduce to a single smooth. I stand corrected.

Cheers
Joris

>> I'd reckon that the model itself should be reconsidered, rather than the distribution used to fit the error terms.
>
> I was going to mention that too, but the OP did reduce this down to a single smooth and the problem of increasing deviance remained. Hence trying to fit a /similar/ model in other software might give an indication whether the problems are restricted to a single software or a more general issue of the data/problem? At this stage the OP is stuck not knowing what is wrong; (s)he has nothing to do model checking on etc. Trying zeroinfl() and fitting a parametric model, for example, might be a useful starting point, then move on to models with smoothers if required.

He (quite positive on that one :-) ) can indeed try to use VGAM on the model with one smooth and see if that turns out to give something. That should give some clarity on the question whether it is the optimization of pscl that goes wrong, or whether the problem is inherent to the data.

Next to that, I'd like to suggest taking a closer look at the iteration parameters of the gamlss function itself. Honestly, I've never tried these out before, but you never know whether it would work. See ?gamlss.control

Cheers
Joris
Re: [R] Problems using gamlss to model zero-inflated and overdispersed count data: the global deviance is increasing
Correction: That should give some clarity on the question whether it is the optimization of GAMLSS that goes wrong, or whether the problem is inherent to the data.

On Thu, Jun 3, 2010 at 7:00 PM, Joris Meys <jorism...@gmail.com> wrote:
> See below.
> [...]
Re: [R] cumsum function with data frame
See ?split and ?unsplit.

    Data <- read.table(textConnection("variable Year value
    EC01 2005 5
    EC01 2006 10
    AAO1 2005 2
    AAO1 2006 4"), header=TRUE)

    Datalist <- split(Data, Data$variable)
    resultlist <- lapply(Datalist, function(x){
        x$cumul <- cumsum(x$value)
        return(x)
    })
    result <- unsplit(resultlist, Data$variable)
    result
      variable Year value cumul
    1     EC01 2005     5     5
    2     EC01 2006    10    15
    3     AAO1 2005     2     2
    4     AAO1 2006     4     6

On a side note: I've used this construction now for a number of problems. Some could be better solved using more specific functions (e.g. ave() for adding a column with means). I'm not sure, however, that this is the best approach to applying a function to subsets of a dataframe and adding the result of that function as an extra variable. Anybody care to elaborate on how the R masters had it in mind?

Cheers
Joris

On Thu, Jun 3, 2010 at 5:58 PM, n.via...@libero.it <n.via...@libero.it> wrote:
> Dear list,
> I have a problem with the cumsum function. I have a data frame like the following one
>
>   variable Year value
>   EC01     2005     5
>   EC01     2006    10
>   AAO1     2005     2
>   AAO1     2006     4
>
> what I would like to obtain is
>
>   variable Year value cumsum
>   EC01     2005     5      5
>   EC01     2006    10     15
>   AAO1     2005     2      2
>   AAO1     2006     4      6
>
> if I use the by function or the aggregate function the result is a list or something else; what I want is a data frame as I showed above... anyone knows how to get it???
> THANKS A LOT
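The side note mentions ave(); as a possible shorter alternative to the split/lapply/unsplit construction, a sketch with ave() and FUN = cumsum is given below. The data frame is typed in directly from the example in the question.

```r
Data <- data.frame(
  variable = c("EC01", "EC01", "AAO1", "AAO1"),
  Year     = c(2005, 2006, 2005, 2006),
  value    = c(5, 10, 2, 4)
)

# ave() applies FUN within each group and returns the results in the
# original row order, so they can be assigned straight to a new column
Data$cumul <- ave(Data$value, Data$variable, FUN = cumsum)
print(Data)
```

Because ave() returns a vector of the same length as its input, the result stays a plain data frame rather than a list.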
Re: [R] Nested ANOVA with covariate using Type III sums of squares
Could you copy the data? Data - data.frame(C.Mean,Mean.richness,Zoop,Diversity,Phyto) dput(Data) I have the feeling something's wrong there. I believe you have 48 observations (47df + 1 for the intercept), 2 levels of Diversity, 4 of Phyto and 48/(3*4)=4 levels of Zoop. But you don't have 3df for Zoop. Either I'm way off, or what goes in the lm is not what you think it is. I tried a small sample with the datastructure I believe you have, but I couldn't reproduce your error. ## Run Phyto - as.factor(rep(rep(c(A,B,C,D),each=6),2)) Diversity - as.factor(rep(c(High,Low),each=24)) Zoop - rep(c(1,2,3,4),times=12) C.Mean - rnorm(48) Mean.richness -rnorm(48) test - lm(C.Mean~ Mean.richness + Diversity + Zoop + Diversity/Phyto + Zoop*Diversity/Phyto) Anova(test,type=III) Zoop - as.factor(Zoop) Anova(test,type=III) ## End Run Cheers Joris On Thu, Jun 3, 2010 at 10:26 PM, Anita Narwani anitanarw...@gmail.comwrote: I would just like to add that when I remove the co-variate of Mean.richness from the model (i.e. eliminating the non-orthogonality), the aliasing warning is replaced by the following error message: Error in t(Z) %*% ip : non-conformable arguments That is when I enter this model: carbonmean-lm(C.Mean~ Diversity + Zoop + Diversity/Phyto + Zoop*Diversity/Phyto) On Wed, Jun 2, 2010 at 6:05 PM, Joris Meys jorism...@gmail.com wrote: that's diversity/phyto, zoop or phyto twice in the formula. On Thu, Jun 3, 2010 at 3:00 AM, Joris Meys jorism...@gmail.com wrote: That's what one would expect with type III sum of squares. You have Phyto twice in your model, but only as a nested factor. To compare the full model with a model without diversity of zoop, you have either the combination diversity/phyto, zoop/phyto or phyto twice in the formula. That's aliasing. Depending on how you stand on type III sum of squares, you could call that a bug. Personally, I'd just not use them. 
https://stat.ethz.ch/pipermail/r-help/2001-October/015984.html Cheers Joris On Thu, Jun 3, 2010 at 2:13 AM, Anita Narwani anitanarw...@gmail.comwrote: Hello, I have been trying to get an ANOVA table for a linear model containing a single nested factor, two fixed factors and a covariate: carbonmean-lm(C.Mean~ Mean.richness + Diversity + Zoop + Diversity/Phyto + Zoop*Diversity/Phyto) where, *Mean.richness* is a covariate*, Zoop* is a categorical variable (the species), *Diversity* is a categorical variable (Low or High), and *Phyto*(community composition) is also categorical but is nested within the level of *Diversity*. Quinn Keough's statistics text recommends using Type III SS for a nested ANOVA with a covariate. I get the following output using the Type I SS ANOVA: Analysis of Variance Table Response: C.Mean DfSum Sq Mean Sq F valuePr(F) Mean.richness1 5638532656385326 23.5855 3.239e-05 *** Diversity 1 14476593 14476593 6.0554 0.019634 * Zoop1 13002135 13002135 5.4387 0.026365 * Diversity:Phyto 6 126089387 21014898 8.7904 1.257e-05 *** Diversity:Zoop 1 263036 263036 0.1100 0.742347 Diversity:Zoop:Phyto 6 6171014510285024 4.3021 0.002879 ** Residuals3174110911 2390675 I have tried using both the drop1() command and the Anova() command in the car package. When I use the Anova command I get the following error message: Anova(carbonmean,type=III) Error in linear.hypothesis.lm(mod, hyp.matrix, summary.model = sumry,: One or more terms aliased in model. I am not sure why this is aliased. There are no missing cells, and the cells are balanced (aside from for the covariate). Each Phyto by Zoop cross is replicated 3 times, and there are four Phyto levels within each level of Diversity. When I remove the nested factor (Phyto), I am able to get the Type III SS output. 
Then when I use drop1(carbonmean,.~.,Test=F) I get the following output:

    drop1(carbonmean,.~.,Test=F)
    Single term deletions

    Model:
    C.Mean ~ Mean.richness + Diversity + Zoop + Diversity/Phyto + Zoop * Diversity/Phyto
                    Df Sum of Sq       RSS AIC
    <none>                        74110911 718
    Mean.richness    1  49790403 123901314 741
    Diversity        0         0  74110911 718
    Zoop             0         0  74110911 718
    Diversity:Phyto  6 118553466 192664376 752
    Diversity:Zoop   0
Re: [R] Nested ANOVA with covariate using Type III sums of squares
I see where my confusion comes from. I counted 4 levels of Phyto, but you have 8, being 4 in every level of Diversity. There's your aliasing.

    table(Diversity, Phyto)
             Phyto
    Diversity M1 M2 M3 M4 P1 P2 P3 P4
            H  0  0  0  0  6  6  6  6
            L  6  6  6  6  0  0  0  0

There's no need to code them differently for every level of Diversity. If you don't, all is fine:

    Phyto <- gsub("M", "P", as.character(Phyto))
    Phyto <- as.factor(Phyto)
    test <- lm(C.Mean ~ Mean.richness + Diversity + Zoop + Diversity/Phyto +
               Zoop*Diversity/Phyto)
    Anova(test, type="III")
    Anova Table (Type III tests)

    Response: C.Mean
                            Sum Sq Df F value    Pr(>F)
    (Intercept)           23935609  1 10.0121 0.0034729 **
    Mean.richness         49790385  1 20.8269 7.471e-05 ***
    Diversity             35807205  1 14.9779 0.0005234 ***
    Zoop                  10794614  1  4.5153 0.0416688 *
    Diversity:Phyto      118553464  6  8.2650 2.184e-05 ***
    Diversity:Zoop          261789  1  0.1095 0.7429356
    Diversity:Zoop:Phyto  61710162  6  4.3021 0.0028790 **
    Residuals             74110938 31
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

You can check with summary(test) that the model is fitted correctly.

On Fri, Jun 4, 2010 at 12:48 AM, Anita Narwani anitanarw...@gmail.com wrote:

You have everything right except that there are only 2 zooplankton species (C and D, which stand for Ceriodaphnia and Daphnia).
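The aliasing described above can be reproduced on made-up data. The following is a minimal sketch, not the original analysis: the response `y` is random noise, and `Phyto8` / `Phyto4` are hypothetical names for the two codings of the nested factor. Only base R is assumed.

```r
set.seed(1)
Diversity <- factor(rep(c("H", "L"), each = 24))

# Nested factor coded with different labels in each Diversity level (8 levels)
Phyto8 <- factor(c(rep(paste0("P", 1:4), each = 6),
                   rep(paste0("M", 1:4), each = 6)))

# Same design, but the labels P1..P4 are reused within each Diversity level
Phyto4 <- factor(gsub("M", "P", as.character(Phyto8)))

y <- rnorm(48)  # placeholder response, only the design matters here

fit8 <- lm(y ~ Diversity + Diversity:Phyto8)
fit4 <- lm(y ~ Diversity + Diversity:Phyto4)

# Coefficients reported as NA are the aliased (inestimable) ones
anyNA(coef(fit8))
anyNA(coef(fit4))
```

With the 8-level coding half of the Diversity by Phyto cells are empty, so lm() cannot estimate all coefficients; recoding to 4 shared labels removes the empty cells.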
Re: [R] Nested ANOVA with covariate using Type III sums of squares
Hi Anita,

I have to correct myself too, I've been rambling a bit. Of course you don't delete the variable from the interaction term when you test the main effect. What I said earlier didn't really make any sense. Still, testing a main effect without removing the interaction term has a tricky interpretation. By removing a main effect you test the full model A + B + A:B against the model A + A:B. If you remove the main effect Zoop, for example, you basically nest Zoop within Diversity and test whether that's not worse than the full model. This explains it very well:
https://stat.ethz.ch/pipermail/r-help/2010-March/230280.html

I'd go for type II, but you're free to test any hypothesis you want.

Cheers
Joris

On Thu, Jun 3, 2010 at 9:59 PM, Anita Narwani anitanarw...@gmail.com wrote:

Thanks for your response Joris. I was aware of the potential for aliasing, although I thought that this was only a problem when you have missing cell means. It was interesting to read the vehement argument regarding the Type III sums of squares, and although I knew that there were different positions on the topic, I had no idea how divisive it was. Nevertheless, Type III SS are generally recommended by statistical texts in ecology for my type of experimental design. Interestingly, despite the aliasing, SPSS has no problems calculating Type III SS for this data set. This is simply because I am entering a covariate, which causes non-orthogonality.

I would be happier using R and the default Type I SS, which are the same as the Type III SS anyway when I omit the covariate Mean.richness, except that these results are very sensitive to the order in which I add the variables to the model when I do enter the covariate. I understand that the order is very important and relates back to the scientific hypothesis, but I am equally interested in the main effects of Zoop, Diversity, and the nested effect of Phyto, so entering either of these variables before the other does not make sense from an ecological perspective, and because the results do change, the order cannot be ignored from a statistical perspective. Finally, I have tried using the Type II SS and received similar warnings. Do you have any recommendations?

Anita.
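One way to see why the interpretation is tricky: with R's default formula coding, A + A:B is only a reparametrisation of A + B + A:B (B nested within A), so the two fits are identical and a plain model comparison has nothing to test. The sketch below uses made-up factors `A` and `B` and a random response; it is an illustration under those assumptions, not the analysis from the thread.

```r
set.seed(42)
d <- expand.grid(A = factor(c("a1", "a2")),
                 B = factor(c("b1", "b2", "b3")),
                 rep = 1:4)
d$y <- rnorm(nrow(d))  # placeholder response

full   <- lm(y ~ A + B + A:B, data = d)
nested <- lm(y ~ A + A:B,     data = d)  # main effect B dropped, interaction kept

# Same number of columns in both design matrices ...
c(full = ncol(model.matrix(full)), nested = ncol(model.matrix(nested)))

# ... and the same residual sum of squares
all.equal(deviance(full), deviance(nested))
```

Type III tests from car::Anova() get around this by testing hypotheses on the coefficients of the full model, which is why their result depends on the contrast coding.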
Re: [R] cumsum function with data frame
But then you don't apply cumsum within each factor level. Hence the ddply.

Cheers
Joris

On Thu, Jun 3, 2010 at 9:35 PM, Jorge Ivan Velez jorgeivanve...@gmail.com wrote:

Or better yet, you can use transform only (in base):

    transform(Data, CUMSUM = cumsum(value))

HTH,
Jorge

On Thu, Jun 3, 2010 at 3:30 PM, Felipe Carrillo wrote:

Better yet, it is shorter using transform instead of summarise:

    Data <- read.table(textConnection("variable Year value
    EC01 2005 5
    EC01 2006 10
    AAO1 2005 2
    AAO1 2006 4"), header=T)
    ddply(Data, .(variable), transform, CUMSUM=cumsum(value))

- Original Message
From: Felipe Carrillo mazatlanmex...@yahoo.com
To: Joris Meys jorism...@gmail.com; n.via...@libero.it
Cc: r-help@r-project.org
Sent: Thu, June 3, 2010 11:28:58 AM
Subject: Re: [R] cumsum function with data frame

You can also use ddply from the plyr package:

    library(plyr)
    Data <- read.table(textConnection("variable Year value
    EC01 2005 5
    EC01 2006 10
    AAO1 2005 2
    AAO1 2006 4"), header=T)
    Data
    ddply(Data, .(variable), summarise, Year=Year, value=value, CUMSUM=cumsum(value))

Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA

- Original Message
From: Joris Meys jorism...@gmail.com
To: n.via...@libero.it
Cc: r-help@r-project.org
Sent: Thu, June 3, 2010 9:26:17 AM
Subject: Re: [R] cumsum function with data frame

See ?split and ?unsplit.
    Data <- read.table(textConnection("variable Year value
    EC01 2005 5
    EC01 2006 10
    AAO1 2005 2
    AAO1 2006 4"), header=T)
    Datalist <- split(Data, Data$variable)
    resultlist <- lapply(Datalist, function(x){
      x$cumul <- cumsum(x$value)
      return(x)
    })
    result <- unsplit(resultlist, Data$variable)
    result
      variable Year value cumul
    1     EC01 2005     5     5
    2     EC01 2006    10    15
    3     AAO1 2005     2     2
    4     AAO1 2006     4     6

On a side note: I've used this construction now for a number of problems. Some could be better solved using more specific functions (e.g. ave() for adding a column with means). I'm not sure, however, that this is the best approach to applying a function to subsets of a dataframe and adding the result of that function as an extra variable. Anybody care to elaborate on how the R masters had it in mind?

Cheers
Joris

On Thu, Jun 3, 2010 at 5:58 PM, n.via...@libero.it wrote:

Dear list,

I have a problem with the cumsum function. I have a data frame like the following one:

    variable Year value
    EC01     2005     5
    EC01     2006    10
    AAO1     2005     2
    AAO1     2006     4

What I would like to obtain is:

    variable Year value cumsum
    EC01     2005     5      5
    EC01     2006    10     15
    AAO1     2005     2      2
    AAO1     2006     4      6

If I use the by function or the aggregate function the result is a list or something else; what I want is a data frame as I showed above. Does anyone know how to get it?
THANKS A LOT
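The ave() function mentioned in the side note also handles this case in one line, because it applies a function within groups and returns a vector in the original row order. A minimal sketch in base R, with the data frame from the question typed in by hand:

```r
Data <- data.frame(
  variable = c("EC01", "EC01", "AAO1", "AAO1"),
  Year     = c(2005, 2006, 2005, 2006),
  value    = c(5, 10, 2, 4)
)

# cumsum() is applied separately within each level of 'variable';
# the result keeps the row order of Data
Data$cumul <- ave(Data$value, Data$variable, FUN = cumsum)
Data
```

Unlike the split/lapply/unsplit route, ave() only works when the function returns a vector of the same length as its input, which is the case for cumsum().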
Re: [R] Subsetting for unwanted values
    PcToAdd_ <- Pc[!(Pc %in% Pc.X)]
    PcToAdd_
    [1] "Res" "Os"  "Gov" "Rur"
    PcToAdd_ <- subset(Pc, !(Pc %in% Pc.X))
    PcToAdd_
    [1] "Res" "Os"  "Gov" "Rur"

On Fri, Jun 4, 2010 at 1:52 AM, LCOG1 jr...@lcog.org wrote:

Hi all,

I have toyed with this for too long today, and in the past I used multiple lines of code to get at what I want. Consider the following: all I need to do is subset Pc to the values that are not in Pc.X. The first attempt doesn't work because I have unequal lengths. The second attempt doesn't give me the right answer.

    Pc <- c("Res","Com","Ind","Os","Mix","Gov","Rur")
    Pc.X <- c("Com","Ind","Mix")
    PcToAdd_ <- Pc[Pc != Pc.X]             # Doesn't work
    # AND
    PcToAdd_ <- subset(Pc.X, Pc.X %in% Pc)  # Works but doesn't get me what I want

I am looking for a return of PcToAdd_: Res Os Gov Rur

This has got to be a simple answer. Thanks
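For plain vectors, base R's setdiff() gives the same result as the negated %in% above. A short sketch using the vectors from the question; note that setdiff() also drops duplicated values, which the %in% version does not:

```r
Pc   <- c("Res", "Com", "Ind", "Os", "Mix", "Gov", "Rur")
Pc.X <- c("Com", "Ind", "Mix")

# Negated membership test: keeps order and duplicates of Pc
keep_in <- Pc[!(Pc %in% Pc.X)]

# Set difference: same elements here, but duplicates in Pc would be removed
keep_set <- setdiff(Pc, Pc.X)

keep_in
keep_set
```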
Re: [R] Nested ANOVA with covariate using Type III sums of squares
SPSS uses a different calculation. As far as I understood, they test main effects without the covariate. Regarding the difference between my and your results, did you use sum contrasts?

    options(contrasts=c("contr.sum","contr.poly"))

On Fri, Jun 4, 2010 at 2:19 AM, Anita Narwani anitanarw...@gmail.com wrote:

Hi Joris,

That seems to have worked and the contrasts look correct. I have tried comparing the results to what SPSS produces for the same model. The two programs produce very different results, although the model F statistics, R squared and adjusted R squared values are identical. The results are so different that I don't know what to trust. For the same model you coded I got:

    test <- lm(C.Mean ~ Mean.richness + Diversity + Zoop + Diversity/Phyto +
               Zoop*Diversity/Phyto)
    Anova(test, type="III")
    Anova Table (Type III tests)

    Response: C.Mean
                           Sum Sq Df F value    Pr(>F)
    (Intercept)          28223311  1 11.8056  0.001701 **
    Mean.richness        49790403  1 20.8269 7.471e-05 ***
    Diversity            31055477  1 12.9903  0.001082 **
    Zoop                  2736238  1  1.1445  0.292953
    Diversity:Phyto      27943313  6  1.9481  0.104103
    Diversity:Zoop         168184  1  0.0703  0.792584
    Diversity:Zoop:Phyto 61710145  6  4.3021  0.002879 **
    Residuals            74110911 31
    ---
    Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1

(Also slightly different from your result) and

    summary(test)

    Call:
    lm(formula = C.Mean ~ Mean.richness + Diversity + Zoop + Diversity/Phyto +
        Zoop * Diversity/Phyto)

    Residuals:
         Min       1Q   Median       3Q      Max
    -3555.26  -479.53    49.94   423.49  4073.20

    Coefficients:
                             Estimate Std. Error t value Pr(>|t|)
    (Intercept)               -8562.9     2492.2  -3.436  0.00170 **
    Mean.richness              4605.7     1009.2   4.564 7.47e-05 ***
    DiversityL                 6576.9     1824.8   3.604  0.00108 **
    ZoopD                     -1414.4     1322.1  -1.070  0.29295
    DiversityH:PhytoP2        -4307.5     1824.8  -2.361  0.02472 *
    DiversityL:PhytoP2         -268.4     1262.5  -0.213  0.83300
    DiversityH:PhytoP3        -2233.4     1393.0  -1.603  0.11900
    DiversityL:PhytoP3        -1571.4     1262.5  -1.245  0.22257
    DiversityH:PhytoP4        -7914.8     2647.2  -2.990  0.00543 **
    DiversityL:PhytoP4        -1612.8     1262.5  -1.277  0.21092
    DiversityL:ZoopD            484.9     1828.0   0.265  0.79258
    DiversityH:ZoopD:PhytoP2    683.9     1855.3   0.369  0.71493
    DiversityL:ZoopD:PhytoP2   6346.4     1785.4   3.555  0.00124 **
    DiversityH:ZoopD:PhytoP3   4922.8     1786.3   2.756  0.00971 **
    DiversityL:ZoopD:PhytoP3   1085.4     1785.4   0.608  0.54766
    DiversityH:ZoopD:PhytoP4   3261.8     1985.6   1.643  0.11055
    DiversityL:ZoopD:PhytoP4    681.9     1785.4   0.382  0.70513
    ---
    Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1

    Residual standard error: 1546 on 31 degrees of freedom
    Multiple R-squared: 0.7858, Adjusted R-squared: 0.6753
    F-statistic: 7.109 on 16 and 31 DF, p-value: 1.810e-06

From SPSS I got:

    Tests of Between-Subjects Effects
    Dependent Variable: C Mean

    Source                   Type III Sum of Squares  df  Mean Square       F  Sig.
    Corrected Model                        2.719E+08  16    1.700E+07   7.109  .000
    Intercept                              2.394E+07   1    2.394E+07  10.012  .003
    Meanrichness                           4.979E+07   1    4.979E+07  20.827  .000
    Diversity                              3.581E+07   1    3.581E+07  14.978  .001
    Zoop                                   1.079E+07   1    1.079E+07   4.515  .042
    Diversity * Zoop                      261789.172   1   261789.172    .110  .743
    Phyto(Diversity)                       1.186E+08   6    1.976E+07   8.265  .000
    Phyto * Zoop(Diversity)                6.171E+07   6    1.029E+07   4.302  .003
    Error                                  7.411E+07  31    2.391E+06
    Total                                  7.959E+08  48
    Corrected Total                        3.460E+08  47

This gives some similar results, but a completely different F statistic and P-value for the main effect of Zoop and the nested effect of Phyto. Obviously SPSS is not necessarily the perfect reference, but when using the Type I SS, the results did agree. Any thoughts on why this might be?
Could the two programs be calculating the Type III SS differently? Might it be wise to stick to Type I SS?

Thanks very much for your time and effort. It has been very helpful.

Anita.
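Why the contrast setting matters for main-effect tests can be seen from what the coefficients mean under each coding. The sketch below is an illustration on made-up data (factors `A` and `B`, random response), using only base R: the fitted values are identical, but the intercept is the reference cell mean under treatment contrasts and the unweighted mean of all cell means under sum contrasts, so "main effect" coefficients are measured against different baselines.

```r
set.seed(7)
d <- expand.grid(A = factor(c("a1", "a2")),
                 B = factor(c("b1", "b2")),
                 rep = 1:3)
d$y <- rnorm(nrow(d)) + as.integer(d$A) * as.integer(d$B)

cell <- tapply(d$y, list(d$A, d$B), mean)  # 2 x 2 table of cell means

fit_trt <- lm(y ~ A * B, data = d)  # default treatment contrasts
fit_sum <- lm(y ~ A * B, data = d,
              contrasts = list(A = "contr.sum", B = "contr.sum"))

# Treatment coding: intercept is the mean of the reference cell (a1, b1)
c(coef(fit_trt)[1], cell["a1", "b1"])

# Sum coding: intercept is the unweighted mean of the four cell means
c(coef(fit_sum)[1], mean(cell))

# The overall fit is the same either way
all.equal(deviance(fit_trt), deviance(fit_sum))
```

Passing `contrasts =` to lm() is used here instead of options(contrasts = ...) only to keep the example free of global side effects.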
Re: [R] storing output data from a loop that has varying row numbers
Hi Ross,

the trick is mainly split() and unsplit(). split() splits up the dataframe based on the combined factors, unsplit() transforms it into a dataframe again. This way you can do the calculation for a set of mini-dataframes that each contain only the information for one combination of the factors. lapply is the apply function specifically for lists. As split() gives you a list of dataframes, lapply loops through those dataframes in the appropriate way. You can see that for yourself by doing str(seal_list).

Hope this clears things up a bit.

Cheers
Joris

On Wed, Jun 2, 2010 at 9:55 AM, RCulloch ross.cull...@dur.ac.uk wrote:

Hi Joris,

Many thanks for sorting that! I haven't seen it done that way before, so I'll have to look into the properties of lapply a bit more to get a full appreciation of other approaches to looping over data in R.

Thanks again for your help, it is much appreciated,

Ross
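The pattern described above, sketched on a small made-up data frame. The original seal data are not shown in the thread, so `seals`, its columns and the per-group calculation are all hypothetical; only `seal_list` is a name taken from the message.

```r
seals <- data.frame(
  id  = factor(rep(c("A", "B"), each = 4)),
  day = factor(rep(rep(c("d1", "d2"), each = 2), times = 2)),
  obs = 1:8
)

groups <- list(seals$id, seals$day)   # the combined factors

# One mini-dataframe per id/day combination
seal_list <- split(seals, groups)
str(seal_list, max.level = 1)

# Per-group calculation: observation relative to the group minimum
seal_list <- lapply(seal_list, function(x) {
  x$rel <- x$obs - min(x$obs)
  x
})

# Back to one dataframe, rows in their original order
result <- unsplit(seal_list, groups)
result
```

If some factor combinations do not occur in the data, both split() and unsplit() need `drop = TRUE`.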
Re: [R] Seeking help on Vectorize()
Your arguments are not coming through.

    fn <- function(x = 1:3, y = 3:6) {
      x <- matrix(x, nrow=1)
      y <- matrix(y, ncol=1)
      dat <- apply(x, 2, function(xx) {
        apply(y, 1, function(yy) {
          return(xx + yy)
        })
      })
      Vectorize(dat, SIMPLIFY = TRUE)
      return(dat)
    }
    fn(1:3, 3:7)
         [,1] [,2] [,3]
    [1,]    4    5    6
    [2,]    5    6    7
    [3,]    6    7    8
    [4,]    7    8    9
    [5,]    8    9   10

Cheers
Joris

On Wed, Jun 2, 2010 at 11:25 AM, Megh Dal megh700...@yahoo.com wrote:

Dear folks,

here I have written the following function:

    fn <- Vectorize(function(x = 1:3, y = 3:6) {
      x <- matrix(x, nrow=1)
      y <- matrix(y, ncol=1)
      dat <- apply(x, 2, function(xx) {
        apply(y, 1, function(yy) {
          return(xx + yy)
        })
      })
      return(dat)
    }, SIMPLIFY = TRUE)

If I run this function, I get a warning message, and the format of the returned object is not correct either. For example:

    fn(x = 1:3, y = 3:7)
    [1] 4 6 8 7 9
    Warning message:
    In mapply(FUN = function (x = 1:3, y = 3:6) :
      longer argument not a multiple of length of shorter

However, if I run the individual lines of code:

    x <- 1:3; y = 3:7
    x <- matrix(x, nrow=1)
    y <- matrix(y, ncol=1)
    dat <- apply(x, 2, function(xx) {
      apply(y, 1, function(yy) {
        return(xx + yy)
      })
    })
    dat
         [,1] [,2] [,3]
    [1,]    4    5    6
    [2,]    5    6    7
    [3,]    6    7    8
    [4,]    7    8    9
    [5,]    8    9   10

I get exactly what I want. Where am I going wrong?

Thanks,
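Some background on the warning in the question: Vectorize() wraps the function in mapply(), which pairs the arguments element by element (1 with 3, 2 with 4, ...) and recycles the shorter one, hence `4 6 8 7 9`. For a full table of all x/y combinations, base R's outer() is a direct alternative to the nested apply() calls. A minimal sketch; `add` is just an illustrative name:

```r
x <- 1:3
y <- 3:7

# All pairwise sums: rows follow y, columns follow x
grid_sum <- outer(y, x, "+")
grid_sum

# What Vectorize() is for: element-wise pairing of equal-length arguments
add <- Vectorize(function(a, b) a + b)
add(1:3, 3:5)
```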
Re: [R] how to label the som notes by the majority vote
Hi Changbin,

I looked at your code again, and it appears as if you're using the mapping plot for something that it isn't meant for. The mapping shows you how many points you have in every circle, and these points are represented by the labels. Your first plot gives the majority vote. This said, you can hack the function using:

    classif <- predict(nir.xyf)
    tmp <- table(classif$unit.classif, classif$prediction)
    label <- colnames(tmp)
    label <- apply(tmp != 0, 1, function(x){label[x]})[classif$unit.classif]
    label[-match(1:16, classif$unit.classif)] <- ""
    cl <- colors()
    bgcols <- rev(heat.colors(4))
    plot(nir.xyf, type="mapping",
         bgcol=bgcols[as.numeric(as.factor(temp.predict))],
         main="Mapping plot", labels=label)

It does not calculate the majority vote itself, it just assigns a label to the category based on the predicted labels, which is equivalent in this case.

Cheers
Joris

On Wed, Jun 2, 2010 at 5:08 AM, Changbin Du changb...@gmail.com wrote:

    library(kohonen)
    data(nir)
    attach(nir)
    # SOM, the supervised learning, train the map using temperature as the class variable.
    set.seed(13)
    nir.xyf <- xyf(data=spectra, Y=classvec2classmat(temperature), xweight=0.9,
                   grid=somgrid(4, 4, "hexagonal"))
    temp.xyf <- predict(nir.xyf)$unit.prediction             # get prediction
    temp.predict <- as.numeric(classmat2classvec(temp.xyf))  # change matrix to vectors
    par(mfrow=c(1,2))
    plot(nir.xyf, type="property", property=temp.predict,
         palette.name=rainbow, main="Prediction")
    cl <- colors()
    bgcols <- cl[2:14]
    plot(nir.xyf, type="mapping", labels=nir$temperature,
         bgcol=bgcols[as.integer(temp.predict)], main="Mapping plot")
    par(mfrow=c(1,1))

Hi Joris,

Thanks so much for your suggestion! I have modified the above code, and what I want is to label the notes by the temperature. If a note has 3 objects mapped to it (the temperatures are 30, 40, 30), then I want 30 to be the label on the note. The right plot is the mapping plot; I want it to be labeled with only one temperature.

Thanks so much!
On Tue, Jun 1, 2010 at 5:36 PM, Joris Meys jorism...@gmail.com wrote:

Dear Changbin,

Please provide a self-contained, minimal example, meaning the whole code should run and create the plot as it is now, without having to load your dataset (which we don't have). Otherwise it's impossible to see what's going on and help you.

Cheers
Joris

On Wed, Jun 2, 2010 at 2:21 AM, Changbin Du changb...@gmail.com wrote:

Hi, dear R community,

I am using the following code to do the SOM. I tried to label the notes by the majority vote, either through mapping or prediction. I attached my output: the left plot doesn't have any labels in the notes, the right one has more than one label in each note. I need to have only one label for each note, either by majority vote or prediction. Can anyone give some suggestions or advice? Thanks so much!

    alex <- read.table("/home/cdu/operon/alex2.txt", sep="\t", skip=0, header=T, fill=T)
    alex1 <- alex[, c(1:257)]
    levels(alex1$Label)
    alex1$outcome <- as.numeric(alex1$Label)
    alex1$outcome[1:20]
    # self-organizing maps (unsupervised learning)
    library(kohonen)
    # SOM, the supervised learning, train the map using outcome as the class variable.
    set.seed(13)
    final.xyf <- xyf(data=as.matrix(alex1[, c(1:256)]), Y=classvec2classmat(alex1$outcome),
                     xweight=0.99, grid=somgrid(20, 30, "hexagonal"))
    outcome.xyf <- predict(final.xyf)$unit.prediction              # get prediction
    outcome.predict <- as.numeric(classmat2classvec(outcome.xyf))  # change matrix to vectors
    outcome.label <- LETTERS[outcome.predict]                      # convert the numeric value to letters
    plot(final.xyf, type="property", property=outcome.predict, labels=outcome.label,
         palette.name=rainbow, main="Prediction")
    cl <- colors()
    bgcols <- cl[2:14]
    plot(final.xyf, type="mapping", labels=outcome.label, col="black",
         bgcol=bgcols[as.integer(outcome.predict)], main="Mapping plot")

--
Sincerely,
Changbin
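Independent of the kohonen plotting functions, the majority vote asked for above can be computed in base R once the unit assignment of each object is known. The sketch below assumes two plain vectors that are not taken from the thread's data: `unit` (the map unit each object falls in) and `temp` (its class label). Ties are resolved in favour of the label that sorts first.

```r
# Hypothetical assignment: six objects mapped to two units
unit <- c(1, 1, 1, 2, 2, 2)
temp <- c(30, 40, 30, 50, 50, 40)

# For each unit, tabulate the labels and take the most frequent one
vote <- vapply(split(temp, unit),
               function(x) names(which.max(table(x))),
               character(1))
vote
```

The resulting vector has one label per unit and could then be passed on as plot labels, in the same spirit as the `label` vector built in the reply above.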
Re: [R] Problems using gamlss to model zero-inflated and overdispersed count data: the global deviance is increasing
    GAMLSS-RS iteration 3: Global Deviance = 652.5771
    GAMLSS-RS iteration 4: Global Deviance = 632.8885
    GAMLSS-RS iteration 5: Global Deviance = 645.1169
    Error in RS() : The global deviance is increasing
     Try different steps for the parameters or the model maybe inappropriate

    model_ZINBI <- gamlss(duck ~ LFAP200, data=data, family=ZINBI)
    GAMLSS-RS iteration 1: Global Deviance = 3831.864
    GAMLSS-RS iteration 2: Global Deviance = 1174.605
    GAMLSS-RS iteration 3: Global Deviance = 562.5428
    GAMLSS-RS iteration 4: Global Deviance = 344.0637
    GAMLSS-RS iteration 5: Global Deviance = 1779.018
    Error in RS() : The global deviance is increasing
     Try different steps for the parameters or the model maybe inappropriate

Any suggestions on how to proceed with this?

Many thanks in advance,

Diederik

Diederik Strubbe
Evolutionary Ecology Group
Department of Biology
University of Antwerp
Groenenborgerlaan 171
2020 Antwerpen, Belgium
tel: +32 3 265 3464
Re: [R] regexpr mystery can not remove trailing spaces
Could you provide us with dput(becva$V1[1])?

Cheers
Joris

On Wed, Jun 2, 2010 at 2:07 PM, Petr PIKAL petr.pi...@precheza.cz wrote:

Dear all,

I encountered a strange problem with regexpr replacement. I made this character object:

    str <- "02.06.10 12:40"
    str(str)
     chr "02.06.10 12:40"

I read in an object which seems to be quite similar:

    str(as.character(becva$V1)[1])
     chr "02.06.10 12:40 "

However, I cannot remove the trailing spaces from it:

    sub(' +$', '', as.character(becva$V1[1]))
    [1] "02.06.10 12:40 "
    sub(' +$', '', str)
    [1] "02.06.10 12:40"

Does somebody have an idea what to do?

    $version.string
    [1] "R version 2.12.0 Under development (unstable) (2010-04-25 r51820)"

on Windows.

Regards
Petr
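The reason for asking for dput() output is that print() and str() can make different whitespace characters look alike. A small sketch of how to inspect what is really at the end of a string; the example value with a trailing tab is made up, since the original data are not available:

```r
x <- "02.06.10 12:40\t "   # hypothetical value: a tab followed by a space

dput(x)                    # shows escapes such as \t explicitly

codes <- utf8ToInt(x)      # integer code point of every character
last_two <- codes[(length(codes) - 1):length(codes)]
last_two                   # 9 is a tab, 32 is an ordinary space

charToRaw(x)               # the same information as raw bytes
```

Any trailing code other than 32 explains why a pattern such as ' +$' leaves the string unchanged.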
Re: [R] compute the associate vector of distances between leaves in a binary non-rooted tree
Hi,

with a little hack you can use the function cophenetic.phylo from ape. You just set all branch lengths to 1:

    require(ape)
    tree <- rtree(5, rooted=F)
    n <- length(tree$edge.length)
    tree$edge.length <- rep(1, n)
    cophenetic.phylo(tree)
       t3 t1 t2 t4 t5
    t3  0  3  3  3  3
    t1  3  0  2  4  4
    t2  3  2  0  4  4
    t4  3  4  4  0  2
    t5  3  4  4  2  0

Cheers

On Wed, Jun 2, 2010 at 2:47 PM, Arnau Mir Torres arnau@uib.es wrote:

Hello.

I'd like to compute the associated vector of distances between leaves in a binary non-rooted tree. The distance between two leaves in a binary non-rooted tree is defined as the number of edges in the path joining the two leaves. I've tried the ape package but I'm unable to find this vector. For example, using rtree(5,rooted=F) I've obtained the following tree:

    $edge
         [,1] [,2]
    [1,]    6    7
    [2,]    7    1
    [3,]    7    8
    [4,]    8    2
    [5,]    8    3
    [6,]    6    4
    [7,]    6    5

    $tip.label
    [1] "t4" "t3" "t2" "t1" "t5"

    $edge.length
    [1] 0.9126727 0.2765674 0.4996832 0.7904400 0.8508797 0.8174133 0.9027958

    $Nnode
    [1] 3

My question is: how to compute the vector of distances between the 5 leaves. This vector is in this case:

    v = (d(t1,t2), d(t1,t3), d(t1,t4), d(t1,t5), d(t2,t3), d(t2,t4), d(t2,t5), d(t3,t4), d(t3,t5), d(t4,t5))
      = (4, 4, 3, 2, 2, 3, 4, 3, 4, 3)

Thanks in advance,
Arnau.

Arnau Mir Torres
Edifici A. Turmeda
Campus UIB
Ctra. Valldemossa, km. 7,5
07122 Palma de Mca.
tel: (+34) 971172987
fax: (+34) 971173003
email: arnau@uib.es
URL: http://dmi.uib.es/~arnau
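As a cross-check that does not need ape, the edge counts can be computed directly from the edge matrix posted in the question. This sketch swaps in a plain Floyd-Warshall shortest-path loop with every edge given length 1 (it is not how ape computes the distances), then orders the tips by label and reads the upper triangle row by row to obtain the vector in the order used in the question.

```r
# Edge matrix and tip labels as posted in the question
edge <- matrix(c(6, 7,  7, 1,  7, 8,  8, 2,  8, 3,  6, 4,  6, 5),
               ncol = 2, byrow = TRUE)
tip.label <- c("t4", "t3", "t2", "t1", "t5")

n <- max(edge)                 # total number of nodes (tips + internal)
d <- matrix(Inf, n, n)
diag(d) <- 0
d[edge] <- 1                   # every edge counts as one step ...
d[edge[, 2:1]] <- 1            # ... in both directions

# Floyd-Warshall: allow paths through node k, for each k in turn
for (k in seq_len(n)) {
  d <- pmin(d, outer(d[, k], d[k, ], "+"))
}

# Keep the tips only, label them, and sort by label
tips <- seq_along(tip.label)
td <- d[tips, tips]
dimnames(td) <- list(tip.label, tip.label)
ord <- sort(tip.label)
td <- td[ord, ord]

# Upper triangle read row by row: d(t1,t2), d(t1,t3), ..., d(t4,t5)
v <- t(td)[lower.tri(td)]
v
```

The same extraction step, `t(m)[lower.tri(m)]`, can be applied to the matrix returned by cophenetic.phylo() after sorting its rows and columns by label.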
Re: [R] regexpr mystery can not remove trailing spaces
sub("\\s+$", '', bbb, perl=T) does it for me.

On Wed, Jun 2, 2010 at 3:22 PM, Petr PIKAL petr.pi...@precheza.cz wrote:

Hi

    dput(bbb)
    c("02.06.10 12:40 ", "02.06.10 12:00 ", "02.06.10 11:00 ", "02.06.10 10:00 ",
    "02.06.10 09:00 ", "02.06.10 08:00 ", "02.06.10 07:00 ", "02.06.10 06:00 ",
    "02.06.10 05:00 ", "02.06.10 04:00 ", "02.06.10 03:00 ", "02.06.10 02:00 ",
    "02.06.10 01:00 ", "02.06.10 00:00 ", "01.06.10 23:00 ", "01.06.10 22:00 ",
    "01.06.10 21:00 ", "01.06.10 20:00 ", "01.06.10 19:00 ", "01.06.10 18:00 ",
    "01.06.10 17:00 ", "01.06.10 16:00 ", "01.06.10 15:00 ", "01.06.10 14:00 ",
    "01.06.10 13:00 ", "01.06.10 05:00 ", "31.05.10 05:00 ", "30.05.10 05:00 ",
    "29.05.10 05:00 ", "28.05.10 05:00 ", "27.05.10 05:00 ")

For simplicity I changed the name and put it into a single variable. I also reinstalled R to the recent R-devel.

    sub('\\w+$', '', bbb[1])
    [1] "02.06.10 12:40 "
    sub('[:space:]', '', bbb[1])
    [1] "02.06.10 1240 "

I also tried Matt's suggestion but it did not help.

Regards
Petr
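The different patterns tried in this thread behave as follows on a string whose trailing whitespace is not just ordinary spaces. The example value (a tab followed by a space) is an assumption for illustration; the actual characters in the original data were never established.

```r
x <- "02.06.10 12:40\t "   # hypothetical: trailing tab followed by a space

# ' +$' only matches ordinary spaces, so the tab survives
only_spaces <- sub(" +$", "", x)

# '[[:space:]]' (double brackets) and '\\s' match any whitespace character
posix_class <- sub("[[:space:]]+$", "", x)
perl_class  <- sub("\\s+$", "", x, perl = TRUE)

# '[:space:]' with single brackets is just the set of characters
# ':', 's', 'p', 'a', 'c', 'e', so it removes the first colon instead
wrong_class <- sub("[:space:]", "", "02.06.10 12:40 ")

only_spaces
posix_class
perl_class
wrong_class
```

The last line reproduces the "1240" result reported above and explains why that attempt appeared to eat the colon rather than the whitespace.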
Re: [R] regexpr mystery can not remove trailing spaces
Hi Petr,

Matt may very well have been right. When I copied the dput output from the mail, any whitespace was apparently converted to spaces. Still, the whitespace in your original data might be tabs or even newline characters. You can check that easily with

grep("\t", as.character(becva$V1[1]))
grep("\n", as.character(becva$V1[1]))

Cheers
Joris

On Wed, Jun 2, 2010 at 3:54 PM, Petr PIKAL petr.pi...@precheza.cz wrote:

Hi

Thanks. I am puzzled about what was wrong. Now even sub(' +$', '', bbb[1]) works. I am checking water throughput in a nearby river and copying the data from the internet, so I wonder if there was some change recently, as during floods they update it at intervals of about 10 minutes.

Regards
Petr

jim holtman jholt...@gmail.com wrote on 02.06.2010 15:44:42:

You had the wrong case on 'w' and the wrong expression with '[:space:]'; see below

> bbb <- c("02.06.10 12:40 ", "02.06.10 12:00 ", "02.06.10 11:00 ",
+ "02.06.10 10:00 ", "02.06.10 09:00 ", "02.06.10 08:00 ",
+ "02.06.10 07:00 ", "02.06.10 06:00 ", "02.06.10 05:00 ",
+ "02.06.10 04:00 ", "02.06.10 03:00 ", "02.06.10 02:00 ",
+ "02.06.10 01:00 ", "02.06.10 00:00 ", "01.06.10 23:00 ",
+ "01.06.10 22:00 ", "01.06.10 21:00 ", "01.06.10 20:00 ",
+ "01.06.10 19:00 ", "01.06.10 18:00 ", "01.06.10 17:00 ",
+ "01.06.10 16:00 ", "01.06.10 15:00 ", "01.06.10 14:00 ",
+ "01.06.10 13:00 ", "01.06.10 05:00 ", "31.05.10 05:00 ",
+ "30.05.10 05:00 ", "29.05.10 05:00 ", "28.05.10 05:00 ",
+ "27.05.10 05:00 ")
> sub('\\W+$', '', bbb[1])
[1] "02.06.10 12:40"
> sub('[[:space:]]+$', '', bbb[1])
[1] "02.06.10 12:40"

-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
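The difference between ' +$', '[:space:]' and '[[:space:]]+$' discussed in this thread can be sketched as follows. This is only an illustration: the value of x is hypothetical and the trailing tab is an assumption, since the real content of becva$V1[1] was never posted.

```r
# Hypothetical stand-in for becva$V1[1]; the trailing tab is an assumption.
x <- "02.06.10 12:40\t"

# A pattern of literal spaces does not match a tab, so nothing is removed.
no_change <- sub(" +$", "", x)

# The POSIX class has to sit inside a bracket expression: [[:space:]].
# A bare [:space:] is just a set of the characters ':', 's', 'p', 'a', 'c', 'e'.
trimmed <- sub("[[:space:]]+$", "", x)

# Look at the code point of the last character to see what it really is.
last_code <- utf8ToInt(substring(x, nchar(x)))

print(no_change == x)
print(trimmed)
print(last_code)
```

Inspecting the code point is a quick way to find out whether the trailing character is a space (32), a tab (9) or something else before choosing a pattern.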
Re: [R] glmnet strange error message
Could you give us the traceback? (In case you don't know how, just type traceback() right after you get the error message.) I can't reproduce the error, so it is a bit difficult to solve without the real data.

Cheers
Joris

On Wed, Jun 2, 2010 at 6:51 PM, Dave_F friedenbe...@battelle.org wrote:

Hello fellow R users,

I have been getting a strange error message when using the cv.glmnet function in the glmnet package. I am attempting to fit a multinomial regression using the lasso. covars is a matrix with 80 rows and roughly 4000 columns; all the covariates are binary. resp is an eight-level factor. I can fit the model with no errors, but when I try to cross-validate, after about 30 seconds I get the following:

glmnet.fit = glmnet(covars,resp,family="multinomial")
glmnet.cv = cv.glmnet(covars,resp,family="multinomial",type="class")
Error in if (outlist$msg != "Unknown error") return(outlist) :
  argument is of length zero

It seems like it makes it through the first couple of folds but trips up somewhere in the middle. The example in the documentation works perfectly on my machine. Any ideas on what the problem may be?

Thanks!
Dave

-- View this message in context: http://r.789695.n4.nabble.com/glmnet-strange-error-message-tp2240458p2240458.html Sent from the R help mailing list archive at Nabble.com.
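As a rough illustration of what "argument is of length zero" means in general: the condition of an if() evaluated to an empty vector, which happens for instance when a list element does not exist. The sketch below uses a hypothetical empty list named outlist; it does not reproduce glmnet's internals and says nothing about why the element would be missing in Dave's case.

```r
# Hypothetical object: a list that has no 'msg' element.
outlist <- list()

# outlist$msg is NULL, so the comparison yields logical(0) and if() fails.
res <- tryCatch(
  if (outlist$msg != "Unknown error") "ok",
  error = function(e) conditionMessage(e)
)
print(res)

# In an interactive session, calling traceback() right after the real error
# would show which internal call raised it.
```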
Re: [R] why the dim gave me different results
-- Joris Meys Statistical Consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control Coupure Links 653 B-9000 Gent tel : +32 9 264 59 87 joris.m...@ugent.be
Re: [R] nnet: cannot coerce class c(terms, formula) into a data.frame
Without checking R or the rest of the code, the error seems quite clear to me: R finds a formula where it expects a data frame. cvc_lda is not a data frame. Do str(cvc_lda) to check for yourself.

You really need to learn this, by the way. Whenever you get an error, the first thing to do is to check whether everything you put into the function is what you think it is, and is what R needs it to be.

Before you overload the help list with questions, please take some time to read the introduction to R thoroughly. You really need to understand the differences between vectors, arrays, matrices, data frames, lists, ... You quite obviously struggle with them, and that's a problem we cannot solve for you.

http://cran.r-project.org/doc/manuals/R-intro.pdf

If there is something that is not clear to you, feel free to ask here.

Cheers
Joris

On Wed, Jun 2, 2010 at 8:15 PM, cobbler_squad la.f...@gmail.com wrote:

Dearest all,

Objective: I am now learning neural networks. I want to see how well I can train an artificial neural network model to discriminate between the two files I am attaching to this message.

http://r.789695.n4.nabble.com/file/n2240582/3dMaskDump.txt 3dMaskDump.txt
http://r.789695.n4.nabble.com/file/n2240582/test_vowels.txt test_vowels.txt

Question: when I attempt to run

cvc_nnet <- nnet(G ~ ., data=cvc_lda, size=1,iter=10,MaxNWts=100)

I get an error saying:

Error in as.data.frame.default(x[[i]], optional = TRUE) :
  cannot coerce class c("terms", "formula") into a data.frame

I did not encounter this error when I ran this script with previous lda results, and I am not quite sure what the error means. Below is short (and, I hope, reproducible) code.

library(nnet)
cvc_nnet <- nnet(G ~ ., data=cvc_lda, size=1,iter=10,MaxNWts=100)
predict(cvc_nnet,cvc_lda,type = "class")
table(predict(cvc_nnet,cvc_lda,type = "class"),cvc_lda$G)
cvc_nnet.out<-NULL
all = c(1:52)
for(n in all){
  cvc_nnet <- nnet(G ~ ., data=cvc_lda[all != n,], CV =TRUE,size=1,iter=10,MaxNWts=100)
  cvc_nnet.out <- c(cvc_nnet.out,predict(cvc_nnet,cvc_lda[all == n,],type = "class"))
}
table(cvc_nnet.out,cvc_lda$G)

=== to get cvc_lda:

library(MASS)
vowel_features <- data.frame(as.matrix(read.table(file = "test_vowels.txt")))
mask_features <- data.frame(as.matrix(read.table(file = "3dmaskdump.txt")))
G <-vowel_features[,41]
cvc_lda <- lda(G ~ ., data=mask_features, na.action=na.omit, CV=TRUE)

Your insight is very much appreciated!

-- View this message in context: http://r.789695.n4.nabble.com/nnet-cannot-coerce-class-c-terms-formula-into-a-data-frame-tp2240582p2240582.html Sent from the R help mailing list archive at Nabble.com.
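The check Joris recommends can be sketched like this. The poster's files are not available, so the built-in iris data is used as a stand-in (an assumption); the point is only that lda() called with CV=TRUE returns a list rather than a data frame, which is the kind of object that cannot be passed as a data argument.

```r
library(MASS)  # ships with standard R installations

# iris stands in for the poster's data, which is not available here.
fit_cv <- lda(Species ~ ., data = iris, CV = TRUE)

is_df <- is.data.frame(fit_cv)  # the result is a list, not a data frame
parts <- names(fit_cv)          # component names, among them "class" and "posterior"
str(fit_cv, max.level = 1)

# A data frame holding the response and the predictors is the kind of object
# a modelling function expects for its data argument.
model_frame <- data.frame(G = iris$Species, iris[, 1:4])
str(model_frame)
```

Running str() on every input before calling a modelling function is usually enough to spot this kind of mismatch.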
Re: [R] Use apply only on non-missing values
Not really a direct answer to your question, but:

> system.time(replicate(1,apply(as.matrix(theta), 1, rasch, b_vector)))
   user  system elapsed
   4.51    0.03    4.55
> system.time(replicate(1,theta%*%t(b_vector)))
   user  system elapsed
   0.25    0.00    0.25

It does make a difference on large datasets...

Cheers
Joris

On Wed, Jun 2, 2010 at 4:44 PM, Doran, Harold hdo...@air.org wrote:

I have a function that I am currently using very inefficiently. The following are needed to illustrate the problem:

set.seed(12345)
dat <- matrix(sample(c(0,1), 110, replace = TRUE), nrow = 11, ncol=10)
mis <- sample(1:110, 5)
dat[mis] <- NA
theta <- rnorm(11)
b_vector <- runif(10, -4,4)
empty <- which(is.na(t(dat)))

So, I have a matrix (dat) with some values within the matrix missing. In my real world problem, the matrix is huge, and most values are missing. The function in question is called derivs() and is below. But, let me step through the inefficient portions. First, I create a matrix of some predicted probabilities as:

rasch <- function(theta,b) 1/ (1 + exp(b-theta))
mat <- apply(as.matrix(theta), 1, rasch, b_vector)

However, I only need those predicted probabilities in places where the data are not missing. So, the next step in the function is

mat[empty] <- NA

which manually places NAs in places where the data are missing (notice the matrix 'mat' is the transpose of the data matrix and so I get the empty positions from the transpose of dat).

Afterwards, the function computes the gradient and hessians needed to complete the MLE estimation. All of this works in the sense that it yields the correct answers for my problem. But, the glaring problem is that I create predicted probabilities for every cell in 'mat' when in many cases they are not needed. I end up replacing those values with NAs. In my real world problem, this is horribly inefficient and slow.

My question is then: is there a way to use apply such that it computes the necessary predicted probabilities only when the data are not missing to yield the matrix 'mat'? My desired end result is the matrix 'mat' created after manually placing the NAs in the appropriate cells.

Thanks
Harold

derivs <- function(dat, b_vector, theta){
  mat <- apply(as.matrix(theta), 1, rasch, b_vector)
  mat[empty] <- NA
  gradient <- -(colSums(dat, na.rm = TRUE) - rowSums(mat, na.rm = TRUE))
  hessian <- -(rowSums(mat * (1-mat), na.rm = TRUE))
  list('gradient' = gradient, 'hessian' = hessian)
}

> sessionInfo()
R version 2.10.1 (2009-12-14)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] tools_2.10.1
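One possible way to compute the probabilities only for the observed cells, sketched with the toy data from the question. The names obs and mat2 are introduced here for illustration and are not part of the original post; the approach relies on rasch() being vectorised and on matrix indexing with a two-column index matrix.

```r
set.seed(12345)
dat <- matrix(sample(c(0, 1), 110, replace = TRUE), nrow = 11, ncol = 10)
dat[sample(1:110, 5)] <- NA
theta <- rnorm(11)
b_vector <- runif(10, -4, 4)
rasch <- function(theta, b) 1 / (1 + exp(b - theta))

# Approach from the question: fill every cell, then blank out the missing ones.
mat <- apply(as.matrix(theta), 1, rasch, b_vector)
mat[which(is.na(t(dat)))] <- NA

# Sketch: (item, person) index pairs of the observed cells only.
obs <- which(!is.na(t(dat)), arr.ind = TRUE)
mat2 <- matrix(NA_real_, nrow = length(b_vector), ncol = length(theta))
mat2[obs] <- rasch(theta[obs[, 2]], b_vector[obs[, 1]])

# Both routes should give the same matrix.
same <- isTRUE(all.equal(mat, mat2, check.attributes = FALSE))
print(same)
```

When most cells are missing, the second route evaluates rasch() only on the observed pairs, so the work scales with the number of observed cells rather than with the size of the full matrix.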
Re: [R] Nested ANOVA with covariate using Type III sums of squares
That's what one would expect with type III sums of squares. You have Phyto twice in your model, but only as a nested factor. To compare the full model with a model without diversity or zoop, you have either the combination diversity/phyto, zoop/phyto or phyto twice in the formula. That's aliasing.

Depending on how you stand on type III sums of squares, you could call that a bug. Personally, I'd just not use them.

https://stat.ethz.ch/pipermail/r-help/2001-October/015984.html

Cheers
Joris

On Thu, Jun 3, 2010 at 2:13 AM, Anita Narwani anitanarw...@gmail.com wrote:

Hello,

I have been trying to get an ANOVA table for a linear model containing a single nested factor, two fixed factors and a covariate:

carbonmean<-lm(C.Mean~ Mean.richness + Diversity + Zoop + Diversity/Phyto + Zoop*Diversity/Phyto)

where *Mean.richness* is a covariate, *Zoop* is a categorical variable (the species), *Diversity* is a categorical variable (Low or High), and *Phyto* (community composition) is also categorical but is nested within the level of *Diversity*. Quinn & Keough's statistics text recommends using Type III SS for a nested ANOVA with a covariate. I get the following output using the Type I SS ANOVA:

Analysis of Variance Table
Response: C.Mean
                     Df    Sum Sq   Mean Sq F value    Pr(>F)
Mean.richness         1  56385326  56385326 23.5855 3.239e-05 ***
Diversity             1  14476593  14476593  6.0554  0.019634 *
Zoop                  1  13002135  13002135  5.4387  0.026365 *
Diversity:Phyto       6 126089387  21014898  8.7904 1.257e-05 ***
Diversity:Zoop        1    263036    263036  0.1100  0.742347
Diversity:Zoop:Phyto  6  61710145  10285024  4.3021  0.002879 **
Residuals            31  74110911   2390675

I have tried using both the drop1() command and the Anova() command in the car package. When I use the Anova command I get the following error message:

> Anova(carbonmean,type="III")
Error in linear.hypothesis.lm(mod, hyp.matrix, summary.model = sumry, :
  One or more terms aliased in model.

I am not sure why this is aliased. There are no missing cells, and the cells are balanced (aside from the covariate). Each Phyto by Zoop cross is replicated 3 times, and there are four Phyto levels within each level of Diversity. When I remove the nested factor (Phyto), I am able to get the Type III SS output.

Then when I use drop1(carbonmean,.~.,Test="F") I get the following output:

> drop1(carbonmean,.~.,Test="F")
Single term deletions
Model:
C.Mean ~ Mean.richness + Diversity + Zoop + Diversity/Phyto + Zoop * Diversity/Phyto
                     Df Sum of Sq       RSS AIC
<none>                             74110911 718
Mean.richness         1  49790403 123901314 741
Diversity             0         0  74110911 718
Zoop                  0         0  74110911 718
Diversity:Phyto       6 118553466 192664376 752
Diversity:Zoop        0 -1.49e-08  74110911 718
Diversity:Zoop:Phyto  6  61710145 135821055 735

There are zero degrees of freedom for Diversity, Zoop and their interaction, and zero sums of squares for Diversity and Zoop. This cannot be correct; however, when I do the model simplification by dropping terms from the models manually and comparing them using anova(), I get virtually the same results.

I would appreciate any suggestions for things to try or pointers as to what I may be doing incorrectly.

Thank you.
Anita Narwani
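To make the notion of aliasing concrete, here is a small sketch on made-up data. The data frame d, its column names and its values are hypothetical, and the model is far simpler than the one in the post. It only shows that a nested specification can produce model-matrix columns that carry no independent information, which lm() reports as NA coefficients; this may be what Anova() refuses to handle, but that link is an assumption.

```r
set.seed(1)

# Hypothetical design: 8 Phyto communities, 4 within each Diversity level, 3 replicates.
d <- expand.grid(rep = 1:3, Phyto = factor(paste0("P", 1:8)))
d$Diversity <- factor(ifelse(as.integer(d$Phyto) <= 4, "Low", "High"))
d$y <- rnorm(nrow(d))

# Diversity/Phyto expands to Diversity + Diversity:Phyto.
fit <- lm(y ~ Diversity/Phyto, data = d)

# Coefficients that could not be estimated are returned as NA (aliased).
aliased <- names(coef(fit))[is.na(coef(fit))]
n_aliased <- length(aliased)
print(aliased)
print(fit$rank)  # number of coefficients that were actually estimable
```

Because every Phyto label occurs in only one Diversity level, many Diversity:Phyto combinations are empty by construction, even though the design itself is balanced.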
Re: [R] Nested ANOVA with covariate using Type III sums of squares
That's diversity/phyto, zoop or phyto twice in the formula.

On Thu, Jun 3, 2010 at 3:00 AM, Joris Meys jorism...@gmail.com wrote:

That's what one would expect with type III sums of squares. You have Phyto twice in your model, but only as a nested factor. To compare the full model with a model without diversity or zoop, you have either the combination diversity/phyto, zoop/phyto or phyto twice in the formula. That's aliasing.

Depending on how you stand on type III sums of squares, you could call that a bug. Personally, I'd just not use them.

https://stat.ethz.ch/pipermail/r-help/2001-October/015984.html

Cheers
Joris
Re: [R] storing output data from a loop that has varying row numbers
There's something very illogical in your code. You have the whole time the same datafra

On Tue, Jun 1, 2010 at 1:51 PM, RCulloch ross.cull...@dur.ac.uk wrote:

Hi All,

I am trying to run a loop that will have varying numbers of rows with each output. Previously I have had the same number of rows, so I would use (and I appreciate that this will no doubt draw some gasps as being thoroughly inefficient!):

xdfrow<-(0)
xdfrow1<-(1:32)
xdfrow2<-(33:64)
xdfrow3<-(65:96)
xdfrow4<-(97:128)
xdfrow5<-(129:160)
xdfrow6<-(161:192)
xdfrow7<-(193:224)

and so on

xdf <- matrix(999, nrow=1024, ncol=7)
xdf <- as.data.frame(xdf)
NAM <- c("NAME","ID2","DAY","BEH", "B_FALSE", "B_TRUE","TOTAL")
colnames(xdf)<-NAM

I then use this matrix, run the loop and assign the data to each of the xdfrows, just doing +1 on each loop. (If that makes sense? Not really important, just trying to show that I do try to solve some of my own problems, albeit perhaps not in the best manner!)

However, the data I'm working with now has a very varied number of rows (0:2500) over a large data set and I can't work out how best to do this.

So my loop would be:

for (i in 1:33){
  SEL_DAY<-seal_dist[seal_dist[,10]==i,]
  print(paste("DAY", i, "of 33"))
  for (s in 1:11){
    SEL_HR<-SEL_DAY[SEL_DAY[,5]==s,]
    print(paste("HR", s, "of 11"))
    indx <- subset(SEL_HR, SEL_HR$DIST == 0)
    SEL_HR$TO_ID <- indx$ID[match(SEL_HR$TO, indx$TO)]}
}

where i is the day and s is the hour within the day. The loop works fine because it prints as I expect it to. I have not given any info on the data because I assume this is more of a method question and will be very straightforward to most people on here!? But I am happy to post data if it is needed.

I assume I need to set up a matrix before the loop, e.g.

DIST_LOOP<-matrix(NA,1000,ncol=11)

and then I should be able to put something before the first } that allows me to add to the matrix, but everything I have tried doesn't work, e.g.

DIST_LOOP[[i]]<-SEL_HR

Any help would be much appreciated,

Best wishes,
Ross

-- View this message in context: http://r.789695.n4.nabble.com/storing-output-data-from-a-loop-that-has-varying-row-numbers-tp2238396p2238396.html Sent from the R help mailing list archive at Nabble.com.
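A common pattern for loop output with a varying number of rows is to collect the pieces in a list and bind them together at the end. The sketch below uses a made-up data frame called toy whose column names only loosely mirror the thread; it shows the storage pattern and is not a fix for the matching logic.

```r
# Hypothetical toy data; the values are made up.
toy <- data.frame(
  DAY  = c(1, 1, 1, 2, 2),
  HR   = c(9, 9, 10, 9, 9),
  DIST = c(0, 2.5, 0, 0, 1.2)
)

pieces <- list()  # a list can hold data frames with different numbers of rows
for (i in unique(toy$DAY)) {
  sel_day <- toy[toy$DAY == i, ]
  for (s in unique(sel_day$HR)) {
    sel_hr <- sel_day[sel_day$HR == s, ]
    # Store each subset under a "day-hour" key.
    pieces[[paste(i, s, sep = "-")]] <- sel_hr
  }
}

# Bind all pieces back into a single data frame.
combined <- do.call(rbind, pieces)
print(sapply(pieces, nrow))
print(combined)
```

Unlike a pre-allocated matrix, the list does not need to know the number of rows of each piece in advance.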
Re: [R] storing output data from a loop that has varying row numbers
Could you just give us the output of dput() for the data you copied into the mail? E.g. dput(seal_dist[1:100,]), and an example of how you want your output to look. I guess I get what you want to do, but it's not what your code is doing. And it will be difficult to put that in a matrix, as you have different labels and different numbers of TO levels for different days and HR values.

Cheers

On Tue, Jun 1, 2010 at 3:32 PM, RCulloch ross.cull...@dur.ac.uk wrote:

Hi Ivan,

Thanks for your help. Your initial suggestion did not work, but that is no doubt down to my lack of making sense! Here is a short example of my dataset. Basically the loop is set up to match the ID with the TO column based on DIST = 0. So A1 = 2, A1.1 = 1, A2 = 4, A2.1 = 3. That is fine for HR 9, but for HR 10 the numbers no longer match those IDs, so I need to loop over the data and store each loop, if that makes sense.

   FROM TO     DIST ID      HR DD MM YY ANIMAL DAY
 1    1  1  2.63981 'A1'     9 30  9  7      1   1
 2    1  2      0.0 'A1'     9 30  9  7      1   1
 3    1  3  6.95836 'A1'     9 30  9  7      1   1
 4    1  4  8.63809 'A1'     9 30  9  7      1   1
 5    1  1      0.0 'A1.1'   9 30  9  7      7   1
 6    1  2  2.63981 'A1.1'   9 30  9  7      7   1
 7    1  3  8.03071 'A1.1'   9 30  9  7      7   1
 8    1  4  8.90896 'A1.1'   9 30  9  7      7   1
 9    1  1  8.90896 'A2'     9 30  9  7      1   1
10    1  2  8.63809 'A2'     9 30  9  7      1   1
11    1  3  2.85602 'A2'     9 30  9  7      1   1
12    1  4      0.0 'A2'     9 30  9  7      1   1
13    1  1  8.03071 'A2.1'   9 30  9  7      7   1
14    1  2  6.95836 'A2.1'   9 30  9  7      7   1
15    1  3      0.0 'A2.1'   9 30  9  7      7   1
16    1  4  2.85602 'A2.1'   9 30  9  7      7   1
17    1  1  3.53695 'A1'    10 30  9  7      1   1
18    1  2  4.32457 'A1'    10 30  9  7      1   1
19    1  3      0.0 'A1'    10 30  9  7      1   1
20    1  4  8.85851 'A1'    10 30  9  7      1   1
21    1  5 12.09194 'A1'    10 30  9  7      1   1
22    1  1  7.44743 'A1.1'  10 30  9  7      7   1
23    1  2      0.0 'A1.1'  10 30  9  7      7   1
24    1  3  4.32457 'A1.1'  10 30  9  7      7   1
25    1  4 13.16728 'A1.1'  10 30  9  7      7   1
26    1  5 16.34761 'A1.1'  10 30  9  7      7   1
27    1  1  6.13176 'A2'    10 30  9  7      1   1
28    1  2 13.16728 'A2'    10 30  9  7      1   1
29    1  3  8.85851 'A2'    10 30  9  7      1   1
30    1  4      0.0 'A2'    10 30  9  7      1   1
31    1  5  3.40726 'A2'    10 30  9  7      1   1
32    1  1  9.03345 'A2.1'  10 30  9  7      7   1
33    1  2 16.34761 'A2.1'  10 30  9  7      7   1
34    1  3 12.09194 'A2.1'  10 30  9  7      7   1
35    1  4  3.40726 'A2.1'  10 30  9  7      7   1
36    1  5      0.0 'A2.1'  10 30  9  7      7   1
37    1  1      0.0 'MALE1' 10 30  9  7     12   1
38    1  2  7.44743 'MALE1' 10 30  9  7     12   1
39    1  3  3.53695 'MALE1' 10 30  9  7     12   1
40    1  4  6.13176 'MALE1' 10 30  9  7     12   1
41    1  5  9.03345 'MALE1' 10 30  9  7     12   1

So the loop is:

DIST_LOOP<-matrix(NA,NA,ncol=11)
for (i in 1:33){
  SEL_DAY<-seal_dist[seal_dist[,10]==i,]
  SEL_DAY[i]=dist[i]
  print(paste("DAY", i, "of 33"))
  for (s in 1:11){
    SEL_HR<-SEL_DAY[SEL_DAY[,5]==s,]
    print(paste("HR", s, "of 11"))
    indx <- subset(SEL_HR, SEL_HR$DIST == 0)
    SEL_HR$TO_ID <- indx$ID[match(SEL_HR$TO, indx$TO)]
    DIST_LOOP[i,]<-SEL_HR
  }
}

But storing the data in the DIST_LOOP matrix doesn't work. I was just told in another post that a list might be better than a matrix?

I hope this makes more sense!?

Many thanks,
Ross

-- View this message in context: http://r.789695.n4.nabble.com/storing-output-data-from-a-loop-that-has-varying-row-numbers-tp2238396p2238483.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] storing output data from a loop that has varying row numbers
Is this what you're looking for?

seal_list <- split(seal_dist,sel)
out <- lapply(seal_list,function(x){
  indx <- subset(x, x$DIST == 0)
  x$TO_ID <- indx$ID[match(x$TO, indx$TO)]
  return(x)
})
output <- unsplit(out,sel)

Cheers
Joris
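The split / lapply / unsplit idea can be tried on a small made-up data set. The data frame below is hypothetical and much smaller than the one in the thread; it keeps only the columns the matching needs and refers to them by name rather than by position. Within each day-hour group, the row with DIST equal to 0 tells which ID belongs to which TO number.

```r
# Hypothetical miniature of seal_dist: two animals, two hours, one day.
seal_dist <- data.frame(
  TO   = c(1, 2, 1, 2, 1, 2, 1, 2),
  DIST = c(2.6, 0, 0, 2.6, 0, 4.3, 4.3, 0),
  ID   = c("A1", "A1", "A1.1", "A1.1", "A1", "A1", "A1.1", "A1.1"),
  HR   = c(9, 9, 9, 9, 10, 10, 10, 10),
  DAY  = 1,
  stringsAsFactors = FALSE
)

# One group per day-hour combination.
sel <- factor(paste(seal_dist$DAY, seal_dist$HR, sep = "-"))
seal_list <- split(seal_dist, sel)

out <- lapply(seal_list, function(x) {
  indx <- x[x$DIST == 0, ]                  # rows where an animal meets itself
  x$TO_ID <- indx$ID[match(x$TO, indx$TO)]  # translate the TO number into an ID
  x
})

# Put the groups back together in the original row order.
output <- unsplit(out, sel)
print(output)
```

In hour 9 the TO number 1 maps to A1.1 and 2 maps to A1, while in hour 10 the mapping is reversed, which is the situation described in the question.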
Re: [R] storing output data from a loop that has varying row numbers
Sorry, forgot to add the sel. This is the first line, then just run the rest.

    sel <- as.factor(paste(seal_dist[,10], "-", seal_dist[,5], sep=""))

cheers
Joris

On Tue, Jun 1, 2010 at 4:56 PM, Joris Meys jorism...@gmail.com wrote:

Is this what you're looking for?

    seal_list <- split(seal_dist, sel)
    out <- lapply(seal_list, function(x){
      indx <- subset(x, x$DIST == 0)
      x$TO_ID <- indx$ID[match(x$TO, indx$TO)]
      return(x)
    })
    output <- unsplit(out, sel)

Cheers
Joris
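For anyone who wants to try the split / lapply / unsplit idea from this thread without the full data set, here is a minimal sketch. The tiny `seal_dist` data frame below is made up for illustration (only the column names are borrowed from the thread), so treat it as an assumption rather than the poster's data.

```r
# Hypothetical toy data: two IDs observed in two hours of the same day.
seal_dist <- data.frame(
  TO   = c(1, 2, 1, 2, 1, 2, 1, 2),
  DIST = c(2.6, 0, 0, 2.6, 0, 4.3, 4.3, 0),
  ID   = c("A1", "A1", "A1.1", "A1.1", "A1", "A1", "A1.1", "A1.1"),
  HR   = c(9, 9, 9, 9, 10, 10, 10, 10),
  DAY  = 1,
  stringsAsFactors = FALSE
)

# One group per DAY-HR combination.
sel <- as.factor(paste(seal_dist$DAY, "-", seal_dist$HR, sep = ""))

# Within each group, the row with DIST == 0 says which ID a TO number belongs to.
seal_list <- split(seal_dist, sel)
out <- lapply(seal_list, function(x) {
  indx <- subset(x, x$DIST == 0)
  x$TO_ID <- indx$ID[match(x$TO, indx$TO)]
  x
})

# Put the pieces back in the original row order.
output <- unsplit(out, sel)
print(output)
```

Because each group is handled separately, groups with different numbers of rows are not a problem, which is what the matrix-based loop struggled with.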
Re: [R] Help on aggregate method
Take a look at ?split (and unsplit), e.g.:

    Dur <- rnorm(100)
    Attr1 <- rep(c("A","B"), each=50)
    Attr2 <- rep(c("A","B"), times=50)
    ap.dat <- data.frame(Attr1, Attr2, Dur)

    split.fact <- paste(ap.dat$Attr1, ap.dat$Attr2)
    ap.list <- split(ap.dat, split.fact)
    ap.mean <- lapply(ap.list, function(x){
      x$meanDur <- rep(mean(x$Dur), dim(x)[1])
      return(x)
    })
    ap.dat.fast <- unsplit(ap.mean, split.fact)

system.time on 1000 replicates gives:

    > system.time(replicate(1000,{
    + split.fact <- paste(ap.dat$Attr1,ap.dat$Attr2)
    + ap.list <- split(ap.dat,split.fact)
    + ap.mean <- lapply(ap.list,functi [TRUNCATED]
       user  system elapsed
       4.88    0.00    4.88

    > source(.trPaths[5], echo=TRUE, max.deparse.length=150)
    > system.time(replicate(1000,{
    + avgDur <- aggregate(ap.dat[["Dur"]], by = list(ap.dat[["Attr1"]],
    + ap.dat[["Attr2"]]), FUN="mean")
    + meanDur <- sapp [TRUNCATED]
       user  system elapsed
      58.00    0.11   58.13

So it should be roughly tenfold faster.

Cheers
Joris

On Tue, Jun 1, 2010 at 4:48 PM, Stella Pachidi stella.pach...@gmail.com wrote:

Dear R experts,

I would really appreciate it if you had an idea on how to use the aggregate method more efficiently. More specifically, I would like to calculate the mean of certain values in a data frame, grouped by various attributes, and then create a new column in the data frame that will have the corresponding mean for every row. I attach part of my code:

    matchMean <- function(ind, dataTable, aggrTable) {
      index <- which((aggrTable[,1]==dataTable[["Attr1"]][ind]) &
                     (aggrTable[,2]==dataTable[["Attr2"]][ind]))
      as.numeric(aggrTable[index,3])
    }
    avgDur <- aggregate(ap.dat[["Dur"]], by = list(ap.dat[["Attr1"]], ap.dat[["Attr2"]]), FUN="mean")
    meanDur <- sapply((1:length(ap.dat[,1])), FUN=matchMean, ap.dat, avgDur)
    ap.dat <- cbind(ap.dat, meanDur)

As I deal with a very large dataset, it takes a long time to run my matching function, so if you had an idea on how to automate this matching process further I would be really grateful.

Thank you very much in advance!
Kind regards,
Stella

--
Stella Pachidi
Master in Business Informatics student
Utrecht University
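As a side note to this thread: base R also has `ave()`, which returns a group-wise summary already aligned with the original rows. It is not what was proposed above, just an alternative sketch; the data are simulated the same way as in the reply, and the cross-check at the end uses the split / unsplit route from the thread.

```r
set.seed(1)
ap.dat <- data.frame(
  Attr1 = rep(c("A", "B"), each = 50),
  Attr2 = rep(c("A", "B"), times = 50),
  Dur   = rnorm(100)
)

# ave() computes FUN within each Attr1 x Attr2 group and repeats the
# result for every row of that group, so no matching step is needed.
ap.dat$meanDur <- ave(ap.dat$Dur, ap.dat$Attr1, ap.dat$Attr2, FUN = mean)

# Cross-check against the split / unsplit approach.
split.fact <- paste(ap.dat$Attr1, ap.dat$Attr2)
check <- unsplit(
  lapply(split(ap.dat$Dur, split.fact), function(d) rep(mean(d), length(d))),
  split.fact
)
print(all.equal(ap.dat$meanDur, check))
```

No timing claim is made here; whether this is faster than split / unsplit on a large data set would have to be measured.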
Re: [R] storing output data from a loop that has varying row numbers
OK, then I was right. It's exactly what my code does. Enjoy.

Cheers

On Tue, Jun 1, 2010 at 4:25 PM, RCulloch ross.cull...@dur.ac.uk wrote:

Hi Joris,

Thanks for your help! The data as requested:

    structure(list(FROM = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L), TO = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L,
    1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L,
    1L, 2L, 3L, 4L, 5L), DIST = c(2.63981, 0, 6.95836, 8.63809, 0, 2.63981,
    8.03071, 8.90896, 8.90896, 8.63809, 2.85602, 0, 8.03071, 6.95836, 0, 2.85602,
    3.53695, 4.32457, 0, 8.85851, 12.09194, 7.44743, 0, 4.32457, 13.16728,
    16.34761, 6.13176, 13.16728, 8.85851, 0, 3.40726, 9.03345, 16.34761,
    12.09194, 3.40726, 0, 0, 7.44743, 3.53695, 6.13176, 9.03345),
    ID = structure(c(12L, 12L, 12L, 12L, 11L, 11L, 11L, 11L, 14L, 14L, 14L, 14L,
    13L, 13L, 13L, 143L, 12L, 12L, 12L, 12L, 12L, 11L, 11L, 11L, 11L, 11L,
    14L, 14L, 14L, 14L, 14L, 13L, 13L, 13L, 13L, 13L, 94L, 94L, 94L, 94L, 94L),
    .Label = c("'11.1'", "'15.1'", "'15.5'", "'18.1'", "'24.2'", "'26.1'",
    "'26.2'", "'28.3'", "'4.2'", "'7.1'", "'A1.1'", "'A1'", "'A2.1'", "'A2'",
    "'B1'", "'C1'", "'D1.1'", "'D1'", "'D2.1'", "'D2'", "'D3.1'", "'D3'",
    "'D4.1'", "'D4'", "'D5.1'", "'D5'", "'D6.1'", "'D6'", "'E1.1'", "'E1'",
    "'E2.1'", "'E2'", "'E4'", "'E5'", "'F1.1'", "'F1'", "'F10.1'", "'F10'",
    "'F11'", "'F2'", "'F3'", "'F4.1'", "'F4'", "'F5.1'", "'F5'", "'F7'",
    "'F8.1'", "'F8'", "'G2.1'", "'G2'", "'G3.1'", "'G3'", "'G4.1'", "'G4'",
    "'G5.1'", "'G5'", "'H1.1'", "'H1'", "'H2'", "'H3.1'", "'H3'", "'H8'",
    "'I1.1'", "'I1'", "'I2'", "'I4.1'", "'I4'", "'J1.1'", "'J1'", "'J2.1'",
    "'J2'", "'J3'", "'J6'", "'J7'", "'JUV'", "'K1.1'", "'K1'", "'K2'", "'K3'",
    "'K4.1'", "'K4'", "'L1.1'", "'L1'", "'L2.1'", "'L2'", "'L4'", "'M1'",
    "'M2.1'", "'M2'", "'M3.1'", "'M3'", "'M4.1'", "'M4'", "'MALE1'", "'N1.1'",
    "'N1'", "'N2'", "'N3'", "'N4.1'", "'N4'", "'O1'", "'O2'", "'O3.1'", "'O3'",
    "'O4.1'", "'O4'", "'O5'", "'P1.1'", "'P1'", "'Q1'", "'Q2'", "'Q3'",
    "'R1.1'", "'R1'", "'R2'", "'R3.1'", "'R3'", "'R4.1'", "'R4'", "'R5.1'",
    "'R5'", "'S1.1'", "'S1'", "'S2.1'", "'S2'", "'S3.1'", "'S3'", "'S4.1'",
    "'S4'", "'T1'", "'U1.1'", "'U1'", "'U2'", "'U3'", "'UKFEM'", "'UKMAL'",
    "'UKPUP'", "'V1.1'", "'V1'", "'W1.1'", "'W1'", "'WR'", "A2.1'"),
    class = "factor"), HR = c(9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
    9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
    10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L),
    DD = c(30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L,
    30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L,
    30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L,
    30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L,
    30L), MM = c(9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
    9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
    9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
    9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
    9L), YY = c(7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
    7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
    7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
    7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
    7L), ANIMAL = c(1L, 1L, 1L, 1L, 7L, 7L, 7L, 7L, 1L, 1L, 1L, 1L, 7L, 7L, 7L,
    7L, 1L, 1L, 1L, 1L, 1L, 7L, 7L, 7L, 7L, 7L, 1L, 1L, 1L, 1L, 1L, 7L, 7L, 7L,
    7L, 7L, 12L, 12L, 12L, 12L, 12L), DAY = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L)), .Names = c("FROM", "TO", "DIST", "ID", "HR", "DD", "MM", "YY",
    "ANIMAL", "DAY"), row.names = c(NA, 41L), class = "data.frame")

The output should be as the original file is, but it should have an additional column for 'TO_ID'. I hope that makes sense?

Cheers,
Ross

--
View this message in context: http://r.789695.n4.nabble.com/storing-output-data-from-a-loop-that-has-varying-row-numbers-tp2238396p2238576.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] R-help spam detection; please help the moderators
Hi all,

I also couldn't help but notice that some of my messages are bounced for the following reason:

    The message headers matched a filter rule

I included the header of one of the messages below, but neither of these messages was sent through Nabble, nor does any mail address have digits in it. I also never had that before. Did you change some of the rules somehow?

Cheers
Joris

---
MIME-Version: 1.0
Received: by 10.140.173.9 with HTTP; Fri, 28 May 2010 05:32:32 -0700 (PDT)
In-Reply-To: aanlktim9etuy2efynloh2lyn7m133ytjencdjpkgp...@mail.gmail.com
References: aanlktikgc7v2zbsyrwcwbueezm8d24qj0vqeb2z1n...@mail.gmail.com aanlktim9etuy2efynloh2lyn7m133ytjencdjpkgp...@mail.gmail.com
Date: Fri, 28 May 2010 14:32:32 +0200
Delivered-To: jorism...@gmail.com
Message-ID: aanlktimg4idyivhe1ek9mk6_rybjcnuu4msvwrvts...@mail.gmail.com
Subject: Re: [R] How to get values out of a string using regular expressions?
From: Joris Meys jorism...@gmail.com
To: Gabor Grothendieck ggrothendi...@gmail.com
Cc: R mailing list r-help@r-project.org
Content-Type: multipart/alternative; boundary=000e0cd2295481515c0487a6b3be

--000e0cd2295481515c0487a6b3be
Content-Type: text/plain; charset=ISO-8859-1

On Tue, Jun 1, 2010 at 3:25 PM, Martin Maechler maech...@stat.math.ethz.ch wrote:

Dear readers of R-help,

as most of you will *not* be aware, R-help has continued to work the way it does only thanks to a dozen volunteers, see https://stat.ethz.ch/mailman/listinfo/r-help . The volunteers manually moderate e-mails that look like spam (and sometimes are and sometimes are not). While much more than 90% of the spam is filtered out long before a human sees it, with the increasing sophistication of spammers, manual intervention has been deemed necessary and has served the community very well.

OTOH, in recent weeks, the amount of work for the volunteers has increased, mainly because an increasing number of non-spam postings are erroneously tagged as possibly spam.

We have discussed this and done some analysis, and found that most of the messages that produce a considerable amount of extra work share two properties:

1) they are posted via Nabble {which *always* attaches a small pro-Nabble spam at the end of the message}
2) the e-mail address of the sender is from a freemail provider, quite often 'at gmail dot com', and often the part *before* the '@' (at-sign) ends with digits.

We hereby ask those among you who use a freemail account to please no longer post via Nabble.

Thank you for your support of R-help, *the* community mailing list of the R project since even before that project existed formally, namely since 1997-04-01, today 13 years and two months.

Martin Maechler, ETH Zurich (and R-help creator and principal manager)
Re: [R] as.date
Change this line to:

    pose$CREATED.DATE <- as.Date(pose$CREATED.DATE, "%d/%m/%Y")  # mind the capital Y

    > pose
             DESCRIPTION CREATED.DATE QUANITY CLOSING.PRICE
    1 COTTON NO.2 Jul/10   2010-05-13       1       81.2000
    2 COTTON NO.2 Jul/10   2010-05-13       1       81.2000
    3   PALLADIUM Jun/10   2010-05-14      -1      503.6000
    4   PALLADIUM Jun/10   2010-05-14      -1      503.6000
    5 SUGAR NO.11 Jul/10   2010-05-10       1       13.8900
    6 SUGAR NO.11 Jul/10   2010-05-10       1       13.8900

Cheers
Joris

On Tue, Jun 1, 2010 at 5:57 PM, arnaud Gaboury arnaud.gabo...@gmail.com wrote:

Dear group,

Here is my df (obtained with a read.csv2()):

    df <- structure(list(DESCRIPTION = c("COTTON NO.2 Jul/10", "COTTON NO.2 Jul/10",
    "PALLADIUM Jun/10", "PALLADIUM Jun/10", "SUGAR NO.11 Jul/10", "SUGAR NO.11 Jul/10"),
    CREATED.DATE = c("13/05/2010", "13/05/2010", "14/05/2010", "14/05/2010",
    "10/05/2010", "10/05/2010"), QUANITY = c(1, 1, -1, -1, 1, 1),
    CLOSING.PRICE = c("81.2000", "81.2000", "503.6000", "503.6000", "13.8900",
    "13.8900")), .Names = c("DESCRIPTION", "CREATED.DATE", "QUANITY",
    "CLOSING.PRICE"), row.names = c(NA, 6L), class = "data.frame")

    > str(df)
    'data.frame': 6 obs. of 4 variables:
     $ DESCRIPTION  : chr "COTTON NO.2 Jul/10" "COTTON NO.2 Jul/10" "PALLADIUM Jun/10" "PALLADIUM Jun/10" ...
     $ CREATED.DATE : chr "13/05/2010" "13/05/2010" "14/05/2010" "14/05/2010" ...
     $ QUANITY      : num 1 1 -1 -1 1 1
     $ CLOSING.PRICE: chr "81.2000" "81.2000" "503.6000" "503.6000" ...

I want to change the class of df$CREATED.DATE from chr to Date:

    pose$CREATED.DATE=as.Date(pose$CREATED.DATE,"%d/%m/%y")

Here is what I get:

    df <- structure(list(DESCRIPTION = c("COTTON NO.2 Jul/10", "COTTON NO.2 Jul/10",
    "PALLADIUM Jun/10", "PALLADIUM Jun/10", "SUGAR NO.11 Jul/10", "SUGAR NO.11 Jul/10"),
    CREATED.DATE = structure(c(18395, 18395, 18396, 18396, 18392, 18392),
    class = "Date"), QUANITY = c(1, 1, -1, -1, 1, 1),
    CLOSING.PRICE = c("81.2000", "81.2000", "503.6000", "503.6000", "13.8900",
    "13.8900")), .Names = c("DESCRIPTION", "CREATED.DATE", "QUANITY",
    "CLOSING.PRICE"), row.names = c(NA, 6L), class = "data.frame")

Where does the problem come from?? Maybe from my system date??
TY for any help
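To make the capital-Y point concrete, here is a small sketch using the date strings from the question. `%y` is the two-digit year, so on "13/05/2010" it reads only "20" and lands in 2020, which is where day numbers such as 18395 come from; `%Y` reads the full four-digit year.

```r
x <- c("13/05/2010", "14/05/2010", "10/05/2010")

# %y = two-digit year: only "20" is consumed, giving the year 2020.
wrong <- as.Date(x, format = "%d/%m/%y")

# %Y = four-digit year: the whole "2010" is consumed.
right <- as.Date(x, format = "%d/%m/%Y")

print(wrong)
print(as.numeric(wrong))  # days since 1970-01-01
print(right)
```

So the system date has nothing to do with it; only the format string matters.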
Re: [R] Issue with assigning text to matrix
Hi Jessica,

this tells me that your text is saved as a factor. Try:

    names <- read.csv(file="Names.csv", stringsAsFactors=F)

Cheers
Joris

On Tue, Jun 1, 2010 at 11:04 AM, Jessica Queree j.j.que...@googlemail.com wrote:

My issue relates to adding text to a matrix and finding that the text is converted to a number. This is the section of code I'm having trouble with:

    # First, I load in a list of names from a .csv file to 'names'
    names <- read.csv(file("Names.csv"))

    # Then I define a matrix which will be populated with various test
    # statistics, with several rows for each entry in names
    testOutput <- matrix(nrow = 200, ncol = 5)
    for (i in 1:nrow(names)){
      testOutput[i,1] <- names[i,1]
      testOutput[i,2] <- names[i,2]
      # test statistics code here
    }

If I look at names[,1], I get the following:

    > names[,1]
     [1] EQ_Level_UK       EQ_Level_EUR      EQ_Level_US       EQ_Level_Far East
     [5] IR_PC 1_UK        IR_PC 2_UK        IR_PC 3_UK        Swap_PC 1_UK
     [9] Swap_PC 2_UK      Swap_PC 3_UK      FX_Level_EUR      FX_Level_US
    [13] FX_Level_Far East Infl_PC 1_UK      Infl_PC 2_UK      Infl_PC 3_UK
    [17] Prop_Level_UK     CreditAAA_PC 1_UK CreditAAA_PC 2_UK CreditAAA_PC 3_UK
    [21] CreditAA_PC 1_UK  CreditAA_PC 2_UK  CreditAA_PC 3_UK  CreditA_PC 1_UK
    [25] CreditA_PC 2_UK   CreditA_PC 3_UK   CreditBBB_PC 1_UK CreditBBB_PC 2_UK
    [29] CreditBBB_PC 3_UK
    29 Levels: CreditA_PC 1_UK CreditA_PC 2_UK CreditA_PC 3_UK ... Swap_PC 3_UK

But if I look at testOutput[,1], I get:

    > testOutput[,1]
      [1] 15 13 16 14 23 24 25 27 28 29 17 19 18 20 21
     [16] 22 26  7  8  9  4  5  6  1  2  3 10 11 12 17
     [31] NA NA 19 18 NA NA NA 20 NA NA 21 NA NA 22 NA
     [46] NA 26 NA NA  7 NA NA  8 NA NA  9 NA NA  4 NA
     [61] NA  5 NA NA  6 NA NA  1 NA NA  2 NA NA  3 NA
     [76] NA 10 NA NA 11 NA NA 12 NA NA NA NA NA NA NA
     [91] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
    [106] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
    [121] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
    [136] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
    [151] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
    [166] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
    [181] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
    [196] NA NA NA NA NA

That is, the names are now converted to numbers. I think this might have something to do with the way I've defined the testOutput matrix, but haven't been able to find any information about how to fix it. Can anyone help?

Many thanks.
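The factor explanation above can be reproduced in a few lines. This is only a sketch with a made-up three-row data frame (the names are hypothetical, not the poster's file); `stringsAsFactors = TRUE` is set explicitly because newer R versions no longer convert strings to factors by default.

```r
# Hypothetical stand-in for the CSV contents.
names_df <- data.frame(name = c("b_level", "a_level", "c_level"),
                       stringsAsFactors = TRUE)

# Copying factor elements into a matrix stores the integer level codes.
bad <- matrix(nrow = 3, ncol = 1)
for (i in 1:nrow(names_df)) {
  bad[i, 1] <- names_df[i, 1]
}

# Converting to character first keeps the text.
good <- matrix(nrow = 3, ncol = 1)
for (i in 1:nrow(names_df)) {
  good[i, 1] <- as.character(names_df[i, 1])
}

print(bad)   # level codes, in alphabetical order of the labels
print(good)  # the original strings
```

Note that once one cell holds text, the whole matrix becomes character, so numeric test statistics stored in the same matrix would be converted to strings as well; a data frame may be the more convenient container in that case.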
Re: [R] any doc to understand arima state space model?
Type "Arima R" into Google. Read the first hit, the third, the fifth, and any other that says "tutorial".

Cheers
Joris

On Tue, Jun 1, 2010 at 4:14 PM, shakira M m.shak...@gmail.com wrote:

I am trying to understand the R arima function. Any pointers would be appreciated.

Thank you,
Shakira.
Re: [R] BreastCancer Dataset for Classification in kknn
Hi Nitin,

It can be solved by splitting your data a bit differently. You need more training data than you have evaluation data, e.g.:

    i1 = 1:400
    i2 = 401:d

Then it works on my computer. No clue as to where the error originates from though.

Cheers
Joris

On Tue, Jun 1, 2010 at 4:27 PM, Nitin niti...@gmail.com wrote:

Dear All,

I'm getting an error while trying to apply the BreastCancer dataset (package=mlbench) to kknn (package=kknn) that I don't understand, as I'm new to R. The code is as follows:

    rm = (list = ls())
    library(mlbench)
    data(BreastCancer)
    library(kknn)

    BCancer = na.omit(BreastCancer)
    d = dim(BCancer)[1]
    i1 = seq(1, d, 2)
    i2 = seq(2, d, 2)
    t1 = BCancer[i1, ]
    t2 = BCancer[i2, ]
    y2 = BCancer[i2, 11]

    x = 10
    k = array(1:x, dim = c(x,1))
    ker = array(c("rectangular", "triangular", "epanechnikov", "biweight",
                  "triweight", "cos", "inv", "gaussian"), dim = c(8,1))

    f = function(x, ker){
      BreastCancer.kknn <- kknn(Class~., train = t1, test = t2, k = x,
                                kernel = ker, distance = 1)
      fit = fitted(BreastCancer.kknn)
      z <- (fit==y2)
      z.e <- (100 - (length(y2)-length(z[!z]))/length(y2)*100 )
    }

    err.k = function(ker){
      error.BreastCancer = apply(k,1,function(y) f(y, ker))
    }

    err.ker = apply(ker, 1, err.k)
    colnames(err.ker) = c("rectangular", "triangular", "epanechnikov", "biweight",
                          "triweight", "cos", "inv", "gaussian")
    print(err.ker)

It throws an error:

    Error in as.matrix(learn[, ind == i]) :
      (subscript) logical subscript too long
    In addition: Warning messages:
    1: In model.matrix.default(mt, mf) : variable 'Id' converted to a factor
    2: In model.matrix.default(mt, test) : variable 'Id' converted to a factor

I tried the code with other datasets in the mlbench package and most of them work. What is the mistake here for this particular dataset and how can I solve it?
Thanks
Nitin
Re: [R] regexpr help (match.length=0)
Dear all,

It sounds as if regexpr works according to the same rules as Perl, very nicely explained in: http://blob.perl.org/books/beginning-perl/3145_Chap05.pdf

Yet, I couldn't help but wonder if there are also differences in behaviour. I couldn't find any yet, but there must be some. Anybody care to elaborate on this?

Cheers
Joris

On Wed, Jun 2, 2010 at 1:05 AM, Matt Shotwell shotw...@musc.edu wrote:

On Tue, 2010-06-01 at 16:43 -0400, Erik Iverson wrote:

McGehee, Robert wrote:

R-help,

Sorry if this is more of a regex question than an R question. However, help would be appreciated on my use of the regexpr function. In the first example below, I ask for all characters (a-z) in 'abc123'; regexpr returns a 3-character match beginning at the first character.

    > regexpr("[[:alpha:]]*", "abc123")
    [1] 1
    attr(,"match.length")
    [1] 3

However, when the text is flipped and I ask for a match of all characters in '123abc', regexpr returns a zero-character match beginning at the first character. Can someone explain what a zero length match means (i.e. why not return -1), and why the result isn't 4, match.length=3?

It means it matches 0 characters, which is fine since you use *, which means match 0 or more occurrences of the regex. It sounds like you want + instead of *. Also see gregexpr.

Also, regular expressions try to match as early as possible. That's why the match is at position one of length zero, and not at position four of length three.

Matt Shotwell
Graduate Student
Division of Biostatistics and Epidemiology
Medical University of South Carolina

    > regexpr("[[:alpha:]]*", "123abc")
    [1] 1
    attr(,"match.length")
    [1] 0
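The `*` versus `+` point from the quoted replies can be checked directly. A small sketch using the two strings from the thread:

```r
# '*' allows zero repetitions, so an empty match at position 1 is valid
# and is found first, because matching starts as early as possible.
star <- regexpr("[[:alpha:]]*", "123abc")

# '+' requires at least one letter, so the first possible match starts at 4.
plus <- regexpr("[[:alpha:]]+", "123abc")

print(c(start = as.integer(star), length = attr(star, "match.length")))
print(c(start = as.integer(plus), length = attr(plus, "match.length")))

# regmatches() extracts the matched text itself.
print(regmatches("123abc", plus))
```

A result of -1 is reserved for "no match at all", which cannot happen with `*` since the empty string always matches.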
Re: [R] how to label the som notes by the majority vote
Dear Changbin,

Please provide a self-contained, minimal example, meaning the whole code should run and create the plot as it is now, without having to load your dataset (which we don't have). Otherwise it's impossible to see what's going on and help you.

Cheers
Joris

On Wed, Jun 2, 2010 at 2:21 AM, Changbin Du changb...@gmail.com wrote:

Hi, dear R community,

I am using the following code to do the SOM. I tried to label the notes by the majority vote, either through mapping or prediction. I attached my output: the left one doesn't have any labels in the note, the right one has more than one label in each note. I need to have only one label for each note, either by majority vote or prediction. Can anyone give some suggestions or advice? Thanks so much!

    alex <- read.table("/home/cdu/operon/alex2.txt", , sep="\t", skip=0, header=T, fill=T)
    alex1 <- alex[,c(1:257)]
    levels(alex1$Label)
    alex1$outcome <- as.numeric(alex1$Label)
    alex1$outcome[1:20]

    #self-organizing maps(unsupervised learning)
    library(kohonen)

    #SOM, the supervised learning, train the map using outcome as the class variable.
    set.seed(13)
    final.xyf <- xyf(data=as.matrix(alex1[,c(1:256)]), Y=classvec2classmat(alex1$outcome),
                     xweight = 0.99, grid=somgrid(20, 30, "hexagonal"))
    outcome.xyf <- predict(final.xyf)$unit.prediction              #get prediction
    outcome.predict <- as.numeric(classmat2classvec(outcome.xyf))  #change matrix to vectors.
    outcome.label <- LETTERS[outcome.predict]                      #convert the numeric value to letters.

    plot(final.xyf, type="property", property=outcome.predict, labels=outcome.label,
         palette.name=rainbow, main="Prediction")

    cl <- colors()
    bgcols <- cl[2:14]
    plot(final.xyf, type="mapping", labels=outcome.label, col="black",
         bgcol=bgcols[as.integer(outcome.predict)], main="Mapping plot")

--
Sincerely,
Changbin
Re: [R] Linear Discriminant Analysis in R
    , 0.052999774, 0.513440813, 0.402895033, 0.201576687, 0.076826481),
    V7 = c(0.642136394, 0.099776129, 0.148801865, 0.603051825, 0.440594157,
    0.215038249, 0.531623479, 0.534920743, 0.45784502, 0.080887221),
    V8 = c(0.016004048, 0.519115043, 0.149317949, 0.088362708, 0.705002368,
    0.185590863, 0.434963787, 0.847410734, 0.78777694, 0.443995646, 0.53903599),
    V9 = c(0.400620271, 0.918472003, 0.446820588, 0.310981412, 0.734013866,
    0.172112916 ),
    V10 = c(0.532136091, 0.350028839, 0.40424688, 0.607395545, 0.392450857,
    0.306530929, 0.756277707, 0.63606622, 0.718866192, 0.258778101)),
    .Names = c("V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10"),
    class = "data.frame", row.names = c(NA, -671L))

Thank you once more for your help. I really cannot say it enough.

PS. The original files I work with are attached.

Cobbler.

http://r.789695.n4.nabble.com/file/n2236083/3dMaskDump.txt 3dMaskDump.txt
http://r.789695.n4.nabble.com/file/n2236083/vowel_features.txt vowel_features.txt

--
View this message in context: http://r.789695.n4.nabble.com/Linear-Discriminant-Analysis-in-R-tp2231922p2236083.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] about heatmap
Hi,

Take a look at the heatmap.2 function in the gplots package, and at brewer.pal in the RColorBrewer package. With this combination you have far more flexibility with the colors and the output, plus you get a color-coded legend.

There used to be a bug in that function that distorted the legend when breaks with unequal intervals were used, but I've adapted the function myself to work in that case as well. If you need it, feel free to contact me.

Cheers
Joris

On Mon, May 31, 2010 at 9:54 AM, lm_meng...@163.com wrote:

Hi all:

As to the heatmap function, the default style is red and yellow, where red refers to a low level and yellow refers to a high level. How can I change the style to the contrary: red refers to a high level and yellow refers to a low level?

Thanks a lot!
My best
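For the narrow question of flipping the default colors, base R may already be enough. The sketch below does not use heatmap.2 or RColorBrewer; it assumes the default palette of `heatmap()` is `heat.colors(12)` and simply reverses it. The matrix is random illustration data.

```r
set.seed(42)
m <- matrix(rnorm(100), nrow = 10)

default_cols  <- heat.colors(12)   # runs from red (low) towards yellow/white (high)
reversed_cols <- rev(default_cols) # now red marks the high values

pdf(NULL)                          # draw without writing a file (for scripts)
heatmap(m, col = reversed_cols)
invisible(dev.off())
```

The same `col` argument accepts any vector of colors, for example a palette built with brewer.pal as suggested in the reply.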
Re: [R] What does LOESS stand for?
This is the paper on which the loess algorithm is based: http://www.econ.pdx.edu/faculty/KPL/readings/cleveland88.pdf

The origin of the term LOESS is explained on page 597.

Cheers
Joris

On Mon, May 31, 2010 at 11:33 AM, Peter Neuhaus pneuh...@pneuhaus.de wrote:
> Dear R-community,
> maybe someone can help me with this: I've been using the loess() smoother for quite a while now, and for the matter of documentation I'd like to resolve the acronym LOESS. Unfortunately there's no explanation in the help file, and I didn't get anything convincing from Google either. I know that the predecessor LOWESS stands for Locally Weighted Scatterplot Smoothing. But what does LOESS stand for, specifically? Locally Weighted Exponential Scatterplot Smoothing? As far as I understand, LOESS is still a local polynomial regression, so that would probably make no sense.
> Any help appreciated! Thanks in advance,
> Peter
Re: [R] missing values in autocorelation
Could you specify the problem and give a minimal example that represents your data structure and reproduces the error? See also the posting guide: http://www.R-project.org/posting-guide.html

Cheers
Joris

On Mon, May 31, 2010 at 1:12 PM, nuncio m nunci...@gmail.com wrote:
> Hi all,
> I am trying to find the autocorrelation of some time series. I have, say, 100 files; some files contain only missing values (-99.99, say). I don't want to exclude these files, as they represent points in a grid. But when the acf command is issued I get an error:
>
>     Error in plot.window(...) : need finite 'ylim' values
>     In addition: Warning messages:
>     1: In min(x) : no non-missing arguments to min; returning Inf
>     2: In max(x) : no non-missing arguments to max; returning -Inf
>
> Is this because all the values in the time series are the same? If so, how can I specify a bad value when the acf command is issued? Also, is it possible to return a flag (like -999) of length the maximum lag for the acf of bad grid points, so that I can keep the number of files the same for input and output?
> Thanks
> nuncio
> -- Nuncio.M, Research Scientist, National Center for Antarctic and Ocean research, Head land Sada, Vasco da Gamma, Goa-403804
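Without the poster's data this can only be a sketch, but one way to get the behaviour asked for might look like the following. The helper name `safe_acf`, the missing-value code -99.99 and the flag -999 are assumptions taken from the wording of the question, not from any real file.

```r
# Hypothetical helper: returns the autocorrelations for lags 0..lag.max,
# or a vector of flags when the series has no usable variation.
safe_acf <- function(x, lag.max = 10, missing = -99.99, flag = -999) {
  x[!is.na(x) & x == missing] <- NA           # recode the missing-value marker
  ok <- x[!is.na(x)]
  if (length(ok) < 2 || var(ok) == 0) {
    return(rep(flag, lag.max + 1))            # all missing or constant series
  }
  # plot = FALSE avoids the plotting step that raised the 'ylim' error
  as.vector(acf(x, lag.max = lag.max, plot = FALSE, na.action = na.pass)$acf)
}

bad  <- safe_acf(rep(-99.99, 50), lag.max = 5)
good <- safe_acf(sin(1:50), lag.max = 5)
```

Every file then yields a vector of the same length, so the number of inputs and outputs stays equal.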
Re: [R] How to delete the previously saved workspace restored
If you start R, type:

    unlink(".RData")

This deletes the workspace file.

Cheers
Joris

On Mon, May 31, 2010 at 11:10 AM, Yanwei Tan t...@nbio.uni-heidelberg.de wrote:
> Dear all,
> I am a new user of R and I have a question about removing the previously restored workspace. I saved the workspace last time, and now R always loads it automatically when I open it. I tried to remove the objects and then close R without saving, but the next time I open R it still loads the previous workspace. I want to delete the .RData file, but I have no clue where the .RData directory is. The message is
>
>     Workspace restored from /Users/wei/.RData
>
> How can I get rid of this? Because there is a dot in front, I do not know where I can find this file. I also already tried this command:
>
>     rm(list=ls())
>
> But R still loads the previous workspace.
> With many thanks for any advice!!
> Best,
> Wei
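Note that `unlink(".RData")` only finds the file when R's working directory is the one that holds it. A small sketch that makes the directory explicit; the helper name `remove_workspace` is made up for illustration, and the demonstration runs in a temporary directory so that no real workspace is touched.

```r
# Hypothetical helper: delete the .RData file in a given directory
# and report whether the file is gone afterwards.
remove_workspace <- function(dir = "~") {
  ws <- file.path(path.expand(dir), ".RData")
  if (file.exists(ws)) unlink(ws)
  !file.exists(ws)
}

# Demonstration on a throwaway directory
demo_dir <- tempfile("ws_demo")
dir.create(demo_dir)
file.create(file.path(demo_dir, ".RData"))
removed <- remove_workspace(demo_dir)
```

For the message quoted above, `remove_workspace("/Users/wei")` would be the corresponding call. Starting R with `R --no-restore` skips loading a saved workspace without deleting it.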
Re: [R] Problems with apply
Ivan is partly right. However, the Details section of ?apply also says:

    If X is not an array but has a dimension attribute, apply attempts to coerce it to an array via as.matrix if it is two-dimensional (e.g., data frames) or via as.array.

The main problem is that what goes into the PromP function is not a data frame, not even a matrix, but a vector. You can easily see where it goes wrong if you place print(str(HistRio)) as the first line of your function. You'll also see that (hopefully) it's a named vector, meaning you could try to rewrite your function like this:

    if(length(which(AnaQuim$SecSte == HistRio["SecSte"])) > 0){ xx[1] <- 1 }
    etc...

I didn't test it, but it should work.

Cheers
Joris

On Mon, May 31, 2010 at 5:16 PM, Luis Felipe Parra felipe.pa...@quantil.com.co wrote:
> Hello, I am trying to use the apply functions with two data frames I've got and I am getting the following error message:
>
>     Error en HistRio$SecSte : $ operator is invalid for atomic vectors
>
> I don't understand why. When I use apply I am doing:
>
>     PromP <- function(HistRio, AnaQuim){
>       xx <- c(0,0,0)
>       if(length(which(AnaQuim$SecSte == HistRio$SecSte)) > 0){ xx[1] <- 1 }
>       if(length(which(as.Date(AnaQuim$AÑO1) <= as.Date(HistRio$FinCorte))) > 0){ xx[2] <- 1 }
>       if(length(which(as.Date(AnaQuim$AÑO1) >= as.Date(HistRio$FechaSiembra))) > 0){ xx[3] <- 1 }
>       if(length(which(as.Date(AnaQuim$AÑO1) >= as.Date(HistRio$FechaSiembra))) > 0 &&
>          length(which(as.Date(AnaQuim$AÑO1) <= as.Date(HistRio$FinCorte))) > 0){ xx[4] <- 2 }
>       return(xx)
>     }
>     zz <- apply(HistRio, 1, PromP, AnaQuim)
>
> and if I do exactly the same with a for loop
>
>     xx <- matrix(0, nrow(HistRio), 4)
>     for(i in 1:nrow(HistRio)){
>       if(length(which(AnaQuim$SecSte == HistRio$SecSte[i])) > 0){ xx[1] <- 1 }
>       if(length(which(as.Date(AnaQuim$AÑO1) <= as.Date(HistRio$FinCorte[i]))) > 0){ xx[2] <- 1 }
>       if(length(which(as.Date(AnaQuim$AÑO1) >= as.Date(HistRio$FechaSiembra[i]))) > 0){ xx[3] <- 1 }
>       if(length(which(as.Date(AnaQuim$AÑO1) >= as.Date(HistRio$FechaSiembra[i]))) > 0 &&
>          length(which(as.Date(AnaQuim$AÑO1) <= as.Date(HistRio$FinCorte[i]))) > 0){ xx[4] <- 2 }
>     }
>
> I get no error message. Attached is the data I am using. Any idea why this is happening?
> Thank you
> Felipe Parra
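The difference between the two approaches can be seen on a toy example. The two-row `HistRio` below is invented for illustration and only borrows the column name `SecSte` from the thread.

```r
HistRio <- data.frame(SecSte = c("A1", "B2"), Area = c(10, 20),
                      stringsAsFactors = FALSE)

# apply() first coerces the data frame with as.matrix(), so each row
# reaches the function as a named character vector, not a data frame.
row_class <- apply(HistRio, 1, function(r) class(r))

# Indexing by name with [ ] works on such a vector, whereas $ does not.
by_name <- apply(HistRio, 1, function(r) r["SecSte"])

# Looping over row indices leaves the data frame, and its column types, intact.
by_index <- vapply(seq_len(nrow(HistRio)),
                   function(i) HistRio$SecSte[i], character(1))
```

Because of the coercion to character, dates and numbers inside an `apply()` call over rows have to be converted back explicitly, which is one reason the index-based loop can be the simpler choice here.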
Re: [R] Linear Discriminant Analysis in R
It's not your questions, Cobbler, but could you PLEASE just do what we asked for? Copy-paste the following into R and copy-paste ALL the output you get into your next mail.

    test.vowel <- vowel_features[,1:10]
    test.mask <- mask_features[,1:10]
    dput(test.vowel)
    dput(test.mask)

I don't know whether your vowel_features is a list or a data frame (which is technically also a list). But I know for sure that vowel_features[15] is NOT giving you a column vector. It probably has to be vowel_features[,15]. So start with that one, and I'll take a look at the rest to get your lda running.

Cheers
Joris

On Sat, May 29, 2010 at 6:53 PM, cobbler_squad la.f...@gmail.com wrote:
> Thanks for being patient with me.
> I guess my problem is with understanding how grouping is used in this particular case. One of the sample codes I found online (http://www.statmethods.net/advstats/discriminant.html) is:
>
>     library(MASS)
>     fit <- lda(G ~ x1 + x2 + x3, data=mydata, na.action="na.omit", CV=TRUE)
>
> The mydata file in my case is the 3dmaskdump file with 52 columns and 671 rows (all values range between 0 and 1 after they're scaled). The other file, which I assumed was the grouping file (the vowel_feature file), defines features for the vowels: column 1 of the file is the vowel name (a, i, u) and every other column is a distinct combination of 0's and 1's defining the vowel, so this file has 26 columns and 254 rows. Every column that follows therefore represents a particular feature of that vowel (hope this makes sense!!).
> So the reason I wanted to use G <- vowel_feature[15] in my previous post is that I need to extract the column that represents backness of the vowel (while other columns represent roundedness, nasalization features, etc). So what (in my mind) G <- vowel_feature[15] would return is one column which is 254 rows long with 0's and 1's in it, i.e.
>
>       1 0
>       2 1
>       3 1
>       4 0
>     ...
>     254 1
>
> I am a novice with R (so I know my questions are pretty dumb!), but I really hope I clarified my confusion a bit better. I very much appreciate your help. Looking forward to your replies.
> Thank you again,
> Cobbler
> -- View this message in context: http://r.789695.n4.nabble.com/Linear-Discriminant-Analysis-in-R-tp2231922p2235777.html
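The indexing point can be checked on a small stand-in. The three-row `vowel_features` below is made up; only the idea of a 0/1 feature column comes from the thread.

```r
vowel_features <- data.frame(V1 = c("E", "o", "I"), V15 = c(1, 0, 1))  # stand-in data

one_col_df <- vowel_features["V15"]     # list-style indexing: still a data frame
as_vector  <- vowel_features[, "V15"]   # matrix-style indexing: a plain vector
also_vec   <- vowel_features[["V15"]]   # list extraction: the same plain vector

str(one_col_df)
str(as_vector)

# A grouping variable for lda() is usually supplied as a factor
G <- factor(as_vector)
```

The "invalid type (list)" error reported elsewhere in this thread is what a model formula produces when it is handed an object like `one_col_df` instead of a vector or factor.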
Re: [R] difference in sort order linux/Windows (R.2.11.0)
Pretty obvious: you use different locales (collation). What happens if you use the same one on both machines?

Cheers
Joris

On Fri, May 28, 2010 at 10:17 AM, carslaw david.cars...@kcl.ac.uk wrote:
> Dear R users,
> I'm a bit perplexed with the effect sort has here, as it is different on ... the linux order is perhaps more intuitive. However, the problem is the order is inconsistent between the two systems. Any suggestions?
>
>     sessionInfo()
>     R version 2.11.0 (2010-04-22)
>     x86_64-pc-linux-gnu
>     locale:
>      [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
>      [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
>      [5] LC_MONETARY=en_GB.utf8    LC_MESSAGES=en_GB.utf8
>      [7] LC_PAPER=en_GB.utf8       LC_NAME=en_GB.utf8
>      [9] LC_ADDRESS=en_GB.utf8     LC_TELEPHONE=en_GB.utf8
>     [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=en_GB.utf8
>     ...
>
>     sessionInfo()
>     R version 2.11.0 (2010-04-22)
>     x86_64-pc-mingw32
>     locale:
>     [1] LC_COLLATE=English_United Kingdom.1252
>     [2] LC_CTYPE=English_United Kingdom.1252
>     [3] LC_MONETARY=English_United Kingdom.1252
>     [4] LC_NUMERIC=C
>     [5] LC_TIME=English_United Kingdom.1252
>     ...
>
> Dr David Carslaw, King's College London, Environmental Research Group, Franklin Wilkins Building, 150 Stamford Street, London SE1 9NH
> -- View this message in context: http://r.789695.n4.nabble.com/difference-in-sort-order-linux-Windows-R-2-11-0-tp2234251p2234251.html
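One way to get the same order on both machines is to sort under the "C" collation, which should be available on every platform. A small sketch with an invented character vector:

```r
x <- c("b", "A", "a", "B")

# Remember the current collation, switch to "C", sort, and switch back.
old <- Sys.getlocale("LC_COLLATE")
Sys.setlocale("LC_COLLATE", "C")
sorted_c <- sort(x)          # code-point order: upper case before lower case
Sys.setlocale("LC_COLLATE", old)

# In current versions of R the radix method always sorts in the C locale
sorted_radix <- sort(x, method = "radix")
```

The `method = "radix"` argument did not exist in R 2.11.0, the version used in the thread, so there the `Sys.setlocale()` route is the one that applies.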
Re: [R] clustering in R
As Tal said. Next to that, I read that column 1 (and column 2?) are supposed to be seen as factors, not as numerical variables. Did you take that into account somehow?

It's easy to reproduce the error:

    n <- NULL
    if(n < 2) print("This is OK")
    Error in if (n < 2) print("This is OK") : argument is of length zero

In the hclust code, you find the following line:

    n <- as.integer(attr(d, "Size"))

where d is the distance object passed to the hclust function. Looking at the error you get, this means that the Size attribute of your distance is NULL, which tells me that distA is not a dist object.

    A <- matrix(1:4, ncol=2)
    A
         [,1] [,2]
    [1,]    1    3
    [2,]    2    4
    hclust(A, method="single")
    Error in if (n < 2) stop("must have n >= 2 objects to cluster") : argument is of length zero

Did you actually put in a distance object? See also ?dist or ?as.dist.

Cheers
Joris

On Fri, May 28, 2010 at 1:41 AM, Ayesha Khan ayesha.diamond...@gmail.com wrote:
> I have a matrix with dimensions 136 x 3 that looks something like
>
>          [,1] [,2]     [,3]
>     [1,]  402  675 1.802758
>     [2,]  402  696 1.938902
>     [3,]  402  699 1.994253
>     [4,]  402  945 1.898619
>     [5,]  424  470 1.812857
>     [6,]  424  905 1.816345
>     [7,]  470  905 1.871252
>     [8,]  504  780 1.958191
>     [9,]  504  848 1.997111
>     ...
>
> so you get the idea. I want to group similar items in one group/cluster following the friends-of-friends approach. I tried doing
>
>     distclust <- hclust(distA, method="single")
>
> However, I got the following error:
>
>     Error in if (n < 2) stop("must have n >= 2 objects to cluster") : argument is of length zero
>
> which probably means there's something wrong with my input here. Is there another way of doing this kind of clustering without getting into all the looping and ifelse etc.? Basically, if 402 is close to 675, 696 and 699 and they thus fall in cluster A, then all items close to 675, 696 and 699 should also fall into the same cluster A, following a friends-of-friends strategy. Any help would be highly appreciated.
> -- Ayesha Khan, MS Bioengineering, Dept. of Bioengineering, Rice University, TX
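A short sketch of what hclust() expects, using five invented one-dimensional points. It shows the Size attribute that the error message hinges on and how a square distance matrix can be turned into a dist object.

```r
m <- matrix(c(0, 1, 2.5, 10, 11), ncol = 1)   # illustrative coordinates

d <- dist(m)                  # a proper "dist" object
size_d <- attr(d, "Size")     # number of objects, read by hclust()
size_m <- attr(m, "Size")     # NULL: a plain matrix has no such attribute

hc <- hclust(d, method = "single")

# A square, symmetric matrix of distances can be converted with as.dist()
d2 <- as.dist(as.matrix(d))
```

Passing `m` itself to `hclust()` would reproduce the "argument is of length zero" error, because `size_m` is NULL.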
Re: [R] Handing significance digits
Hi Christofer,

I don't know what .Net is doing, but for R these globals depend on your machine and platform:

    ?.Machine
    ?.Platform

I don't know if you can actually hack R into believing otherwise. Did you consider the possibility that the underlying algorithms differ between .Net and R?

Cheers
Joris

On Fri, May 28, 2010 at 12:51 PM, Christofer Bogaso bogaso.christo...@gmail.com wrote:
> Hi folks,
> recently I was evaluating a complex function with exactly the same starting values and the same algorithm in both R and the .Net environment. However, at the end point I noticed that there are some differences in the figures reported by the two applications (as much as 0.10%). I feel this is basically due to different numbers of significant digits being used in handling floating point numbers in R and .Net.
> Therefore I want to fix the number of digits after the decimal point in each and every calculation in R. For example, suppose I am multiplying two numbers: 18.456 and 20.345. Ideally the result is 375.48732. However, I want R to consider only 2 digits, i.e. 18.46 and 20.35, report 375.66, and use this trimmed value in subsequent calculations. It would be good if there were a way to define such behavior once at the beginning of my R session. Is there any way to do that?
> Thanks,
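As far as I know there is no session-wide switch in R that truncates every intermediate result; `options(digits = )` only affects printing. The sketch below shows what `.Machine` reports and a hand-written rounding step. The helper `round_mul` is hypothetical and would have to be applied explicitly at each calculation.

```r
eps  <- .Machine$double.eps      # smallest x for which 1 + x != 1
bits <- .Machine$double.digits   # base-2 digits in the significand of a double

# Doubles are binary fractions, so exact equality is fragile;
# all.equal() compares within a tolerance instead.
exact <- (0.1 + 0.2 == 0.3)
near  <- isTRUE(all.equal(0.1 + 0.2, 0.3))

# Hypothetical helper: round both inputs and the product to 'digits' decimals
round_mul <- function(a, b, digits = 2) {
  round(round(a, digits) * round(b, digits), digits)
}
round_mul(18.456, 20.345)
```

A discrepancy of 0.10% is far larger than double-precision rounding alone would usually explain, which fits the suggestion above that the algorithms themselves may differ.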
[R] How to get values out of a string using regular expressions?
Dear all,

I have a vector of filenames which begins like this:

    X <- c("OrthoP1_DNA_str.aln", "OrthoP10_DNA_str.aln", "OrthoP100_DNA_str.aln",
           "OrthoP101_DNA_str.aln", "OrthoP102_DNA_str.aln", "OrthoP103_DNA_str.aln",
           "OrthoP104_DNA_str.aln", "OrthoP105_DNA_str.aln", "OrthoP106_DNA_str.aln",
           "OrthoP107_DNA_str.aln")

Using grep("(\\d+)", X, perl=T, value=T) I get the complete values back. Yet I want the vector

    c(1,10,100,101,102,103,104,105,106,107)

In Perl, using the brackets allows for extracting only the numbers (using a construct with $1, for those who know Perl). I want to do the same in R, but can't find a way of doing that without extensive string manipulation. The problem is that the length of the numbers differs, so I can't use substr. I tried

    strsplit(X, "\\d+")
    [[1]]
    [1] "OrthoP"       "_DNA_str.aln"

which gives me exactly what I want to throw away. So:

    strsplit(X, "\\D+")
    [[1]]
    [1] ""  "1"
    [[2]]
    [1] ""   "10"

gives something I can use, but it still requires a lot of list manipulation afterwards to get the right vector. Is there an option or a function I'm missing somewhere?

Cheers
Joris
Re: [R] anova post hoc tests
See:

    http://www.statmethods.net/stats/anova.html
    ?TukeyHSD

Cheers
Joris

On Fri, May 28, 2010 at 2:11 PM, Iasonas Lamprianou lampria...@yahoo.com wrote:
> Hi everybody,
> does anyone know how I can run ANOVA post-hoc tests using R Commander or R in general?
> Thank you
> Dr. Iasonas Lamprianou
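A minimal sketch of the TukeyHSD route on the built-in PlantGrowth data set; the choice of data set is only for illustration.

```r
# One-way ANOVA of plant weight by treatment group
fit <- aov(weight ~ group, data = PlantGrowth)
summary(fit)

# Tukey's honest significant differences for all pairwise group comparisons
tk <- TukeyHSD(fit, conf.level = 0.95)
tk
plot(tk)   # confidence intervals for each pairwise difference
```

Each row of `tk$group` is one pairwise comparison with its estimated difference, confidence bounds and adjusted p-value.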
Re: [R] Linear Discriminant Analysis in R
Could you provide us with data to test the code? Use dput (and limit the size!), e.g.:

    dput(vowel_features)
    dput(mask_features)

Without this information, it's impossible to say what's going wrong. It looks like you're doing something wrong in the selection. What should vowel_features[15] return? Did you check it's actually what you want? Did you use str(G) to check the type?

Cheers
Joris

On Thu, May 27, 2010 at 5:28 PM, cobbler_squad la.f...@gmail.com wrote:
> Joris,
> You are a life saver. Based on the two sample files above, I think lda should go something like this:
>
>     vowel_features <- read.table(file = "mappings_for_vowels.txt")
>     mask_features <- data.frame(as.matrix(read.table(file = "3dmaskdump_ICA_37_Combined.txt")))
>     G <- vowel_features[15]
>     cvc_lda <- lda(G ~ vowel_features[15], data=mask_features, na.action="na.omit", CV=TRUE)
>
> ERROR:
>
>     Error in model.frame.default(formula = G ~ vowel_features[15], data = mask_features, :
>       invalid type (list) for variable 'G'
>
> I am clearly doing something wrong declaring G (how should I declare the grouping in R when I need to use one column from the vowel_feature file)? Sorry for stupid questions and thank you for being so helpful!
>
> Again, the sample files that I am working with.
>
> mappings_for_vowels.txt:
>
>       V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26
>     1  E  0  0  0  0  0  0  0  0   0   0   0   0   1   1   0   0   0   1   0   0   0   0   0   0   0
>     2  o  0  0  0  0  0  0  0  0   0   0   0   0   1   0   0   1   0   1   0   1   0   1   0   0   0
>     3  I  0  0  0  0  0  0  0  0   0   0   0   0   1   1   0   0   1   0   0   0   0   0   0   0   0
>     4  ^  0  0  0  0  0  0  0  0   0   0   0   0   1   0   1   0   0   1   0   0   0   0   0   0   0
>     5  @  0  0  0  0  0  0  0  0   0   0   0   0   1   0   0   1   0   0   1   0   0   0   0   0   0
>
> and the mask_features file is:
>
>                  V42         V43         V44          V45          V46         V47         V48         V49
>     [1,] 2.890891625 2.881188521 2.88778     -2.882606612 -2.77341     2.879834384 2.886483229 2.883815864
>     [2,] 2.763404707 2.756198683 2.761863881 -2.756827983 -2.762268531 2.754305072 2.760017050 2.758399799
>     [3,] 0.556614506 0.556377530 0.556247414 -0.556300910 -0.556098321 0.557495060 0.557383073 0.556867424
>     [4,] 0.367065248 0.366962036 0.366870087 -0.366794442 -0.366644148 0.366613343 0.366537320 0.366953464
>     [5,] 0.423692393 0.421835623 0.421741829 -0.421897460 -0.421659824 0.421567705 0.421465738 0.422407838
>
> -- View this message in context: http://r.789695.n4.nabble.com/Linear-Discriminant-Analysis-in-R-tp2231922p223.html
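For orientation, a sketch of how a grouping variable is usually handed to lda(), on simulated data. Everything here is invented: the features `f1` to `f3`, the sample size and the 0/1 grouping only stand in for the mask and vowel files of the thread.

```r
library(MASS)

set.seed(42)
n <- 60
G <- factor(rep(c(0, 1), each = n / 2))        # one group label per observation
x <- data.frame(f1 = rnorm(n, mean = as.integer(G)),
                f2 = rnorm(n),
                f3 = rnorm(n))                 # one row of features per observation

# The grouping must be a factor (or vector) with exactly nrow(x) entries
fit <- lda(x, grouping = G, CV = TRUE)

# With CV = TRUE, fit$class holds the leave-one-out predicted groups
table(observed = G, predicted = fit$class)
```

In the thread the feature file is described as having 671 rows and the vowel file 254, so the two would first have to be matched up so that each feature row has one label; how to do that depends on the experiment and is not something this sketch can settle.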
Re: [R] How to get values out of a string using regular expressions?
Bingo! Thx Gabor.

Thank you too, Tal. I looked briefly at the package and it looks like a nice interface. I'll keep it in mind for later.

Cheers
Joris

On Fri, May 28, 2010 at 2:25 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:
> Try this:
>
>     as.numeric(gsub("\\D", "", X))
>
> On Fri, May 28, 2010 at 8:21 AM, Joris Meys jorism...@gmail.com wrote:
>> Dear all, I have a vector of filenames which begins like this: [...] Is there an option or a function I'm missing somewhere?
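Gabor's one-liner, together with two alternatives that stay closer to the Perl `$1` idiom mentioned in the question. Only the first three filenames are used here to keep the example short.

```r
X <- c("OrthoP1_DNA_str.aln", "OrthoP10_DNA_str.aln", "OrthoP100_DNA_str.aln")

# Delete every non-digit character (the solution from the thread)
ids_gsub <- as.numeric(gsub("\\D", "", X))

# Capture the first run of digits and substitute it for the whole string,
# the closest equivalent of Perl's $1
ids_sub <- as.numeric(sub("^\\D*(\\d+).*$", "\\1", X))

# Extract the first match directly
ids_match <- as.numeric(regmatches(X, regexpr("[0-9]+", X)))
```

The `gsub()` version glues together all digits in a name, so the capture-group or `regmatches()` variants are the safer choice if a filename could contain a second number.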
Re: [R] Matrix interesting question!
Provide a minimal example to start with. This sounds more like voodoo than anything else.

Cheers
Joris

On Fri, May 28, 2010 at 6:30 PM, UM usman.muni...@imperial.ac.uk wrote:
> Hi,
> I have been trying to do this in R (I have implemented it in Excel), but I have been using a very inefficient way (loops etc.). I have matrix A (columns are years and rows are ages) and matrix B (columns are birth years and rows are ages). I would like to first turn matrix A into matrix B, and then I would like to convert matrix B back again to the original matrix A. (I have left out details of the steps) but this is the gist of what I want to do. Can anyone please give any insights?
> Thanks
> http://r.789695.n4.nabble.com/file/n2234852/untitled.bmp
> -- View this message in context: http://r.789695.n4.nabble.com/Matrix-interesting-question-tp2234852p2234852.html
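Since the question leaves out the details, the following is only a guess at what is meant: it assumes that birth year equals calendar year minus age, and that cells of B without a counterpart in A stay NA. The ages, years and values are invented.

```r
ages  <- 0:2
years <- 2000:2002
A <- matrix(1:9, nrow = 3, dimnames = list(age = ages, year = years))

# Every (age, year) cell of A and the birth cohort it belongs to
idx     <- expand.grid(a = seq_along(ages), y = seq_along(years))
cohort  <- years[idx$y] - ages[idx$a]          # assumption: cohort = year - age
cohorts <- sort(unique(cohort))

# A -> B: move each cell to its (age, cohort) position using matrix indexing
B <- matrix(NA_integer_, nrow = length(ages), ncol = length(cohorts),
            dimnames = list(age = ages, cohort = cohorts))
cell_A <- cbind(idx$a, idx$y)
cell_B <- cbind(idx$a, match(cohort, cohorts))
B[cell_B] <- A[cell_A]

# B -> A: the same index pairs used in the other direction
A_back <- A
A_back[] <- NA
A_back[cell_A] <- B[cell_B]
```

Indexing a matrix with a two-column matrix of (row, column) pairs is what replaces the explicit loops here.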
Re: [R] leave-one-out cross validation
See ?cv.glm under the heading Value. The help file tells you what comes out.

On Fri, May 28, 2010 at 10:19 PM, azam jaafari azamjaaf...@yahoo.com wrote:
> Hi,
> Finally, I did leave-one-out cross validation in R for the prediction error of a logistic regression with cv.glm. But I don't know what the data it produces (almost 700 numbers) are. Does delta show me the error estimate?
>
>     cost <- function(a,b) mean(abs(a-b))
>     # SALIC = binary response
>     salic.lr <- glm(profilesample$SALIC~profilesample$wetnessindex, profilesample, family=binomial('logit'))
>     loadpackage(boot)
>     cv.err <- cv.glm(profilesample, salic.lr, cost, K=100)
>     cv.err
>     $call
>     cv.glm(data = profilesample, glmfit = salic.lr, cost = cost, K = 100)
>     $K
>     [1] 100
>     $delta
>          1      1
>     0.4278 0.4278
>     $seed
>      [1]         403         133  1654269195 -1877109783  -961256264  1403523942
>      [7]   124639233   261424787  1836448066  1034917620   -13630729   468718317
>     [13]  1694379396  1559298986  1935866133 -1450855505  2105396150  1802260960
>     [19]  1077391651   539731521   122505520   230898510 -1940184647  1223031755
>     [25] -1597886342 -1854140036 -1783225921  1484611221  1365746860  -346485118
>     [31]  1206044253  1201793367   956757054   350214264 -1324711077
>     . . .
>
> Please help me. Thanks a lot.
>
> --- On Wed, 5/26/10, Joris Meys jorism...@gmail.com wrote:
> From: Joris Meys jorism...@gmail.com
> Subject: Re: [R] validation logistic regression
> To: azam jaafari azamjaaf...@yahoo.com
> Cc: r-help@r-project.org
> Date: Wednesday, May 26, 2010, 5:00 AM
>
>> Hi,
>> first of all, you shouldn't backtransform your prediction; use the option type="response" instead:
>>
>>     salichpred <- predict(salic.lr, newdata=profilevalidation, type="response")
>>     limit <- 0.5
>>     salichpredcat <- ifelse(salichpred < limit, 0, 1)  # prediction of categories
>>
>> Read up on sensitivity, specificity and ROC curves. By changing the limit, you can calculate sensitivity and specificity, and you can construct a ROC curve that will tell you how good your predictions are. It all depends on how much error you allow on the predictions.
>> Cheers
>> Joris
>>
>> On Wed, May 26, 2010 at 10:04 AM, azam jaafari azamjaaf...@yahoo.com wrote:
>>> Hi,
>>> I did validation for prediction by logistic regression according to the following:
>>>
>>>     validationsize <- 23
>>>     set.seed(1)
>>>     random <- runif(123)
>>>     order(random)
>>>     nrprofilesinsample <- sort(order(random)[1:100])
>>>     profilesample <- data[nrprofilesinsample,]
>>>     profilevalidation <- data[-nrprofilesinsample,]
>>>     salich <- profilesample$SALIC.H.1
>>>     salic.lr <- glm(salich~wetnessindex, profilesample, family=binomial('logit'))
>>>     summary(salic.lr)
>>>     salichpred <- predict(salic.lr, newdata=profilevalidation)
>>>     expsalichpred <- exp(salichpred)
>>>     salichprediction <- (expsalichpred/(1+expsalichpred))
>>>
>>> So, table(salichprediction, profilevalidation$SALIC.H.1) gives as result:
>>>
>>>     salichprediction    0 1
>>>     0.0408806327422231  1 0
>>>     0.094509645033899   1 0
>>>     0.118665480273383   1 0
>>>     0.129685441514168   1 0
>>>     0.135452955695111   0
>>>     0.137580612201769   1 0
>>>     0.197265822234215   1 0
>>>     0.199278585548248   0 1
>>>     0.202436276322278   1 0
>>>     0.211278767985746   1 0
>>>     0.261036846823867   1 0
>>>     0.283792703256058   1 0
>>>     0.362229486187581   0 1
>>>     0.362795636267779   1 0
>>>     0.409067386115694   1 0
>>>     0.410860613509484   0 1
>>>     0.423960962956254   1 0
>>>     0.428164288793652   1 0
>>>     0.448509687866763   0 1
>>>     0.538401659478058   0 1
>>>     0.557282539294224   1 0
>>>     0.603881788227797   0 1
>>>     0.63633478460736    0 1
>>>
>>> So, I have salichprediction between 0 and 1 and a binary variable (observed values) 0 or 1. I want to compare these data with each other, and I want to know whether this model (logistic regression) is OK for prediction or not. Please help me.
>>> Thanks a lot
>>> Azam
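A sketch of both steps discussed in this thread, on the built-in mtcars data rather than the poster's soil profiles. The model `am ~ wt` and the 0.5 limit are illustrative assumptions.

```r
library(boot)

# Variables are named in the formula and looked up in 'data', so that
# cv.glm() can refit the model on subsets of the data.
fit  <- glm(am ~ wt, data = mtcars, family = binomial("logit"))
cost <- function(obs, pred) mean(abs(obs - pred))

# K equal to the number of rows gives leave-one-out cross-validation
cv <- cv.glm(mtcars, fit, cost = cost, K = nrow(mtcars))
cv$delta   # raw and adjusted cross-validation estimates of the cost

# Predicted probabilities on the response scale, then a 0/1 classification
p      <- predict(fit, newdata = mtcars, type = "response")
manual <- plogis(predict(fit, newdata = mtcars))   # same values by hand
limit  <- 0.5
pred_class <- ifelse(p < limit, 0, 1)

tab <- table(predicted = pred_class, observed = mtcars$am)
sensitivity <- tab["1", "1"] / sum(tab[, "1"])
specificity <- tab["0", "0"] / sum(tab[, "0"])
```

According to ?cv.glm, `delta` holds the raw and the adjusted estimate of prediction error, and `seed` is the state of the random number generator when the function was called, which would account for the long run of integers in the output above. A formula written as `profilesample$SALIC ~ profilesample$wetnessindex` probably prevents the refits from using the subsets; that is an inference from how `glm()` resolves names, not something confirmed in the thread. In practice the table would be built on held-out data such as `profilevalidation`.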
--
Joris Meys
Statistical Consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
Coupure Links 653
B-9000 Gent
tel : +32 9 264 59 87
joris.m...@ugent.be
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
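For readers following the thread, here is a minimal sketch of the two pieces of advice above, run on simulated data (the data and the true coefficients are made up; only the names profilesample, SALIC and wetnessindex mirror the thread). It assumes the recommended package boot is installed. According to ?cv.glm, delta holds the raw cross-validation estimate of prediction error followed by an adjusted estimate, and seed is simply the value of .Random.seed at the time of the call, which would account for the long run of integers in the output. Note also that, as far as I can tell, writing the model with bare column names plus data= (rather than profilesample$SALIC ~ profilesample$wetnessindex) is what lets cv.glm refit the model on each subset.

```r
library(boot)  # cv.glm() lives in the recommended package 'boot'

set.seed(1)
n <- 100
# Simulated stand-in for 'profilesample' (assumption: not the poster's data)
profilesample <- data.frame(wetnessindex = rnorm(n))
profilesample$SALIC <- rbinom(n, 1, plogis(-0.5 + 1.2 * profilesample$wetnessindex))

# Bare column names plus data=, so the call can be re-evaluated on subsets
salic.lr <- glm(SALIC ~ wetnessindex, data = profilesample,
                family = binomial("logit"))

# Cost function: mean absolute difference between observed 0/1 and fitted probability
cost <- function(obs, pred) mean(abs(obs - pred))

cv.err <- cv.glm(profilesample, salic.lr, cost, K = n)  # K = n is leave-one-out
cv.err$delta  # first element: raw CV estimate; second: adjusted estimate

# Predicted probabilities directly, without backtransforming by hand
p <- predict(salic.lr, newdata = profilesample, type = "response")

# Classify at a chosen limit and summarise
limit <- 0.5
pred_class <- ifelse(p < limit, 0, 1)
sensitivity <- sum(pred_class == 1 & profilesample$SALIC == 1) / sum(profilesample$SALIC == 1)
specificity <- sum(pred_class == 0 & profilesample$SALIC == 0) / sum(profilesample$SALIC == 0)
c(sensitivity = sensitivity, specificity = specificity)
```

Repeating the last step over a grid of limits gives the points of a ROC curve.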
Re: [R] clustering in R
Errr, forget about the output of dput(q), but keep it in mind for next time.

    f <- dist(t(q))
    hclust(f, method = "single")

It's as simple as that.

Cheers
Joris

On Fri, May 28, 2010 at 10:39 PM, Ayesha Khan ayesha.diamond...@gmail.com wrote:

    v <- dput(x, "sampledata.txt")
    dim(v)
    q <- v[1:10, 1:10]
    f <- as.matrix(dist(t(q)))
    distB <- NULL
    for (k in 1:(nrow(f) - 1))
      for (m in (k + 1):ncol(f)) {
        if (f[k, m] < 2) distB <- rbind(distB, c(k, m, f[k, m]))
      }
    # now distB looks like this
    distB
          [,1] [,2]      [,3]
     [1,]    1    2 1.6275568
     [2,]    1    3 0.5252058
     [3,]    1    4 0.7323116
     [4,]    1    5 1.9966001
     [5,]    1    6 1.6664110
     [6,]    1    7 1.0800540
     [7,]    1    8 1.8698925
     [8,]    1   10 0.5161808
     [9,]    2    3 1.7325811
    [10,]    2    5 0.8267843
    [11,]    2    6 0.5963280
    [12,]    2    7 0.8787230

Now, from this output I want to cluster all 1's, friends of 1 and friends of friends of 1 in one cluster. The same goes for 2, 3 and so on. But when I do that using hclust, I get the following error. I think what I need to do is convert my current matrix somehow into a format that would be accepted by the hclust function, but I don't know how to achieve that.

    distclust <- hclust(distB, method = "single")
    Error in if (n < 2) stop("must have n >= 2 objects to cluster") :
      argument is of length zero

P.S: Please let me know if this makes things more clear? Because I don't know how looking at the original data set would help, since the matrix under consideration right now is the distance matrix and how it can be altered. I have tried as.dist; it doesn't work because my matrix, as I mentioned earlier, is not a square matrix.

On Fri, May 28, 2010 at 2:37 PM, Tal Galili tal.gal...@gmail.com wrote:

Hi Ayesha,

I wish to help you, but without a simple self-contained example that shows your issue, I will not be able to help. Try using the ?dput command to create some simple data, and let us see what you are doing.

Best,
Tal

Contact me: tal.gal...@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)

On Fri, May 28, 2010 at 9:04 PM, Ayesha Khan ayesha.diamond...@gmail.com wrote:

Thanks Tal and Joris! I created my distance matrix distA by using the dist() function in R and manipulating my output in order to get a matrix.

    distA <- as.matrix(dist(t(x2)))  # x2 being my original dataset

This is according to the documentation on dist(): "For the default method, a "dist" object, or a matrix (of distances) or an object which can be coerced to such a matrix using as.matrix()".

On Fri, May 28, 2010 at 6:34 AM, Joris Meys jorism...@gmail.com wrote:

As Tal said. Next to that, I read that column1 (and column2?) are supposed to be seen as factors, not as numerical variables. Did you take that into account somehow?

It's easy to reproduce the error:

    n <- NULL
    if (n < 2) print("This is OK")
    Error in if (n < 2) print("This is OK") : argument is of length zero

In the hclust code, you find the following line:

    n <- as.integer(attr(d, "Size"))

where d is the distance object entered in the hclust function. Looking at the error you get, this means that the Size attribute of your distance is NULL, which tells me that distA is not a dist object.

    A <- matrix(1:4, ncol = 2)
    A
         [,1] [,2]
    [1,]    1    3
    [2,]    2    4
    hclust(A, method = "single")
    Error in if (n < 2) stop("must have n >= 2 objects to cluster") :
      argument is of length zero

Did you actually put in a distance object? See also ?dist or ?as.dist.

Cheers
Joris

On Fri, May 28, 2010 at 1:41 AM, Ayesha Khan ayesha.diamond...@gmail.com wrote:

I have a matrix with dimensions 136 x 3 and it looks something like

         [,1] [,2]     [,3]
    [1,]  402  675 1.802758
    [2,]  402  696 1.938902
    [3,]  402  699 1.994253
    [4,]  402  945 1.898619
    [5,]  424  470 1.812857
    [6,]  424  905 1.816345
    [7,]  470  905 1.871252
    [8,]  504  780 1.958191
    [9,]  504  848 1.997111
    ...

so you get the idea. I want to group similar items in one group/cluster following the friends-of-friends approach. I tried doing

    distclust <- hclust(distA, method = "single")

However, I got the following error:

    Error in if (n < 2) stop("must have n >= 2 objects to cluster") :
      argument is of length zero

which probably means there's something wrong with my input here. Is there another way of doing this kind of clustering without getting into all the looping and ifelse etc.? Basically, if 402 is close to 675, 696 and 699 and thus falls in cluster A, then all items close to 675, 696 and 699 should also fall into the same cluster A, following a friends-of-friends strategy. Any help would be highly
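To make the diagnosis above concrete, here is a small sketch on random data (the matrix x is made up for illustration). It shows that a dist object carries a Size attribute, that as.matrix() drops it, and that as.dist() restores something hclust() accepts, provided the matrix is square and symmetric.

```r
set.seed(42)
x <- matrix(rnorm(60), nrow = 6)   # 6 observations of 10 variables (made up)

d <- dist(t(x))                    # distances between the 10 columns
inherits(d, "dist")                # a proper dist object
attr(d, "Size")                    # number of objects being compared

m <- as.matrix(d)                  # plain square matrix, no "Size" attribute
is.null(attr(m, "Size"))

# Convert back before clustering; single linkage is the
# friends-of-friends style of merging discussed in the thread
hc <- hclust(as.dist(m), method = "single")
```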
Re: [R] clustering in R
I can't run your code. Please, just give me whatever comes on your screen when you run:

    dput(q)

On Fri, May 28, 2010 at 10:57 PM, Ayesha Khan ayesha.diamond...@gmail.com wrote:

I assume my matrix should look something like this?

    round(distance, 4)
           P00A   P00B   M02A   M02B   P04A   P04B   M06A   M06B   P08A   P08B   M10A
    P00B 0.9678
    M02A 1.0054 1.0349
    M02B 1.0258 1.0052 1.2106
    P04A 1.0247 0.9928 1.0145 0.9260
    P04B 0.9898 0.9769 0.9875 0.9855 0.6075
    M06A 1.0159 0.9893 1.0175 0.9521 0.9266 0.9660
    M06B 0.9837 0.9912 1.0124 1.0402 1.0272 1.0367 1.5693
    P08A 1.0279 1.0303 0.9865 0.9748 1.0184 1.0452 0.9799 1.0400
    P08B 1.0248 1.0299 0.9717 0.9673 1.0048 1.0329 1.0280 0.9907 0.2158
    M10A 0.9850 0.9603 1.0246 0.9708 1.0231 0.9771 0.9916 1.0168 0.9722 0.9525
    M10B 1.0150 1.0397 0.9754 1.0292 0.9769 1.0229 1.0084 0.9832 1.0278 1.0475 2.

On Fri, May 28, 2010 at 3:39 PM, Ayesha Khan ayesha.diamond...@gmail.com wrote:

    v <- dput(x, "sampledata.txt")
    dim(v)
    q <- v[1:10, 1:10]
    f <- as.matrix(dist(t(q)))
    distB <- NULL
    for (k in 1:(nrow(f) - 1))
      for (m in (k + 1):ncol(f)) {
        if (f[k, m] < 2) distB <- rbind(distB, c(k, m, f[k, m]))
      }
    # now distB looks like this
    distB
          [,1] [,2]      [,3]
     [1,]    1    2 1.6275568
     [2,]    1    3 0.5252058
     [3,]    1    4 0.7323116
     [4,]    1    5 1.9966001
     [5,]    1    6 1.6664110
     [6,]    1    7 1.0800540
     [7,]    1    8 1.8698925
     [8,]    1   10 0.5161808
     [9,]    2    3 1.7325811
    [10,]    2    5 0.8267843
    [11,]    2    6 0.5963280
    [12,]    2    7 0.8787230

Now, from this output I want to cluster all 1's, friends of 1 and friends of friends of 1 in one cluster. The same goes for 2, 3 and so on. But when I do that using hclust, I get the following error. I think what I need to do is convert my current matrix somehow into a format that would be accepted by the hclust function, but I don't know how to achieve that.

    distclust <- hclust(distB, method = "single")
    Error in if (n < 2) stop("must have n >= 2 objects to cluster") :
      argument is of length zero

P.S: Please let me know if this makes things more clear?
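Since dput() comes up repeatedly in this thread, a short sketch of how it is usually used to share data on the list may help. The object q here is a made-up toy matrix, and the temporary file stands in for a name such as "sampledata.txt".

```r
q <- matrix(1:6, nrow = 2)         # toy object standing in for the real data

# With no file argument, dput() prints an R expression that recreates q;
# that printed text is what to paste into a mail to the list
dput(q)

# With a file argument, the same expression is written to disk instead
tmp <- tempfile(fileext = ".txt")  # hypothetical file name
dput(q, file = tmp)

# dget() reads the expression back and rebuilds the object
q2 <- dget(tmp)
```

Pasting the console output of dput(q) lets anyone on the list rebuild exactly the same object with a single assignment.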
Re: [R] clustering in R
Ah OK, I didn't get your question then. A dist object is actually a vector of numbers with a couple of attributes. You can't just cut out values like that. The hclust function needs a complete distance matrix for its calculations.

A shortcut is easy: just do

    f <- 2 * f / max(f)

and all values are at most 2. Otherwise this function could do that for you:

    to.dist <- function(x) {
      # x: three columns (item 1, item 2, distance); column indexing is used
      # so that a matrix such as distB works as well as a data frame
      x.names <- sort(unique(c(x[, 1], x[, 2])))
      n <- length(x.names)
      x.dist <- matrix(0, n, n)
      dimnames(x.dist) <- list(x.names, x.names)
      x.ind <- rbind(cbind(match(x[, 1], x.names), match(x[, 2], x.names)),
                     cbind(match(x[, 2], x.names), match(x[, 1], x.names)))
      x.dist[x.ind] <- rep(x[, 3], 2)
      x.dist <- as.dist(x.dist)
      return(x.dist)
    }

    d <- to.dist(distB)
    hclust(d)

Cheers
Joris

On Sat, May 29, 2010 at 12:04 AM, Ayesha Khan ayesha.diamond...@gmail.com wrote:

Yes Joris. I did try that and it does produce the results. I am now wondering why I wanted a matrix-like structure in the first place. However, I do want 'f' to contain values less than 2 only. But when I try to get rid of values greater than 2 by doing N <- f[f < 2], the structure of f is disrupted and hclust doesn't want to recognize it anymore, because obviously the data structure changes again with that. Any ideas on how to do that?

On Fri, May 28, 2010 at 4:13 PM, Joris Meys jorism...@gmail.com wrote:

Errr, forget about the output of dput(q), but keep it in mind for next time.

    f <- dist(t(q))
    hclust(f, method = "single")

It's as simple as that.
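As a hedged variation on the conversion idea in this thread, the sketch below builds a dist object from a three-column pair list and then cuts a single-linkage tree at height 2, which is one way to get friends-of-friends groups. The helper name pairs_to_dist and the toy distB are made up for illustration. One assumption differs from a zero-filled matrix: pairs that are absent from the list are given a large finite distance, because a 0 would make unlisted pairs look like the closest neighbours.

```r
# Hypothetical helper: pair list (item, item, distance) -> dist object
pairs_to_dist <- function(pairs, fill = NULL) {
  ids <- sort(unique(c(pairs[, 1], pairs[, 2])))
  n <- length(ids)
  # Unlisted pairs get a large finite value (assumption: they are "far apart")
  if (is.null(fill)) fill <- 10 * max(pairs[, 3])
  m <- matrix(fill, n, n, dimnames = list(ids, ids))
  diag(m) <- 0
  i <- match(pairs[, 1], ids)
  j <- match(pairs[, 2], ids)
  m[cbind(i, j)] <- pairs[, 3]     # fill both triangles so m is symmetric
  m[cbind(j, i)] <- pairs[, 3]
  as.dist(m)
}

# Toy pair list: only pairs closer than 2 are listed
distB <- rbind(c(1, 2, 1.63),
               c(1, 3, 0.53),
               c(2, 3, 1.73),
               c(4, 5, 0.83))

d <- pairs_to_dist(distB)
hc <- hclust(d, method = "single")

# Cutting at height 2 joins items linked by a chain of distances up to 2
groups <- cutree(hc, h = 2)
groups
```

In this toy input, items 1, 2 and 3 are chained together and items 4 and 5 form a second group.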
Re: [R] data frame manipulation change elements meeting criteria
The loop is due to the switch statement, not the condition. Without the condition it would become:

    for (i in 1:length(Y)) {
      new.vect[i] <- switch(EXPR = X[i], Sell = "Buy", Buy = "Sell", X[i])
    }

You can make an sapply construct too, of course:

    new.vect <- sapply(X[which(Y == "DEL")], switch, Sell = "Buy", Buy = "Sell")

This will speed things up a little bit, but the effect is marginal.

Cheers
Joris

On Thu, May 27, 2010 at 8:33 AM, arnaud Gaboury arnaud.gabo...@gmail.com wrote:

Thank you for the answer. Is there any way to combine if() and switch() in one line? In my case, something like:

    if (trade$Trade.Status == "DEL") switch(.)

I would like to avoid the loop.

From: Joris Meys [mailto:jorism...@gmail.com]
Sent: Wednesday, May 26, 2010 9:15 PM
To: arnaud Gaboury
Cc: r-help@r-project.org
Subject: Re: [R] data frame manipulation change elements meeting criteria

See ?switch

    X <- rep(c("Buy", "Sell", "something else"), each = 5)
    Y <- rep(c("DEL", "INS", "DEL"), 5)
    new.vect <- X
    for (i in which(Y == "DEL")) {
      new.vect[i] <- switch(EXPR = X[i], Sell = "Buy", Buy = "Sell", X[i])
    }
    cbind(new.vect, X, Y)

On Wed, May 26, 2010 at 7:43 PM, arnaud Gaboury arnaud.gabo...@gmail.com wrote:

Dear group,

Here is my df:

    trade <- structure(list(Trade.Status = c("DEL", "INS", "INS"),
        Instrument.Long.Name = c("SUGAR NO.11", "CORN", "CORN"),
        Delivery.Prompt.Date = c("Jul/10", "Jul/10", "Jul/10"),
        Buy.Sell..Cleared. = c("Sell", "Buy", "Buy"),
        Volume = c(1L, 2L, 1L),
        Price = c(15.2500, 368., 368.5000),
        Net.Charges..sum. = c(4.01, -8.64, -4.32)),
        .Names = c("Trade.Status", "Instrument.Long.Name", "Delivery.Prompt.Date",
        "Buy.Sell..Cleared.", "Volume", "Price", "Net.Charges..sum."),
        row.names = c(NA, 3L), class = "data.frame")

Here is what I want:

If trade$Trade.Status == "DEL": then if trade$buy.Sell..Cleared == "Sell", change it to "Buy"; if trade$buy.Sell..Cleared == "Buy", change it to "Sell".
If trade$Trade.Status == "INS", do nothing.

I tried to work around this with ifelse, but don't know how to deal with so many conditions. Any help is appreciated.
TY
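Since the original question mentions ifelse, here is a minimal vectorised sketch that avoids both the loop and switch. It uses a cut-down copy of the trade data frame (only three of its columns, rebuilt here so the example is self-contained); nesting two ifelse() calls is just one way to express the rule, not the approach given in the replies.

```r
# Cut-down copy of the poster's data frame (three columns only)
trade <- data.frame(
  Trade.Status       = c("DEL", "INS", "INS"),
  Buy.Sell..Cleared. = c("Sell", "Buy", "Buy"),
  Volume             = c(1L, 2L, 1L),
  stringsAsFactors   = FALSE
)

side   <- trade$Buy.Sell..Cleared.
is_del <- trade$Trade.Status == "DEL"

# Flip Buy/Sell only where the status is DEL; leave everything else alone
trade$Buy.Sell..Cleared. <- ifelse(is_del & side == "Sell", "Buy",
                            ifelse(is_del & side == "Buy", "Sell", side))
trade
```

Rows whose status is not DEL, and values other than Buy or Sell, pass through unchanged because the final argument of the inner ifelse() is the original vector.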
Re: [R] data frame manipulation change elements meeting criteria
Of course. You put a matrix into sapply, but sapply is for vectors. You want to apply the switch command on every entry of the vector trades$Buy.Sell..Cleared. for which trades$Trade.Status equals "DEL". Why do you try to put in a matrix with all variables for the observations where the status is DEL? You should have done:

    tradesnew <- sapply(trades$Buy.Sell..Cleared.[which(trades$Trade.Status == "DEL")],
                        switch, Sell = "Buy", Buy = "Sell")

Check the help files, and keep track of what goes in and out of a function.

Cheers
Joris

On Thu, May 27, 2010 at 9:41 AM, arnaud Gaboury arnaud.gabo...@gmail.com wrote:

Joris,

If I pass this line:

    tradesnew <- sapply(trades[which(trades$Trade.Status == "DEL"), ], switch, Sell = "Buy", Buy = "Sell")

here is what I get:

    tradesnew
    $Trade.Status
    NULL

    $Instrument.Long.Name
    NULL

    $Delivery.Prompt.Date
    NULL

    $Buy.Sell..Cleared.
    [1] "Buy"

    $Volume
    [1] "Buy"

    $Price
    NULL

    $Net.Charges..sum.
    NULL

That's certainly not what I want.

From: Joris Meys [mailto:jorism...@gmail.com]
Sent: Thursday, May 27, 2010 8:43 AM
To: arnaud Gaboury
Cc: r-help@r-project.org
Subject: Re: [R] data frame manipulation change elements meeting criteria

The loop is due to the switch statement, not the condition. Without the condition it would become:

    for (i in 1:length(Y)) {
      new.vect[i] <- switch(EXPR = X[i], Sell = "Buy", Buy = "Sell", X[i])
    }

You can make an sapply construct too, of course:

    new.vect <- sapply(X[which(Y == "DEL")], switch, Sell = "Buy", Buy = "Sell")

This will speed things up a little bit, but the effect is marginal.

Cheers
Joris

On Thu, May 27, 2010 at 8:33 AM, arnaud Gaboury arnaud.gabo...@gmail.com wrote:

Thank you for the answer. Is there any way to combine if() and switch() in one line? In my case, something like:

    if (trade$Trade.Status == "DEL") switch(.)

I would like to avoid the loop.
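The odd list output reported in this exchange can be reproduced on a toy data frame (the one below is made up). The sketch illustrates two documented behaviours: sapply() on a data frame visits each column, and switch() with a numeric first argument selects an alternative by position, which would explain why the Volume column (value 1) came back as "Buy".

```r
df <- data.frame(side = c("Sell", "Buy"), volume = c(1L, 2L),
                 stringsAsFactors = FALSE)

# On a data frame, sapply() applies the function once per column
by_column <- sapply(df, class)

# On a character vector, it applies the function once per element
by_element <- sapply(df$side, switch, Sell = "Buy", Buy = "Sell")

# A numeric EXPR picks the alternative at that position, ignoring the names
first_alternative <- switch(1L, Sell = "Buy", Buy = "Sell")

by_column
by_element
first_alternative
```

So the vector of interest has to be extracted first, as the reply above suggests.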
Re: [R] data frame manipulation change elements meeting criteria
Ah, OK. sapply, evidently, only gives an output for every case that goes in. That is only one, as there is only one DEL case. You can use that output to change the corresponding value in the data frame, like:

    tradenews <- trades
    tradenews$Buy.Sell..Cleared.[which(trades$Trade.Status == "DEL")] <-
      sapply(trades$Buy.Sell..Cleared.[which(trades$Trade.Status == "DEL")],
             switch, Sell = "Buy", Buy = "Sell")

Also take a look at these help files and the examples mentioned in there:

    ?switch
    ?sapply
    ?which

And please, give your variables some decent names. All those dots make your code very error-prone.

Cheers
Joris

On Thu, May 27, 2010 at 10:47 AM, arnaud Gaboury arnaud.gabo...@gmail.com wrote:

Sorry Joris, but I am totally lost on this issue!!

    tradenews <- sapply(trades$Buy.Sell..Cleared.[which(trades$Trade.Status == "DEL")],
                        switch, Sell = "Buy", Buy = "Sell")
    tradenews
     Sell
    "Buy"

Not really what I want!!

From: Joris Meys [mailto:jorism...@gmail.com]
Sent: Thursday, May 27, 2010 10:38 AM
To: arnaud Gaboury
Cc: r-help@r-project.org
Subject: Re: [R] data frame manipulation change elements meeting criteria

Of course. You put a matrix into sapply, but sapply is for vectors. You want to apply the switch command on every entry of the vector trades$Buy.Sell..Cleared. for which trades$Trade.Status equals "DEL". Why do you try to put in a matrix with all variables for the observations where the status is DEL? You should have done:

    tradesnew <- sapply(trades$Buy.Sell..Cleared.[which(trades$Trade.Status == "DEL")],
                        switch, Sell = "Buy", Buy = "Sell")

Check the help files, and keep track of what goes in and out of a function.

Cheers
Joris

On Thu, May 27, 2010 at 9:41 AM, arnaud Gaboury arnaud.gabo...@gmail.com wrote:

Joris,

If I pass this line:

    tradesnew <- sapply(trades[which(trades$Trade.Status == "DEL"), ], switch, Sell = "Buy", Buy = "Sell")

here is what I get:

    tradesnew
    $Trade.Status
    NULL

    $Instrument.Long.Name
    NULL

    $Delivery.Prompt.Date
    NULL

    $Buy.Sell..Cleared.
    [1] "Buy"

    $Volume
    [1] "Buy"

    $Price
    NULL

    $Net.Charges..sum.
    NULL

That's certainly not what I want.
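As an alternative sketch to the sapply/switch assignment in this thread, a named character vector can act as a lookup table. The data frame below is a made-up, cut-down stand-in for trades, and flip is a hypothetical name; the idea is only an illustration, not what was posted on the list.

```r
# Made-up stand-in for 'trades' with two DEL rows
trades <- data.frame(
  Trade.Status       = c("DEL", "INS", "DEL"),
  Buy.Sell..Cleared. = c("Sell", "Buy", "Buy"),
  stringsAsFactors   = FALSE
)

# Lookup table: name = current value, element = replacement
flip <- c(Sell = "Buy", Buy = "Sell")

# Positions of the rows to change
del <- which(trades$Trade.Status == "DEL")

# Index the lookup table by the current values and write the result back
trades$Buy.Sell..Cleared.[del] <- unname(flip[trades$Buy.Sell..Cleared.[del]])
trades
```

One caveat under this approach: a DEL row holding anything other than Buy or Sell would become NA, so the lookup table needs an entry for every value that can occur.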
Re: [R] cluster analysis and supervised classification: an alternative to knn1?
Hi Abanero,

first, I have to correct myself. Knn1 is a supervised learning algorithm, so my comment wasn't completely correct. In any case, if you want to do a clustering prior to a supervised classification, the function daisy() can handle any kind of variable. The resulting distance matrix can be used with a number of different methods.

And you're right, randomForest doesn't handle categorical variables either. So I haven't been of great help here...

Cheers
Joris

On Thu, May 27, 2010 at 1:25 PM, abanero gdevi...@xtel.it wrote:

Hi,

thank you Joris and Ulrich for your answers.

Joris Meys wrote: see the library randomForest for example

I'm trying to find some example in randomForest with categorical variables but I haven't found anything. Do you know any example with both categorical and numerical variables? Anyway, I don't have any class labels yet. How could I find clusters with randomForest?

Ulrich wrote: Probably the simplest way is Affinity Propagation [...] All you need is a way of measuring the similarity of samples which is straightforward both for numerical and categorical variables.

I had a look at the documentation of the package apcluster. That's interesting, but do you have any example using it with both categorical and numerical variables? I'd like to test it with a large dataset.

Thanks a lot!
Cheers
Giuseppe

--
View this message in context: http://r.789695.n4.nabble.com/cluster-analysis-and-supervised-classification-an-alternative-to-knn1-tp2231656p2232950.html
Sent from the R help mailing list archive at Nabble.com.
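A small sketch of the daisy() suggestion, on a made-up data frame that mixes a numeric column with a factor. It assumes the recommended package cluster is installed; the choice of the Gower metric and of average linkage is only for illustration.

```r
library(cluster)  # daisy() lives in the recommended package 'cluster'

# Made-up mixed data: one numeric variable and one categorical variable
df <- data.frame(
  height = c(1.60, 1.70, 1.80, 1.65, 1.90, 1.75),
  colour = factor(c("red", "blue", "red", "green", "blue", "green"))
)

# Gower dissimilarity handles numeric and factor columns together
d <- daisy(df, metric = "gower")

# The result can be passed on to distance-based methods such as hclust()
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 2)
groups
```

The cluster labels in groups could then serve as the class variable for a supervised method, which is the two-step idea discussed in the thread.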
Re: [R] cluster analysis and supervised classification: an alternative to knn1?
I'm confusing myself :-) randomForest cannot handle character vectors as predictors. (Which is why, to my surprise, I found that a categorical variable could not be used in the function.) It can handle categorical variables as predictors IF they are put in as a factor. Obviously it handles categorical variables as a response variable. I hope I'm not going to add more mistakes, it's been enough for the day...

Cheers
Joris

On Thu, May 27, 2010 at 2:08 PM, steve_fried...@nps.gov wrote:

Joris,

I've been following this thread for a few days as I am beginning to use randomForest in my work. I am confused by your last email. What do you mean that randomForest does not handle categorical variables? It can be used in either regression or classification analysis. Do you mean that categorical predictors are not suitable? Certainly they are as the response. Would you be so kind as to clarify what you were suggesting?

Thanks,
Steve Friedman Ph. D.
Spatial Statistical Analyst
Everglades and Dry Tortugas National Park
950 N Krome Ave (3rd Floor)
Homestead, Florida 33034
steve_fried...@nps.gov
Office (305) 224 - 4282
Fax (305) 224 - 4147

From: Joris Meys jorism...@gmail.com
Sent by: r-help-boun...@r-project.org
To: abanero gdevi...@xtel.it
cc: r-help@r-project.org
Date: 05/27/2010 07:56 AM
Subject: Re: [R] cluster analysis and supervised classification: an alternative to knn1?

Hi Abanero,

first, I have to correct myself. Knn1 is a supervised learning algorithm, so my comment wasn't completely correct. In any case, if you want to do a clustering prior to a supervised classification, the function daisy() can handle any kind of variable. The resulting distance matrix can be used with a number of different methods.

And you're right, randomForest doesn't handle categorical variables either. So I haven't been of great help here...

Cheers
Joris

On Thu, May 27, 2010 at 1:25 PM, abanero gdevi...@xtel.it wrote:

Hi, thank you Joris and Ulrich for your answers.
Joris Meys wrote: see the library randomForest for example I'm trying to find some example in randomForest with categorical variables but I haven't found anything. Do you know any example with both categorical and numerical variables? Anyway I don't have any class labels yet. How could I find clusters with randomForest? Ulrich wrote: Probably the simplest way is Affinity Propagation[...] All you need is a way of measuring the similarity of samples which is straightforward both for numerical and categorical variables. I had a look at the documentation of the package apcluster. That's interesting but do you have any example using it with both categorical and numerical variables? I'd like to test it with a large dataset.. Thanks a lot! Cheers Giuseppe -- View this message in context: http://r.789695.n4.nabble.com/cluster-analysis-and-supervised-classification-an-alternative-to-knn1-tp2231656p2232950.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joris Meys Statistical Consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control Coupure Links 653 B-9000 Gent tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
-- Joris Meys Statistical Consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control Coupure Links 653 B-9000 Gent tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented
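The point about character vectors versus factors can be sketched in base R. This is only an illustration: the data frame `obs` and its columns `depth` and `soil` are made-up names, not anything from the thread. The idea is to convert every character column to a factor before handing the data to a function that expects factors for categorical predictors.

```r
# Hypothetical data: 'soil' arrives as a character column,
# e.g. from read.csv(..., stringsAsFactors = FALSE).
obs <- data.frame(
  depth = c(10.2, 33.1, 8.7, 21.4),
  soil  = c("clay", "sand", "clay", "loam"),
  stringsAsFactors = FALSE
)

# Find the character columns and convert each of them to a factor.
char_cols <- vapply(obs, is.character, logical(1))
obs[char_cols] <- lapply(obs[char_cols], factor)

str(obs)
```

After this step the categorical column is a factor, which is the form the message above says randomForest needs for categorical predictors.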
Re: [R] summary of arima model in R
I reckon you misunderstand the function arima. If you're interested in the significance of any regressor, you should use the proper fitting tools. Check all the code examples from the book I recommended before, at:

http://www.stat.pitt.edu/stoffer/tsa2/index.html

There's a nice tutorial that explains quite well how to proceed. In the code for chapters 1-5 they give you examples of the functions you need to do formal testing of your regressors.

Cheers
Joris

On Wed, May 26, 2010 at 1:28 AM, Jianyun Wu jianyun.fred...@gmail.com wrote:

Thanks for your reply. But what I want is to see the significance of the intervention regressors, rather than the goodness of fit of the time series itself.

Thanks

On 5/26/10, Joris Meys jorism...@gmail.com wrote:

Check http://cran.r-project.org/doc/contrib/Ricci-refcard-ts.pdf for some ideas on testing time series in R. I'd go with acf() and pacf() on the residuals of the arima model. If arima works, both plots will indicate absence of autocorrelation. Also check ?tsdiag

And if you're really going to use those more often, I can really recommend this book:

http://www.amazon.com/Time-Analysis-Its-Applications-Statistics/dp/0387293175

Cheers
Joris

On Tue, May 25, 2010 at 9:34 AM, Fred jianyun.fred...@gmail.com wrote:

Hi,

I want to get a summary or anova for an arima model in R, like summary and anova for lm. As I include various intervention factors in the arima(xreg = ) part, I want to assess the significance of these factors. I can do it using interrupted time series analysis by linear regression, but want to see whether an arima model works for the data first. summary and anova do not work for arima; any alternatives?

Thank you very much.
Fred
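One common way to get something like a coefficient table out of arima() is an approximate Wald test: divide each coefficient by its standard error taken from vcov(). The sketch below is not the approach from the book mentioned above, just a minimal illustration on simulated data; the intervention variable `step` and all numbers are assumptions made for the example.

```r
set.seed(1)
n <- 120

# Hypothetical intervention: a level shift after observation 60.
step <- as.numeric(seq_len(n) > 60)

# Simulated series: AR(1) noise plus an intervention effect of size 3.
y <- 5 + 3 * step + arima.sim(list(ar = 0.5), n = n)

fit <- arima(y, order = c(1, 0, 0), xreg = cbind(step = step))

# Approximate Wald z-statistics and two-sided p-values per coefficient.
est <- coef(fit)
se  <- sqrt(diag(vcov(fit)))
z   <- est / se
p   <- 2 * pnorm(-abs(z))

round(cbind(estimate = est, std.error = se, z = z, p.value = p), 4)
```

These p-values rely on asymptotic normality of the estimates, so they are a rough guide rather than a formal anova; the residual checks with acf(), pacf() and tsdiag() mentioned in the thread still apply.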
Re: [R] validation logistic regression
Hi,

first of all, you shouldn't backtransform your prediction; use the option type="response" instead:

    salichpred <- predict(salic.lr, newdata=profilevalidation, type="response")
    limit <- 0.5
    salichpredcat <- ifelse(salichpred<limit, 0, 1) # prediction of categories

Read up on sensitivity, specificity and ROC curves. By changing the limit, you can calculate sensitivity and specificity at each cut-off, and you can construct a ROC curve that tells you how good your predictions are. It all depends on how much error you allow on the predictions.

Cheers
Joris

On Wed, May 26, 2010 at 10:04 AM, azam jaafari azamjaaf...@yahoo.com wrote:

Hi

I did validation for prediction by logistic regression as follows:

    validationsize <- 23
    set.seed(1)
    random <- runif(123)
    order(random)
    nrprofilesinsample <- sort(order(random)[1:100])
    profilesample <- data[nrprofilesinsample,]
    profilevalidation <- data[-nrprofilesinsample,]
    salich <- profilesample$SALIC.H.1
    salic.lr <- glm(salich~wetnessindex, profilesample, family=binomial('logit'))
    summary(salic.lr)
    salichpred <- predict(salic.lr, newdata=profilevalidation)
    expsalichpred <- exp(salichpred)
    salichprediction <- (expsalichpred/(1+expsalichpred))

So, table(salichprediction, profilevalidation$SALIC.H.1) gives as result:

    salichprediction     0 1
    0.0408806327422231   1 0
    0.094509645033899    1 0
    0.118665480273383    1 0
    0.129685441514168    1 0
    0.135452955695111    0
    0.137580612201769    1 0
    0.197265822234215    1 0
    0.199278585548248    0 1
    0.202436276322278    1 0
    0.211278767985746    1 0
    0.261036846823867    1 0
    0.283792703256058    1 0
    0.362229486187581    0 1
    0.362795636267779    1 0
    0.409067386115694    1 0
    0.410860613509484    0 1
    0.423960962956254    1 0
    0.428164288793652    1 0
    0.448509687866763    0 1
    0.538401659478058    0 1
    0.557282539294224    1 0
    0.603881788227797    0 1
    0.63633478460736     0 1

So, I have salichprediction between 0 and 1 and a binary variable (observed values) that is 0 or 1. I want to compare these data, and I want to know whether this model (logistic regression) is OK for prediction or not. Please help me.

Thanks a lot
Azam
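The advice above (predict on the response scale, cut at a limit, then look at sensitivity and specificity for several limits) can be sketched as follows. The data are simulated and the names `wetness`, `salic` and `sens_spec` are placeholders invented for this example, not the poster's actual data or any existing function.

```r
set.seed(1)
n <- 123

# Simulated stand-in for the poster's data (hypothetical names).
dat <- data.frame(wetness = rnorm(n))
dat$salic <- rbinom(n, 1, plogis(-0.5 + 1.5 * dat$wetness))

# 100 profiles for fitting, the remaining 23 for validation.
train_idx <- sort(sample(n, 100))
train <- dat[train_idx, ]
valid <- dat[-train_idx, ]

fit  <- glm(salic ~ wetness, data = train, family = binomial("logit"))
prob <- predict(fit, newdata = valid, type = "response")

# Sensitivity and specificity for one cut-off value.
sens_spec <- function(prob, observed, limit) {
  predicted <- as.integer(prob >= limit)
  c(limit       = limit,
    sensitivity = sum(predicted == 1 & observed == 1) / sum(observed == 1),
    specificity = sum(predicted == 0 & observed == 0) / sum(observed == 0))
}

# A few points of the ROC curve; a finer grid of limits gives the full curve.
roc_points <- t(sapply(c(0.3, 0.5, 0.7),
                       function(l) sens_spec(prob, valid$salic, l)))
roc_points
```

Raising the limit can only lower sensitivity and raise specificity, which is the trade-off the ROC curve visualises.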
Re: [R] Calculation time of isoMDS and the optimal number of dimensions
Hi Michael,

thanks for your answer. Indeed, with a 100x100 matrix it runs pretty fast, even with k=30. But as with a lot of things in R, there is a disproportionate rise in the calculation time once your matrices exceed a certain size. In the end, it ran about 8 hours for my complete matrix. Thanks for the suggestion, that saves me quite a bit of time.

For the full story: I'm applying a new distance measure for comparing phylogenetic trees. The space this is calculated in has to be mapped back onto a Euclidean space, which I'm trying out right now. I noticed that the 2D solution seems a bad representation of the real distances, so I increased the dimensionality. I use the dimensions to get a medoid and a centroid in the Euclidean space, but those results obviously depend on the number of dimensions used in the MDS. So I'm trying to figure out when these location measures become more or less stable.

Graphical representation is of course bound to 2 dimensions. 3D plots tend to be confusing with over 800 points. I tried color coding, but that doesn't really help...

Cheers
Joris

On Wed, May 26, 2010 at 5:32 AM, Michael Denslow michael.dens...@gmail.com wrote:

Hi Joris,

On Tue, May 25, 2010 at 1:00 PM, Joris Meys jorism...@gmail.com wrote:
> Dear all, I'm running a set of nonparametric MDS analyses, using a wrapper for isoMDS, on a 800x800 distance matrix. I noticed that setting the parameter k to larger numbers seriously increases the calculation time. Actually, with k=10 it calculates already longer than for k=2 and k=5 together. It's now calculating for 6 hours, and counting...

Seems like a long time; I have a 100x100 matrix that takes about 40 secs to run with k=10. What is the wrapper function doing?

> There is quite a difference between the results using k=2 or k=5 when looking at the first 2 dimensions (logically...). I suspect the same when k=10. Yet, I start asking myself whether this makes sense if I'm only using the first 2 dimensions. And I can't think of a formal method to check in a nMDS framework how much dimensions are enough. Anybody an idea?

You might want to look at the nmds.min() function in the ecodist package, which seeks to minimize stress. Out of curiosity, do you often use 10-dimensional solutions in your field of study?

Hope this helps,
Michael

> I use metaMDS from the vegan package, although it's not really meant to be used on these data. Cheers Joris

--
Michael Denslow
I.W. Carpenter Jr. Herbarium [BOON]
Department of Biology
Appalachian State University
Boone, North Carolina U.S.A.
-- AND --
Communications Manager
Southeast Regional Network of Expertise and Collections
sernec.org
36.214177, -81.681480 +/- 3103 meters
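A common informal way to judge how many dimensions are enough is to fit the nMDS for several values of k and look at how the stress drops. The sketch below uses MASS::isoMDS directly on simulated points (a made-up 40 x 6 matrix, so the true configuration is 6-dimensional) rather than the 800x800 tree distance matrix from the thread.

```r
library(MASS)

set.seed(123)
# Simulated data: 40 points in 6 dimensions (an assumption for the example).
pts <- matrix(rnorm(40 * 6), nrow = 40)
d   <- dist(pts)

# Fit a non-metric MDS for each candidate dimension and collect the stress.
ks <- 2:5
stress <- sapply(ks, function(k) isoMDS(d, k = k, trace = FALSE)$stress)
names(stress) <- paste0("k=", ks)

round(stress, 2)
```

Plotting stress against k and looking for the point where the curve flattens is a heuristic, not a formal test, and each isoMDS fit can end in a local optimum, so several starts per k may be needed on real data.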
Re: [R] Calculation time of isoMDS and the optimal number of dimensions
Hi Gavin,

thank you for the answer. I am aware of the fact that with nMDS it's about the configuration, and that's exactly my problem: the configuration changes quite a lot when I increase the number of dimensions. As I am trying to go from a CAT(0) space of trees (see Billera et al. on geodesic distance) to a Euclidean space, the required number of dimensions is not easily determined. I have to restrict my Euclidean space for practical reasons, but I want to stay as close as possible to the original configuration of the trees. Hence my playing with the dimensions in the nMDS.

I merely commented on metaMDS as not really meant for this kind of data because of the object that's returned. As the species component is missing from the data, you get warning messages when using procrustes() or other functions in the vegan package. But you're right. It might be written for community data, but it is perfectly valid for any kind of distance matrix.

Thanks again for your insights.

Cheers
Joris

On Wed, May 26, 2010 at 9:34 AM, Gavin Simpson gavin.simp...@ucl.ac.uk wrote:

On Tue, 2010-05-25 at 19:00 +0200, Joris Meys wrote:
> Dear all, I'm running a set of nonparametric MDS analyses, using a wrapper for isoMDS, on a 800x800 distance matrix. I noticed that setting the parameter k to larger numbers seriously increases the calculation time. Actually, with k=10 it calculates already longer than for k=2 and k=5 together. It's now calculating for 6 hours, and counting...

metaMDS will try 'trymax' random starts of isoMDS in an attempt to see if convergent solutions are reached. The 10d computation is clearly much more complex than fitting rank distances in 2 or even 5 d.

> There is quite a difference between the results using k=2 or k=5 when looking at the first 2 dimensions (logically...). I suspect the same when k=10. Yet, I start asking myself whether this makes sense if I'm only using the first 2 dimensions. And I can't think of a formal method to check in a nMDS framework how much dimensions are enough. Anybody an idea?

In nMDS the configuration counts, not the axes (as they are themselves arbitrary directions --- having one or the other of an x or y geographical coordinate isn't much use without the other coordinate if you want to find your way to that location - you need both).

It makes no sense whatsoever to compute a 10d nMDS solution if you only want a 2d solution for later computations; there is no guarantee that the first two axes of a 10d nMDS solution will be as good as those from the 2d solution. If you only want a 2d solution, concentrate on finding the best 2d solution you can using metaMDS.

> I use metaMDS from the vegan package, although it's not really meant to be used on these data.

Why do you say that? As long as you turn off a couple of the ecological helper bits in metaMDS, all it is doing is handling random starts of the isoMDS algorithm.

> Cheers Joris

HTH
G

--
Dr. Gavin Simpson [t] +44 (0)20 7679 0522
ECRC, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
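Gavin's point, that the first two axes of a higher-dimensional nMDS solution are not a substitute for a proper 2d solution, can be checked on a toy example. The sketch below is an illustration on simulated points, not the tree data from the thread; the helper `rank_cor` is a made-up name, and rank correlation with the input distances is just one convenient yardstick.

```r
library(MASS)

set.seed(123)
# Simulated data: 40 points in 6 dimensions (an assumption for the example).
pts <- matrix(rnorm(40 * 6), nrow = 40)
d   <- dist(pts)

fit2 <- isoMDS(d, k = 2, trace = FALSE)
fit5 <- isoMDS(d, k = 5, trace = FALSE)

# Spearman correlation between the input distances and the distances
# in a given configuration: higher means the rank order is kept better.
rank_cor <- function(points) {
  cor(as.vector(d), as.vector(dist(points)), method = "spearman")
}

agreement <- c(
  two_d           = rank_cor(fit2$points),
  first_two_of_5d = rank_cor(fit5$points[, 1:2]),
  full_5d         = rank_cor(fit5$points)
)
round(agreement, 3)
```

Comparing the first two entries shows how much is lost by truncating a 5d configuration instead of fitting in 2d directly; the exact numbers depend on the data and on where the optimiser ends up.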
Re: [R] (no subject)
What exactly are you trying to do? If you want to know which position is wrong, try:

    if (sum(u$POSITION==0)>0) cat("WARNING:POSITION IS WRONG FOR ", which(u$POSITION==0), "\n")

or even:

    wrong <- which(u$POSITION==0)
    if (length(wrong)>0) cat("WARNING: POSITION IS WRONG FOR", u$DESCRIPTION[wrong], "\n")

This gives you the exact location of the wrong positions. If you do that, make sure u$DESCRIPTION is a character vector and not a factor.

Cheers
Joris

On Wed, May 26, 2010 at 2:31 PM, arnaud Gaboury arnaud.gabo...@gmail.com wrote:

Dear group,

Here is my data frame:

    dput(u)
    structure(list(DESCRIPTION = structure(c(2L, 5L, 6L, 7L, 9L, 11L, 12L, 15L, 14L, 16L, 1L, 10L, 3L, 4L, 13L, 8L, 17L), .Label = c("COFFEE C Jul/10", "COPPER May/10", "CORN Jul/10", "CORN May/10", "COTTON NO.2 Jul/10", "CRUDE OIL miNY May/10", "GOLD Jun/10", "HENRY HUB NATURAL GAS May/10", "ROBUSTA COFFEE (10) Jul/10", "SILVER May/10", "SOYBEANS Jul/10", "SPCL HIGH GRADE ZINC USD", "STANDARD LEAD USD", "SUGAR NO.11 Jul/10", "SUGAR NO.11 May/10", "WHEAT Jul/10", "WHEAT May/10"), class = "factor"), PL = c(3500, -1874.999, -2612.503, -2169.998, -680, 425, 1025, 1008.000, -3057.599, 3212.5, -1781.251, -2265.0, 75, -387.5, 2950, 490.0013, 0), POSITION = c(-2, 3, 2, 2, 18, 3, -1, -1, 5, 5, 0, 0, 0, 0, 0, 0, 0)), .Names = c("DESCRIPTION", "PL", "POSITION"), class = "data.frame", row.names = c(NA, -17L))

I want to give a warning message if one of the elements of the POSITION column is different from zero. I tried using mapply with a line like this:

    mapply(if, u$POSITION, ==0, print("WARNING:POSITIONS ARE WRONG", quote=F))

But it seems it is not the correct way to pass the various arguments.

Any help is appreciated.

***
Arnaud Gaboury
Mobile: +41 79 392 79 56
BBM: 255B488F
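The remark about u$DESCRIPTION being a factor matters because cat() prints the integer codes of a factor rather than its labels. A minimal sketch on a cut-down, made-up version of the data frame (three rows only), converting with as.character() before printing:

```r
# Small stand-in for the data frame 'u' from the thread (three rows only).
u <- data.frame(
  DESCRIPTION = factor(c("COPPER May/10", "COFFEE C Jul/10", "SILVER May/10")),
  POSITION    = c(3, 0, 0)
)

# Row numbers where POSITION equals zero, as in the reply above.
wrong <- which(u$POSITION == 0)

# as.character() turns the factor codes back into readable labels.
msg <- paste("WARNING: POSITION IS WRONG FOR",
             paste(as.character(u$DESCRIPTION[wrong]), collapse = ", "))

if (length(wrong) > 0) cat(msg, "\n")
```

Whether zero or non-zero positions count as wrong depends on what is being checked; swapping `== 0` for `!= 0` flips the test.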
Re: [R] condition apply to elements of a data frame column
Arnaud,

check the vector:

    u$POSITION!=0
    [1] TRUE TRUE ...

What I do is use

    u$POSITION==0
    [1] FALSE FALSE ...

When you apply the sum() function to that vector, FALSE becomes 0 and TRUE becomes 1. So this actually gives you a way of counting the number of positions that are zero. If you have one element equal to -2 and another equal to 2, then

    c(-2,0,2)==0
    [1] FALSE TRUE FALSE
    sum(c(-2,0,2)==0)
    [1] 1

which gives you exactly the number of elements that are 0.

Cheers
Joris

On Wed, May 26, 2010 at 3:14 PM, arnaud Gaboury arnaud.gabo...@gmail.com wrote:

Joris,

I want to add a line in a function that prints a warning if one element of the column is not 0. I could use if(sum(u$POSITION)!=0) as a condition, but I can imagine having one element equal to -2 and another one equal to 2. In that case the sum is 0, so the check would not fire even though at least one element is different from zero.
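The difference between summing the values and summing a logical comparison can be shown on the three-element vector from the message. The variable name `position` is just a stand-in for u$POSITION.

```r
position <- c(-2, 0, 2)

# Summing the raw values hides the problem: -2 and 2 cancel out.
sum(position)        # 0

# Summing a logical vector counts the TRUE entries instead.
sum(position == 0)   # 1 element is zero
sum(position != 0)   # 2 elements differ from zero

# any() answers the yes/no question directly.
if (any(position != 0)) cat("WARNING: POSITIONS ARE WRONG\n")
```

any() is arguably the more readable choice when only a yes/no answer is needed, while which() gives the row numbers when the offending entries have to be listed.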
[R] stress function in isoMDS
Dear all,

as far as my understanding goes, isoMDS uses the Kruskal definition of stress, i.e. the square root of the ratio of the sum of squared differences between the input distances and those of the configuration to the sum of configuration distances squared (as stated in the help files).

Now the definition of Kruskal also includes weights. I checked the isoMDS code, but it calls C routines that I can't really read. Does anybody know whether or not isoMDS applies those weights?

Next to that, the input distances are allowed a monotonic transformation. How should I interpret that transformation within isoMDS?

Kind regards
Joris
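One way to probe the question empirically is to recompute the stress by hand and compare it with what isoMDS reports. The sketch below assumes that MASS::Shepard() returns the configuration distances (`y`) and their monotone (isotonic) fit against the input dissimilarities (`yf`), and that isoMDS reports stress in percent; the unweighted formula used here is my reading of the help text, not something confirmed from the C source. The data are simulated.

```r
library(MASS)

set.seed(123)
# Simulated data: 40 points in 6 dimensions (an assumption for the example).
pts <- matrix(rnorm(40 * 6), nrow = 40)
d   <- dist(pts)

fit <- isoMDS(d, k = 2, trace = FALSE)

# Shepard() gives, in order of increasing input dissimilarity:
#   x  = input dissimilarities
#   y  = distances in the fitted configuration
#   yf = monotone regression fit of y on the ordering of x
sh <- Shepard(d, fit$points)

# Unweighted Kruskal stress, expressed in percent.
manual_stress <- 100 * sqrt(sum((sh$y - sh$yf)^2) / sum(sh$y^2))

c(isoMDS = fit$stress, manual = manual_stress)
```

If the two numbers agree, that is at least consistent with isoMDS using the unweighted formula, with the monotonic transformation entering through the fitted values `yf`; plot(sh$x, sh$y) with lines(sh$x, sh$yf, type = "S") shows that transformation as a Shepard diagram.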
Re: [R] cluster analysis and supervised classification: an alternative to knn1?
Not a direct answer, but from your description it looks like you are better off with supervised classification algorithms instead of unsupervised clustering. See the library randomForest, for example. Alternatively, you can try a logistic regression or a multinomial regression approach, but these are parametric methods and put requirements on the data. randomForest is completely non-parametric.

Cheers
Joris

On Wed, May 26, 2010 at 3:45 PM, abanero gdevi...@xtel.it wrote:

Hi,

I have 1000 observations with 10 attributes (of different types: numeric, dichotomous, categorical, etc.) and a measure M. I need to cluster these observations in order to assign a new observation (with the same 10 attributes but not the measure) to a cluster. For the new observation I want to calculate a measure as the average of the measures M of the observations in the assigned cluster.

I would use cluster analysis (Clara algorithm?) and then knn1 (in package class) to assign the new observation to a cluster. The problem is: I'm not able to use knn1 because some of the attributes are categorical. Do you know something like knn1 that works with categorical variables too? Do you have any suggestion?

-- View this message in context: http://r.789695.n4.nabble.com/cluster-analysis-and-supervised-classification-an-alternative-to-knn1-tp2231656p2231656.html
Sent from the R help mailing list archive at Nabble.com.
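For the clustering route described in the question, one possible sketch uses daisy() from the cluster package with the Gower metric (which accepts mixed numeric and factor columns), pam() for the clustering, and a nearest-medoid rule for the new observation. All data and column names below are simulated placeholders, and nearest-medoid assignment is my substitution for knn1, not something proposed in the thread.

```r
library(cluster)

set.seed(42)
n <- 60

# Simulated mixed-type attributes and the measure M (hypothetical names).
obs <- data.frame(
  size = rnorm(n),
  flag = factor(sample(c("yes", "no"), n, replace = TRUE)),
  type = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
M <- rnorm(n, mean = 10)

# New observation with the same attributes but no measure.
new_obs <- data.frame(
  size = 0.3,
  flag = factor("yes", levels = levels(obs$flag)),
  type = factor("b",   levels = levels(obs$type))
)

# Gower dissimilarities for old and new observations together.
d_all <- as.matrix(daisy(rbind(obs, new_obs), metric = "gower"))

# Cluster only the known observations.
cl <- pam(as.dist(d_all[1:n, 1:n]), k = 3, diss = TRUE)

# Assign the new observation to the cluster of its nearest medoid.
nearest <- unname(which.min(d_all[n + 1, cl$id.med]))

# Predicted measure: mean of M within that cluster.
predicted_M <- mean(M[cl$clustering == nearest])
predicted_M
```

Note that Gower scaling of the numeric column uses the range of all rows, including the new one, so the dissimilarities among the known observations can shift slightly when a new point lies outside the old range.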
Re: [R] how to avoid a subset of a matrix to become a column vector
What exactly are you trying to do? An example (which you should have provided):

    A <- matrix(1:100, nrow=10, ncol=10)
    B <- A[10,1:3]
    B
    [1] 10 20 30
    is.matrix(B)
    [1] FALSE
    matrix(B)
         [,1]
    [1,]   10
    [2,]   20
    [3,]   30

This is logical: you convert a vector to a matrix, and R assumes you want one column. If you want to convert it back, you should do:

    matrix(B, ncol=3)
         [,1] [,2] [,3]
    [1,]   10   20   30

Or use drop=F:

    C <- A[10,1:3,drop=F]
    C
         [,1] [,2] [,3]
    [1,]   10   20   30
    is.matrix(C)
    [1] TRUE

On Wed, May 26, 2010 at 5:58 PM, mau...@alice.it wrote:

I am assigning a subset of a matrix A [n,3], where n>1, to a temporary matrix TMP. I do not know how many rows of A will be assigned to TMP because this is established by a run-time test. I expect TMP to be a matrix [m,3], m>=1. But when only 1 row is transferred from A to TMP, TMP becomes [3,1] rather than [1,3]. How can I avoid this unwanted transpose operation?

Thank you in advance,
Maura
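Applied to the situation in the question, where the number of selected rows is only known at run time, the drop=FALSE idea might look like this. The matrix `A` and the selection condition are made up for the example.

```r
# Hypothetical matrix with 4 rows and 3 columns.
A <- matrix(1:12, ncol = 3)

# Run-time test: here only the last row has a first column above 3.
keep <- A[, 1] > 3

# drop = FALSE keeps the result a matrix, whatever the number of rows.
TMP <- A[keep, , drop = FALSE]
dim(TMP)

# The same holds when no row at all passes the test.
EMPTY <- A[A[, 1] > 10, , drop = FALSE]
dim(EMPTY)
```

Because the result is always a matrix with 3 columns, later code that calls nrow(TMP) or loops over rows does not need a special case for a single match.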
Re: [R] data frame manipulation change elements meeting criteria
see ?switch

    X <- rep(c("Buy","Sell","something else"), each=5)
    Y <- rep(c("DEL","INS","DEL"), 5)
    new.vect <- X
    for (i in which(Y=="DEL")) {
      new.vect[i] <- switch(EXPR = X[i], Sell="Buy", Buy="Sell", X[i])
    }
    cbind(new.vect, X, Y)

On Wed, May 26, 2010 at 7:43 PM, arnaud Gaboury arnaud.gabo...@gmail.com wrote:

Dear group,

Here is my df:

    trade <- structure(list(Trade.Status = c("DEL", "INS", "INS"), Instrument.Long.Name = c("SUGAR NO.11", "CORN", "CORN"), Delivery.Prompt.Date = c("Jul/10", "Jul/10", "Jul/10"), Buy.Sell..Cleared. = c("Sell", "Buy", "Buy"), Volume = c(1L, 2L, 1L), Price = c(15.2500, 368., 368.5000), Net.Charges..sum. = c(4.01, -8.64, -4.32)), .Names = c("Trade.Status", "Instrument.Long.Name", "Delivery.Prompt.Date", "Buy.Sell..Cleared.", "Volume", "Price", "Net.Charges..sum."), row.names = c(NA, 3L), class = "data.frame")

Here is what I want:

If trade$Trade.Status=="DEL": if trade$Buy.Sell..Cleared.=="Sell", change it to "Buy"; if trade$Buy.Sell..Cleared.=="Buy", change it to "Sell".
If trade$Trade.Status=="INS", do nothing.

I tried to work around this with ifelse, but don't know how to deal with so many conditions.

Any help is appreciated.
TY
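As an alternative to the loop with switch(), the same swap can be written in vectorised form with nested ifelse() calls. This is a sketch on a reduced copy of the poster's data frame (only the two relevant columns), not the solution given in the thread.

```r
# Reduced version of the 'trade' data frame from the question.
trade <- data.frame(
  Trade.Status       = c("DEL", "INS", "INS"),
  Buy.Sell..Cleared. = c("Sell", "Buy", "Buy"),
  stringsAsFactors   = FALSE
)

side   <- trade$Buy.Sell..Cleared.
is_del <- trade$Trade.Status == "DEL"

# Swap Buy and Sell; anything else is left as it is.
flipped <- ifelse(side == "Sell", "Buy",
                  ifelse(side == "Buy", "Sell", side))

# Apply the swap only to the deleted trades.
trade$Buy.Sell..Cleared.[is_del] <- flipped[is_del]
trade
```

The columns are kept as character vectors here; with factor columns the assignment would need both levels to be present, which is one reason to set stringsAsFactors = FALSE.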
Re: [R] error variable names are limited to 256 bytes when sourcing code
          col="red", lty=3)

    ## Add HIGH ERROR BAR
    lines(hh.wealth.plot.ss$Year,       # x var
          hh.wealth.plot.ss$EB.High,    # y var (EB HIGH)
          type="l",                     # line graph
          col="red", lty=3)
    }

    ## Add QUANTILES if Argument 'n.quantiles' is >= 3
    if (n.quantiles >= 3) {
      # Cycle through column numbers and draw quantile lines
      for (qcol in 1:length(cols.q)) {
        lines(hh.wealth.quantiles$Year,  # Plot quantile lines
              hh.wealth.quantiles[,cols.q[qcol]],
              type="l", col="orange", lty=3)
      }
      # Add MEDIAN line
      if (plot.med == TRUE) {
        lines(hh.wealth.quantiles$Year,  # Plot median
              hh.wealth.quantiles[,col.med],
              type="l", col="orange", lty=3, lwd=2)
      }
    }

    ## Add COUNT
    points(hh.wealth.plot.ss$Year,      # x var
           hh.wealth.plot.ss$Count,     # y var (COUNT)
           type="p",                    # points
           pch=16, col="blue", cex=0.5)

    ## Add LEGEND
    legend(x="topright", leg.txt.ss, lty=leg.lty.ss, lwd=leg.lwd.ss,
           pch=leg.pch.ss, col=leg.col.ss, cex=1)

    dev.off()

    ##*
    ## Plot wealth for individual households
    ##*
    png(filename=output.png.wlth, width=10, height=7, units="in", res=300)

    ## Create an empty plot
    if (log.plot == FALSE) {
      plot(hh.wealth.plot$Year, hh.wealth.plot$Wealth, type="n",
           xlim=c(0,max.yr), main=ttl.hh, xlab="", ylab="Wealth")
    } else {
      plot(hh.wealth.plot$Year, hh.wealth.plot$Wealth, log="y", type="n",
           xlim=c(0,max.yr), main=ttl.hh, xlab="", ylab="Wealth (log scale)")
    }

    #legend(x=leg.x.coord, y=leg.y.coord,  # Sets the location for the legend
    legend(x="topright", leg.txt.hh,    # text in the legend
           col=c("red", "red"),         # sets the line colors in the legend
           lty=c(1,3),                  # draws lines
           lwd=c(1,1),                  # sets line thickness
           # bty="n",                   # no border on the legend
           ncol=2,                      # makes it a 2-column legend
           cex=0.8)                     # sets the legend text size

    ## Loop through IDs and add a line for each
    for (id in 1:length(uniq.hh.ids)) {
      ## Get the current HH ID
      this.id <- uniq.hh.ids[id]
      ## Extract the records for the current ID
      this.sub <- hh.wealth.plot[hh.wealth.plot$HHID00 == this.id,]
      if (dim(this.sub)[1] > 0) {
        ## Set line type
        if (mean(this.sub$Status) == 0) {
          ltype <- 1
        } else {
          ltype <- 3
        }
        ## Add the line for this ID
        lines(this.sub$Year, this.sub$Wealth, type="l",
              col=colors.id[id], lwd=1, lty=ltype)
      }
    }
    dev.off()
    }
Re: [R] R editor
I'm not Erik, but what the heck. What platform, Linux or Windows?

On Windows, I use Tinn-R, which works well with R as you get full control over the console. Keep in mind that you should install R with the SDI option, and that you have to configure Tinn-R the first time from the menu (R > Configure > Permanent).

SciTE can be used with R as well. On how to set up SciTE for R:

http://tolstoy.newcastle.edu.au/R/e6/help/09/03/6695.html

Cheers
Joris

On Wed, May 26, 2010 at 9:51 PM, b...@email.unc.edu wrote:

Erik,

What R editor do you use? I've tried SciTE but it won't color the code.

Brian
Re: [R] Linear Discriminant Analysis in R
Why exactly do you need lda and not another method? For lda to be applicable, you should check : 1) whether the regressors are normally distributed within the classes 2) whether the variance-covariance matrices are equal for all classes Essentially, this means that the boundary between both classes is a hyperplane (or in 2 dimensions, a straight line). Otherwise you can try qda, or go to other supervised learning methods. How to use lda is explained rather well in the help files. if it doesn't work, provide us with self-contained code (i.e. code that can be run without need of extra information like data frames) that reproduces the error. Cheers Joris PS : There's an error in your code. scaled_features - scale(mask_features, center = FALSE, scale = apply(abs(mask_features, 2, median))) should be scaled_features - scale(mask_features, center = FALSE, scale = apply(abs(mask_features), 2, median)) On Wed, May 26, 2010 at 5:55 PM, cobbler_squad la.f...@gmail.com wrote: Dear R gurus, Thank you all for continuous support and guidance -- learning without you would not be efficient. I have a question regarding LD analysis and how to best code it up in R. I have a file of (V52 and 671 time points across all columns) and another file of phonetic features (each vowel is aligned with a distinct binary sequence, i.e. E 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 and so on). I need to run lda (at first for one of the features, meaning one column only extracted from the binary file mentioned above). 
In code so far I have very little, but here are short examples of both files:

V57 file:

  V27 V28 V29 V30 V31 V32 V33 V34
1 -2.515000e-03 -0.203858 6.531000e-03 0.248686 6.76e-04 0.084677 -1.262000e-03
2 -2.406000e-03 -0.194943 6.248000e-03 0.237851 6.47e-04 0.081001 -1.207000e-03
3 -4.86e-04 -0.039288 1.263000e-03 0.047980 1.30e-04 0.016292 -2.43e-04

and binary file:

   V1  V2  V3  V4  V5  V6  V7  V8  V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26
1   E   0   0   0   0   0   0   0   0   0   0   0   0   1   1   0   0   0   1   0   0   0   0   0   0   0
2   o   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   1   0   1   0   1   0   1   0   0   0
3   I   0   0   0   0   0   0   0   0   0   0   0   0   1   1   0   0   1   0   0   0   0   0   0   0   0

thus in code I have the following:

  library(MASS)
  vowel_features <- read.table(file = "mappings_for_vowels.txt")
  mask_features <- read.table(file = "3dmaskdump_ICA_37_Combined.txt")
  # scale the mask_features file
  scaled_features <- scale(mask_features, center = FALSE, scale = apply(abs(mask_features, 2, median)))
  # input vowel feature, lda
  lda(ROI_values ~ mappings_for_vowels[15]...)

Not sure what is the correct approach to use for lda. Any pointers would be greatly appreciated.

Thanks again all!
Cobbler
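To make the advice in this reply concrete, here is a minimal sketch of the corrected scaling call followed by an lda fit on one binary feature. The data are simulated stand-ins, not the poster's files: the names features, feature15, x1 and x2 are hypothetical, and the sketch assumes the binary phonetic feature is the class label and the numeric columns are the regressors.

```r
library(MASS)  # recommended package, ships with standard R installations

set.seed(1)
n <- 100

# Hypothetical stand-ins for the numeric measurements (two columns only)
features <- data.frame(x1 = c(rnorm(n, 0), rnorm(n, 3)),
                       x2 = c(rnorm(n, 0), rnorm(n, 3)))

# Hypothetical stand-in for one column of the binary feature file
feature15 <- factor(rep(c(0, 1), each = n))

# Corrected call from the PS: abs() gets one argument, apply() gets the rest
scaled <- scale(features, center = FALSE,
                scale = apply(abs(features), 2, median))

# Class label on the left of the formula, regressors on the right
dat <- data.frame(feature15 = feature15, scaled)
fit <- lda(feature15 ~ x1 + x2, data = dat)

# Predicted classes for the training data and the training accuracy
pred <- predict(fit)$class
mean(pred == dat$feature15)
```

Training accuracy is optimistic; for real data a held-out set or the CV = TRUE argument of lda gives a fairer estimate.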
Re: [R] More efficient way to use ifelse()? - A follow up
Remove the quotes around the options. You also have to put it in a sapply, as switch only works on single values. But I wouldn't call this optimal...

  elevation.DM <- sapply(Population, switch, CO = 2169, CN = 1121, Ga = 500, KO = 2500, Mw = 625, Ng = 300)

Cheers
Joris

On Wed, May 26, 2010 at 9:04 PM, Ian Dworkin idwor...@msu.edu wrote:

# Dennis Murphy suggested switch.. I have not gotten it working yet..

  elevation.DM <- switch(Population, CO = 2169, CN = 1121, Ga = 500, KO = 2500, Mw = 625, Ng = 300)
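As a sketch of what the sapply/switch call does, and of one common alternative, the snippet below compares it with indexing a named lookup vector. The Population values are made up for illustration and are assumed to be a character vector (a factor would need as.character() first).

```r
# Hypothetical population codes; the real vector comes from the poster's data
Population <- c("CO", "Ga", "Mw", "CN")

# The approach from the reply: switch handles one value, sapply loops over all
elev_switch <- sapply(Population, switch,
                      CO = 2169, CN = 1121, Ga = 500,
                      KO = 2500, Mw = 625, Ng = 300)

# Alternative: a named vector used as a lookup table, indexed by name
lookup <- c(CO = 2169, CN = 1121, Ga = 500, KO = 2500, Mw = 625, Ng = 300)
elev_lookup <- lookup[Population]

elev_switch
elev_lookup
```

Both give the same elevations here. With the lookup vector, a code that is not in the table yields NA rather than failing, which makes missing codes easy to spot.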
Re: [R] Fill a matrix using logical arguments?
Hi Alistair,

?match will help, but you need to extract the site names first. Quick and dirty:

  seed <- substr(rownames(bank.plot), 1, 5)
  site <- rownames(site.veg)
  for(i in 1:length(seed)){
    bank.plot[i,] <- site.veg[match(seed[i], site),]
  }

Cheers
Joris

On Wed, May 26, 2010 at 2:50 PM, Alistair Auffret alistair.auff...@natgeo.su.se wrote:

Hello all,

I am going slightly mad trying to create a table for running co-correspondence analysis. What I have is seed bank and vegetation data, and my aim is to see if the vegetation found in a site (containing several seed bank samples) can predict the composition of a seed bank sample within that site. So for this I need two tables with matching rows.

I have created an empty matrix, where the rows correspond to the seed bank samples:

  bank.plot <- matrix(,5,3,dimnames=list(c("AB 01 01", "AB 01 02", "AB 02 01","AB 03 01","AB 03 02"),c(1:3)))
  bank.plot

And I have a matrix where I have presence/absence of species in the vegetation at each site:

  site.veg <- matrix((c(1,0,1,1,0,1,0,1,1)),3,3,dimnames=list(c("AB 01", "AB 02", "AB 03")))
  site.veg

Is there a way to fill the bank.plot matrix with the results from the vegetation survey, duplicating them appropriately to match sites to plots, even when the number of sites per plot is unequal? I.e. in my example, the row AB 01 in site.veg would be duplicated for the first two rows, AB 02 only once, and AB 03 twice.

Hope you can help! Many thanks.

--
Alistair Auffret
PhD Student
Department of Physical Geography and Quaternary Geology
Stockholm University
106 91 Stockholm
Sweden
+46(0)8 674 7568
+46(0)76 7158975
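The loop in this reply can also be written without a loop, since match accepts a whole vector. The following is a sketch using the two example matrices from the question, with the quotes restored and NA used as the explicit fill value; the name idx is introduced here for illustration.

```r
# Example matrices from the question
bank.plot <- matrix(NA, 5, 3,
                    dimnames = list(c("AB 01 01", "AB 01 02", "AB 02 01",
                                      "AB 03 01", "AB 03 02"),
                                    as.character(1:3)))
site.veg <- matrix(c(1, 0, 1, 1, 0, 1, 0, 1, 1), 3, 3,
                   dimnames = list(c("AB 01", "AB 02", "AB 03"), NULL))

# The first five characters of each plot name identify its site
seed <- substr(rownames(bank.plot), 1, 5)

# One site row index per plot; repeated indices duplicate the site row
idx <- match(seed, rownames(site.veg))

# Fill all plots at once; bank.plot[] keeps the plot row names
bank.plot[] <- site.veg[idx, ]
bank.plot
```

Any plot whose site prefix is absent from site.veg gets NA in idx and therefore a row of NA, so a quick anyNA(idx) check catches mismatched names.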
Re: [R] Need Help! Poor performance about randomForest for large data
Hi Jia,

Without seeing the actual data, it's difficult to give solid advice. But it's quite normal that this runs for hours: it has to make a whole lot of decisions, and it can grow tremendously large trees with that amount of data. The error is also quite logical: you just can't store all those huge trees.

Try to set the following options in randomForest:

mtry: the number of variables selected at each split. A smaller number speeds things up, but the effect will not be too big.

nodesize: the minimum node size. By default it is 1 for classification, meaning that you build a tree until every observation is in a separate leaf. In your case, this should be set way higher.

maxnodes: the maximum number of terminal nodes. Again, with the amount of data you have, this number skyrockets and thus produces huge trees (you can have more than 200,000 nodes...). No need for that, so you should set it to a reasonably low number.

Try this for example:

  res <- randomForest(x=sdata1, y=sdata2, ntree=500, mtry=5, nodesize=100, maxnodes=60)

These trees assume that the minimum size of a group with similar observations is 100. Sounds reasonable; it still gives you over 2800 groups for a full tree. I chose the maximum number of nodes so that every variable can occur once in the tree, although it doesn't have to be this way.

If you still get errors, play a bit more with those numbers. Actually, you should do that anyway, regardless of memory and computation time. randomForest is known to have the danger of overfitting. Restricting the tree size avoids this and gives you a more general fit.

Cheers
Joris

On Tue, May 25, 2010 at 11:51 AM, Jia ZJ Zou jia...@cn.ibm.com wrote:

Hi, dears,

I am processing some data with 60 columns and 286,730 rows. Most columns are numerical values, and some columns are categorical values.
It turns out that: when ntree is set to the default value (500), it says it cannot allocate a vector of size 1.1 GB; and when I set ntree to a very small number like 10, it runs for hours. I use the (x,y) interface rather than (formula,data).

My code:

  sdata <- read.csv("D://zSignal Dump////.csv")
  sdata1 <- subset(sdata, select=-38)
  sdata2 <- subset(sdata, select=38)
  res <- randomForest(x=sdata1, y=sdata2, ntrees=10)

Am I doing anything wrong? Or do you have other suggestions? Are there any other packages to do the same thing? I will appreciate it if anyone can help me out, thanks!

Thanks and Best regards,
Jia, Zou (邹嘉), Ph.D.
IBM Research -- China
Diamond Building, #19 Zhongguancun Software Park, 8 Dongbeiwang West Road, Haidian District, Beijing 100193, P.R. China
Tel: +86 (10) 58748518
E-mail: jia...@cn.ibm.com
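The effect of the size limits discussed in this thread can be tried on a small scale first. This is a rough sketch on simulated data, assuming the randomForest package from CRAN is installed; the data, the object names and the particular values of nodesize and maxnodes are made up for illustration. Two details worth knowing: the package spells the argument ntree (a misspelled ntrees is silently absorbed by the ... argument, so the default of 500 trees is used), and y has to be a factor vector for classification rather than a one-column data frame.

```r
library(randomForest)  # CRAN package, not part of base R

set.seed(42)
n <- 2000

# Simulated stand-in: 10 numeric predictors and a two-class response
x <- data.frame(matrix(rnorm(n * 10), ncol = 10))
y <- factor(ifelse(x[[1]] + x[[2]] + rnorm(n) > 0, "a", "b"))

# Restrict tree size: large terminal nodes and a cap on their number
rf <- randomForest(x = x, y = y, ntree = 50, mtry = 3,
                   nodesize = 100, maxnodes = 16)

# Number of terminal nodes per tree; none should exceed maxnodes
max(treesize(rf))

# Upper bound on the number of groups of size 100 in the poster's data
286730 %/% 100
```

Timing a run like this with system.time() on a subsample of the real data gives a cheap way to see how nodesize and maxnodes change the cost before committing to the full 286,730 rows.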
Re: [R] R eat my data
Without any clue about your data file this is definitely unsolvable. But some things to consider:

Where is the dataset coming from?
Did you check for special characters?
Is there an apostrophe somewhere in a string? (That messed things up for me once.)
Is the delimiter placed correctly everywhere?
Did you check what the data frame looks like? If you see which observation was read in last, you can jump to that line number in the txt file and check yourself what goes wrong.

On Tue, May 25, 2010 at 6:15 PM, Changbin Du changb...@gmail.com wrote:

  c...@nuuk:~/operon$ grep '^#' id_name_gh5.txt
  c...@nuuk:~/operon$

No lines start with #.

On Tue, May 25, 2010 at 9:11 AM, Barry Rowlingson b.rowling...@lancaster.ac.uk wrote:

On Tue, May 25, 2010 at 4:42 PM, Changbin Du changb...@gmail.com wrote:

HI, Dear R community,

My original file has 1932 lines, but when I read it into R, it changed to 1068 lines. How come?

  c...@nuuk:~/operon$ wc -l id_name_gh5.txt
  1932 id_name_gh5.txt

  gene_name <- read.table("/home/cdu/operon/id_name_gh5.txt", sep="\t", skip=0, header=F, fill=T)
  dim(gene_name)
  [1] 1068 3

Do any of your lines start with a #?

  read.table("test.txt", sep="\t")
        V1
  1 line 1
  2 line 2
  3 line 3
  4 line 4

  read.table("test.txt", comment.char="", sep="\t")
                 V1
  1          line 1
  2      #commented
  3          line 2
  4          line 3
  5 #nother comment
  6          line 4

Just a guess. Hard to tell without the file...

Barry

--
Sincerely,
Changbin

--
Changbin Du
DOE Joint Genome Institute
Bldg 400 Rm 457
2800 Mitchell Dr
Walnut Creek, CA 94598
Phone: 925-927-2856
-- Joris Meys
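The apostrophe problem mentioned in this reply is easy to reproduce. The sketch below writes a small hypothetical tab-separated file (its contents are invented, not the poster's id_name_gh5.txt) and reads it twice: once with the read.table defaults, where ' and " are quote characters, and once with quoting and comment handling switched off.

```r
# Hypothetical five-line, tab-separated file; two fields contain an apostrophe
tmp <- tempfile(fileext = ".txt")
writeLines(c("1\tgene a",
             "2\tgene 5' end",
             "3\tgene c",
             "4\tgene 3' end",
             "5\tgene e"), tmp)

# Defaults: the apostrophe on line 2 opens a quoted string that only closes
# at the apostrophe on line 4, so the lines in between are merged
bad <- read.table(tmp, sep = "\t", header = FALSE, fill = TRUE)

# Quoting and comment characters disabled: every line becomes one row
good <- read.table(tmp, sep = "\t", header = FALSE, fill = TRUE,
                   quote = "", comment.char = "")

nrow(bad)
nrow(good)

# Field counts per line with the default quoting; an NA marks a line where
# a quoted string starts and runs over the line end
count.fields(tmp, sep = "\t")

unlink(tmp)
```

If the row count matches wc -l once quote = "" and comment.char = "" are set, stray quote or comment characters were the cause; read.delim is another option, as it treats only the double quote as a quote character and disables comment handling by default.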