Re: [R] plot.hclust point to older version
Michael Mason wrote:
> Thanks! That worked

Of course: as in about 99.99% of all cases where Bill Dunlap helps.

Bill Dunlap wrote:
> You probably have a local copy of an old version of plot.hclust or
> plot.dendrogram in your global environment or another package that
> masks the one in package:stats. E.g., I fired up R-2.14.2 and copied
> those 2 plot methods to .GlobalEnv and then saved my workspace when
> quitting R. I then fired up R-3.1.1, which loads the workspace saved
> by the older version of R. I get:
>
>    > objects()
>    [1] "plot.dendrogram" "plot.hclust"
>    > plot(hclust(dist(c(2,3,5,7,11,13,17,19))))
>    Error in .Internal(dend.window(n, merge, height, hang, labels, ...)) :
>      there is no .Internal function 'dend.window'
>    > traceback()
>    2: plot.hclust(hclust(dist(c(2, 3, 5, 7, 11, 13, 17, 19))))
>    1: plot(hclust(dist(c(2, 3, 5, 7, 11, 13, 17, 19))))
>
> Note how calling traceback() after an error gives more information
> about the source of the error.
>
> To fix this, get rid of the .RData file that is being loaded when R starts.

In the spirit of the old -- now politically incorrect -- sayings
"Real men don't ...", I'd like to emphasize my own view that

    Real useRs don't use .RData

In other words, experienced R users do not let their workspace be saved
automatically (to '.RData') and hence do not load any .RData
automatically at startup.

Consequently, use R with the '--no-save' command line argument (maybe
also with '--no-restore'). ESS (Emacs Speaks Statistics) users can put

    (custom-set-variables
     '(inferior-R-args "--no-restore-history --no-save ")
    )

into their ~/.emacs
{and I'd like to see a way to do this easily with RStudio...}

Martin Maechler,
ETH Zurich and R Core Team

> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Tue, Nov 25, 2014 at 12:18 PM, Rolf Turner r.tur...@auckland.ac.nz wrote:
>> On 26/11/14 08:53, Michael Mason wrote:
>>> Here you are. I expect most folks won't get the error.
>>>
>>>    N = 100; M = 1000
>>>    mat = matrix(1:(N*M) + rnorm(N*M,0,.5),N,M)
>>>    h = hclust(as.dist(1-cor(mat)))
>>>    plot(h)
>>>    Error in .Internal(dend.window(n, merge, height2, hang, labels, ...)) :
>>>      there is no .Internal function 'dend.window'
>>>
>>> Thanks again
>>>
>>> On 11/25/14 11:29 AM, Rolf Turner r.tur...@auckland.ac.nz wrote:
>>>> Reproducible example???
>>>>
>>>> (I know from noddink about hclust, but I tried the example from
>>>> the help page and it plotted without any problem.)
>>>>
>>>> cheers,
>>>> Rolf Turner
>>>>
>>>> On 26/11/14 06:13, Michael Mason wrote:
>>>>> Hello fellow R users,
>>>>>
>>>>> I have recently updated to R 3.1.2. When trying to plot an hclust
>>>>> object to generate the dendrogram I get the following error:
>>>>>
>>>>>    Error in .Internal(dend.window(n, merge, height2, hang, labels, ...)) :
>>>>>      there is no .Internal function 'dend.window'
>>>>>
>>>>> I am indeed using R 3.1.2 but my understanding is that the
>>>>> .Internal API to the C code is no longer used. I have tried
>>>>> detaching the stats package and restarting R to no avail. I would
>>>>> love any help from any wiser guRus.
>>
>> Please keep communications on-list; there are others on the list far
>> more likely to be able to help you than I. I am cc-ing this reply to
>> the list.
>>
>> For what it's worth, I can run your example without error. As to how
>> to track down what is going wrong on your system, I'm afraid I have
>> no idea. Someone on the list may have some thoughts.
>>
>> cheers,
>> Rolf Turner
>>
>> --
>> Rolf Turner
>> Technical Editor ANZJS

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
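The diagnosis in this thread (a stale copy of a plot method sitting in the workspace) can be checked from the R prompt. The following is only a sketch: it fakes the stale copy with a placeholder function, since the real one came from an R 2.14.x session, and it assumes the only suspects are the two method names mentioned above.

```r
# Simulate a stale method left behind by an old workspace. The body is a
# placeholder; in the thread the copy came from an R 2.14.x session.
assign("plot.hclust",
       function(x, ...) stop("stale copy restored from .RData"),
       envir = globalenv())

# Which of the two methods named in the thread live in the workspace?
suspects <- intersect(c("plot.hclust", "plot.dendrogram"), ls(globalenv()))
print(suspects)

# Remove the stale copies so that dispatch falls back to the method
# registered by the stats package.
rm(list = suspects, envir = globalenv())

# After the clean-up the method is found in the stats namespace again.
method_home <- environmentName(environment(getS3method("plot", "hclust")))
print(method_home)
```

Removing the objects only fixes the running session; unless the .RData file itself is deleted or overwritten, the stale copies come back at the next start.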
Re: [R] plot.hclust point to older version
> into their ~/.emacs
> {and I'd like to see a way to do this easily with RStudio...}

In RStudio: Tools -> Global Options -> General: uncheck "Restore .RData
into workspace at startup" and choose "Never" for "Save workspace to
.RData on exit".

--
Pascal Oettli
Project Scientist
JAMSTEC
Yokohama, Japan
Re: [R] plot.hclust point to older version
Hi

You say

> in other words, experienced R users do not let their workspace be
> saved automatically (to '.RData') and hence do not load any .RData
> automatically at startup.

I have saved and loaded .RData for years without any issues (except for
packages not being installed when working on different PCs). I usually
keep each project in a separate .RData (and a separate folder, together
with everything belonging to that project), which prevents things from
getting mixed together.

There is no warning such as "do not use .RData" in the books I have
available. I wonder how an experienced useR keeps track of several
projects without loading .RData at startup. What would you recommend
for keeping track of commands and created objects instead of .RData?

Petr

> -----Original Message-----
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Martin Maechler
> Sent: Wednesday, November 26, 2014 10:03 AM
> To: Michael Mason
> Cc: R help
> Subject: Re: [R] plot.hclust point to older version
>
> [...]
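One possible answer to the question of what to use instead of an automatically restored .RData is to keep the commands in a script and save selected results explicitly. This is only a sketch of that idea, not a recommendation made in the thread; the folder name and the file name `fit.rds` are made up for illustration.

```r
# Hypothetical project folder; tempdir() keeps the sketch self-contained.
project_dir <- file.path(tempdir(), "clustering-project")
dir.create(project_dir, showWarnings = FALSE, recursive = TRUE)

# Objects are created by code that can be re-run, not by a restored workspace.
x <- c(2, 3, 5, 7, 11, 13, 17, 19)
fit <- hclust(dist(x))

# Save only the results worth keeping, explicitly and under a clear name.
fit_file <- file.path(project_dir, "fit.rds")
saveRDS(fit, fit_file)

# A later session reads back exactly the object it needs.
fit_again <- readRDS(fit_file)
print(identical(fit$merge, fit_again$merge))
```

Because only data objects are stored, a function such as an old plot.hclust cannot slip into a later session unnoticed.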
Re: [R] Presentation tables in R (knitr)
I also found knitr + HTML + the ReporteRs package to be a good
combination, and less intimidating than LaTeX. Have a look at its
FlexTable tool.

HTH,
Gabriele

-----Original Message-----
From: Tom Wright [mailto:t...@maladmin.com]
Sent: Tuesday, November 25, 2014 9:12 PM
To: r-help@r-project.org
Subject: [R] Presentation tables in R (knitr)

Hi,
This problem has me stumped so I thought I'd ask the experts.

I'm trying to create a pretty summary table of some data (which
patients have had what tests at what times). Ideally I'd like to knitr
this into a pretty PDF for presentation. If anyone has pointers I'll be
grateful.

require(tables)
require(reshape2)

data <- data.frame('ID'=paste0('pat',c(rep(1,8),rep(2,8))),
                   'Time'=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
                   'Eye'=rep(c('OS','OS','OD','OD'),4),
                   'Measure'=rep(c('Height','Weight'),8))

tabular(Measure~factor(ID)*factor(Time)*factor(Eye),data)
# All levels of Time are repeated for all IDs; I'd prefer to just show
# the relevant times.

tabular(Measure~factor(ID)*Time*factor(Eye),data)
# Time is getting collapsed by ID

data$value=1
dcast(data,Measure~ID+Time+Eye)
# close but not very pretty
Re: [R] Error Missing values where true/false needed
Comments in-line below

On 26/11/2014 06:27, Frederic Ntirenganya wrote:
> Hi PIKAL,

Actually I am Michael; Petr is one of the other respondents.

> The error seems strange to me because I access the indices of the
> NAs. Indices can't be non-applicable.

But you are not testing the indexes, see below.

> This is the output of the indices having NA in my dataset. My dataset
> is very big; that's why I did not provide it.
>
>    > indicNAs <- which(data$Rain %in% NA)
>    > indicNAs
>     [1]   426   792  1158  1890  2256  2622  3354  3720  4086  4818  5184  5550  6282  6648  7014  7746  8112
>    [18]  8478  9210  9576  9942 10674 11040 11406 12138 12504 12870 13602 13968 14334 15066 15432 15798 16530
>    [35] 16896 17262 17994 18360 18726 19458 19824 20190
>
> Regards,
> Frederic.
>
> Frederic Ntirenganya
> Maseno University,
> African Maths Initiative,
> Kenya.
> Mobile:(+254)718492836
> Email: fr...@aims.ac.za
> https://sites.google.com/a/aims.ac.za/fredo/
>
> On Tue, Nov 25, 2014 at 3:51 PM, Michael Dewey i...@aghmed.fsnet.co.uk wrote:
>> You do not tell us what you are trying to do, but I think there is
>> something wrong in the logic of your thinking: you are selecting
>> precisely those elements of data$Rain which are NA and then testing
>> whether any of them equals 60.
>>
>> My comments on your code are preceded by ## to make them clear.
>>
>> On 25/11/2014 12:19, Frederic Ntirenganya wrote:
>>> Dear All,
>>>
>>> I am getting this error and don't know why it comes. Can you please help?
>>>
>>> Error in if (data$Rain[i_NA] == 60) { :
>>>   missing value where TRUE/FALSE needed
>>>
>>> The loop is:
>>>
>>> indicNAs <- which(data$Rain %in% NA)
>> ## so at this point indicNAs is the indexes of all the NA
>> ## values in data$Rain
>>> ind_nonleap = c()     # NAs due to non leap years
>>> ind_nonrecord = c()   # NAs due to non recording values
>>> for (i_NA in indicNAs ){
>> ## step through those indexes
>>>   if(data$Rain[i_NA] == 60){
>> ## since i_NA is the index of a value of data$Rain which
>> ## you know to be NA this evaluates to NA and if() complains
>> ## I expect you really meant some other variable in data
>> ## incidentally it is better not to call your data data
>>>     ind_nonleap <- append(ind_nonleap,i_NA)
>>>   } else {
>>>     ind_nonrecord <- append(ind_nonrecord,i_NA)
>>>   }
>>>   #cat(ind_nonrecord)
>>>   #cat( ind_nonleap)
>>> }
>>> ind_nonleap
>>>
>>> Regards,
>>> Frederic.

--
Michael
http://www.dewey.myzen.co.uk
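The point made in the comments above (the test must be on some other variable, because data$Rain is NA at every selected index) can be sketched as follows. The column name `DayOfYear` and the toy values are assumptions made for illustration; the thread does not say which variable was actually meant.

```r
# Toy data; 'DayOfYear' is a hypothetical column standing in for whatever
# variable identifies the 29 February slot in the real data set.
rain_data <- data.frame(
  DayOfYear = c(59, 60, 61, 59, 60, 61),
  Rain      = c(1.2, NA, 0.0, NA, 3.4, 0.5)
)

# if() needs a single TRUE or FALSE; comparing NA with anything gives NA.
print(NA == 60)

# Indexes of the missing rainfall values.
na_idx <- which(is.na(rain_data$Rain))

# Test the other variable, vectorised, instead of the NA value itself.
ind_nonleap   <- na_idx[rain_data$DayOfYear[na_idx] == 60]
ind_nonrecord <- na_idx[rain_data$DayOfYear[na_idx] != 60]
print(ind_nonleap)
print(ind_nonrecord)
```

The vectorised form also removes the need for the loop and for growing vectors with append().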
[R] ggplot facet and subsetting
Dear all

I encountered strange behaviour of ggplot when combining facets with
subsetting. To create plots I sometimes use a for loop, something like
this:

for (i in n:m) {
  p <- ggplot(data, aes(x=x, y=data[,i], colour=f))
  ...
}

However, I found strange results with this combination.

This is OK but only in BW:

p <- ggplot(vec.c, aes(x=fi, y=nad1mi))
p+geom_point(size=5)+geom_line()+facet_grid(.~stroj)

This is OK with colour:

p <- ggplot(vec.c, aes(x=fi, y=nad1mi, colour=as.factor(cas)))
p+geom_point(size=5)+geom_line()+facet_grid(.~stroj)

Here the results in the facets are mismatched:

p <- ggplot(vec.c, aes(x=fi, y=vec.c[,2], colour=as.factor(cas)))
p+geom_point(size=5)+geom_line()+facet_grid(.~stroj)

and this is mismatched too:

p <- ggplot(vec.c, aes(x=fi, y=vec.c[,2]))
p+geom_point(size=5)+geom_line()+facet_grid(.~stroj)

Does anybody know what I am doing wrong?

> dput(vec.c)
structure(list(cas = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
0L, 1L, 2L, 0L, 1L, 2L, 0L, 1L, 2L, 0L, 1L, 2L, 0L, 1L, 2L, 0L,
1L, 2L, 0L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
nad1mi = c(3, 2.7, 0.3, 0.5, 1.9, 5.3, 0.4, 3, 5.4, 0.7, 20.6,
16.7, 16.6, 20.7, 16.1, 15.2, 20.5, 16.4, 14.8, 24.6, 19.3, 15.2,
26.9, 21.3, 20.6, 22.6, 16.3, 15.7, 19.3, 16.5, 15.5, 3.6, 3.4,
5.9, 4.6, 5.4, 4.2, 5.3, 5.6, 5.1, 5), stroj = structure(c(3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("mastersizer",
"odstredivka", "zetasizer"), class = "factor"), fi = c(341L,
341L, 285L, 285L, 401L, 401L, 231L, 231L, 190L, 190L, 341L, 341L,
341L, 285L, 285L, 285L, 401L, 401L, 401L, 231L, 231L, 231L, 190L,
190L, 190L, 167L, 167L, 167L, 161L, 161L, 161L, 341L, 341L, 285L,
285L, 401L, 401L, 231L, 231L, 190L, 190L)), .Names = c("cas",
"nad1mi", "stroj", "fi"), class = "data.frame", row.names = c(1L,
2L, 6L, 7L, 11L, 12L, 16L, 17L, 21L, 22L, 26L, 27L, 28L, 32L,
33L, 34L, 38L, 39L, 40L, 44L, 45L, 46L, 50L, 51L, 52L, 56L, 57L,
58L, 62L, 63L, 64L, 68L, 69L, 73L, 74L, 78L, 79L, 83L, 84L, 88L,
89L))

Regards
Petr

> sessionInfo(package = NULL)
R Under development (unstable) (2014-07-16 r66175)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Czech_Czech Republic.1250  LC_CTYPE=Czech_Czech Republic.1250
[3] LC_MONETARY=Czech_Czech Republic.1250 LC_NUMERIC=C
[5] LC_TIME=Czech_Czech Republic.1250

attached base packages:
[1] stats     datasets  utils     grDevices graphics  methods   base

other attached packages:
[1] ggplot2_1.0.0   lattice_0.20-29 fun_1.0

loaded via a namespace (and not attached):
 [1] colorspace_1.2-4 digest_0.6.4     grid_3.2.0       gtable_0.1.2
 [5] labeling_0.2     MASS_7.3-33      munsell_0.4.2    plyr_1.8.1
 [9] proto_0.3-10     Rcpp_0.11.2      reshape2_1.4     scales_0.2.4
[13] stringr_0.6.2    tools_3.2.0
Re: [R] Checking the proportional odds assumption holds in an ordinal logistic regression using polr function
Dear Charlie,

I admit that I haven't read your email closely, but here is a way to
test for non-proportional odds using the ordinal package (warning:
self-promotion) and the wine data set, also from the ordinal package.
There is more information in the package vignettes.

Hope this is something you can use.

Cheers,
Rune

> library(ordinal)
> ## Fit model:
> fm <- clm(rating ~ temp + contact, data=wine)
> summary(fm)
formula: rating ~ temp + contact
data:    wine

 link  threshold nobs logLik AIC    niter max.grad cond.H
 logit flexible  72   -86.49 184.98 6(0)  4.64e-15 2.7e+01

Coefficients:
           Estimate Std. Error z value Pr(>|z|)
tempwarm     2.5031     0.5287   4.735 2.19e-06 ***
contactyes   1.5278     0.4766   3.205  0.00135 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Threshold coefficients:
    Estimate Std. Error z value
1|2  -1.3444     0.5171  -2.600
2|3   1.2508     0.4379   2.857
3|4   3.4669     0.5978   5.800
4|5   5.0064     0.7309   6.850

> ## Model with non-proportional odds for contact:
> fm2 <- clm(rating ~ temp, nominal=~contact, data=wine)
> ## Likelihood ratio test of non-proportional odds:
> anova(fm, fm2)
Likelihood ratio tests of cumulative link models:

    formula:                nominal: link: threshold:
fm  rating ~ temp + contact ~1       logit flexible
fm2 rating ~ temp           ~contact logit flexible

    no.par    AIC  logLik LR.stat df Pr(>Chisq)
fm       6 184.98 -86.492
fm2      9 190.42 -86.209  0.5667  3      0.904

> ## Automatic tests of non-proportional odds for all variables:
> nominal_test(fm)
Tests of nominal effects

formula: rating ~ temp + contact
        Df  logLik    AIC    LRT Pr(>Chi)
<none>     -86.492 184.98
temp     3 -84.904 187.81 3.1750   0.3654
contact  3 -86.209 190.42 0.5667   0.9040

On 25 November 2014 at 17:21, Charlotte Whitham charlotte.whit...@gmail.com wrote:
> Dear list,
>
> I have used the ‘polr’ function in the MASS package to run an ordinal
> logistic regression for an ordinal categorical response variable with
> 15 continuous explanatory variables.
>
> I have used the code (shown below) to check that my model meets the
> proportional odds assumption, following advice provided at
> (http://www.ats.ucla.edu/stat/r/dae/ologit.htm), which has been
> extremely helpful, thank you to the authors! However, I’m a little
> worried about the output implying that not only are the coefficients
> across various cutpoints similar, but they are exactly the same (see
> graphic below).
>
> Here is the code I used (and see attached for the output graphic)
>
> FGV1b <- data.frame(FG1_val_cat=factor(FGV1b[,"FG1_val_cat"]),
>   scale(FGV1[,c("X","Y","Slope","Ele","Aspect","Prox_to_for_FG",
>   "Prox_to_for_mL","Prox_to_nat_border","Prox_to_village",
>   "Prox_to_roads","Prox_to_rivers","Prox_to_waterFG",
>   "Prox_to_watermL","Prox_to_core","Prox_to_NR","PCA1","PCA2","PCA3")]))
>
> b <- polr(FGV1b$FG1_val_cat ~ FGV1b$X + FGV1b$Y + FGV1b$Slope +
>   FGV1b$Ele + FGV1b$Aspect + FGV1b$Prox_to_for_FG +
>   FGV1b$Prox_to_for_mL + FGV1b$Prox_to_nat_border +
>   FGV1b$Prox_to_village + FGV1b$Prox_to_roads + FGV1b$Prox_to_rivers +
>   FGV1b$Prox_to_waterFG + FGV1b$Prox_to_watermL + FGV1b$Prox_to_core +
>   FGV1b$Prox_to_NR, data = FGV1b, Hess=TRUE)
>
> #Checking the assumption. So the following code will estimate the
> #values to be graphed. First it shows us the logit transformations of
> #the probabilities of being greater than or equal to each value of the
> #target variable
>
> FGV1b$FG1_val_cat <- as.numeric(FGV1b$FG1_val_cat)
>
> sf <- function(y) {
>   c('VC>=1' = qlogis(mean(FGV1b$FG1_val_cat >= 1)),
>     'VC>=2' = qlogis(mean(FGV1b$FG1_val_cat >= 2)),
>     'VC>=3' = qlogis(mean(FGV1b$FG1_val_cat >= 3)),
>     'VC>=4' = qlogis(mean(FGV1b$FG1_val_cat >= 4)),
>     'VC>=5' = qlogis(mean(FGV1b$FG1_val_cat >= 5)),
>     'VC>=6' = qlogis(mean(FGV1b$FG1_val_cat >= 6)),
>     'VC>=7' = qlogis(mean(FGV1b$FG1_val_cat >= 7)),
>     'VC>=8' = qlogis(mean(FGV1b$FG1_val_cat >= 8)))
> }
>
> (t <- with(FGV1b, summary(as.numeric(FGV1b$FG1_val_cat) ~ FGV1b$X +
>   FGV1b$Y + FGV1b$Slope + FGV1b$Ele + FGV1b$Aspect +
>   FGV1b$Prox_to_for_FG + FGV1b$Prox_to_for_mL +
>   FGV1b$Prox_to_nat_border + FGV1b$Prox_to_village +
>   FGV1b$Prox_to_roads + FGV1b$Prox_to_rivers + FGV1b$Prox_to_waterFG +
>   FGV1b$Prox_to_watermL + FGV1b$Prox_to_core + FGV1b$Prox_to_NR,
>   fun=sf)))
>
> #The table displays the (linear) predicted values we would get if we
> #regressed our dependent variable on our predictor variables one at a
> #time, without the parallel slopes assumption. So now, we can run a
> #series of binary logistic regressions with varying cutpoints on the
> #dependent variable to check the equality of coefficients across
> #cutpoints
>
> par(mfrow=c(1,1))
> plot(t, which=1:8, pch=1:8, xlab='logit', main=' ', xlim=range(s[,7:8]))
>
> Apologies that I am no statistics expert and perhaps I am missing
> something obvious here. However, I have spent a long time trying to
> figure out if there is a problem in how I tested the model assumption
> and also trying to figure out other ways to run the same
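One plausible reason for the identical values in the quoted check, offered here as a guess rather than a diagnosis confirmed in the thread: the posted sf() never uses its argument y and always takes the whole column, so every subgroup it is applied to returns the same numbers. The sketch below contrasts the two versions on made-up data using base R only; the data frame `toy` and the function names `sf_whole` and `sf_group` are hypothetical.

```r
# Toy ordinal response (1..4) in two groups; counts chosen by hand so that
# the groups clearly differ. All names and values are made up.
toy <- data.frame(
  resp  = c(rep(1:4, times = c(40, 30, 20, 10)),
            rep(1:4, times = c(10, 20, 30, 40))),
  group = rep(c("low", "high"), each = 100)
)

# As in the posted code: the argument y is ignored and the whole column
# is used, so every subgroup yields the same logits.
sf_whole <- function(y) {
  c("VC>=2" = qlogis(mean(toy$resp >= 2)),
    "VC>=3" = qlogis(mean(toy$resp >= 3)))
}

# Using y: the logits are computed within each subgroup.
sf_group <- function(y) {
  c("VC>=2" = qlogis(mean(y >= 2)),
    "VC>=3" = qlogis(mean(y >= 3)))
}

by_group  <- split(toy$resp, toy$group)
res_whole <- sapply(by_group, sf_whole)
res_group <- sapply(by_group, sf_group)
print(res_whole)   # both columns identical
print(res_group)   # columns differ between groups
```

If this is indeed the cause, replacing FGV1b$FG1_val_cat with y inside sf() should make the per-predictor rows of the summary table differ again.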
Re: [R] ggplot facet and subsetting
I am not quite sure what you want to achieve here, but you only have
one factor column, so shouldn't you be using facet_wrap(~stroj),
perhaps with the nrow or ncol parameters?

---------------------------------------------------------------------------
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.

On November 26, 2014 5:40:07 AM PST, PIKAL Petr petr.pi...@precheza.cz wrote:
> Dear all
>
> I encountered strange behaviour of ggplot with combination of facet
> and subsetting.
>
> [...]
Re: [R] list.files() not compatible with all Unicode characters; file.exists() is compatible.
On 25/11/2014 06:53, Prof Brian Ripley wrote:
> On 25/11/2014 01:25, MacQueen, Don wrote:
>> Sorry, your email was undecipherable because you sent HTML formatted
>> email. Please send plain text
>
> Also, the 'at a minimum' information requested by the posting guide
> is essential here (which OS and locale, in particular). In general
> file names not in the locale's encoding are unsupported.

An off-list reply indicated this was Windows XP.

Although the message body was unreadable, the gist is in the subject
line. From ?list.files under Windows

    'path' must specify paths which can be represented in the current
    codepage.

whereas ?file.exists says

    Most of these functions accept UTF-8 filepaths not valid in the
    current locale.

So this is documented behaviour.

[For anyone curious as to why list.files is different: note that it
does regexp pattern matching. Adding support for Unicode file paths
would not be impossible but it would require hundreds of lines of
Windows-only code.]

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK
[R] How to use ggplot2
Dear All!!

I am trying to draw a barplot using ggplot2.

> head(alt)
  as.factor.data...7..    Col ColMat  Fastq  miseq
1                  189    158    158    158    104
2                  190  54272  54272  54272  32122
3                  191 301574 301574 301574 152625
4                  192 161620 161620 161620 100469
5                  193  61263  61263  61263  38109
6                  194  83800  83800  83800  40095

> p <- ggplot(data = alt, aes(y = alt[,2])) + geom_bar()
Error : Mapping a variable to y and also using stat="bin".
  With stat="bin", it will attempt to set the y value to the count of
  cases in each group.
  This can result in unexpected behavior and will not be allowed in a
  future version of ggplot2.
  If you want y to represent counts of cases, use stat="bin" and don't
  map a variable to y.
  If you want y to represent values in the data, use stat="identity".
  See ?geom_bar for examples. (Defunct; last used in version 0.9.2)

How can I resolve this problem? My data are in columns: each column is
a condition and each row represents a sample.

Thanks for your help!
M
Re: [R] ggplot facet and subsetting
Below

John Kane
Kingston ON Canada

> This is OK but only in BW
>
> p <- ggplot(vec.c, aes(x=fi, y=nad1mi))
> p+geom_point(size=5)+geom_line()+facet_grid(.~stroj)

Perhaps:

p <- ggplot(vec.c, aes(x=fi, y=nad1mi, colour = stroj))
p+geom_point(size=5)+geom_line()+facet_grid(.~stroj)

> and this is mismatched too
>
> p <- ggplot(vec.c, aes(x=fi, y=vec.c[,2]))
> p+geom_point(size=5)+geom_line()+facet_grid(.~stroj)

I don't understand what you want here so cannot suggest anything.
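A likely explanation for the mismatched facets, stated here as an assumption rather than something established in the thread: aes(y = vec.c[,2]) hands ggplot a free-standing vector instead of a reference to a column, so the values are not kept together with the rows assigned to each panel. Addressing the column by name inside the loop avoids this. The sketch uses the `.data` pronoun, which needs a much newer ggplot2 than the 1.0.0 shown in the session info (there, aes_string() played the same role); the data frame `toy` and its columns are made up.

```r
# Small stand-in for vec.c; the values are made up for illustration.
toy <- data.frame(
  fi    = rep(c(190, 231, 285), times = 2),
  y1    = c(3.0, 2.7, 0.3, 20.6, 16.7, 16.6),
  y2    = c(1.0, 1.5, 2.0, 10.0, 11.0, 12.0),
  stroj = rep(c("zetasizer", "mastersizer"), each = 3)
)

# Columns to plot, addressed by name rather than as toy[, i].
y_cols <- c("y1", "y2")
plots  <- list()

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  for (nm in y_cols) {
    # .data[[nm]] is looked up inside the plot data, so each value stays
    # attached to its own row and therefore to its own facet.
    plots[[nm]] <- ggplot(toy, aes(x = fi, y = .data[[nm]])) +
      geom_point(size = 5) + geom_line() +
      facet_grid(. ~ stroj) + ylab(nm)
  }
}
```

Each element of `plots` can then be printed or saved in turn, for example with print(plots[["y1"]]).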
Re: [R] How to use ggplot2
It is useful to have a reproducible example:
https://github.com/hadley/devtools/wiki/Reproducibility
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

However, is this something like what you want? Note that I changed the variable names and removed caps to make life easier, and renamed the dataset to dat1 (just handier for me). I think col as a reserved word should not be used. It seemed to be causing a problem.

library(ggplot2)
dat1 <- structure(list(aa = 189:194,
    bb = c(158L, 54272L, 301574L, 161620L, 61263L, 83800L),
    colmat = c(158L, 54272L, 301574L, 161620L, 61263L, 83800L),
    fastq = c(158L, 54272L, 301574L, 161620L, 61263L, 83800L),
    miseq = c(104L, 32122L, 152625L, 100469L, 38109L, 40095L)),
    .Names = c("aa", "bb", "colmat", "fastq", "miseq"),
    class = "data.frame", row.names = c(NA, -6L))

p1 <- ggplot(dat1, aes(as.factor(aa), y = bb, fill = as.factor(aa)))
p1 <- p1 + geom_bar(stat = "identity")
p1

John Kane
Kingston ON Canada

-----Original Message-----
From: jarod...@libero.it
Sent: Wed, 26 Nov 2014 18:04:21 +0100 (CET)
To: r-help@r-project.org
Subject: [R] How to use ggplot2

> Dear All,
>
> I am trying to draw a bar plot using ggplot2.
>
> head(alt)
>   as.factor.data...7..    Col ColMat  Fastq  miseq
> 1                  189    158    158    158    104
> 2                  190  54272  54272  54272  32122
> 3                  191 301574 301574 301574 152625
> 4                  192 161620 161620 161620 100469
> 5                  193  61263  61263  61263  38109
> 6                  194  83800  83800  83800  40095
>
> p <- ggplot(data = alt, aes(y = alt[,2])) + geom_bar()
> Error : Mapping a variable to y and also using stat="bin". With stat="bin", it will attempt to set the y value to the count of cases in each group. This can result in unexpected behavior and will not be allowed in a future version of ggplot2. If you want y to represent counts of cases, use stat="bin" and don't map a variable to y. If you want y to represent values in the data, use stat="identity". See ?geom_bar for examples. (Defunct; last used in version 0.9.2)
>
> How can I resolve this problem? My data are in columns: each column is a condition and each row represents a sample.
>
> Thanks for your help!
> M
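Since the original poster says each column is a condition and each row a sample, one further option is to reshape the data to long format and plot all conditions side by side. The following is only a sketch under that reading of the question: the names `long`, `sample`, `condition` and `value` are made up for illustration, and the column names are the ones John Kane introduced above, not the poster's originals.

```r
# Sample data from the thread, with the column names used in the reply above.
dat1 <- data.frame(
  aa     = 189:194,
  bb     = c(158L, 54272L, 301574L, 161620L, 61263L, 83800L),
  colmat = c(158L, 54272L, 301574L, 161620L, 61263L, 83800L),
  fastq  = c(158L, 54272L, 301574L, 161620L, 61263L, 83800L),
  miseq  = c(104L, 32122L, 152625L, 100469L, 38109L, 40095L)
)

# Wide -> long with base R only: one row per (sample, condition) pair.
conditions <- c("bb", "colmat", "fastq", "miseq")
long <- data.frame(
  sample    = factor(rep(dat1$aa, times = length(conditions))),
  condition = factor(rep(conditions, each = nrow(dat1)), levels = conditions),
  value     = unlist(dat1[conditions], use.names = FALSE)
)

# Plot only if ggplot2 is installed; stat = "identity" uses the values as bar heights.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = sample, y = value, fill = condition)) +
    geom_bar(stat = "identity", position = "dodge")
  # print(p) draws the plot on the active graphics device
}
```

With the data in this shape, a single aes() mapping covers every condition, so there is no need to index columns by position as in aes(y = alt[,2]).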
Re: [R] plot.hclust point to older version
How disruptive would it be if R were changed so the startup line

   [Previously saved workspace restored]

were changed to show the complete name, from normalizePath(), of the saved workspace file? E.g.,

   [Previously saved workspace restored from 'C:\Program Files\R\.RData']

(It is bad enough that the file name starts with a dot so it is hidden from 'ls', but on Windows lots of people don't know what directory R is starting in. On my Windows PC R-3.1.2 starts in C:/Program Files/R, the parent of its RHOME directory.)

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Nov 26, 2014 at 1:02 AM, Martin Maechler maech...@stat.math.ethz.ch wrote:

> Thanks! That worked
>
> Of course: As in about 99.99% of all cases where Bill Dunlap helps.
>
> You probably have a local copy of an old version of plot.hclust or plot.dendrogram in your global environment or another package that masks the one in package:stats.
>
> E.g., I fired up R-2.14.2 and copied those 2 plot methods to .GlobalEnv and then saved my workspace when quitting R. I then fired up R-3.1.1, which loads the workspace saved by the older version of R. I get:
>
>   objects()
>   [1] "plot.dendrogram" "plot.hclust"
>   plot(hclust(dist(c(2,3,5,7,11,13,17,19))))
>   Error in .Internal(dend.window(n, merge, height, hang, labels, ...)) :
>     there is no .Internal function 'dend.window'
>   traceback()
>   2: plot.hclust(hclust(dist(c(2, 3, 5, 7, 11, 13, 17, 19))))
>   1: plot(hclust(dist(c(2, 3, 5, 7, 11, 13, 17, 19))))
>
> Note how calling traceback() after an error gives more information about the source of the error.
>
> To fix this, get rid of the .RData file that is being loaded when R starts.
>
> In the spirit of the old -- now politically incorrect -- sayings "Real men don't ...", I'd like to emphasize my own view that
>
>   Real useRs don't use .RData
>
> in other words, experienced R users do not let their workspace be saved automatically (to '.RData') and hence do not load any .RData automatically at startup. Consequently, use R with the '--no-save' command line argument (maybe also with '--no-restore').
>
> ESS (Emacs Speaks Statistics) users can put
>
>   (custom-set-variables '(inferior-R-args "--no-restore-history --no-save "))
>
> into their ~/.emacs {and I'd like to see a way to do this easily with RStudio...}
>
> Martin Maechler, ETH Zurich and R Core Team
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Tue, Nov 25, 2014 at 12:18 PM, Rolf Turner r.tur...@auckland.ac.nz wrote:
>
> On 26/11/14 08:53, Michael Mason wrote:
>
> Here you are. I expect most folks won't get the error.
>
>   N = 100; M = 1000
>   mat = matrix(1:(N*M) + rnorm(N*M,0,.5),N,M)
>   h = hclust(as.dist(1-cor(mat)))
>   plot(h)
>   Error in .Internal(dend.window(n, merge, height2, hang, labels, ...)) :
>     there is no .Internal function 'dend.window'
>
> Thanks again
>
> On 11/25/14 11:29 AM, Rolf Turner r.tur...@auckland.ac.nz wrote:
>
> Reproducible example??? (I know from noddink about hclust, but I tried the example from the help page and it plotted without any problem.)
>
> cheers,
> Rolf Turner
>
> On 26/11/14 06:13, Michael Mason wrote:
>
> Hello fellow R users,
>
> I have recently updated to R 3.1.2. When trying to plot an hclust object to generate the dendrogram I get the following error:
>
>   Error in .Internal(dend.window(n, merge, height2, hang, labels, ...)) :
>     there is no .Internal function 'dend.window'
>
> I am indeed using R3.1.2 but my understanding is that the .Internal API to the C code is no longer used. I have tried detaching the stats package and restarting R to no avail. I would love any help from any wiser guRus.
>
> Please keep communications on-list; there are others on the list far more likely to be able to help you than I. I am cc-ing this reply to the list.
>
> For what it's worth, I can run your example without error. As to how to track down what is going wrong on your system, I'm afraid I have no idea. Someone on the list may have some thoughts.
>
> cheers,
> Rolf Turner
>
> --
> Rolf Turner
> Technical Editor ANZJS
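The masking described in the quoted diagnosis can be checked directly from the console. The following is a minimal sketch, not the procedure anyone in the thread used: it plants a dummy plot.hclust in the global environment purely for demonstration, then shows three base/utils calls that reveal the stale copy and how to remove it. The variable names `masked`, `where` and `env_after` are illustrative.

```r
# Simulate the problem: a stale copy of the method sitting in the global environment.
assign("plot.hclust", function(x, ...) stop("stale copy"), envir = globalenv())

# 1. Is there a copy in the global environment itself (not inherited)?
masked <- exists("plot.hclust", envir = globalenv(), inherits = FALSE)

# 2. Which entries on the search path define an object of that name?
where <- find("plot.hclust")

# 3. getAnywhere() lists every copy R knows about, including registered S3 methods.
all_copies <- getAnywhere("plot.hclust")

# Remove the stale copy so that dispatch falls back to the method in stats.
rm("plot.hclust", envir = globalenv())

# After the cleanup the method that plot() dispatches to lives in the stats namespace.
env_after <- environmentName(environment(getS3method("plot", "hclust")))
```

Removing the objects this way fixes only the running session; if the copies came from a restored .RData file they will return at the next startup unless that file is deleted or re-saved.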
Re: [R] Checking the proportional odds assumption holds in an ordinal logistic regression using polr function
On 26 November 2014 at 17:55, Charlotte Whitham charlotte.whit...@gmail.com wrote:

> Dear Rune,
>
> Thank you for your prompt reply and it looks like the ordinal package could be the answer I was looking for!
>
> If you don't mind, I'd also like to know please what to do if the tests show the proportional odds assumption is NOT met. (Unfortunately I notice effects from almost all variables that breach the proportional odds assumption in my dataset)

That depends almost entirely on the purpose of the analysis and is not a topic fit for email - consulting a local statistician is probably sound advice...

Yet: With enough data these tests can be sensitive beyond practical significance; if the 'proportional' part of the effect explains the majority of the deviance, perhaps the proportional odds model provides a reasonably good description of the main structures in the data anyway. On the other hand, if the magnitude (not significance!) of the non-proportional effects is large, perhaps a cumulative link model is not the right kind of model structure and you should be looking at alternative approaches in your analysis.

Cheers,
Rune

> Would you recommend a multinomial logistic model? Or re-scaling of the data?
>
> Thank you for your time,
> Best wishes,
> Charlie
Re: [R] plot.hclust point to older version
Short answer to your question is R files and original data from external sources.

I tend to keep my projects in separate directories. I make a core R file that I can run from beginning to end using source() to generate my primary analysis objects. I then make another file to keep my source() function call in, as well as a few exploratory plot commands. Recently I have also been sourcing the analysis script in Rmd or Rnw files to knit my observations with the output.

Some people complain that their analysis takes too long to be sourcing it all the time. When I have that problem I set up a variable outside my analysis script that I test in my analysis script. If the variable indicates it is time to recalculate, then I do all of that and then save the data in an rds or rda file. If the variable indicates that I should reuse the cached data, then it skips the calculations and just loads the data. This way I always load the right libraries along with the data, and I don't accidentally save data that I changed outside the analysis script... keeping my results reproducible. (Rds files can be convenient if I have several different slow analyses to compare and I want to only work on one at a time. I set up one control variable for each analysis.)

Some people (smarter than me?) like to build their analysis into an Sweave or knitr file. They can then strip out an analysis R file to use the way I have described if they choose to do so (literate programming), but I have not picked up that habit yet.

The key is keeping a record of how every object that is in your save file was originally created. If you tolerate auto saving and loading of the environment then you lose that record, and pernicious errors can creep into your environment from who knows where, and you might as well be using Excel if that is how you work. (Note that this means I hardly ever copy data straight from Excel via the clipboard as that is not reproducible. Usually this means Save As CSV in Excel to start my R analysis if that is the data source.)

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

On November 26, 2014 2:05:54 AM PST, PIKAL Petr petr.pi...@precheza.cz wrote:

> Hi
>
> You say
>
>   in other words, experienced R users do not let their workspace be saved automatically (to '.RData') and hence do not load any .RData automatically at startup.
>
> I have saved/loaded .RData for years without any issues (except for packages not being installed when working on different PCs). I usually keep each project in a separate .RData (and a separate folder, together with all stuff belonging to that project), which prevents things from getting mixed together.
>
> There is no such warning as "do not use .RData" in the books I have available. I wonder how experienced useRs keep track of several projects without loading .RData at startup. What would you recommend for keeping track of commands and created objects instead of .RData?
>
> Petr
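The control-variable caching workflow described in the reply above could look roughly like the following. This is a minimal sketch under stated assumptions, not the poster's actual script: `recalculate`, `cache_file`, `slow_analysis()` and `run_analysis()` are hypothetical names, and the "slow" computation is a trivial stand-in.

```r
# Control variable, set OUTSIDE the analysis script (e.g. in the file that calls source()).
recalculate <- TRUE

# Hypothetical cache location; a real project would keep it in the project directory.
cache_file <- file.path(tempdir(), "slow_analysis.rds")

# Stand-in for an expensive computation.
slow_analysis <- function() {
  x <- seq_len(1000)
  list(total = sum(x), mean = mean(x))
}

# Recompute and cache when asked to (or when no cache exists yet); otherwise reuse the cache.
run_analysis <- function(recalculate, cache_file) {
  if (recalculate || !file.exists(cache_file)) {
    result <- slow_analysis()
    saveRDS(result, cache_file)
  } else {
    result <- readRDS(cache_file)
  }
  result
}

first  <- run_analysis(recalculate = TRUE,  cache_file = cache_file)  # computes and saves
second <- run_analysis(recalculate = FALSE, cache_file = cache_file)  # loads from the cache
```

Because the cached object is only ever written by the analysis script, every saved value still has a recorded origin, which is the point the reply makes against relying on an automatically saved workspace.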
Re: [R] plot.hclust point to older version
On Nov 26, 2014, at 9:49 AM, William Dunlap wrote:

> How disruptive would it be if R were changed so the startup line
>
>    [Previously saved workspace restored]
>
> were changed to show the complete name, from normalizePath(), of the saved workspace file? E.g.,
>
>    [Previously saved workspace restored from 'C:\Program Files\R\.RData']
>
> (It is bad enough that the file name starts with a dot so it is hidden from 'ls', but on Windows lots of people don't know what directory R is starting in. On my Windows PC R-3.1.2 starts in C:/Program Files/R, the parent of its RHOME directory.)

On the Mac GUI that happens with no effort, along with a message saying where the GUI history file resides. I just checked my .Rprofile file to make sure it wasn't doing that. I also have a line that prints the date and time:

utils:::timestamp(stamp = Sys.Date() )

Couldn't you just create a template .Rprofile with the appropriate message printed to console?

--
david.

> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
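The suggestion of a template .Rprofile that reports the workspace file could be sketched as below. This is an illustration only: `workspace_notice()` is a hypothetical helper, and the wording of the message is invented. It relies on the documented startup order in which profile files are processed before any .RData in the startup directory is restored, so the function can only report that a file is present, not that it has been loaded.

```r
# Hypothetical helper for a template .Rprofile: report the full path of the
# saved workspace file that R is about to restore from the startup directory.
workspace_notice <- function(dir = getwd()) {
  f <- file.path(dir, ".RData")
  if (file.exists(f)) {
    paste0("[Saved workspace found at '", normalizePath(f), "']")
  } else {
    "[No saved workspace in the startup directory]"
  }
}

# In an .Rprofile this line would print the notice at startup.
message(workspace_notice())
```

Starting R with --no-restore skips the restore entirely, in which case the notice only tells you that an unused .RData file is lying around.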
[R] Using grid.layout inside grid.layout with grid package: naming of the viewports affects plotting
R version 3.1.1 (2014-07-10)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] C

attached base packages:
[1] grid stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] tools_3.1.1

I have a plotting function to produce plots with stacked plots (for simplicity, here two rectangles).

library(grid)

stackedplot <- function(main=""){
  top.vp <- viewport(layout=grid.layout(2, 1))
  p1 <- viewport(layout.pos.col=1, layout.pos.row=1, name="plot1")
  p2 <- viewport(layout.pos.col=1, layout.pos.row=2, name="plot2")
  splot <- vpTree(top.vp, vpList(p1,p2))
  pushViewport(splot)
  seekViewport("plot1")
  grid.rect(width=unit(0.9, "npc"), height=unit(0.9, "npc"))
  seekViewport("plot2")
  grid.rect(width=unit(0.9, "npc"), height=unit(0.9, "npc"))
}

For creating a 2x2 grid with four stacked plots I tried to use the following code:

grid.newpage()
multitop.vp <- viewport(layout=grid.layout(2,2))
pl1 <- viewport(layout.pos.col=1, layout.pos.row=1, name="A")
pl2 <- viewport(layout.pos.col=1, layout.pos.row=2, name="B")
pl3 <- viewport(layout.pos.col=2, layout.pos.row=1, name="C")
pl4 <- viewport(layout.pos.col=2, layout.pos.row=2, name="D")
vpall <- vpTree(multitop.vp, vpList(pl1,pl2,pl3,pl4))
pushViewport(vpall)
seekViewport("A")
stackedplot(main="A")
seekViewport("B")
stackedplot(main="B")
seekViewport("C")
stackedplot(main="C")
seekViewport("D")
stackedplot(main="D")

This does not work, as all the plots are drawn in the same cell of the grid (viewport A). However, if I plot them in reverse order, the plots are arranged as intended: D to D, C to C and so on.

seekViewport("D")
stackedplot(main="D")
seekViewport("C")
stackedplot(main="C")
seekViewport("B")
stackedplot(main="B")
seekViewport("A")
stackedplot(main="A")

I tried different names and found that if I plot in reverse alphabetical order everything works fine. Once I plot in a viewport with a name earlier in alphabetical order, all plots thereafter are drawn in that same viewport.

Why is this happening?

Regards,
Satu Helske
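One possible explanation, offered here as an assumption rather than a confirmed diagnosis: seekViewport() searches the whole viewport tree starting from the top, and the names "plot1" and "plot2" are reused in every cell, so the search can land on a "plot1" pushed under an earlier cell instead of the one just created. The sketch below sidesteps the ambiguity by never searching for the inner viewports by name; it pushes each one directly and steps back up with upViewport(). The `drawn` list and the use of current.transform() exist only so the result can be checked, and pdf(NULL) is used so the sketch runs without opening a window.

```r
library(grid)

# Draw two stacked rectangles inside whatever viewport is current, then
# return to that viewport. No names are used, so nothing can be mismatched.
stackedplot <- function() {
  transforms <- list()
  pushViewport(viewport(layout = grid.layout(2, 1)))
  for (row in 1:2) {
    pushViewport(viewport(layout.pos.col = 1, layout.pos.row = row))
    grid.rect(width = unit(0.9, "npc"), height = unit(0.9, "npc"))
    # Record where this viewport sits on the page (for checking only).
    transforms[[row]] <- current.transform()
    upViewport()
  }
  upViewport()
  transforms
}

pdf(NULL)  # off-screen device; replace with a real device to see the plot
grid.newpage()
multitop.vp <- viewport(layout = grid.layout(2, 2))
cells <- vpList(
  viewport(layout.pos.col = 1, layout.pos.row = 1, name = "A"),
  viewport(layout.pos.col = 1, layout.pos.row = 2, name = "B"),
  viewport(layout.pos.col = 2, layout.pos.row = 1, name = "C"),
  viewport(layout.pos.col = 2, layout.pos.row = 2, name = "D")
)
pushViewport(vpTree(multitop.vp, cells))

# The outer names are unique, so seekViewport() is unambiguous here,
# even in alphabetical order.
drawn <- list()
for (cell in c("A", "B", "C", "D")) {
  seekViewport(cell)
  drawn <- c(drawn, stackedplot())
}
invisible(dev.off())
```

If named inner viewports are needed, downViewport("plot1") (which searches downward from the current viewport) or cell-specific names such as "A.plot1" would be alternatives worth trying.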
Re: [R] Converting list to character
Thanks David, that's what I was looking for. Thanks to Chel too.

Massimiliano

----- Original message -----
From: David L Carlson dcarl...@tamu.edu
To: Chel Hee Lee chl...@mail.usask.ca, Massimiliano Tripoli mtrip...@istat.it, r-help@r-project.org
Sent: Tuesday, 25 November 2014 19:40:51
Subject: RE: [R] Converting list to character

Or just modify your aggregate() command:

> TAB <- aggregate(mydata$CODE, by=list(ID=mydata$ID,
+ YEAR=mydata$YEAR), FUN=paste0, collapse=", ")
> TAB
     ID YEAR              x
1   986 2008         GR.3.8
2  1251 2008 GR.3.1, GR.3.8
3  1801 2008         GR.3.8
4    11 2009         GR.3.7
5   986 2009         GR.3.8
6  1251 2009 GR.3.1, GR.3.8
7  1801 2009         GR.3.8
8    11 2010         GR.3.7
9   460 2010         GR.3.1
10  986 2010         GR.3.8
11 1251 2010 GR.3.1, GR.3.8
12 1801 2010         GR.3.8
13  460 2011         GR.3.1
14  986 2011         GR.3.8
15 1251 2011 GR.3.1, GR.3.8
16 1801 2011         GR.3.8

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Lee, Chel Hee
Sent: Tuesday, November 25, 2014 11:23 AM
To: Massimiliano Tripoli; r-help@r-project.org
Subject: Re: [R] Converting list to character

> do.call(rbind, TAB$x)
   [,1]     [,2]
1  "GR.3.8" "GR.3.8"
2  "GR.3.1" "GR.3.8"
4  "GR.3.8" "GR.3.8"
5  "GR.3.7" "GR.3.7"
6  "GR.3.8" "GR.3.8"
7  "GR.3.1" "GR.3.8"
9  "GR.3.8" "GR.3.8"
10 "GR.3.7" "GR.3.7"
11 "GR.3.1" "GR.3.1"
12 "GR.3.8" "GR.3.8"
13 "GR.3.1" "GR.3.8"
15 "GR.3.8" "GR.3.8"
16 "GR.3.1" "GR.3.1"
17 "GR.3.8" "GR.3.8"
18 "GR.3.1" "GR.3.8"
20 "GR.3.8" "GR.3.8"

Is this what you are looking for? I hope this helps.

Chel Hee Lee

On 11/25/2014 6:07 AM, Massimiliano Tripoli wrote:

Dear all,

I can't convert the result of the aggregate function into a data frame with a character column. My data look like this:

mydata <- structure(list(
  ID = c(11, 11, 460, 460, 986, 986, 986, 986, 1251, 1251, 1251, 1251, 1251, 1251, 1251, 1251, 1801, 1801, 1801, 1801),
  YEAR = c(2009, 2010, 2010, 2011, 2008, 2009, 2010, 2011, 2008, 2008, 2009, 2009, 2010, 2010, 2011, 2011, 2008, 2009, 2010, 2011),
  Y = c(158126, 153015, 3701, 5880, 718663, 661112, 527233, 558281, 450, 131714, 427, 124648, 425, 116500, 434, 123853, 17400, 16493, 8057, 8329),
  CODE = c("GR.3.7", "GR.3.7", "GR.3.1", "GR.3.1", "GR.3.8", "GR.3.8", "GR.3.8", "GR.3.8", "GR.3.1", "GR.3.8", "GR.3.1", "GR.3.8", "GR.3.1", "GR.3.8", "GR.3.1", "GR.3.8", "GR.3.8", "GR.3.8", "GR.3.8", "GR.3.8")),
  .Names = c("ID", "YEAR", "Y", "CODE"), row.names = c(NA, 20L), class = "data.frame")

and by using the aggregate function

TAB <- aggregate(mydata$CODE,by=list(ID=mydata$ID,YEAR=mydata$YEAR),FUN=paste0)

What I want is a data frame like the printed TAB:

> TAB
     ID YEAR              x
1   986 2008         GR.3.8
2  1251 2008 GR.3.1, GR.3.8
3  1801 2008         GR.3.8
4    11 2009         GR.3.7
5   986 2009         GR.3.8
6  1251 2009 GR.3.1, GR.3.8
7  1801 2009         GR.3.8
8    11 2010         GR.3.7
9   460 2010         GR.3.1
10  986 2010         GR.3.8
11 1251 2010 GR.3.1, GR.3.8
12 1801 2010         GR.3.8
13  460 2011         GR.3.1
14  986 2011         GR.3.8
15 1251 2011 GR.3.1, GR.3.8
16 1801 2011         GR.3.8

> str(TAB)[1:10]
'data.frame': 16 obs. of 3 variables:
 $ ID  : num 986 1251 1801 11 986 ...
 $ YEAR: num 2008 2008 2008 2009 2009 ...
 $ x   :List of 16
  ..$ 1 : chr "GR.3.8"
  ..$ 2 : chr "GR.3.1" "GR.3.8"
  ..$ 4 : chr "GR.3.8"
  ..$ 5 : chr "GR.3.7"
  ..$ 6 : chr "GR.3.8"
  ..$ 7 : chr "GR.3.1" "GR.3.8"
  ..$ 9 : chr "GR.3.8"
  ..$ 10: chr "GR.3.7"
  ..$ 11: chr "GR.3.1"
  ..$ 12: chr "GR.3.8"
  ..$ 13: chr "GR.3.1" "GR.3.8"
  ..$ 15: chr "GR.3.8"
  ..$ 16: chr "GR.3.1"
  ..$ 17: chr "GR.3.8"
  ..$ 18: chr "GR.3.1" "GR.3.8"
  ..$ 20: chr "GR.3.8"
NULL

As you can see, the x column is a list and I would like to change it to a character variable. Can anyone help me?

Thanks,
Massimiliano

--
Massimiliano Tripoli
Collaboratore T.E.R. scado il 31/12/2014
ISTAT - DCCN - Direzione Centrale della Contabilità Nazionale
U.O. Contabilità dei flussi di materia del sistema economico - CSA/C
Via Depretis, 74/B 00184 Roma
Tel. 06.4673.3132
E-mail: mtrip...@istat.it
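If the list column already exists and re-running aggregate() is inconvenient, the column can also be collapsed after the fact. The following is a small sketch on a cut-down, hand-built version of TAB (three rows only, constructed here for illustration), not the thread's full data.

```r
# A small data frame with a list column, shaped like the TAB from the question.
TAB <- data.frame(ID = c(986, 1251, 11), YEAR = c(2008, 2008, 2009))
TAB$x <- list("GR.3.8", c("GR.3.1", "GR.3.8"), "GR.3.7")

# Collapse each list element into a single string; vapply() guarantees that
# the result is a character vector with exactly one value per row.
TAB$x <- vapply(TAB$x, paste, character(1), collapse = ", ")

str(TAB$x)
```

This gives the same kind of column as passing collapse = ", " through aggregate(), as in the reply above.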
[R] rJava Package
Hi All,

I am a beginner to R. I have installed and tried a sample of JRI using Rengine and Rserve. I found normalization and sqrt functions in some sample code. Is there a link with a list of the functions provided in R that I can use to process data in Java programs?

Regards
KB

L&T Technology Services Ltd
www.LntTechservices.com
Re: [R] Checking the proportional odds assumption holds in an ordinal logistic regression using polr function
Dear Rune,

Thank you for your prompt reply and it looks like the ordinal package could be the answer I was looking for!

If you don't mind, I'd also like to know please what to do if the tests show the proportional odds assumption is NOT met. (Unfortunately I notice effects from almost all variables that breach the proportional odds assumption in my dataset)

Would you recommend a multinomial logistic model? Or re-scaling of the data?

Thank you for your time,
Best wishes,
Charlie

On 26 Nov 2014, at 14:08, Rune Haubo rune.ha...@gmail.com wrote:

> Dear Charlie,
>
> I admit that I haven't read your email closely, but here is a way to test for non-proportional odds using the ordinal package (warning: self-promotion) using the wine data set also from the ordinal package. There is more information in the package vignettes.
>
> Hope this is something you can use.
>
> Cheers,
> Rune
>
> library(ordinal)
> ## Fit model:
> fm <- clm(rating ~ temp + contact, data=wine)
> summary(fm)
> formula: rating ~ temp + contact
> data:    wine
>
>  link  threshold nobs logLik AIC    niter max.grad cond.H
>  logit flexible  72   -86.49 184.98 6(0)  4.64e-15 2.7e+01
>
> Coefficients:
>            Estimate Std. Error z value Pr(>|z|)
> tempwarm     2.5031     0.5287   4.735 2.19e-06 ***
> contactyes   1.5278     0.4766   3.205  0.00135 **
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Threshold coefficients:
>     Estimate Std. Error z value
> 1|2  -1.3444     0.5171  -2.600
> 2|3   1.2508     0.4379   2.857
> 3|4   3.4669     0.5978   5.800
> 4|5   5.0064     0.7309   6.850
>
> ## Model with non-proportional odds for contact:
> fm2 <- clm(rating ~ temp, nominal=~contact, data=wine)
> ## Likelihood ratio test of non-proportional odds:
> anova(fm, fm2)
> Likelihood ratio tests of cumulative link models:
>
>     formula:                nominal: link: threshold:
> fm  rating ~ temp + contact ~1       logit flexible
> fm2 rating ~ temp           ~contact logit flexible
>
>     no.par    AIC  logLik LR.stat df Pr(>Chisq)
> fm       6 184.98 -86.492
> fm2      9 190.42 -86.209  0.5667  3      0.904
>
> ## Automatic tests of non-proportional odds for all variables:
> nominal_test(fm)
> Tests of nominal effects
>
> formula: rating ~ temp + contact
>         Df  logLik    AIC    LRT Pr(>Chi)
> <none>     -86.492 184.98
> temp     3 -84.904 187.81 3.1750   0.3654
> contact  3 -86.209 190.42 0.5667   0.9040
>
> On 25 November 2014 at 17:21, Charlotte Whitham charlotte.whit...@gmail.com wrote:
>
> Dear list,
>
> I have used the ‘polr’ function in the MASS package to run an ordinal logistic regression for an ordinal categorical response variable with 15 continuous explanatory variables.
>
> I have used the code (shown below) to check that my model meets the proportional odds assumption following advice provided at (http://www.ats.ucla.edu/stat/r/dae/ologit.htm), which has been extremely helpful, thank you to the authors! However, I’m a little worried about the output implying that not only are the coefficients across various cutpoints similar, but they are exactly the same (see graphic below).
>
> Here is the code I used (and see attached for the output graphic)
>
> FGV1b <- data.frame(FG1_val_cat=factor(FGV1b[,"FG1_val_cat"]),
>   scale(FGV1[,c("X","Y","Slope","Ele","Aspect","Prox_to_for_FG","Prox_to_for_mL","Prox_to_nat_border","Prox_to_village","Prox_to_roads","Prox_to_rivers","Prox_to_waterFG","Prox_to_watermL","Prox_to_core","Prox_to_NR","PCA1","PCA2","PCA3")]))
>
> b <- polr(FGV1b$FG1_val_cat ~ FGV1b$X + FGV1b$Y + FGV1b$Slope + FGV1b$Ele + FGV1b$Aspect + FGV1b$Prox_to_for_FG + FGV1b$Prox_to_for_mL + FGV1b$Prox_to_nat_border + FGV1b$Prox_to_village + FGV1b$Prox_to_roads + FGV1b$Prox_to_rivers + FGV1b$Prox_to_waterFG + FGV1b$Prox_to_watermL + FGV1b$Prox_to_core + FGV1b$Prox_to_NR, data = FGV1b, Hess=TRUE)
>
> #Checking the assumption. So the following code will estimate the values to be graphed. First it shows us
> #the logit transformations of the probabilities of being greater than or equal to each value of the target
> #variable
>
> FGV1b$FG1_val_cat <- as.numeric(FGV1b$FG1_val_cat)
>
> sf <- function(y) {
>   c('VC>=1' = qlogis(mean(FGV1b$FG1_val_cat >= 1)),
>     'VC>=2' = qlogis(mean(FGV1b$FG1_val_cat >= 2)),
>     'VC>=3' = qlogis(mean(FGV1b$FG1_val_cat >= 3)),
>     'VC>=4' = qlogis(mean(FGV1b$FG1_val_cat >= 4)),
>     'VC>=5' = qlogis(mean(FGV1b$FG1_val_cat >= 5)),
>     'VC>=6' = qlogis(mean(FGV1b$FG1_val_cat >= 6)),
>     'VC>=7' = qlogis(mean(FGV1b$FG1_val_cat >= 7)),
>     'VC>=8' = qlogis(mean(FGV1b$FG1_val_cat >= 8)))
> }
>
> (t <- with(FGV1b, summary(as.numeric(FGV1b$FG1_val_cat) ~ FGV1b$X + FGV1b$Y + FGV1b$Slope + FGV1b$Ele + FGV1b$Aspect + FGV1b$Prox_to_for_FG + FGV1b$Prox_to_for_mL + FGV1b$Prox_to_nat_border + FGV1b$Prox_to_village + FGV1b$Prox_to_roads + FGV1b$Prox_to_rivers + FGV1b$Prox_to_waterFG + FGV1b$Prox_to_watermL + FGV1b$Prox_to_core + FGV1b$Prox_to_NR, fun=sf)))
>
> #The table displays the (linear) predicted values we would get if we regressed our
> #dependent variable on our
[R] How can I run a TSP program inside R
I have the following TSP code:

options memory = 6;
options crt;
in 'mydat.tlb' ;
?
? Create 2 new variables
?
age20 = age -20;
lwage = log(wage);
?
?
olsq lwage c f edy tenure age20 pu;

How can I run it inside R? Where can I get more explanation on how to code for TSP?
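TSP is a separate program, so one way to approach the question is to translate the script into R rather than run it. The following is a rough sketch of what the equivalent R code might look like, under several assumptions: the variables from mydat.tlb have already been exported to a form R can read (the data frame `mydat` below is simulated stand-in data, not the poster's), and the TSP variable c in the olsq command is the constant term, which lm() includes by default.

```r
# Simulated stand-in for the contents of 'mydat.tlb' (hypothetical values).
set.seed(1)
n <- 200
mydat <- data.frame(
  wage   = exp(rnorm(n, mean = 2, sd = 0.4)),
  age    = sample(20:60, n, replace = TRUE),
  f      = rbinom(n, 1, 0.5),
  edy    = sample(8:18, n, replace = TRUE),
  tenure = sample(0:30, n, replace = TRUE),
  pu     = rbinom(n, 1, 0.3)
)

# Create 2 new variables, as in the TSP script.
mydat$age20 <- mydat$age - 20
mydat$lwage <- log(mydat$wage)

# olsq lwage c f edy tenure age20 pu;  ->  ordinary least squares with an intercept.
fit <- lm(lwage ~ f + edy + tenure + age20 + pu, data = mydat)
summary(fit)
```

With real data, the simulated block would be replaced by something like read.csv() on a file exported from TSP; the two transformation lines and the lm() call stay the same.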
Re: [R-es] Question about how to analyse a factorial experiment with feature extraction, clustering and classification algorithms as factors
Hola Isidro, mira, te explico mejor: tengo una base de datos con información de 10 conductores en un recorrido de 30 minutos en coche. Para cada conductor, se le midió parámetros biomédicos como la temperatura corporal, su electrocardiograma, etc., durante todo el recorrido; en total 22 parámetros. Mi objetivo principal es poder determinar, dados dichos parámetros, los distintos estados en los que puede estar un conductor a lo largo del recorrido. Sin embargo, mi conjunto de datos no está etiquedo, es decir, no sé a priori la variable de respuesta, el estado del conductor, para cada combinación; tengo que descubrirla. Lo que quería hacer es, primero, transformar los parámetros porque suele ser recomendado para no tener overfitting y reducir la dimensión de los datos. Para ello, quiero probar dos técnicas: ICA y PCA. Tras esto, pensaba probar distintos algoritmos de clustering para ver cómo agrupan los datos. Con cada uno, puedo obtener la bondad con la que asignan un elemento a un cluster con, por ejemplo, el silhouette coefficient, o algún otro índice interno/externo. Con cada algoritmo de clustering que pruebe, etiquetaré mis datos de entrenamiento asignándoles un cluster (que luego más adelante intentaré darle una explicación semántica del estado que representa). Por cada conjunto resultado (ahora, etiquetado) de aplicar una técnica de extracción de características y otro de clustering, quiero probar distintos clasificadores, para ver cómo se comportan con esa agrupación. Por tanto, obtendré varios errores asociados a clasificación porqué haré cross-validation. De esta forma, si pruebo 2 algoritmos de extracción de características, 3 de clustering y 4 de clasificación, tengo un experimento factorial 2x3x4, ¿no? Lo que me gustaría obtener posteriormente es la mejor combinación de técnica de extracción de características, algoritmo de clustering y clasificador, teniendo en cuenta los errores de clasificación y cuán bien los algoritmos de clustering agrupan. 
Hence, my question is how to analyze the results. I had thought of applying a three-way ANOVA with interaction, but I do not know whether that is correct. I am also not sure it would make sense, because I want to take the quality of the clustering into account as well, not only the classification errors. That is, I would need to analyze the pairs (classification error samples, clustering quality) for each combination of feature extraction algorithm, clustering algorithm, and classification algorithm. I hope that clears things up :) Many thanks. Best regards, DANI
On 26/11/14 01:02, Isidro Hidalgo Arellano wrote: Hi Daniel, perhaps you should be more explicit, because from the information you provide all I can say is that I do not see the relationship between the 3 types of algorithms you mention: - a principal component analysis can be a preliminary step for the other two - clustering is a type of unsupervised learning, whereas a classifier is normally used in supervised learning, because the model is built knowing the dependent variable. So I do not see how to set up an ANOVA to analyze 3 procedures that seem to me to be used for completely different things... I imagine I have not been much help, but... why don't you tell us exactly what you want to do, so we can see whether we can help a bit more? Best regards, Isidro Hidalgo
On 25/11/2014, at 22:09, Daniel Carrillo Zapata wrote: Hi all, I am Daniel Carrillo, and I am writing because I have a question about whether I can treat clustering algorithms as a factor in an experiment. Specifically, I have an unlabeled dataset, and I want to try the following algorithms on it: 1) Feature extraction by PCA and by ICA. 2) Once the features are extracted, for each of the two transformed datasets I would like to try 3 different clustering algorithms: k-medoids, EM, and hierarchical clustering.
3) Finally, for each labeled dataset I would like to try 4 or 5 classifiers. As you can see, I am designing a factorial experiment to find the best classifier by trying different feature extraction, clustering, and classification techniques. My ultimate goal is to train the best classifier, based on the best clustering, classification, and feature extraction algorithms, so that it can label future data. However, I have doubts about how to analyze the results: I do not know whether a three-way ANOVA with interaction can be applied, with the 3 factors being the feature extraction algorithm, the clustering algorithm, and the classification algorithm. My questions are therefore: 1) Does it make sense to apply a three-way ANOVA with interaction? 2) If
Re: [R-es] Question about how to analyze a factorial experiment with feature extraction, clustering, and classification algorithms as factors
I THINK THIS KIND OF QUESTION EXCEEDS THE PURPOSE OF THIS FORUM.
[R-es] Question about how to analyze a factorial experiment with feature extraction, clustering, and classification algorithms as factors
Hi all :) I am Daniel Carrillo, and I am writing because I have a question about whether I can treat clustering algorithms as a factor in an experiment. Specifically, I have an unlabeled dataset, and I want to try the following algorithms on it: 1) Feature extraction by PCA and by ICA. 2) Once the features are extracted, for each of the two transformed datasets I would like to try 3 different clustering algorithms: k-medoids, EM, and hierarchical clustering. 3) Finally, for each labeled dataset I would like to try 4 or 5 classifiers. As you can see, I am designing a factorial experiment to find the best classifier by trying different feature extraction, clustering, and classification techniques. However, I have doubts about how to analyze the results: I do not know whether a three-way ANOVA with interaction can be applied, with the 3 factors being the feature extraction algorithm, the clustering algorithm, and the classification algorithm. My questions are therefore: 1) Can I apply a three-way ANOVA with interaction? 2) If not, what would be the best way to analyze the results of the experiment? My doubts arise from the fact that I think the classification algorithms depend entirely on the clustering algorithms (which label the data for them). I am counting on your experience to shed some light on this :) Thank you very much! Best regards, DANI ___ R-help-es mailing list R-help-es@r-project.org https://stat.ethz.ch/mailman/listinfo/r-help-es
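For reference, a three-way ANOVA with interaction on such a design could be set up in R roughly as follows. This is only a sketch under stated assumptions: the error rates are simulated with `runif()`, the classifier names `clf1` to `clf4` and the 10 folds per combination are hypothetical, and nothing here settles whether the model is appropriate for the experiment in question.

```r
# Sketch only: simulated cross-validation errors, not real results.
set.seed(42)
folds <- 10  # assumed number of CV folds per combination

# Full 2 x 3 x 4 factorial grid, one row per fold
res <- expand.grid(
  extraction = c("PCA", "ICA"),
  clustering = c("k-medoids", "EM", "hierarchical"),
  classifier = paste0("clf", 1:4),  # hypothetical classifier names
  fold       = seq_len(folds)
)

# Hypothetical response: cross-validated error rate for each row
res$error <- runif(nrow(res), min = 0.05, max = 0.30)

# Three-way ANOVA with all two-way and three-way interactions
fit <- aov(error ~ extraction * clustering * classifier, data = res)
print(summary(fit))
```

Two caveats follow from the concern raised in the message itself: each clustering algorithm produces its own labels, so the classification errors at different levels of `clustering` are measured against different targets, and fold-level errors from the same cross-validation are not independent replicates. Both points would need to be addressed before reading the F tests at face value.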
Re: [R-es] Question about how to analyze a factorial experiment with feature extraction, clustering, and classification algorithms as factors
Hi Daniel, don't be discouraged; there are surely forums where you can raise questions that are more about statistics than about R itself. Regards and good luck with everything, Eric.
On 26/11/14 11:16, DANIEL CARRILLO ZAPATA wrote: Hello again everyone, I would like to apologize for the emails I sent. I sent them because I thought this was also a forum where I could raise statistical questions, not only questions about R specifically. It is always important to learn something from everything you do, so what I take away is the knowledge that here I can only raise questions about implementation in R, and that is what I will do from now on, since I work with it every day. Again, my sincerest apologies if I have bothered you. My intention was never to ask you to do my project for me, far from it. I will keep studying more and more every day to learn as much as I can, so that it does not look that way ;) Thank you all! Best regards, DANI
On 26 November 2014 12:53:32 CET, Jorge I Velez jorgeivanve...@gmail.com wrote: I agree with Prof. Di Rienzo. By the way, this question reminds me of
R> require(fortunes)
R> fortune('brain')
I wish to perform brain surgery this afternoon at 4pm and don't know where to start. My background is the history of great statistician sports legends but I am willing to learn. I know there are courses and numerous books on brain surgery but I don't have the time for those. Please direct me to the appropriate HowTos, and be on standby for solving any problem I may encounter while in the operating room. Some of you might ask for specifics of the case, but that would require my following the posting guide and spending even more time than I am already taking to write this note. -- I. Ben Fooled (aka Frank Harrell) R-help (April 1, 2005)
Regards, Jorge.-
2014-11-26 22:34 GMT+11:00 Julio Alejandro Di Rienzo dirienzo.ju...@gmail.com: I THINK THIS KIND OF QUESTION EXCEEDS THE PURPOSE OF THIS FORUM.
[R-es] forum http://stats.stackexchange.com
Dear all, splitting this off from an earlier question to this mailing list (about statistics without R), and in response to Rubén Casal's question: I used to use http://stats.stackexchange.com. Some things there were useful to me and gave me good ideas; others contained errors, or were written in a different way, so that they did not give the same result on my computer. The difference, apart from the language, is speed: if someone has already written something, it is easy to copy and paste, compared with waiting for an answer by email. I started writing down the R tricks that helped me, but it takes some time, complexity, organizing ideas and code, examples and ... , statistics for non-statisticians, programming for non-programmers, statisticians for other sciences; that is R, it brings together everyone from students to professors of the very highest level. Javier Marcuzzi