[R] Point patterns and igraph
Hi, I have a data set consisting of the x and y coordinate locations of 1600 points. I would like to generate a graph using the functions in igraph. However the graph making functions in igraph requires the data to be in the form of an adjacency matrix. I'd like some advice on how to convert my point pattern to an adjacency matrix or functions out there that would do it directly. I've only gotten as far as obtaining the distance matrix using dist(). Thanks for the help. Best, Juanita

-- 
Juanita Choo Graduate Student Section of Integrative Biology University of Texas at Austin 1 University Station Stop TX 78712 +1 512 471 5773

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Truncating (not rounding) scientific value in R
How can I truncate a value in scientific notation, keeping two decimal digits? For example, from 6.95428812397439e-35 to 6.95e-35.

-E.W.
Re: [R] Equivalence of Mann-Whitney test and Kruskal-Wallis test with k=2
See the respective help files. The continuity correction only affects the normal approximation in wilcox.test. With these small sample sizes, the default evaluation is exact, so the correction doesn't change anything. In contrast, kruskal.test cannot compute exact p-values and always uses the chi-square approximation. So the discrepancy is between the exact test and the approximation (I guess you'd be better off with the former). If you get the urge to reproduce the p-value from kruskal.test using wilcox.test, and maybe to better understand what's happening, try

a <- wilcox.test(x1, x2, paired=FALSE, exact=FALSE, correct=FALSE)

(and yes, now with exact=FALSE, the continuity correction makes a difference).

HTH, Michael

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David Scott
Sent: Tuesday, 8 September 2009 07:02
To: Thomas Farrar
Cc: r-help@r-project.org
Subject: Re: [R] Equivalence of Mann-Whitney test and Kruskal-Wallis test with k=2

Thomas Farrar wrote:
> Hi all, The Kruskal-Wallis test is a generalization of the two-sample Mann-Whitney test to *k* samples. That being the case, the Kruskal-Wallis test with *k*=2 should give an identical p-value to the Mann-Whitney test, should it not?
>
> x1 <- c(1:5)
> x2 <- c(6,8,9,11)
> a <- wilcox.test(x1,x2,paired=FALSE)
> b <- kruskal.test(list(x1,x2),paired=FALSE)
> a$p.value
> [1] 0.01587302
> b$p.value
> [1] 0.01430588
>
> The p-values are slightly different (note that there are no ties in the data, so computed p-values should be exact). Can anyone explain the discrepancy? It's been a while since I studied nonparametric stats and this one has me scratching my head. Many thanks! Tom

The continuity correction? It is true by default for wilcox.test and is not apparent in the help for kruskal.test.

David Scott

-- 
David Scott Department of Statistics The University of Auckland, PB 92019 Auckland 1142, NEW ZEALAND Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055 Email: d.sc...@auckland.ac.nz, Fax: +64 9 373 7018 Director of Consulting, Department of Statistics
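The explanation in this thread can be checked directly. The following is a small sketch using the data from the original question; the variable names p_exact, p_normal and p_kw are only illustrative. It compares the exact Wilcoxon p-value, the normal approximation without continuity correction, and the Kruskal-Wallis chi-square approximation.

```r
# Data from the original question (no ties).
x1 <- 1:5
x2 <- c(6, 8, 9, 11)

# Default for small samples without ties: exact Wilcoxon distribution.
p_exact <- wilcox.test(x1, x2)$p.value

# Normal approximation, continuity correction switched off.
p_normal <- wilcox.test(x1, x2, exact = FALSE, correct = FALSE)$p.value

# Kruskal-Wallis always uses the chi-square approximation.
p_kw <- kruskal.test(list(x1, x2))$p.value

print(c(exact = p_exact, normal = p_normal, kruskal = p_kw))

# With two groups, the uncorrected normal approximation and the
# chi-square approximation should agree up to rounding error.
isTRUE(all.equal(p_normal, p_kw))
```

For two groups without ties the Kruskal-Wallis statistic is the square of the uncorrected Wilcoxon z statistic, which is why the last two p-values coincide while the exact one differs.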
Re: [R] Point patterns and igraph
Juanita,

On Tue, Sep 8, 2009 at 8:02 AM, juanita choo <juanitac...@gmail.com> wrote:
> Hi, I have a data set consisting of the x and y coordinate locations of 1600 points. I would like to generate a graph using the functions in igraph. However the graph making functions in igraph requires the data to be in the form of an adjacency matrix.

Well, this is not quite true; igraph can create graphs from quite a number of formats, see graph.adjlist(), graph.formula(), graph.data.frame() or graph(). But you are right that graph.adjacency() is probably the easiest for you.

> I'd like some advice on how to convert my point pattern to an adjacency matrix or functions out there that would do it directly. I've only gotten as far as obtaining the distance matrix using dist().

You are almost there, then. You can filter your distance matrix if you like, and then create a weighted graph from the remaining edges:

co <- cbind(rnorm(10), rnorm(10))
D <- as.matrix(dist(co))
D[ D > 2 ] <- 0
G <- graph.adjacency(D, mode="undirected", weighted=TRUE)

Gabor

> Thanks for the help. Best, Juanita

-- 
Gabor Csardi gabor.csa...@unil.ch UNIL DGM
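For readers who want to see the intermediate matrices, here is a minimal sketch of the same idea built in base R. The simulated coordinates, the cutoff value and the names xy, cutoff, A and W are assumptions for illustration only. The igraph call uses graph_from_adjacency_matrix(), the name used in current igraph releases for graph.adjacency(), and is only run if igraph is installed.

```r
set.seed(1)

# Stand-in for the real point pattern (50 points instead of 1600).
xy <- cbind(x = runif(50), y = runif(50))

# Full pairwise Euclidean distance matrix.
D <- as.matrix(dist(xy))

# Hypothetical cutoff: points closer than this are joined by an edge.
cutoff <- 0.2

# Binary adjacency matrix (1 = edge), no self-loops since D[i, i] == 0.
A <- (D > 0 & D <= cutoff) * 1

# Weighted version: keep the distance as edge weight, 0 means "no edge".
W <- D * A

# Build the graph only if igraph is available.
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_adjacency_matrix(W, mode = "undirected",
                                           weighted = TRUE)
  print(igraph::ecount(g))
}
```

Choosing the cutoff is a modelling decision; a nearest-neighbour rule would be an equally reasonable way to define edges.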
[R] Re : calling combinations of variable names
Maybe this can work:

testfun <- function(x) {
  rval <- ""
  k <- length(x)
  for (i in 1:k) rval <- paste(rval, x[i], sep="-")
  rval
}
v1 <- paste("evalr", 1:4, sep="")
eval <- expand.grid(w=v1, x=v1, y=v1, z=v1)
n <- dim(eval)[1]
results <- rep("", n)
for (i in 1:n) {
  row <- unique(unlist(eval[i,]))
  if (length(row) >= 3) results[i] <- testfun(row)
}

You just have to replace testfun by your own function, in this case ICC.

Sincerely,
Justin BEM
BP 1917 Yaoundé
Tél (237) 76043774

From: Helter Two helter...@care2.com
To: r-help@r-project.org
Sent: Monday, 7 September 2009, 18:17:22
Subject: [R] calling combinations of variable names

R-2.9.1, Windows7

Dear list, I have a question that seems very simple to me, but I just can't figure it out. I have a data frame called ratings which contains the following variables: evalR1, evalR2, evalR3, evalR4, scoreR1, scoreR2, scoreR3, scoreR4, opinionR1, opinionR2, opinionR3, opinionR4 (there are more variables, but this gives an idea of the data structure). What I want is to run several analyses on all 3- or 4-combinations of a given variable. So, for example, I want to compute the following ICCs (function from the psych package):

ICC(cbind(evalR1, evalR2, evalR3))
ICC(cbind(evalR1, evalR2, evalR4))
ICC(cbind(evalR1, evalR3, evalR4))
ICC(cbind(evalR2, evalR3, evalR4))
ICC(cbind(evalR1, evalR2, evalR3, evalR4))

I create a matrix containing the 3-combinations by combn(4,3). Now I need to call the variables into the function. First, I tried paste as follows:

combis <- combn(4,3) # this gives the 3-combinations
attach(ratings)
eval <- paste("evalR",combis[1,1],",","evalR",combis[2,1],",","evalR",combis[3,1],sep="")

(this is of course just for 1 combination, as an example). The output of this is "evalR1,evalR2,evalR3", but when I run ICC(cbind(eval)), an error message is given which is not given when I enter ICC(cbind(evalR1,evalR2, evalR3)) manually. The function appears not to recognize the variable names. It also does not work to type ICC(cbind(unquote(eval))).

Alternatively, I have tried the cat function, but here too ICC does not recognize the input as variable names. What am I doing wrong? How can I automatically construct the set of variable names such that a function recognizes them as variable names? ICC is one example, but there are also other computations to be run and the set of variables is pretty large, so typing the combinations of variable names manually is really unattractive. What am I missing? It seems to me that there probably is a very simple solution in R, but which?

Thank you,
Peter Verbeet
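One way to sidestep the problem described in this thread is to treat the variable names as strings and use them to index the data frame, instead of trying to turn a pasted string back into code. The sketch below is only an illustration: the simulated ratings data and the stand-in function mean_pairwise_cor are assumptions, and in practice the stand-in would be replaced by psych::ICC or whatever analysis is needed.

```r
set.seed(123)

# Simulated stand-in for the 'ratings' data frame from the question.
ratings <- data.frame(evalR1 = rnorm(20), evalR2 = rnorm(20),
                      evalR3 = rnorm(20), evalR4 = rnorm(20))

# Column names of interest, built as character strings.
vars <- paste("evalR", 1:4, sep = "")

# All 3-combinations plus the single 4-combination, as a list of
# character vectors of column names.
combos <- c(combn(vars, 3, simplify = FALSE),
            combn(vars, 4, simplify = FALSE))

# Stand-in analysis (hypothetical); swap in psych::ICC here.
mean_pairwise_cor <- function(m) {
  r <- cor(m)
  mean(r[upper.tri(r)])
}

# Index the data frame by name for each combination.
results <- lapply(combos, function(cols) {
  mean_pairwise_cor(as.matrix(ratings[, cols]))
})
names(results) <- sapply(combos, paste, collapse = ",")

print(names(results))
```

Because the columns are selected with ratings[, cols], no attach() is needed, and the same loop works for the score and opinion variables by changing the prefix.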
Re: [R] Omnibus test for main effects in the face of an interaction containing the main effects.
Hi John,

"When Group is entered as a factor, and the factor has two levels, the ANOVA table gives a p value for each level of the factor."

This does not (normally) happen, so you are doing something strange.

## From your first posting on this subject
fita <- lme(Post~Time+factor(Group)+factor(Group)*Time, random=~1|SS, data=blah$alldata)

To begin with, what is blah$alldata? lme() asks for a data frame if the formula interface is used. This looks like part of a list, or the column of a data frame. Have a look at the output below, from a syntactically correct model that has essentially the same structure as yours. The data set should be loaded with nlme, so you can run it directly to see the result. Sex, with two levels, is not split in the anova table.

## str(Orthodont)
anova( lme(distance ~ age * Sex, data = Orthodont, random = ~ 1) )

Surely this is essentially what you are looking for? If Sex is not already a factor (and it really is better to make it one when you build your data set), then you can use as.factor as you have done, with the same result. (Note: age * Sex expands to age + Sex + age:Sex, which equals the messy and unnecessary age + Sex + age * Sex.)

Regards, Mark.

John Sorkin wrote:

Daniel, When Group is entered as a factor, and the factor has two levels, the ANOVA table gives a p value for each level of the factor. What I am looking for is the omnibus p value for the factor, i.e. the test that the factor (with all its levels) improves the prediction of the outcome. You are correct that normally one could rely on the fact that the model Post ~ Time + as.factor(Group) + as.factor(Group)*Time contains the model Post ~ Time + as.factor(Group) and compare the two models using anova(model1,model2). However, my model has a random effect, so the comparison is not so easy. The REML comparison of nested random effects models is not valid when the fixed effects are not the same in the models, which is the essence of the problem in my case.

In addition to the REML problem, if one wants to perform an omnibus test for Group, one would want to compare nested models, one containing Group and the other not containing Group. This would suggest comparing Post ~ Time + as.factor(Group)*Time to Post ~ Time + Group + as.factor(Group)*Time. The quandary here is whether or not one should allow the first model, as it is poorly specified: one term of the interaction as.factor(Group)*Time, namely as.factor(Group), does not appear as a main effect, a no-no in model building.

John

John Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics Baltimore VA Medical Center GRECC, University of Maryland School of Medicine Claude D. Pepper OAIC, University of Maryland Clinical Nutrition Research Unit, and Baltimore VA Center Stroke of Excellence University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) jsor...@grecc.umaryland.edu

Daniel Malter dan...@umd.edu 09/07/09 9:23 PM

John, your question is confusing. After reading it twice, I still cannot figure out what exactly you want to compare. Your model a is the unrestricted model, and model b is a restricted version of model a (i.e., b is a hierarchically reduced version of a, or put differently, all coefficients of b are in a, with a having additional coefficients). Thus, it is appropriate to compare the models (also called nested models). Comparing c with a and d with a is also appropriate for the same reason. However, note that depending on discipline, it may be highly unconventional to fit an interaction without all direct effects of the interacted variables (the reason being that you may get biased estimates). What you might consider is:

1. Run an intercept-only model
2. Run a model with group and time
3. Run a model with group, time, and the interaction

Then compare 2 to 1, and 3 to 2. This tells you whether including more variables (hierarchically) makes your model better.

HTH, Daniel

On a different note, if lme fits with restricted maximum likelihood, I think I remember that you cannot compare the models. You have to fit them with maximum likelihood. I am pointing this out because lmer fits with restricted maximum likelihood by default, so lme might too.

- cuncta stricte discussurus -

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of John Sorkin
Sent: Monday, September 07, 2009 4:00 PM
To: r-help@r-project.org
Subject: [R] Omnibus test for main effects in the face of an interaction containing the main effects.

R 2.9.1 Windows XP

UPDATE, Even my first suggestion anova(fita,fitb) is probably not appropriate as the fixed effects are different in the two models, so
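The points raised in this thread (one anova row per term, and fitting by maximum likelihood before comparing models with different fixed effects) can be sketched with the Orthodont data that ships with nlme. The model names fit_full and fit_age are only illustrative, and the random effect is written out explicitly as ~ 1 | Subject; this is a sketch under those assumptions, not a recommendation for John's data.

```r
library(nlme)

# Full model: age, Sex and their interaction, random intercept per
# subject, fitted by maximum likelihood so that models with different
# fixed effects can be compared.
fit_full <- lme(distance ~ age * Sex, data = Orthodont,
                random = ~ 1 | Subject, method = "ML")

# Reduced model without any term involving Sex.
fit_age <- lme(distance ~ age, data = Orthodont,
               random = ~ 1 | Subject, method = "ML")

# Sequential F tests: one row per term, Sex is not split by level.
term_table <- anova(fit_full)
print(term_table)

# Likelihood ratio test for all Sex terms (Sex and age:Sex) jointly.
lrt <- anova(fit_age, fit_full)
print(lrt)
```

The likelihood ratio comparison drops Sex together with its interaction, which avoids the poorly specified "interaction without main effect" model discussed above.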
Re: [R] Truncating (not rounding) scientific value in R
x=0.01345778543577
signif(x,3)

- cuncta stricte discussurus -

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Gundala Viswanath
Sent: Tuesday, September 08, 2009 2:13 AM
To: r-h...@stat.math.ethz.ch
Subject: [R] Truncating (not rounding) scientific value in R

> How can I truncate the scientific value keeping two digits decimal. For example from: 6.95428812397439e-35 into 6.95e-35 -E.W.
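Note that signif() rounds to the requested number of significant digits, so it only matches the requested result when the dropped digits happen to round down. If strict truncation is wanted, one possible sketch is to split the number into mantissa and exponent and truncate the mantissa. The helper name trunc_sci is hypothetical, and values that sit exactly on a truncation boundary may be affected by floating-point representation.

```r
# Hypothetical helper: truncate (not round) x to 'digits' decimals of
# its mantissa in scientific notation.
trunc_sci <- function(x, digits = 2) {
  # Decimal exponent; zero is treated separately to avoid log10(0).
  e <- ifelse(x == 0, 0, floor(log10(abs(x))))
  # Mantissa with one digit before the decimal point.
  m <- x / 10^e
  # Cut off the mantissa after 'digits' decimals and rescale.
  trunc(m * 10^digits) / 10^digits * 10^e
}

x <- 6.959e-35
rounded   <- sprintf("%.2e", signif(x, 3))     # rounds the third digit up
truncated <- sprintf("%.2e", trunc_sci(x, 2))  # simply drops later digits
print(c(rounded = rounded, truncated = truncated))
```

For the value in the original question, 6.95428812397439e-35, both approaches give 6.95e-35; they only differ when the first dropped digit is 5 or larger.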
Re: [R] How can I change characteristics of a cca biplot in R
On Mon, 2009-09-07 at 12:49 +0300, Rousi Heta wrote:
> Hi, I'm doing cca for a community data set in R and I have made a biplot for my data. Otherwise everything seems to be all right but the biplot is so messy I can't read it well enough or publish it. I would like to get the row numbers out of the plot: I want the species position and the environmental variables in it as vectors but not the station numbers. How do I get them out?

Is this in vegan? If so, if obj contains your fitted cca model, then

plot(obj, display = c("species","bp"))

(or, if you have centroids for factor variables as well)

plot(obj, display = c("species","bp","cn"))

If you only want to plot the points rather than the labels, add argument 'type' to the call, setting it to "p", e.g.:

plot(obj, display = c("species","bp"), type = "p")

> I have a raw species data set and a raw environmental data set and I don't have station names or numbers in the data set but R puts the row names in anyway. I understand they are necessary but I just don't want to show them in the biplot.

All of this (and more) is explained in ?plot.cca (if you are talking about vegan::cca). You might also like to look at ?orditkplot (which allows you to move the labels around yourself to get a clear plot) and ?orditorp, with which you can set a priority for certain sites to be labelled with text whilst points are used for those sites that would cause text labels to overlap.

HTH

G

> Thank you! Yours sincerely Heta Rousi (Finnish Environment Institute, Marine Centre)

-- 
Dr. Gavin Simpson [t] +44 (0)20 7679 0522
ECRC, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
[R] [R-pkgs] R meets apache ant
Hello,

This is to announce the first release of the ant R package, which was pushed to CRAN yesterday and should reach your mirror and your platform soon. The package provides an R-aware version of the famous build tool from the Apache project. http://ant.apache.org/

The package ships an R script that can be used to invoke ant with enough plumbing so that it can use R-specific tasks within targets.

$ Rscript -e "ant::ant()"

The package contains two R-specific tasks:
- r-run, which can be used to run arbitrary R code.
- r-set, which can be used to set a property based on the result of an R expression.

A demonstrative build file is included within the package to further illustrate the two tasks. It is also available at my blog (http://tr.im/xMdt).

R> system.file( "examples", "build.xml", package = "ant" )

The package is source-controlled at r-forge as part of the orchestra project. http://r-forge.r-project.org/projects/orchestra/

Many thanks to Duncan Murdoch, who suggested adding the ant function as a shortcut to invoke the script, and Simon Urbanek, who revealed to me the existence of the configure script.

This is a very first release, so feedback and ideas for additional features or tasks are very welcome.

Romain

-- 
Romain Francois Professional R Enthusiast +33(0) 6 28 91 30 30 http://romainfrancois.blog.free.fr
|- http://tr.im/xMdt : update on the ant package
|- http://tr.im/xHLs : R capable version of ant
`- http://tr.im/xHiZ : Tip: get java home from R with rJava

___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages
Re: [R] 3d scatter with trend line and errors
scatterplot3d() in package scatterplot3d can do this, and probably also plot3d() in rgl and others.

Uwe Ligges

oleg lugovoy wrote:
> Hello, Can anyone suggest a way to draw a plot similar to the example from Matlab ( http://www.mathworks.com/products/demos/fullsize.html?src=/products/demos/shipping/stats/orthoregdemo_03.png 3d plot with trend and errors in Matlab )? Thanks, oleg
Re: [R] using an array of strings with strsplit, issue when including a space in split criteria
After further investigation it appears that the problem is specific to my Vista PC. I am able to get the correct results using R 2.9.2 on a Windows XP 64-bit machine. However, I do not know why this does not work on my Vista PC.

The following was done after rebooting Vista. From CMD.exe I ran the following line:

C:\Program Files\R\R-2.9.2\bin>Rgui --vanilla

This opened up R.

### R 2.9.2 START ###
> txt <- c("sales to 23 August 2008 published 29 August",
+ "sales to 6 September 2008 published 11 September")
> strsplit(txt, 'published', fixed=TRUE)
[[1]]
[1] "sales to 23 August 2008 " " 29 August"

[[2]]
[1] "sales to 6 September 2008 " " 11 September"

> strsplit(txt, 'published ', fixed=TRUE)
[[1]]
[1] "sales to 23 August 2008 " "29 August"

[[2]]
[1] "sales to 6 September 2008 published 11 September"

> sessionInfo()
R version 2.9.2 (2009-08-24)
i386-pc-mingw32

locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base
### R 2.9.2 END ###

The exact same thing happened when I used R 2.9.0 and R 2.8.1 on this same Vista computer.

### R 2.9.0 ###
> sessionInfo()
R version 2.9.0 (2009-04-17)
i386-pc-mingw32

locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics grDevices datasets utils methods base

other attached packages:
[1] rcom_2.1-3 rscproxy_1.3-1

loaded via a namespace (and not attached):
[1] tools_2.9.0

### R 2.8.1 ###
> sessionInfo()
R version 2.8.1 (2008-12-22)
i386-pc-mingw32

locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

My computer details are:
Windows Vista Ultimate Service Pack 1
Manufacturer: Dell
Rating: 3.4
Processor: Intel Core 2 Duo CPU E6750 @ 2.66 GHz
Memory (RAM): 4.00 GB
System type: 32-bit Operating System

2009/9/8 Gabor Grothendieck <ggrothendi...@gmail.com>:
> I am using the exact same version of R as you, also on Vista, but can't reproduce your result. For me it splits properly. Try starting R like this (modify path if needed) from the Windows cmd line:
>
> \Program Files\R\R-2.9.2\bin\Rgui --vanilla
>
> and then try it.
>
> On Mon, Sep 7, 2009 at 11:40 AM, Tony Breyal <tony.bre...@googlemail.com> wrote:
>> Dear all, I'm having a problem understanding why a split does not occur in the 2nd use of the function strsplit below:
>>
>> # text strings
>> txt <- c("sales to 23 August 2008 published 29 August",
>> + "sales to 6 September 2008 published 11 September")
>>
>> # first use
>> strsplit(txt, 'published', fixed=TRUE)
>> [[1]]
>> [1] "sales to 23 August 2008 " " 29 August"
>> [[2]]
>> [1] "sales to 6 September 2008 " " 11 September"
>>
>> # second use, but with a space ' ' in the split
>> strsplit(txt, 'published ', fixed=TRUE)
>> [[1]]
>> [1] "sales to 23 August 2008 " "29 August"
>> [[2]]
>> [1] "sales to 6 September 2008 published 11 September"
>>
>> Thank you kindly for any help in advance. Tony
>>
>> O/S: Win Vista Ultimate
>> sessionInfo()
>> R version 2.9.2 (2009-08-24)
>> i386-pc-mingw32
>> locale:
>> LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>> other attached packages:
>> [1] RODBC_1.3-0

-- 
Tony Breyal
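For comparison, the behaviour Tony expected can be written down as a small self-check. This is only a sketch of the intended result on a correctly working installation; it does not explain the Vista-specific failure reported above. The names parts and cleaned are illustrative.

```r
txt <- c("sales to 23 August 2008 published 29 August",
         "sales to 6 September 2008 published 11 September")

# Split on the literal string "published " (with trailing space).
parts <- strsplit(txt, "published ", fixed = TRUE)

# Every input string should be split into exactly two pieces.
print(lengths(parts))

# Strip the whitespace left over at the ends of each piece.
cleaned <- lapply(parts, trimws)
print(cleaned)
```

Splitting on "published" without the trailing space and then calling trimws() on the pieces gives the same cleaned result and is a possible workaround where the second form misbehaves.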
Re: [R] Equivalence of Mann-Whitney test and Kruskal-Wallis test with k=2
OK, I get it now. Thank you very much!

Cheers,
Tom

On Tue, Sep 8, 2009 at 8:20 AM, Meyners, Michael, LAUSANNE, AppliedMathematics <michael.meyn...@rdls.nestle.com> wrote:
> See the respective help files. The continuity correction only affects the normal approximation in wilcox.test. With this small samples sizes, the default evaluation is exact, so it doesn't change anything. In contrast, kruskal.test is incapable to compute exact values but always uses the chi-square approximation. So the discrepancy is between exact test and approximation (guess you'd be better off with the former). If you get the urge to reproduce the p value from kruskal.test using wilcox.test, and maybe to better understand what's happening, try
>
> a <- wilcox.test(x1, x2, paired=FALSE, exact=FALSE, correct=FALSE)
>
> (and yes, now with exact=FALSE, the continuity correction makes a difference).
>
> HTH, Michael
[R] Derivative of nonparametric curve
Dear All,

I'm looking for a way of computing the first- and second-order derivatives of a smooth curve produced by a nonparametric regression. For instance, if we run the R script below, a smooth nonparametric regression curve is produced.

provide.data(trawl)
Zone92 <- (Year == 0 & Zone == 1)
Position <- cbind(Longitude - 143, Latitude)
dimnames(Position)[[2]][1] <- "Longitude - 143"
sm.regression(Longitude, Score1, method = "aicc", col = "red", model = "linear")

Could someone please give some hints on how to find the derivative of the curve at some points?

Thank you.
Kagba
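One possible route, sketched below, swaps sm.regression for a different smoother: smooth.spline() from the stats package, whose predict() method accepts a deriv argument. This is not the method used in the script above, only an alternative under that assumption; the simulated data and the names fit, x0, d1 and d2 are illustrative.

```r
set.seed(42)

# Simulated data: noisy sine curve, standing in for the trawl data.
x <- seq(0, 2 * pi, length.out = 200)
y <- sin(x) + rnorm(200, sd = 0.1)

# Cubic smoothing spline; smoothing parameter chosen by GCV.
fit <- smooth.spline(x, y)

# Points at which the derivatives are wanted.
x0 <- c(1, 2, 3)

# First and second derivatives of the fitted curve at x0.
d1 <- predict(fit, x0, deriv = 1)$y
d2 <- predict(fit, x0, deriv = 2)$y

print(rbind(x0 = x0, first = d1, second = d2))
```

Derivative estimates are noisier than the curve itself, the second derivative especially, so they tend to be sensitive to the amount of smoothing.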
[R] Data in Array
Dear All,

I have some data stored in a few matrices of different orders. Say we have three different matrices a, b and c, which have the same number of columns but different numbers of rows.

a <- matrix(1, nrow = 5, ncol = 1)
b <- matrix(2, nrow = 10, ncol = 1)
c <- matrix(3, nrow = 15, ncol = 1)

How could I put all these matrices in an array?

Thank you
Kagba
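Since an R array needs the same dimensions in every slice, matrices with different numbers of rows do not fit into one array directly. Two possible approaches are sketched here as a hedged illustration: keep the matrices in a list, or pad the shorter ones with NA. The names mats, n_max and padded are assumptions.

```r
a <- matrix(1, nrow = 5, ncol = 1)
b <- matrix(2, nrow = 10, ncol = 1)
c <- matrix(3, nrow = 15, ncol = 1)

# Option 1: a list keeps each matrix with its own dimensions.
mats <- list(a = a, b = b, c = c)
print(sapply(mats, nrow))

# Option 2: pad every (single-column) matrix with NA up to the longest
# one, giving a regular 15 x 3 matrix with one column per input.
n_max <- max(sapply(mats, nrow))
padded <- sapply(mats, function(m) {
  col <- m[, 1]
  length(col) <- n_max   # extends the vector with NA
  col
})
print(dim(padded))
```

The list is usually the more natural container; padding is mainly useful when a later step needs a rectangular object and can handle missing values.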
Re: [R] Google's R Style Guide (has become S3 vs S4, in part)
Martin Morgan mtmor...@fhcrc.org on Tue, 01 Sep 2009 09:07:05 -0700 writes: spencerg wrote: Bryan Hanson wrote: Looks like the discussion is no longer about R Style, but S3 vs S4? yes nice topic rename! To that end, I asked more or less the same question a few weeks ago, arising from the much the same motivations. The discussion was helpful, here's the link: http://www.nabble.com/Need-Advice%3A-Considering-Converting-a-Package-from-S 3-to-S4-tc24901482.html#a24904049 For what it's worth, I decided, but with some ambivalence, to stay with S3 for now and possibly move to S4 later. In the spirit of S4, I did write a function that is nearly the equivalent of validObject for my S3 object of interest. Overall, it looked like I would have to spend a lot of time moving to S4, while staying with S3 would allow me to get the project done and get results going much faster (see Frank Harrell's comment in the thread above). Bryan's original post started me thinking about this, but I didn't respond. I'd classify myself as an 'S4' 'expert', with my ignorance of S3 obvious from Duncan's corrections to my earlier post. It's hard for me to make a comparative statement about S3 vs. S4, and hard really to know what is 'hard' for someone new to S4, to R, to programming, ... I would have classified most of the responses in that thread as coming from 'S3' 'experts'. As a concrete example (concrete for us non-programmers, non-statisticians), I recently decided that I wanted to add a descriptive piece of text to a number of my plots, and it made sense to include the text with the object. So I just added a list element to the existing S3 object, e.g. Myobject$descrip No further work was necessary, I could use it right away. If instead, if I had made Myobject an S4 object, then I would have to go back, redefine the object, update validObject, and possibly write some new accessor and definitely constructor functions. At least, that's how I understand the way one uses S4 classes. 
This is a variant of Gabor's comment, I guess, that it's easy to modify S3 on an as-needed basis. In S3, forgoing any pretext of 'best practices', one might write

  s3 <- structure(list(x=1:10, y=10:1), class="MyS3Object")
  ## some lines of code...
  if (aTest) s3$descraption <- "A description"

(either 'description' or 'descraption' is a typo, uncaught by S3). In S4 I'd have to change my class definition from

  setClass("MyS4Object", representation(x="numeric", y="numeric"))

to

  setClass("MyS4Object", representation(x="numeric", y="numeric", description="character"))

but the body of the code would look surprisingly similar

  s4 <- new("MyS4Object", x=1:10, y=10:1)
  ## some lines of code...
  if (aTest) s4@description <- "A description"

(no typo, because I'd have been told that the slot 'descraption' didn't exist). In the S3 case the (implicit) class definition is a single line, perhaps nested deep inside a function. In S4 the class definition is in a single location. Best practices might make me want to have a validity method (x and y the same dimensions? 'description' of length 1?), to use a constructor and accessors (to provide an abstraction to separate the interface from its implementation), etc., but those issues are about best practices. A downstream consequence is that s4 always has a 'description' slot (perhaps initialized with an appropriate default in the 'prototype' argument of setClass, but that's more advanced), whereas s3 only sometimes has 'description'. So I'm forced to check is.null(s3$description) whenever I'm expecting a character vector.

> It doesn't stop there: If you keep the same name for your redefined S4 class, I don't know what happens when you try to access stored objects of that class created before the change, but it might not be pretty. If you give your redefined S4 class a different name, then

Actually, the old object is loaded in R. It is not valid (validObject(originalS4) would complain about 'slots in class definition not in object'). One might write an 'updateObject' generic and method that detects and corrects this. This contrasts with S3, where there is no knowing whether the object is consistent with the current (implicit) class definition.

> you have a lot more code to change before you can use the redefined class like you want.

For slot addition, this is not true -- old code works fine. For slot removal / renaming, this is analogous to S3 -- code needs reworking; use of accessors might help isolate code using the class from the implementation of the class. A couple of
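The S4 points made in the message above (a default supplied through 'prototype', a validity method, and the error raised by a misspelled slot) can be sketched as follows. This is only an illustration: the class name and slot names are taken from the discussion, while the particular default and validity checks are assumptions, not anybody's actual package code.

```r
library(methods)

# Class and slot names follow the thread; the prototype default and the
# validity rules are illustrative assumptions.
setClass("MyS4Object",
         representation = representation(x = "numeric", y = "numeric",
                                         description = "character"),
         prototype = prototype(description = NA_character_),
         validity = function(object) {
           msgs <- character()
           if (length(object@x) != length(object@y))
             msgs <- c(msgs, "'x' and 'y' must have the same length")
           if (length(object@description) != 1L)
             msgs <- c(msgs, "'description' must have length 1")
           if (length(msgs)) msgs else TRUE
         })

s4 <- new("MyS4Object", x = 1:10, y = 10:1)
s4@description <- "A description"

# A misspelled slot name is an error rather than a silently added element.
typo <- tryCatch({ s4@descraption <- "oops"; "accepted" },
                 error = function(e) "rejected")

# Inconsistent slots are caught when the object is constructed.
bad <- tryCatch(new("MyS4Object", x = 1:10, y = 1:3),
                error = function(e) "invalid")
```

Because of the prototype, every instance has a 'description' slot of length 1, so downstream code does not need the is.null() check that the S3 version requires.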
[R] [OT] _ inserted in postings
Sorry if this is too OT, but there is a particular relevance to postings to R-help. Of recent times, I have received several postings via R-help (and some other mailing-lists) in which the "_" character is inserted where, presumably, a space, " ", was intended. An example (received this morning) is below, which (from the headers) was originally sent through Yahoo web-mail. This is as seen when I read it.

[ From: FMH kagba2...@yahoo.com Subject: [R] Derivative of nonparametric curve Date: Tue, 8 Sep 2009 02:07:10 -0700 (PDT) ]

Dear All, I'm looking for_a way on computing the derivative of first and second order_of a smoothing curve produced by a nonprametric regression. For instance, if we run the R script below, a smooth nonparametric regression curve is produced.

  provide.data(trawl)
  Zone92__ <- (Year == 0 & Zone == 1)
  Position <- cbind(Longitude - 143, Latitude)
  dimnames(Position)[[2]][1] <- "Longitude - 143"
  sm.regression(Longitude, Score1, method = "aicc", col = "red",_ model = "linear")

Could someone please give some hints_on the way to_find the derivative_on the curve at_some points ? Thank you. Kagba

In the message headers I see:

  Content-Type: text/plain; charset=iso-8859-1
  Content-Transfer-Encoding: quoted-printable

and, on inspection of the message text in the inbox folder (i.e. as it was delivered to me), I see that each and every occurrence of the "_" is represented there as the three successive characters "=A0", i.e. a quoted-printable code. Normally, I can happily ignore this kind of thing when it just occurs in text. But since, as in the above message, it can also be interpolated into R code, this could cause unnecessary inconvenience for people who want to test the code which people post to R-help. Ditto if someone should post code (such as the above) as a solution to someone else's problem: As received, it just would not work!

Comment: I have for a long time been under the impression, now apparently a delusion, that the abominable quoted-printable had found its due final resting-place in the Museum of Dishonorable Obsolescence; apparently not! Also, if people are using web-mailers (Yahoo or other) that wantonly insert this kind of rubbish, they should look into the possibility of either changing the configuration under which they post their mails (if possible), or mailing via a different agent. The quoted-printable aspect may be a red herring, since I also see that the "=" signs in the code are represented as "=3D", as is forced in quoted-printable; but they have been rendered correctly in the text as seen (and copied). So it may just be due to substitution of "_" for " " on the part of the mailer. What do others think? Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 08-Sep-09 Time: 10:59:30 -- XFMail --

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Data in Array
have you tried rbind?

On Tue, Sep 8, 2009 at 11:16 AM, FMH kagba2...@yahoo.com wrote:
> Dear All, I have some data which are stored in a few matrices of different orders. Take three different matrices a, b and c, which have the same number of columns but different numbers of rows.
>   a <- matrix(1, nrow = 5, ncol = 1)
>   b <- matrix(2, nrow = 10, ncol = 1)
>   c <- matrix(3, nrow = 15, ncol = 1)
> How could I put all these matrices in an array? Thank you, Kagba
Re: [R] plot positive predictive values
Hi, Thanks for the hint! Of course, you are correct. Here is a link with some background for others with tired heads: http://luna.cas.usf.edu/~mbrannic/files/regression/Logistic.html

On Sep 4, 1:15 pm, ONKELINX, Thierry thierry.onkel...@inbo.be wrote:
> You could use a glm with the binomial family to model that. A solution with ggplot2:
>   library(ggplot2)
>   ggplot(dataset, aes(x = x, y = y, weights = n)) + geom_smooth(method = "glm", family = "binomial") + geom_point()
> ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 thierry.onkel...@inbo.be www.inbo.be
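For readers who want to see what the quoted suggestion does outside ggplot2, here is a minimal base-R sketch of a binomial glm fitted to proportions with the number of trials as weights. The data frame is made up for illustration (column names x, y, n mirror the quoted call); it is not the original poster's data.

```r
# Hypothetical data: y is an observed proportion, n the number of trials behind it.
dataset <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(0.10, 0.20, 0.35, 0.60, 0.80, 0.90),
  n = rep(20, 6)
)

# Logistic regression on proportions; 'weights' carries the trial counts.
fit <- glm(y ~ x, family = binomial, weights = n, data = dataset)

# Fitted probabilities on a finer grid, for drawing the smooth curve.
new_x <- data.frame(x = seq(1, 6, by = 0.5))
new_x$fitted <- predict(fit, newdata = new_x, type = "response")

plot(y ~ x, data = dataset, ylim = c(0, 1))
lines(fitted ~ x, data = new_x)
```

Using type = "response" keeps the curve on the probability scale, which is what a plot of predictive values needs.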
[R] Changing font to times for EMF graphics
Hi, sorry for the simple question. I am usually able to change the font for PDF graphics with

  pdf.options(family="Times")

However, I have not yet found a way to get an EMF file with a Times font. I also tried a detour, using pstoedit to convert a PDF with the desired font to an EMF, but even there the font does not appear as Times. Is there a way to get a Times font in an EMF file, or is this a restriction of the file format? I am on Win XP and R 2.9.1. Many thanks, Werner
Re: [R] Changing font to times for EMF graphics
Do you have 'times' in a format suitable for EMF? If so you can select it for win.metafile() as for any other windows() device. You would need a TrueType version, and I suspect you do under the name 'Times New Roman'. See ?windowsFonts: my memory is that family="serif" gives you that font. Note that 'Times' is not in the PDF file, but in the viewer, when you use pdf(), so that precise font is not available to you (and is not a suitable format).

On Tue, 8 Sep 2009, Werner Wernersen wrote:
> Hi, sorry for the simple question. I am usually able to change the font for PDF graphics with pdf.options(family="Times"). However, I have found no way yet to get an emf file with a times font. I also tried a detour by using pstoedit to convert a pdf with the desired font to an emf but even there the font does not appear as times. Is there a way to get a times font in an emf file or is it even a restriction of the file format? I am on Win XP and R 2.9.1. Many thanks, Werner

-- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
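A small sketch of the suggestion above, for what it is worth. It only does anything on Windows, since win.metafile() and windowsFonts() exist only there; the output file name is made up, and whether "serif" really maps to Times New Roman on a given machine is exactly what the print() call is meant to check.

```r
emf_file <- "times-example.emf"   # hypothetical output file name

if (.Platform$OS.type == "windows") {
  # Show which installed font the "serif" family is mapped to on this machine.
  print(windowsFonts()$serif)

  # Open an EMF device that uses the serif family, draw, and close it.
  win.metafile(emf_file, width = 6, height = 4, family = "serif")
  plot(1:10, main = "Title drawn in the serif family")
  dev.off()
}
```

If the mapping turns out to be something else, ?windowsFonts describes how to register a different TrueType font under a family name of your choice.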
[R] Very basic question regarding plot.Design...
Hello ALL! I have a problem plotting a factor (let's say gender) as a line, or at least as both line and points, from an ols model:

  ols1 <- ols(Y ~ gender, data=dat, x=TRUE, y=TRUE)
  plot(ols1, gender=NA, xlab="gender", ylab="Y", ylim=c(5,30), conf.int=FALSE)

If I convert gender into a discrete numeric predictor and use forceLines=TRUE, the plot is neither nice nor correct, since it shows values between 1 and 2. Thanks! PM
Re: [R] [OT] _ inserted in postings
On 08-Sep-09 10:40:18, Duncan Murdoch wrote:
> On 08/09/2009 6:09 AM, (Ted Harding) wrote:
>> [...] So it may just be due to substitution of "_" for " " on the part of the mailer. What do others think?
> I don't see the underscores in that posting, but I do see this as the last line in the headers: X-MIME-Autoconverted: from quoted-printable to 8bit by fisher.stats.uwo.ca id n8898BkC031633 (fisher.stats.uwo.ca is the server that receives my email). So it looks as though whatever is doing the conversion on your system isn't doing it as well as it should. Duncan Murdoch

Thanks for this. No such conversion was performed in my case: it was delivered as-is, i.e. in the original quoted-printable (QP). The destination to which it was originally delivered (manchester.ac.uk) apparently does nothing about re-encoding it. My local mailer handles the decoding and rendering of encoded content. The QP code "=A0" in the source, with Content-Type: text/plain; charset=iso-8859-1, corresponds to Non-Breakable Space (NBSP), so presumably my mailer displays this as "_" to distinguish it from the basic ASCII space (SP) (ASCII code 32, QP code "=20"). (Note that the QP elements "=3D" in the original were correctly rendered as "=", so the decoding indeed took place).
This still leaves open the issue that what was presumably a simple ASCII space SP when originally entered, got changed to NBSP somehow in the process of being sent! Thanks for the comment, Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 08-Sep-09 Time: 12:46:42 -- XFMail --
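On the practical side, code copied from such a message can be repaired before running it by turning each non-breaking space (U+00A0, which is byte A0 in iso-8859-1) back into an ordinary space. A minimal sketch, in which the 'received' string is a made-up stand-in for pasted code:

```r
# Hypothetical pasted code in which one space arrived as U+00A0 (NBSP).
received <- "sm.regression(Longitude, Score1, method = \"aicc\",\u00a0model = \"linear\")"

nbsp <- "\u00a0"

# Detect the problem, then replace every NBSP with a plain ASCII space.
has_nbsp <- grepl(nbsp, received, fixed = TRUE)
cleaned  <- gsub(nbsp, " ", received, fixed = TRUE)
```

The same gsub() call can be applied to the result of readLines() on a saved message, so a whole script can be cleaned in one go.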
Re: [R] How can I change characteristics of a cca biplot in R
Gavin Simpson gavin.simpson at ucl.ac.uk writes:
> On Mon, 2009-09-07 at 12:49 +0300, Rousi Heta wrote:
>> Hi, I'm doing cca for a community data set in R and I have made a biplot for my data. Otherwise everything seems to be all right, but the biplot is so messy I can't read it well enough or publish it. I would like to get the row numbers out of the plot: I want the species positions and the environmental variables in it as vectors but not the station numbers. How do I get them out?
> Is this in vegan? If so, if obj contains your fitted cca model, then
>   plot(obj, display = c("species","bp"))
> (or, if you have centroids for factor variables as well)
>   plot(obj, display = c("species","bp","cn"))
> If you only want to plot the points rather than the labels, add argument 'type' to the call, setting it to "p", e.g.:
>   plot(obj, display = c("species","bp"), type = "p")
>> I have a raw species data set and a raw environmental data set and I don't have station names or numbers in the data set but R puts the row names in anyway. I understand they are necessary but I just don't want to show them in the biplot.
> All of this (and more) is explained in ?plot.cca (if you are talking about vegan::cca). You might also like to look at ?orditkplot (which allows you to move the labels around yourself to get a clear plot) and ?orditorp, with which you can set a priority for certain sites to be labelled with text whilst points are used for those sites that would cause text labels to overlap.

Heta, Indeed, if this is the cca of vegan (and not one of the other cca's) you may also consider reading the documentation. You can use vegandocs("intro"), which deals with cluttered plots in chapter 2.1, or vegandocs("FAQ") and read points 2.1.12 and 2.1.13. In addition to the alternatives that Gav listed, these also mention the functions ordipointlabel (whose result can be further processed with orditkplot) and ordilabel. You can also see the plot.cca help, which has some examples.

Cheers, Jari Oksanen
[R] Data separated by spaces, getting data into R using field lengths
I have a text file similar to this (separated by spaces):

  x <- "DF12 This is an example 1 This
  DF12 This is an 1232 This is
  DF14 This is 12334 This is an
  DF15 This 23 This is an example"

and I know the field lengths of each variable (there are 5 variables in this data set), which are:

  varlength <- c(2, 2, 18, 5, 18)

How can I import this kind of data into R, using the varlength variable as a field separator indicator?
Re: [R] Data separated by spaces, getting data into R using field lengths
On Tue, Sep 8, 2009 at 12:53 PM, Lauri Nikkinen lauri.nikki...@iki.fi wrote:
> I have a text file similar to this (separated by spaces) [...] and I know the field lengths of each variable (there are 5 variables in this data set), which are: varlength <- c(2, 2, 18, 5, 18). How can I import this kind of data into R, using the varlength variable as a field separator indicator?

?read.fwf

Read Fixed Width Format Files

Description: Read a table of *f*ixed *w*idth *f*ormatted data into a 'data.frame'.

Usage: read.fwf(file, widths, header = FALSE, sep = "\t", skip = 0, row.names, col.names, n = -1, buffersize = 2000, ...)
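As a point of comparison, here is a sketch of read.fwf() on data that really is fixed width, i.e. where every field is padded out to its declared width. The records are built with sprintf() purely for illustration; they are an assumption about what a well-formed version of the file might look like, not the poster's actual file.

```r
widths <- c(2, 2, 18, 5, 18)

# Hypothetical records: text fields left-justified and padded, numbers right-justified.
records <- sprintf("%-2s%2d%-18s%5d%-18s",
                   "DF",
                   c(12L, 14L),
                   c("This is an example", "This is"),
                   c(1L, 12334L),
                   c("This", "This is an"))

# as.is keeps text as character; strip.white drops the padding around it.
fixed <- read.fwf(textConnection(records), widths = widths,
                  as.is = TRUE, strip.white = TRUE)
```

Every record here has exactly sum(widths) characters, which is the property read.fwf() relies on when it cuts the fields.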
Re: [R] Data separated by spaces, getting data into R using field lengths
On 9/8/2009 7:53 AM, Lauri Nikkinen wrote:
> [...] How can I import this kind of data into R, using the varlength variable as a field separator indicator?

See ?read.fwf. Duncan Murdoch
Re: [R] Data separated by spaces, getting data into R using field lengths
Thanks, I tried it but I got

  varlength <- c(2, 2, 18, 5, 18)
  read.fwf("c:temppi.txt", widths=varlength)
    V1 V2                 V3    V4   V5
  1 DF 12  This is an exampl e 1 T  his
  2 DF 12  This is an 1232 T his i    s
  3 DF 14  This is 12334 Thi s is    an
  4 DF 15  This 23 This is a n exa mple

Which is not the way I want it.

  structure(list(V1 = structure(c(1L, 1L, 1L, 1L), .Label = "DF", class = "factor"), V2 = c(12L, 12L, 14L, 15L), V3 = structure(c(4L, 3L, 2L, 1L), .Label = c(" This 23 This is a", " This is 12334 Thi", " This is an 1232 T", " This is an exampl"), class = "factor"), V4 = structure(c(1L, 2L, 4L, 3L), .Label = c("e 1 T", "his i", "n exa", "s is "), class = "factor"), V5 = structure(c(2L, 4L, 1L, 3L), .Label = c("an ", "his", "mple", "s"), class = "factor")), .Names = c("V1", "V2", "V3", "V4", "V5"), class = "data.frame", row.names = c(NA, -4L))

Any ideas? -L

2009/9/8 Duncan Murdoch murd...@stats.uwo.ca:
> See ?read.fwf. Duncan Murdoch
Re: [R] Data separated by spaces, getting data into R using field lengths
Can you post how you would like it.

On Tue, Sep 8, 2009 at 8:07 AM, Lauri Nikkinen lauri.nikki...@iki.fi wrote:
> Thanks, I tried it but I got [...] Which is not the way I want it. [...] Any ideas? -L

-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
Re: [R] Data in Array
Yes, but what I actually want is an array that can store these different matrices.

- Original Message From: Schalk Heunis schalk.heu...@enerweb.co.za To: FMH kagba2...@yahoo.com Cc: r-help@r-project.org Sent: Tuesday, September 8, 2009 11:16:29 AM Subject: Re: [R] Data in Array

> have you tried rbind?
> On Tue, Sep 8, 2009 at 11:16 AM, FMH kagba2...@yahoo.com wrote:
>> Dear All, I have some data which are stored in a few matrices of different orders. Take three different matrices a, b and c, which have the same number of columns but different numbers of rows.
>>   a <- matrix(1, nrow = 5, ncol = 1)
>>   b <- matrix(2, nrow = 10, ncol = 1)
>>   c <- matrix(3, nrow = 15, ncol = 1)
>> How could I put all these matrices in an array? Thank you, Kagba
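An R array needs every slice to have the same dimensions, so matrices with 5, 10 and 15 rows cannot share one. Two common workarounds, sketched here with the matrices from the question: keep them in a list, or stack them with rbind() and remember which rows came from where. The names 'mats', 'stacked' and 'source_of_row' are just illustrative.

```r
a <- matrix(1, nrow = 5, ncol = 1)
b <- matrix(2, nrow = 10, ncol = 1)
c <- matrix(3, nrow = 15, ncol = 1)

# A list keeps each matrix intact, whatever its dimensions.
mats <- list(a = a, b = b, c = c)
dims <- sapply(mats, dim)          # one column of (rows, cols) per matrix

# rbind() stacks them into a single 30 x 1 matrix ...
stacked <- do.call(rbind, unname(mats))

# ... and an index vector records which matrix each row came from.
source_of_row <- rep(names(mats), times = sapply(mats, nrow))
```

Individual matrices are then retrieved with mats$b or mats[["b"]], and lapply(mats, f) applies a function to each in turn.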
Re: [R] Data separated by spaces, getting data into R using field lengths
On 9/8/2009 8:07 AM, Lauri Nikkinen wrote:
> Thanks, I tried it but I got
>   varlength <- c(2, 2, 18, 5, 18)
>   read.fwf("c:temppi.txt", widths=varlength)
>     V1 V2                 V3    V4   V5
>   1 DF 12  This is an exampl e 1 T  his
>   2 DF 12  This is an 1232 T his i    s
>   3 DF 14  This is 12334 Thi s is    an
>   4 DF 15  This 23 This is a n exa mple
> Which is not the way I want it.

It looks as though that's because you don't have fixed width data. " This is an example" is 19 chars, including the leading space. You told R it was 18. " This is an" is only 12 characters. I would say you have two fixed width fields, and three varying fields, with no delimiters. If the middle one of the three always contains digits and the others don't, you can probably extract them using sub(), but you can't use any of the read.* functions to do this: your format is too strange.

Duncan Murdoch
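The sub() idea can be sketched like this, under the assumption stated in the reply: the fourth field is the only run of digits after the first four characters, and the text fields never contain digits. The regular expression and the column names V1..V5 are illustrative choices, not anything prescribed in the thread.

```r
x <- c("DF12 This is an example 1 This",
       "DF12 This is an 1232 This is",
       "DF14 This is 12334 This is an",
       "DF15 This 23 This is an example")

# Two 2-character fields, then text, a run of digits, and more text.
pattern <- "^(.{2})(.{2}) (.*?) ([0-9]+) (.*)$"

# Pull out each captured group in turn with a back-reference.
parts <- data.frame(
  V1 = sub(pattern, "\\1", x, perl = TRUE),
  V2 = as.integer(sub(pattern, "\\2", x, perl = TRUE)),
  V3 = sub(pattern, "\\3", x, perl = TRUE),
  V4 = as.integer(sub(pattern, "\\4", x, perl = TRUE)),
  V5 = sub(pattern, "\\5", x, perl = TRUE),
  stringsAsFactors = FALSE
)
```

For a file on disk, x would come from readLines() instead of being typed in. Lines that do not match the pattern are returned unchanged by sub(), so it is worth checking grepl(pattern, x, perl = TRUE) first.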
[R] RODBC version 1.3-0 crashes with systemtables using SQL server 2000
Dear all, I need to test for the existence of an index on a table. This cannot be done with sqlPrimaryKeys as it is not a primary key. Therefore I select directly from the SQL Server 2000 system table named sysindexes. This works well with RODBC version 1.2-5 but not with version 1.3-0. Here is the code of the test example:

  sink(file = "proto.txt", append = FALSE, type = "output", split = TRUE)
  library(RODBC)
  sessionInfo()
  str <- "DRIVER=SQL Server;SERVER=NN;APP=Test;DATABASE=Northwind;Trusted_Connection=YES"
  sql.ch <- odbcDriverConnect(connection = str, case = "nochange", believeNRows = TRUE)
  sql <- "Select top 10 * From sysindexes"
  cat(date(), "Start query\n")
  rc <- sqlQuery(sql.ch, sql)
  print(rc)
  odbcCloseAll()
  sink()

This is the output of the correctly working version 1.2-5:

  R version 2.8.1 (2008-12-22) i386-pc-mingw32
  locale: LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252
  attached base packages: [1] stats graphics grDevices utils datasets methods base
  other attached packages: [1] RODBC_1.2-5
  Tue Sep 08 09:54:37 2009 Start query
    id status    first indid     root minlen keycnt groupid dpages
  1  1     18 08000100     1 0B000100     42      1       1      3
  2  1      2 3A010100     2 3A010100      7      3       1      1
  ... output shortened

This is the output for version 1.3-0, which crashes after the call to sqlQuery:

  R version 2.9.2 (2009-08-24) i386-pc-mingw32
  locale: LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252
  attached base packages: [1] stats graphics grDevices utils datasets methods base
  other attached packages: [1] RODBC_1.3-0
  Tue Sep 08 09:52:37 2009 Start query
  CRASH!

By the way: 'Select top 1 * From sysindexes' does not crash! I really don't know why RODBC crashes. Any hints to solve this problem with version 1.3-0 are welcome.

Thanks in advance, Michael Irskens, German Public Employment Services, Operations Research, Nuremberg, Germany
Re: [R] [OT] _ inserted in postings
On 08/09/2009 6:09 AM, (Ted Harding) wrote:
> [...] The quoted-printable aspect may be a red herring, since I also see that the "=" signs in the code are represented as "=3D", as is forced in quoted-printable; but they have been rendered correctly in the text as seen (and copied). So it may just be due to substitution of "_" for " " on the part of the mailer. What do others think?

I don't see the underscores in that posting, but I do see this as the last line in the headers: X-MIME-Autoconverted: from quoted-printable to 8bit by fisher.stats.uwo.ca id n8898BkC031633 (fisher.stats.uwo.ca is the server that receives my email). So it looks as though whatever is doing the conversion on your system isn't doing it as well as it should.

Duncan Murdoch
Re: [R] Data separated by spaces, getting data into R using field lengths
Sure, here you go structure(list(V1 = structure(c(1L, 1L, 1L, 1L), .Label = DF, class = factor), V2 = c(12L, 12L, 14L, 15L), V3 = structure(c(4L, 3L, 2L, 1L), .Label = c(This, This is, This is an, This is an example ), class = factor), V4 = c(1L, 1232L, 12334L, 23L), V5 = structure(1:4, .Label = c(This, This is, This is an, This is an example), class = factor)), .Names = c(V1, V2, V3, V4, V5), class = data.frame, row.names = c(NA, -4L)) 2009/9/8 jim holtman jholt...@gmail.com: Can you post how you would like it. On Tue, Sep 8, 2009 at 8:07 AM, Lauri Nikkinenlauri.nikki...@iki.fi wrote: Thanks, I tried it but I got varlength - c(2, 2, 18, 5, 18) read.fwf(c:temppi.txt, widths=varlength) V1 V2 V3 V4 V5 1 DF 12 This is an exampl e 1 T his 2 DF 12 This is an 1232 T his i s 3 DF 14 This is 12334 Thi s is an 4 DF 15 This 23 This is a n exa mple Which is not the way I want it. structure(list(V1 = structure(c(1L, 1L, 1L, 1L), .Label = DF, class = factor), V2 = c(12L, 12L, 14L, 15L), V3 = structure(c(4L, 3L, 2L, 1L), .Label = c( This 23 This is a, This is 12334 Thi, This is an 1232 T, This is an exampl), class = factor), V4 = structure(c(1L, 2L, 4L, 3L), .Label = c(e 1 T, his i, n exa, s is ), class = factor), V5 = structure(c(2L, 4L, 1L, 3L), .Label = c(an , his, mple, s), class = factor)), .Names = c(V1, V2, V3, V4, V5), class = data.frame, row.names = c(NA, -4L)) Any ideas? -L 2009/9/8 Duncan Murdoch murd...@stats.uwo.ca: On 9/8/2009 7:53 AM, Lauri Nikkinen wrote: I have a text file similar to this (separated by spaces): x - DF12 This is an example 1 This DF12 This is an 1232 This is DF14 This is 12334 This is an DF15 This 23 This is an example and I know the field lengths of each variable (there is 5 variables in this data set), which are: varlength - c(2, 2, 18, 5, 18) How can I import this kind of data into R, using the varlength variable as an field separator indicator? See ?read.fwf. 
Duncan Murdoch -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
Re: [R] Data separated by spaces, getting data into R using field lengths
I don't think you described your problem precisely. You implied that you wanted the field lengths to be (2,2,18,5,18) -- which is what you got with read.fwf -- but it looks like what you meant is something more like:
field 1: first two characters
field 2: characters 3-4
field 3: all alphabetic characters up to the next numeric value (not more than 18)
field 4: all numeric values up to the next whitespace (not more than 5)
field 5: all alphabetic characters to end of line (not more than 18)
Is that correct? (I.e., perhaps your field lengths were MAXIMUM lengths?) At the moment all I can think of is using read.fwf with field lengths 2, 2, 41 and as.is=TRUE (to preserve the last field as character), then using some combination of gsub, grep, strsplit, paste to pull apart the last three fields ...
Lauri Nikkinen wrote: Thanks, I tried it but I got
varlength <- c(2, 2, 18, 5, 18)
read.fwf("c:temppi.txt", widths=varlength)
  V1 V2                 V3    V4   V5
1 DF 12  This is an exampl e 1 T  his
2 DF 12  This is an 1232 T his i    s
3 DF 14  This is 12334 Thi s is   an
4 DF 15  This 23 This is a n exa mple
Which is not the way I want it.
structure(list(V1 = structure(c(1L, 1L, 1L, 1L), .Label = "DF", class = "factor"), V2 = c(12L, 12L, 14L, 15L), V3 = structure(c(4L, 3L, 2L, 1L), .Label = c(" This 23 This is a", " This is 12334 Thi", " This is an 1232 T", " This is an exampl"), class = "factor"), V4 = structure(c(1L, 2L, 4L, 3L), .Label = c("e 1 T", "his i", "n exa", "s is "), class = "factor"), V5 = structure(c(2L, 4L, 1L, 3L), .Label = c("an ", "his", "mple", "s"), class = "factor")), .Names = c("V1", "V2", "V3", "V4", "V5"), class = "data.frame", row.names = c(NA, -4L))
Any ideas?
-L 2009/9/8 Duncan Murdoch murd...@stats.uwo.ca: On 9/8/2009 7:53 AM, Lauri Nikkinen wrote: I have a text file similar to this (separated by spaces): x - DF12 This is an example 1 This DF12 This is an 1232 This is DF14 This is 12334 This is an DF15 This 23 This is an example and I know the field lengths of each variable (there is 5 variables in this data set), which are: varlength - c(2, 2, 18, 5, 18) How can I import this kind of data into R, using the varlength variable as an field separator indicator? See ?read.fwf. Duncan Murdoch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://www.nabble.com/Data-separated-by-spaces%2C-getting-data-into-R-using-field-lengths-tp25344686p25345083.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Character manipulation using strsplit vectorization
On Sep 8, 2009, at 12:39 AM, Steven Kang wrote: Dear R users, Suppose I have a data set with inconsistent names for a field. I would like to make these names consistent, i.e. University of New Jersey, New Jersey Uni, New Jersey University (3 different inconsistent names) to The University of New Jersey (consistent name). Below is an arbitrary data set produced from state.name (a built-in data set in R) and the associated script.
d <- as.data.frame(c(state.name[30:40], paste(state.name[30:40], "University", sep=" "), paste("Th University of", state.name[30:40], sep=" "), paste("University o", state.name[30:40], sep=" ")))
da <- sapply(d, as.character) # factor to character transformation
spl <- strsplit(da, " ") # splitting components
dd <- character(dim(da)[1]) # initializing empty vector
for (i in 1:dim(da)[1]) {
if (sum(c("New", "Jersey", "University") %in% spl[[i]]) >= 3) dd[i] <- "The University of New Jersey"
else if (sum(c("New", "Mexico", "University") %in% spl[[i]]) >= 3) dd[i] <- "The University of New Mexico"
else if (sum(c("New", "York") %in% spl[[i]]) >= 2) dd[i] <- "The University of New York"
else if (sum(c("North", "Carolina") %in% spl[[i]]) >= 2) dd[i] <- "The university of North Carolina"
}
Note: above shows only partial (if/else if) conditions.
The if (cond) { } else { } construct is for program control rather than revision of vectors. You should consider using the <- ifelse(cond, val1, val2) construct.
Q1: The above for loop works fine (but is very slow on a large data set), thus I would like to explore whether there is an alternative VECTORIZATION method that may speed up the process. Q2: Also, is there another way to extract a string from a phrase without using %in%?
Many grep-ish functions are available that are vectorised regular expression machines. ?grep will show quite a few.
i.e. "ac" %in% unlist(strsplit("ac dc", " "))
[1] TRUE
-- David Winsemius, MD Heritage Laboratories West Hartford, CT
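The vectorised, grep-based approach suggested above could look roughly like the following. This is only a sketch: the `canonical` lookup vector and the sample values in `da` are made up for illustration, and matching on the state name alone is looser than the word-by-word %in% test in the original loop.

```r
# Illustrative input; in the original post 'da' is derived from the data frame 'd'.
da <- c("New Jersey University", "University of New Jersey",
        "New Mexico Uni", "New York", "Ohio University")

# Hypothetical lookup: text to search for -> consistent name to assign.
canonical <- c("New Jersey"     = "The University of New Jersey",
               "New Mexico"     = "The University of New Mexico",
               "New York"       = "The University of New York",
               "North Carolina" = "The University of North Carolina")

dd <- rep(NA_character_, length(da))

# Loop over the (few) patterns instead of the (many) rows.
for (p in names(canonical)) {
  hit <- grepl(p, da, fixed = TRUE)  # one vectorised test per pattern
  dd[hit] <- canonical[[p]]
}
dd
```

Entries that match none of the patterns stay NA, which makes unmatched names easy to find afterwards.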
Re: [R] Data separated by spaces, getting data into R using field lengths
This data is from a database and the maximum length of a field is defined. I mean that every column has a maximum length and I want to use this maximum length as a separator. So if one cell in that column is shorter than the maximum, the cell should be padded with white spaces or something like that. This seems to be hard to explain. Regards, L
2009/9/8 Duncan Murdoch murd...@stats.uwo.ca: On 9/8/2009 8:07 AM, Lauri Nikkinen wrote: Thanks, I tried it but I got
varlength <- c(2, 2, 18, 5, 18)
read.fwf("c:temppi.txt", widths=varlength)
  V1 V2                 V3    V4   V5
1 DF 12  This is an exampl e 1 T  his
2 DF 12  This is an 1232 T his i    s
3 DF 14  This is 12334 Thi s is   an
4 DF 15  This 23 This is a n exa mple
Which is not the way I want it.
It looks as though that's because you don't have fixed width data. "This is an example" is 19 chars, including the leading space. You told R it was 18. "This is an" is only 12 characters. I would say you have two fixed width fields, and three varying fields, with no delimiters. If the middle one of the three always contains digits and the others don't, you can probably extract them using sub(), but you can't use any of the read.* functions to do this: your format is too strange. Duncan Murdoch
structure(list(V1 = structure(c(1L, 1L, 1L, 1L), .Label = "DF", class = "factor"), V2 = c(12L, 12L, 14L, 15L), V3 = structure(c(4L, 3L, 2L, 1L), .Label = c(" This 23 This is a", " This is 12334 Thi", " This is an 1232 T", " This is an exampl"), class = "factor"), V4 = structure(c(1L, 2L, 4L, 3L), .Label = c("e 1 T", "his i", "n exa", "s is "), class = "factor"), V5 = structure(c(2L, 4L, 1L, 3L), .Label = c("an ", "his", "mple", "s"), class = "factor")), .Names = c("V1", "V2", "V3", "V4", "V5"), class = "data.frame", row.names = c(NA, -4L))
Any ideas?
-L 2009/9/8 Duncan Murdoch murd...@stats.uwo.ca: On 9/8/2009 7:53 AM, Lauri Nikkinen wrote: I have a text file similar to this (separated by spaces): x - DF12 This is an example 1 This DF12 This is an 1232 This is DF14 This is 12334 This is an DF15 This 23 This is an example and I know the field lengths of each variable (there is 5 variables in this data set), which are: varlength - c(2, 2, 18, 5, 18) How can I import this kind of data into R, using the varlength variable as an field separator indicator? See ?read.fwf. Duncan Murdoch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
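For contrast, here is a minimal sketch of what the file would need to look like for read.fwf to work with varlength as given: every field padded to its full width. The padded lines are built artificially with sprintf for illustration; they are not the poster's actual file.

```r
txt <- c("This is an example", "This is an", "This is", "This")
num <- c(1L, 1232L, 12334L, 23L)

# Pad every field to its maximum width: 2 + 2 + 18 + 5 + 18 = 45 characters.
padded <- sprintf("%-2s%2d%-18s%5d%-18s",
                  "DF", c(12L, 12L, 14L, 15L), txt, num, rev(txt))
stopifnot(all(nchar(padded) == 45))

# read.fwf accepts a connection; strip.white is passed on to read.table
# and removes the padding from the character fields.
con <- textConnection(padded)
fw <- read.fwf(con, widths = c(2, 2, 18, 5, 18),
               as.is = TRUE, strip.white = TRUE)
close(con)
fw
```

With truly padded input the five columns come back as intended, which is consistent with the diagnosis that the file in question is not fixed width.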
[R] Inverse Mills in clustered (multilevel) cross-sectional panel data
Dear R saviors, kindly address this problem; I would really appreciate any takers. I have been trying to resolve this issue of the inverse Mills ratio (IMR) in clustered (multilevel) cross-sectional panel data for more than two months now. The characteristics of my dataset are as follows:
- some 900 000 individuals
- total of 60 countries
- cross-sectional time series at the country level, max 10 years, not all countries included every year
For each country, we have a maximum of 10 cross-sectional samples (1 per year) of at least 2000 adult-age individuals (random selection). But individuals are not followed over time. Every year a new random sampling is carried out. I am interested in analysing individuals' behaviors in a given economic activity -- entrepreneurship. To do this, I first need to control for the fact that some individuals self-select into entrepreneurship. This self-selection may be influenced by individual-level characteristics (such as age, gender, education etc.) as well as country-level factors (e.g., taxation). Because both individual- and country-level factors may drive both self-selection and behavior once self-selection has occurred, multi-level techniques are required for the selection equation. How can I do this in R? The results of this selection equation would then be used as a control in the second stage, where an OLS is to be run. Thank you for any suggestions -- Dr.Saurav Pathak PhD, Univ.of.Florida Mechanical Engineering Doctoral Student Innovation and Entrepreneurship Imperial College Business School s.patha...@imperial.ac.uk 0044-7795321121
Re: [R] Data separated by spaces, getting data into R using field lengths
This bears no relationship to what you were first asking. It looks like you want to split the leading 4 characters into two groups of two and then split the remaining data into three parts based on numerics in the middle. Is this correct?
On Tue, Sep 8, 2009 at 8:15 AM, Lauri Nikkinen lauri.nikki...@iki.fi wrote: Sure, here you go
structure(list(V1 = structure(c(1L, 1L, 1L, 1L), .Label = "DF", class = "factor"), V2 = c(12L, 12L, 14L, 15L), V3 = structure(c(4L, 3L, 2L, 1L), .Label = c("This", "This is", "This is an", "This is an example "), class = "factor"), V4 = c(1L, 1232L, 12334L, 23L), V5 = structure(1:4, .Label = c("This", "This is", "This is an", "This is an example"), class = "factor")), .Names = c("V1", "V2", "V3", "V4", "V5"), class = "data.frame", row.names = c(NA, -4L))
2009/9/8 jim holtman jholt...@gmail.com: Can you post how you would like it.
On Tue, Sep 8, 2009 at 8:07 AM, Lauri Nikkinen lauri.nikki...@iki.fi wrote: Thanks, I tried it but I got
varlength <- c(2, 2, 18, 5, 18)
read.fwf("c:temppi.txt", widths=varlength)
  V1 V2                 V3    V4   V5
1 DF 12  This is an exampl e 1 T  his
2 DF 12  This is an 1232 T his i    s
3 DF 14  This is 12334 Thi s is   an
4 DF 15  This 23 This is a n exa mple
Which is not the way I want it.
structure(list(V1 = structure(c(1L, 1L, 1L, 1L), .Label = "DF", class = "factor"), V2 = c(12L, 12L, 14L, 15L), V3 = structure(c(4L, 3L, 2L, 1L), .Label = c(" This 23 This is a", " This is 12334 Thi", " This is an 1232 T", " This is an exampl"), class = "factor"), V4 = structure(c(1L, 2L, 4L, 3L), .Label = c("e 1 T", "his i", "n exa", "s is "), class = "factor"), V5 = structure(c(2L, 4L, 1L, 3L), .Label = c("an ", "his", "mple", "s"), class = "factor")), .Names = c("V1", "V2", "V3", "V4", "V5"), class = "data.frame", row.names = c(NA, -4L))
Any ideas?
-L 2009/9/8 Duncan Murdoch murd...@stats.uwo.ca: On 9/8/2009 7:53 AM, Lauri Nikkinen wrote: I have a text file similar to this (separated by spaces): x - DF12 This is an example 1 This DF12 This is an 1232 This is DF14 This is 12334 This is an DF15 This 23 This is an example and I know the field lengths of each variable (there is 5 variables in this data set), which are: varlength - c(2, 2, 18, 5, 18) How can I import this kind of data into R, using the varlength variable as an field separator indicator? See ?read.fwf. Duncan Murdoch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve? -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
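The split described in this reply (two groups of two characters, then text, a number, and text) can be sketched with a single regular expression. This is one possible approach, not something posted in the thread; it assumes the two free-text fields never contain digits, and the names `pat`, `parts` and `res` are illustrative.

```r
x <- c("DF12 This is an example 1 This",
       "DF12 This is an 1232 This is",
       "DF14 This is 12334 This is an",
       "DF15 This 23 This is an example")

# Groups: 2 chars, 2 chars, non-digit text, digits, rest of the line.
pat <- "^(..)(..) +([^0-9]*[^0-9 ]) +([0-9]+) +(.*)$"

# regexec() returns the full match plus one entry per group,
# so the captured fields sit in columns 2 to 6 of 'parts'.
parts <- do.call(rbind, regmatches(x, regexec(pat, x)))

res <- data.frame(V1 = parts[, 2],
                  V2 = as.integer(parts[, 3]),
                  V3 = parts[, 4],
                  V4 = as.integer(parts[, 5]),
                  V5 = parts[, 6],
                  stringsAsFactors = FALSE)
res
```

Lines that do not fit the pattern produce an empty match and would be silently dropped by rbind, so it is worth comparing nrow(res) with length(x) on real data.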
Re: [R] Data separated by spaces, getting data into R using field lengths
On 9/8/2009 8:21 AM, Lauri Nikkinen wrote: This data is from database and the maximum length of a field is defined. I mean that every column has a maximum length and I want to use this maximum length as a separator. So if one cell in that column is shorter than the maximum, cell should be padded with white spaces or something like that. This seems to be hard to explain. Your problem is the intermediate file. Why not get R to read directly from the database, using RODBC? Duncan Murdoch Regards, L 2009/9/8 Duncan Murdoch murd...@stats.uwo.ca: On 9/8/2009 8:07 AM, Lauri Nikkinen wrote: Thanks, I tried it but I got varlength - c(2, 2, 18, 5, 18) read.fwf(c:temppi.txt, widths=varlength) V1 V2 V3V4 V5 1 DF 12 This is an exampl e 1 T his 2 DF 12 This is an 1232 T his is 3 DF 14 This is 12334 Thi s is an 4 DF 15 This 23 This is a n exa mple Which is not the way I want it. It looks as though that's because you don't have fixed width data. This is an example is 19 chars, including the leading space. You told R it was 18. This is an is only 12 characters. I would say you have two fixed width fields, and three varying fields, with no delimiters. If the middle one of the three always contains digits and the others don't, you can probably extract them using sub(), but you can't use any of the read.* functions to do this: your format is too strange. Duncan Murdoch structure(list(V1 = structure(c(1L, 1L, 1L, 1L), .Label = DF, class = factor), V2 = c(12L, 12L, 14L, 15L), V3 = structure(c(4L, 3L, 2L, 1L), .Label = c( This 23 This is a, This is 12334 Thi, This is an 1232 T, This is an exampl), class = factor), V4 = structure(c(1L, 2L, 4L, 3L), .Label = c(e 1 T, his i, n exa, s is ), class = factor), V5 = structure(c(2L, 4L, 1L, 3L), .Label = c(an , his, mple, s), class = factor)), .Names = c(V1, V2, V3, V4, V5), class = data.frame, row.names = c(NA, -4L)) Any ideas? 
-L 2009/9/8 Duncan Murdoch murd...@stats.uwo.ca: On 9/8/2009 7:53 AM, Lauri Nikkinen wrote: I have a text file similar to this (separated by spaces): x - DF12 This is an example 1 This DF12 This is an 1232 This is DF14 This is 12334 This is an DF15 This 23 This is an example and I know the field lengths of each variable (there is 5 variables in this data set), which are: varlength - c(2, 2, 18, 5, 18) How can I import this kind of data into R, using the varlength variable as an field separator indicator? See ?read.fwf. Duncan Murdoch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Data separated by spaces, getting data into R using field lengths
On Tue, Sep 08, 2009 at 02:53:11PM +0300, Lauri Nikkinen wrote: I have a text file similar to this (separated by spaces):
x <- "DF12 This is an example 1 This
DF12 This is an 1232 This is
DF14 This is 12334 This is an
DF15 This 23 This is an example"
and I know the field lengths of each variable (there are 5 variables in this data set), which are:
varlength <- c(2, 2, 18, 5, 18)
How can I import this kind of data into R, using the varlength variable as a field separator indicator?
I am not totally sure what exactly the expected result is. From your description I got the impression that your data file uses a mixture of separation characters and fixed-width formatting. Maybe I misinterpreted your example. Have a look at read.fwf() and, if that does not solve your problem, maybe explain the structure and expected result a little further. cu Philipp -- Dr. Philipp Pagel Lehrstuhl für Genomorientierte Bioinformatik Technische Universität München Wissenschaftszentrum Weihenstephan 85350 Freising, Germany http://webclu.bio.wzw.tum.de/~pagel/
Re: [R] Tinn-R setup
Kevin, I did the same installation a week or two ago. The same downloading process occurred, but not so many iterations. I don't know if it was normal or not, but the system seems to work very well. Steve
Steve Friedman Ph. D. Spatial Statistical Analyst Everglades and Dry Tortugas National Park 950 N Krome Ave (3rd Floor) Homestead, Florida 33034 steve_fried...@nps.gov Office (305) 224 - 4282 Fax (305) 224 - 4147
rkevinbur...@charter.net (sent by r-help-boun...@r-project.org to r-help@r-project.org, 09/07/2009 11:44 AM, subject: [R] Tinn-R setup) wrote:
I recently installed R 2.9.2 on a new Windows platform. Everything seemed to install OK. I then downloaded the latest Tinn-R (2.3.2.3 I think) and, as I have always done, I selected R - Configure - Permanent. I was greeted with a dialog box asking me for a mirror site. I don't remember this prompt before but I decided to play along and selected a mirror site. Then the process took off installing what appears to be every package that has ever been conceived for R (translation: a lot of packages). Is this normal? I repeated this about three times because at the end it gave me an error indicating that such-and-such library was not found. Each time it was a different library that couldn't be found. Finally on the third try it seemed to complete without error. Again, I have never had to go through so many steps to get Tinn-R installed and configured. Did I do something wrong? Is there a bug in this package or a problem with the integration between R 2.9.2 and Tinn-R 2.3.2.3? Thank you. Kevin
[R] Ridge and PLS grouping property
Dear all, I have collected the standardized regression coefficients of Ridge and PLS. Theory says that Ridge and PLS can retain the grouping property, i.e. a group of correlated variables will get similar coefficients. How can I see such grouping properties in a data set? And how similar should the coefficients be? Thanks -- Linda Garcia
Re: [R] Very basic question regarding plot.Design...
Petar Milin wrote: Hello ALL! I have a problem plotting a factor (let's say gender) as a line, or at least as both line and point, from an ols model:
ols1 <- ols(Y ~ gender, data=dat, x=T, y=T)
plot(ols1, gender=NA, xlab="gender", ylab="Y", ylim=c(5,30), conf.int=FALSE)
If I convert gender into a discrete numeric predictor, and use forceLines=TRUE, the plot is not nice and true, since it shows values between 1 and 2. Thanks! PM
Petar, forceLines seems to be doing what it was intended to do. I'm not clear on why you need a line, though. If you provide self-contained code and data that replicate your problem I may be able to help more, or you can try a new package I'm about to announce. Frank -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
Re: [R] Data separated by spaces, getting data into R using field lengths
Thanks for the suggestion, but I don't have an access to this database, I just got this messy file. -L 2009/9/8 Duncan Murdoch murd...@stats.uwo.ca: On 9/8/2009 8:21 AM, Lauri Nikkinen wrote: This data is from database and the maximum length of a field is defined. I mean that every column has a maximum length and I want to use this maximum length as a separator. So if one cell in that column is shorter than the maximum, cell should be padded with white spaces or something like that. This seems to be hard to explain. Your problem is the intermediate file. Why not get R to read directly from the database, using RODBC? Duncan Murdoch Regards, L 2009/9/8 Duncan Murdoch murd...@stats.uwo.ca: On 9/8/2009 8:07 AM, Lauri Nikkinen wrote: Thanks, I tried it but I got varlength - c(2, 2, 18, 5, 18) read.fwf(c:temppi.txt, widths=varlength) V1 V2 V3 V4 V5 1 DF 12 This is an exampl e 1 T his 2 DF 12 This is an 1232 T his i s 3 DF 14 This is 12334 Thi s is an 4 DF 15 This 23 This is a n exa mple Which is not the way I want it. It looks as though that's because you don't have fixed width data. This is an example is 19 chars, including the leading space. You told R it was 18. This is an is only 12 characters. I would say you have two fixed width fields, and three varying fields, with no delimiters. If the middle one of the three always contains digits and the others don't, you can probably extract them using sub(), but you can't use any of the read.* functions to do this: your format is too strange. 
Duncan Murdoch structure(list(V1 = structure(c(1L, 1L, 1L, 1L), .Label = DF, class = factor), V2 = c(12L, 12L, 14L, 15L), V3 = structure(c(4L, 3L, 2L, 1L), .Label = c( This 23 This is a, This is 12334 Thi, This is an 1232 T, This is an exampl), class = factor), V4 = structure(c(1L, 2L, 4L, 3L), .Label = c(e 1 T, his i, n exa, s is ), class = factor), V5 = structure(c(2L, 4L, 1L, 3L), .Label = c(an , his, mple, s), class = factor)), .Names = c(V1, V2, V3, V4, V5), class = data.frame, row.names = c(NA, -4L)) Any ideas? -L 2009/9/8 Duncan Murdoch murd...@stats.uwo.ca: On 9/8/2009 7:53 AM, Lauri Nikkinen wrote: I have a text file similar to this (separated by spaces): x - DF12 This is an example 1 This DF12 This is an 1232 This is DF14 This is 12334 This is an DF15 This 23 This is an example and I know the field lengths of each variable (there is 5 variables in this data set), which are: varlength - c(2, 2, 18, 5, 18) How can I import this kind of data into R, using the varlength variable as an field separator indicator? See ?read.fwf. Duncan Murdoch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Data separated by spaces, getting data into R using field lengths
Hi, what about reading each line with readLines and then splitting it into the desired portions?
x <- paste(letters, collapse="")
substring(x, c(1,3,5), c(2,4,15))
Regards Petr
r-help-boun...@r-project.org wrote on 08.09.2009 14:21:53: This data is from database and the maximum length of a field is defined. I mean that every column has a maximum length and I want to use this maximum length as a separator. So if one cell in that column is shorter than the maximum, cell should be padded with white spaces or something like that. This seems to be hard to explain. Regards, L
2009/9/8 Duncan Murdoch murd...@stats.uwo.ca: On 9/8/2009 8:07 AM, Lauri Nikkinen wrote: Thanks, I tried it but I got
varlength <- c(2, 2, 18, 5, 18)
read.fwf("c:temppi.txt", widths=varlength)
  V1 V2                 V3    V4   V5
1 DF 12  This is an exampl e 1 T  his
2 DF 12  This is an 1232 T his i    s
3 DF 14  This is 12334 Thi s is   an
4 DF 15  This 23 This is a n exa mple
Which is not the way I want it.
It looks as though that's because you don't have fixed width data. "This is an example" is 19 chars, including the leading space. You told R it was 18. "This is an" is only 12 characters. I would say you have two fixed width fields, and three varying fields, with no delimiters. If the middle one of the three always contains digits and the others don't, you can probably extract them using sub(), but you can't use any of the read.* functions to do this: your format is too strange. Duncan Murdoch
structure(list(V1 = structure(c(1L, 1L, 1L, 1L), .Label = "DF", class = "factor"), V2 = c(12L, 12L, 14L, 15L), V3 = structure(c(4L, 3L, 2L, 1L), .Label = c(" This 23 This is a", " This is 12334 Thi", " This is an 1232 T", " This is an exampl"), class = "factor"), V4 = structure(c(1L, 2L, 4L, 3L), .Label = c("e 1 T", "his i", "n exa", "s is "), class = "factor"), V5 = structure(c(2L, 4L, 1L, 3L), .Label = c("an ", "his", "mple", "s"), class = "factor")), .Names = c("V1", "V2", "V3", "V4", "V5"), class = "data.frame", row.names = c(NA, -4L))
Any ideas?
-L 2009/9/8 Duncan Murdoch murd...@stats.uwo.ca: On 9/8/2009 7:53 AM, Lauri Nikkinen wrote: I have a text file similar to this (separated by spaces): x - DF12 This is an example 1 This DF12 This is an 1232 This is DF14 This is 12334 This is an DF15 This 23 This is an example and I know the field lengths of each variable (there is 5 variables in this data set), which are: varlength - c(2, 2, 18, 5, 18) How can I import this kind of data into R, using the varlength variable as an field separator indicator? See ?read.fwf. Duncan Murdoch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Data separated by spaces, getting data into R using field lengths
On Tue, Sep 08, 2009 at 03:21:53PM +0300, Lauri Nikkinen wrote: This data is from database and the maximum length of a field is defined. I mean that every column has a maximum length and I want to use this maximum length as a separator. So if one cell in that column is shorter than the maximum, cell should be padded with white spaces or something like that. This seems to be hard to explain.
OK - now I got it. RODBC has already been suggested. If for some reason that is impossible you could try to dump the data using a proper delimiter (e.g. tab). Without a real delimiter it is certainly hard to parse the data - and it may even be impossible depending on what characters are allowed in your free-text fields. cu Philipp -- Dr. Philipp Pagel Lehrstuhl für Genomorientierte Bioinformatik Technische Universität München Wissenschaftszentrum Weihenstephan 85350 Freising, Germany http://webclu.bio.wzw.tum.de/~pagel/
Re: [R] Data separated by spaces, getting data into R using field lengths
Thanks Petr, I tried something like this
con <- file("C:temppi.txt", "r", blocking = FALSE)
g <- readLines(con)
close(con)
sta <- c(1, 3, 5, 19)
sto <- c(2, 4, 18, 100)
do.call(rbind, lapply(g, function(x) substring(x, sta, sto)))
     [,1] [,2] [,3]             [,4]
[1,] "DF" "12" " This is an ex" "ample 1 This"
[2,] "DF" "12" " This is an 12" "32 This is"
[3,] "DF" "14" " This is 12334" " This is an"
[4,] "DF" "15" " This 23 This " "is an example"
But this is not the solution I was looking for. Thanks. -L
2009/9/8 Petr PIKAL petr.pi...@precheza.cz: Hi, what about reading each line with readLines and then splitting it into the desired portions?
x <- paste(letters, collapse="")
substring(x, c(1,3,5), c(2,4,15))
Regards Petr
r-help-boun...@r-project.org wrote on 08.09.2009 14:21:53: This data is from database and the maximum length of a field is defined. I mean that every column has a maximum length and I want to use this maximum length as a separator. So if one cell in that column is shorter than the maximum, cell should be padded with white spaces or something like that. This seems to be hard to explain. Regards, L
2009/9/8 Duncan Murdoch murd...@stats.uwo.ca: On 9/8/2009 8:07 AM, Lauri Nikkinen wrote: Thanks, I tried it but I got
varlength <- c(2, 2, 18, 5, 18)
read.fwf("c:temppi.txt", widths=varlength)
  V1 V2                 V3    V4   V5
1 DF 12  This is an exampl e 1 T  his
2 DF 12  This is an 1232 T his i    s
3 DF 14  This is 12334 Thi s is   an
4 DF 15  This 23 This is a n exa mple
Which is not the way I want it.
It looks as though that's because you don't have fixed width data. "This is an example" is 19 chars, including the leading space. You told R it was 18. "This is an" is only 12 characters. I would say you have two fixed width fields, and three varying fields, with no delimiters. If the middle one of the three always contains digits and the others don't, you can probably extract them using sub(), but you can't use any of the read.* functions to do this: your format is too strange.
Duncan Murdoch structure(list(V1 = structure(c(1L, 1L, 1L, 1L), .Label = DF, class = factor), V2 = c(12L, 12L, 14L, 15L), V3 = structure(c(4L, 3L, 2L, 1L), .Label = c( This 23 This is a, This is 12334 Thi, This is an 1232 T, This is an exampl), class = factor), V4 = structure(c(1L, 2L, 4L, 3L), .Label = c(e 1 T, his i, n exa, s is ), class = factor), V5 = structure(c(2L, 4L, 1L, 3L), .Label = c(an , his, mple, s), class = factor)), .Names = c(V1, V2, V3, V4, V5), class = data.frame, row.names = c(NA, -4L)) Any ideas? -L 2009/9/8 Duncan Murdoch murd...@stats.uwo.ca: On 9/8/2009 7:53 AM, Lauri Nikkinen wrote: I have a text file similar to this (separated by spaces): x - DF12 This is an example 1 This DF12 This is an 1232 This is DF14 This is 12334 This is an DF15 This 23 This is an example and I know the field lengths of each variable (there is 5 variables in this data set), which are: varlength - c(2, 2, 18, 5, 18) How can I import this kind of data into R, using the varlength variable as an field separator indicator? See ?read.fwf. Duncan Murdoch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] using an array of strings with strsplit, issue when including a space in split criteria
UPDATE: I'm not sure why, but on my Windows XP 64bit machine, I ran the same code again and this time it is not working even though it worked previously. This has been done using the Rgui --vanilla command.
x <- c("Weekly sales figures to 30 August 2008 published 5 September", "Weekly sales figures to 6 September 2008 published 11 September")
strsplit(x, 'published ', fixed=TRUE)
[[1]]
[1] "Weekly sales figures to 30 August 2008 "
[2] "5 September"
[[2]]
[1] "Weekly sales figures to 6 September 2008 published 11 September"
O/S: Windows XP 64bit Pro; Service Pack 2
sessionInfo()
R version 2.9.2 (2009-08-24) i386-pc-mingw32
locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages: [1] stats graphics grDevices utils datasets methods base
On 8 Sep, 09:47, Tony Breyal tony.bre...@googlemail.com wrote: After further investigation it appears that the problem is specific to my Vista PC. I am able to get the correct results using R 2.9.2 on a Windows XP 64bit machine. However I do not know why this does not work on my Vista PC. The following was done after rebooting Vista. From CMD.exe I ran the following line:
C:\Program Files\R\R-2.9.2\bin\Rgui --vanilla
This opened up R.
### R 2.9.2 START ### txt - c(sales to 23 August 2008 published 29 August, + sales to 6 September 2008 published 11 September) strsplit(txt, 'published', fixed=TRUE) [[1]] [1] sales to 23 August 2008 29 August [[2]] [1] sales to 6 September 2008 11 September strsplit(txt, 'published ', fixed=TRUE) [[1]] [1] sales to 23 August 2008 29 August [[2]] [1] sales to 6 September 2008 published 11 September sessionInfo() R version 2.9.2 (2009-08-24) i386-pc-mingw32 locale: LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base ### R 2.9.2 END ### The exact same thing happened when I used R 2.9.0 and R 2.8.1 on this same vista computer. ### R 2.9.0 ### sessionInfo() R version 2.9.0 (2009-04-17) i386-pc-mingw32 locale: LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252 attached base packages: [1] stats graphics grDevices datasets utils methods base other attached packages: [1] rcom_2.1-3 rscproxy_1.3-1 loaded via a namespace (and not attached): [1] tools_2.9.0 ### R 2.8.1 ### sessionInfo() R version 2.8.1 (2008-12-22) i386-pc-mingw32 locale: LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base my computer details are: Windows Vista Ultimate Service Pack 1 Manufacturer: Dell Rating: 3.4 Processor: Intel Core 2 Duo CPU E6750 @ 2.66 GHz Memory (RAM): 4.00 GB System type: 32-bit Operating System 2009/9/8 Gabor Grothendieck ggrothendi...@gmail.com: I am using the exact same version of R as you also on Vista but can't reproduce your result. For me it splits properly. 
Try starting R like this (modify path if needed) from the Windows cmd line: \Program Files\R\R-2.9.2\bin\Rgui --vanilla and then try it. On Mon, Sep 7, 2009 at 11:40 AM, Tony Breyaltony.bre...@googlemail.com wrote: Dear all, I'm having a problem understanding why a split does not occur with in the 2nd use of the function strsplit below: # text strings txt - c(sales to 23 August 2008 published 29 August, + sales to 6 September 2008 published 11 September) # first use strsplit(txt, 'published', fixed=TRUE) [[1]] [1] sales to 23 August 2008 29 August [[2]] [1] sales to 6 September 2008 11 September # second use, but with a space ' ' in the split strsplit(txt, 'published ', fixed=TRUE) [[1]] [1] sales to 23 August 2008 29 August [[2]] [1] sales to 6 September 2008 published 11 September Thank you kindly for any help in advance. Tony O/S: Win Vista Ultimate sessionInfo() R version 2.9.2 (2009-08-24) i386-pc-mingw32 locale: LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom. 1252;LC_MONETARY=English_United Kingdom. 1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] RODBC_1.3-0 __ r-h...@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html
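For reference, a minimal sketch of what both calls are expected to return, plus two checks that might help narrow the problem down. The byte-level inspection rests on a guess that nothing in the thread confirms: that the second string contains some whitespace character other than a plain space. The regular-expression split is only a workaround that tolerates any whitespace after "published".

```r
txt <- c("sales to 23 August 2008 published 29 August",
         "sales to 6 September 2008 published 11 September")

# Fixed-string split, including the trailing space in the separator.
fixed_split <- strsplit(txt, "published ", fixed = TRUE)

# Raw bytes of each string: a plain space is 20 (hex). Any other value
# right after "published" would point to an encoding issue in the input.
bytes <- lapply(txt, charToRaw)

# Workaround: split on "published" followed by any run of whitespace.
regex_split <- strsplit(txt, "published[[:space:]]+")
```

If the two results differ on the affected machine, comparing `bytes` between a working and a failing session may show where the strings diverge.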
Re: [R] About create dataset permanently in package
Hi Yan,

Before I try to answer your question: please send R questions to the r-help mailing list (cc'd here). You will likely get your answer more quickly (since I'm not the only one looking at it), and it might help other people who get stuck in a similar situation to find the answer on a public forum (just like you did).

On Sep 8, 2009, at 5:44 AM, Yan Hui wrote:

> Dear Steve
> I have the same problem: I cannot load my own data sets. I was lucky to find the solutions you provided on the internet, but with your two options I still cannot load my own data. Could you please check my procedure to see which step is wrong:
> 1) I use a package called ismev
> 2) I add rain1.rda into C:\Program Files\R\R-2.9.1\library\ismev\data, where data is a zip folder
> 3) Load the ismev package into the R GUI
> 4) data(rain1)
> 5) Warning message: In data(rain1) : data set 'rain1' not found
> OR:
> 4) load(rain1, package=ismev)
> 5) Error in load(rain1, package = ismev) : unused argument(s) (package = ismev)

See ?load and ?data -- notice that it's not the load function that takes the `package` parameter, but rather the data function.

But, here's a better tip: DON'T PUT YOUR DATA IN SOMEONE ELSE'S PACKAGE

Why do you want to do this to begin with? I can't imagine any compelling reason to do so unless you are the author of the package. If you do have a reason to put it there, let me know, because I find this urge a bit puzzling.

Anyway, just save your rain1.rda file in some directory that's related to your project, and load it from there. So, if your data is in C:\Documents and Settings\YOU\RainRainGoAway\rain1.rda, then load it from there in R (note that backslashes must be doubled inside an R string, or replaced by forward slashes):

R> load("C:\\Documents and Settings\\YOU\\RainRainGoAway\\rain1.rda")

That should do the trick ... let us know if you need more help.
-steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
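A small sketch of the save-then-load round trip Steve describes, kept inside a temporary directory so that it runs anywhere. The contents of `rain1` and the directory name are made up for illustration; only `save()`, `load()` and `file.path()` are relied on.

```r
# Made-up stand-in for the poster's rainfall data.
rain1 <- data.frame(day = 1:3, mm = c(0.0, 2.5, 1.2))

# A project directory of your own, not a package's library folder.
project_dir <- file.path(tempdir(), "RainRainGoAway")
dir.create(project_dir, showWarnings = FALSE)
rda_path <- file.path(project_dir, "rain1.rda")

save(rain1, file = rda_path)   # write the object to an .rda file
rm(rain1)                      # remove it from the workspace

# load() restores the object under its original name and returns
# (invisibly) the names of the objects it restored.
restored <- load(rda_path)
```

Building the path with `file.path()` avoids the backslash-escaping problem on Windows altogether.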
Re: [R] Data separated by spaces, getting data into R using fiel
On 08-Sep-09 12:21:53, Lauri Nikkinen wrote:

> This data is from a database and the maximum length of each field is defined. I mean that every column has a maximum length and I want to use this maximum length as a separator. So if one cell in that column is shorter than the maximum, the cell should be padded with white spaces or something like that. This seems to be hard to explain.
> Regards, L

Perhaps not just hard to explain, but possibly impossible to implement without further indications of where the breaks between fields may occur, since it is possible for spaces to occur within a field!

Taking the example, and the field-width data, which you first supplied:

On 9/8/2009 7:53 AM, Lauri Nikkinen wrote:

> I have a text file similar to this (separated by spaces):
> x <- "DF12 This is an example 1 This
> DF12 This is an 1232 This is
> DF14 This is 12334 This is an
> DF15 This 23 This is an example"
> and I know the field lengths of each variable (there are 5 variables in this data set), which are:
> varlength <- c(2, 2, 18, 5, 18)
> How can I import this kind of data into R, using the varlength variable as a field separator indicator?

I am now inferring that it might be as follows:

record 1: DF|12|This is an example|1|This |
record 2: DF|12|This is an|1232 |This is |
record 3: DF|14|This is |12334|This is an|
record 4: DF|15|This |23 |This is an example|

This inference is based on:
1: noticing that the length of "This is an example" is 18, and
2: noticing that there are two cases of 18 in your field lengths, followed by
3: some mental shuffling to see how the data you supplied could fit into that pattern in a not-too-nonsensical way.

Without the final step (which is not computable, so R is out; and step 1 also depends on recognising a complete coherent phrase), you could have had:

record 1: DF|12|This is an example|1|This |
record 2: DF|12|This is an 1232 |This |is|
record 3: DF|14|This is 12334 |This |is an |
record 4: DF|15|This 23 This is |an |example |

(or several similar variants), and there is nothing in the information you supplied which could help to choose between them. You have no definition of "cell"!

I see that you said (in a later mail):

> I don't have an access to this database, I just got this messy file.

In that case, unless you have further information about how the space-separated bits of text/number should be formed into individual fields (cells), I think you are stuck -- you have no basis on which to make progress. If you have further information, please share it. Otherwise no-one will be able to get past the above!

Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 08-Sep-09 Time: 14:03:12
-- XFMail --
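If the file really were fixed-width, i.e. every field padded with blanks to its declared length, then read.fwf (as Duncan suggested earlier in the thread) would read it directly. A minimal sketch under that assumption: the field contents follow Ted's first reading of the records, and the padded lines are constructed here for illustration only; they are not the poster's actual file.

```r
varlength <- c(2, 2, 18, 5, 18)

# Field contents according to the first inferred reading (an assumption).
fields <- list(
  c("DF", "12", "This is an example", "1",     "This"),
  c("DF", "12", "This is an",         "1232",  "This is"),
  c("DF", "14", "This is",            "12334", "This is an"),
  c("DF", "15", "This",               "23",    "This is an example")
)

# Left-justify each field and pad it with blanks to its declared width.
pad <- function(x, w) formatC(x, width = w, flag = "-")
padded_lines <- vapply(
  fields,
  function(f) paste0(mapply(pad, f, varlength), collapse = ""),
  character(1)
)

# Every record now has exactly sum(varlength) = 45 characters,
# so the widths alone determine the field boundaries.
dat <- read.fwf(textConnection(padded_lines), widths = varlength,
                strip.white = TRUE, stringsAsFactors = FALSE)
```

With the blanks collapsed, as in the file that was actually posted, the widths no longer line up and this approach cannot recover the fields, which is Ted's point.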
[R] sparse vectors
Hi

I deal with long vectors almost all of whose elements are zero. Typically, the length will be ~5e7 with ~100 nonzero elements. I want to deal with these objects using a sort of sparse vector. The problem is that I want to be able to 'add' two such vectors.

Toy problem follows. Suppose I have two such objects, 'a' and 'b':

a
$index
[1] 20 30 1

$val
[1] 2.2 3.3 4.4

b
$index
[1] 3 30

$val
[1] 0.1 0.1

What I want is the sum of these:

AplusB
$index
[1] 3 20 30 1

$val
[1] 0.1 2.2 3.4 4.4

See how the value for index=30 (being common to both) is 3.4 (=3.3+0.1). What's the best R idiom to achieve this?

--
Robin K. S. Hankin
Uncertainty Analyst
University of Cambridge
19 Silver Street
Cambridge CB3 9EP
01223-764877
[R] Regarding SVM using R
Hi Steve

I am facing a little problem in the predict function, which I think is a mismatch of dimensions. The affected area is marked by ***.

svm = function() {
  library(RODBC)  # load RODBC library for database access
  channel = odbcConnect("demo_dsn", "sa", "1234")  # connecting to the database
  data = sqlQuery(channel, "SELECT top 100 * FROM [Demographics].[dbo].[CHA_Training]")
  odbcClose(channel)  # close the database connection
  index = 1:nrow(data)  # getting a vector of same size as data
  sample_index <- sample(index, length(index) / 3)  # samples of the above vector
  training <- data[-sample_index, ]  # 2/3 training data
  validation <- data[sample_index, ]  # 1/3 test data
  x = training[, length(training)]  # separating class labels
  model.ksvm = ksvm(x, data = training, kernel = "rbfdot", kpar = list(sigma = 0.05), C = 5, cross = 3)  # train data through SVM

  *** Problematic area:
  prSV = predict(model.ksvm, validation[, -length(validation)], type = "decision")  # validate data
  Error: Error in .local(object, ...) : test vector does not match model !
  Notes: If I modify the predict call to
  prSV = predict(model.ksvm, validation[, length(validation)], type = "decision")
  then it works, but it is not correct.
  ***

  table(prSV, validation[, length(validation)])  # draw table
}

Thanks
Abbas
Re: [R] Regarding SVM using R
Hi,

On Sep 8, 2009, at 9:09 AM, Abbas R. Ali wrote:

> Hi Steve
> I am facing a little problem in predict function which is I think mismatch of dimension. Infacted area is covered by ***.
> svm = function() {
>   library(RODBC)  # load RODBC library for database access
>   channel = odbcConnect("demo_dsn", "sa", "1234")  # connecting to the database
>   data = sqlQuery(channel, "SELECT top 100 * FROM [Demographics].[dbo].[CHA_Training]")
>   odbcClose(channel)  # close the database connection
>   index = 1:nrow(data)  # getting a vector of same size as data
>   sample_index <- sample(index, length(index) / 3)  # samples of the above vector
>   training <- data[-sample_index, ]  # 2/3 training data
>   validation <- data[sample_index, ]  # 1/3 test data
>   x = training[, length(training)]  # seperating class labels
>   model.ksvm = ksvm(x, data = training, kernel = "rbfdot", kpar = list(sigma = 0.05), C = 5, cross = 3)  # train data through SVM
>   *** Problamisitc area:
>   prSV = predict(model.ksvm, validation[, -length(validation)], type = "decision")  # validate data

You need to pass in data of the same dimension (# of cols) that you trained on to your predict function. You have already split your data into training and testing (`training`, `validation`). Why are you removing certain dimensions (features/columns) from your validation set when you pass it into the predict function?

ie:

predict(model.ksvm, validation[, -length(validation)]

should probably be

predict(model.ksvm, validation, ...)

That should work ... but if you're using this for anything serious, be sure you understand why.

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
[R] [R-pkgs] New package: rms
This is to announce a new package rms on CRAN. rms goes along with my book Regression Modeling Strategies. The home page for rms is http://biostat.mc.vanderbilt.edu/rms, or go directly to http://biostat.mc.vanderbilt.edu/Rrms for information just about the software.

rms is a re-write of the Design package that has improved graphics and that duplicates very little code in the survival package. In particular, rms does not use low-level C language interfaces to other packages and will be easier to maintain. rms also interfaces to quantile regression (new Rq function), and interfaces to glm and gls have been renamed Glm and Gls. rms requires the latest version of Hmisc on CRAN.

rms has cleaned up graphics routines to make them more modular, to use lattice graphics, and to make it easier to use ggplot2 graphics. Defaults for confidence bands are now gray scale-shaded polygons. The most visible change for the user is the replacement of the plot.Design function with the Predict, plot.Predict, and bplot functions. plot.Predict is used for bivariate graphics (using lattice), and bplot is used for 3-d graphics using base graphics functions image, contour, and persp. Note that multi-panel lattice graphics are usually better than 3-d graphics for showing the effects of multiple predictors varying simultaneously.

The output of Predict is suitable for direct use by lattice (e.g., the xyplot function) and ggplot2 if you don't want to use plot.Predict. plot.Predict allows you to specify a lattice formula (less the left hand side) if you don't like plot.Predict's choice of superpositioning and panel variables.

The following outlines the most significant change users will need to make (the web page contains the complete list). Note that the convention used for getting predictions over the default range is now predictor=. rather than predictor=NA.

require(rms)   # instead of Design; loads Hmisc and survival

plot(fit, x1=NA, x2=NA, ...)
  changed to
p <- Predict(fit, x1=., x2=., ...)
plot(p)          # ?plot.Predict for details; produces a lattice object
plot(Predict(fit, ...))
print(plot(p))   # needed if using Sweave or are inside { }

plot(fit, .., method='image' or 'contour' or 'persp')
  changed to
p <- Predict(..., np=50)   # type ?Predict for details
bplot(p, method=)          # ?bplot for details; uses base graphics

The nomogram function now has a plot method, so nomogram() by itself does not plot. Type ?rmsOverview for an overview and extensive examples. The package's home page contains a reference card you can print.

--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University

___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages
Re: [R] sparse vectors
one simple way could be:

sparse.vec <- function (..., fun = sum) {
    lis <- list(...)
    values <- unlist(lapply(lis, "[[", "value"))
    inds <- factor(unlist(lapply(lis, "[[", "index")))
    out <- tapply(values, inds, FUN = fun)
    list(index = as.numeric(levels(inds)), values = out)
}

a <- list(index = c(20, 30, 1), value = c(2.2, 3.3, 4.4))
b <- list(index = c(3, 30), value = c(0.1, 0.1))

sparse.vec(a, b)
sparse.vec(a, b, fun = prod)
sparse.vec(a, b, fun = function(x) Reduce("-", x))

I hope it helps.

Best,
Dimitris

Robin Hankin wrote:

> Hi
> I deal with long vectors almost all of whose elements are zero. Typically, the length will be ~5e7 with ~100 nonzero elements. I want to deal with these objects using a sort of sparse vector. The problem is that I want to be able to 'add' two such vectors.
> Toy problem follows. Suppose I have two such objects, 'a' and 'b':
> a
> $index
> [1] 20 30 1
> $val
> [1] 2.2 3.3 4.4
> b
> $index
> [1] 3 30
> $val
> [1] 0.1 0.1
> What I want is the sum of these:
> AplusB
> $index
> [1] 3 20 30 1
> $val
> [1] 0.1 2.2 3.4 4.4
> See how the value for index=30 (being common to both) is 3.4 (=3.3+0.1). What's the best R idiom to achieve this?

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center
Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
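An alternative base R sketch for the addition itself, using match() against the sorted union of the two index vectors rather than a factor. It assumes the component names `index` and `val` from Robin's question and that no index is repeated within a single vector; `add_sparse` and the large index 1e7 are made up for illustration.

```r
add_sparse <- function(a, b) {
  # Sorted union of the nonzero positions of both vectors.
  index <- sort(unique(c(a$index, b$index)))
  val <- numeric(length(index))

  # Accumulate each vector's values at its positions in the union.
  ia <- match(a$index, index)
  val[ia] <- val[ia] + a$val
  ib <- match(b$index, index)
  val[ib] <- val[ib] + b$val

  list(index = index, val = val)
}

a <- list(index = c(20, 30, 1e7), val = c(2.2, 3.3, 4.4))
b <- list(index = c(3, 30), val = c(0.1, 0.1))
AplusB <- add_sparse(a, b)
```

The work is proportional to the number of nonzero elements, not to the nominal vector length, and the result keeps its indices sorted.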
[R] Unexpected behavior in friedman.test and ks.test
I have to start by saying that I am new to R, so I might miss something crucial here. It seems to me that the results of friedman.test and ks.test are wrong. Now, obviously, the first thing that crossed my mind was "it can't be; this is a package used by so many, someone should have noticed", but I can't figure out what the cause might be.

Problem: let's start with friedman.test. I have a lot of data to analyze and I grew sick and tired of clicking and selecting in Excel (for which we have a purchased statistics add-in; don't start to flame me for using Excel for stats, please!), so I wanted to automate the analysis in R and found that the results differ from Excel.

Example: take the data from example(friedman.test) (Hollander & Wolfe (1973), p. 140ff.). I ran the example in R and got:

Friedman rank sum test
data: RoundingTimes
Friedman chi-squared = 11.1429, df = 2, p-value = 0.003805

The same data, in Excel, using WinSTAT for Excel (Fitch software), gives:

Friedman chi-squared = 10.6364, df = 2, p-value = 0.004902

Puzzled, I entered the data in the calculator from Vassar (http://faculty.vassar.edu/lowry/fried3.html) and got exactly the same values as in Excel (and, again, different from R). Admittedly, the differences are not large, and both p-values fall below the 0.05 threshold, but still. So, question 1 would be: why is R different from both Excel and Vassar?

Now to the Kolmogorov–Smirnov test, from which my ordeal actually started: the results from ks.test are wildly different from the ones I got with the Excel add-in. Basically, I have 32 sets of observations (patients) for 100 independent variables (different blood analyses). The question was whether the data are normally distributed for each of the analyses and, hence, whether I can apply a parametric test or not. Once I had loaded the data in a data frame (and it looks as expected), I ran:

ks.test(myData$f1_A, pnorm)
ks.test(myData$f8_A, pnorm)

They give p-values of 2.2e-16 (with ties) and 8.882e-16. The Excel add-in gives p-values of 0.0074491 and 0.2730477, respectively. Here the difference is serious: either highly significant non-normality for both f1 and f8 (R), or one non-normal and one normal (the add-in). I first thought that the difference might arise from different probability distributions (but what else, if not pnorm?). Then I ran the Friedman test and found similar discrepancies.

I'd really appreciate some input on this: what's wrong, and how should I decide whom to trust?

Many thanks in advance,
Alex
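One thing that may be worth checking for the ks.test part: called as ks.test(x, "pnorm") with no further arguments, the comparison is against the standard normal N(0, 1), because pnorm's defaults are mean = 0 and sd = 1. Data on any other scale will then look wildly non-normal. The sketch below illustrates this with simulated values (the data are made up, not the poster's blood analyses); whether this explains the add-in's numbers is an assumption.

```r
set.seed(1)
# Simulated measurements: normal, but far from mean 0 and sd 1.
x <- rnorm(32, mean = 140, sd = 5)

# Compared against N(0, 1): essentially certain rejection.
p_standard <- ks.test(x, "pnorm")$p.value

# Compared against a normal with the sample's own mean and sd.
p_fitted <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))$p.value
```

Note that plugging in parameters estimated from the same sample makes the nominal Kolmogorov-Smirnov p-value conservative; a test designed for that situation, such as shapiro.test(x), may be more appropriate for a normality check.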
Re: [R] sparse vectors
Hi, On Sep 8, 2009, at 9:06 AM, Robin Hankin wrote: Hi I deal with long vectors almost all of whose elements are zero. Typically, the length will be ~5e7 with ~100 nonzero elements. I want to deal with these objects using a sort of sparse vector. Would using sparse matrices (from the Matrix or SparseM packages) be overkill? -steve The problem is that I want to be able to 'add' two such vectors. Toy problem follows. Suppose I have two such objects, 'a' and 'b': a $index [1]20 30 1 $val [1] 2.2 3.3 4.4 b $index [1] 3 30 $val [1] 0.1 0.1 What I want is the sum of these: AplusB $index [1]3 20 30 1 $val [1] 0.1 2.2 3.4 4.4 See how the value for index=30 (being common to both) is 3.4 (=3.3+0.1). What's the best R idiom to achieve this? -- Robin K. S. Hankin Uncertainty Analyst University of Cambridge 19 Silver Street Cambridge CB3 9EP 01223-764877 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] NA in cca (vegan)
Gavin Simpson gavin.simpson at ucl.ac.uk writes: On Fri, 2009-09-04 at 17:15 +0200, Kim Vanselow wrote: Dear all, I would like to calculate a cca (package vegan) with species and environmental data. One of these environmental variables is cos(EXPOSURE). The problem: for flat releves there is no exposure. The value is missing and I can't call it 0 as 0 stands for east and west. The cca does not run with missing values. What can I do to make vegan cca ignoring these missing values? Thanks a lot, Kim This is timely as Jari Oksanen (lead developer on vegan) has been looking into making this happen automatically in vegan ordination functions. The solution for something like cca is very simple but it gets more complicated when you might like to allow features like na.exclude etc and have all the functions that operate on objects of class cca work nicely. For the moment, you should just process your data before it goes into cca. Here I assume that you have two data frames; i) Y is the species data, and ii) X the environmental data. Further I assume that only one variable in X has missings, lets call this Exposure: Kim, A test version of NA handling in cca is now in the development version of vegan at http://vegan.r-forge.r-project.org/. You may get current source code or a bit stale packages from that address (when writing this, the packages are two to three days behind the current devel version). Instruction of downloading the working version of vegan can be found in the same web site. Basically the development version does exactly the same thing as Gavin showed you in his response. It does a listwise elimination of missing values. Indeed, it may be better to do that manually and knowingly than to use perhaps surprising automation of handling missing values within the function. 
Your missing values are somewhat wierd as they are not missing values (= unknown and unobserved), but you just decided to use a coding system that does not cope with your well known and measured values. I would prefer to find a coding that puts flat ground together with exposure giving similar conditions. In no case should they be regarded as NA since they are available and known, and censoring them from your data may distort your analysis. Perhaps having a new variable (hasExposure, TRUE/FALSE) and coding them as east/west (=0) in Exposure could make more sense. Indeed, model term hasExposure*Exposure would make sense as this would separate flat ground from slopes of different Exposures. The interaction term and aliasing would take care of having flat ground with known values but separate from exposed slopes. Cheers, Jari Oksanen __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
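The two options discussed above can be sketched in base R as follows. The data frames X (environment) and Y (species) follow the naming in Gavin's reply, but their contents here are made up, and the cca() call is shown only as a comment because it needs the vegan package.

```r
# Made-up data; NA marks flat releves where exposure is undefined.
X <- data.frame(Exposure = c(0.8, NA, -0.3, NA, 0.1),
                Altitude = c(3100, 3250, 3400, 3300, 3150))
Y <- data.frame(sp1 = c(3, 0, 1, 2, 5),
                sp2 = c(0, 4, 2, 1, 0))

# Option 1: listwise deletion, dropping the same rows from both tables.
keep <- complete.cases(X)
X_complete <- X[keep, , drop = FALSE]
Y_complete <- Y[keep, , drop = FALSE]

# Option 2: keep all rows, flag flat ground with an indicator, and give
# Exposure a placeholder value of 0 where it is undefined.
X_coded <- X
X_coded$hasExposure <- !is.na(X$Exposure)
X_coded$Exposure[!X_coded$hasExposure] <- 0
# The model term would then be hasExposure * Exposure, e.g.
# vegan::cca(Y ~ hasExposure * Exposure + Altitude, data = X_coded)
```

Option 2 keeps the flat releves in the analysis, which is the point Jari makes about not censoring known observations.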
Re: [R] sparse vectors
Robin == Robin Hankin rk...@cam.ac.uk on Tue, 08 Sep 2009 14:58:49 +0100 writes: Robin Hi guys Robin thanks for this, it works fine, but I'm not sure the Matrix package does Robin what I want: a = sparseMatrix(i=c(20, 30, 10), j=rep(1, 3), x=c(2.2, 3.3, 4.4)) Robin Error in asMethod(object) : Robin Cholmod error 'out of memory' at file:../Core/cholmod_memory.c, line 148 Robin Surely an efficient storage mechanism would need only six pieces of Robin information? sure. sparseMatrix() is designed to produces column-compressed sparse matrices (CsparseMatrix), as these are optimal in some sense for further matrix operations notably in the CHOLMOD C library to which the Matrix package is interfaced. Indeed, you have triggered a problem in that CHOLMOD code, which needs an inordinate amount of memory when it really should not. Alternatively, for your case, I'd recommend to use the (older, slightly less flexible) constructor spMatrix() [which produces a triplet (Tsparse..) sparse matrix representation] which works without that funny memory glitch. Note BTW, that 'i=10' pretty close .Machine$integer.max and we currently require the indices to be integer (in the sense of R, i.e., 32-bit). Alternatively, I had introduced the sparseVector class into the Matrix package a while ago, which *does* allow numeric indices ... .. but at the moment does not have too many methods defined, notably not arithmetic. { The reason I introduced the class was actually to allow reshaping sparse matrices, i.e., to use dim(sparseMatrix) - c(n1, n2) }. Robin I've been pondering the solution that Henrique suggested, that uses Robin merge(). This seems to be fine, although it might be possible Robin to squeeze some efficiency gains by using the fact that Robin the index vector is always sorted, which migh save some Robin searching time. 
the sparseMatrix and sparseVector classes in Matrix do always keep the indices sorted, and actually your use case would motivate me quite a bit to add more (arithmetic) capabilities to the sparseVector classes. Martin Robin Any thoughts anyone? Robin best wishes Robin Robin Robin Benilton Carvalho wrote: library(Matrix) a = sparseMatrix(i=c(20, 30, 1), j=rep(1, 3), x=c(2.2, 3.3, 4.4)) b = sparseMatrix(i=c(3, 30), j=rep(1, 2), x=c(0.1, 0.1), dims=dim(a)) theSum = a+b summary(theSum) hth, b On Sep 8, 2009, at 10:19 AM, Henrique Dallazuanna wrote: Try this: abMerge - merge(a, b, by = 'index', all = TRUE) list(index = abMerge$index, val = rowSums(abMerge[,2:3], na.rm = TRUE)) On Tue, Sep 8, 2009 at 10:06 AM, Robin Hankin rk...@cam.ac.uk wrote: Hi I deal with long vectors almost all of whose elements are zero. Typically, the length will be ~5e7 with ~100 nonzero elements. I want to deal with these objects using a sort of sparse vector. The problem is that I want to be able to 'add' two such vectors. Toy problem follows. Suppose I have two such objects, 'a' and 'b': a $index [1]20 30 1 $val [1] 2.2 3.3 4.4 b $index [1] 3 30 $val [1] 0.1 0.1 What I want is the sum of these: AplusB $index [1]3 20 30 1 $val [1] 0.1 2.2 3.4 4.4 See how the value for index=30 (being common to both) is 3.4 (=3.3+0.1). What's the best R idiom to achieve this? -- Robin K. S. Hankin Uncertainty Analyst University of Cambridge 19 Silver Street Cambridge CB3 9EP 01223-764877 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O [[alternative HTML version deleted]] ATT1.txt Robin -- Robin Robin K. S. 
Hankin Robin Uncertainty Analyst Robin University of Cambridge Robin 19 Silver Street Robin Cambridge CB3 9EP Robin 01223-764877 Robin __ Robin R-help@r-project.org mailing list Robin https://stat.ethz.ch/mailman/listinfo/r-help Robin PLEASE do read the posting guide http://www.R-project.org/posting-guide.html Robin and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list
Re: [R] Data in Array
Not sure what you mean by 'store', but you can use a list:

a <- matrix(1, nrow = 5, ncol = 1)
b <- matrix(2, nrow = 10, ncol = 1)
c <- matrix(3, nrow = 15, ncol = 1)
myList <- list(a, b, c)

str(myList)
List of 3
 $ : num [1:5, 1] 1 1 1 1 1
 $ : num [1:10, 1] 2 2 2 2 2 2 2 2 2 2
 $ : num [1:15, 1] 3 3 3 3 3 3 3 3 3 3 ...

myList[[2]]
      [,1]
 [1,]    2
 [2,]    2
 [3,]    2
 [4,]    2
 [5,]    2
 [6,]    2
 [7,]    2
 [8,]    2
 [9,]    2
[10,]    2

On Tue, Sep 8, 2009 at 8:14 AM, FMH kagba2...@yahoo.com wrote:

> Yes, but what actually i want to have is an array that might store these different matrices.
>
> - Original Message
> From: Schalk Heunis schalk.heu...@enerweb.co.za
> To: FMH kagba2...@yahoo.com
> Cc: r-help@r-project.org
> Sent: Tuesday, September 8, 2009 11:16:29 AM
> Subject: Re: [R] Data in Array
>
> have you tried rbind?
>
> On Tue, Sep 8, 2009 at 11:16 AM, FMH kagba2...@yahoo.com wrote:
> Dear All,
> I have some data which were stored in few matrices with different orders. Let have three different matrices a, b and c, which have the same number of column but different number of row.
> a <- matrix(1, nrow = 5, ncol = 1)
> b <- matrix(2, nrow = 10, ncol = 1)
> c <- matrix(3, nrow = 15, ncol = 1)
> How could i put all these matrices in an array?
> Thank you
> Kagba

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
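To round out the two suggestions in this thread (a list, and rbind), here is a short sketch that names the list elements, queries each matrix, and stacks them when a single matrix is wanted. The element names are an addition for illustration.

```r
a <- matrix(1, nrow = 5,  ncol = 1)
b <- matrix(2, nrow = 10, ncol = 1)
c <- matrix(3, nrow = 15, ncol = 1)

# A named list keeps each matrix intact, whatever its number of rows.
myList <- list(a = a, b = b, c = c)

# Apply a function to every stored matrix, here the row count.
rows <- sapply(myList, nrow)

# Matrices with the same number of columns can be stacked into one.
stacked <- do.call(rbind, myList)
```

A true array needs identical dimensions for every slice, which is why a list is the natural container here.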
[R] barplot with lines instead of bars
Dear useRs,

I want to plot the following barplot with lines instead of bars. Is there a way?

data <- data.frame(cbind(k = 0:3, fk = c(11, 20, 7, 2), f0k = c(13.72, 17.64, 7.56, 1.08), fkest = c(11.85, 17.78, 8.89, 1.48)))
d <- t(data[,2:4])
barplot(d, beside=TRUE)

Regards,
Rafael.
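One possible reading of "lines instead of bars" is one line per series drawn across the values of k. A sketch with base graphics under that assumption, using matplot(); the line types, symbols and axis labels are arbitrary choices, not something the poster specified.

```r
dat <- data.frame(k     = 0:3,
                  fk    = c(11, 20, 7, 2),
                  f0k   = c(13.72, 17.64, 7.56, 1.08),
                  fkest = c(11.85, 17.78, 8.89, 1.48))

# One column per series, one row per value of k.
series <- as.matrix(dat[, c("fk", "f0k", "fkest")])

# Points joined by lines, one line per column of 'series'.
matplot(dat$k, series, type = "b", pch = 1:3, lty = 1:3, col = 1:3,
        xlab = "k", ylab = "frequency", xaxt = "n")
axis(1, at = dat$k)
legend("topright", legend = colnames(series),
       pch = 1:3, lty = 1:3, col = 1:3)
```

If vertical spikes are wanted instead, type = "h" in a plot() call for a single series gives bar-like lines.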
Re: [R] Fitting a linear model with a break point
Dear Dan,

As far as I know, the strucchange package can be helpful for you. On the other hand, if your regression function is continuous at the unknown break points to be estimated, you could try the segmented package.

Hope this helps you,
vito

Daniel Brewer wrote:

> Hello,
> I would like to test some data to see whether it has the shape of a step function (i.e. y1 up until x_th and then y2, where x_th is the threshold). The threshold x_th is unknown and the x values can only take discrete values (0,1,2,3,4). An example would be:
> data <- data.frame(x=1:20, y=c(rnorm(10), rnorm(10,10)))
> I was thinking along the lines of fitting some sort of piecewise linear model which has the gradient constrained to zero, trying out all possible different thresholds and taking the one with the least residuals. I am not sure how to implement this in R. Anyone got any ideas? Also, is there a way of including the threshold in the actual model, so that it could be estimated too?
> Thanks
> Dan

--
Vito M.R. Muggeo
Dip.to Sc Statist e Matem `Vianelli'
Università di Palermo
viale delle Scienze, edificio 13
90128 Palermo - ITALY
tel: 091 6626240
fax: 091 485726/485612
http://dssm.unipa.it/vmuggeo
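The brute-force idea Dan describes (try every threshold, fit a constant on each side, keep the split with the smallest residual sum of squares) can be sketched in a few lines of base R. This is only an illustration of that search on his simulated example, not what strucchange or segmented do internally; `step_rss` is a made-up helper name.

```r
set.seed(42)
dat <- data.frame(x = 1:20, y = c(rnorm(10), rnorm(10, 10)))

# Residual sum of squares of a two-level step function that jumps
# just after threshold 'th': each side is fitted by its own mean.
step_rss <- function(th, x, y) {
  left  <- y[x <= th]
  right <- y[x >  th]
  sum((left - mean(left))^2) + sum((right - mean(right))^2)
}

# Every observed x except the largest is a candidate threshold,
# so that both sides contain at least one point.
candidates <- head(sort(unique(dat$x)), -1)
rss <- vapply(candidates, step_rss, numeric(1), x = dat$x, y = dat$y)

best_threshold <- candidates[which.min(rss)]
fit <- lm(y ~ I(x > best_threshold), data = dat)  # step model at the chosen split
```

Because the threshold is selected from the data, the usual lm() standard errors for `fit` do not account for that search.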
Re: [R] Regarding SVM using R
Hi,

On Sep 8, 2009, at 9:56 AM, Abbas R. Ali wrote:
> The dimensions of my training set are dim(trainingset) = 7 x 96 and dim(validation) = 3 x 96. Also, if I try to predict the training set accuracy it gives me the same error. So dimensions are no longer the issue on my side.

1. Please keep replies on list.
2. Please post your new code along with the error.

Thanks,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
Re: [R] Regarding SVM using R
Hi,

On Sep 8, 2009, at 9:28 AM, Abbas R. Ali wrote:
> by using predict(model.ksvm, validationset, ...) I am facing following error:
> Error: '...' used in an incorrect context

No. I didn't mean to explicitly use `...`, sorry.

Actually, I had a two-line fix here, but looking back at your original code, it now seems there's more of a problem.

Note that the data you pass to `ksvm` and `predict` should be similar! Presumably you are passing in a matrix of n observations with m features. The matrix should be of dimension n x m! The matrix for training needs to have m columns, as does the matrix for predict.

Look at your original code and notice that you are removing columns from your `training` and `validation` sets ... WHY? Here are the relevant pieces of your code:

training <- data[-sample_index, ]   # 2/3 training data
validation <- data[sample_index, ]  # 1/3 test data
# So far so good

x = training[, length(training)]
# Why are you removing the columns? The columns are the features
# for each example, aren't they? All examples need to have the same
# number of features!

# ...

prSV = predict(model.ksvm, validation[, -length(validation)], type = "decision")
# Notice that you are now removing A DIFFERENT column in the test
# set. Does it make sense to remove some random FEATURE in training
# and another completely different feature from testing? No, it probably
# doesn't.

In short -- stop removing the features (columns) from your training/testing data and it should work.

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
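To make the point about matching columns concrete, here is a minimal sketch. It assumes (this is not shown in the original post) that the class label sits in the last column of `data`; the built-in `iris` data stands in for the poster's data set, and `label_col`, `x_train`, `x_valid` are names invented for the illustration. The kernlab calls are left commented out so the snippet runs without the package.

```r
# Stand-in data: the last column (Species) plays the role of the class label.
data <- iris
set.seed(1)
sample_index <- sample(nrow(data), size = floor(nrow(data) / 3))

training   <- data[-sample_index, ]   # 2/3 training data
validation <- data[sample_index, ]    # 1/3 test data

label_col <- ncol(data)               # ASSUMED position of the label column

# Drop the SAME column from both sets so the feature columns line up.
x_train <- as.matrix(training[, -label_col])
y_train <- training[, label_col]
x_valid <- as.matrix(validation[, -label_col])
y_valid <- validation[, label_col]

# Both feature matrices must have identical columns before fitting/predicting.
stopifnot(identical(colnames(x_train), colnames(x_valid)))

# With kernlab installed, fitting and predicting would then look roughly like:
# library(kernlab)
# model.ksvm <- ksvm(x_train, y_train, type = "C-svc")
# prSV <- predict(model.ksvm, x_valid, type = "decision")
```

Whatever is done to the columns of the training matrix has to be done identically to the matrix handed to predict().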
Re: [R] barplot with lines instead of bars
Here is a solution using ggplot2 and reshape:

library(reshape)
library(ggplot2)
data <- data.frame(k = 0:3, fk = c(11, 20, 7, 2), f0k = c(13.72, 17.64, 7.56, 1.08), fkest = c(11.85, 17.78, 8.89, 1.48))
Molten <- melt(data, id.vars = "k")
ggplot(Molten, aes(x = k, y = value, colour = variable)) + geom_line()

HTH,
Thierry

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance
Gaverstraat 4, 9500 Geraardsbergen, Belgium
tel. + 32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Rafael Moral
Sent: Tuesday, 8 September 2009 16:45
To: r-help
Subject: [R] barplot with lines instead of bars

> Dear useRs,
>
> I want to plot the following barplot with lines instead of bars. Is there a way?
>
> data <- data.frame(cbind(k = 0:3, fk = c(11, 20, 7, 2), f0k = c(13.72, 17.64, 7.56, 1.08), fkest = c(11.85, 17.78, 8.89, 1.48)))
> d <- t(data[, 2:4])
> barplot(d, beside = TRUE)
>
> Regards,
> Rafael.
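For anyone who prefers to stay in base graphics, the same picture can probably be had with matplot(). This is only a sketch of one alternative, not something from the thread; the object name `series` and the plotting choices (point symbols, legend position) are arbitrary.

```r
data <- data.frame(k = 0:3,
                   fk = c(11, 20, 7, 2),
                   f0k = c(13.72, 17.64, 7.56, 1.08),
                   fkest = c(11.85, 17.78, 8.89, 1.48))

# One column per series, one row per value of k.
series <- as.matrix(data[, c("fk", "f0k", "fkest")])

# matplot() draws one line per column of 'series' against k.
matplot(data$k, series, type = "b", lty = 1, pch = 1:3, col = 1:3,
        xlab = "k", ylab = "value", xaxt = "n")
axis(1, at = data$k)
legend("topright", legend = colnames(series), lty = 1, pch = 1:3, col = 1:3)
```

Compared with the ggplot2 version this needs no reshaping, because matplot() works directly on the wide layout.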
[R] Pharmacokinetic and pharmacodynamic modeling and simulation
Pharmacokinetic and pharmacodynamic modeling and simulation
By Dr. Jan Freijer
September 24, 2009, Amsterdam, The Netherlands
http://www.can.nl/events/details.php?id=57

This course is aimed at users of R or S-PLUS in the bio-pharmaceutical sciences who would like to use R for clinical trial simulations. Topics include:

Working with packages in R
- MASS
- odesolve

Random generation from univariate distributions
- density, distribution function, quantile function and random generation
- various distributions

Random generation from multivariate distributions
- normal distribution
- working with the covariance matrix
- simulating PK and PK-PD model parameters

Solving differential equations
- solving differential equations in R
- testing the numerical solution versus analytical solution
- PK models for single dose oral or IV administration
- PK models for multiple dose oral or IV administration
- implementing PD models

Clinical trial simulations
- combining the structural model with the random effects model
- uncertainty versus variability
- example: two compartment PK model with indirect response model

Location: Amsterdam
Date: September 24
Time: 10:00h.-16:30h.
Price: EURO 395,- excluding VAT

Register:
- phone: +31-(0)20-560-8400
- Email: pau...@can.nl
- Web: http://www.can.nl/events/details.php?id=57

There is a maximum of 12 participants. You may register by replying to this email and providing us with the following information.

Name:
M / F:
Title:
Department:
Institute:
Address:
City:
Zip:
Telephone:
Fax:
Email:

Please let us know if you have any questions. Please feel free to send this message on to your colleagues and friends for whom it might be interesting!

Kind regards,
Dick Verkerk

CANdiensten, Nieuwpoortkade 23-25, NL-1055 RX Amsterdam
tel: +31 20 5608410
fax: +31 20 5608448
verk...@candiensten.nl
Your Partner in Mathematics and Statistics!
Re: [R] sparse vectors
library(Matrix)
a = sparseMatrix(i = c(20, 30, 1), j = rep(1, 3), x = c(2.2, 3.3, 4.4))
b = sparseMatrix(i = c(3, 30), j = rep(1, 2), x = c(0.1, 0.1), dims = dim(a))
theSum = a + b
summary(theSum)

hth,
b

On Sep 8, 2009, at 10:19 AM, Henrique Dallazuanna wrote:
> Try this:
>
> abMerge <- merge(a, b, by = 'index', all = TRUE)
> list(index = abMerge$index, val = rowSums(abMerge[, 2:3], na.rm = TRUE))
>
> On Tue, Sep 8, 2009 at 10:06 AM, Robin Hankin <rk...@cam.ac.uk> wrote:
> > Hi
> >
> > I deal with long vectors almost all of whose elements are zero. Typically, the length will be ~5e7 with ~100 nonzero elements. I want to deal with these objects using a sort of sparse vector. The problem is that I want to be able to 'add' two such vectors. Toy problem follows. Suppose I have two such objects, 'a' and 'b':
> >
> > > a
> > $index
> > [1] 20 30 1
> > $val
> > [1] 2.2 3.3 4.4
> >
> > > b
> > $index
> > [1] 3 30
> > $val
> > [1] 0.1 0.1
> >
> > What I want is the sum of these:
> >
> > > AplusB
> > $index
> > [1] 3 20 30 1
> > $val
> > [1] 0.1 2.2 3.4 4.4
> >
> > See how the value for index=30 (being common to both) is 3.4 (=3.3+0.1). What's the best R idiom to achieve this?
> >
> > --
> > Robin K. S. Hankin
> > Uncertainty Analyst
> > University of Cambridge
> > 19 Silver Street
> > Cambridge CB3 9EP
> > 01223-764877
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
Re: [R] sparse vectors
On 08-Sep-09 13:06:28, Robin Hankin wrote:
> Hi
>
> I deal with long vectors almost all of whose elements are zero. Typically, the length will be ~5e7 with ~100 nonzero elements. I want to deal with these objects using a sort of sparse vector. The problem is that I want to be able to 'add' two such vectors. Toy problem follows. Suppose I have two such objects, 'a' and 'b':
>
> > a
> $index
> [1] 20 30 1
> $val
> [1] 2.2 3.3 4.4
>
> > b
> $index
> [1] 3 30
> $val
> [1] 0.1 0.1
>
> What I want is the sum of these:
>
> > AplusB
> $index
> [1] 3 20 30 1
> $val
> [1] 0.1 2.2 3.4 4.4
>
> See how the value for index=30 (being common to both) is 3.4 (=3.3+0.1). What's the best R idiom to achieve this?

I don't know about the best, Robin, but how about something like:

indices <- sort(unique(c(a$index, b$index)))
N <- length(indices)
values <- NULL
for(i in indices){
  if(i %in% a$index){A <- a$val[a$index==i]} else A <- 0
  if(i %in% b$index){B <- b$val[b$index==i]} else B <- 0
  values <- c(values, A+B)
}
AplusB <- list(index=indices, val=values)

## Test:
a <- list(index=c(20,30,1), val=c(2.2,3.3,4.4))
b <- list(index=c(3,30), val=c(0.1, 0.1))
indices <- sort(unique(c(a$index, b$index)))
N <- length(indices)
values <- NULL
for(i in indices){
  if(i %in% a$index){A <- a$val[a$index==i]} else A <- 0
  if(i %in% b$index){B <- b$val[b$index==i]} else B <- 0
  values <- c(values, A+B)
}
AplusB <- list(index=indices, val=values)
AplusB
# $index
# [1] 3e+00 2e+01 3e+01 1e+08
# $val
# [1] 0.1 2.2 3.4 4.4

Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 08-Sep-09 Time: 14:42:53
-- XFMail --
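The loop above can probably be replaced by a single grouped sum. The following is a sketch of that idea using tapply(); the helper name `add_sparse` is made up for this example, and it assumes each vector is a list with `index` and `val` components as in Robin's toy problem.

```r
a <- list(index = c(20, 30, 1), val = c(2.2, 3.3, 4.4))
b <- list(index = c(3, 30),     val = c(0.1, 0.1))

# Hypothetical helper: pool the (index, value) pairs of both vectors and
# sum the values that share an index.
add_sparse <- function(u, v) {
  idx  <- c(u$index, v$index)
  val  <- c(u$val, v$val)
  sums <- tapply(val, idx, sum)   # one sum per distinct index, sorted by index
  list(index = as.numeric(names(sums)), val = as.vector(sums))
}

AplusB <- add_sparse(a, b)
AplusB
```

The result comes back ordered by index rather than in the order the indices first appeared. Since only the ~100 nonzero entries are touched, the nominal vector length never matters.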
Re: [R] Data separated by spaces, getting data into R using field lengths
On Tue, Sep 8, 2009 at 1:52 PM, Lauri Nikkinen <lauri.nikki...@iki.fi> wrote:
> But this is not the solution I was looking for. Thanks.

I think the only way you'll get the solution you are looking for is if you can let us have a copy of the original input file, or at least the first few lines - and not pasted into an email, because special characters like spaces and tabs get smushed up and confuse things.
Re: [R] Re : calling combinations of variable names
Thanks to Justin, Baptiste, and Sebed for your answers. The solutions work well. I have been putting them to good use today: the code now works wonderfully and I learnt some useful tricks!

thanks,
Peter

-----Original Message-----
From: justin bem [justin_...@yahoo.fr]
Sent: 9/8/2009 9:06:23 AM
To: helter...@care2.com
Cc: r-h...@stat.math.ethz.ch
Subject: Re: Re : [R] calling combinations of variable names

> Maybe this can work:
>
> testfun <- function(x) {
>   rval <- ""
>   k <- length(x)
>   for (i in 1:k) rval <- paste(rval, x[i], sep = "-")
>   rval
> }
> v1 <- paste("evalR", 1:4, sep = "")
> eval <- expand.grid(w = v1, x = v1, y = v1, z = v1)
> n <- dim(eval)[1]
> results <- rep("", n)
> for (i in 1:n) {
>   row <- unique(unlist(eval[i, ]))
>   if (length(row) >= 3) results[i] <- testfun(row)
> }
>
> You just have to replace testfun by your own function, in this case ICC.
>
> Sincerely,
> Justin BEM
> BP 1917 Yaoundé
> Tél (237) 76043774
>
> From: Helter Two <helter...@care2.com>
> To: r-help@r-project.org
> Sent: Monday, 7 September 2009, 18:17:22
> Subject: [R] calling combinations of variable names
>
> > R-2.9.1, Windows7
> >
> > Dear list,
> >
> > I have a question that seems very simple to me, but I just can't figure it out. I have a dataframe called ratings which contains the following variables: evalR1, evalR2, evalR3, evalR4, scoreR1, scoreR2, scoreR3, scoreR4, opinionR1, opinionR2, opinionR3, opinionR4 (there are more variables, but this gives an idea of the data structure). What I want is to run several analyses on all 3- or 4-combinations of a given variable. So, for example, I want to compute the following ICCs (function from the psych package):
> >
> > ICC(cbind(evalR1, evalR2, evalR3))
> > ICC(cbind(evalR1, evalR2, evalR4))
> > ICC(cbind(evalR1, evalR3, evalR4))
> > ICC(cbind(evalR2, evalR3, evalR4))
> > ICC(cbind(evalR1, evalR2, evalR3, evalR4))
> >
> > I create a matrix containing the 3-combinations by combn(4,3). Now I need to call the variables into the function. First, I tried paste as follows:
> >
> > combis <- combn(4,3)  # this gives the 3-combinations
> > attach(ratings)
> > eval <- paste("evalR", combis[1,1], ",", "evalR", combis[2,1], ",", "evalR", combis[3,1], sep = "")
> >
> > (This is of course just for one combination, as an example.) The output of this is "evalR1,evalR2,evalR3", but when I run ICC(cbind(eval)), an error message is given which is not given when I enter ICC(cbind(evalR1, evalR2, evalR3)) manually. The function appears not to recognize the variable names. It also does not work to type ICC(cbind(unquote(eval))). Alternatively, I have tried the cat function, but here too ICC does not recognize the input as variable names.
> >
> > What am I doing wrong? How can I automatically construct the set of variable names such that a function recognizes them as variable names? ICC is one example, but there are also other computations to be run and the set of variables is pretty large, so typing the combinations of variable names manually is really unattractive. What am I missing? It seems to me that there probably is a very simple solution in R, but which?
> >
> > Thank you,
> > Peter Verbeet
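One way to sidestep the string-to-variable problem altogether is to index the data frame by column name, since a character vector of names is accepted directly by `[`. The sketch below assumes a data frame `ratings` with columns evalR1 to evalR4 (random numbers stand in for the real ratings), and `analyse` is a placeholder invented here for whatever function is actually wanted, such as psych's ICC.

```r
set.seed(42)
vars <- paste("evalR", 1:4, sep = "")

# Stand-in for the real 'ratings' data frame (10 subjects, 4 raters).
ratings <- as.data.frame(matrix(rnorm(40), ncol = 4,
                                dimnames = list(NULL, vars)))

# Placeholder analysis: mean pairwise correlation between the chosen raters.
# Swap in ICC() or any other function that takes a matrix/data frame.
analyse <- function(m) {
  cm <- cor(m)
  mean(cm[upper.tri(cm)])
}

# All 3-combinations of the column names, plus the full set of 4.
sets <- c(combn(vars, 3, simplify = FALSE), list(vars))

# ratings[, v] selects columns by NAME, so no paste/eval tricks are needed.
results <- sapply(sets, function(v) analyse(ratings[, v]))
names(results) <- sapply(sets, paste, collapse = "-")
results
```

The same pattern should carry over to the scoreR and opinionR columns by changing the prefix passed to paste().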
Re: [R] Unexpected behavior in friedman.test and ks.test
Alex,

It's mainly speculation, as I cannot check either the Excel add-in or Vassar, but I'll give it a try.

For the Friedman test: The results of R coincide with those reported by Hollander & Wolfe, which I'd take as a point in favor of R. In any case, my guess is that ties are handled differently (average ranks in R), but you'd have to check with the documentation of WinSTAT and Vassar. If it is not documented, see what test statistic you'd get manually under each way of handling ties.

For the ks.test: See ?ks.test for the meaning of the exact argument of this function. I'd assume that Excel gives you the asymptotic p value only, while R will by default return an exact one for 32 samples. From the same help page: "Otherwise, asymptotic distributions are used whose approximations may be inaccurate in small samples." You could check using something like ks.test(myData$f1_A, "pnorm", exact=FALSE). If that doesn't resolve the issue: do the KS test (semi-)manually, which should not be that difficult (even in Excel, if need be), and compare the D value with the one obtained from R and Excel, respectively.

HTH,
Michael

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of atconsta-rh...@yahoo.com
Sent: Tuesday, 8 September 2009 15:33
To: R-help@r-project.org
Subject: [R] Unexpected behavior in friedman.test and ks.test

> I have to start by saying that I am new to R, so I might miss something crucial here. It seems to me that the results of friedman.test and ks.test are wrong. Now, obviously, the first thing which crossed my mind was "it can't be, this is a package used by so many, someone should have observed", but I can't figure out what it might be.
>
> Problem: let's start with friedman.test. I have a lot of data to analyze and I grew sick and tired of clicking and selecting in Excel (for which we have a statistics add-in purchased; don't start to flame me on using Excel for stats, please!), so I wanted to automate the analysis in R and figured out the results differ from Excel.
>
> Example: Take the data from example(friedman.test) (Hollander & Wolfe (1973), p. 140ff.). I ran the example in R and got:
>
> Friedman rank sum test
> data: RoundingTimes
> Friedman chi-squared = 11.1429, df = 2, p-value = 0.003805
>
> Same data, in Excel, using WinSTAT for Excel (Fitch software), gives:
>
> Friedman chi-squared = 10.6364, df = 2, p-value = 0.004902
>
> Puzzled, I entered the data in the calculator from Vassar (http://faculty.vassar.edu/lowry/fried3.html) and got exactly the same values as in Excel (and, again, different from R). Admittedly, the differences are not large, and both fall below the 0.05 threshold, but, still. So, question 1 would be: why is R different from both Excel and Vassar?
>
> Now to the Kolmogorov-Smirnov test, from which my ordeal actually started: the results from ks.test are wildly different from the ones I got with the Excel add-in. Basically, I have 32 sets of observations (patients) for 100 independent variables (different blood analyses). The question was whether the data is normally distributed for each of the analyses and, hence, whether I can apply a parametric test or not. Once I had loaded the data in a data frame (and it looks as expected), I ran:
>
> ks.test(myData$f1_A, "pnorm")
> ks.test(myData$f8_A, "pnorm")
>
> They give p-values of 2.2e-16 (with ties) and 8.882e-16. The Excel add-in gives p-values of 0.0074491 and 0.2730477, respectively. Here the difference is serious: either highly significant non-normality for both f1 and f8 (R), or one non-normal and one normal (the add-in). I first thought that the difference might arise from different probability distributions (but what else, if not pnorm). Then I ran the Friedman test, only to find similar discrepancies.
>
> I'd really appreciate some input on this: what's wrong, and how should I decide whom to trust?
>
> Many thanks in advance,
> Alex
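The guess about ties can be checked by hand. The sketch below computes the Friedman statistic twice, once without and once with the usual correction for ties. The RoundingTimes matrix is retyped here from example(friedman.test), so please compare it against your own copy of the help page; `Q_plain`, `Q_ties` and `tie_sum` are names made up for the illustration.

```r
# Data retyped from example(friedman.test); verify against the help page.
RoundingTimes <- matrix(c(5.40, 5.50, 5.55,
                          5.85, 5.70, 5.75,
                          5.20, 5.60, 5.50,
                          5.55, 5.50, 5.40,
                          5.90, 5.85, 5.70,
                          5.45, 5.55, 5.60,
                          5.40, 5.40, 5.35,
                          5.45, 5.50, 5.35,
                          5.25, 5.15, 5.00,
                          5.85, 5.80, 5.70,
                          5.25, 5.20, 5.10,
                          5.65, 5.55, 5.45,
                          5.60, 5.35, 5.45,
                          5.05, 5.00, 4.95,
                          5.50, 5.50, 5.40,
                          5.45, 5.55, 5.50,
                          5.55, 5.55, 5.35,
                          5.45, 5.50, 5.55,
                          5.50, 5.45, 5.25,
                          5.65, 5.60, 5.40,
                          5.70, 5.65, 5.55,
                          6.30, 6.30, 6.25),
                        nrow = 22, byrow = TRUE,
                        dimnames = list(1:22,
                                        c("Round Out", "Narrow Angle", "Wide Angle")))

n <- nrow(RoundingTimes)   # blocks
k <- ncol(RoundingTimes)   # treatments

# Rank within each block; tied values get average ranks.
ranks <- t(apply(RoundingTimes, 1, rank))
R <- colSums(ranks)

# Textbook statistic WITHOUT a correction for ties.
Q_plain <- 12 / (n * k * (k + 1)) * sum(R^2) - 3 * n * (k + 1)

# Correction for ties: sum of (t^3 - t) over the tie groups in every block.
tie_sum <- sum(apply(ranks, 1, function(r) { t <- table(r); sum(t^3 - t) }))
Q_ties <- Q_plain / (1 - tie_sum / (n * k * (k^2 - 1)))

round(c(plain = Q_plain, ties = Q_ties), 4)
round(pchisq(c(Q_plain, Q_ties), df = k - 1, lower.tail = FALSE), 6)
friedman.test(RoundingTimes)$statistic
```

On these data the uncorrected statistic comes out at about 10.6364 (p = 0.004902), matching the Excel add-in and Vassar figures, while the tie-corrected one is about 11.1429 (p = 0.003805), matching R. That would be consistent with the two tools simply not correcting for the four blocks that contain ties, though their documentation is the place to confirm it.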
[R] output layout question
Win7, R-2.9.1

Dear list,

I have a large (n x m) matrix which is the output of an analysis. Since it is so large, I would like to use output formatting to make it easier to find particular values and patterns in the matrix. In particular, I want to print the matrix as follows:

- cells with a value that is below a given threshold should be printed as empty (thus leaving out the low values from sight)
- cells with particular values should be printed in bold face or with a larger font size

Is this possible? How would one do such a thing?

thanks,
Roger
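For the first request, one possible approach is to convert the matrix to character with format() and blank out the small cells before printing. This is only a sketch with made-up data and thresholds; the console cannot print bold text as far as I know, so the second request is approximated here by appending an asterisk to the cells of interest.

```r
set.seed(1)
m <- matrix(round(runif(20), 2), nrow = 4,
            dimnames = list(paste("r", 1:4, sep = ""),
                            paste("c", 1:5, sep = "")))

threshold <- 0.5   # hypothetical cut-off: hide everything below it
highlight <- 0.9   # hypothetical cut-off: flag everything above it

shown <- format(m, nsmall = 2)               # character matrix, same shape as m
shown[m > highlight] <- paste(shown[m > highlight], "*", sep = "")
shown[m < threshold] <- ""                   # blank out the low values

print(shown, quote = FALSE)
```

For real bold face or larger fonts the matrix would have to go to an output format that supports them (HTML or LaTeX, for instance) rather than the console.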
[R] Incomplete time series
Hi list,

I have a data frame with a Date column and a Price column - for example:

Date        Price
01/01/2009  5.45
01/03/2009  6.53
01/04/2009  7.55
01/06/2009  6.76
01/08/2009  4.12
01/18/2009  5.87
...

As you can see, there are days for which I don't have any data. I would like to insert rows for missing dates that have values of NA for Price - for example:

Date        Price
01/01/2009  5.45
01/02/2009  NA
01/03/2009  6.53
01/04/2009  7.55
01/05/2009  NA
01/06/2009  6.76
...

The goal is ultimately to convert Price to a time series and deal with the NAs via the zoo package or something similar. The first step, however, is to add a row for every date. I have considered converting Date to a time series, then using seq() to create a vector with the appropriate number of rows starting at the appropriate number of days since epoch, and then using match() to combine columns and add the desired rows. I'm new to time series in R, but it seems like there should be an easier way. I've gotten as far as Google and rseek can take me, so any help would be appreciated.

Bryan
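A fairly direct way to do this in base R is to build the complete run of dates with seq() and merge the observed prices onto it. The sketch below assumes the dates are month/day/year (the 01/18/2009 entry suggests so) and uses `prices`, `all_days` and `filled` as illustrative names.

```r
prices <- data.frame(Date  = c("01/01/2009", "01/03/2009", "01/04/2009",
                               "01/06/2009", "01/08/2009", "01/18/2009"),
                     Price = c(5.45, 6.53, 7.55, 6.76, 4.12, 5.87),
                     stringsAsFactors = FALSE)

# ASSUMPTION: month/day/year format.
prices$Date <- as.Date(prices$Date, format = "%m/%d/%Y")

# Every calendar day between the first and last observation.
all_days <- data.frame(Date = seq(min(prices$Date), max(prices$Date), by = "day"))

# Left join: keep every day, fill Price with NA where nothing was observed.
filled <- merge(all_days, prices, by = "Date", all.x = TRUE)
head(filled)
```

From there `filled$Price` together with `filled$Date` can be handed to zoo or similar for the NA handling.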
[R] SPSS Statistics-R Integration Plug-In
Dear All,

Has anyone tried to use this plug-in? Since I am running R-2.9.1 it will not even let me install it. Further, since I am running Windows I cannot use the R provided R-2.7.0 Linux installation file from the archive (tried to install it through cygwin and it was a mess). Suggestions? Ideas? Has anybody used this plug-in?

Michael

--
Michael Chajewski, M.A.
Department of Psychology
Fordham University
Dealy Hall Room 239
441 East Fordham Road
Bronx, NY 10458
(718) 817-0654
http://www.chajewski.com
[R] Fitting a linear model with a break point
Hello,

I would like to test some data to see whether it has the shape of a step function (i.e. y1 up until x_th and then y2, where x_th is the threshold). The threshold x_th is unknown and the x values can only take discrete values (0,1,2,3,4). An example would be:

data <- data.frame(x=1:20, y=c(rnorm(10), rnorm(10,10)))

I was thinking along the lines of fitting some sort of piecewise linear model which has the gradient constrained to zero, trying out all possible thresholds and taking the one with the smallest residuals. I am not sure how to implement this in R. Anyone got any ideas? Also, is there a way of including the threshold in the actual model, so that it could be estimated too?

Thanks
Dan

--
Daniel Brewer, Ph.D.
Institute of Cancer Research
Molecular Carcinogenesis
Email: daniel.bre...@icr.ac.uk
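The "try every threshold" idea described above can be sketched in a few lines of base R. This is just one way to do it, assuming the step model y = y1 for x <= x_th and y = y2 otherwise; `candidates`, `rss` and `best` are names chosen for the example.

```r
set.seed(123)
data <- data.frame(x = 1:20, y = c(rnorm(10), rnorm(10, 10)))

# Candidate thresholds: every observed x except the largest, so that both
# sides of the step contain at least one point.
candidates <- sort(unique(data$x))
candidates <- candidates[-length(candidates)]

# Fit a two-level (zero-gradient) model for each threshold and record the
# residual sum of squares.
rss <- sapply(candidates, function(th) {
  fit <- lm(y ~ I(x > th), data = data)
  sum(resid(fit)^2)
})

best <- candidates[which.min(rss)]        # threshold with the smallest RSS
fit_best <- lm(y ~ I(x > best), data = data)
coef(fit_best)                            # intercept = y1, slope term = y2 - y1
```

Because the threshold is chosen by minimizing over the same data, the usual p-values from summary(fit_best) should be read with caution; packages such as strucchange deal with that properly.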
Re: [R] Averaging rows if a condition is true.
Thanks a lot, Mohamed.

Kind regards,
Ezhil

--- On Mon, 9/7/09, Mohamed Lajnef <mohamed.laj...@inserm.fr> wrote:

From: Mohamed Lajnef <mohamed.laj...@inserm.fr>
Subject: Re: [R] Averaging rows if a condition is true.
To: A Ezhil <ezhi...@yahoo.com>
Cc: r-help@r-project.org
Date: Monday, September 7, 2009, 9:22 PM

> Hi,
>
> Try to use the aggregate function:
>
> RSiteSearch("aggregate")  # for help
>
> Regards
> ML
>
> A Ezhil wrote:
> > Dear All,
> >
> > I have a matrix (5 X 60) of subjects and their responses to a set of questions. All responses are classified into categories (500). I would like to average all subjects' responses for each category. I wrote some code using a for loop but it is not working. Could you please tell me what's wrong with the code? I guess there is an elegant R way of doing the same thing. Thanks in advance.
> >
> > Kind regards,
> > Ezhil
> >
> > j <- 1;
> > n <- dim(dat)[1];
> > cat <- as.character(dat[,1]);
> > row <- matrix(nrow=nrow(dat), ncol=ncol(dat));
> > for(i in 1:n-1) {
> >   if(cat[i] != cat[i+1]) {row[j, ] <- dat[j, ]}
> >   else { start <- j; end <- i; }
> >   row[j, ] <- colMeans(dat[j:i, ]);
> >   j+1;
> > }
>
> --
> Mohamed Lajnef
> INSERM Unité 955
> 40 rue de Mesly
> 94000 Créteil
> E-mail: mohamed.laj...@inserm.fr
> tel.: 01 49 81 31 31 (ext. 18470)
> Sec: 01 49 81 32 90
> fax: 01 49 81 30 99
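A small sketch of the aggregate() suggestion, on made-up data. It assumes, as the loop in the question does, that the first column of `dat` holds the category and the remaining columns hold one subject each; the column names are invented.

```r
set.seed(7)
dat <- data.frame(category = rep(c("A", "B", "C"), times = c(2, 3, 1)),
                  matrix(sample(1:5, 6 * 4, replace = TRUE), nrow = 6,
                         dimnames = list(NULL, paste("subj", 1:4, sep = ""))))

# Average every subject column within each category.
means <- aggregate(dat[, -1], by = list(category = dat$category), FUN = mean)
means
```

The result has one row per category, and categories that occur only once are simply returned unchanged, so no special case is needed.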
Re: [R] Data separated by spaces, getting data into R using field lengths
OK, I think that I have to give up and try to get this data separated by some character. It seems pretty much impossible to separate those fields. Thanks for your help and efforts.

-L

2009/9/8 Lauri Nikkinen <lauri.nikki...@iki.fi>:
> This is the file (see the attachment) that represents the problem I'm facing with the original file. I'm looking for some generic way to solve this problem. Thank you for your time.
>
> -L
>
> 2009/9/8 Barry Rowlingson <b.rowling...@lancaster.ac.uk>:
> > On Tue, Sep 8, 2009 at 1:52 PM, Lauri Nikkinen <lauri.nikki...@iki.fi> wrote:
> > > But this is not the solution I was looking for. Thanks.
> >
> > I think the only way you'll get the solution you are looking for is if you can let us have a copy of the original input file, or at least the first few lines - and not pasted into an email, because special characters like spaces and tabs get smushed up and confuse things.
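For the record, when the field lengths really are fixed and known, read.fwf() from utils reads such a file without needing a separator. The sketch below writes a tiny made-up file first; the widths (14, 3, 10) and column names are assumptions for the example and say nothing about Lauri's actual file, which was never shown on the list.

```r
# Build a small fixed-width file: name (14 chars), age (3 chars), city (10 chars).
lines <- sprintf("%-14s%-3s%-10s",
                 c("John Smith", "Anna Lee"),
                 c("23", "31"),
                 c("Helsinki", "New York"))
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# Fields are cut by position, so spaces inside a field do no harm.
dat <- read.fwf(tmp, widths = c(14, 3, 10),
                col.names = c("name", "age", "city"),
                strip.white = TRUE, stringsAsFactors = FALSE)
dat
unlink(tmp)
```

This only helps if every record uses the same column positions; if the widths vary from line to line, getting the data exported with a proper delimiter is indeed the saner route.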
[R] Re : R-crash when loading workspace - Windows
Hi,

Actually there is no more trouble with R version 2.9.2. I was using 2.5.0 and really needed to update!

Thanks anyway,
Edwige

From: William Dunlap <wdun...@tibco.com>
Sent: Tuesday, 8 September 2009, 18:12:59
Subject: RE: [R] R-crash when loading workspace - Windows

> Could you put the offending workspace file on a website and also tell us the version of R that you were using? Then someone could try reproducing the problem and see what is going on. (There was a bug in reading such files that was fixed a few months ago.)
>
> Bill Dunlap
> TIBCO Software Inc - Spotfire Division
> wdunlap tibco.com
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of
> Sent: Monday, September 07, 2009 7:35 AM
> To: r-help@r-project.org
> Subject: [R] R-crash when loading workspace - Windows
>
> > Dear all,
> >
> > One day when I tried to load an existing workspace (when opening R or by load()), R crashed without any error notification. The day before I had worked and saved my workspace without any trouble. At first I thought it was a memory problem (the workspace had reached 180 MB) or related to a particular script or command, so I started a new workspace. Everything was OK, with that script and others working. Then I saved the workspace (55 MB) and tried to open it, without any result: R crashed without any notification again. This occurs only with Windows. Does someone know how to solve this problem?
> >
> > Regards,
> > Edwige.
[R] makefile for sweave
Hello,

I have the following makefile. The problem is that the bibliography doesn't work. Any help would be appreciated! I really don't know what to do. :-(

# The sources of the report (tex, Rnw and other files (e.g. bib, idx))
TEX_CMPS = Report problem
RNW_CMPS = prop1 prop2 ExeExps
OTHER = Report.bib

# The name of the report to produce
all: Report.pdf

code: $(RNW_CMPS:=.R)

clean:
	rm -f *.log *.dvi *~

# On what does the report depends?
Report.pdf: $(TEX_CMPS:=.tex) $(RNW_CMPS:=.tex) ${OTHER} makefile
	TEXINPUTS=${TPUTS} pdflatex $<
	TEXINPUTS=${TPUTS} pdflatex $<
	rm *.log
	#mv *.aux $(dir $<)

# How to build the tex files from the Rnw (Sweave) files
%.tex: %.Rnw
	echo "library(utils); options(width=60); Sweave('$<')" | ${R_PRG} --no-save --vanilla
	mv $(notdir $*.tex) $(dir $<)

# How to build the R code files from the Rnw (Sweave) files
%.R: %.Rnw
	echo "library(utils); Stangle('$<')" | ${R_PRG} --no-save --vanilla

%.bib:
	TEXINPUTS=${TPUTS} pdflatex $<
	bibtex $<

%.aux:
	TEXINPUTS=${TPUTS} pdflatex $<
	bibtex $<

%.idx:
	TEXINPUTS=${TPUTS} pdflatex $<
	makeindex $<

cheers!
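One thing that stands out in the makefile is that the Report.pdf rule runs pdflatex twice but never runs bibtex, and a BibTeX bibliography normally needs the sequence pdflatex, bibtex, pdflatex, pdflatex. As a sketch, that sequence can be driven from R itself; `latex_steps` and `run_steps` are helper names invented here, and actually running them assumes a TeX installation and a Report.tex in the working directory.

```r
# The usual command sequence for a document with a BibTeX bibliography.
latex_steps <- function(stem) {
  c(paste("pdflatex", stem),   # writes the .aux file with the citation keys
    paste("bibtex", stem),     # builds the .bbl from the .aux and .bib files
    paste("pdflatex", stem),   # pulls the bibliography into the document
    paste("pdflatex", stem))   # resolves the remaining cross-references
}

# Run the steps in order and stop at the first failure.
run_steps <- function(stem) {
  for (cmd in latex_steps(stem)) {
    status <- system(cmd)
    if (status != 0) stop("command failed: ", cmd)
  }
  invisible(TRUE)
}

steps <- latex_steps("Report")
steps
# run_steps("Report")              # needs pdflatex/bibtex on the PATH
# tools::texi2pdf("Report.tex")    # ready-made alternative that reruns
#                                  # latex and bibtex as often as needed
```

The same four commands could equally be placed in the recipe of the Report.pdf rule in place of the two pdflatex lines.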
[R] optim() argument scoping: passing parameter values into user's subfunction
Dear useRs,

I have a complicated function to be optimized with optim(), whose parameters are passed to another function within its evaluation. This function allows the parameters to enter as arguments to various probability distribution functions. However, I am violating some scoping convention, as somewhere within the hierarchy of calls a variable is not visible. I will give a genericized example here.

myFxn <- function(parms, Y, phi, other args) {body of function}

### I want to optimize this over its first argument
optim(par=numeric(2), fn=myFxn,
      ### end of named args, next are all in ...
      Y=data, other args,
      phi=function(r) pnorm(r, mean=parms[2], sd=parms[1]) )

Error in pnorm(r, mean = parms[2], sd = parms[1]) : object 'parms' not found

> debugger()   ## options(error=expression(dump.frames())) in .Rprofile
Message: Error in pnorm(r, mean = parms[2], sd = parms[1]) : object 'parms' not found
Available environments had calls:
1: optim(par = numeric(2), fn = myFxn, Y=data, other args, ..
2: function (par)
3: fn(par, ...)
4: ifelse(logical vector, phi(Y), other stuff within myFxn
5: phi(Y)
6: pnorm(r, mean = parms[2], sd = parms[1])

Now, when using the debugger in environments 1 and 2, I can see the value of 'par' correctly. I cannot access environment 4, as it just returns the original error message. Trying to access environment 3 gives the (to me) cryptic 'Error in get(.obj, envir = dump[[.selection]]) : argument ... is missing, with no default' and returns to the top level without debugging.

I will try to explain to the best of my ability what I think is happening here. Environments 2 and 3 are from the first lines of optim(), where it is building an internal function to evaluate the candidate parameter values. When accessing environment 3, it seems that when it fills out the ... argument of fn(), it is passing phi=function(r) pnorm(r, mean=parms[2], sd=parms[1]), but upon trying to evaluate the variable 'parms', it cannot see it in the search path.
When actually running the original call, 'parms' is apparently not evaluated yet, but is once the pnorm call is hit. It appears the 'parms' variable is being evaluated before the fn(par) is evaluated into myFxn(parms=par). A point which is probably, but not certainly, irrelevant: myFxn() has ... as its final argument, so as to pass tuning arguments to integrate(). The function being integrated contains phi(), as well as other stuff, of a dummy variable. The calls in the debugging tree, however, are *not* those involving integrate(). It would probably be possible to include some substitute/eval/as.name/etc. constructions within the function myFxn, in order to avoid this problem, but as myFxn already involves numeric integrations whose integrand involves the optimized parameters themselves, computing on the language at each step of the optimization seems like a bad idea. My question: is there a straightforward and efficient way of constructing this function and optimizing it with optim(), allowing for the argument phi to pass an arbitrary distribution function whose parameters are the global ones being optimized? In particular, in the third environment of the debugger tree, is there a way to force the fn(par, ...) myFxn(parms=par, ...) evaluation before the ... get evaluated? Thanks, John John Szumiloski, Ph.D. Senior Biometrician Biometrics Research WP53B-120 Merck Research Laboratories P.O. 
Box 0004 West Point, PA 19486-0004 (215) 652-7346 (PH) (215) 993-1835 (FAX) # obligatory session info Windows XP 2002 sp3, customized by corporate IT version _ platform i386-pc-mingw32 arch i386 os mingw32 system i386, mingw32 status Patched major 2 minor 9.2 year 2009 month 09 day05 svn rev49600 language R version.string R version 2.9.2 Patched (2009-09-05 r49600) search() [1] .GlobalEnvpackage:geometry package:datasets mylib1 mylib2 [6] package:VGAM package:stats4package:Design package:Hmisc package:boot [11] package:splines package:MASS package:nnet package:utils package:stats [16] package:graphics package:grDevices mylib3 mylib4 mylib5 [21] mylib6 package:methods Autoloads package:base
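One possible way around the scoping problem described in this post is to make the parameters an explicit argument of phi, so that myFxn hands the current candidate values to phi itself. The anonymous function in the post is created at top level, so R's lexical scoping looks for 'parms' in the global environment, not inside myFxn. The sketch below is only an illustration under assumptions: the body of myFxn (a least-squares fit of phi to the empirical distribution function) and the simulated data are invented for the example; only the names myFxn, parms, Y and phi come from the post.

```r
# Hypothetical objective: squared distance between the empirical CDF of Y
# and the distribution function phi evaluated at the candidate parameters.
# The real body of myFxn is not shown in the post.
myFxn <- function(parms, Y, phi) {
  if (parms[1] <= 0) return(Inf)      # reject non-positive scale values
  emp <- ecdf(Y)(Y)                   # empirical CDF at the data points
  sum((emp - phi(Y, parms))^2)        # phi receives the parameters explicitly
}

# phi now takes the parameters as its second argument, so nothing has to be
# looked up outside the function. As in the post, parms[1] is the sd and
# parms[2] is the mean.
phi <- function(r, parms) pnorm(r, mean = parms[2], sd = parms[1])

set.seed(42)
Y <- rnorm(200, mean = 3, sd = 2)     # simulated data, for illustration only

fit <- optim(par = c(1, 0), fn = myFxn, Y = Y, phi = phi)
fit$par                               # estimates of (sd, mean)
```

Any other distribution function can be swapped in by passing a different two-argument phi, for example function(r, parms) plogis(r, location = parms[2], scale = parms[1]).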
Re: [R] Data separated by spaces, getting data into R using fiel
Lauri,

Having looked at your example file and examined its byte-by-byte content, I find it is a plain ASCII file with exactly the same layout as you originally posted. This, along with the field-width information you originally supplied, is not sufficient to determine a unique decomposition into fields. See my previous reply!

I think I have to go along with Barry here: in my view, no further progress is possible without seeing an excerpt (the first few lines, or a few lines that cause the problem) from the *original* file. Even then, no further progress may be possible!

And, just to make sure of things, do not use a copy-and-paste method of extracting the sample lines. As Barry points out, it is possible for a tab to get copied as a space. So the best way is to make a copy of the original file and use a text editor to delete unwanted lines from the copy, so that the bytes in the sample file are a subset of the bytes from the original, and not something they have been translated into.

And, by the way, what operating system are you using?

Ted.

On 08-Sep-09 15:47:55, Lauri Nikkinen wrote:
This is the file (see the attachment) that represents the problem I'm facing with the original file. I'm looking for some generic way to solve this problem. Thank you for your time. -L

2009/9/8 Barry Rowlingson b.rowling...@lancaster.ac.uk:
On Tue, Sep 8, 2009 at 1:52 PM, Lauri Nikkinen lauri.nikki...@iki.fi wrote:
But this is not the solution I was looking for. Thanks.

I think the only way you'll get the solution you are looking for is if you can let us have a copy of the original input file, or at least the first few lines - and not pasted into an email because special characters like spaces and tabs get smushed up and confuse things.
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 08-Sep-09 Time: 17:12:12 -- XFMail --
[R] Confusion on use of permTS() in 'perm'
Consider the following simple example using R-2.9.0 and 'perm' 2.9.1:

  require('perm')
  p <- c(15,21,26,32,39,45,52,60,70,82)
  g <- c('y','n','y','y', rep('n',6))   # Patients ranked 1,3,4 receive treatment
  permTS(p ~ g, alternative = 'two.sided', method='exact.ce')   # find p-value by complete enumeration

          Exact Permutation Test (complete enumeration)
  data:  p by g
  p-value = 0.05
  alternative hypothesis: true mean of g=n minus mean of g=y is not equal to 0
  sample estimates:
  mean of g=n minus mean of g=y
                       28.38095

The permutation observed is '134', which has a rank sum of 8. Other permutations with rank sums of 8 or less are '123', '124' and '125'. So there are a total of 4 out of choose(10,3) = 120 possible, giving a one-tail p-value of 4/120 = 0.0333, or a 2-tail p-value of 2*4/120 = 0.067. This is not, however, what permTS() returns. The permTS() value of 0.05 appears to correspond to 3 patterns, not 4. Am I misunderstanding how to solve this simple problem, or is something going on with permTS() that I'm missing?

Thanks.

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: r...@lcfltd.com Least Cost Formulations, Ltd. URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239 Fax: 757-467-2947 Vere scire est per causas scire
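The counting argument in this post can be checked by brute force in base R. The sketch below enumerates all choose(10, 3) = 120 possible treatment groups twice: once summing ranks, as in the post, and once summing the observed values of p. It assumes, as the "mean of g=n minus mean of g=y" line in the output suggests, that the permutation statistic is built from the observed values and not from ranks, and that the two-sided p-value is twice the smaller tail; neither assumption has been checked against the 'perm' source.

```r
p <- c(15, 21, 26, 32, 39, 45, 52, 60, 70, 82)
treated <- c(1, 3, 4)                      # patients ranked 1, 3, 4 were treated

n_splits <- choose(length(p), length(treated))   # 120 possible treatment groups

# Tail count on the rank scale: groups whose rank sum is <= 1 + 3 + 4 = 8
rank_sums <- combn(length(p), length(treated), sum)
n_rank <- sum(rank_sums <= sum(treated))

# Tail count on the original scale: groups whose value sum is <= 15 + 26 + 32 = 73
value_sums <- combn(p, length(treated), sum)
n_value <- sum(value_sums <= sum(p[treated]))

# Two-sided p-values under the "twice the smaller tail" assumption
c(rank_based = 2 * n_rank / n_splits, value_based = 2 * n_value / n_splits)
```

On the value scale the group '125' sums to 15 + 21 + 39 = 75, which exceeds the observed 73, so only '123', '124' and '134' are at least as extreme. That gives 2 * 3/120 = 0.05, which agrees with the p-value printed above.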
Re: [R] Data separated by spaces, getting data into R using field lengths
On Sep 8, 2009, at 12:00 PM, Lauri Nikkinen wrote: Ok, I think that I have to give up and try to get this data separated by some char. It seem pretty much impossible to separate those fields. Thanks for your help and efforts. The solution that Henrique offered seems to be a complete one: read.table(textConnection(gsub(([0-9]+), ;\\1;, DF12 This is an example 1 This + DF12 This is an 1232 This is + DF14 This is 12334 This is an + DF15 This 23 This is an example + )), sep = ;) V1 V2 V3V4 V5 1 DF 12 This is an example 1This 2 DF 12 This is an 1232 This is 3 DF 14 This is 12334 This is an 4 DF 15This 23 This is an example Verus what you wanted... structure(list(V1 = structure(c(1L, 1L, 1L, 1L), .Label = DF, class + = factor), +V2 = c(12L, 12L, 14L, 15L), V3 = structure(c(4L, 3L, 2L, +1L), .Label = c(This, This is, This is an, This is an example +), class = factor), V4 = c(1L, 1232L, 12334L, 23L), V5 = + structure(1:4, .Label = c(This, +This is, This is an, This is an example), class = + factor)), .Names = c(V1, + V2, V3, V4, V5), class = data.frame, row.names = c(NA, + -4L)) V1 V2 V3V4 V5 1 DF 12 This is an example 1 This 2 DF 12 This is an 1232This is 3 DF 14This is 12334 This is an 4 DF 15 This23 This is an example Unless you can be any clearer ... than you have been to this hour. -L 2009/9/8 Lauri Nikkinen lauri.nikki...@iki.fi: This is the file (see the attachment) that represents the problem I'm facing with the original file. I'm looking for some generic way to solve this problem. Thank you for your time. -L 2009/9/8 Barry Rowlingson b.rowling...@lancaster.ac.uk: On Tue, Sep 8, 2009 at 1:52 PM, Lauri Nikkinenlauri.nikki...@iki.fi wrote: But this is not the solution I was looking for. Thanks. I think the only way you'll get the solution you are looking for is if you can let us have a copy of the original input file, or at least the first few lines - and not pasted into an email because special characters like spaces and tabs get smushed up and confuse things. 
David Winsemius, MD Heritage Laboratories West Hartford, CT
Re: [R] Data separated by spaces, getting data into R using field lengths
This is the file (see the attachment) that represents the problem I'm facing with the original file. I'm looking for some generic way to solve this problem. Thank you for your time. -L 2009/9/8 Barry Rowlingson b.rowling...@lancaster.ac.uk: On Tue, Sep 8, 2009 at 1:52 PM, Lauri Nikkinenlauri.nikki...@iki.fi wrote: But this is not the solution I was looking for. Thanks. I think the only way you'll get the solution you are looking for is if you can let us have a copy of the original input file, or at least the first few lines - and not pasted into an email because special characters like spaces and tabs get smushed up and confuse things. DF12 This is an example 1 This DF12 This is an 1232 This is DF14 This is 12334 This is an DF15 This 23 This is an example __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] barplot with lines instead of bars
I'm sorry, but I think I was misunderstood. What I need is something like this: http://img525.imageshack.us/img525/2818/imagemyu.jpg Lines instead of bars Thanks! Rafael. ONKELINX, Thierry wrote: Here is a solutions using ggplot2 and reshape library(reshape) library(ggplot2) data - data.frame(k = 0:3, fk = c(11, 20,7,2), f0k = c(13.72, 17.64, 7.56, 1.08), fkest = c(11.85, 17.78, 8.89, 1.48)) Molten - melt(data, id.vars = k) ggplot(Molten, aes(x = k, y = value, colour = variable)) + geom_line() HTH, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 thierry.onkel...@inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -Oorspronkelijk bericht- Van: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] Namens Rafael Moral Verzonden: dinsdag 8 september 2009 16:45 Aan: r-help Onderwerp: [R] barplot with lines instead of bars Dear useRs, I want to plot the following barplot with lines instead of bars. Is there a way? data - data.frame(cbind(k = 0:3, fk = c(11, 20,7,2), f0k = c(13.72, 17.64, 7.56, 1.08), fkest = c(11.85, 17.78, 8.89, 1.48))) d - t(data[,2:4]) barplot(d, beside=TRUE) Regards, Rafael. [[elided Yahoo spam]] [[alternative HTML version deleted]] Druk dit bericht a.u.b. niet onnodig af. Please do not print this message unnecessarily. 
Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document. The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.
Re: [R] Mantel test least square line
swertie v_coudrain at voila.fr writes:

Hello, I performed a Mantel test and plotted community similarities. I would like to add a least squares line. I thought about using abline, taking as slope the r-statistic of the Mantel test and calculating the y-intercept analytically. Is this method correct? Is there any function for this calculation? Thank you

If you have a Mantel statistic for two dist() objects (as produced by dist(), as.dist() or compatible functions), you can just use

  abline(lm(ydist ~ xdist))

because a dist object is a vector with some extra attributes. Of course, this does not quite make sense, since distances do not have a least squares fit in any reasonable sense. People do this all the time, though (ecologists, and aquatic ecologists in particular, I mean).

Cheers, Jari Oksanen
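A small sketch of the abline(lm(...)) suggestion, using made-up data (the matrix 'sites' and its split into two sets of columns are invented for the example). It also shows why the Mantel r cannot be used directly as the slope: for a simple regression the slope equals r multiplied by sd(y)/sd(x), so the two only coincide when both sets of distances have the same standard deviation. The plotting calls are wrapped in interactive() so the block also runs in a batch session.

```r
set.seed(1)
sites <- matrix(rnorm(40), nrow = 10)     # made-up data: 10 sites, 4 variables

xdist <- dist(sites[, 1:2])               # e.g. environmental distances
ydist <- dist(sites[, 3:4])               # e.g. community dissimilarities

x <- as.vector(xdist)                     # drop the dist attributes explicitly
y <- as.vector(ydist)

fit <- lm(y ~ x)                          # ordinary least squares on the distances
r <- cor(x, y)                            # the (Pearson) Mantel statistic

if (interactive()) {
  plot(x, y, xlab = "xdist", ylab = "ydist")
  abline(fit)                             # least squares line
}

# Slope of the fitted line versus the correlation rescaled by the two sds
c(slope = unname(coef(fit)[2]), r_rescaled = r * sd(y) / sd(x))
```

The caveat in the reply still applies: the pairwise distances are not independent observations, so the line is a visual aid and its standard errors should not be interpreted.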
Re: [R] Data separated by spaces, getting data into R using fiel
On 08-Sep-09 16:17:00, David Winsemius wrote: On Sep 8, 2009, at 12:00 PM, Lauri Nikkinen wrote: Ok, I think that I have to give up and try to get this data separated by some char. It seem pretty much impossible to separate those fields. Thanks for your help and efforts. The solution that Henrique offered seems to be a complete one: read.table(textConnection(gsub(([0-9]+), ;\\1;, DF12 This is an example 1 This + DF12 This is an 1232 This is + DF14 This is 12334 This is an + DF15 This 23 This is an example + )), sep = ;) V1 V2 V3V4 V5 1 DF 12 This is an example 1This 2 DF 12 This is an 1232 This is 3 DF 14 This is 12334 This is an 4 DF 15This 23 This is an example Surely the above solution is ad-hoc? It is based on an assumption that the fields alternate Text/Num/Text/Num/Text (hence the gsub usage), and does not at all make use of the field-width information varlength - c(2, 2, 18, 5, 18). It simply puts a ; separator at the start and end of every sequence of digits. If that is how Lauri's data really are organised, then the solution could work. But, if not, ... Ted. Verus what you wanted... structure(list(V1 = structure(c(1L, 1L, 1L, 1L), .Label = DF, class + = factor), +V2 = c(12L, 12L, 14L, 15L), V3 = structure(c(4L, 3L, 2L, +1L), .Label = c(This, This is, This is an, This is an example +), class = factor), V4 = c(1L, 1232L, 12334L, 23L), V5 = + structure(1:4, .Label = c(This, +This is, This is an, This is an example), class = + factor)), .Names = c(V1, + V2, V3, V4, V5), class = data.frame, row.names = c(NA, + -4L)) V1 V2 V3V4 V5 1 DF 12 This is an example 1 This 2 DF 12 This is an 1232This is 3 DF 14This is 12334 This is an 4 DF 15 This23 This is an example Unless you can be any clearer ... than you have been to this hour. -L 2009/9/8 Lauri Nikkinen lauri.nikki...@iki.fi: This is the file (see the attachment) that represents the problem I'm facing with the original file. I'm looking for some generic way to solve this problem. Thank you for your time. 
-L 2009/9/8 Barry Rowlingson b.rowling...@lancaster.ac.uk: On Tue, Sep 8, 2009 at 1:52 PM, Lauri Nikkinenlauri.nikki...@iki.fi wrote: But this is not the solution I was looking for. Thanks. I think the only way you'll get the solution you are looking for is if you can let us have a copy of the original input file, or at least the first few lines - and not pasted into an email because special characters like spaces and tabs get smushed up and confuse things. David Winsemius, MD Heritage Laboratories West Hartford, CT E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 08-Sep-09 Time: 17:39:27 -- XFMail --
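For comparison with the gsub() approach discussed in this message, here is a sketch that does use the field-width information varlength <- c(2, 2, 18, 5, 18) via read.fwf(). It rests on an assumption the thread never confirmed, namely that the original file really is fixed width with exactly those widths. The sample lines are rebuilt with sprintf() so that every field is padded to its stated width; they are a reconstruction of the example, not the original file.

```r
varlength <- c(2, 2, 18, 5, 18)

# Rebuild the example rows as genuinely fixed-width text (45 characters each).
# Text fields are left-aligned, numeric fields right-aligned; this alignment
# is an assumption.
lines <- sprintf("%-2s%2d%-18s%5d%-18s",
                 "DF",
                 c(12L, 12L, 14L, 15L),
                 c("This is an example", "This is an", "This is", "This"),
                 c(1L, 1232L, 12334L, 23L),
                 c("This", "This is", "This is an", "This is an example"))
stopifnot(all(nchar(lines) == sum(varlength)))

con <- textConnection(lines)
dat <- read.fwf(con, widths = varlength,
                strip.white = TRUE,          # trim the padding from text fields
                stringsAsFactors = FALSE)
close(con)

dat
```

If the real file pads fields differently, or uses tabs, the widths will not line up and the columns will be cut in the wrong places, which is why the replies keep asking for an untouched excerpt of the original file.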
[R] Count number of different patterns (Polytomous variable)
Hi there,

Does anyone know a method to calculate the number of different patterns in a given data frame? The variables are polytomous, not binary (for the latter I found a package called countpattern, which unfortunately only works for binary variables).

  V1 V2 V3
   0  3  1
   1  2  0
   1  2  0

So, in this case, I would like to get 2 as output.

Thanks in advance
Jürgen
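A minimal base R sketch for the question in this post, using the three example rows. unique() on a data frame drops duplicated rows, so the number of distinct patterns is the number of rows that remain; pasting each row into a key and tabulating additionally shows how often each pattern occurs. The name df and the "-" separator are choices made for the example.

```r
df <- data.frame(V1 = c(0, 1, 1),
                 V2 = c(3, 2, 2),
                 V3 = c(1, 0, 0))

# Number of distinct row patterns
n_patterns <- nrow(unique(df))

# Frequency of each pattern, keyed by the pasted row values
pattern_counts <- table(do.call(paste, c(df, sep = "-")))

n_patterns
pattern_counts
```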
[R] cbind formula definition
Hi there,

I have the following problem: I have a package called poLCA which has the following syntax: poLCA(formula, data) and needs the following formula definition: formula <- cbind(V1,V2,V3,...)~1. So far so good. What I tried now was the following:

  # Get data with the read.table function
  data <- read.table("d:/ .")
  # Select cols to use in the analysis
  aktDat <- data[2:15]
  # Get the names
  names(aktDat)
  # Put them together in one string, comma as separation sign
  bi <- paste(names(aktDat), collapse=",")
  # Use this string in the formula to bind the variables
  formula <- cbind(bi)~1
  # Calculate the model
  poLCA(formula, data)

But this doesn't work. I get the following message:

  Warning in model.matrix.default(formula, mframe) : variable 'cbind(bi)' converted to a factor

Could anyone help me?

Thanks
Greetings
Jürgen
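The attempt in this post fails because cbind(bi) binds a single character string; the variable names inside the string are never interpreted as variables. One common way to handle this is to build the whole formula as text and convert it with as.formula(). The sketch below uses a small made-up data frame in place of the file read in the post, and the poLCA() call is left as a comment because the package is not loaded here.

```r
# Made-up stand-in for data[2:15] from the post
aktDat <- data.frame(V1 = c(1, 2, 1),
                     V2 = c(2, 2, 1),
                     V3 = c(1, 3, 2))

# Names joined by commas, as in the post
bi <- paste(names(aktDat), collapse = ",")

# Build the text "cbind(V1,V2,V3) ~ 1" and turn it into a formula object
f <- as.formula(paste0("cbind(", bi, ") ~ 1"))
f

# Hypothetical use, not run here:
# poLCA(f, data = aktDat)
```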
Re: [R] using an array of strings with strsplit, issue when including a space in split criteria
SOLVED. Thanks to a reply off-list it appears that the 'space' in published 11 is actually some kind of multibyte character. If I physically delete the 'space' and replace it by using the spacebar on my keyboard, then strsplit() behaves as expected. I had got the text from a hyperlink and copy and pasted it into R. It did not occur to me that the 'spaces' might be something else. However I am surprised that it worked in the first instance for both of the kind posters above. Perhaps i'm just unluky with the local settings on my Vista PC :S Cheers everyone, much appreciated! Tony On 8 Sep, 11:57, Tony Breyal tony.bre...@googlemail.com wrote: UPDATE: I'm not sure why, but on my Windows XP 64bit machine, I ran the same code again and this time it is not working even though it worked previously. This has been done using the Rgui --vanilla command. x - c(Weekly sales figures to 30 August 2008 published 5 September, Weekly sales figures to 6 September 2008 published 11 September) strsplit(x, 'published ', fixed=TRUE) [[1]] [1] Weekly sales figures to 30 August 2008 [2] 5 September [[2]] [1] Weekly sales figures to 6 September 2008 published 11 September O/S: Windows XP 64bit Pro; Service Pack 2 sessionInfo() R version 2.9.2 (2009-08-24) i386-pc-mingw32 locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States. 1252;LC_MONETARY=English_United States. 1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base On 8 Sep, 09:47, Tony Breyal tony.bre...@googlemail.com wrote: After further investigation it appears that the problem is specific to my Vista PC. I am able to get the correct results using R 2.9.2 on a Window XP 64bit machine. However i do not know why this does not work on my Vista PC. The following was done after rebooting Vista. From CMD.exe I ran the following line: C:\Program Files\R\R-2.9.2\binRgui --vanilla This opened up R. 
### R 2.9.2 START ### txt - c(sales to 23 August 2008 published 29 August, + sales to 6 September 2008 published 11 September) strsplit(txt, 'published', fixed=TRUE) [[1]] [1] sales to 23 August 2008 29 August [[2]] [1] sales to 6 September 2008 11 September strsplit(txt, 'published ', fixed=TRUE) [[1]] [1] sales to 23 August 2008 29 August [[2]] [1] sales to 6 September 2008 published 11 September sessionInfo() R version 2.9.2 (2009-08-24) i386-pc-mingw32 locale: LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base ### R 2.9.2 END ### The exact same thing happened when I used R 2.9.0 and R 2.8.1 on this same vista computer. ### R 2.9.0 ### sessionInfo() R version 2.9.0 (2009-04-17) i386-pc-mingw32 locale: LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252 attached base packages: [1] stats graphics grDevices datasets utils methods base other attached packages: [1] rcom_2.1-3 rscproxy_1.3-1 loaded via a namespace (and not attached): [1] tools_2.9.0 ### R 2.8.1 ### sessionInfo() R version 2.8.1 (2008-12-22) i386-pc-mingw32 locale: LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base my computer details are: Windows Vista Ultimate Service Pack 1 Manufacturer: Dell Rating: 3.4 Processor: Intel Core 2 Duo CPU E6750 @ 2.66 GHz Memory (RAM): 4.00 GB System type: 32-bit Operating System 2009/9/8 Gabor Grothendieck ggrothendi...@gmail.com: I am using the exact same version of R as you also on Vista but can't reproduce your result. For me it splits properly. 
Try starting R like this (modify path if needed) from the Windows cmd line: \Program Files\R\R-2.9.2\bin\Rgui --vanilla and then try it. On Mon, Sep 7, 2009 at 11:40 AM, Tony Breyaltony.bre...@googlemail.com wrote: Dear all, I'm having a problem understanding why a split does not occur with in the 2nd use of the function strsplit below: # text strings txt - c(sales to 23 August 2008 published 29 August, + sales to 6 September 2008 published 11 September) # first use strsplit(txt, 'published', fixed=TRUE) [[1]] [1] sales to 23 August 2008 29 August [[2]] [1] sales to 6 September 2008 11 September # second use, but with a space ' ' in the split strsplit(txt, 'published ', fixed=TRUE) [[1]] [1] sales to 23 August
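The behaviour resolved at the top of this thread can be reproduced and diagnosed in a few lines. The sketch below assumes the invisible character was a no-break space (U+00A0), which is common in text copied from web pages; the thread only says "some kind of multibyte character", so that is a guess. utf8ToInt() exposes the code points, and any value above 127 marks a non-ASCII character.

```r
x <- c("figures to 30 August 2008 published 5 September",
       "figures to 6 September 2008 published\u00a011 September")  # no-break space

# The second element does not split, because "published " ends in a plain space
strsplit(x, "published ", fixed = TRUE)

# Positions of non-ASCII characters in each string
suspect <- lapply(x, function(s) which(utf8ToInt(s) > 127))
suspect

# Replace the no-break space by an ordinary space, then split again
x_clean <- gsub("\u00a0", " ", x, fixed = TRUE)
parts <- strsplit(x_clean, "published ", fixed = TRUE)
parts
```

This would also explain why other posters could not reproduce the problem: the special character may well have been converted to an ordinary space when the example was sent by e-mail.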
[R] strange results in summary and IQR functions
Dear R users,

Something is strange in summary and IQR. Suppose I have a data set and I would like to find Q1, Q2, Q3 and the IQR.

  x <- c(2,4,11,12,13,15,31,31,37,47)
  summary(x)
     Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     2.00   11.25   14.00   20.30   31.00   47.00
  IQR(x)
  [1] 19.75

However, I tested the same data set in SAS proc univariate, and SAS shows that Q1=11, Q2=14 and Q3=31. I think most of us agree that Q1 is 11, not 11.25. Could someone please explain to me why R shows Q1=11.25 and not 11?

Many Thanks
Tu
[R] Mapping factors to a new set of factors
Hello,

I am trying to map a factor variable within a data frame to a new variable whose entries are derived from the content of the original variable, with fewer levels in the new variable. That is, I'm trying to set up a surjection. I first thought that this would be a common operation with a quite simple interface, but I cannot seem to find one, nor any similar posts on this topic (please correct me if there is something). Therefore, I have written a function to perform this mapping. However, the function doesn't seem to work with vectors of length greater than 1, and as such is useless. Is there any way to ensure the function works appropriately for each element of the vector input?

  mapLN <- function(x) {
    Reg <- levels(df$Var1)
    if (x==Reg[1] | x==Reg[2] | x==Reg[13] | x==Reg[17] | x==Reg[20] | x==Reg[23] | x==Reg[27]) {"North"}
    else if (x==Reg[3] | x==Reg[5] | x==Reg[7] | x==Reg[14] | x==Reg[15] | x==Reg[24] | x==Reg[30]) {"East"}
    else if (x==Reg[4] | x==Reg[6] | x==Reg[8] | x==Reg[9] | x==Reg[11] | x==Reg[16] | x==Reg[18] | x==Reg[21] | x==Reg[22] | x==Reg[25] | x==Reg[28] | x==Reg[29] | x==Reg[31]) {"West"}
    else if (x==Reg[10] | x==Reg[12] | x==Reg[19] | x==Reg[26] | x==Reg[32]) {"South"}
    else stop("Not in original set")
  }

Many thanks,
James
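The function in this post breaks on vectors because if() needs a single TRUE or FALSE. A vectorised alternative in base R is to reassign the levels of a copy of the factor with a named list: each list name is a new level, and its elements are the old levels that collapse into it. The level names below are invented for illustration, since the post only refers to levels by position.

```r
# Made-up factor standing in for df$Var1
region <- factor(c("Aberdeen", "Dundee", "Fife", "Borders", "Dundee"))

# New level = vector of old levels that map onto it (hypothetical grouping)
lookup <- list(North = "Aberdeen",
               East  = c("Dundee", "Fife"),
               South = "Borders")

area <- region            # copy, so the original factor is kept
levels(area) <- lookup    # many-to-one relabelling in a single step

data.frame(region, area)
```

Old levels that do not appear anywhere in the list become NA, which plays the role of the stop("Not in original set") branch and can be checked with anyNA(area).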
Re: [R] extracting pvalues from ttest
okay fixed it by putting c in quote marks. 1Rnwb wrote: Hello, I am using B as a vector to store all the t.tests. since i am a newbie to both R and statistics I am not sure if B is the list. Also I see c used in the do.call formula and do not know what it is being used for. I used aggregate but getting this error Error in aggregate.data.frame(eo, eo$PlateID, function(.sub) t.test(ENA78 ~ : 'by' must be a list. As i mention in my OP that I am using is function from an earlier post. any help is appreciated. Thanks Sharad Jun Shen-3 wrote: I never used by. Is B a list? If not, I am not sure if lapply can take it. Try aggregate(). On Fri, Aug 28, 2009 at 10:53 AM, 1Rnwb sbpuro...@gmail.com wrote: Hello list, I have a similar issue as this post http://tolstoy.newcastle.edu.au/R/e6/help/09/04/11438.html#options2 and I used the suggestion provided by Jorge with modifications to my data do.call(c,lapply(your_list_with_the_t_tests,function(x) x$p.value)) but I am getting the following error after excuting the code B-by(eo,eo$PlateID, function(.sub) t.test(mcp1~Self_T1D,data=.sub, na.rm=T)) #ttest platewise do.call(c,lapply(B, function(x) x$p.value)) Error in do.call(c, lapply(B, function(x) x$p.value)) : 'what' must be a character string or a function here B is equal to your_list_with_the_t_tests. is something i am doing wrong -- View this message in context: http://www.nabble.com/extracting-pvalues-from-ttest-tp25192381p25192381.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
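For readers following this thread, here is a self-contained sketch of the by() plus p-value extraction pattern, using simulated data (the values in 'eo' are made up; only the column names come from the thread). A plausible reason for the original error, though not confirmed in the thread, is that an object named c in the workspace masked the function c(); do.call("c", ...) looks the function up by name and so avoids the clash. Using sapply() avoids do.call() altogether.

```r
set.seed(1)
# Simulated stand-in for the poster's data: two plates, two groups per plate
eo <- data.frame(PlateID  = rep(c("P1", "P2"), each = 20),
                 Self_T1D = rep(c("yes", "no"), times = 20),
                 mcp1     = rnorm(40))

# One t-test per plate
B <- by(eo, eo$PlateID, function(.sub) t.test(mcp1 ~ Self_T1D, data = .sub))

# Extract the p-values: either form returns a named numeric vector
pvals  <- sapply(B, function(x) x$p.value)
pvals2 <- do.call("c", lapply(B, function(x) x$p.value))

pvals
```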
Re: [R] strange results in summary and IQR functions
It's all simply a matter of definitions, and there are many who disagree. See ?quantile, specifically the type argument. Since IQR does not appear to have a type argument, you could easily write your own versions of these that do what SAS does (assuming that is your goal).

With x defined as you have it, look at the results of this function call, which shows the different values for quantile that you get by using different type arguments.

  sapply(1:9, function(y) quantile(x, type = y))
       [,1] [,2] [,3] [,4] [,5]  [,6]  [,7]     [,8]    [,9]
  0%      2    2    2  2.0    2  2.00  2.00  2.00000  2.0000
  25%    11   11    4  7.5   11  9.25 11.25 10.41667 10.5625
  50%    13   14   13 13.0   14 14.00 14.00 14.00000 14.0000
  75%    31   31   31 31.0   31 32.50 31.00 31.50000 31.3750
  100%   47   47   47 47.0   47 47.00 47.00 47.00000 47.0000

Best,
Erik Iverson

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Chunhao Tu
Sent: Tuesday, September 08, 2009 10:09 AM
To: r-help@r-project.org
Subject: [R] strange results in summary and IQR functions

Dear R users,

Something is strange in summary and IQR. Suppose I have a data set and I would like to find Q1, Q2, Q3 and the IQR.

  x <- c(2,4,11,12,13,15,31,31,37,47)
  summary(x)
     Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     2.00   11.25   14.00   20.30   31.00   47.00
  IQR(x)
  [1] 19.75

However, I tested the same data set in SAS proc univariate, and SAS shows that Q1=11, Q2=14 and Q3=31. I think most of us agree that Q1 is 11, not 11.25. Could someone please explain to me why R shows Q1=11.25 and not 11?

Many Thanks
Tu
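As a follow-up to the reply above, this sketch picks out the quantile type that reproduces the SAS values reported in the question. For these data type = 2 gives Q1 = 11, Q2 = 14 and Q3 = 31; whether that is the definition SAS uses in general is an assumption here, not something the thread establishes. The interquartile range is computed directly from quantile() so that the sketch does not depend on how IQR() is defined in a given R version.

```r
x <- c(2, 4, 11, 12, 13, 15, 31, 31, 37, 47)
probs <- c(0.25, 0.5, 0.75)

q_default <- quantile(x, probs = probs)            # R's default, type = 7
q_type2   <- quantile(x, probs = probs, type = 2)  # matches the SAS output quoted above

# Interquartile range under the type 2 definition
iqr_type2 <- unname(q_type2["75%"] - q_type2["25%"])

rbind(type7 = q_default, type2 = q_type2)
iqr_type2
```

With n = 10 and p = 0.25, type 7 interpolates at position 1 + (n - 1)p = 3.25, between the third and fourth ordered values (11 and 12), which gives 11.25. Type 2 takes the third ordered value, 11, because np = 2.5 is not an integer.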
Re: [R] barplot with lines instead of bars
What's the difference between a line and a thin bar? Hadley On Tue, Sep 8, 2009 at 12:17 PM, rafamoralrafa_moral2...@yahoo.com.br wrote: I'm sorry, but I think I was misunderstood. What I need is something like this: http://img525.imageshack.us/img525/2818/imagemyu.jpg Lines instead of bars Thanks! Rafael. ONKELINX, Thierry wrote: Here is a solutions using ggplot2 and reshape library(reshape) library(ggplot2) data - data.frame(k = 0:3, fk = c(11, 20,7,2), f0k = c(13.72, 17.64, 7.56, 1.08), fkest = c(11.85, 17.78, 8.89, 1.48)) Molten - melt(data, id.vars = k) ggplot(Molten, aes(x = k, y = value, colour = variable)) + geom_line() HTH, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 thierry.onkel...@inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -Oorspronkelijk bericht- Van: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] Namens Rafael Moral Verzonden: dinsdag 8 september 2009 16:45 Aan: r-help Onderwerp: [R] barplot with lines instead of bars Dear useRs, I want to plot the following barplot with lines instead of bars. Is there a way? data - data.frame(cbind(k = 0:3, fk = c(11, 20,7,2), f0k = c(13.72, 17.64, 7.56, 1.08), fkest = c(11.85, 17.78, 8.89, 1.48))) d - t(data[,2:4]) barplot(d, beside=TRUE) Regards, Rafael. [[elided Yahoo spam]] [[alternative HTML version deleted]] Druk dit bericht a.u.b. niet onnodig af. 
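Hadley's remark about thin bars can be tried in base graphics as well. The following is only a sketch, not something posted in the thread: it draws each series as vertical lines with type = "h", shifted slightly so the three series sit side by side at each k. The names dat, series and offsets, and the 0.15 shift, are illustrative assumptions.

```r
# Data from the original post
dat <- data.frame(k = 0:3,
                  fk = c(11, 20, 7, 2),
                  f0k = c(13.72, 17.64, 7.56, 1.08),
                  fkest = c(11.85, 17.78, 8.89, 1.48))

series  <- c("fk", "f0k", "fkest")
# Small horizontal shifts so the lines for one k do not overlap (assumed spacing)
offsets <- seq(-0.15, 0.15, length.out = length(series))

# Empty plotting region, then one set of vertical lines per series
plot(NA, xlim = range(dat$k) + c(-0.5, 0.5), ylim = c(0, max(dat[series])),
     xlab = "k", ylab = "frequency", xaxt = "n")
axis(1, at = dat$k)
for (i in seq_along(series)) {
  lines(dat$k + offsets[i], dat[[series[i]]], type = "h", col = i, lwd = 2)
}
legend("topright", legend = series, col = seq_along(series), lwd = 2)
```

Increasing lwd moves the picture back towards an ordinary grouped barplot, which is the point of the question at the top of the reply.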
[R] Ada package question
Hi,

I am using ada to predict a data set with 36 variables:

ada(x ~ ., data = train, iter = Iter, control = rpart.control(maxdepth = 4, cp = -1, minsplit = 0, xval = 0))

Can anyone tell me, in layman's terms:

maxdepth - how do you set this, and how do you change it to improve prediction success?
cp - same question.
minsplit - same question.

How do I change all these parameters to my advantage?

Help greatly appreciated,
pc
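No reply to this question appears here, so the following is only a hedged sketch of what the three settings control, based on the rpart.control documentation. maxdepth caps the depth of each tree (the root counts as depth 0), minsplit is the minimum number of observations a node must hold before a split is attempted, and cp is the minimum improvement a split must bring; a negative cp means no split is rejected on that ground. The sketch uses rpart directly on its bundled kyphosis data rather than ada, and minsplit = 2 rather than the 0 in the post; the helpers grow, n_leaves and tree_depth are made up for illustration.

```r
library(rpart)  # recommended package shipped with R; provides the kyphosis data

# Grow a classification tree with a given depth cap (assumed settings, see above)
grow <- function(depth) {
  rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class",
        control = rpart.control(maxdepth = depth, cp = -1, minsplit = 2, xval = 0))
}

# Terminal nodes are labelled "<leaf>" in the fitted tree's frame
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

# Row names of the frame are node numbers; node n sits at depth floor(log2(n))
tree_depth <- function(fit) max(floor(log2(as.numeric(rownames(fit$frame)))))

stump  <- grow(1)  # at most one split
deeper <- grow(4)  # the depth used in the post

c(stump_leaves = n_leaves(stump), deeper_leaves = n_leaves(deeper))
c(stump_depth = tree_depth(stump), deeper_depth = tree_depth(deeper))
```

Which values predict best depends on the data; a common approach is to try a few settings and compare error on held-out observations rather than on the training set.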
Re: [R] Data separated by spaces, getting data into R using fiel
Well, yeah, Henrique's solution works fine with this data. Thanks for that, although it is not the generic solution I was looking for. As I originally posted, I was looking for a solution that uses the field-width information, as Ted pointed out. But as I already mentioned, it seems that this is quite impossible to achieve. Thanks anyway.

2009/9/8 Ted Harding <ted.hard...@manchester.ac.uk>:

On 08-Sep-09 16:17:00, David Winsemius wrote:

On Sep 8, 2009, at 12:00 PM, Lauri Nikkinen wrote:

Ok, I think that I have to give up and try to get this data separated by some char. It seems pretty much impossible to separate those fields. Thanks for your help and efforts.

The solution that Henrique offered seems to be a complete one:

read.table(textConnection(gsub("([0-9]+)", ";\\1;", "DF12 This is an example 1 This
DF12 This is an 1232 This is
DF14 This is 12334 This is an
DF15 This 23 This is an example
")), sep = ";")

  V1 V2                 V3    V4                 V5
1 DF 12 This is an example     1               This
2 DF 12         This is an  1232            This is
3 DF 14            This is 12334         This is an
4 DF 15               This    23 This is an example

Surely the above solution is ad hoc? It is based on an assumption that the fields alternate Text/Num/Text/Num/Text (hence the gsub usage), and does not at all make use of the field-width information varlength <- c(2, 2, 18, 5, 18). It simply puts a ";" separator at the start and end of every sequence of digits. If that is how Lauri's data really are organised, then the solution could work. But, if not, ...

Ted.

Versus what you wanted...

structure(list(V1 = structure(c(1L, 1L, 1L, 1L), .Label = "DF", class = "factor"),
    V2 = c(12L, 12L, 14L, 15L),
    V3 = structure(c(4L, 3L, 2L, 1L), .Label = c("This", "This is", "This is an", "This is an example"), class = "factor"),
    V4 = c(1L, 1232L, 12334L, 23L),
    V5 = structure(1:4, .Label = c("This", "This is", "This is an", "This is an example"), class = "factor")),
    .Names = c("V1", "V2", "V3", "V4", "V5"), class = "data.frame", row.names = c(NA, -4L))

  V1 V2                 V3    V4                 V5
1 DF 12 This is an example     1               This
2 DF 12         This is an  1232            This is
3 DF 14            This is 12334         This is an
4 DF 15               This    23 This is an example

Unless you can be any clearer ... than you have been to this hour.

-L

2009/9/8 Lauri Nikkinen <lauri.nikki...@iki.fi>:

This is the file (see the attachment) that represents the problem I'm facing with the original file. I'm looking for some generic way to solve this problem. Thank you for your time.

-L

2009/9/8 Barry Rowlingson <b.rowling...@lancaster.ac.uk>:

On Tue, Sep 8, 2009 at 1:52 PM, Lauri Nikkinen <lauri.nikki...@iki.fi> wrote:

But this is not the solution I was looking for. Thanks.

I think the only way you'll get the solution you are looking for is if you can let us have a copy of the original input file, or at least the first few lines - and not pasted into an email, because special characters like spaces and tabs get smushed up and confuse things.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 08-Sep-09 Time: 17:39:27
-- XFMail --
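For reference, the field-width route Ted mentions would look roughly like the sketch below if the file really were padded to the stated widths. That padding is an assumption: the thread never establishes it, and the attachment is not available here. The sample lines are rebuilt with sprintf from the four rows shown above, and read.fwf from the utils package does the splitting; lines and parsed are illustrative names.

```r
# Widths quoted in the thread
varlength <- c(2, 2, 18, 5, 18)

# Rebuild fixed-width lines from the four rows shown in the thread
# (assumes text fields are left-aligned and numbers right-aligned)
lines <- sprintf("%-2s%2d%-18s%5d%-18s",
                 "DF",
                 c(12L, 12L, 14L, 15L),
                 c("This is an example", "This is an", "This is", "This"),
                 c(1L, 1232L, 12334L, 23L),
                 c("This", "This is", "This is an", "This is an example"))
stopifnot(all(nchar(lines) == sum(varlength)))

# Split purely by position; strip.white removes the padding around text fields
parsed <- read.fwf(textConnection(lines), widths = varlength,
                   strip.white = TRUE, stringsAsFactors = FALSE)
str(parsed)
```

If the real file has lost its padding, as Barry suspects can happen when text is pasted into an email, no choice of widths will recover the fields and a pattern-based split like Henrique's is the only option.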
Re: [R] Ellipse: Major Axis Minor Axis
Vishal <vishalps at gmail.com> writes:

Jari, thanks for the quick answer.

sqrt(eigen(cov.trob(mydataforellipse)$cov)$values)

What will this return? For my data, I get:

sqrt(eigen(cov.trob(r)$cov)$values)
[1] 1.857733e-05 4.953181e-06

Is the left-hand value the major or the semi-major length? I also tried to plot a circle using this as the radius/diameter, but I don't get a circle that intersects the major axis of the ellipse.

Vishal,

Then you probably do not have an ellipse directly defined by the robust covariance matrix; it is scaled up by some statistic. Check the code of the function (or, in a happy case, its documentation) to see what the scaling factor is. I checked by drawing an ellipse directly from the cov.trob data, and both circles fit nicely.

Cheers,
Jari Oksanen
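Jari's point about scaling can be checked numerically. The sketch below is built on assumptions: it uses simulated data and plain cov() in place of cov.trob, and it takes sqrt(qchisq(0.95, 2)) as the scaling statistic, which is one common choice for a 95% ellipse but not necessarily what Vishal's plotting function uses. The square roots of the eigenvalues are the semi-axis lengths of the unscaled ellipse, and the drawn ellipse has semi-axes equal to those values times the scale.

```r
set.seed(1)
# Simulated point cloud (illustrative, not the poster's data)
xy <- cbind(rnorm(200, sd = 3), rnorm(200, sd = 1))

S  <- cov(xy)                  # swap in MASS::cov.trob(xy)$cov for the robust version
ev <- eigen(S)
semi_axes <- sqrt(ev$values)   # semi-major, then semi-minor, of the unscaled ellipse

scale <- sqrt(qchisq(0.95, df = 2))  # assumed scaling statistic

# Boundary of the scaled ellipse, centred at the origin for simplicity
theta    <- seq(0, 2 * pi, length.out = 361)
circle   <- rbind(cos(theta), sin(theta))
boundary <- t(ev$vectors %*% (scale * semi_axes * circle))

# Distance of each boundary point from the centre
radius <- sqrt(rowSums(boundary^2))
c(largest = max(radius), semi_major = scale * semi_axes[1])
c(smallest = min(radius), semi_minor = scale * semi_axes[2])
```

A circle of radius scale * semi_axes[1] therefore touches the ellipse at the ends of the major axis, while a circle of radius semi_axes[1] alone falls short whenever the scale is greater than 1.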
Re: [R] Count number of different patterns (Polytomous variable)
Dear Jürgen,

If x is your data, here are two suggestions:

# Suggestion 1
length(unique(apply(x, 1, paste, sep = "", collapse = "")))
# [1] 2

# Suggestion 2
res <- as.data.frame(xtabs(~ ., as.data.frame(x)))
dim(res[res$Freq > 0, ])[1]
# [1] 2

HTH,
Jorge

2009/9/8 Biedermann, Jürgen <juergen.biederm...@charite.de>

Hi there,

Does anyone know a method to calculate the number of different patterns in a given data frame? The variables are of polytomous type and not binary (for the latter I found a package called countpattern, which unfortunately only works for binary variables).

V1 V2 V3
0  3  1
1  2  0
1  2  0

So, in this case, I would like to get 2 as output.

Thanks in advance,
Jürgen
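One caveat about Suggestion 1, offered as a side note rather than as part of Jorge's reply: pasting with an empty collapse string can merge distinct rows once any value has more than one digit. The two-row matrix below is a made-up example chosen to show the collision; using a separator that cannot occur in the data avoids it.

```r
# Two different rows that collapse to the same string "123" without a separator
x <- rbind(c(1, 23),
           c(12, 3))

n_no_sep <- length(unique(apply(x, 1, paste, collapse = "")))   # rows look identical
n_sep    <- length(unique(apply(x, 1, paste, collapse = "_")))  # rows stay distinct

c(without_separator = n_no_sep, with_separator = n_sep)
```

For Jürgen's example, where every value is a single digit, both variants give 2.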
Re: [R] Count number of different patterns (Polytomous variable)
try this:

DF <- data.frame(V1 = c(0, 1, 1), V2 = c(3, 2, 2), V3 = c(1, 0, 0))
DF
nrow(unique(DF))

I hope it helps.

Best,
Dimitris

Biedermann, Jürgen wrote:

Hi there,

Does anyone know a method to calculate the number of different patterns in a given data frame? The variables are of polytomous type and not binary (for the latter I found a package called countpattern, which unfortunately only works for binary variables).

V1 V2 V3
0  3  1
1  2  0
1  2  0

So, in this case, I would like to get 2 as output.

Thanks in advance,
Jürgen

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center
Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
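If the frequency of each pattern is wanted as well as the number of patterns, the same idea extends naturally. This is a small sketch going slightly beyond the question; pattern_key and pattern_counts are illustrative names, and the "_" separator is an assumption that the character does not occur in the data.

```r
DF <- data.frame(V1 = c(0, 1, 1), V2 = c(3, 2, 2), V3 = c(1, 0, 0))

# Number of distinct rows, as in Dimitris's reply
n_patterns <- nrow(unique(DF))

# One key per row, built column-wise, then tabulated
pattern_key    <- do.call(paste, c(DF, sep = "_"))
pattern_counts <- table(pattern_key)

n_patterns
pattern_counts
```

Here the key "1_2_0" is counted twice and "0_3_1" once, matching the two patterns in the example.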