Re: [R] CPCA?
Daniel Berner mail.mcgill.ca> writes: > > It would be great to know if and where an R code for Common Principal Component Analysis is available. The only code I know of is in the IMSL Fortran library, which is proprietary and expensive. After a quick look, it appears that the IMSL C library does not have the code; you have to go to Fortran. Look for function DKPRIN (KPRIN). That code cannot be ported to R, naturally. Another issue is that the code obviously was written by Bernhard Flury, who also has a FORTRAN listing in his book "Common Principal Components and Related Multivariate Models" (J. Wiley & Sons, 1988). I don't know the licensing status of that code. I didn't find any explicit licensing information in the book (but may have overlooked something), apart from the usual book copyright that forbids everything. I assume it is the same code that is the basis of DKPRIN in the IMSL FORTRAN library. Porting it would mean a lot of typing at minimum, and for any sensible code, editing it to use LAPACK wherever possible. Further, Flury describes the algorithm in his book, and it could be implemented independently in R. I think nobody has done that (yet). cheers, jari oksanen __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Scatterplot Showing All Points
Wayne Aldo Gavioli fas.harvard.edu> writes: > > > Hello all, > > I'm trying to graph a scatterplot of a large set (5,000 x,y coordinates) of data > with the caveat that many of the data points overlap with each other (share the > same x AND y coordinates). In using the usual "plot" command, > > > plot(education, xlab="etc", ylab="etc") > > it seems that the overlap of points is not shown in the graph. Namely, there > are 5,000 points that should be plotted, as I mentioned above, but because so > many of the points overlap with each other exactly, only about 50-60 points are > actually plotted on the graph. Thus, there's no indication that Point A shares > its coordinates with 200 other pieces of data and thus is very common while > Point B doesn't share its coordinates with any other pieces of data and thus > isn't common at all. Is there any way to indicate the frequency of such points > on such a graph? Should I be using a different command than "plot"? > > One suggestion seems to be still missing: 'sunflowerplot' of base R. May look shaggy, though, if you have 200 "petals". Actually the documentation of sunflowerplot is wrong in the botanical sense. Sunflowers have composite flowers in capitula, and the things called 'petals' in the documentation are ligulate, sterile ray-florets (each with vestigial petals which are not easily visible in the sunflower, but in some other species you may see three (occasionally two) teeth). cheers, jari oksanen
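A small sketch of the sunflowerplot idea on invented data with heavy overplotting (all object names are made up); the symbols() variant may scale better when a single point is repeated 200 times:

```r
## Toy data: only 25 distinct (x, y) pairs among 5000 points
set.seed(1)
x <- sample(1:5, 5000, replace = TRUE)
y <- sample(1:5, 5000, replace = TRUE)

## sunflowerplot() draws one "petal" per duplicated point
sunflowerplot(x, y)

## Alternative: size a circle by the count of identical pairs
cnt <- subset(as.data.frame(table(x, y)), Freq > 0)
with(cnt, symbols(as.numeric(as.character(x)),
                  as.numeric(as.character(y)),
                  circles = sqrt(Freq), inches = 0.2))
```

Passing sqrt(Freq) as the radius makes the circle area roughly proportional to the count.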
Re: [R] Error on distance matrix
Marc Moragues scri.ac.uk> writes: > > Hi, > > I am trying to calculate a distance matrix on a binary data frame using > dist.binary() {ade4}. This is the code I run and the error I get: > > > sjlc.dist <- dist.binary(as.data.frame(data), method=2) #D = (a+d) / > (a+b+c+d) > Error in if (any(df < 0)) stop("non negative value expected in df") : > missing value where TRUE/FALSE needed > > I don't know if the problem is the missing values in my data. If so, how > can I handle them? > Dear Marc Moragues, At least adding NA to a data.frame gave the same error message as you report above. Odds are good that NA is responsible (but we cannot know: we can only guess). Further, it seems that ade4:::dist.binary does not have an option to handle NA input. The problem here is: what do you think should be done with NA? Should you get an NA result? Should the whole observation be removed because of the NA? Should the comparisons be based on pairwise omission of NA, meaning that index entries are based on different data in the same matrix? Or should you impute some values for the missing entries (which is fun but tricky)? One solution is to use function designdist in vegan, where you can with some acrobatics design your own dissimilarity indices. Function designdist uses different notation, because its author hates that misleading and dangerous 2x2 contingency table notation. The following, however, seems to define the same index as ade4: designdist(data, "sqrt(1-(2*J+P-A-B)/P)") See the documentation of vegan:::designdist to see how to define things there (the sqrt(1-x) part comes from the way ade4 changes similarities to dissimilarities). BTW, don't call your data 'data'. R wisdom (see fortunes) tells you that you do not call your dog dog, but I'm not quite sure of this. At least in yesterday's horse races in national betting, one of the winning horses was called 'Animal', so why not... 
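A minimal sketch of that designdist() route on toy binary data (the matrix is invented; terms = "binary" asks designdist for the 2x2-table quantities, where a = J, a+b = A, a+c = B and a+b+c+d = P, so (a+d)/P = (2*J+P-A-B)/P):

```r
library(vegan)

set.seed(2)
dat <- matrix(rbinom(80, 1, 0.5), nrow = 8)  # toy binary data, no NAs

## ade4's method 2 is D = sqrt(1 - (a+d)/(a+b+c+d));
## in designdist notation: J = joint presences, A, B = row totals,
## P = number of columns
d <- designdist(dat, "sqrt(1-(2*J+P-A-B)/P)", terms = "binary")
```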
cheers, jari oksanen
Re: [R] Error on distance matrix
Jari Oksanen oulu.fi> writes: > > Marc Moragues scri.ac.uk> writes: > > > > > Hi, > > > > I am trying to calculate a distance matrix on a binary data frame using > > dist.binary() {ade4}. This is the code I run and the error I get: > > > > > sjlc.dist <- dist.binary(as.data.frame(data), method=2) #D = (a+d) / > > (a+b+c+d) > > Error in if (any(df < 0)) stop("non negative value expected in df") : > > missing value where TRUE/FALSE needed > > > > I don't know if the problem is the missing values in my data. If so, how > > can I handle them? > > > Dear Marc Moragues, > > At least adding NA to a data.frame gave the same error message as you report > above. Odds are good that NA is responsible (but we cannot know: we can only > guess). ...clip... > One solution is to use function designdist in vegan, where you can with some > acrobatics design your own dissimilarity indices. It is a bad habit to comment on yourself, but if you write stupid things, you should say so: that designdist() "handles" missing values is accidental, unintentional and dangerous. You still need to figure out *how* they are handled, and the function should warn users who supply data with missing values. I hope that will be changed. > Function designdist uses > different notation, because its author hates that misleading and dangerous > 2x2 > contingency table notation. The following, however, seems to define the same > index as ade4: > > designdist(data, "sqrt(1-(2*J+P-A-B)/P)") > You can make this cuter: designdist(data, "sqrt((A+B-2*J)/P)") Cheers, Jari Oksanen
[R] png() and pdf() alignment problems
Dear Roland Kaiser, I think there may be different pixel-to-point scaling in different devices. This scaling seems to vary not only with the device but even with your physical screen. This is what happens on my current box (MacBook):

> quartz()
> par("mar")/par("mai")
[1] 6 6 6 6
> par("cra")
[1] 6 12
> png()
> par("mar")/par("mai")
[1] 4.686347 4.686347 4.686347 4.686347
> par("cra")
[1] 13 16
> dev.off()
> pdf()
> par("mar")/par("mai")
[1] 5 5 5 5
> par("cra")
[1] 10.8 14.4
> x11()
> par("mar")/par("mai")
[1] 4.686347 4.686347 4.686347 4.686347
> par("cra")
[1] 13 16

The quotient gives the par margin line height in inches, and par("cra") gives the dimensions of the characters in "rasters" (which may be close to the size in points, but may differ a lot on some screens). For your alignment, I think you want to make the margin proportions equal in the two devices, so you must scale the figures accordingly instead of using physically equal dimensions. The mar/mai ratio and cra do not change with the device width & height, so you must change the width & height so that the figure looks like it should. I have no idea if this works, but I think this still is your problem... Greetings to the Moderator, Jari Oksanen
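One way to sidestep the device-dependent margin line heights is to set the margins in physical units with par("mai") instead of lines, and give both devices the same physical size; a hedged sketch (file names, sizes and resolution are invented):

```r
## Same physical size on both devices, margins given in inches (mai)
## rather than lines (mar), so the layout no longer depends on the
## device's pixel-to-point scaling
png("plot.png", width = 5, height = 5, units = "in", res = 150)
par(mai = c(0.8, 0.8, 0.4, 0.2))
plot(1:10)
dev.off()

pdf("plot.pdf", width = 5, height = 5)
par(mai = c(0.8, 0.8, 0.4, 0.2))
plot(1:10)
dev.off()
```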
Re: [R] adonis (vegan package) and subsetted factors
On 10 Apr 2008, at 18:57, tyler wrote: > On Thu, Apr 10, 2008 at 04:47:44PM +0100, Gavin Simpson wrote: >> >> This behaviour arises from the following, using the in-built dune >> data: >> >>> newdune.env <- subset(dune.env, Management != "NM") >>> newdune.env$Management >> [1] BF SF SF SF HF SF HF HF BF BF HF SF SF HF >> Levels: BF HF NM SF >> >> Notice this hasn't dropped the empty level "NM", and this is what is >> catching out adonis --- it is not checking for empty levels in the >> grouping factor, as this shows: >> >>> newdune <- dune[which(dune.env$Management != "NM"), ] >>> adonis(newdune ~ Management*A1, data=newdune.env, permutations=100) >> >> Call: >> adonis(formula = newdune ~ Management * A1, data = newdune.env, >> permutations = 100) >> >>Df SumsOfSqs MeanSqs F.Model R2 Pr(>F) >> Management 3.0 0.57288 0.19096 1.27735 0.2694 <0.01 *** >> >> For now, forcibly remove empty factor levels as per your second >> example, >> but I'll take a look at fixing adonis() > > Great, thanks! Howdy folks, It is fine that this was taken care of while my wife and I were watching Fellini's 8 1/2 on our laptop. Actually, this same problem recently emerged in varpart and could be lurking elsewhere in vegan. With some luck, this fix would have made its way into the latest minor CRAN release of vegan we had today. Unfortunately the Earth is designed so that you folks over there in Canada lag behind, and you were still asleep when we made our release here in Europe. It seems that CRAN was fast today, and the new version is already available as a source and Windows binary on the main CRAN home. With our faster release cycles this fix will be in CRAN rather soon, and it is available immediately (from our morning, or Canada's late night) in the packages at R-Forge. 
Best wishes, Jari Oksanen
Re: [R] gam and ordination (vegan and labdsv surf and ordisurf)
On Thu, 2008-11-20 at 09:45 -0500, stephen sefick wrote: > #for instance this > ordisurf(bug.4, env.savannah[,"TSS"]+env.savannah[,"TIN.TP"]) > Stephen, According to the ordisurf documentation, this is correct if 'bug.4' is an ordination result, and the sum of those two env.savannah columns is the single variable you want to plot. The independent variables will be the axes found from 'bug.4', and there can be only one response variable, but that can be a sum of several variables. Whether such a model is sensible is up to the user. If you want to plot several y-variates in the same graph instead of their sum, you can use the argument add = TRUE for the second variable (default FALSE), and ordisurf() overlays the fitted surface on an existing plot. Cheers, Jari Oksanen -- Jari Oksanen <[EMAIL PROTECTED]>
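A sketch of that add = TRUE overlay, using vegan's built-in data (the choice of ordination and variables is arbitrary):

```r
library(vegan)
data(varespec)
data(varechem)

ord <- metaMDS(varespec)        # any ordination result works here

ordisurf(ord, varechem$N)       # first surface starts a new plot
ordisurf(ord, varechem$P,       # second surface overlaid on the same plot
         add = TRUE, col = "red")
```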
[R] Plotting two distance matrices
> > I'm trying to plot two distance matrices against each other > (74x74, phylogenetic distance and phenotypic distances). However R > gives an error message: "Error in plot.new() : figure margins too > large". > > Is it because I have too many points to plot? No. It *sounds* like you don't have distance matrices but data frames of distances, because this is the error message that you get. The default plot of a data frame is a pairs() plot of all columns (variables) against each other. You will have a data frame if you read in your distances from an external file (or changed them into a data frame). Try changing your data sets into regular R dist objects, and plot these against each other (as.dist really seems to work on data frames directly): plot(as.dist(d1), as.dist(d2)) This was only a guess, since the message did not contain sufficient information. cheers, jari oksanen
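A self-contained sketch of the guess above, simulating two 74x74 distance tables that arrived as data frames (all data invented):

```r
set.seed(3)
## Simulate distance tables that were read in as data frames
m1 <- as.data.frame(as.matrix(dist(matrix(rnorm(74 * 3), nrow = 74))))
m2 <- as.data.frame(as.matrix(dist(matrix(rnorm(74 * 3), nrow = 74))))

## Plotting the data frames directly would attempt a pairs()-style plot
## of 74 "variables", hence "figure margins too large". Convert first:
plot(as.dist(m1), as.dist(m2))
```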
Re: [R] Canberra distance
Frédéric Chiroleu cirad.fr> writes: > > Hi, > > I misunderstand the definition of Canberra distance in R. > > On the Internet and in the function description pages of dist() from stats and > Dist() from amap, the Canberra distance between vectors x and y, d(x,y), is: > > d(x,y) = sum(abs(x-y)/(x+y)) > > But in use, through simple examples, we find that the formula is: > > d(x,y) = (NZ + 1)/NZ * sum(abs(x-y)/(x+y)) > > with NZ = nb of pairs of coordinates that are different from (0,0) (Non > Zeros) > I think you must try another example. At least in my simple experiments the multiplier seemed to be NZ/NZ, or one, instead of your almost-one, and this was also the documented case. I could not find any difference from the documentation. However, there is a note about "double zeros" (zero denominator and numerator) in the dist documentation. Could that cause some difference? If you really want to know how the distance is calculated, download the R source file and look there. If you want to know how the index was originally suggested to be calculated, you must find the Lance & Williams paper in Aust. Comput. J. 1, 15-20, 1967 (I haven't found it, but would be curious to see it). Cheers, jari oksanen
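A small experiment along these lines, on toy vectors (invented). The double-zero case is where the documented omission of 0/0 terms, together with proportional rescaling of the remaining sum, could produce the multiplier the poster saw:

```r
x <- c(1, 0, 2, 3)
y <- c(2, 1, 0, 3)
sum(abs(x - y) / (x + y))               # manual formula: 1/3 + 1 + 1 + 0
dist(rbind(x, y), method = "canberra")  # no double zeros, should agree

## With a double zero the 0/0 term is omitted; if the remaining sum is
## then scaled up proportionally, one such pair gives a factor of
## (NZ + 1)/NZ relative to the naive sum over non-zero pairs
x2 <- c(0, 1, 2)
y2 <- c(0, 2, 2)
dist(rbind(x2, y2), method = "canberra")
```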
Re: [R] prcomp(X,center=F) ??
Dear Agustin & the Listers, Noncentred PCA is an old and established method. It is rarely used, but still (methinks) it is used more often than it should be. There is nothing wrong with having noncentred PCA in R, and it is a real PCA. Details follow. On 08/03/2009, at 11:07 AM, Agustin Lobo wrote: I do not understand, from a PCA point of view, the option center=F of prcomp() According to the help page, the calculation in prcomp() "is done by a singular value decomposition of the (centered and possibly scaled) data matrix, not by using eigen on the covariance matrix" (as is done by princomp()). "This is generally the preferred method for numerical accuracy" The question is that while prcomp(X,center=T,scaling=F) is equivalent to princomp(X,scaling=F), prcomp(X,center=F) has no equivalent in princomp() princomp() does not offer explicit noncentred analysis because its algorithm is based on the eigen analysis of the covariance matrix using function cov.wt(). While function cov.wt() knows the argument 'center' (which is the US spelling of centre), princomp() does not pass that argument to cov.wt(). However, if you supply a noncentred 'covmat' instead of 'x' (see ?princomp), princomp() will give you a noncentred PCA. We can only guess why the authors of princomp() did not allow noncentred analysis. However, I know why the author of the rda() function in vegan did not explicitly allow noncentred analysis: he thought that it is a Patently Bad Idea(TM). Further, he suggests in the vegan function that if you think you need noncentred analysis, you are able to edit vegan::rda.default() to do so -- if you cannot edit the function, then you probably are wrong in assuming you need noncentred analysis. It may be that the princomp authors think the same way (but they do allow noncentred analysis if you supply a noncentred 'covmat'). One of the sins of my youth was to publish a paper using noncentred PCA. I'm penitent. (However, I committed bigger sins, too.) 
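A hedged sketch of that 'covmat' route (using crossprod(X)/n as the noncentred second-moment matrix is my assumption; princomp() does not check whether the supplied covmat is centred):

```r
X <- as.matrix(USArrests)

## Noncentred "covariance": second moments about the origin, not the mean
cm <- crossprod(X) / nrow(X)

## princomp() happily eigen-decomposes whatever covmat it is given,
## so this is a noncentred PCA (no scores, since 'x' was not supplied)
pc.nc <- princomp(covmat = cm)
pc.nc$sdev
```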
Also, the rotation made with either the eigenvectors of prcomp(X,center=T,scaling=F) or the ones of princomp(X,scaling=F) yields PCs with a minimum correlation, as expected for a PCA. But the rotation made with the eigenvectors of prcomp(X,center=F) yields axes that are correlated. Therefore, prcomp(X,center=F) is not really a PCA. PCA axes are *orthogonal*, but not necessarily uncorrelated (people often mistake these for synonyms). Centred orthogonal axes also are uncorrelated, but noncentred orthogonal axes are (or may be) correlated. Your example code may be simplified a bit: prcomp returns the rotated matrix, so you do not need the rotation %*% data multiplication after the analysis: you get it in the analysis. Below I use multivariate normal random numbers (MASS::mvrnorm) to generate correlated observations:

library(MASS)
x <- mvrnorm(100, mu = c(1, 3), Sigma = matrix(c(1, 0.6, 0.6, 1), nrow = 2))
pc <- prcomp(x, center = FALSE)

Here the scores are orthogonal -- crossprod(pc$x) has numerically zero off-diagonal elements -- but they are correlated: see cor(pc$x). The only requirement we have is orthogonality, and uncorrelatedness is collateral damage in centred analysis. 
Cheers, Jari Oksanen See the following example, in which the second column of data matrix X is linearly correlated to the first column:

> X <- cbind(rnorm(100,100,50),rnorm(100,100,50))
> X[,2] <- X[,1]*1.5-50 +runif(100,-70,70)
> plot(X)
> cor(X[,1],X[,2])
[1] 0.903597
> eigvnocent <- prcomp(X,center=F,scaling=F)[[1]]
> eigvcent <- prcomp(X,center=T,scaling=F)[[1]]
> eigvecnocent <- prcomp(X,center=F,scaling=F)[[2]]
> eigveccent <- prcomp(X,center=T,scaling=F)[[2]]
> PCnocent <- X%*%eigvecnocent
> PCcent <- X%*%eigveccent
> par(mfrow=c(2,2))
> plot(X)
> plot(PCnocent)
> plot(PCcent)
> cor(X[,1],X[,2])
[1] 0.903597
> cor(PCcent[,1],PCcent[,2])
[1] -8.778818e-16
> cor(PCnocent[,1],PCnocent[,2])
[1] -0.6908334

Also the help page of prcomp() states: "Details The calculation is done by a singular value decomposition of the (centered and possibly scaled) data matrix..." The parenthesis implies some ambiguity, but I interpret the sentence as indicating that the calculation should always be done using a centered data matrix. Finally, all the examples in the help page use centering (or scaling, which implies centering). Therefore, why the option center=F? There really is a small conflict between the docs and the function: the function allows noncentred analysis, and it also performs such an analysis if asked. This is documented for the argument "center", but the option is later ignored in the passage you cite. Probably because the authors of the function think that this offered option should not be used. You may submit a documentation bug report about this.
Re: [R] CCA - manual selection
Fabricius Domingos gmail.com> writes: > > Hello, > > I am trying to obtain F-values for response (independent) variables from a > CCA performed in the vegan package, to see which ones of them have > a significant influence on my dependent variables (like the manual selection > in canoco), but I can't find any function (or package) that does such a thing. > The dependent variables are species data, and the independent variables are > environmental data. > > Thank you. > Dear Fabricius Domingos, Use add1/drop1 for the vegan object with argument test = "permutation". For adding constraints, you need a 'scope' of variables considered for adding to the model. The following example shows how to do this with the vegan package and one of its standard data sets: > library(vegan) > data(mite) > data(mite.env) > mbig <- cca(mite ~ ., mite.env) > m0 <- cca(mite ~ 1, mite.env) > add1(m0, scope = formula(mbig), test="perm") > m0 <- update(m0, . ~ . + Topo) > add1(m0, scope = formula(mbig), test="perm") > m0 <- update(m0, . ~ . + WatrCont) You can also see if some of the included variables should be dropped: > drop1(m0, test="perm") The usage of add1.cca is explained on its help page (see ?add1.cca). Cheers, Jari Oksanen
Re: [R] how to calcualte Jaccard Coefficient
Greg gmail.com> writes: > > You could try 'vegdist()' in the 'vegan' package. > > Yep, you could. However, if you want to have Jaccard for binary data although your data are not binary, you must set binary = TRUE in vegan::vegdist. Indeed, the greatest problem with recommending Jaccard is that there is a huge number of packages that have this index, and you probably forget some of them in any recommendation. I think at least the following packages have the Jaccard index: ade4, analogue, ecodist, labdsv and probably many more. Cheers, Jari Oksanen
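For concreteness, the vegdist() call with and without the binary switch, using a vegan example data set:

```r
library(vegan)
data(varespec)

## Quantitative form of the Jaccard index (abundance data as-is)
d.quant <- vegdist(varespec, method = "jaccard")

## Classic presence/absence Jaccard: data reduced to 0/1 first
d.bin <- vegdist(varespec, method = "jaccard", binary = TRUE)
```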
Re: [R] Kruskal's MDS results
Dieter Vanderelst ua.ac.be> writes: > > A few people suggested taking a look at Ripley's book MASS. I know the formula listed there. > > The point is that the manual for the isoMDS function says its stress output is in "percent". Does this mean > the stress reported by isoMDS is just the stress value in MASS (which ranges from 0 to 1) multiplied by > 100? I haven't been able to find any resource that expresses stress in values from 0 to 100. So, this > would be a convention introduced by the authors of the package? Yes, it is stress multiplied by 100, which makes it "per cent" instead of "per unit". You can see this in the software, too. The extract is from the VR_mds_fn function in MASS.c:

sstar = 0.0;
tstar = 0.0;
for (i = 0; i < n; i++) {
    tmp = y[i] - yf[i];
    sstar += tmp * tmp;
    tstar += y[i] * y[i];
}
ssq = 100 * sqrt(sstar / tstar);

The last line has the multiplication; the previous lines collect the terms for the stress. Best wishes, Jari Oksanen
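The same convention seen from R, with invented toy data:

```r
library(MASS)

set.seed(4)
d <- dist(matrix(rnorm(90), nrow = 30))  # toy dissimilarities, 30 points

fit <- isoMDS(d, trace = FALSE)
fit$stress        # Kruskal's stress on the 0-100 "per cent" scale
fit$stress / 100  # the 0-1 "per unit" convention used in some texts
```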
Re: [R] Kruskal's MDS results
Dieter Vanderelst ua.ac.be> writes: > > The point is that the manual for the isoMDS function says its stress output is in "percent". Does this mean > the stress reported by isoMDS is just the stress value in MASS (which ranges from 0 to 1) multiplied by > 100? I haven't been able to find any resource that expresses stress in values from 0 to 100. So, this > would be a convention introduced by the authors of the package? > A comment about the novelty of using percentages. I also had a look at some NMDS resources, and the first I found were two Kruskal papers that happened to be on my desk (Psychometrika 29, 1-27 and Psychometrika 29, 115-129, both from 1964). Both of these expressed stress in percent. Certainly this is not a convention introduced by the authors of the package, since they are much too young to have done that prior to 1964. Cheers, Jari Oksanen
Re: [R] Re : PCA and automatic determination of the number of components
justin bem yahoo.fr> writes: > > See ade4 or mva package. > Justin BEM > BP 1917 Yaoundé > I guess the problem was not to find PCA (which is easy to find), but to find an automatic method of selecting ("determining" sounds like that selection would be correct in some objective sense) the number of components to be retained. I think neither ade4 nor mva gives much support here (in particular the latter, which does not exist any more). The usual place to look is the multivariate task view: http://cran.r-project.org/web/views/Multivariate.html Under the heading "Projection methods", and there under "Principal components", the task view mentions packages nFactors and paran that help in selecting the number of components to retain. Are these Task Views really so invisible in R that people don't find them? Usually they are the first place to look when you need something you don't have. In statistics, I mean. If they are invisible, could they be made more visible? Cheers, Jari Oksanen > > De : nikolay12 gmail.com> > À : r-help r-project.org > Envoyé le : Lundi, 20 Avril 2009, 4h37mn 41s > Objet : [R] PCA and automatic determination of the number of components > > Hi all, > > I have a relatively small dataset on which I would like to perform a PCA. I am > interested in a package that would also combine a method for determining > the number of components (I know there are plenty of approaches to this > problem). Any suggestions about a package/function? > > thanks, > > Nick
Re: [R] Principle components analysis on a large dataset
Moshe Olshansky yahoo.com> writes: > > Hi Misha, > > Since PCA is a linear procedure and you have only 6000 observations, you do not need 68000 variables. Using > any 6000 of your variables so that the resulting 6000x6000 matrix is non-singular will do. You can choose > these 6000 variables (columns) randomly, hoping that the resulting matrix is non-singular (and > checking for this). Alternatively, you can try something like choosing one "nice" column, then choosing > the second one which is the most orthogonal to the first one (kind of Gram-Schmidt), then choosing the > third one which is the most orthogonal to the first two, etc. (I am not sure how much roundoff may be a problem - > try doing this using higher precision if you can). Note that you do not need to load the entire 6000x68000 > matrix into memory (you can load several thousand columns, process them and discard them). > Anyway, you will end up with a 6000x6000 matrix, i.e. 36,000,000 entries, which can fit into memory, and you > can perform the usual PCA on this matrix. > > Good luck! > > Moshe. > > P.S. I am curious to see what other people think. > I think this will give you *a* principal component analysis, but it won't give you *the* principal component analysis in the sense that the first principal component would account for a certain proportion of the total variance etc. If you try this, you see that each random sample will have different eigenvalues, different proportions of eigenvalues and a different sum of all eigenvalues, just as you would expect for different data sets. I even failed to create the raw data matrix of dimensions 68000 x 6000 (Error: cannot allocate vector of size 3.0 Gb). Cheers, Jari Oksanen
Re: [R] Ellipse: Major Axis Minor Axis
Vishal gmail.com> writes: > > I have a matrix (3000x2) of numbers and I have plotted a scatterplot > (defined in the ``car'' library): > > scatterplot(r$V1,r$V2,ellipse=TRUE) > > The ellipse plotted is an error ellipse. > > I want to find the semi-major and semi-minor axis lengths of this > ellipse. > If you have the ellipse from cov.trob(), then you could do something like this: sqrt(eigen(cov.trob(mydataforellipse)$cov)$values) Cheers, Jari Oksanen
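A fuller sketch with simulated data (the Sigma is invented, and the radius used when the ellipse is actually drawn is not accounted for here, so these are semi-axis lengths up to that scaling):

```r
library(MASS)

set.seed(5)
X <- mvrnorm(3000, mu = c(0, 0),
             Sigma = matrix(c(2, 1, 1, 2), nrow = 2))

## Eigen-decompose the robust covariance used for the data ellipse;
## the square roots of the eigenvalues give the semi-axis lengths
## (largest first = semi-major axis)
ev <- eigen(cov.trob(X)$cov)$values
sqrt(ev)
```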
Re: [R] "biplot" graphical options?
Manca Marco (PATH path.unimaas.nl> writes: . > I am struggling in the attempt to impose some graphical conditions (changing point symbols, colors, etc) > on the biplot function (I am using it to visualize the results of princomp) but I apparently can't manage to > change anything but the axes... and I have been browsing manuals and vignettes without finding any > explicit suggestion on how to operate... > > Can anyone, please, point my attention to the relevant documentation? Howdy, The relevant documentation in this case is ?biplot. The documentation for the specific method is less useful (?biplot.princomp). That documentation tells you that you can't do what you want to do. No choice. For instance, col (for colour) can be of length 2, giving different colours for each set of points, but you cannot vary the colour within a set. Further, you cannot turn off the arrows and text, or even suppress the plotting and add your own points. If you use RSiteSearch("biplot"), you can find some packages with more flexibility than stats:::biplot.princomp and stats:::biplot.default. Another alternative is to write your own biplot function or do the plot by hand using the basic commands. Best wishes, Jari Oksanen
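One hand-rolled alternative, sketched on a standard data set (the arrow scaling factor 3 is an arbitrary eyeballed choice, not part of any biplot convention):

```r
pc <- princomp(USArrests, cor = TRUE)
sco <- pc$scores[, 1:2]     # site/observation scores
rot <- pc$loadings[, 1:2]   # variable loadings

## Plot the scores yourself, so symbols and colours are under
## your control, then add the variable arrows and labels by hand
plot(sco, pch = 21, bg = "grey", asp = 1)
arrows(0, 0, 3 * rot[, 1], 3 * rot[, 2], length = 0.1, col = "red")
text(3.3 * rot, labels = rownames(rot), col = "red")
```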
Re: [R] How can I change characteristics of a cca biplot in R
Gavin Simpson ucl.ac.uk> writes: > > On Mon, 2009-09-07 at 12:49 +0300, Rousi Heta wrote: > > > > Hi, > > > > I'm doing cca for a community data set in R and I have made a biplot > > for my data. Otherwise everything seems to be all right, but the biplot > > is so messy I can't read it well enough or publish it. I would like to > > get the row numbers out of the plot: I want the species positions and > > the environmental variables in it as vectors, but not the station > > numbers. How do I get them out? > > Is this in vegan? If so, if obj contains your fitted cca model, then > > plot(obj, display = c("species","bp")) > > (or, if you have centroids for factor variables as well) > > plot(obj, display = c("species","bp","cn")) > > If you only want to plot the points rather than the labels, add argument > 'type' to the call setting it to "p", e.g.: > > plot(obj, display = c("species","bp"), type = "p") > > > > > I have a raw species data set and a raw environmental data set and I > > don't have station names or numbers in the data set, but R puts the row > > names in anyway. I understand they are necessary but I just don't want > > to show them in the biplot. > > All of this (and more) is explained in ?plot.cca (if you are talking > about vegan::cca) > > You might also like to look at ?orditkplot (which allows you to move the > labels around yourself to get a clear plot) and ?orditorp with which you > can set a priority for certain sites to be labelled with text whilst > points are used for those sites that would cause text labels to overlap. > Heta, Indeed, if this is the cca of vegan (and not one of the other cca's), you may also consider reading the documentation. You can use vegandocs("intro"), which deals with cluttered plots in chapter 2.1, or vegandocs("FAQ") and read points 2.1.12 and 2.1.13. In addition to the alternatives that Gav listed, these also mention functions ordipointlabel (which can be further processed with orditkplot) and ordilabel. 
You can also see the plot.cca help page, which has some examples. Cheers, Jari Oksanen
Re: [R] NA in cca (vegan)
Gavin Simpson ucl.ac.uk> writes: > > On Fri, 2009-09-04 at 17:15 +0200, Kim Vanselow wrote: > > Dear all, > > I would like to calculate a cca (package vegan) with species and > > environmental data. One of these environmental variables is > > cos(EXPOSURE). > > The problem: for flat releves there is no exposure. The value is > > missing and I can't call it 0 as 0 stands for east and west. > > The cca does not run with missing values. What can I do to make vegan > > cca ignoring these missing values? > > Thanks a lot, > > Kim > > > This is timely as Jari Oksanen (lead developer on vegan) has been > looking into making this happen automatically in vegan ordination > functions. The solution for something like cca is very simple but it > gets more complicated when you might like to allow features like > na.exclude etc and have all the functions that operate on objects of > class "cca" work nicely. > > For the moment, you should just process your data before it goes into > cca. Here I assume that you have two data frames; i) Y is the species > data, and ii) X the environmental data. Further I assume that only one > variable in X has missings, let's call this Exposure: > Kim, A test version of NA handling in cca is now in the development version of vegan at http://vegan.r-forge.r-project.org/. You may get the current source code or slightly stale packages from that address (when writing this, the packages are two to three days behind the current devel version). Instructions for downloading the working version of vegan can be found on the same web site. Basically the development version does exactly the same thing as Gavin showed you in his response. It does a "listwise" elimination of missing values. Indeed, it may be better to do that manually and knowingly than to rely on the perhaps surprising automation of missing value handling within the function. 
Your missing values are somewhat weird as they are not missing values (= unknown and unobserved); you just decided to use a coding system that does not cope with your well known and measured values. I would prefer to find a coding that puts flat ground together with exposures giving similar conditions. In no case should they be regarded as NA, since they are available and known, and censoring them from your data may distort your analysis. Perhaps having a new variable (hasExposure, TRUE/FALSE) and coding flat sites as east/west (= 0) in Exposure could make more sense. Indeed, the model term hasExposure*Exposure would make sense, as this would separate flat ground from slopes of different Exposures. The interaction term and aliasing would take care of keeping flat ground with known values separate from exposed slopes. Cheers, Jari Oksanen
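A small base-R sketch of that recoding (the data and variable names are hypothetical):

```r
# Hypothetical recoding: flat sites are known, not missing, so mark them with
# an indicator and give them the neutral Exposure code (0 = east/west).
env <- data.frame(Exposure = c(cos(pi/4), NA, cos(pi/3), NA))
env$hasExposure <- !is.na(env$Exposure)   # FALSE marks flat ground
env$Exposure[!env$hasExposure] <- 0       # known flat, no longer NA

# The interaction term separates flat ground from slopes of varying Exposure
mm <- model.matrix(~ hasExposure * Exposure, data = env)
```

The model matrix then carries an intercept, a flat/sloped contrast, the Exposure effect, and their interaction, so no rows need to be dropped.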
Re: [R] Mantel test least square line
swertie voila.fr> writes: > > Hello, I performed a Mantel test and plotted community similarities. I > would like to add a least square line. I thought about using abline taking > as slope the r-statistic of the Mantel test and calculating the y-intercept > analytically. Is this method correct? Is there any function for this > calculation? Thank you If you have the Mantel statistic for two dist objects (as produced by dist(), as.dist() or compatible functions), you can just use abline(lm(ydist ~ xdist)), because a dist object is a vector with some extra attributes. Of course, this does not quite make sense, since distances do not have a least squares fit in any reasonable sense. People do this all the time, though (ecologists, and aquatic ecologists in particular, I mean). Cheers, Jari Oksanen
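In code, with two hypothetical dist objects standing in for the real data (base R only):

```r
# Two simulated distance matrices; any dist() output works the same way
set.seed(42)
xdist <- dist(matrix(rnorm(40), nrow = 10))
ydist <- dist(matrix(rnorm(40), nrow = 10))

plot(xdist, ydist)        # dist objects behave as plain vectors here
fit <- lm(ydist ~ xdist)  # ordinary least squares on the unfolded distances
abline(fit)               # add the (statistically questionable) line
```

The caveat in the post stands: the pairwise distances are not independent observations, so this line is descriptive only.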
Re: [R] Ellipse: Major Axis Minor Axis
Vishal gmail.com> writes: > > Jari, thanks for the quick answer. > > > sqrt(eigen(cov.trob(mydataforellipse)$cov)$values) > > what will this return? > > For my data, I get: > > > sqrt(eigen(cov.trob(r)$cov)$values) > [1] 1.857733e-05 4.953181e-06 > > Is this left-hand value the major or the semi-major length? > > I also try to plot a circle keeping this as the radius/diameter > I don't get a circle that intersects the major axis of the ellipse. > Vishal, Then you probably do not have an ellipse directly defined by the robust covariance matrix, but one scaled up by some statistic. Check the code of the function (or, in a happy case, its documentation) to see what the scaling factor is. I checked by drawing an ellipse directly with the cov.trob data, and both of the circles fit nicely. Cheers, Jari Oksanen
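To make the relation concrete, a hedged sketch with simulated data (MASS::cov.trob as in the thread): the square roots of the eigenvalues are the semi-axis lengths, and eigen() returns them sorted largest first, so the first value is the semi-major axis.

```r
library(MASS)  # for cov.trob
set.seed(1)
# Simulated point cloud, clearly longer along x than along y
xy <- cbind(rnorm(200, sd = 2), rnorm(200, sd = 0.5))

semi <- sqrt(eigen(cov.trob(xy)$cov)$values)
semi[1]  # semi-major axis length of the unscaled covariance ellipse
semi[2]  # semi-minor axis length
```

Note these are the semi-axes of the ellipse defined directly by the covariance matrix; an ellipse drawn at a given confidence level is this shape scaled up by a quantile-based factor, which is the likely source of the mismatch reported above.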
Re: [R] (no subject)
Hello Kim & Gav, Gavin Simpson ucl.ac.uk> writes: > > On Wed, 2009-09-09 at 14:43 +0200, Kim Vanselow wrote: > > Dear r-Community, > > Step1: I would like to calculate a NMDS (package vegan, function > > metaMDS) with species data. > > Step2: Then I want to plot environmental variables over it, using > > function envfit. > > The Problem: One of these environmental variables is cos(EXPOSURE). > > But for flat releves there is no exposure. The value is missing and I > > can't call it 0 as 0 stands for east and west. Therefore I kicked all > > releves with missing environmental variables. Both, metaMDS and envfit > > then work without problems. > > Now I want to bring the releves with missing environmental variables > > back to my ordination-plot. > > > > Gavin Simpson gave me the advice to use the predict-function for the > > same missing value problem when I was calculating a cca. This worked > > without problem. > > > Also note that Jari has commented on how you are coding your Exposure > variable; I glossed over that bit when providing my response and you > should probably rethink your coding along the lines Jari suggested. > Yes, and Dylan Beaudette gave some more concrete ideas in this thread (provided it is exposure to sun, and not to, say, wind or a pollution source). > There isn't a predict function for metaMDS() because there aren't rules > or relationships that would allow you to do what predict.cca does but > for a nMDS. In the CCA case we were estimating the locations in > ordination space for the left-out samples on the basis of their species > composition and computing their site score as a weighted average of the > species scores determined from the data that went into building the CCA. > Actually, you can add new points to NMDS. I once wrote a function for one user who asked for this, but I did not include it in vegan, because using it required considerable technical skill and was too tricky for a general package. 
For instance, you need to have a rectangular dissimilarity matrix between the new points and the old points (which you can find with Gav's distance() function in analogue). > > What you now want to do is superimpose upon that plot the env data where > you might have missingness. envfit doesn't allow missings and it is not > immediately clear to me how it might be modified to do so, certainly not > without a lot of changes. > True. Seems to be bad design. You should change four functions and the user interface to have this. Even in that case it would be na.rm (remove rows with any missing data), and if you want to do that, you can do it by hand: scor <- scores(mymetaMDSresult) keep <- complete.cases(myenvdata) envfit(scor[keep,] ~., myenvdata[keep,]) > Instead, ordisurf() may be more useful, but you will lose the nice > "plot all vectors on the ordination at once" feature of envfit (you gain > a lot with ordisurf though as there is no reason to assume anything is > linear across an nMDS configuration). > > A cursory look at the guts of ordisurf indicates that it can handle > missing values in the (env) variable you wish to overlay onto the nMDS > plot; the data is passed to mgcv::gam using the formula interface which > deals with the NA. > Yes, this is true (in most cases). > > You could also try capscale() also in vegan. This is like CCA and RDA > but allows you to use any dissimilarity coefficient. It is a bit like a > constrained Principal Coordinates Analysis. This can use the rda method > for predict to do what you did with the CCA earlier. > However, it does not handle missing values directly (not even in the R-Forge version, and there are no immediate plans to change this), so you must remove missing data rows manually. Cheers, Jari Oksanen
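A fuller sketch of the by-hand approach above, using vegan's dune data with one value artificially set missing (the real data and variable names are unknown):

```r
# By-hand listwise deletion before envfit, as described in the thread
library(vegan)
data(dune)
data(dune.env)

myenvdata <- dune.env[, c("A1", "Moisture")]
myenvdata$A1[3] <- NA                        # pretend one value is missing

ord  <- metaMDS(dune, trace = FALSE)         # NMDS on the full species data
scor <- scores(ord, display = "sites")
keep <- complete.cases(myenvdata)            # drop rows with any NA
fit  <- envfit(scor[keep, ] ~ ., data = myenvdata[keep, , drop = FALSE])
```

All sites still appear in the ordination itself; only the vector/factor fitting is restricted to the complete cases.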