Re: [R-sig-phylo] PCoA with custom distance matrix
Hello,

thank you very much for your reply. I'm still a little confused.

On Friday, 7 June 2013 16:03:13, you wrote:
> Hello there,
>
> you could have done the PCoA on Chord's distance using adegenet, it would
> have probably been simpler (see the vignette adegenet-basics, section 6
> "Multivariate Analysis").

Function dist.genpop() contains Nei's distance, but the man page says it is only for genpop objects (I wish to analyse individuals, not populations). I imported the data using read.loci() and converted it to genind using loci2genind(). I also found the function dist.genet() in the ade4 package, but it requires a genet object. So far I haven't found a straightforward way to convert my data to it. And I haven't found any function providing Goldstein's distance.

> 'dist' is the canonical Euclidean distance, but dudi.pco will accept any
> Euclidean distance. You can use cailliez in ade4 to make your distance
> Euclidean before the PCoA.

I'm not a mathematician, so I don't understand one point. Let's say I have a non-Euclidean distance matrix and some individuals (e.g. from the same population) have zero distance (they are identical). When I use cailliez(), I have two possibilities (parameter cor.zero):

1) add the constant to the zero-length distances as well, so that the shift is the same everywhere. But as a result, the distance between originally identical objects becomes positive, so they will be treated as different in the subsequent analysis, right?; or

2) keep the zero-length distances, so that those objects stay identical, but their distances from other objects change, so the further analysis is biased again, right?

Is there any solution? Or am I wrong?

> Cheers
> Thibaut

Sincerely,
Vojtěch

> --
> ## Dr Thibaut JOMBART
> MRC Centre for Outbreak Analysis and Modelling
> Department of Infectious Disease Epidemiology
> Imperial College - School of Public Health
> St Mary's Campus, Norfolk Place
> London W2 1PG, United Kingdom
> Tel.: 0044 (0)20 7594 3658
> t.jomb...@imperial.ac.uk
> http://sites.google.com/site/thibautjombart/
> http://adegenet.r-forge.r-project.org/
>
> From: r-sig-phylo-boun...@r-project.org [r-sig-phylo-boun...@r-project.org] on behalf of Vojtěch Zeisek [vojtech.zei...@opensuse.org]
> Sent: 07 June 2013 15:46
> To: mailinglist R
> Subject: [R-sig-phylo] PCoA with custom distance matrix
>
> Hello,
>
> I have microsatellite data and I would like to analyze them using PCoA in R.
> I would like to use the following genetic distances: Goldstein's (1995)
> (δμ)² and Nei's chord distance (1983). I calculated those distances in MSA
> (Dieringer and Schlötterer 2003), because I didn't find any way to
> calculate them in R. I imported the distances like this:
>
>   dist.dms <- read.csv("dms_ind.txt", header=TRUE, sep="\t", dec=".", row.names=1)
>   dioszegia.dist.dms <- as.dist(dioszegia.dist.dms)
>   class(dioszegia.dist.dms)
>   [1] "dist"
>
> dms_ind.txt is an ordinary square matrix with diagonal. But using
> dudi.pco() from the ade4 package fails:
>
>   pcoa <- dudi.pco(dist.dms, scannf=FALSE, nf=3)
>   Warning message:
>   In dudi.pco(dioszegia.dist.dms, scannf = FALSE, nf = 3) :
>     Non euclidean distance
>
> When using Euclidean distance (function dist()), it works fine. It
> produces some results despite the above warning, but the results are very
> far from those I get using the dist() function. And the results don't look
> realistic. What am I doing wrong? :-)
>
> Have a nice day!
> Vojtěch
>
> --
> Vojtěch Zeisek
> Komunita openSUSE GNU/Linuxu
> Community of the openSUSE GNU/Linux
> http://www.opensuse.org/
> http://trapa.cz/

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
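The "Non euclidean distance" warning discussed in this message can be reproduced without any package. The sketch below is only an illustration under stated assumptions: the 4 x 4 matrix is a made-up toy, not the poster's data, and the helper names `gram_eigenvalues` and `is_euclidean` are hypothetical. It uses the standard classical-scaling criterion (a distance matrix is Euclidean when the double-centred matrix of -0.5 * d^2 has no negative eigenvalue) rather than ade4's own code.

```r
# Toy dissimilarity matrix (hypothetical values). The triangle inequality
# fails: d(a, c) = 5 is larger than d(a, b) + d(b, c) = 2.
m <- matrix(c(0, 1, 5, 1,
              1, 0, 1, 1,
              5, 1, 0, 1,
              1, 1, 1, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(letters[1:4], letters[1:4]))
d <- as.dist(m)

# Eigenvalues of the double-centred matrix used by classical scaling (PCoA).
gram_eigenvalues <- function(d) {
  a <- -0.5 * as.matrix(d)^2
  n <- nrow(a)
  j <- diag(n) - matrix(1 / n, n, n)   # centring matrix
  eigen(j %*% a %*% j, symmetric = TRUE, only.values = TRUE)$values
}

# A distance is treated as Euclidean when no eigenvalue is clearly negative.
is_euclidean <- function(d, tol = 1e-7) {
  ev <- gram_eigenvalues(d)
  min(ev) > -tol * max(abs(ev))
}

is_euclidean(d)                       # toy matrix: not Euclidean
is_euclidean(dist(matrix(1:12, 4)))   # output of dist(): Euclidean
```

Negative eigenvalues correspond to axes with no real coordinates, which is one reason an ordination of such a matrix can look unrealistic.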
Re: [R-sig-phylo] PCoA with custom distance matrix
Hello,

dist.genpop is a re-implementation of dist.genet. All material for genetics in ade4 should be deprecated by now, as it was transferred into adegenet or adephylo a few years ago already. I do not know of a package implementing Goldstein's distance.

As for your question, you are right: adding a constant to a set of distances to make them Euclidean alters the geometry of the cloud of points, and is thus not very satisfying.

This said, especially when it comes to individual microsatellite data, I have yet to see a fancy distance do more than the basic Euclidean distance in a PCoA, which in this case is also a PCA. The advantage of PCA is that it will also give you allele loadings, which can be biologically meaningful. Not to say you should stick to it, but it is probably a good start to look at your data.

Cheers
Thibaut

From: r-sig-phylo-boun...@r-project.org [r-sig-phylo-boun...@r-project.org] on behalf of Vojtěch Zeisek [vojtech.zei...@opensuse.org]
Sent: 10 June 2013 14:59
To: r-sig-phylo@r-project.org
Subject: Re: [R-sig-phylo] PCoA with custom distance matrix

Hello, thank you very much for your reply. I'm still a little confused.
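The remark that a PCoA on the basic Euclidean distance is also a PCA can be checked on a small table. This is a minimal sketch with base R only, assuming a hypothetical allele-count table (6 diploid individuals, two loci with two alleles each); `prcomp()` and `cmdscale()` stand in for the ade4/adegenet functions named in the thread, and `same_axis` is an illustrative helper.

```r
# Hypothetical allele counts: rows are individuals, columns are alleles.
x <- matrix(c(2, 0, 1, 1,
              1, 1, 0, 2,
              0, 2, 2, 0,
              2, 0, 0, 2,
              1, 1, 2, 0,
              0, 2, 1, 1),
            nrow = 6, byrow = TRUE,
            dimnames = list(paste0("ind", 1:6),
                            c("L1.a", "L1.b", "L2.a", "L2.b")))

pca  <- prcomp(x, center = TRUE, scale. = FALSE)  # PCA on centred counts
pcoa <- cmdscale(dist(x), k = 2)                  # PCoA on Euclidean distances

# Scores on each axis agree up to an arbitrary sign flip.
same_axis <- function(a, b) {
  isTRUE(all.equal(abs(a), abs(b), check.attributes = FALSE))
}
same_axis(pca$x[, 1], pcoa[, 1])
same_axis(pca$x[, 2], pcoa[, 2])

# Only the PCA returns loadings: one coefficient per allele and axis.
pca$rotation[, 1:2]
```

The loadings in `pca$rotation` are the extra information a distance-based PCoA cannot give, since the alleles are no longer visible once the data are reduced to pairwise distances.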
Re: [R-sig-phylo] PCoA with custom distance matrix
Hello,

On Monday, 10 June 2013 14:26:41, you wrote:
> Hello,
>
> dist.genpop is a re-implementation of dist.genet - all material for
> genetics in ade4 should be deprecated by now, as it has been transferred
> into adegenet or adephylo a few years ago already.

OK, I see. So can I use dist.genpop() also for genind objects?

> I do not know about a package implementing Goldstein's distance. As for
> your question, you are right: adding a constant to a set of distances to
> make them Euclidean alters the geometry of the cloud of points, and is
> thus not very satisfying. This said, especially when it comes to
> individual microsatellite data, I have yet to see a fancy distance do more
> than the basic Euclidean distance in a PCoA, which in this case is also a
> PCA. The advantage of PCA is that it will also give you allele loadings
> which can be biologically meaningful. Not to say you should stick to it,
> but it is probably a good start to look at your data.

PCA and PCoA using the distance generated by dist() give me more or less the same results. I really wonder how this problem was solved in the old SYN-TAX (Podani, 2000), which calculated PCA/PCoA from any distance. ;-) For now I don't see any good enough solution, but at least I understand it better. :-)

> Cheers
> Thibaut

Have a nice day,
Vojtěch
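The trade-off between the two ways of adding a constant, raised earlier in the thread, can be made concrete with plain arithmetic. The sketch below is an illustration only: the matrix is a made-up toy in which `b` and `b2` are identical individuals, and the constant is taken from base R's `cmdscale(..., add = TRUE)`, which computes an additive constant of the Cailliez type. It is not a call to ade4's `cailliez()`, and it makes no claim about which option `cor.zero` selects there.

```r
# Toy dissimilarities (hypothetical); b and b2 are identical (distance 0).
labs <- c("a", "b", "b2", "c", "e")
m <- matrix(c(0, 1, 1, 5, 1,
              1, 0, 0, 1, 1,
              1, 0, 0, 1, 1,
              5, 1, 1, 0, 1,
              1, 1, 1, 1, 0),
            nrow = 5, byrow = TRUE, dimnames = list(labs, labs))

# Additive constant reported by base R's classical scaling.
shift_const <- cmdscale(as.dist(m), k = 2, add = TRUE)$ac

off_diag <- row(m) != col(m)

# Option 1: shift every off-diagonal entry, zeros included.
shift_all <- m
shift_all[off_diag] <- m[off_diag] + shift_const

# Option 2: shift only the non-zero entries, so duplicates stay at zero.
shift_nonzero <- m
shift_nonzero[m > 0] <- m[m > 0] + shift_const

shift_all["b", "b2"]       # equals the constant: duplicates now look distinct
shift_nonzero["b", "b2"]   # still zero

# Either way the relative geometry changes: d(a, c) was 5 times d(a, b).
m["a", "c"] / m["a", "b"]
shift_nonzero["a", "c"] / shift_nonzero["a", "b"]
```

Both options distort something, which matches the conclusion reached in the thread: the correction makes the ordination computable, but it does not preserve the original geometry.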
[R-sig-phylo] apTreeshape and as.treeshape
Hi,

I am trying to use the colless.test and maxlik.betasplit tests in the apTreeshape R package. I have a set of DNA sequences in FASTA format. I am following the steps below.

1. I construct an NJ tree with the ape R package using nj(dist.dna(dataset)).

2. After I use as.treeshape from the apTreeshape package, I get the following message when I try to plot my tree (after converting it to treeshape format):

   Error in plot.window(...) : need finite 'xlim' values
   In addition: Warning messages:
   1: In min(x) : no non-missing arguments to min; returning Inf
   2: In max(x) : no non-missing arguments to max; returning -Inf
   3: In min(x) : no non-missing arguments to min; returning Inf
   4: In max(x) : no non-missing arguments to max; returning -Inf

3. So I decided to use as.treeshape(tree, model = "yule"). After doing this I can plot my tree as a treeshape, but when I perform one of the tests I get the following message when using maxlik.betasplit:

   There were 50 or more warnings (use warnings() to see the first 50)

   and when using colless.test:

   Warning message:
   In if (class(tree) != "treeshape") { :
     the condition has length > 1 and only the first element will be used

My questions: What are the reasons for these warnings? Can I ignore these warnings? How can I solve this problem and convert my NJ tree generated using ape into a treeshape object?

Thanks!
Fabricia.
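The last warning in this message says that the condition `class(tree) != "treeshape"` produced more than one value, which means `class(tree)` returned more than one string for the object that was passed. The sketch below only illustrates that mechanism; `tree_like` is a hypothetical stand-in, not an object produced by apTreeshape, and what the poster's tree actually contained is not known from the thread.

```r
# Hypothetical object whose class attribute holds two strings.
tree_like <- structure(list(), class = c("treeshape", "list"))

length(class(tree_like))          # 2
class(tree_like) != "treeshape"   # two logical values, not one

# In R versions before 4.2.0, if() on such a vector warns and silently uses
# the first element; from R 4.2.0 onwards it is an error.
# inherits() always returns a single TRUE or FALSE and is the safer test.
is_treeshape <- function(x) inherits(x, "treeshape")
is_treeshape(tree_like)
is_treeshape(list())
```

A practical first check is therefore to print `class(tree)` for the object handed to the test and see whether it is a single string.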
Re: [R-sig-phylo] apTreeshape and as.treeshape
Dear Fabricia,

I am the maintainer of the package. Can you please send me the tree that is responsible for the warnings? It might be because of unresolved nodes.

Sincerely
Michael Blum

On 10/06/13 17:53, Fabricia Nascimento wrote:
> Hi, I am trying to use the colless.test and maxlik.betasplit tests in the
> apTreeshape R package.

--
Michael BLUM
CNRS Research Associate
Tel: +33 (0)4 56 52 00 65
Fax: +33 (0)4 56 52 00 55
michael.b...@imag.fr
http://membres-timc.imag.fr/Michael.Blum/
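Whether a tree has unresolved nodes can be read directly from its edge matrix. The sketch below assumes ape's "phylo" layout (column 1 is the parent node, column 2 the child) and uses a hand-written, hypothetical 4-tip edge matrix of the kind an unrooted tree has, since nj() returns unrooted trees whose basal node carries three branches. Whether this is the cause of the warnings reported above is not established in the thread.

```r
# Hypothetical edge matrix: tips are 1..4, internal nodes are 5 and 6.
edge <- rbind(c(5, 1),
              c(5, 2),
              c(5, 6),   # node 5 has three children
              c(6, 3),
              c(6, 4))

# Count the children of every internal node.
children_per_node <- table(edge[, 1])

# Nodes with more than two children are unresolved (polytomies).
unresolved <- names(children_per_node)[as.vector(children_per_node) > 2]
unresolved

is_fully_binary <- length(unresolved) == 0
is_fully_binary
```

With ape loaded, `is.binary()` and `is.rooted()` report the same information for a "phylo" object, and `root()` or `multi2di()` can be used to obtain a rooted, fully dichotomous tree before calling `as.treeshape()`.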