Re: [R-sig-phylo] apTreeshape and as.treeshape
Dear Fabricia,

I am the maintainer of the package. Could you please send me the tree that is responsible for the warnings? They might be caused by unresolved nodes.

Sincerely,
Michael Blum

On 10/06/13 17:53, Fabricia Nascimento wrote:
> Hi,
>
> I am trying to use the colless.test and maxlik.betasplit tests in the apTreeshape R package. I have a set of DNA sequences in fasta format. I am following the steps below.
>
> 1. I construct an NJ tree with the ape R package using nj(dist.dna(dataset)).
>
> 2. After I use "as.treeshape" from the apTreeshape package, I get the following message when I try to plot my tree (after "converting" it to treeshape format):
>
> Error in plot.window(...) : need finite 'xlim' values
> In addition: Warning messages:
> 1: In min(x) : no non-missing arguments to min; returning Inf
> 2: In max(x) : no non-missing arguments to max; returning -Inf
> 3: In min(x) : no non-missing arguments to min; returning Inf
> 4: In max(x) : no non-missing arguments to max; returning -Inf
>
> 3. So I decided to use as.treeshape(tree, model = "yule"). After doing this I can plot my tree as a treeshape, but when I perform one of the tests I get the following message when using maxlik.betasplit:
>
> "There were 50 or more warnings (use warnings() to see the first 50)"
>
> and when using colless.test:
>
> "Warning message: In if (class(tree) != "treeshape") { : the condition has length > 1 and only the first element will be used"
>
> My questions: What are the reasons for these warnings? Can I ignore these "warnings"? How can I solve this "problem" and convert my NJ tree generated using ape into a treeshape object?
>
> Thanks!
> Fabricia.
--
Michael BLUM
CNRS Research Associate
Tel: +33 (0)4 56 52 00 65
Fax: +33 (0)4 56 52 00 55
michael.b...@imag.fr
http://membres-timc.imag.fr/Michael.Blum/
___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
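The reply above suggests unresolved nodes as a possible cause. One common way this arises, offered here only as an assumption and not confirmed in the thread, is that nj() returns an unrooted tree with a basal trichotomy, whereas tree-shape statistics are defined for rooted binary trees. The sketch below uses a simulated unrooted tree as a stand-in for the NJ tree and shows how to check and resolve this with ape. The outgroup choice is arbitrary and purely illustrative, and is.binary() assumes a reasonably recent ape version.

```r
library(ape)

set.seed(42)
# Stand-in for nj(dist.dna(dataset)): an unrooted tree with a basal trichotomy.
tree <- rtree(8, rooted = FALSE)

is.rooted(tree)   # FALSE for an unrooted tree

# Root on one tip and resolve the basal trichotomy into a bifurcation.
# The outgroup used here is arbitrary; pick a biologically sensible one.
rooted <- root(tree, outgroup = tree$tip.label[1], resolve.root = TRUE)

is.rooted(rooted)   # check that the tree now has a root
is.binary(rooted)   # check that every internal node is bifurcating

# Assumed next step (not run here):
# shape <- apTreeshape::as.treeshape(rooted)
```

If both checks pass, the conversion should no longer need the model argument, which as far as I understand is only used to resolve non-binary nodes at random.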
[R-sig-phylo] apTreeshape and as.treeshape
Hi,

I am trying to use the colless.test and maxlik.betasplit tests in the apTreeshape R package. I have a set of DNA sequences in fasta format. I am following the steps below.

1. I construct an NJ tree with the ape R package using nj(dist.dna(dataset)).

2. After I use "as.treeshape" from the apTreeshape package, I get the following message when I try to plot my tree (after "converting" it to treeshape format):

Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf

3. So I decided to use as.treeshape(tree, model = "yule"). After doing this I can plot my tree as a treeshape, but when I perform one of the tests I get the following message when using maxlik.betasplit:

"There were 50 or more warnings (use warnings() to see the first 50)"

and when using colless.test:

"Warning message: In if (class(tree) != "treeshape") { : the condition has length > 1 and only the first element will be used"

My questions: What are the reasons for these warnings? Can I ignore these "warnings"? How can I solve this "problem" and convert my NJ tree generated using ape into a treeshape object?

Thanks!
Fabricia.
Re: [R-sig-phylo] PCoA with custom distance matrix
> OK, I see. So can I use dist.genpop() also for genind objects?

No, the method is defined for genpop objects. These distances are defined for populations: they use allele frequencies. Technically speaking, allele frequencies are also defined for individual genotypes, but then the possible values are 0, 0.5, 1 (diploid data) or 0/1 (haploid). In such cases basic distances are pretty good at capturing the main structures in the data. Note that this is probably true for population data too. Look at the example in dist.genpop: 5 different distances, essentially the same results.

>> I do not know about a package implementing Goldstein's distance.
>>
>> As for your question, you are right: adding a constant to a set of distances to make them Euclidean alters the geometry of the cloud of points, and is thus not very satisfying. This said, especially when it comes to individual microsatellite data, I have yet to see a fancy distance do more than the basic Euclidean distance in a PCoA, which in this case is also a PCA. The advantage of PCA is that it will also give you allele loadings which can be biologically meaningful. Not to say you should stick to it, but it is probably a good start to look at your data.
>
> PCA and PCoA using distance generated by dist() give me more or less same results.

It should be exactly the same components (possibly with different signs). This is because the variance (optimized in PCA) can be expressed as a function of the pairwise squared Euclidean distances between observations (maximized in PCoA). Differences might come from scaling the data in PCA (PCA on the 'correlation matrix', useless for most genetic data), or from the strategy used to replace missing data.

> I really wonder how this problem was solved in old Syntax (Podani, 2000), which was calculating PCA/PCoA from any distance. ;-)
> For now I don't see any enough good solution, but I at least understand it more. :-)

Again, PCA should do the trick.
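The claim that PCA and PCoA on the Euclidean distance give the same components up to sign can be checked with base R alone. The following is a minimal sketch on simulated numeric data (not microsatellite data), using prcomp() for the PCA and cmdscale() for the PCoA; the matrix x and all object names are illustrative assumptions.

```r
set.seed(1)
# Simulated data: 12 observations, 5 numeric variables (illustrative only).
x <- matrix(rnorm(60), nrow = 12, ncol = 5)

# PCA on centred, unscaled data (i.e. on the covariance matrix).
pca <- prcomp(x, center = TRUE, scale. = FALSE)

# PCoA (classical scaling) on the Euclidean distances between rows.
pcoa <- cmdscale(dist(x), k = 2)

# Compare the first two axes, ignoring the arbitrary sign of each axis.
max_diff <- max(abs(abs(pca$x[, 1:2]) - abs(pcoa)))
max_diff

# Scaling the variables (PCA on the correlation matrix) breaks the equivalence.
pca_scaled <- prcomp(x, center = TRUE, scale. = TRUE)
max_diff_scaled <- max(abs(abs(pca_scaled$x[, 1:2]) - abs(pcoa)))
max_diff_scaled
```

The first difference should be zero up to rounding error, while the second is not, which illustrates the point about scaling made in the message.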
Re: [R-sig-phylo] PCoA with custom distance matrix
Hello,

On Mon, 10 June 2013 14:26:41 you wrote:
> Hello,
>
> dist.genpop is a re-implementation of dist.genet - all material for genetics in ade4 should be deprecated by now, as it has been transferred into adegenet or adephylo a few years ago already.

OK, I see. So can I use dist.genpop() also for genind objects?

> I do not know about a package implementing Goldstein's distance.
>
> As for your question, you are right: adding a constant to a set of distances to make them Euclidean alters the geometry of the cloud of points, and is thus not very satisfying. This said, especially when it comes to individual microsatellite data, I have yet to see a fancy distance do more than the basic Euclidean distance in a PCoA, which in this case is also a PCA. The advantage of PCA is that it will also give you allele loadings which can be biologically meaningful. Not to say you should stick to it, but it is probably a good start to look at your data.

PCA and PCoA using the distance generated by dist() give me more or less the same results. I really wonder how this problem was solved in the old Syntax (Podani, 2000), which calculated PCA/PCoA from any distance. ;-) For now I don't see any good enough solution, but at least I understand it better. :-)

> Cheers
> Thibaut

Have a nice day,
Vojtěch
Re: [R-sig-phylo] PCoA with custom distance matrix
Hello,

dist.genpop is a re-implementation of dist.genet - all material for genetics in ade4 should be deprecated by now, as it has been transferred into adegenet or adephylo a few years ago already.

I do not know about a package implementing Goldstein's distance.

As for your question, you are right: adding a constant to a set of distances to make them Euclidean alters the geometry of the cloud of points, and is thus not very satisfying. This said, especially when it comes to individual microsatellite data, I have yet to see a fancy distance do more than the basic Euclidean distance in a PCoA, which in this case is also a PCA. The advantage of PCA is that it will also give you allele loadings which can be biologically meaningful. Not to say you should stick to it, but it is probably a good start to look at your data.

Cheers
Thibaut
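The point that adding a constant makes the distances Euclidean but alters the geometry can be illustrated in base R. This is a sketch under stated assumptions: the toy distance matrix (path lengths on a star graph) is made up for illustration, gower_eigen() is a hypothetical helper name, and cmdscale(add = TRUE) is used as a base-R stand-in for the additive-constant correction rather than ade4's cailliez().

```r
# Toy metric that is not Euclidean: path distances on a star graph
# with one centre (row 1) and three leaves (rows 2 to 4).
m <- matrix(2, 4, 4)
m[1, ] <- 1
m[, 1] <- 1
diag(m) <- 0

# Hypothetical helper: eigenvalues of the double-centred matrix of squared
# distances. A negative eigenvalue means the distances are not Euclidean.
gower_eigen <- function(mat) {
  n <- nrow(mat)
  J <- diag(n) - matrix(1 / n, n, n)
  eigen(-0.5 * J %*% (mat^2) %*% J, symmetric = TRUE, only.values = TRUE)$values
}

eig_before <- gower_eigen(m)
min(eig_before)   # negative: the original distances are not Euclidean

# Additive constant computed by cmdscale(); it is added to all
# off-diagonal distances.
ac <- cmdscale(as.dist(m), k = 2, add = TRUE)$ac

m_shift <- m + ac
diag(m_shift) <- 0
eig_after <- gower_eigen(m_shift)
min(eig_after)    # should be non-negative up to rounding error

# The geometry changes: leaf-to-leaf distance relative to centre-to-leaf.
ratio_before <- m[2, 3] / m[1, 2]
ratio_after  <- m_shift[2, 3] / m_shift[1, 2]
c(before = ratio_before, after = ratio_after)
```

The shifted distances pass the Euclidean check, but the ratio between the two kinds of distances is no longer the same, which is the distortion described in the message.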
Re: [R-sig-phylo] PCoA with custom distance matrix
Hello,

thank you very much for your reply. I'm still a little confused.

On Fri, 7 June 2013 16:03:13 you wrote:
> Hello there,
>
> you could have done the PCoA on Chord's distance using adegenet, it would have probably been simpler (see the vignette adegenet-basics, section 6 "Multivariate Analysis").

Function dist.genpop() contains Nei's distance, but the man page says it is only for genpop objects (I wish to analyse individuals, not populations). I imported the data using read.loci() and converted it to genind using loci2genind(). I also found the function dist.genet() in the ade4 package, but it requires a genet object. So far I haven't found a straightforward way to convert my data to it. And I haven't found any function providing Goldstein's distance.

> 'dist' is the canonical Euclidean distance, but dudi.pco will accept any Euclidean distance. You can use "cailliez" in ade4 to make your distance Euclidean before the PCoA.

I'm not a mathematician, so I don't understand one point. Let's say I have a non-Euclidean distance matrix and some individuals (e.g. from the same population) have zero distance (they are identical). When I use cailliez(), I have two possibilities (parameter cor.zero):

1) add the constant also to zero-length distances, so that the shift is the same everywhere. But as a result, the distance between originally identical objects is positive, so they will be treated as different in the subsequent analysis, right?; or

2) keep zero-length distances, so that those objects stay identical, but their distances from other objects change, so the further analyses are biased again, right?

Is there any solution? Or am I wrong?

> Cheers
> Thibaut

Sincerely,
Vojtěch

> --
> ##
> Dr Thibaut JOMBART
> MRC Centre for Outbreak Analysis and Modelling
> Department of Infectious Disease Epidemiology
> Imperial College - School of Public Health
> St Mary's Campus
> Norfolk Place
> London W2 1PG
> United Kingdom
> Tel. : 0044 (0)20 7594 3658
> t.jomb...@imperial.ac.uk
> http://sites.google.com/site/thibautjombart/
> http://adegenet.r-forge.r-project.org/
>
> From: r-sig-phylo-boun...@r-project.org [r-sig-phylo-boun...@r-project.org] on behalf of Vojtěch Zeisek [vojtech.zei...@opensuse.org]
> Sent: 07 June 2013 15:46
> To: mailinglist R
> Subject: [R-sig-phylo] PCoA with custom distance matrix
>
> Hello,
> I have microsatellite data and I would like to analyze them using PCoA in R. I would like to use the following genetic distances: Goldstein's (1995) (δμ)² and Nei's chord distance (1983). I calculated those distances in MSA (Dieringer and Schlötterer 2003), because I didn't find any possibility to calculate them in R. I imported the distances like that:
>
> dioszegia.dist.dms <- read.csv("dms_ind.txt", header=TRUE, sep="\t", dec=".", row.names=1)
> dioszegia.dist.dms <- as.dist(dioszegia.dist.dms)
> class(dioszegia.dist.dms)
> [1] "dist"
>
> dms_ind.txt is an ordinary square matrix with diagonal.
> But using dudi.pco() from the ade4 package fails:
>
> pcoa <- dudi.pco(dioszegia.dist.dms, scannf=FALSE, nf=3)
> Warning message:
> In dudi.pco(dioszegia.dist.dms, scannf = FALSE, nf = 3) :
> Non euclidean distance
>
> When using Euclidean distance (function dist()), it works fine. It produces some results despite the above warning, but the results are very far from those I get using the dist() function. And the results don't look realistic. What do I do wrong? :-)
> Have a nice day!
> Vojtěch

--
Vojtěch Zeisek
Komunita openSUSE GNU/Linuxu
Community of the openSUSE GNU/Linux
http://www.opensuse.org/
http://trapa.cz/
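The import steps quoted in the original question can be sketched as a self-contained round trip in base R. Everything here is illustrative: the 3 x 3 matrix, the individual names and the temporary file stand in for dms_ind.txt, whose actual contents are not shown in the thread. The sketch keeps a single object name per step and checks symmetry before converting, since as.dist() only reads the lower triangle.

```r
# Made-up symmetric distance matrix standing in for the exported file.
ids <- c("ind1", "ind2", "ind3")
m <- matrix(c(0, 3, 4,
              3, 0, 5,
              4, 5, 0),
            nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

# Write it as a tab-separated square matrix with row and column names.
path <- tempfile(fileext = ".txt")
write.table(m, path, sep = "\t", quote = FALSE, col.names = NA)

# Read it back the way the question does, then convert explicitly.
tab <- read.csv(path, header = TRUE, sep = "\t", dec = ".", row.names = 1)
mat <- as.matrix(tab)

# as.dist() silently uses the lower triangle, so check symmetry first.
stopifnot(isSymmetric(unname(mat)))

d <- as.dist(mat)
class(d)
attr(d, "Size")
```

Passing the resulting dist object (rather than the data frame returned by read.csv) to the ordination function avoids one possible source of confusion, although the thread does not establish that this was the cause of the unrealistic results.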