Re: [R-sig-phylo] apTreeshape and as.treeshape

2013-06-10 Thread Blum michael

Dear Fabricia,

I am the maintainer of the package.
Can you send me the tree that is responsible for the warnings, please? It 
might be because of unresolved nodes.


Sincerely
Michael Blum
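
For readers following the thread, here is a minimal sketch of how one might check a tree for unresolved nodes before converting it. It assumes the ape package is installed and uses a made-up toy Newick tree, not the poster's data. is.rooted(), is.binary() and multi2di() are ape functions. Whether this is the actual cause of the error reported in this thread is not confirmed here.

```r
library(ape)

# Toy tree with one unresolved node: a, b and c share a single parent.
# The tree and its branch lengths are invented for illustration.
tr <- read.tree(text = "((a:1,b:1,c:1):1,d:2);")

is.rooted(tr)   # nj() returns an unrooted tree, so this is worth checking too
is.binary(tr)   # FALSE when at least one node is unresolved

# multi2di() resolves polytomies arbitrarily using zero-length branches.
resolved <- multi2di(tr)
is.binary(resolved)
```

The resolution chosen by multi2di() is random by default, so statistics computed on the resolved tree depend on that arbitrary choice. Checking the tree first at least shows whether this step is needed at all.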
--
Michael BLUM
CNRS Research Associate
Tel: +33 (0)4 56 52 00 65
Fax: +33 (0)4 56 52 00 55
michael.b...@imag.fr
http://membres-timc.imag.fr/Michael.Blum/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


[R-sig-phylo] apTreeshape and as.treeshape

2013-06-10 Thread Fabricia Nascimento
Hi,

I am trying to use the colless.test and maxlik.betasplit tests in the apTreeshape R 
package.

I have a set of DNA sequences in FASTA format. I am following the steps below.

1. I construct an NJ tree with the ape R package, using nj(dist.dna(dataset)).
2. After I use "as.treeshape" from the apTreeshape package, I get the 
following message when I try to plot my tree (after "converting" it to treeshape 
format):

Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf

3. So I decided to use as.treeshape(tree, model = "yule"). After doing this 
I can plot my tree as a treeshape object, but when I perform one of the tests I get the 
following message:

When using maxlik.betasplit:
"There were 50 or more warnings (use warnings() to see the first 50)"

and when using colless.test:
 "Warning message: 
In if (class(tree) != "treeshape") { :
  the condition has length > 1 and only the first element will be used"

My questions: What are the reasons for these warnings? Can I ignore these 
"warnings"? How can I solve this "problem" and convert my NJ tree, generated using 
ape, into an object of class treeshape?

Thanks!
Fabricia.
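
A note on the second warning quoted above: "the condition has length > 1" means that class(tree) returned more than one element, so the comparison inside if() produced several TRUE/FALSE values. The sketch below reproduces this in base R with a hypothetical stand-in object. The actual class vector of the poster's tree is not known from the thread.

```r
# Hypothetical stand-in: an empty object carrying two class labels.
# The real class vector of the poster's tree is an assumption here.
tree <- structure(list(), class = c("phylo", "treeshape"))

# Comparing a length-2 class vector gives a length-2 logical vector,
# which is what if() complained about.
cond <- class(tree) != "treeshape"
length(cond)

# inherits() always returns a single TRUE or FALSE.
inherits(tree, "treeshape")
```

Since R 4.2.0 a condition of length greater than one in if() is an error rather than a warning, so the same call would stop instead of continuing.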





Re: [R-sig-phylo] PCoA with custom distance matrix

2013-06-10 Thread Jombart, Thibaut

> OK, I see. So can I use dist.genpop() also for genind objects?

No, the method is defined for genpop objects only. These distances are defined for 
populations - they use allele frequencies. Technically speaking, allele 
frequencies are also defined for individual genotypes. But then possible values 
are 0, 0.5, 1 (diploid data), or 0/1 (haploid). In such cases basic distances 
are pretty good at capturing the main structures in the data. Note that this is 
probably true for population data too. Look at the example in dist.genpop: 5 
different distances, essentially the same results.
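
To illustrate the point about individual allele frequencies, here is a small base R sketch. The genotypes, allele names and object names are all hypothetical. It only shows that diploid individuals take the values 0, 0.5 or 1, and that dist() then gives the basic Euclidean distance on those values.

```r
# Hypothetical diploid genotypes at one locus with alleles "a" and "b".
geno <- list(ind1 = c("a", "a"),
             ind2 = c("a", "b"),
             ind3 = c("b", "b"))
alleles <- c("a", "b")

# Allele frequencies per individual: allele counts divided by ploidy.
freq <- t(sapply(geno, function(g) {
  table(factor(g, levels = alleles)) / length(g)
}))
freq

# Basic Euclidean distance between individuals.
dist(freq)
```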


> I do not know about a package implementing Goldstein's distance.
>
> As for your question, you are right: adding a constant to a set of distances
> to make them Euclidean alters the geometry of the cloud of points, and is
> thus not very satisfying. This said, especially when it comes to individual
> microsatellite data, I have yet to see a fancy distance do more than the
> basic Euclidean distance in a PCoA, which in this case is also a PCA. The
> advantage of PCA is that it will also give you allele loadings which can be
> biologically meaningful. Not to say you should stick to it, but it is
> probably a good start to look at your data.

> PCA and PCoA using distance generated by dist() give me more or less same
> results.

It should be exactly the same components (possibly with different signs). This 
is because the variance (optimized in PCA) can be expressed as a function of 
the pairwise squared Euclidean distances between observations (maximized in 
PCoA). Differences might come from scaling the data in PCA (PCA on the 
'correlation matrix', useless for most genetic data), or from the strategy used 
to replace missing data.
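
The equivalence can be checked with a short base R sketch. It uses random toy data and prcomp() and cmdscale() from the stats package instead of the ade4 functions discussed in the thread, which is a substitution made here for self-containment. The PCA is centred and unscaled, and there are no missing values.

```r
set.seed(1)
# Toy data: 10 observations, 4 variables (random numbers, for illustration).
x <- matrix(rnorm(40), nrow = 10)

# Centred, unscaled PCA.
pca <- prcomp(x, center = TRUE, scale. = FALSE)

# PCoA (classical scaling) on the Euclidean distances between rows.
pco <- cmdscale(dist(x), k = 2)

# Scores agree up to the sign of each axis.
all.equal(abs(unname(pca$x[, 1:2])), abs(unname(pco)))
```

Setting scale. = TRUE in prcomp() would correspond to the PCA on the correlation matrix mentioned above, and the two sets of scores would then be expected to differ.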

> I really wonder how this problem was solved in old Syntax (Podani,
> 2000), which was calculating PCA/PCoA from any distance. ;-)
> For now I don't see any enough good solution, but I at least understand it
> more. :-)

Again, PCA should do the trick. 



Re: [R-sig-phylo] PCoA with custom distance matrix

2013-06-10 Thread Vojtěch Zeisek
Hello

Dne Po 10. června 2013 14:26:41 jste napsal(a):
> Hello, 
> 
> dist.genpop is a re-implementation of dist.genet - all material for genetics
> in ade4 should be deprecated by now, as it has been transfered into
> adegenet or adephylo a few years ago already.

OK, I see. So can I use dist.genpop() also for genind objects?

> I do not know about a package implementing Goldstein's distance. 
> 
> As for your question, you are right: adding a constant to a set of distances
> to make them Euclidean alters the geometry of the cloud of points, and is
> thus not very satisfying. This said, especially when it comes to individual
> microsatellite data, I have yet to see a fancy distance do more than the
> basic Euclidean distance in a PCoA, which in this case is also a PCA. The
> advantage of PCA is that it will also give you allele loadings which can be
> biologically meaningful. Not to say you should stick to it, but it is
> probably a good start to look at your data.

PCA and PCoA using the distance generated by dist() give me more or less the same 
results. I really wonder how this problem was solved in old Syntax (Podani, 
2000), which was calculating PCA/PCoA from any distance. ;-)
For now I don't see any good enough solution, but at least I understand it 
better. :-)

> Cheers
> Thibaut

Have a nice day,
Vojtěch


Re: [R-sig-phylo] PCoA with custom distance matrix

2013-06-10 Thread Jombart, Thibaut

Hello, 

dist.genpop is a re-implementation of dist.genet - all material for genetics in 
ade4 should be deprecated by now, as it was transferred into adegenet or 
adephylo a few years ago already.

I do not know about a package implementing Goldstein's distance. 

As for your question, you are right: adding a constant to a set of distances to 
make them Euclidean alters the geometry of the cloud of points, and is thus not 
very satisfying. This said, especially when it comes to individual 
microsatellite data, I have yet to see a fancy distance do more than the basic 
Euclidean distance in a PCoA, which in this case is also a PCA. The advantage 
of PCA is that it will also give you allele loadings which can be biologically 
meaningful. Not to say you should stick to it, but it is probably a good start 
to look at your data.
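
A small base R sketch of this effect follows. It uses cmdscale(add = TRUE) from the stats package, which adds a Cailliez-type constant to all off-diagonal dissimilarities, as a stand-in for ade4's cailliez(). The toy "star" dissimilarities are invented for illustration and the object names are hypothetical.

```r
# Toy dissimilarities: a centre at distance 1 from three leaves that are
# at distance 2 from each other. No Euclidean configuration fits these.
labs <- c("centre", "l1", "l2", "l3")
m <- matrix(2, 4, 4, dimnames = list(labs, labs))
m[1, ] <- 1
m[, 1] <- 1
diag(m) <- 0
d <- as.dist(m)

# A negative eigenvalue signals a non-Euclidean dissimilarity matrix.
raw <- cmdscale(d, k = 2, eig = TRUE)
min(raw$eig)

# add = TRUE computes an additive constant and applies it off the diagonal.
shifted <- cmdscale(d, k = 2, eig = TRUE, add = TRUE)
shifted$ac

# The ratio leaf-leaf / centre-leaf changes once the constant is added,
# which is the sense in which the geometry is altered.
ratio_before <- m["l1", "l2"] / m["centre", "l1"]
ratio_after  <- (m["l1", "l2"] + shifted$ac) / (m["centre", "l1"] + shifted$ac)
c(before = ratio_before, after = ratio_after)
```

Because the constant is added to every off-diagonal entry, this corresponds to the first option in the question (zero distances are shifted too), not to keeping zero distances at zero.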

Cheers
Thibaut



Re: [R-sig-phylo] PCoA with custom distance matrix

2013-06-10 Thread Vojtěch Zeisek
Hello,
thank you very much for your reply. I'm still a little confused.

Dne Pá 7. června 2013 16:03:13 jste napsal(a):
> Hello there, 
> 
> you could have done the PCoA on Chord's distance using adegenet, it would
> have probably been simpler (see the vignette adegenet-basics, section 6
> "Multivariate Analysis").

The function dist.genpop() includes Nei's distance, but the man page says it is only 
for genpop objects (I wish to analyse individuals, not populations). I 
imported the data using read.loci() and converted it to genind using 
loci2genind(). I also found the function dist.genet() in the ade4 package, but it 
requires a genet object. So far I haven't found a straightforward way to convert my 
data to it. And I haven't found any function providing Goldstein's distance.

> 'dist' is the canonical Euclidean distance, but dudi.pco will accept any
> Euclidean distance. You can use "cailliez" in ade4 to make your distance
> Euclidean before the PCoA.

I'm not a mathematician, so I don't understand one point. Let's say I have a 
non-Euclidean distance matrix and some individuals (e.g. from the same population) 
have zero distance (they are identical). When I use cailliez(), I have two 
possibilities (parameter cor.zero):
1) to add the constant to zero-length distances as well, so that the shift is 
the same everywhere. But as a result, the distance between originally identical 
objects is positive, so they will be treated as different in the subsequent 
analysis, right?; or
2) to keep zero-length distances, so that those objects stay identical, but their 
distances from other objects change, so the further analyses are biased 
again, right?
Is there any solution? Or am I wrong?

> Cheers
> Thibaut

Sincerely,
Vojtěch

> --
> ##
> Dr Thibaut JOMBART
> MRC Centre for Outbreak Analysis and Modelling
> Department of Infectious Disease Epidemiology
> Imperial College - School of Public Health
> St Mary’s Campus
> Norfolk Place
> London W2 1PG
> United Kingdom
> Tel. : 0044 (0)20 7594 3658
> t.jomb...@imperial.ac.uk
> http://sites.google.com/site/thibautjombart/
> http://adegenet.r-forge.r-project.org/
> 
> From: r-sig-phylo-boun...@r-project.org [r-sig-phylo-boun...@r-project.org]
> on behalf of Vojtěch Zeisek [vojtech.zei...@opensuse.org]
> Sent: 07 June
> 2013 15:46
> To: mailinglist R
> Subject: [R-sig-phylo] PCoA with custom distance matrix
> 
> Hello,
> I have microsatellite data and I would like to analyze them using PCoA in R.
> I would like to use the following genetic distances: Goldstein's (1995) (δμ)²
> and Nei's chord distance (1983). I calculated those distances in MSA
> (Dieringer and Schlötterer 2003), because I didn't find any way to
> calculate them in R. I imported the distances like this:
> dist.dms <- read.csv("dms_ind.txt", header=TRUE, sep="\t", dec=".",
> row.names=1)
> dioszegia.dist.dms <- as.dist(dist.dms)
> class(dioszegia.dist.dms)
> [1] "dist"
> dms_ind.txt is an ordinary square matrix with a diagonal.
> But using dudi.pco() from the ade4 package fails:
> pcoa <- dudi.pco(dioszegia.dist.dms, scannf=FALSE, nf=3)
> Warning message:
> In dudi.pco(dioszegia.dist.dms, scannf = FALSE, nf = 3) :
>   Non euclidean distance
> When using Euclidean distance (function dist()), it works fine. It produces
> some results despite the above warning, but the results are very far from
> those I get using the dist() function. And the results don't look realistic.
> What am I doing wrong? :-)
> Have a nice day!
> Vojtěch
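
As a side note on the import step quoted above, as.dist() reads only the lower triangle of a square matrix, so it can be worth checking the table before converting it. The following base R sketch uses a hypothetical 3 x 3 tab-separated file written to a temporary path. The file contents and object names are invented and merely stand in for a file such as dms_ind.txt.

```r
# Hypothetical distance table; labels and values are made up.
path <- tempfile(fileext = ".txt")
writeLines(c("\tind1\tind2\tind3",
             "ind1\t0\t0.4\t0.9",
             "ind2\t0.4\t0\t0.7",
             "ind3\t0.9\t0.7\t0"), path)

tab <- read.csv(path, header = TRUE, sep = "\t", dec = ".", row.names = 1)
m <- as.matrix(tab)

# Sanity checks before conversion: square, symmetric, zero diagonal.
stopifnot(nrow(m) == ncol(m),
          isSymmetric(unname(m)),
          all(diag(m) == 0))

d <- as.dist(m)
class(d)
attr(d, "Size")
```

These checks say nothing about whether the distances are Euclidean. They only guard against a mis-read file, for example a shifted header row or a triangular matrix stored as a full table.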
-- 
Vojtěch Zeisek

Komunita openSUSE GNU/Linuxu
Community of the openSUSE GNU/Linux

http://www.opensuse.org/
http://trapa.cz/

