Re: [R-sig-phylo] PCoA with custom distance matrix

2013-06-10 Thread Vojtěch Zeisek
Hello,
thank you very much for your reply. I'm still a little confused.

On Friday, 7 June 2013 16:03:13 you wrote:
 Hello there, 
 
 you could have done the PCoA on Chord's distance using adegenet; it would
 have probably been simpler (see the vignette adegenet-basics, section 6
 "Multivariate Analysis").

Function dist.genpop() includes Nei's distance, but the man page says it is only 
for genpop objects (I wish to analyse individuals, not populations). I 
imported the data using read.loci() and converted it to genind using 
loci2genind(). I also found the function dist.genet() in the ade4 package, but it 
requires a genet object. So far I haven't found a straightforward way to convert my 
data to it. And I haven't found any function providing Goldstein's distance.

 'dist' is the canonical Euclidean distance, but dudi.pco will accept any
 Euclidean distance. You can use cailliez in ade4 to make your distance
 Euclidean before the PCoA.

I'm not a mathematician, so there is one point I don't understand. Let's say I have a 
non-Euclidean distance matrix and some individuals (e.g. from the same population) 
have zero distance (they are identical). When I use cailliez(), I have two 
possibilities (parameter cor.zero):
1) add the constant to the zero distances as well, so that the shift is the 
same everywhere. But as a result, the distance between originally identical 
objects becomes positive, so they will be treated as different in subsequent 
analysis, right?; or
2) keep the zero distances, so that those objects stay identical, but their 
distances from other objects change, so further analyses are biased 
again, right?
Is there any solution? Or am I wrong?

 Cheers
 Thibaut

Sincerely,
Vojtěch

 --
 ##
 Dr Thibaut JOMBART
 MRC Centre for Outbreak Analysis and Modelling
 Department of Infectious Disease Epidemiology
 Imperial College - School of Public Health
 St Mary’s Campus
 Norfolk Place
 London W2 1PG
 United Kingdom
 Tel. : 0044 (0)20 7594 3658
 t.jomb...@imperial.ac.uk
 http://sites.google.com/site/thibautjombart/
 http://adegenet.r-forge.r-project.org/
 
 From: r-sig-phylo-boun...@r-project.org [r-sig-phylo-boun...@r-project.org]
 on behalf of Vojtěch Zeisek [vojtech.zei...@opensuse.org]
 Sent: 07 June 2013 15:46
 To: mailinglist R
 Subject: [R-sig-phylo] PCoA with custom distance matrix
 
 Hello,
 I have microsatellite data and I would like to analyze them using PCoA in R.
 I would like to use the following genetic distances: Goldstein's (1995) (dμ)²
 and Nei's chord distance (1983). I calculated those distances in MSA
 (Dieringer and Schlötterer 2003), because I didn't find any way to
 calculate them in R. I imported the distances like this:
 dioszegia.dist.dms <- read.csv("dms_ind.txt", header=TRUE, sep="\t", dec=".",
 row.names=1)
 dioszegia.dist.dms <- as.dist(dioszegia.dist.dms)
 class(dioszegia.dist.dms)
 [1] "dist"
 dms_ind.txt is an ordinary square matrix with a diagonal.
 But using dudi.pco() from the ade4 package gives a warning:
 pcoa <- dudi.pco(dioszegia.dist.dms, scannf=FALSE, nf=3)
 Warning message:
 In dudi.pco(dioszegia.dist.dms, scannf = FALSE, nf = 3) :
   Non euclidean distance
 When using Euclidean distance (function dist()), it works fine. It produces
 some results despite the above warning, but the results are very far from
 those I get using the dist() function. And the results don't look realistic.
 What am I doing wrong? :-)
 Have a nice day!
 Vojtěch
-- 
Vojtěch Zeisek

Komunita openSUSE GNU/Linuxu
Community of the openSUSE GNU/Linux

http://www.opensuse.org/
http://trapa.cz/


___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
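
The "Non euclidean distance" warning and the two cailliez() options raised in the message above can be explored on a toy example. The sketch below is only an illustration: it uses base R's cmdscale() as a stand-in for ade4's dudi.pco() and cailliez(), and the four-object distance matrix is made up. With add = TRUE, cmdscale() computes an additive constant and adds it to every off-diagonal distance, which corresponds to option 1 in the message (the zero distance between identical objects is shifted too).

```r
# Toy distance matrix (made-up values): a and b are identical (distance 0),
# and d(a, d) = 3 > d(a, c) + d(c, d) = 2, so the triangle inequality fails
# and the matrix cannot be Euclidean.
m <- matrix(c(0, 0, 1, 3,
              0, 0, 1, 3,
              1, 1, 0, 1,
              3, 3, 1, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(letters[1:4], letters[1:4]))
d <- as.dist(m)

# Classical PCoA: negative eigenvalues signal a non-Euclidean distance.
raw <- cmdscale(d, k = 1, eig = TRUE)
print(round(raw$eig, 3))

# Same analysis with an additive constant applied to every off-diagonal
# distance, including the zero distance between a and b.
fixed <- cmdscale(d, k = 1, eig = TRUE, add = TRUE)
print(fixed$ac)              # the constant that was added
print(round(fixed$eig, 3))   # eigenvalues after the correction
```

In ade4, is.euclid() performs a comparable check, and cailliez(), lingoes() and quasieuclid() offer alternative corrections; each of them modifies the original distances in some way.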

Re: [R-sig-phylo] PCoA with custom distance matrix

2013-06-10 Thread Jombart, Thibaut

Hello, 

dist.genpop is a re-implementation of dist.genet; all genetics material in 
ade4 should be deprecated by now, as it was transferred into adegenet or 
adephylo a few years ago.

I do not know about a package implementing Goldstein's distance. 
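
For what it is worth, a Goldstein-type distance is simple enough to sketch in base R. The example below is an assumption-laden illustration, not the MSA implementation: it treats (dμ)² as the squared difference in mean allele size (in repeat units) averaged over loci, applies it to individuals rather than populations, and uses invented data and object names (a1, a2, mean_size, dmu2). Missing data are not handled.

```r
# Hypothetical allele sizes in repeat units for 3 diploid individuals at 2 loci.
# a1 and a2 hold the first and second allele copy of each individual.
a1 <- matrix(c(10, 10, 14,    # locA
               20, 20, 22),   # locB
             nrow = 3,
             dimnames = list(c("ind1", "ind2", "ind3"), c("locA", "locB")))
a2 <- matrix(c(12, 12, 14,    # locA
               20, 20, 24),   # locB
             nrow = 3, dimnames = dimnames(a1))

# Mean allele size per individual and locus.
mean_size <- (a1 + a2) / 2

# dist() returns the square root of the summed squared differences over loci;
# squaring it and dividing by the number of loci gives the per-locus average
# of the squared difference in mean allele size.
dmu2 <- dist(mean_size)^2 / ncol(mean_size)
print(dmu2)
```

The result is an ordinary "dist" object, so it can be passed to a PCoA function directly, with the same caveat about non-Euclidean distances.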

As for your question, you are right: adding a constant to a set of distances to 
make them Euclidean alters the geometry of the cloud of points, and is thus not 
very satisfying. This said, especially when it comes to individual 
microsatellite data, I have yet to see a fancy distance do more than the basic 
Euclidean distance in a PCoA, which in this case is also a PCA. The advantage 
of PCA is that it will also give you allele loadings which can be biologically 
meaningful. Not to say you should stick to it, but it is probably a good start 
to look at your data.

Cheers
Thibaut
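
The statement that a PCoA on the basic Euclidean distance is also a PCA can be checked numerically. This is a minimal sketch using base R's prcomp() and cmdscale() instead of the ade4/adegenet functions, on simulated allele counts that are purely illustrative.

```r
set.seed(42)
# Simulated allele counts (0, 1 or 2 copies) for 20 individuals and 6 alleles;
# purely illustrative data.
X <- matrix(rbinom(20 * 6, size = 2, prob = 0.4), nrow = 20,
            dimnames = list(paste0("ind", 1:20), paste0("allele", 1:6)))

# PCA on the centred (unscaled) table.
pca <- prcomp(X, center = TRUE, scale. = FALSE)

# PCoA (classical scaling) on the Euclidean distances between individuals.
pco <- cmdscale(dist(X), k = 2)

# Compare the coordinates of the individuals, ignoring the sign of each axis.
print(all.equal(abs(unname(pca$x[, 1:2])), abs(unname(pco))))

# Only the PCA also returns loadings of the original variables (alleles).
print(round(pca$rotation[, 1:2], 3))
```

The loadings in pca$rotation are what a distance-based PCoA cannot provide, since it never sees the allele table itself.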



Re: [R-sig-phylo] PCoA with custom distance matrix

2013-06-10 Thread Vojtěch Zeisek
Hello

On Monday, 10 June 2013 14:26:41 you wrote:
 Hello, 
 
 dist.genpop is a re-implementation of dist.genet; all genetics material
 in ade4 should be deprecated by now, as it was transferred into
 adegenet or adephylo a few years ago.

OK, I see. So can I use dist.genpop() also for genind objects?

 I do not know about a package implementing Goldstein's distance. 
 
 As for your question, you are right: adding a constant to a set of distances
 to make them Euclidean alters the geometry of the cloud of points, and is
 thus not very satisfying. This said, especially when it comes to individual
 microsatellite data, I have yet to see a fancy distance do more than the
 basic Euclidean distance in a PCoA, which in this case is also a PCA. The
 advantage of PCA is that it will also give you allele loadings which can be
 biologically meaningful. Not to say you should stick to it, but it is
 probably a good start to look at your data.

PCA and PCoA using the distance generated by dist() give me more or less the same 
results. I really wonder how this problem was solved in the old SYN-TAX (Podani, 
2000), which calculated PCA/PCoA from any distance. ;-)
For now I don't see any good enough solution, but at least I understand it 
better. :-)

 Cheers
 Thibaut

Have a nice day,
Vojtěch

 

[R-sig-phylo] apTreeshape and as.treeshape

2013-06-10 Thread Fabricia Nascimento
Hi,

I am trying to use the colless.test and maxlik.betasplit tests in the apTreeshape R 
package.

I have a set of DNA sequences in FASTA format. I am following the steps below.

1. I construct an NJ tree with the ape R package using nj(dist.dna(dataset)).
2. After I use as.treeshape from the apTreeshape package, I get the 
following message when I try to plot my tree (after converting to treeshape 
format):

Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf

3. So I decided to use as.treeshape(tree, model = "yule"). After doing this 
I can plot my tree as a treeshape, but when I perform one of the tests I get the 
following message

when using the maxlik.betasplit 
There were 50 or more warnings (use warnings() to see the first 50)

and when using colless.test:
Warning message: 
In if (class(tree) != "treeshape") { :
  the condition has length > 1 and only the first element will be used

My questions: What are the reasons for these warnings? Can I ignore them? 
How can I solve this problem and convert my NJ tree generated using 
ape into the treeshape class?

Thanks!
Fabricia.
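
One possible (unconfirmed) cause of the plotting error in step 2 is that nj() returns an unrooted tree, whose basal node has three descendants, while treeshape objects describe rooted binary trees. The sketch below uses only ape; the random tree is a stand-in for nj(dist.dna(dataset)), and rooting on the first tip is an arbitrary assumption made for illustration.

```r
library(ape)

set.seed(1)
# Stand-in for nj(dist.dna(dataset)): a random 8-tip tree, unrooted to
# mimic the unrooted output of nj().
tree <- unroot(rtree(8))
print(is.rooted(tree))

# Root on an arbitrary tip (an assumption; a real outgroup would be better)
# and resolve the basal trichotomy into a bifurcation.
rooted <- root(tree, outgroup = tree$tip.label[1], resolve.root = TRUE)

# multi2di() randomly resolves any remaining polytomies; it leaves an
# already binary tree unchanged.
rooted <- multi2di(rooted)

print(is.rooted(rooted))
print(is.binary(rooted))
```

A tree that passes both checks could then be given to as.treeshape() without the model argument, which would avoid resolving nodes at random.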





Re: [R-sig-phylo] apTreeshape and as.treeshape

2013-06-10 Thread Blum michael

Dear Fabricia,

I am the maintainer of the package.
Can you send me the tree that is responsible for the warnings, please? It 
might be because of unresolved nodes.


Sincerely
Michael Blum
--
Michael BLUM
CNRS Research Associate
Tel: +33 (0)4 56 52 00 65
Fax: +33 (0)4 56 52 00 55
michael.b...@imag.fr
http://membres-timc.imag.fr/Michael.Blum/
