Re: [R] row.names in dunes and dunes.env?

2012-04-12 Thread Petr PIKAL
Hi

see inline

 
 Hello,
 
 I've got a small dataset on box turtle shell measurements that I would 
 like to perform a detrended correspondence analysis on. I thought that it 
 would be interesting to examine the morphometrics for each species in the 
 area of overlap and in areas where neither species occurs. 
 
 I've taken a look at the dune and dune.env datasets in vegan. Using the 
 str() command gives me 
 
  str(dune)
 'data.frame':   20 obs. of  30 variables:
  $ Belper: num  3 0 2 0 0 0 0 2 0 0 ...
  $ Empnig: num  0 0 0 0 0 0 0 0 0 0 ...
  $ Junbuf: num  0 3 0 0 0 0 0 0 0 0 ...
  $ Junart: num  0 0 0 3 0 0 4 0 0 3 ...
  ...
 
 However, when I try looking directly at the data frame using the edit() 
 command I see that there is a column called row.names to the left of Belper.
 
 Likewise, when I use the str() command on dune.env I get
 
  str(dune.env)
 'data.frame':   20 obs. of  5 variables:
  $ A1        : num  3.5 6 4.2 5.7 4.3 2.8 4.2 6.3 4 11.5 ...
  $ Moisture  : Ord.factor w/ 4 levels "1"<"2"<"4"<"5": 1 4 2 4 1 1 4 1 2 4 ...
  $ Management: Factor w/ 4 levels "BF","HF","NM",..: 1 4 4 4 2 4 2 2 3 3 ...
  $ Use       : Ord.factor w/ 3 levels "Hayfield"<"Haypastu"<..: 2 2 2 3 2 2 3 1 1 2 ...
  $ Manure    : Ord.factor w/ 5 levels "0"<"1"<"2"<"3"<..: 3 4 5 4 3 5 4 3 1 1 ...
 
 but using the edit() command shows a column named row.names.

No. This is not a column; it is exactly what it says: the row names.

 str(rosin)
'data.frame':   10 obs. of  5 variables:
 $ pytel: int  1 2 3 4 5 6 7 8 9 10
 $ rstr : num  1.022 0.981 0.992 1.01 0.976 ...
 $ gama : num  1.4 1.44 1.41 1.43 1.39 ...
 $ cas  : int  0 3 6 9 12 15 18 21 24 27
 $ typ  : chr  "anatas" "anatas" "anatas" "anatas" ...

 head(rosin)
  pytel      rstr     gama cas    typ
1     1 1.0216621 1.397885   0 anatas
2     2 0.9809663 1.442439   3 anatas
3     3 0.9916211 1.411767   6 anatas
^^ these are row names

 
 I assume that the row.names column is used to link the two files together.

If you are in doubt, the recommended way is to consult the documentation.

?row.names
All data frames have a row names attribute, a character vector of length 
the number of rows with no duplicates nor missing values. 
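
A small sketch of what that means in practice, using a made-up data frame (the name `df` and the labels `t1` to `t3` are only illustrative, they are not from the turtle files). It suggests that row names live alongside the columns as an attribute and are not counted as a variable:

```r
# Toy data frame; values copied from the first rows of the posted CSV,
# object name and row labels are hypothetical
df <- data.frame(CL = c(104.4, 108.79, 114.12),
                 CW = c(89.887, 87.78, 89.33))

ncol(df)        # 2: the row names are not counted as a column
names(df)       # "CL" "CW"
row.names(df)   # "1" "2" "3": automatic row names

# Row names can be replaced by more meaningful labels
row.names(df) <- c("t1", "t2", "t3")
df["t2", "CL"]  # rows can then be selected by name
```

This is also why edit() displays something labelled row.names while str() does not list it among the variables.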

 
 My turtle data is saved as a *.csv, and I've added a column called 
 row.names, so that it looks like this
 
 row.names,CL,CCL,CW,CCW,CH,CCH
 1,104.4,131.8,89.887,137.4,43.391,89.7
 2,108.79,135.9,87.78,118.1,50.72,71.2
 3,114.12,126.1,89.33,132.8,142.39,78.3
 4,102.87,128.2,84.2,125,45.42,72.4
 5,84.6,104.8,72.61,111.8,41.1,57.3
 
 I've called this file turtles_dca.csv. I've also created a file called 
 turtles_dca_env.csv that looks like this
 
 row.names,Species,Sex,Distribution,Concatenated,Species_overlap
 1,Terrapene_ornata,Female,overlap,TO_F_Overlap,TO_Overlap
 2,Terrapene_ornata,Female,overlap,TO_F_Overlap,TO_Overlap
 3,Terrapene_ornata,Female,overlap,TO_F_Overlap,TO_Overlap
 4,Terrapene_ornata,Female,overlap,TO_F_Overlap,TO_Overlap
 5,Terrapene_ornata,Female,overlap,TO_F_Overlap,TO_Overlap
 
 However, when I read the data into R using this command
 
 turtles.env = read.csv("turtles_dca_env.csv", header = TRUE)
 
 
 and then using the str() command I get 
 
 
  str(turtles)
 'data.frame':   67 obs. of  7 variables:
  $ row.names: int  1 2 3 4 5 6 7 8 9 10 ...
  $ CL   : num  104.4 108.8 114.1 102.9 84.6 ...
  $ CCL  : num  132 136 126 128 105 ...
  $ CW   : num  89.9 87.8 89.3 84.2 72.6 ...
  $ CCW  : num  137 118 133 125 112 ...
  $ CH   : num  43.4 50.7 142.4 45.4 41.1 ...
  $ CCH  : num  89.7 71.2 78.3 72.4 57.3 73.4 67 57 68.8 68 ...
 
 When I run decorana() on this dataset, it appears that the column 
 row.names is included in the analysis, which isn't what I'm looking for. 

Then why did you add this column to your data?
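
If the identifier column has to stay in the CSV file, one option is to let read.csv() use it as the row names instead of reading it as a variable. A minimal sketch, assuming the file layout shown in the question; the data are passed through the `text` argument here only so that the example is self-contained, and the object names `as_column` and `as_rownames` are made up:

```r
# First rows of the posted file, inlined for the sake of the example
csv_text <- "row.names,CL,CCL,CW,CCW,CH,CCH
1,104.4,131.8,89.887,137.4,43.391,89.7
2,108.79,135.9,87.78,118.1,50.72,71.2
3,114.12,126.1,89.33,132.8,142.39,78.3"

# Default: the first column is read as an ordinary variable
as_column <- read.csv(text = csv_text, header = TRUE)
ncol(as_column)      # 7, "row.names" is one of the variables

# row.names = 1: the first column becomes the row names instead
as_rownames <- read.csv(text = csv_text, header = TRUE, row.names = 1)
ncol(as_rownames)    # 6, only the measurements remain
```

With the real file the call would presumably be read.csv("turtles_dca.csv", header = TRUE, row.names = 1), so that only the six measurements are passed on to the analysis.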

 
 If I go ahead and delete the column row.names from my data frames (i.e. 
 removing it from turtles and turtles.env), I don't believe that the 
 analysis is performed correctly. The two species differ significantly in 
 most of their measurements, but the ordihull() and ordispider() commands 
 show them overlapping almost completely.
 
 I think that I'm missing something pretty basic about inputting and 
 formatting this data for this analysis. Can anyone offer a suggestion on 
 where I'm going astray? I can send a copy of the data if anyone wants to 
 look at it.

I am not familiar with the functions you use. However, you probably want to 
link those two files together. If they are both in the same order, you can 
just do

turtles.complet <- cbind(turtles, turtles.env)

Or, if they are in a different order, you need to find some common column(s) 
and merge those two files; see

?merge
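
A rough sketch of the difference between the two approaches, on made-up data (the key column `id`, the object names and the Sex values are all hypothetical, chosen only to show what happens when the row order differs):

```r
# Measurements in id order 1, 2, 3
meas <- data.frame(id = c(1, 2, 3),
                   CL = c(104.4, 108.79, 114.12))

# Environmental data for the same animals, but in a different order
env <- data.frame(id  = c(3, 1, 2),
                  Sex = c("Female", "Male", "Female"))

# cbind() pairs rows purely by position, so animal 1 ends up with the
# Sex value that belongs to animal 3
by_position <- cbind(meas, env["Sex"])

# merge() matches rows on the shared column, whatever the row order
by_key <- merge(meas, env, by = "id")
by_key
```

If the two turtle files share an identifier column, merge() on that column should be the safer choice whenever there is any doubt about the row order.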

Regards 
Petr


 
 Best wishes,
 Chris
 University of Central Oklahoma
[[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, 

[R] row.names in dunes and dunes.env?

2012-04-11 Thread Chris Butler
Hello,

I've got a small dataset on box turtle shell measurements that I would like to 
perform a detrended correspondence analysis on. I thought that it would be 
interesting to examine the morphometrics for each species in the area of 
overlap and in areas where neither species occurs. 

I've taken a look at the dune and dune.env datasets in vegan. Using the str() 
command gives me 

 str(dune)
'data.frame':   20 obs. of  30 variables:
 $ Belper: num  3 0 2 0 0 0 0 2 0 0 ...
 $ Empnig: num  0 0 0 0 0 0 0 0 0 0 ...
 $ Junbuf: num  0 3 0 0 0 0 0 0 0 0 ...
 $ Junart: num  0 0 0 3 0 0 4 0 0 3 ...
 ...

However, when I try looking directly at the data frame using the edit command I 
see that there is a column called row.names to the left of Belper.

Likewise, when I use the str() command on dune.env I get

 str(dune.env)
'data.frame':   20 obs. of  5 variables:
 $ A1        : num  3.5 6 4.2 5.7 4.3 2.8 4.2 6.3 4 11.5 ...
 $ Moisture  : Ord.factor w/ 4 levels "1"<"2"<"4"<"5": 1 4 2 4 1 1 4 1 2 4 ...
 $ Management: Factor w/ 4 levels "BF","HF","NM",..: 1 4 4 4 2 4 2 2 3 3 ...
 $ Use       : Ord.factor w/ 3 levels "Hayfield"<"Haypastu"<..: 2 2 2 3 2 2 3 1 1 2 ...
 $ Manure    : Ord.factor w/ 5 levels "0"<"1"<"2"<"3"<..: 3 4 5 4 3 5 4 3 1 1 ...

but using the edit() command shows a column named row.names.

I assume that the row.names column is used to link the two files together.

My turtle data is saved as a *.csv, and I've added a column called row.names, 
so that it looks like this

row.names,CL,CCL,CW,CCW,CH,CCH
1,104.4,131.8,89.887,137.4,43.391,89.7
2,108.79,135.9,87.78,118.1,50.72,71.2
3,114.12,126.1,89.33,132.8,142.39,78.3
4,102.87,128.2,84.2,125,45.42,72.4
5,84.6,104.8,72.61,111.8,41.1,57.3

I've called this file turtles_dca.csv. I've also created a file called 
turtles_dca_env.csv that looks like this

row.names,Species,Sex,Distribution,Concatenated,Species_overlap
1,Terrapene_ornata,Female,overlap,TO_F_Overlap,TO_Overlap
2,Terrapene_ornata,Female,overlap,TO_F_Overlap,TO_Overlap
3,Terrapene_ornata,Female,overlap,TO_F_Overlap,TO_Overlap
4,Terrapene_ornata,Female,overlap,TO_F_Overlap,TO_Overlap
5,Terrapene_ornata,Female,overlap,TO_F_Overlap,TO_Overlap

However, when I read the data into R using this command

turtles.env = read.csv("turtles_dca_env.csv", header = TRUE)


and then using the str() command I get 


 str(turtles)
'data.frame':   67 obs. of  7 variables:
 $ row.names: int  1 2 3 4 5 6 7 8 9 10 ...
 $ CL       : num  104.4 108.8 114.1 102.9 84.6 ...
 $ CCL      : num  132 136 126 128 105 ...
 $ CW       : num  89.9 87.8 89.3 84.2 72.6 ...
 $ CCW      : num  137 118 133 125 112 ...
 $ CH       : num  43.4 50.7 142.4 45.4 41.1 ...
 $ CCH      : num  89.7 71.2 78.3 72.4 57.3 73.4 67 57 68.8 68 ...

When I run decorana() on this dataset, it appears that the column row.names 
is included in the analysis, which isn't what I'm looking for. 

If I go ahead and delete the column row.names from my data frames (i.e. 
removing it from turtles and turtles.env), I don't believe that the analysis is 
performed correctly. The two species differ significantly in most of their 
measurements, but the ordihull() and ordispider() commands show them 
overlapping almost completely.

I think that I'm missing something pretty basic about inputting and formatting 
this data for this analysis. Can anyone offer a suggestion on where I'm going 
astray? I can send a copy of the data if anyone wants to look at it.

Best wishes,
Chris
University of Central Oklahoma