[R] problem with factor levels

2012-12-04 Thread Jeremy.Shearman
Hi
  I have a data.frame with 371,718 obs. of 12 variables (see the str() output
below). My problem is with V1, a factor with 93144 levels; there should
actually be 93994 levels. Each entry looks like:
comp[number]_c[number]_seq[number]
for example
comp215489_c0_seq40
For some reason R groups these as though the last number were a decimal. In
other words, comp215489_c0_seq40 and comp215489_c0_seq4 are treated as the
same. They are not the same, so when I group by this factor I lose roughly
850 levels (93994 - 93144).
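A quick way to test the suspicion is a small reproducible check. The sketch below uses a few hypothetical IDs in the same comp[number]_c[number]_seq[number] format (not the poster's real data) and shows that factor() compares levels as whole character strings, so "seq4" and "seq40" stay separate:

```r
# Hypothetical IDs in the comp[number]_c[number]_seq[number] format
ids <- c("comp215489_c0_seq4", "comp215489_c0_seq40",
         "comp215489_c0_seq4", "comp10_c0_seq1")

# factor() builds levels from the distinct character strings
f <- factor(ids)

# Both "..._seq4" and "..._seq40" appear as separate levels
levels(f)
nlevels(f)  # 3
# Rows per level: seq4 occurs twice, seq40 once
table(f)
```

If the same check on a subset of the real V1 column also keeps them apart, the discrepancy is more likely in the tool used to count the levels than in the factor itself.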

Here is the str() output:

'data.frame':   371718 obs. of  12 variables:
 $ V1 : Factor w/ 93144 levels comp10_c0_seq1,..: 92271 91685 29 30 1564 1564 1623 91700 91701 91848 ...
 $ V2 : Factor w/ 17162 levels gi|345842331|ref|NM_001244016.1|,..: 10119 10779 13210 13210 11522 8115 13079 14493 14493 15858 ...
 $ V3 : num  95.5 90.2 98.7 99.2 81.4 ...
 $ V4 : int  335 153 237 122 258 127 306 258 120 177 ...
 $ V5 : int  15 15 3 1 38 19 20 23 5 9 ...
 $ V6 : int  0 0 0 0 4 2 0 0 0 0 ...
 $ V7 : int  1 45 1 43 1 129 1 54 1 70 ...
 $ V8 : int  335 197 237 164 254 254 306 311 120 246 ...
 $ V9 : int  6866 18 3172 3438 67 122 3927 42 346 195 ...
 $ V10: int  7200 170 3408 3559 318 247 4232 299 465 19 ...
 $ V11: num  7e-155 2e-46 4e-125 2e-61 3e-24 ...
 $ V12: num  545 184 446 234 111 69.9 448 329 198 280 ...



--
View this message in context: 
http://r.789695.n4.nabble.com/problem-with-factor-levels-tp4652006.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problem with factor levels

2012-12-04 Thread Jeremy.Shearman
Oh, your skepticism was spot on!
I was using Excel to check the output (silly, but I am still in the process
of moving from Excel to R), and the counts from R and Excel disagreed. It
turns out the problem was with Excel and not with R at all. That's a relief.

SOLVED
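For anyone in the same situation, the counting and grouping can be done directly in R instead of in a spreadsheet. This is a minimal sketch assuming a data.frame laid out like the one above; `df` and its values are a hypothetical toy stand-in, and only V1 and V3 are included:

```r
# Toy stand-in for the poster's data.frame (hypothetical values)
df <- data.frame(
  V1 = factor(c("comp215489_c0_seq4", "comp215489_c0_seq40",
                "comp215489_c0_seq40", "comp10_c0_seq1")),
  V3 = c(95.5, 90.2, 98.7, 99.2)
)

# Number of levels stored in the factor
nlevels(df$V1)
# Number of distinct values actually present (lower than nlevels if some levels are unused)
length(unique(df$V1))
# Rows per ID
table(df$V1)
# Group-wise summary by ID, e.g. the mean of V3 for each V1 level
aggregate(V3 ~ V1, data = df, FUN = mean)
```

Comparing nlevels() with length(unique()) also distinguishes unused factor levels from genuinely missing IDs, which a spreadsheet count does not show.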




--
View this message in context: 
http://r.789695.n4.nabble.com/problem-with-factor-levels-tp4652006p4652019.html
Sent from the R help mailing list archive at Nabble.com.