On Wed, 24 Dec 2014, Bert Gunter wrote:

You said:
"The elements of the first vector are irrelevant because they are only
counted, so we should get the same result if it were a character
vector, but we don't: "

You don't get to invent your own rules! ?ave -- always nice to read the Help docs **before posting** -- clearly states that the x argument must be __numeric__. So if you choose to ignore what you are told, you do so at your own risk. Who knows what you'll get? -- it's a user error, not a bug.

I guess the goal is to humiliate the person who posted the question. I've had trouble convincing doctoral students in biostat to post questions here because they are afraid of being treated like dirt. It doesn't bother me personally, but I see it as counterproductive. The code I was working with was written by such a student and it has been in CRAN for a couple of years. I'm just trying to fix it. Your comment is helpful, but it would have been even better without the hostile tone.

Regarding the way ave() works -- why doesn't it check that the input vector is numeric? Apparently, integer input is acceptable. Does numeric sometimes mean "numeric" and sometimes "either 'integer' or 'numeric'"? Either way, if character is unacceptable, it could throw an error instead of pumping out an almost-correct answer. That made it much harder to track down the bug in the code base I was working on.
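On the terminology question: is.numeric() returns TRUE for both integer and double vectors, which is presumably why integer input is accepted. Below is a minimal sketch of the kind of up-front check asked about here. Note that `ave_checked` is a hypothetical wrapper made up for illustration; it is not part of the stats package.

```r
# is.numeric() accepts both integer and double storage, but not character.
is.numeric(1:5)                 # TRUE  (integer)
is.numeric(c(1.5, 2))           # TRUE  (double)
is.numeric(as.character(1:5))   # FALSE

# Hypothetical wrapper (not part of stats): refuse non-numeric x up front
# instead of silently returning a coerced result.
ave_checked <- function(x, ..., FUN = mean) {
  if (!is.numeric(x)) {
    stop("'x' must be numeric, got ", class(x)[1])
  }
  ave(x, ..., FUN = FUN)
}

ave_checked(1:5, gl(2, 2, 5), FUN = length)
# [1] 3 3 2 2 3
```

With a character first argument, this sketch stops with an error rather than returning counts stored as strings.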

Also, regarding the sacred text, "x A numeric." is a bit terse. The same text later refers to length(x), so I suspect that "A numeric" is short for "A numeric vector", but that might not mean "a vector of 'numeric' type."

https://stat.ethz.ch/R-manual/R-devel/library/stats/html/ave.html


And if (my understanding of) what you say is the case, this whole post is silly. See ?table to do exactly what you claim is wanted without trying to invent square wheels.

table() counts the elements, but the counts still have to be repeated in the pattern of the original vector.

For every element of a vector we want to know how many times it occurs in that vector. So if the vector is c("A","A","B","C","C","C") the output should be c(2,2,1,3,3,3). I'm sure we all know that table() will count the elements, but it doesn't place them in a vector as desired. I can do this with a character vector:

charvec <- c("A","A","B","C","C","C")
as.vector(table(charvec)[charvec])
[1] 2 2 1 3 3 3

It's slightly trickier with a vector of numbers, because the numbers index the table by position rather than by name:

intvec <- c(4,4,5,6,6,6)
table( intvec )[intvec]
intvec
<NA> <NA> <NA> <NA> <NA> <NA>
  NA   NA   NA   NA   NA   NA
as.vector(table( intvec )[as.character(intvec)])
[1] 2 2 1 3 3 3

So I think this will always work for vectors of either type:

as.vector(table( as.character(vec) )[as.character(vec)])

To me that looks like the right way to do it.  Think so?
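For reuse, the expression above can be wrapped in a small function. This is only a sketch: `count_occurrences` is a hypothetical name chosen for illustration, and it assumes the input has no missing values.

```r
# Hypothetical helper: for each element of vec, how often it occurs in vec.
count_occurrences <- function(vec) {
  key <- as.character(vec)      # index the table by name, not by position
  as.vector(table(key)[key])    # as.vector() drops the table attributes
}

count_occurrences(c("A", "A", "B", "C", "C", "C"))
# [1] 2 2 1 3 3 3
count_occurrences(c(4, 4, 5, 6, 6, 6))
# [1] 2 2 1 3 3 3
```

One caveat: table() drops NA by default, so NA elements will likely come back as NA rather than as a count.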

Best,
Mike


On Wed, Dec 24, 2014 at 11:30 AM, Mike Miller <mbmille...@gmail.com> wrote:
R 3.0.1 on Linux 64...

I was working with someone else's code.  They were using ave() in a way that
I guess is nonstandard:  Isn't FUN always supposed to be a variant of
mean()?  The idea was to count for every element of a factor vector how many
times the level of that element occurs in the factor vector.


gl() makes a factor:

gl(2,2,5)

[1] 1 1 2 2 1
Levels: 1 2


ave() applies FUN to produce the desired count, and it works:

ave( 1:5, gl(2,2,5), FUN=length )

[1] 3 3 2 2 3


The elements of the first vector are irrelevant because they are only
counted, so we should get the same result if it were a character vector, but
we don't:

ave( as.character(1:5), gl(2,2,5), FUN=length )

[1] "3" "3" "2" "2" "3"

The output has character type, but it is supposed to be a collection of
vector lengths.
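The character output is probably explained by how ave() builds its result. As far as I can tell from its source, it writes each group's FUN value back into a copy of x with a split() assignment, so the counts are coerced to the type of x. A small sketch of that mechanism, reusing the example above:

```r
x <- as.character(1:5)
g <- gl(2, 2, 5)

# The per-group results are plain integers...
counts <- lapply(split(x, g), length)
str(counts)

# ...but writing them back into the character vector coerces them.
split(x, g) <- counts
x
# [1] "3" "3" "2" "2" "3"

# Converting afterwards recovers the integer counts.
as.integer(x)
# [1] 3 3 2 2 3
```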


Two questions:

(1) Is that a bug in ave()?  It certainly is unexpected.

(2) What is the best way to do this sort of thing?

The truth is that we start with a character vector and we want to create an
integer vector that tells us for every element of the character vector how
many times that string occurs.  Here are two vectors of length 6 that should
give the same result:

intvec <- c(4,5,6,5,6,6)
charvec <- c("A","B","C","B","C","C")


The code was used like this with integer vectors and it seemed to work:

ave( intvec, intvec, FUN=length )

[1] 1 2 3 2 3 3

When a character vector came along, it would fail by producing a character
vector as output:

ave( charvec, charvec, FUN=length )

[1] "1" "2" "3" "2" "3" "3"

This seems more appropriate, and it might always work, but is it OK?

ave( rep(1, length(charvec)), as.factor(charvec), FUN=sum )

[1] 1 2 3 2 3 3

I suspect that ave() isn't the best choice, but what is the best way to do
this?
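One possible alternative that avoids ave() altogether is to combine match() and tabulate(), both base R. This is a sketch rather than a recommendation from the list, and `n_same` is a hypothetical name used only for illustration.

```r
charvec <- c("A", "B", "C", "B", "C", "C")
intvec  <- c(4, 5, 6, 5, 6, 6)

# Hypothetical helper: map each element to the index of its distinct value,
# count how often each index occurs, then expand back to the original order.
n_same <- function(vec) {
  idx <- match(vec, unique(vec))
  tabulate(idx)[idx]
}

n_same(charvec)
# [1] 1 2 3 2 3 3
n_same(intvec)
# [1] 1 2 3 2 3 3
```

The result is an integer vector whatever the type of the input, because the counting is done on positions rather than on the values themselves.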


Thanks in advance.

Mike

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

