[R] Number of replications of a term
Hello,

Is there a simple and fast function that returns, for each element of a vector, the number of times that element occurs in the vector?

For example, I have a vector of IDs:

    ids <- c("ID1", "ID2", "ID2", "ID3", "ID3", "ID3", "ID5")

I want the function to return the following vector, where each term is the number of replicates of the corresponding ID:

    c(1, 2, 2, 3, 3, 3, 1)

Of course I have a vector of more than 40,000 IDs, and the function I wrote is really slow (30 minutes for 44290 terms). It orders my data and checks, on the ID:Name of the data, whether the next term is the same as the previous one (see below). But I don't have time right now to write a C function.

Thanks a lot for your help,
Laetitia.

Here is the function I have written; maybe I have done something that is not optimized:

    repVector <- function(obj){
      # order IDName
      ord <- gif.indexByIDName(obj)
      ordobj <- obj[ord,]
      nspots <- nrow(obj)
      # vector of spot replicates number
      spotrep <- rep(NA, nspots)
      # function to get ID:Name for a given spot
      spotidname <- function(ind){
        paste(ordobj$genes[ind, c("ID","Name")], collapse=":")
      }
      spot <- 1
      while( spot < nspots ){
        i <- 1
        while( spotidname(spot) == spotidname(spot + i) ){
          i <- i + 1
        }
        spotrep[spot : (spot + i-1)] <- i
        spot <- spot + i
        #cat("spot : ", spot, "\n")
      }
      obj$genes$spotrep <- spotrep[order(ord)]
      obj
    }

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Number of replications of a term
Laetitia Marisa wrote:
> Is there a simple and fast function that returns a vector of the number of replications for each object of a vector?

One-liner:

    table(ids)[ids]
    ids
    ID1 ID2 ID2 ID3 ID3 ID3 ID5
      1   2   2   3   3   3   1

'table(ids)' computes the counts, then the subscripting [ids] looks it all up.

Now try it on your 40,000-long vector!

Barry
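A minimal sketch of the lookup described above, using the example vector from the original post. The variable names `counts`, `per_element` and `reps` are illustrative only; the final `as.vector()` step is an optional addition that drops the names and the table class when a plain vector is wanted.

```r
ids <- c("ID1", "ID2", "ID2", "ID3", "ID3", "ID3", "ID5")

# One count per distinct ID; the result is named by ID
counts <- table(ids)

# A character subscript looks up each element's count by name,
# so the result has the same length and order as ids
per_element <- counts[ids]

# Optional: reduce to a plain integer vector
reps <- as.vector(per_element)
print(reps)
```

Because the lookup is by name, it does not depend on the IDs being sorted or grouped together.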
Re: [R] Number of replications of a term
Laetitia Marisa writes:
> Is there a simple and fast function that returns a vector of the number of replications for each object of a vector?

Will this do it?

    table(ids)[ids]
    ids
    ID1 ID2 ID2 ID3 ID3 ID3 ID5
      1   2   2   3   3   3   1

Or (could be faster):

    f <- factor(ids, levels=unique(ids))
    as.vector(table(f))[f]
    [1] 1 2 2 3 3 3 1

--
Peter Dalgaard
Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr.B, PO Box 2099, 1014 Cph. K, Denmark
Ph: (+45) 35327918, FAX: (+45) 35327907
Re: [R] Number of replications of a term
Laetitia Marisa wrote:
> I want the function returns the following vector where each term is the number of replicates for the given id : c( 1, 2, 2, 3,3,3,1 )

    rep(as.vector(table(ids)), as.vector(table(ids)))
    [1] 1 2 2 3 3 3 1

--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 452-1424 (M, W, F)
fax: (917) 438-0894
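A hedged note on the rep() approach above: table() reports its counts in sorted ID order, so repeating each count by itself lines up with the input only when the input is already sorted by ID. The sketch below compares it with the name lookup on a shuffled vector; the helper names `count_by_rep` and `count_by_lookup` and the shuffled data are made up for illustration.

```r
ids_sorted   <- c("ID1", "ID2", "ID2", "ID3", "ID3", "ID3", "ID5")
ids_shuffled <- c("ID3", "ID1", "ID3", "ID2", "ID5", "ID2", "ID3")

# Repeat each count by itself: output follows the sorted order of the IDs
count_by_rep <- function(x) {
  n <- as.vector(table(x))
  rep(n, n)
}

# Look each element up by name: output follows the order of the input
count_by_lookup <- function(x) as.vector(table(x)[x])

print(count_by_rep(ids_sorted))       # agrees with the lookup on sorted input
print(count_by_rep(ids_shuffled))     # counts no longer line up with the input
print(count_by_lookup(ids_shuffled))  # counts line up with the input
```

If the data are known to be sorted by ID, as in the original post's own function, the two give the same answer.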
Re: [R] Number of replications of a term
This should work:

    ids <- c("ID1", "ID2", "ID2", "ID3", "ID3", "ID3", "ID5")
    table(ids)[ids]
    ids
    ID1 ID2 ID2 ID3 ID3 ID3 ID5
      1   2   2   3   3   3   1

Andy
Re: [R] Number of replications of a term
Try this:

    ave(as.numeric(factor(ids)), ids, FUN = length)

See ?ave for more info.
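A small sketch of how the ave() call behaves, on a made-up shuffled vector. ave() splits its first argument into groups defined by the second, applies FUN to each group, and puts the result back in the original positions; with FUN = length every element receives the size of its group. The values of the first argument do not matter here, only its length.

```r
ids <- c("ID3", "ID1", "ID3", "ID2", "ID5", "ID2", "ID3")

# Each element is replaced by the size of the group it belongs to
reps <- ave(as.numeric(factor(ids)), ids, FUN = length)
print(reps)
```

The result is a numeric vector of the same length as ids, in the same order.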
Re: [R] Number of replications of a term
table()

-thomas

Thomas Lumley
Assoc. Professor, Biostatistics
University of Washington, Seattle
Re: [R] Number of replications of a term
?table

    ids <- c("ID1", "ID2", "ID2", "ID3", "ID3", "ID3", "ID5")
    x <- table(ids)
    x
    ids
    ID1 ID2 ID3 ID5
      1   2   3   1
    count <- x[ids]  # index using the names in the string
    count
    ids
    ID1 ID2 ID2 ID3 ID3 ID3 ID5
      1   2   2   3   3   3   1

--
Jim Holtman
Cincinnati, OH
+1 513 247 0281

What the problem you are trying to solve?
Re: [R] Number of replications of a term
Ah. It's a bit more complicated than just table(), because you want the result to be the same length.

    tt <- table(id)
    tt[match(id, names(tt))]

-thomas

Thomas Lumley
Assoc. Professor, Biostatistics
University of Washington, Seattle
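The match() version above can be sketched step by step as follows. The vector `id` is made-up example data, and `pos` and `reps` are illustrative names; `as.vector()` is an optional extra step that strips the names.

```r
id <- c("ID3", "ID1", "ID3", "ID2", "ID5", "ID2", "ID3")

tt <- table(id)                 # counts for ID1, ID2, ID3, ID5

# For every element of id, the position of its ID among the table's names
pos <- match(id, names(tt))
print(pos)

# Integer subscripting then returns one count per element of id
reps <- as.vector(tt[pos])
print(reps)
```

This gives the same result as subscripting the table by the character vector directly, with the lookup from names to positions made explicit.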
Re: [R] Number of replications of a term
It is great! With my big data it now takes less than 1 second with the table function (0.34 s) and about 2 seconds with the ave function (1.91 s), and only two lines of code ;).

Thanks a lot, everyone.

Regards,
Laetitia.
Re: [R] Number of replications of a term
Nice. I timed it and it's much faster than mine too.

On 1/24/06, Barry Rowlingson wrote:
> One-liner:
> table(ids)[ids]
Re: [R] Number of replications of a term
There's an even faster one, which nobody seems to have mentioned yet:

    rep(l <- rle(ids)$lengths, l)

Timing on my 2.8GHz NetBSD system shows:

    length(ids)
    [1] 45150
    # Gabor:
    system.time(for (i in 1:100) ave(as.numeric(factor(ids)), ids, FUN = length))
    [1] 3.45 0.06 3.54 0.00 0.00
    # Barry (and others I think):
    system.time(for (i in 1:100) table(ids)[ids])
    [1] 2.13 0.05 2.20 0.00 0.00
    # Me:
    system.time(for (i in 1:100) rep(l <- rle(ids)$lengths, l))
    [1] 1.60 0.00 1.62 0.00 0.00

Of course the difference between 21 milliseconds and 16 milliseconds is not great, unless you are doing this a lot.

Ray Brownrigg
Re: [R] Number of replications of a term
Note that that assumes that all occurrences of a value are contiguous.

On 1/24/06, Ray Brownrigg wrote:
> There's an even faster one, which nobody seems to have mentioned yet:
> rep(l <- rle(ids)$lengths, l)
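The contiguity assumption can be illustrated with a short sketch. rle() measures runs of adjacent equal values, so an ID that appears in two separate places is counted as two separate runs. The helper `run_counts` and the `scattered` vector are made up for this example.

```r
contiguous <- c("ID1", "ID2", "ID2", "ID3", "ID3", "ID3", "ID5")
scattered  <- c("ID2", "ID1", "ID2")   # ID2 occurs twice, but not adjacently

# Repeat each run length once per element of that run
run_counts <- function(x) {
  len <- rle(x)$lengths
  rep(len, len)
}

print(run_counts(contiguous))                   # 1 2 2 3 3 3 1
print(run_counts(scattered))                    # 1 1 1: each ID2 is its own run
print(as.vector(table(scattered)[scattered]))   # 2 1 2: total occurrences
```

When the IDs are grouped together, both approaches agree; otherwise only the table() lookup counts total occurrences.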
Re: [R] Number of replications of a term
On Wed, 25 Jan 2006, Ray Brownrigg wrote:
> There's an even faster one, which nobody seems to have mentioned yet:
> rep(l <- rle(ids)$lengths, l)

I considered this, but it wasn't clear to me from the initial post that each ID occupied a contiguous section of the vector.

Also, lazy evaluation makes code like this

    rep(l <- rle(ids)$lengths, l)

a bit worrying. It relies on rep() using the first argument before it uses the second one. In this case, clearly, it works, but it is not a style I would encourage and it's easy to construct functions where it fails.

-thomas

Thomas Lumley
Assoc. Professor, Biostatistics
University of Washington, Seattle
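One way such a failure might be constructed, as a sketch only: the function `times_first` below is hypothetical and exists purely to force its second argument before its first. The assignment inside the first argument has then not run yet when the second argument is evaluated. The sketch assumes no object called `run_len` exists beforehand.

```r
ids <- c("ID1", "ID2", "ID2", "ID3", "ID3", "ID3", "ID5")

# Hypothetical function that happens to use `times` before `x`
times_first <- function(x, times) {
  times <- times + 0L   # forces the second argument first
  rep(x, times)
}

# Same calling style as rep(l <- rle(ids)$lengths, l): here `run_len`
# is looked up before the assignment in the first argument has happened
res <- tryCatch(
  times_first(run_len <- rle(ids)$lengths, run_len),
  error = function(e) "error"
)
print(res)

# Assigning first and calling second does not depend on evaluation order
run_len <- rle(ids)$lengths
safe <- times_first(run_len, run_len)
print(safe)
```

Had an older `run_len` already existed, the first call would not have failed but would have quietly used the stale value, which is arguably worse.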