Re: [R] Ordering long vectors
On Mon, 9 Jun 2003, Prof Brian Ripley wrote: On Sun, 8 Jun 2003, Göran Broström wrote: On Sun, 8 Jun 2003, Thomas Lumley wrote: On Sat, 7 Jun 2003, [ISO-8859-1] Göran Broström wrote: I need to order a long vector of integers with rather few unique values. This is very slow: I think the culprit is src/main/sort.c: orderVector1 /* Shell sort isn't stable, but it proves to be somewhat faster to run a final insertion sort to re-order runs of ties when comparison is cheap. */ This also explains: aa-sample(rep(1:10,5)) system.time( order(aa, 1:length(aa))) [1] 3.67 0.01 3.68 0.00 0.00 system.time( order(aa)) ^C Timing stopped at: 49.33 0.01 49.34 0 0 which is perhaps the simplest work-around :). There is a simple and very much faster solution if you don't care about ordering of ties: system.time(ord4 - sort(x, method=quick, index.return = TRUE)$ix) Thanks, it is indeed much faster; about 7 times on my example. That is in the help, and I am very surprised you failed to find it. Well, I found it, but I was cooled off by the somewhat faster than Shellsort and poor performance in the worst case, together with the fact that I only needed the order and not the sorted vector. Maybe a more positive description is in order :-). Especially on the help page of 'order'. G. __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] Ordering long vectors
On Sat, 7 Jun 2003, Göran Broström wrote: I need to order a long vector of integers with rather few unique values. This is very slow: x - sample(rep(c(1:10), 5)) system.time(ord - order(x)) [1] 189.18 0.09 190.48 0.00 0.00 But with no ties y - sample(50) system.time(ord1 - order(y)) [1] 1.18 0.00 1.18 0.00 0.00 it is very fast! This gave me the following idea: Since I don't care about keeping the order within tied values, why not add some small disturbance to x, and indeed, unix.time(ord2 - order(x + runif(length(x), -0.1, 0.1))) [1] 1.66 0.00 1.66 0.00 0.00 An even better way is system.time(ord3 - order(x + seq(0, 0.9, length = length(x [1] 1.32 0.05 1.37 0.00 0.00 Faster, but more important; it keeps the original ordering for tied values. Thanks to James Holtman. Göran [...] __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] Ordering long vectors
On Sun, 8 Jun 2003, [ISO-8859-1] Göran Broström wrote: On Sat, 7 Jun 2003, Göran Broström wrote: I need to order a long vector of integers with rather few unique values. This is very slow: x - sample(rep(c(1:10), 5)) system.time(ord - order(x)) [1] 189.18 0.09 190.48 0.00 0.00 But with no ties y - sample(50) system.time(ord1 - order(y)) [1] 1.18 0.00 1.18 0.00 0.00 it is very fast! This gave me the following idea: Since I don't care about keeping the order within tied values, why not add some small disturbance to x, Another option: system.time(a-sapply(sort(unique(x)),function(i) which(x==i))) This turns out to be slightly slower than your method, but doesn't require that you know what the smallest difference between values is (and works for characters as well as numbers) -thomas __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] Ordering long vectors
On Sat, 7 Jun 2003, [ISO-8859-1] Göran Broström wrote: I need to order a long vector of integers with rather few unique values. This is very slow: I think the culprit is src/main/sort.c: orderVector1 /* Shell sort isn't stable, but it proves to be somewhat faster to run a final insertion sort to re-order runs of ties when comparison is cheap. */ This also explains: aa-sample(rep(1:10,5)) system.time( order(aa, 1:length(aa))) [1] 3.67 0.01 3.68 0.00 0.00 system.time( order(aa)) ^C Timing stopped at: 49.33 0.01 49.34 0 0 which is perhaps the simplest work-around :). -thomas __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] Ordering long vectors
On Sun, 8 Jun 2003, Thomas Lumley wrote: On Sat, 7 Jun 2003, [ISO-8859-1] Göran Broström wrote: I need to order a long vector of integers with rather few unique values. This is very slow: I think the culprit is src/main/sort.c: orderVector1 /* Shell sort isn't stable, but it proves to be somewhat faster to run a final insertion sort to re-order runs of ties when comparison is cheap. */ This also explains: aa-sample(rep(1:10,5)) system.time( order(aa, 1:length(aa))) [1] 3.67 0.01 3.68 0.00 0.00 system.time( order(aa)) ^C Timing stopped at: 49.33 0.01 49.34 0 0 which is perhaps the simplest work-around :). Thanks. This is really surprising: it is *much* faster to break ties by a second condition than not breaking them. I think it should be mentioned in the help. And could 'order/sort' be modified to check for 'tieness'? But I guess the the overhead would be too heavy. (if (length(unique(x)) alpha * length(x)) then else ) Göran __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help