Re: [R] Ordering long vectors

2003-06-09 Thread Göran Broström
On Mon, 9 Jun 2003, Prof Brian Ripley wrote:

 On Sun, 8 Jun 2003, Göran Broström wrote:
 
  On Sun, 8 Jun 2003, Thomas Lumley wrote:
  
   On Sat, 7 Jun 2003, [ISO-8859-1] Göran Broström wrote:
   
   
I need to order a long vector of integers with rather few unique values.
This is very slow:
   
   
   I think the culprit is
   
   src/main/sort.c: orderVector1
   
   /* Shell sort isn't stable, but it proves to be somewhat faster
  to run a final insertion sort to re-order runs of ties when
  comparison is cheap.
   */
   
   This also explains:
   
aa-sample(rep(1:10,5))
system.time( order(aa, 1:length(aa)))
   [1] 3.67 0.01 3.68 0.00 0.00
system.time( order(aa))
   ^C
   Timing stopped at: 49.33 0.01 49.34 0 0
   
   which is perhaps the simplest work-around :).
  
 
 There is a simple and very much faster solution if you don't care about 
 ordering of ties:
 
 system.time(ord4 - sort(x, method=quick, index.return = TRUE)$ix)

Thanks, it is indeed much faster; about 7 times on my example. 

 That is in the help, and I am very surprised you failed to find it.

Well, I found it, but I was cooled off by the somewhat faster than 
Shellsort and poor performance in the worst case, together with the 
fact that I only needed the order and not the sorted vector. Maybe a more 
positive description is in order :-). Especially on the help page of 'order'.

G.

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] Ordering long vectors

2003-06-08 Thread Göran Broström
On Sat, 7 Jun 2003, Göran Broström wrote:

 
 I need to order a long vector of integers with rather few unique values.
 This is very slow:
 
  x - sample(rep(c(1:10), 5))
  system.time(ord - order(x))
 [1] 189.18   0.09 190.48   0.00   0.00
 
 But with no ties
 
  y - sample(50)
  system.time(ord1 - order(y))
 [1] 1.18 0.00 1.18 0.00 0.00
 
 it is very fast!
 This gave me the following idea: Since I don't care about keeping the 
 order within tied values, why not add some small disturbance to  x,
 and indeed,
 
  unix.time(ord2 - order(x + runif(length(x), -0.1, 0.1)))
 [1] 1.66 0.00 1.66 0.00 0.00

An even better way is 

 system.time(ord3 - order(x + seq(0, 0.9, length = length(x 
[1] 1.32 0.05 1.37 0.00 0.00

Faster, but more important; it keeps the original ordering for tied 
values. Thanks to James Holtman.

Göran
[...]

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] Ordering long vectors

2003-06-08 Thread Thomas Lumley

On Sun, 8 Jun 2003, [ISO-8859-1] Göran Broström wrote:

 On Sat, 7 Jun 2003, Göran Broström wrote:

 
  I need to order a long vector of integers with rather few unique values.
  This is very slow:
 
   x - sample(rep(c(1:10), 5))
   system.time(ord - order(x))
  [1] 189.18   0.09 190.48   0.00   0.00
 
  But with no ties
 
   y - sample(50)
   system.time(ord1 - order(y))
  [1] 1.18 0.00 1.18 0.00 0.00
 
  it is very fast!
  This gave me the following idea: Since I don't care about keeping the
  order within tied values, why not add some small disturbance to  x,

Another option:

 system.time(a-sapply(sort(unique(x)),function(i) which(x==i)))

This turns out to be slightly slower than your method, but doesn't require
that you know what the smallest difference between values is (and works
for characters as well as numbers)

-thomas

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] Ordering long vectors

2003-06-08 Thread Thomas Lumley
On Sat, 7 Jun 2003, [ISO-8859-1] Göran Broström wrote:


 I need to order a long vector of integers with rather few unique values.
 This is very slow:


I think the culprit is

src/main/sort.c: orderVector1

/* Shell sort isn't stable, but it proves to be somewhat faster
   to run a final insertion sort to re-order runs of ties when
   comparison is cheap.
*/

This also explains:

 aa-sample(rep(1:10,5))
 system.time( order(aa, 1:length(aa)))
[1] 3.67 0.01 3.68 0.00 0.00
 system.time( order(aa))
^C
Timing stopped at: 49.33 0.01 49.34 0 0

which is perhaps the simplest work-around :).


-thomas

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] Ordering long vectors

2003-06-08 Thread Göran Broström
On Sun, 8 Jun 2003, Thomas Lumley wrote:

 On Sat, 7 Jun 2003, [ISO-8859-1] Göran Broström wrote:
 
 
  I need to order a long vector of integers with rather few unique values.
  This is very slow:
 
 
 I think the culprit is
 
 src/main/sort.c: orderVector1
 
 /* Shell sort isn't stable, but it proves to be somewhat faster
to run a final insertion sort to re-order runs of ties when
comparison is cheap.
 */
 
 This also explains:
 
  aa-sample(rep(1:10,5))
  system.time( order(aa, 1:length(aa)))
 [1] 3.67 0.01 3.68 0.00 0.00
  system.time( order(aa))
 ^C
 Timing stopped at: 49.33 0.01 49.34 0 0
 
 which is perhaps the simplest work-around :).

Thanks. This is really surprising: it is *much* faster to break ties by a 
second condition than not breaking them. I think it should be mentioned 
in the help. And could 'order/sort' be modified to check for 'tieness'? 
But I guess the the overhead would be too heavy.

(if (length(unique(x))  alpha * length(x)) then  else )

Göran

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help