Tim,

My comments were not directed at you but at any notion that R should honor 
any conditions it does not pledge to honor. You pointed out a clear example 
where R does not give identical results when you use two things that both look 
like sorting algorithms. Yes, Stefan asked the original question. You provided 
some additional ideas and observations.

As I pointed out, there are many sorting algorithms commonly in use and they 
generally do not care about order in the sense expected by some when multiple 
items match. Neither do most algorithms of the sort. We discussed earlier the 
possibility of identifying if a data item was either all-numeric or all 
alphabetic or OTHER as in having both or neither or perhaps other characters 
not wanted like a comma. If the algorithm were to return a list of three lists, 
would you necessarily expect them in the right order of the way they were 
encountered, or sorted forward (or in reverse) alphabetically or using the 
current locale or numerically, or even randomly? 
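
To make that question concrete, here is a minimal base R sketch of such a classifier. The sample items, the three category labels and the variable names are all made up for illustration. The point is only that the order of the returned groups follows whatever the function documents (here, factor levels), not the order in which categories were first encountered.

```r
# Hypothetical sample items; the labels "numeric", "alphabetic", "other" are my own.
items <- c("abc", "123", "a1", "xyz", "4,5", "42")

# Classify each item as all digits, all letters, or anything else.
kind <- ifelse(grepl("^[0-9]+$", items), "numeric",
        ifelse(grepl("^[A-Za-z]+$", items), "alphabetic", "other"))

# split() returns the groups in factor-level order. For a character vector
# the levels are the sorted labels, so "alphabetic" comes first even though
# it is not first in any sense I chose.
groups <- split(items, kind)
names(groups)

# If a particular group order matters, state it explicitly.
groups2 <- split(items, factor(kind, levels = c("numeric", "alphabetic", "other")))
names(groups2)
```

Within each group the items stay in the order they appeared in the input, but that too is something to check in the documentation rather than assume.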

I can well imagine a parallel algorithm that hands off one or a subset of data 
to a thread and also other subsets to other threads and then monitors some kind 
of communication where the threads send reply messages about one item at a 
time. The results may be interleaved many ways. Whatever comes back should 
be treated as an unordered collection and if you want it ordered, do it yourself 
after you get it.

I took a look at the R code for order() and the manual page. I note you can 
call it with "method=" followed by one of "radix", "shell" and "quick" but it 
is a bit complex to read through and the major work is done in an internal 
routine probably written in C or something. But this is written about it:

"The sort used is stable (except for method = "quick"), so any unresolved ties 
will be left in their original ordering."

That suggests that the default and some other cases will return tied items in 
their original order but the method called "quick" may sometimes return them in 
a different order. However, my version did not allow me to use quick! 
If you look at the manual page for sort() it too allows you to specify a method 
but note amusingly that for the default method it calls order() for part of the 
job for some kinds of objects.

I made a suggestion earlier for anyone wanting a particular result, and I 
simplify it here. Instead of calling order(Dat1) alone, add a second argument 
like order(Dat1, second) that can be 1:N where N is the length of Dat1, or the 
reverse, or anything you want to use that will break a tie. Here is some code 
showing that:

# Initialize sample data where 0.1 and 0.2 are each duplicated.
> Dat1 <- c(0.6, 0.5, 0.3, 0.2, 0.1, 0.1, 0.2)

# Default output from order:
> order(Dat1)
[1] 5 6 4 7 3 2 1

# Call order with a second vector of c(1,2,3,4,5,6,7)
> order(Dat1, seq_along(Dat1))
[1] 5 6 4 7 3 2 1

# Call order with a second vector of c(7,6,5,4,3,2,1) meaning reversed
> order(Dat1, length(Dat1):1)
[1] 6 5 7 4 3 2 1

I suspect one of these gives the requester the control to get what they want.

Again, I end by saying no insult intended. Some things in computer science can 
provide guarantees of working exactly a certain way and others do not. But 
often you can find ways, including some more complex and annoying ones. I am 
used to doing many things in the tidyverse where I would use the arrange() verb 
on a data.frame naturally to do a sort on multiple columns BUT this forum has 
people who discourage the tidyverse. 

library(tidyverse)

# arrange() sorts only by the columns named after the data.frame;
# with none named it returns the rows unchanged.
> arrange(data.frame("először"=Dat1, "másodikat"=7:1), először, másodikat)
  először másodikat
1     0.1         2
2     0.1         3
3     0.2         1
4     0.2         4
5     0.3         5
6     0.5         6
7     0.6         7
> arrange(data.frame("először"=Dat1, "másodikat"=7:1), először, másodikat)$`először`
[1] 0.1 0.1 0.2 0.2 0.3 0.5 0.6
> arrange(data.frame("először"=Dat1, "másodikat"=7:1), először, másodikat)[1]
  először
1     0.1
2     0.1
3     0.2
4     0.2
5     0.3
6     0.5
7     0.6
> arrange(data.frame("először"=Dat1, "másodikat"=7:1), először, másodikat)[[1]]
[1] 0.1 0.1 0.2 0.2 0.3 0.5 0.6

Of course for this simple a need, definitely overkill and I would stick with 
base R. LOL!




-----Original Message-----
From: Ebert,Timothy Aaron <teb...@ufl.edu>
To: Avi Gross <avigr...@verizon.net>; maech...@stat.math.ethz.ch 
<maech...@stat.math.ethz.ch>; stefan.b.fl...@gmail.com 
<stefan.b.fl...@gmail.com>
Cc: r-help@r-project.org <r-help@r-project.org>
Sent: Mon, Jan 31, 2022 3:36 pm
Subject: RE: [R]  [External] Weird behaviour of order() when having multiple 
ties



Dear Avi,
 
I made no comment or question or statement about EQUAL items. That was 
elsewhere in posts by you and others in this thread.
My intent was only to show a simple example of Stefan’s post wherein the 
outcomes of sort() and order() are different. Not why, or how, just that they 
are different. If you type order() when you mean sort() things will not work as 
expected, as shown below.
Yes, both are right due to the commutative law of multiplication. I don’t see 
the point.
 
  
 
I am suggesting nothing! I am simply observing the behavior of a function. If 
it satisfies my need then great. If not, I need to write my own or find a 
different function. It does help if I clearly understand the output of the 
function, and sometimes the documentation is not as helpful as hoped for given 
the range of readers from novice to expert. Here is the data in a different 
format where order=1 means it is the first observation in the data.
 
Dat1
Order   1    2    3    4    5    6    7
Data    0.6  0.5  0.3  0.2  0.1  0.1  0.2
 
  
 
print(order(Dat1))  returns [1] 5 6 4 7 3 2 1 
 
  
 
So I sort the raw data by “Data” so that the values of order remain with each 
observed data point.
 
Original
Order   5    6    4    7    3    2    1
Data    0.1  0.1  0.2  0.2  0.3  0.5  0.6
 
  
 
Now reading off the values in the row named “Order” I get the result of 
print(order(Dat1)).
 
  
 
Order does not return the sorted data, it returns the location of the sorted 
value in the original dataset. At least that is what it looks like. I assume 
that this is what the documentation means by “ ’order’ returns a permutation 
which rearranges its first argument into ascending or descending order” but I 
am afraid that I still do not get that connection from the text provided: 
https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/order.
 
  
 
As far as I can tell there is no error or inconsistency.
 
  
 
I am not quite skilled enough in R to take DatSort  <- order(Dat1) and then 
return the sorted data, but I need to move to other tasks.
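
For anyone following along, one way to get from the permutation to the sorted values is to use it as an index into the original vector. A minimal base R sketch, reusing the Dat1 and DatSort names from this thread:

```r
Dat1 <- c(0.6, 0.5, 0.3, 0.2, 0.1, 0.1, 0.2)

# order() gives positions in the original vector, smallest value first.
DatSort <- order(Dat1)

# Indexing with those positions rearranges the data into ascending order.
sorted <- Dat1[DatSort]
sorted

# For a plain numeric vector like this one, that matches sort().
identical(sorted, sort(Dat1))
```

That is the sense in which order() "returns a permutation which rearranges its first argument".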
 
  
 
I am really sorry that my post makes you mad.
 
  
 
Regards,
 
Tim
 
  
 

From: Avi Gross <avigr...@verizon.net> 
Sent: Monday, January 31, 2022 12:33 PM
To: Ebert,Timothy Aaron <teb...@ufl.edu>; maech...@stat.math.ethz.ch; 
stefan.b.fl...@gmail.com
Cc: r-help@r-project.org
Subject: Re: [R] [External] Weird behaviour of order() when having multiple ties
 

  
 
 

Tim,
 
  
 

I thought I saw someone tell you that the order in which EQUAL items are 
presented is not deterministic in that any order, whether given by order() or 
sort() or anything else, is VALID.
 

  
 

That holds unless you supply additional constraints such as a second key to 
sort by; then the order becomes deterministic up to the point where both keys 
are the same.
 

  
 

Here is a dumb suggestion. Place your Dat1 vector in a data.frame alongside 
another vector of 1:length(Dat1) and use some method that orders by Dat1 and 
then by the second vector, ascending. You have now forced it to take the first 
of a matching set before any others. 
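
A minimal base R sketch of that suggestion follows. The data.frame name df and the column names value and pos are hypothetical, chosen only for illustration.

```r
Dat1 <- c(0.6, 0.5, 0.3, 0.2, 0.1, 0.1, 0.2)

# Keep each value next to its original position (hypothetical column names).
df <- data.frame(value = Dat1, pos = seq_along(Dat1))

# Order by value first, then by original position to break ties.
sorted <- df[order(df$value, df$pos), ]
sorted$pos
```

With the second key ascending, the earlier of two tied values always comes first; use a descending key instead if the later one should win.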
 

  
 

Let me try another. Say I give you a problem that might have multiple answers 
such as a quadratic equation with solutions of 2 and 10. You ask me for AN 
answer and I say 10. Am I wrong? You ask me for all answers and I say [10,2] 
and someone else says [2,10] and you wonder which of us is right. Well we are 
both right. The proper way to test is not to ask if the lists or tuples or 
anything ordered is equivalent but to use something like a set and show that 
they are equivalent or something like one is a subset of the other both ways. 
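
In base R that kind of order-free comparison might look like the sketch below. The variable names a and b are just placeholders for the two answers.

```r
a <- c(10, 2)
b <- c(2, 10)

# An ordered comparison says the two answers differ.
identical(a, b)

# Treating them as sets says they are the same.
setequal(a, b)

# The "subset both ways" version of the same test.
all(a %in% b) && all(b %in% a)
```

Note that a set comparison ignores repeated values, so it suits distinct solutions like these better than data with meaningful duplicates.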
 

  
 

Back to your topic, you are suggesting two independent developers should come 
up with algorithms to solve similar but different tasks the same way. Do you 
have any idea how many methods there are for sorting things? This site lists 
eleven and I am sure there are many more.
 

  
 

https://www.javatpoint.com/sorting-algorithms
 

  
 

What is considered more important is choosing an algorithm that works well on 
the kinds of data you have, and some of those methods do not keep tied items in 
their original order or produce results in the same order as each other.
 

  
 

What you are pointing out is not an error but an inconsistency. The message to 
you is to not depend on a UNIQUE solution.
 




 
-----Original Message-----
From: Ebert,Timothy Aaron <teb...@ufl.edu>
To: Martin Maechler <maech...@stat.math.ethz.ch>; Stefan Fleck 
<stefan.b.fl...@gmail.com>
Cc: r-help@r-project.org <r-help@r-project.org>
Sent: Mon, Jan 31, 2022 10:07 am
Subject: Re: [R] [External] Weird behaviour of order() when having multiple ties
 
Dat1 <- c(0.6, 0.5, 0.3, 0.2, 0.1, 0.1, 0.2)
print(order(Dat1))
print(sort(Dat1))

Compare output



-----Original Message-----
From: R-help <r-help-boun...@r-project.org> On Behalf Of Martin Maechler
Sent: Monday, January 31, 2022 9:04 AM
To: Stefan Fleck <stefan.b.fl...@gmail.com>
Cc: r-help@r-project.org
Subject: Re: [R] [External] Weird behaviour of order() when having multiple ties


>>>>> Stefan Fleck
>>>>>    on Sun, 30 Jan 2022 21:07:19 +0100 writes:

    > it's not about the sort order of the ties, shouldn't all the 1s in
    > order(c(2,3,4,1,1,1,1,1)) come before 2,3,4? because that's not what
    > happening

aaah.. now we are getting somewhere:
It looks like you have always confused order() with sort() ...
have you ?
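
A short base R sketch of the distinction, using the vector from the quoted message below. The variable name x is a placeholder, and the use of rank() is my guess at what was expected rather than something stated in the thread.

```r
x <- c(2, 3, 4, 1, 1, 1, 1, 1)

# order(): which positions of x to read, in sequence, to get ascending values.
# The five 1s sit at positions 4 to 8, so those positions are listed first.
o <- order(x)
o

# Indexing x by that permutation gives the sorted values, as sort() does.
x[o]

# rank(): where each element would land after sorting. This is probably the
# output that was expected, with the small values getting the small numbers.
r <- rank(x, ties.method = "min")
r
```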


    > On Sun, Jan 30, 2022 at 9:00 PM Richard M. Heiberger <r...@temple.edu> wrote:

    >> when there are ties it doesn't matter which is first.
    >> in a situation where it does matter, you will need a tiebreaker column.
    >> ------------------------------
    >> *From:* R-help <r-help-boun...@r-project.org> on behalf of Stefan Fleck <
    >> stefan.b.fl...@gmail.com>
    >> *Sent:* Sunday, January 30, 2022 4:16:44 AM
    >> *To:* r-help@r-project.org <r-help@r-project.org>
    >> *Subject:* [External] [R] Weird behaviour of order() when having multiple
    >> ties
    >>
    >> I am experiencing a weird behavior of `order()` for numeric vectors. I
    >> tested on 3.6.2 and 4.1.2 for windows and R 4.0.2 on ubuntu. Can anyone
    >> confirm?
    >>
    >> order(
    >> c(
    >> 0.6,
    >> 0.5,
    >> 0.3,
    >> 0.2,
    >> 0.1,
    >> 0.1
    >> )
    >> )
    >> ## Result [should be in order]
    >> [1] 5 6 4 3 2 1
    >>
    >> The sort order is obviously wrong. This only occurs if i have multiple
    >> ties. The problem does _not_ occur for decreasing = TRUE.
    > 

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.