Hello! data.table diverged over 7 years ago from this code when Matt
rewrote forder.

> This code was originally contributed by data.table. I believe Michael
> Lawrence handled the integration at the time. There were a number of
> issues like this early on that were resolved on the R side and I
> believe contributed back to data.table. If you have the energy it
> might be good to compare the two now and see if there are things that
> should be ported from one to the other.

The original code had the following note which is also mentioned
multiple times but not explicitly part of radixsort.c

>> Note on challenges implementing `nalast=NA` so that things are under the 
>> same perspective when looked at later. To go to the point, there are four 
>> challenges in implementing `nalast=NA` with the current form of forder.c 
>> (which is excellent!):
>>
>>   1) Handling the i|d|csorted routines for nalast=NA:
>>      The current implementation of nalast=NA in *sorted routine is that if 
>> all values are NAs, then it returns -2, a different value, so that all 
>> values of o/osub can be replacedw with 0. If any values are NA, then we 
>> assume that this vector is unsorted and return 0 to be taken care of by the 
>> corresponding sort routine. Now, this may very well be questioned/improved. 
>> For ex: we could detect if the vector is sorted and also if the first value 
>> is NA, and if so, get the group size of these NAs and replace that many 
>> indices with 0. This is not yet done, but could very well be implemented 
>> later. If there are no NAs in the vector, then things are the same.
>>
>>   2) Handling cases where n==1 and n==2:
>>      The current sorting routines don't take care of n<=2, and rightly so, 
>> as they can be dealt with directly. But it's not the case for nalast=NA 
>> because, when n==2 and all are NAs, point 1 will handle it. But when n==1 
>> and it is NA and the column is not the first column to order upon, with the 
>> current implementation, "thisgrpn" is checked for "== 1" and if so, just 
>> pushes that group size and continues.. But we've to replace it with 0 in our 
>> case. So, nalast==0 is checked for inside that if statement and if so, if 
>> the value is NA is checked for all types and if so, replaced with 0. For 
>> n==2 and "any" is NA, we've to introduce a special case for this in each 
>> sort routine to take care of and not result in an error (which tells this 
>> should be already taken care of in *sorted).
>>
>>   3) o[0] and newo[0] = -1:
>>      Since nalast already populates NAs with 0 values in o, we can't use the 
>> old value of 0 to check if it's already populated or not. Therefore this has 
>> been changed to -1.
>>
>>   4) default cases are untouched as much as possible:
>>      In order to not compromise in speed for default case (`setkey`), 
>> iinsert and dinsert are not touched at all, which means, we've to make sure 
>> that they are not run when n<200 and nalast==0, as they don't replace 
>> o[x=NA] with 0. And, 'icount', 'iradix', 'dradix' are modified to take care 
>> of nalast=NA, to my knowledge.

However, case 2) does not cater for Ivans identified corner case
example of order(NA_character_, "c", method = "radix", na.last = NA),
which is not tied to n==1 or n==2 but rather to thisgrpn==1 and
x=c(NA_character_, another_string).

Hence, the missing case is indeed to cater for o[i] = 0 with Ivans
1-liner or more explicitly via

--- src/main/radixsort.c
@@ -1766,7 +1766,12 @@
                    // this edge case had to be taken care of
                    // here.. (see the bottom of this file for
                    // more explanation)
-                   switch (TYPEOF(x)) {
+            if (o[i] == 0) {
+                       isSorted = false;
+                       i++;
+                       push(1);
+                       continue;
+                   }               switch (TYPEOF(x)) {
                    case INTSXP:
                        if (INTEGER(x)[o[i] - 1] == NA_INTEGER) {
                            isSorted = false;

Best, Ben

        [[alternative HTML version deleted]]

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply via email to