Re: [R] Mixed sorting/ordering of strings acknowledging roman numerals?
Thank you David - it took me awhile to get back to this and dig into it. It's clever to imitate gtools::mixedorder() as far as possible. A few comments: 1. It took me a while to understand why you picked 3899 in your Roman-to-integer table; it's because roman(x) is NA for x 3899. (BTW, in 'utils', there's utils:::.roman2numeric() which could be utilized, but it's currently internal.) 2. I think you forgot D=500 and M=1000. 3. There was a typo in your code; I think you meant rank.roman instead of rank.numeric in one place. 4. The idea behind nonnumeric() is to identify non-numeric substrings by is.na(as.numeric()). Unfortunately, for romans that does not work. Instead, we need to use is.na(numeric(x)) here, i.e. nonnumeric - function(x) { suppressWarnings(ifelse(is.na(numeric(x)), toupper(x), NA)) } Actually, gtools::mixedorder() could use the same. 5. I undid your .numeric to .roman to minimize any differences to gtools::mixedorder(). With the above fixes, we now have: mixedorderRoman - function (x) { if (length(x) 1) return(NULL) else if (length(x) == 1) return(1) if (is.numeric(x)) return(order(x)) delim = \\$\\@\\$ # NOTE: Note that as.roman(x) is NA for x 3899 romanC - as.character( as.roman(1:3899) ) numeric - function(x) { suppressWarnings(match(x, romanC)) } nonnumeric - function(x) { suppressWarnings(ifelse(is.na(numeric(x)), toupper(x), NA)) } x - as.character(x) which.nas - which(is.na(x)) which.blanks - which(x == ) if (length(which.blanks) 0) x[which.blanks] - -Inf if (length(which.nas) 0) x[which.nas] - Inf delimited - gsub(([IVXCLM]+), paste(delim, \\1, delim, sep = ), x) step1 - strsplit(delimited, delim) step1 - lapply(step1, function(x) x[x ]) step1.numeric - lapply(step1, numeric) step1.character - lapply(step1, nonnumeric) maxelem - max(sapply(step1, length)) step1.numeric.t - lapply(1:maxelem, function(i) sapply(step1.numeric, function(x) x[i])) step1.character.t - lapply(1:maxelem, function(i) sapply(step1.character, function(x) x[i])) rank.numeric - sapply(step1.numeric.t, rank) rank.character - sapply(step1.character.t, function(x) as.numeric(factor(x))) rank.numeric[!is.na(rank.character)] - 0 rank.character - t(t(rank.character) + apply(matrix(rank.numeric), 2, max, na.rm = TRUE)) rank.overall - ifelse(is.na(rank.character), rank.numeric, rank.character) order.frame - as.data.frame(rank.overall) if (length(which.nas) 0) order.frame[which.nas, ] - Inf retval - do.call(order, order.frame) return(retval) } The difference to gtools::mixedorder() is minimal: romanC - as.character( as.roman(1:3899) ) 21c11 suppressWarnings(match(x, romanC)) --- suppressWarnings(as.numeric(x)) 24c14 suppressWarnings(ifelse(is.na(numeric(x)), toupper(x), --- suppressWarnings(ifelse(is.na(as.numeric(x)), toupper(x), 34c24 delimited - gsub(([IVXCLDM]+), --- delimited - gsub(([+-]{0,1}[0-9]+\\.{0,1}[0-9]*([eE][\\+\\-]{0,1}[0-9]+\\.{0,1}[0-9]*){0,1}), 59,62d48 This difference is so small that the above could now be an option to mixedorder() with minimal overhead added, e.g. mixedorder(y, type=c(decimal, roman)). One could even imagine adding support for binary, octal and hexadecimal (not done). Greg (maintainer of gtools; cc:ed), is this something you would consider adding to gtools? I've modified the gtools source code available on CRAN (that's the only source I found), added package tests, updated the Rd and verified it passes R CMD check. If interested, please find the updates at: https://github.com/HenrikBengtsson/gtools/compare/cran:master...master Thanks Henrik On Tue, Aug 26, 2014 at 6:46 PM, David Winsemius dwinsem...@comcast.net wrote: On Aug 26, 2014, at 5:24 PM, Henrik Bengtsson wrote: Hi, does anyone know of an implementation/function that sorts strings that *contain* roman numerals (I, II, III, IV, V, ...) which are treated as numbers. In 'gtools' there is mixedsort() which does this for strings that contains (decimal) numbers. I'm looking for a mixedsortroman() function that does the same but with roman numbers, e.g. It's pretty easy to sort something you know to be congruent with the existing roman class: romanC - as.character( as.roman(1:3899) ) match(c(I, II, III,X,V), romanC) #[1] 1 2 3 10 5 But I guess you already know that, so you want a regex approach to parsing. Looking at the path taken by Warnes, it would involve doing something like his regex based insertion of a delimiter for Roman numeral but simpler because he needed to deal with decimal points and signs and exponent notation, none of which you appear to need. If you only need to consider character and Roman, then this hack of Warnes tools succeeds:
Re: [R] Mixed sorting/ordering of strings acknowledging roman numerals?
On Sep 7, 2014, at 7:40 PM, Henrik Bengtsson wrote: Thank you David - it took me awhile to get back to this and dig into it. It's clever to imitate gtools::mixedorder() as far as possible. A few comments: 1. It took me a while to understand why you picked 3899 in your Roman-to-integer table; it's because roman(x) is NA for x 3899. (BTW, in 'utils', there's utils:::.roman2numeric() which could be utilized, but it's currently internal.) Yes, that was the reason. I didn't think I needed a Roman-to-numeric function because I discovered the roman numbers were actually simple numeric vectors to which a class had been assigned and that it was the class-facilities that did all the work. The standard Ops functions were just acting on numeric vectors. If one doesn't take care, their romanity can be lost: R - as.roman(10^(0:4)) R [1] IXCMNA unclass(R) [1]1 10 100 1000 NA sum(R, na.rm=TRUE) [1] as.roman(sum(R, na.rm=TRUE)) [1] MCXI 2. I think you forgot D=500 and M=1000. Quite possible. I suspect Greg will have corrected the omission, but if not, this will be helpful to him. 3. There was a typo in your code; I think you meant rank.roman instead of rank.numeric in one place. I understood Greg's intention to wrap this into the mixedorder and mixed sort duo. Best; David. 4. The idea behind nonnumeric() is to identify non-numeric substrings by is.na(as.numeric()). Unfortunately, for romans that does not work. Instead, we need to use is.na(numeric(x)) here, i.e. nonnumeric - function(x) { suppressWarnings(ifelse(is.na(numeric(x)), toupper(x), NA)) } Actually, gtools::mixedorder() could use the same. 5. I undid your .numeric to .roman to minimize any differences to gtools::mixedorder(). With the above fixes, we now have: mixedorderRoman - function (x) { if (length(x) 1) return(NULL) else if (length(x) == 1) return(1) if (is.numeric(x)) return(order(x)) delim = \\$\\@\\$ # NOTE: Note that as.roman(x) is NA for x 3899 romanC - as.character( as.roman(1:3899) ) numeric - function(x) { suppressWarnings(match(x, romanC)) } nonnumeric - function(x) { suppressWarnings(ifelse(is.na(numeric(x)), toupper(x), NA)) } x - as.character(x) which.nas - which(is.na(x)) which.blanks - which(x == ) if (length(which.blanks) 0) x[which.blanks] - -Inf if (length(which.nas) 0) x[which.nas] - Inf delimited - gsub(([IVXCLM]+), paste(delim, \\1, delim, sep = ), x) step1 - strsplit(delimited, delim) step1 - lapply(step1, function(x) x[x ]) step1.numeric - lapply(step1, numeric) step1.character - lapply(step1, nonnumeric) maxelem - max(sapply(step1, length)) step1.numeric.t - lapply(1:maxelem, function(i) sapply(step1.numeric, function(x) x[i])) step1.character.t - lapply(1:maxelem, function(i) sapply(step1.character, function(x) x[i])) rank.numeric - sapply(step1.numeric.t, rank) rank.character - sapply(step1.character.t, function(x) as.numeric(factor(x))) rank.numeric[!is.na(rank.character)] - 0 rank.character - t(t(rank.character) + apply(matrix(rank.numeric), 2, max, na.rm = TRUE)) rank.overall - ifelse(is.na(rank.character), rank.numeric, rank.character) order.frame - as.data.frame(rank.overall) if (length(which.nas) 0) order.frame[which.nas, ] - Inf retval - do.call(order, order.frame) return(retval) } The difference to gtools::mixedorder() is minimal: romanC - as.character( as.roman(1:3899) ) 21c11 suppressWarnings(match(x, romanC)) --- suppressWarnings(as.numeric(x)) 24c14 suppressWarnings(ifelse(is.na(numeric(x)), toupper(x), --- suppressWarnings(ifelse(is.na(as.numeric(x)), toupper(x), 34c24 delimited - gsub(([IVXCLDM]+), --- delimited - gsub(([+-]{0,1}[0-9]+\\.{0,1}[0-9]*([eE][\\+\\-]{0,1}[0-9]+\\.{0,1}[0-9]*){0,1}), 59,62d48 This difference is so small that the above could now be an option to mixedorder() with minimal overhead added, e.g. mixedorder(y, type=c(decimal, roman)). One could even imagine adding support for binary, octal and hexadecimal (not done). Greg (maintainer of gtools; cc:ed), is this something you would consider adding to gtools? I've modified the gtools source code available on CRAN (that's the only source I found), added package tests, updated the Rd and verified it passes R CMD check. If interested, please find the updates at: https://github.com/HenrikBengtsson/gtools/compare/cran:master...master Thanks Henrik On Tue, Aug 26, 2014 at 6:46 PM, David Winsemius dwinsem...@comcast.net wrote: On Aug 26, 2014, at 5:24 PM, Henrik Bengtsson wrote: Hi, does anyone know of an implementation/function that sorts strings that *contain* roman numerals (I, II, III, IV, V,
[R] Mixed sorting/ordering of strings acknowledging roman numerals?
Hi, does anyone know of an implementation/function that sorts strings that *contain* roman numerals (I, II, III, IV, V, ...) which are treated as numbers. In 'gtools' there is mixedsort() which does this for strings that contains (decimal) numbers. I'm looking for a mixedsortroman() function that does the same but with roman numbers, e.g. ## DECIMAL NUMBERS x - sprintf(chr %d, 12:1) x [1] chr 12 chr 11 chr 10 chr 9 chr 8 [6] chr 7 chr 6 chr 5 chr 4 chr 3 [11] chr 2 chr 1 sort(x) [1] chr 1 chr 10 chr 11 chr 12 chr 2 [6] chr 3 chr 4 chr 5 chr 6 chr 7 [11] chr 8 chr 9 gtools::mixedsort(x) [1] chr 1 chr 2 chr 3 chr 4 chr 5 [6] chr 6 chr 7 chr 8 chr 9 chr 10 [11] chr 11 chr 12 ## ROMAN NUMBERS y - sprintf(chr %s, as.roman(12:1)) y [1] chr XII chr XI chr Xchr IX [5] chr VIII chr VII chr VI chr V [9] chr IV chr III chr II chr I sort(y) [1] chr Ichr II chr III chr IV [5] chr IX chr Vchr VI chr VII [9] chr VIII chr Xchr XI chr XII mixedsortroman(y) [1] chr Ichr II chr III chr IV [5] chr Vchr VI chr VII chr VIII [9] chr IX chr Xchr XI chr XII The latter is what I'm looking for. Before hacking together something myself (e.g. identify roman numerals substrings, translate them to decimal numbers, use gtools::mixedsort() to sort them and then translate them back to roman numbers), I'd like to hear if someone already has this implemented/know of a package that does this. Thanks, Henrik __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Mixed sorting/ordering of strings acknowledging roman numerals?
On Aug 26, 2014, at 5:24 PM, Henrik Bengtsson wrote: Hi, does anyone know of an implementation/function that sorts strings that *contain* roman numerals (I, II, III, IV, V, ...) which are treated as numbers. In 'gtools' there is mixedsort() which does this for strings that contains (decimal) numbers. I'm looking for a mixedsortroman() function that does the same but with roman numbers, e.g. It's pretty easy to sort something you know to be congruent with the existing roman class: romanC - as.character( as.roman(1:3899) ) match(c(I, II, III,X,V), romanC) #[1] 1 2 3 10 5 But I guess you already know that, so you want a regex approach to parsing. Looking at the path taken by Warnes, it would involve doing something like his regex based insertion of a delimiter for Roman numeral but simpler because he needed to deal with decimal points and signs and exponent notation, none of which you appear to need. If you only need to consider character and Roman, then this hack of Warnes tools succeeds: mixedorderRoman - function (x) { if (length(x) 1) return(NULL) else if (length(x) == 1) return(1) if (is.numeric(x)) return(order(x)) delim = \\$\\@\\$ roman - function(x) { suppressWarnings(match(x, romanC)) } nonnumeric - function(x) { suppressWarnings(ifelse(is.na(as.numeric(x)), toupper(x), NA)) } x - as.character(x) which.nas - which(is.na(x)) which.blanks - which(x == ) if (length(which.blanks) 0) x[which.blanks] - -Inf if (length(which.nas) 0) x[which.nas] - Inf delimited - gsub(([IVXCL]+), paste(delim, \\1, delim, sep = ), x) step1 - strsplit(delimited, delim) step1 - lapply(step1, function(x) x[x ]) step1.roman - lapply(step1, roman) step1.character - lapply(step1, nonnumeric) maxelem - max(sapply(step1, length)) step1.roman.t - lapply(1:maxelem, function(i) sapply(step1.roman, function(x) x[i])) step1.character.t - lapply(1:maxelem, function(i) sapply(step1.character, function(x) x[i])) rank.roman - sapply(step1.roman.t, rank) rank.character - sapply(step1.character.t, function(x) as.numeric(factor(x))) rank.roman[!is.na(rank.character)] - 0 rank.character - t(t(rank.character) + apply(matrix(rank.roman), 2, max, na.rm = TRUE)) rank.overall - ifelse(is.na(rank.character), rank.numeric, rank.character) order.frame - as.data.frame(rank.overall) if (length(which.nas) 0) order.frame[which.nas, ] - Inf retval - do.call(order, order.frame) return(retval) } y[mixedorderRoman(y)] [1] chr Ichr II chr III chr IV chr IX [6] chr Vchr VI chr VII chr VIII chr X [11] chr XI chr XII -- David. ## DECIMAL NUMBERS x - sprintf(chr %d, 12:1) x [1] chr 12 chr 11 chr 10 chr 9 chr 8 [6] chr 7 chr 6 chr 5 chr 4 chr 3 [11] chr 2 chr 1 sort(x) [1] chr 1 chr 10 chr 11 chr 12 chr 2 [6] chr 3 chr 4 chr 5 chr 6 chr 7 [11] chr 8 chr 9 gtools::mixedsort(x) [1] chr 1 chr 2 chr 3 chr 4 chr 5 [6] chr 6 chr 7 chr 8 chr 9 chr 10 [11] chr 11 chr 12 ## ROMAN NUMBERS y - sprintf(chr %s, as.roman(12:1)) y [1] chr XII chr XI chr Xchr IX [5] chr VIII chr VII chr VI chr V [9] chr IV chr III chr II chr I sort(y) [1] chr Ichr II chr III chr IV [5] chr IX chr Vchr VI chr VII [9] chr VIII chr Xchr XI chr XII mixedsortroman(y) [1] chr Ichr II chr III chr IV [5] chr Vchr VI chr VII chr VIII [9] chr IX chr Xchr XI chr XII The latter is what I'm looking for. Before hacking together something myself (e.g. identify roman numerals substrings, translate them to decimal numbers, use gtools::mixedsort() to sort them and then translate them back to roman numbers), I'd like to hear if someone already has this implemented/know of a package that does this. Thanks, Henrik __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.