Re: [Rd] max on numeric_version with long components
On Sat, 27 Apr 2024 13:56:58 -0500
Jonathan Keane wrote:

> In devel:
> > max(numeric_version(c("1.0.1.1", "1.0.3.1",
> "1.0.2.1")))
> [1] ‘1.0.1.1’
> > max(numeric_version(c("1.0.1.1000", "1.0.3.1000",
> "1.0.2.1000")))
> [1] ‘1.0.3.1000’

Thank you Jon for spotting this!

This is an unintended consequence of
https://bugs.r-project.org/show_bug.cgi?id=18697.

The old behaviour of max() was to call which.max(xtfrm(x)), which first produced a permutation that sorted the entire .encode_numeric_version(x). The new behaviour is to call which.max directly on .encode_numeric_version(x), which is faster (only O(length(x)) instead of a sort).

What do the encoded version strings look like?

x <- numeric_version(c(
 "1.0.1.1", "1.0.3.1", "1.0.2.1"
))
# Ignore the attributes
(e <- as.vector(.encode_numeric_version(x)))
# [1] "101575360400"
# [2] "103575360400"
# [3] "102575360400"

# order(), xtfrm(), sort() all agree that e[2] is the maximum:
order(e)
# [1] 1 3 2
xtfrm(e)
# [1] 1 3 2
sort(e)
# [1] "101575360400"
# [2] "102575360400"
# [3] "103575360400"

# but not which.max:
which.max(e)
# [1] 1

This happens because which.max() converts its argument to double, which loses precision:

(n <- as.numeric(e))
# [1] 1e+27 1e+27 1e+27
identical(n[1], n[2])
# [1] TRUE
identical(n[3], n[2])
# [1] TRUE

I would be curious to know whether there is a clever way to keep both the O(N) complexity and the full arbitrary precision.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
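One possible direction for the closing question, offered only as a sketch: since the encoded values are character strings, a single pass that compares the strings directly needs O(N) comparisons and never converts to double. The helper name which_max_chr is hypothetical and not part of base R. The sketch assumes all strings are non-NA digit strings of equal width, as the encoding above suggests, and it says nothing about how R itself resolved the issue.

```r
# Hypothetical helper (not in base R): index of the largest string,
# found in one pass without any conversion to double.
which_max_chr <- function(e) {
  stopifnot(is.character(e), length(e) > 0L, !anyNA(e))
  best <- 1L
  for (i in seq_along(e)[-1L]) {
    # Equal-width digit strings compare in the same order as the
    # numbers they spell, so no precision is lost here.
    if (e[i] > e[best]) best <- i
  }
  best
}

e <- c("101575360400", "103575360400", "102575360400")
which_max_chr(e)
# [1] 2
```

An interpreted loop like this is linear but may still be slower in practice than a vectorised call, so it illustrates the idea rather than proposing a replacement.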
[Rd] max on numeric_version with long components
I've noticed something in R devel which seems a little off and is not the behavior I see in 4.4.0 or earlier versions.

With numeric_versions that have long (>8 digit) final components, max and min return the first element and not the max or min:

In devel:
> max(numeric_version(c("1.0.1.1", "1.0.3.1", "1.0.2.1")))
[1] ‘1.0.1.1’
> max(numeric_version(c("1.0.1.1000", "1.0.3.1000", "1.0.2.1000")))
[1] ‘1.0.3.1000’

In 4.4.0:
> max(numeric_version(c("1.0.1.1", "1.0.3.1", "1.0.2.1")))
[1] ‘1.0.3.1’
> max(numeric_version(c("1.0.1.1000", "1.0.3.1000", "1.0.2.1000")))
[1] ‘1.0.3.1000’

Is this expected? I've looked in NEWS but didn't see anything referencing this. Happy to submit an issue to the bug tracker.

-Jon
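A possible workaround while max() misbehaves, sketched under the assumption that sort() on a numeric_version still goes through the order()/xtfrm() path that the reply above reports as giving the right answer. The version strings here are illustrative nine-digit examples, not values copied from the report.

```r
# Illustrative versions with a long (9-digit) final component.
x <- numeric_version(c("1.0.1.100000000",
                       "1.0.3.100000000",
                       "1.0.2.100000000"))

s <- sort(x)          # full sort, so O(N log N) rather than O(N)
vmax <- s[length(s)]  # largest version
vmin <- s[1L]         # smallest version
vmax
vmin
```

This trades the linear-time shortcut for correctness, which should only matter for very long version vectors.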
Re: [Rd] read.csv
I was horrified when I saw John Weinstein's article about Excel turning gene names into dates. Mainly because I had been complaining about that phenomenon for years, and it never remotely occurred to me that you could get a publication out of it.

I eventually rectified the situation by publishing "Blasted Cell Line Names", describing how to match different researchers' recording of the names of cell lines, by applying techniques for DNA or protein sequence alignment.

Best,
Kevin

On Tue, Apr 16, 2024, 4:51 PM Reed A. Cartwright wrote:

> Gene names being misinterpreted by spreadsheet software (read.csv is
> no different) is a classic issue in bioinformatics. It seems like
> every practitioner ends up encountering this issue in due time. E.g.
>
> https://pubmed.ncbi.nlm.nih.gov/15214961/
>
> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1044-7
>
> https://www.nature.com/articles/d41586-021-02211-4
>
> https://www.theverge.com/2020/8/6/21355674/human-genes-rename-microsoft-excel-misreading-dates
>
>
> On Tue, Apr 16, 2024 at 3:46 AM jing hua zhao wrote:
> >
> > Dear R-developers,
> >
> > I came across a somewhat unexpected behaviour of read.csv() which is trivial but worthwhile to note -- my data involves a protein named "1433E" but to save space I drop the quotes so it becomes,
> >
> > Gene,SNP,prot,log10p
> > YWHAE,13:62129097_C_T,1433E,7.35
> > YWHAE,4:72617557_T_TA,1433E,7.73
> >
> > Both read.csv() and readr::read_csv() consider the prot(ein) name as (possibly confused by scientific notation) numeric 1433, which only alerted me when I tried to combine data,
> >
> > all_data <- data.frame()
> > for (protein in proteins[1:7])
> > {
> >    cat(protein,":\n")
> >    f <- paste0(protein,".csv")
> >    if(file.exists(f))
> >    {
> >      p <- read.csv(f)
> >      print(p)
> >      if(nrow(p)>0) all_data <- bind_rows(all_data,p)
> >    }
> > }
> >
> > proteins[1:7]
> > [1] "1433B" "1433E" "1433F" "1433G" "1433S" "1433T" "1433Z"
> >
> > dplyr::bind_rows() failed to work due to incompatible types; nevertheless rbind() went ahead without warnings.
> >
> > Best wishes,
> >
> >
> > Jing Hua
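One way to sidestep the type guessing described in this thread is to declare the column type up front. The sketch below reuses the two data rows quoted above and relies on the documented colClasses argument of read.csv(); the variable names csv and p are only for illustration.

```r
# The sample rows from the original report, inlined as text.
csv <- 'Gene,SNP,prot,log10p
YWHAE,13:62129097_C_T,1433E,7.35
YWHAE,4:72617557_T_TA,1433E,7.73'

# A named colClasses entry pins 'prot' to character;
# the remaining columns are still guessed as usual.
p <- read.csv(text = csv, colClasses = c(prot = "character"))
str(p$prot)
```

With readr, the analogous request would presumably be col_types = readr::cols(prot = readr::col_character()). Either way the column type no longer depends on what the values happen to look like, so files for different proteins should combine cleanly.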
Re: [Rd] Should c(..., recursive = TRUE) and unlist(x, recursive = TRUE) recurse into expression vectors?
On 2024-04-27 10:53 am, Mikael Jagan wrote:

> Reading the body of function 'AnswerType' in bind.c, called from 'do_c' and 'do_unlist', I notice that EXPRSXP and VECSXP are handled identically in the recurse = TRUE case. A corollary is that c(recursive = TRUE) and unlist(recursive = TRUE) treat expression vectors like expression(a, b) as lists of symbols and calls. And since they treat symbols and calls as lists of length 1, we see:
>
>     > x <- expression(a, b); y <- expression(c, d)
>     > c(x, y)
>     expression(a, b, c, d)
>     > c(x, y, recursive = TRUE)
>     [[1]]
>     a
>
>     [[2]]
>     b
>
>     [[3]]
>     c
>
>     [[4]]
>     d
>
> My expectation based on the documentation in help("c") and help("unlist") is that those functions would recurse into lists and pairlists, but _not_ into expression vectors.
>
>     recursive: logical. If 'recursive = TRUE', the function recursively descends through lists (and pairlists) combining all their elements into a vector.
>
>     recursive: logical. Should unlisting be applied to list components of 'x'?
>
> My feeling is that either:
>
> (1) the behaviour should change, so that both calls to 'c' above give the result of type "expression".
>
> (2) the documentation should change to say that expression vectors are handled as lists in the recursive case.
>
> Option (2) won't break anything but is a bit awkward because it means that a type "higher" in the documented hierarchy (... < list < expression) is coerced to a lower type.

Er - this last comment about Option (2) being awkward can be ignored. The expression vector is not itself coerced to a list. Rather, its non-vector components are treated as lists of length 1. And that's well-documented.

If anything, Option (1) is awkward as it would treat two types of generic vectors, list and expression, asymmetrically ...

I can submit a patch implementing Option (2) in a few days to allow for comments if any.

Mikael

> I'll add here that, confusingly, help("expression") says: "an object of mode 'expression' is a list". I understand the author's intent (lists and expression vectors differ only in the 'type' field of the SEXP header) but I wonder if substituting "list" with "generic vector" there would cause less confusion ... ?
>
> Mikael
[Rd] Should c(..., recursive = TRUE) and unlist(x, recursive = TRUE) recurse into expression vectors?
Reading the body of function 'AnswerType' in bind.c, called from 'do_c' and 'do_unlist', I notice that EXPRSXP and VECSXP are handled identically in the recurse = TRUE case. A corollary is that c(recursive = TRUE) and unlist(recursive = TRUE) treat expression vectors like expression(a, b) as lists of symbols and calls. And since they treat symbols and calls as lists of length 1, we see:

    > x <- expression(a, b); y <- expression(c, d)
    > c(x, y)
    expression(a, b, c, d)
    > c(x, y, recursive = TRUE)
    [[1]]
    a

    [[2]]
    b

    [[3]]
    c

    [[4]]
    d

My expectation based on the documentation in help("c") and help("unlist") is that those functions would recurse into lists and pairlists, but _not_ into expression vectors.

    recursive: logical. If 'recursive = TRUE', the function recursively descends through lists (and pairlists) combining all their elements into a vector.

    recursive: logical. Should unlisting be applied to list components of 'x'?

My feeling is that either:

(1) the behaviour should change, so that both calls to 'c' above give the result of type "expression".

(2) the documentation should change to say that expression vectors are handled as lists in the recursive case.

Option (2) won't break anything but is a bit awkward because it means that a type "higher" in the documented hierarchy (... < list < expression) is coerced to a lower type.

I'll add here that, confusingly, help("expression") says: "an object of mode 'expression' is a list". I understand the author's intent (lists and expression vectors differ only in the 'type' field of the SEXP header) but I wonder if substituting "list" with "generic vector" there would cause less confusion ... ?

Mikael
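The difference described in this thread can also be checked programmatically rather than by reading printed output. The sketch below only restates the behaviour reported above; the names flat and rec are illustrative.

```r
x <- expression(a, b); y <- expression(c, d)

flat <- c(x, y)                    # stays an expression vector
rec  <- c(x, y, recursive = TRUE)  # expression vectors are descended into

typeof(flat)
typeof(rec)
# Each element of the recursive result is a bare symbol.
vapply(rec, is.symbol, logical(1))
```

If the result types follow the printed output quoted above, typeof(flat) is "expression" and typeof(rec) is "list", which is the asymmetry the documentation currently does not mention.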
Re: [R-pkg-devel] Extending proj with proj.line3d methods and overloading the methods
On 27 April 2024 00:49:47 GMT+03:00, Leo Mada via R-package-devel wrote:
>Dear List-Members,
>
>I try to implement a proj.line3d method and to overload this method as follows:
>
>proj.line3d <- function(p, x, y, z, ...)
>  UseMethod("proj.line3d")
>
>proj.line3d.numeric = function(p, x, y, z, ...) {
>  # ...
>}
>
>proj.line3d.matrix = function(p, x, y, z, ...) {
>  # ...
>}
>
>p = c(1,2,3)
>line = matrix(c(0,5,2,3,1,4), 2)
>proj.line3d(p, line)
># Error in UseMethod("proj.line3d") :
>#  no applicable method for 'proj.line3d' applied to an object of class "c('double', 'numeric')"
>
>methods(proj)
># [1] proj.aov*           proj.aovlist*       proj.default*       proj.line3d
># [5] proj.line3d.matrix  proj.line3d.numeric proj.lm

In your NAMESPACE, you've registered methods for the generic function 'proj', classes 'line3d.matrix' and 'line3d.numeric', but above you are calling a different generic, 'proj.line3d', for which no methods are registered. For proj.line3d(, ) to work, you'll have to register the methods for the proj.line3d generic.

If you need a visible connection to the proj() generic, you can try registering a method on the 'proj' generic, class 'line3d' *and* creating a class 'line3d' that would wrap your vectors and matrices:

proj(line3d(p), line) -> call lands in proj.line3d -> maybe additional dispatch on the remaining classes of 'p'?

This seems to work, but I haven't tested it extensively:

    > proj.line3d <- \(x, ...) UseMethod('proj.line3d')
    > proj.line3d.numeric <- \(x, ...) { message('proj.line3d.numeric'); x }
    > line3d <- \(x) structure(x, class = c('line3d', class(x)))
    > proj(line3d(pi))
    proj.line3d.numeric
    [1] 3.141593
    attr(,"class")
    [1] "line3d"  "numeric"

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
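The two-level dispatch suggested in the reply can be extended to cover both the numeric and the matrix case. This is a minimal sketch run at top level, outside any package; the method bodies are placeholders that only report which method was reached, and the wrapper class 'line3d' is the one proposed above, not an existing class.

```r
# Wrapper constructor: prepend 'line3d' to whatever classes x already has.
line3d <- function(x) structure(x, class = c("line3d", class(x)))

# Method for stats::proj() on class 'line3d'; it acts as a second generic.
proj.line3d <- function(object, ...) UseMethod("proj.line3d")

# Placeholder methods for the second level of dispatch.
proj.line3d.numeric <- function(object, ...) "numeric method"
proj.line3d.matrix  <- function(object, ...) "matrix method"

r_num <- stats::proj(line3d(c(1, 2, 3)))
r_mat <- stats::proj(line3d(matrix(0, 2, 3)))
r_num
r_mat
```

Inside a package the same functions would presumably need NAMESPACE entries along the lines of S3method(proj, line3d), S3method(proj.line3d, numeric) and S3method(proj.line3d, matrix), so that each method is registered against the generic it actually belongs to.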