Re: [Rd] max on numeric_version with long components

2024-04-27 Thread Ivan Krylov via R-devel
On Sat, 27 Apr 2024 13:56:58 -0500
Jonathan Keane wrote:

> In devel:
> > max(numeric_version(c("1.0.1.1", "1.0.3.1",  
> "1.0.2.1")))
> [1] ‘1.0.1.1’
> > max(numeric_version(c("1.0.1.1000", "1.0.3.1000",  
> "1.0.2.1000")))
> [1] ‘1.0.3.1000’

Thank you Jon for spotting this!

This is an unintended consequence of
https://bugs.r-project.org/show_bug.cgi?id=18697.

The old behaviour of max() was to call
which.max(xtfrm(x)), which first produced a permutation that sorted the
entire .encode_numeric_version(x). The new behaviour is to call
which.max directly on .encode_numeric_version(x), which is faster (only
O(length(x)) instead of a sort).

What do the encoded version strings look like?

x <- numeric_version(c(
 "1.0.1.1", "1.0.3.1", "1.0.2.1"
))
# Ignore the attributes
(e <- as.vector(.encode_numeric_version(x)))
# [1] "101575360400"
# [2] "103575360400"
# [3] "102575360400"

# order(), xtfrm(), sort() all agree that e[2] is the maximum:
order(e)
# [1] 1 3 2
xtfrm(e)
# [1] 1 3 2
sort(e)
# [1] "101575360400"
# [2] "102575360400"
# [3] "103575360400"

# but not which.max:
which.max(e)
# [1] 1

This happens because which.max() converts its argument to double, which
loses precision:

(n <- as.numeric(e))
# [1] 1e+27 1e+27 1e+27
identical(n[1], n[2])
# [1] TRUE
identical(n[3], n[2])
# [1] TRUE
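
The same effect can be reproduced without numeric_version at all. A minimal sketch, using two made-up 21-digit strings (illustrative values, not taken from the thread) that differ only in their last digit:

```r
# Two illustrative digit strings of equal width; only the last digit differs.
s <- c("100000000000000000001", "100000000000000000002")

# A double carries roughly 15-17 significant decimal digits, so both
# strings convert to the same number.
n <- as.numeric(s)
print(n[1] == n[2])

# With every value tied, which.max() reports the first index.
print(which.max(n))

# Comparing the strings themselves still tells them apart.
print(s[1] < s[2])
```

This mirrors what happens to the encoded version strings once they pass through as.numeric().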

Will be curious to know if there is a clever way to keep both the O(N)
complexity and the full arbitrary precision.
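
One possible direction, offered only as a sketch: a single linear scan that compares versions pairwise with the existing `>` method instead of converting anything to double. The helper name which_max_version() is hypothetical and not part of base R, the version strings are illustrative nine-digit examples, and the sketch assumes there are no NA elements and that the comparison operators for numeric_version are unaffected by the which.max() change.

```r
# Hypothetical helper (not in base R): index of the largest version,
# found with length(x) - 1 pairwise comparisons.
which_max_version <- function(x) {
  stopifnot(length(x) > 0L)
  best <- 1L
  for (i in seq_along(x)[-1L]) {
    # Pairwise comparison via the `>` method for numeric_version
    if (x[i] > x[best]) best <- i
  }
  best
}

# Illustrative versions with a nine-digit final component
v <- numeric_version(c("1.0.1.100000000",
                       "1.0.3.100000000",
                       "1.0.2.100000000"))
idx <- which_max_version(v)
print(v[idx])
```

The number of comparisons is linear in length(x), although each comparison has its own overhead, so whether this beats a sort in practice would need measuring.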

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] max on numeric_version with long components

2024-04-27 Thread Jonathan Keane
I've noticed something in R devel that seems a little off and differs from
the behavior I see in 4.4.0 and earlier versions. With numeric_versions that
have long (>8 digit) final components, max and min return the first element
rather than the actual max or min:

In devel:
> max(numeric_version(c("1.0.1.1", "1.0.3.1",
"1.0.2.1")))
[1] ‘1.0.1.1’
> max(numeric_version(c("1.0.1.1000", "1.0.3.1000",
"1.0.2.1000")))
[1] ‘1.0.3.1000’

In 4.4.0:
> max(numeric_version(c("1.0.1.1", "1.0.3.1",
"1.0.2.1")))
[1] ‘1.0.3.1’
> max(numeric_version(c("1.0.1.1000", "1.0.3.1000",
"1.0.2.1000")))
[1] ‘1.0.3.1000’

Is this expected? I've looked in NEWS but didn't see anything
referencing this. Happy to submit an issue to the bug tracker.

-Jon



Re: [Rd] read.csv

2024-04-27 Thread Kevin Coombes
I was horrified when I saw John Weinstein's article about Excel turning
gene names into dates. Mainly because I had been complaining about that
phenomenon for years, and it never remotely occurred to me that you could
get a publication out of it.

I eventually rectified the situation by publishing "Blasted Cell Line
Names", describing how to match different researchers' recording of the
names of cell lines, by applying techniques for DNA or protein sequence
alignment.

Best,
   Kevin

On Tue, Apr 16, 2024, 4:51 PM Reed A. Cartwright wrote:

> Gene names being misinterpreted by spreadsheet software (read.csv is
> no different) is a classic issue in bioinformatics. It seems like
> every practitioner ends up encountering this issue in due time. E.g.
>
> https://pubmed.ncbi.nlm.nih.gov/15214961/
>
> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1044-7
>
> https://www.nature.com/articles/d41586-021-02211-4
>
>
> https://www.theverge.com/2020/8/6/21355674/human-genes-rename-microsoft-excel-misreading-dates
>
>
> On Tue, Apr 16, 2024 at 3:46 AM jing hua zhao wrote:
> >
> > Dear R-developers,
> >
> > I came across a somewhat unexpected behaviour of read.csv() which is
> > trivial but worthwhile to note -- my data involves a protein named
> > "1433E" but to save space I drop the quotes so it becomes,
> >
> > Gene,SNP,prot,log10p
> > YWHAE,13:62129097_C_T,1433E,7.35
> > YWHAE,4:72617557_T_TA,1433E,7.73
> >
> > Both read.csv() and readr::read_csv() treat the prot(ein) name as
> > numeric 1433 (possibly confused by scientific notation), which I only
> > noticed when I tried to combine the data,
> >
> > all_data <- data.frame()
> > for (protein in proteins[1:7])
> > {
> >   cat(protein, ":\n")
> >   f <- paste0(protein, ".csv")
> >   if (file.exists(f))
> >   {
> >     p <- read.csv(f)
> >     print(p)
> >     if (nrow(p) > 0) all_data <- dplyr::bind_rows(all_data, p)
> >   }
> > }
> >
> > proteins[1:7]
> > [1] "1433B" "1433E" "1433F" "1433G" "1433S" "1433T" "1433Z"
> >
> > dplyr::bind_rows() failed due to incompatible types; nevertheless,
> > rbind() went ahead without warnings.
> >
> > Best wishes,
> >
> >
> > Jing Hua
>
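
One way to avoid the conversion is to declare the column type up front. A minimal sketch using the rows quoted above and the colClasses argument of read.csv(); the inline text is only a stand-in for the real per-protein files:

```r
# Inline stand-in for one of the CSV files described in the thread
csv <- "Gene,SNP,prot,log10p
YWHAE,13:62129097_C_T,1433E,7.35
YWHAE,4:72617557_T_TA,1433E,7.73"

# Force 'prot' to stay character; the other columns are still
# type-converted as usual.
p <- read.csv(text = csv, colClasses = c(prot = "character"))
str(p)
```

readr::read_csv() has a col_types argument that serves a similar purpose.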



Re: [Rd] Should c(..., recursive = TRUE) and unlist(x, recursive = TRUE) recurse into expression vectors?

2024-04-27 Thread Mikael Jagan




On 2024-04-27 10:53 am, Mikael Jagan wrote:

> Reading the body of function 'AnswerType' in bind.c, called from 'do_c'
> and 'do_unlist', I notice that EXPRSXP and VECSXP are handled identically
> in the  recurse = TRUE  case.
>
> A corollary is that  c(recursive = TRUE)  and  unlist(recursive = TRUE)
> treat expression vectors like  expression(a, b)  as lists of symbols and
> calls.  And since they treat symbols and calls as lists of length 1, we
> see:
>
> > x <- expression(a, b); y <- expression(c, d)
> > c(x, y)
> expression(a, b, c, d)
> > c(x, y, recursive = TRUE)
> [[1]]
> a
>
> [[2]]
> b
>
> [[3]]
> c
>
> [[4]]
> d
>
> My expectation based on the documentation in help("c") and help("unlist")
> is that those functions would recurse into lists and pairlists, but _not_
> into expression vectors.
>
> recursive: logical.  If 'recursive = TRUE', the function recursively
>   descends through lists (and pairlists) combining all their
>   elements into a vector.
>
> recursive: logical.  Should unlisting be applied to list components of
>   'x'?
>
> My feeling is that either:
>
> (1) the behaviour should change, so that both calls to 'c' above give
>     the result of type "expression".
> (2) the documentation should change to say that expression vectors are
>     handled as lists in the recursive case.
>
> Option (2) won't break anything but is a bit awkward because it means
> that a type "higher" in the documented hierarchy (... < list < expression)
> is coerced to a lower type.



Er - this last comment about Option (2) being awkward can be ignored.  The
expression vector is not itself coerced to a list.  Rather, its non-vector
components are treated as lists of length 1.  And that's well-documented.
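
The distinction can be checked directly with typeof(). A small sketch, reusing the objects from the quoted example; the comment on the recursive case reflects the output shown in this thread and may differ if the behaviour is ever changed:

```r
x <- expression(a, b); y <- expression(c, d)

# Non-recursive combination keeps the expression type.
t_plain <- typeof(c(x, y))
print(t_plain)

# In the recursive case the thread reports a plain list of four
# symbols, rather than an expression vector.
r <- c(x, y, recursive = TRUE)
print(typeof(r))
print(length(r))

# A bare symbol passed to c() is likewise combined as a list of length 1.
s <- c(quote(a))
print(typeof(s))
```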

If anything, Option (1) is awkward as it would treat two types of generic
vectors, list and expression, asymmetrically ...

I can submit a patch implementing Option (2) in a few days to allow for
comments if any.

Mikael


> I'll add here that, confusingly, help("expression") says: "an object of
> mode 'expression' is a list".  I understand the author's intent (lists and
> expression vectors differ only in the 'type' field of the SEXP header) but
> I wonder if substituting "list" with "generic vector" there would cause
> less confusion ... ?
>
> Mikael




[Rd] Should c(..., recursive = TRUE) and unlist(x, recursive = TRUE) recurse into expression vectors?

2024-04-27 Thread Mikael Jagan

Reading the body of function 'AnswerType' in bind.c, called from 'do_c'
and 'do_unlist', I notice that EXPRSXP and VECSXP are handled identically
in the  recurse = TRUE  case.

A corollary is that  c(recursive = TRUE)  and  unlist(recursive = TRUE)
treat expression vectors like  expression(a, b)  as lists of symbols and
calls.  And since they treat symbols and calls as lists of length 1, we
see:

> x <- expression(a, b); y <- expression(c, d)
> c(x, y)
expression(a, b, c, d)
> c(x, y, recursive = TRUE)
[[1]]
a

[[2]]
b

[[3]]
c

[[4]]
d

My expectation based on the documentation in help("c") and help("unlist")
is that those functions would recurse into lists and pairlists, but _not_
into expression vectors.

recursive: logical.  If 'recursive = TRUE', the function recursively
  descends through lists (and pairlists) combining all their
  elements into a vector.

recursive: logical.  Should unlisting be applied to list components of
  'x'?

My feeling is that either:

(1) the behaviour should change, so that both calls to 'c' above give
the result of type "expression".
(2) the documentation should change to say that expression vectors are
handled as lists in the recursive case.

Option (2) won't break anything but is a bit awkward because it means
that a type "higher" in the documented hierarchy (... < list < expression)
is coerced to a lower type.

I'll add here that, confusingly, help("expression") says: "an object of
mode 'expression' is a list".  I understand the author's intent (lists and
expression vectors differ only in the 'type' field of the SEXP header) but
I wonder if substituting "list" with "generic vector" there would cause
less confusion ... ?

Mikael



Re: [R-pkg-devel] Extending proj with proj.line3d methods and overloading the methods

2024-04-27 Thread Ivan Krylov via R-package-devel
On 27 April 2024 00:49:47 GMT+03:00, Leo Mada via R-package-devel wrote:
>Dear List-Members,
>
>I am trying to implement a proj.line3d method and to overload this method as follows:
>
>proj.line3d <- function(p, x, y, z, ...)
>  UseMethod("proj.line3d")
>
>proj.line3d.numeric = function(p, x, y, z, ...) {
>  # ...
>}
>
>proj.line3d.matrix = function(p, x, y, z, ...) {
>  # ...
>}

>p = c(1,2,3)
>line = matrix(c(0,5,2,3,1,4), 2)
>proj.line3d(p, line)
>#  Error in UseMethod("proj.line3d") :
>#   no applicable method for 'proj.line3d' applied to an object of class "c('double', 'numeric')"

>methods(proj)
># [1] proj.aov*   proj.aovlist*   proj.default*   proj.line3d
># [5] proj.line3d.matrix  proj.line3d.numeric proj.lm

In your NAMESPACE, you've registered methods for the generic function 'proj', 
classes 'line3d.matrix' and 'line3d.numeric', but above you are calling a 
different generic, 'proj.line3d', for which no methods are registered.

For a call such as proj.line3d(p, line) to work, you'll have to register the 
methods for the proj.line3d generic. If you need a visible connection to the 
proj() generic, you can try registering a method on the 'proj' generic, class 
'line3d' *and* creating a class 'line3d' that would wrap your vectors and 
matrices:

proj(line3d(p), line) -> call lands in proj.line3d -> maybe additional dispatch 
on the remaining classes of 'p'?

This seems to work, but I haven't tested it extensively:

> proj.line3d <- \(x, ...) UseMethod('proj.line3d')
> proj.line3d.numeric <- \(x, ...) { message('proj.line3d.numeric'); x }
> line3d <- \(x) structure(x, class = c('line3d', class(x)))
> proj(line3d(pi))
proj.line3d.numeric
[1] 3.141593
attr(,"class")
[1] "line3d"  "numeric"
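
The same idea extended to the matrix case from the original question. This is a minimal, lightly tested sketch: line3d() is the hypothetical wrapper class from above, and the method bodies are placeholders that only report which method was reached.

```r
# proj.line3d serves both as the method of stats::proj for class 'line3d'
# and as a generic of its own.
proj.line3d <- function(x, ...) UseMethod("proj.line3d")

# Placeholder bodies (hypothetical): they only report which method ran.
proj.line3d.numeric <- function(x, ...) "proj.line3d.numeric"
proj.line3d.matrix  <- function(x, ...) "proj.line3d.matrix"

# Wrapper class that keeps the original classes for the second dispatch.
line3d <- function(x) structure(x, class = c("line3d", class(x)))

p    <- c(1, 2, 3)
line <- matrix(c(0, 5, 2, 3, 1, 4), 2)

res_numeric <- stats::proj(line3d(p))
res_matrix  <- stats::proj(line3d(line))
print(c(res_numeric, res_matrix))

# In a package, the corresponding NAMESPACE directives would be
# S3method(proj, line3d), S3method(proj.line3d, numeric) and
# S3method(proj.line3d, matrix), plus export(proj.line3d) if the
# second generic should be callable directly.
```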

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel