[Rd] Petition to set warnPartialMatch* options to TRUE during R CMD check by default
Hi all,

What it says in the title. This is likely to cause a lot of CRAN packages to fail (I can try to quantify this if seen fit), but I think it's for the best. Packages should not (IMHO) be relying on partial matching in package code / tests. One might be more permissive for vignette/examples code, though I still find it poor practice.

Among many reasons why package authors should resist using partial matching, today I saw this:

 - Upstream package A adds a new argument 'nb' to its print method; now there are two arguments, 'na' and 'nb', starting with 'n'.
 - Downstream package B importing A starts failing, because it was using print(n=...) and relying on that resolving to na=, but now it's ambiguous.

This feels like an unnecessary risk for considerate package A authors to need to take into account when designing their API. Yes, downstream package B "should" be broken & resubmitted to CRAN, but (1) there is some perceived "blame" for package A "causing B to be removed" and (2) the re-submitted package is by no means assured to be "safe" -- time-constrained B authors may just fix the minimum set of partially-matched calls while leaving potentially many other similar at-risk call sites unchanged. Better to enforce exact matching in package code globally, I think.

It seems likely (given how options work in R) that committed authors will be able to sidestep the issue by disabling the PartialMatch warnings, but better to make this inconvenient with a stricter default.

Mike C

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
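To make the hazard concrete, here is a small self-contained sketch (the print_A_* function names are hypothetical, invented for illustration; the options() call shows the warnPartialMatch* settings the petition refers to):

```r
# Hypothetical illustration of the partial-matching hazard described above.
# "Version 1" of an upstream function has a single argument starting with 'n':
print_A_v1 <- function(x, na = "NA") paste(x, na)

# A downstream caller uses a partially-matched argument name:
print_A_v1("value", n = "?")   # 'n' silently resolves to 'na'

# "Version 2" adds a second argument starting with 'n' -- the same call now errors:
print_A_v2 <- function(x, na = "NA", nb = "NB") paste(x, na, nb)
try(print_A_v2("value", n = "?"))  # Error: argument 1 matches multiple formal arguments

# The options the petition proposes enabling during R CMD check by default:
options(warnPartialMatchArgs   = TRUE,
        warnPartialMatchDollar = TRUE,
        warnPartialMatchAttr   = TRUE)
print_A_v1("value", n = "?")   # now warns about the partial argument match
```

With the options set, the at-risk call site is flagged before the upstream API ever changes, which is the point of enforcing it at check time.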
Re: [Rd] [External] Re: Repeated library() of one package with different include.only= entries
OK, I've moved your suggestions into the tracker for further discussion:

https://bugs.r-project.org/show_bug.cgi?id=18703
https://bugs.r-project.org/show_bug.cgi?id=18704
https://bugs.r-project.org/show_bug.cgi?id=18705

I will tackle a patch for 18705 to start with as the lowest-hanging fruit.

On Mon, Apr 15, 2024 at 6:46 AM Martin Maechler wrote:
> I think we should try to advance and hopefully finalize this
> thread before we forget about it ..
>
> >>>>> Michael Chirico on Thu, 11 Apr 2024 09:10:11 -0700 writes:
>
> >> I would assume that
> >> library(Matrix, include.only="isDiagonal")
> >> implies that only `isDiagonal` ends up on the search path
>
> > This could also be a reasonable behavior, but neither does that happen
> > today.
>
> Indeed; I think we should signal a warning at least in the case
> it does not happen --- namely when "parts of Matrix" are already
> in the search path.
>
> >> I think a far better approach to solve Michael's problem is simply to use
> >> fac2sparse <- Matrix::fac2sparse
>
> > This does not fully simulate attachment, e.g. running package hooks &
> > resolving Depends.
>
> (indeed; as library() is really about search() path attachment)
>
> >> ?library could also mention using detach() followed by library() or
> >> attachNamespace() with a new include.only specification.
>
> > This is the "quite hard to accomplish" I alluded to, admittedly I hadn't
> > forced myself to write it all out -- maybe it's not as bad as all that.
> > After some iterations, today I think we'd want to do...
> > > modify_attach <- function(pkg, new_names) { > > if (!startsWith(pkg, "package:")) pkg <- paste0("package:", pkg) > > old <- tryCatch(ls(pkg, all.names=TRUE), error=\(c) character()) > > if (length(old)) detach(pkg) > > attachNamespace(.rmpkg(pkg), include.only=c(new_names, old)) > > } > > > Perhaps detach() could invisibly return the exported names to make > this a > > tiny bit easier (today it returns NULL): > > I agree that such changed detach() behavior would be "nice" here; > but I wouldn't like to change its default behavior, notably as > in 99.5% of its use, the names would not be used. > I'd agree to add a new option for detach() to return the names > (visibly in that case). > > > modify_attach <- function(pkg, new_names) { > > if (!startsWith(pkg, "package:")) pkg <- paste0("package:", pkg) > > old <- tryCatch(detach(pkg), error=\(c) character()) > > attachNamespace(.rmpkg(pkg), include.only=c(new_names, old)) > > } > > > Regardless, I think your suggestion to just point to > > detach()+attachNamespace() is reasonable enough, the rare users that > care > > about this are likely to be able to figure out the rest from there. > > So I think we agree here; mentioning such a modify_attach() > ... I'd use an example *without* tryCatch() as I think the user > should choose their own line of action in such cases > ... > on the help page would then be appropriate. 
> > Martin
>
> > On Thu, Apr 11, 2024 at 7:37 AM wrote:
>
> >> On Thu, 11 Apr 2024, Duncan Murdoch wrote:
> >>
> >> > On 11/04/2024 7:04 a.m., Martin Maechler wrote:
> >> >>>>>>> Michael Chirico
> >> >>>>>>> on Mon, 8 Apr 2024 10:19:29 -0700 writes:
> >> >>
> >> >> > Right now, attaching the same package with different
> >> >> > include.only= has no effect:
> >> >>
> >> >> > library(Matrix, include.only="fac2sparse")
> >> >> > library(Matrix)
> >> >> > ls("package:Matrix")
> >> >> > # [1] "fac2sparse"
> >> >>
> >> >> > ?library does not cover this case -- what is covered is the _loading_
> >> >> > behavior of repeated calls:
> >> >>
> >> >> >> [library and require] check and update the list of currently attached
> >> >> > packages and do not reload a namespace which is already loaded
> >> >>
> >> >> > But here we're looking at the _attach_ behavior of repeated calls.
Re: [Rd] [External] Re: Repeated library() of one package with different include.only= entries
> I would assume that
> library(Matrix, include.only="isDiagonal")
> implies that only `isDiagonal` ends up on the search path

This could also be a reasonable behavior, but that is not what happens today.

> I think a far better approach to solve Michael's problem is simply to use
> fac2sparse <- Matrix::fac2sparse

This does not fully simulate attachment, e.g. running package hooks & resolving Depends.

> ?library could also mention using detach() followed by library() or
> attachNamespace() with a new include.only specification.

This is the "quite hard to accomplish" I alluded to; admittedly I hadn't forced myself to write it all out -- maybe it's not as bad as all that. After some iterations, today I think we'd want to do...

modify_attach <- function(pkg, new_names) {
  if (!startsWith(pkg, "package:")) pkg <- paste0("package:", pkg)
  old <- tryCatch(ls(pkg, all.names=TRUE), error=\(c) character())
  if (length(old)) detach(pkg)
  attachNamespace(.rmpkg(pkg), include.only=c(new_names, old))
}

Perhaps detach() could invisibly return the exported names to make this a tiny bit easier (today it returns NULL):

modify_attach <- function(pkg, new_names) {
  if (!startsWith(pkg, "package:")) pkg <- paste0("package:", pkg)
  old <- tryCatch(detach(pkg), error=\(c) character())
  attachNamespace(.rmpkg(pkg), include.only=c(new_names, old))
}

Regardless, I think your suggestion to just point to detach()+attachNamespace() is reasonable enough; the rare users that care about this are likely to be able to figure out the rest from there.
On Thu, Apr 11, 2024 at 7:37 AM wrote: > On Thu, 11 Apr 2024, Duncan Murdoch wrote: > > > On 11/04/2024 7:04 a.m., Martin Maechler wrote: > >>>>>>> Michael Chirico > >>>>>>> on Mon, 8 Apr 2024 10:19:29 -0700 writes: > >> > >> > Right now, attaching the same package with different > include.only= > >> has no > >> > effect: > >> > >> > library(Matrix, include.only="fac2sparse") > >> > library(Matrix) > >> > ls("package:Matrix") > >> > # [1] "fac2sparse" > >> > >> > ?library does not cover this case -- what is covered is the > >> _loading_ > >> > behavior of repeated calls: > >> > >> >> [library and require] check and update the list of currently > >> attached > >> > packages and do not reload a namespace which is already loaded > >> > >> > But here we're looking at the _attach_ behavior of repeated > calls. > >> > >> > I am particularly interested in allowing the exports of a > package to > >> be > >> > built up gradually: > >> > >> > library(Matrix, include.only="fac2sparse") > >> > library(Matrix, include.only="isDiagonal") # want: > >> ls("package:Matrix") --> > >> > c("fac2sparse", "isDiagonal") > >> > ... > >> > >> > It seems quite hard to accomplish this at the moment. Is the > >> behavior to > >> > ignore new inclusions intentional? Could there be an argument to > get > >> > different behavior? > >> > >> As you did not get an answer yet, ..., some remarks by an > >> R-corer who has tweaked library() behavior in the past : > >> > >> - The `include.only = *` argument to library() has been a > >>*relatively* recent addition {given the 25+ years of R history}: > >> > >>It was part of the extensive new features by Luke Tierney for > >>R 3.6.0 [r76248 | luke | 2019-03-18 17:29:35 +0100], with NEWS entry > >> > >> • library() and require() now allow more control over handling > >>search path conflicts when packages are attached. The policy is > >>controlled by the new conflicts.policy option. 
> >> > >> - I haven't seen these (then) new features been used much, > unfortunately, > >>also not from R-core members, but I'd be happy to be told a > different > >> story. > >> > >> For the above reasons, it could well be that the current > >> implementation {of these features} has not been exercised a lot > >> yet, and limitations as you found them haven't been noticed yet, > >> or at least not noticed on the public R mailing lists, nor > >> otherwise by R-core (?). >
[Rd] Repeated library() of one package with different include.only= entries
Right now, attaching the same package with different include.only= has no effect:

library(Matrix, include.only="fac2sparse")
library(Matrix)
ls("package:Matrix")
# [1] "fac2sparse"

?library does not cover this case -- what is covered is the _loading_ behavior of repeated calls:

> [library and require] check and update the list of currently attached packages and do not reload a namespace which is already loaded

But here we're looking at the _attach_ behavior of repeated calls. I am particularly interested in allowing the exports of a package to be built up gradually:

library(Matrix, include.only="fac2sparse")
library(Matrix, include.only="isDiagonal")
# want: ls("package:Matrix") --> c("fac2sparse", "isDiagonal")
...

It seems quite hard to accomplish this at the moment. Is the behavior to ignore new inclusions intentional? Could there be an argument to get different behavior?
[Rd] readChar() could read the whole file by default?
I am curious why readLines() has a default (n=-1L) to read the full file while readChar() has no default for nchars= (i.e., readChar(file) is an error). Is there a technical reason for this?

I often[1] see code like paste(readLines(f), collapse="\n") which would be better served by readChar(), especially given issues with the global string cache I've come across[2]. But lacking the default, the replacement comes across as less clean. For my own purposes the incantation readChar(file, file.size(file)) is ubiquitous.

Taking CRAN code[3] as a sample[4], 41% of readChar() calls use either readChar(f, file.info(f)$size) or readChar(f, file.size(f))[5].

Thanks for the consideration and feedback,

Mike C

[1] e.g. a quick search shows O(100) usages in CRAN packages: https://github.com/search?q=org%3Acran+%2Fpaste%5B%28%5D%5Cs*readLines%5B%28%5D.*%5B%29%5D%2C%5Cs*collapse%5Cs*%3D%5Cs*%5B%27%22%5D%5B%5C%5C%5D%2F+lang%3AR&type=code, and O(1000) usages generally on GitHub: https://github.com/search?q=lang%3AR+%2Fpaste%5B%28%5D%5Cs*readLines%5B%28%5D.*%5B%29%5D%2C%5Cs*collapse%5Cs*%3D%5Cs*%5B%27%22%5D%5B%5C%5C%5D%2F+lang%3AR&type=code
[2] AIUI the readLines() approach "pollutes" the global string cache with potentially 1000s of strings, one per line, only to get them gc()'d after combining everything with paste(collapse="\n")
[3] The mirror on GitHub, which includes archived packages as well as current (well, eventually-consistent) versions.
[4] Note that usage in packages is likely not representative of usage in scripts, e.g. I often saw readChar(f, 1), or eol-finders like readChar(f, 500) + grep("[\n\r]"), which makes more sense to me as something to find in package internals than in analysis scripts. FWIW I searched an internal codebase (scripts and packages) and found 70% of usages reading the full file.
[5] repro: https://gist.github.com/MichaelChirico/247ea9500460dca239f031e74bdcf76b requires GitHub PAT in env GITHUB_PAT for API permissions.
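For reference, a self-contained sketch of the two whole-file-read idioms compared above, using a temporary file. The only divergence I'd expect is the trailing newline that readLines() strips (on a platform with "\n" line endings):

```r
# Demonstration of the two whole-file-read idioms compared above.
f <- tempfile()
writeLines(c("line one", "line two"), f)

# Common idiom: reads line-by-line, then re-joins (populating the global
# string cache with one entry per line along the way):
via_lines <- paste(readLines(f), collapse = "\n")

# The readChar() incantation described as ubiquitous above:
via_char <- readChar(f, file.size(f))

# They differ only by the trailing newline that readLines() strips:
identical(paste0(via_lines, "\n"), via_char)  # TRUE with "\n" line endings

unlink(f)
```

This is exactly the call pattern a nchars= default of, say, file.size(file) would collapse to readChar(f).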
Re: [Rd] c(NA, 0+1i) not the same as c(as.complex(NA), 0+1i)?
Thanks Martin. My hang-up was not on what the outcome of as.complex(NA) should be, but rather, how I should read code like c(x, y) generally.

Till now, I have thought of it like `c(x, y)` is `c(as(x, typeof(y)), y)` when "type(y) > type(x)". Basically in my mind, "coercion" in R <-> as.(.) (or coerceVector() in C).

So I tracked down the source (which admittedly has been this way for much longer than the present discussion) to see what exactly c() is doing in this case:

https://github.com/r-devel/r-svn/blob/71e7480b07767f3b7d5c45a4247959aa4d83d910/src/main/bind.c#L418-L425

And indeed! It's not "coercion" in the sense I just described... there's a branch for the 'x == NA_LOGICAL' case to _convert_ to NA_complex_.

On Mon, Nov 6, 2023 at 3:08 AM Martin Maechler wrote: > >>>>> Michael Chirico > >>>>> on Sun, 5 Nov 2023 09:41:42 -0800 writes: > > > This is another follow-up to the thread from September > > "Recent changes to as.complex(NA_real_)". > > > A test in data.table was broken by the changes for NA > > coercion to complex; the breakage essentially comes from > > > c(NA, 0+1i) > > # vs > > c(as.complex(NA), 0+1i) > > > The former is the output we tested against; the latter is > essentially (via > > coerceVector() in C) what's generated by our data.table::shift() > > > However, these are now (r85472) different: > > > Im(c(NA, 0+1i)) > > # [1] NA 1 > > Im(c(as.complex(NA), 0+1i)) > > # [1] 0 1 > > > > The former matches the behavior of directly using NA_complex_: > > > Im(c(NA_complex_, 0+1i)) > > # [1] NA 1 > > > On R 4.3.2, they both match the NA_complex_ behavior: > > Im(c(NA, 0+1i)) > > # [1] NA 1 > > Im(c(as.complex(NA), 0+1i)) > > # [1] NA 1 > > > Is this intended behavior, does something need to be updated for c() > as > > well? > > > Certainly it's messing with my understanding of how c() behaves, > e.g.
in ?c > > >> All arguments are coerced to a common type which is the type of the > > returned value > > I think you have confused yourself, and everything behaves as expected: > > As we now have (in R-devel, since {r85233 | maechler | 2023-09-29 }) > > • ‘as.complex(x)’ now returns ‘complex(real=x, imaginary=0)’ > for _all_ numerical and logical ‘x’, notably also for ‘NA’ > or ‘NA_integer_’. > > ==> as.complex(NA) is indeed complex(real = NA, imaginary = 0) > > And now, in your > >c(as.complex(NA), 0+1i) > > you are calling c() on two complex numbers, i.e., there is *no* coercion > (and c(.) is rather "trivial"), and the same is true for > >c(NA_complex_, 0+1i) > > > However, in 85233, I had only modified & added examples to ?as.complex, > and now have added more (corresponding to the above NEWS entry); > -> svn rev 85475 > > . > > The underlying "dilemma" that nobody can help us with is that > "almost infinitely" many different complex numbers z fulfill > is.na(z) |--> TRUE > and only one of them is NA_complex_ and that may be unintuitive. > > OTOH, we already have for the doubles that there are at least two > different x fulfilling is.na(x), namely NaN and NA > and from C's point of view there are even considerably more > different NaN's .. but now I'm definitely digressing. > > Martin >
[Rd] c(NA, 0+1i) not the same as c(as.complex(NA), 0+1i)?
This is another follow-up to the thread from September "Recent changes to as.complex(NA_real_)". A test in data.table was broken by the changes for NA coercion to complex; the breakage essentially comes from

c(NA, 0+1i)
# vs
c(as.complex(NA), 0+1i)

The former is the output we tested against; the latter is essentially (via coerceVector() in C) what's generated by our data.table::shift(). However, these are now (r85472) different:

Im(c(NA, 0+1i))
# [1] NA 1
Im(c(as.complex(NA), 0+1i))
# [1] 0 1

The former matches the behavior of directly using NA_complex_:

Im(c(NA_complex_, 0+1i))
# [1] NA 1

On R 4.3.2, they both match the NA_complex_ behavior:

Im(c(NA, 0+1i))
# [1] NA 1
Im(c(as.complex(NA), 0+1i))
# [1] NA 1

Is this intended behavior, does something need to be updated for c() as well? Certainly it's messing with my understanding of how c() behaves, e.g. in ?c

> All arguments are coerced to a common type which is the type of the returned value
Re: [Rd] FR: valid_regex() to test string validity as a regular expression
Great to know this exists in package space! Of course, using re2 validation for a regex to be executed with TRE (via grep*) is just begging for trouble (e.g. [1] suggests re2 is closer to PCRE, [2] says "mostly" PCRE compatible). And overhauling everything to use re2 just for regex validation is hardly practical. [1] https://laurikari.net/tre/google-releases-the-re2-library/ [2] https://hackerboss.com/is-your-regex-matcher-up-to-snuff/ On Wed, Oct 11, 2023 at 4:02 PM Toby Hocking wrote: > > Hi Michael, it sounds like you don't want to use a CRAN package for > this, but you may try re2, see below. > > > grepl("(invalid","subject",perl=TRUE) > Error in grepl("(invalid", "subject", perl = TRUE) : > invalid regular expression '(invalid' > In addition: Warning message: > In grepl("(invalid", "subject", perl = TRUE) : > PCRE pattern compilation error > 'missing closing parenthesis' > at '' > > > grepl("(invalid","subject",perl=FALSE) > Error in grepl("(invalid", "subject", perl = FALSE) : > invalid regular expression '(invalid', reason 'Missing ')'' > In addition: Warning message: > In grepl("(invalid", "subject", perl = FALSE) : > TRE pattern compilation error 'Missing ')'' > > > re2::re2_regexp("(invalid") > Error: missing ): (invalid > > On Tue, Oct 10, 2023 at 7:57 AM Michael Chirico via R-devel > wrote: > > > > > Grepping an empty string might work in many cases... > > > > That's precisely why a base R offering is important, as a surer way of > > validating in all cases. To be clear I am trying to directly access the > > results of tre_regcomp(). > > > > > it is probably more portable to simply be prepared to propagate such > > errors from the actual use on real inputs > > > > That works best in self-contained calls -- foo(re) and we execute re inside > > foo(). > > > > But the specific context where I found myself looking for a regex validator > > is more complicated (https://github.com/r-lib/lintr/pull/2225). 
User > > supplies a regular expression in a configuration file, only "later" is it > > actually supplied to grepl(). > > > > Till now, we've done your suggestion -- just surface the regex error at run > > time. But our goal is to make it friendlier and fail earlier at "compile > > time" as the config is loaded, "long" before any regex is actually executed. > > > > At a bare minimum this is a good place to return a classed warning (say > > invalid_regex_warning) to allow finer control than tryCatch(condition=). > > > > On Mon, Oct 9, 2023, 11:30 PM Tomas Kalibera > > wrote: > > > > > > > > On 10/10/23 01:57, Michael Chirico via R-devel wrote: > > > > > > It will be useful to package authors trying to validate input which is > > > supposed to be a valid regular expression. > > > > > > As near as I can tell, the only way we can do so now is to run any > > > regex function and check for the warning and/or condition to bubble > > > up: > > > > > > valid_regex <- function(str) { > > > stopifnot(is.character(str), length(str) == 1L) > > > !inherits(tryCatch(grepl(str, ""), condition = identity), "condition") > > > } > > > > > > That's pretty hefty/inscrutable for such a simple validation. I see a > > > variety of similar approaches in CRAN packages [1], all slightly > > > different. It would be good for R to expose a "canonical" way to run > > > this validation. > > > > > > At root, the problem is that R does not expose the regex compilation > > > routines like 'tre_regcomp', so from the R side we have to resort to > > > hacky approaches. > > > > > > Hi Michael, > > > > > > I don't think you need compilation functions for that. If a regular > > > expression is found invalid by a specific third party library R uses, the > > > library should return an error to R and R should return an error to you, > > > and you should probably propagate that to your users.
Grepping an empty > > > string might work in many cases as a test, but it is probably more > > > portable > > > to simply be prepared to propagate such errors from the actual use on real > > > inputs. In theory, there could be some optimization for a particular case.
Re: [Rd] FR: valid_regex() to test string validity as a regular expression
> Grepping an empty string might work in many cases... That's precisely why a base R offering is important, as a surer way of validating in all cases. To be clear I am trying to directly access the results of tre_regcomp(). > it is probably more portable to simply be prepared to propagate such errors from the actual use on real inputs That works best in self-contained calls -- foo(re) and we execute re inside foo(). But the specific context where I found myself looking for a regex validator is more complicated (https://github.com/r-lib/lintr/pull/2225). User supplies a regular expression in a configuration file, only "later" is it actually supplied to grepl(). Till now, we've done your suggestion -- just surface the regex error at run time. But our goal is to make it friendlier and fail earlier at "compile time" as the config is loaded, "long" before any regex is actually executed. At a bare minimum this is a good place to return a classed warning (say invalid_regex_warning) to allow finer control than tryCatch(condition=). On Mon, Oct 9, 2023, 11:30 PM Tomas Kalibera wrote: > > On 10/10/23 01:57, Michael Chirico via R-devel wrote: > > It will be useful to package authors trying to validate input which is > supposed to be a valid regular expression. > > As near as I can tell, the only way we can do so now is to run any > regex function and check for the warning and/or condition to bubble > up: > > valid_regex <- function(str) { > stopifnot(is.character(str), length(str) == 1L) > !inherits(tryCatch(grepl(str, ""), condition = identity), "condition") > } > > That's pretty hefty/inscrutable for such a simple validation. I see a > variety of similar approaches in CRAN packages [1], all slightly > different. It would be good for R to expose a "canonical" way to run > this validation. > > At root, the problem is that R does not expose the regex compilation > routines like 'tre_regcomp', so from the R side we have to resort to > hacky approaches. 
> > Hi Michael, > > I don't think you need compilation functions for that. If a regular > expression is found invalid by a specific third party library R uses, the > library should return an error to R and R should return an error to you, > and you should probably propagate that to your users. Grepping an empty > string might work in many cases as a test, but it is probably more portable > to simply be prepared to propagate such errors from the actual use on real > inputs. In theory, there could be some optimization for a particular case, > the checking may not be the same - but that is the same say for compilation > and checking. > > Things get slightly complicated by encoding/useBytes modes > (tre_regwcomp, tre_regncomp, tre_regwncomp, tre_regcompb, > tre_regncompb; all in tre.h), but all are already present in other > regex routines, so this is doable. > > Re encodings, simply R strings should be valid in their encoding. This is > not just for regular expressions but also for anything else. You shouldn't > assume that R can handle invalid strings in any reasonable way. Definitely > you shouldn't try adding invalid strings in tests - behavior with invalid > strings is unspecified. To test whether a string is valid, there is > validEnc() (or validUTF8()). But, again, it is probably safest to propagate > errors from the regular expression R functions (in case the checks differ, > particularly for non-UTF-8), also, duplicating the encoding checks can be a > non-trivial overhead. > > If there was a strong need to have an automated way to somehow classify > specifically errors from the regex libraries, perhaps R could attach some > classes to them when the library tells. > > Tomas > > Exposing a function to compile regular expressions is common in other > languages, e.g. Go [2], Python [3], JavaScript [4].
> > [1] > https://github.com/search?q=lang%3AR+%2Fis%5Ba-zA-Z0-9._%5D*reg%5Ba-zA-Z0-9._%5D*ex.*%28%3C-%7C%3D%29%5Cs*function%2F+org%3Acran&type=code > [2] https://pkg.go.dev/regexp#Compile > [3] https://docs.python.org/3/library/re.html#re.compile > [4] > https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp
[Rd] FR: valid_regex() to test string validity as a regular expression
It will be useful to package authors trying to validate input which is supposed to be a valid regular expression.

As near as I can tell, the only way we can do so now is to run any regex function and check for the warning and/or condition to bubble up:

valid_regex <- function(str) {
  stopifnot(is.character(str), length(str) == 1L)
  !inherits(tryCatch(grepl(str, ""), condition = identity), "condition")
}

That's pretty hefty/inscrutable for such a simple validation. I see a variety of similar approaches in CRAN packages [1], all slightly different. It would be good for R to expose a "canonical" way to run this validation.

At root, the problem is that R does not expose the regex compilation routines like 'tre_regcomp', so from the R side we have to resort to hacky approaches. Things get slightly complicated by encoding/useBytes modes (tre_regwcomp, tre_regncomp, tre_regwncomp, tre_regcompb, tre_regncompb; all in tre.h), but all are already present in other regex routines, so this is doable.

Exposing a function to compile regular expressions is common in other languages, e.g. Go [2], Python [3], JavaScript [4].

[1] https://github.com/search?q=lang%3AR+%2Fis%5Ba-zA-Z0-9._%5D*reg%5Ba-zA-Z0-9._%5D*ex.*%28%3C-%7C%3D%29%5Cs*function%2F+org%3Acran&type=code
[2] https://pkg.go.dev/regexp#Compile
[3] https://docs.python.org/3/library/re.html#re.compile
[4] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp
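In the meantime, a hedged sketch of a slightly friendlier userland wrapper, building on the valid_regex() idea above (check_regex() is a hypothetical name, not an existing or proposed base R function):

```r
# A sketch extending valid_regex() above: instead of returning only
# TRUE/FALSE, capture the regex engine's own condition message so callers
# can report *why* a pattern failed to compile.
check_regex <- function(str, perl = FALSE) {
  stopifnot(is.character(str), length(str) == 1L)
  cond <- tryCatch(
    { grepl(str, "", perl = perl); NULL },  # NULL if the pattern compiles
    condition = identity                     # otherwise the first condition signaled
  )
  if (is.null(cond)) TRUE
  else structure(FALSE, reason = conditionMessage(cond))
}

check_regex("[a-z]+")          # TRUE
res <- check_regex("(invalid") # FALSE, with the engine's message attached
attr(res, "reason")            # e.g. TRE's complaint about the missing ')'
```

This still has the "grep an empty string" shape Tomas cautions about below, so it inherits all the caveats from the thread; it only packages the idiom with a usable error message.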
[Rd] Unclear provenance of message from S4
MRE to produce the message is the following:

setClass("Foo")
setOldClass("Bar")
setAs("Bar", "Foo", \(x) x)
# NOTE: arguments in definition for coerce changed from (x) to (from)

In an interactive setting, that may be fine, but I first encountered this message in the install log of a package for which I am not the author (nor do I have any context on the package, really). As such it took me much longer than it otherwise would have to pick up on the connection between 'coerce' and 'setAs' (as searching the sources for 'coerce' yielded nothing). That the faulty argument is the ubiquitous 'x' is also a major impediment.

It would be much better to surface the relationship to 'setAs' in this message. I believe that logic has to live in setAs itself, as the delegated substituteFunctionArgs loses the proper context (unless we dare to resort to sys.calls()).

Sharing here first per usual as I am not very familiar with S4, in case this all is much clearer to someone with a sharper eye for S4.
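For contrast, a sketch of the same MRE written with the argument named `from`, which is the spelling ?setAs documents for the coerce definition and which avoids the renaming note entirely:

```r
# Same classes as the MRE above, but with the coerce function's argument
# named 'from' as ?setAs documents -- no renaming message is emitted.
setClass("Foo")
setOldClass("Bar")
setAs("Bar", "Foo", function(from) from)
```

The point of the report stands: nothing in the original message connects the dots from 'coerce'/'x' back to this documented `from` convention.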
Re: [Rd] Improving user-friendliness of S4 dispatch failure when mis-naming arguments?
> I'm not entirely sure about the extra `call. = FALSE`

My thinking is the signature of the internal .InheritForDispatch() is not likely to help anyone, in fact having the opposite effect for beginners unsure how to use that info.

> Now I'd like to find an example that only uses packages with priority
> base | Recommended

Sure, here are a few.

library(Matrix)
# searching for Matrix-owned generics
matrix_generics <- getGenerics(where = asNamespace("Matrix"))
matrix_generics@.Data[matrix_generics@package == "Matrix"]

# simple signature, one argument 'x'
symmpart()
# Error: unable to find an inherited method for function ‘symmpart’ for signature ‘x="missing"’

# more complicated signature, especially including ...
Cholesky(a = 1)
# Error: unable to find an inherited method for function ‘Cholesky’ for signature ‘A="missing"’
Cholesky(a = 1, perm = TRUE)
# Error: unable to find an inherited method for function ‘Cholesky’ for signature ‘A="missing"’
Cholesky(a = 1, perm = TRUE, IMult = 2)
# Error: unable to find an inherited method for function ‘Cholesky’ for signature ‘A="missing"’

---

'base' is a bit harder since stats4 just provides classes over stats functions, so the missingness error comes from non-S4 code.

library(stats4)
coef()
# Error in coef.default() : argument "object" is missing, with no default

Defining our own generic:

setGeneric("BaseGeneric", \(a, ...) standardGeneric("BaseGeneric"))
BaseGeneric()
# Error: unable to find an inherited method for function ‘BaseGeneric’ for signature ‘a="missing"’

# getting multiple classes to show up requires setting the signature:
setMethod("BaseGeneric", signature(x = "double", y = "double"), \(x, y, ...)
x + y) BaseGeneric(X = 1, Y = 2) # Error: unable to find an inherited method for function ‘BaseGeneric’ for signature ‘x="missing", y="missing"’ On Fri, Aug 11, 2023 at 2:26 AM Martin Maechler wrote: > > >>>>> Michael Chirico via R-devel > >>>>> on Thu, 10 Aug 2023 23:56:45 -0700 writes: > > > Here's a trivial patch that offers some improvement: > > > Index: src/library/methods/R/methodsTable.R > > === > > --- src/library/methods/R/methodsTable.R (revision 84931) > > +++ src/library/methods/R/methodsTable.R (working copy) > > @@ -752,11 +752,12 @@ > > if(length(methods) == 1L) > > return(methods[[1L]]) # the method > > else if(length(methods) == 0L) { > > -cnames <- paste0("\"", vapply(classes, as.character, ""), "\"", > > +cnames <- paste0(head(fdef@signature, length(classes)), "=\"", > > vapply(classes, as.character, ""), "\"", > > collapse = ", ") > > stop(gettextf("unable to find an inherited method for function %s > > for signature %s", > > sQuote(fdef@generic), > > sQuote(cnames)), > > + call. = FALSE, > > domain = NA) > > } > > else > > > Here's the upshot for the example on DBI: > > > dbGetQuery(connection = conn, query = query) > > Error: unable to find an inherited method for function ‘dbGetQuery’ > > for signature ‘conn="missing", statement="missing"’ > > > I don't have any confidence about edge cases / robustness of this > > patch for generic S4 use cases (make check-all seems fine), > > Good you checked, but you are right that that's not all enough to be sure: > > Checking error *messages* is not something we do often {not > the least because you'd need to consider message translations > and hence ensure you only check in case of English ...}. > > > but I don't suppose a full patch would be dramatically different from > the > > above. > > I agree: The patch looks to make sense to me, too, > while I'm not entirely sure about the extra call. 
= FALSE > (which I of course understand you'd prefer for the above example) > > Now I'd like to find an example that only uses packages with priority > base | Recommended > > Martin > > -- > Martin Maechler > ETH Zurich and R Core team > > > > Mike C > > > On Thu, Aug 10, 2023 at 12:39 PM Gabriel Becker > wrote: > >> > >> I just want to add my 2 cents that I think it would be very useful and > beneficial to improve S4 to surface that information as well. > >> > >> Mo
Re: [Rd] Improving user-friendliness of S4 dispatch failure when mis-naming arguments?
Here's a trivial patch that offers some improvement:

Index: src/library/methods/R/methodsTable.R
===
--- src/library/methods/R/methodsTable.R (revision 84931)
+++ src/library/methods/R/methodsTable.R (working copy)
@@ -752,11 +752,12 @@
     if(length(methods) == 1L)
         return(methods[[1L]]) # the method
     else if(length(methods) == 0L) {
-        cnames <- paste0("\"", vapply(classes, as.character, ""), "\"",
+        cnames <- paste0(head(fdef@signature, length(classes)), "=\"", vapply(classes, as.character, ""), "\"",
                          collapse = ", ")
         stop(gettextf("unable to find an inherited method for function %s for signature %s",
                       sQuote(fdef@generic), sQuote(cnames)),
+             call. = FALSE,
              domain = NA)
     }
     else

Here's the upshot for the example on DBI:

dbGetQuery(connection = conn, query = query)
Error: unable to find an inherited method for function ‘dbGetQuery’ for signature ‘conn="missing", statement="missing"’

I don't have any confidence about edge cases / robustness of this patch for generic S4 use cases (make check-all seems fine), but I don't suppose a full patch would be dramatically different from the above.

Mike C

On Thu, Aug 10, 2023 at 12:39 PM Gabriel Becker wrote:
>
> I just want to add my 2 cents that I think it would be very useful and
> beneficial to improve S4 to surface that information as well.
>
> More information about the way that the dispatch failed would be of great
> help in situations like the one Michael pointed out.
>
> ~G
>
> On Thu, Aug 10, 2023 at 9:59 AM Michael Chirico via R-devel
> wrote:
>>
>> I forwarded that along to the original reporter with positive feedback
>> -- including the argument names is definitely a big help for cuing
>> what exactly is missing.
>>
>> Would a patch to do something similar for S4 be useful?
>>
>> On Thu, Aug 10, 2023 at 6:46 AM Hadley Wickham wrote:
>> >
>> > Hi Michael,
>> >
>> > I can't help with S4, but I can help to make sure this isn't a problem
>> > with S7. What do you think of the current error message?
Do you see >> > anything obvious we could do to improve? >> > >> > library(S7) >> > >> > dbGetQuery <- new_generic("dbGetQuery", c("conn", "statement")) >> > dbGetQuery(connection = NULL, query = NULL) >> > #> Error: Can't find method for generic `dbGetQuery(conn, statement)`: >> > #> - conn : MISSING >> > #> - statement: MISSING >> > >> > Hadley >> > >> > On Wed, Aug 9, 2023 at 10:02 PM Michael Chirico via R-devel >> > wrote: >> > > >> > > I fielded a debugging request from a non-expert user today. At root >> > > was running the following: >> > > >> > > dbGetQuery(connection = conn, query = query) >> > > >> > > The problem is that they've named the arguments incorrectly -- it >> > > should have been [1]: >> > > >> > > dbGetQuery(conn = conn, statement = query) >> > > >> > > The problem is that the error message "looks" highly confusing to the >> > > untrained eye: >> > > >> > > Error in (function (classes, fdef, mtable) : unable to find an >> > > inherited method for function ‘dbGetQuery’ for signature ‘"missing", >> > > "missing"’ >> > > >> > > In retrospect, of course, this makes sense -- the mis-named arguments >> > > are getting picked up by '...', leaving the required arguments >> > > missing. >> > > >> > > But I was left wondering how we could help users right their own ship >> > > here. >> > > >> > > Would it help to mention the argument names? To include some code >> > > checking for weird combinations of missing arguments? Any other >> > > suggestions? >> > > >> > > Mike C >> > > >> > > [1] >> > > https://github.com/r-dbi/DBI/blob/97934c885749dd87a6beb10e8ccb6a5ebea3675e/R/dbGetQuery.R#L62-L64 >> > > >> > > __ >> > > R-devel@r-project.org mailing list >> > > https://stat.ethz.ch/mailman/listinfo/r-devel >> > >> > >> > >> > -- >> > http://hadley.nz >> >> __ >> R-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Improving user-friendliness of S4 dispatch failure when mis-naming arguments?
I forwarded that along to the original reporter with positive feedback -- including the argument names is definitely a big help for cuing what exactly is missing. Would a patch to do something similar for S4 be useful? On Thu, Aug 10, 2023 at 6:46 AM Hadley Wickham wrote: > > Hi Michael, > > I can't help with S4, but I can help to make sure this isn't a problem > with S7. What do you think of the current error message? Do you see > anything obvious we could do to improve? > > library(S7) > > dbGetQuery <- new_generic("dbGetQuery", c("conn", "statement")) > dbGetQuery(connection = NULL, query = NULL) > #> Error: Can't find method for generic `dbGetQuery(conn, statement)`: > #> - conn : MISSING > #> - statement: MISSING > > Hadley > > On Wed, Aug 9, 2023 at 10:02 PM Michael Chirico via R-devel > wrote: > > > > I fielded a debugging request from a non-expert user today. At root > > was running the following: > > > > dbGetQuery(connection = conn, query = query) > > > > The problem is that they've named the arguments incorrectly -- it > > should have been [1]: > > > > dbGetQuery(conn = conn, statement = query) > > > > The problem is that the error message "looks" highly confusing to the > > untrained eye: > > > > Error in (function (classes, fdef, mtable) : unable to find an > > inherited method for function ‘dbGetQuery’ for signature ‘"missing", > > "missing"’ > > > > In retrospect, of course, this makes sense -- the mis-named arguments > > are getting picked up by '...', leaving the required arguments > > missing. > > > > But I was left wondering how we could help users right their own ship here. > > > > Would it help to mention the argument names? To include some code > > checking for weird combinations of missing arguments? Any other > > suggestions? 
> > > > Mike C > > > > [1] > > https://github.com/r-dbi/DBI/blob/97934c885749dd87a6beb10e8ccb6a5ebea3675e/R/dbGetQuery.R#L62-L64 > > > > __ > > R-devel@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-devel > > > > -- > http://hadley.nz __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Improving user-friendliness of S4 dispatch failure when mis-naming arguments?
I fielded a debugging request from a non-expert user today. At root was running the following:

dbGetQuery(connection = conn, query = query)

The problem is that they've named the arguments incorrectly -- it should have been [1]:

dbGetQuery(conn = conn, statement = query)

The problem is that the error message "looks" highly confusing to the untrained eye:

Error in (function (classes, fdef, mtable) : unable to find an
inherited method for function ‘dbGetQuery’ for signature ‘"missing",
"missing"’

In retrospect, of course, this makes sense -- the mis-named arguments are getting picked up by '...', leaving the required arguments missing.

But I was left wondering how we could help users right their own ship here.

Would it help to mention the argument names? To include some code checking for weird combinations of missing arguments? Any other suggestions?

Mike C

[1] https://github.com/r-dbi/DBI/blob/97934c885749dd87a6beb10e8ccb6a5ebea3675e/R/dbGetQuery.R#L62-L64

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
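The failure mode described in this thread is easy to reproduce with a toy generic, without DBI installed. The generic `getQuery` below is a hypothetical stand-in for `DBI::dbGetQuery(conn, statement, ...)`: the mis-named arguments are absorbed by `...`, so dispatch sees both signature arguments as missing.

```r
library(methods)

# Hypothetical stand-in for DBI::dbGetQuery(conn, statement, ...)
setGeneric("getQuery", function(conn, statement, ...) standardGeneric("getQuery"))
setMethod("getQuery", signature("character", "character"),
          function(conn, statement, ...) paste(conn, statement))

# Correctly-named arguments dispatch fine:
getQuery(conn = "db", statement = "SELECT 1")
# [1] "db SELECT 1"

# Mis-named arguments do not match any formal, so they fall through to
# '...'; both dispatch arguments are then "missing" and lookup fails:
tryCatch(getQuery(connection = "db", query = "SELECT 1"),
         error = function(e) conditionMessage(e))
```

Note that `connection` does not even partially match the formal `conn`, since partial matching requires the supplied name to be a prefix of the formal name, not the other way around.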
Re: [Rd] tab-complete for non-syntactic names could attempt backtick-wrapping
I personally wouldn't like using a string, and this comment makes me think it's against the r-core preference as well: https://bugs.r-project.org/show_bug.cgi?id=18429#c1 Thanks both for catching the sloppy mistake in vapply() :) Let's continue discussion on the bug/PR. On Thu, Mar 2, 2023 at 12:39 AM Ivan Krylov wrote: > > There turn out to be a few more things to fix. > > One problem is easy to solve: vapply() needs a third argument > specifying the type of the return value. (Can we have unit tests for > tab completion?) > > The other problem is harder: `comps` defaults to an empty string, and > you can't have a symbol consisting of an empty string, because this > value is internally reserved for missing function arguments. I think > you can return(paste0(prefix, op)) if length(comps) == 0L, but this is > still somewhat worrying. R tries to prevent empty names, so I wouldn't > expect specialOpCompletionsHelper() to return an empty string, but I > can't prove it right now. > > On the other hand, x$'a string' is the same as x$`a string`. Could we > just drop as.name() and keep the return value being a string literal? > I'm not sure about this, either. > > -- > Best regards, > Ivan __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] tab-complete for non-syntactic names could attempt backtick-wrapping
Great suggestion! I've started a patch: https://bugs.r-project.org/show_bug.cgi?id=18479 On Wed, Mar 1, 2023 at 1:56 AM Ivan Krylov wrote: > > В Wed, 1 Mar 2023 01:36:02 -0800 > Michael Chirico via R-devel пишет: > > > +comps[non_syntactic] <- paste0("`", comps[non_syntactic], "`") > > There are a few more corner cases. For example, comps could contain > backticks (which should be escaped with backslashes) and backslashes > (which should also be escaped). Thankfully, \u-style Unicode escape > sequences are not currently supported inside backticks, and "escape the > backslash" rule already takes care of them. > > The deparse() function already knows these rules: > > name <- 'hello world ` \\uFF' > cat(deparse1(as.name(name), backtick=TRUE), '\n') > # `hello world \` \\uFF` > `hello world \` \\uFF` <- 'hello' > `hello world \` \\uFF` > # [1] "hello" > > -- > Best regards, > Ivan __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] tab-complete for non-syntactic names could attempt backtick-wrapping
Consider:

x <- list(`a b` = 1)
x$a

(i.e., press the 'tab' key after typing 'x$a')

The auto-complete mechanism will fill the buffer like so:

x$a b

This is not particularly helpful because this is now a syntax error. It seems to me there's a simple fix -- in utils:::specialCompletions(), we can wrap the result of utils:::specialOpCompletionsHelper() with backticks for non-syntactic names ([1]):

 comps <- specialOpCompletionsHelper(op, suffix, prefix)
 if (length(comps) == 0L)
     comps <- ""
+non_syntactic <- make.names(comps) != comps
+comps[non_syntactic] <- paste0("`", comps[non_syntactic], "`")
 sprintf("%s%s%s", prefix, op, comps)

I'm somewhat surprised this hasn't come up before (I searched for 'completeToken', 'specialCompletions', and 'specialOpCompletionsHelper' here and on Bugzilla), so I'm checking with the list first if I'm missing anything before filing a patch.

Mike C

[1] https://github.com/r-devel/r-svn/blob/4657f65a377cb5ef318c6548bc264e3b0f9517a0/src/library/utils/R/completion.R#L536-L538

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
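The proposed two-line change can be sketched as a standalone helper (the name `wrap_nonsyntactic` is hypothetical; the real patch lives inside `utils:::specialCompletions()`):

```r
# Sketch of the proposed fix as a standalone helper: backtick-wrap any
# completion candidate that is not a syntactically valid name.
wrap_nonsyntactic <- function(comps) {
  non_syntactic <- make.names(comps) != comps
  comps[non_syntactic] <- paste0("`", comps[non_syntactic], "`")
  comps
}

x <- list(`a b` = 1, ab = 2)
wrap_nonsyntactic(names(x))
# [1] "`a b`" "ab"

# Per Ivan's follow-up, names containing backticks or backslashes also
# need escaping, which deparse() already knows how to do:
deparse(as.name("a b"), backtick = TRUE)
# [1] "`a b`"
```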
[Rd] FR: names= argument for load()
I have three use cases in mind for an argument to load specifying which variables to load from the input 'file':

1. Avoid overwriting existing variables. ?load recommends using envir= or attach() to avoid overwrites. An argument like names= would allow even finer control to avoid collisions.

2. Avoid loading too many (in quantity or in memory size) variables from a large file. We might save dozens or hundreds of models to the same file and then load them more lazily.

3. Helping static analysis. Currently, when load() is used in a script, it becomes impossible to distinguish truly undefined variables from those defined implicitly by load(). With a names= argument, it can be possible to know statically precisely which names were introduced by load, and which might be a bug. (of course there's nothing stopping authors from having non-literal entries to names=, but that's a more general issue of static analysis).

Michael Chirico

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
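Until such an argument exists, use cases 1 and 3 can be approximated in user code by loading into a scratch environment and copying over only the requested names. A sketch with a hypothetical `load_only()` helper; note it still reads and allocates the entire file, so it addresses collisions and static analysis but not use case 2 (memory):

```r
# Load only the requested variables from an .rda file into 'envir',
# by way of a throwaway environment.
load_only <- function(file, names, envir = parent.frame()) {
  tmp <- new.env(parent = emptyenv())
  found <- load(file, envir = tmp)     # load() returns the names it created
  keep <- intersect(names, found)
  for (nm in keep)
    assign(nm, get(nm, envir = tmp), envir = envir)
  invisible(keep)
}

# usage sketch:
a <- 1; b <- 2
f <- tempfile(fileext = ".rda")
save(a, b, file = f)
e <- new.env()
load_only(f, "a", envir = e)
ls(e)
# [1] "a"
```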
Re: [Rd] Line-terminal \ in character constants -- missing from ?Quotes ?
I'm still hung up on ?Quotes -- I can't see mention of 'newline' as a valid escape. It mentions the literal sequence '\' 'n', where 'n' is being escaped. Glanced at the parser blame and apparently the terminal '\' is the older behavior, and what I'm used to, i.e. literal newlines in char constants to make multi-line strings, is new (though still 20 years old): https://github.com/r-devel/r-svn/commit/bc3f20e4e686be556877bb6bd2882ae8029fd17f The NEWS entry there does say the same thing as you -- "escaping the newlines with backslashes". >From the parser, I think ?Quotes is just missing "newline" as being a valid escaped character, c.f. https://github.com/r-devel/r-svn/blob/f55b24945d56e824f124638c596b99887441354a/src/main/gram.y#L2823-L2830 ('\n' is treated like '\') https://github.com/r-devel/r-svn/blob/f55b24945d56e824f124638c596b99887441354a/src/main/gram.y#L2978-L3008 ('\n' is in the list of valid items after '\') I don't see any special handling for '\r', so there may be a gap in the R parser? Or I just don't understand what I'm reading in the parser :) Mike C On Sun, Feb 12, 2023 at 3:38 AM Duncan Murdoch wrote: > > On 12/02/2023 12:07 a.m., Michael Chirico via R-devel wrote: > > I'm coming across some code that uses the fact the parser ignores a > > line-terminal '\', e.g. > > > > identical("\ > > ", "\n") > > # [1] TRUE > > > > x = "abc \ > > def" > > y = "abc \ndef" > > identical(x, y) > > # [1] TRUE > > > > However: > > identical("\\n", "\n") > > # [1] FALSE > > > > This appears to be undocumented behavior; the closest thing I see in > > ?Quotes suggests it should be an error: > > > >> Escaping a character not in the following table is an error. > > > > ('\n' is in the table, but my understanding is the 'n' is what's being > > escaped v-a-v the "error", which seems confirmed by the third, FALSE, > > example above) > > > > Is this a bug, is the omission from ?Quotes a bug, or is this just > > undocumented behavior? 
> > In your first example, you have a backslash which says to escape the > next char. The next char is a newline char. The result is an escaped > newline, which apparently is a newline. > > The same thing happens in the second example. > > The third example is an escaped backslash, i.e. a backslash, followed by > n. That's not the same as an escaped n, which gives a newline. > > So I think the behaviour might be reasonable. > > The thing I'd worry about is whether things are handled properly on > Windows, where the newline is two characters (CR LF). It might be that > the backslash at the end of the line escapes the CR, and you get a \r > out of it instead of a \n. But maybe not, the parser knows about CR LF > and internally converts it to \n, so if that happens early enough, > things would be fine. > > Duncan Murdoch > __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Line-terminal \ in character constants -- missing from ?Quotes ?
I'm coming across some code that uses the fact the parser ignores a line-terminal '\', e.g.

identical("\
", "\n")
# [1] TRUE

x = "abc \
def"
y = "abc \ndef"
identical(x, y)
# [1] TRUE

However:

identical("\\n", "\n")
# [1] FALSE

This appears to be undocumented behavior; the closest thing I see in ?Quotes suggests it should be an error:

> Escaping a character not in the following table is an error.

('\n' is in the table, but my understanding is the 'n' is what's being escaped v-a-v the "error", which seems confirmed by the third, FALSE, example above)

Is this a bug, is the omission from ?Quotes a bug, or is this just undocumented behavior?

Mike C

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Shouldn't "Loading" be "Attaching" when referring to library() calls?
require() and library() both emit this message immediately before running library(): "Loading required package: %s" https://github.com/r-devel/r-svn/blob/4198a2941b702d965bb2374c2b908f48c369f40a/src/library/base/R/library.R#L967-L968 https://github.com/r-devel/r-svn/blob/4198a2941b702d965bb2374c2b908f48c369f40a/src/library/base/R/library.R#L684-L685 Shouldn't "Loading" be "Attaching" instead? My understanding is "a package is loaded" is equivalent to "isNamespaceLoaded()", i.e., loadNamespace() was run. And that "a package is attached" is equivalent to "pkg %in% .packages()", i.e., library(pkg) was run. Is the terminology not so precise? There's certainly ambiguity in the internal variable names referenced above -- in require, we see loaded <- paste0("package:", package) %in% search() https://github.com/r-devel/r-svn/blob/4198a2941b702d965bb2374c2b908f48c369f40a/src/library/base/R/library.R#L680 Whereas in library() [via .getRequiredPackages2()] we see attached <- paste0("package:", pkg) %in% search() https://github.com/r-devel/r-svn/blob/4198a2941b702d965bb2374c2b908f48c369f40a/src/library/base/R/library.R#L931 Mike C __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
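The loaded/attached distinction is directly observable. A quick sketch using the base-priority tools package, which in a plain Rscript session is typically neither loaded nor attached at startup:

```r
# "Loaded": the namespace is in memory; its exports are only reachable
# via tools:: (or getNamespace()).
loadNamespace("tools")
isNamespaceLoaded("tools")     # TRUE  -- loaded
"tools" %in% .packages()       # FALSE -- but not attached

# "Attached": library() additionally puts "package:tools" on search(),
# making the exports visible without the tools:: prefix.
library(tools)
"tools" %in% .packages()       # TRUE
"package:tools" %in% search()  # TRUE
```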
Re: [Rd] trace() an S3-classed function in .GlobalEnv
Thanks, that seems like a reasonable assessment. Also worth noting that tracing wipes out a function's attributes, which is also not documented: foo = function(x) { invisible(1 + 1) } attr(foo, "bar") <- 2 trace(foo, quote(message('traced'))) names(attributes(foo)) # [1] "original" "source" "class" PS one day I hope to master the dark art of choosing r-devel or bugzilla for issues :) On Wed, Jun 8, 2022 at 3:44 AM Martin Maechler wrote: > > >>>>> Michael Chirico > >>>>> on Mon, 6 Jun 2022 23:09:29 -0700 writes: > > > Consider: > > > foo <- function() { > > invisible(1 + 1) > > } > > trace(foo, quote(message("traced")), print=FALSE) > > foo() > > # traced > > > But with a minimal change: > > > class(foo) <- "bar" > > trace(foo, quote(message("traced")), print=FALSE) > > # Constructing traceable class “barWithTrace” > > # Error in .classEnv(className) : > > # unable to find an environment containing class “bar” > > > I don't see anything like this mentioned in ?trace (nor does a Google > > search turn up more than a handful of references to this error), > > and from trying to debug what trace() is doing, we arrive to the error > line[1]: > > > .makeTraceClass(traceClass, class(original)) # > > methods:::.makeTraceClass("barWithTrace", "bar") > > > I don't quite follow what's going on here, but it looks like trace() > > is trying to determine an S4 class definition for "bar", but isS4(bar) > > is FALSE. > > > I can (apparently -- not sure if there are as yet-unseen downstream > > consequences) work around the issue by unsetting the class, tracing, > > then re-setting: > > > class(foo) <- NULL > > trace(foo, quote(message("traced")), print=FALSE) > > class(foo) <- "bar" > > > But obviously this is a bit clunky. Is this a bug, or am I missing > something? > > Just a short note of semi-confirmation: > > At the time S4 generics and methods were introduced into R, > trace() was made much more flexible, notably to be able to trace > S4 methods. 
> > It can well be that it originally also worked for functions with > an explicit S3 class, but as such functions are very rare, it > could well be you've found a bug, namely that trace() assumes > that if a function has a non-trivial class, it must be an S4 > one. > > ... and I know you know how to report bugs ;-) > > Thank you in advance! > Martin > > > Mike C > > > [1] > https://github.com/r-devel/r-svn/blob/e2a64a4e14adbc4e9e8635eaa8cbd2835ce1d764/src/library/methods/R/trace.R#L240 > > > __ > > R-devel@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] .onLoad, packageStartupMessage, and R CMD check
Examining more closely, it's a NOTE produced by R CMD check -- originally I had thought it was a WARNING, which I think would have been too strong for this case. A NOTE actually seems fine, on second thought. For a tiny bit of context, it's common for us to issue messaging around some state initialization, which has to happen after some (ex-ante unknown) set of packages are loaded. It's important to do so whether or not the package is attached, so the proviso in .onLoad() indeed makes the most sense. Thanks! On Thu, Nov 4, 2021 at 1:02 PM Gabriel Becker wrote: > > Hi Michael, > > Indeed, just to elaborate further on what I believe Duncan's point is, can > you give any examples, "dire" or not, that are appropriate when the package > is loaded but not attached (ie none of its symbols are visible to the user > without using :::)? > > The only things I can think of are a package that changes the behavior of > other, attached package code, such as conflicted. Doing so is very much an > anti-pattern imo generally, with something like conflicted being an > (arguable) exception. And that's assuming conflicted even works/does anything > when loaded but not attached (I have not confirmed whether thats the case or > not). That or a package that is at end-of-life and is or soon will be > unsupported entirely. > > The examples don't need to be yours, per se, if you know what those pushing > back against your linter were using messages from .onLoad for... 
> > Best, > ~G > > > > On Thu, Nov 4, 2021 at 12:37 PM Duncan Murdoch > wrote: >> >> On 04/11/2021 2:50 p.m., Michael Chirico via R-devel wrote: >> > I wrote a linter to stop users from using packageStartupMessage() in >> > their .onLoad() hook because of the R CMD check warning it triggers: >> > >> > https://github.com/wch/r-source/blob/8b6625e39cd62424dc23399dade37f20fa8afa91/src/library/tools/R/QC.R#L5167 >> > >> > However, this received some pushback which I ultimately agree with, >> > and moreover ?.onLoad seems to agree as well: >> > >> >> Loading a namespace should where possible be silent, with startup >> > messages given by \code{.onAttach}. These messages (**and any essential >> > ones from \code{.onLoad}**) should use \code{\link{packageStartupMessage}} >> > so they can be silenced where they would be a distraction. >> > >> > **emphasis** mine. That is, if we think some message is _essential_ to >> > print during loadNamespace(), we are told to use >> > packageStartupMessage(). >> > >> > Should we remove this R CMD check warning? >> >> The help page doesn't define what an "essential" message would be, but I >> would assume it's a message about some dire condition, not just "Hi! I >> just got loaded!". So I think a note or warning would be appropriate, >> but not an error. >> >> Do you have an example of something that should routinely print, but >> that triggers a warning when checked? >> >> Duncan Murdoch >> >> __ >> R-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] .onLoad, packageStartupMessage, and R CMD check
I wrote a linter to stop users from using packageStartupMessage() in their .onLoad() hook because of the R CMD check warning it triggers: https://github.com/wch/r-source/blob/8b6625e39cd62424dc23399dade37f20fa8afa91/src/library/tools/R/QC.R#L5167 However, this received some pushback which I ultimately agree with, and moreover ?.onLoad seems to agree as well: > Loading a namespace should where possible be silent, with startup messages given by \code{.onAttach}. These messages (**and any essential ones from \code{.onLoad}**) should use \code{\link{packageStartupMessage}} so they can be silenced where they would be a distraction. **emphasis** mine. That is, if we think some message is _essential_ to print during loadNamespace(), we are told to use packageStartupMessage(). Should we remove this R CMD check warning? __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
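For reference, the pattern ?.onLoad appears to endorse looks like the following in a package's R/zzz.R (a sketch; the package name myPkg and the messages are hypothetical):

```r
# R/zzz.R (sketch). An *essential* load-time message goes through
# packageStartupMessage() so downstream users can still silence it, e.g.
#   suppressPackageStartupMessages(loadNamespace("myPkg"))
.onLoad <- function(libname, pkgname) {
  packageStartupMessage(pkgname, ": initializing shared state")
}

# Purely informational banners belong in .onAttach() instead, since they
# only make sense when the user actually library()'s the package:
.onAttach <- function(libname, pkgname) {
  packageStartupMessage("Welcome to ", pkgname)
}
```

packageStartupMessage() signals a condition of class "packageStartupMessage", which is what makes suppressPackageStartupMessages() able to target it specifically.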
[Rd] Should seq.Date() return double storage?
today <- Sys.Date()
typeof(today)
# [1] "double"
typeof(seq(today, by=1, length.out=2))
# [1] "integer"

Clearly minor as it doesn't seem to have come up before (e.g. coercion to numeric will happen automatically whenever fractional dates are needed); I only noticed because of a test using identical failing:

identical(
  seq(today, by=1, length.out=10),
  today + 0:9
)
# [1] FALSE

It's easy in this case to fix the test using coercion, but this could (and did in practice) come up at deeper levels of nesting that become more onerous to handle. And using all.equal() comes with its own tradeoffs.

The fix would be easy enough -- at a glance there are two usages of .Date(seq.int(...)) in seq.Date() that could be replaced by .Date(as.numeric(seq.int(...))).

Mike C

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
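Until/unless seq.Date() is changed, tests can normalize the storage mode before comparing. A sketch (storage.mode<- changes the underlying type while preserving attributes, including the "Date" class):

```r
today <- as.Date("2024-01-01")
s <- seq(today, by = 1, length.out = 10)

identical(s, today + 0:9)
# [1] FALSE  -- same dates, different storage (integer vs double)

# Normalize storage; attributes (incl. class "Date") are kept:
storage.mode(s) <- "double"
identical(s, today + 0:9)
# [1] TRUE
```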
[Rd] translation domain is not inferred correctly from a package's print methods -- intended behavior?
Here is a reprex:

# initialize reprex package
cd /tmp
mkdir myPkg && cd myPkg
echo "Package: myPkg" > DESCRIPTION
echo "Version: 0.0.1" >> DESCRIPTION
mkdir R
echo "print.my_class = function(x, ...) { cat(gettext(\"'%s' is deprecated.\"), '\n', gettext(\"'%s' is deprecated.\", domain='R-myPkg'), '\n') }" > R/foo.R
echo "S3method(print, my_class)" > NAMESPACE

# extract string for translation
Rscript -e "tools::update_pkg_po('.')"

# add dummy translation
msginit -i po/R-myPkg.pot -o po/R-ja.po -l ja --no-translator
head -n -1 po/R-ja.po > tmp && mv tmp po/R-ja.po
echo 'msgstr "%s successfully translated"' >> po/R-ja.po

# install .mo translations
Rscript -e "tools::update_pkg_po('.')"

# install package & test
R CMD INSTALL .
LANGUAGE=ja Rscript -e "library(myPkg); print(structure(1, class = 'my_class'))"
# '%s' は廃止予定です
# %s successfully translated

Note that the first gettext() call, which doesn't supply domain=, returns the corresponding translation from base R (i.e., the output is the same as gettext("'%s' is deprecated.", domain="R-base")). The second gettext() call, where domain= is supplied, returns our dummy translation, which is what I would have expected from the first execution.

Here is what's in ?gettext:

> If domain is NULL or "", and gettext or ngettext is called from a function in
> the namespace of package pkg the domain is set to "R-pkg". Otherwise there is
> no default domain.

Does that mean the S3 print method is not "in the namespace of myPkg"? Or is there a bug here? If the former, is the edge case of concern here just S3 methods where the "top level" S3 method is defined in another package? Can we refine the manual text wording here to be more clear about when we should expect we need to supply domain= vs have it set automatically?

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] tools::update_pkg_po() doesn't work on Solaris?
Hi all, I am about 99% sure tools::update_pkg_po() is using an invocation that is not supported by the Solaris version of msgfmt. The invocation[1]: msgfmt -c --statistics -o /path/to/file.mo /path/to/file.po AFAICT neither -c nor --statistics appear to be supported by Solaris [2]: I don't have access to a Solaris machine where I can try this, but I did manage to get r-hub to pass by disabling those options in potools which similarly relies on msgfmt to generate .mo files [3], [4]. So I wanted to check with the list if anyone can confirm this before filing a bug report. Thanks, Mike C [1] per https://github.com/wch/r-source/blob/59a1965239143ca6242b9cc948d8834e1194e84a/src/library/tools/R/translations.R#L142 [2] https://docs.oracle.com/cd/E36784_01/html/E36870/msgfmt-1.html#REFMAN1msgfmt-1 [3] https://github.com/MichaelChirico/potools/pull/219 [4] https://builder.r-hub.io/status/potools_0.2.1.tar.gz-f4e936eec1b74bebb91e07c59c23b1cf __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Should last default to .Machine$integer.max-1 for substring()
Thanks all, great points well taken. Indeed it seems the default of 1000000 predates SVN tracking in 1997.

I think a NULL default behaving as "end of string" regardless of encoding makes sense and avoids the overheads of a $ call and a much heavier nchar() calculation.

Mike C

On Mon, Jun 21, 2021 at 1:32 AM Martin Maechler wrote:
> >>>>> Tomas Kalibera
> >>>>> on Mon, 21 Jun 2021 10:08:37 +0200 writes:
>
> > On 6/21/21 9:35 AM, Martin Maechler wrote:
> >>>>>>> Michael Chirico
> >>>>>>> on Sun, 20 Jun 2021 15:20:26 -0700 writes:
> >> > Currently, substring defaults to last=1000000L, which
> >> > strongly suggests the intent is to default to "nchar(x)"
> >> > without having to compute/allocate that up front.
> >>
> >> > Unfortunately, this default makes no sense for "very
> >> > large" strings which may exceed 1000000L in "width".
> >>
> >> Yes; and I tend to agree with you that this default is outdated
> >> (Remember : R was written to work and run on 2 (or 4?) MB of RAM on the
> >> student lab Macs in Auckland in ca 1994).
> >>
> >> > The max width of a string is .Machine$integer.max-1:
> >>
> >> (which Brodie showed was only almost true)
> >>
> >> > So it seems to me either .Machine$integer.max or
> >> > .Machine$integer.max-1L would be a more sensible default. Am I missing
> >> > something?
> >>
> >> The "drawback" is of course that .Machine$integer.max is still
> >> a function call (as R beginners may forget) contrary to 1000000L,
> >> but that may even be inlined by the byte compiler (? how would we check ?)
> >> and even if it's not, it does more clearly convey the concept
> >> and idea *and* would probably even port automatically if ever
> >> integer would be increased in R.
>
> > We still have the problem that we need to count characters, not bytes,
> > if we want the default semantics of "until the end of the string".
>
> > I think we would have to fix this either by really using
> > "nchar(type="c"))" or by using e.g.
NULL and then treating this as a > > special case, that would be probably faster. > > > Tomas > > You are right, as always, Tomas. > I agree that would be better and we should do it if/when we change > the default there. > > Martin __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Should last default to .Machine$integer.max-1 for substring()
Currently, substring defaults to last=1000000L, which strongly suggests the intent is to default to "nchar(x)" without having to compute/allocate that up front.

Unfortunately, this default makes no sense for "very large" strings which may exceed 1000000L in "width". The max width of a string is .Machine$integer.max-1:

# works
x = strrep(" ", .Machine$integer.max-1L)
# fails
x = strrep(" ", .Machine$integer.max)
Error in strrep(" ", .Machine$integer.max) :
  'Calloc' could not allocate memory (18446744071562067968 of 1 bytes)

(see also the comment in src/main/character.c: "Character strings in R are less than 2^31-1 bytes, so we use int not size_t.")

So it seems to me either .Machine$integer.max or .Machine$integer.max-1L would be a more sensible default. Am I missing something?

Mike C

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
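In current R, substring()'s signature is substring(text, first, last = 1000000L), so the silent truncation that motivates this thread is easy to demonstrate with a string just past that default (sketch; 2e6 characters is about 2 MB):

```r
x <- strrep("a", 2e6)            # a 2,000,000-character string
nchar(x)
# [1] 2000000

nchar(substring(x, 1L))          # silently cut at the last = 1000000L default
# [1] 1000000

nchar(substr(x, 1L, nchar(x)))   # explicit last gets the whole string
# [1] 2000000
```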
[Rd] usage of #import in grDevices/src/qdCocoa.h
I happened to notice that this header file uses:

#import <Cocoa/Cocoa.h>

This is the first time I came across the preprocessor directive #import; the first thing I found about it is this Q&A suggesting it's not portable nor standard C:

https://stackoverflow.com/q/39280248/3576984

On the other hand, this exact invocation seems pretty common on GitHub:

https://github.com/search?l=C&q=%22%23import+%3CCocoa%2FCocoa.h%3E%22&type=Code

I don't see much in the way of relevant documentation for Cocoa besides stuff like this page from 2001:

http://cocoadevcentral.com/articles/31.php

Is this something that should be updated to use #include? Other packages appear to have done so, e.g. from GitHub:

https://github.com/search?q=%22%23include+cocoa%2Fcocoa.h%22&type=code

Michael Chirico

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Translations and snprintf on Windows
Following up on this after Matt D truffled out the issue -- snprintf was a *bit* of a red herring. The root of the issue is the use of positional identifiers (like %1$d, see https://stackoverflow.com/questions/19327441/gcc-dollar-sign-in-printf-format-string) in the format string for translations. These identifiers are quite useful for translations since grammar changes by language mean the inputs should naturally change order when changing languages -- however, they are a POSIX extension. snprintf produced the issue since all other messages are ultimately R functions, and base R handles the issue (AFAICT via trioremap.h). On Thu, Apr 30, 2020 at 4:16 PM Michael Chirico wrote: > [a bit unsure on if this is maybe better for r-package-devel] > > We recently added translations to messages at the R and C level to > data.table. > > At the C level, we did _() wrapping for char arrays supplied to the > following functions: error, warning, Rprintf, Error, and snprintf. > > This seemed OK but the use of snprintf specifically appears to have caused > a crash on Windows: > > https://github.com/Rdatatable/data.table/issues/4402 > > Is there any guidance against using gettext with snprintf, or perhaps > guidance on which "outputters" *are* OK for translation? > > Michael Chirico > [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
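For reference, the reordering that makes positional identifiers attractive for translations can be seen from R itself, whose sprintf()/gettextf() support the %n$ form in their own implementation (so it works portably at the R level, including on Windows):

```r
# Positional specifiers let a translation reorder arguments without
# changing the call site; R's sprintf() supports the %n$ form:
sprintf("%1$s has %2$d rows", "DT", 5L)
# [1] "DT has 5 rows"
sprintf("%2$d rows are in %1$s", "DT", 5L)   # same arguments, reordered
# [1] "5 rows are in DT"
```

The crash described above arose at the C level, where the platform's own snprintf (without the POSIX extension) was being used directly.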
Re: [Rd] Should 0L * NA_integer_ be 0L?
OK, so maybe one way to paraphrase: For R, the boundedness of integer vectors is an implementation detail, rather than a deeper mathematical fact that can be exploited for this case. One might also expect then that overflow wouldn't result in NA, but rather be automatically cast up to numeric? But that this doesn't happen for efficiency reasons? Would it make any sense to have a different carveout for the logical case? For logical, storage as integer might be seen as a similar type of implementation detail (though if we're being this strict, the question arises of what multiplication of logical values even means). Under such a carveout, FALSE * NA would give 0L. On Sat, May 23, 2020 at 6:49 PM Martin Maechler wrote: > > >>>>> Michael Chirico > >>>>> on Sat, 23 May 2020 18:08:22 +0800 writes: > > > I don't see this specific case documented anywhere (I also tried to > search > > the r-devel archives, as well as I could); the only close reference > > mentions NA & FALSE = FALSE, NA | TRUE = TRUE. And there's also this > > snippet from R-lang: > > > In cases where the result of the operation would be the same for all > >> possible values the NA could take, the operation may return this value. > >> > > > This begs the question -- shouldn't 0L * NA_integer_ be 0L? > > > Because this is an integer operation, and according to this definition > of > > NA: > > > Missing values in the statistical sense, that is, variables whose value > >> is not known, have the value @code{NA} > >> > > > NA_integer_ should be an unknown integer value between -2^31+1 and > 2^31-1. > > Multiplying any of these values by 0 results in 0 -- that is, the > result of > > the operation would be 0 for all possible values the NA could take. > > > > This came up from what seems like an inconsistency to me: > > > all(NA, FALSE) > > # [1] FALSE > > NA * FALSE > > # [1] NA > > > I agree with all(NA, FALSE) being FALSE because we know for sure that > all > > cannot be true. 
The same can be said of the multiplication -- whether NA > > represents TRUE or FALSE, the resulting value is 0 (FALSE). > > > I also agree with the numeric case, FWIW: NA_real_ * 0 has to be > NA_real_, > > because NA_real_ could be Inf or NaN, for both of which multiplication > by 0 > > gives NaN, hence 0 * NA_real_ is either 0 or NaN, hence it must be > NA_real_. > > I agree about almost everything you say above. ... > but possibly the main conclusion. > > The problem with your proposed change would be that integer > arithmetic gives a different result than the corresponding > "numeric" computation. > (I don't remember other such cases in R, at least as long as the > integer arithmetic does not overflow.) > > One principle to decide such problems in S and R has been that > the user should typically *not* have to know if their data is > stored in float/double or in integer, and the results should be the same > (possibly apart from staying integer for some operations). > > > {{Note that there are also situations where it's really > undesirable that 0 * NA does *not* give 0 (but NA); > notably in sparse matrix operations where you'd very often > know that NA was not Inf (or NaN) and you really would like to > preserve sparseness ...}} > > > > [[alternative HTML version deleted]] > > (as you did not use plain text ..) __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Should 0L * NA_integer_ be 0L?
I don't see this specific case documented anywhere (I also tried to search the r-devel archives, as well as I could); the only close reference mentions NA & FALSE = FALSE, NA | TRUE = TRUE. And there's also this snippet from R-lang:

> In cases where the result of the operation would be the same for all
> possible values the NA could take, the operation may return this value.

This begs the question -- shouldn't 0L * NA_integer_ be 0L? Because this is an integer operation, and according to this definition of NA:

> Missing values in the statistical sense, that is, variables whose value
> is not known, have the value @code{NA}

NA_integer_ should be an unknown integer value between -2^31+1 and 2^31-1. Multiplying any of these values by 0 results in 0 -- that is, the result of the operation would be 0 for all possible values the NA could take. This came up from what seems like an inconsistency to me:

all(NA, FALSE)
# [1] FALSE
NA * FALSE
# [1] NA

I agree with all(NA, FALSE) being FALSE because we know for sure that all cannot be true. The same can be said of the multiplication -- whether NA represents TRUE or FALSE, the resulting value is 0 (FALSE). I also agree with the numeric case, FWIW: NA_real_ * 0 has to be NA_real_, because NA_real_ could be Inf or NaN, for both of which multiplication by 0 gives NaN, hence 0 * NA_real_ is either 0 or NaN, hence it must be NA_real_. [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
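The contrast described in this thread, alongside the purely-logical operations that already apply the "same result for all possible values" rule, can be checked directly:

```r
# Logical operations already exploit the "same result for any value of NA"
# rule, while multiplication (which coerces logical to integer) does not:
NA & FALSE        # FALSE -- FALSE regardless of what NA stands for
all(NA, FALSE)    # FALSE -- same reasoning
NA * FALSE        # NA    -- logical coerced to integer first
0L * NA_integer_  # NA    -- the case in question
```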
[Rd] Translations and snprintf on Windows
[a bit unsure on if this is maybe better for r-package-devel] We recently added translations to messages at the R and C level to data.table. At the C level, we did _() wrapping for char arrays supplied to the following functions: error, warning, Rprintf, Error, and snprintf. This seemed OK but the use of snprintf specifically appears to have caused a crash on Windows: https://github.com/Rdatatable/data.table/issues/4402 Is there any guidance against using gettext with snprintf, or perhaps guidance on which "outputters" *are* OK for translation? Michael Chirico [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] head.matrix can return 1000s of columns -- limit to n or add new argument?
Awesome. Gabe, since you already have a workshopped version, would you like to proceed? Feel free to ping me to review the patch once it's posted. On Mon, Sep 16, 2019 at 3:26 PM Martin Maechler wrote: > >>>>> Michael Chirico > >>>>> on Sun, 15 Sep 2019 20:52:34 +0800 writes: > > > Finally read in detail your response Gabe. Looks great, > > and I agree it's quite intuitive, as well as agree against > > non-recycling. > > > Once the length(n) == length(dim(x)) behavior is enabled, > > I don't think there's any need/desire to have head() do > > x[1:6,1:6] anymore. head(x, c(6, 6)) is quite clear for > > those familiar with head(x, 6), it would seem to me. > > > Mike C > > Thank you, Gabe, and Michael. > I did like Gabe's proposal already back in July but was > busy and/or vacationing then ... > > If you submit this with a patch (that includes changes to both > *.R and *.Rd , including some example) as "wishlist" item to R's > bugzilla, I'm willing/happy to check and commit this to R-devel. > > Martin > > > > On Sat, Jul 13, 2019 at 8:35 AM Gabriel Becker > > wrote: > > >> Hi Michael and Abby, > >> > >> So one thing that could happen that would be backwards > >> compatible (with the exception of something that was an > >> error no longer being an error) is head and tail could > >> take vectors of length (dim(x)) rather than integers of > >> length for n, with the default being n=6 being equivalent > >> to n = c(6, dim(x)[2], <...>, dim(x)[k]), at least for > >> the deprecation cycle, if not permanently. It not > >> recycling would be unexpected based on the behavior of > >> many R functions but would preserve the current behavior > >> while granting more fine-grained control to users that > >> feel they need it. > >> > >> A rapidly thrown-together prototype of such a method for > >> the head of a matrix case is as follows: > >> > >> head2 = function(x, n = 6L, ...) 
{ indvecs = lapply(seq_along(dim(x)), function(i) {
> >>   if(length(n) >= i) {
> >>     ni = n[i]
> >>   } else {
> >>     ni = dim(x)[i]
> >>   }
> >>   if(ni < 0L) ni = max(nrow(x) + ni, 0L)
> >>   else ni = min(ni, dim(x)[i])
> >>   seq_len(ni)
> >> })
> >> lstargs = c(list(x), indvecs, drop = FALSE)
> >> do.call("[", lstargs)
> >> }
> >>
> >> > mat = matrix(1:100, 10, 10)
> >> > *head(mat)*
> >>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
> >> [1,]    1   11   21   31   41   51   61   71   81    91
> >> [2,]    2   12   22   32   42   52   62   72   82    92
> >> [3,]    3   13   23   33   43   53   63   73   83    93
> >> [4,]    4   14   24   34   44   54   64   74   84    94
> >> [5,]    5   15   25   35   45   55   65   75   85    95
> >> [6,]    6   16   26   36   46   56   66   76   86    96
> >> > *head2(mat)*
> >>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
> >> [1,]    1   11   21   31   41   51   61   71   81    91
> >> [2,]    2   12   22   32   42   52   62   72   82    92
> >> [3,]    3   13   23   33   43   53   63   73   83    93
> >> [4,]    4   14   24   34   44   54   64   74   84    94
> >> [5,]    5   15   25   35   45   55   65   75   85    95
> >> [6,]    6   16   26   36   46   56   66   76   86    96
> >> > *head2(mat, c(2, 3))*
> >>      [,1] [,2] [,3]
> >> [1,]    1   11   21
> >> [2,]    2   12   22
> >> > *head2(mat, c(2, -9))*
> >>      [,1]
> >> [1,]    1
> >> [2,]    2
> >>
> >> Now one thing to keep in mind here, is that I think we'd either a) have to make the non-recycling behavior permanent, or b) have head treat data.frames and matrices different with respect to the subsets they grab (which strikes me as a *Bad Plan *(tm)).
> >>
> >> So I don't think the default behavior would ever be mat[1:6, 1:6], not because of backwards compatibility, but because at least in my intuition that is just not what he
Re: [Rd] head.matrix can return 1000s of columns -- limit to n or add new argument?
Finally read in detail your response Gabe. Looks great, and I agree it's quite intuitive, as well as agree against non-recycling. Once the length(n) == length(dim(x)) behavior is enabled, I don't think there's any need/desire to have head() do x[1:6,1:6] anymore. head(x, c(6, 6)) is quite clear for those familiar with head(x, 6), it would seem to me. Mike C On Sat, Jul 13, 2019 at 8:35 AM Gabriel Becker wrote: > Hi Michael and Abby, > > So one thing that could happen that would be backwards compatible (with > the exception of something that was an error no longer being an error) is > head and tail could take vectors of length (dim(x)) rather than integers of > length for n, with the default being n=6 being equivalent to n = c(6, > dim(x)[2], <...>, dim(x)[k]), at least for the deprecation cycle, if not > permanently. It not recycling would be unexpected based on the behavior of > many R functions but would preserve the current behavior while granting > more fine-grained control to users that feel they need it. > > A rapidly thrown-together prototype of such a method for the head of a > matrix case is as follows: > > head2 = function(x, n = 6L, ...) 
{
>   indvecs = lapply(seq_along(dim(x)), function(i) {
>     if(length(n) >= i) {
>       ni = n[i]
>     } else {
>       ni = dim(x)[i]
>     }
>     if(ni < 0L)
>       ni = max(nrow(x) + ni, 0L)
>     else
>       ni = min(ni, dim(x)[i])
>     seq_len(ni)
>   })
>   lstargs = c(list(x), indvecs, drop = FALSE)
>   do.call("[", lstargs)
> }
>
> > mat = matrix(1:100, 10, 10)
>
> > *head(mat)*
>
>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
> [1,]    1   11   21   31   41   51   61   71   81    91
> [2,]    2   12   22   32   42   52   62   72   82    92
> [3,]    3   13   23   33   43   53   63   73   83    93
> [4,]    4   14   24   34   44   54   64   74   84    94
> [5,]    5   15   25   35   45   55   65   75   85    95
> [6,]    6   16   26   36   46   56   66   76   86    96
>
> > *head2(mat)*
>
>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
> [1,]    1   11   21   31   41   51   61   71   81    91
> [2,]    2   12   22   32   42   52   62   72   82    92
> [3,]    3   13   23   33   43   53   63   73   83    93
> [4,]    4   14   24   34   44   54   64   74   84    94
> [5,]    5   15   25   35   45   55   65   75   85    95
> [6,]    6   16   26   36   46   56   66   76   86    96
>
> > *head2(mat, c(2, 3))*
>
>      [,1] [,2] [,3]
> [1,]    1   11   21
> [2,]    2   12   22
>
> > *head2(mat, c(2, -9))*
>
>      [,1]
> [1,]    1
> [2,]    2
>
> Now one thing to keep in mind here, is that I think we'd either a) have to make the non-recycling behavior permanent, or b) have head treat data.frames and matrices different with respect to the subsets they grab (which strikes me as a *Bad Plan *(tm)).
>
> So I don't think the default behavior would ever be mat[1:6, 1:6], not because of backwards compatibility, but because at least in my intuition that is just not what head on a data.frame should do by default, and I think the behaviors for the basic rectangular datatypes should "stick together". I mean, also because of backwards compatibility, but that could *in theory* change across a long enough deprecation cycle, but the conceptually right thing to do with a data.frame probably won't. 
> > All of that said, is head(mat, c(6, 6)) really that much easier to > type/better than just mat[1:6, 1:6, drop=FALSE] (I know this will behave > differently if any of the dims of mat are less than 6, but if so why are > you heading it in the first place ;) )? I don't really have a strong > feeling on the answer to that. > > I'm happy to put a patch for head.matrix, head.data.frame, tail.matrix and > tail.data.frame, plus documentation, if people on R-core are interested in > this. > > Note, as most here probably know, and as alluded to above, length(n) > 1 > for head or tail currently give an error, so this would be an extension > of the existing functionality in the mathematical extension sense, where > all existing behavior would remain identical, but the support/valid > parameter space would grow. > > Best, > ~G > > > On Fri, Jul 12, 2019 at 4:03 PM Abby Spurdle wrote: > >> > I assume there are lots of backwards-compatibility issues as well as >> valid >> > use cases for this behavior, so I guess defaulting to M[1:6, 1:6] is out >> of >> > the question. >> >> Agree. >> >> > Is there any scope for adding a new argument to head.matrix that would >> > allow this flexibility? >> >> I agree with what you're trying to achieve. >> However, I'm not sure this is as simple as you're suggesting. >> >> What if the user wants "head" in rows but "tail" in columns. >> Or "head" in rows, and both "head" and "tail" in columns. >> With head and tail alone, there's a combinatorial explosion. >> >> Also, when using tail on an unnamed matrix,
[Rd] --disable-long-double or --enable-long-double=no?
There's a bit of confusion about how to disable long double support in an R build. I see --disable-long-double scattered about, e.g.
- R-exts: https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Writing-portable-packages
- R-admin: https://cran.r-project.org/doc/manuals/r-release/R-admin.html#Solaris
- CRAN noLD check description: https://www.stats.ox.ac.uk/pub/bdr/noLD/README.txt
- ?capabilities: https://stat.ethz.ch/R-manual/R-devel/library/base/html/capabilities.html

However, it's *missing* from ./configure (cd r-source && grep "disable-long-double" configure). Instead there appears to be some code built around enable-long-double:

./configure:1808:  --enable-long-double    use long double type [yes]
./configure:24723:# Check whether --enable-long-double was given.

I see the option apparently introduced here in 2012 & the ambiguity is immediate -- the commit mentions disable-long-double but builds enable-long-double. https://github.com/wch/r-source/commit/fb8e36f8be0aaf47a9c54c9effb219dae34f0e41 Could someone please help clear up the confusion? Thanks Michael Chirico [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
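For what it's worth, this pattern matches the standard autoconf convention: a generated configure script that declares --enable-foo automatically accepts --disable-foo as shorthand for --enable-foo=no, so only the --enable-* spelling appears in the script itself. A minimal sketch of that normalization (illustrative only, not R's actual configure code):

```r
# autoconf-style option normalization: --disable-foo is treated as
# --enable-foo=no (illustrative helper, not part of any build tool)
normalize_option <- function(opt) {
  sub("^--disable-(.*)$", "--enable-\\1=no", opt)
}
normalize_option("--disable-long-double")
# [1] "--enable-long-double=no"
```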
[Rd] Why no tz argument for format.POSIXlt?
Was a bit surprised to see

oldtz = Sys.getenv('TZ')
Sys.setenv(TZ = 'Asia/Jakarta')
format(Sys.time())
# [1] "2019-08-13 16:05:03"
format(Sys.time(), tz = 'UTC') # all is well
# [1] "2019-08-13 09:05:03"
format(trunc(Sys.time(), 'hours')) # correctly truncated in local time
# [1] "2019-08-13 16:00:00"
format(trunc(Sys.time(), 'hours'), tz = 'UTC') # no effect!
# [1] "2019-08-13 16:00:00"
Sys.setenv(TZ = oldtz)

The reason for the discrepancy is that trunc.POSIXt returns a POSIXlt object (not POSIXct), whereas Sys.time() is POSIXct. And while format.POSIXct has a tz argument, format.POSIXlt does not:

names(formals(format.POSIXct))
# [1] "x"      "format" "tz"     "usetz"  "..."
names(formals(format.POSIXlt))
# [1] "x"      "format" "usetz"  "..."

Is there any reason not to accept a tz argument for format.POSIXlt? It's quite convenient to be able to specify an output timezone format on the fly with format.POSIXct; in the case at hand, I'm trying to force UTC time on input. format(as.POSIXct(x), tz = 'UTC') seems to work just fine, is there a reason why this wouldn't be done internally? Michael Chirico [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
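The workaround mentioned at the end can be packaged up as a small user-level helper (format_lt is a hypothetical name, not a base R function): route the POSIXlt through POSIXct, whose format() method does accept tz=.

```r
# Work around format.POSIXlt's missing tz= by converting to POSIXct first:
format_lt <- function(x, tz = "", ...) format(as.POSIXct(x), tz = tz, ...)

t0 <- as.POSIXlt("2019-08-13 16:00:00", tz = "Asia/Jakarta")
format(t0)                 # "2019-08-13 16:00:00" -- a tz= here would be ignored
format_lt(t0, tz = "UTC")  # "2019-08-13 09:00:00" -- Jakarta is UTC+7
```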
[Rd] bug: write.dcf converts hyphen in field name to period
write.dcf(list('my-field' = 1L), tmp <- tempfile())
cat(readLines(tmp))
# my.field: 1

However there's nothing wrong with hyphenated field names per the Debian standard: https://www.debian.org/doc/debian-policy/ch-controlfields.html And in fact we see hyphenated fields used there, and indeed read.dcf handles them just fine:

writeLines(gsub('.', '-', readLines(tmp), fixed = TRUE), tmp)
read.dcf(tmp)
#      my-field
# [1,] "1"

The guilty line is the as.data.frame() call:

if(!is.data.frame(x)) x <- as.data.frame(x, stringsAsFactors = FALSE)

Simply adding check.names = FALSE to this call would solve the issue in my case, but I think not in general. Here's what I see in the standard:

> The field name is composed of US-ASCII characters excluding control characters, space, and colon (i.e., characters in the ranges U+0021 (!) through U+0039 (9), and U+003B (;) through U+007E (~), inclusive). Field names must not begin with the comment character (U+0023 #), nor with the hyphen character (U+002D -).

This could be handled by an adjustment to the next line:

nmx <- names(x)

becomes

nmx <- gsub('^[#-]', '', gsub('[^\U{0021}-\U{0039}\U{003B}-\U{007E}]', '.', names(x)))

(Or maybe erroring for invalid names) Michael Chirico [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
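The proposed sanitization, illustrated as a standalone helper (sanitize_dcf_names is a hypothetical name for demonstration, not the actual write.dcf internals):

```r
# Replace characters outside the Debian field-name ranges with '.',
# then strip a leading '#' or '-' (per the policy quoted above):
sanitize_dcf_names <- function(nmx) {
  nmx <- gsub("[^\u0021-\u0039\u003B-\u007E]", ".", nmx)
  gsub("^[#-]", "", nmx)
}
sanitize_dcf_names(c("my-field", "bad name", "#comment"))
# [1] "my-field" "bad.name" "comment"
```

Note that "my-field" survives intact (hyphen is only forbidden as the *first* character), while the space and the leading comment character are handled.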
[Rd] head.matrix can return 1000s of columns -- limit to n or add new argument?
I think of head() as a standard helper for "glancing" at objects, so I'm sometimes surprised that head() produces massive output:

M = matrix(nrow = 10L, ncol = 10L)
print(head(M)) # <- beware, could be a huge print

I assume there are lots of backwards-compatibility issues as well as valid use cases for this behavior, so I guess defaulting to M[1:6, 1:6] is out of the question. Is there any scope for adding a new argument to head.matrix that would allow this flexibility? IINM a head.array method could essentially be as simple as:

do.call(`[`, c(list(x, drop = FALSE), lapply(pmin(dim(x), n), seq_len)))

(with extra decoration to handle -n, etc.) [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
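A slightly expanded sketch of that one-liner (head_array is a hypothetical name; only positive n is handled here, with negative-n support left as the "extra decoration" the post mentions):

```r
# Take at most n elements along each dimension; n recycles against dim(x):
head_array <- function(x, n = 6L) {
  idx <- lapply(pmin(dim(x), n), seq_len)
  do.call(`[`, c(list(x), idx, list(drop = FALSE)))
}

M <- matrix(seq_len(100L), 10L, 10L)
dim(head_array(M))             # 6 6
dim(head_array(M, c(2L, 3L)))  # 2 3
```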
Re: [Rd] rbind has confusing result for custom sub-class (possible bug?)
Thanks for following up! In fact that's exactly what was done here: https://github.com/Rdatatable/data.table/pull/3602/files On Sun, Jun 2, 2019 at 8:00 PM Joshua Ulrich wrote: > I thought it would be good to summarize my thoughts, since I made a > few hypotheses that turned out to be false. > > This isn't a bug in base R, in either rbind() or `[<-.Date`. > > To summarize the root cause: > base::rbind.data.frame() calls `[<-` for each column of the > data.frame, and there is no `[<-.IDate` method to ensure the > replacement value is converted to integer. And, in fact, `[<-.Date` > calls as.Date() and data.table::as.Date.IDate() calls as.numeric() on > the IDate object. So the problem exists, and can be fixed, in > data.table. > > Best, > Josh > > On Mon, May 27, 2019 at 9:34 AM Joshua Ulrich > wrote: > > > > Follow-up (inline) on my comment about a potential issue in `[<-.Date`. > > > > On Mon, May 27, 2019 at 9:31 AM Michael Chirico > > wrote: > > > > > > Yes, thanks for following up on thread here. And thanks again for > clearing things up, your email was a finger snap of clarity on the whole > issue. > > > > > > I'll add that actually it was data.table's code at fault on the > storage conversion -- note that if you use an arbitrary sub-class 'foo' > with no methods defined, it'll stay integer. > > > > > > That's because [<- calls as.Date and then as.Date.IDate, and that > method (ours) has as.numeric(); earlier I had recognized that if we > commented that line, the issue was "fixed" but I still wasn't understanding > the root cause. > > > > > > My last curiosity on this issue will be in my follow-up thread. 
> > > > > > Mike C > > > > > > On Mon, May 27, 2019, 10:25 PM Joshua Ulrich > wrote: > > >> > > >> On Sun, May 26, 2019 at 6:47 AM Joshua Ulrich < > josh.m.ulr...@gmail.com> wrote: > > >> > > > >> > On Sun, May 26, 2019 at 4:06 AM Michael Chirico > > >> > wrote: > > >> > > > > >> > > Have finally managed to come up with a fix after checking out > sys.calls() > > >> > > from within the as.Date.IDate debugger, which shows something > like: > > >> > > > > >> > > [[1]] rbind(DF, DF) > > >> > > [[2]] rbind(deparse.level, ...) > > >> > > [[3]] `[<-`(`*tmp*`, ri, value = 18042L) > > >> > > [[4]] `[<-.Date`(`*tmp*`, ri, value = 18042L) > > >> > > [[5]] as.Date(value) > > >> > > [[6]] as.Date.IDate(value) > > >> > > > > >> > > I'm not sure why [<- is called, I guess the implementation is to > assign to > > >> > > the output block by block? Anyway, we didn't have a [<- method. > And > > >> > > [<-.Date looks like: > > >> > > > > >> > > value <- unclass(as.Date(value)) # <- converts to double > > >> > > .Date(NextMethod(.Generic), oldClass(x)) # <- restores 'IDate' > class > > >> > > > > >> > > So we can fix our bug by defining a [<- class; the question that > I still > > >> > > don't see answered in documentation or source code is, why/where > is [<- > > >> > > called, exactly? > > >> > > > > >> > Your rbind(DF, DF) call dispatches to base::rbind.data.frame(). The > > >> > `[<-` call is this line: > > >> > value[[jj]][ri] <- if (is.factor(xij)) as.vector(xij) else xij > > >> > > > >> > That's where the storage.mode changes from integer to double. 
> > >> > > > >> > debug: value[[jj]][ri] <- if (is.factor(xij)) as.vector(xij) else > xij > > >> > Browse[2]> > > >> > debug: xij > > >> > Browse[2]> storage.mode(xij) > > >> > [1] "integer" > > >> > Browse[2]> value[[jj]][ri] > > >> > [1] "2019-05-26" > > >> > Browse[2]> storage.mode(value[[jj]][ri]) > > >> > [1] "integer" > > >> > Browse[2]> > > >> > debug: if (!is.null(nm <- names(xij))) names(value[[jj]])[ri] <- nm > > >> > Browse[2]> storage.mode(value[[jj]][ri]) > > >> > [1] "double" > > >> > > > >> To be clear, I don't think this is a
[Rd] Why is R in Japanese (only in Mac terminal)?
Since a while ago, R on my Mac terminal is being started in Japanese: R version 3.5.2 (2018-12-20) -- "Eggshell Igloo" Copyright (C) 2018 The R Foundation for Statistical Computing Platform: x86_64-apple-darwin15.6.0 (64-bit) R は、自由なソフトウェアであり、「完全に無保証」です。 一定の条件に従えば、自由にこれを再配布することができます。 配布条件の詳細に関しては、'license()' あるいは 'licence()' と入力してください。 Natural language support but running in an English locale R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. 'demo()' と入力すればデモをみることができます。 'help()' とすればオンラインヘルプが出ます。 'help.start()' で HTML ブラウザによるヘルプがみられます。 'q()' と入力すれば R を終了します。 I never gave it too much mind since I understand Japanese and am mostly working in RStudio anyway (RStudio is in English). But I found a "bug" in testthat's is_english (which tests whether the current session is reporting base messages in English) and reported here: https://github.com/r-lib/testthat/issues/879 I say "bug" because as near as I can tell is_english is built assuming the logic laid out in ?gettext, ?locales. So even though my machine appears to have none of the "symptoms" of a non-English locale, nevertheless I get Japanese. My session info: R version 3.5.2 (2018-12-20) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS High Sierra 10.13.6 Matrix products: default BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib locale: [1] C/UTF-8/C/C/C/C attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] compiler_3.5.2 My Sys.getenv() and "Languages & Region" settings are in the issue link. Where else should I be looking in my R session or terminal to figure out why it's in Japanese? 
Mike C [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] rbind has confusing result for custom sub-class (possible bug?)
Yes, thanks for following up on thread here. And thanks again for clearing things up, your email was a finger snap of clarity on the whole issue. I'll add that actually it was data.table's code at fault on the storage conversion -- note that if you use an arbitrary sub-class 'foo' with no methods defined, it'll stay integer. That's because [<- calls as.Date and then as.Date.IDate, and that method (ours) has as.numeric(); earlier I had recognized that if we commented that line, the issue was "fixed" but I still wasn't understanding the root cause. My last curiosity on this issue will be in my follow-up thread. Mike C On Mon, May 27, 2019, 10:25 PM Joshua Ulrich wrote: > On Sun, May 26, 2019 at 6:47 AM Joshua Ulrich > wrote: > > > > On Sun, May 26, 2019 at 4:06 AM Michael Chirico > > wrote: > > > > > > Have finally managed to come up with a fix after checking out > sys.calls() > > > from within the as.Date.IDate debugger, which shows something like: > > > > > > [[1]] rbind(DF, DF) > > > [[2]] rbind(deparse.level, ...) > > > [[3]] `[<-`(`*tmp*`, ri, value = 18042L) > > > [[4]] `[<-.Date`(`*tmp*`, ri, value = 18042L) > > > [[5]] as.Date(value) > > > [[6]] as.Date.IDate(value) > > > > > > I'm not sure why [<- is called, I guess the implementation is to > assign to > > > the output block by block? Anyway, we didn't have a [<- method. And > > > [<-.Date looks like: > > > > > > value <- unclass(as.Date(value)) # <- converts to double > > > .Date(NextMethod(.Generic), oldClass(x)) # <- restores 'IDate' class > > > > > > So we can fix our bug by defining a [<- class; the question that I > still > > > don't see answered in documentation or source code is, why/where is [<- > > > called, exactly? > > > > > Your rbind(DF, DF) call dispatches to base::rbind.data.frame(). The > > `[<-` call is this line: > > value[[jj]][ri] <- if (is.factor(xij)) as.vector(xij) else xij > > > > That's where the storage.mode changes from integer to double. 
> > > > debug: value[[jj]][ri] <- if (is.factor(xij)) as.vector(xij) else xij > > Browse[2]> > > debug: xij > > Browse[2]> storage.mode(xij) > > [1] "integer" > > Browse[2]> value[[jj]][ri] > > [1] "2019-05-26" > > Browse[2]> storage.mode(value[[jj]][ri]) > > [1] "integer" > > Browse[2]> > > debug: if (!is.null(nm <- names(xij))) names(value[[jj]])[ri] <- nm > > Browse[2]> storage.mode(value[[jj]][ri]) > > [1] "double" > > > To be clear, I don't think this is a bug in rbind() or > rbind.data.frame(). The confusion is that rbind.data.frame() calls > `[<-` for each column of the data.frame, and there is no `[<-.IDate` > method. So the parent class method is dispatched, which converts the > storage mode to double. > > Someone may argue that this is an issue with `[<-.Date`, and that it > shouldn't convert the storage.mode from integer to double. > > > > > Mike C > > > > > > On Sun, May 26, 2019 at 1:16 PM Michael Chirico < > michaelchiri...@gmail.com> > > > wrote: > > > > > > > Debugging this issue: > > > > > > > > https://github.com/Rdatatable/data.table/issues/2008 > > > > > > > > We have custom class 'IDate' which inherits from 'Date' (it just > forces > > > > integer storage for efficiency, hence, I). > > > > > > > > The concatenation done by rbind, however, breaks this and returns a > double: > > > > > > > > library(data.table) > > > > DF = data.frame(date = as.IDate(Sys.Date())) > > > > storage.mode(rbind(DF, DF)$date) > > > > # [1] "double" > > > > > > > > This is specific to base::rbind (data.table's rbind returns an > integer as > > > > expected); in ?rbind we see: > > > > > > > > The method dispatching is not done via UseMethod(), but by C-internal > > > > dispatching. Therefore there is no need for, e.g., rbind.default. > > > > The dispatch algorithm is described in the source file > > > > (‘.../src/main/bind.c’) as > > > > 1. For each argument we get the list of possible class memberships > from > > > > the class attribute. > > > > 2. 
*We inspect each class in turn to see if there is an applic
Re: [Rd] rbind has confusing result for custom sub-class (possible bug?)
Have finally managed to come up with a fix after checking out sys.calls() from within the as.Date.IDate debugger, which shows something like:

[[1]] rbind(DF, DF)
[[2]] rbind(deparse.level, ...)
[[3]] `[<-`(`*tmp*`, ri, value = 18042L)
[[4]] `[<-.Date`(`*tmp*`, ri, value = 18042L)
[[5]] as.Date(value)
[[6]] as.Date.IDate(value)

I'm not sure why [<- is called, I guess the implementation is to assign to the output block by block? Anyway, we didn't have a [<- method. And [<-.Date looks like:

value <- unclass(as.Date(value)) # <- converts to double
.Date(NextMethod(.Generic), oldClass(x)) # <- restores 'IDate' class

So we can fix our bug by defining a [<- method; the question that I still don't see answered in documentation or source code is, why/where is [<- called, exactly? Mike C On Sun, May 26, 2019 at 1:16 PM Michael Chirico wrote: > Debugging this issue: > > https://github.com/Rdatatable/data.table/issues/2008 > > We have custom class 'IDate' which inherits from 'Date' (it just forces > integer storage for efficiency, hence, I). > > The concatenation done by rbind, however, breaks this and returns a double: > > library(data.table) > DF = data.frame(date = as.IDate(Sys.Date())) > storage.mode(rbind(DF, DF)$date) > # [1] "double" > > This is specific to base::rbind (data.table's rbind returns an integer as > expected); in ?rbind we see: > > The method dispatching is not done via UseMethod(), but by C-internal > dispatching. Therefore there is no need for, e.g., rbind.default. > The dispatch algorithm is described in the source file > (‘.../src/main/bind.c’) as > 1. For each argument we get the list of possible class memberships from > the class attribute. > 2. *We inspect each class in turn to see if there is an applicable > method.* > 3. If we find an applicable method we make sure that it is identical to > any method determined for prior arguments. If it is identical, we proceed, > otherwise we immediately drop through to the default code. 
> > It's not clear what #2 means -- an applicable method *for what*? Glancing > at the source code would suggest it's looking for rbind.IDate: > > https://github.com/wch/r-source/blob/trunk/src/main/bind.c#L1051-L1063 > > const char *generic = ((PRIMVAL(op) == 1) ? "cbind" : "rbind"); // should > be rbind here > const char *s = translateChar(STRING_ELT(classlist, i)); // iterating over > the classes, should get to IDate first > sprintf(buf, "%s.%s", generic, s); // should be rbind.IDate > > but adding this method (or even exporting it) is no help [ simply defining > rbind.IDate = function(...) as.IDate(NextMethod()) ] > > Lastly, it appears that as.Date.IDate is called, which is causing the type > conversion: > > debug(data.table:::as.Date.IDate) > rbind(DF, DF) # launches debugger > x > # [1] "2019-05-26" <-- singleton, so apparently applied to DF$date, not > c(DF$date, DF$date) > undebug(data.table:::as.Date.IDate) > > I can't really wrap my head around why as.Date is being called here, and > even allowing that, why the end result is still the original class [ > class(rbind(DF, DF)$date) == c('IDate', 'Date') ] > > So, I'm beginning to think this might be a bug. Am I missing something? > [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
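A minimal sketch of the kind of `[<-` method described in the reply above (illustrative only, not necessarily data.table's actual fix): it unclasses first so the default integer subassignment runs, never reaching the double coercion inside `[<-.Date`:

```r
# Sketch: assumes 'IDate' stores integer days since the epoch, and that
# 'value' is either Date-like or already an integer day count.
`[<-.IDate` <- function(x, ..., value) {
  cls <- oldClass(x)
  x <- unclass(x)                      # plain integer vector from here on
  if (inherits(value, "Date"))
    value <- unclass(as.Date(value))   # days since epoch (possibly double)
  value <- as.integer(value)           # force integer storage
  x[...] <- value                      # default (non-Date) subassignment
  oldClass(x) <- cls
  x
}
```

The key difference from `[<-.Date` is that the replacement value is coerced with as.integer() rather than left as the double produced by unclass(as.Date(value)).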
[Rd] rbind has confusing result for custom sub-class (possible bug?)
Debugging this issue:

https://github.com/Rdatatable/data.table/issues/2008

We have custom class 'IDate' which inherits from 'Date' (it just forces integer storage for efficiency, hence, I).

The concatenation done by rbind, however, breaks this and returns a double:

library(data.table)
DF = data.frame(date = as.IDate(Sys.Date()))
storage.mode(rbind(DF, DF)$date)
# [1] "double"

This is specific to base::rbind (data.table's rbind returns an integer as expected); in ?rbind we see:

The method dispatching is not done via UseMethod(), but by C-internal dispatching. Therefore there is no need for, e.g., rbind.default. The dispatch algorithm is described in the source file (‘.../src/main/bind.c’) as

1. For each argument we get the list of possible class memberships from the class attribute.
2. *We inspect each class in turn to see if there is an applicable method.*
3. If we find an applicable method we make sure that it is identical to any method determined for prior arguments. If it is identical, we proceed, otherwise we immediately drop through to the default code.

It's not clear what #2 means -- an applicable method *for what*? Glancing at the source code would suggest it's looking for rbind.IDate:

https://github.com/wch/r-source/blob/trunk/src/main/bind.c#L1051-L1063

const char *generic = ((PRIMVAL(op) == 1) ? "cbind" : "rbind"); // should be rbind here
const char *s = translateChar(STRING_ELT(classlist, i)); // iterating over the classes, should get to IDate first
sprintf(buf, "%s.%s", generic, s); // should be rbind.IDate

but adding this method (or even exporting it) is no help [ simply defining rbind.IDate = function(...) as.IDate(NextMethod()) ]

Lastly, it appears that as.Date.IDate is called, which is causing the type conversion:

debug(data.table:::as.Date.IDate)
rbind(DF, DF) # launches debugger
x
# [1] "2019-05-26" <-- singleton, so apparently applied to DF$date, not c(DF$date, DF$date)
undebug(data.table:::as.Date.IDate)

I can't really wrap my head around why as.Date is being called here, and even allowing that, why the end result is still the original class [ class(rbind(DF, DF)$date) == c('IDate', 'Date') ]

So, I'm beginning to think this might be a bug. Am I missing something?

[[alternative HTML version deleted]]
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] \dots used improperly in ?Rprof examples
\dots is used in the Usage section of the Rprof manual, but it's not rendered as ... I'm not sure if this should be \ldots, or just written manually as ...

Also, I think the Rprof() on the first line is intended to be on the second line, so that the flow looks like:

Rprof() # start profiling
## some code to be profiled
Rprof(NULL) # shut off profiling
## some code NOT to be profiled
Rprof(append = TRUE) # turn profiling back on and append output to current file
## some code to be profiled
Rprof(NULL)
## ... et cetera
## Now post-process the output as described in Details

As it is, the first line looks like it's commented out.

Michael Chirico

[[alternative HTML version deleted]]
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] strtoi output of empty string inconsistent across platforms
Thanks Martin. For what it's worth, this extremely representative, highly scientific Twitter poll suggests the Mac/Linux split is pretty stark (NA on Mac, 0 on Linux):
https://twitter.com/michael_chirico/status/1083649190117306369?s=17

On Sat, Jan 12, 2019, 2:00 AM Martin Maechler wrote:
> >>>>> Martin Maechler
> >>>>> on Fri, 11 Jan 2019 09:44:14 +0100 writes:
>
> >>>>> Michael Chirico
> >>>>> on Fri, 11 Jan 2019 14:36:17 +0800 writes:
>
> >> Identified as root cause of a bug in data.table:
> >> https://github.com/Rdatatable/data.table/issues/3267
>
> >> On my machine, strtoi("", base = 2L) produces NA_integer_
> >> (which seems consistent with ?strtoi: "Values which
> >> cannot be interpreted as integers or would overflow are
> >> returned as NA_integer_").
>
> > indeed consistent with R's documentation on strtoi().
> > What machine would that be?
>
> >> But on all the other machines I've seen, 0L is
> >> returned. This seems to be consistent with the output of
> >> a simple C program using the underlying strtol function
> >> (see data.table link for this program, and for full
> >> sessionInfo() of some environments with differing
> >> output).
>
> >> So, what is the correct output of strtoi("", base = 2L)?
> >> Is the cross-platform inconsistency to be
> >> expected/documentable?
>
> > The inconsistency is certainly undesirable. The relevant
> > utility function in R's source (/src/main/character.c) is
>
> > static int strtoi(SEXP s, int base)
> > {
> >     long int res;
> >     char *endp;
> >
> >     /* strtol might return extreme values on error */
> >     errno = 0;
> >
> >     if (s == NA_STRING) return(NA_INTEGER);
> >     res = strtol(CHAR(s), &endp, base); /* ASCII */
> >     if (errno || *endp != '\0') res = NA_INTEGER;
> >     if (res > INT_MAX || res < INT_MIN) res = NA_INTEGER;
> >     return (int) res;
> > }
>
> > and so it clearly is a platform-inconsistency in the
> > underlying C library's strtol().
>
> (corrected typos here:)
>
> > I think we should make this cross-platform consistent ...
> > and indeed it makes much sense to ensure the result of
> > strtoi("", base=2L) to become NA_integer_
> > but chances are that would break code that has relied on
> > the current behavior {on "all but your computer" ;-)} ?
>
> I still think that such a change should be done.
>
> 'make check all' on the R source (+ Recommended packages) seems
> not to signal any error or warning with such a change, so I plan
> to commit that change to "the trunk" / "R-devel" soon, unless
> concerns are raised highly (and quickly enough).
>
> Martin

[[alternative HTML version deleted]]
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] strtoi output of empty string inconsistent across platforms
Identified as root cause of a bug in data.table: https://github.com/Rdatatable/data.table/issues/3267 On my machine, strtoi("", base = 2L) produces NA_integer_ (which seems consistent with ?strtoi: "Values which cannot be interpreted as integers or would overflow are returned as NA_integer_"). But on all the other machines I've seen, 0L is returned. This seems to be consistent with the output of a simple C program using the underlying strtol function (see data.table link for this program, and for full sessionInfo() of some environments with differing output). So, what is the correct output of strtoi("", base = 2L)? Is the cross-platform inconsistency to be expected/documentable? Michael Chirico [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Dead link in documentation of ?timezones
Indeed! Sorry, I need more sleep, should have known better. Thanks! On Fri, Dec 7, 2018 at 6:22 PM Martin Maechler wrote: > >>>>> Michael Chirico > >>>>> on Fri, 7 Dec 2018 10:36:37 +0800 writes: > > > This link is referenced in ?timezones and appears to have been > > moved/removed. Is there a replacement? > > > http://www.twinsun.com/tz/tz-link.htm > > Yes, already in the sources (*) of R at > >https://svn.r-project.org/R/trunk/src/library/base/man/timezones.Rd > > We (Kurt \in {R-core}) do regularly (but not daily!) check all our URLs > --- as they are also checked for all CRAN packages -- and so > found and fixed the problems there. > > So, (in the future) you can look into the development sources to > see if a URL problem has already been addressed. > > Still, of course "thank you!" for noticing and caring about it! > > Best, > Martin > > > -- > *) the only official source, everything else is a mirror > [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Dead link in documentation of ?timezones
This link is referenced in ?timezones and appears to have been moved/removed. Is there a replacement? http://www.twinsun.com/tz/tz-link.htm [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Error message truncation
Help pages for stop/warning reference the option "warning.length", e.g. from ?stop:

Errors will be truncated to getOption("warning.length") characters, default 1000.

Essentially the same is in ?warning. Neither of these mentions the hard-coded limits on the acceptable values of this option in options.c <https://github.com/wch/r-source/blob/a7356bf91b511287aacd3a992abfbcb75b60d93c/src/main/options.c#L546-L552>:

if (streql(CHAR(namei), "warning.length")) {
    int k = asInteger(argi);
    if (k < 100 || k > 8170)
        error(_("invalid value for '%s'"), CHAR(namei));
    R_WarnLength = k;
    SET_VECTOR_ELT(value, i, SetOption(tag, argi));
}

Further, it appears there's a physical limit on the length of the error message itself which is only slightly larger than 8170:

set.seed(1023)
NN = 1L
str = paste(sample(letters, NN, TRUE), collapse = '') # should of course be 1
tryCatch(stop(str), error = function(e) nchar(e$message))
# [1] 8190

My questions are:
- Can we add some information to the help pages indicating valid values of options('warning.length')?
- Is there any way to increase the limit on error message length?

I understand having such a limit is safer than potentially crashing a system that wants to print a massive error string. This came up in relation to this SO Q&A: https://stackoverflow.com/a/50387968/3576984 -- the user is submitting a database query; the error message will first reproduce the entirety of the query and then give some diagnostic information. Queries can get quite long, so it stands to reason that this 8190-character limit might be binding.

Thanks,

Michael Chirico

[[alternative HTML version deleted]]
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
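A quick illustration of the bounds implied by the options.c snippet quoted in the email (a sketch; the exact error text may differ by R version):

```r
# warning.length is accepted only in [100, 8170]; values outside
# that range are rejected immediately by options().
old <- options(warning.length = 8170)   # maximum accepted value
try(options(warning.length = 8171))     # rejected: invalid value
try(options(warning.length = 99))       # also rejected, below the minimum
options(old)                            # restore the previous setting
```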
[Rd] model.frame strips class as promised, but fails to strip OBJECT in C
Full thread here: https://github.com/tidyverse/broom/issues/287

Reproducible example:

is.object(freeny$y)
# [1] TRUE
attr(freeny$y, 'class')
# [1] "ts"
class(freeny$y)
# [1] "ts"

# ts attribute wiped by model.frame
class(model.frame(y ~ ., data = freeny)$y)
# [1] "numeric"
attr(model.frame(y ~ ., data = freeny)$y, 'class')
# NULL

# but still:
is.object(model.frame(y ~ ., data = freeny)$y)
# [1] TRUE

That is, given a numeric vector with class "ts", model.frame strips the "ts" attribute but keeps the is.object property. This behavior is alluded to in ?model.frame:

> Unless na.action = NULL, time-series attributes will be removed from the
> variables found (since they will be wrong if NAs are removed).

And in fact explicitly setting na.action = NULL prevents dropping the class:

class(model.frame(y ~ ., data = freeny, na.action = NULL)$y)
# [1] "ts"

The reason this looks especially like a bug is that it differs from how na.omit behaves:

DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA))
is.object(DF$y)
# [1] FALSE
class(DF$y) = 'foo'
is.object(DF$y)
# [1] TRUE
class(na.omit(DF)$y)
# [1] "numeric"
is.object(na.omit(DF)$y)
# [1] FALSE

That is, similarly presented with a classed object, na.omit strips the class *and* the OBJECT attribute.

Thanks,

Michael Chirico

[[alternative HTML version deleted]]
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
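For comparison, the usual user-level ways of removing a class do clear the internal OBJECT bit, which is what makes the model.frame behavior described above stand out:

```r
# Removing the class attribute normally clears the OBJECT bit too,
# so is.object() tracks the presence of a class attribute:
x <- structure(1:3, class = "foo")
is.object(x)              # TRUE
attr(x, "class") <- NULL  # equivalently: oldClass(x) <- NULL, or unclass()
is.object(x)              # FALSE
```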
Re: [Rd] scale.default gives an incorrect error message when is.numeric() fails on a dgeMatrix
thanks. I know the setup code is a mess, just duct-taped something together from the examples in lars (which are a mess in turn). in fact when I messaged Prof. Hastie he recommended using glmnet. I wonder why lars is kept on CRAN if they've no intention of maintaining it... but I digress... On Mar 2, 2018 1:52 AM, "Martin Maechler" wrote: > >>>>> Michael Chirico > >>>>> on Tue, 27 Feb 2018 20:18:34 +0800 writes: > > Slightly amended 'Subject': (unimportant mistake: a dgeMatrix is *not* > sparse) > > MM: modified to commented R code, slightly changed from your post: > > > ## I am attempting to use the lars package with a sparse input feature > matrix, > ## but the following fails: > > library(Matrix) > library(lars) > data(diabetes) # from 'lars' > ##UAagghh! not like this -- both attach() *and* as.data.frame() are > horrific! > ##UA attach(diabetes) > ##UA x = as(as.matrix(as.data.frame(x)), 'dgCMatrix') > x <- as(unclass(diabetes$x), "dgCMatrix") > lars(x, y, intercept = FALSE) > ## Error in scale.default(x, FALSE, normx) : > ## length of 'scale' must equal the number of columns of 'x' > > ## More specifically, scale.default fails as called from lars(): > normx <- new("dgeMatrix", > x = c(4, 0, 9, 1, 1, -1, 4, -2, 6, 6)*1e-14, Dim = c(1L, 10L), > Dimnames = list(NULL, > c("x.age", "x.sex", "x.bmi", "x.map", "x.tc", > "x.ldl", "x.hdl", "x.tch", "x.ltg", "x.glu"))) > scale.default(x, center=FALSE, scale = normx) > ## Error in scale.default(x, center = FALSE, scale = normx) : > ## length of 'scale' must equal the number of columns of 'x' > > > The problem is that this check fails because is.numeric(normx) is FALSE: > > > if (is.numeric(scale) && length(scale) == nc) > > > So, the error message is misleading. In fact length(scale) is the same > as > > nc. > > Correct, twice. > > > At a minimum, the error message needs to be repaired; do we also want to > > attempt as.numeric(normx) (which I believe would have allowed scale to > work > > in this case)? 
> > It seems sensible to allow both 'center' and 'scale' to only > have to *obey* as.numeric(.) rather than fulfill is.numeric(.). > > Though that is not a bug in scale() as its help page has always > said that 'center' and 'scale' should either be a logical value > or a numeric vector. > > For that reason I can really claim a bug in 'lars' which should > really not use > >scale(x, FALSE, normx) > > but rather > >scale(x, FALSE, scale = as.numeric(normx)) > > and then all would work. > > > - > > > (I'm aware that there's some import issues in lars, as the offending > line > > to create normx *should* work, as is.numeric(sqrt(drop(rep(1, nrow(x)) > %*% > > (x^2 is TRUE -- it's simply that lars doesn't import the > appropriate S4 > > methods) > > > Michael Chirico > > Yes, 'lars' has _not_ been updated since Spring 2013, notably > because its authors have been saying (for rather more than 5 > years I think) that one should really use > > require("glmnet") > > instead. > > Your point is still valid that it would be easy to enhance > base :: scale.default() so it'd work in more cases. > > Thank you for that. I do plan to consider such a change in > R-devel (planned to become R 3.5.0 in April). > > Martin Maechler, > ETH Zurich > > > [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] scale.default gives an incorrect error message when is.numeric() fails on a sparse row matrix (dgeMatrix)
I am attempting to use the lars package with a sparse input feature matrix, but the following fails:

library(Matrix)
library(lars)
data(diabetes)
attach(diabetes)
x = as(as.matrix(as.data.frame(x)), 'dgCMatrix')
lars(x, y, intercept = FALSE)
# Error in scale.default(x, FALSE, normx) :
#   length of 'scale' must equal the number of columns of 'x'

More specifically, scale.default fails:

normx = new(
  "dgeMatrix",
  x = c(1.04, 1, 1.09, 1.01, 1.01, 0.992, 1.04, 0.975, 1.06, 1.06),
  Dim = c(1L, 10L),
  Dimnames = list(NULL, c("x.age", "x.sex", "x.bmi", "x.map", "x.tc",
                          "x.ldl", "x.hdl", "x.tch", "x.ltg", "x.glu")),
  factors = list()
)
scale(x, FALSE, normx)

The problem is that this check fails because is.numeric(normx) is FALSE:

if (is.numeric(scale) && length(scale) == nc)

So, the error message is misleading. In fact length(scale) is the same as nc. At a minimum, the error message needs to be repaired; do we also want to attempt as.numeric(normx) (which I believe would have allowed scale to work in this case)?

(I'm aware that there's some import issues in lars, as the offending line to create normx *should* work, as is.numeric(sqrt(drop(rep(1, nrow(x)) %*% (x^2 is TRUE -- it's simply that lars doesn't import the appropriate S4 methods)

Michael Chirico

[[alternative HTML version deleted]]
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
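A self-contained illustration of the workaround Martin suggests in the reply (assuming the Matrix package is installed; the caller, here standing in for lars, coerces the S4 object before passing it to scale()):

```r
library(Matrix)

m <- matrix(1:6, nrow = 2)
# A 1 x 3 dgeMatrix, playing the role of 'normx' from the thread:
s <- new("dgeMatrix", x = c(1, 2, 3), Dim = c(1L, 3L))

is.numeric(s)  # FALSE: an S4 dgeMatrix is not a base numeric vector,
               # so scale.default()'s is.numeric(scale) check fails

# Coercing first sidesteps the misleading length error:
scale(m, center = FALSE, scale = as.numeric(s))
```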
Re: [Rd] Inconsistency in handling of numeric input with %d by sprintf
https://github.com/Rdatatable/data.table/issues/2171 The fix was easy, it's just surprising to see the behavior change almost on a whim. Just wanted to point it out in case this is unknown behavior, but Evan seems to have found this as well. On Tue, May 23, 2017 at 12:00 PM, Michael Chirico wrote: > Astute observation. And of course we should be passing integer when we use > %d. It's an edge case in how we printed ITime objects in data.table: > > > On Tue, May 23, 2017 at 11:53 AM, Joris Meys wrote: > >> I initially thought this is "documented behaviour". ?sprintf says: >> >> Numeric variables with __exactly integer__ values will be coerced to >> integer. (emphasis mine). >> >> Turns out this only works when the first value is numeric and not NA, as >> shown by the following example: >> >> > sprintf("%d", as.numeric(c(NA,1))) >> Error in sprintf("%d", as.numeric(c(NA, 1))) : >> invalid format '%d'; use format %f, %e, %g or %a for numeric objects >> > sprintf("%d", as.numeric(c(1,NA))) >> [1] "1" "NA" >> >> So the safest thing is indeed passing the right type, but the behaviour >> is indeed confusing. I checked this on both Windows and Debian, and on both >> systems I get the exact same response. >> >> Cheers >> Joris >> >> On Tue, May 23, 2017 at 4:53 PM, Evan Cortens >> wrote: >> >>> Hi Michael, >>> >>> I posted something on this topic to R-devel several weeks ago, but never >>> got a response. My ultimate conclusion is that sprintf() isn't super >>> consistent in how it handles coercion: sometimes it'll coerce real to >>> integer without complaint, other times it won't. (My particular email had >>> to do with the vectors longer than 1 and their positioning vis-a-vis the >>> format string.) The safest thing is just to pass the right type. In this >>> case, sprintf('%d', as.integer(NA_real_)) works. 
>>> >>> Best, >>> >>> Evan >>> >>> On Fri, May 19, 2017 at 9:23 AM, Michael Chirico < >>> michaelchiri...@gmail.com> >>> wrote: >>> >>> > Consider >>> > >>> > #as.numeric for emphasis >>> > sprintf('%d', as.numeric(1)) >>> > # [1] "1" >>> > >>> > vs. >>> > >>> > sprintf('%d', NA_real_) >>> > >>> > > Error in sprintf("%d", NA_real_) : >>> > >>> >invalid format '%d'; use format %f, %e, %g or %a for numeric object >>> > > >>> > >>> > I understand the error is correct, but if it works for other numeric >>> input, >>> > why doesn't R just coerce NA_real_ to NA_integer_? >>> > >>> > Michael Chirico >>> > >>> > [[alternative HTML version deleted]] >>> > >>> > __ >>> > R-devel@r-project.org mailing list >>> > https://stat.ethz.ch/mailman/listinfo/r-devel >>> > >>> >>> >>> >>> -- >>> Evan Cortens, PhD >>> Institutional Analyst - Office of Institutional Analysis >>> Mount Royal University >>> 403-440-6529 >>> >>> [[alternative HTML version deleted]] >>> >>> __ >>> R-devel@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-devel >>> >> >> >> >> -- >> Joris Meys >> Statistical consultant >> >> Ghent University >> Faculty of Bioscience Engineering >> Department of Mathematical Modelling, Statistics and Bio-Informatics >> >> tel : +32 (0)9 264 61 79 <+32%209%20264%2061%2079> >> joris.m...@ugent.be >> --- >> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php >> > > [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Inconsistency in handling of numeric input with %d by sprintf
Astute observation. And of course we should be passing integer when we use %d. It's an edge case in how we printed ITime objects in data.table: On Tue, May 23, 2017 at 11:53 AM, Joris Meys wrote: > I initially thought this is "documented behaviour". ?sprintf says: > > Numeric variables with __exactly integer__ values will be coerced to > integer. (emphasis mine). > > Turns out this only works when the first value is numeric and not NA, as > shown by the following example: > > > sprintf("%d", as.numeric(c(NA,1))) > Error in sprintf("%d", as.numeric(c(NA, 1))) : > invalid format '%d'; use format %f, %e, %g or %a for numeric objects > > sprintf("%d", as.numeric(c(1,NA))) > [1] "1" "NA" > > So the safest thing is indeed passing the right type, but the behaviour is > indeed confusing. I checked this on both Windows and Debian, and on both > systems I get the exact same response. > > Cheers > Joris > > On Tue, May 23, 2017 at 4:53 PM, Evan Cortens wrote: > >> Hi Michael, >> >> I posted something on this topic to R-devel several weeks ago, but never >> got a response. My ultimate conclusion is that sprintf() isn't super >> consistent in how it handles coercion: sometimes it'll coerce real to >> integer without complaint, other times it won't. (My particular email had >> to do with the vectors longer than 1 and their positioning vis-a-vis the >> format string.) The safest thing is just to pass the right type. In this >> case, sprintf('%d', as.integer(NA_real_)) works. >> >> Best, >> >> Evan >> >> On Fri, May 19, 2017 at 9:23 AM, Michael Chirico < >> michaelchiri...@gmail.com> >> wrote: >> >> > Consider >> > >> > #as.numeric for emphasis >> > sprintf('%d', as.numeric(1)) >> > # [1] "1" >> > >> > vs. 
>> > >> > sprintf('%d', NA_real_) >> > >> > > Error in sprintf("%d", NA_real_) : >> > >> >invalid format '%d'; use format %f, %e, %g or %a for numeric object >> > > >> > >> > I understand the error is correct, but if it works for other numeric >> input, >> > why doesn't R just coerce NA_real_ to NA_integer_? >> > >> > Michael Chirico >> > >> > [[alternative HTML version deleted]] >> > >> > __ >> > R-devel@r-project.org mailing list >> > https://stat.ethz.ch/mailman/listinfo/r-devel >> > >> >> >> >> -- >> Evan Cortens, PhD >> Institutional Analyst - Office of Institutional Analysis >> Mount Royal University >> 403-440-6529 >> >> [[alternative HTML version deleted]] >> >> __ >> R-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel >> > > > > -- > Joris Meys > Statistical consultant > > Ghent University > Faculty of Bioscience Engineering > Department of Mathematical Modelling, Statistics and Bio-Informatics > > tel : +32 (0)9 264 61 79 <+32%209%20264%2061%2079> > joris.m...@ugent.be > --- > Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php > [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Inconsistency in handling of numeric input with %d by sprintf
Consider

# as.numeric for emphasis
sprintf('%d', as.numeric(1))
# [1] "1"

vs.

sprintf('%d', NA_real_)
# Error in sprintf("%d", NA_real_) :
#   invalid format '%d'; use format %f, %e, %g or %a for numeric objects

I understand the error is correct, but if it works for other numeric input, why doesn't R just coerce NA_real_ to NA_integer_?

Michael Chirico

[[alternative HTML version deleted]]
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] New strptime conversion specification for ordinal suffixes
As touched on briefly on SO <http://stackoverflow.com/questions/39237299>, base R has what appears to me to be a serious deficiency in its inability to recognize dates formatted as character strings with ordinal suffixes:

ord_dates <- c("September 1st, 2016", "September 2nd, 2016",
               "September 3rd, 2016", "September 4th, 2016")

?strptime lists no conversion specification which could match ord_dates in one pass (as I discovered, even lubridate only manages to succeed by going through the vector in several passes). How difficult would it be to add a new conversion specification to handle this, which would seem to me to be a pretty common form for dates found in the raw-data wild? My suggestion would be %o for ordinal suffixes. These would obviously be locale-specific, but in English, %o would match st, nd, rd, and th (in both plain and superscript forms). Other languages may be covered by this <https://en.wikipedia.org/wiki/Ordinal_indicator> and/or this <https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts> Wikipedia page on ordinal indicators & Unicode superscripts, respectively.

With this implemented, converting ord_dates to a Date or POSIXct would be as simple as:

as.Date(ord_dates, format = "%B %d%o, %Y")

Is there something at the C level preventing this from happening?

Michael Chirico

[[alternative HTML version deleted]]
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
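Until such a specification exists, a single-pass workaround is to delete the suffixes before parsing (a sketch; the regex assumes English suffixes, and %B requires an English locale to match the month names):

```r
ord_dates <- c("September 1st, 2016", "September 2nd, 2016",
               "September 3rd, 2016", "September 4th, 2016")

# Drop an ordinal suffix that immediately follows a digit, then parse:
as.Date(sub("(?<=[0-9])(st|nd|rd|th)", "", ord_dates, perl = TRUE),
        format = "%B %d, %Y")
```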
[Rd] Suggestions for improvement as regards `as` methods, and a call for consistency in `as.Date` methods
Good evening all, This topic is gone into at a bit more length at my related Stack Overflow question here: http://stackoverflow.com/questions/34647674/why-do-as-methods-remove-vector-names-and-is-there-a-way-around-it There are two lingering issues despite the abundant insight received at SO, namely: 1) _Why_ do as methods remove their arguments' names attribute? This is a fact which is mentioned briefly in a select few of the related help files, namely ?as.vector ("removes *all* attributes including names for results of atomic mode"), ?as.double ("strips attributes including names.") and ?as.character ("strips attributes including names"); however, it appears (1) neither of these references gives a satisfactory explanation of the reasoning behind this (I can only think of speed) and (2) it would be much more digestible to users if this information (even copy-pasting the same blurb) was placed in all of the as reference files (e.g., ?as, ?as.numeric, ?as.Date, ?as.POSIXct, etc.) Personally, I think that unless there's a substantial efficiency cost to doing so, the default should in fact be to retain names (if not other attributes). 
2) All as.Date methods should behave consistently as regards attribute retention

As explicated in the referenced SO topic, the following should all give the same result (as they would for similar examples involving other as methods), but don't:

datesc <- c(ind = "2015-07-04", nyd = "2016-01-01")
datesn <- c(ind = 16620, nyd = 16801)
datesp <- structure(c(1435982400, 1451624400),
                    .Names = c("ind", "nyd"),
                    class = c("POSIXct", "POSIXt"), tzone = "")
datesl <- structure(list(sec = c(0, 0), min = c(0L, 0L), hour = c(0L, 0L),
                         mday = c(4L, 1L), mon = c(6L, 0L),
                         year = structure(115:116, .Names = c("ind", "nyd")),
                         wday = c(6L, 5L), yday = c(184L, 0L),
                         isdst = c(1L, 0L), zone = c("EDT", "EST"),
                         gmtoff = c(NA_integer_, NA_integer_)),
                    .Names = c("sec", "min", "hour", "mday", "mon", "year",
                               "wday", "yday", "isdst", "zone", "gmtoff"),
                    class = c("POSIXlt", "POSIXt"))

Retain names:

as.Date.numeric(datesn)
#          ind          nyd
# "2015-07-04" "2016-01-01"
as.Date.POSIXct(datesp)
#          ind          nyd
# "2015-07-04" "2016-01-01"

Destroy names:

as.Date.POSIXlt(datesl)
# [1] "2015-07-04" "2016-01-01"
as.Date.character(datesc)
# [1] "2015-07-04" "2016-01-01"

(unconfirmed, but I assume given a glance at the code that all of as.Date.date, as.Date.dates, as.Date.ts, as.Date.yearmon, and as.Date.yearqtr will also strip the names)

Regardless of the default behavior as regards keeping/destroying names/other attributes, it would seem that for the sake of consistency the above should be unified. Barring an overhaul of all as methods to retain names, this would mean the following changes (for example):

as.Date.numeric <- function (x, origin, ...)
{
    if (missing(origin)) origin <- "1970-01-01"
    if (identical(origin, "-00-00")) origin <- as.Date("-01-01", ...) - 1
    setNames(as.Date(origin, ...) + x, NULL)
}

as.Date.POSIXct <- function (x, tz = "UTC", ...)
{
    if (tz == "UTC") {
        z <- floor(unclass(x)/86400)
        attr(z, "tzone") <- NULL
        attr(z, "names") <- NULL
        structure(z, class = "Date")
    }
    else as.Date(as.POSIXlt(x, tz = tz))
}

Thank you in advance for your consideration and thank you as always for your time on this project.

Michael Chirico
PhD Candidate in Economics
University of Pennsylvania
3718 Locust Walk
Room 160 McNeil Building
Philadelphia, PA 19104
United States of America

[[alternative HTML version deleted]]
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
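In the meantime, a user-level stopgap (a hypothetical helper, not a proposed base change) that makes the character, numeric, and POSIXct calls behave alike:

```r
# Reattach names after conversion, whichever as.Date method is dispatched.
# (For POSIXlt input this simple version does not apply, since there the
# names live on the list components rather than on the object itself.)
as_date_keep_names <- function(x, ...) {
  setNames(as.Date(x, ...), names(x))
}

datesc <- c(ind = "2015-07-04", nyd = "2016-01-01")
as_date_keep_names(datesc)  # names "ind"/"nyd" survive the conversion
```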