Re: [Rd] Unique ID for conditions to suppress/rethrow selected conditions?
On Sun, 2023-04-16 at 13:52 +0200, Iñaki Ucar wrote:
> I agree that something like this would be a nice addition. With the
> current condition system, it would be certainly easy (but quite a lot
> of work) to define a hierarchy of built-in conditions, and then use
> them consistently throughout base R.

Yes, a typed condition system would be great. I have two other ideas:

After reading the "R messages" and "Preparing translations" sections of the "Writing R Extensions" manual

https://cran.r-project.org/doc/manuals/r-release/R-exts.html#R-messages

I was thinking about using the "unique" R message texts (which are the msgid in the *.po files, see e.g. https://github.com/r-devel/r-svn/blob/60a4db2171835067999e96fd2751b6b42c6a6ebc/src/library/base/po/de.po#L892) to maintain a unique ID (not dependent on the actual translation into the current language).

A "simple" solution could be to prefix or suffix each message text with an ID. For example, this code

    else errorcall(call, _("non-numeric argument to function"));
    # https://github.com/r-devel/r-svn/blob/49597237842697595755415cf9147da26c8d1088/src/main/complex.c#L347

would become

    else errorcall(call, _("non-numeric argument to function [47]"));

or

    else errorcall(call, _("[47] non-numeric argument to function"));

Now the ID could be extracted more easily (at least for base R condition messages)... This would even be back-portable to older R versions to make condition IDs broadly available "in the wild".

Another way to introduce an ID for each condition in base R would be ("the hard way")

1) to refactor each and every code location with an embedded message string so that it uses a centralized key/msg_text data structure to "look up" the appropriate message text and
2) to use the key to enrich the condition with a unique ID (e.g. as an attribute of the condition object).

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
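To illustrate the "simple" solution: a minimal sketch of how such an ID could be pulled out of a condition message on the R side. It assumes the hypothetical "[47] ..." prefix format from the proposal above; base R does not emit such IDs today, and condition_id() is only an illustrative helper name.

```r
# Hypothetical helper: extracts a bracketed numeric ID prefix such as
# "[47] non-numeric argument to function" from a condition message.
# Assumption: the ID prefix format proposed above (not part of base R).
condition_id <- function(cond) {
  msg <- conditionMessage(cond)
  m <- regmatches(msg, regexec("^\\[([0-9]+)\\]", msg))[[1]]
  # m holds the full match and the captured digits, or is empty if no match
  if (length(m) == 2L) as.integer(m[2L]) else NA_integer_
}

cond <- simpleError("[47] non-numeric argument to function")
id_found   <- condition_id(cond)                       # ID taken from the prefix
id_missing <- condition_id(simpleError("no ID here"))  # NA: no prefix present
```

Because the ID would sit outside the translatable wording, a handler could dispatch on it regardless of the active language, provided translators keep the prefix untouched.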
[Rd] Unique ID for conditions to suppress/rethrow selected conditions?
I am the author of the *tryCatchLog* package and want to

- suppress selected conditions (warnings and messages)
- rethrow selected conditions (e.g. a specific warning as a message, or to "rename" the condition text).

I could not find any reliable unique identifier for each possible condition

- that (base) R throws
- that 3rd-party packages can throw (out of scope here).

Is there any reliable way to identify each possible condition of base R?

Are there plans to implement such an identifier ("errno")?

PS: Things that do not work well enough IMHO:

1. Just using the condition classes (they are not unique enough to distinguish between each and every condition)
2. Trying to match the condition text (it depends on the active language setting in R, which cannot be switched "on the fly" on every platform, and wordings or translations may even change in the future)
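For readers who have not used calling handlers: a minimal sketch of what "suppress or rethrow a selected warning" looks like with today's means. It identifies the warning by its message text, i.e. it is exactly the fragile approach listed under point 2, and handle_selected() is a hypothetical name, not a tryCatchLog function.

```r
# Sketch only: downgrades a warning whose text contains `pattern` to a
# message and muffles it; all other warnings propagate unchanged.
handle_selected <- function(expr, pattern) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w), fixed = TRUE)) {
        message("downgraded: ", conditionMessage(w))  # rethrow as a message
        invokeRestart("muffleWarning")                 # suppress the warning
      }
    }
  )
}

res <- handle_selected({
  warning("disk almost full")  # selected: becomes a message
  42                           # evaluation continues after muffling
}, pattern = "disk almost full")
```

With a stable condition ID, the grepl() test on the message text could be replaced by a comparison against that ID.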
Re: [Rd] Slow try in combination with do.call
In fact, an attentive user recently reported the same type of problem (slowness due to deparse) in my tryCatchLog package when using a large sparse matrix

https://github.com/aryoda/tryCatchLog/issues/68

and I have fixed it by explicitly using the nlines arg of deparse() instead of using as.character(), which implicitly calls deparse() for a call stack.

While looking for a fix I think I may have found inconsistent deparse default arguments in base R between as.character() and deparse():

A direct deparse() call in R uses

    control = c("keepNA", "keepInteger", "niceNames", "showAttributes")

as default (see ?.deparseOpts for details).

The as.character() implementation in the C code of base R calls the internal deparse C function with another default for .deparseOpts: the SIMPLEDEPARSE C constant, which corresponds to control = NULL.

https://github.com/wch/r-source/blob/54f94f0433c487fe3b0df9bae477c9babdd1/src/main/deparse.c#L345

This is clearly not a bug, but maybe the as.character() implementation should use the default args of deparse() for consistency (just a proposal!)...

BTW: You can find my analysis result with the call path and links to the R source code in the GitHub issue:

https://github.com/aryoda/tryCatchLog/issues/68#issuecomment-930593002

On Thu, 2021-09-16 at 18:04 +0200, Martin Maechler wrote:
> >>>>> Martin Maechler
> >>>>>     on Thu, 16 Sep 2021 17:48:41 +0200 writes:
>
> >>>>> Alexander Kaever
> >>>>>     on Thu, 16 Sep 2021 14:00:03 + writes:
>
>     >> Hi,
>     >> It seems like a try(do.call(f, args)) can be very slow on error
>     >> depending on the args size. This is related to a complete deparse of the call
>     >> using deparse(call)[1L] within the try function. How about replacing
>     >> deparse(call)[1L] by deparse(call, nlines = 1)?
>
>     >> Best,
>     >> Alex
>
>     > an *excellent* idea!
>
>     > I have checked that the resulting try() object continues to contain the
>     > long large call; indeed that is not the problem, but the
>     > deparse()ing *is* as you say above.
>
>     > {The experts typically use tryCatch() directly, instead of try() ,
>     > which may be the reason other experienced R developers have not
>     > stumbled over this ...}
>
>     > Thanks a lot, notably also for the clear repr.ex. below.
>
>     > Best regards,
>     > Martin
>
> OTOH, I find so many cases of deparse(*)[1] (or similar) in
> R's own sources, I'm wondering
> if I'm forgetting something ... and using nlines=* is not always
> faster & equivalent and hence better ??
>
> Martin
>
>     >> Example:
>
>     >> fun <- function(x) {
>     >>   stop("testing")
>     >> }
>     >> d <- rep(list(mtcars), 1)
>     >> object.size(d)
>     >> # 72MB
>
>     >> system.time({
>     >>   try(do.call(fun, args = list(x = d)))
>     >> })
>     >> # 8s
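The difference between the two idioms discussed above can be tried out with a small sketch. The call object below is built by hand to resemble what do.call() produces (the argument values are embedded in the call); the number of repetitions is an arbitrary choice for illustration, and timings will vary by machine, so none are claimed here.

```r
# A call with a large object embedded as argument, similar to do.call(fun, ...)
big_call <- as.call(list(as.name("fun"), x = rep(list(mtcars), 100)))

# Idiom 1: deparse the complete call, then throw away all but the first line
first_full <- deparse(big_call)[1L]

# Idiom 2: ask deparse() to stop after the first line
first_short <- deparse(big_call, nlines = 1)

# Compare the cost of both idioms
t_full  <- system.time(deparse(big_call)[1L])
t_short <- system.time(deparse(big_call, nlines = 1))
```

Both idioms are expected to yield the same first line; whether that holds in every corner case is the open question Martin raises in the quoted message.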
Re: [Rd] R 4.0.2 64-bit Windows hangs
This may be unrelated, but on SO there is a report that a Windows update may cause this problem:

https://stackoverflow.com/questions/63457321/r-will-not-run-after-latest-windows-10-updates/63524608#63524608

On Fri, 2020-08-21 at 12:34 +, m1388m+moe1ydyn0hbs--- via R-devel wrote:
> I am having exactly the same issue as the following bug report:
> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16515.
>
> RTerm.exe hangs on startup, nothing is printed to the terminal. 32-bit
> RTerm.exe runs fine.
>
> No errors are displayed, but I see the same as the bug report in Event Viewer.
>
> I am running Windows 10 64-bit, v2010.
[Rd] Guidelines when to use LF vs CRLF ("\n" vs. "\r\n") on Windows for new lines (line endings)?
Dear R developers,

I am developing an R package which returns strings with new line codes. I am not sure if I should use "\r\n" or "\n" in my returned strings on Windows platforms.

What is the recommended best practice for package developers (and code in base R) for coding new lines in strings?

And just out of curiosity: What is the reason (or history) for preferring "\n" in R even on Windows (see examples below)?

Best regards

Jürgen

PS: Examples from base R:

R seems to use (almost) only "\n" for new lines internally - even on Windows platforms, e.g.:

    > charToRaw(paste0("a", "\n", "b"))
    [1] 61 0a 62

    # eol default is "\n"
    write.table(x, file = "", append = FALSE, quote = TRUE, sep = " ",
                eol = "\n", na = "NA", dec = ".", row.names = TRUE,
                col.names = TRUE, qmethod = c("escape", "double"),
                fileEncoding = "")

On the other hand, some external interfaces require Windows-style new lines ("\r\n"), e.g. text file output seems to take care of this internally:

    writeLines(text, con = stdout(), sep = "\n", useBytes = FALSE)
    # Excerpt from the documentation:
    # Normally writeLines is used with a text-mode connection,
    # and the default separator is converted to the normal separator
    # for that platform (LF on Unix/Linux, CRLF on Windows).

    # calls internally do_writelines():
    # https://github.com/wch/r-source/blob/8db7b85953127f364f52d201ec057911db4601e5/src/main/connections.c#L4023
    # But: Where is the conversion done (hidden in the call to Riconv()?)
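A small sketch that makes the line-ending behaviour visible byte by byte. It relies on the documented advice in ?writeLines to open a binary connection when the exact separator matters; the temporary file names are arbitrary.

```r
path_lf   <- tempfile(fileext = ".txt")
path_crlf <- tempfile(fileext = ".txt")

# Binary-mode connection: the separator is written exactly as given,
# without any platform-specific translation.
con <- file(path_lf, open = "wb")
writeLines(c("a", "b"), con, sep = "\n")
close(con)

con <- file(path_crlf, open = "wb")
writeLines(c("a", "b"), con, sep = "\r\n")
close(con)

bytes_lf   <- readBin(path_lf,   what = "raw", n = 10L)  # 61 0a 62 0a
bytes_crlf <- readBin(path_crlf, what = "raw", n = 10L)  # 61 0d 0a 62 0d 0a
```

With a text-mode connection (open = "w") the first variant would be written with CRLF on Windows according to the documentation excerpt above, which is why strings inside R can stay with "\n" and the conversion can be left to the output step.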
[Rd] Why does INT 3 (opcode 0xCC) SIGTRAP break to debugger (gdb) in Rgui.exe and Rterm.exe but NOT in R.exe on Windows (64 bit)?
I am developing a package to improve the debugging of Rcpp (C++) and SEXP based C code in gdb by providing convenience print, subset and other functions:

https://github.com/aryoda/R_CppDebugHelper

I also want to solve the Windows-only problem that you can break into the debugger from R only via Rgui.exe (menu "Misc > break to debugger") by supporting breakpoints for R.exe.

I want breakpoint support in R.exe because debugging in Rgui.exe has an unwanted side effect:

https://stackoverflow.com/questions/59236579/gdb-prints-output-stdout-to-rgui-console-instead-of-gdb-console-on-windows-whe

My idea is to break into the debugger from R.exe by calling a little C(++) code that contains an INT 3 (opcode 0xCC) SIGTRAP instruction:

    // break_to_debugger.cpp
    // [[Rcpp::export]]
    int break_to_debugger() {
      int a = 3;
      asm("int $3");  // this code line shall break into the debugger
      // Idea taken from "Rgui > break into debugger":
      // https://github.com/wch/r-source/blob/5a156a0865362bb8381dcd69ac335f5174a4f60c/src/gnuwin32/rui.c#L431
      a++;
      return a;
    }

    # breakpoint.R
    #' breaks the execution into the debugger
    #'
    #' @return
    #' @export
    breakpoint <- function() {
      break_to_debugger()
    }

Surprisingly this works not only on Linux but also on Windows (v10, x64 architecture = 64 bit) in Rterm.exe, but NOT for R.exe (64 bit):

- Rgui.exe: Works
- Rscript.exe: Works
- R.exe: Does not work. R.exe exits with:
  [Inferior 1 (process 20704) exited with code 0203]

Can you please help me to understand why it works for Rgui.exe and Rscript.exe but not for R.exe?

Why is int 3 exiting R.exe?

And: How could I make it also work with R.exe?

Thanks a lot for sharing your ideas and experiences!

Jürgen

PS 1: My sessionInfo():

    R version 3.6.1 (2019-07-05)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 10 x64 (build 17134)

PS 2: My package "CppDebugHelper" was compiled with -g -O0 -std=c++11

PS 3: Here is my captured gdb output for the three test cases:

1. Rgui.exe

    >gdb --quiet --args Rgui.exe --silent --vanilla
    Reading symbols from Rgui.exe...(no debugging symbols found)...done.
    (gdb) run
    Starting program: C:\R\bin\x64\Rgui.exe --silent --vanilla
    [New Thread 14476.0x3710]
    [New Thread 14476.0x284c]
    [New Thread 14476.0x50ec]
    [New Thread 14476.0x2d24]
    warning: Invalid parameter passed to C runtime function.

    [In RGui's R console:]
    library(CppDebugHelper)
    breakpoint()

    [in gdb again:]
    Program received signal SIGTRAP, Trace/breakpoint trap.
    break_to_debugger () at break_to_debugger.cpp:33
    33        a++;
    (gdb) b debug_example_rcpp
    Breakpoint 1 at 0x66ac6846: file debug_example_rcpp.cpp, line 13.
    (gdb) continue
    Continuing.

    [In RGui's R console:]
    debug_example_rcpp()

    [in gdb again:]
    Breakpoint 1, debug_example_rcpp () at debug_example_rcpp.cpp:13
    13        CharacterVector cv = CharacterVector::create("foo", "bar", NA_STRING, "hello") ;
    (gdb) next
    14        NumericVector nv = NumericVector::create(0.0, 1.0, NA_REAL, 10) ;
    (gdb) n
    16        DateVector dv = DateVector::create( 14974, 14975, 15123, NA_REAL); // TODO how to use real dates instead?
    (gdb) n
    17        DateVector dv2 = DateVector::create(Date("2010-12-31"), Date("01.01.2011", "%d.%m.%Y"), Date(2011, 05, 29), NA_REAL);
    (gdb) n
    18        DatetimeVector dtv = DatetimeVector::create(1293753600, Datetime("2011-01-01"), Datetime("2011-05-29 10:15:30") , NA_REAL);
    (gdb) n
    19        DataFrame df = DataFrame::create(Named("name1") = cv, _["value1"] = nv, _["dv2"] = dv2); // Named and _[ ] are the same
    (gdb) n
    20        CharacterVector col1 = df["name1"]; // get the first column
    (gdb) call dbg_print(df)
    (gdb) call dbg_str(df)
    (gdb) continue
    Continuing.

    [Output for the dbg_* function calls is printed to Rgui's R console (NOT the gdb terminal!):]
      name1 value1        dv2
    1   foo      0 2010-12-31
    2   bar      1 2011-01-01
    3  <NA>     NA 2011-05-29
    4 hello     10       <NA>
    'data.frame': 4 obs. of 3 variables:
     $ name1 : Factor w/ 3 levels "bar","foo","hello": 2 1 NA 3
     $ value1: num 0 1 NA 10
     $ dv2   : Date, format: "2010-12-31" "2011-01-01" ...

2. R.exe

    >gdb --quiet --args R.exe --silent --vanilla
    Reading symbols from R.exe...(no debugging symbols found)...done.
    (gdb) r
    Starting program: C:\R\bin\x64\R.exe --silent --vanilla
    [New Thread 20704.0x2b20]
    [New Thread 20704.0x4c08]
    [New Thread 20704.0x425c]
    [New Thread 20704.0x45f8]
    > library(CppDebugHelper)
    > breakpoint()
    [Thread 20704.0x45f8 exited with code 2147483651]
    [Thread 20704.0x425c exited with code 2147483651]
    [Thread 20704.0x4c08 exited with code 2147483651]
    [Inferior 1 (process 20704) exited with code 0203]
    (gdb) bt
    No stack.
    (gdb)

3. Rterm.exe

    gdb --quiet
Re: [Rd] typeof(getOption("warn")) is "integer" instead of "double" in R unstable (2019-09-27 r77229)? Reproducible?
Thanks a lot for pointing out the reason (and yes, my test is too stringent in this case - it's my old testing disease ;-)

For other readers: The R-devel NEWS is a good source to find possible reasons for changes:

https://stat.ethz.ch/R-manual/R-devel/doc/html/NEWS.html

On Sun, 2019-09-29 at 08:33 -0400, Duncan Murdoch wrote:
> On 29/09/2019 7:55 a.m., nos...@altfeld-im.de wrote:
> > Hi,
> >
> > I have a failing unit test in my package tryCatchLog on the CRAN build
> > infrastructure
> > (https://cran.r-project.org/web/checks/check_results_tryCatchLog.html)
> > with "R Under development (unstable) (2019-09-27 r77229)"
> > and the unit tests just ensures consistent behaviour of R (not of my
> > package) as a precondition:
> >
> > The failing unit test is caused by
> >
> > > typeof(getOption("warn"))
> > [1] "integer"
> >
> > but it should be
> >
> > [1] "double"
>
> This is related to this bug fix:
>
> CHANGES IN R 3.6.1 patched BUG FIXES
>
> ‘options(warn=1e11)’ is an error now, instead of later leading to C
> stack overflow because of infinite recursion.
>
> which occurred in rev 77226. It explicitly coerces the warn value to
> integer.
>
> > I have no build infrastructure for dev and want to find out if this is
> > caused by
> > - my mistake
> > - changes in the R dev version
> > - the new C compilers used (correlates with the failing unit test)
>
> It is changes in the dev and patched versions, and also your mistake:
> your test shouldn't be so stringent. The docs don't say that the value
> has to be a double; in fact, they suggest it should be a whole number
> value (talking about 0, 1, "2 or more", not about what would happen with
> options(warn = pi/2), for example).
>
> In older versions, options(warn = pi/2) is treated the same as
> options(warn = 1), and in the new version, it is displayed as 1 as well.
>
> Duncan Murdoch
>
> > Can somebody (having the R dev version available) please help me and answer
> > the result of
> >
> > > typeof(getOption("warn"))
> >
> > using "R Under development (unstable) (2019-09-27 r77229)" or newer?
> >
> > Thanks a lot and sorry for the "noise"!
> >
> > Jurgen
> >
> > PS: These R (dev) versions did work as expected (returning "double") but
> > were also using older C compilers:
> > - R Under development (unstable) (2019-09-20 r77199)
> > - R Under development (unstable) (2019-09-22 r77202)
> > - R Under development (unstable) (2019-09-25 r77217)
> > - R version 3.6.1 Patched (2019-09-25 r77224)
> > - R version 3.6.1 (2019-07-05)
> > - R version 3.6.0 beta (2019-04-15 r76395)
> > - R version 3.5.3 (2019-03-11)
> > - R version 3.5.2 (2018-12-20)
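Following Duncan's advice, a less stringent precondition check could look like the sketch below. It accepts both storage types and only insists on a single whole-number value; is_valid_warn_level() is a hypothetical helper name chosen for illustration.

```r
# Hypothetical helper: TRUE for a single whole-number value, no matter
# whether it is stored as "integer" or "double".
is_valid_warn_level <- function(x) {
  is.numeric(x) && length(x) == 1L && !is.na(x) && x == trunc(x)
}

# Works in R versions that store the option as double and in those that
# coerce it to integer.
warn_ok <- is_valid_warn_level(getOption("warn"))
```

This tests the documented contract (a whole-number warning level) rather than the internal storage mode, so it should survive changes like the one in rev 77226.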
[Rd] typeof(getOption("warn")) is "integer" instead of "double" in R unstable (2019-09-27 r77229)? Reproducible?
Hi,

I have a failing unit test in my package tryCatchLog on the CRAN build infrastructure (https://cran.r-project.org/web/checks/check_results_tryCatchLog.html) with "R Under development (unstable) (2019-09-27 r77229)", and the unit test just ensures consistent behaviour of R (not of my package) as a precondition:

The failing unit test is caused by

    > typeof(getOption("warn"))
    [1] "integer"

but it should be

    [1] "double"

I have no build infrastructure for dev and want to find out if this is caused by

- my mistake
- changes in the R dev version
- the new C compilers used (correlates with the failing unit test)

Can somebody (having the R dev version available) please help me and answer the result of

    > typeof(getOption("warn"))

using "R Under development (unstable) (2019-09-27 r77229)" or newer?

Thanks a lot and sorry for the "noise"!

Jurgen

PS: These R (dev) versions did work as expected (returning "double") but were also using older C compilers:

- R Under development (unstable) (2019-09-20 r77199)
- R Under development (unstable) (2019-09-22 r77202)
- R Under development (unstable) (2019-09-25 r77217)
- R version 3.6.1 Patched (2019-09-25 r77224)
- R version 3.6.1 (2019-07-05)
- R version 3.6.0 beta (2019-04-15 r76395)
- R version 3.5.3 (2019-03-11)
- R version 3.5.2 (2018-12-20)
Re: [Rd] Problem building rmarkdown vignettes with child
Which R version are you using to reproduce the problem?

A few first indications:

- The regex in ".install_extras" does not match your file endings: Change "Rmd_tmp$" into "Rmd_t$"
- Try "output: rmarkdown::html_vignette" instead of "output: html_document" in the header of the file "ABVignetteWithLocalChild.Rmd" (and possibly other "*.Rmd"s)
- Try to specify the child doc name directly in the chunks via "```{r child = "NoBuildVignette.Rmd_t"}" instead of "```{r includChild, child = child_docs}" (and possibly in other "*.Rmd"s). Note the possible typo in the tag "includChild" (-> "includeChild"?)

PS: You can find a working example of child Rmds for a CRAN package here:

https://github.com/aryoda/tryCatchLog/tree/master/vignettes

On Wed, 2018-11-07 at 13:33 +0100, Witold E Wolski wrote:
> Hello,
>
> This is a problem I posted about already some time ago:
> https://stat.ethz.ch/pipermail/r-devel/2018-September/076786.html
>
> Finally, I did had some time to create a minimal package to reproduce
> the problem that vignettes with child can not be build.
> https://github.com/wolski/RmarkdownVignetteProblem
>
> The problem basically is that while all the vignettes can be build by running
>
> devtools::build_vignettes
> or
> rmarkdown::render
>
> they will all fail to build when running
> devtools::build()
> or
> R CMD build
>
> except of the
> ABVignetteWithLocalChild.Rmd
> for which I did apply the workaround suggested by Duncan in this github
> issue:
> https://github.com/yihui/knitr/issues/1540
>
>
> Best regards
> Witek
Re: [Rd] Missing objects using dump.frames for post-mortem debugging of crashed batch jobs. Bug or gap in documentation?
Martin, thanks for the good news, and sorry for wasting your (and others') time by not doing my homework and querying Bugzilla first (lesson learned!).

I have tested the new implementation from R-devel and observe a semantic difference when playing with the parameters:

    # Test script 1
    g <- "global"
    f <- function(p) {
      l <- "local"
      dump.frames()
    }
    f("parameter")

results in

    # > debugger()
    # Message:  object 'server' not found
    # Available environments had calls:
    # 1: source("~/.active-rstudio-document", echo = TRUE)
    # 2: withVisible(eval(ei, envir))
    # 3: eval(ei, envir)
    # 4: eval(expr, envir, enclos)
    # 5: .active-rstudio-document#9: f("parameter")
    #
    # Enter an environment number, or 0 to exit
    # Selection: 5
    # Browsing in the environment with call:
    #    .active-rstudio-document#9: f("parameter")
    # Called from: debugger.look(ind)
    # Browse[1]> g
    # [1] "global"
    # Browse[1]>

while dumping to a file

    # Test script 2
    g <- "global"
    f <- function(p) {
      l <- "local"
      dump.frames(to.file = TRUE, include.GlobalEnv = TRUE)
    }
    f("parameter")

results in

    # > load("last.dump.rda")
    # > debugger()
    # Message:  object 'server' not found
    # Available environments had calls:
    # 1: .GlobalEnv
    # 2: source("~/.active-rstudio-document", echo = TRUE)
    # 3: withVisible(eval(ei, envir))
    # 4: eval(ei, envir)
    # 5: eval(expr, envir, enclos)
    # 6: .active-rstudio-document#11: f("parameter")
    #
    # Enter an environment number, or 0 to exit
    # Selection: 6
    # Browsing in the environment with call:
    #    .active-rstudio-document#11: f("parameter")
    # Called from: debugger.look(ind)
    # Browse[1]> g
    # Error: object 'g' not found
    # Browse[1]>

The semantic difference is that the global variable "g" is visible within the function "f" in the first version, but not in the second version.

If I dump to a file and then load and debug it, the search path through the frames at debug time is not the same as at run time.

An implementation with the same semantics could currently be achieved by applying this workaround:

    dump.frames()
    save.image(file = "last.dump.rda")

Does it possibly make sense to unify the semantics?

THX!

On Mon, 2016-11-14 at 11:34 +0100, Martin Maechler wrote:
> >>>>> nospam@altfeld-im de <nos...@altfeld-im.de>
> >>>>>     on Sun, 13 Nov 2016 13:11:38 +0100 writes:
>
>     > Dear R friends, to allow post-mortem debugging In my
>     > Rscript based batch jobs I use
>
>     >    tryCatch( , error = function(e) {
>     >      dump.frames(to.file = TRUE) })
>
>     > to write the called frames into a dump file.
>
>     > This is similar to the method recommended in the "Writing
>     > R extensions" manual in section 4.2 Debugging R code (page
>     > 96):
>
>     > https://cran.r-project.org/doc/manuals/R-exts.pdf
>
>     >> options(error = quote({dump.frames(to.file=TRUE); q()}))
>
>     > When I load the dump later in a new R session to examine
>     > the error I use
>
>     > load(file = "last.dump.rda") debugger(last.dump)
>
>     > My problem is that the global objects in the workspace are
>     > NOT contained in the dump since "dump.frames" does not
>     > save the workspace.
>
>     > This makes debugging difficult.
>
>     > For more details see the stackoverflow question + answer
>     > in:
>     > https://stackoverflow.com/questions/40421552/r-how-make-dump-frames-include-all-variables-for-later-post-mortem-debugging/40431711#40431711
>
>     > I think the reason of the problem is:
>
>     > If you use dump.frames(to.file = FALSE) in an interactive
>     > session debugging works as expected because it creates a
>     > global variable called "last.dump" and the workspace is
>     > still loaded.
>
>     > In the batch job scenario however the workspace is NOT
>     > saved in the dump and therefore lost if you debug the dump
>     > in a new session.
>
>     > Options to solve the issue:
>     > --
>
>     > 1. Improve the documentation of the R help for
>     > "dump.frames" and the R_exts manual to propose another
>     > code snippet for batch job scenarios:
>
>     > dump.frames() save.image(file = "last.dump.rda")
>
>     > 2. Change the semantics of "dump.frames(to.file = TRUE)"
>     > to inc
[Rd] Missing objects using dump.frames for post-mortem debugging of crashed batch jobs. Bug or gap in documentation?
Dear R friends,

to allow post-mortem debugging in my Rscript based batch jobs I use

    tryCatch( , error = function(e) {
      dump.frames(to.file = TRUE)
    })

to write the called frames into a dump file.

This is similar to the method recommended in the "Writing R extensions" manual in section 4.2 Debugging R code (page 96):

https://cran.r-project.org/doc/manuals/R-exts.pdf

    > options(error = quote({dump.frames(to.file=TRUE); q()}))

When I load the dump later in a new R session to examine the error I use

    load(file = "last.dump.rda")
    debugger(last.dump)

My problem is that the global objects in the workspace are NOT contained in the dump since "dump.frames" does not save the workspace.

This makes debugging difficult.

For more details see the stackoverflow question + answer in:

https://stackoverflow.com/questions/40421552/r-how-make-dump-frames-include-all-variables-for-later-post-mortem-debugging/40431711#40431711

I think the reason for the problem is:

If you use dump.frames(to.file = FALSE) in an interactive session, debugging works as expected because it creates a global variable called "last.dump" and the workspace is still loaded.

In the batch job scenario, however, the workspace is NOT saved in the dump and is therefore lost if you debug the dump in a new session.

Options to solve the issue:
--

1. Improve the documentation of the R help for "dump.frames" and the R_exts manual to propose another code snippet for batch job scenarios:

       dump.frames()
       save.image(file = "last.dump.rda")

2. Change the semantics of "dump.frames(to.file = TRUE)" to include the workspace in the dump. This would change the semantics implied by the function name but makes the semantics consistent for both "to.file" param values.
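For later readers, a sketch of a batch-job pattern that writes the dump including a copy of the global environment. It assumes an R version in which dump.frames() has the include.GlobalEnv argument (the one tested in the follow-up to this thread). It also uses withCallingHandlers() instead of the tryCatch() shown above, because a calling handler runs while the frames of the failing call are still on the stack; that substitution is my assumption, not part of the original proposal. The function f() and the temporary directory are only for illustration.

```r
dump_dir <- tempfile("dump")
dir.create(dump_dir)
old_wd <- setwd(dump_dir)  # dump.frames(to.file = TRUE) writes to the working dir

g <- "global"
f <- function(p) {
  l <- "local"
  stop("boom")
}

# Calling handler: dumps the frames at the point of the error;
# try() then prevents the batch script from aborting here.
try(
  withCallingHandlers(
    f("parameter"),
    error = function(e) {
      dump.frames(dumpto = "last.dump", to.file = TRUE, include.GlobalEnv = TRUE)
    }
  ),
  silent = TRUE
)

setwd(old_wd)
dump_file <- file.path(dump_dir, "last.dump.rda")
```

In a fresh session the dump can then be examined with load(dump_file) followed by debugger(last.dump).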
Re: [Rd] iconv to UTF-16 encoding produces error due to embedded nulls (write.table with fileEncoding param)
Excellent analysis, thank you both for the quick reply!

Is there anything I can do to get the bug fixed in the next version of R (e.g. filing a bug report at https://bugs.r-project.org/bugzilla3/)?

On Tue, 2016-02-23 at 14:06 +0200, Mikko Korpela wrote:
> On 23.02.2016 11:37, Martin Maechler wrote:
> > >>>>> nospam@altfeld-im de <nos...@altfeld-im.de>
> > >>>>>     on Mon, 22 Feb 2016 18:45:59 +0100 writes:
> >
> >     > Dear R developers
> >     > I think I have found a bug that can be reproduced with two lines of code
> >     > and I am very thankful to get your first assessment or feed-back on my
> >     > report.
> >
> >     > If this is the wrong mailing list or I did something wrong
> >     > (e. g. semi "anonymous" email address to protect my privacy and defend
> >     > unwanted spam) please let me know since I am new here.
> >
> >     > Thank you very much :-)
> >
> >     > J. Altfeld
> >
> > Dear J.,
> > (yes, a bit less anonymity would be very welcomed here!),
> >
> > You are right, this is a bug, at least in the documentation, but
> > probably "all real", indeed,
> >
> > but read on.
> >
> >     > On Tue, 2016-02-16 at 18:25 +0100, nos...@altfeld-im.de wrote:
> >     >>
> >     >> If I execute the code from the "?write.table" examples section
> >     >>
> >     >> x <- data.frame(a = I("a \" quote"), b = pi)
> >     >> # (omitted code)
> >     >> write.csv(x, file = "foo.csv", fileEncoding = "UTF-16LE")
> >     >>
> >     >> the resulting CSV file has a size of 6 bytes which is too short
> >     >> (truncated):
> >     >>
> >     >> """,3
> >
> > reproducibly, yes.
> > If you look at what write.csv does
> > and then simplify, you can get a similar wrong result by
> >
> >   write.table(x, file = "foo.tab", fileEncoding = "UTF-16LE")
> >
> > which results in a file with one line
> >
> > """ 3
> >
> > and if you debug write.table() you see that its building blocks
> > here are
> >     file <- file(, encoding = fileEncoding)
> >
> > a   writeLines(*, file=file)  for the column headers,
> >
> > and then "deeper down" C code which I did not investigate.
>
> I took a look at connections.c. There is a call to strlen() that gets
> confused by null characters. I think the obvious fix is to avoid the
> call to strlen() as the size is already known:
>
> Index: src/main/connections.c
> ===
> --- src/main/connections.c	(revision 70213)
> +++ src/main/connections.c	(working copy)
> @@ -369,7 +369,7 @@
>  		/* is this safe? */
>  		warning(_("invalid char string in output conversion"));
>  	    *ob = '\0';
> -	    con->write(outbuf, 1, strlen(outbuf), con);
> +	    con->write(outbuf, 1, ob - outbuf, con);
>  	} while(again && inb > 0);  /* it seems some iconv signal -1 on
>  				       zero-length input */
>      } else
>
> > But just looking a bit at such a file() object with writeLines()
> > seems slightly revealing, as e.g., 'eol' does not seem to
> > "work" for this encoding:
> >
> >   > fn <- tempfile("ffoo"); ff <- file(fn, open="w", encoding = "UTF-16LE")
> >   > writeLines(LETTERS[3:1], ff); writeLines("|", ff); writeLines(">a", ff)
> >   > close(ff)
> >   > file.show(fn)
> >   CBA|>
> >   > file.size(fn)
> >   [1] 5
> >   >
>
> With the patch applied:
>
> > readLines(fn, encoding="UTF-16LE", skipNul=TRUE)
> [1] "C"  "B"  "A"  "|"  ">a"
> > file.size(fn)
> [1] 22
>
> - Mikko Korpela
>
> >     >> The problem seems to be the iconv function:
> >     >>
> >     >> iconv("foo", to="UTF-16")
> >     >>
> >     >> produces
> >     >>
> >     >> Error in iconv("foo", to = "UTF-16"):
> >     >> embedded nul in string: '\xff\xfef\0o\0o\0'
> >
> > but this works
> >
> >   > iconv("foo", to="UTF-16", toRaw=TRUE)
> >   [[1]]
> >   [1] ff fe 66 00 6f 00 6f 00
> >
> > (indeed showing the embedded '\0's)
> >
> >     >> In 2010 a (partial) patch for this problem was submitted:
> >     >> http://tolstoy.newcastle.edu.au/R/e10/devel/10/06/0648.html
> >
> > the patch only related to the iconv() problem not allowing 'raw'
> > (instead of character) argument x.
> >
> > ... and it is > 5.5 years old, for an iconv() version that was less
> > featureful than today.
> > Rather, current iconv(x) allows x to be a list of raw entries.
> >
> >     >> Are there chances to fix this problem since it prevents writing Windows
> >     >> UTF-16LE text files?
> >     >>
> >     >> PS: This problem can be reproduced on Windows and Linux.
> >
> > indeed also on "R devel of today".
> >
> > I agree it should be fixed... but as I said not by the patch you
> > mentioned.
> >
> > Tested patches to fix this are welcome, indeed.
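Until the connection code is fixed, the toRaw = TRUE route that Martin mentions can serve as a workaround for small texts: convert to raw bytes first and write them with writeBin(), so that no C string with embedded nuls is ever needed. A minimal sketch; the file name is arbitrary and the byte values assume an iconv implementation that writes "UTF-16LE" without a byte-order mark.

```r
# Convert to UTF-16LE as raw bytes (a character result cannot hold the nuls)
bytes <- iconv("foo", from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]

# Write the bytes unchanged, bypassing the encoding conversion of connections
path <- tempfile(fileext = ".txt")
writeBin(bytes, path)

size <- file.size(path)  # two bytes per character for this ASCII-only text
```

For a complete CSV file one could, as an untested idea, collect the output lines as strings, paste them together with the desired line separator and convert the whole string the same way.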
Re: [Rd] iconv to UTF-16 encoding produces error due to embedded nulls (write.table with fileEncoding param)
Dear R developers

I think I have found a bug that can be reproduced with two lines of code and I am very thankful to get your first assessment or feed-back on my report.

If this is the wrong mailing list or I did something wrong (e. g. semi "anonymous" email address to protect my privacy and defend unwanted spam) please let me know since I am new here.

Thank you very much :-)

J. Altfeld

On Tue, 2016-02-16 at 18:25 +0100, nos...@altfeld-im.de wrote:
> If I execute the code from the "?write.table" examples section
>
> x <- data.frame(a = I("a \" quote"), b = pi)
> # (omitted code)
> write.csv(x, file = "foo.csv", fileEncoding = "UTF-16LE")
>
> the resulting CSV file has a size of 6 bytes which is too short
> (truncated):
>
> """,3
>
> The problem seems to be the iconv function:
>
> iconv("foo", to="UTF-16")
>
> produces
>
> Error in iconv("foo", to = "UTF-16"):
> embedded nul in string: '\xff\xfef\0o\0o\0'
>
> In 2010 a (partial) patch for this problem was submitted:
>
> http://tolstoy.newcastle.edu.au/R/e10/devel/10/06/0648.html
>
> Are there chances to fix this problem since it prevents writing Windows
> UTF-16LE text files?
>
> PS: This problem can be reproduced on Windows and Linux.
>
> ---
>
> > sessionInfo()
> R version 3.2.3 (2015-12-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 14.04.3 LTS
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>      LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>      LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>      LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods
> base
>
> loaded via a namespace (and not attached):
> [1] tools_3.2.3
[Rd] iconv to UTF-16 encoding produces error due to embedded nulls (write.table with fileEncoding param)
If I execute the code from the "?write.table" examples section

    x <- data.frame(a = I("a \" quote"), b = pi)
    # (omitted code)
    write.csv(x, file = "foo.csv", fileEncoding = "UTF-16LE")

the resulting CSV file has a size of 6 bytes which is too short (truncated):

    """,3

The problem seems to be the iconv function:

    iconv("foo", to="UTF-16")

produces

    Error in iconv("foo", to = "UTF-16"):
      embedded nul in string: '\xff\xfef\0o\0o\0'

In 2010 a (partial) patch for this problem was submitted:

http://tolstoy.newcastle.edu.au/R/e10/devel/10/06/0648.html

Are there chances to fix this problem since it prevents writing Windows UTF-16LE text files?

PS: This problem can be reproduced on Windows and Linux.

---

    > sessionInfo()
    R version 3.2.3 (2015-12-10)
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: Ubuntu 14.04.3 LTS

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    loaded via a namespace (and not attached):
    [1] tools_3.2.3