Re: [Rd] configure.ac
Many thanks, Peter, for your answer. That page was exactly what I was looking for.

I suppose I was in too much of a hurry. I didn't see the developer links on the R page, and a quick Google search didn't give me anything.

That's great, thanks!

On Tue, Aug 1, 2017 at 4:29 PM, Ramón Fallon wrote:
> [...]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] reproducible segmentation fault installing packages on FreeBSD 11.1
For anyone interested, here is the FreeBSD bug report.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221127
Re: [Rd] reproducible segmentation fault installing packages on FreeBSD 11.1
Dirk Eddelbuettel writes:

> On 31 July 2017 at 19:38, Joseph Mingrone wrote:
> | This happens when attempting to install any package. There were no such
> | problems on 11.0.
> |
> | Some other ways to trigger the problem:
> [...]
> | trying URL 'https://cloud.r-project.org/src/contrib/Rcpp_0.12.12.tar.gz'
> | [New LWP 100854 of process 56011]
> |
> | Thread 7 received signal SIGSEGV, Segmentation fault.
> | [Switching to LWP 100854 of process 56011]
> | uw_frame_state_for (context=context@entry=0x7fffdfffde20, fs=fs@entry=0x7fffdfffdb70)
> |     at /usr/ports/lang/gcc5/work/gcc-5.4.0/libgcc/unwind-dw2.c:1249
> | 1249   return MD_FALLBACK_FRAME_STATE_FOR (context, fs);

> Looks like a gcc error to me. Rcpp itself is pretty widely used and tested
> and I am not aware of it having issues per se on any of the *BSDs.

The problem isn't specific to any package or packages in general, but seems related to libcurl and/or gfortran. The third call to

    CURLcode ret = curl_easy_perform(hnd);

in in_do_curlGetHeaders() from src/modules/internet/libcurl.c triggers the problem _unless_ R is built with flang instead of gfortran.

For FreeBSD 11.1 R users, there are two workarounds [1].

1) Build the FreeBSD R package with the FLANG option instead of GFORTRAN. FLANG may be the default soon.

2) Add

    options(download.file.method="wget")

   to ~/.Rprofile with a newline after it. Ensure ftp/wget is installed, because it is not pulled in by math/R.

Joseph

[1] Both these workarounds prevent a predictable crash, but have only been lightly tested.
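Workaround 2 above could be scripted roughly as follows. This is only a sketch under assumptions: the helper name `add_wget_option` is hypothetical (it is not part of R or the FreeBSD port), and the script assumes that any existing profile file already ends with a newline. It appends the option line with a trailing newline and does nothing when the exact line is already present.

```shell
#!/bin/sh
# Hypothetical helper: append the wget download method to an R profile file.
# Usage: add_wget_option [PROFILE]   (PROFILE defaults to ~/.Rprofile)
add_wget_option() {
    profile=${1:-"$HOME/.Rprofile"}
    line='options(download.file.method="wget")'

    # Skip the append if the exact line is already in the file.
    if [ -f "$profile" ] && grep -qxF "$line" "$profile"; then
        return 0
    fi

    # printf '%s\n' writes the line followed by the newline the
    # workaround asks for. Assumes an existing file ends with a newline.
    printf '%s\n' "$line" >> "$profile"
}
```

Calling `add_wget_option` with no argument targets `~/.Rprofile`; passing a path makes it easy to try on a scratch file first. The ftp/wget port still has to be installed separately, as noted above.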
Re: [Rd] translateChar in NewName in bind.c
For the 2nd example, I say that the R 3.4.1 result is acceptable, as names(c(x)) and names(x) are equal. The change exposed by the 2nd example is in line with the statement of the NEWS item corresponding to PR#17284: "c() and unlist() are now more efficient in constructing the names(.) of their return value, "

However, currently, the NEWS item is for R-devel, not R 3.4.1 patched.

On Mon, 31/7/17, Martin Maechler wrote:

Subject: Re: [Rd] translateChar in NewName in bind.c
Cc: r-devel@r-project.org
Date: Monday, 31 July, 2017, 8:38 PM

> Suharto Anggono Suharto Anggono via R-devel
>     on Sun, 30 Jul 2017 14:57:53 + writes:

> R devel's bind.c has been ported to R patched. Is it OK while names of 'unlist' or 'c' result may be not strictly the same as in R 3.4.1 because of changed function 'NewName' in bind.c?

> Using 'translateCharUTF8' instead of 'translateChar' is as it should be. It has an effect in non-UTF-8 locale for this example.

> x <- list(1:2)
> names(x) <- "\ue7"
> res <- unlist(x)
> charToRaw(names(res)[1])

> Directly assigning 'tag' to 'ans' is more efficient, but
> may be different from in R 3.4.1 that involves
> 'translateCharUTF8', that is also correct. It has an
> effect for this example.

> x <- 0
> names(x) <- "\xe7"
> Encoding(names(x)) <- "latin1"
> res <- c(x)
> Encoding(names(res))
> charToRaw(names(res))

Yes, you are right, thank you: That part of the changes in bind.c was *not* directly related to the two R-bugs (PR#17284 & PR#17292)... and therefore, maybe I should not have ported it to R-patched (= R 3.4.1 patched).

Your examples above are instructive; notably, the 2nd one seems to demonstrate to me that the change also *did* fix a bug: Encoding(names(res)) is "latin1" in R-devel but interestingly is "UTF-8" in R 3.4.1, indeed independently of the locale.

I would argue R-devel (and current R-patched) is more faithful by keeping the Encoding "latin1" that was set for names(x) also in the names(c(x)).

I could revert R-patched's bind.c (so it only contains the two official bug fixes PR#172(84|92)), but I wonder if it is desirable in this case.

I'm glad for further reasoning. Given current "knowledge"/"evidence", I would not revert R-patched to R 3.4.1's behavior.

Martin
Re: [Rd] configure.ac
On 01/08/2017 17:26, peter dalgaard wrote:
> If you check developer.r-project.org, you'll find links to the scripts that we use for building releases and pre-releases of R. These are usually run on a Mac, but shouldn't require much change for Linux. In particular, notice this lead-in in the prerelease script:
>
> rm -rf BUILD-dist
> mkdir BUILD-dist
> cd R
> aclocal -I m4
> autoconf
> cd ../BUILD-dist
>
> etc

Or there is

    configure --enable-maintainer-mode

after which 'make' remakes 'configure' if necessary. (That is fairly often tested on Linux, and occasionally on macOS with the autoconf tools added.)

> -pd
>
> On 1 Aug 2017, at 17:29 , Ramón Fallon wrote:
>> [...]

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
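The two regeneration commands from the prerelease lead-in quoted above (`aclocal -I m4`, then `autoconf`) could be wrapped in a small shell function like the one below. Treat it as a sketch only: the function name `regen_configure` and the `DRY_RUN` switch are hypothetical conveniences, not part of R's build scripts, and the function assumes it is pointed at the top-level R source directory containing configure.ac and the m4 directory.

```shell
#!/bin/sh
# Hypothetical wrapper around the commands from the prerelease lead-in.
# Usage: regen_configure [SRCDIR]   (SRCDIR defaults to the current directory)
# With DRY_RUN=1 the commands are only printed, not executed.
regen_configure() {
    srcdir=${1:-.}
    for cmd in "aclocal -I m4" "autoconf"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            # Show what would be run, without touching the source tree.
            echo "(cd $srcdir && $cmd)"
        else
            # $cmd is left unquoted on purpose so it splits into
            # the command name and its arguments.
            (cd "$srcdir" && $cmd) || return 1
        fi
    done
}
```

Setting `DRY_RUN=1` before calling `regen_configure R` prints the two commands in order, which is a cheap way to check the target directory before anything is regenerated. The maintainer-mode route mentioned above needs no such wrapper, since 'make' then remakes 'configure' itself.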
Re: [Rd] configure.ac
If you check developer.r-project.org, you'll find links to the scripts that we use for building releases and pre-releases of R. These are usually run on a Mac, but shouldn't require much change for Linux. In particular, notice this lead-in in the prerelease script:

rm -rf BUILD-dist
mkdir BUILD-dist
cd R
aclocal -I m4
autoconf
cd ../BUILD-dist

etc

-pd

> On 1 Aug 2017, at 17:29 , Ramón Fallon wrote:
> [...]

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk Priv: pda...@gmail.com
[Rd] configure.ac
Hi,

Just a quick mail to mention that I cannot generate a new configure script using autoconf or autoreconf. I had edited the configure.ac and thought ... "oh, that's my fault", but then I tried it on R-patched and R-3.4.1 without touching configure.ac and had the same problems.

The "building R packages" documentation seems to suggest that "autoconf" should take care of it, but I must be missing something as I expect it to be a common task.

I also tried explicit autohell (yes, I know) commands

1) autoreconf --force -v

completes but invoking configure (with no options) gives

checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
loading site script './config.site'
loading build-specific script './config.site'
./configure: line 2982: syntax error near unexpected token `blas'
./configure: line 2982: ` withval=$with_blas; R_ARG_USE(blas)'

OK, there's a recipe typically found out there that one can use, starting with:

libtoolize --force

but you get:

libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, `tools'.
libtoolize: linking file `tools/ltmain.sh'
libtoolize: You should add the contents of the following files to `aclocal.m4':
libtoolize: `/usr/share/aclocal/libtool.m4'
libtoolize: `/usr/share/aclocal/ltoptions.m4'
libtoolize: `/usr/share/aclocal/ltversion.m4'
libtoolize: `/usr/share/aclocal/ltsugar.m4'
libtoolize: `/usr/share/aclocal/lt~obsolete.m4'
libtoolize: Consider adding `AC_CONFIG_MACRO_DIR([m4])' to configure.ac and
libtoolize: rerunning libtoolize, to keep the correct libtool macros in-tree.
libtoolize: Consider adding `-I m4' to ACLOCAL_AMFLAGS in Makefile.am.

then:

aclocal
autoheader
automake --force-missing --add-missing

first two go OK, but the third gives

configure.ac: no proper invocation of was found.
configure.ac: You should verify that configure.ac invokes AM_INIT_AUTOMAKE,
configure.ac: that aclocal.m4 is present in the top-level directory,
configure.ac: and that aclocal.m4 was recently regenerated (using aclocal).
automake: no `Makefile.am' found for any configure output

then the following runs OK

autoconf

but running configure gives the same BLAS error.

But I'm fairly sure one shouldn't run to see what's wrong with BLAS; rather, it's just the configure options not being read properly. The AM_INIT_AUTOMAKE issue definitely seems important.

Is there anything I'm missing?

Cheers and thanks in advance!
Re: [Rd] reproducible segmentation fault installing packages on FreeBSD 11.1
On 31 July 2017 at 19:38, Joseph Mingrone wrote:
| This happens when attempting to install any package. There were no such
| problems on 11.0.
|
| Some other ways to trigger the problem:
[...]
| trying URL 'https://cloud.r-project.org/src/contrib/Rcpp_0.12.12.tar.gz'
| [New LWP 100854 of process 56011]
|
| Thread 7 received signal SIGSEGV, Segmentation fault.
| [Switching to LWP 100854 of process 56011]
| uw_frame_state_for (context=context@entry=0x7fffdfffde20, fs=fs@entry=0x7fffdfffdb70)
|     at /usr/ports/lang/gcc5/work/gcc-5.4.0/libgcc/unwind-dw2.c:1249
| 1249   return MD_FALLBACK_FRAME_STATE_FOR (context, fs);

Looks like a gcc error to me. Rcpp itself is pretty widely used and tested and I am not aware of it having issues per se on any of the *BSDs.

Dirk

--
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
Re: [Rd] Problems with S4 methods dispatching on `...` (aka dotsMethods)
Thank you Michael for updating the 3.4 branch; `callNextMethod()` now works for `...` methods as expected. However, I'm still missing your other patch fixing the handling of arguments in `...` methods. It would be really great if this bugfix could be integrated into the 3.4 branch as well, such that the following code doesn't result in an error.

f = function(x, ..., a = b) {
  b = "missing 'a'"
  print(a)
}

f()
## [1] "missing 'a'"

f(a = 1)
## [1] 1

setGeneric("f", signature = "x")

# works as the non-generic version
f()
## [1] "missing 'a'"

setGeneric("f", signature = "...")

# unexpectedly fails to find 'b'
f()
## Error in print(a) : object 'b' not found

Cheers,
Andrzej

On Fri, Jul 28, 2017 at 9:15 PM, Michael Lawrence wrote:
> I pushed the patch to the 3.4 branch. Feel free to test.
>
> Michael
>
> On Wed, Jul 26, 2017 at 4:02 AM, Andrzej Oleś wrote:
> > Hi Michael,
> >
> > it seems that your patch to S4 generics dispatching on `...` is still
> > available only in R-devel, and was not included in the minor R-3.4.1
> > release. I was wondering what is the policy of incorporating bug fixes from
> > the devel branch into release, and whether there is any chance that the
> > broken `...` dispatch is fixed before R-3.5.0?
> >
> > Cheers,
> > Andrzej
> >
> > On Tue, Apr 25, 2017 at 4:15 PM, Andrzej Oleś wrote:
> >>
> >> You're right, I must have mixed up my R versions when running the example,
> >> as the problem seems to be resolved in R-devel.
> >>
> >> Sorry for the noise and thanks again for fixing this.
> >>
> >> Andrzej
> >>
> >> On Tue, Apr 25, 2017 at 3:55 PM, Michael Lawrence wrote:
> >>>
> >>> I attempted to fix it, and that example seems to work for me. It's
> >>> also a (passing) regression test in R. Are you sure you're using a new
> >>> enough R-devel?
> >>>
> >>> On Tue, Apr 25, 2017 at 2:34 AM, Andrzej Oleś wrote:
> >>> > Hi Michael,
> >>> >
> >>> > thanks again for your patch! I've tested it and I'm happy to confirm that
> >>> > `callNextMethod()` works with methods dispatching on `...`.
> >>> >
> >>> > However, the second issue I reported still seems to be unresolved. Consider
> >>> > the following toy example, where the `f()` calls differ in result depending
> >>> > on whether the dispatch happens on a formal argument or the `...` argument.
> >>> >
> >>> > f = function(x, ..., a = b) {
> >>> >   b = "missing 'a'"
> >>> >   print(a)
> >>> > }
> >>> >
> >>> > f()
> >>> > ## [1] "missing 'a'"
> >>> >
> >>> > f(a = 1)
> >>> > ## [1] 1
> >>> >
> >>> > setGeneric("f", signature = "x")
> >>> >
> >>> > # works as the non-generic version
> >>> > f()
> >>> > ## [1] "missing 'a'"
> >>> >
> >>> > setGeneric("f", signature = "...")
> >>> >
> >>> > # unexpectedly fails to find 'b'
> >>> > f()
> >>> > ## Error in print(a) : object 'b' not found
> >>> >
> >>> > Any chances of fixing this?
> >>> >
> >>> > Cheers,
> >>> > Andrzej
> >>> >
> >>> > On Fri, Apr 21, 2017 at 11:40 AM, Andrzej Oleś <andrzej.o...@gmail.com> wrote:
> >>> >>
> >>> >> Great, thanks Michael for your quick response!
> >>> >>
> >>> >> I started off with a question on SO because I was not sure whether this
> >>> >> was an actual bug or I was just missing something obvious. I'm looking
> >>> >> forward to the patch.
> >>> >>
> >>> >> Cheers,
> >>> >> Andrzej
> >>> >>
> >>> >> On Thu, Apr 20, 2017 at 10:28 PM, Michael Lawrence wrote:
> >>> >>>
> >>> >>> Thanks for pointing out these issues. I have a fix that I will commit
> >>> >>> soon.
> >>> >>>
> >>> >>> Btw, I would never have seen the post on Stack Overflow. It's best to
> >>> >>> report bugs on the bugzilla.
> >>> >>>
> >>> >>> Michael
> >>> >>>
> >>> >>> On Thu, Apr 20, 2017 at 8:30 AM, Andrzej Oleś wrote:
> >>> >>> > Hi all,
> >>> >>> >
> >>> >>> > I recently encountered some unexpected behavior with S4 generics
> >>> >>> > dispatching on `...`, which I described in
> >>> >>> >
> >>> >>> > http://stackoverflow.com/questions/43499203/use-callnextmethod-with-dotsmethods
> >>> >>> >
> >>> >>> > TL;DR: `callNextMethod()` doesn't work in methods dispatching on `...`,
> >>> >>> > and arguments of such methods are resolved differently than the
> >>> >>> > arguments of methods dispatching on formal arguments.
> >>> >>> >
> >>> >>> > Could this indicate a potential problem with the implementation of
> >>> >>> > the `...` dispatch?
> >>> >>> >
> >>> >>> > Cheers,
> >>> >>> > Andrzej
Re: [Rd] special latin1 do not print as glyphs in current devel on windows
Thank you! My apologies again for not including the console output in my message before. I sent another e-mail with the output in the meantime, so it should be a bit clearer now what I am seeing. In case I missed something, please let me know.

Yes, I am using latin1 and cp1252 interchangeably here, mostly because Encoding() is reporting the encoding as "latin1". You presumed correctly that my current/default locale's encoding is CP1252. (I also mentioned before that my locale is LC_COLLATE=German_Germany.1252.)

> As you are changing encodings, you do not want to preserve encoding!

I am not interested in preserving encodings. What I am worried about is that the encoding is not marked anymore, i.e. that Encoding() returns "unknown".

In cp1252 encoding on Windows (note that I am using the cp1252 escape "\x80" and not the Unicode "\u20AC"):

    > x_utf8 <- enc2utf8(c("€", "\x80"))
    > Encoding(x_utf8)
    [1] "UTF-8" "UTF-8"
    > x_nat <- enc2native(x_utf8)
    > Encoding(x_nat)
    [1] "unknown" "unknown"

See also Kirill's message to this list: "ASCII strings are marked as ASCII internally, but this information doesn't seem to be available, e.g., Encoding() returns "unknown" for such strings"
http://r.789695.n4.nabble.com/source-parse-and-foreign-UTF-8-characters-tp4733523.html

>> Again, this is not the case with iconv()
>>
>> x_iutf8 <- iconv(x, to = "UTF-8")
>> Encoding(x_iutf8)
>> x_inat <- iconv(x_iutf8, from = "UTF-8")
>> Encoding(x_inat)
>
> iconv is converting from/to the current locale's encoding, presumably
> CP1252, not from the marked encoding (as the help page states explicitly.)

I am aware that iconv is not using the marked encoding, but that you either have to set it explicitly or it uses the current locale's default encoding. As I said, I am worried about the fact that the encoding markers get lost with the enc2* functions, or rather that they are not set correctly. I am just using the iconv example to show that iconv is able to set the encoding markers correctly. So it seems generally possible.

    > x_iutf8 <- iconv(c("€", "\x80"), to = "UTF-8")
    > Encoding(x_iutf8)
    [1] "UTF-8" "UTF-8"
    > x_iutf8
    [1] "€" "€"
    > x_inat <- iconv(x_iutf8, from = "UTF-8")
    > Encoding(x_inat)
    [1] "latin1" "latin1"
    > x_inat
    [1] "\u0080" "\u0080"
Re: [Rd] special latin1 do not print as glyphs in current devel on windows
You seem confused about Latin-1: those characters are not in Latin-1. (Microsoft code pages are a proprietary encoding, some code pages such as CP1252 being extensions to Latin-1.)

You have not given the 'at a minimum information' asked for in the posting guide so we have no way to reproduce this, and without showing us the output on your system, we have no idea what you saw.

[As a convenience to Windows users, R does in some cases assume that they are using Latin-1 encodings. If they use extensions to Latin-1 then there are no guarantees that code written for strict Latin-1 will work.]

On 01/08/2017 10:19, Daniel Possenriede wrote:
> [...]
> The second problem IMO is that encoding markers get lost with the enc2*
> functions

As you are changing encodings, you do not want to preserve encoding!

> x_utf8 <- enc2utf8(x)
> Encoding(x_utf8)
> x_nat <- enc2native(x_utf8)
> Encoding(x_nat)

In an actual Latin-1 locale on Linux

    > x_utf8 <- c("éè", "\u20ac", "\u2013")
    > Encoding(x_utf8)
    [1] "latin1" "UTF-8" "UTF-8"
    > enc2native(x_utf8)
    [1] "éè" "" ""
    > Encoding(.Last.value)
    [1] "latin1" "unknown" "unknown"

as expected.

> Again, this is not the case with iconv()
>
> x_iutf8 <- iconv(x, to = "UTF-8")
> Encoding(x_iutf8)
> x_inat <- iconv(x_iutf8, from = "UTF-8")
> Encoding(x_inat)

iconv is converting from/to the current locale's encoding, presumably CP1252, not from the marked encoding (as the help page states explicitly.)

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
Re: [Rd] special latin1 do not print as glyphs in current devel on windows
Sorry, I should have included my console output, obviously. So here we go:

Wrong UTF-8 escapes when using print() in v3.5.0 devel:

# R Under development (unstable) (2017-07-30 r73000) -- "Unsuffered Consequences"
# Platform: x86_64-w64-mingw32/x64 (64-bit)

> x <- c("€", "–", "‰")
> Encoding(x)
[1] "latin1" "latin1" "latin1"
> print(x)
[1] "\u0080" "\u0096" "\u0089"

Same output with enc2utf8()

> enc2utf8(x)
[1] "\u0080" "\u0096" "\u0089"

With iconv() the result is as expected.

> iconv(x, to = "UTF-8")
[1] "€" "–" "‰"

The second problem IMO is that encoding markers get lost with the enc2* functions

> x_utf8 <- enc2utf8(x)
> Encoding(x_utf8)
[1] "UTF-8" "UTF-8" "UTF-8"
> x_nat <- enc2native(x_utf8)
> Encoding(x_nat)
[1] "unknown" "unknown" "unknown"

This is not the case with iconv()

> x_iutf8 <- iconv(x, to = "UTF-8")
> Encoding(x_iutf8)
[1] "UTF-8" "UTF-8" "UTF-8"
> x_inat <- iconv(x_iutf8, from = "UTF-8")
> Encoding(x_inat)
[1] "latin1" "latin1" "latin1"
Re: [Rd] special latin1 do not print as glyphs in current devel on windows
Upon further inspection, I think there are at least two problems.

First the issue with printing latin1/cp1252 characters in the "80" to "9F" code range.

x <- c("€", "–", "‰")
Encoding(x)
print(x)

I assume that these are Unicode escapes!? (Given that Encoding(x) shows "latin1" I'd rather expect latin1/cp1252 escapes here, but these would be e.g. "\x80", right? My locale is LC_COLLATE=German_Germany.1252 btw.)

Now I don't know why print tries to convert to Unicode, but if these indeed are Unicode escapes, then there is something wrong with the conversion from cp1252 to Unicode. In general, most cp1252 char codes translate to Unicode like CP1252: "00" -> Unicode "0000", "01" -> "0001", "02" -> "0002", etc. See http://www.cp1252.com/. The exception is the cp1252 "80" to "9F" code range. E.g. the Euro sign is "80" in cp1252 but "20AC" in Unicode, and the en dash is "96" in cp1252 but "2013" in Unicode.

The same error seems to happen with enc2utf8(x)

Now with iconv() the result is as expected.

iconv(x, to = "UTF-8")

The second problem IMO is that encoding markers get lost with the enc2* functions

x_utf8 <- enc2utf8(x)
Encoding(x_utf8)
x_nat <- enc2native(x_utf8)
Encoding(x_nat)

Again, this is not the case with iconv()

x_iutf8 <- iconv(x, to = "UTF-8")
Encoding(x_iutf8)
x_inat <- iconv(x_iutf8, from = "UTF-8")
Encoding(x_inat)
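The difference between the two mappings described above can be checked outside R with the iconv command-line tool. The following is a minimal sketch, assuming an iconv that knows the charset names CP1252 and ISO-8859-1 (as glibc's does); the helper name `to_utf8_hex` is made up for this illustration. It decodes a single byte, given in octal, from the named charset and prints the resulting UTF-8 bytes in hex.

```shell
#!/bin/sh
# Hypothetical helper: decode one byte (octal value $2) from charset $1
# and print the UTF-8 encoding of the result as hex digits.
to_utf8_hex() {
    # printf turns the octal escape (e.g. \200) into the raw byte 0x80.
    printf "\\$2" | iconv -f "$1" -t UTF-8 | od -An -tx1 | tr -d ' \n'
}

# Byte 0x80 (octal 200) read as CP1252 is the Euro sign, U+20AC.
to_utf8_hex CP1252 200; echo        # e282ac

# The same byte read as strict Latin-1 is the control character U+0080.
to_utf8_hex ISO-8859-1 200; echo    # c280
```

The second result is the UTF-8 form of U+0080, which is consistent with a byte in the "80" to "9F" range being treated as strict Latin-1 rather than cp1252 before escaping. This sketch only shows the two mappings; it says nothing about where in R the conversion takes place.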