>>>>> Suharto Anggono Suharto Anggono via R-devel >>>>> on Sat, 6 Nov 2021 08:07:58 +0000 (UTC) writes:
> This issue has come up before:
> https://stat.ethz.ch/pipermail/r-help/2013-February/346721.html ("gettext wierdness"),
> https://stat.ethz.ch/pipermail/r-devel/2007-December/047893.html ("gettext() and messages in 'pkg' domain").

> Using 'ngettext' is a workaround, like in
> https://rdrr.io/cran/svMisc/src/R/svMisc-internal.R .

Thank you for the pointers!

> It is documented: "For 'gettext', leading and trailing whitespace is
> ignored when looking for the translation."

Indeed; and it *is* a feature, but one that is really only valuable when
the msgids (the original message strings) do *not* contain such
whitespace.

And, in fact, when xgettext() or xgettext2pot() from pkg 'tools' are
used to create the original *.pot files, they *also* trim leading and
trailing \n, \t and spaces.

So ideally there should not be any end-of-line (or beginning-of-line)
"\n" in R-base.pot (and hence in the corresponding <LANG>-base.po),
and as I mentioned there *are* only a few, and we could (should?)
consider removing them from there.

A "problem" remains in the many C-code msgids, where end-of-line "\n"
are common.

Yes, indeed, one can use the workaround Suharto mentions, ngettext(),
even though users will typically only look at ngettext() if they
want / need to learn about plural/singular messages ...

I.e., in our case, this works, and Henrik could get what he wants:

  > Sys.setenv(LANGUAGE = "de")
  > ngettext(1,"Execution halted\n", "", domain="R")
  [1] "Ausführung angehalten\n"

but it is still not satisfactory that you cannot use gettext() itself
to look at a considerable proportion of the C/C++/.. level error
messages just because they end with "\n".

One possibility would be to introduce an optional `trim = TRUE`
argument, so the above could be achieved (more efficiently and
naturally) by

  gettext("Execution halted\n", domain="R", trim=FALSE)

but in any case, *not* trimming at all anymore, as I proposed
yesterday (see below), is not a good idea.
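To make the effect of the proposed `trim` switch concrete, here is a minimal C sketch. It is not the actual do_gettext() code: the two-entry catalog, lookup() and translate() are hypothetical stand-ins, with the catalog entries copied from the de.po strings quoted in this thread. It only illustrates "strip whitespace, look up, re-attach" versus a verbatim lookup.

```c
#include <stdio.h>
#include <string.h>

/* Toy catalog standing in for the "R" domain (hypothetical; entries
   copied from the de.po strings quoted in this thread). */
struct entry { const char *msgid; const char *msgstr; };
static const struct entry catalog[] = {
    { "Execution halted\n", "Ausführung angehalten\n" },
    { "cannot get working directory",
      "kann das Arbeitsverzeichnis nicht ermitteln" },
};

/* Exact-match lookup; like dgettext(), returns the msgid itself when
   no translation is found. */
static const char *lookup(const char *msgid)
{
    for (size_t i = 0; i < sizeof catalog / sizeof catalog[0]; i++)
        if (strcmp(catalog[i].msgid, msgid) == 0)
            return catalog[i].msgstr;
    return msgid;
}

static int is_ws(char c) { return c == ' ' || c == '\t' || c == '\n'; }

/* Hypothetical translate(): with trim != 0, leading/trailing whitespace
   is removed before the lookup and re-attached afterwards; with
   trim == 0 the string is looked up verbatim.
   Assumption: the trimmed message fits in 255 bytes (longer ones are
   truncated in this sketch). */
void translate(const char *msg, int trim, char *out, size_t outsz)
{
    if (!trim) {
        snprintf(out, outsz, "%s", lookup(msg));
        return;
    }
    size_t len = strlen(msg), head = 0, tail = 0;
    while (head < len && is_ws(msg[head])) head++;
    while (tail < len - head && is_ws(msg[len - 1 - tail])) tail++;

    char core[256];
    snprintf(core, sizeof core, "%.*s", (int)(len - head - tail), msg + head);

    /* leading whitespace + (translated) core + trailing whitespace */
    snprintf(out, outsz, "%.*s%s%s",
             (int)head, msg, lookup(core), msg + len - tail);
}
```

With this toy catalog, translate("Execution halted\n", 1, ...) leaves the message untranslated because the stored msgid includes the "\n" that was stripped before the lookup, whereas trim = 0 finds the German string. Messages stored without surrounding whitespace still benefit from trimming.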
> ------------

>>>>> Martin Maechler
>>>>>     on Fri, 5 Nov 2021 17:55:24 +0100 writes:

>>>>> Tomas Kalibera
>>>>>     on Fri, 5 Nov 2021 16:15:19 +0100 writes:

>>> On 11/5/21 4:12 PM, Duncan Murdoch wrote:
>>>> On 05/11/2021 10:51 a.m., Henrik Bengtsson wrote:
>>>>> I'm trying to reuse some of the translations available in base R by
>>>>> using:
>>>>>
>>>>> gettext(msgid, domain="R")
>>>>>
>>>>> This works great for most 'msgid's, e.g.
>>>>>
>>>>> $ LANGUAGE=de Rscript -e 'gettext("cannot get working directory",
>>>>> domain="R")'
>>>>> [1] "kann das Arbeitsverzeichnis nicht ermitteln"
>>>>>
>>>>> However, it does not work for all. For instance,
>>>>>
>>>>> $ LANGUAGE=de Rscript -e 'gettext("Execution halted\n", domain="R")'
>>>>> [1] "Execution halted\n"
>>>>>
>>>>> This is despite that 'msgid' existing in:
>>>>>
>>>>> $ grep -C 2 -F 'Execution halted\n' src/library/base/po/de.po
>>>>>
>>>>> #: src/main/main.c:342
>>>>> msgid "Execution halted\n"
>>>>> msgstr "Ausführung angehalten\n"
>>>>>
>>>>> It could be that the trailing newline causes problems, because the
>>>>> same happens also for:
>>>>>
>>>>> $ LANGUAGE=de Rscript --vanilla -e 'gettext("error during cleanup\n",
>>>>> domain="R")'
>>>>> [1] "error during cleanup\n"
>>>>>
>>>>> Is this meant to work, and if so, how do I get it to work, or is it a
>>>>> bug?
>>>>
>>>> I don't know the solution, but I think the cause is different than you
>>>> think, because I also have the problem with other strings not
>>>> including "\n":
>>>>
>>>> $ LANGUAGE=de Rscript -e 'gettext("malformed version string",
>>>> domain="R")'
>>>> [1] "malformed version string"

>> You need domain="R-base" for the "malformed version string"

>>> I can reproduce Henrik's report and the problem there is that the
>>> trailing \n is stripped by R before doing the lookup, in do_gettext
>>>
>>>     /* strip leading and trailing white spaces and
>>>        add back after translation */
>>>     for(p = tmp;
>>>         *p && (*p == ' ' || *p == '\t' || *p == '\n');
>>>         p++, ihead++) ;
>>>
>>> But, calling dgettext with the trailing \n does translate correctly for me.
>>>
>>> I'd leave to translation experts how this should work (e.g. whether the
>>> .po files should have trailing newlines).

>> Thanks a lot, Tomas.

>> This is "interesting" .. and I think an R bug one way or the
>> other (and I also note that Henrik's guess was also right on !).

>> We have the following:

>> - New translation *.po source files are to be made from the original *.pot files.

>> In our case it's our code that produces R.pot and R-base.pot
>> (and more for the non-base packages, and more e.g. for
>> Recommended packages 'Matrix' and 'cluster' I maintain).

>> And notably the R.pot (from all the "base" C error/warn/.. messages)
>> contains tons of msgid strings of the form ".......\n"
>> i.e., ending in \n.
>> From that automatically the translator's *.po files should also
>> end in \n.

>> Additionally, the GNU gettext FAQ has
>> (here : https://www.gnu.org/software/gettext/FAQ.html#newline )

>> ------------------------------------------------
>> Q: What does this mean: “'msgid' and 'msgstr' entries do not both end with '\n'”
>> A: It means that when the original string ends in a newline, your translation must also end in a newline.
>> And if the original string does not end in a newline, then your translation should likewise not have a newline at the end.
>> ------------------------------------------------

>> From all that I'd conclude that we (R base code) are the source
>> of the problem.
>> Given the above FAQ, it seems common in other projects also to
>> have such trailing \n and so we should really change the C code
>> you cite above.

>> On the other hand, this is from almost the very beginning of
>> when Brian added translation to R,
>> ------------------------------------------------------------------------
>> r32938 | ripley | 2005-01-30 20:24:04 +0100 (Sun, 30 Jan 2005) | 2 lines

>> include \n in whitespace ignored for R-level gettext
>> ------------------------------------------------------------------------

>> I think this has been because simultaneously we had started to
>> emphasize to useRs they should *not* end message/format strings
>> in stop() / warning() by a new line, but rather stop() and
>> warning() would *add* the newline(s) themselves.

>> Still, currently we have a few such cases in R-base.pot,
>> but just these few and maybe they really are "in error", in the
>> sense we could drop the ending '\n' (and do the same in all the *.po files!),
>> and newlines would be appended later {{not just by RStudio which
>> graciously adds final newlines in its R console, even for say
>> cat("abc") }}

>> However, this is quite different for all the message strings from C, as
>> used there in error() or warn() e.g., and so in R.pot
>> we see many many msg strings ending in "\n" (which must then
>> also be in the *.po files).

>> My current conclusion is we should try simplifying the
>> do_gettext() code and *not* remove and re-add the '\n' (nor the
>> '\t' I think ...)
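The newline rule quoted from the GNU gettext FAQ above can be sketched as a small consistency check. This is an illustration only, not msgfmt's implementation: newline_rule_ok() is a hypothetical helper, it looks only at the trailing "\n" that the quoted FAQ answer talks about, and it does not special-case untranslated (empty) msgstr entries, which a real checker would presumably skip.

```c
#include <stdbool.h>
#include <string.h>

/* True when s is non-empty and its last character is '\n'. */
static bool ends_with_newline(const char *s)
{
    size_t n = strlen(s);
    return n > 0 && s[n - 1] == '\n';
}

/* Hypothetical helper: the rule holds when msgid and msgstr either
   both end in '\n' or both do not. */
bool newline_rule_ok(const char *msgid, const char *msgstr)
{
    return ends_with_newline(msgid) == ends_with_newline(msgstr);
}
```

For the entry discussed in this thread, newline_rule_ok("Execution halted\n", "Ausführung angehalten\n") holds, which is why the msgid strings ending in "\n" in R.pot force the same ending in every translated *.po entry.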
> After such a change, I indeed do see

> $ LANGUAGE=de bin/Rscript --vanilla -e 'gettext("Execution halted\n", domain="R")'
> [1] "Ausführung angehalten\n"
> $ LANGUAGE=de bin/Rscript --vanilla -e 'message("Execution halted\n", domain="R")'
> Ausführung angehalten
> $ LANGUAGE=de bin/Rscript --vanilla -e 'warning("Execution halted\n", domain="R")'
> Warnmeldung:
> Ausführung angehalten
> $

> (note the extra newline after the German translation!)
> whereas before, not only did using gettext() directly not work,
> but messages passed to warning() or message() {with or without
> trailing \n} were never translated either.

> ... and my simple #ifdef .. #endif change around the head/tail
> save and restore seems to pass make check-devel ...

> so I will be looking into dropping all those "head" and "tail" add
> and remove parts in do_gettext() as they really seem to harm given the current
> translation databases which indeed *are* full of final '\n' in
> `msgid` and corresponding translated `msgstr` ....

> So, no need for a bugzilla PR nor a patch, please.
> Maybe further examples which add something interesting in
> addition to the ones we have here.

> Thank you again, Henrik, Duncan, and Tomas!
> Martin

> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel