Bug#1034332: Occasional garbled Chinese pdf lines
OK, I now installed fonts-droid-fallback. Alas, no change.
Bug#1034332: Occasional garbled Chinese pdf lines
Do you have fonts-droid-fallback installed? If not, please try installing it and see if it improves matters.

 - Jonas

-- 
 * Jonas Smedegaard - idealist & Internet-arkitekt
 * Tlf.: +45 40843136  Website: http://dr.jones.dk/
 * Sponsorship: https://ko-fi.com/drjones

 [x] quote me freely  [ ] ask before reusing  [ ] keep private
Bug#1034332: Occasional garbled Chinese pdf lines
sh /tmp/text.sh
2 n.txt
C.UTF-8:2
name                           type         encoding    emb sub uni object ID
------------------------------ ------------ ----------- --- --- --- ---------
XEVJPT+WenQuanYiZenHei         CID TrueType Identity-H  yes yes yes      6  0
MYDSFU+LiberationSerif         TrueType     WinAnsi     yes yes yes      7  0
zh_CN.UTF-8:5
name                           type         encoding    emb sub uni object ID
------------------------------ ------------ ----------- --- --- --- ---------
HFTSNP+WenQuanYiZenHeiMono     TrueType     WinAnsi     yes yes yes      6  0
MYDSFU+LiberationSerif         TrueType     WinAnsi     yes yes yes      7  0
zh_TW.UTF-8:5
name                           type         encoding    emb sub uni object ID
------------------------------ ------------ ----------- --- --- --- ---------
HFTSNP+WenQuanYiZenHeiMono     TrueType     WinAnsi     yes yes yes      6  0
MYDSFU+LiberationSerif         TrueType     WinAnsi     yes yes yes      7  0

> "JS" == Jonas Smedegaard writes:

from

    set -e
    cd /tmp
    t=n.txt
    LC_ALL=C.UTF-8 echo 郵編123 > $t
    echo >> $t
    wc -l $t
    for l in C.UTF-8 zh_CN.UTF-8 zh_TW.UTF-8
    do printf $l:\\t
       LC_ALL=$l abiword --to=pdf $t
       LC_ALL=C.UTF-8 pdftotext -nopgbrk n.pdf -|wc -l
       pdffonts n.pdf
    done

Attachment: n.pdf (Adobe PDF document)
Bug#1034332: Occasional garbled Chinese pdf lines
Quoting Dan Jacobson (2023-04-13 22:13:23)
> I get
> 2 n.txt
> C.UTF-8:2
> zh_CN.UTF-8:5
> zh_TW.UTF-8:5

Try replacing this line in your script:

    echo 郵編123 > $t

with this:

    LC_ALL=C.UTF-8 echo 郵編123 > $t

Otherwise the script depends on the locale of the testing environment for *creation* of the text, which then affects conversion.

 - Jonas
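One way to take the locale of the testing environment out of the text creation step entirely is to write the file from explicit byte escapes instead of typed characters. The following is only a sketch of that idea: make_testfile is a hypothetical helper name, and the octal escapes are simply the UTF-8 encoding of the two characters used in the test line.

```shell
#!/bin/sh
# Sketch only: make_testfile is a hypothetical helper, not part of abiword
# or of the script in this thread.
#
# Write the two-line test file from explicit UTF-8 byte escapes, so the
# bytes do not depend on the locale or terminal the script is started from.
make_testfile() {
    # \351\203\265 = 郵 (U+90F5), \347\267\250 = 編 (U+7DE8), then "123",
    # a newline, and one empty line, matching the script's two echo calls.
    printf '\351\203\265\347\267\250123\n\n' > "$1"
}

# Usage:
#   make_testfile /tmp/n.txt
#   od -An -tx1 /tmp/n.txt    # inspect the bytes before converting
```

Comparing the od output between two machines would show whether both are feeding identical input to the converter.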
Bug#1034332: Occasional garbled Chinese pdf lines
    set -e
    cd /tmp
    t=n.txt
    echo 郵編123 > $t
    echo >> $t
    wc -l $t
    for l in C.UTF-8 zh_CN.UTF-8 zh_TW.UTF-8
    do printf $l:\\t
       LC_ALL=$l abiword --to=pdf $t
       LC_ALL=C.UTF-8 pdftotext -nopgbrk n.pdf -|wc -l
       #pdffonts n.pdf
    done

I get

    2 n.txt
    C.UTF-8:2
    zh_CN.UTF-8:5
    zh_TW.UTF-8:5
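The signal in the output above is that the text extracted from the PDF has more lines (5) than the input file (2) under some locales. A small filter can pick out those locales automatically. This is a sketch under assumptions: flag_mismatches is a hypothetical helper name, and it expects the "locale:count" lines in the form printed by the loop above, with or without a tab after the colon.

```shell
#!/bin/sh
# Sketch only: flag_mismatches is a hypothetical helper.
#
# flag_mismatches EXPECTED
# Read "locale:count" lines on stdin and print every locale whose
# extracted line count differs from EXPECTED.
flag_mismatches() {
    expected=$1
    tab=$(printf '\t')
    # Split on the colon and on any space or tab that follows it.
    while IFS=": $tab" read -r loc count; do
        [ "$count" -eq "$expected" ] || printf '%s\n' "$loc"
    done
}

# Usage, with the numbers reported above:
#   printf 'C.UTF-8:2\nzh_CN.UTF-8:5\nzh_TW.UTF-8:5\n' | flag_mismatches 2
```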
Bug#1034332: Occasional garbled Chinese pdf lines
Quoting Jonas Smedegaard (2023-04-13 13:13:34)
> Control: tag -1 + unreproducible moreinfo
>
> Hi Jidanni,
>
> Quoting Dan Jacobson (2023-04-13 02:56:11)
> > Here we see that there is a bug in the pdf creator:
> > { echo 哈哈 郵編123 哈哈; echo 郵編123;} > /tmp/n.txt
> > abiword --to=pdf /tmp/n.txt
> > When viewing the resultant pdf, one line is garbled.
> >
> > But if I use
> > LC_ALL=C abiword --to=pdf /tmp/n.txt
> > then all worked fine.
>
> Sorry, I cannot reproduce, neither consistently using my personal locale
> da_DK.UTF-8 nor consistently using zh_TW.UTF-8.
>
> Are the contents of the shell-generated plaintext file identical to a
> plaintext file generated while having LC_ALL=C.UTF-8 ?
>
> Does abiword produce garbled or correct PDF if instead of (your complex
> personal settings or) C the locale is generally set to C.UTF-8?
>
> If the bug is only reproducible on your system, then it is highly
> unlikely to be addressed upstream (nor by custom-patching in Debian).
> So I recommend that you try to create a minimally reproducible test - e.g.
> a shell script that when executed in a pristine Debian system account
> with locale C.UTF-8 (i.e. where any unusual locales are exported within
> the shell script, assuming only that locales are generated on the host)
> reproduces the problem.
>
> > I didn't see anything on the abiword man page about controlling what fonts
> > get used.
>
> Smells independent from the above reported bug, in that changes to
> locale are unlikely to involve changes to fonts (and you did not mention
> above any changes to fonts either).
>
> This might help: http://abiword.com/help/en-US/howto/howtonormaltemplate.html
>
> Please file separate bugreports for each issue.

Ohh, might be related after all: Looking below /usr/share/abiword-3.0/templates there are locale-specific normal.awt-* files, which I guess means that an unusual locale would cause the file normal.awt to be used, and that that file might be rarely tested because commonly it is shadowed by a locale-specific file.

Hope that helps.

 - Jonas
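To check the template guess above on a given system, one could look at which file under the templates directory matches the active locale. The sketch below is only an illustration of that guess: template_for_locale is a hypothetical helper, and the selection rule it encodes (prefer normal.awt-<language>_<TERRITORY>, otherwise fall back to normal.awt) is an assumption, not AbiWord's documented lookup order.

```shell
#!/bin/sh
# Sketch only: template_for_locale is a hypothetical helper, and the rule
# below is an ASSUMED approximation of how a template might be chosen.
#
# template_for_locale DIR LOCALE
# Print DIR/normal.awt-<lang>_<TERRITORY> if it exists, else DIR/normal.awt.
template_for_locale() {
    dir=$1
    # Strip codeset and modifier: zh_TW.UTF-8 -> zh_TW
    base=${2%%.*}
    base=${base%%@*}
    if [ -f "$dir/normal.awt-$base" ]; then
        printf '%s\n' "$dir/normal.awt-$base"
    else
        printf '%s\n' "$dir/normal.awt"
    fi
}

# Usage:
#   template_for_locale /usr/share/abiword-3.0/templates zh_TW.UTF-8
#   template_for_locale /usr/share/abiword-3.0/templates C.UTF-8
```

If the two calls print different files, comparing those files would be a reasonable next step.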
Bug#1034332: Occasional garbled Chinese pdf lines
Control: tag -1 + unreproducible moreinfo

Hi Jidanni,

Quoting Dan Jacobson (2023-04-13 02:56:11)
> Here we see that there is a bug in the pdf creator:
> { echo 哈哈 郵編123 哈哈; echo 郵編123;} > /tmp/n.txt
> abiword --to=pdf /tmp/n.txt
> When viewing the resultant pdf, one line is garbled.
>
> But if I use
> LC_ALL=C abiword --to=pdf /tmp/n.txt
> then all worked fine.

Sorry, I cannot reproduce, neither consistently using my personal locale da_DK.UTF-8 nor consistently using zh_TW.UTF-8.

Are the contents of the shell-generated plaintext file identical to a plaintext file generated while having LC_ALL=C.UTF-8 ?

Does abiword produce garbled or correct PDF if instead of (your complex personal settings or) C the locale is generally set to C.UTF-8?

If the bug is only reproducible on your system, then it is highly unlikely to be addressed upstream (nor by custom-patching in Debian). So I recommend that you try to create a minimally reproducible test - e.g. a shell script that when executed in a pristine Debian system account with locale C.UTF-8 (i.e. where any unusual locales are exported within the shell script, assuming only that locales are generated on the host) reproduces the problem.

> I didn't see anything on the abiword man page about controlling what fonts get
> used.

Smells independent from the above reported bug, in that changes to locale are unlikely to involve changes to fonts (and you did not mention above any changes to fonts either).

This might help: http://abiword.com/help/en-US/howto/howtonormaltemplate.html

Please file separate bugreports for each issue.

Kind regards,

 - Jonas
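The "pristine account" condition can be approximated from an existing account by clearing the environment before each run, so that only the locale named in the script is in effect. This is a sketch under assumptions: run_clean is a hypothetical helper name, and a program may need further variables passed through than the three set here.

```shell
#!/bin/sh
# Sketch only: run_clean is a hypothetical helper.
#
# run_clean LOCALE COMMAND [ARGS...]
# Run COMMAND with an otherwise empty environment in which only PATH,
# HOME and LC_ALL are set, so personal LANG/LANGUAGE/LC_* settings
# cannot influence the result.
run_clean() {
    locale_name=$1
    shift
    env -i PATH="$PATH" HOME="$HOME" LC_ALL="$locale_name" "$@"
}

# Usage (assumes abiword is installed and the locale is generated):
#   run_clean C.UTF-8     abiword --to=pdf /tmp/n.txt
#   run_clean zh_TW.UTF-8 abiword --to=pdf /tmp/n.txt
```

Running the same conversion through this wrapper under each locale keeps the comparison limited to the one variable being tested.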
Bug#1034332: Occasional garbled Chinese pdf lines
Package: abiword
Version: 3.0.5~dfsg-3.2
File: /usr/bin/abiword

Here we see that there is a bug in the pdf creator:

    { echo 哈哈 郵編123 哈哈; echo 郵編123;} > /tmp/n.txt
    abiword --to=pdf /tmp/n.txt

When viewing the resultant pdf, one line is garbled.

But if I use

    LC_ALL=C abiword --to=pdf /tmp/n.txt

then all worked fine. OK, what was my locale?

    $ locale
    LANG=zh_TW.UTF-8
    LANGUAGE=en_US:en
    LC_CTYPE=zh_TW.UTF-8
    LC_NUMERIC="zh_TW.UTF-8"
    LC_TIME="zh_TW.UTF-8"
    LC_COLLATE=C
    LC_MONETARY="zh_TW.UTF-8"
    LC_MESSAGES=C
    LC_PAPER="zh_TW.UTF-8"
    LC_NAME="zh_TW.UTF-8"
    LC_ADDRESS="zh_TW.UTF-8"
    LC_TELEPHONE="zh_TW.UTF-8"
    LC_MEASUREMENT="zh_TW.UTF-8"
    LC_IDENTIFICATION="zh_TW.UTF-8"
    LC_ALL=

So sure, always use LC_ALL=C ... but some other characters still get messed up. But I would have to send you my whole secret letter to reproduce it. So just take my word for it. And using LC_ALL=zh_CN.UTF-8 didn't help.

Versions of packages abiword recommends:
pn  abiword-plugin-grammar
ii  aspell-en [aspell-dictionary]  2020.12.07-0-1
ii  fonts-liberation               1:1.07.4-11
ii  poppler-utils                  22.12.0-2+b1

Yes, pdffonts will show the fonts used. I didn't see anything on the abiword man page about controlling what fonts get used.