Bug#1034332: Occasional garbled Chinese pdf lines

2023-04-14 Thread Dan Jacobson
OK, I now installed fonts-droid-fallback. Alas, no change.



Bug#1034332: Occasional garbled Chinese pdf lines

2023-04-14 Thread Jonas Smedegaard
Do you have fonts-droid-fallback installed?  If not, please try installing
it and see if it improves matters.


 - Jonas

-- 
 * Jonas Smedegaard - idealist & Internet-arkitekt
 * Tlf.: +45 40843136  Website: http://dr.jones.dk/
 * Sponsorship: https://ko-fi.com/drjones

 [x] quote me freely  [ ] ask before reusing  [ ] keep private



Bug#1034332: Occasional garbled Chinese pdf lines

2023-04-13 Thread Dan Jacobson
sh /tmp/text.sh
2 n.txt
C.UTF-8:2
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
XEVJPT+WenQuanYiZenHei               CID TrueType      Identity-H       yes yes yes      6  0
MYDSFU+LiberationSerif               TrueType          WinAnsi          yes yes yes      7  0
zh_CN.UTF-8:5
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
HFTSNP+WenQuanYiZenHeiMono           TrueType          WinAnsi          yes yes yes      6  0
MYDSFU+LiberationSerif               TrueType          WinAnsi          yes yes yes      7  0
zh_TW.UTF-8:5
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
HFTSNP+WenQuanYiZenHeiMono           TrueType          WinAnsi          yes yes yes      6  0
MYDSFU+LiberationSerif               TrueType          WinAnsi          yes yes yes      7  0

> "JS" == Jonas Smedegaard  writes:

from

set -e
cd /tmp
t=n.txt
LC_ALL=C.UTF-8 echo 郵編123 > $t
echo >> $t
wc -l $t
for l in C.UTF-8 zh_CN.UTF-8 zh_TW.UTF-8
do
printf $l:\\t
LC_ALL=$l abiword --to=pdf $t
LC_ALL=C.UTF-8 pdftotext -nopgbrk n.pdf -|wc -l
pdffonts n.pdf
done
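
The pdffonts tables above differ mainly in the CJK font row: under C.UTF-8
it is a CID TrueType font with Identity-H encoding, while under the zh_*
locales it is a plain TrueType font with WinAnsi encoding. As a rough
sketch, that comparison could be automated by pulling the name and encoding
columns out of pdffonts output. The helper name font_encodings is made up
for illustration, the sample rows are copied from the C.UTF-8 run above, and
whether the encoding difference actually causes the garbling is not
established in this thread.

```shell
#!/bin/sh
# font_encodings (hypothetical helper): read pdffonts output on stdin and
# print "name encoding" for each font row.
# The type column may contain a space ("CID TrueType"), so the encoding is
# located by counting from the right: encoding emb sub uni object ID.
font_encodings() {
    awk 'NR > 2 && NF >= 8 { print $1, $(NF - 5) }'
}

# Sample rows copied from the C.UTF-8 run in this thread.
sample='name type encoding emb sub uni object ID
---- ---- -------- --- --- --- ---------
XEVJPT+WenQuanYiZenHei CID TrueType Identity-H yes yes yes 6 0
MYDSFU+LiberationSerif TrueType WinAnsi yes yes yes 7 0'

printf '%s\n' "$sample" | font_encodings
```

With a real file this would be run as "pdffonts n.pdf | font_encodings",
once per locale, and the results compared.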



n.pdf
Description: Adobe PDF document


Bug#1034332: Occasional garbled Chinese pdf lines

2023-04-13 Thread Jonas Smedegaard
Quoting Dan Jacobson (2023-04-13 22:13:23)
> I get
> 2 n.txt
> C.UTF-8:2
> zh_CN.UTF-8:5
> zh_TW.UTF-8:5

Try replacing this line in your script:

echo 郵編123 > $t

with this:

LC_ALL=C.UTF-8 echo 郵編123 > $t

Otherwise the contents of the text file depend on the locale of the
testing environment at *creation* time, which then affects conversion.


 - Jonas




Bug#1034332: Occasional garbled Chinese pdf lines

2023-04-13 Thread Dan Jacobson
set -e
cd /tmp
t=n.txt
echo 郵編123 > $t
echo >> $t
wc -l $t
for l in C.UTF-8 zh_CN.UTF-8 zh_TW.UTF-8
do
printf $l:\\t
LC_ALL=$l abiword --to=pdf $t
LC_ALL=C.UTF-8 pdftotext -nopgbrk n.pdf -|wc -l
#pdffonts n.pdf
done
I get
2 n.txt
C.UTF-8:2
zh_CN.UTF-8:5
zh_TW.UTF-8:5


Bug#1034332: Occasional garbled Chinese pdf lines

2023-04-13 Thread Jonas Smedegaard
Quoting Jonas Smedegaard (2023-04-13 13:13:34)
> Control: tag -1 + unreproducible moreinfo
> 
> Hi Jidanni,
> 
> Quoting Dan Jacobson (2023-04-13 02:56:11)
> > Here we see that there is a bug in the pdf creator:
> > { echo 哈哈 郵編123 哈哈; echo 郵編123;} > /tmp/n.txt
> > abiword --to=pdf /tmp/n.txt
> > When viewing the resultant pdf, one line is garbled.
> > 
> > But if I use
> > LC_ALL=C abiword --to=pdf /tmp/n.txt
> > then all worked fine.
> 
> Sorry, I cannot reproduce this, neither when consistently using my
> personal locale da_DK.UTF-8 nor when consistently using zh_TW.UTF-8.
> 
> Are the contents of the shell-generated plaintext file identical to a
> plaintext file generated while having LC_ALL=C.UTF-8 ?
> 
> Does abiword produce garbled or correct PDF if instead of (your complex
> personal settings or) C the locale is generally set to C.UTF-8?
> 
> If the bug is only reproducible on your system, then it is highly
> unlikely to be addressed upstream (or by custom-patching in Debian).
> So I recommend that you try to create a minimal reproducible test, e.g.
> a shell script that, when executed in a pristine Debian system account
> with locale C.UTF-8 (i.e. where any unusual locales are exported within
> the shell script, assuming only that locales are generated on the host),
> reproduces the problem.
> 
> > I didn't see anything on the abiword man page about controlling what
> > fonts get used.
> 
> Smells independent from the above reported bug, in that changes to
> locale are unlikely to involve changes to fonts (and you did not mention
> above any changes to fonts either).
> 
> This might help: http://abiword.com/help/en-US/howto/howtonormaltemplate.html
> 
> Please file separate bugreports for each issue.

Oh, this might be related after all: looking below
/usr/share/abiword-3.0/templates there are locale-specific normal.awt-*
files, which I guess means that an unusual locale would cause the file
normal.awt to be used, and that file might be rarely tested because
it is commonly shadowed by a locale-specific file.
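
To see which locale-specific templates are installed, and therefore which
locales might fall through to plain normal.awt, something like the following
could be used. This is only a sketch: the directory is the one named above,
list_locale_templates is a made-up helper name, and how AbiWord actually
maps a locale to a template file is an assumption that is not verified here.

```shell
#!/bin/sh
# list_locale_templates (hypothetical helper): print the locale suffixes of
# the normal.awt-* files found in a template directory.
list_locale_templates() {
    dir=${1:-/usr/share/abiword-3.0/templates}
    for f in "$dir"/normal.awt-*; do
        # If nothing matches, the pattern stays literal; skip it.
        [ -e "$f" ] || continue
        name=${f##*/}
        printf '%s\n' "${name#normal.awt-}"
    done
}

# Prints nothing if the directory or the templates are absent.
list_locale_templates
```

Comparing that list against the locale in use shows whether a
locale-specific template or the generic one is the likely candidate.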

Hope that helps.

 - Jonas




Bug#1034332: Occasional garbled Chinese pdf lines

2023-04-13 Thread Jonas Smedegaard
Control: tag -1 + unreproducible moreinfo

Hi Jidanni,

Quoting Dan Jacobson (2023-04-13 02:56:11)
> Here we see that there is a bug in the pdf creator:
> { echo 哈哈 郵編123 哈哈; echo 郵編123;} > /tmp/n.txt
> abiword --to=pdf /tmp/n.txt
> When viewing the resultant pdf, one line is garbled.
> 
> But if I use
> LC_ALL=C abiword --to=pdf /tmp/n.txt
> then all worked fine.

Sorry, I cannot reproduce this, neither when consistently using my
personal locale da_DK.UTF-8 nor when consistently using zh_TW.UTF-8.

Are the contents of the shell-generated plaintext file identical to a
plaintext file generated while having LC_ALL=C.UTF-8 ?

Does abiword produce garbled or correct PDF if instead of (your complex
personal settings or) C the locale is generally set to C.UTF-8?
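
One way to answer the question about the plaintext file is to write it under
both locales and compare the bytes. A minimal sketch, assuming the
zh_TW.UTF-8 locale from the report; write_sample is a made-up helper name.
Since echo simply copies its argument bytes, the two files would be expected
to match, in which case the file creation step can be ruled out.

```shell
#!/bin/sh
# write_sample (hypothetical helper): write the test line to file $2 from a
# child shell running with LC_ALL set to $1.
write_sample() {
    LC_ALL=$1 sh -c 'echo "郵編123" > "$1"' sh "$2" 2>/dev/null
}

dir=$(mktemp -d)
write_sample C.UTF-8 "$dir/c.txt"
write_sample zh_TW.UTF-8 "$dir/tw.txt"

# cmp -s is silent and only sets the exit status.
if cmp -s "$dir/c.txt" "$dir/tw.txt"; then
    result=identical
else
    result=different
fi
echo "files are $result"

# Hex dump of the bytes actually written, for a visual check.
od -An -tx1 "$dir/c.txt"
rm -rf "$dir"
```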

If the bug is only reproducible on your system, then it is highly
unlikely to be addressed upstream (or by custom-patching in Debian).
So I recommend that you try to create a minimal reproducible test, e.g.
a shell script that, when executed in a pristine Debian system account
with locale C.UTF-8 (i.e. where any unusual locales are exported within
the shell script, assuming only that locales are generated on the host),
reproduces the problem.
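
Such a script could start by checking that the locales it exports are really
generated on the host, since a missing locale would silently change what is
being tested. A small sketch of that check; locale_in_list and normalize are
made-up helper names, and the locale names are the ones used in this bug.

```shell
#!/bin/sh
# `locale -a` may spell the codeset "utf8" while scripts say "UTF-8", so
# names are lowercased and stripped of hyphens before comparing.
normalize() {
    tr '[:upper:]' '[:lower:]' | tr -d '-'
}

# locale_in_list (hypothetical helper): read a list of locale names on stdin
# (as printed by `locale -a`) and succeed if $1 is among them.
locale_in_list() {
    want=$(printf '%s\n' "$1" | normalize)
    normalize | grep -qxF "$want"
}

for l in C.UTF-8 zh_CN.UTF-8 zh_TW.UTF-8; do
    if locale -a 2>/dev/null | locale_in_list "$l"; then
        echo "$l: generated"
    else
        echo "$l: not generated on this host"
    fi
done
```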

> I didn't see anything on the abiword man page about controlling what
> fonts get used.

Smells independent from the above reported bug, in that changes to
locale are unlikely to involve changes to fonts (and you did not mention
above any changes to fonts either).

This might help: http://abiword.com/help/en-US/howto/howtonormaltemplate.html

Please file separate bugreports for each issue.

Kind regards,

 - Jonas




Bug#1034332: Occasional garbled Chinese pdf lines

2023-04-13 Thread Dan Jacobson
Package: abiword
Version: 3.0.5~dfsg-3.2
File: /usr/bin/abiword

Here we see that there is a bug in the pdf creator:
{ echo 哈哈 郵編123 哈哈; echo 郵編123;} > /tmp/n.txt
abiword --to=pdf /tmp/n.txt
When viewing the resultant pdf, one line is garbled.

But if I use
LC_ALL=C abiword --to=pdf /tmp/n.txt
then all worked fine.

OK, what was my
$ locale?
LANG=zh_TW.UTF-8
LANGUAGE=en_US:en
LC_CTYPE=zh_TW.UTF-8
LC_NUMERIC="zh_TW.UTF-8"
LC_TIME="zh_TW.UTF-8"
LC_COLLATE=C
LC_MONETARY="zh_TW.UTF-8"
LC_MESSAGES=C
LC_PAPER="zh_TW.UTF-8"
LC_NAME="zh_TW.UTF-8"
LC_ADDRESS="zh_TW.UTF-8"
LC_TELEPHONE="zh_TW.UTF-8"
LC_MEASUREMENT="zh_TW.UTF-8"
LC_IDENTIFICATION="zh_TW.UTF-8"
LC_ALL=
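
With a mixed setup like this it helps to recall the precedence that decides
which locale a category ends up with: a non-empty LC_ALL wins, then the
category's own variable, then LANG. A small sketch of that rule for LC_CTYPE
follows; effective_ctype is a made-up name, and the final fallback to C when
nothing is set is an assumption about typical systems.

```shell
#!/bin/sh
# effective_ctype (hypothetical helper): given the values of LC_ALL,
# LC_CTYPE and LANG (in that order), print the one that takes effect.
effective_ctype() {
    for v in "$1" "$2" "$3"; do
        if [ -n "$v" ]; then
            printf '%s\n' "$v"
            return 0
        fi
    done
    # Nothing set: assume the implementation default is "C".
    printf 'C\n'
}

# Values from the locale output above: LC_ALL is empty, so LC_CTYPE decides.
effective_ctype "" "zh_TW.UTF-8" "zh_TW.UTF-8"
# With LC_ALL=C on the command line, LC_ALL overrides everything else.
effective_ctype "C" "zh_TW.UTF-8" "zh_TW.UTF-8"
```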

So sure, always use LC_ALL=C ... but some other characters still get
messed up. But I would have to send you my whole secret letter to
reproduce it. So just take my word for it.

And using LC_ALL=zh_CN.UTF-8 didn't help.

Versions of packages abiword recommends:
pn  abiword-plugin-grammar
ii  aspell-en [aspell-dictionary]  2020.12.07-0-1
ii  fonts-liberation               1:1.07.4-11
ii  poppler-utils                  22.12.0-2+b1

Yes, pdffonts will show the fonts used.
I didn't see anything on the abiword man page about controlling what fonts
get used.