On Mon, Jun 21, 2021 at 02:47:16PM +0900, Tatsuo Ishii wrote:
> > I got the parse error after applying the patch:
> >
> > release-14.sgml:3562: parser error : Input is not proper UTF-8,
> > indicate encoding !
> > Bytes: 0xE9 0x20 0x53 0x61
> > (Juan José Santamaría Flecha)
> > ^
> >
> > Is that a problem with my environment?
>
> Me too. I think the problem is, Bruce's patch is encoded in
> ISO-8859-1, not UTF-8. As far as I know PostgreSQL never encodes
> *.sgml files in ISO-8859-1. Anyway, attached is Bruce's patch
> encoded in UTF-8. This works for me.
>
> My guess is that when Bruce attached the file, his MUA automatically
> changed the file encoding from UTF-8 to ISO-8859-1 (this can happen in
> many MUAs). That would also explain why he does not see the problem
> while compiling the sgml files: in his environment release-14.sgml is
> encoded in UTF-8, I guess. To prevent the problem next time, it's
> better to change the MIME type of the attached file to
> Application/Octet-Stream.

Oh, people were testing by building from the attached patch, not from
the git tree. Yes, I see now the email was switched to a single-byte
encoding, and the attachment header confirms it:

    Content-Type: text/x-diff; charset=iso-8859-1
                                       ----------
    Content-Disposition: attachment; filename="master.diff"
    Content-Transfer-Encoding: 8bit

I guess my email program, mutt, is trying to be helpful by using a
single-byte encoding when UTF-8 is not necessary, which makes sense.
I will try to remember that this can cause problems with SGML
attachments.
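
For anyone who wants to check a saved attachment before building from
it, here is a minimal Python sketch of the failure and the fix. It is
only an illustration, not something from the tree: the helper names
is_valid_utf8 and latin1_to_utf8 are made up, and the sample bytes are
the four reported in the parser error above.

```python
# The four bytes from the parser error: "é Sa" encoded as ISO-8859-1.
LATIN1_BYTES = bytes([0xE9, 0x20, 0x53, 0x61])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (hypothetical helper)."""
    return data.decode("iso-8859-1").encode("utf-8")


fixed = latin1_to_utf8(LATIN1_BYTES)

print(is_valid_utf8(LATIN1_BYTES))  # False: 0xE9 must be followed by continuation bytes
print(is_valid_utf8(fixed))         # True
print(fixed)                        # b'\xc3\xa9 Sa'
```

The same two steps should apply to a whole patch file read in binary
mode, assuming the file really is ISO-8859-1 and not some other
single-byte encoding.
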
--
Bruce Momjian <[email protected]> https://momjian.us
EDB https://enterprisedb.com
If only the physical world exists, free will is an illusion.