On Tue, 01 Oct 2024 10:33:50 +0900 (JST) Tatsuo Ishii <is...@postgresql.org> wrote:
> >> That's because non-breaking space (nbsp) is not encoded as 0xa0 in
> >> UTF-8. nbsp in UTF-8 is "0xc2 0xa0" (2 bytes) (A 0xa0 is a nbsp's code
> >> point in Unicode. i.e. U+00A0).
> >> So grep -P "[\xC2\xA0]" should work to detect nbsp.
> >
> > `LC_ALL=C grep -P "\xC2\xA0"` works for my environment.
> > ([ and ] were not necessary.)
> >
> > When LC_ALL is null, `grep -P "\xA0"` could not detect any characters in
> > charset.sgml,
> > but I think it is better to specify both LC_ALL=C and "\xC2\xA0" for making
> > sure detecting
> > nbsp.
> >
> > One problem is that -P option can be used in only GNU grep, and grep in mac
> > doesn't support it.
> >
> > On bash, we can also use `grep $'\xc2\xa0'`, but I am not sure we can
> > assume the shell is bash.
> >
> > Maybe, better way is use perl itself rather than grep as following.
> >
> > `perl -ne '/\xC2\xA0/ and print' `
> >
> > I attached a patch fixed in this way.
>
> GNU sed can also be used without setting LC_ALL:
>
> sed -n /"\xC2\xA0"/p
>
> However I am not sure if non-GNU sed can do this too...

Although I haven't checked it myself, BSD sed doesn't support the \x escape, according to [1].

[1] https://stackoverflow.com/questions/24275070/sed-not-giving-me-correct-substitute-operation-for-newline-with-mac-difference

By the way, I've attached a slightly modified patch that uses the plural form in the error message, the same as check-tabs:

 Non-breaking **spaces** appear in SGML/XML files

Regards,
Yugo Nagata

>
> Best reagards,
> --
> Tatsuo Ishii
> SRA OSS K.K.
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp

-- 
Yugo NAGATA <nag...@sraoss.co.jp>
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index 9c9bbfe375..17feae9ed0 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -194,7 +194,7 @@ MAKEINFO = makeinfo
 ##
 
 # Quick syntax check without style processing
-check: postgres.sgml $(ALLSGML) check-tabs
+check: postgres.sgml $(ALLSGML) check-tabs check-nbsp
 	$(XMLLINT) $(XMLINCLUDE) --noout --valid $<
 
 
@@ -259,6 +259,9 @@ endif # sqlmansectnum != 7
 check-tabs:
 	@( ! grep '	' $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || (echo "Tabs appear in SGML/XML files" 1>&2; exit 1)
 
+check-nbsp:
+	@( ! $(PERL) -ne '/\xC2\xA0/ and print "$$ARGV $$_"' $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || (echo "Non-breaking spaces appear in SGML/XML files" 1>&2; exit 1)
+
 ##
 ## Clean
 ##