On Tue, 1 Oct 2024 15:16:52 +0900
Yugo NAGATA <nag...@sraoss.co.jp> wrote:

> On Tue, 01 Oct 2024 10:33:50 +0900 (JST)
> Tatsuo Ishii <is...@postgresql.org> wrote:
> 
> > > >> That's because non-breaking space (nbsp) is not encoded as 0xa0 in
> > > >> UTF-8. nbsp in UTF-8 is "0xc2 0xa0" (2 bytes). (0xa0 is nbsp's code
> > > >> point in Unicode, i.e. U+00A0.)
> > >> So grep -P "[\xC2\xA0]" should work to detect nbsp.
> > > 
> > > > `LC_ALL=C grep -P "\xC2\xA0"` works in my environment. 
> > > ([ and ] were not necessary.)
> > > 
> > > > When LC_ALL is unset, `grep -P "\xA0"` could not detect any characters in
> > > > charset.sgml, but I think it is better to specify both LC_ALL=C and
> > > > "\xC2\xA0" to make sure nbsp is detected.
> > > 
> > > > One problem is that the -P option is available only in GNU grep; the grep
> > > > shipped with macOS doesn't support it.
> > > > 
> > > > In bash, we can also use `grep $'\xc2\xa0'`, but I am not sure we can
> > > > assume the shell is bash.
> > > > 
> > > > Maybe a better way is to use perl itself rather than grep, as follows.
> > > 
> > >  `perl -ne '/\xC2\xA0/ and print' `
> > > 
> > > > I attached a patch that fixes it this way.
> > 
> > GNU sed can also be used without setting LC_ALL:
> > 
> > sed -n /"\xC2\xA0"/p
> > 
> > However I am not sure if non-GNU sed can do this too...
> 
> Although I've not checked it myself, BSD sed doesn't support the \x escape
> according to [1].
> 
> [1] 
> https://stackoverflow.com/questions/24275070/sed-not-giving-me-correct-substitute-operation-for-newline-with-mac-difference
> 
> By the way, I've attached a slightly modified patch that uses the plural
> form in the message, the same as check-tabs does.
> 
>  Non-breaking **spaces** appear in SGML/XML files

The previous patch was broken because the perl command failed to return the
correct result. I've attached an updated patch to fix the return value.
In passing, I added line breaks for long lines.

Regards,
Yugo Nagata

-- 
Yugo Nagata <nag...@sraoss.co.jp>
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index 9c9bbfe375..e5607585af 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -194,7 +194,7 @@ MAKEINFO = makeinfo
 ##
 
 # Quick syntax check without style processing
-check: postgres.sgml $(ALLSGML) check-tabs
+check: postgres.sgml $(ALLSGML) check-tabs check-nbsp
 	$(XMLLINT) $(XMLINCLUDE) --noout --valid $<
 
 
@@ -255,9 +255,15 @@ clean-man:
 
 endif # sqlmansectnum != 7
 
-# tabs are harmless, but it is best to avoid them in SGML files
+# tabs and non-breaking spaces are harmless, but it is best to avoid them in SGML files
 check-tabs:
-	@( ! grep '	' $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || (echo "Tabs appear in SGML/XML files" 1>&2;  exit 1)
+	@( ! grep '	' $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || \
+	(echo "Tabs appear in SGML/XML files" 1>&2;  exit 1)
+
+check-nbsp:
+	@ ( $(PERL) -ne '/\xC2\xA0/ and print("$$ARGV:$$_"),$$n++; END {exit($$n>0)}' \
+	  $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || \
+	(echo "Non-breaking spaces appear in SGML/XML files" 1>&2;  exit 1)
 
 ##
 ## Clean
