Andrew Dunstan <and...@dunslane.net> writes:

> At
> <https://www.postgresql.org/message-id/543620.1629899413%40sss.pgh.pa.us>
> Tom noted:
>
>> You have to be very careful these days when applying stale patches to
>> func.sgml --- there's enough duplicate boilerplate that "patch' can easily
>> be fooled into dumping an addition into the wrong place. 
>
> This is yet another indication to me that there's probably a good case
> for breaking func.sgml up into sections. It is by a very large margin
> the biggest file in our document sources (the next largest is less than
> half the number of lines).
>
> thoughts?

It would make sense to follow a similar pattern to datatype.sgml and
break out the largest sections.  I whipped up a quick awk script to get
an idea of the sizes of the sections in the file:

$ awk '$1 == "<sect1" { start = NR; name = $2 }
       $1 == "</sect1>" { print NR-start, name }' \
   func.sgml | sort -rn
3076 id="functions-info">
2506 id="functions-admin">
2463 id="functions-json">
2352 id="functions-matching">
2028 id="functions-datetime">
1672 id="functions-string">
1466 id="functions-math">
1263 id="functions-geometry">
1252 id="functions-xml">
1220 id="functions-aggregate">
1165 id="functions-formatting">
1053 id="functions-textsearch">
1049 id="functions-range">
785 id="functions-binarystring">
625 id="functions-comparison">
591 id="functions-net">
552 id="functions-array">
357 id="functions-bitstring">
350 id="functions-comparisons">
348 id="functions-subquery">
327 id="functions-event-triggers">
284 id="functions-conditional">
283 id="functions-window">
282 id="functions-srf">
181 id="functions-sequence">
145 id="functions-logical">
134 id="functions-trigger">
120 id="functions-enum">
84 id="functions-statistics">
31 id="functions-uuid">

Tangentially, running the same on datatype.sgml indicates that the
datetime section might do with splitting out:

$ awk '$1 == "<sect1" { start = NR; name = $2 }
       $1 == "</sect1>" { print NR-start, name }' \
   datatype.sgml | sort -rn
1334 id="datatype-datetime">
701 id="datatype-numeric">
374 id="datatype-net-types">
367 id="datatype-oid">
320 id="datatype-geometric">
310 id="datatype-pseudo">
295 id="datatype-binary">
256 id="datatype-character">
245 id="datatype-textsearch">
197 id="datatype-xml">
160 id="datatype-enum">
119 id="datatype-boolean">
81 id="datatype-money">
74 id="datatype-bit">
51 id="domains">
49 id="datatype-uuid">
30 id="datatype-pg-lsn">

The existing split-out sections of datatype.sgml are:

$ wc -l json.sgml array.sgml rowtypes.sgml rangetypes.sgml | grep -v total | 
sort -rn
  1006 json.sgml
   797 array.sgml
   592 rangetypes.sgml
   540 rowtypes.sgml

The names are also rather inconsistent and vague, especially "json" and
"array". If we split the json section out of func.sgml, we might want to
rename these datatype-foo.sgml instead of foo(types).sgml, or go the
whole hog and create subdirectories and move all the sections into
separate files in them, like with reference.sgml.

- ilmari


Reply via email to