On 23/04/2024 02:01, Ihor Radchenko wrote:
> For example, consider an HTML exporter that aligns tags nicely and
> keeps blank lines between markup blocks for readability.  If we
> remove such blank lines unconditionally, it will be problematic.

I think newlines alone are enough to keep HTML markup human-readable. I believe blank lines in HTML mostly come from conditional constructs processed by various template engines, and almost nobody cares about the exact formatting in such cases.

However, I proposed making this feature an option that is turned on by default.

> I guess that I can change the condition to not include trailing space
> from (rx whitespace eol) to (rx (any " \t") eol).

Once again I forgot that neither \n nor a non-breaking space is included in post-blank.

I think a more permissive regexp may be used. At the very least it should accept a newline and any whitespace after it:

    (rx (any " \t\n") (zero-or-more whitespace) eos)
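
To make the difference concrete, here is a minimal sketch that compares the narrower condition with the more permissive one on a few sample strings. It assumes the character set is meant to cover space, tab and newline (written as (any " \t\n"), since `any' takes characters and character classes rather than anchors such as `eol'). The my/ names are hypothetical helpers for illustration, not part of Org.

```elisp
;; Sketch only: the `my/' names are hypothetical and not part of Org.

(defconst my/narrow-re
  (rx (any " \t") eol)
  "Space or tab immediately before the end of a line.")

(defconst my/permissive-re
  (rx (any " \t\n") (zero-or-more whitespace) eos)
  "Space, tab or newline, then only whitespace up to the end of the string.")

(defun my/ends-with-blank-p (regexp string)
  "Return t when REGEXP finds trailing blank in STRING.
The `whitespace' class follows the current syntax table, so the
standard table is used here to make newline count as whitespace."
  (with-syntax-table (standard-syntax-table)
    (and (string-match-p regexp string) t)))

;; Print how each regexp treats typical text that precedes a pruned object.
(dolist (sample '("text " "text \n" "text\n" "text\n  " "text"))
  (message "%S narrow=%s permissive=%s"
           sample
           (my/ends-with-blank-p my/narrow-re sample)
           (my/ends-with-blank-p my/permissive-re sample)))
```

The case that separates the two is "text\n": the narrow regexp rejects it because no space or tab precedes the end of line, while the permissive one accepts it. Note that the result of `whitespace' depends on the syntax table in effect, so in buffers where newline is not whitespace syntax the permissive variant may behave differently.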

Moreover, the post-blank of the pruned object may be ignored when the following element starts with whitespace other than purely zero-width characters.

In my opinion, keeping extra spaces (e.g. post-blank ones from pruned objects) does less harm than aggressively stripping them. In any case, some backends must normalize spaces (while for others they do not matter).

As long as newline characters are not affected, this part of the change has no effect on accidental splitting of paragraphs.

My feeling is that an extensive test suite is required. It would make it easier to review which cases are not handled yet.


