[ https://issues.apache.org/jira/browse/DOXIA-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17905543#comment-17905543 ]
Konrad Windszus commented on DOXIA-767:
---------------------------------------
Compare with
https://www.w3.org/TR/2000/WD-xhtml-modularization-20000105/conformance.html
{quote}
In elements where the 'xml:space' attribute is set to 'preserve', the user
agent must leave all whitespace characters intact (with the exception of
leading and trailing whitespace characters, which should be removed).
Otherwise, whitespace is handled according to the following rules:
* All whitespace surrounding block elements should be removed.
* Comments are removed entirely and do not affect whitespace handling. One
  whitespace character on either side of a comment is treated as two white space
  characters.
* Leading and trailing whitespace inside a block element must be removed.
* Line feed characters within a block element must be converted into a space
  (except when the 'xml:space' attribute is set to 'preserve').
* A sequence of white space characters must be reduced to a single space
  character (except when the 'xml:space' attribute is set to 'preserve').
With regard to rendition, the User Agent should render the content in a manner
appropriate to the language in which the content is written. In languages whose
primary script is Latinate, the ASCII space character is typically used to
encode both grammatical word boundaries and typographic whitespace; in
languages whose script is related to Nagari (e.g., Sanskrit, Thai, etc.),
grammatical boundaries may be encoded using the ZW 'space' character, but will
not typically be represented by typographic whitespace in rendered output;
languages using Arabiform scripts may encode typographic whitespace using a
space character, but may also use the ZW space character to delimit 'internal'
grammatical boundaries (what look like words in Arabic to an English eye
frequently encode several words, e.g. 'kitAbuhum' = 'kitAbu-hum' = 'book them'
== their book); and languages in the Chinese script tradition typically neither
encode such delimiters nor use typographic whitespace in this way.
{quote}
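To make the quoted rules for text inside a block element concrete, here is a minimal Java sketch. It is an illustration only and not Doxia code: the class {{WhitespaceRules}} and the method {{normalizeBlockText}} are hypothetical names, the {{preserve}} flag stands in for {{xml:space="preserve"}}, and whitespace is assumed to mean the four XML whitespace characters (space, tab, CR, LF). Comment handling and whitespace surrounding block elements are not covered.

```java
public final class WhitespaceRules {

    private WhitespaceRules() {
    }

    /**
     * Hypothetical helper (not part of the Doxia Sink API): normalizes the
     * text content of a block element following the rules quoted above.
     *
     * @param text     raw text inside the block element
     * @param preserve true when xml:space="preserve" is assumed to be in effect
     */
    public static String normalizeBlockText(String text, boolean preserve) {
        if (preserve) {
            // Inner whitespace stays intact; only leading/trailing is removed.
            return stripXmlWhitespace(text);
        }
        // Line feeds become spaces and every run of whitespace collapses
        // to a single space character.
        String collapsed = text.replaceAll("[ \\t\\r\\n]+", " ");
        // Leading and trailing whitespace inside the block is removed.
        return stripXmlWhitespace(collapsed);
    }

    /** Removes leading and trailing XML whitespace characters. */
    private static String stripXmlWhitespace(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && isXmlWhitespace(s.charAt(start))) {
            start++;
        }
        while (end > start && isXmlWhitespace(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }

    private static boolean isXmlWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    public static void main(String[] args) {
        // Without preserve: prints "[Hello world]"
        System.out.println("[" + normalizeBlockText("  Hello \n   world  ", false) + "]");
        // With preserve: the inner line feed and spaces are kept
        System.out.println("[" + normalizeBlockText("  Hello \n   world  ", true) + "]");
    }
}
```

The point for the Sink javadoc is that one block-level text run gives two different results depending on whether the target element preserves whitespace, which is the case that matters for verbatim content.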
> Clarify block vs inline semantics in Sink API
> ---------------------------------------------
>
> Key: DOXIA-767
> URL: https://issues.apache.org/jira/browse/DOXIA-767
> Project: Maven Doxia
> Issue Type: Task
> Components: Documentation
> Reporter: Konrad Windszus
> Priority: Major
>
> The javadoc of
> https://maven.apache.org/doxia/doxia/doxia-sink-api/apidocs/org/apache/maven/doxia/sink/Sink.html
> should be clarified with regard to whitespace trimming for block and inline
> elements. Not every block element of the Sink API is necessarily a block
> element in the target output format (e.g. XHTML). The most prominent example is
> {{Sink.verbatim(Decoration:source)}}, which is a block element in the Sink API
> but opens an inline element ({{<pre><code>}}) in XHTML.