On Mon, Aug 26, 2024 at 13:28, Jim Jones <jim.jo...@uni-muenster.de> wrote:

>
>
> On 26.08.24 12:30, Pavel Stehule wrote:
> > I think the target of CANONICAL should be specified: is it a
> > partial replacement of NO INDENT, or does it produce a format just
> > for comparing? The CANONICAL format is probably not strictly
> > standardized, because libxml2 removes indenting, but the examples in
> > https://www.w3.org/TR/xml-c14n11/ don't do it. So this format makes
> > sense just for local operations.
> My idea with CANONICAL was not to replace NO INDENT. The intent was to
> format xml strings in a standardized way, so that they can be compared.
> For instance, removing comments, sorting attributes, converting CDATA
> strings, converting empty elements to start-end tag pairs, removing
> white space between elements, etc.
>
> The W3C recommendation for Canonical XML[1] dictates the following
> regarding the removal of whitespace between elements:
>
> * Whitespace outside of the document element and within start and end
>   tags is normalized
> * All whitespace in character content is retained (excluding characters
>   removed during line feed normalization)
>
> >
> > I like this functionality, and it is great that the functionality from
> > libxml2 can be used, but I think the fact that there are four
> > incompatible implementations of xmlserialize is messy. It would be nice
> > if we found some intersection between SQL/XML and Oracle instead of a
> > new proprietary syntax.
> >
> > In Oracle syntax, is CANONICAL +/- NO INDENT SHOW DEFAULT?
>
> No.
> XMLSERIALIZE ... NO INDENT is supposed, as the name suggests, to
> serialize an xml string without indenting it. One could argue that not
> indenting can be translated as removing indentation, but I couldn't find
> anything concrete about this in the SQL/XML spec. If it's indeed the
> case, we should correct XMLSERIALIZE ... NO INDENT, but it is unrelated
> to this patch.
>
> CANONICAL serializes a physical representation of an xml document. In a
> nutshell, XMLSERIALIZE ...
> CANONICAL sort of "rewrites" the xml string
> with the following rules (list from the W3C recommendation):
>
> * The document is encoded in UTF-8
> * Line breaks normalized to #xA on input, before parsing
> * Attribute values are normalized, as if by a validating processor
> * Character and parsed entity references are replaced
> * CDATA sections are replaced with their character content
> * The XML declaration and document type declaration are removed
> * Empty elements are converted to start-end tag pairs
> * Whitespace outside of the document element and within start and end
>   tags is normalized
> * All whitespace in character content is retained (excluding characters
>   removed during line feed normalization)
> * Attribute value delimiters are set to quotation marks (double quotes)
> * Special characters in attribute values and character content are
>   replaced by character references
> * Superfluous namespace declarations are removed from each element
> * Default attributes are added to each element
> * Fixup of xml:base attributes [C14N-Issues] is performed
> * Lexicographic order is imposed on the namespace declarations and
>   attributes of each element
>
> btw: Oracle's SIZE =, HIDE DEFAULTS, and SHOW DEFAULTS are not part of
> the SQL/XML standard either :)
>

I know. It looks like this function is not well designed in general.

> > My objection to CANONICAL is that SQL/XML and Oracle allow
> > parametrizing XMLSERIALIZE more precisely, and before implementing a
> > new feature, we should clear the table and say what we want to have in
> > XMLSERIALIZE.
> >
> > As an alternative to enhancing XMLSERIALIZE, I can imagine just a
> > function "to_canonical(xml, without_comments bool default false)". In
> > this case we don't need to resolve the relationship to SQL/XML or Oracle.
>
> To create a separated serialization function would be IMHO way less
> elegant than to parametrize XMLSERIALIZE, but it would be something I
> could live with in case we decide to go down this path.
>

I am not strongly against enhancing XMLSERIALIZE, but it would be nice to
see some wider concept first. Currently the state looks just random, and I
haven't seen any serious discussion about the implementation of SQL/XML. We
don't necessarily need to be compatible with Oracle, but it can help if we
have functionality that can be used for conversions.

> Thanks!
>
> --
> Jim
>
> 1 - https://www.w3.org/TR/xml-c14n11/
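
For readers who want to see what "canonical for comparison" means in practice, here is a minimal sketch outside PostgreSQL. It is not the patch and not libxml2: it assumes Python 3.8+ and uses the standard library's xml.etree.ElementTree.canonicalize, which implements C14N 2.0 rather than the C14N 1.1 recommendation cited in the thread, so details can differ. The helper name to_canonical and its without_comments parameter only mirror the hypothetical function signature floated above.

```python
import xml.etree.ElementTree as ET


def to_canonical(xml_text: str, without_comments: bool = False) -> str:
    """Hypothetical helper named after the function proposed in the thread."""
    # ET.canonicalize (Python 3.8+) implements C14N 2.0 (an assumption of this
    # sketch; the thread discusses C14N 1.1 via libxml2). It drops comments
    # unless with_comments=True is passed.
    return ET.canonicalize(xml_text, with_comments=not without_comments)


# Two documents that differ only in physical representation: XML declaration,
# attribute order and quoting, a comment, an empty-element tag, and CDATA.
doc_a = ('<?xml version="1.0"?><doc b="2" a="1"><!-- note --><empty/>'
         '<t><![CDATA[x < y]]></t></doc>')
doc_b = "<doc a='1' b='2'><empty></empty><t>x &lt; y</t></doc>"

canon_a = to_canonical(doc_a, without_comments=True)
canon_b = to_canonical(doc_b, without_comments=True)

print(canon_a)
print(canon_a == canon_b)
```

The point of the sketch is only that, once both inputs are canonicalized, a plain string comparison can stand in for a comparison of the documents, which is the use case described for CANONICAL above.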