Sorry if you got that impression, but I structured my answer top-down: a very short answer first, with more detail as you read on.
Josef

From: [email protected] [mailto:[email protected]] On behalf of Sam Carleton
Sent: Wednesday, 23 February 2011 17:24
To: Apache AXIS C User List
Subject: Re: axutil_xml_quote_string and apostrophes

Josef,

Your reply gives me the impression you expect me to become an expert in XML. What drives me nuts about open source is that the community seems to expect everyone to be an expert in everything. I simply don't have time to learn every last detail of every little tool I use when my goal is developing applications, not building tools. I guess I will run with what I've got and hope new issues don't come up. So far all seems to work well.

Sam

On Tue, Feb 22, 2011 at 4:27 AM, Stadelmann Josef <[email protected]> wrote:

Yes, there is in fact a reason for that. The restriction comes from the XML standard, http://www.w3.org/TR/2008/REC-xml-20081126/#NT-Name, which explains how certain characters may be used. The less-than sign '<' opens a tag, the greater-than sign '>' closes it, and double quotes delimit attribute values.

In very short and maybe too simple terms: suppose the parser has to read

<statement>40 is < than 70</statement>

The parser has a problem. After the opening tag <statement> has been read, the parser looks for a closing tag, which starts with a "<". When it finds one, it expects the rest of the closing tag to follow. It finds the "<" in front of "than" and expects a "/" next; as it does not find one, it has to report an error. As a consequence, a literal "<" cannot be used as data between the opening and closing tags. It can, however, be transmitted by escaping it.

I suggest you read "just a bit" of http://www.w3.org/TR/2008/REC-xml-20081126/#NT-Name to get a better understanding of why certain characters can be used in text and others cannot. There are ways to send '<', '>' or '"' in an XML stream: an escaping technique is used.
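The escaping technique described above can be sketched in C. This is a minimal illustration, not the source of Axis2/C's axutil_xml_quote_string; the function name xml_escape is hypothetical. It replaces all five characters that have predefined entities, including the apostrophe, and uses two passes so the output buffer is allocated exactly once.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the Axis2/C API): returns a newly
 * allocated copy of s in which &, <, >, " and ' are replaced by the
 * predefined entity references. The caller must free() the result.
 * Returns NULL if allocation fails. */
char *xml_escape(const char *s)
{
    size_t extra = 0;
    const char *p;

    /* First pass: count the extra bytes the replacements need. */
    for (p = s; *p != '\0'; ++p) {
        switch (*p) {
        case '<': case '>':  extra += 3; break; /* &lt; &gt;     */
        case '&':            extra += 4; break; /* &amp;         */
        case '"': case '\'': extra += 5; break; /* &quot; &apos; */
        default:             break;
        }
    }

    char *out = malloc(strlen(s) + extra + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy, substituting entity references. */
    char *q = out;
    for (p = s; *p != '\0'; ++p) {
        const char *rep = NULL;
        switch (*p) {
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '&':  rep = "&amp;";  break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&apos;"; break;
        default:   break;
        }
        if (rep != NULL) {
            size_t n = strlen(rep);
            memcpy(q, rep, n);
            q += n;
        } else {
            *q++ = *p;
        }
    }
    *q = '\0';
    return out;
}
```

Applied to the example above, xml_escape("40 is < than 70") yields "40 is &lt; than 70", which a parser reads back as the original text. Escaping all five characters is more than element content strictly requires, but it makes the result safe both in element content and in attribute values delimited by either kind of quote.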
And read http://en.wikipedia.org/wiki/Character_encoding, because you should never forget about the encoding in use when your own code parses or writes XML documents.

Excerpt from the document linked above:

2.2 Characters

[Definition: A parsed entity contains text, a sequence of characters, which may represent markup or character data.] [Definition: A character is an atomic unit of text as specified by ISO/IEC 10646:2000 [ISO/IEC 10646]. Legal characters are tab, carriage return, line feed, and the legal characters of Unicode and ISO/IEC 10646. The versions of these standards cited in A.1 Normative References were current at the time this document was prepared. New characters may be added to these standards by amendments or new editions. Consequently, XML processors MUST accept any character in the range specified for Char.]

Character Range
[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

The mechanism for encoding character code points into bit patterns may vary from entity to entity. All XML processors MUST accept the UTF-8 and UTF-16 encodings of Unicode [Unicode]; the mechanisms for signaling which of the two is in use, or for bringing other encodings into play, are discussed later, in 4.3.3 Character Encoding in Entities.

Note: Document authors are encouraged to avoid "compatibility characters", as defined in section 2.3 of [Unicode]. The characters defined in the following ranges are also discouraged.
They are either control characters or permanently undefined Unicode characters: [#x7F-#x84], [#x86-#x9F], [#xFDD0-#xFDEF], [#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF], [#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF], [#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF], [#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF], [#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF], [#x10FFFE-#x10FFFF]. Etc.

2.4 Character Data and Markup

Text consists of intermingled character data and markup. [Definition: Markup takes the form of start-tags, end-tags, empty-element tags, entity references, character references, comments, CDATA section delimiters, document type declarations, processing instructions, XML declarations, text declarations, and any white space that is at the top level of the document entity (that is, outside the document element and not inside any other markup).] [Definition: All text that is not markup constitutes the character data of the document.]
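The Char production and the discouraged ranges from section 2.2 quoted above translate directly into range checks. The sketch below assumes the input has already been decoded from bytes into Unicode code points using the document's declared encoding, which is exactly why the encoding matters; the names xml_is_char and xml_is_discouraged are hypothetical and not part of any library.

```c
#include <stdbool.h>
#include <stdint.h>

/* Production [2] Char (XML 1.0 Fifth Edition, section 2.2):
 *   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
 * This excludes the surrogate block and U+FFFE / U+FFFF. */
bool xml_is_char(uint32_t cp)
{
    return cp == 0x9 || cp == 0xA || cp == 0xD
        || (cp >= 0x20 && cp <= 0xD7FF)
        || (cp >= 0xE000 && cp <= 0xFFFD)
        || (cp >= 0x10000 && cp <= 0x10FFFF);
}

/* Ranges the note in section 2.2 discourages (legal, but best avoided):
 * [#x7F-#x84], [#x86-#x9F], [#xFDD0-#xFDEF], and the last two code
 * points of every supplementary plane, [#x1FFFE-#x1FFFF] through
 * [#x10FFFE-#x10FFFF]. U+0085 is deliberately not in the list. */
bool xml_is_discouraged(uint32_t cp)
{
    if ((cp >= 0x7F && cp <= 0x84) || (cp >= 0x86 && cp <= 0x9F))
        return true;
    if (cp >= 0xFDD0 && cp <= 0xFDEF)
        return true;
    /* Planes 1-16: the low 16 bits are FFFE or FFFF. */
    return cp >= 0x10000 && cp <= 0x10FFFF && (cp & 0xFFFE) == 0xFFFE;
}
```

A writer might reject any code point for which xml_is_char returns false and merely warn about discouraged ones, since the spec allows them.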
The ampersand character (&) and the left angle bracket (<) MUST NOT appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they MUST be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented using the string "&gt;", and MUST, for compatibility, be escaped using either "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.

In the content of elements, character data is any string of characters which does not contain the start-delimiter of any markup and does not include the CDATA-section-close delimiter, "]]>". In a CDATA section, character data is any string of characters not including the CDATA-section-close delimiter, "]]>".

To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') may be represented as "&apos;", and the double-quote character (") as "&quot;".

Character Data
[14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)

Hope that explains a bit. Also, always consider the encoding in use, as specified in the first line of the XML stream, for example: http://www.w3schools.com/xml/singlebyte2.xml

Josef

From: [email protected] [mailto:[email protected]] On behalf of Sam Carleton
Sent: Sunday, 20 February 2011 19:22
To: Apache AXIS C User List
Subject: axutil_xml_quote_string and apostrophes

I just discovered that axutil_xml_quote_string only escapes the less-than sign, the greater-than sign, and the double quote, but not the apostrophe. Is there a reason for this, or is it a bug? If it is a bug, I would be happy to fix it and submit the fix back if someone would explain how to do that.
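Sam's question comes down to where an apostrophe actually has to be escaped. In element content a literal apostrophe is legal, and the AttValue production of XML 1.0 (not quoted in this thread) only forbids '<', '&' and the quote character that delimits the value. So the apostrophe needs escaping only inside a single-quoted attribute value. A quoting routine whose output always goes into double-quoted attributes can skip it; whether that is the reasoning behind axutil_xml_quote_string is not confirmed here. The sketch below, using the hypothetical name write_xml_attr, escapes exactly what the chosen delimiter requires.

```c
#include <stdio.h>

/* Hypothetical helper: writes name=<q>value<q> to out, where q is the
 * delimiter chosen by the caller ('"' or '\''). Only characters that are
 * illegal inside that attribute value are escaped: '<', '&' and the
 * chosen delimiter. The other quote character is written literally.
 * Returns 0 on success, -1 on a bad delimiter or a write error. */
int write_xml_attr(FILE *out, const char *name, const char *value, char q)
{
    if (q != '"' && q != '\'')
        return -1;
    if (fprintf(out, "%s=%c", name, q) < 0)
        return -1;
    for (const char *p = value; *p != '\0'; ++p) {
        const char *rep = NULL;
        if (*p == '<')
            rep = "&lt;";
        else if (*p == '&')
            rep = "&amp;";
        else if (*p == q)
            rep = (q == '"') ? "&quot;" : "&apos;";
        /* Either write the entity reference or the byte itself. */
        int rc = (rep != NULL) ? fputs(rep, out) : fputc(*p, out);
        if (rc == EOF)
            return -1;
    }
    return fputc(q, out) == EOF ? -1 : 0;
}
```

For example, writing the value Sam's "tool" with a double-quote delimiter produces title="Sam's &quot;tool&quot;"; switching the delimiter to '\'' escapes the apostrophe instead and leaves the double quotes literal.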
