Re: axutil_xml_quote_string and apostrophes

Stadelmann Josef Thu, 24 Feb 2011 02:30:30 -0800

Sorry if you got that impression, but I wrote it top down: a very short answer
first, going into more detail as you read on.


Josef

 

From: [email protected] [mailto:[email protected]] On Behalf Of Sam 
Carleton
Sent: Wednesday, February 23, 2011 17:24
To: Apache AXIS C User List
Subject: Re: axutil_xml_quote_string and apostrophes

 

Josef,

Your reply gives me the impression you expect me to become an expert in XML.  
What drives me nuts about open source is that the community seems to expect 
everyone to be an expert in everything, I simply don't have time to learn every 
last detail of every little tool I use in the world when my goal is developing 
applications, not building tools.  

I guess I will run with what I got and hope new issues don't come up.  So far 
all seems to work well.

Sam

On Tue, Feb 22, 2011 at 4:27 AM, Stadelmann Josef 
<[email protected]> wrote:

Yes, there is in fact a reason for that. The restriction comes from the XML 
standard, http://www.w3.org/TR/2008/REC-xml-20081126/#NT-Name, which explains 
the usage of certain characters. 

"<" is used in a stream to open a tag, ">" closes it, and double quotes (as 
well as apostrophes) delimit attribute values. 

 

In very short, and maybe too simply put:

Suppose the parser has to read:

<statement>40 is < than 70</statement>

 

The parser has a problem. After the opening tag <statement> has been read, the 
parser reads character data until it meets the next tag, and tags always start 
with a "<". When it finds a "<", it expects the next character to be either a 
"/" (the start of the closing tag </statement>) or a name character (the start 
of a child element). 

Here the "<" is followed by a space, which is neither, so the parser cannot 
continue and has to report an error. 
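The decision the parser makes right after a "<" can be sketched in C as below. This is a simplified illustration, not the code of Axis2/C or of any particular parser: the name classify_after_lt is hypothetical, and the ASCII-only test for a name start character is an assumption (the real NameStartChar production allows many more characters).

```c
#include <assert.h>
#include <ctype.h>

/* What a simplified parser may find right after '<' in element content. */
typedef enum {
    LT_END_TAG,      /* "</" starts an end tag such as </statement>        */
    LT_START_TAG,    /* "<" + name start character starts a child element */
    LT_OTHER_MARKUP, /* "<!" (comment, CDATA) or "<?" (processing instr.)  */
    LT_ERROR         /* anything else, e.g. the space in "< than"          */
} lt_kind;

/* Hypothetical helper: classify the character that follows '<'.
 * Only ASCII letters, '_' and ':' are treated as name start characters. */
static lt_kind classify_after_lt(char next)
{
    if (next == '/')
        return LT_END_TAG;
    if (next == '!' || next == '?')
        return LT_OTHER_MARKUP;
    if (isalpha((unsigned char)next) || next == '_' || next == ':')
        return LT_START_TAG;
    return LT_ERROR;
}
```

For "40 is < than 70" the character after "<" is a space, so the sketch returns LT_ERROR, which corresponds to the well-formedness error described above.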

 

As a consequence, "<" can't be used literally as data between the opening and 
closing tag. It can, however, be transmitted by using an escaping technique 
(for example, writing it as &lt;).
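A minimal sketch of such an escaping step for element content, in C. It is not the Axis2/C API (axutil_xml_quote_string may behave differently); the name escape_char_data is hypothetical. It replaces "&", "<" and ">" with the predefined entity references; a numeric character reference such as &#60; for "<" would work as well.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'ed copy of `text` in which '&', '<'
 * and '>' are replaced by "&amp;", "&lt;" and "&gt;", so the result can be
 * placed between a start tag and an end tag. The caller frees the result;
 * NULL is returned if allocation fails. */
static char *escape_char_data(const char *text)
{
    size_t extra = 0;
    const char *p;
    char *out, *q;

    assert(text != NULL);
    /* First pass: count how many extra bytes the replacements need. */
    for (p = text; *p; ++p) {
        if (*p == '&')
            extra += 4;              /* 1 char becomes "&amp;" (5 chars) */
        else if (*p == '<' || *p == '>')
            extra += 3;              /* 1 char becomes "&lt;"/"&gt;" (4) */
    }

    out = malloc(strlen(text) + extra + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy, substituting the entity references. */
    for (p = text, q = out; *p; ++p) {
        switch (*p) {
        case '&': memcpy(q, "&amp;", 5); q += 5; break;
        case '<': memcpy(q, "&lt;", 4);  q += 4; break;
        case '>': memcpy(q, "&gt;", 4);  q += 4; break;
        default:  *q++ = *p;             break;
        }
    }
    *q = '\0';
    return out;
}
```

Escaped this way, "40 is < than 70" becomes "40 is &lt; than 70", and the element parses without error. Escaping ">" is not strictly required in most places, but it is harmless.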

 

I suggest you read "just a bit" of 
http://www.w3.org/TR/2008/REC-xml-20081126/#NT-Name to get a better 
understanding of why certain characters can be used in text and others cannot. 
However, there are ways to send '<', '>' or '"' in an XML stream; in this case 
an escaping technique is used.

 

And read http://en.wikipedia.org/wiki/Character_encoding, because you should 
never forget the encoding in use when your own code parses or writes XML 
documents.

 

 

Excerpt from the document linked above (in the original message the most 
relevant passages were highlighted in red bold; read those first):

 

2.2 Characters

[Definition: A parsed entity contains text, a sequence of characters, which may 
represent markup or character data.] [Definition: A character is an atomic unit 
of text as specified by ISO/IEC 10646:2000 [ISO/IEC 10646]. Legal characters are 
tab, carriage return, line feed, and the legal characters of Unicode and 
ISO/IEC 10646. The versions of these standards cited in A.1 Normative 
References were current at the time this document was prepared. New characters 
may be added to these standards by amendments or new editions. Consequently, 
XML processors MUST accept any character in the range specified for Char.]

Character Range

[2]  Char  ::=  #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
                /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
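Production [2] translates almost literally into a range check. A minimal C sketch; the function name is_xml_char is hypothetical. Note that NUL and most other C0 control characters fail the test, so in XML 1.0 they cannot be sent at all, not even as character references.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: transcription of production [2] Char.
 * Returns true if the Unicode code point `cp` is a legal XML 1.0 character. */
static bool is_xml_char(uint32_t cp)
{
    return cp == 0x9 || cp == 0xA || cp == 0xD
        || (cp >= 0x20    && cp <= 0xD7FF)     /* excludes surrogates      */
        || (cp >= 0xE000  && cp <= 0xFFFD)     /* excludes FFFE and FFFF   */
        || (cp >= 0x10000 && cp <= 0x10FFFF);  /* supplementary planes     */
}
```

The apostrophe (U+0027) passes this test like any other printable character; whether it has to be escaped is a separate question about where it appears, not about whether it is a legal character.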

The mechanism for encoding character code points into bit patterns may vary 
from entity to entity. All XML processors MUST accept the UTF-8 and UTF-16 
encodings of Unicode [Unicode]; the mechanisms for signaling which of the two 
is in use, or for bringing other encodings into play, are discussed later, in 
4.3.3 Character Encoding in Entities.
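To see why the declared encoding matters, here is a sketch of how one code point becomes bytes in UTF-8. The function name utf8_encode is hypothetical; the bit layout is the standard UTF-8 scheme. For example, "é" (U+00E9) is the single byte 0xE9 in ISO-8859-1 but the two bytes 0xC3 0xA9 in UTF-8, so a reader that assumes the wrong encoding sees different characters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode code point `cp` as UTF-8 into `buf`.
 * Returns the number of bytes written (1 to 4), or 0 if `cp` is a
 * surrogate or lies beyond U+10FFFF and so has no UTF-8 encoding. */
static size_t utf8_encode(uint32_t cp, unsigned char buf[4])
{
    assert(buf != NULL);
    if (cp < 0x80) {                         /* 0xxxxxxx */
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                        /* 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                            /* surrogates are not characters */
    if (cp < 0x10000) {                      /* 1110xxxx 10xxxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                    /* 11110xxx + 3 continuation */
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```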

Note:

Document authors are encouraged to avoid "compatibility characters", as defined 
in section 2.3 of [Unicode]. The characters defined in the following ranges are 
also discouraged. They are either control characters or permanently undefined 
Unicode characters:

[#x7F-#x84], [#x86-#x9F], [#xFDD0-#xFDEF],
[#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF],
[#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF],
[#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF],
[#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF],
[#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF],
[#x10FFFE-#x10FFFF].
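The discouraged list above follows a regular pattern, so a writer that wants to warn about such characters could check it compactly. A sketch with a hypothetical function name: the plane-1-to-16 ranges are exactly the code points whose low 16 bits are FFFE or FFFF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if `cp` is in one of the discouraged ranges:
 * DEL and the C1 controls except U+0085 (NEL), the noncharacters
 * U+FDD0..U+FDEF, and the last two code points of planes 1 to 16. */
static bool is_discouraged_xml_char(uint32_t cp)
{
    if ((cp >= 0x7F && cp <= 0x84) || (cp >= 0x86 && cp <= 0x9F))
        return true;
    if (cp >= 0xFDD0 && cp <= 0xFDEF)
        return true;
    /* [#x1FFFE-#x1FFFF] ... [#x10FFFE-#x10FFFF] */
    if (cp >= 0x10000 && cp <= 0x10FFFF && (cp & 0xFFFE) == 0xFFFE)
        return true;
    return false;
}
```

Such characters are still legal Char values, so this is a check for a warning, not for rejecting input.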

 

Etc.

 

2.4 Character Data and Markup

Text consists of intermingled character data and markup. [Definition: Markup 
takes the form of start-tags, end-tags, empty-element tags, entity references, 
character references, comments, CDATA section delimiters, document type 
declarations, processing instructions, XML declarations, text declarations, and 
any white space that is at the top level of the document entity (that is, 
outside the document element and not inside any other markup).] 

[Definition: All text that is not markup constitutes the character data of the 
document.] 

The ampersand character (&) and the left angle bracket (<) MUST NOT appear in 
their literal form, except when used as markup delimiters, or within a comment, 
a processing instruction, or a CDATA section. If they are needed elsewhere, they 
MUST be escaped using either numeric character references or the strings 
"&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented 
using the string "&gt;", and MUST, for compatibility, be escaped using either 
"&gt;" or a character reference when it appears in the string "]]>" in content, 
when that string is not marking the end of a CDATA section.
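The rules just quoted can be followed literally: "&" and "<" are always escaped, while ">" only has to be escaped where it would complete "]]>". A minimal C sketch of that minimal policy (the name escape_content_minimal is hypothetical; escaping every ">" is simpler and equally valid):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: escape element content using only what the quoted
 * rules require. Returns a malloc'ed string the caller frees, or NULL if
 * allocation fails. */
static char *escape_content_minimal(const char *text)
{
    size_t len, i;
    char *out, *q;

    assert(text != NULL);
    len = strlen(text);
    /* Worst case: every character grows to 5 bytes ("&amp;"). */
    out = malloc(len * 5 + 1);
    if (out == NULL)
        return NULL;

    for (i = 0, q = out; i < len; ++i) {
        char c = text[i];
        if (c == '&') {
            memcpy(q, "&amp;", 5); q += 5;
        } else if (c == '<') {
            memcpy(q, "&lt;", 4); q += 4;
        } else if (c == '>' && i >= 2 && text[i - 1] == ']' && text[i - 2] == ']') {
            memcpy(q, "&gt;", 4); q += 4;   /* would otherwise form "]]>" */
        } else {
            *q++ = c;
        }
    }
    *q = '\0';
    return out;
}
```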

In the content of elements, character data is any string of characters which 
does not contain the start-delimiter of any markup and does not include the 
CDATA-section-close delimiter, "]]>". In a CDATA section, character data is 
any string of characters not including the CDATA-section-close delimiter, "]]>".

To allow attribute values to contain both single and double quotes, the 
apostrophe or single-quote character (') may be represented as "&apos;", and 
the double-quote character (") as "&quot;".
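This paragraph is the part that bears on the original apostrophe question. Whether leaving the apostrophe unescaped matters depends on where the output ends up: in element content and in double-quoted attribute values a literal ' is legal, but inside a single-quoted attribute value it would end the value early. A sketch of an attribute escaper that covers both delimiters by escaping all five characters with predefined entities; the name escape_attr_value is hypothetical and this is not a proposed patch for axutil_xml_quote_string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: escape `value` for use inside an attribute value,
 * whether it is delimited by "..." or by '...'. Returns a malloc'ed string
 * the caller frees, or NULL if allocation fails. */
static char *escape_attr_value(const char *value)
{
    size_t extra = 0;
    const char *p;
    char *out, *q;

    assert(value != NULL);
    for (p = value; *p; ++p) {
        switch (*p) {
        case '&':  extra += 4; break;            /* &amp;         */
        case '<':
        case '>':  extra += 3; break;            /* &lt; &gt;     */
        case '"':
        case '\'': extra += 5; break;            /* &quot; &apos; */
        default:   break;
        }
    }
    out = malloc(strlen(value) + extra + 1);
    if (out == NULL)
        return NULL;

    for (p = value, q = out; *p; ++p) {
        const char *rep = NULL;
        switch (*p) {
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&apos;"; break;
        default:   break;
        }
        if (rep != NULL) {
            size_t n = strlen(rep);
            memcpy(q, rep, n);
            q += n;
        } else {
            *q++ = *p;
        }
    }
    *q = '\0';
    return out;
}
```

Escaping both quote characters unconditionally costs a few bytes but means the caller never has to know which delimiter the surrounding code uses.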

Character Data

[14]  CharData  ::=  [^<&]* - ([^<&]* ']]>' [^<&]*)
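Production [14] can be checked with two standard string searches. A minimal sketch, with the hypothetical name is_char_data: a string is valid character data if it contains no "<", no "&", and no "]]>". Quotes and apostrophes are fine here, which is why an escaper meant only for element content does not strictly need to touch them.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if `s` matches production [14] CharData. */
static bool is_char_data(const char *s)
{
    assert(s != NULL);
    return strpbrk(s, "<&") == NULL      /* no '<' and no '&'      */
        && strstr(s, "]]>") == NULL;     /* no CDATA-section close */
}
```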


Hope that explains a bit.

And always consider the encoding declared in the first line of the XML stream 
(the XML declaration); see for example http://www.w3schools.com/xml/singlebyte2.xml

 

Josef


From: [email protected] [mailto:[email protected]] On Behalf Of Sam 
Carleton
Sent: Sunday, February 20, 2011 19:22
To: Apache AXIS C User List
Subject: axutil_xml_quote_string and apostrophes

 

I just discovered that the axutil_xml_quote_string only escapes the less than, 
greater than, and quote, but not the apostrophe.  Is there a reason for this or 
is it a bug? 

If it is a bug, I would be happy to fix it and submit it back if someone would 
enlighten me as to how to do that.
