Hi Alberto,
Thanks for your reply. 
Actually, my program is a filter, so I'm just taking one XML document and
outputting another. So it would be better if I could output the text as it
was, that is, &amp; instead of &.

Right now I'm using this function to convert it back before writing the
output.

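#include <string>

// Replace the five XML special characters with their predefined entities.
// "&" is handled first so the "&" in later replacements is not re-escaped.
std::string escape(std::string str) {
  const std::string a[] = {"&", "<", ">", "\"", "'"};
  const std::string b[] = {"&amp;", "&lt;", "&gt;", "&quot;", "&apos;"};
  for (int i = 0; i < 5; i++) {
    std::string::size_type pos = 0;
    while ((pos = str.find(a[i], pos)) != std::string::npos) {
      str.replace(pos, a[i].length(), b[i]);
      pos += b[i].length();  // skip past the entity just inserted
    }
  }
  return str;
}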

But the problem is that, as I said before, I have to parse a huge file
(around 100 GB to 1 TB), so doing the conversion twice is costly (this
function adds 10% to the running time :( )
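
One way the cost of the second conversion might be reduced is to escape in a
single pass instead of scanning the string once per special character. This is
only a minimal sketch, assuming nothing beyond standard C++; the name
escape_single_pass is hypothetical and is not part of Xerces-C, and the actual
saving has not been measured.

```cpp
#include <string>

// Hypothetical single-pass variant of escape(): walks the input once and
// appends to a pre-sized output buffer, rather than rewriting the string
// in place once for each of the five special characters.
std::string escape_single_pass(const std::string& in) {
  std::string out;
  out.reserve(in.size() + in.size() / 8);  // rough guess at the growth
  for (std::string::size_type i = 0; i < in.size(); ++i) {
    switch (in[i]) {
      case '&':  out += "&amp;";  break;
      case '<':  out += "&lt;";   break;
      case '>':  out += "&gt;";   break;
      case '"':  out += "&quot;"; break;
      case '\'': out += "&apos;"; break;
      default:   out += in[i];    break;
    }
  }
  return out;
}
```

It takes the same input as escape() and should produce the same output, so it
could be swapped in at the point where the text is written out.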

So it would be better if I could tell SAX2XMLReader not to convert &amp; to &,
which would save the cost of the double conversion.

Thank you again
-Nahid


Alberto Massari wrote:
> 
> Hi Nahid,
> an XML document cannot contain a & character, as it has a special 
> meaning (beginning of an entity reference); why do you need to see the 
> raw text instead of its meaning?
> 
> Alberto
> 
> Nahid wrote:
>> Hi,
>> Before posting, I searched for a solution but couldn't find one. Maybe it
>> has a trivial solution.
>> I'm using SAX2XMLReader to parse a huge XML file which contains entity
>> references (&gt;, &lt; etc.), for example
>> "<title>abc &amp; cde</title>", which is converted to "<title>abc &
>> cde</title>".
>> I don't want this automatic conversion. I just want the actual text.
>> Do you have any idea how I can do this?
>> Thanks
>> Regards
>> -Nahid
>>
>>   
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/xerces-translates-entity-characters%28-gt...%29-automatically-but-I-don%27t-want-to-tp20002547p20010043.html
Sent from the Xerces - C - Users mailing list archive at Nabble.com.
