Hi Nahid,
if you are writing the text directly to the final XML file, you can use an
XMLFormatter object, which takes care of the conversion more efficiently
(instead of scanning the input string 5 times and reallocating it on each
replacement, as you are doing now). Depending on the type of filtering you
are doing, you could get better performance from a grep-like tool, if the
filtering does not need to be XML-aware.
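
To illustrate the cost of the five scans, here is a minimal single-pass sketch. It uses only the standard library and is not the XMLFormatter API itself; the name escapeOnePass is hypothetical, and the reserve size is just a guess at how much the string will grow.

```cpp
#include <string>

// Hypothetical helper (not part of Xerces): escapes the five predefined
// XML entities while walking the input exactly once.
std::string escapeOnePass(const std::string& in) {
    std::string out;
    // Assumption: only a small fraction of characters need escaping,
    // so reserve slightly more than the input size up front.
    out.reserve(in.size() + in.size() / 8);
    for (char c : in) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}
```

Building a new string avoids shifting the tail of the buffer on every replacement, which is what str.replace() has to do when the replacement is longer than the match.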
Alberto
Nahid wrote:
Hi Alberto,
Thanks for your reply.
Actually my program is a filter, so I'm just taking an XML file and
outputting another XML file.
So it would be better if I could output the text as it was, that is, &amp;
instead of &.
Right now I'm using this function to convert it back before writing the
output.
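string escape(string str) {
    // a[i] is the raw character, b[i] the entity that replaces it;
    // "&" must come first so the '&' of later entities is not re-escaped
    string a[] = {"&", "<", ">", "\"", "'"};
    string b[] = {"&amp;", "&lt;", "&gt;", "&quot;", "&apos;"};
    for (int i=0; i<5; i++) {
        size_t pos=0;
        while ( ( pos = str.find(a[i], pos ) ) != std::string::npos ) {
            str.replace( pos, a[i].length(), b[i] );
            pos += b[i].length();
        }
    }
    return str;
}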
But the problem is, as I said before, that I have to parse a huge file
(around 100 GB to 1 TB), so doing the conversion twice is costly (this
function adds 10% to the running time :( ).
So it would be better if I could tell SAX2XMLReader not to convert &amp; to &,
which would save the double conversion time.
Thank you again
-Nahid
Alberto Massari wrote:
Hi Nahid,
an XML document cannot contain a literal & character in its text, as it has
a special meaning (the beginning of an entity reference); why do you need to
see the raw text instead of its meaning?
Alberto
Nahid wrote:
Hi,
Before posting, I searched for a solution but couldn't find one. Maybe it
has a trivial solution.
I'm using SAX2XMLReader to parse a huge XML file which contains entity
references (&gt;, &lt;, etc.).
For example, "<title>abc &amp; cde</title>" is converted to "<title>abc &
cde</title>".
I don't want this automatic conversion. I just want the actual text.
Do you have any idea how I can do it?
Thanks
Regards
-Nahid