On Sep 6, 2011, at 6:49 PM, Marco Leise wrote:

> On 07.09.2011 at 00:23, Sean Kelly <s...@invisibleduck.org> wrote:
> 
>> On Sep 6, 2011, at 2:51 PM, Marco Leise wrote:
>> 
>>> On 06.09.2011 at 22:28, Timon Gehr <timon.g...@gmx.ch> wrote:
>>> 
>>>> On 09/06/2011 09:36 PM, notna wrote:
>>>>> Sorry upfront, I didn't read this whole thread, so maybe I'm missing or
>>>>> mixing something...
>>>>> 
>>>>> How about a D binding for http://www.xmlsoft.org/ ?
>>>>> 
>>>>> In other words, taking the "curl or sqlite3 path", something like
>>>>> /etc/c/xml2
>>>> 
>>>> That is about 4 times slower than the Tango XML parser:
>>>> 
>>>> http://dotnot.org/blog/archives/2008/03/10/xml-benchmarks-updated-graphs-with-rapidxml/
>>> 
>>> You are so right, Timon. How deep is the trench between Phobos and Tango 
>>> devs? Tango's XML parser should really make it into Phobos.
>> 
>> That will never happen.  Though on a positive note, a major reason the Tango 
>> parser is so fast is that there's no copying or translation of the 
>> underlying data.  Attributes are passed to the user as-is via a slice of the 
>> input range.  Most parsers in other languages simply don't work this way.
> 
> So in the benchmark neither white-space is collapsed, nor are entities like 
> &amp; converted?

I don't believe so.  That's expected to be done by the user if he cares about 
decoding the field.  Compare this to the Xerces (Apache) XML parser that passes 
in all attributes as wide chars regardless of the input format and you can see 
why parsing XML in D can be so fast: passing values via array slicing and 
having Unicode as the native character format.  If the input text is UTF-8 you 
use XmlParser!char, if it's UTF-16 you use XmlParser!wchar, etc.  I'm actually 
surprised that more C/C++ parsers don't work this way.
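
To make the slicing idea concrete, here is a rough sketch of what it could look like on the C++ side using std::string_view.  This is not Tango's code or any real library's API: sliceAttributes and Attribute are made-up names, and the scanner only handles a single well-formed start tag.  The point is just that each returned name and value is a view into the caller's buffer, with no copying and no entity decoding.

```cpp
#include <cstddef>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper for illustration only; not part of Tango or any library.
// Each pair is (name, value), and both views point into the caller's buffer.
using Attribute = std::pair<std::string_view, std::string_view>;

// Scans one start tag, e.g. <a href="x?a=1&amp;b=2" title="T">, and returns
// its attributes as slices of `tag`. Nothing is copied and entities such as
// &amp; are left undecoded; decoding is the caller's job if it is needed.
std::vector<Attribute> sliceAttributes(std::string_view tag) {
    std::vector<Attribute> attrs;
    const auto isSpace = [](char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    };

    std::size_t i = 0;
    // Skip '<' and the element name.
    if (i < tag.size() && tag[i] == '<') ++i;
    while (i < tag.size() && !isSpace(tag[i]) && tag[i] != '>' && tag[i] != '/') ++i;

    while (i < tag.size()) {
        while (i < tag.size() && isSpace(tag[i])) ++i;
        if (i >= tag.size() || tag[i] == '>' || tag[i] == '/') break;

        // The attribute name runs up to '=' or whitespace.
        const std::size_t nameStart = i;
        while (i < tag.size() && tag[i] != '=' && !isSpace(tag[i]) && tag[i] != '>') ++i;
        const std::string_view name = tag.substr(nameStart, i - nameStart);

        while (i < tag.size() && isSpace(tag[i])) ++i;
        if (i >= tag.size() || tag[i] != '=') break;  // malformed input: stop
        ++i;
        while (i < tag.size() && isSpace(tag[i])) ++i;
        if (i >= tag.size() || (tag[i] != '"' && tag[i] != '\'')) break;

        // The value is everything between the matching quotes, taken as-is.
        const char quote = tag[i++];
        const std::size_t valueStart = i;
        const std::size_t valueEnd = tag.find(quote, valueStart);
        if (valueEnd == std::string_view::npos) break;
        attrs.emplace_back(name, tag.substr(valueStart, valueEnd - valueStart));
        i = valueEnd + 1;
    }
    return attrs;
}
```

The views are only valid while the input buffer is alive, which is the same trade-off a slice-based parser makes.  Templating the function on the character type with std::basic_string_view would be the rough analogue of choosing XmlParser!char or XmlParser!wchar.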
