Re: std.xml2 (collecting features)

2016-03-12 Thread Alex Vincent via Digitalmars-d
For everyone's information, I've posted a pull request to Mr. 
Schadek's github repository, with a proposed Simple API for XML 
(SAX) stub.  I'd really appreciate reviews of the stub's 
interfaces.


https://github.com/burner/std.xml2/pull/5


Re: std.xml2 (collecting features)

2016-03-07 Thread Craig Dillabaugh via Digitalmars-d
On Sunday, 6 March 2016 at 11:46:00 UTC, Robert burner Schadek 
wrote:
On Saturday, 5 March 2016 at 15:20:12 UTC, Craig Dillabaugh 
wrote:
Robert, we have had some student interest in GSOC for XML.  
Would you be interested in mentoring a student to work with 
you on this?


Craig


Of course


Great.  Can you please get in touch by email so I can add you to 
the mentors list:


craig dot dillabaugh at gmail dot com

Cheers


Re: std.xml2 (collecting features)

2016-03-07 Thread Lodovico Giaretta via Digitalmars-d
On Sunday, 6 March 2016 at 11:46:00 UTC, Robert burner Schadek 
wrote:
On Saturday, 5 March 2016 at 15:20:12 UTC, Craig Dillabaugh 
wrote:
Robert, we have had some student interest in GSOC for XML.  
Would you be interested in mentoring a student to work with 
you on this?


Craig


Of course


Hi,
I don't know if this is the right spot to join the conversation;
I'm a student and I'd really love to work on std.xml for GSoC!

I'm just waiting for March 14 to apply.


Re: std.xml2 (collecting features)

2016-03-06 Thread Robert burner Schadek via Digitalmars-d

On Saturday, 5 March 2016 at 15:20:12 UTC, Craig Dillabaugh wrote:
Robert, we have had some student interest in GSOC for XML.  
Would you be interested in mentoring a student to work with you 
on this?


Craig


Of course


Re: std.xml2 (collecting features)

2016-03-05 Thread Craig Dillabaugh via Digitalmars-d
On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek 
wrote:
std.xml has been considered not up to spec for nearly 3 years now. 
Time to build a successor. I currently plan the following 
features for it:


- SAX and DOM parser
- in-situ / slicing parsing when possible (forward range?)
- compile time switch (CTS) for lazy attribute parsing
- CTS for encoding (ubyte(ASCII), char(utf8), ... )
- CTS for input validating
- performance

Not much code yet, I'm currently building the performance test 
suite https://github.com/burner/std.xml2


Please post your feature requests, and please keep the posts DRY 
and on topic.
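
As a rough illustration of the "compile time switch" idea in the list above, a D template parameter can turn validation on or off at compile time while the same code works on different code unit types. This is only a sketch under assumptions: `countStartTags` and the `validate` flag are hypothetical names and are not part of std.xml2.

```d
import std.exception : enforce;
import std.typecons : Flag, No, Yes;

/// Hypothetical helper: counts '<' not followed by '/'.
/// Comments and processing instructions are counted too, which is
/// good enough for showing the switch. The validation branch is
/// compiled in only when Yes.validate is passed.
size_t countStartTags(Flag!"validate" validate = No.validate, Char)
                     (const(Char)[] input)
{
    size_t count;
    foreach (i, c; input)
    {
        static if (validate == Yes.validate)
            enforce(c != 0, "NUL is not a legal XML character");
        if (c == '<' && i + 1 < input.length && input[i + 1] != '/')
            ++count;
    }
    return count;
}

void main()
{
    // char input, validation compiled out
    assert(countStartTags("<a><b/></a>") == 2);
    // same code, validation compiled in
    assert(countStartTags!(Yes.validate)("<a></a>") == 1);
    // ubyte input (the "ubyte(ASCII)" case from the list)
    assert(countStartTags(cast(const(ubyte)[]) "<a/>") == 1);
}
```

The input is only indexed and never copied, which is also the property the "in-situ / slicing" item relies on.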


Robert, we have had some student interest in GSOC for XML.  Would 
you be interested in mentoring a student to work with you on this?


Craig


Re: std.xml2 (collecting features)

2016-03-03 Thread Tobias Müller via Digitalmars-d
Adam D. Ruppe  wrote:
> On Wednesday, 2 March 2016 at 06:59:49 UTC, Tobias Müller wrote:
>> What's the usecase of DOM outside of browser 
>> interoperability/scripting? The API isn't particularly nice, 
>> especially in languages with a rich type system.
> 
> I find my extended dom to be very nice, especially thanks to D's 
> type system. I use it for a lot of things: using web apis, html 
> scraping, config file stuff, working on my own documents, and 
> even as my web template system.
> 
> Basically, dom.d made xml cool to me.

Sure, some kind of DOM is certainly useful. But the standard XML-DOM isn't
particularly nice.
What's the point of a linked list style interface when you have ranges in
the language?
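
To make the range point concrete, here is a minimal sketch of wrapping a linked-list style child API (`firstChild` / `nextSibling`) in a D input range. `Node`, `ChildRange`, `children` and `makeParent` are hypothetical names for illustration; they are not taken from dom.d or std.xml2.

```d
import std.algorithm.iteration : map;
import std.array : array;

/// Hypothetical DOM-like node with linked-list navigation.
class Node
{
    string name;
    Node firstChild;
    Node nextSibling;
}

/// Input range over the children of a node.
struct ChildRange
{
    private Node current;
    @property bool empty() const { return current is null; }
    @property Node front() { return current; }
    void popFront() { current = current.nextSibling; }
}

ChildRange children(Node parent)
{
    return ChildRange(parent.firstChild);
}

/// Test helper: builds a parent whose children have the given names.
Node makeParent(string[] names...)
{
    auto parent = new Node;
    parent.name = "parent";
    Node last;
    foreach (n; names)
    {
        auto child = new Node;
        child.name = n;
        if (last is null)
            parent.firstChild = child;
        else
            last.nextSibling = child;
        last = child;
    }
    return parent;
}

void main()
{
    auto names = makeParent("a", "b").children.map!(n => n.name).array;
    assert(names == ["a", "b"]);
}
```

Once the children are a range, the usual std.algorithm building blocks apply without any DOM-specific iteration code.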



Re: std.xml2 (collecting features)

2016-03-02 Thread Adam D. Ruppe via Digitalmars-d

On Wednesday, 2 March 2016 at 06:59:49 UTC, Tobias Müller wrote:
What's the usecase of DOM outside of browser 
interoperability/scripting? The API isn't particularly nice, 
especially in languages with a rich type system.


I find my extended dom to be very nice, especially thanks to D's 
type system. I use it for a lot of things: using web apis, html 
scraping, config file stuff, working on my own documents, and 
even as my web template system.


Basically, dom.d made xml cool to me.


Re: std.xml2 (collecting features)

2016-03-02 Thread Adam D. Ruppe via Digitalmars-d

On Wednesday, 2 March 2016 at 02:50:22 UTC, Alex Vincent wrote:
I agree, but the Document Object Model (DOM) is a huge 
project.  It's a project I'd love to take an active hand in 
driving.



My dom.d implements a fair chunk of it already.

https://github.com/adamdruppe/arsd/blob/master/dom.d

Yes, indeed, it is quite a lot of code, but easy to use if you 
are familiar with javascript and css selectors.


http://dpldocs.info/experimental-docs/arsd.dom.html


Re: std.xml2 (collecting features)

2016-03-02 Thread Tobias Müller via Digitalmars-d
Dejan Lekic  wrote:
> If you really want to be serious about the XML package, then I 
> humbly believe implementing the commonly-known DOM interfaces is 
> a must. Luckily there is IDL available for it: 
> https://www.w3.org/TR/DOM-Level-2-Core/idl/dom.idl . Also, 
> speaking about DOM, all levels need to be supported!
> 
> Also, I would recommend borrowing Tango's XML pull parser as 
> it is blazingly fast.
> 
> Finally, integration with the signal/slot module should 
> perhaps be considered as well.
> 

What's the usecase of DOM outside of browser interoperability/scripting?
The API isn't particularly nice, especially in languages with a rich type
system.



Re: std.xml2 (collecting features)

2016-03-02 Thread Alex Vincent via Digitalmars-d

On Wednesday, 24 February 2016 at 10:55:01 UTC, Dejan Lekic wrote:
If you really want to be serious about the XML package, then I 
humbly believe implementing the commonly-known DOM interfaces 
is a must. Luckily there is IDL available for it: 
https://www.w3.org/TR/DOM-Level-2-Core/idl/dom.idl . Also, 
speaking about DOM, all levels need to be supported!


I agree, but the Document Object Model (DOM) is a huge 
project.  It's a project I'd love to take an active hand in 
driving.  Also, DOM "level 4" is a living standard at whatwg.org, 
along with rules for parsing HTML.  (Which naturally means the 
rules are always changing.)


I have a partial implementation of DOM in JavaScript, so I am 
serious when I say it's going to take time.


Ideally (imho), we'd have a set of related packages, prefixed 
with std.web:

* html
* xml
* dom
* css
* javascript

(Yes, I'm suggesting a rename of std.xml2 to std.web.xml.)

But from what I can see, realistically the community is a long 
way from that.  I'm trying to write the SAX interfaces now.  I 
only have a limited amount of time to devote to this (a common 
complaint, I gather)...


Re: std.xml2 (collecting features)

2016-02-25 Thread Adam D. Ruppe via Digitalmars-d

On Thursday, 25 February 2016 at 23:59:04 UTC, crimaniak wrote:
There are only a couple of ad-hoc checks for attribute values. 
This language is not XPath-compatible, so the easiest way to 
cover a lot of cases is a regex check for attributes. Something 
like "script[src/https:.+\\.googleapis\\.com/i]"


The css3 selector standard offers three substring searches: 
[attr^=foo] if it begins with foo, [attr$=foo] if it ends with 
foo, and [attr*=foo] if it includes foo somewhere. dom.d supports 
all three now.


So for your regex, you could probably match: 
[attr*=googleapis.com] well enough.
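
The semantics of the three operators can be sketched in a few lines of D. This is not how dom.d implements them; `attrMatches` is a hypothetical helper that only mirrors the CSS3 rules, including the rule that an empty value matches nothing.

```d
import std.algorithm.searching : canFind, endsWith, startsWith;

/// Hypothetical helper mirroring CSS3 substring attribute selectors.
bool attrMatches(string value, string op, string needle)
{
    // Per the selectors spec, an empty needle never matches.
    if (needle.length == 0)
        return false;
    switch (op)
    {
        case "^=": return value.startsWith(needle); // begins with
        case "$=": return value.endsWith(needle);   // ends with
        case "*=": return value.canFind(needle);    // contains
        default:   return false;                    // unknown operator
    }
}

void main()
{
    enum src = "https://ajax.googleapis.com/x.js";
    assert(attrMatches(src, "*=", "googleapis.com"));
    assert(attrMatches(src, "^=", "https:"));
    assert(attrMatches(src, "$=", ".js"));
    assert(!attrMatches(src, "$=", ".css"));
}
```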


Re: std.xml2 (collecting features)

2016-02-25 Thread crimaniak via Digitalmars-d

On Sunday, 21 February 2016 at 23:57:40 UTC, Adam D. Ruppe wrote:

On Sunday, 21 February 2016 at 23:01:22 UTC, crimaniak wrote:
I will use it in my experiments, but getElementsBySelector() 
selector language needs to be improved, I think.


What, specifically, do you have in mind?


There are only a couple of ad-hoc checks for attribute values. 
This language is not XPath-compatible, so the easiest way to cover 
a lot of cases is a regex check for attributes. Something like 
"script[src/https:.+\\.googleapis\\.com/i]"
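
For comparison, the regex check proposed here can be done today outside the selector language with std.regex. The sketch below assumes the `src` attribute value has already been extracted as a string; `srcLooksLikeGoogleApi` is a hypothetical name.

```d
import std.regex : matchFirst, regex;

/// Hypothetical helper: case-insensitive regex test on an attribute value.
bool srcLooksLikeGoogleApi(string src)
{
    // Same pattern as in the proposed selector, with the "i" flag.
    auto re = regex(`https:.+\.googleapis\.com`, "i");
    return !matchFirst(src, re).empty;
}

void main()
{
    assert(srcLooksLikeGoogleApi("HTTPS://ajax.GoogleAPIs.com/jquery.js"));
    assert(!srcLooksLikeGoogleApi("https://example.com/a.js"));
}
```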


Re: std.xml2 (collecting features)

2016-02-24 Thread Craig Dillabaugh via Digitalmars-d

On Tuesday, 23 February 2016 at 12:46:38 UTC, Dmitry wrote:

On Tuesday, 23 February 2016 at 11:22:23 UTC, Joakim wrote:
Then write a good XML extraction-only library and dub it. I 
see no reason to include this in Phobos

You won't be able to sleep if it will be in Phobos?

I use XML and I don't like checking tons of side libraries to see 
which will be good for me, which have support (bugfixes), which 
will have support in some years, etc.
A lot of systems already use XML, and any serious language 
_must_ have official support for it.


So are you trying to say C/C++ are not serious languages :o)

Having said that, as much as I hate XML, basic support would be a 
nice feature for the language.





Re: std.xml2 (collecting features)

2016-02-24 Thread Dejan Lekic via Digitalmars-d
If you really want to be serious about the XML package, then I 
humbly believe implementing the commonly-known DOM interfaces is 
a must. Luckily there is IDL available for it: 
https://www.w3.org/TR/DOM-Level-2-Core/idl/dom.idl . Also, 
speaking about DOM, all levels need to be supported!


Also, I would recommend borrowing Tango's XML pull parser as 
it is blazingly fast.


Finally, integration with the signal/slot module should 
perhaps be considered as well.


Re: std.xml2 (collecting features)

2016-02-23 Thread Dmitry via Digitalmars-d

On Tuesday, 23 February 2016 at 11:22:23 UTC, Joakim wrote:
Then write a good XML extraction-only library and dub it. I see 
no reason to include this in Phobos

You won't be able to sleep if it will be in Phobos?

I use XML and I don't like checking tons of side libraries to see 
which will be good for me, which have support (bugfixes), which 
will have support in some years, etc.
A lot of systems already use XML, and any serious language _must_ 
have official support for it.


If data formats are your thing, you could help get Ludwig's 
JSON stuff in, or better yet, enable some nice binary data 
format.
If it is better for you, that does not mean it will be better for 
everyone.




Re: std.xml2 (collecting features)

2016-02-23 Thread Joakim via Digitalmars-d

On Friday, 19 February 2016 at 12:13:53 UTC, Chris wrote:

On Sunday, 3 May 2015 at 17:47:15 UTC, Joakim wrote:
On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek 
wrote:
std.xml has been considered not up to spec for nearly 3 years 
now. Time to build a successor. I currently plan the 
following features for it:


- SAX and DOM parser
- in-situ / slicing parsing when possible (forward range?)
- compile time switch (CTS) for lazy attribute parsing
- CTS for encoding (ubyte(ASCII), char(utf8), ... )
- CTS for input validating
- performance

Not much code yet, I'm currently building the performance 
test suite https://github.com/burner/std.xml2


Please post your feature requests, and please keep the posts 
DRY and on topic.


My request: just skip it.  XML is a horrible waste of space 
for a standard, better D doesn't support it well, anything to 
discourage its use.  I'd rather see you spend your time on 
something worthwhile.  If data formats are your thing, you 
could help get Ludwig's JSON stuff in, or better yet, enable 
some nice binary data format.


Glad to hear that someone is working on XML support. We cannot 
just "skip it". XML/HTML like mark up comes up all the time, 
here and there. I recently had to write a mini-parser (nowhere 
near the stuff Robert is doing, just a quick fix!) to extract 
data from XML input. This has nothing to do with personal 
preferences, it's just there [1] and has to be dealt with.


[1] 
https://en.wikipedia.org/wiki/Speech_Synthesis_Markup_Language


Then write a good XML extraction-only library and dub it.  I see 
no reason to include this in Phobos, which will encourage those 
who don't know any better to use it, since it comes with the 
compiler.  I'll close with a quote from Saint Linus of Torvalds, 
which I was unaware of till a couple days ago:


"XML is crap. Really. There are no excuses. XML is nasty to parse 
for humans, and it's a disaster to parse even for computers. 
There's just no reason for that horrible crap to exist."

https://en.wikiquote.org/wiki/Linus_Torvalds#2014


Re: std.xml2 (collecting features)

2016-02-23 Thread Robert burner Schadek via Digitalmars-d
On Thursday, 18 February 2016 at 15:39:01 UTC, Robert burner 
Schadek wrote:
On Thursday, 18 February 2016 at 12:30:29 UTC, Andrei 
Alexandrescu wrote:


Would the measuring be possible with 2995 as a dub package? -- 
Andrei


yes, after I have synced the dub package to the PR


brought the dub package up to date with the PR (v0.0.6)


Re: std.xml2 (collecting features)

2016-02-21 Thread Adam D. Ruppe via Digitalmars-d

On Sunday, 21 February 2016 at 23:01:22 UTC, crimaniak wrote:
I will use it in my experiments, but getElementsBySelector() 
selector language needs to be improved, I think.


What, specifically, do you have in mind?


Re: std.xml2 (collecting features)

2016-02-21 Thread crimaniak via Digitalmars-d
On Saturday, 20 February 2016 at 19:16:47 UTC, Adam D. Ruppe 
wrote:

On Saturday, 20 February 2016 at 19:08:25 UTC, crimaniak wrote:
- the ability to read documents with missing or incorrectly 
specified encoding
- additional feature: relaxed mode for reading html and broken 
XML documents


fyi, my dom.d can do those, I use it for web scraping where 
there's all kinds of hideous stuff out there.


https://github.com/adamdruppe/arsd/blob/master/dom.d


It works, thanks! I will use it in my experiments, but 
getElementsBySelector() selector language needs to be improved, I 
think.


Re: std.xml2 (collecting features)

2016-02-20 Thread Adam D. Ruppe via Digitalmars-d

On Saturday, 20 February 2016 at 19:08:25 UTC, crimaniak wrote:
- the ability to read documents with missing or incorrectly 
specified encoding
- additional feature: relaxed mode for reading html and broken 
XML documents


fyi, my dom.d can do those, I use it for web scraping where 
there's all kinds of hideous stuff out there.


https://github.com/adamdruppe/arsd/blob/master/dom.d


Re: std.xml2 (collecting features)

2016-02-20 Thread crimaniak via Digitalmars-d
On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek 
wrote:



Please post your feature requests...


- the ability to read documents with missing or incorrectly 
specified encoding
- additional feature: relaxed mode for reading html and broken 
XML documents


Some time ago I worked for Accusoft on document 
viewing/converting software. The main lesson I learned: every 
theoretically possible type of error in documents shows up in 
practice once the application is popular.





Re: std.xml2 (collecting features) control character

2016-02-19 Thread Alex Vincent via Digitalmars-d
On Thursday, 18 February 2016 at 21:53:24 UTC, Robert burner 
Schadek wrote:
On Thursday, 18 February 2016 at 18:28:10 UTC, Alex Vincent 
wrote:
Regarding control characters:  If you give me a complete 
sample file, I can run it through Mozilla's UTF stream 
conversion and/or XML parsing code (via either SAX or 
DOMParser) to tell you how that reacts as a reference.  
Mozilla supports XML 1.0, but not 1.1.


thank you for making the effort

https://github.com/burner/std.xml2/blob/master/tests/eduni/xml-1.1/out/010.xml


In this case, Firefox just passes the control characters through 
to the contentHandler.characters method:


Starting runTest
Retrieved source
contentHandler.startDocument()
contentHandler.startElement("", "foo", "foo", {})
contentHandler.characters("\u0080")
contentHandler.endElement("", "foo", "foo")
contentHandler.endDocument()
Done reading
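
For readers unfamiliar with SAX, the event sequence in the log above can be reproduced with a small handler interface in D. This is a sketch under assumptions: `ContentHandler`, `RecordingHandler` and `emitFooDocument` are hypothetical names modelled on the classic SAX callbacks, not the interfaces from the std.xml2 pull request, and the "parser" is a stand-in that just replays the events for `<foo>\u0080</foo>`.

```d
/// Hypothetical SAX-style callback interface.
interface ContentHandler
{
    void startDocument();
    void startElement(string uri, string localName, string qName,
                      string[string] attrs);
    void characters(string text);
    void endElement(string uri, string localName, string qName);
    void endDocument();
}

/// Records each event as a string so the sequence can be inspected.
final class RecordingHandler : ContentHandler
{
    string[] log;

    void startDocument() { log ~= "startDocument"; }
    void startElement(string uri, string localName, string qName,
                      string[string] attrs)
    {
        log ~= "startElement(" ~ qName ~ ")";
    }
    void characters(string text) { log ~= "characters(" ~ text ~ ")"; }
    void endElement(string uri, string localName, string qName)
    {
        log ~= "endElement(" ~ qName ~ ")";
    }
    void endDocument() { log ~= "endDocument"; }
}

/// Stand-in for a parser: replays the events for <foo>\u0080</foo>.
void emitFooDocument(ContentHandler h)
{
    h.startDocument();
    h.startElement("", "foo", "foo", null);
    h.characters("\u0080"); // control character passed through unchanged
    h.endElement("", "foo", "foo");
    h.endDocument();
}

void main()
{
    auto h = new RecordingHandler;
    emitFooDocument(h);
    assert(h.log.length == 5);
    assert(h.log[2] == "characters(\u0080)");
}
```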



Re: std.xml2 (collecting features) control character

2016-02-19 Thread Robert burner Schadek via Digitalmars-d

On Friday, 19 February 2016 at 12:55:52 UTC, Kagamin wrote:

http://dpaste.dzfl.pl/2f8a8ff10bde like this?


yes


Re: std.xml2 (collecting features) control character

2016-02-19 Thread Kagamin via Digitalmars-d
On Friday, 19 February 2016 at 12:30:06 UTC, Robert burner 
Schadek wrote:
import std.conv : to;

// hex literals need the 0x prefix
ubyte[] arr = [0x3C, 0x66, 0x6F, 0x6F, 0x3E, 0xC2, 0x80,
               0x3C, 0x2F, 0x66, 0x6F, 0x6F, 0x3E];
string s = cast(string) arr;
dstring ds = to!dstring(s);

and see what happens


http://dpaste.dzfl.pl/2f8a8ff10bde like this?


Re: std.xml2 (collecting features) control character

2016-02-19 Thread Robert burner Schadek via Digitalmars-d
On 2016-02-19 11:58, Kagamin via Digitalmars-d wrote:
> On Thursday, 18 February 2016 at 16:56:08 UTC, Robert burner Schadek
> wrote:
>> the hex dump is "3C 66 6F 6F 3E C2 80 3C 2F 66 6F 6F 3E"
>
> http://dpaste.dzfl.pl/80888ed31958 like this?
No, the program just takes the hex dump as a string.

you would need to do something like:

import std.conv : to;

// hex literals need the 0x prefix
ubyte[] arr = [0x3C, 0x66, 0x6F, 0x6F, 0x3E, 0xC2, 0x80,
               0x3C, 0x2F, 0x66, 0x6F, 0x6F, 0x3E];
string s = cast(string) arr;
dstring ds = to!dstring(s);

and see what happens


Re: std.xml2 (collecting features)

2016-02-19 Thread Chris via Digitalmars-d

On Sunday, 3 May 2015 at 17:47:15 UTC, Joakim wrote:
On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek 
wrote:
std.xml has been considered not up to spec for nearly 3 years 
now. Time to build a successor. I currently plan the following 
features for it:


- SAX and DOM parser
- in-situ / slicing parsing when possible (forward range?)
- compile time switch (CTS) for lazy attribute parsing
- CTS for encoding (ubyte(ASCII), char(utf8), ... )
- CTS for input validating
- performance

Not much code yet, I'm currently building the performance test 
suite https://github.com/burner/std.xml2


Please post your feature requests, and please keep the posts 
DRY and on topic.


My request: just skip it.  XML is a horrible waste of space for 
a standard, better D doesn't support it well, anything to 
discourage its use.  I'd rather see you spend your time on 
something worthwhile.  If data formats are your thing, you 
could help get Ludwig's JSON stuff in, or better yet, enable 
some nice binary data format.


Glad to hear that someone is working on XML support. We cannot 
just "skip it". XML/HTML like mark up comes up all the time, here 
and there. I recently had to write a mini-parser (nowhere near 
the stuff Robert is doing, just a quick fix!) to extract data 
from XML input. This has nothing to do with personal preferences, 
it's just there [1] and has to be dealt with.


[1] https://en.wikipedia.org/wiki/Speech_Synthesis_Markup_Language




Re: std.xml2 (collecting features) control character

2016-02-19 Thread Kagamin via Digitalmars-d
On Thursday, 18 February 2016 at 16:56:08 UTC, Robert burner 
Schadek wrote:

the hex dump is "3C 66 6F 6F 3E C2 80 3C 2F 66 6F 6F 3E"


http://dpaste.dzfl.pl/80888ed31958 like this?


Re: std.xml2 (collecting features)

2016-02-19 Thread Robert burner Schadek via Digitalmars-d
On Friday, 19 February 2016 at 04:02:02 UTC, Craig Dillabaugh 
wrote:
Would you be interested in mentoring a student for the Google 
Summer of Code to do work on std.xml?


Yes, why not!


Re: std.xml2 (collecting features)

2016-02-18 Thread Craig Dillabaugh via Digitalmars-d
On Thursday, 18 February 2016 at 10:18:18 UTC, Robert burner 
Schadek wrote:
On Thursday, 18 February 2016 at 04:34:13 UTC, Alex Vincent 
wrote:
I'm looking for a status update.  DUB doesn't seem to have 
many options posted.  I was thinking about starting a 
SAXParser implementation.


I'm working on it, but recently I had to do some major 
restructuring of the code.
Currently I'm trying to get this merged 
https://github.com/D-Programming-Language/phobos/pull/3880 
because I had some problems with the encoding of test files. 
XML has a lot of corner cases, it just takes time.


If you want to work on some XML stuff, please join me. It is 
probably more productive working together than creating two 
competing implementations.


Would you be interested in mentoring a student for the Google 
Summer of Code to do work on std.xml?


Re: std.xml2 (collecting features) control character

2016-02-18 Thread Robert burner Schadek via Digitalmars-d

On Thursday, 18 February 2016 at 18:28:10 UTC, Alex Vincent wrote:
Regarding control characters:  If you give me a complete sample 
file, I can run it through Mozilla's UTF stream conversion 
and/or XML parsing code (via either SAX or DOMParser) to tell 
you how that reacts as a reference.  Mozilla supports XML 1.0, 
but not 1.1.


thank you for making the effort

https://github.com/burner/std.xml2/blob/master/tests/eduni/xml-1.1/out/010.xml


Re: std.xml2 (collecting features) control character

2016-02-18 Thread Alex Vincent via Digitalmars-d
On Thursday, 18 February 2016 at 17:26:30 UTC, Adam D. Ruppe 
wrote:
On Thursday, 18 February 2016 at 16:56:08 UTC, Robert burner 
Schadek wrote:
unix file says it is a utf8 encoded file, but no BOM is 
present.


the hex dump is "3C 66 6F 6F 3E C2 80 3C 2F 66 6F 6F 3E"


Gah, I should have read this before replying... well, that does 
appear to be valid utf-8, so why is it throwing an exception 
then?


I'm pretty sure that byte stream *is* actually well-formed xml 
1.0 and should pass utf validation as well as the XML 
well-formedness check.


Regarding control characters:  If you give me a complete sample 
file, I can run it through Mozilla's UTF stream conversion and/or 
XML parsing code (via either SAX or DOMParser) to tell you how 
that reacts as a reference.  Mozilla supports XML 1.0, but not 
1.1.


Re: std.xml2 (collecting features)

2016-02-18 Thread Alex Vincent via Digitalmars-d
On Thursday, 18 February 2016 at 10:18:18 UTC, Robert burner 
Schadek wrote:
If you want to work on some XML stuff, please join me. It is 
probably more productive working together than creating two 
competing implementations.


Oh, I absolutely agree, independent implementation is a bad 
thing. (Someone should rename DRY as "don't repeat yourself or 
others"... but DRYOO sounds weird.)


Where's your repo?


Re: std.xml2 (collecting features) control character

2016-02-18 Thread Adam D. Ruppe via Digitalmars-d
On Thursday, 18 February 2016 at 16:56:08 UTC, Robert burner 
Schadek wrote:
unix file says it is a utf8 encoded file, but no BOM is 
present.


the hex dump is "3C 66 6F 6F 3E C2 80 3C 2F 66 6F 6F 3E"


Gah, I should have read this before replying... well, that does 
appear to be valid utf-8, so why is it throwing an exception then?


I'm pretty sure that byte stream *is* actually well-formed xml 
1.0 and should pass utf validation as well as the XML 
well-formedness check.
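
The claim is easy to check with Phobos. The sketch below feeds the exact bytes from the hex dump through `std.utf.validate` and converts them to UTF-32; `decodeUtf8` is a hypothetical helper name, the Phobos calls are real.

```d
import std.conv : to;
import std.exception : assertThrown;
import std.utf : UTFException, validate;

/// Hypothetical helper: validates raw bytes as UTF-8, then decodes them.
dstring decodeUtf8(const(ubyte)[] bytes)
{
    auto s = cast(const(char)[]) bytes;
    validate(s);          // throws UTFException on ill-formed input
    return s.to!dstring;  // one dchar per code point
}

void main()
{
    // <foo> C2 80 </foo>, the byte stream from the hex dump
    auto ds = decodeUtf8([0x3C, 0x66, 0x6F, 0x6F, 0x3E, 0xC2, 0x80,
                          0x3C, 0x2F, 0x66, 0x6F, 0x6F, 0x3E]);
    assert(ds.length == 12);    // 13 bytes, C2 80 is one code point
    assert(ds[5] == '\u0080');  // the C1 control character

    // A lone continuation byte, by contrast, is ill-formed UTF-8.
    assertThrown!UTFException(decodeUtf8([0x3C, 0x80, 0x3E]));
}
```

So the bytes `C2 80` are a well-formed encoding of U+0080; an exception would only be expected if the raw byte `80` appeared on its own.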


Re: std.xml2 (collecting features) control character

2016-02-18 Thread Adam D. Ruppe via Digitalmars-d
On Thursday, 18 February 2016 at 16:54:10 UTC, Robert burner 
Schadek wrote:

It does not, it has no prolog and therefore no EncodingInfo.


In that case, it needs to be valid UTF-8 or valid UTF-16 and it 
is a fatal error if there's any invalid bytes:


https://www.w3.org/TR/REC-xml/#charencoding

==
 It is a fatal error if an XML entity is determined (via default, 
encoding declaration, or higher-level protocol) to be in a 
certain encoding but contains byte sequences that are not legal 
in that encoding. Specifically, it is a fatal error if an entity 
encoded in UTF-8 contains any ill-formed code unit sequences, as 
defined in section 3.9 of Unicode [Unicode]. Unless an encoding 
is determined by a higher-level protocol, it is also a fatal 
error if an XML entity contains no encoding declaration and its 
content is not legal UTF-8 or UTF-16.

==
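
A minimal sketch of what this rule means for a document with no encoding declaration: look for a byte order mark, and fall back to UTF-8 when there is none. `Encoding` and `sniffEncoding` are hypothetical names, and the sketch deliberately ignores UTF-32 and the declaration-based autodetection described in the spec's appendix.

```d
/// Hypothetical result type for the sketch.
enum Encoding { utf8, utf16le, utf16be }

/// Guesses the encoding of an entity that has no encoding declaration.
Encoding sniffEncoding(const(ubyte)[] data)
{
    if (data.length >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return Encoding.utf16be;   // big-endian BOM
    if (data.length >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return Encoding.utf16le;   // little-endian BOM
    // With or without the optional EF BB BF mark, the default is UTF-8.
    return Encoding.utf8;
}

void main()
{
    // The test file from this thread: no BOM, starts with "<f"
    assert(sniffEncoding([0x3C, 0x66, 0x6F, 0x6F, 0x3E]) == Encoding.utf8);
    assert(sniffEncoding([0xFF, 0xFE, 0x3C, 0x00]) == Encoding.utf16le);
    assert(sniffEncoding([0xFE, 0xFF, 0x00, 0x3C]) == Encoding.utf16be);
}
```

After this step the bytes still have to be legal in the chosen encoding, otherwise the parser must report a fatal error.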



Re: std.xml2 (collecting features) control character

2016-02-18 Thread Robert burner Schadek via Digitalmars-d
On Thursday, 18 February 2016 at 16:54:10 UTC, Robert burner 
Schadek wrote:
unix file says it is a utf8 encoded file, but no BOM is 
present.


the hex dump is "3C 66 6F 6F 3E C2 80 3C 2F 66 6F 6F 3E"


Re: std.xml2 (collecting features) control character

2016-02-18 Thread Robert burner Schadek via Digitalmars-d
On Thursday, 18 February 2016 at 16:47:35 UTC, Adam D. Ruppe 
wrote:
On Thursday, 18 February 2016 at 16:41:52 UTC, Robert burner 
Schadek wrote:
for instance, quite often I find <80> in tests that are 
supposed to be valid xml 1.0. they are invalid xml 1.1 though


What char encoding does the document declare itself as?


It does not, it has no prolog and therefore no EncodingInfo.

unix file says it is a utf8 encoded file, but no BOM is present.


Re: std.xml2 (collecting features) control character

2016-02-18 Thread Adam D. Ruppe via Digitalmars-d
On Thursday, 18 February 2016 at 16:41:52 UTC, Robert burner 
Schadek wrote:
for instance, quite often I find <80> in tests that are 
supposed to be valid xml 1.0. they are invalid xml 1.1 though


What char encoding does the document declare itself as?


Re: std.xml2 (collecting features) control character

2016-02-18 Thread Robert burner Schadek via Digitalmars-d
For instance, quite often I find <80> in tests that are supposed 
to be valid xml 1.0. they are invalid xml 1.1 though
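
The difference between the two versions comes down to their character productions: XML 1.0's `Char` allows U+0080 literally, while XML 1.1 lists it as a `RestrictedChar` that may only appear as a character reference. A sketch of both checks in D follows; the function names are hypothetical, the ranges are taken from the two recommendations.

```d
/// XML 1.0 "Char" production.
bool isXml10Char(dchar c)
{
    return c == 0x9 || c == 0xA || c == 0xD
        || (c >= 0x20 && c <= 0xD7FF)
        || (c >= 0xE000 && c <= 0xFFFD)
        || (c >= 0x10000 && c <= 0x10FFFF);
}

/// XML 1.1 "RestrictedChar" production: legal only as a character
/// reference, not as a literal character.
bool isXml11RestrictedChar(dchar c)
{
    return (c >= 0x1 && c <= 0x8) || c == 0xB || c == 0xC
        || (c >= 0xE && c <= 0x1F)
        || (c >= 0x7F && c <= 0x84)
        || (c >= 0x86 && c <= 0x9F);
}

void main()
{
    assert(isXml10Char('\u0080'));            // fine in XML 1.0
    assert(isXml11RestrictedChar('\u0080'));  // restricted in XML 1.1
    assert(!isXml10Char('\x01'));             // C0 controls are out in 1.0
    assert(!isXml11RestrictedChar('\u0085')); // NEL is not restricted
}
```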


Re: std.xml2 (collecting features) control character

2016-02-18 Thread Adam D. Ruppe via Digitalmars-d
On Thursday, 18 February 2016 at 15:56:58 UTC, Robert burner 
Schadek wrote:
When trying to validate/convert a utf string these lead to 
exceptions, because they are not valid utf characters.


That means the user didn't encode them properly...

Which one specifically are you thinking of? I'm pretty sure all 
those control characters have a spot in the Unicode space and can 
be properly encoded as UTF-8 (though I think even if they are 
properly encoded, some of them are illegal in XML anyway).


If they appear in another form, it is invalid and/or needs a 
charset conversion, which should be specified in the XML document 
itself.


Re: std.xml2 (collecting features) control character

2016-02-18 Thread Robert burner Schadek via Digitalmars-d
While working on a new xml implementation I came across "control 
characters (CC)". [1]
When trying to validate/convert a utf string these lead to 
exceptions, because they are not valid utf characters.
Unfortunately, some of these characters are allowed to appear in 
valid xml 1.* documents.


I currently see two options for how to go about it:

1. Do not allow CCs that do not work with existing 
functionality.

1.Pros
  * easy
1.Cons
  * the resulting xml implementation will not be xml 1.* complete

2. Add special cases to the existing functionality to handle CCs 
that are allowed in 1.0.

2.Pros
  * the resulting xml implementation will be xml 1.* complete
2.Cons
  * will make utf de/encoding slower as I would need to add 
additional logic


Any other ideas, feedback?




[1] https://en.wikipedia.org/wiki/C0_and_C1_control_codes



Re: std.xml2 (collecting features)

2016-02-18 Thread Robert burner Schadek via Digitalmars-d
On Thursday, 18 February 2016 at 12:30:29 UTC, Andrei 
Alexandrescu wrote:

also I would like to see this
https://github.com/D-Programming-Language/phobos/pull/2995 go 
in first

to be able to accurately measure and compare performance


Would the measuring be possible with 2995 as a dub package? -- 
Andrei


yes, after I have synced the dub package to the PR


Re: std.xml2 (collecting features)

2016-02-18 Thread Andrei Alexandrescu via Digitalmars-d

On 02/18/2016 05:49 AM, Robert burner Schadek wrote:

On Thursday, 18 February 2016 at 10:18:18 UTC, Robert burner Schadek wrote:

If you want to work on some XML stuff, please join me. It is probably more
productive working together than creating two competing implementations.


also I would like to see this
https://github.com/D-Programming-Language/phobos/pull/2995 go in first
to be able to accurately measure and compare performance


Would the measuring be possible with 2995 as a dub package? -- Andrei


Re: std.xml2 (collecting features)

2016-02-18 Thread Robert burner Schadek via Digitalmars-d
On Thursday, 18 February 2016 at 10:18:18 UTC, Robert burner 
Schadek wrote:
If you want to work on some XML stuff, please join me. It is 
probably more productive working together than creating two 
competing implementations.


also I would like to see this 
https://github.com/D-Programming-Language/phobos/pull/2995 go in 
first to be able to accurately measure and compare performance


Re: std.xml2 (collecting features)

2016-02-18 Thread Robert burner Schadek via Digitalmars-d

On Thursday, 18 February 2016 at 04:34:13 UTC, Alex Vincent wrote:
I'm looking for a status update.  DUB doesn't seem to have many 
options posted.  I was thinking about starting a SAXParser 
implementation.


I'm working on it, but recently I had to do some major 
restructuring of the code.
Currently I'm trying to get this merged 
https://github.com/D-Programming-Language/phobos/pull/3880 
because I had some problems with the encoding of test files. XML 
has a lot of corner cases, it just takes time.


If you want to work on some XML stuff, please join me. It is probably 
more productive working together than creating two competing 
implementations.


Re: std.xml2 (collecting features)

2016-02-17 Thread Alex Vincent via Digitalmars-d
On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek 
wrote:
std.xml has been considered not up to spec for nearly 3 years now. 
Time to build a successor. I currently plan the following 
features for it:


- SAX and DOM parser
- in-situ / slicing parsing when possible (forward range?)
- compile time switch (CTS) for lazy attribute parsing
- CTS for encoding (ubyte(ASCII), char(utf8), ... )
- CTS for input validating
- performance

Not much code yet, I'm currently building the performance test 
suite https://github.com/burner/std.xml2


Please post your feature requests, and please keep the posts DRY 
and on topic.


I'm looking for a status update.  DUB doesn't seem to have many 
options posted.  I was thinking about starting a SAXParser 
implementation.


Re: std.xml2 (collecting features)

2015-05-12 Thread Kagamin via Digitalmars-d

On Sunday, 10 May 2015 at 08:54:09 UTC, Joakim wrote:
One can do all these things with better formats than either XML 
or JSON.


Hypothetically, yes, though formats better than XML don't exist. 
I personally find XML perfectly readable.


Re: std.xml2 (collecting features)

2015-05-11 Thread Alex Parrill via Digitalmars-d

Can we please not turn this thread into an XML vs JSON flamewar?

XML is one of the most popular data formats (for better or for 
worse), so a parser would be a good addition to the standard 
library.


Re: std.xml2 (collecting features)

2015-05-11 Thread via Digitalmars-d

On Monday, 11 May 2015 at 15:20:12 UTC, Alex Parrill wrote:

Can we please not turn this thread into an XML vs JSON flamewar?


This is not a flamewar, JSON is ad hoc and I use it a lot, but it 
isn't actually suitable as a file and archival exchange format. 
It is important that people understand what the point of XML is 
in order to build something useful.


Full XML support and tooling is very valuable for typed GC-backed 
batch processing. That means namespaces, entities, XQuery 
equivalents, DOMs etc


A library backed tooling pipeline would be a valuable asset for 
D. The value is not in _reading_ or _writing_ XML. The value is 
all about providing a framework for structured grammar/namespace 
based _processing_ and _transforms_.


Re: std.xml2 (collecting features)

2015-05-10 Thread Laeeth Isharc via Digitalmars-d

On Sunday, 10 May 2015 at 08:54:09 UTC, Joakim wrote:
It's worse than shabby, it's a horrible, horrible choice.  Not 
just for data formats, but for _anything_.  XML should not be 
used.


I feel the same way about XML, and I also think that having 
strong  aesthetic internal emotional responses is often necessary 
to achieve excellence in engineering.


But why do we often end up dealing with these two?  
Familiarity, that is the only reason.  XML seems familiar to 
anybody who's written some HTML, and JSON became familiar to 
web developers initially.  Starting from those two large 
niches, they've expanded out to become the two most popular 
data interchange formats, despite XML being a horrible mess and 
JSON being too simple for many uses.


Sometimes you get to pick, but often not.  I can hardly tell the 
UK Debt Management Office to give up XML and switch to msgpack 
structs (well, I can, but I am not sure they would listen).  So 
at the moment for some data series I use a python library via PyD 
to convert xml files to JSON.  But it would be nice to do it all 
in D.
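
For concreteness, the kind of XML-to-JSON conversion step described above can be sketched with Python's standard library alone. This is only an illustration: the mapping convention ("@" for attributes, "#text" for character data), the function names and the sample document are all made up here, and this is not the library actually used via PyD.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert one element into a plain dict (hypothetical mapping, not a standard)."""
    node = {}
    # Attributes go under "@name" keys so they cannot clash with child tags.
    for name, value in elem.attrib.items():
        node["@" + name] = value
    # Children are grouped by tag; every tag maps to a list, even for one child.
    for child in elem:
        node.setdefault(child.tag, []).append(element_to_dict(child))
    text = (elem.text or "").strip()
    if text:
        node["#text"] = text
    return node


def xml_to_json(xml_text):
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_dict(root)}, sort_keys=True)


# Made-up sample data, only there to exercise the functions.
SAMPLE = '<series id="gilts"><point date="2015-05-08">1.88</point></series>'
print(xml_to_json(SAMPLE))
```

Mixed content (text interleaved with child elements) is lost by this mapping, which is one reason a general converter is harder than it looks.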


I am not sure XML is going away very soon since new protocols 
keep being created using it.  (Most recent one I heard of is one 
for allowing hedge funds to achieve full transparency of their 
portfolio to end investors - not necessarily something that will 
achieve what people think it will, but one in tune with the 
times).



Laeeth.


Re: std.xml2 (collecting features)

2015-05-10 Thread Joakim via Digitalmars-d

On Sunday, 10 May 2015 at 07:01:58 UTC, Marco Leise wrote:

On Sat, 09 May 2015 10:28:52 +
Joakim dl...@joakim.fea.st wrote:


On Monday, 4 May 2015 at 18:50:43 UTC, Marco Leise wrote:
 Remember that while JSON is simpler, XML is not just a
 structured container for bool, Number and String data. It
 comes with many official side kicks covering a broad range of
 use cases:

 XPath:
  …

 XSL and XSLT
  …

 XSL-FO (XSL formatting objects):
  …

 XML Schema Definition (XSD):
  …

These are all incredibly dumb ideas.  I don't deny that many 
people may use these things, but then people use hammers for 
all kinds of things they shouldn't use them for too. :)


:) One can't really answer this one. But with many hundreds of
published data exchange formats built on XML, it can't have been
too shabby all along.


It's worse than shabby, it's a horrible, horrible choice.  Not 
just for data formats, but for _anything_.  XML should not be 
used.


And sometimes small things matter, like being able to add 
comments

along with the payload. JSON doesn't have that.
Or knowing that both sender and receiver will validate the XML 
the
same way through XSD. So if it doesn't blow up on your end, it 
will

pass validation on the other end, too.


One can do all these things with better formats than either XML 
or JSON.


But why do we often end up dealing with these two?  Familiarity, 
that is the only reason.  XML seems familiar to anybody who's 
written some HTML, and JSON became familiar to web developers 
initially.  Starting from those two large niches, they've 
expanded out to become the two most popular data interchange 
formats, despite XML being a horrible mess and JSON being too 
simple for many uses.


I'd like to see a move back to binary formats, which is why I 
mentioned that to Robert.  D would be an ideal language in which 
to show the superiority of binary to text formats, given its 
emphasis on efficiency.  Many devs have learned the wrong lessons 
from past closed binary formats, when open binary formats 
wouldn't have many of those deficiencies.


There have been some interesting moves back to open binary 
formats/protocols in recent years, like Hessian 
(http://hessian.caucho.com/), Thrift 
(https://thrift.apache.org/), MessagePack (http://msgpack.org/), 
and Cap'n Proto (from the protobufs guy after he left google - 
https://capnproto.org/).  I'd rather see phobos support these, 
which are the future, rather than flash-in-the-pan text formats 
like XML or JSON.


Re: std.xml2 (collecting features)

2015-05-10 Thread Marco Leise via Digitalmars-d
On Sat, 09 May 2015 10:28:52 +
Joakim dl...@joakim.fea.st wrote:

 On Monday, 4 May 2015 at 18:50:43 UTC, Marco Leise wrote:

  You two are terrible at motivating people. Better D doesn't
  support it well and JSON is superior through-and-through is
  overly dismissive.
  …
 
 You seem to have missed the point of my post, which was to 
 discourage him from working on an XML module for phobos.  As for 
 motivating him, I suggested better alternatives.  And I never 
 said JSON was great, but it's certainly _much_ more readable than 
 XML, which is one of the basic goals of a text format.

Well, I was mostly answering to w0rp here. JSON is both
readable and easy to parse, no question.
 
  Remember that while JSON is simpler, XML is not just a
  structured container for bool, Number and String data. It
  comes with many official side kicks covering a broad range of
  use cases:
 
  XPath:
   …
 
  XSL and XSLT
   …
 
  XSL-FO (XSL formatting objects):
   …
 
  XML Schema Definition (XSD):
   …
 
 These are all incredibly dumb ideas.  I don't deny that many 
 people may use these things, but then people use hammers for all 
 kinds of things they shouldn't use them for too. :)

:) One can't really answer this one. But with many hundreds of
published data exchange formats built on XML, it can't have been
too shabby all along.
And sometimes small things matter, like being able to add comments
along with the payload. JSON doesn't have that.
Or knowing that both sender and receiver will validate the XML the
same way through XSD. So if it doesn't blow up on your end, it will
pass validation on the other end, too.


On Sat, 09 May 2015 13:04:57 +
Craig Dillabaugh craig.dillaba...@gmail.com wrote:

 I have to agree with Joakim on this.  Having spent much of this 
 past
 week trying to get XML generated by gSOAP (project has some legacy
 code) to work with JAXB (Java) has reinforced my dislike for XML.
 
 I've used things like XPath and XSLT in the past, so I can 
 appreciate
 their power, but think the 'jobs' they perform would be better 
 supported
 elsewhere (ie. language specific XML frameworks).
 
 In trying to pass data between applications I just want a simple 
 way
 of packaging up the data and ideally making 
 serialization/deserialization
 easy for me.  At some point the programmer working on these needs
 to understand and validate the data anyway.  Sure you can use 
 DTD/XML Schema to
 handle the validation part, but it is just easier to deal with 
 that
 within your own code - without having to learn a 'whole new 
 language', that
 is likely harder to grok than the tools you would have at your 
 disposal
 in your language of choice.

You see, the thing is that XSD is _not_ a whole new language,
it is written in XML as well, probably specifically to make it
so. Try to switch the perspective: With XSD (if it is
sufficient for your validation needs) _one_ person needs to
learn and write it and other programmers (inside or outside
the company) just use the XML library of choice to handle
validation via that schema. Once the schema is loaded it is
usually no more than doc.validate();
(There are also good GUI tools to assist in writing XSD.)
What you propose on the other hand is that everyone involved
in the data exchange writes their own validation code in their
language of choice, with either no access to existing sources
or functionality that doesn't translate to their language!
 
-- 
Marco



Re: std.xml2 (collecting features)

2015-05-10 Thread via Digitalmars-d

On Sunday, 10 May 2015 at 07:01:58 UTC, Marco Leise wrote:

Well, I was mostly answering to w0rp here. JSON is both
readable and easy to parse, no question.


JSON is just javascript literals with some silly constraints. As 
crappy a format as it gets. Even pure Lisp would have been 
better. And much more powerful!



:) One can't really answer this one. But with many hundreds of
published data exchange formats built on XML, it can't have been
too shabby all along.
And sometimes small things matter, like being able to add 
comments

along with the payload.


XML is actually great for what it is: eXtensible. It means you 
can build forward compatible formats and annotate existing 
formats with metadata without breaking existing (compliant) 
applications etc... It also means you can datamine files without 
knowing the full format.
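
A small sketch of that point, using Python's standard library: a reader that only knows about `item` and `name` keeps working when a newer tool adds an annotation in its own namespace, and a generic tool can still survey the file without any schema. The document, the `urn:example:ext` namespace and the function names are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up document: the x:note element stands for metadata added by a newer tool.
DOC = """<catalog xmlns:x="urn:example:ext">
  <item id="1"><name>bolt</name><x:note>added by a newer tool</x:note></item>
  <item id="2"><name>nut</name></item>
</catalog>"""


def tag_histogram(xml_text):
    """Survey a document without knowing its format: count every element tag."""
    root = ET.fromstring(xml_text)
    return Counter(elem.tag for elem in root.iter())


def item_names(xml_text):
    """An 'old' reader: it only looks at what it knows and ignores the rest."""
    root = ET.fromstring(xml_text)
    return [item.findtext("name") for item in root.iter("item")]


print(tag_histogram(DOC))
print(item_names(DOC))
```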


Or knowing that both sender and receiver will validate the XML 
the

same way through XSD.


Right, or build a database/archival service that is generic.

XML is not going away until there is something better, and that 
won't happen anytime soon. It is also one of the few formats that 
I actually need library and _good_ DOM support for. (JSON can be 
done in an afternoon, so I don't care if it is supported or 
not...)


Re: std.xml2 (collecting features)

2015-05-09 Thread Joakim via Digitalmars-d

On Monday, 4 May 2015 at 18:50:43 UTC, Marco Leise wrote:

On Sunday, 3 May 2015 at 17:47:15 UTC, Joakim wrote:

My request: just skip it.  XML is a horrible waste of space 
for a standard, better D doesn't support it well, anything to 
discourage its use.  I'd rather see you spend your time on 
something worthwhile.  If data formats are your thing, you 
could help get Ludwig's JSON stuff in, or better yet, enable 
some nice binary data format.


You two are terrible at motivating people. Better D doesn't
support it well and JSON is superior through-and-through is
overly dismissive. To me it sounds like someone saying replace
C++ with JavaScript, because C++ is a horrible standard and
JavaScript is so much superior.  Honestly.


You seem to have missed the point of my post, which was to 
discourage him from working on an XML module for phobos.  As for 
motivating him, I suggested better alternatives.  And I never 
said JSON was great, but it's certainly _much_ more readable than 
XML, which is one of the basic goals of a text format.



Remember that while JSON is simpler, XML is not just a
structured container for bool, Number and String data. It
comes with many official side kicks covering a broad range of
use cases:

XPath:
 * allows you to use XML files like a textual database
 * complex enough to allow for almost any imaginable query
 * many tools emerged to test XPath expressions against XML 
documents

 * also powers XSLT
   (http://www.liquid-technologies.com/xpath-tutorial.aspx)

XSL (Extensible Stylesheet Language) and
XSLT (XSL Transformations):
 * written as XML documents
 * standard way to transform XML from one structure into another
 * convert or compile data to XHTML or SVG for display in a 
browser

 * output to XSL-FO

XSL-FO (XSL formatting objects):
 * written as XSL
 * type-setting for XML; an XSL-FO processor is similar to a 
LaTeX processor
 * reads an XML document (a Format document) and outputs to a 
PDF, RTF or similar format


XML Schema Definition (XSD):
 * written as XML
 * linked in by an XML file
 * defines structure and validates content to some extent
 * can set constraints on how often an element can occur in a 
list
 * can validate data type of values (length, regex, positive, 
etc.)

 * database like unique IDs and references


These are all incredibly dumb ideas.  I don't deny that many 
people may use these things, but then people use hammers for all 
kinds of things they shouldn't use them for too. :)



I think XML is the most eat-your-own-dog-food language ever
and nicely covers a wide range of use cases.


The problem is you're still eating dog food. ;)


In any case there
are many XML based file formats that we might want to parse.
Amongst them SVG, OpenDocument (Open/LibreOffice), RSS feeds,
several US Offices, XMP and other meta data formats.


Sure, and if he has any real need for any of those, who are we to 
stop him?  But if he's just looking for some way to contribute, 
there are better ways.


On Monday, 4 May 2015 at 20:44:42 UTC, Jonathan M Davis wrote:
Also true. Many of us just don't find enough time to work on D, 
and we don't seem to do a good job of encouraging larger 
contributions to Phobos, so newcomers don't tend to contribute 
like that. And there's so much to do all around that the big 
stuff just falls by the wayside, and it really shouldn't.


This is why I keep asking Walter and Andrei for a list of big 
stuff on the wiki- they don't have to be big, just important- so 
that newcomers know where help is most needed.  Of course, it 
doesn't have to be them, it could be any member of the D core 
team, though whatever the BDFLs push for would have a bit more 
weight.


Re: std.xml2 (collecting features)

2015-05-09 Thread Craig Dillabaugh via Digitalmars-d

On Saturday, 9 May 2015 at 10:28:53 UTC, Joakim wrote:

On Monday, 4 May 2015 at 18:50:43 UTC, Marco Leise wrote:

On Sunday, 3 May 2015 at 17:47:15 UTC, Joakim wrote:


clip



Remember that while JSON is simpler, XML is not just a
structured container for bool, Number and String data. It
comes with many official side kicks covering a broad range of
use cases:

XPath:
* allows you to use XML files like a textual database
* complex enough to allow for almost any imaginable query
* many tools emerged to test XPath expressions against XML 
documents

* also powers XSLT
  (http://www.liquid-technologies.com/xpath-tutorial.aspx)

XSL (Extensible Stylesheet Language) and
XSLT (XSL Transformations):
* written as XML documents
* standard way to transform XML from one structure into another
* convert or compile data to XHTML or SVG for display in a 
browser

* output to XSL-FO

XSL-FO (XSL formatting objects):
* written as XSL
* type-setting for XML; an XSL-FO processor is similar to a 
LaTeX processor
* reads an XML document (a Format document) and outputs to a 
PDF, RTF or similar format


XML Schema Definition (XSD):
* written as XML
* linked in by an XML file
* defines structure and validates content to some extent
* can set constraints on how often an element can occur in a 
list
* can validate data type of values (length, regex, positive, 
etc.)

* database like unique IDs and references


These are all incredibly dumb ideas.  I don't deny that many 
people may use these things, but then people use hammers for 
all kinds of things they shouldn't use them for too. :)



I think XML is the most eat-your-own-dog-food language ever
and nicely covers a wide range of use cases.


The problem is you're still eating dog food. ;)


I have to agree with Joakim on this.  Having spent much of this 
past

week trying to get XML generated by gSOAP (project has some legacy
code) to work with JAXB (Java) has reinforced my dislike for XML.

I've used things like XPath and XSLT in the past, so I can 
appreciate
their power, but think the 'jobs' they perform would be better 
supported

elsewhere (ie. language specific XML frameworks).

In trying to pass data between applications I just want a simple 
way
of packaging up the data and ideally making 
serialization/deserialization

easy for me.  At some point the programmer working on these needs
to understand and validate the data anyway.  Sure you can use 
DTD/XML Schema to
handle the validation part, but it is just easier to deal with 
that
within your own code - without having to learn a 'whole new 
language', that
is likely harder to grok than the tools you would have at your 
disposal

in your language of choice.

Having said all that.  As much as I share Joakim's sentiment that 
I wish
XML would just go away,  there is a lot of it out there, and I 
think having good support in Phobos is very valuable so I thank 
Robert for his efforts.


Craig






Re: std.xml2 (collecting features)

2015-05-06 Thread Richard Webb via Digitalmars-d

On 06/05/2015 07:31, Jacob Carlborg wrote:

On 2015-05-06 01:38, Walter Bright wrote:


I haven't read the Tango source code, but the performance of its xml
was supposedly because it did not use the GC, it used slices.


That's only true for the pull parser (not sure about the SAX parser).
The DOM parser needs to allocate the nodes, but if I recall correctly
those are allocated in a free list. Not sure which parser was used in
the test.



The direct comparisons were with the DOM parsers (I was playing with a D 
port of some C++ code at work at the time, and that is DOM based).


xmlp has alternate parsers (event driven etc) which were faster in some 
simple tests i did, but I don't recall if I did a direct comparison with 
Tango there.


Re: std.xml2 (collecting features)

2015-05-06 Thread Brad Roberts via Digitalmars-d
An old friend of mine who was intimate with the microsoft xml parsers 
was fond of saying, particularly with respect to xml parsers, that if 
you hadn't finished implementing and testing error handling and negative 
tests (ie, malformed documents) that your positive benchmarks were 
fairly meaningless.  A whole lot of work goes into that 'second half' of 
things that can quickly cost performance.


I didn't dive into, or don't recall, the specific details as this was years ago.

The (over-)generalization from there is an old adage: it's easy to write 
an incorrect program.
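
To make the "negative tests" idea concrete, here is a minimal sketch of such a suite, run against Python's standard-library parser only because it is readily available. The list of malformed inputs and the helper name are illustrative; a real suite (for instance one derived from the W3C conformance tests) would be far larger.

```python
import xml.etree.ElementTree as ET

# A few documents that are NOT well-formed; a correct parser must reject each one.
MALFORMED = [
    "<a><b></a>",            # mismatched closing tag
    "<a>",                   # unclosed root element
    "<a></a><b></b>",        # two root elements
    '<a x="1" x="2"/>',      # duplicate attribute
    "<a>&undefined;</a>",    # reference to an undeclared entity
    "",                      # empty input
]


def is_rejected(text):
    """Return True if the parser refuses the input with a parse error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return True
    return False


for doc in MALFORMED:
    print(repr(doc), "rejected" if is_rejected(doc) else "ACCEPTED (bug)")
```

A benchmark figure only means something once every entry in a list like this is rejected.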


On 5/5/2015 11:33 PM, Jacob Carlborg via Digitalmars-d wrote:

On 2015-05-05 16:04, Ola Fosheim Grøstad
ola.fosheim.grostad+dl...@gmail.com wrote:


In my opinion it is rather difficult to build a good API without also
using the API in an application in parallel. So it would be a good
strategy to build a specific DOM along with writing the XML
infrastructure, like SVG/HTML.


Agree.


Also, some parsers, like RapidXML only support a subset of XML. So they
cannot be used for comparisons.


The Tango parser has some limitations as well. In some places it
sacrificed correctness for speed. There's a comment claiming the parser
might read past the input if it's not well formed.



Re: std.xml2 (collecting features)

2015-05-06 Thread Jacob Carlborg via Digitalmars-d
On 2015-05-05 16:04, Ola Fosheim Grøstad 
ola.fosheim.grostad+dl...@gmail.com wrote:



In my opinion it is rather difficult to build a good API without also
using the API in an application in parallel. So it would be a good
strategy to build a specific DOM along with writing the XML
infrastructure, like SVG/HTML.


Agree.


Also, some parsers, like RapidXML only support a subset of XML. So they
cannot be used for comparisons.


The Tango parser has some limitations as well. In some places it 
sacrificed correctness for speed. There's a comment claiming the parser 
might read past the input if it's not well formed.


--
/Jacob Carlborg


Re: std.xml2 (collecting features)

2015-05-06 Thread Jacob Carlborg via Digitalmars-d

On 2015-05-06 01:38, Walter Bright wrote:


I haven't read the Tango source code, but the performance of its xml
was supposedly because it did not use the GC, it used slices.


That's only true for the pull parser (not sure about the SAX parser). 
The DOM parser needs to allocate the nodes, but if I recall correctly 
those are allocated in a free list. Not sure which parser was used in 
the test.


--
/Jacob Carlborg


Re: std.xml2 (collecting features)

2015-05-05 Thread Jacob Carlborg via Digitalmars-d
On 2015-05-05 12:41, Mario Kröplin 
linkr...@github.com wrote:



Recently, I compared DOM parsers for an XML file of 100 MB:

15.8 s tango.text.xml (SiegeLord/Tango-D2)
13.4 s ae.utils.xml (CyberShadow/ae)
  8.5 s xml.etree (Python)

Either the Tango DOM parser is slow compared to the Tango pull parser,


Yes, of course it's slower. The DOM parser creates a DOM as well, which 
the pull parser doesn't.


These other libraries, what kind of parsers are those using? I mean, 
it's not fair to compare a pull parser against a DOM parser.


Could you try D1 Tango as well? Or do you have the benchmark available 
somewhere?



or the D2 port ruined the performance.


Might be the case as well, see this comment [1].

[1] 
http://forum.dlang.org/thread/vsbsxfeciryrdsjhh...@forum.dlang.org?page=3#post-mi8hs8:24b0j:241:40digitalmars.com


--
/Jacob Carlborg


Re: std.xml2 (collecting features)

2015-05-05 Thread Walter Bright via Digitalmars-d

On 5/5/2015 4:16 AM, Richard Webb wrote:

Also, profiling showed a lot of time spent in the GC, and the recent
improvements in that area might have changed things by now.


I haven't read the Tango source code, but the performance of its xml was 
supposedly because it did not use the GC, it used slices.


Re: std.xml2 (collecting features)

2015-05-05 Thread via Digitalmars-d

On Tuesday, 5 May 2015 at 12:10:59 UTC, Jacob Carlborg wrote:
Yes, of course it's slower. The DOM parser creates a DOM as 
well, which the pull parser doesn't.


These other libraries, what kind of parsers are those using? I 
mean, it's not fair to compare a pull parser against a DOM 
parser.


I agree. Most applications will use a DOM parser for convenience, 
so sacrificing some speed initially in favour of ease of use 
makes a lot of sense. As long as it is possible to improve it 
later (e.g. use SIMD scanning to find the end of CDATA etc).


In my opinion it is rather difficult to build a good API without 
also using the API in an application in parallel. So it would be 
a good strategy to build a specific DOM along with writing the 
XML infrastructure, like SVG/HTML.


Also, some parsers, like RapidXML only support a subset of XML. 
So they cannot be used for comparisons.


Re: std.xml2 (collecting features)

2015-05-05 Thread Marco Leise via Digitalmars-d
On Tue, 05 May 2015 02:01:50 +
weaselcat weasel...@gmail.com wrote:

 maybe off-topic, but it would be nice if the standard json,xml, 
 etc etc all had identical interfaces(except for 
 implementation-specific quirks.) This might be something worth 
 discussing if it wasn't already agreed upon.

I don't think this needs discussion. It is plain impossible to
have a sophisticated JSON parser and a sophisticated XML
parser share the same API. Established function names,
structural differences in the formats and feature sets differ
too much.
For example in XML attributes and child elements are used
somewhat interchangeably whereas in JSON attributes don't
exist. So while in JSON obj.field makes sense in XML you
would want to select either an attribute or an element with
the name field.
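
A minimal illustration of that mismatch, using Python's standard library (the sample data is invented): the JSON lookup has exactly one meaning, while the XML document needs two different calls depending on whether "field" is an attribute or a child element.

```python
import json
import xml.etree.ElementTree as ET

json_obj = json.loads('{"field": "from json"}')
xml_elem = ET.fromstring(
    '<obj field="from attribute"><field>from element</field></obj>'
)

# JSON: one unambiguous lookup.
json_value = json_obj["field"]

# XML: the same name can denote an attribute or a child element.
attr_value = xml_elem.get("field")        # attribute lookup
child_value = xml_elem.findtext("field")  # child element lookup

print(json_value, "|", attr_value, "|", child_value)
```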

-- 
Marco



Re: std.xml2 (collecting features)

2015-05-05 Thread via Digitalmars-d

On Monday, 4 May 2015 at 19:31:59 UTC, Jonathan M Davis wrote:
Given how D's arrays work, we have the opportunity to have an 
_extremely_ fast XML parser thanks to slices.


Yes, that would be great. XML is a flexible go-to archive, 
exchange and application format.


Things like entities, namespaces and so on make it non-trivial, but 
being able to conveniently process Inkscape and Open Office files 
etc would be very useful.


One should probably look at what applications generate XML and 
create some large test files with existing applications.


Re: std.xml2 (collecting features)

2015-05-05 Thread via Digitalmars-d

On Monday, 4 May 2015 at 19:28:25 UTC, Jacob Carlborg wrote:

On 2015-05-03 19:39, Robert burner Schadek wrote:

Not much code yet, I'm currently building the performance test 
suite

https://github.com/burner/std.xml2


I recommend benchmarking against the Tango pull parser.


Recently, I compared DOM parsers for an XML file of 100 MB:

15.8 s tango.text.xml (SiegeLord/Tango-D2)
13.4 s ae.utils.xml (CyberShadow/ae)
 8.5 s xml.etree (Python)

Either the Tango DOM parser is slow compared to the Tango pull 
parser,

or the D2 port ruined the performance.


Re: std.xml2 (collecting features)

2015-05-05 Thread John Colvin via Digitalmars-d

On Tuesday, 5 May 2015 at 10:41:37 UTC, Mario Kröplin wrote:

On Monday, 4 May 2015 at 19:28:25 UTC, Jacob Carlborg wrote:

On 2015-05-03 19:39, Robert burner Schadek wrote:

Not much code yet, I'm currently building the performance 
test suite

https://github.com/burner/std.xml2


I recommend benchmarking against the Tango pull parser.


Recently, I compared DOM parsers for an XML file of 100 MB:

15.8 s tango.text.xml (SiegeLord/Tango-D2)
13.4 s ae.utils.xml (CyberShadow/ae)
 8.5 s xml.etree (Python)

Either the Tango DOM parser is slow compared to the Tango pull 
parser,

or the D2 port ruined the performance.


As usual: system, compiler, compiler version, compilation flags?
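
For what it's worth, a benchmark report that answers those questions up front could look roughly like the sketch below. It times only Python's xml.etree on a synthetic document; the generator, the function names and the document shape are assumptions made here and have nothing to do with the 100 MB file or the D libraries measured above.

```python
import platform
import sys
import time
import xml.etree.ElementTree as ET


def make_document(n_items):
    """Build a synthetic document with n_items child elements (made-up shape)."""
    parts = ["<root>"]
    for i in range(n_items):
        parts.append('<item id="%d"><name>item %d</name></item>' % (i, i))
    parts.append("</root>")
    return "".join(parts)


def time_dom_parse(xml_text, repeats=3):
    """Return the best wall-clock time of several full DOM-style parses."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        ET.fromstring(xml_text)
        best = min(best, time.perf_counter() - start)
    return best


def report(n_items=10000):
    """Bundle the timing with the environment details a reader would ask for."""
    doc = make_document(n_items)
    return {
        "system": platform.platform(),
        "interpreter": platform.python_implementation() + " " + sys.version.split()[0],
        "input_bytes": len(doc.encode("utf-8")),
        "best_seconds": time_dom_parse(doc),
    }


print(report())
```

For a compiled language the same report would also need the compiler, its version and the flags used.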


Re: std.xml2 (collecting features)

2015-05-05 Thread Richard Webb via Digitalmars-d
On 05/05/2015 11:41, Mario Kröplin 
linkr...@github.com wrote:


Recently, I compared DOM parsers for an XML file of 100 MB:

15.8 s tango.text.xml (SiegeLord/Tango-D2)
13.4 s ae.utils.xml (CyberShadow/ae)
  8.5 s xml.etree (Python)

Either the Tango DOM parser is slow compared to the Tango pull parser,
or the D2 port ruined the performance.



fwiw I did some tests a couple of years back with 
https://launchpad.net/d2-xml on 20 odd megabyte files and found it 
faster than Tango.
Unfortunately that would need some work to test now, as xmlp is 
abandoned and wouldn't build last time I tried it :-(


I also had some success with https://github.com/opticron/kxml, though it 
had some issues with chuffy entity decoding performance.



Also, profiling showed a lot of time spent in the GC, and the recent 
improvements in that area might have changed things by now.


Re: std.xml2 (collecting features)

2015-05-04 Thread Robert burner Schadek via Digitalmars-d

On Sunday, 3 May 2015 at 23:32:28 UTC, Michel Fortin wrote:


This isn't a feature request (sorry?), but I just want to point 
out that you should feel free to borrow code from 
https://github.com/michelf/mfr-xml-d  There's probably a lot 
you can reuse in there.


nice, thank you


Re: std.xml2 (collecting features)

2015-05-04 Thread via Digitalmars-d

On Sunday, 3 May 2015 at 22:02:13 UTC, Walter Bright wrote:

On 5/3/2015 2:31 PM, Ilya Yaroshenko wrote:

Can it lazily read huge files (files greater than memory)?


If a range interface is used, it doesn't need to be aware of 
where the data is coming from. In fact, the xml package should 
NOT be doing I/O.
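
The separation described in the quote can be sketched in Python, with a generator standing in for a D input range. The parser below only ever sees an iterable of text chunks; whether they come from a file, a socket or memory is the caller's business. `parse_events` and `chunked` are hypothetical helper names; `XMLPullParser` is the standard-library class.

```python
import xml.etree.ElementTree as ET


def parse_events(chunks):
    """Yield (event, element) pairs from any iterable of text chunks."""
    parser = ET.XMLPullParser(events=("start", "end"))
    for chunk in chunks:
        parser.feed(chunk)
        # Hand out whatever became available after this chunk.
        for event, elem in parser.read_events():
            yield event, elem
    parser.close()
    # Flush anything the parser was still holding back.
    for event, elem in parser.read_events():
        yield event, elem


def chunked(text, size):
    """Stand-in data source: slices of an in-memory string."""
    for i in range(0, len(text), size):
        yield text[i:i + size]


DOC = "<log><entry>a</entry><entry>b</entry></log>"
events = [(ev, el.tag) for ev, el in parse_events(chunked(DOC, 7))]
print(events)
```

For inputs larger than memory the consumer would also have to drop elements it is done with (for example with `elem.clear()`), since this parser still builds a tree behind the scenes.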


Wouldn't D-ranges make it impossible to use SIMD optimizations  
when scanning?


However, it would make a lot of sense to just convert an existing 
XML solution with Boost license. I don't know which ones are any 
good, but RapidXML is at least Boost.


Re: std.xml2 (collecting features)

2015-05-04 Thread Marco Leise via Digitalmars-d
On Sunday, 3 May 2015 at 17:47:15 UTC, Joakim wrote:

 My request: just skip it.  XML is a horrible waste of space for 
 a standard, better D doesn't support it well, anything to 
 discourage its use.  I'd rather see you spend your time on 
 something worthwhile.  If data formats are your thing, you 
 could help get Ludwig's JSON stuff in, or better yet, enable 
 some nice binary data format.

On Sun, 03 May 2015 18:44:11 +
w0rp devw...@gmail.com wrote:

 I agree that JSON is superior through-and-through, but legacy 
 support matters, and XML is in many places. It's good to have a 
 quality XML parsing library.

You two are terrible at motivating people. Better D doesn't
support it well and JSON is superior through-and-through is
overly dismissive. To me it sounds like someone saying replace
C++ with JavaScript, because C++ is a horrible standard and
JavaScript is so much superior.  Honestly.

Remember that while JSON is simpler, XML is not just a
structured container for bool, Number and String data. It
comes with many official side kicks covering a broad range of
use cases:

XPath:
 * allows you to use XML files like a textual database
 * complex enough to allow for almost any imaginable query
 * many tools emerged to test XPath expressions against XML documents
 * also powers XSLT
   (http://www.liquid-technologies.com/xpath-tutorial.aspx)
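
As a small taste of the "textual database" idea, here is a sketch using the limited XPath subset that Python's xml.etree supports (full XPath 1.0 is considerably richer). The sample document is invented.

```python
import xml.etree.ElementTree as ET

# Made-up sample data.
DOC = """<library>
  <book year="2003"><title>A</title></book>
  <book year="2015"><title>B</title></book>
  <journal year="2015"><title>C</title></journal>
</library>"""

root = ET.fromstring(DOC)

# Titles of anything published in 2015, regardless of element type.
titles_2015 = [t.text for t in root.findall(".//*[@year='2015']/title")]

# Titles of books only.
book_titles = [t.text for t in root.findall("./book/title")]

print(titles_2015, book_titles)
```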

XSL (Extensible Stylesheet Language) and
XSLT (XSL Transformations):
 * written as XML documents
 * standard way to transform XML from one structure into another
 * convert or compile data to XHTML or SVG for display in a browser
 * output to XSL-FO

XSL-FO (XSL formatting objects):
 * written as XSL
 * type-setting for XML; an XSL-FO processor is similar to a LaTeX processor
 * reads an XML document (a Format document) and outputs to a PDF, RTF or 
similar format

XML Schema Definition (XSD):
 * written as XML
 * linked in by an XML file
 * defines structure and validates content to some extent
 * can set constraints on how often an element can occur in a list
 * can validate data type of values (length, regex, positive, etc.)
 * database like unique IDs and references

I think XML is the most eat-your-own-dog-food language ever
and nicely covers a wide range of use cases. In any case there
are many XML based file formats that we might want to parse.
Amongst them SVG, OpenDocument (Open/LibreOffice), RSS feeds,
several US Offices, XMP and other meta data formats.

When it comes to which features to support, I personally used
XSD more than XPath and the tech using it. But quite frankly
both would be expected by users. Based on XPath, XSL
transformations can be added any time then. Anything beyond
that doesn't feel quite core enough to be in a XML module.

-- 
Marco



Re: std.xml2 (collecting features)

2015-05-04 Thread Marco Leise via Digitalmars-d
On Sun, 03 May 2015 14:00:11 -0700
Walter Bright newshou...@digitalmars.com wrote:

 On 5/3/2015 10:39 AM, Robert burner Schadek wrote:
  - CTS for encoding (ubyte(ASCII), char(utf8), ... )
 
 Encoding schemes should be handled by adapter algorithms, not in the XML 
 parser 
 itself, which should only handle UTF8.

Unlike JSON, XML actually declares the encoding in the prolog,
e.g.: <?xml version="1.0" encoding="Windows-1252"?>
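
A quick way to see the consequence, sketched with Python's standard-library parser: when handed raw bytes, the parser itself has to read the declaration before it can decode anything else. The sample name is just test data.

```python
import xml.etree.ElementTree as ET

# Encode a document as Windows-1252 bytes; the o-umlaut becomes the single byte 0xF6.
raw = (
    '<?xml version="1.0" encoding="Windows-1252"?>'
    "<name>Kr\u00f6plin</name>"
).encode("cp1252")

# The parser is given bytes only; it picks the decoder from the prolog.
elem = ET.fromstring(raw)
print(elem.text)
```

An adapter in front of a UTF-8-only parser would have to do the same sniffing of the first bytes before it could transcode the rest.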

-- 
Marco



Re: std.xml2 (collecting features)

2015-05-04 Thread Jonathan M Davis via Digitalmars-d
On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek 
wrote:
std.xml has been considered not up to specs nearly 3 years now. 
Time to build a successor. I currently plan the following 
features for it:


- SAX and DOM parser
- in-situ / slicing parsing when possible (forward range?)
- compile time switch (CTS) for lazy attribute parsing
- CTS for encoding (ubyte(ASCII), char(utf8), ... )
- CTS for input validating
- performance

Not much code yet, I'm currently building the performance test 
suite https://github.com/burner/std.xml2


Please post your feature requests, and please keep the posts DRY 
and on topic.


If I were doing it, I'd do three types of parsers:

1. A parser that was pretty much as low level as you can get, 
where you basically get a range of XML attributes or tags. Exactly how 
to build that could be a bit entertaining, since it would have to 
be hierarchical, and ranges aren't, but something like a range of 
tags where you can get a range of its attributes and sub-tags 
from it so that the whole document can be processed without 
actually getting to the level of even a SAX parser. That parser 
could then be used to build the other parsers, and anyone who 
needed insanely fast speeds could use it rather than the SAX or 
DOM parser so long as they were willing to pay the inevitable 
loss in user-friendliness.


2. SAX parser built on the low level parser.

3. DOM parser built either on the low level parser or the SAX 
parser (whichever made more sense).


I doubt that I'm really explaining the low level parser well 
enough or have even thought through it enough, but I really think 
that even a SAX parser is too high level for the base parser and 
that something that is slightly higher than a lexer (high enough to 
actually be processing XML rather than individual tokens but 
pretty much only as high as is required to do that) would be a 
far better choice.


IIRC, Michel Fortin's work went in that direction, and he linked 
to his code in another post, so I'd suggest at least looking at 
that for ideas.


Regardless, by building layers of XML parsers rather than just 
the standard ones, it should be possible to get higher 
performance while still having the more standard, user-friendly 
ones for those that don't need the full performance and do need 
the user-friendliness (though of course, we do want the SAX and 
DOM parsers to be efficient as well).
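
To make the layering concrete, here is a toy sketch in Python, not D, and deliberately nowhere near a conforming parser: it handles only elements, double-quoted attributes and text, with no comments, CDATA, entities, processing instructions, DOCTYPE or error checking. All names are made up. The point is only the shape: a lazy token stream at the bottom, a SAX-style dispatcher driven by it, and a DOM builder that is just one more handler.

```python
import re

# Layer 1: lazy token stream (toy subset, silently skips anything it cannot match).
TOKEN = re.compile(
    r'<(/?)([A-Za-z_][\w.-]*)((?:\s+[\w.-]+\s*=\s*"[^"]*")*)\s*(/?)>|([^<]+)'
)
ATTR = re.compile(r'([\w.-]+)\s*=\s*"([^"]*)"')


def tokens(text):
    for m in TOKEN.finditer(text):
        closing, name, attrs, empty, data = m.groups()
        if data is not None:
            if data.strip():
                yield ("data", data)
        elif closing:
            yield ("end", name)
        else:
            yield ("start", name, dict(ATTR.findall(attrs)))
            if empty:  # <br/> becomes a start immediately followed by an end
                yield ("end", name)


# Layer 2: SAX-style push interface built on the token stream.
def sax_parse(text, handler):
    for tok in tokens(text):
        if tok[0] == "start":
            handler.start_element(tok[1], tok[2])
        elif tok[0] == "end":
            handler.end_element(tok[1])
        else:
            handler.characters(tok[1])


# Layer 3: a DOM builder is just another SAX handler.
class Node:
    def __init__(self, name, attrs):
        self.name, self.attrs, self.children, self.text = name, attrs, [], ""


class DomBuilder:
    def __init__(self):
        self.root, self.stack = None, []

    def start_element(self, name, attrs):
        node = Node(name, attrs)
        if self.stack:
            self.stack[-1].children.append(node)
        else:
            self.root = node
        self.stack.append(node)

    def end_element(self, name):
        self.stack.pop()

    def characters(self, data):
        self.stack[-1].text += data


def dom_parse(text):
    builder = DomBuilder()
    sax_parse(text, builder)
    return builder.root


DOC = '<doc id="1"><p>hello</p><br/></doc>'
root = dom_parse(DOC)
print(root.name, root.attrs, [c.name for c in root.children])
```

Someone who needs raw speed would consume `tokens` directly and never pay for the two layers above it.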


- Jonathan M Davis


Re: std.xml2 (collecting features)

2015-05-04 Thread Jacob Carlborg via Digitalmars-d

On 2015-05-03 19:39, Robert burner Schadek wrote:


Not much code yet, I'm currently building the performance test suite
https://github.com/burner/std.xml2


I recommend benchmarking against the Tango pull parser.

--
/Jacob Carlborg


Re: std.xml2 (collecting features)

2015-05-04 Thread Jacob Carlborg via Digitalmars-d

On 2015-05-04 21:14, Jonathan M Davis wrote:


If I were doing it, I'd do three types of parsers:

1. A parser that was pretty much as low level as you can get, where you
basically get a range of XML attributes or tags. Exactly how to build that
could be a bit entertaining, since it would have to be hierarchical, and
ranges aren't, but something like a range of tags where you can get a
range of its attributes and sub-tags from it so that the whole document
can be processed without actually getting to the level of even a SAX
parser. That parser could then be used to build the other parsers, and
anyone who needed insanely fast speeds could use it rather than the SAX
or DOM parser so long as they were willing to pay the inevitable loss in
user-friendliness.

2. SAX parser built on the low level parser.

3. DOM parser built either on the low level parser or the SAX parser
(whichever made more sense).

I doubt that I'm really explaining the low level parser well enough or
have even thought through it enough, but I really think that even a SAX
parser is too high level for the base parser and that something that is
slightly higher than a lexer (high enough to actually be processing XML
rather than individual tokens but pretty much only as high as is
required to do that) would be a far better choice.

IIRC, Michel Fortin's work went in that direction, and he linked to his
code in another post, so I'd suggest at least looking at that for ideas.


This is the way the XML parser is structured in Tango. A pull parser at the 
lowest level, a SAX parser on top of that and I think the DOM parser 
builds on top of the pull parser.


The Tango pull parser can give you the following tokens:

* start element
* attribute
* end element
* end empty element
* data
* comment
* cdata
* doctype
* pi
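
As a rough illustration, the token list above could be modelled in D along the following lines. This is only a sketch: the names `XmlTokenKind` and `XmlToken` are hypothetical and are not Tango's actual identifiers.

```d
import std.stdio : writeln;

// Hypothetical names; Tango's real identifiers may differ.
enum XmlTokenKind
{
    startElement,
    attribute,
    endElement,
    endEmptyElement,
    data,
    comment,
    cdata,
    doctype,
    pi,
}

// A token is a kind plus slices into the original input (nothing is copied).
struct XmlToken
{
    XmlTokenKind kind;
    const(char)[] name;   // element, attribute or PI target name
    const(char)[] value;  // attribute value, text, comment body, ...
}

void main()
{
    // Tokens a pull parser might report for this input, written out by hand.
    auto input = `<a href="x">hi</a>`;
    XmlToken[] tokens = [
        XmlToken(XmlTokenKind.startElement, input[1 .. 2]),
        XmlToken(XmlTokenKind.attribute, input[3 .. 7], input[9 .. 10]),
        XmlToken(XmlTokenKind.data, null, input[12 .. 14]),
        XmlToken(XmlTokenKind.endElement, input[16 .. 17]),
    ];
    foreach (t; tokens)
        writeln(t.kind, " ", t.name, " ", t.value);
}
```

A SAX layer would then translate each token kind into a callback, and a DOM layer would build nodes from the same stream.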

--
/Jacob Carlborg


Re: std.xml2 (collecting features)

2015-05-04 Thread Jonathan M Davis via Digitalmars-d

On Sunday, 3 May 2015 at 22:02:13 UTC, Walter Bright wrote:

On 5/3/2015 2:31 PM, Ilya Yaroshenko wrote:

Can it lazily read huge files (files greater than memory)?


If a range interface is used, it doesn't need to be aware of 
where the data is coming from. In fact, the xml package should 
NOT be doing I/O.


Indeed. It should operate on ranges without caring where they 
came from (though it may end up supporting both input ranges and 
random-access ranges, with the idea that it can read from a 
socket via an input range in a less efficient manner, or operate 
on a whole file at once via a random-access range for more 
efficient parsing).


But if I/O is a big concern, I'd suggest just using std.mmfile to 
do the trick, since then you can still operate on the whole file 
as a single array without having to actually have the whole thing 
in memory.
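
A minimal sketch of the std.mmfile approach, assuming a small helper (`countOpenAnglesInFile`, a hypothetical name) that stands in for a real parser and a temporary file created only so the example can run:

```d
import std.file : remove, tempDir, write;
import std.mmfile : MmFile;
import std.path : buildPath;
import std.stdio : writeln;

// Hypothetical stand-in for a parser: counts '<' characters in a mapped file.
size_t countOpenAnglesInFile(string path)
{
    // Map the file read-only; the OS pages the contents in on demand.
    auto mmf = new MmFile(path);
    scope (exit) destroy(mmf);   // unmap when done

    // mmf[] yields void[]; view it as UTF-8 text without copying.
    auto text = cast(const(char)[]) mmf[];

    size_t n;
    foreach (c; text)
        if (c == '<') ++n;
    return n;
}

void main()
{
    // Hypothetical file name, written here only to make the sketch runnable.
    auto path = buildPath(tempDir, "mmfile_sketch.xml");
    write(path, "<root><item/></root>");
    scope (exit) remove(path);

    writeln(countOpenAnglesInFile(path));
}
```

The parsing code only ever sees a `const(char)[]`, so it stays free of I/O concerns.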


- Jonathan M Davis


Re: std.xml2 (collecting features)

2015-05-04 Thread Jonathan M Davis via Digitalmars-d

On Monday, 4 May 2015 at 09:35:55 UTC, Ola Fosheim Grøstad wrote:
However, it would make a lot of sense to just convert an 
existing XML solution with Boost license. I don't know which 
ones are any good, but RapidXML is at least Boost.


Given how D's arrays work, we have the opportunity to have an 
_extremely_ fast XML parser thanks to slices. It's highly 
unlikely that any C or C++ solution is going to be able to 
compete, and if it can, it's likely to be far more complex than 
necessary. Parsing is an area where we definitely should write 
our own stuff rather than porting existing code from other 
languages or use existing libraries in other languages via C 
bindings. Fast parsing is definitely a killer feature of D and 
the fact that std.xml botches that so badly is just embarrassing.
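
To make the slicing point concrete, here is a minimal sketch (the helper `firstTagName` is hypothetical, not taken from any existing library) in which the parsed result is just a window into the input buffer:

```d
// Returns the name of the first start tag as a slice of `input`.
// Hypothetical helper; no memory is allocated and nothing is copied.
const(char)[] firstTagName(const(char)[] input) @nogc nothrow pure @safe
{
    size_t i = 0;
    while (i < input.length && input[i] != '<') ++i;
    if (i == input.length) return null;

    size_t start = ++i;
    while (i < input.length && input[i] != '>' && input[i] != ' ' && input[i] != '/')
        ++i;
    return input[start .. i];
}

void main()
{
    string doc = "<title lang=\"en\">Hamlet</title>";
    auto name = firstTagName(doc);
    assert(name == "title");
    // The result points into `doc` itself.
    assert(name.ptr == doc.ptr + 1);
}
```

A C parser would typically either copy the name out or write a NUL terminator into the buffer; a D slice carries its own length, so neither is needed.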


- Jonathan M Davis


Re: std.xml2 (collecting features)

2015-05-04 Thread Rikki Cattermole via Digitalmars-d

On 5/05/2015 10:45 a.m., Liam McSherry wrote:

On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek
wrote:

std.xml has been considered not up to spec for nearly 3 years now. Time
to build a successor. I currently plan the following features for it:

- SAX and DOM parser
- in-situ / slicing parsing when possible (forward range?)
- compile time switch (CTS) for lazy attribute parsing
- CTS for encoding (ubyte(ASCII), char(utf8), ... )
- CTS for input validating
- performance

Not much code yet, I'm currently building the performance test suite
https://github.com/burner/std.xml2

Please post your feature requests, and please keep the posts DRY and on
topic.


Not a feature, but if `std.data.json` [1] gets accepted into
Phobos, it may be something to consider naming this
`std.data.xml` (although that might not as effectively
differentiate it from `std.xml`).

[1]: http://wiki.dlang.org/Review_Queue


It really should be std.data.xml, to keep with the new structuring. Plus 
it'll make transitioning a little easier.


Re: std.xml2 (collecting features)

2015-05-04 Thread weaselcat via Digitalmars-d
On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek 
wrote:
std.xml has been considered not up to spec for nearly 3 years now. 
Time to build a successor. I currently plan the following 
features for it:


- SAX and DOM parser
- in-situ / slicing parsing when possible (forward range?)
- compile time switch (CTS) for lazy attribute parsing
- CTS for encoding (ubyte(ASCII), char(utf8), ... )
- CTS for input validating
- performance

Not much code yet, I'm currently building the performance test 
suite https://github.com/burner/std.xml2


Please post your feature requests, and please keep the posts DRY 
and on topic.


Maybe off-topic, but it would be nice if the standard JSON, XML, 
etc. modules all had identical interfaces (except for 
implementation-specific quirks). This might be something worth 
discussing if it wasn't already agreed upon.


Re: std.xml2 (collecting features)

2015-05-04 Thread Walter Bright via Digitalmars-d
On 5/4/2015 2:35 AM, Ola Fosheim Grøstad 
ola.fosheim.grostad+dl...@gmail.com wrote:

Wouldn't D-ranges make it impossible to use SIMD optimizations when scanning?


Not at all. Algorithms can be specialized for various forms of input ranges, 
including ones where SIMD optimizations can be used.


Specialization is one of the very cool things about D algorithms.
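
A minimal sketch of what such specialization can look like, assuming a hypothetical `countOpenAngles` function: contiguous character arrays take a fast path through C's `memchr` (which C libraries commonly implement with SIMD instructions), while any other input range falls back to an element-by-element loop.

```d
import core.stdc.string : memchr;
import std.range : chain;
import std.range.primitives : isInputRange;

// Hypothetical example function: counts '<' characters in its input.
size_t countOpenAngles(R)(R input)
{
    static if (is(R : const(char)[]))
    {
        // Fast path: contiguous memory, scanned with memchr.
        size_t n;
        const(char)[] rest = input;
        while (rest.length)
        {
            auto p = cast(const(char)*) memchr(rest.ptr, '<', rest.length);
            if (p is null) break;
            ++n;
            rest = rest[cast(size_t)(p - rest.ptr) + 1 .. $];
        }
        return n;
    }
    else
    {
        // Generic path: any input range of characters.
        static assert(isInputRange!R, "expected an input range");
        size_t n;
        foreach (c; input)
            if (c == '<') ++n;
        return n;
    }
}

void main()
{
    // Same call syntax, different code path chosen at compile time.
    assert(countOpenAngles("<a><b/></a>") == 3);
    assert(countOpenAngles(chain("<a>", "<b/></a>")) == 3);
}
```

The caller never sees the difference; the choice is made by `static if` when the template is instantiated.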



Re: std.xml2 (collecting features)

2015-05-04 Thread Walter Bright via Digitalmars-d

On 5/4/2015 12:28 PM, Jacob Carlborg wrote:

On 2015-05-03 19:39, Robert burner Schadek wrote:


Not much code yet, I'm currently building the performance test suite
https://github.com/burner/std.xml2


I recommend benchmarking against the Tango pull parser.


I agree. The Tango XML parser has set the performance bar. If any new solution 
can't match that, throw it out and try again.




Re: std.xml2 (collecting features)

2015-05-04 Thread Liam McSherry via Digitalmars-d

On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek
wrote:
std.xml has been considered not up to spec for nearly 3 years now. 
Time to build a successor. I currently plan the following 
features for it:


- SAX and DOM parser
- in-situ / slicing parsing when possible (forward range?)
- compile time switch (CTS) for lazy attribute parsing
- CTS for encoding (ubyte(ASCII), char(utf8), ... )
- CTS for input validating
- performance

Not much code yet, I'm currently building the performance test 
suite https://github.com/burner/std.xml2


Please post your feature requests, and please keep the posts DRY 
and on topic.


Not a feature, but if `std.data.json` [1] gets accepted into
Phobos, it may be something to consider naming this
`std.data.xml` (although that might not as effectively
differentiate it from `std.xml`).

[1]: http://wiki.dlang.org/Review_Queue


Re: std.xml2 (collecting features)

2015-05-04 Thread Jonathan M Davis via Digitalmars-d

On Monday, 4 May 2015 at 19:45:18 UTC, Andrei Alexandrescu wrote:

On 5/4/15 12:31 PM, Jonathan M Davis wrote:

Fast parsing is definitely a killer feature of
D and the fact that std.xml botches that so badly is just 
embarrassing.


To be frank what's more embarrassing is that we managed to do 
nothing about it for years (aside from endlessly wailing about 
it in an a capella ensemble). It's a failure of leadership 
(that Walter and I need to work on) that very many unimportant 
and arguably less interesting areas of Phobos get attention at 
the expense of this one. -- Andrei


Also true. Many of us just don't find enough time to work on D, 
and we don't seem to do a good job of encouraging larger 
contributions to Phobos, so newcomers don't tend to contribute 
like that. And there's so much to do all around that the big 
stuff just falls by the wayside, and it really shouldn't.


- Jonathan M Davis


Re: std.xml2 (collecting features)

2015-05-04 Thread Jacob Carlborg via Digitalmars-d

On 2015-05-03 19:39, Robert burner Schadek wrote:


Not much code yet, I'm currently building the performance test suite
https://github.com/burner/std.xml2


There are a couple of interesting comments about the Tango pull parser 
that can be worth mentioning:


* Use -version=whitespace to retain whitespace as data nodes. We see a 
25% increase in token count and 10% throughput drop when parsing 
hamlet.xml with this option enabled (pullparser alone)


* The parser is constructed with some tradeoffs relating to document 
integrity. It is generally optimized for well-formed documents, and 
currently may read past a document-end for those that are not well formed


* Making some tiny unrelated change to the code can cause notable 
throughput changes. We're not yet clear why these swings are so 
pronounced (for changes outside the code path) but they seem to be 
related to the alignment of codegen. It could be a cache-line issue, or 
something else


The last comment might not be relevant anymore since these are all quite 
old comments.


--
/Jacob Carlborg


Re: std.xml2 (collecting features)

2015-05-04 Thread Walter Bright via Digitalmars-d

On 5/4/2015 12:31 PM, Jonathan M Davis wrote:

Given how D's arrays work, we have the opportunity to have an _extremely_ fast
XML parser thanks to slices. It's highly unlikely that any C or C++ solution is
going to be able to compete, and if it can, it's likely to be far more complex
than necessary. Parsing is an area where we definitely should write our own
stuff rather than porting existing code from other languages or use existing
libraries in other languages via C bindings. Fast parsing is definitely a killer
feature of D and the fact that std.xml botches that so badly is just 
embarrassing.


Tango's XML package was well regarded and the fastest in the business. It used 
slicing, and almost no memory allocation.




Re: std.xml2 (collecting features)

2015-05-04 Thread Andrei Alexandrescu via Digitalmars-d

On 5/4/15 12:31 PM, Jonathan M Davis wrote:

On Monday, 4 May 2015 at 09:35:55 UTC, Ola Fosheim Grøstad wrote:

However, it would make a lot of sense to just convert an existing XML
solution with Boost license. I don't know which ones are any good, but
RapidXML is at least Boost.


Given how D's arrays work, we have the opportunity to have an
_extremely_ fast XML parser thanks to slices. It's highly unlikely that
any C or C++ solution is going to be able to compete, and if it can,
it's likely to be far more complex than necessary. Parsing is an area
where we definitely should write our own stuff rather than porting
existing code from other languages or use existing libraries in other
languages via C bindings. Fast parsing is definitely a killer feature of
D and the fact that std.xml botches that so badly is just embarrassing.


To be frank what's more embarrassing is that we managed to do nothing 
about it for years (aside from endlessly wailing about it in an a 
capella ensemble). It's a failure of leadership (that Walter and I need 
to work on) that very many unimportant and arguably less interesting 
areas of Phobos get attention at the expense of this one. -- Andrei




Re: std.xml2 (collecting features)

2015-05-03 Thread Robert burner Schadek via Digitalmars-d

- CTS to disable parsing location (line,column)
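
One way such a compile-time switch could look, as a sketch only: the `Parser` struct and its `trackLocation` parameter are hypothetical names, not code from the std.xml2 repository. With the switch off, the line and column fields and the code that updates them are not compiled in at all.

```d
// Hypothetical parser skeleton with a compile-time switch for location tracking.
struct Parser(bool trackLocation = true)
{
    const(char)[] input;
    size_t pos;

    static if (trackLocation)
    {
        size_t line = 1;
        size_t column = 1;
    }

    // Consumes and returns the next character.
    char next()
    {
        char c = input[pos++];
        static if (trackLocation)
        {
            if (c == '\n') { ++line; column = 1; }
            else ++column;
        }
        return c;
    }
}

void main()
{
    auto p = Parser!true("a\nb");
    p.next();   // 'a'
    p.next();   // '\n'
    assert(p.line == 2 && p.column == 1);

    // The fast variant carries no location state.
    static assert(!__traits(hasMember, Parser!false, "line"));
    static assert(Parser!false.sizeof < Parser!true.sizeof);
}
```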


Re: std.xml2 (collecting features)

2015-05-03 Thread Joakim via Digitalmars-d
On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek 
wrote:
std.xml has been considered not up to spec for nearly 3 years now. 
Time to build a successor. I currently plan the following 
features for it:


- SAX and DOM parser
- in-situ / slicing parsing when possible (forward range?)
- compile time switch (CTS) for lazy attribute parsing
- CTS for encoding (ubyte(ASCII), char(utf8), ... )
- CTS for input validating
- performance

Not much code yet, I'm currently building the performance test 
suite https://github.com/burner/std.xml2


Please post your feature requests, and please keep the posts DRY 
and on topic.


My request: just skip it.  XML is a horrible waste of space for a 
standard; better that D doesn't support it well, anything to 
discourage its use.  I'd rather see you spend your time on 
something worthwhile.  If data formats are your thing, you could 
help get Ludwig's JSON stuff in, or better yet, enable some nice 
binary data format.


Re: std.xml2 (collecting features)

2015-05-03 Thread Walter Bright via Digitalmars-d

On 5/3/2015 10:39 AM, Robert burner Schadek wrote:

Please post your feature requests, and please keep the posts DRY and on topic.


Pipeline range interface, for example:

source.xmlparse(configuration).whatever();
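
A toy sketch of how such a pipeline might read in practice. The function `xmlStartTags` and the `StartTags` range are hypothetical and deliberately naive (no attributes, CDATA or error handling); the point is only that the parser is a lazy range that composes with std.algorithm via UFCS.

```d
import std.algorithm.searching : count;

// Hypothetical lazy input range over the names of start tags.
struct StartTags
{
    private const(char)[] rest;
    const(char)[] front;
    bool empty;

    this(const(char)[] input) { rest = input; popFront(); }

    void popFront()
    {
        while (rest.length)
        {
            // Find the next '<'.
            size_t i = 0;
            while (i < rest.length && rest[i] != '<') ++i;
            if (i + 1 >= rest.length) break;
            rest = rest[i + 1 .. $];

            // Skip end tags, comments/doctype and processing instructions.
            if (rest[0] == '/' || rest[0] == '!' || rest[0] == '?') continue;

            size_t j = 0;
            while (j < rest.length && rest[j] != '>' && rest[j] != ' ' && rest[j] != '/')
                ++j;
            front = rest[0 .. j];   // a slice of the input, no copy
            rest = rest[j .. $];
            return;
        }
        empty = true;
    }
}

auto xmlStartTags(const(char)[] input) { return StartTags(input); }

void main()
{
    auto doc = "<list><item/><item>x</item><!-- c --></list>";
    // Source on the left, parser in the middle, consumer on the right.
    auto items = doc.xmlStartTags.count!(name => name == "item");
    assert(items == 2);
}
```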



Re: std.xml2 (collecting features)

2015-05-03 Thread Walter Bright via Digitalmars-d

On 5/3/2015 10:39 AM, Robert burner Schadek wrote:

- CTS for encoding (ubyte(ASCII), char(utf8), ... )


Encoding schemes should be handled by adapter algorithms, not in the XML parser 
itself, which should only handle UTF8.
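
As a small sketch of the adapter idea, Phobos already provides `std.utf.byChar`, which lazily re-encodes a UTF-16 or UTF-32 range as UTF-8 code units. A UTF-8-only parser could then be placed after it in the pipeline; the comparison below merely stands in for that parser.

```d
import std.algorithm.comparison : equal;
import std.utf : byChar;

void main()
{
    // UTF-16 input, e.g. as read from a file with a UTF-16 byte order mark.
    wstring utf16Doc = "<p>\u00E9</p>"w;

    // Lazily transcode to UTF-8 code units; no intermediate string is built.
    auto utf8 = utf16Doc.byChar;

    // A UTF-8-only parser would consume `utf8` here. For the sketch we just
    // check that it matches the UTF-8 encoding of the same document.
    assert(equal(utf8, "<p>\u00E9</p>".byChar));
}
```

Legacy single-byte encodings would need their own adapter range in front of the parser, but the parser itself would stay unchanged.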


Re: std.xml2 (collecting features)

2015-05-03 Thread Ilya Yaroshenko via Digitalmars-d

Can it lazily read huge files (files greater than memory)?


Re: std.xml2 (collecting features)

2015-05-03 Thread Michel Fortin via Digitalmars-d
On 2015-05-03 17:39:46 +, Robert burner Schadek 
rburn...@gmail.com said:


std.xml has been considered not up to spec for nearly 3 years now. Time to 
build a successor. I currently plan the following features for it:


- SAX and DOM parser
- in-situ / slicing parsing when possible (forward range?)
- compile time switch (CTS) for lazy attribute parsing
- CTS for encoding (ubyte(ASCII), char(utf8), ... )
- CTS for input validating
- performance

Not much code yet, I'm currently building the performance test suite 
https://github.com/burner/std.xml2


Please post your feature requests, and please keep the posts DRY and on topic.


This isn't a feature request (sorry?), but I just want to point out 
that you should feel free to borrow code from 
https://github.com/michelf/mfr-xml-d  There's probably a lot you can 
reuse in there.


--
Michel Fortin
michel.for...@michelf.ca
http://michelf.ca



Re: std.xml2 (collecting features)

2015-05-03 Thread wobbles via Digitalmars-d

On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek
wrote:
std.xml has been considered not up to spec for nearly 3 years now. 
Time to build a successor. I currently plan the following 
features for it:


- SAX and DOM parser
- in-situ / slicing parsing when possible (forward range?)
- compile time switch (CTS) for lazy attribute parsing
- CTS for encoding (ubyte(ASCII), char(utf8), ... )
- CTS for input validating
- performance

Not much code yet, I'm currently building the performance test 
suite https://github.com/burner/std.xml2


Please post your feature requests, and please keep the posts DRY 
and on topic.


Could you possibly use Pegged to do it?
It may simplify the parsing portion of it for you at least.


Re: std.xml2 (collecting features)

2015-05-03 Thread Walter Bright via Digitalmars-d

On 5/3/2015 10:39 AM, Robert burner Schadek wrote:

Please post your feature requests, and please keep the posts DRY and on topic.


Try to design the interface to it so it does not inherently require the 
implementation to allocate GC memory.
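
One common way to meet this requirement is to let the caller supply the storage whenever output cannot be a plain slice of the input. A minimal sketch, assuming a hypothetical helper `unescapeInto` that handles only three of the predefined entities:

```d
// Decodes &lt; &gt; and &amp; from `src` into the caller-supplied `buf`
// and returns the used part of `buf`. Hypothetical helper for illustration.
char[] unescapeInto(const(char)[] src, char[] buf) @nogc nothrow pure @safe
{
    // Decoded text is never longer than the escaped text.
    assert(buf.length >= src.length, "buffer too small");

    size_t n = 0;
    while (src.length)
    {
        if (src.length >= 4 && src[0 .. 4] == "&lt;")
        {
            buf[n++] = '<';
            src = src[4 .. $];
        }
        else if (src.length >= 4 && src[0 .. 4] == "&gt;")
        {
            buf[n++] = '>';
            src = src[4 .. $];
        }
        else if (src.length >= 5 && src[0 .. 5] == "&amp;")
        {
            buf[n++] = '&';
            src = src[5 .. $];
        }
        else
        {
            buf[n++] = src[0];
            src = src[1 .. $];
        }
    }
    return buf[0 .. n];
}

void main()
{
    char[64] storage;   // stack buffer; could equally come from an allocator
    auto text = unescapeInto("a &lt; b &amp;&amp; c", storage[]);
    assert(text == "a < b && c");
}
```

The `@nogc` attribute makes the compiler enforce the no-GC property, and callers who are happy with the GC can still pass in a GC-allocated buffer.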


Re: std.xml2 (collecting features)

2015-05-03 Thread Walter Bright via Digitalmars-d

On 5/3/2015 2:31 PM, Ilya Yaroshenko wrote:

Can it lazily read huge files (files greater than memory)?


If a range interface is used, it doesn't need to be aware of where the data is 
coming from. In fact, the xml package should NOT be doing I/O.


Re: std.xml2 (collecting features)

2015-05-03 Thread Rikki Cattermole via Digitalmars-d

On 4/05/2015 5:39 a.m., Robert burner Schadek wrote:

std.xml has been considered not up to spec for nearly 3 years now. Time to
build a successor. I currently plan the following features for it:

- SAX and DOM parser
- in-situ / slicing parsing when possible (forward range?)
- compile time switch (CTS) for lazy attribute parsing
- CTS for encoding (ubyte(ASCII), char(utf8), ... )
- CTS for input validating
- performance

Not much code yet, I'm currently building the performance test suite
https://github.com/burner/std.xml2

Please post your feature requests, and please keep the posts DRY and on
topic.


Preferably the interfaces are made first, 1:1 as the spec requires.
Then it's just a matter of building the actual reader/writer code.

That way we could theoretically rewrite the reader/writer to support 
other formats such as HTML5/SVG, independently of Phobos.


Also would be nice to be CTFE'able!


Re: std.xml2 (collecting features)

2015-05-03 Thread Meta via Digitalmars-d

On Sunday, 3 May 2015 at 17:47:15 UTC, Joakim wrote:
On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek 
wrote:
std.xml has been considered not up to spec for nearly 3 years 
now. Time to build a successor. I currently plan the following 
features for it:


- SAX and DOM parser
- in-situ / slicing parsing when possible (forward range?)
- compile time switch (CTS) for lazy attribute parsing
- CTS for encoding (ubyte(ASCII), char(utf8), ... )
- CTS for input validating
- performance

Not much code yet, I'm currently building the performance test 
suite https://github.com/burner/std.xml2


Please post your feature requests, and please keep the posts 
DRY and on topic.


My request: just skip it.  XML is a horrible waste of space for 
a standard; better that D doesn't support it well, anything to 
discourage its use.  I'd rather see you spend your time on 
something worthwhile.  If data formats are your thing, you 
could help get Ludwig's JSON stuff in, or better yet, enable 
some nice binary data format.


That's not really an option considering the huge amount of XML 
data there is out there.


Re: std.xml2 (collecting features)

2015-05-03 Thread w0rp via Digitalmars-d

On Sunday, 3 May 2015 at 17:47:15 UTC, Joakim wrote:
On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek 
wrote:
std.xml has been considered not up to spec for nearly 3 years 
now. Time to build a successor. I currently plan the following 
features for it:


- SAX and DOM parser
- in-situ / slicing parsing when possible (forward range?)
- compile time switch (CTS) for lazy attribute parsing
- CTS for encoding (ubyte(ASCII), char(utf8), ... )
- CTS for input validating
- performance

Not much code yet, I'm currently building the performance test 
suite https://github.com/burner/std.xml2


Please post your feature requests, and please keep the posts 
DRY and on topic.


My request: just skip it.  XML is a horrible waste of space for 
a standard; better that D doesn't support it well, anything to 
discourage its use.  I'd rather see you spend your time on 
something worthwhile.  If data formats are your thing, you 
could help get Ludwig's JSON stuff in, or better yet, enable 
some nice binary data format.


I agree that JSON is superior through-and-through, but legacy 
support matters, and XML is in many places. It's good to have a 
quality XML parsing library.