Re: [Pharo-dev] How to get rid of empty XML nodes?

Stephane Ducasse Mon, 29 Jan 2018 12:11:45 -0800

Tx monty.
I will update it because I do not want to lose all the PDFs if Bintray collapses.
I plan to revise all the booklets, since I will put them on Lulu so
that people can get them printed.


Stef


On Mon, Jan 29, 2018 at 2:00 PM, monty <mon...@programmer.net> wrote:
> I attached a commit patch (apply with `git am ...`) to the 'books.pharo.org' 
> repo to update the Scraping .pdf link. (The .pdf it links to now is obsolete.)
>
>> Sent: Friday, January 26, 2018 at 2:30 PM
>> From: "Stephane Ducasse" <stepharo.s...@gmail.com>
>> To: "Pharo Development List" <pharo-dev@lists.pharo.org>
>> Subject: Re: [Pharo-dev] How to get rid of empty XML nodes?
>>
>> Tx Monty!
>> This is a really important addition :)
>> It covers a super frequent scenario.
>>
>> Stef
>>
>> On Fri, Jan 26, 2018 at 8:37 AM, monty <mon...@programmer.net> wrote:
>> > See #removeAllFormattingNodes and its comment in the latest version.
>> >
>> > And instances of SAXHandler and subclasses are meant to be created with 
>> > #on: (or another "instance creation" message), _not #new_, otherwise they 
>> > won't be properly initialized. The class comment is clear about this, but 
>> > I should have overridden #new to raise an error like Stream does. Your 
>> > misuse was helpful in bringing this to my attention, and I added a 
>> > Stream-like #new implementation to SAXHandler.
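A minimal sketch of the approach described above. It assumes an XML-Parser version that includes #removeAllFormattingNodes and that the parsed document understands it, as a node with children. It also uses the class-side #parse: rather than #new. The sample XML is made up for illustration.

```smalltalk
| xml doc |
"Made-up sample input with indentation between elements"
xml := '<one>
    <two>first</two>
    <three>second</three>
</one>'.
"Class-side instance creation, never XMLDOMParser new"
doc := XMLDOMParser parse: xml.
"Assumed behavior: removes whitespace-only (formatting) string nodes from all descendants"
doc removeAllFormattingNodes.
doc root nodes do: [:each | Transcript show: each printString; cr].
```

The method comment in the image is the authority on exactly which nodes count as "formatting".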
>> >
>> >> Sent: Friday, December 08, 2017 at 9:21 AM
>> >> From: "Stephane Ducasse" <stepharo.s...@gmail.com>
>> >> To: "Pharo Development List" <pharo-dev@lists.pharo.org>
>> >> Subject: Re: [Pharo-dev] How to get rid of empty XML nodes?
>> >>
>> >> Hi monty
>> >>
>> >>
>> >> On Fri, Dec 8, 2017 at 9:03 AM, monty <mon...@programmer.net> wrote:
>> >> > By "empty XML nodes," do you mean whitespace-only string nodes?
>> >>
>> >> Yes
>> >>
>> >> > Those are included because all in-element whitespace is assumed 
>> >> > significant by the spec: https://www.w3.org/TR/xml/#sec-white-space
>> >>
>> >> I know. There was a discussion a while ago. I just lost a couple of
>> >> hours understanding that :(
>> >>
>> >> But this is a super super super annoying practice.
>> >> We had to test each node to see if it is an empty node, so it makes
>> >> everything a lot more complex without real justification,
>> >> besides the fact that these standardizers probably never implemented
>> >> any real cases.
>> >> From that perspective, this standard is really out of touch with reality.
>> >>
>> >> > The exception is if the element is declared in the DTD as only having 
>> >> > element children ("element content"): 
>> >> > https://www.w3.org/TR/xml/#dt-elemcontent
>> >>
>> >> Well, the XML files that I had (I did not choose XML, because I would
>> >> have preferred JSON :) ) had no DTD :(
>> >>
>> >> So at the end of the day, this wonderful standard puts all the stress
>> >> and burden on people.
>> >>
>> >> >
>> >> > For example, if you declare an element like this:
>> >> >
>> >> > <!ELEMENT one (two,three*,four?)>
>> >> >
>> >> > Any whitespace around a "two," "three," or "four" element child of a 
>> >> > "one" element is insignificant and ignored (unless 
>> >> > #preservesIgnorableWhitespace: is true). Other parsers, like LibXML2 
>> >> > and Xerces, behave the same way.
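To illustrate the element-content rule, here is a sketch that declares the quoted content model in an internal DTD subset. The declarations for two, three and four, and the sample document itself, are invented for the example. Per the explanation above, the whitespace between the children of one should be dropped as ignorable, because #preservesIgnorableWhitespace: is left at its default.

```smalltalk
| xml doc |
xml := '<!DOCTYPE one [
  <!ELEMENT one (two,three*,four?)>
  <!ELEMENT two (#PCDATA)>
  <!ELEMENT three (#PCDATA)>
  <!ELEMENT four (#PCDATA)>
]>
<one>
  <two>a</two>
  <three>b</three>
</one>'.
"one is declared with element content, so the indentation around its children is ignorable"
doc := XMLDOMParser parse: xml.
doc root nodes do: [:each | Transcript show: each printString; cr].
```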
>> >> >
>> >> > I'll see if I can come up with some easier way to deal with this, like 
>> >> > an optional parser setting, new enumeration methods, or maybe a tree 
>> >> > transformation.
>> >>
>> >> It would be A HUGE PLUS!!!!!!!!!!!!!!!!!!
>> >>
>> >>
>> >> Because the reality is that people have XML files with just nodes and no
>> >> empty nodes, and they are forced to
>> >> Let me know, because I could try.
>> >>
>> >> I was showing Pharo learners how to use Pharo to import code, and
>> >> this was a big drag.
>> >>
>> >> Stef
>> >>
>> >>
>> >> I tried to set some values in the parser, but it did not work.
>> >> BTW, I saw that the configuration logic forces one to write the following
>> >>
>> >> | parser doc visitor |
>> >> parser := XMLDOMParser new
>> >>    on: self xmlContents;
>> >>    preservesIgnorableWhitespace: true.
>> >>
>> >> and not
>> >>
>> >> | parser doc visitor |
>> >> parser := XMLDOMParser new
>> >>     preservesIgnorableWhitespace: true;
>> >>     on: self xmlContents.
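For comparison, here is a sketch of the order recommended later in this thread: create the parser with the class-side #on: (not #new), send any configuration messages, and only then send #parseDocument. A cascade keeps all of this in a single expression. The inline XML string stands in for `self xmlContents`.

```smalltalk
| xml doc |
xml := '<one><two>first</two></one>'.
"Create with #on:, configure, then parse; the cascade sends each message to the new parser"
doc := (XMLDOMParser on: xml)
    preservesIgnorableWhitespace: true;
    parseDocument.
Transcript show: doc root name; cr.
```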
>> >>
>> >>
>> >> >
>> >> >> Sent: Tuesday, December 05, 2017 at 8:29 AM
>> >> >> From: "Stephane Ducasse" <stepharo.s...@gmail.com>
>> >> >> To: "Pharo Development List" <pharo-dev@lists.pharo.org>
>> >> >> Subject: [Pharo-dev] How to get rid of empty XML nodes?
>> >> >>
>> >> >> Hi
>> >> >>
>> >> >> We are manipulating an XML document and I would like to get rid of the
>> >> >> spurious empty strings.
>> >> >> We saw that the GT panes are doing it:
>> >> >>
>> >> >> (aNodeWithElements isStringNode
>> >> >> and: [aNodeWithElements isEmpty
>> >> >> or: [aNodeWithElements isWhitespace]])
>> >> >>
>> >> >> Is there a way not to produce empty nodes?
>> >> >> Is there a simple way to avoid having to handle them?
>> >> >>
>> >> >> Right now, each time we deal with a node we have to check.
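Until a parser option exists, the GT check above can at least be written once as a block and reused. This is a minimal sketch. The block name isFormatting and the sample XML are illustrative. The messages #isStringNode, #isEmpty and #isWhitespace come from the snippet above.

```smalltalk
| isFormatting xml doc meaningful |
"Reusable predicate: true for empty or whitespace-only string nodes"
isFormatting := [:node |
    node isStringNode and: [node isEmpty or: [node isWhitespace]]].
xml := '<one>
    <two>first</two>
</one>'.
doc := XMLDOMParser parse: xml.
"Keep only the children that carry content"
meaningful := doc root nodes reject: isFormatting.
meaningful do: [:each | Transcript show: each printString; cr].
```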
>> >> >>
>> >> >> Stef
>> >> >>
>> >> >>
>> >> >
>> >>
>> >>
>> >
>>
>>