Re: [Pharo-dev] How to get rid of empty XML nodes?

2018-01-29 Thread Stephane Ducasse
Tx monty.
I will update it because I do not want to lose all the PDFs if Bintray collapses.
I plan to revise all the booklets, since I will put them on Lulu so
that people can get them printed.

Stef



Re: [Pharo-dev] How to get rid of empty XML nodes?

2018-01-29 Thread monty
I attached a commit patch (apply with `git am ...`) to the 'books.pharo.org' 
repo to update the Scraping .pdf link. (The .pdf it links to now is obsolete.)


Re: [Pharo-dev] How to get rid of empty XML nodes?

2018-01-26 Thread Stephane Ducasse
Tx Monty!
This is a really important addition :)
Because it is a super frequent scenario.

Stef




Re: [Pharo-dev] How to get rid of empty XML nodes?

2018-01-25 Thread monty
See #removeAllFormattingNodes and its comment in the latest version.

And instances of SAXHandler and subclasses are meant to be created with #on: 
(or another "instance creation" message), _not #new_, otherwise they won't be 
properly initialized. The class comment is clear about this, but I should have 
overridden #new to raise an error like Stream does. Your misuse was helpful in 
bringing this to my attention, and I added a Stream-like #new implementation to 
SAXHandler.
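
A minimal Playground sketch of the two points above, assuming an XMLParser version recent enough to include #removeAllFormattingNodes and assuming it can be sent to the parsed document. The sample XML string is made up for illustration.

```smalltalk
"Sketch only: the XML below is an illustrative sample, not from the thread."
| doc |

"Create the parser with #on: (an instance creation message), not with #new."
doc := (XMLDOMParser on: '<list>
  <item>one</item>
  <item>two</item>
</list>') parseDocument.

"Strip the whitespace-only formatting nodes in one pass after parsing."
doc removeAllFormattingNodes.

"Inspect what is left under the root element."
doc root nodes
```

See the method comment of #removeAllFormattingNodes for the exact rules it applies before relying on it for documents with mixed content.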




Re: [Pharo-dev] How to get rid of empty XML nodes?

2017-12-10 Thread Norbert Hartl
Sure, it can get quite annoying. It would be good to have a switch to prevent
the creation of whitespace-only nodes at parse time.

Norbert

Re: [Pharo-dev] How to get rid of empty XML nodes?

2017-12-09 Thread Stephane Ducasse
Norbert

Should I tell the tool generating the XML that it is an idiot? I cannot
even do that. It is a tool I do not control.
I have no control over what I get.
Now, why can we not decide whether it matters that people add a line
return or not?
Why can I not be in charge of deciding? I take the risk of the
interpretation, but right now
the "standard" does not help me at all. It just tells me what is good for me.

In the past I implemented "standards" like XMI, only to find that there
were bugs in the spec.

In the end, each time I visit a node I have to check

visitNodeWithElements: aNodeWithElements
   | currentNode |
   currentNode := OkStubNode new.
   self cleanNode: aNodeWithElements.
   aNodeWithElements hasChildren
      ifTrue: [ | tokenNode |
         self cleanNode: aNodeWithElements nodes first.
         tokenNode := self visitElement: aNodeWithElements nodes first.
         self assert: tokenNode isToken.
         currentNode addChild: tokenNode.
         aNodeWithElements nodes allButFirst
            do: [ :each | currentNode addChild: (self visitNodeWithElements: each) ] ].
   ^ currentNode

And I do not like modifying a structure while I am visiting it.


cleanNode: aNodeWithElements
   aNodeWithElements removeNodes: (aNodeWithElements nodes select:
      [ :e | e isStringNode and: [ e isEmpty or: [ e isWhitespace ] ] ])
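
One way to avoid cleaning while visiting is to strip the whitespace-only string nodes in a separate pass over the whole tree before the visitor runs. The Playground sketch below is only an illustration under assumptions: the sample XML and the block name stripWhitespace are hypothetical, and it reuses the same node messages as cleanNode: above. It drops every whitespace-only string node, so it is not suitable for documents where such whitespace carries meaning.

```smalltalk
"Sketch only: sample XML and the stripWhitespace block are hypothetical."
| doc stripWhitespace |
doc := (XMLDOMParser on: '<a>
  <b>
    <c/>
  </b>
</a>') parseDocument.

"Remove whitespace-only string nodes at this level, then recurse into child elements."
stripWhitespace := nil.
stripWhitespace := [ :node |
   node removeNodes: (node nodes select: [ :each |
      each isStringNode and: [ each isEmpty or: [ each isWhitespace ] ] ]).
   node nodes do: [ :each |
      each isElement ifTrue: [ stripWhitespace value: each ] ] ].

"Clean once, before any visiting starts."
stripWhitespace value: doc.
```

After this pass the visitor can walk the tree without calling cleanNode: at every step.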

So I understand why people are moving away from XML.

Stef


Re: [Pharo-dev] How to get rid of empty XML nodes?

2017-12-08 Thread Norbert Hartl


> On 08.12.2017 at 14:21, Stephane Ducasse wrote:
> 
> Hi monty
> 
> 
>> On Fri, Dec 8, 2017 at 9:03 AM, monty  wrote:
>> By "empty XML nodes," do you mean whitespace-only string nodes?
> 
> Yes
> 
>> Those are included because all in-element whitespace is assumed significant 
>> by the spec: https://www.w3.org/TR/xml/#sec-white-space
> 
> I know. There was a discussion a while ago. I just lost a couple of
> hours understanding that :(
> 
> But this is a super super super annoying practices.
> We had to test each nodes to see if it is a empty nodes so it makes
> everything a lot more complex without real justification
> beside the fact that these standardizers probably never implemented
> some real cases.
> This standard is a really out of reality from that perspective.

Are you sure you are not oversimplifying things? XML would be even more complex
if these cases were in the standard. It is not easy to decide whether whitespace
is important or not.
Where does the whitespace in your case come from? Most probably from the XML
being pretty-printed, which inserts whitespace into the serialized text. So why
not just stop pretty-printing, and your problem is gone.

Norbert



Re: [Pharo-dev] How to get rid of empty XML nodes?

2017-12-08 Thread Stephane Ducasse
Hi monty


On Fri, Dec 8, 2017 at 9:03 AM, monty  wrote:
> By "empty XML nodes," do you mean whitespace-only string nodes?

Yes

> Those are included because all in-element whitespace is assumed significant 
> by the spec: https://www.w3.org/TR/xml/#sec-white-space

I know. There was a discussion a while ago. I just lost a couple of
hours understanding that :(

But this is a super super super annoying practice.
We had to test each node to see whether it is an empty node, so it makes
everything a lot more complex without real justification,
besides the fact that these standardizers probably never implemented
some real cases.
This standard is really out of touch with reality from that perspective.

> The exception is if the element is declared in the DTD as only having element 
> children ("element content"): https://www.w3.org/TR/xml/#dt-elemcontent

Well, the XML files that I had (I did not choose XML, because I would
have preferred JSON :) ) had no DTD :(

So at the end of the day, this wonderful standard puts all the stress
and burden on people.

>
> For example, if you declare an element like this:
>
> 
>
> Any whitespace around a "two," "three," or "four" element child of a "one" 
> element is insignificant and ignored (unless #preservesIgnorableWhitespace: 
> is true). Other parsers, like LibXML2 and Xerces, behave the same way.
>
> I'll see if I can come up with some easier way to deal with this, like an 
> optional parser setting, new enumeration methods, or maybe a tree 
> transformation.

It would be A HUGE PLUS!!


Because reality is that people have XML files with just nodes and no
empty nodes and they are forced to
Let me know because I could try.

I was showing Pharo learners how to use Pharo to import code, and
this was a big drag.

Stef


I tried to set some values in the parser but it did not work.
BTW I saw that the configuration logic forces us to write the following

| parser doc visitor |
parser := XMLDOMParser new
   on: self xmlContents;
   preservesIgnorableWhitespace: true.

and not

| parser doc visitor |
parser := XMLDOMParser new
   preservesIgnorableWhitespace: true;
   on: self xmlContents.





Re: [Pharo-dev] How to get rid of empty XML nodes?

2017-12-08 Thread monty
By "empty XML nodes," do you mean whitespace-only string nodes? Those are 
included because all in-element whitespace is assumed significant by the spec: 
https://www.w3.org/TR/xml/#sec-white-space

The exception is if the element is declared in the DTD as only having element 
children ("element content"): https://www.w3.org/TR/xml/#dt-elemcontent

For example, if you declare an element like this:



Any whitespace around a "two," "three," or "four" element child of a "one" 
element is insignificant and ignored (unless #preservesIgnorableWhitespace: is 
true). Other parsers, like LibXML2 and Xerces, behave the same way.
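
A minimal Playground sketch of the element-content case, assuming a DTD along the lines described above. The declarations and the sample document are illustrative assumptions, not the ones from the original message, and the parser is left at its default settings.

```smalltalk
"Sketch only: the DTD and document below are illustrative assumptions."
| xml doc |
xml := '<?xml version="1.0"?>
<!DOCTYPE one [
  <!ELEMENT one (two, three, four)>
  <!ELEMENT two EMPTY>
  <!ELEMENT three EMPTY>
  <!ELEMENT four EMPTY>
]>
<one>
  <two/>
  <three/>
  <four/>
</one>'.

"The element named one is declared with element content, so the whitespace
 between its children should be treated as ignorable."
doc := (XMLDOMParser on: xml) parseDocument.

"Inspect the children of the root; compare with the same document parsed without the DOCTYPE."
doc root nodes
```

Without the DOCTYPE the parser has no way to know that the content is element-only, which is why the whitespace-only string nodes appear for DTD-less files.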

I'll see if I can come up with some easier way to deal with this, like an 
optional parser setting, new enumeration methods, or maybe a tree 
transformation.




Re: [Pharo-dev] How to get rid of empty XML nodes?

2017-12-05 Thread Stephane Ducasse
We tried

| parser doc visitor |
parser := XMLDOMParser new
   on: self xmlContents;
   preservesIgnorableWhitespace: false.
doc := parser parseDocument.

but we still have the empty nodes around.

Stef


On Tue, Dec 5, 2017 at 2:29 PM, Stephane Ducasse
 wrote:
> Hi
>
> we are manipulating an XML document and I would like to get rid of the
> spurious empty strings.
> We saw that the gt panes are doing it.
>
> (aNodeWithElements isStringNode
>    and: [aNodeWithElements isEmpty
>       or: [aNodeWithElements isWhitespace]])
>
> Is there a way not to produce empty nodes?
> Is there a simple way not to have to handle them?
>
> Now each time we are dealing with a node we have to check.
>
> Stef