[Pharo-project] Fwd: Re: Squeaksource XML Parser - Enhanced support for CDATA Sections

2010-08-09 Thread jaayer


 Forwarded message 
From : jaayer
To : "Jan van de Sandt" 
Date : Mon, 09 Aug 2010 14:55:29 -0700
Subject : Re: Squeaksource XML Parser - Enhanced support for CDATA Sections
 Forwarded message 



 On Mon, 09 Aug 2010 04:44:07 -0700 Jan van de Sandt wrote  

>Hello,
>
>Ok, I understand the problem :-) thanks for explaining it.
>
>
>Here is a new version. In this version the XMLDOMParser has an extra property 
>preserveCDATASections so the behaviour is now configurable. For now the 
>default value of this property is true.

Thank you for the patch. I have merged it in, but with the following 
modifications:
1) Preservation of CDATA sections is disabled by default, as most of the time 
you don't really care whether parsed character data originally contained &amp;, 
&lt; or other pseudoentities to escape special characters or if those special 
characters were guarded within a CDATA section.
2) #addCDATASection: has not been added. I really don't want a proliferation of 
#add* methods in XMLElement or XMLNodeWithElements for every node type. 
#addContent: is special, as it accepts a node or a string, and #addElement: was 
once needed for the special handling element nodes required but is no longer 
needed. #addNode: should be preferred.
3) The messages added to XMLDOMParser were renamed to #isInCDataSection, 
#preservesCDataSections, and #preservesCDataSections: to make it clearer that 
they take or return boolean values.

Here is an example that demonstrates parsing with CDATA section preservation:
doc := (XMLDOMParser on: '')
 preservesCDataSections: true;
 parseDocument.
doc root firstNode
When evaluated with cmd-p, it produces:


___
Pharo-project mailing list
Pharo-project@lists.gforge.inria.fr
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project


Re: [Pharo-project] Fwd: Re: Squeaksource XML Parser - Enhanced support for CDATA Sections

2010-08-10 Thread Stéphane Ducasse
Just an api point

why preservesC...
and not
preserverC

I'm always confused by the infinitive and third person singular situation.

Stef




Re: [Pharo-project] Fwd: Re: Squeaksource XML Parser - Enhanced support for CDATA Sections

2010-08-10 Thread Norbert Hartl

On 10.08.2010, at 10:45, Stéphane Ducasse wrote:

> Just an api point
> 
> why preservesC...
> and not
>   preserverC
> 
> I'm always confused by the infinitive and third person singular situation.
> 
You mean preserveC...? The preserve(r)C... was a typo, right? I think 
preservesC... qualifies as a testing selector. 

But as I wrote in my description of the problem I would call it 

coalesceCDATASections: aBoolean

or

enableCoalescing
disableCoalescing

The functionality that is described here is better known as coalescing, and 
that term describes better what is going on. If a parser is coalescing, two 
things will happen: CDATA sections will be read in as text nodes, and then 
consecutive text nodes are coalesced into a single text node.
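
To make the two steps concrete, here is a minimal sketch in plain Smalltalk. It does not touch the XML parser's API at all: nodes are modelled as hypothetical kind -> text associations, and the kinds #text and #cdata are made up for the illustration.

```smalltalk
| nodes coalesced |
"Hypothetical model of parsed content: each node is a kind -> text association."
nodes := OrderedCollection new.
nodes
	add: #text -> 'a ';
	add: #cdata -> '< b';
	add: #text -> ' c'.

"Step 1: read CDATA sections in as plain text nodes."
nodes := nodes collect: [:each | #text -> each value].

"Step 2: merge runs of adjacent text nodes into a single text node."
coalesced := OrderedCollection new.
nodes do: [:each |
	(coalesced notEmpty and: [coalesced last key = #text])
		ifTrue: [coalesced last value: coalesced last value , each value]
		ifFalse: [coalesced add: each]].
coalesced
```

Evaluated in a workspace, this leaves coalesced holding a single #text entry whose value is 'a < b c'.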

my 2 cents,

Norbert


Re: [Pharo-project] Fwd: Re: Squeaksource XML Parser - Enhanced support for CDATA Sections

2010-08-10 Thread Stéphane Ducasse
> 
> coalesceCDATASections: aBoolean
> 
> or
> 
> enableCoalescing
> disableCoalescing

Most of the time you need all three, because the first one lets you easily 
build scripts.

for me 

> coalesceCDATASections: aBoolean

is a setter not a testing selector


Re: [Pharo-project] Fwd: Re: Squeaksource XML Parser - Enhanced support for CDATA Sections

2010-08-10 Thread jaayer


 On Tue, 10 Aug 2010 01:45:42 -0700 Stéphane Ducasse  wrote  

>Just an api point 
> 
>why preservesC... 
>and not 
>preserverC 
> 
>I'm always confused by the infinitive and third person singular situation. 
> 
>Stef 

The infinitive in English is two words with possibly other words separating 
them, the word "to" and then the verb lacking any "s" or "es" or other tense, 
number or person modifiers: "to program" or "to code."

I chose preservesCDataSections because it is more obvious that it returns a 
boolean and that the corresponding preservesCDataSections: accepts a boolean. 
If you call the testing message preserveCDataSections, it sounds more like you 
are commanding the receiver to do so rather than asking if it already does.

Also, my mail client ate the example code, so here it is again:
doc :=
(XMLDOMParser on: '')
preservesCDataSections: true;
parseDocument.
doc root firstNode
produces:
 

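A sketch of what such an example could look like, using only the messages named in this thread. The XML source string is a made-up stand-in (the original one is not reproduced here), and the result is not shown because it has not been checked against the merged version:

```smalltalk
| doc |
"The XML source below is a hypothetical stand-in, not the original example."
doc := (XMLDOMParser on: '<root><![CDATA[1 < 2 & 3]]></root>')
	preservesCDataSections: true;
	parseDocument.

"With preservation enabled, the first child of the root should be a CDATA
 node rather than a plain string node."
doc root firstNode
```

With preservesCDataSections: false (the default described above), the same text would be expected to arrive as an ordinary string node instead.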


Re: [Pharo-project] Fwd: Re: Squeaksource XML Parser - Enhanced support for CDATA Sections

2010-08-10 Thread Stéphane Ducasse
>> 
> 
> The infinitive in English is two words with possibly other words separating 
> them, the word "to" and then the verb lacking any "s" or "es" or other tense, 
> number or person modifiers: "to program" or "to code."

Thanks. I know the difference :) I meant in method selectors include: vs 
includes: 

> I choose preservesCDataSections because it is more obvious that it returns a 
> boolean and that the corresponding preservesCDataSections: accepts a boolean.

> If you call the testing message preserveCDataSections,

No, I would write 

isPreservingCDataSections
doesPreserveCDataSections

for me 
preservesCDataSections: 
would be better written as 
preserveCDataSections:

Because then I do not have to think about whether I should put an S or not.


> it sounds more like you are commanding the receiver to do so rather than 
> asking if it already does..
> 
> Also, my mail client ate the example code, so here it is again:
> doc :=
>   (XMLDOMParser on: '')
>   preservesCDataSections: true;


Yes but it looks like an order too and I do not understand the difference 
between 
using 
preserveSCD
and 
parseDocument (with no S after parseDocument)

>   parseDocument.
> doc root firstNode
> produces:
> 

I follow the conventions of Beck and Smalltalk with Style (see my web page). 


Re: [Pharo-project] Fwd: Re: Squeaksource XML Parser - Enhanced support for CDATA Sections

2010-08-10 Thread jaayer


 On Tue, 10 Aug 2010 12:17:36 -0700 Stéphane Ducasse  wrote  

>>> 
>> 
>> The infinitive in English is two words with possibly other words separating 
>> them, the word "to" and then the verb lacking any "s" or "es" or other 
>> tense, number or person modifiers: "to program" or "to code." 
> 
>Thanks. I know the difference :) I meant in method selectors include: vs 
>includes: 

I think #includes*, #has* and similar messages use the third person singular so 
that when you use them in an expression like this:
aCollection includes: anObject
what you are really doing is affirming something (inclusion of an object) about 
some subject (a collection). A sentence with a subject and a predicate that 
affirms or denies something about that subject is a proposition, and 
propositions in two-valued logic are either true or false (just as the 
Smalltalk expression above would be when evaluated).
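
The contrast can be tried in a workspace with nothing but the standard collection classes. The variable name and values here are arbitrary:

```smalltalk
| aCollection |
aCollection := OrderedCollection new.

"Imperative form, no trailing s: tells the receiver to do something."
aCollection add: 3.

"Third person singular: each expression is a proposition about the receiver."
aCollection includes: 3.    "true"
aCollection includes: 4.    "false"
aCollection isEmpty         "false"
```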

> 
>isPreservingCDataSections 
>doesPreserveCDataSections 

The first form is already in use elsewhere in the API and has some advantages 
over the third person singular form (the "is" prefix). However, it can also 
imply an unnecessary temporal restriction to the present. For example, compare 
#resolvesExternalEntities with #isResolvingExternalEntities. The second 
selector could just mean--if true--that the parser supports resolution of 
external entities, but it could also mean that the parser is right now, at this 
very moment, resolving external entities. While the "does" form does not suffer 
from these ambiguities, it is also the longest and ugliest of the bunch.

> 
>Yes but it looks like an order too and I do not understand the difference 
>between 
>using 
>preserveSCD 
>and 
>parseDocument (with no S after parseDocument) 

Imperative forms of a verb in English never have an "s" or "es" at the end of 
them. That means #preservesCDataSections (or #includes: or any other similar 
message) could never be taken to be an imperative command given to the receiver 
and instead must form, with the receiver and any arguments, some type of 
propositional sentence that is either true or false.

>I follow the conventions of Beck and Smalltalk with Style (see my web page). 

I have read Kent's Smalltalk Best Practice Patterns, though not the other one. 
I will check it out, and I appreciate your feedback, Stéphane.



Re: [Pharo-project] Fwd: Re: Squeaksource XML Parser - Enhanced support for CDATA Sections

2010-08-10 Thread jaayer


 On Tue, 10 Aug 2010 02:18:19 -0700 Norbert Hartl  wrote  

> 
>But as I wrote in my description of the problem I would call it 
> 
>coalesceCDATASections: aBoolean 
> 
>or 
> 
>enableCoalescing 
>disableCoalescing 

The downside of enable/disable pairs is the need for three messages (two to 
modify, one to test and lazily initialize) rather than two.

>The functionality that is described here is better known as coalescing, and 
>that term describes better what is going on. If a parser is coalescing, two 
>things will happen: CDATA sections will be read in as text nodes, and then 
>consecutive text nodes are coalesced into a single text node. 
> 
>my 2 cents, 
> 
>Norbert 

I think "preserve" is better, if only because "coalesceCData" just implies a 
joining together of CDATA sections and says nothing about their status in the 
DOM tree as XMLString or XMLCData nodes. Although I am not wed to it.




Re: [Pharo-project] Fwd: Re: Squeaksource XML Parser - Enhanced support for CDATA Sections

2010-08-10 Thread jaayer


 On Tue, 10 Aug 2010 13:41:09 -0700 jaayer  wrote  

>The first form is already in use elsewhere in the API and has some advantages 
>over the third person singular form (the "is" prefix).

I meant to say here that the "is" prefix is an advantage over the third person 
singular form (because people recognize it right away as indicating a boolean 
return value) and not that it is the third person singular (it isn't).




Re: [Pharo-project] Fwd: Re: Squeaksource XML Parser - Enhanced support for CDATA Sections

2010-08-11 Thread Norbert Hartl

On 11.08.2010, at 00:56, jaayer wrote:

>  On Tue, 10 Aug 2010 02:18:19 -0700 Norbert Hartl  wrote  
> 
>> 
>> But as I wrote in my description of the problem I would call it 
>> 
>> coalesceCDATASections: aBoolean 
>> 
>> or 
>> 
>> enableCoalescing 
>> disableCoalescing 
> 
> The downside of enable/disable pairs is the need for three messages (two to 
> modify, one to test and lazily initialize) rather than two.
> 
>> The functionality that is described here is better known as coalescing, and 
>> that term describes better what is going on. If a parser is coalescing, two 
>> things will happen: CDATA sections will be read in as text nodes, and then 
>> consecutive text nodes are coalesced into a single text node. 
>> 
>> my 2 cents, 
>> 
>> Norbert 
> 
> I think "preserve" is better, if only because "coalesceCData" just implies a 
> joining together of CDATA sections and says nothing about their status in the 
> DOM tree as XMLString or XMLCData nodes. Although I am not wed to it.
> 
I think it is hard to find a word that completely describes what is going on, 
and I think that common sense/common usage is also an argument of sorts. I did 
not try to work out for myself what the most descriptive word would be (quite 
hard as a non-native speaker). If you search the net you might see (as I did) 
that this effect is quite commonly described as coalescing. That was my only 
reason to speak up, because I think the term is more recognizable this way.
If you know about coalescing, then the state in the DOM tree is pretty obvious. 
Nodes can coalesce only if they are of the same kind. Since a CDATA node _is_ a 
text node, all CDATA nodes are converted to simple text nodes and then all of 
the text nodes coalesce into one. The state in the DOM is always that there is 
a single text node after coalescing.

Norbert


Re: [Pharo-project] Fwd: Re: Squeaksource XML Parser - Enhanced support for CDATA Sections

2010-08-11 Thread Stéphane Ducasse
Funnily enough, for me coalesce is far more obscure than preserve.

Now my point was not about this specific message; I would like to get some 
guidelines for specifying a consistent API.
And I'm always torn when writing code over whether I should use an s or not.

Stef


Re: [Pharo-project] Fwd: Re: Squeaksource XML Parser - Enhanced support for CDATA Sections

2010-08-11 Thread Norbert Hartl

On 11.08.2010, at 10:31, Stéphane Ducasse wrote:

> Funnily enough, for me coalesce is far more obscure than preserve.
> 
Might be the French in you :)

> Now my point was not about this specific message; I would like to get some 
> guidelines for specifying a consistent API.
> And I'm always torn when writing code over whether I should use an s or not.
> 
> 
I think we all want to figure out how it is best done in general. Discussing 
a single selector in one piece of code would hardly justify a longer 
thread :)

To me this is really important. I read the Becks but can't even remember which 
of those. Because it's not one of those things you read and can remember 
afterwards. Well, at least in my case this doesn't work, and I am far too lazy 
to read it over and over. 

Norbert


Re: [Pharo-project] Fwd: Re: Squeaksource XML Parser - Enhanced support for CDATA Sections

2010-08-11 Thread Stéphane Ducasse
>> Funnily enough, for me coalesce is far more obscure than preserve.
>> 
> Might be the French in you :)

Probably :)
We can't mess with our roots :)

>> Now my point was not about this specific message; I would like to get some 
>> guidelines for specifying a consistent API.
>> And I'm always torn when writing code over whether I should use an s or not.
>> 
> I think we all want to figure out how it is best done in general. Discussing 
> a single selector in one piece of code would hardly justify a longer 
> thread :)
> 
> To me this is really important. I read the Becks but can't even remember 
> which of those. Because it's not one of those things you read and can 
> remember afterwards. Well, at least in my case this doesn't work, and I am 
> far too lazy to read it over and over. 

It is 4 pages, so it is worth the effort. I will reread them and Smalltalk 
with Style, and probably Smalltalk by Example.

Stef

