On Oct 29, 2012, at 8:34 AM, Mike Abdullah <cocoa...@mikeabdullah.net> wrote:

> 
> On 29 Oct 2012, at 11:44, Vincent Habchi <vi...@macports.org> wrote:
> 
>> On 29 Oct 2012, at 12:34, Mike Abdullah <cocoa...@mikeabdullah.net> wrote:
>> 
>>> The code is fairly inefficient to start with, but no, it's not going to 
>>> leak.
>> 
>> Thanks. I am aware of this, but since this code is going to be part of a 
>> didactic article on writing a WMS client, I emphasize clarity over 
>> performance (which is a secondary concern here).
>> 
>> However, I am interested in knowing how you would write such a translator 
>> yourself to make it more efficient. My initial idea was to copy every 
>> character up to an ‘&’, at which point the content that follows would be 
>> analyzed and replaced if necessary, and so on until the end of the HTML 
>> string. That would mean a single pass instead of one pass per pair in the 
>> dictionary.
> 
> Well, you can ask CFXMLCreateStringByUnescapingEntities() to do this on OS X, 
> although, if I recall correctly, all the CFXML functions have now sadly been 
> deprecated. The source code for it should still be available if you search 
> around.
> 
> But in general, I would just work my way through the string looking for 
> occurrences of '&' and check whether each one starts a valid escape sequence. 
> Much of the problem of dealing with HTML rather than XML is that there is a 
> vast range of special sequences, e.g. &micro;
> 


Given that there are also decimal (&#DD;) and hexadecimal escape sequences 
(&#xHHHH;) in HTML, trying to support those through the use of a dictionary of 
sequence -> replacement is going to be impractical.

Scanning through the string to find & and test for valid escape sequences 
(including both the 250 or so named entities plus those numeric escape 
sequences) is the right way to go, since the time spent on the string is 
dependent on the number of escape sequences in the string, not the number of 
possible escape sequences.
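
As a rough illustration of that single-pass scan, here is a minimal sketch. It 
is written in Python purely for brevity, not because the original code was; the 
same loop should carry over to a character buffer in C or Objective-C. All the 
names in it (NAMED_ENTITIES, MAX_ENTITY_LEN, decode_entity, unescape_entities) 
are made up for this example, and the entity table is a tiny subset of the real 
list, so treat it as a starting point rather than a complete decoder.

```python
# Tiny illustrative subset; a real table would hold all the named entities.
NAMED_ENTITIES = {
    "amp": "&",
    "lt": "<",
    "gt": ">",
    "quot": '"',
    "micro": "\u00b5",
}

# Assumed upper bound on the text between '&' and ';', so that a stray '&'
# does not trigger a search through the rest of the string.
MAX_ENTITY_LEN = 10

DEC_DIGITS = "0123456789"
HEX_DIGITS = "0123456789abcdefABCDEF"


def decode_entity(body):
    """Return the replacement for the text between '&' and ';', or None."""
    if body.startswith("#"):
        if body[1:2] in ("x", "X"):
            digits, base, allowed = body[2:], 16, HEX_DIGITS
        else:
            digits, base, allowed = body[1:], 10, DEC_DIGITS
        if not digits or any(c not in allowed for c in digits):
            return None
        code = int(digits, base)
        # Reject NUL, values beyond Unicode, and surrogate code points.
        if code == 0 or code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
            return None
        return chr(code)
    return NAMED_ENTITIES.get(body)


def unescape_entities(text):
    """Replace valid escape sequences in one left-to-right pass."""
    out = []
    pos = 0
    while True:
        amp = text.find("&", pos)
        if amp == -1:
            out.append(text[pos:])  # no more '&': copy the tail and stop
            break
        out.append(text[pos:amp])   # copy everything before the '&'
        semi = text.find(";", amp + 1, amp + 2 + MAX_ENTITY_LEN)
        replacement = decode_entity(text[amp + 1:semi]) if semi != -1 else None
        if replacement is None:
            out.append("&")         # not a valid sequence: keep the '&' as-is
            pos = amp + 1
        else:
            out.append(replacement)
            pos = semi + 1
    return "".join(out)
```

Because replaced text is never rescanned, "&amp;lt;" comes out as "&lt;" rather 
than "<", which is something repeated whole-string replacement can get wrong 
depending on the order of the pairs.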





Glenn Andreas                      gandr...@gandreas.com 
 <http://www.gandreas.com/> wicked fun!
Mad, Bad, and Dangerous to Know


_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
