RE : RE : Re[2]: [fpc-pascal] fpWeb and html and uri escaping/unescaping elements

2011-05-25 Thread Ludo Brands
> So you suggest to place the &amp; translation last.

That would solve this particular problem. My earlier comments on the
minimalistic approach of the implementation point to a different approach:
scan the source string once and replace HTML entities as you find them (with
a lookup table, for example). This would scale much better when implementing
ISO-8859-1 or '&#entity-number;' unescaping.
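
A minimal sketch of the single-pass idea described above, written in Python for brevity. It is not the patch under discussion: the table contents, the function names, and the use of a regular expression as the scanner are all assumptions made for illustration.

```python
import re

# Small illustrative lookup table (an assumption); a real one would list
# every named entity that has to be supported.
ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "quot": '"'}

# Matches '&name;', '&#123;' or '&#x7B;'.
_ENTITY_RE = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def _decode(match):
    """Return the replacement for one matched entity."""
    body = match.group(1)
    if body.startswith("#"):
        digits = body[1:]
        code = int(digits[1:], 16) if digits[0] in "xX" else int(digits)
        # Leave NUL and out-of-range references untouched.
        return chr(code) if 0 < code <= 0x10FFFF else match.group(0)
    # Unknown names are left exactly as they were.
    return ENTITIES.get(body, match.group(0))


def unescape_html(text):
    """Decode entities in one left-to-right pass.

    Replacement text is never scanned again, so '&amp;lt;' becomes
    '&lt;' and stays that way.
    """
    return _ENTITY_RE.sub(_decode, text)


if __name__ == "__main__":
    print(unescape_html("&amp;lt;"))           # &lt;
    print(unescape_html("a &lt; b &#233; c"))  # a < b é c
```

Supporting another entity means adding a table entry rather than another pass over the string.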
 
Ludo

-----Original Message-----
From: fpc-pascal-boun...@lists.freepascal.org
[mailto:fpc-pascal-boun...@lists.freepascal.org] On behalf of ik
Sent: Wednesday, May 25, 2011 09:22
To: FPC-Pascal users discussions
Subject: Re: RE : Re[2]: [fpc-pascal] fpWeb and html and
uri escaping/unescaping elements


On Wed, May 25, 2011 at 10:09, Ludo Brands ludo.bra...@free.fr wrote:


You should not unescape recursively.
Input to EscapeHTML: '&lt;' Output: '&amp;lt;' : Correct
UnescapeHTML: input '&amp;lt;' Output '<' : Wrong.
This is because you replace '&amp;' with '&', resulting in '&lt;', which is
translated to '<' in the next line.


So you suggest to place the &amp; translation last.
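
The double decoding and the proposed reordering can be reproduced with a chain of plain string replacements. This is a Python sketch with hypothetical function names, not the code from the patch; it only assumes that the patch works by replacing one entity after another.

```python
def unescape_amp_first(text):
    """Buggy order: '&amp;' is decoded first, so its output is decoded again."""
    text = text.replace("&amp;", "&")
    text = text.replace("&lt;", "<")
    text = text.replace("&gt;", ">")
    text = text.replace("&quot;", '"')
    return text


def unescape_amp_last(text):
    """Same chain with '&amp;' moved to the end, so its output is not rescanned."""
    text = text.replace("&lt;", "<")
    text = text.replace("&gt;", ">")
    text = text.replace("&quot;", '"')
    text = text.replace("&amp;", "&")
    return text


if __name__ == "__main__":
    escaped = "&amp;lt;"  # what EscapeHTML produces for the literal text '&lt;'
    print(unescape_amp_first(escaped))  # <     (wrong: decoded twice)
    print(unescape_amp_last(escaped))   # &lt;  (the original text)
```

Moving '&amp;' last works for this fixed set because none of the other replacements can produce an ampersand.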
 

 
Ludo
 
 

-----Original Message-----
From: fpc-pascal-boun...@lists.freepascal.org
[mailto:fpc-pascal-boun...@lists.freepascal.org] On behalf of ik
Sent: Wednesday, May 25, 2011 08:34
To: FPC-Pascal users discussions
Subject: Re: Re[2]: [fpc-pascal] fpWeb and html and uri
escaping/unescaping elements





On Tue, May 24, 2011 at 12:21, José Mejuto joshy...@gmail.com wrote:


Hello FPC-Pascal,

Tuesday, May 24, 2011, 10:09:03 AM, you wrote:

> I've created a patch with the Escape and unEscape functions, and placed it
> here: http://bugs.freepascal.org/view.php?id=19407

Un/escapeHTML parsing must be done in one go, especially for the amp entity.
Test against:

&amp;lt;



I'm not sure what you mean here.

If you already have HTML entities, you should not escape them. If you do not
have HTML entities, you should escape them.
The escaping and unescaping work well; I tested them before I sent
them.

 


Right unescape: '&lt;'; your code: '<'.

--
Best regards,
 José



Ido
 



___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-pascal





Re: RE : RE : Re[2]: [fpc-pascal] fpWeb and html and uri escaping/unescaping elements

2011-05-25 Thread ik
On Wed, May 25, 2011 at 14:04, Ludo Brands ludo.bra...@free.fr wrote:

> > So you suggest to place the &amp; translation last.
> That would solve this particular problem. My earlier comments on the
> minimalistic approach of the implementation point to a different approach:
> scan the source string once and replace HTML entities as you find them (with
> a lookup table, for example). This would scale much better when implementing
> ISO-8859-1 or '&#entity-number;' unescaping.


For every HTML entity you need a dictionary entry, and that makes your
program fatter.
Converting numeric entities can be done, but you need to know which code
page you wish to convert from/to.
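
To illustrate where the code page enters: in HTML a numeric reference names a Unicode code point, so decoding the number itself needs no table. The code page only matters when the resulting character has to be stored in a single-byte string. A small Python sketch, with a hypothetical helper name:

```python
def decode_numeric_reference(ref):
    """Turn '&#233;' or '&#xE9;' into the character with that Unicode code point."""
    if not (ref.startswith("&#") and ref.endswith(";")):
        raise ValueError("not a numeric character reference: " + ref)
    digits = ref[2:-1]
    if digits[:1] in ("x", "X"):
        return chr(int(digits[1:], 16))
    return chr(int(digits))


if __name__ == "__main__":
    ch = decode_numeric_reference("&#233;")  # e with acute accent
    print(ch.encode("iso-8859-1"))  # b'\xe9'      (one byte)
    print(ch.encode("utf-8"))       # b'\xc3\xa9'  (two bytes)

    euro = decode_numeric_reference("&#8364;")
    # The euro sign has no ISO-8859-1 representation, so fall back to '?'.
    print(euro.encode("iso-8859-1", errors="replace"))  # b'?'
```

Only the final encoding step depends on the target code page, and that step is also where characters without a representation have to be handled.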



 Ludo


Ido

