Re: [Chicken-users] html->sxml (html-parser egg) does not decode entities in html attributes, ideas why?

2014-05-08 Thread Andy Bennett

Hi,


Thanks for your email.

I'm somewhat confused by what you say. Through investigation, 
it seems html->sxml will decode entities, so long as they aren't 
within an HTML element attribute. Could you clarify whether 
that default applies globally or just to attributes?


Yes, sorry, I misread my own code :)

The default is to _decode_ entities:

#;1> (html->sxml "&quot;")
(*TOP* "\"")

And as you say, it currently just doesn't process attributes:

#;2> (html->sxml "<div data-foo=\"&quot;\">")
(*TOP* (div (@ (data-foo "&quot;"))))

I'll fix this.



Thanks for this Alex and sorry for taking so long to come back to you.

When Philip first reported this we were running html-parser 0.5.0 on 
CHICKEN 4.7.0. We're currently upgrading to CHICKEN 4.9.0 and we were 
trying the latest html-parser, version 0.5.2. Unfortunately we've had a 
couple of problems: one with empty attributes and another that seems a bit 
more sinister.


html-parser 0.5.0 works on both 4.7.0 and 4.9.0.
html-parser 0.5.1 and 0.5.2 don't work on either 4.7.0 or 4.9.0, so I've 
isolated the problem to changes introduced in 0.5.1.


Empty attributes now seem to decode to the string "()".

During &quot; deserialisation when inside an attribute, we seem to get data 
from earlier in the stream introduced:



(define empty "<div data=\"\">empty</div>")

(define content "<br>\r\n<br>\r\n<div data=\"(sxml (@ (attr &quot;12345&quot;)) body)\">div body</div>")



0.5.0
-

#;> (html->sxml empty)
(*TOP* (div (@ (data "")) "empty"))

#;> (html->sxml content)
(*TOP* (br) "\r\n" (br) "\r\n" (div (@ (data "(sxml (@ (attr &quot;12345&quot;)) body)")) "div body"))



0.5.1
-

#;> (html->sxml empty)
(*TOP* (div (@ (data "()")) "empty"))

#;> (html->sxml content)
(*TOP* (br) "\r\n" (br) "\r\n" (div (@ (data "(sxml (@ (attr \"\r\nbr\r\nbr12345\"\r\nbr\r\nbr)) body)")) "div body"))





The data in attr seems to be taken from data elsewhere:


#;> (html->sxml "<first>\r\n<br>\r\n<second /><div data=\"(sxml (@ (attr &quot;12345&quot;)) body)\">div body</div>")
(*TOP* (first "\r\n" (br) "\r\n" (second) (div (@ (data "(sxml (@ (attr \"second\r\nbr\r\n12345\"second\r\nbr\r\n)) body)")) "div body")))



Thanks for all your help maintaining this and, once again, sorry it took so 
long for us to put your newer versions into our code.





Regards,
@ndy

--
andy...@ashurst.eu.org
http://www.ashurst.eu.org/
0x7EBA75FF

___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] html->sxml (html-parser egg) does not decode entities in html attributes, ideas why?

2014-05-08 Thread Alex Shinn
On Fri, May 9, 2014 at 6:44 AM, Andy Bennett andy...@ashurst.eu.org wrote:


 Empty attributes now seem to decode to the string "()".


Fixed.

 During &quot; deserialisation when inside an attribute, we seem to get data
 from earlier in the stream introduced:


I couldn't reproduce this.  Could you check with the latest fix?


Re: [Chicken-users] html->sxml (html-parser egg) does not decode entities in html attributes, ideas why?

2014-05-08 Thread Andy Bennett

Hi,


Empty attributes now seem to decode to the string "()".

Fixed.


Thanks! :-) That works for me now:

-
#;4> (html->sxml empty)
(*TOP* (div (@ (data "")) "empty"))
-


During &quot; deserialisation when inside an attribute, we seem 
to get data from earlier in the stream introduced:


I couldn't reproduce this.  Could you check with the latest fix?


Which CHICKEN are you using? I can reproduce it with 0.5.2 on 4.9.0rc1:

-
#;5> (html->sxml content)
(*TOP* (br) "\r\n" (br) "\r\n" (div (@ (data "(sxml (@ (attr \"\r\nbr\r\nbr12345\"\r\nbr\r\nbr)) body)")) "div body"))

-

...but not with 0.5.2 on 4.8.0.4.

-
#;4> (html->sxml content)
(*TOP* (br) "\r\n" (br) "\r\n" (div (@ (data "(sxml (@ (attr &quot;12345&quot;)) body)")) "div body"))

-

With 0.5.3 on 4.9.0rc1 it seems to work:

-
#;5> (html->sxml content)
(*TOP* (br) "\r\n" (br) "\r\n" (div (@ (data "(sxml (@ (attr \"12345\")) body)")) "div body"))

-


...but perhaps it's worth chasing this down a bit further?



Thanks for all your help with this. :-)



Regards,
@ndy

--
andy...@ashurst.eu.org
http://www.ashurst.eu.org/
0x7EBA75FF



Re: [Chicken-users] html->sxml (html-parser egg) does not decode entities in html attributes, ideas why?

2014-05-08 Thread Alex Shinn
On Fri, May 9, 2014 at 8:26 AM, Andy Bennett andy...@ashurst.eu.org wrote:


 Which CHICKEN are you using? I can reproduce it with 0.5.2 on 4.9.0rc1:


Never mind, I had only checked 0.5.3.  I can see it in 0.5.2.


Re: [Chicken-users] html->sxml (html-parser egg) does not decode entities in html attributes, ideas why?

2013-11-23 Thread Alex Shinn
On Sat, Nov 23, 2013 at 11:19 AM, Jim Ursetto zbignie...@gmail.com wrote:

 Alex,

 Looks like there's a regression of sorts in html-parser 0.5.1.

 0.5.0

 #;> (html->sxml "<foo bar></foo>")
 (*TOP* (foo (@ (bar))))

 0.5.1

 #;> (html->sxml "<foo bar></foo>")
 Error: (cadr) bad argument type: ()


Oops, fixed.

 Arguably, empty attributes should result in a value of "" as per
 http://dev.w3.org/html5/markup/syntax.html#syntax-attr-empty ; for
 example,

 #;> (html->sxml "<foo bar></foo>")
 (*TOP* (foo (@ (bar ""))))

 although I'd also be satisfied with a return to the status quo ante, in
 which a null cdr signifies empty.


Given that I can see pros and cons to both approaches,
I'm inclined to leave as-is for now.

-- 
Alex


 Jim

 On Sep 8, 2013, at 7:30 AM, Alex Shinn alexsh...@gmail.com wrote:

 On Thu, Sep 5, 2013 at 12:39 AM, Philip Kent phi...@knodium.com wrote:

  Hi Alex,

 Excellent! Thanks for looking into it and for the tip re custom parsers -
 I was trying to understand that code!


 It should work now, let me know if you have any problems.

 --
 Alex



Re: [Chicken-users] html->sxml (html-parser egg) does not decode entities in html attributes, ideas why?

2013-11-22 Thread Jim Ursetto
Alex,

Looks like there's a regression of sorts in html-parser 0.5.1.

0.5.0

#;> (html->sxml "<foo bar></foo>")
(*TOP* (foo (@ (bar))))

0.5.1

#;> (html->sxml "<foo bar></foo>")
Error: (cadr) bad argument type: ()

Arguably, empty attributes should result in a value of "" as per 
http://dev.w3.org/html5/markup/syntax.html#syntax-attr-empty ; for example,

#;> (html->sxml "<foo bar></foo>")
(*TOP* (foo (@ (bar ""))))

although I'd also be satisfied with a return to the status quo ante, in which a 
null cdr signifies empty.
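
Code that consumes the SXML can be written to accept either representation of an empty attribute. A minimal sketch, assuming attributes arrive as either (name) or (name "value"); attribute-value is a hypothetical helper for illustration and is not part of html-parser:

```scheme
;; Hypothetical accessor, not part of html-parser.
;; Treats a null cdr, as in (bar), the same as an explicit empty string,
;; so callers never apply cadr to an attribute that has no value.
(define (attribute-value attr)
  (if (null? (cdr attr))
      ""
      (cadr attr)))

(attribute-value '(bar))        ; empty attribute with a null cdr
(attribute-value '(bar ""))     ; empty attribute with an explicit ""
(attribute-value '(bar "baz"))  ; ordinary attribute
```

With an accessor like this, downstream code does not need to care which of the two conventions the parser settles on.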

Jim

On Sep 8, 2013, at 7:30 AM, Alex Shinn alexsh...@gmail.com wrote:

 On Thu, Sep 5, 2013 at 12:39 AM, Philip Kent phi...@knodium.com wrote:
 Hi Alex,
 
 Excellent! Thanks for looking into it and for the tip re custom parsers - I 
 was trying to understand that code!
 
 It should work now, let me know if you have any problems.
 
 -- 
 Alex
 


Re: [Chicken-users] html->sxml (html-parser egg) does not decode entities in html attributes, ideas why?

2013-09-14 Thread Philip Kent
Hi Alex,

Thank you for fixing this. Unfortunately I am not able to test this right now, 
but Andy (andyjpb) should be able to, I'll see what he says.

Thanks,
Philip

From: Alex Shinn alexsh...@gmail.com
Date: Sunday, 8 September 2013 13:30
To: Philip Kent phi...@knodium.com
Cc: chicken-users@nongnu.org
Subject: Re: [Chicken-users] html->sxml (html-parser egg) does not decode 
entities in html attributes, ideas why?

On Thu, Sep 5, 2013 at 12:39 AM, Philip Kent phi...@knodium.com wrote:
Hi Alex,

Excellent! Thanks for looking into it and for the tip re custom parsers - I was 
trying to understand that code!

It should work now, let me know if you have any problems.

--
Alex



Re: [Chicken-users] html->sxml (html-parser egg) does not decode entities in html attributes, ideas why?

2013-09-08 Thread Alex Shinn
On Thu, Sep 5, 2013 at 12:39 AM, Philip Kent phi...@knodium.com wrote:

  Hi Alex,

 Excellent! Thanks for looking into it and for the tip re custom parsers -
 I was trying to understand that code!


It should work now, let me know if you have any problems.

-- 
Alex


Re: [Chicken-users] html->sxml (html-parser egg) does not decode entities in html attributes, ideas why?

2013-09-04 Thread Philip Kent
Hi Alex,

Thanks for your email.

I'm somewhat confused by what you say. Through investigation, it seems 
html->sxml will decode entities, so long as they aren't within an HTML element 
attribute. Could you clarify whether that default applies globally or just 
to attributes?

Thanks,
Philip


From: Alex Shinn alexsh...@gmail.com
Sent: 04 September 2013 03:51
To: Philip Kent
Cc: chicken-users@nongnu.org
Subject: Re: [Chicken-users] html->sxml (html-parser egg) does not decode 
entities in html attributes, ideas why?

On Tue, Sep 3, 2013 at 11:19 PM, Philip Kent phi...@knodium.com wrote:
Hi all,

I noticed an issue today with the html-parser egg, where it does not seem to 
decode entities within an attribute of an element, I have included an example 
below.

#;14> (html->sxml "<div data-foo=\"&quot;\">")
(*TOP* (div (@ (data-foo "&quot;"))))

Expected: (*TOP* (div (@ (data-foo "\""))))

I was wondering if anyone could provide some thoughts as to why this might be 
happening? I have taken a look at the html-parser egg but have not seen much 
(but then this goes far beyond my knowledge of scheme!)

html-parser processes entities, but the default for html->sxml
is just to leave them encoded as-is.  I'm not sure if that's the best
default, but will at least provide a convenient option to get
the decoded strings.

--
Alex



Re: [Chicken-users] html->sxml (html-parser egg) does not decode entities in html attributes, ideas why?

2013-09-04 Thread Philip Kent
Hi Alex,

Excellent! Thanks for looking into it and for the tip re custom parsers - I was 
trying to understand that code!

Philip


From: Alex Shinn alexsh...@gmail.com
Sent: 04 September 2013 14:00
To: Philip Kent
Cc: chicken-users@nongnu.org
Subject: Re: [Chicken-users] html->sxml (html-parser egg) does not decode 
entities in html attributes, ideas why?

On Wed, Sep 4, 2013 at 8:23 PM, Philip Kent phi...@knodium.com wrote:
Hi Alex,

Thanks for your email.

I'm somewhat confused by what you say. Through investigation, it seems 
html->sxml will decode entities, so long as they aren't within an HTML element 
attribute. Could you clarify whether that default applies globally or just 
to attributes?

Yes, sorry, I misread my own code :)

The default is to _decode_ entities:

#;1> (html->sxml "&quot;")
(*TOP* "\"")

And as you say, it currently just doesn't process attributes:

#;2> (html->sxml "<div data-foo=\"&quot;\">")
(*TOP* (div (@ (data-foo "&quot;"))))

I'll fix this.

What I was referring to before is that you can customize
what is done with entities with

 (make-html-parser 'entity: (lambda (name) ...))

and can customize non-default entity names:

 (make-html-parser 'entities: '((quot . "\"") ...))

but again, these are currently ignored in attributes.
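
To make concrete what decoding entities in an attribute value amounts to, here is a minimal sketch in plain Scheme. decode-entities is a hypothetical helper written for illustration, not html-parser's implementation; the alist shape follows the entities: option above, and symbol keys are an assumption. It handles named references only.

```scheme
;; Hypothetical helper, not part of html-parser.
;; Replaces &name; references in STR using ENTITIES, an alist of
;; (symbol . string) pairs. Unknown or unterminated references are
;; copied through unchanged.
(define (decode-entities str entities)
  (let ((len (string-length str)))
    ;; Index of the next #\; at or after I, or #f if there is none.
    (define (find-semicolon i)
      (cond ((>= i len) #f)
            ((char=? (string-ref str i) #\;) i)
            (else (find-semicolon (+ i 1)))))
    (let loop ((i 0) (acc '()))
      (cond
       ((>= i len)
        (apply string-append (reverse acc)))
       ((char=? (string-ref str i) #\&)
        (let* ((end (find-semicolon (+ i 1)))
               (hit (and end
                         (assq (string->symbol (substring str (+ i 1) end))
                               entities))))
          (if hit
              (loop (+ end 1) (cons (cdr hit) acc))   ; known entity: substitute
              (loop (+ i 1) (cons "&" acc)))))        ; otherwise keep the &
       (else
        (loop (+ i 1) (cons (string (string-ref str i)) acc)))))))

;; Applied to an attribute value like the one reported in this thread:
(decode-entities "(sxml (@ (attr &quot;12345&quot;)) body)"
                 '((quot . "\"") (amp . "&")))
```

Leaving unknown references untouched is a deliberate choice here, so that input the table does not cover is never silently dropped.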

--
Alex



[Chicken-users] html->sxml (html-parser egg) does not decode entities in html attributes, ideas why?

2013-09-03 Thread Philip Kent
Hi all,

I noticed an issue today with the html-parser egg, where it does not seem to 
decode entities within an attribute of an element, I have included an example 
below.

#;14> (html->sxml "<div data-foo=\"&quot;\">")
(*TOP* (div (@ (data-foo "&quot;"))))

Expected: (*TOP* (div (@ (data-foo "\""))))

I was wondering if anyone could provide some thoughts as to why this might be 
happening? I have taken a look at the html-parser egg but have not seen much 
(but then this goes far beyond my knowledge of scheme!)
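
One possible workaround on a parser version that leaves attribute values undecoded is to post-process the SXML tree. This is only a sketch under assumptions: map-attribute-values is a hypothetical helper, not part of html-parser, and it assumes attribute lists have the shape (@ (name value) ...) shown in the output above. The decoding procedure itself is passed in by the caller.

```scheme
;; Hypothetical workaround, not part of html-parser.
;; Returns a copy of NODE in which PROC has been applied to every
;; string attribute value. Element names and text nodes are unchanged.
(define (map-attribute-values proc node)
  (cond
   ((not (pair? node)) node)              ; text, symbol or other atom
   ((eq? (car node) '@)                   ; attribute list: (@ (name value) ...)
    (cons '@
          (map (lambda (attr)
                 (if (and (pair? attr)
                          (pair? (cdr attr))
                          (string? (cadr attr)))
                     (list (car attr) (proc (cadr attr)))
                     attr))               ; e.g. (bar), an attribute with no value
               (cdr node))))
   (else
    (map (lambda (child) (map-attribute-values proc child)) node))))

;; Usage with a stand-in procedure that just marks each value:
(map-attribute-values
 (lambda (s) (string-append "[" s "]"))
 '(*TOP* (div (@ (data-foo "&quot;") (bar)) "body")))
```

Passing an entity-decoding procedure as proc would then give the expected attribute values without touching the parser.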

DerGuteMoritz mentioned on IRC that htmlprag behaves the same way.

Any help you can give would be appreciated greatly!

Thanks,
Philip Kent


Re: [Chicken-users] html->sxml (html-parser egg) does not decode entities in html attributes, ideas why?

2013-09-03 Thread Matt Gushee
On Tue, Sep 3, 2013 at 8:51 PM, Alex Shinn alexsh...@gmail.com wrote:

 html-parser processes entities, but the default for html->sxml
 is just to leave them encoded as-is.  I'm not sure if that's the best
 default,

I'm not going to suggest that this is a major problem, especially
since you are not claiming html-parser conforms to any particular
standard, and the docs clearly indicate its pragmatic focus. But just
for the record, if you wanted to be an XML-1.1-conformant processor,
you would have to normalize attribute values, which includes
dereferencing character entities:

http://www.w3.org/TR/xml11/#AVNormalize
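
As an illustration of the dereferencing step only, here is a small sketch in plain Scheme for numeric character references such as &#34; or &#x22;. char-ref->string is a hypothetical name; the sketch is lenient about digit syntax and does not attempt the whitespace handling that the linked section also requires.

```scheme
;; Hypothetical helper, for illustration only.
;; REF is the text between & and ; of a numeric character reference,
;; e.g. "#34" or "#x22". Returns a one-character string, or #f if REF
;; is not a usable numeric reference.
(define (char-ref->string ref)
  (and (> (string-length ref) 1)
       (char=? (string-ref ref 0) #\#)
       (let* ((hex? (char=? (string-ref ref 1) #\x))
              (digits (substring ref (if hex? 2 1) (string-length ref)))
              (n (string->number digits (if hex? 16 10))))
         (and n
              (exact? n)
              (integer? n)
              (<= 0 n #x10FFFF)
              (not (<= #xD800 n #xDFFF))   ; skip surrogate code points
              (string (integer->char n))))))

(char-ref->string "#34")    ; decimal reference to the double quote
(char-ref->string "#x22")   ; hexadecimal reference to the same character
```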

As for the non-XML varieties of HTML, well ... life is too short to go
digging into all that hoary SGML stuff. Did that once upon a time ...
but I was younger then, and thought markup languages were the greatest
thing since sliced bread ;-)

--
Matt Gushee
