Probably the documentation should clarify that it only works for the
older specification, then.

Sam

On Thu, Feb 25, 2016 at 1:48 PM, Jay McCarthy <jay.mccar...@gmail.com> wrote:
> The `html` library, however, is specifically for parsing HTML4. HTML5
> is a totally new beast, basically unrelated to old HTML. We could
> conceivably have a new html library.
>
> Jay
>
> On Thu, Feb 25, 2016 at 1:45 PM, Sam Tobin-Hochstadt <sa...@cs.indiana.edu> wrote:
>> Note that HTML4 is quite out of date (from 1999). The most recent HTML
>> standard from the W3C, from 2014, is here: https://www.w3.org/TR/html/
>> However, if you plan to reference the standard to build software, the
>> most useful spec is https://html.spec.whatwg.org/ which is what
>> browsers and other applications follow.
>>
>> Sam
>>
>> On Thu, Feb 25, 2016 at 1:21 PM, Jay McCarthy <jay.mccar...@gmail.com> wrote:
>>> You should double-check against the HTML 4.01 spec:
>>>
>>> https://www.w3.org/TR/html4/
>>>
>>> Since you mention "in the wild", I think you probably don't want to
>>> use the html library but instead want to use
>>>
>>> http://docs.racket-lang.org/html-parsing/index.html
>>>
>>> Jay
>>>
>>> On Thu, Feb 25, 2016 at 1:13 PM, jon stenerson <jonstener...@comcast.net> wrote:
>>>> I find that when I use the html library I have to make a few simple changes
>>>> to html-spec.rkt. It seems that <ins> and <del> are not treated like <b> and
>>>> <i>. You can see in this example that while <b> remains in the enclosing
>>>> <p>, <ins> does not. I also find that I have to allow pcdata as a child of
>>>> <ol> and <ul>. I don't know whether pcdata is "supposed to" appear there, but
>>>> it often does in the wild.
>>>>
>>>> Jon
>>>>
>>>>
>>>>
>>>> #lang racket
>>>>
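>>>> (require (prefix-in h: html) (prefix-in x: xml))
>>>>
>>>> ;; Convert parsed content into nested lists for easy printing.
>>>> (define (xml->list x)
>>>>   (cond
>>>>     [(x:pcdata? x) (x:pcdata-string x)]   ; text node -> its string
>>>>     [(x:entity? x) (list)]                ; entity -> empty list
>>>>     [(x:element? x)                       ; element -> (name (children ...))
>>>>      (list (x:element-name x)
>>>>            (map xml->list (x:element-content x)))]
>>>>     [(list? x) (map xml->list x)]))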
>>>>
>>>> (printf "~s\n" (xml->list (h:read-html-as-xml
>>>>   (open-input-string "<p>Hello world <b>Testing</b>!</p>"))))
>>>> (printf "~s\n" (xml->list (h:read-html-as-xml
>>>>   (open-input-string "<p>Hello world <ins>Testing</ins>!</p>"))))
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google Groups
>>>> "Racket Users" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an
>>>> email to racket-users+unsubscr...@googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>>
>>>
>>> --
>>> Jay McCarthy
>>> Associate Professor
>>> PLT @ CS @ UMass Lowell
>>> http://jeapostrophe.github.io
>>>
>>>            "Wherefore, be not weary in well-doing,
>>>       for ye are laying the foundation of a great work.
>>> And out of small things proceedeth that which is great."
>>>                           - D&C 64:33
