Re: The HTML module design

John-Mark Bell Wed, 12 Aug 2009 07:12:09 -0700

On Mon, 2009-08-10 at 11:33 +0800, Bo Yang wrote:
> On Mon, Aug 10, 2009 at 4:06 AM, John-Mark Bell<[email protected]> 
> wrote:
> > On Sun, 2009-08-09 at 22:02 +0800, Bo Yang wrote:
> >> 1. Change the parser wrapper structure now. In HTMLDocument, there are
> >> methods like HTMLDocument.open, HTMLDocument.write, which are used for
> >> writing some string to DOM. This requires that the HTMLDocument know
> >> its parser. But in the current structure of libDOM, the parser is
> >> created before the Document and it is the parser that creates the
> >> document. I think this should be changed.
> >>
> >>     The HTMLDocument will be created first and has a parser
> >> within it. It is the client of HTMLDocument that is responsible for
> >> passing the corresponding parser to HTMLDocument. This means we will
> >> have a function like:
> >>
> >> dom_html_document_create(dom_alloc, void *, lwc_context *, parser *, ....);
> >>
> >> In the future, NetSurf will create a parser according to the HTTP
> >> response header's content type (text/html to create a hubbub parser,
> >> and text/xml to create a libxml parser), and pass it to the
> >> HTMLDocument to create an instance of it. Then the loading and
> >> parsing start.
> >
> > Please flesh this proposal out more, with specific APIs etc. Then we'll
> > have more idea if it's sane.
> 
> The main API:
> 
> 1. Change the parser wrappers' API. Currently, the API is:
> parser_create
> parser_destroy
> parser_parse_chunk
> parser_complete
> parser_get_document
> 
> It should be changed to:
> parser_create(const char *aliases,  const char *enc, bool fix_enc,
> dom_alloc alloc, void *pw, dom_msg msg, void *mctx, struct
> lwc_context_s *ctx, struct dom_document *doc);
> 
> parser_destroy
> parser_parse_chunk
> parser_complete
> 
> 
> That is, remove parser_get_document and add a new parameter
> to parser_create to pass the document in.


That sounds OK, _providing_ that the correct kind of Document is
created. Now, we can either have it so that the Document object supports
all type-specific methods or require that the correct kind of Document
is created.

Consider the case of mixed DOMs (e.g. XHTML+SVG), where the Document
object does not just have methods for one particular vocabulary.

> 2. The dom_html_document is like:
> 
> struct dom_html_document {
>     struct dom_document base;
>     dom_hubbub_parser *hp;
>     dom_xml_parser  *xp;
>     ....
> };

Why does the document need a handle for the parser? You can stick the
parser pointers into a union as there'll only ever be one of them.
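
To illustrate the union idea, here is a rough sketch only. The type tag
field, the stub base struct and the two helper functions are made up for
illustration and are not existing libdom API; the parser types are left
opaque.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque stand-ins for the two parser wrapper types. */
typedef struct dom_hubbub_parser dom_hubbub_parser;
typedef struct dom_xml_parser dom_xml_parser;

typedef enum { DOM_HTML, DOM_XML } parser_type;

/* Hypothetical placeholder for the real struct dom_document. */
struct dom_document_stub { int unused; };

struct dom_html_document {
    struct dom_document_stub base;
    parser_type type;           /* says which union member is valid */
    union {
        dom_hubbub_parser *hp;  /* valid when type == DOM_HTML */
        dom_xml_parser *xp;     /* valid when type == DOM_XML */
    } parser;
};

/* Hypothetical helper: attach a hubbub parser to the document. */
void html_document_set_hubbub(struct dom_html_document *doc,
        dom_hubbub_parser *hp)
{
    doc->type = DOM_HTML;
    doc->parser.hp = hp;
}

/* Hypothetical helper: return the hubbub parser, or NULL if the
 * document is currently backed by the XML parser. */
dom_hubbub_parser *html_document_get_hubbub(
        const struct dom_html_document *doc)
{
    return doc->type == DOM_HTML ? doc->parser.hp : NULL;
}

/* Hypothetical helper: return the XML parser, or NULL otherwise. */
dom_xml_parser *html_document_get_xml(
        const struct dom_html_document *doc)
{
    return doc->type == DOM_XML ? doc->parser.xp : NULL;
}
```

Only one pointer is ever stored, and the tag stops a caller from reading
the wrong member.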

> typedef enum {
>     DOM_HTML,
>     DOM_XML
> } parser_type;
> 
> And we provide corresponding API to dom_html_document.
> 
> /* Create the HTMLDocument */
> dom_exception dom_html_document_create(dom_alloc, void *, dom_msg,
> void *, lwc_context *, parser_type, ui_handler);
> 
> /* Parse data chunk */
> dom_exception dom_html_document_write_data(uint8_t *, size_t);
> 
> /* Tell the document that the data is complete */
> dom_exception dom_html_document_data_complete(void);

This should be ok.
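
For what it's worth, here is a minimal sketch of how a client might drive
such an API. Everything prefixed sketch_ is hypothetical. Note that the
prototypes quoted above have no document argument; I am assuming one
would be added as the first parameter. The "parser" here just counts
bytes, so this shows the call sequence and nothing more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { SKETCH_OK = 0, SKETCH_INVALID_STATE = 1 } sketch_exception;
typedef enum { DOM_HTML, DOM_XML } parser_type;

/* Hypothetical document: owns its parser state internally. */
struct sketch_html_document {
    parser_type type;
    size_t bytes_seen;   /* stand-in for real parser state */
    bool complete;       /* set once data_complete has been called */
};

/* Create the document; the parser kind is chosen by the client. */
sketch_exception sketch_html_document_create(
        struct sketch_html_document *doc, parser_type type)
{
    doc->type = type;
    doc->bytes_seen = 0;
    doc->complete = false;
    return SKETCH_OK;
}

/* Feed a chunk of data; rejected once the document is complete. */
sketch_exception sketch_html_document_write_data(
        struct sketch_html_document *doc,
        const uint8_t *data, size_t len)
{
    (void) data;  /* a real implementation would pass this to the parser */
    if (doc->complete)
        return SKETCH_INVALID_STATE;
    doc->bytes_seen += len;
    return SKETCH_OK;
}

/* Signal end of input; calling it twice is an error. */
sketch_exception sketch_html_document_data_complete(
        struct sketch_html_document *doc)
{
    if (doc->complete)
        return SKETCH_INVALID_STATE;
    doc->complete = true;
    return SKETCH_OK;
}
```

The client never touches the parser directly: it creates the document,
writes chunks as they arrive, then signals completion.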

> 3. Bootstrap consideration.
> The Core spec 
> http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#Level-2-Core-DOM-createDocument
> says that a specific Document such as HTMLDocument can be created using
> an API such as createHTMLDocument, but I found no such API definition in
> HTML Level 2 at all. I think we can just ignore it.

The reason for this is so that the correct type of Document object is
created. What the specification calls createHTMLDocument, you've called
dom_html_document_create.
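
As a rough sketch of the client side of that choice, using the content
type mapping you described earlier (text/html to hubbub, text/xml to
libxml): the helper name is hypothetical, and it only does an exact
match, so a real client would first have to strip parameters such as a
charset.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef enum { DOM_HTML, DOM_XML } parser_type;

/* Hypothetical helper: map a bare Content-Type value to the kind of
 * parser (and hence Document) to create. Returns false for types this
 * sketch does not know about, leaving *out untouched. */
bool sketch_parser_type_from_mime(const char *mime, parser_type *out)
{
    assert(mime != NULL && out != NULL);

    if (strcmp(mime, "text/html") == 0) {
        *out = DOM_HTML;
        return true;
    }
    if (strcmp(mime, "text/xml") == 0) {
        *out = DOM_XML;
        return true;
    }
    return false;
}
```

The result would then be passed as the parser_type argument of
dom_html_document_create.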


J.
