On Fri, Sep 15, 2023, at 6:17 PM, Niels Dossche wrote:
> On 9/2/23 21:41, Niels Dossche wrote:
>> Hello internals
>> 
>> I'm opening the discussion for my RFC "DOM HTML5 parsing and serialization 
>> support".
>> https://wiki.php.net/rfc/domdocument_html5_parser
>> 
>> Kind regards
>> Niels
>
>
> Hi internals
>
> I'd like to announce a change to the RFC. The new RFC version is 0.5.1, 
> the old one was 0.4.0.
> The diff can be viewed via the revision history button on the right.
>
> I had a productive discussion with Tim and Arne about the class hierarchy.
> Here's a summary of the changes and the rationale.
>
> Until now, the RFC specified that DOM\HTML5Document extends DOMDocument.
> However, as we're introducing a new class anyway, we believe we should 
> take the opportunity to improve the API.
> We have the following concerns:
> a) It's a bit of an awkward class hierarchy. *If* we hypothetically 
> would want to get rid of DOMDocument in the far far future, we can't 
> easily do that.
> b) API is messy. Some methods are useless for HTML5Document. E.g.: 
> validate(), loadXML(), loadXMLFile(). They can be a source of confusion.
> c) The fact that you can pass HTML5Document to methods accepting 
> DOMDocument may result in unexpected behaviour when the method expects 
> a particular behaviour. It would be better if developers could "opt-in" 
> to accepting both DOMDocument and HTML5Document in a method using a 
> common base class.
> d) The properties set by DOMDocument's constructor are overridden by 
> load methods, which is surprising. That's even mentioned as the second 
> top comment on https://www.php.net/manual/en/domdocument.loadxml.php. 
> Furthermore, the XML version argument of the constructor is even 
> useless for HTML5 documents.
>
> So we propose the following changes to the RFC.
>
> We'll add a common abstract base class DOM\Document (name taken from 
> the DOM spec & Javascript world).
> DOM\Document contains the properties and abstract methods common to 
> both HTML and XML documents.
> Examples of what it includes/excludes:
> * includes: firstElementChild, lastElementChild, ...
> * excludes: xmlStandalone, xmlVersion, validate(), ...
> Then we'll have two subclasses: DOM\HTMLDocument (previously we called 
> this DOM\HTML5Document) and DOM\XMLDocument. We dropped the 5 from the 
> name to be more resilient to version changes and match the DOM spec 
> name.
> DOMDocument will also use DOM\Document as a base class to make it 
> interchangeable with the new classes.
>
> The above would solve points a, b, and c.
> To solve point d, we can use "factory methods":
> This means HTMLDocument's constructor will be made private, and instead 
> we'll have three static methods that create a new instance:
> - HTMLDocument::fromHTMLString(string $xml): HTMLDocument;

That should be string $html, yes?

> - HTMLDocument::fromHTMLFile(string $filename): HTMLDocument;
> - HTMLDocument::fromEmptyDocument(string $encoding="UTF-8"): 
> HTMLDocument;
>
>
> Or to put it in PHP code:
>
> ```
> namespace DOM {
>       // The base abstract document class
>       abstract class Document extends DOM\Node implements DOM\ParentNode {
>               /* all properties and methods that are common and sensible for 
> both 
> XML & HTML documents */
>       }
>       
>       class XMLDocument extends Document {
>               /* insert specific XML methods and properties (e.g. xmlVersion, 
> validate(), ...) here */
>
>               private function __construct() {}
>               
>               public static function fromEmptyDocument(string $version = 
> "1.0", 
> string $encoding = "UTF-8");
>               public static function fromFile(string $path);
>               public static function fromString(string $source);
>       }
>       
>       class HTMLDocument extends Document {
>               /* insert specific Html methods and properties here */
>
>               private function __construct() {}
>               
>               public static function fromEmptyDocument(string $encoding = 
> "UTF-8");
>               public static function fromFile(string $path);
>               public static function fromString(string $source);
>       }
> }
>
> class DOMDocument extends DOM\Document {
>       /* Keep methods, properties, and constructor the same as they are now */
> }
> ```
>
> We're only adding XMLDocument for completeness and API parity. It's a 
> drop-in replacement for DOMDocument, and behaves the exact same.
> The difference is that the API is on par with HTMLDocument, and the 
> construction is designed to be more misuse-resistant.
> DOMDocument will NOT change, and remains for the foreseeable future.
>
> We also have to change the $ownerDocument field in DOMNode to have type 
> ?DOM\Document instead of ?DOMDocument.
> Problem is that this breaks BC (but only a minor break): 
> https://3v4l.org/El7Ve.
> Overriding properties is kind of useless, but if someone does it, then 
> the compiler will complain loudly during compilation and it should be 
> easy to fix.
>
>
> Of course, these changes means that the discussion period will run a 
> bit longer than originally foreseen.
>
> Kind regards
> Niels


This all makes sense to me, and I like this as a way forward.  Nice work!

--Larry Garfield

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php

Reply via email to