On Fri, Sep 15, 2023, at 6:17 PM, Niels Dossche wrote: > On 9/2/23 21:41, Niels Dossche wrote: >> Hello internals >> >> I'm opening the discussion for my RFC "DOM HTML5 parsing and serialization >> support". >> https://wiki.php.net/rfc/domdocument_html5_parser >> >> Kind regards >> Niels > > > Hi internals > > I'd like to announce a change to the RFC. The new RFC version is 0.5.1, > the old one was 0.4.0. > The diff can be viewed via the revision history button on the right. > > I had a productive discussion with Tim and Arne about the class hierarchy. > Here's a summary of the changes and the rationale. > > Until now, the RFC specified that DOM\HTML5Document extends DOMDocument. > However, as we're introducing a new class anyway, we believe we should > take the opportunity to improve the API. > We have the following concerns: > a) It's a bit of an awkward class hierarchy. *If* we hypothetically > would want to get rid of DOMDocument in the far far future, we can't > easily do that. > b) API is messy. Some methods are useless for HTML5Document. E.g.: > validate(), loadXML(), loadXMLFile(). They can be a source of confusion. > c) The fact that you can pass HTML5Document to methods accepting > DOMDocument may result in unexpected behaviour when the method expects > a particular behaviour. It would be better if developers could "opt-in" > to accepting both DOMDocument and HTML5Document in a method using a > common base class. > d) The properties set by DOMDocument's constructor are overridden by > load methods, which is surprising. That's even mentioned as the second > top comment on https://www.php.net/manual/en/domdocument.loadxml.php. > Furthermore, the XML version argument of the constructor is even > useless for HTML5 documents. > > So we propose the following changes to the RFC. > > We'll add a common abstract base class DOM\Document (name taken from > the DOM spec & Javascript world). > DOM\Document contains the properties and abstract methods common to > both HTML and XML documents. > Examples of what it includes/excludes: > * includes: firstElementChild, lastElementChild, ... > * excludes: xmlStandalone, xmlVersion, validate(), ... > Then we'll have two subclasses: DOM\HTMLDocument (previously we called > this DOM\HTML5Document) and DOM\XMLDocument. We dropped the 5 from the > name to be more resilient to version changes and match the DOM spec > name. > DOMDocument will also use DOM\Document as a base class to make it > interchangeable with the new classes. > > The above would solve points a, b, and c. > To solve point d, we can use "factory methods": > This means HTMLDocument's constructor will be made private, and instead > we'll have three static methods that create a new instance: > - HTMLDocument::fromHTMLString(string $xml): HTMLDocument;
That should be string $html, yes? > - HTMLDocument::fromHTMLFile(string $filename): HTMLDocument; > - HTMLDocument::fromEmptyDocument(string $encoding="UTF-8"): > HTMLDocument; > > > Or to put it in PHP code: > > ``` > namespace DOM { > // The base abstract document class > abstract class Document extends DOM\Node implements DOM\ParentNode { > /* all properties and methods that are common and sensible for > both > XML & HTML documents */ > } > > class XMLDocument extends Document { > /* insert specific XML methods and properties (e.g. xmlVersion, > validate(), ...) here */ > > private function __construct() {} > > public static function fromEmptyDocument(string $version = > "1.0", > string $encoding = "UTF-8"); > public static function fromFile(string $path); > public static function fromString(string $source); > } > > class HTMLDocument extends Document { > /* insert specific Html methods and properties here */ > > private function __construct() {} > > public static function fromEmptyDocument(string $encoding = > "UTF-8"); > public static function fromFile(string $path); > public static function fromString(string $source); > } > } > > class DOMDocument extends DOM\Document { > /* Keep methods, properties, and constructor the same as they are now */ > } > ``` > > We're only adding XMLDocument for completeness and API parity. It's a > drop-in replacement for DOMDocument, and behaves the exact same. > The difference is that the API is on par with HTMLDocument, and the > construction is designed to be more misuse-resistant. > DOMDocument will NOT change, and remains for the foreseeable future. > > We also have to change the $ownerDocument field in DOMNode to have type > ?DOM\Document instead of ?DOMDocument. > Problem is that this breaks BC (but only a minor break): > https://3v4l.org/El7Ve. > Overriding properties is kind of useless, but if someone does it, then > the compiler will complain loudly during compilation and it should be > easy to fix. > > > Of course, these changes means that the discussion period will run a > bit longer than originally foreseen. > > Kind regards > Niels This all makes sense to me, and I like this as a way forward. Nice work! --Larry Garfield -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: https://www.php.net/unsub.php