Hi Larry

On 29/09/2023 18:58, Larry Garfield wrote:
> On Fri, Sep 29, 2023, at 7:07 AM, Niels Dossche wrote:
>> On 02/09/2023 21:41, Niels Dossche wrote:
>>> Hello internals
>>>
>>> I'm opening the discussion for my RFC "DOM HTML5 parsing and serialization 
>>> support".
>>> https://wiki.php.net/rfc/domdocument_html5_parser
>>>
>>> Kind regards
>>> Niels
>>
>> Hi internals
>>
>> Discussion seems to have died down.
>> Today, it's been 14 days since the last major change was made to the 
>> RFC (i.e. the class hierarchy update).
>> And it's also been close to 4 weeks since I first announced the RFC 
>> on the mailing list.
>> I'd like to start the vote on Monday (20:00 GMT+2) and I intend to 
>> let it run for 2 weeks.
>> Any final complaints should be raised now.
>>
>> Kind regards
>> Niels
> 
> From the RFC:
> 
>> \DOMDocument will also use DOM\Document as a base class to make it 
>> interchangeable with the new classes. We're only adding XMLDocument for 
>> completeness and API parity. It's a drop-in replacement for \DOMDocument, 
>> and behaves the exact same. The difference is that the API is on par with 
>> HTMLDocument, and the construction is designed to be more misuse-resistant. 
>> \DOMDocument will NOT change, and remains for the foreseeable future. 
> 
> Would it make sense then for one of \DOMDocument and DOM\XMLDocument to 
> extend the other?  So that, e.g., we can type against DOM\XMLDocument and 
> then support both old and new classes?  Or are the construction et al 
> differences enough that this is not viable?
> 

I agree with Tim's answer here :) (Thanks Tim!)

>> Similarly, the constants would lose their DOM_ prefix in the namespace 
>> version, e.g. DOM\INDEX_SIZE_ERR will be an alias for DOM_INDEX_SIZE_ERR. 
>> For constants that begin with XML_ I propose to keep the prefix. 
> 
> Unclear to me: Would the XML constants also be aliased into the namespace 
> verbatim, or left globally?  
> 

I'll clarify this.
The intention is to alias them verbatim.
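
To make the naming rule concrete, here is a rough userland sketch of it.
Example\DOM is a hypothetical namespace used only for illustration, and
define() is just a stand-in; it is not how the extension would register the
real constants.

```php
<?php
// Hypothetical namespace "Example\DOM", standing in for the proposed one.

// DOM_-prefixed constants drop the prefix inside the namespace.
define('Example\DOM\INDEX_SIZE_ERR', DOM_INDEX_SIZE_ERR);

// XML_-prefixed constants keep their name verbatim inside the namespace.
define('Example\DOM\XML_ELEMENT_NODE', XML_ELEMENT_NODE);

// Both names refer to the same value as the existing global constants.
var_dump(\Example\DOM\INDEX_SIZE_ERR === DOM_INDEX_SIZE_ERR);
var_dump(\Example\DOM\XML_ELEMENT_NODE === XML_ELEMENT_NODE);
```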

> Did you consider making the new classes throw exceptions rather than forcing 
> people to remember to call another "was there an error" global function like 
> it's still 1996? :-)

I did think about it.
Using exceptions for the parser is not viable, because parse errors in 
HTML aren't actually hard errors.
The errors are recoverable, i.e. the parser spec tells us how to proceed when 
an error occurs. So in a sense, they're closer to warnings. Throwing an 
exception would abort parsing.
As a side note, a good number of the web pages out there violate at least one 
parsing rule, but browsers know by-spec how to proceed in that case (which is 
probably also why the violations are often not fixed).
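
For illustration, this is roughly what the "collect errors, keep parsing"
pattern looks like today with the existing \DOMDocument and the libxml error
functions. It is only a sketch: it uses the current libxml2-based HTML parser,
not the HTML5 parser proposed in the RFC, and the exact errors reported depend
on the libxml2 version.

```php
<?php
// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Deliberately malformed input: stray </span>, unclosed <p> and <b>.
$ok = $doc->loadHTML(
    '<!DOCTYPE html><html><body><p>Hello</span><b>world</body></html>'
);

// Fetch whatever was recorded, then restore the previous error mode.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The errors behave like warnings: they are reported after the fact...
foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}

// ...and a usable tree was still built despite the malformed markup.
echo $doc->getElementsByTagName('b')->item(0)->textContent, "\n";
```

If the parser threw on the first error instead, the caller would get no
document at all for input like this.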

I thought about other options as well, for example providing a 
getParseErrors() method, or letting the factory methods optionally return the 
parse errors as well.
However, I think they're not significantly better than what we have now. 
Furthermore, I think overhauling how the parse errors are handled in ext/dom 
(and maybe by extension ext/simplexml to keep consistency) is a bit of 
feature creep. See also the motivation in the RFC text.
Therefore, I would keep the error handling as it is described now in the RFC.

If accepted, this RFC would land early in the 8.4 development cycle.
Therefore, we can gather feedback very early on. If we do notice a major 
problem in how these things are handled, it can be changed by a hypothetical 
future RFC in the same development cycle. That would also require thinking 
about the other XML extensions though.

> 
> Otherwise looks good to me.  Thanks!
> 
> --Larry Garfield
> 

Cheers
Niels
