Re: [PHP-DEV] Re: [RFC] [Discussion] DOM HTML5 parsing and serialization support

2023-09-16 Thread Niels Dossche
Hi Larry

On 9/17/23 01:04, Larry Garfield wrote:
> On Fri, Sep 15, 2023, at 6:17 PM, Niels Dossche wrote:
>> On 9/2/23 21:41, Niels Dossche wrote:
>>> Hello internals
>>>
>>> I'm opening the discussion for my RFC "DOM HTML5 parsing and serialization 
>>> support".
>>> https://wiki.php.net/rfc/domdocument_html5_parser
>>>
>>> Kind regards
>>> Niels
>>
>>
>> Hi internals
>>
>> I'd like to announce a change to the RFC. The new RFC version is 0.5.1, 
>> the old one was 0.4.0.
>> The diff can be viewed via the revision history button on the right.
>>
>> I had a productive discussion with Tim and Arne about the class hierarchy.
>> Here's a summary of the changes and the rationale.
>>
>> Until now, the RFC specified that DOM\HTML5Document extends DOMDocument.
>> However, as we're introducing a new class anyway, we believe we should 
>> take the opportunity to improve the API.
>> We have the following concerns:
>> a) It's a bit of an awkward class hierarchy. *If* we hypothetically 
>> would want to get rid of DOMDocument in the far far future, we can't 
>> easily do that.
>> b) API is messy. Some methods are useless for HTML5Document. E.g.: 
>> validate(), loadXML(), loadXMLFile(). They can be a source of confusion.
>> c) The fact that you can pass HTML5Document to methods accepting 
>> DOMDocument may result in unexpected behaviour when the method expects 
>> a particular behaviour. It would be better if developers could "opt-in" 
>> to accepting both DOMDocument and HTML5Document in a method using a 
>> common base class.
>> d) The properties set by DOMDocument's constructor are overridden by 
>> load methods, which is surprising. That's even mentioned as the second 
>> top comment on https://www.php.net/manual/en/domdocument.loadxml.php. 
>> Furthermore, the XML version argument of the constructor is even 
>> useless for HTML5 documents.
>>
>> So we propose the following changes to the RFC.
>>
>> We'll add a common abstract base class DOM\Document (name taken from 
>> the DOM spec & Javascript world).
>> DOM\Document contains the properties and abstract methods common to 
>> both HTML and XML documents.
>> Examples of what it includes/excludes:
>> * includes: firstElementChild, lastElementChild, ...
>> * excludes: xmlStandalone, xmlVersion, validate(), ...
>> Then we'll have two subclasses: DOM\HTMLDocument (previously we called 
>> this DOM\HTML5Document) and DOM\XMLDocument. We dropped the 5 from the 
>> name to be more resilient to version changes and match the DOM spec 
>> name.
>> DOMDocument will also use DOM\Document as a base class to make it 
>> interchangeable with the new classes.
>>
>> The above would solve points a, b, and c.
>> To solve point d, we can use "factory methods":
>> This means HTMLDocument's constructor will be made private, and instead 
>> we'll have three static methods that create a new instance:
>> - HTMLDocument::fromHTMLString(string $xml): HTMLDocument;
> 
> That should be string $html, yes?

Indeed, silly copy paste mistake on my part.
In fact, the bullet point list is a bit outdated: it uses an older proposed 
method name (i.e. it says fromHTMLString instead of fromString; the HTML part 
is unnecessary duplication as the method is in the HTMLDocument class). I 
forgot to update that in my email draft when the final name was decided.
Anyway, the code snippet from my email and the RFC text both have the right 
method name and parameter name.

> 
>> - HTMLDocument::fromHTMLFile(string $filename): HTMLDocument;
>> - HTMLDocument::fromEmptyDocument(string $encoding="UTF-8"): 
>> HTMLDocument;
>>
>>
>> Or to put it in PHP code:
>>
>> ```
>> namespace DOM {
>>  // The base abstract document class
>>  abstract class Document extends DOM\Node implements DOM\ParentNode {
>>  /* all properties and methods that are common and sensible for 
>> both 
>> XML & HTML documents */
>>  }
>>  
>>  class XMLDocument extends Document {
>>  /* insert specific XML methods and properties (e.g. xmlVersion, 
>> validate(), ...) here */
>>
>>  private function __construct() {}
>>  
>>  public static function fromEmptyDocument(string $version = 
>> "1.0", 
>> string $encoding = "UTF-8");
>>  public static function fromFile(string $path);
>>  public static function fromString(string $source);
>>  }
>>  
>>  class HTMLDocument extends Document {
>>  /* insert specific Html methods and properties here */
>>
>>  private function __construct() {}
>>  
>>  public static function fromEmptyDocument(string $encoding = 
>> "UTF-8");
>>  public static function fromFile(string $path);
>>  public static function fromString(string $source);
>>  }
>> }
>>
>> class DOMDocument extends DOM\Document {
>>  /* Keep methods, properties, and constructor the same as they are now */
>> }
>> ```
>>
>> We're only adding XMLDocument for 

Re: [PHP-DEV] Re: [RFC] [Discussion] DOM HTML5 parsing and serialization support

2023-09-16 Thread Larry Garfield
On Fri, Sep 15, 2023, at 6:17 PM, Niels Dossche wrote:
> On 9/2/23 21:41, Niels Dossche wrote:
>> Hello internals
>> 
>> I'm opening the discussion for my RFC "DOM HTML5 parsing and serialization 
>> support".
>> https://wiki.php.net/rfc/domdocument_html5_parser
>> 
>> Kind regards
>> Niels
>
>
> Hi internals
>
> I'd like to announce a change to the RFC. The new RFC version is 0.5.1, 
> the old one was 0.4.0.
> The diff can be viewed via the revision history button on the right.
>
> I had a productive discussion with Tim and Arne about the class hierarchy.
> Here's a summary of the changes and the rationale.
>
> Until now, the RFC specified that DOM\HTML5Document extends DOMDocument.
> However, as we're introducing a new class anyway, we believe we should 
> take the opportunity to improve the API.
> We have the following concerns:
> a) It's a bit of an awkward class hierarchy. *If* we hypothetically 
> would want to get rid of DOMDocument in the far far future, we can't 
> easily do that.
> b) API is messy. Some methods are useless for HTML5Document. E.g.: 
> validate(), loadXML(), loadXMLFile(). They can be a source of confusion.
> c) The fact that you can pass HTML5Document to methods accepting 
> DOMDocument may result in unexpected behaviour when the method expects 
> a particular behaviour. It would be better if developers could "opt-in" 
> to accepting both DOMDocument and HTML5Document in a method using a 
> common base class.
> d) The properties set by DOMDocument's constructor are overridden by 
> load methods, which is surprising. That's even mentioned as the second 
> top comment on https://www.php.net/manual/en/domdocument.loadxml.php. 
> Furthermore, the XML version argument of the constructor is even 
> useless for HTML5 documents.
>
> So we propose the following changes to the RFC.
>
> We'll add a common abstract base class DOM\Document (name taken from 
> the DOM spec & Javascript world).
> DOM\Document contains the properties and abstract methods common to 
> both HTML and XML documents.
> Examples of what it includes/excludes:
> * includes: firstElementChild, lastElementChild, ...
> * excludes: xmlStandalone, xmlVersion, validate(), ...
> Then we'll have two subclasses: DOM\HTMLDocument (previously we called 
> this DOM\HTML5Document) and DOM\XMLDocument. We dropped the 5 from the 
> name to be more resilient to version changes and match the DOM spec 
> name.
> DOMDocument will also use DOM\Document as a base class to make it 
> interchangeable with the new classes.
>
> The above would solve points a, b, and c.
> To solve point d, we can use "factory methods":
> This means HTMLDocument's constructor will be made private, and instead 
> we'll have three static methods that create a new instance:
> - HTMLDocument::fromHTMLString(string $xml): HTMLDocument;

That should be string $html, yes?

> - HTMLDocument::fromHTMLFile(string $filename): HTMLDocument;
> - HTMLDocument::fromEmptyDocument(string $encoding="UTF-8"): 
> HTMLDocument;
>
>
> Or to put it in PHP code:
>
> ```
> namespace DOM {
>   // The base abstract document class
>   abstract class Document extends DOM\Node implements DOM\ParentNode {
>   /* all properties and methods that are common and sensible for 
> both 
> XML & HTML documents */
>   }
>   
>   class XMLDocument extends Document {
>   /* insert specific XML methods and properties (e.g. xmlVersion, 
> validate(), ...) here */
>
>   private function __construct() {}
>   
>   public static function fromEmptyDocument(string $version = 
> "1.0", 
> string $encoding = "UTF-8");
>   public static function fromFile(string $path);
>   public static function fromString(string $source);
>   }
>   
>   class HTMLDocument extends Document {
>   /* insert specific Html methods and properties here */
>
>   private function __construct() {}
>   
>   public static function fromEmptyDocument(string $encoding = 
> "UTF-8");
>   public static function fromFile(string $path);
>   public static function fromString(string $source);
>   }
> }
>
> class DOMDocument extends DOM\Document {
>   /* Keep methods, properties, and constructor the same as they are now */
> }
> ```
>
> We're only adding XMLDocument for completeness and API parity. It's a 
> drop-in replacement for DOMDocument, and behaves the exact same.
> The difference is that the API is on par with HTMLDocument, and the 
> construction is designed to be more misuse-resistant.
> DOMDocument will NOT change, and remains for the foreseeable future.
>
> We also have to change the $ownerDocument field in DOMNode to have type 
> ?DOM\Document instead of ?DOMDocument.
> Problem is that this breaks BC (but only a minor break): 
> https://3v4l.org/El7Ve.
> Overriding properties is kind of useless, but if someone does it, then 
> the compiler will 

Re: [PHP-DEV] A new JIT engine for PHP-8.4/9

2023-09-16 Thread Hans Henrik Bergan
I think the submodule approach is fine, but maybe it should be moved from
Dmitry's gh to php-src gh, or maybe it's own dedicated group, to reduce the
bus factor (how much work needs to be done if Dmitry is hit by a bus~)

On Sat, Sep 16, 2023, 00:22 Tim Düsterhus  wrote:

> Hi
>
> On 9/15/23 17:50, Ben Ramsey wrote:
> > Additionally, despite the use of a Git submodule complicating things for
> > "everyone else," it provides a clear dependency and development
> > boundary, avoiding situations where the php-src version of IR drifts
> > from the upstream version. I think we can adjust tooling and messaging
> > to help folks know how to use the submodule. :-)
>
> Do not want: If the submodule repository goes away for whatever reason,
> the dependency will no longer be available. IR is currently sitting in
> Dmitry's personal GitHub account and it would not be the first time that
> a GitHub account is suspended for good or less good reasons [1].
>
> Bundle IR in some dedicated directory with php-src. Then updating it is
> as easy as "throw away the directory and copy over the new files". We
> could even automate that using GitHub actions to sync in the changes
> every night or so.
>
> Dmitry also said that he gave 'git subtree' a try. I don't know that
> command myself, but it looks exactly like what is required here. I would
> be interested in hearing why it didn't work well.
>
> Best regards
> Tim Düsterhus
>
> [1] Such as sanctions, see:
> https://techcrunch.com/2019/07/29/github-ban-sanctioned-countries/
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: https://www.php.net/unsub.php
>
>