
Re: [PHP-DEV] ARRAY_UNIQUE_IDENTICAL option

2022-11-07 Thread tyson andre

Hi Levi Morrison,

> A bit off topic, but not entirely:
>
> In my opinion, adding another flag isn't the _real_ fix. Any function
> which does comparisons should take a callable for users to provide any
> comparison they wish. An iteratively better API would be:
>
>     function array_unique(list $array, callable(T $a, T $b): int $comparator);
>
> Of course, there are other things like instead of using int for `0`,
> `-1`, `1`, we could have used an enum but we don't have one today. I
> just mean the core idea of taking callable is better than mucking
> around with flags while also allowing for custom comparison. Note that
> it doesn't necessarily prevent optimizations either. For instance, if
> they had passed `php_compare` or some function which represents `$a
> <=> $b`, we could identify that just as we identify a specific flag
> and take an optimized pass.

1. Calling PHP functions from C is fairly slow. Sorting is also slow compared
   to using hash maps.

   If we did want a specialized implementation for user-provided
   equality criteria, `?callable(T $a): U $iteratee = null` would seem more
   practical to me.
   Calling a function `n` times rather than `n log n` times would be faster,
   and in most cases this would work without a comparator, even for keys such as
   `[$obj->nonObjectField1, $obj->field2->toNormalizedRepresentation()]`
   (by putting the arrays returned by the iteratee into the internal hash map).

   This is the same approach as https://lodash.com/docs/#uniqBy

2. `<=>` isn't a stable order for int/string/float (etc) in various ways,
   so some comparators would have issues and return duplicates.

   It's possible to implement stable comparisons, but I don't really expect 
enthusiasm for that
   https://github.com/TysonAndre/pecl-teds/#stable-comparison
3. I expect a majority of cases could use the `ARRAY_UNIQUE_IDENTICAL` directly.
   (or use that on array_map()ped values to find the keys of the original array 
to use)
   So requiring the use of $comparator would result in longer code and more 
possible sources of bugs
   (e.g. a comparator returning `$a - $b` might overflow/underflow int if users 
don't realize `<=>` should be used)
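
As a rough illustration of the iteratee idea in point 1, here is a minimal userland sketch. `unique_by()` is a hypothetical helper, not an existing or proposed PHP function, and it uses `serialize()` as a stand-in for hashing on identity, which only suits scalars and arrays of scalars. The last two lines show one way `<=>` and `===` disagree on numeric strings.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: keeps the first value seen for each distinct result
 * of $iteratee, calling $iteratee exactly once per element.
 */
function unique_by(array $values, callable $iteratee): array
{
    $seen = [];
    $result = [];
    foreach ($values as $key => $value) {
        // serialize() keeps type information (1, 1.0 and "1" all differ),
        // so it approximates a hash-map lookup on `===`.
        $hash = serialize($iteratee($value));
        if (!isset($seen[$hash])) {
            $seen[$hash] = true;
            $result[$key] = $value;
        }
    }
    return $result;
}

$rows = [
    ['id' => 1, 'name' => 'a'],
    ['id' => 1, 'name' => 'b'],
    ['id' => 2, 'name' => 'c'],
];
// Keeps the rows at keys 0 and 2.
$unique = unique_by($rows, fn(array $row): int => $row['id']);

// Identity-based uniqueness keeps 1, "1" and 1.0 apart.
$mixed = unique_by([1, '1', 1.0, 1], fn(mixed $v): mixed => $v);

$loose = '10' <=> '1e1';     // 0: both are numeric strings, compared as numbers
$identical = '10' === '1e1'; // false: the strings differ
```

The iteratee runs `n` times in total, whereas a sort-based approach would call a comparator on the order of `n log n` times.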

Thanks,
Tyson

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php

[PHP-DEV] Proposal: Expanded iterable helper functions and aliasing iterator_to_array in `iterable\` namespace

2022-10-28 Thread tyson andre

Hi internals,

https://wiki.php.net/rfc/iterator_xyz_accept_array recently passed in php 8.2,
fixing a common inconvenience of those functions throwing a TypeError for 
arrays.

However, from the `iterator_` name 
(https://www.php.net/manual/en/class.iterator.php),
it's likely to become a source of confusion when writing or reviewing code 
decades from now,
when the name suggests it only accepts objects (Traversable 
Iterator/IteratorAggregate).

I'm planning on creating an RFC adding the following functions to the 
`iterable\` namespace as aliases of iterator_count/iterator_to_array.
Those accept iterables 
(https://www.php.net/manual/en/language.types.iterable.php), i.e. both 
Traversable objects and arrays.

Namespaces were chosen after feedback on my previous RFC,
and I believe `iterable\` follows the guidance from 
https://wiki.php.net/rfc/namespaces_in_bundled_extensions and
https://wiki.php.net/rfc/namespaces_in_bundled_extensions#core_standard_spl

I plan to create an RFC with the following functionality in the iterable\ 
namespace, and wanted to see what the preference on naming was, or if there was 
other feedback.
(Not having enough functionality and wanting a better idea of the overall 

- `iterable\count(...)` (alias of iterator_count)
- `iterable\to_array(iterable $iterator, bool $preserve_keys = true): array`
  (alias of iterator_to_array, so that users can stop using a misleading name)

- `iterable\any(iterable $input, ?callable $callback = null): bool`

   Determines whether any value of the iterable satisfies the predicate.
- `iterable\all(iterable $input, ?callable $callback = null): bool`

   Determines whether all values of the iterable satisfy the predicate.

  This is a different namespace from 
https://wiki.php.net/rfc/any_all_on_iterable
- `iterable\none(iterable $input, ?callable $callback = null): bool`

   returns the opposite of any()
- `iterable\find(iterable $iterable, callable $callback, mixed $default = 
null): mixed`

   Returns the first value for which $callback($value) is truthy. On failure, 
returns default
- `iterable\fold(iterable $iterable, callable $callback, mixed $initial): mixed`

  `fold` and requiring an initial value seems like better practice. See 
https://externals.io/message/112558#112834
  and 
https://stackoverflow.com/questions/25149359/difference-between-reduce-and-fold
- `iterable\unique_values(iterable $iterable): array {}`

   Returns a list of unique values of $iterable
- `iterable\includes_value(iterable $iterable, mixed $value): bool {}`

   Returns true if this iterable includes a value identical to $value (`===`).
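
To make the intended semantics of the list above concrete, here is a userland sketch. The `iter_` prefixed names are placeholders chosen for this example only (they are not the proposed `iterable\` functions), and the behaviour for a `null` callback (testing truthiness of the values) is an assumption based on the signatures.

```php
<?php
declare(strict_types=1);

function iter_any(iterable $input, ?callable $callback = null): bool
{
    foreach ($input as $value) {
        // With no callback, fall back to the truthiness of the value itself.
        if ($callback !== null ? $callback($value) : $value) {
            return true;
        }
    }
    return false;
}

function iter_all(iterable $input, ?callable $callback = null): bool
{
    foreach ($input as $value) {
        if (!($callback !== null ? $callback($value) : $value)) {
            return false;
        }
    }
    return true;
}

function iter_none(iterable $input, ?callable $callback = null): bool
{
    return !iter_any($input, $callback);
}

function iter_find(iterable $iterable, callable $callback, mixed $default = null): mixed
{
    foreach ($iterable as $value) {
        if ($callback($value)) {
            return $value; // first value the callback accepts
        }
    }
    return $default;
}

function iter_fold(iterable $iterable, callable $callback, mixed $initial): mixed
{
    $carry = $initial;
    foreach ($iterable as $value) {
        $carry = $callback($carry, $value);
    }
    return $carry;
}

// Works the same for arrays and Traversables such as generators.
$evens = (static function (): Generator {
    yield 2;
    yield 4;
    yield 6;
})();
$allEven = iter_all($evens, fn(int $v): bool => $v % 2 === 0);
$firstBig = iter_find([1, 5, 9], fn(int $v): bool => $v > 4, -1);
$sum = iter_fold([1, 2, 3], fn(int $carry, int $v): int => $carry + $v, 0);
```

Every function stops as early as it can, which matters for generators that are expensive or infinite.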

There's other functionality that I was less certain about proposing, such as 
`iterable\keys(iterable $iterable): array`,
which would work similarly to array_keys but also work on Traversables (e.g. to 
be used with userland/internal collections, generators, etc.)
Or functions to get the iterable\first()/last() value in an iterable. Any 
thoughts on those?

I also wanted to know if more verbose names such as find_value(), 
fold_values(), any_values(), all_values() were generally preferred before 
proposing this,
since I only had feedback from a small number of names. My assumption was short 
names were generally preferred when possible.

See https://github.com/TysonAndre/pecl-teds/blob/main/teds.stub.php for 
documentation of the other functions mentioned here. The functionality can be 
tried out by installing https://pecl.php.net/package/teds

Background
---

In February 2021, I proposed expanded iterable functionality and brought it to 
a vote,
https://wiki.php.net/rfc/any_all_on_iterable , where feedback was mainly about 
being too small in scope and the choice of naming.

Later, after https://externals.io/message/112558#112780 , 
https://wiki.php.net/rfc/namespaces_in_bundled_extensions#proposal was created 
and brought to a vote in April 2021 that passed,
offering useful recommendations on how to standardize namespaces in future 
proposals of new categories of functionality
(e.g. `iterable\any()` and `iterable\all()`)

Any comments?

Thanks,
Tyson

Re: [PHP-DEV] Microseconds to error log

2022-10-28 Thread tyson andre

Hi Mikhail,

> >Basically, we have quite a high-loaded environment and we really want
> >to see timestamps containing milli-/microseconds in our logs.
> 
> I'm not knowledgeable enough to comment on the implementation details, but 
> from a user point of view I agree the feature would be useful. It would 
> definitely need to be behind an ini setting, though, to avoid existing log 
> parsers failing unexpectedly on the new format.

I agree that I'd want this functionality available.

I found https://www.ietf.org/rfc/rfc5424.txt for syslogs which allows 1-6 
digits after `.`. (haven't checked for any followup rfcs or how widely they are 
supported)

https://www.w3.org/TR/NOTE-datetime appears to document a useful subset of the 
iso 8601 functionality for anything documented as accepting iso 8601 dates, 
which allows microseconds

> s= one or more digits representing a decimal fraction of a second


Does anyone know of commonly used tools for syslogs where the recent releases 
work with milliseconds but not microseconds (I'd hope not, since syslog has 
been around for decades, but it'd be useful to know)? PHP's fast so I'd agree 
microseconds would be useful to have, e.g. I've wanted this when debugging the 
order of db/cache calls or other operations in a web app 

E.g. 
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Date/toISOString
 (JavaScript dates and Date.now are in milliseconds)

> The toISOString() method returns a string in simplified extended ISO format 
> (ISO 8601), which is always 24 or 27 characters long 
> (YYYY-MM-DDTHH:mm:ss.sssZ or ±YYYYYY-MM-DDTHH:mm:ss.sssZ, respectively).


I'd agree I'd want an ini setting, e.g. for non-technical users that wanted to 
upgrade to the next php minor version
without researching how to change their ad-hoc or external syslog parsers
(e.g. when logs are parsed using regex to extract the date and following fields)

An example: syslogs can be sent over the network to various services, e.g., 
https://www.elastic.co/guide/en/logstash/current/plugins-inputs-syslog.html#plugins-inputs-syslog-grok_pattern
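
To illustrate the parser concern, here is a sketch of a regex-based log parser. It assumes error log lines of the form `[28-Oct-2022 12:34:56 UTC] message`, and the variant with a fraction after the seconds is a guess at what a new format could look like, not an agreed design. `parse_log_line()` is a hypothetical helper.

```php
<?php
declare(strict_types=1);

/**
 * Splits an (assumed) error log line into its parts, accepting an optional
 * fraction of 1 to 6 digits after the seconds.
 *
 * @return array{time: string, fraction: ?string, zone: string, message: string}|null
 */
function parse_log_line(string $line): ?array
{
    $pattern = '/^\[(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2})(?:\.(\d{1,6}))? ([^\]]+)\] (.*)$/';
    if (preg_match($pattern, $line, $m) !== 1) {
        return null;
    }
    return [
        'time' => $m[1],
        // An unmatched optional group in the middle is reported as ''.
        'fraction' => $m[2] !== '' ? $m[2] : null,
        'zone' => $m[3],
        'message' => $m[4],
    ];
}

$old = '[28-Oct-2022 12:34:56 UTC] PHP Warning: something happened';
$new = '[28-Oct-2022 12:34:56.123456 UTC] PHP Warning: something happened';

$parsedOld = parse_log_line($old);
$parsedNew = parse_log_line($new);

// A stricter ad-hoc pattern that expects a space right after the seconds
// stops matching once a fraction is present.
$strict = '/^\[\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2} /';
$strictMatchesOld = preg_match($strict, $old) === 1;
$strictMatchesNew = preg_match($strict, $new) === 1;
```

The strict pattern is the kind of existing parser an ini setting would protect.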

Thanks,
Tyson

Re: [PHP-DEV] Adding the OpenSSF Scorecards GitHub Action

2022-10-27 Thread tyson andre

Hi Pedro Nacht,

> Hello, I'm working on behalf of Google and the Open Source Security 
> Foundation to help essential open-source projects improve their supply-chain 
> security.

Could you expand on that? It isn't obvious from your comment, and I'm curious 
about this initiative at Google.


1. How many hours a week do you spend working for Google/Alphabet, roughly? 
(e.g., averaged over the last month)
2. How many hours a week do you spend working for the Open Source Security 
Foundation, roughly? Is that work part of your job role at Google?
3. What is your job title, team, and department in those organizations?
4. What is the team size?

I also had a few other questions:

5. How many of the top N security-critical open-source projects does the OSSF 
plan to propose this badge to this year?
6. What studies have been published or are being conducted by Google/OSSF 
(or externally, e.g., by universities) on the impact of the badge on 
open-source organizations (e.g. comparing organizations where it was proposed 
with those where it was not)? Where can I find them?

   E.g., I saw https://news.ycombinator.com/item?id=33309969 recently and 
wanted to learn more about what is known about the impact on metrics of 
projects short-term and long-term. (e.g. on developers that strongly focus on 
scorecards, or perfectionists, or averaged)

   I'm interested in learning more about what is being done to ensure the 
overall security, stability, and ongoing improvements of open source software 
in general as an end user, contributor, maintainer, and user of the companies 
that use open source software.

   This would be useful to know when an organization considers adopting a badge 
or change to process.
7. Is creating PRs to add this badge part of your job role (If so, the job role 
of which organization)? Is this done in your free time?

   Sorry, it isn't clear - From 
https://opensource.google/documentation/reference/patching, I see that the use 
of @google.com emails is required for all open-source contributions, so I was 
initially confused.
8. Are there recent posts by Google clarifying their involvement in the Open 
Source Security Foundation (funding provided, team size, shared 
employees/contractors, etc)?
   I wanted to know more.

   https://security.googleblog.com/2022/10/announcing-guac-great-pairing-with-slsa.html
   mentions that the foundation exists,
   but doesn't mention any details about how Google is involved in it.

   > An open source organization like the Open Source Security Foundation wants 
to identify critical libraries to maintain and secure
9. What is the roadmap/timeline for this tool? 
https://github.com/ossf/scorecard/issues has a lot of open issues.
   E.g., avoiding false positives in some contexts seems to be a TODO,
   the preview is a one-line JSON dump (https://stedolan.github.io/jq/ is a 
fantastic tool), and there are a lot of open tickets for the website.

   What other practices are planned for inclusion in this badge?

Best regards,
Tyson

[PHP-DEV] Adding `class UnsupportedOperationException extends RuntimeException` to php?

2022-02-13 Thread tyson andre

Hi internals,

Currently, there doesn't seem to be an exact fit for indicating that a method 
isn't supported on an object by design (and will throw unconditionally).
(For example, when implementing an interface or overriding a base class, e.g. 
an Immutable datastructure not implementing offsetSet/offsetUnset,
an object that does not support serialization and overrides 
__serialize/__unserialize to forbid it, etc.)

- DomainException/RangeException - Seemingly refer to argument values
- LogicException - "Exception that represents error in the program logic. This 
kind of exception should lead directly to a fix in your code."
  I don't believe it's quite that, because code designed for the interface 
implementing the method (e.g. ArrayAccess) might have correct logic,
  it's the choice of implementation/subclass at runtime that caused the 
exception.
- RuntimeException and its subclasses are also used for unpredictable errors 
such as missing files, network/db errors, etc,
  making the meaning of `try { ... } catch (RuntimeException $e) {}` less clear.
- BadMethodCallException - "Exception thrown if a callback refers to an 
undefined method or if some arguments are missing."
  This seems misleading as well since the method is defined and has its arguments, 
and existing uses in CachingIterator/Phar are checking the object's properties 
or arguments.
  (Uncaught BadMethodCallException: CachingIterator does not fetch string value 
(see CachingIterator::__construct) in %s:%d)

RuntimeException (or a userland subclass) is the most common exception type I 
see used for this purpose,
but RuntimeException is already used for many other things and has many 
subclasses.
https://www.php.net/manual/en/spl.exceptions.php#spl.exceptions.tree 
https://www.php.net/manual/en/class.exception.php
(RuntimeException and subclasses are also used for unpredictable errors such as 
missing files, network/db errors, some types of errors in destructors, etc,
making writing `try { ... } catch (RuntimeException $e) {}` to catch this also 
catch other issues that should instead be rethrown.)

It would be useful to provide this instead of having extensions or userland 
code throw RuntimeException or
implement their own exceptions 
https://www.php.net/manual/en/class.solrillegaloperationexception

IDEs/type checkers would also have more information to indicate that a method 
call is probably invalid in a use
case and it would allow userland code to have more specific api 
documentation/type info than `@throws RuntimeException`
(rather than deliberately invoking a shared helper method to throw exceptions).
- Type checkers already check this - e.g. 
https://psalm.dev/docs/running_psalm/configuration/#ignoreexceptions 
https://github.com/phan/phan/wiki/Issue-Types-Caught-by-Phan#phanthrowcommentintostring
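
As a sketch of the use case, the class below is a userland stand-in for the proposed exception (the proposal is for the SPL to provide it), and `ImmutableList` is a hypothetical class invented for this example.

```php
<?php
declare(strict_types=1);

// Userland stand-in for the proposed class.
class UnsupportedOperationException extends RuntimeException
{
}

// Hypothetical immutable collection that rejects writes by design.
final class ImmutableList implements ArrayAccess
{
    /** @param list<mixed> $values */
    public function __construct(private array $values)
    {
    }

    public function offsetExists(mixed $offset): bool
    {
        return isset($this->values[$offset]);
    }

    public function offsetGet(mixed $offset): mixed
    {
        return $this->values[$offset];
    }

    /** @throws UnsupportedOperationException always */
    public function offsetSet(mixed $offset, mixed $value): void
    {
        throw new UnsupportedOperationException(self::class . ' does not support offsetSet');
    }

    /** @throws UnsupportedOperationException always */
    public function offsetUnset(mixed $offset): void
    {
        throw new UnsupportedOperationException(self::class . ' does not support offsetUnset');
    }
}

$list = new ImmutableList(['a', 'b']);
$caught = false;
try {
    $list[0] = 'x';
} catch (UnsupportedOperationException $e) {
    // Only the "unsupported by design" case lands here; other
    // RuntimeException subclasses would keep propagating.
    $caught = true;
}
```

Catching the specific type avoids the broad `catch (RuntimeException $e)` that would also swallow unrelated failures.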

Thoughts on adding UnsupportedOperationException to the spl?

Thanks,
Tyson

[PHP-DEV] Re: Adding `final class Deque` to PHP

2022-02-05 Thread tyson andre

Hi internals,

> > > I've created a new RFC https://wiki.php.net/rfc/deque to add a `final 
> > > class Deque`
> > >
> > > This is based on the `Teds\Deque` implementation I've worked on
> > > for the https://github.com/TysonAndre/pecl-teds PECL.
> > >
> > > While `SplDoublyLinkedList` and its subclass `SplQueue`/`SplStack` exist 
> > > in the SPL, they have several drawbacks
> > > that are addressed by this RFC to add a `Deque` class (to use instead of 
> > > those):
> > >
> > > 1. `SplDoublyLinkedList` is internally represented by a doubly linked 
> > > list,
> > >    making it use roughly twice as much memory as the proposed `Deque`
> > > 2. `push`/`pop`/`unshift`/`shift` from `SplDoublyLinkedList` are slower 
> > > due to
> > >    needing to allocate or free the linked list nodes.
> > > 3. Reading values in the middle of the `SplDoublyLinkedList` is 
> > > proportional to the length of the list,
> > >    due to needing to traverse the linked list nodes.
> > > 4. `foreach` Iteration behavior cannot be understood without knowing what 
> > > constructed the
> > >    `SplDoublyLinkedList` instance or set the flags.
> > >
> > > It would be useful to have an efficient `Deque` container in the standard 
> > > library
> > > to provide an alternative without those drawbacks,
> > > as well as for the following reasons:
> > >
> > > 1. To save memory in applications or libraries that may need to store 
> > > many lists of values or run for long periods of time.
> > >    Notably, PHP's `array` type will never release allocated capacity.
> > >    See 
> > > https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html
> > > 2. To provide a better alternative to `SplDoublyLinkedList`, `SplStack`, 
> > > and `SplQueue`
> > >    for use cases that require stacks or queues.
> > > 3. As a more efficient option than `array` and `SplDoublyLinkedList`
> > >    as a queue or `Deque`, especially for `unshift`.
> > >
> > > A `Deque` is more efficient than an `array` when used as a queue, more 
> > > readable, and easier to use correctly.
> > > While it is possible to efficiently remove elements from the start of an 
> > > `array` (in terms of insertion order) (though this makes 
> > > reset()/array_key_first() inefficient),
> > > it is very inefficient to prepend elements to the start of a large 
> > > `array` due to needing to either copy the array
> > > or move all elements in the internal array representation,
> > > and an `array` would use much more memory than a `Deque` when used that 
> > > way (and be slower).
> > >
> > > There are also several pitfalls to using an array as a queue for larger 
> > > queue sizes,
> > > some of which are not obvious and discovered while writing the benchmarks.
> > > (Having a better (double-ended) queue datastructure (`Deque`) than the 
> > > `SplDoublyLinkedList`
> > > would save users from needing to write code with these pitfalls):
> > >
> > > 1. `array_key_first()` and `reset()` take time proportional to the number 
> > > of elements `unset` from the start of an array,
> > >    causing it to unexpectedly be extremely slow (quadratic time) after 
> > > unsetting many elements at the start of the queue.
> > >    (when the array infrequently runs out of capacity, buckets are moved 
> > > to the front)
> > > 2. `reset()` or `end()` will convert a variable to a reference,
> > >    and php is less efficient at reading or writing to references.
> > >    Opcache is also less efficient at optimizing uses of variables using 
> > > references.
> > > 3. More obviously, `array_unshift` and `array_shift` will take time 
> > > proportional to the number of elements in the array
> > >    (to reindex and move existing/remaining elements).
> >
> > I plan to start voting on https://wiki.php.net/rfc/deque on Friday, 
> > February 4th.
> >
> > Several changes have been made to https://wiki.php.net/rfc/deque#changelog
> > after the feedback in https://externals.io/message/116100
> >
> > - The class is now named `Collections\Deque`
> > - The api documentation in https://wiki.php.net/rfc/deque#proposal was 
> > expanded for methods.
> > - Benchmarks were updated.
> > - Like other standard datastructures, iteration over the deque is now over 
> > the original object (instead of creating a copy),
> >   and mutating the deque will be reflected in `$iterator->current()` (and 
> > moving the end with push()/pop() will affect where iteration ends).
> > - Iteration will account for calls to shift/unshift moving the start of the 
> > deque.
> >   the offsets will be corrected and values won't be skipped or iterated 
> > over multiple times.
> >   (no matter how many iterators were created by `Deque->getIterator()`)
> >   See https://wiki.php.net/rfc/deque#iteration_behavior
> > - The get()/set() methods were removed, after feedback in 
> > https://externals.io/message/116100#116214
> >
> > A WebAssembly demo is available at 
> > https://tysonandre.github.io/php-rfc-demo/deque/
> 
> I've updated the RFC

[PHP-DEV] Re: Adding `final class Deque` to PHP

2022-02-04 Thread tyson andre

Hi internals,

> > I've created a new RFC https://wiki.php.net/rfc/deque to add a `final class 
> > Deque`
> >
> > This is based on the `Teds\Deque` implementation I've worked on
> > for the https://github.com/TysonAndre/pecl-teds PECL.
> >
> > While `SplDoublyLinkedList` and its subclass `SplQueue`/`SplStack` exist in 
> > the SPL, they have several drawbacks
> > that are addressed by this RFC to add a `Deque` class (to use instead of 
> > those):
> >
> > 1. `SplDoublyLinkedList` is internally represented by a doubly linked list,
> >    making it use roughly twice as much memory as the proposed `Deque`
> > 2. `push`/`pop`/`unshift`/`shift` from `SplDoublyLinkedList` are slower due 
> > to
> >    needing to allocate or free the linked list nodes.
> > 3. Reading values in the middle of the `SplDoublyLinkedList` is 
> > proportional to the length of the list,
> >    due to needing to traverse the linked list nodes.
> > 4. `foreach` Iteration behavior cannot be understood without knowing what 
> > constructed the
> >    `SplDoublyLinkedList` instance or set the flags.
> >
> > It would be useful to have an efficient `Deque` container in the standard 
> > library
> > to provide an alternative without those drawbacks,
> > as well as for the following reasons:
> >
> > 1. To save memory in applications or libraries that may need to store many 
> > lists of values or run for long periods of time.
> >    Notably, PHP's `array` type will never release allocated capacity.
> >    See 
> > https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html
> > 2. To provide a better alternative to `SplDoublyLinkedList`, `SplStack`, 
> > and `SplQueue`
> >    for use cases that require stacks or queues.
> > 3. As a more efficient option than `array` and `SplDoublyLinkedList`
> >    as a queue or `Deque`, especially for `unshift`.
> >
> > A `Deque` is more efficient than an `array` when used as a queue, more 
> > readable, and easier to use correctly.
> > While it is possible to efficiently remove elements from the start of an 
> > `array` (in terms of insertion order) (though this makes 
> > reset()/array_key_first() inefficient),
> > it is very inefficient to prepend elements to the start of a large `array` 
> > due to needing to either copy the array
> > or move all elements in the internal array representation,
> > and an `array` would use much more memory than a `Deque` when used that way 
> > (and be slower).
> >
> > There are also several pitfalls to using an array as a queue for larger 
> > queue sizes,
> > some of which are not obvious and discovered while writing the benchmarks.
> > (Having a better (double-ended) queue datastructure (`Deque`) than the 
> > `SplDoublyLinkedList`
> > would save users from needing to write code with these pitfalls):
> >
> > 1. `array_key_first()` and `reset()` take time proportional to the number of 
> > elements `unset` from the start of an array,
> >    causing it to unexpectedly be extremely slow (quadratic time) after 
> > unsetting many elements at the start of the queue.
> >    (when the array infrequently runs out of capacity, buckets are moved to 
> > the front)
> > 2. `reset()` or `end()` will convert a variable to a reference,
> >    and php is less efficient at reading or writing to references.
> >    Opcache is also less efficient at optimizing uses of variables using 
> > references.
> > 3. More obviously, `array_unshift` and `array_shift` will take time 
> > proportional to the number of elements in the array
> >    (to reindex and move existing/remaining elements).
>
> I plan to start voting on https://wiki.php.net/rfc/deque on Friday, February 
> 4th.
>
> Several changes have been made to https://wiki.php.net/rfc/deque#changelog
> after the feedback in https://externals.io/message/116100
>
> - The class is now named `Collections\Deque`
> - The api documentation in https://wiki.php.net/rfc/deque#proposal was 
> expanded for methods.
> - Benchmarks were updated.
> - Like other standard datastructures, iteration over the deque is now over 
> the original object (instead of creating a copy),
>   and mutating the deque will be reflected in `$iterator->current()` (and 
> moving the end with push()/pop() will affect where iteration ends).
> - Iteration will account for calls to shift/unshift moving the start of the 
> deque.
>   the offsets will be corrected and values won't be skipped or iterated over 
> multiple times.
>   (no matter how many iterators were created by `Deque->getIterator()`)
>   See https://wiki.php.net/rfc/deque#iteration_behavior
> - The get()/set() methods were removed, after feedback in 
> https://externals.io/message/116100#116214
>
> A WebAssembly demo is available at 
> https://tysonandre.github.io/php-rfc-demo/deque/

I've updated the RFC https://wiki.php.net/rfc/deque yet again (no 
implementation changes). I now plan to start voting on Saturday, February 5th.
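
For readers unfamiliar with why a deque avoids the array pitfalls quoted above, here is a minimal userland ring buffer sketch. `RingDeque` is a hypothetical class written for this example; it is not the RFC's C implementation and makes no claim about its internals beyond the general circular-buffer technique.

```php
<?php
declare(strict_types=1);

// Hypothetical circular-buffer deque: both ends are updated by moving an
// index, so no existing element is reindexed or copied except when growing.
final class RingDeque
{
    private array $slots;
    private int $head = 0; // index of the first element
    private int $size = 0;

    public function __construct(int $capacity = 8)
    {
        $this->slots = array_fill(0, max(1, $capacity), null);
    }

    public function count(): int
    {
        return $this->size;
    }

    public function push(mixed $value): void
    {
        $this->growIfFull();
        $this->slots[($this->head + $this->size) % count($this->slots)] = $value;
        $this->size++;
    }

    public function unshift(mixed $value): void
    {
        $this->growIfFull();
        $cap = count($this->slots);
        $this->head = ($this->head - 1 + $cap) % $cap;
        $this->slots[$this->head] = $value;
        $this->size++;
    }

    public function shift(): mixed
    {
        if ($this->size === 0) {
            throw new UnderflowException('Cannot shift from an empty deque');
        }
        $value = $this->slots[$this->head];
        $this->slots[$this->head] = null;
        $this->head = ($this->head + 1) % count($this->slots);
        $this->size--;
        return $value;
    }

    public function pop(): mixed
    {
        if ($this->size === 0) {
            throw new UnderflowException('Cannot pop from an empty deque');
        }
        $index = ($this->head + $this->size - 1) % count($this->slots);
        $value = $this->slots[$index];
        $this->slots[$index] = null;
        $this->size--;
        return $value;
    }

    private function growIfFull(): void
    {
        $cap = count($this->slots);
        if ($this->size < $cap) {
            return;
        }
        // Copy the elements in logical order, then double the capacity.
        $ordered = [];
        for ($i = 0; $i < $this->size; $i++) {
            $ordered[] = $this->slots[($this->head + $i) % $cap];
        }
        $this->slots = array_merge($ordered, array_fill(0, $cap, null));
        $this->head = 0;
    }
}

$deque = new RingDeque(2);
$deque->push(1);
$deque->push(2);
$deque->unshift(0); // triggers one growth step, then writes before the head
$first = $deque->shift();
$last = $deque->pop();
```

Each operation touches a constant number of slots apart from the occasional doubling, in contrast to `array_unshift`, which moves every existing element.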

I've also updated the WebAssembly demo at

Re: [PHP-DEV] Re: Adding `final class Deque` to PHP

2022-02-02 Thread tyson andre

Hi Larry Garfield,

> >> Returning $this would resolve that, I think.  (Making it return a new, 
> >> immutable copy of the Deque would be even nicer, but I realize that's 
> >> probably not an argument I'm going to win at this point on this RFC.)
> >
> > Technically, you still can have single-expression usages in
> > readable/unreadable ways
> >
> > - `[$deque->shift('value'), $deque][1]`, or
> > - `($deque->shift('value') ?: $deque)`, or
> > - `my_userland_helper_shift_and_return($deque, 'value')`
>
> ... Those are all grotesque.

True, I wasn't familiar with that event and whether your goal was to write as 
few lines as possible or as quickly as possible vs in a readable way to people 
familiar with php.
That clears that up.

> > My personal preference is against making this fluent.
> > I'd rather expose an efficient datastructure that's consistent with the
> > rest of PHP's functionality to the extent possible,
> > which userland can use to write their own fluent/non-fluent classes.
> > There's drawbacks to returning `$this`, including:
> >
> > 1. Inconsistency with existing APIs making remembering what does what
> > harder. Barely anything in php that I remember returns $this.
> >
> >    https://www.php.net/manual/en/arrayobject.append.php returns void.
> >
> >    https://www.php.net/manual/en/function.array-push returns an int.
>
> So it's already inconsistent, and frankly neither of those returns is useful. 
>  If the existing convention is "inconsistently unhelpful", then let's go 
> ahead and make it helpful since "it's lame but at least it's consistent" is 
> not an argument here.  (Sometimes that is a valid argument, but not in this 
> case.)

I won't know if anyone is strongly against fluent interfaces belonging in core, 
or strongly against adopting fluent interfaces in an ad-hoc way that would make 
it unclear if the vote was for/against the overall functionality or the new 
design choice.

I'd rather not do this without a policy RFC such as the one that passed for 
namespaces https://wiki.php.net/rfc/namespaces_in_bundled_extensions

If you expect this to be widely considered a better design php should adopt, 
please create an RFC after the vote finishes (if it passes) before the feature 
freeze outlining which methods of `Collections\Deque` would change and how.

> > 2. Inconsistency with possible new datastructures/methods
> >
> >    If a `Map`/`Set` function were to be added, then methods for
> > add/remove would return booleans (or the old value), not $this
>
> Removal methods need to return the value being removed, sure.  That makes 
> sense across the board.  But I don't see why Map::add($k, $v) or Set::add($v) 
> shouldn't also return $this.  I would make the exact same ask there, for the 
> exact same reason: To aid functional code, which (when the syntax is 
> conducive to it) can be more readable in many cases than procedural 
> approaches.
>
> > 3. Slight additional performance overhead for functionality I assume
> > will be used relatively infrequently
>
> I would assume the opposite, at least in my own code.  I'm writing a lot of 
> recursive or reduction-based routines these days, so if I had reason to use a 
> queue/stack in them in the first place, they'd almost certainly get used in 
> this fashion.
>
> >    (php has to increment reference counts and opcache can't eliminate
> > the opcode to decrease reference counts and possibly free the return
> > value of `$deque->shift()` with the return type info being an object)
>
> I cannot speak to this one; how much of a difference is it in practice? 

Adding `RETURN_OBJ_COPY(Z_OBJ_P(ZEND_THIS));` and a different method with the 
`: Deque` return type:
Around 6-8% for large arrays where push is the most frequent method call and 
the array reads are taken out since they'd be the same?
(and similar for shift, I'd expect)
(Intel CPU security mitigations/power management and so on have made 
benchmarking less convenient, I ran the functions twice to see if it was 
consistent)

I'd still object to this for other stated reasons even if you think 6-8% is 
worth it for your use cases for this or future additions to php in general.
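
For anyone who wants chaining regardless, a userland wrapper is one option. The sketch below is hypothetical (`FluentQueue` is not part of the RFC) and wraps the existing `SplQueue` only because `Collections\Deque` is not available in released PHP versions.

```php
<?php
declare(strict_types=1);

// Hypothetical fluent wrapper: push() returns the wrapper so calls can be
// chained, while the wrapped SplQueue keeps its void-returning push().
final class FluentQueue
{
    private SplQueue $queue;

    public function __construct()
    {
        $this->queue = new SplQueue();
    }

    public function push(mixed $value): static
    {
        $this->queue->push($value);
        return $this;
    }

    public function shift(): mixed
    {
        return $this->queue->shift();
    }

    public function count(): int
    {
        return count($this->queue);
    }
}

$queue = (new FluentQueue())->push('a')->push('b')->push('c');
$head = $queue->shift();
```

This keeps the chaining cost in the code that wants it, rather than in every caller of the core method.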

```
Results for php 8.2.0-dev debug=false with opcache enabled=true

Appending to Collections\Deque []=   : n=   4 iterations=1000, create+destroy time=1.204 result=0
Appending to Collections\Deque push  : n=   4 iterations=1000, create+destroy time=1.551 result=0
Appending to Collections\Deque fluent: n=   4 iterations=1000, create+destroy time=1.640 result=0
Appending to Collections\Deque []=   : n=   4 iterations=1000, create+destroy time=1.213 result=0
Appending to Collections\Deque push  : n=   4 iterations=1000, create+destroy time=1.549 result=0
Appending to Collections\Deque fluent: n=   4 iterations=1000, create+destroy time=1.636 result=0

Appending to Collections\Deque []=   : n=   8 iterations= 500, create+destroy time=0.923 result=0
```

Re: [PHP-DEV] Re: Adding `final class Deque` to PHP

2022-02-02 Thread tyson andre

Hi Jordan,

> > 4.  Returning $this makes code easier to write at some cost to readability 
> > - Developers new to php or using `Collections\Deque` for the first time 
> > would not immediately know what the code they're reading is doing.
> >    (less of a problem with a good IDE, typechecker, and a typed codebase, 
> > but this isn't universal)
> >    Having it return void, `return $deque->push()` would be less common and 
> > this would force the meaning to be clear.
> > 
> >    Developers might have several guesses/assumptions based on their 
> > experience with other methods in php/elsewhere
> > 
> >    - It returns the new count (JavaScript Array.push, array_push)
> >    - It returns $this (Ruby)
> >    - It returns a lazy copy, like you'd wanted, not modifying the original
> >    - It's returning void and the code in question is shorthand for `return 
> > null`.
> >      (Python, C++ 
> > https://www.cplusplus.com/reference/vector/vector/push_back/ , offsetSet 
> > and spl push()/shift() methods)
> 
> 
> I'm not sure that I buy this as a point even. Returning an immutable Deque 
> instance would be much more in line with modern PHP in general.
> 
> A major complaint about my operator overloads RFC was how it impacted static 
> analysis tools. I don't see how in one RFC we can say that creating work for 
> static analysis tools is a blocking problem, and in a different RFC say that 
> the ability to inspect the return values by the developer can't even be 
> assumed. If we design one feature around the idea that a basic IDE may not 
> even be used, but design a different feature around the idea that we want to 
> minimize the impact to a third party tool that provides static analysis as 
> part of workflow that's not even part of an IDE... well that seems like a 
> very inconsistent approach to me.

There's a difference between

1. https://externals.io/message/116767#116768 

   One contributor to a static analyzer thinking it would not be worth the 
complexity it adds to support in static analysis 
   (and making type inference either vaguer or inaccurate for operands with a 
type `mixed` *more frequently*)
   Debuggability seemed to be their bigger objection.

   As a maintainer of a different static analysis tool, I'd disagree with that 
being a blocker, though. It'd be inconvenient to implement but would get 
implemented.
2. Aiming to "cater to the skill levels and platforms of a wide range of 
users", considering benefits/drawbacks to both groups of users (with/without 
static analyzers).
   (https://wiki.php.net/rfc/template)

I don't think I'd have much success in an RFC adding immutable datastructures, 
though.

Though interestingly, they do seem possible to implement with better 
performance (https://en.wikipedia.org/wiki/Big_O_notation) than I'd have 
expected.
A Google search turned up Relaxed Radix Balanced Trees (RRB-trees), an immutable 
datastructure that, surprisingly, claims performance close to arrays for most operations.
https://hypirion.com/musings/thesis https://github.com/hyPiRion/c-rrb

Aside: An immutable Map/Set would probably have a stronger argument than 
immutable sequences/deques/linked lists (arrays support copy on write already, 
so the original array is effectively immutable, but arbitrary keys are 
something arrays don't support),
but voters that didn't see a use case for immutables in core might be 
unconvinced.
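
To illustrate the aside above, here is a minimal sketch in plain PHP (no new classes assumed). It shows that assigning an array gives value semantics through copy on write, and that array keys are coerced, which is where a dedicated Map/Set could offer something arrays cannot.

```php
<?php
// Arrays have value semantics: the copy is separated lazily on the first write.
$original = [1, 2, 3];
$copy = $original;
$copy[] = 4;

// Array keys are coerced, so distinct values can collide on the same key.
$map = [];
$map["1"] = 'string key';
$map[1] = 'int key';     // overwrites the entry above: "1" was stored as int 1
$map[true] = 'bool key'; // true is also coerced to int 1

var_dump($original === [1, 2, 3]); // bool(true)
var_dump(array_keys($map));        // a single int key: 1
```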

> Either modern development tools are factored into the language design or they 
> are not. This seems like a "having your cake and eating it too" situation.

The language design is decided by what the set of voters that voted on a 
particular RFC would approve with a 2/3 majority.
Voters have a wide variety of backgrounds and projects/libraries that they work 
on, and RFC authors have to guess at what those backgrounds are and what 
functionality would be accepted, or why similar past RFCs really drew objections.

The preferences, backgrounds, and paradigms preferred by voters vary and I 
likely won't know what those preferences are or how many voters there would be 
until a vote is started. (e.g. frequency of writing new code vs maintaining 
existing codebases vs working with third party libraries)
(e.g. **I won't know if anyone is strongly against fluent interfaces, or 
specifically strongly against adopting fluent interfaces in an ad-hoc way, 
without a policy RFC** such as the one that passed for namespaces
https://wiki.php.net/rfc/namespaces_in_bundled_extensions)

The fact that the closest look at the design/implementation for complex RFCs 
sometimes only happens shortly before/after the start of a vote is also 
inconvenient for authors, but hard to change with a volunteer-based process.

This has downsides for current or future RFC authors, but I don't have any 
productive ideas for how to change the process that wouldn't split development 
or be burdensome to voters. (and would actually be accepted)

Regards,
Tyson

Re: [PHP-DEV] Adding `final class Deque` to PHP

2022-02-02 Thread tyson andre

Hi Stephen,

> As a userland dev & library author it’s nice to see some progression on basic 
> data structures, so thank you for your efforts on this!
> 
> 
> Two little things in the RFC:
> 
> The proposed API switches between terms `front`, `back`, `start` and `end` in 
> comments - is there meant to be a conceptual difference between front/start 
> and end/back ?
 
Good point. I've changed the method names to first()/last() and also made the 
wording in https://wiki.php.net/rfc/deque more consistently use first/last to 
avoid confusion.

No, they're the same. front=start=bottom=first. Bottom was from 
SplDoublyLinkedList/SplStack, e.g. the bottom of the stack, top is where 
`push()` acts, etc.
Front was how I was referring to iteration order.
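
As a small illustration of the existing `bottom`/`top` terminology, this sketch uses the real `SplDoublyLinkedList` class, with no new API assumed. It shows that `push()` acts on the top (end) and `unshift()` on the bottom (start).

```php
<?php
$list = new SplDoublyLinkedList();
$list->push('a');    // appended at the top (end)
$list->push('b');
$list->unshift('z'); // prepended at the bottom (start)

var_dump($list->bottom());          // string(1) "z"
var_dump($list->top());             // string(1) "b"
var_dump(iterator_to_array($list)); // z, a, b in the default FIFO iteration mode
```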
 
> In the "Why use this instead of array?” Section, the 3rd point seems cut off:
> > Note that starting in php 8.2, array

That should say "Note that starting in php 8.2, arrays that are lists (with 
no/few gaps) are represented in a more memory efficient way than associative 
arrays.".

I've updated the RFC.

Thanks,
Tyson

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php

Re: [PHP-DEV] Re: Adding `final class Deque` to PHP

2022-02-02 Thread tyson andre

Hi Larry Garfield,

> Request:
>
> push() and unshift() currently return void.  That's not helpful.  It would be 
> vastly more useful if they both returned $this.  Not as much for chaining, 
> but more so that you can add a value to a queue and pass it as an argument to 
> another call (often recursive, but not necessarily) in a single operation.
>
> Example: I was doing last year's Advent of Code in functional PHP, and had a 
> stack walker that looked like this:
>
> function parse(string $line, $pos = 0, array $stack = []): Result|string
> {
> $next = $line[$pos] ?? null;
> $head = $stack[0] ?? null;
>
> return match ($next) {
> // Opening brace, push an "expected" onto the stack.
> '{' => parse($line, $pos + 1, ['}', ...$stack]),
> '<' => parse($line, $pos + 1, ['>', ...$stack]),
> '(' => parse($line, $pos + 1, [')', ...$stack]),
> '[' => parse($line, $pos + 1, [']', ...$stack]),
> '}', '>', ')', ']' => $next === $head ? parse($line, $pos + 1, 
> array_slice($stack, 1)) : $next,
> null => count($stack) ? Result::Incomplete : Result::OK,
> };
> }
>
> The interesting part is the ['<', ...$stack], to pass on a modified version 
> of an array-as-stack.  That's of course annoyingly slow with arrays right 
> now, and a Deque would be better, but only if it could be "modified and 
> passed" like that.  If not, it would be incompatible with single-expression 
> usages (match statements, short lambdas, etc.)
>
> Returning $this would resolve that, I think.  (Making it return a new, 
> immutable copy of the Deque would be even nicer, but I realize that's 
> probably not an argument I'm going to win at this point on this RFC.)

Technically, you can still have single-expression usages, in more or less 
readable ways:

- `[$deque->unshift('value'), $deque][1]`, or
- `($deque->unshift('value') ?: $deque)`, or
- `my_userland_helper_unshift_and_return($deque, 'value')`
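
A runnable sketch of those three workarounds follows. Since `Collections\Deque` is only a proposal, it assumes `SplDoublyLinkedList` as a stand-in (its `unshift()` also returns void), and `unshift_and_return()` is a hypothetical helper name.

```php
<?php
// Hypothetical helper (not part of any RFC): mutate, then hand the same object back.
function unshift_and_return(SplDoublyLinkedList $list, mixed $value): SplDoublyLinkedList
{
    $list->unshift($value);
    return $list;
}

$stack = new SplDoublyLinkedList();
$stack->push('a');

// 1. Array trick: evaluate the void call first, then pick the object at index 1.
$same1 = [$stack->unshift('b'), $stack][1];

// 2. A void call evaluates to null, so ?: falls through to the object.
$same2 = ($stack->unshift('c') ?: $stack);

// 3. Userland helper.
$same3 = unshift_and_return($stack, 'd');

var_dump($same1 === $stack && $same2 === $stack && $same3 === $stack); // bool(true)
var_dump(iterator_to_array($stack)); // d, c, b, a
```

All three mutate the original object in place; none of them gives the immutable-copy behaviour asked for in the request.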

My personal preference is against making this fluent.
I'd rather expose an efficient datastructure that's consistent with the rest of 
PHP's functionality to the extent possible,
which userland can use to write their own fluent/non-fluent classes.
There's drawbacks to returning `$this`, including:

1. Inconsistency with existing APIs making remembering what does what harder. 
Barely anything in php that I remember returns $this.

   https://www.php.net/manual/en/arrayobject.append.php returns void.

   https://www.php.net/manual/en/function.array-push returns an int.
2. Inconsistency with possible new datastructures/methods

   If `Map`/`Set` classes were to be added, then methods for add/remove 
would return booleans (or the old value), not $this

3. Slight additional performance overhead for functionality I assume will be 
used relatively infrequently

   (php has to increment reference counts and opcache can't eliminate the 
opcode to decrease reference counts and possibly free the return value of 
`$deque->unshift()` with the return type info being an object)
4. Returning $this makes code easier to write at some cost to readability. 
Developers new to php or using `Collections\Deque` for the first time would not 
immediately know what the code they're reading is doing
   (less of a problem with a good IDE, typechecker, and a typed codebase, but 
this isn't universal).
   With it returning void, `return $deque->push()` would be less common and 
this would force the meaning to be clear.

   Developers might have several guesses/assumptions based on their experience 
with other methods in php/elsewhere:

   - It returns the new count (JavaScript Array.push, array_push)
   - It returns $this (Ruby)
   - It returns a lazy copy, like you'd wanted, not modifying the original
   - It's returning void and the code in question is shorthand for `return 
null`
     (Python, C++ https://www.cplusplus.com/reference/vector/vector/push_back/, 
offsetSet and spl push()/shift() methods)
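
For reference, a short sketch of what some existing PHP APIs return when appending. It uses only built-in functions and classes; nothing from the proposed `Collections\Deque` is assumed.

```php
<?php
$values = [1, 2];
$newCount = array_push($values, 3);  // returns the new element count

$object = new ArrayObject([1, 2]);
$appendResult = $object->append(3);  // declared void, so the expression evaluates to null

$queue = new SplQueue();
$pushResult = $queue->push('x');     // also void

var_dump($newCount);     // int(3)
var_dump($appendResult); // NULL
var_dump($pushResult);   // NULL
```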

> Also, typo:
>
> "By introducing a data structure (Deque) that's even faster and more memory 
> usage than an array for use as a double-ended queue, even more applications 
> would benefit from it. "
>
> I think you mean "less memory usage", or possibly "more memory efficient", or 
> something like that.

Thanks, I've fixed that.

Thanks,
Tyson


Re: [PHP-DEV] Adding `final class Deque` to PHP

2022-02-02 Thread tyson andre

Hi Mel Dafert,

> >The proposed API switches between terms `front`, `back`, `start` and `end` 
> >in comments - is there meant to be a conceptual difference between 
> >front/start and end/back ?
> 
> On a similar note, why are the methods for getting the first/last value called
> `top()`/`bottom()`? Off the top of my head, it is hard for me to imagine 
> which side is
> the top and which side is the bottom.
> I would prefer if it was called something more intuitive, possibly 
> `first()`/`last()` in
> accordance with `array_key_first()`/`array_key_last()`.

I've changed it to first/last in https://wiki.php.net/rfc/deque

I'd forgotten about first/last when looking for existing names php used for 
methods/global functions.
The closest I'd seen was in SplDoublyLinkedList, I'd forgotten about 
https://php.net/array_key_last getting added in php 7.3
(due to it existing for keys but not values)


Thanks,
Tyson

[PHP-DEV] Re: Adding `final class Deque` to PHP

2022-02-01 Thread tyson andre

Hi internals,

> I've created a new RFC https://wiki.php.net/rfc/deque to add a `final class 
> Deque`
> 
> This is based on the `Teds\Deque` implementation I've worked on
> for the https://github.com/TysonAndre/pecl-teds PECL.
> 
> While `SplDoublyLinkedList` and its subclass `SplQueue`/`SplStack` exist in 
> the SPL, they have several drawbacks
> that are addressed by this RFC to add a `Deque` class (to use instead of 
> those):
> 
> 1. `SplDoublyLinkedList` is internally represented by a doubly linked list,
>    making it use roughly twice as much memory as the proposed `Deque`
> 2. `push`/`pop`/`unshift`/`shift` from `SplDoublyLinkedList` are slower due to
>    needing to allocate or free the linked list nodes.
> 3. Reading values in the middle of the `SplDoublyLinkedList` is proportional 
> to the length of the list,
>    due to needing to traverse the linked list nodes.
> 4. `foreach` Iteration behavior cannot be understood without knowing what 
> constructed the
>    `SplDoublyLinkedList` instance or set the flags.
> 
> It would be useful to have an efficient `Deque` container in the standard 
> library
> to provide an alternative without those drawbacks,
> as well as for the following reasons:
> 
> 1. To save memory in applications or libraries that may need to store many 
> lists of values or run for long periods of time.
>    Notably, PHP's `array` type will never release allocated capacity.
>    See 
> https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html
> 2. To provide a better alternative to `SplDoublyLinkedList`, `SplStack`, and 
> `SplQueue`
>    for use cases that require stacks or queues.
> 3. As a more efficient option than `array` and `SplDoublyLinkedList`
>    as a queue or `Deque`, especially for `unshift`.
> 
> A `Deque` is more efficient than an `array` when used as a queue, more 
> readable, and easier to use correctly.
> While it is possible to efficiently remove elements from the start of an 
> `array` (in terms of insertion order) (though this makes 
> reset()/array_key_first() inefficient),
> it is very inefficient to prepend elements to the start of a large `array` 
> due to needing to either copy the array
> or move all elements in the internal array representation,
> and an `array` would use much more memory than a `Deque` when used that way 
> (and be slower).
> 
> There are also several pitfalls to using an array as a queue for larger queue 
> sizes,
> some of which are not obvious and discovered while writing the benchmarks.
> (Having a better (double-ended) queue datastructure (`Deque`) than the 
> `SplDoublyLinkedList`
> would save users from needing to write code with these pitfalls):
> 
> 1. `array_key_first()` and `reset()` take time proportional to the number of 
> elements `unset` from the start of an array,
>    causing it to unexpectedly be extremely slow (quadratic time) after 
> unsetting many elements at the start of the queue.
>    (when the array infrequently runs out of capacity, buckets are moved to 
> the front)
> 2. `reset()` or `end()` will convert a variable to a reference,
>    and php is less efficient at reading or writing to references.
>    Opcache is also less efficient at optimizing uses of variables using 
> references.
> 3. More obviously, `array_unshift` and `array_shift` will take time 
> proportional to the number of elements in the array
>    (to reindex and move existing/remaining elements).
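
The key bookkeeping behind those pitfalls can be seen in a small sketch. It only demonstrates observable behaviour, not timings, and uses `SplQueue` purely as an existing stand-in for a dedicated queue type.

```php
<?php
$queue = [10, 20, 30];

// unset() removes the first element in place but leaves a gap at the start of the array.
unset($queue[0]);
var_dump(array_key_first($queue)); // int(1)

// array_shift() removes the first element and renumbers the remaining integer keys.
$first = array_shift($queue);
var_dump($first); // int(20)
var_dump($queue); // only 30 remains, now under key 0

// A dedicated queue type hides that bookkeeping from the caller.
$spl = new SplQueue();
$spl->enqueue(10);
$spl->enqueue(20);
var_dump($spl->dequeue()); // int(10)
var_dump(count($spl));     // int(1)
```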

I plan to start voting on https://wiki.php.net/rfc/deque on Friday, February 
4th.

Several changes have been made to https://wiki.php.net/rfc/deque#changelog
after the feedback in https://externals.io/message/116100

- The class is now named `Collections\Deque`
- The api documentation in https://wiki.php.net/rfc/deque#proposal was expanded 
for methods.
- Benchmarks were updated.
- Like other standard datastructures, iteration over the deque is now over the 
original object (instead of creating a copy), 
  and mutating the deque will be reflected in `$iterator->current()` (and 
moving the end with push()/pop() will affect where iteration ends).
- Iteration will account for calls to shift/unshift moving the start of the 
deque:
  the offsets will be corrected and values won't be skipped or iterated over 
multiple times.
  (no matter how many iterators were created by `Deque->getIterator()`)
  See https://wiki.php.net/rfc/deque#iteration_behavior
- The get()/set() methods were removed, after feedback in 
https://externals.io/message/116100#116214

A WebAssembly demo is available at 
https://tysonandre.github.io/php-rfc-demo/deque/

Thanks,
Tyson


Re: [PHP-DEV] Adding `final class Deque` to PHP

2022-02-01 Thread tyson andre

Hi Levi Morrison,

> I think this RFC is in much better shape now.
> 
> The last thing I'll personally push for is dropping `get` and `set`.
> I'm not sure about those names, and the functionality is already
> provided by `offsetGet` and `offsetSet`, albeit through `mixed`
> instead of `int`, but I think this sort of cleanup should be done en
> masse at some point, rather than having this one type doing something
> different from the others.

The get/set methods have been removed. I've updated the RFC and implementation.

Thanks,
Tyson

Re: [PHP-DEV] Adding `final class Deque` to PHP

2022-01-31 Thread tyson andre

> > The name "deque" is used in the standard library of these languages:
> >
> >  - C++: std::deque
> >  - Java: java.util.Deque (interface)
> >  - Python: collections.deque
> >  - Swift: Collections.Deque (not standard lib, apparently, but Apple
> > official? Don't know Swift)
> >  - Rust: std::collections::VecDeque
> >
> > And these don't have it in the standard library:
> >  - Go
> >  - C#
> >  - C
> >  - JavaScript
> >
> > Anyway, it's pretty decent evidence that:
> >  1. This functionality is pretty widely used across languages.
> >  2. This functionality should have "deque" be in the name, or be the
> > complete name.
> >
> > Discussion naming for "vector" I can understand, as it's less widely
> > used or sometimes means something specific to numbers, but there's
> > little point in discussing the name "deque." It's well established in
> > programming, whether PHP programmers are aware of it or not.
> >
> > As I see it, the discussion should be centered around:
> >  1. The API Deque provides.
> >  2. The package that provides it.
> >  3. Whether Deque's API is consistent with other structures in the same
> > package.
> >  4. Whether this should be included in php-src or left to extensions.
> >
> > I suggest that we try to make PHP as homogenous in each bullet as we
> > can with other languages:
> >  1. Name it "Deque."
> >  2. Put it in the namespace "Collections" (and if included in core,
> > then "ext/collections").
> >  3. Support common operations on Deque (pushing and popping items to
> > both front and back, subscript operator works, iteration, size, etc)
> > and add little else (don't be novel here). To me, the biggest thing
> > left to discuss is a notion of a maximum size, which in my own
> > experience is common for circular buffers (the implementation chosen
> > for Deque) but not all languages have this.
> >  4. This is less clear, but I'm in favor as long as we can provide a
> > few other data structures at the same time. Obviously, this a voter
> > judgement call on the yes/no.
> >
> > Given that I've suggested the most common options for these across
> > many languages, it shouldn't be very controversial. The worst bit
> > seems like picking the namespace "Collections" as this will break at
> > least one package: https://github.com/danielgsims/php-collections. We
> > should suggest that they vendor it anyway, as "collections" is common
> > e.g. "Ramsey\Collections", "Doctrine\Common\Collections". I don't see
> > a good alternative here to "Collections", as we haven't agreed on very
> > much on past namespace proposals.
> >
> > Anyway, that's what I suggest.
> >
> 
> I generally agree with everything Levi has said here. I think that adding a
> deque structure generally makes sense, and that putting it into a new
> ext/collections extension under the corresponding namespace would be
> appropriate. I wouldn't insist on introducing it together with other
> structures (I'm a lot more sceptical about your Vector proposal), but do
> think there should be at least some overarching plan here, even if we only
> realize a small part of it in this version.
 
https://wiki.php.net/rfc/deque has been updated after 
https://wiki.php.net/rfc/deque_straw_poll.
It's now going in the namespace `Collections\`, and a new always-enabled 
extension `collections` 
is added in `php-src/ext/collections` in that RFC (for end users, that would 
mainly affect php.net manual organization).

I plan to start voting in a few days.

Also, iteration behavior has changed to 
https://wiki.php.net/rfc/deque#iteration_behavior

I also set up a WebAssembly demo to make it easier to look up how it currently 
behaves:
https://tysonandre.github.io/php-rfc-demo/deque/

> A few questions for the particular API proposed here:
> 
> 1. There would be the possibility of having an interface Deque that is
> backed by a VecDeque/ArrayDeque implementation. I'm not convinced this
> would actually make sense, but I wanted to throw it out there, given that
> the class is final (which is without any doubt the correct decision).

Would `ArrayDeque` be referring to something with a hardcoded maximum size?
I can't think of a strong use case for that, and it's possible in userland by 
wrapping `Collections\Deque` (with composition) and checking size on add 
operations.
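
A minimal sketch of that userland approach, assuming `SplQueue` as a stand-in for `Collections\Deque`; `BoundedQueue` and its `maxSize` parameter are hypothetical names.

```php
<?php
// Hypothetical userland wrapper; SplQueue stands in for Collections\Deque.
final class BoundedQueue implements Countable
{
    private SplQueue $items;

    public function __construct(private int $maxSize)
    {
        if ($maxSize < 1) {
            throw new InvalidArgumentException('maxSize must be at least 1');
        }
        $this->items = new SplQueue();
    }

    // Check the size on every add operation, as described above.
    public function push(mixed $value): void
    {
        if (count($this->items) >= $this->maxSize) {
            throw new OverflowException("Queue is limited to {$this->maxSize} elements");
        }
        $this->items->push($value);
    }

    public function shift(): mixed
    {
        return $this->items->shift();
    }

    public function count(): int
    {
        return count($this->items);
    }
}

$queue = new BoundedQueue(2);
$queue->push('a');
$queue->push('b');
try {
    $queue->push('c');
} catch (OverflowException $e) {
    echo $e->getMessage(), "\n"; // Queue is limited to 2 elements
}
```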

> 2. How do set() / offsetSet() work with an out of range index? Is it
> possible to push an element with $deque[count($deque)] = x? I would assume
> that throws, but want to make sure. On a related note, why do we need both
> get()/set() and offsetGet()/offsetSet()? These APIs seem redundant.

They throw for out of range indexes. Offsets between 0 and count-1 are in range.

offsetSet/offsetGet attempt to coerce values to integers to act as a drop-in 
replacement 
for arrays and Spl datastructures (e.g. for applications that deal with string 
inputs).
They throw TypeError if coercion fails, OutOfBoundsException for invalid 
offsets.
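
The offset rules above could be approximated in userland as follows. This is only a sketch under stated assumptions: `SketchSequence` is an invented class, it is not the RFC's C implementation, and appending with `[]` is left out.

```php
<?php
// Hypothetical userland sketch of the offset rules described above.
final class SketchSequence implements ArrayAccess, Countable
{
    private array $values = [];

    public function push(mixed $value): void
    {
        $this->values[] = $value;
    }

    public function count(): int
    {
        return count($this->values);
    }

    // Coerce integer-like strings, reject everything else, then check the range.
    private function toIndex(mixed $offset): int
    {
        $index = is_int($offset) ? $offset
            : (is_string($offset) ? filter_var($offset, FILTER_VALIDATE_INT) : false);
        if ($index === false) {
            throw new TypeError('Offset must be an integer or an integer-like string');
        }
        if ($index < 0 || $index >= count($this->values)) {
            throw new OutOfBoundsException("Offset $index is out of range");
        }
        return $index;
    }

    public function offsetExists(mixed $offset): bool
    {
        return is_int($offset) && $offset >= 0 && $offset < count($this->values);
    }

    public function offsetGet(mixed $offset): mixed
    {
        return $this->values[$this->toIndex($offset)];
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        $this->values[$this->toIndex($offset)] = $value;
    }

    public function offsetUnset(mixed $offset): void
    {
        throw new LogicException('Removing by offset is not part of this sketch');
    }
}

$seq = new SketchSequence();
$seq->push('a');
$seq->push('b');

$seq['1'] = 'B';   // the string '1' is coerced to the integer offset 1
var_dump($seq[1]); // string(1) "B"

$errors = [];
try {
    $seq[count($seq)] = 'c'; // one past the end: not an implicit push
} catch (OutOfBoundsException $e) {
    $errors[] = get_class($e);
}
try {
    $seq['x'] = 'c';         // not integer-like
} catch (TypeError $e) {
    $errors[] = get_class($e);
}
var_dump($errors); // OutOfBoundsException, then TypeError
```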

> 3. I believe it's pretty standard for Deque

[PHP-DEV] [VOTE] Straw poll: Naming pattern to use for Deque RFC

2022-01-12 Thread tyson andre

Hi internals,

Voting has started on https://wiki.php.net/rfc/deque_straw_poll , to gather 
feedback on the following options:

1. `\Deque`, the name currently used in the RFC/implementation.
   See https://wiki.php.net/rfc/deque#global_namespace
   This was my preference because it was short,
   making it easy to remember and convenient to use.
2. `\Collections\Deque` - seems like a reasonable choice of name for 
collections 
   (Deque, and possible future additions such as Vector, Set, Map, Sorted 
Sets/Maps, etc.).
   https://wiki.php.net/rfc/namespaces_in_bundled_extensions also allows using 
sub-namespaces, 
   and that may be used for things that aren't strictly collections,
   e.g. `Collections\Builder\SomethingBuilder` or functions operating on 
collections.
3. `\SplDeque`, similar to datastructures added to the Spl in PHP 5.3.

   (I don't prefer that name because SplDoublyLinkedList, SplStack,
   and SplQueue are subclasses of a doubly linked list with poor performance
   (accessing the offset in the middle of a linked list requires traversing 
half the linked list, for example),
   and this name would easily get confused with them (e.g. leading to renamings 
unexpectedly making performance much worse). 
   Also, historically, none of the functionality with that naming pattern has 
been final.
   However, good documentation (e.g. suggesting `*Deque` instead where possible 
in the manual)
   would make that less of an issue.)
   
Previous threads:
- https://externals.io/message/116112 “(Planned) Straw poll: Naming pattern for 
`*Deque`”
- https://externals.io/message/116100 Adding `final class Deque` to PHP

Thanks,
Tyson


Re: [PHP-DEV] (Planned) Straw poll: Naming pattern for `*Deque`

2022-01-11 Thread tyson andre

Hi Pierre,

> > While there is considerable division in whether or not members of internals 
> > want to adopt namespaces,
> > I hope that the final outcome of the poll will be accepted by members of 
> > internals
> > as what the representative of the majority of the members of internals
> > (from diverse backgrounds such as contributors/leaders of userland 
> > applications/frameworks/composer libraries written in PHP,
> > documentation contributors, PECL authors, php-src maintainers, etc. (all of 
> > which I expect are also end users of php))
> > want to use as a naming choice in future datastructure additions to PHP.
> > (and I hope there is a clear majority)
> >
> > -
> >
> > Are there any other suggestions to consider for namespaces to add to the 
> > straw poll?
> >
> > Several suggestions that have been brought up in the past are forbidden by 
> > the accepted policy RFC 
> > (https://wiki.php.net/rfc/namespaces_in_bundled_extensions)
> > and can't be used in an RFC.
> >
> > - `Spl\`, `Core\`, and `Standard\` are forbidden: "Because these extensions 
> > combine a lot of unrelated or only tangentially related functionality, 
> > symbols should not be namespaced under the `Core`, `Standard` or `Spl` 
> > namespaces.
> >   Instead, these extensions should be considered as a collection of 
> > different components, and should be namespaced according to these."
> > - More than one namespace component (`A\B\`) is forbidden
> > - Namespace names should follow CamelCase.
> Besides the namespace thing (collection is fine imho). What is the
> reason to have it final?

Do you want singular Collection included as an option in addition to plural in 
https://wiki.php.net/rfc/deque_straw_poll ?

The reasons it is a `final` class in this RFC are described in 
https://wiki.php.net/rfc/deque#final_class.
It's easier to change a `final` class to a non-final class with final methods 
if needed later on.
(final methods so that array access, etc. continues to be fast, easy to reason 
about, bug/crash-free, etc)
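
A sketch of that migration path, using hypothetical class names: a `final` class today, and a later relaxation where the class can be extended but its core methods stay `final`.

```php
<?php
// Hypothetical classes for illustration only.
final class SealedStack
{
    private array $items = [];

    public function push(mixed $value): void
    {
        $this->items[] = $value;
    }

    public function count(): int
    {
        return count($this->items);
    }
}

// A later, backwards-compatible relaxation: subclasses are allowed,
// but the core methods cannot be overridden.
class ExtensibleStack
{
    private array $items = [];

    final public function push(mixed $value): void
    {
        $this->items[] = $value;
    }

    final public function count(): int
    {
        return count($this->items);
    }
}

class BatchStack extends ExtensibleStack
{
    public function pushAll(iterable $values): void
    {
        foreach ($values as $value) {
            $this->push($value);
        }
    }
}

$stack = new BatchStack();
$stack->pushAll(['a', 'b']);
var_dump($stack->count());                                      // int(2)
var_dump((new ReflectionClass(SealedStack::class))->isFinal()); // bool(true)
```

Going the other way (making an extensible class `final` later) would break any existing subclasses, which is why starting with `final` is the more conservative choice.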

> For collection in general, would it make sense to have a common
> interface representing the minimum expected API? If possible, then
> algorithm specific on top? a bit like we have with the traversable
> interface and related.


php-ds does this as https://www.php.net/manual/en/class.ds-collection.php and 
I've been considering it.

Still,
- With union types and intersection types, it's still useful but isn't as 
compelling.
- There's the choice of namespacing to consider for the new namespace 
(`Collection` vs `Collections\Collection`).
- It didn't seem as useful until there were more datastructures to choose from 
and situations where more than one would be chosen.
- It couldn't be used until support for php <= 8.1 was dropped by 
applications/libraries, so it'd take a while to be adopted.
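
To illustrate the first point, here is a sketch of how an intersection type (PHP 8.1+) can already express a "minimum expected API" without a shared `Collection` interface. The `describe()` function is a hypothetical example; the classes passed to it are existing built-ins.

```php
<?php
// Requires PHP 8.1+ for the intersection type.
function describe(Countable&Traversable $items): string
{
    $parts = [];
    foreach ($items as $item) {
        $parts[] = (string) $item;
    }
    return count($items) . ' item(s): ' . implode(', ', $parts);
}

$queue = new SplQueue();
$queue->push(1);
$queue->push(2);

echo describe($queue), "\n";                 // 2 item(s): 1, 2
echo describe(new ArrayObject(['x'])), "\n"; // 1 item(s): x
```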

Thanks,
Tyson


[PHP-DEV] Re: (Planned) Straw poll: Naming pattern for `*Deque`

2022-01-09 Thread tyson andre

Hi internals,

> Because the naming choice for new datastructures is a question that has been 
> asked many times,
> I plan to create another straw poll (Single transferrable vote) on 
> wiki.php.net to gather feedback on the naming pattern to use for future 
> additions of datastructures to the SPL,
> with the arguments for and against the naming pattern.
> 
> https://wiki.php.net/rfc/namespaces_in_bundled_extensions recently passed.
> It permits using the same namespace that is already used in an extension,
> but offers guidance in choosing namespace names and allows for using 
> namespaces in new categories of functionality.
> 
> The planned options are:
> 
> 1. `\Deque`, the name currently used in the RFC/implementation. See 
> https://wiki.php.net/rfc/deque#global_namespace
> 
>    This was my preference because it was short, making it easy to remember 
> and convenient to use.
> 2. `\SplDeque`, similar to datastructures added to the `Spl` in PHP 5.3.
> 
>    (I don't prefer that name because `SplDoublyLinkedList`, `SplStack`, and 
> `SplQueue` are subclasses of a doubly linked list with poor performance,
>    and this name would easily get confused with them. Also, historically, 
> none of the functionality with that naming pattern has been final.
>    However, good documentation (e.g. suggesting `*Deque` instead where 
> possible in the manual) would make that less of an issue.)
> 
>    See https://wiki.php.net/rfc/deque#lack_of_name_prefix (and arguments for 
> https://externals.io/message/116100#116111)
> 3. `\Collection\Deque` - the singular form is proposed because this might 
> grow long-term to contain not just collections,
>    but also functionality related to collections in the future(e.g. helper 
> classes for building classes
>    (e.g. `ImmutableSequenceBuilder` for building an `ImmutableSequence`), 
> global functions, traits/interfaces,
>    collections of static methods, etc.
>    (especially since 
> https://wiki.php.net/rfc/namespaces_in_bundled_extensions prevents more than 
> one level of namespaces)
> 
>    Additionally, all existing extension names in php-src are singular, not 
> plural. https://github.com/php/php-src/tree/master/ext 
>    (Except for `sockets`, but that defines `socket_*` and `class Socket` and 
> I'd assume it would be named `Socket\` anyway, the rfc didn't say exactly 
> match?)
> 
>    So the namespace's contents might not just be `Collections`, but rather 
> all functionality related to a `Collection`)
>    Also, the examples in the "namespaces in bundled extension" RFC were all 
> singular
> 
>    > For example, the `array_is_list()` function added in PHP 8.1 should 
> indeed be called `array_is_list()`
>    > and should not be introduced as `Array\is_list()` or similar.
>    > Unless and until existing `array_*()` functions are aliased under an 
> Array\* namespace,
>    > new additions should continue to be of the form `array_*()` to maintain 
> horizontal consistency.
>
>**NOTE: This later was changed to `Collections\\`, I misread the 
> namespaces in bundled extensions RFC, and sub-namespaces are allowed**
> 
>    See https://wiki.php.net/rfc/deque#global_namespace (and 
> https://externals.io/message/116100#116111)
> 
>    Also, straw polls for other categories of functionality 
> (https://wiki.php.net/rfc/cachediterable_straw_poll#namespace_choices) 
>    had shown interest of around half of voters in adopting namespaces,
>    there was disagreement about the best namespace to use (e.g. none that 
> were preferred to the global namespace),
>    making me hesitant to propose namespaces in any RFC. For an ordinary 
> collection datastructure, the situation may be different.
> 
> While there is considerable division in whether or not members of internals 
> want to adopt namespaces,
> I hope that the final outcome of the poll will be accepted by members of 
> internals 
> as what the representative of the majority of the members of internals 
> (from diverse backgrounds such as contributors/leaders of userland 
> applications/frameworks/composer libraries written in PHP,
> documentation contributors, PECL authors, php-src maintainers, etc. (all of 
> which I expect are also end users of php))
> want to use as a naming choice in future datastructure additions to PHP.
> (and I hope there is a clear majority)
> 
> -
> 
> Are there any other suggestions to consider for namespaces to add to the 
> straw poll?
> 
> Several suggestions that have been brought up in the past are forbidden by 
> the accepted policy RFC 
> (https://wiki.php.net/rfc/namespaces_in_bundled_extensions)
> and can't be used in an RFC.
> 
> - `Spl\`, `Core\`, and `Standard\` are forbidden: "Because these extensions 
> combine a lot of unrelated or only tangentially related functionality, 
> symbols should not be namespaced under the `Core`, `Standard` or `Spl` 
> namespaces.
>   Instead, these extensions should be considered as a collection of different 
> components, and should

Re: [PHP-DEV] Cache zend_function*

2021-11-30 Thread tyson andre


Hi Glash Gnome,

> I'm doing the Cairo C API extension.
> Also there is a wrapper written in php for the OOP side( example:
> https://github.com/gtkphp/gtkphp/blob/main/lib/Cairo/Context.php)
> 
> So far, so good.
> 
> Now let's do the same thing with Gtk,
> (https://github.com/gtkphp/gtkphp/blob/main/lib/Gtk/Widget.php#L7)
> Luckily I can *store zend_function* pointer in the GtkWidgetClass*( C-like
> OOP)
> 
> Finally, I do the same thing for GHashTable( C API + php OOP)
> But now I need to *use a global zend_array/hash to store the overridden
> methods*
> for the same reasons as https://github.com/php/php-src/pull/7695
> 
> 
> I think it is better( more generic, simple to understand) to *overload the
> zend_class_entry* .
> 
> Do you think this is a good idea?
> Is this possible ?
> Do you have a solution for me?

Are you talking about all methods or just ArrayAccess?

If you're talking about 
https://github.com/gtkphp/php-ext-gtk-src/blob/master/php_glib/hash-table.c
it's possible to associate the table of overridden methods with the instance of 
the object,
and look it up in a C property of `intern->methods` to call the overridden 
method.

- If you're talking about avoiding doing a hash table lookup on every method 
call to an instance, you can use a `methods` property.

https://github.com/php/php-src/blob/PHP-8.1.0/ext/spl/spl_fixedarray.c#L241-L301
 does that - see create_object, spl_fixedarray_new, and `spl_fixedarray_object 
*intern;`
(I implemented that in https://github.com/php/php-src/pull/6552)

It turns out review comments mentioned something similar about `ArrayAccess`.
At the time I wasn't as familiar with how it'd be done for all classes and work 
with inheritance, though I think it's possible.

- If `arrayaccess_funcs_ptr` were added to php for ArrayAccess, 
`instance->ce->arrayaccess_funcs_ptr->offsetget->scope != my_base_class_entry` 
could be used to check if the internal implementation was overridden.
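
As a rough userland analogy of that scope check (a sketch only, not the C code an extension would use), reflection can report which class declares the method an instance resolves to. `BaseTable`, `UserTable`, and `isOverridden()` are hypothetical names made up for this illustration.

```php
<?php
// Hypothetical classes for illustration; not part of any real extension.
class BaseTable
{
    public function offsetGet(mixed $offset): mixed
    {
        return 'internal';
    }
}

class UserTable extends BaseTable
{
    public function offsetGet(mixed $offset): mixed
    {
        return 'overridden';
    }
}

// True when $method resolves to an implementation declared outside $baseClass.
function isOverridden(object $instance, string $method, string $baseClass): bool
{
    $declaring = (new ReflectionMethod($instance, $method))->getDeclaringClass();
    return $declaring->getName() !== $baseClass;
}

var_dump(isOverridden(new BaseTable(), 'offsetGet', BaseTable::class)); // bool(false)
var_dump(isOverridden(new UserTable(), 'offsetGet', BaseTable::class)); // bool(true)
```

In C the comparison would be a pointer check against the cached `zend_function`'s scope, which avoids any lookup by name at call time.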

> Ideally these methods would be cached in the class_entry, but that's a larger 
> change.


```c
// Declared in Zend/zend.h
// Initialized in Zend/zend_interfaces.c
/* allocated only if class implements Iterator or IteratorAggregate interface */
zend_class_iterator_funcs *iterator_funcs_ptr;
```

Regards,
Tyson

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php

Re: [PHP-DEV] PHP 8.1 and PECL ext builds for Windows

2021-11-14 Thread tyson andre

Hi internals,

> > Removing the centralized PECL builder and dependency manager would most 
> > likely lead to a huge regression in the support and manageability. Right 
> > now there's one place pecl.php.net to go for the non core extension builds 
> > and any dependencies are guaranteed to be non conflicting. If this gets 
> > decentralized, the effort is moved to the extension maintainers which will 
> > most likely mean the chaos in where to get a DLL, DLL hell issues, absent 
> > DLL because the configuration is hard. This will steadily lead to the 
> > situation that was there before.
> >
> > IMO even keeping the basic version of the centralized approach even having 
> > a sporadic chance to fix issues is a far better way to go than dropping the 
> > existing achievements. Also in the long run, other approaches like moving 
> > to vcpkg for deps, checking on other things like cmake and pickle might be 
> > a good way, if there's  a community interest. More volunteers on the 
> > community side would be great in this sense, too.
> 
> Good points, thank you for bringing them up!  I have to fully agree that
> we should not drop the central point of distribution (i.e.
> windows.php.net).  I don't think, however, that we can stick with the
> current PECL build system for long.  Maybe the biggest issue is that
> extension maintainers may see automatic DLL builds as a given, or at
> least may not be able to fix things, because only few have access to the
> build machine.  And even if that was not an issue, not many more would
> know where to look at.  In other words, the bus factor is very low, and
> it may happen at some point in time, that no new DLLs would be built for
> *any* extension.
> 
> This is why I still think it would be good to shift some of the burden
> of maintaining Windows builds to extension maintainers is a good thing.
>  It is not about making their job harder, but rather about preventing
> serious issues, and also to correct expectations; extension maintainers
> might well assume that their extension is supported on common Linux
> distros, but they shouldn't *assume* it is supported on Windows as well
> (let alone that the dependency libraries have fixes for all known
> relevant security issues).
> 
> Even if extensions are developed solely on Linux (and most are, as far
> as I know), they should have some Windows CI (at least doing the actual
> builds; better to run the test suite as well, of course), and that
> shouldn't be a real problem – there are several CI providers which are
> free for OSS projects.  We should do our best to provide them with
> appropriate tools, so Windows CI integration can be set up as easily as
> for Linux phpize builds.  That would not solve the issues regarding
> dependencies, but appears to be a reasonable first step in the right
> direction.

With the php 8.1.0 stable release happening on Nov 25 
(https://wiki.php.net/todo/php81),
what decision ended up being made? (I couldn't tell from the thread whether it 
was still being discussed.)
Will Windows DLLs for PECLs be published for PHP 8.1 after 8.1.0 stable, or 
not?
I saw the proposal but didn't see any public announcement of plans,
and DLLs had usually been published earlier than this.

- If there are plans to get 8.1.0 working, what work is remaining (e.g. is 
there an issue tracker/list of tasks)?
  (I'm a Linux user, but I'd hope even if the windows team didn't have time, 
developers from large organizations may have time to look into those issues or 
get builds for individual extensions working, if those organizations used 
Windows and were migrating to php 8.1)

https://externals.io/message/114751#114759 sounded like there were plans to
build DLLs for PECLs with GitHub workflows instead of the current machine,
but I'm not sure of the status of those plans.

If php 8.1 DLL support was being dropped, I saw nothing communicating a change 
(or status of getting DLL builds+publishing working for php 8.1)
in the following places:

- https://news-web.php.net/group.php?group=php.internals.win
- https://marc.info/?l=pecl-dev&w=2&r=1&s=dll&q=b
- https://windows.php.net/ ("Where are the PECL DLLs" is an unrelated 
announcement that was resolved)
- https://twitter.com/official_php

As an arbitrary example, https://pecl.php.net/package/xdebug/3.1.1/windows 
mentions 
"In case of missing DLLs, consider to contact the PHP for Windows Team."

- If DLL publishing would end up being discontinued (or delayed) for PHP 8.1+, 
the footer common to all PECLs should be updated to indicate that.

I was delaying working on publishing DLLs for PECL releases until I was certain 
what the decision was,
or if DLLs would continue to be published.

Thanks,
Tyson


Re: [PHP-DEV] Adding `final class Deque` to PHP

2021-10-04 Thread tyson andre

Hi Nikita Popov,

> 1. There would be the possibility of having an interface Deque that is
> backed by a VecDeque/ArrayDeque implementation. I'm not convinced this
> would actually make sense, but I wanted to throw it out there, given that
> the class is final (which is without any doubt the correct decision).

Do you mean ArrayDeque for a hardcoded max capacity?
I'm also not convinced there's a use case.

> 2. How do set() / offsetSet() work with an out of range index? Is it
> possible to push an element with $deque[count($deque)] = x? I would assume
> that throws, but want to make sure. On a related note, why do we need both
> get()/set() and offsetGet()/offsetSet()? These APIs seem redundant.

It isn't possible to set an out of range index - both throw 
`\OutOfBoundsException`

In order to support the ArrayAccess `$deque[$offset] = $value;` or 
`$deque[$offset]` shorthand,
the offsetGet/offsetSet needed to be implemented to follow conventions.
(because of ArrayAccess, offsetGet must accept mixed offsets)

Those aren't type safe, though, so get()/set() are provided as a type-safe 
alternative
that will throw a TypeError, for use cases that would benefit from runtime type 
safety.
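
A minimal userland sketch of that difference, assuming a hypothetical `SketchSequence` class (the proposed `Deque` itself is implemented in C). The lenient `(int)` cast inside the ArrayAccess methods is an assumption chosen only to show what a `mixed` offset permits; it is not a statement about how the RFC handles such offsets.

```php
<?php
// Hypothetical userland stand-in; the proposed Deque itself is implemented in C.
final class SketchSequence implements ArrayAccess
{
    /** @var list<mixed> */
    private array $values;

    public function __construct(array $values)
    {
        $this->values = array_values($values);
    }

    // Typed accessor: a non-numeric string offset is rejected with a TypeError.
    public function get(int $offset): mixed
    {
        if ($offset < 0 || $offset >= count($this->values)) {
            throw new OutOfBoundsException("Offset $offset is out of range");
        }
        return $this->values[$offset];
    }

    public function set(int $offset, mixed $value): void
    {
        $this->get($offset); // reuse the range check
        $this->values[$offset] = $value;
    }

    // ArrayAccess requires mixed offsets; this sketch casts them leniently (assumption).
    public function offsetGet(mixed $offset): mixed
    {
        return $this->get((int) $offset);
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        if ($offset === null) {
            throw new OutOfBoundsException('Appending with [] is not supported in this sketch');
        }
        $this->set((int) $offset, $value);
    }

    public function offsetExists(mixed $offset): bool
    {
        return is_int($offset) && $offset >= 0 && $offset < count($this->values);
    }

    public function offsetUnset(mixed $offset): void
    {
        throw new RuntimeException('Removal by offset is not supported in this sketch');
    }
}

$seq = new SketchSequence(['a', 'b']);
var_dump($seq['abc']); // string(1) "a" ('abc' is silently cast to 0)
try {
    $seq->get('abc');
} catch (TypeError $e) {
    echo "TypeError from get()\n";
}
```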

> 3. I believe it's pretty standard for Deque implementations to provide
> insert() / remove() methods to insert at any position (these would be O(n)).

https://www.php.net/manual/en/class.splqueue.php didn't offer that 
functionality. 

https://www.php.net/manual/en/class.ds-deque.php did, apparently.

> 4. The design of getIterator() here is rather unusual, and deserves some
> additional justification. Normally, we let getIterator() see concurrent
> modifications to the structure, though the precise behavior is unspecified.
> I would also like to know how this will look like on a technical level (it
> doesn't seem to be implemented yet?) This seems like something that will
> require a check on every write operation, to potentially separate the
> structure in some form.
 
The original plan was to copy the entire array on iterator creation,
to imitate the immediate copy nature of foreach on arrays.

This was assuming that `foreach` over a Deque without removing elements would 
be a rare use case.

That may have been a mistake since `foreach (clone $deque as $key => $value)` 
can be done to explicitly do that.
There're around 4 approaches I could take, with different tradeoffs:

1. Iterate over $offset = 0 and increment offset in calls to 
InternalIterator->next() until exceeding the size of the deque, not copying the 
deque.

   That's the **actual** current implementation, but would be unintuitive with 
shift()/unshift()

   This would repeat elements on unshift(), or skip over elements when shift() 
is called.

   The current implementation of `Ds\Deque` seems to do the same thing, but 
there's a similar discussion in its issue tracker in 
https://github.com/php-ds/ext-ds/issues/17


2. Similar iteration behavior, but also have it relative to a uint64 indicating 
the number of elements removed from the front of the deque minus the 
number of elements added to the front of the deque.

    (e.g. if the current Deque internalOffset is 123, the InternalIterator 
returned by getIterator would start with an offset of 123 and increase it when 
advancing.
    Calls to shift would raise the Deque internalOffset, calls to unshift would 
decrease it (by the number of elements).
    This would be independent of the circular buffer offset.
    And the InternalIterator would increase its own offset to catch up if 
the signed offset difference became negative, e.g. by calling shift() twice in 
a foreach block)

3. Behave as if the entire Deque was copied when iteration starts (either by 
actually copying everything or by copy on write).

   That was the **planned** implementation documented in the RFC, but would be 
inefficient for iterations that end early and use a lot of memory for large 
Deques.

4. Throw if attempting to change the size of the `Deque` while a foreach is in 
progress.

   The semantics of that would be annoying, e.g. handling `clear()` or 
`shift()`, and the exception getting thrown would be unexpected.
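
To make approach 2 concrete, here is a rough userland model (an illustration under assumptions, not the C implementation). `SketchDeque` and its `internalOffset` counter are hypothetical; a plain PHP array stands in for the circular buffer, and a generator plays the role of the InternalIterator.

```php
<?php
// Hypothetical userland model of approach 2; not the RFC's C implementation.
final class SketchDeque implements IteratorAggregate
{
    /** @var list<mixed> */
    private array $items = [];

    // Number of shift() calls minus number of unshift() calls so far.
    private int $internalOffset = 0;

    public function push(mixed $value): void
    {
        $this->items[] = $value;
    }

    public function unshift(mixed $value): void
    {
        array_unshift($this->items, $value);
        $this->internalOffset--;
    }

    public function shift(): mixed
    {
        if ($this->items === []) {
            throw new UnderflowException('Cannot shift from an empty deque');
        }
        $this->internalOffset++;
        return array_shift($this->items);
    }

    public function getIterator(): Iterator
    {
        // The iterator remembers an absolute position, not a buffer index.
        $position = $this->internalOffset;
        while (true) {
            if ($position < $this->internalOffset) {
                $position = $this->internalOffset; // catch up after shift()
            }
            $index = $position - $this->internalOffset;
            if ($index >= count($this->items)) {
                return;
            }
            yield $index => $this->items[$index];
            $position++;
        }
    }
}

$deque = new SketchDeque();
foreach (['a', 'b', 'c'] as $value) {
    $deque->push($value);
}

$seen = [];
foreach ($deque as $value) {
    $seen[] = $value;
    if ($value === 'a') {
        $deque->unshift('z'); // would repeat 'a' under approach 1
    }
}
var_dump($seen === ['a', 'b', 'c']); // bool(true)
```

With approach 1 (a bare index starting at 0), the same loop would visit `'a'` twice after the `unshift()`, and a `shift()` inside the loop would skip an element.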

> 5. The shift/unshift terminology is already used by our array API, but I'm
> not sure it's the most intuitive choice in the context of a deque. SplQueue
> has enqueue() and dequeue(). Another popular option from other languages
> (which I would personally favor) is pushFront() and popFront().

My original name idea was pushBack/popBack/pushFront/popFront but I had decided 
against proposing brand new terms
for functionality that already had terms in php-src.
It would also be inconsistent with proposed names for `Vector` if that was also 
added.

https://www.php.net/manual/en/class.ds-deque.php and SplDoublyLinkedList had 
done the same thing

enqueue and dequeue don't have a good term for enqueueing on the opposite side.

- It would make it easier to learn the language to have fewer terms and migrate

Re: [PHP-DEV] Unified ReflectionType methods

2021-10-02 Thread tyson andre

Hi Andreas,

> Hello list,
> I would like to propose new methods for ReflectionType, that would
> allow treating ReflectionNamedType and ReflectionUnionType in a
> unified way.
> This would eliminate the need for if (.. instanceof) in many use cases.
> 
> Some details can still be discussed, e.g. whether 'null' should be
> included in builtin type names, whether there should be a canonical
> ordering of type names, whether we should use class names as array
> keys, etc.
> ... 
> What do you think?

Relatedly, I also had different ideas lately about new methods for 
ReflectionType, though of a different form.

1. To simplify code that would check `instanceof` for all current and future 
types such as `never` and `mixed` and intersection types
`ReflectionType->allowsValue(mixed $value, bool $strict = true): bool`

   Maybe also `allowsClass(string $className, bool $strict = true): bool` to 
avoid needing to instantiate values (weak casting allows Stringable->string).
2. To simplify code generation, e.g. in mocking libraries for unit testing: 
`ReflectionType->toFullyQualifiedString(): string` (e.g. `\A|\B`) (may need to 
throw ReflectionException for types that can't be resolved, e.g. `parent` in 
reflection of traits, keep `static` as is)

(The raw output of `__toString()` isn't prefixed with `\` (e.g. `A`) and 
can't be used in namespaces)
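
A userland approximation of the second idea might look like the sketch below. `sketchFullyQualified()` is a hypothetical helper (no such method exists on `ReflectionType`), and it keeps `self`, `static`, and `parent` as written rather than resolving them or throwing, which is a simplifying assumption.

```php
<?php
// Hypothetical helper; ReflectionType has no toFullyQualifiedString() method.
function sketchFullyQualified(ReflectionType $type): string
{
    if ($type instanceof ReflectionUnionType) {
        return implode('|', array_map(
            // Parenthesize intersection members (DNF types in PHP 8.2+).
            fn (ReflectionType $t) => $t instanceof ReflectionIntersectionType
                ? '(' . sketchFullyQualified($t) . ')'
                : sketchFullyQualified($t),
            $type->getTypes()
        ));
    }
    if ($type instanceof ReflectionIntersectionType) {
        return implode('&', array_map(__FUNCTION__, $type->getTypes()));
    }
    if (!$type instanceof ReflectionNamedType) {
        throw new ReflectionException('Unsupported type: ' . $type);
    }

    $name = $type->getName();
    $lower = strtolower($name);
    // Builtins and relative names (self/static/parent) are kept as written.
    $keep = $type->isBuiltin() || in_array($lower, ['self', 'static', 'parent'], true);
    $out = $keep ? $name : '\\' . $name;
    $nullable = $type->allowsNull() && !in_array($lower, ['null', 'mixed'], true);
    return $nullable ? '?' . $out : $out;
}

class A {}
class B {}
function example(A|B $x, ?A $y, int $z): void {}

foreach ((new ReflectionFunction('example'))->getParameters() as $parameter) {
    echo $parameter->getName(), ': ', sketchFullyQualified($parameter->getType()), "\n";
}
// x: \A|\B
// y: ?\A
// z: int
```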

The fact that there are both intersection and union types (and the possibility 
of union types of full intersection types)
makes it hard for me to believe that getBuiltinTypes and getBuiltinClasses 
would be used correctly.

Thanks,
Tyson

Re: [PHP-DEV] Allowing `(object)['key' => 'value']` in initializers?

2021-09-25 Thread tyson andre

Hey Marco Pivetta,

> > What are your thoughts on allowing the `(object)` cast in initializer types 
> > where `new` was already allowed, but only when followed by an array literal 
> > node. (e.g. continue to forbid `(object)SOME_CONSTANT`) (see 
> > https://wiki.php.net/rfc/new_in_initializers)
> > ...
> > Reasons:
> > - The ability to construct empty stdClass instances but not non-empty ones 
> > is something users would find surprising,
> >   and a lack of support for `(object)[]` be even more inconsistent if 
> > factory methods were allowed in the future.
> > - stdClass is useful for some developers, e.g. in unit tests, when using 
> > libraries requiring it for parameters,
> >   when you need to ensure data is encoded as a JSON `{}` rather than `[]`, 
> > etc.
> > - It would help developers write a clearer api contract for methods,
> >   e.g. `function setData(stdClass $default = (object)['key' => 'value'])`
> >   is clearer than `function setData(?stdClass $default = null) { $default 
> > ??= (object)['key' => 'value']; `
> > - stdClass may be the only efficient built-in way to represent objects with 
> > arbitrary names if RFCs such as https://externals.io/message/115800 passed

>
> Right now, there's even an interest in getting rid (or deprecating) dynamic 
> properties on objects: why push the complete opposite ways?
> 
> What is the actual value of using an stdClass instance instead of an 
> `array` (already supported)?

My original message had a section with reasons why an end user might want that.

There's a push for getting rid of (or deprecating) dynamic properties on 
**objects that are not stdClass (or subclasses)**,
not a push for getting rid of stdClass. Way too many things use stdClass for it 
to be removed
(whether or not stdClass gets aliased or even renamed to DynamicObject).

```
php > var_dump(json_decode('{}'));
object(stdClass)#1 (0) {
}
```
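
The JSON `{}` versus `[]` distinction from the list of reasons can be checked the same way. This is a small illustrative snippet using only `json_encode` and `json_decode`:

```php
<?php
// Empty arrays encode as a JSON list; empty stdClass instances encode as a JSON object.
var_dump(json_encode([]));                     // string(2) "[]"
var_dump(json_encode((object)[]));             // string(2) "{}"
var_dump(json_encode(['data' => (object)[]])); // string(11) "{"data":{}}"
var_dump(json_decode('{"key":"value"}')->key); // string(5) "value"
```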

Thanks,
Tyson

[PHP-DEV] Allowing `(object)['key' => 'value']` in initializers?

2021-09-25 Thread tyson andre



Hi internals,

In PHP 8.1, it is possible to construct an instance of any class with `new` in 
an initializer, after the approval of https://wiki.php.net/rfc/new_in_initializers

```
php > static $x1 = new ArrayObject(['key' => 'value']);
php > static $x2 = new stdClass();
php > static $x3 = (object)['key' => 'value'];

Fatal error: Constant expression contains invalid operations in php shell code 
on line 1
```

What are your thoughts on allowing the `(object)` cast in initializer types 
where `new` was already allowed, but only when followed by an array literal 
node. (e.g. continue to forbid `(object)SOME_CONSTANT`) (see 
https://wiki.php.net/rfc/new_in_initializers)

stdClass has never implemented a factory method such as `__set_state` (which is 
not yet allowed). Instead, `(object)[]` or the `(object)array()` shorthand is 
typically used when a generic object literal is needed. This is also how php 
represents objects in var_export.

```
php > var_export(new stdClass());
(object) array(
)
```

Reasons:
- The ability to construct empty stdClass instances but not non-empty ones is 
something users would find surprising,
  and a lack of support for `(object)[]` would be even more inconsistent if 
factory methods were allowed in the future.
- stdClass is useful for some developers, e.g. in unit tests, when using 
libraries requiring it for parameters,
  when you need to ensure data is encoded as a JSON `{}` rather than `[]`, etc.
- It would help developers write a clearer api contract for methods,
  e.g. `function setData(stdClass $default = (object)['key' => 'value'])`
  is clearer than `function setData(?stdClass $default = null) { $default ??= 
(object)['key' => 'value']; ` 
- stdClass may be the only efficient built-in way to represent objects with 
arbitrary names if RFCs such as https://externals.io/message/115800 
  passed
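
For comparison, the workaround available today (the longer form mentioned in the API contract bullet) can be written out in full. The `stdClass` return type is added here only so the result can be inspected; it is not part of the original example.

```php
<?php
// Works today: default to null and fill in the stdClass at runtime.
function setData(?stdClass $default = null): stdClass
{
    $default ??= (object)['key' => 'value'];
    return $default;
}

var_dump(setData()->key);                           // string(5) "value"
var_dump(setData((object)['key' => 'other'])->key); // string(5) "other"
```

The drawback is that the signature no longer documents the real default, and callers can pass `null` explicitly.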

Thanks,
Tyson

Re: [PHP-DEV] Adding `final class Deque` to PHP

2021-09-21 Thread tyson andre

Hi Levi Morrison,

> > "Maximum size" shouldn't be an issue for `Deque` specifically, it's a 
> > `size_t`. It isn't an issue for SplFixedArray and ArrayObject.
> > (or for
> > PHP would just encounter a fatal error due to either
> 
> You wrote a lot, but unfortunately it was based on a misunderstanding.
> In some languages you can set the maximum allowed number of items a
> specific Deque can hold. For example (pseudo-code):
> 
>     let deque = new Deque(max_capacity: 3)
>     deque.push_back(1)
>     deque.push_back(2)
>     deque.push_back(3) # okay, it's now full
> 
>     deque.push_back(4) # !
> 
> In this condition, they either error or remove the earliest.
> 
> It's okay if the proposed Deque doesn't add this capability, but it's
> the only remaining major functionality which some languages have but
> not the others; it should at least be discussed, I think.

Oh, I hadn't remembered seeing that before. That makes sense.
All of the implementations I'd remembered were unlimited.
(https://cplusplus.com/reference/deque/deque/max_size/ was a system limit, for 
example)

I can't think of a common use case for that in php.
If there was one, though, I strongly believe it shouldn't be the same class 
(and shouldn't be a subclass),
in order to ensure the behavior of push or other operations remains easy to 
reason about.
This can be done in userland as a userland class.

- (e.g. with a Deque instance property and runtime `count() < 
$this->maxCapacity` checks,
  to choose their own desired return value or Throwable subclass, or with a 
manual array and circular buffer in PHP)
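
A minimal sketch of such a userland class, assuming `SplDoublyLinkedList` as a stand-in for the proposed `Deque` (which is not available in PHP). `BoundedDeque`, `maxCapacity`, and the choice of `OverflowException` are hypothetical; an application could just as well drop the oldest element instead of throwing.

```php
<?php
// Hypothetical userland wrapper; SplDoublyLinkedList stands in for the proposed Deque.
final class BoundedDeque implements Countable
{
    private SplDoublyLinkedList $items;

    public function __construct(private int $maxCapacity)
    {
        $this->items = new SplDoublyLinkedList();
    }

    public function push(mixed $value): void
    {
        $this->ensureRoom();
        $this->items->push($value);
    }

    public function unshift(mixed $value): void
    {
        $this->ensureRoom();
        $this->items->unshift($value);
    }

    public function pop(): mixed
    {
        return $this->items->pop();
    }

    public function shift(): mixed
    {
        return $this->items->shift();
    }

    public function count(): int
    {
        return count($this->items);
    }

    // The application, not the data structure, decides what "full" means.
    private function ensureRoom(): void
    {
        if (count($this) >= $this->maxCapacity) {
            throw new OverflowException("Capacity of {$this->maxCapacity} reached");
        }
    }
}

$bounded = new BoundedDeque(3);
$bounded->push(1);
$bounded->push(2);
$bounded->push(3); // okay, it's now full
try {
    $bounded->push(4);
} catch (OverflowException $e) {
    echo $e->getMessage(), "\n"; // Capacity of 3 reached
}
```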

Thanks,
Tyson


Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-21 Thread tyson andre

Hi Mike Shinkel,

> >> Hmm. I must have missed that thread as I was definitely following the list 
> >> at that time. 
> >> 
> >> But I found the thread, which only had three (3) comments from others:
> >> 
> >> https://externals.io/message/112639
> >> 
> >> From Levi Morrison it seems his objection was to adding `push()` and 
> >> `pop()` to a class including the name "Fixed."  Levi suggested 
> >> soft-deprecating `SplStack` because it was implemented as a linked-list, 
> >> but he proposed adding `Spl\ArrayStack` or similar, so it seems he was 
> >> open to iterating on the `Spl` classes in general (no pun intended.) 
> >> 
> >> From Nikita is seemed that he did not object so much as comment on Levi's 
> >> suggestion of adding `Spl\ArrayStack` and suggested instead an `SqlDeque` 
> >> that would handle queue usage more efficiently that plain PHP arrays.
> >> 
> >> So I think those responses were promising, but that you did not followed 
> >> up on them. I mean no disrespect — we all get busy, our priorities change, 
> >> and things fall off our radar

I said that **in response to you suggesting adding functionality to 
`SplFixedArray`** as the reason why I am not adding functionality to 
`SplFixedArray`.
Those responses were promising for functionality that is not about 
`SplFixedArray`.

The `Vector` is a name choice. `SplArrayStack` and a `Vector` would have very 
similar performance characteristics and probably identical internal 
representations.
However, a more expansive standard library such as `contains`, `map`, `filter`, 
`reduce`, etc. makes more sense on a List/Vector
than a `Stack` if you're solely going based on the name - when you hear 
`Stack`, you mostly think of pushing or popping from it.

As you said also below in your followup response, I am following up on the 
suggestion for a `Deque`.

>  — but it feels to me like you might have more success pursing your use-cases 
> related to the `Spl` classes than via a pure `Vector` class.

It's hard to know which approach (namespaces such as Collection\, SplXyz, or 
short names) will succeed without actually creating an RFC.
I'd mentioned my personal reasons for expecting Spl not to be the best choice.
Any email discussion only has comments from a handful of people with different 
arguments and preferences,
and many times more people might vote on the final RFC

> > Experience in past RFCs gave me the impression that if:
> > 
> > 1. All of the responses are suggesting using a different approach(php-ds, 
> > arrays),
> > 2. Other comments are negative or uninterested.
> > 3. None of the feedback on the original idea is positive or interested in 
> > it.
> > 
> > When feedback was like that, voting would typically have mostly "no" 
> > results.
> 
> Understood, but for clarity I was implying that wanting to change 
> `SplFixedArray` was an "XY problem" and that maybe the way to address your 
> actually use-cases was to pursue other approaches that people were 
> suggesting, which _is_ what you did yesterday.  :-)
>
> >> Maybe propose an `SplVector` class that extends `SplFixedArray`, or 
> >> something similar that addresses the use-case and with a name that people 
> >> can agree on?
> > 
> > I'd be stuck with all of the features in `SplFixedArray` that get 
> > introduced later and its design decisions.
> 
> You wouldn't be stuck with all the feature of `SplFixedArray` if you did 
> "something similar." 

> (I make this point only as it seems you have dismiss one aspect of my 
> suggestion while not acknowledging the alternatives I present. Twice now, at 
> least.)

I'm not sure which of the multiple suggestions you brought up you're 
referring to as "something similar".
Your original suggestion I responded to was to modify "SplFixedArray", so
I assumed you were suggesting that I change my RFC to focus on SplFixedArray.
I had the impression after feedback that those changes to SplFixedArray would 
overwhelmingly fail, especially due to it being named "Fixed".

I don't consider making it a subclass of SplFixedArray a good design decision.
It's possible to invoke methods on base classes with `ReflectionMethod` so I 
can't make `Vector` a subclass of `SplFixedArray` with an entirely different 
implementation.
So any memory SplFixedArray wastes (e.g. to mitigate bugs already found or 
found in the future) is also wasted in any subclass of SplFixedArray.
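
The `ReflectionMethod` point can be illustrated with two hypothetical classes (`BaseList` and `DerivedList` are made up for this sketch): invoking the base class's method through reflection bypasses the override, so a subclass cannot rely on its own methods being the only entry points.

```php
<?php
// Hypothetical classes for illustration only.
class BaseList
{
    public function describe(): string
    {
        return 'BaseList::describe';
    }
}

class DerivedList extends BaseList
{
    public function describe(): string
    {
        return 'DerivedList::describe';
    }
}

$list = new DerivedList();
var_dump($list->describe()); // string(21) "DerivedList::describe"

// Reflection on the base class's method calls the base implementation directly.
$baseMethod = new ReflectionMethod(BaseList::class, 'describe');
var_dump($baseMethod->invoke($list)); // string(18) "BaseList::describe"
```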


- Additionally, this has the same problem as `SplDoublyLinkedList` and its 
subclasses.
  It prevents changing the internal representation or adding certain types of 
functionality if that wouldn't work with the base class.
- Additionally, this would make deprecating and removing `SplFixedArray` 
significantly harder or impractical,
  if it ever seemed appropriate in the future due to lack of use.

Additionally, I'm proposing a final class: SplFixedArray already exists and 
can't be converted to a final class because code already extends it.
See https://wiki.php.net/rfc/deque#final_class for the

Re: [PHP-DEV] Adding `final class Deque` to PHP

2021-09-21 Thread tyson andre

Hi Levi Morrison,

> The name "deque" is used in the standard library of these languages:
> 
>  - C++: std::deque
>  - Java: java.util.Deque (interface)
>  - Python: collections.deque
>  - Swift: Collections.Deque (not standard lib, apparently, but Apple
> official? Don't know Swift)
>  - Rust: std::collections::VecDeque
> 
> And these don't have it in the standard library:
>  - Go
>  - C#
>  - C
>  - JavaScript
>
> Anyway, it's pretty decent evidence that:
>  1. This functionality is pretty widely used across languages.
>  2. This functionality should have "deque" be in the name, or be the
> complete name.

Thanks for putting that together.

For anyone wondering about the languages that don't have it:
The first 3 are compiled languages so there's the same performance for a 
standard library 
and third-party library code written in those languages. 
For C and Go, the standard libraries of C and Go are written in C and Go.
C and Go also have a very minimal standard library as a design goal.
Everything in C and Go is a "native library", so a third-party library and a 
standard library would have the same performance.
(The standard library of C and Go use or allow embedding assembly for 
platform-dependent functionality or optimizations. 
Third party libraries in C or Go can also use inline assembly 
https://stackoverflow.com/a/23796420)
(Not familiar with C#'s standard library, I assume it's similar?)

And browser vendors have put immense amounts of effort on optimizing JavaScript
and the JIT compilers for JavaScript for low memory usage,
supporting inlining, working efficiently on various platforms, etc.

> Discussion naming for "vector" I can understand, as it's less widely
> used or sometimes means something specific to numbers, but there's
> little point in discussing the name "deque." It's well established in
> programming, whether PHP programmers are aware of it or not.

Yes, I'd agree.

It's a well established term in programming, and using a non-standard or more 
verbose term
would make it harder to find/remember this functionality for users moving to 
php from 
other programming languages, 
or for users moving from php to other programming languages.

Learning programming inevitably has unfamiliar terms such as `array`, `printf`, 
etc.
that are not commonly used in mathematics or daily life.

> As I see it, the discussion should be centered around:
>  1. The API Deque provides.
>  2. The package that provides it.
>  3. Whether Deque's API is consistent with other structures in the same 
> package.
>  4. Whether this should be included in php-src or left to extensions.
> 
> I suggest that we try to make PHP as homogenous in each bullet as we
> can with other languages:
>  1. Name it "Deque."
>  2. Put it in the namespace "Collections" (and if included in core,
> then "ext/collections").
>  3. Support common operations on Deque (pushing and popping items to
> both front and back, subscript operator works, iteration, size, etc)
> and add little else (don't be novel here). To me, the biggest thing
> left to discuss is a notion of a maximum size, which in my own
> experience is common for circular buffers (the implementation chosen
> for Deque) but not all languages have this.
>  4. This is less clear, but I'm in favor as long as we can provide a
> few other data structures at the same time. Obviously, this a voter
> judgement call on the yes/no.

I had planned on a minimal API for Deque.

"Maximum size" shouldn't be an issue for `Deque` specifically, it's a `size_t`. 
It isn't an issue for SplFixedArray and ArrayObject.
(or for 
PHP would just encounter a fatal error due to either

1. Hitting the `memory_limit` or available memory limit
while constructing or appending to the `Deque`
2. Seeing a fatal error due to integer overflow, which would only matter on 
32-bit php.
   (The `Deque` API doesn't have a setSize)
   The `safe_erealloc` macros allow you to do that.

For additions such as `Vector` and `Deque`, because we **can** choose a 
`size_t` (type large enough to store any size of memory) (and because 32-bit 
systems are increasingly rare),
I currently think always emitting an uncatchable fatal error (and exiting 
immediately) for impossibly large values
would be the most consistent (e.g. applications wouldn't reach `catch 
(Throwable $t)` 
blocks meant for a different purpose in an unexpected way, if they failed to 
validate inputs).
This is already done in array and SPL types.

```
php > var_dump(new SplFixedArray(PHP_INT_MAX));

Fatal error: Possible integer overflow in memory allocation (9223372036854775807 * 16 + 0) in php shell code on line 1

php > var_dump(array_fill(0, 2**31 - 2, null));

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 68719476744 bytes) in php shell code on line 1

php > var_dump(array_fill(0, 2**31, null));

Warning: Uncaught ValueError: array_fill(): Argument #2 ($count) is too large in php shell code:1
...

// Without memory_limit
php >
```

[PHP-DEV] (Planned) Straw poll: Naming pattern for `*Deque`

2021-09-20 Thread tyson andre

Hi internals,

Because the naming choice for new datastructures is a question that has been 
asked many times,
I plan to create another straw poll (Single transferable vote) on wiki.php.net 
to gather feedback on the naming pattern to use for future additions of 
datastructures to the SPL,
with the arguments for and against each naming pattern.

https://wiki.php.net/rfc/namespaces_in_bundled_extensions recently passed.
It permits using the same namespace that is already used in an extension,
but offers guidance in choosing namespace names and allows for using namespaces 
in new categories of functionality.

The planned options are:

1. `\Deque`, the name currently used in the RFC/implementation. See 
https://wiki.php.net/rfc/deque#global_namespace

   This was my preference because it was short, making it easy to remember and 
convenient to use.
2. `\SplDeque`, similar to datastructures added to the `Spl` in PHP 5.3.

   (I don't prefer that name because `SplDoublyLinkedList`, `SplStack`, and 
`SplQueue` are subclasses of a doubly linked list with poor performance,
   and this name would easily get confused with them. Also, historically, none 
of the functionality with that naming pattern has been final.
   However, good documentation (e.g. suggesting `*Deque` instead where possible 
in the manual) would make that less of an issue.)

   See https://wiki.php.net/rfc/deque#lack_of_name_prefix (and arguments for 
https://externals.io/message/116100#116111)
3. `\Collection\Deque` - the singular form is proposed because this might grow 
long-term to contain not just collections,
   but also functionality related to collections in the future (e.g. helper 
classes for building classes
   (e.g. `ImmutableSequenceBuilder` for building an `ImmutableSequence`), 
global functions, traits/interfaces,
   collections of static methods, etc.)
   (especially since https://wiki.php.net/rfc/namespaces_in_bundled_extensions 
prevents more than one level of namespaces)

   Additionally, all existing extension names in php-src are singular, not 
plural. https://github.com/php/php-src/tree/master/ext 
   (Except for `sockets`, but that defines `socket_*` and `class Socket` and 
I'd assume it would be named `Socket\` anyway; the RFC didn't say the names had 
to match exactly?)

   So the namespace's contents might not just be `Collections`, but rather all 
functionality related to a `Collection`.
   Also, the examples in the "namespaces in bundled extension" RFC were all 
singular

   > For example, the `array_is_list()` function added in PHP 8.1 should indeed 
be called `array_is_list()`
   > and should not be introduced as `Array\is_list()` or similar.
   > Unless and until existing `array_*()` functions are aliased under an 
Array\* namespace,
   > new additions should continue to be of the form `array_*()` to maintain 
horizontal consistency.

   See https://wiki.php.net/rfc/deque#global_namespace (and 
https://externals.io/message/116100#116111)

   Also, straw polls for other categories of functionality
(https://wiki.php.net/rfc/cachediterable_straw_poll#namespace_choices)
   showed that around half of voters were interested in adopting namespaces,
   but there was disagreement about the best namespace to use (e.g. none were
preferred to the global namespace),
   making me hesitant to propose namespaces in any RFC. For an ordinary
collection datastructure, the situation may be different.

While there is considerable division over whether or not members of internals
want to adopt namespaces,
I hope that the final outcome of the poll will be accepted by members of
internals
as representative of what the majority of the members of internals
(from diverse backgrounds such as contributors/leaders of userland
applications/frameworks/composer libraries written in PHP,
documentation contributors, PECL authors, php-src maintainers, etc. (all of
which I expect are also end users of php))
want to use as a naming choice in future datastructure additions to PHP
(and I hope there is a clear majority).

-

Are there any other suggestions to consider for namespaces to add to the straw 
poll?

Several suggestions that have been brought up in the past are forbidden by the 
accepted policy RFC (https://wiki.php.net/rfc/namespaces_in_bundled_extensions)
and can't be used in an RFC.

- `Spl\`, `Core\`, and `Standard\` are forbidden: "Because these extensions 
combine a lot of unrelated or only tangentially related functionality, symbols 
should not be namespaced under the `Core`, `Standard` or `Spl` namespaces.
  Instead, these extensions should be considered as a collection of different 
components, and should be namespaced according to these."
- More than one namespace component (`A\B\`) is forbidden
- Namespace names should follow CamelCase.

Thanks,
Tyson

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php

Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-20 Thread tyson andre

Hi Peter Bowyer,

> That is a fair point. Vector is an overloaded and common word. For me a
> vector will always default to an entity characterized by a magnitude and a
> direction, because that's what I learned and used for years. The next
> definition I learned was the Numpy one.
> 
> That for me is the sticking point if this Vector allows mixed types which
> include arrays or vectors. Store them inside a Vector and then you end up
> with a matrix, a tensor and so-on in something identified as a Vector,
> which is nonsense. Yes C++ does that [1]. Yes with generics it sort-of
> makes sense. Numpy gets round it by calling the type `ndarray` and a vector
> is a specialised one-dimensional array.
> 
> If it's a high-performance array and that's the goal, call it hparray. Call
> it a tuple. Call it a dictionary.

- `hparray`: I think putting high performance in any class name in core is a 
mistake
  and a generally poor naming choice that will mislead users now or in the future.
  (unless it is literally an API client for a database or server that includes 
high performance in the server software's name)

  Benchmarks currently show it using less memory but some more time than 
`array`,
  and those benchmarks will change as opcache's internals or PHP's 
representation 
  of `object`s or `array`s change.

  Which choice of data structure is highest performance would depend on the 
benchmark or needs of the application/library.
- `tuple`: In mathematics, most references to tuples that I've seen are to
tuples of a fixed size (n-tuples). In programming, Python and C++ and various other
languages
  use tuple to refer to a fixed-size (and, in Python, immutable) data structure,
  making this naming choice extremely confusing.
  https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences
  https://en.cppreference.com/w/cpp/utility/tuple

  > (In C++) Class template std::tuple is a fixed-size collection of 
heterogeneous values.
- `dictionary` - Wikipedia refers to this as an associative array 
https://en.wikipedia.org/wiki/Associative_array
  which is the exact opposite of what my Vector RFC is proposing.
 
So I don't consider any of those proposed names appropriate alternatives, 
and expect much, much stronger opposition to an RFC using any of those naming
choices for this functionality.

I expect opposition to any naming choice I propose; `Vector` is what I expect 
to have the least opposition.

Thanks,
Tyson


Re: [PHP-DEV] Adding `final class Deque` to PHP

2021-09-20 Thread tyson andre

Hi Pierre,

> It seems that you are writing more than one RFC to add many data 
> structures. I love that you're doing that, but I suggest that you'd 
> normalize them all

I'm not certain what you mean by "normalize".
https://www.merriam-webster.com/dictionary/normalize mentions

1. "to make conform to or reduce to a norm or standard"
2. 
https://www.freetext.org/Introduction_to_Linear_Algebra/Basic_Vector_Operations/Normalization/
   (no pun intended)
3. "to bring or restore to a normal condition"

If you mean to make Vector and Queue's APIs consistent with each other,
I plan to make changes to Vector (e.g. remove $preserveKeys, add isEmpty), but 
the Vector RFC is currently on hold.

If you also mean all datastructure RFCs should be combined into a single RFC,
I'd considered combining the Vector RFC with https://wiki.php.net/rfc/deque,
but decided against combining the RFCs in this instance, because of:

1. Current discussion about whether or not to choose an alternate name for a 
`Vector`
2. The fact that `Deque` has much better performance for various queue workloads
   on both time and memory usage than `array`
   (and significantly better performance than `SplDoublyLinkedList`).

Still, I may consider the approach for future RFCs, given that

1. Many developers in internals have expressed a desire for having a 
significantly 
   larger data structure library in core along the lines of what php-ds 
provides,
   but may be uninterested in some of the individual datastructures or design 
choices.

   E.g. if 60% of developers were in favor of a sorted set and its proposed
API/name
   (along the lines of https://cplusplus.com/reference/set/set/),
   and 60% were in favor of an immutable sequence (of values) and its proposed
API/name (similar to
https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences),
   then with the 2/3 voting threshold,
   neither of those RFCs would pass on its own,
   despite ~95% of developers wanting some type of improved datastructures
added to core in general (I would guess),
   but a proposal combining those two might pass.
2. This would allow seeing how datastructures compare to each other.

Combining RFCs has the drawback of significantly increasing the implementation,
discussion, and review effort,
the delays, and the time involvement for the volunteer RFC authors and voters,
and may lead to a larger number of last-minute concerns raised after voting has
started, when more time
is spent trying out the new code and looking at the RFC.

> and place all new classes in a single new dedicated 
> namespace.

My rationale for deciding against a dedicated namespace is in 
https://wiki.php.net/rfc/deque#global_namespace
which I've recently expanded on.

The `Deque` proposal is normalized with respect to the namespace choice of data 
structures that already exist.

The choice of global namespace maintains consistency with the namespace used 
for general-purpose collections already in the SPL 
(as well as relatively recent additions such as `WeakReference` (PHP 7.4) and 
`WeakMap` (PHP 8.0)).
Other recent additions to PHP such as `ReflectionIntersectionType` in PHP 8.1 
have 
also continued to use the global namespace when adding classes with 
functionality related to other classes.

Additionally, prior polls for namespacing choices of other datastructure 
functionality showed preferences 
for namespacing and not namespacing were evenly split in a straw poll for a new 
iterable type
(https://wiki.php.net/rfc/cachediterable_straw_poll#namespace_choices)

Introducing a new namespace for data structures would also raise the question 
of whether existing datastructures 
should be moved to that new namespace (for consistency), and that process would:

1. Raise the amount of work needed for end users or 
library/framework/application authors to migrate to new PHP versions.
2. Cause confusion and inconvenience for years about which namespace can or
should be used in an application
   (`SplObjectStorage` vs `Xyz\SplObjectStorage`), especially for
developers working on projects supporting different php version ranges.
3. Prevent applications/libraries from easily supporting as wide of a range of
php versions as they otherwise could.
4. Cause serialization/unserialization issues when migrating to different php
versions,
   if the old or new class name in the serialized data did not exist in the
other php version and was not aliased.
   For example, an older PHP version could not `unserialize()`
`Xyz\SplObjectStorage`
   and would silently create a `__PHP_Incomplete_Class` instead
   (see
https://www.php.net/manual/en/language.oop5.serialization.php#language.oop5.serialization),
   without any warnings or notices.
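
To illustrate the last point, here is a minimal sketch of what happens today when serialized data names a class that the running PHP version does not know. The class name `Xyz\HypotheticalStorage` is made up for this example, and the payload is a hand-built object with no properties, not the real `SplObjectStorage` format:

```php
<?php
// Hypothetical class name, chosen only because it does not exist in this process.
$class = 'Xyz\\HypotheticalStorage';

// Hand-built payload in PHP's native serialization format (an object with no properties).
$payload = sprintf('O:%d:"%s":0:{}', strlen($class), $class);

// No exception is thrown for the missing class
// (assuming the unserialize_callback_func ini setting is left at its default).
$value = unserialize($payload);

echo get_class($value), "\n"; // __PHP_Incomplete_Class
```

The caller only finds out that something went wrong later, when it tries to use the placeholder object as if it were the real class.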

Thanks,
Tyson


[PHP-DEV] Adding `final class Deque` to PHP

2021-09-19 Thread tyson andre

Hi internals,

I've created a new RFC https://wiki.php.net/rfc/deque to add a `final class 
Deque`

This is based on the `Teds\Deque` implementation I've worked on
for the https://github.com/TysonAndre/pecl-teds PECL.

While `SplDoublyLinkedList` and its subclasses `SplQueue`/`SplStack` exist in the 
SPL, they have several drawbacks
that are addressed by this RFC to add a `Deque` class (to use instead of those):

1. `SplDoublyLinkedList` is internally represented by a doubly linked list,
   making it use roughly twice as much memory as the proposed `Deque`
2. `push`/`pop`/`unshift`/`shift` from `SplDoublyLinkedList` are slower due to
   needing to allocate or free the linked list nodes.
3. Reading values in the middle of the `SplDoublyLinkedList` is proportional to 
the length of the list,
   due to needing to traverse the linked list nodes.
4. `foreach` iteration behavior cannot be understood without knowing what 
constructed the
   `SplDoublyLinkedList` instance or set the flags.
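
As a rough illustration of point 4, the sketch below iterates over the same `SplDoublyLinkedList` three times with different iterator modes. It only uses the existing SPL API; the variable names are just for this example:

```php
<?php
$list = new SplDoublyLinkedList();
foreach ([1, 2, 3] as $v) {
    $list->push($v);
}

// Default mode: first-in-first-out, elements are kept.
$fifo = [];
foreach ($list as $v) {
    $fifo[] = $v;
}

// The same object after some other code changed the flags: reverse order.
$list->setIteratorMode(SplDoublyLinkedList::IT_MODE_LIFO);
$lifo = [];
foreach ($list as $v) {
    $lifo[] = $v;
}

// IT_MODE_DELETE makes foreach remove each element as it is visited.
$list->setIteratorMode(SplDoublyLinkedList::IT_MODE_FIFO | SplDoublyLinkedList::IT_MODE_DELETE);
$drained = [];
foreach ($list as $v) {
    $drained[] = $v;
}
$remaining = count($list);

echo json_encode([$fifo, $lifo, $drained, $remaining]), "\n"; // [[1,2,3],[3,2,1],[1,2,3],0]
```

Code that receives such a list from elsewhere cannot tell from the `foreach` alone which of these three behaviors it will get.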

It would be useful to have an efficient `Deque` container in the standard 
library
to provide an alternative without those drawbacks,
as well as for the following reasons:

1. To save memory in applications or libraries that may need to store many 
lists of values or run for long periods of time.
   Notably, PHP's `array` type will never release allocated capacity.
   See https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html
2. To provide a better alternative to `SplDoublyLinkedList`, `SplStack`, and 
`SplQueue`
   for use cases that require stacks or queues.
3. As a more efficient option than `array` and `SplDoublyLinkedList`
   as a queue or `Deque`, especially for `unshift`.

A `Deque` is more efficient than an `array` when used as a queue, more 
readable, and easier to use correctly.
While it is possible to efficiently remove elements from the start of an 
`array` (in terms of insertion order),
it is very inefficient to prepend elements to the start of a large `array` due 
to needing to either copy the array
or move all elements in the internal array representation,
and an `array` would use much more memory than a `Deque` when used that way 
(and be slower).

There are also several pitfalls to using an array as a queue for larger queue 
sizes,
some of which are not obvious and were discovered while writing the benchmarks.
(Having a better (double-ended) queue datastructure (`Deque`) than the 
`SplDoublyLinkedList`
would save users from needing to write code with these pitfalls):

1. `array_key_first()` takes time proportional to the number of elements 
`unset` from the start of an array,
   causing it to unexpectedly be extremely slow (quadratic time) after 
unsetting many elements at the start of the queue.
   (when the array infrequently runs out of capacity, buckets are moved to the 
front)
2. `reset()` or `end()` will convert a variable to a reference,
   and php is significantly less efficient at reading or writing to references.
   Opcache is also less efficient at optimizing uses of variables using 
references.
3. More obviously, `array_unshift` and `array_shift` will take time 
proportional to the number of elements in the array
   (to reindex and move existing/remaining elements).
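
For reference, the array-based queue patterns discussed above look roughly like this in code, next to the existing `SplQueue`. This is only a sketch of the usage patterns (it does not measure anything), and the variable names are illustrative:

```php
<?php
// Pattern 1: array_shift() returns the head, but reindexes the remaining elements on every call.
$queue = ['a', 'b', 'c'];
$head = array_shift($queue);
echo json_encode($queue), "\n"; // ["b","c"]

// Pattern 2: unset() the first key. This avoids reindexing, but leaves gaps at the
// start of the array that array_key_first() has to skip on later calls.
$queue2 = ['a', 'b', 'c'];
$key = array_key_first($queue2);
$head2 = $queue2[$key];
unset($queue2[$key]);
echo json_encode(array_keys($queue2)), "\n"; // [1,2]

// What exists in the SPL today: SplQueue, backed by a doubly linked list.
$spl = new SplQueue();
foreach (['a', 'b', 'c'] as $value) {
    $spl->enqueue($value);
}
$head3 = $spl->dequeue();
echo count($spl), "\n"; // 2
```

All three return the same head element; they differ in how much hidden work each dequeue does and in how easy they are to get wrong.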

After the discussion period ends, I currently plan to start voting on the 
`Deque` RFC and await the results
to determine next steps for the `Vector` RFC.



The thread for my other open proposal `final class Vector` 
(https://externals.io/message/116048) has prior discussion on implementation 
choices,
naming choices, and motivation for adding datastructures to php-src.

- E.g. the question of why we should add general-purpose datastructures
  to php-src itself rather than have users rely on PECLs,
  and why this proposal doesn't and can't use `php-ds`/`ext-ds`.

Thanks,
Tyson


[PHP-DEV] Proposal: Adding an ARRAY_FILTER_REINDEX flag to array_values

2021-09-19 Thread tyson andre

Hi internals,

Currently, array_filter always preserves the original keys.
This often requires an additional wrapping call,
array_values(array_filter(...)), to reindex the keys and return a list
(or applications may not realize there will be gaps in the keys until it causes
a bug or unexpected JSON encoding, etc.).

PHP is also more memory/time efficient at creating packed arrays than it is at 
creating associative arrays.

What are your thoughts on adding `ARRAY_FILTER_REINDEX`, to ignore the original 
int/string keys and replace them with `0, 1, 2, ...`?

```
php > echo json_encode(array_filter([5,6,7,8], fn($value) => $value % 2 > 0));
{"0":5,"2":7}
// proposed flag
php > echo json_encode(array_filter([5,6,7,8], fn($value) => $value % 2 > 0, ARRAY_FILTER_REINDEX));
[5,7]
```

https://www.php.net/array_filter already has the `int $mode = 0` parameter, which
accepts the bit flags `ARRAY_FILTER_USE_KEY` and `ARRAY_FILTER_USE_BOTH`.
These could then be combined with the proposed bit flag
`ARRAY_FILTER_REINDEX`, e.g. to filter an array based on both the array keys
and values, and return a list without gaps
(and if $callback is null, this would return a list containing only the truthy
values).
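
For comparison, this is roughly what the flag would replace in userland today. The helper name `array_filter_reindexed` is hypothetical and only wraps the existing `array_values(array_filter(...))` idiom; it is not an implementation of the proposed flag:

```php
<?php
// Hypothetical userland helper (not part of PHP): filter, then discard the original keys.
function array_filter_reindexed(array $array, ?callable $callback = null, int $mode = 0): array
{
    // array_filter() keeps only truthy values when the callback is null.
    return array_values(array_filter($array, $callback, $mode));
}

$isOdd = fn($value) => $value % 2 > 0;

echo json_encode(array_filter([5, 6, 7, 8], $isOdd)), "\n";           // {"0":5,"2":7}
echo json_encode(array_filter_reindexed([5, 6, 7, 8], $isOdd)), "\n"; // [5,7]

// Combined with an existing mode flag: filter on both value and key, return a list.
$result = array_filter_reindexed(
    ['a' => 1, 'b' => 2, 'c' => 3],
    fn($value, $key) => $value > 1 && $key !== 'c',
    ARRAY_FILTER_USE_BOTH
);
echo json_encode($result), "\n"; // [2]
```

The wrapper builds the intermediate array with the original keys first and then copies it, which is the extra work a built-in flag could skip.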

Thoughts? 

Thanks,
Tyson

Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-19 Thread tyson andre

Hi Mike Schinkel,
 
> >> Given there seems to be a lot of concern about the approach the RFC 
> >> proposes would it not address the concerns about memory usage and 
> >> performance if several methods were added to SplFixedArray instead (as 
> >> well as functions like indexOf(), contains(), map(), filter(), 
> >> JSONSerialize(), etc., or similar):
> >> 
> >> ===
> >> 
> >> setCapacity(int) — Sets the Capacity, i.e. the maximum Size before resize
> >> getCapacity():int — Gets the current Capacity.
> >> 
> >> setGrowthFactor(float) — Sets the Growth Factor for push(). Defaults to 2
> >> getGrowthFactor():float — Gets the current Growth Factor
> >> 
> >> pop([shrink]):mixed — Returns [Size] then subtracts 1 from Size. If 
> >> (bool)shrink passed then call shrink().
> >> push(mixed) — Sets [Size]=mixed, then Size++, unless Size=Capacity then 
> >> setSize(n) where n=round(Size*GrowthFactor,0) before Size++.
> >> 
> >> grow([new_capacity]) — Increases memory allocated. Sets Capacity to 
> >> Size*GrowthFactor or new_capacity.
> >> shrink([new_capacity]) — Reduces memory allocated. Sets Capacity to 
> >> current Size or new_capacity.
> >> 
> >> ===
> >> 
> >> If you had these methods then I think you would get the memory and 
> >> performance improvements you want, and if you really want a final Vector 
> >> class for your own uses you could roll your own using inheritance or 
> >> containment.
> > 
> > I asked 8 months ago about `push`/`pop` in SplFixedArray. The few responses 
> > were unanimously opposed to SplFixedArray being repurposed like a vector, 
> > the setSize functionality was treated more like an escape hatch and it was 
> > conceptually for fixed-size data.
> 
> Hmm. I must have missed that thread as I was definitely following the list at 
> that time. 
> 
> But I found the thread, which only had three (3) comments from others:
> 
> https://externals.io/message/112639
> 
> From Levi Morrison it seems his objection was to adding `push()` and `pop()` 
> to a class including the name "Fixed."  Levi suggested soft-deprecating 
> `SplStack` because it was implemented as a linked-list, but he proposed 
> adding `Spl\ArrayStack` or similar, so it seems he was open to iterating on 
> the `Spl` classes in general (no pun intended.) 
> 
> From Nikita is seemed that he did not object so much as comment on Levi's 
> suggestion of adding `Spl\ArrayStack` and suggested instead an `SqlDeque` 
> that would handle queue usage more efficiently that plain PHP arrays.
> 
> So I think those responses were promising, but that you did not followed up 
> on them. I mean no disrespect — we all get busy, our priorities change, and 
> things fall off our radar — but it feels to me like you might have more 
> success pursing your use-cases related to the `Spl` classes than via a pure 
> `Vector` class.

Experience with past RFCs gave me the impression that voting would typically
have mostly "no" results when:

1. All of the responses suggest using a different approach (php-ds,
arrays).
2. Other comments are negative or uninterested.
3. None of the feedback on the original idea is positive or interested in it.

Some of the feedback such as `*Deque` was interesting, but not related to 
extending SplFixedArray.

> Maybe propose an `SplVector` class that extends `SplFixedArray`, or something 
> similar that addresses the use-case and with a name that people can agree on?

I'd be stuck with all of the features in `SplFixedArray` that get introduced 
later and its design decisions.

> BTW, here are two other somewhat-related threads:
> 
> - https://externals.io/message/110731
> - https://externals.io/message/113141
> 
> > I also believe adding a configurable growth factor would be excessive for a 
> > high level language.
> 
> I wavered on whether or not to propose a configurable growth factor, but 
> ironically I did so to head off the potential complaint from anyone who cares 
> deeply about memory usage (isn't that you?) that not allowing the growth 
> factor to be configurable would mean that either the class would use too much 
> memory for some use-cases, or would need to reallocate more memory too 
> frequently for other use-cases, depending on what the default growth factor 
> would be.
> 
> That said, I don't see how a configurable growth factor should be problematic 
> for PHP? For those who don't need/care to optimize memory usage or 
> reallocation frequency they can simply ignore it; no harm done. But for those 
> who do care, it would give them the ability to fine tune their memory usage, 
> which for selected use-cases could mean the difference between being able to 
> implement something in PHP, or not.
> 
> Note that someone could easily argue that adding a memory-optimized data 
> structure when we already have a perfectly flexible data structure with PHP 
> arrays that can be used for the same algorithms is "excessive for a 
> high-level language."

Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-18 Thread tyson andre

Hi Mike Schinkel,

> Given there seems to be a lot of concern about the approach the RFC proposes 
> would it not address the concerns about memory usage and performance if 
> several methods were added to SplFixedArray instead (as well as functions 
> like indexOf(), contains(), map(), filter(), JSONSerialize(), etc., or 
> similar):
> 
> ===
> 
> setCapacity(int) — Sets the Capacity, i.e. the maximum Size before resize
> getCapacity():int — Gets the current Capacity.
> 
> setGrowthFactor(float) — Sets the Growth Factor for push(). Defaults to 2
> getGrowthFactor():float — Gets the current Growth Factor
> 
> pop([shrink]):mixed — Returns [Size] then subtracts 1 from Size. If 
> (bool)shrink passed then call shrink().
> push(mixed) — Sets [Size]=mixed, then Size++, unless Size=Capacity then 
> setSize(n) where n=round(Size*GrowthFactor,0) before Size++.
> 
> grow([new_capacity]) — Increases memory allocated. Sets Capacity to 
> Size*GrowthFactor or new_capacity.
> shrink([new_capacity]) — Reduces memory allocated. Sets Capacity to current 
> Size or new_capacity.
> 
> ===
> 
> If you had these methods then I think you would get the memory and 
> performance improvements you want, and if you really want a final Vector 
> class for your own uses you could roll your own using inheritance or 
> containment.

I asked 8 months ago about `push`/`pop` in SplFixedArray. The few responses 
were unanimously opposed to SplFixedArray being repurposed like a vector;
the setSize functionality was treated more like an escape hatch and it was 
conceptually for fixed-size data.
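
For context, here is a minimal sketch of how `SplFixedArray` behaves today, using only its existing methods: writes past the end are rejected, and growing requires an explicit `setSize()` call.

```php
<?php
$fixed = new SplFixedArray(2);
$fixed[0] = 'a';
$fixed[1] = 'b';

// Writing past the end does not grow the array; it throws instead.
$outOfRange = false;
try {
    $fixed[2] = 'c';
} catch (RuntimeException $e) {
    $outOfRange = true;
}

// The "escape hatch": resize explicitly, then write.
$fixed->setSize($fixed->getSize() + 1);
$fixed[2] = 'c';

echo json_encode($fixed->toArray()), "\n"; // ["a","b","c"]
```

A `push()` built this way would have to resize on every call unless the class also tracked a separate capacity, which is part of why it does not fit a class that is conceptually fixed-size.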

I also believe adding a configurable growth factor would be excessive for a 
high level language.

Thanks,
Tyson


Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-18 Thread tyson andre

Hi Larry Garfield,

> Rather than go point by point, I'm going to respond globally here.
> 
> I am frequently on-record hating on PHP arrays, and stating that I want 
> something better.  The problems with PHP arrays include:
> 
> 1. They're badly performing (because they cannot be optimized)
> 2. They're not type safe
> 3. They're mutable
> 4. They mix sequences (true arrays) with dictionaries/hashmaps, making 
> everything uglier
> 5. People keep using them as structs, when they're not
> 6. The API around them is procedural, inconsistent, and overall gross
> 7. They lack a lot of native shorthand operations found in other languages 
> (eg, slicing)
> 8. Their error handling is crap
> 
> Any new native/stdlib alternative to arrays needs to address at least half of 
> those issues, preferably most/all.
> 
> This proposal addresses the first point and... that's it.  Point 5 is sort of 
> covered by virtue of being out of scope, so maybe this covers 1.5 out of 8.  
> That's insufficient to be worth the effort to support and deal with in code.  
> That makes this approach a strong -1 for me.
> 
> "Fancy algorithms are slow when n is small, and n is usually small." -- Rob 
> Pike
> 
> That some of the design choices here mirror existing poor implementations is 
> not an endorsement of them.  I don't think I've seen anyone on this list say 
> anything good about SPL beyond iterators and autoloading, so it's not really 
> a good model to emulate.
> 
> Additionally, please don't play into the trope about procedural/mutable code 
> being more beginner friendly.  That's not the case, beyond being a 
> self-fulfilling prophesy.  (If we teach procedural/mutable code first, then 
> most beginners will be most proficient in procedural/mutable code.)  I would 
> argue that, on the whole, immutable values make code easier to reason about 
> and write once you get past trivially small sizes.  We do new developers a 
> gross disservice by treating immutability as an "advanced" technique, when it 
> should really be the default, beginner technique taught from day one.
> 
> I am not aware of any PECL implementations of lists that have type safety, 
> because I don't use many PECL packages.  However, in user space it's quite 
> simple to do:
> 
> https://presentations.garfieldtech.com/slides-never-use-arrays/phpkonf2021/#/5/2
> 
> See that slide and scroll down for additional examples.  Every one of those 
> examples took me less than 5 minutes to write.  If we want to have a better 
> alternative in core, it needs to be *at least* as capable as what I can throw 
> together in 5 minutes.  The proposal as-is is not even as capable as those 
> examples.

Yes, you can implement those immutable and typed data structures in userland.
You are doing that by adding userland code hiding the internal implementations 
of the mutable `array` to solve the needs of your library/application (e.g. 
those 8).
Adding a mutable `Vector` gives another way to internally represent those 
userland data structures when you need those userland data structures to share 
data internally without using PHP references (not as part of the public api), 
e.g. appending to a list of error objects, performing a depth-first search or 
breadth-first search, etc.

As for your example, it's impossible to type hint without generics, and 
nobody's working on generics.
If I have your userland `TypedArray::forType(MyClass::class);`,
I can pass it to any parameter/return value/property expecting a `TypedArray`,
but that will then throw an Error at runtime with no warning ahead of time if I 
pass it to a function expecting a `TypedArray` of `OtherClass`.
Static analyzers exist separately from php that could analyze that, but 

1. Many developers wouldn't have static analyzers set up.
2. The TypedArrays may be created from unserialization from apcu/memcache/redis 
and be impractical to analyze (e.g. from an older release of a library or 
application)
3. Voters may object to this additional way to write PHP code that could error 
at runtime.
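
A rough sketch of that problem follows. The `TypedArray` class here is my own guess at what such a userland class might look like (it is not the code from the slides), and `MyClass`, `OtherClass` and `addOther()` are hypothetical names for illustration:

```php
<?php
// Hypothetical userland classes, for illustration only.
final class MyClass {}
final class OtherClass {}

final class TypedArray
{
    /** @var object[] */
    private array $values = [];
    private string $type;

    private function __construct(string $type)
    {
        $this->type = $type;
    }

    public static function forType(string $type): self
    {
        return new self($type);
    }

    public function add(object $value): void
    {
        $type = $this->type;
        if (!$value instanceof $type) {
            throw new TypeError("Expected $type, got " . get_class($value));
        }
        $this->values[] = $value;
    }
}

// The signature can only say "TypedArray", not "TypedArray of OtherClass".
function addOther(TypedArray $others): void
{
    $others->add(new OtherClass());
}

$list = TypedArray::forType(MyClass::class);
$error = null;
try {
    addOther($list); // accepted by the parameter type, fails inside add()
} catch (TypeError $e) {
    $error = $e->getMessage();
}
echo $error, "\n"; // Expected MyClass, got OtherClass
```

Nothing in the declaration of `addOther()` warns the caller ahead of time; the mismatch only shows up when that code path actually runs.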

**What data structures do you want in core? Do you want them to eagerly 
evaluate generators or lazily evaluate them? Is `TypedArray` or `TypedSequence` 
something you think should have an RFC or plan to create an RFC for?**

Even if immutable data structures are proposed, there's a further division
between programmers who want lazy or eager immutables (e.g. whether their
constructors or factory methods eagerly or lazily evaluate iterable
values),
and there may be enough objections to either choice (for the specific data
structure proposed) to make the vote fail when it is actually time to vote
(in addition to other objections that come up in any new proposal for core
datastructures).
This discourages me from proposing immutable data structures.

I'd agree on the utility of Set/Map/sorted data structures (though the hashable 
vs not hashable, comparator vs no comparator, how to hash, etc. is a discussion

Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-18 Thread tyson andre

Hi Peter Bowyer,

> > > To echo Pierre, a Vector needs to be of a single guaranteed type.
> > > Yes, this gets us back to the generics conversation again, but I presume
> > (perhaps naively?) there are ways to address this question without getting
> > into full-blown generics.
> >
> > Yep, as you said, this type is mixed, just like the SplFixedArray,
> > ArrayObject, values of SplObjectStorage/WeakMap, etc.
> >
> 
> Please rename your proposal as the use of the term "Vector" is confusing
> for people who use them in other languages. Much of the discussion so far
> has been around whether it's a Vector or what it should be; changing the
> proposed name will allow the discussion to focus on what you're proposing
> to add, not what others (myself included) would like to see added to PHP :)

Many of php's names are based on the naming choices in libraries made in C/C++.
So using https://cplusplus.com/reference/vector/vector/ for my RFC 
https://wiki.php.net/rfc/vector
seems like the most natural naming choice,
and would make it easier for people with backgrounds in that family of 
languages to find the functionality they're looking for.
PHP already has `SplStack`, `SplQueue`, etc., like C++'s `stack`, `queue`, etc.

I expect having a second `Stack` would be confusing and make it hard to 
remember which is the efficient one.
(Especially since stacks typically don't include specialized resizing methods)

No alternative names have been suggested by you or them so far, as far as I 
remember, and 2 of those responders seem to be saying they would vote no 
regardless of the choice of name (for reasons such as wanting generic-like 
functionality, wanting immutability or built-in types, etc.).
PHP's already using List to refer to linked lists, and `array` in PHP already 
refers to a hash table (including in ArrayObject).
So I expect a stronger objection to alternative names that I can think of.

Also, your comment is ambiguous. Are you saying that you personally object to 
the name,
or that you're fine with the name but think that the comments by 
Larry/Chris/Pierre in this email thread are representative of voters?

- People who wouldn't find the name surprising wouldn't bother writing an email 
to indicate a lack of surprise.

Thanks,
-Tyson


Re: [PHP-DEV] Make strtolower/strtoupper just do ASCII

2021-09-18 Thread tyson andre

Hi Tim Starling,
 
> I would like to know if a patch to make strtolower and strtoupper do
> plain ASCII case conversion would be accepted, or if an RFC should be
> created.
> 
> The situation with case conversion is inconsistent.
> 
> The following functions do ASCII case conversion: strcasecmp,
> strncasecmp, substr_compare.
> 
> The following functions do locale-dependent case conversion:
> strtolower, strtoupper, str_ireplace, stristr, stripos, strripos,
> strnatcasecmp, ucfirst, ucwords, lcfirst.
> 
> I would make them all do ASCII case conversion.
> 
> Developers need ASCII case conversion, because it is used internally
> by PHP for things like class name comparison, and because it is a
> specified algorithm in HTML 5 and related standards.
> 
> The existing options for ASCII case conversion are:
> 
> * Never call setlocale(). But this breaks non-ASCII characters in
escapeshellarg() and can't be guaranteed in a library.
> 
> * Call setlocale(LC_ALL, "C.UTF-8"). But this is non-portable and also
can't be guaranteed in a library.
> 
> * Use strtr(). But this is ugly and slow.
> 
> If mbstring has a way to do it, I can't find it. I tested
> mb_strtolower($s, '8bit') and mb_strtolower($s,'ascii').
> 
> Note that locale-dependent case conversion is almost never a useful
> feature. Strings are passed through tolower() one byte at a time, to
> be interpreted with some legacy 8-bit character set. So the result
> will typically be mojibake even if the correct locale is selected.
> 
> strtolower() mangles UTF-8 strings in many locales, such as fr-FR. I
> made a full list at . The
> UTF-8 locales mostly work, except for the Turkish ones, which mangle
> ASCII strings.
> 
> At https://bugs.php.net/bug.php?id=67815 , Nikita Popov wrote: "My
> general recommendation is to avoid locales and locale-dependent
> functions, as locales are a fundamentally broken concept." I agree
> with that. I think PHP should migrate away from locale dependence.
> When PHP was young, it was convenient to use the C library, but we've
> progressed well past that point now.

I think it's a good idea (but it would still require an RFC).
As you said, the way it acts on bytes rather than codepoints seems like it's 
almost always incorrect outside a narrow range
(except for single-byte charsets such as 
https://en.wikipedia.org/wiki/ISO/IEC_8859-1)

The behavior of strtolower is inconvenient for common uses in
- filesystem paths, where strtolower('I') isn't 'i' in tr_TR
- username validation, if it's possible to create a new account whose name is 
considered the same as an existing name (compared case-insensitively) in some 
locales but not in others
- etc.
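
For reference, the strtr() workaround mentioned in the quoted message can be 
sketched as below. The helper name `ascii_strtolower` is made up for this 
example and is not an existing PHP function; it only illustrates the 
locale-independent behavior being asked for.

```php
<?php
// Hypothetical helper: lowercases only the bytes A-Z, regardless of the
// current LC_CTYPE locale. Every other byte (including the bytes of
// multi-byte UTF-8 sequences) is left untouched.
function ascii_strtolower(string $s): string
{
    return strtr(
        $s,
        'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
        'abcdefghijklmnopqrstuvwxyz'
    );
}

var_dump(ascii_strtolower('INDEX.PHP')); // string(9) "index.php"
```

Because strtr() with two string arguments maps single bytes, the result does 
not depend on setlocale(), which is the property the proposal wants from 
strtolower() itself.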

Zend/Optimizer/sccp.c has optimizations that evaluate calls to functions such 
as str_contains at compile time when the arguments are constant.
After removing locale dependence, those optimizations could be safely added for 
functions that would be locale independent as a result of your change.
- This would allow eliminating more dead code, and make code calling those 
functions (on constant arguments) faster by caching the resulting strings in 
opcache.

The function `zend_string_tolower` can safely be used to efficiently convert 
strings to lowercase in a locale-independent way.
(zend_string_toupper hasn't been needed yet due to not yet having any use cases 
in php-src's internals, but could be added in such a PR)

```
|| zend_string_equals_literal(name, "str_contains")
|| zend_string_equals_literal(name, "str_ends_with")
|| zend_string_equals_literal(name, "str_replace")
|| zend_string_equals_literal(name, "str_split")
|| zend_string_equals_literal(name, "str_starts_with")
```

Thanks,
Tyson

Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-17 Thread tyson andre


Hi Pierre Joye,

> Not sure you care or read my reply but I had to jump in one more time here :)
> 
> On Sat, Sep 18, 2021 at 8:49 AM tyson andre  wrote:
> 
> > setSize is useful in allocating exactly the variable amount of memory 
> > needed while using less memory than a PHP array.
> > `setSize($newSize, 0)` would be much more efficient and concise in 
> > initializing the value.
> >
> > - Or in quickly reducing the size of the array rather than repeatedly 
> > calling pop in a loop.
> 
> I would rather not reduce it at all, but use the vector_size and keep
> it. User land set its max size but a realloc/free should not be
> necessary and counter productive from a perf point of view. If one
> uses it in a daemon, it can always be destroyed as needed.
> 
> > > To echo Pierre, a Vector needs to be of a single guaranteed type.
> > > Yes, this gets us back to the generics conversation again, but I presume 
> > > (perhaps naively?) there are ways to address this question without 
> > > getting into full-blown generics.
> >
> > Yep, as you said, this type is mixed, just like the SplFixedArray, 
> > ArrayObject, values of SplObjectStorage/WeakMap, etc.
> > Generic support is something that's been brought up before, investigated, 
> > then abandoned.
> >
> > My concerns with adding StringVector, MixedVector, IntVector, FloatVector, 
> > BoolVector, ArrayVector (confusing), ObjectVector, etc is that
> >
> > - I doubt many people would agree that there's a wide use case for any
> >   specific one of them compared to a vector of any type.
> 
> I am lost here. This is the main usage of Vector. For linear
> arithmetic like dot product, masking, add/sub/mul/div of vector etc. I
> do not see any other usage per see for all the things I have
> implemented or saw out there. Additionally, f.e., a string is a vector
> already on its own, I am not sure a vector of vectors makes sense ;).
> 
> >   This would be even harder to argue for than just a single Vector type.
> > - Mixes of null and type `T` might make sense in many cases (e.g. optional 
> > objects, statistics that failed to get computed, etc) but would be 
> > forbidden by that
> > - It would be a bad choice if generic support did get added in the future.
> 
> These are special cases for general purposes of vectors. Implementing
> vectors focusing on these special cases rather than the general
> purpose (vectorization) would be a strategic mistake. I mentioned it
> before, but please take a look at the numpy's Vector f.e., with
> python's operator overload, what has been done there is simply
> amazing, bringing vector processing/arithmetic a huge boost in
> performance, even with millions of entries (14 to 400x speed boost
> compared to classic array, even fixed).
> 
> > > But really, a non-type-guaranteed Vector/List construct is of fairly 
> > > little use to me in practice, and that's before we even get into the 
> > > potential performance optimizations for map() and filter() from type 
> > > guarantees.
> >
> > See earlier comments on `vec`/Generics not being actively worked on right 
> > now and probably being a far way away from an implementation that would 
> > pass a vote.
> 
> Generics!=Vector. But I hope that's not the way we are heading here :)
> 
> > As for optimizations, opcache currently doesn't optimize individual global 
> > functions (let alone methods), it optimizes opcodes.
> > Even array_map()/array_filter() aren't optimized, they call callbacks in an 
> > ordinary way.
> > E.g. https://github.com/php/php-src/pull/5588 or 
> > https://externals.io/message/109847 regarding ordinary methods.
> >
> > Aside: In the long term, I think the opcache core team had a long-term plan 
> > of changing the intermediate representation to make these types of 
> > optimizations feasible without workarounds like the one I proposed in 5588
> 
> You are fully correct here, I see a lack of the engine devs
> involvement (not complaining, just a state of the affairs :) in such
> RFC where this kind of feature could greatly benefit PHP. Well
> planned, this is a huge addition to PHP.
> 
> It is also why I am convinced that doing it right for Vectors (as a
> start) and thinking forwards to JIT and ops overloading (internally or
> userland) to allow smooth and nice vectorization (as some parts use
> them already internally f.e.) will bring PHP up to speed with the
> competition. If we don't, we just have something that would be similar
> to what anyone could do in userland with more flexibility.

I have no plans to change the direction of this RFC in th

Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-17 Thread tyson andre


> Improving collection/set operations in PHP is something near and dear to my 
> heart,
> so I'm in favor of adding a Vector class or similar to the stdlib.
> 
> However, I am not a fan of this particular design.
> 
> As Levi noted, this being a mutable object that passes by handle is asking 
> for trouble.
> It should either be some by-value internal type, or an immutable object with 
> evolver methods on it.
> (E.g., add($val): Vector). Making it a mutable object is creating spooky 
> action at a distance problems.
> An immutable object seems likely easier to implement than a new type,
> but both are beyond my capabilities so I defer to those who could do so.

https://wiki.php.net/rfc/vector#adding_a_native_type_instead_is_vec discusses 
why I'm doubtful of `is_vec` getting implemented or passing.
An immutable object has its own costs: `add()` would take linear time to copy 
all elements of the existing value (if you mean an array rather than a linked 
list-like structure), and any copies still referenced would take a lot more 
memory than an imperative version would.


PHP's end users and internals members come from a wide variety of backgrounds,
and I assume most beginning or experienced PHP programmers would tend towards 
imperative programming rather than functional programming.

PHP provides tools such as `clone`, private visibility, etc to deal with that.

The lack of any immutable object datastructures in core and the lack of 
immutable focused extensions in PECL 
https://pecl.php.net/package-search.php?pkg_name=immutable
https://www.php.net/manual-lookup.php?pattern=immutable&scope=quickref
(other than DateTimeImmutable)
heavily discourage me from proposing anything immutable.

(Technically, https://github.com/TysonAndre/pecl-teds has minimal 
implementations of immutable data structures, but the api is still being 
revised and Vector is the primary focus, followed by iterable functions. e.g. 
there's no `ImmutableSequence::add($value): ImmutableSequence` method.)


> The methods around size control are seemingly pointless from a user POV.

setSize is useful in allocating exactly the variable amount of memory needed 
while using less memory than a PHP array.
`setSize($newSize, 0)` would be much more efficient and concise in initializing 
the value.

- Or in quickly reducing the size of the array rather than repeatedly calling 
pop in a loop.

And while methods around capacity control exist in many other programming 
languages, they aren't used by most users and most users are fine with 
functionality they don't use existing.
The applications or libraries that do have a good use case to reduce memory 
will take advantage of them and end users of those applications/libraries would 
benefit from the memory usage reduction.

> I understand the memory optimization value they have, but that's not 
> something PHP developers are at all used to dealing with.
> That makes it less of a convenient drop-in replacement for array and more 
> just another user-space collection object, but in C with internals 
> endorsement.
> If such logic needs to be included, it should be kept as minimalist as 
> possible for usability,
> even at the cost of a little memory usage in some cases.

If the functionality was just a drop-in replacement for array, others may say 
"why not just use array and the array libraries?" (or Vector).
With the strategy of doubling capacity, it can be up to 99% more memory than 
needed in some cases (Even more wastage after shrinking from the maximum size).
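
As a worked example of the "up to 99%" figure, here is a small model of a 
doubling growth policy. The function name and the initial capacity of 8 are 
assumptions made for this sketch; it is not the php-src implementation.

```php
<?php
// Hypothetical model: start with $initial slots and double the capacity
// until it can hold $n elements.
function capacity_after_appends(int $n, int $initial = 8): int
{
    $capacity = $initial;
    while ($capacity < $n) {
        $capacity *= 2;
    }
    return $capacity;
}

foreach ([1024, 1025] as $n) {
    $capacity = capacity_after_appends($n);
    printf(
        "n=%d capacity=%d unused=%.1f%%\n",
        $n,
        $capacity,
        100 * ($capacity - $n) / $n
    );
}
// n=1024 capacity=1024 unused=0.0%
// n=1025 capacity=2048 unused=99.8%
```

One element past a power of two is the worst case for this model: the buffer 
has just doubled, so almost half of it is unused.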

> There is no reason to preserve keys.
> A Vector or list type should not have user-defined keys.
> It should just be a linear list. If you populate it from an existing 
> array/iterable, the keys should be entirely ignored.
> If you care about keys you want a HashMap or Dictionary or similar (which we 
> also desperately need in the stdlib, but that's a separate thing).

The behavior is similar to 
https://www.php.net/manual/en/splfixedarray.fromarray.php 
It tries to preserve the keys, and fills in gaps with null.

1. There's the consistency with existing functionality such as 
SplFixedArray::fromArray, or existing constructors preserving keys.
2. And I'd imagined a last-minute objection of "Wait, `new 
SplFixedArray([1 => 'second', 0 => 'first'])` does what by default? Isn't this 
using the keys 0 and 1?", and the same for gaps.

   I was considering only having the no-param constructor, and adding the 
static method fromValues(iterable $it) to make it clearer keys are ignored.
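
To make the key-preserving behavior concrete, this is how the existing 
SplFixedArray::fromArray() behaves (the proposed Vector is not used here, only 
the SPL class that the behavior is modeled on):

```php
<?php
// Keys are preserved by default, so the values end up ordered by key.
$a = SplFixedArray::fromArray([1 => 'second', 0 => 'first']);
var_dump($a->toArray() === ['first', 'second']); // bool(true)

// Gaps below the largest key are filled with null.
$b = SplFixedArray::fromArray([2 => 'c']);
var_dump($b->toArray() === [null, null, 'c']); // bool(true)

// With $preserveKeys = false, keys are ignored and input order is kept.
$c = SplFixedArray::fromArray([1 => 'second', 0 => 'first'], false);
var_dump($c->toArray() === ['second', 'first']); // bool(true)
```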

> Whether or not contains() needs a comparison callback in my mind depends 
> mainly on whether or not the operator overloading RFC passes. 
> If it does, then contains() can/should use the __compareTo() method on 
> objects.
> If it doesn't, then there needs to be some other way to compare non-identical 
> objects or else that method becomes mostly useless.

There's a distinction between needs and very nice to have - a contains check 
for some predicate on a Vector can be done with a userland helper

Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-17 Thread tyson andre

Hi Max Semenik,

> Since Ds was mentioned, I've added it to your benchmark (code and complete 
> results at https://gist.github.com/MaxSem/d0ea0755d6deabaf88c9ef26039b2f27):
> 
> Appending to array:         n= 1048576 iterations=      20 memory=33558608 
> bytes, create+destroy time=0.369 read time = 0.210 result=10995105792000
> Appending to Vector:        n= 1048576 iterations=      20 memory=16777304 
> bytes, create+destroy time=0.270 read time = 0.270 result=10995105792000
> Appending to SplStack:      n= 1048576 iterations=      20 memory=33554584 
> bytes, create+destroy time=0.893 read time = 0.397 result=10995105792000
> Appending to SplFixedArray: n= 1048576 iterations=      20 memory=16777304 
> bytes, create+destroy time=2.475 read time = 0.340 result=10995105792000
> Appending to Ds\Vector:     n= 1048576 iterations=      20 memory=24129632 
> bytes, create+destroy time=0.389 read time = 0.305 result=10995105792000
> 
> Another comparison with Ds, I wonder if an interface akin to Ds\Sequence[1] 
> could be added, to have something in common with other future containers.

It's worth noting that the first 4 data structures all start with initial sizes 
that are powers of 2 and continue doubling (and not mattering for SplStack, a 
doubly linked list),
but according to Ds\Vector's documentation,
it starts with a minimum size of 10. So it's an unfair comparison. 
http://docs.php.net/manual/en/class.ds-vector.php#ds-vector.constants.min-capacity
So there are probably larger copies done in Ds\Vector - Ds\Vector might do 
better for other sizes or use less memory under other circumstances.

(for reasons mentioned in https://externals.io/message/116048#116054 , I 
haven't checked the resizing strategy used by Ds\Vector - doubling is a common 
choice in vector implementations in other languages, others use other multiples 
of old capacity, etc)

Regards,
- Tyson

Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-17 Thread tyson andre

Hi Christian Schneider,

> First of all: I don't have a strong opinion on a Vector class being useful or 
> necessary.
> 
> But I have two comments about this RFC:
> 
> 1. Using the very generic name Vector without any prefix/namespace seems 
> dangerous and asking for BC breaks.

I downloaded the top 400 composer packages with 
https://github.com/nikic/popular-package-analysis/ and didn't find any classes 
named Vector.

- Only php-cs-fixer extends SplFixedArray in one class. It can continue to do so.
- I don't see other classes called Vector. Just stubs for `\Ds\Vector`.

There are tradeoffs and objections to any possible choice of name I could make, 
including this or alternates.

- Too likely to have conflicts
- Excessively long
- Open to adopting namespace but objecting to migrating existing classes (or 
not doing so)
- Objecting to a specific choice 

> 2. I don't like that this class is final. The reasons given in 
> https://wiki.php.net/rfc/vector#final_class seem unconvincing to me and 
> restrict the usage of Vector in a way which makes me question the usefulness 
> to a big enough part of the PHP community.
> These two reasons combined would make me reject the RFC at the current stage.

There are alternatives such as making all/almost all of the methods 
final (especially for reading and modifying array elements or basic properties 
of the vector), but allowing extending the class.

- Still, I don't think that'd be very useful, and it would make future final 
method additions to Vector backwards incompatible.
- Trying to do everything (e.g. be extensible and handle all edge cases of 
extension) has often resulted in many spl data structures not doing anything 
very well (efficiently, correctly, or in a way that makes it possible to make 
universal assumptions about them or optimize them in the future with 
opcache/the jit).

While it is possible to extend ArrayObject and SplFixedArray, very few things 
do that, and it'd generally lead to worse API design except in a few cases.
(E.g. `UserList extends \Vector` wouldn't be able to enforce that inserted 
values are actually users if the methods that insert values were final)

Thanks,
Tyson


Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-17 Thread tyson andre

Hi Levi Morrison,

> I mean that there isn't a way to provide a custom way to compare for
> equality. One way to accomplish this is to have a signature like:
> 
>    function contains(T $value, ?callable(T, T):bool $comparator = null): bool
> 
> The same goes for `indexOf`.

It'd make much more sense to have `->any(static fn($other): bool => 
$comparator($value, $other)): ?:int`
Overloading contains to do two different things (identity check or test the 
result of a callable)
seems like it's unintuitive to users.
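
A minimal userland sketch of that idea follows. `any_value` is a hypothetical 
helper written for this example (it is not part of the RFC or of PHP), and an 
ArrayIterator stands in for the Vector.

```php
<?php
// Hypothetical helper: returns true if $predicate is truthy for any value.
function any_value(iterable $values, callable $predicate): bool
{
    foreach ($values as $value) {
        if ($predicate($value)) {
            return true;
        }
    }
    return false;
}

// A custom equality check, here a case-insensitive string comparison.
$comparator = static fn(string $a, string $b): bool => strcasecmp($a, $b) === 0;

$value = 'B';
$found = any_value(
    new ArrayIterator(['a', 'b', 'c']),
    static fn($other): bool => $comparator($value, $other)
);
var_dump($found); // bool(true)
```

This keeps contains() as a plain identity check and leaves custom comparisons 
to a separate predicate-based call.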

Since there is plenty of time to add more functionality,
and I still haven't created the extended iterable library proposal,
this currently only adds operations that are significantly more efficient 
inside the Vector
(or have a return type of Vector) rather than going through the generic 
Iterator methods.

> > > - I don't know what `setSize(int $size)` does. What does it do if the
> > > current size is less than `$size`? What about if its current size is
> > > greater? I suspect this is about capacity, not size, but without docs
> > > I am just guessing.
> >
> > It's the same behavior as 
> > https://www.php.net/manual/en/splfixedarray.setsize.php . It's about size, 
> > not capacity.
> >
> > > Change the size of an array to the new size of size.
> > > If size is less than the current array size, any values after the new 
> > > size will be discarded.
> > > If size is greater than the current array size, the array will be padded 
> > > with null values.
> >
> > I'd planned to add phpdoc documentation and examples before starting a vote 
> > to document the behavior and thrown exceptions of the proposed methods.
> 
> I would rather see multiple methods like:
>     function truncateTo(int $size)
>     function padEnd(int $length, $value) // allows more than just null
>     function padBeginning(int $length, $value)

I'd consider this unfriendly to users (and personally consider it a poor 
design) if we start with 3 or 4 different ways to change the size of the Vector.
(Especially if English is a second language)

A wide variety of programming languages such as Java, Rust, C++, etc. all use 
resize rather than truncateTo/padEnd,
after what I assume is considerable discussion among language design experts in 
those languages.
In the vast majority of cases, users know the exact size they want and don't 
care about the mechanism to set that.
(And if the size is set larger or smaller in an `if{...}else{...}`, the 
existence of setSize is still needed.
Or if the user intends to reuse the allocated memory while overwriting all 
values.)

- Diverging from what end users are familiar with (without a strong reason to) 
would also make it harder to start using `Vector`.

I'd considered using a signature of `setSize(int $size, mixed $value = null)` 
to allow using something other than null
but decided to leave that to a followup proposal if it passed.

For now, I'd omitted ways to add to the start of the array because the linear 
time taken would be potentially objectionable,
if people didn't imagine using it themselves or thought it'd be more 
appropriate for end users to use a Deque.

> And one or more for increasing/ensuring capacity without changing size.

setCapacity seems useful to me for reserving exactly the amount of memory 
needed when the final size was known (e.g. setCapacity(2) to avoid 
over-allocating) but I was waiting to see if anyone else wanted that.

Thanks,
Tyson


Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-17 Thread tyson andre


Hi Pierre,

> That's nice, and I like it, but like many people I will argue about the
> API itself.
> 
> One thing is that there's many methods in there that would totally fit
> generic collection common interfaces, and in that regard, I'd be very
> sad that it would be merged as is.

It isn't an interface, but my previous attempts at introducing common 
functionality for working with iterables have failed,
with a preference for userland implementations and the proposal being too small 
in scope among the reasons given.
https://wiki.php.net/rfc/any_all_on_iterable#straw_poll

Until there's a Set type or a Map type, adding generic functionality such as 
contains()
to all spl data structures is harder.

I haven't seen any recent additions of utility methods to existing spl 
datastructures in years other than when filling an urgent need,
(e.g. SplHeap->isCorrupted())
and have been pessimistic about that succeeding, but may be mistaken.

> I think it's taking the problem backwards, I would personally prefer that:
> 
>  - This RFC introduces the vector into a new Collection namespace, or
> any other collection/iterable/enumerable related namespace, that'd
> probably become the birth of a later to be standard collection API.
> 
>  - Start thinking about a common API even if it's for one or two
> methods, and propose something that later would give the impulsion for
> adding new collection types and extending this in order to be become
> something that looks like a really coherent collection API.
> 
> If this goes in without regarding the greater plan, it will induce
> inconsistencies in the future, when people will try to make something
> greater. I'd love having something like DS and nikic/iter fused
> altogether into PHP core, as a whole, in a consistent, performant, with
> a nice and comprehensive API (and that doesn't require to install
> userland dependencies).

Aside: https://github.com/TysonAndre/pecl-teds#iterable-functions
starts doing that, but evaluating eagerly instead of using generators.
I still don't think there's enough functionality yet to re-propose that.

> I know this vector proposal is not about that, but nevertheless, in my
> opinion, it must start preparing the terrain for this, or all other RFC
> in the future will only create new isolated data structures and make the
> SPL even more inconsistent.

It's possible, but I don't know what others think.

1. https://www.php.net/manual/en/class.ds-collection.php actually seems fairly 
universal, but out of scope, and I don't know if people would json encode a 
SplMaxHeap. Right now that isn't implemented and the value is always `{}`
2. `add($value)/remove($value)/contains[Value]($value)` is limited to some 
structures - Only containsValue() would apply to ArrayObject/SplObjectStorage. 
The others wouldn't work since you'd need to know the keys as well.
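
The json encoding point in item 1 can be checked with a short script. The 
explicit conversion at the end is only one possible workaround, shown as an 
illustration.

```php
<?php
$heap = new SplMaxHeap();
$heap->insert(1);
$heap->insert(3);

// The heap has no public properties, so it encodes as an empty object.
var_dump(json_encode($heap)); // string(2) "{}"

// Iterating a heap removes its elements, so iterate over a clone when
// converting the values to a plain list first.
$values = iterator_to_array(clone $heap, false);
var_dump(json_encode($values)); // string(5) "[3,1]"
```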

Also,

- Union type/intersection type support exists, so allowing any generic 
collection interface is less urgent.
- equals() may work, though infinite recursion (or the way it is or isn't 
detected) in circular data structures is a potential objection, especially with 
the lack of stack overflow detection - php just crashes/segfaults without a 
useful message when it runs out of stack space.

For the ones that are universal, 
Countable/ArrayAccess/IteratorAggregate/Traversable already exist.

Also, as you said, this RFC is not about that.
Requiring that anyone systematically overhaul existing data structures before 
adding any new data structures
seems like it would significantly delay or discourage any future additions of 
data structures.

In the immediate future, an RFC only doing that would not have much short-term 
benefit to users - it would also have short-term drawbacks for what I consider 
not enough benefit,
if adopting that interface made libraries drop support for older php versions.

Thanks,
Tyson


Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-16 Thread tyson andre

Hey Marco Pivetta,

> Would it perhaps make sense to drag in php-ds, which has matured quite a bit 
> over the years? I'm referring to: 
> https://www.php.net/manual/en/class.ds-sequence.php
> 
> Is what you are suggesting with `Vector` different from it?
> 
> Note: For some reason, I can't quote your post and then reply, so it will be 
> a top-post 🤷‍♀️

This was outlined in the section 
https://wiki.php.net/rfc/vector#why_not_use_php-ds_instead before I sent out 
the announcement. To expand on that,

This has been asked about multiple times over the years in threads on unrelated 
proposals (https://externals.io/message/112639#112641 and 
https://externals.io/message/93301#93301 years ago),
but the maintainer of php-ds had a long term goal of developing it separately 
from php's release cycle (and was still focusing on the PECL when I'd asked on 
the GitHub issue in the link almost a year ago).

- There have been no proposals from the maintainer to do that so far; that was 
what the maintainer mentioned as a long term plan.
- I personally doubt having it developed separately from php's release cycle 
would be accepted by voters (e.g. if unpopular decisions couldn't be voted 
against), or how backwards compatibility would be handled in that model, and 
had other concerns. (e.g. API debates such as 
https://externals.io/message/93301#93301)
- With php-ds itself getting merged anytime soon seeming unlikely to me, I 
decided to start independently working on efficient data structure 
implementations.

I don't see dragging it in (against the maintainer's wishes) as a viable option 
for many, many, many reasons.
But having efficient datastructures in PHP's core is still useful.

- While PECL development outside of php has its benefits for development and 
ability to make new features available in older php releases,
  it's less likely that application and 
  library authors will start making use of those data structures because many 
users won't have any given PECL already installed. 
  (though php-ds also publishes a polyfill, it would not have the cpu and 
memory savings, and add its own overhead)

- Additionally, users (and organizations using PHP) can often make stronger 
assumptions on
  backwards compatibility and long-term availability of functionality that is 
merged into PHP's core.

So the choice of feature set, some names, signatures, and internal 
implementation details are different, because this is reimplementing a common 
datastructure found in different forms in many languages.
It's definitely a mature project, but I personally feel like reimplementing 
this (without referring to the php-ds source code and without copying the 
entire api as-is) is the best choice to add efficient data structures to core 
while respecting the maintainer's work on the php-ds project and their wish to 
maintain control over the php-ds project.

As a result, I've been working on implementing data structures such as Vector 
based on php-src's data structure implementations (mostly SplFixedArray and 
ArrayObject) instead (and based on my past PECL/RFC experience, e.g. with 
runkit7/igbinary).

Regards,
Tyson

Re: [PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-16 Thread tyson andre

Hi Levi Morrison,

> I'm okay with a final Vector class in general. I don't love the
> proposed API but don't hate it either. Feedback on that at the end.
> 
> What I would _love_ is a `vec` type from hacklang, which is similar to
> this but pass-by-value, copy-on-write like an array. Of course, this
> would require engine work and I understand it isn't as simple to add.

Yeah, as mentioned in 
https://wiki.php.net/rfc/vector#adding_a_native_type_instead_is_vec , it would 
require a massive amount of work.

- A standard library for dealing with `vec`, filtering it, etc
- Userland libraries and PECLs would need to deal with a third complex type 
different from array/object that probably couldn't be implicitly 
- Extensive familiarity with opcache and the JIT for x86 and other platforms 
beyond what I have
- Willingness to do that with the uncertainty the final implementation would 
get 2/3 votes with backwards compatibility objections, etc.

> Feedback on API:
> 
> -  `indexOf` returning `false` instead of `null` when it cannot be
> found. If we are keeping this method (which I don't like, because
> there's no comparator), please return `null` instead of false. The
> language has facilities for working with null like `??`, so please
> prefer that when it isn't needed for BC (like this, this is a new
> API).

I hadn't thought about that - that seems reasonable since I don't remember 
anything else adding indexOf as a method name.
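
To illustrate why `null` composes better than `false` here, a small sketch 
follows. `index_of` is a hypothetical userland stand-in for the proposed 
method, written only for this example.

```php
<?php
// Hypothetical stand-in for Vector::indexOf(): returns the index of the
// first identical value, or null when the value is not found.
function index_of(array $values, $needle): ?int
{
    foreach ($values as $i => $value) {
        if ($value === $needle) {
            return $i;
        }
    }
    return null;
}

$values = ['a', 'b', 'c'];
var_dump(index_of($values, 'c') ?? -1); // int(2)
var_dump(index_of($values, 'z') ?? -1); // int(-1)

// array_search() reports "not found" as false, which ?? does not replace.
var_dump(array_search('z', $values, true) ?? -1); // bool(false)
```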

> - `contains` also doesn't have a comparator.

I was considering proposing `->any(callable)` and `->all(callable)` extensions 
if this passed.
I'm not quite sure what you mean by a comparator for contains. There'd have to 
be a way to check if a raw closure is contained.

> -  Similarly but less strongly, I don't like the filter callable
> returning `mixed` -- please just make it `bool`.

The filter callable is something that would be passed into the filter function. 
The return value would be checked for truthiness.
The phpdoc in the documentation could be changed, but that wouldn't change the 
implementation.
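
The existing array_filter() already works this way, which is the behavior 
being described (the proposed Vector method itself is not shown here):

```php
<?php
$values = [0, 1, '', 'a', null, [], [0]];

// The callback returns the value itself rather than a bool; array_filter()
// keeps the entries whose callback result is truthy and preserves keys.
$kept = array_filter($values, static fn($v) => $v);
var_dump($kept === [1 => 1, 3 => 'a', 6 => [0]]); // bool(true)
```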

> - I don't know what `setSize(int $size)` does. What does it do if the
> current size is less than `$size`? What about if its current size is
> greater? I suspect this is about capacity, not size, but without docs
> I am just guessing.

It's the same behavior as 
https://www.php.net/manual/en/splfixedarray.setsize.php . It's about size, not 
capacity.

> Change the size of an array to the new size of size.
> If size is less than the current array size, any values after the new size 
> will be discarded.
> If size is greater than the current array size, the array will be padded with 
> null values.

I'd planned to add phpdoc documentation and examples before starting a vote to 
document the behavior and thrown exceptions of the proposed methods.
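
Until then, the quoted SplFixedArray::setSize() behavior can be seen with the 
existing SPL class (used here as a stand-in, since the proposed Vector follows 
the same rules):

```php
<?php
$fixed = SplFixedArray::fromArray(['a', 'b', 'c']);

// Growing: the new slots are padded with null.
$fixed->setSize(5);
var_dump($fixed->toArray() === ['a', 'b', 'c', null, null]); // bool(true)

// Shrinking: values after the new size are discarded.
$fixed->setSize(2);
var_dump($fixed->toArray() === ['a', 'b']); // bool(true)
```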

Thanks,
Tyson

[PHP-DEV] RFC: Add `final class Vector` to PHP

2021-09-16 Thread tyson andre

Hi internals,

I've created a new RFC https://wiki.php.net/rfc/vector proposing to add `final 
class Vector` to PHP.

PHP's native `array` type is rare among programming languages in that it is 
used as an associative map of values, but also needs to support lists of values.
In order to support both use cases while also providing a consistent internal 
array HashTable API to PHP's internals and PECLs, additional memory is 
needed to track keys 
(https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html - 
around twice as much as is needed to just store the values, due to needing 
space both for the string pointer and int key in a Bucket, for non-reference 
counted values).
Additionally, creating non-constant arrays will allocate space for at least 8 
elements to make the initial resizing more efficient, potentially wasting 
memory.
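
A rough way to compare the memory used by an array and by an existing 
fixed-size container is sketched below. The `measure` helper is made up for 
this example, SplFixedArray stands in for the proposed Vector, and the numbers 
printed depend on the PHP version and platform, so no expected output is given.

```php
<?php
// Hypothetical helper: returns the change in memory_get_usage() caused by
// building a value. This is a rough measurement, not a precise profiler.
function measure(callable $build): int
{
    $before = memory_get_usage();
    $value = $build();
    $used = memory_get_usage() - $before;
    unset($value);
    return $used;
}

$n = 100000;

$arrayBytes = measure(static function () use ($n): array {
    $a = [];
    for ($i = 0; $i < $n; $i++) {
        $a[] = $i;
    }
    return $a;
});

$fixedBytes = measure(static function () use ($n): SplFixedArray {
    $f = new SplFixedArray($n);
    for ($i = 0; $i < $n; $i++) {
        $f[$i] = $i;
    }
    return $f;
});

printf("array: %d bytes, SplFixedArray: %d bytes\n", $arrayBytes, $fixedBytes);
```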

It would be useful to have an efficient variable-length container in the 
standard library for the following reasons: 

1. To save memory in applications or libraries that may need to store many 
lists of values and/or run as a CLI or embedded process for long periods of 
time 
   (in modules identified as using the most memory or potentially exceeding 
memory limits in the worst case)
   (both in userland and in native code written in php-src/PECLs)
2. To provide a better alternative to `ArrayObject` and `SplFixedArray` for use 
cases 
   where objects are easier to use than arrays - e.g. variable sized 
collections (For lists of values) that can be passed by value to be read and 
modified.
3. To give users the option of stronger runtime guarantees that property, 
parameter, or return values really contain a list of values without gaps, that 
array modifications don't introduce gaps or unexpected indexes, etc.
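
To make point 3 concrete, here is a minimal userland sketch of the kind of guarantee meant. `GaplessList` is a hypothetical name used only for illustration; it is not the proposed `Vector` API and has none of its memory benefits (it still wraps a plain array).

```php
<?php
// Hypothetical userland stand-in, NOT the proposed Vector API.
final class GaplessList implements Countable
{
    /** @var list<mixed> */
    private array $values = [];

    public function push(mixed $value): void
    {
        $this->values[] = $value;
    }

    public function set(int $offset, mixed $value): void
    {
        // Reject writes that would create a gap or an unexpected index.
        if ($offset < 0 || $offset >= count($this->values)) {
            throw new OutOfBoundsException("Offset $offset is out of range");
        }
        $this->values[$offset] = $value;
    }

    public function count(): int
    {
        return count($this->values);
    }

    /** @return list<mixed> */
    public function toArray(): array
    {
        return $this->values;
    }
}

$array = [1, 2];
$array[5] = 3;            // A plain array silently gains a gap (keys 0, 1, 5).

$list = new GaplessList();
$list->push(1);
$list->push(2);
try {
    $list->set(5, 3);     // The wrapper refuses the same write.
} catch (OutOfBoundsException $e) {
    echo $e->getMessage(), "\n";
}
```

A property typed as such a class can only ever hold a gap-free list, which a property typed as `array` cannot promise.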

Thoughts on Vector?

P.S. The functionality in this proposal can be tested/tried out at 
https://pecl.php.net/teds (under the class name `\Teds\Vector` instead of 
`\Vector`).
(That is a PECL I created earlier this year for future versions of iterable 
proposals, common data structures such as Vector/Deque, and less commonly used 
data structures that may be of use in future work on implementing other data 
structures)

Thanks,
Tyson

Re: [PHP-DEV] [RFC] Deprecate dynamic properties

2021-08-25 Thread tyson andre

Hi Nikita Popov,

> I'd like to propose the deprecation of "dynamic properties", that is
> properties that have not been declared in the class (stdClass and
> __get/__set excluded, of course):
> 
> https://wiki.php.net/rfc/deprecate_dynamic_properties
> 
> This has been discussed in various forms in the past, e.g. in
> https://wiki.php.net/rfc/locked-classes as a class modifier and
> https://wiki.php.net/rfc/namespace_scoped_declares /
> https://github.com/nikic/php-rfcs/blob/language-evolution/rfcs/-language-evolution.md
> as a declare directive.
> 
> This RFC takes the more direct route of deprecating this functionality
> entirely. I expect that this will have relatively little impact on modern
> code (e.g. in Symfony I could fix the vast majority of deprecation warnings
> with a three-line diff), but may have a big impact on legacy code that
> doesn't declare properties at all.

I have some concerns.

It might be too soon after your addition of WeakMap.
https://www.php.net/weakmap was introduced in PHP 8.0 (and WeakReference in 
7.4),
so applications/libraries fixing the deprecation may need to drop support for 
php 7.x early.
Applications attempting to polyfill a WeakMap in earlier PHP versions would 
potentially leak a lot of memory in php 7.x.

- I don't know how many minor versions to expect before 9.0 is out
- Is it feasible for a developer to create a native PECL polyfill for WeakMap 
for earlier PHP versions that has a
  subset of the functionality the native weak reference counting does?
  (e.g. to only free polyfilled weak references when cyclic garbage collection 
is triggered and the reference count is 1).

Additionally, it makes it less efficient (but still feasible) to associate 
additional fields
with libraries or native classes/PECLs you don't own (especially for circular 
data structures), especially if they need to be serialized later.
(it isn't possible to serialize WeakMap, and the WeakMap would have fields 
unrelated to the data being serialized)
I guess you can have a wrapper class that iterates over a WeakMap to capture 
and serialize the values that still exist in SplObjectStorage, though.
(Though other languages do just fine without this functionality)

I'm not sure if a library owner would want to change their class hierarchy to 
extend stdClass (to avoid changing the behavior of anything using `$x 
instanceof stdClass`) and the attribute/trait approach might be more acceptable 
to library owners.
E.g. 
https://github.com/vimeo/psalm/blob/master/src/Psalm/Internal/Analyzer/Statements/Expression/Call/FunctionCallAnalyzer.php
 
would set a dynamic property `$stmt->pure` in `PhpParser\Node\Expr\FuncCall 
$stmt` in a vendor dependency on php-parser.
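
For reference, a rough sketch of the WeakMap alternative to setting a dynamic property on an object from a vendor dependency (PHP 8.0+ only, which is the concern above). `FakeNode` and `PurityTracker` are hypothetical names for illustration; `FakeNode` stands in for a class such as `PhpParser\Node\Expr\FuncCall`.

```php
<?php
// Hypothetical stand-in for a vendor class that declares no `pure` property.
final class FakeNode {}

// Keeps the extra flag outside the object instead of in a dynamic property.
final class PurityTracker
{
    private WeakMap $pure;

    public function __construct()
    {
        $this->pure = new WeakMap();
    }

    public function markPure(object $node): void
    {
        $this->pure[$node] = true;
    }

    public function isPure(object $node): bool
    {
        return isset($this->pure[$node]);
    }

    public function trackedCount(): int
    {
        return count($this->pure);
    }
}

$tracker = new PurityTracker();
$stmt = new FakeNode();
$tracker->markPure($stmt);
var_dump($tracker->isPure($stmt));    // bool(true)

unset($stmt);                          // Last reference is gone, so the entry is dropped.
var_dump($tracker->trackedCount());    // int(0)
```

The flag no longer travels with the object, so serializing the node does not include it, which is the serialization drawback described above.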

Regards,
Tyson


Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-19 Thread tyson andre

Hi Craig Francis,

> As an aside, only 4 of 23 'no' voters provided any comment as to why they
> voted that way on the mailing list, which I feel undermines the point of
> the Request For Comment process, with an additional 5 responding personally
> off-list after prompting. This makes it harder (or impossible) for points
> to be discussed and addressed.

1. My earlier comments about static analysis, and on behavior depending on 
whether opcache is enabled
2. This might prevent certain optimizations in the future. For example, 
currently, 1-byte strings are all interned to save memory.
If is_literal was part of php prior to proposing that optimization, then 
that optimization may be rejected.
3. PHP's `string` type is used both for (hopefully) valid unicode strings and 
for low level operations on literal byte arrays (e.g. cryptography).
It seems really, really strange for a type system to track trustedness on 
a low level primitive that is also used for byte arrays. (the php 6 unicode string 
refactoring failed)

Further, if this were to be extended in the future beyond the original 
proposal (e.g. literal strings that are entirely digits are automatically 
interned or marked as trusted),
this might open previously safe code acting on byte arrays to side channel 
issues such as timing attacks (https://en.wikipedia.org/wiki/Timing_attack)
4. Internal functions and userland polyfills for those functions may 
unintentionally differ significantly for the resulting taintedness,
e.g. base64_decode in userland being built up byte by byte would end up 
being possibly untainted?
5. The fact that 1-byte strings are almost always interned seems like a 
noticeable inconsistency (though library authors can deal with it once they're 
aware of it), though for it to become an issue a library may need to take 
multiple strings as input
(e.g. a contrived example `"echo -- " . $trustedPrefix . 
shell_escape($notTrusted)` for $trustedPrefix of "'" (or "\n") and $notTrusted 
of "; evaluate command")
6. Including it in core would make it harder to remove later if it interfered 
with opcache or jit work, or to migrate code to alternative interpreters for 
php if those were ever implemented (if frameworks were to extensively depend on 
is_literal)
7. Tracking whether a string is untrusted is a definition only suitable for a 
few (extremely common) formats for php. But for less common features, even 
stringified integers may be a problem (e.g. binary file formats, etc)

This is relatively minor given that php is typically used in a web 
programming context with json or html or js/css output

I'd think is_interned()/intern_string() is much closer to tracking 
something that corresponds with php's internals (e.g. and may allow saving 
memory in long-running processes which receive duplicate strings as input), 
though the 10 people who wanted fully featured trustedness checking would 
probably want is_literal instead
8. Serializing and unserializing data would lose information about trustedness 
of inputs, unpredictably (e.g. unserialize() in php 8.0 interns array keys).

There's no (efficient) way to change trusted strings to untrusted or vice 
versa, though there are inefficient workarounds (modifying a byte and restoring 
it to stop trusting it, imploding single characters to create a trusted string)

This may be done implicitly in frameworks using APCu/memcached/redis as a cache

(I definitely don't think the serialization data format should track 
is_literal())

9. Future refactorings, optimizations or deoptimizations (or security fixes) to 
unserialize(), etc. may unexpectedly break code using is_literal that throws 
instead of warning (more bug reports, discouraging users from upgrading, etc.)
10. This RFC adds an unknown amount of future work for php-src and PECLs to 
*intuitively* support mapping trusted inputs to trusted outputs or vice versa - 
less commonly used or unmaintained functions may not behave as expected for a 
while
11. https://pecl.php.net/package/taint is available already for a use case with 
some overlap for setups that need this


Aside: I'd have to wonder if ZSTR_IS_INTERNED (and the function to make an 
interned string) would make sense to expose in a PECL as a regular `extension` 
(not a `zend_extension`) if is_interned also fails.
Unlike the zend_extension for 
https://www.php.net/manual/en/function.is-tainted.php ,
something simple may be possible without needing the performance hit and
future conflicts with XDebug that I assume 
https://www.php.net/manual/en/function.is-tainted.php would be prone to.
(https://pecl.php.net/package/taint seems to use a separate bit to track this. 
The latest release of the Taint pecl fixes XDebug compatibility)

- Other languages, such as Java, have exposed this for memory management 
purposes (rather than security) though it's rarely used directly or in 
frameworks, e.g.

Re: [PHP-DEV] Changing method naming in FFI Type Reflection API from Arg->Parameter, etc

2021-07-13 Thread tyson andre

> > The FFI Type Reflection API mentioned in 
> > https://externals.io/message/115336 was recently added
> > 
> > My opinion is that they should be renamed to use the same naming 
> > scheme that PHP's Reflection extension is already using.
> > Having different ways of naming very similar concepts (different from 
> > https://www.php.net/reflectionfunctionabstract) would make the language 
> > harder to remember.
> > I'd brought that up in 
> > https://github.com/php/php-src/pull/7217#pullrequestreview-700990479 
> > with no response
> > 
> > What do others think about the name? I was considering holding a short 
> > vote
> > (on getReturnType, getParameterCount, getParameterType) before the 
> > feature freeze if there was interest
> > 
> > In particular,
> > 
> > - FFI\CData->getFuncReturnType should be changed to getReturnType - 
> > only functions have return types
> > 
> >   This is consistent with 
> > https://www.php.net/reflectionfunctionabstract 
> > - I believe Arg should be renamed to Parameter and Func should be 
> > removed from names where redundant.
> >   E.g. getFuncArgCount should be renamed to getParameterCount 
> > (getFuncArgType should be renamed getParameterType) - only functions 
> > have parameters,
> >   and PHP is already using "Parameter" instead of "Argument" 
> > for reflection on types elsewhere.
> > 
> >   Parameter is used to refer to the function declarations (AST_PARAM 
> > internally in the AST, ReflectionFunctionAbstract->getParameters(), 
> > etc.)
> >   Argument is used to refer to expressions passed to the functions by 
> > the caller (ArgumentCountError, etc.)
> > 
> >   Other languages use similar definitions, e.g. 
> > https://developer.mozilla.org/en-US/docs/Glossary/Parameter
> > - The discussion over where FFI arrays should support Countable::count 
> > (and non-arrays should throw) might be contentious so I'd rather keep 
> > getArrayLength
> 
> This all makes sense to me.  Consistent naming is better unless there's a 
> very specific reason to do otherwise.

Created a PR https://github.com/php/php-src/pull/7236

Actually, looking at this again, I don't see a need to drop the "Func" - 
there's already getFuncABI.

If you look at the current implementation, there's getStruct* for structures, 
getArray*, getPointer*, meaning `getFunc*` sort of makes sense for a naming 
scheme to make it easier to find functionality associated with a given func.

Still, I find my proposal of Arg->Parameter continues to make sense to me.

Thanks,
Tyson


[PHP-DEV] Changing method naming in FFI Type Reflection API from Arg->Parameter, etc

2021-07-13 Thread tyson andre

Hi internals,

The FFI Type Reflection API mentioned in https://externals.io/message/115336 
was recently added

My opinion is that they should be renamed to use the same naming scheme 
that PHP's Reflection extension is already using.
Having different ways of naming very similar concepts (different from 
https://www.php.net/reflectionfunctionabstract) would make the language harder 
to remember.
I'd brought that up in 
https://github.com/php/php-src/pull/7217#pullrequestreview-700990479 with no 
response

What do others think about the name? I was considering holding a short vote
(on getReturnType, getParameterCount, getParameterType) before the feature 
freeze if there was interest

In particular,

- FFI\CData->getFuncReturnType should be changed to getReturnType - only 
functions have return types

  This is consistent with https://www.php.net/reflectionfunctionabstract 
- I believe Arg should be renamed to Parameter and Func should be removed from 
names where redundant.
  E.g. getFuncArgCount should be renamed to getParameterCount (getFuncArgType 
should be renamed getParameterType) - only functions have parameters,
  and PHP is already using "Parameter" instead of "Argument" for 
reflection on types elsewhere.

  Parameter is used to refer to the function declarations (AST_PARAM internally 
in the AST, ReflectionFunctionAbstract->getParameters(), etc.)
  Argument is used to refer to expressions passed to the functions by the 
caller (ArgumentCountError, etc.)

  Other languages use similar definitions, e.g. 
https://developer.mozilla.org/en-US/docs/Glossary/Parameter
- The discussion over whether FFI arrays should support Countable::count (and 
non-arrays should throw) might be contentious so I'd rather keep getArrayLength
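
As a short illustration of the existing convention in the Reflection extension (the `add` function is a made-up example):

```php
<?php
function add(int $left, int $right = 0): int
{
    return $left + $right;
}

$reflection = new ReflectionFunction('add');

// "Parameter" is used for the declaration side.
echo $reflection->getNumberOfParameters(), "\n";          // 2
echo $reflection->getNumberOfRequiredParameters(), "\n";  // 1
foreach ($reflection->getParameters() as $parameter) {
    echo $parameter->getName(), ': ', $parameter->getType()->getName(), "\n";
}
echo $reflection->getReturnType()->getName(), "\n";       // int

// "Argument" is used for what the caller passes.
try {
    add();
} catch (ArgumentCountError $e) {
    echo get_class($e), "\n";                              // ArgumentCountError
}
```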

Thanks,
Tyson

Re: [PHP-DEV] [RFC] clamp

2021-06-23 Thread tyson andre

Hello Kim Hallberg,

> The RFC for the clamp function is now open and under discussion, you now have 
> 2 weeks 
> to discuss, suggest improvements and open issues before voting is considered.


From https://wiki.php.net/rfc/clamp - 

> Current userland implementations are handled in several ways, some of which 
> use min and max to check the bound,
> which is costly and slow when called often.
> Because userland implementations are for the most part not cost-effective 
> when called multiple times,
> a language implementation is desired.

I'd strongly prefer an actual benchmark for context and accuracy - is it 
actually faster or slower than the most efficient userland implementation and 
by how much?
E.g. in an optimized NTS build with `CFLAGS=-O2`, opcache 
enabled (zend_extension=opcache, opcache.enable=1, opcache.enable_cli=1),
and no debug configure flags, how many calls per second can be made on variable 
values of $num for both?
(I'd assume over twice as fast as calling both min/max from another function, 
but possibly slower than efficient_clamp, but haven't run this)

For userland implementations that did use min/max, they probably weren't 
performance sensitive for the application.

```php
function efficient_clamp(int|float $num, int|float $min, int|float $max): int|float {
    return $num < $min ? $min : ($num > $max ? $max : $num);
}
```
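
A rough sketch of the kind of userland benchmark meant here. The function names and iteration count are arbitrary choices for illustration, and it only compares two userland variants; the proposed native clamp() would need to be timed the same way on a build that includes the PR. No timings are quoted because they depend on the build and on opcache/JIT settings.

```php
<?php
function clamp_via_min_max(int|float $num, int|float $min, int|float $max): int|float
{
    return max($min, min($max, $num));
}

function clamp_via_ternary(int|float $num, int|float $min, int|float $max): int|float
{
    return $num < $min ? $min : ($num > $max ? $max : $num);
}

/** Returns the elapsed nanoseconds for $iterations calls of $clamp. */
function time_clamp(callable $clamp, int $iterations): int
{
    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        // Inputs cycle through -50..149, so both bounds are exercised.
        $clamp($i % 200 - 50, 0, 100);
    }
    return hrtime(true) - $start;
}

foreach (['clamp_via_min_max', 'clamp_via_ternary'] as $name) {
    $ns = time_clamp($name, 200_000);
    printf("%-20s %.1f ms\n", $name, $ns / 1e6);
}
```

Both variants are called through a callable here, which adds the same dispatch overhead to each; inlining the calls in the loop would give a fairer picture of a hot path.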

Thanks,
Tyson


Re: [PHP-DEV] [RFC] clamp

2021-06-23 Thread tyson andre

Hi Kim Hallberg,

> The RFC for the clamp function is now open and under discussion, you now have 
> 2 weeks 
> to discuss, suggest improvements and open issues before voting is considered.
> 
> Any and all feedback is welcomed.
> 
> The RFC is available for viewing here: https://wiki.php.net/rfc/clamp
> The implementation is available in a PR here: 
> https://github.com/php/php-src/pull/7191

https://wiki.php.net/rfc/howto mentions:

> Listen to the feedback, and try to answer/resolve all questions.
> **Update your RFC to document all the issues and discussions.
> Cover both the positive and negative arguments.** Put the RFC URL into all 
> your replies.

So I think the major objections are that:

1. This is easy to implement in userland and there's negligible performance 
benefit
   (in code that is a performance sensitive loop, it may still be worse 
compared to `$num < $min ? $min : ($num > $max ? $max : $num)` when validity of 
types and min/max are known, especially with the JIT)
2. People not being sure if they'd ever use it personally, especially with the 
ability to customize parameter order and permitted argument types in userland
3. It's inconsistent with min() and max(), which support any comparable type 
such as GMP (https://www.php.net/manual/en/class.gmp)
   (arbitrary precision numbers), DateTime, etc., and that may lead to 
surprises.
   Although PHP's comparison operator and 
https://www.php.net/manual/en/function.min.php have many, many inconsistencies, 
already

   (Though special casing GMP is probably a bad idea due to it being optional 
and having an incompatible license with core for packagers)
4. I'm not sure what this is meant to do with the float NAN (Not A Number) from 
the RFC description, but that's solvable

```
php > var_dump(min(gmp_init('123'), gmp_init('456')));
object(GMP)#1 (1) {
  ["num"]=>
  string(3) "123"
}
php > var_dump(max(new DateTime('@0'), new DateTime()));
object(DateTime)#2 (3) {
  ["date"]=>
  string(26) "2021-06-23 23:44:47.302531"
  ["timezone_type"]=>
  int(3)
  ["timezone"]=>
  string(3) "UTC"
}
php > echo json_encode(max([0,2],[0,1]));
[0,2]
```
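
On point 4, this is what the plain ternary userland version does with NAN today, since every ordered comparison against NAN is false. `ternary_clamp` is just an illustrative name; this says nothing about what the RFC implementation does.

```php
<?php
function ternary_clamp(int|float $num, int|float $min, int|float $max): int|float
{
    return $num < $min ? $min : ($num > $max ? $max : $num);
}

var_dump(ternary_clamp(NAN, 0, 10));  // float(NAN): a NAN input passes straight through
var_dump(ternary_clamp(5, NAN, 10));  // int(5): a NAN lower bound is silently ignored
var_dump(ternary_clamp(50, 0, NAN));  // int(50): a NAN upper bound is silently ignored
```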

The RFC should probably link to this RFC announcement thread 
https://externals.io/message/115076 as well.

Thanks,
Tyson

Re: [PHP-DEV] [Vote] make Reflection*#setAccessible() no-op

2021-06-23 Thread tyson andre

Hi Marco Pivetta,

> I'm opening the vote for making `Reflection*#setAccessible()`.
> 
> Voting starts today (2021-06-23) and ends in 14 days (2021-07-07).
> 
> Vote at https://wiki.php.net/rfc/make-reflection-setaccessible-no-op
> 
> Discussion: https://marc.info/?l=php-internals&m=162360269505048&w=2
> 
> Discussion^2: https://externals.io/message/114841

I'm in favor of this even without adding isAccessible(),
but just to note:

https://wiki.php.net/rfc/howto mentions:

> Listen to the feedback, and try to answer/resolve all questions.
> **Update your RFC to document all the issues and discussions.
> Cover both the positive and negative arguments.** Put the RFC URL into all 
> your replies.

1. This should probably link to the RFC discussions in a References section,
   not everyone who votes reads the mailing list.
2. https://externals.io/message/114841#114845 is the only thing that resembled 
   an objection for a "Discussion" section or future scope, though

   > I think that isAccessible should be added if any applications actually did 
depend on ReflectionException
   > being thrown for correctness - they could throw their own exception if 
isAccessible was false.
   > (e.g. for code meant to handle possibly undefined public typed properties 
by checking for initialization
   > then getting the value)
   >
   > I can't actually remember needing this (anything other than 
setAccessible(true)) personally, though, since `$obj->{$method}(...$args)` 
could be used.
   > I've only used this to access private and protected properties/methods.
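
For anyone voting who hasn't used this API, the typical usage looks like the following (`Account` is a made-up class). Before PHP 8.1 the `setAccessible(true)` call is required to read the private property; the RFC proposes making the call a no-op so the read works either way.

```php
<?php
final class Account
{
    private int $balance = 42;
}

$property = new ReflectionProperty(Account::class, 'balance');
// Needed before PHP 8.1; under the RFC this call would do nothing.
$property->setAccessible(true);
echo $property->getValue(new Account()), "\n"; // 42
```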

Thanks,
Tyson

Re: [PHP-DEV] [RFC] Deprecate boolean to string coercion

2021-06-22 Thread tyson andre

Hi George P. Banyard,

> With Ilija we are proposing a short RFC to deprecate coercion from bool to
> string:
> https://wiki.php.net/rfc/deprecate-boolean-string-coercion
> 
> As this is the final day for any RFC to be even able to land in PHP 8.1
> the voting is expected to start in two weeks on the 6th of July.
> 
> The implementation is yet to be done but is expected to be rather
> straightforward and finished within the week.

I'd agree any casts from booleans to strings are usually a bug in the 
application

Something I'd like to see in the RFC: What's the intended behavior (notices) of 
`sprintf('%s', false);` (functions internally casting to strings)?
What about `echo true; print(false);`, etc.?
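
For context, this is how those cases behave today without any diagnostics (a small sketch of current behavior, not of what the RFC would change):

```php
<?php
var_dump((string) true);          // string(1) "1"
var_dump((string) false);         // string(0) ""
var_dump(sprintf('%s', false));   // string(0) ""
var_dump('x' . true . false);     // string(2) "x1"

// echo and print coerce the same way; capture the output to show it.
ob_start();
echo true;
print(false);
$output = ob_get_clean();
var_dump($output);                // string(1) "1"
```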

Thanks,
Tyson
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php

Re: [PHP-DEV] [PROPOSAL] Bare name array literals (again)

2021-06-21 Thread tyson andre

Hi Mel Dafert,

> >I would prefer an improved syntax for creation of anonymous objects. This
> >is something which I have been annoyed with, myself.I'd like to see a
> >simple way of creating anonymous objects with typed properties.
> 
> Another advantage arrays currently have over anonymous objects is 
> destructuring -
> if this was (somehow?) also made possible with objects, this would be the
> best of both worlds.
> (Returning multiple named values from a function is also mentioned in the
> use-cases of the RFC.)
> I know this works:
>
> ```
> [ "foo" => $foo, "baz" => $baz ] = (array) $object;
> ```
> 
> (Alternatively also using get_object_vars() instead of casting.)
> But we both have to convert to an intermediate array again, and lose the
> type information of the object (eg. for static analysis), so this could also 
> be
> made more ergonomic if we want to go down the anonymous object route.

Ideas for syntax:

There's `object(foo: $foo, 'hyphenated-key' => $key) = $object;`,
but that would require making `object` a reserved keyword, and it currently 
isn't one.

- I had suggested using `$x = object(foo: $foo);` as a shorthand for `$x = 
(object)['foo' => $foo];`
  at some point. Feedback was negative for largely unrelated reasons (stdClass 
in general). https://externals.io/message/112082

`object{foo: $foo}` seems similarly unambiguous by requiring at least one 
property but would also require a new reserved word

Or `({ foo: $foo }) = $object;` - On second thought, I'm against that - that is 
very likely to conflict with possible future proposals to parse block 
expressions.
(e.g. getting parsed as a label for a goto followed by returning the variable 
$foo.)

`list{ foo: $foo } = $object` is unambiguous, I guess

`(object)['foo' => $foo] = $object;` was another idea but that wouldn't even 
work.
It's already valid syntax that is parsed as casting an assignment to an object.

`->{'foo' => $foo} = $object;` may be parseable but doesn't make much sense.
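
For comparison, these are the two workarounds that already work today, as mentioned in the quoted message. `Result` is a hypothetical class used only for illustration.

```php
<?php
$object = new stdClass();
$object->foo = 1;
$object->baz = 2;

// Both forms go through an intermediate array.
['foo' => $foo, 'baz' => $baz] = (array) $object;
['foo' => $foo2, 'baz' => $baz2] = get_object_vars($object);
var_dump($foo, $baz, $foo2, $baz2); // int(1) int(2) int(1) int(2)

// With a typed class the values survive, but the declared property types are
// no longer attached to the destructured variables.
final class Result
{
    public function __construct(
        public bool $success,
        public array $data,
    ) {}
}

['success' => $success, 'data' => $data] = get_object_vars(new Result(true, ['id' => 7]));
var_dump($success); // bool(true)
```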

Thanks,
Tyson


Re: [PHP-DEV] [PROPOSAL] Bare name array literals (again)

2021-06-21 Thread tyson andre

Hi Christian Schneider,

> > return [success: true, data: $data, cursor: $cursor];
> > // is equivalent to the following, but shorter:
> > return ['success' => true, 'data' => $data, 'cursor' => $cursor];
> 
> Just a little side-note: A while ago I proposed a 2-line-patch to allow :$foo 
> as a synonym for 'foo' => $foo.
> 
> This allows for
>     return ['success' => true, :$data, :$cursor];
> which is both shorter and removes repetition while keeping the variable usage 
> $data and $cursor visible.
> 
> I know that this has been shot down before but I couldn't resist mentioning 
> it in this context, sorry ;-)

I'd also implemented the same thing at https://github.com/php/php-src/pull/6635 
and reverted it - didn't see that when the PR was first created

> It had also suggested `:$var` or `=$var` as shorthand for `var: $var`,
> but this is going to be left out of this proposal
> https://externals.io/message/101698 has mostly negative feedback on a recent 
> proposal (and there are multiple syntax candidates)

That was left out - I expected it would get fewer votes than just `var: $var` 
for reasons mentioned in https://externals.io/message/101698

Based on past RFCs I've seen, I'd assume an RFC would fail if most feedback was 
proposing alternate solutions or arguing against it, like it is here.
And voting results for https://wiki.php.net/rfc/bare_name_array_literal were 
mostly negative.
I'd hoped named arguments using the same syntax might raise interest in this, 
but it doesn't look like it so far.

E.g. short functions in https://externals.io/message/113751 had some positive 
feedback, but still got fewer than 2/3 of the votes

Thanks,
Tyson

Re: [PHP-DEV] [PROPOSAL] Bare name array literals (again)

2021-06-21 Thread tyson andre

Hi Rowan Tommins,

> This is an immediate "no" from me: it multiplies the ways to write the
> same thing from 2 to 4, in order to save a few bytes, in a few instances.

It's from 4 to 6 if you include single quoted strings vs double quoted strings.
If linters and automatic fixers were set up for a project to enforce it (e.g. 
phpcbf),
there would only be one way that literals would be used in a project either way.

> I think this is something that Douglas Crockford got absolutely right
> when he simplified JavaScript object syntax to formalise JSON: every
> valid key can be represented as a quoted string, so if the quotes are
> always there, you don't need to remember a list of rules about reserved
> words, allowed characters, etc.

It's the same rules as parts of identifiers or variable names.
I don't think there are any reserved words for named params or this 
proposal(`default`, etc. are allowed despite being keywords)

What makes sense for a serialization format which will have hundreds of 
encoders/decoders may not make sense for a programming language.

> > This is useful for shortening long arrays where the keys are known literals,
> > e.g.
> > 
> > return [success: true, data: $data, cursor: $cursor];
> > // is equivalent to the following, but shorter:
> > return ['success' => true, 'data' => $data, 'cursor' => $cursor];
> 
> Although common, this is not a good use of arrays; if your keys are
> "known literals", they should be fields of some object:
> 
> return new Result(success: true, data: $data, cursor: $cursor);

If there is only a single place where an array with those keys is returned,
or if dynamic properties get added later,
then creating a class may not be worth it.

Developers would have to switch between files and keep track of which classes 
were 
ordinary data storage and which had side effects or transformed parameters.

> If you don't want to declare a class (yet), you can use an anonymous
> object. Rather than yet another way to write arrays, it would be great
> to have some more powerful syntax for those; currently you'd have
> something like this:
> 
> return new class(success: true, data: $data, cursor: $cursor) { public
> function __construct(public bool $success, public array $data, public
> CursorInterface $cursor) {} };
> Brainstorming, we could perhaps extend property promotion into the "new
> class" clause, so that you could write this:
> 
> return new class(public bool success: true, public array data: $data,
> public CursorInterface cursor: $cursor) {};

I think anonymous objects would benefit some use cases but would not be useful 
for every use case (e.g. short single-file shell scripts),
though the typed properties and constructor property promotion are definitely 
convenient.

Also, with no common interface between those anonymous classes,
using just anonymous classes would be writing functions that accept any object 
or return any object, 
which would be error prone and hard to analyze.

Those anonymous classes wouldn't have any ancestors in common for `instanceof` 
checks.
Also, if there were optional parameters, that can be represented in 
non-standard JSDoc supported by static analyzers
(`success: bool, data?: array, cursor?: CursorInterface, errorMessage?: 
string`),
but that wouldn't be represented in an anonymous class
(setting a property to null is not the same as omitting it).
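
A small sketch of both points, using made-up keys and anonymous classes: an array can leave an optional key out entirely, while an object with a declared nullable property always has that property, and two anonymous classes with the same shape are unrelated types.

```php
<?php
// Array shape `success: bool, errorMessage?: string`: the key may be absent.
$ok = ['success' => true];
$failed = ['success' => false, 'errorMessage' => 'timeout'];
var_dump(array_key_exists('errorMessage', $ok));      // bool(false)
var_dump(array_key_exists('errorMessage', $failed));  // bool(true)

// The object version cannot omit the property, only set it to null.
$okObject = new class(success: true) {
    public function __construct(
        public bool $success,
        public ?string $errorMessage = null,
    ) {}
};
var_dump(property_exists($okObject, 'errorMessage')); // bool(true)

// A second anonymous class with a compatible shape shares no ancestor with the first.
$other = new class { public bool $success = true; };
var_dump($okObject instanceof $other);                // bool(false)
```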

This reminds me of 
https://docs.python.org/3/library/collections.html#collections.namedtuple
but I don't plan to propose that (something similar could be done by validating 
all property names are valid, 
getting the sha1 of the ordered list of property names to choose a class name, 
and optionally setting properties to `readonly` (if the RFC passes) and 
forbidding dynamic properties,
and extending some common interface).

Thanks,
Tyson


[PHP-DEV] [PROPOSAL] Bare name array literals (again)

2021-06-21 Thread tyson andre

Hi internals,

In every place where `key` is a valid php identifier
(e.g. can be used in PHP 8.0's named parameters),
I propose to allow `[key: expr]` to be used instead of `['key' => expr]`.
(including `list(key: expr)` and `array(key: expr)`).
(This can be mixed anywhere with existing key/value syntax such as `$key => 
$value`, `...`, etc.)

The implementation can be found at https://github.com/php/php-src/pull/6635

This is useful for shortening long arrays where the keys are known literals,
e.g.

```php
return [success: true, data: $data, cursor: $cursor];
// is equivalent to the following, but shorter:
return ['success' => true, 'data' => $data, 'cursor' => $cursor];
```

This uses a similar syntax to named parameter invocations,
making it unlikely to cause future parser conflicts.

```php
// Invoking a function with PHP 8.0 named parameters.
process_api_result(success: true, data: $data);
```
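
The parallel with string-keyed arrays already exists in PHP 8.0: an array with string keys can be spread into a call as named arguments. A minimal sketch, where the signature of `process_api_result` is an assumption made up for this example:

```php
<?php
// Hypothetical signature, chosen only so the example runs.
function process_api_result(bool $success, array $data = [], ?string $cursor = null): array
{
    return ['success' => $success, 'data' => $data, 'cursor' => $cursor];
}

$data = ['id' => 7];

// Named parameters use `name: expr`...
$direct = process_api_result(success: true, data: $data);

// ...and a string-keyed array spread into the call maps keys to the same names.
$args = ['success' => true, 'data' => $data];
$spread = process_api_result(...$args);

var_dump($direct === $spread); // bool(true)
```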

This can also be used in the older `array()` value and `list()` destructuring
syntaxes. Forbidding `key: value` there seemed like
an unnecessary restriction that would make
the language harder to remember and a language specification a bit longer.

I haven't written up an RFC yet, but an older RFC for PHP 5 
https://wiki.php.net/rfc/bare_name_array_literal
includes most of the arguments I plan to make, as well as the PR description. 
Things that have changed since then include:

- In php 8.0, named parameters were already added, so the `key: expr` syntax is 
not likely to cause conflicts with future syntax.
- Users would already be familiar with this syntax and its meaning due to named 
parameters.
- There are better open source static analyzers to detect misuse of array keys 
or passing unexpected types to arrays (Phan, Psalm, PHPStan) when code is 
properly annotated

https://wiki.php.net/rfc/named_params#shorthand_syntax_for_matching_parameter_and_variable_name
mentioned this among the future scope.
It had also suggested `:$var` or `=$var` as shorthand for `var: $var`, but this 
is going to be left out of this proposal - https://externals.io/message/101698 
has mostly negative feedback on a recent proposal (and there are multiple 
syntax candidates)

Any feedback?

Thanks,
Tyson

Re: [PHP-DEV] [VOTE] ImmutableIterable (immutable, rewindable, allows any key keys)

2021-06-17 Thread tyson andre

Hi Larry Garfield,

Thanks for responding.

> While I like the idea of an immutable collection, and the performance boost 
> seems useful, this proposal seems to go about it in a sloppy way.
> 
> 1) Iterable doesn't seem like the right "family" for this.  It is iterable, 
> but so are lots of other things.

I'd suggested alternative names such as ImmutableKeyValueSequence, in 
https://externals.io/message/114834#114834 , but

- It seemed as if the sentiment was very strongly against long names. I likely 
misjudged this.
- Many names were suggested by only one person. I can't tell if there's a 
consensus from that. E.g. `*Aggregate`.
- When I suggested names such as `ImmutableKeyValueSequence` before starting 
the vote, nobody had any feedback of it being better/worse than my previous 
proposals.

> 2) I... have never seen anyone in PHP use "pairs" as a concept.  I have no 
> idea what they're doing here.

https://www.php.net/manual/en/class.ds-pair.php is a concept used in the DS 
PECL, e.g. https://www.php.net/manual/en/ds-map.first.php
Proposing that as a new object type seemed excessive here.

The reason is that PHP is (fairly) unique among languages with generic 
iterable types in that there's a key associated with values.
I had to deal with this unusual situation somehow, and it's not a surprise that 
the solution is also unusual.
Do you propose alternate solutions other than omitting the functionality?

- Javascript only provides values in .next() - 
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Iteration_protocols
- Python only provides values 
https://docs.python.org/3/glossary.html#term-iterable
- C++ iterators only provide values 
https://www.cplusplus.com/reference/iterator/
- And so on
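
To illustrate the point about keys, here is a minimal sketch (the generator names are made up for this example) showing that a PHP iterable can yield repeated or non-scalar keys, which a plain array cannot represent, so they have to be kept as pairs:

```php
<?php
// Hypothetical generators, used only for illustration.
function example_entries(): Generator {
    yield 'a' => 1;
    yield 'a' => 2;      // repeated key
    yield [1, 2] => 3;   // array key: fine for a generator, invalid as an array offset
}

function repeated_keys_only(): Generator {
    yield 'a' => 1;
    yield 'a' => 2;
}

// Storing [$key, $value] pairs keeps every entry.
$pairs = [];
foreach (example_entries() as $key => $value) {
    $pairs[] = [$key, $value];
}

// Converting to an array with keys preserved keeps only the last value per key.
$asArray = iterator_to_array(repeated_keys_only());
```

Here `$pairs` holds all three entries, while `$asArray` ends up with a single element for `'a'`.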

A generator-based workaround would be much slower

```
// Userland workaround: re-yield each [$key, $value] pair as a key and a value.
function userland_create_iterable(iterable $pairs) {
    foreach ($pairs as [$key, $value]) {
        yield $key => $value;
    }
}
// $pairs = array_map(...array_filter(...fetchOrComputeData(...)...)
$iterator = new ImmutableKeyValueSequence($pairs);
```

The other reason is that php has a large collection of internal and 
user-defined functions for dealing with arrays (sorting, filtering, etc), but 
few for iterables.
toPairs and fromPairs allow easily converting values to this and back, then 
calling usort/filter for compact code.

And if I provided fromPairs, toPairs seemed to make sense for completeness.
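
As a rough userland sketch of that round trip (the helper names `to_pairs` and `from_pairs` are hypothetical stand-ins for the proposed methods, not the RFC's implementation), converting to pairs makes the existing array functions usable on arbitrary key-value sequences:

```php
<?php
// Hypothetical userland helpers standing in for toPairs()/fromPairs().
function to_pairs(iterable $iterable): array {
    $pairs = [];
    foreach ($iterable as $key => $value) {
        $pairs[] = [$key, $value];
    }
    return $pairs;
}

function from_pairs(array $pairs): Generator {
    foreach ($pairs as [$key, $value]) {
        yield $key => $value;
    }
}

$source = (function () {
    yield 'b' => 2;
    yield 'a' => 3;
    yield 'b' => 1;
})();

$pairs = to_pairs($source);
// Sort by value, then keep only the entries whose key is 'b'.
usort($pairs, fn(array $x, array $y) => $x[1] <=> $y[1]);
$pairs = array_values(array_filter($pairs, fn(array $p) => $p[0] === 'b'));

// Convert back to a key => value iterable; the repeated key survives.
$result = [];
foreach (from_pairs($pairs) as $key => $value) {
    $result[] = "$key=$value";
}
```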

> 3) The JsonSerialize seems out of place.  It may make sense from another 
> angle, but it just sorta appears out of nowhere here.
> 
> It almost feels like what you actually want is an immutable Dictionary class. 
>  Such would naturally be iterable, countable, serializing makes some sense, a 
> fromIterable() method would make sense, etc.  

It would be useful for some but not all use cases. Especially use cases where 
keys aren't hashable, or where keys are repeated.

Not all values would be hashable in a dictionary (e.g. circular data 
structures, self-referential arrays). 

There's a lot of open design questions for Dictionary in core, e.g. the name, 
and whether objects should be hashable, or namespace, or whether it may 
conflict with future native types.
- And if a Hashable magic method or interface was added, then that might throw 
and make it impossible to store a generator.
- And if large data structures are used (e.g. yielding extremely large keys or 
slow object hashing, the hashing would be slow even when the application didn't 
need hashing at all)

> That I could get behind, potentially, although it also runs into the exciting 
> question of type restrictions and thus generics, which is where list type 
> discussions go to die. :-)

That's another possible obstacle to dictionary in core, but I hope not.

Thanks,
Tyson


Re: [PHP-DEV] [VOTE] ImmutableIterable (immutable, rewindable, allows any key keys)

2021-06-16 Thread tyson andre

Hi Nikita,

> I like the concept here. I think the naming choice is unfortunate, and 
> causing confusion for people.
> 
> What you're really proposing here is a data structure: A sequence of 
> key-value-pairs. That generally seems like a sensible thing to have, in that 
> we can implement it significantly more efficiently in core than you could do 
> it in userland, especially when it comes to memory usage.
> 
> The issue is that you're not really framing this as a data structure, but as 
> an iterable. I get that memoizing an iterable was the original motivation 
> here, but I think it causes confusion. If this were 
> KeyValueSequence::fromIterable($iterable), I think that the meaning and 
> behavior would be perfectly clear -- of course it would eagerly collect the 
> iterable, there is no other way it could reasonably work! I think Marco's 
> concerns wouldn't come up either -- it's perfectly reasonable for a data 
> structure to implement support for serialization and JSON encoding. Not so 
> much for an iterator.

It may be more positive, 
but I don't think it'd pass with any name or by changing the constructor to 
fromIterable.
Over half of the objections are to functionality, over half to unspecified 
reasons,
and other email discussion responses don't seem to indicate interest in having 
the functionality, just on clarifying implementation or naming details

I was trying to avoid proposing functionality similar to that already in php-ds 
or an improvement to spl
(especially with ongoing namespacing policy discussion),
but that seems to be a mistake - if it was chosen for inclusion in those 
modules then it'd be a very common use case
(e.g. https://www.php.net/manual/en/splqueue.construct, 
https://www.php.net/manual/en/class.ds-map.php, etc)

Cheers,
Tyson


Re: [PHP-DEV] [VOTE] ImmutableIterable (immutable, rewindable, allows any key keys)

2021-06-15 Thread tyson andre

Hi Thomas Bley,

> Maybe better use "IterableImmutable" to be more consistent with 
> "DateTimeImmutable"?

1. Replies won't show up on list if they aren't sent to internals@lists.php.net
2. I consider the name DateTimeImmutable a mistake (but one that isn't worth 
fixing).
If it was done from scratch I believe ImmutableDateTime would make more 
sense with the adjective first, then the noun.

https://www.php.net/recursiveiterator 
https://www.php.net/manual/en/class.cachingiterator.php
https://www.php.net/manual/en/class.splpriorityqueue.php etc

Regards,
Tyson

Re: [PHP-DEV] [VOTE] ImmutableIterable (immutable, rewindable, allows any key keys)

2021-06-15 Thread tyson andre

Hi Marco Pivetta,

> I see the RFC as valuable but:
> 
> * `__serialize` and `__unserialize` are out of scope: this depends on the 
>   contents of it, and there's no point in implementing them
> * `__set_state` should also not be implemented: `var_export()` like any other 
>   object and it should be fine
> * `jsonSerialize` also depends on the contents, and shouldn't be exposed
> 
> All of this is not part of what should be in a reusable iterator.

This is an IteratorAggregate implementation and all of the contents of the 
inner iterable are eagerly evaluated in the constructor.
I'd also considered names such as ImmutableIteratorAggregate, 
ImmutableKeyValueSequence or ImmutableEntrySet, but was unhappy with all of the 
names (excessively long, misleading, ambiguous, etc),
and prior discussion on the mailing list led me to believe short names were 
widely preferred over long names https://externals.io/message/114834#114812

If it was lazily evaluated such as proposed in 
https://wiki.php.net/rfc/cachediterable#future_scope , I'd agree that 
`__serialize`, `__unserialize`, `json_encode`, etc likely didn't belong in a 
lazily evaluated data structure,
but the goal was creating a reusable data structure
(e.g. that could be used to store key-value sequences from any source compactly 
(e.g. generators) and be serialized
and persisted to memcached, redis, a file, static array, etc)
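
A minimal userland sketch of that idea, assuming a hypothetical class name `EagerKeyValueSequence` (this is not the RFC's implementation, and its serialized layout differs from the flat `[$k1, $v1, ...]` form proposed there): the source is consumed once in the constructor, so the object can be iterated repeatedly and serialized.

```php
<?php
// Hypothetical userland class, for illustration only.
final class EagerKeyValueSequence implements IteratorAggregate, Countable
{
    /** @var array<int, array{0: mixed, 1: mixed}> */
    private array $pairs = [];

    public function __construct(iterable $source)
    {
        // Eagerly consume the source; nothing is evaluated later.
        foreach ($source as $key => $value) {
            $this->pairs[] = [$key, $value];
        }
    }

    public function getIterator(): Iterator
    {
        // A fresh generator per call makes the sequence rewindable.
        return (function () {
            foreach ($this->pairs as [$key, $value]) {
                yield $key => $value;
            }
        })();
    }

    public function count(): int
    {
        return count($this->pairs);
    }

    public function __serialize(): array
    {
        return $this->pairs;
    }

    public function __unserialize(array $data): void
    {
        $this->pairs = $data;
    }
}

$generator = (function () {
    yield 'x' => 1;
    yield 'x' => 2;
})();
$sequence = new EagerKeyValueSequence($generator);
$copy = unserialize(serialize($sequence));
```

The generator itself could not be serialized or iterated twice; the eager copy can do both.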

Cheers,
Tyson

[PHP-DEV] [VOTE] ImmutableIterable (immutable, rewindable, allows any key keys)

2021-06-15 Thread tyson andre

Hi internals,

Voting has started on the ImmutableIterable RFC 
https://wiki.php.net/rfc/cachediterable

Previous discussion can be found in https://externals.io/message/114834

Recent changes:
- The class was renamed to `ImmutableIterable` to indicate that it cannot be 
changed after being constructed.
  It was brought up in previous discussions that the previous name of 
`CachedIterable` could easily be assumed to have functionality similar to 
on-demand iterators/iterables such as https://php.net/cachingiterator
  (Additionally, immutability is rare among spl data structures)
- `__set_state` was added

Thanks,
Tyson


Re: [PHP-DEV] [RFC] make Reflection*#setAccessible() no-op

2021-06-13 Thread tyson andre

Hi Marco Pivetta,

> I'm posting here to introduce a new simplification, as well as
> quality-of-life-improving RFC:
> 
> https://wiki.php.net/rfc/make-reflection-setaccessible-no-op
> 
> The RFC is quite minimal, and proposes removing any runtime behavior from
> `ReflectionMethod#setAccessible()` and
> `ReflectionProperty#setAccessible()`, making `ReflectionMethod` and
> `ReflectionProperty` accessible by default.
> 
> The rationale is:
> 
>  * this API is probably coming from a copy-pasted java-ism (although I
> couldn't verify that, so I did not factor it into the RFC)
>  * removes the last bit of mutable state from `ReflectionProperty` and
> `ReflectionMethod`
>  * simplifies usage of the API
>  * if I'm up to no good, I don't need to actually solemnly swear that i am
> up to no good (that's stuff for fantasy books)
> 
> I don't really know what the deadline for 8.1 features is, but I assume
> it's coming up quite quickly, so friendly NikiC poked me to see if this
> long-standing patch of mine was still relevant.
> 
> Should be short/sweet, but I'm looking forward to your feedback.

The deadline for new features for 8.1 is July 20: 
https://wiki.php.net/todo/php81
(with some discretion from the release managers, e.g. for amendments to changes 
already made in 8.1)

I think that isAccessible should be added if any applications actually did 
depend on ReflectionException
being thrown for correctness - they could throw their own exception if 
isAccessible was false.
(e.g. for code meant to handle possibly undefined public typed properties by 
checking for initialization
then getting the value)

I can't actually remember needing this personally, though, since 
`$obj->{$method}()` could be used.
I've only used this to access private and protected properties/methods.
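
For reference, a small sketch of that use case (the class `Example` is made up). It assumes the behaviour this RFC proposes, as it later shipped in PHP 8.1, where the `setAccessible()` call is only needed on older versions:

```php
<?php
// Hypothetical class with a private property.
final class Example
{
    private int $secret = 42;
}

$property = new ReflectionProperty(Example::class, 'secret');
if (PHP_VERSION_ID < 80100) {
    // Before PHP 8.1, reading a private property without this call
    // throws a ReflectionException.
    $property->setAccessible(true);
}
$value = $property->getValue(new Example());
```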

Regards,
Tyson


Re: [PHP-DEV] Re: RFC: CachedIterable (rewindable, allows any key keys)

2021-06-12 Thread tyson andre

Hi internals,

> > > So I'm probably changing this to `ImmutableTraversable` as a short name 
> > > for the functionality,
> > > to make it clear arguments are eagerly evaluated when it is created.
> > > (ImmutableSequence may be expected to only contain values, and would be 
> > > confused with the ds PECL's 
> > > https://www.php.net/manual/en/class.ds-sequence.php)
> >
> > Hello,
> >
> > And why not simply RewindableIterator ? Isn't it the prominent feature
> > of it ?
> >
> > Agreed it's immutable, but a lot of traversable could be as well.
> 
> All iterators are "rewindable", though of course not in practice. I
> would avoid such names because we may eventually add an interface
> which works as a "tag" to say "yes, I actually do support rewinding."
> 
> The property of being rewindable comes from it being cached. Maybe
> `CachedAggregate`? Aggregates are data structures from which an
> external iterator can be obtained, so it makes a bit more sense if
> it's eager.

I think CachedAggregate would have problems with an unclear meaning similar to 
those that were raised previously in https://externals.io/message/114819#114798
(Some developers would think it may refer to the act of lazily evaluating the 
iterable(caching it on-demand to access later))

https://en.wikipedia.org/wiki/Aggregate on its own refers to a collection of 
objects/values
or in other contexts, functions such as count/sum/min/max 
https://en.wikipedia.org/wiki/Aggregate_function 

- In other contexts such as set theory, there might not be keys associated with 
the values
  so aggregate on its own seems unclear.

ImmutableIteratorAggregate or just ImmutableIterable/ImmutableTraversable makes 
more sense than `Cached*` to me.
ImmutableKeyValueSequence is an even shorter name than 
ImmutableIteratorAggregate and describes what the data structure is.

Thanks,
Tyson


Re: [PHP-DEV] Re: RFC: CachedIterable (rewindable, allows any key keys)

2021-06-10 Thread tyson andre

Hi Alex,

> > I've created a new RFC https://wiki.php.net/rfc/cachediterable adding 
> > CachedIterable,
> > which eagerly evaluates any iterable and contains an immutable copy of the 
> > keys and values of the iterable it was constructed from
> >
> > A heads up - I will probably start voting on 
> > https://wiki.php.net/rfc/cachediterable this weekend after 
> > https://wiki.php.net/rfc/cachediterable_straw_poll is finished.
> >
> > Any other feedback on CachedIterable?
> 
> Thanks for explaining 4 months ago about my concern.
> I think I understand the main real impact of an eager iterable cache vs a 
> lazy iterable cache from a functional point of view:
> - exceptions are thrown during construction vs during the first iteration
> - predictable performance also on the first iteration.
> 
> How did you gather the information that eager implementation is more valuable 
> than lazy one? I'm mostly curious also how to assess this as technically to 
> me it also looks the other way around. Maybe mention that in the RFC.
> I was even thinking that CachedIterable should be lazy and an 
> EagerCachedIterable would be built upon that with more methods. Or have it in 
> the same class with a constructor parameter.

One of the reasons was size/efficiency. Supporting lazy evaluation would 
require extra checks at runtime and extra properties to track internal state 
and to point to the original iterable and the functions being applied to it, 
so an application that creates lots of small/empty cached iterables would 
have a higher memory usage.

Having a data structure that tries to do everything would do other things 
poorly 
(potentially not support serialization, use more memory than necessary,
have unintuitive behaviors when attempting to var_export/var_dump it, 
surprisingly throw when being iterated over, etc)
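
A small sketch of the difference in failure timing (the generator `failing_source` is hypothetical, and `iterator_to_array()` stands in for an eager constructor):

```php
<?php
// Hypothetical source that fails after producing one value.
function failing_source(): Generator {
    yield 1;
    throw new RuntimeException('source failed');
}

// Eager: the whole source is consumed up front, so the error surfaces
// while the copy is being built.
$eagerFailedAt = null;
try {
    $copy = iterator_to_array(failing_source(), false);
} catch (RuntimeException $e) {
    $eagerFailedAt = 'construction';
}

// Lazy: creating the generator runs nothing; the error only appears
// partway through iteration, after some values were already processed.
$lazy = failing_source();
$seen = [];
$lazyFailedAt = null;
try {
    foreach ($lazy as $value) {
        $seen[] = $value;
    }
} catch (RuntimeException $e) {
    $lazyFailedAt = 'iteration';
}
```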

> Also, being able to have a perfect userland implementation, not very complex, 
> even considering the lower performance, is not that good for positive voting 
> from what I remember from history...

1. The userland polyfill included in the RFC is an incomplete implementation 
that only supports iteration. 
   It's meant to be as fast as possible at the cost of memory usage.
   It's not even an IteratorAggregate, doesn't support json encode, 
createFromPairs, and many other functions.
2. Virtually all of the spl iterables that don't deal with filesystems can be 
reimplemented in userland.
   (https://en.wikipedia.org/wiki/Turing_completeness)

   Even complicated extensions such as redis or memcached can be reimplemented 
in userland on top of sockets,
   but with higher cpu usage than native extensions 
(https://github.com/predis/predis/blob/main/FAQ.md#predis-is-a-pure-php-implementation-it-can-not-be-fast-enough)

   The benefit of having data structures internally is the fact that developers 
who learn them can use them in any project without adding dependencies
   (even in single file scripts) and that applications using CachedIterable 
would have much better performance 

Also, you and Levi have pointed out that iterable/iterator functionality is 
traditionally on-demand
(https://en.wikipedia.org/wiki/Lazy_evaluation) (e.g. iterables such as 
CallbackFilterIterator, RecursiveArrayIterator, etc)

As a result, I'm thinking CachedIterable is really not a good name for the 
eagerly evaluated data structure I'm proposing here,
and that there was confusion about how the data structure behaved when the name 
CachedIterable was suggested.
If functionality like that described in 
https://externals.io/message/114805#114792 was added, it could use the name 
CachedIterable instead.

So I'm probably changing this to `ImmutableTraversable` as a short name for the 
functionality,
to make it clear arguments are eagerly evaluated when it is created.
(ImmutableSequence may be expected to only contain values, and would be 
confused with the ds PECL's https://www.php.net/manual/en/class.ds-sequence.php)

Thanks,
Tyson


Re: [PHP-DEV] Re: RFC: CachedIterable (rewindable, allows any key keys)

2021-06-09 Thread tyson andre

Hi Levi Morrison,

> > > Hi internals,
> > >
> > > > I've created a new RFC https://wiki.php.net/rfc/cachediterable adding 
> > > > CachedIterable,
> > > > which eagerly evaluates any iterable and contains an immutable copy of 
> > > > the keys and values of the iterable it was constructed from
> > > >
> > > > This has the proposed signature:
> > > >
> > > > ```
> > > > final class CachedIterable implements IteratorAggregate, Countable, 
> > > > JsonSerializable
> > > > {
> > > >     public function __construct(iterable $iterator) {}
> > > >     public function getIterator(): InternalIterator {}
> > > >     public function count(): int {}
> > > >     // [[$key1, $value1], [$key2, $value2]]
> > > >     public static function fromPairs(array $pairs): CachedIterable {}
> > > >     // [[$key1, $value1], [$key2, $value2]]
> > > >     public function toPairs(): array{}
> > > >     public function __serialize(): array {}  // [$k1, $v1, $k2, $v2,...]
> > > >     public function __unserialize(array $data): void {}
> > > >
> > > >     // useful for converting iterables back to arrays for further processing
> > > >     public function keys(): array {}  // [$k1, $k2, ...]
> > > >     public function values(): array {}  // [$v1, $v2, ...]
> > > >     // useful to efficiently get offsets at the middle/end of a long iterable
> > > >     public function keyAt(int $offset): mixed {}
> > > >     public function valueAt(int $offset): mixed {}
> > > >
> > > >     // '[["key1","value1"],["key2","value2"]]' instead of '{...}'
> > > >     public function jsonSerialize(): array {}
> > > >     // dynamic properties are forbidden
> > > > }
> > > > ```
> > > >
> > > > Currently, PHP does not provide a built-in way to store the state of an 
> > > > arbitrary iterable for reuse later
> > > > (when the iterable has arbitrary keys, or when keys might be repeated). 
> > > > It would be useful to do so for many use cases, such as:
> > > >
> > > > 1. Creating a rewindable copy of a non-rewindable Traversable
> > > > 2. Generating an IteratorAggregate from a class still implementing 
> > > > Iterator
> > > > 3. In the future, providing internal or userland helpers such as 
> > > > iterable_flip(iterable $input), iterable_take(iterable $input, int 
> > > > $limit),
> > > > iterable_chunk(iterable $input, int $chunk_size), 
> > > > iterable_reverse(), etc (these are not part of the RFC)
> > > > 4. Providing memory-efficient random access to both keys and values of 
> > > > arbitrary key-value sequences
> > > >
> > > > Having this implemented as an internal class would also allow it to be 
> > > > much more efficient than a userland solution
> > > > (in terms of time to create, time to iterate over the result, and total 
> > > > memory usage). See https://wiki.php.net/rfc/cachediterable#benchmarks
> > > >
> > > > After some consideration, this is being created as a standalone RFC, 
> > > > and going in the global namespace:
> > > >
> > > > - Based on early feedback on 
> > > > https://wiki.php.net/rfc/any_all_on_iterable#straw_poll (on the 
> > > > namespace preferred in previous polls)
> > > >   It seems like it's way too early for me to be proposing namespaces in 
> > > >   any RFCs for PHP adding to modules that already exist, when there is no 
> > > >   consensus.
> > > >
> > > >   An earlier attempt by others on creating a policy for namespaces in 
> > > >   general (https://wiki.php.net/rfc/php_namespace_policy#vote) also did not 
> > > >   pass.
> > > >
> > > >   Having even 40% of voters opposed to introducing a given namespace 
> > > >   (in pre-existing modules)
> > > >   makes it an impractical choice when RFCs require a 2/3 majority to 
> > > >   pass.
> > > > - While some may argue that a different namespace might pass,
> > > >   https://wiki.php.net/rfc/any_all_on_iterable_straw_poll_namespace#vote 
> > > >   had a sharp dropoff in feedback after the 3rd form.
> > > >   I don't know how to interpret that - e.g. are unranked namespaces 
> > > >   preferred even less than the options that were ranked or just not seen 
> > > >   as affecting the final result.
> > >
> > > A heads up - I will probably start voting on 
> > > https://wiki.php.net/rfc/cachediterable this weekend after 
> > > https://wiki.php.net/rfc/cachediterable_straw_poll is finished.
> > >
> > > Any other feedback on CachedIterable?
> > >
> > > Thanks,
> > > Tyson
> > >
> > > --
> > > PHP Internals - PHP Runtime Development Mailing List
> > > To unsubscribe, visit: https://www.php.net/unsub.php
> > >
> >
> > Based on a recent comment you made on GitHub, it seems like
> > `CachedIterable` eagerly creates the datastore instead of doing so
> > on-demand. Is this correct?
> 
> Sorry, yes, that's correct and pointed out in the RFC.
> 
> I think that's a significant implementation flaw. I don't see why we'd
> balloon memory usage unnecessarily by being eager -- if an operation
> needs to fetch more data then it can go ahead and do so.

First, PHP's standard library

[PHP-DEV] Re: RFC: CachedIterable (rewindable, allows any key keys)

2021-06-08 Thread tyson andre

Hi internals,

> I've created a new RFC https://wiki.php.net/rfc/cachediterable adding 
> CachedIterable,
> which eagerly evaluates any iterable and contains an immutable copy of the 
> keys and values of the iterable it was constructed from
> 
> This has the proposed signature:
> 
> ```
> final class CachedIterable implements IteratorAggregate, Countable, 
> JsonSerializable
> {
>     public function __construct(iterable $iterator) {}
>     public function getIterator(): InternalIterator {}
>     public function count(): int {}
>     // [[$key1, $value1], [$key2, $value2]]
>     public static function fromPairs(array $pairs): CachedIterable {}
>     // [[$key1, $value1], [$key2, $value2]]
>     public function toPairs(): array{} 
>     public function __serialize(): array {}  // [$k1, $v1, $k2, $v2,...]
>     public function __unserialize(array $data): void {}
>  
>     // useful for converting iterables back to arrays for further processing
>     public function keys(): array {}  // [$k1, $k2, ...]
>     public function values(): array {}  // [$v1, $v2, ...]
>     // useful to efficiently get offsets at the middle/end of a long iterable
>     public function keyAt(int $offset): mixed {}
>     public function valueAt(int $offset): mixed {}
>  
>     // '[["key1","value1"],["key2","value2"]]' instead of '{...}'
>     public function jsonSerialize(): array {}
>     // dynamic properties are forbidden
> }
> ```
> 
> Currently, PHP does not provide a built-in way to store the state of an 
> arbitrary iterable for reuse later
> (when the iterable has arbitrary keys, or when keys might be repeated). It 
> would be useful to do so for many use cases, such as:
> 
> 1. Creating a rewindable copy of a non-rewindable Traversable 
> 2. Generating an IteratorAggregate from a class still implementing Iterator
> 3. In the future, providing internal or userland helpers such as 
> iterable_flip(iterable $input), iterable_take(iterable $input, int $limit),
>     iterable_chunk(iterable $input, int $chunk_size), iterable_reverse(), etc 
> (these are not part of the RFC)
> 4. Providing memory-efficient random access to both keys and values of 
> arbitrary key-value sequences 
> 
> Having this implemented as an internal class would also allow it to be much 
> more efficient than a userland solution
> (in terms of time to create, time to iterate over the result, and total 
> memory usage). See https://wiki.php.net/rfc/cachediterable#benchmarks
> 
> After some consideration, this is being created as a standalone RFC, and 
> going in the global namespace:
> 
> - Based on early feedback on 
> https://wiki.php.net/rfc/any_all_on_iterable#straw_poll (on the namespace 
> preferred in previous polls)
>   It seems like it's way too early for me to be proposing namespaces in any 
> RFCs for PHP adding to modules that already exist, when there is no consensus.
> 
>   An earlier attempt by others on creating a policy for namespaces in 
> general(https://wiki.php.net/rfc/php_namespace_policy#vote) also did not pass.
> 
>   Having even 40% of voters opposed to introducing a given namespace (in 
> pre-existing modules)
>   makes it an impractical choice when RFCs require a 2/3 majority to pass.
> - While some may argue that a different namespace might pass,
>   https://wiki.php.net/rfc/any_all_on_iterable_straw_poll_namespace#vote had 
> a sharp dropoff in feedback after the 3rd form.
>   I don't know how to interpret that - e.g. are unranked namespaces preferred 
> even less than the options that were ranked or just not seen as affecting the 
> final result.

A heads up - I will probably start voting on 
https://wiki.php.net/rfc/cachediterable this weekend after 
https://wiki.php.net/rfc/cachediterable_straw_poll is finished.

Any other feedback on CachedIterable?

Thanks,
Tyson


Re: [PHP-DEV] Disable autovivification on false

2021-06-05 Thread tyson andre

Hi Kamil,

> I have reworked the RFC based on some feedback. The improved RFC now will
> hold 2 votes. One vote to decide whether the behaviour should be deprecated
> from false, and another from null.
> 
> If there are no objections then I would like to start the votes in a couple
> of days.
> 
> However, I would still like to hear from you whether you
> use autovivification from false and/or null in your projects. So far, I was
> not able to identify when this would be useful in real-life scenarios.
> 
> RFC: https://wiki.php.net/rfc/autovivification_false

Without an implementation it'd be hard to actually tell what the impact would 
be. There isn't one linked to from the RFC or on github.com/php/php-src.
You might have code in your application or external dependencies that relies on 
that without you remembering or being aware of it for null.

**I started working on a prototype independently just to get an idea of how 
many things would encounter deprecations. See 
https://github.com/TysonAndre/php-src/pull/17** (I am not one of the RFC's 
authors. If someone bases their implementation on that prototype PR, please keep 
the authorship of initial commits, e.g. `Co-Authored-By` in git.)

Also, what is the planned deprecation message? And what about documenting all 
kinds of expressions that can cause autovivification, e.g. `$x = 
&$falseVar['offset']`?
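
As a rough, non-exhaustive sketch of the kinds of expressions in question (variable names are made up), each of the following silently turns a null or false variable into an array. The error handler only collects whatever diagnostics the running PHP version emits; on PHP 8.1 and later the conversion from false is reported as deprecated, while earlier versions stay silent.

```php
<?php
// Collect any diagnostics instead of printing them.
$messages = [];
set_error_handler(function (int $errno, string $errstr) use (&$messages): bool {
    $messages[] = $errstr;
    return true;
});

$fromNull = null;
$fromNull[] = 1;                  // append

$fromNullKey = null;
$fromNullKey['offset'] = 1;       // assignment to an offset

$fromNullRef = null;
$ref = &$fromNullRef['offset'];   // taking a reference creates the array and a null element

$fromFalse = false;
$fromFalse['offset'] = 1;         // same conversion, starting from false

restore_error_handler();
```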


My assumption is that false would be reasonably practical to implement (this 
patch with `== IS_FALSE` instead of `>= IS_NULL`, plus some changes to the 
optimizer to account for the fact some dimension assignment statements might 
now have.
For IS_NULL, that would probably require more familiarity with php's internals 
than I have to be certain the implementation is correct and to properly 
distinguish between undefined and null when automatically creating properties.

Deprecation notices would be a lot more common for null than for false, e.g. for 
snippets such as the ones below. I see dozens of test failures in Zend/tests 
with that prototype.

`$this->someArray[$offset] = $event` for the implicitly null `public 
$someArray;`. 
https://github.com/vimeo/psalm/blob/105c6f3a1c6521e4077da39f05a94b1ddbd76249/src/Psalm/Internal/PhpVisitor/ReflectorVisitor.php#L399
 is an example of that - the property's default is null, not the empty array.
- Obviously, that project would fix it quickly (I'm just testing on projects 
I've already downloaded), but the point is there may be a lot of code like that 
elsewhere.

And code such as https://github.com/nikic/php-ast/blob/v1.0.12/util.php#L25 
(for a PECL I use - that would be fixed very quickly for null if this passed, 
but there may be a lot of code like that elsewhere and other projects may not be 
maintained)

```
static $someVar; // equivalent to static $someVar = null;
if ($someVar !== null) {
    return $someVar;
}
// ... someVar is still null
$someVar[] = '123';
```

```
function example() {
    global $array;
    // null, not undefined, due to global $array converting undefined to
    // null before creating a reference.
    var_dump($array);
    $array[] = 1;
}
```



Overall, I'm in favor of this for false, but on the fence about null.
It seems like a lot of old and possibly unmaintained applications/libraries 
might be converting null to arrays implicitly in ways such as the above,
and there wouldn't be too much of a benefit to users to forcing them to start 
explicitly converting nulls to arrays before adding fields to arrays or taking 
references.

If a project isn't already providing type information everywhere (and isn't 
using a static analyzer such as phan/psalm) it may be easy to miss the 
possibility of a value being null in rare code paths,
but deprecating should give enough time to detect and fix typical issues 
between 8.1 and 9.0, and php 7.4's typed properties and 8.0's union types may 
make type information much easier to track.
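
As a small sketch of what that looks like in practice (the class `Collector` is hypothetical), giving the property an `array` type and an explicit default removes the reliance on converting null to an array:

```php
<?php
// Hypothetical class contrasting an untyped and a typed property.
final class Collector
{
    public $untyped;            // implicitly null; appending relies on autovivification from null
    public array $typed = [];   // explicit array default; no implicit conversion involved

    public function add($event): void
    {
        $this->untyped[] = $event;
        $this->typed[] = $event;
    }
}

$collector = new Collector();
$collector->add('a');
```

Both properties end up holding the same array; only the first depends on the behaviour under discussion.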

I try to avoid autovivification from null in php projects I work on, but I'm not 
the only contributor to those.

Regards,
Tyson

[PHP-DEV] [VOTE] Straw poll: Namespace to use for CachedIterable and iterable functionality

2021-06-05 Thread tyson andre

Hi internals,

Voting has started on https://wiki.php.net/rfc/cachediterable_straw_poll and 
ends in a week on June 12, 2021
(the voting period is shorter because this is a poll, not an RFC, and the 
feature freeze is soon)

Previously, https://wiki.php.net/rfc/namespaces_in_bundled_extensions passed 
37-1,
but nobody has created any RFCs using namespaces that I know of.
As a result, I'm uncertain if voters would prefer namespaces over the global 
namespace in practice for extensions that already have functions,
and even if namespaces are preferred, there may be multiple candidates for 
namespaces.

This poll was created to gather information on

1. How voters interpret the 
https://wiki.php.net/rfc/namespaces_in_bundled_extensions RFC for existing 
namespaces,
    as it makes recommendations but also permits the global namespace for new 
functionality consistent with existing functionality - the way I expect voters 
to interpret it may be different from how it is interpreted in practice.

   iterable_all() seemed to have been preferred over iterable\all in the 
previous straw poll.
   I would expect https://wiki.php.net/rfc/namespaces_in_bundled_extensions to 
shift that preference towards namespaces, but there have been no votes on an 
RFC using namespaces yet.

   Additionally, I didn't notice this earlier, but the RFC recommended (but 
didn't mandate) that "Namespace names should follow CamelCase." - so I'm not 
sure if iterable\ or Iterable\ makes the most sense to others.
2. To see if there's interest in that functionality before spending too much 
time on it. E.g. for CachedIterable, https://externals.io/message/113136 had 
little feedback
    but would enable implementing a standard library for iterables that was 
much wider in scope (e.g. iterable\flip(), iterable\reversed(), 
iterable\take(), etc)

(Sorry - It is difficult to tell if feedback from a few people on a mailing 
list is representative of the majority of voters and there have been no RFCs 
for me to look at as a precedent).

Previous discussion can be found at https://externals.io/message/114687

Thanks,
Tyson

[PHP-DEV] "Namespaces in bundled extensions" and iterable additions to the standard library

2021-06-01 Thread tyson andre

Hi internals,

Recently, there have been 3 proposals to add functionality related to 
iterables/iterators to the standard library where voting was postponed for 
reasons related to namespacing policy:

1. https://wiki.php.net/rfc/any_all_on_iterable - where major objections were 
not having enough functionality and the choice of namespace.

   If this did go with a namespace, I believe iterable\any(), iterable\all(), 
etc. would be reasonable.

   iterable\X was the most popular choice among choices with a single namespace 
part - https://wiki.php.net/rfc/any_all_on_iterable_straw_poll_namespace
2. https://wiki.php.net/rfc/cachediterable - My preference would be 
iterable\CachedIterable.

   With this being one of the first proposed namespaced additions to PHP, I 
felt obligated to gather feedback on alternative namespace choices.
3. https://externals.io/message/113061 and 
https://github.com/php/php-src/pull/6535 by Levi Morrison - I'm uncertain of 
the status of this, e.g. what name the author had planned to go with now that 
the "Namespaces in bundled extensions" RFC passed weeks ago.
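
For readers unfamiliar with the any/all proposal in item 1, here is a minimal 
userland sketch of the behavior under discussion. The function names and the 
`(iterable $input, ?callable $callback = null): bool` signatures are 
assumptions for illustration; a namespaced version might be spelled 
iterable\any() / iterable\all(), which is exactly the naming question above.

```php
<?php
// Hypothetical userland sketch; names and signatures are assumptions,
// not the API of any RFC or of php-src.
function iterable_any_sketch(iterable $input, ?callable $callback = null): bool
{
    foreach ($input as $value) {
        // Without a callback, fall back to the truthiness of the value itself.
        if ($callback !== null ? $callback($value) : $value) {
            return true;
        }
    }
    return false;
}

function iterable_all_sketch(iterable $input, ?callable $callback = null): bool
{
    foreach ($input as $value) {
        if (!($callback !== null ? $callback($value) : $value)) {
            return false;
        }
    }
    // Vacuously true for an empty iterable.
    return true;
}
```

Both helpers stop at the first deciding element, so they also work on 
generators and other iterables that are not arrays.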

https://wiki.php.net/rfc/namespaces_in_bundled_extensions passed 37-1, but as 
nobody has created any RFCs using namespaces that I know of,
I'm uncertain if voters would prefer namespaces over the global namespace in 
practice for extensions that already have functions, and even if namespaces are 
preferred, there may be multiple candidates.

I plan to start a straw poll 
(https://wiki.php.net/rfc/cachediterable_straw_poll) in a few days to gather 
information on

1. How voters interpret the 
https://wiki.php.net/rfc/namespaces_in_bundled_extensions RFC for existing 
namespaces,
as it makes recommendations but also permits the global namespace for new 
functionality consistent with existing functionality - the way I expect voters 
to interpret it may be different from how it is interpreted in practice.

   iterable_all() seemed to have been preferred over iterable\all in the 
previous straw poll.
   However, I would expect 
https://wiki.php.net/rfc/namespaces_in_bundled_extensions to shift that 
preference towards namespaces but there have been no votes on an RFC using 
namespaces yet.

   Additionally, I didn't notice this earlier, but the RFC recommended (but 
didn't mandate) that "Namespace names should follow CamelCase." - so I'm not 
sure if iterable\ or Iterable\ makes the most sense to others.
2. To see if there's interest in that functionality before spending too much 
time on it. E.g. for CachedIterable, https://externals.io/message/113136 had 
little feedback
but would enable implementing a standard library for iterables that was 
much wider in scope (e.g. iterable\flip(), iterable\reversed(), 
iterable\take(), etc)

(Sorry - It is difficult to tell if feedback from a small number of people on a 
mailing list is representative of the majority of voters and there have been no 
RFCs for me to look at as a precedent).

Questions I had:
- Did anyone interested in adding CachedIterable have a different idea for 
namespace choices that should be included in the poll?
- Did anyone have feedback on whether iterable\ or Iterable\ makes more sense - 
Personally, iterable\ seems like it should be an exception due to it also being 
used as a soft reserved keyword that is typically lowercase.
- Any feedback on CachedIterable's functionality?
- Any other feedback?

Regards,
Tyson

[PHP-DEV] [VOTE] Allow static properties in enums

2021-06-01 Thread tyson andre

Hi internals,

I have opened the vote on https://wiki.php.net/rfc/enum_allow_static_properties
Voting ends on June 15, 2021

Previous discussion can be found at https://externals.io/message/114494

Thanks,
Tyson

[PHP-DEV] Re: [RFC]: Allow static properties in enums

2021-05-30 Thread tyson andre

Hi internals,

> I've created a new RFC https://wiki.php.net/rfc/enum_allow_static_properties
> 
> Although enums are immutable objects, it is often useful to have functions or 
> methods that operate on enum instances.
> In many cases, it would make sense to declare that functionality as static 
> methods on the enum itself (which is already permitted).
> In cases where static methods require shared state, it would be useful to 
> allow storing that shared state in static properties.
> To ensure immutability of enum instances, it's only necessary to forbid 
> instance properties, but all properties were forbidden in the initial 
> functionality included with the enums RFC.
> 
> This RFC proposes allowing static properties in enums, while continuing to 
> forbid instance properties.

I plan to start voting on https://wiki.php.net/rfc/enum_allow_static_properties 
on June 1, 2021
 
Regards,
Tyson

Re: [PHP-DEV] [RFC]: Allow static properties in enums

2021-05-17 Thread tyson andre

> > On Mon, May 17, 2021, at 9:16 AM, Michał Marcin Brzuchalski wrote:
> > > pon., 17 maj 2021, 16:02 użytkownik tyson andre <
> > tysonandre...@hotmail.com>
> > > napisał:
> > >
> > > > Hi internals,
> > > >
> > > > I've created a new RFC
> > > > https://wiki.php.net/rfc/enum_allow_static_properties
> > > >
> > > > Although enums are immutable objects, it is often useful to have
> > functions
> > > > or methods that operate on enum instances.
> > > > In many cases, it would make sense to declare that functionality as
> > static
> > > > methods on the enum itself.
> > > > In cases where static methods require shared state, it would be useful
> > to
> > > > allow storing those shared state in static properties.
> > > > To ensure immutability of enum instances, it's only necessary to forbid
> > > > instance properties, but all properties were forbidden in the initial
> > > > functionality included with the enums RFC.
> > > >
> > > > This RFC proposes allowing static properties in enums, while
> > continuing to
> > > > forbid instance properties.
> > > >
> > >
> > > Would you be able to provide more real life example?
> > > The example in RFC could easily encapsulate current Environment reading
> > in
> > > for eg. EnvironmentConfiguration class with static property and method
> > and
> > > TBH possibly that would be my preference to solve this.
> >
> > I would agree.  Static properties are ugly to begin with.  They're globals
> > with extra syntax.  I have no desire to see them on enums.
> >
> > Also a clarification, since it wasn't entirely clear in Tyson's original
> > email: Static methods on Enums *are already supported*.  They were included
> > in the original Enum RFC.  The change proposed here is just about static
> > properties.
> >
> 
> Personally, I'd prefer to see enums as value objects only, adding static
> properties allow to implementation of statically conditional behaviour.
> IMO enums should consist only of pure functions. This is why I'd vote NO on
> this proposal.
> 
> Cheers,
> Michał Marcin Brzuchalski

Hi internals,

I've updated https://wiki.php.net/rfc/enum_allow_static_properties to include 
the arguments made so far against/for including
static properties, as well as including more arguments/examples in favor of 
including static properties in traits.

Thanks,
- Tyson


[PHP-DEV] [RFC]: Allow static properties in enums

2021-05-17 Thread tyson andre

Hi internals,

I've created a new RFC https://wiki.php.net/rfc/enum_allow_static_properties

Although enums are immutable objects, it is often useful to have functions or 
methods that operate on enum instances.
In many cases, it would make sense to declare that functionality as static 
methods on the enum itself.
In cases where static methods require shared state, it would be useful to allow 
storing that shared state in static properties.
To ensure immutability of enum instances, it's only necessary to forbid 
instance properties, but all properties were forbidden in the initial 
functionality included with the enums RFC.

This RFC proposes allowing static properties in enums, while continuing to 
forbid instance properties.
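
To make the distinction concrete, here is a small sketch of what is possible 
without the RFC: static methods on the enum itself, with the shared state kept 
in a static property of a separate helper class. All names (Environment, 
EnvironmentState) are hypothetical and only for illustration; with this RFC, 
the static property could instead be declared on the enum directly. This 
assumes PHP 8.1+.

```php
<?php
// Hypothetical example; requires PHP 8.1+ (enums).
enum Environment: string
{
    case Dev = 'dev';
    case Prod = 'prod';

    // Static methods on enums are already permitted.
    public static function current(): self
    {
        // Default to Prod until something else is set.
        return EnvironmentState::$current ??= self::Prod;
    }

    public static function set(self $env): void
    {
        EnvironmentState::$current = $env;
    }
}

// Workaround: shared state lives outside the enum,
// because enums may not declare properties.
final class EnvironmentState
{
    public static ?Environment $current = null;
}
```

The enum cases themselves stay immutable either way; only the class-level 
state changes.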

Thanks,
Tyson

Re: [PHP-DEV] [RFC] [Vote] Adding return types to internal methods

2021-04-24 Thread tyson andre

Hi Marco Pivetta,

> > In fact, if reflection were to switch to the actual runtime return types of
> > those methods, I don't see a reason why downstream consumers would break
> > (stubbing tools, code generators, type checkers, dependency solvers, etc.)
> 
> If the published library/application had to support older versions (e.g. php 
> 7.4),
> but the tentative return types contained types/syntaxes that required php 8.0 
> (e.g. union types such as `string|false`, new types such as `mixed`/`never`, 
> etc,)
> then the code generators and type checkers and stubbing tools would need to 
> be 
> updated to exclude the new tentative return types much earlier than 
> absolutely needed.
> 
> From experience, code generated with tooling while running on newer PHP 
> versions is already incompatible with older PHP versions: you re-generate the 
> code when changing any of the dependencies anyway (think "no ABI 
> compatibility").
> 
> This is at least true for all codegen tools I worked/contributed to/used on 
> so far.

Mocking libraries and static analyzers that don't result in published code were 
my largest concern, generated code that gets published was a smaller one.

Changing getReturnType would significantly increase the scope of that 
incompatibility earlier on for users that don't install multiple php versions
(users/maintainers may default to whatever is provided by their package manager 
for convenience)

I'd rather have a larger time window with deprecations to change those and have 
any potentially breaking changes 
(from the perspective of users of older versions of code generation tools, test 
libraries, static analyzers)
in 9.0 instead of 8.1, to put the (small) BC breaks in major releases where 
possible.

The introduction of many `mixed` tentative types makes sense from a type 
system perspective,
but with your alternate proposal for changing getReturnType(),
it would result in code generating tools generating a lot of `: mixed` return 
types (requiring a php 8.0+ runtime) in various interfaces and classes,
which would be incompatible with a missing return type override due to 
https://wiki.php.net/rfc/mixed_type_v2#explicit_returns

> We're mostly breaking BC (new methods on reflection symbols, requiring 
> special treatment) for stuff that is really an edge case that is only 
> affecting tooling that would really work just fine even if the reflection API 
> started to report the real return types now (no API change whatsoever).
> 
> What's the plan for PHP 9 about these methods? Deprecation/removal? Or are we 
> adding something that we'll have to drag on forever?

The RFC proposal https://wiki.php.net/rfc/internal_method_return_types stated 
those plans.

Unless new information comes up in the case of specific methods such as 
breaking commonly used frameworks,
in almost all cases, I'd assume tentative types in php 8.x would become real 
types in the next major version (php 9.0).

> Non-final internal method return types - when possible - are declared 
> tentatively in PHP 8.1,
> **and they will become enforced in PHP 9.0.** It means that in PHP 8.x 
> versions,
> a “deprecated” notice is raised during inheritance checks when an internal 
> method 
> is overridden in a way that the return types are incompatible, 
> **and PHP 9.0 will make these a fatal error.** A few examples:

Tentative return types would also be used by PECLs, so the 
getTentativeReturnType would continue to be used forever.
(I'd expect PECLs would generally add tentative types in `n.x.y` and change the 
real type in `(n+1).0.0`)

The alternate design you've proposed of changing getReturnType seems to have 
issues
- For user-defined types (if we allow an annotation mentioning a tentative 
return type exists without indicating the type),
  it'd be possible for hasTentativeReturnType to be true but getReturnType to 
be null, which is the opposite of internal classes
- As I'd mentioned before, return type functionality may get extended to also 
work on functions that already have return types (user-defined and/or internal),
  in which case php would need to add 
`ReflectionFunctionAbstract->getRealReturnType`, but I'd rather keep the 
current semantics of `getReturnType`.

I personally expect your alternate proposal to be more controversial due to the 
larger potential bc break and barriers to upgrading in a minor release rather 
than a major release, but may be mistaken.

Thanks,
- Tyson


Re: [PHP-DEV] [RFC] [Vote] Adding return types to internal methods

2021-04-24 Thread tyson andre

Hi Marco Pivetta,

> In fact, if reflection were to switch to the actual runtime return types of
> those methods, I don't see a reason why downstream consumers would break
> (stubbing tools, code generators, type checkers, dependency solvers, etc.)

If the published library/application had to support older versions (e.g. php 
7.4),
but the tentative return types contained types/syntaxes that required php 8.0 
(e.g. union types such as `string|false`, new types such as `mixed`/`never`, 
etc,)
then the code generators and type checkers and stubbing tools would need to be 
updated to exclude the new tentative return types much earlier than absolutely 
needed.

- Users would benefit from code/tooling working the same way in php 7.x and 8.1 
when upgrading to 8.1
  and making the tooling unexpectedly stricter may be regarded as a breaking 
change by end users, especially for unmaintained tools/libraries.

  I'd prefer if the tooling authors and end users had to opt in to use the 
tentative return types
  and upgrade to a version of tooling that was aware of the tentative return 
types to start using them.
- Forcing getReturnType to change immediately would be a barrier to upgrading, 
especially for users that aren't deeply familiar with php, 
  if the php version or library versions being used in production weren't 
compatible with the latest versions 
  of those stubbing tools, code generators, type checkers, dependency solvers, 
etc. that were aware of tentative return types
  (e.g. a test mock no longer returning null)

  To upgrade from php 7.3 (or older) to 8.1, a user may want applications and 
libraries that worked the same way in both 7.3 and 8.1,
  and would only want to upgrade the applications/libraries (and fix the 
tentative type notices) after they stopped using php 7
- PHP 8.0 would be only one year older than 8.1 and automatically generating 
more user-defined subclasses with union types
  this early on (e.g. and publishing to packagist) would be inconvenient for 
users still on php 7.

Also, as mentioned by Nikita Popov in https://externals.io/message/113413#114052
**Having a distinction between getReturnType and getTentativeReturnType also 
allows the functionality in this RFC to be extended in the future,
e.g. from a getReturnType of `BaseType` to getTentativeReturnType of `SubType`, 
rather than only being useful when return types are missing**

Additionally, I agree with the points made by Nikita/Máté Kocsis - older 
releases of static analyzers would treat getReturnType as if it was definitely 
the real type,
and falsely treat some type checks as **definitely** redundant/impossible 
rather than **probably** redundant/impossible,
leading users to remove those checks with insufficient 
validation/testing/review, before it's definitely safe/correct to do so.

- third party code in vendor dependencies (and mocks generated for unit tests) 
are typically not analyzed for issues,
  but third party code might override internal classes and make those seemingly 
redundant/impossible checks actually required).
- E.g. if getReturnType were to change instead of adding getTentativeReturnType,
  older releases of `phan` with `--redundant-condition-detection` would falsely 
report that conditions were definitely impossible/redundant when it is still 
possible for subclasses to return different types.
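
As a rough illustration of the opt-in approach, here is a sketch of how 
tooling could read both kinds of type separately. It assumes PHP 8.1+, where 
`hasTentativeReturnType()`/`getTentativeReturnType()` are available on 
reflection objects; describe_return_types() is a hypothetical helper name, 
not part of any library.

```php
<?php
// Requires PHP 8.1+. describe_return_types() is a hypothetical helper.
function describe_return_types(ReflectionMethod $method): array
{
    // The enforced ("real") return type, if any.
    $real = $method->getReturnType();
    // The tentative type is reported separately, so older tools are unaffected.
    $tentative = $method->hasTentativeReturnType()
        ? $method->getTentativeReturnType()
        : null;

    return [
        'real' => $real === null ? null : (string) $real,
        'tentative' => $tentative === null ? null : (string) $tentative,
    ];
}

// ArrayAccess::offsetGet() has no enforced return type in 8.x,
// only a tentative one.
$info = describe_return_types(new ReflectionMethod(ArrayAccess::class, 'offsetGet'));
```

A tool that only looks at `$info['real']` keeps behaving as it did on older 
php versions; a tool that understands tentative types can treat 
`$info['tentative']` as "probably" rather than "definitely" the return type.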

(I'm a maintainer of the static analyzer http://github.com/phan/phan/ and would 
personally prefer the getTentativeReturnType approach,
Marco Pivetta(Ocramius) works on/contributes to various php projects/analyzers 
such as BetterReflection)

Thanks,
-Tyson

Re: [PHP-DEV] [RFC] Namespaced in bundled extensions

2021-04-05 Thread tyson andre

> > The question of namespaces in the stdlib has been coming up a lot recently,
> > so I'd like to present my own stab at resolving this question:
> >
> > https://wiki.php.net/rfc/namespaces_in_bundled_extensions
> >
> > Relative to a number of previous (declined) proposals, the main difference
> > is that I do not propose a top-level "PHP\" vendor namespace, and instead
> > recommend the use of "ExtName\", in line with existing practice for
> > extensions. I believe this addresses the primary concern with previous
> > proposals.
> 
> Both of the namespacing RFCs have been announced for over 3 weeks and I don't 
> think I've
> seen any new discussion since then.
> Are any updates planned? When will voting on the namespacing RFC(s) start?
> (I had some stdlib RFCs/RFC ideas I was postponing since February to avoid 
> interfering with the namespacing discussion)
> 
> I'd love to have some more feedback on this RFC before opening voting. There 
> has been a lot of discussion beforehand, but only a couple responses to this 
> RFC...

I didn't plan to suggest changing the direction of the RFC, so I didn't have 
much to say.
I guess it's an improvement from a user perspective and that splitting 
core/PECL/composer namespacing wouldn't make much sense,
especially with the ability to polyfill most core functionality in composer 
packages (especially with PHP providing FFI, low level socket/stream code, etc).

For something like https://wiki.php.net/rfc/cachediterable I'd still be faced 
with the namespacing choice among multiple options if this passed,
but choosing names for everything is out of the scope of this RFC.

- `iterable\CachedIterable` would be the most likely, although it's also in 
some ways a datastructure
- For SPL, e.g. for a new Map type or existing classes such as 
SplObjectStorage, 
  there'd still be a number of different names such as `DataStructure\Map` or 
`Collections\Map` (DS is already used by an independent PECL)
- "When adding new symbols to existing extensions, it is more important to be 
consistent with existing symbols than to follow the namespacing guidelines."
  raises the question of whether existing iterables should be aliased to a 
namespace around the same time
- 5 years from now we may have a different group of active voters, so if this 
passed with low voting turnout
  I'm not sure if there'd still be arguments over the choice to use/not use a 
namespace.

For a future iteration of https://wiki.php.net/rfc/any_all_on_iterable it'd 
help if there was known community consensus (i.e. the vote on namespaces in 
bundled extensions finished)

I didn't notice before, but I assume you'd still planned to summarize feedback 
so far in a discussion section before opening 
https://wiki.php.net/rfc/namespaces_in_bundled_extensions

For https://wiki.php.net/rfc/namespaces_in_bundled_extensions#core_standard_spl
`use Array;` and `use String;` are currently syntax errors for the unexpected 
token "array".
That could be fixed in the parser by adding a special case for namespace uses,
especially since T_NAMESPACED_NAME now allows `string\contains` to be used 
without a syntax error.

One possible concern is what would happen if PHP implemented new functionality 
that overlapped with a fairly well-known PECL/Composer package.
E.g. if there was already a FooDB\Client in a composer/PECL package, and an 
independent implementation was later added to php-src,
there'd potentially be conflicting names.
Being able to implement `PHP\FooDB\Client` would avoid that ambiguity

- Then again, other programming languages such as Python have no issue with 
that, so never mind.
  FooDBClient\ or Foo\ or something could probably be used.

> All symbols defined in the extension should be part of the top-level 
> namespace or a sub-namespace.

This should be clarified - do you mean **the extension's** top-level namespace 
(e.g. OpenSSL) instead of the global namespace? I assume the former.

Regards,
Tyson


[PHP-DEV] Re: [RFC] debug_backtrace_depth(int $limit=0): int

2021-03-27 Thread tyson andre

Hi internals,

> I've created a new RFC https://wiki.php.net/rfc/debug_backtrace_depth to 
> return the depth of the current stack trace.
> 
> Inspecting the current stack trace depth is occasionally useful for
> 1. Manually debugging with temporary debug statements
> 2. Checking for potential infinite recursion or investigating reproducible 
> reports of infinite recursion
> 3. Checking if code is likely to hit stack frame limits when run in 
> environments using extensions such as Xdebug
>     (https://xdebug.org/docs/all_settings#max_nesting_level , which also 
> checks for potential infinite recursion)
>     (note that Xdebug is a debugger - running php under xdebug is 
> significantly slower than without Xdebug)
> 
> It is currently possible to compute the depth through 
> `count(debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS, $limit=0))`,
> but this is verbose, inefficient, and harder to read compared to returning 
> the depth directly.
> 
> Thoughts?

I plan to start voting on https://wiki.php.net/rfc/debug_backtrace_depth 
tomorrow.
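
For reference, the existing userland approach mentioned in the quoted text can 
be sketched as below. backtrace_depth_userland() and recurse() are 
hypothetical names for illustration; whether the proposed function would count 
frames in exactly the same way is not something this sketch claims.

```php
<?php
// Hypothetical helper showing the count(debug_backtrace(...)) approach.
function backtrace_depth_userland(): int
{
    // IGNORE_ARGS avoids copying arguments, but an array of frames is still
    // built just to be counted. Subtract 1 to exclude this helper's own frame.
    return count(debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS)) - 1;
}

// Recurse $n levels deep, then report the depth seen at the bottom.
function recurse(int $n): int
{
    return $n === 0 ? backtrace_depth_userland() : recurse($n - 1);
}
```

Each extra level of recursion adds exactly one frame, so `recurse(3)` reports 
a depth three greater than `recurse(0)` when both are called from the same 
place.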

Thanks,
Tyson

Re: [PHP-DEV] [RFC] Namespaced in bundled extensions

2021-03-19 Thread tyson andre

Hi Nikita Popov,

> The question of namespaces in the stdlib has been coming up a lot recently,
> so I'd like to present my own stab at resolving this question:
>
> https://wiki.php.net/rfc/namespaces_in_bundled_extensions
>
> Relative to a number of previous (declined) proposals, the main difference
> is that I do not propose a top-level "PHP\" vendor namespace, and instead
> recommend the use of "ExtName\", in line with existing practice for
> extensions. I believe this addresses the primary concern with previous
> proposals.

Both of the namespacing RFCs have been announced for over 3 weeks and I don't 
think I've
seen any new discussion since then.
Are any updates planned? When will voting on the namespacing RFC(s) start?
(I had some stdlib RFCs/RFC ideas I was postponing since February to avoid 
interfering with the namespacing discussion)

Thanks,
Tyson

Re: [PHP-DEV] RFC: Add `println(string $data = ''): int`

2021-03-14 Thread tyson andre

Hi Rowan,

> Hi Tyson,
> 
> I'm on the fence on this one: I've certainly had occasions when it would 
> be useful, but mostly in quick prototypes and demo code, and just as 
> often in HTML context (where I'd want it to add '') as plain text 
> context.
> 
> I am not keen, however, on the proposed implementation details, and 
> would prefer it to much more closely match either print or echo.
> 
> 
> - Until reading this RFC, I would not have guessed that printf() 
> returned the number of bytes written; I guess it's useful in C, where 
> you're managing string buffers, but in PHP it feels very arbitrary. I 
> would also not particularly associate this new function with printf() so 
> my immediate guess would be the same return value as print (even though 
> that return value, int(1), is equally arbitrary).

There's also the alias fputs of 
https://www.php.net/manual/en/function.fwrite.php
which also returns the byte count.

I'd considered just returning void, but then you'd have a mix of standard 
library
functions that did/didn't return void and users would need to remember which 
did what.
If someone actually had a need to use the result of println, it would likely be 
to count bytes,
rather than to get a hardcoded constant such as 1/null.
(or expression chaining, but `println('Starting') or die(2); // somehow failed 
to output` also has a use case)

> - You explicitly state that this function would depend on strict_types 
> mode, but echo and print do not. I can't see any particular advantage to 
> doing so, and 'println((string)$foo)' would be longer than 'print 
> $foo,"\n"' rather defeating the purpose.

I could change it to accept `mixed` and convert it to a string inside of the 
function.
(or `object|string|float|int|bool|null` and throw or return false if an object 
could not be converted to a string).
Throwing/warning for bool/null in strict mode may be worth it but may be more 
convenient to detect with external static analyzers.
It initially seemed consistent with the behavior of https://www.php.net/fwrite 
to only accept a string
and forcing callers to do so would catch unexpected edge cases with bool/null 
(''/'1'),
floats, etc. in strict mode.

Surprisingly, some output functions in PHP do accept `mixed`.
https://www.php.net/file_put_contents accepts `mixed` and casts non-resources 
and non-arrays to strings (php-src/ext/standard/file.c)
(handling of arrays/resources is different)

```
case IS_NULL:
case IS_LONG:
case IS_DOUBLE:
case IS_FALSE:
case IS_TRUE:
convert_to_string(data);

case IS_STRING:
```

Annoyingly, any decision I make would be inconsistent with something,
e.g. file_put_contents currently doesn't even emit a notice for a failure to 
convert to string.

```
php > var_dump(file_put_contents('test.txt', new stdClass()));
bool(false)
```

It definitely would be longer to use `println((string)$foo)`,
but there may be cases where that would be done,
e.g. if you are writing a script that would be reviewed/modified/used by 
programmers that are more familiar with languages that aren't PHP.
(It'd be more readable using 1 output method than 3)

There's also `println("$foo");`

> - Most importantly, I accept your points about a function being more 
> forward- and backward-compatible, but worry that not making it a keyword 
> will lead to further confusion about how parentheses interact with echo 
> and print. There is a common misconception that they have some kind of 
> "optional parentheses", because they are usually used with a single 
> expression, so wrapping it in parentheses usually doesn't change the 
> outcome; but this is not the case, and it does sometimes matter.
> 
> 
> As currently proposed, I can see people getting nasty surprises from 
> these inconsistencies. For instance:
> 
> print (1+2)*3; // prints "9"
> println (1+2)*3; // prints "3\n"; unless strict_types=1 is in effect, in 
> which case TypeError
> print ($foo ?? 'Foo') . ($bar ?? 'Bar'); // prints "FooBar" if both vars 
> are null
> println ($foo ?? 'Foo') . ($bar ?? 'Bar'); // never prints $bar or 
> "Bar", because they are not passed to println()

Static analyzers for PHP such as Phan warn about the unused result of 
multiplication
to catch cases like this, and other analyzers can easily add this check if they 
don't already check for it.
https://github.com/phan/phan/wiki/Issue-Types-Caught-by-Phan#phannoopbinaryoperator
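
The difference between the `print` language construct and an ordinary function 
call can be reproduced today with a userland stand-in. println_fn() below is a 
hypothetical function used only for this demonstration (it is not the proposed 
implementation), and strict_types is deliberately left off so the int argument 
is coerced to a string.

```php
<?php
// Hypothetical stand-in for a println() function; strict_types is off.
function println_fn(string $data = ''): int
{
    echo $data, "\n";
    return strlen($data) + 1; // bytes written, including the newline
}

ob_start();
print (1 + 2) * 3;       // language construct: the whole expression is printed
$fromPrint = ob_get_clean();

ob_start();
println_fn (1 + 2) * 3;  // function call: only (1 + 2) is the argument;
                         // "* 3" multiplies the unused return value
$fromFunction = ob_get_clean();
```

Here `$fromPrint` is "9" and `$fromFunction` is "3\n", which is the kind of 
unused-result expression that a static analyzer can flag.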

> if ( something() && println($foo) && somethingElse() )  // does what it 
> looks like, if println is a normal function
> if ( something() && print($foo) && somethingElse() )  // does not print 
> $foo, because the expression passed is actually ($foo) && somethingElse()

The fact that `print` doesn't require parentheses is something that surprised 
me initially,
though changing it to force it to use function syntax would be a much larger bc 
break
more suitable for a major version, that I

Re: [PHP-DEV] Storing the lcname of symbols

2021-03-13 Thread tyson andre

Hi Levi Morrison,

> > Hello!
> >
> > Most of PHP's symbols are case insensitive. This means extensions that
> > need to do things with function and method names end up lowercasing
> > and hashing the lowercased names, often having to do more memory
> > allocations too. Since case insensitive symbols is language dictated
> > behavior, it makes sense to expose the correctly cased symbols to
> > extensions. In PHP 8.0 (and possibly older, I did not check), the
> > engine is already interning the lowercased name of user defined
> > functions; it's just not made available to extensions.
> >
> > In my ideal world, we'd actually switch all symbols to be case
> > sensitive. However, that won't be happening for PHP 8 due to BC.
> >
> > So, instead, I propose adding an `.lcname` member (or some other name
> > indicating it's been normalized to the preferred PHP case) to at least
> > zend_op_array and zend_class_entry, but preferably for internal
> > functions too. Note that many internal functions will already be
> > lowercase, so the data can be shared.
> >
> > I could make this change in the main engine, but I strongly suspect it
> > will not play correctly with opcache.
> >
> > --
> > PHP Internals - PHP Runtime Development Mailing List
> > To unsubscribe, visit: https://www.php.net/unsub.php
> 
> I just realized I didn't ask any specific questions. Oops:
> 
>  1. Can anyone think of issues except increased memory due to
> increasing the size of the struct? Since the strings were previously
> interned, I don't think the strings themselves will have much effect
> on memory usage (but we can measure this).
>  2. Anyone else who thinks this would be useful?

I don't have a personal use case for this and no common operations come to mind 
but could be persuaded.
The lack of examples might be why there's been no response.
I assume the overhead is probably negligible for classes, and larger for 
functions.
What fields of zend_op_array did you mean?

What parts of the engine or extensions would use the lowercase string? 
I see a few places it's used for compilation in php-src itself but nothing that 
seems performance critical.

What are examples of functionality/functions of extensions that you expect 
would see a performance improvement?
Why would they need to convert the strings to lowercase rather than use the 
casing of the declaration
(e.g. using "memcached" instead of "Memcached" in "class Memcached{...}")

E.g. if something had already looked up the `zend_class_entry *ce` for a class 
name, then `ce->name`
would be a string (and a unique pointer) such as "Memcached" that uniquely 
identifies that class's name
(unless the code is unexpectedly redeployed later with different casing)

- They're still case sensitive in some ways, e.g. the composer autoloader is 
case sensitive
- Changing case should be rare
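
To illustrate the distinction from the userland side (the class name 
MemcachedLike is hypothetical): symbol lookup is case-insensitive, but the 
name reported for a class keeps the casing of its declaration, and the 
lowercased form is only needed as a lookup key.

```php
<?php
// Hypothetical class name for illustration only.
class MemcachedLike {}

$obj = new memcachedlike();          // class lookup is case-insensitive
$declared = get_class($obj);         // reports the casing of the declaration
$lookupKey = strtolower($declared);  // the form a case-insensitive table keys on
```

Code that already holds the class (or its declared name) can compare on that 
directly, without lowercasing anything.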

Regards,
Tyson


Re: [PHP-DEV] RFC: Add `println(string $data = ''): int`

2021-03-13 Thread tyson andre

> Hi Tyson,
> 
> I like this proposal, but why is the main argument optional? Wouldn't it
> make sense to always require a string as an argument?
> 
> Regards,
> Kamil

I initially considered making it required, but then I felt like there wasn't a 
compelling reason to force end users
to write `println('')` instead of `println()` to print a single newline in code 
surrounded by other println statements.
Printing an empty blank line would be common enough in CLI output or plaintext 
HTTP responses for this proposal to make it easier to do.

- This differs from echo/print/printf statements, where not including an 
argument wouldn't make sense to support, because it would not output anything
- This proposed println behavior is similar to Python, where `print()` with no 
arguments prints a single newline

https://wiki.php.net/rfc/println#proposal

```
println("test");
// moderately useful to not switch to echo or pass the empty string to print a blank line
println();
println("third line");
/*
Output:
test
 
third line
*/
```
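
For readers on versions without a native `println()`, a minimal userland sketch of the behavior described above. It assumes the `int` in the signature means "whatever `printf()` returns" (the length of the printed string); that reading is an assumption here, not something confirmed in this thread.

```php
<?php
// Userland sketch; only defined if no native println() exists.
if (!function_exists('println')) {
    function println(string $data = ''): int
    {
        // printf() returns the length of the string it printed.
        return printf("%s\n", $data);
    }
}

println("test");
println();              // prints just "\n"
println("third line");
```

The optional argument is what lets the middle call stay a plain `println()`.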

Thanks,
- Tyson

[PHP-DEV] RFC: Add `println(string $data = ''): int`

2021-03-13 Thread tyson andre

Hi internals,

I've created a new RFC https://wiki.php.net/rfc/println 
This proposes adding a global function to PHP to
print a string followed by a unix newline (`\n`).

Printing a string followed by a newline to stdout is a commonly performed 
operation in many applications and programming languages.
Many programming languages provide a helper function to do this specifically, 
for readability and convenience.
The choice of end of line string may differ, but many recent programming 
languages will unconditionally use the unix newline,
to avoid unexpected differences in behavior between platforms.

I've looked over prior related discussions such as 
https://externals.io/message/104545,
and I've written down the reasons for my name choice and newline choice in the 
RFC.

Any other feedback or elaboration on discussions?

Thanks,
- Tyson

[PHP-DEV] [RFC] debug_backtrace_depth(int $limit=0): int

2021-03-13 Thread tyson andre

Hi internals,

I've created a new RFC https://wiki.php.net/rfc/debug_backtrace_depth to return 
the depth of the current stack trace.

Inspecting the current stack trace depth is occasionally useful for
1. Manually debugging with temporary debug statements
2. Checking for potential infinite recursion or investigating reproducible 
reports of infinite recursion
3. Checking if code is likely to hit stack frame limits when run in 
environments using extensions such as Xdebug
(https://xdebug.org/docs/all_settings#max_nesting_level , which also checks 
for potential infinite recursion)
(note that Xdebug is a debugger - running php under xdebug is significantly 
slower than without Xdebug)

It is currently possible to compute the depth through 
`count(debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS, $limit=0))`,
but this is verbose, inefficient, and harder to read compared to returning the 
depth directly.
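
A sketch of the existing workaround, wrapped in a hypothetical helper named `userland_backtrace_depth()` (this is not the function proposed by the RFC). Because the helper adds a frame of its own, it subtracts one; the absolute number also depends on how the script was started, so the example only compares relative depths.

```php
<?php
// Hypothetical userland helper; not the function proposed by the RFC.
function userland_backtrace_depth(): int
{
    // debug_backtrace() also reports the frame of this helper, so subtract 1
    // to get the depth as seen by the caller.
    return count(debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS)) - 1;
}

function recurse(int $remaining): int
{
    return $remaining > 0 ? recurse($remaining - 1) : userland_backtrace_depth();
}

// Each extra level of recursion adds exactly one frame.
var_dump(recurse(3) - recurse(0)); // int(3)
```

Note that this still builds the whole backtrace array just to count it, which is the inefficiency mentioned above.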

Thoughts?

Thanks,
- Tyson

Re: [PHP-DEV] Deprecate debug_zval_dump?

2021-02-27 Thread tyson andre

Hi Rowan Tommins,

> I would like to propose we formally deprecate the function debug_zval_dump 
> and remove it in PHP 9.0.
> 
> This function is mostly similar to var_dump, with the additional ability to 
> output the refcount of a variable. This has always been hard to interpret, 
> but is now so complex as to be effectively useless:
> 
> - Because it is implemented as a function taking a parameter in the normal 
> way, the refcount has been modified by the time it is displayed. Depending on 
> the value passed, this may include reference separation; in older versions of 
> PHP, it was possible to affect this directly by forcing a pass-by-reference. 
> The manual still discusses this, but it hasn't been possible since PHP 5.4. 
> [1]
> - Since PHP 7, some types don't have a refcount at all, and references are 
> represented by additional levels of zval. Without completely changing the 
> output format, this information is impossible to convey accurately.
> - Optimisations affect the refcount in increasingly non-obvious ways. For 
> instance, an array defined literally in the source code has an extra counted 
> reference compared to one which has been modified at runtime. [2]
> 
> Since this is a rather specialised piece of debugging information, not useful 
> to the average user, I think it should be left to dedicated debugging tools. 
> XDebug includes an equivalent function that takes a name and looks it up in 
> the symbol table, avoiding some of the worst issues [3]. I'm not familiar 
> with PHPDBG, and it doesn't seem to have much documentation, but I assume it 
> would be able to display this as well.

I'd disagree that it's useless. Even with the format changes,

- Checking the difference between two runs is useful in a bug report - if 
counts increase or decrease after calling a function 
  that isn't supposed to modify reference counts, this makes it an obvious 
indicator for reference counting bugs that can be submitted 
  or requested on issue trackers such as bugs.php.net
- Dynamic values (e.g. db results) generally aren't constant values, with some 
exceptions.
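
A rough sketch of the "compare two runs" idea, assuming a hypothetical helper `zval_dump_to_string()` that captures the output of `debug_zval_dump()`. The absolute counts it prints include temporary references created by passing the value around, so only the difference between dumps is meaningful.

```php
<?php
// Hypothetical helper: capture debug_zval_dump() output as a string so two
// runs can be compared. Only differences between dumps are meaningful.
function zval_dump_to_string(mixed $value): string
{
    ob_start();
    debug_zval_dump($value);
    return (string) ob_get_clean();
}

// Built at runtime (like a DB result), so the array is reference counted.
$rows = [random_int(1, 9), random_int(1, 9)];

$before = zval_dump_to_string($rows);
$total = array_sum($rows);            // should not change any reference counts
$after = zval_dump_to_string($rows);

// Identical dumps suggest the call did not leak or drop a reference.
var_dump($before === $after);

// A second holder of the same array shows up as a different count.
$alias = $rows;
$withExtraHolder = zval_dump_to_string($rows);
var_dump($before === $withExtraHolder, count($alias));
```

In a bug report, the function under suspicion would take the place of `array_sum()`.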

I'm opposed to the removal - while it is not useful to the average user, **it 
is very useful in the development of php-src and the PECL extensions that 
average users use every day.** 
E.g. the php developers working on php-src or contributors to PECLs may use it 
while investigating memory reference counting bugs or while developing new 
functionality,
and having debug_zval_dump is useful for investigating, detecting, reporting, 
or adding regression tests for reference counting bugs.

- Instructions often recommend that users/packagers run tests before installing 
a PECL.
  If running a subset of the tests required a third party PECL, those tests 
might just end up being skipped
  and memory counting bugs in new PHP versions or rare PHP configurations 
(32-bit ZTS) might be harder to track down,
  or PECL authors may just not get around to installing and enabling external 
PECLs in CI.
- A maintainer or someone looking at a bug report on bugs.php.net may be 
reluctant (or not have the time) to install and enable a PECL 
  they're unfamiliar with in order to check if a bug report was reproducible, 
making bugs take longer to fix or never get fixed.

Even if debug_zval_dump doesn't end up in the final phpt test case or the 
final PR created to fix a bug in PECLs,
it may have been used while tracking down which function had the reference 
counting bug.

Many of the C functions in php-src have no documentation whatsoever, so it's 
hard for new and experienced developers 
to tell if they do/don't need to addref/delref before/after calling a function 
because surrounding code (in code they're basing new code on)
may have adjusted the reference count in other ways already.

I've seen a few places where debug_zval_dump is added in tests in order to 
ensure that the reference count didn't change,
e.g. if the code was prone to bugs and they wanted to assert the reference 
count or use of references was correct.
(usage might be limited by the obscurity and not knowing about it - having PECL 
writing tutorials mention it exists for tracking down
reference counting bugs may make it more widely used)

- https://github.com/runkit7/runkit7/blob/master/tests/runkit_superglobals_obj_alias.phpt#L111
- https://github.com/krakjoe/apcu/blob/master/tests/apc_006.phpt
- https://github.com/php/php-src/search?q=debug_zval_dump=code
- https://bugs.php.net/search.php?search_for=debug_zval_dump=0=100=display=All_type=All

https://github.com/runkit7/runkit7/blob/master/tests/runkit_superglobals_obj_alias.phpt#L30

Also, the last time I checked, XDebug replaces the php interpreter entirely and 
is slower than the php interpreter,
so I wouldn't consider it a viable replacement (especially if a bug only 
manifests with the standard php interpreter).

> I notice there's a draft for an omnibus "deprecations for PHP 8.1" RFC [4]. 
> Should I add this there,

Re: [PHP-DEV] Re: Proposal: namespace the SPL

2021-02-14 Thread tyson andre

Hi internals,

> > 1b. We may switch the direction of this alias in 9.0.

The new names for existing Spl types at least seem more readable and possible 
to polyfill with `class_alias`.
It should be clarified if "data types" include interfaces such as 
https://www.php.net/manual/en/class.splobserver 
but I assume it does.

A minor concern is that the wider switch to namespacing on a case-by-case basis 
will collectively add a lot of small barriers for end users that are upgrading 
(especially with external libraries/applications),
but I agree that starting out small is the best way to discuss it.

- e.g. data `serialize()`d and put into memcached or a database by PHP 9.0 
  could not be read from memcached by servers still running PHP 8.0,
  and this would unexpectedly result in PHP_Incomplete_Class 
  and new users may need to do a lot of research to figure out how to avoid/fix 
this 
  (by calling `class_alias` in 8.0).
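
To illustrate the failure mode and the workaround, here is a sketch that uses hypothetical user-defined classes as stand-ins (`OldQueue` for the name an older runtime knows, `New\Queue` for the name a newer deployment serialized). It does not use any real SPL class names.

```php
<?php
// Hypothetical stand-ins: OldQueue is the only name this "older" runtime
// knows; New\Queue is the name a newer deployment wrote into the payload.
final class OldQueue
{
    public array $items = [];
}

// Payload as it might have been stored by the newer deployment.
$payload = 'O:9:"New\Queue":1:{s:5:"items";a:1:{i:0;s:1:"a";}}';

// Without an alias the class is unknown, so unserialize() falls back to
// __PHP_Incomplete_Class.
$withoutAlias = unserialize($payload);
var_dump($withoutAlias instanceof __PHP_Incomplete_Class);

// With the alias registered, the same payload resolves to OldQueue.
class_alias(OldQueue::class, 'New\Queue');
$withAlias = unserialize($payload);
var_dump($withAlias instanceof OldQueue, $withAlias->items);
```

The research burden mentioned above is mostly in knowing that such an alias is needed before the first mixed-version read happens.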

> > Let me know what you think. I am hopeful this approach will work because:

I support voting on this separately from adding new functionality.

To clarify my earlier decisions, I had strong reservations about introducing 
new functionality and setting a de facto precedent for a namespace policy in 
the same vote
on https://wiki.php.net/rfc/any_all_on_iterable (but did so anyway due to 
feedback of multiple voters suggesting widespread opposition to continuing 
global namespacing months ago),
because the deciding factor in a vote on new functions or new classes for 
voters may have been
"do I (dislike the namespacing choice used in this RFC) less than (I dislike 
not having this useful proposed enhancement added to PHP)."
I am happier with your plan to create an RFC asking "do 2/3rds of people 
think this namespace choice should be used for future additions to ext/spl."

In my decision to independently gather feedback for 
https://wiki.php.net/rfc/any_all_on_iterable_straw_poll_namespace
I was deliberately overcommunicating knowing that:

1. If it passed, the namespacing policy choices made there might be interpreted 
or insisted on
   as a precedent for future additions to the Spl (or even PHP in general) for 
a long time.

   This may cause a tolerable namespace choice to be used for many future 
additions to ext/spl, rather than the one preferred by most of internals,
   which is why I was only fine with starting a vote after gathering feedback 
from https://wiki.php.net/rfc/any_all_on_iterable_straw_poll_namespace
2. If it was voted on without gathering widespread feedback on overall 
namespace preferences,
   I'd expect even more No votes and the function namespacing discussion could 
continually get reopened if some namespace permutation wasn't even an option.

   I apologize for the 11-way straw poll.
   Many voters do not comment on the internals mailing list, e.g. if someone 
else has already said the same thing.

> > Let me know what you think. I am hopeful this approach will work because:
> > 
> > It is focused on a specific area which already has an established
> > "namespace", but in name-only (not technically).
> > It does not try to solve the larger problem, which has a lot of
> > disagreement.
> > I will be proposing new types for ext/spl soon (ReverseIterator
> > and an array iterator that is more efficient than \ArrayIterator),
> > and Tyson Andre has already proposed CachedIterable and company
> > which is in ext/spl, so this space has active development.
> > Thank you for your time.
> 
> Do you want a dumping ground? Because this is how you create a dumping
> ground :-)
> 
> If we're going to start putting things into namespaces (and we should)
> then we should absolutely avoid repeating the mistakes of the past by
> dumping completely unrelated things together.
> 
> If SPL\ is to exist (and personally I think SPL is so cancerous, it
> shouldn't) then IMO it must absolutely be SPL\iterators.
> 
> Without that all we've done is swap one problem for another.
> 
> The idea of putting data structures next to generic iterator helpers is,
> quite frankly, nuts.

Do you have an expected timeline for creating the RFC document for this 
proposal and starting the vote?
A vote would greatly reduce the uncertainty and time/energy involvement of 
proposing 
adding additional datastructures, benefiting contributors both familiar and 
unfamiliar
with the PHP RFC process, and I agree with Levi that it would be useful to 
ensure that 
"new additions going into the ext/spl can avoid having this naming discussion 
every time."

**My main objection to the proposal is that this forces 
all core generic datastructures to go in the Spl namespace
indefinitely, or would entail the creation of a separate module and splitting 
up the php.net manual pages to document new built-in datastructures
that don't begin with `SPL\

Re: [PHP-DEV] RFC: CachedIterable (rewindable, allows any key keys)

2021-02-12 Thread tyson andre

Hi Alex,

> > I've created a new RFC https://wiki.php.net/rfc/cachediterable adding 
> > CachedIterable,
> > which eagerly evaluates any iterable and contains an immutable copy of the 
> > keys and values of the iterable it was constructed from
> >
> >
> > Any other feedback unrelated to namespaces?
>
> Hi Tyson,
>
> I needed this feature a few years ago. In that case, the source was a 
> generator that was slowly generating data while fetching them from a 
> paginated API that had rate limits.
> The result wrapping iterator was used at runtime in multiple (hundreds) other 
> iterators that were processing elements in various ways (technical analysis 
> indicator on time series) and after that merged back with some 
> MultipleIterator.
>
> Just for reference, this is how the implementation in userland was and I was 
> happy with it as a solution:
> https://gist.github.com/drealecs/ad720b51219675a8f278b8534e99d7c7
>
> Not sure if it's useful but I thought I should share it as I noticed you 
> mentioned in your example for PolyfillIterator you chose not to use an 
> IteratorAggregate because complexity
> Was wondering how much inefficient this would be compared to the C 
> implementation.

That was for simplicity (shortness) of the RFC for people reading the polyfill.
I don't expect it to affect CPU timing or memory usage for large arrays in the 
polyfill.

Userland lazy iterable implementations could still benefit from having a 
CachedIterable around,
by replacing the lazy IteratorAggregate with a CachedIterable when the end of 
iteration was detected.

> Also, the implementation having the ability to be lazy was important and I 
> think that should be the case here as well, by design, especially as we are 
> dealing with Generators.

We're dealing with the entire family of iterables, including but not limited to 
Generators, arrays, user-defined Traversables, etc.

I'd considered that but decided not to include it in the RFC's scope.
If I was designing that, it would be a separate class `LazyCachedIterable`.

Currently, `CachedIterable` has several useful properties:

1. Serialization/unserialization behavior is predictable - if the object was 
constructed it can be safely serialized if keys/values can be serialized.
2. Iteration has no side effects (e.g. won't throw)
3. keyAt(int $offset) and so on have predictable behavior, good performance, 
and only one throwable type
4. Memory usage is small - this might also be the case for a `LazyCachedIterable` 
depending on implementation choices/constraints.

Adding lazy iteration support would make it no longer have some of those 
properties.

While I'd be in favor of that if it was implemented correctly, I don't plan to 
work on implementing this until I know
if the addition of `CachedIterable` to a large family of iterable classes would 
pass.

CachedIterable has some immediate benefits on problems I was actively working 
on, such as:

1. Being able to represent iterable functions such as iterable_reverse()
2. Memory efficiency and time efficiency for iteration
3. Being something internal code could return for getIterator(), etc.

Regards,
Tyson

[PHP-DEV] Re: RFC: CachedIterable (rewindable, allows any key keys)

2021-02-11 Thread tyson andre

Hi internals,

> I've created a new RFC https://wiki.php.net/rfc/cachediterable adding 
> CachedIterable,
> which eagerly evaluates any iterable and contains an immutable copy of the 
> keys and values of the iterable it was constructed from
> 
> This has the proposed signature:
> 
> ```
> final class CachedIterable implements IteratorAggregate, Countable, 
> JsonSerializable
> {
>     public function __construct(iterable $iterator) {}
>     public function getIterator(): InternalIterator {}
>     public function count(): int {}
>     // [[$key1, $value1], [$key2, $value2]]
>     public static function fromPairs(array $pairs): CachedIterable {}
>     // [[$key1, $value1], [$key2, $value2]]
>     public function toPairs(): array{} 
>     public function __serialize(): array {}  // [$k1, $v1, $k2, $v2,...]
>     public function __unserialize(array $data): void {}
>  
>     // useful for converting iterables back to arrays for further processing
>     public function keys(): array {}  // [$k1, $k2, ...]
>     public function values(): array {}  // [$v1, $v2, ...]
>     // useful to efficiently get offsets at the middle/end of a long iterable
>     public function keyAt(int $offset): mixed {}
>     public function valueAt(int $offset): mixed {}
>  
>     // '[["key1","value1"],["key2","value2"]]' instead of '{...}'
>     public function jsonSerialize(): array {}
>     // dynamic properties are forbidden
> }
> ```
> 
> Currently, PHP does not provide a built-in way to store the state of an 
> arbitrary iterable for reuse later
> (when the iterable has arbitrary keys, or when keys might be repeated). It 
> would be useful to do so for many use cases, such as:
> 
> 1. Creating a rewindable copy of a non-rewindable Traversable 
> 2. Generating an IteratorAggregate from a class still implementing Iterator
> 3. In the future, providing internal or userland helpers such as 
> iterable_flip(iterable $input), iterable_take(iterable $input, int $limit),
>     iterable_chunk(iterable $input, int $chunk_size), iterable_reverse(), etc 
> (these are not part of the RFC)
> 4. Providing memory-efficient random access to both keys and values of 
> arbitrary key-value sequences 
> 
> Having this implemented as an internal class would also allow it to be much 
> more efficient than a userland solution
> (in terms of time to create, time to iterate over the result, and total 
> memory usage). See https://wiki.php.net/rfc/cachediterable#benchmarks
> 
> After some consideration, this is being created as a standalone RFC, and 
> going in the global namespace:
> 
> - Based on early feedback on 
> https://wiki.php.net/rfc/any_all_on_iterable#straw_poll (on the namespace 
> preferred in previous polls)
>   It seems like it's way too early for me to be proposing namespaces in any 
> RFCs for PHP adding to modules that already exist, when there is no consensus.
> 
>   An earlier attempt by others on creating a policy for namespaces in 
> general(https://wiki.php.net/rfc/php_namespace_policy#vote) also did not pass.
> 
>   Having even 40% of voters opposed to introducing a given namespace (in 
> pre-existing modules)
>   makes it an impractical choice when RFCs require a 2/3 majority to pass.
> - While some may argue that a different namespace might pass,
>   https://wiki.php.net/rfc/any_all_on_iterable_straw_poll_namespace#vote had 
> a sharp dropoff in feedback after the 3rd form.
>   I don't know how to interpret that - e.g. are unranked namespaces preferred 
> even less than the options that were ranked or just not seen as affecting the 
> final result.
> 
> Any other feedback unrelated to namespaces?

After feedback, I have decided to postpone the start of voting on this (or 
other proposals related to SPL or iterables) until April at the earliest,
to avoid interfering with the ongoing SPL naming policy discussions.

Thanks,
- Tyson


[PHP-DEV] RFC: CachedIterable (rewindable, allows any key keys)

2021-02-10 Thread tyson andre

Hi internals,

I've created a new RFC https://wiki.php.net/rfc/cachediterable adding 
CachedIterable,
which eagerly evaluates any iterable and contains an immutable copy of the keys 
and values of the iterable it was constructed from

This has the proposed signature:

```
final class CachedIterable implements IteratorAggregate, Countable, JsonSerializable
{
    public function __construct(iterable $iterator) {}
    public function getIterator(): InternalIterator {}
    public function count(): int {}
    // [[$key1, $value1], [$key2, $value2]]
    public static function fromPairs(array $pairs): CachedIterable {}
    // [[$key1, $value1], [$key2, $value2]]
    public function toPairs(): array {}
    public function __serialize(): array {}  // [$k1, $v1, $k2, $v2,...]
    public function __unserialize(array $data): void {}

    // useful for converting iterables back to arrays for further processing
    public function keys(): array {}  // [$k1, $k2, ...]
    public function values(): array {}  // [$v1, $v2, ...]
    // useful to efficiently get offsets at the middle/end of a long iterable
    public function keyAt(int $offset): mixed {}
    public function valueAt(int $offset): mixed {}

    // '[["key1","value1"],["key2","value2"]]' instead of '{...}'
    public function jsonSerialize(): array {}
    // dynamic properties are forbidden
}
```

Currently, PHP does not provide a built-in way to store the state of an 
arbitrary iterable for reuse later
(when the iterable has arbitrary keys, or when keys might be repeated). It 
would be useful to do so for many use cases, such as:

1. Creating a rewindable copy of a non-rewindable Traversable 
2. Generating an IteratorAggregate from a class still implementing Iterator
3. In the future, providing internal or userland helpers such as 
iterable_flip(iterable $input), iterable_take(iterable $input, int $limit),
iterable_chunk(iterable $input, int $chunk_size), iterable_reverse(), etc 
(these are not part of the RFC)
4. Providing memory-efficient random access to both keys and values of 
arbitrary key-value sequences 
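
To make use cases 1 and 4 concrete, here is a rough userland sketch. The class name `UserlandCachedIterable` is hypothetical, it covers only part of the proposed signature, and the exception type thrown for a bad offset is an assumption rather than what the RFC specifies.

```php
<?php
// Userland sketch only; the class name is hypothetical and the RFC's
// CachedIterable is proposed as an internal (C) class.
final class UserlandCachedIterable implements IteratorAggregate, Countable
{
    /** @var list<array{0: mixed, 1: mixed}> eagerly collected [key, value] pairs */
    private array $pairs = [];

    public function __construct(iterable $iterator)
    {
        // Storing pairs (rather than an array keyed by $key) keeps repeated
        // and non-scalar keys intact.
        foreach ($iterator as $key => $value) {
            $this->pairs[] = [$key, $value];
        }
    }

    public function getIterator(): Iterator
    {
        // A fresh generator per call makes the result rewindable.
        return (function (): Generator {
            foreach ($this->pairs as [$key, $value]) {
                yield $key => $value;
            }
        })();
    }

    public function count(): int
    {
        return count($this->pairs);
    }

    public function toPairs(): array
    {
        return $this->pairs;
    }

    public function keyAt(int $offset): mixed
    {
        return $this->pairAt($offset)[0];
    }

    public function valueAt(int $offset): mixed
    {
        return $this->pairAt($offset)[1];
    }

    private function pairAt(int $offset): array
    {
        if (!isset($this->pairs[$offset])) {
            // Assumption: the RFC's exact exception type is not reproduced here.
            throw new OutOfBoundsException("Offset $offset is out of range");
        }
        return $this->pairs[$offset];
    }
}

// A non-rewindable source with a repeated key and an array key.
function sample_source(): Generator
{
    yield 'a' => 1;
    yield 'a' => 2;
    yield [0] => 3;
}

$cached = new UserlandCachedIterable(sample_source());
$seen = 0;
foreach ($cached as $key => $value) { $seen++; } // first pass
foreach ($cached as $key => $value) { $seen++; } // second pass also works
var_dump($seen, count($cached), $cached->keyAt(1), $cached->valueAt(2));
```

A userland version like this stores one small array per pair, which is the kind of overhead an internal implementation can avoid.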

Having this implemented as an internal class would also allow it to be much 
more efficient than a userland solution
(in terms of time to create, time to iterate over the result, and total memory 
usage). See https://wiki.php.net/rfc/cachediterable#benchmarks

After some consideration, this is being created as a standalone RFC, and going 
in the global namespace:

- Based on early feedback on 
https://wiki.php.net/rfc/any_all_on_iterable#straw_poll (on the namespace 
preferred in previous polls)
  It seems like it's way too early for me to be proposing namespaces in any 
RFCs for PHP adding to modules that already exist, when there is no consensus.

  An earlier attempt by others on creating a policy for namespaces in 
general(https://wiki.php.net/rfc/php_namespace_policy#vote) also did not pass.

  Having even 40% of voters opposed to introducing a given namespace (in 
pre-existing modules)
  makes it an impractical choice when RFCs require a 2/3 majority to pass.
- While some may argue that a different namespace might pass,
  https://wiki.php.net/rfc/any_all_on_iterable_straw_poll_namespace#vote had a 
sharp dropoff in feedback after the 3rd form.
  I don't know how to interpret that - e.g. are unranked namespaces preferred 
even less than the options that were ranked or just not seen as affecting the 
final result.

Any other feedback unrelated to namespaces?

Thanks,
- Tyson

Re: [PHP-DEV] [VOTE] PHP\iterable\any() and all() on iterables

2021-02-10 Thread tyson andre

trcmp()), not C++ (e.g. std::less).)
> 
> Of course here the "correct" thing is to return `$a <=> $b` (or `$b <=> $a` 
> for descending order), but you can also return `$a - $b` (not necessarily in 
> [-1,0,1]), or even a string `"foo"` still without any warning in 8.0.2 (just 
> a certainly wrong result)...

Before PHP 8.0, a comparison callback was called only **once** per pair, so a 
callback returning a boolean mapped 2 states (true, false) onto the 3 expected 
states (-1, 0, 1) and only worked coincidentally because the sort wasn't stable. 
The desired type had more possible values than the returned type, so it did the 
wrong thing.
- any()/all() are different in that there are only 2 desired values.

https://wiki.php.net/rfc/stable_sorting fixed that by calling the callback a 
second time with the arguments swapped when it returns false, and by deprecating 
boolean return values.
It still allows you to return strings (not likely in practice) and casts those 
to integers.

"Second, if boolean false is returned, PHP will automatically call the 
comparison function again with arguments swapped.
This allows us to distinguish whether the “false” stood for “equal” or “smaller 
than”. This fallback behavior should be removed in a future version of PHP."
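
For contrast with the boolean-returning comparators discussed above, a short sketch of comparators that return the full three-way result. The variable names are illustrative only.

```php
<?php
$values = [3, 1, 2, 1];

// Ascending: the comparator returns an int (negative, zero, or positive).
$ascending = $values;
usort($ascending, fn(int $a, int $b): int => $a <=> $b);

// Descending: swap the operands.
$descending = $values;
usort($descending, fn(int $a, int $b): int => $b <=> $a);

// A comparator such as fn($a, $b) => $a > $b returns a bool, which cannot
// distinguish "equal" from "smaller"; it is intentionally not run here.
var_dump($ascending, $descending);
```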

> Anyway, to me it feels natural that any()/all() would "work" like 
> array_filter().
> 
> @Tyson by the way, in the any()/all() case (vs the any_value()/all_values() 
> and potential any_key()/all_keys() etc.), wouldn't it be preferable to add 
> the optional `int $flags = 0` (or "$mode") parameter right from the start 
> (even if not used yet), as adding it in a later release would apparently pose 
> some BC concerns (ArgumentCountError, polyfills etc.)?

If an iterables RFC passed, and all amendments got approved before 8.1 stable 
was released, it wouldn't matter. There's months before the feature freeze.

If a function was added with flags "just in case" in 8.1 stable and we never 
used the flags, we'd have to deprecate them and remove them in a subsequent 
major release.
I don't think that's worth it.

Cheers,
Tyson Andre


Re: [PHP-DEV] [VOTE] PHP\iterable\any() and all() on iterables

2021-02-08 Thread tyson andre

Hi Larry Garfield,

> > Hi Larry Garfield,
> > 
> > > > Hi internals,
> > > > 
> > > > Voting has started on https://wiki.php.net/rfc/any_all_on_iterable and 
> > > > ends on 2021-02-22.
> > > > 
> > > > This RFC proposes to add the functions `PHP\iterable\any(iterable 
> > > > $input, ?callable $callback = null): bool` and `PHP\iterable\all(...)`
> > > > to PHP's standard library's function set, using the namespace preferred 
> > > > in the previous straw poll.
> > > > 
> > > > There is a primary vote on whether to add the functions, and a 
> > > > secondary vote on the name to use within the `PHP\iterable` namespace.
> > > > 
> > > > Thanks,
> > > > - Tyson
> > > > --
> > > > PHP Internals - PHP Runtime Development Mailing List
> > > > To unsubscribe, visit: https://www.php.net/unsub.php
> > > 
> > > 
> > > Ak!  I literally just finished reading it and wanted to note a lack of 
> > > clarity on one point. :-)
> > > 
> > > The signature of the callback is never specified explicitly.  The ternary 
> > > is a bit confusing.  I assume the signature is 
> > > 
> > > callable(mixed): bool
> > > 
> > > But that's not made explicit.  It's also not made explict that omitting 
> > > the callable collapses to "is truthy".  That's a sensible thing to do, 
> > > but it's not stated explicitly anywhere, just inferred from the code 
> > > sample.
> > > 
> > > I'm not sure if it's safe to clarify at this point as the vote just 
> > > started.
> > 
> > If there is a callable, it allows `callable(mixed): mixed`,
> > and converts the callable's return value to a boolean.
> > So omitting the callable is the same as passing in the callable `fn($x) 
> > => $x`, which is equivalent to `fn($x) => (bool)$x`.
> > This is exactly what the reference implementation would do.
> > 
> > I definitely should have clarified it instead of assuming that the 
> > reference implementation was clear enough.
> > 
> > I clarified this and gave examples because the RFC started a few hours 
> > ago and the implementation didn't change.
> 
> Oof.  I'm glad I asked, because I don't like that at all.  If available, the 
> callable should be returning bool, not "anything that may be truthy/falsy."  
> If you have an explicit function, it should have an explicit return type.  A 
> truthy check is a reasonable default, but not for when you're opting in to 
> specifying the logic.
> 
> I am in favor of the RFC, but I will have to consider if that changes my vote 
> to No.
> 
> --Larry Garfield

This was a deliberate choice and is consistent with the weak type comparison 
behavior of array_filter() and other functions that default to using weak type 
checks internally.

I'd agree that I'd prefer to see callbacks returning booleans in code I'm 
reviewing,
but a truthiness check seems more practical and consistent with the rest of the 
language
than throwing a TypeError or checking the predicate return value using `!== 
true`

This choice was made to keep PHP widely accessible and free of surprises.
e.g. `(bool)array_filter($arr, $predicate)` can be safely converted to 
`any($arr, $predicate)` without introducing a TypeError or behavior change.

```
php > var_dump(array_filter([-1,0,1], fn($x)=>$x));
array(2) {
  [0]=>
  int(-1)
  [2]=>
  int(1)
}
```
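
A userland sketch of the truthiness semantics described above. The names `iterable_any()` and `iterable_all()` are hypothetical stand-ins for the proposed `PHP\iterable\any()` and `all()`, and this is not the reference implementation.

```php
<?php
// Hypothetical userland names; the RFC proposes PHP\iterable\any() and all().
function iterable_any(iterable $input, ?callable $callback = null): bool
{
    foreach ($input as $value) {
        // The callback result (or the value itself) is checked for truthiness.
        if ($callback !== null ? $callback($value) : $value) {
            return true;
        }
    }
    return false;
}

function iterable_all(iterable $input, ?callable $callback = null): bool
{
    foreach ($input as $value) {
        if (!($callback !== null ? $callback($value) : $value)) {
            return false;
        }
    }
    return true;
}

$arr = [-1, 0, 1];
$predicate = fn($x) => $x;

// Same answer as the array_filter() idiom, without building a filtered array.
var_dump(iterable_any($arr, $predicate) === (bool) array_filter($arr, $predicate));
var_dump(iterable_all($arr)); // bool(false), because 0 is falsy
```

Both functions stop at the first deciding element, which the `array_filter()` idiom cannot do.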

Many other dynamic languages that aren't compiled ahead of time have made the 
same choice of checking truthiness.

```
# python
>>> any([1])
True
>>> any([0])
False
# Ruby
irb(main):001:0> [nil].any?
=> false
irb(main):002:0> [false].any?
=> false
irb(main):003:0> !!0
=> true
irb(main):004:0> [0].any?
=> true
# JavaScript
> [0].some(x=>x)
false
> [1].some(x=>x)
true
```

It is currently possible to check if code is passing a callable returning 
anything other than a boolean
to functions such as `array_filter()` using a wide variety of static 
analyzers/tools, e.g. http://github.com/phan/phan

```
 $x % 3);
```

Thanks,
-Tyson


[PHP-DEV] Re: [VOTE] PHP\iterable\any() and all() on iterables

2021-02-08 Thread tyson andre

Hi internals,

> Voting has started on https://wiki.php.net/rfc/any_all_on_iterable and ends 
> on 2021-02-22.
> 
> This RFC proposes to add the functions `PHP\iterable\any(iterable $input, 
> ?callable $callback = null): bool` and `PHP\iterable\all(...)`
> to PHP's standard library's function set, using the namespace preferred in 
> the previous straw poll.
> 
> There is a primary vote on whether to add the functions, and a secondary vote 
> on the name to use within the `PHP\iterable` namespace.

This RFC(https://wiki.php.net/rfc/any_all_on_iterable) has been updated to 
include a straw poll on the reason you voted against it.
If you have voted, please also fill out 
https://wiki.php.net/rfc/any_all_on_iterable#straw_poll

This has also been updated to elaborate on the implementation details (the 
implementation has not changed),
based on feedback received after the RFC started.

Thanks,
- Tyson

Re: [PHP-DEV] [VOTE] PHP\iterable\any() and all() on iterables

2021-02-08 Thread tyson andre

Hi Levi Morrison,

> > Hi internals,
> >
> > Voting has started on https://wiki.php.net/rfc/any_all_on_iterable and ends 
> > on 2021-02-22.
> >
> > This RFC proposes to add the functions `PHP\iterable\any(iterable $input, 
> > ?callable $callback = null): bool` and `PHP\iterable\all(...)`
> > to PHP's standard library's function set, using the namespace preferred in 
> > the previous straw poll.
> >
> > There is a primary vote on whether to add the functions, and a secondary 
> > vote on the name to use within the `PHP\iterable` namespace.
> >
> > Thanks,
> > - Tyson
> > --
> > PHP Internals - PHP Runtime Development Mailing List
> > To unsubscribe, visit: https://www.php.net/unsub.php
> >
> 
> Thanks for the RFC. I have voted no, even though I am very supportive
> of the direction. My objections are:
> - I think the scope is too small. This is introducing a new family of
> functions, but is only proposing two functions. This is too small to
> firmly root in good design and precedent.

I misread your earlier comment in https://externals.io/message/111756#111764

My general stance on this is similar to 
https://github.com/Danack/RfcCodex/blob/4cb3466e42063be00ece0cdb296c0b1336eb81c0/rfc_etiquette.md#dont-volunteer-other-people-for-huge-amounts-of-work

I have limited time, and this has generated a lot of discussion.
I'm concerned that adding more functionality initially would raise questions like 
"Do we really need to add `none()` if we already have `!any()`?" and objections like
"I voted against this because I don't see the need for `chunk()`, `reversed()`, 
`filter()`, etc. (or disagree with one of the implementation details)".

> - I do not like the chosen namespace. This is not as important as the
> previous point, but still factored into my decision as we are still
> very early in choosing namespaces for internals. I don't want to vote
> for something I think is a bad direction when we're this early on.
> 
> Again, I am supportive of adding these functions in some form, but I
> very strongly do not believe this RFC is what we should do.

And you're strongly opposed to the global namespace. 
https://externals.io/message/112558#112598

An unrealistic hypothetical worst-case scenario would be one where half of voters 
vote against any new categories of functions in the global namespace, and half of 
voters vote against any new categories of functions outside of the global namespace, 
and nothing achieves a 2/3 majority in PHP 8.1.

> I tried to collaborate with Tyson more on these points but either we
> mis-communicated or he wasn't interested. In any case, it's up for a
> vote so I choose "no."

I disagreed. 
My decision was based on 
https://wiki.php.net/rfc/any_all_on_iterable_straw_poll_namespace#vote
I strongly feel that this should be based on feedback from voters as a whole 
when we're this early in namespacing discussions,
or the namespacing discussion would continue as "maybe this was the wrong 
choice of namespace."

Thanks,
- Tyson


Re: [PHP-DEV] [VOTE] PHP\iterable\any() and all() on iterables

2021-02-08 Thread tyson andre

Hi Larry Garfield,

> > Hi internals,
> > 
> > Voting has started on https://wiki.php.net/rfc/any_all_on_iterable and 
> > ends on 2021-02-22.
> > 
> > This RFC proposes to add the functions `PHP\iterable\any(iterable 
> > $input, ?callable $callback = null): bool` and `PHP\iterable\all(...)`
> > to PHP's standard library's function set, using the namespace preferred 
> > in the previous straw poll.
> > 
> > There is a primary vote on whether to add the functions, and a 
> > secondary vote on the name to use within the `PHP\iterable` namespace.
> > 
> > Thanks,
> > - Tyson
> 
> 
> Ak!  I literally just finished reading it and wanted to note a lack of 
> clarity on one point. :-)
> 
> The signature of the callback is never specified explicitly.  The ternary is 
> a bit confusing.  I assume the signature is 
> 
> callable(mixed): bool
> 
> But that's not made explicit.  It's also not made explicit that omitting the 
> callable collapses to "is truthy".  That's a sensible thing to do, but it's 
> not stated explicitly anywhere, just inferred from the code sample.
> 
> I'm not sure if it's safe to clarify at this point as the vote just started.

If there is a callable, it allows `callable(mixed): mixed`,
and converts the callable's return value to a boolean.
So omitting the callable is the same as passing in the callable `fn($x) => $x`, 
which is equivalent to `fn($x) => (bool)$x`.
This is exactly what the reference implementation would do.
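
As a rough userland sketch of the semantics described above (the names
`sketch_any()` and `sketch_all()` are placeholders made up for illustration, and
this is not the RFC's C implementation), the callback's return value is only
ever tested for truthiness, and a missing callback tests the element itself:

```php
<?php
// Hypothetical helpers for illustration only; not the proposed functions.
function sketch_any(iterable $input, ?callable $callback = null): bool
{
    foreach ($input as $value) {
        // The `if` converts the result to bool, like an explicit (bool) cast.
        if ($callback !== null ? $callback($value) : $value) {
            return true; // stop at the first truthy result
        }
    }
    return false;
}

function sketch_all(iterable $input, ?callable $callback = null): bool
{
    foreach ($input as $value) {
        if (!($callback !== null ? $callback($value) : $value)) {
            return false; // stop at the first falsy result
        }
    }
    return true;
}

var_dump(sketch_any([0, '', null]));                 // bool(false)
var_dump(sketch_any([0, 'a']));                      // bool(true)
var_dump(sketch_any([1, 2, 3], fn($x) => $x > 2));   // bool(true)
var_dump(sketch_all([1, 2, 3], fn($x) => $x % 2));   // bool(false), since 2 % 2 is 0
var_dump(sketch_all([]));                            // bool(true)
```

Here `fn($x) => $x % 2` returns an int rather than a bool, which is the
`callable(mixed): mixed` case.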

I definitely should have clarified it instead of assuming that the reference 
implementation was clear enough.

I have clarified this in the RFC and added examples, since voting started only a 
few hours ago and the implementation did not change.

- Tyson


[PHP-DEV] [VOTE] PHP\iterable\any() and all() on iterables

2021-02-08 Thread tyson andre

Hi internals,

Voting has started on https://wiki.php.net/rfc/any_all_on_iterable and ends on 
2021-02-22.

This RFC proposes to add the functions `PHP\iterable\any(iterable $input, 
?callable $callback = null): bool` and `PHP\iterable\all(...)`
to PHP's standard library's function set, using the namespace preferred in the 
previous straw poll.

There is a primary vote on whether to add the functions, and a secondary vote 
on the name to use within the `PHP\iterable` namespace.

Thanks,
- Tyson

[PHP-DEV] Re: [RFC] Global functions any() and all() on iterables

2021-02-06 Thread tyson andre

Hi internals,

> I've created an RFC for https://wiki.php.net/rfc/any_all_on_iterable
> 
> This was proposed 2 days ago in https://externals.io/message/111711 with some 
> interest
> ("Proposal: Adding functions any(iterable $input, ?callable $cb = null, int 
> $use_flags=0) and all(...)")
>
> - The $use_flags parameter was removed
>
> The primitives any() and all() are a common part of many programming 
> languages and help in avoiding verbosity or unnecessary abstractions.
>
> - https://hackage.haskell.org/package/base-4.14.0.0/docs/Prelude.html#v:any
> - 
> https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/some
> - https://docs.python.org/3/library/functions.html#all
> - 
> https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#allMatch-java.util.function.Predicate-
> 
> For example, the following code could be shortened significantly
> 
> ```
> // Old version
> $satisfies_predicate = false;
> foreach ($item_list as $item) {
>     if (API::satisfiesCondition($item)) {
>         $satisfies_predicate = true;
>         break;
>     }
> }
> if (!$satisfies_predicate) {
>     throw new APIException("No matches found");
> }
> 
> // New version is much shorter and readable
> if (!any($item_list, fn($item) => API::satisfiesCondition($item))) {
>     throw new APIException("No matches found");
> }
> ```
> 
> That example doesn't have any suitable helpers already in the standard 
> library.
> Using array_filter would unnecessarily call satisfiesCondition even after the 
> first item was found,
> and array_search doesn't take a callback.
> 
> A proposed implementation is https://github.com/php/php-src/pull/6053 - it 
> takes similar flags and param orders to array_filter().

The implementation and RFC https://wiki.php.net/rfc/any_all_on_iterable have 
been updated 
after finishing the previous straw poll: 
https://wiki.php.net/rfc/any_all_on_iterable_straw_poll_namespace

This is now a vote on `PHP\iterable\any(iterable $input, ?callable $callback = 
null)` and `PHP\iterable\all(...)`.

A secondary vote was added for choosing between `any`/`all` and 
`any_value`/`all_values` within `PHP\iterable`.

I plan to start the vote on Monday.

Any other feedback?

Thanks,
- Tyson


[PHP-DEV] [VOTE] var_representation() : readable alternative to var_export()

2021-02-05 Thread tyson andre

Hi internals,

Voting has started today on https://wiki.php.net/rfc/readable_var_representation
and closes on 2021-02-19.

This RFC proposes introducing a new function `var_representation`
with the following differences from var_export:

1. `var_representation()` unconditionally returns a string
2. Use `null` instead of `NULL` - the former is recommended by more coding
   guidelines (https://www.php-fig.org/psr/psr-2/).
3. Change the way indentation is done for arrays/objects.
   See ext/standard/tests/general_functions/short_var_export1.phpt
   (e.g. always add 2 spaces, never 3 in objects, and put the array start on the
   same line as the key)
4. Render lists as `"[\n  'item1',\n]"` rather than
   `"array(\n  0 => 'item1',\n)"`

   Always render empty lists on a single line, render multiline by default
   when there are 1 or more elements
5. Prepend `\` to class names so that generated code snippets can be used in
   namespaces without any issues.
6. Support `VAR_REPRESENTATION_SINGLE_LINE` in $flags.
   This will use a single-line representation for arrays/objects.
7. Escape control characters ("\x00"-"\x1f" and "\x7f" (delete)) inside of
   double quoted strings instead of single quoted strings with unescaped
   control characters mixed with ` . "\0" . `.
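
For reference, here is a small sketch of the existing `var_export()` behaviour
that points 2, 4 and 7 aim to change. It only calls `var_export()`, since
`var_representation()` is just a proposal at this point. Note that current
`var_export()` output has a space between `array` and `(`.

```php
<?php
// Existing var_export() output; passing true returns the string instead of printing.
$null = var_export(null, true);      // 'NULL' in uppercase (point 2)
$list = var_export(['item1'], true); // multi-line array with an explicit 0 key (point 4)
$nul  = var_export("a\0b", true);    // NUL byte spliced in via concatenation (point 7)

echo $null, "\n"; // NULL
echo $list, "\n";
// array (
//   0 => 'item1',
// )
echo $nul, "\n";  // 'a' . "\0" . 'b'
```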

Thanks,
-Tyson

Re: [PHP-DEV] Re: [RFC] var_representation() : readable alternative to var_export()

2021-02-04 Thread tyson andre

Hi internals,

> > > I've created https://wiki.php.net/rfc/readable_var_representation based on
> > > my original proposal in https://externals.io/message/112924
> > > 
> > > This RFC proposes adding a new function `var_representation(mixed $value, 
> > > int $flags=0): string`
> > > with the following differences from var_export:
> > > 
> > > 1. var_representation() unconditionally returns a string
> > > 2. Use `null` instead of `NULL` - lowercase is recommended by more coding
> > >    guidelines (https://www.php-fig.org/psr/psr-2/).
> > > 3. Change the way indentation is done for arrays/objects.
> > >    See ext/standard/tests/general_functions/short_var_export1.phpt
> > >    (e.g. always add 2 spaces, never 3 in objects, and put the array start 
> > > on the
> > >    same line as the key)
> > > 4. Render lists as `"['item1']"` rather than `"array(\n  0 => 
> > > 'item1',\n)"`
> > > 
> > >    Always render empty lists on a single line, render multiline by 
> > > default when there are 1 or more elements
> > > 5. Prepend `\` to class names so that generated code snippets can be used 
> > > in
> > >    namespaces without any issues.
> > > 6. Support `VAR_REPRESENTATION_SINGLE_LINE` in `$flags`.
> > >    This will use a single-line representation for arrays/objects, though
> > >    strings with embedded newlines will still cause newlines in the output.
> > > 7. If a string contains control characters ("\x00"-"\x1f" and 
> > > "\x7f" (delete)),
> > >    then represent the entire string as a double quoted string
> > >    escaping `\r`, `\n`, `\t`, `\$`, `\\`, and `\"`, in addition to 
> > > escaping remaining control characters
> > >    with hexadecimal encoding (\x00, \x7f, etc)
> > > 
> > > This is different from my original proposal in two ways:
> > > 1. The function signature and name changed from my previous proposal.
> > >     It now always returns a string.
> > > 2. Delete control characters (\x7f) are now also escaped.
> > 
> > A reminder that voting on the var_representation RFC starts in a day.
> > This RFC proposes adding a new function `var_representation(mixed $value, 
> > int $flags=0): string` with multiple improvements on `var_export()`.
> > 
> > Any other feedback?
> 
> Given the recent discussion in the interactive shell thread,
> I think you should consider whether the new function could also be expanded 
> to serve that use case.
> I think that if we're going to add one more dumping function to the 4 we 
> already have,
> it better cover all the use-cases we have.
> The "limited size dump" doesn't really fit in with "dump is executable PHP 
> code",
> but if I understand correctly, executable PHP code is not the whole goal of 
> the proposal.

I suppose that technically could be done by adding a 
VAR_REPRESENTATION_DEBUG_DUMP flag to a $flags bitmask to generate var_dump 
output,
and allow that to be combined with VAR_REPRESENTATION_SINGLE_LINE and other 
style flags.
I should at least mention it as an option - that possibly combines unrelated 
functionality (Debug vs evaluable code) in flags,
but at least it cuts down on the number of different functions.
I don't plan to include that in this RFC.

-

I have considered it but think that readable executable PHP code and debug 
representations are largely incompatible - 
having a function to generate executable PHP code that generates a truncated 
representation of a value doesn't seem as useful.
If you're generating code to eval() - you're usually generating all of it.

For example, for this

```
php > $x = (object)[]; $v = [$x, $x]; echo var_representation($v);
[
  (object) [],
  (object) [],
]
```

In an application where the identity or use of references in the object didn't 
matter (e.g. read but not modified), that might be the best representation.

A hypothetical function could emit
`'(static function () { $t1 = (object)[]; return [$t1, $t1]; })();'`
if it detected object duplication or references (and so on),
and would likely be faster at generating output than a userland implementation 
such as https://github.com/brick/varexporter
(and avoid the limitation of ReflectionReference only being able to check 
individual pairs at a time),
but I'm concerned that there's not much interest in that, and that the response 
would be "this should go in PECL instead"
due to the complexity of an implementation, edge cases that unserialize() handles 
but the hypothetical function would not, and the ongoing requirement to 
maintain it.
- e.g. some internal classes forbid `newInstanceWithoutConstructor()`
- If an application needed all of the functionality of serialize, it's already 
possible to generate a call to unserialize instead.
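
To make the difference concrete, here is a minimal sketch that evaluates both
forms: the flat output shown earlier and the closure-based string that the
hypothetical function might emit. The variable names are only for illustration.

```php
<?php
// Flat representation: each `(object) []` creates a separate stdClass instance.
$flat = eval('return [(object) [], (object) []];');

// Closure-based representation: both slots refer to the same instance.
$shared = eval('return (static function () { $t1 = (object)[]; return [$t1, $t1]; })();');

var_dump($flat[0] === $flat[1]);     // bool(false): identity is lost
var_dump($shared[0] === $shared[1]); // bool(true): identity is preserved

// Mutating one slot is only visible through the other in the shared form.
$shared[0]->seen = true;
var_dump(isset($shared[1]->seen));   // bool(true)
```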

I feel like trying to do everything at once would increase the scope to the 
point where the RFC would be hard to implement and review.
With only a few people responding so far, and with mixed feedback, I think it's 
too early to do that.
There's other orthogonal improvements 
(e.g. Internal objects don't

Re: [PHP-DEV] Re: [RFC] var_representation() : readable alternative to var_export()

2021-02-04 Thread tyson andre

Hi internals,

> > > I've created https://wiki.php.net/rfc/readable_var_representation based on
> > > my original proposal in https://externals.io/message/112924
> > > 
> > > [...]
> > 
> > A reminder that voting on the var_representation RFC starts in a day.
> > This RFC proposes adding a new function `var_representation(mixed $value, 
> > int $flags=0): string` with multiple improvements on `var_export()`.
> > 
> > Any other feedback?
>  
> I think the "though strings with embedded newlines will still cause newlines 
> in the output" part is obsolete (since `\r` and `\n` are escaped now).
>  
> Apart from that, since var_export (and var_dump) can't really be "fixed" for 
> BC reasons, I'm +1 for the new function.

Thanks, that's indeed obsolete - I removed it from the RFC.

-Tyson

