Re: [PHP-DEV] Re: Character range syntax ".." for character masks

2022-07-29 Thread Hans Henrik Bergan
>1. Are there any reasonable objections to consistently implementing
>character range expressions for all character masks?

This would be a minor BC break: it would silently change the meaning of
strspn($str, "a..b"), which currently means the same as "a.b" (with wasted
CPU cycles). With your suggestion it would mean the same as "ab", and the
dot would no longer pass the check.
But then again, writing ".." there is currently just a waste of CPU, and I
don't think I've ever actually seen anyone do that in the wild.
¯\_(ツ)_/¯
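
A minimal sketch of the behaviour difference described above, assuming a
current PHP 8 build. strspn() treats every byte of the mask literally, while
trim() already expands ".." into a range; whether strspn() would behave like
trim() after the proposed change is an assumption, not something decided in
this thread.

```php
<?php
// Today: strspn() takes the mask literally, so "a..b" is just "a", ".", "b".
$literalDouble = strspn('a.b!', 'a..b'); // 3: 'a', '.', 'b' all match
$literalSingle = strspn('a.b!', 'a.b');  // 3: same set of bytes

// trim() already expands "a..b" to the range a-b, so the dot is NOT in the mask.
$rangeTrim = trim('abXba', 'a..b');  // "X"
$dotKept   = trim('a.Xba', 'a..b');  // ".X": the leading dot stops the trim

var_dump($literalDouble, $literalSingle, $rangeTrim, $dotKept);
```

If strspn() adopted the range syntax, the first call would presumably stop
at the dot and return 1 instead of 3; that is the BC break in question.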


On Fri, 29 Jul 2022 at 10:58, Guilliam Xavier wrote:

> On Fri, Jul 29, 2022 at 7:15 AM mickmackusa wrote:
>
> >
> >
> > On Monday, July 25, 2022, Guilliam Xavier wrote:
> >
> >> On Sat, Jul 9, 2022 at 1:56 AM mickmackusa wrote:
> >>
> >>> I've discovered that several native string functions offer a character
> >>> mask
> >>> as a parameter.
> >>>
> >>> I've laid out my observations at
> >>> https://stackoverflow.com/q/72865138/2943403
> >>>
> >>
> >> Out of curiosity, why do you say that strtr() is "not a good candidate
> >> because character order matters" (although you give a reasonable
> >> example)?
> >> Maybe you have some counter-example?
> >>
> >> Regards,
> >>
> >> --
> >> Guilliam Xavier
> >>
> >
> > I prefer to keep my scope very tight when posting on Stack Overflow.
> >
> > My focus was purely on enabling character range syntax for native
> > functions with character mask parameters.  My understanding of character
> > masks in PHP requires single-byte characters and no meaning to character
> > order.
> >
> > When strtr() is fed two strings, they cannot be considered "character
> > masks" because the character orders matter.
> >
> > If extending character range syntax to parameters which are not character
> > masks, I might support the feature for strtr(), but ensuring that the two
> > strings are balanced will be made more difficult with ranged syntax.
> > strtr() will silently condone imbalanced strings.  https://3v4l.org/PY15F
> >
>
> Thanks for the clarifications. You're right that the internal
> `php_charmask` converts a character list (possibly containing one or more
> ranges) into a 256-char *mask*, thus "losing" any original order; so
> strtr() actually couldn't use the same implementation (even without
> ranges), and a counter-example is `strtr('adobe', 'abcde', 'ebcda')`
> (`strtr('adobe', 'a..e', 'e..a')` would trigger a Warning "Invalid
> '..'-range, '..'-range needs to be incrementing").
>
> I had seen a parallel with the Unix `tr` command, which *does* support
> [incrementing] ranges (e.g. both `echo adobe | tr abcde ABCDE` and `echo
> adobe | tr a-e A-E` give "ADoBE", while `echo adobe | tr abcde edcba` gives
> "eboda" but `echo adobe | tr a-e e-a` errors "range-endpoints of 'e-a' are
> in reverse collating sequence order"), but its implementation indeed
> doesn't use character masks
> (https://github.com/coreutils/coreutils/blob/master/src/tr.c), and `echo
> abracadabra | tr a-f x` gives "xxrxxxxxxrx", not "xbrxcxdxbrx"; it also
> supports more things like POSIX character classes...
>
> PS: I find the `strtr(string $string, array $replace_pairs)` form generally
> superior to the `strtr(string $string, string $from, string $to)` one
> anyway ;)
>
> Regards,
>
> --
> Guilliam Xavier
>


[PHP-DEV] Re: Character range syntax ".." for character masks

2022-07-29 Thread Guilliam Xavier
On Fri, Jul 29, 2022 at 7:15 AM mickmackusa wrote:

>
>
> On Monday, July 25, 2022, Guilliam Xavier wrote:
>
>> On Sat, Jul 9, 2022 at 1:56 AM mickmackusa wrote:
>>
>>> I've discovered that several native string functions offer a character
>>> mask
>>> as a parameter.
>>>
>>> I've laid out my observations at
>>> https://stackoverflow.com/q/72865138/2943403
>>>
>>
>> Out of curiosity, why do you say that strtr() is "not a good candidate
>> because character order matters" (although you give a reasonable example)?
>> Maybe you have some counter-example?
>>
>> Regards,
>>
>> --
>> Guilliam Xavier
>>
>
> I prefer to keep my scope very tight when posting on Stack Overflow.
>
> My focus was purely on enabling character range syntax for native
> functions with character mask parameters.  My understanding of character
> masks in PHP requires single-byte characters and no meaning to character
> order.
>
> When strtr() is fed two strings, they cannot be considered "character
> masks" because the character orders matter.
>
> If extending character range syntax to parameters which are not character
> masks, I might support the feature for strtr(), but ensuring that the two
> strings are balanced will be made more difficult with ranged syntax.
> strtr() will silently condone imbalanced strings.  https://3v4l.org/PY15F
>

Thanks for the clarifications. You're right that the internal
`php_charmask` converts a character list (possibly containing one or more
ranges) into a 256-char *mask*, thus "losing" any original order; so
strtr() actually couldn't use the same implementation (even without
ranges), and a counter-example is `strtr('adobe', 'abcde', 'ebcda')`
(`strtr('adobe', 'a..e', 'e..a')` would trigger a Warning "Invalid
'..'-range, '..'-range needs to be incrementing").
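
To illustrate why a mask cannot carry order, here is a rough userland
sketch. `expand_charmask()` is a hypothetical helper written for this
example, not the internal `php_charmask`; it skips the warnings the real
implementation emits and simply treats invalid ranges as literal bytes.

```php
<?php
// Hypothetical helper (not a PHP built-in): builds a 256-entry mask from a
// character list, expanding incrementing "x..y" ranges.
function expand_charmask(string $list): array
{
    $mask = array_fill(0, 256, false);
    $len = strlen($list);
    for ($i = 0; $i < $len; $i++) {
        $lo = ord($list[$i]);
        $isRange = $i + 3 < $len
            && $list[$i + 1] === '.'
            && $list[$i + 2] === '.'
            && ord($list[$i + 3]) >= $lo;
        if ($isRange) {
            for ($b = $lo, $hi = ord($list[$i + 3]); $b <= $hi; $b++) {
                $mask[$b] = true;
            }
            $i += 3; // skip over "..y"
            continue;
        }
        $mask[$lo] = true;
    }
    return $mask;
}

// The same set of bytes yields the same mask, whatever the order or notation.
$sameMask  = expand_charmask('abcde') === expand_charmask('ebcda');
$rangeMask = expand_charmask('a..e') === expand_charmask('abcde');

// strtr() pairs characters by position, so order changes the result.
$forward  = strtr('adobe', 'abcde', 'ebcda'); // "edoba"
$identity = strtr('adobe', 'abcde', 'abcde'); // "adobe"

var_dump($sameMask, $rangeMask, $forward, $identity);
```

Both `$to` strings collapse to the same mask, yet strtr() gives different
results for them, which is why a mask-based implementation would not fit.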

I had seen a parallel with the Unix `tr` command, which *does* support
[incrementing] ranges (e.g. both `echo adobe | tr abcde ABCDE` and `echo
adobe | tr a-e A-E` give "ADoBE", while `echo adobe | tr abcde edcba` gives
"eboda" but `echo adobe | tr a-e e-a` errors "range-endpoints of 'e-a' are
in reverse collating sequence order"), but its implementation indeed
doesn't use character masks
(https://github.com/coreutils/coreutils/blob/master/src/tr.c), and `echo
abracadabra | tr a-f x` gives "xxrxxxxxxrx", not "xbrxcxdxbrx"; it also
supports more things like POSIX character classes...
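
For comparison, a short sketch of how PHP's three-argument strtr() handles
these cases (assuming PHP 8). When `$from` and `$to` differ in length, the
extra characters of the longer one are ignored; padding `$to` by hand, as
done below, is only an illustration of getting the "extend the shorter set"
effect and is not something strtr() does itself.

```php
<?php
// Balanced sets: each character maps to the one at the same position.
$upper = strtr('adobe', 'abcde', 'ABCDE'); // "ADoBE"

// Imbalanced sets: only "a" has a partner, so only "a" is replaced.
$short = strtr('abracadabra', 'abcdef', 'x'); // "xbrxcxdxbrx"

// Padding $to to the length of $from maps every character of a-f to "x".
$padded = strtr('abracadabra', 'abcdef', str_repeat('x', 6)); // "xxrxxxxxxrx"

var_dump($upper, $short, $padded);
```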

PS: I find the `strtr(string $string, array $replace_pairs)` form generally
superior to the `strtr(string $string, string $from, string $to)` one
anyway ;)
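
A small sketch of the array form, assuming PHP 8. Building the pairs with
range() and array_combine() is just one possible approach, chosen here
because array_combine() refuses arrays of different sizes (it throws a
ValueError on PHP 8), so imbalanced sets do not pass silently.

```php
<?php
// Explicit pairs: the mapping is visible and cannot be imbalanced.
$swapped = strtr('adobe', ['a' => 'e', 'e' => 'a']); // "edoba"

// Pairs generated from ranges: a=>e, b=>d, c=>c, d=>b, e=>a.
$pairs    = array_combine(range('a', 'e'), range('e', 'a'));
$reversed = strtr('adobe', $pairs); // "eboda"

// Unlike the string form, keys may be longer than one byte.
$words = strtr('Hi all', ['Hi' => 'Hello', 'all' => 'world']); // "Hello world"

var_dump($swapped, $reversed, $words);
```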

Regards,

-- 
Guilliam Xavier


[PHP-DEV] Re: Character range syntax ".." for character masks

2022-07-22 Thread mickmackusa
If I seek to have a round of voting for an RFC on character ranges in
character mask parameters, should I propose it for PHP 8.3 or a higher
version?
I have only identified 4 native string functions that make reasonable
candidates to join the 7 existing functions with this feature.

I don't think there is any benefit in explaining how these functions work.
The sole purpose for this change (and the reason that other functions have
it already) is to reduce code bloat without needing any extra function
calls.  If the feature is good enough for the first 7 functions, then it
should be good enough for these other 4 functions.

Breaking change possibility: if code is silly enough to repeat ANY
characters in the mask AND the repeated character is a dot between two
other characters, then I don't have much sympathy.   Honestly though, we
are talking about a super unlikely occurrence.

Some demos: https://3v4l.org/2Y0q4

Mick


[PHP-DEV] Re: Character range syntax ".." for character masks

2022-07-09 Thread Christoph M. Becker
On 09.07.2022 at 01:55, mickmackusa wrote:

> I've discovered that several native string functions offer a character mask
> as a parameter.
>
> I've laid out my observations at
> https://stackoverflow.com/q/72865138/2943403
>
> In a nutshell, not all character masks offer ranges via "double dot"
> syntax. Or should I refer to ".." as the "string spread operator" to avoid
> naming conflict with "..." -- the better known "spread operator" (array
> spread operator)?
>
> Rowan/@IMSoP informed me that the current division between the haves and
> the have-nots appears to be based on the source language from which PHP
> pulled. Essentially, if from C, the double dot does not represent a range.
> https://chat.stackoverflow.com/transcript/11?m=54864842#54864842
>
> Character ranges are not yet supported for:
> - strcspn()
> - strpbrk()
> - strspn()
>
> Before I fire off an RFC, I would like to know:
>
> 1. Are there any reasonable objections to consistently implementing
> character range expressions for all character masks?

In my opinion, this notation is somewhat confusing; trim($str, "a..z")
and trim($str, "a.z") look pretty similar, but have completely different
meaning.  I'd rather have some general way to construct such ranges; the
slightly contrived implode(range()) is already available, though.
Besides, adding support for such character ranges to other functions
now, constitutes a (probably minor) BC break.
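
A brief sketch of both points, assuming PHP 8: the two trim() masks differ
by a single dot yet mean very different things, and implode(range()) builds
an explicit mask that works today with functions lacking range support,
such as strspn().

```php
<?php
// "a..z" is the range a-z; "a.z" is just the three bytes "a", ".", "z".
$withRange   = trim('xa.bx', 'a..z'); // ".": all letters are trimmed
$withLiteral = trim('xa.bx', 'a.z');  // "xa.bx": "x" is not in the mask

// Explicit construction of the same range, usable with any mask parameter.
$letters = implode('', range('a', 'z')); // "abcdefghijklmnopqrstuvwxyz"
$prefix  = strspn('hello world', $letters); // 5: stops at the space

var_dump($withRange, $withLiteral, $letters, $prefix);
```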

> 2. Are there any native functions that I did not mention in my Stack
> Overflow answer?

It is impossible to list all "native" functions, at least if you mean
internal functions, because these may be defined by extensions.  And
these extensions would need to explicitly implement support for such
character ranges.

> 3. Is it true that only single-byte characters can be used in all
> scenarios? If so, must it remain that way?

I think it needs to remain that way, since the functions already
accepting character ranges actually work on byte strings.
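
A minimal sketch of what "work on byte strings" means in practice, assuming
UTF-8 input. The characters are written as byte escapes so the example does
not depend on the file's encoding.

```php
<?php
$aGrave = "\xC3\xA0"; // UTF-8 encoding of "à" (two bytes)
$aTilde = "\xC3\xA3"; // UTF-8 encoding of "ã" (two bytes)

// The mask is the byte set {0xC3, 0xA3}. trim() strips the leading 0xC3 of
// "à" because that byte is in the mask, then stops at 0xA0.
$result = trim($aGrave . 'b', $aTilde);

var_dump(strlen($aGrave));   // 2
var_dump(bin2hex($result));  // "a062": a dangling 0xA0 followed by "b"
```

The result is no longer valid UTF-8, which shows why multibyte characters
cannot simply be dropped into a mask, with or without ranges.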

> 4. Is there already an official or widely-used term that I should be using
> for the two-dot operator?

I'd call them character ranges; the implementation is called
php_charmask().

> I should also mention that I initially considered requesting that all
> character mask parameters be named $mask (instead of $separators, $token,
> or $characters), but I later resigned to the fact that changing to a name
> that describes the texture of the string would remove the more
> vital/intuitive purpose of the string.  I suppose the best that can be done
> to inform developers is to explicitly mention in the documentation when
> character range expressions are implemented and demonstrate their usage in
> an example (not just as a user comment at the bottom; this isn't In-N-Out
> Burger -- put your offerings on the frickin' menu!).

I agree that the documentation needs to be improved.  While trim()
mentions the character range support in one sentence, addcslashes()
dedicates several paragraphs of detailed explanation.

--
Christoph M. Becker

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] Re: Character range syntax ".." for character masks

2022-07-09 Thread Rowan Tommins
On 9 July 2022 05:02:21 BST, mickmackusa wrote:
>Thanks for your reply, Kirill, but I am in no way trying to introduce a
>new, general-use operator for all encountered strings.
>
>I am purely focused on having the operator consistently implemented for all
>character masks.


I think the confusion here comes from your use of the word "operator" - in a 
technical sense, this is not an operator in the language, which takes two 
values or expressions and produces a new value. Rather, it's a convention used 
inside certain functions, to interpret a string argument in a special way. I 
suppose you could argue that the result is a very simple embedded language, 
like regular expressions, and then '..' would be an operator in that embedded 
language; but it's probably not how most people would describe it.

As for proposing to add it in more places, it would be good to have a clear 
expression of why having this facility in those functions would be useful. 
Every extra feature adds complexity, and is a potential source of bugs both in 
its implementation and in code that users write which touches it. A proposal 
needs to make a clear case of the gains that outweigh those costs.

Regards,

-- 
Rowan Tommins
[IMSoP]




[PHP-DEV] Re: Character range syntax ".." for character masks

2022-07-08 Thread mickmackusa
On Saturday, July 9, 2022, Kirill Nesmeyanov wrote:

>
> Note that the "..." operator is unary, so there is no syntax conflict when
> using two floats:
> ```
> echo 0...1; // 00.1
> ```
>
> However, in the case of the ".." operator, it is assumed to be a binary
> operator, so problems with grammar ambiguity may arise:
> ```
> echo 0 ..1; // 00.1
> echo 0.. 1; // 01
> ```
>
> *  Note: The syntax you suggest is widely used in at least Ruby (
> https://ruby-doc.org/core-2.5.1/Range.html ) and CoffeeScript.
> *  Note: There are also the `trim`, `ltrim` and `rtrim` functions.
>
> >Saturday, 9 July 2022, 2:56 +03:00 from mickmackusa:
> >
> >I've discovered that several native string functions offer a character
> >mask as a parameter.
> >
> >I've laid out my observations at
> >https://stackoverflow.com/q/72865138/2943403
> >
> >In a nutshell, not all character masks offer ranges via "double dot"
> >syntax. Or should I refer to ".." as the "string spread operator" to avoid
> >naming conflict with "..." -- the better known "spread operator" (array
> >spread operator)?
> >
> >Rowan/@IMSoP informed me that the current division between the haves and
> >the have-nots appears to be based on the source language from which PHP
> >pulled. Essentially, if from C, the double dot does not represent a range.
> >https://chat.stackoverflow.com/transcript/11?m=54864842#54864842
> >
> >Character ranges are not yet supported for:
> >- strcspn()
> >- strpbrk()
> >- strspn()
> >
> >Before I fire off an RFC, I would like to know:
> >
> >1. Are there any reasonable objections to consistently implementing
> >character range expressions for all character masks?
> >2. Are there any native functions that I did not mention in my Stack
> >Overflow answer?
> >3. Is it true that only single-byte characters can be used in all
> >scenarios? If so, must it remain that way?
> >4. Is there already an official or widely-used term that I should be using
> >for the two-dot operator?
> >
> >I should also mention that I initially considered requesting that all
> >character mask parameters be named $mask (instead of $separators, $token,
> >or $characters), but I later resigned to the fact that changing to a name
> >that describes the texture of the string would remove the more
> >vital/intuitive purpose of the string. I suppose the best that can be done
> >to inform developers is to explicitly mention in the documentation when
> >character range expressions are implemented and demonstrate their usage in
> >an example (not just as a user comment at the bottom; this isn't In-N-Out
> >Burger -- put your offerings on the frickin' menu!).
> >
> >mickmackusa
>
>
> --
> Kirill Nesmeyanov
>


Thanks for your reply, Kirill, but I am in no way trying to introduce a
new, general-use operator for all encountered strings.

I am purely focused on having the operator consistently implemented for all
character masks.

The language construct `echo` does not have a specified character mask
parameter.

mickmackusa