On 04/03/2021 10:54, Nikita Popov wrote:
> The main one that comes to mind is something like '0' == '0.0'. However,
> the real problem is something else: Comparison behavior doesn't affect just
> == and !=, but also < and >. And I can see how people would want '2' < '10'
> to be true (numeric comparison) rather than false (lexicographical
> comparison).


That's a very good point, and I think the existence of the <=> operator makes this even more complicated.


Considering your two options:

> 1. Decouple equality comparison from relational comparison. Don't handle
> numeric strings for == and !=, but do handle them for <, >, etc.


What would then be the result of '0' <=> '0.0'? Would the operator need to special-case the fact that they are numerically equal but lexicographically unequal?
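To make the problem concrete, here is a rough Python model of option 1. It is purely illustrative: the helper names (eq, lt, gt, spaceship) and the simplified numeric-string test are assumptions of this sketch, not PHP's actual implementation. Equality is plain string comparison, while ordering is numeric when both operands look numeric. For '0' and '0.0', none of <, == or > holds, so a three-way operator has no consistent answer.

```python
import re
from decimal import Decimal

# Simplified stand-in for "is a numeric string" (an assumption for this sketch).
NUMERIC = re.compile(r"-?\d+(\.\d+)?", re.ASCII)

def _is_numeric(s: str) -> bool:
    return NUMERIC.fullmatch(s) is not None

def eq(a: str, b: str) -> bool:
    """Option 1 equality: plain string comparison, no numeric handling."""
    return a == b

def lt(a: str, b: str) -> bool:
    """Option 1 ordering: numeric if both are numeric strings, else lexicographic."""
    if _is_numeric(a) and _is_numeric(b):
        return Decimal(a) < Decimal(b)
    return a < b

def gt(a: str, b: str) -> bool:
    return lt(b, a)

def spaceship(a: str, b: str) -> int:
    """A naive <=> assembled from the three operators above."""
    if lt(a, b):
        return -1
    if eq(a, b):
        return 0
    if gt(a, b):
        return 1
    # Reached when the operands are numerically equal but not string-equal.
    raise ValueError(f"{a!r} and {b!r} are neither <, ==, nor >")

print(lt("2", "10"))                                     # numeric ordering
print(eq("0", "0.0"), lt("0", "0.0"), gt("0", "0.0"))    # all three are false
```

A real <=> would have to choose one of the two orderings here, and whichever it chooses will disagree with either == or </>.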


> 2. Don't allow relational comparison on strings. If you want to compare
> them lexicographically, use strcmp(), otherwise cast to number first.


This is easy to *implement* for the <=> operator, but makes it much less useful. Part of the appeal of the operator is that you can write code like $sortCallback = fn($a,$b) => $a[$sortField] <=> $b[$sortField]; without needing different cases for different data types.

Granted, that's not going to use an appropriate sorting collation for many languages, but nor is strcmp().


I think further narrowing the definition of "numeric string" is a more useful course. If we were designing from scratch, the straightforward definition would be:

- all digits: /^\d+$/
- or, all digits with leading hyphen-minus: /^-\d+$/
- or, at least one digit, a dot, and at least one more digit: /^\d+\.\d+$/
- or, as above, but with leading hyphen-minus: /^-\d+\.\d+$/
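The four patterns above can be checked against a handful of candidates with a short Python sketch. The function name is hypothetical. The sketch uses re.fullmatch rather than ^...$, because in Python (as in PCRE by default) `$` also matches just before a trailing newline, so "12\n" would otherwise count as "all digits". It also passes re.ASCII so that \d means only 0-9.

```python
import re

# The four "from scratch" patterns, written for re.fullmatch (so no ^/$ needed).
# re.ASCII restricts \d to 0-9; by default Python's \d matches other Unicode digits too.
STRICT_PATTERNS = [
    re.compile(r"\d+", re.ASCII),        # all digits
    re.compile(r"-\d+", re.ASCII),       # leading hyphen-minus, then digits
    re.compile(r"\d+\.\d+", re.ASCII),   # digits, a dot, digits
    re.compile(r"-\d+\.\d+", re.ASCII),  # as above, with leading hyphen-minus
]

def is_strict_numeric(s: str) -> bool:
    """Hypothetical check: True if s matches any of the four patterns in full."""
    return any(p.fullmatch(s) for p in STRICT_PATTERNS)

for s in ["42", "-7", "3.14", "-0.5", " 1", "1e3", "0x1A", ".5", "1.", "12\n"]:
    print(repr(s), is_strict_numeric(s))
```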

I think anything beyond that list needs to be carefully justified.

- Leading and trailing spaces are probably OK. Other whitespace (newlines, tabs, etc.) probably not.
- Alternative notations like hexadecimal and exponentials make false-positive matches easy, and how common are they in practice?
- Leading and trailing dots (".5", "1.") might be used sometimes, but I'd probably lean against allowing them.

So, ignoring BC concerns, I would be happy with "numeric string" defined as "maybe space, maybe hyphen, some digits, maybe a dot and more digits, maybe space", which I think in regex form looks like /^ *-?\d+(\.\d+)? *$/
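Here is a rough Python model of how string comparison might behave under that definition. The names are hypothetical. The model assumes that two strings compare numerically only when both match the proposed regex, and lexicographically (by code point) otherwise. Under this rule, '0' and '0.0' are still equal and '2' sorts before '10', while something like '1e3' is no longer treated as a number.

```python
import re
from decimal import Decimal

# The proposed definition: /^ *-?\d+(\.\d+)? *$/
# fullmatch + re.ASCII avoid Python's "$ before trailing newline" and Unicode-\d quirks.
NUMERIC_STRING = re.compile(r" *-?\d+(\.\d+)? *", re.ASCII)

def is_numeric_string(s: str) -> bool:
    return NUMERIC_STRING.fullmatch(s) is not None

def compare(a: str, b: str) -> int:
    """Hypothetical <=> for two strings: numeric if both qualify, else lexicographic."""
    if is_numeric_string(a) and is_numeric_string(b):
        # Strip the permitted spaces, then compare exact decimal values.
        x, y = Decimal(a.strip(" ")), Decimal(b.strip(" "))
        return (x > y) - (x < y)
    return (a > b) - (a < b)

print(compare("0", "0.0"), compare("2", "10"), compare("1e3", "1000"))
```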


Regards,

--
Rowan Tommins
[IMSoP]

--
PHP Internals - PHP Runtime Development Mailing List
