On 30 Jul 2014, at 07:50, Tjerk Meesters <[email protected]> wrote:
>> That would make sense, but doesn't solve all edge cases as your maximum array
>> index is still more than 2 times the largest positive integer on 32-bit.
>
> Is that by design, a bug or something else entirely? Could you explain this
> edge case with some code?
On a 32-bit platform, the maximum signed long is 0x7FFFFFFF, but the maximum
unsigned long is 0xFFFFFFFF, slightly more than twice as big.
For example, this does what you’d expect on my machine (OS X 64-bit Intel Core
i5):
andreas-air:~ ajf$ php -r '$x = [0xFFFFFFFF => 1]; $x[] = 2; var_dump($x);'
array(2) {
  [4294967295]=>
  int(1)
  [4294967296]=>
  int(2)
}
On my 32-bit Ubuntu VM (which I use precisely to test this kind of issue when
working on bigints), however, it wraps around:
ajf@andrea-VirtualBox:~$ php -r '$x = [0xFFFFFFFF => 1]; $x[] = 2; var_dump($x);'
array(2) {
  [-1]=>
  int(1)
  [0]=>
  int(2)
}
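As a rough model of the wrap-around shown in that 32-bit output, the C sketch below uses fixed-width types so it behaves the same on any host. It is not the engine's code; the helper names as_signed32 and next_free_key are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: how a 32-bit unsigned key looks once it is shown
 * as a signed 32-bit integer. Out-of-range unsigned-to-signed conversion
 * is implementation-defined in C; mainstream compilers wrap modulo 2^32,
 * which is the behaviour assumed here. */
static int32_t as_signed32(uint32_t key)
{
    return (int32_t)key;
}

/* Hypothetical helper: the key an append would pick after last_key.
 * Unsigned arithmetic wraps modulo 2^32, which is well defined. */
static uint32_t next_free_key(uint32_t last_key)
{
    return (uint32_t)(last_key + 1u);
}
```

Under those assumptions, as_signed32(0xFFFFFFFF) is -1 and next_free_key(0xFFFFFFFF) is 0, matching the [-1] and [0] keys in the output.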
I think we should probably use an unsigned long internally, but prevent
negative values.
> Forbidding negative indices is a bit harsh and imho quite unnecessary;
Actually, I missed the bit of your email suggesting treating them as strings
the first time I read it. I’d be fine with that.
> turning “out of range” indices into strings should work just fine afaict. Is
> there a reason why it shouldn’t?
Well… there is one issue. Basically, some array functions treat integer and
string keys completely differently: array_merge(), for example, renumbers
integer keys but preserves string keys.
> A compromise could be to allow string keys that would otherwise have
> converted into a negative integer, but disallow negative int/float explicitly.
It’d be a complete BC break, but we could make negative indices work like they
do in Python and grab the item at (length + index), i.e. in a list of 5, -1
returns the item at index 4, -2 the item at index 3, and so on. However,
because our arrays are weird semi-indexed semi-hashmap things, this probably
isn’t good, as it’d prevent you from using strings like “-1” as keys. Alas, I
can dream.
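For reference, the Python-style (length + index) rule can be sketched in C as follows. This is only an illustration of the arithmetic, not a proposal for the engine; resolve_index is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical helper: map a possibly negative index onto a position in
 * a list of `length` items. Returns 1 and stores the position in *out
 * when the index is in range, 0 otherwise. */
static int resolve_index(long index, size_t length, size_t *out)
{
    if (index >= 0) {
        if ((unsigned long)index >= length) {
            return 0;               /* past the end */
        }
        *out = (size_t)index;
        return 1;
    }

    /* Magnitude of the negative index, computed in unsigned arithmetic
     * so that LONG_MIN does not overflow. */
    unsigned long back = 0UL - (unsigned long)index;
    if (back > length) {
        return 0;                   /* before the start */
    }
    *out = length - (size_t)back;   /* length + index */
    return 1;
}
```

With a length of 5, an index of -1 resolves to position 4 and -2 to position 3, while -6 and 5 are rejected.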
To actually respond to your suggestion, I don’t like the idea of blocking -1
but allowing “-1”. In PHP, numeric strings, integers and floats are supposed to
be equivalent, and I’m already unhappy that large integer indexes and large
numeric string indexes work differently. Whatever we do, I’d like PHP 7’s
arrays to treat integer, float and numeric string indexes consistently.
Thinking about it a little more, if we use a long for indexes, we don’t even
need to make them strings. It would fit the principle of least astonishment IMO
if any valid PHP int is a valid index and won’t be a string. I was going to say
that negative indexes don’t work right internally, but then I realised they
could work fine for indexing into the buckets if we just cast them to unsigned
longs internally (hence getting the 2’s complement representation on modern
CPUs) for indexing and hashing, but only expose signed longs to the outside
world, including through the API.
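A minimal sketch of that cast, assuming a power-of-two table size and a simple mask to pick the bucket (both are assumptions made for illustration, and bucket_for is a hypothetical name):

```c
#include <stddef.h>

/* Hypothetical helper: choose a bucket for a signed key.
 * Assumes table_size is a power of two. Converting a signed long to
 * unsigned long is well defined in C (the value is reduced modulo
 * ULONG_MAX + 1), so negative keys map to large unsigned values with the
 * same bit pattern as their two's complement form. */
static size_t bucket_for(long key, size_t table_size)
{
    unsigned long h = (unsigned long)key;
    return (size_t)(h & (unsigned long)(table_size - 1));
}
```

With a table of 8 buckets, a key of -1 lands in bucket 7 and a key of -8 in bucket 0, so negative keys index the buckets without any special casing while the caller only ever sees the signed value.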
So in summary, I think we should use signed longs for indexes (or at least
whatever type PHP’s basic int is), and anything outside the range of that type
should be treated as a string. This would make numeric strings and ints
consistent, would solve all the weird overflow issues, and is the most
intuitive approach IMO.
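The rule in that summary could look roughly like the C sketch below. It is a simplification under stated assumptions: classify_key and key_kind are hypothetical names, and the check is looser than a real engine would want (for example, it accepts leading zeros as integers).

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

typedef enum { KEY_INT, KEY_STRING } key_kind;

/* Hypothetical helper: decide whether a string key becomes an integer
 * key. It does so only if the whole string is a decimal integer that
 * fits in a signed long; otherwise it stays a string. */
static key_kind classify_key(const char *s, long *out)
{
    /* Require a digit, or '-' followed by a digit, at the start so that
     * the leading whitespace and '+' accepted by strtol are rejected. */
    if (!(isdigit((unsigned char)s[0]) ||
          (s[0] == '-' && isdigit((unsigned char)s[1])))) {
        return KEY_STRING;
    }

    char *end;
    errno = 0;
    long value = strtol(s, &end, 10);

    /* Trailing characters or an out-of-range value: keep it a string. */
    if (*end != '\0' || errno == ERANGE) {
        return KEY_STRING;
    }

    *out = value;
    return KEY_INT;
}
```

Here "-1" and -1 end up as the same integer key, while a digit string too large for a signed long stays a string instead of wrapping.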
--
Andrea Faulds
http://ajf.me/
--
PHP Internals - PHP Runtime Development Mailing List