On Tue, Dec 1, 2015 at 3:48 PM, Nikita Popov <nikita....@gmail.com> wrote:

> On Tue, Dec 1, 2015 at 10:50 AM, Dmitry Stogov <dmi...@zend.com> wrote:
>
>> Hi Nikita,
>>
>> few notes:
>>
>> It looks like the patch messes the attempt of catching the problem (few
>> lines related to zend_hash_find_bucket changes) with optimization to
>> compensate collision check overhead (everything else). I think it's better
>> to separate these parts.
>>
>
> The addition of zend_hash_add_or_return() isn't an optimization, it is
> required to ensure that the check happens in all relevant cases. The code
> previously used a combination of zend_hash_find() and zend_hash_add_new().
> However the whole purpose of the zend_hash_add_new() function is to skip
> the part of the insertion operation where the collision count would be done.
>
> At this point I could either a) also count collisions in zend_hash_find(),
> which is imho not a good option as it's enough to count once and not at
> every lookup, or b) avoid the combination of zend_hash_find() and
> zend_hash_add_new() by introducing a new zend_hash_add_or_return() function
> (which does the same as the previous combination but also performs the
> collision count). Alternatively I would also simply make
> zend_hash_add_new() the same as zend_hash_add(), which is probably not
> wanted either.
>

I got it now.


>
>
>> Introduction of zend_hash_add_or_return() in 7.0.1 is going to make
>> forward incompatibility with 7.0.0. However, we may reserve it for internal
>> usage removing ZEND_API.
>>
>
> I'm fine with not marking it ZEND_API in 7.0.
>
>
>> I don't think PHP should prevent user from construction of arrays he
>> likes, because of internal problems (like in your example).
>>
>> >  $s = 1 << 16; $a = [];
>> >  for ($i = 0; count($a) < $s; $i += $s) { $a[$i] = 0; }
>>
>> It makes performance problem, but it doesn't mean we should kill it.
>> We have "max_execution_time" to catch them all.
>>
>
>> I think only big arrays coming from external sources should be checked.
>>
>
> There is no way for PHP to know whether or not an array is constructed
> from an external source. Yes, of course we can special case json_decode()
> and unserialize(), but this would not fix the vulnerability for any array
> that is created in PHP code. It's not exactly unusual to create arrays
> which are in some way based on user input.
>

If this is a real problem, we should better provide an API for "safe" array
construction from "unsafe" data.
Also, if such arrays are constructed from many elements, it may make sense
to insert all elements first and then rehash them.
This will make O(2*N) complexity, even if all of them have the same hash
value.


>
> Your solution is incomplete, anyway, because of crafting a single list
>> with 10000 collisions, attacker will able to use N lists with 1000
>> collisions.
>>
>
> One list with N*1000 collisions and N lists with 1000 collisions each have
> very different asymptotic complexity. The former is O(N^2), while the
> latter is O(N). The DOS attack is based on having O(N^2) complexity.
>

You are right of course about O(N) and O(N^2), but today affecting
data-locality may make more harm then list length.
The fact, that PHP7 is few times faster on your example, is caused by the
linear placement of the collisions list.
Once we start to work with randomized lists, in addition to complexity, we
will also suffer from stalls because of the cache misses.

Thanks. Dmitry.

Nikita
>

Reply via email to