Stanislav Malyshev wrote:
> Hi!
>
>> <?php
>> $m = memory_get_usage();
>> $a = explode(',', str_repeat(',', 100000));
>> print (memory_get_usage() - $m)/100000;
>
> Says 93.2482 for me. Should be even less since the string generated by
> str_repeat itself is also counted as overhead (without it it's
> 92.2474). Aren't you perchance using a debug build? A debug build gives
> 196 for me.

Yes, it was debug on 32-bit, but non-debug on 64-bit. So non-debug
memory usage on 64-bit is still 259 bytes per element. On 64-bit I am
using PHP 5.2.4-2ubuntu5.7wm1 from apt.wikimedia.org.

In another post:
>
> HashTable uses 40 bytes, zval is 16 bytes, Bucket is 36 bytes, which
> means if you use integer indexes, the overhead is 72 bytes per value
> including memory block headers and alignments. It might be too much
> for you, in which case I'd go towards making an extension that creates
> an object storing strings more efficiently and implementing either
> get/set handlers or ArrayAccess (or both). This of course would be
> most useful if you access only small part of strings in each
> function/method.
>

Fair enough, but we do have to support default installations. We do
already have a couple of optional extensions which reduce memory usage,
but they do more specific tasks than that.

> I do not see what could be removed from Bucket or zval without hurting
> the functionality.
>

Right, and that's why PHP's memory usage is so bad compared to other
languages. Its one-size-fits-all data structure has to store a lot of
data per element to support every possible use case. However, there is
room for optimisation. For instance, an array could start off as
something like a C++
std::vector. Then when someone inserts an item into it with a
non-integer key, it could be converted to a hashtable. This could
potentially give you a time saving as well, because conversion to a
hashtable could resize the destination hashtable in one step instead of
growing it O(log N) times.

Some other operations, like deleting items from the middle of the array
or adding items past the end (leaving gaps) would also have to trigger
conversion. The point would be to optimise the most common use cases for
integer-indexed arrays.
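
A minimal sketch in C of what I mean, not Zend Engine code. Every name
here (sk_array, sk_set, sk_get and so on) is hypothetical, a long stands
in for a zval, and only the "gap past the end" trigger is modelled;
string keys and deletion from the middle would take the same conversion
path. The point to notice is that the conversion sizes the hashtable
once for all existing elements.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical types; a long stands in for a zval. */
typedef struct { long key; long val; int used; } sk_slot;

typedef struct {
    int      is_hash; /* 0: packed vector, 1: hashtable */
    size_t   count;   /* live elements */
    size_t   cap;     /* packed: allocated elements; hash: table size (power of 2) */
    long    *packed;  /* packed mode: element i has key i */
    sk_slot *table;   /* hash mode: open addressing, linear probing */
} sk_array;

/* Find the slot holding `key`, or the empty slot where it belongs. */
static size_t sk_probe(const sk_slot *t, size_t cap, long key) {
    size_t i = ((size_t)key * 2654435761u) & (cap - 1);
    while (t[i].used && t[i].key != key)
        i = (i + 1) & (cap - 1);
    return i;
}

static void sk_hash_put(sk_array *a, long key, long val);

/* Allocate a table of `new_cap` slots and move every element into it. */
static void sk_rebuild(sk_array *a, size_t new_cap) {
    sk_array old = *a;
    a->table = calloc(new_cap, sizeof *a->table);
    assert(a->table != NULL);
    a->is_hash = 1; a->cap = new_cap; a->count = 0; a->packed = NULL;
    if (!old.is_hash) {
        for (size_t i = 0; i < old.count; i++)
            sk_hash_put(a, (long)i, old.packed[i]);
        free(old.packed);
    } else {
        for (size_t i = 0; i < old.cap; i++)
            if (old.table[i].used)
                sk_hash_put(a, old.table[i].key, old.table[i].val);
        free(old.table);
    }
}

/* Insert or overwrite in hash mode, keeping the load factor at most 1/2. */
static void sk_hash_put(sk_array *a, long key, long val) {
    if ((a->count + 1) * 2 > a->cap)
        sk_rebuild(a, a->cap * 2);
    size_t i = sk_probe(a->table, a->cap, key);
    if (!a->table[i].used) {
        a->table[i].used = 1; a->table[i].key = key; a->count++;
    }
    a->table[i].val = val;
}

void sk_set(sk_array *a, long key, long val) {
    if (!a->is_hash) {
        if (key >= 0 && (size_t)key <= a->count) { /* overwrite or append */
            if ((size_t)key == a->cap) {           /* vector is full: double it */
                size_t n = a->cap ? a->cap * 2 : 8;
                long *p = realloc(a->packed, n * sizeof *p);
                assert(p != NULL);
                a->packed = p; a->cap = n;
            }
            a->packed[key] = val;
            if ((size_t)key == a->count) a->count++;
            return;
        }
        /* Key would leave a gap: convert once, sized for all elements. */
        size_t n = 8;
        while (n < (a->count + 1) * 2) n *= 2;
        sk_rebuild(a, n);
    }
    sk_hash_put(a, key, val);
}

/* Returns 1 and stores the value in *out if `key` exists, else 0. */
int sk_get(const sk_array *a, long key, long *out) {
    if (!a->is_hash) {
        if (key < 0 || (size_t)key >= a->count) return 0;
        *out = a->packed[key];
        return 1;
    }
    size_t i = sk_probe(a->table, a->cap, key);
    if (!a->table[i].used) return 0;
    *out = a->table[i].val;
    return 1;
}
```

Starting from a zeroed sk_array, appending keys 0..N-1 keeps it in
packed mode at sizeof(long) per element; the first sk_set() with a key
beyond the end flips is_hash and pays the hashtable cost only then.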

>> not much less. A simple object (due to being based on the same
>> inefficient data structure) may use a kilobyte or two.
>
> Kilobyte looks like too much for a single simple object (unless we
> have different notions of simple). Could you describe what exactly
> makes up the kilobyte - what's in the object?

<?php
class C {
    var $v1, $v2, $v3, $v4, $v5, $v6, $v7, $v8, $v9, $v10;
}

$m = memory_get_usage();
$a = array();
for ( $i = 0; $i < 10000; $i++ ) {
    $a[] = new C;
}
print ((memory_get_usage() - $m) / 10000) . "\n";
?>

1927 bytes per object (I'll use 64-bit from now on since it gives the
most shocking numbers)

>
>> * Objects that can optionally pack themselves into a class-dependent
>> structure and unpack on demand
>
> Objects can do pretty much anything in Zend Engine now, provided you
> do some C :) For the engine, object is basically a pointer and an
> integer, the rest is changeable. Of course, on PHP level we need to
> have more, but that's because certain things are just not doable on PHP
> level. Do you have some specific use case that would allow to reduce

Basically I'm thinking along the same lines as the array optimisation I
suggested above. For my class C in the test above, the zend_class_entry
would have a hashtable like:

v1 => 0, v2 => 1, v3 => 2, v4 => 3, v5 => 4, v6 => 5, v7 => 6, v8 =>7,
v9 => 8, v10 => 9

Then the object could be stored as a zval[10]. Object member access
would be implemented by looking up the member name in the class entry
hashtable and then using the resulting index into the zval[10]. When the
object is unpacked (say if the user creates or deletes object members at
runtime), then the object value becomes a hashtable.
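
Roughly, in C, and again only as a sketch with hypothetical names
(sk_class, sk_object, sk_read, sk_write are not engine API). A linear
scan over the declared names stands in for the class entry's hashtable,
a long stands in for a zval, and the unpack-to-hashtable step is left
out: the functions just report that it would be needed.

```c
#include <assert.h>
#include <string.h>

#define SK_MAX_PROPS 16

/* Hypothetical stand-in for the per-class "name => slot index" table.
 * A real class entry would use a hashtable; a linear scan is used here
 * to keep the sketch short. */
typedef struct {
    const char *names[SK_MAX_PROPS];
    int         nprops;
} sk_class;

/* Packed object: one value per declared property and a pointer to the
 * class, with no per-object hashtable. A long stands in for a zval. */
typedef struct {
    const sk_class *ce;
    long            slots[SK_MAX_PROPS];
} sk_object;

/* Returns the slot index for `name`, or -1 if it is not declared. */
int sk_slot_of(const sk_class *ce, const char *name) {
    assert(ce->nprops <= SK_MAX_PROPS);
    for (int i = 0; i < ce->nprops; i++)
        if (strcmp(ce->names[i], name) == 0)
            return i;
    return -1;
}

/* Returns 1 on success, 0 if the member is not declared. */
int sk_read(const sk_object *o, const char *name, long *out) {
    int i = sk_slot_of(o->ce, name);
    if (i < 0) return 0;
    *out = o->slots[i];
    return 1;
}

/* Returns 1 on success. Returns 0 for an undeclared member, which is
 * the point where a real implementation would unpack the object into a
 * hashtable before storing the new member. */
int sk_write(sk_object *o, const char *name, long val) {
    int i = sk_slot_of(o->ce, name);
    if (i < 0) return 0;
    o->slots[i] = val;
    return 1;
}
```

The name table is stored once per class, so each additional object
costs only its slots plus one pointer.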

>
>> * Exposing strongly-typed list and vector data structures to the user,
>> that don't have massive hashtable overheads
>> * An oparray format with fewer 64-bit pointers and more smallish integers
>
> Ah, you're on 64-bit... That explains why your memory requirements are
> larger :) But I'm not sure how the data the op array needs can be stored
> without using pointers. 

Making oplines use a variable amount of memory (like they do in machine
code) would be a great help.

For declarations, you could pack structures like zend_class_entry and
zend_function_entry onto the end of the opline, and access them by
casting the opline to the appropriate opcode-specific type. That would
save pointers and also allocator overhead.
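
To illustrate, a sketch in C under my own assumptions: the struct
layouts, opcode numbers and function names below are all hypothetical
and are not the real zend_op. Each opline starts with a common header
that records its own size, so the stream can be walked without knowing
every opcode, and a declaration opline carries its data inline.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* All names are hypothetical; this is not the Zend Engine opline layout. */
enum { SK_OP_ADD = 1, SK_OP_DECLARE_CLASS = 2 };

/* Common header that starts every opline. `size` is the opline's total
 * length in bytes, so a reader can step over any opcode. */
typedef struct { uint16_t opcode; uint16_t size; } sk_op;

/* A small opline: three 16-bit operand slots instead of pointers. */
typedef struct { sk_op h; uint16_t result, op1, op2; } sk_op_add;

/* A declaration opline with its data packed inline after the header. */
typedef struct { sk_op h; uint16_t nprops; char name[32]; } sk_op_declare_class;

typedef struct { unsigned char *buf; size_t len, cap; } sk_oparray;

sk_oparray sk_oparray_new(size_t cap) {
    sk_oparray oa = { malloc(cap), 0, cap };
    assert(oa.buf != NULL);
    return oa;
}

/* Reserve `size` zeroed bytes at the end and fill in the header. */
static sk_op *sk_emit(sk_oparray *oa, uint16_t opcode, size_t size) {
    /* Sizes stay multiples of 2 so every opline remains aligned. */
    assert(size % sizeof(uint16_t) == 0 && size <= UINT16_MAX);
    assert(oa->len + size <= oa->cap);
    sk_op *op = (sk_op *)(oa->buf + oa->len);
    memset(op, 0, size);
    op->opcode = opcode;
    op->size = (uint16_t)size;
    oa->len += size;
    return op;
}

void sk_emit_add(sk_oparray *oa, uint16_t result, uint16_t op1, uint16_t op2) {
    sk_op_add *op = (sk_op_add *)sk_emit(oa, SK_OP_ADD, sizeof *op);
    op->result = result; op->op1 = op1; op->op2 = op2;
}

void sk_emit_declare_class(sk_oparray *oa, const char *name, uint16_t nprops) {
    assert(strlen(name) < sizeof ((sk_op_declare_class *)0)->name);
    sk_op_declare_class *op =
        (sk_op_declare_class *)sk_emit(oa, SK_OP_DECLARE_CLASS, sizeof *op);
    strcpy(op->name, name);
    op->nprops = nprops;
}

/* Walk the variable-length stream and return the first class
 * declaration, cast to its opcode-specific type. */
const sk_op_declare_class *sk_find_class(const sk_oparray *oa) {
    for (size_t off = 0; off < oa->len; ) {
        const sk_op *op = (const sk_op *)(oa->buf + off);
        if (op->opcode == SK_OP_DECLARE_CLASS)
            return (const sk_op_declare_class *)op;
        off += op->size;
    }
    return NULL;
}
```

With this layout a short opline does not pay for the largest one, and
the declaration data needs neither a pointer nor a separate allocation.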

At the more extreme end of the spectrum, the compiler could produce a
pointerless oparray, like JVM bytecode. Then when a function is executed
for the first time, the oparray could be expanded, with pointers added,
and the result cached. This would reduce memory usage for code which is
never executed. And it would have the added advantage of making APC
easier to implement, since it could just copy the whole unexpanded
oparray with memcpy().
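
A sketch of the lazy expansion in C, with the same caveats as before:
sk_compact_op, sk_expanded_op, sk_function and both functions are
hypothetical, and a string pool addressed by offset is just one assumed
way of avoiding pointers in the compact form.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types; not the real zend_op or zend_op_array. */

/* Compact form: the operand is an offset into a string pool, so the
 * array contains no pointers and can be copied byte for byte. */
typedef struct { uint16_t opcode; uint16_t lit_offset; } sk_compact_op;

/* Expanded form used by the executor: offsets resolved to pointers. */
typedef struct { uint16_t opcode; const char *lit; } sk_expanded_op;

typedef struct {
    const sk_compact_op *code;       /* pointerless oparray */
    size_t               nops;
    const char          *pool;       /* NUL-separated literal strings */
    sk_expanded_op      *expanded;   /* NULL until first execution */
    int                  expansions; /* how many times we expanded */
} sk_function;

/* Expand on first use and cache the result; later calls reuse it. */
const sk_expanded_op *sk_expand(sk_function *fn) {
    if (fn->expanded == NULL) {
        assert(fn->nops > 0);
        fn->expanded = malloc(fn->nops * sizeof *fn->expanded);
        assert(fn->expanded != NULL);
        for (size_t i = 0; i < fn->nops; i++) {
            fn->expanded[i].opcode = fn->code[i].opcode;
            fn->expanded[i].lit = fn->pool + fn->code[i].lit_offset;
        }
        fn->expansions++;
    }
    return fn->expanded;
}

/* What an opcode cache could do with the compact form: one memcpy,
 * with no pointers to fix up afterwards. */
void sk_copy_compact(sk_compact_op *dst, const sk_function *fn) {
    memcpy(dst, fn->code, fn->nops * sizeof *dst);
}
```

A function that is never called keeps expanded == NULL and costs only
its compact ops.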

-- Tim Starling

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
