> Am 30.06.2015 um 04:37 schrieb François Laupretre <franc...@php.net>:
> 
> Hi,
> 
>> -----Message d'origine-----
>> De : Bob Weinand [mailto:bobw...@hotmail.com]
>> 
>> Hey, looks like I'm a bit late to the party, but I'd like to object against 
>> the
>> accessor macros on zend_string.
>> 
>> What does it help to replace very nice ->val, ->len etc. with macros?
> 
> If you find it 'very nice', probably nothing. I suggest you stop using macros 
> everywhere. After all, why did we define Z_STRVAL() instead of writing 
> something 'very nice' like 'zv.value.str->val' ? Just one reason : if we had 
> written 'zv.value.val' everywhere, zend_strings couldn't exist today.
> 
> In a few words, we are improving encapsulation : publishing an API instead of 
> a C structure.

That's what I said in the other mail (I sent about 4 mins after you sent 
yours)… It makes sense for hiding an union, but that's rather out of necessity 
because it's the best of two bads. (Anon union would be preferable)

>> ZSTR_VAL(str)[ZSTR_LEN(str)] = 0; or str->val[str->len] = 0;
>> zend_string_set_len(str, ZSTR_LEN(str) - 2); or str->len -= 2;
>> Also, that particular function suggests to me that the string is expanded if
>> lengths don't match.
> 
> I agree it would be quite natural and I'd like to implement it in a future 
> version. We just need to store the allocated size in the struct, which also 
> allows to over-allocate memory in advance. Anyway, performance is critical 
> there, so the code must remain simple and fast.

The point rather is that, while it's natural, you probably don't want that in 
most cases.

>> I much prefer direct accessors which indicate a) no side-effects happening
>> here, b) less functions to remember (you just look at the struct and know
>> everything about it) and c) end up being more readable.
> 
> a) You deal with an API, not a structure. So, you call methods instead of 
> accessing variables. The user does not have to care about side effects of the 
> methods he's using. Don't think about C macros, think about methods.
> b) Getting documentation from the code is something we should forget. We 
> still need to do it because PHP lacks a usable core API documentation, 
> especially for PHP 7. So, getting API usage from the code is a workaround, 
> not the appropriate way to publish an API. The same with some 
> recently-published articles about new PHP 7 structures. These articles should 
> *never* include the underlying structures. Then, we complain about people 
> bypassing the 'official' APIs but, if they use such documentation, how can 
> they know what they should access or not ? IMO, the rules are 'never publish 
> structure details' and 'never allow direct access to structure elements'.
> c) Using ZSTR_VAL/ZSTR_LEN macros can also be considered as more readable. I 
> personally consider it as more readable because other unrelated structures 
> may also define 'len' and 'val' elements.

a) the structure is the API. Assigning or reading from/to a structure is like 
fetching the location to/from where to write.
b) You'll never reach that goal. IMHO the goal is rather simplifying 
everything, yes, even direct structure usage, if it's making things *easy to 
use with a small API*. There are cases where the structure needs to be opaque, 
because all you ever want to do on that API is higher level. E.g. HashTable. 
Every manipulation on a HashTable needs many other things (adapting size, 
managing buckets etc.) changed. It's really *simplifying* our lives here. The 
ZSTR_VAL/LEN/H() macros? no idea where that simplifies anything.
c) Sure, but the variable name will usually tell you. like "zend_string 
*function_name;". Does it there make it more readable to have 
"ZSTR_VAL(function_name)" than "function_name->val"

Also, note, when debugging, you ultimately end up always using the structures 
directly; like when lldb sees a str[1], I just see one char (when printing a 
zend_string *) … So having to access it via "p str->val" as char[1] decays to 
char * upon direct access.
You end up needing knowledge of the structures. Using an API which is only a 
direct access alias doesn't make things easier here and is potentially only 
confusing.

>> Yes, I'm totally aware that when we ever want to change the API (I don't see
>> coming it in near future anyway), it could help having wrappers. But
>> wrappers don't help against every API change… let's say for some reason
>> we'd want to go back to the old char*/int; would wrappers help? no. (Yes,
>> that's maybe a bad example, but just trying to show that, while it has maybe
>> a theoretical gain, I don't see a real practical benefit.)
> 
> The fact that you don't imagine the API changing in the near future is not an 
> argument. First, wrappers don't protect against API changes, they allow to 
> maintain the same API when the backend implementation is modified. I see at 
> least two reasons that could bring changes in the zend_string API, and it 
> could be quite soon. The first one is unifying zend_string with smart_str, 
> including in zend_string the smart_str allocation scheme. I don't know 
> exactly the form it would take but reading and/or writing the string length 
> could involve more than just writing a value in a structure element. The 
> other subject that could require implementation changes is Unicode. It failed 
> for a number of reasons in PHP6, but it doesn't mean we'll avoid the question 
> forever. If we want to include Unicode at a low level, we may have to work at 
> the zend_string level. Once again, in this case, reading and writing the 
> string length will probably involve different internal operations.

If you do anything unicode related, we'd have to also change the direct API, I 
think. Like we need setters then specify the exact encoding or similar. 
Whenever we want to do such low-level changes, we'll need to change the API 
too. It won't work transparently.
At that point, where it gets worth it, you can add proper functions for that. 
But the macros don't really help here; whether we need to change the direct 
access or the macros doesn't really matter.

Also, smart_str API could be even transparently injected… we just could assume 
that alloc_size == (len + SMART_STR_PAGE) & ~(SMART_STR_PAGE  - 1) or similar 
techniques. Using always-needed, rarely-used extra bytes for such a low-level 
structure like a string is IMO anyway a bad idea, if it's just smart_str.

> So, the question is not to blindly apply the theory and say 'tight coupling 
> is evil'. There are objective reasons to encapsulate APIs. Zend_string is 
> just the first one. It was choosen because no API existed to access the 'val' 
> and 'len' elements, and because it was relatively easy to fix, allowing for 
> an integration in 7.0. But the next one is HashTable, which is *much* more 
> complex. If you look at the implementation and usage of HashTables in the 
> code, you'll see how a combination of growing complexity and tight coupling 
> can make code unmanageable. Actually, we are at a point where changing the 
> HashTable internal implementation would be extremely heavy and complex. If we 
> don't fix it now, it can only become worse over time.

The API to access (size_t)((char *) str + 16) is basically str->len. Why do we 
need to encapsulate that API in another API?

And btw. I'm not arguing over HashTables. But that's a completely different 
beast by it's complexity...

>> zend_string is really a core structure and extremely simple. Please don't
>> trade simplicity for a small other benefit.
>> 
>> Also, we still leave zend_string struct exposed (presumably for allowing
>> inlining). At the point where we are now, many extension authors probably
>> already are also accessing zend_string directly. We're going to have a 
>> colorful
>> mix of code using the APIs and not using it, which is the worst we can get
>> now.
> 
> Providing an API when it was missing allows to start 'cleaning' the existing 
> code. It will take time and, during this time, we'll have a mix of code, yes. 
> As long as the code is not 100 % clean, we cannot modify the API, yes. But 
> the most important, IMO, is to know where we are going to. What alternative 
> do we have ? Provide no encapsulation at all and get blocked when we *need* 
> to change implementation in 1 or 2 years ? It will be too late.

As already said, when we need to change the implementation, we are *very* 
likely to also need to change that encapsulating API.

>> To end up, I very much like unifying prefixes (ZSTR_*), but I strongly 
>> dislike
>> hurting readability and consistency so much for such little benefit.
> 
> You will be satisfied. I just sent a new patch to Dmitry where the whole 
> zend_string API is prefixed with 'ZSTR_', providing a better consistency 
> (with compatibility macros for old names).
> 
> Regards
> 
> François

Thanks,
Bob
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to