Re: [PHP-DEV] Unicode chars allowed in numbers?
Hi Andrei,

One more related question: What about leading whitespace in numeric strings, as in zend_u_strtol()? Is u_isspace() needed, or are only the ASCII equivalents (0x20 and 9-13, i.e. \t, \n, \v, \f, \r) allowed?

Thanks again,
Matt

----- Original Message -----
From: Andrei Zmievski
Sent: Friday, November 10, 2006

>> Hi Andrei, et al.,
>>
>> I was just looking at README.UNICODE, regarding interpretation of numbers: "we restrict numbers to consist only of ASCII digits", and "Numeric strings are supposed to adhere to the same rules". Is it correct to take that to mean only UChars with values from '0'-'9' (0x30-0x39), plus the 'a'-'z' equivalents for bases > 10?
>
> Correct.
>
>> I ask because in zend_u_strtol(), HANDLE_U_NUMERIC() for array keys, etc., the u_digit() function is used, which also allows non-ASCII, higher-value digit characters, doesn't it? But then in is_numeric_unicode(), when checking for hex numbers, the ASCII values '0' and 'x' are used, which is what I'd expect after reading README.UNICODE.
>
> You're correct here again, u_digit() should not be used there.
>
> -Andrei

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
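To make the ASCII-only rule concrete, here is a minimal C sketch of a digit check that could stand in for u_digit() in places like zend_u_strtol(). It is not code from the PHP tree: the `uchar16` typedef (a stand-in for ICU's 16-bit UChar) and both function names are hypothetical, and overflow handling is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for ICU's UChar (one UTF-16 code unit). */
typedef uint16_t uchar16;

/* Value of an ASCII digit or letter in the given base, or -1 if the code
 * unit is not a valid digit for that base. Only U+0030..U+0039,
 * U+0041..U+005A and U+0061..U+007A are accepted. */
static int ascii_digit_value(uchar16 c, int base)
{
    int v;

    if (c >= '0' && c <= '9') {
        v = c - '0';
    } else if (c >= 'a' && c <= 'z') {
        v = c - 'a' + 10;
    } else if (c >= 'A' && c <= 'Z') {
        v = c - 'A' + 10;
    } else {
        return -1;  /* includes every non-ASCII code unit */
    }
    return v < base ? v : -1;
}

/* Parse a run of ASCII digits, stopping at the first non-digit.
 * *consumed receives the number of code units used. No overflow check. */
static long parse_ascii_digits(const uchar16 *s, int len, int base, int *consumed)
{
    long result = 0;
    int i = 0;
    int d;

    while (i < len && (d = ascii_digit_value(s[i], base)) >= 0) {
        result = result * base + d;
        i++;
    }
    *consumed = i;
    return result;
}
```

Unlike u_digit(), which also assigns values to non-ASCII decimal digits such as U+0660 ARABIC-INDIC DIGIT ZERO, this check returns -1 for anything outside 0-9, a-z and A-Z, so such characters simply end the number.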
[PHP-DEV] Question on thread safety
Hi All,

My first post on here, but I have come across a potential issue with the PHP code and, rather than just raise a defect, thought it better to solicit other people's views on the issue first.

I have been reviewing the PHP code recently in order to familiarize myself with how it all fits together. Lately I have been focusing on thread safety, and I have already raised a couple of defects on issues found in the code:

http://bugs.php.net/bug.php?id=39623
http://bugs.php.net/bug.php?id=39648

Other potential issues have also been identified and further defects may follow. However, this email relates to a question on the design of the TSRM.c code itself.

The code in ts_allocate_id(), which is used to allocate a new thread-safe resource id, is single-threaded by virtue of the mutex acquired on entry. When a new resource is allocated, the code allocates an instance of that resource for each active thread as follows:

    /* enlarge the arrays for the already active threads */
    for (i=0; i<tsrm_tls_table_size; i++) {
        tsrm_tls_entry *p = tsrm_tls_table[i];

        while (p) {
            if (p->count < id_count) {
                int j;

                p->storage = (void *) realloc(p->storage, sizeof(void *)*id_count);
                for (j=p->count; j<id_count; j++) {
                    p->storage[j] = (void *) malloc(resource_types_table[j].size);
                    if (resource_types_table[j].ctor) {
                        resource_types_table[j].ctor(p->storage[j], &p->storage);
                    }
                }
                p->count = id_count;
            }
            p = p->next;
        }
    }

The realloc() in the above code may acquire a new memory block, copy the contents from the original block and then free the original block (making it eligible for reallocation) before returning to the caller, which then saves the new block's address in the thread's tsrm_tls_entry.

Next, looking at ts_resource_ex(), which is called by a thread to get its thread-local storage for a particular resource, we see:

    if (!th_id) {
        /* Fast path for looking up the resources for the current
         * thread. Its used by just about every call to
         * ts_resource_ex(). This avoids the need for a mutex lock
         * and our hashtable lookup.
         */
        thread_resources = tsrm_tls_get();

        if (thread_resources) {
            TSRM_ERROR((TSRM_ERROR_LEVEL_INFO, "Fetching resource id %d for current thread %d", id, (long) thread_resources->thread_id));
            /* Read a specific resource from the thread's resources.
             * This is called outside of a mutex, so have to be aware about external
             * changes to the structure as we read it.
             */
            TSRM_SAFE_RETURN_RSRC(thread_resources->storage, id, thread_resources->count);
        }
        thread_id = tsrm_thread_id();
    } else {
        thread_id = *th_id;
    }

This is executed WITHOUT the mutex (I assume for performance reasons) and directly accesses the same storage field that ts_allocate_id() modifies. The comment suggests someone has thought about potential problems here, but I see no code here or in TSRM_SAFE_RETURN_RSRC that accounts for possible modifications to the address in storage.

My reading of the code as it currently stands is that there is a window, between realloc() freeing the original storage block and ts_allocate_id() saving the new block's address in the storage field, during which the address in storage is stale. The old memory could be reallocated and modified during this window. So a thread can access its tsrm_tls_entry and read the old address from storage, potentially picking up memory that has since been reallocated to another thread and modified. If it does so the results are unpredictable, but a segmentation violation is one of the most likely outcomes.

Further, on an architecture with a weakly ordered memory model, e.g. PPC, another thread may see a stale address even after the store into storage has executed, because the code contains no memory barrier instructions.
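One way to close both windows described above, sketched under stated assumptions and not taken from TSRM.c: never hand the old array to realloc()/free() while lock-free readers may still hold it, and publish the new pointer with C11 release/acquire atomics so that weakly ordered CPUs such as PPC also see fully initialized contents. `tls_entry_sketch`, `entry_init()`, `grow_storage()` and `read_resource()` are hypothetical names; writers are assumed to be serialized by the existing TSRM mutex, slots are 0-based, zeroed memory stands in for the resource ctor, and the retired array is handed back to the caller for later reclamation (e.g. at shutdown).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for tsrm_tls_entry: only the two
 * fields involved in the race are modelled. */
typedef struct {
    _Atomic(void **) storage;  /* per-thread array of resource pointers */
    atomic_int count;          /* number of valid slots in storage */
} tls_entry_sketch;

static void entry_init(tls_entry_sketch *p)
{
    atomic_init(&p->storage, NULL);
    atomic_init(&p->count, 0);
}

/* Writer: assumed to run with the TSRM mutex held, like ts_allocate_id().
 * Allocates a fresh array instead of calling realloc(), so the old block is
 * never freed underneath a lock-free reader. On success the old array is
 * returned through *retired for the caller to reclaim later. */
static int grow_storage(tls_entry_sketch *p, int new_count,
                        size_t rsrc_size, void ***retired)
{
    int old_count = atomic_load_explicit(&p->count, memory_order_relaxed);
    void **old = atomic_load_explicit(&p->storage, memory_order_relaxed);
    void **fresh;
    int j;

    *retired = NULL;
    if (new_count <= old_count) {
        return 0;
    }
    fresh = malloc(sizeof(void *) * (size_t)new_count);
    if (fresh == NULL) {
        return -1;
    }
    if (old_count > 0) {
        memcpy(fresh, old, sizeof(void *) * (size_t)old_count);
    }
    for (j = old_count; j < new_count; j++) {
        fresh[j] = calloc(1, rsrc_size);  /* zeroed in place of a ctor */
        if (fresh[j] == NULL) {
            while (j-- > old_count) {
                free(fresh[j]);
            }
            free(fresh);
            return -1;
        }
    }
    /* Release ordering: the copies and new slots written above become
     * visible to any reader whose acquire load sees the new pointer.
     * storage is published before count. */
    atomic_store_explicit(&p->storage, fresh, memory_order_release);
    atomic_store_explicit(&p->count, new_count, memory_order_release);
    *retired = old;
    return 0;
}

/* Reader: lock-free, like the ts_resource_ex() fast path. Loading count
 * first guarantees the storage pointer seen is at least that large. */
static void *read_resource(tls_entry_sketch *p, int idx)
{
    int count = atomic_load_explicit(&p->count, memory_order_acquire);
    void **storage = atomic_load_explicit(&p->storage, memory_order_acquire);

    if (idx < 0 || idx >= count) {
        return NULL;
    }
    return storage[idx];
}
```

Because the writer stores `storage` before `count` and the reader loads `count` before `storage`, a reader that observes the new count is guaranteed to see an array at least that large, and a reader that still holds the old array is safe because that block has not been freed.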
If all access to storage were protected by a mutex this would not be an issue, as mutex acquire/release provides the necessary memory synchronization; but because ts_resource_ex() accesses the memory outside of a mutex, there is no guarantee that another thread calling ts_resource_ex() will see the result of the store.

Now, having said all that, I do not believe that, given the current usage of ts_allocate_id(), this will cause a problem. A quick scan of the code shows that ts_allocate_id() is only called during PHP initialization and extension initialization (MINIT), when the code is effectively single-threaded anyway, so no thread will see a stale address in storage. However, I see nothing in the code that
Re: [PHP-DEV] Unicode chars allowed in numbers?
We should use whatever trim() uses, I think.

-Andrei

On Nov 29, 2006, at 5:58 AM, Matt Wilmas wrote:

> Hi Andrei,
>
> One more related question: What about leading whitespace in numeric strings, as in zend_u_strtol()? Is u_isspace() needed, or are only the ASCII equivalents (0x20 and 9-13, i.e. \t, \n, \v, \f, \r) allowed?
Re: [PHP-DEV] Unicode chars allowed in numbers?
Hello,

On 11/29/06, Andrei Zmievski [EMAIL PROTECTED] wrote:
> We should use whatever trim() uses, I think.

I think so too (more consistent).

--Pierre
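If leading whitespace is to be skipped using exactly the characters trim() strips, the check is small. The sketch below assumes the default character list documented for trim() on byte strings (space, \t, \n, \r, \0 and \x0B); the `uchar16` typedef and both function names are hypothetical. Note that this list includes \0 but not \f, so it differs slightly from the 0x20/9-13 set in Matt's question.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for ICU's UChar (one UTF-16 code unit). */
typedef uint16_t uchar16;

/* Assumption: mirror trim()'s default byte-string list:
 * " " (0x20), "\t" (0x09), "\n" (0x0A), "\r" (0x0D), "\0" (0x00),
 * "\x0B" (vertical tab). "\f" (0x0C) is deliberately not included. */
static int is_trim_space(uchar16 c)
{
    return c == 0x20 || c == 0x09 || c == 0x0A || c == 0x0D ||
           c == 0x00 || c == 0x0B;
}

/* Index of the first code unit that is not leading whitespace. */
static int skip_leading_space(const uchar16 *s, int len)
{
    int i = 0;

    while (i < len && is_trim_space(s[i])) {
        i++;
    }
    return i;
}
```

Non-ASCII spaces such as U+00A0 NO-BREAK SPACE are not skipped, which is the behavioural difference from a u_isspace()-based check.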
Re: [PHP-DEV] Question on thread safety
> So finally to my question. Is it the intention of TSRM.c to allow ts_allocate_id() to be called at any time, or is there an unwritten rule that it should only ever be called during PHP startup? If it's the former then I

I think it gets called only on startup. I also think that was the intent, though as far as I can see there is no safeguard against calling it at runtime; but no module does it, and it doesn't make sense to do it anywhere other than startup.

> I myself see no reason why extension writers should be restricted from calling ts_allocate_id() outside PHP startup, so I believe the code needs

Well, the reason is that if you want to use TSRM globals, you have to allocate an ID before you do basically anything with them. Startup is a good place for that. If you don't need globals, then you should not call it at all. A situation where, mid-run, you suddenly remember you need globals seems quite unrealistic to me.

Of course, if you can describe a scenario where you would really need it mid-run, or where it would make sense to allocate an ID mid-run, then I guess this should be fixed or at least safeguarded.

--
Stanislav Malyshev, Zend Products Engineer
[EMAIL PROTECTED] http://www.zend.com/
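If a safeguard were wanted, a minimal sketch could simply refuse new ids once startup is over. Everything here is hypothetical (`startup_done`, `mark_startup_done()` and `allocate_id_guarded()` do not exist in TSRM.c), and locking is omitted on the assumption that the check would sit inside the section ts_allocate_id() already protects with its mutex.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical state; nothing here exists in TSRM.c. In a real patch the
 * check would run while ts_allocate_id()'s mutex is held, so no extra
 * locking is shown. */
static int startup_done = 0;  /* set once module startup (MINIT) is over */
static int next_id = 0;       /* last id handed out */

/* Intended to be called once all modules have been started. */
static void mark_startup_done(void)
{
    startup_done = 1;
}

/* Hand out a new 1-based resource id, or 0 (refusal) after startup. */
static int allocate_id_guarded(void)
{
    if (startup_done) {
        fprintf(stderr, "resource ids can only be allocated during startup\n");
        return 0;
    }
    return ++next_id;
}
```

Returning 0 makes a mid-run call fail visibly instead of growing the per-thread arrays while other threads read them without a lock.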
[PHP-DEV] Re: Cross-extension resource
> I'm writing an extension in order to achieve better performance in a specific module of our application. Right now, I'm trying to use an already established MySQL connection (opened with mysql_connect) in our extension, so we don't have to connect to the database twice, since that connection is used in the PHP code. I don't need PHP's mysql API, since I can do everything in C; I just want the connection ID.
>
> I've read Sara's (great... GREAT) book, but I couldn't find any specific solution to this situation. Is this possible? Is there a best way?

Surprising how often this question comes up.

    /* True global to store local copy of mysql's le_link */
    static int local_mysql_le = -1;

    PHP_RINIT_FUNCTION(myext)
    {
        if (local_mysql_le == -1) {
            local_mysql_le = zend_fetch_list_dtor_id("mysql link");
        }
        return SUCCESS;
    }

A few notes about this solution:

(1) I'm doing this in RINIT because, prior to PHP 5.1, there's no way to enforce load order for shared extensions, so your extension may load before mysql, in which case the list ID won't yet have been registered at MINIT time. If you know that MySQL will always be loaded before your extension (either because of local policy or because you're only targeting 5.1 or later, which has module dependencies), then you can do it once in MINIT and be done with it.

(2) The name "mysql link" is case-sensitive and must match the resource type name you're looking for exactly. There's also no guard against the (unlikely) possibility that some other extension registers a resource named "mysql link" which isn't actually a MySQL link.

(3) You should guard against the possibility that no matching list ID will be found for that name (perhaps MySQL isn't loaded), so be sure to add some error checking. Hint: this function returns 0 on failure.

(4) You'll probably need to fetch "mysql link persistent" as well...

I think this function is covered in Appendix A, but the cat is on my lap and I can't reach the bookshelf from here.

-Sara
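Notes (2) to (4) can be illustrated without the PHP headers. The sketch below uses a hypothetical in-memory stand-in for the engine's table of resource-type names; in a real extension you would call zend_fetch_list_dtor_id() itself. As in the notes, lookup is exact and case-sensitive, 0 means "not found", and a link from either mysql_connect() or mysql_pconnect() is recognised by checking both ids. All names in the code are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the engine's resource-type table. Like
 * zend_fetch_list_dtor_id(), lookups are exact (case-sensitive) and
 * return 0 when no type of that name has been registered. */
#define MAX_TYPES 16
static const char *type_names[MAX_TYPES];
static int type_count = 0;

static int register_type(const char *name)
{
    if (type_count == MAX_TYPES) {
        return 0;
    }
    type_names[type_count++] = name;
    return type_count;  /* ids start at 1; 0 means failure */
}

static int fetch_dtor_id(const char *name)
{
    int i;

    for (i = 0; i < type_count; i++) {
        if (strcmp(type_names[i], name) == 0) {
            return i + 1;
        }
    }
    return 0;
}

/* Ids cached by the extension, as in the RINIT example (-1 = not looked up). */
static int le_link_id = -1;
static int le_plink_id = -1;

/* Look up both link types; returns 0 if neither was found (note 3). */
static int resolve_mysql_types(void)
{
    if (le_link_id == -1) {
        le_link_id = fetch_dtor_id("mysql link");
    }
    if (le_plink_id == -1) {
        le_plink_id = fetch_dtor_id("mysql link persistent");
    }
    return le_link_id != 0 || le_plink_id != 0;
}

/* A connection resource may carry either type (note 4). */
static int is_mysql_link_type(int type)
{
    return type != 0 && (type == le_link_id || type == le_plink_id);
}
```

Checking `type != 0` keeps an unresolved id (cached as 0) from matching anything by accident.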