Re: [PHP-DEV] Unicode chars allowed in numbers?

2006-11-29 Thread Matt Wilmas
Hi Andrei,

One more related question: What about for any leading whitespace with
numeric strings, like in zend_u_strtol()?  Is u_isspace() needed, or are
only the ASCII-equivalents (0x20, 9-13 [\t, \n, \v, \f, \r]) allowed?
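
For illustration, the ASCII-only alternative could look like the sketch below. UChar is modelled as a plain 16-bit unsigned type so the example compiles without ICU, and is_ascii_space() and skip_ascii_space() are hypothetical helper names, not functions in the engine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t UChar; /* assumption: stands in for ICU's UTF-16 code unit */

/* Nonzero only for U+0020 and U+0009..U+000D (\t, \n, \v, \f, \r). */
static int is_ascii_space(UChar c)
{
    return c == 0x20 || (c >= 0x09 && c <= 0x0D);
}

/* Returns the index of the first non-whitespace code unit in s[0..len). */
static size_t skip_ascii_space(const UChar *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL || len == 0);
    while (i < len && is_ascii_space(s[i])) {
        i++;
    }
    return i;
}
```

With this rule a character such as U+3000 (ideographic space), which Unicode classifies as a space separator, is not skipped, so a numeric string that starts with it would not be treated as having leading whitespace.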


Thanks again,
Matt


- Original Message -
From: Andrei Zmievski
Sent: Friday, November 10, 2006

  Hi Andrei, et al.,
 
  I was just looking at README.UNICODE, regarding interpretation of
  numbers:
  we restrict numbers to consist only of ASCII digits, and Numeric
  strings
  are supposed to adhere to the same rules.  Is it correct to take
  that to
  mean only UChar's with values from '0'-'9'/0x30-0x39 (and 'a'-'z'
  equivalents for bases > 10)?

 Correct.

  I ask because in zend_u_strtol(), HANDLE_U_NUMERIC() for array
  keys, etc.,
  the u_digit() function is used, which also allows non-ASCII, higher-
  value
  digit characters, doesn't it?  But then in is_numeric_unicode(), when
  checking for hex numbers, the ASCII values '0' and 'x' are used,
  which is
  what I'd expect after reading README.UNICODE.

 You're correct here again, u_digit() should not be used there.

 -Andrei
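
A minimal sketch of the ASCII-only digit rule discussed above, as an alternative to u_digit(). UChar is modelled as a 16-bit unsigned type so the example builds without ICU; ascii_digit_value() is a hypothetical name, and accepting upper-case letters as well as 'a'-'z' is an assumption that mirrors strtol().

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t UChar; /* assumption: stands in for ICU's UTF-16 code unit */

/* Returns the value of c as a digit in the given base (2..36),
 * or -1 if c is not an ASCII digit or letter valid for that base. */
static int ascii_digit_value(UChar c, int base)
{
    int v;

    assert(base >= 2 && base <= 36);
    if (c >= 0x30 && c <= 0x39) {          /* '0'..'9' */
        v = c - 0x30;
    } else if (c >= 0x61 && c <= 0x7A) {   /* 'a'..'z' */
        v = c - 0x61 + 10;
    } else if (c >= 0x41 && c <= 0x5A) {   /* 'A'..'Z' */
        v = c - 0x41 + 10;
    } else {
        return -1;                         /* includes all non-ASCII digits */
    }
    return v < base ? v : -1;
}
```

Under this rule U+0663 (ARABIC-INDIC DIGIT THREE) is rejected in every base, which is the behaviour README.UNICODE describes for numbers.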

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



[PHP-DEV] Question on thread safety

2006-11-29 Thread Andy Wharmby

Hi All,
   My first post on here, but I have come across a potential issue
with the PHP code and, rather than just raise a defect, thought it better
to solicit other people's views on the issue first.

I have been reviewing the PHP code recently in order to familiarize 
myself with how it all fits together. Lately I have been focusing on 
thread safety and I
have already raised a couple of defects on issues found in the code:  


   http://bugs.php.net/bug.php?id=39623
   http://bugs.php.net/bug.php?id=39648

Other potential issues have also been identified and further defects may 
follow.


However, this email relates to a question on the design of the TSRM.c
code itself. The code in ts_allocate_id(), which is used to allocate a new
thread-safe resource id, is single-threaded by virtue of the mutex acquired
on entry. When a new resource is allocated, the code allocates an instance
of that resource for each active thread as follows:

   /* enlarge the arrays for the already active threads */
   for (i=0; i<tsrm_tls_table_size; i++) {
       tsrm_tls_entry *p = tsrm_tls_table[i];

       while (p) {
           if (p->count < id_count) {
               int j;

               p->storage = (void *) realloc(p->storage, sizeof(void *)*id_count);
               for (j=p->count; j<id_count; j++) {
                   p->storage[j] = (void *) malloc(resource_types_table[j].size);
                   if (resource_types_table[j].ctor) {
                       resource_types_table[j].ctor(p->storage[j], &p->storage);
                   }
               }
               p->count = id_count;
           }
           p = p->next;
       }
   }

The realloc() in the above code will potentially acquire a new memory
block, copy the contents from the original block, and then free the original
block (making it eligible for re-allocation) before returning to the caller,
which saves away the new memory block's address in the thread's tsrm_tls_entry.


Next, looking at ts_resource_ex() which is called by a thread to get its 
thread local storage for a particular resource we see:


   if (!th_id) {
       /* Fast path for looking up the resources for the current
        * thread. Its used by just about every call to
        * ts_resource_ex(). This avoids the need for a mutex lock
        * and our hashtable lookup.
        */
       thread_resources = tsrm_tls_get();

       if (thread_resources) {
           TSRM_ERROR((TSRM_ERROR_LEVEL_INFO, "Fetching resource id %d for current thread %d",
                       id, (long) thread_resources->thread_id));
           /* Read a specific resource from the thread's resources.
            * This is called outside of a mutex, so have to be aware about external
            * changes to the structure as we read it.
            */
           TSRM_SAFE_RETURN_RSRC(thread_resources->storage, id, thread_resources->count);
       }
       thread_id = tsrm_thread_id();
   } else {
       thread_id = *th_id;
   }

This is executed WITHOUT the mutex (I assume for performance reasons) 
and directly accesses the same storage field which is modified
by ts_allocate_id(). The comment suggests someone has thought about 
potential problems here but I see no code here or in
TSRM_SAFE_RETURN_RSRC that takes account of possible modifications to 
the address in storage.


My reading of the code as it currently stands is that there is a window
between the freeing of the original storage block by realloc() and the
saving away of the new memory block address in the storage field by
ts_allocate_id() during which the address in storage is stale.
The old memory could potentially be reallocated and modified during this
window. So it is possible for a thread to access its tsrm_tls_entry
and read an old address for storage, potentially picking up the
address of memory which has been reallocated to another thread and
modified. If it does so then the results are unpredictable, but a
segmentation violation is one of the most likely outcomes.
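
One way to avoid the freed-block window can be sketched as follows: allocate the larger block with malloc(), copy, store the new address, and only afterwards retire the old block, rather than letting realloc() free it first. This is a sketch under assumptions, not the TSRM.c code: slot_table and slot_table_grow() are hypothetical names standing in for the storage/count pair of tsrm_tls_entry, and deciding when the old block can safely be freed is left to the caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the storage/count pair in tsrm_tls_entry. */
typedef struct slot_table {
    void **storage;
    int count;
} slot_table;

/*
 * Grows t to new_count slots without realloc(). On success returns 0 and
 * stores the superseded block in *old_block (NULL if nothing was replaced);
 * the caller frees it once no reader can still hold its address.
 * On allocation failure returns -1 and leaves t unchanged.
 */
static int slot_table_grow(slot_table *t, int new_count, void ***old_block)
{
    void **bigger;
    int i;

    assert(t != NULL && old_block != NULL);
    *old_block = NULL;
    if (new_count <= t->count) {
        return 0; /* nothing to do */
    }

    bigger = malloc(sizeof(void *) * (size_t) new_count);
    if (bigger == NULL) {
        return -1;
    }
    if (t->count > 0) {
        memcpy(bigger, t->storage, sizeof(void *) * (size_t) t->count);
    }
    for (i = t->count; i < new_count; i++) {
        bigger[i] = NULL; /* real code would allocate and construct here */
    }

    *old_block = t->storage; /* stays valid until the caller frees it */
    t->storage = bigger;     /* new address is stored before any free */
    t->count = new_count;
    return 0;
}
```

A reader that picked up the old address just before the swap still sees valid, unchanged memory. The sketch only addresses the freed-block window; it does not by itself make an unlocked read well-defined on weakly ordered hardware.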


Further, on an architecture which has a weakly ordered memory model, e.g.
PPC, there is further potential that another thread will see a stale
address even after the store into storage has been executed, due to the
absence of any memory barrier instructions in the code. If all access to
storage were within a mutex then this would not be an issue, as the
mutex enter/release provide the necessary memory synchronization, but
as ts_resource_ex() accesses the memory outside of a mutex there is no
guarantee another thread calling ts_resource_ex() will see the result
of the store.
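
For comparison, a lock-free publication with explicit ordering might look like the sketch below. It assumes a C11 compiler with <stdatomic.h>, which the TSRM code does not use; entry, entry_publish() and entry_snapshot() are hypothetical names, and only the pointer is modelled (the count would need the same treatment).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical per-thread entry; only the published pointer is modelled. */
typedef struct entry {
    _Atomic(void **) storage;
} entry;

/* Writer side: the release store orders all earlier writes to new_block
 * before the pointer itself becomes visible. */
static void entry_publish(entry *e, void **new_block)
{
    assert(e != NULL);
    atomic_store_explicit(&e->storage, new_block, memory_order_release);
}

/* Reader side: if the acquire load observes new_block, the writes made to
 * the block before it was published are visible too. */
static void **entry_snapshot(entry *e)
{
    assert(e != NULL);
    return atomic_load_explicit(&e->storage, memory_order_acquire);
}
```

Release/acquire pairing guarantees that a reader which sees the new pointer also sees the block's initialized contents. It does not say how soon the new pointer becomes visible, and it does nothing about the lifetime of the old block.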

Now having said all that, I do not believe, given the current usage of
ts_allocate_id(), that this will cause an issue. The reason is that a quick
scan of the code reveals that ts_allocate_id() is only called during PHP
initialization and extension initialization (MINIT), when the code is
effectively single-threaded anyway, so no thread will see any stale
address in storage. However, I see nothing in the code that

Re: [PHP-DEV] Unicode chars allowed in numbers?

2006-11-29 Thread Andrei Zmievski

We should use whatever trim() uses, I think.

-Andrei


On Nov 29, 2006, at 5:58 AM, Matt Wilmas wrote:


Hi Andrei,

One more related question: What about for any leading whitespace with
numeric strings, like in zend_u_strtol()?  Is u_isspace() needed, or are
only the ASCII-equivalents (0x20, 9-13 [\t, \n, \v, \f, \r]) allowed?


Thanks again,
Matt


- Original Message -
From: Andrei Zmievski
Sent: Friday, November 10, 2006


Hi Andrei, et al.,

I was just looking at README.UNICODE, regarding interpretation of
numbers:
we restrict numbers to consist only of ASCII digits, and Numeric
strings
are supposed to adhere to the same rules.  Is it correct to take
that to
mean only UChar's with values from '0'-'9'/0x30-0x39 (and 'a'-'z'
equivalents for bases > 10)?


Correct.


I ask because in zend_u_strtol(), HANDLE_U_NUMERIC() for array
keys, etc.,
the u_digit() function is used, which also allows non-ASCII, higher-
value
digit characters, doesn't it?  But then in is_numeric_unicode(), when
checking for hex numbers, the ASCII values '0' and 'x' are used,
which is
what I'd expect after reading README.UNICODE.


You're correct here again, u_digit() should not be used there.

-Andrei





Re: [PHP-DEV] Unicode chars allowed in numbers?

2006-11-29 Thread Pierre

Hello,

On 11/29/06, Andrei Zmievski [EMAIL PROTECTED] wrote:

We should use whatever trim() uses, I think.


I think so too (more consistent).

--Pierre




Re: [PHP-DEV] Question on thread safety

2006-11-29 Thread Stanislav Malyshev
So finally to my question.  Is it the intention of TSRM.c to allow
ts_allocate_id() to be called at any time, or is there an unwritten rule
that it should only ever be called during PHP startup? If it's the former then I


I think it gets called only on startup. I also think that was the intent,
though there is no safeguard as far as I can see against calling it at
run time, but no module does it and it doesn't make sense to do it
anywhere other than startup.


I myself see no reason why extension writers should be restricted from 
calling ts_allocate_id() outside PHP startup so believe the code needs 


Well, the reason is that if you want to use TSRM globals, you have to
allocate an ID before you do basically anything with them. Startup is a
good place for that. If you don't need globals, then you should not call
it at all. The situation where in mid-run you suddenly remember you
need globals seems quite unrealistic to me. Of course, if you can
describe a scenario where you would really need it in mid-run, or where it
would make sense to allocate an ID in mid-run, then I guess this should be
fixed or at least safeguarded.
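
As a rough idea of what such a safeguard could look like, the sketch below refuses to hand out ids once startup has been marked complete. It is a simplified model, not a patch to TSRM.c: startup_complete, mark_startup_complete() and guarded_allocate_id() are hypothetical names, and the real allocator's locking and per-thread setup are left out.

```c
#include <assert.h>
#include <limits.h>

static int startup_complete = 0; /* hypothetical flag, set once after MINIT */
static int next_id = 0;          /* last id handed out */

/* To be called once, before any request threads start running. */
static void mark_startup_complete(void)
{
    startup_complete = 1;
}

/* Returns a new id (1, 2, ...) during startup, or 0 once startup is over. */
static int guarded_allocate_id(void)
{
    if (startup_complete) {
        return 0; /* refuse: per-thread storage can no longer be resized safely */
    }
    assert(next_id < INT_MAX);
    return ++next_id;
}
```

For the check itself to be race-free, the flag would have to be set before other threads exist, or be read and written under the same mutex as the allocation.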

--
Stanislav Malyshev, Zend Products Engineer
[EMAIL PROTECTED]  http://www.zend.com/




[PHP-DEV] Re: Cross-extension resource

2006-11-29 Thread Sara Golemon

I'm writing an extension in order to achieve better performance in a
specific module from our application. Right now, I'm trying to use an
already established mysql connection (with mysql_connect) in our extension,
so we don't have to connect to the database twice, since that connection is
used in the PHP code. I don't need PHP's mysql API, since I can do
everything in C, I just want the connection ID. I've read Sara's (great...
GREAT) book, but I couldn't find any specific solution to this situation. Is
this possible? Is there a best way?


Surprising how often this question comes up

/* True global to store local copy of mysql's le_link */
static int local_mysql_le = -1;

PHP_RINIT_FUNCTION(myext)
{
    if (local_mysql_le == -1) {
        local_mysql_le = zend_fetch_list_dtor_id("mysql link");
    }

    return SUCCESS;
}

A few notes about this solution:
(1) I'm doing this in RINIT because, prior to PHP 5.1, there's no way to 
enforce load order for shared extensions, so it's possible that your ext 
will load prior to mysql and therefore the list ID won't have been 
registered during the time of MINIT.  If you know that MySQL will always 
be loaded prior to your extension (either because of local policy or 
because you're only targeting 5.1 or later -- which has module 
dependencies), then you can do it once in MINIT and be done with it.


(2) The name "mysql link" is case sensitive and must match the resource
type name you're looking for precisely.  There's also no guard against
the (unlikely) possibility that some other extension registers a
resource named "mysql link" which isn't actually a MySQL link.


(3) You should guard against the possibility that no matching list id 
will be found for that name (perhaps MySQL isn't loaded), so be sure to 
put some error checking in there...  Hint: This function returns 0 on 
failure.


(4) You'll probably need to fetch "mysql link persistent" as well...

I think this function is covered in Appendix A, but the cat is on my lap 
and I can't reach the bookshelf from here
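
Putting notes (3) and (4) together, the lookup with error checking might be sketched as below. So that it compiles outside PHP, the lookup is passed in as a function pointer; in a real extension that would be a thin wrapper around zend_fetch_list_dtor_id(). The names list_id_lookup, resolve_mysql_ids(), local_mysql_ple and the two fake lookups are hypothetical, and the ids the fakes return are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Maps a resource type name to its list id; returns 0 when none matches. */
typedef int (*list_id_lookup)(const char *type_name);

static int local_mysql_le = -1;  /* id for "mysql link" */
static int local_mysql_ple = -1; /* id for "mysql link persistent" */

/* Returns 0 when both ids were resolved, -1 otherwise. */
static int resolve_mysql_ids(list_id_lookup lookup)
{
    int le, ple;

    assert(lookup != NULL);
    le = lookup("mysql link");
    ple = lookup("mysql link persistent");
    if (le == 0 || ple == 0) {
        return -1; /* e.g. mysql not loaded: leave the cached ids untouched */
    }
    local_mysql_le = le;
    local_mysql_ple = ple;
    return 0;
}

/* Fake registries for trying the sketch outside PHP. */
static int fake_lookup(const char *type_name)
{
    if (strcmp(type_name, "mysql link") == 0) return 11;
    if (strcmp(type_name, "mysql link persistent") == 0) return 12;
    return 0;
}

static int empty_lookup(const char *type_name)
{
    (void) type_name;
    return 0;
}
```

In RINIT the caller would check the return value and fail the request, or disable the feature, when it is -1, instead of using an id that was never resolved.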


-Sara
