Re: [PHP-DEV] Question on thread safety

2006-11-30 Thread Andy Wharmby

Stanislav Malyshev wrote:
So finally to my question.  Is it the intention of TSRMc. to allow 
ts_allocate_id() to be called at any time or is there an unwritten 
rule that it
should only ever called during php startup ? If its the former then I 


I think it gets called only on startup. I also think it was the 
intent, though there is no safeguard as far as I can see against 
calling it in run-time, but no module does it and it doesn't make 
sense to do it in other place than startup.


I myself see no reason why extension writers should be restricted 
from calling ts_allocate_id() outside PHP startup so believe the code 
needs 


Well, the reason is that if you want to use TSRM globals, you have to 
allocate ID before you do basically anything with them. Startup is a 
good place for that. If you don't need globals, then you should not 
call it at all. The situation where in the mid-run you suddenly 
remember you need globals seems quite unrealistic to me. Of course, if 
you can describe scenario when you would really need it in mid-run or 
it would make sense to allocate ID in mid-run, then I guess this 
should be fixed or at least safeguarded.
I did not have any particular scenario in mind here as I too could not 
come up with a sensible reason to allow calls to ts_allocate_id() 
outside initialization. Initially I thought of the case of a user script
loading an extension using dl() but soon found out this is policed  and 
not allowed if ZTS is enabled. The reason I assumed that 
ts_allocate_id() was designed to be called at any time was the fact that 
the code is wrapped in a mutex, which lead to my concerns about 
ts_resource_ex(). If ts_allocate_id() is not designed to be called 
outside startup then my concerns about ts_resource_ex() are unfounded. 
However, the mutex acquire and release calls in ts_allocate_id()  are 
therefore unnecessary and should be removed.


However, I do believe this restriction should be policed to fail any 
calls outside startup. I see nothing in the code to stop a extension 
writer calling ts_allocate_id at runtime. Why would anyone do this ? So 
TSRM global  storage is only allocated when an extension is actually 
needed rather than when a new thread is started. What if a user has an 
extension that only gets called in some exceptional circumstance and 
they design it so the ts_allocate_id() call is made on the first call to 
the extension to save storage being allocated for threads until its 
needed. Unlikely I know but  sometimes users do what you least
expect so the code should protect them wherever possible from them doing 
something which will:

   (a) cause them grief, and
   (b) probably lead them into raising a bogus defect and waste 
someones time diagnosing the problem

A simple check in the code could prevent all this.

There are further routines in TSRM.c which also acquire the tsmm_mutex 
were the reason for this is not clear given current usage:


   * ts_free_id(). Only called at MSHUTDOWN when single threaded so no
 apparent need for mutex.  Again easily policed to ensure calls
 outside startup/shutdown not allowed.
   * ts_free_worker_threads():  Only called by php_module_shutdown when
 single threaded so no need for mutex.
   * ts_resource_ex: Needs mutex whilst it updates tsrm_tls_table for a
 new thread but it looks like it keeps mutex longer than it need
 do. By reworking this routine and allocate_new_resource() I
 believe the time the mutex is held could be reduced.

I am happy to work on a patch for all this and will raise a defect with 
patch when I have something worthy of consideration..


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DEV] Question on thread safety

2006-11-29 Thread Stanislav Malyshev
So finally to my question.  Is it the intention of TSRMc. to allow 
ts_allocate_id() to be called at any time or is there an unwritten rule 
that it
should only ever called during php startup ? If its the former then I 


I think it gets called only on startup. I also think it was the intent, 
though there is no safeguard as far as I can see against calling it in 
run-time, but no module does it and it doesn't make sense to do it in 
other place than startup.


I myself see no reason why extension writers should be restricted from 
calling ts_allocate_id() outside PHP startup so believe the code needs 


Well, the reason is that if you want to use TSRM globals, you have to 
allocate ID before you do basically anything with them. Startup is a 
good place for that. If you don't need globals, then you should not call 
it at all. The situation where in the mid-run you suddenly remember you 
need globals seems quite unrealistic to me. Of course, if you can 
describe scenario when you would really need it in mid-run or it would 
make sense to allocate ID in mid-run, then I guess this should be fixed 
or at least safeguarded.

--
Stanislav Malyshev, Zend Products Engineer
[EMAIL PROTECTED]  http://www.zend.com/

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php



[PHP-DEV] Question on thread safety

2006-11-29 Thread Andy Wharmby

Hi All,
   My first post on here but I have a come across a potential issue 
with the PHP code and rather than just raise a defect thought it better 
to solicit other

peoples views on the issue first.

I have been reviewing the PHP code recently in order to familiarize 
myself with how it all fits together. Lately I have been focusing on 
thread safety and I
have already raised a couple of defects on issues found in the code:  


   http://bugs.php.net/bug.php?id=39623
   http://bugs.php.net/bug.php?id=39648

Other potential issues have also been identified and further defects may 
follow.


However, this email relates to a question on the design of the TSRM.c 
code itself. The code in/ ts_allocate_id() /which used to allocate a new 
thread safe
resource id is single threaded by virtue of the mutex acquired on entry. 
When a new resource is allocated, the code allocates an instance of that 
resource

for each active thread as follows:

   /* enlarge the arrays for the already active threads */
   for (i=0; icount < id_count) {
   int j;

   p->storage = (void *) realloc(p->storage, sizeof(void 
*)*id_count);

   for (j=p->count; j   p->storage[j] = (void *) 
malloc(resource_types_table[j].size);

   if (resource_types_table[j].ctor) {
   resource_types_table[j].ctor(p->storage[j], 
&p->storage);

   }
   }
   p->count = id_count;
   }
   p = p->next;
   }
   }

The realloc() in the above code will potentially acquire a new memory 
block, copy the contents from original block and the free the original block
(making it eligible for re-allocation) before returning to caller which 
saves away the new memory blocks address in the threads/ /tsrm_tls-entry/. /


Next, looking at ts_resource_ex() which is called by a thread to get its 
thread local storage for a particular resource we see:


   if (!th_id) {
   /* Fast path for looking up the resources for the current
* thread. Its used by just about every call to
* ts_resource_ex(). This avoids the need for a mutex lock
* and our hashtable lookup.
*/
   thread_resources = tsrm_tls_get();

   if (thread_resources) {
   TSRM_ERROR((TSRM_ERROR_LEVEL_INFO, "Fetching resource id %d 
for current thread %d",

   id, (long) thread_resources->thread_id));
   /* Read a specific resource from the thread's resources.
* This is called outside of a mutex, so have to be aware 
about external

* changes to the structure as we read it.
*/
   TSRM_SAFE_RETURN_RSRC(thread_resources->storage, id, 
thread_resources->count);

   }
   thread_id = tsrm_thread_id();
   } else {
   thread_id = *th_id;
   }

This is executed WITHOUT the mutex (I assume for performance reasons) 
and directly accesses the same "storage" field which is modified
by ts_allocate_id(). The comment suggests someone has thought about 
potential problems here but I see no code here or in
TSRM_SAFE_RETURN_RSRC that takes account of possible modifications to 
the address in "storage".


My reading of the code as it currently stands is that there is a window 
between the freeing of the original storage block by realloc() and the
saving away of the new memory block address in the "storage" field by 
ts_allocate_id() during which time the address in "storage" is stale.
The old memory could potentially be reallocated and modified during this 
window. So it is possible for a thread to access its tsrm_tls_entry
and read an old address for "storage"; potentially picking up the 
address of storage which may have been reallocated to another thread and
modified. If is does so then the results are unpredictable but a 
segmentation violation is one of most likely outcomes.


Further, on an architecture which has a weakly ordered memory model, e.g 
PPC, there is further potential that another thread will see a stale
address even after the store into "storage" has been executed due to 
absence of any memory barrier instructions in the code. If all access to
"storage" were within a mutex then this would not be an issue as the 
mutex enter/release provide the necessary memory synchronization but
as ts_resource_ex() accesses the memory outside of a mutex their is no 
guarantee another thread calling ts_resource_ex() will see the result

of the store.

Now having said all that I do not believe given the current usage of 
ts_allocate_id() that this will cause an issue. The reason being that a 
quick
scan of the code reveals that ts_allocate_id() is only called during PHP 
initialization and extension initialization (MINIT) when the code is
effectively single threaded anyway so no thread will see any stale 
address in "storage". However, I see nothing in the code that would stop an
extension writer from calling ts_allocate_id() outside of MINIT, e.g in 
request initial