On Nov 1, 2017, at 9:58 AM, Ian Kelly <ian.g.ke...@gmail.com> wrote:
> 
> On Tue, Oct 31, 2017 at 11:38 AM, Israel Brewster <isr...@ravnalaska.net> 
> wrote:
>> A question that has arisen before (for example, here: 
>> https://mail.python.org/pipermail/python-list/2010-January/565497.html) is 
>> the question of "is defaultdict thread safe", with the answer generally 
>> being a conditional "yes", with the condition being what is used as the 
>> default value: apparently default values of python types, such as list, are 
>> thread safe,
> 
> I would not rely on this. It might be true for current versions of
> CPython, but I don't think there's any general guarantee and you could
> run into trouble with other implementations.

Right, completely agreed. Relying on things like this feels kinda "dirty" to me.

> 
>> [...]
> 
> [...] You could use a regular dict and just check if
> the key is present, perhaps with the additional argument to .get() to
> return a default value.

True. Using defaultdict simply saves having to stick the same default in 
every call to get(). DRY principle and all. That said, see below - I don't 
think the defaultdict is the issue.
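
To make the difference between the two approaches concrete, here is a minimal sketch (the key "alice" and the variable names are made up for illustration). One thing worth noting: .get() with a default never stores anything, while indexing a defaultdict creates the entry as a side effect.

```python
from collections import defaultdict

plain = {}                 # regular dict, default supplied on every .get() call
auto = defaultdict(list)   # default factory supplied once, at construction

# .get() returns the fallback value but does not add the key to the dict.
missing_plain = plain.get("alice", [])

# Indexing a defaultdict calls the factory and stores the result under the key.
missing_auto = auto["alice"]

print("alice" in plain)  # the regular dict is still empty
print("alice" in auto)   # the defaultdict now has an "alice" entry
```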

> 
> Individual lookups and updates of ordinary dicts are atomic (at least
> in CPython). A lookup followed by an update is not, and this would be
> true for defaultdict as well.
> 
>> [...]
>> 1) Is this what it means to NOT be thread safe? I was thinking of race 
>> conditions where individual values may get updated wrong, but this 
>> apparently is overwriting the entire dictionary.
> 
> No, a thread-safety issue would be something like this:
> 
>    account[user] = account[user] + 1
> 
> where the value of account[user] could potentially change between the
> time it is looked up and the time it is set again.

That's what I thought - changing values/different values from expected, not 
missing values.
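
For reference, the usual way to make that kind of read-then-write safe between real threads is to hold a lock around the whole update. A minimal sketch, assuming a single process; account_lock, add_one and run are hypothetical names, not anything from my actual app:

```python
import threading
from collections import defaultdict

account = defaultdict(int)
account_lock = threading.Lock()  # hypothetical: one lock guarding the whole dict


def add_one(user):
    # The lookup and the assignment happen under the same lock, so no other
    # thread can change account[user] between the read and the write.
    with account_lock:
        account[user] = account[user] + 1


def run(n_threads=8, per_thread=1000):
    """Increment one counter from several threads and return the final value."""
    def work():
        for _ in range(per_thread):
            add_one("alice")

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account["alice"]
```

With the lock in place, run(4, 500) should come back as exactly 2000. Note that this only helps threads inside one process; it does nothing for separate worker processes.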

All that said, I just had a bit of an epiphany: the main thread is actually a 
Flask app, running through UWSGI with multiple *processes*, and using the 
flask-uwsgi-websocket plugin, which further uses greenlets. So what I was 
thinking was simply a separate thread was, in reality, a completely separate 
*process*. I'm sure that makes a difference. So what's actually happening here 
is the following:

1) the main python process starts, which initializes the dictionary (since it 
is at a global level)
2) uwsgi launches off a bunch of child worker processes (10 to be exact, each 
of which is set up with 10 gevent threads)
3a) a client connects (web socket connection to be exact). This connection is 
handled by an arbitrary worker, and an arbitrary green thread within that 
worker, based on UWSGI algorithms.
3b) This connection triggers launching of a *true* thread (using the python 
threading library) which, presumably, is now a child thread of that arbitrary 
uwsgi worker. <== BAD THING, I would think
4) The client makes a request for the list, which is handled by a DIFFERENT 
(presumably) arbitrary worker process and green thread.

So the end result is that the thread that "updates" the dictionary, and the 
thread that initially *populates* the dictionary are actually running in 
different processes. In fact, any given request could be in yet another 
process, which would seem to indicate that all bets are off as to what data is 
seen.
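
A small sketch of why that happens, assuming a Unix-like system and the "fork" start method (which is my understanding of roughly how the workers get created, not something I have verified in the uwsgi source). The names shared, worker and demo are made up. The child gets its own copy of the module-level dictionary, so whatever it changes is never seen by the parent:

```python
import multiprocessing as mp

shared = {"items": []}  # module-level dict, initialized before any fork


def worker(queue):
    # This runs in the child process and modifies the child's private copy.
    shared["items"].append("from child")
    queue.put(shared)  # send the child's view back so the parent can compare


def demo():
    ctx = mp.get_context("fork")  # assumption: fork is available (not Windows)
    queue = ctx.Queue()
    proc = ctx.Process(target=worker, args=(queue,))
    proc.start()
    child_view = queue.get(timeout=10)  # read before join to avoid blocking
    proc.join()
    parent_view = {"items": list(shared["items"])}
    return child_view, parent_view
```

Calling demo() should return a child view containing "from child" and a parent view whose list is still empty, which matches the "missing values" symptom rather than a classic race condition.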

Now that I've thought through what is really happening, I think I need to 
re-architect things a bit here. For one thing, the update thread should be 
launched from the main process, not an arbitrary UWSGI worker. I had launched 
it from the client connection because there is no point in having it running if 
there is no one connected, but I may need to launch it from the __init__.py 
file instead. For another thing, since this dictionary will need to be accessed 
from arbitrary worker processes, I'm thinking I may need to move it to some 
sort of external storage, such as a redis database. Oy, I made my life 
complicated :-)
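
A rough sketch of what the external-storage idea might look like. Everything here is hypothetical: SharedList, the key name, and FakeClient (an in-memory stand-in so the sketch runs without a server). The only things it assumes about the real client are get(key), which returns bytes or None, and set(key, value), both of which redis-py's Redis class provides.

```python
import json


class SharedList:
    """Keeps a list as a JSON string under a single key in a Redis-like store."""

    def __init__(self, client, key="shared:items"):  # key name is made up
        self.client = client
        self.key = key

    def load(self):
        raw = self.client.get(self.key)
        return [] if raw is None else json.loads(raw)

    def save(self, items):
        self.client.set(self.key, json.dumps(items))


class FakeClient:
    """In-memory stand-in for a Redis client, for illustration only."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        # Store bytes, since redis-py returns bytes by default.
        self._data[key] = value.encode("utf-8")
        return True


# With a real server this would be something like:
#   import redis
#   client = redis.Redis(host="localhost", port=6379)
client = FakeClient()
writer = SharedList(client)  # e.g. the update thread
reader = SharedList(client)  # e.g. a request handler in some other worker
writer.save(["a", "b"])
print(reader.load())
```

Worth keeping in mind: a load() followed by a save() is still a lookup followed by an update, so two writers could overwrite each other. With a single update thread doing all the writing that should not matter.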

> That said it's not
> obvious to me what your problem actually is.
> -- 
> https://mail.python.org/mailman/listinfo/python-list
