A question that has come up before (for example, here: 
https://mail.python.org/pipermail/python-list/2010-January/565497.html) is 
"is defaultdict thread safe?" The answer is generally a conditional "yes", the 
condition being what is used as the default factory: apparently factories that 
are built-in Python types, such as list, are thread safe, whereas more 
complicated constructs, such as lambdas, are not. In my situation, I'm using a 
lambda, specifically:

lambda: datetime.min

So presumably *not* thread safe.

My goal is to have a dictionary of aircraft and when they were last "seen", 
with datetime.min being effectively "never". When a data point comes in for a 
given aircraft, the data point will be compared with the value in the 
defaultdict for that aircraft, and if the timestamp on that data point is newer 
than what is in the defaultdict, the defaultdict will get updated with the 
value from the datapoint (not necessarily current timestamp, but rather the 
value from the datapoint). Note that data points do not necessarily arrive in 
chronological order (for various reasons not applicable here, it's just the way 
it is), thus the need for the comparison.
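
For illustration, the compare-and-update step described above might be sketched 
as follows. The names record_point and last_points_lock are hypothetical and do 
not appear in the actual program; the lock is an assumption added here so that 
the check and the assignment happen as one step, not something the framework 
below already does.

```python
import threading
from collections import defaultdict
from datetime import datetime

# Same default as in the post: datetime.min stands for "never seen".
last_points = defaultdict(lambda: datetime.min)

# Hypothetical lock, added for this sketch only.
last_points_lock = threading.Lock()


def record_point(aircraft_id, pointtime):
    """Store pointtime for aircraft_id only if it is newer than the stored value.

    Returns True if the dictionary was updated, False otherwise.
    """
    with last_points_lock:
        # Reading a missing key inserts datetime.min via the default factory.
        if last_points[aircraft_id] < pointtime:
            last_points[aircraft_id] = pointtime
            return True
        return False
```

With this sketch, a data point that arrives out of chronological order is 
simply ignored, because its timestamp is older than the one already stored.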

When the program first starts up, two things happen:

1) a thread is started that watches for incoming data points and updates the 
dictionary as per above, and
2) the dictionary should get an initial population (in the main thread) from 
hard storage.

The behavior I'm seeing, however, is that when step 2 happens (which generally 
happens before the thread gets any updates), the dictionary gets populated with 
56 entries, as expected. However, none of those entries are visible when the 
thread runs. It's as though the thread is getting a separate copy of the 
dictionary, although debugging says that is not the case - printing the 
variable from each location shows the same address for the object.
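
The debugging check described above could be sketched roughly like this. The 
helper name describe is hypothetical, and including the process ID and thread 
name alongside the address is an assumption on top of what was actually 
printed; it is included because an address reported by id() is only comparable 
within a single process.

```python
import os
import threading
from collections import defaultdict
from datetime import datetime

last_points = defaultdict(lambda: datetime.min)


def describe(label, d):
    """Return a one-line summary of where the caller runs and which object d is."""
    return "{}: pid={} thread={} id={:#x} len={}".format(
        label,
        os.getpid(),                      # process the caller runs in
        threading.current_thread().name,  # thread the caller runs in
        id(d),                            # address of the object in CPython
        len(d),
    )
```

Calling print(describe("At update", last_points)) in the thread and 
print(describe("At get AC", last_points)) in the web handler would show 
whether both locations report the same process as well as the same address.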

So my questions are:

1) Is this what it means to NOT be thread safe? I was thinking of race 
conditions where individual values may get updated wrong, but this apparently 
is overwriting the entire dictionary.
2) How can I fix this?

Note: I really don't care if the "initial" update happens after the thread 
receives a data point or two, and therefore overwrites one or two values. I 
just need the dictionary to be fully populated at some point early in 
execution. In usage, the dictionary is used to see if an aircraft has been seen 
"recently", so if the most recent datapoint gets overwritten with a slightly 
older one from disk storage, that's fine. The problem is only when an entry 
still shows datetime.min because no datapoint has come in since we launched 
the program, even though we have "recent" data in disk storage. So I don't 
care about the obvious race condition between the two operations, just that 
the end result is a populated dictionary. Note also that as datapoints come 
in, they are written to disk, so the disk storage doesn't lag significantly 
anyway.

The framework of my code is below:

File: watcher.py

from collections import defaultdict
from datetime import datetime

last_points = defaultdict(lambda: datetime.min)

# This function is launched as a thread using the threading module when the
# first client connects
def watch():
        while True:
                <wait for datapoint>
                pointtime = <extract/parse timestamp from datapoint>
                if last_points[<aircraft_identifier>] < pointtime:
                        <do stuff>
                        last_points[<aircraft_identifier>] = pointtime
                        # DEBUGGING
                        print("At update:", len(last_points))


File: main.py:

from .watcher import last_points

# This function will be triggered by a web call from a client, so could happen
# at any time.
# Client will call this function immediately after connecting, as well as in
# response to various user actions.
def getac():
        <load list of aircraft and times from disk>
        <do stuff to send the list to the client>
        for record in aclist:
                last_points[<aircraft_identifier>]=record_timestamp
        #DEBUGGING
        print("At get AC:", len(last_points))


-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------



