On Fri, 2 Dec 2016 11:26 am, DFS wrote:

> On 12/01/2016 06:48 PM, Ned Batchelder wrote:
>> On Thursday, December 1, 2016 at 2:31:11 PM UTC-5, DFS wrote:
>>> After a simple test below, I submit that the above scenario would never
>>> occur. Ever. The time gap between checking for the file's existence
>>> and then trying to open it is far too short for another process to
>>> sneak in and delete the file.
>>
>> It doesn't matter how quickly the first operation is (usually) followed
>> by the second. Your process could be swapped out between the two
>> operations. On a heavily loaded machine, there could be a very long
>> time between them.
>
> How is it possible that the 'if' portion runs, then 44/100,000ths of a
> second later my process yields to another process which deletes the
> file, then my process continues.
>
> Is that governed by the dreaded GIL?
No, that has nothing to do with the GIL. It is because the operating
system is a preemptive multi-processing operating system. All modern OSes
are: Linux, OS X, Windows.

Each program that runs, including the OS itself, is one or more processes.
Typically, even on a single-user desktop machine, you will have dozens of
processes running simultaneously.

Every so-many clock ticks, the OS pauses whatever process is running,
more-or-less interrupting whatever it was doing, passes control on to
another process, then the next, then the next, and so on. The application
doesn't have any control over this: it can be paused at any time, normally
just for a small fraction of a second, but potentially for seconds or
minutes at a time if the system is heavily loaded.

> "The mechanism used by the CPython interpreter to assure that only one
> thread executes Python bytecode at a time."
>
> But I see you posted a stack-overflow answer:
>
> "In the case of CPython's GIL, the granularity is a bytecode
> instruction, so execution can switch between threads at any bytecode."
>
> Does that mean "chars=f.read().lower()" could get interrupted between
> the read() and the lower()?

Yes, but don't think about Python threads here. Think about the OS. I'm
not an expert on the low-level hardware details, so I welcome correction,
but I think you can expect that the OS can interrupt code execution
between any two CPU instructions. Something like str.lower() is likely to
be thousands of CPU instructions, even for a small string.

[...]

> With a 5ms window, it seems the following code would always protect the
> file from being deleted between lines 4 and 5.
>
> --------------------------------
> 1 import os,threading
> 2 f_lock=threading.Lock()
> 3 with f_lock:
> 4     if os.path.isfile(filename):
> 5         with open(filename,'w') as f:
> 6             process(f)
> --------------------------------
>
>> even if on an average machine, they are executed very quickly.

Absolutely not. A threading.Lock only coordinates threads within your own
process; every other process on the machine knows nothing about it.
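To see that the window between the check and the open is real and not just
theoretical, here is a minimal sketch (mine, in Python 3 rather than the
Python 2 of the scripts below) that forces a deletion into that gap. In
real life another process does the remove at an unpredictable moment; here
we do it ourselves so the race happens every time:

```python
# A sketch (Python 3) forcing the check-then-open race deterministically.
# The isfile() check passes, the file is removed in the gap, and the
# open() fails -- exactly what a context switch to another process can
# cause at any time.
import os

filename = 'thefile.txt'
with open(filename, 'w') as f:
    f.write('hello\n')

raced = False
if os.path.isfile(filename):        # the check passes...
    os.remove(filename)             # ...another process could do this here...
    try:
        with open(filename) as f:   # ...so the open can still fail
            data = f.read()
    except FileNotFoundError:
        raced = True

print('file vanished between check and open:', raced)
```

The isfile() result is stale the instant it is computed; nothing stops the
file system from changing before the very next statement runs.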
At least on Linux, locks are advisory, not mandatory. Here is a pair of
scripts that demonstrate that. First, the well-behaved script that takes
out a lock:

# --- locker.py ---
import os, threading, time

filename = 'thefile.txt'
f_lock = threading.Lock()

with f_lock:
    print '\ntaking lock'
    if os.path.isfile(filename):
        print filename, 'exists and is a file'

    time.sleep(10)
    print 'lock still active'
    with open(filename, 'w') as f:
        print f.read()
# --- end ---

Now, a second script which naively, or maliciously, just deletes the file:

# --- bandit.py ---
import os, time

filename = 'thefile.txt'
time.sleep(1)
print 'deleting file, mwahahahaha!!!'
os.remove(filename)
print 'deleted'
# --- end ---

Now, I run them both simultaneously:

[steve@ando thread-lock]$ touch thefile.txt  # ensure file exists
[steve@ando thread-lock]$ (python locker.py &) ; (python bandit.py &)
[steve@ando thread-lock]$
taking lock
thefile.txt exists and is a file
deleting file, mwahahahaha!!!
deleted
lock still active
Traceback (most recent call last):
  File "locker.py", line 14, in <module>
    print f.read()
IOError: File not open for reading

This is on Linux. It's possible that Windows behaves differently, and I
don't know how to run a command in the background in command.com or
cmd.exe or whatever you use on Windows.

[...]

> Also, this is just theoretical (I hope). It would be terrible system
> design if all those dozens of processes were reading and writing and
> deleting the same file.

It is not theoretical. And it's not a terrible system design, in the sense
that the alternatives are *worse*.

* Turn the clock back to the 1970s and 80s with single-processing
  operating systems? Unacceptable -- even primitive OSes like DOS and
  Mac System 5 needed to include some basic multiprocessing capability.

  - And what are servers supposed to do in this single-process world?

- Enforce mandatory locks? A great way for malware or hostile users to
  perform Denial Of Service attacks.
  Even locks being left around accidentally can be a real pain: Windows
  users can probably tell you about times that a file has been
  accidentally left open by a buggy application, and there's nothing you
  can do to unlock it short of rebooting. Unacceptable for a server, and
  a pain in the rear even for a desktop.

- Make every file access go through a single scheduling application which
  ensures there are no clashes? Probably very hard to write, and it would
  probably kill performance. Imagine you cannot even check the existence
  of a 4GB file until it's finished copying onto a USB stick...

The cost of allowing two programs to run at the same time is that
sometimes they will both want to do something to the same file.

Fundamentally though, the solution here is quite simple: don't rely on
"Look Before You Leap" checks any time you have shared data, and the file
system is shared data. If you want *reliable* code, you MUST use a
try...except block to recover from file system errors.


-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.

-- 
https://mail.python.org/mailman/listinfo/python-list
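For concreteness, the try...except pattern recommended above can be
sketched like this (a sketch in Python 3, not from the original scripts;
process() is just a stand-in for whatever work is done with the open
file):

```python
# EAFP version of the earlier snippet: no isfile() check, no lock --
# just attempt the open and recover if the file system says no.
def process(f):
    f.write('some data\n')      # stand-in for the real work

def careful_write(filename):
    try:
        with open(filename, 'w') as f:
            process(f)
        return True
    except OSError as e:        # FileNotFoundError, PermissionError, ...
        print('could not write %s: %s' % (filename, e))
        return False

print(careful_write('thefile.txt'))              # True if the cwd is writable
print(careful_write('no/such/dir/thefile.txt'))  # False: parent dir missing
```

There is no window for the answer to go stale: whatever the file system's
state is at the moment of the open(), the except clause handles the
failure cases.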