Before I file a bug report against Python 2.5.2, I want to run this by the newsgroup to make sure I'm not being stupid.
I have a text file of fixed-length records that I want to read in random order. That file is being changed in real time by another process, and my process wants to see the changes. What I'm seeing is that once I've opened the file and read a record, every subsequent seek to and read of that same record returns the same data as the first read, for as long as I don't close and reopen the file. This indicates that some sort of buffering or caching is going on. Consider the following:

    $ echo "hi" >foo.txt      # Create my test file
    $ python2.5               # Run Python
    Python 2.5.2 (r252:60911, Sep 22 2008, 16:13:07)
    [GCC 3.4.6 20060404 (Red Hat 3.4.6-9)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> f = open('foo.txt')   # Open my test file
    >>> f.seek(0)             # Seek to the beginning of the file
    >>> f.readline()          # Read the line, I get the data I expected
    'hi\n'
    >>> # At this point, in another shell I execute 'echo "bye" >foo.txt'.
    >>> # 'foo.txt' has now been changed on the disk, and now contains 'bye\n'.
    >>> f.seek(0)             # Seek to the beginning of the still-open file
    >>> f.readline()          # Read the line, I don't get 'bye\n', I get the
    >>>                       # original data, which is no longer there.
    'hi\n'
    >>> f.close()             # Now I close the file...
    >>> f = open('foo.txt')   # ... and reopen it
    >>> f.seek(0)             # Seek to the beginning of the file
    >>> f.readline()          # Read the line, I get the expected 'bye\n'
    'bye\n'
    >>>

It seems pretty clear to me that this is wrong. If there is any caching going on, it should be discarded when I do a seek. Note that it's not just readline() that returns the wrong, cached data: I've also tried this with read() and I get the same results. It's not acceptable that I have to close and reopen the file before every read when I'm doing random record access.

So, is this a bug, or am I being stupid?

-- 
http://mail.python.org/mailman/listinfo/python-list
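A minimal sketch of one possible workaround for the random-record case, assuming Python 3 on a POSIX system and a local file that the other process rewrites in place (same inode, which is what `echo "bye" >foo.txt` does, since the shell truncates the existing file). Each record is read through a raw file descriptor with `os.lseek()` and `os.read()`, which call lseek(2) and read(2) directly, so no user-space buffer sits between the process and the file. `RECORD_LEN` and `read_record` are hypothetical names chosen for illustration.

```python
import os

# Hypothetical record length in bytes; set it to the real fixed record size.
RECORD_LEN = 4


def read_record(fd, index, record_len=RECORD_LEN):
    """Return record number `index` (0-based) from the open descriptor `fd`.

    os.lseek() and os.read() are thin wrappers around the lseek(2) and
    read(2) system calls, so every call goes to the kernel and there is
    no user-space read buffer that could hand back stale bytes.
    """
    os.lseek(fd, index * record_len, os.SEEK_SET)
    data = b""
    # read(2) may return fewer bytes than requested, so keep reading until
    # the record is complete or end of file is reached.
    while len(data) < record_len:
        chunk = os.read(fd, record_len - len(data))
        if not chunk:  # end of file: return whatever was read so far
            break
        data += chunk
    return data


# Example use (the file name is only an illustration):
#   fd = os.open("foo.txt", os.O_RDONLY)
#   try:
#       print(read_record(fd, 0))
#   finally:
#       os.close(fd)
```

This only helps when the file is modified in place. If the writer instead creates a new file and renames it over the old name, the open descriptor keeps referring to the old file, and reopening by name is unavoidable.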