Bugs item #1394612, was opened at 2005-12-31 19:06 Message generated for change (Comment added) made by tim_one You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1394612&group_id=5470
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Python Library Group: Not a Bug Status: Closed Resolution: Invalid Priority: 5 Submitted By: Cory Dodt (corydodt) Assigned to: Nobody/Anonymous (nobody) Summary: 'Plus' filemode exposes uninitialized memory on win32 Initial Comment: (Note: I'm using cygwin zsh, hence the prompts. I am using standard, python.org Python for these tests.) % echo abcdef > foo % python Python 2.3.5 (#62, Feb 8 2005, 16:23:02) [MSC v.1200 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> f = file('foo','r+b') >>> f.write('ghi') >>> f.read() '\x00x\x01\x83\x00\xe8\x03\x00\x00\xff\xff\xff\xff\x00\x00\x00\x00e\x00\x00i\x01 \x00d\x00\x00\x83\x01\x00Fd\x01\x00S\x00S\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0 0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0 0\x00\x00\x00[...lots and lots and lots of uninitialized memory deleted...]\x00\x00\x00\x00\x00\ x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\ x00\x00\x00\x00abcdef\n' >>> f.close() >>> ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2006-01-02 19:55 Message: Logged In: YES user_id=31435 "make sense?" So far as it goes, yes -- thanks. At a higher level ;-), I've been slinging Python for 15 years and have never had any trouble with this stuff because I never push on the end cases (I always seek when switching between reading and writing, even if only by using the peculiar- looking f.seek(f.tell()), and regardless of the platform du jour). Pushing beyond that doesn't interest me. Note that Python's file_read() (in fileobject.c) is already much more complicated than simply calling C's fread(). Because of this, it may be that Python is adding strange end case behavior beyond what the platform C would exhibit if the latter were used directly. ---------------------------------------------------------------------- Comment By: Paul G (paul_g) Date: 2006-01-02 19:22 Message: Logged In: YES user_id=1417712 i'll comment about the rest later, but re not understanding: here is what ansi says: "If the file has been opened for read/write, a read may not be followed by a write. Or vice versa, without first calling either fflush(), fseek(), fsetpos(), or rewind(). Unless EOF was the last character read by fread(). " note the last sentence. python docs say that f.read() with no size parameter will read all data until it hits EOF. this means that any ansi c compliant implementation should perform synchronization when you fread() an EOF. glibc fopen(3) man page states that it follows this; it does not. msvscrt docs do not state that it follows this; it does not. so glibc promises ansi c compliance, but does not deliver. msvscrt does not promise ansi c compliance and doesn't deliver either, but at least it behaves as advertised in this respect. make sense? -p ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2006-01-02 19:08 Message: Logged In: YES user_id=31435 paul_g's option #1 is the only that takes no work, so is the only one I'm volunteering for <0.5 wink>. Docs would be good. #4 is in general impossible short of Python implementing its own I/O -- "the last" operation done on a C stream isn't necessarily visible to the Python implementation (extensions can and do perform their own I/O on C streams directly via platform C stdio calls -- Python has no way to know about that now even in theory). BTW, I don't understand: """1. in the f.read()+f.write()+f.read() case, the f.write() generates an IOError. this deviates from ansi c, but is in line with msdn docs.""" All behavior in that case is explicitly not defined by ANSI C if there isn't a file-positioning operation too between the read() and write(), and again between the write() and read(). Raising an exception is fine by ANSI C in that case. So is a segfault. So is reading nothing, or reading a terabtye, or wiping the disk clean, etc: nothing is defined about it. ---------------------------------------------------------------------- Comment By: Paul G (paul_g) Date: 2006-01-02 16:32 Message: Logged In: YES user_id=1417712 i think there's a bit of confusion here as to what exactly the problem is. ansi c says that for files fopen()ed for reading and writing (ie r+, w+ etc), you must issue an fflush(), fseek(), fsetpos(), or rewind() between a read and a write. the exception to this is if the read last read EOF. the behaviour we are seeing using python file objects: with glibc: 1. read + write + read result in no data being returned by the last read. this is the case regardless of whether we do f.readlines()+f.writelines()+f.readlines() or f.read()+f.write()+f.read(). this does not comnform to expected behaviour (as per ansi c and glibc fopen(3)), because at least in the latter (read() with no size parameter) case, python docs promise to stop at EOF, triggering the exception ansi c/glibc make to the intervening synchronization with file positioning requirement. with msvscrt: 1. in the f.read()+f.write()+f.read() case, the f.write() generates an IOError. this deviates from ansi c, but is in line with msdn docs. 2. in the f.readlines()+f.writelines()+f.readlines() case, you see the type of results quoted in the bug submission. this deviates from ansi c if you expect readlines() to read EOF, but is still in line with msdn docs. there are 3 issues here: 1. if we give users a high level interface for file i/o, as we do by giving them a File object, should we expect them to research, be aware of and deal with the requirements imposed by the low level implementation used? if it is reasonable to require that when they use read() and write(), is it still reasonable to require it when they user readlines() and writelines()? 2. if we expect users to be aware of ansi c requirements for file stream usage and deal with them, is it reasonable to expect them to deal with the differences in libc implementations, including the differing requirements they impose and differing failure modes being seen? should we not attempt to present an ansi c compliant interface to them, performing workarounds as is necessary on a given platform (or libc make as is the case here)? we certainly do that in some cases (but not in this one) based on my brief reading of Objects/fileobject.c. 3. if we leave users to deal with this mess, should we not at least document this in some fashion? whether it be a detailed explanation or just a pointer to look at the appropriate docs, or even just a mention that they should be reading up on fopen(), since that is the underlying implemention behind file objects. is it reasonable to expect folks for whom python is their first language, as some folks seem to promote python, to figure all of this out when they haven't the foggiest about ansi c? to recap, the real issue, imo, seems to be that we shouldn't be exposing users to this, rather than the funky results of not doing this right. there are 4 options for dealing with this: 1. do nothing (what tim currently fabours, it appears) 2. document this to some extent 3. make this work the same across all libcs 4. perform the syncrhonization (fflush, fsetpos etc depending on libc) for the user, behind the scenes, if we see a write coming in and the previous op was a read. the latter option, from the perspective of "this is exactly what a high level interface should do for the user", makes the most sense to me. but then, maybe that's why i'm not a python core dev ;) cheers, -p ---------------------------------------------------------------------- Comment By: Cory Dodt (corydodt) Date: 2006-01-01 19:43 Message: Logged In: YES user_id=889183 Tim - at a minimum this should be documented; even if it's just a link to the ANSI C documentation. Python is not ANSI C, we shouldn't expect the Python user to seek out ANSI C documentation. Want me to open a separate doc bug? The current doc only says this (about the file builtin): """ Modes 'r+', 'w+' and 'a+' open the file for updating (note that 'w+' truncates the file). Append 'b' to the mode to open the file in binary mode, on systems that differentiate between binary and text files (else it is ignored). If the file cannot be opened, IOError is raised. """ Either here, or perhaps in section 2.3.9, a clear description should be given of how to properly operate a + mode file. Failing that, a pointer to ANSI C documentation so the user can read about it on its own (and so the user knows that this behavior conforms to the underlying platform API in every ugly detail). I'm also dubious that this exposed memory is innocuous, but I'll defer to your expertise on that one. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2006-01-01 19:24 Message: Logged In: YES user_id=31435 There's nothing Python can do about this short of implementing I/O itself. That isn't likely. If someone is truly bothered by this behavior on Windows (or any other platform), they need to convince Microsoft (or other relevant C vendors) to change _their_ stdio implementation -- Python inherits the platform C's behavior, quirks and all. I'll note that I'm not bothered by it. It's one of those "doctor, doctor, it hurts when I do this!" kinds of pseudo-problems, IMO: so don't do that. It's not like hostile user input can "trick" a Python application into doing I/O operations in an undefined order. ---------------------------------------------------------------------- Comment By: Jp Calderone (kuran) Date: 2006-01-01 17:08 Message: Logged In: YES user_id=366566 I think Cory was aware of the underlying requirement to interpose a seek operation between the write and read operations, but was concerned about the consequences of not doing so. Python normally protects one from doing things that are *too* dangerous: I guess it's unclear to him (and perhaps others) whether the current behavior of file is just (relatively) obscure or if it could lead to real problems (exposing sensitive data, crashing the process). It seems like the latter is somewhat unlikely in practice, but since the behavior is unspecified, it seems like it *could* happen. I guess since Tim closed this, he thinks it's not too dangerous. In this case, the documentation could probably stand to be improved somewhat. The section (<http://python.org/doc/lib/built-in-funcs.html#built-in-funcs>) that documents the various modes which can be used to open a file could updated to include a warning along the lines of that in the C standard. It should probably explicitly state which file methods can be used to satisfy this requirement, since it's not clear otherwise except by reading the implementation of the file type (one could guess, from <http://python.org/doc/lib/bltin-file-objects.html#bltin-file-objects> that file.flush() and file.seek() are suitable, but the documentation for these only says "like stdio ...", so you can't be completely sure). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2006-01-01 01:06 Message: Logged In: YES user_id=31435 This is actually pilot error (not a bug!), although it's subtle: Python uses the platform C I/O implementation, and in standard C mixing reads with writes yields undefined behavior unless a file-positioning operation (typically a seek()) occurs between switching from reading to writing (or vice versa); here from the C standard: When a file is opened with update mode (â+â as the second or third character in the above list of mode argument values), both input and output may be performed on the associated stream. However, output shall not be directly followed by input without an intervening call to the fflush function or to a file positioning function (fseek, fsetpos, or rewind), and input shall not be directly followed by output without an intervening call to a file positioning function, unless the input operation encounters end-of-file. In other words, the result of running your sample code is undefined: nothing is guaranteed about its behavior, which both can and does vary across platforms. If you want defined behavior, then, for example, add >>> f.seek(0) between your write() and read() calls. ---------------------------------------------------------------------- Comment By: Clinton Roy (clintonroy) Date: 2006-01-01 00:38 Message: Logged In: YES user_id=31446 Hi Cory, I don't think r+ mode will create the file if it doesn't exist, so at a guess I think what you're seeing is the actual contents of a file named foo that are on the disk, not junk. If you delete the file foo and run your test again, you should get an error to that effect. cheers, ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1394612&group_id=5470 _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com