Bugs item #1701389 was opened at 2007-04-16 12:05
Message generated for change (Comment added) made by doerwalter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1701389&group_id=5470
Category: Unicode
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Iceberg Luo (iceberg4ever)
Assigned to: M.-A. Lemburg (lemburg)
Summary: utf-16 codec problems with multiple file append

Initial Comment:
This bug is similar to, but not exactly the same as, bug 215974 (http://sourceforge.net/tracker/?group_id=5470&atid=105470&aid=215974&func=detail).

In my test, even multiple write() calls within a single open() ... close() lifespan do not cause the multiple-BOM phenomenon described in bug 215974. Maybe bug 215974 was somehow fixed during the past 7 years, although Lemburg classified it as WontFix.

However, if a file is appended to more than once, using codecs.open('file.txt', 'a', 'utf16'), multiple BOMs appear.

In addition, the claim in bug 215974 that "(Extra unnecessary) BOM marks are removed from the input stream by the Python UTF-16 codec" is still not true today, on Python 2.4.4 and Python 2.5.1c1 on Windows XP.

Iceberg
------------------
PS: I did not find the "File Upload" checkbox mentioned on this web page, so I think I'd better paste the code right here...
import codecs, os

filename = "test.utf-16"
if os.path.exists(filename):
    os.unlink(filename)  # reset

def myOpen():
    return codecs.open(filename, "a", 'UTF-16')

def readThemBack():
    return list( codecs.open(filename, "r", 'UTF-16') )

def clumsyPatch(raw):  # you can read it after your first run of this program
    for line in raw:
        if line[0] in (u'\ufffe', u'\ufeff'):  # get rid of the BOMs
            yield line[1:]
        else:
            yield line

fout = myOpen()
fout.write(u"ab\n")  # to simplify the problem, I only use ASCII chars here
fout.write(u"cd\n")
fout.close()
print readThemBack()
assert readThemBack() == [ u'ab\n', u'cd\n' ]
assert os.stat(filename).st_size == 14  # Only one BOM in the file

fout = myOpen()
fout.write(u"ef\n")
fout.write(u"gh\n")
fout.close()
print readThemBack()
#print list( clumsyPatch( readThemBack() ) )  # later you can enable this fix
assert readThemBack() == [ u'ab\n', u'cd\n', u'ef\n', u'gh\n' ]  # fails here
assert os.stat(filename).st_size == 26  # not to mention here: multi BOM appears

----------------------------------------------------------------------

>Comment By: Walter Dörwald (doerwalter)
Date: 2007-04-19 12:30

Message:
Logged In: YES
user_id=89016
Originator: NO

Append mode is simply not supported for codecs. How would the codec find out the codec state that was active after the last characters were written to the file?
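For UTF-16 specifically, the state Walter asks about is essentially two facts: whether a BOM has already been written, and which byte order it selected. For a file that was written with the "utf-16" codec, both can be read back from its first two bytes. A minimal sketch under that assumption, written in Python 3 rather than the Python 2.5 used in the report. The helpers `append_utf16` and `read_utf16_lines` are hypothetical names, not part of the `codecs` module, and the approach does not carry over to codecs whose state cannot be recovered from the file contents.

```python
import codecs
import os
import tempfile


def append_utf16(path, text):
    """Append text to a UTF-16 file, writing a BOM only if the file is new or empty.

    Hypothetical helper (not part of the codecs module). It assumes any existing
    content was written by the "utf-16" codec and therefore starts with a BOM.
    """
    encoding = "utf-16"  # new/empty file: this codec emits a BOM in native byte order
    if os.path.exists(path) and os.path.getsize(path) > 0:
        with open(path, "rb") as f:
            head = f.read(2)
        # Reuse the byte order chosen by the first write, without a second BOM.
        if head == codecs.BOM_UTF16_LE:
            encoding = "utf-16-le"
        elif head == codecs.BOM_UTF16_BE:
            encoding = "utf-16-be"
        else:
            raise ValueError("existing file does not start with a UTF-16 BOM")
    with open(path, "ab") as f:  # binary append, so no newline translation
        f.write(text.encode(encoding))


def read_utf16_lines(path):
    """Hypothetical helper: decode with "utf-16", which consumes only the leading BOM."""
    with open(path, "r", encoding="utf-16", newline="") as f:
        return f.readlines()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "test.utf-16")
    append_utf16(path, "ab\ncd\n")
    append_utf16(path, "ef\ngh\n")
    print(read_utf16_lines(path))  # ['ab\n', 'cd\n', 'ef\n', 'gh\n']
    print(os.path.getsize(path))   # 26: one 2-byte BOM + 12 characters * 2 bytes

    # For contrast: two independent "utf-16" encodes, as with two append-mode
    # codecs.open() calls, leave a second BOM that decodes as U+FEFF mid-stream.
    naive = "ab\ncd\n".encode("utf-16") + "ef\ngh\n".encode("utf-16")
    print(len(naive), "\ufeff" in naive.decode("utf-16"))  # 28 True
```

The helper raises rather than guessing when an existing file has no BOM, because the byte order of BOM-less UTF-16 data cannot be determined reliably from the bytes alone.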