Bugs item #1701389 was opened at 2007-04-16 18:05
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1701389&group_id=5470

Category: Unicode
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Iceberg Luo (iceberg4ever)
Assigned to: M.-A. Lemburg (lemburg)
Summary: utf-16 codec problems with multiple file append

Initial Comment:
This bug is similar to, but not exactly the same as, bug 215974.
(http://sourceforge.net/tracker/?group_id=5470&atid=105470&aid=215974&func=detail)

In my tests, even multiple write() calls between a single open() and close()
do not cause the multiple-BOM phenomenon described in bug 215974. Perhaps bug
215974 was fixed at some point during the past 7 years, even though Lemburg
classified it as WontFix.

However, if a file is appended to more than once with
codecs.open('file.txt', 'a', 'utf16'), multiple BOMs appear in the file.

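At the byte level this can be reproduced without touching the disk. The sketch below is a hedged illustration, not the codec's own code: it assumes each codecs.open(..., 'a', 'utf16') call starts a fresh encoder, and it uses codecs.getincrementalencoder('utf-16') as a stand-in for the StreamWriter inside the opened file. append_session is a hypothetical helper name. It is written to run on Python 2.6+ as well as Python 3.

```python
import codecs


def append_session(chunks):
    # Hypothetical helper: one incremental encoder per "session", standing in
    # for the fresh writer that each codecs.open(..., "a", "utf-16") creates.
    enc = codecs.getincrementalencoder("utf-16")()
    # The first encode() call emits the BOM; later calls on the same encoder
    # do not, which is why several write()s in one session are fine.
    return b"".join(enc.encode(chunk) for chunk in chunks)


first = append_session([u"ab\n", u"cd\n"])
second = append_session([u"ef\n", u"gh\n"])
data = first + second

print(len(first))                    # 14
print(len(data))                     # 28
print(data.count(codecs.BOM_UTF16))  # 2
```

Each session contributes 14 bytes (a 2-byte BOM plus six 2-byte code units), so two sessions give 28 bytes rather than the 26 a single-BOM file would have.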
Moreover, the statement in bug 215974 that "(Extra unnecessary) BOM marks are
removed from the input stream by the Python UTF-16 codec" is still not true
today, with Python 2.4.4 and Python 2.5.1c1 on Windows XP.
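The decoder's behavior can be checked directly. A minimal sketch, assuming a file whose bytes are two separately encoded UTF-16 chunks in native byte order, as left behind by two append sessions: the 'utf-16' codec consumes only the leading BOM and decodes the second one as the character U+FEFF (ZERO WIDTH NO-BREAK SPACE) instead of removing it. Written to run on Python 2.6+ and Python 3.

```python
# Bytes of a file appended to twice: each session began with its own BOM.
data = u"ab\ncd\n".encode("utf-16") + u"ef\ngh\n".encode("utf-16")

# The leading BOM selects the byte order and is dropped; the second BOM is
# decoded as an ordinary U+FEFF character at the start of the third line.
lines = data.decode("utf-16").splitlines(True)
print(repr(lines))
print(lines[2].startswith(u"\ufeff"))  # True
```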

Iceberg
------------------

PS: I could not find the "File Upload" checkbox mentioned on this web page, so
I am pasting the code here instead:

import codecs, os

filename = "test.utf-16"
if os.path.exists(filename): os.unlink(filename)  # reset

def myOpen():
  return codecs.open(filename, "a", 'UTF-16')
def readThemBack():
  return list( codecs.open(filename, "r", 'UTF-16') )
def clumsyPatch(raw): # workaround that strips stray BOMs; try it after the first run
  for line in raw:
    if line[0] in (u'\ufffe', u'\ufeff'): # get rid of the BOMs
      yield line[1:]
    else:
      yield line

fout = myOpen()
fout.write(u"ab\n") # to simplify the problem, I only use ASCII chars here
fout.write(u"cd\n")
fout.close()
print readThemBack()
assert readThemBack() == [ u'ab\n', u'cd\n' ]
assert os.stat(filename).st_size == 14  # Only one BOM in the file

fout = myOpen()
fout.write(u"ef\n")
fout.write(u"gh\n")
fout.close()
print readThemBack()
#print list( clumsyPatch( readThemBack() ) )  # later you can enable this fix
assert readThemBack() == [ u'ab\n', u'cd\n', u'ef\n', u'gh\n' ] # fails here
assert os.stat(filename).st_size == 26  # also fails: the second BOM makes the file 28 bytes
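Instead of patching the data on read, as clumsyPatch does, the duplicate BOM can be avoided at write time. A minimal sketch, assuming the target file is either new or empty, or already starts with a UTF-16 BOM; append_utf16 is a hypothetical helper, not part of the codecs module. It writes a BOM only into an empty file and otherwise continues with the BOM-less 'utf-16-le' or 'utf-16-be' codec matching the existing BOM. Written to run on Python 2.6+ and Python 3.

```python
import codecs
import os
import tempfile


def append_utf16(path, text):
    """Hypothetical helper: append `text` to `path` as UTF-16 so that the
    file keeps exactly one BOM, at the very start."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        # New or empty file: the "utf-16" codec writes BOM + native order.
        data = text.encode("utf-16")
    else:
        # Existing file: continue in the byte order announced by its BOM,
        # using a codec that does not emit a BOM of its own.
        with open(path, "rb") as f:
            head = f.read(2)
        if head == codecs.BOM_UTF16_LE:
            data = text.encode("utf-16-le")
        elif head == codecs.BOM_UTF16_BE:
            data = text.encode("utf-16-be")
        else:
            raise ValueError("%s does not start with a UTF-16 BOM" % path)
    with open(path, "ab") as f:
        f.write(data)


# Same scenario as the script above: two separate append sessions.
path = os.path.join(tempfile.mkdtemp(), "test.utf-16")
append_utf16(path, u"ab\ncd\n")
append_utf16(path, u"ef\ngh\n")

with open(path, "rb") as f:
    raw = f.read()
print(len(raw))  # 26: one 2-byte BOM + 12 two-byte code units
lines = raw.decode("utf-16").splitlines(True)
print(repr(lines))
```

The check-then-append is not atomic, so it assumes a single writer at a time, which matches the scenario in the report.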


----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list 