[issue5006] Duplicate UTF-16 BOM if a file is open in append mode

2010-07-28 Thread Florent Xicluna

Changes by Florent Xicluna :


--
nosy: +flox

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue5006] Duplicate UTF-16 BOM if a file is open in append mode

2010-07-27 Thread STINNER Victor

STINNER Victor  added the comment:

I fixed #6213 in 2.6 and 2.7, so it is now possible to backport this fix to 
2.6: done in r83200.

--




[issue5006] Duplicate UTF-16 BOM if a file is open in append mode

2010-06-30 Thread Conrad.Irwin

Conrad.Irwin  added the comment:

Shouldn't this fix be back-ported to the 2.6 branch too?

--
nosy: +Conrad.Irwin




[issue5006] Duplicate UTF-16 BOM if a file is open in append mode

2009-05-14 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

Ok, committed in r72635.

--
resolution:  -> fixed
status: open -> closed




[issue5006] Duplicate UTF-16 BOM if a file is open in append mode

2009-05-14 Thread Benjamin Peterson

Benjamin Peterson  added the comment:

As I said on IRC a few days ago, I think the patch is ready to go.

--
nosy: +benjamin.peterson




[issue5006] Duplicate UTF-16 BOM if a file is open in append mode

2009-05-14 Thread Ezio Melotti

Changes by Ezio Melotti :


--
nosy: +ezio.melotti




[issue5006] Duplicate UTF-16 BOM if a file is open in append mode

2009-05-14 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

If no one objects, I'm going to commit this in a couple of days.

--




[issue5006] Duplicate UTF-16 BOM if a file is open in append mode

2009-05-10 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

Updated patch against py3k.

--
Added file: http://bugs.python.org/file13953/append_bom-4.patch




[issue5006] Duplicate UTF-16 BOM if a file is open in append mode

2009-05-01 Thread Antoine Pitrou

Changes by Antoine Pitrou :


--
nosy: +lemburg




[issue5006] Duplicate UTF-16 BOM if a file is open in append mode

2009-03-29 Thread STINNER Victor

STINNER Victor  added the comment:

Does seek() have the same problem? It's really hard to encode UTF-16/32
correctly...

--




[issue5006] Duplicate UTF-16 BOM if a file is open in append mode

2009-03-28 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

Here is a new patch catering to more cases (seek()) in addition to just
opening in append mode.

--
Added file: http://bugs.python.org/file13445/append_bom-3.patch
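
The seek() case mentioned in this message can be exercised with a short script. This is only a sketch and is not taken from the patch: the function name `seek_and_rewrite` and the temporary file name are assumptions chosen for illustration. On an interpreter that includes the fix, writing after a seek to a non-zero position should add no BOM, while writing after seek(0) should put the BOM back at the start of the file.

```python
import os
import tempfile


def seek_and_rewrite(encoding="utf-16"):
    """Write 'aaa', then seek to the end and to the start, writing each time.

    Returns the raw bytes of the resulting file.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "seek_bom.txt")  # hypothetical file name
        with open(path, "w", encoding=encoding) as f:
            f.write("aaa")
            end = f.tell()
        with open(path, "r+", encoding=encoding) as f:
            f.seek(end)   # non-zero position: no BOM should be written here
            f.write("zzz")
            f.seek(0)     # start of the file: the BOM is rewritten in place
            f.write("bbb")
        with open(path, "rb") as f:
            return f.read()


if __name__ == "__main__":
    print(seek_and_rewrite())
```

With the fix in place, the returned bytes should equal `'bbbzzz'.encode('utf-16')`, that is, one BOM followed by the six characters.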




[issue5006] Duplicate UTF-16 BOM if a file is open in append mode

2009-03-28 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

It's better now, although I think it's not good to duplicate the
encoding switch logic. It would be better to have a separate flag
indicating whether it's the start of the stream or not. I'm gonna
produce a new patch, unless you beat me to it.

Also, I'm adding Amaury to the nosy list so that he tells us whether he
thinks the approach is sound.

--
nosy: +amaury.forgeotdarc




[issue5006] Duplicate UTF-16 BOM if a file is open in append mode

2009-03-26 Thread STINNER Victor

STINNER Victor  added the comment:

Hum, it's a detail, but is it a good idea to keep the reference to the int(0)? 
Or would it be better to release it at exit?

--




[issue5006] Duplicate UTF-16 BOM if a file is open in append mode

2009-03-26 Thread STINNER Victor

Changes by STINNER Victor :


Removed file: http://bugs.python.org/file13377/append_bom.patch




[issue5006] Duplicate UTF-16 BOM if a file is open in append mode

2009-03-26 Thread STINNER Victor

STINNER Victor  added the comment:

Faster patch!
 - add fast encoders for UTF-32, UTF-32-LE and UTF-32-BE (copy/paste of 
the utf16 functions)
 - move utf-8 before utf-16-* because utf8 is more popular than utf16 :-p
 - don't set self->encodefunc=NULL (which loses all the encoder 
optimisation); only fix self->encodefunc for two special cases: 
utf16=>utf16le or utf16be, and utf32=>utf32le or utf32be
 - remove self->ok: it may have been useful for an older version of my 
patch, but it's no longer needed
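
The third item in the list above (switching utf16 to utf16le/utf16be, and likewise for utf32) can be illustrated in Python. This is a sketch of the idea only; the actual patch works on the C `encodefunc` pointer and is not reproduced here. The helper name `bom_free_variant` is hypothetical.

```python
import sys


def bom_free_variant(encoding):
    """Map 'utf-16'/'utf-32' to the explicit-endianness codec for this platform.

    The explicit-endianness codecs never emit a BOM, which is what is wanted
    when writing does not start at the beginning of the stream.
    """
    suffix = "-le" if sys.byteorder == "little" else "-be"
    normalized = encoding.lower().replace("_", "-")
    if normalized in ("utf-16", "utf-32"):
        return normalized + suffix
    # Any other encoding is returned unchanged.
    return encoding


if __name__ == "__main__":
    for name in ("utf-16", "utf-32", "utf-8"):
        print(name, "->", bom_free_variant(name))
```

Encoding with the returned codec gives the same bytes as the generic codec minus the leading BOM, so the fast path stays available without forcing the slow generic encoder.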

--
Added file: http://bugs.python.org/file13421/append_bom-2.patch




[issue5006] Duplicate UTF-16 BOM if a file is open in append mode

2009-03-26 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

On Thursday 26 March 2009 at 17:26, STINNER Victor wrote:
> STINNER Victor  added the comment:
> 
> > - I'm not sure why you make self->ok handling more complicated than it was
> 
> tell() requires self->ok=1. I chose to reset self->ok to zero on error, but 
> that may be unnecessary.

You are calling tell() on the underlying binary buffer, not on the
TextIOWrapper object. So self->ok should have no impact. Am I missing
something?

> The self->encodefunc value has to be changed because .setstate() changes 
> the encoding function: UTF-16 => UTF-16-LE or UTF-16-BE.

I know, but then the logic should probably be changed and use an
additional struct member to signal that the start of file has been
skipped.

--




[issue5006] Duplicate UTF-16 BOM if a file is open in append mode

2009-03-26 Thread STINNER Victor

STINNER Victor  added the comment:

> - I'm not sure why you make self->ok handling more complicated than it was

tell() requires self->ok=1. I chose to reset self->ok to zero on error, but 
that may be unnecessary.

> - encodefunc should not be forced to NULL, otherwise it will yield a big
> decrease in write() performance

The self->encodefunc value has to be changed because .setstate() changes 
the encoding function: UTF-16 => UTF-16-LE or UTF-16-BE.

I don't know how to get the new value of self->encodefunc. I will try to write 
another patch.

--




[issue5006] Duplicate UTF-16 BOM if a file is open in append mode

2009-03-26 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

As for the C version of the patch:
- I'm not sure why you make self->ok handling more complicated than it was
- encodefunc should not be forced to NULL, otherwise it will yield a big
decrease in write() performance

--




[issue5006] Duplicate UTF-16 BOM if a file is open in append mode

2009-03-19 Thread STINNER Victor

STINNER Victor  added the comment:

@pitrou: You're right, but the state has to be changed for the 
encoder, not the decoder. I added the following code to the TextIOWrapper 
constructor (for both the C and the Python versions of the io library):

    if self._seekable and self.writable():
        position = self.buffer.tell()
        if position != 0:
            self._encoder = self._get_encoder()
            self._encoder.setstate(0)
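
The effect of `setstate(0)` can be seen on a standalone incremental encoder, outside TextIOWrapper. This is a minimal sketch, not part of the patch; the helper name `encode_chunk` and its `at_start` parameter are assumptions for illustration.

```python
import codecs


def encode_chunk(text, at_start):
    """Encode `text` as UTF-16, emitting a BOM only when `at_start` is true."""
    encoder = codecs.getincrementalencoder("utf-16")()
    if not at_start:
        # State 0 tells the UTF-16 encoder that the byte order is already
        # settled, so it does not emit a BOM before the data.
        encoder.setstate(0)
    return encoder.encode(text)


if __name__ == "__main__":
    first = encode_chunk("abc", at_start=True)    # starts with a BOM
    second = encode_chunk("def", at_start=False)  # no BOM
    print(first, second)
```

Concatenating the two chunks should give the same bytes as encoding 'abcdef' in one call, which is the result expected from write followed by append.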

--
keywords: +patch
Added file: http://bugs.python.org/file13377/append_bom.patch




[issue5006] Duplicate UTF-16 BOM if a file is open in append mode

2009-03-13 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

One possible solution would be to call tell() in the TextIOWrapper
constructor, and then decoder.setstate((b"", 0)) if the current pos is >0.

--




[issue5006] Duplicate UTF-16 BOM if a file is open in append mode

2009-01-20 Thread STINNER Victor

STINNER Victor  added the comment:

See also issues #5008 (f.tell()) and #5016 (f.seekable()), not 
directly related to this issue.




[issue5006] Duplicate UTF-16 BOM if a file is open in append mode

2009-01-19 Thread Antoine Pitrou

Changes by Antoine Pitrou :


--
nosy: +pitrou




[issue5006] Duplicate UTF-16 BOM if a file is open in append mode

2009-01-19 Thread STINNER Victor

STINNER Victor  added the comment:

The bug is reproducible with:
 * Python 2.5 : charset utf-8-sig and utf-16 for codecs.open()
 * trunk : charset utf-8-sig, utf-16 and utf-32 for codecs.open()
 * py3k : charset utf-8-sig, utf-16 and utf-32 for open()

With utf-7 or utf-8, no BOM is written.

Note: with UTF-32, the BOM is 4 bytes long (0xff 0xfe 0x00 0x00 on 
little endian), but it's still the same character, \ufeff (BOM).
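
Which codecs prepend a BOM can be checked directly from Python. The sketch below is an illustration, not part of the report; the helper name `leading_bom` and the `KNOWN_BOMS` tuple are assumptions. It relies only on the BOM constants from the standard `codecs` module.

```python
import codecs

# UTF-32 BOMs are listed first because the UTF-32-LE BOM begins with the
# same two bytes as the UTF-16-LE BOM.
KNOWN_BOMS = (
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF32_BE,
    codecs.BOM_UTF8,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF16_BE,
)


def leading_bom(encoding):
    """Return the BOM that `encoding` prepends to its output, or b'' if none."""
    encoded = "a".encode(encoding)
    for bom in KNOWN_BOMS:
        if encoded.startswith(bom):
            return bom
    return b""


if __name__ == "__main__":
    for name in ("utf-8-sig", "utf-16", "utf-32", "utf-8", "utf-7"):
        print(name, leading_bom(name))
    # The 4-byte UTF-32 BOM is just U+FEFF encoded as UTF-32.
    print("\ufeff".encode("utf-32-le") == codecs.BOM_UTF32_LE)
```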

--
versions: +Python 2.6, Python 2.7




[issue5006] Duplicate UTF-16 BOM if a file is open in append mode

2009-01-19 Thread STINNER Victor

New submission from STINNER Victor :

Copy/paste of message79330 from issue #4862:
--
>>> f = open('utf16.txt', 'w', encoding='utf-16')
>>> f.write('abc')
3
>>> f.close()

>>> f = open('utf16.txt', 'a', encoding='utf-16')
>>> f.write('def')
3
>>> f.close()
>>> open('utf16.txt', 'r', encoding='utf-16').read()
'abc\ufeffdef'
--
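
The same scenario can be checked at the byte level rather than through the decoded text. This is a minimal sketch, not part of the original report: the helper name `count_boms` and the use of a temporary directory are assumptions for illustration. On an interpreter that includes the fix the function should return 1; on an affected interpreter it would be expected to return 2.

```python
import codecs
import os
import tempfile


def count_boms(encoding="utf-16", bom=codecs.BOM_UTF16):
    """Write 'abc', append 'def', and count the BOMs in the raw file bytes."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "utf16.txt")
        with open(path, "w", encoding=encoding) as f:
            f.write("abc")
        # Reopening in append mode is the step that triggered the bug.
        with open(path, "a", encoding=encoding) as f:
            f.write("def")
        with open(path, "rb") as f:
            raw = f.read()
    return raw.count(bom)


if __name__ == "__main__":
    print("BOMs found in file:", count_boms())
```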

--
messages: 80221
nosy: haypo
severity: normal
status: open
title: Duplicate UTF-16 BOM if a file is open in append mode
versions: Python 3.0, Python 3.1
