Re: encoding name mappings in codecs.py with email/charset.py

2014-12-15 Thread Stefanos Karasavvidis
I played around with changing the names in the aliases.py and locale.py
files (from iso8859 to iso-8859), but this broke Mailman.

I ended up changing the charset.py file instead:

   try:
       input_charset = codecs.lookup(input_charset).name
   except LookupError:
       pass
   # Python 2.7's codecs.lookup() drops the first dash ('iso8859-7');
   # put it back so that MS Exchange accepts the charset name.
   if input_charset == 'iso8859-7':
       input_charset = 'iso-8859-7'
   if input_charset == 'iso8859-15':
       input_charset = 'iso-8859-15'
   if input_charset == 'iso8859-1':
       input_charset = 'iso-8859-1'


This seems to work for now.

I really wonder why I'm the only one with this problem. This should affect
all Mailman users with members on MS Exchange 2010 (at least) servers.
Exchange produces:
CAT.InvalidContent.Exception: InvalidCharsetException, Character set name
(iso8859-7) is invalid or not installed.; cannot handle content of message
with...

Thanks, gst, for the input.

sk

On Sun, Dec 14, 2014 at 9:53 PM, gst g.sta...@gmail.com wrote:

 On Sunday, December 14, 2014 14:10:22 UTC-5, Stefanos Karasavvidis wrote:
  Thanks for replying, gst.

  I've already thought of patching the Charset class, but hoped for a
  cleaner solution.

  This ALIASES dict already has all the ISO names *with* a dash, so the
  dash must get stripped somewhere else.


 Not on my side: modifying this dict with the missing key-value pairs
 apparently also does what you want:

 Python 2.7.6 (default, Mar 22 2014, 22:59:56)
 [GCC 4.8.2] on linux2
 Type "copyright", "credits" or "license()" for more information.
 >>> import email.charset
 >>> email.charset.ALIASES
 {'latin-8': 'iso-8859-14', 'latin-9': 'iso-8859-15', 'latin-2':
 'iso-8859-2', 'latin-3': 'iso-8859-3', 'latin-1': 'iso-8859-1', 'latin-6':
 'iso-8859-10', 'latin-7': 'iso-8859-13', 'latin-4': 'iso-8859-4',
 'latin-5': 'iso-8859-9', 'euc_jp': 'euc-jp', 'latin-10': 'iso-8859-16',
 'ascii': 'us-ascii', 'latin_10': 'iso-8859-16', 'latin_1': 'iso-8859-1',
 'latin_2': 'iso-8859-2', 'latin_3': 'iso-8859-3', 'latin_4': 'iso-8859-4',
 'latin_5': 'iso-8859-9', 'latin_6': 'iso-8859-10', 'latin_7':
 'iso-8859-13', 'latin_8': 'iso-8859-14', 'latin_9': 'iso-8859-15', 'cp949':
 'ks_c_5601-1987', 'euc_kr': 'euc-kr'}
 >>> for i in range(1, 16):
 ...     c = 'iso-8859-' + str(i)
 ...     email.charset.ALIASES[c] = c
 ...
 >>> iso7 = email.charset.Charset('iso-8859-7')
 >>> iso7
 iso-8859-7
 >>> str(iso7)
 'iso-8859-7'

 regards,

 gst.
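The email package also has a public helper for this, `email.charset.add_alias(alias, canonical)`, so the ALIASES dict doesn't have to be mutated by hand. A sketch, assuming that mapping the dashless codec names *to* the dashed ones is what's wanted: if I read the 2.7 Charset.__init__ correctly, the name goes through ALIASES after codecs.lookup() has run, so both spellings should come out dashed. The registration is process-global, so it would have to run before any Charset objects are created; where to hook it into Mailman is left open here.

```python
import email.charset
from email.charset import Charset

# ISO-8859 parts that have Python codecs (there is no part 12).
ISO_PARTS = list(range(1, 12)) + list(range(13, 17))

for n in ISO_PARTS:
    # add_alias() stores alias -> canonical in email.charset.ALIASES.
    # Charset() maps the name through ALIASES as its last step, so (as I
    # read the 2.7 sources) a dashless name produced by codecs.lookup()
    # is turned back into the dashed form.
    email.charset.add_alias('iso8859-%d' % n, 'iso-8859-%d' % n)

print(str(Charset('iso-8859-7')))   # iso-8859-7
print(str(Charset('iso8859-7')))    # iso-8859-7
```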

 



-- 
==
Stefanos Karasavvidis, Electronic & Computer Engineer, M.Sc.
e-mail: s...@isc.tuc.gr, Tel.: (+30) 2821037508, Fax: (+30) 2821037520
Technical University of Crete, Campus, Building A1
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: encoding name mappings in codecs.py with email/charset.py

2014-12-14 Thread Stefanos Karasavvidis
Thanks for replying, gst.

I've already thought of patching the Charset class, but hoped for a cleaner
solution.

This ALIASES dict already has all the ISO names *with* a dash, so the dash
must get stripped somewhere else.

sk

On Sun, Dec 14, 2014 at 7:21 PM, gst g.sta...@gmail.com wrote:

 On Friday, December 12, 2014 04:21:14 UTC-5, Stefanos Karasavvidis wrote:
  I've hit a wall with Mailman which seems to be caused by Python's
 character encoding names.
 
  I've narrowed the problem down to the email/charset.py file. Basically
 the following happens:
 

 Hi,

 It's all in the email.charset.ALIASES dict.

 You could also simply patch the __str__ method of Charset:

 Python 2.7.6 (default, Mar 22 2014, 22:59:56)
 [GCC 4.8.2] on linux2
 Type "copyright", "credits" or "license()" for more information.
 >>> import email.charset
 >>> c = email.charset.Charset('iso-8859-7')
 >>> str(c)
 'iso8859-7'
 >>> old = email.charset.Charset.__str__
 >>> def patched(self):
 ...     r = old(self)
 ...     # add the dash only if it is missing, so 'iso-2022-jp' is left alone
 ...     if r.startswith('iso') and not r.startswith('iso-'):
 ...         return 'iso-' + r[3:]
 ...     return r
 ...
 >>> email.charset.Charset.__str__ = patched
 >>> str(c)
 'iso-8859-7'
 


 regards,

 gst.
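Whether a __str__ patch is enough depends on how the sending code gets the name. If I read email/message.py correctly, Message.set_charset() builds the Content-Type parameter from get_output_charset() rather than from str(), so it's worth checking the header that is actually produced. A small check; the payload text is arbitrary:

```python
from email.charset import Charset
from email.message import Message

# Build a small text message with the charset under test and look at the
# Content-Type header that would actually be sent.
msg = Message()
msg.set_payload('hello')
msg.set_charset(Charset('iso-8859-7'))

# The fix is in effect if these show 'iso-8859-7' rather than 'iso8859-7'.
print(msg['Content-Type'])
print(msg.get_content_charset())
```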






encoding name mappings in codecs.py with email/charset.py

2014-12-12 Thread Stefanos Karasavvidis
I've hit a wall with Mailman which seems to be caused by Python's character
encoding names.

I've narrowed the problem down to the email/charset.py file. Basically, the
following happens:

Given an encoding name such as 'iso-8859-X', it is transformed to 'iso8859-X'
(without the first dash). This happens with Python 2.7, but not with Python
3.4. Microsoft Exchange doesn't accept the form without the dash, and
bounces the emails from Mailman. And Mailman doesn't work with Python 3.x.

This transformation is done in charset.py by the following line:
   input_charset = codecs.lookup(input_charset).name

The following code example demonstrates the issue:
   from email.charset import Charset
   charset = Charset('iso-8859-7')
   print(str(charset))

In Python 2.7, iso8859-7 is printed. In Python 3.4, iso-8859-7 is printed.
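As far as I can tell from the 2.7 sources, the codecs.lookup() call is only reached when the lowercased name is in neither ALIASES nor CHARSETS, and ALIASES is applied as the last step. That would explain why names already listed in CHARSETS (iso-8859-1 and iso-8859-15, if I remember the table right) keep their dash while iso-8859-7 loses it, and why adding keys to ALIASES helps; I believe Python 3.4's Charset no longer calls codecs.lookup() there at all. Below is a simplified model of that logic, not the stdlib code itself; `py27_charset_name()` and the two small dicts are hypothetical stand-ins:

```python
import codecs


def py27_charset_name(name, aliases, charsets):
    """Hypothetical, simplified model of how Python 2.7's
    email.charset.Charset.__init__ picks input_charset."""
    name = name.lower()
    # codecs.lookup() runs only for names email.charset doesn't know itself.
    if not (name in aliases or name in charsets):
        try:
            name = codecs.lookup(name).name  # 'iso-8859-7' -> 'iso8859-7'
        except LookupError:
            pass
    # ALIASES is applied last, whichever branch was taken.
    return aliases.get(name, name)


# Tiny stand-ins for email.charset.ALIASES / CHARSETS (hypothetical subsets).
ALIASES = {'latin-1': 'iso-8859-1'}
CHARSETS = {'iso-8859-1': None, 'iso-8859-15': None}

print(py27_charset_name('iso-8859-7', ALIASES, CHARSETS))  # iso8859-7
print(py27_charset_name('iso-8859-1', ALIASES, CHARSETS))  # iso-8859-1
```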

I tried to find where these mappings live in codecs.py, but it seems to use
some internal mapping that I couldn't find. I'm also not 100% sure that this
isn't OS-related.

So the question basically is whether there is a way to change the name
mappings that codecs performs.
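For what it's worth, my understanding is that in CPython the dashless name isn't stored in an alias table at all: Lib/encodings/__init__.py normalizes the requested name, looks it up in encodings.aliases.aliases to find a module, and the canonical name comes from the CodecInfo that module (here encodings/iso8859_7.py) returns. If that's right, editing the alias tables changes which module is found but not the name codecs.lookup() reports, and since it's all pure Python it shouldn't be OS-dependent. A quick way to watch the three steps; note that normalize_encoding() is an internal helper of the encodings package, not a documented API:

```python
import codecs
import encodings
import encodings.aliases

name = 'iso-8859-7'

# 1. The encodings package turns punctuation runs into underscores ...
norm = encodings.normalize_encoding(name)
# 2. ... then maps the result to a codec module name via the alias table ...
module = encodings.aliases.aliases.get(norm, norm)
# 3. ... and the codec module itself supplies the name that
#    codecs.lookup() reports.
canonical = codecs.lookup(name).name

print(norm)        # iso_8859_7
print(module)      # iso8859_7
print(canonical)   # iso8859-7
```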

My environment is Ubuntu 14.04
python2.7 --version
Python 2.7.6

python3.4 --version
Python 3.4.0
