Re: right adjusted strings containing umlauts

Dave Angel Thu, 08 Aug 2013 10:53:38 -0700

Kurt Mueller wrote:

> Now I have this small example:
> ----------------------------------------------------------
> #!/usr/bin/env python
> # vim: set fileencoding=utf-8 :
>
> from __future__ import print_function
> import sys, shlex
>
> print( repr( sys.stdin.encoding ) )
>
> strg_form = u'{0:>3} {1:>3} {2:>3} {3:>3} {4:>3}'
> for inpt_line in sys.stdin:
>     proc_line = shlex.split( inpt_line, False, True, )
>     encoding = "utf-8"
>     proc_line = [ strg.decode( encoding ) for strg in proc_line ]
>     print( strg_form.format( *proc_line ) )
> ----------------------------------------------------------
>
> $ echo -e "a b c d e\na ö u 1 2" | file -
> /dev/stdin: UTF-8 Unicode text
> $ echo -e "a b c d e\na ö u 1 2" | ./align_compact.py
> None
>   a   b   c   d   e
>   a   ö   u   1   2
> $ echo -e "a b c d e\na ö u 1 2" | recode utf8..latin9 | file -
> /dev/stdin: ISO-8859 text
> $ echo -e "a b c d e\na ö u 1 2" | recode utf8..latin9 | ./align_compact.py
> None
>   a   b   c   d   e
> Traceback (most recent call last):
>   File "./align_compact.py", line 13, in <module>
>     proc_line = [ strg.decode( encoding ) for strg in proc_line ]
>   File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
>     return codecs.utf_8_decode(input, errors, True)
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xf6 in position 0: invalid start byte
> muk@mcp20:/sw/prog/scripts/text_manip>
>
> How do I handle these two inputs?
>


Once you're reading from a pipe, stdin is no longer a terminal, so
Python 2 has nothing to infer an encoding from. I'm not surprised you're
getting None for sys.stdin.encoding (it's an attribute, not a method).

So you can either do as others have suggested, and guess, or you can get
the information explicitly, say from argv.  In any case you'll need a
different way to assign encoding than hard-coding "utf-8".
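
A rough sketch of both options, in case it helps. The names pick_encoding,
decode_guess and align_lines are made up for illustration, and the candidate
list (UTF-8 first, then ISO-8859-15) is only an assumption about what your
inputs look like. It also splits on plain whitespace instead of using shlex,
to keep the example short.

```python
from __future__ import print_function

# Same layout as the original strg_form: five right-adjusted columns.
STRG_FORM = u'{0:>3} {1:>3} {2:>3} {3:>3} {4:>3}'


def pick_encoding(argv, default="utf-8"):
    """Explicit option: take the encoding name from argv[1] if present."""
    return argv[1] if len(argv) > 1 else default


def decode_guess(raw, candidates=("utf-8", "iso8859-15")):
    """Guessing option: return the first decode that succeeds.

    UTF-8 goes first because it rejects most non-UTF-8 input, while a
    single-byte codec such as ISO-8859-15 accepts any byte sequence.
    """
    for name in candidates:
        try:
            return raw.decode(name)
        except UnicodeDecodeError:
            pass
    raise ValueError("none of the candidate encodings fit")


def align_lines(byte_lines, encoding=None):
    """Decode each byte line, then right-adjust its five fields."""
    out = []
    for raw in byte_lines:
        # Use the given encoding if there is one, otherwise guess.
        text = raw.decode(encoding) if encoding else decode_guess(raw)
        out.append(STRG_FORM.format(*text.split()))
    return out
```

To wire this up you would pass the byte lines from stdin, for example
getattr(sys.stdin, "buffer", sys.stdin), which yields bytes on both
Python 2 and Python 3, together with pick_encoding(sys.argv). Keep in mind
that guessing can pick the wrong codec silently; the explicit argument
is the safer of the two.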


-- 
DaveA

-- 
http://mail.python.org/mailman/listinfo/python-list
