This is a follow-up to the subject "right adjusted strings containing umlauts".
For some text manipulation tasks I need a template that splits lines from
stdin into a list of strings the way shlex.split() does. The encoding of
the input can vary. For further processing in Python I need the list of
strings to be unicode.

Here is template.py:

##############################################################################################################
#!/usr/bin/env python
# vim: set fileencoding=utf-8 :
# split lines from stdin into a list of unicode strings
# Muk 2013-08-23
# Python 2.7.3

from __future__ import print_function
import sys
import shlex
import chardet

bool_cmnt = True  # shlex: skip comments
bool_posx = True  # shlex: posix mode (strings in quotes)

for inpt_line in sys.stdin:
    print( 'inpt_line=' + repr( inpt_line ) )

    # chardet.detect() returns e.g. {'encoding': 'EUC-JP', 'confidence': 0.99};
    # 'encoding' can be None when no guess is possible.
    # Assumption: fall back to UTF-8 in that case.
    enco_type = chardet.detect( inpt_line )[ 'encoding' ] or 'utf-8'
    print( 'enco_type=' + repr( enco_type ) )

    try:
        # shlex does not work on unicode, so split the byte string first
        strg_inpt = shlex.split( inpt_line, bool_cmnt, bool_posx )
    except Exception as errr:  # usually 'No closing quotation'
        print( "error='%s' on inpt_line='%s'" % ( errr, inpt_line.rstrip() ), file=sys.stderr )
        continue
    print( 'strg_inpt=' + repr( strg_inpt ) )  # list of strings

    # decode the strings into unicode
    strg_unic = [ strg.decode( enco_type ) for strg in strg_inpt ]
    print( 'strg_unic=' + repr( strg_unic ) )  # list of unicode strings
##############################################################################################################

$ cat <some-file> | template.py

Comments are welcome.

TIA
-- 
Kurt Mueller
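
P.S. For comparison, here is a minimal sketch of a decode-first variant. It
is not the template above: it assumes Python 3 (where shlex.split() accepts
str, so the line can be decoded before splitting) and, instead of chardet,
it simply tries a fixed list of candidate encodings in order. The names
split_line and CANDIDATE_ENCODINGS are hypothetical, chosen only for this
illustration.

```python
import shlex

# Hypothetical candidate list. latin-1 accepts any byte sequence,
# so it must come last; it acts as the catch-all.
CANDIDATE_ENCODINGS = ("utf-8", "latin-1")


def split_line(raw_line, encodings=CANDIDATE_ENCODINGS):
    """Decode one raw input line (bytes) and split it shell-style.

    Returns a list of str. Raises ValueError if no candidate encoding
    fits or if shlex finds an unclosed quotation.
    """
    for encoding in encodings:
        try:
            text = raw_line.decode(encoding)
            break
        except UnicodeDecodeError:
            continue
    else:
        # the loop ended without a successful decode
        raise ValueError("no candidate encoding fits %r" % (raw_line,))
    # comments=True skips '#' comments, posix=True handles quoted strings
    return shlex.split(text, comments=True, posix=True)
```

It would be fed with bytes, for example from sys.stdin.buffer. Trying
encodings in order is cruder than statistical detection, but it is
predictable on short lines, where a detector has little to work with.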