Thanks David, it solved my problem immediately. I will follow your advice from now on, but honestly I am new to Python and don't know much about text encodings. The main part of my project was not about this, so I just wanted to get it solved, as I had already been stuck on it for 2 days. If you think my approach to getting problems solved is wrong, please let me know; your advice would help me in future.
--Thanks Again,
Akhil

Scott David Daniels wrote:
> 
> akhil1988 wrote:
> <mis-ordered reply, bits shown below>
> 
>> Nobody-38 wrote:
>>> On Thu, 16 Jul 2009 15:43:37 -0700, akhil1988 wrote:
> ...
>>>>> In Python 3 you can't decode strings because they are Unicode strings
>>>>> and it doesn't make sense to decode a Unicode string. You can only
>>>>> decode encoded things which are byte strings. So you are mixing up
>>>>> byte strings and Unicode strings.
>>>> ... I read a byte string from sys.stdin which needs to converted to
>>>> unicode string for further processing.
>>> In 3.x, sys.stdin (stdout, stderr) are text streams, which means that
>>> they read and write Unicode strings, not byte strings.
>>> 
>>>> I cannot just remove the decode statement and proceed?
>>>> This is it what it looks like:
>>>>     for line in sys.stdin:
>>>>         line = line.decode('utf-8').strip()
>>>>         if line == '<page>': #do something here
>>>>             ....
>>>> If I remove the decode statement, line == '<page>' never gets true.
>>> Did you inadvertently remove the strip() as well?
>> ... unintentionally I removed strip()....
>> I get this error now:
>>   File "./temp.py", line 488, in <module>
>>     main()
>>   File "./temp.py", line 475, in main
>>     for line in sys.stdin:
>>   File "/usr/local/lib/python3.1/codecs.py", line 300, in decode
>>     (result, consumed) = self._buffer_decode(data, self.errors, final)
>> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2:
>> invalid data
> 
> (1) Do not top post.
> (2) Try to fully understand the problem and proposed solution, rather
> than trying to get people to tell you just enough to get your code
> going.
> (3) The only way sys.stdin can possibly return unicode is to do some
> decoding of its own. your job is to make sure it uses the correct
> decoding.
> So, if you know your source is always utf-8, try
> something like:
> 
>     import sys
>     import io
> 
>     sys.stdin = io.TextIOWrapper(sys.stdin.detach(), encoding='utf8')
> 
>     for line in sys.stdin:
>         line = line.strip()
>         if line == '<page>':
>             #do something here
>             ....
> 
> --Scott David Daniels
> scott.dani...@acm.org
> --
> http://mail.python.org/mailman/listinfo/python-list
> 

-- 
View this message in context: http://www.nabble.com/UnicodeEncodeError%3A-%27ascii%27-codec-can%27t-encode-character-u%27%5Cxb7%27-in-position-13%3A-ordinal-not-in-range%28128%29-tp24509879p24550540.html
Sent from the Python - python-list mailing list archive at Nabble.com.
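A minimal sketch of the two points made in the quoted thread, assuming the input really is UTF-8 and written for a current Python 3 rather than the 3.1 used in the thread. In Python 3, `.decode()` exists only on `bytes`, so decoding by hand means iterating the binary buffer underneath `sys.stdin`. The alternative, as Scott suggests, is to wrap that buffer in `io.TextIOWrapper` with an explicit encoding and work with `str` lines directly. The names `lines_decoded_by_hand`, `lines_via_text_wrapper`, `count_pages` and `main` are hypothetical, and counting `<page>` lines is only a stand-in for the `#do something here` processing.

```python
import io
import sys
from typing import BinaryIO, Iterable, Iterator


def lines_decoded_by_hand(raw: BinaryIO) -> Iterator[str]:
    """Iterate a *binary* stream and decode each bytes line explicitly (assumes UTF-8)."""
    for raw_line in raw:  # raw_line is bytes, so .decode() exists here
        yield raw_line.decode("utf-8").strip()


def lines_via_text_wrapper(raw: BinaryIO) -> Iterator[str]:
    """Let io.TextIOWrapper do the decoding; the lines it yields are already str."""
    text = io.TextIOWrapper(raw, encoding="utf-8")
    for line in text:  # line is str: calling line.decode() would raise AttributeError
        yield line.strip()


def count_pages(lines: Iterable[str]) -> int:
    """Hypothetical stand-in for the real processing: count '<page>' marker lines."""
    return sum(1 for line in lines if line == "<page>")


def main() -> None:
    # Scott's approach: detach() returns the binary buffer underneath the
    # text-mode sys.stdin (sys.stdin itself is unusable afterwards).
    print(count_pages(lines_via_text_wrapper(sys.stdin.detach())))
```

To use it as a filter, call `main()` from a script under an `if __name__ == "__main__":` guard and pipe the file in. If the data is not actually UTF-8, both versions raise `UnicodeDecodeError`, much like the traceback in the thread; passing `errors="replace"` to `decode()` or to `TextIOWrapper` trades that error for U+FFFD replacement characters. Note that a `TextIOWrapper` closes the binary stream it wraps when it is closed or garbage-collected, which is harmless for a one-shot filter but matters if the buffer is reused.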