I am getting the error: UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 15: invalid start byte
as I try to read some files through TaggedCorpusReader. TaggedCorpusReader is a corpus reader class in NLTK. My files are saved in ANSI format, the MS-Windows default. I am using Python 2.7 on MS-Windows 7.

I have tried the following options so far:

    string.encode('utf-8').strip()
    unicode(string)
    unicode(str, errors='replace')
    unicode(str, errors='ignore')
    string.decode('cp1252')

None of them has helped. If anyone could suggest a fix, I would be grateful. I am still trying on my own in the meantime.
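
For reference, here is a minimal sketch of what byte 0x96 probably is, assuming "ANSI" here means Windows-1252 (cp1252), in which 0x96 is an en dash. The sample bytes and the helper names decode_ansi and reencode_file are made up for illustration. The NLTK line is left as a comment; it assumes the installed NLTK version accepts an encoding keyword on TaggedCorpusReader, and root and fileids stand for whatever is already being passed.

```python
# -*- coding: utf-8 -*-
import io

# Hypothetical sample bytes: 0x96 is the en dash in Windows-1252.
raw = b"1990\x962000/CD"


def decode_ansi(data, encoding="cp1252"):
    """Decode raw bytes from an ANSI-saved file into unicode text."""
    return data.decode(encoding)


def reencode_file(src, dst, src_encoding="cp1252"):
    """Read src as cp1252 and write the same text to dst as UTF-8."""
    with io.open(src, "r", encoding=src_encoding) as fin:
        text = fin.read()
    with io.open(dst, "w", encoding="utf-8") as fout:
        fout.write(text)


# Decoding the same bytes as UTF-8 fails, because 0x96 cannot start
# a UTF-8 sequence.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

text = decode_ansi(raw)

# Assumption: the reader is told the real encoding when it is created,
# instead of fixing strings after the read has already failed.
# reader = TaggedCorpusReader(root, fileids, encoding="cp1252")
```

If this assumption about the encoding holds, either passing the encoding to the reader or converting the files once with something like reencode_file should avoid the error. The calls tried above work on strings after the fact, while the failure seems to occur earlier, when the reader itself decodes the file as UTF-8.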