On Sunday, 10 June 2018 17:29:59 UTC-4, Cameron Simpson wrote:
> On 10Jun2018 13:04, bellcanada...@gmail.com <bellcanada...@gmail.com> wrote:
> >here is the full error once again
> >to summarize, my script works fine in python2
> >i get this error trying to run it in python3
> >plz see below after the error, my settings for python 2 and python 3
> >for me it seems i need to change some settings to 'utf-8'..either just in
> >python 3, since thats where i am having issues or change the settings to
> >'utf-8' both in python 2 and 3....i would appreciate feedback b4 i do some
> >trial and error
> >thanks for the consideration
> >tommy
> >
> >***********************************************
> >Traceback (most recent call last):
> >File "createIndex.py", line 132, in <module>
> >c.createindex()
> >File "creatIndex.py", line 102, in createIndex
> >pagedict=self.parseCollection()
> >File "createIndex.py", line 47, in parseCollection
> >for line in self.collFile:
> >File "C:\Users\Robert\AppData\Local\Programs\Python\Python36\lib\encodings\cp1252.py", line 23, in decode
> >return codecs.charmap_decode(input,self.errors,decoding_table[0]
> >UnicodeDecodeError: 'charmap'codec can't decode byte 0x9d in position 7414:
> >character maps to <undefined>
>
> Ok, this is more helpful. It says that the decoding error, which occurred in
> ...\cp1252.py, was decoding lines from the file self.collFile.
>
> What is that file? And how was it opened?
>
> Also, your settings below may indeed be important.
>
> >***************************************************
> >python 3 settings
> >import sys
> >import locale
> >locale.getpreferredencoding()
> >'cp1252'
>
> The setting above is the default encoding used when you open a file in text
> mode in Python 3, but you can override it.
>
> In Python 3 this matters a lot, because Python 3 strings are Unicode.
> In Python 2, strings are just bytes, and are not "decoded" (there is a whole
> separate "unicode" type for that when it matters).
>
> So in Python 3 the text file reader is decoding the text in the file according
> to what it expects the encoding to be.
>
> Find the place where self.collFile is opened. You can specify the decoding
> method there by adding the "encoding=" parameter to the open() call. It is
> defaulting to "cp1252" because that is what locale.getpreferredencoding()
> returns, but presumably the actual file data are not encoded that way.
>
> You can (a) find out what encoding _is_ used in the file and specify that or
> (b) tell Python to be less picky. Choice (a) is better if it is feasible.
>
> If you have to guess because you don't know the encoding, one possibility is
> that collFile contains utf-8 or utf-16; of these 2, utf-8 seems more likely
> given the 0x9d byte causing the trouble. Try adding:
>
>   encoding='utf-8'
>
> to the open() call, eg:
>
>   self.collFile = open('path-to-the-coll-file', encoding='utf-8')
>
> at the appropriate place.
>
> If that just produces a different decoding error, you have 2 choices: pick an
> encoding where every byte is "valid", such as 'iso8859-1', or tell the
> decoder to just cope with the errors by adding the errors="replace" or
> errors="ignore" or errors="namereplace" parameter to the open() call.
>
> Both these choices have downsides.
>
> There are several ISO8859 encodings, and they might all be wrong for your file,
> leading to _incorrect_ text lines.
>
> The errors="..." parameter also has downsides: you will end up with
> missing (errors="ignore") or incorrect (errors="replace" or
> errors="namereplace") text, because the decoder has to do something with the
> data: drop it or replace it with something wrong. The former loses data while
> the latter puts in bad data, but at least it is visible if you inspect the data
> later.
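
The trade-offs described above can be seen on a small made-up sample. This is only a sketch: the sample text is an assumption (not the poster's file), chosen because a curly closing quote encodes in UTF-8 as the bytes E2 80 9D, and 0x9D has no mapping in Python's cp1252 codec. The helper name try_decode is hypothetical.

```python
# Illustrative sample, NOT the poster's data: U+201D (curly closing quote)
# becomes the bytes E2 80 9D in UTF-8, and cp1252 has no character for 0x9D.
raw = "say \u201chi\u201d".encode("utf-8")


def try_decode(data, encoding, errors="strict"):
    """Hypothetical helper: return the decoded text, or the exception raised."""
    try:
        return data.decode(encoding, errors=errors)
    except UnicodeDecodeError as exc:
        return exc


strict_cp1252 = try_decode(raw, "cp1252")        # fails on the 0x9d byte
as_utf8 = try_decode(raw, "utf-8")               # the intended text
as_latin1 = try_decode(raw, "iso8859-1")         # no error, but wrong characters
replaced = try_decode(raw, "cp1252", "replace")  # U+FFFD marks the bad byte
ignored = try_decode(raw, "cp1252", "ignore")    # bad byte silently dropped

results = [
    ("cp1252 strict", strict_cp1252),
    ("utf-8", as_utf8),
    ("iso8859-1", as_latin1),
    ("cp1252 replace", replaced),
    ("cp1252 ignore", ignored),
]
for label, result in results:
    # !a prints an ASCII-only repr, so this is safe on a cp1252 console too.
    print(f"{label:>15}: {result!a}")
```

Note that, as far as the open() documentation describes it, errors="namereplace" only applies when writing, so it is left out of this reading example. Also note that 'iso8859-1' decoding without an error does not mean the text is right: every byte maps to some character, so wrong guesses go unnoticed.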
>
> The full documentation for Python 3's open() call is here:
>
>   https://docs.python.org/3/library/functions.html#open
>
> where the various encoding= and errors= choices are described.
>
> Cheers,
> Cameron Simpson <c...@cskk.id.au>
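
For choice (a) in the message above, one rough way to narrow down the encoding is to read the file as bytes and see which candidate encodings decode it strictly. This is a minimal sketch under assumptions: the function name encodings_that_decode, the candidate list, and the temporary sample file are all made up for illustration and are not part of createIndex.py.

```python
import os
import tempfile


def encodings_that_decode(path, candidates=("utf-8", "cp1252")):
    """Return the candidate encodings that decode the whole file strictly."""
    with open(path, "rb") as f:  # binary mode: nothing is decoded here
        data = f.read()
    working = []
    for name in candidates:
        try:
            data.decode(name)  # errors="strict" is the default
        except UnicodeError:
            continue
        working.append(name)
    return working


# Stand-in for the collection file (assumed content, written as UTF-8 bytes).
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write("a \u201cquoted\u201d word\n".encode("utf-8"))
    sample_path = tmp.name

try:
    working = encodings_that_decode(sample_path)
    # Open in text mode with an explicit encoding instead of the locale default.
    with open(sample_path, encoding=working[0]) as coll_file:
        lines = list(coll_file)
finally:
    os.remove(sample_path)

print(working)
```

A clean decode is necessary but not sufficient: an encoding that accepts every byte will always "work", so it is still worth eyeballing the decoded text before trusting the result.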

thank you for the reply
let me try these tips and suggestions and i will update here
thanks a lot, and thanks also to all who post..i appreciate it..
regards
tommy
--
https://mail.python.org/mailman/listinfo/python-list