Re: UnicodeDecodeError issue

Ferrous Cranus Wed, 04 Sep 2013 04:42:31 -0700

On 4/9/2013 2:26 PM, Dave Angel wrote:

On 4/9/2013 04:35, Ferrous Cranus wrote:

On Monday, 2 September 2013 9:28:36 PM UTC+3, Dave Angel wrote:

On 2/9/2013 11:05, Ferrous Cranus wrote:

On 2/9/2013 3:21 PM, Dave Angel wrote:

Starting with the byte string in the error message:

f = open("junk.txt", "wb")  # binary mode, because the data is a byte string

f.write(b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2\n')

f.close()

Indeed, but yet again, 'file' checks the encoding of the filename that
holds the lines above, not of the actual strings.




'file' does nothing interesting with the filename, it just opens it and

examines the contents.  For example,



file www/cgi-bin/files.py



will examine the Python source file, not run it.



So first in the interpreter, I ran

f = open("junk.txt", "wb")  # binary mode, because the data is a byte string

f.write(b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2\n')

f.close()




then at the bash prompt, I ran:



davea@think2:~$ file junk.txt

junk.txt: ISO-8859 text



That is one Clever Idea Dave.

I take it that the charset of the file 'junk.txt' is identified from the
encoding of the characters read from within the file?


'file' only guesses the most likely encoding for 'junk.txt'.  But at
least it can know it's not utf-8, since that would give a decoding
error.
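
As a rough illustration of that kind of guess, here is a minimal sketch. It is not how 'file' is actually implemented; the function name rough_guess and its three labels are made up for this example. It only shows why invalid UTF-8 rules UTF-8 out while leaving the exact 8-bit encoding open.

```python
def rough_guess(data: bytes) -> str:
    """Hypothetical helper: a crude stand-in for an encoding guess."""
    try:
        data.decode("ascii")
        return "ASCII text"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "UTF-8 text"
    except UnicodeDecodeError:
        # Not valid UTF-8, so it must be some other (likely 8-bit) encoding.
        # Which one cannot be determined from the bytes alone.
        return "some 8-bit encoding (not UTF-8)"


# The byte string from the error message quoted in this thread.
DATA = (b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 '
        b'\xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2\n')

print(rough_guess(DATA))
```

The first byte, 0xb6, can never start a UTF-8 sequence, so the UTF-8 attempt fails immediately.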

That's why, whenever 'file' makes its verdict, it's up to you to check
it by displaying the data after decoding it with that tentative
encoding.
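
One way to do that check, sketched under the assumption that the two candidates worth trying are iso-8859-1 and iso-8859-7 (the candidate list is my choice, not something 'file' reports): decode the same bytes with each and look at which result is readable.

```python
# The byte string from the error message quoted in this thread.
DATA = (b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 '
        b'\xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2\n')

# Candidate encodings are an assumption made for this illustration.
candidates = ("iso-8859-1", "iso-8859-7")

# Decode with every candidate; both succeed, but only one is readable.
decoded = {name: DATA.decode(name) for name in candidates}

for name, text in decoded.items():
    print(name, "->", text.rstrip("\n"))
```

Both decodes succeed without error, which is exactly why the result has to be judged by eye: iso-8859-1 yields accented Latin characters and symbols, while iso-8859-7 yields Greek words.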


But wait a minute: What editor do you use to write these 3 lines?
I mean, I am a bit confused.


As I said right above, "in the interpreter, I ran"...
And if that's not clear enough, you can see the >>> prompts that the
Python interpreter uses.  By interpreter, I mean I ran Python with no
parameters.  I did not run IDLE or any other IDE that might take it
upon itself to interfere.


I, for example, run 'nano tets.py', which has within:

f = open("junk.txt", "wb")  # binary mode, because the data is a byte string
f.write(b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2\n')
f.close()

then, when I save the file within nano, for example by default in the utf-8 charset,


That's the encoding for the file tets.py, and you'll notice that it's
actually ASCII.  Notice that the string I copied from the error message
uses escape sequences for all non-ASCII bytes.
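
A small sketch of that point, using a shortened four-byte version of the literal (the variable names are made up for this example): the text typed into the source file is pure ASCII, while the bytes object it evaluates to is not.

```python
import ast

# The characters as they appear in the source file: backslash, x, two hex
# digits, and so on. A raw string keeps them exactly as typed.
source_line = r"b'\xb6\xe3\xed\xf9'"

# Encoding the source text as ASCII succeeds, so the .py file itself is ASCII
# (and therefore also valid UTF-8).
source_bytes = source_line.encode("ascii")

# Evaluating the literal produces the actual byte string, which contains
# values above 127.
value = ast.literal_eval(source_line)

print(len(source_bytes), "ASCII characters in the source")
print(len(value), "bytes in the resulting byte string")
print([hex(b) for b in value])
```

So 'file' run on the script sees only ASCII; the non-ASCII bytes exist only in what the script writes out.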


how would it be able to detect the bytestring within, which is supposed to be
in the Greek ISO encoding?


I wouldn't be running 'file' on the tets.py file, but on the junk.txt
file created when you run
     python tets.py

So since the tets.py file was a sidetrack, I just ran those three lines
in the interpreter.

I'm still confused about this.

Say we save those 3 lines inside junk.txt, and we save it by default as utf-8.

When we run 'file junk.txt',

what will file respond with?

The filename's charset?

or

will it look at the bytestring within to decide what encoding it uses?


--
Webhost <http://superhost.gr>
--
https://mail.python.org/mailman/listinfo/python-list
