On Saturday, 13 April 2013 4:41:57 AM UTC+3, Cameron Simpson wrote:
> On 11Apr2013 09:55, Nikos <[email protected]> wrote:
>
> | On Thursday, 11 April 2013 1:45:22 PM UTC+3, Cameron Simpson wrote:
>
> | > On 10Apr2013 21:50, [email protected] <[email protected]> wrote:
>
> | > | The doctype is coming from the attempt of the script metrites.py to open and read the 'index.html' file.
>
> | > | But I don't know how to open it as a byte file instead of a text file.
>
>
>
> Lele Gaifax showed one way:
>
>
>
> from codecs import open
>
> with open('index.html', encoding='utf-8') as f:
>     content = f.read()
>
>
>
> But a plain open() should also do:
>
>
>
> with open('index.html') as f:
>     content = f.read()
>
>
>
> if you're not taking tight control of the file encoding.
>
>
>
> The point here is to get _text_ (i.e. str) data from the file, not bytes.
>
>
>
> If the text turns out to be incorrectly decoded (i.e. incorrectly
>
> reading the file bytes and assembling them into text strings) because
>
> the default encoding is wrong, then you may need to reach for Lele's
>
> more verbose open() example to select the correct encoding.
>
>
>
> But first ignore that and get text (str) instead of bytes.
>
> If you're already getting text from the file, something later is
>
> making bytes and handing it to print().
>
>
>
> Another approach to try is to use
>
> sys.stdout.write()
>
> instead of
>
> print()
>
>
>
> The print() function will take _anything_ and write text of some form.
>
> The write() function will throw an exception if it gets the wrong type of
> data.
>
>
>
> If sys.stdout is opened in binary mode then write() will require
>
> bytes as data; strings will need to be explicitly turned into bytes
>
> via .encode() in order to not raise an exception.
>
>
>
> If sys.stdout is open in text mode, write() will require str data.
>
> The sys.stdout file itself will transcribe to bytes for you.
>
>
>
> If you take that route, at least you will not have confusion about
>
> str versus bytes.
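A minimal sketch of the write() type check described above. It uses in-memory streams as stand-ins for sys.stdout in each mode; that substitution is an assumption made so the example runs anywhere, and the variable names are illustrative only.

```python
import io

# Stand-ins for sys.stdout in each mode (assumption: in-memory streams
# apply the same type checks as the real stream would).
binary_out = io.BytesIO()
text_out = io.TextIOWrapper(io.BytesIO(), encoding='utf-8')

text = 'υλικό'

# Text mode: str is accepted, bytes raises TypeError.
text_out.write(text)
try:
    text_out.write(text.encode('utf-8'))
    text_mode_rejects_bytes = False
except TypeError:
    text_mode_rejects_bytes = True

# Binary mode: bytes is accepted, str raises TypeError.
binary_out.write(text.encode('utf-8'))
try:
    binary_out.write(text)
    binary_mode_rejects_str = False
except TypeError:
    binary_mode_rejects_str = True

text_out.flush()
print(text_mode_rejects_bytes, binary_mode_rejects_str)
```

Swapping print() for write() in the real script should therefore fail loudly at the exact spot where the wrong type is handed over, instead of silently printing something odd.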
>
>
>
> For an HTML output page I would advocate arranging that sys.stdout
>
> is in text mode; that way you can do the natural thing and .write()
>
> str data and lovely UTF-8 bytes will come out the other end.
>
>
>
> If the above test (using .write() instead of print()) shows it to
>
> be in binary mode we can fix that. But you need to find out.
>
>
>
> You will want access to the error messages from the CGI environment;
>
> do you have access to the web server's error_log? You can tail that
>
> in a terminal while you reload the page to see what's going on.
>
>
>
> | This works in the shell, but doesn't work on my website:
>
> |
>
> | $ cat utf8.txt
>
> | υλικό!Πρόκειται γ
>
>
>
> Ok, so your terminal is using UTF-8 as its output encoding. (And so
>
> is your mail posting program, since we see it unmangled on my screen
>
> here.)
>
>
>
> | $ python3
>
> | Python 3.2.3 (default, Oct 19 2012, 20:10:41)
>
> | [GCC 4.6.3] on linux2
>
> | Type "help", "copyright", "credits" or "license" for more information.
>
> | >>> data = open('utf8.txt').read()
>
> | >>> print(data)
>
> | υλικό!Πρόκειται γ
>
>
>
> Likewise.
>
>
>
> However, in an exciting twist, I seem to recall that Python invoked
>
> interactively with a terminal as output will have the default terminal
>
> encoding in place on sys.stdout. Producing what you expect. _However_,
>
> python invoked in a batch environment where stdout is not a terminal
>
> (such as in the CGI environment producing your web page), that is
>
> _not_ necessarily the case.
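One way to stop depending on whatever encoding the environment picks is to wrap the binary stream yourself. The following is only a sketch: the helper name utf8_text_stream is made up for illustration, and it is demonstrated against an in-memory buffer; in a CGI script the argument would presumably be sys.stdout.buffer.

```python
import io
import sys

def utf8_text_stream(binary_stream):
    """Wrap a binary stream so that str written to it comes out as UTF-8."""
    # newline='\n' disables newline translation, so the bytes are predictable.
    return io.TextIOWrapper(binary_stream, encoding='utf-8',
                            newline='\n', line_buffering=True)

# Report what this process was actually given; the answer can differ
# between an interactive terminal and a CGI run.
print('stdout encoding:', getattr(sys.stdout, 'encoding', None), file=sys.stderr)

# Demonstration against an in-memory buffer (hypothetical stand-in for
# sys.stdout.buffer).
demo = io.BytesIO()
out = utf8_text_stream(demo)
out.write('υλικό\n')
out.flush()
```

The diagnostic line goes to stderr on purpose, so under a web server it would land in the error log and not in the page itself.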
>
>
>
> | >>> print(data.encode('utf-8'))
>
> | b'\xcf\x85\xce\xbb\xce\xb9\xce\xba\xcf\x8c!\xce\xa0\xcf\x81\xcf\x8c\xce\xba\xce\xb5\xce\xb9\xcf\x84\xce\xb1\xce\xb9 \xce\xb3\n'
>
> |
>
> | See, the last line is what I am getting on my website.
>
>
>
> The above line takes your Unicode text in "data" and transcribes
>
> it to bytes using UTF-8 as the encoding. And print() is then receiving
>
> that bytes object and printing its str() representation as "b'....'".
>
> That str is itself unicode, and when print passes it to sys.stdout,
>
> _that_ transcribes the unicode "b'...'" string as bytes to your
>
> terminal. Using UTF-8 based on the previous examples above, but
>
> since all those characters are in the bottom 128 code points (ASCII) the
>
> byte sequence will be the same if it uses ASCII or ISO8859-1 or
>
> almost anything else:-)
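The chain described above can be reproduced step by step. This is a small sketch using the same sample text as the shell session; the variable names are illustrative only.

```python
data = 'υλικό!Πρόκειται γ\n'

# Step 1: str -> bytes, as .encode('utf-8') does in the script.
encoded = data.encode('utf-8')

# Step 2: print() turns the bytes object into text. For bytes, str()
# gives the same text as repr(): the "b'...'" form.
shown = repr(encoded)
print(shown)

# Step 3: that text is pure ASCII, so every common encoding produces
# the same bytes on the way out.
only_ascii = all(ord(ch) < 128 for ch in shown)
same_bytes = (shown.encode('ascii')
              == shown.encode('utf-8')
              == shown.encode('iso8859-1'))

# Passing the str itself would have let the stream do the encoding once.
round_trip = encoded.decode('utf-8') == data
```

So a "b'\xcf\x85...'" string in the page source is a sign that bytes reached print() where a str was expected.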
>
>
>
> As you can see, there's a lot of encoding/decoding going on behind
>
> the scenes even in this superficially simple example.
>
>
>
> | If i remove
>
> | the encode('utf-8') part in metrites.py, the webpage will not show
>
> | anything at all...
>
>
>
> Ah, but data will be being output. The print() function _will_ be
>
> writing "data" out in some form. I suggest you remove the .encode()
>
> and then examine the _source_ text of the web page, not its visible
>
> form.
>
>
>
> So: remove .encode(), reload the web page, "view page source"
>
> (depends on your browser; it is Ctrl-U in Firefox, or Cmd-U in Firefox
>
> on a Mac).
>
>
>
> I think a lot of the issue you have in this thread is that your
>
> page is too complex. Make another page to do the same thing, and
>
> start with nothing. Add stuff to it a single item at a time until
>
> the page behaves incorrectly. Then you will know the exact item of
>
> code that introduced the issue. And then that single item can be
>
> examined in detail for the decode/encode issues.
>
>
>
> The other issue in the thread is that people losing patience get
>
> snarky. Respond only to the technical content. If a message is only
>
> snarky, _ignore_ it. People like the last word; let them have it
>
> and you won't get sidetracked into arguments.
>
>
>
> Cheers,
>
> --
>
> Cameron Simpson <[email protected]>
>
>
>
> PCs are like a submarine, it will work fine till you open Windows. - zollie101
First of all, thank you very much, Cameron, for your detailed help and the
effort you put into writing to me.
It seems another issue had crept in without my noticing: I was uploading files
to /root/public_html/cgi-bin instead of /home/nikos/public_html/cgi-bin.
I realized that when I deliberately introduced an error into the metrites.py
script and still got the same page.
Okay, with that corrected, I tried the plain open() solution and got this
response back from the shell:
Traceback (most recent call last):
  File "metrites.py", line 213, in <module>
    htmldata = f.read()
  File "/root/.local/lib/python2.7/lib/python3.3/encodings/iso8859_7.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0xae in position 47: character maps to <undefined>
Then I switched to:
with open('/home/nikos/www/' + page, encoding='utf-8') as f:
    htmldata = f.read()
and I got no error at all, just a clean run *from the shell*!
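For the record, the difference between the two runs can be reproduced without the file. This is a sketch under an assumption: I do not know which character sits at position 47 of the real file, so the registered-trademark sign below is only a hypothetical example of UTF-8 text that contains the byte 0xae.

```python
# Hypothetical file contents: one character whose UTF-8 form contains 0xae.
raw = '®'.encode('utf-8')          # two bytes: 0xc2 0xae

# Decoding with the Greek codec (the default in the traceback) fails,
# because byte 0xae has no character assigned in ISO 8859-7.
try:
    raw.decode('iso8859_7')
    greek_failed = False
except UnicodeDecodeError as exc:
    greek_failed = True
    print(exc)

# Decoding with the encoding the file was actually saved in works.
text = raw.decode('utf-8')
print(text)
```

Passing encoding='utf-8' to open() applies the second decode to the whole file, which would explain why the error went away.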
But I get an internal server error when I try to load the webpage from the
browser (Chrome).
So, can you please tell me where I can find the Apache error log, so that I
can post it here?
Is the Apache error_log always better than running 'python3 metrites.py',
because even if the Python script has no errors, Apache will also report more
web-related things?
Thank you, Cameron.
--
http://mail.python.org/mailman/listinfo/python-list