Re: Re: Re: The PIL show() method looks for the default viewer. How do I change this to a different viewer (of my choice)?

2018-05-29 Thread Peter Otten
Paul St George wrote:

> This is very helpful indeed, thank you. Awe-inspiring.
> 
> It occurred to me that I could edit the PIL/ImageShow.py, replacing ‘xv’
> (in five places) with the utility of my choice and using ‘executable’ as
> the command.
> 
> Or, is this just not done?

No, this tends to become a maintenance problem. Instead write a little 
module of your own


from PIL import ImageShow

class MyViewer(ImageShow.UnixViewer):
    def __init__(self, command):
        self.command = command

    def get_command_ex(self, file, **options):
        return (self.command,) * 2

ImageShow.register(MyViewer("gwenview"), -1)


(replace "gwenview" with your favourite viewer) and import it before using 
Image.show().
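
For example, a minimal usage sketch (assuming the snippet above is saved
as myviewer.py; the module and image file names here are just placeholders):

import myviewer  # imported only for its side effect of registering the viewer
from PIL import Image

Image.open("example.png").show()  # should now open in the registered viewer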

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: help

2018-05-29 Thread Steven D'Aprano
On Mon, 28 May 2018 15:43:45 +0530, Mutyala Veera Vijaya Teja wrote:

> Hello,
>   This is vijayteja am getting an error like ssl certificate
> failure when try to install packages.

You get an error *like* SSL certificate failure? Is it a secret what the 
error is? Would you like us to try to guess?

The best way to get help is to COPY AND PASTE (do not re-type from 
memory, do not paraphrase or summarise) the EXACT errors you are getting. 
We need to know *exactly* what you are trying to do (if you are using 
pip, copy and paste the EXACT pip command you are using, if something 
else, you need to tell us exactly what).

The only exception to the rule to show EXACT commands is that you should 
cross out passwords and sensitive usernames or other private data. E.g. 
if you have a password in the command, replace it with "***".

Please do not send screen shots as this mailing list will delete them. If 
you must provide a screen shot, post it on https://imgur.com/ and send us 
the link.

But you should not need to send a screen shot. Copy and paste any command 
you are running. DO NOT take a screen shot of the command.



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-29 Thread Peter J. Holzer
On 2018-05-23 08:43:02 +1000, Chris Angelico wrote:
> On Wed, May 23, 2018 at 8:31 AM, Peter J. Holzer  wrote:
> > On 2018-05-23 07:38:27 +1000, Chris Angelico wrote:
> >> > 1) For any given file it is almost always possible to find the correct
> >> >encoding (or *a* correct encoding, as there may be more than one).
> >>
> >> You can find an encoding which is capable of decoding a file. That's
> >> not the same thing.
> >
> > If the result is correct, it is the same thing.
> >
> > If I have an input file
> >
> > 4c 69 65 62 65 20 47 72 fc df 65 0a
> >
> > and I decode it correctly to
> >
> > Liebe Grüße
> >
> > it doesn't matter whether I used ISO-8859-1 or ISO-8859-2. The mapping
> > for all bytes in the input file is the same in both encodings.
> 
> Sure, but if you try it as ISO-8859-5 or  -7, you won't get an error,
> but you also won't get that string. So it DOES matter.

I get
Liebe Grќпe
or
Liebe Grόίe
which I can immediately recognize as wrong: They mix Cyrillic resp.
Greek letters with Latin letters in the same word which doesn't happen
in any natural language. Of course "Grќпe" could be a nickname in an
online forum (I've seen stranger names than that), but since "Liebe
Grüße" is a common German phrase it is much much more likely to the
correct interpretation. Also, a real file will usually contain more than
two words. So if the text is German it will contain more words with
umlauts and each byte which is part of a correctly spelled German word
when interpreted according to ISO-8859-1 increases the probability that
decoding with ISO-8859-1 will produce the correct result. There remains
a tiny probability that all those matches are mere coincidence, but I
wrote "almost always", not "always", so I can live with an error rate of
0.01% (or something like that).
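
(If you want to reproduce this, a quick Python 3 sketch of the example
above:)

data = bytes.fromhex("4c69656265204772fcdf650a")
for enc in ("iso-8859-1", "iso-8859-2", "iso-8859-5", "iso-8859-7"):
    print(enc, repr(data.decode(enc)))
# iso-8859-1 and -2 both give 'Liebe Grüße\n'; -5 and -7 also decode
# without error, but mix Cyrillic/Greek letters into the word.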

hp

-- 
   _  | Peter J. Holzer| we build much bigger, better disasters now
|_|_) || because we have much more sophisticated
| |   | h...@hjp.at | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson 


-- 
https://mail.python.org/mailman/listinfo/python-list


String encoding in Py2.7

2018-05-29 Thread ftg
Hello,
Using Python 2.7 (will switch to Py3 soon but Before I'd like to understand how 
string encoding worked)
Could you please tell me if I understood well what occurs in Python's mind:
in a .py file:
if I write s="héhéhé", if my file is declared as unicode coding, python will 
store in memory s='hx82hx82hx82'
however this is not yet unicode for python interpreter this is just raw bytes. 
Right? 
By the way, why 'h' is not turned into hexa value? Because it is already in the 
ASCII table?
If I want python interpreter to recognize my string as unicode I have to 
declare it as unicode s=u'héhéhé' and magically python will look for those 
hex values 'x82' in the Unicode table. Still OK?
Now: how come when I declare s='héhéhé', print(s) displays well 'héhéhé'? Is it 
because of my shell windows that is dealing well with unicode? Or is it 
because the print function is magic?

Thanks
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-29 Thread Peter J. Holzer
On 2018-05-23 06:03:38 +, Steven D'Aprano wrote:
> On Wed, 23 May 2018 00:31:03 +0200, Peter J. Holzer wrote:
> > On 2018-05-23 07:38:27 +1000, Chris Angelico wrote:
> >> You can find an encoding which is capable of decoding a file. That's
> >> not the same thing.
> > 
> > If the result is correct, it is the same thing.
> 
> But how do you know what is correct and what isn't? In the most general 
> case, even if you know the language nominally being used, you might not 
> be able to recognise good output from bad:
> 
> Max Steele strained his mighty thews against his bonds, but
> the §-rays had left him as weak as a kitten. The evil Galactic
> Emperor, Giµx-Õƒin The Terrible of the planet Œe∂¥, laughed: "I 
> have you now, Steele, and by this time tomorrow my armies will
> have overrun your pitiful Earth defences!"
> 
> If this text is encoding using MacRoman, then decoded in Latin-1, it 
> works, and looks barely any more stupid than the original:
> 
> Max Steele strained his mighty thews against his bonds, but
> the ¤-rays had left him as weak as a kitten. The evil Galactic
> Emperor, Giµx-ÍÄin The Terrible of the planet Îe¶´, laughed: "I
> have you now, Steele, and by this time tomorrow my armies will
> have overrun your pitiful Earth defences!"
> 
> but it clearly isn't the original text.

Please note that I wrote "almost always", not "always". It is of course
possible to construct contrived examples where it is impossible to find
the correct encoding, because all encodings lead to equally ludicrous
results.

I would still maintain that the kind of person who invents names like
this for their fanfic is also unlikely to be able to tell you what
encoding they used ("What's an encoding? I just clicked on 'Save'!").


> Mojibake is especially difficult to deal with when you are dealing with 
> short text snippets like file names or user names which can contain 
> arbitrary characters, where there is rarely any way to recognise the 
> "correct" string.

For single file names or user names, sure. But if you have a list of
them, there is still a high probability that many of them will contain
recognizable words which can be used to deduce the (or a) correct
encoding. (Unless it's from the Ministry of Silly Names).

hp

-- 
   _  | Peter J. Holzer| we build much bigger, better disasters now
|_|_) || because we have much more sophisticated
| |   | h...@hjp.at | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson 


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: String encoding in Py2.7

2018-05-29 Thread Thomas Jollans
On 2018-05-29 09:55, f...@lutix.org wrote:
> Hello,
> Using Python 2.7 (will switch to Py3 soon but Before I'd like to understand 
> how string encoding worked)

Oh dear. This is probably the exact wrong way to go about it: the
interplay between string encoding, unicode and bytes is much less clear
and easy to understand in Python 2.

> Could you please tell me is I understood well what occurs in Python's mind:
> in a .py file:
> if I write s="héhéhé", if my file is declared as unicode coding, python will 
> store in memory s='hx82hx82hx82'

No, it doesn't. At the very least, you're missing some backslashes – and
I don't know of any character encoding that uses 0x82 to encode é.

On my system, I see

>>> s = 'héhéhé'
>>> s
'h\xc3\xa9h\xc3\xa9h\xc3\xa9'

My system uses UTF-8. If your PC is set up to use an encoding like ISO
8859-15 or Windows-1252, you should see

'h\xe9h\xe9h\xe9'

The \x?? are just Python notation.

> however this is not yet unicode for python interpreter this is just raw 
> bytes. Right?

Right, this is a bunch of bytes:

>>> s
'h\xe9h\xe9h\xe9'
>>> [ord(c) for c in s]
[104, 233, 104, 233, 104, 233]
>>> [hex(ord(c)) for c in s]
['0x68', '0xe9', '0x68', '0xe9', '0x68', '0xe9']
>>>


> By the way, why 'h' is not turned into hexa value? Because it is already in 
> the ASCII table?

That's just how Python 2 likes to display stuff.

> If I want python interpreter to recognize my string as unicode I have to 
> declare it as unicode s=u'héhéhé' and magically python will look for those 
> hex values 'x82' in the Unicode table. Still OK?

In principle, the unicode table has nothing to do with anything here. It
so happens that for some characters in some encodings the value is equal
to the code point, but that's neither here nor there.

> Now: how come when I declare s='héhéhé', print(s) displays well 'héhéhé'? Is 
> it because of my shell windows that is dealing well with unicode? Or is it 
> because the print function is magic?

It's because the print statement is magic.

Actually, this *only* works if the encoding of your file matches the
default encoding required by your console. This is usually the case as
long as you stay on the same PC, but this assumption can fall apart
quite easily when you move code and data between systems, especially if
they use different operating systems or (human) languages.


Just use Python 3. There, the print function is not magic, which makes
life so much more logical.
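
A minimal Python 3 sketch of that separation:

s = "héhéhé"                   # str: a sequence of Unicode code points
print(len(s))                  # 6
b = s.encode("utf-8")          # bytes: depends on the chosen encoding
print(len(b), b)               # 9 b'h\xc3\xa9h\xc3\xa9h\xc3\xa9'
print(b.decode("utf-8") == s)  # True: decoding round-trips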


-- Thomas
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Indented multi-line strings (was: "Data blocks" syntax specification draft)

2018-05-29 Thread Peter J. Holzer
On 2018-05-23 11:08:48 -0600, Ian Kelly wrote:
> On Wed, May 23, 2018 at 10:25 AM, Peter J. Holzer  wrote:
> > How about this?
> >
> > x = """"
> > Here is a multi-line string
> > with
> >   indentation.
> > """"
> >
> > This would be equivalent to
> >
> > x = 'Here is a multi-line string\nwith\n  indentation.'
> >
> > Rules:
> >
> >  * The leading and trailing """" must be aligned vertically.
> 
> Ick, why?

To create an unambiguous left edge.

> What's wrong with letting the trailing delimiter be at the
> end of a line, or the beginning with no indentation?

If "no indentation" means "its indentation defines the left edge", so
that 

a_very_long_variable_name = """"
    A string.
  """"

is equivalent to "  A string.", I could live with that. The downside is
that the parser has to scan to the end of the string before it knows how
much whitespace to strip from each line. OTOH it makes consistent
indentation with tabs easier:

a_very_long_variable_name = """"
»···»···»···»···  A string.
»···»···»···»···""""

Are the quad-quotes aligned? It depends on how wide a tab is. (I used
»··· to visualize a tab)

If "no indentation literally means no indentation, like this:

a = foo()
b = """"
  A string.
""""
c = bar()

then the reason for not allowing this is that it subverts the reason for
proposing this feature (to have multiline strings which nicely align
with the indentation of the code and don't stick out to the left like a
sore thumb). 

Similarily, if "no indentation" means "no additional indentation
relative to the surrounding code", then reason is that in a multiline
statement, the continuation lines should be indented more than the first
line (seep PEP 8).

The trailing delimiter could be at the end of the line to signify that
there is no newline at the end of the string:

s = """"
  A string.""""
t = """"
  A string.
""""

would then be equivalent to 

s = '  A string.'
t = '  A string.\n'

Then the indentation of the first delimiter alone determines how much
white space is stripped. I think this looks untidy, though, and my rule
4 is more symmetrical.


> >  * The contents of the string must be indented at least as far as the
> >delimiters (and with consistent tabs/spaces).
> >This leading white space is ignored.
> >  * All the leading white space beyond this 'left edge' is preserved.
> >  * The newlines after the leading """" and before the trailing """" are
> >ignored, all the others preserved. (I thought about preserving the
> >trailing newline, but it is easier to add one than remove one.)
> 
> How about we instead just use the rules from PEP 257 so that there
> aren't two different sets of multi-line string indentation rules to
> have to remember?
> 
> https://www.python.org/dev/peps/pep-0257/#handling-docstring-indentation

These rules are nice for a specific application, but I think they are
too ad-hoc and not general enough for a language feature which should be
able to represent arbitrary strings.

In particular:

| will strip a uniform amount of indentation from the second and further
| lines of the docstring, equal to the minimum indentation of all
| non-blank lines after the first line

What if I want all lines to start with some white space? 

|  Any indentation in the first line of the docstring (i.e., up to the
|  first newline) is insignificant and removed.

What if I want the string to start with white space?

|  Blank lines should be removed from the beginning and end of the
|  docstring.

What if I want leading or trailing blank lines?
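
To make those objections concrete, here is a small sketch (not part of
any proposal) comparing the PEP-257-style trimming in inspect.cleandoc
with plain textwrap.dedent:

import inspect, textwrap

s = "\n    every line indented four spaces\n      this one six\n"
print(repr(inspect.cleandoc(s)))
# 'every line indented four spaces\n  this one six'
# -- the common indentation and the leading/trailing blank lines are gone
print(repr(textwrap.dedent(s)))
# '\nevery line indented four spaces\n  this one six\n'
# -- blank lines survive, but the common indentation still cannot be kept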


> Also, how about using a string prefix character instead of making
> quad-quote meaningful? Apart from being hard to visually distinguish
> from triple-quote, this would break existing triple-quote strings that
> happen to start with the quote character, e.g. ''''What?' she asked.'''

No confusion here, since in my proposal there is always a newline after
the leading delimiter, since otherwise the first line wouldn't line up
with the rest. So the parser would notice that this is a triple-quote
and not a quad-quote as soon as it sees the "W". 

A prefix might still be a good idea, but .. see below.


> I don't know if 'i' would be the right prefix character for this, but
> it's unused and is short for 'indented':

'i' is fine by me.

> b = i'''
> Here is a multi-line string
> with indentation, which is
> determined from the second
> line.'''

Visually, that letter doesn't look like a part of the quote, so I would
like to pull the contents of the string over to align with the quote:

b = i'''
 Here is a multi-line string
 with indentation, which is
 determined from the second
 line.'''

But that creates an ambiguity: Is the whole string now indented one
space or not? Where is the left edge?

hp

-- 
   _  | Peter J. Holzer| we build much big

Re: how to handle captcha through machanize module or any module

2018-05-29 Thread Peter J. Holzer
On 2018-05-24 09:59:14 +1000, Ben Finney wrote:
> If you are attempting to fool a CAPTCHA with an automated tool, you are
> entering an arms race against those who design the CAPTCHA to *prevent*
> exactly what you're doing.
> 
> Any technique someone can describe to fool the CAPTCHA, will most likely
> already be considered as part of the design of the more effective
> CAPTCHAs, and so the technique will still not actually work reliably.

And any technique that someone can describe to fool programs will most
likely already be considered by those who write programs to break
captchas, and so the technique will still not actually work reliably.


> So, there is no general answer, other than to stop thinking that's a
> race that you can win.

I agree that there is no *general* answer. For any specific captcha,
there probably is a way to break it automatically, and possibly with
higher reliability than a human can (many captchas are hard and
frustrating for humans).

It *is* an arms race, and who wins depends on where the break-even
point between effort and value is for the defender and the attacker.

hp

-- 
   _  | Peter J. Holzer| we build much bigger, better disasters now
|_|_) || because we have much more sophisticated
| |   | h...@hjp.at | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson 


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-29 Thread Chris Angelico
On Tue, May 29, 2018 at 6:15 PM, Peter J. Holzer  wrote:
> So if the text is German it will contain more words with
> umlauts and each byte which is part of a correctly spelled German word
> when interpreted according to ISO-8859-1 increases the probability that
> decoding with ISO-8859-1 will produce the correct result. There remains
> a tiny probability that all those matches are mere coincidence, but I
> wrote "almost always", not "always", so I can live with an error rate of
> 0.01% (or something like that).

That's basically what the chardet module does, and its error rate is
far FAR higher than that. If you think it's easy to detect encodings,
I'm sure the chardet maintainers will be happy to accept pull
requests!
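
For reference, the automated guess looks roughly like this (requires the
third-party chardet package; the file name is a placeholder):

import chardet

with open("mystery.txt", "rb") as f:
    raw = f.read()
guess = chardet.detect(raw)  # e.g. {'encoding': 'ISO-8859-1', 'confidence': 0.73, ...}
print(guess)
text = raw.decode(guess["encoding"] or "ascii", errors="replace")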

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: The PIL show() method looks for the default viewer. How do I change this to a different viewer (of my choice)?

2018-05-29 Thread Peter J. Holzer
On 2018-05-28 06:34:30 +0200, Christian Gollwitzer wrote:
> I think this is a bug/misfeature in the PIL code. On all 3 major platforms
> there is a way to invoke the standard program for a given file or URL. On
> Windows, it is "cmd.exe /c start ...", on OSX it is "open ..." and on Linux
> it is "xdg-open ...". That way the file is opened by whatever the user has
> set in his desktop environment.
> 
> Technically, xdg-open needs not to be present on Linux, though it is usually
> installed.

xv and display don't need to be installed either. In fact, xv is
unlikely to be installed (it's non-free (shareware) and hasn't been
maintained in ages[1]). Display is part of imagemagick, which is also
optional (though quite likely to be installed - among other things CUPS
depends on it). And display isn't very useful either (for example, there
doesn't seem to be a way to scale down large images to the size of the
display).

OTOH, xdg-open is part of the free desktop specification, so if you have
any kind of linux desktop, it is very probable that you have xdg-open
and that it is reasonably configured.

I agree that show should call xdg-open preferentially and maybe fall
back to display. And of course silently ignoring a documented parameter
is clearly a bug (if it's true - I notice that the Pillow docs don't mention
that parameter and the original PIL seems to be unmaintained).
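
In the meantime, a workaround sketch (this assumes a Linux desktop with
xdg-open on PATH; the image file name is a placeholder):

import subprocess, tempfile
from PIL import Image

def show_with_xdg_open(image):
    # keep the temporary file, because the viewer is started asynchronously
    with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp:
        image.save(tmp, "PNG")
    subprocess.Popen(["xdg-open", tmp.name])

show_with_xdg_open(Image.open("example.png"))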

hp

[1] Which is a pity, because I loved it.


-- 
   _  | Peter J. Holzer| we build much bigger, better disasters now
|_|_) || because we have much more sophisticated
| |   | h...@hjp.at | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson 


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-29 Thread Chris Angelico
On Tue, May 29, 2018 at 6:34 PM, Peter J. Holzer  wrote:
> On 2018-05-23 06:03:38 +, Steven D'Aprano wrote:
>> Mojibake is especially difficult to deal with when you are dealing with
>> short text snippets like file names or user names which can contain
>> arbitrary characters, where there is rarely any way to recognise the
>> "correct" string.
>
> For single file names or user names, sure. But if you have a list of
> them, there is still a high probability that many of them will contain
> recognizable words which can be used to deduce the (or a) correct
> encoding. (Unless it's from the Ministry of Silly Names).

Ohh... are you assuming that, in a list of file names, all of them use
the same encoding? Ah, yes, well, that WOULD make it easier, wouldn't
it. Sadly, not the case.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: String encoding in Py2.7

2018-05-29 Thread Fabien LUCE
May 29 2018 11:12 AM, "Thomas Jollans"  wrote:
> On 2018-05-29 09:55, f...@lutix.org wrote:
> 
>> Hello,
>> Using Python 2.7 (will switch to Py3 soon but Before I'd like to understand 
>> how string encoding
>> worked)
> 
> Oh dear. This is probably the exact wrong way to go about it: the
> interplay between string encoding, unicode and bytes is much less clear
> and easy to understand in Python 2.

Ok I will quickly jump into py3 then.

> 
>> Could you please tell me is I understood well what occurs in Python's mind:
>> in a .py file:
>> if I write s="héhéhé", if my file is declared as unicode coding, python will 
>> store in memory
>> s='hx82hx82hx82'
> 
> No, it doesn't. At the very least, you're missing some backslashes – and
> I don't know of any character encoding that using 0x82 to encode é.
> 
Surprisingly, the backslashes were removed from my initial text...
OK, so the stored raw bytes are the ones produced by the system encoding. If my
console were UTF-8 I would have the same raw byte string as you.

> On my system, I see
> 
> >>> s = 'héhéhé'
> >>> s
> 
> 'h\xc3\xa9h\xc3\xa9h\xc3\xa9'
> 
> My system uses UTF-8. If your PC is set up to uses an encoding like ISO
> 8859-15 or Windows-1252, you should see
> 
> 'h\xe9h\xe9h\xe9'
> 
> The \x?? are just Python notation.
> 
>> however this is not yet unicode for python interpreter this is just raw 
>> bytes. Right?
> 
> Right, this is a bunch of bytes:
> 
> >>> s
> 
> 'h\xe9h\xe9h\xe9'
> 
> >>> [ord(c) for c in s]
> 
> [104, 233, 104, 233, 104, 233]
> 
> >>> [hex(ord(c)) for c in s]
> 
> ['0x68', '0xe9', '0x68', '0xe9', '0x68', '0xe9']
> 
 
>> 
>> By the way, why 'h' is not turned into hexa value? Because it is already in 
>> the ASCII table?
> 
> That's just how Python 2 likes to display stuff.
> 
>> If I want python interpreter to recognize my string as unicode I have to 
>> declare it as unicode
>> s=u'héhéhé' and magically python will look for those
>> hex values 'x82' in the Unicode table. Still OK?
> 
> In principle, the unicode table has nothing to do with anything here. It
> so happens that for some characters in some encodings the value is equal
> to the code point, but that's neither here nor there.
> 
>> Now: how come when I declare s='héhéhé', print(s) displays well 'héhéhé'? Is 
>> it because of my shell
>> windows that is dealing well with unicode? Or is it
>> because the print function is magic?
> 
> It's because the print statement is magic.
> 
> Actually, this *only* works if the encoding of your file matches the
> default encoding required by your console. This is usually the case as
> long as you stay on the same PC, but this assumption can fall apart
> quite easily when you move code and data between systems, especially if
> they use different operating systems or (human) languages.
> 
> Just use Python 3. There, the print function is not magic, which makes
> life so much more logical.

Thanks
> 
> -- Thomas
> --
> https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: String encoding in Py2.7

2018-05-29 Thread Chris Angelico
On Tue, May 29, 2018 at 5:55 PM,   wrote:
> Hello,
> Using Python 2.7 (will switch to Py3 soon but Before I'd like to understand 
> how string encoding worked)
> Could you please tell me is I understood well what occurs in Python's mind:
> in a .py file:
> if I write s="héhéhé", if my file is declared as unicode coding, python will 
> store in memory s='hx82hx82hx82'
> however this is not yet unicode for python interpreter this is just raw 
> bytes. Right?
> By the way, why 'h' is not turned into hexa value? Because it is already in 
> the ASCII table?
> If I want python interpreter to recognize my string as unicode I have to 
> declare it as unicode s=u'héhéhé' and magically python will look for those
> hex values 'x82' in the Unicode table. Still OK?
> Now: how come when I declare s='héhéhé', print(s) displays well 'héhéhé'? Is 
> it because of my shell windows that is dealing well with unicode? Or is it
> because the print function is magic?

What actually happens is this:

1) Your file contains bytes. Hopefully, your editor will follow the
same encoding that you've declared for your file, but that's not
guaranteed.
2) The string contains bytes. If you ask for the representation of
that string, some of them will be shown as characters, but that 'h' is
exactly the same as "\x68".
3) Printing that string sends those bytes to your console.

If EVERYTHING is using the exact same encoding, it all appears to work
correctly. But if your editor saves the file as UTF-8 and your console
is set to Codepage 1252, you'll get a nonsense.

When you use the u-prefix string literal, here's what happens:

1) As above, your file contains bytes, hopefully in the encoding that
you've declared.
2) The Python interpreter reads those bytes using the encoding you
declared. The string contains Unicode characters.
3) Those characters really are characters. They are LATIN SMALL LETTER
H and LATIN SMALL LETTER E WITH ACUTE.
4) Printing that string causes those characters to be sent to your
console in the best way possible for Unicode characters (Python and
the console can negotiate that between them).

If you switch to Python 3 and remove the file's encoding declaration,
here's what happens:

1) Your file contains bytes. Your editor should use UTF-8 because it's
the most logical default; check for this but it's probably going to
happen without any effort.
2) The Python interpreter reads those bytes and understands them as
UTF-8. Your string, like all your source code, contains Unicode
characters.
3) As in the second case, you have those two characters.
4) As above, Python and the console negotiate the best way to display text.

It's a LOT easier. All you have to do is make sure everything's using
UTF-8, and then the defaults will just work. For most text editors,
you don't need to think about this at all, because they're already
configured that way.

As a bonus: Since, in Python 3, *all* your source code is Unicode
text, you can actually switch things around.

>>> s="héhéhé"
>>> héhéhé="s"
>>> print(s)
héhéhé
>>> print(héhéhé)
s

Yep, that's non-ASCII letters in a variable name. Doesn't bother
Python 3 at all!

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: cProfile, timed call tree

2018-05-29 Thread Peter J. Holzer
On 2018-05-26 07:38:09 +0200, dieter wrote:
> But, in general, you are right: you cannot reconstruct complete
> call trees. The reason is quite simple: maintaining information
> for the complete caller ancestry (rather than just the immediate
> caller) is expensive (both in terms of runtime and storage).
> Profiling usually is used as a preparation for optimization.
> Optimization has the greatest effects if applied to inner loops.
> And for the analysis of inner loops, complete call tree information
> is not necessary.

I disagree. I have used Tim Bunce's excellent perl profiler
(Devel::NYTProf, for the two people here who also use Perl and don't
already know it), which does record whole call trees, and this is very
useful. You not only see that a function is called 1.5 million times,
you also see where it is called (not just from which functions, but from
which lines) and how much time is spent in calls from each location. 
Often this allowed me find ways to avoid calling a function altogether
or prevented me from chasing down the wrong rabbit hole.
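
For what it's worth, cProfile does record immediate callers, which pstats
can display -- just not full call paths. A sketch (the script entry point
and function name are placeholders):

import cProfile, pstats

cProfile.run("main()", "out.prof")  # profile some entry point
stats = pstats.Stats("out.prof")
stats.sort_stats("cumulative").print_callers("hot_function")
# lists the functions that called hot_function, with per-caller counts
# and times -- but only one level up the stack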

Sometimes it is also useful to find out how your code works, when you
get back to it after a few months ;-).

hp


-- 
   _  | Peter J. Holzer| we build much bigger, better disasters now
|_|_) || because we have much more sophisticated
| |   | h...@hjp.at | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson 


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-29 Thread Peter J. Holzer
On 2018-05-29 19:46:24 +1000, Chris Angelico wrote:
> On Tue, May 29, 2018 at 6:15 PM, Peter J. Holzer  wrote:
> > So if the text is German it will contain more words with
> > umlauts and each byte which is part of a correctly spelled German word
> > when interpreted according to ISO-8859-1 increases the probability that
> > decoding with ISO-8859-1 will produce the correct result. There remains
> > a tiny probability that all those matches are mere coincidence, but I
> > wrote "almost always", not "always", so I can live with an error rate of
> > 0.01% (or something like that).
> 
> That's basically what the chardet module does, and its error rate is
> far FAR higher than that. If you think it's easy to detect encodings,
> I'm sure the chardet maintainers will be happy to accept pull
> requests!

We were talking about humans, not programs.

hp

-- 
   _  | Peter J. Holzer| we build much bigger, better disasters now
|_|_) || because we have much more sophisticated
| |   | h...@hjp.at | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson 


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-29 Thread Peter J. Holzer
On 2018-05-29 19:47:37 +1000, Chris Angelico wrote:
> On Tue, May 29, 2018 at 6:34 PM, Peter J. Holzer  wrote:
> > On 2018-05-23 06:03:38 +, Steven D'Aprano wrote:
> >> Mojibake is especially difficult to deal with when you are dealing with
> >> short text snippets like file names or user names which can contain
> >> arbitrary characters, where there is rarely any way to recognise the
> >> "correct" string.
> >
> > For single file names or user names, sure. But if you have a list of
> > them, there is still a high probability that many of them will contain
> > recognizable words which can be used to deduce the (or a) correct
> > encoding. (Unless it's from the Ministry of Silly Names).
> 
> Ohh... are you assuming that, in a list of file names, all of them use
> the same encoding? Ah, yes, well, that WOULD make it easier, wouldn't
> it. Sadly, not the case.

Not in general, but it *IS* the case we were talking about here. The
task is to find *an* encoding which can be used to decode *a* file. This
of course assumes that such an encoding exists. If there are several
encodings in the same file (I use "file" loosely here), then there is no
single encoding which can be used to decode it, so the task is
impossible. (You may still be able to split the file into chunks where
each chunk uses a specific encoding and determine that, but this is a
different task - and one for which the solution "ask the source" is even
less likely to work.)

hp

-- 
   _  | Peter J. Holzer| we build much bigger, better disasters now
|_|_) || because we have much more sophisticated
| |   | h...@hjp.at | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson 


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-29 Thread Chris Angelico
On Tue, May 29, 2018 at 8:09 PM, Peter J. Holzer  wrote:
> On 2018-05-29 19:46:24 +1000, Chris Angelico wrote:
>> On Tue, May 29, 2018 at 6:15 PM, Peter J. Holzer  wrote:
>> > So if the text is German it will contain more words with
>> > umlauts and each byte which is part of a correctly spelled German word
>> > when interpreted according to ISO-8859-1 increases the probability that
>> > decoding with ISO-8859-1 will produce the correct result. There remains
>> > a tiny probability that all those matches are mere coincidence, but I
>> > wrote "almost always", not "always", so I can live with an error rate of
>> > 0.01% (or something like that).
>>
>> That's basically what the chardet module does, and its error rate is
>> far FAR higher than that. If you think it's easy to detect encodings,
>> I'm sure the chardet maintainers will be happy to accept pull
>> requests!
>
> We were talking about humans, not programs.
>

Sure, but you're describing a set of rules. If you can define a set of
rules that pin down the encoding, you could teach chardet to follow
those rules. If you can't teach chardet to follow those rules, you
can't teach a human to follow them either. What is the human going to
do? Guess?

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: String encoding in Py2.7

2018-05-29 Thread Steven D'Aprano
On Tue, 29 May 2018 09:19:52 +, Fabien LUCE wrote:

> May 29 2018 11:12 AM, "Thomas Jollans"  wrote:
>> On 2018-05-29 09:55, f...@lutix.org wrote:
>> 
>>> Hello,
>>> Using Python 2.7 (will switch to Py3 soon but Before I'd like to
>>> understand how string encoding worked)
>> 
>> Oh dear. This is probably the exact wrong way to go about it: the
>> interplay between string encoding, unicode and bytes is much less clear
>> and easy to understand in Python 2.
> 
> Ok I will quickly jump into py3 then.

While I applaud this decision -- the latest Python 3.x series is much 
better than 2.7 -- please don't imagine that moving to Python 3 will 
eliminate all encoding issues, especially when dealing with real-world 
data that comes to you in a mix of weird and often broken encodings.

Python 3 eliminates one common source of problems: unlike Python 2, it 
won't try to guess what you mean when you combine bytes and Unicode 
text. In Python 2, that worked for the simple cases, and was often 
convenient, but at the cost of leading to hard to diagnose and hard to 
fix errors in the complex cases. Python 3 no longer guesses, which means 
you have to be more diligent in converting bytes to text and vice versa.
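
A tiny Python 3 illustration of that diligence:

data = b"caf\xc3\xa9"        # bytes, e.g. read from a file or socket
# data + "!"                 # TypeError: no silent coercion in Python 3
text = data.decode("utf-8")  # explicit: you say which encoding you mean
print(text + "!")            # café!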

Also, it has to be said that Python 3 makes one use-case harder: mixed 
binary bytes plus ASCII text. (Or so I've been told.)

But for the common case where you have human readable text in Unicode, 
and machine readable bytes in hex bytes, and can keep them separate, 
Python 3 is much better.

I recommend you start with reading these if you haven't already:

https://nedbatchelder.com/text/unipain.html

https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-
software-developer-absolutely-positively-must-know-about-unicode-and-
character-sets-no-excuses/

Sorry for the huge URL, try this if your mail client breaks it: 
https://tinyurl.com/h8yg9d7




-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: String encoding in Py2.7

2018-05-29 Thread Chris Angelico
On Tue, May 29, 2018 at 8:39 PM, Steven D'Aprano
 wrote:
> On Tue, 29 May 2018 09:19:52 +, Fabien LUCE wrote:
>
>> May 29 2018 11:12 AM, "Thomas Jollans"  wrote:
>>> On 2018-05-29 09:55, f...@lutix.org wrote:
>>>
 Hello,
 Using Python 2.7 (will switch to Py3 soon but Before I'd like to
 understand how string encoding worked)
>>>
>>> Oh dear. This is probably the exact wrong way to go about it: the
>>> interplay between string encoding, unicode and bytes is much less clear
>>> and easy to understand in Python 2.
>>
>> Ok I will quickly jump into py3 then.
>
> While I applaud this decision -- the latest Python 3.x series is much
> better than 2.7 -- please don't imagine that moving to Python 3 will
> eliminate all encoding issues, especially when dealing with real-world
> data that comes to you in a mix of weird and often broken encodings.
>
> Python 3 eliminates one common source of problems: unlike Python 2, it
> won't try to guess what you mean when you combine bytes and Unicode
> text. In Python 2, that worked for the simple cases, and was often
> convenient, but at the cost of leading to hard to diagnose and hard to
> fix errors in the complex cases. Python 3 no longer guesses, which means
> you have to be more diligent in converting bytes to text and vice versa.

Python 3 eliminates a number of common sources of problems; in fact,
it eliminates a large number of problems. But you're right that it's
no panacea, since there cannot ever be a perfect solution.

> Also, it has to be said that Python 3 makes one use-case harder: mixed
> binary bytes plus ASCII text. (Or so I've been told.)

Early versions of Py3 yes, but the latest versions have had features
added that restore this to its Py2 simplicity (for ASCII
specifically).
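
One example of those features is the %-interpolation on bytes that came
back in Python 3.5 (PEP 461), for ASCII-compatible wire formats:

header = b"Content-Length: %d\r\n" % 42
print(header)                       # b'Content-Length: 42\r\n'
print(b"%s-%s" % (b"abc", b"def"))  # b'abc-def'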

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-29 Thread Peter J. Holzer
On 2018-05-29 20:28:54 +1000, Chris Angelico wrote:
> On Tue, May 29, 2018 at 8:09 PM, Peter J. Holzer  wrote:
> > On 2018-05-29 19:46:24 +1000, Chris Angelico wrote:
> >> That's basically what the chardet module does, and its error rate is
> >> far FAR higher than that. If you think it's easy to detect encodings,
> >> I'm sure the chardet maintainers will be happy to accept pull
> >> requests!
> >
> > We were talking about humans, not programs.
> >
> 
> Sure, but you're describing a set of rules. If you can define a set of
> rules that pin down the encoding, you could teach chardet to follow
> those rules. If you can't teach chardet to follow those rules, you
> can't teach a human to follow them either. What is the human going to
> do? Guess?

Xkcd to the rescue:

https://xkcd.com/1425/

There are a lot of things which are easy to do for a human (recognize a
bird, understand a sentence), but very hard to write a program for
(mostly because we don't understand how our brain works, I think).

I haven't looked in detail at how chardet works, but it looks like it has a
few simple statistical models for the probability of characters and
character sequences. This is very different from what a human does, who
a) recognises whole words, and b) knows what they mean and whether they
make sense in context.

For a sufficiently narrow range of texts, you can write a program which
is better at recognizing encoding or language than any human can (As an
obvious improvement to chardet, you could supply it with dictionaries of
all languages). However, in the general case that would need a strong
AI. And we aren't there yet, by far.
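
A toy version of that dictionary idea (the word list and the candidate
encodings are placeholders, and this is nothing like what chardet
actually does):

GERMAN_WORDS = {"liebe", "grüße", "und", "das", "nicht"}

def score(raw, encoding, words=GERMAN_WORDS):
    try:
        text = raw.decode(encoding)
    except UnicodeDecodeError:
        return -1
    return sum(w.strip(".,!?").lower() in words for w in text.split())

raw = "Liebe Grüße".encode("latin-1")
for enc in ("latin-1", "iso-8859-5", "iso-8859-7"):
    print(enc, score(raw, enc))  # latin-1 scores highest here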

hp

-- 
   _  | Peter J. Holzer| we build much bigger, better disasters now
|_|_) || because we have much more sophisticated
| |   | h...@hjp.at | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson 


signature.asc
Description: PGP signature
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-29 Thread Chris Angelico
On Tue, May 29, 2018 at 8:59 PM, Peter J. Holzer  wrote:
> On 2018-05-29 20:28:54 +1000, Chris Angelico wrote:
>> Sure, but you're describing a set of rules. If you can define a set of
>> rules that pin down the encoding, you could teach chardet to follow
>> those rules. If you can't teach chardet to follow those rules, you
>> can't teach a human to follow them either. What is the human going to
>> do? Guess?
>
> Xkcd to the rescue:
>
> https://xkcd.com/1425/
>
> There are a lot of things which are easy to do for a human (recognize a
> bird, understand a sentence), but very hard to write a program for
> (mostly because we don't understand how our brain works, I think).
>
> I haven't looked in detail at how chardet works, but it looks like it has a
> few simple statistical models for the probability of characters and
> character sequences. This is very different from what a human does, who
> a) recognises whole words, and b) knows what they mean and whether they
> make sense in context.
>
> For a sufficiently narrow range of texts, you can write a program which
> is better at recognizing encoding or language than any human can (As an
> obvious improvement to chardet, you could supply it with dictionaries of
> all languages). However, in the general case that would need a strong
> AI. And we aren't there yet, by far.

I would go further. Some things are merely beyond current technology
(the "is it a bird" example is just now coming within reach), while
others are fundamentally impossible. Here's a challenge: Go through a
collection of usernames and identify the language that they were
derived from. Some of them are arbitrary collections of letters and
have no "base language". Others are concatenations of words, not
individual words. A few are going to be mash-ups. Others might be
reversed or otherwise mangled. Okay. Now figure out how to pronounce
those, because that depends on the language.

Impossible? Yep. Now replace "language" with "encoding" and it's still
just as impossible. Sometimes you'll get it wrong and it won't matter
(because the end result of your guess is the same as the end result of
the actual encoding), but other times it will matter.

You can always solve a subset of problems. Using your own knowledge of
German, you are able to better solve problems involving German text.
But that doesn't make you any better than chardet at validating
Chinese text, or Korean text, or Klingon text, or any other language
you don't know. In fact, you are WORSE than a computer, because a
computer can be programmed to be fluent in six million forms of
communication, where a human is notable with six. (My apologies if you
happen to know Chinese, Korean, or Klingon. Pick other languages.)
Suppose you were to teach a machine all your tricks for understanding
German text - but someone else teaches the same machine how to
understand other languages too. We're right back where we started,
unable to recognize which language something is. Or needing external
information about the language in order to better guess the encoding.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-29 Thread Peter J. Holzer
On 2018-05-29 21:13:43 +1000, Chris Angelico wrote:
> You can always solve a subset of problems. Using your own knowledge of
> German, you are able to better solve problems involving German text.
> But that doesn't make you any better than chardet at validating
> Chinese text, or Korean text, or Klingon text, or any other language
> you don't know.

But I don't have to. Chardet has to be reasonably good at identifying
any encoding. I only have to be good at identifying the encoding of
files which I need to import (or otherwise process).

Please go back to the original posting. The poster has one file which he
wants to read, and asked how to determine the encoding. He was told
categorically that this is impossible and he must ask the source.

THIS is what I'm responding to, not the problem of finding a generic
solution which works for every possible file.

The OP has one file. He wants to read it. The very fact that he wants to
read this particular file makes it very likely that he knows something
about the contents of the file. So he has domain knowledge. Which makes
it very likely that he can distinguish a correct from an incorrect
decoding. He probably can't distinguish Korean poetry from a Vietnamese
shopping list, but his file probably isn't either.

hp

-- 
   _  | Peter J. Holzer| we build much bigger, better disasters now
|_|_) || because we have much more sophisticated
| |   | h...@hjp.at | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson 


-- 
https://mail.python.org/mailman/listinfo/python-list


ANN: EuroScipy 2018

2018-05-29 Thread Valerio Maggio
*** Apologies if you receive multiple copies ***

Dear Colleagues,

We are delighted to invite you to join us for the
**11th European Conference on Python in Science**.

The EuroSciPy 2018 (https://www.euroscipy.org/2018/) Conference
will be organised by Fondazione Bruno Kessler (FBK)
and will take place from August 28 to September 1 in **Trento, Italy**.

The EuroSciPy meeting is a cross-disciplinary gathering focused on the use
and development of the Python language in scientific research.
This event strives to bring together both users and developers of
scientific tools, as well as academic research and state of the art
industry.

The conference will be structured as follows:
Aug, 28-29 : Tutorials and Hands-on
Aug, 30-31 : Main Conference
Sep, 1 : Sprint


TOPICS OF INTEREST:
===

Presentations of scientific tools and libraries using the Python language,
including but not limited to:
- Algorithms implemented or exposed in Python
- Astronomy
- Data Visualisation
- Deep Learning & AI
- Earth, Ocean and Geo Science
- General-purpose Python tools that can be of special interest to the
scientific community.
- Image Processing
- Materials Science
- Parallel computing
- Political and Social Sciences
- Project Jupyter
- Reports on the use of Python in scientific achievements or ongoing
projects.
- Robotics & IoT
- Scientific data flow and persistence
- Scientific visualization
- Simulation
- Statistics
- Vector and array manipulation
- Web applications and portals for science and engineering
- 3D Printing


CALL FOR PROPOSALS:
===

EuroScipy will accept three different kinds of contributions:

* Regular Talks: standard talks for oral presentations, allocated in time
slots of 15, or 30 minutes, depending on your preference and scheduling
constraints. Each time slot considers a Q&A session at the end of the talk
(at least, 5 mins).

* Hands-on Tutorials: These are beginner or advanced training sessions to
dive
into the subject with all details. These sessions are 90 minutes long,
and the audience will be strongly encouraged to bring a laptop to
experiment.

* Poster: EuroScipy will host two poster sessions during the two days of
Main Conference. So attendees and students are highly encouraged to present
their work and/or preliminary results as posters.

Proposals should be submitted using the EuroScipy submission system at
https://pretalx.com/euroscipy18.

**Submission deadline is May, 31st 2018**.


REGISTRATION & FEES:


To register to EuroScipy 2018, please go to
http://euroscipy2018.eventbrite.co.uk or to http://www.euroscipy.org/2018/

Fees:
-

| Tutorials                   | Student* | Academic/Individual | Industry |
|-----------------------------|----------|---------------------|----------|
| Early Bird (till July, 1st) | € 50,00  | € 70,00             | € 125,00 |
| Regular (till Aug, 5th)     | € 100,00 | € 110,00            | € 250,00 |
| Late (till Aug, 22nd)       | € 135,00 | € 135,00            | € 300,00 |


| Main Conference             | Student* | Academic/Individual | Industry |
|-----------------------------|----------|---------------------|----------|
| Early Bird (till July, 1st) | € 50,00  | € 70,00             | € 125,00 |
| Regular (till Aug, 5th)     | € 100,00 | € 110,00            | € 250,00 |
| Late (till Aug, 22nd)       | € 135,00 | € 135,00            | € 300,00 |

* A proof of student status will be required at time of the registration.


kind regards,
Valerio

EuroScipy 2018 Organising Committee,
Email: i...@euroscipy.org
Website: http://www.euroscipy.org/2018
twitter: @euroscipy
-- 
https://mail.python.org/mailman/listinfo/python-list


EuroPython 2018: Financial Aid Program

2018-05-29 Thread M.-A. Lemburg
As part of our commitment to the Python community, we are pleased to
announce that we offer special grants for people in need of a
financial aid to attend EuroPython 2018.

We offer financial aid conference grants in these 3 categories:

- Free and discounted ticket: Get a standard ticket for the conference
  for free (including access to conference days (Wed-Fri), Beginners’
  Day workshop and sprints.). Note: training passes are NOT included
  in the free conference ticket.

- Travel costs: We will cover the travel costs pro rata, depending on
  what you are applying for.

- Accommodation: We can partially cover your accommodation costs

Grant Eligibility
-

Our grants are open to all people in need of financial aid. We will
specifically take into account the following criteria in the selection
process:

- Contributors: Potential speakers/trainers of EuroPython (people who
  submitted a proposal) and all who contribute to EuroPython and/or
  Python community projects.

- Economic factors: We want everybody to have a chance to come to
  EuroPython, regardless of economic situation or income level.

- Diversity: We seek the most diverse and inclusive event possible.

How to apply


You can apply for financial aid by filling the form on the EuroPython
2018 Finance Aid page:

 https://ep2018.europython.eu/en/registration/financial-aid/

If you have any questions, please read the FAQ or send an email to
fin...@europython.eu

Timeline


- June 5th (2018-06-05)  - the deadline for submitting the applications
- June 12th (2018-06-12) - applicants will be notified by e-mail around this date
- June 17th (2018-06-17) - deadline for applicants to accept the grant
- June 20th (2018-06-20) - applicants will receive confirmation notifications
- July 23rd (2018-07-23) - last day when we accept invoices

Refund management
-

Free ticket: The individual coupons will be generated for a free
ticket.

Accommodation and Travel grant: All grants involving reimbursements
will be reimbursed by PayPal or bank transfer. Please send us your
receipts (hotel invoice, plane/bus/train ticket) before the conference
for approval.

Become a special Finance Aid and Diversity sponsor!

You or your company can support our finaid initiative by becoming a sponsor.

We have a special sponsor package “Financial aid sponsor” and a
“Financial aid donation” option that can be booked separately or in
combination with a sponsor package:

- https://ep2018.europython.eu/en/sponsor/packages/#Financial-Aid-Sponsor
- https://ep2018.europython.eu/en/sponsor/options/

For more information please contact the sponsor work group at
sponsor...@europython.eu.

Bring us a new sponsor and get free ticket for EuroPython 2018!


Help spread the word


Please help us spread this message by sharing it on your social
networks as widely as possible. Thank you !

Link to the blog post:

https://blog.europython.eu/post/174368323052/europython-2018-financial-aid-program

Tweet:

https://twitter.com/europython/status/1001437296816742401


Enjoy,
--
EuroPython 2018 Team
https://ep2018.europython.eu/
https://www.europython-society.org/

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-29 Thread Steven D'Aprano
On Tue, 29 May 2018 10:34:50 +0200, Peter J. Holzer wrote:

> On 2018-05-23 06:03:38 +, Steven D'Aprano wrote:
>> On Wed, 23 May 2018 00:31:03 +0200, Peter J. Holzer wrote:
>> > On 2018-05-23 07:38:27 +1000, Chris Angelico wrote:
>> >> You can find an encoding which is capable of decoding a file. That's
>> >> not the same thing.
>> > 
>> > If the result is correct, it is the same thing.
>> 
>> But how do you know what is correct and what isn't? 
[...]
>> If this text is encoding using MacRoman, then decoded in Latin-1, it
>> works, and looks barely any more stupid than the original:
>> 
>> Max Steele strained his mighty thews against his bonds, but the
>> ¤-rays had left him as weak as a kitten. The evil Galactic Emperor,
>> Giµx-ÍÄin The Terrible of the planet Îe¶´, laughed: "I have you
>> now, Steele, and by this time tomorrow my armies will have overrun
>> your pitiful Earth defences!"
>> 
>> but it clearly isn't the original text.
> 
> Please note that I wrote "almost always", not "always". It is of course
> possible to construct contrived examples where it is impossible to find
> the correct encoding, because all encodings lead to equally ludicrous
> results.

Whether they are ludicrous is not the point; the point is whether it is
the text that was originally intended.

What you describe works for the EASY cases: you have a small number of 
text files in some human-readable language, the text files are all valid 
texts in that language, and you have an expert in that language on hand 
able to distinguish between such valid and invalid decoded texts.

If that applies for your text files, great, you have nothing to fear from 
encoding issues! Even if the supplier of the files wouldn't know ASCII 
from EBCDIC if it fell on them from a great height, you can probably make 
an educated guess what the encoding is. Wonderful.

But that's not always the case. In the real world, especially now that we 
interchange documents from all over the world, it isn't the hard cases 
that are contrived. Depending on the type of document (e.g. web pages you 
scrape are probably different from emails, which are different from 
commercial CSV files...) being able to just look at the file and deduce 
the correct encoding is the contrived example.

Depending on where the text is coming from:

- you might not have an expert on hand who can distinguish between
  valid and invalid text;

- you might have to process a large number of files (thousands or
  millions) automatically, and cannot hand-process those that have
  encoding problems;

- your files might not even be in a single consistent encoding, or
  may have Mojibake introduced at some earlier point that you do not
  have control over;

- you might not know what language the text is supposed to be;

- or it might contain isolated words in some unknown language;

  e.g. your text might be nearly all ASCII English, except for a word
  "Čezare" (if using the Czech Kamenický encoding) or "Çezare" (if
  using the Polish Mazovia encoding) or "Äezare" (Mac Roman).

  How many languages do you need to check to determine which is 
  correct? (Hint: all three words are valid.)

- not all encoding problems are as equally easy to resolve as
  your earlier German/Russian example.

E.g. Like Japanese, Russian has a number of incompatible and popular 
encodings. Mojibake is a Japanese term, but the Russians have their own 
word for it: krakozyabry (кракозя́бры).
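
(A two-line illustration: the same bytes decode without error in both of
the popular Russian encodings, but only one gives the original word back.)

word = "привет"
raw = word.encode("koi8-r")
print(raw.decode("koi8-r"))  # привет
print(raw.decode("cp1251"))  # no error, but pure krakozyabry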


Dealing with bad data is *hard*.

https://www.safaribooksonline.com/library/view/bad-data-handbook/9781449324957/ch04.html


-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Indented multi-line strings (was: "Data blocks" syntax specification draft)

2018-05-29 Thread Ian Kelly
On Tue, May 29, 2018 at 3:19 AM, Peter J. Holzer  wrote:
> On 2018-05-23 11:08:48 -0600, Ian Kelly wrote:
>>
>> How about we instead just use the rules from PEP 257 so that there
>> aren't two different sets of multi-line string indentation rules to
>> have to remember?
>>
>> https://www.python.org/dev/peps/pep-0257/#handling-docstring-indentation
>
> These rules are nice for a specific application, but I think they are
> too ad-hoc and not general enough for a language feature which should be
> able to represent arbitrary strings.
>
> In particular:
>
> | will strip a uniform amount of indentation from the second and further
> | lines of the docstring, equal to the minimum indentation of all
> | non-blank lines after the first line
>
> What if I want all lines to start with some white space?
>
> |  Any indentation in the first line of the docstring (i.e., up to the
> |  first newline) is insignificant and removed.
>
> What if I want the string to start with white space?
>
> |  Blank lines should be removed from the beginning and end of the
> |  docstring.
>
> What if I want leading or trailing blank lines?

Fair points. I still dislike reinventing the wheel here. Note that
even as I proposed reusing the single existing indentation-removal
scheme in the language, I misremembered a few things about how it
works.

>> Also, how about using a string prefix character instead of making
>> quad-quote meaningful? Apart from being hard to visually distinguish
>> from triple-quote, this would break existing triple-quote strings that
>> happen to start with the quote character, e.g. ''''What?' she asked.'''
>
> No confusion here, since in my proposal there is always a newline after
> the leading delimiter, since otherwise the first line wouldn't line up
> with the rest. So the parser would notice that this is a triple-quote
> and not a quad-quote as soon as it sees the "W".

Then how about a triple-quote string that starts with a quote
character followed by a newline?

>> b = i'''
>> Here is a multi-line string
>> with indentation, which is
>> determined from the second
>> line.'''
>
> Visually, that letter doesn't look like a part of the quote, so I would
> like to pull the contents of the string over to align with the quote:
>
> b = i'''
>  Here is a multi-line string
>  with indentation, which is
>  determined from the second
>  line.'''
>
> But that creates an ambiguity: Is the whole string now indented one
> space or not? Where is the left edge?

I don't follow. In the first case you have a multi-line string where
every line is indented four spaces, so four spaces are to be removed
from every line. In the second case you have a multi-line string where
every line is indented by five spaces, so five spaces are to be
removed from every line. What about the second string would make the
algorithm think that four spaces are to be removed from every line,
leaving one? Why not remove three, leaving two? Or remove one, leaving
four? And why is the first string safe from this?

In any case, Chris made a good point that I agree with. This doesn't
really need to be syntax at all, but could just be implemented as a
new string method.
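
(For what it's worth, the stdlib already ships something close to that 
hypothetical method: textwrap.dedent strips the whitespace prefix common to 
all lines, though the first line and any trailing blank lines are still your 
problem. A minimal sketch:)

import textwrap

b = textwrap.dedent("""\
    Here is a multi-line string
    with indentation, which is
    removed because it is common
    to every line.""")
print(b)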
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: '_' and '__'

2018-05-29 Thread Mike McClain
To the many who responded, many thanks.
I,too, found Nick Coghlan's answer iluminating.
Mike
--
There are always gossips everywhere you go and few of them
limit themselves to veracity when what they consider a good
story is available to keep their audience entertained.
- MM
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Re: The PIL show() method looks for the default viewer. How do I change this to a different viewer (of my choice)?

2018-05-29 Thread Paul St George

Should the PIL code be corrected?


On 28/05/2018 06:34, Christian Gollwitzer wrote:

Am 27.05.18 um 23:58 schrieb Cameron Simpson:

On 27May2018 20:15, Paul St George  wrote:

This is very helpful indeed, thank you. Awe-inspiring.

It occurred to me that I could edit the PIL/ImageShow.py, replacing 
‘xv’ (in five places) with the utility of my choice and using 
‘executable’ as the command.


Or, is this just not done?


It becomes a maintenance problem.

Alternatively you could:

Just write your own show function which accepts an Image and displays 
it with your program of choice. You might need to write some 
equivalent code which saves the Image to a file first, and removes it 
afterwards.


You could copy the show() code into a function of your own (i.e. in 
your own codebase), modify that to suit, then monkeypatch the class:


  Image.show = your_personal_show_function

when your programme starts. That way the code changes are not in the 
PIL code.


I think this is a bug/misfeature in the PIL code. On all 3 major 
platforms there is a way to invoke the standard program for a given 
file or URL. On Windows, it is "cmd.exe /c start ...", on OSX it is 
"open " and on Linux it is "xdg-open ...". That way the file is 
opened by whatever the user has set in his desktop environment.


Technically, xdg-open need not be present on Linux, though it is 
usually installed.
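
(For illustration only, a sketch of the dispatch described above; it uses 
os.startfile rather than spawning cmd /c start, but the intent is the same.)

import os
import subprocess
import sys

def open_with_default_viewer(path):
    # Hand the file to whatever the desktop environment associates with it.
    if sys.platform.startswith("win"):
        os.startfile(path)                   # Windows default association
    elif sys.platform == "darwin":
        subprocess.call(["open", path])      # macOS
    else:
        subprocess.call(["xdg-open", path])  # most Linux desktops, if xdg-utils is installed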


Christian



--
Paul St George
http://www.paulstgeorge.com
http://www.devices-of-wonder.com

+44(0)7595 37 1302

--
https://mail.python.org/mailman/listinfo/python-list


Re: Re: The PIL show() method looks for the default viewer. How do I change this to a different viewer (of my choice)?

2018-05-29 Thread Paul St George

Thank you. For the advice, and for the new word 'monkeypatch'.


On 27/05/2018 23:58, Cameron Simpson wrote:

On 27May2018 20:15, Paul St George  wrote:

This is very helpful indeed, thank you. Awe-inspiring.

It occurred to me that I could edit the PIL/ImageShow.py, replacing 
‘xv’ (in five places) with the utility of my choice and using 
‘executable’ as the command.


Or, is this just not done?


It becomes a maintenance problem.

Alternatively you could:

Just write your own show function which accepts an Image and displays 
it with your program of choice. You might need to write some 
equivalent code which saves the Image to a file first, and removes it 
afterwards.


You could copy the show() code into a function of your own (i.e. in 
your own codebase), modify that to suit, then monkeypatch the class:


 Image.show = your_personal_show_function

when your programme starts. That way the code changes are not in the 
PIL code.


Cheers,
Cameron Simpson 
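
(A concrete sketch of that monkeypatch, with an assumed viewer command of 
"display"; the temp-file handling here is my own guess, not Cameron's code.)

import os
import subprocess
import tempfile

from PIL import Image

def show_with_viewer(im, title=None, command="display"):
    # Save to a temporary file, hand it to an external viewer, then clean up.
    fd, path = tempfile.mkstemp(suffix=".png")
    os.close(fd)
    im.save(path)
    try:
        subprocess.call([command, path])  # blocks until the viewer exits
    finally:
        os.remove(path)

Image.Image.show = show_with_viewer  # monkeypatch the Image class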



--
Paul St George
http://www.paulstgeorge.com
http://www.devices-of-wonder.com

+44(0)7595 37 1302

--
https://mail.python.org/mailman/listinfo/python-list


Re: Re: The PIL show() method looks for the default viewer. How do I change this to a different viewer (of my choice)?

2018-05-29 Thread Paul St George

I tried this anyway. The error was:

    non-keyword arg after keyword arg
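
That is Python telling you that a positional argument cannot follow a keyword 
argument. Spelled with keywords only, the call would look like the sketch 
below (whether Pillow then actually honours the command argument is exactly 
the open question in this thread; the file name is a placeholder):

from PIL import Image

im = Image.open("example.png")          # hypothetical file
im.show(title=None, command="display")  # keywords only, so no syntax error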



On 27/05/2018 21:51, Dennis Lee Bieber wrote:

On Sun, 27 May 2018 19:59:41 +0200, Paul St George 
declaimed the following:


So, on Unix I would use

Image.show(title=None, nameofdisplayutility), or Image.show(title=None,
scriptname) #where script with name scriptname invokes the program

I will try this now! And thank you.


That is based upon my interpretation of the documentation... If the
other participant is right, however, then the "command" parameter may not
even be getting used at the lower levels, and only the list of registered
viewers is examined. In that case one would have to register a function to
invoke the viewer.




--
Paul St George
http://www.paulstgeorge.com
http://www.devices-of-wonder.com

+44(0)7595 37 1302

--
https://mail.python.org/mailman/listinfo/python-list


Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-05-29 Thread Steven D'Aprano
On Tue, 29 May 2018 14:04:19 +0200, Peter J. Holzer wrote:

> The OP has one file. 

We don't know that. All we know is that he had one file which he was 
unable to read. For all we know, he has a million files, and this was 
merely the first of many failures.


> He wants to read it. The very fact that he wants to
> read this particular file makes it very likely that he knows something
> about the contents of the file. So he has domain knowledge.

An unjustified assumption. I've wanted to read many files with only the 
vaguest guess of what they might contain.

As for his domain knowledge, look again at the OP's post. His solution 
was to paper over the error, make the error go away, by moving to Python 
2 which is more lax about getting the encoding right:

"i actually got my script to function by running it in python 2.7"

So he didn't identify the correct encoding, nor did he use an error 
handler, or fix the bug in his code. He just downgraded to an older 
version of Python, because it made the exception (but not the error) go 
away.

My prediction is that he has replaced an explicit exception with a silent 
failure, preferring mojibake to actually dealing with the problem.
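
(For completeness, the Python 3 options he skipped look roughly like this; 
the file name and encoding are placeholders:)

# errors="replace" maps undecodable bytes to U+FFFD instead of raising;
# errors="surrogateescape" preserves the raw bytes for later re-encoding.
# Neither fixes a wrong encoding guess, but both keep the damage visible.
with open("data.csv", encoding="cp1252", errors="replace") as f:
    text = f.read()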


-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Re: Re: Re: The PIL show() method looks for the default viewer. How do I change this to a different viewer (of my choice)?

2018-05-29 Thread Paul St George
Is there, somewhere, a list of viewers and their names (for the purposes 
of this script)?


I am assuming that if I want to use ImageMagick (for example), there would 
be some shorter name - such as 'magick' - and it would be lower case.
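
As far as I know there is no central registry: the string is simply the name 
of an executable on your PATH (ImageMagick's viewer, for instance, is 
traditionally launched as "display"). A small sketch for checking what is 
installed; the candidate names are guesses, not an official list:

import shutil

for name in ("display", "eog", "gwenview", "feh", "xdg-open"):
    path = shutil.which(name)  # None if the executable is not on PATH
    print(name, "->", path or "not installed")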



On 29/05/2018 08:58, Peter Otten wrote:

Paul St George wrote:


This is very helpful indeed, thank you. Awe-inspiring.

It occurred to me that I could edit the PIL/ImageShow.py, replacing ‘xv’
(in five places) with the utility of my choice and using ‘executable’ as
the command.

Or, is this just not done?

No, this tends to become a maintenance problem. Instead write a little
module of your own


from PIL import ImageShow

class MyViewer(ImageShow.UnixViewer):
 def __init__(self, command):
 self.command = command
 def get_command_ex(self, file, **options):
 return (self.command,) * 2

ImageShow.register(MyViewer("gwenview"), -1)


(replace "gwenview" with your favourite viewer) and import it before using
Image.show().




--
Paul St George
http://www.paulstgeorge.com
http://www.devices-of-wonder.com

+44(0)7595 37 1302

--
https://mail.python.org/mailman/listinfo/python-list


[ANN] pluggable-info-monitor 0.2.1 released!

2018-05-29 Thread George Fischhof
Hi everyone,

I’m very excited to announce the release of pluggable-info-monitor 0.2.1
First public release.

You can download it from Bitbucket:
https://bitbucket.org/GeorgeFischhof/pluggable_info_monitor

package index page:
https://pypi.python.org/pypi/pluggable-info-monitor


What is pluggable-info-monitor?

A web application that shows the information you gathered with your plugin.
It can be anything ;-) examples:
- in a development environment, bug statistics, build and test results
- in education, some educational material
- in an office it can show the weather forecast, name days, daily quotes
- it can be used as a dashboard for system administrators
- etc

There are example plugins to help developing your own plugins.


Please note:

The full feature set requires Python 3.4 or later.



Have fun using pluggable-info-monitor
George
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: The PIL show() method looks for the default viewer. How do I change this to a different viewer (of my choice)?

2018-05-29 Thread Steven D'Aprano
On Tue, 29 May 2018 20:02:22 +0200, Paul St George wrote:

> Is there, somewhere, a list of viewers and their names (for the purposes
> of this script)?

Do you mean a list of programs capable of viewing graphics? Do you think 
there is some sort of central authority that registers the names of all 
such programs? *wink*


You can start here:

https://en.wikipedia.org/wiki/Category:Graphics_software

but that's probably the closest you're going to get.



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson

-- 
https://mail.python.org/mailman/listinfo/python-list


a Python bug report

2018-05-29 Thread Ruifeng Guo
Hello,
We encountered a bug in Python recently; we checked the behavior for Python 
versions 2.7.12 and 3.1.1, and both versions show the same behavior. Please see 
below the unexpected behavior in "red text".

Thanks,
Ruifeng Guo

From: Brian Archer
Sent: Tuesday, May 29, 2018 5:57 PM
To: Ruifeng Guo 
Subject: Python Bug

Python 3.1.1 (r311:74480, Nov 20 2012, 09:11:57)
[GCC 4.2.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a=1017.0
>>> print(int(a))
1017
>>> b=1000*1.017
>>> print(b)
1017.0
>>> int(b)
1016
>>> c=1017.0
>>> int(c)
1017
>>>



-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Re: The PIL show() method looks for the default viewer. How do I change this to a different viewer (of my choice)?

2018-05-29 Thread Ian Kelly
On Sat, May 26, 2018 at 9:17 AM, Paul St George  wrote:
> Thank you.
> You are very right. The show() method is intended for debugging purposes and
> is useful for that, but what method should I be using and is PIL the best
> imaging library for my purposes? I do not want to manipulate images, I only
> want to show images (full screen) on an external display. I want to use
> Python to control the timing of the images.

You probably shouldn't be using PIL at all then. Why open the file in
Python just to export it and re-open it in an image viewer? It would
be simpler just to point whichever image viewer you prefer at the
original file directly. Your entire script could just be something
like this:

import subprocess

# Some timing logic

subprocess.call("display " + imagepath, shell=True)
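
(A small variation on the above that avoids the shell entirely, so a path 
containing spaces or shell metacharacters cannot break the command; "display" 
and imagepath are the same placeholders as above:)

import subprocess

imagepath = "example.png"  # placeholder
subprocess.call(["display", imagepath])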
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: a Python bug report

2018-05-29 Thread José María Mateos
On Wed, May 30, 2018 at 01:07:38AM +, Ruifeng Guo wrote:
> Hello,
> We encountered a bug in Python recently; we checked the behavior for Python 
> versions 2.7.12 and 3.1.1, and both versions show the same behavior. Please see 
> below the unexpected behavior in "red text".

Have you tried the round() function, however?

In [1]: round(1000 * 1.017)
Out[1]: 1017.0

This is a floating point precision "issue". int() simply truncates the 
fractional part.

In [2]: int(3.9)
Out[2]: 3

Because:

In [3]: 1000 * 1.017
Out[3]: 1016.9999999999999

So there you have it.

Some more reading: 
https://stackoverflow.com/questions/43660910/python-difference-between-round-and-int
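
(A quick way to see exactly what is stored, assuming Python 3; 
Decimal(float) shows the exact binary value the float holds:)

from decimal import Decimal

b = 1000 * 1.017
print(repr(b))           # 1016.9999999999999 -- shortest repr that round-trips
print(Decimal(b))        # the exact value held by the float
print(int(b), round(b))  # 1016 (truncates) versus 1017 (rounds to nearest)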

Cheers,

-- 
José María (Chema) Mateos
https://rinzewind.org/blog-es || https://rinzewind.org/blog-en
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: a Python bug report

2018-05-29 Thread Ian Kelly
On Tue, May 29, 2018 at 7:07 PM, Ruifeng Guo  wrote:
> Hello,
> We encountered a bug in Python recently; we checked the behavior for Python 
> versions 2.7.12 and 3.1.1, and both versions show the same behavior. Please see 
> below the unexpected behavior in "red text".
>
> Thanks,
> Ruifeng Guo
>
> From: Brian Archer
> Sent: Tuesday, May 29, 2018 5:57 PM
> To: Ruifeng Guo 
> Subject: Python Bug
>
> Python 3.1.1 (r311:74480, Nov 20 2012, 09:11:57)
> [GCC 4.2.2] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> a=1017.0
> >>> print(int(a))
> 1017
> >>> b=1000*1.017
> >>> print(b)
> 1017.0
> >>> int(b)
> 1016
> >>> c=1017.0
> >>> int(c)
> 1017

Try this, and you'll see what the problem is:

>>> repr(b)
'1016.9999999999999'

The value of b is not really 1017, but fractionally less as a result
of floating point rounding error, because 1.017 cannot be exactly
represented as a float.

In Python 3.2, the str() of the float type was changed to match the
repr(), so that when you use print() as above you will also get this
result:

>>> print(b)
1016.9999999999999

By the way, Python 3.1.1 is really old (six years!). I recommend
upgrading if possible.
-- 
https://mail.python.org/mailman/listinfo/python-list


Simple question: how do I print output from .get() method

2018-05-29 Thread MrMagoo2018
Hello folks, imagine I have the code below and I am getting the "error" message 
when attempting to print() the output of 'sw_report'. 
Can you suggest which method I should use to retrieve this? Is that a 
dictionary maybe?

from arista import arista
m = arista()
m.authenticate ("user","password")
sw_report = m.np.sw_report.get("swType="EOS",swMajorVersion="5.0")
print (sw_report)

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: cProfile, timed call tree

2018-05-29 Thread dieter
"Peter J. Holzer"  writes:

> On 2018-05-26 07:38:09 +0200, dieter wrote:
>> But, in general, you are right: you cannot reconstruct complete
>> call trees. The reason is quite simple: maintaining information
>> for the complete caller ancestry (rather than just the immediate
>> caller) is expensive (both in terms of runtime and storage).
>> Profiling usually is used as a preparation for optimization.
>> Optimization has the greatest effects if applied to inner loops.
>> And for the analysis of inner loops, complete call tree information
>> is not necessary.
>
> I disagree. I have used Tim Bunce's excellent perl profiler
> (Devel::NYTProf, for the two people here who also use Perl and don't
> already know it), which does record whole call trees, and this is very
> useful. You not only see that a function is called 1.5 million times,
> you also see where it is called (not just from which functions, but from
> which lines) and how much time is spent in calls from each location. 
> Often this allowed me find ways to avoid calling a function altogether
> or prevented me from chasing down the wrong rabbit hole.

If the profile information is sampled for the call location (rather than
the call function), you still do not get the "complete call tree".
If you want to get results based on call paths (rather than the immediate
caller), the sampling must (in general) sample for call paths (and
not only the immediate caller) -- which means, you must implement
your own profiler.
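
(For reference, the closest the stdlib gets without a custom profiler is the 
immediate-caller breakdown, e.g.:)

import cProfile
import pstats

def inner(n):
    return sum(range(n))

def outer():
    return [inner(i) for i in range(1000)]

cProfile.run("outer()", "prof.out")  # profile and save the stats
stats = pstats.Stats("prof.out")
stats.print_callers("inner")         # who called inner(), how often, at what cost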

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Simple question: how do I print output from .get() method

2018-05-29 Thread Chris Angelico
On Wed, May 30, 2018 at 3:44 PM, MrMagoo2018  wrote:
> Hello folks, imagine I have the code below and I am getting the "error" 
> message when attempting to print() the output of 'sw_report'.
> Can you suggest which method I should use to retrieve this? Is that a 
> dictionary maybe?
>
> from arista import arista
> m = arista()
> m.authenticate ("user","password")
> sw_report = m.np.sw_report.get("swType="EOS",swMajorVersion="5.0")
> print (sw_report)
> 

That's not an error message. You asked Python to print it out, and it
printed it out. As it happens, the display isn't particularly useful,
but it's not an error.

What you have is a *generator object*, which is something you can
iterate over. I don't know about the arista library, so I don't know
what you'll get from that, but at its simplest, you could convert that
to a list:

print(list(sw_report))
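
Or iterate over it lazily, which avoids building the whole list in memory 
(I don't know what the records look like, so this just prints each one as-is):

for record in sw_report:
    print(record)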

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list