So apparently I've been banned from this list

2018-09-30 Thread Steven D'Aprano
I've been unexpectedly in hospital for the past two weeks, without 
internet or email. Just before my unexpected hospital stay, I was 
apparently banned (without warning) by Ethan Furman in what seems to me 
to be an act of retaliation for my protest against his overzealous and 
hostile tone-policing against a newcomer to the list, Reto Brunner:

https://mail.python.org/pipermail/python-list/2018-September/737020.html

(I did make one mistake in that post: I claimed that I hadn't said 
anything at the time on Ethan's last round of bans. That was incorrect, I 
actually did make an objection at the time.)

Since I'm still catching up on emails, I have just come across Ethan's 
notice to me (copied below). 

Notwithstanding Ethan's comment about having posted the suspension notice 
on the list, I see no sign that he actually did so. At the risk of 
further retaliation from the moderators, I am ignoring the ban in this 
instance for the purposes of transparency and openness. (I don't know if
this will show up on the mailing list or the newsgroup.)

Since I believe this ban is illegitimate, I intend to challenge it if 
possible. In the meantime, I may not reply on-list to any responses.



Subject: Fwd: Temporary Suspension
To: 
From: Ethan Furman 
Date: Tue, 11 Sep 2018 11:22:40 -0700
In-Reply-To: 

Steven, you've probably already seen this on Python List, but I forgot
to email it directly to you.  My apologies.

--
~Ethan~
Python List Moderator


 Forwarded Message 
Subject: Temporary Suspension
Date: Mon, 10 Sep 2018 07:09:04 -0700
From: Ethan Furman 
To: Python List Moderators 

As a list moderator, my goal for this list is to keep the list a useful
resource -- but what does "useful" mean?  To me it means a place that
python users can go to ask questions, get answers, offer advice, and all
without sarcasm, name-calling, and deliberate mis-understandings.
Conversations should stay mostly on-topic.

Due to hostile and inappropriate posts*, Steven D'Aprano is temporarily
suspended from Python List for a period of two months.

This suspension, along with past suspensions, is being taken only after
careful consideration and consultation with other Python moderators.

--
~Ethan~
Python List Moderator


* posts in question:

[1] https://mail.python.org/pipermail/python-list/2018-July/735735.html
[2] https://mail.python.org/pipermail/python-list/2018-September/737020.html





-- 
Steven D'Aprano

-- 
https://mail.python.org/mailman/listinfo/python-list


Trying to use threading.local()

2018-09-12 Thread Steven D'Aprano
I originally posted this on the Python-Ideas list, but this list is probably 
more appropriate.


import time
from threading import Thread, local

def func():
    pass

def attach(value):
    func.__params__ = local()
    func.__params__.value = value


def worker(i):
    print("called from thread %s" % i)
    attach(i)
    assert func.__params__.value == i
    time.sleep(3)
    value = func.__params__.value
    if value != i:
        print("mismatch", i, value)

for i in range(5):
    t = Thread(target=worker, args=(i,))
    t.start()

print()





When I run that, each of the threads prints its "called from ..."
message, the assertions all pass, then a couple of seconds later they
consistently all raise exceptions:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "", line 5, in worker
AttributeError: '_thread._local' object has no attribute 'value'



What am I doing wrong?
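
One possible explanation, offered as a sketch rather than a definitive diagnosis: every call to attach() rebinds func.__params__ to a brand-new local() object. After the sleep, each thread looks at whichever object was created last, and only the thread that created it ever set "value" on it. Creating the local() object once, before any thread starts, avoids that. The "results" dict and the shortened sleep are assumptions added for illustration.

```python
import time
from threading import Thread, local

def func():
    pass

# Create the thread-local container exactly once, before any thread runs.
func.__params__ = local()

def attach(value):
    # Only set an attribute; never replace the local() object itself.
    func.__params__.value = value

results = {}  # illustrative: records what each thread read back

def worker(i):
    attach(i)
    time.sleep(0.1)  # give the other threads time to call attach() too
    results[i] = func.__params__.value

threads = [Thread(target=worker, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With a single shared local() object, each thread should read back the value it stored itself.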



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Any SML coders able to translate this to Python?

2018-09-07 Thread Steven D'Aprano
On Fri, 07 Sep 2018 15:10:10 +0100, Paul Moore wrote:

> On Fri, 7 Sep 2018 at 14:06, Steven D'Aprano
>  wrote:
[...]

>> However I have a follow up question. Why the "let" construct in the
>> first place? Is this just a matter of principle, "put everything in its
>> own scope as a matter of precautionary code hygiene"? Because I can't
>> see any advantage to the inner function:
> 
> My impression is that this is just functional programming "good style".
> As you say, it's not needed, it's just "keep things valid in the
> smallest range possible". Probably also related to the mathematical
> style of naming sub-expressions. Also, it's probably the case that in a
> (compiled) functional language like SML, the compiler can optimise this
> to avoid any actual inner function, leaving it as nothing more than a
> temporary name.

I guessed it would be something like that.


Thanks Paul, and especially Marko for going above and beyond the call of 
duty with his multiple translations into functional-style Python, and 
everyone else who answered.



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: don't quite understand mailing list

2018-09-07 Thread Steven D'Aprano
On Fri, 07 Sep 2018 07:39:33 +0300, Marko Rauhamaa wrote:

> I'm with Ethan on this one.
> 
> There was nothing in the original posting that merited ridicule.

Then it's a good thing there was nothing in the response that was ridicule.

(A mild rebuke for a mild social faux pas is not ridicule.)


-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Object-oriented philosophy

2018-09-07 Thread Steven D'Aprano
On Fri, 07 Sep 2018 16:07:06 -0500, Michael F. Stemper wrote:

>>> In another case where I had a "bare exception", I was using it to see
>>> if something was defined and substitute a default value if it wasn't.
>>> Have I cleaned this up properly?
>>>
>>>   try:
>>>     id = xmlmodel.attrib['name']
>>>   except KeyError:
>>>     id = "constant power"
>>>
>>> (Both changes appear to meet my intent, I'm more wondering about how
>>> pythonic they are.)

Yes, catch the direct exception you are expecting. That's perfectly 
Pythonic.


>> There's an alternative that's recommended when the key is often absent:
>> 
>>     id = xmlmodel.attrib.get('name', "constant power")
> 
> Oh, I like that much better than what I showed above, or how I "fixed"
> it cross-thread. Thanks!

However, if the key is nearly always present, and your code is 
performance-critical, calling the "get" method has the slight 
disadvantage that it will be slightly slower than using the try...except 
form you show above.

On the other hand, the get method has the big advantage that it's an 
expression that can be used in place, not a four-line compound statement.

If I don't care about squeezing out every last bit of performance from 
the interpreter, I use whatever looks good to me on the day. That will 
often be the "get" method.

But on the rare occasions I do care about performance, the basic rule of 
thumb I use is that if the key is likely to be missing more than about 
10% of the time, I use the "LBYL" idiom (either an explicit test using 
"if key in dict" or just call the dict.get method).

But don't stress about the choice. Chances are that any of the three 
options you tried (catch KeyError, check with "if" first, or using the 
get method) will be good enough.
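
As a rough illustration of the three options, here is a minimal sketch using a plain dict in place of xmlmodel.attrib. The function names and sample data are made up for the example.

```python
DEFAULT = "constant power"

def name_eafp(attrib):
    # "Easier to ask forgiveness": cheapest when the key is nearly always there.
    try:
        return attrib["name"]
    except KeyError:
        return DEFAULT

def name_lbyl(attrib):
    # "Look before you leap": an explicit membership test first.
    if "name" in attrib:
        return attrib["name"]
    return DEFAULT

def name_get(attrib):
    # A single expression that can be used in place.
    return attrib.get("name", DEFAULT)

present = {"name": "generator"}
absent = {}
```

All three return the same result for a present key and for a missing one; they differ only in readability and in which case is fastest.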




-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Any SML coders able to translate this to Python?

2018-09-07 Thread Steven D'Aprano
On Thu, 06 Sep 2018 13:48:54 +0300, Marko Rauhamaa wrote:

> Chris Angelico :
>> The request was to translate this into Python, not to slavishly imitate
>> every possible semantic difference even if it won't actually affect
>> behaviour.
> 
> I trust Steven to be able to refactor the code into something more
> likable. His only tripping point was the meaning of the "let" construct.

Thanks for the vote of confidence :-) 

However I have a follow up question. Why the "let" construct in the first 
place? Is this just a matter of principle, "put everything in its own 
scope as a matter of precautionary code hygiene"? Because I can't see any 
advantage to the inner function:

>>>> def isqrt(n):
>>>>     if n == 0:
>>>>         return 0
>>>>     else:
>>>>         def f2398478957(r):
>>>>             if n < (2*r+1)**2:
>>>>                 return 2*r
>>>>             else:
>>>>                 return 2*r+1
>>>>         return f2398478957(isqrt(n//4))

Sure, it ensures that r is in its own namespace. But why is that an 
advantage in a function so small? Perhaps it's an SML 
functional-programming thing.

Putting aside the observation that recursion may not be the best way to 
do this in Python, I don't think that the inner function is actually 
needed. We can just write:

def isqrt(n):
    if n == 0:
        return 0
    else:
        r = isqrt(n//4)
        if n < (2*r+1)**2:
            return 2*r
        else:
            return 2*r+1
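
For anyone wanting to try this out, here is a minimal sketch that pairs the recursive version with a non-recursive equivalent. The iterative function and its name are an assumption of mine, not something from the paper or the thread. It walks the same chain n, n//4, n//16, ... that the recursion does, only from the bottom up.

```python
def isqrt(n):
    # Recursive version, as above.
    if n == 0:
        return 0
    r = isqrt(n // 4)
    if n < (2*r + 1)**2:
        return 2*r
    return 2*r + 1

def isqrt_iterative(n):
    # Collect n, n//4, n//16, ... down to the last non-zero value,
    # then apply the same refinement step from the smallest value upwards.
    chain = []
    while n:
        chain.append(n)
        n //= 4
    r = 0
    for m in reversed(chain):
        r = 2*r if m < (2*r + 1)**2 else 2*r + 1
    return r
```

A quick sanity check is that the result r always satisfies r*r <= n < (r+1)**2.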


By the way I got this from this paper:

https://www.cs.uni-potsdam.de/ti/kreitz/PDF/03cucs-intsqrt.pdf





-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Object-oriented philosophy

2018-09-06 Thread Steven D'Aprano
On Thu, 06 Sep 2018 22:00:26 +0100, MRAB wrote:

> On 2018-09-06 21:24, Michael F. Stemper wrote:
[...]
>>try:
>>  P_0s = xmlmodel.findall( 'RatedPower' )[0].text 
>>  self.P_0 = float( P_0s )
>>except:
[...]

> A word of advice: don't use a "bare" except, i.e. one that doesn't
> specify what exception(s) it should catch.

Excellent advice!

More here:

https://realpython.com/the-most-diabolical-python-antipattern/





-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: don't quite understand mailing list

2018-09-06 Thread Steven D'Aprano
On Thu, 06 Sep 2018 13:06:22 -0700, Ethan Furman wrote:

> On 09/06/2018 12:42 PM, Reto Brunner wrote:
> 
>> What do you think the link, which is attached to every email you
>> receive from the list, is for? Listinfo sounds very promising, doesn't
>> it?
>>
>> And if you actually go to it you'll find: "To unsubscribe from
>> Python-list, get a password reminder, or change your subscription
>> options enter your subscription email address"
>>
>> So how about you try that?
> 
> Reto,  your response is inappropriate.  If you can't be kind and/or
> respectful, let someone else respond.

No it wasn't inappropriate, and your attack on Reto is uncalled for.

Reto's answer was kind and infinitely more respectful than your 
unnecessary criticism. As far as I can tell, this is Reto's first post 
here. After your hostile and unwelcoming response, I wouldn't be 
surprised if it was his last.

His answer was both helpful and an *amazingly* restrained and kind 
response to a stupid question[1] asked by somebody claiming to be a 
professional software engineer. It was not condescending or 
mean-spirited, as you said in another post, nor was it snarky.

But even had the OP been a total beginner to computing, it was still a 
helpful response containing the information needed to solve their 
immediate problem (how to unsubscribe from the list) with just the 
*tiniest* (and appropriate) hint of reproach to encourage them to learn 
how to solve their own problems for themselves so that in future, they 
will be a better contributor to whatever discussion forums they might 
find themselves on.

Ethan, you are a great contributor on many of the Python mailing lists, 
but your tone-policing is inappropriate, and your CoC banning of Rick and 
Bart back in July was an excessive and uncalled for misuse of moderator 
power.

To my shame, I didn't say anything at the time, but I won't be 
intimidated any longer by fear of the CoC and accusations of incivility. 
I'm speaking up now because your reply to Reto is unwelcoming, unhelpful 
and disrespectful, and coming from a moderator who has been known to ban 
people, that makes it even more hostile.




[1] Yes, there are such things as stupid questions. If your doctor asked 
you "remind me again, which end of the needle goes into your arm?" what 
would you do?


-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Why emumerated list is empty on 2nd round of print?

2018-09-06 Thread Steven D'Aprano
On Thu, 06 Sep 2018 11:50:17 -0700, Viet Nguyen via Python-list wrote:

> If I do this "aList = enumerate(numList)", isn't it
> stored permanently in aList now?

Yes, but the question is "what is *it* that is stored?" The answer is, it 
isn't a list, despite the name you chose. It is an enumerate iterator 
object, and iterator objects can only be iterated over once.

If you really, truly need a list, call the list constructor:

aList = list(enumerate(numList))

but that's generally a strange thing to do. It is more common to just 
call enumerate when you need it, not to hold on to the reference for 
later.
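
A short sketch of the behaviour described above; the variable names and sample data are illustrative only.

```python
num_list = ["a", "b", "c"]

pairs = enumerate(num_list)      # an iterator, not a list
first_pass = list(pairs)         # consumes the iterator
second_pass = list(pairs)        # nothing left to yield

as_list = list(enumerate(num_list))  # a real list, reusable any number of times
```

The second pass over the iterator comes back empty, whereas the list can be looped over repeatedly.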



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Any SML coders able to translate this to Python?

2018-09-05 Thread Steven D'Aprano
I have this snippet of SML code which I'm trying to translate to Python:

fun isqrt n = if n=0 then 0
              else let val r = isqrt (n/4)
                   in
                     if n < (2*r+1)^2 then 2*r
                     else 2*r+1
                   end


I've tried reading up on SML and can't make heads or tails of the
"let...in...end" construct.


The best I've come up with is this:

def isqrt(n):
    if n == 0:
        return 0
    else:
        # Integer division: in Python 3, n/4 would give a float.
        r = isqrt(n//4)
        if n < (2*r+1)**2:
            return 2*r
        else:
            return 2*r+1

but I don't understand the let ... in part so I'm not sure if I'm doing it
right.


--
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere."
 -- Jon Ronson



Re: Question about floating point

2018-09-01 Thread Steven D'Aprano
On Sat, 01 Sep 2018 13:27:59 +0200, Frank Millman wrote:

>>>> from decimal import Decimal as D
>>>> f"{D('1.1')+D('2.2'):.60f}"
> '3.3000'
>>>> '{:.60f}'.format(D('1.1') + D('2.2'))
> '3.3000'
>>>> '%.60f' % (D('1.1') + D('2.2'))
> '3.2998223643160599749535322189331054687500'
>>>>
>>>>
> The first two format methods behave as expected. The old-style '%'
> operator does not.

The % operator casts the argument to a (binary) float. The other two 
don't need to, because they call Decimal's own format method.
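
A small sketch that makes the difference visible. The comparison against an explicit float() conversion is my own addition, used here to show that the % result matches what a float would give.

```python
from decimal import Decimal as D

total = D("1.1") + D("2.2")           # exactly Decimal('3.3')

new_style = "{:.60f}".format(total)   # uses Decimal's own __format__
old_style = "%.60f" % total           # converts to a binary float first
via_float = "%.60f" % float(total)    # explicit conversion, for comparison
```

Here old_style and via_float should be identical, while new_style shows 3.3 followed only by zeros.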


-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Question about floating point

2018-08-31 Thread Steven D'Aprano
On Fri, 31 Aug 2018 18:45:16 +1200, Gregory Ewing wrote:

> Steven D'Aprano wrote:
>> The right way is to
>> set the rounding mode at the start of your application, and then let
>> the Decimal type round each calculation that needs rounding.
> 
> It's not clear what you mean by "rounding mode" here. If you mean
> whether it's up/down/even/whatever, then yes, you can probably set that
> as a default and leave it.

I mean the rounding mode :-)

https://docs.python.org/3/library/decimal.html#rounding-modes


> However, as far as I can see, Decimal doesn't provide a way of setting a
> default number of decimal places to which results are rounded. You can
> set a default *precision*, but that's not the same thing.

Indeed it is not. That's a very good point, and I had completely 
forgotten about it! Thank you.

The quantize method is intended for the use-case we are discussing, to 
round values to a fixed number of decimal places. The Decimal FAQs 
mention that:

https://docs.python.org/3/library/decimal.html#decimal-faq

I think this is a good use-case for subclassing Decimal as a Money class.
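
A minimal sketch of what such a subclass might look like. The class name Money comes from the paragraph above, but the rounded() method, the CENT constant and the choice of two decimal places are assumptions made purely for illustration.

```python
from decimal import Decimal, ROUND_HALF_EVEN

CENT = Decimal("0.01")  # assumed smallest unit: two decimal places

class Money(Decimal):
    """Hypothetical Decimal subclass that knows how to round to cents."""

    def rounded(self):
        # quantize() rounds to a fixed number of decimal places,
        # here using Banker's Rounding.
        return Money(self.quantize(CENT, rounding=ROUND_HALF_EVEN))

price = Money("2.675").rounded()               # tie: rounds to the even digit
unit = Money(Decimal("10.00") / 3).rounded()   # division needs rounding
```

Note that ordinary arithmetic on a Decimal subclass returns plain Decimal objects, so a fuller version would probably need to override the arithmetic methods as well.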

[...]
> I don't think this is a bad thing, because often you don't want to use
> the same number of places for everything, For example, people dealing
> with high-volume low-value goods often calculate with unit prices having
> more than 2 decimal places. In those kinds of situations, you need to
> know exactly what you're doing every step of the way.

As opposed to anyone else calculating with money?




-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: __init__ patterns

2018-08-30 Thread Steven D'Aprano
On Thu, 30 Aug 2018 06:01:26 -0700, Tim wrote:

> I saw a thread on reddit/python where just about everyone said they
> never put code in their __init__ files.

Pfft. Reddit users. They're just as bad as Stackoverflow users. *wink*


> Here's a stackoverflow thread saying the same thing.
> https://stackoverflow.com/questions/1944569/how-do-i-write-good-correct-package-init-py-files
> 
> That's new to me. I like to put functions in there that other modules
> within the module need. Thought that was good practice DRY and so forth.

It's fine to put code in __init__.py files.

If the expected interface is for the user to say:

result = package.spam()

then in the absence of some specific reason why spam needs to be in a 
submodule, why shouldn't it go into package/__init__.py ?

Of course it's okay for the definition of spam to be in a submodule, if 
necessary. But it shouldn't be mandatory.
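
To make that concrete, here is a self-contained sketch that builds a throwaway package in a temporary directory, with spam defined directly in its __init__.py. The directory layout and the body of spam are invented for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "package"
    pkg_dir.mkdir()
    # The function lives in package/__init__.py itself, not in a submodule.
    (pkg_dir / "__init__.py").write_text(
        "def spam():\n"
        "    return 'spam, spam, spam'\n"
    )
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        package = importlib.import_module("package")
        result = package.spam()
    finally:
        sys.path.remove(tmp)
```

The caller only ever writes package.spam(), so where the definition lives is an implementation detail that can change later.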


> And I never do 'from whatever import *' Ever.
> 
> The reddit people said they put all their stuff into different modules
> and leave init empty.


Did any one of them state *why* they do this? What benefit is there to 
make this a hard rule?

Did anyone mention what the standard library does?

Check out the dbm, logging, html, http, collections, importlib, and 
curses packages (and probably others):

https://github.com/python/cpython/tree/3.7/Lib



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to

2018-08-30 Thread Steven D'Aprano
On Thu, 30 Aug 2018 05:21:30 -0700, pjmclenon wrote:

> my question is ... at the moment i can only run it on windows cmd prompt
> with a multiple line entry as so::
> 
> python createIndex_tfidf.py stopWords.dat testCollection.dat
> testIndex.dat titleIndex.dat
> 
> and then to query and use the newly created index as so:
> 
> python queryIndex_tfidf.py stopWords.dat testIndex.dat titleIndex.dat
> 
> how can i run just one file at a time?

I don't understand the question. You are running one file at a time. 
First you run createIndex_tfidf.py, then you run queryIndex_tfidf.py

Maybe you mean to ask how to combine them both to one call of Python?

(1) Re-write the createIndex_tfidf.py and queryIndex_tfidf.py files to be 
in a single file.

(2) Or, create a third file which runs them both one after another.

That third file doesn't even need to be a Python script. It could be a 
shell script, it would look something like this:


python createIndex_tfidf.py stopWords.dat testCollection.dat 
testIndex.dat titleIndex.dat
python queryIndex_tfidf.py stopWords.dat testIndex.dat titleIndex.dat


and you would then call it from whatever command line shell you use.
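
If the third file is to be a Python script instead, a minimal sketch using the standard subprocess module might look like this. The function names are hypothetical, and the script and data file names are simply copied from the commands above.

```python
import subprocess
import sys

CREATE = ["createIndex_tfidf.py", "stopWords.dat", "testCollection.dat",
          "testIndex.dat", "titleIndex.dat"]
QUERY = ["queryIndex_tfidf.py", "stopWords.dat", "testIndex.dat",
         "titleIndex.dat"]

def build_commands(python=sys.executable):
    # One command line per script, run with the same interpreter by default.
    return [[python] + CREATE, [python] + QUERY]

def run_all(commands):
    # Run each command in order; check=True stops at the first failure.
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling run_all(build_commands()) from the directory containing the scripts would then build the index and start the query step in a single invocation.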


> ..or actually link to a front end
> GUI ,so when an question or word or words is input to the input box..it
> can go to the action and run the above mentioned lines of code

You can't "link to a front end GUI", you have to write a GUI application 
which calls your scripts.

There are many choices: tkinter is provided in the Python standard 
library, but some people prefer wxPython, PyQt4, or other GUI toolkits.

https://duckduckgo.com/?q=python+gui+toolkits



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Question about floating point

2018-08-30 Thread Steven D'Aprano
On Thu, 30 Aug 2018 19:22:29 +1200, Gregory Ewing wrote:

> Steven D'Aprano wrote:
>> Why in the name of all that's holy would anyone want to manually round
>> each and every intermediate calculation when they could use the Decimal
>> module and have it do it automatically?
> 
> I agree that Decimal is the safest and probably easiest way to go, but
> saying that it "does the rounding automatically" is a bit misleading.
> 
> If you're adding up dollars and cents in Decimal, no rounding is needed
> in the first place, because it represents whole numbers of cents exactly
> and adds them exactly.

"Round to exact" is still rounding :-P

I did already say that addition and subtraction was exact in Decimal. (I 
also mentioned multiplication, but that's wrong.)


> If you're doing something that doesn't result in a whole number of cents
> (e.g. calculating a unit price from a total price and a quantity) you'll
> need to think about how you want it rounded, and should probably include
> an explicit rounding step, if only for the benefit of someone else
> reading the code.

If you're not using Banker's Rounding for financial calculations, you're 
probably up to no good *wink*

Of course with Decimal you always have the option to round certain 
calculations by hand, if you have some specific need to. But in general, 
that's just annoying and error-prone book-keeping. The right way is to 
set the rounding mode at the start of your application, and then let the 
Decimal type round each calculation that needs rounding.
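
A minimal sketch of setting the rounding mode once and letting Decimal apply it. It uses localcontext() so the example does not alter global state, and the precision of 4 is chosen only to make the rounding visible. Bear in mind that context precision counts significant digits, not decimal places.

```python
from decimal import Decimal, ROUND_HALF_EVEN, localcontext

with localcontext() as ctx:
    ctx.prec = 4                      # four significant digits (illustrative)
    ctx.rounding = ROUND_HALF_EVEN    # Banker's Rounding
    third = Decimal(1) / Decimal(3)   # rounded by the context
    tie_down = Decimal("1.2345") + 0  # tie goes to the even digit 4
    tie_up = Decimal("1.2355") + 0    # tie goes to the even digit 6
```

In a real application the equivalent settings would typically be applied once via getcontext() at start-up.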

The whole point of Decimal, the reason it was invented, was to do this 
sort of thing. We have here a brilliant hammer specially designed for 
banging in just this sort of nail, and you're saying "Well, sure, but you 
probably want to bang it in with your elbow, if only for the benefit of 
onlookers..."

:-)


-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Question about floating point

2018-08-29 Thread Steven D'Aprano
On Wed, 29 Aug 2018 11:31:29 +1200, Gregory Ewing wrote:

> Frank Millman wrote:
>> I have been trying to explain why
>> they should use the decimal module. They have had a counter-argument
>> from someone else who says they should just use the rounding technique
>> in my third example above.
> 
> It's possible to get away with this by judicious use of rounding.
> There's a business software package I work with that does this -- every
> number is rounded to a defined number of decimal places before being
> stored in its database, so the small inaccuracies resulting from inexact
> representations don't get a chance to accumulate.

This software package doesn't actually use the *10/10 trick, does it?

As an answer to the question, "Should I use this clever *10/10 trick?" 
I'm not sure it's relevant to say "Yep, sure, this package does something 
completely different and it works fine!" *wink*

> If you're going to do this, I would NOT recommend using the rounding
> technique in your example -- it seems to itself be relying on accidents
> of the arithmetic. Use the round() function:

Or better still, DON'T manually use the round function, let the 
interpreter do the rounding for you by using Decimal. That's what it's for.

Why in the name of all that's holy would anyone want to manually round 
each and every intermediate calculation when they could use the Decimal 
module and have it do it automatically?



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Question about floating point

2018-08-29 Thread Steven D'Aprano
On Tue, 28 Aug 2018 16:47:25 +0200, Frank Millman wrote:

> The reason for my query is this. I am assisting someone with an
> application involving monetary values. I have been trying to explain why
> they should use the decimal module. They have had a counter-argument
> from someone else who says they should just use the rounding technique
> in my third example above.

*head-desk*

And this is why we can't have nice things.

Presumably this money application doesn't just work on hard-coded literal 
values, right? So this "programmer" your friend is listening to prefers 
to write this:

money = (a + b)*10/10

instead of:

money = a + b

presumably because programming isn't hard enough without superstitious 
ritual that doesn't actually solve the problem.

In the second case, you have (potentially) *one* rounding error, due to 
the addition.

In the first case, you get the *exact same rounding error* when you do 
(a+b). Then you get a second rounding error by multiplying by ten, and a 
third rounding error when you divide by ten.

Now it's true that sometimes those rounding errors will cancel. You found 
an example:

py> (1.1 + 2.2)*10/10 == 3.3
True

but it took me four attempts to find a counter-example, where the errors 
don't cancel:

py> (49675.23 + 10492.95)*10/10 == 60168.18
False

To prove it isn't a fluke:

py> (731984.84 + 173.32)*10/10 == 732158.16
False

py> (170734.84 - 173.39)*10/10 == 170561.45
False


Given that it has three possible rounding errors instead of one, it 
is even possible that this "clever trick" could end up being *worse* than 
just doing a single addition. But my care factor isn't high enough to 
track down an example (if one exists).

For nearly all applications involving money, one correct solution is to 
use integer numbers of cents (or whatever the smallest currency unit 
you ever care about happens to be). Then all additions, subtractions and 
multiplications will be exact, without fail, and you only need to worry 
about rounding divisions. You can minimize (but not eliminate) that by 
calculating in tenths of a cent, which effectively gives you a guard 
digit.

Or, just use decimal, which is *designed* for monetary applications 
(among other things). You decide on how many decimal places to keep (say, 
two, or three if you want a guard digit), a rounding mode (Banker's 
Rounding is recommended for financial applications), and just do your 
calculations with no "clever tricks".
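
Both approaches can be sketched with the figures from the counter-example above. The 7.5% tax rate is an invented value for illustration.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Decimal: the sum is exact, with no tricks needed.
a = Decimal("49675.23")
b = Decimal("10492.95")
exact_sum = a + b

# Integer cents: the same sum, also exact.
cents = 4967523 + 1049295

# Adding tax produces extra digits, so round once at the end.
tax_rate = Decimal("7.5")  # hypothetical rate, in percent
with_tax = (exact_sum * (1 + tax_rate / 100)).quantize(
    Decimal("0.01"), rounding=ROUND_HALF_EVEN)
```

The only explicit rounding step is the final quantize() call, which is where the number of decimal places and the rounding mode are decided.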

Add two numbers, then add tax:

money = (a+b)*(1+t/100)

compared to the "clever trick":

money = (a+b)*10/10 * (1 + t/100)*10/10


Which would you rather do?



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Generating a specific list of intsgers

2018-08-24 Thread Steven D'Aprano
On Fri, 24 Aug 2018 14:40:00 -0700, tomusatov wrote:

> I am looking for a program able to output a set of integers meeting the
> following requirement:
> 
> a(n) is the minimum k > 0 such that n*2^k - 3 is prime, or 0 if no such
> k exists
> 
> Could anyone get me started? (I am an amateur)


That's more a maths question than a programming question. Find out how to 
tackle it mathematically, and then we can code it.
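
Until the mathematics is settled, the definition can at least be explored by brute force. The sketch below is only that: the cut-off max_k is an assumption, so a result of 0 means "no k found up to max_k", not a proof that none exists. Trial division also becomes slow for large values.

```python
def is_prime(m):
    # Simple trial division; adequate for small exploratory searches only.
    if m < 2:
        return False
    if m % 2 == 0:
        return m == 2
    i = 3
    while i * i <= m:
        if m % i == 0:
            return False
        i += 2
    return True

def a(n, max_k=40):
    # Smallest k > 0 with n*2**k - 3 prime, searching only up to max_k.
    for k in range(1, max_k + 1):
        if is_prime(n * 2**k - 3):
            return k
    return 0  # not found within the search bound
```

For example, a(1) is 3 because 1*2**3 - 3 = 5 is the first prime in that sequence. For n = 6 every candidate is a multiple of 3 larger than 3, so the search returns 0.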



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Pylint false positives

2018-08-21 Thread Steven D'Aprano
On Wed, 22 Aug 2018 03:58:29 +1000, Chris Angelico wrote:

> On Wed, Aug 22, 2018 at 2:38 AM, Marko Rauhamaa 
> wrote:
>> Gregory Ewing :
>>
>>> Marko Rauhamaa wrote:
>>>> Lexically, there is special access:
>>>>
>>>>     class C:
>>>>         def __init__(self, some, arg):
>>>>             c = self
>>>>             class D:
>>>>                 def method(self):
>>>>                     access(c)
>>>>                     access(some)
>>>>                     access(arg)
>>>
>>> [...]
>>>
>>> you can do that without creating a new class every time you want an
>>> instance. You just have to be *slightly* more explicit about the link
>>> between the inner and outer instances.
>>
>> By "*slightly* more explicit," do you mean more syntactic clutter?
>>
>>
> No, he actually means "explicit" in the normal English sense. You're
> trying to use it in the python-ideas sense of "code that I like", and
> since you don't like it, you want to call it "implicit" instead, but it
> obviously isn't that, so you call it "syntactic clutter".

That's an incredible insight into Marko's internal mental state you have 
there. And you get that all from the words "syntactic clutter"? I thought 
he just meant that it was cluttered code. How naive was that?

*wink*



> But this is actually a case of explicit vs implicit.

To be honest, I don't even understand Greg's comment. With no inner 
class, what is this "inner instance" he refers to here?

"you can do that without creating a new class every time you 
want an instance. You just have to be *slightly* more explicit
about the link between the inner and outer instances."


Marko wants to use closures. So how do you close over per-instance 
variables if you create the closures before the instances are created?

If we only needed *one* function, there would be no problem:

class Outer:
    def __init__(self, some, arg):
        c = self
        def closure():
            access(c)
            access(some)
            access(arg)
        # then do something useful with closure


But as soon as you have a lot of them, it's natural to want to wrap them 
up in a namespace, and the only solution we have for that is to use a 
class.

It's a truism that anything you can do with a closure, you can do with a 
class (or vice versa), so I dare say there are alternative designs that 
avoid closures altogether. But we don't know the full requirements here, 
and it's hard to judge from the outside why Marko picked the design he 
has and whether it's a good idea. It could be a case of "ooh, closures are 
a shiny new hammer, this problem must be a nail!" but let's give him the 
benefit of the doubt and assume he has good reasons, not just reasons.
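
For what it's worth, one alternative way to bundle several per-instance closures without defining a class each time is types.SimpleNamespace. This is my own sketch, not something proposed in the thread, and the closure names are made up.

```python
from types import SimpleNamespace

class Outer:
    def __init__(self, some, arg):
        c = self

        # Each closure captures c, some and arg from this call to __init__.
        def describe():
            return (type(c).__name__, some, arg)

        def total():
            return some + arg

        # Group the closures under one attribute instead of an inner class.
        self.helpers = SimpleNamespace(describe=describe, total=total)

outer = Outer(1, 2)
```

Whether this is any tidier than an inner class depends on the requirements, which we don't have.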



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Partitioning a list

2018-08-21 Thread Steven D'Aprano
On Tue, 21 Aug 2018 14:36:30 -0700, Poul Riis wrote:

> I would like to list all possible ways to put N students in groups of k
> students (suppose that k divides N) with the restriction that no two
> students should ever meet each other in more than one group. I think
> this is a classical problem 

If it's a classical problem, there should be many solutions written for 
other languages. Take one of them and port it to Python. (We can help 
with the Python part if needed.)

I've never come across it before. I think the restriction makes it a HARD 
problem to solve efficiently, but I've spent literally less than two 
minutes thinking about it so I could be wrong.

With no additional restriction it sounds like a classical permutations or 
combinations problem. Check out the combinatoric iterator functions in 
the itertools module:

https://docs.python.org/3/library/itertools.html
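
As a rough sketch of the unrestricted version only (it ignores the "no two students meet twice" condition), the following uses itertools.combinations to list every way of splitting the students into unordered groups of size k. The function name partitions is made up for this example, and it assumes the students are distinct and that k divides their number.

```python
from itertools import combinations


def partitions(students, k):
    """Yield every way to split students into unordered groups of size k."""
    students = list(students)
    if not students:
        yield []
        return
    # Pin the first student to the first group so that each partition
    # is generated exactly once, whatever the order of the groups.
    first, rest = students[0], students[1:]
    for others in combinations(rest, k - 1):
        group = (first,) + others
        remaining = [s for s in rest if s not in others]
        for tail in partitions(remaining, k):
            yield [group] + tail


for p in partitions('ABCD', 2):
    print(p)
```

The number of results grows very quickly with N, so this is only practical for small classes.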






Re: Pylint false positives

2018-08-21 Thread Steven D'Aprano
On Tue, 21 Aug 2018 00:36:56 +, Dan Sommers wrote:
[...]

>>>> (Not that I do this using "inner classes", but I do often want to use
>>>> a class as a container for functions, without caring about "self" or
>>>> wrapping everything in staticmethod.)
>>>
>>> Isn't that what modules are for?  (I suspect that I'm missing
>>> something, because I also suspect that you knew/know that.)
>> 
>> What's the syntax for creating an inner module...?
> 
> Why does it have to be an inner anything?  An ordinary, top-level,
> "outer" module is a perfectly good "container for functions, without
> caring about "self.""

And what if you want to subdivide those functions (or other objects) into 
categories that are finer than the module, without introducing a package 
structure?

We can design the structure of our program into *outward* hierarchies, by 
adding packages with subpackages and sub-subpackages:

import spam.eggs.cheese.tomato.aardvark

So using the file system and packages, we can logically nest modules 
inside modules inside modules 'til the cows come home. But that's a 
fairly heavyweight solution, in the sense that it requires a separate 
directory for each level of the hierarchy.

Sometimes a package is too much. I want a single module file, but still 
want to pull out a collection of related functions and other objects and 
put them in their own namespace, but without creating a new module.

The Zen says:

Namespaces are one honking great idea -- let's do more of those!

but Python's namespaces are relatively impoverished. We have packages, 
modules, classes and instances, and that's it.

Classes and instances come with inheritance, self, etc., which is great if 
you want a class, but if you just want a simple module-like namespace 
without the extra file, classes are a pretty poor alternative. But 
they're all we've got.
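
To illustrate what using a class as a plain namespace looks like in practice, here is a small sketch. The names shapes, SIDES, sides and total_sides are invented for the example. Note the staticmethod wrappers, and that the functions have to qualify each other with the class name, which a real module would not need.

```python
class shapes:
    """Used purely as a namespace; not meant to be instantiated."""

    SIDES = {'triangle': 3, 'square': 4}

    @staticmethod
    def sides(name):
        # Sibling names are not in scope, so spell out "shapes.".
        return shapes.SIDES[name]

    @staticmethod
    def total_sides(*names):
        return sum(shapes.sides(name) for name in names)


print(shapes.total_sides('triangle', 'square'))
```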




Re: Pylint false positives

2018-08-20 Thread Steven D'Aprano
On Mon, 20 Aug 2018 22:55:26 +0300, Marko Rauhamaa wrote:

> Dan Sommers :
> 
>> On Mon, 20 Aug 2018 14:39:38 +0000, Steven D'Aprano wrote:
>>> I have often wished Python had proper namespaces, so I didn't have to
>>> abuse classes as containers in this way :-(
>>> 
>>> (Not that I do this using "inner classes", but I do often want to use
>>> a class as a container for functions, without caring about "self" or
>>> wrapping everything in staticmethod.)
>>
>> Isn't that what modules are for?  (I suspect that I'm missing
>> something, because I also suspect that you knew/know that.)
> 
> What's the syntax for creating an inner module...?

from types import ModuleType
m = ModuleType('m')
m.one = 1
m.a = 'a'
m.b = lambda x: x + one


except that not only doesn't it look nice, but it doesn't work because 
the m.b function doesn't pick up the m.one variable, but a global 
variable instead.
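
The failure can be demonstrated directly, assuming there is no global called "one" in the module that runs the code. The second half shows one possible workaround, which is an assumption of this sketch and not something proposed in the thread: exec() the source with the new module's __dict__ as its globals, so that functions defined there do look up names in that module.

```python
from types import ModuleType

m = ModuleType('m')
m.one = 1
# The lambda's globals are those of the module where it was written,
# not m, so "one" is not found there.
m.b = lambda x: x + one

try:
    m.b(1)
    lookup_failed = False
except NameError as err:
    lookup_failed = True
    print("lookup failed:", err)

# Workaround sketch: run the source inside the module's own namespace.
inner = ModuleType('inner')
source = """
one = 1

def b(x):
    return x + one
"""
exec(source, inner.__dict__)
print(inner.b(1))
```

That works, but putting code in a string is hardly nicer to look at.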





Re: Idle

2018-08-20 Thread Steven D'Aprano
On Mon, 20 Aug 2018 14:46:32 +0400, NAB NAJEEB wrote:

> Hi am a beginner can u tell me where can I write my codes I already
> tried pycharm and atom.. both are not worked successfully always shows
> error...pls guide me...

What errors do they show?





Re: Pylint false positives

2018-08-20 Thread Steven D'Aprano
On Mon, 20 Aug 2018 11:40:16 +0300, Marko Rauhamaa wrote:

>    class C:
>        def __init__(self, some, arg):
>            c = self
>            class D:
>                def method(self):
>                    access(c)
>                    access(some)
>                    access(arg)
> 
> IOW, inner class D is a container for a group of interlinked closure
> functions.

If a class' methods don't use self, it probably shouldn't be a class.

I have often wished Python had proper namespaces, so I didn't have to 
abuse classes as containers in this way :-(

(Not that I do this using "inner classes", but I do often want to use a 
class as a container for functions, without caring about "self" or 
wrapping everything in staticmethod.)





Re: Pylint false positives

2018-08-20 Thread Steven D'Aprano
On Mon, 20 Aug 2018 15:58:57 +0300, Marko Rauhamaa wrote:

[...]
>> The point is that creating a class object every time you want a closure
>> is pointlessly wasteful. There is *no benefit whatsoever* in doing
>> that. If you think there is, then it's probably because you're trying
>> to write Java programs in Python.
> 
> The benefit, as in using closures in general, is in the idiom.

Yes, but you get closures by nesting functions, not by nesting classes.

We don't *inherently* have to duplicate Java style nested classes in 
order to get the container of closures you mentioned earlier. We don't 
have to duplicate the idiom exactly, if it doesn't match the execution 
model of Python.

On the other hand, there's no need to optimize this if it isn't critical 
code in your application. As I have often said, the question isn't 
whether Python is fast or not, but whether it is *fast enough*.

If you are aware of the potential pitfalls, and the code is fast enough, 
and refactoring it to something faster and less pitfall-y is too 
difficult (or not a priority), then that's fine too.



>>> But now I'm thinking the original Java approach (anonymous inner
>>> classes) is probably the most versatile of them all. A single function
>>> rarely captures behavior. That's the job of an object with its
>>> multiple methods. In order to create an ad-hoc object in Python,
>>> you will need an ad-hoc class.
>>
>> An important difference between Python and Java here is that in Python
>> the class statement is an *executable* statement, whereas in Java it's
>> just a declaration. So putting a class statement inside a Python
>> function incurs a large runtime overhead that you don't get with a Java
>> inner class.
> 
> The same is true for inner def statements.

Indeed. Inner def statements are not very useful (in my opinion, although 
I believe Tim Peters disagrees) unless they are used as closures. The 
biggest problem with the idea of using inner functions in the Pascal 
sense is that you can't test them since they aren't visible from the 
outside.



> I don't see how creating a class would be fundamentally slower to
> execute than, say, adding two integers.

Well, fundamentally adding two integers could be as quick as a single 
machine instruction to add two fixed-width ints. CPUs are pretty much 
optimized to do that *really quickly*.

Creating a new class requires allocating a chunk of memory (about 500 
bytes in Python), calling the appropriate metaclass, setting a bunch of 
fields, possibly even executing arbitrary metaclass methods. It's not as 
expensive as (say) listing the first trillion digits of pi but it surely 
is going to be more costly than adding two ints.


[...]
> Anyway, in practice on my laptop it takes 7 µs to execute a class
> statement, which is clearly worse than executing a def statement (0.1
> µs) or integer addition (0.05 µs). However, 7 microseconds is the least
> of my programming concerns.

And fair enough... premature optimization and all that.
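
For anyone who wants to repeat that kind of measurement, here is a minimal sketch using the standard timeit module. The helper names make_class, make_function and add_ints are invented for the example, and each timing includes the overhead of a function call, so the figures are only good for rough comparison and will differ from machine to machine.

```python
import timeit


def make_class():
    class C:
        pass
    return C


def make_function():
    def f():
        pass
    return f


def add_ints(a=1, b=2):
    return a + b


NUMBER = 10000
timings = {}
for func in (make_class, make_function, add_ints):
    seconds = timeit.timeit(func, number=NUMBER)
    timings[func.__name__] = seconds / NUMBER
    print("%-14s %.3f microseconds per call"
          % (func.__name__, timings[func.__name__] * 1e6))
```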






Re: Writing bytes to stdout reverses the bytes

2018-08-19 Thread Steven D'Aprano
On Mon, 20 Aug 2018 00:31:35 +, Steven D'Aprano wrote:

> When I write bytes to stdout, why are they reversed?

Answer: they aren't, use hexdump -C.

Thanks to all replies!
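
The apparent reversal can be reproduced in Python. This sketch assumes that hexdump's default output groups the input into 16-bit words shown in host byte order, and that the machine is little-endian. Unpacking the same bytes as little-endian 16-bit integers with struct gives the "swapped" view, while printing byte by byte matches what hexdump -C shows.

```python
import struct

data = b'\xfd\x84\x04\x08\n'

# Pad to an even length, as the last 16-bit word needs two bytes.
padded = data + b'\x00' * (len(data) % 2)
words = struct.unpack('<%dH' % (len(padded) // 2), padded)

as_words = ' '.join('%04x' % w for w in words)
as_bytes = ' '.join('%02x' % b for b in data)

print(as_words)   # 84fd 0804 000a
print(as_bytes)   # fd 84 04 08 0a
```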




Re: How to multiply dictionary values with other values based on the dictionary's key?

2018-08-19 Thread Steven D'Aprano
On Sun, 19 Aug 2018 05:29:46 -0700, giannis.dafnomilis wrote:

> With your help I have arrived at this point: I have the dictionary
> varsdict (size 5) as below
>
> Key                Type   Size  Value
> FEq_(0,_0,_0,_0)   float  1     1.0
> FEq_(0,_0,_1,_1)   float  1     1.0
> FEq_(0,_0,_2,_2)   float  1     1.0
> FEq_(0,_0,_3,_0)   float  1     1.0
> FEq_(0,_0,_4,_1)   float  1     1.0

That's not a Python dict. It looks like some sort of table structure. How 
do you get this? (What menu command do you run, what buttons do you 
click, etc?) I'm guessing you are using an IDE ("Integrated Development 
Environment") like Anaconda or similar. Is that right?

Python dicts print something like this:

{'FEq_(0,_0,_4,_1)': , 'FEq_(0,_0,_3,_0)': }

If you run 

print(varsdict)

what does it show?






(I have limited time to respond at the moment, so apologies for the brief 
answers. Hopefully someone else will step in with some help too.)




Writing bytes to stdout reverses the bytes

2018-08-19 Thread Steven D'Aprano
When I write bytes to stdout, why are they reversed?

[steve@ando ~]$ python2.7 -c "print('\xfd\x84\x04\x08')" | hexdump
000 84fd 0804 000a
005

[steve@ando ~]$ python3.5 -c "import sys; sys.stdout.buffer.write(b'\xfd\x84\x04\x08\n')" | hexdump
000 84fd 0804 000a
005






Re: How to multiply dictionary values with other values based on the dictionary's key?

2018-08-19 Thread Steven D'Aprano
On Sun, 19 Aug 2018 03:35:24 -0700, giannis.dafnomilis wrote:

> On Sunday, August 19, 2018 at 3:53:39 AM UTC+2, Steven D'Aprano wrote:
[...]

>> If you know absolutely for sure that the key format is ALWAYS going to
>> be 'FEq_()' then you can extract the fields using slicing, like
>> this:
>> 
>>   key = 'FEq_(0,_0,_2,_2)'
>>   fields = key[5, -1]  # cut from char 5 to 1 back from the end
[...]
>> - delete any underscores
>> - split it on commas
>> - convert each field to int
>> - convert the list of fields to a tuple
>> 
>>   fields = fields.replace('_', '')
>>   fields = fields.split(',')
>>   fields = tuple([int(x) for x in fields])
>> 
>> 
>> and then you can use that tuple as the key for A.
> 
> When I try to this, I get the message 'fields = key[5, -1]. TypeError:
> string indices must be integers'.

Ouch! That was my fault, sorry, it was a typo. You need a colon, not a 
comma. Sorry about that!

Try this instead:

key = 'FEq_(0,_0,_2,_2)'
fields = key[5:-1]
fields = fields.replace('_', '')
fields = fields.split(',')
fields = tuple([int(x) for x in fields])
print(fields)


which this time I have tested.
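
The same steps can be wrapped in a small helper so they are easy to reuse and test. The function name parse_key is invented here, and it still assumes that every key has exactly the form 'FEq_(...)'.

```python
def parse_key(key):
    """Convert a key such as 'FEq_(0,_0,_2,_2)' to the tuple (0, 0, 2, 2)."""
    fields = key[5:-1]                 # drop the leading 'FEq_(' and final ')'
    fields = fields.replace('_', '')   # delete the underscores
    return tuple(int(x) for x in fields.split(','))


print(parse_key('FEq_(0,_0,_2,_2)'))   # (0, 0, 2, 2)
```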


(More comments later, time permitting.)




Re: How to multiply dictionary values with other values based on the dictionary's key?

2018-08-19 Thread Steven D'Aprano
On Sun, 19 Aug 2018 03:15:32 -0700, giannis.dafnomilis wrote:

> Thank you MRAB!
> 
> Now I can get the corresponding dictionary value A[i,j,k,l] for each key
> in the varsdict dictionary.
> 
> However how would I go about multiplying the value of each
> FEq_(i,_j,_k,_l) key with the A[i,j,k,l] one? Do you have any insight in
> that?

Do you want to modify the varsdict values in place?

varsdict['FEq_(i,_j,_k,_l)'] *= A[i,j,k,l]

which is a short-cut for this slightly longer version:

temp = varsdict['FEq_(i,_j,_k,_l)'] * A[i,j,k,l]
varsdict['FEq_(i,_j,_k,_l)'] = temp



If you want to leave the original in place and do something else with the 
result:

result = varsdict['FEq_(i,_j,_k,_l)'] * A[i,j,k,l]
print(result)
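
Putting it together for every key at once might look something like this. It is only a sketch with made-up data: it assumes A is a dict keyed by (i, j, k, l) tuples, and parse_key is a hypothetical helper that extracts the tuple from the string key.

```python
def parse_key(key):
    """Hypothetical helper: 'FEq_(0,_0,_2,_2)' -> (0, 0, 2, 2)."""
    return tuple(int(x) for x in key[5:-1].replace('_', '').split(','))


# Made-up sample data standing in for the real optimization results.
varsdict = {'FEq_(0,_0,_2,_2)': 1.0, 'FEq_(0,_0,_4,_1)': 1.0}
A = {(0, 0, 2, 2): 3.5, (0, 0, 4, 1): 0.25}

# Leave varsdict untouched and collect the products in a new dict.
result = {key: value * A[parse_key(key)] for key, value in varsdict.items()}
print(result)
```

Because the loop only visits keys that exist in varsdict, missing index combinations are never looked up.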






Re: Pylint false positives

2018-08-19 Thread Steven D'Aprano
On Sun, 19 Aug 2018 11:43:44 +0300, Marko Rauhamaa wrote:

> Steven D'Aprano :
> 
>> On Sun, 19 Aug 2018 00:11:30 +0300, Marko Rauhamaa wrote:
>>
>>> In Python programming, I mostly run into closures through inner
>>> classes (as in Java).
>>
>> Inner classes aren't closures.
> 
> At least some of the methods of inner classes are closures (or there
> would be no point to an inner class).

(1) Ironically, the only times I've used an inner class, its methods were 
not closures. So yes, there are sometimes uses for inner classes that 
don't include closures. There's an example in the argparse module in the 
standard library, and it too has no closures.

(2) Whether or not the methods of an inner class are closures depends on 
the methods, not the fact that it is an inner class. There are no 
closures here:

class Outer:
    class Inner:
        ...

no matter what methods Inner has. Nor is this a closure:

class Outer:
    def method(self):
        class Inner:
            def spam(self):
                return self.eggs
        return Inner


since the spam method doesn't close over any of the variables in method.

You made a vague comment about inner classes being equivalent to closures 
in some unknown fashion, but inner classes are not themselves closures, 
and the methods of inner classes are not necessarily closures.
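
One way to check this is to look at a function's __closure__ attribute, which is None when nothing is closed over. In this sketch the second method, closing_method, is invented for contrast: its inner spam refers to the enclosing function's value argument and so it is a closure.

```python
class Outer:
    def method(self):
        class Inner:
            def spam(self):
                return self.eggs    # uses only its own argument
        return Inner

    def closing_method(self, value):
        class Inner:
            def spam(self):
                return value        # closes over the enclosing "value"
        return Inner


plain = Outer().method()
closing = Outer().closing_method(42)

print(plain.spam.__closure__)                     # None
print(closing.spam.__closure__[0].cell_contents)  # 42
```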


>> It's also quite expensive to be populating your application with lots of
>> classes used only once each, which is a common pitfall when using inner
>> classes. Memory is cheap, but it's not so cheap that we ought to just
>> profligately waste it needlessly.
> 
> That is a completely separate question.

It wasn't a question, it was an observation.


> There is no a-priori reason for inner classes to be wasteful;

Not in languages where classes are declared statically and built at 
compile-time, no.

But in a language like Python where classes are executable statements 
that are built at run time, like constructing any other mutable object, 
it is very easy to use them badly and waste memory.

This doesn't look harmful:

def func(x):
    class Record:
        def __init__(self, a):
            self.a = a
    return Record(x)

but it is.

You might not like that design, but it is part of Python's execution 
model and whether you like it or not you have to deal with the 
consequences :-)
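
The effect is easy to see: each call executes the class statement again, so each returned object belongs to its own brand-new class. The second half of this sketch, with the invented names SharedRecord and func_shared, shows the usual fix of defining the class once at module level.

```python
def func(x):
    class Record:
        def __init__(self, a):
            self.a = a
    return Record(x)


class SharedRecord:
    def __init__(self, a):
        self.a = a


def func_shared(x):
    return SharedRecord(x)


first, second = func(1), func(2)
print(type(first) is type(second))                    # False
print(type(func_shared(1)) is type(func_shared(2)))   # True
```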



> they
> have been part and parcel of Java programming from its early days, and
> Java is widely used for high-performance applications.

https://dirtsimple.org/2004/12/python-is-not-java.html



> CPython does use memory quite liberally. I don't mind that as
> expressivity beats performance in 99% of programming tasks.

Fair enough, but in the example I showed above, the practical effect is 
to increase the de facto size of the objects returned by func() twenty 
times. And fragment memory as well. In a long-lived application where you 
are calling func() a lot, and saving the objects, it all adds up.


>>> populating an object with fields (methods) in a loop is very rarely a
>>> good idea.
>>
>> Of course it is *rarely* a good idea
> 
> So no dispute then.

Isn't there? Then why are you disagreeing with me about the exceptional 
cases where it *is* a good idea?






Re: How to multiply dictionary values with other values based on the dictionary's key?

2018-08-18 Thread Steven D'Aprano
On Sat, 18 Aug 2018 16:16:54 -0700, giannis.dafnomilis wrote:

> I have the results of an optimization run in the form found in the
> following pic: https://i.stack.imgur.com/pIA7i.jpg.

Unless you edit your code with Photoshop, why do you think a JPEG is a 
good idea?

That discriminates against the blind and visually impaired, who can use 
screen-readers with text but can't easily read text inside images, and 
those who have access to email but not imgur.

For the record, here's the table:

Key                 Type    Size    Value
FEq_(0,_0,_0,_0)    float   1       1.0
FEq_(0,_0,_1,_1)    float   1       1.0
FEq_(0,_0,_2,_2)    float   1       1.0
FEq_(0,_0,_3,_0)    float   1       1.0
FEq_(0,_0,_4,_1)    float   1       1.0


It took me about 30 seconds to copy out by hand from the image. But what 
it means is a complete mystery. Optimization of what? What you show isn't 
either Python code or a Python object (like a dict or list) so it isn't 
of any value to us.


> How can I multiply the dictionary values of the keys FEq_(i,_j,_k,_l)
> with preexisting values of the form A[i,j,k,l]?
> 
> For example I want the value of key 'FEq_(0,_0,_2,_2)' multiplied with
> A[0,0,2,2], the value of key 'FEq_(0,_0,_4,_1)' multiplied with
> A[0,0,4,1] etc. for all the keys present in my specific dictionary.


Sounds like you have to parse the key for the number fields:

- extract out the part between the parentheses '0,_0,_2,_2'

If you know absolutely for sure that the key format is ALWAYS going to be 
'FEq_()' then you can extract the fields using slicing, like this:

  key = 'FEq_(0,_0,_2,_2)'
  fields = key[5, -1]  # cut from char 5 to 1 back from the end

If you're concerned about that "char 5" part, it isn't an error. Python 
starts counting from 0, not 1, so char 1 is "E" not "F".



- delete any underscores
- split it on commas
- convert each field to int
- convert the list of fields to a tuple

  fields = fields.replace('_', '')
  fields = fields.split(',')
  fields = tuple([int(x) for x in fields])


and then you can use that tuple as the key for A.

It might be easier and/or faster to convert A to use string keys 
"FEq_(0,_0,_2,_2)" instead. Or, depending on the size of A, simply make a 
copy:

B = {}
for (key, value) in A.items():
    B['FEq_(%d,_%d,_%d,_%d)' % key] = value


and then do your look ups in B rather than A.


> I have been trying to correspondingly multiply the dictionary values in
> the form of
> varsdict["FEq_({0},_{1},_{2},_{3})".format(i,j,k,l)]
> 
> but this is not working as the indexes do not iterate consequently over
> all their initial range values, they are the results of the optimization
> so some elements are missing.

I don't see why the dictionary lookup won't work just because the indexes 
aren't consistent. When you look up 

varsdict['FEq_(0,_0,_2,_2)']

it has no way of knowing whether or not 'FEq_(0,_0,_1,_2)' previously 
existed. I think you need to explain more of what you are doing rather 
than just dropping hints.

*Ah, the penny drops* ...


Are you trying to generate the keys by using nested loops?

for i in range(1000):  # up to some maximum value
    for j in range(1000):
        for k in range(1000):
            for l in range(1000):
                key = "FEq_({0},_{1},_{2},_{3})".format(i,j,k,l)
                value = varsdict[key]  # this fails


That's going to be spectacularly wasteful if the majority of keys don't 
exist. Rather, you should just iterate over the ones that *do* exist:

for key in varsdict:
    ...





Re: Pylint false positives

2018-08-18 Thread Steven D'Aprano
On Sun, 19 Aug 2018 00:11:30 +0300, Marko Rauhamaa wrote:

> In Python programming, I mostly run into closures through inner classes
> (as in Java).

Inner classes aren't closures.

It's also quite expensive to be populating your application with lots of 
classes used only once each, which is a common pitfall when using inner 
classes. Memory is cheap, but it's not so cheap that we ought to just 
profligately waste it needlessly.


> populating an object with fields (methods) in a loop is very 
> rarely a good idea.

Of course it is *rarely* a good idea, because it is rare for the fields 
to be either identical (except for the name) or algebraically derived 
from the loop counter. Using a dict in place of an object, it's hard to 
see any elegant way to move this into a loop:

{'a': 10, 'B': -2, 'c': 97, 'd': None, 'h': 'surprise!', 'm': []}


and so we should not. Any such loop would surely be complex, complicated, 
obscure, even obfuscated compared to writing out the dict/object 
assignments manually.

But in context, we're not discussing the millions of cases where the 
methods/fields are naturally written out manually.

So give me credit for not being a total idiot. Not once in this thread 
have I suggested that we ought to run through all our projects, changing 
every class and putting all methods inside factories. It goes without 
saying that under usual, common circumstances we write out our methods 
manually. I was speaking about one very specific case:


* You have a fair number of identical methods in a single class.


Our choices are, (1):

- write a large block of mindless boilerplate;

- even worse, have that same boilerplate but split it up,
  scattering the individual methods all around the class;

- either way, it is repetitious and error-prone, with
  obvious reliability and maintenance problems:

    def foo(self):
        return NotImplemented

    def bar(self):
        return NotImplemented

    def baz(self):
        return NonImplemented



or, (2):

- automate the repetitious code by moving the method 
  definitions into a loop.


Obviously there is some (small) complexity cost to automating it. I 
didn't specify what a fair number of methods would be (my example showed 
four, but that was just an illustration, not real code). In practice I 
wouldn't even consider this for three methods. Six or eight seems like a 
reasonable cut-off point for me, but it depends on the specifics of the 
code and who I was writing it for.

(Note that this makes me much more conservative than the usual advice 
given by system admins: when you need to do the same thing for the third 
time, write a script to automate it.)
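
For concreteness, option (2) might be sketched like this. It is an illustration only, not code from the thread: the factory name _make_placeholder and the class Spam are invented, and it relies on assignments to locals() taking effect inside a class body.

```python
def _make_placeholder(name):
    """Return a method that just returns NotImplemented."""
    def method(self):
        return NotImplemented
    method.__name__ = name
    return method


class Spam:
    # Create the identical methods in a loop instead of by hand.
    for _name in ('foo', 'bar', 'baz'):
        locals()[_name] = _make_placeholder(_name)
    del _name   # don't leave the loop variable behind as an attribute


print(Spam().foo())
print(Spam.bar.__name__)
```

With only three methods this is no shorter than writing them out, which is why a cut-off of six or eight seems more sensible.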





Re: Pylint false positives

2018-08-18 Thread Steven D'Aprano
On Sat, 18 Aug 2018 00:33:26 +0300, Marko Rauhamaa wrote:

> Chris Angelico :
>> Programming is heavily about avoiding duplicated work.
> 
> That is one aspect, but overcondensing and overabstracting programming
> logic usually makes code less obvious to its maintainer. 

That may very well be true, but we're not talking about those evils here. 
We're talking about a simple factory technique for creating a number of 
identical objects in a loop.


[...]
> I would guess such techniques could come in handy in some framework
> development but virtually never in ordinary application development. In
> a word, steer clear of metaprogramming.

Depending on your definition of metaprogramming, either:

(1) this isn't metaprogramming at all, merely programming and no 
more scary than populating a dict at runtime; or


(2) if you mean what you say, that means no decorators, no closures, no 
introspection ("reflection" in Java terms), no metaclasses (other than 
type), no use of descriptors (other than the built-in ones), no 
template-based programming, no source-code generators. No namedtuples, 
Enums, or data-classes.




Re: Pylint false positives

2018-08-17 Thread Steven D'Aprano
On Fri, 17 Aug 2018 15:19:05 +0200, Peter Otten wrote:

> You usually do not want many identical (or very similar) methods because
> invoking the right one is then errorprone, too, and you end up with an
> interface that is hard to maintain. At some point you may need to
> introduce subtle changes to one out of ten methods, 

These hypotheticals are fairly tedious. "At some point you might..." 
yeah, you might, but you probably won't, and in the meantime, YAGNI.

And if you do, it is easy to do: pull the special method out of the loop. 
Or add code to modify it after the loop.

Or make the changes in a subclass.

We do these things *all the time* for data objects, creating them in a 
loop then modifying those that need modifying. There is *no difference* 
here: methods can be treated as data objects too.


> and later someone
> else may overlook that specific angle in the documentation...

You say that as if people never failed to read the documentation about 
"regular" methods that are made by hand in the conventional way.



> If you have many similar methods you should spend your time on reducing
> their number rather than to find shortcuts to automate their creation.

The assumption here is that the basic design is sound. Why do you assume 
it isn't?

According to the OP Frank, this design has been in production for many 
years and works well. While I personally have some reservations that 
using subclasses is the best solution, I'm willing to give him the 
benefit of the doubt rather than insult his competence by assuming it is 
a broken design without ever seeing the code.



> Programming is not only about avoiding duplication, it is also about
> stating your intents clearly.

Indeed. And what states the intention "These methods are identical except 
in their name" more strongly than creating them in a loop?





Re: Pylint false positives

2018-08-17 Thread Steven D'Aprano
On Fri, 17 Aug 2018 11:49:01 +, Jon Ribbens wrote:

> On 2018-08-17, Steven D'Aprano 
> wrote:
>> On the other hand, your objection to the following three idioms is as
>> good an example of the Blurb Paradox as I've ever seen.
> 
> Do you mean the Blub Paradox? If so, you're misunderstanding or at least
> misapplying it.

Yes, that was a simple typo, and no, I'm not misunderstanding it.

You're looking up the ladder to a more powerful technique available in 
Python (methods as first-class values capable of being manipulated like 
any other object) and dismissing it in favour of mindless boilerplate 
containing duplicated code, and requiring oodles of copy-and-paste 
programming to maintain.

Graham used the "Blub Paradox" to describe programmers' failure to 
understand more powerful features available in languages they didn't use, 
but there's no reason why this failure applies only to comparisons 
between languages. It also applies to arguments about idioms within a 
single language. That's the Blub Paradox too, even though only a single 
language is involved.


>>>   * code running directly under the class definition 
>>>   * creating a method then changing its name with foo.__name__ 
>>>   * poking things into to the class namespace with locals()
>>
>> Each of these are standard Python techniques, utterly unexceptional.
> 
> I guess we'll have to agree to disagree there.


>> "Code running directly under the class" describes every use of the
>> class keyword (except those with an empty body). If you write:
>>
>> class Spam:
>>     x = 1
>>
>> you are running code under the class. This is not just a pedantic
>> technicality,
> 
> Yes, it absolutely is, in this context. Having code other than
> assignments and function definitions under the class statement is
> extremely rare.

It's rare because it isn't needed often, not because it is broken or 
dangerous or illegal or fattening.



[...]
>> You might be thinking of the warning in the docs:
>>
>> "Dynamically adding abstract methods to a class, [...] [is] not
>> supported."
>>
>> but that is talking about the case where you add the method to the
>> class after the class is created, from the outside:
> 
> Yes, I was referring to that. You may well be right about what it means
> to say, but it's not what it actually says.

*shrug*

It was obvious to me that it wasn't talking about methods dynamically 
inserted inside the class body since ALL methods are dynamically inserted 
inside the class body. If that was what it meant, it would be saying that 
abstractmethod never works. Clearly that's absurd, since it does work. So 
why interpret it as saying something absurd instead of using a bit of 
common sense and knowledge of how Python works to interpret it correctly?

Inside a class (or at the global scope) there is no meaningful difference 
between these:

spam = eggs

locals()['spam'] = eggs



>>> (Not to mention your code means the methods cannot have meaningful
>>> docstrings.)
>>
>> Of course they can, provided they're all identical, give or take some
>> simple string substitutions.
> 
> Hence "meaningful".

They can still be meaningful even if identical.





Re: Pylint false positives

2018-08-16 Thread Steven D'Aprano
On Fri, 17 Aug 2018 08:14:02 +0200, Frank Millman wrote:

> How would you extend it without a long chain of
> if isinstance(v, str):
>   [perform checks for str]
> elif isinstance(v, int)
>   [perform checks for int]
> etc
> etc
> 
> I find that using a separate method per subclass does exactly what I
> want, and that part of my project has been working stably for some time.

You might consider using single dispatch instead:

https://docs.python.org/3/library/functools.html#functools.singledispatch
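
A minimal sketch of what that could look like, with a made-up function called check standing in for the real validation code (which we haven't seen):

```python
from functools import singledispatch


@singledispatch
def check(v):
    # Fallback for types with no registered implementation.
    raise TypeError("no checks defined for %s" % type(v).__name__)


@check.register(str)
def _(v):
    return "str checks passed for %r" % v


@check.register(int)
def _(v):
    return "int checks passed for %r" % v


print(check("spam"))
print(check(42))
```

New types are supported by registering another function, without touching a chain of isinstance tests.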







Re: Pylint false positives

2018-08-16 Thread Steven D'Aprano
e] = inner
... del inner
...
py> class Bar(Foo):
... pass
...
py> Bar()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class Bar with abstract
methods eggs, spam
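
The start of that session is cut off in the archive. The following is a self-contained sketch in the same spirit, not a reconstruction of the original: the class and method names match the traceback above, but the body of the loop is an assumption.

```python
from abc import ABC, abstractmethod


class Foo(ABC):
    # Generate identical abstract methods inside the class body.
    for name in ('spam', 'eggs'):
        @abstractmethod
        def inner(self):
            pass
        inner.__name__ = name
        locals()[name] = inner
    del inner, name


class Bar(Foo):
    pass


try:
    Bar()
    instantiated = True
except TypeError as err:
    instantiated = False
    print(err)
```

The methods are in place before the class object is built, so ABC sees them as abstract just like hand-written ones.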



> (Not to mention your code means the methods cannot have meaningful
> docstrings.)

Of course they can, provided they're all identical, give or take some 
simple string substitutions.

The idea here is to remove boilerplate code, not to have to do large 
amounts of significant computation for each placeholder. If it required 
more than a few template substitutions:

inner.__doc__ %= (a, b)

at that point I'd bite the bullet and prefer the pain of swaths of dumb 
boilerplate. But simple transformations are no big deal. You create the 
method, set the docstring, change its name, set it to abstract, and make 
it an attribute of the class.

Who can't reason about four simple steps like that? Cross out the 
method-specific details, using a "widget" instead:

class X:
    for name in ('red', 'green', 'blue', 'yellow'):
        widget = Widget(1, 2, 3)
        widget.set_state('not ready')
        widget.serial_number = get_serial_number()
        locals()[name] = widget


Anyone who couldn't reason about that probably shouldn't be calling 
themselves a programmer. Making the widgets methods instead doesn't 
change that.



> I would refuse a pull request containing code such as the above, unless
> the number of methods being dynamically created was much larger than 4,
> in which case I would refuse it because the design of a class requiring
> huge numbers of dynamically created methods is almost certainly
> fundamentally broken.

If a class has a "huge" number of methods (what, a hundred? a thousand?), 
regardless of whether they are abstract or concrete or generated inside a 
factory or written out by hand, the class probably does too much.

But a class with 30 methods is fine (strings have at least 50), and if 
six or ten of them are generated by a factory, what's the big deal?

Writing out methods with identical bodies is brainless boilerplate. I 
don't clog up my code with brainless boilerplate unless there is a really 
good reason for it.

There is always a trade-off to be made, choosing between the (slight) extra 
complexity of a factory solution versus the tedious, error-prone volume 
of boilerplate, so in practice, I probably wouldn't switch to a factory 
solution for merely four methods with empty bodies. But I certainly would 
for eight.

When making this trade-off, "my developers don't understand Python's 
execution model or its dynamic features" is not a good reason to stick to 
large amounts of mindless code. That's a good reason to send the 
developer in question to a good Python course to update their skills.

(Of course if you can't do this for political or budget reasons, I 
sympathise.)






Re: Program to output a subset of the composite numbers

2018-08-15 Thread Steven D'Aprano
On Wed, 15 Aug 2018 05:34:06 -0700, tomusatov wrote:

> I am not terribly familiar with Python, but am currently authoring an
> integer sequence for www.oeis.org and was wondering if anyone in the
> community could help me with authoring a Python program that outputs,
> "Composite numbers that are one less than a composite number."


Do you have a function to test for primality? For now, I'll assume you do.


def is_prime(n):
    # Returns True if n is prime, False otherwise.
    # Implementation left as an exercise.
    raise NotImplementedError


# 0 and 1 are neither prime nor composite; skip them.
# 2 and 3 are prime; start at the first composite, 4.
i = 4
for j in range(5, 1001):
    # i is always j - 1; print it only if i and i + 1 are both composite.
    if not is_prime(i) and not is_prime(j):
        print(i)
    i = j


The above will stop at 999. To go forever, use this instead:



from itertools import count
i = 4
for j in count(5):
    # i is always j - 1; print it only if i and i + 1 are both composite.
    if not is_prime(i) and not is_prime(j):
        print(i)
    i = j



Alternatively, if you have a function which efficiently returns primes 
one at a time, you can do this:


n = 4  # start at the first composite
for p in primes(5):  # primes starting at 5
    print(list(range(n, p-1)))
    n = p + 1



This ought to print out lists of composites, starting with:

[]
[]
[8, 9]
[]
[14, 15]


etc. Take care though: I have not tested this code.
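
If you don't have a primality test handy, here is a self-contained sketch that fills one in with plain trial division. That is only one possible choice and is not tuned for speed; the generator name composite_pairs is made up for this example.

```python
from itertools import count, islice
from math import isqrt


def is_prime(n):
    """Return True if n is prime, using simple trial division."""
    if n < 2:
        return False
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            return False
    return True


def composite_pairs():
    """Yield each composite n for which n + 1 is also composite."""
    for n in count(4):
        if not is_prime(n) and not is_prime(n + 1):
            yield n


print(list(islice(composite_pairs(), 10)))
# prints [8, 9, 14, 15, 20, 21, 24, 25, 26, 27]
```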





Re: Pylint false positives

2018-08-15 Thread Steven D'Aprano
On Tue, 14 Aug 2018 15:18:13 +, Jon Ribbens wrote:

> On 2018-08-14, Steven D'Aprano 
> wrote:
>> If there really are a lot of such missing methods, I'd consider writing
>> something like this:
>>
>> class A:
>>     def __init__(self, ...):
>>         ...
>>
>>     # === process abstract methods en masse ===
>>     for name in "method_a method_b method_c method_d".split():
>>         @abstractmethod
>>         def inner(self):
>>             raise NotImplementedError
>>         inner.__name__ = name
>>         # This is okay, writing to locals works inside the class body.
>>         locals()[name] = inner
>>
>>     del inner, name  # Clean up the class namespace.
> 
> You have a peculiar idea of "good style"...

Yes, very peculiar. It's called "factor out common operations" and "Don't 
Repeat Yourself" :-)

In a world full of people who write:

d[1] = None
d[2] = None
d[3] = None
d[4] = None

I prefer to write:

for i in range(1, 5):
   d[i] = None

Shocking, I know.

Literally my first professional programming job was working on a 
Hypercard project written by a professional programmer. (He was paid for 
it, so he was professional.) The first time I looked at his code, as a 
fresh-out-of-uni naive coder, I was surprised to read his GUI set-up 
code. By memory, it was something like this:

set the name of button 1 to "Wibble 1"
set the name of button 2 to "Wibble 2"
set the name of button 3 to "Wibble 3"
set the name of button 4 to "Wibble 4"
# and so on...
set the name of button 100 to "Wibble 100"

(using "Wibble" as a placeholder for the actual name, which I don't 
recall). The first thing I did was replace that with a loop:

for i = 1 to 100 do
    set the name of button i to ("Wibble " & i)
end for


Hypertalk uses & for string concatenation. That one change cut startup 
time from something like 90 seconds to about 30, and a few more equally 
trivial changes got it down to about 15 seconds.

Hypertalk in 1988 was not the fastest language in the world, but it was 
fun to work with.



>> although to be honest I'm not sure if that would be enough to stop
>> PyLint from complaining.
> 
> No - if you think about it, there's no way Pylint could possibly know
> that the above class has methods method_a, method_b, etc. 

Well, if a human reader can do it, a sufficiently advanced source-code 
analyser could do it too... *wink*

Yes, of course you are right, in practical terms I think it is extremely 
unlikely that PyLint or any other linter is smart enough to recognise 
that locals()[name] = inner is equivalent to setting attributes method_a 
etc. I actually knew that... "although to be honest I'm not sure" is an 
understated way of saying "It isn't" :-)

https://en.wikipedia.org/wiki/Litotes


> It also
> doesn't like the `del inner, name` because theoretically neither of
> those names might be defined, if the loop executed zero times.

That's a limitation of the linter. Don't blame me if it is too stupid to 
recognise that looping over a non-empty string literal cannot possibly 
loop zero times :-)
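
For the record, the situation the linter is worried about is real in general, just not here. A small sketch (the function name is made up for illustration) of what happens when the loop body never runs:

```python
def loop_then_delete(items):
    for name in items:
        inner = name.upper()
    del inner, name  # fails if the loop body never ran


loop_then_delete("method_a method_b".split())  # fine: both names are bound

try:
    loop_then_delete([])  # empty iterable: neither name was ever bound
except NameError as err:
    print("empty loop:", err)
```

Inside a function the error is UnboundLocalError, which is a subclass of NameError, so the except clause above catches it.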





Re: Pylint false positives

2018-08-14 Thread Steven D'Aprano
On Tue, 14 Aug 2018 10:58:17 +0200, Frank Millman wrote:

>> > I have an abstract class ClassA with a number of concrete
>> > sub-classes. ClassA has a method which invokes 'self.method_b()'
>> > which is defined separately on each sub-class. Pylint complains that
>> > "Instance of 'ClassA' has no  'method_b' member".
[...]

> I do mean a lot of methods, not classes. I don't have any problem adding
> the lines. It is just that, before I starting using pylint, it had not
> occurred to me that there was any problem with my approach. If an
> experienced python programmer was reviewing my code, would they flag it
> as 'bad style'?

*shrug*

I wouldn't necessarily call it *bad*, but perhaps *not-quite good* style.

I think it's fine for small projects and quick scripts, especially if 
they're written and maintained by a single person for their own use. 
Perhaps not so much for large projects intended for long-term use with 
continual development.

If there really are a lot of such missing methods, I'd consider writing 
something like this:

class A:
    def __init__(self, ...):
        ...

    # === process abstract methods en masse ===
    for name in "method_a method_b method_c method_d".split():
        @abstractmethod
        def inner(self):
            raise NotImplementedError
        inner.__name__ = name
        # This is okay, writing to locals works inside the class body.
        locals()[name] = inner

    del inner, name  # Clean up the class namespace.

    def concrete_method_a(self):
        ...


although to be honest I'm not sure if that would be enough to stop PyLint 
from complaining.
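
For comparison, the hand-written version spells each abstract method out explicitly, which gives a static checker a much easier time. A minimal sketch using the ClassA and method_b names from the question; method_a and the Concrete subclass are made up for illustration.

```python
from abc import ABC, abstractmethod


class ClassA(ABC):
    def method_a(self):
        # Calls a method that every concrete subclass must provide.
        return self.method_b()

    @abstractmethod
    def method_b(self):
        raise NotImplementedError


class Concrete(ClassA):
    def method_b(self):
        return "b"


print(Concrete().method_a())
```

The cost is one short stub per method, which is exactly the boilerplate the loop is trying to avoid.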




Re: Python-Tkinter issue. Multiple overlaping event routines called by single click

2018-08-11 Thread Steven D'Aprano
On Sun, 12 Aug 2018 01:30:43 +0100, MRAB wrote:

> On 2018-08-11 21:01, wfgazd...@gmail.com wrote:
>> I have a main window open.  Then I open a tk.TopLevel dialog window
>> giving the user multiple choices.  He selects one, the corresponding
>> event is executed.  Then in the underlining main window, just by chance
>> there is another button exactly under the mouse click in the TopLevel
>> dialog window.  Its corresponding event is then triggered.
>> 
>> How can I keep the main window button that just happens to be in the
>> wrong place from being triggered?
>> 
> The handler should return the string "break" to prevent the event from
> propagating further. Are you doing that? It's surprising how far you can
> go without it before running into a problem!

I think you are mistaken:

https://stackoverflow.com/a/12357536

but since the description of the problem is so vague, it is hard to tell 
exactly what's happening.




Re: Python-Tkinter issue. Multiple overlaping event routines called by single click

2018-08-11 Thread Steven D'Aprano
On Sat, 11 Aug 2018 13:01:44 -0700, wfgazdzik wrote:

> I have a main window open.  Then I open a tk.TopLevel dialog window
> giving the user multiple choices.  He selects one, the corresponding
> event is executed.  Then in the underlining main window, just by chance
> there is another button exactly under the mouse click in the TopLevel
> dialog window.  Its corresponding event is then triggered.

Sounds to me that the user is clicking twice, once in the dialog, and 
then a second time just as it disappears and the main window takes focus.

Possibly they are trying to double-click.

Or their mouse is faulty.

Unless you can replicate this with multiple users, the most likely cause 
is user-error. And unless you can eliminate user-error, trying to work 
around users who click randomly is a nightmare. How do you decide which 
clicks are intended and which are not?


> How can I keep the main window button that just happens to be in the
> wrong place from being triggered?

If you put in a delay between enacting the event and closing the dialog, 
I reckon the problem will go away... but instead you'll have two click 
events in the dialog.

But it seems like an interesting experiment... put time.sleep(0.3) at the 
end of the event handler and see what happens.





Re: Can't figure out how to do something using ctypes (and maybe struct?)

2018-08-10 Thread Steven D'Aprano
On Fri, 10 Aug 2018 18:40:11 -0400, inhahe wrote:

> I need to make a list of instances of a Structure, then I need to make
> an instance of another Structure, one of the fields of which needs to be
> an arbitrary-length array of pointers to the instances in the list. How
> do I do that?
> 
> Just in case it helps, I'll include what I tried that didn't work:

How about simplifying your example to the smallest and simplest example 
of the problem? Your example has:

- two functions;

- one method that seems to have become unattached from its class;

- two classes;

- using 12 different fields.

Surely not all of that detail is specific to the problem you are 
having. If you can simplify the problem, the solution may be more 
obvious.

It might help to read this: http://sscce.org/

By the way, unrelated to your specific problem but possibly relevant 
elsewhere, you have this function:


> def mkVstEvents(events):
>     class Events(ctypes.Structure):
>         _fields_ = [ ... ]
>     return Events( ... )

You might not be aware of this, but that means that every time you call 
mkVstEvents, you get a singleton instance of a new and distinct class 
that just happens to have the same name and layout.

So if you did this:

a = mkVstEvents( ... )
b = mkVstEvents( ... )

then a and b would *not* be instances of the same class:

isinstance(a, type(b))  # returns False
isinstance(b, type(a))  # returns False
type(a) == type(b)  # also False


Each time you call the function, it creates a brand new class, always 
called Events, creates a single instance of that class, and returns it. 
That is especially wasteful of memory, since classes aren't small.

py> class Events(ctypes.Structure):
...     pass
...
py> sys.getsizeof(Events)
508


Unless that's what you intended, you ought to move the class outside of 
the function.


class Events(ctypes.Structure):
    _fields_ = [ ... ]

def mkVstEvents(events):
    return Events( ... )
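
Here is a small runnable sketch of the difference. The single c_int field called delta and the two factory function names are placeholders, not taken from your code.

```python
import ctypes


def make_event_local():
    # A brand new class object is built on every call.
    class Event(ctypes.Structure):
        _fields_ = [("delta", ctypes.c_int)]
    return Event(1)


class Event(ctypes.Structure):
    _fields_ = [("delta", ctypes.c_int)]


def make_event_shared():
    # Every call returns an instance of the one module-level class.
    return Event(1)


a, b = make_event_local(), make_event_local()
print(type(a) is type(b))   # False

c, d = make_event_shared(), make_event_shared()
print(type(c) is type(d))   # True
```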






Re: name 'aLOCK' is not defined When I add aLOCK = threading.RLock() behind if __name__ == "__main__"

2018-08-09 Thread Steven D'Aprano
On Fri, 10 Aug 2018 08:15:09 +0200, Karsten Hilbert wrote:

> On Fri, Aug 10, 2018 at 12:24:25AM +0800, xuanwu348 wrote:
> 
>> Yes, move the code from positionA(can run normally) to
>> positionB(exception with name undefined) I find this content
>> "https://docs.python.org/3.3/tutorial/classes.html#python-scopes-and-namespaces"
>> But I still don't undewrstand the differenct of  scopes-and-namespaces
>> between positionA and positionB,
>>
>> I think the variable "aLock" at these positionA or  positionB are all
>> global.
> 
> When something goes wrong in an unexpected way: test your assumptions
> ;-)



xuanwu348's assumptions are correct. aLock is a global, in both 
positions. The problem is not the scope of the variable, but whether the 
variable has been assigned to.
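
A sketch of what I mean, with a made-up setup() function standing in for whatever code performs the assignment (for instance, a block under `if __name__ == "__main__"` that only runs in some circumstances):

```python
import threading


def worker():
    # aLock is looked up as a global when worker() runs,
    # not when worker() is defined.
    with aLock:
        return "locked"


def setup():
    # Stand-in for code that may or may not have run yet.
    global aLock
    aLock = threading.RLock()


try:
    worker()
    failed = False
except NameError as err:
    failed = True
    print(err)  # the name is global, but nothing has been bound to it yet

setup()          # now the assignment has actually happened
print(worker())  # locked
```

The scope is the same in both calls to worker(); the only thing that changed is whether the assignment had been executed.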






RFC -- custom operators

2018-08-07 Thread Steven D'Aprano
Request for comments -- proposal to allow custom binary operators.

I'm looking for comments on custom binary operators: would they be useful 
and, if so, what use-cases do you have?

The most obvious and often-requested use-case would be for a form of 
logical operator (AND, OR, XOR) that is distinct from the bitwise 
operators & | ^ but unlike the standard `and` and `or` operators, calls 
dunder methods.

The proposal is to add custom operators. A placeholder syntax might be:

spam OP eggs

which would then delegate to special dunder methods __OP__ or __ROP__, 
similar to existing operators such as +.
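
For reference, this is how that delegation works today for an existing operator, here ^ with __xor__ and __rxor__. A custom operator would presumably look up __OP__ and __ROP__ the same way, but that part is the proposal, not current Python. The Truthy class is a toy made up for this sketch.

```python
class Truthy:
    """Toy wrapper whose ^ acts as a logical (not bitwise) XOR."""

    def __init__(self, value):
        self.value = value

    def __bool__(self):
        return bool(self.value)

    def __xor__(self, other):
        # Called for: self ^ other
        return bool(self) != bool(other)

    def __rxor__(self, other):
        # Called for: other ^ self, after other's own __xor__ gives up
        return bool(other) != bool(self)


print(Truthy(2) ^ 1)   # False: both operands are truthy
print(0 ^ Truthy(2))   # True: int.__xor__ returns NotImplemented, so __rxor__ runs
```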

I don't want to get into arguments about syntax, or implementation 
details, unless there is some interest in the functionality. Please focus 
on *functional requirements* only.

(1) This proposal requires operators to be legal identifiers, 
such as "XOR" or "spam", not punctuation like % and
absolutely not Unicode symbols like ∉

(2) For the sake of the functional requirements, assume that 
we can parse `spam OP eggs` without any ambiguity;

(3) This only proposes binary infix operators, not unary
prefix or postfix operators;

infix:    argument1 OP argument2
prefix:   OP argument
postfix:  argument OP

(4) This does not propose to allow the precedence to be
set on a case-by-case basis. All custom operators will
have the same precedence.

(5) What should that precedence be?

(6) This does not propose to set the associativity on a
case-by-case basis. All custom operators will have
the same associativity.

(7) Should the operators be left-associative (like multiplication),
right-associative (like exponentiation), or non-associative?

# Left-associative:
a OP b OP c    # like (a OP b) OP c

# Right-associative:
a OP b OP c    # like a OP (b OP c)

In the last case, that would make chained custom operators intentionally 
ambiguous (and hence a SyntaxError) unless disambiguated with parentheses:

# Non-associative:
a OP b OP c    # SyntaxError
(a OP b) OP c  # okay
a OP (b OP c)  # okay


(8) This does not propose to support short-circuiting operators.


I'm not interested in hearing theoretical arguments that every infix 
operator can be written as a function or method call. I know that. I'm 
interested in hearing about use-cases where the code is improved and made 
more expressive by using operator syntax and existing operators aren't 
sufficient.

(If there aren't any such use-cases, then there's no need for custom 
operators.)


Thoughts?





Re: NLTK

2018-08-06 Thread Steven D'Aprano
On Fri, 03 Aug 2018 07:49:40 +, mausg wrote:

> I like to analyse text. my method consisted of something like
> words=text.split(), which would split the text into space-seperated
> units. 

In natural language, words are more complicated than just space-separated 
units. Some languages don't use spaces as a word delimiter. Some don't 
use word delimiters at all. Even in English, we have *compound words* 
which exist in three forms:

- open: "ice cream"
- closed: "notebook"
- hyphenated: "long-term"

Recognising open compound words is difficult. "Real estate" is an open 
compound word, but "real cheese" and "my estate" are both two words.

Another problem for English speakers is deciding whether to treat 
contractions as a single word or to split them:

"don't" --> "do" "n't"

"they'll" --> "they" "'ll"

Punctuation marks should either be stripped out of sentences before 
splitting into words, or treated as distinct tokens. We don't want 
"tokens" and "tokens." to be treated as distinct words, just because one 
happened to fall at the end of a sentence and one didn't.


> then I tried to use the Python NLTK library, which had alot of
> features I wanted, but using `word-tokenize' gives a different
>  answer.-
> 
> What gives?.

I'm pretty sure the function isn't called "word-tokenize". That would 
mean "word subtract tokenize" in Python code. Do you mean word_tokenize?

Have you compared the output of the two and looked at how they differ? If 
there is too much output to compare by eye, you could convert to sets and 
check the set difference.
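
Something along these lines, where simple_tokenize is a made-up regex stand-in for whatever tokenizer you are comparing against (it is *not* NLTK's word_tokenize, which will split differently):

```python
import re


def simple_tokenize(text):
    # Words (keeping an internal apostrophe) or single punctuation marks.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)


text = "Don't treat tokens and tokens. differently."

by_split = set(text.split())
by_regex = set(simple_tokenize(text))

print(sorted(by_split - by_regex))   # ['differently.', 'tokens.']
print(sorted(by_regex - by_split))   # ['.', 'differently']
```

The two differences show exactly which tokens each method produced that the other did not.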

Or try reading the documentation for word_tokenize:

http://www.nltk.org/api/nltk.tokenize.html#nltk.tokenize.treebank.TreebankWordTokenizer





Re: machine learning forums

2018-08-05 Thread Steven D'Aprano
On Sun, 05 Aug 2018 06:05:46 -0700, Sharan Basappa wrote:

> I am quite new to Python. I am learning Python as I am interested in
> machine learning. The issue is, I have not found any ML forum where
> novices like me can get help. I have tried reddit and each of my posts
> have gone unanswered. 

Which subreddits have you posted to?


> Looks like reddit forum prefers either abstract
> topics on ML or very complex issues for discussions.
> 
> I have tried stackoverflow also but there only programming issues are
> entertained 

I believe Stack Exchange has a dedicated machine-learning site, "Cross 
Validated":

https://meta.stackexchange.com/questions/130524/which-stack-exchange-website-for-machine-learning-and-computational-algorithms

https://meta.stackexchange.com/questions/227757/where-to-ask-basic-questions-about-machine-learning




Re: beware of linked in - mail used on this list

2018-08-02 Thread Steven D'Aprano
On Thu, 02 Aug 2018 22:35:10 +0400, Abdur-Rahmaan Janhangeer wrote:

> just an info if you are using the mail you use in this list for linked
> in you might get surprises
> 
> apologies if you got a mail from linkedin somewhere

LinkedIn is a spammer. I frequently get friend requests from people who I 
don't know from LinkedIn, and most of them don't even know they sent 
them. I got three from you yesterday.





Re: Are dicts supposed to raise comparison errors

2018-08-02 Thread Steven D'Aprano
On Wed, 01 Aug 2018 22:14:54 +0300, Serhiy Storchaka wrote:

> 01.08.18 21:03, Chris Angelico пише:
>> And in any code that does not and cannot run on Python 2, the warning
>> about bytes and text comparing unequal is nothing more than a false
>> positive.
> 
> Not always. If your code supported Python 2 in the past, or third-party
> dependencies supports or supported Python 2, this warning can expose a
> real bug. Even if all your and third-party code always was Python 3
> only, the standard library can contain such kind of bugs.
> 
> Several years after the EOL of Python 2.7 and moving all living code to
> Python 3 we can ignore bytes warnings as always false positive.

Even then, I don't know that we should do that. I do not believe that the 
EOL of Python 2 will end all confusion between byte strings and text 
strings. There is ample opportunity for code to accidentally compare 
bytes and text even in pure Python 3 code, e.g. comparing data read from 
files which are supposed to be opened in the same binary/text mode but 
aren't.
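
A sketch of that kind of accident in pure Python 3, using in-memory streams as stand-ins for two files that were meant to be opened in the same mode but weren't:

```python
import io

raw = b"spam\n"

# Stand-in for open(path, "rb")
binary_file = io.BytesIO(raw)
# Stand-in for open(path, "r", encoding="ascii")
text_file = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii")

line_b = binary_file.readline()
line_t = text_file.readline()

print(line_b == line_t)                   # False: bytes and str never compare equal
print(line_b.decode("ascii") == line_t)   # True once both sides are text
```

Without the -b switch the first comparison is silently False, which is precisely the sort of thing the warning exists to flag.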




Re: Are dicts supposed to raise comparison errors

2018-08-02 Thread Steven D'Aprano
On Wed, 01 Aug 2018 19:00:27 +0100, Paul Moore wrote:

[...]
> My point was that it's a *warning*, and as such it's perfectly possible
> for a warning to *not* need addressing (other than to suppress or ignore
> it once you're happy that doing so is the right approach).

And my point was that ignoring warnings is not the right approach.

Suppressing them on a case-by-case basis (if possible) is one thing, but 
a blanket suppression goes too far, for the reasons I gave earlier.
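
By case-by-case I mean something like the following sketch, which uses the standard warnings module. The noisy() function and its message are made up; substitute the specific warning you have actually audited.

```python
import warnings


def noisy():
    warnings.warn("known and reviewed", UserWarning)
    return 42


# Suppress one specific, audited warning around one specific call...
with warnings.catch_warnings():
    warnings.filterwarnings(
        "ignore", message="known and reviewed", category=UserWarning
    )
    value = noisy()

# ...rather than silencing everything for the whole program, which is
# what a bare warnings.simplefilter("ignore") at start-up would do.
print(value)
```

Outside the with block the filters are restored, so any new warning from other code still gets seen.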






Re: Are dicts supposed to raise comparison errors

2018-08-01 Thread Steven D'Aprano
On Wed, 01 Aug 2018 16:22:16 +0100, Paul Moore wrote:

> On Wed, 1 Aug 2018 at 16:10, Robin Becker  wrote:
>>
>> On 01/08/2018 14:38, Chris Angelico wrote:
>> > It's a warning designed to help people port code from Py2 to Py3. It's
>> > not meant to catch every possible comparison. Unless you are actually
>> > porting Py2 code and are worried that you'll be accidentally
>> > comparing bytes and text, just *don't use the -b switch* and there
>> > will be no problems.
>> >
>> > I don't understand what the issue is here.
>>
>> I don't either, I have never used the  -b flag until the issue was
>> raised on bitbucket. If someone is testing a program with reportlab and
>> uses that flag then they get a lot of warnings from this dictionary
>> assignment. Probably the code needs tightening so that we insist on
>> using native strings everywhere; that's quite hard for py2/3 compatible
>> code.
> 
> They should probably use the warnings module to disable the warning in
> library code that they don't control, in that case.
> 
> If they've reported to you that your code produces warnings under -b,
> your response can quite reasonably be "thanks for the information, we've
> reviewed our bytes/string handling and can confirm that it's safe, so
> there's no fixes needed in reportlab".

I'm sorry, I don't understand this reasoning. (Perhaps I have missed 
something.) Robin says his code runs under both Python2 and Python3. He's 
getting a warning that the behaviour has changed between the two, and 
there's a dubious comparison being made between bytes and strings. 
Consequently, there's a very real chance that he has dicts which have 
one key in Python 2 but two in Python 3:

- in Python 2, b'x' and u'x' are the same key;

- in Python 3, b'x' and u'x' are different keys;


# Python 2
py> {u'x': 1, b'x': 2}
{u'x': 2}


# Python 3
py> {u'x': 1, b'x': 2}
{b'x': 2, 'x': 1}


This means that Robin very likely has subtly or not-so-subtly different 
behaviour in his software depending on which version of Python it runs 
under, if not an outright bug that silently does the wrong thing.

Even if Robin has audited the entire code base and can confidently say 
today that despite the warning, no such bug has manifested, he cannot 
possibly be sure that it won't manifest tomorrow. (Not unless the 
software is frozen and will never be modified.)


In another post, Chris says:

I suspect that there may be a bit of non-thinking-C-mentality
creeping in: "if I can turn on warnings, I should, and any
warning is a problem". That simply isn't the case in Python.

I strongly disagree. Unless Chris' intention is to say bugs don't matter 
if they're written in Python, I don't know why one would say that 
warnings aren't a problem.

Every warning is one of three cases:

- it reveals an existing actual problem;

- it reveals a potential problem which might someday become 
  an actual problem;

- or it is a false positive which (if unfixed) distracts 
  attention and encourages a blasé attitude which could
  easily lead to problems in the future.


Warnings are a code smell. Avoiding warnings is a way of "working clean":

https://blog.codinghorror.com/programmers-and-chefs/


Ignoring warnings because they haven't *yet* manifested as a bug, or 
worse, because you *assume* that they haven't manifested as a bug, is 
about as sensible as ignoring the oil warning light on your car because 
the engine hasn't yet seized up. Regardless of which language the 
software is written in.






Re: Checking whether type is None

2018-07-28 Thread Steven D'Aprano
On Sat, 28 Jul 2018 09:47:07 +, Gilmeh Serda wrote:

> On Tue, 24 Jul 2018 12:33:27 -0700, Tobiah wrote:
> 
>> I'm trying to get away from things like:
>> 
>>  >>> type(thing) is type(None)
> 
> How about:
> 
> >>> some_thing = None
> >>> type(some_thing).__str__(some_thing)
> 'None'
> 
> Equally weird, I'd say, but what the heck...

class Foo:
    def __str__(self):
        return 'None'
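
Spelled out: an instance of that class passes the string test without being None.

```python
class Foo:
    def __str__(self):
        return 'None'


thing = Foo()
print(type(thing).__str__(thing) == 'None')   # True, so the string test is fooled
print(thing is None)                          # False
```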





Re: Checking whether type is None

2018-07-25 Thread Steven D'Aprano
On Wed, 25 Jul 2018 16:14:18 +, Schachner, Joseph wrote:

> While I appreciate that use of "is" in   thing is None, I claim this
> relies on knowledge of how Python works internally, to know that every
> None actually is the same ID (the same object) - it is singular.

No, it isn't knowledge of Python's internal working. That None is a 
singleton object is a language guarantee, a promise that will always be 
true in any Python interpreter.

It is no more about "how Python works internally" than knowing that the 
keyword is spelled "class" rather than Class, or that we use ** for 
exponentiation rather than ^.


> That
> probably works for 0 and 1 also but you probably wouldn't consider
> testing   thing is 1, at least I hope you wouldn't.  thing is None looks
> just as odd to me.  Why not thing == None ?  That works.

It is wrong (in other words, it doesn't work) because it allows non-None 
objects to masquerade as None and pretend to be what they are not.

If that's your intent, then of course you may do so. But without a 
comment explaining your intent, don't be surprised if more experienced 
Python programmers correct your "mistake".
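
A minimal example of such a masquerade (the class name is made up):

```python
class Impostor:
    def __eq__(self, other):
        return True   # claims to be equal to everything


thing = Impostor()
print(thing == None)   # True: the equality test is fooled
print(thing is None)   # False: the identity test is not
```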





Re: Checking whether type is None

2018-07-24 Thread Steven D'Aprano
On Tue, 24 Jul 2018 12:33:27 -0700, Tobiah wrote:

[...]
> So what would I compare type(None) to?

Why would you need to? The fastest, easiest, most reliable way to check 
if something is None is:

if something is None



>   >>> type(None)
>   
>   >>> type(None) is NoneType
>   Traceback (most recent call last):
>     File "<stdin>", line 1, in <module>
>   NameError: name 'NoneType' is not defined


You can do (in Python 2, or in Python 3.10 and later):

from types import NoneType


or if you prefer:

NoneType = type(None)


but why bother?


> I know I ask whether:
> 
>   >>> thing is None
> 
> but I wanted a generic test.

That *is* a generic test.


> I'm trying to get away from things like:
> 
>   >>> type(thing) is type(None)

That is a good move.


> because of something I read somewhere preferring my original test
> method.

Oh, you read "something" "somewhere"? Then it must be good advice!

*wink*

Writing code like:

type(something) is dict

was the standard way to do a type check back in the Python 1.5 days. 
That's about 20 years ago now. These days, that is rarely what we need.

The usual way to check a type is:

isinstance(something, dict)

but even that should be rare. If you find yourself doing lots of type 
checking, using isinstance() or type(), then you're probably writing 
slow, inconvenient Python code.




Re: Python 2.7.14 and Python 3.6.0 netcdf4

2018-07-23 Thread Steven D'Aprano
On Mon, 23 Jul 2018 19:39:18 -0300, jorge.conrado wrote:

>   Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   ModuleNotFoundError: No module named 'netCDF4'
> 
> 
>   What can I do to solve this error for Python 3.6.0

Just because you have the Python 2.7 version of the netCDF4 module 
installed in the Python 2.7 environment, doesn't mean it will magically 
work for Python 3.6. You have to install the module for 3.6 as well.

How did you install it for Python 2.7?




Re: coding style - where to declare variables

2018-07-23 Thread Steven D'Aprano
On Mon, 23 Jul 2018 14:39:56 +0300, Marko Rauhamaa wrote:

> Steven D'Aprano :
> 
>> Lambda calculus has the concept of a binding operator, which is
>> effectively an assignment operator: it takes a variable and a value and
>> binds the value to the variable, changing a free variable to a bound
>> variable. In other words, it assigns the value to the variable, just
>> like assignment does.
> 
> In traditional Lambda Calculus semantics, there are no values at all.

It is more common to say "pure lambda calculus" rather than 
"traditional", and it is not correct to say there are no values at all. 
Rather, all values are functions (and all functions are values).

http://scienceblogs.com/goodmath/2006/08/29/a-lambda-calculus-rerun-1/

and:

"As this suggests, functions are just ordinary values, and can
be the results of functions or passed as arguments to functions
(even to themselves!).  Thus, in the lambda calculus, functions are
first-class values.  Lambda terms serve both as functions and data."

http://www.cs.cornell.edu/courses/cs6110/2013sp/lectures/lec02-sp13.pdf

And from the same notes:

"So, what is a value?  In the pure lambda calculus, any abstraction
is a value.  Remember, an abstraction λx:e is a function; in the
pure lambda calculus, the only values are functions. In an applied
lambda calculus with integers and arithmetic operations, values
also include integers.  Intuitively, a value is an expression
that can not be reduced/executed/simplified any further."



[...]
> The lambda calculus comment is just an aside. The main point is that you
> shouldn't lead people to believe that Python has variables that are any
> different than, say, Pascal's variables (even if you, for whatever
> reason, want to call them "names"). They are memory slots that hold
> values until you assign new values to them.

Nevertheless, they are still different.

My computer has an ethernet slot and a USB slot, and while they are both 
slots that hold a cable and transmit information in and out of the 
computer, they are nevertheless different. The differences are just as 
important as the similarities.


> It *is* true that Python has a more limited data model than Pascal (all
> of Python's values are objects in the heap and only accessible through
> pointers).

Calling it "more limited" is an inaccurate and pejorative way of putting 
it. Rather, I would say it is a more minimalist, *elegant* data model:

* a single kind of variable (objects in the heap where the interpreter
  manages the lifetime of objects for you)

as opposed to Pascal's more complex and more difficult model:

* two kinds of variables:

  - first-class variables that the compiler manages for you
    (allocating and deallocating them on the stack)

  - second-class variables that the programmer has to manage
    manually (declaring pointers, allocating memory by hand,
    tracking the lifetime of the memory block yourself,
    deallocating it when you are done, and carefully avoiding
    accessing the pointed-to memory block after deallocation).


At least more modern languages with both value-types and reference-types 
(such as Java, C#, Objective C, Swift) manage to elevate their reference-
type variables to first-class citizenship.


> Also, unlike Pascal, variables can hold (pointers to) values
> of any type. IOW, Python has the data model of Lisp.
> 
> Lisp talks about binding and rebinding variables as well:
> 
>    <https://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node79.html>
> 
> which might be Lambda Calculus legacy, but at least they are not shy to
> talk about variables and assignment.




-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: coding style - where to declare variables

2018-07-23 Thread Steven D'Aprano
On Mon, 23 Jul 2018 09:22:55 +0300, Marko Rauhamaa wrote:

> Dennis Lee Bieber :
[...]
>>  In my world, Java and Python are the ones that are not "common".
> 
> Yes, "boxed" is a Java term. However, the programming pattern of using
> dynamic memory and pointers is ubiquitous and ancient:

Not that ancient -- the first version(s) of Fortran didn't have dynamic 
memory allocation or pointers. (Admittedly, Lisp did follow not long 
afterwards.) But it is certainly not ubiquitous: many languages don't 
have pointers at all.


> FILE *f = fopen("xyz", "r");
> 
> where f holds a pointer, fopen() returns a pointer, and "xyz" and "r"
> evaluate to pointer values.
> 
> In Python, every expression evaluates to a pointer and every variable
> holds a pointer.

Within the semantics of the Python language, there are no pointer values, 
no way to get a pointer to a memory location or a pointer to an object. 
No expression in Python evaluates to a pointer, no variables hold 
pointers in Python. The Python language is defined in terms of objects: 
expressions evaluate to objects, and variables are names bound to objects.

If you don't believe me, believe the interpreter:

# Marko expects a pointer, but unfortunately he gets an int
py> type(1 + 2)
<class 'int'>

Marko is making a category error similar to that of those who insist that 
Python uses "call by reference" or "call by value" for parameter passing. 
He mistakes an irrelevant implementation detail used by *some* but not all 
Python interpreters[1] for entities which exist in the Python computation 
model. As Fredrik puts it:

"Joe, I think our son might be lost in the woods"
"Don't worry, I have his social security number"

http://effbot.org/zone/call-by-object.htm

(The *pointer to an object* used in the implementation is not the same as 
the object itself.)

Evaluating 1 + 2 gives the value (an object) 3, not a pointer to the 
value 3. Pointers are not merely "not first-class citizens" of Python, 
they aren't citizens at all: there is nothing we can do in pure Python to 
get hold of pointers, manipulate pointers, or dereference pointers.

https://en.wikipedia.org/wiki/First-class_citizen

Pointers are merely one convenient, useful mechanism to implement 
Python's model of computation in an efficient manner on a digital 
computer. They are not part of the computation model, and pointers are 
not values available to the Python programmer[2].







[1] The CPython interpreter uses pointers; the Jython interpreter uses 
whatever kind of memory indirection the JVM provides; when I emulate a 
Python interpreter using pencil and paper, there's not a pointer in sight 
but a lot of copying of values and crossing them out. ("Copy on access" 
perhaps?) A Python interpreter emulated by a Turing machine would use 
dots on a long paper tape, and an analog computer emulating Python would 
use I-have-no-idea. Clockwork? Hydraulics?

https://en.wikipedia.org/wiki/MONIAC
https://makezine.com/2012/01/24/early-russian-hydraulic-computer/


[2] Except by dropping into ctypes or some other interface to the 
implementation, and even then the pointers have to be converted to and 
from int objects as they cross the boundary between the Python realm and 
the implementation realm.
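
A minimal sketch of what footnote [2] describes, assuming CPython and its standard ctypes module (the variable names are only illustrative). The address of a ctypes object arrives in Python code as a plain int, and it takes another ctypes call to turn that int back into access to the memory:

```python
import ctypes

# A C int living in memory that ctypes allocated for us.
number = ctypes.c_int(42)

# The "pointer" crosses into the Python realm as an ordinary int object.
address = ctypes.addressof(number)

# Going back the other way: ctypes builds a second view of the same
# memory from the int. Pure Python has no operation that does this.
alias = ctypes.c_int.from_address(address)
alias.value = 99  # writes through to the memory that `number` wraps
```

After this runs, number.value is 99, yet at no point did Python code hold anything other than int and ctypes objects.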




-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: coding style - where to declare variables

2018-07-23 Thread Steven D'Aprano
On Mon, 23 Jul 2018 11:49:37 +0300, Marko Rauhamaa wrote:

> People new to Python are unnecessarily confused by talking about names
> and binding when it's really just ordinary variables and assignment.

It really isn't, not to those people who expect ordinary variables and 
assignment to be the same as that of C, C++, C#, Objective C, Swift, 
Pascal, Java, Go etc.

There are at least two common models for the association between symbolic 
names and values in programming: 

1. variables are named boxes at a statically-allocated, fixed 
   location in memory, usually on the stack ("value types");

2. variables are names that refer to dynamically-allocated
   objects in the heap, often movable ("reference types").

It is absolutely true that both are "variables" of a kind, and that "name 
binding" is abstract enough to refer to both models. But in *practice*, 
the influence of Algol, C and BASIC especially is so great that many 
people think of variables and assignment exclusively in the first sense. 
Since Python uses the second sense, having a distinct name to contrast 
the two is desirable, and "name binding" seems to fit that need.
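
A short sketch of the second model as Python behaves, using illustrative names. Assignment binds another name to the same object rather than copying a box of memory, so a mutation made through one name is visible through the other; a copy has to be requested explicitly:

```python
a = [1, 2, 3]
b = a              # binds a second name to the same list; nothing is copied
b.append(4)        # mutate the object through the name b

same_object = a is b   # both names refer to one object

c = list(a)        # an explicit copy creates a distinct object
c.append(5)        # does not affect a or b
```

In a language following the first model, the equivalent of `b = a` on a value-type variable would have copied the contents, and the append would not have been visible through `a`.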

I no longer believe that we should actively avoid the word "variable" 
when referring to Python. I think that's an extreme position which isn't 
justified. But "name binding" is an accurate technical term and not that 
hard to understand (on a scale of 0 to "monad", it's about 1) and I think 
it is elitist to claim that "people new to Python"[1] will necessarily be 
confused and we therefore ought to avoid the term.

There are lots of confusing terms and concepts in Python. People learn 
them. Name binding is no different.






[1] What, all of them? Even those with a comp sci PhD and 40 years 
programming experience in two dozen different languages?


-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: coding style - where to declare variables

2018-07-23 Thread Steven D'Aprano
On Mon, 23 Jul 2018 20:24:30 +1200, Gregory Ewing wrote:

> Steven D'Aprano wrote:
>> So let me see if I understand your argument...
>> 
>> - we should stop using the term "binding", because it means
>>   nothing different from assignment;
>> - binding (a.k.a. "assignment") comes from lambda calculus;
>> - which has no assignment (a.k.a. "binding").
> 
> No, that's not what Marko is saying at all. He's pointing out that the
> term "binding" means something completely different in lambda calculus.

Well done in reading Marko's intent. Unfortunately, I'm not as good at 
inferring meaning as you seem to be; consequently I had to judge by what 
he wrote, not what he meant.

When a writer fails to communicate their intent, that's usually the 
failure of the writer, not the reader. We aren't mind-readers and writers 
should not blame the reader when they fail to communicate their intended 
meaning.


> The terms "bound variable" and "free variable" in lambda calculus mean
> what in Python we would call a "local variable" vs. a "non-local
> variable".

Actually, no, they are called "bound variable" and "free variable" in 
Python too.

https://docs.python.org/3/reference/executionmodel.html

See also: http://effbot.org/zone/closure.htm

Alas, I don't think Fredrik Lundh got it *quite* right. I think that 
globals (and builtins) in Python are "open free variables", as opposed to 
nonlocals which are closed. And sadly, the Python glossary currently 
defines neither free variables nor bound variables, nor even name binding.
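
For the curious, CPython exposes some of this bookkeeping on code objects. The sketch below assumes CPython's `__code__` attributes and uses made-up function names. It shows that the interpreter records a closed-over name as a free variable of the inner function, while a global is merely listed among the names to be looked up at run time. That is an implementation detail and does not settle the terminology question either way:

```python
y = 10  # a module-level (global) name


def outer():
    x = 1  # local to outer, closed over by inner

    def inner():
        return x + y  # x comes from the closure, y from the globals

    return inner


inner = outer()
free_names = inner.__code__.co_freevars    # names captured from outer
looked_up_names = inner.__code__.co_names  # names resolved as global/builtin
result = inner()
```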


> They have nothing to do with assignment at all.

That's not quite correct either.

Lambda calculus has the concept of a binding operator, which is 
effectively an assignment operator: it takes a variable and a value and 
binds the value to the variable, changing a free variable to a bound 
variable. In other words, it assigns the value to the variable, just like 
assignment does.

In Python terms, = is a binary binding operator: it takes a left hand 
operand, the variable (a name, for the sake of simplicity) and a right 
hand operand (a value) and binds the value to the name.

 
> Marko is asking us to stop using the word "binding" to refer to
> assignment because of the potential confusion with this other meaning.

Marko has some idiosyncratic beliefs about Python (and apparently other 
languages as well) that are difficult to justify.

Especially in this case. Anyone who understands lambda calculus is 
unlikely to be confused by Python using the same terms to mean something 
*almost identical* to what they mean in lambda calculus. (The only 
difference I can see is that lambda calculus treats variables as abstract 
mathematical entities, while Python and other programming languages 
vivify them and give them a concrete implementation.)

If one in ten thousand programmers is even aware of the existence of 
lambda calculus, I would be surprised. To give up using perfectly good, 
accurate terminology in favour of worse, less accurate terminology in 
order to avoid unlikely and transient confusion among a minuscule subset 
of programmers seems a poor tradeoff to me.



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: coding style - where to declare variables

2018-07-22 Thread Steven D'Aprano
On Sun, 22 Jul 2018 17:50:06 -0400, Dennis Lee Bieber wrote:

> On Mon, 23 Jul 2018 00:08:00 +0300, Marko Rauhamaa 
> declaimed the following:
> 
>>I Java terms, all Python values are boxed. That's a very usual pattern
>>in virtually all programming languages (apart from FORTRAN).
>>
>>
>   FORTRAN, C, COBOL, BASIC, Pascal, ALGOL, BCPL, REXX, VMS DCL, 
> probably R, Matlab, APL.
> 
>   I never encountered the term "boxed" until trying to read some of 
> the O'Reilly books on Java.
> 
>   In my world, Java and Python are the ones that are not "common".

Indeed. It's not just older languages from the 60s and 70s with value-type 
variables. Newer languages intended as systems languages, like Rust and 
Go, do the same.



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: coding style - where to declare variables

2018-07-22 Thread Steven D'Aprano
On Mon, 23 Jul 2018 00:08:00 +0300, Marko Rauhamaa wrote:

> Would you call it binding in this case:
> 
>    X[0]["z"] = getit()
>    X[3]["q"] = X[0]["z"]
>    X[0]["z"].changeit()

It is a binding, but it is not a *name* binding. Since we are talking 
about name bindings, and comparing/contrasting them to variable 
assignment in classical languages, I don't think that binding to slots in 
hash tables or arrays is relevant except to muddy the waters and make 
things more complicated than they need be.


> I think what you are talking about is more usually called "referencing."

I don't think so. It's certainly not a term I've ever heard in this 
context before.


>> With a language with more ‘classical’ variable, the assignment of Y = X
>> would normal make a copy of that object, so the value Y does not get
>> changed by X.changeit().
> 
> I Java terms, all Python values are boxed. 

Correct. 

Java mixes two different models of variable assignment: it uses classical 
C- and Pascal-like variable assignment for primitive values, and Lisp- 
and Smalltalk-like name binding for boxed values (objects), leading to 
two distinct sets of behaviour. That makes Java a good lesson in why it 
is useful to distinguish between two models of name binding.

Java is not the only language with the distinction between "value 
types" (primitive values usually stored on the stack) and "reference 
types" (usually objects stored in the heap). C# and other .Net languages 
often make that distinction:

http://net-informations.com/faq/general/valuetype-referencetype.htm

Swift is another such language.

Other languages which use primarily or exclusively value-types (i.e. the 
"variables are a named box at a fixed memory location" model) include 
Algol, Pascal, Modula-3, C, C++, C#, Objective C, D, Swift, COBOL, Forth, 
Ada, PL/I, Rust and many others.



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: coding style - where to declare variables

2018-07-22 Thread Steven D'Aprano
On Sun, 22 Jul 2018 22:50:52 +0300, Marko Rauhamaa wrote:

> I wish people stopped talking about "name binding" and "rebinding,"
> which are simply posh synonyms for variable assignment. Properly, the
> term "binding" comes from lambda calculus, whose semantics is defined
> using "bound" and "free" variables. Lambda calculus doesn't have
> assignment.

So let me see if I understand your argument...

- we should stop using the term "binding", because it means 
  nothing different from assignment;
- binding (a.k.a. "assignment") comes from lambda calculus;
- which has no assignment (a.k.a. "binding").

Which leads us to the conclusion that lambda calculus both has and 
doesn't have binding a.k.a. assignment at the same time. Perhaps it is a 
quantum phenomenon.

Are you happy with the contradiction inherent in your statements, or 
would you prefer to reword your argument?




-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Non-GUI, single processort inter process massaging - how?

2018-07-21 Thread Steven D'Aprano
On Sat, 21 Jul 2018 09:07:23 +0100, Chris Green wrote:

[...]
> I want to be able to interrogate the server process from several client
> processes, some will interrogate it multiple times, others once only. 
> They are mostly (all?) run from the command line (bash).


This sounds like a good approach for signals. Your server script sets up 
one or more callbacks that print the desired information to stdout, or 
writes it to a file, whichever is more convenient, and then you send the 
appropriate signal to the server process from the client processes.

At the bash command line, you use the kill command: see `man kill` for 
details.


Here's a tiny demo:

# === cut ===

import signal, os, time
state = 0

def sig1(signum, stack):
    # %M is minutes (%m would print the month)
    print(time.strftime('it is %H:%M:%S'))

def sig2(signum, stack):
    print("Current state:", stack.f_globals['state'])

# Register signal handlers
signal.signal(signal.SIGUSR1, sig1)
signal.signal(signal.SIGUSR2, sig2)

# Print the process ID.
print('My PID is:', os.getpid())

while True:
    state += 1
    time.sleep(0.2)

# === cut ===


Run that in one terminal, and the first thing it does is print the 
process ID. Let's say it prints 12345. Over in another terminal, you can 
run:

kill -USR1 12345
kill -USR2 12345

to send the appropriate signals.

To do this programmatically from another Python script, use the os.kill() 
function.
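
As a rough sketch of the os.kill() route (POSIX only, since SIGUSR1 does not exist on Windows): the helper name send_usr1 is made up for illustration, and to keep the example self-contained the process signals itself. In real use the PID would be the server's, obtained from its output or from a PID file.

```python
import os
import signal
import time

received = []  # signal numbers seen by the handler


def on_usr1(signum, frame):
    # Runs in the main thread shortly after the signal arrives.
    received.append(signum)


signal.signal(signal.SIGUSR1, on_usr1)


def send_usr1(pid):
    """Programmatic equivalent of `kill -USR1 <pid>`."""
    os.kill(pid, signal.SIGUSR1)


# Stand-in for the server's PID: signal ourselves.
send_usr1(os.getpid())

# Give the interpreter a moment to run the Python-level handler.
deadline = time.monotonic() + 2.0
while not received and time.monotonic() < deadline:
    time.sleep(0.01)
```

Despite its name, os.kill() only delivers the signal; what happens next depends on the handler the receiving process has registered.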


https://docs.python.org/3/library/signal.html

https://pymotw.com/3/signal/




-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Better way / regex to extract values form a dictionary

2018-07-21 Thread Steven D'Aprano
On Sat, 21 Jul 2018 17:07:04 +0530, Ganesh Pal wrote:

> I have one of the dictionary values in the below format
> 
> '/usr/local/ABCD/EDF/ASASAS/GTH/HELLO/MELLO/test04_Failures.log'
> '/usr/local/ABCD/EDF/GTH/HEL/OOLO/MELLO/test02_Failures.log'
> '/usr/local/ABCD/EDF/GTH/BEL/LO/MELLO/test03_Failures.log'
> 
> I need to extract the file name in the path example, say
> test04_Failure.log and testcase no i.e test04

The dictionary is irrelevant to your question. It doesn't matter whether 
the path came from a dict, a list, read directly from stdin, an 
environment variable, extracted from a CSV file, or plucked directly from 
outer space by the programmer. The process remains the same regardless of 
where the path came from.


import os
path = '/usr/local/ABCD/EDF/ASASAS/GTH/HELLO/MELLO/test04_Failures.log'

filename = os.path.basename(path)
print(filename)
# prints 'test04_Failures.log'

testcase, remaining_junk = filename.split('_', 1)
print(testcase)
# prints 'test04'
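
The same idea applied to all three paths from the question, as a sketch that assumes Python 3 (pathlib is not in the Python 2 standard library). The helper name split_log_path is invented for illustration. str.partition is used instead of split so that a filename without an underscore does not raise an error:

```python
from pathlib import PurePosixPath

paths = [
    '/usr/local/ABCD/EDF/ASASAS/GTH/HELLO/MELLO/test04_Failures.log',
    '/usr/local/ABCD/EDF/GTH/HEL/OOLO/MELLO/test02_Failures.log',
    '/usr/local/ABCD/EDF/GTH/BEL/LO/MELLO/test03_Failures.log',
]


def split_log_path(path):
    """Return (filename, testcase) for one log path."""
    filename = PurePosixPath(path).name          # last path component
    testcase, _, _ = filename.partition('_')     # text before the first '_'
    return filename, testcase


results = [split_log_path(p) for p in paths]
```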



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: try except inside a with open

2018-07-20 Thread Steven D'Aprano
On Fri, 20 Jul 2018 23:29:21 +0530, Ganesh Pal wrote:

> Dear python Friends,
> 
> 
> I need a quick suggestion on the below code.
> 
> def modify_various_line(f):
>     """ Try modifiying various line """
>     try:
>         f.write('0123456789abcdef')
>         f.seek(5) # Go to the 6th byte in the file
>         print f.read(1)
>         f.seek(-3, 2) # Go to the 3rd byte before the end
>         print f.read(1)
>         f.write('END')
>     except IOError as e:
>         raise
>     return True

(1) Since this function always returns True (if it returns at all), what 
is the point? There's no point checking the return result, since it's 
always true, so why bother returning anything?

(2) What's the point of catching an exception only to immediately, and 
always, re-raise it?

It seems to me that your code above is better written like this:

def modify_various_line(f):
    """ Try modifying various line """
    f.write('0123456789abcdef')
    f.seek(5) # Go to the 6th byte in the file
    print f.read(1)
    f.seek(-3, 2) # Go to the 3rd byte before the end
    print f.read(1)
    f.write('END')



> def h():
>     try:
>         with open('/tmp/file.txt', 'r+') as f:
>             try:
>                 modify_various_line(f)
>             except Exception as e:
>                 print e
>     except IOError as e:
>         print(e)

Debugging is hard enough without destroying useful debugging information. 
Tracebacks are not your enemy to be hidden and suppressed (at least not 
during development) but your greatest friend in the world, one who tells 
you the embarrassing errors you have made (bugs) so you can fix them.

https://realpython.com/the-most-diabolical-python-antipattern/


def h():
    with open('/tmp/file.txt', 'r+') as f:
        modify_various_line(f)


is much shorter, easier to read, and if an error occurs, you get the 
benefit of the full traceback not just the abbreviated error message.

Tracebacks are printed to standard error, not standard out, so they can 
be redirected to a log file more easily. Or you can set an error handler 
for your entire application, so that in production any uncaught exception 
can be logged without having to fill your application with boilerplate 
"try...except...print >>sys.stderr, err".

But if you *really* have to catch the exception and suppress the 
traceback, try this:

def h():
    try:
        with open('/tmp/file.txt', 'r+') as f:
            modify_various_line(f)
    except IOError as e:
        print(e)

There's no need to nest two except clauses, both of which do the same 
thing with an exception, and one of which will cover up bugs in your code 
as well as expected failures.
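
The application-wide error handler mentioned earlier in this reply could look roughly like the sketch below. It assumes Python 3 and the standard sys.excepthook and logging machinery; the logger name and the in-memory stream standing in for a log file are placeholders. The point is that the full traceback is preserved in one place, without try...except blocks scattered through the code:

```python
import io
import logging
import sys

log_stream = io.StringIO()  # stands in for a real log file
logger = logging.getLogger("app.uncaught")  # hypothetical logger name
logger.addHandler(logging.StreamHandler(log_stream))
logger.propagate = False  # keep the demo output out of the root logger


def log_uncaught(exc_type, exc_value, exc_tb):
    """Log an uncaught exception together with its full traceback."""
    logger.error("Uncaught exception",
                 exc_info=(exc_type, exc_value, exc_tb))


# From now on the interpreter calls this for any uncaught exception.
sys.excepthook = log_uncaught

# Demonstrate the hook by calling it directly, as the interpreter would.
try:
    1 / 0
except ZeroDivisionError:
    log_uncaught(*sys.exc_info())

logged = log_stream.getvalue()
```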



> (1) Can we  use try and expect  in  a 'with open' function as shown in
> the below example code .

Yes.


> (2)  If I hit any other exceptions  say Value-error can I catch them as
> show below

If you hit ValueError, that is almost always a bug in your code. That's 
exactly the sort of thing you *shouldn't* be covering up with an except 
clause unless you really know what you are doing.



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: [OT] Bit twiddling homework

2018-07-19 Thread Steven D'Aprano
On Fri, 20 Jul 2018 08:25:04 +0200, Brian Oney via Python-list wrote:

> PS: Can I twiddle bits in Python?

Yes.

These operators work on ints:

  bitwise AND:  &
  bitwise OR:   |
  bitwise XOR:  ^



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Glyphs and graphemes [was Re: Cult-like behaviour]

2018-07-19 Thread Steven D'Aprano
On Thu, 19 Jul 2018 20:34:26 +0200, Christian Gollwitzer wrote:

> Am 19.07.2018 um 14:50 schrieb Gregory Ewing:
>> Chris Angelico wrote:
>>> On Thu, Jul 19, 2018 at 4:41 PM, Gregory Ewing
>>>  wrote:
>>>
>>>> (Google doesn't seem to think so -- it asks me whether I meant
>>>> "assist shop". Although it does offer to translateč it into Czech...)
>>>
>>> Into or from?? I'm thoroughly confused now!
>>
>> Hard to tell. This is what the link said:
>>
>> assistshop - Czech translation - bab.la English-Czech dictionary
>> https://en.bab.la/dictionary/english-czech/assistshop Translation for
>> 'assistshop' in the free English-Czech dictionary and" many other Czech
>> translations.
> 
> Well that link tries to translate "assistshop" into the czech word
> "prodavač" which is the usual word for a person in a shop who consults
> the customers and sells the goods to them; I don't know if "assist shop"
> in English comes close, as I don't understand it (I'm a native German
> speaker)

In English, that would be "shop assistant". "Assist shop" would be 
grammatically incorrect: it should be written as "assist the shop", 
meaning "help the shop".


Relevant:

https://www.theatlantic.com/technology/archive/2018/01/the-shallowness-of-google-translate/551570/




-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: test for absence of infinite loop

2018-07-17 Thread Steven D'Aprano
On Tue, 17 Jul 2018 10:10:49 +0100, Robin Becker wrote:

> A user reported an infinite loop in reportlab. I determined a possible
> cause and fix and would like to test for absence of the loop. Is there
> any way to check for presence/absence of an infinite loop in python? I
> imagine we could do something like call an external process and see if
> it takes too long, but that seems a bit flaky.

In general, no, it is impossible to detect infinite loops.
https://en.wikipedia.org/wiki/Halting_problem

That's not to say that either human readers or the compiler can't detect 
*some* infinite loops ahead of time:

# obviously an infinite loop
while True:
    pass

and then there's this:

https://www.usenix.org/legacy/publications/library/proceedings/vhll/full_papers/koenig.a


but Python's compiler isn't capable of anything like that.

The way I sometimes deal with that sort of thing is to re-write selected 
potentially-infinite loops:

while condition:
    # condition may never become False
    do something

to something like this:

for counter in range(1000):
    if not condition: break
    do something
else:
    raise TooManyIterationsError
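
Here is a runnable sketch of that pattern. TooManyIterationsError is not a built-in exception, so it has to be defined, and the Collatz iteration is just a convenient stand-in for a loop whose termination is not obvious. All the names are illustrative:

```python
class TooManyIterationsError(RuntimeError):
    """Raised when a loop exceeds its iteration budget."""


def collatz_steps(n, max_iterations=1000):
    """Count Collatz steps from n down to 1, giving up after a bound."""
    steps = 0
    for _ in range(max_iterations):
        if n == 1:
            break
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    else:
        # The for loop ran out without hitting the break.
        raise TooManyIterationsError(
            "gave up after %d iterations" % max_iterations)
    return steps
```

A regression test can then assert that the call either returns or raises TooManyIterationsError, instead of relying on an external timeout. The bound has to be chosen generously, since a legitimate run that needs more iterations will also be rejected.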




-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Glyphs and graphemes [was Re: Cult-like behaviour]

2018-07-17 Thread Steven D'Aprano
On Tue, 17 Jul 2018 10:51:38 +0300, Marko Rauhamaa wrote:

> in which Python3's honor is defended in a good many of the discussions
> in this newsgroup: anger, condescension, ridicule, name-calling.

You call it defending Python 3's honour. I call it responding to people 
who insist on spreading misinformation and falsehoods even when given the 
correct details.

Some people have their self-image wrapped up in being able to portray 
themselves as a maverick who, almost alone, sees through the "lies" about 
 to see "the truth". Others prefer reality 
instead, and get upset when false facts are repeated, over and over 
again, as truth.

If instead you want to discuss actual concrete areas where Python's text/
bytes divide hurts, you'll find that there are plenty of people who 
agree. Especially if they have to write string-handling code that needs 
to run under both 2 and 3. Been there, done that, don't want to do it 
again.

The Python 3 redesign was done to fix certain common, hard-to-diagnose 
problems in string handling caused by Python2's violation of the Zen "in 
the face of ambiguity, refuse the temptation to guess". (Python 2 guesses 
what encoding you probably mean when it comes to strings and bytes, and 
when it gets it right it is convenient, but when it gets it wrong, it is 
badly wrong, and hard to diagnose and fix.)
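
A small sketch of what "refusing to guess" looks like in Python 3, using an arbitrary sample string. Mixing bytes and text fails immediately with a TypeError at the point of the mistake, and the encoding has to be named explicitly when converting:

```python
text = "caf\u00e9"              # four characters, the last one non-ASCII
data = text.encode("utf-8")     # explicit encoding: text -> bytes

try:
    data + text                 # Python 3 will not guess an encoding
    mixed_ok = True
except TypeError:
    mixed_ok = False

decoded = data.decode("utf-8")  # explicit decoding: bytes -> text

try:
    data.decode("ascii")        # the wrong encoding fails loudly
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```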

It is impossible to improve the text handling experience for every single 
programmer writing every single kind of program under every single set of 
circumstances. Like any semantic change, there are going to be winners 
and losers, and the core devs' position is that if the losers have 
concrete and backwards-compatible suggestions for improving their 
experience (e.g. re-adding % support for byte strings) they will consider 
them, but going back to the Python 2 misdesign is off the table.


-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Glyphs and graphemes [was Re: Cult-like behaviour]

2018-07-17 Thread Steven D'Aprano
On Tue, 17 Jul 2018 15:20:16 +0900, INADA Naoki wrote (replying to Marko):

> I still don't understand what's your original point. I think UTF-8 vs
> UTF-32 is totally different from Python 2 vs 3.
> 
> For example, string in Rust and Swift (2010s languages!) are *valid*
> UTF-8. There are strong separation between byte array and string, even
> they use UTF-8. They looks similar to Python 3, not Python 2.
> 
> And Python can use UTF-8 for internal encoding in the future. AFAIK,
> PyPy tries it now.  After they succeeded,  I want to try port it to
> CPython after we removed legacy Unicode APIs. (ref PEP 393)

I'm not sure about PyPy, but I'm fairly certain that MicroPython uses 
UTF-8.

I would be very interested to see the results of using UTF-8 in CPython. 
At the least, it would remove the need to keep a separate UTF-8 
representation in the string object, as they do now. It might even be 
more compact, although a naive implementation would lose the ability to 
do constant time indexing into strings.

That might be a tradeoff worth keeping, if indexing remained sufficiently 
fast.
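
To make the tradeoff concrete, here is a naive sketch of indexing by code point into UTF-8 data. It is not how any particular interpreter does it, and the function name is made up. Because characters occupy one to four bytes, finding the i-th one means scanning from the start, which is the loss of constant-time indexing mentioned above:

```python
def code_point_at(utf8, index):
    """Return the code point at position index of UTF-8 bytes (O(n) scan)."""
    if index < 0:
        raise IndexError("negative indexes are not supported in this sketch")
    count = -1     # index of the most recent code point seen
    start = None   # byte offset where the wanted code point begins
    for pos, byte in enumerate(utf8):
        if byte & 0xC0 != 0x80:   # not a continuation byte: a new code point
            count += 1
            if count == index:
                start = pos
            elif count == index + 1:
                return utf8[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("code point index out of range")
    return utf8[start:].decode("utf-8")


sample = "a\u00e9\u20ac\U0001F600z"   # 1-, 2-, 3-, 4- and 1-byte characters
encoded = sample.encode("utf-8")

# Size comparison for plain ASCII text: 1 byte versus 4 bytes per character.
ascii_utf8_size = len("hello".encode("utf-8"))
ascii_utf32_size = len("hello".encode("utf-32-le"))
```

A real implementation could keep the scan cheap with an index or cache, which is the sort of thing "sufficiently fast" would depend on.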



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Glyphs and graphemes [was Re: Cult-like behaviour]

2018-07-17 Thread Steven D'Aprano
On Tue, 17 Jul 2018 09:52:13 +0300, Marko Rauhamaa wrote:

> Both Python2 and Python3 provide two forms of string, one containing
> 8-bit integers and another one containing 21-bit integers.

Why do you insist on making counter-factual statements as facts? Don't 
you have a Python REPL you can try these outrageous claims out before 
making them?

py> b'abcd'[2] + 1  # bytes are sequences of integers
100

py> 'abcd'[2] + 1  # strings are not sequences of integers
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't convert 'int' object to str implicitly


Python strings are sequences of abstract characters.
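
Put another way, as a Python 3 sketch: indexing bytes gives an int, indexing a string gives another string of length one, and going between a character and its code point number takes an explicit ord() or chr():

```python
b = b'abcd'
s = 'abcd'

byte_item = b[2]   # an int
char_item = s[2]   # a str of length 1, not an int

code_point = ord(char_item)      # explicit character -> integer
next_char = chr(code_point + 1)  # explicit integer -> character
```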



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Glyphs and graphemes [was Re: Cult-like behaviour]

2018-07-17 Thread Steven D'Aprano
On Tue, 17 Jul 2018 08:26:45 +0300, Marko Rauhamaa wrote:

> Steven D'Aprano :
>> On Mon, 16 Jul 2018 22:51:32 +0300, Marko Rauhamaa wrote:
>>> UTF-8 bytes can only represent the first 128 code points of Unicode.
>>
>> This is DailyWTF material. Perhaps you want to rethink your wording and
>> maybe even learn a bit more about Unicode and the UTF encodings before
>> making such statements.
>>
>> The idea that UTF-8 bytes cannot represent the whole of Unicode is not
>> even wrong. Of course a *single* byte cannot, but a single byte is not
>> "UTF-8 bytes".
> 
> So I hope that by now you have understood my point and been able to
> decide if you agree with it or not.

If your point was not what you wrote, then no, I'm sorry, my crystal ball 
unexpectedly broke down (why it didn't foresee its own failure I'll never 
know...). I can't tell what you are thinking, only what you write. 
Sometimes I can guess (like my earlier guess that you meant grapheme, 
rather than glyph) but in this case, if you mean something other than 

"UTF-8 bytes can only represent the first 128 code points of Unicode"

I'm flummoxed.
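
For anyone who wants to check the quoted claim at the REPL, a short sketch with a few arbitrarily chosen sample characters. Code points above 127 are simply encoded as two, three or four bytes, all the way up to the last code point, U+10FFFF:

```python
# Expected UTF-8 byte sequences for a few sample characters.
samples = {
    "A": b"A",                              # U+0041, one byte
    "\u00e9": b"\xc3\xa9",                  # U+00E9, two bytes
    "\u20ac": b"\xe2\x82\xac",              # U+20AC, three bytes
    "\U0001F600": b"\xf0\x9f\x98\x80",      # U+1F600, four bytes
}

encoded = {ch: ch.encode("utf-8") for ch in samples}

# The very last Unicode code point survives a round trip through UTF-8.
last = chr(0x10FFFF)
last_bytes = last.encode("utf-8")
round_trip = last_bytes.decode("utf-8")
```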


-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Glyphs and graphemes [was Re: Cult-like behaviour]

2018-07-17 Thread Steven D'Aprano
On Mon, 16 Jul 2018 21:25:20 -0500, Tim Chase wrote:

> On 2018-07-17 01:08, Steven D'Aprano wrote:
>> In English, I think most people would prefer to use a different term
>> for whatever "sh" and "ch" represent than "character".
> 
> The term you may be reaching for is "consonant cluster"?
> 
> https://en.wikipedia.org/wiki/Consonant_cluster

Thanks!


-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Glyphs and graphemes [was Re: Cult-like behaviour]

2018-07-17 Thread Steven D'Aprano
On Mon, 16 Jul 2018 21:48:42 -0400, Richard Damon wrote:

>> On Jul 16, 2018, at 9:21 PM, Steven D'Aprano
>>  wrote:
>> 
>>> On Mon, 16 Jul 2018 19:02:36 -0400, Richard Damon wrote:
>>> 
>>> You are defining a variable/fixed width codepoint set. Many others
>>> want to deal with CHARACTER sets.
>> 
>> Good luck coming up with a universal, objective, language-neutral,
>> consistent definition for a character.
>> 
> Who says there needs to be one. A good engineer will use the definition
> that is most appropriate to the task at hand. Some things need very
> solid definitions, and some things don’t.

Then the problem is solved: we have a perfectly good de facto definition 
of character: it is a synonym for "code point", and every single one of 
Marko's objections disappears.


> This goes back to my original point, where I said some people consider
> UTF-32 as a variable width encoding. For very many things, practically,
> the ‘codepoint’ isn’t the important thing, 

Ah, is this another one of those "let's pick a definition that nobody 
else uses, and state it as a fact" like UTF-32 being variable width?

If by "very many things", you mean "not very many things", I agree with 
you. In my experience, dealing with code points is "good enough", 
especially if you use Western European alphabets, and even more so if 
you're willing to do a normalization step before processing text.
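
As a minimal sketch of such a normalization step, using the standard 
library's unicodedata module (the sample strings are invented for 
illustration):

```python
import unicodedata

# Two spellings of the same visible text:
composed = "\u00e2"      # U+00E2 LATIN SMALL LETTER A WITH CIRCUMFLEX
decomposed = "a\u0302"   # "a" followed by U+0302 COMBINING CIRCUMFLEX ACCENT

# As sequences of code points they differ...
print(composed == decomposed)           # False
print(len(composed), len(decomposed))   # 1 2

# ...but after normalizing both to NFC they compare equal.
nfc_a = unicodedata.normalize("NFC", composed)
nfc_b = unicodedata.normalize("NFC", decomposed)
print(nfc_a == nfc_b)                   # True
```

Normalization only helps where a precomposed code point exists; it does 
not turn every grapheme into a single code point.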

But of course other people's experience may vary. I'm interested in 
learning about the library you use to process graphemes in your software.


> so the fact that every UTF-32
> code point takes the same number of bytes or code words isn’t that
> important. They are dealing with something that needs to be rendered and
> preserving larger units, like the grapheme is important.

If you're writing a text widget or a shell, you need to worry about 
rendering glyphs. Everyone else just delegates to their text widget, GUI 
framework, or shell.


>>> This doesn’t mean that UTF-32 is an awful system, just that it isn’t
>>> the magical cure that some were hoping for.
>> 
>> Nobody ever claimed it was, except for the people railing that since it
>> isn't a magical system we ought to go back to the Good Old Days of
>> code page hell, or even further back when everyone just used ASCII.
>> 
> Sometimes ASCII is good enough, especially on a small machine with
> limited resources.

I doubt that there are many general purpose computers with resources 
*that* limited. Even MicroPython supports Unicode, and that runs on 
embedded devices with memory measured in kilobytes. 8K is considered the 
smallest amount of memory usable with MicroPython, although 128K is more 
realistic as the *practical* lower limit.

In the mid 1980s, I was using computers with 128K of RAM, and they were 
still able to deal with more than just ASCII. I think the "limited 
resources" argument is bogus.


-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Glyphs and graphemes [was Re: Cult-like behaviour]

2018-07-16 Thread Steven D'Aprano
On Mon, 16 Jul 2018 22:51:32 +0300, Marko Rauhamaa wrote:

> All UTF-8. No unicode strings.

That just means you are re-implementing the bits of Unicode you care 
about (which may be "nothing at all") as UTF-8. If your application is 
nothing but middleware squirting bytes from one layer to another layer, 
that might be all you need care about.

But then you're not processing text in your application, and why should 
your experience in not-processing-text be given any weight over the 
experiences of those who do process text?


And later, in another post:

> UTF-8 bytes can only represent the first 128 code points of Unicode.

This is DailyWTF material. Perhaps you want to rethink your wording and 
maybe even learn a bit more about Unicode and the UTF encodings before 
making such statements.

The idea that UTF-8 bytes cannot represent the whole of Unicode is not 
even wrong. Of course a *single* byte cannot, but a single byte is not 
"UTF-8 bytes".
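
A quick check of this in Python 3 (a sketch only; the sample characters 
are arbitrary choices, including the highest code point U+10FFFF):

```python
# Every code point outside the surrogate range has a UTF-8 encoding of
# one to four bytes, and decoding gives back the original string.
samples = ["A", "é", "€", "\U0001F600", chr(0x10FFFF)]

for ch in samples:
    encoded = ch.encode("utf-8")
    assert encoded.decode("utf-8") == ch
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded!r}")

lengths = [len(ch.encode("utf-8")) for ch in samples]
```

Only the first of these, "A", falls within the first 128 code points.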


-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Glyphs and graphemes [was Re: Cult-like behaviour]

2018-07-16 Thread Steven D'Aprano
On Mon, 16 Jul 2018 15:28:51 -0400, Terry Reedy wrote:

> On 7/16/2018 1:11 PM, Richard Damon wrote:
> 
>> Many consider that UTF-32 is a variable-width encoding because of the
>> combining characters. It can take multiple ‘codepoints’ to define what
>> should be a single ‘character’ for display.
> 
> I hope you realize that this is not the standard meaning of
> 'variable-width encoding', which is 'variable number of bytes for a
> codepoint'.

A minor correction Terry: it is the number of code units, not bytes.

UTF-8 uses 1-byte code units, and from 1 to 4 code units per code point;

UTF-16 uses 2-byte code units (a 16-bit word), and 1 or 2 words per code 
point;

UTF-32 uses 4-byte code units (a 32-bit word), and only ever a single 
code unit for every code point.
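
The counts above can be checked with a short sketch. The helper name 
code_units is made up for this example, and the little-endian codec 
variants are used only so that no byte order mark is added:

```python
def code_units(text, encoding, unit_size):
    """Number of code units used to encode `text`."""
    # The -le variants are used so that no byte order mark is prepended.
    return len(text.encode(encoding)) // unit_size

samples = ["a", "é", "€", "\U0001F600"]
table = {
    ch: (
        code_units(ch, "utf-8", 1),
        code_units(ch, "utf-16-le", 2),
        code_units(ch, "utf-32-le", 4),
    )
    for ch in samples
}
for ch, (u8, u16, u32) in table.items():
    print(f"U+{ord(ch):04X}: UTF-8={u8}, UTF-16={u16}, UTF-32={u32}")
```

The UTF-32 column is always 1, which is what "fixed-width" means here.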



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Glyphs and graphemes [was Re: Cult-like behaviour]

2018-07-16 Thread Steven D'Aprano
On Mon, 16 Jul 2018 19:02:36 -0400, Richard Damon wrote:

> You are defining a variable/fixed width codepoint set. Many others want
> to deal with CHARACTER sets.

Good luck coming up with a universal, objective, language-neutral, 
consistent definition for a character.


> This doesn’t mean that UTF-32 is an awful system, just that it isn’t the
> magical cure that some were hoping for.

Nobody ever claimed it was, except for the people railing that since it 
isn't a magical system we ought to go back to the Good Old Days of code 
page hell, or even further back when everyone just used ASCII.



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Cult-like behaviour [was Re: Kindness]

2018-07-16 Thread Steven D'Aprano
On Mon, 16 Jul 2018 23:50:12 +0200, Roel Schroeven wrote:

> There are times (encoding/decoding network protocols and other data
> formats) when I have a byte string and I want/need to process it like
> Python 2 does, and that is the one area where I feel Python 3 make
> things a bit more difficult.

Ah yes, the unfortunate design error that iterating over byte-strings 
returns ints rather than single-byte strings.

That decision seemed to make sense at the time it was made, but turned 
out to be an annoyance. It's a wart on Python 3, but fortunately one 
which is fairly easily dealt with by a helper function.

That *is* a nice example of where byte strings in Python 3 aren't as nice 
as in Python 2.
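
One possible shape for such a helper, as a sketch (the name iter_bytes is 
hypothetical, not something from the standard library):

```python
def iter_bytes(data):
    """Yield each byte of `data` as a length-1 bytes object."""
    for i in range(len(data)):
        yield data[i:i + 1]   # slicing returns bytes, indexing returns int

data = b"\x01AB"
as_ints = list(data)               # plain iteration in Python 3
as_bytes = list(iter_bytes(data))  # Python 2 style single-byte strings
print(as_ints)    # [1, 65, 66]
print(as_bytes)   # [b'\x01', b'A', b'B']
```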



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Glyphs and graphemes [was Re: Cult-like behaviour]

2018-07-16 Thread Steven D'Aprano
On Tue, 17 Jul 2018 06:15:25 +1000, Chris Angelico wrote:

> On Tue, Jul 17, 2018 at 4:55 AM, Steven D'Aprano
>  wrote:
>> There is nothing special about diacritics such that we ought to treat
>> some combinations like "Ch" (two code points = one character) as "fixed
>> width" while others like "â" (two code points = one character) as
>> "variable width".
> 
> When you reverse a word, do you treat "ch" and "sh" as one character or
> two? 

In English, "ch" is always two letters of the alphabet. In Welsh and 
Czech, they can be one or two letters. (I think they will be two letters 
only in loan words, but I'm not certain about that.) Whether that makes 
them one or two characters depends on how you define "character".

Good luck with finding a universal, objective, unambiguous definition.


> I'm of the opinion that they're single characters, and thus this
> should be "dalokosh":
> 
> https://wiki.teamfortress.com/wiki/Dalokohs_Bar
> 
> (It's the Russian for "chocolate" - "шоколад" - transliterated to
> English/Latin - "šokolad" or "shokolad" - and then reversed.)

In English, I think most people would prefer to use a different term for 
whatever "sh" and "ch" represent than "character". But you make a good 
point that even in English, we sometimes want to treat two letter 
combinations as a single unit.



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Unicode is not UTF-32 [was Re: Cult-like behaviour]

2018-07-16 Thread Steven D'Aprano
On Mon, 16 Jul 2018 22:40:13 +0300, Marko Rauhamaa wrote:

> Terry Reedy :
> 
>> On 7/15/2018 5:28 PM, Marko Rauhamaa wrote:
>>> if your new system used Python3's UTF-32 strings as a foundation,
>>
>> Since 3.3, Python's strings are not (always) UFT-32 strings.
> 
> You are right. Python's strings are a superset of UTF-32. More
> accurately, Python's strings are UTF-32 plus surrogate characters.

The first thing you are doing wrong is conflating the semantics of the 
data type with one possible implementation of that data type. UTF-32 is 
implementation, not semantics: it specifies how to represent Unicode code 
points as bytes in memory, not what Unicode code points are.

Python 3 strings are sequences of abstract characters ("code points") 
with no mandatory implementation. In CPython, some string objects are 
encoded in Latin-1. Some are encoded in UTF-16. Some are encoded in 
UTF-32. Some implementations (MicroPython) use UTF-8.
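
A rough way to see this, assuming CPython 3.3 or later (sys.getsizeof 
reports implementation details, so the exact figures vary between 
versions and other interpreters may behave differently):

```python
import sys

n = 1000
ascii_text = "a" * n             # fits in 1 byte per code point
bmp_text = "\u20ac" * n          # EURO SIGN, needs 2 bytes per code point
astral_text = "\U0001F600" * n   # outside the BMP, needs 4 bytes per code point

sizes = [sys.getsizeof(s) for s in (ascii_text, bmp_text, astral_text)]
print(sizes)

# All three are still sequences of exactly n code points.
lengths = [len(s) for s in (ascii_text, bmp_text, astral_text)]
print(lengths)
```

The memory footprint changes with the widest code point in the string, 
while len() and indexing behave the same in all three cases.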

Your second error is a more minor point: it isn't clear (at least not to 
me) that "Unicode plus surrogates" is a superset of Unicode. Surrogates 
are part of Unicode. The only extension here is that Python strings are 
not necessarily well-formed surrogate-free Unicode strings, but they're 
still Unicode strings.
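
A small sketch of what "not necessarily well-formed" means in practice, 
using a lone surrogate and the standard "surrogatepass" error handler:

```python
lone = "\ud800"   # a lone high surrogate code point

# It is a perfectly good Python str of length 1...
print(len(lone))  # 1

# ...but it is not well-formed Unicode text, so strict UTF-8 refuses it.
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# The "surrogatepass" error handler lets it through explicitly.
raw = lone.encode("utf-8", "surrogatepass")
print(strict_ok, raw)
```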


>> Nor are they always UCS-2 (or partly UTF-16) strings. Nor are the
>> always Latin-1 or Ascii strings. Python's Flexible String
>> Representation uses the narrowest possible internal code for any
>> particular string. This is all transparent to the user except for
>> memory size.
> 
> How CPython chooses to represent its strings internally is not what I'm
> talking about.

Then why do you repeatedly talk about the internal storage representation?

UTF-32 is not a character set, it is an encoding. It specifies how to 
implement a sequence of Unicode abstract characters.


>>> UTF-32, after all, is a variable-width encoding.
>>
>> Nope.  It a fixed-width (32 bits, 4 bytes) encoding.
>>
>> Perhaps you should ask more questions before pontificating.
> 
> You mean each code point is one code point wide. But that's rather an
> irrelevant thing to state.

No, he means that each code point is one code unit wide.


> The main point is that UTF-32 (aka Unicode)

UTF-32 is not a synonym for Unicode. Many legacy encodings don't 
distinguish between the character set and the mapping between bytes and 
characters, but Unicode is not one of those.


> uses one or more code points to represent what people would consider an
> individual character.

That's a reasonable observation to make. But that's not what fixed- and 
variable-width refers to.

So does ASCII, and in both cases, it is irrelevant since the term of art 
is to define fixed- and variable-width in terms of *code points* not 
human meaningful characters. "Character" is context- and language-
dependent and frequently ambiguous. "LL" or "CH" (for example) could be a 
single character or a double character, depending on context and language.

Even in ASCII English, something as large as "ough" might be considered 
to be a single unit of language, which some people might choose to call a 
character. (But not a single letter, naturally.) If you don't like that 
example, "qu" is probably a better one: aside from acronyms and loan 
words, no modern English word can fail to follow a Q with a U.


> Code points are about as interesting as individual bytes in UTF-8.

That's your opinion. I see no justification for it.



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Python 4000 was Re: [SUSPICIOUS MESSAGE] Re: Cult-like behaviour]

2018-07-16 Thread Steven D'Aprano
On Mon, 16 Jul 2018 15:09:16 -0400, Terry Reedy wrote:

> On 7/16/2018 11:50 AM, Dennis Lee Bieber wrote:
> 
>>  For Python 4000 maybe
> 
> Please don't give people the idea that there is any current intention to
> have a 'Python 4000' similar to 'Python 3000'.  Call it 'a mythical
> Python 4000', if you must use such a term.

I prefer to say Python 5000, to make it even more clear that should such 
a thing happen again, it will be a *REALLY* long time from now.



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Users banned

2018-07-16 Thread Steven D'Aprano
On Mon, 16 Jul 2018 20:03:39 +0100, Steve Simmons wrote:

> +1  Seems to me Bart is being banned for "being a dick" and "talking
> rubbish" (my words/interpretation) with irritating persistence.

I know that when I first started here, I often talked rubbish. The 
difference is, I was willing to listen and consider when people gave 
alternate viewpoints. Eventually.

And I know that some people think that I'm sometimes still being a dick. 
They're wrong, I'm just charmingly forthright *wink*

Bart is often frustratingly resistant to reasonable argument, and has 
been obnoxious in his habit of turning virtually every conversation into 
an opportunity to make a dig at Python.

But neither of these is prohibited by the CoC, neither should be a 
banning offence, and even if they were, he should have had a formal 
warning first.

Preferably TWO formal warnings: the first privately, the second publicly, 
and only on the third offence a ban.

And I question the fairness of a six month ban, rather than (let's say) 
an initial one month ban.

As for banning Rick, when he isn't even posting at the moment, I don't 
even have words for that. There's no statute of limitation for murder, 
but surely "being obnoxious on the internet" ought to come with a fairly 
short period of forgiveness.



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Glyphs and graphemes [was Re: Cult-like behaviour]

2018-07-16 Thread Steven D'Aprano
On Mon, 16 Jul 2018 14:22:27 -0400, Richard Damon wrote:

[...]
> But I am not talking about those sort of characters or ligatures, 

So what? I am.

You don't get to say "only non-standard definitions I approve of count".

There is the industry standard definition of what it means to be a fixed- 
or variable-width encoding, which we can all agree on, or we can have a 
free-for-all where I reject your non-standard meaning and you reject mine 
and nobody can understand anything that anyone else says.

You (generic "you", not necessarily you personally) don't get to demand 
that I must accept your redefinition, while simultaneously refusing to 
return the favour. If you try, I will simply dismiss what you say as 
nonsense on stilts: you (still generic you) clearly don't know what 
variable-width means and are trying to shift the terms of the debate by 
redefining terms so that black means white and white means purple.


> but
> ‘characters’ that are built up of a combining diacritical marks (like
> accents) and a base character. Unicode define many code points for the
> more common of these, but many others do not.

I am aware how Unicode works, and it doesn't change a thing.

Fixed/variable width is NOT defined in terms of "characters", but if it 
were, ASCII would be variable width too. Limiting the definition to only 
diacritics is just a feeble attempt to wiggle out of the logical 
consequences of your (generic your) position.

There is nothing special about diacritics such that we ought to treat 
some combinations like "Ch" (two code points = one character) as "fixed 
width" while others like "â" (two code points = one character) as 
"variable width".

To do so is just special pleading. And the thing about special pleading 
is that we're not obliged to accept it. Plead as much as you like, the 
answer is still no.



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Steven D'Aprano
On Mon, 16 Jul 2018 10:27:18 -0700, Jim Lee wrote:

> Had you actually read my words with *intent* rather than *reaction*, you
> would notice that I suggested the *option* of turning off Unicode.

Yes, I know what you wrote, and I read it with intent.

Jim, you seem to be labouring under the misapprehension that anytime 
somebody spots a flaw in your argument, or an unpleasant implication of 
your words, it can only be because they must not have read your words 
carefully. Believe me, that is not the case.

YOU are the one who raised the specter of politically correct groupthink, 
not me. That's dog-whistle politics. But okay, let's move on from that.

You say that all you want is a switch to turn off Unicode (and replace it 
with what? Kanji strings? Cyrillic? Shift_JIS? no of course not, I'm being 
absurd -- replace it with ASCII, what else could any right-thinking 
person want, right?). Let's look at this from a purely technical 
perspective:

Python already has two string data types, bytes and text. You want 
something that is almost functionally identical to bytes, but to call it 
text, presumably because you don't want to have to prefix your strings 
with a b"" (that was also Marko's objection to byte strings).

Let's say we do it. Now we have three string implementations that need to 
be added, documented, tested, maintained, instead of two.

(Are you volunteering to do this work?)

Now we need to double the testing: every library needs to be tested 
twice, once with the "Unicode text" switch on, once with it off, to 
ensure that features behave as expected in the appropriate mode.

Is this switch a build-time option, so that we have interpreters built 
with support for Unicode and interpreters built without it? We've been 
there: it's a horribly bad idea. We used to have Python builds with 
threading support, and others without threading support. We used to have 
Python builds with "wide Unicode" and others with "narrow Unicode". 
Nothing good comes of this design.

Or perhaps the switch is a runtime global option?

Surely you can imagine the opportunities for bugs, both obvious crashing 
bugs and non-obvious silent failure bugs, that will occur when users run 
libraries intended for one mode under the other mode. Not every library 
is going to be fully tested under both modes.

Perhaps it is a compile-time option that only affects the current module, 
like the __future__ imports. That's a bit more promising, it might even 
use the __future__ infrastructure -- but then you have the problem of 
interaction between modules that have this switch enabled and those that 
have it disabled.

More complexity, more cruft, more bugs.

It's not clear that your switch gives us *any* advantage at all, except 
the warm fuzzy feelings that no dirty foreign characters might creep into 
our pure ASCII strings. Hmm, okay, but frankly apart from when I copy and 
paste code from the internet and it ends up bringing in en-dashes and 
curly quotes instead of hyphens and type-writer quotes, that never 
happens to me by accident, and I'm having a lot of trouble seeing how it 
could.

If you want ASCII byte strings, you have them right now -- you just have 
to use the b"" string syntax.

If you want ASCII strings without the b prefix, you have them right now. 
Just use only ASCII characters in your strings.

I'm simply not seeing the advantage of:

from __future__ import no_unicode
print("Hello World!")  # stand in for any string handling on ASCII

over 

print("Hello World!")

which works just as well if you control the data you are working with and 
know that it is pure ASCII.
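
If the data is not under your control, checking that it really is pure 
ASCII takes only a few lines. This is a sketch; the function name 
is_pure_ascii is made up for the example (Python 3.7 and later also 
provide str.isascii()):

```python
def is_pure_ascii(text):
    """Return True if every code point in `text` is below 128."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True

print(is_pure_ascii("Hello World!"))   # True
print(is_pure_ascii("Straße"))         # False
```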



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Glyphs and graphemes [was Re: Cult-like behaviour]

2018-07-16 Thread Steven D'Aprano
On Mon, 16 Jul 2018 13:11:23 -0400, Richard Damon wrote:

>> On Jul 16, 2018, at 12:51 PM, Steven D'Aprano
>>  wrote:
>> 
>>> On Mon, 16 Jul 2018 00:28:39 +0300, Marko Rauhamaa wrote:
>>> 
>>> if your new system used Python3's UTF-32 strings as a foundation, that
>>> would be an equally naïve misstep. You'd need to reach a notch higher
>>> and use glyphs or other "semiotic atoms" as building blocks. UTF-32,
>>> after all, is a variable-width encoding.
>> 
>> Python's strings aren't UTF-32. They are sequences of abstract code
>> points.
>> 
>> UTF-32 is not a variable-width encoding.
>> 
>> --
>> Steven D'Aprano
>> 
>> 
> Many consider that UTF-32 is a variable-width encoding because of the
> combining characters. It can take multiple ‘codepoints’ to define what
> should be a single ‘character’ for display.

Ah, well if we're going to start making up our own definitions of terms, 
then ASCII is a variable-width encoding too.

"Ch" (a single letter of the alphabet in a number of European languages, 
including Welsh and Czech) requires two code points in ASCII. Even in 
English, "qu" could be considered a two-byte "character" (grapheme), and 
for ASCII users, (c) is a THREE code point character for what ought to be 
a single character ©.

The standard definition of variable- and fixed-width encodings refers to 
how many *code units* are required to make up a single *code point*.

Under that standard definition, UTF-8 and UTF-16 are variable-width, and 
UTF-32 is fixed-width. 

But I'll accept that UTF-32 is variable-width if Marko accepts that ASCII 
is too.


-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Steven D'Aprano
On Tue, 17 Jul 2018 02:22:59 +1000, Chris Angelico wrote:

> On Tue, Jul 17, 2018 at 2:05 AM, Mark Lawrence 
> wrote:
>> Out of curiosity where does my mum's Welsh come into the equation as I
>> believe that it is not recognised by the EU as a language?
>>
>>
> What characters does it use? Mostly Latin letters? 

Yes, Welsh uses the Latin script. It has an alphabet of 29 letters 
(including 8 digraphs), plus four diacritics used on some vowels:

circumflex   e.g. â

acute accent e.g. é

diaeresis    e.g. ï

grave accent e.g. ẁ

Yes, w is a vowel in Welsh -- and very occasionally in English as well.

http://www.dictionary.com/e/w-vowel/


Accented vowels are not considered separate letters.

https://en.wikipedia.org/wiki/Welsh_orthography

Some older sources will exclude J (making 28 letters). Patagonian Welsh 
also includes the letter "V", although that's non-standard.


-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Glyphs and graphemes [was Re: Cult-like behaviour]

2018-07-16 Thread Steven D'Aprano
On Mon, 16 Jul 2018 00:28:39 +0300, Marko Rauhamaa wrote:

> if your new system used Python3's UTF-32 strings as a foundation, that
> would be an equally naïve misstep. You'd need to reach a notch higher
> and use glyphs or other "semiotic atoms" as building blocks. UTF-32,
> after all, is a variable-width encoding.

Python's strings aren't UTF-32. They are sequences of abstract code 
points.

UTF-32 is not a variable-width encoding.

I don't know what *you* mean by "semiotic atoms", (possibly you mean 
graphemes?) but "glyphs" are the visual images of characters, and there's 
a virtual infinity of those for each character, differing in type-face, 
size, and style (roman, italic, bold, reverse-oblique, etc).

There is no evidence aside from your say-so that a programming language 
"needs" to support "glyphs" as a native data type, or even graphemes. For 
starters, such a system would be exceedingly complex: graphemes are both 
language and context dependent.

English, for example, has around 250 distinct graphemes:

https://books.google.com.au/books?id=QrBQAmfXYooC&pg=PT238&lpg=PT238&dq=250+graphemes&source=bl&ots=abiymnQ5pq&sig=eq3k06BkuGfpuGC6wKqPkCR_8Bw&hl=en&sa=X&ei=HAdyUbfULpCnqwGRi4DYAg&redir_esc=y


Certainly it would be utterly impractical for a programming language 
designer, knowing nothing but a few half-remembered jargon terms, to try 
to design a native string type that matched the grapheme rules for the 
hundreds of human languages around the world. Or even just for English. 
Let third-party libraries blaze that trail first.


By no means is Unicode the last word in text processing. It might not 
even be the last word in native string types for programming languages. 
But it is a true international standard which provides a universal 
character set and a selection of useful algorithms able to be used as 
powerful building blocks for text-processing libraries.

Honestly Marko, your argument strikes me as akin to somebody who insists 
that because Python's float data type doesn't support a full CAS (computer 
algebra system) and theorem prover, it's useless and a step backwards and 
we should abandon IEEE-754 float semantics and let users implement their 
own floating point maths using nothing but fixed 1-byte integers.

A float, after all, is nothing but 8 bytes.




-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



I18N and Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Steven D'Aprano
On Sun, 15 Jul 2018 17:28:15 -0700, Jim Lee wrote:

> Unicode is an attempt to solve at least one I18N issue

If you're going to insist on digging your heels in and using definitions 
which nobody else does, this discussion is going to go nowhere fast.

Unicode is (ideally) a universal character set; in practice it is an 
industry standard for the consistent encoding, representation, and 
handling of text expressed in most of the world's writing systems. 

I18N is recognised as the abbreviation for internationalization; its 
companion, L10N, stands for localization.

https://en.wikipedia.org/wiki/Internationalization_and_localization

There is no overlap between the two: Unicode doesn't help with 
internationalization (except in the non-trivial but purely mechanical 
sense that it removes the need for metadata specifying the current code 
page), and internationalization doesn't require Unicode:

(1) Unicode provides no support for internationalization or localization. 
Just because I have the Unicode string "street" in my application, 
doesn't mean it magically transforms to "Straße" when used by German 
users.

(2) Internationalization can occur even between groups of users who share 
a single character set, even ASCII. My application might display "Rubbish 
Bin" in the UK and Australia and "Trash Can" in the USA.
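
Point (2) can be sketched with nothing but ASCII strings. The catalogue, 
locale names and function name here are hypothetical, chosen only to 
illustrate the idea; a real application would more likely use something 
like gettext:

```python
# Hypothetical message catalogue: every string is pure ASCII.
MESSAGES = {
    "en_GB": {"bin_label": "Rubbish Bin"},
    "en_AU": {"bin_label": "Rubbish Bin"},
    "en_US": {"bin_label": "Trash Can"},
}

def label_for(locale_name, key, default_locale="en_US"):
    """Look up `key` for a locale, falling back to the default locale."""
    catalogue = MESSAGES.get(locale_name, MESSAGES[default_locale])
    return catalogue[key]

print(label_for("en_AU", "bin_label"))   # Rubbish Bin
print(label_for("en_US", "bin_label"))   # Trash Can
```

Nothing in the lookup depends on the character set; the translation 
work is entirely separate from how the text is encoded.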



If you think that Unicode is about internationalization, you are 
labouring under serious misapprehensions about the nature of both Unicode 
and internationalization.




-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Cult-like behaviour [was Re: Kindness]

2018-07-16 Thread Steven D'Aprano
On Sun, 15 Jul 2018 16:38:41 -0700, Jim Lee wrote:

> As I said, there are programming situations where the programmer only
> needs to deal with a single language - his own.

This might come as a shock to you, but just because Python's native 
string type supports (for example) the Devanagari alphabet, that doesn't 
mean you are forced to use it in your code or application.

# Look ma, not a single Cyrillic or Greek or Tagalog letter in sight!
label = "something interesting"


Don't worry, the UN Language Police aren't going to force you at gunpoint 
to label your output in Khmer, Hiragana and Gujarati if you don't want to.



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Steven D'Aprano
On Mon, 16 Jul 2018 14:17:35 +, Dan Sommers wrote:

> On Mon, 16 Jul 2018 10:39:49 +0000, Steven D'Aprano wrote:
> 
>> ... people who think that if ISO-8859-7 was good enough for Jesus ...
> 
> It may have been good enough for his disciples, but Jesus spoke Aramaic.

The buzzing noise you just heard was the joke whizzing past your head 
*wink*

It was a riff on the apocryphal American (occasionally other nationality) 
who said that if English was good enough for Jesus Christ, it is good 
enough for everyone:

http://itre.cis.upenn.edu/~myl/languagelog/archives/003084.html

with the twist that in my example, I picked *another* language rather 
than English. I shouldn't have picked Greek, an unfortunate choice that 
may have led you to imagine I was serious. Perhaps ISO-8859-5 (Cyrillic) 
or Shift_JIS would have been funnier :-(

And of course there is the absurdity of any ISO standards existing two 
thousand years ago.


-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Getting process details on an operating system process/question answer by vouce

2018-07-16 Thread Steven D'Aprano
On Mon, 16 Jul 2018 06:47:53 -0600, John T. Haggerty wrote:

> So, it's early for me---and I'm not sure if these things can be done but
> I'd like to know the following:
> 
> 1. How can Python connect to a running operating system process on a
> host operating system to see what part of the execution is like?---ie
> keep track of health stats like it's stuck on disk access or inside some
> kind of wait state etc.

Start by answering the question: "How can *any* process connect to 
another running process ...". Once you know how to do that, that may give 
us a hint how to do the same using Python.

I would expect that there needs to be some sort of OS-specific interface 
where you pass the process ID you care about to some OS function, and it 
will report the process state.

If your operating system doesn't support that, the only other option that 
I know of is if the application you are interested in *itself* provides 
an interface to question it while it is running. On Linux, that might 
include sending it a signal (see the signal module) or it might 
include some form of interprocess communication. But whatever it is, it 
will likely be application specific.

So if the OS doesn't support this, and the process you are interested in 
doesn't support this, then it likely can't be done.
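
As one example of an OS-specific interface, Linux exposes per-process 
information under /proc. The following is a minimal sketch, assuming a 
Linux system with /proc mounted; the function name process_state is made 
up, and on other systems it simply returns None:

```python
import os

def process_state(pid):
    """Return the 'State:' field from /proc/<pid>/status, or None.

    Linux-specific: other systems expose this differently, if at all.
    """
    try:
        with open(f"/proc/{pid}/status", encoding="utf-8",
                  errors="replace") as f:
            for line in f:
                if line.startswith("State:"):
                    return line.split(":", 1)[1].strip()
    except OSError:
        # No such process, no permission, or no /proc on this system.
        return None
    return None

state = process_state(os.getpid())
print(state)   # for example "R (running)" on Linux, None elsewhere
```

A state such as "D (disk sleep)" would suggest the process is waiting on 
I/O. The third-party psutil package offers a portable wrapper around 
this kind of query.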


> 2. Be able to pass questions and take answers via say a customized "okay
> google" to try to explain:
> 
> Ask: how was your day
> record answer in voice translate it via google ask new question

Sorry, I don't understand this.



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Steven D'Aprano
On Sun, 15 Jul 2018 18:02:51 -0700, Jim Lee wrote:

> On 07/15/18 17:17, MRAB wrote:
>> On 2018-07-16 00:10, Jim Lee wrote:
[...]
>>> Have you never heard of programming BEFORE Unicode existed?
>>>
>>> How ever did we get along?

Mostly by not exchanging data with anyone else using a different language 
or operating system.

As one of those people who *did* need to exchange data, between Windows 
using Latin-1 and Macs using MacRoman, I can absolutely tell you that we 
got on **REALLY, REALLY, REALLY BADLY** with data loss and corruption 
almost guaranteed.
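
The kind of corruption involved can be reproduced with Python's 
mac_roman and latin-1 codecs. This is only a sketch; the sample word is 
an arbitrary choice:

```python
original = "café"

# Written on a classic Mac, then read on Windows as if it were Latin-1.
mac_bytes = original.encode("mac_roman")
misread = mac_bytes.decode("latin-1")

print(mac_bytes)             # b'caf\x8e'
print(misread == original)   # False: the accented letter is corrupted

# With one agreed encoding on both sides, the round trip is lossless.
roundtrip = original.encode("utf-8").decode("utf-8")
print(roundtrip == original)  # True
```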


[...]
> Yes, it was.  However, dealing with Unicode is also annoying.  If there
> were only one encoding, such as UTF-8, I wouldn't mind so much.

O_o

As an application developer, you should (almost) never need to use any 
Unicode encoding other than UTF-8.

[...]
> But I don't speak Esperanto,  and my programs don't generally care what
> characters are used for European currencies.  When I create a simple
> program that takes a text file (created by me) and munges it into a
> different format, I don't care if someone from Uzbekistan can read it or
> not.

Good for you.

But Python is not a programming language written to satisfy the needs of 
people like you, and ONLY people like you.

It is a language written to satisfy the needs of people from Uzbekistan, 
and China, and Japan, and India, and Brazil, and France, and Russia, and 
Australia, and the UK, and mathematicians, and historians, and linguists, 
and, yes, even people who think that if ISO-8859-7 was good enough for 
Jesus, the whole world ought to be using it.


> When I create a one-time use program to visualize some data on a
> graph, I don't care if anyone else can read the axis labels but me.
> These are realities.  A good programming language will allow for these
> realities without putting the burden on the programmer to turn *every*
> program into a politically correct, globalization compliant model of
> modern groupthink.

And here we get to the crux of the matter. It isn't really the technical 
issues of Unicode that annoy you. It is the loss of privilege that you, 
as an ASCII user, no longer get to dismiss 90% of the world as beneath 
your notice.

Nice.

-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Steven D'Aprano
On Sun, 15 Jul 2018 17:39:55 -0700, Jim Lee wrote:

> On 07/15/18 17:18, Steven D'Aprano wrote:
>> On Sun, 15 Jul 2018 16:08:15 -0700, Jim Lee wrote:
>>
>>> Python3 is intrinsically tied to Unicode for string handling.
>>> Therefore, the Python programmer is forced to deal with it (in all but
>>> trivial cases), rather than given a choice.  So I don't understand how
>>> I can illustrate my point with Python code since Python won't let me
>>> deal with strings without also dealing with Unicode.
>> Nonsense.
>>
>> b"Look ma, a Python 2 style ASCII string."
>>
>>
> As I said, all but trivial cases.
> 
> Do you consider separating Unicode strings from byte strings, having to
> decode and encode from one to the other, 

If you use nothing but byte strings, you don't need to separate the non-
existent text strings from the byte strings, nor do you need to decode or 
encode.
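
For instance, a sketch of working purely with byte strings, with no 
decoding or encoding anywhere (the sample request line is invented for 
the example):

```python
line = b"GET /index.html HTTP/1.0"

# bytes objects have most of the familiar string methods.
method, path, version = line.split()
print(method.lower())                   # b'get'
print(path.startswith(b"/"))            # True
print(version.replace(b"1.0", b"1.1"))  # b'HTTP/1.1'
```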


> and knowing which
> functions/methods accept one, the other, or both as arguments, 

That's certainly a real complication, if I may stretch the meaning of the 
word "complication" beyond breaking point. Surely you are already having 
to read the documentation of the function to learn what arguments it 
takes, and what types they are (int or float, list or iterator, 'r' or 
'a', etc). If someone can't deal with the question of "unicode or bytes" 
as well, then perhaps they ought to consider a career change to something 
less demanding, like politics.

If, as you insinuate, all your data is 100% ASCII, then you have nothing 
to fear. Just treat 

str(bytes_obj, 'ASCII')
bytes(str_obj, 'ASCII')

as the equivalent of a cast or coercion, and you won't go wrong. (Of 
course, in 2018, the number of applications that can truly say all their 
data is pure ASCII is vanishingly small.)
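
A short sketch of those two calls used as a cast, and of what happens 
when the "100% ASCII" assumption turns out to be false (the sample data 
is made up):

```python
data = b"plain ASCII text"

text = str(data, "ASCII")            # bytes -> str
assert bytes(text, "ASCII") == data  # str -> bytes round-trips exactly

try:
    str(b"caf\xe9", "ASCII")         # 0xE9 is outside the ASCII range
except UnicodeDecodeError as err:
    print("not ASCII after all:", err.reason)
```

The exception is the useful part: it tells you immediately that the data 
is not what you thought it was.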

Or use Latin-1, if you want to do the most simple-minded thing that you 
can to make errors go away, without caring about correctness.
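
To illustrate the trade-off, a sketch only: Latin-1 maps every one of 
the 256 byte values to a character, so decoding never fails, but that 
says nothing about whether the result is right.

```python
mystery = bytes(range(256))          # every possible byte value

text = mystery.decode("latin-1")     # never raises, whatever the bytes are
assert text.encode("latin-1") == mystery

# No error does not mean the right answer: these bytes are really UTF-8.
utf8_bytes = "naïve".encode("utf-8")
wrong = utf8_bytes.decode("latin-1")
print(wrong)                         # mojibake rather than the original word
print(wrong == "naïve")              # False
```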

But the thing is, that complexity is *inherent in the domain*. You can 
try to deal with it without Unicode, and as soon as you have users 
expecting to use more than one code page, you're doomed.


> as "not dealing with Unicode"?  I don't.

Frankly, I do.

Dealing with all the vagaries of human text *is* complicated, that's the 
nature of the beast. Dealing with the complexities of Unicode can be as 
complex as dealing with the complexities of floating point arithmetic.

(But neither of those are even in the same ballpark as dealing with the 
complexities of *not* using Unicode: legacy code pages and encodings are 
a nightmare to deal with.)

Nevertheless, just as casual users can go a very, very long way just 
treating floats as the real numbers we learn about in school, and trust 
that IEEE-754 semantics will mean your answers are "close enough", so the 
casual user can go a very long way ignoring the complexities of Unicode, 
so long as they control their own data and know what it is.
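
As a small side-by-side sketch of that analogy: both floats and Unicode 
have corners where the naive comparison fails, and both have a standard 
tool for dealing with it.

```python
import math
import unicodedata

# Floats: usually "close enough", but not exact.
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True

# Unicode: two spellings of the same visible text.
composed = "\u00e9"        # e-acute as one code point
decomposed = "e\u0301"     # e followed by a combining acute accent
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

A casual user who controls their own data will rarely run into either 
case.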

If you don't know what your data is, then you're doomed, Unicode or no 
Unicode. (If you don't think that's a problem, if you think that "just 
treat text as octets" works, then people like you are the reason there is 
so much mojibake in the world, screwing it up for the rest of us.)



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Cult-like behaviour [was Re: Kindness]

2018-07-15 Thread Steven D'Aprano
On Sun, 15 Jul 2018 16:08:15 -0700, Jim Lee wrote:

> Python3 is intrinsically tied to Unicode for string handling. Therefore,
> the Python programmer is forced to deal with it (in all but trivial
> cases), rather than given a choice.  So I don't understand how I can
> illustrate my point with Python code since Python won't let me deal with
> strings without also dealing with Unicode.

Nonsense.

b"Look ma, a Python 2 style ASCII string."





-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Cult-like behaviour [was Re: Kindness]

2018-07-15 Thread Steven D'Aprano
On Sun, 15 Jul 2018 13:09:59 -0700, Jim Lee wrote:

> On 07/15/18 12:37, MRAB wrote:
>> To me, Unicode and UTF-8 aren't things to be reserved for I18N. I use
>> them as a matter of course because I find it a lot easier to stick with
>> just one encoding, one that will work with _any_ text I have.
> 
> Which is exactly the same rationale for using any other single encoding
> (including ASCII).

Which encoding should I choose?

Having chosen one today, which encoding should I choose tomorrow?


> If the text you deal with is not multi-lingual, why
> complicate matters by trying to support a plethora of encodings which
> will never be used (and the attendant opportunity for more bugs)?

Who mentioned a plethora of encodings? Within the boundaries of your 
application, using Python 3 text strings means never needing to even 
consider encodings. The only time you should care about them is when your 
data crosses the boundary between your application and the rest of the 
world (e.g. writing to files), and in that case, we should standardise on 
UTF-8 (unless there's a really good reason not to).
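
A minimal sketch of what that looks like in practice, assuming a 
throwaway file in a temporary directory (the file name and sample text 
are made up for the example):

```python
import os
import tempfile

# Inside the application this is just text; no encoding is involved.
text = "Uzbekistan: Oʻzbekiston, China: 中国, Russia: Россия"

path = os.path.join(tempfile.mkdtemp(), "countries.txt")

# Encoding happens only here, at the boundary, and is stated explicitly.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

with open(path, encoding="utf-8") as f:
    round_tripped = f.read()

print(round_tripped == text)  # True
```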

Honestly Jim, your response sounds to me the equivalent of:

"... and that's why structured programming will never catch 
on, and why unstructured programming with GOTO is better,
faster, more reliable, and can do everything that the
programmer needs."


Aside from occasional legacy software reasons, I believe that one would 
have to ignore the last 30+ years of "code page hell" to even consider 
using anything but Unicode in modern application software.

> Note that I'm *not* saying Unicode  is *bad*, just that it's an
> unnecessary complication for a great deal of programming tasks.  For a
> great deal more, it's absolutely necessary.  That's why I said a "smart"
> language would make it easy to turn on and off.

You actually said that I18N features should be able to be turned on and 
off. Unicode and I18N are unrelated.




-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Cult-like behaviour [was Re: Kindness]

2018-07-15 Thread Steven D'Aprano
On Sun, 15 Jul 2018 11:22:11 -0700, James Lee wrote:

> On 7/15/2018 3:43 AM, Steven D'Aprano wrote:
>>
>> No. The real ten billion dollar question is how people in 2018 can
>> stick their head in the sand and take seriously the position that
>> Latin-1 (let alone ASCII) is enough for text strings.
>>
>>
>>
> Easy - for many people, 90% of the Python code they write is not
> intended for world-wide distribution, let alone use.

But they're not making claims about what works for *them*. If they did, 
I'd say "Okay, that works for you. Sorry you got left behind by 
progress." They're making grand sweeping claims about what works best for 
a language intended to be used by *everyone*.

Marko isn't saying "I know my use-case is atypical, but I inherited a 
code base where the bytes/pseudo-text duality of Python2 strings was 
helpful to me, and Python3's strict division into byte strings and text 
strings is less useful."

Rather, he is making the sweeping generalisation that having a text 
string type *at all* is a mistake, because the Python 2 dual bytes+pseudo 
text approach is superior, *for everyone*.


 
> The smart thing would be for a language to have a switch of some sort to
> turn on/off all I18N features.

The Python language has no builtin I18N features.




-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Cult-like behaviour [was Re: Kindness]

2018-07-15 Thread Steven D'Aprano
On Sun, 15 Jul 2018 14:17:51 +0300, Marko Rauhamaa wrote:

> Steven D'Aprano :
> 
>> On Sun, 15 Jul 2018 11:43:14 +0300, Marko Rauhamaa wrote:
>>> Paul Rubin :
>>>> I don't think Go is the answer either, but it probably got strings
>>>> right.  What is the answer?
>>
>> Go strings aren't text strings. They're byte strings. When you say that
>> Go got them right, that depends on your definition of success.
>>
>> If your definition of "success" is:
>>
>> - fail to be able to support 80% + of the world's languages
>>   and a majority of the world's text;
> 
> Of course byte strings can support at least as many languages as
> Python3's code point strings and at least equally well.

You cannot possibly be serious.

There are 256 possible byte values. China alone has over 10,000 different 
characters. You can't represent 10,000+ characters using only 256 
distinct code points.

You can't even represent the world's languages using 16-bit word-strings 
instead of byte strings.
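
The arithmetic can be checked directly. A sketch, using one common 
Chinese character and one from the CJK Extension B block as arbitrary 
examples:

```python
hanzi = "中"                         # one common Chinese character
print(ord(hanzi))                    # 20013, far beyond the 0-255 byte range
print(len(hanzi.encode("utf-8")))    # 3 bytes for a single character

rare = "\U00020000"                  # a CJK Extension B ideograph
print(ord(rare) > 0xFFFF)            # True: too big for one 16-bit unit
print(len(rare.encode("utf-16-le")) // 2)  # 2 code units (a surrogate pair)
```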

Watching somebody argue that byte strings are "equally as good" as a 
dedicated Unicode string type in 2018 is like seeing people argue in the 
late 1990s that this new-fangled "structured code" will never be better 
than unstructured code with GOTO.


>> - perpetuate the anti-pattern where a single code point
>>   (hex value) can represent multiple characters, depending on what
>>   encoding you have in mind;
> 
> That doesn't follow at all.

Of course it does. You talked about using Latin-1. What's so special 
about Latin-1? Ask your Greek customers how useful that is to them, and 
explain why they can't use ISO-8859-7 instead.
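
A sketch of the anti-pattern in question: one byte value, two different 
characters, depending entirely on which encoding the reader has in mind.

```python
byte = b"\xe1"

print(byte.decode("latin-1"))      # á
print(byte.decode("iso-8859-7"))   # α

# With Unicode text strings the two characters are simply different.
print(ord("á"), ord("α"))          # 225 945
```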


>> - to have a language where legal variable names cannot be
>>   represented as strings; [1]
> 
> That's a rather Go-specific 

We were talking about whether or not Go had done strings right.

> and uninteresting question, 

It's not a question, it's a statement. And it might be uninteresting to 
you, but I find it astonishing.

> but I'm fairly certain you can write a Go parser in Go

So what? You can write a Go parser in Floop if you like.

https://en.wikipedia.org/wiki/BlooP_and_FlooP


> (if that's not how it's done already).
> 
>> - to have a language where text strings are a second-class
>>   data type, not available in the language itself, only in the
>>   libraries;
> 
> Unicode code point strings *ought* to be a second-class data type. They
> were a valiant idea but in the end turned out to be a mistake.

Just because you say they were a mistake, doesn't make it so.


>> - to have a language where text characters are *literally*
>>   32-bit integers ("rune" is an alias to int32);
>>
>>   (you can multiply a linefeed by a grave accent and get pi)
> 
> Again, that has barely anything to do with the topic at hand.

It has *everything* to do with the topic at hand: did Go get strings 
right?
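
The arithmetic behind that quip can be reproduced in Python, but only by 
explicitly converting characters to integers first. This sketch shows the 
Python side only and makes no claim about Go beyond what is quoted above.

```python
linefeed = ord("\n")    # 10
grave = ord("`")        # 96

print(linefeed * grave)        # 960
print(chr(linefeed * grave))   # π (U+03C0)

try:
    "\n" * "`"                 # Python characters are not integers
except TypeError as err:
    print("TypeError:", err)
```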


> I don't
> think there's any unproblematic way to capture a true text character,
> period. Python3 certainly hasn't been able to capture it.

Isaac Asimov's quote here is appropriate:

When people thought the Earth was flat, they were wrong. 
When people thought the Earth was spherical, they were 
wrong. But if you think that thinking the Earth is 
spherical is just as wrong as thinking the Earth is flat,
then your view is wronger than both of them put together.


Unicode does not perfectly capture the human concept of "text 
characters" (and no consistent system ever will, because the human 
concept of a character is not consistent). But if you think that makes 
byte-strings *better* than Unicode text strings at representing text, 
then you are wronger than wrong.

 
>>> That's the ten-billion-dollar question, isn't it?!
>>
>> No. The real ten billion dollar question is how people in 2018 can
>> stick their head in the sand and take seriously the position that
>> Latin-1 (let alone ASCII) is enough for text strings.
> 
> Here's the deal: text strings are irrelevant for most modern programming
> needs. Most software is middleware between the human and the terminal
> device.

Your view is completely, utterly inside out. The terminal is the middle 
layer, between the software and the human, not the software.


> Carrying opaque octet strings from end to end is often the most
> correct and least problematic thing to do.

> On the other hand, Python3's code point strings mess things up for no
> added value. You still can't upcase or downcase strings.

Ah, the ol' "argument by counter-factual assertions". State something 
that isn't true, and claim it is true.

py> "αγω".upper()
'ΑΓΩ'

Looks like uppercasing to me. What does it look like to you? Taking a 
square root?

(I can't believe I need to actually demonstrate this.)
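
A slightly longer sketch of the same demonstration, also showing what 
happens when the same text is held as UTF-8 bytes instead (the German 
word is just an extra example):

```python
text = "αγω"
print(text.upper())                # ΑΓΩ
print("straße".upper())            # STRASSE

# bytes.upper() only knows about ASCII letters, so UTF-8 encoded Greek
# comes back unchanged.
encoded = text.encode("utf-8")
print(encoded.upper() == encoded)  # True
```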




-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: Cult-like behaviour [was Re: Kindness]

2018-07-15 Thread Steven D'Aprano
On Sun, 15 Jul 2018 11:39:40 +0300, Marko Rauhamaa wrote:

> Steven D'Aprano :
> 
>> Of course we have no idea what Marko's software is, or what it is
>> doing,
> 
> Correct, you don't, but the link Paul Rubin posted gives you an idea:
> 
>Python 3 says: everything is Unicode (by default, except in certain
>situations, and except if we send you crazy reencoded data, and even
>then it's sometimes still unicode, albeit wrong unicode).

I have a lot of respect for Armin Ronacher, but I think here he is badly 
wrong and he's just ranting.

It is ludicrous to say "everything" is Unicode when Python provides a 
rich set of bytes APIs. He squeezes in a parenthesised "by default" 
there, but that undermines his rant. That's like saying that "everything 
in Python is an int" rather than a float, because if you don't include a 
decimal point or an exponent in numeric literals, you get ints. Or that 
"files in Python are always read-only" because the default for open() is 
to use read mode rather than write mode.

>Filenames
>are Unicode, Terminals are Unicode, stdin and out are Unicode,

And indeed they are, in Windows, and so they should be, in Unix too. 
Maybe some day POSIX will recognise that the rest of the world exists and 
stop privileging ASCII.


>there
>is so much Unicode! And because UNIX is not Unicode, Python 3 now has
>the stance that it's right and UNIX is wrong

Armin seems to be implying that Unix is (1) the only OS in the world, and 
(2) beyond criticism. Neither of these are correct. Windows users might 
rightly ask why Armin cares what Unix does.

Unix does a lot right, but not everything

http://web.mit.edu/~simsong/www/ugh.pdf

and its "everything is bytes" stance is badly wrong when it comes to user-
visible textual elements like file names and the command prompt. We write 
`less README`, not `6c65737320524541444d45`, and we should stop pretending 
that we're using bytes just because the underlying infrastructure uses 
bytes. We're using text.
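
A sketch of the two views of the same command line, plus the standard 
library helpers that convert between text file names and the bytes the 
OS stores:

```python
import os

command = "less README"
print(command.encode("ascii").hex())   # 6c65737320524541444d45

# os.fsencode / os.fsdecode convert between the text we type and the
# bytes the operating system stores.
raw_name = os.fsencode("README")
print(raw_name)                        # b'README'
print(os.fsdecode(raw_name))           # README
```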



>> That's because URLs are fundamentally text strings.
> 
> <https://tools.ietf.org/html/rfc1738>:

Irrelevant or obsolete or both.


> A URL consists of ASCII-only characters that represent an octet string.

Wrong.

>> Quick quiz: which of the following are real URLs? (a) 
>> http://правительство.рф
> 
> On the face of it, that is not a valid URL.

If you had read the link I gave, or even if you copied and pasted the URL 
into any reasonably modern browser, you might have learned that it is a 
valid URL.
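
One way to see how such a host name travels over ASCII-only protocols is 
Python's built-in "idna" codec. This is a sketch only: that codec 
implements the older IDNA 2003 rules, and browsers may apply newer ones.

```python
host = "правительство.рф"

ascii_form = host.encode("idna")   # ASCII-compatible "xn--" labels
print(ascii_form)

# The conversion is reversible, so the text form loses nothing.
print(ascii_form.decode("idna") == host)
```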



> But try this:
[snip]

Indeed. Is there a reason why these shouldn't be considered serious bugs 
in the http library?




-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson


