So apparently I've been banned from this list
I've been unexpectedly in hospital for the past two weeks, without internet or email.

Just before my unexpected hospital stay, I was apparently banned (without warning) by Ethan Furman in what seems to me to be an act of retaliation for my protest against his overzealous and hostile tone-policing against a newcomer to the list, Reto Brunner:

https://mail.python.org/pipermail/python-list/2018-September/737020.html

(I did make one mistake in that post: I claimed that I hadn't said anything at the time on Ethan's last round of bans. That was incorrect, I actually did make an objection at the time.)

Since I'm still catching up on emails, I have just come across Ethan's notice to me (copied below). Notwithstanding Ethan's comment about having posted the suspension notice on the list, I see no sign that he actually did so.

At the risk of further retaliation from the moderators, I am ignoring the ban in this instance for the purposes of transparency and openness. (I don't know if this will show up on the mailing list or the newsgroup.) Since I believe this ban is illegitimate, I intend to challenge it if possible. In the meantime, I may not reply on-list to any responses.


Subject: Fwd: Temporary Suspension
To:
From: Ethan Furman
Date: Tue, 11 Sep 2018 11:22:40 -0700
In-Reply-To:

Steven, you've probably already seen this on Python List, but I forgot to email it directly to you.  My apologies.

--
~Ethan~
Python List Moderator


Forwarded Message
Subject: Temporary Suspension
Date: Mon, 10 Sep 2018 07:09:04 -0700
From: Ethan Furman
To: Python List Moderators

As a list moderator, my goal for this list is to keep the list a useful resource -- but what does "useful" mean?  To me it means a place that python users can go to ask questions, get answers, offer advice, and all without sarcasm, name-calling, and deliberate mis-understandings.  Conversations should stay mostly on-topic.

Due to hostile and inappropriate posts*, Steven D'Aprano is temporarily suspended from Python List for a period of two months.  This suspension, along with past suspensions, is being taken only after careful consideration and consultation with other Python moderators.

--
~Ethan~
Python List Moderator

* posts in question:
[1] https://mail.python.org/pipermail/python-list/2018-July/735735.html
[2] https://mail.python.org/pipermail/python-list/2018-September/737020.html

--
Steven D'Aprano
--
https://mail.python.org/mailman/listinfo/python-list
Trying to use threading.local()
I originally posted this on the Python-Ideas list, but this is probably more appropriate.

    import time
    from threading import Thread, local

    def func():
        pass

    def attach(value):
        func.__params__ = local()
        func.__params__.value = value

    def worker(i):
        print("called from thread %s" % i)
        attach(i)
        assert func.__params__.value == i
        time.sleep(3)
        value = func.__params__.value
        if value != i:
            print("mismatch", i, value)

    for i in range(5):
        t = Thread(target=worker, args=(i,))
        t.start()

    print()

When I run that, each of the threads prints its "called from ..." message, the assertions all pass, then a couple of seconds later they consistently all raise exceptions:

    Exception in thread Thread-1:
    Traceback (most recent call last):
      File "/usr/local/lib/python3.5/threading.py", line 914, in _bootstrap_inner
        self.run()
      File "/usr/local/lib/python3.5/threading.py", line 862, in run
        self._target(*self._args, **self._kwargs)
      File "", line 5, in worker
    AttributeError: '_thread._local' object has no attribute 'value'

What am I doing wrong?

--
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
--
https://mail.python.org/mailman/listinfo/python-list
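[For what it's worth, the failure comes from attach() rebinding func.__params__ to a brand-new local() on every call, so whichever thread calls attach() last replaces the object the other threads later read from. A rough sketch of one fix is to create the local() exactly once and have attach() only set the per-thread value:]

    import time
    from threading import Thread, local

    def func():
        pass

    # Create the thread-local storage once; every thread shares this one
    # object but sees its own independent .value attribute on it.
    func.__params__ = local()

    def attach(value):
        func.__params__.value = value

    def worker(i):
        attach(i)
        time.sleep(3)
        assert func.__params__.value == i   # now passes in every thread

    threads = [Thread(target=worker, args=(i,)) for i in range(5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()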
Re: Any SML coders able to translate this to Python?
On Fri, 07 Sep 2018 15:10:10 +0100, Paul Moore wrote: > On Fri, 7 Sep 2018 at 14:06, Steven D'Aprano > wrote: [...] >> However I have a follow up question. Why the "let" construct in the >> first place? Is this just a matter of principle, "put everything in its >> own scope as a matter of precautionary code hygiene"? Because I can't >> see any advantage to the inner function: > > My impression is that this is just functional programming "good style". > As you say, it's not needed, it's just "keep things valid in the > smallest range possible". Probably also related to the mathematical > style of naming sub-expressions. Also, it's probably the case that in a > (compiled) functional language like SML, the compiler can optimise this > to avoid any actual inner function, leaving it as nothing more than a > temporary name. I guessed it would be something like that. Thanks Paul, and especially Marko for going above and beyond the call of duty with his multiple translations into functional-style Python, and everyone else who answered. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: don't quite understand mailing list
On Fri, 07 Sep 2018 07:39:33 +0300, Marko Rauhamaa wrote: > I'm with Ethan on this one. > > There was nothing in the original posting that merited ridicule. Then its a good thing there was nothing in the response that was ridicule. (A mild rebuke for a mild social faux pas is not ridicule.) -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Object-oriented philosophy
On Fri, 07 Sep 2018 16:07:06 -0500, Michael F. Stemper wrote:

>>> In another case where I had a "bare exception", I was using it to see
>>> if something was defined and substitute a default value if it wasn't.
>>> Have I cleaned this up properly?
>>>
>>> try
>>>     id = xmlmodel.attrib['name']
>>> except KeyError:
>>>     id = "constant power"
>>>
>>> (Both changes appear to meet my intent, I'm more wondering about how
>>> pythonic they are.)

Yes, catch the direct exception you are expecting. That's perfectly
Pythonic.

>> There's an alternative that's recommended when the key is often absent:
>>
>>     id = xmlmodel.attrib.get('name', "constant power")
>
> Oh, I like that much better than what I showed above, or how I "fixed"
> it cross-thread. Thanks!

However, if the key is nearly always present, and your code is
performance-critical, calling the "get" method has the slight
disadvantage that it will be slightly slower than using the try...except
form you show above.

On the other hand, the get method has the big advantage that it's an
expression that can be used in place, not a four-line compound statement.

If I don't care about squeezing out every last bit of performance from
the interpreter, I use whatever looks good to me on the day. That will
often be the "get" method. But on the rare occasions I do care about
performance, the basic rule of thumb I use is that if the key is likely
to be missing more than about 10% of the time, I use the "LBYL" idiom
(either an explicit test using "if key in dict" or just call the
dict.get method).

But don't stress about the choice. Chances are that any of the three
options you tried (catch KeyError, check with "if" first, or using the
get method) will be good enough.

--
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
--
https://mail.python.org/mailman/listinfo/python-list
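[A small sketch making the three options concrete; the attribute dict and default value below are made up for illustration:]

    attrib = {'name': 'transformer'}        # hypothetical data
    DEFAULT = "constant power"

    # 1. EAFP: try/except, fastest when the key is almost always present.
    try:
        ident = attrib['name']
    except KeyError:
        ident = DEFAULT

    # 2. LBYL: an explicit membership test first.
    ident = attrib['name'] if 'name' in attrib else DEFAULT

    # 3. dict.get: a single expression, usable in place.
    ident = attrib.get('name', DEFAULT)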
Re: Any SML coders able to translate this to Python?
On Thu, 06 Sep 2018 13:48:54 +0300, Marko Rauhamaa wrote:

> Chris Angelico :
>> The request was to translate this into Python, not to slavishly imitate
>> every possible semantic difference even if it won't actually affect
>> behaviour.
>
> I trust Steven to be able to refactor the code into something more
> likable. His only tripping point was the meaning of the "let" construct.

Thanks for the vote of confidence :-)

However I have a follow up question. Why the "let" construct in the
first place? Is this just a matter of principle, "put everything in its
own scope as a matter of precautionary code hygiene"? Because I can't
see any advantage to the inner function:

    def isqrt(n):
        if n == 0:
            return 0
        else:
            def f2398478957(r):
                if n < (2*r+1)**2:
                    return 2*r
                else:
                    return 2*r+1
            return f2398478957(isqrt(n//4))

Sure, it ensures that r is in its own namespace. But why is that an
advantage in a function so small? Perhaps it's a SML functional-
programming thing.

Putting aside the observation that recursion may not be the best way to
do this in Python, I don't think that the inner function is actually
needed. We can just write:

    def isqrt(n):
        if n == 0:
            return 0
        else:
            r = isqrt(n//4)
            if n < (2*r+1)**2:
                return 2*r
            else:
                return 2*r+1

By the way I got this from this paper:

https://www.cs.uni-potsdam.de/ti/kreitz/PDF/03cucs-intsqrt.pdf

--
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
--
https://mail.python.org/mailman/listinfo/python-list
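[Since the post notes that recursion may not be the best approach in Python, here is one common iterative alternative, sketched with integer Newton's method; it is not from the original thread:]

    def isqrt(n):
        """Integer square root: the floor of sqrt(n), by Newton's method."""
        if n < 0:
            raise ValueError("isqrt requires a non-negative integer")
        x = n
        y = (x + 1) // 2
        while y < x:
            x = y
            y = (x + n // x) // 2
        return x

    # Quick sanity check of the invariant isqrt(n)**2 <= n < (isqrt(n)+1)**2
    assert all(isqrt(n)**2 <= n < (isqrt(n) + 1)**2 for n in range(10000))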
Re: Object-oriented philosophy
On Thu, 06 Sep 2018 22:00:26 +0100, MRAB wrote:

> On 2018-09-06 21:24, Michael F. Stemper wrote:
[...]
>>     try:
>>         P_0s = xmlmodel.findall( 'RatedPower' )[0].text
>>         self.P_0 = float( P_0s )
>>     except:
[...]
> A word of advice: don't use a "bare" except, i.e. one that doesn't
> specify what exception(s) it should catch.

Excellent advice!

More here:

https://realpython.com/the-most-diabolical-python-antipattern/

--
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
--
https://mail.python.org/mailman/listinfo/python-list
Re: don't quite understand mailing list
On Thu, 06 Sep 2018 13:06:22 -0700, Ethan Furman wrote: > On 09/06/2018 12:42 PM, Reto Brunner wrote: > >> What do you think the link, which is attached to every email you >> receive from the list, is for? Listinfo sounds very promising, doesn't >> it? >> >> And if you actually go to it you'll find: "To unsubscribe from >> Python-list, get a password reminder, or change your subscription >> options enter your subscription email address" >> >> So how about you try that? > > Reto, your response is inappropriate. If you can't be kind and/or > respectful, let someone else respond. No it wasn't inappropriate, and your attack on Reto is uncalled for. Reto's answer was kind and infinitely more respectful than your unnecessary criticism. As far as I can tell, this is Reto's first post here. After your hostile and unwelcoming response, I wouldn't be surprised if it was his last. His answer was both helpful and an *amazingly* restrained and kind response to a stupid question[1] asked by somebody claiming to be an professional software engineer. It was not condescending or mean- spirited, as you said in another post, nor was it snarky. But even had the OP been a total beginner to computing, it was still a helpful response containing the information needed to solve their immediate problem (how to unsubscribe from the list) with just the *tiniest* (and appropriate) hint of reproach to encourage them to learn how to solve their own problems for themselves so that in future, they will be a better contributor to whatever discussion forums they might find themselves on. Ethan, you are a great contributor on many of the Python mailing lists, but your tone-policing is inappropriate, and your CoC banning of Rick and Bart back in July was an excessive and uncalled for misuse of moderator power. To my shame, I didn't say anything at the time, but I won't be intimidated any longer by fear of the CoC and accusations of incivility. I'm speaking up now because your reply to Reto is unwelcoming, unhelpful and disrespectful, and coming from a moderator who has been known to ban people, that makes it even more hostile. [1] Yes, there are such things as stupid questions. If your doctor asked you "remind me again, which end of the needle goes into your arm?" what would you do? -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Why emumerated list is empty on 2nd round of print?
On Thu, 06 Sep 2018 11:50:17 -0700, Viet Nguyen via Python-list wrote:

> If I do this "aList = enumerate(numList)", isn't it
> stored permanently in aList now?

Yes, but the question is "what is *it* that is stored?"

The answer is, it isn't a list, despite the name you choose. It is an
enumerate iterator object, and iterator objects can only be iterated
over once.

If you really, truly need a list, call the list constructor:

    aList = list(enumerate(numList))

but that's generally a strange thing to do. It is more common to just
call enumerate when you need it, not to hold on to the reference for
later.

--
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
--
https://mail.python.org/mailman/listinfo/python-list
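[A quick sketch of the difference, not from the original post:]

    numList = [10, 20, 30]

    aList = enumerate(numList)        # an enumerate iterator, not a list
    print(list(aList))                # [(0, 10), (1, 20), (2, 30)]
    print(list(aList))                # []  -- the iterator is already exhausted

    aList = list(enumerate(numList))  # snapshot the pairs into a real list
    print(aList)                      # usable as many times as you like
    print(aList)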
Any SML coders able to translate this to Python?
I have this snippet of SML code which I'm trying to translate to Python:

    fun isqrt n = if n=0 then 0
                  else let val r = isqrt (n/4)
                       in
                         if n < (2*r+1)^2 then 2*r
                         else 2*r+1
                       end

I've tried reading up on SML and can't make heads or tails of the
"let...in...end" construct. The best I've come up with is this:

    def isqrt(n):
        if n == 0:
            return 0
        else:
            r = isqrt(n/4)
            if n < (2*r+1)**2:
                return 2*r
            else:
                return 2*r+1

but I don't understand the let ... in part so I'm not sure if I'm doing
it right.

--
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
--
https://mail.python.org/mailman/listinfo/python-list
Re: Question about floating point
On Sat, 01 Sep 2018 13:27:59 +0200, Frank Millman wrote:

>>>> from decimal import Decimal as D
>>>> f"{D('1.1')+D('2.2'):.60f}"
> '3.3000'
>>>> '{:.60f}'.format(D('1.1') + D('2.2'))
> '3.3000'
>>>> '%.60f' % (D('1.1') + D('2.2'))
> '3.2998223643160599749535322189331054687500'
>>>>
>>>>
> The first two format methods behave as expected. The old-style '%'
> operator does not.

The % operator casts the argument to a (binary) float. The other two
don't need to, because they call Decimal's own format method.

--
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
--
https://mail.python.org/mailman/listinfo/python-list
Re: Question about floating point
On Fri, 31 Aug 2018 18:45:16 +1200, Gregory Ewing wrote: > Steven D'Aprano wrote: >> The right way is to >> set the rounding mode at the start of your application, and then let >> the Decimal type round each calculation that needs rounding. > > It's not clear what you mean by "rounding mode" here. If you mean > whether it's up/down/even/whatever, then yes, you can probably set that > as a default and leave it. I mean the rounding mode :-) https://docs.python.org/3/library/decimal.html#rounding-modes > However, as far as I can see, Decimal doesn't provide a way of setting a > default number of decimal places to which results are rounded. You can > set a default *precision*, but that's not the same thing. Indeed it is not. That's a very good point, and I had completely forgotten about it! Thank you. The quantize method is intended for the use-case we are discussing, to round values to a fixed number of decimal places. The Decimal FAQs mention that: https://docs.python.org/3/library/decimal.html#decimal-faq I think this is a good use-case for subclassing Decimal as a Money class. [...] > I don't think this is a bad thing, because often you don't want to use > the same number of places for everything, For example, people dealing > with high-volume low-value goods often calculate with unit prices having > more than 2 decimal places. In those kinds of situations, you need to > know exactly what you're doing every step of the way. As opposed to anyone else calculating with money? -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
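[A rough sketch of the quantize approach mentioned above; the function and constant names are illustrative only:]

    from decimal import Decimal, ROUND_HALF_EVEN

    CENT = Decimal('0.01')

    def to_money(amount):
        """Round a Decimal to two places using Banker's Rounding."""
        return amount.quantize(CENT, rounding=ROUND_HALF_EVEN)

    print(to_money(Decimal('1.005')))   # 1.00  (ties round to the even digit)
    print(to_money(Decimal('1.015')))   # 1.02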
Re: __init__ patterns
On Thu, 30 Aug 2018 06:01:26 -0700, Tim wrote:

> I saw a thread on reddit/python where just about everyone said they
> never put code in their __init__ files.

Pfft. Reddit users. They're just as bad as Stackoverflow users.

*wink*

> Here's a stackoverflow thread saying the same thing.
> https://stackoverflow.com/questions/1944569/how-do-i-write-good-correct-package-init-py-files
>
> That's new to me. I like to put functions in there that other modules
> within the module need. Thought that was good practice DRY and so forth.

It's fine to put code in __init__.py files.

If the expected interface is for the user to say:

    result = package.spam()

then in the absence of some specific reason why spam needs to be in a
submodule, why shouldn't it go into package/__init__.py ?

Of course it's okay for the definition of spam to be in a submodule, if
necessary. But it shouldn't be mandatory.

> And I never do 'from whatever import *' Ever.
>
> The reddit people said they put all their stuff into different modules
> and leave init empty.

Did any one of them state *why* they do this? What benefit is there to
make this a hard rule?

Did anyone mention what the standard library does? Check out the dbm,
logging, html, http, collections, importlib, and curses packages (and
probably others):

https://github.com/python/cpython/tree/3.7/Lib

--
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
--
https://mail.python.org/mailman/listinfo/python-list
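[As a small illustration, a package whose public entry point lives directly in __init__.py might look like this; the file and function names are hypothetical:]

    # package/__init__.py
    __all__ = ['spam']

    def spam():
        """Public entry point, importable and callable as package.spam()."""
        return "spam and eggs"

    # Alternatively, keep the implementation in package/_impl.py and
    # re-export it here, so users still just call package.spam():
    # from ._impl import spam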
Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to
On Thu, 30 Aug 2018 05:21:30 -0700, pjmclenon wrote:

> my question is ... at the moment i can only run it on windows cmd prompt
> with a multiple line entry as so::
>
> python createIndex_tfidf.py stopWords.dat testCollection.dat
> testIndex.dat titleIndex.dat
>
> and then to query and use the newly created index as so:
>
> python queryIndex_tfidf.py stopWords.dat testIndex.dat titleIndex.dat
>
> how can i run just one file at a time?

I don't understand the question. You are running one file at a time.
First you run createIndex_tfidf.py, then you run queryIndex_tfidf.py

Maybe you mean to ask how to combine them both to one call of Python?

(1) Re-write the createIndex_tfidf.py and queryIndex_tfidf.py files to
be in a single file.

(2) Or, create a third file which runs them both one after another. That
third file doesn't even need to be a Python script. It could be a shell
script, it would look something like this:

    python createIndex_tfidf.py stopWords.dat testCollection.dat testIndex.dat titleIndex.dat
    python queryIndex_tfidf.py stopWords.dat testIndex.dat titleIndex.dat

and you would then call it from whatever command line shell you use.

> ..or actually link to a front end
> GUI ,so when an question or word or words is input to the input box..it
> can go to the actiona dnrun the above mentioned lines of code

You can't "link to a front end GUI", you have to write a GUI application
which calls your scripts. There are many choices: tkinter is provided in
the Python standard library, but some people prefer wxPython, PyQT4, or
other GUI toolkits.

https://duckduckgo.com/?q=python+gui+toolkits

--
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
--
https://mail.python.org/mailman/listinfo/python-list
Re: Question about floating point
On Thu, 30 Aug 2018 19:22:29 +1200, Gregory Ewing wrote: > Steven D'Aprano wrote: >> Why in the name of all that's holy would anyone want to manually round >> each and every intermediate calculation when they could use the Decimal >> module and have it do it automatically? > > I agree that Decimal is the safest and probably easiest way to go, but > saying that it "does the rounding automatically" is a bit misleading. > > If you're adding up dollars and cents in Decimal, no rounding is needed > in the first place, because it represents whole numbers of cents exactly > and adds them exactly. "Round to exact" is still rounding :-P I did already say that addition and subtraction was exact in Decimal. (I also mentioned multiplication, but that's wrong.) > If you're doing something that doesn't result in a whole number of cents > (e.g. calculating a unit price from a total price and a quantity) you'll > need to think about how you want it rounded, and should probably include > an explicit rounding step, if only for the benefit of someone else > reading the code. If you're not using Banker's Rounding for financial calculations, you're probably up to no good *wink* Of course with Decimal you always have to option to round certain calculations by hand, if you have some specific need to. But in general, that's just annoying and error-prone book-keeping. The right way is to set the rounding mode at the start of your application, and then let the Decimal type round each calculation that needs rounding. The whole point of Decimal, the reason it was invented, was to do this sort of thing. We have here a brilliant hammer specially designed for banging in just this sort of nail, and you're saying "Well, sure, but you probably want to bang it in with your elbow, if only for the benefit of onlookers..." :-) -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
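[A concrete sketch of "set the rounding mode once and let Decimal do the rest", not from the original post:]

    from decimal import Decimal, getcontext, ROUND_HALF_EVEN

    # Done once, at application start-up.
    ctx = getcontext()
    ctx.rounding = ROUND_HALF_EVEN   # Banker's Rounding
    ctx.prec = 28                    # the default precision, shown for clarity

    # Arithmetic that needs rounding now uses that mode automatically.
    print(Decimal('7.05') / 2)       # 3.525 -- exact, no rounding needed
    print(Decimal(1) / Decimal(3))   # rounded to 28 significant digits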
Re: Question about floating point
On Wed, 29 Aug 2018 11:31:29 +1200, Gregory Ewing wrote: > Frank Millman wrote: >> I have been trying to explain why >> they should use the decimal module. They have had a counter-argument >> from someone else who says they should just use the rounding technique >> in my third example above. > > It's possible to get away with this by judicious use of rounding. > There's a business software package I work with that does this -- every > number is rounded to a defined number of decimal places before being > stored in its database, so the small inaccuracies resulting from inexact > representations don't get a chance to accumulate. This software package doesn't actually use the *10/10 trick, does it? As an answer to the question, "Should I use this clever *10/10 trick?" I'm not sure it's relevant to say "Yep, sure, this package does something completely different and it works fine!" *wink* > If you're going to do this, I would NOT recommend using the rounding > technique in your example -- it seems to itself be relying on accidents > of the arithmetic. Use the round() function: Or better still, DON'T manually use the round function, let the interpreter do the rounding for you by using Decimal. That's what its for. Why in the name of all that's holy would anyone want to manually round each and every intermediate calculation when they could use the Decimal module and have it do it automatically? -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Question about floating point
On Tue, 28 Aug 2018 16:47:25 +0200, Frank Millman wrote:

> The reason for my query is this. I am assisting someone with an
> application involving monetary values. I have been trying to explain why
> they should use the decimal module. They have had a counter-argument
> from someone else who says they should just use the rounding technique
> in my third example above.

*head-desk*

And this is why we can't have nice things.

Presumably this money application doesn't just work on hard-coded
literal values, right? So this "programmer" your friend is listening to
prefers to write this:

    money = (a + b)*10/10

instead of:

    money = a + b

presumably because programming isn't hard enough without superstitious
ritual that doesn't actually solve the problem.

In the second case, you have (potentially) *one* rounding error, due to
the addition. In the first case, you get the *exact same rounding error*
when you do (a+b). Then you get a second rounding error by multiplying
by ten, and a third rounding error when you divide by ten.

Now it's true that sometimes those rounding errors will cancel. You
found an example:

    py> (1.1 + 2.2)*10/10 == 3.3
    True

but it took me four attempts to find a counter-example, where the errors
don't cancel:

    py> (49675.23 + 10492.95)*10/10 == 60168.18
    False

To prove it isn't a fluke:

    py> (731984.84 + 173.32)*10/10 == 732158.16
    False
    py> (170734.84 - 173.39)*10/10 == 170561.45
    False

Given that it has three possible rounding errors instead of one, it is
even possible that this "clever trick" could end up being *worse* than
just doing a single addition. But my care factor isn't high enough to
track down an example (if one exists).

For nearly all applications involving money, one correct solution is to
use integer numbers of cents (or whatever the smallest currency unit you
ever care about happens to be). Then all additions, subtractions and
multiplications will be exact, without fail, and you only need to worry
about rounding divisions. You can minimize (but not eliminate) that by
calculating in tenths of a cent, which effectively gives you a guard
digit.

Or, just use decimal, which is *designed* for monetary applications
(among other things). You decide on how many decimal places to keep
(say, two, or three if you want a guard digit), a rounding mode
(Banker's Rounding is recommended for financial applications), and just
do your calculations with no "clever tricks". Add two numbers, then add
tax:

    money = (a+b)*(1+t/100)

compared to the "clever trick":

    money = (a+b)*10/10 * (1 + t)*10/10

Which would you rather do?

--
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
--
https://mail.python.org/mailman/listinfo/python-list
Re: Generating a specific list of intsgers
On Fri, 24 Aug 2018 14:40:00 -0700, tomusatov wrote: > I am looking for a program able to output a set of integers meeting the > following requirement: > > a(n) is the minimum k > 0 such that n*2^k - 3 is prime, or 0 if no such > k exists > > Could anyone get me started? (I am an amateur) That's more a maths question than a programming question. Find out how to tackle it mathematically, and then we can code it. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
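[For a quick experiment, and not a substitute for the mathematical analysis suggested above, a brute-force sketch with an arbitrary search cut-off:]

    def is_prime(m):
        # Simple trial division; fine for small numbers only.
        if m < 2:
            return False
        if m % 2 == 0:
            return m == 2
        d = 3
        while d * d <= m:
            if m % d == 0:
                return False
            d += 2
        return True

    def a(n, limit=1000):
        # Minimum k > 0 with n*2**k - 3 prime, or 0 if none found up to `limit`.
        # Note: a genuine "no such k exists" cannot be proved by searching alone.
        for k in range(1, limit + 1):
            if is_prime(n * 2**k - 3):
                return k
        return 0

    print([a(n) for n in range(1, 11)])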
Re: Pylint false positives
On Wed, 22 Aug 2018 03:58:29 +1000, Chris Angelico wrote:

> On Wed, Aug 22, 2018 at 2:38 AM, Marko Rauhamaa wrote:
>> Gregory Ewing :
>>
>>> Marko Rauhamaa wrote:
>>>> Lexically, there is special access:
>>>>
>>>>    class C:
>>>>        def __init__(self, some, arg):
>>>>            c = self
>>>>            class D:
>>>>                def method(self):
>>>>                    access(c)
>>>>                    access(some)
>>>>                    access(arg)
>>>
>>> [...]
>>>
>>> you can do that without creating a new class every time you want an
>>> instance. You just have to be *slightly* more explicit about the link
>>> between the inner and outer instances.
>>
>> By "*slightly* more explicit," do you mean more syntactic clutter?
>>
>
> No, he actually means "explicit" in the normal English sense. You're
> trying to use it in the python-ideas sense of "code that I like", and
> since you don't like it, you want to call it "implicit" instead, but it
> obviously isn't that, so you call it "syntactic clutter".

That's an incredible insight into Marko's internal mental state you have
there. And you get that all from the words "syntactic clutter"?

I thought he just meant that it was cluttered code. How naive was that?

*wink*

> But this is actually a case of explicit vs implicit.

To be honest, I don't even understand Greg's comment. With no inner
class, what is this "inner instance" he refers to here?

"you can do that without creating a new class every time you want an
instance. You just have to be *slightly* more explicit about the link
between the inner and outer instances."

Marko wants to use closures. So how do you close over per-instance
variables if you create the closures before the instances are created?
If we only needed *one* function, there would be no problem:

    class Outer:
        def __init__(self, some, arg):
            c = self
            def closure():
                access(c)
                access(some)
                access(arg)
            # then do something useful with closure

But as soon as you have a lot of them, it's natural to want to wrap them
up in a namespace, and the only solution we have for that is to use a
class.

It's a truism that anything you can do with a closure, you can do with a
class (or vice versa) so I dare say there are alternative designs which
avoid closures altogether, but we don't know the full requirements here
and it's hard to judge from the outside why Marko picked the design he
has and whether it's a good idea.

It could be a case of "ooh, closures are a shiny new hammer, this
problem must be a nail!" but let's give him the benefit of the doubt and
assume he has good reasons, not just reasons.

--
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
--
https://mail.python.org/mailman/listinfo/python-list
Re: Partitioning a list
On Tue, 21 Aug 2018 14:36:30 -0700, Poul Riis wrote: > I would like to list all possible ways to put N students in groups of k > students (suppose that k divides N) with the restriction that no two > students should ever meet each other in more than one group. I think > this is a classical problem If its a classical problem, there should be many solutions written for other languages. Take one of them and port it to Python. (We can help with the Python part if needed.) I've never come across it before. I think the restriction makes it a HARD problem to solve efficiently, but I've spent literally less than two minutes thinking about it so I could be wrong. With no additional restriction it sounds like a classical permutations or combinations problem. Check out the combinatoric iterators functions in the itertools module: https://docs.python.org/3/library/itertools.html -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
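[As a rough, greedy sketch of the brute-force approach, not from the original thread: it tracks which pairs of students have already met and stops as soon as no valid group can be formed, so it is not guaranteed to find the maximum number of rounds.]

    from itertools import combinations

    def greedy_rounds(n, k):
        """Build rounds of n//k groups of k students such that no pair of
        students ever meets twice.  Greedy, so it may stop early."""
        met = set()                       # pairs that have already met
        rounds = []
        while True:
            unplaced = set(range(n))
            groups = []
            while unplaced:
                for group in combinations(sorted(unplaced), k):
                    pairs = {frozenset(p) for p in combinations(group, 2)}
                    if not pairs & met:   # nobody in this group has met before
                        groups.append(group)
                        unplaced -= set(group)
                        met |= pairs
                        break
                else:
                    return rounds         # no valid group left; give up
            rounds.append(groups)

    for groups in greedy_rounds(9, 3):
        print(groups)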
Re: Pylint false positives
On Tue, 21 Aug 2018 00:36:56 +, Dan Sommers wrote: [...] >>>> (Not that I do this using "inner classes", but I do often want to use >>>> a class as a container for functions, without caring about "self" or >>>> wrapping everything in staticmethod.) >>> >>> Isn't that what modules are for? (I suspect that I'm missing >>> something, because I also suspect that you knew/know that.) >> >> What's the syntax for creating an inner module...? > > Why does it have to be an inner anything? An ordinary, top-level, > "outer" module is a perfectly good "container for functions, without > caring about "self."" And what if you want to subdivide those functions (or other objects) into categories that are finer than the module, without introducing a package structure? We can design the structure of our program into *outward* hierarchies, by adding packages with subpackages and sub-subpackages: import spam.eggs.cheese.tomato.aardvark So using the file system and packages, we can logically nest modules inside modules inside modules 'til the cows come home. But that's a fairly heavyweight solution, in the sense that it requires separate directory for each level of the hierarchy. Sometimes a package is too much. I want a single module file, but still want to pull out a collection of related functions and other objects and put them in their own namespace, but without creating a new module. The Zen says: Namespaces are one honking great idea -- let's do more of those! but Python's namespaces are relatively impoverished. We have packages, modules, classes and instances, and that's it. Classes and instances come with inheritance, self etc which is great if you want a class, but if you just want a simple module-like namespace without the extra file, classes are a pretty poor alternative. But they're all we've got. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
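[For what it's worth, types.SimpleNamespace (itself just a thin class) can serve as a lightweight container in this situation; a sketch with made-up names, and note it still doesn't give the functions their own lexical scope the way a module would:]

    from types import SimpleNamespace

    def _make_text_helpers():
        def shout(text):
            return text.upper() + "!"
        def whisper(text):
            return text.lower() + "..."
        return SimpleNamespace(shout=shout, whisper=whisper)

    text = _make_text_helpers()
    print(text.shout("hello"))     # HELLO!
    print(text.whisper("HELLO"))   # hello...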
Re: Pylint false positives
On Mon, 20 Aug 2018 22:55:26 +0300, Marko Rauhamaa wrote:

> Dan Sommers :
>
>> On Mon, 20 Aug 2018 14:39:38 +0000, Steven D'Aprano wrote:
>>> I have often wished Python had proper namespaces, so I didn't have to
>>> abuse classes as containers in this way :-(
>>>
>>> (Not that I do this using "inner classes", but I do often want to use
>>> a class as a container for functions, without caring about "self" or
>>> wrapping everything in staticmethod.)
>>
>> Isn't that what modules are for? (I suspect that I'm missing
>> something, because I also suspect that you knew/know that.)
>
> What's the syntax for creating an inner module...?

    from types import ModuleType

    m = ModuleType('m')
    m.one = 1
    m.a = 'a'
    m.b = lambda x: x + one

except that not only doesn't it look nice, but it doesn't work because
the m.b function doesn't pick up the m.one variable, but a global
variable instead.

--
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
--
https://mail.python.org/mailman/listinfo/python-list
Re: Idle
On Mon, 20 Aug 2018 14:46:32 +0400, NAB NAJEEB wrote: > Hi am a beginner can u tell me where can I write my codes I already > tried pycharm and atom.. both are not worked successfully always shows > error...pls guide me... What errors do they show? -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Pylint false positives
On Mon, 20 Aug 2018 11:40:16 +0300, Marko Rauhamaa wrote:

>    class C:
>        def __init__(self, some, arg):
>            c = self
>            class D:
>                def method(self):
>                    access(c)
>                    access(some)
>                    access(arg)
>
> IOW, inner class D is a container for a group of interlinked closure
> functions.

If a class' methods don't use self, it probably shouldn't be a class.

I have often wished Python had proper namespaces, so I didn't have to
abuse classes as containers in this way :-(

(Not that I do this using "inner classes", but I do often want to use a
class as a container for functions, without caring about "self" or
wrapping everything in staticmethod.)

--
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
--
https://mail.python.org/mailman/listinfo/python-list
Re: Pylint false positives
On Mon, 20 Aug 2018 15:58:57 +0300, Marko Rauhamaa wrote: [...] >> The point is that creating a class object every time you want a closure >> is pointlessly wasteful. There is *no benefit whatsoever* in doing >> that. If you think there is, then it's probably because you're trying >> to write Java programs in Python. > > The benefit, as in using closures in general, is in the idiom. Yes, but you get closures by nesting functions, not by nesting classes. We don't *inherently* have to duplicate Java style nested classes in order to get the container of closures you mentioned earlier. We don't have to duplicate the idiom exactly, if it doesn't match the execution model of Python. On the other hand, there's no need to optimize this if it isn't critical code in your application. As I have often said, the question isn't whether Python is fast or not, but whether it is *fast enough*. If you are aware of the potential pitfalls, and the code is fast enough, and refactoring it to something faster and less pitfall-y is too difficult (or not a priority), then that's fine too. >>> But now I'm thinking the original Java approach (anonymous inner >>> classes) is probably the most versatile of them all. A single function >>> rarely captures behavior. That's the job of an object with its >>> multiple methods. In in order to create an ad-hoc object in Python, >>> you will need an ad-hoc class. >> >> An important difference between Python and Java here is that in Python >> the class statement is an *executable* statement, whereas in Java it's >> just a declaration. So putting a class statement inside a Python >> function incurs a large runtime overhead that you don't get with a Java >> inner class. > > The same is true for inner def statements. Indeed. Inner def statements are not very useful (in my opinion, although I believe Tim Peters disagrees) unless they are used as closures. The biggest problem with the idea of using inner functions in the Pascal sense is that you can't test them since they aren't visible from the outside. > I don't see how creating a class would be fundamentally slower to > execute than, say, adding two integers. Well, fundamentally adding two integers could be as quick as a single machine instruction to add two fixed-width ints. CPUs are pretty much optimized to do that *really quickly*. Creating a new class requires allocating a chunk of memory (about 500 bytes in Python), calling the appropriate metaclass, setting a bunch of fields, possibly even executing arbitrary metaclass methods. Its not as expensive as (say) listing the first trillion digits of pi but it surely is going to be more costly than adding two ints. [...] > Anyway, in practice on my laptop it takes 7 µs to execute a class > statement, which is clearly worse than executing a def statement (0.1 > µs) or integer addition (0.05 µs). However, 7 microseconds is the least > of my programming concerns. And fair enough... premature optimization and all that. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Writing bytes to stdout reverses the bytes
On Mon, 20 Aug 2018 00:31:35 +, Steven D'Aprano wrote: > When I write bytes to stdout, why are they reversed? Answer: they aren't, use hexdump -C. Thanks to all replies! -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
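[The bytes were never reordered; plain hexdump (without -C) displays the data as 16-bit little-endian words, so the byte pair fd 84 is shown as the word 84fd. A small sketch of the same grouping in Python, for illustration:]

    data = b'\xfd\x84\x04\x08\n'

    # Pad to an even length, then view the bytes as 16-bit little-endian
    # words, which is how plain `hexdump` displays them.
    padded = data + b'\x00' * (len(data) % 2)
    words = ['%04x' % int.from_bytes(padded[i:i+2], 'little')
             for i in range(0, len(padded), 2)]
    print(' '.join(words))    # 84fd 0804 000a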
Re: How to multiply dictionary values with other values based on the dictionary's key?
On Sun, 19 Aug 2018 05:29:46 -0700, giannis.dafnomilis wrote:

> With your help I have arrived at this point: I have the dictionary
> varsdict (size 5) as below
>
> Key                 Type    Size    Value
> FEq_(0,_0,_0,_0)    float   1       1.0
> FEq_(0,_0,_1,_1)    float   1       1.0
> FEq_(0,_0,_2,_2)    float   1       1.0
> FEq_(0,_0,_3,_0)    float   1       1.0
> FEq_(0,_0,_4,_1)    float   1       1.0

That's not a Python dict. It looks like some sort of table structure.
How do you get this? (What menu command do you run, what buttons do you
click, etc?)

I'm guessing you are using an IDE ("Integrated Development Environment")
like Anaconda or similar. Is that right?

Python dicts print something like this:

    {'FEq_(0,_0,_4,_1)': , 'FEq_(0,_0,_3,_0)': }

If you run

    print(varsdict)

what does it show?

(I have limited time to respond at the moment, so apologies for the
brief answers. Hopefully someone else will step in with some help too.)

--
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
--
https://mail.python.org/mailman/listinfo/python-list
Writing bytes to stdout reverses the bytes
When I write bytes to stdout, why are they reversed?

    [steve@ando ~]$ python2.7 -c "print('\xfd\x84\x04\x08')" | hexdump
    0000000 84fd 0804 000a
    0000005
    [steve@ando ~]$ python3.5 -c "import sys; sys.stdout.buffer.write(b'\xfd\x84\x04\x08\n')" | hexdump
    0000000 84fd 0804 000a
    0000005

--
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
--
https://mail.python.org/mailman/listinfo/python-list
Re: How to multiply dictionary values with other values based on the dictionary's key?
On Sun, 19 Aug 2018 03:35:24 -0700, giannis.dafnomilis wrote:

> On Sunday, August 19, 2018 at 3:53:39 AM UTC+2, Steven D'Aprano wrote:
[...]
>> If you know absolutely for sure that the key format is ALWAYS going to
>> be 'FEq_()' then you can extract the fields using slicing, like
>> this:
>>
>> key = 'FEq_(0,_0,_2,_2)'
>> fields = key[5, -1]  # cut from char 5 to 1 back from the end
[...]
>> - delete any underscores
>> - split it on commas
>> - convert each field to int
>> - convert the list of fields to a tuple
>>
>> fields = fields.replace('_', '')
>> fields = string.split(',)
>> fields = tuple([int(x) for x in fields])
>>
>> and then you can use that tuple as the key for A.
>
> When I try to this, I get the message 'fields = key[5, -1]. TypeError:
> string indices must be integers'.

Ouch! That was my fault, sorry, it was a typo. You need a colon, not a
comma. Sorry about that!

Try this instead:

    key = 'FEq_(0,_0,_2,_2)'
    fields = key[5:-1]
    fields = fields.replace('_', '')
    fields = fields.split(',')
    fields = tuple([int(x) for x in fields])
    print(fields)

which this time I have tested.

(More comments later, time permitting.)

--
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
--
https://mail.python.org/mailman/listinfo/python-list
Re: How to multiply dictionary values with other values based on the dictionary's key?
On Sun, 19 Aug 2018 03:15:32 -0700, giannis.dafnomilis wrote:

> Thank you MRAB!
>
> Now I can get the corresponding dictionary value A[i,j,k,l] for each key
> in the varsdict dictionary.
>
> However how would I go about multiplying the value of each
> FEq_(i,_j,_k,_l) key with the A[i,j,k,l] one? Do you have any insight in
> that?

Do you want to modify the varsdict values in place?

    varsdict['FEq_(i,_j,_k,_l)'] *= A[i,j,k,l]

which is a short-cut for this slightly longer version:

    temp = varsdict['FEq_(i,_j,_k,_l)'] * A[i,j,k,l]
    varsdict['FEq_(i,_j,_k,_l)'] = temp

If you want to leave the original in place and do something else with
the result:

    result = varsdict['FEq_(i,_j,_k,_l)'] * A[i,j,k,l]
    print(result)

--
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
--
https://mail.python.org/mailman/listinfo/python-list
Re: Pylint false positives
On Sun, 19 Aug 2018 11:43:44 +0300, Marko Rauhamaa wrote: > Steven D'Aprano : > >> On Sun, 19 Aug 2018 00:11:30 +0300, Marko Rauhamaa wrote: >> >>> In Python programming, I mostly run into closures through inner >>> classes (as in Java). >> >> Inner classes aren't closures. > > At least some of the methods of inner classes are closures (or there > would be no point to an inner class). (1) Ironically, the only times I've used an inner class, its methods were not closures. So yes, there are sometimes uses for inner classes that don't include closures. There's an example in the argparse module in the standard library, and it too has no closures. (2) Whether or not the methods of an inner class are closures depends on the methods, not the fact that it is an inner class. There are no closures here: class Outer: class Inner: ... no matter what methods Inner has. Nor is this a closure: class Outer: def method(self): class Inner: def spam(self): return self.eggs return Inner since the spam method doesn't close over any of the variables in method. You made a vague comment about inner classes being equivalent to closures in some unknown fashion, but inner classes are not themselves closures, and the methods of inner classes are not necessarily closures. >> Its also quite expensive to be populating your application with lots of >> classes used only once each, which is a common pitfall when using inner >> classes. Memory is cheap, but it's not so cheap that we ought to just >> profligately waste it needlessly. > > That is a completely separate question. It wasn't a question, it was an observation. > There's is no a-priori reason for inner classes to be wasteful; Not in languages where classes are declared statically and built at compile-time, no. But in a language like Python where classes are executable statements that are built at run time, like constructing any other mutable object, it is very easy to use them badly and waste memory. This doesn't look harmful: def func(x): class Record: def __init__(self, a): self.a = a return Record(x) but it is. You might not like that design, but it is part of Python's execution model and whether you like it or not you have to deal with the consequences :-) > they > have been part and parcel of Java programming from its early days, and > Java is widely used for high-performance applications. https://dirtsimple.org/2004/12/python-is-not-java.html > CPython does use memory quite liberally. I don't mind that as > expressivity beats performance in 99% of programming tasks. Fair enough, but in the example I showed above, the practical effect is to increase the de facto size of the objects returned by func() twenty times. And fragment memory as well. In a long-lived application where you are calling func() a lot, and saving the objects, it all adds up. >>> populating an object with fields (methods) in a loop is very rarely a >>> good idea. >> >> Of course it is *rarely* a good idea > > So no dispute then. Isn't there? Then why are you disagreeing with me about the exceptional cases where it *is* a good idea? -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: How to multiply dictionary values with other values based on the dictionary's key?
On Sat, 18 Aug 2018 16:16:54 -0700, giannis.dafnomilis wrote:

> I have the results of an optimization run in the form found in the
> following pic: https://i.stack.imgur.com/pIA7i.jpg.

Unless you edit your code with Photoshop, why do you think a JPEG is a
good idea? That discriminates against the blind and visually impaired,
who can use screen-readers with text but can't easily read text inside
images, and those who have access to email but not imgur.

For the record, here's the table:

Key                 Type    Size    Value
FEq_(0,_0,_0,_0)    float   1       1.0
FEq_(0,_0,_1,_1)    float   1       1.0
FEq_(0,_0,_2,_2)    float   1       1.0
FEq_(0,_0,_3,_0)    float   1       1.0
FEq_(0,_0,_4,_1)    float   1       1.0

It took me about 30 seconds to copy out by hand from the image. But what
it means is a complete mystery. Optimization of what? What you show
isn't either Python code or a Python object (like a dict or list) so it
isn't any value to us.

> How can I multiply the dictionary values of the keys FEq_(i,_j,_k,_l)
> with preexisting values of the form A[i,j,k,l]?
>
> For example I want the value of key 'FEq_(0,_0,_2,_2)' multiplied with
> A[0,0,2,2], the value of key 'FEq_(0,_0,_4,_1)' multiplied with
> A[0,0,4,1] etc. for all the keys present in my specific dictionary.

Sounds like you have to parse the key for the number fields:

- extract out the part between the parentheses '0,_0,_2,_2'

If you know absolutely for sure that the key format is ALWAYS going to
be 'FEq_()' then you can extract the fields using slicing, like this:

    key = 'FEq_(0,_0,_2,_2)'
    fields = key[5, -1]  # cut from char 5 to 1 back from the end

If you're concerned about that "char 5" part, it isn't an error. Python
starts counting from 0, not 1, so char 1 is "E" not "F".

- delete any underscores
- split it on commas
- convert each field to int
- convert the list of fields to a tuple

    fields = fields.replace('_', '')
    fields = string.split(',)
    fields = tuple([int(x) for x in fields])

and then you can use that tuple as the key for A.

It might be easier and/or faster to convert A to use string keys
"FEq_(0,_0,_2,_2)" instead. Or, depending on the size of A, simply make
a copy:

    B = {}
    for (key, value) in A.items():
        B['FEq(%d,_%d,_%d,_%d)' % key] = value

and then do your look ups in B rather than A.

> I have been trying to correspondingly multiply the dictionary values in
> the form of
> varsdict["FEq_({0},_{1},_{2},_{3})".format(i,j,k,l)]
>
> but this is not working as the indexes do not iterate consequently over
> all their initial range values, they are the results of the optimization
> so some elements are missing.

I don't see why the dictionary lookup won't work just because the
indexes aren't consistent. When you look up

    varsdict['FEq_(0,_0,_2,_2)']

it has no way of knowing whether or not 'FEq_(0,_0,_1,_2)' previously
existed.

I think you need to explain more of what you are doing rather than just
dropping hints.

*Ah, the penny drops* ...

Are you trying to generate the keys by using nested loops?

    for i in range(1000):  # up to some maximum value
        for j in range(1000):
            for k in range(1000):
                for l in range(1000):
                    key = "FEq_({0},_{1},_{2},_{3})".format(i,j,k,l)
                    value = varsdict[key]  # this fails

That's going to be spectacularly wasteful if the majority of keys don't
exist. Rather, you should just iterate over the ones that *do* exist:

    for key in varsdict:
        ...

--
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
--
https://mail.python.org/mailman/listinfo/python-list
Re: Pylint false positives
On Sun, 19 Aug 2018 00:11:30 +0300, Marko Rauhamaa wrote:

> In Python programming, I mostly run into closures through inner classes
> (as in Java).

Inner classes aren't closures.

It's also quite expensive to be populating your application with lots of
classes used only once each, which is a common pitfall when using inner
classes. Memory is cheap, but it's not so cheap that we ought to just
profligately waste it needlessly.

> populating an object with fields (methods) in a loop is very
> rarely a good idea.

Of course it is *rarely* a good idea, because it is rare for the fields
to be either identical (except for the name) or algebraically derived
from the loop counter. Using a dict in place of an object, it's hard to
see any elegant way to move this into a loop:

    {'a': 10, 'B': -2, 'c': 97, 'd': None, 'h': 'surprise!', 'm': []}

and so we should not. Any such loop would surely be complex,
complicated, obscure, even obfuscated compared to writing out the
dict/object assignments manually.

But in context, we're not discussing the millions of cases where the
methods/fields are naturally written out manually. So give me credit for
not being a total idiot. Not once in this thread have I suggested that
we ought to run through all our projects, changing every class and
putting all methods inside factories. It goes without saying that under
usual, common circumstances we write out our methods manually.

I was speaking about one very specific case:

* You have a fair number of identical methods in a single class.

Our choices are, (1):

- write a large block of mindless boilerplate;

- even worse, have that same boilerplate but split it up, scattering the
  individual methods all around the class;

- either way, it is repetitious and error-prone, with obvious
  reliability and maintenance problems:

    def foo(self):
        return NotImplemented

    def bar(self):
        return NotImplemented

    def baz(self):
        return NotImplemented

or, (2):

- automate the repetitious code by moving the method definitions into a
  loop, as sketched below.

Obviously there is some (small) complexity cost to automating it.

I didn't specify what a fair number of methods would be (my example
showed four, but that was just an illustration, not real code). In
practice I wouldn't even consider this for three methods. Six or eight
seems like a reasonable cut-off point for me, but it depends on the
specifics of the code and who I was writing it for.

(Note that this makes me much more conservative than the usual advice
given by system admins: when you need to do the same thing for the third
time, write a script to automate it.)

--
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
--
https://mail.python.org/mailman/listinfo/python-list
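[A rough sketch of option (2) with made-up method names, using the in-class-body locals() technique discussed later in this thread; works in CPython:]

    class Placeholders:
        # Generate several identical placeholder methods in a loop
        # instead of writing them out by hand.
        for _name in ('foo', 'bar', 'baz', 'qux', 'quux', 'fred'):
            def _method(self, _name=_name):
                return NotImplemented
            _method.__name__ = _name
            _method.__doc__ = "Placeholder %s(); subclasses should override." % _name
            locals()[_name] = _method
        del _method, _name

    p = Placeholders()
    print(p.foo(), p.bar.__doc__)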
Re: Pylint false positives
On Sat, 18 Aug 2018 00:33:26 +0300, Marko Rauhamaa wrote: > Chris Angelico : >> Programming is heavily about avoiding duplicated work. > > That is one aspect, but overcondensing and overabstracting programming > logic usually makes code less obvious to its maintainer. That may very well be true, but we're not talking about those evils here. We're talking about a simple factory technique for creating a number of identical objects in a loop. [...] > I would guess such techniques could come in handy in some framework > development but virtually never in ordinary application development. In > a word, steer clear of metaprogramming. Depending on your definition of metaprogramming, either: (1) this either isn't metaprogramming at all, merely programming and no more scary than populating a dict at runtime; or (2) if you mean what you say, that means no decorators, no closures, no introspection ("reflection" in Java terms), no metaclasses (other than type), no use of descriptors (other than the built-in ones), no template- based programming, no source-code generators. No namedtuples, Enums, or data-classes. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Pylint false positives
On Fri, 17 Aug 2018 15:19:05 +0200, Peter Otten wrote: > You usually do not want many identical (or very similar) methods because > invoking the right one is then errorprone, too, and you end up with an > interface that is hard to maintain. At some point you may need to > introduce subtle changes to one out of ten methods, These hypotheticals are fairly tedious. "At some point you might..." yeah, you might, but you probably won't, and in the meantime, YAGNI. And if you do, it is easy to do: pull the special method out of the loop. Or add code to modify it after the loop. Or make the changes in a subclass. We do these things *all the time* for data objects, creating them in a loop then modifying those that need modifying. There is *no difference* here: methods can be treated as data objects too. > and later someone > else may overlook that specific angle in the documentation... You say that as if people never failed to read the documentation about "regular" methods that are made by hand in the conventional way. > If you have many similar methods you should spend your time on reducing > their number rather than to find shortcuts to automate their creation. The assumption here is that the basic design is sound. Why do you assume it isn't? According to the OP Frank, this design has been in production for many years and works well. While I personally have some reservations that using subclasses is the best solution, I'm willing to give him the benefit of the doubt rather than insult his competence by assuming it is a broken design without ever seeing the code. > Programming is not only about avoiding duplication, it is also about > stating your intents clearly. Indeed. And what states the intention "These methods are identical except in their name" more strongly than creating them in a loop? -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Pylint false positives
On Fri, 17 Aug 2018 11:49:01 +, Jon Ribbens wrote: > On 2018-08-17, Steven D'Aprano > wrote: >> On the other hand, your objection to the following three idioms is as >> good an example of the Blurb Paradox as I've ever seen. > > Do you mean the Blub Paradox? If so, you're misunderstanding or at least > misapplying it. Yes, that was a simple typo, and no, I'm not misunderstanding it. You're looking up the ladder to a more powerful technique available in Python (methods as first-class values capable of being manipulated like any other object) and dismissing it in favour of mindless boilerplate containing duplicated code, and requiring oodles of copy-and-paste programming to maintain. Graham used the "Blub Paradox" to describe programmers' failure to understand more powerful features available in languages they didn't use, but there's no reason why this failure applies only to comparisons between languages. It also applies to arguments about idioms within a single language. That's the Blub Paradox too, even though only a single language is involved. >>> * code running directly under the class definition >>> * creating a method then changing its name with foo.__name__ >>> * poking things into to the class namespace with locals() >> >> Each of these are standard Python techniques, utterly unexceptional. > > I guess we'll have to agree to disagree there. >> "Code running directly under the class" describes every use of the >> class keyword (except those with an empty body). If you write: >> >> class Spam: >> x = 1 >> >> you are running code under the class. This is not just a pedantic >> technicality, > > Yes, it absolutely is, in this context. Having code other than > assignments and function definitions under the class statement is > extremely rare. Its rare because it isn't needed often, not because it is broken or dangerous or illegal or fattening. [...] >> You might be thinking of the warning in the docs: >> >> "Dynamically adding abstract methods to a class, [...] [is] not >> supported." >> >> but that is talking about the case where you add the method to the >> class after the class is created, from the outside: > > Yes, I was referring to that. You may well be right about what it means > to say, but it's not what it actually says. *shrug* It was obvious to me that it wasn't talking about methods dynamically inserted inside the class body since ALL methods are dynamically inserted inside the class body. If that was what it meant, it would be saying that abstractmethod never works. Clearly that's absurd, since it does work. So why interpret it as saying something absurd instead of using a bit of common sense and knowledge of how Python words to interpret it correctly? Inside a class (or at the global scope) there is no meaningful difference between these: spam = eggs locals()['spam'] = eggs >>> (Not to mention your code means the methods cannot have meaningful >>> docstrings.) >> >> Of course they can, provided they're all identical, give or take some >> simple string substitutions. > > Hence "meaningful". They can still be meaningful even if identical. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Pylint false positives
On Fri, 17 Aug 2018 08:14:02 +0200, Frank Millman wrote: > How would you extend it without a long chain of > if isinstance(v, str): > [perform checks for str] > elif isinstance(v, int) > [perform checks for int] > etc > etc > > I find that using a separate method per subclass does exactly what I > want, and that part of my project has been working stably for some time. You might consider using single dispatch instead: https://docs.python.org/3/library/functools.html#functools.singledispatch -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
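For anyone who hasn't used it, a minimal sketch of what single dispatch looks like (hypothetical check functions, not Frank's actual code):

    from functools import singledispatch

    @singledispatch
    def check(value):
        raise TypeError("no checks registered for %r" % type(value))

    @check.register(str)
    def _(value):
        print("performing str checks on %r" % value)

    @check.register(int)
    def _(value):
        print("performing int checks on %r" % value)

    check("hello")   # performing str checks on 'hello'
    check(42)        # performing int checks on 42

Each registered function replaces one branch of the isinstance chain, and adding support for a new type is just another @check.register(...) with no changes to existing code.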
Re: Pylint false positives
...     locals()[name] = inner
...     del inner
...
py> class Bar(Foo):
...     pass
...
py> Bar()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class Bar with abstract methods eggs, spam

> (Not to mention your code means the methods cannot have meaningful
> docstrings.)

Of course they can, provided they're all identical, give or take some simple string substitutions. The idea here is to remove boilerplate code, not to have to do large amounts of significant computation for each placeholder. If it required more than a few template substitutions:

    inner.__doc__ %= (a, b)

at that point I'd bite the bullet and prefer the pain of swaths of dumb boilerplate. But simple transformations are no big deal. You create the method, set the docstring, change its name, set it to abstract, and make it an attribute of the class. Who can't reason about four simple steps like that? Cross out the method-specific details, using a "widget" instead:

    class X:
        for name in ('red', 'green', 'blue', 'yellow'):
            widget = Widget(1, 2, 3)
            widget.set_state('not ready')
            widget.serial_number = get_serial_number()
            locals()[name] = widget

Anyone who couldn't reason about that probably shouldn't be calling themselves a programmer. Making the widgets methods instead doesn't change that.

> I would refuse a pull request containing code such as the above, unless
> the number of methods being dynamically created was much larger than 4,
> in which case I would refuse it because the design of a class requiring
> huge numbers of dynamically created methods is almost certainly
> fundamentally broken.

If a class has "huge" (what, a hundred? a thousand?) methods, regardless of whether they are abstract or concrete or generated inside a factory or written out by hand, the class probably does too much. But a class with 30 methods is fine (strings have at least 50), and if six or ten of them are generated by a factory, what's the big deal?

Writing out methods with identical bodies is brainless boilerplate. I don't clog up my code with brainless boilerplate unless there is a really good reason for it.

There is always a trade-off to be made, choosing between the (slight) extra complexity of a factory solution versus the tedious, error-prone volume of boilerplate, so in practice, I probably wouldn't switch to a factory solution for merely four methods with empty bodies. But I certainly would for eight.

When making this trade-off, "my developers don't understand Python's execution model or its dynamic features" is not a good reason to stick to large amounts of mindless code. That's a good reason to send the developer in question to a good Python course to update their skills. (Of course if you can't do this for political or budget reasons, I sympathise.)

-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
-- 
https://mail.python.org/mailman/listinfo/python-list
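A tiny sketch of the "create, set docstring, rename, mark abstract, attach" recipe with a template-substituted docstring (hypothetical names; a sketch only, assuming the class-body locals() behaviour discussed above):

    from abc import ABCMeta, abstractmethod

    def make_placeholder(name, klass):
        @abstractmethod
        def inner(self):
            raise NotImplementedError
        inner.__doc__ = "Handle the %s step for %s." % (name, klass)
        inner.__name__ = name
        return inner

    class Base(metaclass=ABCMeta):
        for _name in ('spam', 'eggs'):
            locals()[_name] = make_placeholder(_name, 'Base')
        del _name

    class Concrete(Base):
        def spam(self): return "spam"
        def eggs(self): return "eggs"

    print(Base.spam.__doc__)    # Handle the spam step for Base.
    print(Concrete().spam())    # spam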
Re: Program to output a subset of the composite numbers
On Wed, 15 Aug 2018 05:34:06 -0700, tomusatov wrote:

> I am not terribly familiar with Python, but am currently authoring an
> integer sequence for www.oeis.org and was wondering if anyone in the
> community could help me with authoring a Python program that outputs,
> "Composite numbers that are one less than a composite number."

Do you have a function to test for primality? For now, I'll assume you do.

    def is_prime(n):
        """Returns True if n is prime, False otherwise."""
        # implementation left as an exercise

    # 0 (and 1) are neither prime nor composite; skip them.
    # 2 and 3 are prime; start at the first composite, 4
    i = 4
    for j in range(5, 1001):
        if not is_prime(j):
            if j == i + 1:
                print(i)
            i = j

The above will stop at 999. To go forever, use this instead:

    from itertools import count

    i = 4
    for j in count(5):
        if not is_prime(j):
            if j == i + 1:
                print(i)
            i = j

Alternatively, if you have a function which efficiently returns primes one at a time, you can do this:

    n = 4  # start at the first composite
    for p in primes(5):  # primes starting at 5
        print(list(range(n, p-1)))
        n = p + 1

This ought to print out lists of composites, starting with:

    []
    []
    [8, 9]
    []
    [14, 15]

etc. Take care though: I have not tested this code.

-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
-- 
https://mail.python.org/mailman/listinfo/python-list
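If you don't already have a primality test, a plain trial-division version is more than enough for ranges this small (a sketch, not tuned for speed):

    def is_prime(n):
        """Return True if n is prime, False otherwise (trial division)."""
        if n < 2:
            return False
        if n < 4:
            return True          # 2 and 3 are prime
        if n % 2 == 0:
            return False
        d = 3
        while d * d <= n:
            if n % d == 0:
                return False
            d += 2
        return True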
Re: Pylint false positives
On Tue, 14 Aug 2018 15:18:13 +, Jon Ribbens wrote:

> On 2018-08-14, Steven D'Aprano wrote:
>> If there really are a lot of such missing methods, I'd consider writing
>> something like this:
>>
>>     class A:
>>         def __init__(self, ...):
>>             ...
>>
>>         # === process abstract methods en masse ===
>>         for name in "method_a method_b method_c method_d".split():
>>             @abstractmethod
>>             def inner(self):
>>                 raise NotImplementedError
>>             inner.__name__ = name
>>             # This is okay, writing to locals works inside the class body.
>>             locals()[name] = inner
>>
>>         del inner, name  # Clean up the class namespace.
>
> You have a peculiar idea of "good style"...

Yes, very peculiar. It's called "factor out common operations" and "Don't Repeat Yourself" :-)

In a world full of people who write:

    d[1] = None
    d[2] = None
    d[3] = None
    d[4] = None

I prefer to write:

    for i in range(1, 5):
        d[i] = None

Shocking, I know.

Literally my first professional programming job was working on a Hypercard project written by a professional programmer. (He was paid for it, so he was professional.) The first time I looked at his code, as a fresh-out-of-uni naive coder, I was surprised to read his GUI set-up code. By memory, it was something like this:

    set the name of button 1 to "Wibble 1"
    set the name of button 2 to "Wibble 2"
    set the name of button 3 to "Wibble 3"
    set the name of button 4 to "Wibble 4"
    # and so on...
    set the name of button 100 to "Wibble 100"

(using "Wibble" as a placeholder for the actual name, which I don't recall). The first thing I did was replace that with a loop:

    for i = 1 to 100 do
        set the name of button i to ("Wibble " & i)
    end for

Hypertalk uses & for string concatenation.

That one change cut startup time from something like 90 seconds to about 30, and a few more equally trivial changes got it down to about 15 seconds. Hypertalk in 1988 was not the fastest language in the world, but it was fun to work with.

>> although to be honest I'm not sure if that would be enough to stop
>> PyLint from complaining.
>
> No - if you think about it, there's no way Pylint could possibly know
> that the above class has methods method_a, method_b, etc.

Well, if a human reader can do it, a sufficiently advanced source-code analyser could do it too... *wink*

Yes, of course you are right, in practical terms I think it is extremely unlikely that PyLint or any other linter is smart enough to recognise that locals()[name] = inner is equivalent to setting attributes method_a etc. I actually knew that... "although to be honest I'm not sure" is an understated way of saying "It isn't" :-)

https://en.wikipedia.org/wiki/Litotes

> It also
> doesn't like the `del inner, name` because theoretically neither of
> those names might be defined, if the loop executed zero times.

That's a limitation of the linter. Don't blame me if it is too stupid to recognise that looping over a non-empty string literal cannot possibly loop zero times :-)

-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Pylint false positives
On Tue, 14 Aug 2018 10:58:17 +0200, Frank Millman wrote:

>> > I have an abstract class ClassA with a number of concrete
>> > sub-classes. ClassA has a method which invokes 'self.method_b()'
>> > which is defined separately on each sub-class. Pylint complains that
>> > "Instance of 'ClassA' has no 'method_b' member".

[...]

> I do mean a lot of methods, not classes. I don't have any problem adding
> the lines. It is just that, before I starting using pylint, it had not
> occurred to me that there was any problem with my approach. If an
> experienced python programmer was reviewing my code, would they flag it
> as 'bad style'?

*shrug*

I wouldn't necessarily call it *bad*, but perhaps *not-quite good* style. I think it's fine for small projects and quick scripts, especially if they're written and maintained by a single person for their own use. Perhaps not so much for large projects intended for long-term use with continual development.

If there really are a lot of such missing methods, I'd consider writing something like this:

    class A:
        def __init__(self, ...):
            ...

        # === process abstract methods en masse ===
        for name in "method_a method_b method_c method_d".split():
            @abstractmethod
            def inner(self):
                raise NotImplementedError
            inner.__name__ = name
            # This is okay, writing to locals works inside the class body.
            locals()[name] = inner

        del inner, name  # Clean up the class namespace.

        def concrete_method_a(self):
            ...

although to be honest I'm not sure if that would be enough to stop PyLint from complaining.

-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Python-Tkinter issue. Multiple overlaping event routines called by single click
On Sun, 12 Aug 2018 01:30:43 +0100, MRAB wrote: > On 2018-08-11 21:01, wfgazd...@gmail.com wrote: >> I have a main window open. Then I open a tk.TopLevel dialog window >> giving the user multiple choices. He selects one, the corresponding >> event is executed. Then in the underlining main window, just by chance >> there is another button exactly under the mouse click in the TopLevel >> dialog window. Its corresponding event is then triggered. >> >> How can I keep the main window button that just happens to be in the >> wrong place from being triggered? >> > The handler should return the string "break" to prevent the event from > propagating further. Are you doing that? It's surprising how far you can > go without it before running into a problem! I think you are mistaken: https://stackoverflow.com/a/12357536 but since the description of the problem is so vague, it is hard to tell exactly what's happening. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Python-Tkinter issue. Multiple overlaping event routines called by single click
On Sat, 11 Aug 2018 13:01:44 -0700, wfgazdzik wrote:

> I have a main window open. Then I open a tk.TopLevel dialog window
> giving the user multiple choices. He selects one, the corresponding
> event is executed. Then in the underlining main window, just by chance
> there is another button exactly under the mouse click in the TopLevel
> dialog window. Its corresponding event is then triggered.

Sounds to me that the user is clicking twice, once in the dialog, and then a second time just as it disappears and the main window takes focus. Possibly they are trying to double-click. Or their mouse is faulty.

Unless you can replicate this with multiple users, the most likely cause is user-error. And unless you can eliminate user-error, trying to work around users who click randomly is a nightmare. How do you decide which clicks are intended and which are not?

> How can I keep the main window button that just happens to be in the
> wrong place from being triggered?

If you put in a delay between enacting the event and closing the dialog, I reckon the problem will go away... but instead you'll have two click events in the dialog. But it seems like an interesting experiment... put time.sleep(0.3) at the end of the event handler and see what happens.

-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Can't figure out how to do something using ctypes (and maybe struct?)
On Fri, 10 Aug 2018 18:40:11 -0400, inhahe wrote:

> I need to make a list of instances of a Structure, then I need to make
> an instance of another Structure, one of the fields of which needs to be
> an arbitrary-length array of pointers to the instances in the list. How
> do I do that?
>
> Just in case it helps, I'll include what I tried that didn't work:

How about simplifying your example to the smallest and simplest example of the problem? Your example has:

- two functions;
- one method that seems to have become unattached from its class;
- two classes;
- using 12 different fields.

Surely not all of that detail is specific to the problem you are having. If you can simplify the problem, the solution may be more obvious.

It might help to read this: http://sscce.org/

By the way, unrelated to your specific problem but possibly relevant elsewhere, you have this function:

> def mkVstEvents(events):
>     class Events(ctypes.Structure):
>         _fields_ = [ ... ]
>     return Events( ... )

You might not be aware of this, but that means that every time you call mkVstEvents, you get a singleton instance of a new and distinct class that just happens to have the same name and layout. So if you did this:

    a = mkVstEvents( ... )
    b = mkVstEvents( ... )

then a and b would *not* be instances of the same class:

    isinstance(a, type(b))  # returns False
    isinstance(b, type(a))  # returns False
    type(a) == type(b)      # also False

Each time you call the function, it creates a brand new class, always called Events, creates a single instance of that class, and returns it. That is especially wasteful of memory, since classes aren't small.

    py> class Events(ctypes.Structure):
    ...     pass
    ...
    py> sys.getsizeof(Events)
    508

Unless that's what you intended, you ought to move the class outside of the function.

    class Events(ctypes.Structure):
        _fields_ = [ ... ]

    def mkVstEvents(events):
        return Events( ... )

-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
-- 
https://mail.python.org/mailman/listinfo/python-list
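A cut-down demonstration of that effect, with no ctypes involved (a hypothetical toy class, just to show the behaviour):

    def make_point():
        class Point:
            pass
        return Point()

    a = make_point()
    b = make_point()

    print(type(a).__name__, type(b).__name__)   # Point Point
    print(type(a) is type(b))                   # False: two distinct classes
    print(isinstance(a, type(b)))               # False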
Re: name 'aLOCK' is not defined When I add aLOCK = threading.RLock() behind if __name__ == "__main__"
On Fri, 10 Aug 2018 08:15:09 +0200, Karsten Hilbert wrote:

> On Fri, Aug 10, 2018 at 12:24:25AM +0800, xuanwu348 wrote:
>
>> Yes, move the code from positionA(can run normally) to
>> positionB(exception with name undefined) I find this content
>> "https://docs.python.org/3.3/tutorial/classes.html#python-scopes-and-namespaces"
>> But I still don't undewrstand the differenct of scopes-and-namespaces
>> between positionA and positionB,
>>
>> I think the variable "aLock" at these positionA or positionB are all
>> global.
>
> When something goes wrong in an unexpected way: test your assumptions
> ;-)

xuanwu348's assumptions are correct. aLock is a global, in both positions. The problem is not the scope of the variable, but whether or not it has actually been assigned to.

-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
-- 
https://mail.python.org/mailman/listinfo/python-list
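A stripped-down illustration of the difference (a hypothetical module, not the OP's actual code): the name is global either way, but it only exists once the assignment statement has actually run.

    import threading

    def do_work():
        with aLOCK:                  # relies on the global having been assigned
            print("working")

    if __name__ == "__main__":
        aLOCK = threading.RLock()    # assigned only when run as a script
        do_work()                    # fine here

    # If this module is imported (or do_work() is otherwise called before the
    # `if` block has executed), aLOCK has never been assigned and the call
    # raises:  NameError: name 'aLOCK' is not defined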
RFC -- custom operators
Request for comments -- proposal to allow custom binary operators.

I'm looking for comments on custom binary operators: would it be useful, if so, what use-cases do you have?

The most obvious and often-requested use-case would be for a form of logical operator (AND, OR, XOR) that is distinct from the bitwise operators & | ^ but unlike the standard `and` and `or` operators, calls dunder methods.

The proposal is to add custom operators. A placeholder syntax might be:

    spam OP eggs

which would then delegate to special dunder methods __OP__ or __ROP__, similar to existing operators such as +.

I don't want to get into arguments about syntax, or implementation details, unless there is some interest in the functionality. Please focus on *functional requirements* only.

(1) This proposal requires operators to be legal identifiers, such as "XOR" or "spam", not punctuation like % and absolutely not Unicode symbols like ∉

(2) For the sake of the functional requirements, assume that we can parse `spam OP eggs` without any ambiguity;

(3) This only proposes binary infix operators, not unary prefix or postfix operators;

    infix:   argument1 OP argument2
    prefix:  OP argument
    postfix: argument OP

(4) This does not propose to allow the precedence to be set on a case-by-case basis. All custom operators will have the same precedence.

(5) What should that precedence be?

(6) This does not propose to set the associativity on a case-by-case basis. All custom operators will have the same associativity.

(7) Should the operators be left-associative (like multiplication), right-associative (like exponentiation), or non-associative?

    # Left-associative:
    a OP b OP c    # like (a OP b) OP c

    # Right-associative:
    a OP b OP c    # like a OP (b OP c)

In the last case, that would make chained custom operators intentionally ambiguous (and hence a SyntaxError) unless disambiguated with parentheses:

    # Non-associative:
    a OP b OP c      # SyntaxError
    (a OP b) OP c    # okay
    a OP (b OP c)    # okay

(8) This does not propose to support short-circuiting operators.

I'm not interested in hearing theoretical arguments that every infix operator can be written as a function or method call. I know that. I'm interested in hearing about use-cases where the code is improved and made more expressive by using operator syntax and existing operators aren't sufficient. (If there aren't any such use-cases, then there's no need for custom operators.)

Thoughts?

-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
-- 
https://mail.python.org/mailman/listinfo/python-list
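Not part of the proposal, but for anyone who wants to experiment with how named infix operators might feel, there is a well-known wrapper recipe that abuses the | operator and its reflected dunder (a sketch with invented names; the real proposal would not need the surrounding | characters):

    class Infix:
        """Wrap a two-argument function so it can be spelled  a |op| b."""
        def __init__(self, func):
            self.func = func
        def __ror__(self, left):            # handles the  a | op  half
            return Infix(lambda right: self.func(left, right))
        def __or__(self, right):            # handles the  ... | b  half
            return self.func(right)

    XOR = Infix(lambda a, b: bool(a) != bool(b))

    print(True |XOR| False)    # True
    print(True |XOR| True)     # False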
Re: NLTK
On Fri, 03 Aug 2018 07:49:40 +, mausg wrote:

> I like to analyse text. my method consisted of something like
> words=text.split(), which would split the text into space-seperated
> units.

In natural language, words are more complicated than just space-separated units. Some languages don't use spaces as a word delimiter. Some don't use word delimiters at all. Even in English, we have *compound words* which exist in three forms:

- open: "ice cream"
- closed: "notebook"
- hyphenated: "long-term"

Recognising open compound words is difficult. "Real estate" is an open compound word, but "real cheese" and "my estate" are both two words.

Another problem for English speakers is deciding whether to treat contractions as a single word, or split them:

    "don't"   --> "do" "n't"
    "they'll" --> "they" "'ll"

Punctuation marks should either be stripped out of sentences before splitting into words, or treated as distinct tokens. We don't want "tokens" and "tokens." to be treated as distinct words, just because one happened to fall at the end of a sentence and one didn't.

> then I tried to use the Python NLTK library, which had alot of
> features I wanted, but using `word-tokenize' gives a different
> answer.-
>
> What gives?.

I'm pretty sure the function isn't called "word-tokenize". That would mean "word subtract tokenize" in Python code. Do you mean word_tokenize?

Have you compared the output of the two and looked at how they differ? If there is too much output to compare by eye, you could convert to sets and check the set difference. Or try reading the documentation for word_tokenize:

http://www.nltk.org/api/nltk.tokenize.html#nltk.tokenize.treebank.TreebankWordTokenizer

-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
-- 
https://mail.python.org/mailman/listinfo/python-list
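For example, something along these lines shows exactly which tokens the two approaches disagree on (a sketch; it assumes NLTK is installed and its "punkt" tokenizer data has been downloaded, and the sample sentence is invented):

    from nltk.tokenize import word_tokenize

    text = "Don't treat ice cream and long-term storage the same way."
    split_tokens = set(text.split())
    nltk_tokens = set(word_tokenize(text))

    print(sorted(split_tokens - nltk_tokens))  # only produced by str.split()
    print(sorted(nltk_tokens - split_tokens))  # only produced by word_tokenize()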
Re: machine learning forums
On Sun, 05 Aug 2018 06:05:46 -0700, Sharan Basappa wrote:

> I am quite new to Python. I am learning Python as I am interested in
> machine learning. The issue is, I have not found any ML forum where
> novices like me can get help. I have tried reddit and each of my posts
> have gone unanswered.

Which subreddits have you posted to?

> Looks like reddit forum prefers either abstract
> topics on ML or very complex issues for discussions.
>
> I have tried stackoverflow also but there only programming issues are
> entertained

I believe the Stack Exchange network (which Stackoverflow is part of) has a dedicated statistics and machine-learning site, "Cross Validated":

https://meta.stackexchange.com/questions/130524/which-stack-exchange-website-for-machine-learning-and-computational-algorithms

https://meta.stackexchange.com/questions/227757/where-to-ask-basic-questions-about-machine-learning

-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: beware of linked in - mail used on this list
On Thu, 02 Aug 2018 22:35:10 +0400, Abdur-Rahmaan Janhangeer wrote: > just an info if you are using the mail you use in this list for linked > in you might get surprises > > apologies if you got a mail from linkedin somewhere LinkedIn is a spammer. I frequently get friend requests from people who I don't know from LinkedIn, and most of them don't even know they sent them. I got three from you yesterday. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Are dicts supposed to raise comparison errors
On Wed, 01 Aug 2018 22:14:54 +0300, Serhiy Storchaka wrote:

> 01.08.18 21:03, Chris Angelico пише:
>> And in any code that does not and cannot run on Python 2, the warning
>> about bytes and text comparing unequal is nothing more than a false
>> positive.
>
> Not always. If your code supported Python 2 in the past, or third-party
> dependencies supports or supported Python 2, this warning can expose a
> real bug. Even if all your and third-party code always was Python 3
> only, the standard library can contain such kind of bugs.
>
> Several years after the EOL of Python 2.7 and moving all living code to
> Python 3 we can ignore bytes warnings as always false positive.

Even then, I don't know that we should do that. I do not believe that the EOL of Python 2 will end all confusion between byte strings and text strings. There is ample opportunity for code to accidentally compare bytes and text even in pure Python 3 code, e.g. comparing data read from files which are supposed to be opened in the same binary/text mode but aren't.

-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
-- 
https://mail.python.org/mailman/listinfo/python-list
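A tiny demonstration of how easily that bites (a sketch; run the script with python3 -b to see the warning, or -bb to turn it into an error):

    text_line = "spam"      # e.g. read from a file opened in text mode ('r')
    byte_line = b"spam"     # e.g. read from a file opened in binary mode ('rb')

    # In Python 2 these compare equal; in Python 3 they are silently unequal,
    # and under -b the interpreter emits a BytesWarning to flag the comparison.
    print(text_line == byte_line)   # False on Python 3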
Re: Are dicts supposed to raise comparison errors
On Wed, 01 Aug 2018 19:00:27 +0100, Paul Moore wrote:

[...]

> My point was that it's a *warning*, and as such it's perfectly possible
> for a warning to *not* need addressing (other than to suppress or ignore
> it once you're happy that doing so is the right approach).

And my point was that ignoring warnings is not the right approach. Suppressing them on a case-by-case basis (if possible) is one thing, but a blanket suppression goes too far, for the reasons I gave earlier.

-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Are dicts supposed to raise comparison errors
On Wed, 01 Aug 2018 16:22:16 +0100, Paul Moore wrote:

> On Wed, 1 Aug 2018 at 16:10, Robin Becker wrote:
>>
>> On 01/08/2018 14:38, Chris Angelico wrote:
>> > It's a warning designed to help people port code from Py2 to Py3. It's
>> > not meant to catch every possible comparison. Unless you are actually
>> > porting Py2 code and are worried that you'll be accidentally
>> > comparing bytes and text, just *don't use the -b switch* and there
>> > will be no problems.
>> >
>> > I don't understand what the issue is here.
>>
>> I don't either, I have never used the -b flag until the issue was
>> raised on bitbucket. If someone is testing a program with reportlab and
>> uses that flag then they get a lot of warnings from this dictionary
>> assignment. Probably the code needs tightening so that we insist on
>> using native strings everywhere; that's quite hard for py2/3 compatible
>> code.
>
> They should probably use the warnings module to disable the warning in
> library code that they don't control, in that case.
>
> If they've reported to you that your code produces warnings under -b,
> your response can quite reasonably be "thanks for the information, we've
> reviewed our bytes/string handling and can confirm that it's safe, so
> there's no fixes needed in reportlab".

I'm sorry, I don't understand this reasoning. (Perhaps I have missed something.)

Robin says his code runs under both Python2 and Python3. He's getting a warning that the behaviour has changed between the two, and there's a dubious comparison being made between bytes and strings. Consequently, there's a very real chance that he has dicts which have one key in Python 2 but two in Python 3:

- in Python 2, b'x' and u'x' are the same key;
- in Python 3, b'x' and u'x' are different keys;

    # Python 2
    py> {u'x': 1, b'x': 2}
    {u'x': 2}

    # Python 3
    py> {u'x': 1, b'x': 2}
    {b'x': 2, 'x': 1}

This means that Robin very likely has subtly or not-so-subtly different behaviour in his software depending on which version of Python it runs under. If not an outright bug that silently does the wrong thing.

Even if Robin has audited the entire code base and can confidently say today that despite the warning, no such bug has manifested, he cannot possibly be sure that it won't manifest tomorrow. (Not unless the software is frozen and will never be modified.)

In another post, Chris says:

    I suspect that there may be a bit of non-thinking-C-mentality creeping
    in: "if I can turn on warnings, I should, and any warning is a problem".
    That simply isn't the case in Python.

I strongly disagree. Unless Chris' intention is to say bugs don't matter if they're written in Python, I don't know why one would say that warnings aren't a problem.

Every warning is one of three cases:

- it reveals an existing actual problem;
- it reveals a potential problem which might someday become an actual problem;
- or it is a false positive which (if unfixed) distracts attention and encourages a blasé attitude which could easily lead to problems in the future.

Warnings are a code smell. Avoiding warnings is a way of "working clean":

https://blog.codinghorror.com/programmers-and-chefs/

Ignoring warnings because they haven't *yet* manifested as a bug, or worse, because you *assume* that they haven't manifested as a bug, is about as sensible as ignoring the oil warning light on your car because the engine hasn't yet seized up. Regardless of which language the software is written in.

-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." 
-- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Checking whether type is None
On Sat, 28 Jul 2018 09:47:07 +, Gilmeh Serda wrote:

> On Tue, 24 Jul 2018 12:33:27 -0700, Tobiah wrote:
>
>> I'm trying to get away from things like:
>>
>> >>> type(thing) is type(None)
>
> How about:
>
> >>> some_thing = None
> >>> type(some_thing).__str__(some_thing)
> 'None'
>
> Equally weird, I'd say, but what the heck...

    class Foo:
        def __str__(self):
            return 'None'

-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Checking whether type is None
On Wed, 25 Jul 2018 16:14:18 +, Schachner, Joseph wrote:

> While I appreciate that use of "is" in thing is None, I claim this
> relies on knowledge of how Python works internally, to know that every
> None actually is the same ID (the same object) - it is singular.

No, it isn't knowledge of Python's internal working. That None is a singleton object is a language guarantee, a promise that will always be true in any Python interpreter. It is no more about "how Python works internally" than knowing that the keyword is spelled "class" rather than Class, or that we use ** for exponentiation rather than ^.

> That
> probably works for 0 and 1 also but you probably wouldn't consider
> testing thing is 1, at least I hope you wouldn't. thing is None looks
> just as odd to me. Why not thing == None ? That works.

It is wrong (in other words, it doesn't work) because it allows non-None objects to masquerade as None and pretend to be what they are not. If that's your intent, then of course you may do so. But without a comment explaining your intent, don't be surprised if more experienced Python programmers correct your "mistake".

-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
-- 
https://mail.python.org/mailman/listinfo/python-list
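Here is the sort of masquerading that == lets through and identity does not (a hypothetical class, purely for illustration):

    class Masquerader:
        """An object that claims to be equal to None."""
        def __eq__(self, other):
            return other is None

    m = Masquerader()
    print(m == None)   # True  -- fooled by the custom __eq__
    print(m is None)   # False -- identity cannot be faked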
Re: Checking whether type is None
On Tue, 24 Jul 2018 12:33:27 -0700, Tobiah wrote:

[...]

> So what would I compare type(None) to?

Why would you need to? The fastest, easiest, most reliable way to check if something is None is:

    if something is None

> >>> type(None)
>
> >>> type(None) is NoneType
> Traceback (most recent call last):
> File "", line 1, in
> NameError: name 'NoneType' is not defined

In Python 2 you can do:

    from types import NoneType

(that import is gone in Python 3), or you can simply define it yourself:

    NoneType = type(None)

but why bother?

> I know I ask whether:
>
> >>> thing is None
>
> but I wanted a generic test.

That *is* a generic test.

> I'm trying to get away from things like:
>
> >>> type(thing) is type(None)

That is a good move.

> because of something I read somewhere preferring my original test
> method.

Oh, you read "something" "somewhere"? Then it must be good advice! *wink*

Writing code like:

    type(something) is dict

was the standard way to do a type check back in the Python 1.5 days. That's about 20 years ago now. These days, that is rarely what we need. The usual way to check a type is:

    isinstance(something, dict)

but even that should be rare. If you find yourself doing lots of type checking, using isinstance() or type(), then you're probably writing slow, inconvenient Python code.

-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Python 2.7.14 and Python 3.6.0 netcdf4
On Mon, 23 Jul 2018 19:39:18 -0300, jorge.conrado wrote: > Traceback (most recent call last): > File "", line 1, in > ModuleNotFoundError: No module named 'netCDF4' > > > What can I do to solve this error for Python 3.6.0 Just because you have the Python 2.7 version of the netCDF4 module installed in the Python 2.7 environment, doesn't mean it will magically work for Python 3.6. You have to install the module for 3.6 as well. How did you install it for Python 2.7? -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: coding style - where to declare variables
On Mon, 23 Jul 2018 14:39:56 +0300, Marko Rauhamaa wrote: > Steven D'Aprano : > >> Lambda calculus has the concept of a binding operator, which is >> effectively an assignment operator: it takes a variable and a value and >> binds the value to the variable, changing a free variable to a bound >> variable. In other words, it assigns the value to the variable, just >> like assignment does. > > In traditional Lambda Calculus semantics, there are no values at all. It is more common to say "pure lambda calculus" rather than "traditional", and it is not correct to say there are no values at all. Rather, all values are functions (and all functions are values). http://scienceblogs.com/goodmath/2006/08/29/a-lambda-calculus-rerun-1/ and: "As this suggests, functions are just ordinary values, and can be the results of functions or passed as arguments to functions (even to themselves!). Thus, in the lambda calculus, functions are first-class values. Lambda terms serve both as functions and data." http://www.cs.cornell.edu/courses/cs6110/2013sp/lectures/lec02-sp13.pdf And from the same notes: "So, what is a value? In the pure lambda calculus, any abstraction is a value. Remember, an abstraction λx:e is a function; in the pure lambda calculus, the only values are functions. In an applied lambda calculus with integers and arithmetic operations, values also include integers. Intuitively, a value is an expression that can not be reduced/executed/simplified any further." [...] > The lambda calculus comment is just an aside. The main point is that you > shouldn't lead people to believe that Python has variables that are any > different than, say, Pascal's variables (even if you, for whatever > reason, want to call them "names"). They are memory slots that hold > values until you assign new values to them. Nevertheless, they are still different. My computer has an ethernet slot and a USB slot, and while they are both slots that hold a cable and transmit information in and out of the computer, they are nevertheless different. The differences are just as important as the similarities. > It *is* true that Python has a more limited data model than Pascal (all > of Python's values are objects in the heap and only accessible through > pointers). Calling it "more limited" is an inaccurate and pejorative way of putting it. Rather, I would say it is a more minimalist, *elegant* data model: * a single kind of variable (objects in the heap where the interpreter manages the lifetime of objects for you) as opposed to Pascal's more complex and more difficult model: * two kinds of variables: - first-class variables that the compiler manages for you (allocating and deallocating them on the stack) - second-class variables that the programmer has to manage manually (declaring pointers, allocating memory by hand, tracking the lifetime of the memory block yourself, deallocating it when you are done, and carefully avoiding accessing the pointed-to memory block after deallocation). At least more modern languages with both value-types and reference-types (such as Java, C#, Objective C, Swift) manage to elevate their reference- type variables to first-class citizenship. > Also, unlike Pascal, variables can hold (pointers to) values > of any type. IOW, Python has the data model of Lisp. > > Lisp talks about binding and rebinding variables as well: > >https://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node79.html> > > which might be Lambda Calculus legacy, but at least they are not shy to > talk about variables and assignment. 
-- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: coding style - where to declare variables
On Mon, 23 Jul 2018 09:22:55 +0300, Marko Rauhamaa wrote: > Dennis Lee Bieber : [...] >> In my world, Java and Python are the ones that are not "common". > > Yes, "boxed" is a Java term. However, the programming pattern of using > dynamic memory and pointers is ubiquitous and ancient: Not that ancient -- the first version(s) of Fortran didn't have dynamic memory allocation or pointers. (Admittedly, Lisp did follow not long afterwards.) But it is certainly not ubiquitous: many languages don't have pointers at all. > FILE *f = fopen("xyz", "r"); > > where f holds a pointer, fopen() returns a pointer, and "xyz" and "r" > evaluate to pointer values. > > In Python, every expression evaluates to a pointer and every variable > holds a pointer. Within the semantics of the Python language, there are no pointer values, no way to get a pointer to a memory location or a pointer to an object. No expression in Python evaluates to a pointer, no variables hold pointers in Python. The Python language is defined in terms of objects: expressions evaluate to objects, and variables are names bound to objects. If you don't believe me, believe the interpreter: # Marko expects a pointer, but unfortunately he gets an int py> type(1 + 2) Marko is making a similar category error as those who insist that Python uses "call by reference" or "call by value" for parameter passing. He mistakes an irrelevant implementation detail used by *some* but not all Python interpreters[1] for entities which exist in the Python computation model. As Fredrick puts it: "Joe, I think our son might be lost in the woods" "Don't worry, I have his social security number" http://effbot.org/zone/call-by-object.htm (The *pointer to an object* used in the implementation is not the same as the object itself.) Evaluating 1 + 2 gives the value (an object) 3, not a pointer to the value 3. Pointers are not merely "not first-class citizens" of Python, they aren't citizens at all: there is nothing we can do in pure Python to get hold of pointers, manipulate pointers, or dereference pointers. https://en.wikipedia.org/wiki/First-class_citizen Pointers are merely one convenient, useful mechanism to implement Python's model of computation in an efficient manner on a digital computer. They are not part of the computation model, and pointers are not values available to the Python programmer[2]. [1] The CPython interpreter uses pointers; the Jython interpreter uses whatever kind of memory indirection the JVM provides; when I emulate a Python interpreter using pencil and paper, there's not a pointer in sight but a lot of copying of values and crossing them out. ("Copy on access" perhaps?) A Python interpreter emulated by a Turing machine would use dots on a long paper tape, and an analog computer emulating Python would use I-have-no-idea. Clockwork? Hydraulics? https://en.wikipedia.org/wiki/MONIAC https://makezine.com/2012/01/24/early-russian-hydraulic-computer/ [2] Except by dropping into ctypes or some other interface to the implementation, and even then the pointers have to be converted to and from int objects as they cross the boundary between the Python realm and the implementation realm. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: coding style - where to declare variables
On Mon, 23 Jul 2018 11:49:37 +0300, Marko Rauhamaa wrote: > People new to Python are unnecessarily confused by talking about names > and binding when it's really just ordinary variables and assignment. It really isn't, not to those people who expect ordinary variables and assignment to be the same as that of C, C++, C#, Objective C, Swift, Pascal, Java, Go etc. There are at least two common models for the association between symbolic names and values in programming: 1. variables are named boxes at a statically-allocated, fixed location in memory, usually on the stack ("value types"); 2. variables are names that refer to dynamically-allocated objects in the heap, often movable ("reference types"). It is absolutely true that both are "variables" of a kind, and that "name binding" is abstract enough to refer to both models. But in *practice*, the influence of Algol, C and BASIC especially is so great that many people think of variables and assignment exclusively in the first sense. Since Python uses the second sense, having a distinct name to contrast the two is desirable, and "name binding" seems to fit that need. I no longer believe that we should actively avoid the word "variable" when referring to Python. I think that's an extreme position which isn't justified. But "name binding" is an accurate technical term and not that hard to understand (on a scale of 0 to "monad", it's about 1) and I think it is elitist to claim that "people new to Python"[1] will necessarily be confused and we therefore ought to avoid the term. There are lots of confusing terms and concepts in Python. People learn them. Name binding is no different. [1] What, all of them? Even those with a comp sci PhD and 40 years programming experience in two dozen different languages? -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: coding style - where to declare variables
On Mon, 23 Jul 2018 20:24:30 +1200, Gregory Ewing wrote: > Steven D'Aprano wrote: >> So let me see if I understand your argument... >> >> - we should stop using the term "binding", because it means >> nothing different from assignment; >> - binding (a.k.a. "assignment") comes from lambda calculus; >> - which has no assignment (a.k.a. "binding"). > > No, that's not what Marko is saying at all. He's pointing out that the > term "binding" means something completely different in lambda calculus. Well done in reading Marko's intent. Unfortunately, I'm not as good as inferring meaning as you seem to be, consequently I had to judge by what he wrote, not what he meant. When a writer fails to communicate their intent, that's usually the failure of the writer, not the reader. We aren't mind-readers and writers should not blame the reader when they fail to communicate their intended meaning. > The terms "bound variable" and "free variable" in lambda calculus mean > what in Python we would call a "local variable" vs. a "non-local > variable". Actually, no, they are called "bound variable" and "free variable" in Python too. https://docs.python.org/3/reference/executionmodel.html See also: http://effbot.org/zone/closure.htm Alas, I don't think Fredrik Lundh got it *quite* right. I think that globals (and builtins) in Python are "open free variables", as opposed to nonlocals which are closed. And sadly, the Python glossary currently doesn't define free variables nor bound variables, or even name binding. > They have nothing to do with assignment at all. That's not quite correct either. Lambda calculus has the concept of a binding operator, which is effectively an assignment operator: it takes a variable and a value and binds the value to the variable, changing a free variable to a bound variable. In other words, it assigns the value to the variable, just like assignment does. In Python terms, = is a binary binding operator: it takes a left hand operand, the variable (a name, for the sake of simplicity) and a right hand operand (a value) and binds the value to the name. > Marko is asking us to stop using the word "binding" to refer to > assignment because of the potential confusion with this other meaning. Marko has some idiosyncratic beliefs about Python (and apparently other languages as well) that are difficult to justify. Especially in this case. Anyone who understands lambda calculus is unlikely to be confused by Python using the same terms to mean something *almost identical* to what they mean in lambda calculus. (The only difference I can see is that lambda calculus treats variables as abstract mathematical entities, while Python and other programming languages vivify them and give them a concrete implementation.) If one in ten thousand programmers are even aware of the existence of lambda calculus, I would be surprised. To give up using perfectly good, accurate terminology in favour of worse, less accurate terminology in order to avoid unlikely and transient confusion among a minuscule subset of programmers seems a poor tradeoff to me. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: coding style - where to declare variables
On Sun, 22 Jul 2018 17:50:06 -0400, Dennis Lee Bieber wrote: > On Mon, 23 Jul 2018 00:08:00 +0300, Marko Rauhamaa > declaimed the following: > >>I Java terms, all Python values are boxed. That's a very usual pattern >>in virtually all programming languages (apart from FORTRAN). >> >> > FORTRAN, C, COBOL, BASIC, Pascal, ALGOL, BCPL, REXX, VMS DCL, > probably R, Matlab, APL. > > I never encountered the term "boxed" until trying to read some of > the O'Reilly books on Java. > > In my world, Java and Python are the ones that are not "common". Indeed. Its not just older languages from the 60s and 70s with value-type variables. Newer languages intended as systems languages, like Rust and Go, do the same. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: coding style - where to declare variables
On Mon, 23 Jul 2018 00:08:00 +0300, Marko Rauhamaa wrote: > Would you call it binding in this case: > >X[0]["z"] = getit() >X[3]["q"] = X[0]["z"] >X[0]["z"].changeit() It is a binding, but it is not a *name* binding. Since we are talking about name bindings, and comparing/contrasting them to variable assignment in classical languages, I don't think that binding to slots in hash tables or arrays is relevant except to muddy the waters and make things more complicated than they need be. > I think what you are talking about is more usually called "referencing." I don't think so. Its certainly not a term I've ever heard in this context before. >> With a language with more ‘classical’ variable, the assignment of Y = X >> would normal make a copy of that object, so the value Y does not get >> changed by X.changeit(). > > I Java terms, all Python values are boxed. Correct. Java mixes two different models of variable assignment: it uses classical C- and Pascal-like variable assignment for primitive values, and Lisp- and Smalltalk-like name binding for boxed values (objects), leading to two distinct sets of behaviour. That makes Java a good lesson in why it is useful to distinguish between two models of name binding. Java is not the only language with the distinction between "value types" (primitive values usually stored on the stack) and "reference types" (usually objects stored in the heap). C# and other .Net languages often make that distinction: http://net-informations.com/faq/general/valuetype-referencetype.htm Swift is another such language. Other languages which use primarily or exclusively value-types (i.e. the "variables are a named box at a fixed memory location" model) include Algol, Pascal, Modula-3, C, C++, C#, Objective C, D, Swift, COBOL, Forth, Ada, PL/I, Rust and many others. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
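A concrete way to see the difference being discussed (plain Python, names invented for the example): after Y = X there is only one object, reachable under two names, so mutating it through either name is visible through the other.

    X = {"z": []}
    Y = X                      # binds the same dict object to a second name
    Y["z"].append("changed")

    print(X)                   # {'z': ['changed']}
    print(X is Y)              # True: one object, two names

A language with 'classical' value-type variables would have given Y its own copy at the assignment, and the append would not have shown up through X.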
Re: coding style - where to declare variables
On Sun, 22 Jul 2018 22:50:52 +0300, Marko Rauhamaa wrote: > I wish people stopped talking about "name binding" and "rebinding," > which are simply posh synonyms for variable assignment. Properly, the > term "binding" comes from lambda calculus, whose semantics is defined > using "bound" and "free" variables. Lambda calculus doesn't have > assignment. So let me see if I understand your argument... - we should stop using the term "binding", because it means nothing different from assignment; - binding (a.k.a. "assignment") comes from lambda calculus; - which has no assignment (a.k.a. "binding"). Which leads us to the conclusion that lambda calculus both has and doesn't have binding a.k.a. assignment at the same time. Perhaps it is a quantum phenomenon. Are you happy with the contradiction inherent in your statements, or would you prefer to reword your argument? -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Non-GUI, single processort inter process massaging - how?
On Sat, 21 Jul 2018 09:07:23 +0100, Chris Green wrote: [...] > I want to be able to interrogate the server process from several client > processes, some will interrogate it multiple times, others once only. > They are mostly (all?) run from the command line (bash). This sounds like a good approach for signals. Your server script sets up one or more callbacks that print the desired information to stdout, or writes it to a file, whichever is more convenient, and then you send the appropriate signal to the server process from the client processes. At the bash command line, you use the kill command: see `man kill` for details. Here's a tiny demo: # === cut === import signal, os, time state = 0 def sig1(signum, stack): print(time.strftime('it is %H:%m:%S')) def sig2(signum, stack): print("Current state:", stack.f_globals['state']) # Register signal handlers signal.signal(signal.SIGUSR1, sig1) signal.signal(signal.SIGUSR2, sig2) # Print the process ID. print('My PID is:', os.getpid()) while True: state += 1 time.sleep(0.2) # === cut === Run that in one terminal, and the first thing it does is print the process ID. Let's say it prints 12345, over in another terminal, you can run: kill -USR1 12345 kill -USR2 12345 to send the appropriate signals. To do this programmatically from another Python script, use the os.kill() function. https://docs.python.org/3/library/signal.html https://pymotw.com/3/signal/ -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
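The client side of that, from another Python process, is just as short (12345 stands in for whatever PID the server printed at startup):

    import os
    import signal

    SERVER_PID = 12345                      # replace with the server's real PID
    os.kill(SERVER_PID, signal.SIGUSR1)     # ask the server for the time
    os.kill(SERVER_PID, signal.SIGUSR2)     # ask the server for its current state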
Re: Better way / regex to extract values form a dictionary
On Sat, 21 Jul 2018 17:07:04 +0530, Ganesh Pal wrote: > I have one of the dictionary values in the below format > > '/usr/local/ABCD/EDF/ASASAS/GTH/HELLO/MELLO/test04_Failures.log' > '/usr/local/ABCD/EDF/GTH/HEL/OOLO/MELLO/test02_Failures.log' > '/usr/local/ABCD/EDF/GTH/BEL/LO/MELLO/test03_Failures.log' > > I need to extract the file name in the path example, say > test04_Failure.log and testcase no i.e test04 The dictionary is irrelevant to your question. It doesn't matter whether the path came from a dict, a list, read directly from stdin, an environment variable, extracted from a CSV file, or plucked directly from outer space by the programmer. The process remains the same regardless of where the path came from. import os path = '/usr/local/ABCD/EDF/ASASAS/GTH/HELLO/MELLO/test04_Failures.log' filename = os.path.basename(path) print filename # prints 'test04_Failures.log' testcase, remaining_junk = filename.split('_', 1) print testcase # prints 'test04' -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: try except inside a with open
On Fri, 20 Jul 2018 23:29:21 +0530, Ganesh Pal wrote: > Dear python Friends, > > > I need a quick suggestion on the below code. > > def modify_various_line(f): > """ Try modifiying various line """ > try: > f.write('0123456789abcdef') > f.seek(5) # Go to the 6th byte in the file > print f.read(1) > f.seek(-3, 2) # Go to the 3rd byte before the end > print f.read(1) > f.write('END') > except IOError as e: >raise > return True (1) Since this function always returns True (if it returns at all), what is the point? There's no point checking the return result, since it's always true, so why bother returning anything? (2) What's the point of catching an exception only to immediately, and always, re-raise it? It seems to me that your code above is better written like this: def modify_various_line(f): """ Try modifying various line """ f.write('0123456789abcdef') f.seek(5) # Go to the 6th byte in the file print f.read(1) f.seek(-3, 2) # Go to the 3rd byte before the end print f.read(1) f.write('END') > def h(): > try: > with open('/tmp/file.txt', 'r+') as f: > try: > modify_various_line(f) > except Exception as e: >print e > except IOError as e: > print(e) Debugging is hard enough without destroying useful debugging information. Tracebacks are not your enemy to be hidden and suppressed (at least not during development) but your greatest friend in the world, one who tells you the embarrassing errors you have made (bugs) so you can fix them. https://realpython.com/the-most-diabolical-python-antipattern/ def h(): with open('/tmp/file.txt', 'r+') as f: modify_various_line(f) is much shorter, easier to read, and if an error occurs, you get the benefit of the full traceback not just the abbreviated error message. Tracebacks are printed to standard error, not standard out, so they can be redirected to a log file more easily. Or you can set an error handler for your entire application, so that in production any uncaught exception can be logged without having to fill your application with boilerplate "try...except...print >>sys.stderr, err". But if you *really* have to catch the exception and suppress the traceback, try this: def h(): try: with open('/tmp/file.txt', 'r+') as f: modify_various_line(f) except IOError as e: print(e) There's no need to nest two except clauses, both of which do the same thing with an exception, and one of which will cover up bugs in your code as well as expected failures. > (1) Can we use try and expect in a 'with open' function as shown in > the below example code . Yes. > (2) If I hit any other exceptions say Value-error can I catch them as > show below If you hit ValueError, that is almost always a bug in your code. That's exactly the sort of thing you *shouldn't* be covering up with an except clause unless you really know what you are doing. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: [OT] Bit twiddling homework
On Fri, 20 Jul 2018 08:25:04 +0200, Brian Oney via Python-list wrote: > PS: Can I twiddle bits in Python? Yes. These operators work on ints: bitwise AND: & bitwise OR: | bitwise XOR: ^ -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
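For example (there are also the shifts << and >> and bitwise NOT ~, not listed above):

    x = 0b1100
    y = 0b1010

    print(bin(x & y))   # 0b1000
    print(bin(x | y))   # 0b1110
    print(bin(x ^ y))   # 0b110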
Re: Glyphs and graphemes [was Re: Cult-like behaviour]
On Thu, 19 Jul 2018 20:34:26 +0200, Christian Gollwitzer wrote: > Am 19.07.2018 um 14:50 schrieb Gregory Ewing: >> Chris Angelico wrote: >>> On Thu, Jul 19, 2018 at 4:41 PM, Gregory Ewing >>> wrote: >>> >>>> (Google doesn't seem to think so -- it asks me whether I meant >>>> "assist shop". Although it does offer to translateč it into Czech...) >>> >>> Into or from?? I'm thoroughly confused now! >> >> Hard to tell. This is what the link said: >> >> assistshop - Czech translation - bab.la English-Czech dictionary >> https://en.bab.la/dictionary/english-czech/assistshop Translation for >> 'assistshop' in the free English-Czech dictionary and" many other Czech >> translations. > > Well that link tries to translate "assistshop" into the czech word > "prodavač" which is the usual word for a person in a shop who consults > the customers and sells the goods to them; I don't know if "assist shop" > in English comes close, as I don't understand it (I'm a native German > speaker) In English, that would be "shop assistant". "Assist shop" would be grammatically incorrect: it should be written as "assist the shop", meaning "help the shop". Relevant: https://www.theatlantic.com/technology/archive/2018/01/the-shallowness-of-google-translate/551570/ -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: test for absence of infinite loop
On Tue, 17 Jul 2018 10:10:49 +0100, Robin Becker wrote:

> A user reported an infinite loop in reportlab. I determined a possible
> cause and fix and would like to test for absence of the loop. Is there
> any way to check for presence/absence of an infinite loop in python? I
> imagine we could do something like call an external process and see if
> it takes too long, but that seems a bit flaky.

In general, no, it is impossible to detect infinite loops.

https://en.wikipedia.org/wiki/Halting_problem

That's not to say that either human readers or the compiler can't detect *some* infinite loops ahead of time:

    # obviously an infinite loop
    while True:
        pass

and then there's this:

https://www.usenix.org/legacy/publications/library/proceedings/vhll/full_papers/koenig.a

but Python's compiler isn't capable of anything like that.

The way I sometimes deal with that sort of thing is to re-write selected potentially-infinite loops:

    while condition:  # condition may never become False
        do something

to something like this:

    for counter in range(1000):
        if not condition:
            break
        do something
    else:
        raise TooManyIterationsError

-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Glyphs and graphemes [was Re: Cult-like behaviour]
On Tue, 17 Jul 2018 10:51:38 +0300, Marko Rauhamaa wrote: > in which Python3's honor is defended in a good many of the discussions > in this newsgroup: anger, condescension, ridicule, name-calling. You call it defending Python 3's honour. I call it responding to people who insist on spreading misinformation and falsehoods even when given the correct details. Some people have their self-image wrapped up in being able to portray themselves as a maverick who, almost alone, sees through the "lies" to see "the truth". Others prefer reality instead, and get upset when false facts are repeated, over and over again, as truth. If instead you want to discuss actual concrete areas where Python's text/bytes divide hurts, you'll find that there are plenty of people who agree. Especially if they have to write string-handling code that needs to run under both 2 and 3. Been there, done that, don't want to do it again. The Python 3 redesign was done to fix certain common, hard-to-diagnose problems in string handling caused by Python2's violation of the Zen "in the face of ambiguity, refuse the temptation to guess". (Python 2 guesses what encoding you probably mean when it comes to strings and bytes, and when it gets it right it is convenient, but when it gets it wrong, it is badly wrong, and hard to diagnose and fix.) It is impossible to improve the text handling experience for every single programmer writing every single kind of program under every single set of circumstances. Like any semantic change, there are going to be winners and losers, and the core devs' position is that if the losers have concrete and backwards-compatible suggestions for improving their experience (e.g. re-adding % support for byte strings) they will consider them, but going back to the Python 2 misdesign is off the table. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Glyphs and graphemes [was Re: Cult-like behaviour]
On Tue, 17 Jul 2018 15:20:16 +0900, INADA Naoki wrote (replying to Marko): > I still don't understand what's your original point. I think UTF-8 vs > UTF-32 is totally different from Python 2 vs 3. > > For example, string in Rust and Swift (2010s languages!) are *valid* > UTF-8. There are strong separation between byte array and string, even > they use UTF-8. They looks similar to Python 3, not Python 2. > > And Python can use UTF-8 for internal encoding in the future. AFAIK, > PyPy tries it now. After they succeeded, I want to try port it to > CPython after we removed legacy Unicode APIs. (ref PEP 393) I'm not sure about PyPy, but I'm fairly certain that MicroPython uses UTF-8. I would be very interested to see the results of using UTF-8 in CPython. At the least, it would remove the need to keep a separate UTF-8 representation in the string object, as they do now. It might even be more compact, although a naive implementation would lose the ability to do constant time indexing into strings. That might be a tradeoff worth keeping, if indexing remained sufficiently fast. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
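To make the indexing trade-off above concrete, here is a small illustration (example added here, not from the post): in a UTF-8 buffer the n-th code point can start anywhere, so a naive lookup has to scan from the beginning.

s = "a\u00e9\u20ac\U0001D11E"   # a, é, €, 𝄞: 1-, 2-, 3- and 4-byte characters in UTF-8
data = s.encode("utf-8")
print(len(s), len(data))        # 4 code points, 10 bytes

def codepoint_at(buf, index):
    # O(n) scan: count lead bytes (anything that is not a 0b10xxxxxx
    # continuation byte) until we reach the requested code point.
    count = -1
    for i, byte in enumerate(buf):
        if byte & 0xC0 != 0x80:
            count += 1
            if count == index:
                j = i + 1
                while j < len(buf) and buf[j] & 0xC0 == 0x80:
                    j += 1
                return buf[i:j].decode("utf-8")
    raise IndexError(index)

print(codepoint_at(data, 3))    # the 4-byte character

An implementation could of course cache offsets or keep an index table, which is presumably the sort of trade-off the PyPy and MicroPython work has to weigh.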
Re: Glyphs and graphemes [was Re: Cult-like behaviour]
On Tue, 17 Jul 2018 09:52:13 +0300, Marko Rauhamaa wrote:

> Both Python2 and Python3 provide two forms of string, one containing
> 8-bit integers and another one containing 21-bit integers.

Why do you insist on making counter-factual statements as facts? Don't you have a Python REPL where you can try these outrageous claims out before making them?

py> b'abcd'[2] + 1  # bytes are sequences of integers
100
py> 'abcd'[2] + 1   # strings are not sequences of integers
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't convert 'int' object to str implicitly

Python strings are sequences of abstract characters.

-- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Glyphs and graphemes [was Re: Cult-like behaviour]
On Tue, 17 Jul 2018 08:26:45 +0300, Marko Rauhamaa wrote: > Steven D'Aprano : >> On Mon, 16 Jul 2018 22:51:32 +0300, Marko Rauhamaa wrote: >>> UTF-8 bytes can only represent the first 128 code points of Unicode. >> >> This is DailyWTF material. Perhaps you want to rethink your wording and >> maybe even learn a bit more about Unicode and the UTF encodings before >> making such statements. >> >> The idea that UTF-8 bytes cannot represent the whole of Unicode is not >> even wrong. Of course a *single* byte cannot, but a single byte is not >> "UTF-8 bytes". > > So I hope that by now you have understood my point and been able to > decide if you agree with it or not. If your point was not what you wrote, then no, I'm sorry, my crystal ball unexpectedly broke down (why it didn't foresee its own failure I'll never know...). I can't tell what you are thinking, only what you write. Sometimes I can guess (like my earlier guess that you meant grapheme, rather than glyph) but in this case, if you mean something other than "UTF-8 bytes can only represent the first 128 code points of Unicode" I'm flummoxed. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Glyphs and graphemes [was Re: Cult-like behaviour]
On Mon, 16 Jul 2018 21:25:20 -0500, Tim Chase wrote: > On 2018-07-17 01:08, Steven D'Aprano wrote: >> In English, I think most people would prefer to use a different term >> for whatever "sh" and "ch" represent than "character". > > The term you may be reaching for is "consonant cluster"? > > https://en.wikipedia.org/wiki/Consonant_cluster Thanks! -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Glyphs and graphemes [was Re: Cult-like behaviour]
On Mon, 16 Jul 2018 21:48:42 -0400, Richard Damon wrote: >> On Jul 16, 2018, at 9:21 PM, Steven D'Aprano >> wrote: >> >>> On Mon, 16 Jul 2018 19:02:36 -0400, Richard Damon wrote: >>> >>> You are defining a variable/fixed width codepoint set. Many others >>> want to deal with CHARACTER sets. >> >> Good luck coming up with a universal, objective, language-neutral, >> consistent definition for a character. >> > Who says there needs to be one. A good engineer will use the definition > that is most appropriate to the task at hand. Some things need very > solid definitions, and some things don’t. Then the problem is solved: we have a perfectly good de facto definition of character: it is a synonym for "code point", and every single one of Marko's objections disappears. > This goes back to my original point, where I said some people consider > UTF-32 as a variable width encoding. For very many things, practically, > the ‘codepoint’ isn’t the important thing, Ah, is this another one of those "let's pick a definition that nobody else uses, and state it as a fact" like UTF-32 being variable width? If by "very many things", you mean "not very many things", I agree with you. In my experience, dealing with code points is "good enough", especially if you use Western European alphabets, and even more so if you're willing to do a normalization step before processing text. But of course other people's experience may vary. I'm interested in learning about the library you use to process graphemes in your software. > so the fact that every UTF-32 > code point takes the same number of bytes or code words isn’t that > important. They are dealing with something that needs to be rendered and > preserving larger units, like the grapheme is important. If you're writing a text widget or a shell, you need to worry about rendering glyphs. Everyone else just delegates to their text widget, GUI framework, or shell. >>> This doesn’t mean that UTF-32 is an awful system, just that it isn’t >>> the magical cure that some were hoping for. >> >> Nobody ever claimed it was, except for the people railing that since it >> isn't a magical system we ought to go back to the Good Old Days of >> code page hell, or even further back when everyone just used ASCII. >> > Sometimes ASCII is good enough, especially on a small machine with > limited resources. I doubt that there are many general purpose computers with resources *that* limited. Even MicroPython supports Unicode, and that runs on embedded devices with memory measured in kilobytes. 8K is considered the smallest amount of memory usable with MicroPython, although 128K is more realistic as the *practical* lower limit. In the mid 1980s, I was using computers with 128K of RAM, and they were still able to deal with more than just ASCII. I think the "limited resources" argument is bogus. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
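On the question of libraries for processing graphemes: the third-party regex module is one such library, since its \X pattern matches extended grapheme clusters. A tiny illustration (example added here, not from the post):

import regex    # third-party: pip install regex

s = "man\u0303ana"                  # n followed by COMBINING TILDE
print(len(s))                       # 7 code points
clusters = regex.findall(r"\X", s)
print(len(clusters))                # 6 grapheme clusters: the n and its tilde count as one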
Re: Glyphs and graphemes [was Re: Cult-like behaviour]
On Mon, 16 Jul 2018 22:51:32 +0300, Marko Rauhamaa wrote: > All UTF-8. No unicode strings. That just means you are re-implementing the bits of Unicode you care about (which may be "nothing at all") as UTF-8. If your application is nothing but middleware squirting bytes from one layer to another layer, that might be all you need care about. But then you're not processing text in your application, and why should your experience in not-processing-text be given any weight over the experiences of those who do process text? And later, in another post: > UTF-8 bytes can only represent the first 128 code points of Unicode. This is DailyWTF material. Perhaps you want to rethink your wording and maybe even learn a bit more about Unicode and the UTF encodings before making such statements. The idea that UTF-8 bytes cannot represent the whole of Unicode is not even wrong. Of course a *single* byte cannot, but a single byte is not "UTF-8 bytes". -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Glyphs and graphemes [was Re: Cult-like behaviour]
On Mon, 16 Jul 2018 15:28:51 -0400, Terry Reedy wrote: > On 7/16/2018 1:11 PM, Richard Damon wrote: > >> Many consider that UTF-32 is a variable-width encoding because of the >> combining characters. It can take multiple ‘codepoints’ to define what >> should be a single ‘character’ for display. > > I hope you realize that this is not the standard meaning of > 'variable-width encoding', which is 'variable number of bytes for a > codepoint'. A minor correction Terry: it is the number of code units, not bytes. UTF-8 uses 1-byte code units, and from 1 to 4 code units per code point; UTF-16 uses 2-byte code units (a 16-bit word), and 1 or 2 words per code point; UTF-32 uses 4-byte code units (a 32-bit word), and only ever a single code unit for every code point. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
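As a quick check of that terminology (example added here, not from either post), take a character outside the Basic Multilingual Plane and count the code units in each encoding:

ch = "\U0001F600"                    # U+1F600, outside the BMP
print(len(ch.encode("utf-8")))       # 4 bytes -> four 1-byte code units
print(len(ch.encode("utf-16-le")))   # 4 bytes -> two 2-byte code units (a surrogate pair)
print(len(ch.encode("utf-32-le")))   # 4 bytes -> one 4-byte code unit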
Re: Glyphs and graphemes [was Re: Cult-like behaviour]
On Mon, 16 Jul 2018 19:02:36 -0400, Richard Damon wrote: > You are defining a variable/fixed width codepoint set. Many others want > to deal with CHARACTER sets. Good luck coming up with a universal, objective, language-neutral, consistent definition for a character. > This doesn’t mean that UTF-32 is an awful system, just that it isn’t the > magical cure that some were hoping for. Nobody ever claimed it was, except for the people railing that since it isn't a magical system we ought to go back to the Good Old Days of code page hell, or even further back when everyone just used ASCII. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Cult-like behaviour [was Re: Kindness]
On Mon, 16 Jul 2018 23:50:12 +0200, Roel Schroeven wrote: > There are times (encoding/decoding network protocols and other data > formats) when I have a byte string and I want/need to process it like > Python 2 does, and that is the one area where I feel Python 3 make > things a bit more difficult. Ah yes, the unfortunate design error that iterating over byte-strings returns ints rather than single-byte strings. That decision seemed to make sense at the time it was made, but turned out to be an annoyance. It's a wart on Python 3, but fortunately one which is fairly easily dealt with by a helper function. That *is* a nice example of where byte strings in Python 3 aren't as nice as in Python 2. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
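The helper function alluded to above can be as small as this (a sketch added here, not code from the post): slicing a bytes object returns bytes, so yielding one-byte slices restores the Python 2 behaviour.

def iterbytes(data):
    # Yield length-1 bytes objects instead of ints.
    for i in range(len(data)):
        yield data[i:i+1]

print(list(iterbytes(b"abc")))   # [b'a', b'b', b'c']
print(list(b"abc"))              # [97, 98, 99] -- the Python 3 default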
Re: Glyphs and graphemes [was Re: Cult-like behaviour]
On Tue, 17 Jul 2018 06:15:25 +1000, Chris Angelico wrote: > On Tue, Jul 17, 2018 at 4:55 AM, Steven D'Aprano > wrote: >> There is nothing special about diacritics such that we ought to treat >> some combinations like "Ch" (two code points = one character) as "fixed >> width" while others like "â" (two code points = one character) as >> "variable width". > > When you reverse a word, do you treat "ch" and "sh" as one character or > two? In English, "ch" is always two letters of the alphabet. In Welsh and Czech, they can be one or two letters. (I think they will be two letters only in loan words, but I'm not certain about that.) Whether that makes them one or two characters depends on how you define "character". Good luck with finding a universal, objective, unambiguous definition. > I'm of the opinion that they're single characters, and thus this > should be "dalokosh": > > https://wiki.teamfortress.com/wiki/Dalokohs_Bar > > (It's the Russian for "chocolate" - "шоколад" - transliterated to > English/Latin - "šokolad" or "shokolad" - and then reversed.) In English, I think most people would prefer to use a different term for whatever "sh" and "ch" represent than "character". But you make a good point that even in English, we sometimes want to treat two letter combinations as a single unit. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
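For the curious, here is a toy version of the "dalokosh" reversal discussed above, treating a chosen set of digraphs as single units. The digraph list and the function are illustrative additions, not anything from the thread:

DIGRAPHS = ("ch", "sh")

def reverse_with_digraphs(word):
    units = []
    i = 0
    while i < len(word):
        if word[i:i+2].lower() in DIGRAPHS:
            units.append(word[i:i+2])   # keep the digraph together
            i += 2
        else:
            units.append(word[i])
            i += 1
    return "".join(reversed(units))

print(reverse_with_digraphs("shokolad"))   # dalokosh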
Unicode is not UTF-32 [was Re: Cult-like behaviour]
On Mon, 16 Jul 2018 22:40:13 +0300, Marko Rauhamaa wrote: > Terry Reedy : > >> On 7/15/2018 5:28 PM, Marko Rauhamaa wrote: >>> if your new system used Python3's UTF-32 strings as a foundation, >> >> Since 3.3, Python's strings are not (always) UFT-32 strings. > > You are right. Python's strings are a superset of UTF-32. More > accurately, Python's strings are UTF-32 plus surrogate characters. The first thing you are doing wrong is conflating the semantics of the data type with one possible implementation of that data type. UTF-32 is implementation, not semantics: it specifies how to represent Unicode code points as bytes in memory, not what Unicode code points are. Python 3 strings are sequences of abstract characters ("code points") with no mandatory implementation. In CPython, some string objects are encoded in Latin-1. Some are encoded in UTF-16. Some are encoded in UTF-32. Some implementations (MicroPython) use UTF-8. Your second error is a more minor point: it isn't clear (at least not to me) that "Unicode plus surrogates" is a superset of Unicode. Surrogates are part of Unicode. The only extension here is that Python strings are not necessarily well-formed surrogate-free Unicode strings, but they're still Unicode strings. >> Nor are they always UCS-2 (or partly UTF-16) strings. Nor are the >> always Latin-1 or Ascii strings. Python's Flexible String >> Representation uses the narrowest possible internal code for any >> particular string. This is all transparent to the user except for >> memory size. > > How CPython chooses to represent its strings internally is not what I'm > talking about. Then why do you repeatedly talk about the internal storage representation? UTF-32 is not a character set, it is an encoding. It specifies how to implement a sequence of Unicode abstract characters. >>> UTF-32, after all, is a variable-width encoding. >> >> Nope. It a fixed-width (32 bits, 4 bytes) encoding. >> >> Perhaps you should ask more questions before pontificating. > > You mean each code point is one code point wide. But that's rather an > irrelevant thing to state. No, he means that each code point is one code unit wide. > The main point is that UTF-32 (aka Unicode) UTF-32 is not a synonym for Unicode. Many legacy encodings don't distinguish between the character set and the mapping between bytes and characters, but Unicode is not one of those. > uses one or more code points to represent what people would consider an > individual character. That's a reasonable observation to make. But that's not what fixed- and variable-width refers to. So does ASCII, and in both cases, it is irrelevant since the term of art is to define fixed- and variable-width in terms of *code points* not human meaningful characters. "Character" is context- and language- dependent and frequently ambiguous. "LL" or "CH" (for example) could be a single character or a double character, depending on context and language. Even in ASCII English, something as large as "ough" might be considered to be a single unit of language, which some people might choose to call a character. (But not a single letter, naturally.) If you don't like that example, "qu" is probably a better one: aside from acronyms and loan words, no modern English word can fail to follow a Q with a U. > Code points are about as interesting as individual bytes in UTF-8. That's your opinion. I see no justification for it. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." 
-- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
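To illustrate the "plus surrogates" point from the post above (example added here): a Python str may contain a lone surrogate code point, but such a string is not well-formed Unicode and the strict codecs refuse to encode it.

s = "\ud800"              # a lone high surrogate is a legal str value
print(len(s))             # 1
try:
    s.encode("utf-8")
except UnicodeEncodeError as err:
    print(err)            # ... surrogates not allowed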
Python 4000 was Re: [SUSPICIOUS MESSAGE] Re: Cult-like behaviour]
On Mon, 16 Jul 2018 15:09:16 -0400, Terry Reedy wrote: > On 7/16/2018 11:50 AM, Dennis Lee Bieber wrote: > >> For Python 4000 maybe > > Please don't give people the idea that there is any current intention to > have a 'Python 4000' similar to 'Python 3000'. Call it 'a mythical > Python 4000', if you must use such a term. I prefer to say Python 5000, to make it even more clear that should such a thing happen again, it will be a *REALLY* long time from now. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Users banned
On Mon, 16 Jul 2018 20:03:39 +0100, Steve Simmons wrote: > +1 Seems to me Bart is being banned for "being a dick" and "talking > rubbish" (my words/interpretation) with irritating persistence. I know that when I first started here, I often talked rubbish. The difference is, I was willing to listen and consider when people gave alternate viewpoints. Eventually. And I know that some people think that I'm sometimes still being a dick. They're wrong, I'm just charmingly forthright *wink* Bart is often frustratingly resistant to reasonable argument, and has been obnoxious in his habit of turning virtually every conversation into an opportunity to make a dig at Python. But neither of these are prohibited by the CoC, neither of these should be a banning offence, and even if they were, he should have had a formal warning first. Preferably TWO formal warnings: the first privately, the second publicly, and only on the third offence a ban. And I question the fairness of a six month ban, rather than (let's say) an initial one month ban. As for banning Rick, when he isn't even posting at the moment, I don't even have words for that. There's no statute of limitations for murder, but surely "being obnoxious on the internet" ought to come with a fairly short period of forgiveness. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Glyphs and graphemes [was Re: Cult-like behaviour]
On Mon, 16 Jul 2018 14:22:27 -0400, Richard Damon wrote: [...] > But I am not talking about those sort of characters or ligatures, So what? I am. You don't get to say "only non-standard definitions I approve of count". There is the industry standard definition of what it means to be a fixed- or variable-width encoding, which we can all agree on, or we can have a free-for-all where I reject your non-standard meaning and you reject mine and nobody can understand anything that anyone else says. You (generic "you", not necessarily you personally) don't get to demand that I must accept your redefinition, while simultaneously refusing to return the favour. If you try, I will simply dismiss what you say as nonsense on stilts: you (still generic you) clearly don't know what variable-width means and are trying to shift the terms of the debate by redefining terms so that black means white and white means purple. > but > ‘characters’ that are built up of a combining diacritical marks (like > accents) and a base character. Unicode define many code points for the > more common of these, but many others do not. I am aware how Unicode works, and it doesn't change a thing. Fixed/variable width is NOT defined in terms of "characters", but if it were, ASCII would be variable width too. Limiting the definition to only diacritics is just a feeble attempt to wiggle out of the logical consequences of your (generic your) position. There is nothing special about diacritics such that we ought to treat some combinations like "Ch" (two code points = one character) as "fixed width" while others like "â" (two code points = one character) as "variable width". To do so is just special pleading. And the thing about special pleading is that we're not obliged to accept it. Plead as much as you like, the answer is still no. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
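The "â" versus "Ch" comparison in the post above is easy to poke at in the REPL (example added here): the accented letter exists both as one code point and as two, and normalization maps between the two forms, whereas "Ch" is always two code points.

import unicodedata

composed = "\u00e2"           # â as a single precomposed code point
decomposed = "a\u0302"        # a + COMBINING CIRCUMFLEX ACCENT
print(len(composed), len(decomposed))                          # 1 2
print(unicodedata.normalize("NFC", decomposed) == composed)    # True
print(len(unicodedata.normalize("NFC", "Ch")))                 # 2: no precomposed form exists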
Re: Unicode [was Re: Cult-like behaviour]
On Mon, 16 Jul 2018 10:27:18 -0700, Jim Lee wrote: > Had you actually read my words with *intent* rather than *reaction*, you > would notice that I suggested the *option* of turning off Unicode. Yes, I know what you wrote, and I read it with intent. Jim, you seem to be labouring under the misapprehension that anytime somebody spots a flaw in your argument, or an unpleasant implication of your words, it can only be because they must not have read your words carefully. Believe me, that is not the case. YOU are the one who raised the spectre of politically correct groupthink, not me. That's dog-whistle politics. But okay, let's move on from that. You say that all you want is a switch to turn off Unicode (and replace it with what? Kanji strings? Cyrillic? Shift_JIS? no of course not, I'm being absurd -- replace it with ASCII, what else could any right-thinking person want, right?). Let's look at this from a purely technical perspective: Python already has two string data types, bytes and text. You want something that is almost functionally identical to bytes, but to call it text, presumably because you don't want to have to prefix your strings with a b"" (that was also Marko's objection to byte strings). Let's say we do it. Now we have three string implementations that need to be added, documented, tested, maintained, instead of two. (Are you volunteering to do this work?) Now we need to double the testing: every library needs to be tested twice, once with the "Unicode text" switch on, once with it off, to ensure that features behave as expected in the appropriate mode. Is this switch a build-time option, so that we have interpreters built with support for Unicode and interpreters built without it? We've been there: it's a horribly bad idea. We used to have Python builds with threading support, and others without threading support. We used to have Python builds with "wide Unicode" and others with "narrow Unicode". Nothing good comes of this design. Or perhaps the switch is a runtime global option? Surely you can imagine the opportunities for bugs, both obvious crashing bugs and non-obvious silent failure bugs, that will occur when users run libraries intended for one mode under the other mode. Not every library is going to be fully tested under both modes. Perhaps it is a compile-time option that only affects the current module, like the __future__ imports. That's a bit more promising, it might even use the __future__ infrastructure -- but then you have the problem of interaction between modules that have this switch enabled and those that have it disabled. More complexity, more cruft, more bugs. It's not clear that your switch gives us *any* advantage at all, except the warm fuzzy feelings that no dirty foreign characters might creep into our pure ASCII strings. Hmm, okay, but frankly apart from when I copy and paste code from the internet and it ends up bringing in en-dashes and curly quotes instead of hyphens and type-writer quotes, that never happens to me by accident, and I'm having a lot of trouble seeing how it could. If you want ASCII byte strings, you have them right now -- you just have to use the b"" string syntax. If you want ASCII strings without the b prefix, you have them right now. Just use only ASCII characters in your strings.
I'm simply not seeing the advantage of:

from __future__ import no_unicode
print("Hello World!")  # stand in for any string handling on ASCII

over

print("Hello World!")

which works just as well if you control the data you are working with and know that it is pure ASCII.

-- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Glyphs and graphemes [was Re: Cult-like behaviour]
On Mon, 16 Jul 2018 13:11:23 -0400, Richard Damon wrote: >> On Jul 16, 2018, at 12:51 PM, Steven D'Aprano >> wrote: >> >>> On Mon, 16 Jul 2018 00:28:39 +0300, Marko Rauhamaa wrote: >>> >>> if your new system used Python3's UTF-32 strings as a foundation, that >>> would be an equally naïve misstep. You'd need to reach a notch higher >>> and use glyphs or other "semiotic atoms" as building blocks. UTF-32, >>> after all, is a variable-width encoding. >> >> Python's strings aren't UTF-32. They are sequences of abstract code >> points. >> >> UTF-32 is not a variable-width encoding. >> >> -- >> Steven D'Aprano >> >> > Many consider that UTF-32 is a variable-width encoding because of the > combining characters. It can take multiple ‘codepoints’ to define what > should be a single ‘character’ for display. Ah, well if we're going to start making up our own definitions of terms, then ASCII is a variable-width encoding too. "Ch" (a single letter of the alphabet in a number of European languages, including Welsh and Czech) requires two code points in ASCII. Even in English, "qu" could be considered a two-byte "character" (grapheme), and for ASCII users, (c) is a THREE code point character for what ought to be a single character ©. The standard definition of variable- and fixed-width encodings refers to how many *code units* is required to make up a single *code point*. Under that standard definition, UTF-8 and UTF-16 are variable-width, and UTF-32 is fixed-width. But I'll accept that UTF-32 is variable-width if Marko accepts that ASCII is too. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Unicode [was Re: Cult-like behaviour]
On Tue, 17 Jul 2018 02:22:59 +1000, Chris Angelico wrote:
> On Tue, Jul 17, 2018 at 2:05 AM, Mark Lawrence wrote:
>> Out of curiosity where does my mum's Welsh come into the equation as I
>> believe that it is not recognised by the EU as a language?
>
> What characters does it use? Mostly Latin letters?

Yes, Welsh uses the Latin script. It has an alphabet of 29 letters (including 8 digraphs), plus four diacritics used on some vowels:

circumflex    e.g. â
acute accent  e.g. é
diaeresis     e.g. ï
grave accent  e.g. ẁ

Yes, w is a vowel in Welsh -- and very occasionally in English as well.

http://www.dictionary.com/e/w-vowel/

Accented vowels are not considered separate letters.

https://en.wikipedia.org/wiki/Welsh_orthography

Some older sources will exclude J (making 28 letters). Patagonian Welsh also includes the letter "V", although that's non-standard.

-- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Glyphs and graphemes [was Re: Cult-like behaviour]
On Mon, 16 Jul 2018 00:28:39 +0300, Marko Rauhamaa wrote: > if your new system used Python3's UTF-32 strings as a foundation, that > would be an equally naïve misstep. You'd need to reach a notch higher > and use glyphs or other "semiotic atoms" as building blocks. UTF-32, > after all, is a variable-width encoding. Python's strings aren't UTF-32. They are sequences of abstract code points. UTF-32 is not a variable-width encoding. I don't know what *you* mean by "semiotic atoms" (possibly you mean graphemes?) but "glyphs" are the visual images of characters, and there's a virtual infinity of those for each character, differing in type-face, size, and style (roman, italic, bold, reverse-oblique, etc). There is no evidence aside from your say-so that a programming language "need" support "glyphs" as a native data type, or even graphemes. For starters, such a system would be exceedingly complex: graphemes are both language and context dependent. English, for example, has around 250 distinct graphemes: https://books.google.com.au/books?id=QrBQAmfXYooC&pg=PT238&lpg=PT238&dq=250+graphemes&source=bl&ots=abiymnQ5pq&sig=eq3k06BkuGfpuGC6wKqPkCR_8Bw&hl=en&sa=X&ei=HAdyUbfULpCnqwGRi4DYAg&redir_esc=y Certainly it would be utterly impractical for a programming language designer, knowing nothing but a few half-remembered jargon terms, to try to design a native string type that matched the grapheme rules for the hundreds of human languages around the world. Or even just for English. Let third-party libraries blaze that trail first. By no means is Unicode the last word in text processing. It might not even be the last word in native string types for programming languages. But it is a true international standard which provides a universal character set and a selection of useful algorithms able to be used as powerful building blocks for text-processing libraries. Honestly Marko, your argument strikes me as akin to somebody who insists that because Python's float data type doesn't support full CAS (computer algebra system) and theorem prover, it's useless and a step backwards and we should abandon IEEE-754 float semantics and let users implement their own floating point maths using nothing but fixed 1-byte integers. A float, after all, is nothing but 8 bytes. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
I18N and Unicode [was Re: Cult-like behaviour]
On Sun, 15 Jul 2018 17:28:15 -0700, Jim Lee wrote: > Unicode is an attempt to solve at least one I18N issue If you're going to insist on digging your heels in and using definitions which nobody else does, this discussion is going to go nowhere fast. Unicode is (ideally) a universal character set; in practice it is an industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. I18N is recognised as the abbreviation for internationalization and localization. https://en.wikipedia.org/wiki/Internationalization_and_localization There is no overlap between the two: Unicode doesn't help with internationalization (except in the non-trivial but purely mechanical sense that it removes the need for metadata specifying the current code page), and internationalization doesn't require Unicode: (1) Unicode provides no support for internationalization or localization. Just because I have the Unicode string "street" in my application, doesn't mean it magically transforms to "Straße" when used by German users. (2) Internationalization can occur even between groups of users who share a single character set, even ASCII. My application might display "Rubbish Bin" in the UK and Australia and "Trash Can" in the USA. If you think that Unicode is about internationalization, you are labouring under serious misapprehensions about the nature of both Unicode and internationalization. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
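For readers wondering where internationalization does happen in Python, the standard gettext module is the usual route. This is only a minimal sketch (the domain name and locale directory are hypothetical, and a compiled message catalogue must already exist for the German text to appear):

import gettext

t = gettext.translation("myapp", localedir="locale",
                        languages=["de"], fallback=True)
_ = t.gettext
print(_("street"))   # "Straße" if a German catalogue supplies it, otherwise "street"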
Re: Cult-like behaviour [was Re: Kindness]
On Sun, 15 Jul 2018 16:38:41 -0700, Jim Lee wrote: > As I said, there are programming situations where the programmer only > needs to deal with a single language - his own. This might come as a shock to you, but just because Python's native string type supports (for example) the Devanagari alphabet, that doesn't mean you are forced to use it in your code or application. # Look ma, not a single Cyrillic or Greek or Tagalog letter in sight! label = "something interesting" Don't worry, the UN Language Police aren't going to force you at gunpoint to label your output in Khmer, Hiragana and Gujarati if you don't want to. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Unicode [was Re: Cult-like behaviour]
On Mon, 16 Jul 2018 14:17:35 +, Dan Sommers wrote: > On Mon, 16 Jul 2018 10:39:49 +0000, Steven D'Aprano wrote: > >> ... people who think that if ISO-8859-7 was good enough for Jesus ... > > It may have been good enough for his disciples, but Jesus spoke Aramaic. The buzzing noise you just heard was the joke whizzing past your head *wink* It was a riff on the apocryphal American (occasionally other nationality) who said that if English was good enough for Jesus Christ, it is good enough for everyone: http://itre.cis.upenn.edu/~myl/languagelog/archives/003084.html with the twist that in my example, I picked *another* language rather than English. I shouldn't have picked Greek, an unfortunate choice that may have lead you to imagine I was serious. Perhaps ISO-8859-5 (Cyrillic) or Shift_JIS would have been funnier :-( And of course there is the absurdity of any ISO standards existing two thousand years ago. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Getting process details on an operating system process/question answer by vouce
On Mon, 16 Jul 2018 06:47:53 -0600, John T. Haggerty wrote: > So, it's early for me---and I'm not sure if these things can be done but > I'd like to know the following: > > 1. How can Python connect to a running operating system process on a > host operating system to see what part of the execution is like?---ie > keep track of health stats like it's stuck on disk access or inside some > kind of wait state etc. Start by answering the question: "How can *any* process connect to another running process ...". Once you know how to do that, that may give us a hint how to do the same using Python. I would expect that there needs to be some sort of OS-specific interface where you pass the process ID you care about to some OS function, and it will report the process state. If your operating system doesn't support that, the only other option that I know of is if the application you are interested in *itself* provides an interface to question it while it is running. On Linux, that might including sending it a signal (see the signal.py library) or it might include some form of interprocess communication. But whatever it is, it will likely be application specific. So if the OS doesn't support this, and the process you are interested in doesn't support this, then it likely can't be done. > 2. Be able to pass questions and take answers via say a customized "okay > google" to try to explain: > > Ask: how was your day > record answer in voice translate it via google ask new question Sorry, I don't understand this. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
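One concrete way to ask those questions from Python (not covered in the reply above) is the third-party psutil package, which wraps the OS-specific process interfaces; the PID below is made up:

import psutil   # pip install psutil

p = psutil.Process(1234)            # hypothetical PID of the process of interest
print(p.name(), p.status())         # e.g. running, sleeping, disk-sleep, zombie...
print(p.cpu_percent(interval=1.0))  # CPU usage sampled over one second
print(p.memory_info().rss)          # resident memory in bytes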
Unicode [was Re: Cult-like behaviour]
On Sun, 15 Jul 2018 18:02:51 -0700, Jim Lee wrote: > On 07/15/18 17:17, MRAB wrote: >> On 2018-07-16 00:10, Jim Lee wrote: [...] >>> Have you never heard of programming BEFORE Unicode existed? >>> >>> How ever did we get along? Mostly by not exchanging data with anyone else using a different language or operating system. As one of those people who *did* need to exchange data, between Windows using Latin-1 and Macs using MacRoman, I can absolutely tell you that we got on **REALLY, REALLY, REALLY BADLY** with data loss and corruption an almost guarantee. [...] > Yes, it was. However, dealing with Unicode is also annoying. If there > were only one encoding, such as UTF-8, I wouldn't mind so much. O_o As an application developer, you should (almost) never need to use any Unicode encoding other than UTF-8. [...] > But I don't speak Esperanto, and my programs don't generally care what > characters are used for European currencies. When I create a simple > program that takes a text file (created by me) and munges it into a > different format, I don't care if someone from Uzbekistan can read it or > not. Good for you. But Python is not a programming language written to satisfy the needs of people like you, and ONLY people like you. It is a language written to satisfy the needs of people from Uzbekistan, and China, and Japan, and India, and Brazil, and France, and Russia, and Australia, and the UK, and mathematicians, and historians, and linguists, and, yes, even people who think that if ISO-8859-7 was good enough for Jesus, the whole world ought to be using it. > When I create a one-time use program to visualize some data on a > graph, I don't care if anyone else can read the axis labels but me. > These are realities. A good programming language will allow for these > realities without putting the burden on the programmer to turn *every* > program into a politically correct, globalization compliant model of > modern groupthink. And here we get to the crux of the matter. It isn't really the technical issues of Unicode that annoy you. It is the loss of privilege that you, as an ASCII user, no longer get to dismiss 90% of the world as beneath your notice. Nice. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Unicode [was Re: Cult-like behaviour]
On Sun, 15 Jul 2018 17:39:55 -0700, Jim Lee wrote: > On 07/15/18 17:18, Steven D'Aprano wrote: >> On Sun, 15 Jul 2018 16:08:15 -0700, Jim Lee wrote: >> >>> Python3 is intrinsically tied to Unicode for string handling. >>> Therefore, the Python programmer is forced to deal with it (in all but >>> trivial cases), rather than given a choice. So I don't understand how >>> I can illustrate my point with Python code since Python won't let me >>> deal with strings without also dealing with Unicode. >> Nonsense. >> >> b"Look ma, a Python 2 style ASCII string." >> >> > As I said, all but trivial cases. > > Do you consider separating Unicode strings from byte strings, having to > decode and encode from one to the other, If you use nothing but byte strings, you don't need to separate the non-existent text strings from the byte strings, nor do you need to decode or encode. > and knowing which > functions/methods accept one, the other, or both as arguments, That's certainly a real complication, if I may stretch the meaning of the word "complication" beyond breaking point. Surely you are already having to read the documentation of the function to learn what arguments it takes, and what types they are (int or float, list or iterator, 'r' or 'a', etc). If someone can't deal with the question of "unicode or bytes" as well, then perhaps they ought to consider a career change to something less demanding, like politics. If, as you insinuate, all your data is 100% ASCII, then you have nothing to fear. Just treat

str(bytes_obj, 'ASCII')
bytes(str_obj, 'ASCII')

as the equivalent of a cast or coercion, and you won't go wrong. (Of course, in 2018, the number of applications that can truly say all their data is pure ASCII is vanishingly small.) Or use Latin-1, if you want to do the most simple-minded thing that you can to make errors go away, without caring about correctness. But the thing is, that complexity is *inherent in the domain*. You can try to deal with it without Unicode, and as soon as you have users expecting to use more than one code page, you're doomed. > as "not dealing with Unicode"? I don't. Frankly, I do. Dealing with all the vagaries of human text *is* complicated, that's the nature of the beast. Dealing with the complexities of Unicode can be as complex as dealing with the complexities of floating point arithmetic. (But neither of those are even in the same ballpark as dealing with the complexities of *not* using Unicode: legacy code pages and encodings are a nightmare to deal with.) Nevertheless, just as casual users can go a very, very long way just treating floats as the real numbers we learn about in school, and trust that IEEE-754 semantics will mean your answers are "close enough", so the casual user can go a very long way ignoring the complexities of Unicode, so long as they control their own data and know what it is. If you don't know what your data is, then you're doomed, Unicode or no Unicode. (If you don't think that's a problem, if you think that "just treat text as octets" works, then people like you are the reason there is so much mojibake in the world, screwing it up for the rest of us.) -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
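A short demonstration of the "cast or coercion" idea and of why guessing the encoding goes wrong (example added here, not from the post):

print(str(b"hello", "ascii"))      # fine, as long as the data really is ASCII
data = "naïve".encode("utf-8")     # text -> bytes at the application boundary
print(data.decode("utf-8"))        # naïve: decoded with the right codec
print(data.decode("latin-1"))      # naÃ¯ve: the same bytes read with the wrong codec (mojibake)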
Re: Cult-like behaviour [was Re: Kindness]
On Sun, 15 Jul 2018 16:08:15 -0700, Jim Lee wrote: > Python3 is intrinsically tied to Unicode for string handling. Therefore, > the Python programmer is forced to deal with it (in all but trivial > cases), rather than given a choice. So I don't understand how I can > illustrate my point with Python code since Python won't let me deal with > strings without also dealing with Unicode. Nonsense. b"Look ma, a Python 2 style ASCII string." -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Cult-like behaviour [was Re: Kindness]
On Sun, 15 Jul 2018 13:09:59 -0700, Jim Lee wrote: > On 07/15/18 12:37, MRAB wrote: >> To me, Unicode and UTF-8 aren't things to be reserved for I18N. I use >> them as a matter of course because I find it a lot easier to stick with >> just one encoding, one that will work with _any_ text I have. > > Which is exactly the same rationale for using any other single encoding > (including ASCII). Which encoding should I choose? Having chosen one today, which encoding should I choose tomorrow? > If the text you deal with is not multi-lingual, why > complicate matters by trying to support a plethora of encodings which > will never be used (and the attendant opportunity for more bugs)? Who mentioned a plethora of encodings? With the boundaries of your application, using Python 3 text strings means never needing to even consider encodings. The only time you should care about them is when your data crosses the boundary between your application and the rest of the world (e.g. writing to files), and in that case, we should standardise on UTF-8 (unless there's a really good reason not to). Honestly Jim, your response sounds to me the equivalent of: "... and that's why structured programming will never catch on, and why unstructured programming with GOTO is better, faster, more reliable, and can do everything that the programmer needs." Aside from occasional legacy software reasons, I believe that one would have to ignore the last 30+ years of "code page hell" to even consider using anything but Unicode in modern application software. > Note that I'm *not* saying Unicode is *bad*, just that it's an > unnecessary complication for a great deal of programming tasks. For a > great deal more, it's absolutely necessary. That why I said a "smart" > language would make it easy to turn on and off. You actually said that I18N features should be able to be turned on and off. Unicode and I18N are unrelated. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Cult-like behaviour [was Re: Kindness]
On Sun, 15 Jul 2018 11:22:11 -0700, James Lee wrote: > On 7/15/2018 3:43 AM, Steven D'Aprano wrote: >> >> No. The real ten billion dollar question is how people in 2018 can >> stick their head in the sand and take seriously the position that >> Latin-1 (let alone ASCII) is enough for text strings. >> >> >> > Easy - for many people, 90% of the Python code they write is not > intended for world-wide distribution, let alone use. But they're not making claims about what works for *them*. If they did, I'd say "Okay, that works for you. Sorry you got left behind by progress." They're making grand sweeping claims about what works best for a language intended to be used by *everyone*. Marko isn't saying "I know my use-case is atypical, but I inherited a code base where the bytes/pseudo-text duality of Python2 strings was helpful to me, and Python3's strict division into byte strings and text strings is less useful." Rather, he is making the sweeping generalisation that having a text string type *at all* is a mistake, because the Python 2 dual bytes+pseudo text approach is superior, *for everyone*. > The smart thing would be for a language to have a switch of some sort to > turn on/off all I18N features. The Python language has no builtin I18N features. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Cult-like behaviour [was Re: Kindness]
On Sun, 15 Jul 2018 14:17:51 +0300, Marko Rauhamaa wrote: > Steven D'Aprano : > >> On Sun, 15 Jul 2018 11:43:14 +0300, Marko Rauhamaa wrote: >>> Paul Rubin : >>>> I don't think Go is the answer either, but it probably got strings >>>> right. What is the answer? >> >> Go strings aren't text strings. They're byte strings. When you say that >> Go got them right, that depends on your definition of success. >> >> If your definition of "success" is: >> >> - fail to be able to support 80% + of the world's languages >> and a majority of the world's text; > > Of course byte strings can support at least as many languages as > Python3's code point strings and at least equally well. You cannot possibly be serious. There are 256 possible byte values. China alone has over 10,000 different characters. You can't represent 10,000+ characters using only 256 distinct code points. You can't even represent the world's languages using 16-bit word-strings instead of byte strings. Watching somebody argue that byte strings are "equally as good" as a dedicated Unicode string type in 2018 is like seeing people argue in the late 1990s that this new-fangled "structured code" will never be better than unstructured code with GOTO. >> - perpetuate the anti-pattern where a single code point >> (hex value) can represent multiple characters, depending on what >> encoding you have in mind; > > That doesn't follow at all. Of course it does. You talked about using Latin-1. What's so special about Latin-1? Ask your Greek customers how useful that is to them, and explain why they can't use ISO-8859-7 instead. >> - to have a language where legal variable names cannot be >> represented as strings; [1] > > That's a rather Go-specific We were talking about whether or not Go had done strings right. > and uninteresting question, It's not a question, its a statement. And it might be uninteresting to you, but I find it astonishing. > but I'm fairly certain you can write a Go parser in Go So what? You can write a Go parser in Floop if you like. https://en.wikipedia.org/wiki/BlooP_and_FlooP > (if that's not how it's done already). > >> - to have a language where text strings are a second-class >> data type, not available in the language itself, only in the >> libraries; > > Unicode code point strings *ought* to be a second--class data type. They > were a valiant idea but in the end turned out to be a mistake. Just because you say they were a mistake, doesn't make it so. >> - to have a language where text characters are *literally* >> 32-bit integers ("rune" is an alias to int32); >> >> (you can multiple a linefeed by a grave accent and get pi) > > Again, that has barely anything to do with the topic at hand. It has *everything* to do with the topic at hand: did Go get strings right? > I don't > think there's any unproblematic way to capture a true text character, > period. Python3 certainly hasn't been able to capture it. Isaac Asimov's quote here is appropriate: When people thought the Earth was flat, they were wrong. When people thought the Earth was spherical, they were wrong. But if you think that thinking the Earth is spherical is just as wrong as thinking the Earth is flat, then your view is wronger than both of them put together. Unicode does not perfectly capture the human concept of "text characters" (and no consistent system ever will, because the human concept of a character is not consistent). But if you think that makes byte-strings *better* than Unicode text strings at representing text, then you are wronger than wrong. 
>>> That's the ten-billion-dollar question, isn't it?!
>>
>> No. The real ten billion dollar question is how people in 2018 can
>> stick their head in the sand and take seriously the position that
>> Latin-1 (let alone ASCII) is enough for text strings.
>
> Here's the deal: text strings are irrelevant for most modern programming
> needs. Most software is middleware between the human and the terminal
> device.

Your view is completely, utterly inside out. The terminal is the middle layer, between the software and the human, not the software.

> Carrying opaque octet strings from end to end is often the most
> correct and least problematic thing to do.
>
> On the other hand, Python3's code point strings mess things up for no
> added value. You still can't upcase or downcase strings.

Ah, the ol' "argument by counter-factual assertions". State something that isn't true, and claim it is true.

py> "αγω".upper()
'ΑΓΩ'

Looks like uppercasing to me. What does it look like to you? Taking a square root?

(I can't believe I need to actually demonstrate this.)

-- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Cult-like behaviour [was Re: Kindness]
On Sun, 15 Jul 2018 11:39:40 +0300, Marko Rauhamaa wrote: > Steven D'Aprano : > >> Of course we have no idea what Marko's software is, or what it is >> doing, > > Correct, you don't, but the link Paul Rubin posted gives you an idea: > >Python 3 says: everything is Unicode (by default, except in certain >situations, and except if we send you crazy reencoded data, and even >then it's sometimes still unicode, albeit wrong unicode). I have a lot of respect for Armin Ronacher, but I think here he is badly wrong and he's just ranting. It is ludicrous to say "everything" is Unicode when Python provides a rich set of bytes APIs. He squeezes in a parenthesised "by default" there, but that undermines his rant. That's like saying that "everything in Python is an int" rather than a float, because if you don't include a decimal point or an exponent in numeric literals, you get ints. Or that "files in Python are always read-only" because the default for open() is to use read mode rather than write mode. >Filenames >are Unicode, Terminals are Unicode, stdin and out are Unicode, And indeed they are, in Windows, and so they should be, in Unix too. Maybe some day POSIX will recognise that the rest of the world exists and stop privileging ASCII. >there >is so much Unicode! And because UNIX is not Unicode, Python 3 now has >the stance that it's right and UNIX is wrong Armin seems to be implying that Unix is (1) the only OS in the world, and (2) beyond criticism. Neither of these are correct. Windows users might rightly ask why Armin cares what Unix does. Unix does a lot right, but not everything http://web.mit.edu/~simsong/www/ugh.pdf and its "everything is bytes" stance is badly wrong when it comes to user-visible textual elements like file names and the command prompt. We write `ls README`, not `6c7320524541444d45`, and we should stop pretending that we're using bytes just because the underlying infrastructure uses bytes. We're using text. >> That's because URLs are fundamentally text strings. > https://tools.ietf.org/html/rfc1738: Irrelevant or obsolete or both. > A URL consists of ASCII-only characters that represent an octet string. Wrong. >> Quick quiz: which of the following are real URLs? (a) >> http://правительство.рф > On the face of it, that is not a valid URL. If you had read the link I gave, or even if you copied and pasted the URL into any reasonably modern browser, you might have learned that it is a valid URL. > But try this: [snip] Indeed. Is there a reason why these shouldn't be considered serious bugs in the http library? -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
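On the wire, the Cyrillic URL discussed above is transformed rather than sent raw. A quick sketch with the standard library (example added here; the path is made up, and the built-in idna codec implements the older IDNA 2003 rules, with the third-party idna package providing IDNA 2008):

from urllib.parse import quote

host = "правительство.рф"
print(host.encode("idna"))   # the ASCII xn--... Punycode form used in DNS
print(quote("/путь"))        # /%D0%BF%D1%83%D1%82%D1%8C -- percent-encoded UTF-8 for the path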