Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
-- Neil Hodgson: "The counter-problem is that a French document that needs to include one mathematical symbol (or emoji) outside Latin-1 will double in size as a Python string."

Serious developers/typographers/users know that you cannot compose a text in French with latin-1. This is now also the case with German (Germany).

Neil's comment is correct:

>>> sys.getsizeof('a' * 1000 + 'z')
1026
>>> sys.getsizeof('a' * 1000 + '€')
2040

This is not really the problem. Serious users may notice sooner or later that Python and Unicode are walking in opposite directions (technically and in spirit).

>>> timeit.repeat("'a' * 1000 + 'ẞ'")
[1.1088995672090292, 1.0842266613261913, 1.1010779011941594]
>>> timeit.repeat("'a' * 1000 + 'z'")
[0.6362570846925735, 0.6159128762502917, 0.6200501673623791]

(Just an opinion)

jmf

-- http://mail.python.org/mailman/listinfo/python-list
ASCII versus non-ASCII [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]]
On Sun, 31 Mar 2013 00:35:23 -0700, jmfauth wrote:

> This is not really the problem. Serious users may notice sooner or
> later that Python and Unicode are walking in opposite directions
> (technically and in spirit).
>
> timeit.repeat("'a' * 1000 + 'ẞ'")
> [1.1088995672090292, 1.0842266613261913, 1.1010779011941594]
> timeit.repeat("'a' * 1000 + 'z'")
> [0.6362570846925735, 0.6159128762502917, 0.6200501673623791]

Perhaps you should stick to Python 3.2, where ASCII strings are no faster than non-ASCII strings.

Python 3.2 versus Python 3.3, no significant difference:

# 3.2
py> timeit.repeat("'a' * 1000 + 'ẞ'")
[1.7418999671936035, 1.7198870182037354, 1.763346004486084]

# 3.3
py> timeit.repeat("'a' * 1000 + 'ẞ'")
[1.8083378580026329, 1.818592812011484, 1.7922867869958282]

Python 3.2, ASCII vs Non-ASCII:

py> timeit.repeat("'a' * 1000 + 'z'")
[1.756322135925293, 1.8002049922943115, 1.721085958480835]
py> timeit.repeat("'a' * 1000 + 'ẞ'")
[1.7209150791168213, 1.7162668704986572, 1.7260780334472656]

In other words, if you stick to non-ASCII strings, Python 3.3 is no slower than Python 3.2.

-- Steven

-- http://mail.python.org/mailman/listinfo/python-list
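For what it's worth, micro-benchmarks like the above are easier to trust when reduced to their minima rather than eyeballed as raw lists; a minimal sketch of a fairer comparison, runnable unchanged on any two builds:

import timeit

# min() of repeat() filters out scheduler noise; only the best run matters.
for stmt in ("'a' * 1000 + 'z'", "'a' * 1000 + 'ẞ'"):
    print(stmt, min(timeit.repeat(stmt, repeat=5)))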
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 31/03/2013 08:35, jmfauth wrote:

> Serious developers/typographers/users know that you cannot compose a
> text in French with latin-1. This is now also the case with German
> (Germany). Neil's comment is correct:
>
> >>> sys.getsizeof('a' * 1000 + 'z')
> 1026
> >>> sys.getsizeof('a' * 1000 + '€')
> 2040
>
> [snip benchmarks]
>
> (Just an opinion)
>
> jmf

I'm feeling very sorry for this horse, it's been flogged so often it's down to bare bones.

--
If you're using GoogleCrap™ please read this http://wiki.python.org/moin/GoogleGroupsPython.

Mark Lawrence

-- http://mail.python.org/mailman/listinfo/python-list
Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]]
On Thu, Mar 28, 2013 at 8:37 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:

>>> I also wonder why the implementation bothers keeping a UTF-8
>>> representation. That sounds like premature optimization to me. Surely
>>> you only need it when writing to a file with UTF-8 encoding? For most
>>> strings, that will never happen.
>>
>> ... the UTF-8 version. It'll keep it if it has it, and not else. A lot
>> of content will go out in the same encoding it came in in, so it makes
>> sense to hang onto it where possible.
>
> Not to me. That almost doubles the size of the string, on the
> off-chance that you'll need the UTF-8 encoding. Which for many uses,
> you don't, and even if you do, it seems like premature optimization to
> keep it around just in case. Encoding to UTF-8 will be fast for small
> N, and for large N, why carry around (potentially) multiple megabytes
> of duplicated data just in case the encoded version is needed some
> time?

From the PEP:

    A new function PyUnicode_AsUTF8 is provided to access the UTF-8
    representation. It is thus identical to the existing
    _PyUnicode_AsString, which is removed. The function will compute the
    utf8 representation when first called. Since this representation will
    consume memory until the string object is released, applications
    should use the existing PyUnicode_AsUTF8String where possible (which
    generates a new string object every time). APIs that implicitly
    convert a string to a char* (such as the ParseTuple functions) will
    use PyUnicode_AsUTF8 to compute a conversion.

So the utf8 representation is not populated when the string is created, but when a utf8 representation is requested, and only when requested by the API that returns a char*, not by the API that returns a bytes object.

-- http://mail.python.org/mailman/listinfo/python-list
Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]]
On Fri, Mar 29, 2013 at 12:11 AM, Ian Kelly ian.g.ke...@gmail.com wrote:

> From the PEP:
>
>     A new function PyUnicode_AsUTF8 is provided to access the UTF-8
>     representation. It is thus identical to the existing
>     _PyUnicode_AsString, which is removed. The function will compute
>     the utf8 representation when first called. Since this
>     representation will consume memory until the string object is
>     released, applications should use the existing
>     PyUnicode_AsUTF8String where possible (which generates a new string
>     object every time). APIs that implicitly convert a string to a
>     char* (such as the ParseTuple functions) will use PyUnicode_AsUTF8
>     to compute a conversion.
>
> So the utf8 representation is not populated when the string is
> created, but when a utf8 representation is requested, and only when
> requested by the API that returns a char*, not by the API that returns
> a bytes object.

Since the PEP specifically mentions ParseTuple string conversion, I am thinking that this is probably the motivation for caching it. A string that is passed into a C function (that uses one of the various UTF-8 char* format specifiers) is perhaps likely to be passed into that function again at some point, so the UTF-8 representation is kept around to avoid the need to recompute it on each call.

-- http://mail.python.org/mailman/listinfo/python-list
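To make the trade-off concrete, here is a rough pure-Python analogy of that caching behaviour. This is an illustration only, not how CPython stores the cache (which lives in the C-level unicode struct); the CachedStr class and its utf8 method are invented for the sketch:

class CachedStr(str):
    """Memoize the UTF-8 encoding, the way PyUnicode_AsUTF8 caches its char*."""
    def utf8(self):
        cached = getattr(self, '_utf8', None)
        if cached is None:
            # Computed on first request, then kept alive as long as the
            # string itself: trading memory for repeat-call speed.
            cached = self._utf8 = str.encode(self, 'utf-8')
        return cached

s = CachedStr('naïve')
assert s.utf8() is s.utf8()                 # second call reuses the cache
assert s.utf8() == 'naïve'.encode('utf-8')  # same bytes as a fresh encode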
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 2013-03-28, Ethan Furman et...@stoneleaf.us wrote:

> I cannot speak for the borg mind, but for myself a troll is anyone who
> continually posts rants (such as RR & XL) or who continuously hijacks
> threads to talk about their pet peeve (such as jmf).

Assuming jmf actually does care deeply and genuinely about Unicode implementations, and his postings reflect his actual position/opinion, then he's not a troll. Traditionally, a troll is someone who posts statements purely to provoke a response -- they don't really care about the topic and often don't believe what they're posting.

--
Grant Edwards               grant.b.edwards        Yow! BARBARA STANWYCK makes
                                  at               me nervous!!
                               gmail.com

-- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 03/29/2013 07:52 AM, Grant Edwards wrote:

> On 2013-03-28, Ethan Furman et...@stoneleaf.us wrote:
>> I cannot speak for the borg mind, but for myself a troll is anyone
>> who continually posts rants (such as RR & XL) or who continuously
>> hijacks threads to talk about their pet peeve (such as jmf).
>
> Assuming jmf actually does care deeply and genuinely about Unicode
> implementations, and his postings reflect his actual position/opinion,
> then he's not a troll. Traditionally, a troll is someone who posts
> statements purely to provoke a response -- they don't really care
> about the topic and often don't believe what they're posting.

Even if he does care deeply and genuinely, he still hijacks threads, still refuses the challenges to try X or Y and report back, and (ISTM) still refuses to learn. If that's not trollish behavior, what is it?

FWIW I don't think he does care deeply and genuinely (at least not genuinely) or he would do more than whine about micro benchmarks and make sweeping statements like "nobody here understands unicode" (paraphrased).

--
~Ethan~

-- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 2013-03-29, Ethan Furman et...@stoneleaf.us wrote:

> On 03/29/2013 07:52 AM, Grant Edwards wrote:
>> On 2013-03-28, Ethan Furman et...@stoneleaf.us wrote:
>>> I cannot speak for the borg mind, but for myself a troll is anyone
>>> who continually posts rants (such as RR & XL) or who continuously
>>> hijacks threads to talk about their pet peeve (such as jmf).
>>
>> Assuming jmf actually does care deeply and genuinely about Unicode
>> implementations, and his postings reflect his actual
>> position/opinion, then he's not a troll. Traditionally, a troll is
>> someone who posts statements purely to provoke a response -- they
>> don't really care about the topic and often don't believe what
>> they're posting.
>
> Even if he does care deeply and genuinely, he still hijacks threads,
> still refuses the challenges to try X or Y and report back, and (ISTM)
> still refuses to learn. If that's not trollish behavior, what is it?

He might indeed be trolling. But what defines a troll is motive/intent, not behavior. Those behaviors are all common in non-troll net.kooks. Maybe I'm being a bit too old-school Usenet, but being rude, ignorant (even stubbornly so), wrong, or irrational doesn't make you a troll. What makes you a troll is intent. If you don't actually care about the topic but are posting because you enjoy poking people with a stick to watch them jump and howl, then you're a troll.

> FWIW I don't think he does care deeply and genuinely (at least not
> genuinely) or he would do more than whine about micro benchmarks and
> make sweeping statements like "nobody here understands unicode"
> (paraphrased).

Perhaps he doesn't care about Unicode or Python performance. If so he's putting on a pretty good act -- if he's a troll, he's a good one and he's running a long game. Personally, I don't think he's a troll. I think he's obsessed with what he perceives as an issue with Python's string implementation. IOW, if he's a troll, he's got me fooled.

--
Grant Edwards               grant.b.edwards        Yow! It's a hole all the
                                  at               way to downtown Burbank!
                               gmail.com

-- http://mail.python.org/mailman/listinfo/python-list
Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]]
On 3/28/2013 10:37 PM, Steven D'Aprano wrote:

> Under what circumstances will a string be created from a wchar_t
> string? How, and why, would such a string be created? Why would Python
> still support strings containing surrogates when it now has a nice,
> shiny, surrogate-free flexible representation?

I believe because surrogates are legal codepoints and users may put them in strings, even though Python does not (except for the surrogateescape error handler). I believe some of the internal complexity comes from supporting the old C API so as to not immediately invalidate existing extensions.

--
Terry Jan Reedy

-- http://mail.python.org/mailman/listinfo/python-list
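A concrete illustration of surrogates entering strings legitimately: the surrogateescape error handler round-trips undecodable bytes through lone surrogate code points. A quick interpreter check:

>>> s = b'abc\xff'.decode('utf-8', 'surrogateescape')
>>> s
'abc\udcff'
>>> s.encode('utf-8', 'surrogateescape')
b'abc\xff'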
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 03/28/2013 02:31 PM, Ethan Furman wrote:
> On 03/28/2013 12:54 PM, ru...@yahoo.com wrote:
>> On 03/28/2013 01:48 AM, Steven D'Aprano wrote:
>> For someone who delights in pointing out the logical errors of others
>> you are often remarkably sloppy in your own logic. Of course language
>> can be both helpful and excessively strong. That is the case when
>> language less strong would be equally or more helpful.
> It can also be the case when language less strong would be useless.

I don't get your point. I was pointing out the fallacy in Steven's logic (which you cut). How is your statement relevant to that?

>> Further, "liar" is both so non-objective and so pejoratively emotive
>> that it is a word much more likely to be used by someone interested
>> in trolling than in a serious discussion, so most sensible people
>> here likely would not bite.
> Non-objective? If today poster B says X, and tomorrow poster B says
> s/he was unaware of X until just now, is not "liar" a reasonable
> conclusion?

Of course not. People forget what they posted previously, change their mind, don't express what they intended perfectly, sometimes express a complex thought that the reader inaccurately perceives as contradictory, don't realize themselves that their thinking is contradictory, ... And of course, who among us is *not* a liar, since we all lie from time to time.

Lying involves intent to deceive. I haven't been following jmfauth's claims since they are not of interest to me, but going back and quickly looking at the posts that triggered the "liar" and "idiot" posts, I did not see anything that made me think that jmfauth was not sincere in his beliefs. Being wrong and being sincere are not exclusive. Nor did Steven even try to justify the "liar" claim. As to Mark Lawrence, that seemed like a pure "I don't like you" insult whose proper place is /dev/null.

Even if the odds are 80% that the person is lying, why risk your own credibility by making a nearly impossible-to-substantiate claim? Someone may praise some company's product constantly online and be discovered to be a salesperson at that company. Most of the time you would be right to accuse the person of dishonesty. But I knew a person who was very young and naive, who really believed in the product and truly didn't see anything wrong in doing that. That doesn't make it good behavior, but those who claimed he was hiding his identity for personal gain were wrong (at least as far as I could tell, knowing the person personally.)

Just post the facts and let people draw their own conclusions; that's better than making aggressive and offensive claims that can never be proven. Calling people liars or idiots not only damages the reputation of the Python community in general [*1] but hurts your own credibility as well, since any sensible reader will wonder if other opinions you post are more influenced by your emotions than by your intelligence.

>>> I hope that we all agree that we want a nice, friendly, productive
>>> community where everyone is welcome.
>> I hope so too but it is likely that some people want a place to
>> develop and assert some sense of influence, engage in verbal duels,
>> instigate arguments, etc. That can be true of regulars here as well
>> as drive-by posters.
>>> But some people simply cannot or will not behave in ways that are
>>> compatible with those community values. There are some people whom
>>> we *do not want here*
>> In other words, everyone is NOT welcome.
> Correct. Do you not agree?

Don't ask me, ask Steven. He was the one who wrote two sentences earlier, "...we want a...community where everyone is welcome."

I'll snip the rest of your post because it is your opinions and I've already said why I disagree. Most people are smart enough to make their own evaluations of posters here, and if they are not, and reject Python based on what they read from a single poster who obviously has strong views, then perhaps that's for the best. That possibility (which I think is very close to zero) is a tiny price to pay to avoid all the hostility and noise.

[*1] See for example the blog post at http://joepie91.wordpress.com/2013/02/19/the-python-documentation-is-bad-and-you-should-feel-bad/ which was recently discussed in this list and in which the author wrote, "the community around Python is one of the most hostile and unhelpful communities around any programming-related topic that I have ever seen."

-- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 03/29/2013 02:26 PM, ru...@yahoo.com wrote:

>>> For someone who delights in pointing out the logical errors of
>>> others you are often remarkably sloppy in your own logic. Of course
>>> language can be both helpful and excessively strong. That is the
>>> case when language less strong would be equally or more helpful.
>> It can also be the case when language less strong would be useless.
> I don't get your point. I was pointing out the fallacy in Steven's
> logic (which you cut). How is your statement relevant to that?

Ah. I thought you were saying that in all cases helpful strong language would be even more helpful if less strong.

>>> Further, "liar" is both so non-objective and so pejoratively emotive
>>> that it is a word much more likely to be used by someone interested
>>> in trolling than in a serious discussion, so most sensible people
>>> here likely would not bite.
>> Non-objective? If today poster B says X, and tomorrow poster B says
>> s/he was unaware of X until just now, is not "liar" a reasonable
>> conclusion?
> Of course not. People forget what they posted previously, change
> their mind, don't express what they intended perfectly, sometimes
> express a complex thought that the reader inaccurately perceives as
> contradictory, don't realize themselves that their thinking is
> contradictory, ...

I agree, which is why I resisted my own impulse to call him a liar; however, he has been harping on this subject for months now, so I would be surprised if he actually was surprised and had forgotten...

> Lying involves intent to deceive. I haven't been following jmfauth's
> claims since they are not of interest to me, but going back and
> quickly looking at the posts that triggered the "liar" and "idiot"
> posts, I did not see anything that made me think that jmfauth was not
> sincere in his beliefs. Being wrong and being sincere are not
> exclusive. Nor did Steven even try to justify the "liar" claim. As to
> Mark Lawrence, that seemed like a pure "I don't like you" insult whose
> proper place is /dev/null.

After months of jmf's antagonist posts, I don't blame them.

>>>> I hope that we all agree that we want a nice, friendly, productive
>>>> community where everyone is welcome.
>>> I hope so too but it is likely that some people want a place to
>>> develop and assert some sense of influence, engage in verbal duels,
>>> instigate arguments, etc. That can be true of regulars here as well
>>> as drive-by posters.
>>>> But some people simply cannot or will not behave in ways that are
>>>> compatible with those community values. There are some people whom
>>>> we *do not want here*
>>> In other words, everyone is NOT welcome.
>> Correct. Do you not agree?
> Don't ask me, ask Steven. He was the one who wrote two sentences
> earlier, "...we want a...community where everyone is welcome."

Ah, right -- missed that!

--
~Ethan~

-- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 03/27/2013 08:49 PM, rusi wrote:

> In particular "You are a liar" is as bad as "You are an idiot". The
> same statement can be made non-abusively thus: "... is not true
> because ..."

I don't agree. With all the posts and micro benchmarks and other drivel that jmf has inflicted on us, I find it /very/ hard to believe that he forgot -- which means he was deliberately lying. At some point we have to stop being gentle / polite / politically correct and call a shovel a shovel... er, spade.

--
~Ethan~

-- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On Wed, 27 Mar 2013 22:42:18 -0700, rusi wrote:

> More seriously I've never seen anyone -- cause or person -- aided by
> the use of excessively strong language.

Of course not. By definition, if it helps, it wasn't *excessively* strong language.

> IOW I repeat my support for Ned's request: ad hominem attacks are not
> welcome, irrespective of the context/provocation.

Insults are not ad hominem attacks. "You sir, are a bounder and a cad. Furthermore, your argument is wrong, because of reasons." may very well be an insult, but it also may be correct, and the reasons logically valid. "Your argument is wrong, because you are a bounder and a cad." is an ad hominem fallacy, because even bounders and cads may tell the truth occasionally, or be correct by accident.

I find it interesting that nobody has yet felt the need to defend JMF, and tell me I was factually incorrect about him (as opposed to merely impolite or mean-spirited). In any case, I don't want this to be specifically about any one person, so let's move away from JMF.

I disagree that hostile language is *always* inappropriate, although I agree that it is *usually* inappropriate. Although even that depends on what you define as hostile -- I would much prefer that people confronted me for being (supposedly) dishonest than silently shunning me without giving me any way to respond or correct either my behaviour or their (mis)apprehensions. Quite frankly, I think that the passive-aggressive silent treatment (kill-filing) is MUCH more hostile and mean-spirited [1] than honest, respectful, direct criticism, even when that criticism is about character ("you sir are a lying scoundrel"). I treat people the way I hope to be treated. As galling as it would be to be accused of lying, I would rather that you called me a liar to my face and gave me the opportunity to respond, than for you to ignore everything I said.

I hope that we all agree that we want a nice, friendly, productive community where everyone is welcome. But some people simply cannot or will not behave in ways that are compatible with those community values. There are some people whom we *do not want here* -- spoilers and messers, vandals and spammers and cheats and liars and trolls and crackpots of all sorts. We only disagree as to the best way to make it clear to them that they are not welcome so long as they continue their behaviour.

[1] Although sadly, given the reality of communication on the Internet, sometimes kill-filing is the least-worst option.

-- Steven

-- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 28 mar, 07:12, Ethan Furman et...@stoneleaf.us wrote:

> On 03/27/2013 08:49 PM, rusi wrote:
>> In particular "You are a liar" is as bad as "You are an idiot". The
>> same statement can be made non-abusively thus: "... is not true
>> because ..."
>
> I don't agree. With all the posts and micro benchmarks and other
> drivel that jmf has inflicted on us, I find it /very/ hard to believe
> that he forgot -- which means he was deliberately lying. At some point
> we have to stop being gentle / polite / politically correct and call a
> shovel a shovel... er, spade.

The problem is elsewhere. Nobody understands the examples I gave on this list, because nobody understands Unicode. These examples are not random examples; they are well thought out. If you understood the coding of the characters, Unicode, and what this flexible representation does, it would not be a problem for you to create analogous examples. So, we are going in circles.

This flexible representation succeeds in accumulating in one shot all the design mistakes it is possible to make when one wishes to implement Unicode.

Example of a good Unicode understanding. If you wish 1) to preserve memory, 2) to cover the whole range of Unicode, 3) to keep maximum performance while preserving the good work Unicode.org has done (normalization, sorting), there is only one solution: utf-8. For this you have to understand what a unicode transformation format really is.

Why all the actors active in the text field, like Microsoft, Apple, Adobe, the unicode-compliant TeX engines, the foundries, the organisation in charge of the OpenType font specifications, are able to handle all this stuff correctly (understanding + implementation) and Python is not, I should say, is going beyond my understanding.

Python has certainly and definitively not revolutionized Unicode.

jmf

-- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 28/03/13 09:03, jmfauth wrote:

> The problem is elsewhere. Nobody understands the examples I gave on
> this list, because nobody understands Unicode.
> [snip]
> Python has certainly and definitively not revolutionized Unicode.

You're confusing Python's choice of internal string representation with the programmer's choice of encoding for communicating with other programs. I think most people agree that utf-8 is usually the best encoding to use for interoperating with other unicode-aware software, but as a variable-length encoding it has disadvantages that make it unsuitable for use as an internal representation. Specifically, indexing a variable-length encoding like utf-8 is not as efficient as indexing a fixed-length encoding.

Regards,
Ian F

-- http://mail.python.org/mailman/listinfo/python-list
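To see what "not as efficient" means in practice, here is a sketch of indexing raw UTF-8 bytes: finding code point i is a linear scan over lead bytes, where a fixed-width layout needs only one multiplication. The helper assumes well-formed UTF-8 and is invented for illustration:

def utf8_index(data, i):
    """Byte offset of code point i in well-formed UTF-8 data: an O(i) scan.

    A fixed-width layout instead computes i * width in O(1).
    """
    offset = 0
    for _ in range(i):
        lead = data[offset]
        if lead < 0x80:      # 0xxxxxxx: 1-byte sequence (ASCII)
            offset += 1
        elif lead < 0xE0:    # 110xxxxx: 2-byte sequence
            offset += 2
        elif lead < 0xF0:    # 1110xxxx: 3-byte sequence
            offset += 3
        else:                # 11110xxx: 4-byte sequence
            offset += 4
    return offset

data = 'aẞ€z'.encode('utf-8')
print(utf8_index(data, 2))   # 4: 'a' is 1 byte, 'ẞ' is 3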
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 28 March 2013 09:03, jmfauth wxjmfa...@gmail.com wrote:

> The problem is elsewhere. Nobody understands the examples I gave on
> this list, because nobody understands Unicode. These examples are not
> random examples; they are well thought out.

There are many people here and among the Python devs who understand unicode. Similarly they have understood the examples that you have given. It has been accepted that there are a handful of cases where performance has been reduced as a result of the change. There are also many cases where the performance has improved. It is certainly not clear that there is an *overall* performance reduction for people using non latin-1 characters as you have often suggested.

The reason your initial posts received a poor reception is that they were accompanied with pointless rants and arrogant claims that no one understood the problem. Had you simply reported the timing differences without the rants then I imagine that you would have received a response like "Okay, there might be a few regressions. Can you open an issue on the tracker please?". Since then you have been relentlessly hijacking unrelated threads and this is clearly just trolling.

> If you understood the coding of the characters, Unicode, and what this
> flexible representation does, it would not be a problem for you to
> create analogous examples. So, we are going in circles. This flexible
> representation succeeds in accumulating in one shot all the design
> mistakes it is possible to make when one wishes to implement Unicode.

This is clearly untrue. The most significant design mistakes are the ones that lead to incorrect handling of unicode characters. This new implementation in Python 3.3 has been designed in a way that makes it possible to handle all unicode characters correctly.

> Example of a good Unicode understanding. If you wish 1) to preserve
> memory, 2) to cover the whole range of Unicode, 3) to keep maximum
> performance while preserving the good work Unicode.org has done
> (normalization, sorting), there is only one solution: utf-8. For this
> you have to understand what a unicode transformation format really is.

Again you pretend that others here don't understand. Most people here are well aware of what utf-8 is. Your suggestion that maximum performance would be achieved if Python used utf-8 internally ignores the fact that it would have many negative performance implications for slicing and indexing and so on.

> Why all the actors active in the text field, like Microsoft, Apple,
> Adobe, the unicode-compliant TeX engines, the foundries, the
> organisation in charge of the OpenType font specifications, are able
> to handle all this stuff correctly (understanding + implementation)
> and Python is not, I should say, is going beyond my understanding.
> Python has certainly and definitively not revolutionized Unicode.

Perhaps not, but it does now correctly handle all unicode characters (unlike many other languages and pieces of software).

Oscar

-- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On Thu, Mar 28, 2013 at 4:20 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:

> On Wed, 27 Mar 2013 20:49:20 -0700, rusi wrote:
>> In particular "You are a liar" is as bad as "You are an idiot". The
>> same statement can be made non-abusively thus: "... is not true
>> because ..."
>
> I accept that criticism, even if I disagree with it. Does that make
> sense? I mean it in the sense that I accept that your opinion differs
> from mine. Politeness does not always trump honesty, and stating that
> somebody's statement is not true because... is not the same as stating
> that they are deliberately telling lies (rather than merely being
> mistaken or confused).

There comes a time when a bit of rudeness is a small cost to pay for forum maintenance. Before you criticize someone for nit-picking, think what happens when someone reads the thread archive. Of course, that particular example can be done courteously too - cf the def vs class nit from a recent thread. But it'd still be of value even if done rudely, so the hundreds of subsequent readers would have a chance to know what's going on.

I was researching a problem with ALSA a couple of weeks ago, and came across a forum thread that discussed exactly what I needed to know. A dozen or so courteous posts delivered misinformation; finally someone had the guts to be rude and call people out for posting incorrect points (and got criticized for doing so), and that one post was the most useful in the whole thread.

I'd rather this list have some vinegar than it devolve into uselessness. Or, worse, if there's a hard-and-fast rule about courtesy, devolve into aspartame... everyone's courteous in words but hates each other underneath. Or am I taking the analogy too far? :)

ChrisA

-- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On Thu, Mar 28, 2013 at 8:03 PM, jmfauth wxjmfa...@gmail.com wrote: Example of a good Unicode understanding. If you wish 1) to preserve memory, 2) to cover the whole range of Unicode, 3) to keep maximum performance while preserving the good work Unicode.org as done (normalization, sorting), there is only one solution: utf-8. For this you have to understand, what is really a unicode transformation format. You really REALLY need to sort out in your head the difference between correctness and performance. I still haven't seen one single piece of evidence from you that Python 3.3 fails on any point of Unicode correctness. Covering the whole range of Unicode has never been a problem. In terms of memory usage and performance, though, there's one obvious solution. Fork CPython 3.3 (or the current branch head[1]), change the internal representation of a string to be UTF-8 (by the way, that's the official spelling), and run the string benchmarks. Then post your code and benchmark figures so other people can replicate your results. Python has certainly and definitvely not revolutionize Unicode. This is one place where you're actually correct, though, because PEP 393 isn't the first instance of this kind of format - Pike's had it for years. Funny though, I don't think that was your point :) [1] Apologies if my terminology is wrong, I'm a git user and did one quick Google search to see if hg uses the same term. ChrisA -- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
Ian Foote: Specifically, indexing a variable-length encoding like utf-8 is not as efficient as indexing a fixed-length encoding. Many common string operations do not require indexing by character which reduces the impact of this inefficiency. UTF-8 seems like a reasonable choice for an internal representation to me. One benefit of UTF-8 over Python's flexible representation is that it is, on average, more compact over a wide set of samples. Neil -- http://mail.python.org/mailman/listinfo/python-list
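That "on average" is easy to probe from the interpreter; the sketch below compares the string object's total size with a bare UTF-8 buffer. Note that getsizeof includes header overhead, so absolute values are build-dependent:

import sys

# String object size (header + payload) vs a bare UTF-8 buffer.
samples = ('a' * 1000, 'é' * 1000, 'ẞ' * 1000, 'a' * 999 + '€')
for s in samples:
    print(repr(s[:2]), sys.getsizeof(s), len(s.encode('utf-8')))

Neither side wins everywhere: pure latin-1 or BMP text favours the flexible representation, while mostly-ASCII text with an occasional high character favours UTF-8, since there one multibyte character does not widen the whole buffer.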
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 28/03/2013 03:18, Ethan Furman wrote: I wouldn't call it unproductive -- a half-dozen amusing posts followed because of Mark's initial post, and they were a great relief from the tedium and (dare I say it?) idiocy of jmf's posts. -- ~Ethan~ Thanks for those words. They're a tonic as I've just clawed my way out of bed at 12:00 GMT having slept for 15 hours. Once the PEP393 unicode debacle has been sorted, does anyone have a cure for Chronic Fatigue Syndrome? :) -- If you're using GoogleCrap™ please read this http://wiki.python.org/moin/GoogleGroupsPython. Mark Lawrence -- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On Thu, 28 Mar 2013 23:11:55 +1100, Neil Hodgson wrote:

> Ian Foote:
>> Specifically, indexing a variable-length encoding like utf-8 is not
>> as efficient as indexing a fixed-length encoding.
>
> Many common string operations do not require indexing by character
> which reduces the impact of this inefficiency.

Which common string operations do you have in mind?

Specifically in Python's case, the most obvious counter-example is the length of a string. But that's only because Python strings are immutable objects, and include a field that records the length. So once the string is created, checking its length takes constant time.

Some string operations need to inspect every character, e.g. str.upper(). Even for them, the increased complexity of a variable-width encoding costs. It's not sufficient to walk the string inspecting a fixed 1, 2 or 4 bytes per character. You have to walk the string grabbing 1 byte at a time, and then decide whether you need another 1, 2 or 3 bytes. Even though it's still O(N), the added bit-masking and overhead of variable-width encoding adds to the overall cost.

Any string method that takes a starting offset requires the method to walk the string byte-by-byte. I've even seen languages put responsibility for dealing with that onto the programmer: the start offset is given in *bytes*, not characters. I don't remember what language this was... it might have been Haskell? Whatever it was, it horrified me.

> UTF-8 seems like a reasonable choice for an internal representation to
> me.

It's not. Haskell, for example, uses UTF-8 internally, and it warns that this makes string operations O(N) instead of O(1) precisely because of the need to walk the string inspecting every byte.

Remember, when your string primitives are O(N), it is very easy to write code that becomes O(N**2). Using UTF-8 internally is just begging for user-written code to be O(N**2).

> One benefit of UTF-8 over Python's flexible representation is that it
> is, on average, more compact over a wide set of samples.

Sure. And over a different set of samples, it is less compact. If you write a lot of Latin-1, Python will use one byte per character, while UTF-8 will use two bytes per character.

-- Steven

-- http://mail.python.org/mailman/listinfo/python-list
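The O(N**2) trap is worth spelling out with a sketch: the loop below is linear when indexing is O(1), but would be quadratic over an internal UTF-8 buffer where each s[i] has to rescan from the start:

def walk_by_index(s):
    # With fixed-width storage each s[i] is O(1), so this loop is O(N).
    # If s[i] had to scan UTF-8 bytes from offset 0, each access would be
    # O(i), making this innocent-looking loop O(N**2) overall.
    chars = []
    for i in range(len(s)):
        chars.append(s[i])
    return ''.join(chars)

assert walk_by_index('aẞ€') == 'aẞ€'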
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 28 mar, 11:30, Chris Angelico ros...@gmail.com wrote:

> You really REALLY need to sort out in your head the difference between
> correctness and performance. I still haven't seen one single piece of
> evidence from you that Python 3.3 fails on any point of Unicode
> correctness.

That's because you are not understanding unicode. Unicode takes you from the character to the unicode transformation format via the code point, working with a unique set of characters with a contiguous range of code points. Then it is up to the implementors (languages, compilers, ...) to implement this utf.

> Covering the whole range of Unicode has never been a problem.

... for all those who are following the scheme explained above. And it magically works smoothly. Of course, there are some variations due to the Character Encoding Form, which is later influenced by the Character Encoding Scheme (the serialization of the Character Encoding Form).

Rough explanation in other words: it does not matter if you are using utf-8, -16, -32, ucs2 or ucs4. All the single characters are handled in the same way with the same algorithm.

The flexible string representation takes the problem from the other side: it attempts to work with the characters by using their representations, and it (can only) fail...

PS I never proposed to use utf-8. I only spoke about utf-8 as an example. If you start to discuss indexing, you are off-topic.

jmf

-- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 28 mar, 14:01, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:

> On Thu, 28 Mar 2013 23:11:55 +1100, Neil Hodgson wrote:
>> One benefit of UTF-8 over Python's flexible representation is that it
>> is, on average, more compact over a wide set of samples.
>
> Sure. And over a different set of samples, it is less compact. If you
> write a lot of Latin-1, Python will use one byte per character, while
> UTF-8 will use two bytes per character.

This flexible string representation is so absurd that not only does it not know you cannot write Western European languages with latin-1, it penalizes you just by attempting to optimize latin-1. Shown in my multiple examples. (This is a case similar to the long/short int question/discussion Chris Angelico opened.)

PS1: I received plenty of private mails. I'm surprised how the devs do not understand unicode.

PS2: Question I received once from a registered French Python developer (in another context): "What are those French characters you can handle with cp1252 and not with latin-1?"

jmf

-- http://mail.python.org/mailman/listinfo/python-list
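For the record, the answer to that PS2 question: cp1252 assigns printable characters to the 0x80-0x9F range that latin-1 leaves to control codes, which is exactly where French needs œ, Œ, Ÿ and the € sign. A quick interpreter check (traceback abridged):

>>> 'œŒŸ€'.encode('cp1252')
b'\x9c\x8c\x9f\x80'
>>> 'œŒŸ€'.encode('latin-1')
Traceback (most recent call last):
  ...
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-3: ordinal not in range(256)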
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On Fri, Mar 29, 2013 at 1:12 AM, jmfauth wxjmfa...@gmail.com wrote:

> This flexible string representation is so absurd that not only does it
> not know you cannot write Western European languages with latin-1, it
> penalizes you just by attempting to optimize latin-1. Shown in my
> multiple examples.

PEP393 strings have two optimizations, or kinda three:

1a) ASCII-only strings
1b) Latin1-only strings
2) BMP-only strings
3) Everything else

Options 1a and 1b are almost identical - I'm not sure what the detail is, but there's something flagging those strings that fit inside seven bits. (Something to do with optimizing encodings later?) Both are optimized down to a single byte per character. Option 2 is optimized to two bytes per character. Option 3 is stored in UTF-32.

Once again, jmf, you are forgetting that option 2 is a safe and bug-free optimization.

ChrisA

-- http://mail.python.org/mailman/listinfo/python-list
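Those widths are visible from the interpreter: grow a string by one character of each class and look at the size increment. A sketch (absolute sizes vary by build; only the deltas matter):

import sys

def char_width(c):
    # The size delta per extra character reveals the per-character width.
    return sys.getsizeof(c * 1001) - sys.getsizeof(c * 1000)

print(char_width('a'))            # 1: ASCII
print(char_width('é'))            # 1: latin-1
print(char_width('ẞ'))            # 2: BMP, stored 2 bytes per char
print(char_width('\U0001F600'))   # 4: astral, stored 4 bytes per char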
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 28/03/2013 12:11, Neil Hodgson wrote: Ian Foote: Specifically, indexing a variable-length encoding like utf-8 is not as efficient as indexing a fixed-length encoding. Many common string operations do not require indexing by character which reduces the impact of this inefficiency. UTF-8 seems like a reasonable choice for an internal representation to me. One benefit of UTF-8 over Python's flexible representation is that it is, on average, more compact over a wide set of samples. Implementing the regex module (http://pypi.python.org/pypi/regex) would have been more difficult if the internal representation had been UTF-8, because of the need to decode, and the implementation would also have been slower for that reason. -- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On Fri, Mar 29, 2013 at 1:51 AM, MRAB pyt...@mrabarnett.plus.com wrote:

> Implementing the regex module (http://pypi.python.org/pypi/regex)
> would have been more difficult if the internal representation had been
> UTF-8, because of the need to decode, and the implementation would
> also have been slower for that reason.

In fact, nearly ALL string parsing operations would need to be done differently. The only method that I can think of that wouldn't be impacted is a linear state-machine parser - something that could be written inside a "for character in string" loop.

string = 'some <b>marked-up</b> text'   # sample input

text = []

def initial(c):
    global state
    if c == '<':
        state = tag
    else:
        text.append(c)

def tag(c):
    global state
    if c == '>':
        state = initial

state = initial
for character in string:
    state(character)
print(''.join(text))

I'm pretty sure this will run in O(N) time, even with UTF-8 strings. But it's an *extremely* simple parser.

ChrisA

-- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 28 mar, 15:38, Chris Angelico ros...@gmail.com wrote:

> PEP393 strings have two optimizations, or kinda three:
>
> 1a) ASCII-only strings
> 1b) Latin1-only strings
> 2) BMP-only strings
> 3) Everything else
>
> Once again, jmf, you are forgetting that option 2 is a safe and
> bug-free optimization.

As long as you are attempting to divide a set of characters into chunks and try to handle them separately, it will never work. Read my previous post about the unicode transformation format. I know what pep393 does.

jmf

-- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On Fri, Mar 29, 2013 at 2:14 AM, jmfauth wxjmfa...@gmail.com wrote: As long as you are attempting to devide a set of characters in chunks and try to handle them seperately, it will never work. Okay. Let's look at integers. To properly represent the Python 3 'int' type (or the Python 2 'long'), we need to be able to encode ANY integer. And of course, any attempt to divide them up into chunks will never work. So we need a single representation that will cover ANY integer, right? Perfect. We already have one of those, detailed in RFC 2795. (It's coming up to its thirteenth anniversary in a day or two. Very appropriate.) http://tools.ietf.org/html/rfc2795#section-4 Are you saying Python's integers should be stored as I-TAGs? ChrisA -- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 28 mar, 16:14, jmfauth wxjmfa...@gmail.com wrote:

> As long as you are attempting to divide a set of characters into
> chunks and try to handle them separately, it will never work. Read my
> previous post about the unicode transformation format. I know what
> pep393 does.

Addendum. This is what you correctly perceived in another thread. You qualified it as a "switch". Now you have to understand where this switch is coming from.

jmf

-- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On Thu, Mar 28, 2013 at 7:01 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: Any string method that takes a starting offset requires the method to walk the string byte-by-byte. I've even seen languages put responsibility for dealing with that onto the programmer: the start offset is given in *bytes*, not characters. I don't remember what language this was... it might have been Haskell? Whatever it was, it horrified me. Go does this. I remember because it came up in one of these threads, where jmf (or was it Ranting Rick?) was praising Go for just getting Unicode right. -- http://mail.python.org/mailman/listinfo/python-list
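Python makes the same byte-versus-character mismatch easy to demonstrate, by indexing the encoded bytes instead of the str:

>>> s = 'héllo'
>>> s[2]                    # character index 2: the first 'l'
'l'
>>> s.encode('utf-8')[2]    # byte index 2: the trailing byte of 'é'
169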
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 3/28/2013 10:38 AM, Chris Angelico wrote: PEP393 strings have two optimizations, or kinda three: 1a) ASCII-only strings 1b) Latin1-only strings 2) BMP-only strings 3) Everything else Options 1a and 1b are almost identical - I'm not sure what the detail is, but there's something flagging those strings that fit inside seven bits. (Something to do with optimizing encodings later?) Yes. 'Encoding' an ascii-only string to any ascii-compatible encoding amounts to a simple copy of the internal bytes. I do not know if *all* the codecs for such encodings are 393-aware, but I do know that the utf-8 and latin-1 group are. This is one operation that 3.3+ does much faster than 3.2- -- Terry Jan Reedy -- http://mail.python.org/mailman/listinfo/python-list
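One way to glimpse that fast path from pure Python is the sketch below (no timings shown since they are machine-dependent; the 999+1 split just keeps both strings the same length):

import timeit

setup_ascii = "s = 'a' * 1000"        # ascii-only: encode is near a memcpy
setup_mixed = "s = 'a' * 999 + 'ẞ'"   # one BMP char: the general encode path

print(min(timeit.repeat("s.encode('utf-8')", setup_ascii, repeat=5)))
print(min(timeit.repeat("s.encode('utf-8')", setup_mixed, repeat=5)))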
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On Thu, Mar 28, 2013 at 8:38 AM, Chris Angelico ros...@gmail.com wrote: PEP393 strings have two optimizations, or kinda three: 1a) ASCII-only strings 1b) Latin1-only strings 2) BMP-only strings 3) Everything else Options 1a and 1b are almost identical - I'm not sure what the detail is, but there's something flagging those strings that fit inside seven bits. (Something to do with optimizing encodings later?) Both are optimized down to a single byte per character. The only difference for ASCII-only strings is that they are kept in a struct with a smaller header. The smaller header omits the utf8 pointer (which optionally points to an additional UTF-8 representation of the string) and its associated length variable. These are not needed for ASCII-only strings because an ASCII string can be directly interpreted as a UTF-8 string for the same result. The smaller header also omits the wstr_length field which, according to the PEP, differs from length only if there are surrogate pairs in the representation. For an ASCII string, of course there would not be any surrogate pairs. -- http://mail.python.org/mailman/listinfo/python-list
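That header difference can be observed indirectly. In the sketch below both strings store one byte per character, but the latin-1 string carries the larger struct; exact byte counts are platform-dependent, so only the difference matters:

import sys

# 'abc' is ascii-only: the smaller PyASCIIObject header applies.
# 'ab\xff' is latin-1: the fuller PyCompactUnicodeObject header applies
# (utf8 pointer, utf8 length, wstr_length), despite the same 1 byte/char.
print(sys.getsizeof('abc'))
print(sys.getsizeof('ab\xff'))   # same length, larger total size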
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On Fri, Mar 29, 2013 at 3:01 AM, Terry Reedy tjre...@udel.edu wrote: On 3/28/2013 10:38 AM, Chris Angelico wrote: PEP393 strings have two optimizations, or kinda three: 1a) ASCII-only strings 1b) Latin1-only strings 2) BMP-only strings 3) Everything else Options 1a and 1b are almost identical - I'm not sure what the detail is, but there's something flagging those strings that fit inside seven bits. (Something to do with optimizing encodings later?) Yes. 'Encoding' an ascii-only string to any ascii-compatible encoding amounts to a simple copy of the internal bytes. I do not know if *all* the codecs for such encodings are 393-aware, but I do know that the utf-8 and latin-1 group are. This is one operation that 3.3+ does much faster than 3.2- Thanks Terry. So that's not so much a representation difference as a flag that costs little or nothing to retain, and can improve performance in the encode later on. Sounds like a useful tweak to the basics of flexible string representation, without being particularly germane to jmf's complaints. ChrisA -- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On Thu, Mar 28, 2013 at 7:34 AM, jmfauth wxjmfa...@gmail.com wrote: The flexible string representation takes the problem from the other side, it attempts to work with the characters by using their representations and it (can only) fails... This is false. As I've pointed out to you before, the FSR does not divide characters up by representation. It divides them up by codepoint -- more specifically, by the *bit-width* of the codepoint. We call the internal format of the string ASCII or Latin-1 or UCS-2 for conciseness and a point of reference, but fundamentally all of the FSR formats are simply byte arrays of *codepoints* -- you know, those things you keep harping on. The major optimization performed by the FSR is to consistently truncate the leading zero bytes from each codepoint when it is possible to do so safely. But regardless of to what extent this truncation is applied, the string is *always* internally just an array of codepoints, and the same algorithms apply for all representations. -- http://mail.python.org/mailman/listinfo/python-list
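Ian's description can be modelled with the array module: the same code points, stored at the narrowest safe width. This is purely an illustration, not CPython's actual storage, and fsr_like is an invented name:

from array import array

def fsr_like(s):
    # Pick the narrowest array type that holds the largest code point,
    # mirroring how the FSR drops leading zero bytes when safe to do so.
    cps = [ord(c) for c in s]
    largest = max(cps) if cps else 0
    if largest < 1 << 8:
        typecode = 'B'   # 1 byte per code point (ASCII / latin-1 range)
    elif largest < 1 << 16:
        typecode = 'H'   # 2 bytes per code point (BMP)
    else:
        typecode = 'I'   # typically 4 bytes per code point (full range)
    return array(typecode, cps)

for s in ('abc', 'aẞc', 'a\U0001F600c'):
    arr = fsr_like(s)
    print(repr(s), arr.typecode, len(arr) * arr.itemsize, 'bytes')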
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
Chris,

Your problem with int/long, the start of this thread, is very interesting. This is not a demonstration or a proof, rather an illustration. Assume you have a set of integers {0...9} and an operator, let's say addition.

Idea: just divide this set into two chunks, {0...4} and {5...9}, and work hard to optimize the addition of two operands in the set {0...4}.

The problems:

- When optimizing {0...4}, your algorithm will most probably weaken {5...9}.
- When using {5...9}, you do not benefit from your algorithm; you are penalized just by the fact that you optimized {0...4}.
- And the first mistake: you are penalized and impacted by the fact that you have to select which subset your operands are in when working with {0...9}.

Very interestingly, working with the representation (bytes) of these integers will not help. You have to consider {0...9} conceptually as numbers.

Now, replace numbers by characters, bytes by encoded code points, and you have, qualitatively, the flexible string representation.

In Unicode, there is one more level of abstraction: one conceptually works neither with characters, nor with encoded code points, but with unicode transformation format entities. (See my previous post.) That means you can work very hard on the byte level, but you will never solve the problem, which is one level higher in the unicode hierarchy:

character -> code point -> utf -> bytes (implementation)

with the important fact that this construct can only go from left to right.

In fact, by proposing a flexible representation of ints, you may just fall into the same trap the flexible string representation presents.

All this stuff is explained in good books about the coding of characters and/or Unicode. The unicode.org documentation explains it too. It is a little bit harder to discover, because the doc always presents this stuff from a technical perspective. You get it when reading a large part of the Unicode doc.

jmf

-- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On Fri, Mar 29, 2013 at 3:55 AM, jmfauth wxjmfa...@gmail.com wrote:

> Assume you have a set of integers {0...9} and an operator, let's say
> addition.
>
> Idea: just divide this set into two chunks, {0...4} and {5...9}, and
> work hard to optimize the addition of two operands in the set {0...4}.
> [snip]
> Very interestingly, working with the representation (bytes) of these
> integers will not help. You have to consider {0...9} conceptually as
> numbers.

Yeah, and there's an easy representation of those numbers. But let's look at Python's representations of integers. I have a sneaking suspicion something takes note of how large the number is before deciding how to represent it. Look!

>>> sys.getsizeof(1)
14
>>> sys.getsizeof(1 << 2)
14
>>> sys.getsizeof(1 << 4)
14
>>> sys.getsizeof(1 << 8)
14
>>> sys.getsizeof(1 << 31)
18
>>> sys.getsizeof(1 << 30)
18
>>> sys.getsizeof(1 << 16)
16
>>> sys.getsizeof(1 << 12345)
1660
>>> sys.getsizeof(1 << 123456)
16474

Small numbers are represented more compactly than large ones! And it's not like in REXX, where all numbers are stored as strings.

Go fork CPython and make the changes you suggest. Then run real-world code on it and see how it performs. Or at very least, run plausible benchmarks like the strings benchmark from the standard tests.

My original post about integers was based on two comparisons: Python 2 and Pike. Both languages have an optimization for small integers (where "small" is "within machine word" - on rechecking some of my stats, I find that I perhaps should have used a larger offset, as the 64-bit Linux Python I used appeared to be a lot faster than it should have been), which Python 3 doesn't have. Real examples, real statistics, real discussion.

(I didn't include Pike stats in what I posted, for two reasons: firstly, it would require a reworking of the code, rather than simply "run this under both interpreters"; and secondly, Pike performance is completely different from CPython performance, and is non-comparable. Pike is more similar to PyPy, able to compile - in certain circumstances - to machine code. So the comparisons were Py2 vs Py3.)

ChrisA

-- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 28 mar, 17:33, Ian Kelly ian.g.ke...@gmail.com wrote:

> This is false. As I've pointed out to you before, the FSR does not
> divide characters up by representation. It divides them up by
> codepoint -- more specifically, by the *bit-width* of the codepoint.
> [snip]
> But regardless of to what extent this truncation is applied, the
> string is *always* internally just an array of codepoints, and the
> same algorithms apply for all representations.

You know, we can discuss this ad nauseam. What is important is Unicode. You have transformed Python back into an ascii-oriented product. If Python had implemented Unicode correctly, there would be no difference in using an "a", "é", "€" or any character, which is what the narrow builds did.

If I am practically the only one who speaks/discusses this, I can assure you this has been noticed.

Now it's time to prepare the asparagus, the cured ham and a good bottle of dry white wine.

jmf

-- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On Fri, Mar 29, 2013 at 4:48 AM, jmfauth wxjmfa...@gmail.com wrote: If Python had implemented Unicode correctly, there would be no difference in using an a, é, € or any character, what the narrow builds did. I'm not following your grammar perfectly here, but if Python were implementing Unicode correctly, there would be no difference between any of those characters, which is the way a *wide* build works. With a narrow build, there is a difference between BMP and non-BMP characters. ChrisA -- http://mail.python.org/mailman/listinfo/python-list
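To make the narrow/wide distinction concrete, here is a minimal illustration; U+1F435 is outside the BMP, and the first result depends on the build:

s = '\U0001F435'              # MONKEY FACE, a non-BMP character
print(len(s))                 # 1 on a wide build or 3.3+; 2 on a narrow build
print(s.encode('utf-16-be'))  # b'\xd8=\xdc5' - the surrogate pair D83D DC35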
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 03/28/2013 01:48 AM, Steven D'Aprano wrote: On Wed, 27 Mar 2013 22:42:18 -0700, rusi wrote: More seriously I've never seen anyone -- cause or person -- aided by the use of excessively strong language. Of course not. By definition, if it helps, it wasn't *excessively* strong language. For someone who delights in pointing out the logical errors of others you are often remarkably sloppy in your own logic. Of course language can be both helpful and excessively strong. That is the case when language less strong would be equally or more helpful. IOW I repeat my support for Ned's request: Ad hominem attacks are not welcome, irrespective of the context/provocation. Insults are not ad hominem attacks. Insults may or may not be ad hominem attacks. There is nothing mutually exclusive about those terms. "You sir, are a bounder and a cad. Furthermore, your argument is wrong, because of reasons." may very well be an insult, but it also may be correct, and the reasons logically valid. Those are two different statements. The first is an ad hominem attack and is not welcome here. The second is an acceptable response. "Your argument is wrong, because you are a bounder and a cad." is an ad hominem fallacy, because even bounders and cads may tell the truth occasionally, or be correct by accident. That it is a fallacy does not mean it is not also an attack. I find it interesting that nobody has yet felt the need to defend JMF, and tell me I was factually incorrect about him (as opposed to merely impolite or mean-spirited). Nothing interesting about it at all. Most of us (perhaps unlike you) are not interested in discussing the personal characteristics of posters here (in contrast to discussing the technical opinions they post). Further, "liar" is both so non-objective and so pejoratively emotive that it is a word much more likely to be used by someone interested in trolling than in a serious discussion, so most sensible people here likely would not bite. [...] I would rather that you called me a liar to my face and gave me the opportunity to respond, than for you to ignore everything I said. Even if you personally would prefer someone to respond by calling you a liar, your personal preferences do not form a basis for desirable posting behavior here. But again you're creating a false dichotomy. Those are not the only two choices. A third choice is neither to ignore you nor call you a liar but to factually point out where you are wrong, or (if it is a matter of opinion) why one holds a different opinion. That was the point Ned Deily was making, I believe. I hope that we all agree that we want a nice, friendly, productive community where everyone is welcome. I hope so too but it is likely that some people want a place to develop and assert some sense of influence, engage in verbal duels, instigate arguments, etc. That can be true of regulars here as well as drive-by posters. But some people simply cannot or will not behave in ways that are compatible with those community values. There are some people whom we *do not want here* In other words, everyone is NOT welcome. -- spoilers and messers, vandals and spammers and cheats and liars and trolls and crackpots of all sorts. Where those terms are defined by you and a handful of other voracious posters. "Troll" in particular is often used to mean someone who disagrees with the borg mind here, or who says anything negative about Python, or who due to attitude or lack of full English fluency does not express themselves in a sufficiently submissive way. 
We only disagree as to the best way to make it clear to them that they are not welcome so long as they continue their behaviour. No, we disagree on who fits those definitions and even on how tolerant we are toward those who do fit the definitions. The policing that you and a handful of other self-appointed net-cops try to do is far more obnoxious than the original posts are. [1] Although sadly, given the reality of communication on the Internet, sometimes kill-filing is the least-worst option. Please, please, killfile jmfauth, Ranting Rick, Xah Lee and anyone else you don't like, so that the rest of us can be spared the orders-of-magnitude larger, more disruptive and more offensive posts generated by your (plural) responses to them. Believe it or not, most of the rest of us here are smart enough to form our own opinions of such posters without you and the other c.l.p truthsquad members telling us what to think. -- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
In article captjjmozdhsmuqx7vcpuii2bwrcnzcx76pm-6unb1duq4do...@mail.gmail.com, Chris Angelico ros...@gmail.com wrote: I'd rather this list have some vinegar than it devolve into uselessness. Or, worse, if there's a hard-and-fast rule about courtesy, devolve into aspartame... everyone's courteous in words but hates each other underneath. Or am I taking the analogy too far? :) I think you are positing false choices. No one - at least I'm not - is advocating avoiding challenging false or misleading statements in the interests of maintaining some false "see how well we all get along" facade. The point is we can have meaningful, hard-nosed discussions without resorting to personal insults, i.e. flaming. I think the discussion in this topic over the past 24 hours or so demonstrates that. -- Ned Deily, n...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 28 mar, 18:55, Chris Angelico ros...@gmail.com wrote: On Fri, Mar 29, 2013 at 4:48 AM, jmfauth wxjmfa...@gmail.com wrote: If Python had implemented Unicode correctly, there would be no difference in using an a, é, € or any character, what the narrow builds did. I'm not following your grammar perfectly here, but if Python were implementing Unicode correctly, there would be no difference between any of those characters, which is the way a *wide* build works. With a narrow build, there is a difference between BMP and non-BMP characters. ChrisA The wide build (which I never used) is in my mind as correct as the narrow build. It just covers a different range in Unicode (the whole range). Claiming that the narrow build is buggy because it does not cover the whole of Unicode is not correct. Unicode does not stipulate that one has to cover the whole range. Unicode expects that every character in a range behaves the same way. This is clearly not realized with the flexible string representation. A user should not be somehow penalized simply because he is not an ASCII user. If you take fonts into consideration (btw, a problem nobody is speaking about) and you ensure your application, toolkit, ... is MES-X or WGL4 compliant, you are also deliberately (and correctly) working with a restricted Unicode range. jmf -- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On Thu, Mar 28, 2013 at 10:48 AM, jmfauth wxjmfa...@gmail.com wrote: [snip jmf's reply to Ian Kelly, quoted in full above] You still have yet to explain how Python's string representation is wrong. Just how it isn't optimal for one specific case. Here's how I understand it: 1) Strings are sequences of stuff. Generally, we talk about strings as either sequences of bytes or sequences of characters. 2) Unicode is a format used to represent characters. Therefore, Unicode strings are character strings, not byte strings. 3) Encodings are functions that map characters to bytes. They typically also define an inverse function that converts from bytes back to characters. 4) UTF-8 IS NOT UNICODE. It is an encoding - one of those functions I mentioned in the previous point. It happens to be one of the five standard encodings that are defined for all characters in the Unicode standard (the others being the little- and big-endian variants of UTF-16 and UTF-32). 5) The internal representation of a character string DOES NOT MATTER. All that matters is that the API represents it as a string of characters, regardless of the representation. We could implement character strings by putting the Unicode code points in binary-coded decimal and it would still be a Unicode character string. 6) The String type that .NET and Java (and the unicode type in Python narrow builds) use is not a character string. It is a string of shorts, each of which corresponds to a UTF-16 code unit. I know this is the case because in all of these, the length of '\U0001f435' is 2 even though it only consists of one character. 7) The new string representation in Python 3.3 can successfully represent all characters in the Unicode standard. The actual number of bytes that each character consumes is invisible to the user. -- http://mail.python.org/mailman/listinfo/python-list
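A quick way to watch the FSR choose a width per string, not per character; the printed sizes will vary by interpreter version and platform:

import sys
# One byte per code point for ASCII and latin-1 text, two bytes for
# BMP text, four once any character is outside the BMP.
for s in ('abcd', 'abcé', 'abc\u20ac', 'abc\U0001f435'):
    print(ascii(s), len(s), sys.getsizeof(s))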
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 03/28/2013 12:54 PM, ru...@yahoo.com wrote: On 03/28/2013 01:48 AM, Steven D'Aprano wrote: On Wed, 27 Mar 2013 22:42:18 -0700, rusi wrote: For someone who delights in pointing out the logical errors of others you are often remarkably sloppy in your own logic. Of course language can be both helpful and excessively strong. That is the case when language less strong would be equally or more helpful. It can also be the case when language less strong would be useless. Further, "liar" is both so non-objective and so pejoratively emotive that it is a word much more likely to be used by someone interested in trolling than in a serious discussion, so most sensible people here likely would not bite. Non-objective? If today poster B says X, and tomorrow poster B says s/he was unaware of X until just now, is not "liar" a reasonable conclusion? I hope that we all agree that we want a nice, friendly, productive community where everyone is welcome. I hope so too but it is likely that some people want a place to develop and assert some sense of influence, engage in verbal duels, instigate arguments, etc. That can be true of regulars here as well as drive-by posters. But some people simply cannot or will not behave in ways that are compatible with those community values. There are some people whom we *do not want here* In other words, everyone is NOT welcome. Correct. Do you not agree? -- spoilers and messers, vandals and spammers and cheats and liars and trolls and crackpots of all sorts. Where those terms are defined by you and a handful of other voracious posters. "Troll" in particular is often used to mean someone who disagrees with the borg mind here, or who says anything negative about Python, or who due to attitude or lack of full English fluency does not express themselves in a sufficiently submissive way. I cannot speak for the borg mind, but for myself a troll is anyone who continually posts rants (such as RR & XL) or who continuously hijacks threads to talk about their pet peeve (such as jmf). We only disagree as to the best way to make it clear to them that they are not welcome so long as they continue their behaviour. No, we disagree on who fits those definitions and even on how tolerant we are toward those who do fit the definitions. The policing that you and a handful of other self-appointed net-cops try to do is far more obnoxious than the original posts are. I completely disagree, and I am grateful to those who bother to take the time to continually point out the errors from those posters and to warn newcomers that those posters should not be believed. Believe it or not, most of the rest of us here are smart enough to form our own opinions of such posters without you and the other c.l.p truthsquad members telling us what to think. If one of my first few posts on c.l.p netted a response from a troll, I would greatly appreciate a reply from one of the regulars saying that it was a troll, so I didn't waste time trying to use whatever they said, or be concerned that the language I was trying to use and learn was horribly flawed. If the truthsquad posts are so offensive to you, why don't you kill-file them? -- ~Ethan~ -- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 28 mar, 21:29, Benjamin Kaplan benjamin.kap...@case.edu wrote: [snip Benjamin Kaplan's post, quoted in full above] -- I showed enough examples. As soon as you are using non-latin-1 chars, your optimization just becomes irrelevant, and not only this, you are penalized. 
I'm sorry, saying "Python now just covers the whole Unicode range" is not a valuable excuse. I prefer a correct version with a narrower range of chars, especially if this range represents the daily used chars. I can go a step further: if I wish to write an application for Western European users, I'm better served if I'm using a coding scheme covering all these languages/scripts. What about cp1252 [*]? Does this not remind you of something? Python can do better; it only succeeds in doing worse! [*] yes, I know, internally jmf -- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 28 mar, 22:11, jmfauth wxjmfa...@gmail.com wrote: [snip jmf's previous post, quoted in full above] - Addendum. And you know what? Py34 will suffer from the same disease. You are spending your time improving chunks of bytes, when the problem is elsewhere. In fact you are working for peanuts, e.g. the replacing method. If you are not satisfied with my examples, just pick up the examples of GvR (ascii-string) on the bug tracker, timeit them and you will see there is already a problem. Better, timeit them after having replaced his ascii-strings with non-ascii characters, and you will see. jmf -- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On Fri, Mar 29, 2013 at 7:26 AM, jmfauth wxjmfa...@gmail.com wrote: The wide build (which I never used) is in my mind as correct as the narrow build. It just covers a different range in Unicode (the whole range). Actually, the narrow build also covers all of the Unicode range, by using (effectively) UTF-16. Characters that cannot be represented in one 16-bit number are represented in two. That's not just covering a different range. It's being buggy. And it's creating a way for code to unexpectedly behave fundamentally differently on Windows and Linux (since the most common builds for Windows were narrow and for Linux were wide). This is a Bad Thing for Python. ChrisA -- http://mail.python.org/mailman/listinfo/python-list
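The surrogate mechanism Chris describes is simple arithmetic; a minimal sketch:

def to_surrogates(cp):
    # Split a non-BMP code point into the UTF-16 surrogate pair
    # a narrow build would store.
    assert cp > 0xFFFF
    cp -= 0x10000
    return 0xD800 | (cp >> 10), 0xDC00 | (cp & 0x3FF)

print([hex(u) for u in to_surrogates(0x1F435)])  # ['0xd83d', '0xdc35']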
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 28/03/2013 21:11, jmfauth wrote: [snip the full exchange, quoted above] If you're that concerned about it, why don't you modify the source code so that the string representation chooses between only 2 bytes and 4 bytes per codepoint, and then see whether you prefer that situation. How do the memory usage and speed compare? -- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On Thu, Mar 28, 2013 at 2:11 PM, jmfauth wxjmfa...@gmail.com wrote: [snip the full exchange, quoted above] By that logic, we should all be using ASCII because it's correct for the 128 characters that I (as an English speaker) use, and therefore it's all that we should care about. I don't care if é counts as two characters; it's faster and more memory efficient for all of my strings to just count bytes. There are certain domains where characters outside the basic multilingual plane are used. Python's job is to be correct in all of those circumstances, not just the ones you care about. -- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On Thursday, March 28, 2013 at 11:40:17 AM UTC+8, Chris Angelico wrote: On Thu, Mar 28, 2013 at 2:18 PM, Ethan Furman et...@stoneleaf.us wrote: Has anybody else thought that [jmf's] last few responses are starting to sound bot'ish? Yes, I did wonder. It's like he and Dihedral have been trading accounts sometimes. Hey, Dihedral, I hear there's a discussion of Unicode and PEP 393 and Python 3.3 and Unicode and lots of keywords for you to trigger on and Python and bots are funny and this text is almost grammatical! There. Let's see if he takes the bait. ChrisA Well, we need some cheap RAM to hold 4 bytes per character in a text segment to be observed. For those not to be observed or shown, the old way still works. Windows got this job done right to collect taxes in areas of different languages. -- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 3/28/2013 4:26 PM, jmfauth wrote: Please provide references for your assertions. I have read the Unicode standard, parts of it more than once, and your assertions contradict my memory. Unicode does not stipulate that one has to cover the whole range. I believe it does. As I remember, the recognized encodings all encode the entire Unicode codepoint range. Unicode expects that every character in a range behaves the same way. I have no idea what you mean by 'same way'. Each codepoint is supposed to behave differently in some way. That is the reason for having multiple codepoints. One causes an 'a' to appear, another a 'b'. Indeed, the standard defines multiple categories of codepoints, and chars in different categories are supposed to act differently (or be treated differently). Glyphic chars versus control chars are one example. -- Terry Jan Reedy -- http://mail.python.org/mailman/listinfo/python-list
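Terry's point about categories is easy to check from Python with the standard unicodedata module:

import unicodedata
# Letters, currency symbols, control characters and pictographic
# symbols fall into different general categories.
for ch in ('a', '\u20ac', '\n', '\U0001f435'):
    print(ascii(ch), unicodedata.category(ch))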
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On Fri, Mar 29, 2013 at 10:53 AM, Dennis Lee Bieber wlfr...@ix.netcom.com wrote: On Wed, 27 Mar 2013 23:12:21 -0700, Ethan Furman et...@stoneleaf.us declaimed the following in gmane.comp.python.general: At some point we have to stop being gentle / polite / politically correct and call a shovel a shovel... er, spade. Call it an Instrument For the Transplantation of Dirt (Is an antique Steam Shovel ever a Steam Spade?) I don't know, but I'm pretty sure there's a private detective who wouldn't appreciate being called Sam Shovel. ChrisA -- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 28/03/2013 23:53, Dennis Lee Bieber wrote: On Wed, 27 Mar 2013 23:12:21 -0700, Ethan Furman et...@stoneleaf.us declaimed the following in gmane.comp.python.general: At some point we have to stop being gentle / polite / politically correct and call a shovel a shovel... er, spade. Call it an Instrument For the Transplantation of Dirt (Is an antique Steam Shovel ever a Steam Spade?) Surely you can spade a lot more things than dirt? -- If you're using GoogleCrap™ please read this http://wiki.python.org/moin/GoogleGroupsPython. Mark Lawrence -- http://mail.python.org/mailman/listinfo/python-list
Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]]
On Thu, 28 Mar 2013 10:11:59 -0600, Ian Kelly wrote: On Thu, Mar 28, 2013 at 8:38 AM, Chris Angelico ros...@gmail.com wrote: PEP393 strings have two optimizations, or kinda three: 1a) ASCII-only strings 1b) Latin1-only strings 2) BMP-only strings 3) Everything else Options 1a and 1b are almost identical - I'm not sure what the detail is, but there's something flagging those strings that fit inside seven bits. (Something to do with optimizing encodings later?) Both are optimized down to a single byte per character. The only difference for ASCII-only strings is that they are kept in a struct with a smaller header. The smaller header omits the utf8 pointer (which optionally points to an additional UTF-8 representation of the string) and its associated length variable. These are not needed for ASCII-only strings because an ASCII string can be directly interpreted as a UTF-8 string for the same result. The smaller header also omits the wstr_length field which, according to the PEP, differs from length only if there are surrogate pairs in the representation. For an ASCII string, of course there would not be any surrogate pairs. I wonder why they need to care about surrogate pairs? ASCII and Latin-1 strings obviously do not have them. Nor do BMP-only strings. It's only strings in the SMPs that could need surrogate pairs, and they don't need them in Python's implementation since it's a full 32-bit implementation. So where do the surrogate pairs come into this? I also wonder why the implementation bothers keeping a UTF-8 representation. That sounds like premature optimization to me. Surely you only need it when writing to a file with UTF-8 encoding? For most strings, that will never happen. -- Steven -- http://mail.python.org/mailman/listinfo/python-list
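One visible consequence of the two header layouts Ian describes can be sketched with getsizeof; the exact sizes vary by CPython version and platform:

import sys
# Both strings below store one byte per character, but the
# ASCII-only one uses the smaller struct header.
print(sys.getsizeof('abc'))   # compact ASCII header
print(sys.getsizeof('abç'))   # latin-1 data, larger header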
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On Thu, 28 Mar 2013 12:54:20 -0700, rurpy wrote: Even if you personally would prefer someone to respond by calling you a liar, your personal preferences do not form a basis for desirable posting behavior here. Whereas yours apparently are. Thanks for the feedback, I'll take it under advisement. -- Steven -- http://mail.python.org/mailman/listinfo/python-list
Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]]
On Fri, Mar 29, 2013 at 11:39 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: ASCII and Latin-1 strings obviously do not have them. Nor do BMP-only strings. It's only strings in the SMPs that could need surrogate pairs, and they don't need them in Python's implementation since it's a full 32- bit implementation. So where do the surrogate pairs come into this? PEP 393 says: wstr_length, wstr: representation in platform's wchar_t (null-terminated). If wchar_t is 16-bit, this form may use surrogate pairs (in which cast wstr_length differs form length). wstr_length differs from length only if there are surrogate pairs in the representation. utf8_length, utf8: UTF-8 representation (null-terminated). data: shortest-form representation of the unicode string. The string is null-terminated (in its respective representation). All three representations are optional, although the data form is considered the canonical representation which can be absent only while the string is being created. If the representation is absent, the pointer is NULL, and the corresponding length field may contain arbitrary data. If the string was created from a wchar_t string, that string will be retained, and presumably can be used to re-output the original for a clean and fast round-trip. Same with... I also wonder why the implementation bothers keeping a UTF-8 representation. That sounds like premature optimization to me. Surely you only need it when writing to a file with UTF-8 encoding? For most strings, that will never happen. ... the UTF-8 version. It'll keep it if it has it, and not else. A lot of content will go out in the same encoding it came in in, so it makes sense to hang onto it where possible. Though, from the same quote: The UTF-8 representation is null-terminated. Does this mean that it can't be used if there might be a \0 in the string? Minor nitpick, btw: (in which cast wstr_length differs form length) Should be in which case and from. Who has the power to correct typos in PEPs? ChrisA -- http://mail.python.org/mailman/listinfo/python-list
Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]]
On 29/03/2013 00:54, Chris Angelico wrote: Minor nitpick, btw: (in which cast wstr_length differs form length) Should be in which case and from. Who has the power to correct typos in PEPs? ChrisA Sneak it in here? http://bugs.python.org/issue13604 -- If you're using GoogleCrap™ please read this http://wiki.python.org/moin/GoogleGroupsPython. Mark Lawrence -- http://mail.python.org/mailman/listinfo/python-list
Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]]
On Fri, Mar 29, 2013 at 12:03 PM, Mark Lawrence breamore...@yahoo.co.uk wrote: On 29/03/2013 00:54, Chris Angelico wrote: Minor nitpick, btw: (in which cast wstr_length differs form length) Should be in which case and from. Who has the power to correct typos in PEPs? Sneak it in here? http://bugs.python.org/issue13604 Ah! Turns out it's already been fixed; a reword of that section, as shown in the attached files, no longer has the parenthesis, and thus its typos. ChrisA -- http://mail.python.org/mailman/listinfo/python-list
Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]]
On 29/03/2013 00:54, Chris Angelico wrote: [snip] Though, from the same quote: The UTF-8 representation is null-terminated. Does this mean that it can't be used if there might be a \0 in the string? You could ask the same question about any encoding. It's only an issue if it's passed to a C function which expects a null-terminated string. -- http://mail.python.org/mailman/listinfo/python-list
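A minimal illustration of that caveat:

s = 'a\x00b'                # a Python str may contain NUL
data = s.encode('utf-8')    # b'a\x00b' - perfectly valid UTF-8
print(len(data))            # 3; but a C consumer reading a
                            # null-terminated char* would stop at 'a'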
Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]]
On Fri, 29 Mar 2013 11:54:41 +1100, Chris Angelico wrote: On Fri, Mar 29, 2013 at 11:39 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: ASCII and Latin-1 strings obviously do not have them. Nor do BMP-only strings. It's only strings in the SMPs that could need surrogate pairs, and they don't need them in Python's implementation since it's a full 32-bit implementation. So where do the surrogate pairs come into this? PEP 393 says: wstr_length, wstr: representation in platform's wchar_t (null-terminated). If wchar_t is 16-bit, this form may use surrogate pairs (in which cast wstr_length differs form length). wstr_length differs from length only if there are surrogate pairs in the representation. utf8_length, utf8: UTF-8 representation (null-terminated). data: shortest-form representation of the unicode string. The string is null-terminated (in its respective representation). All three representations are optional, although the data form is considered the canonical representation which can be absent only while the string is being created. If the representation is absent, the pointer is NULL, and the corresponding length field may contain arbitrary data. All the words are in English (well, most of them...) but what does it mean? If the string was created from a wchar_t string, that string will be retained, and presumably can be used to re-output the original for a clean and fast round-trip. Under what circumstances will a string be created from a wchar_t string? How, and why, would such a string be created? Why would Python still support strings containing surrogates when it now has a nice, shiny, surrogate-free flexible representation? I also wonder why the implementation bothers keeping a UTF-8 representation. That sounds like premature optimization to me. Surely you only need it when writing to a file with UTF-8 encoding? For most strings, that will never happen. ... the UTF-8 version. It'll keep it if it has it, and not else. A lot of content will go out in the same encoding it came in in, so it makes sense to hang onto it where possible. Not to me. That almost doubles the size of the string, on the off-chance that you'll need the UTF-8 encoding. Which for many uses, you don't, and even if you do, it seems like premature optimization to keep it around just in case. Encoding to UTF-8 will be fast for small N, and for large N, why carry around (potentially) multiple megabytes of duplicated data just in case the encoded version is needed some time? -- http://mail.python.org/mailman/listinfo/python-list
Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]]
On Fri, Mar 29, 2013 at 1:37 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: Under what circumstances will a string be created from a wchar_t string? How, and why, would such a string be created? Why would Python still support strings containing surrogates when it now has a nice, shiny, surrogate-free flexible representation? Strings are created from some form of content. If not from another Python string, then - most likely - it's from a stream of bytes. If from a C API that returns wchar_t, then it'd make sense to have that form around. ChrisA -- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
Steven D'Aprano: Some string operations need to inspect every character, e.g. str.upper(). Even for them, the increased complexity of a variable-width encoding costs. It's not sufficient to walk the string inspecting a fixed 1, 2 or 4 bytes per character. You have to walk the string grabbing 1 byte at a time, and then decide whether you need another 1, 2 or 3 bytes. Even though it's still O(N), the added bit-masking and overhead of variable-width encoding adds to the overall cost. It does add to implementation complexity but should only add a small amount of time. To compare costs, I am using the text of the web site http://www.mofa.go.jp/mofaj/ since it has a reasonable amount (10%) of multi-byte characters. Since the document fits in the BMP, Python would choose a 2-byte-wide implementation, so I am emulating that choice with a very simple 16-bit table-based upper-caser. Real Unicode case-conversion code is more concerned with edge cases like Turkic and Lithuanian locales and Greek combining characters, and also with allowing for measurement/reallocation for the cases where the result is smaller/larger. See, for example, glib's real_toupper in https://git.gnome.org/browse/glib/tree/glib/guniprop.c Here is some simplified example code that implements upper-casing over 16-bit wide (utf16_up) and UTF-8 (utf8_up) buffers: http://www.scintilla.org/UTF8Up.cxx Since I didn't want to spend too much time writing code, it only handles the BMP and doesn't have upper-case table entries outside ASCII for now. If this was going to be worked on further to be made maintainable, most of the masking and so forth would be in macros similar to UTF8_COMPUTE/UTF8_GET in glib. The UTF-8 case ranges from around 5% slower on average in a 32-bit release build (VC2012 on an i7 870) to averaging a little faster in a 64-bit build. They're both around a billion characters per second.
C:\u\hg\UpUTF\UpUTF..\x64\Release\UpUTF.exe
Time taken for UTF8 of 80449=0.006528
Time taken for UTF16 of 71525=0.006610
Relative time taken UTF8/UTF16 0.987581
Any string method that takes a starting offset requires the method to walk the string byte-by-byte. I've even seen languages put responsibility for dealing with that onto the programmer: the start offset is given in *bytes*, not characters. I don't remember what language this was... it might have been Haskell? Whatever it was, it horrified me. It doesn't horrify me - I've been working this way for over 10 years and it seems completely natural. You can wrap access in iterators that hide the byte offsets if you like. This then ensures that all operations on those iterators are safe, only allowing the iterator to point at the start/end of valid characters. Sure. And over a different set of samples, it is less compact. If you write a lot of Latin-1, Python will use one byte per character, while UTF-8 will use two bytes per character. I think you mean writing a lot of Latin-1 characters outside ASCII. However, even people writing texts in, say, French will find that only a small proportion of their text is outside ASCII and so the cost of UTF-8 is correspondingly small. The counter-problem is that a French document that needs to include one mathematical symbol (or emoji) outside Latin-1 will double in size as a Python string. Neil -- http://mail.python.org/mailman/listinfo/python-list
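For readers following along in Python, here is a minimal sketch of the per-character decision being discussed, assuming well-formed UTF-8 input:

def utf8_seq_len(lead):
    # Number of bytes in the sequence that starts with this lead byte.
    if lead < 0x80:
        return 1
    if lead < 0xE0:
        return 2
    if lead < 0xF0:
        return 3
    return 4

def char_offsets(data):
    # Yield the byte offset of each character in a UTF-8 bytes object.
    i = 0
    while i < len(data):
        yield i
        i += utf8_seq_len(data[i])

print(list(char_offsets('aé€'.encode('utf-8'))))  # [0, 1, 3]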
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
MRAB: Implementing the regex module (http://pypi.python.org/pypi/regex) would have been more difficult if the internal representation had been UTF-8, because of the need to decode, and the implementation would also have been slower for that reason. One way to build regex support for UTF-8 is to build a fixed-width version of the regex code and then interpose an object that converts between the UTF-8 representation and that code. The C++11 standard library contains a regex template that can be instantiated over a UTF-8 representation in this way. Neil -- http://mail.python.org/mailman/listinfo/python-list
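To see why some conversion layer is needed, compare a byte-level regex over UTF-8 with a character-level one (using Python's re module here purely for illustration):

import re
data = 'héllo wörld'.encode('utf-8')
# The byte pattern cannot know that the bytes of é and ö belong to
# word characters, so words get split at every non-ASCII character.
print(re.findall(rb'\w+', data))                 # [b'h', b'llo', b'w', b'rld']
print(re.findall(r'\w+', data.decode('utf-8')))  # ['héllo', 'wörld']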
unicode and the FSR [was: Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]]
On 03/28/2013 08:34 PM, Neil Hodgson wrote: Steven D'Aprano: Any string method that takes a starting offset requires the method to walk the string byte-by-byte. I've even seen languages put responsibility for dealing with that onto the programmer: the start offset is given in *bytes*, not characters. I don't remember what language this was... it might have been Haskell? Whatever it was, it horrified me. It doesn't horrify me - I've been working this way for over 10 years and it seems completely natural. Horrifying or not, I am willing to give up a small amount of speed for correctness. Heck, I'm willing to give up a lot of speed for correctness. Once I have my slow but correct prototype going I can recode in a faster language (if needed) and compare its blazingly fast output with my slowly-generated but known-good output. You can wrap access in iterators that hide the byte offsets if you like. This then ensures that all operations on those iterators are safe, only allowing the iterator to point at the start/end of valid characters. Sure. Or I can let Python handle it for me. The counter-problem is that a French document that needs to include one mathematical symbol (or emoji) outside Latin-1 will double in size as a Python string. True. But how often do you have the entire document as a single string? Use readlines() instead of read(). Besides, memory is cheap. -- ~Ethan~ -- http://mail.python.org/mailman/listinfo/python-list
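A sketch of the line-at-a-time pattern Ethan suggests; 'document.txt' is a hypothetical file name:

total = 0
with open('document.txt', encoding='utf-8') as f:  # placeholder path
    for line in f:        # file objects iterate lazily, one line at a time
        total += len(line)
print(total)              # whole-document statistics without one giant string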
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On Fri, Mar 29, 2013 at 2:34 PM, Neil Hodgson nhodg...@iinet.net.au wrote: It doesn't horrify me - I've been working this way for over 10 years and it seems completely natural. You can wrap access in iterators that hide the byte offsets if you like. This then ensures that all operations on those iterators are safe only allowing the iterator to point at the start/end of valid characters. But both this and your example of case conversion are, fundamentally, iterating over the string. What if you aren't doing that? What if you want to parse and process? ChrisA -- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
Chris Angelico: But both this and your example of case conversion are, fundamentally, iterating over the string. What if you aren't doing that? What if you want to parse and process? Parsing is also normally a scanning operation. If you want to process pieces of the string based on the parse then you remember the positions (as iterators) at the significant places and extract/process the data based on those positions. Neil -- http://mail.python.org/mailman/listinfo/python-list
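A minimal sketch of that remember-positions-then-slice style over UTF-8 bytes (a hypothetical field splitter; safe because an ASCII byte like b',' can never occur inside a multi-byte UTF-8 sequence):

def split_fields(data, sep=b','):
    # One scan to record separator positions, then slicing - no
    # per-character decoding needed.
    start, out = 0, []
    for i, byte in enumerate(data):
        if byte == sep[0]:
            out.append(data[start:i])
            start = i + 1
    out.append(data[start:])
    return out

print(split_fields('é,€,z'.encode('utf-8')))  # [b'\xc3\xa9', b'\xe2\x82\xac', b'z']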
flaming vs accuracy [was Re: Performance of int/long in Python 3]
On 03/27/2013 06:47 PM, Steven D'Aprano wrote: On Wed, 27 Mar 2013 11:51:07 +0000, Mark Lawrence wrote, defending an unproductive post flaming a troll: I wouldn't call it unproductive -- a half-dozen amusing posts followed because of Mark's initial post, and they were a great relief from the tedium and (dare I say it?) idiocy of jmf's posts. He's not going to change so neither am I. "He's a troll disrupting the newsgroup, therefore I'm going to be a troll disrupting the newsgroup too, so nyah!!!" So long as Mark doesn't start cussing and swearing I'm not going to get worked up about it. I find jmf's posts far more aggravating. I also suggest you go and moan at Steven D'Aprano who called the idiot a liar. Although thinking about it, I prefer Steven's comment to my own as being more accurate. Yes I did, I suggest you reflect on the difference in content between your post and mine, and why yours can be described as abusive flaming and mine shouldn't be. Mark's post was not, in my not-so-humble opinion, abusive. jmf's (again IMNSHO) was. Your post (Steven's) was possibly more accurate, but Mark's was more amusing, and generated more amusing responses. Clearly, jmf is not going to change his thread-hijacking unicode-whining behavior, whether faced with the cold rational responses or the hotter fed-up responses. So I guess what I'm saying is: Don't Feed The Trolls (Anyone!) ;) Of course, somebody still has to reply so a newcomer doesn't get taken in by him. Has anybody else thought that his last few responses are starting to sound bot'ish? -- ~Ethan~ -- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On Thu, Mar 28, 2013 at 2:18 PM, Ethan Furman et...@stoneleaf.us wrote: Has anybody else thought that [jmf's] last few responses are starting to sound bot'ish? Yes, I did wonder. It's like he and Dihedral have been trading accounts sometimes. Hey, Dihedral, I hear there's a discussion of Unicode and PEP 393 and Python 3.3 and Unicode and lots of keywords for you to trigger on and Python and bots are funny and this text is almost grammatical! There. Let's see if he takes the bait. ChrisA -- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On Mar 28, 8:18 am, Ethan Furman et...@stoneleaf.us wrote: So long as Mark doesn't start cussing and swearing I'm not going to get worked up about it. I find jmf's posts far more aggravating. I support Ned's original gentle reminder -- please be civil irrespective of surrounding nonsensical behavior. In particular, "You are a liar" is as bad as "You are an idiot". The same statement can be made non-abusively thus: "... is not true because ...". -- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On Wed, 27 Mar 2013 20:49:20 -0700, rusi wrote: On Mar 28, 8:18 am, Ethan Furman et...@stoneleaf.us wrote: So long as Mark doesn't start cussing and swearing I'm not going to get worked up about it. I find jmf's posts far more aggravating. I support Ned's original gentle reminder -- please be civil irrespective of surrounding nonsensical behavior. In particular, "You are a liar" is as bad as "You are an idiot". The same statement can be made non-abusively thus: "... is not true because ...". I accept that criticism, even if I disagree with it. Does that make sense? I mean it in the sense that I accept that your opinion differs from mine. Politeness does not always trump honesty, and stating that somebody's statement "is not true because..." is not the same as stating that they are deliberately telling lies (rather than merely being mistaken or confused). The world is full of people who deliberately, and in complete awareness of what they are doing, lie in order to further their agenda, or for profit, or to feel good about themselves, or to harm others. There comes a time when politely ignoring the elephant in the room (the dirty, rotten, lying scoundrel of an elephant) and giving them the benefit of the doubt simply makes life worse for everyone except the liars. We all know this. Unless you've been living in a cave on the top of some mountain, we all know people whose relationship to the truth is, shall we say, rather bendy. And yet we collectively muddy the water and inject uncertainty into debate by politely going along with their lies, or at least treating them with dignity they don't deserve, by treating them as at worst a matter of honest misunderstanding or even mere difference of opinion. As an Australian, I am constitutionally required to call a spade a bloody shovel at least twice a week, so I have no regrets. -- Steven -- http://mail.python.org/mailman/listinfo/python-list
Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
On Mar 28, 10:20 am, Steven D'Aprano steve +comp.lang.pyt...@pearwood.info wrote: On Wed, 27 Mar 2013 20:49:20 -0700, rusi wrote: On Mar 28, 8:18 am, Ethan Furman et...@stoneleaf.us wrote: So long as Mark doesn't start cussing and swearing I'm not going to get worked up about it. I find jmf's posts far more aggravating. I support Ned's original gentle reminder -- please be civil irrespective of surrounding nonsensical behavior. In particular, "You are a liar" is as bad as "You are an idiot". The same statement can be made non-abusively thus: "... is not true because ...". I accept that criticism, even if I disagree with it. Does that make sense? I mean it in the sense that I accept that your opinion differs from mine. Politeness does not always trump honesty, and stating that somebody's statement "is not true because..." is not the same as stating that they are deliberately telling lies (rather than merely being mistaken or confused). The world is full of people who deliberately, and in complete awareness of what they are doing, lie in order to further their agenda, or for profit, or to feel good about themselves, or to harm others. There comes a time when politely ignoring the elephant in the room (the dirty, rotten, lying scoundrel of an elephant) and giving them the benefit of the doubt simply makes life worse for everyone except the liars. We all subscribe to legal systems that decide the undecidable; e.g. A pulled out a gun and killed B. Was it murder, manslaughter, just a mistake, or euthanasia? Any lawyer with experience knows that horrible mistakes happen in making these decisions; yet they (the judges) need to make them. For the purposes of the Python list, these ascriptions of personal motives are OT enough to be out of place. We all know this. Unless you've been living in a cave on the top of some mountain, we all know people whose relationship to the truth is, shall we say, rather bendy. And yet we collectively muddy the water and inject uncertainty into debate by politely going along with their lies, or at least treating them with dignity they don't deserve, by treating them as at worst a matter of honest misunderstanding or even mere difference of opinion. As an Australian, I am constitutionally required to call a spade a bloody shovel at least twice a week, so I have no regrets. If someone has been physically injured by the spade then it's a bloody spade; else you are a bloody liar :-) Well… More seriously, I've never seen anyone -- cause or person -- aided by the use of excessively strong language. IOW I repeat my support for Ned's request: Ad hominem attacks are not welcome, irrespective of the context/provocation. -- http://mail.python.org/mailman/listinfo/python-list