Re: same code to login,one is ok,another is not
The first one uses http and the second one https. Did you try an https handler, as I already pointed out to you in the other thread on the same topic?

Please don't spam the list. If you aren't getting the answers you were looking for, wait a while and then repost; don't just keep opening threads until you get an answer.

2011/8/15 Steven D'Aprano steve+comp.lang.pyt...@pearwood.info

守株待兔 wrote:

1. http://www.renren.com/Login.do it is ok, my code: [...]
2. https://passport.baidu.com/?login can't login, my code: [...]

Do you have a question, or are you just sharing the bad news?

Websites may choose to respond to login attempts differently. Some may require cookies, some may not. Some may check the referrer, some may not. Some may look at the user agent, some may not. If the web developer of the site insists that you log in with a browser, or Internet Explorer, you have to fight to convince the web server to let you in.

Many websites really try hard to prevent bots and scripts logging in. The closer you can imitate what a real human being in a browser does, the better the chances you can fool the server that you are a real human being using a browser and not a bot. (Since your script *is* a bot, you may also be in violation of the web site's terms of service.) Some web sites may even check how often you try to log in, or how fast.

But what makes you think you can't log in? Given the response below, it looks to me that you did log in, and got a blank page with some javascript to redirect you to the real content page. (If you are a web developer and you do this, I hate you.) But I may be wrong -- I'm not an expert on these things.

    <!--STATUS OK-->
    <html><head><title>[...]</title>
    <meta http-equiv=content-type content=text/html; charset=gb2312>
    <META http-equiv='Pragma' content='no-cache'>
    </head>
    <body>
    <script>
    var url="./?pwd=1"
    url=url.replace(/^\.\//gi,"http://passport.baidu.com/");
    location.href=url;
    </script>
    </body>
    </html>

-- Steven
-- http://mail.python.org/mailman/listinfo/python-list
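A minimal sketch of the points raised above (cookies, referrer, user agent), using only the Python 3 standard library. The URL, the form field names and the header values are hypothetical placeholders, not details of either site in the thread. Nothing is sent over the network here; the code only builds the opener and the request.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical values for illustration only; not taken from the original post.
LOGIN_URL = "https://example.com/login"
FORM_FIELDS = {"username": "alice", "password": "secret"}


def build_login_request(url, fields):
    """Return (opener, request) for a form POST that keeps cookies."""
    # The cookie jar stores any session cookie the server sets on login.
    jar = http.cookiejar.CookieJar()
    # build_opener() adds the default handlers, including HTTPS support
    # when Python was built with SSL.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Supplying a body makes this a POST request.
    data = urllib.parse.urlencode(fields).encode("ascii")
    request = urllib.request.Request(url, data=data)
    # Some sites check these headers; the values here are placeholders.
    request.add_header("User-Agent", "Mozilla/5.0 (illustrative)")
    request.add_header("Referer", url)
    return opener, request


opener, request = build_login_request(LOGIN_URL, FORM_FIELDS)
# Nothing has been sent yet; opener.open(request) would perform the login.
```

Whether a given site accepts such a request depends on the site, and scripted logins may be against its terms of service, as noted above.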
Re: allow line break at operators
On 8/15/2011 12:28 AM, Seebs wrote: To repeat again: you are free to put in explicit dedent markers that will let you re-indent code should all indents be removed. -- Terry Jan Reedy -- http://mail.python.org/mailman/listinfo/python-list
Help needed with using SWIG wrapped code in Python
Hi,

I have wrapped a library from C++ to Python using SWIG. But I am facing problems while importing and using it in Python.

    $ python
    >>> import pyossimtest
    >>> import pyossim
    >>> a = ["Image1.png", "Image2.png"]
    >>> b = pyossimtest.Info()
    >>> b.initialize(len(a), a)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "pyossimtest.py", line 84, in initialize
        def initialize(self, *args): return _pyossimtest.Info_initialize(self, *args)
    TypeError: in method 'Info_initialize', argument 3 of type 'char *[]'

What does this error message imply? I have already handled char** as a special case in SWIG using typemaps. Here is the code excerpt from the SWIG-generated .cxx file:

    SWIGINTERN PyObject *_wrap_Info_initialize(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
      PyObject *resultobj = 0;
      pyossimtest::Info *arg1 = (pyossimtest::Info *) 0 ;
      int arg2 ;
      char **arg3 ;
      void *argp1 = 0 ;
      int res1 = 0 ;
      PyObject * obj0 = 0 ;
      PyObject * obj1 = 0 ;
      bool result;

      if (!PyArg_ParseTuple(args,(char *)"OO:Info_initialize",&obj0,&obj1)) SWIG_fail;
      res1 = SWIG_ConvertPtr(obj0, &argp1,SWIGTYPE_p_pyossimtest__Info, 0 | 0 );
      if (!SWIG_IsOK(res1)) {
        SWIG_exception_fail(SWIG_ArgError(res1), "in method 'Info_initialize', argument 1 of type 'pyossimtest::Info *'");
      }
      arg1 = reinterpret_cast< pyossimtest::Info * >(argp1);
      {
        int i;
        if (!PyList_Check(obj1)) {
          PyErr_SetString(PyExc_ValueError, "Expecting a list");
          return NULL;
        }
        arg2 = PyList_Size(obj1);
        arg3 = (char **) malloc((arg2+1)*sizeof(char *));
        for (i = 0; i < arg2; i++) {
          PyObject *s = PyList_GetItem(obj1,i);
          if (!PyString_Check(s)) {
            free(arg3);
            PyErr_SetString(PyExc_ValueError, "List items must be strings");
            return NULL;
          }
          arg3[i] = PyString_AsString(s);
        }
        arg3[i] = 0;
      }
      {
        try {
          result = (bool)(arg1)->initialize(arg2,arg3);
        } catch (const std::exception& e) {
          SWIG_exception(SWIG_RuntimeError, e.what());
        }
      }
      resultobj = SWIG_From_bool(static_cast< bool >(result));
      {
        if (arg3) free(arg3);
      }
      return resultobj;
    fail:
      {
        if (arg3) free(arg3);
      }
      return NULL;
    }

Kindly help.

Thanks and regards,
Vipul Raheja
-- http://mail.python.org/mailman/listinfo/python-list
Re: allow line break at operators
On Mon, Aug 15, 2011 at 5:28 AM, Seebs usenet-nos...@seebs.net wrote: Character stream: tab tab tab foo newline tab bar. This is, as you say, *usually* two dedents, but it could be one. I see your point, though I cannot imagine anyone who would use tab tab as an indent level. But if you go from 16 spaces down to 8, it's possible that the script uses eight space indents, or four. On Mon, Aug 15, 2011 at 8:31 AM, Terry Reedy tjre...@udel.edu wrote: To repeat again: you are free to put in explicit dedent markers that will let you re-indent code should all indents be removed. This would be a solution to the above, but it has the feeling of syntactic salt. (I don't believe that braces are, because they afford a different form of flexibility.) But sure, if you configure your editor to do it for you. I type: if (blah): It puts: if (blah): | # if with the cursor at the | marker. I never really got used to editors doing this for me, though. It didn't feel right. I prefer an editor that deals with my indentation but lets me do the rest; when I hit enter, it autoindents to either the current indent level or one greater, depending on whether it looks like there ought to be an indent (which mainly happens when I put a loose { on a line). Similarly, when I put a } into the file, it removes an indent level automatically. Still, it wouldn't be hard to make an editor put those dedent comments in, if you want them. ChrisA -- http://mail.python.org/mailman/listinfo/python-list
Re: pythonw.exe
On Mon, Aug 15, 2011 at 3:14 AM, Dennis Lee Bieber wlfr...@ix.netcom.com wrote: Depends... DOS, to me, is just short for Disk Operating System... I've source code (in a book) for K2FDOS, source code for LS-DOS 6, and have used the AmigaDOS component of AmigaOS (granted -- AmigaDOS technically was the part of the OS that gave access to the I/O system, and included the command line interpreter...). DOS does not automatically mean MicroSoft DOS... I would say that DOS can, in a Windows context, mean either MS-DOS or a generic Disk Operating System. The latter sense is no more appropriate to the CLI than the former; in a modern OS, the part that truly operates the disk would be either the kernel or the hard disk driver, depending on your point of view, and neither of those has any sort of UI. What most call DOS is, to me, merely a command line interpreter (CLI). And that's really what we have. A shell. A CLI. A textual command parser (as opposed to a graphical action system which is what most GUIs are). It's more similar to a MUD than to an operating system - first space-separated word is a verb, everything else is modifiers. ChrisA -- http://mail.python.org/mailman/listinfo/python-list
Re: allow line break at operators
On Aug 14, 2011 3:24 PM, Seebs usenet-nos...@seebs.net wrote: ... I'm not impressed by arguments based on but if I do something stupid, like select text with my eyes closed and reindent it without looking, I expect the compiler to save my bacon. In my opinion, it's not the compiler's job to protect you from errors caused by sheer carelessness at the keyboard. I don't know about sheer carelessness. Typos happen. Typos are not something you can prevent from happening just by wanting it very much. If you have valid code caused by improper indentation, shouldn't that be caught by a good set of unit tests? -- http://mail.python.org/mailman/listinfo/python-list
surprising interaction between function scope and class namespace
Hi,

I just stumbled over this:

    >>> A = 1
    >>> def foo(x):
    ...     A = x
    ...     class X:
    ...         a = A
    ...     return X
    ...
    >>> foo(2).a
    2
    >>> def foo(x):
    ...     A = x
    ...     class X:
    ...         A = A
    ...     return X
    ...
    >>> foo(2).A
    1

Works that way in Py2.7 and Py3.3.

I couldn't find any documentation on this, but my *guess* about the reasoning is that the second case contains an assignment to A inside of the class namespace, and assignments make a variable local to a scope, in this case, the function scope. Therefore, the A on the rhs is looked up in that scope as well. However, this is just a totally hand waving guess.

Does anyone have a better explanation or know of a place where this specific behaviour is documented?

Stefan
-- http://mail.python.org/mailman/listinfo/python-list
Re: surprising interaction between function scope and class namespace
Stefan Behnel, 15.08.2011 11:33:

I just stumbled over this:

    >>> A = 1
    >>> def foo(x):
    ...     A = x
    ...     class X:
    ...         a = A
    ...     return X
    ...
    >>> foo(2).a
    2
    >>> def foo(x):
    ...     A = x
    ...     class X:
    ...         A = A
    ...     return X
    ...
    >>> foo(2).A
    1

Works that way in Py2.7 and Py3.3.

I couldn't find any documentation on this, but my *guess* about the reasoning is that the second case contains an assignment to A inside of the class namespace, and assignments make a variable local to a scope, in this case, the function scope. Therefore, the A on the rhs is looked up in that scope as well. However, this is just a totally hand waving guess.

... and an incorrect one, as it turns out. I think I interpreted the results the wrong way around. Still:

Does anyone have a better explanation or know of a place where this specific behaviour is documented?

Stefan
-- http://mail.python.org/mailman/listinfo/python-list
Re: Help needed with using SWIG wrapped code in Python
Vipul Raheja, 15.08.2011 10:08:

I have wrapped a library from C++ to Python using SWIG. But I am facing problems while importing and using it in Python.

    $ python
    >>> import pyossimtest
    >>> import pyossim
    >>> a = ["Image1.png", "Image2.png"]
    >>> b = pyossimtest.Info()
    >>> b.initialize(len(a), a)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "pyossimtest.py", line 84, in initialize
        def initialize(self, *args): return _pyossimtest.Info_initialize(self, *args)
    TypeError: in method 'Info_initialize', argument 3 of type 'char *[]'

What does this error message imply? I have already handled char** as a special case in SWIG using typemaps.

I have little experience with SWIG, so I can't comment much on the problem at hand, but what I can do is to encourage you to use Cython instead. It's faster, easier to use and much more versatile for writing Python wrappers than SWIG. Basically, it provides you with the full power and flexibility of a programming language, whereas SWIG (like all automatic wrapper generators) is always limiting because it has its predefined ways of wrapping things, and if they don't fit, you're on your own fighting up-hill against it.

Stefan
-- http://mail.python.org/mailman/listinfo/python-list
Re: surprising interaction between function scope and class namespace
Stefan Behnel stefan...@behnel.de wrote: I couldn't find any documentation on this, but my *guess* about the reasoning is that the second case contains an assignment to A inside of the class namespace, and assignments make a variable local to a scope, in this case, the function scope. Therefore, the A on the rhs is looked up in that scope as well. However, this is just a totally hand waving guess. Does anyone have a better explanation or know of a place where this specific behaviour is documented? If it was a function rather than a class then in the first case you look up the non-local variable as expected and in the second case you get an UnboundLocalError. The only difference with the class definition is that instead of UnboundLocalError the lookup falls back to the global scope. This happens because class definitions use LOAD_NAME/STORE_NAME instead of LOAD_FAST/STORE_FAST and LOAD_NAME looks first in local scope and then in global. I suspect that http://docs.python.org/reference/executionmodel.html#naming-and-binding should say something about this but it doesn't. -- Duncan Booth http://kupuguy.blogspot.com -- http://mail.python.org/mailman/listinfo/python-list
Re: surprising interaction between function scope and class namespace
Stefan Behnel wrote:

Hi, I just stumbled over this:

    >>> A = 1
    >>> def foo(x):
    ...     A = x
    ...     class X:
    ...         a = A
    ...     return X
    ...
    >>> foo(2).a
    2
    >>> def foo(x):
    ...     A = x
    ...     class X:
    ...         A = A
    ...     return X
    ...
    >>> foo(2).A
    1

That's subtle.

Works that way in Py2.7 and Py3.3. I couldn't find any documentation on this, but my *guess* about the reasoning is that the second case contains an assignment to A inside of the class namespace, and assignments make a variable local to a scope, in this case, the function scope. Therefore, the A on the rhs is looked up in that scope as well. However, this is just a totally hand waving guess. Does anyone have a better explanation or know of a place where this specific behaviour is documented?

I think it's an implementation accident. Classes have a special opcode, LOAD_NAME, that allows for

    >>> x = 42
    >>> class A:
    ...     x = x
    ...
    >>> A.x
    42

which would fail in a function

    >>> def f():
    ...     x = x
    ...
    >>> f()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 2, in f
    UnboundLocalError: local variable 'x' referenced before assignment

LOAD_NAME is pretty dumb: it looks into the local namespace and, if that lookup fails, falls back to the global namespace. Someone probably thought "I can do better", and reused the static name lookup for nested functions for names that occur only on the right-hand side of assignments in a class.

Here's a slightly modified version of your demo:

    >>> x = "global"
    >>> def foo():
    ...     x = "local"
    ...     class A:
    ...         x = x
    ...     return A
    ...
    >>> def bar():
    ...     x = "local"
    ...     class A:
    ...         y = x
    ...     return A
    ...
    >>> foo().x
    'global'
    >>> bar().y
    'local'

Now let's have a glimpse at the bytecode:

    >>> import dis
    >>> foo.func_code.co_consts
    (None, 'local', 'A', <code object A at 0x7ffe311bdb70, file "<stdin>", line 3>, ())
    >>> dis.dis(foo.func_code.co_consts[3])
      3           0 LOAD_NAME                0 (__name__)
                  3 STORE_NAME               1 (__module__)

      4           6 LOAD_NAME                2 (x)
                  9 STORE_NAME               2 (x)
                 12 LOAD_LOCALS
                 13 RETURN_VALUE
    >>> bar.func_code.co_consts
    (None, 'local', 'A', <code object A at 0x7ffe311bd828, file "<stdin>", line 3>, ())
    >>> dis.dis(bar.func_code.co_consts[3])
      3           0 LOAD_NAME                0 (__name__)
                  3 STORE_NAME               1 (__module__)

      4           6 LOAD_DEREF               0 (x)
                  9 STORE_NAME               2 (y)
                 12 LOAD_LOCALS
                 13 RETURN_VALUE

-- http://mail.python.org/mailman/listinfo/python-list
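The two lookups discussed in this thread can be reproduced as a plain script. This is a small sketch assuming Python 3, where the opcode names differ from the Python 2 disassembly quoted above but the visible result is the same. The names `foo`, `bar`, `x` and `y` follow the demo in the thread; `result_foo` and `result_bar` are added here for illustration.

```python
x = "global"


def foo():
    x = "local"  # never seen by the class body below

    class A:
        # x is assigned in the class body, so the right-hand x is looked up
        # in the class namespace first and then in the module globals,
        # skipping the enclosing function.
        x = x

    return A


def bar():
    x = "local"

    class A:
        # x is only read in the class body, so it is compiled as a
        # reference to the enclosing function's variable.
        y = x

    return A


result_foo = foo().x  # "global"
result_bar = bar().y  # "local"
```

The current language reference describes this under the execution model: unbound local names in a class body are looked up in the global namespace.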
Re: allow line break at operators
On 08/14/2011 11:28 PM, Seebs wrote:

I tend to write stuff like

    foo.array_of_things.sort.map { block }.join(", ")

I like this a lot more than

    array = foo.array_of_things
    sorted_array = array.sort()
    mapped_array = [block(x) for x in sorted_array]
    ", ".join(mapped_array)

If you like the one-liner, this is readily written as

    ", ".join(block(x) for x in sorted(foo.array_of_things))

Modulo your gripes about string.join(), this is about as succinct (and more readable, IMHO) as your initial example. I've got piles of these sorts of things in my ETL code.

-tkc
-- http://mail.python.org/mailman/listinfo/python-list
Re: Ten rules to becoming a Python community member.
On 2011-08-14, Chris Angelico ros...@gmail.com wrote: On Sun, Aug 14, 2011 at 2:21 PM, Irmen de Jong irmen.nos...@xs4all.nl wrote: On 14-8-2011 7:57, rantingrick wrote: 8. Use e.g. as many times as you can! (e.g. e.g.) If you use e.g. more than ten times in a single post, you will get an invite to Guido's next birthday party; where you'll be forced to do shots whist walking the balcony railing wearing wooden shoes! I lolled about this one, e.g. I laughed out loud. But where are the tulips and windmills for extra credit? Greetings from a Dutchman! No credit. E.g., i.e., exampla gratis, means, for example. -- Neil Cerutti -- http://mail.python.org/mailman/listinfo/python-list
Re: Help needed with using SWIG wrapped code in Python
On Aug 15, 2011, at 4:08 AM, Vipul Raheja wrote: Hi, I have wrapped a library from C++ to Python using SWIG. But I am facing problems while importing and using it in Python. Hi Vipul, Did you try asking about this on the SWIG mailing list? bye Philip -- http://mail.python.org/mailman/listinfo/python-list
Re: allow line break at operators
Chris Angelico ros...@gmail.com writes:

Why is left-to-right inherently more logical than multiplication-before-addition? Why is it more logical than right-to-left? And why is changing people's expectations more logical than fulfilling them? Python uses the + and - symbols to mean addition and subtraction for good reason. Let's not alienate the mathematical mind by violating this rule. It would be far safer to go the other way and demand parentheses on everything.

I'm clearly a fool for allowing myself to be drawn into this thread, but I've been playing a lot recently with the APL-derivative language J, which uses a right-to-left operator precedence rule. Pragmatically, this is because J defines roughly a bajillion operators, and it would be impossible to remember the precedence of them all, but it makes sense in its own way.

If you read 3 * 10 + 7 using right-to-left, you get "three times something". Then you read more and you get "three times (ten plus something)". And finally, you get 3*(10+7). The prefix gives the continuation for the rest of the calculation; no matter what you substitute for X in 3*X, you will always just evaluate X, then multiply it by 3. Likewise, for 3*10+X, no matter what X is, you know you'll add 10 and multiply by 3.

This took me a while to get used to, but it's definitely a nice property. Not much to do with Python, but I do like the syntax enough that I've implemented my own toy evaluator for J-like expressions in Python, to get around the verbosity of some bits of numpy.

Regards,
Johann
-- http://mail.python.org/mailman/listinfo/python-list
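The right-to-left rule described above can be sketched in a few lines. This is not Johann's evaluator (which is not shown in the thread) and not J itself; `eval_rtl` and its whitespace-separated integer syntax are hypothetical, chosen only to make the grouping visible.

```python
import operator

# Binary operators supported by this toy; all have the same precedence.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def eval_rtl(expression):
    """Evaluate 'a op b op c ...' grouping strictly from the right."""
    tokens = expression.split()
    # The rightmost operand seeds the running result.
    result = int(tokens[-1])
    # Walk leftwards over (operand, operator) pairs: each step computes
    # left_operand OP (everything to the right).
    for i in range(len(tokens) - 2, 0, -2):
        apply_op = OPS[tokens[i]]
        left_operand = int(tokens[i - 1])
        result = apply_op(left_operand, result)
    return result


right_to_left = eval_rtl("3 * 10 + 7")  # 3 * (10 + 7)
conventional = 3 * 10 + 7               # (3 * 10) + 7 under Python's rules
```

With these rules `eval_rtl("10 - 3 - 2")` groups as 10 - (3 - 2), whereas Python's own left-to-right subtraction gives (10 - 3) - 2.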
Re: allow line break at operators
Seebs wrote:

I tend to write stuff like

    foo.array_of_things.sort.map { block }.join(", ")

I like this a lot more than

    array = foo.array_of_things
    sorted_array = array.sort()
    mapped_array = [block(x) for x in sorted_array]
    ", ".join(mapped_array)

If you insist on a one-liner for four separate operations, what's wrong with this?

    ", ".join([block(x) for x in sorted(foo.array_of_things)])

Or if you prefer map:

    ", ".join(map(block, sorted(foo.array_of_things)))

I think I would be less skeptical about fluent interfaces if they were written more like Unix shell script pipelines instead of using attribute access notation:

    foo.array_of_things | sort | map block | join ", "

-- Steven
-- http://mail.python.org/mailman/listinfo/python-list
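For comparison, here are the one-liners from this exchange as a runnable sketch. `block` and `array_of_things` are hypothetical stand-ins for the names in the posts; `sorted()` is used because `list.sort()` sorts in place and returns None.

```python
def block(item):
    """Hypothetical stand-in for the per-item transformation."""
    return str(item).upper()


array_of_things = ["pear", "apple", "fig"]

# List comprehension: builds the intermediate list, then joins it.
with_listcomp = ", ".join([block(x) for x in sorted(array_of_things)])

# Generator expression: same result without naming an intermediate list.
with_genexp = ", ".join(block(x) for x in sorted(array_of_things))

# map(): same result, passing the function itself.
with_map = ", ".join(map(block, sorted(array_of_things)))
```

All three produce the same string; which reads best is the matter of taste being debated here.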
Re: allow line break at operators
In article mailman.2233.1313179799.1164.python-l...@python.org, Chris Angelico ros...@gmail.com wrote: Python uses the + and - symbols to mean addition and subtraction for good reason. Let's not alienate the mathematical mind by violating this rule. Computer programming languages follow math conventions only in the most vague ways. For example, standard math usage dictates that addition is commutative. While this is true for adding integers, it's certainly not true for adding strings (in any language which supports string addition). Where to draw the line between math and programming languages is not an easy question. It would be far safer to go the other way and demand parentheses on everything. Demand, no, but sometimes it's a good idea. I've been writing computer programs for close to 40 years, and I still have no clue what most of the order of operations is. It's just not worth investing the brain cells to remember such trivia (especially since the details change from language to language). Beyond remembering the (apparently) universal rule that {*, /} bind tighter than {+, -}, I pretty much just punt on everything else and put in extra parens everywhere. It's not the most efficient way to write code, and probably doesn't even result in the prettiest code. But it sure does eliminate those face-palm moments at the end of a long debugging session when you realize that somebody got it wrong. -- http://mail.python.org/mailman/listinfo/python-list
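A few Python cases where explicit parentheses remove the need to remember the precedence table. This is only an illustration of the habit described above; the variable names are made up.

```python
# Multiplication binds tighter than addition.
implicit_sum = 2 + 3 * 4        # 14
explicit_sum = (2 + 3) * 4      # 20

# Exponentiation binds tighter than unary minus, which surprises many.
implicit_neg = -2 ** 2          # -(2 ** 2) == -4
explicit_neg = (-2) ** 2        # 4

# Exponentiation also groups right-to-left.
implicit_pow = 2 ** 3 ** 2      # 2 ** (3 ** 2) == 512
explicit_pow = (2 ** 3) ** 2    # 64
```

In each pair the second line states the intended grouping outright, so the reader does not need the table.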
Re: allow line break at operators
On Mon, Aug 15, 2011 at 2:41 PM, Roy Smith r...@panix.com wrote: Demand, no, but sometimes it's a good idea. I've been writing computer programs for close to 40 years, and I still have no clue what most of the order of operations is. It's just not worth investing the brain cells to remember such trivia (especially since the details change from language to language). Beyond remembering the (apparently) universal rule that {*, /} bind tighter than {+, -}, I pretty much just punt on everything else and put in extra parens everywhere. Understandable. I go the other way, though, and keep an operator precedence table for each language handy; often, what I'm after is not which one binds more tightly, but what's the symbol for modulo, which is also (usually) on that same table. Or: Blasted PHP, which operators have precedence between || and or? which is easy to forget. And you're right about the details changing from language to language, hence the operators table *for each language*. But most languages follow fairly sane rules, and tend to come up with pretty much the same ordering. ChrisA -- http://mail.python.org/mailman/listinfo/python-list
Re: allow line break at operators
Roy Smith wrote:

Computer programming languages follow math conventions only in the most vague ways. For example, standard math usage dictates that addition is commutative. While this is true for adding integers, it's certainly not true for adding strings (in any language which supports string addition).

Not quite true for maths either, at least in principle. I'm not aware of any number types where addition is non-commutative, but subtraction is noncommutative even for integers, and noncommutative multiplication is quite common (e.g. matrix multiplication).

And of course, once you start using floating point numbers, you can't assume commutativity:

    >>> 0.1 + 0.7 + 0.3 == 0.3 + 0.7 + 0.1
    False

I'm reminded of this quote from John Baez:

"The real numbers are the dependable breadwinner of the family, the complete ordered field we all rely on. The complex numbers are a slightly flashier but still respectable younger brother: not ordered, but algebraically complete. The quaternions, being noncommutative, are the eccentric cousin who is shunned at important family gatherings. But the octonions are the crazy old uncle nobody lets out of the attic: they are nonassociative."

(And don't even ask about the sedenions...)

-- Steven
-- http://mail.python.org/mailman/listinfo/python-list
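A short sketch separating the two properties for Python floats. Strictly speaking, an expression such as a + b + c is evaluated left to right, so reordering the operands also changes the grouping; what fails for IEEE 754 floats is associativity, while a single addition a + b is still commutative. The values below are the well-known 0.1/0.2/0.3 case rather than the ones in the post.

```python
import math

a, b, c = 0.1, 0.2, 0.3

# One addition: swapping the operands gives the identical result.
commutative = (a + b) == (b + a)

# Two additions: the grouping changes where rounding happens.
left_grouped = (a + b) + c
right_grouped = a + (b + c)
associative = left_grouped == right_grouped

# Comparing with a tolerance is the usual way around this.
close_enough = math.isclose(left_grouped, right_grouped)
```

Here `commutative` and `close_enough` are true while `associative` is false.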
Re: allow line break at operators
On Mon, Aug 15, 2011 at 3:28 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:

And of course, once you start using floating point numbers, you can't assume commutativity:

    >>> 0.1 + 0.7 + 0.3 == 0.3 + 0.7 + 0.1
    False

This isn't because programming languages fail to follow mathematics; it's because floating point numbers do not represent real numbers. Python doesn't support substring removal using the subtraction operator, but I'd have to say that floats more closely parallel strings and other high level objects than they do mathematical reals. If Python treated __sub__(str, str) as str.replace(str, "") then:

    >>> "hello world" + "asdfqwer" - "d"
    'hello worlasfqwer'
    >>> "hello world" - "d" + "asdfqwer"
    'hello worlasdfqwer'

Nobody would expect strings to behave mathematically with subtraction, because negatives don't make sense. Even sets don't quite work, although they're closer:

    >>> set("asdf") - set("test")
    {'a', 'd', 'f'}

There's no way, in a set, to show a negative reference to 't' and 'e'. In theory you could do this with dictionaries or collections.Counter, but subtracting a Counter from a Counter doesn't produce negative numbers either.

No, these constructs do not subtract algebraically, and I do not think it would be any improvement to the language if they did.

ChrisA
-- http://mail.python.org/mailman/listinfo/python-list
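The set and Counter behaviour mentioned above, as a runnable sketch. It also shows `Counter.subtract()`, which is not mentioned in the post: unlike the `-` operator it updates the counter in place and keeps zero and negative counts. The variable names are made up.

```python
from collections import Counter

# Set difference simply drops elements; it cannot record "owing" a 't'.
set_difference = set("asdf") - set("test")

# The - operator on Counters discards zero and negative results.
counter_difference = Counter("asdf") - Counter("test")

# Counter.subtract() mutates in place and keeps zero and negative counts.
signed_counts = Counter("asdf")
signed_counts.subtract(Counter("test"))
```

After this, `signed_counts["t"]` is -2 and `signed_counts["e"]` is -1, while neither key appears in `counter_difference`.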
string to unicode
if I am using the standard csv library to read contents of a csv file which contains Unicode strings (short example: '\xe8\x9f\x92\xe8\x9b\x87'), how do I use a python Unicode method such as decode or encode to transform this string type into a python unicode type? Must I know the encoding (byte groupings) of the Unicode? Can I get this from the file? Perhaps I need to open the file with particular attributes? thanks! -- http://mail.python.org/mailman/listinfo/python-list
Re: string to unicode
On Mon, Aug 15, 2011 at 4:20 PM, Artie Ziff artie.z...@gmail.com wrote: if I am using the standard csv library to read contents of a csv file which contains Unicode strings (short example: '\xe8\x9f\x92\xe8\x9b\x87'), how do I use a python Unicode method such as decode or encode to transform this string type into a python unicode type? Must I know the encoding (byte groupings) of the Unicode? Can I get this from the file? Perhaps I need to open the file with particular attributes? Start here: http://www.joelonsoftware.com/articles/Unicode.html The CSV file, being stored on disk, cannot contain Unicode strings; it can only contain bytes. If you know the encoding (eg UTF-8, UCS-2, etc), then you can decode it using that. If you don't, your best bet is to ask the origin of the file; failing that, check the first few bytes - if it's \xFF\xFE or \xFE\xFF or \xEF\xBB\xBF, then it's probably UTF-16LE, UTF-16BE, or UTF-8, respectively (those being the encodings of the BOM). There may be other clues, too, but normally it's best to get the encoding separately from the data rather than try to decode it from the data itself. Chris Angelico -- http://mail.python.org/mailman/listinfo/python-list
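A minimal sketch of the BOM check described above, assuming Python 3 bytes objects. `sniff_bom` is a hypothetical helper name; the BOM constants come from the standard `codecs` module. The function returns a codec name that consumes the BOM while decoding, or None when no BOM is present, in which case the encoding still has to come from the file's origin.

```python
import codecs

# (BOM bytes, codec that strips that BOM while decoding).
# Caveat: a UTF-32-LE file also starts with FF FE, so this is only a hint.
BOM_TABLE = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(raw_bytes):
    """Return a codec name suggested by a leading BOM, or None."""
    for bom, codec_name in BOM_TABLE:
        if raw_bytes.startswith(bom):
            return codec_name
    return None


# The sample bytes from the question carry no BOM; they decode as UTF-8
# into two characters.
sample = b"\xe8\x9f\x92\xe8\x9b\x87"
guess = sniff_bom(sample)
decoded = sample.decode("utf-8")
```

Decoding succeeds here only because the bytes happen to be valid UTF-8; a wrong guess typically raises UnicodeDecodeError or yields garbage, which is why asking the file's producer is the safer route.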
Re: string to unicode
On Mon, 2011-08-15 at 08:20 -0700, Artie Ziff wrote: if I am using the standard csv library to read contents of a csv file which contains Unicode strings (short example: '\xe8\x9f\x92\xe8\x9b\x87'), how do I use a python Unicode method such as decode or encode to transform this string type into a python unicode type? Must I know the encoding (byte groupings) of the Unicode? Can I get this from the file? Perhaps I need to open the file with particular attributes? Open the file with a codec and pass that file-like object to csv. codecs.open(filename, mode[, encoding[, errors[, buffering]]]) http://docs.python.org/library/codecs.html#codec-objects -- Adam Tauno Williams awill...@whitemice.org LPIC-1, Novell CLA http://www.whitemiceconsulting.com OpenGroupware, Cyrus IMAPd, Postfix, OpenLDAP, Samba -- http://mail.python.org/mailman/listinfo/python-list
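The advice above refers to `codecs.open`, as used with Python 2 at the time of this thread. As a sketch of the same idea in Python 3, the built-in `open()` accepts an `encoding` argument and the `csv` module then yields text strings directly. The helper name `read_csv_rows`, the sample data and the assumption that the file is UTF-8 are all illustrative.

```python
import csv
import os
import tempfile


def read_csv_rows(path, encoding="utf-8"):
    """Read a CSV file as a list of rows of text strings."""
    # newline="" is what the csv documentation recommends for file objects.
    with open(path, encoding=encoding, newline="") as handle:
        return list(csv.reader(handle))


# Write a small UTF-8 encoded file containing the bytes from the question.
raw = b"name,word\nsnake,\xe8\x9f\x92\xe8\x9b\x87\n"
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(raw)

rows = read_csv_rows(tmp.name)
os.remove(tmp.name)
```

Each cell in `rows` is already a `str`, so no separate decode step is needed once the file is opened with the right encoding.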
Re: Ten rules to becoming a Python community member.
On Mon, Aug 15, 2011 at 9:06 AM, Neil Cerutti ne...@norwich.edu wrote: On 2011-08-14, Chris Angelico ros...@gmail.com wrote: On Sun, Aug 14, 2011 at 2:21 PM, Irmen de Jong irmen.nos...@xs4all.nl wrote: On 14-8-2011 7:57, rantingrick wrote: 8. Use e.g. as many times as you can! (e.g. e.g.) If you use e.g. more than ten times in a single post, you will get an invite to Guido's next birthday party; where you'll be forced to do shots whist walking the balcony railing wearing wooden shoes! I lolled about this one, e.g. I laughed out loud. But where are the tulips and windmills for extra credit? Greetings from a Dutchman! No credit. E.g., i.e., exampla gratis, means, for example. The correct spelling is 'exempli gratia'. It's Latin. i.e., on the other hand, comes from 'id est' ('that is'). Latin too. Regards, Lucio -- http://mail.python.org/mailman/listinfo/python-list
Re: Java is killing me! (AKA: Java for Pythonheads?)
On Fri, 12 Aug 2011 17:02:38 +, kj wrote: *Please* forgive me for asking a Java question in a Python forum. My only excuse for this no-no is that a Python forum is more likely than a Java one to have among its readers those who have had to deal with the same problems I'm wrestling with. Due to my job, I have to port some Python code to Java, and write tests for the ported code. (Yes, I've considered finding myself another job, but this is not an option in the immediate future.) Can't you sidestep the porting effort and try to run everything in Jython on the JVM? -dirk (Python lurker with Java experience) -- http://mail.python.org/mailman/listinfo/python-list
Re: allow line break at operators
On 2011-08-15, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: Seebs wrote: I tend to write stuff like foo.array_of_things.sort.map { block }.join(, ) I like this a lot more than array = foo.array_of_things sorted_array = array.sort() mapped_array = [block(x) for x in sorted_array] , .join(mapped_array) If you insist on a one-liner for four separate operations, what's wrong with this? , .join([block(x) for x in sorted(foo.array_of_things)]) Nothing in particular; I was just contrasting two styles, not asserting that Python couldn't do that. In general, I don't like to do things that either involve making a lot of variables that are assigned to once and then read from once, or making a whole lot of x = foo(x) type assignments to one variable. It feels cluttered to me. I think I would be less skeptical about fluent interfaces if they were written more like Unix shell script pipelines instead of using attribute access notation: foo.array_of_things | sort | map block | join , Interesting! I think that's probably why I find them so comfortable; shell was one of the first languages I got serious about. -s -- Copyright 2011, all wrongs reversed. Peter Seebach / usenet-nos...@seebs.net http://www.seebs.net/log/ -- lawsuits, religion, and funny pictures http://en.wikipedia.org/wiki/Fair_Game_(Scientology) -- get educated! I am not speaking for my employer, although they do rent some of my opinions. -- http://mail.python.org/mailman/listinfo/python-list
Re: allow line break at operators
On 2011-08-15, Roy Smith r...@panix.com wrote: Demand, no, but sometimes it's a good idea. I've been writing computer programs for close to 40 years, and I still have no clue what most of the order of operations is. It's just not worth investing the brain cells to remember such trivia (especially since the details change from language to language). Beyond remembering the (apparently) universal rule that {*, /} bind tighter than {+, -}, I pretty much just punt on everything else and put in extra parens everywhere. It's not the most efficient way to write code, and probably doesn't even result in the prettiest code. But it sure does eliminate those face-palm moments at the end of a long debugging session when you realize that somebody got it wrong. Wholehearted agreement. It is conceivable for me to misremember precedence. I am pretty reliable at recognizing which things are in which parens. So I use them even in obvious cases: foo + (3 * 4) Never regretted that. Yes, it's extra typing, a little, but it prevents a whole category of bugs. -s -- Copyright 2011, all wrongs reversed. Peter Seebach / usenet-nos...@seebs.net http://www.seebs.net/log/ -- lawsuits, religion, and funny pictures http://en.wikipedia.org/wiki/Fair_Game_(Scientology) -- get educated! I am not speaking for my employer, although they do rent some of my opinions. -- http://mail.python.org/mailman/listinfo/python-list
Re: Ten rules to becoming a Python community member.
On 15/08/2011 17:18, Lucio Santi wrote:
On Mon, Aug 15, 2011 at 9:06 AM, Neil Cerutti ne...@norwich.edu wrote:
On 2011-08-14, Chris Angelico ros...@gmail.com wrote:
On Sun, Aug 14, 2011 at 2:21 PM, Irmen de Jong irmen.nos...@xs4all.nl wrote:
On 14-8-2011 7:57, rantingrick wrote:

8. Use e.g. as many times as you can! (e.g. e.g.) If you use e.g. more than ten times in a single post, you will get an invite to Guido's next birthday party; where you'll be forced to do shots whilst walking the balcony railing wearing wooden shoes!

I lolled about this one, e.g. I laughed out loud. But where are the tulips and windmills for extra credit? Greetings from a Dutchman!

No credit. E.g., i.e., exampla gratis, means, for example.

The correct spelling is 'exempli gratia'. It's Latin. i.e., on the other hand, comes from 'id est' ('that is'). Latin too.

I remember reading a book about polymorphism in programming. The author said it came from Latin. Nope.

-- http://mail.python.org/mailman/listinfo/python-list
Re: Ten rules to becoming a Python community member.
On 2011-08-15, MRAB pyt...@mrabarnett.plus.com wrote: On 15/08/2011 17:18, Lucio Santi wrote: On Mon, Aug 15, 2011 at 9:06 AM, Neil Cerutti ne...@norwich.edu mailto:ne...@norwich.edu wrote: On 2011-08-14, Chris Angelico ros...@gmail.com mailto:ros...@gmail.com wrote: On Sun, Aug 14, 2011 at 2:21 PM, Irmen de Jong irmen.nos...@xs4all.nl mailto:irmen.nos...@xs4all.nl wrote: On 14-8-2011 7:57, rantingrick wrote: 8. Use e.g. as many times as you can! (e.g. e.g.) If you use e.g. more than ten times in a single post, you will get an invite to Guido's next birthday party; where you'll be forced to do shots whist walking the balcony railing wearing wooden shoes! I lolled about this one, e.g. I laughed out loud. But where are the tulips and windmills for extra credit? Greetings from a Dutchman! No credit. E.g., i.e., exampla gratis, means, for example. The correct spelling is 'exempli gratia'. It's Latin. Thanks for the correction. i.e., on the other hand, comes from 'id est' ('that is'). Latin too. I remember reading a book about polymorphism in programming. The author said it came from Latin. Nope. Sounds more like Greek. -- Neil Cerutti -- http://mail.python.org/mailman/listinfo/python-list
Reusable ways to wrapping thread locking techniques
I'm reviewing a lot of code that has thread acquire and release locks scattered throughout the code base. Would a better technique be to use -- http://mail.python.org/mailman/listinfo/python-list
Re: Reusable ways to wrapping thread locking techniques
Hit send too soon ... I'm reviewing a lot of code that has thread acquire and release locks scattered throughout the code base. Would a better technique be to use contextmanagers (for safe granular locking within a function) or decorators (function wide locks) to manage locks or am I making things too complicated? Am I reinventing the wheel by creating my own versions of above or are there off-the-shelf, debugged versions of above that one can use? Thank you, Malcolm -- http://mail.python.org/mailman/listinfo/python-list
Re: string to unicode
On 8/15/2011 11:29 AM, Adam Tauno Williams wrote: On Mon, 2011-08-15 at 08:20 -0700, Artie Ziff wrote: if I am using the standard csv library to read contents of a csv file which contains Unicode strings (short example: '\xe8\x9f\x92\xe8\x9b\x87'), how do I use a python Unicode method such as decode or encode to transform this string type into a python unicode type? Must I know the encoding (byte groupings) of the Unicode? Can I get this from the file? Perhaps I need to open the file with particular attributes? Open the file with a codec and pass that file-like object to csv. codecs.open(filename, mode[, encoding[, errors[, buffering]]]) http://docs.python.org/library/codecs.html#codec-objects In Python 3, just open with open(... encoding = 'xxx') -- Terry Jan Reedy -- http://mail.python.org/mailman/listinfo/python-list
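A minimal sketch of the Python 3 route Terry mentions, assuming the file is UTF-8 (the sample bytes in the question happen to decode cleanly as UTF-8, but that is only a guess about the real file). The helper name read_csv_rows is made up for illustration.

```python
import csv

def read_csv_rows(path, encoding="utf-8"):
    """Return every row of a CSV file as a list of decoded text strings."""
    # newline="" lets the csv module deal with line endings itself.
    with open(path, encoding=encoding, newline="") as f:
        return list(csv.reader(f))
```

If the encoding is wrong, open() will usually raise UnicodeDecodeError while reading, which is a useful early signal that the guess needs revisiting.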
datetime.strptime w/ non-UTC and non-local TZs?
I was hoping somebody could give me some clarity on how datetime.strptime is supposed to work. I think this is a bug, but I wanted to see if the community has any ideas before I submit a report to the Python tracker. For reference, I am in the CDT timezone.

    >>> from datetime import datetime
    >>> date_utc = 'Sun Jul 24 02:54:11 UTC 2011'
    >>> date_local = 'Sun Jul 24 02:54:11 CDT 2011'
    >>> date_other = 'Sun Jul 24 02:54:11 PDT 2011'
    >>> print datetime.strptime(date_utc, '%a %b %d %H:%M:%S %Z %Y')
    2011-07-24 02:54:11
    >>> print datetime.strptime(date_local, '%a %b %d %H:%M:%S %Z %Y')
    2011-07-24 02:54:11
    >>> print datetime.strptime(date_other, '%a %b %d %H:%M:%S %Z %Y')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.6/_strptime.py", line 325, in _strptime
        (data_string, format))
    ValueError: time data 'Sun Jul 24 02:54:11 PDT 2011' does not match format '%a %b %d %H:%M:%S %Z %Y'

The format is correct, and it can parse UTC (as mentioned in the documentation) and CDT (which is my current time zone), but using another timezone causes it to fail. Parsing the failing timezone on a computer set to that timezone works correctly. The documentation seems to indicate that parsing non-UTC TZs isn't guaranteed to work, but in that case the exception is unclear (perhaps there's no elegant solution here).
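One possible workaround, sketched in Python 3: as far as I know, %Z in strptime only matches UTC, GMT and the names in time.tzname, so the abbreviation can be pulled out by hand and mapped to a fixed offset. The TZ_OFFSETS table and the name parse_with_abbrev are assumptions for illustration; abbreviations are ambiguous in general, so the table has to be chosen by whoever knows the data.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical lookup table: only the three zones from the post, in hours.
TZ_OFFSETS = {"UTC": 0, "CDT": -5, "PDT": -7}

def parse_with_abbrev(text):
    """Parse e.g. 'Sun Jul 24 02:54:11 PDT 2011' into an aware datetime."""
    parts = text.split()
    abbrev = parts.pop(4)  # the zone field sits between the time and the year
    naive = datetime.strptime(" ".join(parts), "%a %b %d %H:%M:%S %Y")
    tz = timezone(timedelta(hours=TZ_OFFSETS[abbrev]), abbrev)
    return naive.replace(tzinfo=tz)
```

Unlike the %Z route, this returns an aware datetime, so the three sample strings end up as three different instants rather than the same naive value.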
Re: string to unicode
Chris Angelico wrote: On Mon, Aug 15, 2011 at 4:20 PM, Artie Ziff artie.z...@gmail.com wrote: if I am using the standard csv library to read contents of a csv file which contains Unicode strings (short example: '\xe8\x9f\x92\xe8\x9b\x87'), how do I use a python Unicode method such as decode or encode to transform this string type into a python unicode type? Must I know the encoding (byte groupings) of the Unicode? Can I get this from the file? Perhaps I need to open the file with particular attributes? Start here: http://www.joelonsoftware.com/articles/Unicode.html The CSV file, being stored on disk, cannot contain Unicode strings; it can only contain bytes. If you know the encoding (eg UTF-8, UCS-2, etc), then you can decode it using that. If you don't, your best bet is to ask the origin of the file; failing that, check the first few bytes - if it's \xFF\xFE or \xFE\xFF or \xEF\xBB\xBF, then it's probably UTF-16LE, UTF-16BE, or UTF-8, respectively (those being the encodings of the BOM). There may be other clues, too, but normally it's best to get the encoding separately from the data rather than try to decode it from the data itself. As this problem really is not a new one, there are several more – if I may say so – pythonic approaches: http://stackoverflow.com/questions/436220/python-is-there-a-way-to- determine-the-encoding-of-text-file Improving Billy Mays' matching brackets checker, chardet worked for me (the test file was UTF-8-encoded). 
Watch for word-wrap:

---
    # encoding: utf-8
    '''
    Created on 2011-07-18

    @author: Thomas 'PointedEars' Lahn pointede...@web.de, based on an idea of
    Billy Mays 81282ed9a88799d21e77957df2d84bd6514d9...@myhashismyemail.com
    in news:j01ph6$knt$1...@speranza.aioe.org
    '''
    import sys, os, chardet

    pairs = {u'}': u'{', u')': u'(', u']': u'[',
             u'”': u'“', u'›': u'‹', u'»': u'«',
             u'】': u'【', u'〉': u'〈', u'》': u'《',
             u'」': u'「', u'』': u'『'}
    valid = set(v for pair in pairs.items() for v in pair)

    if __name__ == '__main__':
        for dirpath, dirnames, filenames in os.walk(sys.argv[1]):
            for name in filenames:
                stack = [' ']
                file_path = os.path.join(dirpath, name)
                with open(file_path, 'rb') as f:
                    reported = False
                    # Read the lines once into a list: enumerate(f, 1) alone is
                    # a one-shot iterator and cannot feed both chardet and the
                    # bracket scan below.
                    lines = list(enumerate(f, 1))
                    encoding = chardet.detect(''.join(map(lambda x: x[1], lines)))['encoding']
                    chars = ((c, line_no, col) for line_no, line in lines
                             for col, c in enumerate(line.decode(encoding), 1)
                             if c in valid)
                    for c, line_no, col in chars:
                        if c in pairs:
                            if stack[-1] == pairs[c]:
                                stack.pop()
                            else:
                                if not reported:
                                    first_bad = (c, line_no, col)
                                    reported = True
                        else:
                            stack.append(c)
                print '%s: %s' % (name, ("good" if len(stack) == 1 else "bad '%s' at %s:%s" % first_bad))
---

HTH

-- PointedEars
Please do not Cc: me.
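The "check the first few bytes" advice quoted above can be sketched as follows (Python 3). The function name sniff_bom is hypothetical. This only recognises the three BOMs named in the post, and it ignores UTF-32, whose little-endian BOM also starts with \xFF\xFE, so treat a hit as a hint rather than proof.

```python
import codecs

# (BOM bytes, codec name). "utf-8-sig" and "utf-16" both consume the BOM
# when decoding; "utf-16" also takes the byte order from it.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def sniff_bom(data):
    """Return a codec name if data starts with a known BOM, else None."""
    for bom, codec_name in BOMS:
        if data.startswith(bom):
            return codec_name
    return None
```

A file without a BOM returns None, at which point asking the producer of the file (or falling back to something like chardet) is still the better bet.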
Re: allow line break at operators
On Aug 15, 2:31 am, Terry Reedy tjre...@udel.edu wrote:
> On 8/15/2011 12:28 AM, Seebs wrote:
> To repeat again: you are free to put in explicit dedent markers that will let you re-indent code should all indents be removed.

As Terry has been trying to say for a while now, use the following methods to quell your eye pain.

Use pass statement:

    if foo:
        if bar:
            baz
        else:
            pass
    else:
        quux

Use comments:

    if foo:
        if bar:
            baz
        #else bar (or endif or whatever you like)
    else:
        quux

Use road signs: :-)

    # [Warning: Curves Ahead: Eyeball Parse limit 35 WPM!]
    if foo: # [Exit 266: foo] --
        # [Right Curve Ahead: slow eyeball parsing to 15 WPM!]
        if bar:
            baz
        else:
            pass # -- [Warning: Do not litter!]
    else: # [Exit 267: Not Foo] --
        # [Right Curve Ahead: slow eyeball parsing to 15 WPM!]
        quux
    ... # [Eyeball Parse limit 55 WPM!]
    ... # [PSA: Friends don't let friends write asinine code]
    ... # [Next Rest Stop: NEVER!]

Now you have the nice triangular shape that your eyes have been trained to recognize! I would suggest using comments whenever possible. Of course there will be times when you cannot use a comment and must use an else clause. Now you have nothing to complain about :).
Why no warnings when re-assigning builtin names?
With surprising regularity, I see program postings (e.g. on StackOverflow) from inexperienced Python users accidentally re-assigning built-in names.

For example, they'll innocently call some variable "list" and assign a list of items to it. ...and if they're _unlucky_ enough, their program may actually work (encouraging them to re-use this name in other programs).

If they try to use an actual keyword, both the interpreter and compiler are helpful enough to give them a syntax error, but I think the builtins should be "pseudo-reserved", and a user should explicitly have to do something *extra* to not receive a warning. I'd suggest: from __future__ import allow_reassigning_builtins, but I think this abuse of the __future__ module likely isn't welcome.

I know that for testing purposes this functionality is very convenient, and I'm not suggesting it be removed. In these cases, it would be trivial to just require something explicit, telling the interpreter that the programmer was aware they were assigning to a builtin name.

The situation is slightly different for modules that come with Python. Most of us would cringe when seeing something like `string = "Some string"`, but at least the user has to explicitly import the string module for this to actually cause issues (other than readability).

What sayest the Python community about having an explicit warning against such un-pythonic behaviour (re-assigning builtin names)?

Regards,
Gerrat
Re: Why no warnings when re-assigning builtin names?
On Aug 15, 2011, at 5:52 PM, Gerrat Rickert wrote:
> With surprising regularity, I see program postings (eg. on StackOverflow) from inexperienced Python users accidentally re-assigning built-in names.
>
> For example, they'll innocently call some variable, list, and assign a list of items to it. ...and if they're _unlucky_ enough, their program may actually work (encouraging them to re-use this name in other programs).

Or they'll assign a class instance to 'object', only to cause weird errors later when they use it as a base class.

I agree that this is a problem. The folks on my project who are new-ish to Python overwrite builtins fairly often. Since there's never been any consequence other than my vague warnings that something bad might happen as a result, it's difficult for them to develop good habits in this regard. It doesn't help that Eclipse (their editor of choice) doesn't seem to provide a way of coloring builtins differently. (That's what I'm told, anyway. I don't use it.)

> If they try to use an actual keyword, both the interpreter and compiler are helpful enough to give them a syntax error, but I think the builtins should be pseudo-reserved, and a user should explicitly have to do something *extra* to not receive a warning.

Unfortunately you're suggesting a change to the language which could break existing code. I could see a use for "from __future__ import squawk_if_i_reassign_a_builtin" or something like that, but the current default behavior has to remain as it is.

JMO,
Philip
Re: Why no warnings when re-assigning builtin names?
On Mon, Aug 15, 2011 at 10:52 PM, Gerrat Rickert grick...@coldstorage.com wrote: With surprising regularity, I see program postings (eg. on StackOverflow) from inexperienced Python users accidentally re-assigning built-in names. For example, they’ll innocently call some variable, “list”, and assign a list of items to it. It's actually masking, not reassigning. That may make it easier or harder to resolve the issue. If you want a future directive that deals with it, I'd do it the other way - from __future__ import mask_builtin_warning or something - so the default remains as it currently is. But this may be a better job for a linting script. ChrisA -- http://mail.python.org/mailman/listinfo/python-list
Re: Why no warnings when re-assigning builtin names?
Gerrat Rickert wrote: What sayest the Python community about having an explicit warning against such un-pythonic behaviour (re-assigning builtin names)? What makes you think this behavior is unpythonic? Python is not about hand-holding. ~Ethan~ -- http://mail.python.org/mailman/listinfo/python-list
Re: Why no warnings when re-assigning builtin names?
On 2011-08-15, Ethan Furman et...@stoneleaf.us wrote: Gerrat Rickert wrote: What sayest the Python community about having an explicit warning against such un-pythonic behaviour (re-assigning builtin names)? What makes you think this behavior is unpythonic? Python is not about hand-holding. It seems like something which is sufficiently likely to be a mistake might deserve a warning -- especially since, so far as I can tell, there's never going to be a program which can't easily be written to avoid the problematic behavior. -s -- Copyright 2011, all wrongs reversed. Peter Seebach / usenet-nos...@seebs.net http://www.seebs.net/log/ -- lawsuits, religion, and funny pictures http://en.wikipedia.org/wiki/Fair_Game_(Scientology) -- get educated! I am not speaking for my employer, although they do rent some of my opinions. -- http://mail.python.org/mailman/listinfo/python-list
Re: Why no warnings when re-assigning builtin names?
Seebs wrote: On 2011-08-15, Ethan Furman et...@stoneleaf.us wrote: Gerrat Rickert wrote: What sayest the Python community about having an explicit warning against such un-pythonic behaviour (re-assigning builtin names)? What makes you think this behavior is unpythonic? Python is not about hand-holding. It seems like something which is sufficiently likely to be a mistake might deserve a warning -- especially since, so far as I can tell, there's never going to be a program which can't easily be written to avoid the problematic behavior. sufficiently likely depends entirely on who is doing the coding. I use `open()` for opening my files, and so regularly use `file` as a name. It can also be very handy to mask a built-in when doing something even more fun and entertaining and I, for one, have zero desire to have Python start warning me about perfectly legitimate code. Programmers need to learn whichever language they are choosing to code in, and if extra help is needed beyond whatever is basic for that language, find (or write! ;) the third-party tool to help out. There are at least two linters for Python, and multiple IDEs that can help with these, and other, problems. (I don't much care for IDEs, but I am thinking of starting to use a linter, myself.) ~Ethan~ -- http://mail.python.org/mailman/listinfo/python-list
Re: Why no warnings when re-assigning builtin names?
On Aug 15, 2011 5:56 PM, Gerrat Rickert grick...@coldstorage.com wrote:
> With surprising regularity, I see program postings (eg. on StackOverflow) from inexperienced Python users accidentally re-assigning built-in names.
>
> For example, they’ll innocently call some variable, “list”, and assign a list of items to it. ...and if they’re _unlucky_ enough, their program may actually work (encouraging them to re-use this name in other programs).
>
> If they try to use an actual keyword, both the interpreter and compiler are helpful enough to give them a syntax error, but I think the builtins should be “pseudo-reserved”, and a user should explicitly have to do something *extra* ...
>
> What sayest the Python community about having an explicit warning against such un-pythonic behaviour (re-assigning builtin names)?

One of Python's greatest strengths, in my opinion, is that it strives for consistency. As much as possible, Python avoids differentiating between built-in objects (types or otherwise) and user-defined objects. I think it should stay that way. There are tools that can detect these errors and their use should be encouraged, but the Python interpreter shouldn't single out variables which are types that happen to be built-in from any other variable or any other type.
testing if a list contains a sublist
hi list,

what is the best way to check if a given list (let's call it l1) is totally contained in a second list (l2)?

for example:
l1 = [1,2], l2 = [1,2,3,4,5] -> l1 is contained in l2
l1 = [1,2,2,], l2 = [1,2,3,4,5] -> l1 is not contained in l2
l1 = [1,2,3], l2 = [1,3,5,7] -> l1 is not contained in l2

my problem is the second example, which makes it impossible to work with sets instead of lists. But something like set.issubset for lists would be nice.

greatz
Johannes
Re: Why no warnings when re-assigning builtin names?
On Mon, Aug 15, 2011 at 2:52 PM, Gerrat Rickert grick...@coldstorage.comwrote: With surprising regularity, I see program postings (eg. on StackOverflow) from inexperienced Python users accidentally re-assigning built-in names. http://pypi.python.org/pypi/pylint checks for this and many other issues. I don't know if pyflakes or pychecker do. -- http://mail.python.org/mailman/listinfo/python-list
Re: testing if a list contains a sublist
Check out collections.Counter if you have 2.7 or up. If you don't, google for multiset or bag types. On Mon, Aug 15, 2011 at 4:26 PM, Johannes dajo.m...@web.de wrote: hi list, what is the best way to check if a given list (lets call it l1) is totally contained in a second list (l2)? for example: l1 = [1,2], l2 = [1,2,3,4,5] - l1 is contained in l2 l1 = [1,2,2,], l2 = [1,2,3,4,5] - l1 is not contained in l2 l1 = [1,2,3], l2 = [1,3,5,7] - l1 is not contained in l2 my problem is the second example, which makes it impossible to work with sets insteads of lists. But something like set.issubset for lists would be nice. greatz Johannes -- http://mail.python.org/mailman/listinfo/python-list -- http://mail.python.org/mailman/listinfo/python-list
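A minimal sketch of the collections.Counter idea, reading "totally contained" as multiset containment (order ignored, duplicates counted), which is what the three examples in the question suggest. The function name contained is made up for illustration.

```python
from collections import Counter

def contained(l1, l2):
    """True if every item of l1 occurs in l2 at least as many times."""
    need, have = Counter(l1), Counter(l2)
    # A Counter returns 0 for missing keys, so absent items fail the test.
    return all(have[item] >= count for item, count in need.items())
```

This requires hashable items; if order or contiguity matters, a different approach is needed.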
Re: Reusable ways to wrapping thread locking techniques
On 15Aug2011 13:56, pyt...@bdurham.com pyt...@bdurham.com wrote:
| I'm reviewing a lot of code that has thread acquire and release
| locks scattered throughout the code base.
|
| Would a better technique be to use contextmanagers (for safe
| granular locking within a function) or decorators (function wide
| locks) to manage locks or am I making things too complicated?

No, you're on the money.

| Am I reinventing the wheel by creating my own versions of above
| or are there off-the-shelf, debugged versions of above that one
| can use?

I routinely have:

    some_lock = allocate_lock()
    ...
    with some_lock:
        code here!

Doing the equivalent with decorators to make monitors seems perfectly reasonable to me too.

Do it all with context managers if you can; they do reliable lock release even when an exception occurs, and don't clutter the code with distracting and hard to verify cleanup code. It's worth it just to make everything more readable. The fact that it makes the code smaller and easier to maintain and less bug prone is just sugar.

Cheers,
-- Cameron Simpson c...@zip.com.au DoD#743 http://www.cskk.ezoshosting.com/cs/
My mind is like a blotter: Soaks it up, gets it backwards.
Re: Data issues with Django and Apache
In j27bde$dlr$1...@reader1.panix.com John Gordon gor...@panix.com writes: The problem is that I get conflicting results as to whether these temporary records have reached their expiration date, depending if I search for them via an Apache web call or if I do the search locally from a python shell. And to make it weirder, the conflicts go away if I stop and restart the Apache server, although any new records created after this point will still exhibit the issue. The problem turned out to be a class variable that contained a time filter with the current time. But since it was a class variable, it was only evaluated once upon import and its idea of now was forever frozen at that moment, so it always compared as being less than any of the lock records that were passed in. I changed it to be a class method that constructs and returns a new time filter whenever it is called. Thanks for everyone's help! -- John Gordon A is for Amy, who fell down the stairs gor...@panix.com B is for Basil, assaulted by bears -- Edward Gorey, The Gashlycrumb Tinies -- http://mail.python.org/mailman/listinfo/python-list
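The failure mode described above can be reproduced without Django. In this sketch the class names FrozenFilter and FreshFilter are hypothetical stand-ins for the real model code, which the post does not show.

```python
from datetime import datetime

class FrozenFilter:
    # Evaluated once, when the class body runs (i.e. at import time),
    # so a long-lived server process keeps this value forever.
    now = datetime.now()

class FreshFilter:
    @classmethod
    def now(cls):
        # Evaluated on every call, so it tracks the real clock.
        return datetime.now()
```

This also fits the symptoms in the original report: restarting Apache re-imports the module and resets the frozen value, while a fresh python shell always starts with a current one.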
Re: allow line break at operators
In article mailman.10.1313417818.27778.python-l...@python.org, Chris Angelico ros...@gmail.com wrote: Or: Blasted PHP, which operators have precedence between || and or? which is easy to forget. And you're right about the details changing from language to language, hence the operators table *for each language*. But most languages follow fairly sane rules How dare you use the words PHP and sane in two adjoining paragraphs! -- http://mail.python.org/mailman/listinfo/python-list
Re: allow line break at operators
In article 4e492d08$0$30003$c3e8da3$54964...@news.astraweb.com, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: I'm reminded of this quote from John Baez: The real numbers are the dependable breadwinner of the family, the complete ordered field we all rely on. The complex numbers are a slightly flashier but still respectable younger brother: not ordered, but algebraically complete. The quaternions, being noncommutative, are the eccentric cousin who is shunned at important family gatherings. But the octonions are the crazy old uncle nobody lets out of the attic: they are nonassociative. Wow, at first glance, I mis-parsed that name as Joan Baez. Had me really confused for a moment. -- http://mail.python.org/mailman/listinfo/python-list
Re: allow line break at operators
On Tue, Aug 16, 2011 at 1:34 AM, Roy Smith r...@panix.com wrote: In article mailman.10.1313417818.27778.python-l...@python.org, Chris Angelico ros...@gmail.com wrote: Or: Blasted PHP, which operators have precedence between || and or? which is easy to forget. And you're right about the details changing from language to language, hence the operators table *for each language*. But most languages follow fairly sane rules How dare you use the words PHP and sane in two adjoining paragraphs! By separating them with the word most. ChrisA -- http://mail.python.org/mailman/listinfo/python-list
Re: Ten rules to becoming a Python community member.
rantingrick wrote:
> "Used to" and "supposed to" is the verbiage of children and idiots.

So when we reach a certain age we're meant to abandon short, concise and idiomatic ways of speaking, and substitute long words and phrases to make ourselves sound adult and educated?

-- Greg
Re: allow line break at operators
Steven D'Aprano wrote: I'm reminded of this quote from John Baez: ...But the octonions are the crazy old uncle nobody lets out of the attic: they are nonassociative. (And don't even ask about the sedenions...) Aren't they the ones that mutilate cattle and abduct people? -- Greg -- http://mail.python.org/mailman/listinfo/python-list
Re: Ten rules to becoming a Python community member.
I don't mind people using e.g. and i.e. as long as they use them *correctly*. Many times people use i.e. when they really mean e.g. -- Greg -- http://mail.python.org/mailman/listinfo/python-list
Re: allow line break at operators
On 2011-08-16, Roy Smith r...@panix.com wrote: In article 4e492d08$0$30003$c3e8da3$54964...@news.astraweb.com, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: I'm reminded of this quote from John Baez: The real numbers are the dependable breadwinner of the family, the complete ordered field we all rely on. The complex numbers are a slightly flashier but still respectable younger brother: not ordered, but algebraically complete. The quaternions, being noncommutative, are the eccentric cousin who is shunned at important family gatherings. But the octonions are the crazy old uncle nobody lets out of the attic: they are nonassociative. Wow, at first glance, I mis-parsed that name as Joan Baez. Had me really confused for a moment. Would it have been that much weirder than Hedy Lamarr? -s -- Copyright 2011, all wrongs reversed. Peter Seebach / usenet-nos...@seebs.net http://www.seebs.net/log/ -- lawsuits, religion, and funny pictures http://en.wikipedia.org/wiki/Fair_Game_(Scientology) -- get educated! I am not speaking for my employer, although they do rent some of my opinions. -- http://mail.python.org/mailman/listinfo/python-list
Re: testing if a list contains a sublist
In article mailman.27.1313450819.27778.python-l...@python.org, Johannes dajo.m...@web.de wrote:
> hi list, what is the best way to check if a given list (lets call it l1) is totally contained in a second list (l2)?
>
> for example:
> l1 = [1,2], l2 = [1,2,3,4,5] - l1 is contained in l2
> l1 = [1,2,2,], l2 = [1,2,3,4,5] - l1 is not contained in l2
> l1 = [1,2,3], l2 = [1,3,5,7] - l1 is not contained in l2
>
> my problem is the second example, which makes it impossible to work with sets insteads of lists. But something like set.issubset for lists would be nice.
>
> greatz Johannes

    import re

    def sublist(l1, l2):
        s1 = ''.join(map(str, l1))
        s2 = ''.join(map(str, l2))
        return re.search(s1, s2)

    assert sublist([1,2], [1,2,3,4,5])
    assert not sublist ([1,2,2], [1,2,3,4,5])
    assert not sublist([1,2,3], [1,3,5,7])

(running and ducking)
Re: Ten rules to becoming a Python community member.
In article 9att2bf71...@mid.individual.net, Gregory Ewing greg.ew...@canterbury.ac.nz wrote: rantingrick wrote: Used to and supposed to is the verbiage of children and idiots. So when we reach a certain age we're meant to abandon short, concise and idomatic ways of speaking, and substitute long words and phrases to make ourselves sound adult and educated? Yup. -- http://mail.python.org/mailman/listinfo/python-list
Re: surprising interaction between function scope and class namespace
Peter Otten wrote: LOAD_NAME is pretty dumb, it looks into the local namespace and if that lookup fails falls back to the global namespace. Someone probably thought I can do better, and reused the static name lookup for nested functions for names that occur only on the right-hand side of assignments in a class. I doubt that it was a conscious decision -- it just falls out of the way the compiler looks up names in its symbol table. In case 1, the compiler finds the name 'a' in the function's local namespace and generates a LOAD_FAST opcode, because that's what it does for all function-local names. In case 2, it finds it in the local namespace of the class and generates LOAD_NAME, because that's what it does for all class-local names. The weirdness arises because classes make use of vestiges of the old two-namespace system, which bypasses lexical scoping at run time. -- Greg -- http://mail.python.org/mailman/listinfo/python-list
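A small Python 3 illustration of the behaviour being described. These two functions are my own reconstruction, not necessarily the exact "case 1" and "case 2" from earlier in the thread, and the names read_only and rebinding are made up.

```python
a = "global"

def read_only():
    a = "local"
    class C:
        b = a   # 'a' is only read in the class body: resolved lexically
    return C.b

def rebinding():
    a = "local"
    class C:
        a = a   # 'a' is also assigned here, so it is a class-local name;
                # the lookup tries the class namespace, then globals,
                # and skips the enclosing function entirely
    return C.a
```

The first function sees the enclosing function's value and the second sees the module-level one; without the module-level `a`, the second would raise NameError.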
Re: Why no warnings when re-assigning builtin names?
On Aug 15, 5:13 pm, Philip Semanchuk phi...@semanchuk.com wrote:
> On Aug 15, 2011, at 5:52 PM, Gerrat Rickert wrote:
> > With surprising regularity, I see program postings (eg. on StackOverflow) from inexperienced Python users accidentally re-assigning built-in names.
> >
> > For example, they'll innocently call some variable, list, and assign a list of items to it. ...and if they're _unlucky_ enough, their program may actually work (encouraging them to re-use this name in other programs).
>
> Or they'll assign a class instance to 'object', only to cause weird errors later when they use it as a base class.
>
> I agree that this is a problem. The folks on my project who are new-ish to Python overwrite builtins fairly often.

Simple syntax highlighting can head off these issues with great ease. Heck, Python even has a keyword module, and you can get a list of built-ins from the dir() function!

    import keyword
    import __builtin__
    PY_BUILTINS = [str(name) for name in dir(__builtin__) if not name.startswith('_')]
    PY_KEYWORDS = keyword.kwlist

Also, Python ships with IDLE (which is a simplistic IDE), and although I find it needs a bit of work to be what GvR initially dreamed, it works well enough to get you by. I always say you must use the correct tool for the job, and syntax highlighting is a must-have to avoid these accidents.
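Along the lines of the "linting script" suggested elsewhere in this thread, here is a rough Python 3 sketch that flags assignments and parameters whose names shadow builtins (in Python 3 the module is called builtins rather than __builtin__). The function name shadowed_builtins is hypothetical, and this is nowhere near what a real linter such as pylint checks: imports, class and function definitions, and loop targets in comprehensions are handled only as far as the two node types below cover them.

```python
import ast
import builtins

BUILTIN_NAMES = {n for n in dir(builtins) if not n.startswith('_')}

def shadowed_builtins(source):
    """Return sorted (line, name) pairs where a builtin name is rebound."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        # Plain name bindings: assignment targets, for-loop variables, etc.
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            if node.id in BUILTIN_NAMES:
                found.add((node.lineno, node.id))
        # Function parameters, e.g. def spam(list): ...
        elif isinstance(node, ast.arg) and node.arg in BUILTIN_NAMES:
            found.add((node.lineno, node.arg))
    return sorted(found)
```

Because it works on the parsed source and never runs it, this can be pointed at code from newcomers without any risk of executing it.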
TestFixtures 1.12.0 Released!
Hi All,

I'm happy to announce a new release of TestFixtures with the following changes:

- OutputCapture has grown a `captured` property and can now be temporarily disabled using its `disable` method:
  http://packages.python.org/testfixtures/streams.html

- Logging can now be captured only when it exceeds a specified logging level:
  http://packages.python.org/testfixtures/logging.html#only-capturing-specific-logging

- The handling of timezones has been reworked in both `test_datetime` and `test_time`. This is not backwards compatible but is much more useful and correct:
  http://packages.python.org/testfixtures/datetime.html#timezones

The package is on PyPI and a full list of all the links to docs, issue trackers and the like can be found here:
http://www.simplistix.co.uk/software/python/testfixtures

cheers,
Chris

-- Simplistix - Content Management, Batch Processing Python Consulting - http://www.simplistix.co.uk
Re: Why no warnings when re-assigning builtin names?
On Tue, 16 Aug 2011 08:15 am Chris Angelico wrote:
> On Mon, Aug 15, 2011 at 10:52 PM, Gerrat Rickert grick...@coldstorage.com wrote:
>> With surprising regularity, I see program postings (eg. on StackOverflow) from inexperienced Python users accidentally re-assigning built-in names.
>>
>> For example, they’ll innocently call some variable, “list”, and assign a list of items to it.
>
> It's actually masking, not reassigning. That may make it easier or harder to resolve the issue.

The usual term is "shadowing" builtins, and it's a feature, not a bug :)

> If you want a future directive that deals with it, I'd do it the other way - from __future__ import mask_builtin_warning or something - so the default remains as it currently is. But this may be a better job for a linting script.

Agreed. It's a style issue, nothing else. There's nothing worse about:

    def spam(list):
        pass

compared to

    class thingy: pass

    def spam(thingy):
        pass

Why should built-ins be treated as more sacred than your own objects?

-- Steven
Re: Ten rules to becoming a Python community member.
On Tue, 16 Aug 2011 10:48 am Gregory Ewing wrote: rantingrick wrote: Used to and supposed to is the verbiage of children and idiots. So when we reach a certain age we're meant to abandon short, concise and idomatic ways of speaking, and substitute long words and phrases to make ourselves sound adult and educated? Say what? Used to isn't idiom. It is grammatical English. Avoidance of used to is a hyper-correction done by people who don't know as much about English as they think, like the grammar policeman let Johnny and I off with a warning, perhaps the most widespread hyper-correction in English. (If you take Johnny out of the picture, the policeman let I off with a warning... which is obviously wrong. Whether Johnny was there or not, the policeman let *me* off with a warning.) Used to is unexceptional English: http://www.englishpage.com/verbpage/usedto.html http://www.bbc.co.uk/worldservice/learningenglish/youmeus/quiznet/newquiz114.shtml http://www.englishclub.com/grammar/verbs-m_used-to-do.htm http://www.learnenglish.de/grammar/usedtotext2.htm Any-grammatical-errors-are-deliberate-ly y'rs, -- Steven -- http://mail.python.org/mailman/listinfo/python-list
Re: Ten rules to becoming a Python community member.
On 16/08/2011 01:52, Gregory Ewing wrote: I don't mind people using e.g. and i.e. as long as they use them *correctly*. Many times people use i.e. when they really mean e.g. Can you give me an example? :-) -- http://mail.python.org/mailman/listinfo/python-list
Re: Ten rules to becoming a Python community member.
In article 9att9mf71...@mid.individual.net, Gregory Ewing greg.ew...@canterbury.ac.nz wrote: I don't mind people using e.g. and i.e. as long as they use them *correctly*. The only correct way to use i.e. is to use it to download a better browser. -- http://mail.python.org/mailman/listinfo/python-list
Re: testing if a list contains a sublist
On Tue, 16 Aug 2011 09:26 am Johannes wrote:

> hi list,
> what is the best way to check if a given list (lets call it l1) is totally contained in a second list (l2)?

This is not the most efficient algorithm, but for short lists it should be plenty fast enough:

    def contains(alist, sublist):
        if len(sublist) == 0 or len(sublist) > len(alist):
            return False
        start = 0
        while True:
            try:
                p = alist.index(sublist[0], start)
            except ValueError:
                return False
            for i,x in enumerate(sublist):
                if alist[p+i] != x:
                    start = p+1
                    break
            else:  # for loop exits without break
                return True

-- 
Steven
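One edge case worth knowing: the index-based inner loop above can run past the end of `alist` when a partial match starts near the tail (for example `contains([1, 2, 3], [3, 4])` reaches `alist[3]`). A slice-based variant avoids that, since slicing never raises IndexError. The following is only a sketch under stated assumptions: `contains_slice` is a hypothetical name, both arguments are assumed to be lists, and it keeps the same convention that an empty sublist counts as "not contained".

```python
def contains_slice(alist, sublist):
    """Return True if sublist occurs as a contiguous run inside alist."""
    n = len(sublist)
    if n == 0 or n > len(alist):
        return False  # same convention as the posted function
    # Compare every window of length n; slices past the end are just shorter.
    return any(alist[i:i + n] == sublist for i in range(len(alist) - n + 1))


found = contains_slice([1, 2, 3, 4], [2, 3])
tail_partial = contains_slice([1, 2, 3], [3, 4])
restart = contains_slice([1, 2, 1, 2, 3], [1, 2, 3])
```

This is quadratic in the worst case, like the original, which is fine for the short lists the poster had in mind.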
Re: Ten rules to becoming a Python community member.
On 2011-08-16, Roy Smith r...@panix.com wrote: In article 9att9mf71...@mid.individual.net, Gregory Ewing greg.ew...@canterbury.ac.nz wrote: I don't mind people using e.g. and i.e. as long as they use them *correctly*. The only correct way to use i.e. is to use it to download a better browser. Similarly: Boy, is there, e.g., on my face now! -s -- Copyright 2011, all wrongs reversed. Peter Seebach / usenet-nos...@seebs.net http://www.seebs.net/log/ -- lawsuits, religion, and funny pictures http://en.wikipedia.org/wiki/Fair_Game_(Scientology) -- get educated! I am not speaking for my employer, although they do rent some of my opinions. -- http://mail.python.org/mailman/listinfo/python-list
Re: Why no warnings when re-assigning builtin names?
On Aug 15, 2011, at 9:32 PM, Steven D'Aprano wrote: On Tue, 16 Aug 2011 08:15 am Chris Angelico wrote: If you want a future directive that deals with it, I'd do it the other way - from __future__ import mask_builtin_warning or something - so the default remains as it currently is. But this may be a better job for a linting script. Agreed. It's a style issue, nothing else. There's nothing worse about: def spam(list): pass compared to class thingy: pass def spam(thingy): pass Why should built-ins be treated as more sacred than your own objects? Because built-ins are described in the official documentation as having a specific behavior, while my objects are not. Yes, it can be useful to replace some of the builtins with one's own implementation, and yes, doing so fits in with Python's we're all consenting adults philosophy. But replacing (shadowing, masking -- call it what you will) builtins is not everyday practice. On the contrary, as the OP Gerrat pointed out, it's most often done unwittingly by newcomers to the language who have no idea that they've done anything out of the ordinary or potentially confusing. If a language feature is most often invoked accidentally without knowledge of or regard for its potential negative consequences, then it might be worth making it easier to avoid those accidents. bye, Philip -- http://mail.python.org/mailman/listinfo/python-list
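The "linting script" idea raised in this thread can be sketched with the standard `ast` and `builtins` modules. This is an illustration only, assuming Python 3: `find_shadowed_builtins` is a hypothetical helper, not how pylint, pyflakes or any other existing tool works, and it ignores imports, `global` statements and other ways of binding a name.

```python
import ast
import builtins

BUILTIN_NAMES = frozenset(dir(builtins))


def find_shadowed_builtins(source):
    """Return sorted (line, name) pairs for bindings that reuse a builtin name."""
    hits = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            name = node.id          # plain assignment target
        elif isinstance(node, ast.arg):
            name = node.arg         # function parameter
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            name = node.name        # def / class statement
        else:
            continue
        if name in BUILTIN_NAMES:
            hits.add((node.lineno, name))
    return sorted(hits)


sample = "def spam(list):\n    pass\n\nlist = [1, 2]\nthingy = 3\n"
report = find_shadowed_builtins(sample)
```

Because a checker like this runs outside the interpreter, it can be opt-in for those who want it without changing the language's default behaviour, which is the trade-off being debated here.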
Re: allow line break at operators
* 2011-08-14T01:44:05-07:00 * Chris Rebert wrote: I've heard that Dylan is supposedly Lisp, sans parens. http://en.wikipedia.org/wiki/Dylan_(programming_language) It has copied/derived many features from Lisps but it's not a dialect of Lisp because of the syntax and its consequences. -- http://mail.python.org/mailman/listinfo/python-list
Windows service in production?
Anyone know of a Python application running as a Windows service in production? I'm planning a network monitoring application that runs as a service and reports back to the central server. Sort of a heartbeat type agent to assist with this server is down, go check on it type situations. If using Visual Studio and C# is the more reliable way, then I'll go that route. I love Python, but everything I read about Python services seems to have workarounds ahoy for various situations (or maybe that's just Windows services in general?). And there seem to be multiple layers of workarounds, since it takes py2exe (or similar) and there are numerous workarounds required there, depending on which libraries and functionality are being used. Overall, reading about Windows services in Python is not exactly a confidence inspiring experience. If I knew of a reference example of something reliably running in production, I'd feel better than copying and pasting some code from a guy's blog. -- http://mail.python.org/mailman/listinfo/python-list
Re: allow line break at operators
On Aug 15, 11:13 pm, alex23 wuwe...@gmail.com wrote:

> Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:
>> I think I would be less skeptical about fluent interfaces if they were written more like Unix shell script pipelines instead of using attribute access notation:
>>
>>     foo.array_of_things | sort | map block | join ,
>
> I've seen at least one attempt to provide this in Python:

If you want 100% OOP then use Ruby:

    rb> [3,100,-20].sort.join('#')
    -20#3#100

Ruby is great from this angle! The reading proceeds naturally from left to right. I have become accustomed to reading Python's nested function calls; however, it does feel much more natural in Ruby. Of course, there are architectural reasons why Python cannot do this linear syntactical processing, which lend some paradigm-al niceties to the Python programmer that are not available to the Ruby folks.
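For comparison, a pipeline-style spelling can be approximated in Python by overloading the `|` operator. This is a minimal sketch, not the library alex23 was referring to; the `Pipe` class and the `sort` and `join` helpers are made-up names for illustration.

```python
class Pipe:
    """Wrap a one-argument function so that `value | pipe` calls it."""

    def __init__(self, func):
        self.func = func

    def __ror__(self, value):
        # Called for `value | self` when value's type does not handle `|`.
        return self.func(value)


sort = Pipe(sorted)


def join(sep):
    return Pipe(lambda items: sep.join(str(item) for item in items))


piped = [3, 100, -20] | sort | join('#')

# The conventional nested-call spelling of the same computation:
nested = '#'.join(str(n) for n in sorted([3, 100, -20]))
```

The trick relies on lists not defining `|` themselves; for types that do (sets, ints), `__ror__` would never be reached, which is one reason this style stays a curiosity rather than an idiom.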
Re: Ten rules to becoming a Python community member.
On Aug 15, 7:48 pm, Gregory Ewing greg.ew...@canterbury.ac.nz wrote:

> rantingrick wrote:
>> "Used to" and "supposed to" is the verbiage of children and idiots.
>
> So when we reach a certain age we're meant to abandon short, concise and idiomatic ways of speaking, and substitute long words and phrases to make ourselves sound adult and educated?

Well that is the idea anyway. Not that we should be overly pedantic about it of course; however, some words need to be cast off before we leave the primary school playground, in the name of articulate communication. These specific phrases I have pointed out ("used to" and "supposed to") are a result of a mind choosing the easy way out instead of putting in the wee bit more effort required to express one's self in an articulate manner. Also, these two phrases are quite prolifically used within this community (among others), from the BDFL on down. It's a slippery slope my friend.
How to use python environment created using virtualenv?
I have created a Python environment using virtualenv, but when I try to import that environment into PyDev, an error appears saying there should be a Libs dir. There is no Libs dir in the virtual environment created by virtualenv. What should I do if I want to use this virtual environment?
Re: Why no warnings when re-assigning builtin names?
On Tue, 16 Aug 2011 01:23 pm Philip Semanchuk wrote: On Aug 15, 2011, at 9:32 PM, Steven D'Aprano wrote: On Tue, 16 Aug 2011 08:15 am Chris Angelico wrote: If you want a future directive that deals with it, I'd do it the other way - from __future__ import mask_builtin_warning or something - so the default remains as it currently is. But this may be a better job for a linting script. Agreed. It's a style issue, nothing else. There's nothing worse about: def spam(list): pass compared to class thingy: pass def spam(thingy): pass Why should built-ins be treated as more sacred than your own objects? Because built-ins are described in the official documentation as having a specific behavior, while my objects are not. *My* objects certainly are, because I write documentation for my code. My docs are no less official than Python's docs. You can shadow anything. Sometimes shadowing is safe, sometimes it isn't. I don't see why we should necessarily fear safe shadowing of built-ins more than we fear unsafe shadowing of non-built-ins. (I'm not even convinced that making None a reserved word was the right decision.) A warning that is off by default won't help the people who need it, because they don't know enough to turn the warning on. A warning that is on by default will be helpful to the newbie programmer for the first week or so, and then will be nothing but an annoyance for the rest of their career. (For some definition of a week -- some people are slower learners than others.) Yes, it can be useful to replace some of the builtins with one's own implementation, and yes, doing so fits in with Python's we're all consenting adults philosophy. But replacing (shadowing, masking -- call it what you will) builtins is not everyday practice. On the contrary, as the OP Gerrat pointed out, it's most often done unwittingly by newcomers to the language who have no idea that they've done anything out of the ordinary or potentially confusing. 
Protecting n00bs from their own errors is an admirable aim, but have you considered that warnings for something which may be harmless could do more harm than good? Beginners often lack the skill to distinguish between harmless warnings that can safely be ignored, and fatal errors that need to be fixed. Even user friendly warning or error messages tend to unnerve some beginner coders. There's not much we can do about outright errors, except to make sure that the error string is as useful as possible, but we can avoid overloading beginners with warnings they don't need to care about: WARNING WARNING WARNING WILL ROBINSON, DANGER DANGER DANGER: YOUR SISTER'S NAME 'PENNY' SHADOWS THE BRITISH CURRENCY, POTENTIAL AMBIGUITY ALERT DANGER DANGER DANGER! *wink* Depending on their personality, you may end up teaching them to ignore warnings, or a superstitious dread of anything that leads to a warning. Neither is a good outcome. If a language feature is most often invoked accidentally without knowledge of or regard for its potential negative consequences, then it might be worth making it easier to avoid those accidents. Perhaps. But I'm not so sure it is worth the cost of extra code to detect shadowing and raise a warning. After all, the average coder probably never shadows anything, and for those that do, once they get bitten *once* they either never do it again or learn how to shadow safely. I don't see it as a problem. -- Steven -- http://mail.python.org/mailman/listinfo/python-list
[issue12266] str.capitalize contradicts oneself
Roundup Robot devn...@psf.upfronthosting.co.za added the comment: New changeset c34772013c53 by Ezio Melotti in branch '3.2': #12266: Fix str.capitalize() to correctly uppercase/lowercase titlecased and cased non-letter characters. http://hg.python.org/cpython/rev/c34772013c53 New changeset eab17979a586 by Ezio Melotti in branch '2.7': #12266: Fix str.capitalize() to correctly uppercase/lowercase titlecased and cased non-letter characters. http://hg.python.org/cpython/rev/eab17979a586 -- nosy: +python-dev ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12266 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12266] str.capitalize contradicts oneself
Roundup Robot devn...@psf.upfronthosting.co.za added the comment: New changeset 1ea72da11724 by Ezio Melotti in branch 'default': #12266: merge with 3.2. http://hg.python.org/cpython/rev/1ea72da11724 -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12266 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12266] str.capitalize contradicts oneself
Ezio Melotti ezio.melo...@gmail.com added the comment:

Fixed, thanks for the report!

--
resolution: duplicate -> fixed
status: open -> closed
___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12266 ___
[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug
Tom Christiansen tchr...@perl.com added the comment:

Ezio Melotti rep...@bugs.python.org wrote on Mon, 15 Aug 2011 04:56:55 -:

> Another thing I noticed is that (at least on wide builds) surrogate pairs are not joined on the fly:
>
>     >>> p
>     '\ud800\udc00'
>     >>> len(p)
>     2
>     >>> p.encode('utf-16').decode('utf-16')
>     '𐀀'
>     >>> len(_)
>     1

(For those who may not immediately realize from reading the surrogates, '𐀀' is code point 0x10000, the first non-BMP code point. I piped it through `uniquote -x` just to make sure.)

Yes, that makes perfect sense. It's something of a buggy feature or featureful bug that UTF-16 does this.

When you are thinking of arbitrary sequences of code points, which is something you have to be able to do in memory but not in a UTF stream, then one can say that one has four code points of anything in the 0 .. 0x10FFFF range. Those can be any arbitrary code points only (1) *while* in memory, *and* assuming a (2) non-UTF16, ie UTF-32 or UTF-8 representation. You cannot do that with UTF-16, which is why it works only on a Python wide build. Otherwise they join up.

The reason they join up in UTF-16 is also the reason why, unlike in regular memory where you might be able to use an alternate representation like UTF-8 or UTF-32, UTF streams cannot contain unpaired surrogates: because if that stream were in UTF-16, you would never be able to tell the difference between a sequence of a lead surrogate followed by a tail surrogate and the same thing meaning just one non-BMP code point. Since you would not be able to tell the difference, it always only means the latter, and the former sense is illegal. This is why lone surrogates are illegal in UTF streams.

In case it isn't obvious, *this* is the source of the [𝒜--𝒵] bug in all the UTF-16 or UCS-2 regex languages. It is why Java 7 added \x{...}, so that they can rewrite that as [\x{1D49C}--\x{1D4B5}] to pass the regex compiler, so that it sees something indirect, not just surrogates. That's why I always check it in my cross-language regex tests.
A 16-bit language has to have a workaround, somehow, or it will be in trouble.

The Java regex compiler doesn't generate UTF-16 for itself, either. It generates UTF-32 for its pattern. You can see this right at the start of the source code. This is from the Java Pattern class:

    /**
     * Copies regular expression to an int array and invokes the parsing
     * of the expression which will create the object tree.
     */
    private void compile() {
        // Handle canonical equivalences
        if (has(CANON_EQ) && !has(LITERAL)) {
            normalize();
        } else {
            normalizedPattern = pattern;
        }
        patternLength = normalizedPattern.length();

        // Copy pattern to int array for convenience
        // Use double zero to terminate pattern
        temp = new int[patternLength + 2];

        hasSupplementary = false;
        int c, count = 0;
        // Convert all chars into code points
        for (int x = 0; x < patternLength; x += Character.charCount(c)) {
            c = normalizedPattern.codePointAt(x);
            if (isSupplementary(c)) {
                hasSupplementary = true;
            }
            temp[count++] = c;
        }

        patternLength = count;   // patternLength now in code points

See how that works? They use an int(-32) array, not a char(-16) array! It's reasonably clever, and necessary. Because it does that, it can now compile \x{1D49C} or erstwhile embedded UTF-8 non-BMP literals into UTF-32, and not get upset by the stormy sea of troubles that surrogates are. You can't have surrogates in ranges if you don't do something like this in a 16-bit language.

Java couldn't fix the [𝒜--𝒵] bug except by doing the \x{...} indirection trick, because they are stuck with UTF-16. However, they actually can match the string 𝒜 against the pattern ^.$, and have it fail on ^..$. Yes, I know: the code-unit length of that string is 2, but its regex count is just one dot worth.

I *believe* they did it that way because tr18 says it has to work that way, but they may also have done it just because it makes sense. My current contact at Oracle doing regex support is not the guy who originally wrote the class, so I am not sure. (He's very good, BTW.
For Java 7, he also added named captures, script properties, *and* brought the class up to conformance with tr18's level 1 requirements.) I'm thinking Python might be able to do in the regex engine on narrow builds the sort of thing that Java does. However, I am also thinking that that might be a lot of work for a situation more readily addressed by phasing out narrow builds or at least telling people they should use wide builds to get that thing to work. --tom == === QUASI OFF TOPIC ADDENDUM FOLLOWS ===
[issue12266] str.capitalize contradicts oneself
Roundup Robot devn...@psf.upfronthosting.co.za added the comment: New changeset d3816fa1bcdf by Ezio Melotti in branch '2.7': #12266: move the tests in test_unicode. http://hg.python.org/cpython/rev/d3816fa1bcdf -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12266 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12711] Explain tracker components in devguide
Ezio Melotti ezio.melo...@gmail.com added the comment:

Fixed in http://hg.python.org/devguide/rev/c9dd231b0940

--
resolution: -> fixed
stage: patch review -> committed/rejected
status: open -> closed
___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12711 ___
[issue12746] normalization is affected by unicode width
STINNER Victor victor.stin...@haypocalc.com added the comment: See also #12737. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12746 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12737] str.title() is overzealous by upcasing combining marks inappropriately
STINNER Victor victor.stin...@haypocalc.com added the comment: See also #12746. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12737 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12746] normalization is affected by unicode width
Changes by Tom Christiansen tchr...@perl.com: -- nosy: +tchrist ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12746 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug
Marc-Andre Lemburg m...@egenix.com added the comment:

> Keep in mind that we should be able to access and use lone surrogates too, therefore:
>
>     s = '\ud800'  # should be valid
>     len(s)  # should this raise an error? (or return 0.5 ;)?
>     s[0]  # error here too?
>     list(s)  # here too?
>
>     p = s + '\udc00'
>     len(p)  # 1?
>     s[0]  # '\U00010000' ?
>     s[1]  # IndexError?
>     list(p + 'a')  # ['\ud800\udc00', 'a']?
>
> We can still decide that strings with lone surrogates work only with a limited number of methods/functions but:
> 1) it's not backward compatible;
> 2) it's not very consistent
>
> Another thing I noticed is that (at least on wide builds) surrogate pairs are not joined on the fly:
>
>     >>> p
>     '\ud800\udc00'
>     >>> len(p)
>     2
>     >>> p.encode('utf-16').decode('utf-16')
>     '𐀀'
>     >>> len(_)
>     1

Hi Tom,

welcome to Python land :-) Here's some more background information on how Python's Unicode implementation works:

You need to differentiate between Unicode code points stored in Unicode objects and ones encoded in transfer formats by codecs.

We generally do allow lone surrogates, unassigned code points, lone combining code points, etc. in Unicode objects since Python needs to be able to work on all Unicode code points and build strings with them.

The transfer format codecs do try to combine surrogates on decoding data on UCS4 builds. On UCS2 builds they create surrogate pairs as necessary. On output, those pairs will again be joined to get round-trip safety.

It helps if you think of Python's Unicode objects using UCS2 and UCS4 instead of UTF-16/32. Python does try to make working with UCS2 easy and in many cases behaves as if it were using UTF-16 internally, but there are, of course, limits to this. In practice, you only rarely get to see any of these special cases, since non-BMP code points are usually not found in everyday use. If they do become a problem for you, you have the option of switching to a UCS4 build of Python.
You also have to be aware of the fact that Python started Unicode in 1999/2000 with Unicode 2.0/3.0, so it uses the terminology of those versions, some of which has changed in more recent versions of Unicode.

For more background information, you might want to take a look at this talk from 2002: http://www.egenix.com/library/presentations/#PythonAndUnicode

Related to the other tickets you opened: you'll also find that collation and compression were already on the plate back then, but since no one stepped forward, they weren't implemented.

Cheers,
-- 
Marc-Andre Lemburg
eGenix.com

--
nosy: +lemburg
title: Python lib re cannot handle Unicode properly due to narrow/wide bug -> Python lib re cannot handle Unicode properly due to narrow/wide bug
___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12729 ___
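The distinction between code points held in a string object and code units in a transfer format can be reproduced on a current interpreter. The sketch below is not from the tracker discussion and assumes Python 3.4 or later, where there is no narrow/wide build split and the UTF-16 codec accepts the `surrogatepass` error handler:

```python
# Two lone surrogate code points, held in memory as a 2-character string.
pair = '\ud800\udc00'
in_memory_length = len(pair)

# The strict UTF-16 codec refuses to encode surrogate code points.
try:
    pair.encode('utf-16-le')
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False

# With 'surrogatepass' the two code points are written as two 16-bit units,
# which a UTF-16 decoder then reads back as ONE non-BMP code point.
raw = pair.encode('utf-16-le', 'surrogatepass')
joined = raw.decode('utf-16-le')
```

After the round trip `joined` is the single character U+10000, which is the "joining up" behaviour described in the comments above: in a UTF-16 stream a lead surrogate followed by a trail surrogate can only mean one supplementary code point.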
[issue12751] Use macros for surrogates in unicodeobject.c
New submission from STINNER Victor victor.stin...@haypocalc.com:

A lot of code is duplicated in unicodeobject.c to manipulate (encode/decode) surrogates. Each function has from one to three different implementations. The new decode_ucs4() function adds a new implementation. Attached patch replaces this code by macros.

I think that only the implementations of IS_HIGH_SURROGATE and IS_LOW_SURROGATE are important for speed. ((ch & 0xFC00UL) == 0xD800) (from decode_ucs4) is *a little bit* faster than (0xD800 <= ch && ch <= 0xDBFF) on my CPU (Atom Z520 @ 1.3 GHz): running test_unicode 4 times takes ~54 sec instead of ~57 sec (-3%).

These 3 macros have to be checked, I wrote the first one:

    #define IS_SURROGATE(ch) (((ch) & 0xF800UL) == 0xD800)
    #define IS_HIGH_SURROGATE(ch) (((ch) & 0xFC00UL) == 0xD800)
    #define IS_LOW_SURROGATE(ch) (((ch) & 0xFC00UL) == 0xDC00)

I added cast to Py_UCS4 in COMBINE_SURROGATES to avoid integer overflow if Py_UNICODE is 16 bits (narrow build). It's maybe useless.

    #define COMBINE_SURROGATES(ch1, ch2) \
        (((((Py_UCS4)(ch1) & 0x3FF) << 10) | ((Py_UCS4)(ch2) & 0x3FF)) + 0x10000)

HIGH_SURROGATE and LOW_SURROGATE require that their ordinal argument has been preprocessed to fit in [0; 0xFFFFF]. I added this requirement in the comment of these macros. It would be better to have only one macro to do the two operations, but because *p++ (dereference and increment) is usually used, I prefer to avoid one unique macro (I don't like passing *p++ in a macro using its argument more than once). Or we may add a third macro using HIGH_SURROGATE and LOW_SURROGATE.

I rewrote the main loop of PyUnicode_EncodeUTF16() to avoid a useless test on ch2 on narrow build.

I also added a IS_NONBMP macro just because I prefer macro over hardcoded constants.
-- files: unicode_macros.patch keywords: patch messages: 142108 nosy: benjamin.peterson, ezio.melotti, haypo, lemburg, loewis, pitrou, tchrist, terry.reedy priority: normal severity: normal status: open title: Use macros for surrogates in unicodeobject.c versions: Python 3.3 Added file: http://bugs.python.org/file22901/unicode_macros.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12751 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
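The bit arithmetic behind these macros can be checked interactively. The following is a Python transliteration written for illustration, not code from the patch; the function names are made up, and `split_surrogates` is an assumed counterpart to the HIGH_SURROGATE/LOW_SURROGATE pair that does the 0x10000 subtraction itself.

```python
def is_surrogate(ch):
    return (ch & 0xF800) == 0xD800       # 0xD800..0xDFFF


def is_high_surrogate(ch):
    return (ch & 0xFC00) == 0xD800       # 0xD800..0xDBFF


def is_low_surrogate(ch):
    return (ch & 0xFC00) == 0xDC00       # 0xDC00..0xDFFF


def combine_surrogates(high, low):
    # Ten payload bits from each half, then re-add the 0x10000 offset.
    return (((high & 0x3FF) << 10) | (low & 0x3FF)) + 0x10000


def split_surrogates(code_point):
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000        # now fits in 20 bits
    return 0xD800 | (offset >> 10), 0xDC00 | (offset & 0x3FF)


first_non_bmp = combine_surrogates(0xD800, 0xDC00)
script_a = split_surrogates(0x1D49C)     # MATHEMATICAL SCRIPT CAPITAL A
```

The mask test works because all high surrogates share the top six bits 110110 and all low surrogates share 110111, so one AND plus one comparison replaces a two-sided range check.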
[issue12751] Use macros for surrogates in unicodeobject.c
STINNER Victor victor.stin...@haypocalc.com added the comment: We may use the following unlikely macro for IS_SURROGATE, IS_HIGH_SURROGATE and IS_LOW_SURROGATE: #define likely(x) __builtin_expect(!!(x), 1) #define unlikely(x) __builtin_expect(!!(x), 0) I suppose that we should use microbenchmarks to validate these macros? Should I open a new issue for this idea? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12751 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12737] str.title() is overzealous by upcasing combining marks inappropriately
Ezio Melotti ezio.melo...@gmail.com added the comment:

So the issue here is that while using combining chars, str.title() fails to titlecase the string properly.

The algorithm implemented by str.title() [0] is quite simple: it loops through the code units, and uppercases all the chars that follow a char that is not lower/upper/titlecased. This means that if 'Déme' doesn't use combining accents, the char before the 'm' is 'é', 'é' is a lowercase char, so 'm' is not capitalized. If the 'é' is represented as 'e' + '´', the char before the 'm' is '´', '´' is not a lower/upper/titlecase char, so the 'm' is capitalized.

I guess we could normalize the string before doing the title casing, and then normalize it back. Also the str methods don't claim to follow Unicode afaik, so unless we decide that they should, we could implement whatever algorithm we want.

[0]: Objects/unicodeobject.c:6752

--
___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12737 ___
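The "normalize, title-case, normalize back" idea from this comment can be tried out with the standard `unicodedata` module. This is a sketch of the suggestion only, assuming Python 3; `title_normalized` is a hypothetical helper, and it helps only for base-plus-mark sequences that have a precomposed form.

```python
import unicodedata


def title_normalized(text, form='NFD'):
    """Title-case text via its composed form, then return it in `form`."""
    composed = unicodedata.normalize('NFC', text)   # 'e' + U+0301 becomes 'é'
    return unicodedata.normalize(form, composed.title())


decomposed = 'de\u0301me'                 # 'déme' spelled with a combining acute
titled = title_normalized(decomposed)      # result is returned in NFD again
titled_nfc = title_normalized(decomposed, form='NFC')
```

Sequences with no precomposed equivalent stay decomposed after NFC, so for those a fix inside the casing algorithm itself (treating combining marks as part of the preceding letter) would still be needed.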
[issue12751] Use macros for surrogates in unicodeobject.c
Ezio Melotti ezio.melo...@gmail.com added the comment: This has been proposed already in #10542 (the issue also has patches). -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12751 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12731] python lib re uses obsolete sense of \w in full violation of UTS#18 RL1.2a
Ezio Melotti ezio.melo...@gmail.com added the comment: If the regex module works fine here, I think it's better to leave the re module alone and include the regex module in 3.3. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12731 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12734] Request for property support in Python re lib
Ezio Melotti ezio.melo...@gmail.com added the comment: This indeed should be fixed by replacing 're' with 'regex'. So I would suggest to focus your tests on 'regex' and report them there so that possible bugs gets fixed and tested before we include the module in the stdlib. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12734 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12733] Request for grapheme support in Python re lib
Ezio Melotti ezio.melo...@gmail.com added the comment: As I said on #12734 and #12731, if the 'regex' module address this issue, we should just wait until we include it in the stdlib. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12733 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12730] Python's casemapping functions are untrustworthy due to narrow/wide build issues
Ezio Melotti ezio.melo...@gmail.com added the comment:

This is actually a duplicate of #9200.

@Terry
> Besides which, all I see (on Windows) in Firefox is things like ð¼ð¯ð‘…ð¨ð‘‰ð¯ð».

Encoding problem. Firefox thinks this is some iso-8859-*. You can fix this by selecting 'Unicode (UTF-8)' from View -> Character Encoding.

> IDLE just has empty boxes.

This is most likely because it doesn't use a font able to display those chars.

--
resolution: -> duplicate
stage: needs patch -> committed/rejected
status: open -> closed
superseder: -> str.isprintable() is always False for large code points
___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12730 ___
[issue9200] Make str methods work with non-BMP chars on narrow builds
Ezio Melotti ezio.melo...@gmail.com added the comment:

I closed #12730 as a duplicate of this and updated the title of this issue.

--
title: str.isprintable() is always False for large code points -> Make str methods work with non-BMP chars on narrow builds
___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9200 ___
[issue10542] Py_UNICODE_NEXT and other macros for surrogates
Ezio Melotti ezio.melo...@gmail.com added the comment: See also #12751. -- nosy: +tchrist ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10542 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue9200] Make str methods work with non-BMP chars on narrow builds
Changes by Ezio Melotti ezio.melo...@gmail.com: -- nosy: +tchrist ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9200 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12752] locale.normalize does not take unicode strings
New submission from Julian Taylor jtaylor.deb...@googlemail.com:

using unicode strings for locale.normalize gives following traceback with python2.7:

    ~$ python2.7 -c 'import locale; locale.normalize(u"en_US")'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/usr/lib/python2.7/locale.py", line 358, in normalize
        fullname = localename.translate(_ascii_lower_map)
    TypeError: character mapping must return integer, None or unicode

with python2.6 it works and it also works with non-unicode strings in 2.7

--
components: Unicode
messages: 142118
nosy: jtaylor
priority: normal
severity: normal
status: open
title: locale.normalize does not take unicode strings
versions: Python 2.7
___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12752 ___
[issue12752] locale.normalize does not take unicode strings
Changes by Ezio Melotti ezio.melo...@gmail.com: -- nosy: +ezio.melotti stage: - test needed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12752 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12204] str.upper converts to title
Roundup Robot devn...@psf.upfronthosting.co.za added the comment: New changeset 16edc5cf4a79 by Ezio Melotti in branch '3.2': #12204: document that str.upper().isupper() might be False and add a note about cased characters. http://hg.python.org/cpython/rev/16edc5cf4a79 New changeset fb49394f75ed by Ezio Melotti in branch '2.7': #12204: document that str.upper().isupper() might be False and add a note about cased characters. http://hg.python.org/cpython/rev/fb49394f75ed New changeset c821e3a54930 by Ezio Melotti in branch 'default': #12204: merge with 3.2. http://hg.python.org/cpython/rev/c821e3a54930 -- nosy: +python-dev ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12204 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12204] str.upper converts to title
Ezio Melotti ezio.melo...@gmail.com added the comment:

Fixed, thanks for the report!

--
resolution: -> fixed
stage: commit review -> committed/rejected
status: open -> closed
[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug
Matthew Barnett pyt...@mrabarnett.plus.com added the comment:

For what it's worth, I've had an idea about string storage, roughly based on how *nix stores data on disk.

If a string is small, point to a block of codepoints.

If a string is medium-sized, point to a block of pointers to codepoint blocks.

If a string is large, point to a block of pointers to pointer blocks.

This means that a large string doesn't need a single large allocation. The level of indirection can be increased as necessary.

For simplicity, all codepoint blocks contain the same number of codepoints, except the final codepoint block, which may contain fewer.

A codepoint block may use the minimum width necessary (1, 2 or 4 bytes) to store all of its codepoints. This means that there are no surrogates and that different sections of the string can be stored in different widths to reduce memory usage.
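The storage idea above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposed implementation: the class name BlockString, the helper _width_for and the tiny BLOCK_SIZE are all hypothetical, the code assumes Python 3, and it models just one level of indirection (a list of codepoint blocks) rather than the deeper pointer-block levels described for large strings.

```python
BLOCK_SIZE = 4  # hypothetical block length, kept tiny so the example is easy to follow


def _width_for(codepoints):
    """Return the smallest of 1, 2 or 4 bytes that holds every code point given."""
    highest = max(codepoints)
    if highest < 0x100:
        return 1
    if highest < 0x10000:
        return 2
    return 4


class BlockString:
    """Toy string stored as fixed-size codepoint blocks of per-block width."""

    def __init__(self, text):
        codepoints = [ord(ch) for ch in text]
        self.length = len(codepoints)
        self.blocks = []  # one (width, packed_bytes) pair per block
        for start in range(0, len(codepoints), BLOCK_SIZE):
            chunk = codepoints[start:start + BLOCK_SIZE]
            width = _width_for(chunk)
            packed = b"".join(cp.to_bytes(width, "little") for cp in chunk)
            self.blocks.append((width, packed))

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        # Every block except possibly the last holds BLOCK_SIZE code points,
        # so the block number and the offset inside it come from one divmod.
        block_no, offset = divmod(index, BLOCK_SIZE)
        width, packed = self.blocks[block_no]
        raw = packed[offset * width:(offset + 1) * width]
        return chr(int.from_bytes(raw, "little"))

    def widths(self):
        """Bytes per code point used by each block, in order."""
        return [width for width, _ in self.blocks]


text = "abcd\u20acxyz\U0001F600!"
stored = BlockString(text)
print(stored.widths())
```

Because all blocks but the last have the same length, indexing stays a constant-time calculation even though each block picks its own width, and a non-BMP character only widens the block that contains it.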
[issue12752] locale.normalize does not take unicode strings
Julian Taylor jtaylor.deb...@googlemail.com added the comment:

This is a regression introduced by fixing http://bugs.python.org/issue1813

It breaks some user code, e.g. wx.Locale.GetCanonicalName returns unicode.

Example bugs:
https://bugs.launchpad.net/ubuntu/+source/update-manager/+bug/824734
https://bugs.launchpad.net/ubuntu/+source/playonlinux/+bug/825421
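Until the regression is fixed, affected callers could work around it by handing locale.normalize the interpreter's native string type. The wrapper below is a hypothetical sketch, not part of the locale module; the name normalize_text and the default "ascii" encoding are assumptions made for this example, on the expectation that locale names are plain ASCII.

```python
import locale


def normalize_text(localename, encoding="ascii"):
    """Hypothetical wrapper: pass locale.normalize the native str type.

    On Python 2 a unicode value is encoded to a byte str first; on
    Python 3 a bytes value is decoded to str. Native strings pass through.
    """
    if str is bytes:
        # Python 2: the native str is a byte string, so encode unicode input.
        if not isinstance(localename, str):
            localename = localename.encode(encoding)
    elif isinstance(localename, bytes):
        # Python 3: the native str is text, so decode bytes input.
        localename = localename.decode(encoding)
    return locale.normalize(localename)


print(normalize_text(u"en_US"))
```

Code that receives a unicode locale name from a toolkit could call the wrapper instead of calling locale.normalize directly.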
[issue12752] locale.normalize does not take unicode strings
Marc-Andre Lemburg m...@egenix.com added the comment:

Julian Taylor wrote:
> New submission from Julian Taylor jtaylor.deb...@googlemail.com:
>
> using unicode strings for locale.normalize gives following traceback with python2.7:
>
> ~$ python2.7 -c 'import locale; locale.normalize(u"en_US")'
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
>   File "/usr/lib/python2.7/locale.py", line 358, in normalize
>     fullname = localename.translate(_ascii_lower_map)
> TypeError: character mapping must return integer, None or unicode
>
> with python2.6 it works and it also works with non-unicode strings in 2.7

This looks like a side-effect of the change Antoine made to the locale module when trying to make the case mapping work in a non-locale dependent way.

--
nosy: +lemburg
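For reference, locale-independent lowercasing can also be done with a table that works on text strings. The sketch below assumes Python 3 and is not the code in locale.py; ASCII_LOWER_TABLE and ascii_lower are hypothetical names. It maps code points to code points, which is the kind of mapping the TypeError in the traceback asks for (integer, None or unicode).

```python
# Map only ASCII 'A'..'Z' to 'a'..'z'. A dict keyed by code point is a
# table form that str.translate accepts for text strings.
ASCII_LOWER_TABLE = {cp: cp + 32 for cp in range(ord("A"), ord("Z") + 1)}


def ascii_lower(text):
    """Lowercase ASCII letters only, regardless of the current locale."""
    return text.translate(ASCII_LOWER_TABLE)


print(ascii_lower("EN_US.UTF-8"))
```

Characters outside A to Z, such as the Turkish dotted capital I (U+0130), are left untouched, so the result does not depend on the current locale.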