Re: What extended ASCII character set uses 0x9D?
On 08/17/2017 05:53 PM, Chris Angelico wrote:
> On Fri, Aug 18, 2017 at 10:30 AM, John Nagle wrote:
>> On 08/17/2017 05:14 PM, John Nagle wrote:
>>> I'm cleaning up some data which has text description fields from
>>> multiple sources.
>>
>> A few more cases:
>>
>> bytearray(b'\xe5\x81ukasz zmywaczyk')
>
> This one has to be Polish, and the first character should be the
> letter Ł U+0141 or ł U+0142. In UTF-8, U+0141 becomes C5 81, which is
> very similar to the E5 81 that you have.
>
> So here's an insane theory: something attempted to lower-case the
> byte stream as if it were ASCII. If you ignore the high bit, 0xC5
> looks like 0x45 or "E", which lower-cases by having 32 added to it,
> yielding 0xE5. Reversing this transformation yields sane data for
> several of your strings - they then decode as UTF-8:
>
> miguel Ángel santos
> lidija kmetič
> Łukasz zmywaczyk
> jiří urbančík
> Ľubomír mičko
> petr urbančík

You're exactly right. The database has columns "name" and "normalized
name". Normalizing the name was done by forcing it to lower case as if
in ASCII, even for UTF-8. That resulted in errors like

    KACMAZLAR MEKANİK -> kacmazlar mekanä°k
    Anita Calçados -> anita calã§ados
    Felfria Resor för att Koh Lanta -> felfria resor fã¶r att koh lanta

The "name" field is OK; it's just the "normalized name" field that is
sometimes garbaged. Now that I know this, and have properly captured
the "name" field in UTF-8 where appropriate, I can regenerate the
"normalized name" field. MySQL/MariaDB know how to lower-case UTF-8
properly. Clean data at last. Thanks.

The database, by the way, is a historical snapshot of startup funding,
from Crunchbase.

John Nagle
--
https://mail.python.org/mailman/listinfo/python-list
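Chris's reversal can be sketched in a few lines. This is a heuristic, not the thread's actual cleanup code, and the function names are mine: un-lower-case any high-bit byte whose low seven bits look like a lower-case ASCII letter, and leave strings that already decode as UTF-8 alone.

```python
def _decodes_utf8(data: bytes) -> bool:
    try:
        data.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False

def undo_ascii_lowercasing(data: bytes) -> bytes:
    """Heuristic repair (hypothetical helper, not from the thread):
    reverse an ASCII-style lower-casing wrongly applied to UTF-8 bytes.
    A UTF-8 lead byte like 0xC5 ('E' plus the high bit) became 0xE5
    when 32 was added to it; subtract the 32 back."""
    if _decodes_utf8(data):          # already valid UTF-8; leave alone
        return data
    out = bytearray()
    for b in data:
        # High-bit byte whose low 7 bits are 'a'-'z' was probably a
        # lower-cased lead byte; undo the +32.
        if b >= 0x80 and 0x61 <= (b & 0x7F) <= 0x7A:
            out.append(b - 32)
        else:
            out.append(b)
    return bytes(out)
```

Applied to the samples in the thread, `undo_ascii_lowercasing(b'\xe5\x81ukasz zmywaczyk')` yields bytes that decode as "Łukasz zmywaczyk". As Chris notes, it doesn't rescue everything (the 0x81 0x81 case stays broken).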
Re: What extended ASCII character set uses 0x9D?
On 08/17/2017 10:12 PM, Ian Kelly wrote:
>> Here's some more 0x9d usage, each from a different data item:
>>
>> Guitar Pro, JamPlay, RedBana\\\'s Audition,\x9d
>> Doppleganger\x99s The Lounge\x9d or
>> Heatwave Interactive\x99s Platinum Life Country,\\"
>
> This one seems like a good hint since \x99 here looks like it should
> be an apostrophe. But what character set has an apostrophe there? The
> best I can come up with is that 0xE2 0x80 0x99 is "right single
> quotation mark" in UTF-8. Also known as the "smart apostrophe", so it
> could have been entered by a word processor.
>
> The problem is that if that's what it is, then two out of the three
> bytes are outright missing. If the same thing happened to \x9d then
> who knows what's missing from it? One possibility is that it's the
> same two bytes. That would make it 0xE2 0x80 0x9D, which is "right
> double quotation mark". Since it keeps appearing after ending double
> quotes that seems plausible, although one has to wonder why it
> appears *in addition to* the ASCII double quotes.

I was wondering if it was a signal to some word processor to apply
smart quote handling.

>> This has me puzzled. It's often, but not always, after a close
>> quote. "TM" or "(R)" might make sense, but what non-Unicode
>> character set has those? And "green"(tm) makes no sense.
>
> CP-1252 has ™ at \x99, perhaps coincidentally. CP-1252 and Latin-1
> both have ® at \xae.

That's helpful. All those text snippets failed Windows-1252 decoding,
though, because 0x9d isn't in Windows-1252.

I'm coming around to the idea that some of these snippets have been
previously mis-converted, which is why they make no sense. Since, as
someone pointed out, there was UTF-8 which had been run through an
ASCII-type lower-casing algorithm, that's a reasonable assumption.

Thanks for looking at this, everyone. If a string won't parse as
either UTF-8 or Windows-1252, I'm just going to convert the bogus
stuff to the Unicode replacement character. I might remove 0x9d chars,
since that never seems to affect readability.
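The fallback plan described above (strip stray 0x9d, try UTF-8, then Windows-1252, then fall back to replacement characters) might look like this. A sketch, not the poster's actual code:

```python
def clean_field(raw: bytes) -> str:
    """Decode a text field of unknown encoding.  Sketch of the plan in
    the post: drop 0x9d (it never seems to affect readability), then
    try UTF-8, then Windows-1252, then give up and use U+FFFD."""
    raw = raw.replace(b'\x9d', b'')
    for encoding in ('utf-8', 'windows-1252'):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            pass
    # Nothing fit; keep what's readable and mark the rest.
    return raw.decode('utf-8', errors='replace')
```

Note the ordering matters: Windows-1252 (like Latin-1) accepts almost any byte sequence, so UTF-8 must be tried first or valid UTF-8 gets mojibake'd.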
John Nagle -- https://mail.python.org/mailman/listinfo/python-list
Re: What extended ASCII character set uses 0x9D?
On 08/17/2017 05:53 PM, Chris Angelico wrote:
> On Fri, Aug 18, 2017 at 10:30 AM, John Nagle wrote:
>> On 08/17/2017 05:14 PM, John Nagle wrote:
>>> I'm cleaning up some data which has text description fields from
>>> multiple sources.
>>
>> A few more cases:
>>
>> bytearray(b'\xe5\x81ukasz zmywaczyk')
>
> This one has to be Polish, and the first character should be the
> letter Ł U+0141 or ł U+0142. In UTF-8, U+0141 becomes C5 81, which is
> very similar to the E5 81 that you have.
>
> So here's an insane theory: something attempted to lower-case the
> byte stream as if it were ASCII. If you ignore the high bit, 0xC5
> looks like 0x45 or "E", which lower-cases by having 32 added to it,
> yielding 0xE5. Reversing this transformation yields sane data for
> several of your strings - they then decode as UTF-8:
>
> miguel Ángel santos
> lidija kmetič
> Łukasz zmywaczyk
> jiří urbančík
> Ľubomír mičko
> petr urbančík

I think you're right for those. I'm working from a MySQL dump of
supposedly LATIN-1 data, but LATIN-1 will accept anything. I've found
UTF-8 and Windows-1252 in there. It's quite possible that someone
lower-cased UTF-8 stored in a LATIN-1 field.

There are lots of questions on the web which complain about getting a
Python decode error on 0x9d, and the usual answer is "Use Latin-1".
But that doesn't really decode properly; it just doesn't generate an
exception.

> That doesn't work for everything, though. The 0x81 0x81 and 0x9d
> ones are still a puzzle.

The 0x9d thing seems unrelated to the Polish names thing. 0x9d shows
up in the middle of English text that's otherwise ASCII. Is this
something that can appear as a result of cutting and pasting from
Microsoft Word?

I'd like to get 0x9d right, because it comes up a lot. The Polish name
thing is rare. There's only about a dozen of those in 400MB of
database dump. There are hundreds of 0x9d hits.
Here's some more 0x9d usage, each from a different data item:

    Guitar Pro, JamPlay, RedBana\\\'s Audition,\x9d
    Doppleganger\x99s The Lounge\x9d or
    Heatwave Interactive\x99s Platinum Life Country,\\"
    for example \\"I\\\'ve seen the bull run in Pamplona, Spain\x9d.\\"
    Everything Netwise Depot is a \\"One Stop Web Shop\\"\x9d
    that provides sustainable \\"green\\"\x9d living
    are looking for a \\"Do It for Me\\"\x9d solution

This has me puzzled. It's often, but not always, after a close quote.
"TM" or "(R)" might make sense, but what non-Unicode character set has
those? And "green"(tm) makes no sense.

John Nagle
--
https://mail.python.org/mailman/listinfo/python-list
Re: What extended ASCII character set uses 0x9D?
On 08/17/2017 05:14 PM, John Nagle wrote:
> I'm cleaning up some data which has text description fields from
> multiple sources.

A few more cases:

    bytearray(b'miguel \xe3\x81ngel santos')
    bytearray(b'lidija kmeti\xe4\x8d')
    bytearray(b'\xe5\x81ukasz zmywaczyk')
    bytearray(b'M\x81\x81\xfcnster')
    bytearray(b'ji\xe5\x99\xe3\xad urban\xe4\x8d\xe3\xadk')
    bytearray(b'\xe4\xbdubom\xe3\xadr mi\xe4\x8dko')
    bytearray(b'petr urban\xe4\x8d\xe3\xadk')

0x9d is the most common; that occurs in English text. The others seem
to be in some Eastern European character set. Understand, there's no
metadata available to disambiguate this. What I have is a big CSV file
in which different character sets are mixed. Each field has a uniform
character set, so I need character set detection on a per-field basis.

John Nagle
--
https://mail.python.org/mailman/listinfo/python-list
What extended ASCII character set uses 0x9D?
I'm cleaning up some data which has text description fields from
multiple sources. Some are in UTF-8. Some are in WINDOWS-1252. And
some are in some other character set. So I have to examine and sanity
check each field in a database dump, deciding which character set best
represents what's there.

Here's a hard case:

    >>> g1 = bytearray(b'\\"Perfect Gift Idea\\"\x9d Each time')
    >>> g1.decode("utf8")
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9d in
    position 21: invalid start byte
    >>> g1.decode("windows-1252")
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in
    position 21: character maps to <undefined>

0x9d is unmapped in "windows-1252", according to

https://en.wikipedia.org/wiki/Windows-1252

so the Python codec isn't wrong here. Trying "latin-1":

    >>> g1.decode("latin-1")
    '\\"Perfect Gift Idea\\"\x9d Each time'

That just converts 0x9d in the input to 0x9d in Unicode. That's
"Operating System Command" (the "Windows" key?) That's clearly wrong;
some kind of quote was intended.

Any ideas?

John Nagle
--
https://mail.python.org/mailman/listinfo/python-list
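For what it's worth, one can quickly survey what several 8-bit codecs say about 0x9d; a throwaway check, assuming Python's codec tables are authoritative:

```python
# What does byte 0x9D mean in a few common 8-bit encodings?
# Most leave it as a C1 control or undefined; none map it to a quote.
for enc in ('windows-1252', 'latin-1', 'cp437', 'cp850', 'mac_roman'):
    try:
        print(enc, repr(b'\x9d'.decode(enc)))
    except UnicodeDecodeError:
        print(enc, '-> unmapped in this codec')
```

Windows-1252 raises (0x9d is one of its five unassigned bytes, along with 0x81, 0x8d, 0x8f, and 0x90), while Latin-1 passes it through as the U+009D control character, which is why "use latin-1" silences the exception without actually decoding anything.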
Unicode support in Python 2.7.8 - 16 bit
How do I test if a Python 2.7.8 build was built for 32-bit Unicode?
(I'm dealing with shared hosting, and I'm stuck with their provided
versions.) If I give this to Python 2.7.x:

    sy = u'\U0001f60f'

len(sy) is 1 on an Ubuntu 14.04 LTS machine, but 2 on the Red Hat
shared hosting machine. I assume "1" indicates 32-bit Unicode
capability, and "2" indicates 16-bit.

It looks like Python 2.x in 16-bit mode is using a UTF-16 pair
encoding, like Java. Is that right? Is it documented somewhere?

(Annoyingly, while the shared host has a Python 3, it's 3.2.3, which
rejects "u" Unicode string constants and has other problems in the
MySQL area.)

John Nagle
--
https://mail.python.org/mailman/listinfo/python-list
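A more direct way to ask the interpreter than measuring len() of an astral character: sys.maxunicode is 0x10FFFF on a "wide" (UCS-4) Python 2 build and 0xFFFF on a "narrow" (UTF-16) one. (Python 3.3+ removed the distinction entirely with PEP 393's flexible string representation.) A small sketch:

```python
import sys

def is_wide_unicode_build():
    # Narrow Python 2 builds report sys.maxunicode == 0xFFFF and store
    # astral characters as UTF-16 surrogate pairs, hence len() == 2.
    return sys.maxunicode > 0xFFFF

print(is_wide_unicode_build())
print(len(u'\U0001f60f'))   # 1 on a wide build, 2 on a narrow one
```

On Python 2 the build choice was the `--enable-unicode=ucs4` configure flag, which is presumably what differs between the Ubuntu and Red Hat interpreters described above.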
Who still supports recent Python on shared hosting
I'm looking for shared hosting that supports at least Python 3.4.

    Hostgator: Highest version is Python 3.2.
    Dreamhost: Highest version is Python 2.7.
    Bluehost: Install Python yourself.
    InMotion: Their documentation says 2.6.

Is Python on shared hosting dead? I don't need a whole VM and
something I have to sysadmin, just a small shared hosting account.

John Nagle
--
https://mail.python.org/mailman/listinfo/python-list
input vs. readline
If "readline" is imported, "input" gets "readline" capabilities. It
also loses the ability to input control characters. It doesn't matter
where "readline" is imported; an import in some library module can
trigger this.

You can try this with a simple test case:

    print(repr(input()))

as a .py file, run in a console. Try typing "aaaESCbbb". On Windows 7,
output is "bbb". On Linux, it's "aaa\x1bbbb". So it looks like
"readline" is implicitly imported on Windows.

I have a multi-threaded Python program which recognizes ESC as a
command to stop something. This works on Linux, but not on Windows.
Apparently something in Windows land pulls in "readline".

What's the best way to get input from the console (not any enclosing
shell script) that's cross-platform, cross-version (Python 2.7, 3.x),
and doesn't do "readline" processing?

(No, I don't want to use signals, a GUI, etc. This is simulating a
serial input device while logging messages appear. It's a debug
facility to be able to type input in the console window.)

John Nagle
--
https://mail.python.org/mailman/listinfo/python-list
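There is no single stdlib call that does this, but one possible per-platform approach (msvcrt on Windows, termios cbreak mode on POSIX) bypasses readline entirely by not going through input() at all. A sketch only, untested against the scenario in the post, and it needs a real terminal on stdin:

```python
import sys

def read_key():
    """Read one raw keystroke from the console, without readline
    line editing.  Sketch: msvcrt on Windows, termios elsewhere."""
    if sys.platform == 'win32':
        import msvcrt
        return msvcrt.getwch()          # wide-char read, no echo
    import termios
    import tty
    fd = sys.stdin.fileno()
    old = termios.tcgetattr(fd)
    try:
        tty.setcbreak(fd)               # no line buffering; ESC comes through
        return sys.stdin.read(1)
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, old)
```

For the use case described (watch for ESC while log messages scroll by), this would run in its own thread, calling read_key() in a loop and setting a flag when it sees '\x1b'.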
Re: py2exe crashes on simple program
On 7/4/2016 11:13 PM, Steven D'Aprano wrote:
> If you change it to "library.exe" does it work? Also, I consider this
> a bug in py2exe:
> - it's an abuse of assert, using it to check user-supplied input;
> - it's a failing assertion, which by definition is a bug.

I'm not trying to build "library.zip". That's a work file py2exe
created. If I delete it, it's re-created by py2exe.

The problem seems to be that my "setup.py" file didn't include a
"console" entry, which tells py2exe the build target. Apparently,
without that, py2exe tries to build something bogus and blows up.

After fixing that, the next error is

    Building 'dist\baudotrss.exe'.
    error: [Errno 2] No such file or directory: 'C:\\Program
    Files\\Python35\\lib\\site-packages\\py2exe\\run-py3.5-win-amd64.exe'

Looks like pip installed (yesterday) a version of py2exe that doesn't
support Python 3.5. The py2exe directory contains
"run-py3.3-win-amd64.exe" and "run-py3.4-win-amd64.exe", but not a 3.5
version. That's what PyPI says at "https://pypi.python.org/pypi/py2exe".
The last version of py2exe was uploaded two years ago (2014-05-09) and
is for Python 3.4. So of course it doesn't have the 3.5 binary
executable it needs.

Known problem. Stack Overflow reports py2exe is now broken for Python
3.5. Apparently it's a non-trivial fix, too.

http://stackoverflow.com/questions/32963057/is-there-a-py2exe-version-thats-compatible-with-python-3-5

cx_freeze has been suggested as an alternative, but its own documents
indicate it's only been tested through Python 3.4. Someone reported
success with a development version.

I guess people don't create Python executables much.

John Nagle
--
https://mail.python.org/mailman/listinfo/python-list
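For reference, a minimal py2exe setup.py with the "console" entry whose absence caused the assertion. The module name comes from the traceback; the rest is guesswork, not the poster's actual file:

```python
# Hypothetical minimal setup.py for py2exe; "baudotrss.py" is assumed
# to be the top-level script.  Without a console= (or windows=) entry,
# py2exe has no build target and fails in build_archive.
from distutils.core import setup
import py2exe  # noqa: F401 -- importing registers the "py2exe" command

setup(
    name='baudotrss',
    console=['baudotrss.py'],
)
```

Run with `python setup.py py2exe`; the executable lands in `dist\`.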
py2exe crashes on simple program
I'm trying to create an executable with py2exe. The program runs fine
in interpretive mode, but when I try to build an executable, py2exe
crashes with an assertion error. See below.

This is an all-Python program; no binary modules other than ones that
come with the Python 3.5.2 distribution. Running "python setup.py
bdist" works, so "setup.py" is sane. What's giving py2exe trouble?

    U:\>python setup.py py2exe
    running py2exe
    running build_py
    Building shared code archive 'dist\library.zip'.
    Traceback (most recent call last):
      File "setup.py", line 14, in <module>
        packages=['baudotrss'],
      File "C:\Program Files\Python35\lib\distutils\core.py", line 148, in setup
        dist.run_commands()
      File "C:\Program Files\Python35\lib\distutils\dist.py", line 955, in run_commands
        self.run_command(cmd)
      File "C:\Program Files\Python35\lib\distutils\dist.py", line 974, in run_command
        cmd_obj.run()
      File "C:\Program Files\Python35\lib\site-packages\py2exe\distutils_buildexe.py", line 188, in run
        self._run()
      File "C:\Program Files\Python35\lib\site-packages\py2exe\distutils_buildexe.py", line 268, in _run
        builder.build()
      File "C:\Program Files\Python35\lib\site-packages\py2exe\runtime.py", line 261, in build
        self.build_archive(libpath, delete_existing_resources=True)
      File "C:\Program Files\Python35\lib\site-packages\py2exe\runtime.py", line 426, in build_archive
        assert mod.__file__.endswith(EXTENSION_SUFFIXES[0])
    AssertionError

Python 3.5.2 / Win7 / AMD64.

John Nagle
--
https://mail.python.org/mailman/listinfo/python-list
Re: Python 3 lack of support for fcgi/wsgi.
On 3/29/2015 7:11 PM, John Nagle wrote:
> Meanwhile, I've found two more variants on "flup":
>
> https://pypi.python.org/pypi/flipflop
> https://pypi.python.org/pypi/flup6
>
> All of these are descended from the original "flup" code base.
>
> PyPI also has
>
> fcgi-python (Python 2.6, Windows only.)
> fcgiapp (circa 2005)
> superfcgi (circa 2009)
>
> Those can probably be ignored.
>
> One of the "flup" variants may do the job, but since there are so
> many, and no single version has won out, testing is necessary.
> "flipflop" looks promising, simply because the author took all the
> code out that you don't need on a server.

"flipflop" works well with Apache. It does log "WARNING: SCRIPT_NAME
does not match REQUEST_URI" for any URL renamed using mod_rewrite with
Apache, but other than that, it seems to do the job. The warning
message was copied over from "flup", and there's an issue for it for
one of the "flup" variants. So I referenced that issue for "flipflop":

https://github.com/Kozea/flipflop/issues

That's part of the problem of having all those forks - now each bug
has to be fixed in each fork.

After all this, the production system is now running entirely on
Python 3.

John Nagle
--
https://mail.python.org/mailman/listinfo/python-list
Re: Python 3 lack of support for fcgi/wsgi.
On 3/29/2015 6:03 PM, Paul Rubin wrote:
> Those questions seem unfair to me. Nagle posted an experience report
> about a real-world project to migrate a Python 2 codebase to Python
> 3. He reported hitting more snags than some of us might expect purely
> from the Python 3 propaganda ("oh, just run the 2to3 utility and it
> does everything for you"). The report presented info worth
> considering for anyone thinking of doing a 2-to-3 migration of their
> own, or maybe even choosing between 2 and 3 for a new project. I find
> reports like that to be valuable whether or not they suggest fixes
> for the snags.

Thanks. Meanwhile, I've found two more variants on "flup":

https://pypi.python.org/pypi/flipflop
https://pypi.python.org/pypi/flup6

All of these are descended from the original "flup" code base.

PyPI also has

    fcgi-python (Python 2.6, Windows only.)
    fcgiapp (circa 2005)
    superfcgi (circa 2009)

Those can probably be ignored.

One of the "flup" variants may do the job, but since there are so
many, and no single version has won out, testing is necessary.
"flipflop" looks promising, simply because the author took out all the
code you don't need on a server.

CPAN, the Perl module archive, has some curation and testing. PyPI
lacks that, which is how we end up with situations like this, where
there are 11 ways to do something, most of which don't work.

Incidentally, in my last report, I reported problems with BS4,
PyMySQL, and Pickle. I now have workarounds for all of those, but not
fixes. The bug reports I listed last time contain the workaround code.

John Nagle
--
https://mail.python.org/mailman/listinfo/python-list
Re: Python 3 lack of support for fcgi/wsgi.
On 3/29/2015 1:19 PM, John Nagle wrote:
> On 3/29/2015 12:11 PM, Ben Finney wrote:
>> John Nagle writes:
>>
>>> The Python 3 documentation at
>>> https://docs.python.org/3/howto/webservers.html
>>> recommends "flup"
>>
>> I disagree. In a section where it describes FastCGI, it presents a
>> tiny example as a way to test the packages installed. The example
>> happens to use ‘flup’.
>>
>> That's quite different from a recommendation.
>>
>>> I get the feeling, again, that nobody actually uses this stuff.
>
> So do others. See "http://www.slideshare.net/mitsuhiko/wsgi-on-python-3"
>
> "A talk about the current state of WSGI on Python 3. Warning:
> depressing. But it does not have to stay that way"
>
> "wsgiref on Python 3 is just broken."
>
> "Python 3 that is supposed to make unicode easier is causing a lot
> more problems than unicode environments on Python 2"
>
> "The Python 3 stdlib is currently incredible broken but because
> there are so few users, these bugs stay under the radar."
>
> That was written in 2010. Most of that stuff is still broken. Here's
> his detailed critique:
>
> http://lucumr.pocoo.org/2010/5/25/wsgi-on-python-3/
>
>> You have found yet another poorly-maintained package which is not at
>> all the responsibility of Python 3. Why are you discussing it as
>> though Python 3 is at fault?
>
> That's a denial problem. Uncritical fanboys are part of the problem,
> not part of the solution.
>
> Practical problems: the version of "flup" on PyPI is so out of date
> as to be useless. The original author abandoned the software. There
> are at least six forks of "flup" on GitHub:
>
> https://github.com/Pyha/flup-py3.3
> https://github.com/Janno/flup-py3.3
> https://github.com/pquentin/flup-py3
> https://github.com/SmartReceipt/flup-server
> https://github.com/dnephin/TreeOrg/tree/master/app-root/flup
> https://github.com/noxan/flup
>
> The first three look reasonably promising; the last three look
> abandoned. But why are there so many, and what are the differences
> between the first three? Probably nobody was able to fix all the
> Python 3 related problems documented by Ronacher in 2010. None of the
> versions have much usage. Nobody thought their version was good
> enough to push it to PyPI.
>
> All those people had to struggle to try to get a basic capability
> for web development using Python to work. To use WSGI with Python 3,
> you need to do a lot of work. Or stay with Python 2.
>
> Python 3 still isn't ready for prime time.
>
> John Nagle
--
https://mail.python.org/mailman/listinfo/python-list
Python 3 lack of support for fcgi/wsgi.
The Python 2 module "fcgi" is gone in Python 3. The Python 3
documentation at

https://docs.python.org/3/howto/webservers.html

recommends "flup" and links here:

https://pypi.python.org/pypi/flup/1.0

That hasn't been updated since 2007, and the SVN repository linked
there is gone. The recommended version is abandoned.

pip3 tries to install version 1.0.2, from 2009. That's here:

https://pypi.python.org/pypi/flup/1.0.2

That version is supported only for Python 2.5 and 2.6.

There's a later version on GitHub:

https://github.com/Pyha/flup-py3.3

But that's not what "pip3" is installing.

I get the feeling, again, that nobody actually uses this stuff. "pip3"
seems perfectly happy to install modules that don't work with Python
3. Try "pip3 install dnspython", for example. You need "dnspython3",
but pip3 doesn't know that.

There's "wsgiref", which looks more promising, but has a different
interface. That's not what the Python documentation recommends as the
first choice, but it is a standard module.

I keep thinking I'm almost done with Python 3 hell, but then I get
screwed by Python 3 again.

John Nagle
--
https://mail.python.org/mailman/listinfo/python-list
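Since "wsgiref" ships with the standard library, a hello-world app under it is short. A sketch of the different interface, nothing more:

```python
from wsgiref.simple_server import make_server

def app(environ, start_response):
    # A WSGI application is a callable taking the request environ dict
    # and a start_response callback, and returning an iterable of
    # byte strings (bytes, not str -- a common Python 3 stumbling block).
    start_response('200 OK', [('Content-Type', 'text/plain; charset=utf-8')])
    return [b'hello from wsgiref\n']

def main():
    # Development server only; behind Apache you'd use mod_wsgi or an
    # FCGI bridge instead.
    httpd = make_server('', 8000, app)
    print('Serving on http://localhost:8000/')
    httpd.serve_forever()
```

The FCGI-style modules like "flup" wrap an application with this same signature, so code written against WSGI is at least portable between them.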
Workaround for BeautifulSoup/HTML5parser bug
BeautifulSoup 4 and HTML5parser are known to not play well together.
I have a workaround for that. See

https://bugs.launchpad.net/beautifulsoup/+bug/1430633

This isn't a fix; it's a postprocessor to fix broken BS4 trees. This
is for use until the BS4 maintainers fix the bug.

John Nagle
--
https://mail.python.org/mailman/listinfo/python-list
Re: Python 2 to 3 conversion - embrace the pain
On 3/15/2015 4:43 PM, Roy Smith wrote:
> In article , Mario Figueiredo wrote:
>
>> What makes you think your anecdotal bugs constitute any sort of
>> evidence this programming language isn't ready to be used by the
>> public?
>
> There's several levels of "ready".
>
> I'm sure the core language is more than ready for production use for
> a project starting from scratch which doesn't rely on any third party
> libraries.
>
> The next step up on the "ready" ladder would be a new project which
> will require third-party libraries. And that pretty much means any
> non-trivial project. I'm reasonably confident that most common use
> cases can now be covered by p3-ready third party modules.

If only that were true. Look what I'm reporting bugs on:

    ssl - a core Python module.
    cPickle - a core Python module.
    pymysql - the pure-Python recommended way to talk to MySQL.
    bs4/html5parser - a popular parser for HTML5

We're not in exotic territory here. I've done lots of exotic projects,
but this isn't one of them.

There's progress. The fix to "ssl" has been made and committed. I have
a workaround for the cPickle bug - use pure-Python Pickle. I have a
workaround for the pymysql problem, and a permanent fix is going into
the next release of pymysql. I have a tiny test case for
bs4/html5parser that reproduces the bug on a tiny snippet of HTML, and
that's been uploaded to the BS4 issues tracker. I don't have a
workaround for that.

All this has cost me about two weeks of work so far. The "everything
is just fine" comments are not helpful. Denial is not a river in
Egypt.

John Nagle
--
https://mail.python.org/mailman/listinfo/python-list
Re: Python 2 to 3 conversion - embrace the pain
On 3/14/2015 1:00 AM, Marko Rauhamaa wrote:
> John Nagle :
>> I'm approaching the end of converting a large system from Python 2
>> to Python 3. Here's why you don't want to do this.
>
> A nice report, thanks. Shows that the slowness of Python 3 adoption
> is not only social inertia.
>
> Marko

Thanks. Some of the bugs I listed are so easy to hit that I suspect
those packages aren't used much. Those bugs should have been found
years ago. Fixed, even. I shouldn't be discovering them in 2015.

I appreciate all the effort put in by developers in fixing these
problems. Python 3 is still a long way from being ready for prime
time, though.

John Nagle
--
https://mail.python.org/mailman/listinfo/python-list
Re: Python 2 to 3 conversion - embrace the pain
On 3/13/2015 3:27 PM, INADA Naoki wrote:
> Hi, John. I'm maintainer of PyMySQL.
>
> I'm sorry about bug of PyMySQL. But the bug is completely unrelated
> to Python 3. You may encounter the bug on Python 2 too.

True. But much of the pain of converting to Python 3 comes from having
to switch packages because the Python 2 package didn't make it to
Python 3. All the bugs I'm discussing reflect forced package changes
or upgrades. None were voluntary on my part.

John Nagle
--
https://mail.python.org/mailman/listinfo/python-list
Python 2 to 3 conversion - embrace the pain
I'm approaching the end of converting a large system from Python 2 to
Python 3. Here's why you don't want to do this.

The language changes aren't that bad, and they're known and
documented. It's the package changes that are the problem. Discovering
and fixing all the new bugs takes a while.

BeautifulSoup:

BeautifulSoup 3 has been phased out. I had my own version of
BeautifulSoup 3, modified for greater robustness. But that was years
ago. So I converted to BeautifulSoup 4, as the documentation says to
do.

The HTML5parser module is claimed to parse as a browser does, with all
the error tolerance specified in the HTML5 spec. (The spec actually
specifies how to handle bad HTML consistently across browsers in great
detail, and HTML5parser has code in it for that.) It doesn't deliver
on that promise, though. Some sites crash BeautifulSoup
4/HTML5parser. Try "kroger.com", which has HTML with . The parse tree
constructed has a bad link, and trying to use the parse tree results
in exceptions. Submitted bug report. Appears to be another case of a
known bug. No workaround at this time.

https://bugs.launchpad.net/beautifulsoup/+bug/1270611
https://bugs.launchpad.net/beautifulsoup/+bug/1430633

PyMySQL:

"PyMySQL is a pure-Python drop-in replacement for MySQLdb". Sounds
good. Then I discovered that LOAD DATA LOCAL wasn't implemented in the
version on PyPI. It's on GitHub, though, and I got the authors to push
that out to PyPI. It works on test cases. But it doesn't work on a big
job, because the default size of MySQL packets was set to 16MB. This
made the LOAD DATA LOCAL code try to send the entire file being loaded
as one giant MySQL packet. Unless you configure the MySQL server with
16MB buffers, this fails with an obscure "server has gone away"
message. Found the problem, came up with a workaround, submitted a bug
report, and it's being fixed.

https://github.com/PyMySQL/PyMySQL/issues/317

SSL:

All the new TLS/SSL support is in Python 3. That's good.
Unfortunately, using Firefox's set of SSL certs, some important sites
(such as "verisign.com") don't validate. This turned out to be a
complex problem involving Verisign cross-signing a certificate, which
created a certificate hierarchy that some versions of OpenSSL can't
handle. There's now a version of OpenSSL that can handle it, but the
Python library has to make a call to use it, and that's going in but
isn't deployed yet. This bug resulted in much finger-pointing between
the Python and OpenSSL developers, the Mozilla certificate store
maintainers, and Verisign. It's now been sorted out, but not all the
fixes are deployed. Because "ssl" is a core Python module, this will
remain broken until the next Python release, on both the 2.7 and 3.4
lines.

Also, for no particularly good reason, the exception
"SSL.CertificateError" is not a subclass of "SSL.Error", resulting in
a routine exception not being recognized. Bug reports submitted for
both OpenSSL and Python SSL. Much discussion. Problem fixed, but the
fix is in the next version of Python. No workaround at this time.

http://bugs.python.org/issue23476

Pickle:

As I posted recently, cPickle on Python 3.4 seems to have a memory
corruption bug. Pure-Python pickle is fine, so a workaround is
possible. Bug report submitted.

http://bugs.python.org/issue23655

Converting a large application program to Python 3 thus required
diagnosing four library bugs and filing bug reports on all of them.
Workarounds are known for two of the problems. I can't deploy the
Python 3 version on the servers yet.

John Nagle
--
https://mail.python.org/mailman/listinfo/python-list
Re: Python3 "pickle" vs. stdin/stdout - unable to get clean byte streams in Python 3
On 3/12/2015 5:18 PM, John Nagle wrote:
> On 3/12/2015 2:56 PM, Cameron Simpson wrote:
>> On 12Mar2015 12:55, John Nagle wrote:
>>> I have working code from Python 2 which uses "pickle" to talk to a
>>> subprocess via stdin/stdout. I'm trying to make that work in
>>> Python 3.

I'm starting to think that the "cPickle" module, which Python 3 uses
by default, has a problem. After the program has been running for a
while, I start seeing errors such as

    File "C:\projects\sitetruth\InfoSiteRating.py", line 200, in scansite
      if len(self.badbusinessinfo) > 0 :  # if bad stuff
    NameError: name 'len' is not defined

which ought to be impossible in Python, and

    File "C:\projects\sitetruth\subprocesscall.py", line 129, in send
      self.writer.dump(args)  # send data
    OSError: [Errno 22] Invalid argument

from somewhere deep inside cPickle. I got

    File "C:\projects\sitetruth\InfoSiteRating.py", line 223, in get_rating_text
      (ratingsmalliconurl, ratinglargiconurl, ratingalttext) = DetailsPageBuilder.getratingiconinfo(rating)
    NameError: name 'DetailsPageBuilder' is not defined

(That's an imported module. It worked earlier in the run.)
and finally, even after I deleted all .pyc files and all Python cache
directories:

    Fatal Python error: GC object already tracked

    Current thread 0x1a14 (most recent call first):
      File "C:\python34\lib\site-packages\pymysql\connections.py", line 411 in description
      File "C:\python34\lib\site-packages\pymysql\connections.py", line 1248 in _get_descriptions
      File "C:\python34\lib\site-packages\pymysql\connections.py", line 1182 in _read_result_packet
      File "C:\python34\lib\site-packages\pymysql\connections.py", line 1132 in read
      File "C:\python34\lib\site-packages\pymysql\connections.py", line 929 in _read_query_result
      File "C:\python34\lib\site-packages\pymysql\connections.py", line 768 in query
      File "C:\python34\lib\site-packages\pymysql\cursors.py", line 282 in _query
      File "C:\python34\lib\site-packages\pymysql\cursors.py", line 134 in execute
      File "C:\projects\sitetruth\domaincacheitem.py", line 128 in select
      File "C:\projects\sitetruth\domaincache.py", line 30 in search
      File "C:\projects\sitetruth\ratesite.py", line 31 in ratedomain
      File "C:\projects\sitetruth\RatingProcess.py", line 68 in call
      File "C:\projects\sitetruth\subprocesscall.py", line 140 in docall
      File "C:\projects\sitetruth\subprocesscall.py", line 158 in run
      File "C:\projects\sitetruth\RatingProcess.py", line 89 in main
      File "C:\projects\sitetruth\RatingProcess.py", line 95 in <module>

That's a definite memory error. So something is corrupting memory.
Probably cPickle. All my code is in Python. Every library module came
in via "pip", into a clean Python 3.4.3 (32-bit) installation on
Win7/x86-64.

Currently installed packages:

    beautifulsoup4 (4.3.2)
    dnspython3 (1.12.0)
    html5lib (0.999)
    pip (6.0.8)
    PyMySQL (0.6.6)
    pyparsing (2.0.3)
    setuptools (12.0.5)
    six (1.9.0)

And it works fine with Python 2.7.9.

Is there some way to force the use of the pure-Python pickle module?
My guess is that there's something about reusing "pickle" instances
that botches memory use in CPython 3's C code for "cpickle".
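On forcing pure-Python pickle: in CPython 3, the pure-Python implementations survive as pickle._Pickler and pickle._Unpickler; the module normally shadows them with the C versions from _pickle. The underscore names are a CPython implementation detail rather than a guaranteed API, so treat this as a debugging workaround:

```python
import io
import pickle

# pickle.Pickler/Unpickler are normally the C versions (_pickle);
# the pure-Python classes remain available as _Pickler/_Unpickler.
buf = io.BytesIO()
pickle._Pickler(buf, 2).dump({'rating': 'green'})
buf.seek(0)
obj = pickle._Unpickler(buf).load()
```

Substituting these in place of pickle.Pickler/pickle.Unpickler would take the C extension out of the picture, which is one way to test the memory-corruption theory.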
John Nagle -- https://mail.python.org/mailman/listinfo/python-list
Re: Python3 "pickle" vs. stdin/stdout - unable to get clean byte streams in Python 3
On 3/12/2015 2:56 PM, Cameron Simpson wrote:
> On 12Mar2015 12:55, John Nagle wrote:
>> I have working code from Python 2 which uses "pickle" to talk to a
>> subprocess via stdin/stdout. I'm trying to make that work in Python
>> 3. First, the subprocess Python is invoked with the "-u" option, so
>> stdin and stdout are supposed to be unbuffered binary streams.
>
> You shouldn't need to use unbuffered streams specifically. It should
> be enough to .flush() the output stream (at whichever end) after you
> have written the pickle data.

Doing that. It's a repeat-transaction thing. The main process sends a
pickled item to the subprocess, the subprocess reads the item, does
the work, and writes a pickled item back to the parent. This repeats.
I call writer.clear_memo() and set reader.memo = {} at the end of each
cycle, to clear pickle's cache. That all worked fine in Python 2. Are
there any known problems with reusing Python 3 pickle streams?

The identical code works with Python 2.7.9; it's converted to Python 3
using "six" so I can run on both Python versions and look for
differences. I'm using pickle format 2, for compatibility. (Tried 0,
the ASCII format; it didn't help.)

> I'm skipping some of your discussion; I can see nothing wrong. I
> don't use pickle itself so aside from saying that your use seems to
> conform to the python 3 docs I can't comment more deeply. That said,
> I do use subprocess a fair bit.

I'll have to put in more logging and see exactly what's going over the
pipes.

John Nagle
--
https://mail.python.org/mailman/listinfo/python-list
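The memo-clearing pattern described above, in isolation (a toy over BytesIO rather than a pipe, not the original subprocesscall.py):

```python
import io
import pickle

# One long-lived Pickler writing several independent transactions;
# clear_memo() between dumps so later pickles don't back-reference
# objects from earlier ones that the reader may have discarded.
buf = io.BytesIO()
writer = pickle.Pickler(buf, 2)
for item in (['first', 'transaction'], ['second', 'transaction']):
    writer.dump(item)
    writer.clear_memo()

# Read them back in order with a single Unpickler on the same stream.
buf.seek(0)
reader = pickle.Unpickler(buf)
first = reader.load()
second = reader.load()
```

Without clear_memo(), a repeated object would be pickled once and later dumps would emit only a memo reference to it, which breaks if the receiving side starts reading mid-stream or resets its own memo.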
Python3 "pickle" vs. stdin/stdout - unable to get clean byte streams in Python 3
I have working code from Python 2 which uses "pickle" to talk to a subprocess via stdin/stdout. I'm trying to make that work in Python 3. First, the subprocess Python is invoked with the "-u" option, so stdin and stdout are supposed to be unbuffered binary streams. That was enough in Python 2, but it's not enough in Python 3. The subprocess and its connections are set up with

proc = subprocess.Popen(launchargs, stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE, env=env)
...
self.reader = pickle.Unpickler(self.proc.stdout)
self.writer = pickle.Pickler(self.proc.stdin, 2)

after which I get

result = self.reader.load()
TypeError: 'str' does not support the buffer interface

That's as far as the traceback goes, so I assume this is disappearing into C code. OK, I know I need a byte stream. I tried

self.reader = pickle.Unpickler(self.proc.stdout.buffer)
self.writer = pickle.Pickler(self.proc.stdin.buffer, 2)

That's not allowed. The "stdin" and "stdout" that are fields of "proc" do not have "buffer". So I can't do that in the parent process. In the child, though, where stdin and stdout come from "sys", "sys.stdin.buffer" is valid. That fixes the "'str' does not support the buffer interface" error. But now I get the pickle error "Ran out of input" on the child side, probably because there's a str/bytes incompatibility somewhere. So how do I get clean binary byte streams between parent and child process?

John Nagle -- https://mail.python.org/mailman/listinfo/python-list
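A sketch of the arrangement that has to hold on each side under Python 3 (assumptions: the child unwraps the text-mode sys.stdin/sys.stdout to their .buffer byte streams, while the parent's Popen pipes are already binary and can be handed to Pickler/Unpickler directly):

```python
import pickle
import subprocess
import sys

# Child code, run via "python -c": reads one pickled item from binary
# stdin, doubles it, writes the pickled result to binary stdout.
# sys.stdin/sys.stdout are text streams in Python 3, so the child must
# use their .buffer attributes, and must flush or the parent hangs.
CHILD_CODE = r"""
import pickle, sys
item = pickle.Unpickler(sys.stdin.buffer).load()
pickle.Pickler(sys.stdout.buffer, 2).dump(item * 2)
sys.stdout.buffer.flush()
"""

def roundtrip(value):
    proc = subprocess.Popen([sys.executable, "-c", CHILD_CODE],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # Popen pipe objects are already binary file objects: no .buffer here.
    pickle.Pickler(proc.stdin, 2).dump(value)
    proc.stdin.flush()
    proc.stdin.close()
    result = pickle.Unpickler(proc.stdout).load()
    proc.wait()
    return result

print(roundtrip(21))  # -> 42
```

The asymmetry is the whole answer to the question above: only sys.stdin/sys.stdout are wrapped in text-mode objects; pipes created by subprocess never are.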
Re: Python 2.7.9, 3.4.2 won't verify SSL cert for "verisign.com"
On 2/17/2015 3:42 PM, Laura Creighton wrote:
> Possibly this bug?
> https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1014640
>
> Laura

Probably that bug in OpenSSL. Some versions of OpenSSL are known to be broken for cases where there are multiple valid certification paths. Python ships with its own copy of OpenSSL on Windows.

Tests for "www.verisign.com":

Win7, x64:
  Python 2.7.9 with OpenSSL 1.0.1j 15 Oct 2014: FAIL
  Python 3.4.2 with OpenSSL 1.0.1i 6 Aug 2014: FAIL
  openssl s_client, OpenSSL 1.0.1h 5 Jun 2014: FAIL

Ubuntu 14.04 LTS, using the distro's versions of Python:
  Python 2.7.6: test won't run, needs create_default_context
  Python 3.4.0 with OpenSSL 1.0.1f 6 Jan 2014: FAIL
  openssl s_client, OpenSSL 1.0.1f 6 Jan 2014: PASS

That's with the same cert file in all cases. The OpenSSL version for Python programs comes from ssl.OPENSSL_VERSION. The Linux situation has me puzzled. On Linux, Python is supposedly using the system version of OpenSSL. The versions match. Why do Python and the command-line client disagree? Different options passed to OpenSSL by Python?

Here's the little test program: http://www.animats.com/private/sslbug

Please try that and let me know what happens on other platforms. Works with Python 2.7.9 or 3.x.

John Nagle -- https://mail.python.org/mailman/listinfo/python-list
Re: Python 2.7.9, 3.4.2 won't verify SSL cert for "verisign.com"
If I remove certs from my "cacert.pem" file passed to create_default_context, the Python test program rejects domains it will pass with the certs present. So it's using that file; that's not it. It seems to be an OpenSSL or cert file problem. I can reproduce the problem with the OpenSSL command-line client:

openssl s_client -connect www.verisign.com:443 -CAfile cacert.pem

fails for "www.verisign.com", where "cacert.pem" has been extracted from Firefox's cert store. The error message from OpenSSL is

Verify return code: 20 (unable to get local issuer certificate)

Try the same OpenSSL command for other domains ("google.com", "python.org") and no errors are reported. More later on this. So it's not a Python-level issue. The only Python-specific problem is that the Python library doesn't pass detailed OpenSSL error codes through in exceptions. The Python exception text is "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581)", which is a generic message for most OpenSSL errors.

John Nagle

On 2/17/2015 12:00 AM, Laura Creighton wrote:
> I've seen something like this:
>
> The requests module http://docs.python-requests.org/en/latest/
> ships with its own set of certificates "cacert.pem"
> and ignores the system wide ones -- so, for instance, adding certificates
> to /etc/ssl/certs on your debian or ubuntu system won't work. I edited
> it by hand and then changed the REQUESTS_CA_BUNDLE environment variable
> to point to it.
>
> Perhaps your problem is along the same lines?
>
> Laura

-- https://mail.python.org/mailman/listinfo/python-list
Python 2.7.9, 3.4.2 won't verify SSL cert for "verisign.com"
Python 2.7.9, Windows 7 x64 (also 3.4.2 on Win7, and 3.4.0 on Ubuntu 14.04). There's something about the SSL cert for "https://www.verisign.com" that won't verify properly from Python. The current code looks like this:

def testurlopen(host, certfile):
    port = httplib.HTTPS_PORT
    sk = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    context = ssl.create_default_context(cafile=certfile)
    sock = context.wrap_socket(sk, server_hostname=host)
    try:
        sock.connect((host, port))
    except EnvironmentError as message:
        print("Connection to \"%s\" failed: %s." % (host, message))
        return False
    print("Connection to \"%s\" succeeded." % (host,))
    return True

Works for "python.org", "google.com", etc. I can connect to and dump the server's certificate for those sites. But for "verisign.com" and "www.verisign.com", I get

Connection to "verisign.com" failed: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581).

The certificate file, "cacert.pem", comes from Mozilla's list of approved certificates, obtained from here: http://curl.haxx.se/ca/cacert.pem

It has the cert for "VeriSign Class 3 Public Primary Certification Authority - G5", which is the root cert for "verisign.com". After loading that cert file into an SSL context, I can dump the context from Python with context.get_ca_certs() and get this dict for that cert:

{'notBefore': u'Nov 8 00:00:00 2006 GMT',
 'serialNumber': u'18DAD19E267DE8BB4A2158CDCC6B3B4A',
 'notAfter': 'Jul 16 23:59:59 2036 GMT',
 'version': 3L,
 'subject': ((('countryName', u'US'),),
             (('organizationName', u'VeriSign, Inc.'),),
             (('organizationalUnitName', u'VeriSign Trust Network'),),
             (('organizationalUnitName', u'(c) 2006 VeriSign, Inc. - For authorized use only'),),
             (('commonName', u'VeriSign Class 3 Public Primary Certification Authority - G5'),)),
 'issuer': ((('countryName', u'US'),),
            (('organizationName', u'VeriSign, Inc.'),),
            (('organizationalUnitName', u'VeriSign Trust Network'),),
            (('organizationalUnitName', u'(c) 2006 VeriSign, Inc. - For authorized use only'),),
            (('commonName', u'VeriSign Class 3 Public Primary Certification Authority - G5'),))}

Firefox is happy with that cert. The serial number of the root cert matches the root cert Firefox displays. So the root cert file being used has the right cert for the cert chain back from "www.verisign.com". If I dump ssl.OPENSSL_VERSION from Python, I get "OpenSSL 1.0.1j 15 Oct 2014". That's an OK version. Something about that cert is unacceptable to the Python SSL module, but what? "CERTIFICATE VERIFY FAILED" doesn't tell me enough to diagnose the problem.

John Nagle -- https://mail.python.org/mailman/listinfo/python-list
SSLsocket.getpeercert - request to return ALL the fields of the certificate.
In each revision of "getpeercert", a few more fields are returned. Python 3.2 added "issuer" and "notBefore". Python 3.4 added "crlDistributionPoints", "caIssuers", and OCSP URLs. But some fields still aren't returned. I happen to need CertificatePolicies, which is how you distinguish DV, OV, and EV certs. Here's what you get now:

{'OCSP': ('http://EVSecure-ocsp.verisign.com',),
 'caIssuers': ('http://EVSecure-aia.verisign.com/EVSecure2006.cer',),
 'crlDistributionPoints': ('http://EVSecure-crl.verisign.com/EVSecure2006.crl',),
 'issuer': ((('countryName', 'US'),),
            (('organizationName', 'VeriSign, Inc.'),),
            (('organizationalUnitName', 'VeriSign Trust Network'),),
            (('organizationalUnitName', 'Terms of use at https://www.verisign.com/rpa (c)06'),),
            (('commonName', 'VeriSign Class 3 Extended Validation SSL CA'),)),
 'notAfter': 'Mar 22 23:59:59 2015 GMT',
 'notBefore': 'Feb 20 00:00:00 2014 GMT',
 'serialNumber': '69A7BC85C106DDE1CF4FA47D5ED813DC',
 'subject': ((('1.3.6.1.4.1.311.60.2.1.3', 'US'),),
             (('1.3.6.1.4.1.311.60.2.1.2', 'Delaware'),),
             (('businessCategory', 'Private Organization'),),
             (('serialNumber', '2927442'),),
             (('countryName', 'US'),),
             (('postalCode', '60603'),),
             (('stateOrProvinceName', 'Illinois'),),
             (('localityName', 'Chicago'),),
             (('streetAddress', '135 S La Salle St'),),
             (('organizationName', 'Bank of America Corporation'),),
             (('organizationalUnitName', 'Network Infrastructure'),),
             (('commonName', 'www.bankofamerica.com'),)),
 'subjectAltName': (('DNS', 'mobile.bankofamerica.com'), ('DNS', 'www.bankofamerica.com')),
 'version': 3}

How about just returning ALL the remaining fields and finishing the job? Thanks.

John Nagle -- https://mail.python.org/mailman/listinfo/python-list
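Until the ssl module exposes it, CertificatePolicies can be read by parsing the peer's DER certificate (from getpeercert(binary_form=True)) with the third-party "cryptography" package. A sketch under that assumption: instead of a live connection, it builds a throwaway self-signed cert carrying the CA/Browser Forum EV policy OID (2.23.140.1.1), then applies the same extraction step you would run on a real peer cert:

```python
import datetime
from cryptography import x509
from cryptography.x509.oid import NameOID
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa

# Throwaway self-signed cert with a CertificatePolicies extension,
# standing in for a cert fetched over TLS.
key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, u"example.test")])
policies = x509.CertificatePolicies(
    [x509.PolicyInformation(x509.ObjectIdentifier("2.23.140.1.1"), None)])
now = datetime.datetime.utcnow()
cert = (x509.CertificateBuilder()
        .subject_name(name)
        .issuer_name(name)
        .public_key(key.public_key())
        .serial_number(x509.random_serial_number())
        .not_valid_before(now)
        .not_valid_after(now + datetime.timedelta(days=1))
        .add_extension(policies, critical=False)
        .sign(key, hashes.SHA256()))

# The part you'd run on a real peer cert: pull out the policy OIDs.
ext = cert.extensions.get_extension_for_class(x509.CertificatePolicies)
print([p.policy_identifier.dotted_string for p in ext.value])  # -> ['2.23.140.1.1']
```

For a real peer, x509.load_der_x509_certificate(sock.getpeercert(binary_form=True)) replaces the builder section.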
Re: Show off your Python chops and compete with others
On 11/6/2013 5:04 PM, Chris Angelico wrote: > On Thu, Nov 7, 2013 at 11:00 AM, Nathaniel Sokoll-Ward > wrote: >> Thought this group would appreciate this: >> www.metabright.com/challenges/python >> >> MetaBright makes skill assessments to measure how talented people are at >> different skills. And recruiters use MetaBright to find outrageously skilled >> job candidates. With tracking cookies blocked, you get 0 points. John Nagle -- https://mail.python.org/mailman/listinfo/python-list
Re: Python Front-end to GCC
On 10/25/2013 12:18 PM, Mark Janssen wrote:
>> As for the hex value for Nan who really gives a toss? The whole point is
>> that you initialise to something that you do not expect to see. Do you not
>> have a text book that explains this concept?
>
> No, I don't think there is a textbook that explains such a concept of
> initializing memory to anything but 0 -- UNLESS you're from Stupid
> University.
>
> Thanks for providing fodder...
>
> Mark Janssen, Ph.D.
> Tacoma, WA

What a mess of a discussion. First off, this is mostly a C/C++ issue, not a Python issue, because Python generally doesn't let you see uninitialized memory. Second, filling newly allocated memory with an illegal value is a classic debugging technique. Visual C/C++ uses it when you build in debug mode. Wikipedia has an explanation: http://en.wikipedia.org/wiki/Magic_number_%28programming%29#Magic_debug_values

Microsoft Visual C++ uses 0xBAADF00D. In Valgrind, there's a "--malloc-fill" option that lets you specify a hex fill value. There's a performance penalty for filling large areas of memory, so it's usually done in debug mode only, and it sometimes causes programs with bugs to behave differently when built in debug vs. release mode. Sigh.

John Nagle -- https://mail.python.org/mailman/listinfo/python-list
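The fill-on-allocate idea can be illustrated in Python terms (a toy sketch, not how any real allocator works): hand out buffers pre-filled with a recognizable pattern instead of zeros, so reads of "uninitialized" memory stand out in a hex dump.

```python
# Toy "debug allocator": every new buffer is filled with the
# 0xBAADF00D pattern, so uninitialized reads are easy to spot
# when the buffer is dumped.
FILL = b"\xba\xad\xf0\x0d"

def debug_alloc(nbytes):
    reps = -(-nbytes // len(FILL))          # ceiling division
    return bytearray((FILL * reps)[:nbytes])

buf = debug_alloc(6)
print(buf.hex())  # -> baadf00dbaad
```

Any stretch of the pattern surviving in a dump means the program read (or kept) memory it never wrote.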
Re: Global Variable In Multiprocessing
On 10/22/2013 11:22 PM, Chandru Rajendran wrote: > CAUTION - Disclaimer * > This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended solely > for the use of the addressee(s). If you are not the intended recipient, > please > notify the sender by e-mail and delete the original message. Further, you are > not > to copy, disclose, or distribute this e-mail or its contents to any other > person and > any such actions are unlawful. This e-mail may contain viruses. Infosys has > taken > every reasonable precaution to minimize this risk, but is not liable for any > damage > you may sustain as a result of any virus in this e-mail. You should carry out > your > own virus checks before opening the e-mail or attachment. Infosys reserves > the > right to monitor and review the content of all messages sent to or from this > e-mail > address. Messages sent to or from this e-mail address may be stored on the > Infosys e-mail system. > ***INFOSYS End of Disclaimer INFOSYS*** Because of the above restriction, we are unable to reply to your question. John Nagle SiteTruth -- https://mail.python.org/mailman/listinfo/python-list
Re: Python Front-end to GCC
On 10/23/2013 12:25 AM, Philip Herron wrote:
> On Wednesday, 23 October 2013 07:48:41 UTC+1, John Nagle wrote:
>> On 10/20/2013 3:10 PM, victorgarcia...@gmail.com wrote:
>>> On Sunday, October 20, 2013 3:56:46 PM UTC-2, Philip Herron wrote:
> Nagle replies:
>>>> Documentation can be found http://gcc.gnu.org/wiki/PythonFrontEnd. ...
>
> I think your analysis is probably grossly unfair for many reasons.
> But your entitled to your opinion.
>
> Current i do not use Bohem-GC (I dont have one yet),

You included it in your project: http://sourceforge.net/p/gccpy/code/ci/master/tree/boehm-gc

> i re-use
> principles from gccgo in the _compiler_ not the runtime. At runtime
> everything is a gpy_object_t, everything does this. Yeah you could do
> a little of dataflow analysis for some really really specific code
> and very specific cases and get some performance gains. But the
> problem is that the libpython.so it was designed for an interpreter.
>
> So first off your comparing a project done on my own to something
> like cPython loads of developers 20 years on my project or something
> PyPy has funding loads of developers.
>
> Where i speed up is absolutely no runtime lookups on data access.
> Look at cPython its loads of little dictionaries. All references are
> on the Stack at a much lower level than C. All constructs are
> compiled in i can reuse C++ native exceptions in the whole thing. I
> can hear you shouting at the email already but the middle crap that a
> VM and interpreter have to do and fast lookup is _NOT_ one of them.
> If you truely understand how an interpreter works you know you cant
> do this
>
> Plus your referencing really old code on sourceforge is another
> thing.
That's where you said to look: http://gcc.gnu.org/wiki/PythonFrontEnd
"To follow gccpy development see: Gccpy SourceForge https://sourceforge.net/projects/gccpy"

> And i dont want to put out bench marks (I would get so much
> shit from people its really not worth it) but it i can say it is
> faster than everything in the stuff i compile so far. So yeah... not
> only that but your referncing a strncmp to say no its slow yeah it
> isn't 100% ideal but in my current git tree i have changed that.

So the real source code isn't where you wrote that it is? Where is it, then?

> So i
> think its completely unfair to reference tiny things and pretend you
> know everything about my project.

If you wrote more documentation about what you're doing, people might understand what you are doing.

> One thing people might find interesting is class i do data flow
> analysis to generate a complete type for that class and each member
> function is a compiled function like C++ but at a much lower level
> than C++.

It's not clear what this means. Are you trying to determine, say, which items are integers, lists, or specific object types? Shed Skin tries to do that. It's hard to do, but very effective if you can do it. In CPython, every time "x = a + b" is executed, the interpreter has to invoke the general case for "+", which can handle integers, floats, strings, NumPy, etc. If you can infer types, and know it's a float, the run-time code can be float-specific and about three machine instructions.

> The whole project has been about stripping out the crap
> needed to run user code and i have been successful so far but your
> comparing a in my spare time project to people who work on their
> stuff full time. With loads of people etc.

Shed Skin is one guy.

> Anyways i am just going to stay out of this from now but your email
> made me want to reply and rage.

You've made big claims without giving much detail.
So people are trying to find out if you've done something worth paying attention to. John Nagle -- https://mail.python.org/mailman/listinfo/python-list
Re: Python Front-end to GCC
On 10/20/2013 3:10 PM, victorgarcia...@gmail.com wrote: > On Sunday, October 20, 2013 3:56:46 PM UTC-2, Philip Herron wrote: >> I've been working on GCCPY since roughly november 2009 at least in its >> concept. It was announced as a Gsoc 2010 project and also a Gsoc 2011 >> project. I was mentored by Ian Taylor who has been an extremely big >> influence on my software development carrer. > > Cool! > >> Documentation can be found http://gcc.gnu.org/wiki/PythonFrontEnd. >> (Although this is sparse partialy on purpose since i do not wan't >> people thinking this is by any means ready to compile real python >> applications) > > Is there any document describing what it can already compile and, if > possible, showing some benchmarks? After reading through a vast amount of drivel below on irrelevant topics, looking at the nonexistent documentation, and finally reading some of the code, I think I see what's going on here. Here's the run-time code for integers: http://sourceforge.net/p/gccpy/code/ci/master/tree/libgpython/runtime/gpy-object-integer.c The implementation approach seems to be that, at runtime, everything is a struct which represents a general Python object. The compiler is, I think, just cranking out general subroutine calls that know nothing about type information. All the type handling is at run time. That's basically what CPython does, by interpreting a pseudo-instruction set to decide which subroutines to call. It looks like integers and lists have been implemented, but not much else. Haven't found source code for strings yet. Memory management seems to rely on the Boehm garbage collector. Much code seems to have been copied over from the GCC library for Go. Go, though, is strongly typed at compile time. There's no inherent reason this "compiled" approach couldn't work, but I don't know if it actually does. The performance has to be very low. Each integer add involves a lot of code, including two calls of "strcmp (x->identifier, "Int")". 
A performance win over CPython is unlikely. Compare Shed Skin, which tries to infer the type of Python objects so it can generate efficient type-specific C++ code. That's much harder to do, and has trouble with very dynamic code, but what comes out is fast. John Nagle -- https://mail.python.org/mailman/listinfo/python-list
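The run-time cost being described can be sketched in Python itself (an illustrative analogy, not gccpy's actual code, and the names GpyObject/gpy_add are made up for the sketch): if every value is a generic tagged record, every "+" must inspect the tags before doing any arithmetic, just as the gccpy runtime compares x->identifier against "Int".

```python
class GpyObject(object):
    """Generic tagged runtime object, loosely analogous to gpy_object_t."""
    def __init__(self, identifier, payload):
        self.identifier = identifier   # run-time type tag
        self.payload = payload

def gpy_add(x, y):
    # Every add pays for tag checks before the actual arithmetic,
    # the cost the strcmp(x->identifier, "Int") calls stand for.
    if x.identifier == "Int" and y.identifier == "Int":
        return GpyObject("Int", x.payload + y.payload)
    raise TypeError("cannot add %s and %s" % (x.identifier, y.identifier))

result = gpy_add(GpyObject("Int", 2), GpyObject("Int", 3))
print(result.payload)  # -> 5
```

Type inference, where it succeeds, deletes the tag checks entirely: if both operands are known to be machine integers at compile time, the add collapses to one instruction.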
Re: Complex literals (was Re: I am never going to complain about Python again)
On 10/10/2013 6:27 PM, Steven D'Aprano wrote: > For what it's worth, there is no three-dimensional extension to complex > numbers, but there is a four-dimensional one, the quaternions or > hypercomplex numbers. They look like 1 + 2i + 3j + 4k, where i, j and k > are all distinct but i**2 == j**2 == k**2 == -1. Quaternions had a brief > period of popularity during the late 19th century but fell out of > popularity in the 20th. In recent years, they're making something of a > comeback, as using quaternions for calculating rotations is more > numerically stable than traditional matrix calculations. I've done considerable work with quaternions in physics engines for simulation. Nobody in that area calls them "hypercomplex numbers". The geometric concept is simple. Consider an angle represented as a 2-element unit vector. It's a convenient angle representation. It's homogeneous - there's no special case at 0 degrees. Then upgrade to 3D. You can represent latitude and longitude as a 3-element unit vector. (GPS systems do this; latitude and longitude are only generated at the end, for output.) Then upgrade to 4D. Now you have a 4-element unit vector that represents latitude, longitude, and heading. It can also be thought of as a point on the surface of a 4D sphere, although that isn't too useful. If you have to numerically integrate torques to get angular velocity, and angular velocity to get angular position, quaternions are the way to go. If you want to understand all this, there's a good writeup in one of the Graphics Gems books. Unlike complex numbers, these quaternions are always unit vectors. John Nagle -- https://mail.python.org/mailman/listinfo/python-list
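The rotation use described above can be sketched in a few lines of plain Python (assumptions: Hamilton product, (w, x, y, z) component order, unit quaternions):

```python
import math

def qmul(a, b):
    # Hamilton product of two quaternions in (w, x, y, z) order.
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw*bw - ax*bx - ay*by - az*bz,
            aw*bx + ax*bw + ay*bz - az*by,
            aw*by - ax*bz + ay*bw + az*bx,
            aw*bz + ax*by - ay*bx + az*bw)

def rotate(q, v):
    # Rotate vector v by unit quaternion q: q * (0, v) * conjugate(q).
    qc = (q[0], -q[1], -q[2], -q[3])
    w, x, y, z = qmul(qmul(q, (0.0,) + tuple(v)), qc)
    return (x, y, z)

# A rotation by angle t about unit axis n is (cos(t/2), n*sin(t/2)).
# 90 degrees about the z axis maps +x to +y:
half = math.radians(90.0) / 2.0
q = (math.cos(half), 0.0, 0.0, math.sin(half))
print(rotate(q, (1.0, 0.0, 0.0)))  # ~ (0.0, 1.0, 0.0)
```

Composing rotations is just qmul, and renormalizing the 4-vector after many integration steps is a single divide-by-norm, which is the numerical-stability advantage over re-orthogonalizing a 3x3 matrix.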
Re: PID tuning.
On 10/14/2013 2:03 PM, Ben Finney wrote:
> Renato Barbosa Pim Pereira writes:
>
>> I am looking for some software for PID tuning that would take the
>> result of a step response, and calculates Td, Ti, Kp, any suggestion
>> or hint of where to start?, thanks.
>
> Is this related to Python? What is "PID tuning", and what have you
> tried already?

See "http://sts.bwk.tue.nl/7y500/readers/.%5CInstellingenRegelaars_ExtraStof.pdf"

You might also try the OpenHRP3 forums.

John Nagle -- https://mail.python.org/mailman/listinfo/python-list
Re: Python was designed (was Re: Multi-threading in Python vs Java)
On 10/12/2013 3:37 PM, Chris Angelico wrote:
> On Sat, Oct 12, 2013 at 7:10 AM, Peter Cacioppi wrote:
>> Along with "batteries included" and "we're all adults", I think
>> Python needs a pithy phrase summarizing how well thought out it is.
>> That is to say, the major design decisions were all carefully
>> considered, and as a result things that might appear to be
>> problematic are actually not barriers in practice. My suggestion
>> for this phrase is "Guido was here".
>
> "Designed".
>
> You simply can't get a good clean design if you just let it grow by
> itself, one feature at a time.

No, Python went through the usual design screwups. Look at how painful the slow transition to Unicode was, from just "str" to Unicode strings, ASCII strings, byte strings, byte arrays, 16 and 31 bit character builds, and finally automatic switching between rune widths. Old-style classes vs. new-style classes. Adding a boolean type as an afterthought (that was avoidable; C went through that painful transition before Python was created). Operator "+" as concatenation for built-in arrays but addition for NumPy arrays. Each of those reflects a design error in the type system which had to be corrected. The type system is now in good shape.

The next step is to make Python fast. Python objects have dynamic operations suited to a naive interpreter like CPython. These make many compile-time optimizations hard. At any time, any thread can monkey-patch any code, object, or variable in any other thread. The ability for anything to use "setattr()" on anything carries a high performance price. That's part of why Unladen Swallow failed and why PyPy development is so slow.

John Nagle -- https://mail.python.org/mailman/listinfo/python-list
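The dynamism in question is easy to demonstrate (a trivial sketch): nothing stops code, at any point, from rebinding a method after a class is defined, so a compiler cannot assume a call site's meaning is fixed.

```python
class Greeter(object):
    def greet(self):
        return "hello"

g = Greeter()
print(g.greet())                          # -> hello

# Any code, at any time, can rebind the method on the class...
Greeter.greet = lambda self: "patched"
print(g.greet())                          # -> patched

# ...or shadow it on a single instance via setattr().
setattr(g, "greet", lambda: "instance-patched")
print(g.greet())                          # -> instance-patched
```

An optimizer that compiled the first g.greet() call into a direct jump would be wrong by the third line, which is why every call must re-resolve the attribute unless the implementation can prove nothing was patched.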
Re: web scraping
On 10/12/2013 1:35 PM, dvgh...@gmail.com wrote:
> On Saturday, October 12, 2013 7:12:38 AM UTC-7, Ronald Routt wrote:
>> I am new to programming and trying to figure out python.
>>
>> I am trying to learn which tools and tutorials I need to use along
>> with some good beginner tutorials in scraping the web. The end
>> result I am trying to come up with is scraping auto dealership
>> sites for the following:
>>
>> 1. Name of dealership
>> 2. State where dealership is located
>> 3. Name of Owner, President or General Manager
>> 4. Email address of number 3 above
>> 5. Phone number of dealership

If you really want that data, and aren't just hacking, buy it. There are data brokers that will sell it to you. D&B, FindTheCompany, Infot, etc.

Sounds like you want to spam. Don't.

John Nagle -- https://mail.python.org/mailman/listinfo/python-list
Re: What version of glibc is Python using?
On 10/12/2013 1:28 PM, Ian Kelly wrote: > Reading the docs more closely, I think that the function is actually > working as intended. It says that it determines "the libc version > against which the file executable (defaults to the Python interpreter) > is linked" -- or in other words, the minimum compatible libc version, > NOT the libc version that is currently loaded. A strange interpretation. > So I think that a patch to replace this with gnu_get_libc_version() > should be rejected, since it would change the documented behavior of > the function. It may be worth considering adding an additional > function that matches the OP's expectations, but since it would just > be a simple ctypes wrapper it is probably best done by the user. Ah, the apologist approach. The documentation is badly written. The next line, "Note that this function has intimate knowledge of how different libc versions add symbols to the executable is probably only usable for executables compiled using gcc" isn't even a sentence. The documentation needs to be updated. Please submit a patch. John Nagle -- https://mail.python.org/mailman/listinfo/python-list
Re: What version of glibc is Python using?
On 10/12/2013 4:43 AM, Ian Kelly wrote:
> On Sat, Oct 12, 2013 at 2:46 AM, Terry Reedy wrote:
>> On 10/12/2013 3:53 AM, Christian Gollwitzer wrote:
>>> That function is really bogus. It states itself, that it has "intimate
>>> knowledge of how different libc versions add symbols to the executable
>>> and thus is probably only useable for executables compiled using gcc"
>>> which is just another way of saying "it'll become outdated and broken
>>> soon". It's not even done by reading the symbol table, it opens the
>>> binary and matches a RE *shocked* I would have expected such hacks in a
>>> shell script.
>>>
>>> glibc has a function for this:
>>>
>>> gnu_get_libc_version ()
>>>
>>> which should be used.
>>
>> So *please* submit a patch with explanation.
>
> Easier said than done. The module is currently written in pure
> Python, and the comment "Note: Please keep this module compatible to
> Python 1.5.2" would appear to rule out the use of ctypes to call the
> glibc function. I wonder though whether that comment is really still
> appropriate.

What a mess. It only "works" on Linux, it only works with GCC, and there it returns bogus results. Amusingly, there was a fix in 2011 to speed up platform.libc_ver() by having it read bigger blocks: http://code.activestate.com/lists/python-checkins/100109/ It still got the wrong answer, but it's faster.

There's a bug report that it doesn't work right on Solaris: http://comments.gmane.org/gmane.comp.python.gis/870
It fails on Cygwin ("wontfix"): http://bugs.python.org/issue928297
The result under Gentoo is bogus: http://archives.gentoo.org/gentoo-user/msg_b676eccb5fc00cb051b7423db1b5a9f7.xml

There are several programs which fetch this info and display it, or send it in with crash reports, but I haven't found any that actually use the result for anything. I'd suggest deprecating it and documenting that.

John Nagle -- https://mail.python.org/mailman/listinfo/python-list
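The ctypes wrapper under discussion might look like this (a sketch; gnu_get_libc_version() exists only in glibc, so on musl, macOS, or Windows the function below just reports None instead of raising):

```python
import ctypes
import ctypes.util

def glibc_version():
    """Version string of the C library actually loaded, or None if not glibc."""
    name = ctypes.util.find_library("c")
    if name is None:
        return None                     # no findable C library (e.g. Windows)
    libc = ctypes.CDLL(name)
    try:
        func = libc.gnu_get_libc_version
    except AttributeError:
        return None                     # a libc, but not glibc
    func.restype = ctypes.c_char_p      # default restype is int; fix it
    return func().decode("ascii")

print(glibc_version())
```

Unlike the regex scrape in platform.libc_ver(), this asks the running library itself, so it cannot report a version the process isn't actually using.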
Re: What version of glibc is Python using?
On 10/11/2013 11:50 PM, Christian Gollwitzer wrote:
> Am 12.10.13 08:34, schrieb John Nagle:
>> I'm trying to find out which version of glibc Python is using.
>> I need a fix that went into glibc 2.10 back in 2009.
>> (http://udrepper.livejournal.com/20948.html)
>>
>> So I try the recommended way to do this, on a CentOS server:
>>
>> /usr/local/bin/python2.7
>> Python 2.7.2 (default, Jan 18 2012, 10:47:23)
>> [GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>>>>> import platform
>>>>> platform.libc_ver()
>> ('glibc', '2.3')
>
> Try
>
> ldd /usr/local/bin/python2.7
>
> Then execute the reported libc.so, which gives you some information.
>
> Christian

Thanks for the quick reply. That returned:

/lib64/libc.so.6
GNU C Library stable release version 2.12, by Roland McGrath et al.
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Compiled by GNU CC version 4.4.6 20110731 (Red Hat 4.4.6-3).
Compiled on a Linux 2.6.32 system on 2011-12-06.
Available extensions:
  The C stubs add-on version 2.1.2.
  crypt add-on version 2.1 by Michael Glad and others
  GNU Libidn by Simon Josefsson
  Native POSIX Threads Library by Ulrich Drepper et al
  BIND-8.2.3-T5B
  RT using linux kernel aio
libc ABIs: UNIQUE IFUNC
For bug reporting instructions, please see:
<http://www.gnu.org/software/libc/bugs.html>.

Much more helpful. I have a good version of libc, and can now work on my DNS resolver problem. Why is the info from "platform.libc_ver()" so bogus?

John Nagle -- https://mail.python.org/mailman/listinfo/python-list
What version of glibc is Python using?
I'm trying to find out which version of glibc Python is using. I need a fix that went into glibc 2.10 back in 2009. (http://udrepper.livejournal.com/20948.html)

So I try the recommended way to do this, on a CentOS server:

/usr/local/bin/python2.7
Python 2.7.2 (default, Jan 18 2012, 10:47:23)
[GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import platform
>>> platform.libc_ver()
('glibc', '2.3')

This is telling me that the Python distribution built in 2012, with a version of GCC released April 16, 2011, is using glibc 2.3, released in October 2002. That can't be right. I tried this on a different Linux machine, a desktop running Ubuntu 12.04 LTS:

Python 2.7.3 (default, April 10 2013, 06:20:15)
[GCC 4.6.3] on linux2
('glibc', '2.7')

That version of glibc is from October 2007. Where are these ancient versions coming from? They're way out of sync with the GCC version.

John Nagle -- https://mail.python.org/mailman/listinfo/python-list
Re: Applying 4x4 transformation to 3-element vector with numpy
On 10/8/2013 10:36 PM, Christian Gollwitzer wrote:
> Dear John,
>
> Am 09.10.13 07:28, schrieb John Nagle:
>> This is the basic transformation of 3D graphics. Take
>> a 3D point, make it 4D by adding a 1 on the end, multiply
>> by a transformation matrix to get a new 4-element vector,
>> discard the last element.
>>
>> Is there some way to do that in numpy without
>> adding the extra element and then discarding it?
>
> if you can discard the last element, the matrix has a special structure:
> It is an affine transform, where the last row is unity, and it can be
> rewritten as
>
> A*x+b
>
> where A is the 3x3 upper left submatrix and b is the column vector. You
> can do this by simple slicing - with C as the 4x4 matrix it is something
> like
>
> dot(C[0:3, 0:3], x) + C[3, 0:3]
>
> (untested, you need to check if I got the indices right)
>
> *IF* however, your transform is perspective, then this is incorrect -
> you must divide the result vector by the last element before discarding
> it, if it is a 3D-point. For a 3D-vector (enhanced by a 0) you might
> still find a shortcut.

I only need affine transformations. This is just moving the coordinate system of a point, not perspective rendering. I have to do this for a lot of points, and I'm hoping numpy has some way to do this without generating extra garbage on the way in and the way out. I've done this before in C++.

John Nagle -- https://mail.python.org/mailman/listinfo/python-list
Applying 4x4 transformation to 3-element vector with numpy
This is the basic transformation of 3D graphics. Take a 3D point, make it 4D by adding a 1 on the end, multiply by a transformation matrix to get a new 4-element vector, discard the last element. Is there some way to do that in numpy without adding the extra element and then discarding it? John Nagle -- https://mail.python.org/mailman/listinfo/python-list
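One way to skip the pad-and-discard round trip is to slice the 4x4 matrix once and let numpy broadcasting handle all the points in a single call (a sketch; it assumes an affine matrix with the translation in the last column, the column-vector convention, so transpose the slices if your matrices are row-vector style):

```python
import numpy as np

def transform_points(C, pts):
    """Apply 4x4 affine transform C to an (N, 3) array of points."""
    A = C[:3, :3]              # rotation/scale/shear block
    b = C[:3, 3]               # translation column
    return pts.dot(A.T) + b    # broadcasts over all N points at once

C = np.eye(4)
C[:3, 3] = [1.0, 2.0, 3.0]     # pure translation by (1, 2, 3)
pts = np.array([[0.0, 0.0, 0.0],
                [1.0, 1.0, 1.0]])
print(transform_points(C, pts))  # rows become (1, 2, 3) and (2, 3, 4)
```

No fourth components are ever materialized: the only temporaries are the (N, 3) product and the broadcast sum.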
Python FTP timeout value not effective
I'm reading files from an FTP server at the U.S. Securities and Exchange Commission. This code has been running successfully for years. Recently, they imposed a consistent connection delay of 20 seconds at FTP connection, presumably because they're having some denial of service attack. Python 2.7 urllib2 doesn't seem to use the timeout specified. After 20 seconds, it gives up and times out. Here's the traceback:

Internal error in EDGAR update:
  File "./edgar/edgarnetutil.py", line 53, in urlopen
    return(urllib2.urlopen(url,timeout=timeout))
  File "/opt/python27/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/opt/python27/lib/python2.7/urllib2.py", line 394, in open
    response = self._open(req, data)
  File "/opt/python27/lib/python2.7/urllib2.py", line 412, in _open
    '_open', req)
  File "/opt/python27/lib/python2.7/urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "/opt/python27/lib/python2.7/urllib2.py", line 1379, in ftp_open
    fw = self.connect_ftp(user, passwd, host, port, dirs, req.timeout)
  File "/opt/python27/lib/python2.7/urllib2.py", line 1400, in connect_ftp
    fw = ftpwrapper(user, passwd, host, port, dirs, timeout)
  File "/opt/python27/lib/python2.7/urllib.py", line 860, in __init__
    self.init()
  File "/opt/python27/lib/python2.7/urllib.py", line 866, in init
    self.ftp.connect(self.host, self.port, self.timeout)
  File "/opt/python27/lib/python2.7/ftplib.py", line 132, in connect
    self.sock = socket.create_connection((self.host, self.port), self.timeout)
  File "/opt/python27/lib/python2.7/socket.py", line 571, in create_connection
    raise err
URLError:

Periodic update completed in 21.1 seconds.

Here's the relevant code:

    TIMEOUTSECS = 60                # give up waiting for server after 60 seconds
    ...
    def urlopen(url, timeout=TIMEOUTSECS) :
        if url.endswith(".gz") :    # gzipped file, must decompress first
            nd = urllib2.urlopen(url, timeout=timeout)      # get connection
            ...
            # (NOT .gz FILE, DOESN'T TAKE THIS PATH)
        else :
            return(urllib2.urlopen(url, timeout=timeout))   # (OPEN FAILS)

TIMEOUTSECS used to be 20 seconds, and I increased it to 60. It didn't help. This isn't an OS problem. The above traceback was on a Linux system. On Windows 7, it fails with "URLError: " But in both cases, the command line FTP client will work, after a consistent 20 second delay before the login prompt. So the Python timeout parameter isn't working. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
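One blunt workaround worth trying (an assumption on my part, not a confirmed fix for this particular bug): socket.setdefaulttimeout() sets the module-level default that socket.create_connection() falls back on whenever no explicit per-call timeout reaches it, so it can catch cases where a layer between urllib2 and ftplib drops the timeout on the floor.

```python
import socket

# Process-wide default applied to sockets created without an explicit
# timeout; a possible workaround when a per-call timeout gets lost
# somewhere between urllib2 and ftplib.
socket.setdefaulttimeout(60.0)      # seconds

print(socket.getdefaulttimeout())   # → 60.0
```

Note this affects every socket created afterwards in the process, which is usually acceptable in a batch fetcher like this one.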
Re: [RELEASED] Python 2.7.5
On 5/15/2013 9:19 PM, Benjamin Peterson wrote: > It is my greatest pleasure to announce the release of Python 2.7.5. > > 2.7.5 is the latest maintenance release in the Python 2.7 series. Thanks very much. It's important that Python 2.x be maintained. 3.x is a different language, with different libraries, and lots of things that still don't work. Many old applications will never be converted. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: Getting USB volume serial number from inserted device on OSX
On 4/2/2013 3:18 PM, Sven wrote: > Hello, > > I am using Python 2.7 with pyobjc on Lion and NSNotification center to > monitor any inserted USB volumes. This works fine. > > I've also got some ideas how to get a device's serial number, but these > involve just parsing all the USB devices ('system_profiler SPUSBDataType' > command). However I'd like to specifically query the inserted device only > (the one creating the notification) rather than checking the entire USB > device list. The latter seems a little inefficient and "wrong". That would be useful to have as a portable function for all USB devices. Serial port devices are particularly annoying, because their port number is somewhat random when there's more than one, and changes on hot-plugging. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Distributing a Python program hell
I'm struggling with radio hams who are trying to get my antique Teletype program running. I hate having to write instructions like this:

Installation instructions (Windows):

1. Download and install Python 2.7 (32-bit) if not already installed. (Python 2.6 or 2.7 is required; "pyserial" will not work correctly on older versions, and "feedparser" is not supported in 3.x versions.)
2. Install the Python module "setuptools" from the Python Package Index. (Needed by other installers. Has a Windows installer.)
3. Install the Python module "feedparser" from Google Code. (Unpack ZIP file, run "setup.py install")
4. Install the Python module "pyserial" from SourceForge. (Windows installer, but 32-bit only)
5. Install the Python module "pygooglevoice" from Google Code. (Requires 7Zip to unpack the .tar.gz file. Then "setup.py install")
6. Download "BaudotRSS" from SourceForge. (ZIP file, put in your chosen directory for this program.)
7. Run: python baudotrss.py --help

I'm thinking of switching to Go. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: Unhelpful traceback
On 3/7/2013 10:42 AM, John Nagle wrote:
> On 3/7/2013 5:10 AM, Dave Angel wrote:
>> On 03/07/2013 01:33 AM, John Nagle wrote:
>>> Here's a traceback that's not helping:
>>
>> A bit more context would be helpful. Starting with Python version.
>
> Sorry, Python 2.7.

The trouble comes from here:

    decoder = codecs.getreader('utf-8')         # UTF-8 reader
    with decoder(infdraw, errors="replace") as infd :

It's not the CSV module that's blowing up. If I just feed the raw unconverted bytes from the ZIP module into the CSV module, the CSV module runs without complaint. I've tried 'utf-8', 'ascii', and 'windows-1252' as codecs. They all blow up. 'errors="replace"' doesn't help. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: Unhelpful traceback
On 3/7/2013 5:10 AM, Dave Angel wrote:
> On 03/07/2013 01:33 AM, John Nagle wrote:
>>
>> "infdraw" is a stream from the zip module, created like this:
>>
>>     with inzip.open(zipelt.filename,"r") as infd :
>
> You probably need a 'rb' rather than 'r', since the file is not ASCII.
>
>>         self.dofilecsv(infile, infd)
>>
>> This works for data records that are pure ASCII, but as soon as some
>> non-ASCII character comes through, it fails.

No, the ZIP module gives you back the bytes you put in. "rb" is not accepted there:

  File "InfoCompaniesHouse.py", line 197, in dofilezip
    with inzip.open(zipelt.filename,"rb") as infd :     # do this file
  File "C:\python27\lib\zipfile.py", line 872, in open
    raise RuntimeError, 'open() requires mode "r", "U", or "rU"'
RuntimeError: open() requires mode "r", "U", or "rU"

"b" for files is about end of line handling (CR LF -> LF), anyway. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: Unhelpful traceback
On 3/7/2013 5:10 AM, Dave Angel wrote:
> On 03/07/2013 01:33 AM, John Nagle wrote:
>> Here's a traceback that's not helping:
>
> A bit more context would be helpful. Starting with Python version.

Sorry, Python 2.7.

> If that isn't enough, then please give the whole context, such as where
> zipelt and filename came from. And don't forget to specify Python
> version. Version 3.x treats nonbinary files very differently than 2.x

Here it is, with some email wrap problems. John Nagle

    def dofilecsv(self, infilename, infdraw) :
        """
        Loader for Companies House company data, with files already open.
        """
        self.logger.info('Converting "%s"' % (infilename,))     # log
        (pathpart, filepart) = os.path.split(infilename)        # split off file part to construct output file
        (outfile, ext) = os.path.splitext(filepart)             # remove extension
        outfile += ".sql"                                       # add SQL suffix
        outfilename = os.path.abspath(os.path.join(self.options.destdir, outfile))
        # ***NEED TO INSURE UNIQUE OUTFILENAME EVEN IF DUPLICATED IN ZIP FILES***
        decoder = codecs.getreader('utf-8')                     # UTF-8 reader
        with decoder(infdraw, errors="replace") as infd :
            with codecs.open(outfilename, encoding='utf-8', mode='w') as outfd :
                headerline = infd.readline()                    # read header line
                self.doheaderline(headerline)                   # process header line
                reader = csv.reader(infd, delimiter=',', quotechar='"')     # CSV file
                for fields in reader :                          # read entire CSV file
                    self.doline(outfd, fields)                  # copy fields
        self.logstats(infilename)                               # log statistics of this file

    def dofilezip(self, infilename) :
        """
        Do a ZIP file containing CSV files.
        """
        try :
            inzip = zipfile.ZipFile(infilename, "r", allowZip64=True)   # try to open
            zipdir = inzip.infolist()                           # get objects in file
            for zipelt in zipdir :                              # for all objects in file
                self.logger.debug('ZIP file "%s" contains "%s".' % (infilename, zipelt.filename))
                (infile, ext) = os.path.splitext(zipelt.filename)   # remove extension
                if ext.lower() == ".csv" :                      # if a CSV file
                    with inzip.open(zipelt.filename, "r") as infd :     # do this file
                        self.dofilecsv(infile, infd)            # as a CSV file
                else :
                    self.logger.error('Non-CSV file in ZIP file: "%s"' % (zipelt.filename,))
                    self.errorcount += 1                        # tally
        except zipfile.BadZipfile as message :                  # if trouble
            self.logger.error('Bad ZIP file: "%s"' % (infilename,))     # note trouble
            self.errorcount += 1                                # tally

    def dofile(self, infilename) :
        """
        Loader for Companies House company data
        """
        (sink, ext) = os.path.splitext(infilename)              # get extension
        if ext == ".zip" :                                      # if .ZIP file
            self.dofilezip(infilename)                          # do ZIP file
        elif ext == ".csv" :
            self.logger.info('Converting "%s"' % (infilename,)) # log
            with open(infilename, "rb") as infd :
                self.dofilecsv(infilename, infd)                # do
            self.logstats(infilename)                           # log statistics of this file
        else :
            self.logger.error('File of unexpected type (not .csv or .zip): %s ' % (infilename,))
            self.errorcount += 1

-- http://mail.python.org/mailman/listinfo/python-list
Unhelpful traceback
Here's a traceback that's not helping:

Traceback (most recent call last):
  File "InfoCompaniesHouse.py", line 255, in
    main()
  File "InfoCompaniesHouse.py", line 251, in main
    loader.dofile(infile)               # load this file
  File "InfoCompaniesHouse.py", line 213, in dofile
    self.dofilezip(infilename)          # do ZIP file
  File "InfoCompaniesHouse.py", line 198, in dofilezip
    self.dofilecsv(infile, infd)        # as a CSV file
  File "InfoCompaniesHouse.py", line 182, in dofilecsv
    for fields in reader :              # read entire CSV file
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 14: ordinal not in range(128)

This is weird, because "for fields in reader" isn't directly doing a decode. That's further down somewhere, and the backtrace didn't tell me where. The program is converting some .CSV files that come packaged in .ZIP files. The files are big, so rather than expanding them, they're read directly from the ZIP files and processed through the ZIP and CSV modules. Here's the code that's causing the error above:

    decoder = codecs.getreader('utf-8')
    with decoder(infdraw, errors="replace") as infd :
        with codecs.open(outfilename, encoding='utf-8', mode='w') as outfd :
            headerline = infd.readline()
            self.doheaderline(headerline)
            reader = csv.reader(infd, delimiter=',', quotechar='"')
            for fields in reader :
                pass

Normally, the "pass" is a call to something that uses the data, but for test purposes, I put a "pass" in there. It still fails. With that "pass", nothing is ever written to the output file, and no "encoding" should be taking place. "infdraw" is a stream from the zip module, created like this:

    with inzip.open(zipelt.filename,"r") as infd :
        self.dofilecsv(infile, infd)

This works for data records that are pure ASCII, but as soon as some non-ASCII character comes through, it fails. Where is the error being generated? I'm not seeing any place where there's a conversion to ASCII. Not even a print. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
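The hidden encode appears to be in the Python 2 csv module itself: it operates on byte strings, and when handed the unicode lines coming out of the codecs reader it coerces them back through the default ASCII codec, which is exactly the UnicodeEncodeError above. The usual Python 2 fix is to feed csv.reader encoded UTF-8 bytes and decode the parsed fields afterwards. For comparison, the Python 3 spelling of this pipeline needs no such dance; a self-contained sketch (data and names are mine):

```python
import csv
import io
import zipfile

# Build a small in-memory ZIP holding a UTF-8 CSV with a non-ASCII
# field (a u'\xa3' pound sign, the character in the traceback).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("data.csv", "name,price\nwidget,\u00a35\n".encode("utf-8"))

# Python 3: wrap the binary member stream in a text layer and feed
# that to csv.reader; no intermediate encode to ASCII ever happens.
with zipfile.ZipFile(buf) as z:
    with z.open("data.csv") as raw:
        infd = io.TextIOWrapper(raw, encoding="utf-8", errors="replace")
        rows = list(csv.reader(infd))

print(rows)   # → [['name', 'price'], ['widget', '£5']]
```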
Re: Parsing ISO date/time strings - where did the parser go?
On 9/8/2012 5:20 PM, John Gleeson wrote:
> On 2012-09-06, at 2:34 PM, John Nagle wrote:
>> Yes, it should. There's no shortage of implementations.
>> PyPi has four. Each has some defect.
>>
>> PyPi offers:
>>
>> iso8601 0.1.4       Simple module to parse ISO 8601 dates
>> iso8601.py 0.1dev   Parse utilities for iso8601 encoding.
>> iso8601plus 0.1.6   Simple module to parse ISO 8601 dates
>> zc.iso8601 0.2.0    ISO 8601 utility functions
>
> Here are three more on PyPI you can try:
>
> iso-8601 0.2.3      Flexible ISO 8601 parser...
> PySO8601 0.1.7      PySO8601 aims to parse any ISO 8601 date...
> isodate 0.4.8       An ISO 8601 date/time/duration parser and formater
>
> All three have been updated this year.

There's another one inside feedparser, and there used to be one in the xml module. Filed issue 15873: "datetime" cannot parse ISO 8601 dates and times http://bugs.python.org/issue15873 This really should be handled in the standard library, instead of everybody rolling their own, badly. Especially since in Python 3.x, there's finally a useful "tzinfo" subclass for fixed time zone offsets. That provides a way to directly represent ISO 8601 date/time strings with offsets as "time zone aware" datetime objects. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: Parsing ISO date/time strings - where did the parser go?
On 9/6/2012 12:51 PM, Paul Rubin wrote:
> John Nagle writes:
>> There's an iso8601 module on PyPi, but it's abandoned; it hasn't been
>> updated since 2007 and has many outstanding issues.
>
> Hmm, I have some code that uses ISO date/time strings and just checked
> to see how I did it, and it looks like it uses iso8601-0.1.4-py2.6.egg.
> I don't remember downloading that module (I must have done it and
> forgotten). I'm not sure what its outstanding issues are, as it works
> ok in the limited way I use it.
>
> I agree that this functionality ought to be in the stdlib.

Yes, it should. There's no shortage of implementations. PyPi has four. Each has some defect. PyPi offers:

    iso8601 0.1.4       Simple module to parse ISO 8601 dates
    iso8601.py 0.1dev   Parse utilities for iso8601 encoding.
    iso8601plus 0.1.6   Simple module to parse ISO 8601 dates
    zc.iso8601 0.2.0    ISO 8601 utility functions

Unlike CPAN, PyPi has no quality control.

Looking at the first one, it's in Google Code. http://code.google.com/p/pyiso8601/source/browse/trunk/iso8601/iso8601.py The first bug is at line 67. For a timestamp with a "Z" at the end, the offset should always be zero, regardless of the default timezone. See "http://en.wikipedia.org/wiki/ISO_8601". The code uses the default time zone in that case, which is wrong. So don't call that code with your local time zone as the default; it will return bad times.

Looking at the second one, it's on github: https://github.com/accellion/iso8601.py/blob/master/iso8601.py Giant regular expressions! The code to handle the offset is present, but it doesn't make the datetime object a timezone-aware object. It returns a naive object in UTC.

The third one is at https://github.com/jimklo/pyiso8601plus This is a fork of the first one, because the first one is abandonware. The bug in the first one, mentioned above, isn't fixed. However, if a time zone is present, it does return an "aware" datetime object.

The fourth one is the Zope version.
This brings in the pytz module, which brings in the Olson database of named time zones and their historical conversion data. None of that information is used, or necessary, to parse ISO dates and times. Somebody just wanted the pytz.fixedOffset() function, which does something datetime already does. (For all the people who keep saying "use strptime", that doesn't handle time zone offsets at all.) John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Parsing ISO date/time strings - where did the parser go?
In Python 2.7: I want to parse standard ISO date/time strings such as 2012-09-09T18:00:00-07:00 into Python "datetime" objects. The "datetime" object offers an output method, datetimeobj.isoformat(), but not an input parser. There ought to be a classmethod datetime.fromisoformat(s), but there isn't. I'd like to avoid adding a dependency on a third party module like "dateutil". The "Working with time" section of the Python wiki is so ancient it predates "datetime", and says so. There's an iso8601 module on PyPi, but it's abandoned; it hasn't been updated since 2007 and has many outstanding issues. There are mentions of "xml.utils.iso8601.parse" in various places, but the "xml" module that comes with Python 2.7 doesn't have xml.utils. http://www.seehuhn.de/pages/pdate says: "Unfortunately there is no easy way to parse full ISO 8601 dates using the Python standard library." It looks like this was taken out of "xml" at some point, but not moved into "datetime". John Nagle -- http://mail.python.org/mailman/listinfo/python-list
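(A footnote from later Python history: datetime.fromisoformat() was eventually added in Python 3.7, and strptime's %z handles offsets in Python 3.) For the timestamp-with-offset form in question, a minimal hand-rolled parser is straightforward; this sketch is mine, not from any of the modules discussed, and uses Python 3's fixed-offset datetime.timezone:

```python
import re
from datetime import datetime, timedelta, timezone

ISO = re.compile(r"(\d{4})-(\d\d)-(\d\d)T(\d\d):(\d\d):(\d\d)"
                 r"(Z|[+-]\d\d:\d\d)?$")

def parse_iso8601(s):
    """Parse 'YYYY-MM-DDTHH:MM:SS' with an optional 'Z' or '+HH:MM'
    offset, returning an aware datetime when an offset is present."""
    m = ISO.match(s)
    if m is None:
        raise ValueError("not an ISO 8601 timestamp: %r" % (s,))
    y, mo, d, h, mi, sec = (int(g) for g in m.groups()[:6])
    tz = m.group(7)
    if tz is None:
        tzinfo = None                   # naive: no offset given
    elif tz == "Z":
        tzinfo = timezone.utc           # 'Z' always means +00:00
    else:
        sign = 1 if tz[0] == "+" else -1
        offs = timedelta(hours=int(tz[1:3]), minutes=int(tz[4:6]))
        tzinfo = timezone(sign * offs)  # fixed-offset tzinfo
    return datetime(y, mo, d, h, mi, sec, tzinfo=tzinfo)

print(parse_iso8601("2012-09-09T18:00:00-07:00"))
# → 2012-09-09 18:00:00-07:00
```

Note it gets the "Z" case right, which is the bug called out in the reply above: "Z" is a zero offset regardless of any default time zone.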
Re: python 6 compilation failure on RHEL
On 8/20/2012 2:50 PM, Emile van Sebille wrote: > On 8/20/2012 1:55 PM Walter Hurry said... >> On Mon, 20 Aug 2012 12:19:23 -0700, Emile van Sebille wrote: >> >>> Package dependencies. If the OP intends to install a package that >>> doesn't support other than 2.6, you install 2.6. >> >> It would be a pretty poor third party package which specified Python 2.6 >> exactly, rather than (say) "Python 2.6 or later, but not Python 3" After a thread of clueless replies, it's clear that nobody responding actually read the build log. Here's the problem: Failed to find the necessary bits to build these modules: bsddb185 dl imageop sunaudiodev What's wrong is that the Python 2.6 build script is looking for some antiquated packages that aren't in a current RHEL. Those need to be turned off. This is a known problem (see http://pythonstarter.blogspot.com/2010/08/bsddb185-sunaudiodev-python-26-ubuntu.html) but, unfortunately, the site with the patch for it (http://www.lysium.de/sw/python2.6-disable-old-modules.patch) is no longer in existence. But someone archived it on Google Code, at http://code.google.com/p/google-earth-enterprise-compliance/source/browse/trunk/googleclient/geo/earth_enterprise/src/third_party/python/python2.6-disable-old-modules.patch so if you apply that patch to the setup.py file for Python 2.6, that ought to help. You might be better off building Python 2.7, but you asked about 2.6. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: On-topic: alternate Python implementations
On 8/4/2012 7:19 PM, Steven D'Aprano wrote: > On Sat, 04 Aug 2012 18:38:33 -0700, Paul Rubin wrote: > >> Steven D'Aprano writes: >>> Runtime optimizations that target the common case, but fall back to >>> unoptimized code in the rare cases that the optimization doesn't apply, >>> offer the opportunity of big speedups for most code at the cost of >>> trivial slowdowns when you do something unusual. >> >> The problem is you can't always tell if the unusual case is being >> exercised without an expensive dynamic check, which in some cases must >> be repeated in every iteration of a critical inner loop, even though it >> turns out that the program never actually uses the unusual case. There are other approaches. PyPy uses two interpreters and a JIT compiler to handle the hard cases. When code does something unexpected to other code, the backup interpreter is used to get control out of the trouble spot so that the JIT compiler can then recompile the code. (I think; I've read the paper but haven't looked at the internals.) This is hard to implement and hard to get right. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: Implicit conversion to boolean in if and while statements
On 7/15/2012 1:34 AM, Andrew Berg wrote:
> This has probably been discussed before, but why is there an implicit
> conversion to a boolean in if and while statements?
>
>     if not None:
>         print('hi')
>
> prints 'hi' since bool(None) is False. If this was discussed in a PEP,
> I would like a link to it. There are so many PEPs, and I wouldn't know
> which ones to look through. Converting 0 and 1 to False and True seems
> reasonable, but I don't see the point in converting other arbitrary
> values.

Because Boolean types were an afterthought in Python. See PEP 285. If a language starts out with a Boolean type, it tends towards Pascal/Ada/Java semantics in this area. If a language backs into needing a Boolean type, as Python and C did, it tends to have the somewhat weird semantics of a language which can't quite decide what's a Boolean. C and C++ have the same problem, for exactly the same reason - boolean types were an afterthought there, too. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
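For reference, the rule for "arbitrary values" is delegated to the objects themselves: Python asks via __nonzero__ in 2.x (renamed __bool__ in 3.x), falls back to __len__, and defaults to true. A small Python 3 sketch:

```python
class Box:
    """Truthiness follows __len__ when __bool__ isn't defined:
    an empty Box is falsy, a non-empty one is truthy."""
    def __init__(self, items):
        self.items = items
    def __len__(self):
        return len(self.items)

print(bool(Box([])), bool(Box([1, 2])))          # → False True
print(bool(None), bool(0), bool(""), bool([]))   # → False False False False
```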
Re: How to safely maintain a status file
On 7/8/2012 2:52 PM, Christian Heimes wrote:
> You are contradicting yourself. Either the OS is providing a fully
> atomic rename or it doesn't. All POSIX compatible OS provide an atomic
> rename functionality that renames the file atomically or fails without
> losing the target side. On POSIX OS it doesn't matter if the target
> exists.

Rename on some file system types (particularly NFS) may not be atomic.

> You don't need locks or any other fancy stuff. You just need to make
> sure that you flush the data and metadata correctly to the disk and
> force a re-write of the directory inode, too. It's a standard pattern
> on POSIX platforms and well documented in e.g. the maildir RFC. You can
> use the same pattern on Windows but it doesn't work as well.

That's because you're using the wrong approach. See how to use ReplaceFile under Win32: http://msdn.microsoft.com/en-us/library/aa365512%28VS.85%29.aspx

Renaming files is the wrong way to synchronize a crawler. Use a database that has ACID properties, such as SQLite. Far fewer I/O operations are required for small updates. It's not the 1980s any more.

I use a MySQL database to synchronize multiple processes which crawl web sites. The tables of past activity are InnoDB tables, which support transactions. The table of what's going on right now is a MEMORY table. If the database crashes, the past activity is recovered cleanly, the MEMORY table comes back empty, and all the crawler processes lose their database connections, abort, and are restarted. This allows multiple servers to coordinate through one database. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
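The write-flush-rename pattern being debated, as a Python sketch (function name mine; os.replace() is the Python 3 spelling that also overwrites atomically on Windows, roughly the behavior the ReplaceFile link provides):

```python
import os
import tempfile

def atomic_write(path, data):
    """Write bytes to path so readers see either the old contents or
    the new, never a partial file. Relies on rename atomicity within
    one filesystem (POSIX semantics; may not hold over NFS)."""
    d = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=d)       # temp file on the same filesystem
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())            # data hits disk before the rename
        os.replace(tmp, path)               # atomic overwrite of the target
    except BaseException:
        os.unlink(tmp)                      # clean up the temp file on failure
        raise

atomic_write("status.txt", b"crawler: 42 pages done\n")
print(open("status.txt", "rb").read())   # → b'crawler: 42 pages done\n'
```

Whether this or a SQLite/MySQL status table is the better fit depends, as the thread says, on how many small updates per second the crawler makes.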
Re: Socket code not executing properly in a thread (Windows)
On 7/8/2012 3:55 AM, Andrew D'Angelo wrote:
> Hi, I've been writing an IRC chatbot that can relay messages it
> receives as an SMS.

We have no idea what IRC module you're using.

> As it stands, I can retrieve and parse SMSs from Google Voice perfectly

The Google Voice code you have probably won't work once you have enough messages stored that Google Voice returns them on multiple pages. You have to read all the pages. If there's any significant amount of traffic, the completed messages have to be moved or deleted, or each polling cycle returns more data than the last one. Google Voice isn't a very good SMS gateway. I used to use it, but switched to Twilio (which costs, but works) two years ago. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: simpler increment of time values?
On 7/4/2012 5:29 PM, Vlastimil Brom wrote:
> Hi all, I'd like to ask about the possibilities to do some basic
> manipulation on timestamps - such as incrementing a given time
> (hour.minute - string) by some minutes. Very basic notion of "time" is
> assumed, i.e. dateless, timezone-unaware, DST-less etc. I first
> thought it would be possible to just add a timedelta to a time object,
> but it doesn't seem to be the case.

That's correct. A datetime.time object is a time within a day. A datetime.date object is a date without a time. A datetime.datetime object contains both. You can add a datetime.timedelta object to a datetime.datetime object, which will yield a datetime.datetime object.

You can also call time.time(), and get the number of seconds since the epoch (usually 1970-01-01 00:00:00 UTC). That's just a number, and you can do arithmetic on that.

Adding a datetime.time to a datetime.timedelta isn't that useful. It would have to return a value error if the result crossed a day boundary. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
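The practical recipe that follows from this - anchor the dateless time to a dummy date with strptime, add the timedelta to the resulting datetime, and format back - might look like (a sketch; names are mine):

```python
from datetime import datetime, timedelta

def add_minutes(hhmm, minutes):
    """Increment a dateless 'HH.MM' string by some minutes, wrapping
    silently at midnight. strptime anchors the time to a dummy date
    (1900-01-01) so datetime + timedelta arithmetic applies."""
    t = datetime.strptime(hhmm, "%H.%M")
    return (t + timedelta(minutes=minutes)).strftime("%H.%M")

print(add_minutes("10.00", 75))   # → 11.15
print(add_minutes("23.45", 30))   # → 00.15  (crossed the day boundary)
```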
Re: when "normal" parallel computations in CPython will be implemented at last?
On 7/1/2012 10:51 AM, dmitrey wrote:
> hi all, are there any information about upcoming availability of
> parallel computations in CPython without modules like multiprocessing?
> I mean something like parallel "for" loops, or, at least, something
> without forking with copying huge amounts of RAM each time and
> possibility to involve unpicklable data (vfork would be ok, but AFAIK
> it doesn't work with CPython due to GIL). AFAIK in PyPy some progress
> have been done ( http://morepypy.blogspot.com/2012/06/stm-with-threads.html )
> Thank you in advance, D.

It would be "un-Pythonic" to have real concurrency in Python. You wouldn't be able to patch code running in one thread from another thread. Some of the dynamic features of Python would break. If you want fine-grained concurrency, you need controlled isolation between concurrent tasks, so they interact only at well-defined points. That's un-Pythonic. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: PySerial could not open port COM4: [Error 5] Access is denied - please help
On 6/26/2012 9:12 PM, Adam wrote:
> Host OS: Ubuntu 10.04 LTS
> Guest OS: Windows XP Pro SP3
>
> I am able to open port COM4 with Terminal emulator. So, what can cause
> PySerial to generate the following error ...
>
> C:\Wattcher>python wattcher.py
> Traceback (most recent call last):
>   File "wattcher.py", line 56, in
>     ser.open()
>   File "C:\Python25\Lib\site-packages\serial\serialwin32.py", line 56, in open
>     raise SerialException("could not open port %s: %s" % (self.portstr, ctypes.WinError()))
> serial.serialutil.SerialException: could not open port COM4: [Error 5] Access is denied.

Are you trying to access serial ports from a virtual machine? Which virtual machine environment? Xen? VMware? QEmu? VirtualBox? I wouldn't expect that to work in most of those. What is "COM4", anyway? Few machines today actually have four serial ports. Is some device emulating a serial port? John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: Why has python3 been created as a seperate language where there is still python2.7 ?
On 6/25/2012 1:36 AM, Stefan Behnel wrote:
> gmspro, 24.06.2012 05:46:
>> Why has python3 been created as a seperate language where there is
>> still python2.7 ?
>
> The intention of Py3 was to deliberately break backwards compatibility
> in order to clean up the language. The situation is not as bad as you
> seem to think, a huge amount of packages have been ported to Python 3
> already and/or work happily with both language dialects.

The syntax changes in Python 3 are a minor issue for serious programmers. The big headaches come from packages that aren't being ported to Python 3 at all. In some cases, there's a replacement package from another author that performs the same function, but has a different API. Switching packages involves debugging some new package with, probably, one developer and a tiny user community.

The Python 3 to MySQL connection is still a mess. The original developer of MySQLdb doesn't want to support Python 3. There's "pymysql", but it hasn't been updated since 2010 and has a long list of unfixed bugs. There was a "MySQL-python-1.2.3-py3k" port by a third party, but the domain that hosted it ("http://www.elecmor.mooo.com/python/MySQL-python-1.2.3-py3k.zip") is dead. There's MySQL for Python 3 (https://github.com/davispuh/MySQL-for-Python-3) but it doesn't work on Windows. MySQL Connector (https://code.launchpad.net/myconnpy) hasn't been updated in a while, but at least has some users. OurSQL has a different API than MySQLdb, and isn't quite ready for prime time yet. That's why I'm still on Python 2.7. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: Internationalized domain names not working with URLopen
On 6/12/2012 11:42 PM, Andrew Berg wrote:
> On 6/13/2012 1:17 AM, John Nagle wrote:
>> What does "urllib2" want? Percent escapes? Punycode?
>
> Looks like Punycode is the correct answer:
> https://en.wikipedia.org/wiki/Internationalized_domain_name#ToASCII_and_ToUnicode
> I haven't tried it, though.

This is Python bug #9679: http://bugs.python.org/issue9679 It's been open for years, and the maintainers offer elaborate excuses for not fixing the problem.

The socket module accepts Unicode domains, as does httplib. But urllib2, which is a front end to both, is still broken. It's failing when it constructs the HTTP headers. Domains in HTTP headers have to be in punycode. The code on Stack Overflow doesn't really work right. Only the domain part of a URL should be converted to punycode. Path, port, and query parameters need to be converted to percent-encoding. (Unclear if urllib2 or httplib does this already. The documentation doesn't say.) While HTTP content can be in various character sets, the headers are currently required to be ASCII only, since the header has to be processed to determine the character code. (http://lists.w3.org/Archives/Public/ietf-http-wg/2011OctDec/0155.html)

Here's a workaround, for the domain part only.

    #
    #   idnaurlworkaround  --  workaround for Python defect 9679
    #
    PYTHONDEFECT9679FIXED = False       # Python defect #9679 - change when fixed

    def idnaurlworkaround(url) :
        """
        Convert a URL to a form the currently broken urllib2 will accept.
        Converts the domain to "punycode" if necessary.

        This is a workaround for Python defect #9679.
        """
        if PYTHONDEFECT9679FIXED :      # if defect fixed
            return(url)                 # use unmodified URL
        url = unicode(url)              # force to Unicode
        (scheme, accesshost, path, params, query, fragment) = urlparse.urlparse(url)    # parse URL
        if scheme == '' and accesshost == '' and path != '' :   # bare domain
            accesshost = path           # use path as access host
            path = ''                   # no path
        labels = accesshost.split('.')  # split domain into sections ("labels")
        labels = [encodings.idna.ToASCII(w) for w in labels]    # convert each label to punycode if necessary
        accesshost = '.'.join(labels)   # reassemble domain
        url = urlparse.urlunparse((scheme, accesshost, path, params, query, fragment))  # reassemble URL
        return(url)                     # return complete URL with punycode domain

John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Internationalized domain names not working with URLopen
I'm trying to open http://пример.испытание with urllib2.urlopen(s1) in Python 2.7 on Windows 7. This produces a Unicode exception:

>>> s1
u'http://\u043f\u0440\u0438\u043c\u0435\u0440.\u0438\u0441\u043f\u044b\u0442\u0430\u043d\u0438\u0435'
>>> fd = urllib2.urlopen(s1)
Traceback (most recent call last):
  File "", line 1, in
  File "C:\python27\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\python27\lib\urllib2.py", line 394, in open
    response = self._open(req, data)
  File "C:\python27\lib\urllib2.py", line 412, in _open
    '_open', req)
  File "C:\python27\lib\urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "C:\python27\lib\urllib2.py", line 1199, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "C:\python27\lib\urllib2.py", line 1168, in do_open
    h.request(req.get_method(), req.get_selector(), req.data, headers)
  File "C:\python27\lib\httplib.py", line 955, in request
    self._send_request(method, url, body, headers)
  File "C:\python27\lib\httplib.py", line 988, in _send_request
    self.putheader(hdr, value)
  File "C:\python27\lib\httplib.py", line 935, in putheader
    hdr = '%s: %s' % (header, '\r\n\t'.join([str(v) for v in values]))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)
>>>

The HTTP library is trying to put the URL in the header as ASCII. Why isn't "urllib2" handling that? What does "urllib2" want? Percent escapes? Punycode? John Nagle -- http://mail.python.org/mailman/listinfo/python-list
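For the hostname part, the standard library's "idna" codec (an IDNA 2003 implementation, built on encodings.idna) does the per-label "xn--" conversion directly:

```python
# -*- coding: utf-8 -*-
# The 'idna' codec converts each label of the hostname to its
# ASCII-compatible "xn--" (punycode) form.
host = "пример.испытание"
ascii_host = host.encode("idna").decode("ascii")
print(ascii_host)                     # an all-ASCII "xn--..." name

# Decoding reverses it (ToUnicode per label):
round_trip = ascii_host.encode("ascii").decode("idna")
print(round_trip == host)             # → True
```

Only the hostname should go through this; the path and query string need percent-encoding instead, as the follow-up in this thread notes.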
Re: sqlite INSERT performance
On 5/30/2012 6:57 PM, duncan smith wrote:
> Hello, I have been attempting to speed up some code by using an sqlite
> database, but I'm not getting the performance gains I expected.

SQLite is a "lite" database. It's good for data that's read a lot and not changed much. It's good for small data files. It's so-so for large database loads. It's terrible for a heavy load of simultaneous updates from multiple processes. However, wrapping the inserts into a transaction with BEGIN and COMMIT may help. If you have 67 columns in a table, you may be approaching the problem incorrectly. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
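The BEGIN/COMMIT advice, sketched with the sqlite3 module: the connection context manager wraps the whole batch in one transaction, so there is a single commit instead of one per INSERT (the per-INSERT commits, each forcing a sync, are the usual cause of slow bulk loads).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k INTEGER, v TEXT)")
rows = [(i, "row %d" % i) for i in range(10000)]

# One transaction for the whole batch: the context manager issues a
# single COMMIT at exit instead of committing each INSERT separately.
with conn:
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)   # → 10000
```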
Re: Email Id Verification
On 5/24/2012 5:32 AM, niks wrote:
> Hello everyone.. I am new to asp.net... I want to use Regular
> Expression validator in Email id verification.. Can anyone tell me how
> to use this and what is the meaning of this
> \w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*

Not a Python question. It matches anything that looks like a mail user name followed by an @ followed by anything that looks more or less like a domain name. The domain name must contain at least one ".", and cannot end with a ".", which is not strictly correct but usually works. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
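Although it's an ASP.NET question, the pattern can be exercised from Python's re module to see exactly what it accepts (the trailing $ anchor is added here, since re.match only anchors the start of the string):

```python
import re

# The questioner's pattern, anchored at the end with $ (re.match
# already anchors the start).
EMAIL = re.compile(r"\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$")

for s in ["john.doe@example.com", "a+b@mail.co.uk",
          "bad@domain.", "no-at-sign"]:
    print(s, bool(EMAIL.match(s)))
```

The last two rejections illustrate the reply's point: the domain must contain a "." and cannot end with one.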
Re: escaping/encoding/formatting in python
On 4/5/2012 10:10 PM, Steve Howell wrote: On Apr 5, 9:59 pm, rusi wrote: On Apr 6, 6:56 am, Steve Howell wrote: You've one-upped me with 2-to-the-N backspace escaping.

Early attempts at UNIX word processing, "nroff" and "troff", suffered from that problem, due to a badly designed macro system.

A question in language design is whether to escape or quote. Do you write

    "X = %d" % (n,)

or

    "X = " + str(n)

In general, for anything but output formatting, the second scales better. Regular expressions have a bad case of the first. For a quoted alternative to regular expression syntax, see SNOBOL or Icon. SNOBOL allows naming patterns, and those patterns can then be used as components of other patterns. SNOBOL is obsolete, but that approach produced much more readable code. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: serial module
On 5/22/2012 2:07 PM, Paul Rubin wrote:
> John Nagle writes:
>> If a device is registered as /dev/ttyUSBnn, one would hope that the
>> Linux USB insertion event handler, which assigns that name, determined
>> that the device was a serial port emulator. Unfortunately, the USB
>> standard device classes (http://www.usb.org/developers/defined_class)
>> don't have "serial port emulator" as a standardized device. So there's
>> more variation in this area than in keyboards, mice, or storage
>> devices.
>
> Hmm, I've been using USB-to-serial adapters and so far they've worked
> just fine. I plug the USB end of adapter into a Ubuntu box, see
> /dev/ttyUSB* appear, plug the serial end into the external serial
> device, and just use pyserial like with an actual serial port. I
> didn't realize there were issues with this.

There are. See "http://wiki.debian.org/usbserial". Because there's no standard USB class for such devices, the specific vendor ID/product ID pair has to be known to the OS. In Linux, there's a file of these, but not all USB to serial adapters are in it. In Windows, there tends to be a vendor-provided driver for each brand of USB to serial converter. This all would have been much simpler if the USB Consortium had defined a USB class for these devices, as they did for keyboards, mice, etc. However, this is not the original poster's problem. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: serial module
On 5/22/2012 8:42 AM, Grant Edwards wrote: On 2012-05-22, Albert van der Horst wrote: It is anybody's guess what they do in USB. They do exactly what they're supposed to regardless of what sort of bus is used to connect the CPU and the UART (ISA, PCI, PCI-express, USB, Ethernet, etc.). If a device is registered as /dev/ttyUSBnn, one would hope that the Linux USB insertion event handler, which assigns that name, determined that the device was a serial port emulator. Unfortunately, the USB standard device classes (http://www.usb.org/developers/defined_class) don't have "serial port emulator" as a standardized device. So there's more variation in this area than in keyboards, mice, or storage devices. The best answer is probably that it depends on the whim of whoever implements the usb device. It does not depend on anybody's whim. The meaning of those parameters is well-defined. Certainly this stuff is system dependent, No, it isn't. It is, a little. There's a problem with the way Linux does serial ports. The only speeds allowed are the ones nailed into the kernel as named constants. This is a holdover from UNIX, which is a holdover from DEC PDP-11 serial hardware circa mid 1970s, which had 14 standard baud rates encoded in 4 bits. Really. In the Windows world, the actual baud rate is passed to the driver. Serial ports on the original IBM PC were loaded with a clock rate, so DOS worked that way. This only matters if you need non-standard baud rates. I've had to deal with that twice, for a SICK LMS LIDAR, (1,000,000 baud) and 1930s Teletype machines (45.45 baud). If you need non-standard speeds, see this: http://www.aetherltd.com/connectingusb.html If 19,200 baud is enough for you, don't worry about it. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: Creating a directory structure and modifying files automatically in Python
On 5/7/2012 9:09 PM, Steve Howell wrote: On May 7, 8:46 pm, John Nagle wrote: On 5/6/2012 9:59 PM, Paul Rubin wrote: Javier writes: Or not... Using directories may be a way to do rapid prototyping, and check quickly how things are going internally, without needing to resort to complex database interfaces. dbm and shelve are extremely simple to use. Using the file system for a million item db is ridiculous even for prototyping. Right. Steve Bellovin wrote that back when UNIX didn't have any database programs, let alone free ones. It's kind of sad that the Unix file system doesn't serve as an effective key-value store at any kind of nontrivial scale. It would simplify a lot of programming if filenames were keys and file contents were values. You don't want to go there in a file system. Some people I know tried that around 1970. "A bit is a file. An ordered collection of files is a file". Didn't work out. There are file models other than the UNIX one. Many older systems had file versioning. Tandem built their file system on top of their distributed, redundant database system. There are backup systems where the name of the file is its hash, allowing elimination of duplicates. Most of the "free online storage" sites do that. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: Creating a directory structure and modifying files automatically in Python
On 5/6/2012 9:59 PM, Paul Rubin wrote: Javier writes: Or not... Using directories may be a way to do rapid prototyping, and check quickly how things are going internally, without needing to resort to complex database interfaces. dbm and shelve are extremely simple to use. Using the file system for a million item db is ridiculous even for prototyping. Right. Steve Bellovin wrote that back when UNIX didn't have any database programs, let alone free ones. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: key/value store optimized for disk storage
On 5/4/2012 12:14 AM, Steve Howell wrote: On May 3, 11:59 pm, Paul Rubin wrote: Steve Howell writes:
compressor = zlib.compressobj()
s = compressor.compress("foobar")
s += compressor.flush(zlib.Z_SYNC_FLUSH)
s_start = s
compressor2 = compressor.copy()
That's awful. There's no point in compressing six characters with zlib. Zlib has a minimum overhead of 11 bytes. You just made the data bigger. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
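The overhead claim is easy to check: zlib's header and trailer alone outweigh a six-byte payload, so compressing tiny inputs makes them bigger.

```python
import zlib

data = b"foobar"
compressed = zlib.compress(data)

# The zlib stream adds a 2-byte header and 4-byte Adler-32 trailer
# plus deflate framing, so a 6-byte input grows rather than shrinks.
print(len(data), len(compressed))
```

For short strings like this, storing them uncompressed (or batching many of them into one compression stream) avoids paying the fixed cost per item.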
"
An HTML page for a major site (http://www.chase.com) has some incorrect HTML. It contains
Re: Python SOAP library
On 5/2/2012 8:35 AM, Alec Taylor wrote: What's the best SOAP library for Python? I am creating an API converter which will be serialising to/from a variety of sources, including REST and SOAP. Relevant parsing is XML [incl. SOAP] and JSON. Would you recommend: http://code.google.com/p/soapbox/ Or suggest another? Thanks for all information, Are you implementing the client or the server? Python "Suds" is a good client-side library. It's strict SOAP; you must have a WSDL file, and the XML queries and replies must verify against the WSDL file. https://fedorahosted.org/suds/ John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: Creating a directory structure and modifying files automatically in Python
On 4/30/2012 8:19 AM, deltaquat...@gmail.com wrote: Hi, I would like to automate the following task under Linux. I need to create a set of directories such as 075 095 100 125 The directory names may be read from a text file foobar, which also contains a number corresponding to each dir, like this: 075 1.818 095 2.181 100 2.579 125 3.019 In each directory I must copy a text file input.in. This file contains two lines which need to be edited: Learn how to use a database. Creating and managing a big collection of directories to handle small data items is the wrong approach to data storage. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
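The "use a database" advice above can be made concrete with the standard library's sqlite3, using the directory names and numbers quoted from the poster's "foobar" file (the table and column names are illustrative):

```python
import sqlite3

# One row per would-be directory, instead of a tree of tiny files.
conn = sqlite3.connect(":memory:")  # or a file path for persistence
conn.execute("CREATE TABLE runs (name TEXT PRIMARY KEY, value REAL)")
conn.executemany("INSERT INTO runs VALUES (?, ?)",
                 [("075", 1.818), ("095", 2.181),
                  ("100", 2.579), ("125", 3.019)])

# Lookup replaces a directory traversal.
value, = conn.execute("SELECT value FROM runs WHERE name = ?",
                      ("100",)).fetchone()
print(value)  # 2.579
```

The per-directory input.in contents could go in additional columns, and generating the files (if some external tool still needs them) becomes a simple SELECT loop.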
Re: why () is () and [] is [] work in other way?
On 4/28/2012 4:47 AM, Kiuhnm wrote: On 4/27/2012 17:39, Adam Skutt wrote: On Apr 27, 8:07 am, Kiuhnm wrote: Useful... maybe, conceptually sound... no. Conceptually, NaN is the class of all elements which are not numbers, therefore NaN = NaN. NaN isn't really the class of all elements which aren't numbers. NaN is the result of a few specific IEEE 754 operations that cannot be computed, like 0/0, and for which there's no other reasonable substitute (e.g., infinity) for practical applications. In the real world, if we were doing the math with pen and paper, we'd stop as soon as we hit such an error. Equality is simply not defined for the operations that can produce NaN, because we don't know how to perform those computations. So no, it doesn't conceptually follow that NaN = NaN, what conceptually follows is the operation is undefined because NaN causes a halt. Mathematics is more than arithmetic with real numbers. We can use FP too (we actually do that!). We can say that NaN = NaN but that's just an exception we're willing to make. We shouldn't say that the equivalence relation rules shouldn't be followed just because *sometimes* we break them. This is what programming languages ought to do if NaN is compared to anything other than a (floating-point) number: disallow the operation in the first place or toss an exception. If you do a signaling floating point comparison on IEEE floating point numbers, you do get an exception. On some FPUs, though, signaling operations are slower. On superscalar CPUs, exact floating point exceptions are tough to implement. They are done right on x86 machines, mostly for backwards compatibility. This requires an elaborate "retirement unit" to unwind the state of the CPU after a floating point exception. DEC Alphas didn't have that; SPARC and MIPS machines varied by model. ARM machines in their better modes do have that. Most game console FPUs do not have a full IEEE implementation. 
Proper language support for floating point exceptions varies with the platform. Microsoft C++ on Windows does support getting it right. (I had to deal with this once in a physics engine, where an overflow or a NaN merely indicated that a shorter time step was required.) But even there, it's an OS exception, like a signal, not a language-level exception. Other than Ada, which requires it, few languages handle such exceptions as language level exceptions. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
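Python exposes the quiet-NaN behavior under discussion directly: comparisons with NaN silently return False rather than raising, which is why NaN is not equal to itself.

```python
import math

nan = float("nan")

# IEEE 754 quiet NaN: every ordered comparison involving NaN is false,
# so NaN compares unequal even to itself.
print(nan == nan)       # False
print(nan != nan)       # True

# The reliable way to detect it:
print(math.isnan(nan))  # True
```

This is the non-signaling path; getting a language-level exception instead would require the signaling-comparison support the post describes, which CPython does not surface.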
Re: CPython thread starvation
On 4/28/2012 1:04 PM, Paul Rubin wrote: Roy Smith writes: I agree that application-level name cacheing is "wrong", but sometimes doing it the wrong way just makes sense. I could whip up a simple cacheing wrapper around getaddrinfo() in 5 minutes. Depending on the environment (both technology and bureaucracy), getting a cacheing nameserver installed might take anywhere from 5 minutes to a few days to ... IMHO this really isn't one of those times. The in-app wrapper would only be usable to just that process, and we already know that the OP has multiple processes running the same app on the same machine. They would benefit from being able to share the cache, so now your wrapper gets more complicated. If it's not a nameserver then it's something that fills in for one. And then, since the application appears to be a large scale web spider, it probably wants to run on a cluster, and the cache should be shared across all the machines. So you really probably want an industrial strength nameserver with a big persistent cache, and maybe a smaller local cache because of high locality when crawling specific sites, etc. Each process is analyzing one web site, and has its own cache. Once the site is analyzed, which usually takes about a minute, the cache disappears. Multiple threads are reading multiple pages from the web site during that time. A local cache is enough to fix the huge overhead problem of doing a DNS lookup for every link found. One site with a vast number of links took over 10 hours to analyze before this fix; now it takes about four minutes. That solved the problem. We can probably get an additional minor performance boost with a real local DNS daemon, and will probably configure one. We recently changed servers from Red Hat to CentOS, and management from CPanel to Webmin. Before the change, we had a local DNS daemon with cacheing, so we didn't have this problem. Webmin's defaults tend to be on the minimal side. 
The DNS information is used mostly to help decide whether two URLs actually point to the same IP address, as part of deciding whether a link is on-site or off-site. Most of those links will never be read. We're not crawling the entire site, just looking at likely pages to find the name and address of the business behind the site. (It's part of our "Know who you're dealing with" system, SiteTruth.) John Nagle -- http://mail.python.org/mailman/listinfo/python-list
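The per-site cache described above can be sketched with functools.lru_cache. Here a stub stands in for socket.getaddrinfo so the example is deterministic; the host name and address are placeholders, not from the original post:

```python
from functools import lru_cache

calls = []  # records how many lookups actually reach the "resolver"

@lru_cache(maxsize=None)
def resolve(host):
    # Real code would call socket.getaddrinfo(host, ...) here.
    calls.append(host)
    return "203.0.113.7"  # placeholder address (TEST-NET-3 range)

# A page with 1000 links to the same host triggers one real lookup.
for _ in range(1000):
    resolve("example.com")

print(len(calls))  # 1
```

Since the post says each process analyzes one site for about a minute and then exits, a process-local cache like this disappears at the right time on its own; no expiry logic is needed.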
Re: CPython thread starvation
On 4/27/2012 9:55 PM, Paul Rubin wrote: John Nagle writes: I may do that to prevent the stall. But the real problem was all those DNS requests. Parallelizing them wouldn't help much when it took hours to grind through them all. True dat. But building a DNS cache into the application seems like a kludge. Unless the number of requests is insane, running a caching nameserver on the local box seems cleaner. I know. When I have a bit more time, I'll figure out why CentOS 5 and Webmin didn't set up a caching DNS resolver by default. Sometimes the number of requests IS insane. When the system hits a page with a thousand links, it has to resolve all of them. (Beyond a thousand links, we classify it as link spam and stop. The record so far is a page with over 10,000 links.) John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: CPython thread starvation
On 4/27/2012 9:20 PM, Paul Rubin wrote: John Nagle writes: The code that stored them looked them up with "getaddrinfo()", and did this while a lock was set. Don't do that!! Added a local cache in the program to prevent this. Performance much improved. Better to release the lock while the getaddrinfo is running, if you can. I may do that to prevent the stall. But the real problem was all those DNS requests. Parallelizing them wouldn't help much when it took hours to grind through them all. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: CPython thread starvation
On 4/27/2012 6:25 PM, Adam Skutt wrote: On Apr 27, 2:54 pm, John Nagle wrote: I have a multi-threaded CPython program, which has up to four threads. One thread is simply a wait loop monitoring the other three and waiting for them to finish, so it can give them more work to do. When the work threads, which read web pages and then parse them, are compute-bound, I've had the monitoring thread starved of CPU time for as long as 120 seconds. How exactly are you determining that this is the case? Found the problem. The threads, after doing their compute intensive work of examining pages, stored some URLs they'd found. The code that stored them looked them up with "getaddrinfo()", and did this while a lock was set. On CentOS, "getaddrinfo()" at the glibc level doesn't always cache locally (ref https://bugzilla.redhat.com/show_bug.cgi?id=576801). Python doesn't cache either. So huge numbers of DNS requests were being made. For some pages being scanned, many of the domains required accessing a rather slow DNS server. The combination of thousands of instances of the same domain, a slow DNS server, and no caching slowed the crawler down severely. Added a local cache in the program to prevent this. Performance much improved. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
CPython thread starvation
I have a multi-threaded CPython program, which has up to four threads. One thread is simply a wait loop monitoring the other three and waiting for them to finish, so it can give them more work to do. When the work threads, which read web pages and then parse them, are compute-bound, I've had the monitoring thread starved of CPU time for as long as 120 seconds. It's sleeping for 0.5 seconds, then checking on the other threads and for new work to do, so the work thread isn't using much compute time. I know that the CPython thread dispatcher sucks, but I didn't realize it sucked that bad. Is there a preference for running threads at the head of the list (like UNIX, circa 1979) or something like that? (And yes, I know about "multiprocessing". These threads are already in one of several service processes. I don't want to launch even more copies of the Python interpreter. The threads are usually I/O bound, but when they hit unusually long web pages, they go compute-bound during parsing.) Setting "sys.setcheckinterval" from the default to 1 seems to have little effect. This is on Windows 7. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: why () is () and [] is [] work in other way?
On 4/26/2012 4:45 AM, Adam Skutt wrote: On Apr 26, 1:48 am, John Nagle wrote: On 4/25/2012 5:01 PM, Steven D'Aprano wrote: On Wed, 25 Apr 2012 13:49:24 -0700, Adam Skutt wrote: Though, maybe it's better to use a different keyword than 'is' though, due to the plain English connotations of the term; I like 'sameobj' personally, for whatever little it matters. Really, I think taking away the 'is' operator altogether is better, so the only way to test identity is: id(x) == id(y) Four reasons why that's a bad idea: 1) The "is" operator is fast, because it can be implemented directly by the interpreter as a simple pointer comparison (or equivalent). This assumes that everything is, internally, an object. In CPython, that's the case, because Python is a naive interpreter and everything, including numbers, is "boxed". That's not true of PyPy or Shed Skin. So does "is" have to force the creation of a temporary boxed object? That's what C# does AFAIK. Java defines '==' as value comparison for primitives and '==' as identity comparison for objects, but I don't exactly know how one would do that in Python. I would suggest that "is" raise ValueError for the ambiguous cases. If both operands are immutable, "is" should raise ValueError. That's the case where the internal representation of immutables shows through. If this breaks a program, it was broken anyway. It will catch bad comparisons like if x is 1000 : ... which is implementation dependent. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: why () is () and [] is [] work in other way?
On 4/25/2012 5:01 PM, Steven D'Aprano wrote: On Wed, 25 Apr 2012 13:49:24 -0700, Adam Skutt wrote: Though, maybe it's better to use a different keyword than 'is' though, due to the plain English connotations of the term; I like 'sameobj' personally, for whatever little it matters. Really, I think taking away the 'is' operator altogether is better, so the only way to test identity is: id(x) == id(y) Four reasons why that's a bad idea: 1) The "is" operator is fast, because it can be implemented directly by the interpreter as a simple pointer comparison (or equivalent). This assumes that everything is, internally, an object. In CPython, that's the case, because Python is a naive interpreter and everything, including numbers, is "boxed". That's not true of PyPy or Shed Skin. So does "is" have to force the creation of a temporary boxed object? The concept of "object" vs. the implementation of objects is one reason you don't necessarily want to expose the implementation. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: why () is () and [] is [] work in other way?
On 4/22/2012 9:34 PM, Steven D'Aprano wrote: On Sun, 22 Apr 2012 12:43:36 -0700, John Nagle wrote: On 4/20/2012 9:34 PM, john.tant...@gmail.com wrote: On Friday, April 20, 2012 12:34:46 PM UTC-7, Rotwang wrote: I believe it says somewhere in the Python docs that it's undefined and implementation-dependent whether two identical expressions have the same identity when the result of each is immutable Bad design. Where "is" is ill-defined, it should raise ValueError. "is" is never ill-defined. "is" always, without exception, returns True if the two operands are the same object, and False if they are not. This is literally the simplest operator in Python. John, you've been using Python for long enough that you should know this. I can only guess that you are trolling, although I can't imagine why. Because the language definition should not be what CPython does. As PyPy advances, we need to move beyond that. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: why () is () and [] is [] work in other way?
On 4/22/2012 3:17 PM, John Roth wrote: On Sunday, April 22, 2012 1:43:36 PM UTC-6, John Nagle wrote: On 4/20/2012 9:34 PM, john.tant...@gmail.com wrote: On Friday, April 20, 2012 12:34:46 PM UTC-7, Rotwang wrote: I believe it says somewhere in the Python docs that it's undefined and implementation-dependent whether two identical expressions have the same identity when the result of each is immutable Bad design. Where "is" is ill-defined, it should raise ValueError. A worse example, one which is very implementation-dependent: http://stackoverflow.com/questions/306313/python-is-operator-behaves-unexpectedly-with-integers
a = 256
b = 256
a is b
True # this is an expected result
a = 257
b = 257
a is b
False
Operator "is" should be an error between immutables unless one is a built-in constant. ("True" and "False" should be made hard constants, like "None". You can't assign to None, but you can assign to True, usually with unwanted results. It's not clear why True and False weren't locked down when None was.) John Nagle Three points. First, since there's no obvious way of telling whether an arbitrary user-created object is immutable, trying to make "is" fail in that case would be a major change to the language. If a program fails because such a comparison becomes invalid, it was broken anyway. The idea was borrowed from LISP, which has both "eq" (pointer equality) and "equals" (compared equality). It made somewhat more sense in the early days of LISP, when the underlying representation of everything was well defined. Second: the definition of "is" states that it determines whether two objects are the same object; this has nothing to do with mutability or immutability. The id([]) == id([]) thing is a place where cPython's implementation is showing through. It won't work that way in any implementation that uses garbage collection and object compaction. I think Jython does it that way, I'm not sure about either IronPython or PyPy. 
That represents a flaw in the language design - the unexpected exposure of an implementation dependency. Third: True and False are reserved names and cannot be assigned to in the 3.x series. They weren't locked down in the 2.x series when they were introduced because of backward compatibility. That's one of the standard language designer fuckups. Somebody starts out thinking that 0 and 1 don't have to be distinguished from False and True. When they discover that they do, the backwards compatibility sucks. C still suffers from this. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: global vars across modules
On 4/22/2012 12:39 PM, mambokn...@gmail.com wrote: Question: How can I access to the global 'a' in file_2 without resorting to the whole name 'file_1.a' ? Actually, it's better to use the fully qualified name "file_1.a". Using "import *" brings in everything in the other module, which often results in a name clash. Just do import file_1 and, if desired localnamefora = file_1.a John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: why () is () and [] is [] work in other way?
On 4/20/2012 9:34 PM, john.tant...@gmail.com wrote: On Friday, April 20, 2012 12:34:46 PM UTC-7, Rotwang wrote: I believe it says somewhere in the Python docs that it's undefined and implementation-dependent whether two identical expressions have the same identity when the result of each is immutable Bad design. Where "is" is ill-defined, it should raise ValueError. A worse example, one which is very implementation-dependent: http://stackoverflow.com/questions/306313/python-is-operator-behaves-unexpectedly-with-integers
>>> a = 256
>>> b = 256
>>> a is b
True # this is an expected result
>>> a = 257
>>> b = 257
>>> a is b
False
Operator "is" should be an error between immutables unless one is a built-in constant. ("True" and "False" should be made hard constants, like "None". You can't assign to None, but you can assign to True, usually with unwanted results. It's not clear why True and False weren't locked down when None was.) John Nagle -- http://mail.python.org/mailman/listinfo/python-list
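The 256-vs-257 behavior quoted above comes from CPython's small-int cache. The sketch below reproduces it in a script; the arithmetic is done at run time through a variable to defeat compile-time constant folding, which would otherwise hide the effect. The identity results are CPython-specific, which is exactly the implementation-dependence being complained about:

```python
x = 257
small_a = x - 1   # 256: returned from CPython's small-int cache (-5..256)
small_b = x - 1
big_a = x + 0     # 257: a fresh object on each evaluation
big_b = x + 0

print(small_a is small_b)  # True on CPython (same cached object)
print(big_a is big_b)      # False on CPython (distinct objects)
print(big_a == big_b)      # True -- value equality is what code should test
```

The practical rule is the usual one: use "==" for numbers and strings, and reserve "is" for None and genuine identity checks.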
Re: Deep merge two dicts?
On 4/12/2012 10:41 AM, Roy Smith wrote: Is there a simple way to deep merge two dicts? I'm looking for Perl's Hash::Merge (http://search.cpan.org/~dmuey/Hash-Merge-0.12/Merge.pm) in Python.
def dmerge(a, b):
    for k, v in a.items():
        if isinstance(v, dict) and isinstance(b.get(k), dict):
            dmerge(v, b.pop(k))   # merge sub-dict, and remove it from b
    a.update(b)                   # so update() can't clobber the merged sub-dict
-- http://mail.python.org/mailman/listinfo/python-list
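A quick sanity check of the deep-merge idea on made-up sample data. Note the guard and the pop: without removing the recursively merged key from b, the final a.update(b) would overwrite the merged sub-dict with b's original one.

```python
def dmerge(a, b):
    """Recursively merge dict b into dict a, in place (consumes b's sub-dicts)."""
    for k, v in a.items():
        if isinstance(v, dict) and isinstance(b.get(k), dict):
            dmerge(v, b.pop(k))
    a.update(b)

a = {"x": 1, "sub": {"p": 1, "q": 2}}
b = {"y": 2, "sub": {"q": 3, "r": 4}}
dmerge(a, b)
print(a)  # {'x': 1, 'sub': {'p': 1, 'q': 3, 'r': 4}, 'y': 2}
```

Like Hash::Merge, on key conflicts b's values win; a variant that must leave b untouched would deep-copy b first.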
Re: python module development workflow
On 4/11/2012 1:04 PM, Miki Tebeka wrote: Could any expert suggest an authoritative and complete guide for developing python modules? Thanks! I'd start with http://docs.python.org/distutils/index.html Make sure that
python setup.py build
python setup.py install
works. Don't use the "rotten egg" distribution system. (http://packages.python.org/distribute/easy_install.html) John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: Donald E. Knuth in Python, cont'd
On 4/11/2012 6:03 AM, Antti J Ylikoski wrote: I wrote about a straightforward way to program D. E. Knuth in Python, and received an excellent communication about programming Deterministic Finite Automata (Finite State Machines) in Python. The following stems from my Knuth in Python programming exercises, according to that very good communication. (By Roy Smith.) I'm in the process of delving carefully into Knuth's brilliant and voluminous work The Art of Computer Programming, Parts 1--3 plus the Fascicles in Part 4 -- the back cover of Part 1 reads: "If you think you're a really good programmer -- read [Knuth's] Art of Computer Programming... You should definitely send me a résumé if you can read the whole thing." -- Bill Gates. (Microsoft may in the future receive some e-mail from me.) You don't need those books as much as you used to. You don't have to write collections, hash tables, and sorts much any more. Those are solved problems and there are good libraries. Most of the basics are built into Python. Serious programmers should read those books, much as they should read von Neumann's "First Draft of a Report on the EDVAC", for background on how things work down at the bottom. But they're no longer essential desk references for most programmers. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: Python Gotcha's?
On 4/8/2012 10:55 AM, Miki Tebeka wrote: 8. Opening a URL can result in an unexpected prompt on standard input if the URL has authentication. This can stall servers. Can you give an example? I don't think anything in the standard library does that. It's in "urllib". See http://docs.python.org/library/urllib.html "When performing basic authentication, a FancyURLopener instance calls its prompt_user_passwd() method. The default implementation asks the users for the required information on the controlling terminal. A subclass may override this method to support more appropriate behavior if needed." A related "gotcha" is knowing that "urllib" sucks and you should use "urllib2". John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: Python Gotcha's?
On 4/4/2012 3:34 PM, Miki Tebeka wrote: Greetings, I'm going to give a "Python Gotcha's" talk at work. If you have an interesting/common "Gotcha" (warts/dark corners ...) please share. (Note that I went over http://wiki.python.org/moin/PythonWarts already). Thanks, -- Miki A few Python "gotchas":
1. Nobody is really in charge of third party packages. In the Perl world, there's a central repository, CPAN, and quality control. Python's "pypi" is just a collection of links. Many major packages are maintained by one person, and if they lose interest, the package dies.
2. C extensions are closely tied to the exact version of CPython you're using, and finding a properly built version may be difficult.
3. "eggs". The "distutils" system has certain assumptions built into it about where things go, and tends to fail in obscure ways. There's no uniform way to distribute a package.
4. The syntax for expression-IF is just weird.
5. "+" as concatenation. This leads to strange numerical semantics, such as (1,2) + (3,4) is (1,2,3,4). But, for "numarray" arrays, "+" does addition. What does a mixed mode expression of a numarray and a tuple do? Guess.
5. It's really hard to tell what's messing with the attributes of a class, since anything can store into anything. This creates debugging problems.
6. Multiple inheritance is a mess. Especially "super".
7. Using attributes as dictionaries can backfire. The syntax of attributes is limited. So turning XML or HTML structures into Python objects creates problems.
8. Opening a URL can result in an unexpected prompt on standard input if the URL has authentication. This can stall servers.
9. Some libraries aren't thread-safe. Guess which ones.
10. Python 3 isn't upward compatible with Python 2.
John Nagle -- http://mail.python.org/mailman/listinfo/python-list
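Two of the listed gotchas are easy to demonstrate in a few lines, the sequence-concatenation semantics of "+" and the value-first ordering of the conditional expression:

```python
# "+" on sequences concatenates; it does not add elementwise.
print((1, 2) + (3, 4))  # (1, 2, 3, 4)
# (With a numeric-array type, "+" would instead add elementwise;
# mixing the two styles in one expression is where the surprise bites.)

# The conditional expression puts the value first, unlike C's "cond ? a : b".
n = 0
label = "empty" if n == 0 else "non-empty"
print(label)  # empty
```

The reversed reading order of the conditional expression (result, then condition, then alternative) is the "just weird" part the post is referring to.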
Re: getaddrinfo NXDOMAIN exploit - please test on CentOS 6 64-bit
On 4/2/2012 6:53 PM, John Nagle wrote: On 4/1/2012 1:41 PM, John Nagle wrote: On 4/1/2012 9:26 AM, Michael Torrie wrote: On 03/31/2012 04:58 PM, John Nagle wrote: Removed all "search" and "domain" entries from /etc/resolv.conf It's a design bug in glibc. I just submitted a bug report. http://sourceware.org/bugzilla/show_bug.cgi?id=13935 The same bug is in "dnspython". Submitted a bug report there, too. https://github.com/rthalley/dnspython/issues/6 John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: Best way to structure data for efficient searching
On 3/28/2012 11:39 AM, larry.mart...@gmail.com wrote: I have the following use case: I have a set of data that contains 3 fields, K1, K2 and a timestamp. There are duplicates in the data set, and they all have to processed. Then I have another set of data with 4 fields: K3, K4, K5, and a timestamp. There are also duplicates in that data set, and they also all have to be processed. I need to find all the items in the second data set where K1==K3 and K2==K4 and the 2 timestamps are within 20 seconds of each other. I have this working, but the way I did it seems very inefficient - I simply put the data in 2 arrays (as tuples) and then walked through the entire second data set once for each item in the first data set, looking for matches. Is there a better, more efficient way I could have done this? How big are the data sets? Millions of entries? Billions? Trillions? Will all the data fit in memory, or will this need files or a database. In-memory, it's not hard. First, decide which data set is smaller. That one gets a dictionary keyed by K1 or K3, with each entry being a list of tuples. Then go through the other data set linearly. You can also sort one database by K1, the other by K3, and match. Then take the matches, sort by K2 and K4, and match again. Sort the remaining matches by timestamp and pull the ones within the threshold. Or you can load all the data into a database with a query optimizer, like MySQL, and let it figure out, based on the index sizes, how to do the join. All of these approaches are roughly O(N log N), which beats the O(N^2) approach you have now. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
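The dictionary approach described in the reply can be sketched as follows. The sample records are made up for illustration; keying on the (K1, K2) pair directly handles both equality tests at once:

```python
from collections import defaultdict

set1 = [("a", "b", 100.0), ("a", "b", 500.0)]            # (K1, K2, ts)
set2 = [("a", "b", "x", 110.0), ("a", "c", "y", 105.0)]  # (K3, K4, K5, ts)

# Index the smaller set by its key pair.
index = defaultdict(list)
for k1, k2, ts in set1:
    index[(k1, k2)].append(ts)

# One linear pass over the other set.
matches = []
for k3, k4, k5, ts2 in set2:
    for ts1 in index.get((k3, k4), ()):
        if abs(ts1 - ts2) <= 20.0:
            matches.append((k3, k4, k5, ts1, ts2))

print(matches)  # [('a', 'b', 'x', 100.0, 110.0)]
```

If individual key pairs have many duplicate timestamps, sorting each timestamp list and scanning a 20-second window with bisect keeps the per-lookup cost logarithmic instead of linear.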
Re: getaddrinfo NXDOMAIN exploit - please test on CentOS 6 64-bit
On 4/1/2012 1:41 PM, John Nagle wrote: On 4/1/2012 9:26 AM, Michael Torrie wrote: On 03/31/2012 04:58 PM, John Nagle wrote: Removed all "search" and "domain" entries from /etc/resolv.conf It's a design bug in glibc. I just submitted a bug report. http://sourceware.org/bugzilla/show_bug.cgi?id=13935 It only appears if you have a machine with a two-component domain name ending in ".com" as the actual machine name. Most hosting services generate some long arbitrary name as the primary name, but I happen to have a server set up as "companyname.com". The default rule for looking up domains in glibc is that the "domain" is everything after the FIRST ".". Failed lookups are retried with that "domain" appended. The idea, back in the 1980s, was that if you're on "foo.bigcompany.com", and look up "bar", it's looked up as "bar.bigcompany.com". This idea backfires when the actual hostname only has two components, and the search just appends ".com". There is a "com.com" domain, and this gets them traffic. They exploit this to send you (where else) to an ad-heavy page. Try "python.com.com", for example, and you'll get an ad for a Java database. The workaround in Python is to add the AI_CANONNAME flag to getaddrinfo calls, then check that the returned domain name matches the one put in. That workaround won't work for some domains. For example,
>>> socket.getaddrinfo(s,"http",0,0,socket.SOL_TCP,socket.AI_CANONNAME)
[(2, 1, 6, 'orig-10005.themarker.cotcdn.net', ('208.93.137.80', 80))]
Nor will adding options to /etc/resolv.conf work well, because that file is overwritten by some system administration programs. I may have to bring in "dnspython" to get a reliable DNS lookup. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: Will MySQL ever be supported for Python 3.x?
On 3/31/2012 10:54 PM, Tim Roberts wrote: John Nagle wrote: On 3/30/2012 2:32 PM, Irmen de Jong wrote: Try Oursql instead http://packages.python.org/oursql/ "oursql is a new set of MySQL bindings for python 2.4+, including python 3.x" Not even close to being compatible with existing code. Every SQL statement has to be rewritten, with the parameters expressed differently. It's a good approach, but very incompatible. Those changes can be automated, given an adequate editor. "Oursql" is a far better product than the primitive MySQLdb wrapper. It is worth the trouble. It's an interesting approach. As it matures and a few big sites use it, it will become worth looking at. The emphasis on server-side buffering seems strange. Are there benchmarks indicating this is worth doing? Does it keep transactions locked longer? This bug report https://answers.launchpad.net/oursql/+question/191256 indicates a performance problem. I'd expect server side buffering to slow things down. Usually, you want to drain results out of the server as fast as possible, then close out the command, releasing server resources and locks. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: [OT] getaddrinfo NXDOMAIN exploit - please test on CentOS 6 64-bit
On 4/1/2012 9:26 AM, Michael Torrie wrote: On 03/31/2012 04:58 PM, John Nagle wrote: If you can make this happen, report back the CentOS version and the library version, please. CentOS release 6.2 (Final) glibc-2.12-1.47.el6_2.9.x86_64 example does not ping example.com does not resolve to example.com.com Removed all "search" and "domain" entries from /etc/resolv.conf It's a design bug in glibc. I just submitted a bug report. http://sourceware.org/bugzilla/show_bug.cgi?id=13935 It only appears if you have a machine with a two-component domain name ending in ".com" as the actual machine name. Most hosting services generate some long arbitrary name as the primary name, but I happen to have a server set up as "companyname.com". The default rule for looking up domains in glibc is that the "domain" is everything after the FIRST ".". Failed lookups are retried with that "domain" appended. The idea, back in the 1980s, was that if you're on "foo.bigcompany.com", and look up "bar", it's looked up as "bar.bigcompany.com". This idea backfires when the actual hostname only has two components, and the search just appends ".com". There is a "com.com" domain, and this gets them traffic. They exploit this to send you (where else) to an ad-heavy page. Try "python.com.com", for example, and you'll get an ad for a Java database. The workaround in Python is to add the AI_CANONNAME flag to getaddrinfo calls, then check that the returned domain name matches the one put in. 
Good case: >>> s = "python.org" >>> socket.getaddrinfo(s, 80, 0,0, 0, socket.AI_CANONNAME) [(2, 1, 6, 'python.org', ('82.94.164.162', 80)), (2, 2, 17, '', ('82.94.164.162', 80)), (2, 3, 0, '', ('82.94.164.162', 80)), (10, 1, 6, '', ('2001:888:2000:d::a2', 80, 0, 0)), (10, 2, 17, '', ('2001:888:2000:d::a2', 80, 0, 0)), (10, 3, 0, '', ('2001:888:2000:d::a2', 80, 0, 0))] Bad case: >>> s = "noexample.com" >>> socket.getaddrinfo(s, 80, 0,0, 0, socket.AI_CANONNAME) [(2, 1, 6, 'phx1-ss-2-lb.cnet.com', ('64.30.224.112', 80)), (2, 2, 17, '', ('64.30.224.112', 80)), (2, 3, 0, '', ('64.30.224.112', 80))] Note that what went in isn't what came back. getaddrinfo has been pwned. Again, you only get this if you're on a machine whose primary host name is "something.com", with exactly two components ending in ".com". John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: getaddrinfo NXDOMAIN exploit - please test on CentOS 6 64-bit
On 3/31/2012 9:26 PM, Owen Jacobson wrote: On 2012-03-31 22:58:45 +0000, John Nagle said: Some versions of CentOS 6 seem to have a potential getaddrinfo exploit. See To test, try this from a command line: ping example If it fails, good. If it returns pings from "example.com", bad. The getaddrinfo code is adding ".com" to the domain. There is insufficient information in your diagnosis to make that conclusion. For example: what network configuration services (DHCP clients and whatnot, along with various desktop-mode configuration tools and services) are running? What kernel and libc versions are you running? What are the contents of /etc/nsswitch.conf? Of /etc/resolv.conf (particularly, the 'search' entries)? What do /etc/hosts, LDAP, NIS+, or other hostname services say about the names you're resolving? Does a freestanding C program that directly calls getaddrinfo and that runs in a known-good loader environment exhibit the same surprises? Name resolution is not so simple that you can conclude "getaddrinfo is misbehaving" from the behaviour of ping, or of your Python sample, alone. In any case, this seems more appropriate for a Linux or a CentOS newsgroup/mailing list than a Python one. Please do not reply to this post in comp.lang.python. -o I expected that some noob would have a reply like that. A more detailed discussion appears here: http://serverfault.com/questions/341383/possible-nxdomain-hijacking John Nagle -- http://mail.python.org/mailman/listinfo/python-list
getaddrinfo NXDOMAIN exploit - please test on CentOS 6 64-bit
Some versions of CentOS 6 seem to have a potential getaddrinfo exploit. See To test, try this from a command line: ping example If it fails, good. If it returns pings from "example.com", bad. The getaddrinfo code is adding ".com" to the domain. If that returns pings, please try ping noexample.com There is no "noexample.com" domain in DNS. This should time out. But if you get ping replies from a CNET site, let me know. Some implementations try "noexample.com", get a NXDOMAIN error, and try again, adding ".com". This results in a ping of "noexample.com.com". "com.com" is a real domain, run by a unit of CBS, and they have their DNS set up to catch all subdomains and divert them to, inevitably, an ad-oriented junk search page. (You can view the junk page at "http://slimeball.com.com". Replace "slimeball" with anything else you like; it will still resolve.) If you find a case where "ping noexample.com" returns a reply, then try it in Python: import socket socket.getaddrinfo("noexample.com", 80) That should return an error. If it returns the IP address of CNET's ad server, there's trouble. This isn't a problem with the upstream DNS. Usually, this sort of thing means you're using some sleazy upstream DNS provider like Comcast. That's not the case here. "host" and "nslookup" aren't confused. Only programs that use getaddrinfo, like "ping", "wget", and Python, have this ".com" appending thing. Incidentally, if you try "noexample.net", there's no problem, because the owner of "net.com" hasn't set up their DNS to exploit this. And, of course, it has nothing to do with browser toolbars. This is at a much lower level. If you can make this happen, report back the CentOS version and the library version, please. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
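The manual test above can be wrapped into a small script (a minimal sketch; `lookup_is_hijacked` is a hypothetical name, and the helper only mirrors the two-line interactive test in the post):

```python
import socket

def lookup_is_hijacked(bogus_host="noexample.com"):
    """Return True if a name that should not exist resolves anyway.

    On an affected system the resolver retries the failed lookup with
    ".com" appended, so the bogus name comes back with an address
    (CNET's ad server, in the case described above) instead of raising
    socket.gaierror as a healthy resolver would.
    """
    try:
        socket.getaddrinfo(bogus_host, 80)
    except socket.gaierror:
        return False  # lookup failed as it should: resolver looks healthy
    return True       # the bogus name resolved: something rewrote it
```

Running `lookup_is_hijacked()` on a suspect box and on a known-good one makes the comparison reproducible without eyeballing ping output.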
Re: Will MySQL ever be supported for Python 3.x?
On 3/30/2012 2:32 PM, Irmen de Jong wrote: Try Oursql instead http://packages.python.org/oursql/ "oursql is a new set of MySQL bindings for python 2.4+, including python 3.x" Not even close to being compatible with existing code. Every SQL statement has to be rewritten, with the parameters expressed differently. It's a good approach, but very incompatible. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Will MySQL ever be supported for Python 3.x?
The MySQLdb entry on SourceForge (http://sourceforge.net/projects/mysql-python/) web site still says the last supported version of Python is 2.6. PyPi says the last supported version is Python 2.5. The last download is from 2007. I realize there are unsupported fourth-party versions from other sources. (http://www.lfd.uci.edu/~gohlke/pythonlibs/) But those are just blind builds; they haven't been debugged. MySQL Connector (http://forge.mysql.com/projects/project.php?id=302) is still pre-alpha. John Nagle -- http://mail.python.org/mailman/listinfo/python-list