Re: [Python-Dev] very bad network performance
On Mon, Apr 21, 2008 at 8:10 PM, Gregory P. Smith [EMAIL PROTECTED] wrote: The 64K hunch is wrong. The system limit can be found using getsockopt(...SO_RCVBUF...). It can easily be (and often is) set to many megabytes, either at a system default level or on a per-socket level by the user using setsockopt. When the system default is that large, limiting by the system limit would not help the 10mb read case. But it would help in the 100mb read case. Even smaller allocations like 64K cause problems, as mentioned in issue 1092502, which links to this Twisted bug: http://twistedmatrix.com/trac/ticket/1079. Twisted's solution was to make the string object returned by a recv as short-lived as possible by copying it into a StringIO. We could do the same in _fileobject.read() and readline(). This approach looks reasonable to me. - Ralf ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
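The approach discussed above (cap each recv() at the socket's SO_RCVBUF and copy every chunk into a buffer right away, so the short-lived string can be freed) might look roughly like this. It is a minimal sketch in modern Python, assuming io.BytesIO in place of the StringIO of the time; `recv_bounded` is a hypothetical helper, not the actual socket._fileobject code.

```python
import io
import socket


def recv_bounded(sock, size):
    """Hypothetical helper: read up to `size` bytes from `sock`.

    Each recv() request is capped at the socket's SO_RCVBUF, so we never
    ask for one huge allocation, and each chunk is copied into a BytesIO
    right away so the temporary bytes object can be released promptly.
    Returns fewer than `size` bytes if the peer closes the connection.
    """
    # Fall back to 8 KiB in the unlikely case the OS reports 0.
    rcvbuf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF) or 8192
    buf = io.BytesIO()
    remaining = size
    while remaining > 0:
        chunk = sock.recv(min(remaining, rcvbuf))
        if not chunk:  # EOF: peer closed the connection
            break
        buf.write(chunk)
        remaining -= len(chunk)
    return buf.getvalue()
```

A caller would use it as `data = recv_bounded(conn, 10 * 1024 * 1024)`; the request size per call never exceeds what the kernel buffer can hold anyway.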
Re: [Python-Dev] A smarter shutil.copytree ?
On Mon, Apr 21, 2008 at 2:25 AM, Steven Bethard [EMAIL PROTECTED] wrote: On Sun, Apr 20, 2008 at 4:15 PM, Tarek Ziadé [EMAIL PROTECTED] wrote: I have submitted a patch for review here: http://bugs.python.org/issue2663 glob-style patterns or a callable (for complex cases) can be provided to filter out files or directories. I'm not a big fan of the sequence-or-callable argument. Why not just make it a callable argument, and supply a utility function so that you can write something like:: exclude_func = shutil.excluding_patterns('*.tmp', 'test_dir2') shutil.copytree(src_dir, dst_dir, exclude=exclude_func) ? I made another draft based on a single callable argument to try out: http://bugs.python.org/file10073/shutil.copytree.filtering.patch The callable takes the src directory + its content as a list, and returns the names eligible for exclusion. That makes me wonder, like Alexander said on the bug tracker: in the glob-style patterns callable, do we want to deal with absolute paths? Tarek
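A minimal sketch of the single-callable design, assuming the callable receives the source directory and the list of names it contains and returns the names to skip. `excluding_patterns` is the hypothetical helper name from Steven's message, built here on `fnmatch.filter`; it is not an existing shutil function.

```python
import fnmatch


def excluding_patterns(*patterns):
    """Hypothetical factory from the proposal in this thread.

    Returns a callable taking (src_dir, names) and returning the set of
    names that match any of the glob-style `patterns`. Only the bare
    names are matched; src_dir is accepted but not inspected.
    """
    def _exclude(src_dir, names):
        excluded = set()
        for pattern in patterns:
            excluded.update(fnmatch.filter(names, pattern))
        return excluded
    return _exclude
```

Because the callable only sees bare names, matching on absolute paths would require joining `src_dir` with each name, which is exactly the question Tarek raises.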
Re: [Python-Dev] r62342 - python/branches/py3k/Objects/bytesobject.c
Neal Norwitz wrote: I haven't seen any action on 3to2 (although I'm very behind on email). Stefan, could you try to implement some of these and report back how it works? No, sorry, that's too low a priority for me currently. Stefan
Re: [Python-Dev] Encoding detection in the standard library?
On 2008-04-21 23:31, Martin v. Löwis wrote: This is useful when you get a hunk of data which _should_ be some sort of intelligible text from the Big Scary Internet (say, a posted web form or email message), and you want to do something useful with it (say, search the content). I don't think that should be part of the standard library. People will mistake what it tells them for certain. +1 I also think that it's better to educate people to add (correct) encoding information to their text data, rather than give them a guess mechanism... http://chardet.feedparser.org/docs/faq.html#faq.yippie chardet is based on the Mozilla algorithm and at least in my experience that algorithm doesn't work too well. The Mozilla algorithm may work for Asian encodings due to the fact that those encodings are usually also bound to a specific language (and you can then use character and word frequency analysis), but for encodings which can encode far more than just a single language (e.g. UTF-8 or Latin-1), the correct detection rate is rather low. The problem becomes even more difficult when leaving the normal text domain or when mixing languages in the same text, e.g. when trying to detect source code with comments using a non-ASCII encoding. The trick to just pass the text through a codec and see whether it round-trips also doesn't necessarily help: Latin-1, for example, will always round-trip, since every byte value maps to a Latin-1 character (Latin-1 covers exactly the first 256 Unicode code points). IMHO, more research has to be done into this area before a standard module can be added to Python's stdlib... and who knows, perhaps we'll be lucky and by then everyone will be using UTF-8 anyway :-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Apr 22 2008) Python/Zope Consulting and Support ...http://www.egenix.com/ mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/ Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! 
eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611
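To illustrate the point about the round-trip trick, here is a small sketch (the `roundtrips` helper is hypothetical). Any byte string decodes and re-encodes losslessly as Latin-1, including data that is really UTF-8, so a successful round trip says nothing about the actual encoding.

```python
def roundtrips(data: bytes, encoding: str) -> bool:
    """Return True if `data` survives decode() followed by encode()."""
    try:
        return data.decode(encoding).encode(encoding) == data
    except UnicodeError:
        return False


utf8_bytes = "Grüße".encode("utf-8")
print(roundtrips(utf8_bytes, "utf-8"))            # True
print(roundtrips(utf8_bytes, "latin-1"))          # True, yet the text is mojibake
print(roundtrips(bytes(range(256)), "latin-1"))   # True for every byte value
print(roundtrips(bytes(range(256)), "utf-8"))     # False: lone 0x80 bytes are invalid
```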
[Python-Dev] pydoc works with eggs? (python-2.5.1)
pydoc blew up when I tried to view the docs for the pytools module, which is an egg:
pydoc -p 8082
pydoc server ready at http://localhost:8082/
Exception happened during processing of request from ('127.0.0.1', 52915)
Traceback (most recent call last):
  File "/usr/lib64/python2.5/SocketServer.py", line 222, in handle_request
    self.process_request(request, client_address)
  File "/usr/lib64/python2.5/SocketServer.py", line 241, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib64/python2.5/SocketServer.py", line 254, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib64/python2.5/SocketServer.py", line 522, in __init__
    self.handle()
  File "/usr/lib64/python2.5/BaseHTTPServer.py", line 316, in handle
    self.handle_one_request()
  File "/usr/lib64/python2.5/BaseHTTPServer.py", line 310, in handle_one_request
    method()
  File "/usr/lib64/python2.5/pydoc.py", line 1924, in do_GET
    self.send_document(describe(obj), html.document(obj, path))
  File "/usr/lib64/python2.5/pydoc.py", line 321, in document
    if inspect.ismodule(object): return self.docmodule(*args)
  File "/usr/lib64/python2.5/pydoc.py", line 672, in docmodule
    contents.append(self.document(value, key, name, fdict, cdict))
  File "/usr/lib64/python2.5/pydoc.py", line 322, in document
    if inspect.isclass(object): return self.docclass(*args)
  File "/usr/lib64/python2.5/pydoc.py", line 807, in docclass
    lambda t: t[1] == 'method')
  File "/usr/lib64/python2.5/pydoc.py", line 735, in spill
    funcs, classes, mdict, object))
  File "/usr/lib64/python2.5/pydoc.py", line 323, in document
    if inspect.isroutine(object): return self.docroutine(*args)
  File "/usr/lib64/python2.5/pydoc.py", line 891, in docroutine
    getdoc(object), self.preformat, funcs, classes, methods)
  File "/usr/lib64/python2.5/pydoc.py", line 79, in getdoc
    result = inspect.getdoc(object) or inspect.getcomments(object)
  File "/usr/lib64/python2.5/inspect.py", line 521, in getcomments
    lines, lnum = findsource(object)
  File "/usr/lib64/python2.5/inspect.py", line 510, in findsource
    if pat.match(lines[lnum]): break
IndexError: list index out of range
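Until the underlying inspect/pydoc bug is fixed, one possible workaround for tools that walk modules the way pydoc does is to treat a failed source lookup as "no comments" instead of crashing. This is a sketch under the assumption that losing the comments is acceptable; `safe_getcomments` is a hypothetical wrapper, and the exact exceptions `inspect.getcomments` can raise vary between Python versions.

```python
import inspect


def safe_getcomments(obj):
    """Hypothetical wrapper around inspect.getcomments().

    The traceback shows inspect.findsource() indexing past the end of
    the source lines (IndexError) for an object loaded from an egg.
    Catch that, plus the errors raised when no source is available,
    and return None instead of propagating the exception.
    """
    try:
        return inspect.getcomments(obj)
    except (IndexError, OSError, TypeError):
        return None
```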
Re: [Python-Dev] Encoding detection in the standard library?
IMHO, more research has to be done into this area before a standard module can be added to Python's stdlib... and who knows, perhaps we'll be lucky and by then everyone will be using UTF-8 anyway :-) I walked over to our computational linguistics group and asked. This is often combined with language guessing (which uses a similar approach, but using characters instead of bytes), and apparently can usually be done with high confidence. Of course, they're usually looking at clean texts, not random stuff. I'll see if I can get some references and report back -- most of the research on this was done in the 90's. Bill
Re: [Python-Dev] configure error: rm: conftest.dSYM: is a directory
On 5 Apr, 2008, at 21:17, [EMAIL PROTECTED] wrote: I just noticed this error message during configure:
checking whether gcc accepts -Olimit 1500... no
checking whether gcc supports ParseTuple __format__... no
checking whether pthreads are available without options... yes
checking whether g++ also accepts flags for thread support... no
checking for ANSI C header files... rm: conftest.dSYM: is a directory
rm: conftest.dSYM: is a directory
yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
Note the "rm: conftest.dSYM: is a directory". This occurred a few times during the configure process. Didn't cause it to conk out, but is annoying. I've looked into this issue. It is harmless and caused by an interaction between AC_TRY_RUN and gcc on Leopard. Gcc generates '.dSYM' directories when linking with debugging enabled. These directories contain detached debugging information (see man dsymutil). AC_TRY_RUN tries to remove 'conftest.*' using rm, without the -r flag. The end result is an error message during configure and a leftover 'conftest.dSYM' turd. AFAIK this is not easily fixed without changing the definition of AC_TRY_RUN, at least not without crude hacks. Ronald
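Since the leftover is just a directory that `rm` without `-r` cannot delete, it can be cleaned up by hand after configure (from the shell, `rm -rf conftest.dSYM` does it). A minimal Python sketch, assuming the leftovers match `conftest*.dSYM` in the build directory; `remove_dsym_leftovers` is a hypothetical helper.

```python
import glob
import os
import shutil


def remove_dsym_leftovers(build_dir="."):
    """Hypothetical helper: delete conftest*.dSYM directories.

    These hold the detached debug info gcc writes on Leopard; a plain
    `rm` (or os.remove) refuses to delete directories, so shutil.rmtree
    is used instead. Returns the list of paths removed.
    """
    pattern = os.path.join(glob.escape(build_dir), "conftest*.dSYM")
    removed = []
    for path in sorted(glob.glob(pattern)):
        if os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(path)
    return removed
```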
Re: [Python-Dev] Encoding detection in the standard library?
On 22-Apr-08, at 12:30 AM, Martin v. Löwis wrote: IMO, encoding estimation is something that many web programs will have to deal with Can you please explain why that is? Web programs should not normally have the need to detect the encoding; instead, it should be specified always - unless you are talking about browsers specifically, which need to support web pages that specify the encoding incorrectly. Two cases come immediately to mind: email and web forms. When a web browser POSTs data, there is no standard way of communicating which encoding it's using. There are some hints which make it easier (accept-charset attributes, the encoding used to send the page to the browser), but no guarantees. Email is a smaller problem, because it usually has a helpful content-type header, but that's no guarantee. Now, at the moment, the only data I have to support this claim is my experience with DrProject in non-English locations. If I'm the only one who has had these sorts of problems, I'll go back to Unicode for Dummies. so it might as well be built in; I would prefer the option to run `text=input.decode('guess')` (or something similar) than relying on an external dependency or worse yet using a hand-rolled algorithm. Ok, let me try differently then. Please feel free to post a patch to bugs.python.org, and let other people rip it apart. For example, I don't think it should be a codec, as I can't imagine it working on streams. As things frequently are, it seems like this is a much larger problem than I originally believed. I'll go back and take another look at the problem, then come back if new revelations appear.
Re: [Python-Dev] A smarter shutil.copytree ?
On Tue, Apr 22, 2008 at 1:56 AM, Tarek Ziadé [EMAIL PROTECTED] wrote: On Mon, Apr 21, 2008 at 2:25 AM, Steven Bethard [EMAIL PROTECTED] wrote: On Sun, Apr 20, 2008 at 4:15 PM, Tarek Ziadé [EMAIL PROTECTED] wrote: I have submitted a patch for review here: http://bugs.python.org/issue2663 glob-style patterns or a callable (for complex cases) can be provided to filter out files or directories. I'm not a big fan of the sequence-or-callable argument. Why not just make it a callable argument, and supply a utility function so that you can write something like:: exclude_func = shutil.excluding_patterns('*.tmp', 'test_dir2') shutil.copytree(src_dir, dst_dir, exclude=exclude_func) I made another draft based on a single callable argument to try out: http://bugs.python.org/file10073/shutil.copytree.filtering.patch The callable takes the src directory + its content as a list, and returns the names eligible for exclusion. FWIW, that looks better to me. That makes me wonder, like Alexander said on the bug tracker: in the glob-style patterns callable, do we want to deal with absolute paths? I think that it would be okay to document that shutil.ignore_patterns() only accepts patterns matching individual filenames (not complex paths). If someone needs to do something with absolute paths, then they can write their own 'ignore' function, right? Steve -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy
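For reference, released Python versions (2.6 onward) give `shutil.copytree` an `ignore` callable, invoked as `ignore(src, names)` for each directory visited, and `shutil.ignore_patterns` builds one from glob patterns matched against bare names, e.g. `shutil.copytree(src_dir, dst_dir, ignore=shutil.ignore_patterns('*.tmp', 'test_dir2'))`. Steve's suggestion of writing your own function for absolute paths can be sketched as follows; `ignore_absolute` is a hypothetical helper, not part of shutil.

```python
import os
import shutil


def ignore_absolute(*excluded_paths):
    """Hypothetical ignore factory that excludes entries by absolute path.

    copytree calls ignore(src, names) once per directory it visits; we
    rebuild each entry's absolute path and compare it against the set.
    """
    excluded = {os.path.abspath(p) for p in excluded_paths}

    def _ignore(src, names):
        return [name for name in names
                if os.path.abspath(os.path.join(src, name)) in excluded]
    return _ignore
```

It is passed the same way: `shutil.copytree(src_dir, dst_dir, ignore=ignore_absolute('/abs/path/to/skip'))`.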
Re: [Python-Dev] Encoding detection in the standard library?
When a web browser POSTs data, there is no standard way of communicating which encoding it's using. That's just not true. Web browsers should and do use the encoding of the web page that originally contained the form. There are some hints which make it easier (accept-charset attributes, the encoding used to send the page to the browser), but no guarantees. Not true. The latter is guaranteed (unless you assume bugs - but if you do, can you present a specific browser that has that bug?) Email is a smaller problem, because it usually has a helpful content-type header, but that's no guarantee. Then assume windows-1252. Mailers that don't use MIME for non-ASCII characters mostly died 10 years ago; those people who continue to use them likely can accept occasional mojibake (or else they would have switched long ago). Now, at the moment, the only data I have to support this claim is my experience with DrProject in non-English locations. If I'm the only one who has had these sorts of problems, I'll go back to Unicode for Dummies. For web forms, I always encode the pages in UTF-8, and that always works. For email, I once added encoding processing to pipermail (the Mailman archiver), and that also always works. I'll go back and take another look at the problem, then come back if new revelations appear. Good luck! Martin
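Martin's two recipes can be sketched with the standard library: decode a form submission using the encoding the page was served in, and take an email's charset from its Content-Type header, falling back to windows-1252 when none is declared. Both helpers are hypothetical, and the sketch assumes a URL-encoded (application/x-www-form-urlencoded) form body and a single-part message.

```python
import email
from urllib.parse import parse_qs


def decode_form(body: bytes, page_encoding: str = "utf-8"):
    """Decode a urlencoded POST body using the encoding of the form's page."""
    # The percent-escaped body itself is plain ASCII; parse_qs decodes
    # the escaped bytes with the page's encoding.
    return parse_qs(body.decode("ascii"), encoding=page_encoding)


def decode_email_body(raw: bytes, fallback: str = "windows-1252") -> str:
    """Decode a single-part message, trusting its declared charset."""
    msg = email.message_from_bytes(raw)
    charset = msg.get_content_charset(fallback)
    payload = msg.get_payload(decode=True)
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:  # unknown charset name in the header
        return payload.decode(fallback, errors="replace")
```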
Re: [Python-Dev] BSDDB3
Hi Jesus, Martin v. Löwis wrote: | I think it would be helpful if you could analyze the crashes that | bsddb caused on Windows. Just go back a few revisions in the | subversion tree to reproduce the crashes. I have no MS Windows machines in my environment :-( I remember those rampant BSDDB crashes on Windows well. I brought this up with Martin at PyCon; I really don't think we can fault BSDDB here -- basically, the tests weren't cleaning up their environment in the right order, so BSDDB was getting passed completely and utterly bogus values. I *think* I managed to persuade Martin that this was indeed our fault, and we can't really hold BSDDB accountable. (My argument being that if a 3rd party app says the behaviour of a method is undefined if you pass it a null pointer, and you pass it a null pointer, and it crashes your program, it's your fault, not theirs.) Once this was addressed, the BSDDB tests ran more or less on Windows 32-bit without error. Windows x64 was another matter though -- I traced the problem down to wildly conflicting compiler and linker flags between our Python build and how we were building BSDDB (or rather how BSDDB builds out of the box on Windows). My solution was to drop our reliance on the Berkeley_DB.sln/db_static.vcproj files completely, and mimic a bsddb44 vcproj in our own pcbuild.sln, which basically meant all the BSDDB source code got built in the exact same fashion as the rest of Python. I also took this approach with sqlite3 and it's worked really well -- there have been no issues with either module since this change. I've also got bsddb45.vcproj and bsddb46.vcproj projects floating around in one of my local branches somewhere. These mimic the corresponding BSDDB projects, with the intent being that when it comes to release time for 2.6 and 3.0, we'd make a decision about which one to ship with, and then set the Python _bsddb module to use that. I should probably pick that up again... Hope this clarifies things... Regards, Trent. 
Re: [Python-Dev] 3k checkin mails to python-checkins
For a few days now, checkin notifications for the 3k branch seem to be sent to both the python-checkins and the python-3000-checkins lists. Was that a deliberate decision or has some bug crept into the SVN hook? This should be fixed now. The new mailer.py had named some config options differently from the old one. Regards, Martin
Re: [Python-Dev] Encoding detection in the standard library?
On 22-Apr-08, at 3:31 AM, M.-A. Lemburg wrote: I don't think that should be part of the standard library. People will mistake what it tells them for certain. +1 I also think that it's better to educate people to add (correct) encoding information to their text data, rather than give them a guess mechanism... That is a fallacious alternative: the programmers that need encoding detection are not the same people who are omitting encoding information. I only have a small opinion on whether charset detection should appear in the stdlib, but I am somewhat perplexed by the arguments in this thread. I don't see how inclusion in the stdlib would make people more inclined to think that the algorithm is always correct. In terms of the need for this functionality: Martin wrote: Can you please explain why that is? Web programs should not normally have the need to detect the encoding; instead, it should be specified always - unless you are talking about browsers specifically, which need to support web pages that specify the encoding incorrectly. Any program that needs to examine the contents of documents/feeds/whatever on the web needs to deal with incorrectly-specified encodings (which, sadly, is rather common). The set of programs that need this functionality is probably the same set that needs BeautifulSoup--I think that set is larger than just browsers grin -Mike
Re: [Python-Dev] Encoding detection in the standard library?
[CCing python-dev again] On 2008-04-22 12:38, Greg Wilson wrote: I don't think that should be part of the standard library. People will mistake what it tells them for certain. [etc] These are all good arguments, but the fact remains that we can't control our inputs (e.g., we're archiving mail messages sent to lists managed by DrProject), and some of those inputs *don't* tell us how they're encoded. Under those circumstances, what would you recommend? I haven't done much research into this, but in general, I think it's better to:
* first try to look at other characteristics of a text message, e.g. language, origin, topic, etc.,
* then narrow down the number of encodings which could apply,
* rank them to try to avoid ambiguities and
* then try to see what percentage of the text you can decode using each of the encodings in reverse ranking order (i.e. more specialized encodings should be tested first, latin-1 last).
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Apr 22 2008)
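The last two steps of that recipe (rank the candidate encodings, then measure how much of the text each one decodes) might look like the sketch below. The candidate list is assumed to come from the earlier narrowing steps, which are not shown; `best_decoding` is a hypothetical name, and counting U+FFFD after decoding with `errors="replace"` is just one simple way to estimate "what percentage of the text you can decode".

```python
def best_decoding(data: bytes, candidates):
    """Hypothetical helper: pick an encoding from a ranked candidate list.

    `candidates` is ordered from most specialized to most generic, with
    latin-1 last (it never fails). Each candidate is scored by the share
    of decoded characters that are not U+FFFD replacement characters.
    The first candidate that decodes everything wins; otherwise the
    best-scoring one is returned. Returns (encoding, score).
    """
    best = (candidates[0], -1.0)
    for encoding in candidates:
        text = data.decode(encoding, errors="replace")
        if not text:  # empty input: nothing to disambiguate
            return encoding, 1.0
        score = 1.0 - text.count("\ufffd") / len(text)
        if score == 1.0:
            return encoding, score
        if score > best[1]:
            best = (encoding, score)
    return best
```

Note that genuine U+FFFD characters in the input would be counted as failures; a real implementation would need to account for that.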
Re: [Python-Dev] Known doctest bug with unicode?
-On [20080418 18:05], Adam Olsen ([EMAIL PROTECTED]) wrote: 4. Make doctest smarter, so that it can grab the original module's encoding. 5. Wait until 3.0, where this is hopefully fixed by making doctests use unicode by default? Getting rid of the u prefix in front of the strings, as required, did indeed make Python 3 run the doctests as it should. So there's a difference in behaviour between 2.x and 3.0 when it comes to this part. I guess the better behaviour would be for doctest to honour the encoding specified in the file/module? If other people agree, I can see what I can do to make that work. -- Jeroen Ruigrok van der Werven asmodai(-at-)in-nomine.org / asmodai イェルーン ラウフロック ヴァン デル ウェルヴェン http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B Confutatis maledictis, flammis acribus addictis...
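For text-file doctests, `doctest.testfile` already accepts an `encoding` argument, so one way to honour a file's encoding is to pass it explicitly. A minimal Python 3 sketch; the temporary file and the `run_latin1_doctest` helper are purely illustrative, and automatically picking up a module's own coding declaration, as proposed above, is not shown.

```python
import doctest
import os
import tempfile

# A doctest containing a non-ASCII literal, saved as Latin-1 below.
DOCTEST_TEXT = '>>> len("café")\n4\n'


def run_latin1_doctest():
    """Write DOCTEST_TEXT as Latin-1 and run it with a matching encoding."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w", encoding="latin-1") as f:
            f.write(DOCTEST_TEXT)
        # module_relative=False: `path` is a plain filesystem path.
        return doctest.testfile(path, module_relative=False,
                                encoding="latin-1", report=False,
                                verbose=False)
    finally:
        os.remove(path)
```

Reading the same file without `encoding="latin-1"` would fail, because the default UTF-8 decoding rejects the lone 0xE9 byte.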
Re: [Python-Dev] Encoding detection in the standard library?
Can you please explain why that is? Web programs should not normally have the need to detect the encoding; instead, it should be specified always - unless you are talking about browsers specifically, which need to support web pages that specify the encoding incorrectly. Any program that needs to examine the contents of documents/feeds/whatever on the web needs to deal with incorrectly-specified encodings That's not true. Most programs that need to examine the contents of a web page don't need to guess the encoding. In most such programs, the encoding can be hard-coded if the declared encoding is not correct. Most such programs *know* what page they are webscraping, or else they couldn't extract the information out of it that they want to get at. As for feeds - can you give examples of incorrectly encoded ones? (I don't ever use feeds, so I honestly don't know whether they are typically encoded incorrectly. I've heard they are often XML, in which case I strongly doubt they are incorrectly encoded.) As for "whatever" - can you give specific examples? (which, sadly, is rather common). The set of programs that need this functionality is probably the same set that needs BeautifulSoup--I think that set is larger than just browsers grin Again, can you give *specific* examples that are not web browsers? Programs needing BeautifulSoup may still not need encoding guessing, since they still might be able to hard-code the encoding of the web page they want to process. In any case, I'm very skeptical that a general guess encoding module would do a meaningful thing when applied to incorrectly encoded HTML pages. Regards, Martin
[Python-Dev] py3k: print function treats sep=None and end=None in an unintuitive way
Can anybody please explain why print('a', 'b', sep=None, end=None) should produce 'a b\n' instead of 'ab'? I've read http://docs.python.org/dev/3.0/library/functions.html#print, PEP 3105 and some ml threads but did not find a good reason justifying such a strange behaviour. Thanks. -Alessandro Guido
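Whatever the rationale, the documented Python 3 behaviour is that `sep=None` and `end=None` mean "use the defaults" (a space and a newline), so getting `'ab'` requires passing empty strings explicitly. A small sketch that captures `print` output to show the difference (`render` is a hypothetical helper):

```python
import io


def render(*args, **kwargs):
    """Return what print(*args, **kwargs) would write, as a string."""
    buf = io.StringIO()
    print(*args, file=buf, **kwargs)
    return buf.getvalue()


print(repr(render("a", "b", sep=None, end=None)))  # 'a b\n'
print(repr(render("a", "b", sep="", end="")))      # 'ab'
```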
Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you
On Tue, 2008-04-08 at 10:01 -0700, zooko wrote: They both agreed that it made perfect sense. I told one of them about the alternate proposal to define a new database file to contain a list of installed packages, and he sighed and rolled his eyes and said "So they are planning to reinvent apt!" When I wear my sysadmin hat, eggs become a nuisance. They are not listed in the system packages; if zipped they won't work when the apache user tries to import them; easy_install can produce unexpected upgrades. The system package manager (apt or yum) is much preferred. As a developer, eggs are great. If a python module is not already available from my system packagers, easy_install will find it, get it, and install it. I waste almost no time with system administration issues while developing. Fortunately, distutils includes tools like bdist_rpm so that python modules can be packaged for easy processing by the system package manager. So once I need to switch back to a sysadmin role, I can use the system tools to install and track packages. -- Lloyd Kvam Venix Corp DLSLUG/GNHLUG library http://www.librarything.com/catalog/dlslug http://www.librarything.com/profile/dlslug http://www.librarything.com/rsshtml/recent/dlslug
Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you
On Wed, Apr 09, 2008 at 11:37:07AM +1000, Ben Finney wrote: zooko [EMAIL PROTECTED] writes: I am skeptical that programmers are going to be willing to use a new database format. They already have a database -- their filesystem -- and they already have the tools to control it -- mv, rm, and PYTHONPATH. Many of them already hate the existence of the easy_install.pth database file, and I don't see why a new database file would be any different. Moreover, many of us already have a database of *all* packages on the system, not just Python-language ones: the package database of our operating system. Adding another, parallel, database which needs separate maintenance, and only applies to Python packages, is not a step forward in such a situation. 90% (at least) of the world does not have such a database. I, and probably you, have such a very nice database. It works well, and we can choose to forget the problems our users are facing. It does not solve them though. In addition, packaging is system-specific. I recently had to learn some Debian packaging, because I wanted my Ubuntu and Debian users to be able to use my projects seamlessly. What about RPMs for RHEL, Fedora, Mandriva? ... and Conary packages? and MSIs? ... When do I find time to do development if I have to learn all this packaging? It would be fantastic to have an abstraction over all these packaging systems, including, as you point out, their database. I do agree that reusing the system packaging's database is great, and would be the best option for system-wide installs. However, one of the very neat features of setuptools and eggs is that you don't need administrator access to install the packages, and that is great in a shared environment, like a computation cluster. The system's database is thus unfortunately not a complete solution to the problem.
My 2 cents, Gaël
Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you
On Wed, Apr 09, 2008 at 12:41:32AM -0400, Phillip J. Eby wrote: The way to achieve a database for Python would be to provide tools for conversion of eggs to rpms and debs, Such tools already exist, although the conversion takes place from source distributions rather than egg distributions. What is the status of the deb backend? The only one I know of is unofficial, maintained by Andrew Straw, but my information may be lagging behind. By the way, if these tools work well, they are priceless! Cheers, Gaël
Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you
On Wed, April 9, 2008 12:41 am, Phillip J. Eby wrote: At 10:49 PM 4/8/2008 -0400, Stanley A. Klein wrote: On Tue, April 8, 2008 9:37 pm, Ben Finney [EMAIL PROTECTED] wrote: Date: Wed, 09 Apr 2008 11:37:07 +1000 From: Ben Finney [EMAIL PROTECTED] Subject: Re: [Distutils] how to easily consume just the parts of eggs that are good for you To: [EMAIL PROTECTED] zooko [EMAIL PROTECTED] writes: eyes and said "So they are planning to reinvent apt!" That's pretty much my reaction, too. I have the same reaction. I'm curious. Have any of you actually read PEP 262 in any detail? I have seen precious little discussion so far that doesn't appear to be based on significant misunderstandings of either the purpose of reviving the PEP, or the mechanics of its proposed implementation. I haven't read the PEP at all. I generally don't read PEPs. I have tried in the past to use easy_install, but have run into problems because there is no communication between easy_install and the rpm database, resulting in failure of easy_install to recognize that dependencies have already been installed using rpms. This problem doesn't exist with Python 2.5, unless you're using a platform that willfully strips out the installation information that Python 2.5 provides for these packages. IIRC, I have had the problem with Python 2.5 on Fedora 7. Until recently, Fedora packagers did strip out the egg information included with Python packages they packaged. I left those files in when packaging myself using bdist_rpm. However, are you implying that the installation information for Python egg packages accesses and coordinates with the rpm database? I found myself having to go into the setup.py for the relevant package(s) and delete any statements regarding dependencies. Otherwise, IIRC, the packaging couldn't proceed because the Python packaging tool couldn't find the dependencies that had already been installed as rpms.
After installation, Python managed to find the relevant files, but the packaging tool couldn't. A database focused only on Python packages is highly inappropriate for Linux systems, violates the Linux standards, and creates problems because eggs are not coordinated with the operating system package manager. The revamp of PEP 262 is aimed at removing .egg files and directories from the process, by allowing system packagers to tell Python what files belong to them and should not be messed with. And conversely, allowing systems and installation targets *without* package managers to safely manage their Python installations. IMHO, the main system without a package manager is Windows. A reasonable way to deal with Windows would be to create a package manager for it that could be used by Python and anyone else who wanted to use it. The package manager could establish a file hierarchy similar to the Unix FHS and install files appropriately, except for what is needed to satisfy the Windows OS. That would probably go a long way to addressing the issues being discussed here. This is primarily a Windows problem, not a Python problem. The way to achieve a database for Python would be to provide tools for conversion of eggs to rpms and debs, Such tools already exist, although the conversion takes place from source distributions rather than egg distributions. You are talking here about bdist_rpm and not about a tool that would take a Python package distributed as an egg file and convert the egg to an rpm or a deb. Unfortunately, some Python packagers are beginning to limit their focus only to egg distribution. That creates a problem for users who have native operating system package management. to have eggs support conformance to the LSB and FHS, Applying LSB and FHS to the innards of Python packages makes as much sense as applying them to the contents of Java .jar files -- i.e., none. 
If it's unchanging data that's part of a program or library, then it's a program or library, just like static data declared in a C program or library. Whether the file extension is .py, .so, or even .png is irrelevant. The FHS defines places to put specific kinds of files, such as command scripts (/bin, /usr/bin, /sbin, or /usr/sbin), documentation (/usr/share/doc/package-name), and configuration files (/etc). There are several kinds of files identified and places defined to put them. Distribution by eggs has a tendency to scoop up all of those files and put them in /usr/lib/python/site-packages, regardless of where they belong. Having eggs support conformance to FHS would mean recognizing and tagging the relevant files. A tool for converting eggs to rpms or debs would essentially reformat the egg to rpm or deb and put files where they belong. Stan Klein
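To make "recognizing and tagging the relevant files" concrete, here is a minimal sketch of the classification step such an egg-to-rpm/deb converter would need. Everything in it is a hypothetical assumption for illustration: the function name fhs_destination(), the top-level directory names inside the egg, and the site-packages path. A real converter would need explicit packaging metadata rather than guesses based on directory names.

```python
import posixpath

# Hypothetical install location for runtime code and data.
SITE_PACKAGES = "/usr/lib/python2.5/site-packages"


def fhs_destination(relpath, package):
    """Map a path inside a (hypothetical) egg layout to an FHS-style location."""
    top, _, rest = relpath.partition("/")
    name = posixpath.basename(relpath)
    if top == "scripts":
        # Command scripts belong in a bin directory.
        return posixpath.join("/usr/bin", name)
    if top == "doc":
        # Documentation goes under /usr/share/doc/<package>.
        return posixpath.join("/usr/share/doc", package, rest)
    if top == "conf":
        # Configuration files go under /etc.
        return posixpath.join("/etc", package, rest)
    if top == "man":
        return posixpath.join("/usr/share/man", rest)
    # Anything else is treated as runtime code or data.
    return posixpath.join(SITE_PACKAGES, relpath)
```

For example, fhs_destination("doc/README", "foo") yields "/usr/share/doc/foo/README", while "foo/__init__.py" stays under site-packages.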
Re: [Python-Dev] Python Leopard DLL Hell
I have learned that this is a specific behavior of OS X. I have submitted a formal bug report to Apple about the problem. It appears that this is documented by Apple as acceptable: http://developer.apple.com/documentation/DeveloperTools/Reference/MachOReference/Reference/reference.html#//apple_ref/c/func/dlopen Linux, by contrast, respects the fact that you gave it a specific shared library: http://linux.die.net/man/3/dlopen If Apple provides a workaround, I will post a Python patch. It is a little scary that someone can circumvent my application just by setting an environment variable. -Brian Cole On Tue, Apr 8, 2008 at 7:52 PM, Michael Torrie [EMAIL PROTECTED] wrote: Brian Cole wrote: That appears to be working correctly at first glance. The argument to dlopen is the correct shared library. Unfortunately, either python or OS X is lying to me here. If I inspect the python process with OS X's Activity Monitor and look at the Open Files and Ports tab, it shows that the _foo.so shared library is actually the one located inside $DYLD_LIBRARY_PATH. So this problem may not be python's, but I place it here as a first shot (maybe I'm using the imp module incorrectly). Sounds like you're going to need to learn how to use dtrace. Then you can more closely monitor exactly what python and the loader are doing. dtrace is very complicated (borrowed from Solaris) but extremely powerful. Worth learning anyway, but sounds like it's probably the debugging tool you need. Another thing you can do is check through the python source code and see how the OS X-specific code is handling this type of situation.
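Apple's dlopen() documentation explains the behavior Brian saw: for a path that contains a slash but is not a framework path, dyld first tries each DYLD_LIBRARY_PATH directory with the leaf name of the requested file, then the path as given, then DYLD_FALLBACK_LIBRARY_PATH (again with the leaf name). The sketch below only models that search order; it does not call dlopen(). The function names are hypothetical, and it ignores framework paths, dyld's default fallback directories and other DYLD_* variables. An application could use something like possible_overrides() to warn when a same-named library could shadow its own.

```python
import os
import posixpath


def dyld_search_order(path, environ=None):
    """Candidate files in the order dyld's dlopen() is documented to try them
    for a non-framework path containing a slash (OS X only)."""
    if environ is None:
        environ = os.environ
    leaf = posixpath.basename(path)
    candidates = []
    # 1. DYLD_LIBRARY_PATH entries, using only the leaf name of the file.
    for directory in environ.get("DYLD_LIBRARY_PATH", "").split(":"):
        if directory:
            candidates.append(posixpath.join(directory, leaf))
    # 2. The path exactly as the caller supplied it.
    candidates.append(path)
    # 3. DYLD_FALLBACK_LIBRARY_PATH entries, again with the leaf name.
    for directory in environ.get("DYLD_FALLBACK_LIBRARY_PATH", "").split(":"):
        if directory:
            candidates.append(posixpath.join(directory, leaf))
    return candidates


def possible_overrides(path, environ=None):
    """Existing files that dyld would try before the requested path."""
    order = dyld_search_order(path, environ)
    return [c for c in order[:order.index(path)] if os.path.exists(c)]
```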
Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you
All my development is done on Linux. I use Windows very minimally (such as for tax preparation) and unless forced to do so for specific circumstances (such as submittal to grants.gov) do not expose Windows to the Internet. In the future there may possibly arise a need for us to port some Linux-developed Python code to Windows, but we will have to cross that bridge when we get there. I think you raise an interesting issue: What is a package manager? I have minimal experience installing packages on Windows over the last 5-10 years, but in my experience a Windows package comes as an executable that, when run, installs itself. Unless a third-party program monitors the installation, uninstalling is a nasty chore, as is finding out what files were installed or where they went. The rpm and deb package managers (and their yum and other higher-level dependency managers) do a lot of things: 1. They install packages and maintain databases of what packages were installed. 2. They manage dependencies. 3. They support clean uninstalling of packages. 4. They can query packages, both installed (via their databases) and not yet installed (e.g., as rpm or deb files), to determine attributes, such as files they install, dependencies, and other information defined at packaging time. 5. They build packages and (in some cases) can rebuild packages. 6. They can verify packages for integrity and security purposes. 7. They can download package files and maintain archives of installed package files for use as local repositories. There may be other functions, but the above is a top-of-the-head list. I can say that I'm not terribly happy with Python packaging that is only minimally compatible with rpm. I haven't used Ubuntu all that much. I do like Ubuntu's packaging and package management, and I do know that there are programs, such as alien, that can translate from rpm to deb formats. I don't think I ever said that Windows is broken in the area of package management.
My own experience is that the files of Windows programs tend to be put in a directory devoted to the program, rather than put in directories with other files having similar purposes. At one time, the default location in Windows for word processing files was even in a sub-directory of the word processing program. That changed to having a form of user home directory, but it didn't change much for the program files themselves. Unix/Linux puts the files in specific areas of the file system having functional commonality. One could almost say that the Windows default approach to structuring its filesystem avoids or minimizes the need for package management. I repeat that this issue mainly arises because Windows doesn't have the same kind of filesystem structure (and therefore the need for package management) that other systems have. I don't know what Windows add/remove programs function does, but all it might do is to run the executable to install packages and record the installation (as was previously done by third party programs) to facilitate clean removal. Unless you can perform more of the other functions I listed above, I doubt I would call add/remove a package manager. Stan Klein On Wed, April 9, 2008 1:23 pm, Paul Moore wrote: On 09/04/2008, Stanley A. Klein [EMAIL PROTECTED] wrote: IMHO, the main system without a package manager is Windows. A reasonable way to deal with Windows would be to create a package manager for it that could be used by Python and anyone else who wanted to use it. The package manager could establish a file hierarchy similar to the Unix FHS and install files appropriately, except for what is needed to satisfy the Windows OS. That would probably go a long way to addressing the issues being discussed here. This is primarily a Windows problem, not a Python problem. Windows does have a package manager - the add/remove programs application. 
It's extremely limited, and doesn't make any attempt at doing dependency resolution, certainly - but that's a separate issue. I don't know if you use Windows (as in, develop programs using Python on Windows). If you do, then I'd be interested in your views on bdist_wininst and bdist_msi installers, and how they fit into the setuptools/egg environment, particularly with regard to the package manager you are proposing. If you don't use Windows, then I don't see how you can usefully comment. Personally, as I've said before, I don't have a problem with a Python-only package manager, as long as it replaces or integrates bdist_wininst and bdist_msi. Having two package managers is far worse than having none - and claiming that add/remove programs isn't a package manager is just ignoring reality (if it isn't, then why do bdist_wininst and bdist_msi exist?). Are the Linux users happy with having a Python package manager that ignores RPM/apt? Why should Windows users be any happier? Sorry - I'm feeling a little grumpy. I've read one too many
Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you
On Wed, Apr 09, 2008 at 02:26:31PM -0400, Stanley A. Klein wrote: The rpm and deb package managers (and their yum and other higher level dependency managers) do a lot of things: 1. They install packages and maintain databases of what packages were installed 2. They manage dependencies 3. They support clean uninstalling of packages 4. They can query packages, both installed (via their databases) and not yet installed (e.g., as rpm or deb files), to determine attributes, such as files they install, dependencies, and other information defined at packaging time. 5. They build packages and (in some cases) can rebuild packages. 6. They can verify packages for integrity and security purposes. 7. They can download package files and maintain archives of installed package files for use as local repositories. You are collapsing three different functionalities in one: * Dealing with repositories and downloading: yum/apt * Installing + uninstalling packages, and dealing with system consistency (thus checking the dependencies are available): rpm/dpkg * Building For me it is important that the 3 are separated: * I may want to download the dependencies of a package to burn to a CD for a computer that does not have internet access. * I may want to send a tarball to a build server that does the building, but no install (so as not to corrupt my working system). Cheers, Gaël
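The install/query/uninstall layer in the lists above (the rpm/dpkg role, kept separate from downloading and building) largely comes down to a database recording which files each package owns. Below is a minimal in-memory sketch of that idea. The class and method names are hypothetical; a real package manager would persist the database, track versions and verify checksums.

```python
class PackageDB(object):
    """Toy record of which installed package owns which files."""

    def __init__(self):
        self.files = {}   # package name -> set of installed paths
        self.owner = {}   # installed path -> package name

    def install(self, package, paths):
        # Refuse to overwrite files that belong to another package.
        for path in paths:
            other = self.owner.get(path)
            if other is not None and other != package:
                raise ValueError("%s is owned by %s" % (path, other))
        self.files.setdefault(package, set()).update(paths)
        for path in paths:
            self.owner[path] = package

    def query(self, package):
        """List the files a package installed (compare rpm -ql)."""
        return sorted(self.files.get(package, ()))

    def which(self, path):
        """Return the package that owns a file (compare rpm -qf)."""
        return self.owner.get(path)

    def uninstall(self, package):
        """Forget a package and return the files that should be deleted."""
        paths = self.files.pop(package, set())
        for path in paths:
            del self.owner[path]
        return sorted(paths)
```

Because ownership is recorded at install time, a clean uninstall is a lookup rather than guesswork, which is the point Stan makes about Windows installers.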
Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you
On Wed, April 9, 2008 3:19 pm, Gael Varoquaux wrote: On Wed, Apr 09, 2008 at 02:26:31PM -0400, Stanley A. Klein wrote: The rpm and deb package managers (and their yum and other higher level dependency managers) do a lot of things: 1. They install packages and maintain databases of what packages were installed 2. They manage dependencies 3. They support clean uninstalling of packages 4. They can query packages, both installed (via their databases) and not yet installed (e.g., as rpm or deb files), to determine attributes, such as files they install, dependencies, and other information defined at packaging time. 5. They build packages and (in some cases) can rebuild packages. 6. They can verify packages for integrity and security purposes. 7. They can download package files and maintain archives of installed package files for use as local repositories. You are collapsing three different functionalities in one: * Dealing with repositories and downloading: yum/apt * Installing + uninstalling packages, and dealing with system consistency (thus checking the dependencies are available): rpm/dpkg * Building For me it is important that the 3 are separated: * I may want to download the dependencies of a package to burn to a CD for a computer that does not have internet access. * I may want to send a tarball to a build server that does the building, but no install (so as not to corrupt my working system). Cheers, Gaël Gael - The functionalities are combined in programs but are not necessarily required to be used all at the same time. I'm not that familiar with apt, but yum also installs, including downloading both a package and its dependencies. Yum also has a query capability (yum list, yum info). I think synaptic does the same thing yum does, and adds a GUI and search capabilities similar to yum info as well. The build capabilities of rpm were moved to rpmbuild, but the building remains part of the rpm system. 
IIRC, bdist_rpm actually calls rpmbuild as part of its processing. Also, IIRC, rpmbuild can build from a tarball if it contains an rpm spec. It does not install in the same process. That is a separate step. You would not corrupt your working system by building an rpm from a tarball on it. BTW, I would not want to do dependencies with rpm if yum is available. Doing dependencies with rpm is very difficult and it is easy to wind up in dependency hell. Yum will find the dependencies and install them as long as they are in repositories that are registered in the yum configuration. I looked at the yum man page and couldn't find an option to download dependencies to the local repository without installing. However, if you did install a package and its dependencies, and if you have selected the option of retaining the cache and not cleaning it after installation, the rpms (e.g., for updates) are in /var/cache/yum/updates/packages/. They can be copied from there to a CD for a system without internet connectivity. Also, both Fedora and Ubuntu have software for building installable live CDs, although I don't know how they get their package files. Stan Klein
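The difference Stan describes between rpm (which checks that dependencies are present) and yum (which finds them and installs them) essentially comes down to computing an install order over the dependency graph. A minimal sketch, assuming a hypothetical repository dict that maps each package name to the names it requires, with no versions, conflicts or alternatives:

```python
def install_order(wanted, requires):
    """Return packages ordered so every dependency precedes its dependents.

    requires maps a package name to the names it depends on. A name missing
    from requires raises KeyError; a dependency cycle raises ValueError.
    """
    order, done, in_progress = [], set(), set()

    def visit(name):
        if name in done:
            return
        if name in in_progress:
            raise ValueError("dependency cycle involving %s" % name)
        in_progress.add(name)
        for dep in requires[name]:
            visit(dep)  # install dependencies first (depth-first)
        in_progress.discard(name)
        done.add(name)
        order.append(name)

    for name in wanted:
        visit(name)
    return order
```

With requires = {"app": ["libfoo", "python"], "libfoo": ["python"], "python": []}, install_order(["app"], requires) returns ["python", "libfoo", "app"].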
Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you
On Wed, April 9, 2008 3:40 pm, Phillip J. Eby wrote: At 11:52 AM 4/9/2008 -0400, Stanley A. Klein wrote: However, are you implying that the installation information for Python egg packages accesses and coordinates with the rpm database? Yes, when the information isn't stripped out. Try a more recent Fedora. IMHO, the main system without a package manager is Windows. You're ignoring shared environments and development environments. (Not to mention Mac OS.) I don't understand what you mean by shared environments and development environments. I also don't know much about Mac OS, except that its underlying Darwin system is a version of BSD (that I assume would follow the Unix FHS). A reasonable way to deal with Windows would be to create a package manager for it that could be used by Python and anyone else who wanted to use it. Let us know when you've finished it, along with the one for Mac OS. :) I have enough trouble with what I'm already doing. :-) Of course this still won't do anything for shared environments and development environments. You are talking here about bdist_rpm and not about a tool that would take a Python package distributed as an egg file and convert the egg to an rpm or a deb. Unfortunately, some Python packagers are beginning to limit their focus only to egg distribution. That creates a problem for users who have native operating system package management. That is indeed a problem -- but it's a social one, not a technical one. It's trivial for the publisher of an egg to change their command line from "setup.py bdist_egg upload" to "setup.py sdist bdist_egg upload", as soon as their users (politely) request that they do so. I agree that we are dealing with a combination of technical and social issues here. However, I think it takes a lot more understanding for a publisher to get everything straight. Applying LSB and FHS to the innards of Python packages makes as much sense as applying them to the contents of Java .jar files -- i.e., none.
If it's unchanging data that's part of a program or library, then it's a program or library, just like static data declared in a C program or library. Whether the file extension is .py, .so, or even .png is irrelevant. The FHS defines places to put specific kinds of files, such as command scripts (/bin, /usr/bin, /sbin, or /usr/sbin), documentation (/usr/share/doc/package-name), and configuration files (/etc). There are several kinds of files identified and places defined to put them. Distribution by eggs has a tendency to scoop up all of those files and put them in /usr/lib/python/site-packages, regardless of where they belong. Eggs don't include documentation or configuration files, and they install scripts in script directories, so I don't get what you're talking about here. For any other data that a package accesses at runtime, my earlier comments apply. But rpms and debs do include these files, plus manual pages, localization files and a lot of other ancillary stuff. IIRC, you once mentioned that you have a CentOS system. Run "rpm -qa | sort | less" to get an alphabetized list of your installed packages, and then "rpm -qil" on some of the packages, and you will see the range of different kinds of files in there. Having eggs support conformance to FHS would mean recognizing and tagging the relevant files. A tool for converting eggs to rpms or debs would essentially reformat the egg to rpm or deb and put files where they belong. No, because such files as you describe don't exist. If you think they do, then either you have misunderstood the nature of the files in question, or the developer has incorrectly placed non-runtime files in their installation tree. Most of the Python tarballs I have downloaded have all kinds of files in their installation trees. This is a major pain in the you-know-what for someone trying to use bdist_rpm and get proper, FHS-compliant rpms.
If eggs are supposed to be strictly runtime files, I think very few developers actually understand that. Better yet, how do you define what should be included in an installation? It sounds like the egg concept doesn't include several kinds of files that rpm and deb would include in an installation. I think that may be an important issue here. Stan Klein
Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you
On Wed, Apr 09, 2008 at 11:46:19PM +0100, Paul Moore wrote: I find this whole discussion hugely confusing, because a lot of people are stating opinions about environments which it seems they don't use, or know much about. I don't know how to avoid this, but it does make it highly unlikely that any practical progress will get made. I find that something that doesn't help the discussion move forward at all is that everybody has different use cases in mind, on different platforms, and is not interested in other people's use cases. Hopefully I am wrong, Cheers, Gaël
Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you
On Wed, Apr 09, 2008 at 11:52:08PM +0100, Paul Moore wrote: And I would say that Windows doesn't have a problem. Are any Windows users proposing building a package management system for Windows (Python-specific or otherwise)? It's a genuine question - is this something that Windows users are after, or is it just Linux users trying to show Windows users what they are missing? Well, users don't phrase it that way, because they don't know what package management (or rather automatic dependency tracking) is, but yes, there are some use cases. It is nowadays really tedious to deploy Python applications that make use of many Python packages. The scientific community is a domain in which this problem is crucial, as we are trying to ship desktop applications to non-computer-savvy people, with many dependencies outside the standard library. Enthought is working on shipping a Python distribution with some sort of package management for this purpose (see http://code.enthought.com/enstaller/), and finding it is not an easy problem. Cheers, Gael
[Python-Dev] PyArg_ParseTuple and Py_BuildValue question
Hello fellow pythonistas, I'm currently writing a simple Python SCTP module in C. So far it works for sending and receiving strings. The C function sctp_sendmsg() has been wrapped, and my function looks like this:

    /* buffer, ret and connSock are declared elsewhere in the module. */
    static PyObject *
    sendMessage(PyObject *self, PyObject *args)
    {
        const char *msg = "";

        if (!PyArg_ParseTuple(args, "s", &msg))
            return NULL;
        /* Pass msg as an argument, never as the format string. */
        snprintf(buffer, 1025, "%s", msg);
        ret = sctp_sendmsg(connSock, (void *)buffer, (size_t)strlen(buffer),
                           0, 0, 0x0300, 0, 0, 0, 0);
        /* Report the sctp_sendmsg() result to the caller. */
        return Py_BuildValue("i", ret);
    }

I'm going to construct an SS7 packet in Python using struct.pack(). Here's the question: how am I going to pass the packet I built in Python to my module and back? I already asked this question on comp.lang.python, but so far there have been no responses. I hope someone can point me in the right direction. Thanks in advance. --- Alvin Delagon
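One likely snag: a packet built with struct.pack() is binary data that can contain NUL bytes, and the "s" format used above rejects strings containing NUL bytes, while strlen() would stop at the first one anyway. On the C side, the "s#" format (or "y#" for bytes-only input in Python 3), which also returns the length, is the usual way to accept such data, and that length should be passed to sctp_sendmsg() instead of strlen(). The Python half is sketched below; the header layout is purely hypothetical and is not a real SS7 message format.

```python
import struct

# Hypothetical header: version (1 byte), message type (1 byte) and payload
# length (2 bytes), big-endian ("!"), followed by the payload itself.
HEADER = struct.Struct("!BBH")


def build_packet(msg_type, payload):
    """Pack a header and payload into a single bytes object."""
    return HEADER.pack(1, msg_type, len(payload)) + payload


def parse_packet(packet):
    """Split a packet back into (version, msg_type, payload)."""
    version, msg_type, length = HEADER.unpack(packet[:HEADER.size])
    payload = packet[HEADER.size:HEADER.size + length]
    return version, msg_type, payload
```

The bytes returned by build_packet() are what would be handed to the extension, and parse_packet() shows the reverse direction for data coming back from it.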
Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you
On Wed, 2008-04-09 at 18:17 -0500, Dave Peterson wrote: I think I can sum up any further points by simply asking: Should it be safe to assume I can distribute my application via eggs / easy_install just because it is written in Python? I think that based on this discussion the bottom line answer to this question is No. Stan Klein
[Python-Dev] SetType=set in types module ?
Hi, SetType is not available in the types module; shouldn't it be added there (in 2.6, for example)? I guess the change is really simple and would be backward compatible: adding SetType = set
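Until the fate of the types module is decided, code that wants a SetType-style name can simply refer to the builtin set, which is what the proposed alias would point to. A minimal sketch; the module-level SetType name and the is_set() helper are hypothetical, and getattr() merely falls back to the builtin because types has no such attribute.

```python
import types

# The types module has no SetType, so this falls back to the builtin set.
SetType = getattr(types, "SetType", set)


def is_set(obj):
    """True for set instances (including subclasses); frozenset is a separate type."""
    return isinstance(obj, SetType)
```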
Re: [Python-Dev] Global Python Sprint Weekends: May 10th-11th and June 21st-22nd.
Anyone in Melbourne, Australia keen for the first sprint? I'm not sure if I'll be available, but if I can it'd be great to work with some others. Failing that, it's Red Bull and pizza in my lounge room :) I've been working on some neat code for an AST optimizer. If I'm free that weekend, I'll probably continue my work on that. Cheers, T Trent Nelson wrote: Following on from the success of previous sprint/bugfix weekends and sprinting efforts at PyCon 2008, I'd like to propose the next two Global Python Sprint Weekends take place on the following dates: * May 10th-11th (four days after 2.6a3 and 3.0a5 are released) * June 21st-22nd (~week before 2.6b2 and 3.0b2 are released) It seems there are a few of the Python User Groups keen on meeting up in person and sprinting collaboratively, akin to PyCon, which I highly recommend. I'd like to nominate Saturday across the board as the day for PUGs to meet up in person, with Sunday geared more towards an online collaboration day via IRC, where we can take care of all the little things that got in our way of coding on Saturday (like finalising/preparing/reviewing patches, updating tracker and documentation, writing tests ;-). For User Groups that are planning on meeting up to collaborate, please reply to this thread on python-dev@python.org and let everyone know your intentions! As is commonly the case, #python-dev on irc.freenode.net will be the place to be over the course of each sprint weekend; a large proportion of Python developers with commit access will be present, increasing the number of eyes available to review and apply patches. For those that have an idea on areas they'd like to sprint on and want to look for other developers to rope in (or just to communicate plans in advance), please also feel free to jump on this thread via python-dev@ and indicate your intentions.
For those that haven't the foggiest on what to work on, but would like to contribute, the bugs tracker at http://bugs.python.org is the best place to start. Register an account and start searching for issues that you'd be able to lend a hand with. All contributors that submit code patches or documentation updates will typically get listed in Misc/ACKS.txt; come September when the final release of 2.6 and 3.0 come about, you'll be able to point at the tarball or .msi and exclaim loudly ``I helped build that!'', and actually back it up with hard evidence ;-) Bring on the pizza and Red Bull! Trent.
[Python-Dev] socket recv on win32 can be extremly delayed (python bug?)
hello, I tried to implement a simple Python XML-RPC service in a win32 environment (client/server code inserted below). The client's profiler told me that a simple function call needs about 200 ms (even if I run it in a loop, the time needed per call stays the same). After analysing the problem with Ethereal I found out that the XML-RPC request is transmitted in two TCP packets, one containing the HTTP header and one containing the data. But the acknowledgement of the first TCP packet is delayed by 200 ms. I experimented on the server side and found out that if the server reads exactly all bytes transferred in the first TCP frame (via socket.recv()), the next socket.recv(), even if reading only one byte, needs about 200 ms. But if I read one byte less than was transferred in the first TCP frame and then read 2 bytes (socket.recv(2)), there is no delay, although the same total amount of data was read. After some googling I found the website http://support.microsoft.com/?scid=kb%3Ben-us%3B823764x=12y=15, which proposed a workaround (modifying the registry entry for the TCP/IP driver) that did work. But modifying the clients' registry settings is not an option for us. Does anybody know how to solve the problem? Or is it even a problem in the Python socket implementation? By the way: I tested Win2000 SP4 and WinXP SP2 with Python 2.3.3 and Python 2.5.1 each.

CLIENT:

    import xmlrpclib
    import profile

    server = xmlrpclib.ServerProxy("http://server:80")
    profile.run('server.test(1,2)')

SERVER:

    import SimpleXMLRPCServer

    def test(a, b):
        return a + b

    server = SimpleXMLRPCServer.SimpleXMLRPCServer(('', 80))
    server.register_function(test)
    server.serve_forever()

Best regards, Robert Hölzl BALTECH AG Registered office: Lilienthalstrasse 27, D-85399 Hallbergmoos Register court: Amtsgericht München, HRB 115215 Executive board: Jürgen Rösch (chairman), Martina M. Schuster Chair of the supervisory board: Eva Zeising
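The symptom described (a request split into a header packet and a body packet, with roughly 200 ms lost before the second one is handled) looks like the well-known interaction between Nagle's algorithm on the sending side and delayed ACKs on the receiving side. Under that assumption, a commonly suggested mitigation is to disable Nagle's algorithm on the client socket with TCP_NODELAY; sending header and body in a single send() call would avoid the split as well. The helper names below are hypothetical, socket.create_connection() needs Python 2.6 or later, and wiring this into xmlrpclib's transport is left out.

```python
import socket


def disable_nagle(sock):
    """Turn off Nagle's algorithm so small writes are sent immediately."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def connect_nodelay(host, port, timeout=10.0):
    """Open a TCP connection with TCP_NODELAY already set."""
    return disable_nagle(socket.create_connection((host, port), timeout))
```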
[Python-Dev] Security Advisory for unicode repr() bug?
Re: [Python-Dev] Known doctest bug with unicode?
So there's a difference in behaviour between 2.x and 3.0 when it comes to this part. I guess the better behaviour would be for doctest to honour the encoding specified in the file/module? If other people agree, I can see what I can do to make that work. I'm fairly skeptical that you can make that work, whether or not it's a good idea. Regards, Martin
Re: [Python-Dev] Encoding detection in the standard library?
On 2008-04-22 18:33, Bill Janssen wrote: The 2002 paper A language and character set determination method based on N-gram statistics by Izumi Suzuki and Yoshiki Mikami and Ario Ohsato and Yoshihide Chubachi seems to me a pretty good way to go about this. Thanks for the reference. Looks like the existing research on this just hasn't made it into the mainstream yet. Here's their current project: http://www.language-observatory.org/ Looks like they are focusing more on language detection. Another interesting paper using n-grams: Language Identification in Web Pages by Bruno Martins and Mário J. Silva http://xldb.fc.ul.pt/data/Publications_attach/ngram-article.pdf And one using compression: Text Categorization Using Compression Models by Eibe Frank, Chang Chui, Ian H. Witten http://portal.acm.org/citation.cfm?id=789742 They're looking at LSEs, language-script-encoding triples; a script is a way of using a particular character set to write in a particular language. Their system has these requirements: R1. the response must be either correct answer or unable to detect where unable to detect includes other than registered [the registered set of LSEs]; R2. Applicable to multi-LSE texts; R3. never accept a wrong answer, even when the program does not have enough data on an LSE; and R4. applicable to any LSE text. So, no wrong answers. The biggest disadvantage would seem to be that the registration data for a particular LSE is kind of bulky; on the order of 10,000 shift-codons, each of three bytes, about 30K uncompressed. http://portal.acm.org/ft_gateway.cfm?id=772759type=pdf For a server based application that doesn't sound too large. Unless you're using a very broad scope, I don't think that you'd need more than a few hundred LSEs for a typical application - nothing you'd want to put in the Python stdlib, though. Bill IMHO, more research has to be done into this area before a standard module can be added to the Python's stdlib... 
and who knows, perhaps we're lucky and by the time everyone is using UTF-8 anyway :-) I walked over to our computational linguistics group and asked. This is often combined with language guessing (which uses a similar approach, but using characters instead of bytes), and apparently can usually be done with high confidence. Of course, they're usually looking at clean texts, not random stuff. I'll see if I can get some references and report back -- most of the research on this was done in the 90's. Bill -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Apr 22 2008) Python/Zope Consulting and Support ...http://www.egenix.com/ mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/ Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611
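The n-gram idea is simple enough to sketch. The code below is not the Suzuki et al. method, only a toy in the same spirit (Python 3): it builds byte-trigram profiles from sample texts whose encoding is known, scores unknown input by the share of its trigrams found in each profile, and answers None ("unable to detect") below a threshold, loosely echoing requirement R3. The function names, the 0.5 threshold and the tiny training sample are assumptions; a usable detector would need far larger training corpora.

```python
from collections import Counter


def trigrams(data):
    """Count the overlapping 3-byte sequences in a bytes object."""
    return Counter(data[i:i + 3] for i in range(len(data) - 2))


def train(samples):
    """Build a profile (the set of trigrams seen) for each encoding label."""
    return {label: set(trigrams(data)) for label, data in samples.items()}


def detect(data, profiles, min_score=0.5):
    """Return the best-matching label, or None if nothing scores high enough."""
    grams = trigrams(data)
    total = sum(grams.values())
    if total == 0:
        return None  # too short to judge
    best_label, best_score = None, 0.0
    for label, profile in profiles.items():
        # Share of the input's trigrams that this profile has seen before.
        hits = sum(n for gram, n in grams.items() if gram in profile)
        score = hits / total
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= min_score else None


text = "Gr\u00fc\u00dfe aus K\u00f6ln, sch\u00f6ne Gr\u00fc\u00dfe"
profiles = train({"utf-8": text.encode("utf-8"),
                  "latin-1": text.encode("latin-1")})
```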
Re: [Python-Dev] Encoding detection in the standard library?
When a web browser POSTs data, there is no standard way of communicating which encoding it's using. That's just not true. Web browsers should, and do, use the encoding of the web page that originally contained the form. Since the site that receives the POST doesn't necessarily have access to the Web page that originally contained the form, that's not really helpful. However, POSTs can use the MIME type multipart/form-data for non-Latin-1 content, and should. That contains facilities for indicating the encoding and other things as well. Bill
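On the receiving side, when a multipart/form-data part does declare a charset parameter, the standard email package can read it. A minimal sketch (Python 3); the helper name and the fallback to Latin-1 when no charset is declared are assumptions for illustration.

```python
from email.message import Message


def decode_form_part(content_type, payload, default="latin-1"):
    """Decode one form-data part using the charset its Content-Type declares."""
    headers = Message()
    headers["Content-Type"] = content_type
    # get_content_charset() returns the lower-cased charset parameter,
    # or the supplied default when the header has none.
    charset = headers.get_content_charset(default)
    return payload.decode(charset)
```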
Re: [Python-Dev] PyArg_ParseTuple and Py_BuildValue question
On Wed, Apr 9, 2008 at 8:23 PM, Alvin Delagon [EMAIL PROTECTED] wrote: I'm going to construct an SS7 packet in python using struct.pack(). Here's the question, how am I going to pass the packet I wrote in python to my module and back? I already asked this question in comp.lang.python but so far no responses yet. I hope anyone can point me to the right direction. Thanks in advance. What exactly is your problem? -- Cheers, Benjamin Peterson
Re: [Python-Dev] SetType=set in types module ?
On Wed, Apr 16, 2008 at 8:08 AM, iks hefem [EMAIL PROTECTED] wrote: Hi, the SetType is not available in the types module, so wouldn't it be needed here ? (in 2.6 by example) Nothing new is currently being added to the types module because we are trying to decide whether to remove it or not. Please file a bug report, though, to remind us if we decide to keep it. -- Cheers, Benjamin Peterson
Re: [Python-Dev] Encoding detection in the standard library?
Unless you're using a very broad scope, I don't think that you'd need more than a few hundred LSEs for a typical application - nothing you'd want to put in the Python stdlib, though. I tend to agree with this (and I'm generally in favor of putting everything in the standard library!). For those of us doing document-processing applications (Martin, it's not just about Web browsers), this would be a very useful package to have up on PyPI. Bill
Re: [Python-Dev] Encoding detection in the standard library?
On 22-Apr-08, at 2:16 PM, Martin v. Löwis wrote:
>> Any program that needs to examine the contents of documents/feeds/whatever on the web needs to deal with incorrectly-specified encodings
> That's not true. Most programs that need to examine the contents of a web page don't need to guess the encoding. In most such programs, the encoding can be hard-coded if the declared encoding is not correct. Most such programs *know* what page they are webscraping, or else they couldn't extract the information out of it that they want to get at.

I certainly agree that if the target set of documents is small enough it is possible to hand-code the encoding. There are many applications, however, that need to examine the content of an arbitrary, or at least non-small, set of web documents. To name a few such applications:
- web search engines
- translation software
- document/bookmark management systems
- other kinds of document analysis (market research, SEO, etc.)

> As for feeds - can you give examples of incorrectly encoded one (I don't ever use feeds, so I honestly don't know whether they are typically encoded incorrectly. I've heard they are often XML, in which case I strongly doubt they are incorrectly encoded)

I also don't have much experience with feeds. My statement is based on the fact that chardet, the tool that has been cited most in this thread, was written specifically for use with the author's feed parsing package.

> As for whatever - can you give specific examples?

Not that I can substantiate. Documents/feeds covers a lot of what is on the web--I was only trying to make the point that on the web, whenever an encoding can be specified, it will be specified incorrectly for a significant chunk of exemplars.

>> (which, sadly, is rather common). The set of programs that need this functionality is probably the same set that needs BeautifulSoup--I think that set is larger than just browsers <grin>
> Again, can you give *specific* examples that are not web browsers? Programs needing BeautifulSoup may still not need encoding guessing, since they still might be able to hard-code the encoding of the web page they want to process.

Indeed, if it is only one site it is pretty easy to work around. My main use of Python is processing and analyzing hundreds of millions of web documents, so it is pretty easy to see applications (which I have listed above). I think that libraries like Mark Pilgrim's FeedParser and BeautifulSoup are possible consumers of guessing as well.

> In any case, I'm very skeptical that a general "guess encoding" module would do a meaningful thing when applied to incorrectly encoded HTML pages.

Well, it does. I wish I could easily provide data on how often it is necessary over the whole web, but that would be somewhat difficult to generate. I can say that it is much more important to be able to parse all the different kinds of encoding _specification_ on the web (Content-Type/Content-Encoding/meta http-equiv tags, etc), and the malformed cases of these.

I can also think of good arguments for excluding encoding detection for maintenance reasons: is every case of the algorithm guessing wrong a bug that needs to be fixed in the stdlib? That is an unbounded commitment.

-Mike
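To illustrate the point that parsing encoding *declarations* matters more than guessing, here is a rough sketch. It checks the HTTP Content-Type header first, then an early `<meta>` tag. It rejects labels the codec registry does not know, and only then falls back to UTF-8 and finally windows-1252. The regex is a deliberate simplification, not the full HTML5 prescan algorithm, and `declared_charset`/`decode_document` are hypothetical names.

```python
import codecs
import re

# Deliberately loose: matches charset=... in a header or an early <meta> tag.
_CHARSET_RE = re.compile(rb"charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE)


def declared_charset(content_type, body):
    """Return the declared charset (lowercased): header first, then <meta>; else None."""
    sources = []
    if content_type:
        sources.append(content_type.encode("ascii", "replace"))
    sources.append(body[:1024])  # declarations are expected near the top
    for source in sources:
        match = _CHARSET_RE.search(source)
        if match:
            return match.group(1).decode("ascii").lower()
    return None


def decode_document(content_type, body, fallback="windows-1252"):
    """Decode body, distrusting declarations that are unknown or plainly wrong."""
    name = declared_charset(content_type, body)
    if name:
        try:
            codecs.lookup(name)           # LookupError for bogus labels
            return body.decode(name), name
        except (LookupError, UnicodeDecodeError):
            pass                          # fall through to the heuristics below
    try:
        return body.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return body.decode(fallback, errors="replace"), fallback
```

Note the limit of this approach. A wrong single-byte declaration (say, Latin-1 on UTF-8 text) decodes without any error, and that is exactly the case where real detection would be needed.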
Re: [Python-Dev] socket recv on win32 can be extremly delayed (python bug?)
Hi,

This is not a Python-specific problem. See http://en.wikipedia.org/wiki/Nagle's_algorithm

-Mike

On 17-Apr-08, at 3:08 AM, Robert Hölzl wrote:
hello, I tried to implement a simple Python XMLRPC service on a win32 environment (client/server code inserted below). The profiler of the client told me that a simple function call needs about 200ms (even if I run it in a loop, the time needed per call stays the same). After analysing the problem with Ethereal I found out that the XMLRPC request is transmitted via two TCP packets: one containing the HTTP header and one containing the data. But the acknowledgement of the first TCP packet is delayed by 200ms. I tried around on the server side and found out that if the server reads exactly all bytes transferred in the first TCP frame (via socket.recv()), the next socket.recv(), even if reading only one byte, needs about 200 ms. But if I read one byte less than transferred in the first TCP frame and then read 2 bytes (socket.recv(2)), there is no delay, although the same total amount of data was read. After some googling I found the website http://support.microsoft.com/?scid=kb%3Ben-us%3B823764x=12y=15 , which proposed a workaround (modifying the registry entry for the TCP/IP driver) that did work. But modifying the clients' registry settings is not an option for us. Is there anybody who knows how to solve the problem? Or is it even a problem of the Python socket implementation? By the way: I tested Win2000 SP4 and WinXP SP2 with Python 2.3.3 and Python 2.5.1 each.
CLIENT:
--
import xmlrpclib
import profile

server = xmlrpclib.ServerProxy("http://server:80")
profile.run('server.test(1,2)')

SERVER:
--
import SimpleXMLRPCServer

def test(a, b):
    return a + b

server = SimpleXMLRPCServer.SimpleXMLRPCServer(('', 80))
server.register_function(test)
server.serve_forever()
--
Best regards,
Robert Hölzl
BALTECH AG
Registered office: Lilienthalstrasse 27, D-85399 Hallbergmoos
Register court: Amtsgericht München, HRB 115215
Executive board: Jürgen Rösch (chairman), Martina M. Schuster
Chair of the supervisory board: Eva Zeising
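The Nagle's algorithm pointer at the top of this message suggests an interaction with delayed ACKs: the sender holds back the second small segment (the body) until the first one (the header) is acknowledged. Assuming that diagnosis, one common client-side mitigation is to disable Nagle on the socket with `TCP_NODELAY`. Another is to send header and body in a single write. A minimal sketch follows; `connect_nodelay` is a hypothetical helper, and wiring it into xmlrpclib's transport is not shown.

```python
import socket


def connect_nodelay(host, port, timeout=10.0):
    """Open a TCP connection with Nagle's algorithm disabled (TCP_NODELAY)."""
    sock = socket.create_connection((host, port), timeout=timeout)
    # Send small segments immediately instead of waiting for outstanding ACKs.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

In Python 3 the modules above are named `xmlrpc.client` and `xmlrpc.server`. This sketch has not been tested against the Win2000/WinXP setup described in the report, so it does not show whether the 200 ms stall actually goes away there.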
Re: [Python-Dev] python hangs when parsing a bad-formed email
Hello,

Alberto Casado Martín wrote:
> Hi all, First of all, sorry if this isn't the right list to post this to, and sorry for my English. As the subject says, I'm having problems with the attached email: when I try to get an email object by reading the attached file, the Python process hangs and uses all the CPU. I have debugged my code to find where it happens, and I found that it is in the _parsegen method of the FeedParser class. I know that the email format is wrong, but I don't know why Python hangs. The following paste shows where it hangs.
> [snip]
> bash-3.00$ python
> Python 2.5.1 (r251:54863, Feb 28 2008, 07:48:25)
> [GCC 3.4.6] on sunos5
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import email
> >>> fp = open('raro.txt')
> >>> mail = email.message_from_file(fp)
> (never returns)

When you think you found a problem with Python, please submit an issue in the Python issue tracker: http://bugs.python.org/

In your case, I suspect some regular expression trying to match all the empty lines of the message, one character at a time.

-- 
Amaury Forgeot d'Arc
[Python-Dev] GSoC student introduction and sandbox commit privileges request
Hi there, I've just been accepted into this year's Google Summer of Code, to work for the Python Software Foundation on 2to3. My project is to give 2to3 fixers the ability to rank how confident they are in each fix, and let users choose to intervene manually whenever that confidence level is below a certain threshold. Among other things, this might allow fixers for situations where the code translation is not always guaranteed to be correct (like % string formatting, which came up recently in another thread). The full proposal is at http://isnomore.net/2to3 . Collin Winter will be my mentor, and I'd like to thank him and Christian Heimes for all the help they gave me in designing the project. I'd also like to thank Martin Löwis, for discussing a project with me which ended up not turning into a proposal, but helped me write the 2to3 one. Finally, I'd like to request commit privileges to work on a sandbox branch during the Summer of Code. If you have any further questions, please feel free to contact me. I'm really looking forward to working on this project! Cheers, rbp -- Rodrigo Bernardo Pimentel [EMAIL PROTECTED] | GPG: 0x0DB14978
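To make the threshold idea concrete, here is a rough sketch under invented names. `ProposedFix`, `apply_fixes` and the `ask_user` callback are hypothetical and are not lib2to3's actual API. Each suggested fix carries a confidence score. Fixes at or above the user's threshold are applied automatically, and the rest are handed to a callback that stands in for manual intervention. It uses Python 3.7+ `dataclasses`.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class ProposedFix:
    """Hypothetical record of one fixer suggestion (not lib2to3's real API)."""
    fixer: str
    new_source: str
    confidence: float  # 0.0 = pure guess, 1.0 = always correct


def apply_fixes(fixes: Iterable[ProposedFix], threshold: float,
                ask_user: Callable[[ProposedFix], bool]) -> List[ProposedFix]:
    """Accept confident fixes automatically; ask the user about the rest."""
    accepted = []
    for fix in fixes:
        if fix.confidence >= threshold or ask_user(fix):
            accepted.append(fix)
    return accepted


fixes = [ProposedFix("print", "print(x)", 0.99),
         ProposedFix("percent_format", "'{}'.format(x)", 0.4)]
print([f.fixer for f in apply_fixes(fixes, 0.8, lambda fix: False)])
```

A low-confidence fixer, such as one for % string formatting, can then exist without silently rewriting code it might get wrong.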
Re: [Python-Dev] python hangs when parsing a bad-formed email
Amaury Forgeot d'Arc [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] | When you think you found a problem with python, please submit an issue | in the python issue tracker: |http://bugs.python.org/ Or post to comp.lang.python / python mailing list / gmane.comp.python.general
Re: [Python-Dev] Encoding detection in the standard library?
Yup, but DrProject (the target application) also serves as a relay and archive for email. We have no control over the agent used for composition, and AFAIK there's no standard way to include encoding information. Greg, Internet-compliant email actually has well-specified mechanisms for including encoding information; see RFCs 2047 and 2231. There's no need to guess; you can just look. Bill
Re: [Python-Dev] GSoC student introduction and sandbox commit privileges request
On Tue, Apr 22, 2008 at 4:35 PM, Rodrigo Bernardo Pimentel [EMAIL PROTECTED] wrote: Hi there, I've just been accepted into this year's Google Summer of Code, to work for the Python Software Foundation on 2to3. (...) Finally, I'd like to request commit privileges to work on a sandbox branch, during the Summer of Code. Isn't this a chance for bzr to shine? With lib2to3 in the 3.0 bzr branch, can't Rodrigo and the other students who don't have some funky requirement just use bzr? If you have any further questions, please feel free to contact me. I'm really looking forward to working on this project! Thanks for contributing! -Brett
Re: [Python-Dev] GSoC student introduction and sandbox commit privileges request
On Tue, Apr 22 2008 at 09:02:49PM BRT, Brett Cannon [EMAIL PROTECTED] wrote: On Tue, Apr 22, 2008 at 4:35 PM, Rodrigo Bernardo Pimentel [EMAIL PROTECTED] wrote: I've just been accepted into this year's Google Summer of Code (...) Finally, I'd like to request commit privileges to work on a sandbox branch, during the Summer of Code. Isn't this a chance for bzr to shine? With lib2to3 in the 3.0 bzr branch, can't Rodrigo and the other students who don't have some funky requirement just use bzr? FWIW, +1 from me, I'm perfectly comfortable with bzr. If you have any further questions, please feel free to contact me. I'm really looking forward to working on this project! Thanks for contributing! My pleasure :) Cheers, rbp -- Rodrigo Bernardo Pimentel [EMAIL PROTECTED] | GPG: 0x0DB14978
Re: [Python-Dev] SetType=set in types module ?
Benjamin Peterson wrote: On Wed, Apr 16, 2008 at 8:08 AM, iks hefem [EMAIL PROTECTED] wrote: Hi, the SetType is not available in the types module, so wouldn't it be needed here ? (in 2.6 by example) Nothing new is currently being added to the types module because we are trying to decide whether to remove it or not. Please file a bug report, though, to remind us if we decide to keep it. Eventually the types module will go away or at least be stripped down in Python 3.0. New types like the set type weren't added to types deliberately. Please don't file a bug report. Christian
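The builtin `set` is already a type object, so code that wanted a `types.SetType` alias can use the builtin directly. A small sketch follows. The `SetType` and `is_set_like` names are illustrative only and are not part of the `types` module.

```python
# `set` and `frozenset` are type objects themselves, so no alias is needed.
SetType = type(set())  # identical to the builtin `set`


def is_set_like(obj):
    """True for both mutable and immutable built-in sets."""
    return isinstance(obj, (set, frozenset))
```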
Re: [Python-Dev] PyArg_ParseTuple and Py_BuildValue question
Alvin Delagon wrote: I'm going to construct an SS7 packet in python using struct.pack(). Here's the question, how am I going to pass the packet I wrote in python to my module and back? I already asked this question in comp.lang.python but so far no responses yet. I hope anyone can point me to the right direction. Thanks in advance. The Python developer list is meant for the development OF Python, not WITH Python. Please use the general Python user list to get help. Christian
[Python-Dev] GSoC Student Introduction
Hello, My name is Nick Edds. I am going to be working on the 2to3 tool with Collin Winter as my mentor. More specifically, I will be working on improving the performance of the 2to3 tool in general, and its use of patterns in particular. I would like to request commit privileges to work in a sandbox branch and although I don't have any familiarity with bzr, I would be comfortable using it. Regards, Nick
Re: [Python-Dev] GSoC student introduction and sandbox commit privileges request
On Tue, Apr 22, 2008 at 5:18 PM, Rodrigo Bernardo Pimentel [EMAIL PROTECTED] wrote: On Tue, Apr 22 2008 at 09:02:49PM BRT, Brett Cannon [EMAIL PROTECTED] wrote: On Tue, Apr 22, 2008 at 4:35 PM, Rodrigo Bernardo Pimentel [EMAIL PROTECTED] wrote: I've just been accepted into this year's Google Summer of Code (...) Finally, I'd like to request commit privileges to work on a sandbox branch, during the Summer of Code. Isn't this a chance for bzr to shine? With lib2to3 in the 3.0 bzr branch, can't Rodrigo and the other students who don't have some funky requirement just use bzr? FWIW, +1 from me, I'm perfectly comfortable with bzr. Fine by me; I don't care one way or the other. Collin
Re: [Python-Dev] GSoC Student Introduction
On Tue, Apr 22, 2008 at 7:42 PM, Nick Edds [EMAIL PROTECTED] wrote: Hello, My name is Nick Edds. (...) I would like to request commit privileges to work in a sandbox branch and although I don't have any familiarity with bzr, I would be comfortable using it. Luckily, Bazaar is really easy. Thanks for contributing! -- Cheers, Benjamin Peterson
Re: [Python-Dev] BSDDB3
Trent Nelson wrote:
| I remember those rampant BSDDB crashes on Windows well. [...]
| basically, the tests weren't cleaning up their environment in
| the right order, so BSDDB was getting passed completely and
| utterly bogus values.

Next week I will (if nothing goes wrong) publish pybsddb 4.6.4. This release supports distributed transactions and replication; the test suite is much faster and has been rewritten so that tests can be launched from multiple threads/processes if you wish; there is setuptools/PyPI support, etc.

I think this release would be appropriate to integrate into Python. I think most requests are addressed, and the new features are interesting (replication, distributed transactions, no crash when closing objects in the wrong order...). I have also completed the documentation, covering the full supported API, and ported it to the Python 2.6 documentation system. The result:

http://www.jcea.es/programacion/pybsddb.htm#bsddb3-4.6.4
http://www.jcea.es/programacion/pybsddb_doc/preview/

I'm very interested in integrating this release into Python 2.6 for the new features and the full documentation, and to get feedback from the buildbots and the python-dev community. I would also like to avoid integrating pybsddb late in the Python 2.6 release cycle; I hope to be away from my computer in August! :)

I'm a bit nervous about syncing, because I have the feeling that python-dev is committing changes to Python's private branch of pybsddb. I would much prefer that patches be sent to me, and that canonical pybsddb releases be integrated into Python frequently. Somebody suggested posting patches in the tracker, but I don't think that is going to work: the diff between the current Python bsddb and the official version is so huge that nobody could follow it. A more sensible approach, I think, is to diff the current Python pybsddb against the version I used as my root (January?), integrate those changes into the current canonical pybsddb, and then drop the entire updated package into Python.

After that, commits to Python's copy of pybsddb should be avoided; patches should be sent to me. I think this is the only workable way when integrating a project maintained outside the Python SVN. Suggestions?

PS: I can't comment on Win64. It is an alien world to me :).

-- 
Jesus Cea Avion [EMAIL PROTECTED] - http://www.jcea.es/
jabber / xmpp:[EMAIL PROTECTED]
Things are not so easy. My name is Dump, Core Dump.
"Love is placing your happiness in the happiness of another" - Leibniz
Re: [Python-Dev] GSoC Student Introduction
On Tue, Apr 22, 2008 at 7:38 PM, Benjamin Peterson [EMAIL PROTECTED] wrote: On Tue, Apr 22, 2008 at 7:42 PM, Nick Edds [EMAIL PROTECTED] wrote: Hello, My name is Nick Edds. (...) I would like to request commit privileges to work in a sandbox branch and although I don't have any familiarity with bzr, I would be comfortable using it. Luckily, Bazaar is really easy. See http://python.org/dev/bazaar/ for info. And if you have any other issues feel free to ask, Nick. -Brett
Re: [Python-Dev] Encoding detection in the standard library?
Bill Janssen writes: Internet-compliant email actually has well-specified mechanisms for including encoding information; see RFCs 2047 and 2231. There's no need to guess; you can just look. You must be very special to get only compliant email. About half my colleagues use RFC 2047 to encode Japanese file names in MIME attachments (a MUST NOT behavior according to RFC 2047), and a significant fraction of the rest end up with binary Shift JIS or EUC or MacRoman in there. And those are just the most widespread violations I can think of off the top of my head. Not to mention that I find this: =?X-UNKNOWN?Q?Martin_v=2E_L=F6wis?= [EMAIL PROTECTED], in the header I got from you. (I'm not ragging on you, I get Martin's name wrong a significant portion of the time myself. :-( )
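The `X-UNKNOWN` header quoted above shows that even when the RFC 2047 machinery is used, the declared label can be useless. A minimal sketch using the standard library's `email.header.decode_header` follows. `decode_rfc2047` is a hypothetical name. The Latin-1 fallback is an assumption standing in for whatever guessing policy an application adopts.

```python
import codecs
from email.header import decode_header


def decode_rfc2047(value, fallback="latin-1"):
    """Decode an RFC 2047 header, falling back when the declared charset is unusable."""
    pieces = []
    for raw, charset in decode_header(value):
        if isinstance(raw, str):  # header contained no encoded words at all
            pieces.append(raw)
            continue
        charset = charset or "ascii"  # unencoded runs between encoded words
        try:
            codecs.lookup(charset)
            pieces.append(raw.decode(charset))
        except (LookupError, UnicodeDecodeError):
            # Unknown label (e.g. X-UNKNOWN) or wrong label: fall back to a guess.
            pieces.append(raw.decode(fallback, errors="replace"))
    return "".join(pieces)


print(decode_rfc2047("=?X-UNKNOWN?Q?Martin_v=2E_L=F6wis?="))
```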
Re: [Python-Dev] Encoding detection in the standard library?
Martin v. Löwis writes: In any case, I'm very skeptical that a general guess encoding module would do a meaningful thing when applied to incorrectly encoded HTML pages. That depends on whether you can get meaningful information about the language from the fact that you're looking at the page. In the browser context, for one, 99.44% of users are monolingual, so you only have to distinguish among the encodings for their language. In this context a two stage process of determining a category of encoding (eg, ISO 8859, ISO 2022 7-bit, ISO 2022 8-bit multibyte, UTF-8, etc), and then picking an encoding from the category according to a user-specified configuration has served Emacs/MULE users very well for about 20 years. It does *not* work in a context where multiple encodings from the same category are in use (eg, the email folder of a Polish Gastarbeiter in Berlin). Nonetheless it is pretty useful for user agents like mail clients, web browsers, and editors.
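The two-stage process described above can be sketched roughly as follows. Stage 1 assigns the bytes to a coarse category. Stage 2 lets a per-user table pick the concrete encoding within that category. The categories here are a simplification of the ones listed (for instance, no attempt is made to separate ISO 2022 8-bit from other 8-bit data), and the `JAPANESE_USER` table and function names are hypothetical.

```python
# Hypothetical per-user configuration: one concrete encoding per category.
JAPANESE_USER = {
    "ascii": "ascii",
    "iso-2022-7bit": "iso2022_jp",
    "utf-8": "utf-8",
    "8bit": "euc_jp",
}


def encoding_category(data):
    """Stage 1: coarse, language-independent classification of raw bytes."""
    if max(data, default=0) < 0x80:
        # 7-bit data; ESC sequences suggest an ISO 2022 encoding.
        return "iso-2022-7bit" if b"\x1b" in data else "ascii"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "8bit"  # ISO 8859, EUC, Shift JIS, ... are not told apart here


def guess_encoding(data, prefs):
    """Stage 2: let the user's configuration pick within the category."""
    return prefs[encoding_category(data)]


print(guess_encoding("日本".encode("euc_jp"), JAPANESE_USER))
```

As the message notes, stage 2 is where this breaks down: a single "8bit" entry cannot serve a mailbox that mixes, say, ISO 8859-2 and ISO 8859-1.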
Re: [Python-Dev] Encoding detection in the standard library?
Guido van Rossum writes: To the contrary, an encoding-guessing module is often needed, and guessing can be done with a pretty high success rate. Other Unicode libraries (e.g. ICU) contain guessing modules. I suppose the API could return two values: the guessed encoding and a confidence indicator. Note that the locale settings might figure in the guess. Not locale settings, but user configuration. A Bayesian detector (CodeBayes? hi, Skip!) might be a good way to go for servers, while a simple language preference might really up the probability for user agents.
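The suggested two-value API (a guessed encoding plus a confidence indicator), combined with the reply's point about user configuration, can be illustrated with a deliberately naive sketch. The caller supplies candidate encodings in preference order, which plays the role of the user configuration. The confidence is a toy measure (1/n over the n candidates that decode without error), not a statistical or Bayesian model. `guess_with_confidence` is a hypothetical name.

```python
def guess_with_confidence(data, candidates):
    """Return (encoding, confidence) using a user-ordered candidate list.

    Toy confidence: 1/n, where n is the number of candidates that decode the
    bytes without error; (None, 0.0) if none do.
    """
    decodable = []
    for name in candidates:
        try:
            data.decode(name)
        except (UnicodeDecodeError, LookupError):
            continue
        decodable.append(name)
    if not decodable:
        return None, 0.0
    return decodable[0], 1.0 / len(decodable)


print(guess_with_confidence(b"caf\xe9", ["utf-8", "latin-1"]))
```

Plain ASCII decodes under almost every candidate, so it gets a low toy confidence even though any choice is harmless. A real detector would need a better notion of confidence than "how many codecs accept these bytes".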
Re: [Python-Dev] Encoding detection in the standard library?
Yup, but DrProject (the target application) also serves as a relay and archive for email. We have no control over the agent used for composition, and AFAIK there's no standard way to include encoding information. That's not at all the case. MIME defines that in full detail, since 1993. Regards, Martin
Re: [Python-Dev] Encoding detection in the standard library?
> I certainly agree that if the target set of documents is small enough it is possible to hand-code the encoding. There are many applications, however, that need to examine the content of an arbitrary, or at least non-small, set of web documents. To name a few such applications:
> - web search engines
> - translation software

I'll question whether these are many programs. Web search engines and translation software have many more challenges to master, and they are fairly special-cased, so I would expect they need to find their own answer to character set detection, anyway (see Bill Janssen's answer on machine translation, also).

> - document/bookmark management systems
> - other kinds of document analysis (market research, SEO, etc.)

Not sure what specifically you have in mind; however, I expect that these also have their own challenges. For example, I would expect that MS-Word documents are frequent. You don't need character set detection there (Word is all Unicode), but you need an API to look into the structure of .doc files.

> Not that I can substantiate. Documents/feeds covers a lot of what is on the web--I was only trying to make the point that on the web, whenever an encoding can be specified, it will be specified incorrectly for a significant chunk of exemplars.

I firmly believe this assumption is false. If the encoding comes out of software (which it often does), it will be correct most of the time. It's incorrect only if the content editor has to type it.

> Indeed, if it is only one site it is pretty easy to work around. My main use of Python is processing and analyzing hundreds of millions of web documents, so it is pretty easy to see applications (which I have listed above).

Ok. What advantage would you (or somebody working on a similar project) gain if chardet was part of the standard library? What if it was not chardet, but some other algorithm?

> I can also think of good arguments for excluding encoding detection for maintenance reasons: is every case of the algorithm guessing wrong a bug that needs to be fixed in the stdlib? That is an unbounded commitment.

Indeed, that's what I meant with my initial remark. People will expect that it works correctly - both with the consequence of unknowingly proceeding with the incorrect response, and then complaining when they find out that it did produce an incorrect answer.

For chardet specifically, my usual standard-library remark applies: it can't become part of the standard library unless the original author contributes it, anyway. I would then hope that he or a group of people would volunteer to maintain it, with the threat of removing it from the stdlib again if these volunteers go away and too many problems show up.

Regards, Martin
Re: [Python-Dev] Encoding detection in the standard library?
Martin v. Löwis [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] | I certainly agree that if the target set of documents is small enough it | | Ok. What advantage would you (or somebody working on a similar project) | gain if chardet was part of the standard library? What if it was not | chardet, but some other algorithm? It seems to me that since there is not a 'correct' algorithm but only competing heuristics, encoding detection modules should be made available via PyPI and only be considered for stdlib after a best of breed emerges with community support.
Re: [Python-Dev] socket recv on win32 can be extremly delayed (python bug?)
> Is there anybody who knows how to solve the problem?

If it's really the problem described in the MSKB article, the article also suggests a solution: use non-blocking sockets. Regards, Martin
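The non-blocking suggestion might look roughly like the following. The code waits for readability with `selectors`, then drains whatever is already buffered without blocking on the next `recv`. This is a sketch only; `recv_available` is a hypothetical helper. Whether it sidesteps the 200 ms stall depends on the conditions in the MSKB article, which this code does not reproduce.

```python
import selectors
import socket


def recv_available(sock, timeout=1.0, bufsize=65536):
    """Wait until sock is readable, then return all bytes already buffered."""
    sock.setblocking(False)
    sel = selectors.DefaultSelector()
    sel.register(sock, selectors.EVENT_READ)
    try:
        if not sel.select(timeout):
            return b""                 # nothing arrived within the timeout
        chunks = []
        while True:
            try:
                chunk = sock.recv(bufsize)
            except BlockingIOError:
                break                  # buffer drained; don't wait for more
            if not chunk:
                break                  # peer closed the connection
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        sel.close()
```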