date:20080422

 IMHO, more research has to be done into this area before a
 standard module can be added to Python's stdlib... and
 who knows, perhaps we're lucky and by that time everyone is
 using UTF-8 anyway :-)

I walked over to our computational linguistics group and asked.  This
is often combined with language guessing (which uses a similar
approach, but using characters instead of bytes), and apparently can
usually be done with high confidence.  Of course, they're usually
looking at clean texts, not random stuff.  I'll see if I can get
some references and report back -- most of the research on this was
done in the 90's.

Bill
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] configure error: rm: conftest.dSYM: is a directory

2008-04-22 Thread Ronald Oussoren



On 5 Apr, 2008, at 21:17, [EMAIL PROTECTED] wrote:

I just noticed this error message during configure:

   checking whether gcc accepts -Olimit 1500... no
   checking whether gcc supports ParseTuple __format__... no
   checking whether pthreads are available without options... yes
   checking whether g++ also accepts flags for thread support... no
   checking for ANSI C header files... rm: conftest.dSYM: is a directory

   rm: conftest.dSYM: is a directory
   yes
   checking for sys/types.h... yes
   checking for sys/stat.h... yes
   checking for stdlib.h... yes
   checking for string.h... yes

Note the rm: conftest.dSYM: is a directory.  This occurred a few times
during the configure process.  Didn't cause it to conk out, but is
annoying.


I've looked into this issue. It is harmless and caused by an
interaction between AC_TRY_RUN and gcc on Leopard.

Gcc generates '.dSYM' directories when linking with debugging enabled.
These directories contain detached debugging information (see man
dsymutil).

AC_TRY_RUN tries to remove 'conftest.*' using rm, without the -r flag.
The end result is an error message during configure and a
'conftest.dSYM' turd.

AFAIK this is not easily fixed without changing the definition of
AC_TRY_RUN, at least not without crude hacks.
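
To illustrate the failure mode in Python terms (a sketch only, not what
autoconf actually runs): removing a plain file and removing a directory
are different operations, just as rm and rm -r are, so a cleanup step
has to treat conftest.dSYM separately. The helper name
remove_conftest_artifacts is hypothetical.

```python
import glob
import os
import shutil


def remove_conftest_artifacts(build_dir):
    """Remove conftest.* files and directories; return the names removed."""
    removed = []
    for path in sorted(glob.glob(os.path.join(build_dir, "conftest.*"))):
        if os.path.isdir(path) and not os.path.islink(path):
            # Equivalent of 'rm -r': needed for conftest.dSYM directories.
            shutil.rmtree(path)
        else:
            # Equivalent of plain 'rm': only works for non-directories.
            os.remove(path)
        removed.append(os.path.basename(path))
    return removed
```

Calling os.remove() on the .dSYM directory instead would raise an
OSError, which is the Python analogue of the "is a directory" message
seen during configure.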

Ronald



Skip

Re: [Python-Dev] Encoding detection in the standard library?

2008-04-22 Thread David Wolever


On 22-Apr-08, at 12:30 AM, Martin v. Löwis wrote:
IMO, encoding estimation is something that many web programs will have
to deal with

Can you please explain why that is? Web programs should not normally
have the need to detect the encoding; instead, it should be specified
always - unless you are talking about browsers specifically, which
need to support web pages that specify the encoding incorrectly.

Two cases come immediately to mind: email and web forms.
When a web browser POSTs data, there is no standard way of
communicating which encoding it's using.  There are some hints which
make it easier (accept-charset attributes, the encoding used to send
the page to the browser), but no guarantees.
Email is a smaller problem, because it usually has a helpful
content-type header, but that's no guarantee.


Now, at the moment, the only data I have to support this claim is my
experience with DrProject in non-English locations.
If I'm the only one who has had these sorts of problems, I'll go back
to Unicode for Dummies.



so it might as well be built in; I would prefer the option
to run `text=input.encode('guess')` (or something similar) than
relying on an external dependency or worse yet using a hand-rolled
algorithm.

Ok, let me try differently then. Please feel free to post a patch to
bugs.python.org, and let other people rip it apart.
For example, I don't think it should be a codec, as I can't imagine it
working on streams.


As things frequently are, it seems like this is a much larger problem
than I originally believed.

I'll go back and take another look at the problem, then come back if
new revelations appear.


Re: [Python-Dev] A smarter shutil.copytree ?

2008-04-22 Thread Steven Bethard

On Tue, Apr 22, 2008 at 1:56 AM, Tarek Ziadé [EMAIL PROTECTED] wrote:
 On Mon, Apr 21, 2008 at 2:25 AM, Steven Bethard [EMAIL PROTECTED] wrote:
  On Sun, Apr 20, 2008 at 4:15 PM, Tarek Ziadé [EMAIL PROTECTED] wrote:
 I have submitted a patch for review here: 
 http://bugs.python.org/issue2663

  glob-style patterns or a callable (for complex cases) can be provided
  to filter out files or directories.
  
I'm not a big fan of the sequence-or-callable argument. Why not just
make it a callable argument, and supply a utility function so that you
can write something like::
  
   exclude_func = shutil.excluding_patterns('*.tmp', 'test_dir2')
   shutil.copytree(src_dir, dst_dir, exclude=exclude_func)

  I made another draft based on a single callable argument to try out:
  http://bugs.python.org/file10073/shutil.copytree.filtering.patch

  The callable takes the src directory + its content as a list, and
  returns the entries eligible for exclusion

FWIW, that looks better to me.

  That makes me wonder, like Alexander said on the bug tracker:
  In the glob-style patterns callable, do we want to deal with absolute paths ?

I think that it would be okay to document that
shutil.ignore_patterns() only accepts patterns matching individual
filenames (not complex paths). If someone needs to do something with
absolute paths, then they can write their own 'ignore' function,
right?
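
For reference, a minimal sketch of the callable-based design discussed
above, written against the API as it exists in later Python releases
(the ignore= argument of shutil.copytree together with
shutil.ignore_patterns), not against the draft patch. The helper
ignore_by_absolute_path is hypothetical and stands in for the "write
your own 'ignore' function" case.

```python
import os
import shutil


def ignore_by_absolute_path(excluded_paths):
    """Build an ignore callable that excludes entries by absolute path."""
    excluded = {os.path.abspath(p) for p in excluded_paths}

    def _ignore(directory, names):
        # copytree calls this once per visited directory, passing the
        # directory path and the names it contains; return names to skip.
        return {
            name for name in names
            if os.path.abspath(os.path.join(directory, name)) in excluded
        }

    return _ignore


def copy_without_temp_files(src_dir, dst_dir):
    """Copy a tree, skipping *.tmp entries and anything named test_dir2."""
    return shutil.copytree(
        src_dir, dst_dir,
        ignore=shutil.ignore_patterns("*.tmp", "test_dir2"),
    )
```

The patterns given to ignore_patterns are matched against individual
names only, which is why path-based exclusion needs a custom callable.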

Steve
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
 --- Bucky Katt, Get Fuzzy

Re: [Python-Dev] Encoding detection in the standard library?

 When a web browser POSTs data, there is no standard way of communicating
 which encoding it's using.

That's just not true. Web browsers should and do use the encoding of the
web page that originally contained the form.

 There are some hints which make it easier
 (accept-charset attributes, the encoding used to send the page to the
 browser), but no guarantees.

Not true. The latter is guaranteed (unless you assume bugs - but if
you do, can you present a specific browser that has that bug?)

 Email is a smaller problem, because it usually has a helpful
 content-type header, but that's no guarantee.

Then assume windows-1252. Mailers that don't use MIME for non-ASCII
characters mostly died 10 years ago; those people who continue to
use them likely can accept occasional mojibake (or else they would
have switched long ago).
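
A sketch of that fallback using the standard email package: trust the
declared charset when there is one, otherwise assume windows-1252. The
function name decode_body is illustrative only, and the sketch handles
non-multipart messages only.

```python
import email


def decode_body(raw_message, fallback="windows-1252"):
    """Decode the body of a non-multipart message to str.

    Uses the charset from the Content-Type header if present, otherwise
    the fallback. Undecodable bytes are replaced, so the worst case is
    mojibake rather than an exception.
    """
    msg = email.message_from_bytes(raw_message)
    charset = msg.get_content_charset(failobj=fallback)
    payload = msg.get_payload(decode=True) or b""
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:
        # The header named a charset that Python does not know.
        return payload.decode(fallback, errors="replace")
```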

 Now, at the moment, the only data I have to support this claim is my
 experience with DrProject in non-English locations.
 If I'm the only one who has had these sorts of problems, I'll go back to
 Unicode for Dummies.

For web forms, I always encode the pages in UTF-8, and that always
works.
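
A minimal sketch of the server side of that approach, assuming the page
containing the form was itself served as UTF-8, so the browser submits
UTF-8 percent-encoded data. urllib.parse.parse_qs is the Python 3
standard library call; the helper name parse_form_body is made up for
illustration.

```python
from urllib.parse import parse_qs


def parse_form_body(body, encoding="utf-8"):
    """Parse an application/x-www-form-urlencoded request body.

    Bytes that are invalid in the expected encoding are replaced rather
    than raising, so a misbehaving client cannot crash the handler.
    """
    if isinstance(body, bytes):
        # Percent-encoded form data is plain ASCII on the wire.
        body = body.decode("ascii", errors="replace")
    return parse_qs(body, keep_blank_values=True,
                    encoding=encoding, errors="replace")
```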

For email, I once added encoding processing to pipermail (the
Mailman archiver), and that also always works.

 I'll go back and take another look at the problem, then come back if new
 revelations appear.

Good luck!

Martin

Re: [Python-Dev] BSDDB3

2008-04-22 Thread Trent Nelson

Hi Jesus,

 Martin v. Löwis wrote:
 | I think it would be helpful if you could analyze the crashes that
 | bsddb caused on Windows. Just go back a few revisions in the
 | subversion tree to reproduce the crashes.

 I have no MS Windows machines in my environment :-(

I remember those rampant BSDDB crashes on Windows well.  I brought this up with 
Martin at PyCon; I really don't think we can fault BSDDB here -- basically, the 
tests weren't cleaning up their environment in the right order, so BSDDB was 
getting passed completely and utterly bogus values.  I *think* I managed to 
persuade Martin that this was indeed our fault, and we can't really hold BSDDB 
accountable.  (My argument being that if a 3rd party app says the behaviour of 
a method is undefined if you pass it a null pointer, and you pass it a null 
pointer, and it crashes your program, it's your fault, not theirs.)

Once this was addressed, the BSDDB tests ran more or less on Windows 32-bit 
without error.  Windows x64 was another matter though -- I traced the problem 
down to wildly conflicting compiler and linker flags between our Python build 
and how we were building BSDDB (or rather how BSDDB builds out of the box on 
Windows).

My solution was to drop our reliance on the Berkeley_DB.sln/db_static.vcproj 
files completely, and mimic a bsddb44 vcproj in our own pcbuild.sln, which 
basically meant all the BSDDB source code got built in the exact same fashion 
as the rest of Python.  I also took this approach with sqlite3 and it's worked 
really well -- there have been no issues with either module since this change.

I've also got bsddb45.vcproj and bsddb46.vcproj projects floating around in one 
of my local branches somewhere.  These mimic the corresponding BSDDB projects, 
with the intent being that when it comes to release time for 2.6 and 3.0, we'd 
make a decision about which one to ship with, and then set the Python _bsddb 
module to use that.  I should probably pick that up again...

Hope this clarifies things...

Regards,

Trent.

Re: [Python-Dev] 3k checkin mails to python-checkins

 For a few days now, checkin notifications for the 3k branch seem to be sent
 to both the python-checkins and the python-3000-checkins lists. Was that a
 deliberate decision or has some bug crept into the SVN hook?

This should be fixed now. The new mailer.py had named some config
options differently from the old one.

Regards,
Martin

Re: [Python-Dev] Encoding detection in the standard library?

2008-04-22 Thread Mike Klaas



On 22-Apr-08, at 3:31 AM, M.-A. Lemburg wrote:



I don't think that should be part of the standard library. People
will mistake what it tells them for certain.


+1

I also think that it's better to educate people to add (correct)
encoding information to their text data, rather than give them a
guess mechanism...


That is a fallacious alternative: the programmers that need encoding
detection are not the same people who are omitting encoding information.

I only have a small opinion on whether charset detection should appear
in the stdlib, but I am somewhat perplexed by the arguments in this
thread.  I don't see how inclusion in the stdlib would make people
more inclined to think that the algorithm is always correct.  In terms
of the need for this functionality:


Martin wrote:

Can you please explain why that is? Web programs should not normally
have the need to detect the encoding; instead, it should be specified
always - unless you are talking about browsers specifically, which
need to support web pages that specify the encoding incorrectly.


Any program that needs to examine the contents of documents/feeds/whatever
on the web needs to deal with incorrectly-specified encodings
(which, sadly, is rather common).  The set of programs that need this
functionality is probably the same set that needs BeautifulSoup--I think
that set is larger than just browsers <grin>


-Mike

Re: [Python-Dev] Encoding detection in the standard library?

2008-04-22 Thread M.-A. Lemburg


[CCing python-dev again]

On 2008-04-22 12:38, Greg Wilson wrote:

I don't think that should be part of the standard library. People
will mistake what it tells them for certain.
[etc]


These are all good arguments, but the fact remains that we can't control 
our inputs (e.g., we're archiving mail messages sent to lists managed by 
DrProject), and some of those inputs *don't* tell us how they're encoded.

Under those circumstances, what would you recommend?


I haven't done much research into this, but in general, I think it's
better to:

 * first try to look at other characteristics of a text
   message, e.g. language, origin, topic, etc.,

 * then narrow down the number of encodings which could apply,

 * rank them to try to avoid ambiguities and

 * then try to see what percentage of the text you can decode using
   each of the encodings in reverse ranking order (ie. more specialized
   encodings should be tested first, latin-1 last).
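
The last two steps can be sketched as follows. This is only an
illustration under stated assumptions: the candidate list and its
ordering are supplied by the caller (the result of the first steps),
and "percentage decoded" is approximated by counting replacement
characters after a lenient decode. Both function names are
hypothetical.

```python
def decodable_fraction(data, encoding):
    """Return the fraction of characters decoded without replacement."""
    text = data.decode(encoding, errors="replace")
    if not text:
        return 1.0
    return 1.0 - text.count("\ufffd") / len(text)


def guess_encoding(data, candidates, threshold=1.0):
    """Return the first candidate whose decodable fraction meets threshold.

    Candidates must be ordered from most specialized to least; latin-1
    accepts every byte sequence, so it belongs last as a catch-all.
    """
    for encoding in candidates:
        if decodable_fraction(data, encoding) >= threshold:
            return encoding
    return None
```

Even with a sensible ordering the result is still a guess: several
single-byte encodings will decode almost any input without error, so a
clean decode does not prove the choice was right.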

--
Marc-Andre Lemburg
eGenix.com


Re: [Python-Dev] Known doctest bug with unicode?

2008-04-22 Thread Jeroen Ruigrok van der Werven

-On [20080418 18:05], Adam Olsen ([EMAIL PROTECTED]) wrote:
4. Make doctest smarter, so that it can grab the original module's encoding.
5. Wait until 3.0, where this is hopefully fixed by making doctests
use unicode by default?

Getting rid of the u in front of the strings as required made Python 3
indeed run the doctests as they should.

So there's a difference in behaviour between 2.x and 3.0 when it comes to
this part. I guess the better behaviour would be for doctest to honour the
encoding specified in the file/module? If other people agree I can see what
I can do to make that work.

-- 
Jeroen Ruigrok van der Werven asmodai(-at-)in-nomine.org / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
Confutatis maledictis, flammis acribus addictis...

Re: [Python-Dev] Encoding detection in the standard library?

 Can you please explain why that is? Web programs should not normally
 have the need to detect the encoding; instead, it should be specified
 always - unless you are talking about browsers specifically, which
 need to support web pages that specify the encoding incorrectly.
 
 Any program that needs to examine the contents of
 documents/feeds/whatever on the web needs to deal with
 incorrectly-specified encodings

That's not true. Most programs that need to examine the contents of
a web page don't need to guess the encoding. In most such programs,
the encoding can be hard-coded if the declared encoding is not
correct. Most such programs *know* what page they are webscraping,
or else they couldn't extract the information out of it that they
want to get at.

As for feeds - can you give examples of incorrectly encoded ones?
(I don't ever use feeds, so I honestly don't know whether they
are typically encoded incorrectly. I've heard they are often XML,
in which case I strongly doubt they are incorrectly encoded.)

As for whatever - can you give specific examples?

 (which, sadly, is rather common). The
 set of programs that need this functionality is probably the
 same set that needs BeautifulSoup--I think that set is larger than just
 browsers <grin>

Again, can you give *specific* examples that are not web browsers?
Programs needing BeautifulSoup may still not need encoding guessing,
since they still might be able to hard-code the encoding of the web
page they want to process.

In any case, I'm very skeptical that a general guess encoding
module would do a meaningful thing when applied to incorrectly
encoded HTML pages.

Regards,
Martin

[Python-Dev] py3k: print function treats sep=None and end=None in an unintuitive way

2008-04-22 Thread Alessandro Guido

Can anybody please explain to me why print('a', 'b', sep=None, end=None)
should produce "a b\n" instead of "ab"?
I've read http://docs.python.org/dev/3.0/library/functions.html#print,
PEP 3105 and some ml threads but did not find a good reason justifying
such strange behaviour.
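
To make the behaviour concrete, here is a small reproduction (the
helper printed is just for illustration and captures the output in a
StringIO). Passing None behaves like omitting the argument, so the
defaults of a space and a newline apply, whereas empty strings give ab.

```python
import io


def printed(*args, **kwargs):
    """Return exactly what print() would write for these arguments."""
    buffer = io.StringIO()
    print(*args, file=buffer, **kwargs)
    return buffer.getvalue()


# None selects the defaults: sep=' ' and end='\n'.
assert printed("a", "b", sep=None, end=None) == "a b\n"
# Empty strings are what actually suppress the separator and newline.
assert printed("a", "b", sep="", end="") == "ab"
```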

Thanks.

-Alessandro Guido

Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you

2008-04-22 Thread Lloyd Kvam

On Tue, 2008-04-08 at 10:01 -0700, zooko wrote:
 They both agreed that it made perfect sense.  I told one of them
 about the alternate proposal to define a new database file to contain
 a list of installed packages, and he sighed and rolled his eyes and
 said "So they are planning to reinvent apt!".

When I wear my sysadmin hat, eggs become a nuisance.  They are not
listed in the system packages; if zipped they won't work when the apache
user tries to import them; easy_install can produce unexpected upgrades.
The system package manager (apt or yum) is much preferred.

As a developer, eggs are great.  If a python module is not already
available from my system packagers, easy_install will find it, get it,
and install it.  I waste almost no time with system administration
issues while developing.

Fortunately, distutils includes tools like bdist_rpm so that python
modules can be packaged for easy processing by the system package
manager.  So once I need to switch back to a sysadmin role, I can use
the system tools to install and track packages.

-- 
Lloyd Kvam
Venix Corp
DLSLUG/GNHLUG library
http://www.librarything.com/catalog/dlslug
http://www.librarything.com/profile/dlslug
http://www.librarything.com/rsshtml/recent/dlslug


Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you

On Wed, Apr 09, 2008 at 11:37:07AM +1000, Ben Finney wrote:
 zooko [EMAIL PROTECTED] writes:

  I am skeptical that programmers are going to be willing to use a new
  database format. They already have a database -- their filesystem --
  and they already have the tools to control it -- mv, rm, and
  PYTHONPATH. Many of them already hate the existence of the
  easy_install.pth database file, and I don't see why a new database
  file would be any different.

 Moreover, many of us already have a database of *all* packages on the
 system, not just Python-language ones: the package database of our
 operating system. Adding another, parallel, database which needs
 separate maintenance, and only applies to Python packages, is not a
 step forward in such a situation.

90 % (at least) of the world does not have such a database. I, and probably
you, have such a very nice database. It works well, and we can choose to
forget the problems our users are facing. It does not solve them though.

In addition, packaging is system-specific. I recently had to learn some
Debian packaging, because I wanted my Ubuntu and Debian users to be able
to use my projects seamlessly. What about RPMs for RHEL, Fedora,
Mandriva? ... and Conary packages? and MSIs? ... When do I find time to
do development if I have to learn all this packaging?

It would be fantastic to have an abstraction on all these packaging
systems, including, as you point out, their database. I do agree that
reusing the system packaging's database is great, and would be the best
option for system-wide install. However one of the very neat features of
setuptools and eggs is that you don't need administrator access to
install the packages, and that is great in a shared environment, like a
computation cluster. The system's database is thus unfortunately not a
complete solution to the problem.

My 2 cents,

Gaël

Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you

On Wed, Apr 09, 2008 at 12:41:32AM -0400, Phillip J. Eby wrote:
 The way to achieve a database for Python would be to provide tools for
 conversion of eggs to rpms and debs,

 Such tools already exist, although the conversion takes place from 
 source distributions rather than egg distributions.

What is the status of the deb backend? The only one I know of is the unofficial
one maintained by Andrew Straw, but my information may be lagging behind.

By the way, if these tools work well, they are priceless!

Cheers,

Gaël

Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you

On Wed, April 9, 2008 12:41 am, Phillip J. Eby wrote:
 At 10:49 PM 4/8/2008 -0400, Stanley A. Klein wrote:
On Tue, April 8, 2008 9:37 pm, Ben Finney
[EMAIL PROTECTED] wrote:
  Date: Wed, 09 Apr 2008 11:37:07 +1000
  From: Ben Finney [EMAIL PROTECTED]
  Subject: Re: [Distutils] how to easily consume just the parts of eggs
that are good for you
  To: [EMAIL PROTECTED]

  zooko [EMAIL PROTECTED] writes:
  eyes and said So they are planning to reinvent apt!.

  That's pretty much my reaction, too.

I have the same reaction.

 I'm curious.  Have any of you actually read PEP 262 in any detail?  I
 have seen precious little discussion so far that doesn't appear to be
 based on significant misunderstandings of either the purpose of
 reviving the PEP, or the mechanics of its proposed implementation.

I haven't read the PEP at all.  I generally don't read PEPs.

I have tried in the past to use easy_install, but have run into problems
because there is no communication between easy_install and the rpm
database, resulting in failure of easy_install to recognize that
dependencies have already been installed using rpms.

 This problem doesn't exist with Python 2.5, unless you're using a
 platform that willfully strips out the installation information that
 Python 2.5 provides for these packages.

IIRC, I have had the problem with Python 2.5 on Fedora 7.  Until recently,
Fedora packagers did strip out the egg information included with Python
packages they packaged.  I left those files in when packaging myself using
bdist_rpm.

However, are you implying that the installation information for Python egg
packages accesses and coordinates with the rpm database?  I found myself
having to go into the setup.py for the relevant package(s) and delete any
statements regarding dependencies.  Otherwise, IIRC, the packaging
couldn't proceed because the Python packaging tool couldn't find the
dependencies that had already been installed as rpms.  After installation,
Python managed to find the relevant files, but the packaging tool
couldn't.

A database focused only on Python packages is highly inappropriate for
Linux systems, violates the Linux standards, and creates problems because
eggs are not coordinated with the operating system package manager.

 The revamp of PEP 262 is aimed at removing .egg files and directories
 from the process, by allowing system packagers to tell Python what
 files belong to them and should not be messed with.  And conversely,
 allowing systems and installation targets *without* package managers
 to safely manage their Python installations.

IMHO, the main system without a package manager is Windows.  A reasonable
way to deal with Windows would be to create a package manager for it that
could be used by Python and anyone else who wanted to use it.  The package
manager could establish a file hierarchy similar to the Unix FHS and
install files appropriately, except for what is needed to satisfy the
Windows OS.  That would probably go a long way to addressing the issues
being discussed here.  This is primarily a Windows problem, not a Python
problem.

The way to achieve a database for Python would be to provide tools for
conversion of eggs to rpms and debs,

 Such tools already exist, although the conversion takes place from
 source distributions rather than egg distributions.

You are talking here about bdist_rpm and not about a tool that would take
a Python package distributed as an egg file and convert the egg to an rpm
or a deb.  Unfortunately, some Python packagers are beginning to limit
their focus only to egg distribution.  That creates a problem for users
who have native operating system package management.

to have eggs support conformance to
the LSB and FHS,

 Applying LSB and FHS to the innards of Python packages makes as much
 sense as applying them to the contents of Java .jar files -- i.e.,
 none.  If it's unchanging data that's part of a program or library,
 then it's a program or library, just like static data declared in a C
 program or library.  Whether the file extension is .py, .so, or even
 .png is irrelevant.

The FHS defines places to put specific kinds of files, such as command
scripts (/bin, /usr/bin, /sbin, or /usr/sbin), documentation
(/usr/share/doc/package-name), and configuration files (/etc).  There are
several kinds of files identified and places defined to put them. 
Distribution by eggs has a tendency to scoop up all of those files and put
them in /usr/lib/python/site-packages, regardless of where they belong. 
Having eggs support conformance to FHS would mean recognizing and tagging
the relevant files.  A tool for converting eggs to rpms or debs would
essentially reformat the egg to rpm or deb and put files where they
belong.

Stan Klein


Re: [Python-Dev] Python Leopard DLL Hell

2008-04-22 Thread Brian Cole

I have learned that this is a specific behavior of OS X. I have
submitted a formal bug report to Apple about the problem. It appears
that this is documented by Apple as acceptable:
http://developer.apple.com/documentation/DeveloperTools/Reference/MachOReference/Reference/reference.html#//apple_ref/c/func/dlopen

Whereas, Linux will respect the fact you gave it a specific shared library:
http://linux.die.net/man/3/dlopen

If I am provided a workaround by Apple I will post a Python patch. A
little scary that someone can circumvent my application by just
setting an environment variable.

-Brian Cole

On Tue, Apr 8, 2008 at 7:52 PM, Michael Torrie [EMAIL PROTECTED] wrote:
Brian Cole wrote:
That appears to be working correctly at first glance. The argument to
dlopen is the correct shared library. Unfortunately, either python or
OS X is lying to me here. If I inspect the python process with OS X's
Activity Monitor and look at the Open Files and Ports tab, it shows
that the _foo.so shared library is actually the one located inside
$DYLD_LIBRARY_PATH.

So this problem may not be python's, but I place it here as a first
shot (maybe I'm using the imp module incorrectly).

Sounds like you're going to need to learn how to use dtrace. Then you
can more closely monitor exactly what python and the loader are doing.
dtrace is very complicated (borrowed from Solaris) but extremely
powerful. Worth learning anyway, but sounds like it's probably the
debugging tool you need.

Another thing you can do is check through the python source code and see
how the os x-specific code is handling this type of situation.


Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you

All my development is done on Linux.  I use Windows very minimally (such
as for tax preparation) and unless forced to do so for specific
circumstances (such as submittal to grants.gov) do not expose Windows to
the Internet.

In the future there may possibly arise a need for us to port some
Linux-developed Python code to Windows, but we will have to cross that
bridge when we get there.

I think you raise an interesting issue:  What is a package manager?  I
have minimal experience installing packages on Windows over the last 5-10
years, but in my experience a Windows package comes as an executable that,
when run, installs itself.  Unless a third-party program monitors the
installation, uninstalling is a nasty chore, as is finding out what files
were installed or where they went.

The rpm and deb package managers (and their yum and other higher level
dependency managers) do a lot of things:

1.  They install packages and maintain databases of what packages were
installed
2.  They manage dependencies
3.  They support clean uninstalling of packages
4.  They can query packages, both installed (via their databases) and not
yet installed (e.g., as rpm or deb files), to determine attributes, such
as files they install, dependencies, and other information defined at
packaging time.
5.  They build packages and (in some cases) can rebuild packages.
6.  They can verify packages for integrity and security purposes.
7.  They can download package files and maintain archives of installed
package files for use as local repositories.

There may be other functions, but the above is a top-of-the-head list.

I can say that I'm not terribly happy with Python packaging that is only
minimally compatible with rpm.  I haven't used Ubuntu all that much. I do
like Ubuntu's packaging and package management, and I do know that there
are programs, such as alien, that can translate from rpm to deb formats.

I don't think I ever said that Windows is broken in the area of package
management.  My own experience is that the files of Windows programs tend
to be put in a directory devoted to the program, rather than put in
directories with other files having similar purposes.  At one time, the
default location in Windows for word processing files was even in a
sub-directory of the word processing program.  That changed to having a
form of user home directory, but it didn't change much for the program
files themselves.  Unix/Linux puts the files in specific areas of the file
system having functional commonality.  One could almost say that the
Windows default approach to structuring its filesystem avoids or minimizes
the need for package management.

I repeat that this issue mainly arises because Windows doesn't have the
same kind of filesystem structure (and therefore the same need for package
management) that other systems have.  I don't know what the Windows
add/remove programs function does, but all it might do is run the
executable to install packages and record the installation (as was
previously done by third-party programs) to facilitate clean removal.
Unless it can perform more of the other functions I listed above, I doubt
I would call add/remove a package manager.


Stan Klein



On Wed, April 9, 2008 1:23 pm, Paul Moore wrote:
 On 09/04/2008, Stanley A. Klein [EMAIL PROTECTED] wrote:
 IMHO, the main system without a package manager is Windows.  A reasonable
  way to deal with Windows would be to create a package manager for it that
  could be used by Python and anyone else who wanted to use it.  The package
  manager could establish a file hierarchy similar to the Unix FHS and
  install files appropriately, except for what is needed to satisfy the
  Windows OS.  That would probably go a long way to addressing the issues
  being discussed here.  This is primarily a Windows problem, not a Python
  problem.

 Windows does have a package manager - the add/remove programs
 application. It's extremely limited, and doesn't make any attempt at
 doing dependency resolution, certainly - but that's a separate issue.

 I don't know if you use Windows (as in, develop programs using Python
 on Windows). If you do, then I'd be interested in your views on
 bdist_wininst and bdist_msi installers, and how they fit into the
 setuptools/egg environment, particularly with regard to the package
 manager you are proposing. If you don't use Windows, then I don't see
 how you can usefully comment.

 Personally, as I've said before, I don't have a problem with a
 Python-only package manager, as long as it replaces or integrates
 bdist_wininst and bdist_msi. Having two package managers is far worse
 than having none - and claiming that add/remove programs isn't a
 package manager is just ignoring reality (if it isn't, then why do
 bdist_wininst and bdist_msi exist?).

 Are the Linux users happy with having a Python package manager that
 ignores RPM/apt? Why should Windows users be any happier?

 Sorry - I'm feeling a little grumpy. I've read one too many

Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you

On Wed, Apr 09, 2008 at 02:26:31PM -0400, Stanley A. Klein wrote:
 The rpm and deb package managers (and their yum and other higher level
 dependency managers) do a lot of things:

 1.  They install packages and maintain databases of what packages were
 installed
 2.  They manage dependencies
 3.  They support clean uninstalling of packages
 4.  They can query packages, both installed (via their databases) and not
 yet installed (e.g., as rpm or deb files), to determine attributes, such
 as files they install, dependencies, and other information defined at
 packaging time.
 5.  They build packages and (in some cases) can rebuild packages.
 6.  They can verify packages for integrity and security purposes.
 7.  They can download package files and maintain archives of installed
 package files for use as local repositories.

You are collapsing three different functionalities in one:

  * Dealing with repositories and downloading: yum/apt

 * Installing + uninstalling packages, and dealing with system
   consistency (thus checking the dependencies are available): rpm/dpkg

  * Building

For me it is important that the 3 are separated:

  * I may want to download the dependencies of a package to burn to a CD
for a computer that does not have internet access.

  * I may want to send a tarball to a build server that does the building,
but no install (so as not to corrupt my working system).

Cheers,

Gaël
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you


On Wed, April 9, 2008 3:19 pm, Gael Varoquaux wrote:
 On Wed, Apr 09, 2008 at 02:26:31PM -0400, Stanley A. Klein wrote:
 The rpm and deb package managers (and their yum and other higher level
 dependency managers) do a lot of things:

 1.  They install packages and maintain databases of what packages were
 installed
 2.  They manage dependencies
 3.  They support clean uninstalling of packages
 4.  They can query packages, both installed (via their databases) and not
 yet installed (e.g., as rpm or deb files), to determine attributes, such
 as files they install, dependencies, and other information defined at
 packaging time.
 5.  They build packages and (in some cases) can rebuild packages.
 6.  They can verify packages for integrity and security purposes.
 7.  They can download package files and maintain archives of installed
 package files for use as local repositories.

 You are collapsing three different functionalities in one:

   * Dealing with repositories and downloading: yum/apt

  * Installing + uninstalling packages, and dealing with system
consistency (thus checking the dependencies are available): rpm/dpkg

   * Building

 For me it is important that the 3 are separated:

   * I may want to download the dependencies of a package to burn to a CD
 for a computer that does not have internet access.

   * I may want to send a tarball to a build server that does the building,
 but no install (so as not to corrupt my working system).

 Cheers,

 Gaël


Gael -

The functionalities are combined in programs but are not necessarily
required to be used all at the same time.

I'm not that familiar with apt, but yum also installs, including
downloading both a package and its dependencies.  Yum also has a query
capability (yum list, yum info).  I think synaptic does the same thing yum
does, and adds a GUI and search capabilities similar to yum info as well.

The build capabilities of rpm were moved to rpmbuild, but the building
remains part of the rpm system.  IIRC, bdist_rpm actually calls rpmbuild
as part of its processing.

Also, IIRC, rpmbuild can build from a tarball if it contains an rpm spec. 
It does not install in the same process.  That is a separate step.  You
would not corrupt your working system by building an rpm from a tarball on
it.

BTW, I would not want to do dependencies with rpm if yum is available. 
Doing dependencies with rpm is very difficult and it is easy to wind up in
dependency hell.  Yum will find the dependencies and install them as
long as they are in repositories that are registered in the yum
configuration.
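
As a rough illustration of what "finding the dependencies and installing them" involves, here is a minimal sketch of dependency ordering. It is not how yum actually works; the function name install_order and the package names are hypothetical, and the dependency table is assumed to be already known rather than read from a repository.

```python
def install_order(target, depends):
    """Return packages in an order where each dependency precedes its user.

    `depends` maps a package name to the list of packages it requires.
    This is a plain depth-first walk, shown only to illustrate the idea.
    """
    order = []        # final installation order
    visiting = set()  # packages on the current recursion path
    done = set()      # packages already placed in `order`

    def visit(name):
        if name in done:
            return
        if name in visiting:
            raise ValueError("dependency cycle at %s" % name)
        visiting.add(name)
        for dep in depends.get(name, ()):
            visit(dep)
        visiting.discard(name)
        done.add(name)
        order.append(name)

    visit(target)
    return order


# Hypothetical package names, for illustration only.
DEPENDS = {
    "app": ["libfoo", "libbar"],
    "libfoo": ["libc"],
    "libbar": ["libc"],
}
ORDER = install_order("app", DEPENDS)
```

Doing this by hand with plain rpm means walking the same graph manually, which is where the "dependency hell" comes from.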

I looked at man yum and couldn't find an option to download dependencies
to the local repository without installing.  However, if you did install a
package and its dependencies, and if you have selected the option of
retaining the cache and not cleaning it after installation, the rpms
(e.g., for updates) are in /var/cache/yum/updates/packages/.  They can be
copied from there to a CD for a system without internet connectivity. 
Also, both Fedora and Ubuntu have software for building installable live
CD's, although I don't know how they get their package files.


Stan Klein



Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you

On Wed, April 9, 2008 3:40 pm, Phillip J. Eby wrote:
 At 11:52 AM 4/9/2008 -0400, Stanley A. Klein wrote:
However, are you implying that the installation information for Python egg
packages accesses and coordinates with the rpm database?

 Yes, when the information isn't stripped out.  Try a more recent Fedora.


IMHO, the main system without a package manager is Windows.

 You're ignoring shared environments and development
 environments.  (Not to mention Mac OS.)


I don't understand what you mean by shared environments and development
 environments.  I also don't know much about Mac OS, except that its
underlying Darwin system is a version of BSD (that I assume would follow
the Unix FHS).



   A reasonable
way to deal with Windows would be to create a package manager for it that
could be used by Python and anyone else who wanted to use it.

 Let us know when you've finished it, along with the one for Mac OS.  :)

I have enough trouble with what I'm already doing.  :-)


 Of course this still won't do anything for shared environments and
 development environments.


You are talking here about bdist_rpm and not about a tool that would take
a Python package distributed as an egg file and convert the egg to an rpm
or a deb.  Unfortunately, some Python packagers are beginning to limit
their focus only to egg distribution.  That creates a problem for users
who have native operating system package management.

 That is indeed a problem -- but it's a social one, not a technical
 one.  It's trivial for the publisher of an egg to change their
 command line from "setup.py bdist_egg upload" to "setup.py sdist
 bdist_egg upload", as soon as their users (politely) request that they do
 so.


I agree that we are dealing with a combination of technical and social
issues here.  However, I think it takes a lot more understanding for a
publisher to get everything straight.


  Applying LSB and FHS to the innards of Python packages makes as much
  sense as applying them to the contents of Java .jar files -- i.e.,
  none.  If it's unchanging data that's part of a program or library,
  then it's a program or library, just like static data declared in a C
  program or library.  Whether the file extension is .py, .so, or even
  .png is irrelevant.

The FHS defines places to put specific kinds of files, such as command
scripts (/bin, /usr/bin, /sbin, or /usr/sbin), documentation
(/usr/share/doc/package-name), and configuration files (/etc).  There are
several kinds of files identified and places defined to put them.
Distribution by eggs has a tendency to scoop up all of those files and put
them in /usr/lib/python/site-packages, regardless of where they belong.

 Eggs don't include documentation or configuration files, and they
 install scripts in script directories, so I don't get what you're
 talking about here.  For any other data that a package accesses at
 runtime, my earlier comments apply.


But rpms and debs do include these files, plus manual pages, localization
files and a lot of other ancillary stuff.

IIRC, you once mentioned that you have a CentOS system.  Do an
"rpm -qa | sort | less" to get an alphabetized list of your installed
packages, and then an "rpm -qil" on some of the packages, and you will see
the range of different kinds of files in there.


Having eggs support conformance to FHS would mean recognizing and tagging
the relevant files.  A tool for converting eggs to rpms or debs would
essentially reformat the egg to rpm or deb and put files where they
belong.

 No, because such files as you describe don't exist.  If you think
 they do, then either you have misunderstood the nature of the files
 in question, or the developer has incorrectly placed non-runtime
 files in their installation tree.


Most of the Python tarballs I have downloaded have all kinds of files in
their installation trees.  This is a major pain in the you-know-what for
someone trying to use bdist_rpm and get proper, FHS-compliant rpms.  If
eggs are supposed to be strictly runtime files, I think very few
developers actually understand that.  Better yet, how do you define what
should be included in an installation?  It sounds like the egg concept
doesn't include several kinds of files that rpm and deb would include in
an installation.  I think that may be an important issue here.


Stan Klein



Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you

On Wed, Apr 09, 2008 at 11:46:19PM +0100, Paul Moore wrote:
 I find this whole discussion hugely confusing, because a lot of people
 are stating opinions about environments which it seems they don't use,
 or know much about. I don't know how to avoid this, but it does make
 it highly unlikely that any practical progress will get made.

I find that one thing that does not help the discussion move forward at
all is that everybody has different use cases in mind, on different
platforms, and is not interested in other people's use cases.

Hopefully I am wrong,

Cheers,

Gaël

Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you

On Wed, Apr 09, 2008 at 11:52:08PM +0100, Paul Moore wrote:
 And I would say that Windows doesn't have a problem. Are any Windows
 users proposing building a package management system for Windows
 (Python-specific or otherwise)? It's a genuine question - is this
 something that Windows users are after, or is it just Linux users
 trying to show Windows users what they are missing?

Well, users don't phrase it that way, because they don't know what
package management (or rather automatic dependency tracking) is, but yes,
there are some use cases. It is nowadays really tedious to deploy Python
applications that make use of many packages.

The scientific community is a domain in which this problem is crucial, as
we are trying to ship desktop applications with many dependencies outside
the standard library to non-computer-savvy people.

Enthought is working on shipping a Python distribution with some sort of
package management for this purpose ( see
http://code.enthought.com/enstaller/ ), and finding it is not an easy
problem.

Cheers,

Gael

[Python-Dev] PyArg_ParseTuple and Py_BuildValue question

2008-04-22 Thread Alvin Delagon

Hello fellow pythonistas,

I'm currently writing a simple python SCTP module in C. So far it works
sending and receiving strings from it. The C sctp function sctp_sendmsg()
has been wrapped and my function looks like this:

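static PyObject *
sendMessage(PyObject *self, PyObject *args)
{
  /* buffer, connSock and ret are module-level variables defined elsewhere */
  const char *msg = "";
  if (!PyArg_ParseTuple(args, "s", &msg))
    return NULL;
  /* copy with an explicit format so msg is never treated as a format string */
  snprintf(buffer, 1025, "%s", msg);
  ret = sctp_sendmsg(connSock, (void *)buffer, (size_t)strlen(buffer), 0, 0,
                     0x0300, 0, 0, 0, 0);
  /* return the sctp_sendmsg() result as a Python integer */
  return Py_BuildValue("i", ret);
}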

I'm going to construct an SS7 packet in python using struct.pack(). Here's
the question, how am I going to pass the packet I wrote in python to my
module and back? I already asked this question in comp.lang.python but so
far no responses yet. I hope anyone can point me to the right direction.
Thanks in advance.
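
One point worth illustrating here: a packet built with struct.pack() is binary data and will usually contain zero bytes, so a C wrapper that relies on the "s" format and strlen() would cut it short. The Python C API provides the "s#" format unit (for both PyArg_ParseTuple and Py_BuildValue), which carries an explicit length. The sketch below only shows the Python side; the 8-byte header layout is hypothetical and is not a real SS7 format, and the names build_packet and parse_packet are made up for illustration.

```python
import struct

# Hypothetical header, NOT a real SS7 layout:
# version (1 byte), message type (1 byte), stream id (2 bytes),
# payload length (4 bytes), all in network byte order without padding.
HEADER = struct.Struct("!BBHI")


def build_packet(version, msg_type, stream_id, payload):
    """Pack the header fields and append the raw payload bytes."""
    header = HEADER.pack(version, msg_type, stream_id, len(payload))
    return header + payload


def parse_packet(packet):
    """Split a packet produced by build_packet() back into its fields."""
    version, msg_type, stream_id, length = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:HEADER.size + length]
    return version, msg_type, stream_id, payload


packet = build_packet(1, 0, 3, b"hello")
# The second byte is already zero, so a strlen()-based C wrapper
# would treat this packet as a one-byte string.
first_zero = packet.index(b"\x00")
```

On the C side, the length reported by "s#" would then replace the strlen() call when passing the buffer to sctp_sendmsg().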

---
Alvin Delagon

Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you

On Wed, 2008-04-09 at 18:17 -0500, Dave Peterson wrote:
   I think I can sum up any further points by simply asking: Should it
 be safe to assume I can distribute my application via eggs /
 easy_install just because it is written in Python?


I think that based on this discussion the bottom line answer to this
question is No.


Stan Klein








[Python-Dev] SetType=set in types module ?

2008-04-22 Thread iks hefem

Hi,

SetType is not available in the types module, so shouldn't it be
added there? (in 2.6, for example)

I guess the change is really simple and would be backward compatible :
adding SetType = set
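
For what it's worth, code that wants this today does not need the types module: the builtin set is itself the type object. A minimal sketch, with the alias defined locally (the helper name is_set_like is hypothetical):

```python
# The alias proposed above, defined locally instead of in the types module.
SetType = set


def is_set_like(value):
    """True for both mutable and frozen sets."""
    return isinstance(value, (set, frozenset))


example = set([1, 2, 3])
```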

Re: [Python-Dev] Global Python Sprint Weekends: May 10th-11th and June 21st-22nd.

2008-04-22 Thread Thomas Lee

Anyone in Melbourne, Australia keen for the first sprint? I'm not sure 
if I'll be available, but if I can it'd be great to work with some 
others. Failing that, it's red bull and pizza in my lounge room :)


I've been working on some neat code for an AST optimizer. If I'm free 
that weekend, I'll probably continue my work on that.


Cheers,
T

Trent Nelson wrote:

Following on from the success of previous sprint/bugfix weekends and
sprinting efforts at PyCon 2008, I'd like to propose the next two
Global Python Sprint Weekends take place on the following dates:

* May 10th-11th (four days after 2.6a3 and 3.0a5 are released)
* June 21st-22nd (~week before 2.6b2 and 3.0b2 are released)

It seems there are a few of the Python User Groups keen on meeting
up in person and sprinting collaboratively, akin to PyCon, which I
highly recommend.  I'd like to nominate Saturday across the board
as the day for PUGs to meet up in person, with Sunday geared more
towards an online collaboration day via IRC, where we can take care
of all the little things that got in our way of coding on Saturday
(like finalising/preparing/reviewing patches, updating tracker and
documentation, writing tests ;-).

For User Groups that are planning on meeting up to collaborate,
please reply to this thread on python-dev@python.org and let every-
one know your intentions!

As is commonly the case, #python-dev on irc.freenode.net will be
the place to be over the course of each sprint weekend; a large
proportion of Python developers with commit access will be present,
increasing the amount of eyes available to review and apply patches.

For those that have an idea on areas they'd like to sprint on and
want to look for other developers to rope in (or just to communicate
plans in advance), please also feel free to jump on this thread via
python-dev@ and indicate your intentions.

For those that haven't the foggiest on what to work on, but would
like to contribute, the bugs tracker at http://bugs.python.org is
the best place to start.  Register an account and start searching
for issues that you'd be able to lend a hand with.

All contributors that submit code patches or documentation updates
will typically get listed in Misc/ACKS.txt; come September when the
final release of 2.6 and 3.0 come about, you'll be able to point at
the tarball or .msi and exclaim loudly ``I helped build that!'',
and actually back it up with hard evidence ;-)

Bring on the pizza and Red Bull!

Trent.

[Python-Dev] socket recv on win32 can be extremly delayed (python bug?)

2008-04-22 Thread Robert Hölzl


hello,

I tried to implement a simple Python XML-RPC service in a win32
environment (client/server code inserted below).
The profiler on the client told me that a simple function call needs
about 200 ms (even if I run it in a loop, the time needed per call stays
the same).


After analysing the problem with Ethereal I found out that the XML-RPC
request is transmitted in two TCP packets: one containing the HTTP
header and one containing the data. But the acknowledgement of the first
TCP packet is delayed by 200 ms.


I experimented on the server side and found out that if the server reads
exactly all bytes transferred in the first TCP frame (via socket.recv()),
the next socket.recv(), even if it reads only one byte, needs about 200
ms. But if I read one byte less than was transferred in the first TCP
frame and then read 2 bytes (socket.recv(2)), there is no delay, although
the same total amount of data was read.


After some googling I found the page
http://support.microsoft.com/?scid=kb%3Ben-us%3B823764&x=12&y=15, which
proposes a workaround (modifying the registry entry for the TCP/IP driver)
that did work. But modifying the clients' registry settings is not an
option for us.


Does anybody know how to solve the problem? Or is it even a
problem of the Python socket implementation?


By the way: I tested Win2000 SP4 and WinXP SP2, each with Python 2.3.3 and
Python 2.5.1.


CLIENT:
--
import xmlrpclib
import profile
server = xmlrpclib.ServerProxy("http://server:80")
profile.run('server.test(1,2)')

SERVER:
--
import SimpleXMLRPCServer
def test(a,b): return a+b
server = SimpleXMLRPCServer.SimpleXMLRPCServer( ('', 80) )
server.register_function(test)
server.serve_forever()

--
Mit freundlichen Grüßen,
Best Regards,

Robert Hölzl
BALTECH AG

Firmensitz: Lilienthalstrasse 27, D-85399 Hallbergmoos
Registergericht: Amtsgericht München, HRB 115215
Vorstand: Jürgen Rösch (Vorsitzender), Martina M. Schuster
Aufsichtsratsvorsitzende: Eva Zeising



[Python-Dev] Security Advisory for unicode repr() bug?

2008-04-22 Thread guzarva16


Re: [Python-Dev] Known doctest bug with unicode?

 So there's a difference in behaviour between 2.x and 3.0 when it comes to
 this part. I guess the better behaviour would be for doctest to honour the
 encoding specified in the file/module? If other people agree I can see what
 I can do to make that work.

I'm fairly skeptical that you can make that work, whether or not it's a
good idea.

Regards,
Martin


Re: [Python-Dev] Encoding detection in the standard library?

2008-04-22 Thread M.-A. Lemburg


On 2008-04-22 18:33, Bill Janssen wrote:

The 2002 paper "A language and character set determination method
based on N-gram statistics" by Izumi Suzuki, Yoshiki Mikami,
Ario Ohsato and Yoshihide Chubachi seems to me a pretty good way to go
about this.


Thanks for the reference.

Looks like the existing research on this just hasn't made it into the
mainstream yet.

Here's their current project: http://www.language-observatory.org/
Looks like they are focusing more on language detection.

Another interesting paper using n-grams:
"Language Identification in Web Pages" by Bruno Martins and Mário J. Silva
http://xldb.fc.ul.pt/data/Publications_attach/ngram-article.pdf

And one using compression:
"Text Categorization Using Compression Models" by
Eibe Frank, Chang Chui, Ian H. Witten
http://portal.acm.org/citation.cfm?id=789742


They're looking at LSEs, language-script-encoding
triples; a script is a way of using a particular character set to
write in a particular language.

Their system has these requirements:

R1. the response must be either "correct answer" or "unable to detect",
where "unable to detect" includes "other than registered" [the
registered set of LSEs];

R2. Applicable to multi-LSE texts;

R3. never accept a wrong answer, even when the program does not have
enough data on an LSE; and

R4. applicable to any LSE text.

So, no wrong answers.

The biggest disadvantage would seem to be that the registration data
for a particular LSE is kind of bulky; on the order of 10,000
shift-codons, each of three bytes, about 30K uncompressed.
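
To make the general idea concrete, here is a toy sketch of byte n-gram matching with an "unable to detect" answer. It is not the method from the paper (which uses shift-codons and LSE triples); it only registers encodings, the training sample is a single made-up sentence, and the names trigrams, build_profiles and detect as well as the 0.5 threshold are assumptions chosen for illustration.

```python
from collections import Counter


def trigrams(data):
    """Count every 3-byte window in a bytes object."""
    return Counter(data[i:i + 3] for i in range(len(data) - 2))


def build_profiles(samples):
    """Map each label to the set of byte trigrams seen in its sample."""
    return dict((label, set(trigrams(data))) for label, data in samples.items())


def detect(data, profiles, threshold=0.5):
    """Return the best-matching label, or None for 'unable to detect'."""
    grams = trigrams(data)
    total = sum(grams.values())
    if total == 0:
        return None
    best_label, best_score = None, 0.0
    for label, known in profiles.items():
        hits = sum(n for gram, n in grams.items() if gram in known)
        score = hits / float(total)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


# A single hand-written sentence stands in for real registration data.
SAMPLE = u"Die Gr\u00f6\u00dfe der Stra\u00dfe in M\u00fcnchen \u00fcberrascht"
PROFILES = build_profiles({
    "utf-8": SAMPLE.encode("utf-8"),
    "latin-1": SAMPLE.encode("latin-1"),
})
```

With real registration data the profiles would be far larger, which is the bulk the 30K-per-LSE figure refers to; refusing to answer below a threshold is one simple way to lean toward "no wrong answers", though this toy gives no such guarantee.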

http://portal.acm.org/ft_gateway.cfm?id=772759&type=pdf


For a server based application that doesn't sound too large.

Unless you're using a very broad scope, I don't think that
you'd need more than a few hundred LSEs for a typical
application - nothing you'd want to put in the Python stdlib,
though.


Bill


IMHO, more research has to be done into this area before a
standard module can be added to Python's stdlib... and
who knows, perhaps we're lucky and by then everyone is
using UTF-8 anyway :-)

I walked over to our computational linguistics group and asked.  This
is often combined with language guessing (which uses a similar
approach, but using characters instead of bytes), and apparently can
usually be done with high confidence.  Of course, they're usually
looking at clean texts, not random stuff.  I'll see if I can get
some references and report back -- most of the research on this was
done in the 90's.

Bill


--
Marc-Andre Lemburg
eGenix.com


Re: [Python-Dev] Encoding detection in the standard library?

  When a web browser POSTs data, there is no standard way of communicating
  which encoding it's using.
 
 That's just not true. Web browsers should and do use the encoding of the
 web page that originally contained the form.

Since the site that receives the POST doesn't necessarily have access
to the Web page that originally contained the form, that's not really
helpful.  However, POSTs can use the MIME type multipart/form-data
for non-Latin-1 content, and should.  That contains facilities for
indicating the encoding and other things as well.
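
As a small sketch of what that looks like on the receiving side: each part of a multipart/form-data body carries its own MIME headers, so the standard email package can read a declared charset from one part. The sample part, the helper name part_charset and the "latin-1" default are assumptions for illustration; in practice many browsers omit the per-part Content-Type, which is why a default is needed at all.

```python
from email import message_from_string


def part_charset(raw_part, default="latin-1"):
    """Return the charset declared in one form-data part, or the default."""
    msg = message_from_string(raw_part)
    # get_content_charset() lower-cases the declared name and returns
    # the supplied default when no charset parameter is present.
    return msg.get_content_charset(default)


# Hand-written example part; a real one comes from the request body.
PART_WITH_CHARSET = (
    'Content-Disposition: form-data; name="comment"\n'
    "Content-Type: text/plain; charset=UTF-8\n"
    "\n"
    "body goes here\n"
)
PART_WITHOUT_CHARSET = (
    'Content-Disposition: form-data; name="comment"\n'
    "\n"
    "body goes here\n"
)
```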

Bill

Re: [Python-Dev] PyArg_ParseTuple and Py_BuildValue question

2008-04-22 Thread Benjamin Peterson

On Wed, Apr 9, 2008 at 8:23 PM, Alvin Delagon [EMAIL PROTECTED] wrote:

 I'm going to construct an SS7 packet in python using struct.pack(). Here's
 the question, how am I going to pass the packet I wrote in python to my
 module and back? I already asked this question in comp.lang.python but so
 far no responses yet. I hope anyone can point me to the right direction.
 Thanks in advance.

What exactly is your problem?



-- 
Cheers,
Benjamin Peterson

Re: [Python-Dev] SetType=set in types module ?

2008-04-22 Thread Benjamin Peterson

On Wed, Apr 16, 2008 at 8:08 AM, iks hefem [EMAIL PROTECTED] wrote:
 Hi,

  the SetType is not available in the types module, so wouldn't it be
  needed here ? (in 2.6 by example)

Nothing new is currently being added to the types module because we
are trying to decide whether to remove it or not. Please file a bug
report, though, to remind us if we decide to keep it.



-- 
Cheers,
Benjamin Peterson

Re: [Python-Dev] Encoding detection in the standard library?

 Unless you're using a very broad scope, I don't think that
 you'd need more than a few hundred LSEs for a typical
 application - nothing you'd want to put in the Python stdlib,
 though.

I tend to agree with this (and I'm generally in favor of putting
everything in the standard library!).  For those of us doing
document-processing applications (Martin, it's not just about Web
browsers), this would be a very useful package to have up on PyPI.

Bill

Re: [Python-Dev] Encoding detection in the standard library?

2008-04-22 Thread Mike Klaas



On 22-Apr-08, at 2:16 PM, Martin v. Löwis wrote:



Any program that needs to examine the contents of
documents/feeds/whatever on the web needs to deal with
incorrectly-specified encodings


That's not true. Most programs that need to examine the contents of
a web page don't need to guess the encoding. In most such programs,
the encoding can be hard-coded if the declared encoding is not
correct. Most such programs *know* what page they are webscraping,
or else they couldn't extract the information out of it that they
want to get at.


I certainly agree that if the target set of documents is small enough  
it is possible to hand-code the encoding.  There are many  
applications, however, that need to examine the content of an  
arbitrary, or at least non-small set of web documents.  To name a few  
such applications:


 - web search engines
 - translation software
 - document/bookmark management systems
 - other kinds of document analysis (market research, seo, etc.)


As for feeds - can you give examples of incorrectly encoded ones?
(I don't ever use feeds, so I honestly don't know whether they
are typically encoded incorrectly. I've heard they are often XML,
in which case I strongly doubt they are incorrectly encoded.)


I also don't have much experience with feeds.  My statement is based  
on the fact that chardet, the tool that has been cited most in this  
thread, was written specifically for use with the author's feed  
parsing package.



As for whatever - can you give specific examples?


Not that I can substantiate.  "Documents & feeds" covers a lot of what
is on the web--I was only trying to make the point that on the web,
whenever an encoding can be specified, it will be specified
incorrectly for a significant chunk of exemplars.



(which, sadly, is rather common). The
set of programs that need this functionality is probably the
same set that needs BeautifulSoup--I think that set is larger than just
browsers <grin>


Again, can you give *specific* examples that are not web browsers?
Programs needing BeautifulSoup may still not need encoding guessing,
since they still might be able to hard-code the encoding of the web
page they want to process.


Indeed, if it is only one site it is pretty easy to work around.  My  
main use of python is processing and analyzing hundreds of millions of  
web documents, so it is pretty easy to see applications (which I have  
listed above).  I think that libraries like Mark Pilgrim's FeedParser  
and BeautifulSoup are possible consumers of guessing as well.



In any case, I'm very skeptical that a general guess encoding
module would do a meaningful thing when applied to incorrectly
encoded HTML pages.


Well, it does.  I wish I could easily provide data on how often it is  
necessary over the whole web, but that would be somewhat difficult to  
generate.  I can say that it is much more important to be able to  
parse all the different kinds of encoding _specification_ on the web  
(Content-Type/Content-Encoding/meta http-equiv tags, etc), and the  
malformed cases of these.
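
As a rough illustration of the "parse the declared encoding first" step, here is a minimal sketch in modern Python 3. The function name declared_charset and the META_CHARSET pattern are made up for this example; real-world pages have many more malformed variants than this handles, so treat it as a starting point, not a complete parser.

```python
import re
from email.message import Message

# Hypothetical, deliberately loose pattern for <meta ... charset=...> tags.
META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?\s*([A-Za-z0-9._:-]+)""",
    re.IGNORECASE,
)


def declared_charset(content_type, body):
    """Return the declared charset (lowercased) or None if none is found.

    content_type: value of the HTTP Content-Type header, or None.
    body: the raw page bytes.
    """
    if content_type:
        # Reuse the stdlib MIME parameter parser for the header value.
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()
        if charset:
            return charset
    # Fall back to a <meta> declaration near the top of the document.
    match = META_CHARSET.search(body[:4096])
    if match:
        return match.group(1).decode("ascii").lower()
    return None
```

Only when this returns None (or names a codec that fails to decode the bytes) would a guessing step come into play.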


I can also think of good arguments for excluding encoding detection  
for maintenance reasons: is every case of the algorithm guessing wrong  
a bug that needs to be fixed in the stdlib?  That is an unbounded  
commitment.


-Mike
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] socket recv on win32 can be extremly delayed (python bug?)

2008-04-22 Thread Mike Klaas


Hi,

This is not a python-specific problem. See
http://en.wikipedia.org/wiki/Nagle's_algorithm
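
A minimal sketch of the two usual workarounds, assuming the delay really is Nagle's algorithm on the sender interacting with delayed ACKs on the receiver. It is written for modern Python 3; the helper names disable_nagle and send_request are made up for illustration and are not part of xmlrpclib.

```python
import socket


def disable_nagle(sock):
    """Turn off Nagle's algorithm so small writes are sent immediately."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def send_request(sock, header, body):
    """Send header and body in one call instead of two small writes.

    With two separate writes, the second one can be held back until the
    first is acknowledged, and that ACK may itself be delayed.
    """
    sock.sendall(header + body)
```

Either change is made on the sending side, so no registry modification on the client machines should be needed.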

-Mike

On 17-Apr-08, at 3:08 AM, Robert Hölzl wrote:

hello,

I tried to implement a simple Python XML-RPC service in a win32
environment (client/server code inserted below).
The profiler on the client told me that a simple function call
needs about 200 ms (even if I run it in a loop, the time needed per
call stays the same).


After analysing the problem with Ethereal I found out that the
XML-RPC request is transmitted in two TCP packets: one containing
the HTTP header and one containing the data. But the acknowledgement of
the first TCP packet is delayed by 200 ms.


I experimented on the server side and found out that if the server
reads exactly all bytes transferred in the first TCP frame (via
socket.recv()), the next socket.recv(), even if reading only one
byte, needs about 200 ms. But if I read one byte less than was
transferred in the first TCP frame and then read 2 bytes
(socket.recv(2)), there is no delay, although the same total amount
of data was read.


After some googling I found the website
http://support.microsoft.com/?scid=kb%3Ben-us%3B823764&x=12&y=15
which proposed a workaround (modifying the registry entry for the
TCP/IP driver) that did work. But modifying the clients' registry
settings is not an option for us.


Does anybody know how to solve the problem? Or is it even a
problem of the Python socket implementation?


By the way: I tested Win2000 SP4 and WinXP SP2, each with Python 2.3.3 and
Python 2.5.1.


CLIENT:
--
import xmlrpclib
import profile
server = xmlrpclib.ServerProxy("http://server:80")
profile.run('server.test(1,2)')

SERVER:
--
import SimpleXMLRPCServer
def test(a,b): return a+b
server = SimpleXMLRPCServer.SimpleXMLRPCServer( ('', 80) )
server.register_function(test)
server.serve_forever()

--
Mit freundlichen Grüßen,
Best Regards,

Robert Hölzl
BALTECH AG

Firmensitz: Lilienthalstrasse 27, D-85399 Hallbergmoos
Registergericht: Amtsgericht München, HRB 115215
Vorstand: Jürgen Rösch (Vorsitzender), Martina M. Schuster
Aufsichtsratsvorsitzende: Eva Zeising


Re: [Python-Dev] python hangs when parsing a bad-formed email

2008-04-22 Thread Amaury Forgeot d'Arc

Hello,

Alberto Casado Martín wrote:
 Hi all,
  First of all, sorry if this isn't the list where I have to post this.
  And sorry for my english.

  As the subject says, I'm having problems with the attached email: when
  I try to build an email object by reading the attached file, the Python
  process hangs and uses all the CPU.

  I have debugged my code to find where it happens, and I found that it is in
  the _parsegen method of the FeedParser class. I know that the email format
  is wrong, but I don't know why Python hangs.

  Below I paste the code showing where it hangs.
[snip]
  bash-3.00$ python
  Python 2.5.1 (r251:54863, Feb 28 2008, 07:48:25)
  [GCC 3.4.6] on sunos5
  Type "help", "copyright", "credits" or "license" for more information.
  >>> import email
  >>> fp = open('raro.txt')
  >>> mail = email.message_from_file(fp)
  never returns

If you think you have found a problem with Python, please submit an issue
in the Python issue tracker:
http://bugs.python.org/

In your case, I suspect some regular expression trying to match all
the empty lines of the message, one character at a time.
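
To illustrate the kind of behaviour suspected here, a small sketch in Python 3 follows. The BLANK_LINES pattern is hypothetical and is NOT the expression used by the email package; it only shows how a nested quantifier over newlines can backtrack through every way of splitting a run of blank lines, so the time can grow exponentially with the length of the run. The loop below it does the same job in linear time.

```python
import re

# Hypothetical pattern, not taken from the email package: "\s" also matches
# "\n", so a run of blank lines can be split up in exponentially many ways.
BLANK_LINES = re.compile(r"(\s*\n)*$")


def matches_only_blank_lines(text):
    """Regex version: fine on short input, very slow on long blank runs
    that are followed by a non-blank character."""
    return BLANK_LINES.match(text) is not None


def skip_blank_lines(lines):
    """Linear alternative: drop leading blank lines one line at a time."""
    index = 0
    while index < len(lines) and lines[index].strip() == "":
        index += 1
    return lines[index:]
```

Keep the failing input short when experimenting with the regex version; a few dozen blank lines followed by other text can already take a very long time.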

-- 
Amaury Forgeot d'Arc

[Python-Dev] GSoC student introduction and sandbox commit privileges request

2008-04-22 Thread Rodrigo Bernardo Pimentel

Hi there,

I've just been accepted into this year's Google Summer of Code, to work for
the Python Software Foundation on 2to3. My project is to give 2to3 fixers
the ability to rank how confident they are on each fix, and let users choose
to intervene manually whenever that confidence level is below a certain
threshold. Among other things, this might allow fixers for situations where
the code translation is not always guaranteed to be correct (like % string
formatting, which came up recently in another thread). The full proposal is
at http://isnomore.net/2to3 .

Collin Winter will be my mentor, and I'd like to thank him and Christian
Heimes for all the help they gave me in designing the project. I'd also like
to thank Martin Löwis, for discussing a project with me which ended up not
turning into a proposal, but helped me write the 2to3 one.

Finally, I'd like to request commit privileges to work on a sandbox branch,
during the Summer of Code.

If you have any further questions, please feel free to contact me. I'm
really looking forward to working on this project!

Cheers,


rbp
-- 
Rodrigo Bernardo Pimentel [EMAIL PROTECTED] | GPG: 0x0DB14978

Re: [Python-Dev] python hangs when parsing a bad-formed email

2008-04-22 Thread Terry Reedy


Amaury Forgeot d'Arc [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]


| When you think you found a problem with python, please submit an issue
| in the python issue tracker:
|http://bugs.python.org/

Or post to comp.lang.python / python mailing list / 
gmane.comp.python.general 




Re: [Python-Dev] Encoding detection in the standard library?

 Yup, but DrProject (the target application) also serves as a relay and 
 archive for email.  We have no control over the agent used for 
 composition, and AFAIK there's no standard way to include encoding 
 information.

Greg,

Internet-compliant email actually has well-specified mechanisms for
including encoding information; see RFCs 2047 and 2231.  There's no
need to guess; you can just look.
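
For a compliant message, "just looking" can be sketched with the standard email package (Python 3 shown). The sample message RAW and the function name declared_encodings are made up for this example.

```python
import email
from email.header import decode_header, make_header

# Made-up sample message with an RFC 2047 encoded Subject and a declared
# body charset.
RAW = (
    "Subject: =?iso-8859-1?Q?Gr=FC=DFe?=\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="iso-8859-1"\n'
    "\n"
    "body\n"
)


def declared_encodings(raw_message):
    """Return (body charset, decoded subject) as declared by the message."""
    msg = email.message_from_string(raw_message)
    body_charset = msg.get_content_charset()  # from the Content-Type header
    subject = str(make_header(decode_header(msg["Subject"])))  # RFC 2047
    return body_charset, subject
```

This covers only what the message itself declares; it does nothing for messages that violate the RFCs.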

Bill

Re: [Python-Dev] GSoC student introduction and sandbox commit privileges request

2008-04-22 Thread Brett Cannon

On Tue, Apr 22, 2008 at 4:35 PM, Rodrigo Bernardo Pimentel
[EMAIL PROTECTED] wrote:
 Hi there,

  I've just been accepted into this year's Google Summer of Code, to work for
  the Python Software Foundation on 2to3. My project is to give 2to3 fixers
  the ability to rank how confident they are on each fix, and let users choose
  to intervene manually whenever that confidence level is below a certain
  threshold. Among other things, this might allow fixers for situations where
  the code translation is not always guaranteed to be correct (like % string
  formatting, which came up recently in another thread). The full proposal is
  at http://isnomore.net/2to3 .

  Collin Winter will be my mentor, and I'd like to thank him and Christian
  Heimes for all the help they gave me in designing the project. I'd also like
  to thank Martin Löwis, for discussing a project with me which ended up not
  turning into a proposal, but helped me write the 2to3 one.

  Finally, I'd like to request commit privileges to work on a sandbox branch,
  during the Summer of Code.


Isn't this a chance for bzr to shine? With lib2to3 in the 3.0 bzr
branch, can't Rodrigo and the other students who don't have some funky
requirement just use bzr?

  If you have any further questions, please feel free to contact me. I'm
  really looking forward to working on this project!

Thanks for contributing!

-Brett

Re: [Python-Dev] GSoC student introduction and sandbox commit privileges request

2008-04-22 Thread Rodrigo Bernardo Pimentel

On Tue, Apr 22 2008 at 09:02:49PM BRT, Brett Cannon [EMAIL PROTECTED] wrote:
 On Tue, Apr 22, 2008 at 4:35 PM, Rodrigo Bernardo Pimentel
 [EMAIL PROTECTED] wrote:

   I've just been accepted into this year's Google Summer of Code
(...)
   Finally, I'd like to request commit privileges to work on a sandbox
  branch, during the Summer of Code.

 Isn't this a chance for bzr to shine? With lib2to3 in the 3.0 bzr
 branch, can't Rodrigo and the other students who don't have some funky
 requirement just use bzr?

FWIW, +1 from me, I'm perfectly comfortable with bzr.

   If you have any further questions, please feel free to contact me. I'm
   really looking forward to working on this project!
 
 Thanks for contributing!

My pleasure :)

Cheers,


rbp
-- 
Rodrigo Bernardo Pimentel [EMAIL PROTECTED] | GPG: 0x0DB14978

Re: [Python-Dev] SetType=set in types module ?

2008-04-22 Thread Christian Heimes

Benjamin Peterson wrote:
 On Wed, Apr 16, 2008 at 8:08 AM, iks hefem [EMAIL PROTECTED] wrote:
 Hi,

  SetType is not available in the types module; shouldn't it be
  added there? (in 2.6, for example)
 
 Nothing new is currently being added to the types module because we
 are trying to decide whether to remove it or not. Please file a bug
 report, though, to remind us if we decide to keep it.

Eventually the types module will go away or at least be stripped down in
Python 3.0. New types like the set type weren't added to types
deliberately. Please don't file a bug report.
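
Code that wanted types.SetType for type checks can use the builtin names directly, as in this small sketch (the helper name is_set_like is made up for illustration):

```python
def is_set_like(value):
    """True for builtin sets and frozensets; no types module needed."""
    return isinstance(value, (set, frozenset))
```

The builtins set and frozenset have been available since Python 2.4, so this works the same way in 2.6 and 3.0.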

Christian

Re: [Python-Dev] PyArg_ParseTuple and Py_BuildValue question

2008-04-22 Thread Christian Heimes

Alvin Delagon wrote:
 I'm going to construct an SS7 packet in python using struct.pack(). Here's
 the question, how am I going to pass the packet I wrote in python to my
 module and back? I already asked this question in comp.lang.python but so
 far no responses yet. I hope anyone can point me to the right direction.
 Thanks in advance.

The Python developer list is meant for the development OF Python, not
WITH Python. Please use the general Python user list to get help.

Christian

[Python-Dev] GSoC Student Introduction

2008-04-22 Thread Nick Edds

Hello,

My name is Nick Edds. I am going to be working on the 2to3 tool with Collin
Winter as my mentor. More specifically, I will be working on improving the
performance of the 2to3 tool in general, and its use of patterns in
particular.
I would like to request commit privileges to work in a sandbox branch and
although I don't have any familiarity with bzr, I would be comfortable using
it.

Regards,
Nick

Re: [Python-Dev] GSoC student introduction and sandbox commit privileges request

2008-04-22 Thread Collin Winter

On Tue, Apr 22, 2008 at 5:18 PM, Rodrigo Bernardo Pimentel
[EMAIL PROTECTED] wrote:
 On Tue, Apr 22 2008 at 09:02:49PM BRT, Brett Cannon [EMAIL PROTECTED] wrote:
   On Tue, Apr 22, 2008 at 4:35 PM, Rodrigo Bernardo Pimentel
   [EMAIL PROTECTED] wrote:


I've just been accepted into this year's Google Summer of Code
  (...)

Finally, I'd like to request commit privileges to work on a sandbox
branch, during the Summer of Code.

   Isn't this a chance for bzr to shine? With lib2to3 in the 3.0 bzr
   branch, can't Rodrigo and the other students who don't have some funky
   requirement just use bzr?

  FWIW, +1 from me, I'm perfectly comfortable with bzr.

Fine by me; I don't care one way or the other.

Collin

 If you have any further questions, please feel free to contact me. I'm
 really looking forward to working on this project!
  
   Thanks for contributing!

  My pleasure :)

  Cheers,


 rbp
  --
  Rodrigo Bernardo Pimentel [EMAIL PROTECTED] | GPG: 0x0DB14978



Re: [Python-Dev] GSoC Student Introduction

2008-04-22 Thread Benjamin Peterson

On Tue, Apr 22, 2008 at 7:42 PM, Nick Edds [EMAIL PROTECTED] wrote:
 Hello,

 My name is Nick Edds. I am going to be working on the 2to3 tool with Collin
 Winter as my mentor. More specifically, I will be working on improving the
 performance of the 2to3 tool in general, and its use of patterns in
 particular.

  I would like to request commit privileges to work in a sandbox branch and
 although I don't have any familiarity with bzr, I would be comfortable using
 it.

Luckily, Bazaar is really easy.

Thanks for contributing!



-- 
Cheers,
Benjamin Peterson

Re: [Python-Dev] BSDDB3

2008-04-22 Thread Jesus Cea


Trent Nelson wrote:
| I remember those rampant BSDDB crashes on Windows well.
[...]
| basically, the tests weren't cleaning up their environment in
| the right order, so BSDDB was getting passed completely and
| utterly bogus values.

Next week I will (if nothing goes wrong) publish pybsddb 4.6.4. This
release supports distributed transactions and replication; the test suite
is much faster and has been rewritten so that tests can be launched from
multiple threads/processes if you wish; it adds setuptools/PyPI support, etc.

I think this release would be appropriate to integrate into Python. I think
most demands are met and the new features are interesting (replication,
distributed transactions, no crash when closing objects in the wrong
order...). Also, I completed the documentation, covering the full supported
API, and ported it to the Python 2.6 documentation system. The result:

http://www.jcea.es/programacion/pybsddb.htm#bsddb3-4.6.4
http://www.jcea.es/programacion/pybsddb_doc/preview/

I'm very interested in integrating this release into Python 2.6 for the
new features, the full documentation, and to get feedback from Buildbot
and the python-dev community. Also, I would like to avoid integrating
pybsddb late in the Python 2.6 release cycle; I hope to be away from my
computer in August! :).

I'm a bit nervous about syncing, because I have the feeling that
python-dev is committing changes to Python's private branch of pybsddb. I
would much prefer that patches be sent to me and that canonical pybsddb
releases be integrated into Python frequently.

Somebody suggested to post patches in the tracker, but I think this is
not going to work. The diff from current python bsddb and the official
version is so huge that nobody could follow it. A more sensible
approach, I think, is to diff current python pybsddb against the
version I used as my root (January?), integrate the changes in current
canonical pybsddb and, then, drop the entire updated package into python.

Then, commits to Python's pybsddb should be avoided; patches should be
sent to me.

I think this is the only way when integrating a project that lives outside
the Python SVN. Suggestions?

PS: I can't comment on Win64. It is an alien world to me :).

- --
Jesus Cea Avion _/_/  _/_/_/_/_/_/
[EMAIL PROTECTED] - http://www.jcea.es/ _/_/_/_/  _/_/_/_/  _/_/
jabber / xmpp:[EMAIL PROTECTED] _/_/_/_/  _/_/_/_/_/
~   _/_/  _/_/_/_/  _/_/  _/_/
Things are not so easy  _/_/  _/_/_/_/  _/_/_/_/  _/_/
My name is Dump, Core Dump   _/_/_/_/_/_/  _/_/  _/_/
El amor es poner tu felicidad en la felicidad de otro - Leibniz

Re: [Python-Dev] GSoC Student Introduction

2008-04-22 Thread Brett Cannon

On Tue, Apr 22, 2008 at 7:38 PM, Benjamin Peterson
[EMAIL PROTECTED] wrote:

 On Tue, Apr 22, 2008 at 7:42 PM, Nick Edds [EMAIL PROTECTED] wrote:
   Hello,
  
   My name is Nick Edds. I am going to be working on the 2to3 tool with Collin
   Winter as my mentor. More specifically, I will be working on improving the
   performance of the 2to3 tool in general, and its use of patterns in
   particular.

   I would like to request commit privileges to work in a sandbox branch and
   although I don't have any familiarity with bzr, I would be comfortable using
   it.

  Luckily, Bazaar is really easy.


See http://python.org/dev/bazaar/ for info. And if you have any other
issues feel free to ask, Nick.

-Brett

Re: [Python-Dev] Encoding detection in the standard library?

2008-04-22 Thread Stephen J. Turnbull

Bill Janssen writes:

  Internet-compliant email actually has well-specified mechanisms for
  including encoding information; see RFCs 2047 and 2231.  There's no
  need to guess; you can just look.

You must be very special to get only compliant email.

About half my colleagues use RFC 2047 to encode Japanese file names in
MIME attachments (a MUST NOT behavior according to RFC 2047), and a
significant fraction of the rest end up with binary Shift JIS or EUC
or MacRoman in there.

And those are just the most widespread violations I can think of off
the top of my head.

Not to mention that I find this:

=?X-UNKNOWN?Q?Martin_v=2E_L=F6wis?= [EMAIL PROTECTED],

in the header I got from you.  (I'm not ragging on you, I get Martin's
name wrong a significant portion of the time myself. :-( )
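
A header like the one above declares a charset no codec knows about, so code that trusts declarations needs a fallback path. The following Python 3 sketch shows one possible shape; the function name decode_with_fallback and the fallback order (UTF-8, then Latin-1) are assumptions for illustration, not a recommendation for every context.

```python
import codecs


def decode_with_fallback(data, declared, fallbacks=("utf-8", "latin-1")):
    """Decode bytes using the declared charset if Python knows it,
    otherwise try the fallbacks in order.

    Returns (text, name of the codec that succeeded).
    """
    candidates = []
    if declared:
        try:
            # Normalise the name; raises LookupError for e.g. "X-UNKNOWN".
            candidates.append(codecs.lookup(declared).name)
        except LookupError:
            pass
    candidates.extend(fallbacks)
    for name in candidates:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue
    # Last resort: never fail, but mark undecodable bytes.
    return data.decode("latin-1", errors="replace"), "latin-1"
```

Since Latin-1 accepts every byte, ending the fallback list with it means some text is always produced, though not necessarily the right text.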

Re: [Python-Dev] Encoding detection in the standard library?

2008-04-22 Thread Stephen J. Turnbull

Martin v. Löwis writes:

  In any case, I'm very skeptical that a general guess encoding
  module would do a meaningful thing when applied to incorrectly
  encoded HTML pages.

That depends on whether you can get meaningful information about the
language from the fact that you're looking at the page.  In the
browser context, for one, 99.44% of users are monolingual, so you only
have to distinguish among the encodings for their language.  In this
context a two stage process of determining a category of encoding (eg,
ISO 8859, ISO 2022 7-bit, ISO 2022 8-bit multibyte, UTF-8, etc), and
then picking an encoding from the category according to a
user-specified configuration has served Emacs/MULE users very well for
about 20 years.

It does *not* work in a context where multiple encodings from the same
category are in use (eg, the email folder of a Polish Gastarbeiter in
Berlin).

Nonetheless it is pretty useful for user agents like mail clients, web
browsers, and editors.
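
The two-stage process can be sketched roughly as follows (Python 3). This is not how Emacs/MULE implements it; the category names, the guess_category heuristics and the PREFERENCES table for a hypothetical Polish user are all invented for this illustration.

```python
def guess_category(data):
    """Stage 1: put the bytes into a coarse category of encodings."""
    if b"\x1b" in data and all(byte < 0x80 for byte in data):
        return "iso-2022-7bit"      # escape sequences, no high bytes
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "iso-8859"           # some 8-bit single-byte encoding
    return "utf-8"                  # also covers plain ASCII


# Stage 2: hypothetical per-user configuration mapping each category to
# the concrete encoding this user normally sees.
PREFERENCES = {
    "iso-2022-7bit": "iso2022_jp",
    "iso-8859": "iso8859_2",
    "utf-8": "utf-8",
}


def guess_encoding_for_user(data, preferences=PREFERENCES):
    """Pick the user's preferred encoding within the detected category."""
    return preferences[guess_category(data)]
```

As noted above, this breaks down as soon as one user receives several encodings from the same category, because stage 2 can only ever pick one of them.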

Re: [Python-Dev] Encoding detection in the standard library?

2008-04-22 Thread Stephen J. Turnbull

Guido van Rossum writes:

  To the contrary, an encoding-guessing module is often needed, and
  guessing can be done with a pretty high success rate. Other Unicode
  libraries (e.g. ICU) contain guessing modules. I suppose the API could
  return two values: the guessed encoding and a confidence indicator.
  Note that the locale settings might figure in the guess.

Not locale settings, but user configuration.  A Bayesian detector
(CodeBayes? hi, Skip!) might be a good way to go for servers, while a
simple language preference might really up the probability for user
agents.
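
The API shape suggested above (a guessed encoding plus a confidence indicator, influenced by user configuration) might look like the sketch below, in Python 3. It is not a Bayesian detector; the Guess type, the preferred parameter and the confidence numbers are placeholders invented for illustration.

```python
from typing import NamedTuple, Optional


class Guess(NamedTuple):
    encoding: Optional[str]
    confidence: float  # 0.0 (no idea) to 1.0 (certain); values are made up


def guess_encoding(data, preferred="latin-1"):
    """Return a Guess for the bytes; `preferred` is the user-configured
    encoding to assume when nothing more specific can be established."""
    if not data:
        return Guess(None, 0.0)
    try:
        data.decode("ascii")
        return Guess("ascii", 1.0)
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        # Non-ASCII bytes that form valid UTF-8 are rarely an accident.
        return Guess("utf-8", 0.9)
    except UnicodeDecodeError:
        pass
    # Nothing verifiable: fall back to the user's preference, low confidence.
    return Guess(preferred, 0.3)
```

A caller can then decide what to do below some confidence threshold, for example asking the user or keeping the raw bytes.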


Re: [Python-Dev] Encoding detection in the standard library?

 Yup, but DrProject (the target application) also serves as a relay and
 archive for email.  We have no control over the agent used for
 composition, and AFAIK there's no standard way to include encoding
 information.

That's not at all the case. MIME defines that in full detail, since
1993.

Regards,
Martin

Re: [Python-Dev] Encoding detection in the standard library?

 I certainly agree that if the target set of documents is small enough it
 is possible to hand-code the encoding.  There are many applications,
 however, that need to examine the content of an arbitrary, or at least
 non-small set of web documents.  To name a few such applications:
 
  - web search engines
  - translation software

I'll question whether these are many programs. Web search engines
and translation software have many more challenges to master, and
they are fairly special-cased, so I would expect they need to find
their own answer to character set detection, anyway (see Bill Janssen's
answer on machine translation, also).

  - document/bookmark management systems
  - other kinds of document analysis (market research, seo, etc.)

Not sure what specifically you have in mind, however, I expect that
these also have their own challenges. For example, I would expect
that MS-Word documents are frequent. You don't need character set
detection there (Word is all Unicode), but you need an API to look
into the structure of .doc files.

 Not that I can substantiate.  Documents & feeds covers a lot of what is
 on the web--I was only trying to make the point that on the web,
 whenever an encoding can be specified, it will be specified incorrectly
 for a significant chunk of exemplars.

I firmly believe this assumption is false. If the encoding comes out of
software (which it often does), it will be correct most of the time.
It's incorrect only if the content editor has to type it.

 Indeed, if it is only one site it is pretty easy to work around.  My
 main use of python is processing and analyzing hundreds of millions of
 web documents, so it is pretty easy to see applications (which I have
 listed above).

Ok. What advantage would you (or somebody working on a similar project)
gain if chardet was part of the standard library? What if it was not
chardet, but some other algorithm?

 I can also think of good arguments for excluding encoding detection for
 maintenance reasons: is every case of the algorithm guessing wrong a bug
 that needs to be fixed in the stdlib?  That is an unbounded commitment.

Indeed, that's what I meant with my initial remark. People will expect
that it works correctly - both with the consequence of unknowingly
proceeding with the incorrect response, and then complaining when they
find out that it did produce an incorrect answer.

For chardet specifically, my usual standard-library remark applies:
it can't become part of the standard library unless the original
author contributes it, anyway. I would then hope that he or a group
of people would volunteer to maintain it, with the threat of removing
it from the stdlib again if these volunteers go away and too many
problems show up.

Regards,
Martin


Re: [Python-Dev] Encoding detection in the standard library?

2008-04-22 Thread Terry Reedy


Martin v. Löwis [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
| I certainly agree that if the target set of documents is small enough it
|
| Ok. What advantage would you (or somebody working on a similar project)
| gain if chardet was part of the standard library? What if it was not
| chardet, but some other algorithm?

It seems to me that since there is not a 'correct' algorithm but only 
competing heuristics, encoding detection modules should be made available 
via PyPI and only be considered for stdlib after a best of breed emerges 
with community support. 




Re: [Python-Dev] socket recv on win32 can be extremly delayed (python bug?)