Re: [Python-Dev] very bad network performance

2008-04-21 Thread Ralf Schmitt
On Mon, Apr 21, 2008 at 8:10 PM, Gregory P. Smith <[EMAIL PROTECTED]> wrote:

>
>
>
> The 64K hunch is wrong.  The system limit can be found using
> getsockopt(...SO_RCVBUF...).  It can easily be (and often is) set to many
> megabytes either at a system default level or on a per socket level by the
> user using setsockopt.  When the system default is that large, limiting by
> the system limit would not help the 10mb read case.
>

but it would help in the 100mb read case.


>
> Even smaller allocations like 64K cause problems as mentioned in issue
> 1092502 linking to this twisted bug:
> http://twistedmatrix.com/trac/ticket/1079.
> twisted's solution was to make the string object returned by a recv as
> short lived as possible by copying it into a StringIO.  We could do the same
> in _fileobject.read() and readline().
>

This approach looks reasonable to me.

- Ralf
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] r62342 - python/branches/py3k/Objects/bytesobject.c

2008-04-21 Thread Neal Norwitz
On Tue, Apr 15, 2008 at 2:21 AM, Stefan Behnel <[EMAIL PROTECTED]> wrote:
> Neal Norwitz wrote:
>  > Iteration with the dict methods (e.g., keys -> iterkeys()),
>  > map/zip/filter returning iterator rather than list.
>
>  That's only an optimisation, it's not functionally required. A list behaves
>  like an iterator in most use cases, so it's rather unlikely that Py3 code
>  will break in the backport because of this (and it's very unlikely that
>  static analysis can catch the cases where it breaks). There should be a rule
>  to optimise "list(map(...))" into "map(...)" and "list(x.keys())" into plain
>  "x.keys()" etc., but I don't think there is much more to do here.

It's not just an optimization if a copy won't fit in memory.  I'd like
the solution to be closer to 100% than 95%.
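
The memory point can be illustrated with a small sketch (Python 3 semantics assumed; the `squares` helper and the sizes are made up for illustration). A lazy `map` object stays small however many items it will produce, while `list(...)` holds every item at once:

```python
import sys

def squares(n):
    """Return a lazy iterator over the first n squares (Python 3 map)."""
    return map(lambda i: i * i, range(n))

lazy = squares(10**6)          # no squares computed yet
eager = list(squares(10**3))   # all 1000 results held in memory at once

# The iterator object itself is small and does not grow with n.
lazy_size = sys.getsizeof(lazy)
eager_size = sys.getsizeof(eager)

# Consuming the iterator processes one item at a time.
total = sum(lazy)
```

A back-converter that turns every `map(...)` into a list would lose exactly this property.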

>  > int -> (int, long)
>
>  Is there any case where this might be required? I don't see any reason why
>  back-converted Py3 code should break here. What would "long()" be needed for
>  in working Py3 code that "int()" doesn't provide in Py2?
>
>  Although you might have been referring to "isinstance(x, int)" in Py3?

Yes, sorry, I wasn't explicit.  isinstance is specifically what I was
referring to (or other type checks such as type(x) in (int, long)).
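
A back-converted type check could look roughly like the sketch below. `integer_types` and `is_integer` are illustrative names, not part of any existing 3to2 tool:

```python
import sys

if sys.version_info[0] >= 3:
    integer_types = (int,)
else:
    # Only evaluated on Python 2, where the separate 'long' type exists.
    integer_types = (int, long)  # noqa: F821

def is_integer(value):
    """Version-independent stand-in for isinstance(value, int) in Py3 code."""
    return isinstance(value, integer_types)
```

The same tuple would also serve checks written as `type(x) in integer_types`.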

>  > str -> basestring or (str, unicode)
>
>  This is an issue, although I think that Py3 is explicit enough here to make
>  this easy to handle by static renaming (and maybe optimising "isinstance(s,
>  (str, bytes))" into "..., basestring))").
>
>
>
>  > __bool__ -> __nonzero__
>  > exec/execfile
>  > input -> raw_input
>
>  Also valid issues that can be handled through renaming and static syntactic
>  adjustments.
>
>
>
>  > Most things that have a fixer in 2to3 would also require one in 3to2.
>
>  I think the more explicit syntax in Py3 will actually make it easier to
>  back-convert the code statically from 3to2 than to migrate it from 2to3.

Sure, that's the idea.

I haven't seen any action on 3to2 (although I'm very behind on email).
 Stefan, could you try to implement some of these and report back how
it works?

n


Re: [Python-Dev] Encoding detection in the standard library?

2008-04-21 Thread Martin v. Löwis
> IMO, encoding estimation is something that many web programs will have
> to deal with

Can you please explain why that is? Web programs should not normally
have the need to detect the encoding; instead, it should be specified
always - unless you are talking about browsers specifically, which
need to support web pages that specify the encoding incorrectly.
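
When the encoding is specified, reading it is straightforward. A minimal sketch using the standard library's email package to pull the charset parameter out of a Content-Type header value (`declared_charset` is a hypothetical helper name):

```python
from email.message import Message

def declared_charset(content_type, default=None):
    """Return the charset declared in a Content-Type value, or default."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and strips quotes.
    return msg.get_content_charset(default)

html_charset = declared_charset("text/html; charset=UTF-8")
missing_charset = declared_charset("text/html")
```

Only when this returns nothing (or the declared value turns out to be wrong) does guessing come into play.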

> so it might as well be built in; I would prefer the option
> to run `text=input.encode('guess')` (or something similar) than relying
> on an external dependency or worse yet using a hand-rolled algorithm.

Ok, let me try differently then. Please feel free to post a patch to
bugs.python.org, and let other people rip it apart.

For example, I don't think it should be a codec, as I can't imagine it
working on streams.

Regards,
Martin


[Python-Dev] Encoding detection in the standard library?

2008-04-21 Thread Jim Jewett
David Wolever wrote:

> IMO, encoding estimation is something that
> many web programs will have to deal with,
> so it might as well be built in; I would prefer
> the option to run `text=input.encode('guess')`
> (or something similar) than relying on an external
> dependency or worse yet using a hand-rolled
> algorithm

The (still draft) html5 spec is trying to get error-correction
standardized, so it includes all sorts of "if this fails, do X" rules.
Encoding detection will be standardized, so there will be an external
standard that we can reference.

http://dev.w3.org/html5/spec/Overview.html#determining

Note that this portion of the spec is probably not stable yet, as
there was some new analysis on which "wrong" answers provided better
results on real world web pages.

e.g.,

http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-March/014127.html

http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-March/014190.html

There was also a recent analysis of how many characters it takes to
sniff successfully X% of the time on today's web, though I'm not
finding it at the moment.

-jJ


Re: [Python-Dev] Encoding detection in the standard library?

2008-04-21 Thread David Wolever

On 21-Apr-08, at 5:31 PM, Martin v. Löwis wrote:

>> This is useful when you get a hunk of data which _should_ be some
>> sort of intelligible text from the Big Scary Internet (say, a posted
>> web form or email message), and you want to do something useful with
>> it (say, search the content).
>
> I don't think that should be part of the standard library. People
> will mistake what it tells them for certain.

As Oleg mentioned, if the method is called something like
'guess_encoding', I think we could live with clear consciences.

IMO, encoding estimation is something that many web programs will
have to deal with, so it might as well be built in; I would prefer
the option to run `text=input.encode('guess')` (or something similar)
than relying on an external dependency or worse yet using a hand-rolled
algorithm.



Re: [Python-Dev] A smarter shutil.copytree ?

2008-04-21 Thread Greg Ewing

Steven Bethard wrote:

> I'm not a big fan of the sequence-or-callable argument. Why not just
> make it a callable argument, and supply a utility function


Or have two different keyword arguments, one for a sequence
and one for a callable.

--
Greg


Re: [Python-Dev] thoughts on having EOFError inherit from EnvironmentError?

2008-04-21 Thread Greg Ewing

Steven wrote:
> It might help if you explain what sort of actual things that the user does
> wrong that you are talking about.


Usually it's some kind of parsing error. Or it might be a
network connection getting closed unexpectedly.

Essentially it's anything due to factors outside the
program's control, but which aren't detected and reported
by any of the built-in IOError or OSError exceptions.

--
Greg


Re: [Python-Dev] Encoding detection in the standard library?

2008-04-21 Thread Oleg Broytmann
On Mon, Apr 21, 2008 at 06:37:20PM -0300, Rodrigo Bernardo Pimentel wrote:
> On Mon, Apr 21 2008 at 06:31:06PM BRT, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > > This is useful when you get a hunk of data which _should_ be some  
> > > sort of intelligible text from the Big Scary Internet (say, a posted  
> > > web form or email message), and you want to do something useful with  
> > > it (say, search the content).
> > 
> > I don't think that should be part of the standard library. People
> > will mistake what it tells them for certain.
> 
> Maybe call it "charguess", then?

   The famous chardet returns the probability of its guess:

>>> import chardet
>>> chardet.detect("dabc")
{'confidence': 1.0, 'encoding': 'ascii'}
>>> chardet.detect("тест")
{'confidence': 0.98999, 'encoding': 'KOI8-R'}

Oleg.
-- 
 Oleg Broytmann            http://phd.pp.ru/            [EMAIL PROTECTED]
   Programmers don't die, they just GOSUB without RETURN.


Re: [Python-Dev] Encoding detection in the standard library?

2008-04-21 Thread Rodrigo Bernardo Pimentel
On Mon, Apr 21 2008 at 06:31:06PM BRT, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > This is useful when you get a hunk of data which _should_ be some  
> > sort of intelligible text from the Big Scary Internet (say, a posted  
> > web form or email message), and you want to do something useful with  
> > it (say, search the content).
> 
> I don't think that should be part of the standard library. People
> will mistake what it tells them for certain.

Maybe call it "charguess", then?


rbp
-- 
Rodrigo Bernardo Pimentel <[EMAIL PROTECTED]> | GPG: <0x0DB14978>


Re: [Python-Dev] Encoding detection in the standard library?

2008-04-21 Thread Martin v. Löwis
> This is useful when you get a hunk of data which _should_ be some  
> sort of intelligible text from the Big Scary Internet (say, a posted  
> web form or email message), and you want to do something useful with  
> it (say, search the content).

I don't think that should be part of the standard library. People
will mistake what it tells them for certain.

Regards,
Martin


Re: [Python-Dev] BSDDB3

2008-04-21 Thread Martin v. Löwis
> | This problem has now been worked-around, by reformulating the test cases
> | so that the situation doesn't occur anymore, but IMO, it should not be
> | possible for an extension module to cause an interpreter abort.
> 
> Agreed. But I can't test Windows builds myself. Could anybody help me with
> this issue?

Sure. If you post a patch, I'm sure someone will be able to test it on
Windows and report results; when it's committed, also the buildbots will
test it.

Regards,
Martin



Re: [Python-Dev] Encoding detection in the standard library?

2008-04-21 Thread Tony Nelson
At 1:14 PM -0400 4/21/08, David Wolever wrote:
>On 21-Apr-08, at 12:44 PM, [EMAIL PROTECTED] wrote:
>>
>> David> Is there some sort of text encoding detection module in the
>> David> standard library?  And, if not, is there any reason not to add
>> David> one?
>>
>> No, there's not.  I suspect the fact that you can't correctly determine the
>> encoding of a chunk of text 100% of the time militates against it.
>Sorry, I wasn't very clear what I was asking.
>
>I was thinking about making an educated guess -- just like chardet
>(http://chardet.feedparser.org/).
>
>This is useful when you get a hunk of data which _should_ be some
>sort of intelligible text from the Big Scary Internet (say, a posted
>web form or email message), and you want to do something useful with
>it (say, search the content).

Feedparser.org's chardet can't guess 'latin1', so it should be used as a
last resort, just as the docs say.
-- 

TonyN.


Re: [Python-Dev] very bad network performance

2008-04-21 Thread Gregory P. Smith
On Mon, Apr 14, 2008 at 4:41 PM, Curt Hagenlocher <[EMAIL PROTECTED]>
wrote:

> On Mon, Apr 14, 2008 at 4:19 PM, Guido van Rossum <[EMAIL PROTECTED]>
> wrote:
> >
> > But why was imaplib apparently specifying 10MB? Did it know there was
> > that much data? Or did it just not want to bother looping over all the
> > data in smaller buffer increments (e.g. 64K, which is probably the max
> > of what most TCP stacks will give you)?
>
> I'm going to guess that the code in question is
>
>     size = int(self.mo.group('size'))
>     if __debug__:
>         if self.debug >= 4:
>             self._mesg('read literal size %s' % size)
>     data = self.read(size)
>
> It's reading however many bytes are reported by the server as the size.
>
> > If I'm right with my hunch that the TCP stack will probably clamp at
> > 64K, perhaps we should use min(system limit, max(requested size,
> > buffer size))?
>
> I have indeed missed the point of the read buffer size.  This would work.
>

The 64K hunch is wrong.  The system limit can be found using
getsockopt(...SO_RCVBUF...).  It can easily be (and often is) set to many
megabytes either at a system default level or on a per socket level by the
user using setsockopt.  When the system default is that large, limiting by
the system limit would not help the 10mb read case.
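
A minimal sketch of that lookup and of why the clamp does not help, written for Python 3. The helper names, the 8192-byte buffer size, and the 16 MB example limit are assumptions chosen for illustration:

```python
import socket

def system_recv_limit(sock):
    """Return the kernel's receive buffer size for this socket, in bytes."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

def clamped_recv_size(requested, bufsize, system_limit):
    """The clamp proposed earlier: min(limit, max(requested, bufsize))."""
    return min(system_limit, max(requested, bufsize))

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    limit = system_recv_limit(sock)  # platform and configuration dependent
finally:
    sock.close()

ten_mb = 10 * 1024 * 1024
# With a 64K limit the clamp bites; with a multi-megabyte limit it does not.
small_limit_size = clamped_recv_size(ten_mb, 8192, 64 * 1024)
large_limit_size = clamped_recv_size(ten_mb, 8192, 16 * 1024 * 1024)
```

With the larger limit, the 10 MB request passes through unchanged, which is the case described above.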

Even smaller allocations like 64K cause problems as mentioned in issue
1092502 linking to this twisted bug:
http://twistedmatrix.com/trac/ticket/1079.  twisted's solution was
to make the string object returned by a recv as
short lived as possible by copying it into a StringIO.  We could do the same
in _fileobject.read() and readline().

I have attached a patch to issue 2632 that changes socket to use StringIO
for its read buffer and keeps the lifetime of strings returned by recv() as
short as possible when appropriate.  It also refuses to call recv() with a
size smaller than default_bufsize within read() [the source of the
performance problem].  That changes internal recv() call behavior over the
existing code after the issue 1092502 "fix" was applied to use min() rather
than max(), but it is -not- a significant change over the pre-1092502 "fix"
behavior that exists in all released versions of python (it already chose
the larger of two values for recv sizes).

The main difference behind the scenes?  StringIO is using realloc only to
increase its size while recv() was using realloc to shrink the allocation
size and many of these recv()ed shrunken strings were being held onto in a
list before the final value was constructed.
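
The buffering idea can be sketched as follows. This is not the patch attached to issue 2632; it is a simplified Python 3 illustration in which `io.BytesIO` stands in for StringIO, and the class name, the `bufsize` default, and the leftover handling are assumptions:

```python
import io
import socket

class BufferedSocketReader:
    """Accumulate recv() results in one growing buffer instead of a list."""

    def __init__(self, sock, bufsize=8192):
        self._sock = sock
        self._bufsize = bufsize
        self._leftover = b""  # bytes received beyond what the caller asked for

    def read(self, size):
        buf = io.BytesIO()
        buf.write(self._leftover)
        self._leftover = b""
        while buf.tell() < size:
            # Never call recv() with less than bufsize, even for small reads.
            chunk = self._sock.recv(max(size - buf.tell(), self._bufsize))
            if not chunk:
                break  # peer closed the connection
            buf.write(chunk)  # copied at once, so the chunk can be freed
        data = buf.getvalue()
        self._leftover = data[size:]
        return data[:size]

# Usage over a local socket pair.
writer, reader_sock = socket.socketpair()
writer.sendall(b"abcdefghij" * 400)
writer.close()
reader = BufferedSocketReader(reader_sock, bufsize=1024)
head = reader.read(5)
rest = reader.read(3995)
tail = reader.read(10)
reader_sock.close()
```

Each received chunk is referenced only until it has been copied into the buffer, so no list of shrunken strings builds up.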

I suggest continuing the discussion within issue 2632 to keep better track
of it.

My socket-strio patch in 2632 needs more testing (it passed the socket, http*
and url* tests) and verification that both issues' problems are indeed gone,
but they should be.

-gps


Re: [Python-Dev] Encoding detection in the standard library?

2008-04-21 Thread skip

Guido> Note that the locale settings might figure in the guess.

Alas, locale settings in a web server have little or nothing to do with the
locale settings in the client submitting the form.

Skip


Re: [Python-Dev] Encoding detection in the standard library?

2008-04-21 Thread skip

Michael> The only approach I know of is a heuristic based approach. e.g.

Michael> http://www.voidspace.org.uk/python/articles/guessing_encoding.shtml

Michael> (Which was 'borrowed' from docutils in the first place.)

Yes, I implemented a heuristic approach for the Musi-Cal web server.  I was
able to rely on domain knowledge to guess correctly almost all the time.
The heuristic was that almost all form submissions came from the US, and the
rest came from Western Europe.  Python could never embed such a
narrowly focused heuristic into its core distribution.

Skip



Re: [Python-Dev] Encoding detection in the standard library?

2008-04-21 Thread Guido van Rossum
To the contrary, an encoding-guessing module is often needed, and
guessing can be done with a pretty high success rate. Other Unicode
libraries (e.g. ICU) contain guessing modules. I suppose the API could
return two values: the guessed encoding and a confidence indicator.
Note that the locale settings might figure in the guess.
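
A two-value API of that shape might look like the sketch below. It is deliberately naive (BOM check, then trial decoding) and is not how chardet or ICU work; the name `guess_encoding`, the fallback, and the confidence numbers are arbitrary assumptions:

```python
import codecs

# UTF-32 BOMs come first: the UTF-32 LE BOM starts with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def guess_encoding(data, fallback="latin-1"):
    """Return (encoding, confidence) for a bytes object; a guess, not a fact."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, 1.0
    try:
        data.decode("ascii")
        return "ascii", 1.0
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8", 0.9   # arbitrary: valid UTF-8 is rarely accidental
    except UnicodeDecodeError:
        return fallback, 0.1  # arbitrary: the fallback is little more than a default

encoding, confidence = guess_encoding("тест".encode("utf-8"))
```

Returning the confidence alongside the name makes it harder for callers to treat the result as certain.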

On Mon, Apr 21, 2008 at 10:28 AM, Georg Brandl <[EMAIL PROTECTED]> wrote:
> Christian Heimes wrote:
>
> > David Wolever wrote:
>  >> Is there some sort of text encoding detection module in the standard
>  >> library?
>  >> And, if not, is there any reason not to add one?
>  >
>  > You cannot detect the encoding unless it's explicitly defined through a
>  > header (e.g. the UTF BOM). It's technically impossible. The best you can
>  > do is an educated guess.
>
>  Exactly, and in light of that, I'm -1 for such a standard module.
>  We've enough issues with modules implementing (apparently) fully
>  specified standards. :)
>
>  Georg



-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Encoding detection in the standard library?

2008-04-21 Thread Georg Brandl
Christian Heimes wrote:
> David Wolever wrote:
>> Is there some sort of text encoding detection module in the standard
>> library?
>> And, if not, is there any reason not to add one?
> 
> You cannot detect the encoding unless it's explicitly defined through a
> header (e.g. the UTF BOM). It's technically impossible. The best you can
> do is an educated guess.

Exactly, and in light of that, I'm -1 for such a standard module.
We've enough issues with modules implementing (apparently) fully
specified standards. :)

Georg



Re: [Python-Dev] A smarter shutil.copytree ?

2008-04-21 Thread Guido van Rossum
On Sun, Apr 20, 2008 at 5:25 PM, Steven Bethard
<[EMAIL PROTECTED]> wrote:
>
> On Sun, Apr 20, 2008 at 4:15 PM, Tarek Ziadé <[EMAIL PROTECTED]> wrote:
>  > I have submitted a patch for review here: http://bugs.python.org/issue2663
>  >
>  >  glob-style patterns or a callable (for complex cases) can be provided
>  >  to filter out files or directories.
>
>  I'm not a big fan of the sequence-or-callable argument. Why not just
>  make it a callable argument, and supply a utility function so that you
>  can write something like::
>
>      exclude_func = shutil.excluding_patterns('*.tmp', 'test_dir2')
>      shutil.copytree(src_dir, dst_dir, exclude=exclude_func)
>
>  ?

Agreed. Type testing is fraught with problems.
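
For reference, later Python releases (2.6 onward) ship a callable-based API in this spirit: `shutil.copytree()` takes an `ignore` callable, and `shutil.ignore_patterns()` builds one from glob-style patterns. The names differ from the `exclude` and `excluding_patterns` spelling proposed above. A small self-contained demonstration on a throwaway directory (the file names are made up):

```python
import os
import shutil
import tempfile

root = tempfile.mkdtemp()
try:
    src = os.path.join(root, "src")
    dst = os.path.join(root, "dst")
    os.makedirs(os.path.join(src, "test_dir2"))
    for name in ("keep.txt", "scratch.tmp", os.path.join("test_dir2", "inner.txt")):
        with open(os.path.join(src, name), "w") as handle:
            handle.write("data")

    # The helper returns a callable; copytree never inspects its type.
    ignore_func = shutil.ignore_patterns("*.tmp", "test_dir2")
    shutil.copytree(src, dst, ignore=ignore_func)
    copied = sorted(os.listdir(dst))
finally:
    shutil.rmtree(root)
```

Because the argument is always a callable, no sequence-or-callable type test is needed inside `copytree()`.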

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] unscriptable?

2008-04-21 Thread Georg Brandl
Alexander Belopolsky wrote:
>> ruby: undefined method `[]=' for 1:Fixnum (NoMethodError)
> 
> I think it will be natural to unify [] error message with
> the other binary ops:
> 
> Now:
> >>> 1+""
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: unsupported operand type(s) for +: 'int' and 'str'
> 
> Proposal:
> >>> 1[2]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: unsupported operand type(s) for []: 'int' and 'int'

This is misleading, since it suggests that you can remedy this
by using another type in the subscript, which isn't possible.
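
A quick check of that point, as a sketch (`try_subscript` is an illustrative helper; the exact error text varies between Python versions, so only the exception type is examined):

```python
def try_subscript(obj, index):
    """Return the exception raised by obj[index], or None if it succeeds."""
    try:
        obj[index]
    except Exception as exc:
        return exc
    return None

value = 1
# No choice of subscript type makes an int subscriptable.
errors = [try_subscript(value, index) for index in (2, "a", 2.0, None, (0, 1))]
all_type_errors = all(isinstance(err, TypeError) for err in errors)
```

Every subscript type fails the same way, which is why naming the subscript's type in the message would point the reader in the wrong direction.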

Georg



Re: [Python-Dev] Encoding detection in the standard library?

2008-04-21 Thread David Wolever
On 21-Apr-08, at 12:44 PM, [EMAIL PROTECTED] wrote:
>
> David> Is there some sort of text encoding detection module in the
> David> standard library?  And, if not, is there any reason not to add
> David> one?
>
> No, there's not.  I suspect the fact that you can't correctly determine the
> encoding of a chunk of text 100% of the time militates against it.
Sorry, I wasn't very clear what I was asking.

I was thinking about making an educated guess -- just like chardet  
(http://chardet.feedparser.org/).

This is useful when you get a hunk of data which _should_ be some  
sort of intelligible text from the Big Scary Internet (say, a posted  
web form or email message), and you want to do something useful with  
it (say, search the content).


Re: [Python-Dev] Encoding detection in the standard library?

2008-04-21 Thread Jean-Paul Calderone
On Mon, 21 Apr 2008 17:50:43 +0100, Michael Foord <[EMAIL PROTECTED]> wrote:
>[EMAIL PROTECTED] wrote:
>> David> Is there some sort of text encoding detection module in the
>> David> standard library?  And, if not, is there any reason not to add
>> David> one?
>>
>> No, there's not.  I suspect the fact that you can't correctly determine the
>> encoding of a chunk of text 100% of the time militates against it.
>>
>
>The only approach I know of is a heuristic based approach. e.g.
>
>http://www.voidspace.org.uk/python/articles/guessing_encoding.shtml
>
>(Which was 'borrowed' from docutils in the first place.)

This isn't the only approach, although you're right that in general you
have to rely on heuristics.  See the charset detection features of ICU:

  http://www.icu-project.org/userguide/charsetDetection.html

I think OSAF's pyicu exposes these APIs:

  http://pyicu.osafoundation.org/

Jean-Paul


Re: [Python-Dev] Encoding detection in the standard library?

2008-04-21 Thread Christian Heimes
David Wolever wrote:
> Is there some sort of text encoding detection module in the standard
> library?
> And, if not, is there any reason not to add one?

You cannot detect the encoding unless it's explicitly defined through a
header (e.g. the UTF BOM). It's technically impossible. The best you can
do is an educated guess.

Christian


Re: [Python-Dev] Encoding detection in the standard library?

2008-04-21 Thread Michael Foord
[EMAIL PROTECTED] wrote:
> David> Is there some sort of text encoding detection module in the
> David> standard library?  And, if not, is there any reason not to add
> David> one?
>
> No, there's not.  I suspect the fact that you can't correctly determine the
> encoding of a chunk of text 100% of the time militates against it.
>   

The only approach I know of is a heuristic based approach. e.g.

http://www.voidspace.org.uk/python/articles/guessing_encoding.shtml

(Which was 'borrowed' from docutils in the first place.)

Michael Foord
> Skip



Re: [Python-Dev] Encoding detection in the standard library?

2008-04-21 Thread skip

David> Is there some sort of text encoding detection module in the
David> standard library?  And, if not, is there any reason not to add
David> one?

No, there's not.  I suspect the fact that you can't correctly determine the
encoding of a chunk of text 100% of the time militates against it.

Skip


Re: [Python-Dev] unscriptable?

2008-04-21 Thread Alexander Belopolsky
> ruby: undefined method `[]=' for 1:Fixnum (NoMethodError)

I think it will be natural to unify [] error message with
the other binary ops:

Now:
>>> 1+""
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'

Proposal:
>>> 1[2]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for []: 'int' and 'int'





[Python-Dev] Encoding detection in the standard library?

2008-04-21 Thread David Wolever
Is there some sort of text encoding detection module in the standard
library?
And, if not, is there any reason not to add one?

After some googling, I've come across this:
http://mail.python.org/pipermail/python-3000/2006-September/003537.html
But I can't find any changes that resulted from that thread.


Re: [Python-Dev] BSDDB3

2008-04-21 Thread Jesus Cea
Martin v. Löwis wrote:
| I think it would be helpful if you could analyze the crashes that bsddb
| caused on Windows. Just go back a few revisions in the subversion tree
| to reproduce the crashes.

I have no MS Windows machines in my environment :-(

| This problem has now been worked-around, by reformulating the test cases
| so that the situation doesn't occur anymore, but IMO, it should not be
| possible for an extension module to cause an interpreter abort.

Agreed. But I can't test Windows builds myself. Could anybody help me with
this issue?

-- 
Jesus Cea Avion
[EMAIL PROTECTED] - http://www.jcea.es/
jabber / xmpp:[EMAIL PROTECTED]
"Things are not so easy"
"My name is Dump, Core Dump"
"Love is placing your happiness in the happiness of another" - Leibniz


[Python-Dev] Is Py_WIN_WIDE_FILENAMES still alive?

2008-04-21 Thread ocean
Hello. I noticed that when I remove the following line from
trunk/PC/pyconfig.h

#define Py_WIN_WIDE_FILENAMES

_fileio.c and posixmodule.c (and maybe more) cannot be compiled on Windows.

When Py_WIN_WIDE_FILENAMES is not defined, how should Python behave?

  - call POSIX functions like open(2)

  - call ANSI Win32 APIs like MoveFileA

Or maybe this macro is not used anymore?

Thank you.



Re: [Python-Dev] A smarter shutil.copytree ?

2008-04-21 Thread Tarek Ziadé
The pattern matching uses the src_dir to call glob.glob(), which returns
the list of files to be excluded. That's why I added it within the
copytree() function.

To make an excluding_patterns helper work, it could be coded like this::

    import glob
    import os

    def excluding_patterns(*patterns):
        def _excluding_patterns(filepath):
            # Expand every pattern relative to the directory of the tested path.
            exclude_files = []
            dir_ = os.path.dirname(filepath)
            for pattern in patterns:
                pattern = os.path.join(dir_, pattern)
                exclude_files.extend(glob.glob(pattern))
            return filepath in exclude_files
        return _excluding_patterns

But I can see some performance issues, as the glob function will
be called within the loop to test each file or folder::

    def copytree(src, dst, exclude):
        ...
        for name in names:
            srcname = os.path.join(src, name)
            if exclude(srcname):
                continue
            ...
        ...

Adding it at the beginning of the `copytree` function would then
be better for performance, but means that the callable has to return
a list of matching files instead of the match result itself::

    def excluding_patterns(*patterns):
        def _excluding_patterns(path):
            # Expand every pattern once, relative to the directory being copied.
            exclude_files = []
            for pattern in patterns:
                pattern = os.path.join(path, pattern)
                exclude_files.extend(glob.glob(pattern))
            return exclude_files
        return _excluding_patterns

Then in copytree::

    def copytree(src, dst, exclude):
        ...
        excluded = exclude(src)
        ...
        for name in names:
            srcname = os.path.join(src, name)
            if srcname in excluded:
                continue
            ...
        ...

But this means that people who want to implement their own
callable will have to provide a function that returns a list
of excluded files, so they won't be free to implement
what they want.
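
One possible way around both concerns, sketched under the assumption that patterns only need to match the last path component: test the name with `fnmatch` instead of expanding the patterns with `glob`. The callable then stays a plain per-path predicate and never touches the filesystem. This is an illustration, not the code in the submitted patch:

```python
import fnmatch
import os

def excluding_patterns(*patterns):
    """Build a predicate that is true for paths whose name matches a pattern."""
    def _excluding_patterns(filepath):
        name = os.path.basename(filepath)
        # Pure string matching: no directory listing per tested file.
        return any(fnmatch.fnmatch(name, pattern) for pattern in patterns)
    return _excluding_patterns

exclude = excluding_patterns("*.tmp", "test_dir2")
skip_tmp = exclude(os.path.join("src", "build", "scratch.tmp"))
skip_dir = exclude(os.path.join("src", "test_dir2"))
keep_txt = exclude(os.path.join("src", "notes.txt"))
```

A hand-written callable with the same one-argument signature could be passed in its place.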

We could have two parameters, one for the glob-style sequence
and one for the callable, to be able to use them at the
appropriate places in the function, but I think this would
make the function signature rather heavy::

    def copytree(src, dst, exclude_patterns=None, exclude_function=None):
        ...


That's why I would be in favor of a sequence-or-callable argument,
even if I admit that it is not the prettiest way to present
an argument.

Regards

Tarek

On Mon, Apr 21, 2008 at 2:38 AM, Isaac Morland <[EMAIL PROTECTED]> wrote:
> On Sun, 20 Apr 2008, Steven Bethard wrote:
>
>
> > On Sun, Apr 20, 2008 at 4:15 PM, Tarek Ziadé <[EMAIL PROTECTED]> wrote:
> >
> > > I have submitted a patch for review here: http://bugs.python.org/issue2663
> > >
> > >  glob-style patterns or a callable (for complex cases) can be provided
> > >  to filter out files or directories.
> > >
> >
> > I'm not a big fan of the sequence-or-callable argument. Why not just
> > make it a callable argument, and supply a utility function so that you
> > can write something like::
> >
> >   exclude_func = shutil.excluding_patterns('*.tmp', 'test_dir2')
> >   shutil.copytree(src_dir, dst_dir, exclude=exclude_func)
> >
>
>  Even if a glob pattern filter is considered useful enough to be worth
> special-casing, the glob capability should also be exposed via something
> like your excluding_patterns constructor and additionally as a function that
> can be called by another function intended for use as a callable argument.
>
>  If it is not, then doing something like "files matching these glob patterns
> except for those matching this non-glob-expressible condition and also those
> files matching this second non-glob-expressible condition" becomes painful
> because the glob part essentially needs to be re-implemented.
>
>  Isaac Morland                   CSCF Web Guru
>  DC 2554C, x36650                WWW Software Specialist



-- 
Tarek Ziadé | Association AfPy | www.afpy.org
Blog FR | http://programmation-python.org
Blog EN | http://tarekziade.wordpress.com/