Re: [Python-Dev] Problems with the Python Memory Manager

2005-11-24 Thread Martin v. Löwis
Travis Oliphant wrote:
> In the long term, what is the status of plans to re-work the Python 
> Memory manager to free memory that it acquires (or improve the detection 
> of already freed memory locations).

The Python memory manager does reuse memory that has been deallocated
earlier. There are patches "floating around" that make it return
unused memory to the system (which it currently doesn't).

Regards,
Martin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Problems with the Python Memory Manager

2005-11-24 Thread Martin v. Löwis
Travis Oliphant wrote:
> As verified by removing usage of the Python PyObject_MALLOC function, it 
> was the Python memory manager that was performing poorly.   Even though 
> the array-scalar objects were deleted, the memory manager would not 
> re-use their memory for later object creation. Instead, the memory 
> manager kept allocating new arenas to cover the load (when it should 
> have been able to re-use the old memory that had been freed by the 
> deleted objects--- again, I don't know enough about the memory manager 
> to say why this happened).

One way (I think the only way) this could happen is if:
- the objects being allocated are all smaller than 256 bytes
- when allocating new objects, the requested size was different
  from any other size previously deallocated.

So if you first allocate 1,000,000 objects of size 200, and then
release them, and then allocate 1,000,000 objects of size 208,
the memory is not reused.

If the objects are all of same size, or all larger than 256 bytes,
this effect does not occur.
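For illustration, obmalloc rounds each small request up to a multiple of 8 bytes, and each pool serves exactly one such size class while any of its blocks is live. A minimal sketch of the size-class arithmetic (mirroring the constants in Objects/obmalloc.c of that era; this is an illustration of the mapping, not the allocator itself):

```python
ALIGNMENT = 8                  # obmalloc aligns requests to 8 bytes
SMALL_REQUEST_THRESHOLD = 256  # larger requests go to the system malloc

def size_class(nbytes):
    """Index of the obmalloc size class serving a request of nbytes."""
    assert 0 < nbytes <= SMALL_REQUEST_THRESHOLD
    return (nbytes - 1) // ALIGNMENT

def block_size(index):
    """Actual block size handed out for a given size class."""
    return (index + 1) * ALIGNMENT

# Requests of 200 and 208 bytes land in different size classes, so
# freed 200-byte blocks cannot directly satisfy 208-byte requests
# while their pool still holds any live 200-byte object.
print(size_class(200), size_class(208))  # 24 25
```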

Regards,
Martin


Re: [Python-Dev] Problems with the Python Memory Manager

2005-11-24 Thread Martin v. Löwis
Travis Oliphant wrote:
> So, I now believe that his code (plus the array scalar extension type) 
> was actually exposing a real bug in the memory manager itself.  In 
> theory, the Python memory manager should have been able to re-use the 
> memory for the array-scalar instances because they are always the same 
> size.  In practice, the memory was apparently not being re-used but 
> instead new blocks were being allocated to handle the load.

That is really very hard to believe. Most people on this list would
probably agree that obmalloc certainly *will* reuse deallocated memory
if the next request is for the very same size (number of bytes) that
the previously-released object had.

> His code is quite complicated and it is difficult to replicate the 
> problem.  

That the code is complex would not be much of a problem: we often
analyse complex code here. It is a problem that the code is not
available, and it would be a problem if the issue were not
reproducible even when you had the code (i.e. if the problem would
sometimes occur, but not the next day when you ran it again).

So if you can, please post the code somewhere, and file a bug report
on sf.net/projects/python.

Regards,
Martin


Re: [Python-Dev] Problems with the Python Memory Manager

2005-11-24 Thread Fredrik Lundh
Martin v. Löwis wrote:

> One way (I think the only way) this could happen is if:
> - the objects being allocated are all smaller than 256 bytes
> - when allocating new objects, the requested size was different
>   from any other size previously deallocated.
>
> So if you first allocate 1,000,000 objects of size 200, and then
> release them, and then allocate 1,000,000 objects of size 208,
> the memory is not reused.
>
> If the objects are all of same size, or all larger than 256 bytes,
> this effect does not occur.

but the allocator should be able to move empty pools between size
classes via the freepools list, right?  Or am I missing something?

maybe what's happening here is more like

So if you first allocate 1,000,000 objects of size 200, and then
release most of them, and then allocate 1,000,000 objects of
size 208, all memory might not be reused.

?







Re: [Python-Dev] Problems with the Python Memory Manager

2005-11-24 Thread Robert Kern
Martin v. Löwis wrote:

> That the code is complex would not so much be a problem: we often
> analyse complex code here. It is a problem that the code is not
> available, and it would be a problem if the problem was not
> reproducable even if you had the code (i.e. if the problem would
> sometimes occur, but not the next day when you ran it again).

You can get the version of scipy_core just before the fix that Travis
applied:

  svn co -r 1488 http://svn.scipy.org/svn/scipy_core/trunk

The fix:

  http://projects.scipy.org/scipy/scipy_core/changeset/1489
  http://projects.scipy.org/scipy/scipy_core/changeset/1490

Here's some code that eats up memory with rev1488, but not with the HEAD:

"""
import scipy

a = scipy.arange(10)
for i in xrange(1000):
    x = a[5]
"""

-- 
Robert Kern
[EMAIL PROTECTED]

"In the fields of hell where the grass grows high
 Are the graves of dreams allowed to die."
  -- Richard Harter



Re: [Python-Dev] urlparse brokenness

2005-11-24 Thread Donovan Baarda
On Tue, 2005-11-22 at 23:04 -0600, Paul Jimenez wrote:
> It is my assertion that urlparse is currently broken.  Specifically, I 
> think that urlparse breaks an abstraction boundary with ill effect.
> 
> In writing a mailclient, I wished to allow my users to specify their
> imap server as a url, such as 'imap://user:[EMAIL PROTECTED]:port/'. Which
> worked fine. I then thought that the natural extension to support

FWIW, I have a small addition related to this that I think would be
handy to add to the urlparse module. It is a pair of functions,
netlocparse() and netlocunparse(), for parsing and unparsing
"user:[EMAIL PROTECTED]:port" netlocs.

Feel free to use/add/ignore it...

http://minkirri.apana.org.au/~abo/projects/osVFS/netlocparse.py
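A minimal sketch of what such a netloc parser might look like (a hypothetical reimplementation for illustration; the actual module lives at the URL above):

```python
def netlocparse(netloc):
    """Split 'user:password@host:port' into its four parts.

    Any of user, password, or port may be absent, in which case
    None is returned for that slot.
    """
    user = passwd = port = None
    if "@" in netloc:
        # rsplit so that an '@' inside the password does not confuse us
        userinfo, netloc = netloc.rsplit("@", 1)
        if ":" in userinfo:
            user, passwd = userinfo.split(":", 1)
        else:
            user = userinfo
    if ":" in netloc:
        host, port = netloc.rsplit(":", 1)
    else:
        host = netloc
    return user, passwd, host, port

print(netlocparse("bob:secret@example.com:143"))
# ('bob', 'secret', 'example.com', '143')
```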

-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



Re: [Python-Dev] Problems with the Python Memory Manager

2005-11-24 Thread Armin Rigo
Hi,

On Thu, Nov 24, 2005 at 01:59:57AM -0800, Robert Kern wrote:
> You can get the version of scipy_core just before the fix that Travis
> applied:

Now we can start debugging :-)

>   http://projects.scipy.org/scipy/scipy_core/changeset/1490

This changeset alone fixes the small example you provided.  However,
compiling python "--without-pymalloc" doesn't fix it, so we can't blame
the memory allocator.  That's all I can say; I am rather clueless as to
how the above patch manages to make any difference even without
pymalloc.


A bientot,

Armin


Re: [Python-Dev] Problems with the Python Memory Manager

2005-11-24 Thread Armin Rigo
Hi,

Ok, here is the reason for the leak...

There is in scipy a type called 'int32_arrtype' which inherits from both
another scipy type called 'signedinteger_arrtype', and from 'int'.
Obscure!  This is not 100% officially allowed: you are inheriting from
two C types.  You're living dangerously!

Now in this case it mostly works as expected, because the parent scipy
type has no field at all, so it's mostly like inheriting from both
'object' and 'int' -- which is allowed, or would be if the bases were
written in the opposite order.  But still, something confuses the
fragile logic of typeobject.c.  (I'll leave this bit to scipy people to
debug :-)

The net result is that unless you force your own tp_free as in revision
1490, the type 'int32_arrtype' has tp_free set to int_free(), which is
the normal tp_free of 'int' objects.  This causes all deallocated
int32_arrtype instances to be added to the CPython free list of integers
instead of being freed!
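The effect can be illustrated at the Python level with a toy free list: if deallocated objects are parked on a type-specific free list that nothing ever drains or bounds, "freed" memory is never actually returned. (This is an analogy for the int free list's behaviour, not CPython's actual C implementation.)

```python
class IntFreeList:
    """Toy analogue of CPython's free list for int objects."""

    def __init__(self):
        self._blocks = []

    def free(self, block):
        # "Freeing" just parks the block for later reuse by ints;
        # the memory itself is never released to the system.
        self._blocks.append(block)

    def __len__(self):
        return len(self._blocks)

free_list = IntFreeList()
# 100,000 deallocated array-scalar instances routed to the int
# free list are all still held, i.e. the process leaks:
for _ in range(100_000):
    free_list.free(bytearray(32))  # stand-in for a small object
print(len(free_list))  # 100000
```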


A bientot,

Armin


Re: [Python-Dev] PEP 302, PEP 338 and imp.getloader (was Re: a Python interface for the AST (WAS: DRAFT: python-dev...)

2005-11-24 Thread Nick Coghlan
Phillip J. Eby wrote:
> This isn't hard to implement per se; setuptools for example has a 
> 'get_importer' function, and going from importer to loader is simple:

Thanks, I think I'll definitely be able to build something out of that.

> So with the above function you could do something like:
> 
> def get_loader(fullname, path):
> for path_item in path:
> try:
> loader = get_importer(path_item).find_module(fullname)
> if loader is not None:
> return loader
> except ImportError:
> continue
> else:
> return None
> 
> in order to implement the rest.

I think sys.meta_path needs to figure into that before digging through 
sys.path, but otherwise the concept seems basically correct.

[NickC]
>> ** I'm open to suggestions on how to deal with argv[0] and __file__. They
>> should be set to whatever __file__ would be set to by the module 
>> loader, but
>> the Importer Protocol in PEP 302 doesn't seem to expose that 
>> information. The
>> current proposal is a compromise that matches the existing behaviour 
>> of -m
>> (which supports scripts like regrtest.py) while still giving a meaningful
>> value for scripts which are not part of the normal filesystem.

[PJE]
> Ugh.  Those are tricky, no question.  I can think of several simple 
> answers for each, all of which are wrong in some way.  :)

Indeed. I tried turning to "exec co in d" and "execfile(name, d)" for 
guidance, and didn't find any real help there. The only thing they 
automatically add to the supplied dictionary is __builtins__.

The consequence is that any code executed using "exec" or "execfile" sees its 
name as being "__builtin__" because the lookup for '__name__' falls back to 
the builtin namespace.
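This is easy to verify; in a modern Python 3 the same lookup falls back to the builtins module (named 'builtins' rather than '__builtin__'):

```python
ns = {}
exec("seen_name = __name__", ns)
# __name__ is not in ns, so the name lookup falls through to the
# builtins namespace, whose own __name__ is 'builtins'.
print(ns["seen_name"])  # builtins
```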

Further, "__file__" and "__loader__" won't be set at all when using these 
functions, which may be something of a surprise for some modules (to say the 
least).

My current thinking is to actually try to distance the runpy module from 
"exec" and "execfile" significantly more than I'd originally intended. That 
way, I can explicitly focus on making it look like the item was invoked from 
the command line, without worrying about behaviour differences between this 
and the exec statement. It also means runpy can avoid the "implicitly modify 
the current namespace" behaviour that exec and execfile currently have.

The basic function runpy.run_code would look like:

    def run_code(code, init_globals=None,
                 mod_name=None, mod_file=None, mod_loader=None):
        """Executes a string of source code or a code object.
        Returns the resulting top level namespace dictionary.
        """
        # Handle omitted arguments
        if mod_name is None:
            mod_name = ""
        if mod_file is None:
            mod_file = ""
        if mod_loader is None:
            mod_loader = StandardImportLoader(".")
        # Set up the top level namespace dictionary
        run_globals = {}
        if init_globals is not None:
            run_globals.update(init_globals)
        run_globals.update(__name__=mod_name,
                           __file__=mod_file,
                           __loader__=mod_loader)
        # Run it!
        exec code in run_globals
        return run_globals

Note that run_code always creates a new execution dictionary and returns it, 
in contrast to exec and execfile. This is so that naively doing:

   run_code("print 'Hi there!'", globals())

or:

   run_code("print 'Hi there!'", locals())

doesn't trash __name__, __file__ or __loader__ in the current module (which 
would be bad).

And runpy.run_module would look something like:

    def run_module(mod_name, run_globals=None, run_name=None, as_script=False):
        loader = _get_loader(mod_name)  # Handle lack of imp.get_loader
        code = loader.get_code(mod_name)
        filename = _get_filename(loader, mod_name)  # Handle lack of protocol
        if run_name is None:
            run_name = mod_name
        if as_script:
            sys.argv[0] = filename
        return run_code(code, run_globals, run_name, filename, loader)

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


[Python-Dev] (no subject)

2005-11-24 Thread Duncan Grisby
Hi,

I posted this to comp.lang.python, but got no response, so I thought I
would consult the wise people here...

I have encountered a problem with the re module. I have a
multi-threaded program that does lots of regular expression searching,
with some relatively complex regular expressions. Occasionally, events
can conspire to mean that the re search takes minutes. That's bad
enough in and of itself, but the real problem is that the re engine
does not release the interpreter lock while it is running. All the
other threads are therefore blocked for the entire time it takes to do
the regular expression search.

Is there any fundamental reason why the re module cannot release the
interpreter lock, for at least some of the time it is running?  The
ideal situation for me would be if it could do most of its work with
the lock released, since the software is running on a multi processor
machine that could productively do other work while the re is being
processed. Failing that, could it at least periodically release the
lock to give other threads a chance to run?

A quick look at the code in _sre.c suggests that for most of the time,
no Python objects are being manipulated, so the interpreter lock could
be released. Has anyone tried to do that?

Thanks,

Duncan.

-- 
 -- Duncan Grisby --
  -- [EMAIL PROTECTED] --
   -- http://www.grisby.org --


Re: [Python-Dev] (no subject)

2005-11-24 Thread Donovan Baarda
On Thu, 2005-11-24 at 14:11 +, Duncan Grisby wrote:
> Hi,
> 
> I posted this to comp.lang.python, but got no response, so I thought I
> would consult the wise people here...
> 
> I have encountered a problem with the re module. I have a
> multi-threaded program that does lots of regular expression searching,
> with some relatively complex regular expressions. Occasionally, events
> can conspire to mean that the re search takes minutes. That's bad
> enough in and of itself, but the real problem is that the re engine
> does not release the interpreter lock while it is running. All the
> other threads are therefore blocked for the entire time it takes to do
> the regular expression search.

I don't know if this will help, but in my experience compiling re's
often takes longer than matching them... are you sure that it's the
match and not a compile that is taking a long time? Are you using
pre-compiled re's or are you dynamically generating strings and using
them?

> Is there any fundamental reason why the re module cannot release the
> interpreter lock, for at least some of the time it is running?  The
> ideal situation for me would be if it could do most of its work with
> the lock released, since the software is running on a multi processor
> machine that could productively do other work while the re is being
> processed. Failing that, could it at least periodically release the
> lock to give other threads a chance to run?
> 
> A quick look at the code in _sre.c suggests that for most of the time,
> no Python objects are being manipulated, so the interpreter lock could
> be released. Has anyone tried to do that?

probably not... not many people would have several-minutes-to-match
re's.

I suspect it would be do-able... I suggest you put together a patch and
submit it on SF...


-- 
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/



[Python-Dev] Re: Regular expressions

2005-11-24 Thread Duncan Grisby
On Thursday 24 November, Donovan Baarda wrote:

> I don't know if this will help, but in my experience compiling re's
> often takes longer than matching them... are you sure that it's the
> match and not a compile that is taking a long time? Are you using
> pre-compiled re's or are you dynamically generating strings and using
> them?

It's definitely matching time. The res are all pre-compiled.

[...]
> > A quick look at the code in _sre.c suggests that for most of the time,
> > no Python objects are being manipulated, so the interpreter lock could
> > be released. Has anyone tried to do that?
> 
> probably not... not many people would have several-minutes-to-match
> re's.
> 
> I suspect it would be do-able... I suggest you put together a patch and
> submit it on SF...

The thing that scares me about doing that is that there might be
single-threadedness assumptions in the code that I don't spot. It's the
kind of thing where a patch could appear to work fine, but then
mysteriously fail due to some occasional race condition. Does anyone
know if there is any global state in _sre that would prevent it
being re-entered, or know for certain that there isn't?

Cheers,

Duncan.

-- 
 -- Duncan Grisby --
  -- [EMAIL PROTECTED] --
   -- http://www.grisby.org --


Re: [Python-Dev] (no subject)

2005-11-24 Thread Fredrik Lundh
Donovan Baarda wrote:

> I don't know if this will help, but in my experience compiling re's
> often takes longer than matching them... are you sure that it's the
> match and not a compile that is taking a long time? Are you using
> pre-compiled re's or are you dynamically generating strings and using
> them?

patterns with nested repeats can behave badly on certain types of
non-matching input. (Each repeat is basically a loop, and if you nest
enough loops things can quickly get out of hand, even if the inner loop
doesn't do much...)
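A classic demonstration of this: a nested repeat such as `(a+)+` forces the backtracking engine to try exponentially many ways of carving up the input when the overall match fails. (Timings vary by machine; the point is the growth rate.)

```python
import re
import time

pattern = re.compile(r'(a+)+$')  # nested repeats: a loop inside a loop

for n in (10, 15, 20):
    text = 'a' * n + 'b'   # the trailing 'b' guarantees a non-match
    start = time.perf_counter()
    assert pattern.match(text) is None
    elapsed = time.perf_counter() - start
    # roughly doubles with each extra 'a': ~2**n backtracking paths
    print(n, "%.4fs" % elapsed)
```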

 





Re: [Python-Dev] Problems with the Python Memory Manager

2005-11-24 Thread Tim Peters
[Martin v. Löwis]
> One way (I think the only way) this could happen is if:
> - the objects being allocated are all smaller than 256 bytes
> - when allocating new objects, the requested size was different
>   from any other size previously deallocated.
>
> So if you first allocate 1,000,000 objects of size 200, and then
> release them, and then allocate 1,000,000 objects of size 208,
> the memory is not reused.

Nope, the memory is reused in this case.  While each obmalloc "pool" P
is devoted to a fixed size so long as at least one object from P is in
use, when all objects allocated from P have been released, P can be
reassigned to any other size class.

The comments in obmalloc.c are quite accurate.  This particular case
is talked about here:

"""
empty == all the pool's blocks are currently available for allocation
On transition to empty, a pool is unlinked from its usedpools[] list,
and linked to the front of the (file static) singly-linked freepools list,
via its nextpool member.  The prevpool member has no meaning in this
case.  Empty pools have no inherent size class:  the next time a
malloc finds an empty list in usedpools[], it takes the first pool off of
freepools.  If the size class needed happens to be the same as the
size class the pool last had, some pool initialization can be skipped.
"""

Now if you end up allocating a million pools all devoted to 72-byte
objects, and leave one object from each pool in use, then all those
pools remain devoted to 72-byte objects.  Wholly empty pools can be
(and do get) reused freely, though.
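Tim's description can be paraphrased as a small state machine, sketched here in pure Python (a toy model of the bookkeeping, not the C implementation):

```python
class Pool:
    """Toy model of an obmalloc pool: devoted to one size class while
    any of its blocks is live; reassignable to any size class once
    it is wholly empty (i.e. after it moves to the freepools list)."""

    def __init__(self):
        self.size_class = None
        self.live = 0

    def alloc(self, size_class):
        if self.live and self.size_class != size_class:
            raise ValueError("pool is devoted to another size class")
        self.size_class = size_class  # empty pools take any class
        self.live += 1

    def free(self):
        self.live -= 1
        # On transition to empty the pool would be linked onto
        # freepools; we model that by letting the next alloc pick
        # any size class.

pool = Pool()
pool.alloc(200)
pool.free()        # pool now wholly empty
pool.alloc(208)    # reused for a different size class: fine
try:
    pool.alloc(200)  # but not while a 208-byte block is still live
except ValueError as exc:
    print("refused:", exc)
```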

> If the objects are all of same size, or all larger than 256 bytes,
> this effect does not occur.

If they're larger than 256 bytes, then you see the reuse behavior of
the system malloc/free, about which virtually nothing can be said
that's true across all Python platforms.


Re: [Python-Dev] Problems with the Python Memory Manager

2005-11-24 Thread Travis E. Oliphant
Armin Rigo wrote:
> Hi,
> 
> Ok, here is the reason for the leak...
> 
> There is in scipy a type called 'int32_arrtype' which inherits from both
> another scipy type called 'signedinteger_arrtype', and from 'int'.
> Obscure!  This is not 100% officially allowed: you are inheriting from
> two C types.  You're living dangerously!

This is allowed because the two types have compatible binary layouts (in 
fact the signed integer type is only the PyObject_HEAD)

> 
> Now in this case it mostly works as expected, because the parent scipy
> type has no field at all, so it's mostly like inheriting from both
> 'object' and 'int' -- which is allowed, or would be if the bases were
> written in the opposite order.  But still, something confuses the
> fragile logic of typeobject.c.  (I'll leave this bit to scipy people to
> debug :-)
> 

This is definitely possible.  I've tripped up in this logic before.   I 
was beginning to suspect that it might have something to do with what is 
going on.

> The net result is that unless you force your own tp_free as in revision
> 1490, the type 'int32_arrtype' has tp_free set to int_free(), which is
> the normal tp_free of 'int' objects.  This causes all deallocated
> int32_arrtype instances to be added to the CPython free list of integers
> instead of being freed!

I'm not sure this is true. It sounds plausible, but I will have to 
check.   Previously the tp_free should have been inherited as 
PyObject_Del for the int32_arrtype.  Unless the typeobject.c code copied 
tp_free from the wrong base type, this shouldn't have been the case.

Thanks for the pointers.  It sounds like we're getting close.  Perhaps 
the problem is in typeobject.c.


-Travis




Re: [Python-Dev] Problems with the Python Memory Manager

2005-11-24 Thread Travis E. Oliphant
Armin Rigo wrote:
> Hi,
> 
> Ok, here is the reason for the leak...
> 
> There is in scipy a type called 'int32_arrtype' which inherits from both
> another scipy type called 'signedinteger_arrtype', and from 'int'.
> Obscure!  This is not 100% officially allowed: you are inheriting from
> two C types.  You're living dangerously!
> 
> Now in this case it mostly works as expected, because the parent scipy
> type has no field at all, so it's mostly like inheriting from both
> 'object' and 'int' -- which is allowed, or would be if the bases were
> written in the opposite order.  But still, something confuses the
> fragile logic of typeobject.c.  (I'll leave this bit to scipy people to
> debug :-)
> 
> The net result is that unless you force your own tp_free as in revision
> 1490, the type 'int32_arrtype' has tp_free set to int_free(), which is
> the normal tp_free of 'int' objects.  This causes all deallocated
> int32_arrtype instances to be added to the CPython free list of integers
> instead of being freed!

I can confirm that indeed the int32_arrtype object gets the tp_free slot 
from its second parent (the Python integer type) instead of its first 
parent (the new, empty signed integer type).  I just did a printf after 
PyType_Ready was called to see what the tp_free slot contained, and 
indeed it contained the wrong thing.

I suspect this may also be true of the float64_arrtype as well (which 
inherits from Python's float type).

What I don't understand is why the tp_free slot from the second base 
type got copied over into the tp_free slot of the child.  It should have 
received the tp_free slot of the first parent, right?

I'm still looking for why that would be the case.  I think, though, 
Armin has identified the real culprit of the problem.  I apologize for 
any consternation over the memory manager that may have taken place. 
This problem is obviously an issue of dual inheritance in C.

I understand this is not well tested code, but in principle it should 
work correctly, right?  I'll keep looking to see if I made a mistake in 
believing that the int32_arrtype should have inherited its tp_free slot 
from the first parent and not the second.
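At the Python level, attribute inheritance does follow the MRO and takes the first base's definition, which is the behaviour being expected here. (A hedged analogy with made-up class names: C-level slot inheritance in typeobject.c follows different, tp_base-driven rules, which is exactly the discrepancy under discussion.)

```python
class SignedIntegerBase:          # stand-in for scipy's empty base type
    def cleanup(self):
        return "base cleanup"

class IntLike:                    # stand-in for 'int' as the second base
    def cleanup(self):
        return "int cleanup"

class Int32(SignedIntegerBase, IntLike):
    pass

# Python attribute lookup takes the first parent, as expected:
print(Int32().cleanup())          # base cleanup
print([c.__name__ for c in Int32.__mro__])
```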

-Travis



[Python-Dev] Problems with mro for dual inheritance in C [Was: Problems with the Python Memory Manager]

2005-11-24 Thread Travis E. Oliphant
Armin Rigo wrote:
> Hi,
> 
> Ok, here is the reason for the leak...
> 
> There is in scipy a type called 'int32_arrtype' which inherits from both
> another scipy type called 'signedinteger_arrtype', and from 'int'.
> Obscure!  This is not 100% officially allowed: you are inheriting from
> two C types.  You're living dangerously!
> 
> Now in this case it mostly works as expected, because the parent scipy
> type has no field at all, so it's mostly like inheriting from both
> 'object' and 'int' -- which is allowed, or would be if the bases were
> written in the opposite order.  But still, something confuses the
> fragile logic of typeobject.c.  (I'll leave this bit to scipy people to
> debug :-)

Well, I'm stumped on this.  Note the method resolution order for the new 
scalar array type (exactly as I would expect).   Why doesn't the int32 
type inherit its tp_free from the earlier types first?

a = zeros(10)
type(a[0]).mro()

[<type 'int32_arrtype'>, <type 'signedinteger_arrtype'>, ...,
 <type 'int'>, <type 'object'>]






[Python-Dev] registering unicode codecs

2005-11-24 Thread Neal Norwitz
While running regrtest with -R to find reference leaks I found a usage
issue.  When a codec is registered it is stored in the interpreter
state and cannot be removed.  Since it is stored as a list, if you
repeated add the same search function, you will get duplicates in the
list and they can't be removed.  This shows up as a reference leak
(which it really isn't) in test_unicode with this code modified from
test_codecs_errors:

import codecs
def search_function(encoding):
    def encode1(input, errors="strict"):
        return 42
    return (encode1, None, None, None)

codecs.register(search_function)

###

Should the search function be added to the search path if it is
already in there?  I don't understand a benefit of having duplicate
search functions.

Should users have access to the search path (through a
codecs.unregister())?  If so, should it search from the end of the
list to the beginning to remove an item?  That way the last entry
would be removed rather than the first.

n


Re: [Python-Dev] registering unicode codecs

2005-11-24 Thread M.-A. Lemburg
Neal Norwitz wrote:
> While running regrtest with -R to find reference leaks I found a usage
> issue.  When a codec is registered it is stored in the interpreter
> state and cannot be removed.  Since it is stored as a list, if you
> repeatedly add the same search function, you will get duplicates in the
> list and they can't be removed.  This shows up as a reference leak
> (which it really isn't) in test_unicode with this code modified from
> test_codecs_errors:
> 
> import codecs
> def search_function(encoding):
>     def encode1(input, errors="strict"):
>         return 42
>     return (encode1, None, None, None)
> 
> codecs.register(search_function)
> 
> ###
> 
> Should the search function be added to the search path if it is
> already in there?  I don't understand a benefit of having duplicate
> search functions.

Me neither :-) I never expected someone to register a search
function more than once, since there's no point in doing so.

> Should users have access to the search path (through a
> codecs.unregister())?  

Maybe, but why would you want to unregister a search function ?

> If so, should it search from the end of the
> list to the beginning to remove an item?  That way the last entry
> would be removed rather than the first.

I'd suggest to raise an exception in case a user tries
to register a search function twice. Removal should be the
same as doing list.remove(), ie. remove the first (and
only) item in the list of search functions.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Nov 24 2005)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 


Re: [Python-Dev] registering unicode codecs

2005-11-24 Thread Neal Norwitz
On 11/24/05, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
>
> > Should users have access to the search path (through a
> > codecs.unregister())?
>
> Maybe, but why would you want to unregister a search function ?
>
> > If so, should it search from the end of the
> > list to the beginning to remove an item?  That way the last entry
> > would be removed rather than the first.
>
> I'd suggest to raise an exception in case a user tries
> to register a search function twice.

This should take care of the testing problem.

> Removal should be the
> same as doing list.remove(), ie. remove the first (and
> only) item in the list of search functions.

Do you recommend adding an unregister()?  It's not necessary for this case.

n


Re: [Python-Dev] registering unicode codecs

2005-11-24 Thread M.-A. Lemburg
Neal Norwitz wrote:
> On 11/24/05, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> 
>>>Should users have access to the search path (through a
>>>codecs.unregister())?
>>
>>Maybe, but why would you want to unregister a search function ?
>>
>>
>>>If so, should it search from the end of the
>>>list to the beginning to remove an item?  That way the last entry
>>>would be removed rather than the first.
>>
>>I'd suggest raising an exception in case a user tries
>>to register a search function twice.
> 
> 
> This should take care of the testing problem.
> 
> 
>>Removal should be the
>>same as doing list.remove(), ie. remove the first (and
>>only) item in the list of search functions.
> 
> 
> Do you recommend adding an unregister()?  It's not necessary for this case.

Not really - I don't see much of a need for this; except
maybe if a codec package wants to replace another codec
package.

So far no-one has requested such a feature, so I'd say
we don't add .unregister() until a request for it pops up.
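For illustration, the raise-on-duplicate behaviour discussed above can be prototyped in pure Python without touching the C registry. This is only a sketch; register_once and _registered are hypothetical names, not stdlib API:

```python
# Illustrative sketch (not stdlib API): a thin wrapper around codecs.register
# that raises when the same search function is registered twice.
import codecs

_registered = []

def register_once(search_function):
    """Register a codec search function, refusing duplicates."""
    if search_function in _registered:
        raise ValueError("search function already registered")
    _registered.append(search_function)
    codecs.register(search_function)

def my_search(name):
    return None  # defer to the other registered search functions

register_once(my_search)
try:
    register_once(my_search)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

Removal, if it were ever added, would then just be list.remove() on _registered plus a call into a hypothetical C-level unregister.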

-- 
Marc-Andre Lemburg


Re: [Python-Dev] Problems with mro for dual inheritance in C [Was: Problems with the Python Memory Manager]

2005-11-24 Thread Armin Rigo
Hi Travis,

On Thu, Nov 24, 2005 at 10:17:43AM -0700, Travis E. Oliphant wrote:
> Why doesn't the int32 
> type inherit its tp_free from the early types first?

In your case I suspect that the tp_free is inherited from the tp_base
which is probably 'int'.  I don't see how to "fix" typeobject.c, because
I'm not sure that there is a solution that would do the right thing in
all cases at this level.

I would suggest that you just force the tp_alloc/tp_free that you want
in your static types instead.  That's what occurs for example if you
build a similar inheritance hierarchy with classes defined in Python:
these classes are then 'heap types', so they always get the generic
tp_alloc/tp_free before PyType_Ready() has a chance to see them.


Armin


Re: [Python-Dev] SRE should release the GIL (was: no subject)

2005-11-24 Thread Martin v. Löwis
Duncan Grisby wrote:
> Is there any fundamental reason why the re module cannot release the
> interpreter lock, for at least some of the time it is running?  The
> ideal situation for me would be if it could do most of its work with
> the lock released, since the software is running on a multi processor
> machine that could productively do other work while the re is being
> processed. Failing that, could it at least periodically release the
> lock to give other threads a chance to run?

Formally: no; it accesses a Python string/unicode object all
the time.

Now, since all the shared objects it accesses are immutable, likely
no harm would be done releasing the GIL. I think SRE was originally
also intended to operate on array.array objects; this would have
caused bigger problems. Not sure whether this is still an issue.
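The immutability argument can be illustrated from Python code. This shows no speedup, since the GIL is still held inside _sre; it is only a sketch of the shared-read-only situation that would make releasing the lock safe:

```python
# Concurrent matching against a shared compiled pattern and a shared string.
# Both objects are immutable, so the threads touch no shared mutable state;
# with the GIL held inside _sre they still run serially, however.
import re
import threading

pattern = re.compile(r"\d+")
text = "abc 123 def 456" * 1000
results = []
lock = threading.Lock()

def worker():
    found = pattern.findall(text)  # read-only access to pattern and text
    with lock:
        results.append(len(found))

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert results == [2000, 2000, 2000, 2000]
```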

Regards,
Martin


[Python-Dev] reference leaks

2005-11-24 Thread Neal Norwitz
There are still a few reference leaks I've been able to identify.  I
didn't see an obvious solution to these (well, I saw one obvious
solution which crashed, so obviously I was wrong).

When running regrtest with -R here are the ref leaks reported:

test_codeccallbacks leaked [2, 2, 2, 2] references
test_compiler leaked [176, 242, 202, 248] references
test_generators leaked [254, 254, 254, 254] references
test_tcl leaked [35, 35, 35, 35] references
test_threading_local leaked [36, 36, 28, 36] references
test_urllib2 leaked [-130, 70, -120, 60] references

test_compiler and test_urllib2 are probably not real leaks, but data
being cached.  I'm not really sure whether test_tcl is a leak or not,
since there's a lot that goes on under the covers.  I didn't see
anything obvious in _tkinter.c.

I have no idea about test_threading_local.

I'm pretty certain test_codeccallbacks and test_generators are leaks. 
Here is code that I gleaned/modified from the tests and causes leaks
in the interpreter:

 test_codeccallbacks

import codecs
def test_callbacks():
  def handler(exc):
l = [u"<%d>" % ord(exc.object[pos]) for pos in xrange(exc.start, exc.end)]
return (u"[%s]" % u"".join(l), exc.end)
  codecs.register_error("test.handler", handler)
  # the {} is necessary to cause the leak, {} can hold data too
  codecs.charmap_decode("abc", "test.handler", {})

test_callbacks()
# leak from PyUnicode_DecodeCharmap() each time test_callbacks() is called

 test_generators

from itertools import tee

def fib():
  def yield_identity_forever(g):
while 1:
  yield g
  def _fib():
for i in yield_identity_forever(head):
  yield i
  head, tail, result = tee(_fib(), 3)
  return result

x = fib()
# x.next() leak from itertool.tee()
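A small harness in the spirit of regrtest's -R option can make such leaks visible outside the test suite. ref_delta is an illustrative helper, assuming a --with-pydebug build for precise counts; on a release build it falls back to gc object counts, which is much coarser:

```python
# Rough refleak harness: average per-call growth in live references/objects.
# sys.gettotalrefcount() only exists on debug builds (configure
# --with-pydebug); otherwise we approximate with the gc-tracked object count.
import gc
import sys

def ref_delta(func, warmup=3, runs=5):
    """A stable positive return value across repeated calls suggests a leak."""
    counter = getattr(sys, "gettotalrefcount", None) or \
              (lambda: len(gc.get_objects()))
    for _ in range(warmup):  # let caches and interned objects settle
        func()
    gc.collect()
    before = counter()
    for _ in range(runs):
        func()
    gc.collect()
    return (counter() - before) / runs

# e.g. ref_delta(test_callbacks) should hover near 0 once the leak is fixed
```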



The itertools.tee() fix I thought was quite obvious:

+++ Modules/itertoolsmodule.c   (working copy)
@@ -356,7 +356,8 @@
 {
if (tdo->nextlink == NULL)
tdo->nextlink = teedataobject_new(tdo->it);
-   Py_INCREF(tdo->nextlink);
+   else
+   Py_INCREF(tdo->nextlink);
return tdo->nextlink;
 }

However, this creates problems elsewhere.  I think test_heapq crashed
when I added this fix.  The patch also didn't fix all the leaks, just
a bunch of them.  So clearly there's more going on that I'm not
getting.
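Without a debug build, one can at least watch whether tee's buffered items get released, via weak references. This is a coarse check, not a substitute for regrtest -R:

```python
# Verify that tee's internal buffers drop their item references once both
# branches are consumed and the tee objects themselves are released.
import gc
import weakref
from itertools import tee

class Item:
    """Plain object, so instances can carry weak references."""

items = [Item() for _ in range(3)]
refs = [weakref.ref(obj) for obj in items]

a, b = tee(iter(items))
list(a)
list(b)            # both tee branches fully consumed

del items, a, b    # drop every strong reference we hold
gc.collect()
assert all(r() is None for r in refs)  # tee released its buffered items
```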

n


Re: [Python-Dev] Problems with the Python Memory Manager

2005-11-24 Thread Travis Oliphant
Martin v. Löwis wrote:

> Travis Oliphant wrote:
>
>> So, I now believe that his code (plus the array scalar extension 
>> type) was actually exposing a real bug in the memory manager itself.  
>> In theory, the Python memory manager should have been able to re-use 
>> the memory for the array-scalar instances because they are always the 
>> same size.  In practice, the memory was apparently not being re-used 
>> but instead new blocks were being allocated to handle the load.
>
>
> That is really very hard to believe. Most people on this list would
> probably agree that obmalloc certain *will* reuse deallocated memory
> if the next request is for the very same size (number of bytes) that
> the previously-release object had.


Yes, I see that it does.  This became more clear as all the simple tests 
I tried failed to reproduce the problem (and I spent some time looking 
at the code and reading its comments).   I just can't figure out another 
explanation for why the problem went away when I went to using the 
system malloc other than some kind of corner-case in the Python memory 
allocator.

>
>> His code is quite complicated and it is difficult to replicate the 
>> problem.  
>
>
> That the code is complex would not so much be a problem: we often
> analyse complex code here. It is a problem that the code is not
> available, and it would be a problem if the problem was not
> reproducable even if you had the code (i.e. if the problem would
> sometimes occur, but not the next day when you ran it again).
>
The problem was definitely reproducible, on his machine and on the two
machines I tried to run it on.  Without fail, it rapidly consumed all
available memory.

> So if you can, please post the code somewhere, and add a bugreport
> on sf.net/projects/python.
>
I'll try to do this at some point. 

I'll have to get permission from him for the actual Python code.  The 
extension modules he used are all publicly available (PyMC).  I 
changed the memory allocator in scipy --- which eliminated the problem 
--- so you'd have to check out an older version of the code from SVN to 
see the problem.

Thanks for the tips.

-Travis



Re: [Python-Dev] Problems with the Python Memory Manager

2005-11-24 Thread Travis Oliphant
Martin v. Löwis wrote:

> Travis Oliphant wrote:
>
>> As verified by removing usage of the Python PyObject_MALLOC function, 
>> it was the Python memory manager that was performing poorly.   Even 
>> though the array-scalar objects were deleted, the memory manager 
>> would not re-use their memory for later object creation. Instead, the 
>> memory manager kept allocating new arenas to cover the load (when it 
>> should have been able to re-use the old memory that had been freed by 
>> the deleted objects--- again, I don't know enough about the memory 
>> manager to say why this happened).
>
>
> One way (I think the only way) this could happen is if:
> - the objects being allocated are all smaller than 256 bytes
> - when allocating new objects, the requested size was different
>   from any other size previously deallocated.


In one version of the code I had moved all objects from the Python 
memory manager to the system malloc *except* the array scalars.   The 
problem still remained, so I'm pretty sure these were the problem.
The array scalars are all less than 256 bytes but they are always the 
same number of bytes. 

>
> So if you first allocate 1,000,000 objects of size 200, and then
> release them, and then allocate 1,000,000 objects of size 208,
> the memory is not reused.

That is useful information.   I don't think his code was doing that kind 
of thing, but it definitely provides something to check on.
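Martin's size-class explanation maps onto a one-line function. This is a sketch of the 2005-era obmalloc constants (8-byte alignment, 256-byte small-object threshold), not an exact reimplementation:

```python
# Freed small blocks are only reusable by requests that map to the same
# pymalloc size class.
def pymalloc_size_class(nbytes):
    if nbytes == 0 or nbytes > 256:
        return None             # falls through to the system allocator
    return (nbytes - 1) // 8    # class 0 covers 1-8 bytes, class 31: 249-256

# 200-byte and 208-byte requests land in different classes, so a pool of
# freed 200-byte blocks cannot satisfy 208-byte allocations:
assert pymalloc_size_class(200) != pymalloc_size_class(208)
# ...while 197- and 200-byte requests share a class and do reuse memory:
assert pymalloc_size_class(197) == pymalloc_size_class(200)
```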

Previously I was using the standard tp_alloc and tp_free methods (I was 
not setting them but just letting PyType_Ready fill those slots in with 
the default values).  When I changed these methods to ones that used 
the system malloc and free, the problem went away.  That's why I 
attribute the issue to the Python memory manager.   Of course, it's 
always possible that I was doing something wrong, but I really did try 
to make sure I wasn't making a mistake.  I didn't do anything fancy with 
the Python memory allocator. 

The array scalars all subclass from each other in C, though.  I don't 
see how that could be relevant, but I could be missing something.

-Travis






Re: [Python-Dev] Regular expressions

2005-11-24 Thread Dennis Allison

This is probably OT for [Python-dev]

I suspect that your problem is not the GIL but is due to something else.
Rather than dorking with the interpreter's threading, you probably would 
be better off rethinking your problem and finding a better way to 
accomplish your task.

On Thu, 24 Nov 2005, Duncan Grisby wrote:

> On Thursday 24 November, Donovan Baarda wrote:
> 
> > I don't know if this will help, but in my experience compiling re's
> > often takes longer than matching them... are you sure that it's the
> > match and not a compile that is taking a long time? Are you using
> > pre-compiled re's or are you dynamically generating strings and using
> > them?
> 
> It's definitely matching time. The res are all pre-compiled.
> 
> [...]
> > > A quick look at the code in _sre.c suggests that for most of the time,
> > > no Python objects are being manipulated, so the interpreter lock could
> > > be released. Has anyone tried to do that?
> > 
> > probably not... not many people would have several-minutes-to-match
> > re's.
> > 
> > I suspect it would be do-able... I suggest you put together a patch and
> > submit it on SF...
> 
> The thing that scares me about doing that is that there might be
> single-threadedness assumptions in the code that I don't spot. It's the
> kind of thing where a patch could appear to work fine, but them
> mysteriously fail due to some occasional race condition. Does anyone
> know if there is there any global state in _sre that would prevent it
> being re-entered, or know for certain that there isn't?
> 
> Cheers,
> 
> Duncan.
> 
> 

-- 



[Python-Dev] Bug bz2.BZ2File(...).seek(0,2) + patch

2005-11-24 Thread Victor STINNER
Hi,

I found a bug in the bz2 Python module. Example:
 import bz2
 f = bz2.BZ2File("test.bz2", "r")
 f.seek(0, 2)
 assert f.tell() != 0  # fails on the buggy module: tell() stays at 0

Details and *patch* at:
http://sourceforge.net/tracker/index.php?func=detail&aid=1366000&group_id=5470&atid=105470
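For readers without a test.bz2 at hand, here is a self-contained reproduction sketch (file name and contents are illustrative). On a fixed interpreter the final check passes; the reported bug left tell() at 0 after the seek:

```python
# Build a small test.bz2, seek to the end of the decompressed stream
# (whence=2), and check that tell() reports the decompressed length.
import bz2
import os
import tempfile

payload = b"hello bz2 " * 100
path = os.path.join(tempfile.mkdtemp(), "test.bz2")
with open(path, "wb") as raw:
    raw.write(bz2.compress(payload))

f = bz2.BZ2File(path, "r")
f.seek(0, 2)           # whence=2: seek relative to end of stream
end_pos = f.tell()
f.close()
assert end_pos == len(payload)  # == 1000; the reported bug yielded 0
```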

Please CC-me for all your answers.

Bye, Victor
-- 
Victor Stinner - student at the UTBM (Belfort, France)
http://www.haypocalc.com/wiki/Accueil


signature.asc
Description: This is a digitally signed message part


[Python-Dev] (no subject)

2005-11-24 Thread Frank
hi,
test mail list :)



Regards!


Frank
[EMAIL PROTECTED]
  2005-11-25