Re: OK to memoize re objects?

2009-09-22 Thread Hyuga
On Sep 21, 11:02 am, Nobody  wrote:
> On Mon, 21 Sep 2009 07:11:36 -0700, Ethan Furman wrote:
> > Looking in the code for re in 2.5:
> > _MAXCACHE = 100
> > On the other hand, I (a
> > re novice, to be sure) have only used between two and five in any one
> > program... it'll be a while before I hit _MAXCACHE!
>
> Do you know how many REs import-ed modules are using? The cache isn't
> reserved for __main__.

Based on this, I'd say that the best policy would be that if you only
have a handful of simple REs that are used only on occasion, it's
probably not worth using re.compile--even if they fall out of cache,
it shouldn't take a noticeable amount of time to recompile them.

If, however, these are either complex REs, or REs that are being used
very frequently, say in a loop, you might as well save the compiled RE
somewhere just to be sure it doesn't have to be recompiled at any
point.
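
A minimal sketch of what I mean, written for a current Python. The pattern
is the one kj posted; the function name and the sample data are made up for
illustration:

```python
import re

# Compiled once, at import time, and kept in a module-level name.
_spam_re = re.compile(r'^(?:ham|eggs)$', re.I)

def count_matches(lines):
    """Count the lines that match the precompiled pattern."""
    count = 0
    for line in lines:
        # Calling a method on the compiled object does not go through
        # the re module's internal cache at all.
        if _spam_re.search(line):
            count += 1
    return count

print(count_matches(['ham', 'EGGS', 'spam', 'ham and eggs']))
```

Whatever other modules do to the shared cache, the loop above never has to
recompile anything.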
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: OK to memoize re objects?

2009-09-21 Thread Steven D'Aprano
On Mon, 21 Sep 2009 13:33:05 +, kj wrote:

> I find the docs are pretty confusing on this point.  They first make the
> point of noting that pre-compiling regular expressions is more
> efficient, and then *immediately* shoot down this point by saying that
> one need not worry about pre-compiling in most cases. From the docs:
> 
> ...using compile() and saving the resulting regular expression
> object for reuse is more efficient when the expression will be used
> several times in a single program.
> 
> Note: The compiled versions of the most recent patterns passed to
> re.match(), re.search() or re.compile() are cached, so programs that
> use only a few regular expressions at a time needn't worry about
> compiling regular expressions.
> 
> Honestly I don't know what to make of this...  I would love to see an
> example in which re.compile was unequivocally preferable, to really
> understand what the docs are saying here...

I find it entirely understandable. If you have only a few regexes, then 
there's no need to pre-compile them yourself, because the re module 
caches them. Otherwise, don't rely on the cache -- it may help, or it may 
not, no promises are made.

The nature of the cache isn't explained because it is an implementation 
detail. As it turns out, the current implementation is a single cache in 
the re module, so every module that does "import re" shares the one cache. The 
cache is also completely emptied if it exceeds a certain number of 
objects, so the cache may be flushed at arbitrary times out of your 
control. Or it might not.
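
If you want memoization whose behaviour you control, rather than whatever
the re module happens to do, you can keep your own. A rough sketch, assuming
a Python recent enough to have functools.lru_cache (2.5 does not); the names
compiled() and is_ham_or_eggs() and the maxsize value are my own choices,
not anything provided by re:

```python
import functools
import re

@functools.lru_cache(maxsize=256)
def compiled(pattern, flags=0):
    """Return a compiled regex, memoized in a cache private to this module."""
    return re.compile(pattern, flags)

def is_ham_or_eggs(text):
    # Fine to call in a tight loop: after the first call, compiled()
    # returns the stored object instead of compiling again.
    return compiled(r'^(?:ham|eggs)$', re.I).search(text) is not None

print(is_ham_or_eggs('Ham'), is_ham_or_eggs('toast'))
print(compiled.cache_info())
```

Nothing another module does with its regexes can evict entries from this
cache, which is the guarantee the re module does not make.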



-- 
Steven


Re: OK to memoize re objects?

2009-09-21 Thread Ethan Furman

Nobody wrote:
> On Mon, 21 Sep 2009 07:11:36 -0700, Ethan Furman wrote:
> > Looking in the code for re in 2.5:
> > _MAXCACHE = 100
> > On the other hand, I (a
> > re novice, to be sure) have only used between two and five in any one
> > program... it'll be a while before I hit _MAXCACHE!
>
> Do you know how many REs import-ed modules are using? The cache isn't
> reserved for __main__.



As a matter of fact, I haven't got a clue.  :-)

Fortunately, I always use .compile to save my re's.  Seems simpler to me 
that way.


~Ethan~


Re: OK to memoize re objects?

2009-09-21 Thread Nobody
On Mon, 21 Sep 2009 07:11:36 -0700, Ethan Furman wrote:

> Looking in the code for re in 2.5:

> _MAXCACHE = 100

> On the other hand, I (a
> re novice, to be sure) have only used between two and five in any one
> program... it'll be a while before I hit _MAXCACHE!

Do you know how many REs import-ed modules are using? The cache isn't
reserved for __main__.



Re: OK to memoize re objects?

2009-09-21 Thread Ethan Furman

kj wrote:
> Robert Kern writes:
>
> > kj wrote:
> > > My Python code is filled with assignments of regexp objects to
> > > global variables at the top level; e.g.:
> > >
> > > _spam_re = re.compile('^(?:ham|eggs)$', re.I)
> > >
> > > Don't like it.  My Perl-pickled brain wishes that re.compile was
> > > a memoizing method, so that I could use it anywhere, even inside
> > > tight loops, without ever having to worry about the overhead of
> > > regexp compilation.
> >
> > Just use re.search(), etc. They already memoize the compiled regex objects.
>
> Thanks.
>
> I find the docs are pretty confusing on this point.  They first
> make the point of noting that pre-compiling regular expressions is
> more efficient, and then *immediately* shoot down this point by
> saying that one need not worry about pre-compiling in most cases.
> From the docs:
>
> ...using compile() and saving the resulting regular expression
> object for reuse is more efficient when the expression will be
> used several times in a single program.
>
> Note: The compiled versions of the most recent patterns passed
> to re.match(), re.search() or re.compile() are cached, so
> programs that use only a few regular expressions at a time
> needn't worry about compiling regular expressions.
>
> Honestly I don't know what to make of this...  I would love to see
> an example in which re.compile was unequivocally preferable, to
> really understand what the docs are saying here...
>
> kynn


Looking in the code for re in 2.5:
.
.
.
_MAXCACHE = 100
.
.
.
if len(_cache) >= _MAXCACHE:
    _cache.clear()
.
.
.

so when you fill up, you lose the entire cache.  On the other hand, I (a 
re novice, to be sure) have only used between two and five in any one 
program... it'll be a while before I hit _MAXCACHE!


~Ethan~



Re: OK to memoize re objects?

2009-09-21 Thread kj
Robert Kern writes:

>kj wrote:
>> 
>> My Python code is filled with assignments of regexp objects to
>> global variables at the top level; e.g.:
>> 
>> _spam_re = re.compile('^(?:ham|eggs)$', re.I)
>> 
>> Don't like it.  My Perl-pickled brain wishes that re.compile was
>> a memoizing method, so that I could use it anywhere, even inside
>> tight loops, without ever having to worry about the overhead of
>> regexp compilation.

>Just use re.search(), etc. They already memoize the compiled regex objects.

Thanks.

I find the docs are pretty confusing on this point.  They first
make the point of noting that pre-compiling regular expressions is
more efficient, and then *immediately* shoot down this point by
saying that one need not worry about pre-compiling in most cases.
From the docs:

...using compile() and saving the resulting regular expression
object for reuse is more efficient when the expression will be
used several times in a single program.

Note: The compiled versions of the most recent patterns passed
to re.match(), re.search() or re.compile() are cached, so
programs that use only a few regular expressions at a time
needn't worry about compiling regular expressions.

Honestly I don't know what to make of this...  I would love to see
an example in which re.compile was unequivocally preferable, to
really understand what the docs are saying here...

kynn


Re: OK to memoize re objects?

2009-09-19 Thread Robert Kern

kj wrote:
> My Python code is filled with assignments of regexp objects to
> global variables at the top level; e.g.:
>
> _spam_re = re.compile('^(?:ham|eggs)$', re.I)
>
> Don't like it.  My Perl-pickled brain wishes that re.compile was
> a memoizing method, so that I could use it anywhere, even inside
> tight loops, without ever having to worry about the overhead of
> regexp compilation.


Just use re.search(), etc. They already memoize the compiled regex objects.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco
