Re: OK to memoize re objects?
On Sep 21, 11:02 am, Nobody wrote:
> On Mon, 21 Sep 2009 07:11:36 -0700, Ethan Furman wrote:
> > Looking in the code for re in 2.5:
> > _MAXCACHE = 100
> > On the other hand, I (a re novice, to be sure) have only used between
> > two and five in any one program... it'll be a while before I hit
> > _MAXCACHE!
>
> Do you know how many REs import-ed modules are using? The cache isn't
> reserved for __main__.

Based on this, I'd say the best policy is this: if you only have a handful of simple REs that are used only occasionally, it's probably not worth using re.compile. Even if they fall out of the cache, recompiling them shouldn't take a noticeable amount of time.

If, however, the REs are complex, or are used very frequently (say, in a loop), you might as well save the compiled RE somewhere, just to be sure it never has to be recompiled.

--
http://mail.python.org/mailman/listinfo/python-list
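As a rough illustration of the "save the compiled RE somewhere" advice, here is a minimal sketch. The pattern is the one kj posted elsewhere in this thread; the function name count_matches and the sample data are made up for the example.

```python
import re

# Compiled once at import time; pattern borrowed from kj's post.
_spam_re = re.compile(r'^(?:ham|eggs)$', re.I)

def count_matches(lines):
    """Count the lines that match the precompiled pattern."""
    total = 0
    for line in lines:
        # The compiled object is reused directly on every iteration,
        # so nothing here depends on the re module's cache.
        if _spam_re.search(line):
            total += 1
    return total

# Hypothetical sample data, for illustration only.
sample = ["ham", "EGGS", "spam", "ham and eggs"]
print(count_matches(sample))
```

Whether this is measurably faster than calling re.search(pattern, line) in the loop depends on the pattern and on what else is filling the cache; the point is only that the loop no longer relies on the cache at all.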
Re: OK to memoize re objects?
On Mon, 21 Sep 2009 13:33:05 +, kj wrote:
> I find the docs are pretty confusing on this point. They first make the
> point of noting that pre-compiling regular expressions is more
> efficient, and then *immediately* shoot down this point by saying that
> one need not worry about pre-compiling in most cases. From the docs:
>
>     ...using compile() and saving the resulting regular expression
>     object for reuse is more efficient when the expression will be used
>     several times in a single program.
>
>     Note: The compiled versions of the most recent patterns passed to
>     re.match(), re.search() or re.compile() are cached, so programs that
>     use only a few regular expressions at a time needn't worry about
>     compiling regular expressions.
>
> Honestly I don't know what to make of this... I would love to see an
> example in which re.compile was unequivocally preferable, to really
> understand what the docs are saying here...

I find it entirely understandable. If you have only a few regexes, then there's no need to pre-compile them yourself, because the re module caches them. Otherwise, don't rely on the cache -- it may help, or it may not; no promises are made.

The nature of the cache isn't explained because it is an implementation detail. As it turns out, the current implementation is a single cache in the re module, so every module that does "import re" shares the one cache. The cache is also completely emptied if it exceeds a certain number of objects, so the cache may be flushed at arbitrary times out of your control. Or it might not.

--
Steven
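One way to observe the shared cache from the outside is to compare object identity before and after a flush. This is only a sketch: it uses the public re.purge() function to force the flush, and the identity results are CPython implementation behaviour, not something the documentation promises.

```python
import re

pattern = r'^(?:ham|eggs)$'

first = re.compile(pattern)
second = re.compile(pattern)
# While the pattern is still in the cache, the second call is expected
# to hand back the already-compiled object (implementation detail).
same_before = first is second

re.purge()  # public function: clears the re module's internal cache

third = re.compile(pattern)
# After the flush the pattern has to be compiled again, producing a
# new object distinct from the one still held in `first`.
same_after = first is third

print(same_before, same_after)
```

On CPython, same_before is expected to be True and same_after False. Any imported module that fills or purges the cache has the same effect on your patterns, which is the point made above about the cache being outside your control.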
Re: OK to memoize re objects?
Nobody wrote:
> On Mon, 21 Sep 2009 07:11:36 -0700, Ethan Furman wrote:
> > Looking in the code for re in 2.5:
> > _MAXCACHE = 100
> > On the other hand, I (a re novice, to be sure) have only used between
> > two and five in any one program... it'll be a while before I hit
> > _MAXCACHE!
>
> Do you know how many REs import-ed modules are using? The cache isn't
> reserved for __main__.

As a matter of fact, I haven't got a clue. :-)

Fortunately, I always use .compile to save my re's. Seems simpler to me that way.

~Ethan~
Re: OK to memoize re objects?
On Mon, 21 Sep 2009 07:11:36 -0700, Ethan Furman wrote:
> Looking in the code for re in 2.5:
> _MAXCACHE = 100
> On the other hand, I (a re novice, to be sure) have only used between
> two and five in any one program... it'll be a while before I hit
> _MAXCACHE!

Do you know how many REs import-ed modules are using? The cache isn't reserved for __main__.
Re: OK to memoize re objects?
kj wrote:
> In Robert Kern writes:
> > kj wrote:
> > > My Python code is filled with assignments of regexp objects to
> > > global variables at the top level; e.g.:
> > >
> > >     _spam_re = re.compile('^(?:ham|eggs)$', re.I)
> > >
> > > Don't like it. My Perl-pickled brain wishes that re.compile was
> > > a memoizing method, so that I could use it anywhere, even inside
> > > tight loops, without ever having to worry about the overhead of
> > > regexp compilation.
> >
> > Just use re.search(), etc. They already memoize the compiled regex
> > objects.
>
> Thanks. I find the docs are pretty confusing on this point. They first
> make the point of noting that pre-compiling regular expressions is more
> efficient, and then *immediately* shoot down this point by saying that
> one need not worry about pre-compiling in most cases. From the docs:
>
>     ...using compile() and saving the resulting regular expression
>     object for reuse is more efficient when the expression will be used
>     several times in a single program.
>
>     Note: The compiled versions of the most recent patterns passed to
>     re.match(), re.search() or re.compile() are cached, so programs that
>     use only a few regular expressions at a time needn't worry about
>     compiling regular expressions.
>
> Honestly I don't know what to make of this... I would love to see an
> example in which re.compile was unequivocally preferable, to really
> understand what the docs are saying here...
>
> kynn

Looking in the code for re in 2.5:

    . . .
    _MAXCACHE = 100
    . . .
    if len(_cache) >= _MAXCACHE:
        _cache.clear()
    . . .

so when you fill up, you lose the entire cache. On the other hand, I (a re novice, to be sure) have only used between two and five in any one program... it'll be a while before I hit _MAXCACHE!

~Ethan~
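The clear-when-full policy in the excerpt above can be sketched in isolation as follows. This is not the re module's actual code: cached_compile and the module-level _cache here are hypothetical stand-ins, and only the _MAXCACHE value and the clear() call are taken from the quoted 2.5 source.

```python
import re

_MAXCACHE = 100   # limit quoted from the 2.5 source
_cache = {}       # hypothetical stand-in for the module's private cache

def cached_compile(pattern, flags=0):
    """Return a compiled regex, memoized with a clear-when-full policy."""
    key = (pattern, flags)
    compiled = _cache.get(key)
    if compiled is None:
        compiled = re.compile(pattern, flags)
        if len(_cache) >= _MAXCACHE:
            # Full: every entry is dropped, not just the oldest one.
            _cache.clear()
        _cache[key] = compiled
    return compiled

# Fill the cache with 100 distinct (made-up) patterns.
for i in range(_MAXCACHE):
    cached_compile("x{%d}" % (i + 1))
size_when_full = len(_cache)

# One more distinct pattern empties the cache before being stored.
cached_compile("one more")
size_after_overflow = len(_cache)

print(size_when_full, size_after_overflow)
```

With this policy the 101st distinct pattern evicts all 100 earlier ones, so a program that cycles through slightly more than _MAXCACHE patterns would recompile nearly all of them on every pass.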
Re: OK to memoize re objects?
In Robert Kern writes:
> kj wrote:
> > My Python code is filled with assignments of regexp objects to
> > global variables at the top level; e.g.:
> >
> >     _spam_re = re.compile('^(?:ham|eggs)$', re.I)
> >
> > Don't like it. My Perl-pickled brain wishes that re.compile was
> > a memoizing method, so that I could use it anywhere, even inside
> > tight loops, without ever having to worry about the overhead of
> > regexp compilation.
>
> Just use re.search(), etc. They already memoize the compiled regex
> objects.

Thanks. I find the docs are pretty confusing on this point. They first make the point of noting that pre-compiling regular expressions is more efficient, and then *immediately* shoot down this point by saying that one need not worry about pre-compiling in most cases. From the docs:

    ...using compile() and saving the resulting regular expression
    object for reuse is more efficient when the expression will be used
    several times in a single program.

    Note: The compiled versions of the most recent patterns passed to
    re.match(), re.search() or re.compile() are cached, so programs that
    use only a few regular expressions at a time needn't worry about
    compiling regular expressions.

Honestly I don't know what to make of this... I would love to see an example in which re.compile was unequivocally preferable, to really understand what the docs are saying here...

kynn
Re: OK to memoize re objects?
kj wrote:
> My Python code is filled with assignments of regexp objects to
> global variables at the top level; e.g.:
>
>     _spam_re = re.compile('^(?:ham|eggs)$', re.I)
>
> Don't like it. My Perl-pickled brain wishes that re.compile was
> a memoizing method, so that I could use it anywhere, even inside
> tight loops, without ever having to worry about the overhead of
> regexp compilation.

Just use re.search(), etc. They already memoize the compiled regex objects.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco
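For readers who want the memoizing compile that kj describes without depending on the re module's own cache, one possible sketch uses functools.lru_cache from the standard library (added to Python after this thread was written). The names compile_once and is_breakfast are hypothetical, chosen for the example.

```python
import functools
import re

@functools.lru_cache(maxsize=None)
def compile_once(pattern, flags=0):
    """Compile each distinct (pattern, flags) pair at most once."""
    return re.compile(pattern, flags)

def is_breakfast(word):
    # Safe to call inside a tight loop: after the first call the
    # compiled object comes from this function's private cache.
    return compile_once(r'^(?:ham|eggs)$', re.I).search(word) is not None

results = [is_breakfast(w) for w in ["Ham", "eggs", "spam"]]
print(results)
print(compile_once.cache_info())
```

Because this cache belongs to compile_once alone, patterns compiled by other imported modules cannot push entries out of it. With maxsize=None it grows without bound, so it suits a fixed set of patterns rather than patterns built from arbitrary input.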