Re: Self optimizing iterable

2009-07-17 Thread Carl Banks
On Jul 17, 5:40 pm, Paul Rubin  wrote:
> Zac Burns  writes:
> > An example use case for this would be for something like a large table
> > of regular expressions that would be iterated over trying to match in
> > some string. If some regular expressions are more statistically more
> > successful then the iteration will generally be short.
>
> Generally if you are matching against a bunch of regexps, they will
> tend to match overlapping sets, so you want to control precisely what
> order they are tested in.  Having stuff bubble around based on hit
> counts doesn't sound good.

As a corrollary, if you happen to be matching non-overlapping sets,
then it is a red flag that there could be some other (better) way to
dispatch than a linear regexp search.  For instance, if all your
regexps start with a different keyword, then you should dispatch on
that keyword using a dictionary.  A regexp can be used after
dispatching to extract parameters if necessary.

It's possible to have disjoint regexps without a simple dispatch
criterion, but I'd guess that's less common.


Carl Banks
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Self optimizing iterable

2009-07-17 Thread Gabriel Genellina

En Fri, 17 Jul 2009 21:31:44 -0300, Zac Burns  escribió:


I would like a set like object that when iterated maintains a count of
where iteration stopped and then re-orders itself based on that count
so that the iteration stopped on the most bubble to the top.

An example use case for this would be for something like a large table
of regular expressions that would be iterated over trying to match in
some string. If some regular expressions are more statistically more
successful then the iteration will generally be short.


If you require a certain order, it's not a set; better to use a list.
The code below extends the builtin list type to add a "promote" method;  
lst.promote(3) "bubbles" lst[3] up (exchanging lst[3] with lst[2]). It's  
thread safe, and O(1).



import threading

class XList(list):
  #
  @property
  def lock(self):
lock = getattr(self, '_lock', None)
if lock is None: lock = self._lock = threading.Lock()
return lock
  #
  def promote(self, index):
if index<0: index += len(self)
if index>0:
  with self.lock:
self[index], self[index-1] = self[index-1], self[index]


py> a = XList('python')
py> a
['p', 'y', 't', 'h', 'o', 'n']
py> a.promote(3)
py> a
['p', 'y', 'h', 't', 'o', 'n']
py> a.promote(2)
py> a
['p', 'h', 'y', 't', 'o', 'n']
py> a.promote(-1)
py> a
['p', 'h', 'y', 't', 'n', 'o']
py> a.promote(0)
py> a
['p', 'h', 'y', 't', 'n', 'o']

An example, looking for a matching regular expression:

for index,expr in enumerate(exprlist):
  m = expr.match(txt)
  if m:
 dosomethingwith(m)
 exprlist.promote(index)
 break

After many calls, the most frequently matched expressions appear towards  
the front of the list.



   Extend to also include a mapping version


I'm unsure if this point is still applicable. I'm using a list here, not a  
set as in your original proposal.


--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list


Re: Self optimizing iterable

2009-07-17 Thread MRAB

Zac Burns wrote:

Greetings,

I would like a set like object that when iterated maintains a count of
where iteration stopped and then re-orders itself based on that count
so that the iteration stopped on the most bubble to the top.

An example use case for this would be for something like a large table
of regular expressions that would be iterated over trying to match in
some string. If some regular expressions are more statistically more
successful then the iteration will generally be short.

Does anyone know of a pre-existing recipe for this or feel like taking
on the challenge?

Bonus points for:
   Best possible BigO notation on switching order and iteration
   Threadsafety
   Extend to also include a mapping version


That's not a set, but a Most Recently Used list.
--
http://mail.python.org/mailman/listinfo/python-list


Re: Self optimizing iterable

2009-07-17 Thread Paul Rubin
Zac Burns  writes:
> An example use case for this would be for something like a large table
> of regular expressions that would be iterated over trying to match in
> some string. If some regular expressions are more statistically more
> successful then the iteration will generally be short.

Generally if you are matching against a bunch of regexps, they will
tend to match overlapping sets, so you want to control precisely what
order they are tested in.  Having stuff bubble around based on hit
counts doesn't sound good.
-- 
http://mail.python.org/mailman/listinfo/python-list


Self optimizing iterable

2009-07-17 Thread Zac Burns
Greetings,

I would like a set like object that when iterated maintains a count of
where iteration stopped and then re-orders itself based on that count
so that the iteration stopped on the most bubble to the top.

An example use case for this would be for something like a large table
of regular expressions that would be iterated over trying to match in
some string. If some regular expressions are more statistically more
successful then the iteration will generally be short.

Does anyone know of a pre-existing recipe for this or feel like taking
on the challenge?

Bonus points for:
   Best possible BigO notation on switching order and iteration
   Threadsafety
   Extend to also include a mapping version



--
Zachary Burns
(407)590-4814
Aim - Zac256FL
Production Engineer (Digital Overlord)
Zindagi Games
-- 
http://mail.python.org/mailman/listinfo/python-list