Re: How to test if one dict is subset of another?

2007-02-20 Thread Jay Tee
Hi,

thanks!  the code lift from 2.3 to 2.2 worked (thank Guido et al for
BACKWARDS COMPATIBILITY ;-)) ... unfortunately I was in a hurry to get
the release out since a colleague's cluster was croaking under the
load of the old, non-indexed version.  Your solution is nicer looking
than mine, and leads to a less complex implementation.  Rolling it
into the next release is going to have to wait a few weeks, I hope I
can remember it that long.  Thanks!!

JT

On Feb 20, 8:54 pm, Paul Rubin  wrote:
> "Jay Tee" <[EMAIL PROTECTED]> writes:
> > Python 2.2.3 (#1, Oct 26 2003, 11:49:53)
> > ImportError: No module named sets
>
> Hmm, well I think the sets module in 2.3 is written in Python, so you
> could drop it into your application for use in 2.2.  Better would be
> to use the C version from 2.4 if you can.  Or you could fake it with
> dicts.  Sets are really just dicts under the skin.  Instead of
> set([1,2,3]) you'd use {1:True, 2:True, 3:True}.  To intersect
> two of them (untested):
>
> def intersection(d1,d2):
>if len(d2) < len(d1):
>  # swap them so d1 is the smaller one
>  d1,d2 = d2,d1
>r = {}
>for k in d1.iterkeys():
>  if k in d2:
>r[k] = True
>return r
>
> The d1,d2 swap is just an optimization, since access is constant-time
> you want to scan the smaller dictionary to minimize the number of
> lookups.


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to test if one dict is subset of another?

2007-02-20 Thread Paul Rubin
"Jay Tee" <[EMAIL PROTECTED]> writes:
> Python 2.2.3 (#1, Oct 26 2003, 11:49:53)
> ImportError: No module named sets

Hmm, well I think the sets module in 2.3 is written in Python, so you
could drop it into your application for use in 2.2.  Better would be
to use the C version from 2.4 if you can.  Or you could fake it with
dicts.  Sets are really just dicts under the skin.  Instead of
set([1,2,3]) you'd use {1:True, 2:True, 3:True}.  To intersect
two of them (untested):

def intersection(d1,d2):
   if len(d2) < len(d1):
 # swap them so d1 is the smaller one
 d1,d2 = d2,d1
   r = {}
   for k in d1.iterkeys():
 if k in d2:
   r[k] = True
   return r

The d1,d2 swap is just an optimization, since access is constant-time
you want to scan the smaller dictionary to minimize the number of
lookups.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to test if one dict is subset of another?

2007-02-20 Thread Jay Tee
On Feb 20, 6:44 pm, Paul Rubin  wrote:
> They are sets, not lists.
>
>from sets import Set as set  # use in 2.3 and earlier
>
>l1= set([3, 4, 7, 2])
>l2 = set([2, 3])
>l2 = set([2, 3, 99])
>print l1 & l2

Thanks Paul, but:

bosui:~> python
Python 2.2.3 (#1, Oct 26 2003, 11:49:53)
[GCC 3.2.3 20030502 (Red Hat Linux 3.2.3-20)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from sets import Set as set
Traceback (most recent call last):
  File "", line 1, in ?
ImportError: No module named sets

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to test if one dict is subset of another?

2007-02-20 Thread Paul Rubin
"Jay Tee" <[EMAIL PROTECTED]> writes:
> >>> l1= [3, 4, 7, 2]
> >>> l2 = [2, 3]
> >>> l2 = [2, 3, 99]
> >>> l1 & l2
> Traceback (most recent call last):
>   File "", line 1, in ?
> TypeError: unsupported operand type(s) for &: 'list' and 'list'
> 
> what am I missing?

They are sets, not lists.

   from sets import Set as set  # use in 2.3 and earlier

   l1= set([3, 4, 7, 2])
   l2 = set([2, 3])
   l2 = set([2, 3, 99])
   print l1 & l2
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to test if one dict is subset of another?

2007-02-20 Thread Jay Tee
Hi

your post had the following construct:

> for j in (index['user']['jeff'] & index['state']['running']):
> do_something()

but

Python 2.3.4 (#1, Oct 11 2006, 06:18:43)
[GCC 3.4.6 20060404 (Red Hat 3.4.6-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> l1= [3, 4, 7, 2]
>>> l2 = [2, 3]
>>> l2 = [2, 3, 99]
>>> l1 & l2
Traceback (most recent call last):
  File "", line 1, in ?
TypeError: unsupported operand type(s) for &: 'list' and 'list'

what am I missing?

Thanks

JT

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to test if one dict is subset of another?

2007-02-20 Thread Paul Rubin
"Jay Tee" <[EMAIL PROTECTED]> writes:
> it looks very cool, except that one of the constraints mentioned is
> that the solution has to work properly on pythons 2.2 and 2.3.

That thing doesn't really deeply depend on defaultdict, it's just
convenient.  You can add a few more lines of code in the preprocessing
step to get around it.

> Thanks ... I took the previous poster's suggestion (had also been
> suggested earlier) and built cached versions of the job lists, indexed
> on common queries, while building the master job list.

You can also cache queries with the scheme I suggested, and it lets
you make the cached lists for new queries quickly.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to test if one dict is subset of another?

2007-02-20 Thread Jay Tee
On Feb 20, 9:12 am, Paul Rubin  wrote:
 >   do_something()
>
> you'd just write:
>
> for j in (index['user']['jeff'] & index['state']['running']):
> do_something()

Hi,

it looks very cool, except that one of the constraints mentioned is
that the solution has to work properly on pythons 2.2 and 2.3.
production systems, certified OS releases, that sort of stuff ...
can't just upgrade a few thousand machines worldwide for my little
program ;-)

Thanks ... I took the previous poster's suggestion (had also been
suggested earlier) and built cached versions of the job lists, indexed
on common queries, while building the master job list.

JT

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to test if one dict is subset of another?

2007-02-20 Thread Paul Rubin
"Jay Tee" <[EMAIL PROTECTED]> writes:
> for j in jobs:
>if (j.get('user') == 'jeff' and j.get('state')=='running') :
>   do_something()

Sounds like you need some backing data structures, like indexes
in a database, e.g. (untested, uses the cool new defaultdicts of 2.5):

   index = defaultdict(lambda: defaultdict(set))
   for j in jobs:
 for k in j's dict:
index[k][j.get(k)].add(j)

Now for

>if j.subset_attr({'user' : 'jeff', 'state' : 'running'}) :
>   do_something()

you'd just write:

for j in (index['user']['jeff'] & index['state']['running']):
do_something()
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to test if one dict is subset of another?

2007-02-19 Thread Hendrik van Rooyen
 "Jay Tee" <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> I have some code that does, essentially, the following:
> 
> - gather information on tens of thousands of items (in this case, jobs
> running on a
>  compute cluster)
> - store the information as a list (one per job) of Job items
> (essentially wrapped
>  dictionaries mapping attribute names to values)
> 
> and then does some computations on the data.  One of the things the
> code needs to do, very often, is troll through the list and find jobs
> of a certain class:
> 
> for j in jobs:
>if (j.get('user') == 'jeff' and j.get('state')=='running') :
>   do_something()
> 
> This operation is ultimately the limiting factor in the performance.
> What I would like to try, if it is possible, is instead do something
> like this:
> 

When you are gathering the data and building the lists - why do you not 
simultaneously build a dict of running jobs keyed by the Jeffs?

- Hendrik

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to test if one dict is subset of another?

2007-02-19 Thread Peter Otten
Jay Tee wrote:

> On Feb 19, 11:07 am, Peter Otten <[EMAIL PROTECTED]> wrote:
> 
>> Use a RDBMS (a database), they tend to be good at this kind of
>> operations.
> 
> yeah, one of the options is metakit ... sqlite and buzhug both looked
> promising but the constraint of pythons 2.2 and 2.3 ruled that out.
> disadvantage of metakit is that it's not pure python, meaning possible
> integration problems.  the system has to be deployed at 200+ sites
> worldwide on a mix of RHEL 3 and 4 based systems, with some Debian
> clusters thrown in, and running real production ...
> 
> hence my desire to find a pure-python solution if at all possible.
> it's looking grim.

The following may speed up things a bit if you have enough memory and your
data is queried more often than changed.

import sys

def generate_id():
for i in xrange(sys.maxint):
yield i
raise ImplementationRestriction

get_id = generate_id().next

class Record(dict):
def __init__(self, *args, **kw):
dict.__init__(self, *args, **kw)
assert not hasattr(self, "_id")
self._id = get_id()
def __setitem__(self, key, value):
raise ImmutableException
def __hash__(self):
return self._id
def __str__(self):
items = self.items()
items.sort()
return ", ".join(["%s: %s" % p for p in items])

records = dict.fromkeys([
Record(user="jack", start=42, state="running"),
Record(user="jack", start=47, state="running"),
Record(user="jack", start=46, state="queued"),
Record(user="jane", start=42, state="running"),
Record(user="jane", start=7),
])

def fill_cache(records):
cache = {}
for record in records:
for p in record.iteritems():
cache.setdefault(p, {})[record] = None
return cache

_cache = fill_cache(records)

def select(*data, **kw):
[data] = data
result = data
for p in kw.iteritems():
try:
c = _cache[p]
except KeyError:
c = {}
if not c:
return {}
result = dict.fromkeys([k for k in result if k in c])
if not result:
break
return result

if __name__ == "__main__":
for filter in [
dict(user="jack"),
dict(state="running"),
dict(user="jack", state="running"),
dict(state="undefined"),
dict(user="jane", state="queued")
]:
print "--- %s ---" % Record(filter)
result = select(records, **filter)
if result:
for record in result:
print record
else:
print "#no matching records"
print

The above code runs with Python 2.3; don't know about 2.2 as I don't have it
on my machine. Of course with sets it would look better...

Peter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to test if one dict is subset of another?

2007-02-19 Thread Jay Tee
On Feb 19, 11:07 am, Peter Otten <[EMAIL PROTECTED]> wrote:

> Use a RDBMS (a database), they tend to be good at this kind of operations.

yeah, one of the options is metakit ... sqlite and buzhug both looked
promising but the constraint of pythons 2.2 and 2.3 ruled that out.
disadvantage of metakit is that it's not pure python, meaning possible
integration problems.  the system has to be deployed at 200+ sites
worldwide on a mix of RHEL 3 and 4 based systems, with some Debian
clusters thrown in, and running real production ...

hence my desire to find a pure-python solution if at all possible.
it's looking grim.

 JT

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to test if one dict is subset of another?

2007-02-19 Thread Peter Otten
Jay Tee wrote:

> Hi,
> 
> I have some code that does, essentially, the following:
> 
> - gather information on tens of thousands of items (in this case, jobs
> running on a
>  compute cluster)
> - store the information as a list (one per job) of Job items
> (essentially wrapped
>  dictionaries mapping attribute names to values)
> 
> and then does some computations on the data.  One of the things the
> code needs to do, very often, is troll through the list and find jobs
> of a certain class:
> 
> for j in jobs:
>if (j.get('user') == 'jeff' and j.get('state')=='running') :
>   do_something()
> 
> This operation is ultimately the limiting factor in the performance.
> What I would like to try, if it is possible, is instead do something
> like this:
> 
>if j.subset_attr({'user' : 'jeff', 'state' : 'running'}) :
>   do_something()
> 
> 
> where subset_attr would see if the dict passed in was a subset of the
> underlying attribute dict of j:
> 
>   j1's dict : { 'user' : 'jeff', 'start' : 43, 'queue' : 'qlong',
> 'state' : 'running' }
>   j2's dict : { 'user' : 'jeff', 'start' : 57, 'queue' : 'qlong',
> 'state' : 'queued' }
> 
> so in the second snippet, if j was j1 then subset_attr would return
> true, for j2 the answer would be false (because of the 'state' value
> not being the same).
> 
> Any suggestions?  Constraint : the answer has to work for both python
> 2.2 and 2.3 (and preferably all pythons after that).

Use a RDBMS (a database), they tend to be good at this kind of operations.

Peter

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to test if one dict is subset of another?

2007-02-19 Thread Diez B. Roggisch
Jay Tee schrieb:
> Hi,
> 
> I have some code that does, essentially, the following:
> 
> - gather information on tens of thousands of items (in this case, jobs
> running on a
>  compute cluster)
> - store the information as a list (one per job) of Job items
> (essentially wrapped
>  dictionaries mapping attribute names to values)
> 
> and then does some computations on the data.  One of the things the
> code needs to do, very often, is troll through the list and find jobs
> of a certain class:
> 
> for j in jobs:
>if (j.get('user') == 'jeff' and j.get('state')=='running') :
>   do_something()
> 
> This operation is ultimately the limiting factor in the performance.
> What I would like to try, if it is possible, is instead do something
> like this:
> 
>if j.subset_attr({'user' : 'jeff', 'state' : 'running'}) :
>   do_something()
> 
> 
> where subset_attr would see if the dict passed in was a subset of the
> underlying attribute dict of j:

This would still need to run over all items in jobs. No gain.

> 
>   j1's dict : { 'user' : 'jeff', 'start' : 43, 'queue' : 'qlong',
> 'state' : 'running' }
>   j2's dict : { 'user' : 'jeff', 'start' : 57, 'queue' : 'qlong',
> 'state' : 'queued' }
> 
> so in the second snippet, if j was j1 then subset_attr would return
> true, for j2 the answer would be false (because of the 'state' value
> not being the same).

If you're jobs dictionary is immutable regarding the key-set (not from 
it's implementation, but from its usage), the thing you can do to 
enhance performance is to create an index. Take a predicate like

def p(j):
return j.get('user') == 'jeff'

and build a list

jeffs_jobs = [j for j in jobs if p(j)]

Then you can test only over these. Alternatively, if you have quite a 
few of such predicate/action-pairs, try and loop once over all jobs, 
applynig the predicates and actions accordingly.

Diez
-- 
http://mail.python.org/mailman/listinfo/python-list


How to test if one dict is subset of another?

2007-02-19 Thread Jay Tee
Hi,

I have some code that does, essentially, the following:

- gather information on tens of thousands of items (in this case, jobs
running on a
 compute cluster)
- store the information as a list (one per job) of Job items
(essentially wrapped
 dictionaries mapping attribute names to values)

and then does some computations on the data.  One of the things the
code needs to do, very often, is troll through the list and find jobs
of a certain class:

for j in jobs:
   if (j.get('user') == 'jeff' and j.get('state')=='running') :
  do_something()

This operation is ultimately the limiting factor in the performance.
What I would like to try, if it is possible, is instead do something
like this:

   if j.subset_attr({'user' : 'jeff', 'state' : 'running'}) :
  do_something()


where subset_attr would see if the dict passed in was a subset of the
underlying attribute dict of j:

  j1's dict : { 'user' : 'jeff', 'start' : 43, 'queue' : 'qlong',
'state' : 'running' }
  j2's dict : { 'user' : 'jeff', 'start' : 57, 'queue' : 'qlong',
'state' : 'queued' }

so in the second snippet, if j was j1 then subset_attr would return
true, for j2 the answer would be false (because of the 'state' value
not being the same).

Any suggestions?  Constraint : the answer has to work for both python
2.2 and 2.3 (and preferably all pythons after that).

 JT

-- 
http://mail.python.org/mailman/listinfo/python-list