Re: set using alternative hash function?

2009-10-16 Thread OKB (not okblacke)
Austin Bingham wrote:

 To put it in code, I want this:
 
   s = set(hash_func = lambda obj: hash(obj.name), eq_func = ...)
   ...
   x.name = 'foo'
   y.name = 'foo'
   s.add(x)
   s.add(y) # no-op because of uniqueness criteria
   assert len(s) == 1

The part of this that seems ill-defined to me is the line you 
describe as a no-op.  A set keeps uniqueness because all the elements 
really are unique in the sense of equality, so if x == y and x is in s, 
and then you do s.add(y), you cannot know or care whether it overwrote 
the existing element with y, or did a no-op.  The two are equivalent, 
because either way the result is an element equal to x.

But what you seem to want is one where adding an element that's 
equal on the key is guaranteed a no-op and not an overwrite.  This is 
different, because (until you pasted this code) you didn't specify what 
you wanted to happen when you inserted a new item that had an equal key 
to an existing one, but was unequal to it.

So, can't you just use a dict and use the setdefault method instead 
of regular attribute access?  This seems to do exactly what you want.

d = {}
x.name = 'foo'
y.name = 'foo'
d.setdefault(x.name, x)
d.setdefault(y.name, y)

If you really want the syntactic sugar of [], you could write a 
simple class that wraps a dict and maps MyDict[x] to 
dict.setdefault(x.name, x).

-- 
--OKB (not okblacke)
Brendan Barnwell
Do not follow where the path may lead.  Go, instead, where there is
no path, and leave a trail.
--author unknown
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: set using alternative hash function?

2009-10-16 Thread Dave Angel

Austin Bingham wrote:

On Thu, Oct 15, 2009 at 7:49 PM, Ethan Furman et...@stoneleaf.us wrote:
  

Austin Bingham wrote:
I'm feeling really dense about now... What am I missing?



What you're missing is the entire discussion up to this point. I was
looking for a way to use an alternative uniqueness criteria in a set
instance without needing to modify my class.

  

So is that the behavior you're wanting, keeping the first object and
discarding all others?  Or is there something else I'm still missing?



Yes and yes. I want normal set behavior, but I want the set to use
user-provided hash and equality tests, i.e. ones that don't
necessarily call __hash__ and __eq__ on the candidate elements.

Austin

  
Your original post (10/15) never mentioned the __eq__() method, and 
people went off on a tangent discussing hash functions.  But from what 
you've been saying more recently, you want a set-like collection whose 
behavior isn't related to the standard comparison function.


The standard set is intimately tied to ==, in the sense that if two 
objects meet the a==b relationship, then only one will be in the set.  
And it doesn't matter which one, since they're equal.


You want a new collection that works differently.  Great.  It shouldn't 
be that hard to write, but if you try to base it on existing set you'll 
need a second wrapper object.  If you try to base it on dict, you'll 
need a second key object.  You've said you don't want any extra objects 
to be needed, so you'll need custom code.


A couple of dozen lines should do for the basic collection (add, clear, 
remove).  But if you also want difference, symmetric_difference, 
intersection_update, issubset ... it could grow to several dozen.  And 
you'd have to make some interesting decisions, once you have two 
pseudo-sets with possibly different comparison operators.


The only downside I can see is that apparently the built-in set is 
written in C, and if you're implementing this in pure python, it could 
be a lot slower.  On the other hand, in many cases, the comparison 
function-object might be the time-critical piece, in which case you coud 
be close.


DaveA

--
http://mail.python.org/mailman/listinfo/python-list


Re: set using alternative hash function?

2009-10-16 Thread Ethan Furman

Austin Bingham wrote:

On Thu, Oct 15, 2009 at 7:49 PM, Ethan Furman et...@stoneleaf.us wrote:


Austin Bingham wrote:
I'm feeling really dense about now... What am I missing?



What you're missing is the entire discussion up to this point. I was
looking for a way to use an alternative uniqueness criteria in a set
instance without needing to modify my class.


Really?  Is that what you wanted?  You know, I would never have guessed 
that from


If I understand things correctly, the set class uses hash()
universally to calculate hash values for its elements. Is
there a standard way to have set use a different function?

But hey, perhaps my reading comprehension was not at its best yesterday.

Hmmm, actually, I think I did get that.  What I missed is why you care 
which object stays in your precious little set, as long as you have one? 
 If it doesn't matter, use a dict and stop wasting our time.  If it 
does, tell us why and assuage our curiousity.



So is that the behavior you're wanting, keeping the first object and
discarding all others?  Or is there something else I'm still missing?



Yes and yes. I want normal set behavior, but I want the set to use
user-provided hash and equality tests, i.e. ones that don't
necessarily call __hash__ and __eq__ on the candidate elements.

Austin


As for what you want:  No, it's not currently possible.  If it's so big 
a deal that the various methods presented don't meet with your approval, 
break out the C and write your own.  Then you could give that back to 
the community instead of your snide remarks.


~Ethan~
--
http://mail.python.org/mailman/listinfo/python-list


Re: set using alternative hash function?

2009-10-16 Thread Anthony Tolle
On Oct 16, 12:24 pm, Ethan Furman et...@stoneleaf.us wrote:
 [snip]
 As for what you want:  No, it's not currently possible.  If it's so big
 a deal that the various methods presented don't meet with your approval,
 break out the C and write your own.  Then you could give that back to
 the community instead of your snide remarks.

 ~Ethan~

I didn't get the impression that Austin was being snide.  Instead, I
get the impression that they are someone from a different programming
background (C++/STL) who has not had sufficient exposure to the Python
way of doing things.  I believe Austin is genuinely curious as to why
Python may not implement features found in another programming
environment.

Coming from a C/C++ background myself, it took me a while to un-learn
certain idioms.  I still find myself thinking in C on occasion, only
to find a more elegant way to accomplish the task.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: set using alternative hash function?

2009-10-16 Thread Carl Banks
On Oct 15, 9:31 pm, Austin Bingham austin.bing...@gmail.com wrote:
 Yes, what you've got there provides the interface of what I want. And
 no doubt we could concoct countless ways to implement some version of
 this. However, if set itself accepted an alternate hash function, then
 I could do it with no extra overhead. Consider your implementation: it
 requires an extra set (extra space) and an extra lookup on many
 operations (extra time.) My original hope was to not have to do that.

Yes, Python let you down.  You've said that repeatedly.  Most people
are happy to accept the extra overhead of homemade solution.  You're
not.  We get it.


If you want to do something about it, write a patch with some test
cases and submit it.

If you want to do something half-assed about it, submit hash= key as a
wishlist item.

If you want to do absolutely nothing about it, continue to whine about
it here.


Carl Banks
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: set using alternative hash function?

2009-10-15 Thread Chris Rebert
On Thu, Oct 15, 2009 at 4:24 AM, Austin Bingham
austin.bing...@gmail.com wrote:
 If I understand things correctly, the set class uses hash()
 universally to calculate hash values for its elements. Is there a
 standard way to have set use a different function? Say I've got a
 collection of objects with names. I'd like to create a set of these
 objects where the hashing is done on these names. Using the __hash__
 function seems inelegant because it means I have to settle on one type
 of hashing for these objects all of the time, i.e. I can't create a
 set of them based on a different uniqueness criteria later. I'd like
 to create a set instance that uses, say, 'hash(x.name)' rather than
 'hash(x)'.

 Is this possible? Am I just thinking about this problem the wrong way?
 Admittedly, I'm coming at this from a C++/STL perspective, so perhaps
 I'm just missing the obvious. Thanks for any help on this.

You could use wrapper objects that define an appropriate __hash__():

#*completely untested*
class HashWrapper(object):
def __init__(self, obj, criteria):
self._wrapee = obj
self._criteria = criteria

#override __hash__() / hash()
def __hash__(self):
return hash(self._criteria(self._wrapee))

#proxying code
def __getattr__(self, name):
return getattr(self._wrapee, name)

def __setattr__(self, name, val):
setattr(self._wrapee, name, val)

#example
def name_of(obj):
return obj.name

def name_and_serial_num(obj):
return obj.name, obj.serial_number

no_same_names = set(HashWrapper(obj, name_of) for obj in some_collection)
no_same_name_and_serial = set(HashWrapper(obj, name_and_serial_num)
for obj in some_collection)

Cheers,
Chris
--
http://blog.rebertia.com
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: set using alternative hash function?

2009-10-15 Thread Duncan Booth
Austin Bingham austin.bing...@gmail.com wrote:

 If I understand things correctly, the set class uses hash()
 universally to calculate hash values for its elements. Is there a
 standard way to have set use a different function? Say I've got a
 collection of objects with names. I'd like to create a set of these
 objects where the hashing is done on these names. Using the __hash__
 function seems inelegant because it means I have to settle on one type
 of hashing for these objects all of the time, i.e. I can't create a
 set of them based on a different uniqueness criteria later. I'd like
 to create a set instance that uses, say, 'hash(x.name)' rather than
 'hash(x)'.
 
 Is this possible? Am I just thinking about this problem the wrong way?
 Admittedly, I'm coming at this from a C++/STL perspective, so perhaps
 I'm just missing the obvious. Thanks for any help on this.
 

It doesn't make sense to use just part of an object as the key for a set. A 
set is just a collection of values and there is no key separate from the 
value itself. If you build a set from x.name then that works fine, but only 
the names are stored.

What you want in this particular case is a dict: a dict stores both a key 
and a value and lets you retrieve the value from the key.


-- 
Duncan Booth http://kupuguy.blogspot.com
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: set using alternative hash function?

2009-10-15 Thread Austin Bingham
That's definitely a workable solution, but it still rubs me the wrong
way. The uniqueness criteria of a set seems, to me, like a property of
the set, whereas the python model forces it onto each set element.

Another issue I have with the HashWrapper approach is its space
requirements. Logically, what I'm asking to do is switch out a single
function reference (i.e. to point at get_name() rather than hash()),
but in practice I'm forced to instantiate a new object for each of my
set members. On a large set, this could be disastrous.

Don't get me wrong...your solution is a good one, but it's just not
what I am looking for.

Austin

On Thu, Oct 15, 2009 at 1:36 PM, Chris Rebert c...@rebertia.com wrote:
 On Thu, Oct 15, 2009 at 4:24 AM, Austin Bingham
 austin.bing...@gmail.com wrote:
 If I understand things correctly, the set class uses hash()
 universally to calculate hash values for its elements. Is there a
 standard way to have set use a different function? Say I've got a
 collection of objects with names. I'd like to create a set of these
 objects where the hashing is done on these names. Using the __hash__
 function seems inelegant because it means I have to settle on one type
 of hashing for these objects all of the time, i.e. I can't create a
 set of them based on a different uniqueness criteria later. I'd like
 to create a set instance that uses, say, 'hash(x.name)' rather than
 'hash(x)'.

 Is this possible? Am I just thinking about this problem the wrong way?
 Admittedly, I'm coming at this from a C++/STL perspective, so perhaps
 I'm just missing the obvious. Thanks for any help on this.

 You could use wrapper objects that define an appropriate __hash__():

 #*completely untested*
 class HashWrapper(object):
    def __init__(self, obj, criteria):
        self._wrapee = obj
        self._criteria = criteria

    #override __hash__() / hash()
    def __hash__(self):
        return hash(self._criteria(self._wrapee))

    #proxying code
    def __getattr__(self, name):
        return getattr(self._wrapee, name)

    def __setattr__(self, name, val):
        setattr(self._wrapee, name, val)

 #example
 def name_of(obj):
    return obj.name

 def name_and_serial_num(obj):
    return obj.name, obj.serial_number

 no_same_names = set(HashWrapper(obj, name_of) for obj in some_collection)
 no_same_name_and_serial = set(HashWrapper(obj, name_and_serial_num)
 for obj in some_collection)

 Cheers,
 Chris
 --
 http://blog.rebertia.com

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: set using alternative hash function?

2009-10-15 Thread Austin Bingham
I guess we see things differently. I think it's quite natural to want
a set of unique objects where unique is defined as an operation on
some subset/conflation/etc. of the attributes of the elements. That's
all that the regular set class is, except that it always uses the
hash() function to calculate uniqueness. In any event, the notions of
set and uniqueness I'm using are well established in other languages,
so I don't see why it couldn't be made to work in python.

As far as using a dict, that doesn't really solve my problem. It could
be part of a solution, I guess, but I would still need functionality
extrinsic to the dict. What I want is to make sure that no values in
my set have the same name, and dict won't guarantee that for me. A set
that calculated uniqueness based on its elements' names, on the other
hand, would.

Austin

On Thu, Oct 15, 2009 at 1:48 PM, Duncan Booth
duncan.bo...@invalid.invalid wrote:
 It doesn't make sense to use just part of an object as the key for a set. A
 set is just a collection of values and there is no key separate from the
 value itself. If you build a set from x.name then that works fine, but only
 the names are stored.

 What you want in this particular case is a dict: a dict stores both a key
 and a value and lets you retrieve the value from the key.


 --
 Duncan Booth http://kupuguy.blogspot.com
 --
 http://mail.python.org/mailman/listinfo/python-list

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: set using alternative hash function?

2009-10-15 Thread Diez B. Roggisch
Austin Bingham wrote:

 If I understand things correctly, the set class uses hash()
 universally to calculate hash values for its elements. Is there a
 standard way to have set use a different function? Say I've got a
 collection of objects with names. I'd like to create a set of these
 objects where the hashing is done on these names. Using the __hash__
 function seems inelegant because it means I have to settle on one type
 of hashing for these objects all of the time, i.e. I can't create a
 set of them based on a different uniqueness criteria later. I'd like
 to create a set instance that uses, say, 'hash(x.name)' rather than
 'hash(x)'.
 
 Is this possible? Am I just thinking about this problem the wrong way?
 Admittedly, I'm coming at this from a C++/STL perspective, so perhaps
 I'm just missing the obvious. Thanks for any help on this.


This is a principal problem in OO - behavior shall not only change based on
the object, but also the context it's used in.

I think the decorator-pattern is the best thing here (in lieu of python
having a hash-argument to sets). Please don't forget to re-define equality
also!

class OtherCriteriaDecorator(object):


   def __init__(self, delegate):
   self._delegate = delegate

   def __hash__(self):
   return hash_based_on_delegate


   def __eq__(self, other):
   return equality_based_on_delegate



HTH,

Diez
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: set using alternative hash function?

2009-10-15 Thread Diez B. Roggisch
Chris Rebert wrote:

 On Thu, Oct 15, 2009 at 4:24 AM, Austin Bingham
 austin.bing...@gmail.com wrote:
 If I understand things correctly, the set class uses hash()
 universally to calculate hash values for its elements. Is there a
 standard way to have set use a different function? Say I've got a
 collection of objects with names. I'd like to create a set of these
 objects where the hashing is done on these names. Using the __hash__
 function seems inelegant because it means I have to settle on one type
 of hashing for these objects all of the time, i.e. I can't create a
 set of them based on a different uniqueness criteria later. I'd like
 to create a set instance that uses, say, 'hash(x.name)' rather than
 'hash(x)'.

 Is this possible? Am I just thinking about this problem the wrong way?
 Admittedly, I'm coming at this from a C++/STL perspective, so perhaps
 I'm just missing the obvious. Thanks for any help on this.
 
 You could use wrapper objects that define an appropriate __hash__():
 
 #*completely untested*
 class HashWrapper(object):
 def __init__(self, obj, criteria):
 self._wrapee = obj
 self._criteria = criteria
 
 #override __hash__() / hash()
 def __hash__(self):
 return hash(self._criteria(self._wrapee))
 
 #proxying code
 def __getattr__(self, name):
 return getattr(self._wrapee, name)
 
 def __setattr__(self, name, val):
 setattr(self._wrapee, name, val)

This doesn't work for conflicting elements, as the __eq__-method isn't
overriden.

Diez
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: set using alternative hash function?

2009-10-15 Thread Diez B. Roggisch
Austin Bingham wrote:

 That's definitely a workable solution, but it still rubs me the wrong
 way. The uniqueness criteria of a set seems, to me, like a property of
 the set, whereas the python model forces it onto each set element.

This is a POV, but to to me, the set just deals with a very minimal
protocol - hash-value  equality. Whatever you feed it, it has to cope with
that. It strikes *me* as odd to ask for something else. 

The ideal solution would be to have several hash/eq-methods on your object,
and somehow make the surroundings decide which to use. But there is no
OO-pragma for that.

 Another issue I have with the HashWrapper approach is its space
 requirements. Logically, what I'm asking to do is switch out a single
 function reference (i.e. to point at get_name() rather than hash()),
 but in practice I'm forced to instantiate a new object for each of my
 set members. On a large set, this could be disastrous.
 
 Don't get me wrong...your solution is a good one, but it's just not
 what I am looking for.

It is the only one you have, short of implementing set your own. And then
the terms of implementation would be that you pass a hash- and eq-function
 create wrapper-objects internally.

Diez
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: set using alternative hash function?

2009-10-15 Thread Diez B. Roggisch
Austin Bingham wrote:

 On Thu, Oct 15, 2009 at 2:23 PM, Diez B. Roggisch de...@nospam.web.de
 wrote:
 Austin Bingham wrote:
 This is a POV, but to to me, the set just deals with a very minimal
 protocol - hash-value  equality. Whatever you feed it, it has to cope
 with that. It strikes *me* as odd to ask for something else.
 
 But I'm not really asking for anything that changes the paradigm of
 how things currently work. All I need is the ability to say something
 like this:
 
 s = set(hash_func = lambda x: hash(x.name))
 
 If set could be de-hardcoded from using hash(), nothing else about how
 it works would have to change. Or am I wrong on that? I see that you
 mention equality in the protocol, but I don't know why a set would
 need that if a) hash(x) == hash(y) -- x == y and b) hash function
 must return a 32 bit value (which I think is the case)...so maybe I
 misunderstand something.

You do. Hashes can collide, and then you need equality. Sets are *based* on
equality actually, the hash is just one optimization. Other optimizations
build on order (usually, the STL requires you to define  on objects for
that. Which is enough because a == b = a  b  b  a). But eventually,
it all boils down to equality. You could implement a set without hash, by
imposing linear behavior on insert and lookup - but none based purely on a
hash.

 
 I wonder...does the requirement of using hash() have to do with the
 low-level implementation of set? That might explain the state of
 things.
 
 The ideal solution would be to have several hash/eq-methods on your
 object, and somehow make the surroundings decide which to use. But there
 is no OO-pragma for that.
 
 But why force objects to know about all of the wacky contexts in which
 they might be used? Let objects be objects and hash-functions be
 hash-functions. If someone wants a set where uniqueness is defined by
 the number of occurrences of 'a' in an object's name, the object
 should never have to know that...it should just have a name.

Because these functions need intimate knowledge of the objects to actually
work. Sure, in python it's easy to define them outside, and just reach into
the object as you wish. But aside this implementation detail, a hash (and
equality of course) always are based upon the object in question. So I
think it's natural to have it defined right there (which __hash__ and
__eq__ show that this seems to be accepted in general).

And if there were something that would decide on context which of several
implementations to use, you'd have less to worry. As things are, there
isn't such thing (I don't even have the slightest idea what could work),
you are as well off with defining two functions.


Diez
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: set using alternative hash function?

2009-10-15 Thread Austin Bingham
On Thu, Oct 15, 2009 at 3:02 PM, Diez B. Roggisch de...@nospam.web.de wrote:
 Austin Bingham wrote:
 You do. Hashes can collide, and then you need equality. Sets are *based* on
 equality actually, the hash is just one optimization. ...

Right, thanks for clearing that up. Not reading closely enough ==
public shaming ;)

 Because these functions need intimate knowledge of the objects to actually
 work. Sure, in python it's easy to define them outside, and just reach into
 the object as you wish. But aside this implementation detail, a hash (and
 equality of course) always are based upon the object in question. So I
 think it's natural to have it defined right there (which __hash__ and
 __eq__ show that this seems to be accepted in general).

 And if there were something that would decide on context which of several
 implementations to use, you'd have less to worry. As things are, there
 isn't such thing (I don't even have the slightest idea what could work),
 you are as well off with defining two functions.

But this context decider you're talking about sounds exactly like
what I'm talking about.  If I had some class with __hash1__ and
__hash2__ (and associated equality operators), you're saying it would
be nice to be able to select which to use based on the context (e.g.
what type of set I'm using.) It might look like this:

   s = set(hash_func = lambda x: x.__hash2__, eq_func = lambda x, y:
x.__eq2__(y))

And if *that* works for you, do you still have a problem with:

  s = set(hash_func = lambda x: hash(x.name), eq_func = lambda x,y:
x.name == y.name)

?

If you don't like those, what would work for you? As far as I can
tell, your context decider and my alternative hash functions are
the same thing, and we just need to agree on a syntax.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: set using alternative hash function?

2009-10-15 Thread Diez B. Roggisch
 And if there were something that would decide on context which of several
 implementations to use, you'd have less to worry. As things are, there
 isn't such thing (I don't even have the slightest idea what could work),
 you are as well off with defining two functions.
 
 But this context decider you're talking about sounds exactly like
 what I'm talking about.  If I had some class with __hash1__ and
 __hash2__ (and associated equality operators), you're saying it would
 be nice to be able to select which to use based on the context (e.g.
 what type of set I'm using.) It might look like this:
 
s = set(hash_func = lambda x: x.__hash2__, eq_func = lambda x, y:
 x.__eq2__(y))
 
 And if *that* works for you, do you still have a problem with:
 
   s = set(hash_func = lambda x: hash(x.name), eq_func = lambda x,y:
 x.name == y.name)
 
 ?
 
 If you don't like those, what would work for you? As far as I can
 tell, your context decider and my alternative hash functions are
 the same thing, and we just need to agree on a syntax.

The context-decider isn't the same thing because it isn't designed yet :)
And most probably won't ever be. It's just the abstract idea that
hashing/equality change for one object depending on the circumstances they
are used in, and that one wishes that the decision what to use is as simple
 transparent as possible. 

Your approach is certainly viable, but I guess the current
set-implementation is optimized on working with __hash__ and __eq__ on
objects because for these exist slots in the python object structure in C.
So while you can implement your idea, you will end up passing wrappers as
Chris  me proposed into the current set implementation. 

However, it might be worth thinking of proposing this change to the set-type
in general. But then for orthogonality, dicts should have it as well I
guess. Which opens a whole new can of worms.

I'm undecided on this. I think the cure is straight forward  simple to
implement, and once you have the level to understand  need different
hashing/equality, you should be comfortable writing this simple wrapper
yourself. So for me personally, it's not a needed addition to the language.
YMMV, and thus - go for it if you like.

Diez
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: set using alternative hash function?

2009-10-15 Thread Mick Krippendorf
Austin Bingham schrieb:
 I guess we see things differently. I think it's quite natural to want
 a set of unique objects where unique is defined as an operation on
 some subset/conflation/etc. of the attributes of the elements. 

What you seem to imply is that the hash function imposes some kind of
uniqueness constraint on the set which uses it. That's just not the
case, the uniqueness constraint is always the (in-)equality of objects,
and for this you can override __eq__. But then you must also ensure that
x.__eq__(y) -- y.__eq__(x)  x.__hash() == y.__hash__().

Diez' solution is the pythonic way, and BTW, mathematically and
pythonically sound, wheras, if the hash function would really impose
uniqueness:

s1 = set(lambda x: hash(x))
s2 = set(lambda x: hash(x.name))

t1 = Buddy(name=jim)
t2 = Buddy(name=joe)
t3 = Buddy(name=joe)

s1.add(t1)
s1.add(t2)
s1.add(t3)

s2.add(t1)
s2.add(t2)
s2.add(t3)

would probaly yield quite strange results:

s3 = s2 | s1 # union. Is s3 == (t1,t2,t3) or is it (t1,t3)?
s4 = s2 - s1 # difference. is s4 == () or do we get an Exception?

In C++/STL the compiler would nag because s1 and s2 are of different
types since the hash function is part of the set's type (AFAIR). With
dynamic typing this isn't possible so you'd have to wait until runtime.

I'd go with Diez' advice and use a dict instead. Or, you could do
something Borg-ish (lookup Borg Pattern) restrained on names, where the
constructor of a class ensures that x.name == y.name -- x == y.

 As far as using a dict, that doesn't really solve my problem. It could
 be part of a solution, I guess, but I would still need functionality
 extrinsic to the dict.

Python isn't strong on encapsulation, so extrinsic functionality is
quite common. Nothing to loose sleep over though, because, as Guido
says, we're all consenting adults here :-)

Mick.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: set using alternative hash function?

2009-10-15 Thread Austin Bingham
On Thu, Oct 15, 2009 at 3:43 PM, Diez B. Roggisch de...@nospam.web.de wrote:
 The context-decider isn't the same thing because it isn't designed yet :)
 And most probably won't ever be. It's just the abstract idea that
 hashing/equality change for one object depending on the circumstances they
 are used in, and that one wishes that the decision what to use is as simple
  transparent as possible.

Fair enough :)

 Your approach is certainly viable, but I guess the current
 set-implementation is optimized on working with __hash__ and __eq__ on
 objects because for these exist slots in the python object structure in C.
 So while you can implement your idea, you will end up passing wrappers as
 Chris  me proposed into the current set implementation.

Yes, I figured that the guts of set relied on particulars to which we
are not privy at the python level. If the syntax let me describe sets
the way I've been laying out here, I could probably tolerate the
underlying implementation relying on wrappers.

 However, it might be worth thinking of proposing this change to the set-type
 in general. But then for orthogonality, dicts should have it as well I
 guess. Which opens a whole new can of worms.

dicts would certainly have to be looked at as well, but I don't think
the can would have that many worms in it if we solve the set issue to
everyone's satisfaction.

In any event, thanks for helping me work through this issue.

Austin
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: set using alternative hash function?

2009-10-15 Thread Anthony Tolle
On Oct 15, 7:24 am, Austin Bingham austin.bing...@gmail.com wrote:
 [snip] I'd like to create a set of these
 objects where the hashing is done on these names. [snip]

Why not use a dict?  The key would be the object name.  Pretty much
the same behavior as a set (via the key), and you can still easily
iterate over the objects.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: set using alternative hash function?

2009-10-15 Thread Austin Bingham
On Thu, Oct 15, 2009 at 3:50 PM, Mick Krippendorf mad.m...@gmx.de wrote:
 Austin Bingham schrieb:
 What you seem to imply is that the hash function imposes some kind of
 uniqueness constraint on the set which uses it. That's just not the
 case, the uniqueness constraint is always the (in-)equality of objects,
 and for this you can override __eq__. But then you must also ensure that
 x.__eq__(y) -- y.__eq__(x)  x.__hash() == y.__hash__().

Yes, as was pointed out earlier, I was reading too much into the hash
function's role. However, given well behaved hash and equality (which
would be a great name for a band), I don't see why those functions
need to be defined on the object itself, per se. Right now that's the
case because hash() only knows how to call obj.__hash__ (etc. for
__eq__).

I guess a good analog for what I'm thinking about is list.sort(). It's
more than happy to take a comparison operator, and in spirit that's
exactly what I'm looking for with sets.

 Diez' solution is the pythonic way, and BTW, mathematically and
 pythonically sound, wheras, if the hash function would really impose
 uniqueness: . . .

Yes, my gray c++ roots are showing here; you're right that my brain
was secretly expecting the compiler to take care of things. Your
points about set operations are the strongest in this discussion, and
I haven't got a great answer.

 Python isn't strong on encapsulation, so extrinsic functionality is
 quite common.

I guess that's part of what's frustrating for me on this issue. Python
is generally extremely flexible and open, but in this case I feel like
I'm being shut out. Oh well...in the end it looks like there are ways
to get what I need, if not what I want. Thanks for the input.

Austin
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: set using alternative hash function?

2009-10-15 Thread Austin Bingham
On Thu, Oct 15, 2009 at 4:06 PM, Anthony Tolle anthony.to...@gmail.com wrote:
 Why not use a dict?  The key would be the object name.  Pretty much
 the same behavior as a set (via the key), and you can still easily
 iterate over the objects.

To reiterate, dict only gets me part of what I want. Whereas a set
with uniqueness defined over 'obj.name' would guarantee no name
collisions, dict only sorta helps me keep things straight; it doesn't
actually enforce that my values have unique names.

Austin
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: set using alternative hash function?

2009-10-15 Thread Anthony Tolle
On Oct 15, 10:42 am, Austin Bingham austin.bing...@gmail.com wrote:
 On Thu, Oct 15, 2009 at 4:06 PM, Anthony Tolle  To reiterate, dict only gets 
 me part of what I want. Whereas a set
 with uniqueness defined over 'obj.name' would guarantee no name
 collisions, dict only sorta helps me keep things straight; it doesn't
 actually enforce that my values have unique names.


I don't understand how a set would help you enforce that your values
ave unique names.  If you have two objects, both named foo, and
added them to the set, no errors would occur.

Please provide an example of how a set provides functionality that a
dict (which enforces unique keys) doesn't.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: set using alternative hash function?

2009-10-15 Thread Chris Rebert
On Thu, Oct 15, 2009 at 5:22 AM, Diez B. Roggisch de...@nospam.web.de wrote:
 Chris Rebert wrote:

 On Thu, Oct 15, 2009 at 4:24 AM, Austin Bingham
 austin.bing...@gmail.com wrote:
 If I understand things correctly, the set class uses hash()
 universally to calculate hash values for its elements. Is there a
 standard way to have set use a different function? Say I've got a
 collection of objects with names. I'd like to create a set of these
 objects where the hashing is done on these names. Using the __hash__
 function seems inelegant because it means I have to settle on one type
 of hashing for these objects all of the time, i.e. I can't create a
 set of them based on a different uniqueness criteria later. I'd like
 to create a set instance that uses, say, 'hash(x.name)' rather than
 'hash(x)'.

 Is this possible? Am I just thinking about this problem the wrong way?
 Admittedly, I'm coming at this from a C++/STL perspective, so perhaps
 I'm just missing the obvious. Thanks for any help on this.

 You could use wrapper objects that define an appropriate __hash__():

 #*completely untested*
 class HashWrapper(object):
     def __init__(self, obj, criteria):
         self._wrapee = obj
         self._criteria = criteria

     #override __hash__() / hash()
     def __hash__(self):
         return hash(self._criteria(self._wrapee))

     #proxying code
     def __getattr__(self, name):
         return getattr(self._wrapee, name)

     def __setattr__(self, name, val):
         setattr(self._wrapee, name, val)

 This doesn't work for conflicting elements, as the __eq__-method isn't
 overriden.

Indeed. Good catch. :)
This is why I mark code as completely untested.

Cheers,
Chris
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: set using alternative hash function?

2009-10-15 Thread Gabriel Genellina
En Thu, 15 Oct 2009 11:42:20 -0300, Austin Bingham  
austin.bing...@gmail.com escribió:


On Thu, Oct 15, 2009 at 4:06 PM, Anthony Tolle anthony.to...@gmail.com  
wrote:

Why not use a dict?  The key would be the object name.  Pretty much
the same behavior as a set (via the key), and you can still easily
iterate over the objects.


To reiterate, dict only gets me part of what I want. Whereas a set
with uniqueness defined over 'obj.name' would guarantee no name
collisions, dict only sorta helps me keep things straight; it doesn't
actually enforce that my values have unique names.


I think you didn't understand correctly Anthony Tolle's suggestion:

py class Foo:
...   def __init__(self, name): self.name = name
...
py objs = [Foo('Joe'), Foo('Jim'), Foo('Tom'), Foo('Jim')]
py objs
[__main__.Foo instance at 0x00BB8238,
 __main__.Foo instance at 0x00BB83C8,
 __main__.Foo instance at 0x00BB7B20,
 __main__.Foo instance at 0x00BB7DC8]
py d = dict((o.name, o) for o in objs)
py d.keys()
['Jim', 'Joe', 'Tom']
py d.values()
[__main__.Foo instance at 0x00BB7DC8,
 __main__.Foo instance at 0x00BB8238,
 __main__.Foo instance at 0x00BB7B20]

keys in a dict are unique; values form the set you're looking for. If it's  
more complex that just a name, define a function:


def my_funny_key_extractor(item):
  return some(operations[on](item))

d = dict((my_funny_key_extractor(o), o) for o in objs)

If you feel so inclined, you could implement the MutableSet ABC based on  
this and have a full featured set implementation.


--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list


Re: set using alternative hash function?

2009-10-15 Thread Austin Bingham
On Thu, Oct 15, 2009 at 5:15 PM, Gabriel Genellina
gagsl-...@yahoo.com.ar wrote:
 En Thu, 15 Oct 2009 11:42:20 -0300, Austin Bingham
 austin.bing...@gmail.com escribió:
 I think you didn't understand correctly Anthony Tolle's suggestion:

 py class Foo:
 ...   def __init__(self, name): self.name = name
 ...
 py objs = [Foo('Joe'), Foo('Jim'), Foo('Tom'), Foo('Jim')]
 py objs

I understand Anthony perfectly. Yes, I can construct a dict as you
specify, where all of the keys map to values with name attributes
equal to the key. My point is that dict doesn't really help me enforce
that beyond simply letting me set it up; it doesn't care about the
values at all, just the keys. All that I'm asking for, and I think
it's been made pretty clear, is a set that let's me define a
uniqueness criteria function other than hash(). As has been thrashed
out already, this is not as straightforward as I might have liked.

To put it in code, I want this:

  s = set(hash_func = lambda obj: hash(obj.name), eq_func = ...)
  ...
  x.name = 'foo'
  y.name = 'foo'
  s.add(x)
  s.add(y) # no-op because of uniqueness criteria
  assert len(s) == 1

Austin
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: set using alternative hash function?

2009-10-15 Thread Rami Chowdhury
On Thu, 15 Oct 2009 09:11:00 -0700, Austin Bingham  
austin.bing...@gmail.com wrote:



On Thu, Oct 15, 2009 at 5:15 PM, Gabriel Genellina
gagsl-...@yahoo.com.ar wrote:

En Thu, 15 Oct 2009 11:42:20 -0300, Austin Bingham
austin.bing...@gmail.com escribió:
I think you didn't understand correctly Anthony Tolle's suggestion:

py class Foo:
...   def __init__(self, name): self.name = name
...
py objs = [Foo('Joe'), Foo('Jim'), Foo('Tom'), Foo('Jim')]
py objs


I understand Anthony perfectly. Yes, I can construct a dict as you
specify, where all of the keys map to values with name attributes
equal to the key. My point is that dict doesn't really help me enforce
that beyond simply letting me set it up; it doesn't care about the
values at all, just the keys.


Perhaps this is an overly naive solution, but could you not define a class  
that implemented the set interface but used a dict for internal storage,  
and use that? You'd still have uniqueness (by dict key, which your class  
would define as object name) and as a bonus, retrievability by name, which  
set wouldn't give you.



--
Rami Chowdhury
Never attribute to malice that which can be attributed to stupidity --  
Hanlon's Razor

408-597-7068 (US) / 07875-841-046 (UK) / 0189-245544 (BD)
--
http://mail.python.org/mailman/listinfo/python-list


Re: set using alternative hash function?

2009-10-15 Thread Anthony Tolle
On Oct 15, 12:11 pm, Austin Bingham austin.bing...@gmail.com wrote:
 To put it in code, I want this:

   s = set(hash_func = lambda obj: hash(obj.name), eq_func = ...)
   ...
   x.name = 'foo'
   y.name = 'foo'
   s.add(x)
   s.add(y) # no-op because of uniqueness criteria
   assert len(s) == 1

I wrote a quick subclass of set that does something similar, but uses
just one function for the object uniqueness:

class MySet(set):
def __init__(self, iterable = (), idfunc = lambda x: x):
self.idfunc = idfunc
self.ids = set()
for item in iterable:
self.add(item)

def add(self, item):
id = self.idfunc(item)
if id not in self.ids:
self.ids.add(id)
set.add(self, item)

 class Foo(object):
...  def __init__(self, name):
...   self.name = name
...
 x = Foo('foo')
 y = Foo('foo')
 s = MySet(idfunc = lambda x: x.name)
 s.add(x)
 s
MySet([__main__.Foo object at 0x00A89F90])
 s.add(y)
 s
MySet([__main__.Foo object at 0x00A89F90])

Is this what you are looking for?

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: set using alternative hash function?

2009-10-15 Thread Gabriel Genellina
En Thu, 15 Oct 2009 13:24:18 -0300, Rami Chowdhury  
rami.chowdh...@gmail.com escribió:


On Thu, 15 Oct 2009 09:11:00 -0700, Austin Bingham  
austin.bing...@gmail.com wrote:



On Thu, Oct 15, 2009 at 5:15 PM, Gabriel Genellina
gagsl-...@yahoo.com.ar wrote:

En Thu, 15 Oct 2009 11:42:20 -0300, Austin Bingham
austin.bing...@gmail.com escribió:
I think you didn't understand correctly Anthony Tolle's suggestion:

py class Foo:
...   def __init__(self, name): self.name = name
...
py objs = [Foo('Joe'), Foo('Jim'), Foo('Tom'), Foo('Jim')]
py objs


I understand Anthony perfectly. Yes, I can construct a dict as you
specify, where all of the keys map to values with name attributes
equal to the key. My point is that dict doesn't really help me enforce
that beyond simply letting me set it up; it doesn't care about the
values at all, just the keys.


Perhaps this is an overly naive solution, but could you not define a  
class that implemented the set interface but used a dict for internal  
storage, and use that? You'd still have uniqueness (by dict key, which  
your class would define as object name) and as a bonus, retrievability  
by name, which set wouldn't give you.


Like this?

from collections import MutableSet

class KeyedSet(MutableSet):
A set with a custom element comparison function.

def __init__(self, iterable, key=lambda x: x):
Create a KeyedSet from iterable; key function determines  
uniqueness.


`key` must be a callable taking a single argument; it is
applied to every item in the iterable, and its result
is used to determine set membership. That is, if key(item)
returns the same thing for two items, only one of them will
be in the set.

self._items = dict((key(item),item) for item in iterable)
self._key = key

# NOT a classmethod because self.key must be transferred too!
# Fortunately it is always called as self._from_iterable(...)
# in _abccoll.py
def _from_iterable(self, iterable):
return type(self)(iterable, key=self._key)

def add(self, value):
Return True if it was added, False if already there.
key = self._key(value)
if key in self._items: return False
self._items[key] = value
return True

def discard(self, value):
Return True if it was deleted, False if not there.
key = self._key(value)
if key not in self._items: return False
del self._items[key]
return True

def clear(self):
self._items.clear()

def copy(self):
return type(self)(self._items.values(), key=self._key)

def __iter__(self):
return iter(self._items.values())

def __contains__(self, value):
try: return self._key(value) in self._items
except Exception: return False

def __len__(self):
return len(self._items)

def __repr__(self):
return %s(%r) % (type(self).__name__, self._items.values())


def demo():
  class Foo(object):
def __init__(self, name):
  self.name = name
def __repr__(self):
  return %s(%r) % (type(self).__name__, self.name)

  objs = [Foo('Joe'), Foo('Jim'), Foo('Tom'), Foo('Jim')]
  print objs
  s = KeyedSet(objs, key=lambda o:o.name)
  print len(s), s
  s.add(Foo('Joe'))
  print len(s), s
  moe = Foo('Moe')
  print moe in s, moe in s
  s.add(moe)
  print moe in s, moe in s
  print 'moe' in s, 'moe' in s
  s2 = set([Foo('Luc'), Foo('Jim'), Foo('Dan')])
  print s | s2, s | s2
  print s  s2, s  s2
  s3 = KeyedSet(s2, key=lambda o:o.name)
  print s3 - s, s3 - s
  print s - s3, s - s3
  print s.copy(), s.copy()

demo()

--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list


Re: set using alternative hash function?

2009-10-15 Thread Ethan Furman

Austin Bingham wrote:

Yes, I can construct a dict as you
specify, where all of the keys map to values with name attributes
equal to the key. My point is that dict doesn't really help me enforce
that beyond simply letting me set it up; it doesn't care about the
values at all, just the keys. All that I'm asking for, and I think
it's been made pretty clear, is a set that let's me define a
uniqueness criteria function other than hash(). As has been thrashed
out already, this is not as straightforward as I might have liked.

To put it in code, I want this:

  s = set(hash_func = lambda obj: hash(obj.name), eq_func = ...)
  ...
  x.name = 'foo'
  y.name = 'foo'
  s.add(x)
  s.add(y) # no-op because of uniqueness criteria
  assert len(s) == 1

Austin



I'm feeling really dense about now... What am I missing?

class obj(object):
def __init__(self, id, value):
self.id = id
self.value = value
def __eq__(self, other):
if isinstance(other, obj):
return self.id == other.id
raise NotImplemented
def __hash__(self):
return hash(self.id)
def __repr__(self):
return obj(%s, %s) % (self.id, self.value)

Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit 
(Intel)] on win32

Type help, copyright, credits or license for more information.
-- from obj import obj
-- s1 = set()
-- d1 = dict()
-- o1 = obj(17, 'first')
-- o2 = obj(18, 'second')
-- o3 = obj(17, 'third')
-- o1, o2, o3
(obj(17, first), obj(18, second), obj(17, third))
-- s1.add(o1)
-- s1.add(o2)
-- s1.add(o3)
-- d1[o1.id] = o1
-- d1[o2.id] = o2
-- d1[o3.id] = o3
-- s1
set([obj(17, first), obj(18, second)])
-- d1
{17: obj(17, third), 18: obj(18, second)}
--

Ah ha!  No-op indeed!  If a set already has an item, the new item is 
completely ignored, whereas if a dict already has a key, the new key's 
value replaces the current value already in the dict.


Learn something new everyday!

I'm still not sure I understand your concern about the values in a set, 
though.  Sets keep the first object of a given key, dicts keep the last 
object of a given key; in both cases, all other objects with the same 
key are lost.


So is that the behavior you're wanting, keeping the first object and 
discarding all others?  Or is there something else I'm still missing?


~Ethan~
--
http://mail.python.org/mailman/listinfo/python-list


Re: set using alternative hash function?

2009-10-15 Thread Anthony Tolle
On Oct 15, 1:49 pm, Ethan Furman et...@stoneleaf.us wrote:
 I'm still not sure I understand your concern about the values in a set,
 though.  Sets keep the first object of a given key, dicts keep the last
 object of a given key; in both cases, all other objects with the same
 key are lost.

 So is that the behavior you're wanting, keeping the first object and
 discarding all others?  Or is there something else I'm still missing?

I think that without a practical example of what this would be used
for, we're all going to be a little lost on this one.

So far, we've not seen the original problem, only the author's
preferred method for solving it.  My guess is there are other, more
pythonic ways to solve the original problem.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: set using alternative hash function?

2009-10-15 Thread Austin Bingham
On Thu, Oct 15, 2009 at 7:05 PM, Anthony Tolle anthony.to...@gmail.com wrote:
 I wrote a quick subclass of set that does something similar, but uses
 just one function for the object uniqueness:

 class MySet(set):
    def __init__(self, iterable = (), idfunc = lambda x: x):
        self.idfunc = idfunc
        self.ids = set()
        for item in iterable:
            self.add(item)

    def add(self, item):
        id = self.idfunc(item)
        if id not in self.ids:
            self.ids.add(id)
            set.add(self, item)

Yes, what you've got there provides the interface of what I want. And
no doubt we could concoct countless ways to implement some version of
this. However, if set itself accepted an alternate hash function, then
I could do it with no extra overhead. Consider your implementation: it
requires an extra set (extra space) and an extra lookup on many
operations (extra time.) My original hope was to not have to do that.

Austin
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: set using alternative hash function?

2009-10-15 Thread Austin Bingham
On Thu, Oct 15, 2009 at 7:49 PM, Ethan Furman et...@stoneleaf.us wrote:
 Austin Bingham wrote:
 I'm feeling really dense about now... What am I missing?

What you're missing is the entire discussion up to this point. I was
looking for a way to use an alternative uniqueness criteria in a set
instance without needing to modify my class.

 So is that the behavior you're wanting, keeping the first object and
 discarding all others?  Or is there something else I'm still missing?

Yes and yes. I want normal set behavior, but I want the set to use
user-provided hash and equality tests, i.e. ones that don't
necessarily call __hash__ and __eq__ on the candidate elements.

Austin
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: set using alternative hash function?

2009-10-15 Thread Austin Bingham
On Thu, Oct 15, 2009 at 8:10 PM, Anthony Tolle anthony.to...@gmail.com wrote:
 I think that without a practical example of what this would be used
 for, we're all going to be a little lost on this one.

 So far, we've not seen the original problem, only the author's
 preferred method for solving it.  My guess is there are other, more
 pythonic ways to solve the original problem.

The original problem was just a question statement: can I use
alternative uniqueness functions on a set? Indeed, I proposed an idea,
which that was sets could be constructed with user-defined hash and
equality functions, the strengths and weaknesses of which have been
gone over in some detail. The short answer is that what I was looking
for (admittedly, a desire motivated by experiences in other languages)
is not really feasible, at least not without a great deal more work.

The other proposed solutions all require linearly extra storage,
linearly extra time, both, or simply don't solve the problem. And in
any event, they miss the point of the original post which was not How
can I get a particular behavior? (which is fairly trivial, both in my
own estimation and as evidenced by the abundance of proposals) but
Can I get a particular behavior in a particular way? (the answer to
which, again, seems to be no.)

Austin
-- 
http://mail.python.org/mailman/listinfo/python-list