Re: set using alternative hash function?
Austin Bingham wrote: To put it in code, I want this: s = set(hash_func = lambda obj: hash(obj.name), eq_func = ...) ... x.name = 'foo' y.name = 'foo' s.add(x) s.add(y) # no-op because of uniqueness criteria assert len(s) == 1 The part of this that seems ill-defined to me is the line you describe as a no-op. A set keeps uniqueness because all the elements really are unique in the sense of equality, so if x == y and x is in s, and then you do s.add(y), you cannot know or care whether it overwrote the existing element with y, or did a no-op. The two are equivalent, because either way the result is an element equal to x. But what you seem to want is one where adding an element that's equal on the key is guaranteed a no-op and not an overwrite. This is different, because (until you pasted this code) you didn't specify what you wanted to happen when you inserted a new item that had an equal key to an existing one, but was unequal to it. So, can't you just use a dict and use the setdefault method instead of regular attribute access? This seems to do exactly what you want. d = {} x.name = 'foo' y.name = 'foo' d.setdefault(x.name, x) d.setdefault(y.name, y) If you really want the syntactic sugar of [], you could write a simple class that wraps a dict and maps MyDict[x] to dict.setdefault(x.name, x). -- --OKB (not okblacke) Brendan Barnwell Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail. --author unknown -- http://mail.python.org/mailman/listinfo/python-list
Re: set using alternative hash function?
Austin Bingham wrote: On Thu, Oct 15, 2009 at 7:49 PM, Ethan Furman et...@stoneleaf.us wrote: Austin Bingham wrote: I'm feeling really dense about now... What am I missing? What you're missing is the entire discussion up to this point. I was looking for a way to use an alternative uniqueness criteria in a set instance without needing to modify my class. So is that the behavior you're wanting, keeping the first object and discarding all others? Or is there something else I'm still missing? Yes and yes. I want normal set behavior, but I want the set to use user-provided hash and equality tests, i.e. ones that don't necessarily call __hash__ and __eq__ on the candidate elements. Austin Your original post (10/15) never mentioned the __eq__() method, and people went off on a tangent discussing hash functions. But from what you've been saying more recently, you want a set-like collection whose behavior isn't related to the standard comparison function. The standard set is intimately tied to ==, in the sense that if two objects meet the a==b relationship, then only one will be in the set. And it doesn't matter which one, since they're equal. You want a new collection that works differently. Great. It shouldn't be that hard to write, but if you try to base it on existing set you'll need a second wrapper object. If you try to base it on dict, you'll need a second key object. You've said you don't want any extra objects to be needed, so you'll need custom code. A couple of dozen lines should do for the basic collection (add, clear, remove). But if you also want difference, symmetric_difference, intersection_update, issubset ... it could grow to several dozen. And you'd have to make some interesting decisions, once you have two pseudo-sets with possibly different comparison operators. The only downside I can see is that apparently the built-in set is written in C, and if you're implementing this in pure python, it could be a lot slower. On the other hand, in many cases, the comparison function-object might be the time-critical piece, in which case you coud be close. DaveA -- http://mail.python.org/mailman/listinfo/python-list
Re: set using alternative hash function?
Austin Bingham wrote: On Thu, Oct 15, 2009 at 7:49 PM, Ethan Furman et...@stoneleaf.us wrote: Austin Bingham wrote: I'm feeling really dense about now... What am I missing? What you're missing is the entire discussion up to this point. I was looking for a way to use an alternative uniqueness criteria in a set instance without needing to modify my class. Really? Is that what you wanted? You know, I would never have guessed that from If I understand things correctly, the set class uses hash() universally to calculate hash values for its elements. Is there a standard way to have set use a different function? But hey, perhaps my reading comprehension was not at its best yesterday. Hmmm, actually, I think I did get that. What I missed is why you care which object stays in your precious little set, as long as you have one? If it doesn't matter, use a dict and stop wasting our time. If it does, tell us why and assuage our curiousity. So is that the behavior you're wanting, keeping the first object and discarding all others? Or is there something else I'm still missing? Yes and yes. I want normal set behavior, but I want the set to use user-provided hash and equality tests, i.e. ones that don't necessarily call __hash__ and __eq__ on the candidate elements. Austin As for what you want: No, it's not currently possible. If it's so big a deal that the various methods presented don't meet with your approval, break out the C and write your own. Then you could give that back to the community instead of your snide remarks. ~Ethan~ -- http://mail.python.org/mailman/listinfo/python-list
Re: set using alternative hash function?
On Oct 16, 12:24 pm, Ethan Furman et...@stoneleaf.us wrote: [snip] As for what you want: No, it's not currently possible. If it's so big a deal that the various methods presented don't meet with your approval, break out the C and write your own. Then you could give that back to the community instead of your snide remarks. ~Ethan~ I didn't get the impression that Austin was being snide. Instead, I get the impression that they are someone from a different programming background (C++/STL) who has not had sufficient exposure to the Python way of doing things. I believe Austin is genuinely curious as to why Python may not implement features found in another programming environment. Coming from a C/C++ background myself, it took me a while to un-learn certain idioms. I still find myself thinking in C on occasion, only to find a more elegant way to accomplish the task. -- http://mail.python.org/mailman/listinfo/python-list
Re: set using alternative hash function?
On Oct 15, 9:31 pm, Austin Bingham austin.bing...@gmail.com wrote: Yes, what you've got there provides the interface of what I want. And no doubt we could concoct countless ways to implement some version of this. However, if set itself accepted an alternate hash function, then I could do it with no extra overhead. Consider your implementation: it requires an extra set (extra space) and an extra lookup on many operations (extra time.) My original hope was to not have to do that. Yes, Python let you down. You've said that repeatedly. Most people are happy to accept the extra overhead of homemade solution. You're not. We get it. If you want to do something about it, write a patch with some test cases and submit it. If you want to do something half-assed about it, submit hash= key as a wishlist item. If you want to do absolutely nothing about it, continue to whine about it here. Carl Banks -- http://mail.python.org/mailman/listinfo/python-list
Re: set using alternative hash function?
On Thu, Oct 15, 2009 at 4:24 AM, Austin Bingham austin.bing...@gmail.com wrote: If I understand things correctly, the set class uses hash() universally to calculate hash values for its elements. Is there a standard way to have set use a different function? Say I've got a collection of objects with names. I'd like to create a set of these objects where the hashing is done on these names. Using the __hash__ function seems inelegant because it means I have to settle on one type of hashing for these objects all of the time, i.e. I can't create a set of them based on a different uniqueness criteria later. I'd like to create a set instance that uses, say, 'hash(x.name)' rather than 'hash(x)'. Is this possible? Am I just thinking about this problem the wrong way? Admittedly, I'm coming at this from a C++/STL perspective, so perhaps I'm just missing the obvious. Thanks for any help on this. You could use wrapper objects that define an appropriate __hash__(): #*completely untested* class HashWrapper(object): def __init__(self, obj, criteria): self._wrapee = obj self._criteria = criteria #override __hash__() / hash() def __hash__(self): return hash(self._criteria(self._wrapee)) #proxying code def __getattr__(self, name): return getattr(self._wrapee, name) def __setattr__(self, name, val): setattr(self._wrapee, name, val) #example def name_of(obj): return obj.name def name_and_serial_num(obj): return obj.name, obj.serial_number no_same_names = set(HashWrapper(obj, name_of) for obj in some_collection) no_same_name_and_serial = set(HashWrapper(obj, name_and_serial_num) for obj in some_collection) Cheers, Chris -- http://blog.rebertia.com -- http://mail.python.org/mailman/listinfo/python-list
Re: set using alternative hash function?
Austin Bingham austin.bing...@gmail.com wrote: If I understand things correctly, the set class uses hash() universally to calculate hash values for its elements. Is there a standard way to have set use a different function? Say I've got a collection of objects with names. I'd like to create a set of these objects where the hashing is done on these names. Using the __hash__ function seems inelegant because it means I have to settle on one type of hashing for these objects all of the time, i.e. I can't create a set of them based on a different uniqueness criteria later. I'd like to create a set instance that uses, say, 'hash(x.name)' rather than 'hash(x)'. Is this possible? Am I just thinking about this problem the wrong way? Admittedly, I'm coming at this from a C++/STL perspective, so perhaps I'm just missing the obvious. Thanks for any help on this. It doesn't make sense to use just part of an object as the key for a set. A set is just a collection of values and there is no key separate from the value itself. If you build a set from x.name then that works fine, but only the names are stored. What you want in this particular case is a dict: a dict stores both a key and a value and lets you retrieve the value from the key. -- Duncan Booth http://kupuguy.blogspot.com -- http://mail.python.org/mailman/listinfo/python-list
Re: set using alternative hash function?
That's definitely a workable solution, but it still rubs me the wrong way. The uniqueness criteria of a set seems, to me, like a property of the set, whereas the python model forces it onto each set element. Another issue I have with the HashWrapper approach is its space requirements. Logically, what I'm asking to do is switch out a single function reference (i.e. to point at get_name() rather than hash()), but in practice I'm forced to instantiate a new object for each of my set members. On a large set, this could be disastrous. Don't get me wrong...your solution is a good one, but it's just not what I am looking for. Austin On Thu, Oct 15, 2009 at 1:36 PM, Chris Rebert c...@rebertia.com wrote: On Thu, Oct 15, 2009 at 4:24 AM, Austin Bingham austin.bing...@gmail.com wrote: If I understand things correctly, the set class uses hash() universally to calculate hash values for its elements. Is there a standard way to have set use a different function? Say I've got a collection of objects with names. I'd like to create a set of these objects where the hashing is done on these names. Using the __hash__ function seems inelegant because it means I have to settle on one type of hashing for these objects all of the time, i.e. I can't create a set of them based on a different uniqueness criteria later. I'd like to create a set instance that uses, say, 'hash(x.name)' rather than 'hash(x)'. Is this possible? Am I just thinking about this problem the wrong way? Admittedly, I'm coming at this from a C++/STL perspective, so perhaps I'm just missing the obvious. Thanks for any help on this. You could use wrapper objects that define an appropriate __hash__(): #*completely untested* class HashWrapper(object): def __init__(self, obj, criteria): self._wrapee = obj self._criteria = criteria #override __hash__() / hash() def __hash__(self): return hash(self._criteria(self._wrapee)) #proxying code def __getattr__(self, name): return getattr(self._wrapee, name) def __setattr__(self, name, val): setattr(self._wrapee, name, val) #example def name_of(obj): return obj.name def name_and_serial_num(obj): return obj.name, obj.serial_number no_same_names = set(HashWrapper(obj, name_of) for obj in some_collection) no_same_name_and_serial = set(HashWrapper(obj, name_and_serial_num) for obj in some_collection) Cheers, Chris -- http://blog.rebertia.com -- http://mail.python.org/mailman/listinfo/python-list
Re: set using alternative hash function?
I guess we see things differently. I think it's quite natural to want a set of unique objects where unique is defined as an operation on some subset/conflation/etc. of the attributes of the elements. That's all that the regular set class is, except that it always uses the hash() function to calculate uniqueness. In any event, the notions of set and uniqueness I'm using are well established in other languages, so I don't see why it couldn't be made to work in python. As far as using a dict, that doesn't really solve my problem. It could be part of a solution, I guess, but I would still need functionality extrinsic to the dict. What I want is to make sure that no values in my set have the same name, and dict won't guarantee that for me. A set that calculated uniqueness based on its elements' names, on the other hand, would. Austin On Thu, Oct 15, 2009 at 1:48 PM, Duncan Booth duncan.bo...@invalid.invalid wrote: It doesn't make sense to use just part of an object as the key for a set. A set is just a collection of values and there is no key separate from the value itself. If you build a set from x.name then that works fine, but only the names are stored. What you want in this particular case is a dict: a dict stores both a key and a value and lets you retrieve the value from the key. -- Duncan Booth http://kupuguy.blogspot.com -- http://mail.python.org/mailman/listinfo/python-list -- http://mail.python.org/mailman/listinfo/python-list
Re: set using alternative hash function?
Austin Bingham wrote: If I understand things correctly, the set class uses hash() universally to calculate hash values for its elements. Is there a standard way to have set use a different function? Say I've got a collection of objects with names. I'd like to create a set of these objects where the hashing is done on these names. Using the __hash__ function seems inelegant because it means I have to settle on one type of hashing for these objects all of the time, i.e. I can't create a set of them based on a different uniqueness criteria later. I'd like to create a set instance that uses, say, 'hash(x.name)' rather than 'hash(x)'. Is this possible? Am I just thinking about this problem the wrong way? Admittedly, I'm coming at this from a C++/STL perspective, so perhaps I'm just missing the obvious. Thanks for any help on this. This is a principal problem in OO - behavior shall not only change based on the object, but also the context it's used in. I think the decorator-pattern is the best thing here (in lieu of python having a hash-argument to sets). Please don't forget to re-define equality also! class OtherCriteriaDecorator(object): def __init__(self, delegate): self._delegate = delegate def __hash__(self): return hash_based_on_delegate def __eq__(self, other): return equality_based_on_delegate HTH, Diez -- http://mail.python.org/mailman/listinfo/python-list
Re: set using alternative hash function?
Chris Rebert wrote: On Thu, Oct 15, 2009 at 4:24 AM, Austin Bingham austin.bing...@gmail.com wrote: If I understand things correctly, the set class uses hash() universally to calculate hash values for its elements. Is there a standard way to have set use a different function? Say I've got a collection of objects with names. I'd like to create a set of these objects where the hashing is done on these names. Using the __hash__ function seems inelegant because it means I have to settle on one type of hashing for these objects all of the time, i.e. I can't create a set of them based on a different uniqueness criteria later. I'd like to create a set instance that uses, say, 'hash(x.name)' rather than 'hash(x)'. Is this possible? Am I just thinking about this problem the wrong way? Admittedly, I'm coming at this from a C++/STL perspective, so perhaps I'm just missing the obvious. Thanks for any help on this. You could use wrapper objects that define an appropriate __hash__(): #*completely untested* class HashWrapper(object): def __init__(self, obj, criteria): self._wrapee = obj self._criteria = criteria #override __hash__() / hash() def __hash__(self): return hash(self._criteria(self._wrapee)) #proxying code def __getattr__(self, name): return getattr(self._wrapee, name) def __setattr__(self, name, val): setattr(self._wrapee, name, val) This doesn't work for conflicting elements, as the __eq__-method isn't overriden. Diez -- http://mail.python.org/mailman/listinfo/python-list
Re: set using alternative hash function?
Austin Bingham wrote: That's definitely a workable solution, but it still rubs me the wrong way. The uniqueness criteria of a set seems, to me, like a property of the set, whereas the python model forces it onto each set element. This is a POV, but to to me, the set just deals with a very minimal protocol - hash-value equality. Whatever you feed it, it has to cope with that. It strikes *me* as odd to ask for something else. The ideal solution would be to have several hash/eq-methods on your object, and somehow make the surroundings decide which to use. But there is no OO-pragma for that. Another issue I have with the HashWrapper approach is its space requirements. Logically, what I'm asking to do is switch out a single function reference (i.e. to point at get_name() rather than hash()), but in practice I'm forced to instantiate a new object for each of my set members. On a large set, this could be disastrous. Don't get me wrong...your solution is a good one, but it's just not what I am looking for. It is the only one you have, short of implementing set your own. And then the terms of implementation would be that you pass a hash- and eq-function create wrapper-objects internally. Diez -- http://mail.python.org/mailman/listinfo/python-list
Re: set using alternative hash function?
Austin Bingham wrote: On Thu, Oct 15, 2009 at 2:23 PM, Diez B. Roggisch de...@nospam.web.de wrote: Austin Bingham wrote: This is a POV, but to to me, the set just deals with a very minimal protocol - hash-value equality. Whatever you feed it, it has to cope with that. It strikes *me* as odd to ask for something else. But I'm not really asking for anything that changes the paradigm of how things currently work. All I need is the ability to say something like this: s = set(hash_func = lambda x: hash(x.name)) If set could be de-hardcoded from using hash(), nothing else about how it works would have to change. Or am I wrong on that? I see that you mention equality in the protocol, but I don't know why a set would need that if a) hash(x) == hash(y) -- x == y and b) hash function must return a 32 bit value (which I think is the case)...so maybe I misunderstand something. You do. Hashes can collide, and then you need equality. Sets are *based* on equality actually, the hash is just one optimization. Other optimizations build on order (usually, the STL requires you to define on objects for that. Which is enough because a == b = a b b a). But eventually, it all boils down to equality. You could implement a set without hash, by imposing linear behavior on insert and lookup - but none based purely on a hash. I wonder...does the requirement of using hash() have to do with the low-level implementation of set? That might explain the state of things. The ideal solution would be to have several hash/eq-methods on your object, and somehow make the surroundings decide which to use. But there is no OO-pragma for that. But why force objects to know about all of the wacky contexts in which they might be used? Let objects be objects and hash-functions be hash-functions. If someone wants a set where uniqueness is defined by the number of occurrences of 'a' in an object's name, the object should never have to know that...it should just have a name. Because these functions need intimate knowledge of the objects to actually work. Sure, in python it's easy to define them outside, and just reach into the object as you wish. But aside this implementation detail, a hash (and equality of course) always are based upon the object in question. So I think it's natural to have it defined right there (which __hash__ and __eq__ show that this seems to be accepted in general). And if there were something that would decide on context which of several implementations to use, you'd have less to worry. As things are, there isn't such thing (I don't even have the slightest idea what could work), you are as well off with defining two functions. Diez -- http://mail.python.org/mailman/listinfo/python-list
Re: set using alternative hash function?
On Thu, Oct 15, 2009 at 3:02 PM, Diez B. Roggisch de...@nospam.web.de wrote: Austin Bingham wrote: You do. Hashes can collide, and then you need equality. Sets are *based* on equality actually, the hash is just one optimization. ... Right, thanks for clearing that up. Not reading closely enough == public shaming ;) Because these functions need intimate knowledge of the objects to actually work. Sure, in python it's easy to define them outside, and just reach into the object as you wish. But aside this implementation detail, a hash (and equality of course) always are based upon the object in question. So I think it's natural to have it defined right there (which __hash__ and __eq__ show that this seems to be accepted in general). And if there were something that would decide on context which of several implementations to use, you'd have less to worry. As things are, there isn't such thing (I don't even have the slightest idea what could work), you are as well off with defining two functions. But this context decider you're talking about sounds exactly like what I'm talking about. If I had some class with __hash1__ and __hash2__ (and associated equality operators), you're saying it would be nice to be able to select which to use based on the context (e.g. what type of set I'm using.) It might look like this: s = set(hash_func = lambda x: x.__hash2__, eq_func = lambda x, y: x.__eq2__(y)) And if *that* works for you, do you still have a problem with: s = set(hash_func = lambda x: hash(x.name), eq_func = lambda x,y: x.name == y.name) ? If you don't like those, what would work for you? As far as I can tell, your context decider and my alternative hash functions are the same thing, and we just need to agree on a syntax. -- http://mail.python.org/mailman/listinfo/python-list
Re: set using alternative hash function?
And if there were something that would decide on context which of several implementations to use, you'd have less to worry. As things are, there isn't such thing (I don't even have the slightest idea what could work), you are as well off with defining two functions. But this context decider you're talking about sounds exactly like what I'm talking about. If I had some class with __hash1__ and __hash2__ (and associated equality operators), you're saying it would be nice to be able to select which to use based on the context (e.g. what type of set I'm using.) It might look like this: s = set(hash_func = lambda x: x.__hash2__, eq_func = lambda x, y: x.__eq2__(y)) And if *that* works for you, do you still have a problem with: s = set(hash_func = lambda x: hash(x.name), eq_func = lambda x,y: x.name == y.name) ? If you don't like those, what would work for you? As far as I can tell, your context decider and my alternative hash functions are the same thing, and we just need to agree on a syntax. The context-decider isn't the same thing because it isn't designed yet :) And most probably won't ever be. It's just the abstract idea that hashing/equality change for one object depending on the circumstances they are used in, and that one wishes that the decision what to use is as simple transparent as possible. Your approach is certainly viable, but I guess the current set-implementation is optimized on working with __hash__ and __eq__ on objects because for these exist slots in the python object structure in C. So while you can implement your idea, you will end up passing wrappers as Chris me proposed into the current set implementation. However, it might be worth thinking of proposing this change to the set-type in general. But then for orthogonality, dicts should have it as well I guess. Which opens a whole new can of worms. I'm undecided on this. I think the cure is straight forward simple to implement, and once you have the level to understand need different hashing/equality, you should be comfortable writing this simple wrapper yourself. So for me personally, it's not a needed addition to the language. YMMV, and thus - go for it if you like. Diez -- http://mail.python.org/mailman/listinfo/python-list
Re: set using alternative hash function?
Austin Bingham schrieb: I guess we see things differently. I think it's quite natural to want a set of unique objects where unique is defined as an operation on some subset/conflation/etc. of the attributes of the elements. What you seem to imply is that the hash function imposes some kind of uniqueness constraint on the set which uses it. That's just not the case, the uniqueness constraint is always the (in-)equality of objects, and for this you can override __eq__. But then you must also ensure that x.__eq__(y) -- y.__eq__(x) x.__hash() == y.__hash__(). Diez' solution is the pythonic way, and BTW, mathematically and pythonically sound, wheras, if the hash function would really impose uniqueness: s1 = set(lambda x: hash(x)) s2 = set(lambda x: hash(x.name)) t1 = Buddy(name=jim) t2 = Buddy(name=joe) t3 = Buddy(name=joe) s1.add(t1) s1.add(t2) s1.add(t3) s2.add(t1) s2.add(t2) s2.add(t3) would probaly yield quite strange results: s3 = s2 | s1 # union. Is s3 == (t1,t2,t3) or is it (t1,t3)? s4 = s2 - s1 # difference. is s4 == () or do we get an Exception? In C++/STL the compiler would nag because s1 and s2 are of different types since the hash function is part of the set's type (AFAIR). With dynamic typing this isn't possible so you'd have to wait until runtime. I'd go with Diez' advice and use a dict instead. Or, you could do something Borg-ish (lookup Borg Pattern) restrained on names, where the constructor of a class ensures that x.name == y.name -- x == y. As far as using a dict, that doesn't really solve my problem. It could be part of a solution, I guess, but I would still need functionality extrinsic to the dict. Python isn't strong on encapsulation, so extrinsic functionality is quite common. Nothing to loose sleep over though, because, as Guido says, we're all consenting adults here :-) Mick. -- http://mail.python.org/mailman/listinfo/python-list
Re: set using alternative hash function?
On Thu, Oct 15, 2009 at 3:43 PM, Diez B. Roggisch de...@nospam.web.de wrote: The context-decider isn't the same thing because it isn't designed yet :) And most probably won't ever be. It's just the abstract idea that hashing/equality change for one object depending on the circumstances they are used in, and that one wishes that the decision what to use is as simple transparent as possible. Fair enough :) Your approach is certainly viable, but I guess the current set-implementation is optimized on working with __hash__ and __eq__ on objects because for these exist slots in the python object structure in C. So while you can implement your idea, you will end up passing wrappers as Chris me proposed into the current set implementation. Yes, I figured that the guts of set relied on particulars to which we are not privy at the python level. If the syntax let me describe sets the way I've been laying out here, I could probably tolerate the underlying implementation relying on wrappers. However, it might be worth thinking of proposing this change to the set-type in general. But then for orthogonality, dicts should have it as well I guess. Which opens a whole new can of worms. dicts would certainly have to be looked at as well, but I don't think the can would have that many worms in it if we solve the set issue to everyone's satisfaction. In any event, thanks for helping me work through this issue. Austin -- http://mail.python.org/mailman/listinfo/python-list
Re: set using alternative hash function?
On Oct 15, 7:24 am, Austin Bingham austin.bing...@gmail.com wrote: [snip] I'd like to create a set of these objects where the hashing is done on these names. [snip] Why not use a dict? The key would be the object name. Pretty much the same behavior as a set (via the key), and you can still easily iterate over the objects. -- http://mail.python.org/mailman/listinfo/python-list
Re: set using alternative hash function?
On Thu, Oct 15, 2009 at 3:50 PM, Mick Krippendorf mad.m...@gmx.de wrote: Austin Bingham schrieb: What you seem to imply is that the hash function imposes some kind of uniqueness constraint on the set which uses it. That's just not the case, the uniqueness constraint is always the (in-)equality of objects, and for this you can override __eq__. But then you must also ensure that x.__eq__(y) -- y.__eq__(x) x.__hash() == y.__hash__(). Yes, as was pointed out earlier, I was reading too much into the hash function's role. However, given well behaved hash and equality (which would be a great name for a band), I don't see why those functions need to be defined on the object itself, per se. Right now that's the case because hash() only knows how to call obj.__hash__ (etc. for __eq__). I guess a good analog for what I'm thinking about is list.sort(). It's more than happy to take a comparison operator, and in spirit that's exactly what I'm looking for with sets. Diez' solution is the pythonic way, and BTW, mathematically and pythonically sound, wheras, if the hash function would really impose uniqueness: . . . Yes, my gray c++ roots are showing here; you're right that my brain was secretly expecting the compiler to take care of things. Your points about set operations are the strongest in this discussion, and I haven't got a great answer. Python isn't strong on encapsulation, so extrinsic functionality is quite common. I guess that's part of what's frustrating for me on this issue. Python is generally extremely flexible and open, but in this case I feel like I'm being shut out. Oh well...in the end it looks like there are ways to get what I need, if not what I want. Thanks for the input. Austin -- http://mail.python.org/mailman/listinfo/python-list
Re: set using alternative hash function?
On Thu, Oct 15, 2009 at 4:06 PM, Anthony Tolle anthony.to...@gmail.com wrote: Why not use a dict? The key would be the object name. Pretty much the same behavior as a set (via the key), and you can still easily iterate over the objects. To reiterate, dict only gets me part of what I want. Whereas a set with uniqueness defined over 'obj.name' would guarantee no name collisions, dict only sorta helps me keep things straight; it doesn't actually enforce that my values have unique names. Austin -- http://mail.python.org/mailman/listinfo/python-list
Re: set using alternative hash function?
On Oct 15, 10:42 am, Austin Bingham austin.bing...@gmail.com wrote: On Thu, Oct 15, 2009 at 4:06 PM, Anthony Tolle To reiterate, dict only gets me part of what I want. Whereas a set with uniqueness defined over 'obj.name' would guarantee no name collisions, dict only sorta helps me keep things straight; it doesn't actually enforce that my values have unique names. I don't understand how a set would help you enforce that your values ave unique names. If you have two objects, both named foo, and added them to the set, no errors would occur. Please provide an example of how a set provides functionality that a dict (which enforces unique keys) doesn't. -- http://mail.python.org/mailman/listinfo/python-list
Re: set using alternative hash function?
On Thu, Oct 15, 2009 at 5:22 AM, Diez B. Roggisch de...@nospam.web.de wrote: Chris Rebert wrote: On Thu, Oct 15, 2009 at 4:24 AM, Austin Bingham austin.bing...@gmail.com wrote: If I understand things correctly, the set class uses hash() universally to calculate hash values for its elements. Is there a standard way to have set use a different function? Say I've got a collection of objects with names. I'd like to create a set of these objects where the hashing is done on these names. Using the __hash__ function seems inelegant because it means I have to settle on one type of hashing for these objects all of the time, i.e. I can't create a set of them based on a different uniqueness criteria later. I'd like to create a set instance that uses, say, 'hash(x.name)' rather than 'hash(x)'. Is this possible? Am I just thinking about this problem the wrong way? Admittedly, I'm coming at this from a C++/STL perspective, so perhaps I'm just missing the obvious. Thanks for any help on this. You could use wrapper objects that define an appropriate __hash__(): #*completely untested* class HashWrapper(object): def __init__(self, obj, criteria): self._wrapee = obj self._criteria = criteria #override __hash__() / hash() def __hash__(self): return hash(self._criteria(self._wrapee)) #proxying code def __getattr__(self, name): return getattr(self._wrapee, name) def __setattr__(self, name, val): setattr(self._wrapee, name, val) This doesn't work for conflicting elements, as the __eq__-method isn't overriden. Indeed. Good catch. :) This is why I mark code as completely untested. Cheers, Chris -- http://mail.python.org/mailman/listinfo/python-list
Re: set using alternative hash function?
En Thu, 15 Oct 2009 11:42:20 -0300, Austin Bingham austin.bing...@gmail.com escribió: On Thu, Oct 15, 2009 at 4:06 PM, Anthony Tolle anthony.to...@gmail.com wrote: Why not use a dict? The key would be the object name. Pretty much the same behavior as a set (via the key), and you can still easily iterate over the objects. To reiterate, dict only gets me part of what I want. Whereas a set with uniqueness defined over 'obj.name' would guarantee no name collisions, dict only sorta helps me keep things straight; it doesn't actually enforce that my values have unique names. I think you didn't understand correctly Anthony Tolle's suggestion: py class Foo: ... def __init__(self, name): self.name = name ... py objs = [Foo('Joe'), Foo('Jim'), Foo('Tom'), Foo('Jim')] py objs [__main__.Foo instance at 0x00BB8238, __main__.Foo instance at 0x00BB83C8, __main__.Foo instance at 0x00BB7B20, __main__.Foo instance at 0x00BB7DC8] py d = dict((o.name, o) for o in objs) py d.keys() ['Jim', 'Joe', 'Tom'] py d.values() [__main__.Foo instance at 0x00BB7DC8, __main__.Foo instance at 0x00BB8238, __main__.Foo instance at 0x00BB7B20] keys in a dict are unique; values form the set you're looking for. If it's more complex that just a name, define a function: def my_funny_key_extractor(item): return some(operations[on](item)) d = dict((my_funny_key_extractor(o), o) for o in objs) If you feel so inclined, you could implement the MutableSet ABC based on this and have a full featured set implementation. -- Gabriel Genellina -- http://mail.python.org/mailman/listinfo/python-list
Re: set using alternative hash function?
On Thu, Oct 15, 2009 at 5:15 PM, Gabriel Genellina gagsl-...@yahoo.com.ar wrote: En Thu, 15 Oct 2009 11:42:20 -0300, Austin Bingham austin.bing...@gmail.com escribió: I think you didn't understand correctly Anthony Tolle's suggestion: py class Foo: ... def __init__(self, name): self.name = name ... py objs = [Foo('Joe'), Foo('Jim'), Foo('Tom'), Foo('Jim')] py objs I understand Anthony perfectly. Yes, I can construct a dict as you specify, where all of the keys map to values with name attributes equal to the key. My point is that dict doesn't really help me enforce that beyond simply letting me set it up; it doesn't care about the values at all, just the keys. All that I'm asking for, and I think it's been made pretty clear, is a set that let's me define a uniqueness criteria function other than hash(). As has been thrashed out already, this is not as straightforward as I might have liked. To put it in code, I want this: s = set(hash_func = lambda obj: hash(obj.name), eq_func = ...) ... x.name = 'foo' y.name = 'foo' s.add(x) s.add(y) # no-op because of uniqueness criteria assert len(s) == 1 Austin -- http://mail.python.org/mailman/listinfo/python-list
Re: set using alternative hash function?
On Thu, 15 Oct 2009 09:11:00 -0700, Austin Bingham austin.bing...@gmail.com wrote: On Thu, Oct 15, 2009 at 5:15 PM, Gabriel Genellina gagsl-...@yahoo.com.ar wrote: En Thu, 15 Oct 2009 11:42:20 -0300, Austin Bingham austin.bing...@gmail.com escribió: I think you didn't understand correctly Anthony Tolle's suggestion: py class Foo: ... def __init__(self, name): self.name = name ... py objs = [Foo('Joe'), Foo('Jim'), Foo('Tom'), Foo('Jim')] py objs I understand Anthony perfectly. Yes, I can construct a dict as you specify, where all of the keys map to values with name attributes equal to the key. My point is that dict doesn't really help me enforce that beyond simply letting me set it up; it doesn't care about the values at all, just the keys. Perhaps this is an overly naive solution, but could you not define a class that implemented the set interface but used a dict for internal storage, and use that? You'd still have uniqueness (by dict key, which your class would define as object name) and as a bonus, retrievability by name, which set wouldn't give you. -- Rami Chowdhury Never attribute to malice that which can be attributed to stupidity -- Hanlon's Razor 408-597-7068 (US) / 07875-841-046 (UK) / 0189-245544 (BD) -- http://mail.python.org/mailman/listinfo/python-list
Re: set using alternative hash function?
On Oct 15, 12:11 pm, Austin Bingham austin.bing...@gmail.com wrote: To put it in code, I want this: s = set(hash_func = lambda obj: hash(obj.name), eq_func = ...) ... x.name = 'foo' y.name = 'foo' s.add(x) s.add(y) # no-op because of uniqueness criteria assert len(s) == 1 I wrote a quick subclass of set that does something similar, but uses just one function for the object uniqueness: class MySet(set): def __init__(self, iterable = (), idfunc = lambda x: x): self.idfunc = idfunc self.ids = set() for item in iterable: self.add(item) def add(self, item): id = self.idfunc(item) if id not in self.ids: self.ids.add(id) set.add(self, item) class Foo(object): ... def __init__(self, name): ... self.name = name ... x = Foo('foo') y = Foo('foo') s = MySet(idfunc = lambda x: x.name) s.add(x) s MySet([__main__.Foo object at 0x00A89F90]) s.add(y) s MySet([__main__.Foo object at 0x00A89F90]) Is this what you are looking for? -- http://mail.python.org/mailman/listinfo/python-list
Re: set using alternative hash function?
En Thu, 15 Oct 2009 13:24:18 -0300, Rami Chowdhury rami.chowdh...@gmail.com escribió: On Thu, 15 Oct 2009 09:11:00 -0700, Austin Bingham austin.bing...@gmail.com wrote: On Thu, Oct 15, 2009 at 5:15 PM, Gabriel Genellina gagsl-...@yahoo.com.ar wrote: En Thu, 15 Oct 2009 11:42:20 -0300, Austin Bingham austin.bing...@gmail.com escribió: I think you didn't understand correctly Anthony Tolle's suggestion: py class Foo: ... def __init__(self, name): self.name = name ... py objs = [Foo('Joe'), Foo('Jim'), Foo('Tom'), Foo('Jim')] py objs I understand Anthony perfectly. Yes, I can construct a dict as you specify, where all of the keys map to values with name attributes equal to the key. My point is that dict doesn't really help me enforce that beyond simply letting me set it up; it doesn't care about the values at all, just the keys. Perhaps this is an overly naive solution, but could you not define a class that implemented the set interface but used a dict for internal storage, and use that? You'd still have uniqueness (by dict key, which your class would define as object name) and as a bonus, retrievability by name, which set wouldn't give you. Like this? from collections import MutableSet class KeyedSet(MutableSet): A set with a custom element comparison function. def __init__(self, iterable, key=lambda x: x): Create a KeyedSet from iterable; key function determines uniqueness. `key` must be a callable taking a single argument; it is applied to every item in the iterable, and its result is used to determine set membership. That is, if key(item) returns the same thing for two items, only one of them will be in the set. self._items = dict((key(item),item) for item in iterable) self._key = key # NOT a classmethod because self.key must be transferred too! # Fortunately it is always called as self._from_iterable(...) # in _abccoll.py def _from_iterable(self, iterable): return type(self)(iterable, key=self._key) def add(self, value): Return True if it was added, False if already there. key = self._key(value) if key in self._items: return False self._items[key] = value return True def discard(self, value): Return True if it was deleted, False if not there. key = self._key(value) if key not in self._items: return False del self._items[key] return True def clear(self): self._items.clear() def copy(self): return type(self)(self._items.values(), key=self._key) def __iter__(self): return iter(self._items.values()) def __contains__(self, value): try: return self._key(value) in self._items except Exception: return False def __len__(self): return len(self._items) def __repr__(self): return %s(%r) % (type(self).__name__, self._items.values()) def demo(): class Foo(object): def __init__(self, name): self.name = name def __repr__(self): return %s(%r) % (type(self).__name__, self.name) objs = [Foo('Joe'), Foo('Jim'), Foo('Tom'), Foo('Jim')] print objs s = KeyedSet(objs, key=lambda o:o.name) print len(s), s s.add(Foo('Joe')) print len(s), s moe = Foo('Moe') print moe in s, moe in s s.add(moe) print moe in s, moe in s print 'moe' in s, 'moe' in s s2 = set([Foo('Luc'), Foo('Jim'), Foo('Dan')]) print s | s2, s | s2 print s s2, s s2 s3 = KeyedSet(s2, key=lambda o:o.name) print s3 - s, s3 - s print s - s3, s - s3 print s.copy(), s.copy() demo() -- Gabriel Genellina -- http://mail.python.org/mailman/listinfo/python-list
Re: set using alternative hash function?
Austin Bingham wrote: Yes, I can construct a dict as you specify, where all of the keys map to values with name attributes equal to the key. My point is that dict doesn't really help me enforce that beyond simply letting me set it up; it doesn't care about the values at all, just the keys. All that I'm asking for, and I think it's been made pretty clear, is a set that let's me define a uniqueness criteria function other than hash(). As has been thrashed out already, this is not as straightforward as I might have liked. To put it in code, I want this: s = set(hash_func = lambda obj: hash(obj.name), eq_func = ...) ... x.name = 'foo' y.name = 'foo' s.add(x) s.add(y) # no-op because of uniqueness criteria assert len(s) == 1 Austin I'm feeling really dense about now... What am I missing? class obj(object): def __init__(self, id, value): self.id = id self.value = value def __eq__(self, other): if isinstance(other, obj): return self.id == other.id raise NotImplemented def __hash__(self): return hash(self.id) def __repr__(self): return obj(%s, %s) % (self.id, self.value) Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit (Intel)] on win32 Type help, copyright, credits or license for more information. -- from obj import obj -- s1 = set() -- d1 = dict() -- o1 = obj(17, 'first') -- o2 = obj(18, 'second') -- o3 = obj(17, 'third') -- o1, o2, o3 (obj(17, first), obj(18, second), obj(17, third)) -- s1.add(o1) -- s1.add(o2) -- s1.add(o3) -- d1[o1.id] = o1 -- d1[o2.id] = o2 -- d1[o3.id] = o3 -- s1 set([obj(17, first), obj(18, second)]) -- d1 {17: obj(17, third), 18: obj(18, second)} -- Ah ha! No-op indeed! If a set already has an item, the new item is completely ignored, whereas if a dict already has a key, the new key's value replaces the current value already in the dict. Learn something new everyday! I'm still not sure I understand your concern about the values in a set, though. Sets keep the first object of a given key, dicts keep the last object of a given key; in both cases, all other objects with the same key are lost. So is that the behavior you're wanting, keeping the first object and discarding all others? Or is there something else I'm still missing? ~Ethan~ -- http://mail.python.org/mailman/listinfo/python-list
Re: set using alternative hash function?
On Oct 15, 1:49 pm, Ethan Furman et...@stoneleaf.us wrote: I'm still not sure I understand your concern about the values in a set, though. Sets keep the first object of a given key, dicts keep the last object of a given key; in both cases, all other objects with the same key are lost. So is that the behavior you're wanting, keeping the first object and discarding all others? Or is there something else I'm still missing? I think that without a practical example of what this would be used for, we're all going to be a little lost on this one. So far, we've not seen the original problem, only the author's preferred method for solving it. My guess is there are other, more pythonic ways to solve the original problem. -- http://mail.python.org/mailman/listinfo/python-list
Re: set using alternative hash function?
On Thu, Oct 15, 2009 at 7:05 PM, Anthony Tolle anthony.to...@gmail.com wrote: I wrote a quick subclass of set that does something similar, but uses just one function for the object uniqueness: class MySet(set): def __init__(self, iterable = (), idfunc = lambda x: x): self.idfunc = idfunc self.ids = set() for item in iterable: self.add(item) def add(self, item): id = self.idfunc(item) if id not in self.ids: self.ids.add(id) set.add(self, item) Yes, what you've got there provides the interface of what I want. And no doubt we could concoct countless ways to implement some version of this. However, if set itself accepted an alternate hash function, then I could do it with no extra overhead. Consider your implementation: it requires an extra set (extra space) and an extra lookup on many operations (extra time.) My original hope was to not have to do that. Austin -- http://mail.python.org/mailman/listinfo/python-list
Re: set using alternative hash function?
On Thu, Oct 15, 2009 at 7:49 PM, Ethan Furman et...@stoneleaf.us wrote: Austin Bingham wrote: I'm feeling really dense about now... What am I missing? What you're missing is the entire discussion up to this point. I was looking for a way to use an alternative uniqueness criteria in a set instance without needing to modify my class. So is that the behavior you're wanting, keeping the first object and discarding all others? Or is there something else I'm still missing? Yes and yes. I want normal set behavior, but I want the set to use user-provided hash and equality tests, i.e. ones that don't necessarily call __hash__ and __eq__ on the candidate elements. Austin -- http://mail.python.org/mailman/listinfo/python-list
Re: set using alternative hash function?
On Thu, Oct 15, 2009 at 8:10 PM, Anthony Tolle anthony.to...@gmail.com wrote: I think that without a practical example of what this would be used for, we're all going to be a little lost on this one. So far, we've not seen the original problem, only the author's preferred method for solving it. My guess is there are other, more pythonic ways to solve the original problem. The original problem was just a question statement: can I use alternative uniqueness functions on a set? Indeed, I proposed an idea, which that was sets could be constructed with user-defined hash and equality functions, the strengths and weaknesses of which have been gone over in some detail. The short answer is that what I was looking for (admittedly, a desire motivated by experiences in other languages) is not really feasible, at least not without a great deal more work. The other proposed solutions all require linearly extra storage, linearly extra time, both, or simply don't solve the problem. And in any event, they miss the point of the original post which was not How can I get a particular behavior? (which is fairly trivial, both in my own estimation and as evidenced by the abundance of proposals) but Can I get a particular behavior in a particular way? (the answer to which, again, seems to be no.) Austin -- http://mail.python.org/mailman/listinfo/python-list