Case-insensitive dict, non-destructive, fast, anyone?
I need a dict (well, it would be optimal anyway) class that stores the keys as strings without coercing the case to upper or lower, but still provides fast lookup (i.e. uses hash table). >> d = CiDict([('Hi', 12),('hoho',13)]) >> d['hi'] 12 >> d.keys() ['Hi','hoho'] Note that 'Hi' preserved the case. I imagine that 'Hi' and 'hi' would need to share the same hash value in order for the lookup to be fast. Anyone have a an implementation that I could use? Quick googling only produced implementations that coerce all keys to lowercase. -- Ville Vainio http://tinyurl.com/2prnb -- http://mail.python.org/mailman/listinfo/python-list
Re: Case-insensitive dict, non-destructive, fast, anyone?
Ville Vainio wrote: I need a dict (well, it would be optimal anyway) class that stores the keys as strings without coercing the case to upper or lower, but still provides fast lookup (i.e. uses hash table). Store the original key together with the value and use a lowercase key for lookup. only a sketch: class MyDict: def __init__ (self): self.inner = {} def __setitem__ (self, key, value): self.inner [key.lower ()] = (key, value] def __getitem__ (self, key): realkey, realvalue = self.inner [self] return realvalue def get (self, key, default = None): try: return self [key] except KeyError: return default # or: return self.inner.get (key.lower (), (None, default)) [1] def keys (self): return [realkey for realkey, realval in self.inner.values ()] def values (self): return [realval for realkey, realval in self.inner.values )] def items (): return self.inner.values () # iterators are left as an exercise -- http://mail.python.org/mailman/listinfo/python-list
Re: Case-insensitive dict, non-destructive, fast, anyone?
> "Daniel" == Daniel Dittmar <[EMAIL PROTECTED]> writes: Daniel> Ville Vainio wrote: >> I need a dict (well, it would be optimal anyway) class that >> stores the keys as strings without coercing the case to upper >> or lower, but still provides fast lookup (i.e. uses hash >> table). Daniel> Store the original key together with the value and use a Daniel> lowercase key for lookup. That's what I thought initially, but the strings take most of the space in dict and I didn't feel like doubling the size. It would be the "simplest thing that could possibly work", though. -- Ville Vainio http://tinyurl.com/2prnb -- http://mail.python.org/mailman/listinfo/python-list
Re: Case-insensitive dict, non-destructive, fast, anyone?
Ville Vainio wrote: "Daniel" == Daniel Dittmar <[EMAIL PROTECTED]> writes: Daniel> Ville Vainio wrote: >> I need a dict (well, it would be optimal anyway) class that >> stores the keys as strings without coercing the case to upper >> or lower, but still provides fast lookup (i.e. uses hash >> table). Daniel> Store the original key together with the value and use a Daniel> lowercase key for lookup. That's what I thought initially, but the strings take most of the space in dict and I didn't feel like doubling the size. You could write a string wrapper that changes comparison and hashing. I'm not sure that this (+ 1 (Python object + instance dictionary)) would use less memory than the other proposal (+ 1 Tuple + 1 String). Daniel -- http://mail.python.org/mailman/listinfo/python-list
Re: Case-insensitive dict, non-destructive, fast, anyone?
On 01 Apr 2005 15:55:58 +0300, Ville Vainio <[EMAIL PROTECTED]> wrote: >> "Daniel" == Daniel Dittmar <[EMAIL PROTECTED]> writes: > >Daniel> Ville Vainio wrote: > >>> I need a dict (well, it would be optimal anyway) class that >>> stores the keys as strings without coercing the case to upper >>> or lower, but still provides fast lookup (i.e. uses hash >>> table). > >Daniel> Store the original key together with the value and use a >Daniel> lowercase key for lookup. > >That's what I thought initially, but the strings take most of the >space in dict and I didn't feel like doubling the size. > >It would be the "simplest thing that could possibly work", though. Try access the keys indirectly though another dictionary. That way you don't have to change the original. Lkeys = {} for k dict.keys(): Lkeys[ k.lower] = dict[k] Then use: value = dict[ Lkeys[ key.lower() ] ] To get your value from the original dictionary. Watch out for duplicate keys in differing case in the original dict. Ron -- http://mail.python.org/mailman/listinfo/python-list
Re: Case-insensitive dict, non-destructive, fast, anyone?
[Ville Vainio] > I need a dict (well, it would be optimal anyway) class that stores the > keys as strings without coercing the case to upper or lower, but still > provides fast lookup (i.e. uses hash table). >>> class S(str): def __hash__(self): return hash(self.lower()) def __eq__(self, other): return self.lower() == other.lower() >>> d = {} >>> d[S('ThE')] = 'quick' >>> d[S('the')] 'quick' >>> d {'ThE': 'quick'} Raymond Hettinger -- http://mail.python.org/mailman/listinfo/python-list
Re: Case-insensitive dict, non-destructive, fast, anyone?
On Fri, 01 Apr 2005 18:52:00 GMT, "Raymond Hettinger" <[EMAIL PROTECTED]> wrote: >[Ville Vainio] >> I need a dict (well, it would be optimal anyway) class that stores the >> keys as strings without coercing the case to upper or lower, but still >> provides fast lookup (i.e. uses hash table). > > class S(str): >def __hash__(self): >return hash(self.lower()) >def __eq__(self, other): >return self.lower() == other.lower() > > d = {} d[S('ThE')] = 'quick' d[S('the')] >'quick' d >{'ThE': 'quick'} > Building on your S to sneak in a generalized idea ;-) >>> class idict(dict): ... def __setitem__(self, key, value): ... dict.__setitem__(self, S(key), value) ... def __getitem__(self, key): ... return dict.__getitem__(self, S(key)) ... >>> d = idict() >>> d['ThE'] = 'quick' >>> d['the'] 'quick' >>> d {'ThE': 'quick'} Ok, the idea: I wonder if a dict with a general override hook for hashing all keys would be useful. E.g., a dict.__keyhash__ that would take key as arg and default as now returning key.__hash__() but that you could override. Seems like this could potentially be more efficient than key wrappers, and would also make it so you wouldn't have to chase all the affected methods in doing an idict like the above (e.g., get, setdefault, update etc. etc.) Regards, Bengt Richter -- http://mail.python.org/mailman/listinfo/python-list
Re: Case-insensitive dict, non-destructive, fast, anyone?
[Bengt Richter] > I wonder if a dict with a general override hook for hashing all keys would be useful. > E.g., a dict.__keyhash__ that would take key as arg and default as now returning key.__hash__() > but that you could override. Seems like this could potentially be more efficient than key wrappers, > and would also make it so you wouldn't have to chase all the affected methods in doing an idict > like the above (e.g., get, setdefault, update etc. etc.) There has also been a discussion of adding a seqdict that maintains a keys in insertion order. Another idea was to have a defaultdict that could be instructed to create values as necessary (either zero for counting or [] for list building). Putting the three ideas together, perhaps we should write an extension module with a custom dictionary type with methods/attributes implementing those ideas. Essentially, the thought is to put all the bells and whistles in one place. If the extension became popular, it could ultimately wend its way into the collections module. A facetious naming idea would be to call it TrickyDict, a relative to DictMixin and no relation to a former U.S. president ;-) t = TrickyDict() t.keytransform(str.lower) t['abC'] = 1 # case insensitive dictionary assert t['Abc'] == 1 t = TrickyDict() t.setdefaultvalue(0) t['x'] += 1 # very bag like assert t['x'] == 1 t = TrickyDict() t.setdefaultfunction(list) t['x'] += ['first'] # rather like: t.setdefault('x', []).append('first') t['x'] += ['second'] assert t == {'x': ['first', 'second']} t = TrickyDict() t.sortedkeys = True t['x'] = 1 t['y'] = 2 assert t.keys() == ['x', 'y'] # order is guaranteed This universal dictionary type could also incorporate some methods for other recurrent themes such as a inverting a dictionary: def invdict(self) return self: dict((v,k) for k,v in self.iteritems()). For the performance minded, there could also be a control for dictionary speed/space efficiency: t = TrickyDict() t.maxkeydensity = 40 # resize whenever the dictionary is more than 40% full Taken together, these six attributes/methods could cover many wished for features for the 10% of the cases where a regular dictionary doesn't provide the best solution. Raymond -- http://mail.python.org/mailman/listinfo/python-list
Re: Case-insensitive dict, non-destructive, fast, anyone?
On Fri, 01 Apr 2005 23:04:42 GMT, "Raymond Hettinger" <[EMAIL PROTECTED]> wrote: >[Bengt Richter] >> I wonder if a dict with a general override hook for hashing all keys would be >useful. >> E.g., a dict.__keyhash__ that would take key as arg and default as now >returning key.__hash__() >> but that you could override. Seems like this could potentially be more >efficient than key wrappers, >> and would also make it so you wouldn't have to chase all the affected methods >in doing an idict >> like the above (e.g., get, setdefault, update etc. etc.) > >There has also been a discussion of adding a seqdict that maintains a keys in >insertion order. Another idea was to have a defaultdict that could be >instructed to create values as necessary (either zero for counting or [] for >list building). Putting the three ideas together, perhaps we should write an >extension module with a custom dictionary type with methods/attributes >implementing those ideas. Essentially, the thought is to put all the bells and >whistles in one place. If the extension became popular, it could ultimately >wend its way into the collections module. > >A facetious naming idea would be to call it TrickyDict, a relative to DictMixin >and no relation to a former U.S. president ;-) > >t = TrickyDict() >t.keytransform(str.lower) >t['abC'] = 1 # case insensitive dictionary >assert t['Abc'] == 1 assert t.keys() == ['abC']# actual keys unchanged (but transformed for hash and cmp) [...] > >Taken together, these six attributes/methods could cover many wished for >features for the 10% of the cases where a regular dictionary doesn't provide >the >best solution. You think as much as 10% ? Regards, Bengt Richter -- http://mail.python.org/mailman/listinfo/python-list
Re: Case-insensitive dict, non-destructive, fast, anyone?
> >Taken together, these six attributes/methods could cover many wished for > >features for the 10% of the cases where a regular dictionary doesn't provide the > >best solution. > You think as much as 10% ? Rounded up from 9.6 ;-) More important than the percentage is the clarity of the resulting code and the avoidance of continous reinvention of workarounds. Separating tool features into a basic and an advanced version is common solution to managing option/customization complexity. Raymond -- http://mail.python.org/mailman/listinfo/python-list
Re: Case-insensitive dict, non-destructive, fast, anyone?
"Raymond Hettinger" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > More important than the percentage is the clarity of the resulting code > and the > avoidance of continous reinvention of workarounds. > > Separating tool features into a basic and an advanced version is common > solution > to managing option/customization complexity. A super bells & whistles dict would certainly pre-answer a lot of queries and save much newsgroup/list response time. tjr -- http://mail.python.org/mailman/listinfo/python-list
Re: Case-insensitive dict, non-destructive, fast, anyone?
> "Bengt" == Bengt Richter <[EMAIL PROTECTED]> writes: Bengt> I wonder if a dict with a general override hook for hashing Bengt> all keys would be useful. E.g., a dict.__keyhash__ that Bengt> would take key as arg and default as now returning Bengt> key.__hash__() but that you could override. Seems like this There would need to be an override hook for key comparison as well (I suppose it always uses == operation now?). But yes, I think it would be *much* more useful than any 'keytransform' feature - is there any use for 'keytransform' feature apart from the uses that would be better covered by hash/comparison hooks? It would be lovely to have something like this in the stdlib (or anywhere, for that matter). Think about the use cases for hashing via by os.path.normcase, str.lower... -- Ville Vainio http://tinyurl.com/2prnb -- http://mail.python.org/mailman/listinfo/python-list