Steve D'Aprano wrote:
> I have some key:value data where the keys often are found in contiguous
> ranges with identical values. For example:
>
> {1: "foo",
>  2: "foo",
>  3: "foo",
>  # same for keys 4 through 99
>  100: "foo",
>  101: "bar",
>  102: "bar",
>  103: "foobar",
>  104: "bar",
>  105: "foo",
> }
>
> So in this case, the keys 1 through 100, plus 105, all have the same
> value.
>
> I have about a million or two keys, with a few hundred or perhaps a few
> thousand distinct values. The size of each contiguous group of keys with
> the same value can vary from 1 to perhaps a hundred or so.
>
> Has anyone dealt with data like this and could give a recommendation of
> the right data structure to use?
>
> The obvious way is to explicitly list each key, in a regular dict. But I
> wonder whether there are alternatives which may be better (faster, more
> memory efficient)?
Use a list, either directly if there are no big holes:

>>> import sys
>>> r = range(10**5)
>>> sys.getsizeof(list(r)) / sys.getsizeof(dict(zip(r, r)))
0.14306676635590074

or indirectly:

import bisect

ranges_list = [
    1,
    101,
    103,
    104,
    105,
    106,
]

index_to_value = {
    1: "foo",
    2: "bar",
    3: "foobar",
    4: "bar",
    5: "foo",
}

def get(key, default="missing"):
    index = bisect.bisect(ranges_list, key)
    return index_to_value.get(index, default)

--
https://mail.python.org/mailman/listinfo/python-list
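Follow-up: for anyone trying the indirect approach, here is a self-contained sketch of the same bisect lookup, checked against the keys from Steve's example (the boundary values and the "missing" default are taken straight from the code above; nothing else is assumed):

```python
import bisect

# Sorted range boundaries: entry i is the first key of group i+1.
# Keys 1..100 -> "foo", 101..102 -> "bar", 103 -> "foobar",
# 104 -> "bar", 105 -> "foo", 106+ -> missing.
ranges_list = [1, 101, 103, 104, 105, 106]

# bisect index -> value for that group; index 0 and 6 are absent,
# so keys below 1 or at/above 106 fall through to the default.
index_to_value = {1: "foo", 2: "bar", 3: "foobar", 4: "bar", 5: "foo"}

def get(key, default="missing"):
    # bisect() returns how many boundaries are <= key,
    # which is exactly the group number for that key.
    index = bisect.bisect(ranges_list, key)
    return index_to_value.get(index, default)

print(get(1), get(100), get(101), get(103), get(105), get(106), get(0))
```

This keeps memory proportional to the number of contiguous groups rather than the number of keys, at the cost of an O(log n) lookup instead of a dict's O(1).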