Am 05.10.2005 um 00:08 schrieb Martin v. Löwis: > Walter Dörwald wrote: > >>> This array would have to be sparse, of course. >>> >> For encoding yes, for decoding no. >> > [...] > >> For decoding it should be sufficient to use a unicode string of >> length 256. u"\ufffd" could be used for "maps to undefined". Or >> the string might be shorter and byte values greater than the >> length of the string are treated as "maps to undefined" too. > > Right. That's what I meant with "sparse": you somehow need to > represent > "no value".
OK, but I don't think that we really need a sparse data structure for that. I used the following script to check that: ----- import sys, os.path, glob, encodings has = 0 hasnt = 0 for enc in glob.glob("%s/*.py" % os.path.dirname(encodings.__file__)): enc = enc.rsplit(".")[-2].rsplit("/")[-1] try: __import__("encodings.%s" % enc) codec = sys.modules["encodings.%s" % enc] except: pass else: if hasattr(codec, "decoding_map"): print codec for i in xrange(0, 256): if codec.decoding_map.get(i, None) is not None: has += 1 else: hasnt += 1 print "assigned values:", has, "unassigned values:", hasnt ---- It reports that in all the charmap codecs there are 15292 assigned byte values and only 324 unassigned ones. I.e. only about 2% of the byte values map to "undefined". Storing those codepoints in the array as U+FFFD would only need 648 (or 1296 for wide builds) additional bytes. I don 't think a sparse data structure could beat that. >> This might work, although nobody has complained about charmap >> encoding yet. Another option would be to generate a big switch >> statement in C and let the compiler decide about the best data >> structure. > I would try to avoid generating C code at all costs. Maintaining > the build processes will just be a nightmare. Sounds resonable. Bye, Walter Dörwald _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com