On Thu, 13 Jan 2005 05:18:57 GMT, Bengt Richter <[EMAIL PROTECTED]> wrote:
> On Thu, 13 Jan 2005 12:19:06 +1000, Stephen Thorne <[EMAIL PROTECTED]> wrote:
>
> >On Thu, 13 Jan 2005 01:24:29 GMT, Bengt Richter <[EMAIL PROTECTED]> wrote:
> >> extensiondict = dict(
> >>     php = 'application/x-php',
> >>     cpp = 'text/x-c-src',
> >>     # etcetera
> >>     xsl = 'test/xsl'
> >> )
> >>
> >> def detectMimeType(filename):
> >>     extension = os.path.splitext(filename)[1].replace('.', '')
>       extension = os.path.splitext(filename)[1].replace('.', '').lower() # better
>
> >>     try: return extensiondict[extension]
> >>     except KeyError:
> >>         basename = os.path.basename(filename)
> >>         if "Makefile" in basename: return 'text/x-makefile' # XXX case sensitivity?
> >>         raise NoMimeError
> >
> >Why not use a regexp based approach.
> ISTM the dict setup closely reflects the OP's if/elif tests and makes for an efficient substitute
> for the functionality when later used for lookup. The regex list is O(n) and the regexes themselves
> are at least that, so I don't see a benefit. If you are going to loop through extensionlist, you
> might as well write (untested)
<code snipped>
*shrug*, O(n*m) actually, where n is the number of mime-types and m is the length of the extension.

> >extensionlist = [
> >(re.compile(r'.*\.php') , "application/x-crap-language"),
> >(re.compile(r'.*\.(cpp|c)') , 'text/x-c-src'),
> >(re.compile(r'[Mm]akefile') , 'text/x-makefile'),
> >]
> >for regexp, mimetype in extensionlist:
> >    if regexp.match(filename):
> >        return mimetype
> >
> >if you were really concerned about efficiency, you could use something like:
> >class SimpleMatch:
> >    def __init__(self, pattern): self.pattern = pattern
> >    def match(self, subject): return subject[-len(self.pattern):] == self.pattern
>
> I'm not clear on what you are doing here, but if you think you are going to compete
> with the timbot's dict efficiency with a casual few lines, I suspect you are PUI ;-)
> (Posting Under the Influence ;-)

Sorry about that, what I was trying to say was something along the lines of:

extensionlist = [
    (re.compile(r'.*\.php') , "application/x-crap-language"),
    (re.compile(r'.*\.(cpp|c)') , 'text/x-c-src'),
    (re.compile(r'[Mm]akefile') , 'text/x-makefile'),
]

can be made more efficient by doing something like this:

extensionlist = [
    (SimpleMatch(".php"), "application/x-crap-language"),
    (re.compile(r'.*\.(cpp|c)') , 'text/x-c-src'),
    (re.compile(r'[Mm]akefile') , 'text/x-makefile'),
]

where SimpleMatch uses a slice and a comparison instead of a regular expression engine. SimpleMatch and re.compile both return an object whose .match(s) method returns a value that can be interpreted as a boolean, so the loop over extensionlist does not need to know which kind of matcher it is calling.

As for the overall efficiency concerns, I feel that talking about any of this is premature optimisation. The optimisation that is really required in this situation is the same as with any large-switch-statement idiom, be it C or Python: first one must do a frequency analysis of the inputs to the switch statement in order to discover the optimal order of tests!

Regards,
Stephen Thorne
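
For anyone wanting to try the mixed list described above, here is a minimal, self-contained sketch. The regexes and mime-type strings are copied from the post as-is; the function name detect_mime_type, the NoMimeError definition, and the use of str.endswith in place of the slice comparison are assumptions made for illustration, not code from the thread.

```python
import re


class NoMimeError(Exception):
    """Raised when no matcher recognises the filename (assumed definition)."""


class SimpleMatch:
    """Suffix test that exposes the same .match() method as a compiled regex."""

    def __init__(self, pattern):
        self.pattern = pattern

    def match(self, subject):
        # Same result as subject[-len(pattern):] == pattern for a non-empty
        # pattern; endswith is used here only for readability (assumption).
        return subject.endswith(self.pattern)


# The list is tested in order, so the first matcher that succeeds wins.
extensionlist = [
    (SimpleMatch(".php"), "application/x-crap-language"),
    (re.compile(r'.*\.(cpp|c)'), 'text/x-c-src'),
    (re.compile(r'[Mm]akefile'), 'text/x-makefile'),
]


def detect_mime_type(filename):
    """Return the mime-type of the first matcher that accepts filename."""
    for matcher, mimetype in extensionlist:
        # Both SimpleMatch and compiled regexes return something truthy on
        # success and something falsy (False or None) on failure.
        if matcher.match(filename):
            return mimetype
    raise NoMimeError(filename)
```

Because the loop only relies on .match() returning a truthy or falsy value, the two kinds of matcher can be mixed freely, and the list order is the one place where a frequency analysis of real inputs would be applied.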