I'm a relative newbie to Python, so please bear with me. There are currently two standard modules used to access archived data: zipfile and tarfile. Their interfaces are completely different. In particular, a script that wants to analyze different types of archives must duplicate substantial pieces of logic. The problem is not limited to method names; it extends to how stat-like information (name, size, modification time) is accessed.
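To illustrate the duplication, here is a small sketch that lists the name, size, and modification time of every member, once per module. The function names zip_listing and tar_listing are made up for this example; the zipfile and tarfile calls are the standard library ones.

```python
import tarfile
import zipfile
from datetime import datetime


def zip_listing(fileobj):
    """Return (name, size, mtime) for each member of a zip archive."""
    with zipfile.ZipFile(fileobj) as zf:
        return [
            # ZipInfo exposes filename, file_size, and date_time (a 6-tuple).
            (info.filename, info.file_size, datetime(*info.date_time))
            for info in zf.infolist()
        ]


def tar_listing(fileobj):
    """Return (name, size, mtime) for each member of a tar archive."""
    with tarfile.open(fileobj=fileobj) as tf:
        return [
            # TarInfo exposes name, size, and mtime (seconds since the epoch).
            (info.name, info.size, datetime.fromtimestamp(info.mtime))
            for info in tf.getmembers()
        ]
```

The two functions do the same job, yet nothing but the surrounding list comprehension is shared: the open call, the member iterator, and every attribute name differ.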
I think it would be a good thing if a standardized interface existed, similar in spirit to PEP 247. This would make it easier for one script to access multiple types of archives, such as RAR, 7-Zip, ISO, etc. In particular, a single factory class could produce PEP 302 import hooks for future as well as current archive formats.

I think that an archive module adhering to the standard should adopt a least-common-denominator approach, initially supporting read-only access without seek, e.g. tar files on actual tape. For applications that require a seek method (such as importers), a standard wrapper class could transparently cache archive members in temp files; this would fit in well with Python 3000's rewrite of the I/O interface to support stackable interfaces. To this end, we'd need is_seekable and is_writable attributes on both the module and its instances (the module-level attribute would declare whether something is possible at all, not whether it is always true).

Most importantly, all archive modules should provide a standard API for accessing their individual files via a single archive_content class that provides a standard 'read' method. Less important, but nice to have, would be a way for archives to be scanned auto-magically during walks of directories.

Feedback?
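For concreteness, here is a rough sketch of the kind of interface I have in mind, written as thin adapters over zipfile and tarfile. Every name in it (ArchiveContent, ZipReader, TarReader, open_archive, seekable_copy) is hypothetical and only meant to illustrate the idea; class attributes stand in for the module-level and instance-level is_seekable / is_writable flags, and only the zipfile, tarfile, and tempfile calls are real standard-library API.

```python
import tarfile
import tempfile
import zipfile


class ArchiveContent:
    """Hypothetical uniform handle for one archive member."""

    def __init__(self, name, size, opener):
        self.name = name
        self.size = size
        self._opener = opener  # zero-argument callable returning a binary file

    def read(self):
        with self._opener() as fh:
            return fh.read()


class ZipReader:
    is_seekable = True   # zip has a central directory, so random access works
    is_writable = False  # this sketch is read-only

    def __init__(self, fileobj):
        self._zf = zipfile.ZipFile(fileobj)

    def __iter__(self):
        for info in self._zf.infolist():
            if not info.is_dir():
                yield ArchiveContent(info.filename, info.file_size,
                                     lambda info=info: self._zf.open(info))


class TarReader:
    is_seekable = False  # opened as a forward-only stream, like a tape
    is_writable = False

    def __init__(self, fileobj):
        # "r|*" is tarfile's stream mode with transparent compression.
        self._tf = tarfile.open(fileobj=fileobj, mode="r|*")

    def __iter__(self):
        for info in self._tf:
            if info.isfile():
                yield ArchiveContent(info.name, info.size,
                                     lambda info=info: self._tf.extractfile(info))


# Hypothetical registry playing the role of the single factory.
_READERS = {"zip": ZipReader, "tar": TarReader}


def open_archive(fileobj, fmt):
    """Return a reader for the given format name ("zip" or "tar")."""
    return _READERS[fmt](fileobj)


def seekable_copy(content):
    """Cache one member in a temp file so that callers can seek in it."""
    tmp = tempfile.TemporaryFile()
    tmp.write(content.read())
    tmp.seek(0)
    return tmp
```

With this, a loop such as "for member in open_archive(f, fmt): data = member.read()" is the same for both formats. Because TarReader is a forward-only stream, each member has to be read (or copied with seekable_copy) before the loop advances to the next one, which is exactly the least-common-denominator behaviour described above.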