#17896: Implement file hash computation as a separate method of staticfiles'
CachedFilesMixin
-------------------------------------+-------------------------------------
     Reporter:  mkai                 |      Owner:  nobody
         Type:                       |     Status:  new
  Cleanup/optimization               |    Version:  SVN
    Component:  contrib.staticfiles  |   Keywords:
     Severity:  Normal               |  CachedStaticFilesStorage, hash, md5
 Triage Stage:  Unreviewed           |  Has patch:  1
Easy pickings:  0                    |      UI/UX:  0
-------------------------------------+-------------------------------------
 I'm currently extending django-cumulus (Rackspace Cloud Files storage
 backend) and S3BotoStorage (Amazon S3 storage backend) to produce hashed
 file names via the CachedFilesMixin of contrib.staticfiles. That came down
 to the following (as intended, I suppose!):

 {{{
 class MyCachedCloudFilesStorage(CachedFilesMixin, CloudFilesStorage):
     pass

 }}}

 Now I noticed that both Amazon and Rackspace include file hashes for each
 file in their API responses. If the response (e.g. a file list) is cached
 appropriately, the post-processing of the CachedFilesMixin can be sped up
 dramatically by just getting the hash from the cached response instead of
 calculating it from the file (which is expensive because it reads the file
 over the network).

 I propose the attached patch so that backend authors can implement their
 own method of getting the hash for a file:

 {{{
 diff --git a/django/contrib/staticfiles/storage.py
 b/django/contrib/staticfiles/storage.py
 index c000f97..e59b987 100644
 --- a/django/contrib/staticfiles/storage.py
 +++ b/django/contrib/staticfiles/storage.py
 @@ -65,6 +65,13 @@ class CachedFilesMixin(object):
                  compiled = re.compile(pattern)
                  self._patterns.setdefault(extension, []).append(compiled)

 +    def get_file_hash(self, name, content=None):
 +        """Gets the MD5 hash of the file."""
 +        md5 = hashlib.md5()
 +        for chunk in content.chunks():
 +            md5.update(chunk)
 +        return md5.hexdigest()[:12]
 +
      def hashed_name(self, name, content=None):
          parsed_name = urlsplit(unquote(name))
          clean_name = parsed_name.path.strip()
 @@ -79,13 +86,9 @@ class CachedFilesMixin(object):
                  return name
          path, filename = os.path.split(clean_name)
          root, ext = os.path.splitext(filename)
 -        # Get the MD5 hash of the file
 -        md5 = hashlib.md5()
 -        for chunk in content.chunks():
 -            md5.update(chunk)
 -        md5sum = md5.hexdigest()[:12]
 +        file_hash = self.get_file_hash(clean_name, content)
          hashed_name = os.path.join(path, u"%s.%s%s" %
 -                                   (root, md5sum, ext))
 +                                   (root, file_hash, ext))
 }}}
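
 To illustrate how a backend could use the new hook, here is a minimal
 sketch. It is not part of the patch. HashingMixin is a stand-in that only
 mirrors the proposed get_file_hash() so the example runs without Django;
 CachedListingHashMixin, its listing_hashes argument and FakeContent are
 hypothetical names, and the sketch assumes the service reports a plain hex
 MD5 digest of the file content (which may not hold for every object, e.g.
 S3 multipart uploads).

```python
import hashlib


class HashingMixin(object):
    """Stand-in for the patched CachedFilesMixin (hashing hook only)."""

    def get_file_hash(self, name, content=None):
        # Same logic as the proposed method: MD5 over the file's chunks.
        md5 = hashlib.md5()
        for chunk in content.chunks():
            md5.update(chunk)
        return md5.hexdigest()[:12]


class CachedListingHashMixin(HashingMixin):
    """Hypothetical backend mixin: prefer hashes from a cached file listing."""

    def __init__(self, listing_hashes=None):
        # Assumed shape: {file name: full hex MD5 digest reported by the API}.
        self._listing_hashes = dict(listing_hashes or {})

    def get_file_hash(self, name, content=None):
        remote = self._listing_hashes.get(name)
        if remote:
            # No network read needed: reuse the digest from the listing.
            return remote[:12]
        # Unknown file: fall back to hashing the content itself.
        return super(CachedListingHashMixin, self).get_file_hash(name, content)


class FakeContent(object):
    """Tiny stand-in for a file object exposing chunks(), for the demo."""

    def __init__(self, data, chunk_size=4):
        self.data = data
        self.chunk_size = chunk_size

    def chunks(self):
        for start in range(0, len(self.data), self.chunk_size):
            yield self.data[start:start + self.chunk_size]


if __name__ == "__main__":
    data = b"body { color: red; }"
    storage = CachedListingHashMixin(
        {"css/site.css": hashlib.md5(data).hexdigest()})
    # Served from the cached listing; content is never read.
    print(storage.get_file_hash("css/site.css"))
    # Not in the listing; computed from the content chunks.
    print(storage.get_file_hash("css/other.css", FakeContent(data)))
```

 Both calls yield the same 12-character value here because the cached
 digest and the fallback hash the same bytes. In a real backend the mapping
 would be filled from the provider's cached file list.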

-- 
Ticket URL: <https://code.djangoproject.com/ticket/17896>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
