Re: Hash collision in 'cache' templatetag

2011-12-19 Thread Paul McMillan
While MD5 is "broken" in a cryptographic sense, it's not broken in the
sense that we use it here. A randomly occurring hash collision is
extremely unlikely (to the point of not being a problem). If we warned
about potential hash collisions here, we (and everyone else who uses
hashes) would have to warn about them all over the place.

To give you a sense of scale, if you have a database of 1 quadrillion
entries (one thousand trillion)[1], your chances of a random collision
from MD5 are lower than 0.001%. For practical purposes in this
universe, hashes like this don't randomly collide.
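
For anyone who wants to check the figure above, here is a rough sketch using the standard birthday approximation, p ~= 1 - exp(-n(n-1) / 2^(b+1)) for n entries and a b-bit hash. It assumes MD5 output behaves like a uniformly random 128-bit value, and the function name is purely illustrative.

```python
import math

def collision_probability(n_entries, hash_bits=128):
    """Birthday approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = -n_entries * (n_entries - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(exponent)

# 1 quadrillion (10^15) entries hashed with a 128-bit digest.
p = collision_probability(10 ** 15)
print("%.3g%%" % (p * 100))
```

The result is on the order of 10^-7 percent, comfortably below the 0.001% bound.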

It is on the drawing board to improve this (and most other uses of
hashing) by switching to HMAC-SHA256 and using a larger character set
for the final digest, but that patch isn't likely to make it into 1.4
given our current timeframe.
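
A minimal sketch of what "HMAC-SHA256 with a larger character set" could look like, using only the Python standard library. The secret, the function name and the choice of URL-safe base64 are assumptions for illustration, not the planned patch.

```python
import base64
import hashlib
import hmac

def keyed_digest(secret, value):
    """HMAC-SHA256 of value, encoded with 64 symbols instead of hex's 16."""
    mac = hmac.new(secret, value, hashlib.sha256).digest()
    # 32 raw bytes become 43 characters once the base64 padding is stripped,
    # compared with 64 characters for a hex digest.
    return base64.urlsafe_b64encode(mac).rstrip(b"=").decode("ascii")

# Hypothetical secret and input, for demonstration only.
digest = keyed_digest(b"example-secret", b"sidebar:alice")
print(len(digest), digest)
```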

-Paul

[1] As another point of scale, our universe is estimated to be about
430 quadrillion (4.3 x 10^17) seconds old.

On Mon, Dec 19, 2011 at 2:00 AM, Sebastian Goll wrote:
> Hi all,
>
> The current implementation of the 'cache' templatetag [1] uses an MD5
> hash derived from the vary_on arguments to create a unique cache_key
> for the current template fragment.
>
> However, this approach has the possibility of a hash collision. While
> not very likely, this might nonetheless expose sensitive information.
>
>
> Consider the case where a fragment for a logged-in user is cached. This
> fragment might contain sensitive data relevant to that user. Due to a
> hash collision in the vary_on arguments, that same fragment is later
> displayed to a different user.
>
> The current documentation of template fragment caching [2] explicitly
> gives us an example with 'request.user.username' as a vary_on argument,
> so we must assume that this is a valid use case.
>
>
> Is this the desired behavior? Or am I perhaps missing something?
>
> Without looking at the code, the existence of a possible information
> leak is not apparent from the docs. Without some idea of how hashing
> and caching work, this might not even be obvious.
>
>
> Some further research into the matter reveals that the current (MD5)
> hashing approach was introduced in ticket #11270 [3], about 3 years
> ago, to reduce the maximum length of cache keys so that memcached
> always works.
>
> The history of #11270 gives no indication that the implications of
> using hashes instead of the actual vary_on values (as outlined above)
> were adequately considered. Citing the ticket:
>
>  "So we have to use md5 hash instead of the whole cached tag name
>  with all vary variables."
>
>
> How should we proceed here?
>
> Should a note be added to the description of the template fragment
> caching mechanism? Something along the lines that a cache leak is
> possible, albeit unlikely, and that no sensitive information should
> be stored in a cached template?
>
> Alternatively, the 'cache' templatetag would have to store the actual
> values of all vary_on arguments alongside the cached template fragment
> (while still using a hashed cache key). On retrieval it would use the
> cached template only when all vary_on arguments match. However, this
> would increase both runtime and storage space requirements of the
> template fragment cache.
>
> Regards,
> Sebastian.
>
> [1] https://code.djangoproject.com/browser/django/trunk/django/templatetags/cache.py?rev=16539
> [2] https://docs.djangoproject.com/en/dev/topics/cache/#template-fragment-caching
> [3] https://code.djangoproject.com/ticket/11270

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



Hash collision in 'cache' templatetag

2011-12-19 Thread Sebastian Goll
Hi all,

The current implementation of the 'cache' templatetag [1] uses an MD5
hash derived from the vary_on arguments to create a unique cache_key
for the current template fragment.
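
To make the mechanism concrete, here is a simplified sketch of how such a key can be derived. It is modelled loosely on the approach described above and is not a copy of the Django source; the function name and the use of urllib.parse.quote are stand-ins chosen for illustration.

```python
import hashlib
from urllib.parse import quote

def fragment_cache_key(fragment_name, vary_on):
    """Build a fixed-length cache key from the fragment name and vary_on values."""
    # Quote each value so a ':' inside a value cannot be mistaken for the separator.
    joined = ":".join(quote(str(value)) for value in vary_on)
    digest = hashlib.md5(joined.encode("utf-8")).hexdigest()
    return "template.cache.%s.%s" % (fragment_name, digest)

key = fragment_cache_key("sidebar", ["alice"])
print(key)
```

The key always ends in a 32-character hex digest no matter how long the vary_on values are, which is what keeps it within memcached's key length limit. It is also why two different vary_on sets could, in principle, map to the same key.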

However, this approach has the possibility of a hash collision. While
not very likely, this might nonetheless expose sensitive information.


Consider the case where a fragment for a logged-in user is cached. This
fragment might contain sensitive data relevant to that user. Due to a
hash collision in the vary_on arguments, that same fragment is later
displayed to a different user.

The current documentation of template fragment caching [2] explicitly
gives us an example with 'request.user.username' as a vary_on argument,
so we must assume that this is a valid use case.


Is this the desired behavior? Or am I perhaps missing something?

Without looking at the code, the existence of a possible information
leak is not apparent from the docs. Without some idea of how hashing
and caching work, this might not even be obvious.


Some further research into the matter reveals that the current (MD5)
hashing approach was introduced in ticket #11270 [3], about 3 years
ago, to reduce the maximum length of cache keys so that memcached
always works.

The history of #11270 gives no indication that the implications of
using hashes instead of the actual vary_on values (as outlined above)
were adequately considered. Citing the ticket:

  "So we have to use md5 hash instead of the whole cached tag name
  with all vary variables."


How should we proceed here?

Should a note be added to the description of the template fragment
caching mechanism? Something along the lines that a cache leak is
possible, albeit unlikely, and that no sensitive information should
be stored in a cached template?

Alternatively, the 'cache' templatetag would have to store the actual
values of all vary_on arguments alongside the cached template fragment
(while still using a hashed cache key). On retrieval it would use the
cached template only when all vary_on arguments match. However, this
would increase both runtime and storage space requirements of the
template fragment cache.
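
A rough sketch of this alternative, with a plain dict standing in for the cache backend. All names here are hypothetical; the key_func parameter exists only so that a collision can be forced for demonstration.

```python
import hashlib

def make_key(fragment_name, vary_on):
    """Hashed cache key, similar in spirit to the current scheme."""
    raw = ":".join(str(v) for v in vary_on).encode("utf-8")
    return "template.cache.%s.%s" % (fragment_name, hashlib.md5(raw).hexdigest())

def set_fragment(cache, fragment_name, vary_on, content, key_func=make_key):
    # Store the actual vary_on values next to the rendered fragment.
    values = tuple(str(v) for v in vary_on)
    cache[key_func(fragment_name, vary_on)] = (values, content)

def get_fragment(cache, fragment_name, vary_on, key_func=make_key):
    entry = cache.get(key_func(fragment_name, vary_on))
    if entry is None:
        return None
    stored_values, content = entry
    # Only a full match counts as a hit; a colliding key is treated as a miss.
    if stored_values != tuple(str(v) for v in vary_on):
        return None
    return content

# Force a collision with a deliberately bad key function to show the check.
def colliding_key(fragment_name, vary_on):
    return "same-key"

cache = {}
set_fragment(cache, "sidebar", ["alice"], "<p>alice's data</p>", key_func=colliding_key)
hit = get_fragment(cache, "sidebar", ["alice"], key_func=colliding_key)
miss = get_fragment(cache, "sidebar", ["bob"], key_func=colliding_key)
print(hit, miss)
```

The extra cost is visible here: every entry carries a copy of its vary_on values, and every lookup performs a tuple comparison in addition to the hash.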

Regards,
Sebastian.

[1] https://code.djangoproject.com/browser/django/trunk/django/templatetags/cache.py?rev=16539
[2] https://docs.djangoproject.com/en/dev/topics/cache/#template-fragment-caching
[3] https://code.djangoproject.com/ticket/11270
