So after scanning this thread and the ticket again, it is still unclear 
whether there could be a completely universal solution.

While it would be nice if the storage API had a checksum(name) or md5(name) 
method - not all custom storage backends are going to support a single 
checksum standard.  S3 doesn't explicitly support MD5 (apparently it 
unofficially does through ETags).  Without a universal checksum - you can't 
use it to compare files across arbitrary backends.

I do agree that hacking modified_time return value is a little ugly - the 
API is clearly documented as "returns a datetime..." - so returning an MD5 
checksum there is, well, hacky.

If you are passionate about moving this forward, here is what I'd suggest.

Implement, document, and test .md5(name) as a standard method on storage 
backends - like modified_time this would raise NotImplementedError if not 
available - this could easily be its own ticket. md5 is probably the 
closest you'll get to a checksum standard.
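
To make the suggestion a bit more concrete, here is a rough sketch of what 
such a method could look like. It does not import Django, so it runs on its 
own: BaseStorage and InMemoryStorage are hypothetical stand-ins, not real 
Django classes, and md5(name) is only the method proposed above, not an 
existing storage API.

```python
import hashlib
import io


class BaseStorage:
    """Hypothetical stand-in for a storage base class (not Django's)."""

    def open(self, name, mode="rb"):
        raise NotImplementedError

    def md5(self, name):
        # Proposed method: backends that cannot compute a checksum leave
        # this unimplemented, the same way modified_time behaves.
        raise NotImplementedError("This backend doesn't support md5().")


class InMemoryStorage(BaseStorage):
    """Toy backend that keeps file contents in a dict (illustration only)."""

    def __init__(self, files):
        self.files = dict(files)

    def open(self, name, mode="rb"):
        return io.BytesIO(self.files[name])

    def md5(self, name):
        digest = hashlib.md5()
        with self.open(name) as f:
            # Read in chunks so large files are not loaded all at once.
            for chunk in iter(lambda: f.read(64 * 1024), b""):
                digest.update(chunk)
        return digest.hexdigest()
```

A backend like S3 would presumably return whatever checksum the service 
already exposes instead of re-reading the file, but that is an assumption 
about how such a backend might be written.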

Once you have an md5 method defined for backends - you could support a 
--md5 option to collectstatic that would use that as the target/source 
comparison.
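
As a sketch of the comparison such an option might perform (not how 
collectstatic works today): should_copy, DictStorage, and NoChecksumStorage 
are hypothetical names, and falling back to "copy anyway" when no checksum 
is available is just one possible policy.

```python
import hashlib


class DictStorage:
    """Toy backend with an md5(name) method (illustration only)."""

    def __init__(self, files):
        self.files = dict(files)

    def md5(self, name):
        # Raises KeyError when the file does not exist in this toy backend.
        return hashlib.md5(self.files[name]).hexdigest()


class NoChecksumStorage:
    """Toy backend that cannot provide a checksum."""

    def md5(self, name):
        raise NotImplementedError


def should_copy(name, source, target):
    """Return True when `name` should be copied from source to target."""
    try:
        return source.md5(name) != target.md5(name)
    except (NotImplementedError, KeyError):
        # No checksum available, or the file is missing on one side:
        # copy to be safe.
        return True


source = DictStorage({"site.css": b"body{}", "app.js": b"v2"})
target = DictStorage({"site.css": b"body{}", "app.js": b"v1"})
changed = [n for n in sorted(source.files) if should_copy(n, source, target)]
print(changed)  # only app.js differs between the two toy backends
```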

Another workaround is to just use collectstatic locally - and rsync 
--checksum to your remote if it supports rsync.

-Preston


On Sunday, October 7, 2012 8:59:16 PM UTC-7, Dan Loewenherz wrote:
>
> This issue just got me again tonight, so I'll try to push once more on 
> this issue. It seems right now most people don't care that this is broken, 
> which is a bummer, but in which case I'll just continue using my working 
> solution.
>
> Dan
>
> On Sat, Oct 6, 2012 at 10:48 AM, Dan Loewenherz <d...@dlo.me> wrote:
>
>> Hey Jannis,
>>
>> On Mon, Oct 1, 2012 at 12:47 AM, Jannis Leidel <lei...@gmail.com> wrote:
>>
>>>
>>> On 30.09.2012, at 23:41, Dan Loewenherz <d...@dlo.me> wrote:
>>>
>>> > Many backends don't support last modified times, and even if they all 
>>> > did, it's incorrect to assume that last modified time is an accurate 
>>> > heuristic for whether a file has already been uploaded or not.
>>>
>>> Well but it's an accurate way to decide whether a file has been changed 
>>> on the filesystem, and that's what collectstatic cares about. The storage 
>>> backend *is* the API to extend that when needed, so feel free to use it.
>>>
>>
>> It's accurate *only* in certain situations. And on a distributed 
>> development team, I've run into a lot of issues with developers re-uploading 
>> files that have already been uploaded because they just recently updated 
>> their repo.
>>
>> A checksum is the only truly accurate method to determine if a file has 
>> changed.
>>
>> Additionally, you didn't address my point that I quoted from. Storage 
>> backends don't just reflect filesystems--they could reflect files stored in 
>> a database, S3, etc. And some of these filesystems don't support last 
>> modified times.
>>  
>>> > It might be a better idea to let the backends decide when a file has 
>>> > been changed (instead of just calling the backend's last modified method).
>>>
>>> I don't understand, you can easily implement exactly that in the 
>>> last_modified method if you'd like.
>>>
>>
>> This is a bit confusing...why call it last_modified when that doesn't 
>> necessarily reflect what it's doing? It would be more flexible to create 
>> two methods:
>>
>> def modification_identifier(self):
>>     ...
>>
>> def has_changed(self):
>>     ...
>>
>> Then, any backend could implement these however they might like, and 
>> collectstatic would have no excuse for uploading the same file more than 
>> once. Overloading last_modified to also do things like calculate md5's 
>> seems a bit hacky to me, and confusing for any developer maintaining a 
>> custom storage backend that doesn't support last modified.
>>  
>> Dan
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To view this discussion on the web visit 
https://groups.google.com/d/msg/django-developers/-/weKD2x1XY4oJ.
