On 28.09.2012, at 05:38, Dan Loewenherz <d...@dlo.me> wrote:

> On Thu, Sep 27, 2012 at 4:13 PM, Carl Meyer <c...@oddbird.net> wrote:
> > Hi Dan,
> >
> > On 09/27/2012 04:47 PM, Dan Loewenherz wrote:
> > > Just updated the ticket.
> > >
> > > As I commented, the heuristic for checking if a file has been modified
> > > lies in line 282 of collectstatic.py:
> > >
> > >     if not prefixed_path in self.copied_files:
> > >         return self.log("Skipping '%s' (already copied earlier)" % path)
> > >
> > > https://github.com/django/django/blob/master/django/contrib/staticfiles/management/commands/collectstatic.py#L282
> > >
> > > This seems off, since a path may stay the same but a file's contents may
> > > change.
> >
> > That's not checking whether the file has been modified; that's checking
> > whether the same source file was previously copied in the same
> > collectstatic run (due to overlapping/duplicate file sources of some kind).
> >
> > The check for the modification date is up on line 234 in the delete_file
> > method, which is called by both link_file and copy_file.
>
> Thanks, I missed that.
>
> I still see an issue here. In any sort of source control, when a user
> updates their repo, local files that were updated remotely show up as
> modified at the time the repo is cloned or updated, not when the file was
> actually last saved by its last author. You then have the same scenario I
> pointed to earlier: when multiple people work on a project, they will
> re-upload the same files multiple times.
>
> Don't get me wrong here--I'm happy to see there is some sort of check to
> avoid collecting the same file twice, but I think it's warranted to push
> back on the use of last-modified time as the heuristic. I think using a
> checksum would work much better, and it solves the problem this code is
> trying to solve in a much more foolproof way. With all this said, I don't
> think this logic belongs in a "delete_file" method. IMO it'd be better to
> separate the logic of "does this file already exist? If so, skip it" from
> "delete this file if it exists on the target" (delete_file).
>
> @Karen--thanks for digging up that SO post. It actually is super relevant
> here. As mentioned, when uploading files to S3, it's quite time-consuming
> to perform a network round trip to pick up file metadata (such as the
> last-modified time) for each file you're uploading. That was the initial
> reason I chose to store this data in a single location that I'd only need
> to fetch once.
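[Editor's note: the checksum heuristic Dan proposes could be sketched roughly as follows. The function names (`file_checksum`, `should_copy`) and the chunked-read approach are illustrative, not part of collectstatic; the idea is simply that two files with identical digests need not be re-copied, regardless of their modification times.]

```python
import hashlib

def file_checksum(path, chunk_size=64 * 1024):
    """Return the MD5 hex digest of a file's contents, read in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()

def should_copy(source_path, target_checksum):
    """Copy only when contents actually differ.

    A fresh git clone gives every file a new mtime, but its checksum is
    unchanged, so this test avoids the spurious re-uploads a
    last-modified comparison would trigger.
    """
    return file_checksum(source_path) != target_checksum
```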
Dan,

I added prefetching the metadata to the Boto-based storage backend in django-storages a while ago to reduce the time spent querying S3. See https://bitbucket.org/david/django-storages/src/1574890d87be/storages/backends/s3boto.py#cl-42

You can either set the setting AWS_PRELOAD_METADATA to True or subclass S3BotoStorage and pass preload_metadata=True into the constructor.

Hope that helps,
Jannis
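[Editor's note: a minimal sketch of the two options Jannis describes, assuming django-storages with the Boto backend is installed. The module path `myproject.storage` and the class name `PreloadedS3BotoStorage` are hypothetical placeholders.]

```python
# Option 1 -- settings.py: enable prefetching globally via the setting.
AWS_PRELOAD_METADATA = True

# Option 2 -- myproject/storage.py: subclass and pass the constructor flag.
from storages.backends.s3boto import S3BotoStorage

class PreloadedS3BotoStorage(S3BotoStorage):
    """S3 storage that fetches bucket metadata once up front,
    so collectstatic does not do one round trip per file."""
    def __init__(self, *args, **kwargs):
        kwargs.setdefault("preload_metadata", True)
        super(PreloadedS3BotoStorage, self).__init__(*args, **kwargs)
```

Either way, the per-file last-modified lookups during collectstatic are answered from the prefetched metadata instead of individual S3 requests.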