On Thu, Sep 27, 2012 at 4:13 PM, Carl Meyer <c...@oddbird.net> wrote:
> Hi Dan, > > On 09/27/2012 04:47 PM, Dan Loewenherz wrote: > > Just updated the ticket. > > > > As I commented, the heuristic for checking if a file has been modified > > lies in line 282 of collectstatic.py: > > > > *if not prefixed_path in self.copied_files:* > > * > > return self.log("Skipping '%s' (already copied earlier)" % path) > > > > * > > > https://github.com/django/django/blob/master/django/contrib/staticfiles/management/commands/collectstatic.py#L282 > > > > This seems off, since a path may stay the same but a file's contents may > > change. > > That's not checking whether the file has been modified, that's checking > whether the same source file was previously copied in the same > collectstatic run (due to overlapping/duplicate file sources of some kind). > > The check for modification date is up on line 234 in the delete_file > method, which is called by both link_file and copy_file. > Thanks, I missed that. I still see an issue here. In any sort of source control, when a user updates their repo, local files that were updated remotely show up as modified at the time the repo is cloned or updated, not when the file was actually last saved by the last author. You then have the same scenario I pointed to earlier: when multiple people work on a project, they will re-upload the same files multiple times. Don't get me wrong here--I'm happy to see there is some sort of check here to avoid collectstatic'ing the same file, but I think it's warranted to push back on the use of last modified as the heuristic. I think using a checksum would work much better, and solves the problem this is trying to solve in a much more foolproof way. With all this said, I don't think this logic belongs in a "delete_file" method. IMO it'd be better to separate out the logic of "does this file already exist, if so, skip it" from "delete this file if it exists on the target" (delete_file). @Karen--thanks for digging up that SO post. It actually is super relevant here. As mentioned, when uploading files to S3, it's quite time consuming to perform a network round trip to pick up file metadata (such as last modified time) for each file you're uploading. This was initial reason I chose to store this data in a single location that I'd only need to grab once. Dan -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.