On Thu, Sep 27, 2012 at 4:13 PM, Carl Meyer <c...@oddbird.net> wrote:

> Hi Dan,
>
> On 09/27/2012 04:47 PM, Dan Loewenherz wrote:
> > Just updated the ticket.
> >
> > As I commented, the heuristic for checking if a file has been modified
> > lies in line 282 of collectstatic.py:
> >
> >     if not prefixed_path in self.copied_files:
> >         return self.log("Skipping '%s' (already copied earlier)" % path)
> >
> https://github.com/django/django/blob/master/django/contrib/staticfiles/management/commands/collectstatic.py#L282
> >
> > This seems off, since a path may stay the same but a file's contents may
> > change.
>
> That's not checking whether the file has been modified, that's checking
> whether the same source file was previously copied in the same
> collectstatic run (due to overlapping/duplicate file sources of some kind).
>
> The check for modification date is up on line 234 in the delete_file
> method, which is called by both link_file and copy_file.
>

Thanks, I missed that.

I still see an issue here, though. With most version control systems, when a
user clones or updates a repository, files that were changed remotely end up
with a modification time of when the clone or update ran, not when the last
author actually saved them. That leads right back to the scenario I pointed to
earlier: when multiple people work on a project, they each end up re-uploading
the same files over and over.
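
To illustrate (a minimal sketch with a hypothetical path, not anything from
collectstatic itself):

    import os
    import time

    # Immediately after a fresh "git clone", every working-tree file's mtime is
    # the checkout time, not the time its author last edited it.
    mtime = os.path.getmtime("static/css/base.css")  # hypothetical path
    print(time.ctime(mtime))  # roughly "now", even for a file untouched for years

So an mtime-based comparison sees files that haven't actually changed as newer
than the copies already sitting on the target.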

Don't get me wrong--I'm glad there's a check in place to avoid re-collecting
the same file, but I think it's worth pushing back on last-modified time as the
heuristic. A checksum comparison would work much better and solves the problem
in a far more foolproof way, since it looks at the file's actual contents
rather than filesystem metadata. With all that said, I don't think this logic
belongs in a "delete_file" method. IMO it would be better to separate "does
this file already exist on the target? if so, skip it" from "delete this file
if it exists on the target" (delete_file). A rough sketch of what I mean is
below.
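
Here's the kind of checksum-based check I have in mind (just an illustration
against generic Django storage objects; the function names and structure are
mine, not the actual collectstatic code):

    import hashlib

    def file_md5(storage, path, chunk_size=64 * 1024):
        # Hash the file's contents through the storage API, reading in chunks
        # so large files don't need to fit in memory.
        md5 = hashlib.md5()
        f = storage.open(path)
        try:
            chunk = f.read(chunk_size)
            while chunk:
                md5.update(chunk)
                chunk = f.read(chunk_size)
        finally:
            f.close()
        return md5.hexdigest()

    def should_copy(source_storage, target_storage, path):
        # Copy only when the target is missing or its contents actually differ;
        # modification times never enter into it.
        if not target_storage.exists(path):
            return True
        return file_md5(source_storage, path) != file_md5(target_storage, path)

Keeping the "is this already up to date?" decision in something like
should_copy would leave delete_file responsible only for actually removing a
stale copy from the target.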

@Karen--thanks for digging up that SO post; it really is relevant here. As
mentioned, when uploading files to S3 it's quite time consuming to make a
network round trip to fetch metadata (such as last-modified time) for every
file you're uploading. That was the initial reason I chose to store this data
in a single location that only has to be fetched once per run--roughly along
the lines of the sketch below.
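
A hypothetical sketch (the class and manifest file name are made up, and this
isn't my actual implementation): keep a single manifest of path -> checksum on
the target storage, fetch it once at the start of the run, and do every
comparison locally instead of a round trip per file.

    import json

    from django.core.files.base import ContentFile

    class ChecksumManifest(object):
        # Hypothetical helper: one round trip to load the manifest, then every
        # "has this file changed?" check happens locally.
        def __init__(self, storage, name=".collectstatic-manifest.json"):
            self.storage = storage
            self.name = name
            if storage.exists(name):
                self.checksums = json.loads(storage.open(name).read().decode("utf-8"))
            else:
                self.checksums = {}

        def is_unchanged(self, path, checksum):
            return self.checksums.get(path) == checksum

        def record(self, path, checksum):
            self.checksums[path] = checksum

        def save(self):
            # One more round trip at the end of the run to persist the manifest.
            if self.storage.exists(self.name):
                self.storage.delete(self.name)
            self.storage.save(self.name, ContentFile(json.dumps(self.checksums)))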

Dan
