On 28.09.2012, at 05:38, Dan Loewenherz <d...@dlo.me> wrote:

> On Thu, Sep 27, 2012 at 4:13 PM, Carl Meyer <c...@oddbird.net> wrote:
> Hi Dan,
> 
> On 09/27/2012 04:47 PM, Dan Loewenherz wrote:
> > Just updated the ticket.
> >
> > As I commented, the heuristic for checking whether a file has been
> > modified is on line 282 of collectstatic.py:
> >
> >     if not prefixed_path in self.copied_files:
> >         return self.log("Skipping '%s' (already copied earlier)" % path)
> >
> > https://github.com/django/django/blob/master/django/contrib/staticfiles/management/commands/collectstatic.py#L282
> >
> > This seems off, since a path may stay the same but a file's contents may
> > change.
> 
> That's not checking whether the file has been modified, that's checking
> whether the same source file was previously copied in the same
> collectstatic run (due to overlapping/duplicate file sources of some kind).
> 
> The check for modification date is up on line 234 in the delete_file
> method, which is called by both link_file and copy_file.
> 
> Thanks, I missed that.
> 
> I still see an issue here. Under any sort of source control, when a user
> updates their repo, local files that were changed remotely get their
> modification time set to when the repo is cloned or pulled, not to when the
> last author actually saved the file. That brings back the same scenario I
> pointed to earlier: when multiple people work on a project, they will each
> re-upload the same files multiple times.
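The underlying problem is easy to reproduce: rewriting a file with identical bytes (which is effectively what a fresh checkout does) still bumps its mtime, so any mtime-based comparison treats unchanged content as modified. A minimal illustration, not Django's actual code:

```python
import os
import tempfile
import time

# Write the same content twice; the second write simulates a fresh
# checkout recreating a file whose content did not change.
path = os.path.join(tempfile.mkdtemp(), "style.css")
with open(path, "w") as f:
    f.write("body { margin: 0; }")
first_mtime = os.path.getmtime(path)

time.sleep(1)  # make sure the filesystem timestamp can advance
with open(path, "w") as f:
    f.write("body { margin: 0; }")  # identical content
second_mtime = os.path.getmtime(path)

# Content is unchanged, but the mtime moved forward, so an mtime-based
# "is the target at least as new?" check would trigger a re-upload.
assert second_mtime > first_mtime
```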
> 
> Don't get me wrong here--I'm happy to see there is some check in place to
> avoid re-collecting the same file, but I think it's warranted to push back
> on using last-modified time as the heuristic. A checksum would work much
> better and solves this problem in a far more foolproof way. With all that
> said, I don't think this logic belongs in a "delete_file" method. IMO it'd
> be better to separate the logic of "does this file already exist, and if
> so, skip it" from "delete this file if it exists on the target"
> (delete_file).
> 
> @Karen--thanks for digging up that SO post. It's actually very relevant
> here. As mentioned, when uploading files to S3, performing a network round
> trip to fetch metadata (such as the last-modified time) for each file is
> quite time consuming. That was the initial reason I chose to store this
> data in a single location that I'd only need to fetch once.

Dan,

I added prefetching of the metadata to the Boto-based storage backend in
django-storages a while ago to reduce the time spent querying S3.

See 
https://bitbucket.org/david/django-storages/src/1574890d87be/storages/backends/s3boto.py#cl-42

You can either set AWS_PRELOAD_METADATA = True in your settings or subclass
S3BotoStorage and pass preload_metadata=True to the constructor.
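Concretely, the two options look something like this; the subclass name is made up for illustration, and the import path assumes django-storages' s3boto backend as it existed at the time:

```python
# Option 1: in settings.py
AWS_PRELOAD_METADATA = True

# Option 2: a subclass that always preloads (hypothetical class name)
from storages.backends.s3boto import S3BotoStorage

class PreloadedS3BotoStorage(S3BotoStorage):
    def __init__(self, *args, **kwargs):
        kwargs["preload_metadata"] = True
        super(PreloadedS3BotoStorage, self).__init__(*args, **kwargs)
```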

Hope that helps,

Jannis

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.