Thanks for following up. Yes, the query string *should* be there. I found this bug last week when I was looking into it, though (basically, telling django-storages to use CloudFront breaks the query string appending code). I'm back from away-from-keyboard vacation tomorrow, and should be able to get some patches sent upstream. :)
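
For anyone following along, here is a rough sketch of the general idea of tagging a redirect URL with the requested filename so it shows up in the CDN logs. This is not the django-storages or Pulp code, only an illustration using the Python standard library; the helper name `add_filename_param`, the `filename` parameter name, and the example URL are all made up for this sketch.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_filename_param(url: str, filename: str, param: str = "filename") -> str:
    """Return `url` with an extra query parameter carrying the original filename.

    Hypothetical helper: the parameter name is an assumption, not something
    Pulp or django-storages defines.
    """
    parts = urlsplit(url)
    # Keep any query parameters already present on the URL.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, filename))
    # urlencode percent-encodes values (spaces become "+").
    return urlunsplit(parts._replace(query=urlencode(query)))


# Example with a made-up CDN hostname and artifact path.
print(add_filename_param("https://cdn.example.com/artifact/ab/cdef", "foo-1.0-1.x86_64.rpm"))
```

Whether an extra parameter like this survives URL signing depends on how the URL is generated, so treat it as a starting point only.
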
https://github.com/jschneier/django-storages/issues/997

--Danny

On Tue, Apr 6, 2021, 2:07 PM David Davis <davidda...@redhat.com> wrote:

> Hi Danny,
>
> I don't know much about AWS logging but Pulp does set the filename in the
> response-content-disposition[0]. Could that be used to determine the
> filename for each request?
>
> If not, I'm looking at the boto3 docs for get_object[1] to see if there's
> another parameter we could set to help you track the filename in requests
> but I'm seeing anything useful. My knowledge of s3 is a bit limited so if
> you have a suggestion how we can construct a request to S3 that would help
> you to track the filenames of requests to s3, I could probably look at how
> we could support it in Pulp 3.
>
> [0]
> https://github.com/pulp/pulpcore/blob/f38f955425b185749b3c8d4d878a7e166cfc05b9/pulpcore/content/handler.py#L613-L614
> [1]
> https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.get_object
>
> David
>
>
> On Tue, Mar 30, 2021 at 10:43 AM Danny Sauer <danny.sa...@konghq.com>
> wrote:
>
>> I've got Pulp set up to serve all the content from S3 behind CloudFront.
>> This works really well, except for a minor issue: the content URLs are all
>> the UUIDs for artifacts, not, for example, the pretty name of the RPM being
>> downloaded. That's an issue in my situation because we'd really like to
>> generate download analytics using off-the-shelf tools which consume the AWS
>> CDN standard log format.
>>
>> My initial thought was that it might be easy to have the redirects
>> include a query string in the generated URL which notes the original
>> filename or relative path requested. But I don't have sufficiently
>> developed Django skills to know the easiest way to do that (or if it's even
>> reasonable to think that's easy). Using the content server's logs is
>> another option, but I have some other content on the same S3 bucket which
>> may not necessarily be reached solely through Pulp's content server, so
>> that means two log locations, etc. If it was easy to make Django /
>> Gunicorn log to an S3 bucket in a manner similar to Cloudfront, that might
>> also be ok. Post-processing logs with a series of API calls to work out
>> what artifact maps to what repository content would ideally be a last
>> resort.
>>
>> Anyone have some great insights which might help me out here? :) If it
>> helps, I'm building my own Docker images which ultimately run in EKS. So
>> patches / extra modules are an option, but I'd prefer to stay as close to
>> vanilla upstream as possible with environment variable-based config
>> adjustments.
>>
>> Thanks.
>> --Danny
>> _______________________________________________
>> Pulp-list mailing list
>> Pulp-list@redhat.com
>> https://listman.redhat.com/mailman/listinfo/pulp-list
>