I've got Pulp set up to serve all the content from S3 behind CloudFront.
This works really well, except for a minor issue: the content URLs are all
the UUIDs for artifacts, not, for example, the pretty name of the RPM being
downloaded.  That's an issue in my situation because we'd really like to
generate download analytics using off-the-shelf tools which consume the AWS
CDN standard log format.

My initial thought was that it might be easy to have the redirects include
a query string in the generated URL which notes the original filename or
relative path requested.  But I don't have sufficiently developed Django
skills to know the easiest way to do that (or if it's even reasonable to
think that's easy).  Using the content server's logs is another option, but
I have some other content on the same S3 bucket which may not necessarily
be reached solely through Pulp's content server, so that means two log
locations, etc.  If it were easy to make Django / Gunicorn log to an S3
bucket in a manner similar to CloudFront, that might also be OK.
Post-processing logs with a series of API calls to work out what artifact
maps to what repository content would ideally be a last resort.
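
To make the query-string idea concrete, here is a minimal sketch of the
URL manipulation I have in mind, using only the Python standard library.
It is not Pulp code: the function name `tag_redirect_url`, the parameter
name `pulp_path`, and the example hostname are all made up for
illustration, and I don't know where in the content server it would
need to hook in.  If the redirect URL is signed, I assume the parameter
would have to be added before signing rather than after.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def tag_redirect_url(url: str, relative_path: str, param: str = "pulp_path") -> str:
    """Append `param=<relative_path>` to the query string of `url`.

    Hypothetical helper: `pulp_path` is an invented parameter name,
    not something Pulp or CloudFront defines.
    """
    parts = urlsplit(url)
    # Percent-encode only the new parameter; the existing query string is
    # left byte-for-byte untouched so nothing already in the URL changes.
    extra = urlencode({param: relative_path})
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit(parts._replace(query=query))


if __name__ == "__main__":
    # Made-up artifact URL and RPM path, purely for illustration.
    print(tag_redirect_url(
        "https://cdn.example.com/artifact/ab/cdef0123",
        "Packages/f/foo-1.0-1.noarch.rpm",
    ))
```

The filename would then show up in the query-string field of each CDN
log line, which is the part the analytics tools could key on.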

Anyone have some great insights which might help me out here? :)  If it
helps, I'm building my own Docker images which ultimately run in EKS.  So
patches / extra modules are an option, but I'd prefer to stay as close to
vanilla upstream as possible with environment variable-based config
adjustments.

Thanks.
--Danny
_______________________________________________
Pulp-list mailing list
Pulp-list@redhat.com
https://listman.redhat.com/mailman/listinfo/pulp-list
