I've been bitten by this a couple times. I also noticed that ansible actually defines its own version of urljoin:
https://github.com/ansible/ansible/blob/00bd0b893d5d21de040b53032c466707bacb3b93/lib/ansible/galaxy/api.py#L166-L167 David On Mon, Mar 15, 2021 at 3:44 PM Grant Gainey <ggai...@redhat.com> wrote: > Hey folks, > > I was looking at https://pulp.plan.io/issues/7995 with an eye to fixing > it for 3.12. The underlying problem can be summed up as "urljoin doesn't do > what you expect if 'base' doesn't end in a '/'"[0][1]. Using urljoin() as > if all it does is "concatenate these two strings with a '/' between them" > is a pretty common misuse that works 'most of the time', alas. > > Looking around to see if we do this anywhere else, I find that pulp_rpm > uses urljoin() in a number of places that I think might be subject to the > same unintended problem. However, the semantics of sync'ing RPM > repositories is even *more* nuanced than urljoin()'s is! > > I am tempted to replace urljoin() with just straightforward path-creation. > Here's the list of places in pulp_rpm non-test python that uses it > currently: > > (pulp3) (master) ~/github/Pulp3/pulp_rpm $ find . -name \*.py | grep -v > tests | xargs grep -n "urljoin(" > ./pulp_rpm/app/kickstart/treeinfo.py:23: > url=urljoin(remote_url, namespace), > silence_errors_for_response_status_codes={403, 404} > ./pulp_rpm/app/models/repository.py:355: gpgkey_path = > urllib.parse.urljoin( > ./pulp_rpm/app/models/repository.py:358: gpgkey_path = > urllib.parse.urljoin(gpgkey_path, self.base_path, True) > ./pulp_rpm/app/tasks/synchronizing.py:101: downloader = > remote.get_downloader(url=urljoin(url, "repodata/repomd.xml")) > ./pulp_rpm/app/tasks/synchronizing.py:138: downloader = > remote.get_downloader(url=urljoin(remote_url, "repodata/repomd.xml")) > ./pulp_rpm/app/tasks/synchronizing.py:243: new_url = > urljoin(remote_url, path) > ./pulp_rpm/app/tasks/synchronizing.py:430: > url=urljoin(self.data.remote_url, "repodata/repomd.xml") > ./pulp_rpm/app/tasks/synchronizing.py:460: > url=urljoin(self.data.remote_url, self.treeinfo["filename"]), > ./pulp_rpm/app/tasks/synchronizing.py:470: > url=urljoin(self.data.remote_url, path), > ./pulp_rpm/app/tasks/synchronizing.py:599: url = > urljoin(self.data.remote_url, package.location_href) > ./pulp_rpm/app/tasks/synchronizing.py:722: repodata_url = > urljoin(self.data.remote_url, record.location_href) > ./pulp_rpm/app/tasks/synchronizing.py:726: self.data.updateinfo_url > = urljoin(self.data.remote_url, record.location_href) > ./pulp_rpm/app/tasks/synchronizing.py:731: comps_url = > urljoin(self.data.remote_url, record.location_href) > ./pulp_rpm/app/tasks/synchronizing.py:735: self.data.modules_url = > urljoin(self.data.remote_url, record.location_href) > ./pulp_rpm/app/tasks/synchronizing.py:743: > url=urljoin(self.data.remote_url, record.location_href), > ./pulp_rpm/app/downloaders.py:84: url = urljoin(self.url, > auth_param) > (pulp3) (master) ~/github/Pulp3/pulp_rpm $ > > Any thoughts before I dive down this rabbithole? I'm afraid I don't even > have a pocketwatch... > > G > > [0] https://docs.python.org/2/library/urlparse.html#urlparse.urljoin > [1] https://stackoverflow.com/a/10893427 > -- > Grant Gainey > Principal Software Engineer, Red Hat System Management Engineering > _______________________________________________ > Pulp-dev mailing list > Pulp-dev@redhat.com > https://listman.redhat.com/mailman/listinfo/pulp-dev >
_______________________________________________ Pulp-dev mailing list Pulp-dev@redhat.com https://listman.redhat.com/mailman/listinfo/pulp-dev