OK, great. Next question: what is the relationship between sdks/python/container/boot.go and Dataflow? Is this file used within the Dataflow bootstrapping process?
We're currently investigating a switch from Flink to Dataflow, and in doing so we hope to work our way back to using stock Dataflow containers wherever possible. If we make this PR to add pip.conf support, the changes will mostly be made in boot.go, and we'd just like to confirm that our updates will also make it into Dataflow verbatim.

-chad

On Fri, Sep 11, 2020 at 2:24 PM Ahmet Altay <[email protected]> wrote:

> On Fri, Sep 11, 2020 at 2:11 PM Robert Bradshaw <[email protected]> wrote:
>
>> Hmm... this is a difficult question. I think adding support for a
>> pip.conf probably makes the most sense, despite it being yet another
>> option.
>
> +1 - I think this is a good flag to add. I have heard similar user requests
> for passing specific flags to pip before. Supporting a generic way with an
> optional flag would address those requests.
>
>> Another alternative is to simply pre-install the dependencies you want
>> (or even just override /etc/pip.conf) in a custom container.
>>
>> On Wed, Sep 9, 2020 at 5:27 PM Chad Dombrova <[email protected]> wrote:
>>
>>> Hi all,
>>> We are running into problems trying to use our own PyPI mirror with
>>> Beam. For those who are not well versed in the esoterica of Python
>>> package management, pip provides a few ways to specify URLs for the PyPI
>>> index server:
>>>
>>> - command line [1]: via --index-url
>>> - environment variables [2]: via PIP_INDEX_URL. In Beam, we don't have
>>>   any way to influence the environment of the boot process that runs
>>>   pip install.
>>> - pip.conf [3]: we could provide this as an artifact, but we don't have
>>>   any way of placing it in the correct location (e.g. /etc/pip.conf) on
>>>   the instance that runs pip install.
>>> - requirements.txt files can specify certain pip install flags [4], such
>>>   as --index-url. As such, passing a requirements file via
>>>   --requirements_file would theoretically work, but we also want to be
>>>   able to provide dev packages as wheels via --extra_package, which would
>>>   be installed independently from the requirements file and thus use the
>>>   default PyPI index. We may be able to upload our wheel as an artifact
>>>   and refer to it using a local path in the requirements file, but this
>>>   solution seems a bit brittle, as the local artifacts path is different
>>>   for each job.
>>>
>>> Are there any known solutions to this problem? Here are some ideas:
>>>
>>> - Add support for providing a pip.conf as a known artifact type (akin
>>>   to --requirements_file). This is by far the most powerful and
>>>   straightforward solution, but do we have the stomach for yet another
>>>   CLI option?
>>> - Add support for providing a destination path for artifacts, which
>>>   would let us install it into /etc/pip.conf. I can see strong
>>>   safety/security concerns around this.
>>> - Provide a guarantee that the working directory for the boot process
>>>   is inside the artifact directory: then we could refer to wheels inside
>>>   our requirements file using relative paths.
>>>
>>> We're happy to make a pull request to add support for this feature, but
>>> it'd be great to have some input on the ideal solution before we begin.
>>>
>>> thanks!
>>> -chad
>>>
>>> [1] https://pip.pypa.io/en/stable/reference/pip_install/#install-index-url
>>> [2] https://pip.pypa.io/en/stable/user_guide/#environment-variables
>>> [3] https://pip.pypa.io/en/stable/user_guide/#config-file
>>> [4] https://pip.pypa.io/en/stable/reference/pip_install/#requirements-file-format
