On Fri, Sep 11, 2020 at 3:02 PM Robert Bradshaw <rober...@google.com> wrote:
> The long-term goal is for Dataflow to use the external containers rather than its own. Hopefully this happens sooner rather than later; until then, you can specify the Beam container as a custom container.
>
> On Fri, Sep 11, 2020 at 2:58 PM Chad Dombrova <chad...@gmail.com> wrote:
>
>> Ok, great. Next question:
>>
>> What is the relationship between sdks/python/container/boot.go and Dataflow? Is this file used within the Dataflow bootstrapping process?
>>
>> We're currently investigating a switch from Flink to Dataflow, and in doing so we hope to work our way back to using stock Dataflow containers wherever possible. If we make this PR to add pip.conf support, the changes will be made largely in boot.go, and we'd like to confirm that our updates will also make it into Dataflow verbatim.
>>

In addition to Robert's answer: Dataflow uses a similar boot.go file. If switching Dataflow to Beam containers is delayed, we _might_ be able to apply the changes from the PR to Dataflow's boot.go file.

>
>> -chad
>>
>> On Fri, Sep 11, 2020 at 2:24 PM Ahmet Altay <al...@google.com> wrote:
>>
>>> On Fri, Sep 11, 2020 at 2:11 PM Robert Bradshaw <rober...@google.com> wrote:
>>>
>>>> Hmm... this is a difficult question. I think adding support for a pip.conf probably makes the most sense, despite it being yet another option.
>>>>
>>>
>>> +1 - I think this is a good flag to add. I have heard similar user requests for passing specific flags to pip before. Supporting a generic way with an optional flag would address those requests.
>>>
>>>>
>>>> Another alternative is to simply pre-install the dependencies you want (or even just override /etc/pip.conf) in a custom container.
>>>>
>>>> On Wed, Sep 9, 2020 at 5:27 PM Chad Dombrova <chad...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>> We are running into problems trying to use our own PyPI mirror with Beam.
>>>>> For those who are not well versed in the esoterica of Python package management, pip provides a few ways to specify URLs for the PyPI index server:
>>>>>
>>>>> - command line [1]: via --index-url
>>>>> - environment variables [2]: via PIP_INDEX_URL. In Beam, we don't have any way to influence the environment of the boot process that runs pip install.
>>>>> - pip.conf [3]: we could provide this as an artifact, but we don't have any way of placing it in the correct location (e.g. /etc/pip.conf) on the instance that runs pip install.
>>>>> - requirements.txt files can specify certain pip install flags [4], such as --index-url. As such, passing a requirements file via --requirements_file would theoretically work, but we also want to be able to provide dev packages as wheels via --extra_package, which would be installed independently of the requirements file and would thus use the default PyPI index. We may be able to upload our wheel as an artifact and refer to it using a local path in the requirements file, but this solution seems a bit brittle, as the local artifacts path is different for each job.
>>>>>
>>>>> Are there any known solutions to this problem? Here are some ideas:
>>>>>
>>>>> - Add support for providing a pip.conf as a known artifact type (akin to --requirements_file). This is by far the most powerful and straightforward solution, but do we have the stomach for yet another CLI option?
>>>>> - Add support for providing a destination path for artifacts, which would let us install the pip.conf into /etc/pip.conf. I can see strong safety/security concerns around this.
>>>>> - Provide a guarantee that the working directory of the boot process is inside the artifact directory; then we could refer to wheels inside our requirements file using relative paths.
>>>>>
>>>>> We're happy to make a pull request to add support for this feature, but it'd be great to have some input on the ideal solution before we begin.
>>>>>
>>>>> Thanks!
>>>>> -chad
>>>>>
>>>>> [1] https://pip.pypa.io/en/stable/reference/pip_install/#install-index-url
>>>>> [2] https://pip.pypa.io/en/stable/user_guide/#environment-variables
>>>>> [3] https://pip.pypa.io/en/stable/user_guide/#config-file
>>>>> [4] https://pip.pypa.io/en/stable/reference/pip_install/#requirements-file-format