On Fri, Sep 11, 2020 at 3:02 PM Robert Bradshaw <rober...@google.com> wrote:

> The long term goal is for Dataflow to use the external containers rather
> than its own. Hopefully this happens sooner rather than later, and until
> then you can specify the beam container as a custom container.
>
> On Fri, Sep 11, 2020 at 2:58 PM Chad Dombrova <chad...@gmail.com> wrote:
>
>> Ok great.  Next question:
>>
>> What is the relationship between sdks/python/container/boot.go and
>> Dataflow?  Is this file used within the Dataflow bootstrapping process?
>>
>> We're currently investigating a switch from Flink to Dataflow, and in
>> doing so we hope to be able to work our way back to using stock Dataflow
>> containers wherever possible.  If we make this PR to add pip.conf support,
>> those changes will be largely made in boot.go, and we'd just like to
>> confirm that our updates will also make it into Dataflow, verbatim.
>>
>
In addition to Robert's answer: Dataflow uses a similar boot.go file. If
there is a delay in switching Dataflow to use Beam containers, we
_might_ be able to apply the changes from the PR to Dataflow's boot.go file.
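
To make the pip.conf idea concrete, here is a minimal sketch of how a boot
process could honor a staged pip.conf without writing to /etc/pip.conf. It is
written in Python for brevity and is not the actual boot.go code (which is Go).
The function names and the assumption that pip.conf sits at the top of the
artifact directory are hypothetical; PIP_CONFIG_FILE and --index-url are
existing pip features.

```python
import os
import tempfile


def pip_install_command(packages, index_url=None):
    """Build a pip install argv, optionally passing pip's --index-url flag."""
    cmd = ["python", "-m", "pip", "install"]
    if index_url:
        cmd += ["--index-url", index_url]
    return cmd + list(packages)


def pip_environment(artifact_dir, base_env=None, conf_name="pip.conf"):
    """Return the environment to use for the pip subprocess.

    Assumption: the job staged its pip.conf as `conf_name` directly inside
    `artifact_dir`. If the file exists, pip is pointed at it through the
    PIP_CONFIG_FILE environment variable; otherwise the environment is
    returned unchanged.
    """
    env = dict(os.environ if base_env is None else base_env)
    conf_path = os.path.join(artifact_dir, conf_name)
    if os.path.isfile(conf_path):
        env["PIP_CONFIG_FILE"] = conf_path
    return env


if __name__ == "__main__":
    # Simulate a staging directory that contains a user-supplied pip.conf.
    with tempfile.TemporaryDirectory() as staging:
        with open(os.path.join(staging, "pip.conf"), "w") as f:
            f.write("[global]\nindex-url = https://pypi.example.com/simple\n")
        env = pip_environment(staging, base_env={})
        print(env["PIP_CONFIG_FILE"])
        print(pip_install_command(["my_dev_package.whl"]))
```

Because the environment is applied to every pip invocation, this approach would
cover both the --requirements_file install and the --extra_package installs,
which is the gap described in the original message below.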


>
>> -chad
>>
>>
>>
>> On Fri, Sep 11, 2020 at 2:24 PM Ahmet Altay <al...@google.com> wrote:
>>
>>>
>>>
>>> On Fri, Sep 11, 2020 at 2:11 PM Robert Bradshaw <rober...@google.com>
>>> wrote:
>>>
>>>> Hmm... this is a difficult question. I think adding support for a
>>>> pip.conf probably makes the most sense, despite it being yet another
>>>> option.
>>>>
>>>
>>> +1 - I think this is a good flag to add. I have heard similar user requests
>>> for passing specific flags to pip before. Supporting this generically with an
>>> optional flag would address those requests.
>>>
>>>
>>>>
>>>> Another alternative is to simply pre-install the dependencies you want
>>>> (or even just override /etc/pip.conf) in a custom container.
>>>>
>>>> On Wed, Sep 9, 2020 at 5:27 PM Chad Dombrova <chad...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>> We are running into problems trying to use our own pypi mirror with
>>>>> Beam. For those who are not well versed in the esoterica of Python package
>>>>> management, pip provides a few ways to specify urls for the pypi index
>>>>> server:
>>>>>
>>>>>    - command line
>>>>>    <https://pip.pypa.io/en/stable/reference/pip_install/#install-index-url>[1]:
>>>>>    via --index-url
>>>>>    - environment variables
>>>>>    <https://pip.pypa.io/en/stable/user_guide/#environment-variables>[2]:
>>>>>    via PIP_INDEX_URL. In Beam, we don’t have any way to influence the
>>>>>    environment of the boot process that runs pip install.
>>>>>    - pip.conf <https://pip.pypa.io/en/stable/user_guide/#config-file>[3]:
>>>>>    we could provide this as an artifact, but we don’t have any way of
>>>>>    placing it in the correct location (e.g. /etc/pip.conf) on the instance
>>>>>    that runs pip install.
>>>>>    - requirements.txt files can specify certain pip install flags
>>>>>    <https://pip.pypa.io/en/stable/reference/pip_install/#requirements-file-format>[4],
>>>>>    such as --index-url. As such, passing a requirements file via
>>>>>    --requirements_file would theoretically work, but we also want to
>>>>>    be able to provide dev packages as wheels via --extra_package,
>>>>>    which would be installed independently from the requirements file and
>>>>>    thus use the default pypi index. We may be able to upload our wheel as
>>>>>    an artifact and refer to it using a local path in the requirements
>>>>>    file, but this solution seems a bit brittle as the local artifacts path
>>>>>    is different for each job.
>>>>>
>>>>> Are there any known solutions to this problem? Here are some ideas:
>>>>>
>>>>>    - add support for providing a pip.conf as a known artifact type
>>>>>    (akin to --requirements_file). This is by far the most powerful
>>>>>    and straightforward solution, but do we have the stomach for yet
>>>>>    another cli option?
>>>>>    - add support for providing a destination path for artifacts,
>>>>>    which would let us install it into /etc/pip.conf. I can see strong
>>>>>    safety/security concerns around this.
>>>>>    - provide a guarantee that the working directory for the boot
>>>>>    process is inside the artifact directory: then we could refer to wheels
>>>>>    inside our requirements file using relative paths.
>>>>>
>>>>> We're happy to make a pull request to add support for this feature,
>>>>> but it'd be great to have some input on the ideal solution before we
>>>>> begin.
>>>>>
>>>>> thanks!
>>>>> -chad
>>>>>
>>>>> [1] https://pip.pypa.io/en/stable/reference/pip_install/#install-index-url
>>>>> [2] https://pip.pypa.io/en/stable/user_guide/#environment-variables
>>>>> [3] https://pip.pypa.io/en/stable/user_guide/#config-file
>>>>> [4] https://pip.pypa.io/en/stable/reference/pip_install/#requirements-file-format
>>>>>
>>>>> -chad
>>>>>
>>>>>
