Hi Thomas

Thanks for bringing this up.

Currently, Python uses the SDK version as its default tag, while Java and Go
use latest. I agree that using latest as a tag is problematic. Only Python uses
the SDK version because Python has version.py, so the version is easy to read.
For Java and Go, we need to read it from gradle.properties both when building
images with the default tag and when setting the default image.
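
A rough sketch of reading the version and building the default image name,
written in Python for brevity. It assumes gradle.properties holds plain
key=value lines and that the version key is named sdk_version; both the key
name and the helper names (parse_properties, default_image) are hypothetical,
not existing Beam code.

```python
def parse_properties(text):
    """Parse simple `key=value` lines from a gradle.properties-style file.

    Only the plain `key=value` form is handled; full Java .properties syntax
    (':' separators, escapes, line continuations) is not supported.
    """
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments ('#' or '!' in .properties files).
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def default_image(repo, properties_text, version_key="sdk_version"):
    """Build '<repo>:<sdk version>' instead of '<repo>:latest'.

    version_key is an assumed name; check the real key in gradle.properties.
    """
    version = parse_properties(properties_text)[version_key]
    return f"{repo}:{version}"
```

Usage would be something like reading gradle.properties with open(...).read()
and passing the text to default_image("apachebeam/java_sdk", text).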

Here is what we need to do:
1. Read the SDK version from gradle.properties and use it as the default tag.
This is done for Python and still needs to be implemented for Java and Go.
2. Stop pulling images before executing the `docker run` command. This needs
to be fixed for Python, Java and Go.
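
For step 2, one way to avoid clobbering a locally built image is to check with
`docker images -q` first and pull only when nothing is found (the alternative
is to drop the pull entirely and let `docker run` fetch a missing image). This
is a minimal Python sketch of that check, not the actual DockerCommand.java
logic; the function names are hypothetical, and the `run` parameter exists
only so the Docker calls can be swapped out in tests.

```python
import subprocess


def image_exists_locally(image, run=subprocess.run):
    """Return True if `docker images -q IMAGE` prints an image ID."""
    result = run(["docker", "images", "-q", image],
                 capture_output=True, text=True, check=True)
    return bool(result.stdout.strip())


def ensure_image(image, exists=image_exists_locally, run=subprocess.run):
    """Pull `image` only when no local copy exists; return True if pulled.

    A locally built image with the same tag is left untouched, unlike an
    unconditional `docker pull`, which re-points the tag at the remote image
    whenever the remote one differs.
    """
    if exists(image):
        return False
    run(["docker", "pull", image], check=True)
    return True
```

The trade-off is that a stale local image is never refreshed, which is fine
for snapshot images built from source but means release images are only
correct if they are never re-tagged locally.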

Is this a blocker for 2.16? If so, and the above is too much work for 2.16 at
the moment, we can hardcode the default tag on the release branch for now.

Using a timestamp as a tag is an option as well, as long as runners know
which timestamp to use.
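
If we go with timestamps, or the 2.17.0.dev-$USER form Alan suggested, the tag
could be built roughly like this. It is a Python sketch with hypothetical
helper names; the exact datetime format used by run_validatescontainer.sh may
differ. Docker tags may only contain letters, digits, '_', '.' and '-' (at
most 128 characters), so a username with other characters would need
sanitizing first.

```python
import getpass
from datetime import datetime, timezone


def timestamped_tag(sdk_version, now=None):
    """Return '<version>-YYYYmmdd-HHMMSS', e.g. '2.17.0.dev-20191002-101300'.

    Defaults to the current UTC time. Two builds started in the same second
    get the same tag.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return f"{sdk_version}-{now:%Y%m%d-%H%M%S}"


def user_tag(sdk_version, user=None):
    """Return '<version>-<user>', following the '2.17.0.dev-$USER' idea."""
    return f"{sdk_version}-{user or getpass.getuser()}"
```

Whichever tag the build produces then has to be passed to the pipeline as its
container image, so that the runner does not fall back to the default tag.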

Hannah

On Wed, Oct 2, 2019 at 10:13 AM Alan Myrvold wrote:

> Yes, using the latest tag is problematic and can lead to unexpected
> behavior. Using a date/time or 2.17.0.dev-$USER tag would be better. The
> validates container shell script uses a datetime
> <https://github.com/apache/beam/blob/6551d0937ee31a8e310b63b222dbc750ec9331f8/sdks/python/container/run_validatescontainer.sh#L87>
> tag, which gives a unique name as long as no two tests run in the same second.
>
> On Wed, Oct 2, 2019 at 10:05 AM Thomas Weise wrote:
>
>> Want to bump this thread.
>>
>> If the current behavior is to replace the locally built image with the
>> last published one, then this is not only unexpected for developers but
>> also problematic for the CI, where tests should run against what was built
>> from source. Or am I missing something?
>>
>> Thanks,
>> Thomas
>>
>>
>> On Tue, Sep 24, 2019 at 7:08 PM Thomas Weise wrote:
>>
>>> Hi Hannah,
>>>
>>> I believe this is unexpected from the developer's perspective. When
>>> building something locally, we do expect that to be used. We may need to
>>> change this so that we don't pull when the image is available locally, at
>>> least for snapshot/master builds. Release images should be immutable anyway.
>>>
>>> Thomas
>>>
>>>
>>> On Tue, Sep 24, 2019 at 4:13 PM Hannah Jiang wrote:
>>>
>>>> A minor update: with a custom container, the pipeline does not fail; it
>>>> emits a warning and moves on to the `docker run` command.
>>>>
>>>> On Tue, Sep 24, 2019 at 4:05 PM Hannah Jiang wrote:
>>>>
>>>>> Hi Brian
>>>>>
>>>>> If we pull Docker images, Docker always checks the remote repository
>>>>> and downloads the remote image if it differs from the local one, which
>>>>> is expected behavior.
>>>>> If we want to run a local image and pull it only when the image is not
>>>>> available locally, we can use the `docker run` command directly, without
>>>>> pulling in advance. [1]
>>>>> If we want to pull images only when they are not available locally, we
>>>>> can use `docker images -q` to check whether the image exists locally
>>>>> before pulling it.
>>>>> Another option is to re-tag your local image and pass it to the pipeline
>>>>> to override the default one, but the code still tries to pull it, so if
>>>>> your image has not been pushed to the remote repository, it would fail.
>>>>>
>>>>> 1. https://github.com/docker/cli/pull/1498
>>>>>
>>>>> Hannah
>>>>>
>>>>> On Tue, Sep 24, 2019 at 11:56 AM Brian Hulette wrote:
>>>>>
>>>>>> I'm working on a demo cross-language pipeline on a local Flink
>>>>>> cluster that relies on my Python row coder PR [1]. The PR includes some
>>>>>> changes to the Java worker code, so I need to build a Java SDK container
>>>>>> locally and use that in the pipeline.
>>>>>>
>>>>>> Unfortunately, whenever I run the pipeline, the apachebeam/java_sdk:latest
>>>>>> tag is moved off my locally built image to a newly downloaded image created
>>>>>> 2 weeks ago, and that image is used instead. It looks like the reason is
>>>>>> that we run `docker pull` before running the container [2]. As the comment
>>>>>> says, this should be a no-op if the image already exists, but that doesn't
>>>>>> seem to be the case. If I just run `docker pull apachebeam/java_sdk:latest`
>>>>>> on my local machine, it downloads the 2-week-old image and happily informs me:
>>>>>>
>>>>>> Status: Downloaded newer image for apachebeam/java_sdk:latest
>>>>>>
>>>>>> Does anyone know how I can prevent `docker pull` from doing this? I can
>>>>>> unblock myself for now just by commenting out the docker pull command,
>>>>>> but I'd like to understand what is going on here.
>>>>>>
>>>>>> Thanks,
>>>>>> Brian
>>>>>>
>>>>>> [1] https://github.com/apache/beam/pull/9188
>>>>>> [2]
>>>>>> https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerCommand.java#L80
>>>>>>
>>>>>