Ah, I hadn't seen your pull request yet, Thomas. I'll take a look later.

On Mon, Nov 25, 2019 at 10:23 AM Thomas Weise <[email protected]> wrote:

> Thanks, I would prefer to solve this in a way where the user does not need
> to configure anything extra though.
>
>
> On Mon, Nov 25, 2019 at 10:21 AM Kyle Weaver <[email protected]> wrote:
>
>> When we added the class loader artifact stager, we introduced artifact
>> retrieval service type as a pipeline option. It would make sense to put a
>> "none" option there.
>>
>>
>> https://github.com/apache/beam/blob/5fd93af49e6cb86ff52b20f103371df7e0447b7f/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java#L107
>>
>>   RetrievalServiceType getRetrievalServiceType();
>>
>>
>> On Mon, Nov 25, 2019 at 10:05 AM Robert Bradshaw <[email protected]>
>> wrote:
>>
>>> boot.go could be updated to recognize NO_ARTIFACTS_STAGED_TOKEN as
>>> well. (Should this constant be put in a common location?)
>>>
>>> On Sat, Nov 23, 2019 at 9:16 AM Thomas Weise <[email protected]> wrote:
>>> >
>>> > JIRA: https://issues.apache.org/jira/browse/BEAM-8815
>>> >
>>> >
>>> > On Fri, Nov 22, 2019 at 5:31 PM Thomas Weise <[email protected]> wrote:
>>> >>
>>> >> I'm running into the issue Kyle points out when I try to run a
>>> pipeline that does not use artifact staging:
>>> >>
>>> >> 2019-11-23 01:09:18,442 WARN
>>> org.apache.beam.runners.fnexecution.artifact.AbstractArtifactRetrievalService
>>> - GetManifest for
>>> /tmp/beam-artifact-staging/job_53cad419-a8c0-472c-8486-f795cc88a80f/MANIFEST
>>> failed.
>>> >> java.util.concurrent.ExecutionException:
>>> java.io.FileNotFoundException:
>>> /tmp/beam-artifact-staging/job_53cad419-a8c0-472c-8486-f795cc88a80f/MANIFEST
>>> (No such file or directory)
>>> >> at
>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:531)
>>> >> at
>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:492)
>>> >> at
>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:83)
>>> >> at
>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:196)
>>> >> at
>>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2312)
>>> >>
>>> >> This happens when I use /opt/apache/beam/boot to start the worker in
>>> the process environment, as it will attempt to retrieve artifacts. The same
>>> would also be the case for the worker pool.
>>> >>
>>> >> Thomas
>>> >>
>>> >>
>>> >> On Tue, Nov 12, 2019 at 5:07 PM Robert Bradshaw <[email protected]>
>>> wrote:
>>> >>>
>>> >>> FWIW, there are also discussions of adding a preparation phase for
>>> sdk
>>> >>> harness (docker) images, such that artifacts could be staged (and
>>> >>> installed, compiled etc.) ahead of time and shipped as part of the
>>> sdk
>>> >>> image rather than via a side channel (and on every worker). Anyone
>>> not
>>> >>> using these images is probably shipping dependencies in another way
>>> >>> anyways.
>>> >>>
>>> >>> On Tue, Nov 12, 2019 at 5:03 PM Robert Bradshaw <[email protected]>
>>> wrote:
>>> >>> >
>>> >>> > Certainly there's a lot to be re-thought in terms of artifact
>>> staging,
>>> >>> > especially when it comes to cross-language pipelines. I think it
>>> would
>>> >>> > make sense to have a special retrieval token for the "empty"
>>> >>> > manifest, which would mean a staging directory would never have to
>>> be
>>> >>> > set up if no artifacts happened to be staged.
>>> >>> >
>>> >>> > The UberJar avoids any artifact staging overhead as well.
>>> >>> >
>>> >>> > On Tue, Nov 12, 2019 at 3:30 PM Kyle Weaver <[email protected]>
>>> wrote:
>>> >>> > >
>>> >>> > > Hi Beamers,
>>> >>> > >
>>> >>> > > We can use artifact staging to make sure SDK workers have access
>>> to a pipeline's dependencies. However, artifact staging is not always
>>> necessary. For example, one can make sure that the environment contains all
>>> the dependencies ahead of time. Even so, regardless of whether
>>> artifacts are used, my understanding is that an artifact manifest will be
>>> written and read anyway. For example:
>>> >>> > >
>>> >>> > > INFO AbstractArtifactRetrievalService: GetManifest for
>>> /tmp/beam-artifact-staging/.../MANIFEST -> 0 artifacts
>>> >>> > >
>>> >>> > > This can be a hassle, because users must set up a staging
>>> directory that all workers can access, even if it isn't used aside from the
>>> (empty) manifest [1]. Thomas mentioned that at Lyft they bypass artifact
>>> staging altogether [2]. So I was wondering, do you all think it would be
>>> reasonable or useful to create an "off switch" for artifact staging?
>>> >>> > >
>>> >>> > > Thanks,
>>> >>> > > Kyle
>>> >>> > >
>>> >>> > > [1]
>>> https://lists.apache.org/thread.html/d293b4158f266be1cb6c99c968535706f491fdfcd4bb20c4e30939bb@%3Cdev.beam.apache.org%3E
>>> >>> > > [2]
>>> https://issues.apache.org/jira/browse/BEAM-5187?focusedCommentId=16972715&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16972715
>>>
>>
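
For reference, the "empty manifest" token idea discussed above (have the boot code recognize NO_ARTIFACTS_STAGED_TOKEN and skip retrieval) could look roughly like the sketch below. This is only an illustration in Python, not the actual boot.go code: the token value, the retrieve_artifacts helper, and the fetch_manifest callback are all hypothetical names made up for this example.

```python
from typing import Callable, List

# Placeholder value only: the real value of NO_ARTIFACTS_STAGED_TOKEN is not
# quoted anywhere in this thread, so this string is an assumption.
NO_ARTIFACTS_STAGED_TOKEN = "__hypothetical_no_artifacts_staged__"


def retrieve_artifacts(
    retrieval_token: str,
    fetch_manifest: Callable[[str], List[str]],
) -> List[str]:
    """Return the artifact names to materialize on the worker.

    fetch_manifest is a hypothetical stand-in for the GetManifest call to the
    artifact retrieval service. It is never invoked when the token says that
    nothing was staged, so no staging directory or MANIFEST file is needed.
    """
    if retrieval_token == NO_ARTIFACTS_STAGED_TOKEN:
        return []
    return fetch_manifest(retrieval_token)
```

With something like this, a pipeline that stages nothing never touches /tmp/beam-artifact-staging/.../MANIFEST, and the user does not have to set any extra pipeline option.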
