JIRA: https://issues.apache.org/jira/browse/BEAM-8815
On Fri, Nov 22, 2019 at 5:31 PM Thomas Weise wrote:
> I'm running into the issue Kyle points out when I try to run a pipeline
> that does not use artifact staging:
>
> 2019-11-23 01:09:18,442 WARN org.apache.beam.runners.fnexecution.artifact.AbstractArtifactRetrievalService - GetManifest for /tmp/beam-artifact-staging/job_53cad419-a8c0-472c-8486-f795cc88a80f/MANIFEST failed.
> java.util.concurrent.ExecutionException: java.io.FileNotFoundException: /tmp/beam-artifact-staging/job_53cad419-a8c0-472c-8486-f795cc88a80f/MANIFEST (No such file or directory)
>     at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:531)
>     at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:492)
>     at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:83)
>     at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:196)
>     at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2312)
>
> This happens when I use /opt/apache/beam/boot to start the worker in the
> process environment, as it will attempt to retrieve artifacts. The same
> would also be the case for the worker pool.
>
> Thomas
>
>
> On Tue, Nov 12, 2019 at 5:07 PM Robert Bradshaw wrote:
>
>> FWIW, there are also discussions of adding a preparation phase for SDK
>> harness (Docker) images, such that artifacts could be staged (and
>> installed, compiled, etc.) ahead of time and shipped as part of the SDK
>> image rather than via a side channel (and on every worker). Anyone not
>> using these images is probably shipping dependencies in another way
>> anyway.
>>
>> On Tue, Nov 12, 2019 at 5:03 PM Robert Bradshaw wrote:
>> >
>> > Certainly there's a lot to be rethought in terms of artifact staging,
>> > especially when it comes to cross-language pipelines. I think it would
>> > make sense to have a special retrieval token for the "empty"
>> > manifest, which would mean a staging directory would never have to be
>> > set up if no artifacts happened to be staged.
>> >
>> > The UberJar avoids any artifact staging overhead as well.
>> >
>> > On Tue, Nov 12, 2019 at 3:30 PM Kyle Weaver wrote:
>> > >
>> > > Hi Beamers,
>> > >
>> > > We can use artifact staging to make sure SDK workers have access to a
>> > > pipeline's dependencies. However, artifact staging is not always
>> > > necessary: for example, one can make sure that the environment
>> > > contains all the dependencies ahead of time. Yet regardless of whether
>> > > or not artifacts are used, my understanding is that an artifact
>> > > manifest is written and read anyway. For example:
>> > >
>> > > INFO AbstractArtifactRetrievalService: GetManifest for /tmp/beam-artifact-staging/.../MANIFEST -> 0 artifacts
>> > >
>> > > This can be a hassle, because users must set up a staging directory
>> > > that all workers can access, even if it isn't used aside from the
>> > > (empty) manifest [1]. Thomas mentioned that at Lyft they bypass
>> > > artifact staging altogether [2]. So I was wondering: do you all think
>> > > it would be reasonable or useful to create an "off switch" for
>> > > artifact staging?
>> > >
>> > > Thanks,
>> > > Kyle
>> > >
>> > > [1] https://lists.apache.org/thread.html/d293b4158f266be1cb6c99c968535706f491fdfcd4bb20c4e30939bb@%3Cdev.beam.apache.org%3E
>> > > [2] https://issues.apache.org/jira/browse/BEAM-5187?focusedCommentId=16972715&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16972715
>> >
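To make Robert's idea of a special retrieval token for the "empty" manifest a bit more concrete, here is a minimal, hypothetical sketch in Java. It is not Beam's actual API: the class EmptyManifestSketch, the sentinel EMPTY_MANIFEST_TOKEN, and the method getManifest are invented for illustration, and an ordinary token is simply treated as a path to a plain-text manifest with one artifact name per line, which is a simplification and not the real manifest format. What it shows is that when the retrieval side recognizes a sentinel token, it can answer "0 artifacts" without touching any staging directory, so a missing MANIFEST file like the one in Thomas's stack trace never comes into play for pipelines that stage nothing.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch (not Beam's API): a manifest lookup that short-circuits
 * a sentinel "empty manifest" retrieval token, so pipelines that stage no
 * artifacts never need a shared staging directory.
 */
public class EmptyManifestSketch {

  /** Hypothetical sentinel token meaning "no artifacts were staged". */
  static final String EMPTY_MANIFEST_TOKEN = "empty-manifest";

  /**
   * Returns the artifact names for a retrieval token. In this sketch an
   * ordinary token is just a path to a plain-text manifest with one
   * artifact name per line (a simplification of the real format).
   */
  static List<String> getManifest(String retrievalToken) {
    if (EMPTY_MANIFEST_TOKEN.equals(retrievalToken)) {
      // No file system access: nothing was staged, so there is nothing to read.
      return Collections.emptyList();
    }
    try {
      return Files.readAllLines(Paths.get(retrievalToken), StandardCharsets.UTF_8);
    } catch (IOException e) {
      // Mirrors the failure mode in the thread: the MANIFEST file is missing.
      throw new UncheckedIOException("GetManifest for " + retrievalToken + " failed", e);
    }
  }

  public static void main(String[] args) {
    // Sentinel token: no staging directory is consulted at all.
    System.out.println(getManifest(EMPTY_MANIFEST_TOKEN).size());

    // Ordinary token pointing at a manifest that was never written.
    try {
      getManifest("/tmp/beam-artifact-staging/no-such-job/MANIFEST");
    } catch (UncheckedIOException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Running main should print 0 for the sentinel token and then the failure message for the nonexistent manifest path. Keeping the check on the retrieval side means a worker started via a boot binary that always asks for a manifest would still get a valid (empty) answer.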
