> I am expecting FileStagingOptions#setFilesToStage in PortablePipelineOptions <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java#L28> is the way to customize artifacts to be staged and resolved in portable pipeline, however, it looks like that PortableRunner <https://github.com/apache/beam/blob/master/runners/portability/java/src/main/java/org/apache/beam/runners/portability/PortableRunner.java#L129> does not add preconfigured files to `filesToStageBuilder` which is used in the final options to prepare the job. Is this the expected behavior or maybe a bug?
Yeah, that looks like a bug.

> In addition, do we support specifying an URL in PortablePipelineOptions#filesToStage so that ArtifactRetrievalService can retrieve artifacts from a remote address instead of default from JobServer, which got artifacts from SDK Client. I am asking because I noticed

Files can use any of Beam's supported remote file systems (GCS, S3, Azure Blobstore, HDFS). But arbitrary URLs are not supported.

On Wed, Apr 28, 2021 at 5:44 PM Ke Wu <[email protected]> wrote:

> Hello All,
>
> I am expecting FileStagingOptions#setFilesToStage in
> PortablePipelineOptions
> <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java#L28>
> is the way to customize artifacts to be staged and resolved in portable
> pipeline, however, it looks like that PortableRunner
> <https://github.com/apache/beam/blob/master/runners/portability/java/src/main/java/org/apache/beam/runners/portability/PortableRunner.java#L129>
> does not add preconfigured files to `filesToStageBuilder` which is used in
> the final options to prepare the job. Is this the expected behavior or
> maybe a bug?
>
> In addition, do we support specifying an URL in
> PortablePipelineOptions#filesToStage so that ArtifactRetrievalService can
> retrieve artifacts from a remote address instead of default from JobServer,
> which got artifacts from SDK Client. I am asking because I noticed
>
> public static InputStream getArtifact(RunnerApi.ArtifactInformation artifact)
>     throws IOException {
>   switch (artifact.getTypeUrn()) {
>     case FILE_ARTIFACT_URN:
>       RunnerApi.ArtifactFilePayload payload =
>           RunnerApi.ArtifactFilePayload.parseFrom(artifact.getTypePayload());
>       return Channels.newInputStream(
>           FileSystems.open(
>               FileSystems.matchNewResource(payload.getPath(), false /* is directory */)));
>     case EMBEDDED_ARTIFACT_URN:
>       return RunnerApi.EmbeddedFilePayload.parseFrom(artifact.getTypePayload())
>           .getData()
>           .newInput();
>     default:
>       throw new UnsupportedOperationException(
>           "Unexpected artifact type: " + artifact.getTypeUrn());
>   }
> }
>
> Which indicates that only File and Embed artifacts seem to be supported
> now.
>
> Best,
> Ke
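
For illustration only, here is a rough sketch of the behavior the question seems to expect: files preconfigured via filesToStage are merged with whatever the runner detects on its own, rather than being dropped. This is plain Java and not Beam's actual code; the class name, the helper `mergeFilesToStage`, and the example paths are all hypothetical, and the real fix in PortableRunner may well look different.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class FilesToStageSketch {

  // Hypothetical helper (not part of Beam): combine user-configured files
  // with runner-detected files, keeping first-seen order and dropping
  // duplicates so the same artifact is not staged twice.
  static List<String> mergeFilesToStage(List<String> preconfigured, List<String> detected) {
    Set<String> merged = new LinkedHashSet<>();
    if (preconfigured != null) {
      merged.addAll(preconfigured);
    }
    if (detected != null) {
      merged.addAll(detected);
    }
    return new ArrayList<>(merged);
  }

  public static void main(String[] args) {
    // Example paths are made up for this sketch.
    List<String> preconfigured = List.of("gs://example-bucket/udfs.jar");
    List<String> detected = List.of("/tmp/app.jar", "gs://example-bucket/udfs.jar");
    System.out.println(mergeFilesToStage(preconfigured, detected));
  }
}
```

A LinkedHashSet is used here so that user-configured entries keep their position ahead of detected ones; whether ordering matters for staging is an assumption of this sketch.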
