> I expected FileStagingOptions#setFilesToStage in PortablePipelineOptions
> <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java#L28>
> to be the way to customize the artifacts staged and resolved in a portable
> pipeline. However, it looks like PortableRunner
> <https://github.com/apache/beam/blob/master/runners/portability/java/src/main/java/org/apache/beam/runners/portability/PortableRunner.java#L129>
> does not add preconfigured files to `filesToStageBuilder`, which is used in the
> final options to prepare the job. Is this the expected behavior or a bug?

Yeah, that looks like a bug.
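For illustration only, here is a minimal sketch of the behavior the question expects. It assumes that an explicitly configured filesToStage list should take precedence and that classpath detection is only a fallback when nothing is configured. The class and method names (FilesToStageSketch, resolveFilesToStage) are hypothetical; this is not PortableRunner's actual code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch of how configured files could feed the staging list; not Beam code. */
public final class FilesToStageSketch {

  /**
   * Returns the files to stage. Assumption: a non-null configured list (what
   * setFilesToStage would set) wins; otherwise fall back to classpath detection.
   * Duplicates are dropped while preserving the original order.
   */
  static List<String> resolveFilesToStage(
      List<String> configured, Supplier<List<String>> detectClasspath) {
    List<String> source = configured != null ? configured : detectClasspath.get();
    // LinkedHashSet removes duplicates but keeps insertion order.
    return new ArrayList<>(new LinkedHashSet<>(source));
  }

  public static void main(String[] args) {
    Supplier<List<String>> detected = () -> List.of("/app/pipeline.jar", "/app/deps.jar");

    // Nothing configured: fall back to the detected classpath.
    System.out.println(resolveFilesToStage(null, detected));

    // Configured files are used instead of being silently ignored.
    System.out.println(
        resolveFilesToStage(List.of("/opt/udfs.jar", "/opt/udfs.jar"), detected));
  }
}
```

Taking the detector as a Supplier means the (potentially expensive) classpath scan only runs when nothing was configured.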

> In addition, do we support specifying a URL in
> PortablePipelineOptions#filesToStage so that ArtifactRetrievalService can
> retrieve artifacts from a remote address instead of from the JobServer (the
> default), which receives them from the SDK client? I am asking because I noticed

Files can be on any of Beam's supported remote file systems (GCS, S3, Azure
Blob Storage, HDFS), but arbitrary URLs are not supported.
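A rough client-side pre-flight check along those lines could look at the scheme of each filesToStage entry. This is a hypothetical helper (StagingPathCheck, looksStageable), and the scheme set (file, gs, s3, azfs, hdfs) is an assumption: which schemes actually resolve depends on the Beam FileSystem modules registered at runtime.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical check, not a Beam API: does a staging path use a scheme a FileSystem handles? */
public final class StagingPathCheck {

  // Assumed scheme set; the real one depends on which Beam FileSystem modules are registered.
  private static final Set<String> ASSUMED_SCHEMES =
      Set.of("file", "gs", "s3", "azfs", "hdfs");

  static boolean looksStageable(String path) {
    int sep = path.indexOf("://");
    if (sep < 0) {
      // No scheme: treated as a plain local path such as /tmp/udfs.jar.
      return true;
    }
    String scheme = path.substring(0, sep).toLowerCase(Locale.ROOT);
    return ASSUMED_SCHEMES.contains(scheme);
  }

  public static void main(String[] args) {
    String[] paths = {"/tmp/udfs.jar", "gs://bucket/udfs.jar", "https://example.com/udfs.jar"};
    for (String p : paths) {
      System.out.println(p + " -> " + looksStageable(p));
    }
  }
}
```

In practice, resolving the path through Beam's FileSystems (as getArtifact does) is the authoritative test; string matching only keeps this sketch dependency-free.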

On Wed, Apr 28, 2021 at 5:44 PM Ke Wu wrote:

> Hello All,
>
> I expected FileStagingOptions#setFilesToStage in PortablePipelineOptions
> <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java#L28>
> to be the way to customize the artifacts staged and resolved in a portable
> pipeline. However, it looks like PortableRunner
> <https://github.com/apache/beam/blob/master/runners/portability/java/src/main/java/org/apache/beam/runners/portability/PortableRunner.java#L129>
> does not add preconfigured files to `filesToStageBuilder`, which is used in the
> final options to prepare the job. Is this the expected behavior or a bug?
>
> In addition, do we support specifying a URL in
> PortablePipelineOptions#filesToStage so that ArtifactRetrievalService can
> retrieve artifacts from a remote address instead of from the JobServer (the
> default), which receives them from the SDK client? I am asking because I noticed
>
> public static InputStream getArtifact(RunnerApi.ArtifactInformation artifact)
>     throws IOException {
>   switch (artifact.getTypeUrn()) {
>     case FILE_ARTIFACT_URN:
>       RunnerApi.ArtifactFilePayload payload =
>           RunnerApi.ArtifactFilePayload.parseFrom(artifact.getTypePayload());
>       return Channels.newInputStream(
>           FileSystems.open(
>               FileSystems.matchNewResource(payload.getPath(), false /* is directory */)));
>     case EMBEDDED_ARTIFACT_URN:
>       return RunnerApi.EmbeddedFilePayload.parseFrom(artifact.getTypePayload())
>           .getData()
>           .newInput();
>     default:
>       throw new UnsupportedOperationException(
>           "Unexpected artifact type: " + artifact.getTypeUrn());
>   }
> }
>
> This suggests that only file and embedded artifacts are supported at the
> moment.
>
> Best,
> Ke
>
