max-lepikhin commented on PR #24527: URL: https://github.com/apache/beam/pull/24527#issuecomment-1350207932
> This seems to be adding this to the job server (not the SDK container). Or do you need this to read/write from Azure with non-portable Flink runner somehow?

Not sure about the differences between portable and non-portable. Python Beam requires a Beam Flink job server; we used the jar from Maven (`org.apache.beam:beam-runners-flink-1.14-job-server:2.43.0`) and passed it to Python Beam via the [flink_job_server_jar](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/options/pipeline_options.py#L1479) flag. The job server then failed to read artifacts in either configuration:

- `/tmp` (the default) when `--artifacts_dir` is not set. In this case it fails on the Flink worker side, because the artifact's local path is passed to the worker unchanged; the artifacts apparently need to be staged in cloud storage rather than in local `/tmp` on the job VM.
- `azfs://`. In this case it fails in the job server itself, because the Azure filesystem is not on the classpath when the server starts.

This change adds the Azure filesystem to the jar so that the job server no longer fails when `azfs://` is used in `--artifacts_dir`. A minimal sketch of how the pipeline was launched is below.
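For reference, this is roughly how we launch the pipeline from Python (a sketch only; the jar path, Flink master address, and `azfs://` staging path are placeholders, not values from this PR):

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    '--runner=FlinkRunner',
    '--flink_master=localhost:8081',  # placeholder Flink REST address
    # Local job-server jar downloaded from Maven (placeholder path):
    '--flink_job_server_jar=/path/to/beam-runners-flink-1.14-job-server-2.43.0.jar',
    # Staging location; with this PR the azfs scheme works, without it the
    # job server fails at startup because the Azure FS is not on the classpath:
    '--artifacts_dir=azfs://myaccount/mycontainer/beam-staging',
    '--environment_type=DOCKER',
])

with beam.Pipeline(options=options) as p:
    p | beam.Create(['smoke test']) | beam.Map(print)
```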
