Eugene Kirpichov created BEAM-2712:
--------------------------------------

Summary: SerializablePipelineOptions should not call FileSystems.setDefaultPipelineOptions.
Key: BEAM-2712
URL: https://issues.apache.org/jira/browse/BEAM-2712
Project: Beam
Issue Type: Bug
Components: runner-apex, runner-core, runner-flink, runner-spark
Reporter: Eugene Kirpichov
Assignee: Kenneth Knowles
https://github.com/apache/beam/pull/3654 introduces SerializablePipelineOptions, which calls FileSystems.setDefaultPipelineOptions when it is deserialized. Because that call mutates process-wide static state, this is problematic and racy whenever the same process deserializes SerializablePipelineOptions instances that carry different filesystem-related options: whichever call runs last determines the filesystem configuration for everyone in the process.

The PR does this because the Flink and Apex runners were already doing it in their own SerializablePipelineOptions-like classes (which the PR removes). The Spark runner wasn't, but probably should have been. I believe the intent is to make the proper filesystem options available automatically on workers wherever any kind of PipelineOptions is used.

Instead, each of the 3 runners should pick a better place to initialize its workers and explicitly call FileSystems.setDefaultPipelineOptions there. It would be even better if FileSystems.setDefaultPipelineOptions didn't exist at all, but that's a topic for a separate JIRA.

CC'ing runner contributors [~aljoscha] [~aviemzur] [~thw]
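The difference between the two approaches can be sketched in plain Java, without depending on Beam itself. Every name below (FsOptions, GlobalFsDefaults, SideEffectingOptions, PlainOptions, initializeWorker, roundTrip) is hypothetical and is not part of Beam's API. GlobalFsDefaults only stands in for the kind of process-wide state that FileSystems.setDefaultPipelineOptions mutates. The sketch shows that when deserialization installs global defaults as a side effect, deserializing some other options object silently overrides what a worker set up explicitly, while side-effect-free deserialization leaves it alone.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.atomic.AtomicReference;

final class DefaultOptionsSketch {

  /** Hypothetical stand-in for filesystem-related pipeline options. */
  static final class FsOptions implements Serializable {
    private static final long serialVersionUID = 1L;
    final String region;

    FsOptions(String region) {
      this.region = region;
    }
  }

  /** Hypothetical stand-in for process-wide filesystem defaults (not Beam's FileSystems). */
  static final class GlobalFsDefaults {
    private static final AtomicReference<FsOptions> DEFAULT = new AtomicReference<>();

    static void setDefault(FsOptions options) {
      DEFAULT.set(options);
    }

    static FsOptions getDefault() {
      return DEFAULT.get();
    }
  }

  /** The pattern the issue objects to: deserialization installs global defaults as a side effect. */
  static final class SideEffectingOptions implements Serializable {
    private static final long serialVersionUID = 1L;
    final FsOptions options;

    SideEffectingOptions(FsOptions options) {
      this.options = options;
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
      in.defaultReadObject();
      GlobalFsDefaults.setDefault(options); // whichever instance is deserialized last wins
    }
  }

  /** Side-effect-free alternative: deserialization only restores fields. */
  static final class PlainOptions implements Serializable {
    private static final long serialVersionUID = 1L;
    final FsOptions options;

    PlainOptions(FsOptions options) {
      this.options = options;
    }
  }

  /** Hypothetical runner hook, called explicitly once when a worker starts. */
  static void initializeWorker(FsOptions options) {
    GlobalFsDefaults.setDefault(options);
  }

  /** Serializes and deserializes a value, roughly as shipping it to a worker would. */
  @SuppressWarnings("unchecked")
  static <T extends Serializable> T roundTrip(T value) throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(value);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return (T) in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    // Worker startup: the runner explicitly installs this worker's filesystem options.
    initializeWorker(new FsOptions("us"));

    // Deserializing side-effect-free options leaves the worker's default untouched.
    roundTrip(new PlainOptions(new FsOptions("eu")));
    System.out.println(GlobalFsDefaults.getDefault().region); // prints "us"

    // Deserializing side-effecting options silently replaces it.
    roundTrip(new SideEffectingOptions(new FsOptions("eu")));
    System.out.println(GlobalFsDefaults.getDefault().region); // prints "eu"
  }
}
```

With explicit initialization there is a single, visible point where a worker's filesystem configuration is chosen, instead of one that depends on the order in which options objects happen to be deserialized.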