[ 
https://issues.apache.org/jira/browse/FLINK-33694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl resolved FLINK-33694.
-----------------------------------
    Fix Version/s: 1.19.0
                   1.17.3
                   1.18.2
         Assignee: Patrick Lucas
       Resolution: Fixed

master: 
[a41229b24d82e8c561350c42d8a98dfb865c3f69|https://github.com/apache/flink/commit/a41229b24d82e8c561350c42d8a98dfb865c3f69]
1.18: 
[846ab49afd20ecf49fe76e18dd3e9b41143bf207|https://github.com/apache/flink/commit/846ab49afd20ecf49fe76e18dd3e9b41143bf207]
1.17: 
[257c526d6ae404f4598aeb2b9efa85674df2e6cd|https://github.com/apache/flink/commit/257c526d6ae404f4598aeb2b9efa85674df2e6cd]

> GCS filesystem does not respect gs.storage.root.url config option
> -----------------------------------------------------------------
>
>                 Key: FLINK-33694
>                 URL: https://issues.apache.org/jira/browse/FLINK-33694
>             Project: Flink
>          Issue Type: Bug
>          Components: FileSystems
>    Affects Versions: 1.18.0, 1.17.2
>            Reporter: Patrick Lucas
>            Assignee: Patrick Lucas
>            Priority: Major
>              Labels: gcs, pull-request-available
>             Fix For: 1.19.0, 1.17.3, 1.18.2
>
>
> The GCS FileSystem's RecoverableWriter implementation uses the GCS SDK 
> directly rather than going through Hadoop. While support has been added to 
> configure credentials correctly based on the standard Hadoop implementation 
> configuration, no other options are passed through to the underlying client.
> Because this only affects the RecoverableWriter-related codepaths, the 
> FileSystem can behave very differently depending on whether it is used as a 
> source or a sink: a {{gs://}}-URI FileSource may work fine, while a 
> {{gs://}}-URI FileSink may not work at all.
> We use [fake-gcs-server|https://github.com/fsouza/fake-gcs-server] in 
> testing, and so we override the Hadoop GCS FileSystem config option 
> {{gs.storage.root.url}}. However, because this option is not considered 
> when creating the GCS client for the RecoverableWriter codepath, in a 
> FileSink the GCS FileSystem attempts to write to the real GCS service rather 
> than fake-gcs-server. At the same time, a FileSource works as expected, 
> reading from fake-gcs-server.
> The fix should be fairly straightforward: read the {{gs.storage.root.url}} 
> config option from the Hadoop FileSystem config in 
> [{{GSFileSystemOptions}}|https://github.com/apache/flink/blob/release-1.18.0/flink-filesystems/flink-gs-fs-hadoop/src/main/java/org/apache/flink/fs/gs/GSFileSystemOptions.java#L30]
>  and, if set, passing it to {{storageOptionsBuilder}} in 
> [{{GSFileSystemFactory}}|https://github.com/apache/flink/blob/release-1.18.0/flink-filesystems/flink-gs-fs-hadoop/src/main/java/org/apache/flink/fs/gs/GSFileSystemFactory.java].
> The only workaround for this is to build a custom flink-gs-fs-hadoop JAR with 
> a patch and use it as a plugin.
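
The pass-through described in the issue can be sketched roughly as follows. This is a minimal illustration, not the code from the linked commits: the class name, the map standing in for the Hadoop FileSystem config, and FakeStorageOptionsBuilder are all hypothetical stand-ins, so the example runs without the Hadoop or GCS SDK dependencies. In the real plugin the builder would presumably be the GCS SDK's storage options builder, and the assumption here is that the root URL maps onto its host setting. The URL http://localhost:4443 is only an example address for a local fake-gcs-server.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class GcsRootUrlSketch {

    // Option name taken from the issue text.
    static final String ROOT_URL_KEY = "gs.storage.root.url";

    /** Hypothetical stand-in for the GCS SDK's storage options builder. */
    static final class FakeStorageOptionsBuilder {
        // Public GCS endpoint, used when no override is configured.
        private String host = "https://storage.googleapis.com";

        FakeStorageOptionsBuilder setHost(String host) {
            this.host = host;
            return this;
        }

        String getHost() {
            return host;
        }
    }

    /** Returns the configured root URL, or empty if it is unset or blank. */
    static Optional<String> getStorageRootUrl(Map<String, String> config) {
        return Optional.ofNullable(config.get(ROOT_URL_KEY))
                .map(String::trim)
                .filter(s -> !s.isEmpty());
    }

    /** Builds client options, overriding the host only when the option is set. */
    static FakeStorageOptionsBuilder configure(Map<String, String> config) {
        FakeStorageOptionsBuilder builder = new FakeStorageOptionsBuilder();
        getStorageRootUrl(config).ifPresent(builder::setHost);
        return builder;
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        // Without the option, the default endpoint is kept.
        System.out.println(configure(config).getHost());

        // With the option, the RecoverableWriter client would target the override.
        config.put(ROOT_URL_KEY, "http://localhost:4443");
        System.out.println(configure(config).getHost());
    }
}
```

Leaving the builder untouched when the option is absent keeps the default behavior unchanged for users who never set {{gs.storage.root.url}}.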



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
