[ 
https://issues.apache.org/jira/browse/FLINK-33694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17792050#comment-17792050
 ] 

Patrick Lucas commented on FLINK-33694:
---------------------------------------

[~martijnvisser] my comment is perhaps not precise, in that "support has been 
added to configure credentials correctly" should be qualified with the comment 
from the docs about which credentials mechanisms are supported. But it is still 
true that other potentially-interesting options are not proxied through, such 
as setting the root URL.

This change as written doesn't affect credentials handling at all; it only 
adds support for this one additional option. However, I could see an argument 
for implementing the behavior in {{org.apache.flink.fs.gs.utils.ConfigUtils}}, 
as the credentials behavior is, rather than in {{GSFileSystemOptions}}, as I 
did to start with.

> GCS filesystem does not respect gs.storage.root.url config option
> -----------------------------------------------------------------
>
>                 Key: FLINK-33694
>                 URL: https://issues.apache.org/jira/browse/FLINK-33694
>             Project: Flink
>          Issue Type: Bug
>          Components: FileSystems
>    Affects Versions: 1.18.0, 1.17.2
>            Reporter: Patrick Lucas
>            Priority: Major
>              Labels: gcs, pull-request-available
>
> The GCS FileSystem's RecoverableWriter implementation uses the GCS SDK 
> directly rather than going through Hadoop. While support has been added to 
> configure credentials correctly based on the standard Hadoop implementation 
> configuration, no other options are passed through to the underlying client.
> Because this only affects the RecoverableWriter-related codepaths, the 
> FileSystem can behave surprisingly differently depending on whether it is 
> used as a source or a sink: while a {{gs://}}-URI FileSource may work fine, 
> a {{gs://}}-URI FileSink may not work at all.
> We use [fake-gcs-server|https://github.com/fsouza/fake-gcs-server] in 
> testing, and so we override the Hadoop GCS FileSystem config option 
> {{gs.storage.root.url}}. However, because this option is not considered 
> when creating the GCS client for the RecoverableWriter codepath, in a 
> FileSink the GCS FileSystem attempts to write to the real GCS service rather 
> than fake-gcs-server. At the same time, a FileSource works as expected, 
> reading from fake-gcs-server.
> The fix should be fairly straightforward: read the {{gs.storage.root.url}} 
> config option from the Hadoop FileSystem config in 
> [{{GSFileSystemOptions}}|https://github.com/apache/flink/blob/release-1.18.0/flink-filesystems/flink-gs-fs-hadoop/src/main/java/org/apache/flink/fs/gs/GSFileSystemOptions.java#L30]
> and, if set, pass it to {{storageOptionsBuilder}} in 
> [{{GSFileSystemFactory}}|https://github.com/apache/flink/blob/release-1.18.0/flink-filesystems/flink-gs-fs-hadoop/src/main/java/org/apache/flink/fs/gs/GSFileSystemFactory.java].
> The only workaround for this is to build a custom flink-gs-fs-hadoop JAR with 
> a patch and use it as a plugin.
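
For illustration only, the fix described in the quoted issue could be sketched 
roughly as below. This is not the actual patch: the class RootUrlSketch, the 
FakeStorageOptionsBuilder stand-in, the plain Map used in place of the Hadoop 
configuration, the default host, and the localhost URL are all assumptions made 
so the sketch runs without Flink, Hadoop, or the GCS SDK on the classpath. In 
the real code the builder would presumably be the SDK's storage options builder 
and the value would come from the Hadoop FileSystem config.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical, dependency-free sketch of passing the root URL through to the client builder. */
class RootUrlSketch {

    /** Config key named in the issue. */
    static final String ROOT_URL_KEY = "gs.storage.root.url";

    /** Stand-in for the GCS SDK's storage options builder (an assumption, not the real class). */
    static final class FakeStorageOptionsBuilder {
        // Placeholder default endpoint, used when no override is configured.
        private String host = "https://storage.googleapis.com";

        FakeStorageOptionsBuilder setHost(String host) {
            this.host = host;
            return this;
        }

        String getHost() {
            return host;
        }
    }

    /** Reads the optional root URL; a Map stands in for the Hadoop configuration here. */
    static Optional<String> readRootUrl(Map<String, String> hadoopConfig) {
        String value = hadoopConfig.get(ROOT_URL_KEY);
        if (value == null || value.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(value.trim());
    }

    /** Builds the client options, overriding the host only if the option is set. */
    static FakeStorageOptionsBuilder configure(Map<String, String> hadoopConfig) {
        FakeStorageOptionsBuilder builder = new FakeStorageOptionsBuilder();
        readRootUrl(hadoopConfig).ifPresent(builder::setHost);
        return builder;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Illustrative address for a local fake-gcs-server instance.
        conf.put(ROOT_URL_KEY, "http://localhost:4443");
        System.out.println(configure(conf).getHost());
    }
}
```

Leaving the builder untouched when the option is unset keeps the default 
behavior unchanged for everyone who does not override the root URL.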



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
