[ https://issues.apache.org/jira/browse/FLINK-19481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346542#comment-17346542 ]
Xintong Song commented on FLINK-19481:
--------------------------------------

{quote}The runtime complexity of having the additional Hadoop layer will likely be strictly worse. This is because each layer has its own configuration and things like thread pooling, pool sizes, buffering, and other non-trivial tuning parameters.{quote}

I'm not sure about this. Looking into o.a.f.runtime.fs.hdfs.HadoopFileSystem, the Flink filesystem is practically a layer of API mappings around the Hadoop filesystem. It might be true that the parameters to be tuned are separated into different layers, but I wonder how many extra parameters, and thus how much extra complexity, the additional layer really introduces. Shouldn't the total number of parameters be the same?

{quote}In my experience the more native (fewer layers of abstraction) you can achieve the better the result.{quote}

I admit that, if we were building the GCS file system from the ground up, the fewer layers the better.
# GCS SDK -> Hadoop FileSystem -> Flink FileSystem
# GCS SDK -> Flink FileSystem

However, we don't have to build everything from the ground up. For the first path above, there are already off-the-shelf solutions for both mappings (the Google connector for SDK -> Hadoop FS, and o.a.f.runtime.fs.hdfs.HadoopFileSystem for Hadoop -> Flink). It requires almost no extra effort beyond assembling existing artifacts. The second path, on the other hand, requires implementing a brand new file system, which seems to be re-inventing the wheel.

{quote}It seems from reading the comments here though that a good solution would be a hybrid of Ben's work on the native GCS Filesystem combined with Galen's work on the RecoverableWriter.{quote}

Unless there are more inputs on why we should have a native GCS file system, I'm leaning towards not introducing such a native implementation based on the discussion so far.
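To illustrate the point about the Flink filesystem being a thin mapping layer, here is a minimal sketch of the delegation pattern. These are simplified, hypothetical stand-in classes, not the actual Flink or Hadoop APIs: a wrapper that only forwards calls introduces no tuning parameters of its own, so all tunables stay in the wrapped layer.

```java
import java.util.HashMap;
import java.util.Map;

public class LayeringSketch {

    // Stand-in for a Hadoop-style file system: it owns all the
    // real tuning parameters (buffer sizes, thread pools, ...).
    // The configuration keys below are made up for illustration.
    static class HadoopStyleFileSystem {
        final Map<String, String> conf = new HashMap<>();

        HadoopStyleFileSystem() {
            conf.put("fs.gs.io.buffersize", "8388608");
            conf.put("fs.gs.threads.max", "20");
        }

        boolean exists(String path) {
            // A real implementation would call the GCS SDK here.
            return path.startsWith("gs://");
        }
    }

    // Stand-in for a mapping layer like o.a.f.runtime.fs.hdfs.HadoopFileSystem:
    // it delegates every call and adds no configuration of its own.
    static class FlinkStyleFileSystem {
        private final HadoopStyleFileSystem delegate;

        FlinkStyleFileSystem(HadoopStyleFileSystem delegate) {
            this.delegate = delegate;
        }

        boolean exists(String path) {
            return delegate.exists(path); // pure delegation
        }

        int tuningParameterCount() {
            // All tunables live in the wrapped layer.
            return delegate.conf.size();
        }
    }

    public static void main(String[] args) {
        FlinkStyleFileSystem fs =
                new FlinkStyleFileSystem(new HadoopStyleFileSystem());
        System.out.println(fs.exists("gs://bucket/checkpoints"));
        System.out.println(fs.tuningParameterCount());
    }
}
```

Under this assumption the total number of tuning parameters is the same whether or not the extra layer exists; the layering only changes where they are configured.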
> Add support for a flink native GCS FileSystem
> ---------------------------------------------
>
>                 Key: FLINK-19481
>                 URL: https://issues.apache.org/jira/browse/FLINK-19481
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem, FileSystems
>    Affects Versions: 1.12.0
>            Reporter: Ben Augarten
>            Priority: Minor
>              Labels: auto-deprioritized-major
>
> Currently, GCS is supported but only by using the hadoop connector[1]
>
> The objective of this improvement is to add support for checkpointing to
> Google Cloud Storage with the Flink File System.
>
> This would allow the `gs://` scheme to be used for savepointing and
> checkpointing. Long term, it would be nice if we could use the GCS FileSystem
> as a source and sink in flink jobs as well.
>
> Long term, I hope that implementing a flink native GCS FileSystem will
> simplify usage of GCS because the hadoop FileSystem ends up bringing in many
> unshaded dependencies.
>
> [1] [https://github.com/GoogleCloudDataproc/hadoop-connectors]

-- 
This message was sent by Atlassian Jira
(v8.3.4#803005)