Hi David,

Just to be sure: since you've already included Azure Blob Storage, did you
deliberately skip Azure Data Lake Store Gen2? It is currently supported and
also used by Flink users [1]. There's also MapR FS, but I doubt that it is
still widely used.

Best regards,

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/overview/

On Mon, 6 Dec 2021 at 12:28, David Morávek <d...@apache.org> wrote:

> Hi Everyone,
>
> as outlined in the FLIP-194 discussion [1], with the future direction of
> Flink HA services in mind, I'd like to verify my thoughts about the
> guarantees of the distributed filesystems used with Flink.
>
> Currently, some of the services (*JobGraphStore*,
> *CompletedCheckpointStore*) are implemented using a combination of a
> strongly consistent metadata store (ZooKeeper, K8s CM) and the actual
> FileSystem. The reasoning behind this dates back to the days when S3 was
> only an eventually consistent FileSystem and we needed a strongly
> consistent view of the data.
>
> I did some research, and my feeling is that all the major FileSystems that
> Flink supports already provide strong read-after-write consistency, which
> would be sufficient to reduce the complexity of the current HA
> implementations.
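>
> In other words, once a write completes, any subsequent read is guaranteed
> to see it, so the FileSystem alone could serve as the source of truth. A
> sketch of the guarantee being relied upon (same placeholders and imports
> as above, with fs obtained via path.getFileSystem()):
>
>   import org.apache.flink.core.fs.FSDataInputStream;
>
>   // With strong read-after-write consistency, a successful
>   // create() + close() makes the new object immediately visible:
>   fs.create(path, WriteMode.NO_OVERWRITE).close();
>   assert fs.exists(path);  // visible right away, no eventual lag
>   try (FSDataInputStream in = fs.open(path)) {
>       // a read here returns exactly the data that was just written
>   }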
>
> FileSystems that I've checked and that seem to support strong
> read-after-write consistency:
> - S3
> - GCS
> - Azure Blob Storage
> - Aliyun OSS
> - HDFS
> - MinIO
>
> Are you aware of other FileSystems that are used with Flink? Do they
> support the consistency that is required for starting a new initiative
> towards simpler / less error-prone HA services? Are you aware of any
> problems with the above-mentioned FileSystems that I might have missed?
>
> I'm also bringing this up on user@f.a.o to make sure we don't miss any
> FileSystems.
>
> [1] https://lists.apache.org/thread/wlzv02jqtq221kb8dnm82v4xj8tomd94
>
> Best,
> D.
>
