Thanks Liebing for the update. +1 to start the vote.
Best, Jark On Tue, 3 Mar 2026 at 14:34, Liebing Yu <[email protected]> wrote: > > Hi Jark! > > Thank you for your insightful suggestions. This FIP is a small step for > Fluss towards multi-remote (or multi-cloud) storage. As you mentioned, we > envision future support for commit-level multi-pathing, similar to the > approaches taken by Paimon and Lance. > > Regarding your comments on the current FIP, I'm generally in agreement. > > 1. FileSystem#obtainSecurityToken(FsPath f) > For the current implementation, obtainSecurityToken(FsPath f) is actually > redundant and can be removed. > > 2. GetFileSystemSecurityTokenRequest/Response and Client Token Management > Your suggestion simplifies the implementation of multi-path authorization. > By deferring table-level authentication to a later stage, we can expedite > the landing of this FIP. I will update the FIP accordingly. > > Best regards, > Liebing Yu > > > On Tue, 3 Mar 2026 at 01:19, Jark Wu <[email protected]> wrote: > > > Hi Liebing, > > > > Thank you for the proposal. I believe this is an excellent initiative > > to improve throughput for large-scale clusters utilizing remote > > storage. > > > > The current design implements multi-location support at the table or > > partition level, meaning only new tables and partitions will utilize > > new remote locations. Consequently, even after upgrading the cluster > > to support multiple paths, data distribution will remain concentrated > > in a single location for an extended period, failing to achieve rapid > > traffic fan-out. In contrast, industry solutions like Paimon support > > "data-file.external-paths" [1] to distribute new data files across > > multiple paths, and Lance has recently introduced a file-level > > multi-base layout [2]. > > > > Ultimately, we need file-level multi-location support (I believe this > > approach will resolve most of the concerns raised above by Yang Guo). 
> > However, I am fine with supporting partition-level multi-location as > > an initial phase, provided we have a clear roadmap toward the final > > solution. > > > > Regarding the design details of this FIP, I have the following comments: > > > > 1. FileSystem#obtainSecurityToken(FsPath f) > > We should not add the FsPath parameter to the obtainSecurityToken > > interface for now, because in the current design, this interface only > > retrieves the security token for the entire filesystem rather than for > > a specific path. Since a filesystem is defined per authority, the > > authority does not need to be derived from an FsPath. > > > > In fact, we plan to refactor the Filesystem soon. This refactoring > > will add the FsPath parameter to obtainSecurityToken, ensuring the > > returned token is strictly scoped to that specific path. This change > > aims to address current permission leakage issues where a token > > requested for reading one table inadvertently grants access to all > > remote files of other tables. > > > > 2. GetFileSystemSecurityTokenRequest/Response and Client Token Management > > > > Current Issue: The FIP proposes maintaining a SecurityTokenManager per > > LogScanner. However, since tokens are shared at the filesystem > > granularity, tokens for the same FsKey across different tables should > > be consolidated. Therefore, the DefaultSecurityTokenManager must be > > maintained within the FlussConnection; otherwise, > > SecurityTokenManagers for different tables will overwrite each other's > > tokens. > > > > Recommendation: A straightforward approach is to leave > > GetFileSystemSecurityTokenRequest unchanged while modifying > > GetFileSystemSecurityTokenResponse to return a list of tokens. The > > server side would then return STS tokens for each FsKey configured in > > the cluster. The client-side Filesystem would subsequently retrieve > > the corresponding STS token based on the FsKey. This avoids changes to > > the LogScanner logic. 
> > > > While this approach retains the existing permission leakage issue, > > that problem is already present today. We can address it in a > > separate, dedicated FIP to simplify the scope and implementation of > > the current proposal. > > > > Best, > > Jark > > > > [1] https://paimon.apache.org/docs/1.3/maintenance/configurations/ > > [2] > > https://lancedb.com/blog/rethinking-table-file-paths-lance-multi-base-layout/ > > > > > > On Sat, 28 Feb 2026 at 20:37, Yang Guo <[email protected]> wrote: > > > > > > Hi Liebing and all, > > > > > > This is a good FIP to resolve bottlenecks in the remote storage. Thanks > > for > > > your effort. The design looks good to me and the above discussion has > > > covered some concerns in my mind. > > > > > > Now there are some further considerations I'm thinking of: > > > > > > 1. What happens if a path goes down? > > > Right now, there’s no automatic failover. If one S3 bucket (or HDFS > > path) > > > dies, every table or partition assigned to it just fails. Could we add > > > simple health checks? If a path looks dead, the remote dir selector > > > temporarily skips it until it’s back up. > > > > > > 2. New paths don't always help old data. > > > The routing only happens when a new table or new partition is created. > > And > > > it depends on the partition strategy. > > > - If the table is using time-based partitions (e.g., daily), adding new > > > paths works well because new data goes to new partitions on new paths. > > > - But for non-partitioned tables, or if it keeps writing to old > > partitions, > > > the new paths sit idle. The traffic never shifts over. > > > It requires developers to think further about partition strategy and > > input > > > data when adding remote dirs. > > > > > > 3. Managing "weights" is tricky manually for developers/maintainers. > > > Since the weighted round-robin is static: > > > - Developers/Maintainers have to determine the right weights based on > > > current traffic. 
> > > - If you skew weights to favor a path, you have to remember to > > > rebalance them later, or that path gets overloaded forever. E.g. If two > > > paths are weighted [1, 2] in the beginning to rebalance the higher > > traffic > > > in the first path. Developers/Maintainers should remember to change the > > > weights back to [1, 1] after the traffic is balanced between two paths. > > > Otherwise the traffic in the second path will keep growing. > > > - Also, setting a weight to 0 behaves differently depending on your > > > partition type (time-based paths eventually go quiet, but field-based > > ones > > > like "country=US" keep writing there forever). > > > Instead of manual tuning, could we eventually make this dynamic? Let the > > > system adjust weights based on real-time latency or throttling metrics. > > > > > > The points above are about future operational considerations—regarding > > > failover and maintenance after this solution is deployed. I think they > > > won't block this FIP. We may not need to fix these right now. Just bring > > > them into this discussion. > > > > > > Regards, > > > Yang Guo > > > > > > On Fri, Feb 27, 2026 at 5:53 PM Liebing Yu <[email protected]> wrote: > > > > > > > Hi Lorenzo, sorry for the late reply. > > > > > > > > Thanks for the AWS example! This further solidifies the case for > > multi-path > > > > support. > > > > > > > > Regarding your question about multi-cloud support: > > > > Our current design naturally supports multi-cloud object storage > > systems. > > > > Since the implementation is built upon a multi-schema filesystem > > > > abstraction (supporting schemes like s3://, oss://, abfs://, etc.), the > > > > system is inherently "cloud-agnostic." 
> > > > > > > > Best regards, > > > > Liebing Yu > > > > > > > > > > > > On Wed, 4 Feb 2026 at 23:37, Lorenzo Affetti via dev < > > [email protected] > > > > > > > > > wrote: > > > > > > > > > This is quite an interesting FIP and I think it is a significant > > > > > enhancement, especially for large-scale clusters. > > > > > > > > > > I think you can also add the AWS case in your motivation: > > > > > > > > > > > > > > > > https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance-design-patterns.html#optimizing-performance-high-request-rate > > > > > AWS automatically scales if requests exceed 5,500 per second for the > > same > > > > > prefix, which results in transient 503 errors. > > > > > Your approach would eliminate this problem by providing another > > bucket. > > > > > > > > > > I was wondering if it might also provide the possibility of > > configuring > > > > the > > > > > same Fluss cluster for multi-cloud object storage systems. > > > > > From a design perspective, nothing should prevent me from storing > > remote > > > > > data on both Azure and AWS at the same time, probably resulting in > > > > > different performance numbers for different partitions/tables. > > > > > Should the design force the use of only 1 filesystem implementation? > > > > > > > > > > Thank you again! > > > > > > > > > > On Fri, Jan 30, 2026 at 7:59 AM Liebing Yu <[email protected]> > > wrote: > > > > > > > > > > > Hi Yuxia, thanks for the thoughtful response. Let me go through > > your > > > > > > questions one by one. > > > > > > > > > > > > 1. I think after we support `remote.data.dirs`, different schemas > > will > > > > be > > > > > > supported naturally. > > > > > > 2. Yes, I think we should change from `PbTablePath` to > > > > > > `PbPhysicalTablePath`. > > > > > > 3. Thanks for the reminder. I'll poc authentication in > > > > > > https://github.com/apache/fluss/issues/2518. 
But it doesn't block > > the > > > > > > multiple-paths implementation in Fluss server in > > > > > > https://github.com/apache/fluss/issues/2517. > > > > > > 4. For a partition table, the table itself has a remote data dir > > for > > > > > > metadata (such as lake offset). And each partition has its own > > remote > > > > dir > > > > > > for table data (e.g. kv or log data). > > > > > > 5. Legacy clients can access data in the new cluster. > > > > > > > > > > > > - If the permissions of the paths specified in > > `remote.data.dirs` on > > > > > the > > > > > > new cluster match those configured in `remote.data.dir`, > > seamless > > > > > > access is > > > > > > achievable. > > > > > > - If the permissions are inconsistent, access permissions must > > be > > > > > > explicitly configured. For example, when using OSS, a policy > > > > granting > > > > > > access permissions to the account identified by `fs.oss.roleArn` > > > > must > > > > > be > > > > > > configured for each bucket specified in `remote.data.dirs`. > > > > > > > > > > > > > > > > > > Best regards, > > > > > > Liebing Yu > > > > > > > > > > > > > > > > > > On Thu, 29 Jan 2026 at 10:07, Yuxia Luo <[email protected]> wrote: > > > > > > > > > > > > > Hi, Liebing > > > > > > > > > > > > > > Thanks for the detailed FIP. I have a few questions: > > > > > > > 1. Does `remote.data.dirs` support paths with different schemes? > > For > > > > > > > example: > > > > > > > ``` > > > > > > > remote.data.dirs: oss://bucket1/fluss-data, > > s3://bucket2/fluss-data > > > > > > > ``` > > > > > > > > > > > > > > 2. Should `GetFileSystemSecurityTokenRequest` include partition? > > > > > > > The FIP adds `table_path` to the request, but since different > > > > > partitions > > > > > > > may reside on different remote paths (and require different > > tokens), > > > > > > > should the request also include partition information? > > > > > > > > > > > > > > 3. 
Just a reminder that `DefaultSecurityTokenManager` will become > > more > > > > > > > complex... > > > > > > > This is not a blocker, but worth a poc to recognize any > > complexity > > > > > > > > > > > > > > 4. I want to confirm my understanding: For a partitioned table, > > does > > > > > the > > > > > > > table itself have a remote dir, AND each partition also has its > > own > > > > > > remote > > > > > > > dir? > > > > > > > > > > > > > > Or is it: > > > > > > > - Non-partitioned table → table-level remote dir > > > > > > > - Partitioned table → only partition-level remote dirs (no > > > > > table-level)? > > > > > > > > > > > > > > 5. Can old clients (without table path in token request) still > > read > > > > > data > > > > > > > from new clusters? > > > > > > > One possible solution is: For RPCs without table information, the > > > > > server > > > > > > > returns a token for the first dir in `remote.data.dirs`. Or other > > > > ways > > > > > > that > > > > > > > allow users to configure the cluster to keep compatibility > > > > > > > > > > > > > > > > > > > > > > On 2026/01/21 03:52:29 Zhe Wang wrote: > > > > > > > > Thanks for your response, now it looks good to me. > > > > > > > > > > > > > > > > Best regards, > > > > > > > > Zhe Wang > > > > > > > > > > > > > > > > Liebing Yu <[email protected]> wrote on Tue, 20 Jan 2026 at 14:29: > > > > > > > > > > > > > > > > > Hi Zhe, sorry for the late reply. > > > > > > > > > > > > > > > > > > The primary focus of this FIP is not to address read/write > > issues > > > > > at > > > > > > > the > > > > > > > > > table or partition level, but rather to overcome limitations > > at > > > > the > > > > > > > cluster > > > > > > > > > level. Given the current capabilities of object storage, > > > > read/write > > > > > > > > > performance for a single table or partition is unlikely to > > be a > > > > > > > bottleneck; > > > > > > > > > however, for a large-scale Fluss cluster, it can easily > > become > > > > one. 
> > > > > > > > > Therefore, the core objective here is to distribute the > > > > > cluster-wide > > > > > > > > > read/write traffic across multiple remote storage systems. > > > > > > > > > > > > > > > > > > Best regards, > > > > > > > > > Liebing Yu > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, 14 Jan 2026 at 16:07, Zhe Wang < > > [email protected]> > > > > > > wrote: > > > > > > > > > > > > > > > > > > > Hi Liebing, Thanks for the clarification. > > > > > > > > > > >1. To clarify, the data is currently split by partition > > level > > > > > for > > > > > > > > > > partitioned tables and by table for non-partitioned tables. > > > > > > > > > > > > > > > > > > > > Therefore the main aim of this FIP is improving the speed of > > > > > > > > > > reading data from different partitions, while the speed of storing > > > > > > > > > > data may still be limited for a single system? > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > > Zhe Wang > > > > > > > > > > > > > > > > > > > > Liebing Yu <[email protected]> wrote on Tue, 13 Jan 2026 at 19:11: > > > > > > > > > > > > > > > > > > > > > Hi Zhe, Thanks for the questions! > > > > > > > > > > > > > > > > > > > > > > 1. To clarify, the data is currently split by partition > > level > > > > > for > > > > > > > > > > > partitioned tables and by table for non-partitioned > > tables. > > > > > > > > > > > > > > > > > > > > > > 2. Regarding RemoteStorageCleaner, you are absolutely > > right. > > > > > > > Supporting > > > > > > > > > > > remote.data.dirs there is necessary for a complete > > cleanup > > > > > when a > > > > > > > table > > > > > > > > > > is > > > > > > > > > > > dropped. > > > > > > > > > > > > > > > > > > > > > > Thanks for pointing that out! 
> > > > > > > > > > > > > > > > > > > > > > Best regards, > > > > > > > > > > > Liebing Yu > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, 12 Jan 2026 at 17:02, Zhe Wang < > > > > [email protected]> > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > Hi Liebing, > > > > > > > > > > > > > > > > > > > > > > > > Thanks for driving this, I think it's a really useful > > > > > feature. > > > > > > > > > > > > I have two small questions: > > > > > > > > > > > > 1. What's the scope for split data in dirs, I see > > there's a > > > > > > > > > partitionId > > > > > > > > > > > in > > > > > > > > > > > > ZK Data, so the data will be split by partition in > > different > > > > > > > directories, > > > > > > > > > > or > > > > > > > > > > > by > > > > > > > > > > > > bucket? > > > > > > > > > > > > 2. Maybe it needs to support remote.data.dirs in > > > > > > > > > RemoteStorageCleaner? > > > > > > > > > > So > > > > > > > > > > > > we can delete all remoteStorage when deleting a table. > > > > > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > > > > Zhe Wang > > > > > > > > > > > > > > > > > > > > > > > > Liebing Yu <[email protected]> wrote on Thu, 8 Jan 2026 at 20:10: > > > > > > > > > > > > > > > > > > > > > > > > > Hi devs, > > > > > > > > > > > > > > > > > > > > > > > > > > I propose initiating discussion on FIP-25[1]. Fluss > > > > > leverages > > > > > > > > > remote > > > > > > > > > > > > > storage systems—such as Amazon S3, HDFS, and Alibaba > > > > Cloud > > > > > > > OSS—to > > > > > > > > > > > > deliver a > > > > > > > > > > > > > cost-efficient, highly available, and fault-tolerant > > > > > storage > > > > > > > > > solution > > > > > > > > > > > > > compared to local disk. *However, in production > > > > > environments, > > > > > > > we > > > > > > > > > > often > > > > > > > > > > > > find > > > > > > > > > > > > > that the bandwidth of a single remote storage > > becomes a > > > > > > > bottleneck. 
> > > > > > > > > > > > *Taking > > > > > > > > > > > > > OSS[2] as an example, the typical upload bandwidth > > limit > > > > > for > > > > > > a > > > > > > > > > single > > > > > > > > > > > > > account is 20 Gbit/s (Internal) and 10 Gbit/s > > (Public). > > > > So > > > > > I > > > > > > > > > > initiated > > > > > > > > > > > > this > > > > > > > > > > > > > FIP which aims to introduce support for multiple > > remote > > > > > > storage > > > > > > > > > paths > > > > > > > > > > > and > > > > > > > > > > > > > enables the dynamic addition of new storage paths > > without > > > > > > > service > > > > > > > > > > > > > interruption. > > > > > > > > > > > > > > > > > > > > > > > > > > Any feedback and suggestions on this proposal are > > > > welcome! > > > > > > > > > > > > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLUSS/FIP-25%3A+Support+Multi-Location+for+Remote+Storage > > > > > > > > > > > > > [2] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://www.alibabacloud.com/help/en/oss/user-guide/limits?spm=a2c63.l28256.help-menu-31815.d_0_0_5.2ac34d06oZYFvK > > > > > > > > > > > > > > > > > > > > > > > > > > Best regards, > > > > > > > > > > > > > Liebing Yu > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > Lorenzo Affetti > > > > > Senior Software Engineer @ Flink Team > > > > > Ververica <http://www.ververica.com> > > > > > > > > > > >
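The static weighted round-robin that Yang Guo's third point discusses could be sketched roughly as below. This is an illustrative sketch only, not the actual Fluss implementation; the class and method names (WeightedRemoteDirSelector, nextRemoteDir) are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: weighted round-robin over configured remote dirs.
public class WeightedRemoteDirSelector {
    private final List<String> expanded = new ArrayList<>();
    private int cursor = 0;

    public WeightedRemoteDirSelector(List<String> dirs, List<Integer> weights) {
        // Expand each dir by its weight so a plain round-robin over the
        // expanded list yields the weighted distribution: weights [1, 2]
        // send twice as many new assignments to the second dir.
        for (int i = 0; i < dirs.size(); i++) {
            for (int w = 0; w < weights.get(i); w++) {
                expanded.add(dirs.get(i));
            }
        }
    }

    // Invoked only when a new table or partition is created; existing
    // tables/partitions keep writing to their original dir, which is
    // why skewed weights must be rebalanced manually later.
    public synchronized String nextRemoteDir() {
        String dir = expanded.get(cursor);
        cursor = (cursor + 1) % expanded.size();
        return dir;
    }
}
```

Under this sketch, a weight of 0 simply excludes a dir from new assignments, consistent with the observation above that field-based partitions already routed to that dir keep writing there.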
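Jark's recommendation in his point 2 — a single connection-scoped manager holding one token per FsKey, with the client-side filesystem resolving the token from the path's scheme and authority — could look roughly like this. The names (ConnectionTokenStore, applyResponse, tokenFor) are assumptions for illustration; the real DefaultSecurityTokenManager and response types differ:

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: connection-scoped token store keyed by FsKey
// (scheme + authority), shared by all tables on the connection so
// per-table managers cannot overwrite each other's tokens.
public class ConnectionTokenStore {
    private final Map<String, String> tokensByFsKey = new ConcurrentHashMap<>();

    // Apply a token-list response that carries one STS token per FsKey
    // configured in the cluster; a later response refreshes the entries.
    public void applyResponse(Map<String, String> tokenPerFsKey) {
        tokensByFsKey.putAll(tokenPerFsKey);
    }

    // The client-side filesystem derives the FsKey from the remote path
    // and looks up the matching token.
    public String tokenFor(URI remotePath) {
        String fsKey = remotePath.getScheme() + "://" + remotePath.getAuthority();
        return tokensByFsKey.get(fsKey);
    }
}
```

Because lookup is purely by FsKey, this keeps the LogScanner logic unchanged, at the cost of retaining the filesystem-wide permission scope that the later refactoring is meant to narrow.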
