Hi,

Thanks for your explanation. I do have a concern about the behavior of creating a primary key table with the lake enabled. Why not throw an exception directly to the user? As a user, I create a table with `table.datalake.enabled` = `true` and expect it to be tiered. But the server just modifies my option and will never tier this table. I know nothing about it and keep waiting for the data in the primary key table to be tiered to Lance. Although a warning message is logged, users won't see it, since it is logged on the server side rather than on the client.
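To make the two behaviors concrete, here is a minimal Java sketch. All names in it (`LakeTableOptionCheck`, `overrideSilently`, `validateOrThrow`, the `lakeFormat` string) are hypothetical and not taken from the Fluss code base; only the option key `table.datalake.enabled` comes from this thread.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; these names are assumptions, not Fluss APIs. */
final class LakeTableOptionCheck {

    static final String DATALAKE_ENABLED = "table.datalake.enabled";

    /** True when a primary key table asks for tiering to Lance. */
    private static boolean isUnsupported(
            boolean hasPrimaryKey, String lakeFormat, Map<String, String> options) {
        return hasPrimaryKey
                && "lance".equals(lakeFormat)
                && Boolean.parseBoolean(options.get(DATALAKE_ENABLED));
    }

    /** Behavior as described in the reply: warn on the server and force the option to false. */
    static Map<String, String> overrideSilently(
            boolean hasPrimaryKey, String lakeFormat, Map<String, String> options) {
        Map<String, String> result = new HashMap<>(options);
        if (isUnsupported(hasPrimaryKey, lakeFormat, options)) {
            // The warning only reaches the server log; the client never sees it.
            System.err.println("WARN: tiering primary key tables to Lance is not supported");
            result.put(DATALAKE_ENABLED, "false");
        }
        return result;
    }

    /** Suggested alternative: reject the request so the user sees the problem at creation time. */
    static void validateOrThrow(
            boolean hasPrimaryKey, String lakeFormat, Map<String, String> options) {
        if (isUnsupported(hasPrimaryKey, lakeFormat, options)) {
            throw new UnsupportedOperationException(
                    "Tiering primary key tables to Lance is not supported; set "
                            + DATALAKE_ENABLED + " = false or use a log table.");
        }
    }
}
```

With the second variant, the create-table call fails immediately with a message the user can act on, instead of succeeding with a table that is never tiered.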
Best regards,
Yuxia

----- Original Message -----
From: "Wang Cheng" <[email protected]>
To: "dev" <[email protected]>
Sent: Saturday, July 5, 2025, 11:02:09 PM
Subject: Re: [SPAM][DISCUSS] FIP-5: Support tiering Fluss data to Lance

Hi Yuxia,

Thanks for your suggestions!

>> So, will it throw an unsupported exception or just set `table.datalake.enabled = false` whatever users set it to?

Currently, a user can create a primary key table when the lake format is Lance in Fluss. However, when the user creates a primary key table and sets table.datalake.enabled to true, a warning message will be logged and we'll force table.datalake.enabled to false in the code to disable tiering from the Fluss primary key table to the Lance lake. No unsupported exception will be thrown.

>> Also, it seems to conflict with "For both log tables and primary key tables, xxx." since IIUC, only log tables are supported.

I will revise this phrase. Yes, only log tables are supported.

>> Is it a per-table option? If so, I'd like to suggest renaming it to `lance.batch_size` to follow the convention we have for Paimon.

Yes, it's a per-table option. I'll follow the Paimon style.

>> Can we commit the bucket end offset to Lance with storageOptions? Looking into the lance-spark code, it seems it's possible to pass storageOptions while committing to Lance.

The storage options in Lance's API are used for configuring the writer and reader behavior - such as max_rows_per_file and max_rows_per_group for writing, or index_cache_size and metadata_cache_size for reading. Therefore, passing unrecognized storage options to Lance's API is ineffective when committing.

Regards,
Cheng

------------------ Original ------------------
From: "dev" <[email protected]>;
Date: Fri, Jul 4, 2025 08:02 PM
To: "dev"<[email protected]>;
Subject: Re: [SPAM][DISCUSS] FIP-5: Support tiering Fluss data to Lance

Hi, Cheng.

Thanks for driving this work. A few comments are below:

1: >> "When creating primary key tables, we enforce table.datalake.enabled = false."
So, will it throw an unsupported exception or just set `table.datalake.enabled = false` whatever users set it to? Also, it seems to conflict with "For both log tables and primary key tables, xxx." since IIUC, only log tables are supported.

2: >> "The size of a fragment is controlled by the configuration option datalake.lance.batch_size"
Is it a per-table option? If so, I'd like to suggest renaming it to `lance.batch_size` to follow the convention we have for Paimon.

3: >> "When lake committer needs to find out the bucket end offset of committed lake snapshot, it has to reconstruct this information by reading the entire artificial bucket and offset columns from lake "
Can we commit the bucket end offset to Lance with storageOptions? Looking into the lance-spark code, it seems it's possible to pass storageOptions while committing to Lance.

Best regards,
Yuxia

----- Original Message -----
From: "Wang Cheng" <[email protected]>
To: "dev" <[email protected]>
Sent: Friday, July 4, 2025, 12:26:43 PM
Subject: [SPAM][DISCUSS] FIP-5: Support tiering Fluss data to Lance

Hi all,

Lance is a popular table format designed for performant AI workloads. To enable the integration between Fluss and the multimodal AI data lake ecosystem, I'd like to propose FIP-5: Support tiering Fluss data to Lance [1].

Any feedback and suggestions on this proposal are welcome!

[1]: https://cwiki.apache.org/confluence/display/FLUSS/FIP-5%3A+Support+tiering+Fluss+data+to+Lance

Regards,
Cheng
