Re: [SPAM][DISCUSS] FIP-5: Support tiering Fluss data to Lance

yuxia Sun, 06 Jul 2025 19:47:53 -0700

Hi,
Thanks for your explanation. I do have a concern about the behavior of creating a
primary key table with the lake enabled.
Why not throw an exception directly to the user? As a user, I create a table with
`table.datalake.enabled` = `true` and expect it to be tiered. But the server
just modifies my option and never tiers this table. I know nothing about it,
and I keep waiting, again and again, for the data in the primary key table to be
tiered to Lance.
Although a warning message is logged, users won't see it, since it is only
logged on the server side.
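A minimal sketch of the fail-fast alternative suggested above, assuming the check runs when the table is created. `TableDescriptor`, `validate_table_creation`, and `UnsupportedTableConfigError` are hypothetical names for illustration, not Fluss APIs.

```python
from dataclasses import dataclass, field


class UnsupportedTableConfigError(ValueError):
    """Hypothetical error raised when the lake format cannot honor a table option."""


@dataclass
class TableDescriptor:
    # Hypothetical, simplified stand-in for a table definition.
    name: str
    has_primary_key: bool
    options: dict = field(default_factory=dict)


def validate_table_creation(table: TableDescriptor, lake_format: str) -> None:
    enabled = table.options.get("table.datalake.enabled", "false").lower() == "true"
    if lake_format == "lance" and table.has_primary_key and enabled:
        # Fail fast: the error goes back to the client that issued CREATE TABLE,
        # instead of a warning that only shows up in the server log.
        raise UnsupportedTableConfigError(
            f"Table '{table.name}': tiering primary key tables to Lance is not supported; "
            "set 'table.datalake.enabled' to 'false' or create a log table instead."
        )
```

With this approach the user learns immediately that the table will never be tiered, rather than discovering it later.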


Best regards,
Yuxia

----- Original Message -----
From: "Wang Cheng" <[email protected]>
To: "dev" <[email protected]>
Sent: Saturday, July 5, 2025 11:02:09 PM
Subject: Re: [SPAM][DISCUSS] FIP-5: Support tiering Fluss data to Lance

Hi Yuxia,


Thanks for your suggestions!


>> So, will it throw an unsupported exception or just set
`table.datalake.enabled = false` regardless of what users set?
Currently, users can create primary key tables when the lake format in Fluss is
Lance. However, when a user creates a primary key table with
table.datalake.enabled set to true, a warning message is logged and we
enforce table.datalake.enabled = false in the code to disable tiering
from the Fluss primary key table to the Lance lake. No unsupported exception is
thrown.
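A minimal sketch of the behavior described above (log a warning, then force the option to false), assuming table options are plain string key/value pairs. The function and logger names are hypothetical, not taken from the Fluss code.

```python
import logging

LOG = logging.getLogger("lake-table-options")  # hypothetical logger name


def apply_lance_tiering_policy(options: dict, has_primary_key: bool, lake_format: str) -> dict:
    """Return a copy of the table options with tiering disabled for primary key tables on Lance."""
    result = dict(options)  # leave the caller's options untouched
    enabled = result.get("table.datalake.enabled", "false").lower() == "true"
    if lake_format == "lance" and has_primary_key and enabled:
        # Only the server log records this; CREATE TABLE still succeeds for the client.
        LOG.warning("Tiering primary key tables to Lance is not supported; "
                    "forcing table.datalake.enabled=false")
        result["table.datalake.enabled"] = "false"
    return result
```

Because the override happens silently from the client's point of view, the user only finds out by reading the server log.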


>> Also, it seems to conflict with "For both log tables and primary key
tables, xxx." since, IIUC, only log tables are supported.
I will revise this phrase. Yes, only log tables are supported.


>> Is it a per-table option? If so, I'd suggest renaming it to
`lance.batch_size` to follow the convention we have for Paimon.
Yes, it's a per-table option. I'll follow the Paimon style.
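A minimal sketch of the prefix convention, assuming per-table options named `lance.<option>` are collected, stripped of the `lance.` prefix, and passed to the Lance writer, similar to how `paimon.`-prefixed options are treated. The helper names and the caller-supplied default are hypothetical.

```python
def lance_writer_options(table_options: dict, prefix: str = "lance.") -> dict:
    """Collect per-table options carrying the given prefix, with the prefix removed."""
    return {key[len(prefix):]: value
            for key, value in table_options.items()
            if key.startswith(prefix)}


def lance_batch_size(table_options: dict, default: int) -> int:
    """Parse lance.batch_size as an int, falling back to a caller-supplied default."""
    return int(lance_writer_options(table_options).get("batch_size", default))
```

For example, `{"lance.batch_size": "512", "table.datalake.enabled": "true"}` yields `{"batch_size": "512"}`, and options belonging to Fluss itself or to other lake formats are ignored.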


>> Can we commit the bucket end offset to Lance with storageOptions?
Looking into the lance-spark code, it seems possible to pass storageOptions
while committing to Lance.
The storage options in Lance's API configure writer and reader behavior,
such as max_rows_per_file and max_rows_per_group for writing, or
index_cache_size and metadata_cache_size for reading. Therefore, passing
unrecognized storage options to Lance's API has no effect when committing.



Regards,
Cheng




------------------ Original ------------------
From: "dev" <[email protected]>
Date: Fri, Jul 4, 2025 08:02 PM
To: "dev" <[email protected]>
Subject: Re: [SPAM][DISCUSS] FIP-5: Support tiering Fluss data to Lance



Hi, Cheng.
Thanks for driving this work. A few comments below:
1: >> "When creating primary key tables, we enforce
table.datalake.enabled = false."
Will it throw an unsupported exception, or just set `table.datalake.enabled =
false` regardless of what users set?
Also, it seems to conflict with "For both log tables and primary key tables,
xxx." since, IIUC, only log tables are supported.

2: &gt;&gt; "The size of a fragment is controlled by the configuration option 
datalake.lance.batch_size"
Is it a per-table optiions? If so, I'd like to suggest to rename it to 
`lance.batch_size` to follow the convention we have for paimon.

3: &gt;&gt; "When lake committer needs to find out the bucket end offset of 
committed lake snapshot, it has to reconstruct this information by reading the 
entire artificial bucket and offset columns from lake "
Can we commit the bucket end offset to lance with storageOptions. Looking into 
lance-spark code, it seems it's possible to pass storageOptions while commiting 
to lance
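A minimal sketch of the reconstruction quoted in point 3, assuming the committed Lance data can be scanned as `(bucket, offset)` pairs taken from the artificial bucket and offset columns. The function name is hypothetical. It returns the highest tiered offset per bucket; whether the bucket end offset equals that value or that value plus one depends on Fluss's offset convention, which the sketch does not assume.

```python
from typing import Dict, Iterable, Tuple


def reconstruct_bucket_offsets(rows: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Return the highest offset seen for each bucket across all scanned rows."""
    latest: Dict[int, int] = {}
    for bucket, offset in rows:
        if bucket not in latest or offset > latest[bucket]:
            latest[bucket] = offset
    return latest
```

The scan touches every tiered row, so its cost grows with the table size; storing the per-bucket offsets as commit metadata, as suggested above, would avoid the full scan.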


Best regards,
Yuxia

----- Original Message -----
From: "Wang Cheng" <[email protected]>
To: "dev" <[email protected]>
Sent: Friday, July 4, 2025 12:26:43 PM
Subject: [SPAM][DISCUSS] FIP-5: Support tiering Fluss data to Lance

Hi all,


Lance is a popular table format designed for performant AI workloads. To enable 
the integration between Fluss and the multimodal AI data lake ecosystem, I'd 
like to propose FIP-5: Support tiering Fluss data to Lance [1].


Any feedback and suggestions on this proposal are welcome!


[1]: 
https://cwiki.apache.org/confluence/display/FLUSS/FIP-5%3A+Support+tiering+Fluss+data+to+Lance



Regards,
Cheng


