Re: [DISCUSS] FIP-3: Support tiering Fluss data to Iceberg

yuxia Sun, 06 Jul 2025 19:20:11 -0700

Hi, Mehul.

For snapshot expiration, as the FIP says, the LakeCommitter should respect the 
Iceberg table properties[1] while committing. 
Specifically, it will honor the property `history.expire.max-snapshot-age-ms` to 
expire snapshots during commit, as we already do for Paimon.
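
To make the age-based rule concrete, here is a rough sketch. It is not Fluss or 
Iceberg code, only a simplified model that assumes a linear snapshot history 
(no branches or tags). The helper name `snapshots_to_expire` and the 
`(snapshot_id, timestamp_ms)` tuples are hypothetical; the two property names 
and their defaults are taken from the Iceberg table properties in [1].

```python
# Simplified model of age-based snapshot expiration (illustrative only).
# Defaults below follow the Iceberg table-property documentation.
DEFAULT_MAX_SNAPSHOT_AGE_MS = 5 * 24 * 60 * 60 * 1000  # 5 days
DEFAULT_MIN_SNAPSHOTS_TO_KEEP = 1


def snapshots_to_expire(snapshots, table_props, now_ms):
    """Return the ids of snapshots that are old enough to expire.

    snapshots:   list of (snapshot_id, timestamp_ms) tuples (hypothetical shape)
    table_props: dict of table properties, values may be strings
    now_ms:      current time in epoch milliseconds
    """
    max_age_ms = int(table_props.get(
        "history.expire.max-snapshot-age-ms", DEFAULT_MAX_SNAPSHOT_AGE_MS))
    min_keep = int(table_props.get(
        "history.expire.min-snapshots-to-keep", DEFAULT_MIN_SNAPSHOTS_TO_KEEP))

    cutoff_ms = now_ms - max_age_ms
    newest_first = sorted(snapshots, key=lambda s: s[1], reverse=True)

    expired = []
    for index, (snapshot_id, timestamp_ms) in enumerate(newest_first):
        if index < min_keep:
            continue  # always retain the newest `min_keep` snapshots
        if timestamp_ms < cutoff_ms:
            expired.append(snapshot_id)
    return expired
```

In this model the newest snapshot is never expired, so the table always keeps 
at least its current state no matter how small the configured age is.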


As for removing orphan files, I feel it would be a little complex for 
Fluss to do that. Also, I think it is better triggered by the user, since 
it's hard for the Fluss cluster 
to know when to trigger orphan file removal. So, let's leave it to the user 
for now and see how it works out.

[1]https://iceberg.apache.org/docs/1.9.1/configuration/#table-behavior-properties

Best regards,
Yuxia

----- Original Message -----
From: "Mehul Batra" <[email protected]>
To: "dev" <[email protected]>
Sent: Saturday, July 5, 2025 11:38:20 PM
Subject: Re: [DISCUSS] FIP-3: Support tiering Fluss data to Iceberg

Hi Yuxia,
Great, that sounds good to me and will help users get better read
latency.
What about snapshot expiration (to keep metadata in check) and removing
orphan files (files that are no longer referenced, or dangling files from
failed tasks)?
Are we planning to introduce them as part of the automated maintenance
provided by the Fluss cluster?
Warm regards,
Mehul Batra

On Fri, Jul 4, 2025 at 5:02 PM yuxia <[email protected]> wrote:

> Hi, Mehul.
> Thanks for your attention. I think we don't need to introduce an extra
> post-commit hook to manage small files. In the design, all files that belong
> to the same bucket (in Iceberg, that will be the same partition) are
> distributed to the same task to write. So, the task can then compact these
> small files for the partition.
> As the FIP says, when an IcebergLakeWriter is created in one round of
> tiering, the writer can scan the manifest to find the files in this bucket;
> if it finds that compaction is applicable, it can
> compact these files while writing new files. We have similar logic for
> tiering to Paimon.
>
> Best regards,
> Yuxia
>
> ----- Original Message -----
> From: "Mehul Batra" <[email protected]>
> To: "dev" <[email protected]>
> Sent: Thursday, July 3, 2025 5:04:18 PM
> Subject: Re: [DISCUSS] FIP-3: Support tiering Fluss data to Iceberg
>
> +1 This will help us to address the missing table format and provide better
> ecosystem interoperability. Iceberg's growing adoption in the data
> lakehouse space makes this a valuable addition to Fluss's tiering
> capabilities.
> Are there any plans to integrate the Maintenance services as part of
> tiering itself as a post-commit hook to manage small files?
> Warm regards,
> Mehul Batra
>
> On Thu, Jul 3, 2025 at 2:24 PM yuxia <[email protected]> wrote:
>
> > Hi,
> >
> > Fluss currently supports tiering data to Apache Paimon, enabling
> > cost-effective storage management for warm/cold data. However, the lack of
> > native Iceberg tiering support limits flexibility and ecosystem integration
> > for users who rely on Iceberg’s open table format.
> >
> > To address this gap, I’d like to propose FIP-3: Support Tiering Fluss Data
> > to Iceberg[1] which aims to integrate Iceberg into Fluss’s tiering
> > capabilities.
> >
> > Welcome your feedback and suggestions on this proposal. Looking forward to
> > a productive discussion!
> >
> > [1]:
> > https://cwiki.apache.org/confluence/display/FLUSS/FIP-3%3A+Support+tiering+Fluss+data+to+Iceberg
> >
> > Best regards,
> > Yuxia
> >
> >
>
