Hi Ahmed,
thanks for such a quick write-up. This is a pretty good start! I left
some comments, but if we are under time pressure, I think we can release
"something", as long as it is clearly marked as experimental (or better,
unstable), so that users know the current state.
WDYT?
Jan
On 6/12/25 02:38, Ahmed Abualsaud via dev wrote:
Hey Jan, thanks for calling that out. Ideally I would have liked to
give more time for extensive engagement like we do with other proposals.
The reason for the accelerated timeline on this PR is to pave the way
for a "Getting Started" section for Beam on the Iceberg website,
ideally before their 1.10.0 release cut on June 23rd. Integrating
IcebergIO with Beam SQL is crucial for reaching a wider audience, and
getting this piece in before the Beam release cut (today/tomorrow)
would allow us to showcase SQL support sooner.
As Talat mentioned, the underlying Java implementation can be iterated
on, which means our immediate priority for consensus is the SQL syntax
itself.
I've put together an "after-the-fact" document to provide more
context, including a scouting report on how other frameworks handle
catalog management, and the proposed Beam SQL syntax. I hope this
helps kickstart the discussion.
https://docs.google.com/document/d/16P0JrcJ28KSoMMpLYExWPZaala7CE4Ezen-jC_ly3M4/edit?tab=t.0
Best,
Ahmed
On Wed, Jun 11, 2025 at 3:18 PM Talat Uyarer <ta...@apache.org> wrote:
Hi Ahmed,
Thank you so much for this change. I have been waiting for these
recent SQL changes for a while.
Going forward, I agree with Jan about having a design doc to
outline these changes. The underlying Java implementation is
largely hidden from users, so that can be changed in the future,
but as a community we should agree on the proposed SQL syntax.
Jan, as a Beam user and a small contributor, I've also been
waiting for this feature. If you don't mind, can we get
Ahmed's changes into this version?
Thanks
On 2025/06/11 18:42:40 Jan Lukavský wrote:
> Hi Ahmed,
>
> this is a great effort which is without doubt greatly needed by the Beam
> project as a whole. On the other hand, I think we should try to establish
> a way to pull the community into the discussion process. Could you sum
> up the PR (not small) into a design document where we can have a
> discussion about the goals, alternative solutions, already tried ways,
> etc? This would be really cool!
>
> Best,
>
> Jan
>
> On 6/10/25 16:12, Ahmed Abualsaud via dev wrote:
> > Hey all,
> >
> > I was integrating our Java IcebergIO with Beam SQL (PR #34799
> > <https://github.com/apache/beam/pull/34799>) and got blocked on the
> > fact that Beam SQL currently lacks a "Catalog" concept. This is
> > fundamental to modern data architectures like Iceberg, where catalogs are
> > used to manage table metadata and enable broad ecosystem integration.
> > To address this gap, I've opened a new PR (#35223
> > <https://github.com/apache/beam/pull/35223>), which introduces the
> > *Catalog* and *CatalogManager* interfaces, enabling support for:
> >
> > * CREATE CATALOG my_catalog TYPE 'local' PROPERTIES (...)
> > * SET CATALOG my_catalog
> > * DROP CATALOG my_catalog
> >
> > I left a more detailed overview in the PR description.
> >
> > My hope is that this foundational change will benefit not just
> > IcebergIO, but also other IOs and future Beam SQL integrations.
> >
> > Please take a look and share any feedback, especially regarding major
> > architectural concerns. I'm working on a short timeline, so minor
> > enhancements can be noted for follow-up PRs.
> >
> > Thank you!
> > Ahmed