Re: Summary of experience using Calcite

Sandeep N Thu, 04 Aug 2022 07:31:40 -0700

Thanks for taking the time to read and reply. It is encouraging to hear
that my use of Calcite is not unconventional, it does validate the thought
that we can come back to Calcite in future if necessary.


I couldn't agree more with the point you made, there is simply no other
tool available with the kind of capabilities Calcite provides.

I believe to be successful with Calcite a couple of us (from my team) will
need to make time to engage and contribute to Calcite with the long term
goal of leveraging this toolkit when the time and need arises. We are
building stuff in this space so we might as well stay aware of the tools.
Trying to look at and treat Calcite like any open source library/tool isn't
the right frame of mind.

Regards,

--Sandeep

On Thu, Aug 4, 2022 at 7:00 AM Vladimir Ozerov <ppoze...@gmail.com> wrote:

> Hi Sandeep,
>
> Thank you for sharing your experience. Your feedback matches with what I
> usually see in practice. IMO there are two main problems with Apache
> Calcite:
>
>    1. Documentation that focuses on a single edge case - the end-to-end
>    execution of federated queries with Enumerable. In practice, this
>    approach rarely could be used for anything but simple projects. Instead,
>    major Apache Calcite users (Apache Hive, Apache Flink, etc) do what you
>    mentioned as "unconventional use": they use separate Calcite components
> for
>    query optimization, translation, DSLs, etc. And there is virtually no
>    documentation on how to use them. So your use case is not unusual;
> there is
>    just no docs for it.
>    2. Another big problem is stability and quality. There are many bugs in
>    optimization rules, rel-to-sql conversion, etc. When doing a
> sophisticated
>    Apache Calcite integration, you should be ready to find a creative
>    workarounds for various bugs: method overrides, unusual class
> hierarchies,
>    copy-pasting of Apache Calcite codebase, etc.
>
> Unfortunately, there is no major commercial vendor behind the community
> that systematically invests in documentation and quality, so it is what it
> is.
>
> However, if you manage to get through these difficulties, you will get an
> extremely powerful tool, and I personally not aware of any other project
> that is even close to Apache Calcite in query optimization and translation
> capabilities. This is why there are more and more database and data
> management projects that choose Apache Calcite as the backbone of their
> query processing.
>
> If your use case is simple, then perhaps Apache Calcite might be too
> heavy-weight. However, if you plan to make your system more complicated, I
> would recommend you to keep Apache Calcite in mind still.
>
> Regards,
> Vladimir.
>
> чт, 4 авг. 2022 г. в 04:42, Sandeep N <federatedqu...@gmail.com>:
>
> > Hi all,
> >
> > I wanted to share my experience with one of the current projects that I
> > tried to use Calcite with and through this experience hope to shed light
> on
> > some of the challenges I ran into and share my thoughts on how they could
> > be addressed.
> >
> > Apologies for the long drawn explanation but I think context will help
> you
> > guys understand the challenges I hope to surface.
> >
> > For the tldr version jump to the paragraph with heading "In Summary".
> >
> > Let me first note that my use of Calcite  is unconventional in the sense
> > that I was looking to convert queries written in a homegrown DSL to SQL
> > that could be run on Snowflake. There were two main challenges : (a)
> > Mapping the DSL to SQL and (b) Generating SQL compliant to Snowflake with
> > support for custom data-types like VARIANT.
> >
> > The approach I landed on based on feedback from this forum was to take
> the
> > AST from the DSL, use a tree walker to generate a relational algebra tree
> > which then can generate SQL using one of the Dialects (Snowflake to be
> > added).
> >
> > The first task involved asking questions on the dev forum and reading
> > through previously asked questions. Here I have to extend thanks to
> > Stamatis for the pointers and links he provided which helped immensely.
> In
> > the end it took those pointers + step through debugging of the code via
> > unit-tests to figure how to use the relational algebra tree (RelNode and
> > RexNode).
> >
> > The second task was to add Snowflake specific constructs and here again
> > thanks to  Stamatis and Julian I had pointers to start looking at the
> > pieces. But before I went down the path of extending relational algebra
> or
> > figuring adding snowflake functions I shared my prototype work with my
> team
> > and talked through the approach and the Calcite integration. My goal was
> to
> > get a general sense of whether my team thought this was a good path
> forward
> > (some of the reason for this will be obvious in the In Summary section).
> >
> > A few of the team members looked at extending my prototype with Calcite
> and
> > after a couple of weeks of working with it we as a team arrived at the
> > decision that we will hold off on using  Calcite and instead build
> > something simpler and homegrown. "Simpler" because the types of queries
> we
> > needed to translate to SQL were very limited and did not exercise all the
> > possible types of SQL queries. We may revisit this in the future and come
> > back to Calcite.
> >
> > In Summary : The feedback I have is as follows
> > (a) Calcite does not have enough examples, docs and how-to guides so it
> is
> > difficult to get started. For this one I think our usage of Calcite being
> > unconventional added to the complexity as the samples might not directly
> > map to what we were looking for - one needs to extrapolate from what is
> > done from SQL to applying the same for another DSL.
> >
> > (b) Learning curve for Calcite is steep. - It takes a lot of time and
> > patience to debug through code and understand Calcite before using it. It
> > takes a certain mind-set of "don't give up" and "must figure this out" to
> > make headway. My experience with some of the technologies (having written
> > micro-containers and classloaders) have left me with a higher threshold
> of
> > patience - but that is not for everyone.
> >
> > I will say that the dev forums are prompt to respond to queries but at
> > times it might not be adequate for newbies - One suggestion I can think
> of
> > is to pair a newbie with a seasoned Calcite contributor as a mentor for
> > tasks which need such a pairing e.g. adding a new Database Dialect. It
> is a
> > fairly big commitment/investment on the part of the mentor but it might
> > make inroads into the project easier. Mentors could assign tasks of
> > improving docs/samples/how-to guides to the newbies as they work with
> them.
> > It would address the challenges for future adopters and also build
> > confidence in the newbies to contribute.
> >
> > I will end on a thank you note, just the process of figuring out the
> > relational algebra tree and translation taught me a lot. Thanks to you
> guys
> > for all the contributions. Like I said earlier I am still hopeful we will
> > come back to Calcite and thus plan to stay plugged in.
> >
> > Cheers,
> >
> > --Sandeep Nayak
> >
>

Re: Summary of experience using Calcite

Reply via email to