Re: A question regarding querying Google Cloud BigTable or Spanner through Apache Calcite

Slim Bouguerra Tue, 03 Nov 2020 17:59:02 -0800

Hi Jason,
Calcite is a great logical optimizer framework that can be used for very
sophisticated data federation and query rewriting.
But I don't think you want to use Calcite as-is to perform the physical
join between two big data systems.
To put it simply, if you are using Calcite, you need to make sure that the
join can be done on a single host and that it fits in the heap of a single
JVM.
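To illustrate why the single-host constraint matters, here is a minimal, conceptual sketch of an in-memory hash join (not Calcite's actual implementation): one input, the "build" side, is fully loaded into a hash table, so it must fit in the memory of one process, while the other side is streamed past it. The `max_build_rows` cap and the sample tables are hypothetical, standing in for a single JVM's heap limit and real data.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def hash_join(build: Iterable[Row], probe: Iterable[Row], key: str,
              max_build_rows: int = 1_000_000) -> List[Row]:
    """Inner equi-join on `key`; the whole build side is held in memory.

    `max_build_rows` is a hypothetical stand-in for a single JVM's heap limit.
    """
    table: Dict[Any, List[Row]] = defaultdict(list)
    count = 0
    for row in build:
        count += 1
        if count > max_build_rows:
            raise MemoryError("build side does not fit on a single host")
        table[row[key]].append(row)

    out: List[Row] = []
    for row in probe:  # the probe side is consumed row by row
        # .get() on a defaultdict does not insert missing keys
        for match in table.get(row[key], []):
            out.append({**match, **row})
    return out


# Hypothetical small dimension table (build side) and larger fact table (probe side).
stores = [{"store_id": 1, "city": "Ottawa"}, {"store_id": 2, "city": "Toronto"}]
sales = [{"store_id": 1, "amount": 10}, {"store_id": 2, "amount": 5},
         {"store_id": 1, "amount": 7}]
print(hash_join(stores, sales, "store_id"))
```

This works well when the build side is a small dimension table; when both sides are large fact tables, the hash table no longer fits on one host, which is the situation described above.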
Since you cannot modify the Spanner code, you are left with two options:
A/ do the join in Druid (small dimension-to-fact table joins should
work well; read about Druid lookups)
B/ use another big data system like Hive or Spark for fact-to-fact joins
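For option B, the key difference is that distributed engines can split a large join across many workers. A common strategy is a partitioned ("shuffle") hash join: both inputs are hash-partitioned on the join key so matching keys land in the same partition, and each partition pair is joined independently. The sketch below is a conceptual, single-process illustration of that idea, not Hive's or Spark's actual code; the function names, partition count, and sample tables are hypothetical, and in a real cluster each partition pair would run on a separate worker.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def partition(rows: List[Row], key: str, n: int) -> List[List[Row]]:
    """Hash-partition rows by join key so equal keys land in the same bucket."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts


def join_partition(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Local in-memory hash join of a single partition pair."""
    table: Dict[Any, List[Row]] = defaultdict(list)
    for row in left:
        table[row[key]].append(row)
    return [{**m, **r} for r in right for m in table.get(r[key], [])]


def shuffle_join(left: List[Row], right: List[Row], key: str,
                 n: int = 4) -> List[Row]:
    """Join partition pairs one by one; a cluster would run them in parallel."""
    left_parts = partition(left, key, n)
    right_parts = partition(right, key, n)
    out: List[Row] = []
    for lp, rp in zip(left_parts, right_parts):
        out.extend(join_partition(lp, rp, key))
    return out


# Hypothetical "fact" tables sharing the order_id key.
orders = [{"order_id": i, "customer": i % 3} for i in range(6)]
payments = [{"order_id": i, "paid": i * 10} for i in range(0, 6, 2)]
print(shuffle_join(orders, payments, "order_id"))
```

Because each worker only needs to hold its own partition in memory, no single host has to fit either table in full, which is what makes fact-to-fact joins practical in these systems.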


If you go with Druid, here are some good PRs to read:
https://github.com/apache/druid/pull/9648
https://github.com/apache/druid/pull/9294


On Tue, Nov 3, 2020 at 4:18 PM Haisheng Yuan <hy...@apache.org> wrote:

> Hi Jason,
>
> Absolutely it is.
>
> On 2020/11/03 20:53:08, Jason Chen <jason.c...@shopify.com.INVALID>
> wrote:
> > Hey,
> >
> > I am Jason Chen from Shopify Data Science and Engineering team. I have a
> few questions regarding the Apache Calcite, and I am not sure if the Apache
> Calcite fits our use cases. Feel free to point me to the correct email or
> Slack channel if this email is not the correct one for asking questions.
> >
> > We are exploring approaches to mixed querying across multiple
> storage resources. One use case is doing a “JOIN” at query time on query
> results from both Druid and BigTable/Spanner. Is this a good use case for
> Apache Calcite?
> >
> > Thank you for any help!
> >
> > Regards,
> > Jason Chen
> >
> >
> > Jason (Jianbin) Chen
> > Senior Data Developer
> > p: +1 2066608351 | e: jason.c...@shopify.com
> > a: 234 Laurier Ave W Ottawa, ON K1N 5X8
> >
>
