Re: spark as data warehouse?

Cheng Pan Sat, 26 Mar 2022 02:01:11 -0700

Sorry I missed the original channel, added it back.

-----


I don't know much about dbt, but if it supports Hive, it should support Kyuubi.
Basically, Kyuubi is a gateway between your client (e.g. Beeline, the Hive
JDBC client) and a compute engine (e.g. Spark, Flink, Trino). I think its
most valuable features are:
1) Kyuubi reuses the Hive Thrift protocol, which means you can treat Kyuubi
as a HiveServer2 and keep using Beeline or the Hive JDBC driver to
connect to Kyuubi and run SQL (in your compute engine's dialect). Ideally,
if a tool claims to support Hive, it also supports Kyuubi.
2) Kyuubi manages the compute engine's lifecycle and share level, which
makes a good trade-off between isolation and resource consumption.[1]
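For point 1, here is a minimal sketch of what "treat Kyuubi as a HiveServer2" looks like from the client side. The client uses the same jdbc:hive2:// URL and Beeline flags, only pointed at Kyuubi. The host name and user are hypothetical. Port 10009 is assumed to be Kyuubi's default frontend Thrift port, so check your kyuubi-defaults.conf.

```python
# Sketch: build the Hive JDBC URL and Beeline command that point an existing
# Hive client at a Kyuubi server instead of HiveServer2.
# Assumptions (not from the thread): host "kyuubi.example.com", user "alice",
# and 10009 as Kyuubi's default frontend Thrift port.
import shlex

KYUUBI_DEFAULT_PORT = 10009  # assumed default; verify in kyuubi-defaults.conf


def hive_jdbc_url(host: str, port: int = KYUUBI_DEFAULT_PORT,
                  database: str = "default") -> str:
    """Return a jdbc:hive2:// URL; Kyuubi speaks the HiveServer2 Thrift protocol."""
    return f"jdbc:hive2://{host}:{port}/{database}"


def beeline_command(url: str, user: str) -> str:
    """Return a shell-quoted Beeline invocation (-u connect URL, -n user name)."""
    return shlex.join(["beeline", "-u", url, "-n", user])


if __name__ == "__main__":
    url = hive_jdbc_url("kyuubi.example.com")
    print(url)
    print(beeline_command(url, "alice"))
```

The only change from a HiveServer2 setup is the host and port, which is why tools that already speak Hive should generally work unchanged.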

PS: Kyuubi's support for Spark is very mature; you can find lots of
production use cases here[2]. Support for Flink and Trino is in the
beta phase.

[1] https://kyuubi.apache.org/docs/latest/deployment/engine_share_level.html
[2] https://github.com/apache/incubator-kyuubi/discussions/925
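The share levels described in [1] (CONNECTION, USER, GROUP, SERVER, selected in Kyuubi via kyuubi.engine.share.level) can be illustrated with a toy Python model. This is not Kyuubi's implementation. Session, engine_key and engines_needed are hypothetical names. The model only counts how many engines a set of sessions would need at each level, which is the isolation-versus-resources trade-off from point 2.

```python
# Toy model (hypothetical, not Kyuubi's code): map each session to the key of
# the engine it would share at a given share level, then count distinct keys.
from dataclasses import dataclass
from enum import Enum


class ShareLevel(Enum):
    CONNECTION = "CONNECTION"  # one engine per connection: most isolation
    USER = "USER"              # one engine per user
    GROUP = "GROUP"            # one engine per user group
    SERVER = "SERVER"          # one engine for the whole server: fewest resources


@dataclass(frozen=True)
class Session:
    session_id: str
    user: str
    group: str


def engine_key(level: ShareLevel, s: Session) -> tuple:
    """Return the key identifying which engine this session would run on."""
    if level is ShareLevel.CONNECTION:
        return ("connection", s.session_id)
    if level is ShareLevel.USER:
        return ("user", s.user)
    if level is ShareLevel.GROUP:
        return ("group", s.group)
    return ("server",)


def engines_needed(level: ShareLevel, sessions: list) -> int:
    """Number of distinct engines the given sessions would require."""
    return len({engine_key(level, s) for s in sessions})


# Hypothetical workload: alice opens two connections, bob and carol one each.
EXAMPLE_SESSIONS = [
    Session("s1", "alice", "analytics"),
    Session("s2", "alice", "analytics"),
    Session("s3", "bob", "analytics"),
    Session("s4", "carol", "etl"),
]

if __name__ == "__main__":
    for level in ShareLevel:
        print(level.value, engines_needed(level, EXAMPLE_SESSIONS))
```

Running it prints one engine count per level, from 4 (CONNECTION, strongest isolation) down to 1 (SERVER, lowest resource use).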

Thanks,
Cheng Pan

-------

Thanks, I'll check it out.
I have a use case where we want to use dbt as a data modeling tool.
Will it take dbt queries and create the resulting model?
I see it supports Trino, so I am guessing yes.

I would love to contribute to it as well.

Thanks
Deepak

-------

Spark SQL can indeed take over your Hive workloads, and if you're
looking for an open source solution, Apache Kyuubi (Incubating)[1]
might help.

[1] https://kyuubi.apache.org/

Thanks,
Cheng Pan

On Sat, Mar 26, 2022 at 4:51 PM Cheng Pan <pan3...@gmail.com> wrote:
>
> On Sat, Mar 26, 2022 at 4:16 PM Deepak Sharma <deepakmc...@gmail.com> wrote:
> >
> > On Sat, 26 Mar 2022 at 1:24 PM, Cheng Pan <pan3...@gmail.com> wrote:
> >>
> >> On Sat, Mar 26, 2022 at 11:45 AM Deepak Sharma <deepakmc...@gmail.com> 
> >> wrote:
> >> >
> >> > It can be used as a warehouse, but then you have to keep long-running
> >> > Spark jobs. This is possible using cached DataFrames or Datasets.
> >> >
> >> > Thanks
> >> > Deepak
> >> >
> >> > On Sat, 26 Mar 2022 at 5:56 AM, <capitnfrak...@free.fr> wrote:
> >> >>
> >> >> In the past we have been using Hive to build the data warehouse.
> >> >> Do you think Spark can be used for this purpose? It's even more
> >> >> real-time than Hive.
> >> >>
> >> >> Thanks.
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> >> >>
> >> > --
> >> > Thanks
> >> > Deepak
> >> > www.bigdatabig.com
> >> > www.keosha.net
> >

