Re: Druid + Presto?

Gian Merlino Thu, 09 Jul 2020 23:48:33 -0700

One other thing I'm wondering is how similar are the two forks of Presto?
Are patches generally being shared between them or are they going off in
different directions? One example: as I understand it, aggregate pushdown
support was added to the core of both forks relatively recently — within
the last year or so — does it work the same way in each one? I'm wondering
how much work can be shared between these different efforts and perhaps
between these efforts and the Druid project itself.


On Thu, Jul 9, 2020 at 11:24 PM Gian Merlino <[email protected]> wrote:

> Hey Samarth,
>
> Thanks for sharing these details.
>
> In the overall warehouse + Druid setup you're envisioning, would Druid be
> the main way of querying the tables that it stores? Or would they all be
> synced periodically from the warehouse into Druid, using the warehouse as a
> source of truth? I'm asking since I'm wondering how important it is to
> think about functionality that might help load datasources based on tables
> that are in the Presto metastore.
>
> > You bring up an interesting idea on the reverse connector. What do you
> > think the value of such a connector will be? I am assuming Druid SQL for
> > the most part is ANSI SQL.
>
> Druid SQL is ANSI SQL for the most part but there are two big differences.
> First, it doesn't support everything in ANSI SQL (two examples: it
> currently doesn't support shuffle joins and windowed aggregations). Second,
> it supports some functionality that is not in ANSI SQL (like the TIME_ and
> DS_ operators). So it is smaller in some ways and bigger in other ways. I
> was thinking a reverse translator could let you write a Druid SQL query
> that uses our special operators, but also requires a shuffle join, and then
> translate and execute it as an equivalent Presto SQL query. The idea being
> you can express your query in either dialect and get routed to the right
> place in the end.
>
> On Thu, Jul 9, 2020 at 4:36 PM Samarth Jain <[email protected]> wrote:
>
>> Gian,
>>
>> For the presto-sql version of the Druid connector, for V1, we decided to
>> pursue the JDBC route. You can follow along on the progress here -
>> https://github.com/prestosql/presto/issues/1855
>> My colleague, Parth (cc'ed as well), is working on implementing Druid
>> aggregation push down, including support for top-n style queries. Our
>> immediate use cases, and what we think Druid is generally more suitable
>> for, are aggregate group-by style queries. Having a presto-druid
>> connector also enables us to join data in Druid with the rest of our
>> warehouse.
>> In general though, for queries that don't do any aggregations, i.e. those
>> that get translated to Druid SCAN queries, it makes sense to bypass the
>> Druid data nodes altogether and go directly to deep storage. I think
>> Druid provides enough metadata about the active segment files to be able
>> to do that relatively easily.
>>
>> You bring up an interesting idea on the reverse connector. What do you
>> think the value of such a connector will be? I am assuming Druid SQL for
>> the most part is ANSI SQL.
>>
>> On Thu, Jul 9, 2020 at 12:56 PM Zhenxiao Luo <[email protected]>
>> wrote:
>>
>> > Thank you, Mainak.
>> >
>> > Hi Gian,
>> >
>> > Glad to see you are interested in the Presto Druid connector.
>> >
>> > My colleagues @Hao Luo <[email protected]> and @Beinan Wang
>> > <[email protected]> and I implemented the Presto Druid connector in
>> > PrestoDB:
>> > https://prestodb.io/docs/current/connector/druid.html
>> >
>> > Our implementation includes:
>> > 1. Presto can scan Druid segments to compute SQL results
>> > 2. Aggregation pushdown, where Presto leverages Druid's fast aggregation
>> > capabilities and streams the aggregated result from Druid
>> >
>> > We actually implemented two execution paths; users can use
>> > configuration to control whether they'd like to scan segments or push
>> > down all sub-queries to Druid.
>> >
>> > We have run benchmarks comparing the Presto Druid connector with other
>> > SQL engines, and we are ready to run production workloads.
>> >
>> > Thanks,
>> > Zhenxiao
>> >
>> > On Thu, Jul 9, 2020 at 12:40 PM Mainak Ghosh <[email protected]> wrote:
>> >
>> > > Hello Gian,
>> > >
>> > > We are currently testing the (other) Presto Druid connector at our
>> > > end. It has aggregation push down support. Adding Zhenxiao to this
>> > > thread since he is the primary developer of the connector. He can
>> > > provide the kind of details you are looking for.
>> > >
>> > > Thanks,
>> > > Mainak
>> > >
>> > > > On Jul 9, 2020, at 12:25 PM, Gian Merlino <[email protected]> wrote:
>> > > >
>> > > > By the way, I see that the other Presto has a Druid connector too:
>> > > > https://prestodb.io/docs/current/connector/druid.html. From the
>> > > > docs it looks like it has different lineage and might even work
>> > > > differently.
>> > > >
>> > > > On Thu, Jul 9, 2020 at 12:22 PM Gian Merlino <[email protected]> wrote:
>> > > >
>> > > >> I was thinking of exploring ideas like pushing down aggregations,
>> > > >> enabling Presto to query directly from deep storage (in cases where
>> > > >> there aren't any interesting things to push down, this may be more
>> > > >> efficient than querying Druid servers), enabling translation from
>> > > >> Druid's SQL dialect to Presto's SQL dialect (a "reverse connector"),
>> > > >> etc. Do you (or anyone else on this list) have any thoughts on any of
>> > > >> those?
>> > > >>
>> > > >> I'm also curious what kinds of improvements you're planning to the
>> > > >> connector you built.
>> > > >>
>> > > >> On Thu, Jul 9, 2020 at 10:18 AM Samarth Jain <[email protected]>
>> > > >> wrote:
>> > > >>
>> > > >>> Hi Gian,
>> > > >>>
>> > > >>> I contributed the jdbc based presto-druid connector in prestosql
>> > > >>> which went out in release 337
>> > > >>> https://prestosql.io/docs/current/release/release-337.html. The v1
>> > > >>> version of the connector doesn’t support aggregate push down yet. It
>> > > >>> is being actively worked on and we expect it to be improved over the
>> > > >>> next few releases. We are currently evaluating using the presto-druid
>> > > >>> connector in our Tableau setup. It would be interesting to see what
>> > > >>> changes in Druid would be needed to support that integration.
>> > > >>>
>> > > >>> Thanks,
>> > > >>> Samarth
>> > > >>>
>> > > >>> On Thu, Jul 9, 2020 at 10:07 AM Gian Merlino <[email protected]> wrote:
>> > > >>>
>> > > >>>> Hey Druids,
>> > > >>>>
>> > > >>>> I was wondering, is anyone on this list using Druid + Presto
>> > > >>>> together? If so, what does your architecture look like and which
>> > > >>>> edition / flavor of Presto and Druid connector are you using? What's
>> > > >>>> your experience been like? I'm asking since I'm starting to think
>> > > >>>> about whether it makes sense to look at ways to improve the
>> > > >>>> integration between the two projects.
>> > > >>>>
>> > > >>>> Gian
>> > > >>>>
>> > > >>>
>> > > >>
>> > >
>> > >
>> >
>>
>
