The planning time has been extensively analyzed.

It is inherent in a Volcano-style cost-based optimizer, which performs a
branch-and-bound search of an exponential design space.

This bottleneck is very well understood.

Further, it has been accelerated under specialized conditions. As part of
OJAI, a limited form of Drill was included that could work on specific
kinds of tables built into MapR FS. With some rather severe truncations of
the space that the optimizer had to search, the planning time could be
reduced to tens of milliseconds. That was fine for a limited mission, but
some of the really dramatic benefits of Drill on large queries across
complex domains would be impossible with that truncated rule set.
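
To make that concrete: in Calcite terms (Drill's actual planner setup
differs), a truncated rule set looks something like the sketch below. The
rule names are real Calcite CoreRules; everything else is illustrative.

    import org.apache.calcite.plan.ConventionTraitDef;
    import org.apache.calcite.plan.volcano.VolcanoPlanner;
    import org.apache.calcite.rel.rules.CoreRules;

    public class TruncatedRuleSet {
      public static VolcanoPlanner smallPlanner() {
        VolcanoPlanner planner = new VolcanoPlanner();
        planner.addRelTraitDef(ConventionTraitDef.INSTANCE);
        // Keep only cheap, nearly-always-beneficial rewrites.
        planner.addRule(CoreRules.FILTER_INTO_JOIN);
        planner.addRule(CoreRules.PROJECT_MERGE);
        // Deliberately omitted: CoreRules.JOIN_COMMUTE and
        // CoreRules.JOIN_ASSOCIATE, the rules that drive the exponential
        // join-order search. Without them planning is fast, but large
        // multi-way joins keep whatever order the SQL happens to imply.
        return planner;
      }
    }

Every rule dropped shrinks the search space, and every rule dropped is an
optimization the planner can no longer find. That is exactly the trade the
OJAI work made.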



On Wed, Feb 9, 2022 at 7:06 PM Paul Rogers <par0...@gmail.com> wrote:

> Hi All,
>
> Would be great to understand the source of the slow planning. Back in the
> day, I recall colleagues trying all kinds of things to speed up planning,
> but without the time to really figure out where the time went.
>
> I wonder if the two points are related. If most of that planning time is
> spent waiting for plugin metadata, then James' & Charles' issue could
> possibly be the cause of the slowness that Ted saw.
>
> James, it is still not clear what plugin metadata is being retrieved, and
> when. Now, it is hard to figure that out; that code is complex. Ideally, if
> you have a dozen plugins enabled, but query only one, then only that one
> should be doing anything. Further, if you're using an external system (like
> JDBC), the plugin should query the remote system tables only for the
> table(s) you hit in your query. If the code asks ALL plugins for
> information, or grabs all tables from the remote system, then, yeah, it's
> going to be slow.
>
> Adding per-plugin caching might make sense. For JDBC, say, it is not likely
> that the schema of the remote DB changes between queries, so caching for
> some amount of time is probably fine. And, if a query asks for an unknown
> column, the plugin could refresh metadata to see if the column was just
> added. (I was told that Impala users constantly had to run REFRESH METADATA
> to pick up new files added to HDFS.)
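>
> As a straw man, the cache could be as simple as the sketch below. Nothing
> in it is Drill's actual API: TableSchema and fetchRemoteSchemas() are
> placeholders, and the five-minute TTL is an arbitrary guess.
>
>     import java.time.Duration;
>     import java.time.Instant;
>     import java.util.Map;
>     import java.util.concurrent.ConcurrentHashMap;
>
>     public abstract class PluginSchemaCache {
>       interface TableSchema {}                       // placeholder type
>       private static final Duration TTL = Duration.ofMinutes(5); // guess
>       private record Entry(Map<String, TableSchema> tables, Instant fetched) {}
>       private final ConcurrentHashMap<String, Entry> byPlugin =
>           new ConcurrentHashMap<>();
>
>       // What the plugin would actually run, e.g. a JDBC system-table query.
>       protected abstract Map<String, TableSchema> fetchRemoteSchemas(String plugin);
>
>       public TableSchema lookup(String plugin, String table) {
>         Entry e = byPlugin.get(plugin);
>         if (e == null
>             || e.fetched().plus(TTL).isBefore(Instant.now())
>             || !e.tables().containsKey(table)) {
>           // Stale, or the object is unknown: re-fetch in case it was just
>           // added (the REFRESH METADATA problem noted above).
>           e = new Entry(fetchRemoteSchemas(plugin), Instant.now());
>           byPlugin.put(plugin, e);
>         }
>         return e.tables().get(table);
>       }
>     }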
>
> For the classic, original use case (Parquet or CSV files on an HDFS-like
> system), the problem was the need to scan the directory structure at plan
> time to figure out which files to scan at run time. For Parquet, the
> planner also wants to do Parquet row group pruning, which requires reading
> the header of every one of the target files. Since this was slow, Drill
> would create a quick & dirty cache, but with large numbers of files, even
> reading that cache was slow (and, Drill would rebuild it any time a
> directory changed, which greatly slowed planning.)
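>
> To see why, here is roughly what plan-time row group pruning forces the
> planner to do, sketched with the Hadoop filesystem API and the (older,
> now-deprecated) Parquet footer API; error handling omitted:
>
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.fs.FileStatus;
>     import org.apache.hadoop.fs.FileSystem;
>     import org.apache.hadoop.fs.Path;
>     import org.apache.parquet.hadoop.ParquetFileReader;
>     import org.apache.parquet.hadoop.metadata.BlockMetaData;
>     import org.apache.parquet.hadoop.metadata.ParquetMetadata;
>
>     public class FooterScan {
>       public static void scan(Configuration conf, Path dir) throws Exception {
>         FileSystem fs = dir.getFileSystem(conf);
>         long rowGroups = 0;
>         // One (possibly remote) round trip per file: with tens of thousands
>         // of files this loop alone dominates planning, hence Drill's cache.
>         for (FileStatus f : fs.listStatus(dir)) {
>           ParquetMetadata footer = ParquetFileReader.readFooter(conf, f.getPath());
>           for (BlockMetaData rowGroup : footer.getBlocks()) {
>             rowGroups++; // row-group min/max stats for pruning live in here
>           }
>         }
>         System.out.println("row groups seen: " + rowGroups);
>       }
>     }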
>
> For that classic use case, saved plans never seemed a win because the
> "shape" of the query heavily depended on the WHERE clause: one clause might
> hit a small set of files, another hit a large set, and that then throws off
> join planning, hash/broadcast exchange decisions and so on.
>
> So, back to the suggestion to start with understanding where the time goes.
> Any silly stuff we can just stop doing? Is the cost due to external
> factors, such as those cited above? Or, is Calcite itself just
> heavyweight? Calcite is a rules engine. Add more rules or more nodes in the DAG,
> and the cost of planning rises steeply. So, are we fiddling about too much
> in the planning process?
>
> One way to test: use a mock data source and plan-time components to
> eliminate all external factors. Time various query shapes using EXPLAIN.
> How long does Calcite take? If a long time, then we've got a rather
> difficult problem as Calcite is hard to fix/replace.
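>
> A minimal harness for that experiment might look like the following; the
> JDBC URL and the query are just examples, adjust for your cluster:
>
>     import java.sql.Connection;
>     import java.sql.DriverManager;
>     import java.sql.Statement;
>
>     public class PlanTimer {
>       public static void main(String[] args) throws Exception {
>         try (Connection conn =
>                  DriverManager.getConnection("jdbc:drill:drillbit=localhost");
>              Statement stmt = conn.createStatement()) {
>           // EXPLAIN runs planning only, so elapsed time ~= planning time.
>           String sql = "EXPLAIN PLAN FOR "
>               + "SELECT full_name FROM cp.`employee.json` LIMIT 10";
>           long start = System.nanoTime();
>           stmt.executeQuery(sql).close();
>           System.out.printf("planning took %d ms%n",
>               (System.nanoTime() - start) / 1_000_000);
>         }
>       }
>     }
>
> Run it first with only the mock/classpath sources enabled, then enable
> plugins one at a time to see what each one adds.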
>
> Then, time the plugins of interest. Figure out how to optimize those.
>
> My guess is that the bottleneck won't turn out to be what we think it is.
> It usually isn't.
>
> - Paul
>
> On Tue, Feb 8, 2022 at 8:19 AM Ted Dunning <ted.dunn...@gmail.com> wrote:
>
> > James, you make some good points.
> >
> > I would generally support what you say except for one special case. I think
> > that there is a case to be made to be able to cache query plans in some
> > fashion.
> >
> > The traditional approach to do this is to use "prepared queries" by which
> > the application signals that it is willing to trust that a query plan will
> > continue to be correct for the duration of its execution. My experience
> > (and I think the industry's as well) is that the query plan is more stable
> > than the underlying details of the metadata and this level of caching (or
> > more) is a very good idea.
> >
> > In particular, the benefit to Drill is that we have a very expensive query
> > planning phase (I have seen numbers in the range 200-800ms routinely) but I
> > have seen execution times that are as low as a few 10's of ms. This
> > imbalance severely compromises the rate of concurrent querying for fast
> > queries. Having some form of plan caching would allow this planning
> > overhead to drop to zero in select cases.
> >
> > I have been unable to even consider working on this problem, but it seems
> > that one interesting heuristic would be based on two factors:
> > - the ratio of execution time to planning time
> > The rationale is that if a query takes much longer to run than to plan, we
> > might as well do planning each time. Conversely, if a query takes much less
> > time to run than it takes to plan, it is very important to avoid that
> > planning time.
> >
> > - the degree to which recent execution times seem inconsistent with longer
> > history
> > The rationale here is that a persistent drop in performance for a query is
> > a strong indicator that any cached plan is no longer valid and should be
> > updated. Conversely, if recent query history is consistent with long-term
> > history, that is a vote of confidence for the plan. Furthermore, depending
> > on how this is implemented, the chance of a false-positive change detection
> > may increase with very long histories, which gives a desirable side effect
> > of occasional replanning.
> >
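> > In code, the heuristic I have in mind is roughly the sketch below. The 5x
> > and 50% thresholds are pulled out of the air and would need tuning:
> >
> >     /** Decide whether a cached plan may be reused. Illustrative only. */
> >     public class PlanCachePolicy {
> >       public boolean reusePlan(double planMs,
> >                                double recentAvgExecMs,
> >                                double longTermAvgExecMs) {
> >         // Factor 1: caching only pays when planning dominates execution.
> >         boolean planningDominates = planMs > 5 * recentAvgExecMs;
> >         // Factor 2: a persistent drift of recent runtimes away from the
> >         // long-term history suggests the cached plan has gone stale.
> >         boolean historyConsistent =
> >             Math.abs(recentAvgExecMs - longTermAvgExecMs)
> >                 <= 0.5 * longTermAvgExecMs;
> >         return planningDominates && historyConsistent;
> >       }
> >     }
> >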
> > What do other people think about this?
> >
> >
> >
> >
> > On Tue, Feb 8, 2022 at 4:48 AM James Turton <dz...@apache.org> wrote:
> >
> > > My 2c while I wait for 1.20.0 RC1 to upload.
> > >
> > > I think it's good that we continue to bring every design decision out
> > > here to the community like Charles did with this one.  Some relevant
> > > excerpts I turned up while zipping around a few ASF docs now.
> > >
> > >     "Mailing lists are the virtual rooms where ASF communities live,
> > >     form and grow. All formal decisions the project's PMC makes need to
> > >     have an email thread (possibly with a recorded vote) as an audit
> > >     trail that this was an official decision." [1]
> > >
> > >     "We firmly believe in hats
> > >     <https://www.apache.org/foundation/how-it-works.html#hats>. Your
> > >     role at the ASF is one assigned to you personally, and is bestowed
> > >     on you by your peers. It is not tied to your job or current employer
> > >     or company." [2]
> > >
> > >     "Unless they specifically state otherwise, whatever an ASF
> > >     participant posts on any mailing list is done /as themselves/. It is
> > >     the individual point-of-view, wearing their personal hat and not as
> > >     a mouthpiece for whatever company happens to be signing their
> > >     paychecks right now, and not even as a director of the ASF." [2]
> > >
> > >
> > > Info schema.  Info schema is slow when the set of enabled storage
> > > plugins is slow to register schemas.  Flaky plugins can be so slow to do
> > > this as to make info schema appear broken.  Info schema recently had its
> > > filter push down improved so that unneeded schema registration is
> > > avoidable [3], and I tested it working in the case of an unreachable
> > > active PostgreSQL plugin (provided my WHERE clause excluded said pg).
> > >
> > > In my opinion making today's "on-demand" info schema, which re-fetches
> > > schema metadata from sources whenever a query requests it, more
> > > efficient is the right place to start.  Rewriting it on EVF2 would, I
> > > understand, gain it limit push down support for free, though filter push
> > > down seems more likely to be helpful on this kind of data to me.  There
> > > is also no reason I can see for info schema not to fetch schema metadata
> > > from plugins concurrently.  I don't know if this would be best achieved
> > > by explicit programming of the concurrency, or by making the info schema
> > > look "splittable" to Drill so that multiple fragments get created.
> > >
> > > Lastly, I'm generally against introducing any sort of results caching,
> > > data or metadata, except in special circumstances such as when the
> > > planner can be certain that the underlying data has not changed (seldom
> > > or never the case for Drill because it doesn't control its own storage
> > > layer).  I think that databases, reliable ones anyway, tend to shun
> > > results caching and push it to the application layer, since only that
> > > layer can decide what kind of staleness is acceptable, but correct me if
> > > I'm wrong.  My conclusion here is that I'd rather do this last, and only
> > > after careful consideration.
> > >
> > > [1] https://infra.apache.org/mailing-list-moderation.html
> > > [2] https://www.apache.org/foundation/how-it-works.html#management
> > > [3] https://github.com/apache/drill/pull/2388
> > >
> > > On 2022/02/07 21:05, Ted Dunning wrote:
> > > > Another option is to store metadata as data in a distributed data store.
> > > > For static resources, that can scale very well. For highly dynamic
> > > > resources like conventional databases behind JDBC connections, you can
> > > > generally delegate metadata to that layer. Performance for delegated
> > > > metadata won't necessarily be great, but those systems are usually either
> > > > small (like Postgres or MySQL) or fading away (like Hive).
> > > >
> > > > Focusing metadata and planning on a single node will make query concurrency
> > > > much worse (and it's already not good).
> > > >
> > > >
> > > > On Sun, Feb 6, 2022 at 6:28 PM Paul Rogers <par0...@gmail.com> wrote:
> > > >
> > > >> Hi All,
> > > >>
> > > >> Drill, like all open source projects, exists to serve those that use it. To
> > > >> that end, the best contributions come when some company needs a feature
> > > >> badly enough that it is worth the effort to develop and contribute a
> > > >> solution. That's pretty standard, as long as the contribution is general
> > > >> purpose. In fact, I hope everyone using Drill in support of their company
> > > >> will contribute enhancements back to Drill. If you maintain your own
> > > >> private fork, you're not helping the community that provided you with the
> > > >> bulk of the code.
> > > >>
> > > >> For the info schema, I'm at a loss to guess why this would be slow, unless
> > > >> every plugin is going off and scanning some external source. Knowing that
> > > >> we have a dozen plugins is not slow. Looking at plugin configs is not slow.
> > > >> What could be slow is if you want to know about every possible file in HDFS
> > > >> or S3, every database and table in an external DB, etc. In this case, the
> > > >> bottleneck is either the external system, or the act of querying a dozen
> > > >> different external systems. Perhaps, Charles, you can elaborate on the
> > > >> specific scenario you have in mind.
> > > >>
> > > >> Depending on the core issues, there are various solutions. One solution is
> > > >> to cache all the external metadata in Drill. That's what Impala did with
> > > >> the Hive Metastore, and it was a mess. I don't expect Drill would do any
> > > >> better a job. One reason it was a mess is that, in a production system,
> > > >> there is a vast amount of metadata. You end up playing all manner of tricks
> > > >> to try to compress it. Since Drill (and Impala) are fully symmetric, each
> > > >> node has to hold the entire cache. That is memory that can't be used to run
> > > >> queries. So, to gain performance (for metadata) you give up performance (at
> > > >> run time.)
> > > >>
> > > >> One solution is to create a separate metadata cache node. The query goes to
> > > >> some Drillbit that acts as Foreman. The Foreman plans the query and
> > > >> retrieves the needed metadata from the metadata node. The challenge here is
> > > >> that there will be a large amount of metadata transferred, and the next
> > > >> thing we know we'll want to cache it in each Drillbit, putting us back
> > > >> where we started.
> > > >>
> > > >> So, one can go another step: shift all query planning to the metadata node
> > > >> and have a single planner node. The user connects to any Drillbit as
> > > >> Foreman, but that Foreman first talks to the "planner/metadata" node to
> > > >> give it SQL and get back a plan. The Foreman then runs the plan as usual.
> > > >> (The Foreman runs the root fragment of the plan, which can be compute
> > > >> intensive, so we don't want the planner node to also act as the Foreman.)
> > > >> The notion here is that the SQL in/plan out is much smaller than the
> > > >> metadata that is needed to compute the plan.
> > > >>
> > > >> The idea about metadata has long been that Drill should provide a metadata
> > > >> API. The Drill metastore should be seen as just one of many metadata
> > > >> implementations. The Drill metastore is a "starter solution" for those who
> > > >> have not already invested in another solution. (Many shops have HMS or
> > > >> Amazon Glue, which is Amazon's version of HMS, or one of the newer
> > > >> metadata/catalog solutions.)
> > > >>
> > > >> One can go even further. Consider file and directory pruning in HMS. Every
> > > >> tool has to do the exact same thing: given a set of predicates, find the
> > > >> directories and files that match. Impala does it. Spark must do it.
> > > >> Presto/Trino probably does it. Drill, when operating in Hive/HMS mode, must
> > > >> do it. Maybe someone has come up with the One True Metadata Pruner and Drill
> > > >> can just delegate the task to that external tool, and get back the list of
> > > >> directories and files to scan. Far better than building yet another pruner.
> > > >> (I think Drill currently has two Parquet metadata pruners, duplicating what
> > > >> many other tools have done.)
> > > >>
> > > >> If we see the source of metadata as pluggable, then a shop such as DDR that
> > > >> has specific needs (maybe caching those external schemas) can build a
> > > >> metadata plugin for that use case. If the solution is general, it can be
> > > >> contributed to Drill as another metadata option.
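> > > >>
> > > >> The plugin surface could be quite small. A hypothetical SPI, just to make
> > > >> the shape concrete (none of these types exist in Drill today):
> > > >>
> > > >>     import java.util.List;
> > > >>
> > > >>     /** Hypothetical metadata SPI; the Drill metastore would be one impl. */
> > > >>     public interface MetadataProvider {
> > > >>       /** Schema for one table, fetched or cached as the impl sees fit. */
> > > >>       Object tableSchema(String schemaName, String tableName);
> > > >>
> > > >>       /** Delegated pruning: predicates in, surviving files out. */
> > > >>       List<String> pruneFiles(String tableName, List<String> predicates);
> > > >>     }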
> > > >>
> > > >> In any case, if we can better understand the specific problem you are
> > > >> encountering, we can perhaps offer more specific suggestions.
> > > >>
> > > >> Thanks,
> > > >>
> > > >> - Paul
> > > >>
> > > >> On Sun, Feb 6, 2022 at 8:11 AM Charles Givre <cgi...@gmail.com> wrote:
> > > >>
> > > >>> Hi Luoc,
> > > >>> Thanks for your concern.  Apache projects are often backed unofficially by
> > > >>> a company.  Drill was, for years, backed by MapR, as evidenced by all the
> > > >>> MapR-unique code that is still in the Drill codebase. However, since MapR's
> > > >>> acquisition, I think it is safe to say that Drill really has become a
> > > >>> community-driven project.  While some of the committers are colleagues of
> > > >>> mine at DataDistillr, and Drill is a core part of DataDistillr, from our
> > > >>> perspective, we've really just been focusing on making Drill better for
> > > >>> everyone as well as building the community of Drill users, regardless of
> > > >>> whether they use DataDistillr or not.  We haven't rejected any PRs because
> > > >>> they go against our business model or tried to steer Drill against the
> > > >>> community or anything like that.
> > > >>>
> > > >>> Just for your awareness, there are other OSS projects, including some
> > > >>> Apache projects, where one company controls everything.  Outside
> > > >>> contributions are only accepted if they fit the company's roadmap, and
> > > >>> there is no real community-building that happens.  From my perspective,
> > > >>> that is not what I want from Drill.  My personal goal is to build an active
> > > >>> community of users and developers around an awesome tool.
> > > >>>
> > > >>> I hope this answers your concerns.
> > > >>> Best,
> > > >>> -- C
> > > >>>
> > > >>>
> > > >>>> On Feb 6, 2022, at 9:42 AM, luoc <l...@apache.org> wrote:
> > > >>>>
> > > >>>>
> > > >>>> Before we discuss the next release, I would like to explain that an Apache
> > > >>>> project should not be directly linked to a commercial company; otherwise
> > > >>>> this will affect the motivation of the community to contribute.
> > > >>>> Thanks.
> > > >>>>
> > > >>>>> On Feb 6, 2022, at 21:29, Charles Givre <cgi...@gmail.com> wrote:
> > > >>>>>
> > > >>>>> Hello all,
> > > >>>>> Firstly, I wanted to thank everyone for all the work that has gone into
> > > >>>>> Drill 1.20 as well as the ongoing discussion around Drill 2.0.  I wanted
> > > >>>>> to start a discussion around a topic for Drill 1.21, and that is
> > > >>>>> INFO_SCHEMA improvements.  As my company wades further and further into
> > > >>>>> Drill, it has become apparent that the INFO_SCHEMA could use some
> > > >>>>> attention.  James Turton submitted a PR which was merged into Drill 1.20,
> > > >>>>> but in so doing he uncovered an entire Pandora's box of other issues which
> > > >>>>> might be worth addressing.  In a nutshell, the issues with the INFO_SCHEMA
> > > >>>>> are all performance related: it can be very slow and can also consume
> > > >>>>> significant resources when executing even basic queries.
> > > >>>>> My understanding of how the info schema (IS) works is that when a user
> > > >>>>> executes a query, Drill will attempt to instantiate every enabled storage
> > > >>>>> plugin to discover schemata and other information. As you might imagine,
> > > >>>>> this can be costly.
> > > >>>>> So (and again, this is only meant as a conversation starter), I was
> > > >>>>> thinking there are some general ideas as to how we might improve the IS:
> > > >>>>> 1.  Implement a limit pushdown:  As far as I can tell, there is no
> > > >>>>> limit pushdown in the IS and this could be a relatively quick win for
> > > >>>>> improving IS query performance.  (A rough sketch of the idea appears
> > > >>>>> after this list.)
> > > >>>>> 2.  Caching:  I understand that caching is tricky, but perhaps we could
> > > >>>>> add some sort of schema caching for IS queries, or make better use of the
> > > >>>>> Drill metastore to reduce the number of connections during IS queries.
> > > >>>>> Perhaps in combination with the metastore, we could implement some sort of
> > > >>>>> "metastore first" plan, whereby Drill first hits the metastore for query
> > > >>>>> results and if the limit is reached, we're done.  If not, query the storage
> > > >>>>> plugins...
> > > >>>>> 3.  Parallelization:  It did not appear to me that Drill parallelizes
> > > >>>>> IS queries.  We may be able to add some parallelization which would
> > > >>>>> improve overall speed, but not necessarily reduce overall compute cost.
> > > >>>>> 4.  Convert to EVF2:  Not sure that there's a performance benefit here,
> > > >>>>> but at least we could get rid of cruft.
> > > >>>>> 5.  Reduce SerDe:  I imagine there was a good reason for doing this,
> > > >>>>> but the IS seems to obtain a POJO from the storage plugin then write these
> > > >>>>> results to old-school Drill vectors.  I'm sure there was a reason it was
> > > >>>>> done this way (or maybe not), but I have to wonder if there is a more
> > > >>>>> efficient way of obtaining the information from the storage plugin, ideally
> > > >>>>> w/o all the object creation.
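> > > >>>>>
> > > >>>>> For point 1, the sketch I had in mind is little more than an early-out
> > > >>>>> in whatever generates IS rows (names made up, not the real IS classes):
> > > >>>>>
> > > >>>>>     /** Illustrative only: stop enumerating once the LIMIT is met. */
> > > >>>>>     public class LimitedIsGenerator {
> > > >>>>>       private final int maxRows;   // pushed down from LIMIT n
> > > >>>>>       private int emitted;
> > > >>>>>
> > > >>>>>       public LimitedIsGenerator(int maxRows) { this.maxRows = maxRows; }
> > > >>>>>
> > > >>>>>       /** Called once per candidate row. Returning false would stop
> > > >>>>>        *  schema iteration, so remaining plugins never even register. */
> > > >>>>>       public boolean emit(Object row) {
> > > >>>>>         if (emitted >= maxRows) {
> > > >>>>>           return false;
> > > >>>>>         }
> > > >>>>>         emitted++;
> > > >>>>>         return true;
> > > >>>>>       }
> > > >>>>>     }
> > > >>>>>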
> > > >>>>> These are just some thoughts, and I'm curious as to what the community
> > > >>>>> thinks about this.  Thanks everyone!
> > > >>>>> -- C
> > > >>>
> > >
> >
>
