[
https://issues.apache.org/jira/browse/IMPALA-12540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17831072#comment-17831072
]
Michael Smith commented on IMPALA-12540:
----------------------------------------
The approach I've proposed is to create a virtual System Table - named
{{sys.impala_query_live}} - backed by SystemTableScanNodes that get scheduled
on all coordinators in a cluster (ignoring limitations related to executor
groups and dedicated coordinators). There are a couple of consequences to
consider:
* Impala queries currently fail if an executor shuts down or disappears during
execution. For SystemTableScanNode queries, that now includes coordinators:
taking a coordinator offline while it isn't running queries hasn't previously
been a concern, but could now fail a query against {{sys.impala_query_live}}.
I was not able to get {{RETRY_FAILED_QUERIES}} to succeed on a retried query
when a coordinator restarts during the run.
* Scheduling on coordinators is restricted to fragments containing
SystemTableScanNodes. However, those tables could still be part of a union, or
end up on the probe side of a join and thus do some aggregation; that can
result in significant work being done on a coordinator. This can be mitigated
somewhat with {{SELECT STRAIGHT_JOIN}} and by considering which tables you
{{UNION}} with it, but cannot be completely addressed.
An alternative approach could be to model it as a DataSource table, schedule it
on one or more executors, and have them query coordinators via an out-of-band
mechanism (like their HTTP interface). However, that could raise additional
issues around serialization performance and authentication that haven't been
fully explored.
I think it's worth follow-up work to allow System Table queries to be
incomplete: failure of a particular fragment instance results in a warning, but
does not fail the query.
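As a rough illustration of that incomplete-results behavior, here is a minimal
Python sketch. It is not Impala code: {{ScanResult}}, {{scan_all_coordinators}},
and the host names are hypothetical, and each coordinator's scan is modeled as
a plain callable that either returns rows or raises.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ScanResult:
    """Rows gathered so far, plus warnings for coordinators that failed."""
    rows: List[dict] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)


def scan_all_coordinators(
    scanners: Dict[str, Callable[[], List[dict]]]
) -> ScanResult:
    """Collect rows from every coordinator, downgrading failures to warnings."""
    result = ScanResult()
    for host, scan in scanners.items():
        try:
            result.rows.extend(scan())
        except Exception as exc:
            # Assumption: any per-coordinator failure is non-fatal, so the
            # query continues and the result is marked as incomplete.
            result.warnings.append(f"{host}: results incomplete ({exc})")
    return result


def _unreachable() -> List[dict]:
    # Stand-in for a coordinator that restarts while the query is running.
    raise ConnectionError("coordinator restarted")


example = scan_all_coordinators({
    "coord-1": lambda: [{"query_id": "q1"}],
    "coord-2": _unreachable,
    "coord-3": lambda: [{"query_id": "q2"}, {"query_id": "q3"}],
})
```

In this sketch the query still returns the rows from {{coord-1}} and
{{coord-3}}, and the failure on {{coord-2}} appears only as a warning attached
to the result.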
We should also review why {{RETRY_FAILED_QUERIES}} doesn't work in this case,
and whether there are ways we can improve it.
> SQL Interface to Running Queries/DDLs/DMLs
> ------------------------------------------
>
> Key: IMPALA-12540
> URL: https://issues.apache.org/jira/browse/IMPALA-12540
> Project: IMPALA
> Issue Type: Improvement
> Components: be, fe
> Reporter: Jason Fehr
> Assignee: Michael Smith
> Priority: Major
> Labels: features
>
> Provide a SQL interface that will show all currently running queries across
> all coordinators.
> The results will have a subset of the data available in the query history
> table. It will be the subset of data that is available for a running query
> (e.g. end time will not be available since that is not determined until the
> query completes).