[ 
https://issues.apache.org/jira/browse/IMPALA-12540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17831072#comment-17831072
 ] 

Michael Smith commented on IMPALA-12540:
----------------------------------------

The approach I've proposed is to create a virtual system table, named 
{{sys.impala_query_live}}, backed by SystemTableScanNodes that get scheduled 
on all coordinators in a cluster (ignoring limitations related to executor 
groups and dedicated coordinators). This has a couple of consequences to 
consider:
* Impala queries currently fail if an executor shuts down or disappears during 
execution. For SystemTableScanNode queries, that now includes coordinators; 
taking a coordinator offline while it isn't running queries hasn't previously 
been a concern, but it could now fail a query against 
{{sys.impala_query_live}}. I was not able to get a query to succeed on retry 
with {{RETRY_FAILED_QUERIES}} when a coordinator restarted during the run.
* Scheduling on coordinators is restricted to fragments containing 
SystemTableScanNodes. However, those tables could still be part of a union, or 
end up on the probe side of a join and thus do some aggregation; that can 
result in significant work being done on a coordinator. This can be mitigated 
somewhat with {{SELECT STRAIGHT_JOIN}} and by considering which tables you 
{{UNION}} with them, but it cannot be completely addressed.

An alternative approach could be to model it as a DataSource table, schedule it 
on one or more executors, and have them query coordinators via an out-of-band 
mechanism (like their HTTP interface). However, that could raise additional 
issues around serialization performance and authentication that haven't been 
fully explored.

I think it's worth follow-up work to allow System Table queries to be 
incomplete: failure of a particular fragment instance results in a warning, but 
does not fail the query.

We should also review why {{RETRY_FAILED_QUERIES}} doesn't work in this case, 
and whether there are ways we can improve it.

> SQL Interface to Running Queries/DDLs/DMLs
> ------------------------------------------
>
>                 Key: IMPALA-12540
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12540
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: be, fe
>            Reporter: Jason Fehr
>            Assignee: Michael Smith
>            Priority: Major
>              Labels: features
>
> Provide a SQL interface that will show all currently running queries across 
> all coordinators.
> The results will have a subset of the data available in the query history 
> table.  It will be the subset of data that is available for a running query 
> (e.g. end time will not be available since that is not determined until the 
> query completes).



