[ https://issues.apache.org/jira/browse/IMPALA-12540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17831072#comment-17831072 ]
Michael Smith commented on IMPALA-12540:
----------------------------------------

The approach I've proposed is to create a virtual System Table, named {{sys.impala_query_live}}, backed by SystemTableScanNodes that get scheduled on all coordinators in a cluster (ignoring limitations related to executor groups and dedicated coordinators).

There are a couple of consequences of this to consider:
* Impala queries currently fail if an executor shuts down or disappears during execution. For SystemTableScanNode queries, that now includes coordinators; taking coordinators offline when they're not running queries hasn't previously been a concern, but could now fail a query against impala_query_live. I was not able to get a retried query to succeed with {{RETRY_FAILED_QUERIES}} when a coordinator restarts during the run.
* Scheduling on coordinators is restricted to fragments containing SystemTableScanNodes. However, those tables could still be part of a union, or end up on the probe side of a join and thus do some aggregation; that can result in significant work being done on a coordinator. This can be mitigated somewhat with {{SELECT STRAIGHT_JOIN}} and by considering which tables you {{UNION}} with it, but cannot be completely addressed.

An alternative approach could be to model it as a DataSource table, schedule it on one or more executors, and have them query coordinators via an out-of-band mechanism (like their HTTP interface). However, that could have some additional issues around serialization performance and authentication that haven't been fully explored.

I think it's worth follow-up work to allow System Table queries to be incomplete: failure of a particular fragment instance results in a warning, but does not fail the query. We should also review why {{RETRY_FAILED_QUERIES}} doesn't work in this case, and whether there are ways we can improve it.

> SQL Interface to Running Queries/DDLs/DMLs
> ------------------------------------------
>
>                 Key: IMPALA-12540
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12540
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: be, fe
>            Reporter: Jason Fehr
>            Assignee: Michael Smith
>            Priority: Major
>              Labels: features
>
> Provide a SQL interface that will show all currently running queries across
> all coordinators.
> The results will have a subset of the data available in the query history
> table. It will be the subset of data that is available for a running query
> (e.g. end time will not be available since that is not determined until the
> query completes).

--
This message was sent by Atlassian Jira
(v8.20.10#820010)