[jira] [Commented] (HIVE-28019) Fix query type information in proto files for load and explain queries
[ https://issues.apache.org/jira/browse/HIVE-28019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838616#comment-17838616 ] Stamatis Zampetakis commented on HIVE-28019: Thanks for elaborating more and sharing your perspective [~rameshkumar]. The PREHOOK and POSTHOOK messages are emitted by the PreExecutePrinter and PostExecutePrinter hooks and by looking at the git history I get the impression that these were added only for testing purposes. I guess this hooks are not (and were not meant to be) used by end users. In principle, we could do any kind of change there that could facilitate testing without worrying too much about backwards compatibility. The {{commandType}} became part of the pre/post output in HIVE-854 but the respective JIRA and history do not have enough info on why this was done. Given that {{commandType}} was added in {{SessionState}} as part of HIVE-854, I assume that the intention of printing it in qtests was to verify that it is set correctly, and it reflects the authorization implications of each statement. In that sense, if we now change the type there to be EXPLAIN then in terms of testing we lose the information of what authorization mode applies to the statement under test. Moreover, in terms of testing the {{PREHOOK: query:}} entry shows clearly if we are dealing with an EXPLAIN query (or not) so from my perspective changing also the type to EXPLAIN will lead to reduced test coverage so the proposed changed does not look as an improvement. The same reasoning more or less applies to the HiveProtoLoggingHook but this class is something that is meant to be used in production so changes here will have more impact to the end-users. In both cases, I would suggest to keep the type field in the output unchanged. If we need to distinguish if a query is an EXPLAIN this could be done by an additional field but I also feel that this is not strongly required since it can be easily inferred via a regular expression in the output. As I wrote previously, the appropriate content for the type field in the hooks is a bit subjective so the best way to move forward is to gather opinions from more people. I am perfectly fine to follow the majority if others feels that the query type in the aforementioned hooks should change. > Fix query type information in proto files for load and explain queries > -- > > Key: HIVE-28019 > URL: https://issues.apache.org/jira/browse/HIVE-28019 > Project: Hive > Issue Type: Task > Components: HiveServer2 >Reporter: Ramesh Kumar Thangarajan >Assignee: Ramesh Kumar Thangarajan >Priority: Major > Labels: pull-request-available > > Certain query types like LOAD, export, import and explain queries did not > produce the right Hive operation type -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-28019) Fix query type information in proto files for load and explain queries
[ https://issues.apache.org/jira/browse/HIVE-28019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837401#comment-17837401 ] Ramesh Kumar Thangarajan commented on HIVE-28019: - Hi [~zabetak] First of all, thank you very much for the review on this. :) I am with you on the fact that HiveOperation was introduced for authorization and may be we should not change it to represent the query type. But I still believe we should do the change for PREHOOK: type: and POSTHOOK: type: and also the HiveProtoLoggingHook. I feel that the change to HiveOperation.Explain for the explain queries is needed mostly because we use the HiveOperation to print in the preexecute and postexecute actions. [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/hooks/PreExecutePrinter.java#L69] At present we report the type information for the queries in preexec and postexec as below: PREHOOK: type: QUERY POSTHOOK: type: QUERY I think this is the query type information that is reported along with other information on the query. If that is the case I feel we should not report other type for explain queries. If this change is loss of information shouldn't the usage of type wrong by the users? Although we can skip this and fix only the HiveProtoLoggingHook to address right query type, I feel we will report two different information for the same query in different places. Also keeping them synchronized will help us in the complete testing for all types of queries. Please let me know if you think my points make sense. I will address to not touch the commandType and rather create a field to represent explain queries and use that to report the correct query type in HiveProtoLoggingHook and the PREHOOK: type: and POSTHOOK: type. > Fix query type information in proto files for load and explain queries > -- > > Key: HIVE-28019 > URL: https://issues.apache.org/jira/browse/HIVE-28019 > Project: Hive > Issue Type: Task > Components: HiveServer2 >Reporter: Ramesh Kumar Thangarajan >Assignee: Ramesh Kumar Thangarajan >Priority: Major > Labels: pull-request-available > > Certain query types like LOAD, export, import and explain queries did not > produce the right Hive operation type -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-28019) Fix query type information in proto files for load and explain queries
[ https://issues.apache.org/jira/browse/HIVE-28019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834211#comment-17834211 ] Stamatis Zampetakis commented on HIVE-28019: The Jira summary and description claims that something is broken/buggy in Hive. However, after looking at the proposed fix and the changes caused by the PR I get the impression that "broken/buggy" can be a bit subjective. The current proposal claims that if the query is an EXPLAIN statement then it should be tagged as {{HiveOperation.EXPLAIN}}. Without much context this seems like an natural fit but we should take into account that a large number of Hive operations can be used in conjunction with EXPLAIN. I did some post-processing on the .q.out changes introduced by PR#5022 to gauge the impact on existing applications/users. {noformat} git diff --word-diff -U0 d73aef569f72d2bd8ecbf8b708e3d55d59c6615d..HEAD | grep "HOOK: type" | sed 's/.\+HOOK: type: //' | sort | uniq -c 2 [-ABORT TRANSACTIONS-]{+EXPLAIN+} 2 [-ALTERDATABASE-]{+EXPLAIN+} 4 [-ALTERDATABASE_LOCATION-]{+EXPLAIN+} 2 [-ALTERDATABASE_OWNER-]{+EXPLAIN+} 6 [-ALTER MAPPING-]{+EXPLAIN+} 142 [-ALTER_MATERIALIZED_VIEW_REBUILD-]{+EXPLAIN+} 12 [-ALTER_MATERIALIZED_VIEW_REWRITE-]{+EXPLAIN+} 8 [-ALTER_PARTITION_MERGE-]{+EXPLAIN+} 10 [-ALTER POOL-]{+EXPLAIN+} 26 [-ALTER RESOURCEPLAN-]{+EXPLAIN+} 2 [-ALTERTABLE_ADDCOLS-]{+EXPLAIN+} 2 [-ALTERTABLE_ADDCONSTRAINT-]{+EXPLAIN+} 12 [-ALTERTABLE_ADDPARTS-]{+EXPLAIN+} 2 [-ALTERTABLE_ARCHIVE-]{+EXPLAIN+} 2 [-ALTERTABLE_BUCKETNUM-]{+EXPLAIN+} 6 [-ALTERTABLE_CLUSTER_SORT-]{+EXPLAIN+} 20 [-ALTERTABLE_COMPACT-]{+EXPLAIN+} 20 [-ALTERTABLE_CONVERT-]{+EXPLAIN+} 12 [-ALTERTABLE_CREATEBRANCH-]{+EXPLAIN+} 4 [-ALTERTABLE_CREATETAG-]{+EXPLAIN+} 4 [-ALTERTABLE_DROPBRANCH-]{+EXPLAIN+} 2 [-ALTERTABLE_DROPCONSTRAINT-]{+EXPLAIN+} 12 [-ALTERTABLE_DROPPARTS-]{+EXPLAIN+} 4 [-ALTERTABLE_DROPTAG-]{+EXPLAIN+} 10 [-ALTERTABLE_EXCHANGEPARTITION-]{+EXPLAIN+} 8 [-ALTERTABLE_EXECUTE-]{+EXPLAIN+} 2 [-ALTERTABLE_FILEFORMAT-]{+EXPLAIN+} 2 [-ALTERTABLE_LOCATION-]{+EXPLAIN+} 2 [-ALTER_TABLE_MERGE-]{+EXPLAIN+} 2 [-ALTERTABLE_OWNER-]{+EXPLAIN+} 4 [-ALTERTABLE_PARTCOLTYPE-]{+EXPLAIN+} 4 [-ALTERTABLE_PROPERTIES-]{+EXPLAIN+} 8 [-ALTERTABLE_RENAMECOL-]{+EXPLAIN+} 14 [-ALTERTABLE_RENAME-]{+EXPLAIN+} 4 [-ALTERTABLE_RENAMEPART-]{+EXPLAIN+} 2 [-ALTERTABLE_REPLACECOLS-]{+EXPLAIN+} 4 [-ALTERTABLE_SERDEPROPERTIES-]{+EXPLAIN+} 2 [-ALTERTABLE_SERIALIZER-]{+EXPLAIN+} 4 [-ALTERTABLE_SKEWED-]{+EXPLAIN+} 4 [-ALTERTABLE_TOUCH-]{+EXPLAIN+} 2 [-ALTERTABLE_UNARCHIVE-]{+EXPLAIN+} 4 [-ALTERTABLE_UPDATECOLUMNS-]{+EXPLAIN+} 2 [-ALTERTBLPART_SKEWED_LOCATION-]{+EXPLAIN+} 6 [-ALTER TRIGGER-]{+EXPLAIN+} 90 [-ANALYZE_TABLE-]{+EXPLAIN+} 8 [-CREATEDATABASE-]{+EXPLAIN+} 18 [-CREATEFUNCTION-]{+EXPLAIN+} 4 [-CREATEMACRO-]{+EXPLAIN+} 8 [-CREATE MAPPING-]{+EXPLAIN+} 34 [-CREATE_MATERIALIZED_VIEW-]{+EXPLAIN+} 6 [-CREATE POOL-]{+EXPLAIN+} 6 [-CREATE RESOURCEPLAN-]{+EXPLAIN+} 4 [-CREATEROLE-]{+EXPLAIN+} 136 [-CREATETABLE_AS_SELECT-]{+EXPLAIN+} 60 [-CREATETABLE-]{+EXPLAIN+} 6 [-CREATE TRIGGER-]{+EXPLAIN+} 30 [-CREATEVIEW-]{+EXPLAIN+} 10 [-DESCDATABASE-]{+EXPLAIN+} 4 [-DESCFUNCTION-]{+EXPLAIN+} 8 [-DESCTABLE-]{+EXPLAIN+} 2 [-DROPDATABASE-]{+EXPLAIN+} 6 [-DROPFUNCTION-]{+EXPLAIN+} 4 [-DROPMACRO-]{+EXPLAIN+} 8 [-DROP MAPPING-]{+EXPLAIN+} 6 [-DROP POOL-]{+EXPLAIN+} 6 [-DROP RESOURCEPLAN-]{+EXPLAIN+} 4 [-DROPROLE-]{+EXPLAIN+} 8 [-DROPTABLE-]{+EXPLAIN+} 6 [-DROP TRIGGER-]{+EXPLAIN+} 4 [-DROPVIEW-]{+EXPLAIN+} 28 [-EXECUTE QUERY-]{+EXPLAIN+} 8 [-GRANT_PRIVILEGE-]{+EXPLAIN+} 4 [-GRANT_ROLE-]{+EXPLAIN+} 4 [-KILL QUERY-]{+EXPLAIN+} 20 [-LOAD-]{+EXPLAIN+} 22 [-MSCK-]{+EXPLAIN+} 16626 [-QUERY-]{+EXPLAIN+} 40 [-QUERY-]{+LOAD+} 4 [-RELOADFUNCTION-]{+EXPLAIN+} 8 [-REVOKE_PRIVILEGE-]{+EXPLAIN+} 12 [-SHOWCOLUMNS-]{+EXPLAIN+} 2 [-SHOW COMPACTIONS-]{+EXPLAIN+} 2 [-SHOWCONF-]{+EXPLAIN+} 2 [-SHOW_CREATEDATABASE-]{+EXPLAIN+} 2 [-SHOWDATABASES-]{+EXPLAIN+} 6 [-SHOWFUNCTIONS-]{+EXPLAIN+} 8 [-SHOW_GRANT-]{+EXPLAIN+} 4 [-SHOWLOCKS-]{+EXPLAIN+} 20 [-SHOWMATERIALIZEDVIEWS-]{+EXPLAIN+} 28 [-SHOWPARTITIONS-]{+EXPLAIN+} 12 [-SHOW RESOURCEPLAN-]{+EXPLAIN+} 4 [-SHOW_ROLE_GRANT-]{+EXPLAIN+} 24 [-SHOWTABLES-]{+EXPLAIN+} 2 [-SHOW_TABLESTATUS-]{+EXPLAIN+} 2 [-SHOW TRANSACTIONS-]{+EXPLAIN+} 10 [-SHOWVIEWS-]{+EXPLAIN+} 14 [-SWITCHDATABASE-]{+EXPLAIN+} 30 [-TRUNCATETABLE-]{+EXPLAIN+} {noformat} There are lots of SQL