[jira] [Commented] (HIVE-28019) Fix query type information in proto files for load and explain queries

2024-04-18 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838616#comment-17838616
 ] 

Stamatis Zampetakis commented on HIVE-28019:


Thanks for elaborating more and sharing your perspective [~rameshkumar].

The PREHOOK and POSTHOOK messages are emitted by the PreExecutePrinter and 
PostExecutePrinter hooks and by looking at the git history I get the impression 
that these were added only for testing purposes. I guess this hooks are not 
(and were not meant to be) used by end users. In principle, we could do any 
kind of change there that could facilitate testing without worrying too much 
about backwards compatibility. 

The {{commandType}} became part of the pre/post output in HIVE-854 but the 
respective JIRA and history do not have enough info on why this was done. Given 
that {{commandType}} was added in {{SessionState}} as part of HIVE-854, I 
assume that the intention of printing it in qtests was to verify that it is set 
correctly, and it reflects the authorization implications of each statement. In 
that sense, if we now change the type there to be EXPLAIN then in terms of 
testing we lose the information of what authorization mode applies to the 
statement under test.

Moreover, in terms of testing the {{PREHOOK: query:}} entry shows clearly if we 
are dealing with an EXPLAIN query (or not) so from my perspective changing also 
the type to EXPLAIN will lead to reduced test coverage so the proposed changed 
does not look as an improvement.

The same reasoning more or less applies to the HiveProtoLoggingHook but this 
class is something that is meant to be used in production so changes here will 
have more impact to the end-users.

In both cases, I would suggest to keep the type field in the output unchanged. 
If we need to distinguish if a query is an EXPLAIN this could be done by an 
additional field but I also feel that this is not strongly required since it 
can be easily inferred via a regular expression in the output.

As I wrote previously, the appropriate content for the type field in the hooks 
is a bit subjective so the best way to move forward is to gather opinions from 
more people. I am perfectly fine to follow the majority if others feels that 
the query type in the aforementioned hooks should change.


> Fix query type information in proto files for load and explain queries
> --
>
> Key: HIVE-28019
> URL: https://issues.apache.org/jira/browse/HIVE-28019
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
>
> Certain query types like LOAD, export, import and explain queries did not 
> produce the right Hive operation type



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28019) Fix query type information in proto files for load and explain queries

2024-04-15 Thread Ramesh Kumar Thangarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837401#comment-17837401
 ] 

Ramesh Kumar Thangarajan commented on HIVE-28019:
-

Hi [~zabetak] First of all, thank you very much for the review on this. :)

I am with you on the fact that HiveOperation was introduced for authorization 
and may be we should not change it to represent the query type. But I still 
believe we should do the change for PREHOOK: type: and POSTHOOK: type: and also 
the HiveProtoLoggingHook.

I feel that the change to HiveOperation.Explain for the explain queries is 
needed mostly because we use the HiveOperation to print in the preexecute and 
postexecute actions.

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/hooks/PreExecutePrinter.java#L69]

At present we report the type information for the queries in preexec and 
postexec as below:

PREHOOK: type: QUERY

POSTHOOK: type: QUERY

I think this is the query type information that is reported along with other 
information on the query. If that is the case I feel we should not report other 
type for explain queries. If this change is loss of information shouldn't the 
usage of type wrong by the users? 

Although we can skip this and fix only the HiveProtoLoggingHook to address 
right query type, I feel we will report two different information for the same 
query in different places. Also keeping them synchronized will help us in the 
complete testing for all types of queries.

Please let me know if you think my points make sense. I will address to not 
touch the commandType and rather create a field to represent explain queries 
and use that to report the correct query type in HiveProtoLoggingHook and the 
PREHOOK: type: and POSTHOOK: type.

> Fix query type information in proto files for load and explain queries
> --
>
> Key: HIVE-28019
> URL: https://issues.apache.org/jira/browse/HIVE-28019
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
>
> Certain query types like LOAD, export, import and explain queries did not 
> produce the right Hive operation type



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28019) Fix query type information in proto files for load and explain queries

2024-04-05 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834211#comment-17834211
 ] 

Stamatis Zampetakis commented on HIVE-28019:


The Jira summary and description claims that something is broken/buggy in Hive. 
However, after looking at the proposed fix and the changes caused by the PR I 
get the impression that "broken/buggy" can be a bit subjective. 

The current proposal claims that if the query is an EXPLAIN statement then it 
should be tagged as {{HiveOperation.EXPLAIN}}. Without much context this seems 
like an natural fit but we should take into account that a large number of Hive 
operations can be used in conjunction with EXPLAIN. 

I did some post-processing on the .q.out changes introduced by PR#5022 to gauge 
the impact on existing applications/users.
{noformat}
git diff --word-diff -U0 d73aef569f72d2bd8ecbf8b708e3d55d59c6615d..HEAD | grep 
"HOOK: type" | sed 's/.\+HOOK: type: //' | sort  | uniq -c
  2 [-ABORT TRANSACTIONS-]{+EXPLAIN+}
  2 [-ALTERDATABASE-]{+EXPLAIN+}
  4 [-ALTERDATABASE_LOCATION-]{+EXPLAIN+}
  2 [-ALTERDATABASE_OWNER-]{+EXPLAIN+}
  6 [-ALTER MAPPING-]{+EXPLAIN+}
142 [-ALTER_MATERIALIZED_VIEW_REBUILD-]{+EXPLAIN+}
 12 [-ALTER_MATERIALIZED_VIEW_REWRITE-]{+EXPLAIN+}
  8 [-ALTER_PARTITION_MERGE-]{+EXPLAIN+}
 10 [-ALTER POOL-]{+EXPLAIN+}
 26 [-ALTER RESOURCEPLAN-]{+EXPLAIN+}
  2 [-ALTERTABLE_ADDCOLS-]{+EXPLAIN+}
  2 [-ALTERTABLE_ADDCONSTRAINT-]{+EXPLAIN+}
 12 [-ALTERTABLE_ADDPARTS-]{+EXPLAIN+}
  2 [-ALTERTABLE_ARCHIVE-]{+EXPLAIN+}
  2 [-ALTERTABLE_BUCKETNUM-]{+EXPLAIN+}
  6 [-ALTERTABLE_CLUSTER_SORT-]{+EXPLAIN+}
 20 [-ALTERTABLE_COMPACT-]{+EXPLAIN+}
 20 [-ALTERTABLE_CONVERT-]{+EXPLAIN+}
 12 [-ALTERTABLE_CREATEBRANCH-]{+EXPLAIN+}
  4 [-ALTERTABLE_CREATETAG-]{+EXPLAIN+}
  4 [-ALTERTABLE_DROPBRANCH-]{+EXPLAIN+}
  2 [-ALTERTABLE_DROPCONSTRAINT-]{+EXPLAIN+}
 12 [-ALTERTABLE_DROPPARTS-]{+EXPLAIN+}
  4 [-ALTERTABLE_DROPTAG-]{+EXPLAIN+}
 10 [-ALTERTABLE_EXCHANGEPARTITION-]{+EXPLAIN+}
  8 [-ALTERTABLE_EXECUTE-]{+EXPLAIN+}
  2 [-ALTERTABLE_FILEFORMAT-]{+EXPLAIN+}
  2 [-ALTERTABLE_LOCATION-]{+EXPLAIN+}
  2 [-ALTER_TABLE_MERGE-]{+EXPLAIN+}
  2 [-ALTERTABLE_OWNER-]{+EXPLAIN+}
  4 [-ALTERTABLE_PARTCOLTYPE-]{+EXPLAIN+}
  4 [-ALTERTABLE_PROPERTIES-]{+EXPLAIN+}
  8 [-ALTERTABLE_RENAMECOL-]{+EXPLAIN+}
 14 [-ALTERTABLE_RENAME-]{+EXPLAIN+}
  4 [-ALTERTABLE_RENAMEPART-]{+EXPLAIN+}
  2 [-ALTERTABLE_REPLACECOLS-]{+EXPLAIN+}
  4 [-ALTERTABLE_SERDEPROPERTIES-]{+EXPLAIN+}
  2 [-ALTERTABLE_SERIALIZER-]{+EXPLAIN+}
  4 [-ALTERTABLE_SKEWED-]{+EXPLAIN+}
  4 [-ALTERTABLE_TOUCH-]{+EXPLAIN+}
  2 [-ALTERTABLE_UNARCHIVE-]{+EXPLAIN+}
  4 [-ALTERTABLE_UPDATECOLUMNS-]{+EXPLAIN+}
  2 [-ALTERTBLPART_SKEWED_LOCATION-]{+EXPLAIN+}
  6 [-ALTER TRIGGER-]{+EXPLAIN+}
 90 [-ANALYZE_TABLE-]{+EXPLAIN+}
  8 [-CREATEDATABASE-]{+EXPLAIN+}
 18 [-CREATEFUNCTION-]{+EXPLAIN+}
  4 [-CREATEMACRO-]{+EXPLAIN+}
  8 [-CREATE MAPPING-]{+EXPLAIN+}
 34 [-CREATE_MATERIALIZED_VIEW-]{+EXPLAIN+}
  6 [-CREATE POOL-]{+EXPLAIN+}
  6 [-CREATE RESOURCEPLAN-]{+EXPLAIN+}
  4 [-CREATEROLE-]{+EXPLAIN+}
136 [-CREATETABLE_AS_SELECT-]{+EXPLAIN+}
 60 [-CREATETABLE-]{+EXPLAIN+}
  6 [-CREATE TRIGGER-]{+EXPLAIN+}
 30 [-CREATEVIEW-]{+EXPLAIN+}
 10 [-DESCDATABASE-]{+EXPLAIN+}
  4 [-DESCFUNCTION-]{+EXPLAIN+}
  8 [-DESCTABLE-]{+EXPLAIN+}
  2 [-DROPDATABASE-]{+EXPLAIN+}
  6 [-DROPFUNCTION-]{+EXPLAIN+}
  4 [-DROPMACRO-]{+EXPLAIN+}
  8 [-DROP MAPPING-]{+EXPLAIN+}
  6 [-DROP POOL-]{+EXPLAIN+}
  6 [-DROP RESOURCEPLAN-]{+EXPLAIN+}
  4 [-DROPROLE-]{+EXPLAIN+}
  8 [-DROPTABLE-]{+EXPLAIN+}
  6 [-DROP TRIGGER-]{+EXPLAIN+}
  4 [-DROPVIEW-]{+EXPLAIN+}
 28 [-EXECUTE QUERY-]{+EXPLAIN+}
  8 [-GRANT_PRIVILEGE-]{+EXPLAIN+}
  4 [-GRANT_ROLE-]{+EXPLAIN+}
  4 [-KILL QUERY-]{+EXPLAIN+}
 20 [-LOAD-]{+EXPLAIN+}
 22 [-MSCK-]{+EXPLAIN+}
  16626 [-QUERY-]{+EXPLAIN+}
 40 [-QUERY-]{+LOAD+}
  4 [-RELOADFUNCTION-]{+EXPLAIN+}
  8 [-REVOKE_PRIVILEGE-]{+EXPLAIN+}
 12 [-SHOWCOLUMNS-]{+EXPLAIN+}
  2 [-SHOW COMPACTIONS-]{+EXPLAIN+}
  2 [-SHOWCONF-]{+EXPLAIN+}
  2 [-SHOW_CREATEDATABASE-]{+EXPLAIN+}
  2 [-SHOWDATABASES-]{+EXPLAIN+}
  6 [-SHOWFUNCTIONS-]{+EXPLAIN+}
  8 [-SHOW_GRANT-]{+EXPLAIN+}
  4 [-SHOWLOCKS-]{+EXPLAIN+}
 20 [-SHOWMATERIALIZEDVIEWS-]{+EXPLAIN+}
 28 [-SHOWPARTITIONS-]{+EXPLAIN+}
 12 [-SHOW RESOURCEPLAN-]{+EXPLAIN+}
  4 [-SHOW_ROLE_GRANT-]{+EXPLAIN+}
 24 [-SHOWTABLES-]{+EXPLAIN+}
  2 [-SHOW_TABLESTATUS-]{+EXPLAIN+}
  2 [-SHOW TRANSACTIONS-]{+EXPLAIN+}
 10 [-SHOWVIEWS-]{+EXPLAIN+}
 14 [-SWITCHDATABASE-]{+EXPLAIN+}
 30 [-TRUNCATETABLE-]{+EXPLAIN+}
{noformat}

There are lots of SQL