tanishq-chugh opened a new pull request, #6253:
URL: https://github.com/apache/hive/pull/6253

   …sRead metrics for tables with multiple partitions
   
   ### What changes were proposed in this pull request?
   Fix HiveProtoLoggingHook so that the TablesRead metric no longer contains 
duplicate entries for tables with multiple partitions.
   
   
   ### Why are the changes needed?
   Currently, when a SELECT * query is executed on a table with multiple 
partitions, the TablesRead metric is populated with duplicate entries for the 
same table, one for each partition accessed.
   
   As a result, the TablesReadCount metric also reports an incorrect value.
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, in the generated proto files. Currently, an incorrect value is produced 
for the TablesRead metric when a SELECT * is run on a table with multiple 
partitions. For example, when the following queries are run:
   
   ```
   CREATE TABLE tbl_test_part(a int) partitioned by (b int) stored as orc tblproperties("transactional"="true");
   INSERT INTO tbl_test_part PARTITION (b=1) VALUES (11);
   INSERT INTO tbl_test_part PARTITION (b=2) VALUES (22);
   INSERT INTO tbl_test_part PARTITION (b=3) VALUES (33);
   
   SELECT * FROM tbl_test_part;
   ```
   
   With the current behaviour, in the proto file generated for the last SELECT 
query, the TablesRead list contains three identical entries for 
default.tbl_test_part, and the TablesReadCount metric is correspondingly 3.
   
   After this fix, the TablesRead list contains a single entry for 
default.tbl_test_part, and the TablesReadCount metric is correspondingly 1.
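
The deduplication described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `PartitionInput` and `TablesReadSketch` are hypothetical names rather than Hive classes, and the sketch assumes the hook receives one read input per accessed partition. Collecting the qualified table names into a `LinkedHashSet` keeps each table once while preserving first-seen order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a partition-level read input; not a Hive class.
final class PartitionInput {
    final String dbName;
    final String tableName;
    final String partitionSpec;

    PartitionInput(String dbName, String tableName, String partitionSpec) {
        this.dbName = dbName;
        this.tableName = tableName;
        this.partitionSpec = partitionSpec;
    }
}

// Hypothetical helper illustrating the idea of the fix, not the actual patch.
final class TablesReadSketch {

    // Returns each qualified table name once, in first-seen order,
    // regardless of how many partitions of that table were read.
    static List<String> distinctTablesRead(List<PartitionInput> inputs) {
        Set<String> seen = new LinkedHashSet<>();
        for (PartitionInput input : inputs) {
            seen.add(input.dbName + "." + input.tableName);
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Mirrors the example above: one table, three partitions read.
        List<PartitionInput> inputs = Arrays.asList(
            new PartitionInput("default", "tbl_test_part", "b=1"),
            new PartitionInput("default", "tbl_test_part", "b=2"),
            new PartitionInput("default", "tbl_test_part", "b=3"));

        List<String> tablesRead = distinctTablesRead(inputs);
        // Prints: [default.tbl_test_part] count=1
        System.out.println(tablesRead + " count=" + tablesRead.size());
    }
}
```

Deriving the count from the deduplicated list, rather than from the raw inputs, is what keeps TablesReadCount consistent with TablesRead in this sketch.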
   
   
   ### How was this patch tested?
   Manually tested
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

