[ https://issues.apache.org/jira/browse/HIVE-25842?focusedWorklogId=706691&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-706691 ]
ASF GitHub Bot logged work on HIVE-25842:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/Jan/22 08:51
            Start Date: 11/Jan/22 08:51
    Worklog Time Spent: 10m
      Work Description: deniskuzZ commented on pull request #2915:
URL: https://github.com/apache/hive/pull/2915#issuecomment-1009723611

   Per design doc:
   ````
   Since we do not want to access the FileSystem in a new separate MetricsSystem, this only can be collected at points where we already list the table/partition directory content. One way would be to use Initiator / Cleaner for this purpose, but that won’t be available in DWX for default DBC-s. The best option seems to be the AcidUtils.getAcidState call, which is called by every read query (In TEZ AM).
   ````
   Idea was not to add extra overhead with the metrics activation. Are we addressing these concerns here?

Issue Time Tracking
-------------------

    Worklog Id:     (was: 706691)
    Time Spent: 40m  (was: 0.5h)

> Reimplement delta file metric collection
> ----------------------------------------
>
>                 Key: HIVE-25842
>                 URL: https://issues.apache.org/jira/browse/HIVE-25842
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Pintér
>            Assignee: László Pintér
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> FUNCTIONALITY: Metrics are collected only when a Tez query runs a table
> (select * and select count( * ) don't update the metrics)
> Metrics aren't updated after compaction or cleaning after compaction, so
> users will probably see "issues" with compaction (like many active or
> obsolete or small deltas) that don't exist.
> RISK: Metrics are collected during queries – we tried to put a try-catch
> around each method in DeltaFilesMetricsReporter but of course this isn't
> foolproof. This is a HUGE performance and functionality liability. Tests
> caught some issues, but our tests aren't perfect.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
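
To illustrate the RISK item above, here is a minimal sketch of what wrapping metric collection in a try-catch on the query path might look like, and why it is not foolproof. Every name in it (GuardedDeltaMetrics, collectSafely, countDeltas) is hypothetical and is not taken from Hive's DeltaFilesMetricsReporter; the directory names are only illustrative. The collector is assumed to reuse a directory listing the query has already computed, in line with the design-doc quote.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical sketch; not Hive's DeltaFilesMetricsReporter.
public class GuardedDeltaMetrics {
    // Counts collection failures so they stay visible without failing the query.
    static final AtomicLong failures = new AtomicLong();

    // Runs the collector on an already-computed listing and swallows any Exception.
    // It does not protect against slow collectors or Errors such as OutOfMemoryError.
    static boolean collectSafely(List<String> dirNames, Consumer<List<String>> collector) {
        try {
            collector.accept(dirNames);
            return true;
        } catch (Exception e) {
            failures.incrementAndGet();
            return false;
        }
    }

    // Toy metric: number of directories whose name starts with "delta_".
    static long countDeltas(List<String> dirNames) {
        return dirNames.stream().filter(n -> n.startsWith("delta_")).count();
    }

    public static void main(String[] args) {
        List<String> listing = Arrays.asList(
            "base_0000005", "delta_0000006_0000006", "delta_0000007_0000007");
        AtomicLong gauge = new AtomicLong();

        // Normal case: the gauge is updated from the existing listing.
        boolean ok = collectSafely(listing, dirs -> gauge.set(countDeltas(dirs)));
        System.out.println(ok + " " + gauge.get());

        // Failing collector: the exception is contained and only counted.
        boolean failed = collectSafely(listing, dirs -> {
            throw new IllegalStateException("boom");
        });
        System.out.println(failed + " " + failures.get());
    }
}
```

The guard contains exceptions, but any time the collector spends still adds to query latency, which is the overhead concern raised in the review comment.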