Mingmin Xu created HIVE-27677:
---------------------------------

             Summary: print out HiveMetaStore.audit to json format
                 Key: HIVE-27677
                 URL: https://issues.apache.org/jira/browse/HIVE-27677
             Project: Hive
          Issue Type: Improvement
          Components: Standalone Metastore
            Reporter: Mingmin Xu
            Assignee: Mingmin Xu


This task aims to print a new[1] line of the HiveMetaStore audit log in JSON 
format, similar to [https://github.com/apache/hive/pull/1582] but extended to 
cover the `cmd` details as well.

# The existing audit log

```
HiveMetaStore.audit: ugi=xxx ip=xx.xx.xx.xx cmd=source:xx.xx.xx.xx get_table : db=xxx tbl=xxx
HiveMetaStore.audit: ugi=xxx ip=xx.xx.xx.xx cmd=source:xx.xx.xx.xx get_partition_with_auth : db=xx tbl=xx[xxx]
```

# The new audit log
```
HiveMetaStore.audit: {"ugi": "xxx", "ip": "xx.xx.xx.xx", "cmd": {"source": "xx.xx.xx.xx", "api": "get_table", "params": {"db": "xxx", "tbl": "xxx"}}}
HiveMetaStore.audit: {"ugi": "xxx", "ip": "xx.xx.xx.xx", "cmd": {"source": "xx.xx.xx.xx", "api": "get_partition_with_auth", "params": {"db": "xxx", "tbl": "xxx", "key": ["xxx"]}}}
```
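
As a rough illustration of the proposed record shape, the sketch below builds one audit line as strict JSON (quoted keys, colons throughout) using only the JDK. The class `AuditJsonSketch` and its methods are hypothetical names made up for this example and are not part of Hive; the actual change would more likely use whatever JSON library the metastore already depends on.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper for illustration only; not part of Hive. */
final class AuditJsonSketch {

    /** Wraps a string in double quotes and escapes what JSON requires. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    // Remaining control characters become 4-digit hex escapes.
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Serializes Map and List values recursively; anything else is written as a quoted string. */
    static String toJson(Object v) {
        if (v instanceof Map) {
            StringJoiner j = new StringJoiner(", ", "{", "}");
            for (Map.Entry<?, ?> e : ((Map<?, ?>) v).entrySet()) {
                j.add(quote(String.valueOf(e.getKey())) + ": " + toJson(e.getValue()));
            }
            return j.toString();
        }
        if (v instanceof List) {
            StringJoiner j = new StringJoiner(", ", "[", "]");
            for (Object o : (List<?>) v) {
                j.add(toJson(o));
            }
            return j.toString();
        }
        return quote(String.valueOf(v));
    }

    /** Builds the JSON body of one audit line: ugi, ip, and a nested cmd object. */
    static String format(String ugi, String ip, String source, String api,
                         Map<String, Object> params) {
        Map<String, Object> cmd = new LinkedHashMap<>();
        cmd.put("source", source);
        cmd.put("api", api);
        cmd.put("params", params);

        Map<String, Object> record = new LinkedHashMap<>();
        record.put("ugi", ugi);
        record.put("ip", ip);
        record.put("cmd", cmd);
        return toJson(record);
    }
}
```

`LinkedHashMap` is used so the fields come out in a stable order (`ugi`, `ip`, `cmd`), which keeps the lines easy to read and to diff. A caller would prepend the `HiveMetaStore.audit: ` prefix, or leave that to the logger.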

----------------
For some context, we're tracking the usage of the shared Hive Metastore 
Service. The HiveMetaStore audit log is the raw data we rely on to understand 
the traffic along different dimensions: source (IP), API, database, table, etc. 

Currently the audit log is a raw string without a standard format, especially 
for extraLogInfo (see the code 
[here|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java#L182-L200]),
which makes it harder to analyze.
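
To make the analysis problem concrete, here is a minimal sketch of what a consumer has to do with the existing format today. The regular expression is an assumption derived only from the two sample lines in this ticket, and `LegacyAuditParseSketch` is a hypothetical name; real log lines may differ (for example, a `ugi` value containing whitespace would not match).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for illustration only; not part of Hive. */
final class LegacyAuditParseSketch {

    // Assumed layout: ugi=<u> ip=<ip> cmd=source:<src> <api> : <free-form details>
    private static final Pattern LINE = Pattern.compile(
        "ugi=(\\S+)\\s+ip=(\\S+)\\s+cmd=source:(\\S+)\\s+(\\w+)\\s*:\\s*(.*)");

    /** Returns {ugi, ip, source, api, details}, or empty if the line does not match. */
    static Optional<String[]> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new String[] {
            m.group(1), m.group(2), m.group(3), m.group(4), m.group(5).trim()
        });
    }
}
```

Even when the fixed prefix parses cleanly, the last field is still free text: for the `get_partition_with_auth` sample it comes back as `db=xx tbl=xx[xxx]`, with the partition values fused onto the table name, so every API needs its own follow-up parsing rule.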

[1] Should we print another line instead of replacing the existing one, to 
avoid a breaking change?



 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
