Kevin Risden created RANGER-1837:
------------------------------------
Summary: HDFS Audit Compression
Key: RANGER-1837
URL: https://issues.apache.org/jira/browse/RANGER-1837
Project: Ranger
Issue Type: Improvement
Components: audit
Reporter: Kevin Risden
My team has done some research and found that Ranger HDFS audits are:
* Stored as JSON objects (one per line)
* Not compressed
This format is very verbose, and since this data is not frequently accessed,
it would benefit from compression.
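To show why line-delimited JSON compresses well, here is a minimal Python sketch using only the standard library. It assumes gzip as the codec, which is not what Ranger uses and not the ORC approach discussed on the mailing list. The records are hypothetical audit-like entries with illustrative field names, not a guaranteed match for Ranger's exact audit schema. The sketch serializes them one JSON object per line, gzips the result, and reports the compressed-to-original size ratio. Actual ratios depend on real audit content.

```python
import gzip
import json


def to_json_lines(records):
    # Serialize one JSON object per line (newline-delimited JSON), similar to
    # the uncompressed audit layout described in this issue.
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in records).encode("utf-8")


def gzip_ratio(data):
    # Ratio of gzip-compressed size to original size (smaller is better).
    return len(gzip.compress(data)) / len(data)


# Hypothetical audit-like records; field names and values are illustrative only.
records = [
    {
        "repoType": 1,
        "repo": "hdfs_prod",
        "reqUser": f"user{i % 5}",
        "evtTime": f"2017-10-18 12:00:{i % 60:02d}.000",
        "access": "READ",
        "resource": f"/data/warehouse/table{i % 10}/part-{i:05d}",
        "resType": "path",
        "result": 1,
        "policy": 7,
        "enforcer": "ranger-acl",
        "cliIP": "10.0.0.12",
        "agentHost": "namenode1.example.com",
    }
    for i in range(1000)
]

raw = to_json_lines(records)
print(f"uncompressed: {len(raw)} bytes")
print(f"gzip ratio: {gzip_ratio(raw):.2%}")
```

Because audit events repeat the same keys and many of the same values, a general-purpose codec removes much of that redundancy. A columnar format such as ORC can exploit it further by encoding each field separately.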
From Bosco on the mailing list:
{quote}You are right; currently one of the options is saving the audits in HDFS
itself as JSON files, in one folder per day. I have loaded these JSON files from
the folder into Hive in compressed ORC format. The compressed ORC files were
less than 10% of the original size, so it was a significant decrease in size.
Also, it is easier to run analytics on the Hive tables.
So, there are a couple of ways of doing it:
* Write an Oozie job which runs every night and loads the previous day's worth
of audit logs into ORC or another format.
* Write an AuditDestination which can write into the format you want.
Regardless of which approach you take, this would be a good feature for
Ranger.{quote}
http://mail-archives.apache.org/mod_mbox/ranger-user/201710.mbox/%3CCAJU9nmiYzzUUX1uDEysLAcMti4iLmX7RE%3DmN2%3DdoLaaQf87njQ%40mail.gmail.com%3E