[ https://issues.apache.org/jira/browse/RANGER-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16224086#comment-16224086 ]
Madhan Neethiraj commented on RANGER-1837:
------------------------------------------
[~rmani] - the approach you detailed looks good!
- I think having compression enabled by default would be good.
- Currently, the default is to write one audit log file to HDFS per day; I
think this interval would work for ORC as well. If a deployment needs more
frequent pushes to HDFS (for example, to make the data available sooner for
queries in Hive), it can configure a shorter interval, but that would leave
a large number of files per day in HDFS.
- Given that many deployments would use audit files in ORC format in Hive, it
would be helpful to include the Hive table creation statement in the documentation.
This will be a very useful addition to the Ranger audit framework. Looking forward
to the patch. Thanks [~rmani].
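The rollover trade-off in the second point of the comment is simple arithmetic: the number of files a writer leaves in HDFS per day is the length of a day divided by the rollover interval, rounded up. The Java below is an illustrative sketch only; `RolloverMath` and `filesPerDay` are hypothetical names, not Ranger classes, and it assumes each audit-writing process rolls its file over independently on a fixed interval.

```java
import java.time.Duration;

/**
 * Hypothetical helper (not part of Ranger) showing how the HDFS rollover
 * interval drives the number of audit files produced per day.
 * Assumes each writing process rolls over independently on a fixed interval.
 */
public class RolloverMath {

    private static final long SECONDS_PER_DAY = Duration.ofDays(1).getSeconds();

    /** Files written per day by one writer; a final partial interval still yields a file. */
    static long filesPerDay(Duration rolloverInterval) {
        long seconds = rolloverInterval.getSeconds();
        if (seconds <= 0) {
            throw new IllegalArgumentException("rollover interval must be positive");
        }
        // Ceiling division: round up so a trailing partial interval counts as one file.
        return (SECONDS_PER_DAY + seconds - 1) / seconds;
    }

    public static void main(String[] args) {
        Duration[] intervals = {
            Duration.ofDays(1), Duration.ofHours(1), Duration.ofMinutes(15), Duration.ofMinutes(5)
        };
        for (Duration d : intervals) {
            System.out.printf("rollover every %-8s -> %d files/day per writer%n", d, filesPerDay(d));
        }
    }
}
```

With the one-day default each writer produces a single file per day, while a 5-minute interval produces 288; under the assumption above that count applies to every writing process, which is the growth in file count the comment warns about.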
> HDFS Audit Compression
> ----------------------
>
> Key: RANGER-1837
> URL: https://issues.apache.org/jira/browse/RANGER-1837
> Project: Ranger
> Issue Type: Improvement
> Components: audit
> Reporter: Kevin Risden
>
> My team has done some research and found that Ranger HDFS audits are:
> * Stored as JSON objects (one per line)
> * Not compressed
> This is currently very verbose and would benefit from compression since this
> data is not frequently accessed.
> From Bosco on the mailing list:
> {quote}You are right, currently one of the options is saving the audits in
> HDFS itself as JSON files in one folder per day. I have loaded these JSON
> files from the folder into Hive as compressed ORC format. The compressed
> files in ORC were less than 10% of the original size. So, it was a significant
> decrease in size. Also, it is easier to run analytics on the Hive tables.
>
> So, there are a couple of ways of doing it.
>
> * Write an Oozie job that runs every night and loads the previous day's worth
> of audit logs into ORC or another format.
> * Write an AuditDestination that can write into the format you want.
>
> Regardless of which approach you take, this would be a good feature for
> Ranger.{quote}
> http://mail-archives.apache.org/mod_mbox/ranger-user/201710.mbox/%3CCAJU9nmiYzzUUX1uDEysLAcMti4iLmX7RE%3DmN2%3DdoLaaQf87njQ%40mail.gmail.com%3E
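The nightly-job option quoted above can be sketched as two steps: locate the previous day's audit folder, then compress that day's JSON lines. The Java below is a hypothetical illustration, not Ranger code. It uses gzip from `java.util.zip` as a stand-in for ORC (it does not produce ORC files and does not reproduce the under-10% figure quoted for ORC), it assumes a layout of one `yyyyMMdd` folder per day under a base directory such as `/ranger/audit/hdfs`, and the field names in the sample records are illustrative only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPOutputStream;

/**
 * Hypothetical sketch (not Ranger code) of the nightly-job option: find the
 * previous day's audit folder and gzip its JSON lines to gauge the savings.
 */
public class AuditCompressionSketch {

    // Assumed layout: one folder per day, named yyyyMMdd, under baseDir.
    static String previousDayDir(String baseDir, LocalDate today) {
        return baseDir + "/" + today.minusDays(1).format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    // Gzip-compresses newline-delimited JSON records (one audit event per line).
    static byte[] gzipJsonLines(List<String> jsonLines) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            for (String line : jsonLines) {
                gzip.write(line.getBytes(StandardCharsets.UTF_8));
                gzip.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed (and the gzip trailer written) before reading the bytes.
        return bytes.toByteArray();
    }

    // Uncompressed size in bytes, counting one '\n' per record.
    static long rawSize(List<String> jsonLines) {
        long total = 0;
        for (String line : jsonLines) {
            total += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return total;
    }

    public static void main(String[] args) {
        // Synthetic, repetitive records loosely shaped like access-audit events.
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            lines.add(String.format(
                "{\"repo\":\"cluster_hadoop\",\"reqUser\":\"user%d\","
                    + "\"evtTime\":\"2017-10-28 12:00:%02d.000\",\"access\":\"READ\","
                    + "\"resource\":\"/data/warehouse/table_%d/part-00000\",\"result\":1}",
                i % 20, i % 60, i % 50));
        }
        byte[] compressed = gzipJsonLines(lines);
        long raw = rawSize(lines);
        System.out.println("folder: " + previousDayDir("/ranger/audit/hdfs", LocalDate.of(2017, 10, 29)));
        System.out.printf("raw=%d bytes, gzip=%d bytes (%.1f%%)%n",
            raw, compressed.length, 100.0 * compressed.length / raw);
    }
}
```

The AuditDestination option would presumably apply compression inline as events are written rather than in a separate nightly pass, trading batch-job scheduling for extra work in the audit write path.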
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)