[
https://issues.apache.org/jira/browse/RANGER-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16685676#comment-16685676
]
Kevin Risden commented on RANGER-1837:
--------------------------------------
I took a look but missed the submit review button. The changes look
reasonable, but I don't have a cluster to test this on anymore.
> Enhance Ranger Audit to HDFS to support ORC file format
> -------------------------------------------------------
>
> Key: RANGER-1837
> URL: https://issues.apache.org/jira/browse/RANGER-1837
> Project: Ranger
> Issue Type: Improvement
> Components: audit
> Reporter: Kevin Risden
> Assignee: Ramesh Mani
> Priority: Major
> Attachments:
> 0001-RANGER-1837-Enhance-Ranger-Audit-to-HDFS-to-support-.patch,
> 0001-RANGER-1837-Enhance-Ranger-Audit-to-HDFS-to-support-002.patch,
> 0001-RANGER-1837-Enhance-Ranger-Audit-to-HDFS-to-support_001.patch,
> AuditDataFlow.png
>
>
> My team has done some research and found that Ranger HDFS audits are:
> * Stored as JSON objects (one per line)
> * Not compressed
> This is currently very verbose and would benefit from compression since this
> data is not frequently accessed.
> From Bosco on the mailing list:
> {quote}You are right, currently one of the options is saving the audits in
> HDFS itself as JSON files in one folder per day. I have loaded these JSON
> files from the folder into Hive as compressed ORC format. The compressed
> files in ORC were less than 10% of the original size. So, it was a significant
> decrease in size. Also, it is easier to run analytics on the Hive tables.
>
> So, there are a couple of ways of doing it:
>
> * Write an Oozie job which runs every night and loads the previous day's
> worth of audit logs into ORC or another format
> * Write an AuditDestination which can write into the format you want
>
> Regardless of which approach you take, this would be a good feature for
> Ranger.{quote}
> http://mail-archives.apache.org/mod_mbox/ranger-user/201710.mbox/%3CCAJU9nmiYzzUUX1uDEysLAcMti4iLmX7RE%3DmN2%3DdoLaaQf87njQ%40mail.gmail.com%3E
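
As a rough illustration of the verbosity point in the description above, the
sketch below builds synthetic one-object-per-line JSON records and measures how
much generic gzip compression shrinks them. It is only a sketch under stated
assumptions: the field names are made up and are not Ranger's audit schema, the
data is synthetic, and gzip from the Python standard library stands in for ORC,
which is a different columnar format with its own compression.

```python
import gzip
import json


def make_audit_lines(count):
    """Build synthetic JSON records, one object per line.

    The field names here are hypothetical, not Ranger's real audit schema.
    """
    lines = []
    for i in range(count):
        record = {
            "evtTime": f"2018-11-13 12:00:{i % 60:02d}",
            "reqUser": f"user{i % 5}",
            "resource": f"/data/table{i % 10}",
            "access": "read",
            "result": 1,
        }
        lines.append(json.dumps(record))
    return "\n".join(lines).encode("utf-8") + b"\n"


def compression_ratio(raw):
    """Return the gzip-compressed size divided by the raw size."""
    return len(gzip.compress(raw)) / len(raw)


raw = make_audit_lines(1000)
ratio = compression_ratio(raw)
print(f"raw={len(raw)} bytes, gzip ratio={ratio:.2f}")
```

The ratio measured on real audit logs will differ from this synthetic case; the
point is only that repeated keys and values in line-delimited JSON compress
well.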
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)