[ 
https://issues.apache.org/jira/browse/RANGER-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16238644#comment-16238644
 ] 

Ramesh Mani edited comment on RANGER-1837 at 11/4/17 12:40 AM:
---------------------------------------------------------------

{code:java}
ORC FILE FORMAT in HDFS Ranger Audit log:

    1. Enable Ranger Audit to HDFS in ORC file format
        - To enable Ranger Audit to HDFS in ORC format, first enable
          AuditFileCacheProvider to spool audit events to the local filesystem.
          This local batching is needed to build the ORC files.
            * On the NameNode host, create the spool directory and make sure
              the hdfs user and hadoop group have read/write/execute
              permission on the path

                $ mkdir -p  /var/log/hadoop/hdfs/audit/spool
                $ cd /var/log/hadoop/hdfs/audit/
                $ chown hdfs:hadoop spool

            * Enable AuditFileCacheProvider via the following params in
              ranger-<component>-audit.xml
                xasecure.audit.provider.filecache.is.enabled=true
                xasecure.audit.provider.filecache.filespool.file.rollover.sec=3600
                  (a 1-hour batch is rolled over into one or more ORC files,
                  based on size)
                xasecure.audit.provider.filecache.filespool.dir=/var/log/hadoop/hdfs/audit/spool

    2. Enable the ORC file format for Ranger HDFS audit.
          - This is done by setting the following param in
            ranger-<component>-audit.xml. The default value is "json".

            xasecure.audit.destination.hdfs.filetype=orc

    3. The compression codec for the ORC format is configurable. The default
       is 'snappy'.
            xasecure.audit.destination.hdfs.orc.compression=snappy|lzo|zlib|none

    4. Buffer size and stripe size of the ORC writer. Defaults are '10000'
       bytes and '100000' bytes respectively; these determine the batch size
       of the ORC files written to HDFS.
            xasecure.audit.destination.hdfs.orc.buffersize= (value in bytes)
            xasecure.audit.destination.hdfs.orc.stripesize= (value in bytes)

    5. Hive query to create the ORC table with the default 'snappy' compression.

        CREATE EXTERNAL TABLE ranger_audit_event (
        repositoryType int,
        repositoryName string,
        reqUser string,
        evtTime string,
        accessType string,
        resourcePath string,
        resourceType string,
        action  string,
        accessResult string,
        agentId string,
        policyId  bigint,
        resultReason string,
        aclEnforcer string,
        sessionId string,
        clientType string,
        clientIP string,
        requestData string,
        clusterName string
        )
        STORED AS ORC
        LOCATION '/ranger/audit/hdfs'
        TBLPROPERTIES  ("orc.compress"="SNAPPY");

{code}
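
The properties in steps 1-4 above are standard Hadoop-style configuration entries; as a sketch, they would appear in ranger-<component>-audit.xml like this (property names taken from the steps above, values illustrative):

{code:xml}
<!-- Spool audits to the local filesystem so they can be batched into ORC -->
<property>
  <name>xasecure.audit.provider.filecache.is.enabled</name>
  <value>true</value>
</property>
<property>
  <name>xasecure.audit.provider.filecache.filespool.file.rollover.sec</name>
  <!-- roll the local spool file every hour -->
  <value>3600</value>
</property>
<property>
  <name>xasecure.audit.provider.filecache.filespool.dir</name>
  <value>/var/log/hadoop/hdfs/audit/spool</value>
</property>
<!-- Write the HDFS audit destination as ORC instead of the default json -->
<property>
  <name>xasecure.audit.destination.hdfs.filetype</name>
  <value>orc</value>
</property>
<property>
  <name>xasecure.audit.destination.hdfs.orc.compression</name>
  <value>snappy</value>
</property>
{code}

Once the external table from step 5 exists, the ORC audit files can be queried directly in Hive; for example, an illustrative summary query:

{code:sql}
-- count audit events per user and access type
SELECT reqUser, accessType, COUNT(*) AS events
FROM ranger_audit_event
GROUP BY reqUser, accessType;
{code}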



> Enhance Ranger Audit to HDFS to support ORC file format
> -------------------------------------------------------
>
>                 Key: RANGER-1837
>                 URL: https://issues.apache.org/jira/browse/RANGER-1837
>             Project: Ranger
>          Issue Type: Improvement
>          Components: audit
>            Reporter: Kevin Risden
>            Priority: Major
>         Attachments: RANGER-1837-HDFS-Audit-Compression-0001.patch
>
>
> My team has done some research and found that Ranger HDFS audits are:
> * Stored as JSON objects (one per line)
> * Not compressed
> This is currently very verbose and would benefit from compression since this 
> data is not frequently accessed. 
> From Bosco on the mailing list:
> {quote}You are right, currently one of the options is saving the audits in 
> HDFS itself as JSON files in one folder per day. I have loaded these JSON 
> files from the folder into Hive as compressed ORC format. The compressed 
> files in ORC were less than 10% of the original size. So, it was significant 
> decrease in size. Also, it is easier to run analytics on the Hive tables.
>  
> So, there are couple of ways of doing it.
>  
> Write an Oozie job which runs every night and loads the previous day worth 
> audit logs into ORC or other format
> Write a AuditDestination which can write into the format you want to.
>  
> Regardless which approach you take, this would be a good feature for 
> Ranger.{quote}
> http://mail-archives.apache.org/mod_mbox/ranger-user/201710.mbox/%3CCAJU9nmiYzzUUX1uDEysLAcMti4iLmX7RE%3DmN2%3DdoLaaQf87njQ%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
