Re: Writing enrichment data directly from NiFi with PutHBaseJSON

Carolyn Duby Fri, 01 Jun 2018 11:02:03 -0700

Hi Charles - 

I think your best bet is to create a CSV file and use flatfile_loader.sh.
This will be easier, and you won’t have to worry if the HBase storage format 
changes:


https://github.com/apache/metron/tree/master/metron-platform/metron-data-management#loading-utilities


The flat file loader is located here:

https://github.com/apache/metron/blob/master/metron-platform/metron-data-management/src/main/scripts/flatfile_loader.sh


Here is an example of an enrichment that maps a userid to a user category.

The CSV file maps each userid to a category.  For example, tsausner has 
user category BAD_GUY.

[centos@metron-demo-4 rangeraudit]$ cat user_enrichment.csv 
tsausner,BAD_GUY 
ndhanase,CONTRACTOR 
svelagap,ADMIN 
jprivite,EMPLOYEE 
nnolan,EMPLOYEE

Create an extractor config file that maps the columns of the CSV file to 
enrichments.  The indicator_column is the key for the enrichment, and type 
is the name of the enrichment type.


[centos@metron-demo-4 rangeraudit]$ cat user_extraction.json 
{
  "config" : {
    "columns" : {
         "user_id" : 0
        ,"user_category" : 1 
    }
    ,"indicator_column" : "user_id"
    ,"type" : "user_categorization"
    ,"separator" : ","
  }
  ,"extractor" : "CSV"
}
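
To make the column mapping concrete, here is a rough Python sketch of how one 
CSV line turns into an enrichment type, an indicator (the key) and a value. 
This only illustrates the config semantics and is not Metron's actual 
extractor code; the extract() helper is a made-up name. Fields are not 
trimmed in this sketch, so a trailing space in the CSV is carried into the 
stored value.

```python
import csv
import io

# Extractor config copied from user_extraction.json.
EXTRACTOR_CONFIG = {
    "config": {
        "columns": {"user_id": 0, "user_category": 1},
        "indicator_column": "user_id",
        "type": "user_categorization",
        "separator": ",",
    },
    "extractor": "CSV",
}


def extract(line, extractor_config):
    """Hypothetical helper: map one CSV line to (type, indicator, value)."""
    cfg = extractor_config["config"]
    # Split the line on the configured separator.
    fields = next(csv.reader(io.StringIO(line), delimiter=cfg["separator"]))
    # Name each column by its configured position.
    value = {name: fields[idx] for name, idx in cfg["columns"].items()}
    # The indicator column supplies the lookup key for the enrichment.
    indicator = value[cfg["indicator_column"]]
    return cfg["type"], indicator, value


enrichment_type, indicator, value = extract("tsausner,BAD_GUY ", EXTRACTOR_CONFIG)
print(enrichment_type, indicator, value)
```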

Optionally, you can also specify where Metron should apply the enrichments 
when you import the enrichment data.  You can skip this step if the 
enrichments are already configured, or you can add them later.
This config file applies the user_categorization enrichment to the 
rangeradmin sensor, using the reqUser field as the key.  
 
[centos@metron-demo-4 rangeraudit]$ cat rangeradmin_user_category_enrichment.json 
{
  "zkQuorum": "metron-demo-2.field.hortonworks.com:2181,metron-demo-0.field.hortonworks.com:2181,metron-demo-1.field.hortonworks.com:2181",
  "sensorToFieldList": {
    "rangeradmin": {
      "type": "ENRICHMENT",
      "fieldToEnrichmentTypes": {
        "reqUser": [
          "user_categorization"
        ]
      }
    }
  }
}

The command below imports the enrichment mappings into HBase and adds the 
enrichment to the rangeradmin sensor.  The result is that when a rangeradmin 
event is enriched, Metron will use the reqUser field value as a key into 
the user_categorization enrichment.  If the value of the field is present in 
the CSV data, the enriched event will have a new field indicating the user 
category:

[centos@metron-demo-4 rangeraudit]$ /usr/hcp/1.4.0.0-38/metron/bin/flatfile_loader.sh \
    -e user_extraction.json -t enrichment -i user_enrichment.csv -c t \
    -n rangeradmin_user_category_enrichment.json


HBase will look similar to this:

hbase(main):002:0> scan 'enrichment'
ROW  COLUMN+CELL
 \x01\x12\x8Bjx@d.\xF3\xBF\xD3\xB2\x81\xEB\xB5\xD2\x00\x13user_categorization\x00\x08tsausner  column=t:v, timestamp=1518118740456, value={"user_category":"BAD_GUY ","user_id":"tsausner"}
 /\xA8\xEB\xB1\xE0N\xBE\xCBv?\xCAz9\xF6;\xD3\x00\x13user_categorization\x00\x08svelagap  column=t:v, timestamp=1518118740540, value={"user_category":"ADMIN ","user_id":"svelagap"}
 l\xF1F\x83t\xD6x\xF9\xBEwrk3\x00M2\x00\x13user_categorization\x00\x08ndhanase  column=t:v, timestamp=1518118740522, value={"user_category":"CONTRACTOR ","user_id":"ndhanase"}
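
Judging only from the scan output, each row key looks like a 16-byte binary 
prefix followed by two length-prefixed strings (a 2-byte big-endian length, 
then the text): the enrichment type and the indicator. The Python sketch 
below splits a key on that assumption. The layout and the 16-byte prefix 
length are inferred from this output, not taken from the Metron source; the 
prefix appears to be some kind of hash and the sketch does not try to 
reproduce it. unescape() and split_row_key() are made-up helper names.

```python
import re
import struct

PREFIX_LEN = 16  # assumption: inferred from the scan output, not from Metron source


def unescape(shell_key):
    """Convert HBase shell notation (\\xHH for non-printable bytes) to raw bytes."""
    return re.sub(
        rb"\\x([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        shell_key.encode("ascii"),
    )


def split_row_key(raw):
    """Split a raw key into (prefix, [length-prefixed strings])."""
    prefix, rest = raw[:PREFIX_LEN], raw[PREFIX_LEN:]
    parts = []
    while rest:
        # Each string is preceded by a 2-byte big-endian length.
        (length,) = struct.unpack(">H", rest[:2])
        parts.append(rest[2:2 + length].decode("utf-8"))
        rest = rest[2 + length:]
    return prefix, parts


key = (r"\x01\x12\x8Bjx@d.\xF3\xBF\xD3\xB2\x81\xEB\xB5\xD2"
       r"\x00\x13user_categorization\x00\x08tsausner")
prefix, (enrichment_type, indicator) = split_row_key(unescape(key))
print(len(prefix), enrichment_type, indicator)
```

Because the prefix is opaque binary data, writing rows by hand means 
reproducing it exactly, which is one more reason to let flatfile_loader.sh 
generate the keys.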



After the enrichment data is in HBase, create an event and add it to the 
rangeradmin topic.  For example, if the reqUser field is set to nnolan, the 
enriched event will have the following fields:

enrichments:hbaseEnrichment:reqUser:user_categorization:user_category
EMPLOYEE

enrichments:hbaseEnrichment:reqUser:user_categorization:user_id
nnolan
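
The two field names above suggest the pattern 
enrichments:hbaseEnrichment:<field>:<enrichment type>:<column>. As a small 
sketch of that naming (the pattern is inferred from this example only, and 
enriched_fields() is a made-up helper, not a Metron function):

```python
def enriched_fields(field, enrichment_type, value):
    """Hypothetical helper: build enriched field names from a stored value."""
    prefix = f"enrichments:hbaseEnrichment:{field}:{enrichment_type}"
    # One output field per column stored in the enrichment value.
    return {f"{prefix}:{column}": v for column, v in value.items()}


fields = enriched_fields(
    "reqUser",
    "user_categorization",
    {"user_category": "EMPLOYEE", "user_id": "nnolan"},
)
for name, v in fields.items():
    print(name, v)
```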



Thanks

Carolyn Duby
Solutions Engineer, Northeast
cd...@hortonworks.com
+1.508.965.0584

On 6/1/18, 6:26 AM, "Charles Joynt" <charles.jo...@gresearch.co.uk> wrote:

>Hello,
>
>I work as a Dev/Ops Data Engineer within the security team at a company in 
>London where we are in the process of implementing Metron. I have been tasked 
>with implementing feeds of network environment data into HBase so that this 
>data can be used as enrichment sources for our security events. First off, I 
>wanted to pull in DNS data for an internal domain.
>
>I am assuming that I need to write data into HBase in such a way that it 
>exactly matches what I would get from the flatfile_loader.sh script. A 
>colleague of mine has already loaded some DNS data using that script, so I am 
>using that as a reference.
>
>I have implemented a flow in NiFi which takes JSON data from an HTTP listener 
>and routes it to a PutHBaseJSON processor. The flow is working, in the sense 
>that data is successfully written to HBase, but despite (naively) specifying 
>"Row Identifier Encoding Strategy = Binary", the results in HBase don't look 
>correct. Comparing the output from HBase scan commands I see:
>
>flatfile_loader.sh produced:
>
>ROW:  \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\x05whois\x00\x0E192.168.0.198
>CELL: column=data:v, timestamp=1516896203840, value={"clientname":"server.domain.local","clientip":"192.168.0.198"}
>
>PutHBaseJSON produced:
>
>ROW:  server.domain.local
>CELL: column=dns:v, timestamp=1527778603783, value={"name":"server.domain.local","type":"A","data":"192.168.0.198"}
>
>From source JSON:
>
>{"k":"server.domain.local","v":{"name":"server.domain.local","type":"A","data":"192.168.0.198"}}
>
>I know that there are some differences in column family / field names, but my 
>worry is the ROW id. Presumably I need to encode my row key, "k" in the JSON 
>data, in a way that matches how the flatfile_loader.sh script did it.
>
>Can anyone explain how I might convert my Id to the correct format?
>-or-
>Does this matter? Can Metron use the human-readable ROW ids?
>
>Charlie Joynt
