On June 13, 2018 at 10:30:26, Simon Elliston Ball (si...@simonellistonball.com) wrote:

That’s where something like the NiFi solution would come in... With the PutEnrichment processor and a HandleHttpRequest processor, you do have a web service for loading enrichments. We could probably also create a REST service endpoint for it, which would make some sense, but there is a nice multi-source, queuing, and lineage element to the NiFi solution.

Simon

> On 13 Jun 2018, at 15:04, Casey Stella <ceste...@gmail.com> wrote:
>
> No, sadly we do not.
>
>> On Wed, Jun 13, 2018 at 10:01 AM Carolyn Duby <cd...@hortonworks.com> wrote:
>>
>> Agreed… Streaming enrichments are the right solution for DNS data.
>>
>> Do we have a web service for writing enrichments?
>>
>> Carolyn Duby
>> Solutions Engineer, Northeast
>> cd...@hortonworks.com
>> +1.508.965.0584
>>
>> Join my team!
>> Enterprise Account Manager – Boston - http://grnh.se/wepchv1
>> Solutions Engineer – Boston - http://grnh.se/8gbxy41
>> Need Answers? Try https://community.hortonworks.com <https://community.hortonworks.com/answers/index.html>
>>
>> On 6/13/18, 6:25 AM, "Charles Joynt" <charles.jo...@gresearch.co.uk> wrote:
>>
>>> Regarding why I didn't choose to load data with the flatfile loader script...
>>>
>>> I want to be able to SEND enrichment data to Metron rather than have to set up cron jobs to PULL data. At the moment I'm trying to prove that the process works with a simple data source. In the future we will want enrichment data in Metron that comes from systems (e.g. HR databases) that I won't have access to, hence will need someone to be able to send us the data.
>>>
>>>> Carolyn: just call the flat file loader from a script processor...
>>>
>>> I didn't believe that would work in my environment. I'm pretty sure the script has dependencies on various Metron JARs, not least for the row id hashing algorithm.
>>> I suppose this would require at least a partial install of Metron alongside NiFi, and would introduce additional work on the NiFi cluster for any Metron upgrade. In some (enterprise) environments there might be separation of ownership between NiFi and Metron.
>>>
>>> I also prefer not to have a Java app calling a bash script which calls a new Java process, with logs or error output that might just get swallowed up invisibly. Somewhere down the line this could hold up effective troubleshooting.
>>>
>>>> Simon: I have actually written a Stellar processor, which applies Stellar to all FlowFile attributes...
>>>
>>> Gulp.
>>>
>>>> Simon: what didn't you like about the flatfile loader script?
>>>
>>> The flatfile loader script has worked fine for me when prepping enrichment data in test systems; however, it was a bit of a chore to get the JSON configuration files set up, especially for "wide" data sources that may have 15-20 fields, e.g. Active Directory.
>>>
>>> More broadly speaking, I want to embrace the streaming data paradigm and tried to avoid batch jobs. With the DNS example, you might imagine a future where the enrichment data is streamed based on DHCP registrations, DNS update events, etc. In principle this could reduce the window of time where we might enrich a data source with out-of-date data.
>>>
>>> Charlie
>>>
>>> -----Original Message-----
>>> From: Carolyn Duby [mailto:cd...@hortonworks.com]
>>> Sent: 12 June 2018 20:33
>>> To: dev@metron.apache.org
>>> Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON
>>>
>>> I like the streaming enrichment solutions, but it depends on how you are getting the data in. If you get the data in a CSV file, just call the flat file loader from a script processor. No special NiFi required.
>>>
>>> If the enrichments don’t arrive in bulk, the streaming solution is better.
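[Editor's note: for concreteness, the "just call the flat file loader from a script processor" suggestion amounts to shelling out to something like the sketch below. The install path and the flag names (-i, -t, -c, -e) are recalled from the Metron docs rather than taken from this thread; verify them with flatfile_loader.sh -h on your own install.]

```python
import subprocess

METRON_HOME = "/usr/metron/current"  # hypothetical install path

def flatfile_loader_cmd(input_path, table="enrichment", cf="t",
                        extractor_config="extractor_config.json"):
    """Build the argv for flatfile_loader.sh (flag names per the Metron
    docs; confirm against your own install before relying on them)."""
    return [METRON_HOME + "/bin/flatfile_loader.sh",
            "-i", input_path,        # local CSV file to load
            "-t", table,             # target HBase table
            "-c", cf,                # HBase column family
            "-e", extractor_config]  # JSON describing the CSV columns

def load_enrichment_csv(input_path):
    # Capture output so failures are not silently swallowed -- exactly
    # the "logs get swallowed up invisibly" concern raised above.
    return subprocess.run(flatfile_loader_cmd(input_path),
                          capture_output=True, text=True, check=True)
```

This keeps the loader's stdout/stderr attached to the calling process, which at least makes the double-JVM indirection debuggable.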
>>>
>>> Thanks
>>> Carolyn Duby
>>> Solutions Engineer, Northeast
>>> cd...@hortonworks.com
>>> +1.508.965.0584
>>>
>>> On 6/12/18, 1:08 PM, "Simon Elliston Ball" <si...@simonellistonball.com> wrote:
>>>
>>>> Good solution. The streaming enrichment writer makes a lot of sense for this, especially if you're not using huge enrichment sources that need the batch-based loaders.
>>>>
>>>> As it happens, I have written most of a NiFi processor to handle this use case directly - both non-record and Record based, especially for Otto :). The one thing we need to figure out now is where to host that, and how to handle releases of a nifi-metron-bundle. I'll probably get round to putting the code in my github at least in the next few days, while we figure out a more permanent home.
>>>>
>>>> Charlie, out of curiosity, what didn't you like about the flatfile loader script?
>>>>
>>>> Simon
>>>>
>>>> On 12 June 2018 at 18:00, Charles Joynt <charles.jo...@gresearch.co.uk> wrote:
>>>>
>>>>> Thanks for the responses. I appreciate the willingness to look at creating a NiFi processor. That would be great!
>>>>>
>>>>> Just to follow up on this (after a week looking after the "ops" side of dev-ops): I really don't want to have to use the flatfile loader script, and I'm not going to be able to write a Metron-style HBase key generator any time soon, but I have had some success with a different approach.
>>>>>
>>>>> 1. Generate data in CSV format, e.g. "server.domain.local","A","192.168.0.198"
>>>>> 2. Send this to an HTTP listener in NiFi
>>>>> 3.
Write to a Kafka topic
>>>>>
>>>>> I then followed your instructions in this blog:
>>>>> https://cwiki.apache.org/confluence/display/METRON/2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+Streaming+Enrichment
>>>>>
>>>>> 4. Create a new "dns" sensor in Metron
>>>>> 5. Use the CSVParser and SimpleHbaseEnrichmentWriter, and parserConfig settings to push this into HBase:
>>>>>
>>>>> {
>>>>>   "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
>>>>>   "writerClassName": "org.apache.metron.enrichment.writer.SimpleHbaseEnrichmentWriter",
>>>>>   "sensorTopic": "dns",
>>>>>   "parserConfig": {
>>>>>     "shew.table": "dns",
>>>>>     "shew.cf": "dns",
>>>>>     "shew.keyColumns": "name",
>>>>>     "shew.enrichmentType": "dns",
>>>>>     "columns": {
>>>>>       "name": 0,
>>>>>       "type": 1,
>>>>>       "data": 2
>>>>>     }
>>>>>   }
>>>>> }
>>>>>
>>>>> And... it seems to be working. At least, I have data in HBase which looks more like the output of the flatfile loader.
>>>>>
>>>>> Charlie
>>>>>
>>>>> -----Original Message-----
>>>>> From: Casey Stella [mailto:ceste...@gmail.com]
>>>>> Sent: 05 June 2018 14:56
>>>>> To: dev@metron.apache.org
>>>>> Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON
>>>>>
>>>>> The problem, as you correctly diagnosed, is the key in HBase. We construct the key very specifically in Metron, so it's unlikely to work out of the box with the NiFi processor, unfortunately.
The key that we use is formed here in the codebase:
>>>>> https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
>>>>>
>>>>> To put that in English, consider the following:
>>>>>
>>>>> - type - the enrichment type
>>>>> - indicator - the indicator to use
>>>>> - hash(*) - a Murmur3 128-bit hash function
>>>>>
>>>>> The key is hash(indicator) + type + indicator.
>>>>>
>>>>> This hash prefixing is a standard practice in HBase key design that allows the keys to be uniformly distributed among the regions and prevents hotspotting. Depending on how the PutHBaseJSON processor works, if you can construct the key and pass it in, then you might be able to either construct the key in NiFi or write a processor to construct the key. Ultimately, though, what Carolyn said is true: the easiest approach is probably using the flatfile loader. If you do get this working in NiFi, however, do please let us know and/or consider contributing it back to the project as a PR :)
>>>>>
>>>>> On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt <charles.jo...@gresearch.co.uk> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I work as a Dev/Ops Data Engineer within the security team at a company in London where we are in the process of implementing Metron. I have been tasked with implementing feeds of network environment data into HBase so that this data can be used as enrichment sources for our security events. First off I wanted to pull in DNS data for an internal domain.
>>>>>>
>>>>>> I am assuming that I need to write data into HBase in such a way that it exactly matches what I would get from the flatfile_loader.sh script. A colleague of mine has already loaded some DNS data using that script, so I am using that as a reference.
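[Editor's note: the key construction Casey describes can be sketched in Python. Two caveats: Metron builds the 16-byte prefix with Guava's Murmur3 128-bit hash, which has no Python-stdlib equivalent, so hashlib.md5 (also 16 bytes) stands in purely to show the layout, and real Metron keys will therefore differ; and the writeUTF-style 2-byte length prefix is a reading of the \x00\x05whois fragment in the scan output quoted in this thread, not something stated explicitly here.]

```python
import hashlib
import struct

def write_utf(s: str) -> bytes:
    """DataOutput.writeUTF-style encoding: 2-byte big-endian length,
    then the UTF-8 bytes."""
    b = s.encode("utf-8")
    return struct.pack(">H", len(b)) + b

def enrichment_row_key(enrichment_type: str, indicator: str) -> bytes:
    """Sketch of EnrichmentKey's layout: hash(indicator) + type + indicator.
    NOTE: md5 is a 16-byte STAND-IN for Guava's murmur3_128; it only
    illustrates the byte layout, not the actual hash values."""
    prefix = hashlib.md5(indicator.encode("utf-8")).digest()  # 16 bytes
    return prefix + write_utf(enrichment_type) + write_utf(indicator)

key = enrichment_row_key("whois", "192.168.0.198")
# 16-byte hash prefix, then \x00\x05whois, then \x00\r192.168.0.198
```

The 16-byte hash prefix is what spreads the rows evenly across regions; the readable type and indicator still follow it, which is why fragments like "whois" remain visible in the scan output.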
>>>>>> I have implemented a flow in NiFi which takes JSON data from an HTTP listener and routes it to a PutHBaseJSON processor. The flow is working, in the sense that data is successfully written to HBase, but despite (naively) specifying "Row Identifier Encoding Strategy = Binary", the results in HBase don't look correct. Comparing the output from HBase scan commands I see:
>>>>>>
>>>>>> flatfile_loader.sh produced:
>>>>>>
>>>>>> ROW: \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\x05whois\x00\x0E192.168.0.198
>>>>>> CELL: column=data:v, timestamp=1516896203840, value={"clientname":"server.domain.local","clientip":"192.168.0.198"}
>>>>>>
>>>>>> PutHBaseJSON produced:
>>>>>>
>>>>>> ROW: server.domain.local
>>>>>> CELL: column=dns:v, timestamp=1527778603783, value={"name":"server.domain.local","type":"A","data":"192.168.0.198"}
>>>>>>
>>>>>> From source JSON:
>>>>>>
>>>>>> {"k":"server.domain.local","v":{"name":"server.domain.local","type":"A","data":"192.168.0.198"}}
>>>>>>
>>>>>> I know that there are some differences in column family / field names, but my worry is the ROW id. Presumably I need to encode my row key, "k" in the JSON data, in a way that matches how the flatfile_loader.sh script did it.
>>>>>>
>>>>>> Can anyone explain how I might convert my Id to the correct format?
>>>>>> -or-
>>>>>> Does this matter? Can Metron use the human-readable ROW ids?
>>>>>>
>>>>>> Charlie Joynt
>>>>>>
>>>>>> --------------
>>>>>> G-RESEARCH believes the information provided herein is reliable. While every care has been taken to ensure accuracy, the information is furnished to the recipients with no warranty as to the completeness and accuracy of its contents and on condition that any errors or omissions shall not be made the basis of any claim, demand or cause of action.
>>>>>> The information in this email is intended only for the named recipient. If you are not the intended recipient please notify us immediately and do not copy, distribute or take action based on this e-mail. All messages sent to and from this e-mail address will be logged by G-RESEARCH and are subject to archival storage, monitoring, review and disclosure.
>>>>>> G-RESEARCH is the trading name of Trenchant Limited, 5th Floor, Whittington House, 19-30 Alfred Place, London WC1E 7EA.
>>>>>> Trenchant Limited is a company registered in England with company number 08127121.
>>>>>> --------------
>>>>
>>>> --
>>>> simon elliston ball
>>>> @sireb
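[Editor's note: as a footnote to the thread, the parserConfig Charlie posted can be illustrated with a short sketch of roughly what the CSVParser plus SimpleHbaseEnrichmentWriter combination does with one Kafka message. This is illustrative Python, not Metron code; the actual HBase write also applies the hashed row key discussed above.]

```python
import csv
import io

# Mirrors the parserConfig from the "dns" sensor posted in the thread.
PARSER_CONFIG = {
    "shew.keyColumns": "name",
    "shew.enrichmentType": "dns",
    "columns": {"name": 0, "type": 1, "data": 2},
}

def parse_enrichment(line: str):
    """Map one CSV message to (indicator, enrichment type, value map):
    the indicator comes from shew.keyColumns, the type from
    shew.enrichmentType, and the value map from the column positions."""
    row = next(csv.reader(io.StringIO(line)))
    value = {name: row[idx] for name, idx in PARSER_CONFIG["columns"].items()}
    indicator = value[PARSER_CONFIG["shew.keyColumns"]]
    return indicator, PARSER_CONFIG["shew.enrichmentType"], value

indicator, etype, value = parse_enrichment('"server.domain.local","A","192.168.0.198"')
```

For the sample CSV line, the indicator is "server.domain.local", the enrichment type is "dns", and the value map is what ends up serialized into the dns:v cell.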