
On June 13, 2018 at 10:30:26, Simon Elliston Ball (si...@simonellistonball.com) wrote:

That’s where something like the NiFi solution would come in...

With the PutEnrichment processor and a ProcessHttpRequest processor, you do
have a web service for loading enrichments.

We could probably also create a REST service endpoint for it, which would
make some sense, but there is a nice multi-source, queuing, and lineage
element to the NiFi solution.

Simon

> On 13 Jun 2018, at 15:04, Casey Stella <ceste...@gmail.com> wrote:
>
> no, sadly we do not.
>
>> On Wed, Jun 13, 2018 at 10:01 AM Carolyn Duby <cd...@hortonworks.com> wrote:
>>
>> Agreed… Streaming enrichment is the right solution for DNS data.
>>
>> Do we have a web service for writing enrichments?
>>
>> Carolyn Duby
>> Solutions Engineer, Northeast
>> cd...@hortonworks.com
>> +1.508.965.0584
>>
>> Join my team!
>> Enterprise Account Manager – Boston - http://grnh.se/wepchv1
>> Solutions Engineer – Boston - http://grnh.se/8gbxy41
>> Need Answers? Try https://community.hortonworks.com <https://community.hortonworks.com/answers/index.html>
>>
>> On 6/13/18, 6:25 AM, "Charles Joynt" <charles.jo...@gresearch.co.uk>
>> wrote:
>>
>>> Regarding why I didn't choose to load data with the flatfile loader script...
>>>
>>> I want to be able to SEND enrichment data to Metron rather than have to
>>> set up cron jobs to PULL data. At the moment I'm trying to prove that the
>>> process works with a simple data source. In the future we will want
>>> enrichment data in Metron that comes from systems (e.g. HR databases) that
>>> I won't have access to, hence will need someone to be able to send us the
>>> data.
>>>
>>>> Carolyn: just call the flat file loader from a script processor...
>>>
>>> I didn't believe that would work in my environment. I'm pretty sure the
>>> script has dependencies on various Metron JARs, not least for the row id
>>> hashing algorithm. I suppose this would require at least a partial install
>>> of Metron alongside NiFi, and would introduce additional work on the NiFi
>>> cluster for any Metron upgrade. In some (enterprise) environments there
>>> might be separation of ownership between NiFi and Metron.
>>>
>>> I also prefer not to have a Java app calling a bash script which calls a
>>> new Java process, with logs or error output that might just get swallowed
>>> up invisibly. Somewhere down the line this could hold up effective
>>> troubleshooting.
>>>
>>>> Simon: I have actually written a stellar processor, which applies
>> stellar to all FlowFile attributes...
>>>
>>> Gulp.
>>>
>>>> Simon: what didn't you like about the flatfile loader script?
>>>
>>> The flatfile loader script has worked fine for me when prepping
>>> enrichment data in test systems, however it was a bit of a chore to get the
>>> JSON configuration files set up, especially for "wide" data sources that
>>> may have 15-20 fields, e.g. Active Directory.
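(For reference, a minimal CSV extractor config for the flatfile loader, covering just the three DNS fields discussed in this thread, might look like the sketch below. The field names follow the CSV example later in the thread; the exact option names should be checked against your Metron version's documentation.)

```json
{
  "config": {
    "columns": {
      "name": 0,
      "type": 1,
      "data": 2
    },
    "indicator_column": "name",
    "type": "dns",
    "separator": ","
  },
  "extractor": "CSV"
}
```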
>>>
>>> More broadly speaking, I want to embrace the streaming data paradigm and
>>> have tried to avoid batch jobs. With the DNS example, you might imagine a
>>> future where the enrichment data is streamed based on DHCP registrations,
>>> DNS update events, etc. In principle this could reduce the window of time
>>> where we might enrich a data source with out-of-date data.
>>>
>>> Charlie
>>>
>>> -----Original Message-----
>>> From: Carolyn Duby [mailto:cd...@hortonworks.com]
>>> Sent: 12 June 2018 20:33
>>> To: dev@metron.apache.org
>>> Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON
>>>
>>> I like the streaming enrichment solutions, but it depends on how you are
>>> getting the data in. If you get the data in a CSV file, just call the flat
>>> file loader from a script processor. No special NiFi required.
>>>
>>> If the enrichments don’t arrive in bulk, the streaming solution is better.
>>>
>>> Thanks
>>> Carolyn Duby
>>> Solutions Engineer, Northeast
>>> cd...@hortonworks.com
>>> +1.508.965.0584
>>>
>>> Join my team!
>>> Enterprise Account Manager – Boston - http://grnh.se/wepchv1
>>> Solutions Engineer – Boston - http://grnh.se/8gbxy41
>>> Need Answers? Try https://community.hortonworks.com <https://community.hortonworks.com/answers/index.html>
>>>
>>>
>>> On 6/12/18, 1:08 PM, "Simon Elliston Ball" <si...@simonellistonball.com> wrote:
>>>
>>>> Good solution. The streaming enrichment writer makes a lot of sense for
>>>> this, especially if you're not using huge enrichment sources that need
>>>> the batch based loaders.
>>>>
>>>> As it happens I have written most of a NiFi processor to handle this
>>>> use case directly - both non-record and Record based, especially for
>>>> Otto :).
>>>> The one thing we need to figure out now is where to host that, and how
>>>> to handle releases of a nifi-metron-bundle. I'll probably get round to
>>>> putting the code in my github at least in the next few days, while we
>>>> figure out a more permanent home.
>>>>
>>>> Charlie, out of curiosity, what didn't you like about the flatfile
>>>> loader script?
>>>>
>>>> Simon
>>>>
>>>> On 12 June 2018 at 18:00, Charles Joynt <charles.jo...@gresearch.co.uk> wrote:
>>>>
>>>>> Thanks for the responses. I appreciate the willingness to look at
>>>>> creating a NiFi processor. That would be great!
>>>>>
>>>>> Just to follow up on this (after a week looking after the "ops" side of
>>>>> dev-ops): I really don't want to have to use the flatfile loader
>>>>> script, and I'm not going to be able to write a Metron-style HBase
>>>>> key generator any time soon, but I have had some success with a
>>>>> different approach.
>>>>>
>>>>> 1. Generate data in CSV format, e.g. "server.domain.local","A","192.168.0.198"
>>>>> 2. Send this to an HTTP listener in NiFi
>>>>> 3. Write to a Kafka topic
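(The first of these steps can be sketched in plain Python as below. The field names mirror the CSV example in this thread; the listener URL in the comment is purely hypothetical.)

```python
import csv
import io

# One enrichment record destined for the NiFi HTTP listener; field names
# follow the DNS example in this thread.
record = {"name": "server.domain.local", "type": "A", "data": "192.168.0.198"}

# Format the record as a quoted CSV line, matching the example above.
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_ALL).writerow(
    [record["name"], record["type"], record["data"]]
)
line = buf.getvalue().strip()

# The line could then be POSTed to the listener, e.g. with requests
# (hypothetical host/port):
#   requests.post("http://nifi-host:9099/enrichment", data=line)
```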
>>>>>
>>>>> I then followed your instructions in this blog:
>>>>> https://cwiki.apache.org/confluence/display/METRON/2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+Streaming+Enrichment
>>>>>
>>>>> 4. Create a new "dns" sensor in Metron
>>>>> 5. Use the CSVParser and SimpleHbaseEnrichmentWriter, and parserConfig
>>>>> settings to push this into HBase:
>>>>>
>>>>> {
>>>>>   "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
>>>>>   "writerClassName": "org.apache.metron.enrichment.writer.SimpleHbaseEnrichmentWriter",
>>>>>   "sensorTopic": "dns",
>>>>>   "parserConfig": {
>>>>>     "shew.table": "dns",
>>>>>     "shew.cf": "dns",
>>>>>     "shew.keyColumns": "name",
>>>>>     "shew.enrichmentType": "dns",
>>>>>     "columns": {
>>>>>       "name": 0,
>>>>>       "type": 1,
>>>>>       "data": 2
>>>>>     }
>>>>>   }
>>>>> }
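(A quick sketch of how that "columns" mapping applies to a single CSV record; plain Python, just to illustrate the index-to-field assignment.)

```python
# Index-to-field mapping, as in the parserConfig's "columns" section
columns = {"name": 0, "type": 1, "data": 2}

# One incoming CSV line from the Kafka topic
line = "server.domain.local,A,192.168.0.198"
fields = line.split(",")

# Each configured column name picks out the field at its index;
# "shew.keyColumns": "name" means record["name"] becomes the row indicator.
record = {name: fields[idx] for name, idx in columns.items()}
```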
>>>>>
>>>>> And... it seems to be working. At least, I have data in HBase which
>>>>> looks more like the output of the flatfile loader.
>>>>>
>>>>> Charlie
>>>>>
>>>>> -----Original Message-----
>>>>> From: Casey Stella [mailto:ceste...@gmail.com]
>>>>> Sent: 05 June 2018 14:56
>>>>> To: dev@metron.apache.org
>>>>> Subject: Re: Writing enrichment data directly from NiFi with
>>>>> PutHBaseJSON
>>>>>
>>>>> The problem, as you correctly diagnosed, is the key in HBase. We
>>>>> construct the key very specifically in Metron, so it's unlikely to
>>>>> work out of the box with the NiFi processor unfortunately. The key
>>>>> that we use is formed here in the codebase:
>>>>> https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
>>>>>
>>>>> To put that in English, consider the following:
>>>>>
>>>>> - type - the enrichment type
>>>>> - indicator - the indicator to use
>>>>> - hash(*) - a Murmur3 128-bit hash function
>>>>>
>>>>> the key is hash(indicator) + type + indicator
>>>>>
>>>>> This hash prefixing is a standard practice in HBase key design that
>>>>> allows the keys to be uniformly distributed among the regions and
>>>>> prevents hotspotting. Depending on how the PutHBaseJSON processor
>>>>> works, if you can construct the key and pass it in, then you might be
>>>>> able to either construct the key in NiFi or write a processor to
>>>>> construct the key.
>>>>> Ultimately though, what Carolyn said is true... the easiest approach is
>>>>> probably using the flatfile loader.
>>>>> If you do get this working in NiFi, however, do please let us know
>>>>> and/or consider contributing it back to the project as a PR :)
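(To illustrate the layout of that key, not the exact bytes, here is a sketch in Python. Metron uses a 128-bit Murmur3 hash, which is not in the Python standard library, so MD5 stands in below purely because it also yields 16 bytes; the two-byte big-endian length prefixes follow Java's DataOutput.writeUTF, consistent with the flatfile loader's row shown earlier in the thread.)

```python
import hashlib  # stand-in only: Metron actually uses a 128-bit Murmur3 hash
import struct

def enrichment_key(enrichment_type: str, indicator: str) -> bytes:
    """Sketch of Metron's EnrichmentKey layout:
    hash(indicator) + length-prefixed type + length-prefixed indicator."""
    # Metron prefixes the row key with a 16-byte hash of the indicator so
    # rows spread evenly across regions; MD5 is used here only because it
    # also produces 16 bytes and is in the stdlib -- real keys use Murmur3.
    prefix = hashlib.md5(indicator.encode("utf-8")).digest()
    t = enrichment_type.encode("utf-8")
    i = indicator.encode("utf-8")
    # Two-byte big-endian length prefixes, as written by Java's writeUTF
    return (prefix
            + struct.pack(">H", len(t)) + t
            + struct.pack(">H", len(i)) + i)

key = enrichment_key("dns", "server.domain.local")
```

A real implementation would need the same Murmur3 function Metron uses (e.g. via a library binding) for the generated keys to match rows written by the flatfile loader.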
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt <charles.jo...@gresearch.co.uk> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I work as a Dev/Ops Data Engineer within the security team at a
>>>>>> company in London where we are in the process of implementing Metron.
>>>>>> I have been tasked with implementing feeds of network environment
>>>>>> data into HBase so that this data can be used as enrichment sources
>>>>>> for our security events.
>>>>>> First off, I wanted to pull in DNS data for an internal domain.
>>>>>>
>>>>>> I am assuming that I need to write data into HBase in such a way
>>>>>> that it exactly matches what I would get from the
>>>>>> flatfile_loader.sh script. A colleague of mine has already loaded
>>>>>> some DNS data using that script, so I am using that as a reference.
>>>>>>
>>>>>> I have implemented a flow in NiFi which takes JSON data from a HTTP
>>>>>> listener and routes it to a PutHBaseJSON processor. The flow is
>>>>>> working, in the sense that data is successfully written to HBase,
>>>>>> but despite (naively) specifying "Row Identifier Encoding Strategy
>>>>>> = Binary", the results in HBase don't look correct. Comparing the
>>>>>> output from HBase scan commands I
>>>>>> see:
>>>>>>
>>>>>> flatfile_loader.sh produced:
>>>>>>
>>>>>> ROW: \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\x05whois\x00\x0E192.168.0.198
>>>>>> CELL: column=data:v, timestamp=1516896203840, value={"clientname":"server.domain.local","clientip":"192.168.0.198"}
>>>>>>
>>>>>> PutHBaseJSON produced:
>>>>>>
>>>>>> ROW: server.domain.local
>>>>>> CELL: column=dns:v, timestamp=1527778603783, value={"name":"server.domain.local","type":"A","data":"192.168.0.198"}
>>>>>>
>>>>>> From source JSON:
>>>>>>
>>>>>> {"k":"server.domain.local","v":{"name":"server.domain.local","type":"A","data":"192.168.0.198"}}
>>>>>>
>>>>>> I know that there are some differences in column family / field
>>>>>> names, but my worry is the ROW id. Presumably I need to encode my
>>>>>> row key, "k" in the JSON data, in a way that matches how the
>>>>>> flatfile_loader.sh script did it.
>>>>>>
>>>>>> Can anyone explain how I might convert my ID to the correct format?
>>>>>> -or-
>>>>>> Does this matter? Can Metron use the human-readable ROW ids?
>>>>>>
>>>>>> Charlie Joynt
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> simon elliston ball
>>>> @sireb
>>
