Not convinced we should be writing JIRAs against the Metron project, or the
NiFi project, if we don't know where this is actually going to end up, to
be honest. In any case, working code:
https://github.com/simonellistonball/metron/tree/nifi/nifi-metron-bundle
which currently lives in a Metron fork, for no particular reason. It still
needs proper tests, docs and all that jazz, but at PoC grade it works,
scales, and is moderately robust as long as HBase doesn't fall over too
much.

Simon

On 13 June 2018 at 15:24, Otto Fowler <ottobackwa...@gmail.com> wrote:

> Do we even have a JIRA?  If not, maybe Carolyn et al. can write one up
> that lays out some requirements and context.
>
>
> On June 13, 2018 at 10:04:27, Casey Stella (ceste...@gmail.com) wrote:
>
> no, sadly we do not.
>
> On Wed, Jun 13, 2018 at 10:01 AM Carolyn Duby <cd...@hortonworks.com>
> wrote:
>
> > Agreed… streaming enrichment is the right solution for DNS data.
> >
> > Do we have a web service for writing enrichments?
> >
> > Carolyn Duby
> > Solutions Engineer, Northeast
> > cd...@hortonworks.com
> > +1.508.965.0584
> >
> > On 6/13/18, 6:25 AM, "Charles Joynt" <charles.jo...@gresearch.co.uk> wrote:
> >
> > >Regarding why I didn't choose to load data with the flatfile loader
> > >script...
> > >
> > >I want to be able to SEND enrichment data to Metron rather than have to
> > >set up cron jobs to PULL data. At the moment I'm trying to prove that
> > >the process works with a simple data source. In the future we will want
> > >enrichment data in Metron that comes from systems (e.g. HR databases)
> > >that I won't have access to, so we will need someone to be able to send
> > >us the data.
> > >
> > >> Carolyn: just call the flat file loader from a script processor...
> > >
> > >I didn't believe that would work in my environment. I'm pretty sure the
> > >script has dependencies on various Metron JARs, not least for the row id
> > >hashing algorithm. I suppose this would require at least a partial
> > >install of Metron alongside NiFi, and would introduce additional work on
> > >the NiFi cluster for any Metron upgrade. In some (enterprise)
> > >environments there might be separation of ownership between NiFi and
> > >Metron.
> > >
> > >I also prefer not to have a Java app calling a bash script which calls a
> > >new Java process, with logs or error output that might just get
> > >swallowed up invisibly. Somewhere down the line this could hold up
> > >effective troubleshooting.
> > >
> > >> Simon: I have actually written a Stellar processor, which applies
> > >> Stellar to all FlowFile attributes...
> > >
> > >Gulp.
> > >
> > >> Simon: what didn't you like about the flatfile loader script?
> > >
> > >The flatfile loader script has worked fine for me when prepping
> > >enrichment data in test systems, but it was a bit of a chore to get the
> > >JSON configuration files set up, especially for "wide" data sources
> > >that may have 15-20 fields, e.g. Active Directory.
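> > >
> > >For reference, the sort of extractor config I mean looks something like
> > >this - illustrative only, with field names from my DNS example; the
> > >exact options are documented alongside the flatfile loader:
> > >
> > >{
> > >  "config": {
> > >    "columns": { "name": 0, "type": 1, "data": 2 },
> > >    "indicator_column": "name",
> > >    "type": "dns",
> > >    "separator": ","
> > >  },
> > >  "extractor": "CSV"
> > >}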
> > >
> > >More broadly speaking, I want to embrace the streaming data paradigm
> > >and avoid batch jobs. With the DNS example, you might imagine a future
> > >where the enrichment data is streamed based on DHCP registrations, DNS
> > >update events, etc. In principle this could reduce the window of time
> > >during which we might enrich a data source with out-of-date data.
> > >
> > >Charlie
> > >
> > >-----Original Message-----
> > >From: Carolyn Duby [mailto:cd...@hortonworks.com]
> > >Sent: 12 June 2018 20:33
> > >To: dev@metron.apache.org
> > >Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON
> > >
> > >I like the streaming enrichment solution, but it depends on how you are
> > >getting the data in. If you get the data in a CSV file, just call the
> > >flat file loader from a script processor. No special NiFi required.
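> > >
> > >For example, something along these lines - paths, table name and column
> > >family here are illustrative, not prescriptive:
> > >
> > >$METRON_HOME/bin/flatfile_loader.sh -i /tmp/dns.csv -t enrichment \
> > >  -c t -e /tmp/extractor_config.json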
> > >
> > >If the enrichments don't arrive in bulk, the streaming solution is
> > >better.
> > >
> > >Thanks
> > >Carolyn Duby
> > >Solutions Engineer, Northeast
> > >cd...@hortonworks.com
> > >+1.508.965.0584
> > >
> > >
> > >On 6/12/18, 1:08 PM, "Simon Elliston Ball" <si...@simonellistonball.com> wrote:
> > >
> > >>Good solution. The streaming enrichment writer makes a lot of sense
> > >>for this, especially if you're not using huge enrichment sources that
> > >>need the batch-based loaders.
> > >>
> > >>As it happens, I have written most of a NiFi processor to handle this
> > >>use case directly - both non-record and record-based, especially for
> > >>Otto :). The one thing we need to figure out now is where to host
> > >>that, and how to handle releases of a nifi-metron-bundle. I'll
> > >>probably get round to putting the code on my GitHub at least in the
> > >>next few days, while we figure out a more permanent home.
> > >>
> > >>Charlie, out of curiosity, what didn't you like about the flatfile
> > >>loader script?
> > >>
> > >>Simon
> > >>
> > >>On 12 June 2018 at 18:00, Charles Joynt <charles.jo...@gresearch.co.uk> wrote:
> > >>
> > >>> Thanks for the responses. I appreciate the willingness to look at
> > >>> creating a NiFi processor. That would be great!
> > >>>
> > >>> Just to follow up on this (after a week looking after the "ops" side
> > >>> of dev-ops): I really don't want to have to use the flatfile loader
> > >>> script, and I'm not going to be able to write a Metron-style HBase
> > >>> key generator any time soon, but I have had some success with a
> > >>> different approach.
> > >>>
> > >>> 1. Generate data in CSV format, e.g.
> > >>>    "server.domain.local","A","192.168.0.198"
> > >>> 2. Send this to an HTTP listener in NiFi
> > >>> 3. Write to a Kafka topic
> > >>>
> > >>> I then followed your instructions in this blog:
> > >>> https://cwiki.apache.org/confluence/display/METRON/2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+Streaming+Enrichment
> > >>>
> > >>> 4. Create a new "dns" sensor in Metron
> > >>> 5. Use the CSVParser and SimpleHbaseEnrichmentWriter, with
> > >>>    parserConfig settings to push this into HBase:
> > >>>
> > >>> {
> > >>>   "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
> > >>>   "writerClassName": "org.apache.metron.enrichment.writer.SimpleHbaseEnrichmentWriter",
> > >>>   "sensorTopic": "dns",
> > >>>   "parserConfig": {
> > >>>     "shew.table": "dns",
> > >>>     "shew.cf": "dns",
> > >>>     "shew.keyColumns": "name",
> > >>>     "shew.enrichmentType": "dns",
> > >>>     "columns": {
> > >>>       "name": 0,
> > >>>       "type": 1,
> > >>>       "data": 2
> > >>>     }
> > >>>   }
> > >>> }
> > >>>
> > >>> And... it seems to be working. At least, I have data in HBase which
> > >>> looks more like the output of the flatfile loader.
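> > >>>
> > >>> For anyone trying the same thing, a quick end-to-end sanity check
> > >>> looks something like this (broker host and file names are
> > >>> illustrative):
> > >>>
> > >>> kafka-console-producer.sh --broker-list kafka1:6667 --topic dns < dns.csv
> > >>> echo "scan 'dns'" | hbase shell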
> > >>>
> > >>> Charlie
> > >>>
> > >>> -----Original Message-----
> > >>> From: Casey Stella [mailto:ceste...@gmail.com]
> > >>> Sent: 05 June 2018 14:56
> > >>> To: dev@metron.apache.org
> > >>> Subject: Re: Writing enrichment data directly from NiFi with
> > >>> PutHBaseJSON
> > >>>
> > >>> The problem, as you correctly diagnosed, is the key in HBase. We
> > >>> construct the key very specifically in Metron, so it's unlikely to
> > >>> work out of the box with the NiFi processor unfortunately. The key
> > >>> that we use is formed here in the codebase:
> > >>> https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
> > >>>
> > >>> To put that in English, consider the following:
> > >>>
> > >>> - type - the enrichment type
> > >>> - indicator - the indicator to use
> > >>> - hash(*) - a Murmur3 128-bit hash function
> > >>>
> > >>> The key is hash(indicator) + type + indicator.
> > >>>
> > >>> This hash prefixing is a standard practice in HBase key design that
> > >>> allows the keys to be uniformly distributed among the regions and
> > >>> prevents hotspotting. Depending on how the PutHBaseJSON processor
> > >>> works, if you can construct the key yourself and pass it in, you
> > >>> might be able to build it either directly in NiFi or in a custom
> > >>> processor.
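> > >>>
> > >>> As a rough sketch - not the canonical serialization, so check
> > >>> EnrichmentKey.java above for the exact byte layout - the
> > >>> construction amounts to something like this, using Guava's Murmur3:
> > >>>
> > >>> import com.google.common.hash.Hashing;
> > >>> import java.nio.charset.StandardCharsets;
> > >>>
> > >>> // Sketch of hash(indicator) + type + indicator.
> > >>> public static byte[] enrichmentRowKey(String type, String indicator) {
> > >>>   byte[] hash = Hashing.murmur3_128()
> > >>>       .hashBytes(indicator.getBytes(StandardCharsets.UTF_8))
> > >>>       .asBytes();  // 16-byte prefix spreads keys across regions
> > >>>   byte[] t = type.getBytes(StandardCharsets.UTF_8);
> > >>>   byte[] i = indicator.getBytes(StandardCharsets.UTF_8);
> > >>>   byte[] key = new byte[hash.length + t.length + i.length];
> > >>>   System.arraycopy(hash, 0, key, 0, hash.length);
> > >>>   System.arraycopy(t, 0, key, hash.length, t.length);
> > >>>   System.arraycopy(i, 0, key, hash.length + t.length, i.length);
> > >>>   return key;
> > >>> }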
> > >>> Ultimately, though, what Carolyn said is true... the easiest
> > >>> approach is probably using the flatfile loader.
> > >>> If you do get this working in NiFi, however, do please let us know
> > >>> and/or consider contributing it back to the project as a PR :)
> > >>>
> > >>>
> > >>>
> > >>> On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt <
> > >>> charles.jo...@gresearch.co.uk>
> > >>> wrote:
> > >>>
> > >>> > Hello,
> > >>> >
> > >>> > I work as a Dev/Ops Data Engineer within the security team at a
> > >>> > company in London, where we are in the process of implementing
> > >>> > Metron. I have been tasked with implementing feeds of network
> > >>> > environment data into HBase so that this data can be used as
> > >>> > enrichment sources for our security events. First off, I wanted to
> > >>> > pull in DNS data for an internal domain.
> > >>> >
> > >>> > I am assuming that I need to write data into HBase in such a way
> > >>> > that it exactly matches what I would get from the
> > >>> > flatfile_loader.sh script. A colleague of mine has already loaded
> > >>> > some DNS data using that script, so I am using that as a reference.
> > >>> >
> > >>> > I have implemented a flow in NiFi which takes JSON data from an
> > >>> > HTTP listener and routes it to a PutHBaseJSON processor. The flow
> > >>> > is working, in the sense that data is successfully written to
> > >>> > HBase, but despite (naively) specifying "Row Identifier Encoding
> > >>> > Strategy = Binary", the results in HBase don't look correct.
> > >>> > Comparing the output from HBase scan commands I see:
> > >>> >
> > >>> > flatfile_loader.sh produced:
> > >>> >
> > >>> > ROW: \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\x05whois\x00\x0E192.168.0.198
> > >>> > CELL: column=data:v, timestamp=1516896203840,
> > >>> > value={"clientname":"server.domain.local","clientip":"192.168.0.198"}
> > >>> >
> > >>> > PutHBaseJSON produced:
> > >>> >
> > >>> > ROW: server.domain.local
> > >>> > CELL: column=dns:v, timestamp=1527778603783,
> > >>> > value={"name":"server.domain.local","type":"A","data":"192.168.0.198"}
> > >>> >
> > >>> > From source JSON:
> > >>> >
> > >>> > {"k":"server.domain.local","v":{"name":"server.domain.local","type":"A","data":"192.168.0.198"}}
> > >>> >
> > >>> > I know that there are some differences in column family / field
> > >>> > names, but my worry is the ROW id. Presumably I need to encode my
> > >>> > row key, "k" in the JSON data, in a way that matches how the
> > >>> > flatfile_loader.sh script did it.
> > >>> >
> > >>> > Can anyone explain how I might convert my id to the correct format?
> > >>> > Or does it even matter - can Metron use the human-readable ROW ids?
> > >>> >
> > >>> > Charlie Joynt
> > >>> >
> > >>> >
> > >>>
> > >>
> > >>
> > >>
> >
>



--
simon elliston ball
@sireb
