Hi All,

Has there been any additional discussion or development regarding this? I took a
brief look around the JIRA and didn't see anything, but I may have missed it.

Thanks,
Jon

On Fri, Apr 15, 2016 at 2:01 PM Nick Allen <n...@nickallen.org> wrote:

> I definitely agree that you need this level of understanding of your
> cluster. It could certainly work the way that you describe.
>
> I was thinking of it slightly differently, though. The metrics for this
> purpose (understanding the performance of an existing cluster) should
> come from the actual sensors themselves. For example, I need to
> instrument the packet capture process so that it kicks out
> time-series-ish metrics that you can monitor in a dashboard over time.
>
> On Fri, Apr 15, 2016 at 1:40 PM, zeo...@gmail.com <zeo...@gmail.com>
> wrote:
>
> > However, it would be handy to have something like this perpetually
> > running so you know when to scale a cluster up/out/down/in.
> >
> > On Fri, Apr 15, 2016, 13:35 Nick Allen <n...@nickallen.org> wrote:
> >
> > > I think it is slightly different. I don't even want to install
> > > minimal Kafka infrastructure. (Look ma, no Kafka!)
> > >
> > > The exact implementation would differ based on the data inputs that
> > > you are trying to measure, but for example...
> > >
> > > - To understand raw packet rates, I would have a specialized sensor
> > >   that counts packets and bytes on the wire. It doesn't do anything
> > >   more than that.
> > > - To understand Netflow rates, it would watch for Netflow packets
> > >   and count those.
> > > - To understand sizing around application logs, a sensor would
> > >   watch for Syslog packets and count those.
> > >
> > > The implementation would be more similar to raw packet capture with
> > > some DPI. No Hadoop-y components required.
> > >
> > > On Fri, Apr 15, 2016 at 1:10 PM, James Sirota
> > > <jsir...@hortonworks.com> wrote:
> > >
> > > > So this is exactly what I am proposing: calculate the metrics on
> > > > the fly without landing any data in the cluster. The problem is
> > > > that enterprise data volumes are so large that you can't just
> > > > point them at a Java or C++ program or sensor. You either need an
> > > > existing minimal Kafka infrastructure to take that load, or you
> > > > have to sample the data.
> > > >
> > > > Thanks,
> > > > James
> > > >
> > > > On 4/15/16, 9:54 AM, "Nick Allen" <n...@nickallen.org> wrote:
> > > >
> > > > > Or we have the assessment tool not actually land any data. The
> > > > > assessment tool becomes a 'sensor' in its own right. You just
> > > > > point the input data sets at the assessment tool, it builds
> > > > > metrics on the input (for example: count the number of packets
> > > > > per second), and then we use those metrics to estimate cluster
> > > > > size.
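A minimal sketch of that kind of counting "sensor" for the Syslog case,
using only the JDK and no Kafka, Storm, or Hadoop components. The class
name, default port, and one-second reporting window are illustrative
assumptions, not a committed design:

    import java.net.DatagramPacket;
    import java.net.DatagramSocket;

    public class SyslogRateSensor {
        public static void main(String[] args) throws Exception {
            // Default to 5140; the standard syslog port 514 requires root.
            int port = args.length > 0 ? Integer.parseInt(args[0]) : 5140;
            byte[] buf = new byte[64 * 1024];
            long count = 0, bytes = 0;
            long windowStart = System.currentTimeMillis();

            try (DatagramSocket socket = new DatagramSocket(port)) {
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                while (true) {
                    socket.receive(packet);
                    count++;
                    bytes += packet.getLength();

                    // Emit a rate once per second, then reset the window.
                    long now = System.currentTimeMillis();
                    if (now - windowStart >= 1000) {
                        System.out.printf("events/sec=%d bytes/sec=%d%n",
                                count, bytes);
                        count = 0;
                        bytes = 0;
                        windowStart = now;
                    }
                }
            }
        }
    }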
> > > > > On Wed, Apr 13, 2016 at 5:45 PM, James Sirota
> > > > > <jsir...@hortonworks.com> wrote:
> > > > >
> > > > > > That's an excellent point. So I think there are three ways
> > > > > > forward.
> > > > > >
> > > > > > One is we can assume that there has to be at least a minimal
> > > > > > infrastructure in place (at least a subset of Kafka and Storm
> > > > > > resources) to run a full-scale assessment. If you point
> > > > > > something that blasts millions of messages per second at
> > > > > > something like ActiveMQ, you are going to blow it up. So the
> > > > > > infrastructure to receive these kinds of message volumes has
> > > > > > to exist as a prerequisite. There is no way to get around that.
> > > > > >
> > > > > > The second approach I see is sampling. Sampling is a lot less
> > > > > > precise, and you can miss peaks that fall outside of your
> > > > > > sampling windows. But the obvious benefit is that you don't
> > > > > > need a cluster to process these streams. You can probably
> > > > > > perform most of your calculations with a multithreaded Java
> > > > > > program. Sampling poses a few design challenges. First, where
> > > > > > do you sample? Do you sample on the sensor? (The implication
> > > > > > here is that we have to program some sort of sampling
> > > > > > capability into our sensors.) Do you sample on transport?
> > > > > > (Maybe a Flume interceptor or NiFi processor.) There is also
> > > > > > the question of what the sampling rate should be. Without
> > > > > > knowing the statistical properties of a stream ahead of time,
> > > > > > it's hard to make that call.
> > > > > >
> > > > > > The third option, I think, is an MR job. We can blast the data
> > > > > > into HDFS and then go over it with MR to derive the metrics we
> > > > > > are looking for. Then we don't have to sample or set up
> > > > > > expensive infrastructure to receive a deluge of data. But then
> > > > > > we run into the chicken-and-egg problem that in order to size
> > > > > > your HDFS you need to have data in HDFS. Ideally you need to
> > > > > > capture at least one full week's worth of logs, because
> > > > > > patterns throughout the day, as well as each day of the week,
> > > > > > have different statistical properties. So you need off-peak,
> > > > > > on-peak, weekday, and weekend data to derive these stats in
> > > > > > batch.
> > > > > >
> > > > > > Any other design ideas?
> > > > > >
> > > > > > Thanks,
> > > > > > James
> > > > > >
> > > > > > On 4/13/16, 1:59 PM, "Nick Allen" <n...@nickallen.org> wrote:
> > > > > >
> > > > > > > If the tool starts at Kafka, the user would have to already
> > > > > > > have committed to the investment in the infrastructure and
> > > > > > > the time to set up the sensors that feed Kafka, and Kafka
> > > > > > > itself. Maybe it would need to be further upstream?
> > > > > > >
> > > > > > > On Apr 13, 2016 1:05 PM, "James Sirota"
> > > > > > > <jsir...@hortonworks.com> wrote:
> > > > > > >
> > > > > > > > Hi George,
> > > > > > > >
> > > > > > > > This article describes micro-tuning of an existing
> > > > > > > > cluster. What I am proposing is a level up from that. When
> > > > > > > > you start with Metron, how do you even know how many nodes
> > > > > > > > you need? And of those nodes, how many do you allocate to
> > > > > > > > Storm, indexing, and storage? How much storage do you
> > > > > > > > need? Tuning would be the next step in the process, but
> > > > > > > > this tool would answer more fundamental questions about
> > > > > > > > what a Metron deployment should look like given the number
> > > > > > > > of telemetries and the retention policies of the
> > > > > > > > enterprise.
> > > > > > > >
> > > > > > > > The best way to get this data (in my opinion) is to have
> > > > > > > > some tool that we can plug into Metron's point of ingest
> > > > > > > > (Kafka topics) and run for about a week or a month, to
> > > > > > > > figure that out and spit out the relevant metrics. Based
> > > > > > > > on these metrics we can figure out the fundamental things
> > > > > > > > about what Metron should look like. Tuning would be the
> > > > > > > > next step.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > James
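To make the "plug into the ingest topics" idea and the sampling option
concrete, here is a rough sketch of such a profiler, assuming the
standard Kafka 0.9+ Java consumer. The broker address, topic names,
group id, and 1-in-100 sampling rate are all illustrative. Note the
caveat: sampling in the consumer only cuts processing cost, since every
message is still pulled off the wire, which is consistent with James's
point that some minimal Kafka infrastructure has to exist either way.

    import java.util.Arrays;
    import java.util.Properties;
    import java.util.concurrent.ThreadLocalRandom;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class IngestProfiler {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");  // assumed broker list
            props.put("group.id", "metron-assessment");      // hypothetical group id
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.ByteArrayDeserializer");

            final int sampleOneIn = 100;  // the right rate is an open design question
            long sampled = 0, sampledBytes = 0;
            long windowStart = System.currentTimeMillis();

            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Arrays.asList("bro", "yaf", "snort"));  // example topics
                while (true) {
                    ConsumerRecords<byte[], byte[]> records = consumer.poll(1000);
                    for (ConsumerRecord<byte[], byte[]> record : records) {
                        // Keep roughly 1 in every sampleOneIn messages.
                        if (ThreadLocalRandom.current().nextInt(sampleOneIn) != 0) {
                            continue;
                        }
                        sampled++;
                        sampledBytes += record.value().length;
                    }
                    long now = System.currentTimeMillis();
                    if (now - windowStart >= 60_000) {
                        // Extrapolate from the sample back to the full stream.
                        double secs = (now - windowStart) / 1000.0;
                        System.out.printf("est events/sec=%.0f est bytes/sec=%.0f%n",
                                sampled * sampleOneIn / secs,
                                sampledBytes * (double) sampleOneIn / secs);
                        sampled = 0;
                        sampledBytes = 0;
                        windowStart = now;
                    }
                }
            }
        }
    }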
> > > > > > > >
> > > > > > > > On 4/13/16, 9:52 AM, "George Vetticaden"
> > > > > > > > <gvettica...@hortonworks.com> wrote:
> > > > > > > >
> > > > > > > > > I have used the following Kafka and Storm best practices
> > > > > > > > > guide at numerous customer implementations:
> > > > > > > > >
> > > > > > > > > https://community.hortonworks.com/articles/550/unofficial-storm-and-kafka-best-practices-guide.html
> > > > > > > > >
> > > > > > > > > We need to have something similar and prescriptive for
> > > > > > > > > Metron based on:
> > > > > > > > > 1. What data sources we are enabling
> > > > > > > > > 2. What enrichment services we are enabling
> > > > > > > > > 3. What threat intel services we are enabling
> > > > > > > > > 4. What we are indexing into Solr/Elastic, and for how
> > > > > > > > >    long
> > > > > > > > > 5. What we are persisting into HDFS
> > > > > > > > >
> > > > > > > > > Ideally, the Metron assessment tool, combined with an
> > > > > > > > > introspection of the user's Ansible configuration, should
> > > > > > > > > drive which Ambari blueprint type and configuration are
> > > > > > > > > used when the cluster is spun up and the Storm topology
> > > > > > > > > is deployed.
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > George Vetticaden
> > > > > > > > > Principal, COE
> > > > > > > > > gvettica...@hortonworks.com
> > > > > > > > > (630) 909-9138
> > > > > > > > >
> > > > > > > > > On 4/13/16, 11:40 AM, "George Vetticaden"
> > > > > > > > > <gvettica...@hortonworks.com> wrote:
> > > > > > > > >
> > > > > > > > > > +1 to James's suggestion.
> > > > > > > > > >
> > > > > > > > > > For proper cluster sizing we need to consider not just
> > > > > > > > > > data volume and storage requirements but processing
> > > > > > > > > > requirements as well. Given that in the new
> > > > > > > > > > architecture we have moved to a single enrichment
> > > > > > > > > > topology that will support all data sources, proper
> > > > > > > > > > sizing of the enrichment topology will be even more
> > > > > > > > > > crucial to maintaining SLAs and HA requirements. The
> > > > > > > > > > following key questions will apply to each parser
> > > > > > > > > > topology and to the single enrichment topology:
> > > > > > > > > >
> > > > > > > > > > 1. Number of workers?
> > > > > > > > > > 2. Number of workers per machine?
> > > > > > > > > > 3. Size of each worker (in memory)?
> > > > > > > > > > 4. Supervisor memory settings?
> > > > > > > > > >
> > > > > > > > > > The assessment tool should also be used to size
> > > > > > > > > > topologies correctly.
> > > > > > > > > >
> > > > > > > > > > Tuning Kafka, HBase, and Solr/Elastic should also be
> > > > > > > > > > governed by the Metron assessment tool.
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > George Vetticaden
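To ground those four questions: 1 and 3 are topology-side knobs, while
2 and 4 are cluster-side settings in storm.yaml (supervisor.slots.ports
bounds workers per machine; supervisor.childopts sets supervisor
memory). A sketch of the topology side, assuming the 0.10-era
backtype.storm API (org.apache.storm in later Storm releases); the
values are purely illustrative, not recommendations:

    import backtype.storm.Config;

    public class TopologySizing {

        // Builds topology-side sizing settings for a parser topology or
        // the enrichment topology. A sketch only.
        public static Config sizedConfig() {
            Config conf = new Config();
            conf.setNumWorkers(4);                        // Q1: number of workers
            conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS,
                     "-Xmx2048m");                        // Q3: per-worker heap
            conf.setMaxSpoutPending(5000);  // cap in-flight tuples to protect SLAs
            conf.setNumAckers(4);
            return conf;
        }
    }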
> > > > > > > > > >
> > > > > > > > > > On 4/13/16, 11:28 AM, "James Sirota"
> > > > > > > > > > <jsir...@hortonworks.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Prior to adopting Metron, each adopting entity needs
> > > > > > > > > > > to guesstimate its data volume and data storage
> > > > > > > > > > > requirements so it can size its cluster properly. I
> > > > > > > > > > > propose the creation of an assessment tool that can
> > > > > > > > > > > plug into a Kafka topic for a given telemetry and,
> > > > > > > > > > > over time, produce statistics on ingest volumes and
> > > > > > > > > > > storage requirements. The idea is that prior to
> > > > > > > > > > > adopting Metron someone can set up all the feeds and
> > > > > > > > > > > Kafka topics, but instead of deploying Metron right
> > > > > > > > > > > away they would deploy this tool. This tool would
> > > > > > > > > > > then produce statistics on ingest/storage
> > > > > > > > > > > requirements and all the relevant information needed
> > > > > > > > > > > for cluster sizing.
> > > > > > > > > > >
> > > > > > > > > > > Some of the metrics that can be recorded are:
> > > > > > > > > > >
> > > > > > > > > > > * Number of system events per second (average, max,
> > > > > > > > > > >   mean, standard dev)
> > > > > > > > > > > * Message size (average, max, mean, standard dev)
> > > > > > > > > > > * Average number of peaks
> > > > > > > > > > > * Duration of peaks (average, max, mean, standard
> > > > > > > > > > >   dev)
> > > > > > > > > > >
> > > > > > > > > > > If the parser for a telemetry exists, the tool can
> > > > > > > > > > > produce additional statistics:
> > > > > > > > > > >
> > > > > > > > > > > * Number of keys/fields parsed (average, max, mean,
> > > > > > > > > > >   standard dev)
> > > > > > > > > > > * Length of field parsed (average, max, mean,
> > > > > > > > > > >   standard dev)
> > > > > > > > > > > * Length of key parsed (average, max, mean, standard
> > > > > > > > > > >   dev)
> > > > > > > > > > >
> > > > > > > > > > > The tool can run for a week or a month and produce
> > > > > > > > > > > these kinds of statistics. Once the statistics are
> > > > > > > > > > > available, we can come up with guidance
> > > > > > > > > > > documentation for a recommended cluster setup.
> > > > > > > > > > > Otherwise it's hard to properly size a cluster and
> > > > > > > > > > > set up streaming parallelism without knowing these
> > > > > > > > > > > metrics.
> > > > > > > > > > >
> > > > > > > > > > > Thoughts/ideas?
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > James
> > > > >
> > > > > --
> > > > > Nick Allen <n...@nickallen.org>
> > >
> > > --
> > > Nick Allen <n...@nickallen.org>
> >
> > --
> > Jon
>
> --
> Nick Allen <n...@nickallen.org>

--
Jon
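As a sketch of how the proposed tool could keep those statistics
without landing any data, here is a streaming accumulator based on
Welford's online algorithm; one instance per metric (message size,
events per second, peak duration). The class and method names are
hypothetical:

    public class StreamingStats {
        private long n = 0;
        private double mean = 0.0, m2 = 0.0;
        private double max = Double.NEGATIVE_INFINITY;

        // Fold in one observation, e.g. a message size in bytes or an
        // events/sec reading, without buffering past values.
        public void add(double x) {
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);
            if (x > max) max = x;
        }

        public double mean()   { return mean; }
        public double max()    { return max; }
        public double stddev() { return n > 1 ? Math.sqrt(m2 / (n - 1)) : 0.0; }

        public static void main(String[] args) {
            StreamingStats messageSize = new StreamingStats();
            for (double size : new double[] {512, 1024, 2048, 640}) {
                messageSize.add(size);
            }
            System.out.printf("avg=%.1f max=%.1f stddev=%.1f%n",
                    messageSize.mean(), messageSize.max(), messageSize.stddev());
        }
    }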