Edward,

I completely agree with what you described above; here I am just
thinking that we should also consider the general-purpose monitoring
engine in different stages and facing different audiences.

As to audiences, we would have:

1. Traditional Users: most traditional monitoring systems assume that
users pre-transform their data into an acceptable format, such as
time-series metrics, before sending it to the alerting system. These
users may only be interested in Eagle's scalability and real-time
alerting, and this is also the fastest way to adopt Eagle as a
general-purpose monitoring engine. Migrating from their original
system, these users should only need to learn a single new term,
"Stream":
 (1) Collection: collect their data/logs, transform them into a certain
format, and send them to Kafka (a configured topic) in the traditional
way (see the producer sketch after this list).
 (2) Stream Definition: define the stream schema in Eagle for the
incoming Kafka stream.
 (3) Stream Policy: after defining the stream schema, they can
immediately use the UI to edit policies and get alerts.
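
For step (1), a minimal sketch of such a collector, just to illustrate
the idea: it assumes a locally running Kafka broker, and the topic name
("eagle_metric_stream") and event fields are hypothetical examples, not
existing Eagle conventions.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class MetricCollector {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // One JSON event per message; the field names must match the
        // stream schema later defined in Eagle (example fields only).
        String event = "{\"host\":\"localhost\",\"metric\":\"cpu.usage\""
                + ",\"timestamp\":" + System.currentTimeMillis()
                + ",\"value\":0.93}";

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Key by host so events from one host land on the same partition.
            producer.send(new ProducerRecord<>("eagle_metric_stream", "localhost", event));
        }
    }
}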

2. Advanced Streaming App Developers: we are trying to provide a highly
abstracted alerting framework on top of an independent streaming
execution environment. One of the current pain points is that stream
definition is a little complex:
https://github.com/eBay/Eagle/blob/master/eagle-assembly/src/main/bin/eagle-topology-init.sh#L74-L91.
To develop a new app, developers have to:

(1) Import the stream schema in a very inconvenient way, through a
script, the REST API, and hand-written JSON content.
(2) Only after importing the stream can the developer start writing
code for the actual business logic, in a different place, which adds
mental overhead.

A possibly better way: the developer could define the stream, transform
the stream, and alert on the stream with policies all in a single place.
In our current implementation we already fix the stream schema in code
for certain data sources like HDFS/Hive, and it will never change unless
we change the logic. The benefit of also keeping the stream schema in
the database is that the UI can make friendly use of it.

As to the problem of "how to ensure a single source of truth": no matter
whether the stream schema is defined in the database, in a local XML
file, or inline in code, the framework could sync it to the database as
the single source of truth, so developers don't need to care.

Meanwhile, I think we could create additional value for the community by
separating the streaming framework out for general stream-processing
purposes, supporting environment-independent fluent API/Siddhi/alert
evaluation without depending on the Eagle storage layer.
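
To make this concrete, here is a minimal sketch of defining a stream and
evaluating an alert policy with Siddhi alone, in a single place and with
no Eagle storage layer involved. It assumes the Siddhi 3.x core API on
the classpath; the stream name and filter condition are only
illustrative, not an existing Eagle policy.

import org.wso2.siddhi.core.ExecutionPlanRuntime;
import org.wso2.siddhi.core.SiddhiManager;
import org.wso2.siddhi.core.event.Event;
import org.wso2.siddhi.core.stream.input.InputHandler;
import org.wso2.siddhi.core.stream.output.StreamCallback;

public class StandaloneAlertDemo {
    public static void main(String[] args) throws InterruptedException {
        // Stream schema and policy defined together, inline in code.
        String plan =
            "define stream hdfsAuditLogEventStream (user string, cmd string, src string); " +
            "from hdfsAuditLogEventStream[cmd == 'delete'] " +
            "select user, src insert into alertStream;";

        SiddhiManager manager = new SiddhiManager();
        ExecutionPlanRuntime runtime = manager.createExecutionPlanRuntime(plan);

        // Alerts go to a callback instead of the Eagle storage layer.
        runtime.addCallback("alertStream", new StreamCallback() {
            @Override
            public void receive(Event[] events) {
                for (Event e : events) {
                    System.out.println("ALERT: " + e);
                }
            }
        });

        runtime.start();
        InputHandler input = runtime.getInputHandler("hdfsAuditLogEventStream");
        input.send(new Object[]{"hadoop", "delete", "/tmp/private"}); // triggers the policy
        Thread.sleep(100); // give the asynchronous callback time to fire
        runtime.shutdown();
        manager.shutdown();
    }
}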

3. New Eagle Users: I completely agree with what you described above
about the great, unique value Eagle could deliver in the future. Users
could send raw data of any type/format into Eagle, define any kind of
data source, do transformations in the UI, define alerts/policies in the
UI, and Eagle would manage all topology-related work in the backend.
This is very good. To summarize, we could build a streaming DAG building
and management engine, extended from the simple core use cases above.

What do you think?

--

Hao


On Wed, Nov 11, 2015 at 2:42 PM, Zhang, Edward (GDI Hadoop) <
yonzh...@ebay.com> wrote:

> I figured out a graceful way of onboarding a new data source (Kafka) which
> enables a general-purpose monitoring framework; please review and give
> feedback.
>
> (If we do a good design for this, then we can easily onboard a large number
> of Hadoop JMX metrics :-) )
>
> 1. In UI, Role Modeler describes kafka settings (topic, text message to
> object mapping, partition key, etc.)
> 2. Eagle stores the result of step 1 into storage as one datasource which
> contains everything including mapping logic (regular expression)
>     - may provide Groovy for hot-deploying complex transformations
>     - also provide the parser class's fully qualified name, but put the jar
> file into the Eagle installation directory
> 3. Eagle UI provides a button for the user to kick off the topology
> 4. Eagle backend provides a scheduler which polls topology creation events
> and tries to kick off the topology
>
> Actually Eagle should manage lifecycle of topology.
>
>
> Thanks
> Edward
>
>
>
>
> On 11/10/15, 15:44, "Zhang, Edward (GDI Hadoop)" <yonzh...@ebay.com>
> wrote:
>
> >I thought over this problem again, and now I want to bring up some
> >considerations we should take into account while we do the design.
> >
> >1. The simple general-purpose monitoring framework and how we model streams
> >are two different problems.
> >   The simple general-purpose monitoring framework tries to onboard a new
> >data source without any coding required.
> >   How we model a stream is the process of creating a schema for the stream,
> >either through the Eagle web service API or through a local text file.
> >
> >2. For the simple general-purpose monitoring framework, we need to do the
> >following:
> >   2.1 In UI, ask user to describe data source schema.
> >   2.2 In UI, ask user to transform data into alert-able data by simple
> >regex, filtering etc.
> >   2.3 In UI, ask user to describe the data to be evaluated with policy
> >   2.4 Eagle generates stream schema in eagle database
> >   2.5 Eagle generates intermediate bolts to reflect data processing logic
> >
> >3. For stream modeling
> >   3.1 The Eagle web service should be the source of truth.
> >   3.2 We can use a local text file to describe the stream, but that is not
> >perfect. I strongly suggest we don’t do that. If we want to do that, it
> >should only happen in a dev environment.
> >   3.3 The problem with stream modeling is not that the schema is persisted
> >in the Eagle service, but how easy the API is to use. Instead of using a
> >local text file, we should simplify the process of creating the schema in
> >the Eagle service. Why don't we consider the following two methods:
> >      3.3.1 use the UI to define the stream schema
> >      3.3.2 use a CLI to define the stream schema, for example supporting a
> >JDBC interface which people are very familiar with.
> >
> >We should not be afraid of using the Eagle service as the source of truth.
> >Actually not only the schema but also policy and sensitivity information are
> >all stored in the Eagle service; we should not duplicate all of them into
> >local files. That is not worthwhile.
> >
> >Instead, we should simplify using the Eagle service for stream definition,
> >policy definition, and sensitivity definition by all means.
> >
> >Thanks
> >Edward Zhang
> >
> >
> >On 11/10/15, 0:47, "Hao Chen" <h...@apache.org> wrote:
> >
> >>As to the stream schema model, you could refer to:
> >>
> >>stream {
> >>    name = "MonitoredStream"
> >>    executor = "MonitoredStream"
> >>    attributes = [
> >>        {
> >>            name = "value",
> >>            type = "double",
> >>            // more attribute properties
> >>        },
> >>        // more attributes definition
> >>    ]
> >>}
> >>
> >>As to backend service, you could refer to:
> >>
> >>## AlertStreamService: alert streams generated from data source
> >>echo ""
> >>echo "Importing AlertStreamService for HDFS... "
> >>curl -u ${EAGLE_SERVICE_USER}:${EAGLE_SERVICE_PASSWD} -X POST \
> >>  -H 'Content-Type:application/json' \
> >>  "http://${EAGLE_SERVICE_HOST}:${EAGLE_SERVICE_PORT}/eagle-service/rest/entities?serviceName=AlertStreamService" \
> >>  -d '[{"prefix":"alertStream","tags":{"dataSource":"hdfsAuditLog","streamName":"hdfsAuditLogEventStream"},"desc":"alert event stream from hdfs audit log"}]'
> >>
> >>## AlertExecutorService: what alert streams are consumed by alert executor
> >>echo ""
> >>echo "Importing AlertExecutorService for HDFS... "
> >>curl -u ${EAGLE_SERVICE_USER}:${EAGLE_SERVICE_PASSWD} -X POST \
> >>  -H 'Content-Type:application/json' \
> >>  "http://${EAGLE_SERVICE_HOST}:${EAGLE_SERVICE_PORT}/eagle-service/rest/entities?serviceName=AlertExecutorService" \
> >>  -d '[{"prefix":"alertExecutor","tags":{"dataSource":"hdfsAuditLog","alertExecutorId":"hdfsAuditLogAlertExecutor","streamName":"hdfsAuditLogEventStream"},"desc":"alert executor for hdfs audit log event stream"}]'
> >>
> >>## AlertStreamSchemaService: schema for event from alert stream
> >>echo ""
> >>echo "Importing AlertStreamSchemaService for HDFS... "
> >>curl -u ${EAGLE_SERVICE_USER}:${EAGLE_SERVICE_PASSWD} -X POST \
> >>  -H 'Content-Type:application/json' \
> >>  "http://${EAGLE_SERVICE_HOST}:${EAGLE_SERVICE_PORT}/eagle-service/rest/entities?serviceName=AlertStreamSchemaService" \
> >>  -d '[
> >>    {"prefix":"alertStreamSchema","tags":{"dataSource":"hdfsAuditLog","streamName":"hdfsAuditLogEventStream","attrName":"src"},"attrDescription":"source directory or file, such as /tmp","attrType":"string","category":"","attrValueResolver":"eagle.service.security.hdfs.resolver.HDFSResourceResolver"},
> >>    {"prefix":"alertStreamSchema","tags":{"dataSource":"hdfsAuditLog","streamName":"hdfsAuditLogEventStream","attrName":"dst"},"attrDescription":"destination directory, such as /tmp","attrType":"string","category":"","attrValueResolver":"eagle.service.security.hdfs.resolver.HDFSResourceResolver"},
> >>    {"prefix":"alertStreamSchema","tags":{"dataSource":"hdfsAuditLog","streamName":"hdfsAuditLogEventStream","attrName":"host"},"attrDescription":"hostname, such as localhost","attrType":"string","category":"","attrValueResolver":""},
> >>    {"prefix":"alertStreamSchema","tags":{"dataSource":"hdfsAuditLog","streamName":"hdfsAuditLogEventStream","attrName":"timestamp"},"attrDescription":"milliseconds of the datetime","attrType":"long","category":"","attrValueResolver":""},
> >>    {"prefix":"alertStreamSchema","tags":{"dataSource":"hdfsAuditLog","streamName":"hdfsAuditLogEventStream","attrName":"allowed"},"attrDescription":"true, false or none","attrType":"bool","category":"","attrValueResolver":""},
> >>    {"prefix":"alertStreamSchema","tags":{"dataSource":"hdfsAuditLog","streamName":"hdfsAuditLogEventStream","attrName":"user"},"attrDescription":"process user","attrType":"string","category":"","attrValueResolver":""},
> >>    {"prefix":"alertStreamSchema","tags":{"dataSource":"hdfsAuditLog","streamName":"hdfsAuditLogEventStream","attrName":"cmd"},"attrDescription":"file/directory operation, such as getfileinfo, open, listStatus and so on","attrType":"string","category":"","attrValueResolver":"eagle.service.security.hdfs.resolver.HDFSCommandResolver"},
> >>    {"prefix":"alertStreamSchema","tags":{"dataSource":"hdfsAuditLog","streamName":"hdfsAuditLogEventStream","attrName":"sensitivityType"},"attrDescription":"mark such as AUDITLOG, SECURITYLOG","attrType":"string","category":"","attrValueResolver":"eagle.service.security.hdfs.resolver.HDFSSensitivityTypeResolver"},
> >>    {"prefix":"alertStreamSchema","tags":{"dataSource":"hdfsAuditLog","streamName":"hdfsAuditLogEventStream","attrName":"securityZone"},"attrDescription":"","attrType":"string","category":"","attrValueResolver":""}
> >>  ]'
> >>
> >>Regards,
> >>Hao
> >>
> >>
> >>On Tue, Nov 10, 2015 at 4:29 PM, 蒋吉麟 <smith3...@gmail.com> wrote:
> >>
> >>> Agree. Using the API to create the stream schema is not user friendly.
> >>> If you are not familiar with the entity creation structure, it's not easy
> >>> to use. I think we can design the UI to simply add the stream schema. :)
> >>>
> >>> 2015-11-10 16:06 GMT+08:00 Hao Chen <h...@apache.org>:
> >>>
> >>> > *Jira*
> >>> >
> >>> > https://issues.apache.org/jira/browse/EAGLE-5
> >>> >
> >>> > *Use Cases*
> >>> >
> >>> > Currently Eagle supports a very complex data processing pipeline for
> >>> > Hadoop audit/security logs, but I think we can reuse some valuable
> >>> > components in Eagle:
> >>> >
> >>> > 1) distributed policy engine
> >>> >
> >>> > 2) highly abstracted streaming program API
> >>> >
> >>> > 3) user-friendly policy & alert management UI
> >>> >
> >>> > for more general cases, like what traditional monitoring products do.
> >>> >
> >>> > *Use Case One:* For example, for ops teams like DBA, hardware, or
> >>> > cloud teams, lots of users would just assume that the monitoring data
> >>> > format is known, like typical time-series data points. In such a case
> >>> > the user just needs to tell Eagle the stream schema and preprocess the
> >>> > data into Kafka with an external program like scripts or agents, and
> >>> > Eagle could provide a generic topology to monitor the stream without
> >>> > any programming, just like most traditional monitoring products'
> >>> > paradigm.
> >>> >
> >>> > *Use Case Two:* Some advanced users with development skills may want
> >>> > to easily use the Eagle streaming program API to process complex
> >>> > monitoring data like complex logs, connect to Eagle's metadata engine
> >>> > for managing policies in the UI, and execute the policies in Eagle's
> >>> > distributed policy engine in real time.
> >>> >
> >>> > *Design*
> >>> >
> >>> > So we need to do the following work:
> >>> >
> >>> > 1. Implement a generic pipeline topology as a starting point, like:
> >>> > Kafka -> EventParser(JSON) -> Metadata Manager -> PolicyEngine, which
> >>> > could be reused for lots of simple use cases like metrics monitoring.
> >>> >
> >>> > 2. Allow importing or designing the stream schema in the UI. Today we
> >>> > only assume the stream schema is already defined in the database, but
> >>> > for the most general cases we should allow defining the stream schema
> >>> > with an Eagle tool like the UI, for a more generic purpose.
> >>> >
> >>> > 3. For advanced users like developers, we should make our streaming
> >>> > framework easier to use. One of the most critical pain points is that
> >>> > developers have to define the stream schema in the database and write
> >>> > code independently. In fact, for most cases like Hadoop security
> >>> > monitoring, the schema will never change independently; in such cases
> >>> > we could even define the stream schema in code, and we could also
> >>> > define the policy inline as well, so we could run the Eagle monitoring
> >>> > engine without the metadata store (HBase).
> >>> >
> >>> > Regards,
> >>> > Hao Chen
> >>> >
> >>>
> >
>
>
