Re: [DISCUSS] Top domains enrichment config/extractor management

Matt Foley Fri, 24 Feb 2017 16:35:04 -0800

+1 to using an Ambari view.  As for just presenting a JSON editor, that’s a lot 
better (for a first stab) than presenting a plain text editor :-)
And doing so also makes sense for an “advanced” tab, just as Ambari typically
exposes a text editor in an advanced tab for text config files.



On 2/24/17, 3:15 PM, "Ryan Merriman" <merrim...@gmail.com> wrote:

    +1 to an Ambari view over the management UI.  If we're going to go to the
    trouble of exposing this feature through a UI it should be intuitive and
    easy to use.  Simply exposing a JSON editor in Ambari gets a -1 from me.
    
    Are we keeping track of which enrichments have been loaded?  I believe the
    enrichment loader currently does this by adding a new enrichment type to
    the various enrichment configs.  It's been a while since I've been in that
    part of the code so please correct me if it has evolved since then.  If my
    previous statement is true, then that's not ideal because a user should
    have a list of available enrichments to pick from.  If we use separate
    HBase tables for enrichment types then this problem goes away but if we
    continue to use one HBase table then there needs to be some kind of
    registry that is maintained by the enrichment loader.
    
    On Fri, Feb 24, 2017 at 4:46 PM, Michael Miklavcic <
    michael.miklav...@gmail.com> wrote:
    
    > The reason I posed this question to the community is because I started to
    > recognize some of the shortcomings of doing this solely through Ambari, as
    > you and Nick have pointed out. I think an Ambari view over the management
    > UI is a great idea. And I'd love to see us provide a more robust mechanism
    > for loading these enrichments via the management UI. As you said, perhaps
    > Ambari could be used to manage the ZK config around active
    > enrichments/locations (the "USE" part of it) while the management UI is
    > used for actually loading and managing the enrichments themselves?
    >
    >
    > On Fri, Feb 24, 2017 at 8:12 AM, Casey Stella <ceste...@gmail.com> wrote:
    >
    > > Late to chime in here, but I feel that we have discussed Ambari's role
    > > before and I think we should probably clarify, as a community, a few
    > > things with regard to Ambari vs a management UI built around the REST PR
    > > currently under review.  (I promise, I will get to the topic at hand
    > > eventually ;) :
    > >
    > >    - Where functionality should live
    > >    - Who is responsible for what
    > >
    > > I will now make a couple of (possibly controversial) statements (some of
    > > which we have actually discussed prior to this on the dev list):
    > >
    > >
    > >    - I view Ambari as managing the install and the static configuration
    > >    for Metron.  For us, this would include zookeeper configs as well as
    > >    topology configuration.  This would be the persistent store of truth.
    > >    - I view Zookeeper to be our runtime configuration store for the
    > >    topologies.
    > >
    > >
    > >    - I view a management UI (and the Stellar Shell) as managing
    > >    functionality for interacting with the system.  Where it changes
    > >    configuration, it must go through Ambari.
    > >    - I believe the management UI should be exposed as an ambari view
    > >
    > > As such, I see the importation and management of enrichments, which is a
    > > data task, to be squarely in the purview of the management UI, whose job
    > > is the care and feeding of the data.  That being said, any configuration
    > > changes to USE the enrichment should at least be routed through ambari,
    > > but should be managed in the UI.
    > >
    > > Now the question becomes, should we have enrichment collateral (I'm
    > > including both hbase as well as geo or anything else we have) loaded at
    > > install-time?  I would argue that we should not.  Rather, we should
    > > design the management UI so that the enrichments can be added easily,
    > > with a wizard to enable the use of the enrichment via stellar for a
    > > sensor.
    > >
    > > On that topic, I think we are doing too much as part of our install.  I
    > > would argue that we shouldn't pre-load even the geo data or depend on it
    > > for the default parsers.
    > >
    > > Casey
    > >
    > >
    > >
    > > On Tue, Feb 21, 2017 at 6:31 PM, Michael Miklavcic <
    > > michael.miklav...@gmail.com> wrote:
    > >
    > > > With the work committed in
    > > > https://github.com/apache/incubator-metron/pull/445 and
    > > > https://github.com/apache/incubator-metron/pull/432, we now have a
    > > > robust and flexible means to import enrichment sources and transform
    > > > their contents as they are inserted into HBase. One of the main
    > > > motivators for this new functionality was to add the ability to load
    > > > top domain rankings from sources such as Alexa. The proposal is to make
    > > > this type of enrichment a top-level feature in Metron by introducing it
    > > > to the Ambari management UI as a configurable set of properties in the
    > > > MPack install. This comes with some options and challenges in how we
    > > > want to manage the configurations, which I will outline below.
    > > >
    > > > *Use cases:*
    > > >
    > > >    - Single load of top domains file
    > > >    - Re-loading top domains file - need to be able to clean up properly
    > > >    - Cleaning up/deleting old enrichment data (this is a general feature
    > > >    that we currently lack - I think it is worth a separate Jira/PR for
    > > >    creating a MapReduce job that enables cleanup to occur).
    > > >    - Modifying default top domains file source - there are other options
    > > >    besides Alexa. And users may want to load a file from a local URI,
    > > >    since many data centers do not have direct access to the internet.
    > > >    - Ability to modify the default extractor config JSON and tune the
    > > >    Stellar transformations for both the value and indicator transforms.
    > > >    Allows more flexible handling of data based on other sources.
    > > >    - Loading multiple top domains source enrichments. (Maybe a separate
    > > >    PR for this if we even think it would be useful)
    > > >    - Updating the top domain enrichment - This needs to be an atomic
    > > >    operation in order to prevent incorrect data.
    > > >    - Rolling back to an older version of the top domains enrichment.
    > > >    Also needs to be atomic.
    > > >    - Ability to run an enrichment load on a schedule - we would like to
    > > >    defer this to an external scheduling mechanism, e.g. cron or
    > > >    Control-M. The enrichment loading system should have the necessary
    > > >    features to enable this type of automation without data integrity
    > > >    issues.
    > > >
    > > > *Considerations:*
    > > >
    > > >    - As mentioned above, we want to add this feature to the Ambari MPack.
    > > >    This requires at least 2 parameters to work. We need the ability to
    > > >    specify a URI as well as an extractor config.
    > > >    - How do we want to manage the extractor config? The most obvious
    > > >    solution is to provide a text field in Ambari with a default JSON
    > > >    config. When a load is initiated, Ambari would place a fresh copy of
    > > >    the extractor config in the /tmp/ directory. This is an ephemeral file
    > > >    that isn't needed other than during a load.
    > > >    - It seems easy enough to have the load occur during the initial
    > > >    install, however subsequent loads would require a different workflow.
    > > >    How do folks feel about adding a set of dropdown options in the Ambari
    > > >    UI for loading, updating, and deleting the top domains enrichment? I
    > > >    believe we are doing something similar for the ElasticSearch templates
    > > >    currently.
    > > >    - In the case of atomic operations for updates and rollbacks, I propose
    > > >    we add a property to Zookeeper that is reference-able in the enrichment
    > > >    itself. The idea would be to create a "top-domains" property in ZK that
    > > >    points to an enrichment key with a load timestamp associated with it,
    > > >    e.g. top-domains_20170221042000. This would also allow a mapreduce job
    > > >    to be written that cleans up old enrichments. Another option is to
    > > >    create a new table in HBase if/when you update the enrichment and
    > > >    change the enrichment config manually. Deleting an old enrichment would
    > > >    simply be a matter of dropping the table in HBase. A relevant
    > > >    discussion of the tradeoffs of having many small tables versus 1 large
    > > >    table can be found here -
    > > >    http://grokbase.com/t/hbase/user/11bjbdw94q/multiple-tables-vs-big-fat-table
    > > >    - In order to update or rollback an enrichment as mentioned above, we
    > > >    would also ideally provide a mechanism for changing the rowkey pointed
    > > >    to by the enrichment.
    > > >
    > > > In summary of the use cases and considerations above, this boils down to
    > > > how we'd like to leverage Ambari here. Do we want Ambari to handle only
    > > > the initial install/load and have end users be responsible on an ongoing
    > > > basis for updates (users would be responsible for copying or
    > > > distributing the extractor_config.json for instance), or do we want to
    > > > enable Ambari to manage the configuration ongoing and enable
    > > > functionality for reloading, updating, and rollback?
    > > >
    > > > Best,
    > > > Mike
    > > >
    > >
    >
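
For readers following the atomic update/rollback proposal in Mike's original
message (a "top-domains" property in ZK that points to a timestamped
enrichment key such as top-domains_20170221042000), here is a rough sketch of
the pointer-swap idea. It is not Metron code. The class name EnrichmentStore
and its methods are hypothetical, and plain dictionaries stand in for the
ZooKeeper property and the HBase rows, so it only illustrates the ordering of
steps: write the full new version first, then repoint.

```python
from datetime import datetime, timezone

POINTER = "top-domains"  # property name taken from the thread


class EnrichmentStore:
    """Hypothetical in-memory stand-in: `config` plays the role of the
    ZooKeeper property, `versions` the role of the HBase enrichment rows."""

    def __init__(self):
        self.config = {}    # pointer name -> active versioned key
        self.versions = {}  # versioned key -> {domain: rank}
        self.history = []   # versioned keys in load order

    def load(self, rows, now=None):
        """Write a complete new version, then repoint in a single step."""
        now = now or datetime.now(timezone.utc)
        key = f"{POINTER}_{now:%Y%m%d%H%M%S}"
        self.versions[key] = dict(rows)  # readers still see the old version
        self.history.append(key)
        self.config[POINTER] = key       # the only step readers observe
        return key

    def rollback(self):
        """Repoint to the previous version; the newer data is left in place."""
        if len(self.history) < 2:
            raise ValueError("no earlier version to roll back to")
        self.history.pop()
        self.config[POINTER] = self.history[-1]
        return self.history[-1]

    def lookup(self, domain):
        """Resolve the pointer first, then read from that version only."""
        active = self.config.get(POINTER)
        return self.versions.get(active, {}).get(domain)

    def cleanup(self, keep=1):
        """Drop every version except the newest `keep` entries in history."""
        if keep < 1:
            raise ValueError("keep must be at least 1")
        retained = set(self.history[-keep:])
        for key in list(self.versions):
            if key not in retained:
                del self.versions[key]
        self.history = self.history[-keep:]
```

In this sketch a reload is `store.load(rows)`, a rollback is
`store.rollback()`, and `store.cleanup()` plays the part of the cleanup job
mentioned in the thread. Two loads within the same second would collide on the
key, so a real implementation would need a finer-grained or otherwise unique
suffix.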
