Re: [DISCUSS] Persisting user data

Ryan Merriman Thu, 03 Aug 2017 06:16:26 -0700

Spring is JDBC-generic so I think we're good there.  Improving our docs on
this topic is being discussed in https://github.com/apache/metron/pull/646
so hopefully this will be clear once that's worked out.


Simon is correct, I found out the hard way that Hibernate is not an option
because of its license.  I think EclipseLink would be a good alternative.
I've seen it used in other open source projects (Ambari for example) and I
was able to get it working in a POC without much effort.
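
For illustration only (this is not the POC code): a minimal sketch of how the
REST layer could code against a small storage interface so that the database
and ORM choice stays swappable.  Every name here (SavedSearch,
SavedSearchStore, InMemorySavedSearchStore) is hypothetical and made up for
this example.  The in-memory class only stands in for an implementation that
would be backed by JPA/EclipseLink over whichever JDBC database is configured.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical value type for one saved search; field names are assumptions.
final class SavedSearch {
    final String owner;  // user name, or null when the search is shared
    final String name;
    final String query;

    SavedSearch(String owner, String name, String query) {
        this.owner = owner;
        this.name = name;
        this.query = query;
    }
}

// Hypothetical contract the REST layer would depend on.  A JPA-backed
// implementation could replace the in-memory one without touching callers.
interface SavedSearchStore {
    void save(SavedSearch search);
    Optional<SavedSearch> find(String owner, String name);
    List<SavedSearch> visibleTo(String user);
}

// In-memory stand-in, useful for tests; not a real persistence layer.
final class InMemorySavedSearchStore implements SavedSearchStore {
    private final Map<String, SavedSearch> byKey = new LinkedHashMap<>();

    // Shared searches (owner == null) are keyed with an empty owner, so this
    // sketch assumes real user names are never empty.
    private static String key(String owner, String name) {
        return (owner == null ? "" : owner) + "\u0000" + name;
    }

    @Override
    public void save(SavedSearch search) {
        // Saving again under the same owner and name overwrites the entry.
        byKey.put(key(search.owner, search.name), search);
    }

    @Override
    public Optional<SavedSearch> find(String owner, String name) {
        return Optional.ofNullable(byKey.get(key(owner, name)));
    }

    @Override
    public List<SavedSearch> visibleTo(String user) {
        // A user sees shared searches plus the ones they own.
        List<SavedSearch> result = new ArrayList<>();
        for (SavedSearch s : byKey.values()) {
            if (s.owner == null || s.owner.equals(user)) {
                result.add(s);
            }
        }
        return result;
    }
}
```

This covers the "combination of shared and personal" case discussed further
down the thread.  Authorization is deliberately left out, since that is
planned for the REST layer.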

On Thu, Aug 3, 2017 at 5:26 AM, Simon Elliston Ball <
[email protected]> wrote:

> Anything Spring-based is likely multi-db by definition as long as we
> pick a good friendly ORM (not Hibernate because of licensing problems with
> Apache; EclipseLink?). But I suspect we should pick a good default and that
> that default should be Postgres.
>
> > On 3 Aug 2017, at 10:24, Casey Stella <[email protected]> wrote:
> >
> > I'd vote for a DB-based solution, but I'd argue that any solution
> shouldn't
> > be database specific (i.e. postgres), but JDBC-generic.  People and
> > organizations have very strong views regarding databases and I'd prefer
> to
> > side-step those holy wars by being agnostic.
> >
> > On Wed, Aug 2, 2017 at 9:36 PM, Ryan Merriman <[email protected]>
> wrote:
> >
> >> Spring supports a variety of databases including Postgres.  I have no
> >> problem with using Postgres instead of MySQL.
> >>
> >> On Wed, Aug 2, 2017 at 3:32 PM, Simon Elliston Ball <
> >> [email protected]> wrote:
> >>
> >>> Agreed on Postgres. It's a lot easier to work with license-wise in
> apache
> >>> projects, and has a lot of the capability we need here, especially if
> we
> >>> can find a sensible ORM. Anyone got any thoughts on what would work
> >> there?
> >>>
> >>> Simon
> >>>
> >>>> On 2 Aug 2017, at 21:21, Matt Foley <[email protected]> wrote:
> >>>>
> >>>> Hi Ryan,
> >>>> Zookeeper has a default (and seldom changed) max znode size of 1MB,
> but
> >>> it is “designed to store data on the order of kilobytes in size.”[1]
> And
> >>> it’s not really intended for frequently-changing data, which is okay
> >> here.
> >>> But I just included it for completeness, I’m not advocating for its use
> >>> here.
> >>>>
> >>>> I agree with you that the problem, especially because it includes
> >> shared
> >>> config, would fit well in a db.  I’d suggest you consider PostgreSQL
> >> rather
> >>> than MySQL, as postgres is built into Redhat 6 and 7, and Ambari now
> uses
> >>> it by default, so an available server might be conveniently at hand in
> >> most
> >>> deployments.  Definitely assume the user will want to use an external
> db
> >>> instance, rather than one dedicated to this use.  Conveniently Postgres
> >>> also has a native REST interface, with the usual authorization options.
> >>>>
> >>>> Never mind about Ambari Views for now.  It’s just a way to get GUI
> >>> dashboards without writing all the infrastructure for it, which as you
> >> say
> >>> is somewhat water under the bridge.
> >>>> Cheers,
> >>>> --Matt
> >>>>
> >>>> [1] https://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html
> >>>>
> >>>>
> >>>>
> >>>> On 8/2/17, 12:34 PM, "Ryan Merriman" <[email protected]> wrote:
> >>>>
> >>>>   Matt,
> >>>>
> >>>>   Thank you for the suggestions.  I forgot to include Zookeeper.  Are
> >>> there
> >>>>   any tradeoffs we should be aware of if we decide to use Zookeeper?
> >>> Are
> >>>>   there guidelines for how much data can be stored in Zookeeper?
> >>>>
> >>>>   To answer your questions:
> >>>>
> >>>>   1.  I think both use cases make sense so a combination of shared and
> >>>>   personal.
> >>>>   2.  I was planning on managing authorization in the REST layer.  For
> >>> now
> >>>>   viewer login auth (which is really REST auth) will suffice but we
> >>> might
> >>>>   consider other methods since authentication is pluggable here.
> >>>>   3.  I had not considered Ambari Views since this will support an
> >>> existing
> >>>>   UI.  How would Ambari Views help us here?
> >>>>
> >>>>   I will proceed initially with a saved search POC using a relational
> >>>>   database unless you think that is a bad idea or there are other
> >> better
> >>>>   options.  Hopefully an example will further the discussion.
> >>>>
> >>>>   Ryan
> >>>>
> >>>>>   On Wed, Jul 26, 2017 at 6:31 PM, Matt Foley <[email protected]>
> >>> wrote:
> >>>>>
> >>>>> There are a couple of other places you could put config info (but maybe
> not
> >>>>> saved searches):
> >>>>> -  Zookeeper
> >>>>> -  metron-alerts-ui/config.xml or config.json  file
> >>>>> -  the Ambari database, whichever it happens to be
> >>>>>
> >>>>> Questions that influence the decision include:
> >>>>> 1. Should there be one configuration shared among users, or strictly
> >>>>> per-user config?  Or a combination of shared and personal?
> >>>>> 2. What security do you wish to maintain on changing those settings,
> >>> both
> >>>>> shared and personal?  What authentication/authorization scheme will
> >> you
> >>>>> use?  Is viewer login auth sufficient for this?
> >>>>> 3. Will you assume Ambari exists?  Did you consider using Ambari
> Views
> >>> as
> >>>>> the basis? (https://cwiki.apache.org/confluence/display/AMBARI/Views
> >> )
> >>>>>
> >>>>> On 7/26/17, 2:54 PM, "Ryan Merriman" <[email protected]> wrote:
> >>>>>
> >>>>>   In anticipation of METRON-988 being merged into master, there will
> >>> be a
> >>>>>   need to persist user preferences such as UI layout, saved searches,
> >>>>> search
> >>>>>   history, etc.  I think where and how we persist this data should be
> >>>>>   discussed in order to facilitate a design.  This data won't be
> >> large
> >>> in
> >>>>>   scale and may or may not be relational.  The initial features I am
> >>>>> aware of
> >>>>>   don't require a relational model but I'm sure there will be some
> >> that
> >>>>> do in
> >>>>>   the future.  I'm also assuming this code will live in the REST
> >>>>> application
> >>>>>   but someone correct me if there is a reason to keep it somewhere
> >>> else.
> >>>>>
> >>>>>   I think it would be preferable to leverage something that is
> >> already
> >>>>> in our
> >>>>>   stack and available as a dependency.  However I would not be
> >> against
> >>>>> adding
> >>>>>   something if it really were the right tool for the job.  Assuming
> >>>>> others
> >>>>>   agree we should stick with our current stack, I see these options:
> >>>>>
> >>>>>      - MySQL (or other relational database)
> >>>>>         - good fit for the size of data
> >>>>>         - relational capabilities
> >>>>>         - an ORM framework will be necessary which will increase our
> >>>>>         dependencies and complexity
> >>>>>      - HBase
> >>>>>         - client setup and code will likely be simpler and less
> >> complex
> >>>>>         - limited data model
> >>>>>      - Elasticsearch
> >>>>>         - json is a convenient data model
> >>>>>         - we already store user preferences here (Kibana dashboards)
> >>>>>         - we have abstracted our search engine interactions in
> >> several
> >>>>> places
> >>>>>         and would have to here too
> >>>>>
> >>>>>   Elasticsearch is out for me because we view search engines as
> >>>>> pluggable.  I
> >>>>>   think HBase would be the easiest to implement and get working but
> >> I'm
> >>>>>   worried we'll have similar use cases that won't be a good fit for
> >>>>> HBase.
> >>>>>   In that case we would need to come up with an alternative
> >> persistence
> >>>>>   solution anyways.  I think MySQL is a good fit long term but I'm
> >>>>> concerned
> >>>>>   about adding a heavy ORM framework.  Also, we can't use Hibernate
> >>>>> because
> >>>>>   it is not license friendly.
> >>>>>
> >>>>>   Does anyone have any thoughts on these options or other ideas?
> >>>>>
> >>>>>   This requirement also brings up another topic that is outside of
> >> this
> >>>>>   discussion.  Should we reevaluate our authentication strategy?
> >>>>> Currently
> >>>>>   the REST application uses JDBC for this but if we decide a
> >> different
> >>>>>   mechanism is better then we no longer need a relational database.
> >>> This
> >>>>>   might affect our decision to use MySQL for this kind of data
> >>>>> persistence.
> >>>>>
> >>>>>   Ryan
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>
>
>
