Spring is JDBC-generic so I think we're good there. Improving our docs on this topic is being discussed in https://github.com/apache/metron/pull/646 so hopefully this will be clear once that's worked out.
Simon is correct, I found out the hard way that Hibernate is not an option because of its license. I think EclipseLink would be a good alternative. I've seen it used in other open source projects (Ambari for example) and I was able to get it working in a POC without much effort.

On Thu, Aug 3, 2017 at 5:26 AM, Simon Elliston Ball <[email protected]> wrote:

> Anything spring based is likely multi-db by definition as long as we
> pick a good friendly ORM (not hibernate because licensing problems with
> apache, eclipselink?) But I suspect we should pick a good default and
> that that default should be postgres.
>
> > On 3 Aug 2017, at 10:24, Casey Stella <[email protected]> wrote:
> >
> > I'd vote for a DB-based solution, but I'd argue that any solution
> > shouldn't be database specific (i.e. postgres), but JDBC-generic.
> > People and organizations have very strong views regarding databases
> > and I'd prefer to side-step those holy wars by being agnostic.
> >
> > On Wed, Aug 2, 2017 at 9:36 PM, Ryan Merriman <[email protected]> wrote:
> >
> >> Spring supports a variety of databases including Postgres. I have no
> >> problem with using Postgres instead of MySQL.
> >>
> >> On Wed, Aug 2, 2017 at 3:32 PM, Simon Elliston Ball <[email protected]> wrote:
> >>
> >>> Agreed on Postgres. It's a lot easier to work with license-wise in
> >>> apache projects, and has a lot of the capability we need here,
> >>> especially if we can find a sensible ORM. Anyone got any thoughts on
> >>> what would work there?
> >>>
> >>> Simon
> >>>
> >>>> On 2 Aug 2017, at 21:21, Matt Foley <[email protected]> wrote:
> >>>>
> >>>> Hi Ryan,
> >>>> Zookeeper has a default (and seldom changed) max znode size of 1MB,
> >>>> but it is “designed to store data on the order of kilobytes in
> >>>> size.”[1] And it’s not really intended for frequently-changing data,
> >>>> which is okay here. But I just included it for completeness, I’m not
> >>>> advocating for its use here.
> >>>>
> >>>> I agree with you that the problem, especially because it includes
> >>>> shared config, would fit well in a db. I’d suggest you consider
> >>>> PostgreSQL rather than MySQL, as postgres is built into Redhat 6 and
> >>>> 7, and Ambari now uses it by default, so an available server might be
> >>>> conveniently at hand in most deployments. Definitely assume the user
> >>>> will want to use an external db instance, rather than one dedicated
> >>>> to this use. Conveniently Postgres also has a native REST interface,
> >>>> with the usual authorization options.
> >>>>
> >>>> Never mind about Ambari Views for now. It’s just a way to get GUI
> >>>> dashboards without writing all the infrastructure for it, which as
> >>>> you say is somewhat water under the bridge.
> >>>> Cheers,
> >>>> --Matt
> >>>>
> >>>> [1] https://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html
> >>>>
> >>>> On 8/2/17, 12:34 PM, "Ryan Merriman" <[email protected]> wrote:
> >>>>
> >>>> Matt,
> >>>>
> >>>> Thank you for the suggestions. I forgot to include Zookeeper. Are
> >>>> there any tradeoffs we should be aware of if we decide to use
> >>>> Zookeeper? Are there guidelines for how much data can be stored in
> >>>> Zookeeper?
> >>>>
> >>>> To answer your questions:
> >>>>
> >>>> 1. I think both use cases make sense so a combination of shared and
> >>>>    personal.
> >>>> 2. I was planning on managing authorization in the REST layer. For
> >>>>    now viewer login auth (which is really REST auth) will suffice but
> >>>>    we might consider other methods since authentication is pluggable
> >>>>    here.
> >>>> 3. I had not considered Ambari Views since this will support an
> >>>>    existing UI. How would Ambari Views help us here?
> >>>>
> >>>> I will proceed initially with a saved search POC using a relational
> >>>> database unless you think that is a bad idea or there are other
> >>>> better options. Hopefully an example will further the discussion.
> >>>>
> >>>> Ryan
> >>>>
> >>>>> On Wed, Jul 26, 2017 at 6:31 PM, Matt Foley <[email protected]> wrote:
> >>>>>
> >>>>> There’s a couple other places you could put config info (but maybe
> >>>>> not saved searches):
> >>>>> - Zookeeper
> >>>>> - metron-alerts-ui/config.xml or config.json file
> >>>>> - the Ambari database, whichever it happens to be
> >>>>>
> >>>>> Questions that influence the decision include:
> >>>>> 1. Should there be one configuration shared among users, or strictly
> >>>>>    per-user config? Or a combination of shared and personal?
> >>>>> 2. What security do you wish to maintain on changing those settings,
> >>>>>    both shared and personal? What authentication/authorization scheme
> >>>>>    will you use? Is viewer login auth sufficient for this?
> >>>>> 3. Will you assume Ambari exists? Did you consider using Ambari
> >>>>>    Views as the basis?
> >>>>>    (https://cwiki.apache.org/confluence/display/AMBARI/Views)
> >>>>>
> >>>>> On 7/26/17, 2:54 PM, "Ryan Merriman" <[email protected]> wrote:
> >>>>>
> >>>>> In anticipation of METRON-988 being merged into master, there will
> >>>>> be a need to persist user preferences such as UI layout, saved
> >>>>> searches, search history, etc. I think where and how we persist this
> >>>>> data should be discussed in order to facilitate a design. This data
> >>>>> won't be large in scale and may or may not be relational. The initial
> >>>>> features I am aware of don't require a relational model but I'm sure
> >>>>> there will be some that do in the future. I'm also assuming this code
> >>>>> will live in the REST application but someone correct me if there is
> >>>>> a reason to keep it somewhere else.
> >>>>>
> >>>>> I think it would be preferable to leverage something that is already
> >>>>> in our stack and available as a dependency.
> >>>>> However I would not be against adding something if it really were
> >>>>> the right tool for the job. Assuming others agree we should stick
> >>>>> with our current stack, I see these options:
> >>>>>
> >>>>> - MySQL (or other relational database)
> >>>>>   - good fit for the size of data
> >>>>>   - relational capabilities
> >>>>>   - an ORM framework will be necessary which will increase our
> >>>>>     dependencies and complexity
> >>>>> - HBase
> >>>>>   - client setup and code will likely be simpler and less complex
> >>>>>   - limited data model
> >>>>> - Elasticsearch
> >>>>>   - json is a convenient data model
> >>>>>   - we already store user preferences here (Kibana dashboards)
> >>>>>   - we have abstracted our search engine interactions in several
> >>>>>     places and would have to here too
> >>>>>
> >>>>> Elasticsearch is out for me because we view search engines as
> >>>>> pluggable. I think HBase would be the easiest to implement and get
> >>>>> working but I'm worried we'll have similar use cases that won't be a
> >>>>> good fit for HBase. In that case we would need to come up with an
> >>>>> alternative persistence solution anyways. I think MySQL is a good fit
> >>>>> long term but I'm concerned about adding a heavy ORM framework. Also,
> >>>>> we can't use Hibernate because it is not license friendly.
> >>>>>
> >>>>> Does anyone have any thoughts on these options or other ideas?
> >>>>>
> >>>>> This requirement also brings up another topic that is outside of
> >>>>> this discussion. Should we reevaluate our authentication strategy?
> >>>>> Currently the REST application uses JDBC for this but if we decide a
> >>>>> different mechanism is better then we no longer need a relational
> >>>>> database. This might affect our decision to use MySQL for this kind
> >>>>> of data persistence.
> >>>>>
> >>>>> Ryan
