Hope you don’t mind if I chime in. There are a couple of very basic points that are in the documentation but may not jump out at a new user who is trying to learn Metron and MaaS at the same time.

1. In the thread below there is only a brief reference to the main documentation page for MaaS, at https://github.com/apache/metron/tree/master/metron-analytics/metron-maas-service
   Hopefully you’ve read it, but if not please do.

2. The “model”, with its required REST API, is expected to run in its own sub-system, which may or may not be co-resident with the Metron installation, depending on load considerations. Metron provides very useful optional infrastructure for YARN provisioning, deployment, and monitoring of co-resident models, as described in the above web page. However, the model subsystem is considered to be external to Metron itself, by design.

3. The interface to MaaS is via a few specific Stellar function outcalls, which access the model’s REST API, and configuration information in Zookeeper. These calls may be used anywhere Stellar is acceptable. The most logical place to use an outcall to MaaS is in a Stellar Enrichment bolt, but it also makes sense to use it in a Stellar field transformation in a Parser bolt.

Hope this is useful,
--Matt

On 6/6/17, 10:43 AM, "Casey Stella" <ceste...@gmail.com> wrote:

So, first off, it's not a basic question at all and thanks for asking it. I'm sure if it's not clear to you, then it's not clear to many and bears some reinforcement and clarification.

- Metron does indeed enable the deployment and use of machine learning models on data ingested into Metron
- Metron runs atop Hadoop (storm + kafka + hdfs + hbase), so you likely wouldn't run this successfully on a VM, but rather a cluster. We do support running Metron for demonstration purposes and development purposes inside a VM, but that's not a production configuration, I'd like to make clear.

Models deployed via MaaS can be interacted with via Stellar on data ingested into Metron under a couple caveats.

There are two ways to ingest data into Metron:

- Via a packet capture sensor (fastcapa) to Kafka to the pcap storm topology, which writes directly to HDFS with no preamble or enrichment
- Via another, lower velocity sensor (e.g. bro for deep packet inspection or yaf for flow data) which is routed to a parser topology, then to enrichment and finally to indexing

We do not, at present, support interacting with models (or, indeed, any enrichment) on raw packet data (the first case above). We do, however, support it for the second use case. The example at https://github.com/apache/metron/tree/master/metron-analytics/metron-maas-service#example demonstrates ingesting web proxy data and using a dummy machine learning model to pick out domains which are synthetic and likely to represent communication to a botnet (the DGA model in that example is crude and could easily be replaced with the example I posted earlier, btw).

Anyway, so for you to use your own ML model, you'd do the following:

1. Ingest the sensor data source that you want to use into a kafka topic
2. Create or reuse one of the existing parsers that we support to convert the data from your data source
3. Create your model (see https://gist.github.com/cestella/8dd83031b8898a732b6a5a60fce1b616 as an example)
4. Refer to your model from stellar
   1. In the example I mentioned, we're doing that at https://github.com/apache/metron/tree/master/metron-analytics/metron-maas-service#adjust-configurations-for-squid-to-call-model
   2. You might consider doing it in the enrichment topology, but to get you started, doing it as a field transformation as in the example should suffice

Hopefully that'll clear some things up. I'm about to give a talk about this next week at Dataworks summit, so I'll be sure to follow up here with the deck. There's also a blog post that will eventually be going out with this walked through more directly.
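
To make step 3 a little more concrete, here is a minimal sketch of what "a model with a REST API" can look like in Python. It is not the code from the gist or from the MaaS README. The `/apply` path, the `host` query parameter, the `is_malicious` field and the `score_domain` heuristic are all illustrative assumptions, and the scoring rule is a toy stand-in for a trained scikit-learn model. It uses only the standard library so it can be run anywhere; how the service is packaged, deployed to YARN and registered with MaaS is described in the README and is not shown here.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse
from urllib.request import urlopen


def score_domain(host):
    """Toy stand-in for a trained model (assumption, not a real DGA detector):
    flag long first labels that contain very few vowels."""
    name = host.split(".")[0].lower()
    if not name:
        return False
    vowels = sum(ch in "aeiou" for ch in name)
    return len(name) >= 12 and vowels / len(name) < 0.25


class ModelHandler(BaseHTTPRequestHandler):
    """Serves GET /apply?host=<domain> and answers with a small JSON document."""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/apply":  # hypothetical endpoint name
            self.send_error(404)
            return
        host = parse_qs(url.query).get("host", [""])[0]
        verdict = "malicious" if score_domain(host) else "legit"
        body = json.dumps({"is_malicious": verdict}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence per-request logging for this sketch


def start_server(port=0):
    """Start the service on localhost; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), ModelHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def query(server, host):
    """Call the running service the way any HTTP client would."""
    port = server.server_address[1]
    with urlopen(f"http://127.0.0.1:{port}/apply?host={host}") as resp:
        return json.load(resp)


if __name__ == "__main__":
    srv = start_server()
    print(query(srv, "example.com"))
    print(query(srv, "xkqzvbnwrtplmd.com"))
    srv.shutdown()
```

The point of the sketch is the separation of concerns: the model is an ordinary HTTP service that takes a few fields and returns JSON, and Metron only needs to call it from Stellar. Swapping the heuristic for a scikit-learn model would change `score_domain` and nothing else.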

If I missed something or if something isn't clear yet, I'll be sure to keep at it. :)

Best,

Casey

On Mon, Jun 5, 2017 at 1:21 PM, <sml...@libero.it> wrote:

> Hello Casey,
>
> your answer makes things somewhat clearer, but not completely.
>
> My question about ML models was because somewhere on the web I read that Metron comes with ML. But maybe it's better to say that it supports ML models.
>
> If I understood correctly, I can run Metron in a virtual machine connected to my network. With NIFI I can select the protocols/packets that I want to store (similar to what Wireshark does).
>
> Then, I do not understand how to feed the data into the ML algorithm.
>
> Could you explain a bit more, or point me to a tutorial that explains the implementation process?
>
> For example, suppose I have an SVM algorithm, developed in Python using scikit-learn, that I would like to test in Metron.
>
> How can I do that?
>
> Thank you and I'm sorry for the very basic question.
>
> Best Regards,
>
> Simone
>
> On 5 June 2017 at 18:45, Casey Stella <ceste...@gmail.com> wrote:
>
> We do not ship any ML models currently with metron, just the infrastructure to deploy your own models and interact with those models from within Metron. That being said, you might be interested in https://gist.github.com/cestella/8dd83031b8898a732b6a5a60fce1b616
> That's the code to take a DGA model written in scikit-learn from https://github.com/ClickSecurity/data_hacking/tree/master/dga_detection and make it suitable for deployment via MaaS.
>
> If you want more information about MaaS, I'll be giving a talk on it next week at DataWorks Summit and that deck will be public.
>
> On Mon, Jun 5, 2017 at 12:09 PM, <sml...@libero.it> wrote:
>
> Hello Simon,
>
> thank you for your prompt reply and for the link as well.
>
> I'm more comfortable with Python.
>
> May I ask whether there is any example in Python that I can use as a template to receive network packets and then implement the machine learning algorithm?
>
> Moreover, where can I find documentation about the ML algorithms already implemented in Metron?
>
> Best Regards,
>
> Simone
>
> On 5 June 2017 at 18:00, Simon Elliston Ball <si...@simonellistonball.com> wrote:
>
> Hi Simone, and welcome to the community.
>
> There are a number of extension points in Metron, the key ones being around machine learning. I suggest taking a look at https://github.com/apache/metron/tree/master/metron-analytics/metron-maas-service for more information about the model as a service. This is the bit that helps you add models in pretty much any language that will run in a yarn container (python, R and spark models are probably the most popular).
>
> Hope that helps, and looking forward to hearing more about your research, and any contributions you feel like adding to the community.
>
> Simon
>
> On 5 Jun 2017, at 16:54, sml...@libero.it wrote:
>
> Dear community,
>
> my name is Simone and I'm a researcher in the field of cybersecurity.
>
> I've just read about Apache Metron and I would like to ask:
>
> - Does it use machine learning or artificial intelligence?
> - Can I extend the machine learning algorithms already present in Metron with my own?
> - Which language do I have to use to extend Metron with my algorithms?
>
> Thank you.
>
> Best Regards,
>
> Simone