[ 
https://issues.apache.org/jira/browse/METRON-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Casey Stella updated METRON-265:
--------------------------------
    Comment: was deleted

(was: James,

Let me answer these inline:

> 1. We need to build logic into a bolt so that it discovers and uses the model 
> service that is local to it (on the same node). So there should be a 
> requirement for every node that carries a storm ML bolt to also have the ML 
> scoring service

Agreed, we should prefer local execution wherever possible, but we may have 
many models and each of them could be rather large (tree-based models can get 
pretty chunky memory-wise), so requiring that a Storm node house *every* model 
might be difficult.  Also, from a technical perspective, since we are using 
Yarn to manage the deployment of the model services, we may be limited to 
whatever locality Yarn labels allow us.  I would hope that intelligent caching 
of results within the Stellar transformation function can alleviate much of 
the transport burden, and I think we can do a lot to ensure a high probability 
of local access.

> 2. We would want caching on the bolt itself so a cache hit means there is no 
> call to the model service at all. If you put a cache on the model side that 
> means that the call still needs to be made

Definitely, agreed. I'll make that clearer in the next revision of the doc (EOD 
today).
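
To make that concrete, here's a rough sketch of the sort of bolt-side cache 
I'm picturing (Guava is just one option for the cache, and ModelClient is a 
placeholder name rather than an agreed-upon API):

{code:java}
// Sketch of bolt-side result caching: a cache hit never touches the model
// service at all.  ModelClient is hypothetical.
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.Map;
import java.util.concurrent.TimeUnit;

public class CachingScorer {

  public interface ModelClient {
    Double score(Map<String, Object> input) throws Exception;
  }

  private final LoadingCache<Map<String, Object>, Double> cache;

  public CachingScorer(final ModelClient client) {
    this.cache = CacheBuilder.newBuilder()
        .maximumSize(10000)                    // bound the bolt's memory use
        .expireAfterWrite(5, TimeUnit.MINUTES) // pick up refreshed models
        .build(new CacheLoader<Map<String, Object>, Double>() {
          @Override
          public Double load(Map<String, Object> input) throws Exception {
            // Only executed on a cache miss.
            return client.score(input);
          }
        });
  }

  public double score(Map<String, Object> input) {
    return cache.getUnchecked(input);
  }
}
{code}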

> 3. I think the way this needs to work is that you train your model once on a 
> global data set to come up with a global model and then push the model out to 
> the scoring service which is local to the ML scoring bolt. The bolt needs to 
> reference zookeeper to figure out which service is local to it or 
> alternatively assume that it always has a local service and connect to it

So, as it stands, the architecture suggests the following workflow:
* Model is trained and uploaded to HDFS
* New model is registered within Zookeeper (sketched below)
* Each REST endpoint that services some version of that model loads the new 
model from HDFS (and possibly unloads older models depending on configuration).
* Storm bolts that are local to the REST endpoints that service models prefer 
those endpoints over ones across the cluster.
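
For illustration, the registration step might look something like this with 
Curator; the znode path and metadata fields are placeholders, not a settled 
schema:

{code:java}
// Hypothetical registration of a newly trained model in Zookeeper.
// The path layout and metadata fields are illustrative only.
import java.nio.charset.StandardCharsets;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class RegisterModel {
  public static void main(String[] args) throws Exception {
    CuratorFramework client = CuratorFrameworkFactory.newClient(
        "zk1:2181", new ExponentialBackoffRetry(1000, 3));
    client.start();

    // One znode per model + version, pointing at the model blob in HDFS and
    // the REST endpoints currently serving it.
    String path = "/metron/models/dga_detector/1.0";
    String metadata = "{ \"hdfs_path\" : \"/apps/metron/models/dga_detector/1.0\""
                    + ", \"endpoints\" : [\"http://node1:8085/score\"] }";
    client.create()
          .creatingParentsIfNeeded()
          .forPath(path, metadata.getBytes(StandardCharsets.UTF_8));
    client.close();
  }
}
{code}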

Is that sufficiently close in your mind?

> 4. If we are using Stellar for scoring then we may be able to get away with 
> something faster than REST. REST is good for abstracting the complexity of 
> the call to the model. But if we use Stellar to abstract that call we can 
> potentially get away with something as basic as a socket thread pool to the 
> service. Thoughts? We probably still need to provide a REST endpoint for 
> testing and validating the model

So this came up on the mailing list discussion too.  I don't mind using 
alternative communication models, honestly.  My choice of REST was mostly due 
to the fact that I want to support multiple languages (I think that's a very 
important requirement) and there are REST libraries for pretty much everything.  
I do agree, however, that JSON transport can get chunky.  How about a 
compromise: use REST, but encode the input and output payloads for scoring as 
Maps in [msgpack|http://msgpack.org/index.html] rather than JSON?  There is a 
msgpack library for almost every language out there and certainly all of the 
ones we'd like to target.
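
For a sense of scale on the Java side (the fields below are made up), packing 
and unpacking a scoring payload with msgpack looks roughly like this:

{code:java}
// Rough sketch of a msgpack-encoded scoring payload (msgpack-core).
// The fields ("domain", "length") are made-up examples.
import org.msgpack.core.MessageBufferPacker;
import org.msgpack.core.MessagePack;
import org.msgpack.core.MessageUnpacker;
import org.msgpack.value.Value;

public class MsgpackPayload {
  public static void main(String[] args) throws Exception {
    // Encode the input Map { "domain" : "badsite.com", "length" : 11 }
    MessageBufferPacker packer = MessagePack.newDefaultBufferPacker();
    packer.packMapHeader(2);
    packer.packString("domain").packString("badsite.com");
    packer.packString("length").packInt(11);
    byte[] payload = packer.toByteArray();
    packer.close();

    // Decode on the model service side, in whatever language it is written.
    MessageUnpacker unpacker = MessagePack.newDefaultUnpacker(payload);
    int entries = unpacker.unpackMapHeader();
    for (int i = 0; i < entries; i++) {
      String key = unpacker.unpackString();
      Value value = unpacker.unpackValue();
      System.out.println(key + " = " + value);
    }
    unpacker.close();
  }
}
{code}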

The other option is to just create and expose protobuf bindings (thrift doesn't 
have a native client for R) for all of the languages that we want to support.  
I'm perfectly fine with that, but I had some worries about the maturity of the 
bindings.

The final option, as you suggest, is to just use raw sockets.  I think if we 
went that route, we might have to create a layer for each language rather than 
relying on model creators to write their own TCP servers.  I thought that 
might be a bit onerous for an MVP.

Given the discussion, though, what it has made me aware of is that we might 
not want to dictate a transport mechanism at all, but rather make the 
transport pluggable and extensible: each model would be associated with a 
transport mechanism handler that knows how to communicate with it, and we 
would provide default handlers for msgpack over REST, JSON over REST, and 
maybe msgpack over raw TCP.
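
Roughly, I'm picturing an interface along these lines (all names here are 
strawmen):

{code:java}
import java.util.Map;

// Strawman for a pluggable transport: each registered model names the handler
// that knows how to talk to its service instances.
public interface ModelTransportHandler {
  // Score one message against the model instance at the given endpoint.
  Map<String, Object> score(String endpoint, Map<String, Object> input)
      throws Exception;
}

// Default handlers we'd ship: msgpack over REST, JSON over REST, and maybe
// msgpack over raw TCP.
{code}

Thoughts?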

)

> Provide Model as a Service infrastructure to Metron
> ---------------------------------------------------
>
>                 Key: METRON-265
>                 URL: https://issues.apache.org/jira/browse/METRON-265
>             Project: Metron
>          Issue Type: New Feature
>            Reporter: Casey Stella
>            Assignee: Casey Stella
>             Fix For: 0.2.1BETA
>
>         Attachments: Model Management Infrastructure in Metron.docx
>
>
> One of the main features envisioned and requested is the ability to augment 
> the threat intelligence and enrichment processes with insights derived from 
> machine learning or statistical models.  The challenges with this sort of 
> infrastructure are:
> * Applying the model may be sufficiently computationally/resource intensive 
> that we need to support scaling via load balancing, which will require 
> service discovery and management.
> * Models require out-of-band and frequent training to react to growing 
> threats and new patterns that emerge.
> * Models should be language/environment agnostic as much as possible.  These 
> should include small-data and big-data libraries and languages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
