The idea would be to use the online optimizer: first train the model on a 
whole day’s worth of data to establish an initial model, surfacing anomalies 
within that first day. From then on, mini-batches would be brought in (in near 
real time) to further train the model and evaluate the most recent anomalies. 
Do you have thoughts on this topic, Giacomo? Are you hoping to contribute?
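
For concreteness, the core of that online scheme is a decaying-weight blend of 
each mini-batch’s sufficient statistics into the running model (the Hoffman et 
al. update that Spark’s OnlineLDAOptimizer implements). Below is a minimal 
Python sketch of just that blending step, with toy numbers and hypothetical 
helper names — this is not Spark’s API, only the update rule:

```python
# Sketch of the online (mini-batch) update at the heart of online LDA:
# after mini-batch t, blend its statistics into the running topic-word
# parameters with a decaying weight rho_t = (tau0 + t) ** (-kappa).

def learning_rate(t, tau0=1024.0, kappa=0.51):
    """Decaying weight for mini-batch t; kappa in (0.5, 1] for convergence."""
    return (tau0 + t) ** (-kappa)

def online_update(params, batch_stats, t):
    """Weighted average of old parameters and the new batch's statistics."""
    rho = learning_rate(t)
    return [(1.0 - rho) * old + rho * new
            for old, new in zip(params, batch_stats)]

# Day-one model "foothold", then near-real-time mini-batches refine it:
params = [1.0, 2.0, 3.0]  # stand-in for the topic-word matrix lambda
for t, batch in enumerate([[1.5, 2.5, 2.0], [0.5, 1.0, 4.0]], start=1):
    params = online_update(params, batch, t)
```

Because rho_t shrinks over time, later mini-batches nudge rather than 
overwrite the model, which is what lets the day-one foothold persist.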

Brandon

On 6/20/17, 10:01 AM, "Giacomo Bernardi" <[email protected]> wrote:

    Thanks.
    I wasn't referring to extra time-based series, but to the topic
    modelling and anomaly detection itself. So, the plan is to use
    OnlineLDAOptimizer with mini-batches of the last (few?) minutes, then?
    
    G.
    
    
    On 20 June 2017 at 17:45, Edwards, Brandon <[email protected]> wrote:
    > Giacomo,
    > Spark has an online optimizer for LDA, which enables using LDA in a
    > mini-batch or streaming use case. However, if you are talking about
    > machine learning that looks for anomalies involving time-based
    > features, that is something we would like to explore. It’s on the road
    > map but is not being worked on right now. We have thought of adding
    > new time-based features to the LDA model, and/or training additional
    > time-series models to be combined with LDA in a model ensemble.
    > Brandon
    >
    > On 6/20/17, 8:58 AM, "Giacomo Bernardi" <[email protected]> wrote:
    >
    >     Hi Brandon and all,
    >     I'm resuming this thread to check whether any thought has already been
    >     given to such "streaming use case".
    >
    >     Are you planning to somehow use streaming-LDA in that case too? Or
    >     something different (fancy RNNs? HTM?) to model the state of each IP?
    >
    >     Thanks,
    >     Giacomo
    >
    >
    >     On 25 May 2017 at 18:27, Edwards, Brandon <[email protected]> wrote:
    >
    >     > The Spot team feels that changes are needed to this ‘feedback’
    >     > functionality, and sees these changes as happening concurrently
    >     > with improvements to how context from an LDA model trained on a
    >     > given batch of data is carried forward to the next training run
    >     > (or even to training in a streaming use case). The value of
    >     > ‘feedback’ depends on the quality of the model context we can
    >     > carry over.
    >
    >
    
