Daniël,

These are good questions, and an especially good example of a Real
World Application that might be created with HTM. I'll do my best to
answer below.

On Thu, Aug 20, 2015 at 12:17 PM, D. Ducro <[email protected]> wrote:
>
> First I’ll begin explaining our situation regarding the traffic data. We
> contacted the NDW, the company which is responsible for storing all traffic
> data in the Netherlands. They agreed to facilitate us with traffic
> speeds/flows and events like incidents or road works. We want to combine the
> traffic data with weather data and find anomalies, and hopefully try to
> predict incidents.

I'm not sure how you'll predict traffic accidents. From my research
with NYC traffic data, HTM can certainly identify that a traffic
incident has occurred and, with raw traffic flow sensor data, should
be able to identify the vicinity in which the accident took place. But
because accidents are by their nature anomalies in the traffic flow,
it is going to be hard to predict them.

It may be possible, however, to calculate some probability that an
accident will impede traffic based upon predictions of different
sensors. For example, for each sensor model, if you looked deeper into
the prediction probability distribution, you might be able to extract
the probability that traffic will slow significantly in the future,
even if that is not the most probable prediction.
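
Here is a minimal sketch of that idea. It assumes you can get each
model's prediction distribution as a mapping of predicted speed to
likelihood (NuPIC's multi-step inferences have roughly this shape, but
treat the exact structure as an assumption). The function name, the
example numbers, and the 50 km/h threshold are all made up for
illustration.

```python
def slowdown_probability(distribution, threshold):
    """Return the share of predicted likelihood at speeds below threshold.

    distribution: dict mapping predicted speed -> likelihood (assumed shape).
    """
    total = sum(distribution.values())
    if total <= 0:
        # No usable prediction, so report no slowdown risk.
        return 0.0
    slow = sum(p for speed, p in distribution.items() if speed < threshold)
    return slow / total


# Hypothetical distribution for one sensor, one step ahead.
example = {95.0: 0.70, 80.0: 0.15, 40.0: 0.10, 20.0: 0.05}
risk = slowdown_probability(example, threshold=50.0)
```

The most probable prediction here is still free-flowing traffic, but
the likelihood sitting below the threshold is the number you would
watch over time.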

> We’ll be attending at The World Port Hackathon (September 4-5) in Rotterdam.
> At this event we would like to build a prototype proving our hypothesis that
> traffic accidents can be predicted.
>
> Our data is from the past three years in an area of Rotterdam.
> We have the following data from about hundred measurement sites:
>
>       ●      traffic speeds / min. / lane / vehicle-type
>       ●      traffic intensity / min. / lane / vehicle-type
>       ●      traffic events e.g. incidents, road work and more.
>
> The weather data is available from each Dutch weather station per hour.
> We would like to combine the data from the weather station in Rotterdam with
> all the traffic data.
>
> This is how we think we should approach it using HTM Engine:
>
>       ●      Define a model with the following fields:
>             ○      average traffic speed (int)
>             ○      average traffic intensity (int)

(From my experience with traffic flow data, these two values might
contain almost exactly the same patterns. One is sort of a function
of the other. You can probably get away with only using one.)
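
One quick way to check this on your own data is to correlate the two
series for a sensor before deciding to drop one. This is only a sketch
with made-up numbers, and a Pearson correlation only catches linear
relationships, so a low value does not prove the fields are
independent.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant series carries no pattern to correlate with.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-minute readings from one measurement site.
speeds = [100, 95, 60, 30, 55, 90]
intensity = [22, 25, 58, 90, 66, 30]
r = pearson(speeds, intensity)
```

If the absolute value of r is close to 1 for most sites, the second
field is probably not adding much.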

>             ○      incident (close to this point) (boolean)
>             ○      horizontal visibility (in meters)
>             ○      rain (boolean)
>             ○      icing (boolean)
>             ○      snow (boolean)
>       ●      Create an api to communicate with HTM Engine
>       ●      Create a model for each data point

Before we get to the problems you have identified, there are missing
features in HTM Engine that rule out this approach for now. The
biggest one is that HTM Engine models currently work against only one
input field. This means that you can't create an HTM Engine model with
the 7 fields you defined above without changes to the HTM Engine
framework.

The second thing is that HTM Engine models are anomaly-only models.
Currently, they do not store model predictions, even though that data
is generated by NuPIC. The HTM Engine was built to do anomaly
detection, so this was left out.

The good news is that adding prediction would probably be pretty easy.
Creating models with multiple input fields may be harder.

> There are two main problems with this approach in our opinion.
>
> The first issue is that we don't take account of the traffic flow across
> multiple points/highways. We've been told that there is a strong relation in
> the flow between some specific highways. These patterns are known and we
> think we need to find a way to use these connections to improve the context
> of the accidents.

There are certainly traffic correlations between roads. But your goal
is to predict accidents, not react to them, correct? So the
correlations only come into play after an accident has occurred. For
example, say sensor-1-model shows anomalies because of a traffic
slow-down, so sensor-2-model can predict that it will also see slow
traffic if it is also using the sensor-1 data as an input for
predicting traffic flow at sensor-2. But if your goal is to predict
accidents before they happen, this doesn't help you. It only helps
predict the propagation of slow traffic outwards from an accident
site.

> The second issue concerns the different weather factors, which are different
> per season. We can make specific models for the winter and summer so it
> includes temperature in the summer and icing/snow in the winter.

You have 3 years of data, which might be enough for NuPIC models to
learn some yearly patterns. Perhaps you don't need to have
winter/summer models?

If you do end up creating models that use weather data as input for
creating predictions at the traffic sensor level, this will have much
more of a tangible impact on predictions anyway. That data will
effectively incorporate the yearly weather cycles.

> But what we’re very interested in is how we can “connect” the data from
> multiple measurement sites.
>
> Another approach for the model can be:
>
>       ●      average traffic speed (int)
>       ●      average traffic intensity (int)
>       ●      incident (close to this point) (boolean)
>       ●      horizontal visibility (in meters)
>       ●      rain (boolean)
>       ●      icing (boolean) (in winter)
>       ●      snow (boolean) (in winter)
>       ●      temperature (in summer)
>       ●      related point A average traffic speed (int)
>       ●      related point A average traffic intensity (int)
>       ●      related point B average traffic speed (int)
>       ●      related point B average traffic intensity (int)

Ah, now I see that you're including an "incident" indicating that a
traffic incident exists at this time within a certain distance. I
imagine this is going to be the predictedField?

But the usefulness of the "incident" depends strongly on the quality
of the data you get from the NDW. If they provide "incidents" that
include a timestamp for the *time reported*, that is much different
from the *time of occurrence*. This is very important because it's a
matter of cause and effect. The incident causes the traffic slowdowns
to occur. The incidents will also cause anomalies to occur in the
models. If you train your models to predict incidents, and each
incident's timestamp actually falls after the traffic slowdowns in
the surrounding area, this won't do well. These incident timestamps
MUST be the time of the incident occurrence, and must be accurate.
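
A rough sanity check you could run on the NDW data is sketched below:
for each incident, find when nearby speeds first dropped and see
whether the incident timestamp comes after that. The function names,
the threshold, and the sample readings are hypothetical, and a real
check would need to pick the right sensors and time window.

```python
from datetime import datetime, timedelta


def first_slowdown(readings, threshold):
    """Return the timestamp of the first reading below threshold, or None.

    readings: list of (timestamp, speed) tuples in time order.
    """
    for ts, speed in readings:
        if speed < threshold:
            return ts
    return None


def reported_after_slowdown(incident_ts, readings, threshold):
    """True if the incident timestamp is later than the slowdown onset."""
    onset = first_slowdown(readings, threshold)
    return onset is not None and incident_ts > onset


# Hypothetical per-minute speeds near one incident.
start = datetime(2015, 8, 20, 8, 0)
readings = [(start + timedelta(minutes=i), s)
            for i, s in enumerate([98, 96, 45, 30, 28])]
incident_ts = start + timedelta(minutes=4)
suspicious = reported_after_slowdown(incident_ts, readings, threshold=50)
```

If many incidents come back as suspicious, the timestamps are probably
report times rather than occurrence times.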

> This also isn’t the ideal approach in our opinion.
> Nupic probably won’t give accurate anomalies/predictions with so many
> properties.

I think it may be better to remove the "traffic intensity" fields and
try to simplify the weather fields into one or two fields instead of
5.
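
For example, the weather booleans could be collapsed into a single
ordinal field. The ordering and the 200 m visibility cutoff below are
assumptions I made up for illustration, not something validated
against traffic data, so adjust them to what you see in practice.

```python
def weather_severity(visibility_m, rain, icing, snow):
    """Collapse several weather readings into one ordinal score (0 to 3).

    The ordering (rain < snow or poor visibility < icing) is an assumption.
    """
    score = 0
    if rain:
        score = 1
    if snow or visibility_m < 200:
        score = 2
    if icing:
        score = 3
    return score


# Hypothetical hourly readings from the Rotterdam weather station.
clear_day = weather_severity(10000, rain=False, icing=False, snow=False)
icy_rain = weather_severity(5000, rain=True, icing=True, snow=False)
```

A single scalar like this is easier to encode than five separate
fields, at the cost of losing which condition caused the score.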

>
> We thought of a third option, but we’re not sure how to approach it.
> What if we make separate smaller models per measurement point, this way we
> can find out which performs the best, something like:
>
>       ●      model A:
>             ○      average traffic speed (int)
>       ●      model B:
>             ○      average traffic intensity (int)
>       ●      model C
>             ○      average traffic speed (int)
>             ○      average traffic intensity (int)
>       ●      model X
>             ○      model A,B,C with certain weather data
>
> Then swarm on anomaly scores (and maybe with raw input data) of different
> sites to find relations between measurement sites. Then use these models
> with incident data to predict them.
>
> The last option is probably the most difficult, but could be the most
> promising.

So you are saying that model X will have the anomaly scores from
models A,B,C as input fields? That is interesting, and I've never seen
anyone do it before. I have no idea how well it would work.

> What are your thoughts? Any input is greatly appreciated.

To do what you are trying to do, you're going to need to run lots of
models, that is for sure. I also think you'll probably need to run
models with more than one input field if you want to incorporate
weather data. You'll also need to get predictions out of the models.

You will need to either:
  1. Use HTM Engine
  2. Build a custom solution for running many models at once that uses
     the NuPIC OPF or Network API directly

If you choose #1, this also means you'll need to:
- Update HTM Engine to output predictions (probably trivial)
- Update HTM Engine to allow multiple input fields (probably not trivial)
- Allow day_of_week and weekend encoding options [1]

This work will need to be done by you and your team as contributors to
Numenta open source projects. I will of course try to guide you along,
but you will need to do the work to file the proper feature requests,
create the pull requests, etc, following our development process [2].

> Beside the approaches we have some smaller questions.
>
> We would like to start with the skeleton-htmengine-app, expand the api so it
> excepts multiple, different kind of models and build a webapp interfacing
> with the api.

Great, so you are already thinking about #1 above. :)

> We couldn’t find a lot of documentation regarding the HTM Engine.
>       ●      Can we swarm subsets to create model params using HTM Engine?
>             ○      This way we can try multiple model params for models with
> different values, or do we have to create them beforehand?

No, but that is a good idea. It would be nice to be able to create a
model through the HTM Engine with more than just a min/max value, but
with a full set of model parameters from a swarm.

>       ●      I have little knowledge about HTM Engine
>             ○      Can you give some info about the services
> (anomaly_service, metric_listener, metric_storer, model_scheduler) and if we
> need them, how to interface with them?

As you know, this code was very recently made open source. We don't
have a lot of documentation for it at this point, but you can email
this list with questions about it. I will point the right people to
this email so they know you may have questions in the future.
Hopefully we can build up proper documentation over time. Now that
people want to use the HTM Engine, it makes sense to try to put
together some better documentation. I'll bring this up with Jared
Casner, the numenta-apps project manager.

>       ●      Do you have any thoughts about “combining/connecting/merging”
> different traffic points for the best accident prediction?

I'm not really sure this is necessary for what you want to do. As I
said above, if you are trying to predict traffic accidents, I don't
think training one model on many traffic sensors will help because of
the unpredictable occurrences of accidents.

> We understand these are a lot of questions. Therefor we would be very
> grateful if you are able to find the time to answer them. Thank you again.

You are welcome. Whew!

[1] https://github.com/numenta/numenta-apps/issues/104
[2] https://github.com/numenta/nupic/wiki/Development-Process

---------
Matt Taylor
OS Community Flag-Bearer
Numenta
