Daniël,

So how did the hackathon go?



---------
Matt Taylor
OS Community Flag-Bearer
Numenta

On Wed, Sep 2, 2015 at 2:13 AM, D. Ducro <[email protected]> wrote:

> Hey Matt,
>
>
> Thank you very much for taking the time to help us with our questions.
>
> Your answers are very helpful.
>
>
> We imported lots of measurement and event data into mongo and mysql
> databases. The Hackathon will be held this weekend.
>
>
> We slightly changed our plans for this weekend. We’ll be focussing on
> anomaly detection on the measurement data and evaluating it against the
> event data. We want to pinpoint the time and location of an accident when
> it happens.
>
>
> We don’t have the time to expand the HTM Engine right now. If we can
> continue the project after the hackathon, we can look into the HTM Engine
> modifications.
>
>
> I’ll try to send some results after the hackathon.
>
>
> Thanks,
> —
> Daniël Ducro
> Pionect
>
>> Daniël,
>>
>> These are good questions, and an especially good example of a Real
>> World Application that might be created with HTM. I'll do my best to
>> answer below.
>>
>> On Thu, Aug 20, 2015 at 12:17 PM, D. Ducro <[email protected]> wrote:
>> >
>> > First I’ll begin by explaining our situation regarding the traffic data.
>> > We contacted the NDW, the company which is responsible for storing all
>> > traffic data in the Netherlands. They agreed to provide us with traffic
>> > speeds/flows and events like incidents or road works. We want to combine
>> > the traffic data with weather data and find anomalies, and hopefully try
>> > to predict incidents.
>>
>> I'm not sure how you'll predict traffic accidents. From my research
>> with NYC traffic data, HTM can certainly identify that a traffic
>> incident has occurred, and with raw traffic flow sensor data, should
>> be able to identify the vicinity in which the accident took place. But
>> because accidents are by their nature anomalies in the traffic flow,
>> it is going to be hard to predict them before they happen.
>>
>> It may be possible, however, to calculate some probability that an
>> accident will impede traffic based upon predictions of different
>> sensors. For example, for each sensor model, if you looked deeper into
>> the prediction probability distribution, you might be able to extract
>> the probability that traffic will slow significantly in the future,
>> even if that is not the most probable prediction.
>>
>> > We’ll be attending The World Port Hackathon (September 4-5) in
>> > Rotterdam. At this event we would like to build a prototype proving our
>> > hypothesis that traffic accidents can be predicted.
>> >
>> > Our data covers the past three years in an area of Rotterdam.
>> > We have the following data from about a hundred measurement sites:
>> >
>> > ● traffic speeds / min. / lane / vehicle-type
>> > ● traffic intensity / min. / lane / vehicle-type
>> > ● traffic events e.g. incidents, road work and more.
>> >
>> > The weather data is available from each Dutch weather station per hour.
>> > We would like to combine the data from the weather station in Rotterdam
>> > with all the traffic data.
>> >
>> > This is how we think we should approach it using HTM Engine:
>> >
>> > ● Define a model with the following fields:
>> > ○ average traffic speed (int)
>> > ○ average traffic intensity (int)
>>
>> (From my experience with traffic flow data, these two values might
>> contain almost exactly the same patterns. One is sort of a function
>> of the other. You can probably get away with only using one.)
>>
>> > ○ incident (close to this point) (boolean)
>> > ○ horizontal visibility (in meters)
>> > ○ rain (boolean)
>> > ○ icing (boolean)
>> > ○ snow (boolean)
>> > ● Create an api to communicate with HTM Engine
>> > ● Create a model for each data point
>>
>> Before we get to the problems you have identified, there are missing
>> features in HTM Engine that will prevent this approach. The biggest
>> one is that HTM Engine models currently only work against 1 input
>> field. This means that you can't create an HTM Engine model with 7
>> fields as you defined above without changes to the HTM Engine
>> framework.
>>
>> The second thing is that HTM Engine models are anomaly-only models.
>> Currently, they do not store model predictions, even though that data
>> is generated by NuPIC. The HTM Engine was built to do anomaly
>> detection, so this was left out.
>>
>> The good news is that adding prediction would probably be pretty easy.
>> Creating models with multiple input fields may be harder.
>>
>> > There are two main problems with this approach in our opinion.
>> >
>> > The first issue is that we don't take account of the traffic flow
>> > across multiple points/highways. We've been told that there is a strong
>> > relation in the flow between some specific highways. These patterns are
>> > known and we think we need to find a way to use these connections to
>> > improve the context of the accidents.
>>
>> There are certainly traffic correlations between roads. But your goal
>> is to predict accidents, not react to them, correct? So the
>> correlations only come into play after an accident has occurred. For
>> example, say sensor-1-model shows anomalies because of a traffic
>> slow-down, so sensor-2-model can predict that it will also see slow
>> traffic if it is also using the sensor-1 data as an input for
>> predicting traffic flow at sensor-2. But if your goal is to predict
>> accidents before they happen, this doesn't help you. It only helps
>> predict the propagation of slow traffic outwards from an accident
>> site.
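
To make the sensor-1/sensor-2 example concrete, here is a rough sketch of how rows for a sensor-2 model could carry an earlier sensor-1 reading as an extra input. The function, the dict-based series, and the 2-minute lag are illustrative assumptions, not part of HTM Engine.

```python
def with_upstream(sensor2, sensor1, lag_minutes):
    """Pair each sensor-2 reading with the sensor-1 reading lag_minutes earlier.

    sensor1, sensor2: dict mapping minute index -> average speed
    (hypothetical input format). Returns a list of
    (minute, sensor2_speed, lagged_sensor1_speed) tuples, skipping minutes
    for which no lagged sensor-1 reading exists.
    """
    rows = []
    for minute in sorted(sensor2):
        upstream = sensor1.get(minute - lag_minutes)
        if upstream is not None:
            rows.append((minute, sensor2[minute], upstream))
    return rows


# Made-up data: the slowdown hits sensor 1 at minute 2 and sensor 2 at minute 4.
sensor1 = {0: 100, 1: 95, 2: 40, 3: 35}
sensor2 = {2: 98, 3: 97, 4: 45, 5: 38}
rows = with_upstream(sensor2, sensor1, lag_minutes=2)
```

Each row shows the slowdown at sensor 1 ahead of the one at sensor 2, which is propagation of an existing slowdown rather than prediction of the accident itself.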
>>
>> > The second issue concerns the different weather factors, which are
>> > different per season. We can make specific models for the winter and
>> > summer so it includes temperature in the summer and icing/snow in the
>> > winter.
>>
>> You have 3 years of data, which might be enough for NuPIC models to
>> learn some yearly patterns. Perhaps you don't need to have
>> winter/summer models?
>>
>> If you do end up creating models that use weather data as input for
>> creating predictions at the traffic sensor level, this will have much
>> more of a tangible impact on predictions anyway. That data will
>> effectively incorporate the yearly weather cycles.
>>
>> > But what we’re very interested in is how we can “connect” the data from
>> > multiple measurement sites.
>> >
>> > Another approach for the model can be:
>> >
>> > ● average traffic speed (int)
>> > ● average traffic intensity (int)
>> > ● incident (close to this point) (boolean)
>> > ● horizontal visibility (in meters)
>> > ● rain (boolean)
>> > ● icing (boolean) (in winter)
>> > ● snow (boolean) (in winter)
>> > ● temperature (in summer)
>> > ● related point A average traffic speed (int)
>> > ● related point A average traffic intensity (int)
>> > ● related point B average traffic speed (int)
>> > ● related point B average traffic intensity (int)
>>
>> Ah, now I see that you're including an "incident" indicating that a
>> traffic incident exists at this time within a certain distance. I
>> imagine this is going to be the predictedField?
>>
>> But the usefulness of the "incident" depends strongly on the quality
>> of the data you get from the NDW. If they provide "incidents" that
>> include a timestamp for the *time reported*, that is much different
>> from the *time of occurrence*. This is very important because it's a
>> matter of cause and effect. The incident causes the traffic slowdowns
>> to occur. The incidents will also cause anomalies to occur in the
>> models. If you train your models to predict incidents, and the
>> timestamp for each incident actually falls after the traffic
>> slowdowns in the surrounding area, this won't work well. These incident
>> timestamps MUST be the time of the incident occurrence, and must be
>> accurate.
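
One way to sanity-check the incident timestamps is sketched below: for a reported incident, walk backwards through the per-minute speeds to see how long traffic had already been slow. The function name, the threshold, and the sample data are hypothetical.

```python
def minutes_slow_before_report(speeds, reported_minute, slow_threshold):
    """Return how many minutes traffic was already slow before the report.

    speeds: list of per-minute average speeds, where the index is the minute
    (hypothetical input format). Walks backwards from reported_minute while
    the speed stays below slow_threshold. Returns 0 if traffic was not slow
    at the reported time or only became slow in that same minute.
    """
    minute = reported_minute
    while minute >= 0 and speeds[minute] < slow_threshold:
        minute -= 1
    onset = minute + 1  # first minute of the continuous slow stretch
    return max(0, reported_minute - onset)


# Made-up series: speed collapses at minute 2, incident is logged at minute 5.
speeds = [100, 98, 45, 40, 38, 42]
lag = minutes_slow_before_report(speeds, reported_minute=5, slow_threshold=50)
```

A consistently positive lag across many incidents would suggest the timestamps record the time reported rather than the time of occurrence.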
>>
>> > This also isn’t the ideal approach in our opinion.
>> > NuPIC probably won’t give accurate anomalies/predictions with so many
>> > properties.
>>
>> I think it may be better to remove the "traffic intensity" fields and
>> try to simplify the weather fields into one or two fields instead of
>> 5.
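
As a sketch of what that simplification could look like, the three precipitation flags can be collapsed into one categorical field, leaving visibility (and optionally temperature) as the only other weather inputs. The category names and the priority order (icing over snow over rain) are assumptions for illustration.

```python
def weather_category(rain, icing, snow):
    """Collapse three boolean weather flags into one categorical value.

    The priority order icing > snow > rain is an assumption: when several
    flags are set, the condition presumed most disruptive wins.
    """
    if icing:
        return "icing"
    if snow:
        return "snow"
    if rain:
        return "rain"
    return "dry"


# Made-up hourly weather records: (rain, icing, snow).
records = [(False, False, False), (True, False, False), (True, False, True)]
categories = [weather_category(*r) for r in records]
```

A single category field like this could then be passed to a category-style encoder instead of three separate boolean fields.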
>>
>> >
>> > We thought of a third option, but we’re not sure how to approach it.
>> > What if we make separate smaller models per measurement point? This way
>> > we can find out which performs the best, something like:
>> >
>> > ● model A:
>> > ○ average traffic speed (int)
>> > ● model B:
>> > ○ average traffic intensity (int)
>> > ● model C
>> > ○ average traffic speed (int)
>> > ○ average traffic intensity (int)
>> > ● model X
>> > ○ model A,B,C with certain weather data
>> >
>> > Then swarm on anomaly scores (and maybe with raw input data) of
>> > different sites to find relations between measurement sites. Then use
>> > these models with incident data to predict them.
>> >
>> > The last option is probably the most difficult, but could be the most
>> > promising.
>>
>> So you are saying that model X will have the anomaly scores from
>> models A,B,C as input fields? That is interesting, and I've never seen
>> anyone do it before. I have no idea how well it would work.
>>
>> > What are your thoughts? Any input is greatly appreciated.
>>
>> To do what you are trying to do, you're going to need to run lots of
>> models, that is for sure. I also think you'll probably need to run
>> models with more than one input field if you want to incorporate
>> weather data. You'll also need to get predictions out of the models.
>>
>> You will need to either:
>> 1. Use HTM Engine
>> 2. Build a custom solution for running many models at once that uses
>> NuPIC OPF or Network API directly
>>
>> If you choose #1, this also means you'll need to:
>> - Update HTM Engine to output predictions (probably trivial)
>> - Update HTM Engine to allow multiple input fields (probably not trivial)
>> - Allow day_of_week and weekend encoding options [1]
>>
>> This work will need to be done by you and your team as contributors to
>> Numenta open source projects. I will of course try to guide you along,
>> but you will need to do the work to file the proper feature requests,
>> create the pull requests, etc, following our development process [2].
>>
>> > Besides the approaches, we have some smaller questions.
>> >
>> > We would like to start with the skeleton-htmengine-app, expand the API
>> > so it accepts multiple, different kinds of models and build a webapp
>> > interfacing with the API.
>>
>> Great, so you are already thinking about #1 above. :)
>>
>> > We couldn’t find a lot of documentation regarding the HTM Engine.
>> > ● Can we swarm subsets to create model params using HTM Engine?
>> > ○ This way we can try multiple model params for models with
>> > different values, or do we have to create them beforehand?
>>
>> No, but that is a good idea. It would be nice to be able to create a
>> model through the HTM Engine with more than just a min/max value, but
>> with a full set of model parameters from a swarm.
>>
>> > ● I have little knowledge about HTM Engine
>> > ○ Can you give some info about the services
>> > (anomaly_service, metric_listener, metric_storer, model_scheduler) and,
>> > if we need them, how to interface with them?
>>
>> As you know, this code was very recently made open source. We don't
>> have a lot of documentation for it at this point, but you can email
>> this list with questions about it. I will point the right people to
>> this email so they know you may have questions in the future.
>> Hopefully we can build up proper documentation over time. Now that
>> people want to use the HTM Engine, it makes sense to try to put
>> together some better documentation. I'll bring this up with Jared
>> Casner, the numenta-apps project manager.
>>
>> > ● Do you have any thoughts about “combining/connecting/merging”
>> > different traffic points for the best accident prediction?
>>
>> I'm not really sure this is necessary for what you want to do. As I
>> said above, if you are trying to predict traffic accidents, I don't
>> think training one model on many traffic sensors will help because of
>> the unpredictable occurrences of accidents.
>>
>> > We understand these are a lot of questions. Therefore we would be very
>> > grateful if you are able to find the time to answer them. Thank you
>> > again.
>>
>> You are welcome. Whew!
>>
>> [1] https://github.com/numenta/numenta-apps/issues/104
>> [2] https://github.com/numenta/nupic/wiki/Development-Process
>>
>> ---------
>> Matt Taylor
>> OS Community Flag-Bearer
>> Numenta
>>
>>
