Hey Matt,
Thank you very much for taking the time to help us with our questions. Your answers are very helpful.

We imported lots of measurement and event data into MongoDB and MySQL databases. The hackathon will be held this weekend, and we have slightly changed our plans for it. We’ll be focusing on anomaly detection on the measurement data and evaluating it against the event data. We want to point out the time of occurrence and location of an accident when it happens.

We don’t have the time to expand the HTM Engine right now. If we can continue the project after the hackathon, we can look into the HTM Engine modifications.

I’ll try to send some results after the hackathon.

Thanks,
Daniël Ducro
Pionect

Daniël,

These are good questions, and an especially good example of a Real World Application that might be created with HTM. I'll do my best to answer below.

On Thu, Aug 20, 2015 at 12:17 PM, D. Ducro <[email protected]> wrote:
>
> First I’ll begin explaining our situation regarding the traffic data. We
> contacted the NDW, the company which is responsible for storing all traffic
> data in the Netherlands. They agreed to facilitate us with traffic
> speeds/flows and events like incidents or road works. We want to combine the
> traffic data with weather data and find anomalies, and hopefully try to
> predict incidents.

I'm not sure how you'll predict traffic accidents. From my research with NYC traffic data, HTM can certainly identify that a traffic incident has occurred, and with raw traffic flow sensor data, should be able to identify the vicinity in which the accident took place. But because accidents are by their nature anomalies in the traffic flow, it is going to be hard to predict them. It may be possible, however, to calculate some probability that an accident will impede traffic based upon predictions of different sensors.
For example, for each sensor model, if you looked deeper into the prediction probability distribution, you might be able to extract the probability that traffic will slow significantly in the future, even if that is not the most probable prediction.

> We’ll be attending The World Port Hackathon (September 4-5) in Rotterdam.
> At this event we would like to build a prototype proving our hypothesis that
> traffic accidents can be predicted.
>
> Our data is from the past three years in an area of Rotterdam.
> We have the following data from about a hundred measurement sites:
>
> ● traffic speeds / min. / lane / vehicle-type
> ● traffic intensity / min. / lane / vehicle-type
> ● traffic events e.g. incidents, road work and more.
>
> The weather data is available from each Dutch weather station per hour.
> We would like to combine the data from the weather station in Rotterdam with
> all the traffic data.
>
> This is how we think we should approach it using HTM Engine:
>
> ● Define a model with the following fields:
>   ○ average traffic speed (int)
>   ○ average traffic intensity (int)

(From my experience with traffic flow data, these two values might contain almost exactly the same patterns. One is sort of a function of the other. You can probably get away with only using one.)

>   ○ incident (close to this point) (boolean)
>   ○ horizontal visibility (in meters)
>   ○ rain (boolean)
>   ○ icing (boolean)
>   ○ snow (boolean)
> ● Create an api to communicate with HTM Engine
> ● Create a model for each data point

Before we get to the problems you have identified, there are missing features in HTM Engine that will disallow this approach. The biggest one is that currently HTM Engine models only work against 1 input field. This means that you can't create an HTM Engine model with 7 fields as you defined above without changes to the HTM Engine framework. The second thing is that HTM Engine models are anomaly-only models.
Currently, they do not store model predictions, even though that data is generated by NuPIC. The HTM Engine was built to do anomaly detection, so this was left out. The good news is that adding prediction would probably be pretty easy. Creating models with multiple input fields may be harder.

> There are two main problems with this approach in our opinion.
>
> The first issue is that we don't take account of the traffic flow across
> multiple points/highways. We've been told that there is a strong relation in
> the flow between some specific highways. These patterns are known and we
> think we need to find a way to use these connections to improve the context
> of the accidents.

There are certainly traffic correlations between roads. But your goal is to predict accidents, not react to them, correct? So the correlations only come into play after an accident has occurred. For example, say sensor-1-model shows anomalies because of a traffic slow-down. Then sensor-2-model can predict that it will also see slow traffic, provided it is also using the sensor-1 data as an input for predicting traffic flow at sensor-2. But if your goal is to predict accidents before they happen, this doesn't help you. It only helps predict the propagation of slow traffic outwards from an accident site.

> The second issue concerns the different weather factors, which are different
> per season. We can make specific models for the winter and summer so it
> includes temperature in the summer and icing/snow in the winter.

You have 3 years of data, which might be enough for NuPIC models to learn some yearly patterns. Perhaps you don't need to have winter/summer models? If you do end up creating models that use weather data as input for creating predictions at the traffic sensor level, this will have much more of a tangible impact on predictions anyway. That data will effectively incorporate the yearly weather cycles.
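As a rough illustration of using weather data as model input, here is a minimal sketch of aligning the hourly weather readings with the per-minute traffic rows before they reach a model. It is plain Python and does not touch HTM Engine or NuPIC. The field names (timestamp, avg_speed, visibility_m, rain, snow), the function attach_weather, and the sample values are all made up for illustration, and folding rain and snow into one flag is just one possible choice.

```python
from datetime import datetime


def hour_key(ts):
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def attach_weather(traffic_rows, weather_rows):
    """Add the matching hourly weather reading to each per-minute traffic row.

    Both arguments are lists of dicts with a 'timestamp' key (hypothetical
    layout). Traffic rows with no weather reading for their hour are skipped.
    """
    weather_by_hour = {hour_key(w["timestamp"]): w for w in weather_rows}
    combined = []
    for row in traffic_rows:
        weather = weather_by_hour.get(hour_key(row["timestamp"]))
        if weather is None:
            continue  # no weather observation for this hour
        merged = dict(row)
        merged["visibility_m"] = weather["visibility_m"]
        # Illustrative choice: fold rain and snow into a single flag.
        merged["precipitation"] = weather["rain"] or weather["snow"]
        combined.append(merged)
    return combined


# Invented sample data, only to show the shape of the inputs.
traffic = [
    {"timestamp": datetime(2015, 1, 5, 8, 15), "avg_speed": 87},
    {"timestamp": datetime(2015, 1, 5, 9, 2), "avg_speed": 41},
]
weather = [
    {"timestamp": datetime(2015, 1, 5, 8, 0), "visibility_m": 4000,
     "rain": False, "snow": False},
    {"timestamp": datetime(2015, 1, 5, 9, 0), "visibility_m": 900,
     "rain": False, "snow": True},
]
rows = attach_weather(traffic, weather)
```

Each combined row then carries the traffic value plus the weather fields for the same hour, which is the shape a multi-field model would need.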
> But what we’re very interested in is how we can “connect” the data from
> multiple measurement sites.
>
> Another approach for the model can be:
>
> ● average traffic speed (int)
> ● average traffic intensity (int)
> ● incident (close to this point) (boolean)
> ● horizontal visibility (in meters)
> ● rain (boolean)
> ● icing (boolean) (in winter)
> ● snow (boolean) (in winter)
> ● temperature (in summer)
> ● related point A average traffic speed (int)
> ● related point A average traffic intensity (int)
> ● related point B average traffic speed (int)
> ● related point B average traffic intensity (int)

Ah, now I see that you're including an "incident" indicating that a traffic incident exists at this time within a certain distance. I imagine this is going to be the predictedField? But the usefulness of the "incident" depends strongly on the quality of the data you get from the NDW. If they provide "incidents" that include a timestamp for the *time reported*, that is much different from the *time of occurrence*. This is very important because it's a matter of cause and effect. The incident causes the traffic slowdowns to occur. The incidents will also cause anomalies to occur in the models. If you train your models to predict incidents, and the timestamp for each incident is actually occurring after the traffic slowdowns in the surrounding area, this won't do well. These incident timestamps MUST be the time of the incident occurrence, and must be accurate.

> This also isn’t the ideal approach in our opinion.
> Nupic probably won’t give accurate anomalies/predictions with so many
> properties.

I think it may be better to remove the "traffic intensity" fields and try to simplify the weather fields into one or two fields instead of 5.

>
> We thought of a third option, but we’re not sure how to approach it.
> What if we make separate smaller models per measurement point, this way we
> can find out which performs the best, something like:
>
> ● model A:
>   ○ average traffic speed (int)
> ● model B:
>   ○ average traffic intensity (int)
> ● model C
>   ○ average traffic speed (int)
>   ○ average traffic intensity (int)
> ● model X
>   ○ model A,B,C with certain weather data
>
> Then swarm on anomaly scores (and maybe with raw input data) of different
> sites to find relations between measurement sites. Then use these models
> with incident data to predict them.
>
> The last option is probably the most difficult, but could be the most
> promising.

So you are saying that model X will have the anomaly scores from models A,B,C as input fields? That is interesting, and I've never seen anyone do it before. I have no idea how well it would work.

> What are your thoughts? Any input is greatly appreciated.

To do what you are trying to do, you're going to need to run lots of models, that is for sure. I also think you'll probably need to run models with more than one input field if you want to incorporate weather data. You'll also need to get predictions out of the models. You will need to either:

1. Use HTM Engine
2. Build a custom solution for running many models at once that uses NuPIC OPF or Network API directly

If you choose #1, this also means you'll need to:

- Update HTM Engine to output predictions (probably trivial)
- Update HTM Engine to allow multiple input fields (probably not trivial)
- Allow day_of_week and weekend encoding options [1]

This work will need to be done by you and your team as contributors to Numenta open source projects. I will of course try to guide you along, but you will need to do the work to file the proper feature requests, create the pull requests, etc, following our development process [2].

> Besides the approaches we have some smaller questions.
>
> We would like to start with the skeleton-htmengine-app, expand the api so it
> accepts multiple, different kinds of models and build a webapp interfacing
> with the api.

Great, so you are already thinking about #1 above. :)

> We couldn’t find a lot of documentation regarding the HTM Engine.
> ● Can we swarm subsets to create model params using HTM Engine?
>   ○ This way we can try multiple model params for models with
> different values, or do we have to create them beforehand?

No, but that is a good idea. It would be nice to be able to create a model through the HTM Engine with more than just a min/max value, but with a full set of model parameters from a swarm.

> ● I have little knowledge about HTM Engine
>   ○ Can you give some info about the services
> (anomaly_service, metric_listener, metric_storer, model_scheduler) and if we
> need them, how to interface with them?

As you know, this code was very recently made open source. We don't have a lot of documentation for it at this point, but you can email this list with questions about it. I will point the right people to this email so they know you may have questions in the future. Hopefully we can build up proper documentation over time. Now that people want to use the HTM Engine, it makes sense to try to put together some better documentation. I'll bring this up with Jared Casner, the numenta-apps project manager.

> ● Do you have any thoughts about “combining/connecting/merging”
> different traffic points for the best accident prediction?

I'm not really sure this is necessary for what you want to do. As I said above, if you are trying to predict traffic accidents, I don't think training one model on many traffic sensors will help because of the unpredictable occurrences of accidents.

> We understand these are a lot of questions. Therefore we would be very
> grateful if you are able to find the time to answer them. Thank you again.

You are welcome. Whew!
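Coming back to the idea of looking deeper into the prediction probability distribution: below is a minimal sketch of how the chance of a significant slowdown could be read out of such a distribution, assuming it is available as a mapping from predicted speed to probability. The function name, the threshold of 50, and the sample distribution are hypothetical, and nothing here calls NuPIC or HTM Engine.

```python
def slowdown_probability(distribution, threshold):
    """Return the share of probability mass on predicted speeds below threshold.

    `distribution` maps a predicted speed to its probability (assumed format).
    The result is normalised, so the input does not have to sum exactly to 1.
    """
    total = sum(distribution.values())
    if total <= 0:
        raise ValueError("distribution has no probability mass")
    below = sum(p for speed, p in distribution.items() if speed < threshold)
    return below / total


# Invented distribution for one sensor, only to show the calculation.
predicted = {95.0: 0.70, 80.0: 0.18, 40.0: 0.09, 15.0: 0.03}

most_likely = max(predicted, key=predicted.get)  # single best prediction
risk = slowdown_probability(predicted, threshold=50.0)
```

With these invented numbers the most likely prediction is still free-flowing traffic, while the slowdown risk comes out at roughly 0.12. That is the kind of per-sensor signal that could be tracked even when it is not the most probable outcome.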
[1] https://github.com/numenta/numenta-apps/issues/104
[2] https://github.com/numenta/nupic/wiki/Development-Process

---------
Matt Taylor
OS Community Flag-Bearer
Numenta
