Daniël,

So how did the hackathon go?
---------
Matt Taylor
OS Community Flag-Bearer
Numenta

On Wed, Sep 2, 2015 at 2:13 AM, D. Ducro <[email protected]> wrote:
> Hey Matt,
>
> Thank you very much for taking the time to help us with our questions.
> Your answers are very helpful.
>
> We imported lots of measurement and event data into MongoDB and MySQL
> databases. The Hackathon will be held this weekend.
>
> We slightly changed our plans for this weekend. We’ll be focussing on
> anomaly detection on the measurement data and evaluating it with the
> event data. We want to point out the time of occurrence and location of
> an accident when it happens.
>
> We don’t have the time to expand the HTM Engine right now. If we can
> continue the project after the hackathon we can look into the HTM Engine
> modifications.
>
> I’ll try to send some results after the hackathon.
>
> Thanks,
> --
> Daniël Ducro
> Pionect
>
>> Daniël,
>>
>> These are good questions, and an especially good example of a Real
>> World Application that might be created with HTM. I'll do my best to
>> answer below.
>>
>> On Thu, Aug 20, 2015 at 12:17 PM, D. Ducro <[email protected]> wrote:
>> >
>> > First I’ll begin by explaining our situation regarding the traffic
>> > data. We contacted the NDW, the company which is responsible for
>> > storing all traffic data in the Netherlands. They agreed to provide
>> > us with traffic speeds/flows and events like incidents or road works.
>> > We want to combine the traffic data with weather data and find
>> > anomalies, and hopefully try to predict incidents.
>>
>> I'm not sure how you'll predict traffic accidents. From my research
>> with NYC traffic data, HTM can certainly identify that a traffic
>> incident has occurred, and with raw traffic flow sensor data, should
>> be able to identify the vicinity in which the accident took place. But
>> because accidents are by their nature anomalies in the traffic flow,
>> it is going to be hard to predict them.
>>
>> It may be possible, however, to calculate some probability that an
>> accident will impede traffic based upon predictions of different
>> sensors. For example, for each sensor model, if you looked deeper into
>> the prediction probability distribution, you might be able to extract
>> the probability that traffic will slow significantly in the future,
>> even if that is not the most probable prediction.
>>
>> > We’ll be attending The World Port Hackathon (September 4-5) in
>> > Rotterdam. At this event we would like to build a prototype proving
>> > our hypothesis that traffic accidents can be predicted.
>> >
>> > Our data is from the past three years in an area of Rotterdam.
>> > We have the following data from about a hundred measurement sites:
>> >
>> > ● traffic speeds / min. / lane / vehicle-type
>> > ● traffic intensity / min. / lane / vehicle-type
>> > ● traffic events, e.g. incidents, road work and more.
>> >
>> > The weather data is available from each Dutch weather station per hour.
>> > We would like to combine the data from the weather station in
>> > Rotterdam with all the traffic data.
>> >
>> > This is how we think we should approach it using HTM Engine:
>> >
>> > ● Define a model with the following fields:
>> >   ○ average traffic speed (int)
>> >   ○ average traffic intensity (int)
>>
>> (From my experience with traffic flow data, these two values might
>> contain almost exactly the same patterns. One is sort of a function
>> of the other. You can probably get away with only using one.)
>>
>> >   ○ incident (close to this point) (boolean)
>> >   ○ horizontal visibility (in meters)
>> >   ○ rain (boolean)
>> >   ○ icing (boolean)
>> >   ○ snow (boolean)
>> > ● Create an API to communicate with HTM Engine
>> > ● Create a model for each data point
>>
>> Before we get to the problems you have identified, there are missing
>> features in HTM Engine that will prevent this approach.
>> The biggest one is that currently HTM Engine models only work against
>> 1 input field. This means that you can't create an HTM Engine model
>> with 7 fields as you defined above without changes to the HTM Engine
>> framework.
>>
>> The second thing is that HTM Engine models are anomaly-only models.
>> Currently, they do not store model predictions, even though that data
>> is generated by NuPIC. The HTM Engine was built to do anomaly
>> detection, so this was left out.
>>
>> The good news is that adding prediction would probably be pretty easy.
>> Creating models with multiple input fields may be harder.
>>
>> > There are two main problems with this approach in our opinion.
>> >
>> > The first issue is that we don't take into account the traffic flow
>> > across multiple points/highways. We've been told that there is a
>> > strong relation in the flow between some specific highways. These
>> > patterns are known and we think we need to find a way to use these
>> > connections to improve the context of the accidents.
>>
>> There are certainly traffic correlations between roads. But your goal
>> is to predict accidents, not react to them, correct? So the
>> correlations only come into play after an accident has occurred. For
>> example, say sensor-1-model shows anomalies because of a traffic
>> slow-down. Then sensor-2-model can predict that it will also see slow
>> traffic if it is also using the sensor-1 data as an input for
>> predicting traffic flow at sensor-2. But if your goal is to predict
>> accidents before they happen, this doesn't help you. It only helps
>> predict the propagation of slow traffic outwards from an accident
>> site.
>>
>> > The second issue concerns the different weather factors, which differ
>> > per season. We can make specific models for the winter and summer so
>> > they include temperature in the summer and icing/snow in the winter.
>>
>> You have 3 years of data, which might be enough for NuPIC models to
>> learn some yearly patterns. Perhaps you don't need to have
>> winter/summer models?
>>
>> If you do end up creating models that use weather data as input for
>> creating predictions at the traffic sensor level, this will have much
>> more of a tangible impact on predictions anyway. That data will
>> effectively incorporate the yearly weather cycles.
>>
>> > But what we’re very interested in is how we can “connect” the data
>> > from multiple measurement sites.
>> >
>> > Another approach for the model can be:
>> >
>> > ● average traffic speed (int)
>> > ● average traffic intensity (int)
>> > ● incident (close to this point) (boolean)
>> > ● horizontal visibility (in meters)
>> > ● rain (boolean)
>> > ● icing (boolean) (in winter)
>> > ● snow (boolean) (in winter)
>> > ● temperature (in summer)
>> > ● related point A average traffic speed (int)
>> > ● related point A average traffic intensity (int)
>> > ● related point B average traffic speed (int)
>> > ● related point B average traffic intensity (int)
>>
>> Ah, now I see that you're including an "incident" indicating that a
>> traffic incident exists at this time within a certain distance. I
>> imagine this is going to be the predictedField?
>>
>> But the usefulness of the "incident" depends strongly on the quality
>> of the data you get from the NDW. If they provide "incidents" that
>> include a timestamp for the *time reported*, that is very different
>> from the *time of occurrence*. This is very important because it's a
>> matter of cause and effect. The incident causes the traffic slowdowns
>> to occur. The incidents will also cause anomalies to occur in the
>> models. If you train your models to predict incidents, and the
>> timestamp for each incident actually falls after the traffic
>> slowdowns in the surrounding area, this won't work well.
>> These incident timestamps MUST be the time of the incident occurrence,
>> and must be accurate.
>>
>> > This also isn’t the ideal approach in our opinion.
>> > NuPIC probably won’t give accurate anomalies/predictions with so many
>> > properties.
>>
>> I think it may be better to remove the "traffic intensity" fields and
>> try to simplify the weather fields into one or two fields instead of
>> 5.
>>
>> >
>> > We thought of a third option, but we’re not sure how to approach it.
>> > What if we make separate smaller models per measurement point? This
>> > way we can find out which performs the best, something like:
>> >
>> > ● model A:
>> >   ○ average traffic speed (int)
>> > ● model B:
>> >   ○ average traffic intensity (int)
>> > ● model C:
>> >   ○ average traffic speed (int)
>> >   ○ average traffic intensity (int)
>> > ● model X:
>> >   ○ models A, B, C with certain weather data
>> >
>> > Then swarm on anomaly scores (and maybe with raw input data) of
>> > different sites to find relations between measurement sites. Then use
>> > these models with incident data to predict incidents.
>> >
>> > The last option is probably the most difficult, but could be the most
>> > promising.
>>
>> So you are saying that model X will have the anomaly scores from
>> models A, B, C as input fields? That is interesting, and I've never
>> seen anyone do it before. I have no idea how well it would work.
>>
>> > What are your thoughts? Any input is greatly appreciated.
>>
>> To do what you are trying to do, you're going to need to run lots of
>> models, that is for sure. I also think you'll probably need to run
>> models with more than one input field if you want to incorporate
>> weather data. You'll also need to get predictions out of the models.
>>
>> You will need to either:
>> 1. Use HTM Engine
>> 2.
>>    Build a custom solution for running many models at once that uses
>>    NuPIC OPF or Network API directly
>>
>> If you choose #1, this also means you'll need to:
>> - Update HTM Engine to output predictions (probably trivial)
>> - Update HTM Engine to allow multiple input fields (probably not trivial)
>> - Allow day_of_week and weekend encoding options [1]
>>
>> This work will need to be done by you and your team as contributors to
>> Numenta open source projects. I will of course try to guide you along,
>> but you will need to do the work to file the proper feature requests,
>> create the pull requests, etc., following our development process [2].
>>
>> > Besides the approaches we have some smaller questions.
>> >
>> > We would like to start with the skeleton-htmengine-app, expand the
>> > API so it accepts multiple, different kinds of models and build a
>> > webapp interfacing with the API.
>>
>> Great, so you are already thinking about #1 above. :)
>>
>> > We couldn’t find a lot of documentation regarding the HTM Engine.
>> > ● Can we swarm subsets to create model params using HTM Engine?
>> >   ○ This way we can try multiple model params for models with
>> >     different values, or do we have to create them beforehand?
>>
>> No, but that is a good idea. It would be nice to be able to create a
>> model through the HTM Engine not just with a min/max value, but
>> with a full set of model parameters from a swarm.
>>
>> > ● I have little knowledge about HTM Engine
>> >   ○ Can you give some info about the services
>> >     (anomaly_service, metric_listener, metric_storer, model_scheduler)
>> >     and, if we need them, how to interface with them?
>>
>> As you know, this code was very recently made open source. We don't
>> have a lot of documentation for it at this point, but you can email
>> this list with questions about it. I will point the right people to
>> this email so they know you may have questions in the future.
>> Hopefully we can build up proper documentation over time. Now that
>> people want to use the HTM Engine, it makes sense to try to put
>> together some better documentation. I'll bring this up with Jared
>> Casner, the numenta-apps project manager.
>>
>> > ● Do you have any thoughts about “combining/connecting/merging”
>> >   different traffic points for the best accident prediction?
>>
>> I'm not really sure this is necessary for what you want to do. As I
>> said above, if you are trying to predict traffic accidents, I don't
>> think training one model on many traffic sensors will help because of
>> the unpredictable occurrences of accidents.
>>
>> > We understand these are a lot of questions. Therefore we would be
>> > very grateful if you are able to find the time to answer them. Thank
>> > you again.
>>
>> You are welcome. Whew!
>>
>> [1] https://github.com/numenta/numenta-apps/issues/104
>> [2] https://github.com/numenta/nupic/wiki/Development-Process
>>
>> ---------
>> Matt Taylor
>> OS Community Flag-Bearer
>> Numenta
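
A rough sketch of the idea raised earlier in this thread, of looking past the single most probable prediction and into the rest of the prediction probability distribution to estimate the chance of a significant slowdown. It assumes the per-sensor distribution is already available as a plain dict mapping predicted speed to likelihood. The function name, the 30% drop threshold and the example numbers are all hypothetical, and nothing here is HTM Engine or NuPIC API.

```python
def slowdown_probability(distribution, current_speed, drop_fraction=0.3):
    """Probability mass on predicted speeds at or below the slowdown threshold.

    distribution  -- dict of {predicted_speed: likelihood} (assumed input format)
    current_speed -- latest observed speed at this sensor
    drop_fraction -- hypothetical definition of "slows significantly"
    """
    if not distribution:
        return 0.0
    total = float(sum(distribution.values()))
    if total <= 0:
        return 0.0
    threshold = current_speed * (1.0 - drop_fraction)
    slow_mass = sum(p for speed, p in distribution.items() if speed <= threshold)
    # Normalise in case the likelihoods do not sum exactly to 1.
    return slow_mass / total


# Made-up distribution for one sensor: predicted speed (km/h) -> likelihood.
example = {100.0: 0.6, 90.0: 0.25, 60.0: 0.1, 30.0: 0.05}
p_slow = slowdown_probability(example, current_speed=100.0)
print("P(significant slowdown) = %.2f" % p_slow)
```

In this made-up example the most probable prediction is still free-flowing traffic, but the distribution puts a non-zero share of its mass on speeds at or below the threshold. Whether that share carries any real signal about incidents would have to be checked against the event data.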
