Hi Pascal,

So, let me see if I understand correctly. For now, you don't require any geo-encoding of data (though it sounds like that might be a useful feature in the future?). Instead, you will create a list of regions / polygons that represent a geofenced area. Within each region, you will have some set of sensors - air pressure, humidity, wind speed, seismic activity, temperature, etc. - each producing scalar data. Your goal is to generate anomaly scores for each of those sensors. You then plan to run a logistic regression on top of the anomaly scores to predict the likelihood of a natural disaster (earthquake, meteorological, etc.) in that region or nearby regions. It would be up to the statistician to correlate regions in the short term, correct?

Also, if I've understood you correctly, the biggest issue researchers currently face with this problem is that their per-sensor predictions aren't always accurate because of unexpected daily variations in the data?
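If it helps, here's a rough sketch of how I picture that pipeline. Everything in it is hypothetical: the region label, the thresholds, and the data are made up, and in practice the anomaly scores would come from HTMEngine rather than a random generator. I've used scikit-learn for the regression stage:

```python
# Sketch of the pipeline as I understand it (hypothetical names/data;
# anomaly scores would really come from HTMEngine, not a RNG).
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each metric is labeled Where:What -- a region/polygon ID plus a sensor type.
labels = ["EU-12:seismic", "EU-12:pressure", "EU-12:humidity"]

rng = np.random.default_rng(42)

# Stand-in for historical HTM anomaly scores: one row per time step,
# one column per Where:What metric, values in [0, 1].
X = rng.random((500, len(labels)))

# Binary target: did a disaster occur in the region within the
# following X hours? (Synthetic here, for illustration only.)
y = (X.sum(axis=1) + rng.normal(0, 0.3, 500) > 2.0).astype(int)

# Per-region logistic regression trained on the anomaly scores.
model = LogisticRegression().fit(X, y)

# "Live" anomaly scores -> probability of a disaster in the region.
live_scores = np.array([[0.9, 0.8, 0.7]])
p_disaster = model.predict_proba(live_scores)[0, 1]
print(f"P(disaster in EU-12) = {p_disaster:.2f}")
```

The geo-hierarchy you describe would just repeat this per region (and per parent region), with neighboring regions' scores added as extra columns.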
I hope I've now understood the problem, but please clarify if I've mis-stated anything. Assuming I have a basic understanding of it, I think you may be able to simplify the engineering task a little bit. It seems to me that your primary objective isn't an easy-to-read user interface that displays data to an end user; instead, you want data available to researchers in a format they can run the logistic regression on. So perhaps you can simplify your project by starting with HTMEngine directly. I'm sure by now you've seen Matt's demo [1] of HTMEngine - that may be a good place to start. In his NYC Traffic demo [2], each road segment represents a geolocation and has a scalar metric (average speed) associated with it. Assuming you have easy access to the data, you can probably use this as a good basis for getting started. The output is available in both JSON and CSV formats, so it should be easily accessible to a researcher.

To answer one of your original questions about Numenta engineers helping out on this project: they're all free to help in their off time! One of our big objectives in opening access to NuPIC and the Numenta Apps was to provide a means for you - and those like you - to get in and do things that we just don't have the bandwidth to do internally. I'm thrilled to see your excitement and hope that others in the community will want to get involved to help you out!

Cheers,
Jared

[1] https://www.youtube.com/watch?v=lzJd_a6y6-E
[2] https://github.com/nupic-community/htmengine-traffic-tutorial

> ---------- Forwarded message ----------
> From: Pascal Weinberger <[email protected]>
> To: "NuPIC general mailing list." <[email protected]>
> Cc:
> Date: Tue, 4 Aug 2015 12:13:04 +0200
> Subject: Re: nostradamIQ Project help needed!
> Matt,
> That's true, but you do not need it at all:
> Take the world, split it into polygons (according to the density of data
> available and the resolution needed); label your polygons, and get your
> data for each polygon under a label of the form Where:What - "Where" being
> the label of the specific geo-area according to the above system, and
> "What" the label for the kind of data you push (seismic, etc.). And there
> you have your data format: label to scalar!
> Now htmengine outputs anomaly scores for each label Where:What, and you
> take these to hierarchically (in a geo-hierarchy) build logistic
> regression models, trained on the anomaly output and a binary value for
> whether a certain disaster happened there at time X later or not. (This
> needs some past data, which is why the highest priority is getting the
> data polled and htmengine trained.) You go for logistic regression because
> that is what the literature finds to perform best. Once that works, you
> have your 'live' data stream and get predictions in the form of
> probabilities of the disaster occurring X time in the future...
>
> This was the basic idea... of course you will need to test it and refine
> the architecture, etc. But you've got your workaround :)
>
> So htmengine is not supposed to do the entire job; it's more for feature
> detection :) The problem researchers find when building log-reg models
> with real data (the raw scalars of the sensors) is that they periodically
> make wrong predictions due to daily etc. patterns. This is what HTM should
> filter out ;)
>
> The point of using Taurus as a starter, therefore, is that you already
> have your basic infrastructure of companies (your geo-polygons) and
> different metrics (the different sensor data in that region)...
>
> Does it make more sense now?
> :) Of course a geoencoder and such would be nice in addition, to capture
> more of the patterns, but this is what I would hope to achieve with the
> geo-hierarchy of log-reg models, so that they capture the spatial
> relationships in their input weights (of course only based on historical
> data)... I do not think the geoEncoder would get this as well...
> When running the demo_app, you find that geoencoding with radius=Magnitude
> (or any exponential function thereof) makes HTM immune to regions where at
> least one strong quake happened... and you don't want that.
>
> But David, you may think about building an engine for Java as well :)
> Just 'cause it's faster ;D
>
> _______________________________________________
> nupic mailing list
> [email protected]
> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
