@Jeff Thanks! That looks really interesting, although they do not provide any info on how they do it :$
@Jared Yes, to some extent that is true, although I would also like to provide researchers with a good overview of what happens. Therefore a UI like Taurus/Grok or something along those lines is a goal. I like your saying 'Assuming you have easy access to the data' :D That is the major problem right now: a lot of mailing to agencies and organisations, which unfortunately keeps me from coding :'(

Thank you for the heads up! And I also *hope that someone out there (You, yes right you reading this!)* would like to help make this a reality! In case the link got lost, here is how you can help:
https://github.com/nupic-community/nostradamIQ/blob/master/CONTRIBUTING.md

Thanks :)

On Aug 5, 2015 12:34 AM, "Jeff Fohl" <[email protected]> wrote:

> Pascal -
>
> Your idea reminds me a bit of Banjo: http://ban.jo/
>
> This is a private corporation, but doing something somewhat similar - at least in that they have divided the globe up into a giant grid, and within each cell of that grid, they do anomaly detection. Except, instead of geophysical data, they are monitoring social activity by observing geotagged photos, tweets, posts, etc.
>
> - Jeff
>
> On Tue, Aug 4, 2015 at 3:27 PM Jared Casner <[email protected]> wrote:
>
>> Hi Pascal,
>>
>> So, let me see if I understand correctly. For now, you don't require any geo-encoding of data (but it sounds like that might be a useful feature in the future?). Instead, you will create a list of regions / polygons that represent a geofenced area. Within each region, you will have some set of sensors - air pressure, humidity, wind speed, seismic activity, temperature, etc. Your goal is to generate anomaly scores for each of those sensors - which produce scalar data. You then plan to do some additional logistic regression on top of the anomaly scores to predict the likelihood of a natural disaster (earthquake, meteorological, etc.) in that region or nearby regions.
>> It would be up to the statistician to correlate regions in the short term, correct? Also, if I've understood you correctly, the biggest issue that researchers face currently with respect to this problem is that their predictions for each sensor aren't always accurate because of daily variations in the data that are unexpected?
>>
>> I hope I've now understood the problem, but please clarify if I've mis-stated anything.
>>
>> Assuming I have a basic understanding of the problem, I think you may be able to simplify the engineering task a little bit. It seems to me that your primary objective isn't to have an easy-to-read user interface that displays data to an end user. Instead, you want data available to researchers in a format that they can do the logistic regression on. So, perhaps you can simplify your project by starting with HTMEngine directly. I'm sure by now you've seen Matt's demo [1] of HTMEngine - that may be a good place to start. In his NYC Traffic demo [2], each road segment represents a geolocation and has a scalar metric (average speed) associated with it. Assuming you have easy access to the data, you can probably use this as a good basis for getting started. The output is available in both JSON and CSV formats, so it should be easily accessible to a researcher.
>>
>> To answer one of your original questions about Numenta engineers helping out on this project, they're all free to help in their off time! One of our big objectives of opening access to NuPIC and the Numenta Apps was to provide a means for you - and those like you - to get in and do things that we just don't have the bandwidth to do internally. I'm thrilled to see your excitement and hope that others in the community will want to get involved to help you out!
>> Cheers,
>>
>> Jared
>>
>> [1] https://www.youtube.com/watch?v=lzJd_a6y6-E
>> [2] https://github.com/nupic-community/htmengine-traffic-tutorial
>>
>>> ---------- Forwarded message ----------
>>> From: Pascal Weinberger <[email protected]>
>>> To: "NuPIC general mailing list." <[email protected]>
>>> Date: Tue, 4 Aug 2015 12:13:04 +0200
>>> Subject: Re: nostradamIQ Project help needed!
>>>
>>> Matt,
>>>
>>> That's true, but you do not need it at all: take the world, split it into polygons (according to the density of data available and the resolution needed), label your polygons, and get your data for each polygon under a label of the form Where:What - "Where" being the label of the specific geo-area according to the above system, and "What" the label for the kind of data you push (like seismic etc.). And there you have your data format: label to scalar!
>>>
>>> Now htmengine outputs anomaly scores for each Where:What label, and you take these to hierarchically (in a geo-hierarchy) build logistic regression models, trained on the anomaly output and a binary value for whether a certain disaster happened there at a later time X or not. (This needs some past data, which is why the highest priority is getting the data polled and htmengine trained.) You go for logistic regression because that is what the literature finds to perform best. Once that works, you have your 'live' data stream and get predictions in the form of probabilities for the disaster occurring X time in the future...
>>>
>>> This was the basic idea... of course you will need to test it and refine the architecture etc. But you've got your work-around :)
>>>
>>> So htmengine is not supposed to do the entire job; it's more for feature detection :) The problem researchers find when building log-reg models on real data (raw scalars from the sensors) is that they periodically make wrong predictions due to daily etc. patterns. This is what HTM should filter out ;)
>>>
>>> The point of using Taurus as a starter, therefore, is that you already have your basic infrastructure of companies (your geo-polygons) and different metrics (the different sensor data in that region)...
>>>
>>> Does it make more sense now? :) Of course a geo-encoder and so on would be nice in addition, to capture more of the patterns, but this is what I would hope to achieve with the geo-hierarchy of log-reg models, so they capture the spatial relationships in their input weights (of course only based on historical data)... I do not think the geoEncoder would get this as well. When running the demo_app, you find that geoencoding with radius=magnitude, or any exponential function thereof, makes HTM immune to regions where at least one strong quake happened... and you don't want that.
>>>
>>> But David, you may think about building an engine for Java as well :) Just because it's faster ;D
>>>
>>> _______________________________________________
>>> nupic mailing list
>>> [email protected]
>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
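The polygon scheme Pascal describes above ("splice the world into labeled polygons, then stream each reading under a Where:What label") could be sketched roughly as follows. Everything here is illustrative: the region names, coordinates, and helper functions are invented for the example and are not part of nostradamIQ or HTMEngine.

```python
# Hypothetical sketch: map a sensor reading at (lon, lat) to its
# "Where:What" metric label. Regions and helpers are invented.

# Each region is a labeled polygon, given as a list of (lon, lat) vertices.
REGIONS = {
    "EU-W1": [(-10.0, 35.0), (5.0, 35.0), (5.0, 50.0), (-10.0, 50.0)],
    "EU-E1": [(5.0, 35.0), (20.0, 35.0), (20.0, 50.0), (5.0, 50.0)],
}

def point_in_polygon(lon, lat, polygon):
    """Ray-casting point-in-polygon test."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        if (yi > lat) != (yj > lat) and \
           lon < (xj - xi) * (lat - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside

def metric_label(lon, lat, what):
    """Return the Where:What label for a reading, or None if outside all regions."""
    for where, poly in REGIONS.items():
        if point_in_polygon(lon, lat, poly):
            return "%s:%s" % (where, what)
    return None
```

A seismic reading at (0.0, 40.0) would then stream under the label `EU-W1:seismic`, which is exactly the label-to-scalar format the thread settles on.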
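Jared notes that the output should be available to researchers as both JSON and CSV. A minimal stdlib-only sketch of exporting per-label anomaly scores in both formats might look like this; the field names are an assumption for illustration, not HTMEngine's actual schema.

```python
# Sketch: expose per-label anomaly scores as JSON and CSV.
# Field names ("label", "timestamp", "anomaly") are assumed, not HTMEngine's schema.
import csv
import io
import json

rows = [
    {"label": "EU-W1:seismic", "timestamp": "2015-08-04T12:00:00", "anomaly": 0.87},
    {"label": "EU-W1:pressure", "timestamp": "2015-08-04T12:00:00", "anomaly": 0.05},
]

# JSON: dump the records as-is.
json_out = json.dumps(rows, indent=2)

# CSV: same records, one row per (label, timestamp) sample.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["label", "timestamp", "anomaly"])
writer.writeheader()
writer.writerows(rows)
csv_out = buf.getvalue()
```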
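The second stage Pascal sketches - a logistic regression trained on HTM anomaly scores against a binary "did a disaster follow within time X" label - might look roughly like this. The data is synthetic and the plain-SGD trainer is only a stand-in for whatever library a researcher would actually use.

```python
# Sketch: logistic regression over anomaly scores. Synthetic data;
# plain SGD stands in for a real library implementation.
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Stochastic gradient descent on the logistic loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Probability that a disaster follows, given current anomaly scores."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Synthetic training set: each row is (seismic, pressure) anomaly scores
# for one region at one time; label 1 = a disaster occurred X hours later.
random.seed(0)
X = [[random.random(), random.random()] for _ in range(200)]
y = [1 if x[0] > 0.7 else 0 for x in X]  # disasters follow high seismic anomaly

w, b = train_logreg(X, y)
```

On this toy data the model learns that the seismic-anomaly feature drives the prediction, which mirrors the intent in the thread: HTM filters out the daily patterns, and the regression turns the remaining anomaly signal into a disaster probability.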
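Pascal's geo-hierarchy idea - parent regions whose log-reg inputs include their children's anomaly scores, so spatial relationships end up encoded in the learned weights - could be wired up as simply as the following; the tree, labels, and scores are all invented for illustration.

```python
# Illustrative only: tree, labels, and scores are invented.
GEO_TREE = {"EU": ["EU-W1", "EU-E1"]}  # parent region -> child region labels

def hierarchy_features(parent, anomaly_scores, tree=GEO_TREE):
    """Feature vector for a parent region's model: its children's
    latest anomaly scores, in a fixed order."""
    return [anomaly_scores[child] for child in tree[parent]]

scores = {"EU-W1": 0.91, "EU-E1": 0.12}
features = hierarchy_features("EU", scores)  # fed to the parent's log-reg
```

Because the parent model sees all its children at once, a high anomaly in a neighbouring polygon can raise the predicted probability for the whole area, which is the spatial coupling the geoEncoder alone would not provide.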
