Re: HTM.Java performance in HTM-Moclu

Fergal Byrne Tue, 08 Dec 2015 03:36:09 -0800

Hi David,

No, I'm afraid this is an insurmountable problem if you use objects instead
of unboxed arrays. NuPIC can do this by storing all the state in a set of
C++ unboxed arrays, and it can persist that, free the memory, and load a
different model. That's how HTM Engine (and Grok) can run lots of models.
You can't do this in a single JVM if you have live refs to your Networks,
and rebuilding a Network from disk is a big cost.
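
A minimal sketch of the "unboxed arrays" idea in Java terms, to make the contrast with per-synapse objects concrete. It is not NuPIC's or HTM.java's actual layout; the class name FlatSynapseStore and its fields are hypothetical. State lives in two parallel primitive arrays, so the garbage collector sees a couple of large arrays rather than millions of small objects, and the whole state can be written to one byte buffer and read back.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical flat store: one synapse per array slot, no per-synapse objects. */
final class FlatSynapseStore {
    // Parallel primitive arrays: slot i describes synapse i.
    private int[] sourceCell;    // index of the presynaptic cell
    private float[] permanence;  // permanence value of the synapse
    private int size;            // number of slots in use

    FlatSynapseStore(int initialCapacity) {
        sourceCell = new int[Math.max(1, initialCapacity)];
        permanence = new float[sourceCell.length];
    }

    /** Appends a synapse and returns its slot, doubling the arrays when full. */
    int add(int cell, float perm) {
        if (size == sourceCell.length) {
            sourceCell = Arrays.copyOf(sourceCell, size * 2);
            permanence = Arrays.copyOf(permanence, size * 2);
        }
        sourceCell[size] = cell;
        permanence[size] = perm;
        return size++;
    }

    int size() { return size; }
    int sourceCell(int i) { return sourceCell[i]; }
    float permanence(int i) { return permanence[i]; }

    /** Serializes the used slots into one contiguous byte array. */
    byte[] toBytes() {
        ByteBuffer buf = ByteBuffer.allocate(
                Integer.BYTES + size * (Integer.BYTES + Float.BYTES));
        buf.putInt(size);
        for (int i = 0; i < size; i++) {
            buf.putInt(sourceCell[i]).putFloat(permanence[i]);
        }
        return buf.array();
    }

    /** Rebuilds a store from bytes produced by toBytes(). */
    static FlatSynapseStore fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int n = buf.getInt();
        FlatSynapseStore store = new FlatSynapseStore(n);
        for (int i = 0; i < n; i++) {
            store.add(buf.getInt(), buf.getFloat());
        }
        return store;
    }
}
```

With a layout like this, swapping models means writing the bytes out, dropping the reference to the store, and later calling fromBytes on a different model's bytes.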


The right way to do this is to have one or a very small number of Networks
in each JVM, and manage them using something like htm-moclu.
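
One way to picture "one Network per JVM" is a parent process that starts a capped child JVM for each model. The sketch below is only an illustration and says nothing about how htm-moclu actually manages its models. The class ModelJvmLauncher, the classpath and the main class name are hypothetical; -Xmx is the standard JVM option for the maximum heap size.

```java
import java.io.IOException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that hosts each model in its own child JVM. */
final class ModelJvmLauncher {

    /** Builds the command line for a child JVM that hosts a single model. */
    static List<String> command(String modelId, int maxHeapMb,
                                String classpath, String mainClass) {
        List<String> cmd = new ArrayList<>();
        // Reuse the java binary of the JVM we are currently running in.
        cmd.add(Paths.get(System.getProperty("java.home"), "bin", "java").toString());
        cmd.add("-Xmx" + maxHeapMb + "m"); // cap the heap of this one model
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add(mainClass);                // assumed entry point that loads one model
        cmd.add(modelId);                  // passed to that entry point as args[0]
        return cmd;
    }

    /** Starts the child JVM and shares this process's stdout and stderr with it. */
    static Process launch(String modelId, int maxHeapMb,
                          String classpath, String mainClass) throws IOException {
        return new ProcessBuilder(command(modelId, maxHeapMb, classpath, mainClass))
                .inheritIO()
                .start();
    }
}
```

When a child JVM exits, the operating system reclaims its whole heap at once, so no garbage collector has to traverse a dead model's objects.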

(By the way, this means you can't run NuPIC on GAE because they prohibit
user-provided C/C++; only Compute Engine or Container Engine will work.)

Regards

Fergal

On Tue, Dec 8, 2015 at 11:21 AM, cogmission (David Ray) <
[email protected]> wrote:

> Hi Fergal,
>
> It's not that big of a deal. I just haven't done a round of profiling yet.
> Therefore there is lots of room for improvement in terms of memory
> handling. There are lots of JVM applications running really data heavy
> applications and the state of the art JVM GC is fully capable of handling
> these loads. I did a preliminary profiling session back in January and
> found some places where memory consumption could be optimized - meaning to
> get back to it after the Network API was finished because I didn't want to
> optimize things before I got to see some typical usage patterns. If GC were
> an inescapable problem you wouldn't have the tons of mission critical apps
> in the financial industry that are running today.
>
> I am only one person, and I will get around to it. HTM.java is technically
> a pre-release (alpha) version for this reason.
>
> Cheers,
> David
>
> On Tue, Dec 8, 2015 at 4:59 AM, Fergal Byrne <[email protected]>
> wrote:
>
>> Hi Matt,
>>
>> As Stuart Halloway explains here [1], on the JVM, it's always GC. I can
>> barely run a single 2048x16 HTM model on my 8 GB laptop on the hotgym hourly
>> data - it slows to a crawl after 1000 rows because it's thrashing the GC
>> trying to free space on the heap (setting -Xmx2800m as JVM params helps).
>> Good luck trying to keep any more than one model per JVM up for any length
>> of time.
>>
>> If you run htm-moclu in a single JVM, something somewhere will have live
>> references leading to every Network you have loaded. So your live heap is
>> going to be at least N models x heap per model. This is not a big problem
>> until you start growing distal segments, which are Java objects on the
>> heap. In HTM.java this happens in the TM, which grows as it learns.
>>
>> GC will detect an impending OOM condition, then will stop the world and
>> mark all these live references, traversing your millions of objects.
>> Finding nothing to free, the JVM will eventually fail at some unpredictable
>> and unrelated point in the code.
>>
>> To check this, run this function every few rows:
>>
>> void mem() {
>>     int mb = 1024 * 1024;
>>
>>     // get Runtime instance
>>     Runtime instance = Runtime.getRuntime();
>>
>>     //System.out.println("***** Heap utilization statistics [MB] *****\n");
>>
>>     // available memory
>>     System.out.println("Total: " + instance.totalMemory() / mb
>>             + "\tFree: " + instance.freeMemory() / mb
>>             + "\tUsed Memory: "
>>             + (instance.totalMemory() - instance.freeMemory()) / mb
>>             + "\tMax Memory: " + instance.maxMemory() / mb);
>> }
>>
>> Regards,
>>
>> Fergal Byrne
>>
>> [1] https://youtu.be/FihU5JxmnBg?t=38m6s
>>
>> On Tue, Dec 8, 2015 at 9:21 AM, cogmission (David Ray) <
>> [email protected]> wrote:
>>
>>> Hey Matt, did you try ramping up from 1 model to see if it was a
>>> capacity issue? I would be interested to see how the system responds as an
>>> increasing number of models are added. Anyway, I can't really comment on
>>> moclu as I don't know what's happening there and I don't have time these
>>> days to help investigate as I am stretched a bit thin at the moment.
>>>
>>> @antidata if you could explain what you mean by "renders the JVM
>>> unresponsive" it would help me possibly attend to any issue there might be
>>> in the Network API though I never had any problems with unresponsiveness at
>>> all. Thanks...
>>>
>>> Cheers,
>>> David
>>>
>>> On Mon, Dec 7, 2015 at 9:30 PM, Matthew Taylor <[email protected]> wrote:
>>>
>>>> David, BTW the failure in the video is at 4m:
>>>> https://youtu.be/DnKxrd4TLT8?t=4m
>>>> ---------
>>>> Matt Taylor
>>>> OS Community Flag-Bearer
>>>> Numenta
>>>>
>>>>
>>>> On Mon, Dec 7, 2015 at 7:24 PM, Matthew Taylor <[email protected]>
>>>> wrote:
>>>> > David and Mike,
>>>> >
>>>> > I've moved this to another topic to discuss.
>>>> >
>>>> > So what I tried with moclu was to take the HTM engine traffic app as
>>>> > shown here:
>>>> >
>>>> > https://github.com/nupic-community/htmengine-traffic-tutorial/blob/master/images/HTM-Traffic-Architecture.jpg
>>>> >
>>>> > And I swapped out the entire green python box containing the HTM
>>>> > Engine and replaced it with a local instance of moclu. When the
>>>> > traffic app starts up, it creates 153 models immediately and then
>>>> > starts pushing data into all of them at once:
>>>> >
>>>> > https://youtu.be/lzJd_a6y6-E?t=15m
>>>> >
>>>> > This caused dramatic failure in HTM Moclu, and I think that is what
>>>> > Mike's talking about. I recorded it for Mike here:
>>>> > https://www.youtube.com/watch?v=DnKxrd4TLT8
>>>> >
>>>> > I hope that explains some things.
>>>> >
>>>> > ---------
>>>> > Matt Taylor
>>>> > OS Community Flag-Bearer
>>>> > Numenta
>>>> >
>>>> >
>>>> > On Mon, Dec 7, 2015 at 9:15 AM, cogmission (David Ray)
>>>> > <[email protected]> wrote:
>>>> >>> the issue you faced is that it can't create hundreds of models at
>>>> >>> the same time (like it's done by the traffic example) because
>>>> >>> instantiating a Network object from Htm.java is an expensive
>>>> >>> operation that makes the JVM unresponsive.
>>>> >>
>>>> >> What is being implied here? Are you saying that instantiating
>>>> >> HTM.java is any more expensive than instantiating any other
>>>> >> medium-weight application?
>>>> >>
>>>> >> Cheers,
>>>> >> David
>>>> >>
>>>> >> On Mon, Dec 7, 2015 at 11:05 AM, M.Lucchetta <[email protected]>
>>>> wrote:
>>>> >>>
>>>> >>> Hello Matt, folks
>>>> >>>
>>>> >>> You can currently use Htm-MoClu on just one computer. The issue you
>>>> >>> faced is that it can't create hundreds of models at the same time
>>>> >>> (like it's done by the traffic example) because instantiating a
>>>> >>> Network object from Htm.java is an expensive operation that makes
>>>> >>> the JVM unresponsive.
>>>> >>>
>>>> >>> I'm currently working on the Release Candidate (v 1.0.0) and the
>>>> >>> only thing missing from your specs is:
>>>> >>>
>>>> >>> `allows POST of full model params`
>>>> >>>
>>>> >>> Will chat over Gitter to get more details on this.
>>>> >>>
>>>> >>> You can find an example of its usage in
>>>> >>> https://github.com/antidata/ATAD
>>>> >>> It uses the Lift Web Framework (Comet Actors) to push updates to the
>>>> >>> browser in real time (similar to the web sockets proposition) and
>>>> >>> saves the requests + results into MongoDB so you can query both the
>>>> >>> data coming from outside and the data generated from HTM (anomaly
>>>> >>> score + predictions).
>>>> >>> One last comment is that Htm-Moclu is web framework agnostic: you
>>>> >>> can use any web framework that works on the JVM.
>>>> >>>
>>>> >>> Feel free to ping me if any of you would like to contribute to this
>>>> >>> project.
>>>> >>>
>>>> >>> Thanks!
>>>> >>>
>>>> >>> On 7 December 2015 at 08:36, Matthew Taylor <[email protected]>
>>>> wrote:
>>>> >>>>
>>>> >>>> Ok folks, let's move discussion of the implementation to GitHub.
>>>> >>>> The first question to answer is which HTM implementation to use:
>>>> >>>> https://github.com/nupic-community/htm-over-http/issues/2
>>>> >>>>
>>>> >>>> Anyone else reading this is free to jump in and help out, but I
>>>> >>>> want to define our work properly using GitHub issues so we all
>>>> >>>> know what is happening and who is working on what.
>>>> >>>> ---------
>>>> >>>> Matt Taylor
>>>> >>>> OS Community Flag-Bearer
>>>> >>>> Numenta
>>>> >>>>
>>>> >>>>
>>>> >>>> On Sun, Dec 6, 2015 at 10:25 PM, Jonathan Mackenzie <
>>>> [email protected]>
>>>> >>>> wrote:
>>>> >>>> > Sounds like a good app Matt, I can help out. Personally, for
>>>> >>>> > getting a web app off the ground quickly in Python I recommend
>>>> >>>> > Pyramid:
>>>> >>>> > http://www.pylonsproject.org/
>>>> >>>> >
>>>> >>>> > On 7 December 2015 at 03:31, Matthew Taylor <[email protected]>
>>>> wrote:
>>>> >>>> >>
>>>> >>>> >> Thanks for the interest! I'll try to respond to everyone in this
>>>> >>>> >> email. But first, who reading this would want to use an HTM over
>>>> >>>> >> HTTP service like this? It means that you won't need to have HTM
>>>> >>>> >> running on the same system that is generating the data. It's
>>>> >>>> >> basically HTM in the Cloud. :)
>>>> >>>> >>
>>>> >>>> >> On Sat, Dec 5, 2015 at 12:16 PM, Marcus Lewis <
>>>> [email protected]>
>>>> >>>> >> wrote:
>>>> >>>> >> > I'm interested in HTTP GET, inspecting models.
>>>> >>>> >>
>>>> >>>> >> Great feature to add after a minimum viable product has been
>>>> created,
>>>> >>>> >> but this adds the complexity of either caching or persistence
>>>> >>>> >> (depending on how much history you want).
>>>> >>>> >>
>>>> >>>> >> On Sat, Dec 5, 2015 at 2:03 PM, cogmission (David Ray)
>>>> >>>> >> <[email protected]> wrote:
>>>> >>>> >> > One thing I am concerned about is the call/answer nature of the
>>>> >>>> >> > interface you describe, because of the latency involved in a
>>>> >>>> >> > submit-one-row-per-call methodology. Should it not be able to
>>>> >>>> >> > "batch" process rows of data instead? (Batches could contain
>>>> >>>> >> > one row if you were dedicated to being a masochist.)
>>>> >>>> >>
>>>> >>>> >> Yes, we will eventually need that, but I don't need it in the
>>>> >>>> >> prototype. Let's focus on one row at a time and expand to
>>>> batching
>>>> >>>> >> later.
>>>> >>>> >>
>>>> >>>> >> > Next, at Cortical we use a technology called DropWizard which
>>>> >>>> >> > makes it very easy to deploy an HTTP server capable of RESTful
>>>> >>>> >> > queries (I have done this for Twitter processing involving
>>>> >>>> >> > HTM.java).
>>>> >>>> >>
>>>> >>>> >> If this is going to use NuPIC and Python, I have found that it's
>>>> >>>> >> super easy to set up REST with web.py [1]. It's just a matter of
>>>> >>>> >> writing a class and a few functions. For REST on the JVM, I am
>>>> >>>> >> open to suggestions.
>>>> >>>> >>
>>>> >>>> >> On Sat, Dec 5, 2015 at 5:50 PM, Pascal Weinberger
>>>> >>>> >> <[email protected]> wrote:
>>>> >>>> >> > Like an extended version of HTM engine?
>>>> >>>> >> > This would be the solution to the htmengine prediction issue
>>>> :)
>>>> >>>> >>
>>>> >>>> >> If we chose the HTM Engine option, then yes, we would need to
>>>> >>>> >> add some features to HTM Engine, especially prediction and
>>>> >>>> >> user-defined model params. This is not a small job, but it would
>>>> >>>> >> be great to have a scaling platform already built into the HTTP
>>>> >>>> >> server. I would be happy even if we just started with an attempt
>>>> >>>> >> to make HTM Engine (and the HTTP server in the skeleton app)
>>>> >>>> >> deployable to the cloud. Even with its current capabilities, I
>>>> >>>> >> could start using it immediately and we could add features over
>>>> >>>> >> time.
>>>> >>>> >>
>>>> >>>> >> > Will you set up a repo in the community? :)
>>>> >>>> >>
>>>> >>>> >> Placeholder: https://github.com/nupic-community/htm-over-http
>>>> >>>> >>
>>>> >>>> >> Let's continue discussion on Gitter [2]. Our first decision is
>>>> >>>> >> which HTM implementation to use. I am leaning towards HTM Engine
>>>> >>>> >> because it would take the smallest amount of effort to do the
>>>> >>>> >> deployment configuration around it and get an MVP running the
>>>> >>>> >> fastest (even if it doesn't do prediction or custom model params
>>>> >>>> >> out of the box).
>>>> >>>> >>
>>>> >>>> >> IMO the best way to attack this is to get something minimal
>>>> >>>> >> running ASAP and add features as required.
>>>> >>>> >>
>>>> >>>> >> [1] http://webpy.org/
>>>> >>>> >> [2] https://gitter.im/nupic-community/htm-over-http
>>>> >>>> >> ---------
>>>> >>>> >> Matt Taylor
>>>> >>>> >> OS Community Flag-Bearer
>>>> >>>> >> Numenta
>>>> >>>> >>
>>>> >>>> >
>>>> >>>> >
>>>> >>>> >
>>>> >>>> > --
>>>> >>>> > Jonathan Mackenzie
>>>> >>>> > BEng (Software) Hons
>>>> >>>> > PhD Candidate, Flinders University
>>>> >>>>
>>>> >>>
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> With kind regards,
>>>> >>
>>>> >> David Ray
>>>> >> Java Solutions Architect
>>>> >>
>>>> >> Cortical.io
>>>> >> Sponsor of:  HTM.java
>>>> >>
>>>> >> [email protected]
>>>> >> http://cortical.io
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>>
>> Fergal Byrne, Brenter IT @fergbyrne
>>
>> http://inbits.com - Better Living through Thoughtful Technology
>> http://ie.linkedin.com/in/fergbyrne/ - https://github.com/fergalbyrne
>>
>> Founder of Clortex: HTM in Clojure -
>> https://github.com/nupic-community/clortex
>> Co-creator @OccupyStartups Time-Bombed Open License
>> http://occupystartups.me
>>
>> Author, Real Machine Intelligence with Clortex and NuPIC
>> Read for free or buy the book at https://leanpub.com/realsmartmachines
>>
>> e:[email protected] t:+353 83 4214179
>> Join the quest for Machine Intelligence at http://numenta.org
>> Formerly of Adnet [email protected] http://www.adnet.ie
>>
>
>
>
>
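
The mem() helper in the quoted message above reports heap occupancy only. A hedged extension, sketched below, also reports cumulative garbage-collection time through the standard java.lang.management API, which makes the thrashing described in the thread visible: GC time climbs while used memory barely drops. The class name HeapMonitor and the report format are assumptions for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: heap usage plus cumulative GC time for the current JVM. */
final class HeapMonitor {
    private static final long MB = 1024L * 1024L;

    /** Heap currently in use, in MB (total allocated heap minus free heap). */
    static long usedMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    /** Approximate time in ms spent in GC since JVM start, summed over all collectors. */
    static long gcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** One-line report, intended to be printed every few rows. */
    static String report() {
        Runtime rt = Runtime.getRuntime();
        return "Used: " + usedMb() + " MB"
                + "\tMax: " + rt.maxMemory() / MB + " MB"
                + "\tGC time: " + gcMillis() + " ms";
    }
}
```

Calling System.out.println(HeapMonitor.report()) every few hundred rows is enough to see whether GC time grows much faster than the row count.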



