Hi Fergal,

By "unboxed" are you referring to the term used in the Java world for primitive constructs? I just want to be clear about what you mean. So are you saying that HTM.java should be de-objectified?
On Tue, Dec 8, 2015 at 5:34 AM, Fergal Byrne <[email protected]> wrote:

> Hi David,
>
> No, I'm afraid this is an insurmountable problem if you use objects
> instead of unboxed arrays. NuPIC can do this by storing all the state in a
> set of C++ unboxed arrays, and it can persist that, free the memory, and
> load a different model. That's how HTM Engine (and Grok) can run lots of
> models. You can't do this in a single JVM if you have live refs to your
> Networks, and rebuilding a Network from disk is a big cost.
>
> The right way to do this is to have one or a very small number of Networks
> in each JVM, and manage them using something like htm-moclu.
>
> (By the way, this means you can't run NuPIC on GAE, because they prohibit
> user-provided C/C++. Just Compute Engine or Container Engine.)
>
> Regards,
>
> Fergal
>
> On Tue, Dec 8, 2015 at 11:21 AM, cogmission (David Ray) <[email protected]> wrote:
>
>> Hi Fergal,
>>
>> It's not that big of a deal. I just haven't done a round of profiling
>> yet, so there is lots of room for improvement in terms of memory
>> handling. There are lots of JVM applications running really data-heavy
>> workloads, and the state-of-the-art JVM GC is fully capable of handling
>> these loads. I did a preliminary profiling session back in January and
>> found some places where memory consumption could be optimized, meaning to
>> get back to it after the Network API was finished, because I didn't want to
>> optimize things before I got to see some typical usage patterns. If GC were
>> an inescapable problem, you wouldn't have the tons of mission-critical apps
>> in the financial industry that are running today.
>>
>> I am only one person, and I will get around to it. HTM.java is
>> technically a pre-release (alpha) version for this reason.
>>
>> Cheers,
>> David
>>
>> On Tue, Dec 8, 2015 at 4:59 AM, Fergal Byrne <[email protected]> wrote:
>>
>>> Hi Matt,
>>>
>>> As Stuart Holloway explains here [1], on the JVM, it's always GC.
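[Editor's note: Fergal's unboxed-arrays point can be sketched in a few lines. This is a hypothetical illustration, not NuPIC or HTM.java code; `persist` and `load` are made-up names. If all of a model's state lives in flat primitive arrays, then persisting it, freeing the memory, and loading a different model's state is just raw I/O, with no object graph to rebuild.]

```python
import array
import os
import tempfile

def persist(state: array.array, path: str) -> None:
    """Write the raw, unboxed state (C doubles) straight to disk."""
    with open(path, "wb") as f:
        state.tofile(f)

def load(path: str, typecode: str = "d") -> array.array:
    """Read the raw state back in one shot; nothing to re-instantiate."""
    state = array.array(typecode)
    with open(path, "rb") as f:
        state.fromfile(f, os.path.getsize(path) // state.itemsize)
    return state

if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    # Stand-in for a model's permanences, activations, etc.
    model_a = array.array("d", [0.1, 0.2, 0.3])
    persist(model_a, os.path.join(workdir, "model_a.bin"))
    del model_a                                            # free the memory...
    model_a = load(os.path.join(workdir, "model_a.bin"))   # ...and swap it back in
    print(list(model_a))
```

The same slot can then be reused for any number of logical models, which is the pattern that lets one process juggle many models without keeping them all live.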
>>> I can barely run a single 2048x16 HTM model on my 8GB laptop on the hotgym
>>> hourly data - it slows to a crawl after 1000 rows because it's thrashing the
>>> GC, trying to free space on the heap (setting -Xmx2800m in the JVM params
>>> helps). Good luck trying to keep any more than one model per JVM up for any
>>> length of time.
>>>
>>> If you run htm-moclu in a single JVM, something somewhere will have live
>>> references leading to every Network you have loaded, so your live heap is
>>> going to be at least N models x heap per model. This is not a big problem
>>> until you start growing distal segments, which are Java objects on the
>>> heap. In HTM.java this happens in the TM, which grows as it learns.
>>>
>>> GC will detect an impending OOM condition, then will stop the world and
>>> mark all these live references, traversing your millions of objects.
>>> Finding nothing to free, the JVM will eventually fail at some unpredictable
>>> and unrelated point in the code.
>>>
>>> To check this, run this function every few rows:
>>>
>>> void mem() {
>>>     int mb = 1024 * 1024;
>>>
>>>     // get the Runtime instance
>>>     Runtime instance = Runtime.getRuntime();
>>>
>>>     // heap utilization statistics [MB]
>>>     System.out.println("Total: " + instance.totalMemory() / mb
>>>             + "\tFree: " + instance.freeMemory() / mb
>>>             + "\tUsed Memory: " + (instance.totalMemory() - instance.freeMemory()) / mb
>>>             + "\tMax Memory: " + instance.maxMemory() / mb);
>>> }
>>>
>>> Regards,
>>>
>>> Fergal Byrne
>>>
>>> [1] https://youtu.be/FihU5JxmnBg?t=38m6s
>>>
>>> On Tue, Dec 8, 2015 at 9:21 AM, cogmission (David Ray) <[email protected]> wrote:
>>>
>>>> Hey Matt, did you try ramping up from 1 model to see if it was a
>>>> capacity issue? I would be interested to see how the system responds as an
>>>> increasing number of models are added.
>>>> Anyway, I can't really comment on moclu, as I don't know what's happening
>>>> there, and I don't have time these days to help investigate, since I am
>>>> stretched a bit thin at the moment.
>>>>
>>>> @antidata, if you could explain what you mean by "renders the JVM
>>>> unresponsive", it would help me attend to any issue there might be in the
>>>> Network API, though I have never had any problems with unresponsiveness
>>>> at all. Thanks...
>>>>
>>>> Cheers,
>>>> David
>>>>
>>>> On Mon, Dec 7, 2015 at 9:30 PM, Matthew Taylor <[email protected]> wrote:
>>>>
>>>>> David, BTW the failure in the video is at 4m:
>>>>> https://youtu.be/DnKxrd4TLT8?t=4m
>>>>> ---------
>>>>> Matt Taylor
>>>>> OS Community Flag-Bearer
>>>>> Numenta
>>>>>
>>>>> On Mon, Dec 7, 2015 at 7:24 PM, Matthew Taylor <[email protected]> wrote:
>>>>> > David and Mike,
>>>>> >
>>>>> > I've moved this to another topic to discuss.
>>>>> >
>>>>> > So what I tried with moclu was to take the HTM Engine traffic app as
>>>>> > shown here:
>>>>> >
>>>>> > https://github.com/nupic-community/htmengine-traffic-tutorial/blob/master/images/HTM-Traffic-Architecture.jpg
>>>>> >
>>>>> > I swapped out the entire green Python box containing the HTM
>>>>> > Engine and replaced it with a local instance of moclu. When the
>>>>> > traffic app starts up, it creates 153 models immediately and then
>>>>> > starts pushing data into all of them at once:
>>>>> >
>>>>> > https://youtu.be/lzJd_a6y6-E?t=15m
>>>>> >
>>>>> > This caused a dramatic failure in HTM Moclu, and I think that is what
>>>>> > Mike's talking about. I recorded it for Mike here:
>>>>> > https://www.youtube.com/watch?v=DnKxrd4TLT8
>>>>> >
>>>>> > I hope that explains some things.
>>>>> >
>>>>> > ---------
>>>>> > Matt Taylor
>>>>> > OS Community Flag-Bearer
>>>>> > Numenta
>>>>> >
>>>>> > On Mon, Dec 7, 2015 at 9:15 AM, cogmission (David Ray) <[email protected]> wrote:
>>>>> >>
>>>>> >>> the issue you faced is that it can't create hundreds of models at the
>>>>> >>> same time (like it's done by the traffic example) because instantiating
>>>>> >>> a Network object from HTM.java is an expensive operation that turns the
>>>>> >>> JVM unresponsive.
>>>>> >>
>>>>> >> What is being implied here? Are you saying that instantiating HTM.java
>>>>> >> is any more expensive than instantiating any other medium-weight
>>>>> >> application?
>>>>> >>
>>>>> >> Cheers,
>>>>> >> David
>>>>> >>
>>>>> >> On Mon, Dec 7, 2015 at 11:05 AM, M.Lucchetta <[email protected]> wrote:
>>>>> >>>
>>>>> >>> Hello Matt, folks,
>>>>> >>>
>>>>> >>> You can currently use Htm-MoClu on just one computer. The issue you
>>>>> >>> faced is that it can't create hundreds of models at the same time (as
>>>>> >>> is done by the traffic example), because instantiating a Network object
>>>>> >>> from HTM.java is an expensive operation that turns the JVM unresponsive.
>>>>> >>>
>>>>> >>> I'm currently working on the Release Candidate (v1.0.0), and the only
>>>>> >>> thing missing from your specs is:
>>>>> >>>
>>>>> >>> `allows POST of full model params`
>>>>> >>>
>>>>> >>> I will chat over Gitter to get more details on this.
>>>>> >>>
>>>>> >>> You can find an example of its usage at https://github.com/antidata/ATAD.
>>>>> >>> It uses the Lift Web Framework (Comet Actors) to push updates to the
>>>>> >>> browser in real time (similar to the WebSockets proposition) and saves
>>>>> >>> the requests + results into MongoDB, so you can query both the data
>>>>> >>> coming from outside and the data generated by HTM (anomaly score +
>>>>> >>> predictions).
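[Editor's note: one common way around expensive model construction, sketched generically here, is to keep only a small fixed number of models live and evict the least recently used one, persisting its state before dropping it. This is a hypothetical illustration, not how Htm-MoClu actually works; `ModelCache`, `factory`, and the `evict` hook are made-up names.]

```python
from collections import OrderedDict
from typing import Any, Callable

class ModelCache:
    """Keep at most `capacity` expensive models alive, evicting by LRU.

    `factory` builds a model for an id; `evict` is called with
    (model_id, model) so the caller can persist state before the
    object is dropped.
    """

    def __init__(self, capacity: int, factory: Callable[[str], Any],
                 evict: Callable[[str, Any], None] = lambda mid, m: None):
        self.capacity = capacity
        self.factory = factory
        self.evict = evict
        self._live: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, model_id: str) -> Any:
        if model_id in self._live:
            self._live.move_to_end(model_id)   # mark as recently used
            return self._live[model_id]
        if len(self._live) >= self.capacity:
            old_id, old_model = self._live.popitem(last=False)  # oldest out
            self.evict(old_id, old_model)      # persist before dropping
        model = self._live[model_id] = self.factory(model_id)
        return model

# Usage sketch: even with 153 logical models, only 2 are ever live at once.
evicted = []
cache = ModelCache(2, factory=lambda mid: {"id": mid},
                   evict=lambda mid, m: evicted.append(mid))
for mid in ["m1", "m2", "m3", "m1"]:
    cache.get(mid)
```

The trade-off is a rebuild (or state reload) on each cache miss, which is exactly why Fergal suggests keeping the per-model state in flat arrays that are cheap to swap in and out.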
>>>>> >>>
>>>>> >>> One last comment is that Htm-Moclu is web-framework agnostic: you can
>>>>> >>> use any web framework that works on the JVM.
>>>>> >>>
>>>>> >>> Feel free to ping me if any of you would like to contribute to this
>>>>> >>> project.
>>>>> >>>
>>>>> >>> Thanks!
>>>>> >>>
>>>>> >>> On 7 December 2015 at 08:36, Matthew Taylor <[email protected]> wrote:
>>>>> >>>>
>>>>> >>>> OK folks, let's move discussion of the implementation to GitHub. The
>>>>> >>>> first question to answer is which HTM implementation to use:
>>>>> >>>> https://github.com/nupic-community/htm-over-http/issues/2
>>>>> >>>>
>>>>> >>>> Anyone else reading this is free to jump in and help out, but I want
>>>>> >>>> to define our work properly using GitHub issues so we all know what is
>>>>> >>>> happening and who is working on what.
>>>>> >>>> ---------
>>>>> >>>> Matt Taylor
>>>>> >>>> OS Community Flag-Bearer
>>>>> >>>> Numenta
>>>>> >>>>
>>>>> >>>> On Sun, Dec 6, 2015 at 10:25 PM, Jonathan Mackenzie <[email protected]> wrote:
>>>>> >>>> > Sounds like a good app, Matt; I can help out. Personally, for
>>>>> >>>> > getting a web app off the ground quickly in Python, I recommend
>>>>> >>>> > Pyramid: http://www.pylonsproject.org/
>>>>> >>>> >
>>>>> >>>> > On 7 December 2015 at 03:31, Matthew Taylor <[email protected]> wrote:
>>>>> >>>> >>
>>>>> >>>> >> Thanks for the interest! I'll try to respond to everyone in this
>>>>> >>>> >> email. But first, who reading this would want to use an HTM-over-HTTP
>>>>> >>>> >> service like this? It means that you won't need to have HTM running
>>>>> >>>> >> on the same system that is generating the data. It's basically HTM
>>>>> >>>> >> in the Cloud. :)
>>>>> >>>> >>
>>>>> >>>> >> On Sat, Dec 5, 2015 at 12:16 PM, Marcus Lewis <[email protected]> wrote:
>>>>> >>>> >> > I'm interested in HTTP GET, inspecting models.
>>>>> >>>> >>
>>>>> >>>> >> Great feature to add after a minimum viable product has been
>>>>> >>>> >> created, but this adds the complexity of either caching or
>>>>> >>>> >> persistence (depending on how much history you want).
>>>>> >>>> >>
>>>>> >>>> >> On Sat, Dec 5, 2015 at 2:03 PM, cogmission (David Ray) <[email protected]> wrote:
>>>>> >>>> >> > One thing I am concerned about is the call/answer nature of the
>>>>> >>>> >> > interface you describe, because of the latency involved in a
>>>>> >>>> >> > submit-one-row-per-call methodology. Should it not be able to
>>>>> >>>> >> > "batch" process rows of data instead? (Batches could contain one
>>>>> >>>> >> > row if you were dedicated to being a masochist.)
>>>>> >>>> >>
>>>>> >>>> >> Yes, we will eventually need that, but I don't need it in the
>>>>> >>>> >> prototype. Let's focus on one row at a time and expand to batching
>>>>> >>>> >> later.
>>>>> >>>> >>
>>>>> >>>> >> > Next, at Cortical we use a technology called DropWizard, which
>>>>> >>>> >> > makes it very easy to deploy an HTTP server capable of RESTful
>>>>> >>>> >> > queries (I have done this for Twitter processing involving
>>>>> >>>> >> > HTM.java).
>>>>> >>>> >>
>>>>> >>>> >> If this is going to use NuPIC and Python, I have found that it's
>>>>> >>>> >> super easy to set up REST with web.py [1] - just a matter of writing
>>>>> >>>> >> a class and a few functions. For REST on the JVM, I am open to
>>>>> >>>> >> suggestions.
>>>>> >>>> >>
>>>>> >>>> >> On Sat, Dec 5, 2015 at 5:50 PM, Pascal Weinberger <[email protected]> wrote:
>>>>> >>>> >> > Like an extended version of HTM Engine?
>>>>> >>>> >> > This would be the solution to the htmengine prediction issue :)
>>>>> >>>> >>
>>>>> >>>> >> If we chose the HTM Engine option, then yes, we would need to add
>>>>> >>>> >> some features to HTM Engine, especially prediction and user-defined
>>>>> >>>> >> model params. This is not a little job, but it would be great to
>>>>> >>>> >> have a scaling platform already built into the HTTP server. I would
>>>>> >>>> >> be happy even if we just started with an attempt to make HTM Engine
>>>>> >>>> >> (and the HTTP server in the skeleton app) deployable to the cloud.
>>>>> >>>> >> Even with its current capabilities, I could start using it
>>>>> >>>> >> immediately, and we could add features over time.
>>>>> >>>> >>
>>>>> >>>> >> > Will you set up a repo in the community? :)
>>>>> >>>> >>
>>>>> >>>> >> Placeholder: https://github.com/nupic-community/htm-over-http
>>>>> >>>> >>
>>>>> >>>> >> Let's continue discussion on Gitter [2]. Our first decision is
>>>>> >>>> >> which HTM implementation to use. I am leaning towards HTM Engine
>>>>> >>>> >> because it would take the smallest amount of effort to do the
>>>>> >>>> >> deployment configuration around it and get an MVP running the
>>>>> >>>> >> fastest (even if it doesn't do prediction or custom model params
>>>>> >>>> >> out of the box).
>>>>> >>>> >>
>>>>> >>>> >> IMO the best way to attack this is to get something minimal running
>>>>> >>>> >> ASAP and add features as required.
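[Editor's note: the minimal "one row per POST" shape Matt describes might look like the sketch below, written against the Python stdlib as a stand-in for web.py or DropWizard. The model is a stub returning a fixed anomaly score; `fake_model_step`, `RowHandler`, and `serve` are made-up names, and a real service would feed the row into NuPIC or HTM.java instead.]

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

def fake_model_step(row: dict) -> dict:
    """Stub: a real service would run one HTM compute step here."""
    return {"anomalyScore": 0.5, "row": row}

class RowHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read one JSON row from the request body...
        length = int(self.headers.get("Content-Length", 0))
        row = json.loads(self.rfile.read(length))
        # ...run it through the model and return the result as JSON.
        body = json.dumps(fake_model_step(row)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

def serve(port: int = 0) -> HTTPServer:
    """Start the server on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), RowHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Batching later would just mean accepting a JSON array of rows in the same handler, which keeps the wire format compatible with the one-row prototype.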
>>>>> >>>> >>
>>>>> >>>> >> [1] http://webpy.org/
>>>>> >>>> >> [2] https://gitter.im/nupic-community/htm-over-http
>>>>> >>>> >> ---------
>>>>> >>>> >> Matt Taylor
>>>>> >>>> >> OS Community Flag-Bearer
>>>>> >>>> >> Numenta
>>>>> >>>> >
>>>>> >>>> > --
>>>>> >>>> > Jonathan Mackenzie
>>>>> >>>> > BEng (Software) Hons
>>>>> >>>> > PhD Candidate, Flinders University
>>>>
>>>> --
>>>> With kind regards,
>>>>
>>>> David Ray
>>>> Java Solutions Architect
>>>>
>>>> Cortical.io <http://cortical.io/>
>>>> Sponsor of: HTM.java <https://github.com/numenta/htm.java>
>>>>
>>>> [email protected]
>>>> http://cortical.io
>>>
>>> --
>>> Fergal Byrne, Brenter IT @fergbyrne
>>>
>>> http://inbits.com - Better Living through Thoughtful Technology
>>> http://ie.linkedin.com/in/fergbyrne/ - https://github.com/fergalbyrne
>>>
>>> Founder of Clortex: HTM in Clojure - https://github.com/nupic-community/clortex
>>> Co-creator @OccupyStartups Time-Bombed Open License http://occupystartups.me
>>>
>>> Author, Real Machine Intelligence with Clortex and NuPIC
>>> Read for free or buy the book at https://leanpub.com/realsmartmachines
>>>
>>> e: [email protected] t: +353 83 4214179
>>> Join the quest for Machine Intelligence at http://numenta.org
>>> Formerly of Adnet [email protected] http://www.adnet.ie

--
With kind regards,

David Ray
Java Solutions Architect

Cortical.io <http://cortical.io/>
Sponsor of: HTM.java <https://github.com/numenta/htm.java>

[email protected]
http://cortical.io
