Hi David,

No, I'm afraid this is an insurmountable problem if you use objects instead of unboxed arrays. NuPIC can do this by storing all the state in a set of C++ unboxed arrays, and it can persist that, free the memory, and load a different model. That's how HTM Engine (and Grok) can run lots of models. You can't do this in a single JVM if you have live refs to your Networks, and rebuilding a Network from disk is a big cost.
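
To make the unboxed-array point concrete, here is a minimal Java sketch. It assumes a hypothetical model whose whole state fits in one primitive `float[]`; the class name `FlatModelState` and the field `permanences` are invented for illustration and do not reflect how NuPIC or HTM.java actually lay out their state. It shows the persist, free, and reload cycle described above: the array is written to disk, the only live reference is dropped, and the state is rebuilt with a single allocation.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical stand-in for a model whose entire state is one primitive array.
// This is NOT the NuPIC or HTM.java layout; it only illustrates the idea.
class FlatModelState {
    final float[] permanences;

    FlatModelState(float[] permanences) {
        this.permanences = permanences;
    }

    // Write the array as a length prefix followed by the raw float values.
    void save(Path file) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(permanences.length);
            for (float p : permanences) {
                out.writeFloat(p);
            }
        }
    }

    // Rebuild the state with a single array allocation, no per-element objects.
    static FlatModelState load(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            float[] data = new float[in.readInt()];
            for (int i = 0; i < data.length; i++) {
                data[i] = in.readFloat();
            }
            return new FlatModelState(data);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("model-a", ".bin");
        FlatModelState a = new FlatModelState(new float[] {0.1f, 0.5f, 0.9f});
        a.save(file);
        a = null; // drop the only live reference so the GC can reclaim the array

        FlatModelState restored = FlatModelState.load(file);
        System.out.println(Arrays.toString(restored.permanences));
        Files.delete(file);
    }
}
```

A graph of millions of small objects can be persisted too, but the collector has to trace every one of them while they are live, and reloading means allocating them all again. A flat array is one object to trace and one allocation to restore.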

The right way to do this is to have one or a very small number of Networks in each JVM, and manage them using something like htm-moclu. (By the way, this means you can't run NuPIC on GAE because they prohibit user-provided C/C++. Just Compute Engine or Container Engine).

Regards
Fergal

On Tue, Dec 8, 2015 at 11:21 AM, cogmission (David Ray) <[email protected]> wrote:

> Hi Fergal,
>
> It's not that big of a deal. I just haven't done a round of profiling yet.
> Therefore there is lots of room for improvement in terms of memory
> handling. There are lots of JVM applications running really data heavy
> applications and the state of the art JVM GC is fully capable of handling
> these loads. I did a preliminary profiling session back in January and
> found some places where memory consumption could be optimized - meaning to
> get back to it after the Network API was finished because I didn't want to
> optimize things before I got to see some typical usage patterns. If GC were
> an inescapable problem you wouldn't have the tons of mission critical apps
> in the financial industry that are running today.
>
> I am only one person, and I will get around to it. HTM.java is technically
> a pre-release (alpha) version for this reason.
>
> Cheers,
> David
>
> On Tue, Dec 8, 2015 at 4:59 AM, Fergal Byrne <[email protected]> wrote:
>
>> Hi Matt,
>>
>> As Stuart Holloway explains here [1], on the JVM, it's always GC. I can
>> barely run a single 2048x16 HTM model on my 8GB laptop on the hotgym hourly
>> data - it slows to a crawl after 1000 rows because it's thrashing the GC
>> trying to free space on the heap (setting -Xmx2800m as JVM params helps).
>> Good luck trying to keep any more than one model per JVM up for any length
>> of time.
>>
>> If you run htm-moclu in a single JVM, something somewhere will have live
>> references leading to every Network you have loaded. So your live heap is
>> going to be at least N models x heap per model.
>> This is not a big problem
>> until you start growing distal segments, which are Java objects on the
>> heap. In HTM.java this happens in the TM, which grows as it learns.
>>
>> GC will detect an impending OOM condition, then will stop the world and
>> mark all these live references, traversing your millions of objects.
>> Finding nothing to free, the JVM will eventually fail at some unpredictable
>> and unrelated point in the code.
>>
>> To check this, run this function every few rows:
>>
>> void mem() {
>>     int mb = 1024 * 1024;
>>
>>     // get Runtime instance
>>     Runtime instance = Runtime.getRuntime();
>>
>>     //System.out.println("***** Heap utilization statistics [MB] *****\n");
>>
>>     // available memory
>>     System.out.println("Total: " + instance.totalMemory() / mb
>>         + "\tFree: " + instance.freeMemory() / mb
>>         + "\tUsed Memory: " + (instance.totalMemory() - instance.freeMemory()) / mb
>>         + "\tMax Memory: " + instance.maxMemory() / mb);
>> }
>>
>> Regards,
>>
>> Fergal Byrne
>>
>> [1] https://youtu.be/FihU5JxmnBg?t=38m6s
>>
>> On Tue, Dec 8, 2015 at 9:21 AM, cogmission (David Ray) <[email protected]> wrote:
>>
>>> Hey Matt, did you try ramping up from 1 model to see if it was a
>>> capacity issue? I would be interested to see how the system responds as an
>>> increasing number of models are added. Anyway, I can't really comment on
>>> moclu as I don't know what's happening there and I don't have time these
>>> days to help investigate as I am stretched a bit thin at the moment.
>>>
>>> @antidata if you could explain what you mean by "renders the JVM
>>> unresponsive" it would help me possibly attend to any issue there might be
>>> in the Network API though I never had any problems with unresponsiveness at
>>> all. Thanks...
>>>
>>> Cheers,
>>> David
>>>
>>> On Mon, Dec 7, 2015 at 9:30 PM, Matthew Taylor <[email protected]> wrote:
>>>
>>>> David, BTW the failure in the video is at 4m:
>>>> https://youtu.be/DnKxrd4TLT8?t=4m
>>>> ---------
>>>> Matt Taylor
>>>> OS Community Flag-Bearer
>>>> Numenta
>>>>
>>>>
>>>> On Mon, Dec 7, 2015 at 7:24 PM, Matthew Taylor <[email protected]> wrote:
>>>> > David and Mike,
>>>> >
>>>> > I've moved this to another topic to discuss.
>>>> >
>>>> > So what I tried with moclu was to take the HTM engine traffic app as shown here:
>>>> >
>>>> > https://github.com/nupic-community/htmengine-traffic-tutorial/blob/master/images/HTM-Traffic-Architecture.jpg
>>>> >
>>>> > And I swapped out the entire green python box containing the HTM
>>>> > Engine and replaced it with a local instance of moclu. When the
>>>> > traffic app starts up, it creates 153 models immediately and then
>>>> > starts pushing data into all of them at once:
>>>> >
>>>> > https://youtu.be/lzJd_a6y6-E?t=15m
>>>> >
>>>> > This caused dramatic failure in HTM Moclu, and I think that is what
>>>> > Mike's talking about. I recorded it for Mike here:
>>>> > https://www.youtube.com/watch?v=DnKxrd4TLT8
>>>> >
>>>> > I hope that explains some things.
>>>> >
>>>> > ---------
>>>> > Matt Taylor
>>>> > OS Community Flag-Bearer
>>>> > Numenta
>>>> >
>>>> >
>>>> > On Mon, Dec 7, 2015 at 9:15 AM, cogmission (David Ray)
>>>> > <[email protected]> wrote:
>>>> >>> the issue you faced is that it can't create hundreds of models at the
>>>> >>> same time (like it's done by the traffic example) because instantiating a
>>>> >>> Network object from Htm.java is an expensive operation that turns the JVM
>>>> >>> unresponsive.
>>>> >>
>>>> >> What is being implied here? Are you saying that instantiating HTM.java is
>>>> >> any more expensive than instantiating any other medium weight application?
>>>> >>
>>>> >> Cheers,
>>>> >> David
>>>> >>
>>>> >> On Mon, Dec 7, 2015 at 11:05 AM, M.Lucchetta <[email protected]> wrote:
>>>> >>>
>>>> >>> Hello Matt, folks
>>>> >>>
>>>> >>> You can currently use Htm-MoClu in just one computer, the issue you faced
>>>> >>> is that it can't create hundreds of models at the same time (like it's done
>>>> >>> by the traffic example) because instantiating a Network object from Htm.java
>>>> >>> is an expensive operation that turns the JVM unresponsive.
>>>> >>>
>>>> >>> I'm currently working on the Release Candidate (v 1.0.0) and the only
>>>> >>> thing missing from your specs is:
>>>> >>>
>>>> >>> `allows POST of full model params`
>>>> >>>
>>>> >>> Will chat over Gitter to get more details on this.
>>>> >>>
>>>> >>> You can find an example of its usage in https://github.com/antidata/ATAD
>>>> >>> it uses the Lift Web Framework (Comet Actors) to push updates to the browser
>>>> >>> in real time (similar to web sockets proposition) and saves the requests +
>>>> >>> results into MongoDB so you can query both the data coming from outside and
>>>> >>> the data generated from HTM (anomaly score + predictions).
>>>> >>> One last comment is that Htm-Moclu is web framework agnostic, you can use
>>>> >>> any web framework that works on the JVM.
>>>> >>>
>>>> >>> Feel free to ping me if any of you like to contribute to this project.
>>>> >>>
>>>> >>> Thanks!
>>>> >>>
>>>> >>> On 7 December 2015 at 08:36, Matthew Taylor <[email protected]> wrote:
>>>> >>>>
>>>> >>>> Ok folks, let's move discussion of the implementation to Github. First
>>>> >>>> question to answer is which HTM implementation to use:
>>>> >>>> https://github.com/nupic-community/htm-over-http/issues/2
>>>> >>>>
>>>> >>>> Anyone else reading this is free to jump in and help out, but I want
>>>> >>>> to define our work properly using Github issues so we all know what is
>>>> >>>> happening and who is working on what.
>>>> >>>> ---------
>>>> >>>> Matt Taylor
>>>> >>>> OS Community Flag-Bearer
>>>> >>>> Numenta
>>>> >>>>
>>>> >>>>
>>>> >>>> On Sun, Dec 6, 2015 at 10:25 PM, Jonathan Mackenzie <[email protected]> wrote:
>>>> >>>> > Sounds like a good app Matt, I can help out. Personally, for getting a web
>>>> >>>> > app off the ground quickly in python I recommend pyramid:
>>>> >>>> > http://www.pylonsproject.org/
>>>> >>>> >
>>>> >>>> > On 7 December 2015 at 03:31, Matthew Taylor <[email protected]> wrote:
>>>> >>>> >>
>>>> >>>> >> Thanks for the interest! I'll try to respond to everyone in this
>>>> >>>> >> email. But first, who reading this would want to use an HTM over HTTP
>>>> >>>> >> service like this? It means that you won't need to have HTM running on
>>>> >>>> >> the same system that is generating the data. It's basically HTM in the
>>>> >>>> >> Cloud. :)
>>>> >>>> >>
>>>> >>>> >> On Sat, Dec 5, 2015 at 12:16 PM, Marcus Lewis <[email protected]> wrote:
>>>> >>>> >> > I'm interested in HTTP GET, inspecting models.
>>>> >>>> >>
>>>> >>>> >> Great feature to add after a minimum viable product has been created,
>>>> >>>> >> but this adds the complexity of either caching or persistence
>>>> >>>> >> (depending on how much history you want).
>>>> >>>> >>
>>>> >>>> >> On Sat, Dec 5, 2015 at 2:03 PM, cogmission (David Ray)
>>>> >>>> >> <[email protected]> wrote:
>>>> >>>> >> > One thing I am concerned about is the call/answer nature of the interface
>>>> >>>> >> > you describe because of the latency involved in a submit-one-row-per-call
>>>> >>>> >> > methodology? Should it not be able to "batch" process rows of data instead?
>>>> >>>> >> > (batches could contain one row if you were dedicated to being a masochist)?
>>>> >>>> >>
>>>> >>>> >> Yes, we will eventually need that, but I don't need it in the
>>>> >>>> >> prototype. Let's focus on one row at a time and expand to batching
>>>> >>>> >> later.
>>>> >>>> >>
>>>> >>>> >> > Next, at Cortical we use a technology called DropWizard which makes it very
>>>> >>>> >> > easy to deploy an HTTP server capable of Restful queries (I have done this
>>>> >>>> >> > for Twitter processing involving HTM.java).
>>>> >>>> >>
>>>> >>>> >> If this is going to use NuPIC and python, I have found that it's super
>>>> >>>> >> easy to set up REST with web.py [1]. Just a matter of writing a class
>>>> >>>> >> and a few functions. For REST on the JVM, I am open for suggestions.
>>>> >>>> >>
>>>> >>>> >> On Sat, Dec 5, 2015 at 5:50 PM, Pascal Weinberger
>>>> >>>> >> <[email protected]> wrote:
>>>> >>>> >> > Like an extended version of HTM engine?
>>>> >>>> >> > This would be the solution to the htmengine prediction issue :)
>>>> >>>> >>
>>>> >>>> >> If we chose the HTM Engine option, then yes we would need to add some
>>>> >>>> >> features to HTM Engine, especially prediction and user-defined model
>>>> >>>> >> params. This is not a little job, but it would be great to have a
>>>> >>>> >> scaling platform already built into the HTTP server.
>>>> >>>> >> I would be happy
>>>> >>>> >> even if we just started with an attempt to make HTM Engine (and the
>>>> >>>> >> HTTP server in the skeleton app) deployable to the cloud. Even with
>>>> >>>> >> its current capabilities, I could start using it immediately and we
>>>> >>>> >> could add features over time.
>>>> >>>> >>
>>>> >>>> >> > Will you set up a repo in the community? :)
>>>> >>>> >>
>>>> >>>> >> Placeholder: https://github.com/nupic-community/htm-over-http
>>>> >>>> >>
>>>> >>>> >> Let's continue discussion on Gitter [2]. Our first decision is to
>>>> >>>> >> decide which HTM implementation to use. I am leaning towards HTM
>>>> >>>> >> Engine because it would take the smallest amount of effort to do the
>>>> >>>> >> deployment configuration around it and get an MVP running the fastest
>>>> >>>> >> (even if it doesn't do prediction or custom model params out of the
>>>> >>>> >> box).
>>>> >>>> >>
>>>> >>>> >> IMO the best way to attack this is to get something minimal running
>>>> >>>> >> ASAP and add features as required.
>>>> >>>> >>
>>>> >>>> >> [1] http://webpy.org/
>>>> >>>> >> [2] https://gitter.im/nupic-community/htm-over-http
>>>> >>>> >> ---------
>>>> >>>> >> Matt Taylor
>>>> >>>> >> OS Community Flag-Bearer
>>>> >>>> >> Numenta
>>>> >>>> >>
>>>> >>>> >
>>>> >>>> > --
>>>> >>>> > Jonathan Mackenzie
>>>> >>>> > BEng (Software) Hons
>>>> >>>> > PhD Candidate, Flinders University
>>>> >>>>
>>>> >>>
>>>> >>
>>>> >> --
>>>> >> With kind regards,
>>>> >>
>>>> >> David Ray
>>>> >> Java Solutions Architect
>>>> >>
>>>> >> Cortical.io
>>>> >> Sponsor of: HTM.java
>>>> >>
>>>> >> [email protected]
>>>> >> http://cortical.io
>>>>
>>>
>>> --
>>> *With kind regards,*
>>>
>>> David Ray
>>> Java Solutions Architect
>>>
>>> *Cortical.io <http://cortical.io/>*
>>> Sponsor of: HTM.java <https://github.com/numenta/htm.java>
>>>
>>> [email protected]
>>> http://cortical.io
>>>
>>
>> --
>>
>> Fergal Byrne, Brenter IT @fergbyrne
>>
>> http://inbits.com - Better Living through Thoughtful Technology
>> http://ie.linkedin.com/in/fergbyrne/ - https://github.com/fergalbyrne
>>
>> Founder of Clortex: HTM in Clojure -
>> https://github.com/nupic-community/clortex
>> Co-creator @OccupyStartups Time-Bombed Open License
>> http://occupystartups.me
>>
>> Author, Real Machine Intelligence with Clortex and NuPIC
>> Read for free or buy the book at https://leanpub.com/realsmartmachines
>>
>> e:[email protected] t:+353 83 4214179
>> Join the quest for Machine Intelligence at http://numenta.org
>> Formerly of Adnet [email protected] http://www.adnet.ie
>>
>
> --
> *With kind regards,*
>
> David Ray
> Java Solutions Architect
>
> *Cortical.io <http://cortical.io/>*
> Sponsor of: HTM.java <https://github.com/numenta/htm.java>
>
> [email protected]
> http://cortical.io
>

--

Fergal Byrne, Brenter IT @fergbyrne

http://inbits.com - Better Living through Thoughtful Technology
http://ie.linkedin.com/in/fergbyrne/ - https://github.com/fergalbyrne

Founder of Clortex: HTM in Clojure - https://github.com/nupic-community/clortex
Co-creator @OccupyStartups Time-Bombed Open License http://occupystartups.me

Author, Real Machine Intelligence with Clortex and NuPIC
Read for free or buy the book at https://leanpub.com/realsmartmachines

e:[email protected] t:+353 83 4214179
Join the quest for Machine Intelligence at http://numenta.org
Formerly of Adnet [email protected] http://www.adnet.ie
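
For anyone who wants to try the heap check quoted earlier in this thread, the following is one possible standalone wrapper, offered as a sketch rather than a tested tool. The class name `HeapWatch`, the 1000-row reporting interval, and the 90% warning threshold are all arbitrary assumptions, and the loop body only marks where rows would be fed to a model. It uses only the standard `Runtime` methods `totalMemory()`, `freeMemory()`, and `maxMemory()`.

```java
// Hypothetical helper; names, interval, and threshold are illustrative choices.
class HeapWatch {
    private static final long MB = 1024L * 1024L;

    // Formats the current heap statistics in megabytes.
    static String report() {
        Runtime rt = Runtime.getRuntime();
        long total = rt.totalMemory();
        long free = rt.freeMemory();
        return "Total: " + total / MB
                + "\tFree: " + free / MB
                + "\tUsed: " + (total - free) / MB
                + "\tMax: " + rt.maxMemory() / MB;
    }

    // Returns true when used heap exceeds the given fraction of the maximum heap.
    static boolean nearLimit(double fraction) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return used > fraction * rt.maxMemory();
    }

    public static void main(String[] args) {
        int rows = 5000;  // hypothetical number of input rows
        int every = 1000; // reporting interval, an arbitrary choice
        for (int row = 1; row <= rows; row++) {
            // Feed the row to the model here (omitted in this sketch).
            if (row % every == 0) {
                System.out.println("row " + row + "\t" + report());
                if (nearLimit(0.9)) {
                    System.out.println("warning: used heap is above 90% of max");
                }
            }
        }
    }
}
```

If the "Used" figure keeps climbing between reports and never drops back after a collection, that is consistent with the live-reference growth described in the thread. These numbers are coarse, so a profiler or GC logging would be needed to confirm the cause.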
