Hi Fergal,

By "unboxed" are you referring to the term used in the Java world for primitive constructs? I just want to be clear about what you mean. So are you saying that HTM.java should be de-objectified?
On Tue, Dec 8, 2015 at 5:34 AM, Fergal Byrne <[email protected]> wrote:

> Hi David,
>
> No, I'm afraid this is an insurmountable problem if you use objects
> instead of unboxed arrays. NuPIC can do this by storing all the state in a
> set of C++ unboxed arrays, and it can persist that, free the memory, and
> load a different model. That's how HTM Engine (and Grok) can run lots of
> models. You can't do this in a single JVM if you have live refs to your
> Networks, and rebuilding a Network from disk is a big cost.
>
> The right way to do this is to have one or a very small number of Networks
> in each JVM, and manage them using something like htm-moclu.
>
> (By the way, this means you can't run NuPIC on GAE, because they prohibit
> user-provided C/C++. Just Compute Engine or Container Engine.)
>
> Regards,
>
> Fergal
>
> On Tue, Dec 8, 2015 at 11:21 AM, cogmission (David Ray) <[email protected]> wrote:
>
>> Hi Fergal,
>>
>> It's not that big of a deal. I just haven't done a round of profiling
>> yet, so there is lots of room for improvement in terms of memory
>> handling. There are lots of JVM applications running really data-heavy
>> workloads, and the state-of-the-art JVM GC is fully capable of handling
>> these loads. I did a preliminary profiling session back in January and
>> found some places where memory consumption could be optimized, meaning to
>> get back to it after the Network API was finished, because I didn't want to
>> optimize things before I got to see some typical usage patterns. If GC were
>> an inescapable problem, you wouldn't have the tons of mission-critical apps
>> in the financial industry that are running today.
>>
>> I am only one person, and I will get around to it. HTM.java is
>> technically a pre-release (alpha) version for this reason.
>>
>> Cheers,
>> David
>>
>> On Tue, Dec 8, 2015 at 4:59 AM, Fergal Byrne <[email protected]> wrote:
>>
>>> Hi Matt,
>>>
>>> As Stuart Holloway explains here [1], on the JVM, it's always GC.
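[Editor's note: Fergal's unboxed-arrays point can be sketched in a few lines. This is a hypothetical illustration, not NuPIC or HTM.java code; `persist` and `load` are made-up names. If all of a model's state lives in flat primitive arrays, then persisting it, freeing the memory, and loading a different model's state is just raw I/O, with no object graph to rebuild.]

```python
import array
import os
import tempfile

def persist(state: array.array, path: str) -> None:
    """Write the raw, unboxed state (C doubles) straight to disk."""
    with open(path, "wb") as f:
        state.tofile(f)

def load(path: str, typecode: str = "d") -> array.array:
    """Read the raw state back in one shot; nothing to re-instantiate."""
    state = array.array(typecode)
    with open(path, "rb") as f:
        state.fromfile(f, os.path.getsize(path) // state.itemsize)
    return state

if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    # Stand-in for a model's permanences, activations, etc.
    model_a = array.array("d", [0.1, 0.2, 0.3])
    persist(model_a, os.path.join(workdir, "model_a.bin"))
    del model_a                                            # free the memory...
    model_a = load(os.path.join(workdir, "model_a.bin"))   # ...and swap it back in
    print(list(model_a))
```

The same slot can then be reused for any number of logical models, which is the pattern that lets one process juggle many models without keeping them all live.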
>>> I can barely run a single 2048x16 HTM model on my 8GB laptop on the hotgym
>>> hourly data - it slows to a crawl after 1000 rows because it's thrashing the
>>> GC, trying to free space on the heap (setting -Xmx2800m in the JVM params
>>> helps). Good luck trying to keep any more than one model per JVM up for any
>>> length of time.
>>>
>>> If you run htm-moclu in a single JVM, something somewhere will have live
>>> references leading to every Network you have loaded, so your live heap is
>>> going to be at least N models x heap per model. This is not a big problem
>>> until you start growing distal segments, which are Java objects on the
>>> heap. In HTM.java this happens in the TM, which grows as it learns.
>>>
>>> GC will detect an impending OOM condition, then will stop the world and
>>> mark all these live references, traversing your millions of objects.
>>> Finding nothing to free, the JVM will eventually fail at some unpredictable
>>> and unrelated point in the code.
>>>
>>> To check this, run this function every few rows:
>>>
>>> void mem() {
>>>     int mb = 1024 * 1024;
>>>
>>>     // get the Runtime instance
>>>     Runtime instance = Runtime.getRuntime();
>>>
>>>     // heap utilization statistics [MB]
>>>     System.out.println("Total: " + instance.totalMemory() / mb
>>>             + "\tFree: " + instance.freeMemory() / mb
>>>             + "\tUsed Memory: " + (instance.totalMemory() - instance.freeMemory()) / mb
>>>             + "\tMax Memory: " + instance.maxMemory() / mb);
>>> }
>>>
>>> Regards,
>>>
>>> Fergal Byrne
>>>
>>> [1] https://youtu.be/FihU5JxmnBg?t=38m6s
>>>
>>> On Tue, Dec 8, 2015 at 9:21 AM, cogmission (David Ray) <[email protected]> wrote:
>>>
>>>> Hey Matt, did you try ramping up from 1 model to see if it was a
>>>> capacity issue? I would be interested to see how the system responds as an
>>>> increasing number of models are added.
>>>> Anyway, I can't really comment on moclu, as I don't know what's happening
>>>> there, and I don't have time these days to help investigate, since I am
>>>> stretched a bit thin at the moment.
>>>>
>>>> @antidata, if you could explain what you mean by "renders the JVM
>>>> unresponsive", it would help me attend to any issue there might be in the
>>>> Network API, though I have never had any problems with unresponsiveness
>>>> at all. Thanks...
>>>>
>>>> Cheers,
>>>> David
>>>>
>>>> On Mon, Dec 7, 2015 at 9:30 PM, Matthew Taylor <[email protected]> wrote:
>>>>
>>>>> David, BTW the failure in the video is at 4m:
>>>>> https://youtu.be/DnKxrd4TLT8?t=4m
>>>>> ---------
>>>>> Matt Taylor
>>>>> OS Community Flag-Bearer
>>>>> Numenta
>>>>>
>>>>> On Mon, Dec 7, 2015 at 7:24 PM, Matthew Taylor <[email protected]> wrote:
>>>>> > David and Mike,
>>>>> >
>>>>> > I've moved this to another topic to discuss.
>>>>> >
>>>>> > So what I tried with moclu was to take the HTM Engine traffic app as
>>>>> > shown here:
>>>>> >
>>>>> > https://github.com/nupic-community/htmengine-traffic-tutorial/blob/master/images/HTM-Traffic-Architecture.jpg
>>>>> >
>>>>> > I swapped out the entire green Python box containing the HTM
>>>>> > Engine and replaced it with a local instance of moclu. When the
>>>>> > traffic app starts up, it creates 153 models immediately and then
>>>>> > starts pushing data into all of them at once:
>>>>> >
>>>>> > https://youtu.be/lzJd_a6y6-E?t=15m
>>>>> >
>>>>> > This caused a dramatic failure in HTM Moclu, and I think that is what
>>>>> > Mike's talking about. I recorded it for Mike here:
>>>>> > https://www.youtube.com/watch?v=DnKxrd4TLT8
>>>>> >
>>>>> > I hope that explains some things.
>>>>> >
>>>>> > ---------
>>>>> > Matt Taylor
>>>>> > OS Community Flag-Bearer
>>>>> > Numenta
>>>>> >
>>>>> > On Mon, Dec 7, 2015 at 9:15 AM, cogmission (David Ray) <[email protected]> wrote:
>>>>> >>
>>>>> >>> the issue you faced is that it can't create hundreds of models at the
>>>>> >>> same time (like it's done by the traffic example) because instantiating
>>>>> >>> a Network object from HTM.java is an expensive operation that turns the
>>>>> >>> JVM unresponsive.
>>>>> >>
>>>>> >> What is being implied here? Are you saying that instantiating HTM.java
>>>>> >> is any more expensive than instantiating any other medium-weight
>>>>> >> application?
>>>>> >>
>>>>> >> Cheers,
>>>>> >> David
>>>>> >>
>>>>> >> On Mon, Dec 7, 2015 at 11:05 AM, M.Lucchetta <[email protected]> wrote:
>>>>> >>>
>>>>> >>> Hello Matt, folks,
>>>>> >>>
>>>>> >>> You can currently use Htm-MoClu on just one computer. The issue you
>>>>> >>> faced is that it can't create hundreds of models at the same time (as
>>>>> >>> is done by the traffic example), because instantiating a Network object
>>>>> >>> from HTM.java is an expensive operation that turns the JVM unresponsive.
>>>>> >>>
>>>>> >>> I'm currently working on the Release Candidate (v1.0.0), and the only
>>>>> >>> thing missing from your specs is:
>>>>> >>>
>>>>> >>> `allows POST of full model params`
>>>>> >>>
>>>>> >>> I will chat over Gitter to get more details on this.
>>>>> >>>
>>>>> >>> You can find an example of its usage at https://github.com/antidata/ATAD.
>>>>> >>> It uses the Lift Web Framework (Comet Actors) to push updates to the
>>>>> >>> browser in real time (similar to the WebSockets proposition) and saves
>>>>> >>> the requests + results into MongoDB, so you can query both the data
>>>>> >>> coming from outside and the data generated by HTM (anomaly score +
>>>>> >>> predictions).
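[Editor's note: one common way around expensive model construction, sketched generically here, is to keep only a small fixed number of models live and evict the least recently used one, persisting its state before dropping it. This is a hypothetical illustration, not how Htm-MoClu actually works; `ModelCache`, `factory`, and the `evict` hook are made-up names.]

```python
from collections import OrderedDict
from typing import Any, Callable

class ModelCache:
    """Keep at most `capacity` expensive models alive, evicting by LRU.

    `factory` builds a model for an id; `evict` is called with
    (model_id, model) so the caller can persist state before the
    object is dropped.
    """

    def __init__(self, capacity: int, factory: Callable[[str], Any],
                 evict: Callable[[str, Any], None] = lambda mid, m: None):
        self.capacity = capacity
        self.factory = factory
        self.evict = evict
        self._live: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, model_id: str) -> Any:
        if model_id in self._live:
            self._live.move_to_end(model_id)   # mark as recently used
            return self._live[model_id]
        if len(self._live) >= self.capacity:
            old_id, old_model = self._live.popitem(last=False)  # oldest out
            self.evict(old_id, old_model)      # persist before dropping
        model = self._live[model_id] = self.factory(model_id)
        return model

# Usage sketch: even with 153 logical models, only 2 are ever live at once.
evicted = []
cache = ModelCache(2, factory=lambda mid: {"id": mid},
                   evict=lambda mid, m: evicted.append(mid))
for mid in ["m1", "m2", "m3", "m1"]:
    cache.get(mid)
```

The trade-off is a rebuild (or state reload) on each cache miss, which is exactly why Fergal suggests keeping the per-model state in flat arrays that are cheap to swap in and out.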
>>>>> >>>
>>>>> >>> One last comment is that Htm-Moclu is web-framework agnostic: you can
>>>>> >>> use any web framework that works on the JVM.
>>>>> >>>
>>>>> >>> Feel free to ping me if any of you would like to contribute to this
>>>>> >>> project.
>>>>> >>>
>>>>> >>> Thanks!
>>>>> >>>
>>>>> >>> On 7 December 2015 at 08:36, Matthew Taylor <[email protected]> wrote:
>>>>> >>>>
>>>>> >>>> OK folks, let's move discussion of the implementation to GitHub. The
>>>>> >>>> first question to answer is which HTM implementation to use:
>>>>> >>>> https://github.com/nupic-community/htm-over-http/issues/2
>>>>> >>>>
>>>>> >>>> Anyone else reading this is free to jump in and help out, but I want
>>>>> >>>> to define our work properly using GitHub issues so we all know what is
>>>>> >>>> happening and who is working on what.
>>>>> >>>> ---------
>>>>> >>>> Matt Taylor
>>>>> >>>> OS Community Flag-Bearer
>>>>> >>>> Numenta
>>>>> >>>>
>>>>> >>>> On Sun, Dec 6, 2015 at 10:25 PM, Jonathan Mackenzie <[email protected]> wrote:
>>>>> >>>> > Sounds like a good app, Matt; I can help out. Personally, for
>>>>> >>>> > getting a web app off the ground quickly in Python, I recommend
>>>>> >>>> > Pyramid: http://www.pylonsproject.org/
>>>>> >>>> >
>>>>> >>>> > On 7 December 2015 at 03:31, Matthew Taylor <[email protected]> wrote:
>>>>> >>>> >>
>>>>> >>>> >> Thanks for the interest! I'll try to respond to everyone in this
>>>>> >>>> >> email. But first, who reading this would want to use an HTM-over-HTTP
>>>>> >>>> >> service like this? It means that you won't need to have HTM running
>>>>> >>>> >> on the same system that is generating the data. It's basically HTM
>>>>> >>>> >> in the Cloud. :)
>>>>> >>>> >>
>>>>> >>>> >> On Sat, Dec 5, 2015 at 12:16 PM, Marcus Lewis <[email protected]> wrote:
>>>>> >>>> >> > I'm interested in HTTP GET, inspecting models.
>>>>> >>>> >>
>>>>> >>>> >> Great feature to add after a minimum viable product has been
>>>>> >>>> >> created, but this adds the complexity of either caching or
>>>>> >>>> >> persistence (depending on how much history you want).
>>>>> >>>> >>
>>>>> >>>> >> On Sat, Dec 5, 2015 at 2:03 PM, cogmission (David Ray) <[email protected]> wrote:
>>>>> >>>> >> > One thing I am concerned about is the call/answer nature of the
>>>>> >>>> >> > interface you describe, because of the latency involved in a
>>>>> >>>> >> > submit-one-row-per-call methodology. Should it not be able to
>>>>> >>>> >> > "batch" process rows of data instead? (Batches could contain one
>>>>> >>>> >> > row if you were dedicated to being a masochist.)
>>>>> >>>> >>
>>>>> >>>> >> Yes, we will eventually need that, but I don't need it in the
>>>>> >>>> >> prototype. Let's focus on one row at a time and expand to batching
>>>>> >>>> >> later.
>>>>> >>>> >>
>>>>> >>>> >> > Next, at Cortical we use a technology called DropWizard, which
>>>>> >>>> >> > makes it very easy to deploy an HTTP server capable of RESTful
>>>>> >>>> >> > queries (I have done this for Twitter processing involving
>>>>> >>>> >> > HTM.java).
>>>>> >>>> >>
>>>>> >>>> >> If this is going to use NuPIC and Python, I have found that it's
>>>>> >>>> >> super easy to set up REST with web.py [1] - just a matter of writing
>>>>> >>>> >> a class and a few functions. For REST on the JVM, I am open to
>>>>> >>>> >> suggestions.
>>>>> >>>> >>
>>>>> >>>> >> On Sat, Dec 5, 2015 at 5:50 PM, Pascal Weinberger <[email protected]> wrote:
>>>>> >>>> >> > Like an extended version of HTM Engine?
>>>>> >>>> >> > This would be the solution to the htmengine prediction issue :)
>>>>> >>>> >>
>>>>> >>>> >> If we chose the HTM Engine option, then yes, we would need to add
>>>>> >>>> >> some features to HTM Engine, especially prediction and user-defined
>>>>> >>>> >> model params. This is not a little job, but it would be great to
>>>>> >>>> >> have a scaling platform already built into the HTTP server. I would
>>>>> >>>> >> be happy even if we just started with an attempt to make HTM Engine
>>>>> >>>> >> (and the HTTP server in the skeleton app) deployable to the cloud.
>>>>> >>>> >> Even with its current capabilities, I could start using it
>>>>> >>>> >> immediately, and we could add features over time.
>>>>> >>>> >>
>>>>> >>>> >> > Will you set up a repo in the community? :)
>>>>> >>>> >>
>>>>> >>>> >> Placeholder: https://github.com/nupic-community/htm-over-http
>>>>> >>>> >>
>>>>> >>>> >> Let's continue discussion on Gitter [2]. Our first decision is
>>>>> >>>> >> which HTM implementation to use. I am leaning towards HTM Engine
>>>>> >>>> >> because it would take the smallest amount of effort to do the
>>>>> >>>> >> deployment configuration around it and get an MVP running the
>>>>> >>>> >> fastest (even if it doesn't do prediction or custom model params
>>>>> >>>> >> out of the box).
>>>>> >>>> >>
>>>>> >>>> >> IMO the best way to attack this is to get something minimal running
>>>>> >>>> >> ASAP and add features as required.
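[Editor's note: the minimal "one row per POST" shape Matt describes might look like the sketch below, written against the Python stdlib as a stand-in for web.py or DropWizard. The model is a stub returning a fixed anomaly score; `fake_model_step`, `RowHandler`, and `serve` are made-up names, and a real service would feed the row into NuPIC or HTM.java instead.]

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

def fake_model_step(row: dict) -> dict:
    """Stub: a real service would run one HTM compute step here."""
    return {"anomalyScore": 0.5, "row": row}

class RowHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read one JSON row from the request body...
        length = int(self.headers.get("Content-Length", 0))
        row = json.loads(self.rfile.read(length))
        # ...run it through the model and return the result as JSON.
        body = json.dumps(fake_model_step(row)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

def serve(port: int = 0) -> HTTPServer:
    """Start the server on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), RowHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Batching later would just mean accepting a JSON array of rows in the same handler, which keeps the wire format compatible with the one-row prototype.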
>>>>> >>>> >>
>>>>> >>>> >> [1] http://webpy.org/
>>>>> >>>> >> [2] https://gitter.im/nupic-community/htm-over-http
>>>>> >>>> >> ---------
>>>>> >>>> >> Matt Taylor
>>>>> >>>> >> OS Community Flag-Bearer
>>>>> >>>> >> Numenta
>>>>> >>>> >
>>>>> >>>> > --
>>>>> >>>> > Jonathan Mackenzie
>>>>> >>>> > BEng (Software) Hons
>>>>> >>>> > PhD Candidate, Flinders University
>>>>
>>>> --
>>>> With kind regards,
>>>>
>>>> David Ray
>>>> Java Solutions Architect
>>>>
>>>> Cortical.io <http://cortical.io/>
>>>> Sponsor of: HTM.java <https://github.com/numenta/htm.java>
>>>>
>>>> [email protected]
>>>> http://cortical.io
>>>
>>> --
>>> Fergal Byrne, Brenter IT @fergbyrne
>>>
>>> http://inbits.com - Better Living through Thoughtful Technology
>>> http://ie.linkedin.com/in/fergbyrne/ - https://github.com/fergalbyrne
>>>
>>> Founder of Clortex: HTM in Clojure - https://github.com/nupic-community/clortex
>>> Co-creator @OccupyStartups Time-Bombed Open License http://occupystartups.me
>>>
>>> Author, Real Machine Intelligence with Clortex and NuPIC
>>> Read for free or buy the book at https://leanpub.com/realsmartmachines
>>>
>>> e: [email protected] t: +353 83 4214179
>>> Join the quest for Machine Intelligence at http://numenta.org
>>> Formerly of Adnet [email protected] http://www.adnet.ie

--
With kind regards,

David Ray
Java Solutions Architect

Cortical.io <http://cortical.io/>
Sponsor of: HTM.java <https://github.com/numenta/htm.java>

[email protected]
http://cortical.io
