David, I'm sure we are all on the right track; the best we can do today is experiment with different solutions and share the experience.
As I understand it (let me know if I'm wrong), HTM.java was developed by reproducing the theory and the code behind the Python version without optimizations, which is a really good approach in my opinion. Now that we know it works well, we just need to tune and optimize it.

*Related to my experience with Moclu*
I was able to run the traffic example with Moclu on 2 computers with ~40 GB of RAM in 40 minutes, and the JVM never became unresponsive, so Akka (the cluster) was able to work as expected. I used https://github.com/antidata/htmengine-traffic-tutorial and https://github.com/antidata/htm-moclu/releases/tag/v0.1.27

*ATAD*
The experience was similar to the traffic example; I only replaced the random number generator with a different one, gaining roughly 6x the processing speed with lower memory consumption (you can find it at https://github.com/antidata/htm.java.experiments).

My conclusion is that this '*Models Cluster*' solution works, but there is a bottleneck during model initialization; once the models are live in memory it runs smoothly.

*Some more technical details*
Moclu uses CQRS and event sourcing, which I personally think is a really good fit for HTM models, because you get the event history for free, plus the ability to replay events (filtered or modified) for experimentation or audit.

On 8 December 2015 at 06:32, cogmission (David Ray) <[email protected]> wrote:

> I have talked with Francisco, the engineers at Cortical.io, and some community members about building a Spark model runner, so it looks like that's the route to the future. I hate to duplicate the effort made by @antidata with moclu, but I think Cortical.io is biased toward Spark rather than Akka - it is still undecided.
>
> On Tue, Dec 8, 2015 at 6:13 AM, Fergal Byrne <[email protected]> wrote:
>
>> Not at all. I'm only talking about this use case: appearing to run 500 non-trivial, independent models in one JVM.
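The RNG swap antidata mentions above can be illustrated in isolation. Below is a sketch of the kind of drop-in `java.util.Random` replacement involved - this is not the actual code from htm.java.experiments; the choice of a xorshift64* generator and the class name are assumptions, and the wiring into HTM.java's parameters is not shown:

```java
import java.util.Random;

// Illustrative only: a xorshift64* generator wrapped in the java.util.Random
// API, the kind of drop-in replacement described in the thread. How it is
// handed to HTM.java is not shown here.
public class XorShiftRandom extends Random {
    private long state;

    public XorShiftRandom(long seed) {
        // avoid the all-zero state, which xorshift can never leave
        this.state = (seed == 0) ? 0x9E3779B97F4A7C15L : seed;
    }

    @Override
    protected int next(int bits) {
        // xorshift64* step: three shifts/xors plus one multiply,
        // much cheaper than java.util.Random's synchronized LCG path
        long x = state;
        x ^= x >>> 12;
        x ^= x << 25;
        x ^= x >>> 27;
        state = x;
        long result = x * 0x2545F4914F6CDD1DL;
        // next(bits) must return the top `bits` bits
        return (int) (result >>> (64 - bits));
    }

    public static void main(String[] args) {
        Random rng = new XorShiftRandom(42L);
        // same API as java.util.Random, so callers need no changes
        System.out.println(rng.nextInt(100));
        System.out.println(rng.nextDouble());
    }
}
```

Because `nextInt`, `nextDouble`, and friends are all funneled through the protected `next(int bits)` method, overriding that one method changes the generator everywhere the `Random` instance is used.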
>> If you really do want to run hundreds of models, your time is worth more than the cost of deploying hundreds of JVM instances, and something like htm-moclu is going to save you all that time. In addition, you can deploy HTM.java instances to GAE, and you can't deploy NuPIC there. Finally, your design is more flexible and industry-friendly, being based on the JVM.
>>
>> And yes, I'm referring to genuine arrays of primitive types like int[].
>>
>> On Tue, Dec 8, 2015 at 12:03 PM, cogmission (David Ray) <[email protected]> wrote:
>>
>>> Hi Fergal,
>>>
>>> By "unboxed" are you referring to the term used in the Java world for primitive constructs? I just wanted to be clear about what you mean. So are you saying that HTM.java should be de-objectified?
>>>
>>> On Tue, Dec 8, 2015 at 5:34 AM, Fergal Byrne <[email protected]> wrote:
>>>
>>>> Hi David,
>>>>
>>>> No, I'm afraid this is an insurmountable problem if you use objects instead of unboxed arrays. NuPIC can do this by storing all the state in a set of C++ unboxed arrays, and it can persist that, free the memory, and load a different model. That's how HTM Engine (and Grok) can run lots of models. You can't do this in a single JVM if you have live refs to your Networks, and rebuilding a Network from disk is a big cost.
>>>>
>>>> The right way to do this is to have one or a very small number of Networks in each JVM, and manage them using something like htm-moclu.
>>>>
>>>> (By the way, this means you can't run NuPIC on GAE, because they prohibit user-provided C/C++. Just Compute Engine or Container Engine.)
>>>>
>>>> Regards,
>>>>
>>>> Fergal
>>>>
>>>> On Tue, Dec 8, 2015 at 11:21 AM, cogmission (David Ray) <[email protected]> wrote:
>>>>
>>>>> Hi Fergal,
>>>>>
>>>>> It's not that big of a deal. I just haven't done a round of profiling yet, so there is lots of room for improvement in terms of memory handling.
>>>>> There are lots of JVM applications running really data-heavy workloads, and the state-of-the-art JVM GC is fully capable of handling these loads. I did a preliminary profiling session back in January and found some places where memory consumption could be optimized - I planned to get back to it after the Network API was finished, because I didn't want to optimize things before I got to see some typical usage patterns. If GC were an inescapable problem, you wouldn't have the tons of mission-critical apps in the financial industry that are running today.
>>>>>
>>>>> I am only one person, and I will get around to it. HTM.java is technically a pre-release (alpha) version for this reason.
>>>>>
>>>>> Cheers,
>>>>> David
>>>>>
>>>>> On Tue, Dec 8, 2015 at 4:59 AM, Fergal Byrne <[email protected]> wrote:
>>>>>
>>>>>> Hi Matt,
>>>>>>
>>>>>> As Stuart Holloway explains here [1], on the JVM, it's always GC. I can barely run a single 2048x16 HTM model on my 8 GB laptop on the hotgym hourly data - it slows to a crawl after 1000 rows because it's thrashing the GC trying to free space on the heap (setting -Xmx2800m as a JVM param helps). Good luck trying to keep more than one model per JVM up for any length of time.
>>>>>>
>>>>>> If you run htm-moclu in a single JVM, something somewhere will have live references leading to every Network you have loaded, so your live heap is going to be at least N models x heap per model. This is not a big problem until you start growing distal segments, which are Java objects on the heap. In HTM.java this happens in the TM, which grows as it learns.
>>>>>>
>>>>>> GC will detect an impending OOM condition, then stop the world and mark all these live references, traversing your millions of objects.
>>>>>> Finding nothing to free, the JVM will eventually fail at some unpredictable and unrelated point in the code.
>>>>>>
>>>>>> To check this, run this function every few rows:
>>>>>>
>>>>>> void mem() {
>>>>>>     int mb = 1024 * 1024;
>>>>>>
>>>>>>     // get the Runtime instance
>>>>>>     Runtime instance = Runtime.getRuntime();
>>>>>>
>>>>>>     // print current heap utilization statistics [MB]
>>>>>>     System.out.println("Total: " + instance.totalMemory() / mb
>>>>>>             + "\tFree: " + instance.freeMemory() / mb
>>>>>>             + "\tUsed Memory: " + (instance.totalMemory() - instance.freeMemory()) / mb
>>>>>>             + "\tMax Memory: " + instance.maxMemory() / mb);
>>>>>> }
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Fergal Byrne
>>>>>>
>>>>>> [1] https://youtu.be/FihU5JxmnBg?t=38m6s
>>>>>>
>>>>>> On Tue, Dec 8, 2015 at 9:21 AM, cogmission (David Ray) <[email protected]> wrote:
>>>>>>
>>>>>>> Hey Matt, did you try ramping up from 1 model to see if it was a capacity issue? I would be interested to see how the system responds as an increasing number of models is added. Anyway, I can't really comment on moclu, as I don't know what's happening there and I don't have time these days to help investigate; I am stretched a bit thin at the moment.
>>>>>>>
>>>>>>> @antidata, if you could explain what you mean by "renders the JVM unresponsive", it would help me attend to any issue there might be in the Network API, though I never had any problems with unresponsiveness at all. Thanks...
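Fergal's `mem()` check is easiest to apply when folded into the ingest loop, as in the following self-contained sketch. The class name, the loop, and the sampling interval are placeholders, not HTM.java code; the commented-out line marks where a row would be fed to a model:

```java
// Minimal sketch of the periodic heap check suggested above: print
// Runtime memory statistics every N processed rows, so GC pressure
// shows up as "Used" creeping toward "Max" while "Free" collapses.
public class HeapMonitor {
    static final int MB = 1024 * 1024;

    static void mem() {
        Runtime rt = Runtime.getRuntime();
        long used = (rt.totalMemory() - rt.freeMemory()) / MB;
        System.out.println("Total: " + rt.totalMemory() / MB
                + "\tFree: " + rt.freeMemory() / MB
                + "\tUsed Memory: " + used
                + "\tMax Memory: " + rt.maxMemory() / MB);
    }

    public static void main(String[] args) {
        int checkEvery = 100;                 // sampling interval in rows
        for (int row = 0; row < 1000; row++) {
            // network.compute(row);  <- placeholder for feeding one row
            if (row % checkEvery == 0) {
                mem();
            }
        }
    }
}
```

If "Used Memory" keeps climbing toward "Max Memory" as rows accumulate, raising -Xmx (as Fergal notes) buys time, but the trend itself is what indicates the per-model heap growth being discussed.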
>>>>>>>
>>>>>>> Cheers,
>>>>>>> David
>>>>>>>
>>>>>>> On Mon, Dec 7, 2015 at 9:30 PM, Matthew Taylor <[email protected]> wrote:
>>>>>>>
>>>>>>>> David, BTW the failure in the video is at 4m: https://youtu.be/DnKxrd4TLT8?t=4m
>>>>>>>> ---------
>>>>>>>> Matt Taylor
>>>>>>>> OS Community Flag-Bearer
>>>>>>>> Numenta
>>>>>>>>
>>>>>>>> On Mon, Dec 7, 2015 at 7:24 PM, Matthew Taylor <[email protected]> wrote:
>>>>>>>> > David and Mike,
>>>>>>>> >
>>>>>>>> > I've moved this to another topic to discuss.
>>>>>>>> >
>>>>>>>> > So what I tried with moclu was to take the HTM Engine traffic app as shown here:
>>>>>>>> >
>>>>>>>> > https://github.com/nupic-community/htmengine-traffic-tutorial/blob/master/images/HTM-Traffic-Architecture.jpg
>>>>>>>> >
>>>>>>>> > And I swapped out the entire green Python box containing the HTM Engine and replaced it with a local instance of moclu. When the traffic app starts up, it creates 153 models immediately and then starts pushing data into all of them at once:
>>>>>>>> >
>>>>>>>> > https://youtu.be/lzJd_a6y6-E?t=15m
>>>>>>>> >
>>>>>>>> > This caused dramatic failure in HTM Moclu, and I think that is what Mike's talking about. I recorded it for Mike here: https://www.youtube.com/watch?v=DnKxrd4TLT8
>>>>>>>> >
>>>>>>>> > I hope that explains some things.
>>>>>>>> >
>>>>>>>> > ---------
>>>>>>>> > Matt Taylor
>>>>>>>> > OS Community Flag-Bearer
>>>>>>>> > Numenta
>>>>>>>> >
>>>>>>>> > On Mon, Dec 7, 2015 at 9:15 AM, cogmission (David Ray) <[email protected]> wrote:
>>>>>>>> >>> the issue you faced is that it can't create hundreds of models at the same time (as is done by the traffic example), because instantiating a Network object from HTM.java is an expensive operation that turns the JVM unresponsive.
>>>>>>>> >>
>>>>>>>> >> What is being implied here? Are you saying that instantiating HTM.java is any more expensive than instantiating any other medium-weight application?
>>>>>>>> >>
>>>>>>>> >> Cheers,
>>>>>>>> >> David
>>>>>>>> >>
>>>>>>>> >> On Mon, Dec 7, 2015 at 11:05 AM, M.Lucchetta <[email protected]> wrote:
>>>>>>>> >>>
>>>>>>>> >>> Hello Matt, folks
>>>>>>>> >>>
>>>>>>>> >>> You can currently use Htm-MoClu on just one computer; the issue you faced is that it can't create hundreds of models at the same time (as is done by the traffic example), because instantiating a Network object from HTM.java is an expensive operation that turns the JVM unresponsive.
>>>>>>>> >>>
>>>>>>>> >>> I'm currently working on the Release Candidate (v1.0.0), and the only thing missing from your specs is:
>>>>>>>> >>>
>>>>>>>> >>> `allows POST of full model params`
>>>>>>>> >>>
>>>>>>>> >>> Will chat over Gitter to get more details on this.
>>>>>>>> >>>
>>>>>>>> >>> You can find an example of its usage at https://github.com/antidata/ATAD; it uses the Lift Web Framework (Comet actors) to push updates to the browser in real time (similar to the WebSockets proposition) and saves the requests and results into MongoDB, so you can query both the data coming from outside and the data generated by HTM (anomaly score + predictions).
>>>>>>>> >>>
>>>>>>>> >>> One last comment: Htm-Moclu is web-framework agnostic; you can use any web framework that works on the JVM.
>>>>>>>> >>>
>>>>>>>> >>> Feel free to ping me if any of you would like to contribute to this project.
>>>>>>>> >>>
>>>>>>>> >>> Thanks!
>>>>>>>> >>>
>>>>>>>> >>> On 7 December 2015 at 08:36, Matthew Taylor <[email protected]> wrote:
>>>>>>>> >>>>
>>>>>>>> >>>> Ok folks, let's move discussion of the implementation to GitHub. The first question to answer is which HTM implementation to use: https://github.com/nupic-community/htm-over-http/issues/2
>>>>>>>> >>>>
>>>>>>>> >>>> Anyone else reading this is free to jump in and help out, but I want to define our work properly using GitHub issues so we all know what is happening and who is working on what.
>>>>>>>> >>>> ---------
>>>>>>>> >>>> Matt Taylor
>>>>>>>> >>>> OS Community Flag-Bearer
>>>>>>>> >>>> Numenta
>>>>>>>> >>>>
>>>>>>>> >>>> On Sun, Dec 6, 2015 at 10:25 PM, Jonathan Mackenzie <[email protected]> wrote:
>>>>>>>> >>>> > Sounds like a good app, Matt; I can help out. Personally, for getting a web app off the ground quickly in Python I recommend Pyramid: http://www.pylonsproject.org/
>>>>>>>> >>>> >
>>>>>>>> >>>> > On 7 December 2015 at 03:31, Matthew Taylor <[email protected]> wrote:
>>>>>>>> >>>> >>
>>>>>>>> >>>> >> Thanks for the interest! I'll try to respond to everyone in this email. But first, who reading this would want to use an HTM-over-HTTP service like this? It means that you won't need to have HTM running on the same system that is generating the data. It's basically HTM in the Cloud. :)
>>>>>>>> >>>> >>
>>>>>>>> >>>> >> On Sat, Dec 5, 2015 at 12:16 PM, Marcus Lewis <[email protected]> wrote:
>>>>>>>> >>>> >> > I'm interested in HTTP GET, inspecting models.
>>>>>>>> >>>> >>
>>>>>>>> >>>> >> Great feature to add after a minimum viable product has been created, but this adds the complexity of either caching or persistence (depending on how much history you want).
>>>>>>>> >>>> >>
>>>>>>>> >>>> >> On Sat, Dec 5, 2015 at 2:03 PM, cogmission (David Ray) <[email protected]> wrote:
>>>>>>>> >>>> >> > One thing I am concerned about is the call/answer nature of the interface you describe, because of the latency involved in a submit-one-row-per-call methodology. Should it not be able to "batch"-process rows of data instead? (Batches could contain one row if you were dedicated to being a masochist.)
>>>>>>>> >>>> >>
>>>>>>>> >>>> >> Yes, we will eventually need that, but I don't need it in the prototype. Let's focus on one row at a time and expand to batching later.
>>>>>>>> >>>> >>
>>>>>>>> >>>> >> > Next, at Cortical we use a technology called Dropwizard, which makes it very easy to deploy an HTTP server capable of RESTful queries (I have done this for Twitter processing involving HTM.java).
>>>>>>>> >>>> >>
>>>>>>>> >>>> >> If this is going to use NuPIC and Python, I have found that it's super easy to set up REST with web.py [1]. Just a matter of writing a class and a few functions. For REST on the JVM, I am open to suggestions.
>>>>>>>> >>>> >>
>>>>>>>> >>>> >> On Sat, Dec 5, 2015 at 5:50 PM, Pascal Weinberger <[email protected]> wrote:
>>>>>>>> >>>> >> > Like an extended version of HTM Engine?
>>>>>>>> >>>> >> > This would be the solution to the htmengine prediction issue :)
>>>>>>>> >>>> >>
>>>>>>>> >>>> >> If we chose the HTM Engine option, then yes, we would need to add some features to HTM Engine, especially prediction and user-defined model params. This is not a little job, but it would be great to have a scaling platform already built into the HTTP server. I would be happy even if we just started with an attempt to make HTM Engine (and the HTTP server in the skeleton app) deployable to the cloud. Even with its current capabilities, I could start using it immediately, and we could add features over time.
>>>>>>>> >>>> >>
>>>>>>>> >>>> >> > Will you set up a repo in the community? :)
>>>>>>>> >>>> >>
>>>>>>>> >>>> >> Placeholder: https://github.com/nupic-community/htm-over-http
>>>>>>>> >>>> >>
>>>>>>>> >>>> >> Let's continue discussion on Gitter [2]. Our first decision is which HTM implementation to use. I am leaning towards HTM Engine because it would take the smallest amount of effort to do the deployment configuration around it and get an MVP running the fastest (even if it doesn't do prediction or custom model params out of the box).
>>>>>>>> >>>> >>
>>>>>>>> >>>> >> IMO the best way to attack this is to get something minimal running ASAP and add features as required.
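The one-row-per-POST shape discussed above can be prototyped on the JVM with nothing beyond the JDK's built-in HTTP server, before committing to Dropwizard or any framework. In this sketch the endpoint path, the JSON shape, and the stubbed model call are all invented for illustration; a real service would push the decoded row through a model and return its anomaly score and prediction:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: the smallest possible "HTM over HTTP" shape
// on the JVM, using just the JDK's built-in server (no frameworks).
public class HtmOverHttpSketch {

    // Placeholder for "feed one row into a model, get a result back".
    static String handleRow(byte[] body) {
        // A real service would decode the row here, run it through a
        // model, and report the model's anomaly score / prediction.
        return "{\"rowsProcessed\":1,\"anomalyScore\":0.0}";
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/models/demo/rows", exchange -> {
            byte[] body = exchange.getRequestBody().readAllBytes();
            byte[] out = handleRow(body).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, out.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(out);
            }
        });
        server.start(); // POST one row per request, per the prototype plan
    }
}
```

Batching later would just mean accepting an array of rows in the same handler, which keeps the prototype's one-row contract intact.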
>>>>>>>> >>>> >>
>>>>>>>> >>>> >> [1] http://webpy.org/
>>>>>>>> >>>> >> [2] https://gitter.im/nupic-community/htm-over-http
>>>>>>>> >>>> >> ---------
>>>>>>>> >>>> >> Matt Taylor
>>>>>>>> >>>> >> OS Community Flag-Bearer
>>>>>>>> >>>> >> Numenta
>>>>>>>> >>>> >
>>>>>>>> >>>> > --
>>>>>>>> >>>> > Jonathan Mackenzie
>>>>>>>> >>>> > BEng (Software) Hons
>>>>>>>> >>>> > PhD Candidate, Flinders University
>>>>>>>> >>
>>>>>>>> >> --
>>>>>>>> >> With kind regards,
>>>>>>>> >>
>>>>>>>> >> David Ray
>>>>>>>> >> Java Solutions Architect
>>>>>>>> >>
>>>>>>>> >> Cortical.io
>>>>>>>> >> Sponsor of: HTM.java
>>>>>>>> >>
>>>>>>>> >> [email protected]
>>>>>>>> >> http://cortical.io
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Fergal Byrne, Brenter IT @fergbyrne
>>>>>>
>>>>>> http://inbits.com - Better Living through Thoughtful Technology
>>>>>> http://ie.linkedin.com/in/fergbyrne/ - https://github.com/fergalbyrne
>>>>>>
>>>>>> Founder of Clortex: HTM in Clojure - https://github.com/nupic-community/clortex
>>>>>> Co-creator @OccupyStartups Time-Bombed Open License http://occupystartups.me
>>>>>>
>>>>>> Author, Real Machine Intelligence with Clortex and NuPIC
>>>>>> Read for free or buy the book at https://leanpub.com/realsmartmachines
>>>>>>
>>>>>> e:[email protected] t:+353 83 4214179
>>>>>> Join the quest for Machine Intelligence at http://numenta.org
>>>>>> Formerly of Adnet [email protected] http://www.adnet.ie
