Re: HTM.Java performance in HTM-Moclu

David Ray Wed, 09 Dec 2015 08:07:03 -0800

Fergal, I don't think that's what he's talking about. The MappedByteBuffer (I
believe) lets you read a file as if it were in memory even though it's on disk
(it's an in-memory representation of a file?), and the file can contain
anything; for HTM.java, all we have to do is serialize the Connections object.
This way we remove some of the latency of reading from and writing to disk.
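A minimal sketch of the serialize-then-map idea. It assumes plain Java serialization and uses a hypothetical `ConnectionsStandIn` class in place of HTM.java's real `Connections` (which may not serialize this way). Note that deserializing still rebuilds the whole object graph on the heap, so this mainly changes how the bytes are read from disk, not how much heap a live model needs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical stand-in for HTM.java's Connections object (not the real class).
class ConnectionsStandIn implements Serializable {
    private static final long serialVersionUID = 1L;
    final int[] synapseCounts;

    ConnectionsStandIn(int[] synapseCounts) {
        this.synapseCounts = synapseCounts;
    }
}

class MappedSnapshot {
    // Serialize the object with standard Java serialization and write the bytes to a file.
    static void save(Serializable obj, Path file) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        Files.write(file, bytes.toByteArray());
    }

    // Map the file read-only, copy the mapped bytes onto the heap and deserialize them.
    static Object load(Path file) throws IOException, ClassNotFoundException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            byte[] data = new byte[mapped.remaining()];
            mapped.get(data);
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
                return in.readObject();
            }
        }
    }
}
```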



> On Dec 9, 2015, at 8:31 AM, Fergal Byrne <[email protected]> wrote:
> 
> Hi Sato,
> 
> Thanks for mentioning that. Yes, that is how you get high performance out of
> Java: it's essentially replacing objects with arrays, as NuPIC does. You'd
> have to redesign HTM.java to do that. I'm not sure that's a great idea, as
> you'd lose the flexibility to innovate on the algorithms.
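To make the objects-versus-arrays distinction concrete, here is a sketch with hypothetical class names (it is not how NuPIC or HTM.java actually lay out their data). One layout allocates an object per synapse; the other keeps all synapses in two growable primitive arrays, so the GC has only a handful of objects to trace no matter how many synapses are grown.

```java
import java.util.Arrays;
import java.util.List;

// Object-per-synapse layout: every synapse is its own heap object for the GC to trace.
class SynapseObject {
    final int presynapticCell;
    final float permanence;

    SynapseObject(int presynapticCell, float permanence) {
        this.presynapticCell = presynapticCell;
        this.permanence = permanence;
    }
}

// Array-backed layout: all synapses share two growable primitive arrays.
class SynapseArrays {
    private int[] presynapticCell = new int[16];
    private float[] permanence = new float[16];
    private int size = 0;

    // Append a synapse and return its index, doubling capacity when full.
    int add(int cell, float perm) {
        if (size == presynapticCell.length) {
            presynapticCell = Arrays.copyOf(presynapticCell, size * 2);
            permanence = Arrays.copyOf(permanence, size * 2);
        }
        presynapticCell[size] = cell;
        permanence[size] = perm;
        return size++;
    }

    int size() {
        return size;
    }

    // Count synapses whose permanence reaches the connected threshold.
    int countConnected(float threshold) {
        int n = 0;
        for (int i = 0; i < size; i++) {
            if (permanence[i] >= threshold) n++;
        }
        return n;
    }
}

class LayoutComparison {
    // The same query over the object-per-synapse layout, for comparison.
    static int countConnected(List<SynapseObject> synapses, float threshold) {
        int n = 0;
        for (SynapseObject s : synapses) {
            if (s.permanence >= threshold) n++;
        }
        return n;
    }
}
```

Both layouts answer the same query; the difference is only in how many objects live on the heap.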
> 
> Regards
> 
> Fergal Byrne
> 
>> On Wed, Dec 9, 2015 at 12:50 AM, Takenori Sato <[email protected]> wrote:
>> Hi David and Fergal,
>> 
>> > NuPIC can do this by storing all the state in a set of C++ unboxed arrays, 
>> > and it can persist that, free the memory, and load a different model. 
>> > That's how HTM Engine (and Grok) can run lots of models. You can't do this 
>> > in a single JVM if you have live refs to your Networks, and rebuilding a 
>> > Network from disk is a big cost.
>> 
>> In Java, MappedByteBuffer is often used for this purpose, for example by
>> Apache Cassandra among NoSQL databases and by Elasticsearch/Lucene among
>> search engines.
>> 
>> The principles behind the design are:
>> 
>> 1. A smaller heap for better throughput (depending on the number of cores,
>> but no more than 8 GB).
>> 2. The OS is best at managing memory (through virtual memory management and
>> the file cache).
>> 
>> With MappedByteBuffer, you can read any portion of the bytes in your own
>> binary file to instantiate an object. You can let the object be GCed as soon
>> as you complete your operation, while its bytes stay managed by the file
>> cache, freed and read back in as needed.
>> 
>> Thus, you can achieve both low Java heap usage and the best possible
>> performance.
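A sketch of the pattern Sato describes, assuming fixed-length float records (a hypothetical file layout chosen for illustration). Only the requested record is copied onto the heap; the file's pages are cached and evicted by the OS.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedRecords {
    // Write fixed-length float records back to back (hypothetical layout: record i starts at i * len * 4 bytes).
    static void write(Path file, float[][] records, int len) throws IOException {
        long bytes = (long) records.length * len * Float.BYTES;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            MappedByteBuffer out = channel.map(FileChannel.MapMode.READ_WRITE, 0, bytes);
            for (float[] record : records) {
                if (record.length != len) throw new IllegalArgumentException("record length must be " + len);
                for (float value : record) out.putFloat(value);
            }
            out.force(); // flush the mapped pages to the file
        }
    }

    // Map only the bytes of record `index` and copy them into a new array on the heap.
    static float[] read(Path file, int index, int len) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long offset = (long) index * len * Float.BYTES;
            MappedByteBuffer in = channel.map(FileChannel.MapMode.READ_ONLY, offset, (long) len * Float.BYTES);
            float[] record = new float[len];
            in.asFloatBuffer().get(record);
            return record;
        }
    }
}
```

Once the returned array goes out of scope it can be collected, while the bytes remain in the file cache for the next read.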
>> 
>> Thanks,
>> Sato
>> 
>>> On Tue, Dec 8, 2015 at 8:34 PM, Fergal Byrne <[email protected]> 
>>> wrote:
>>> Hi David,
>>> 
>>> No, I'm afraid this is an insurmountable problem if you use objects instead 
>>> of unboxed arrays. NuPIC can do this by storing all the state in a set of 
>>> C++ unboxed arrays, and it can persist that, free the memory, and load a 
>>> different model. That's how HTM Engine (and Grok) can run lots of models. 
>>> You can't do this in a single JVM if you have live refs to your Networks, 
>>> and rebuilding a Network from disk is a big cost.
>>> 
>>> The right way to do this is to have one or a very small number of Networks 
>>> in each JVM, and manage them using something like htm-moclu. 
>>> 
>>> (By the way, this means you can't run NuPIC on Google App Engine, because
>>> it prohibits user-provided C/C++; you'd need Compute Engine or Container
>>> Engine.)
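One way to keep only one or a very small number of Networks live per JVM is a bounded least-recently-used cache that persists a model before dropping it, roughly the persist/free/load cycle described above. This is a sketch built on `LinkedHashMap`'s access-order mode; the `loader` and `persister` callbacks are hypothetical placeholders, not htm-moclu or HTM.java APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical bounded cache: at most maxLive models stay referenced; the least
// recently used one is handed to `persister` and dropped when the limit is exceeded.
class BoundedModelCache<K, V> {
    private final int maxLive;
    private final Function<K, V> loader;      // hypothetical: rebuild or deserialize a model
    private final BiConsumer<K, V> persister; // hypothetical: save a model before eviction
    private final LinkedHashMap<K, V> live;

    BoundedModelCache(int maxLive, Function<K, V> loader, BiConsumer<K, V> persister) {
        this.maxLive = maxLive;
        this.loader = loader;
        this.persister = persister;
        // accessOrder = true keeps entries ordered from least to most recently used
        this.live = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > BoundedModelCache.this.maxLive) {
                    BoundedModelCache.this.persister.accept(eldest.getKey(), eldest.getValue());
                    return true; // drop the reference so the GC can reclaim the model
                }
                return false;
            }
        };
    }

    // Return the live model for `key`, loading it (and possibly evicting another) if needed.
    V get(K key) {
        V model = live.get(key);
        if (model == null) {
            model = loader.apply(key);
            live.put(key, model);
        }
        return model;
    }

    int liveCount() {
        return live.size();
    }
}
```

The trade-off is the one Fergal names: every miss pays the cost of rebuilding a Network from disk, so this only helps if most traffic goes to a few hot models.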
>>> 
>>> Regards
>>> 
>>> Fergal
>>> 
>>>> On Tue, Dec 8, 2015 at 11:21 AM, cogmission (David Ray) 
>>>> <[email protected]> wrote:
>>>> Hi Fergal,
>>>> 
>>>> It's not that big of a deal. I just haven't done a round of profiling yet,
>>>> so there is lots of room for improvement in terms of memory handling.
>>>> Lots of really data-heavy applications run on the JVM, and the
>>>> state-of-the-art JVM GC is fully capable of handling these loads. I did a
>>>> preliminary profiling session back in January and found some places where
>>>> memory consumption could be optimized; I meant to get back to them after
>>>> the Network API was finished, because I didn't want to optimize things
>>>> before I got to see some typical usage patterns. If GC were an
>>>> inescapable problem, you wouldn't have the tons of mission-critical apps
>>>> running in the financial industry today.
>>>> 
>>>> I am only one person, and I will get around to it. HTM.java is technically 
>>>> a pre-release (alpha) version for this reason.
>>>> 
>>>> Cheers,
>>>> David
>>>> 
>>>>> On Tue, Dec 8, 2015 at 4:59 AM, Fergal Byrne 
>>>>> <[email protected]> wrote:
>>>>> Hi Matt,
>>>>> 
>>>>> As Stuart Halloway explains here [1], on the JVM, it's always GC. I can
>>>>> barely run a single 2048x16 HTM model on my 8 GB laptop on the hotgym
>>>>> hourly data: it slows to a crawl after 1000 rows because it's thrashing
>>>>> the GC trying to free space on the heap (setting -Xmx2800m as a JVM
>>>>> parameter helps). Good luck trying to keep more than one model per JVM up
>>>>> for any length of time.
>>>>> 
>>>>> If you run htm-moclu in a single JVM, something somewhere will have live 
>>>>> references leading to every Network you have loaded. So your live heap is 
>>>>> going to be at least N models x heap per model. This is not a big problem 
>>>>> until you start growing distal segments, which are Java objects on the 
>>>>> heap. In HTM.java this happens in the TM, which grows as it learns. 
>>>>> 
>>>>> GC will detect an impending OOM condition, then will stop the world and 
>>>>> mark all these live references, traversing your millions of objects. 
>>>>> Finding nothing to free, the JVM will eventually fail at some 
>>>>> unpredictable and unrelated point in the code. 
>>>>> 
>>>>> To check this, run this function every few rows:
>>>>> 
>>>>> void mem() {
>>>>>     int mb = 1024 * 1024;
>>>>> 
>>>>>     // get the Runtime instance
>>>>>     Runtime instance = Runtime.getRuntime();
>>>>> 
>>>>>     // System.out.println("***** Heap utilization statistics [MB] *****\n");
>>>>> 
>>>>>     // total, free, used and max heap memory, in MB
>>>>>     System.out.println("Total: " + instance.totalMemory() / mb
>>>>>             + "\tFree: " + instance.freeMemory() / mb
>>>>>             + "\tUsed Memory: "
>>>>>             + (instance.totalMemory() - instance.freeMemory()) / mb
>>>>>             + "\tMax Memory: " + instance.maxMemory() / mb);
>>>>> }
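As a complement to `mem()`, the standard `java.lang.management` API reports per-collector GC counts and accumulated GC time. A sketch (the `GcStats` class is hypothetical): if GC time keeps climbing between calls while used memory stays near the max, that is consistent with the thrashing described above.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcStats {
    // Total collections across all collectors; a collector reports -1 if the count is undefined.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) total += count;
        }
        return total;
    }

    // Approximate accumulated collection time in milliseconds across all collectors.
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) total += millis;
        }
        return total;
    }

    // Print one line per collector, e.g. alongside mem() every few rows.
    static void print() {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + "\tcount: " + gc.getCollectionCount()
                    + "\ttime(ms): " + gc.getCollectionTime());
        }
    }
}
```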
>>>>> Regards,
>>>>> 
>>>>> Fergal Byrne 
>>>>> 
>>>>> [1] https://youtu.be/FihU5JxmnBg?t=38m6s
>>>>> 
>>>>>> On Tue, Dec 8, 2015 at 9:21 AM, cogmission (David Ray) 
>>>>>> <[email protected]> wrote:
>>>>>> Hey Matt, did you try ramping up from 1 model to see if it was a
>>>>>> capacity issue? I would be interested to see how the system responds as
>>>>>> an increasing number of models are added. Anyway, I can't really comment
>>>>>> on moclu, as I don't know what's happening there, and I don't have time
>>>>>> these days to help investigate; I am stretched a bit thin at the moment.
>>>>>> 
>>>>>> @antidata, if you could explain what you mean by "renders the JVM
>>>>>> unresponsive", it would help me attend to any issue there might be in the
>>>>>> Network API, though I have never had any problems with unresponsiveness
>>>>>> at all. Thanks...
>>>>>> 
>>>>>> Cheers,
>>>>>> David
>>>>>> 
>>>>>>> On Mon, Dec 7, 2015 at 9:30 PM, Matthew Taylor <[email protected]> wrote:
>>>>>>> David, BTW the failure in the video is at 4m:
>>>>>>> https://youtu.be/DnKxrd4TLT8?t=4m
>>>>>>> ---------
>>>>>>> Matt Taylor
>>>>>>> OS Community Flag-Bearer
>>>>>>> Numenta
>>>>>>> 
>>>>>>> 
>>>>>>> On Mon, Dec 7, 2015 at 7:24 PM, Matthew Taylor <[email protected]> wrote:
>>>>>>> > David and Mike,
>>>>>>> >
>>>>>>> > I've moved this to another topic to discuss.
>>>>>>> >
>>>>>>> > So what I tried with moclu was to take the HTM engine traffic app as 
>>>>>>> > shown here:
>>>>>>> >
>>>>>>> > https://github.com/nupic-community/htmengine-traffic-tutorial/blob/master/images/HTM-Traffic-Architecture.jpg
>>>>>>> >
>>>>>>> > And I swapped out the entire green python box containing the HTM
>>>>>>> > Engine and replaced it with a local instance of moclu. When the
>>>>>>> > traffic app starts up, it creates 153 models immediately and then
>>>>>>> > starts pushing data into all of them at once:
>>>>>>> >
>>>>>>> > https://youtu.be/lzJd_a6y6-E?t=15m
>>>>>>> >
>>>>>>> > This caused dramatic failure in HTM Moclu, and I think that is what
>>>>>>> > Mike's talking about. I recorded it for Mike here:
>>>>>>> > https://www.youtube.com/watch?v=DnKxrd4TLT8
>>>>>>> >
>>>>>>> > I hope that explains some things.
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > On Mon, Dec 7, 2015 at 9:15 AM, cogmission (David Ray)
>>>>>>> > <[email protected]> wrote:
>>>>>>> >>> the issue you faced is that it can't create hundreds of models at the
>>>>>>> >>> same time (as the traffic example does), because instantiating a
>>>>>>> >>> Network object from Htm.java is an expensive operation that turns the
>>>>>>> >>> JVM unresponsive.
>>>>>>> >>
>>>>>>> >> What is being implied here? Are you saying that instantiating HTM.java is
>>>>>>> >> any more expensive than instantiating any other medium-weight application?
>>>>>>> >>
>>>>>>> >> Cheers,
>>>>>>> >> David
>>>>>>> >>
>>>>>>> >> On Mon, Dec 7, 2015 at 11:05 AM, M.Lucchetta <[email protected]> 
>>>>>>> >> wrote:
>>>>>>> >>>
>>>>>>> >>> Hello Matt, folks
>>>>>>> >>>
>>>>>>> >>> You can currently use Htm-MoClu on just one computer. The issue you
>>>>>>> >>> faced is that it can't create hundreds of models at the same time (as
>>>>>>> >>> the traffic example does), because instantiating a Network object from
>>>>>>> >>> Htm.java is an expensive operation that turns the JVM unresponsive.
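If the problem is many expensive instantiations hitting the JVM at once, one hedged mitigation is to bound how many run concurrently. The sketch below uses a fixed-size thread pool; `factory` stands in for whatever builds a Network and is hypothetical, as is the choice of parallelism. It does not reduce the heap each model needs once built.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntFunction;

class ThrottledCreation {
    // Build `count` models, with at most `parallelism` constructions running at once;
    // the remaining requests wait in the pool's queue instead of all starting together.
    static <M> List<M> createAll(int count, int parallelism, IntFunction<M> factory)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<M>> pending = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                final int id = i;
                pending.add(pool.submit(() -> factory.apply(id)));
            }
            List<M> models = new ArrayList<>(count);
            for (Future<M> f : pending) {
                models.add(f.get()); // results come back in submission order
            }
            return models;
        } finally {
            pool.shutdown();
        }
    }
}
```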
>>>>>>> >>>
>>>>>>> >>> I'm currently working on the Release Candidate (v 1.0.0), and the only
>>>>>>> >>> thing missing from your specs is:
>>>>>>> >>>
>>>>>>> >>> `allows POST of full model params`
>>>>>>> >>>
>>>>>>> >>> Will chat over Gitter to get more details on this.
>>>>>>> >>>
>>>>>>> >>> You can find an example of its usage at
>>>>>>> >>> https://github.com/antidata/ATAD
>>>>>>> >>> It uses the Lift Web Framework (Comet Actors) to push updates to the
>>>>>>> >>> browser in real time (similar to the WebSockets approach) and saves the
>>>>>>> >>> requests + results into MongoDB, so you can query both the data coming
>>>>>>> >>> from outside and the data generated by HTM (anomaly score +
>>>>>>> >>> predictions).
>>>>>>> >>> One last comment: Htm-Moclu is web framework agnostic, so you can use
>>>>>>> >>> any web framework that works on the JVM.
>>>>>>> >>>
>>>>>>> >>> Feel free to ping me if any of you would like to contribute to this
>>>>>>> >>> project.
>>>>>>> >>>
>>>>>>> >>> Thanks!
>>>>>>> >>>
>>>>>>> >>> On 7 December 2015 at 08:36, Matthew Taylor <[email protected]> 
>>>>>>> >>> wrote:
>>>>>>> >>>>
>>>>>>> >>>> OK folks, let's move discussion of the implementation to GitHub. The
>>>>>>> >>>> first question to answer is which HTM implementation to use:
>>>>>>> >>>> https://github.com/nupic-community/htm-over-http/issues/2
>>>>>>> >>>>
>>>>>>> >>>> Anyone else reading this is free to jump in and help out, but I want
>>>>>>> >>>> to define our work properly using GitHub issues so we all know what
>>>>>>> >>>> is happening and who is working on what.
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>> On Sun, Dec 6, 2015 at 10:25 PM, Jonathan Mackenzie 
>>>>>>> >>>> <[email protected]>
>>>>>>> >>>> wrote:
>>>>>>> >>>> > Sounds like a good app, Matt; I can help out. Personally, for getting
>>>>>>> >>>> > a web app off the ground quickly in Python, I recommend Pyramid:
>>>>>>> >>>> > http://www.pylonsproject.org/
>>>>>>> >>>> >
>>>>>>> >>>> > On 7 December 2015 at 03:31, Matthew Taylor <[email protected]> 
>>>>>>> >>>> > wrote:
>>>>>>> >>>> >>
>>>>>>> >>>> >> Thanks for the interest! I'll try to respond to everyone in this
>>>>>>> >>>> >> email. But first, who reading this would want to use an HTM over
>>>>>>> >>>> >> HTTP service like this? It means that you won't need to have HTM
>>>>>>> >>>> >> running on the same system that is generating the data. It's
>>>>>>> >>>> >> basically HTM in the Cloud. :)
>>>>>>> >>>> >>
>>>>>>> >>>> >> On Sat, Dec 5, 2015 at 12:16 PM, Marcus Lewis 
>>>>>>> >>>> >> <[email protected]>
>>>>>>> >>>> >> wrote:
>>>>>>> >>>> >> > I'm interested in HTTP GET, inspecting models.
>>>>>>> >>>> >>
>>>>>>> >>>> >> Great feature to add after a minimum viable product has been
>>>>>>> >>>> >> created, but this adds the complexity of either caching or
>>>>>>> >>>> >> persistence (depending on how much history you want).
>>>>>>> >>>> >>
>>>>>>> >>>> >> On Sat, Dec 5, 2015 at 2:03 PM, cogmission (David Ray)
>>>>>>> >>>> >> <[email protected]> wrote:
>>>>>>> >>>> >> > One thing I am concerned about is the call/answer nature of the
>>>>>>> >>>> >> > interface you describe, because of the latency involved in a
>>>>>>> >>>> >> > submit-one-row-per-call methodology. Should it not be able to
>>>>>>> >>>> >> > "batch" process rows of data instead? (Batches could contain one
>>>>>>> >>>> >> > row if you were dedicated to being a masochist.)
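A sketch of the batching idea, assuming the service would accept a list of rows per request (an assumption; the HTTP interface was not yet defined in this thread). Splitting n rows into batches of k means paying the per-call round trip ceil(n/k) times instead of n times; for example, 1000 rows in batches of 100 take 10 calls.

```java
import java.util.ArrayList;
import java.util.List;

class RowBatcher {
    // Split rows into consecutive batches of at most batchSize rows; the last batch may be shorter.
    static <T> List<List<T>> batches(List<T> rows, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        List<List<T>> result = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            result.add(new ArrayList<>(rows.subList(start, end))); // copy so batches don't alias `rows`
        }
        return result;
    }
}
```

A batch size of 1 reproduces the one-row-per-call behaviour, so the prototype's interface could later grow into this without changing semantics.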
>>>>>>> >>>> >>
>>>>>>> >>>> >> Yes, we will eventually need that, but I don't need it in the
>>>>>>> >>>> >> prototype. Let's focus on one row at a time and expand to batching
>>>>>>> >>>> >> later.
>>>>>>> >>>> >>
>>>>>>> >>>> >> > Next, at Cortical we use a technology called Dropwizard, which
>>>>>>> >>>> >> > makes it very easy to deploy an HTTP server capable of RESTful
>>>>>>> >>>> >> > queries (I have done this for Twitter processing involving
>>>>>>> >>>> >> > HTM.java).
>>>>>>> >>>> >>
>>>>>>> >>>> >> If this is going to use NuPIC and Python, I have found that it's
>>>>>>> >>>> >> super easy to set up REST with web.py [1]. It's just a matter of
>>>>>>> >>>> >> writing a class and a few functions. For REST on the JVM, I am open
>>>>>>> >>>> >> to suggestions.
>>>>>>> >>>> >>
>>>>>>> >>>> >> On Sat, Dec 5, 2015 at 5:50 PM, Pascal Weinberger
>>>>>>> >>>> >> <[email protected]> wrote:
>>>>>>> >>>> >> > Like an extended version of HTM Engine?
>>>>>>> >>>> >> > This would be the solution to the htmengine prediction issue :)
>>>>>>> >>>> >>
>>>>>>> >>>> >> If we choose the HTM Engine option, then yes, we would need to add
>>>>>>> >>>> >> some features to HTM Engine, especially prediction and user-defined
>>>>>>> >>>> >> model params. This is not a little job, but it would be great to
>>>>>>> >>>> >> have a scaling platform already built into the HTTP server. I would
>>>>>>> >>>> >> be happy even if we just started with an attempt to make HTM Engine
>>>>>>> >>>> >> (and the HTTP server in the skeleton app) deployable to the cloud.
>>>>>>> >>>> >> Even with its current capabilities, I could start using it
>>>>>>> >>>> >> immediately and we could add features over time.
>>>>>>> >>>> >>
>>>>>>> >>>> >> > Will you set up a repo in the community? :)
>>>>>>> >>>> >>
>>>>>>> >>>> >> Placeholder: https://github.com/nupic-community/htm-over-http
>>>>>>> >>>> >>
>>>>>>> >>>> >> Let's continue discussion on Gitter [2]. Our first decision is
>>>>>>> >>>> >> which HTM implementation to use. I am leaning towards HTM Engine
>>>>>>> >>>> >> because it would take the smallest amount of effort to do the
>>>>>>> >>>> >> deployment configuration around it and get an MVP running the
>>>>>>> >>>> >> fastest (even if it doesn't do prediction or custom model params
>>>>>>> >>>> >> out of the box).
>>>>>>> >>>> >>
>>>>>>> >>>> >> IMO the best way to attack this is to get something minimal
>>>>>>> >>>> >> running ASAP and add features as required.
>>>>>>> >>>> >>
>>>>>>> >>>> >> [1] http://webpy.org/
>>>>>>> >>>> >> [2] https://gitter.im/nupic-community/htm-over-http
>>>>>>> >>>> >>
>>>>>>> >>>> >
>>>>>>> >>>> >
>>>>>>> >>>> >
>>>>>>> >>>> > --
>>>>>>> >>>> > Jonathan Mackenzie
>>>>>>> >>>> > BEng (Software) Hons
>>>>>>> >>>> > PhD Candidate, Flinders University
>>>>>>> >>>>
>>>>>>> >>>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> --
>>>>>>> >> With kind regards,
>>>>>>> >>
>>>>>>> >> David Ray
>>>>>>> >> Java Solutions Architect
>>>>>>> >>
>>>>>>> >> Cortical.io
>>>>>>> >> Sponsor of:  HTM.java
>>>>>>> >>
>>>>>>> >> [email protected]
>>>>>>> >> http://cortical.io
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> -- 
>>>>> 
>>>>> Fergal Byrne, Brenter IT @fergbyrne
>>>>> 
>>>>> http://inbits.com - Better Living through Thoughtful Technology
>>>>> http://ie.linkedin.com/in/fergbyrne/ - https://github.com/fergalbyrne
>>>>> 
>>>>> Founder of Clortex: HTM in Clojure - 
>>>>> https://github.com/nupic-community/clortex
>>>>> Co-creator @OccupyStartups Time-Bombed Open License 
>>>>> http://occupystartups.me
>>>>> 
>>>>> Author, Real Machine Intelligence with Clortex and NuPIC 
>>>>> Read for free or buy the book at https://leanpub.com/realsmartmachines
>>>>> 
>>>>> e:[email protected] t:+353 83 4214179
>>>>> Join the quest for Machine Intelligence at http://numenta.org
>>>>> Formerly of Adnet [email protected] http://www.adnet.ie
>>>> 
>>> 
> 
