I've been circling around the idea of scaling HTMs up and out since the first CLA white paper was published: first looking for a way to optimize with GPUs, and then looking at what happens when you have a common protocol for sharing trained HTMs and interconnecting them across process boundaries, either on the local machine or over HTTP.
Here, briefly, are my observations (the NuPIC archives are also full of good info on this). Optimizing with a GPU is hard. The current algorithm does not fall into the category of 'embarrassingly parallel', and for GPU acceleration there is a lot of state that must be maintained in high-latency global memory. That's not to say there aren't parts of the algorithm that could be parallelized, but the inner loops of the current algorithm, where most of the time is spent, don't allow for GPU acceleration in my understanding. This may also apply to non-GPU many-core solutions like Parallella or the Intel Xeon Phi. I don't know for sure, but I suspect the problem space is similar.

This leaves us with the alternative of running a single patch or region on a core and passing messages, either in memory (with copies or direct access to shared memory) or across process boundaries. With in-memory message passing you can pass larger data structures with more granular connections, and with low latencies you can do things like feeding signals back from higher regions. With out-of-process message passing you need copies and you incur latency, so keeping the message size low is desirable; passing fully processed SDRs along with some other global state seems like a good idea. Using Cap'n Proto for message passing and serialization might fulfill both requirements in NuPIC. If there is a common message-passing and serialization standard, then an implementation in your language of choice allows an ecosystem to grow.

This is a model that scales two ways: in memory with high bandwidth and the potential for feedback to regions (motor control), and across processes with slower connections using SDRs or similar data structures. Combine this with a standard serialization of the entire state of the HTM and you can build an ecosystem of trained HTMs that scale up and out. In my dreams I see a future where trained HTMs are shared and sold, and a flexible ecosystem arises.
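The cross-process half of this scheme can be sketched in a few lines of Python. This is a hedged illustration, not NuPIC code: the struct-packed wire format below is just a stand-in for whatever Cap'n Proto schema might eventually be agreed on, and the hand-picked index list is a toy SDR.

```python
# Sketch: passing an SDR between region processes as a compact list of
# active column indices instead of a dense vector. The wire format
# (uint32 count, then uint32 indices) is an illustrative assumption,
# standing in for a real Cap'n Proto schema.
import struct
from multiprocessing import Pipe, Process

SDR_WIDTH = 2048                    # columns in a typical NuPIC-sized region
ACTIVE = [7, 31, 400, 1027, 1999]   # toy SDR at roughly 2% sparsity

def encode_sdr(active_indices):
    """Pack active column indices into bytes: a count, then the indices."""
    return struct.pack(f"<I{len(active_indices)}I",
                       len(active_indices), *active_indices)

def decode_sdr(payload):
    """Inverse of encode_sdr: recover the list of active indices."""
    (count,) = struct.unpack_from("<I", payload)
    return list(struct.unpack_from(f"<{count}I", payload, 4))

def upstream_region(conn):
    """Child process: receive an SDR, reply with how many bits were active."""
    payload = conn.recv_bytes()
    conn.send(len(decode_sdr(payload)))
    conn.close()

if __name__ == "__main__":
    parent, child = Pipe()
    worker = Process(target=upstream_region, args=(child,))
    worker.start()
    payload = encode_sdr(ACTIVE)
    parent.send_bytes(payload)
    print(f"wire size: {len(payload)} bytes "
          f"(dense bitmap would be {SDR_WIDTH // 8})")
    print("active bits seen upstream:", parent.recv())
    worker.join()
```

Because the SDR is sparse, the index-list encoding is an order of magnitude smaller than a dense bitmap here, which is exactly why keeping cross-process messages down to fully processed SDRs is attractive.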
The real power of HTM, or of any machine learning algorithm, becomes apparent when scaled up. The real power is in systems that have learned over time on huge amounts of data. Once learning has been captured this way, it can be shared or sold either as a serialized blob of the entire model or as a live stream of SDRs. The Google self-driving cars share learning, and the fleet now collectively possesses 40 years of driving experience. As they add more cars, this will grow rapidly to thousands or millions of years of collective experience. So the big question is how to architect NuPIC so trained HTMs can be leveraged into an ecosystem. It is in our interest to architect this to be language agnostic, with flexible, open message-passing and serialization standards.

In my dreams I also see neuromorphic hardware that implements these algorithms in a non-von Neumann architecture. This is being worked on by various institutions: BrainScaleS (HICANN), Sandia, IBM. If done correctly, neuromorphic hardware will leapfrog the current architectures in processing power per dollar for a number of reasons, including this one: you can have cheaper chip fab costs because, like the human brain, losing a few connections does not matter. When neuromorphic hardware happens, I would hope we would have enough momentum as a standard, and enough value in trained networks, that whatever protocols we have established for sharing and interoperating might get adopted, so neuromorphic boards don't remain closed systems. In the interim we would get the leverage of interconnected systems and a happy community of hackers and entrepreneurs.

-Doug

On Fri, Feb 6, 2015 at 8:59 AM, Kevin Archie <[email protected]> wrote:

> Can’t believe I’m stumbling into a language war (carrying a flamethrower).
> I apologize and will attempt to bring this around to NuPIC in a clearly
> marked section below.
>
> Erlang/BEAM is an amazing environment for building distributed systems.
> Monitoring, debugging, distribution, and interoperability are all equal to
> or years ahead of the JVM*. It’s this crazy hidden gem. I like Erlang the
> language a lot (less so Elixir, for not very good reasons) and BEAM the
> runtime even more, and if I were building a startup where distribution and
> fault tolerance were key, I would absolutely write the system in Erlang**
> (and some C for performance-critical bits) with full confidence that I
> could build something amazing quickly by myself, and that if the business
> took off and I needed to hire 10 or 12 expert Erlangers to build it out, I
> could find them.
>
> That’s a much different problem from: I’m building an open-source
> system+community and I’m trying to attract people interested in the domain
> (biologically inspired intelligence) to come work on it in their free
> time. The intersection of {Erlang gurus} and {cortical computation nerds}
> is pretty darn small. (Back to the hypothetical startup: when we make it
> huge and Google buys us out, they’re going to rewrite everything in Java
> because they need to deploy and scale and maintain with an army of drones,
> and Costco carries economy packs of Java programmers. Me, I cash out
> before that point.)
>
> — DIRECT APPLICABILITY TO NUPIC FOLLOWS --
>
> Which I guess is me coming down on the side of writing everything
> performance-critical in C++ (ugh) and wrapping it up with Python (sigh),
> because you need very low entry barriers for recruiting. (What, that’s
> done? Hurrah!) By all means, port to Java or Elixir or whatever if you’re
> doing it for personal education, but if you want to change the world,
> focus on what's at hand rather than borrowing trouble from the future. Use
> NuPIC to solve cool problems, and those successes will feed and grow the
> community.
>
> - Kevin
>
> * The JVM probably wins on cloud deployment at enterprise scale once your
> application is bulletproof. Good luck getting it there.
>
> ** Okay, I’d be awfully tempted by Cloud Haskell, but I’d wind up doing
> it in Erlang because the tools are much more mature.
>
> On Feb 6, 2015, at 9:42 AM, cogmission1 . <[email protected]> wrote:
>
>> Your intuitions are correct about both the scalability and simplicity of
>> using Elixir/Erlang to do HTM. In my initial tests, I was able to spin up
>> 250k neuron processes in 20-25ms in Elixir on a laptop.
>
> How does Elixir/Erlang speak to the network issue? What about state
> monitoring across network nodes? What about debugging nodes on a network?
> What about ease of setup and distribution? What about interoperation on
> heterogeneous architectures? We're talking about ease and maintainability
> here... The JVM's appropriateness for this task is as apparent as
> gravity... I don't see how this can be disputed? We want to take advantage
> of the wealth of development talent, and the time-to-market ease of
> developing in Java - not introduce another level of obscurity?
>
> Other than that issue, I believe the identification of what comprises a
> discrete computational unit is agreed on... and Michael just raised a more
> poignant question as far as I'm concerned. Parallelism is an interesting
> topic. The nature of cortical inputs seems to involve parallel sensor
> input from multiple senses concurrently; however, does a single sense
> (such as vision) have opportunities for parallelism? Do individual senses
> in general? Otherwise, parallelism would have to be introduced
> algorithmically to break up processing of a single stream of sequential
> input into parallel tasks.
>
> Interesting discussion...
>
> On Fri, Feb 6, 2015 at 9:03 AM, Michael Klachko <[email protected]> wrote:
>
>> I think a better question is how HTM computation can be parallelized.
>> For example, can we map a whole region to a single GPU card? What
>> functions in a region can be isolated as kernels? What could be running
>> as a (CUDA) thread?
>> How can these threads be partitioned into blocks and grids?
>>
>> On Fri, Feb 6, 2015 at 6:22 AM, Fergal Byrne <[email protected]> wrote:
>>
>>> Hi Rich,
>>>
>>> Thanks for restarting this discussion.
>>>
>>> I started a project to reimplement HTM in Elixir in December 2013, but
>>> then switched to Clojure for mainly non-technical reasons. One of the
>>> "thought leaders" in the Elixir community is interested in being the
>>> lynchpin of the project to bring HTM to Elixir, so I'll be making some
>>> announcements on that in the near future (I have some code archaeology
>>> to perform first!).
>>>
>>> Your intuitions are correct about both the scalability and simplicity
>>> of using Elixir/Erlang to do HTM. In my initial tests, I was able to
>>> spin up 250k neuron processes in 20-25ms in Elixir on a laptop.
>>>
>>> On the general point of distributed HTM, Michael is correct to identify
>>> the granularity at which things can be split up, and Tim is on the
>>> money about the kernel of the issue: state (synapses, in particular).
>>> In a typical NuPIC-sized region, we have 2048 cols x 32 cells = 64k
>>> total cells, with in the neighbourhood of 1-300m synapses. The number
>>> of "messages" passed between these neurons (and their "state") is thus
>>> very large compared with the number of input and output messages
>>> between regions. It makes sense to have the processing for a contiguous
>>> "patch" of neurons such as this all contained within a single (OS-level)
>>> process, and to have patches communicate using SDRs.
>>>
>>> Matt is correct when he describes the importance of this in the context
>>> of Temporal Pooling and hierarchy. With a multi-layer architecture for
>>> a single region, and a hierarchy of regions, we will very quickly hit
>>> the skids if we continue with a single-threaded, monolithic design for
>>> HTM.
>>> On the other hand, Matt is also correct that, once solved, we can take
>>> advantage of distributed processing to build HTM systems as large and
>>> powerful as we like.
>>>
>>> Within a patch, I think the jury is very much out on the performance of
>>> message-passing versus sparse vectors (as used in NuPIC). Due to
>>> sparseness both in space and time in real world data, it's not clear
>>> that message-passing (or some equivalent, functional reactive scheme)
>>> would not outperform the use of big sparse arrays.
>>>
>>> Regards,
>>>
>>> Fergal Byrne
>>>
>>> On Thu, Feb 5, 2015 at 7:07 PM, Rich Morin <[email protected]> wrote:
>>>
>>>> On Feb 5, 2015, at 05:18, Kevin Archie <[email protected]> wrote:
>>>> > https://github.com/nupic-community/comportex
>>>> >
>>>> > (I have no connection to the project, I’m just aware of it.)
>>>>
>>>> The Clojure ports are certainly worth a look, if only to see how they
>>>> decompose the problem. Although scalability is a motivation, my real
>>>> interest has to do with seeing how Elixir (including Erlang and OTP)
>>>> can be used to simplify the model. That is, can I model things like
>>>> neurons, columns, and regions using lightweight processes, leaving
>>>> the communication and management to OTP.
>>>>
>>>> -r
>>>>
>>>> --
>>>> http://www.cfcl.com/rdm         Rich Morin      [email protected]
>>>> http://www.cfcl.com/rdm/resume  San Bruno, CA, USA  +1 650-873-7841
>>>>
>>>> Software system design, development, and documentation
>>>
>>> --
>>> Fergal Byrne, Brenter IT
>>>
>>> http://inbits.com - Better Living through Thoughtful Technology
>>> http://ie.linkedin.com/in/fergbyrne/ - https://github.com/fergalbyrne
>>>
>>> Founder of Clortex: HTM in Clojure -
>>> https://github.com/nupic-community/clortex
>>>
>>> Author, Real Machine Intelligence with Clortex and NuPIC
>>> Read for free or buy the book at https://leanpub.com/realsmartmachines
>>>
>>> Speaking on Clortex and HTM/CLA at euroClojure Krakow, June 2014:
>>> http://euroclojure.com/2014/
>>> and at LambdaJam Chicago, July 2014: http://www.lambdajam.com
>>>
>>> e: [email protected]  t: +353 83 4214179
>>> Join the quest for Machine Intelligence at http://numenta.org
>>> Formerly of Adnet [email protected] http://www.adnet.ie
>
> --
> *We find it hard to hear what another is saying because of how loudly
> "who one is" speaks...*
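Fergal's back-of-envelope region sizes above can be sanity-checked with a short script. The synapses-per-cell figures below are illustrative assumptions (real counts depend on region configuration), but they show why within-patch state dwarfs the inter-region SDR traffic:

```python
# Sanity check of the region sizes discussed in the thread above.
# The synapses-per-cell figures are illustrative assumptions, not
# measured NuPIC values; actual counts depend on configuration.
COLUMNS = 2048
CELLS_PER_COLUMN = 32

cells = COLUMNS * CELLS_PER_COLUMN
print(f"cells per region: {cells}")  # 2048 x 32 = 64k total cells

# With a few thousand distal synapses per cell, total synapse count
# lands in the hundreds of millions, as discussed above.
for syn_per_cell in (1_500, 4_500):
    total = cells * syn_per_cell
    print(f"{syn_per_cell} synapses/cell -> {total / 1e6:.0f}M synapses")

# Compare that state with the inter-region traffic: one SDR at ~2%
# sparsity is only a few dozen active column indices per timestep.
active = round(0.02 * COLUMNS)
print(f"active columns per SDR message: {active}")
```

The ratio of internal state (tens to hundreds of millions of synapses) to the size of one outbound SDR (a few dozen indices) is the quantitative case for keeping a whole patch inside a single OS-level process and having patches talk only in SDRs.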
