Hi Oreste,

You are right, performance will be a central issue. There are a few
bottlenecks in the algorithm that can be attacked with hardware
acceleration. The best approach I can think of for now is
parallelization (some form of map-reduce). OpenCL would be a good
choice to replace some of the C++ or Python code. The rest of the
Python code could be kept as-is to allow easy experimentation with
parameter optimization or with changes to features that are not core
CLA algorithms.
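To make the map-reduce idea concrete, here is a minimal sketch of how a
data-parallel bottleneck like the spatial pooler's overlap step could be
farmed out to a worker pool. All the names below are mine for
illustration, not NuPIC's actual API, and a real OpenCL port would
replace the Python workers with kernels:

```python
# Sketch only: the overlap pass as an embarrassingly parallel map.
# Function and variable names are illustrative, not NuPIC's API.
import random
from concurrent.futures import ThreadPoolExecutor

def overlap(connected_row, active_input):
    """Map step: count active input bits on one column's connected synapses."""
    return sum(1 for c, a in zip(connected_row, active_input) if c and a)

def compute_overlaps(connected, active_input, workers=4):
    """Each column is independent, so the pass is a parallel map over rows."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: overlap(row, active_input), connected))

random.seed(0)
connected = [[random.random() > 0.5 for _ in range(32)] for _ in range(8)]
active = [random.random() > 0.8 for _ in range(32)]
print(compute_overlaps(connected, active))
```

The reduce step (picking the winning columns via inhibition) would then
run over the returned list; only the map is shown here.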

There are many OpenCL drivers for GPUs, and there is even a platform
for converting OpenCL code to FPGA hardware. Eventually the CLA will be
ported to some sort of digital/analog hybrid device that simulates
dendrite/synapse connections on neuromorphic silicon. This is not far
off - maybe 5 years or less for early experiments, 10 years for cheap
commodity devices.

For now, most of us are trying to get proof-of-concept results with the
current code base; then we will figure out how to scale up and
optimize.

Another key to acceleration will be sharing trained networks that
encapsulate many CPU hours of training on fundamental streams of data
(for example, speech audio) and that, once trained, can be shared or
sold. If this happens, the building blocks of lower HTM regions could
be leveraged to get to the next level. We need to work towards some CLA
network serialization standards for this to happen.
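On the serialization point: no such standard exists yet, so purely as a
strawman, a shared trained region might travel as a versioned JSON
document. Every field name below is invented for illustration:

```python
# Strawman interchange format for a trained region -- the schema,
# version string, and field names are all hypothetical.
import json

def serialize_region(name, permanences, params):
    """Pack one trained region into a portable, versioned dict."""
    return {
        "format_version": "0.1",        # lets readers reject unknown schemas
        "region": name,
        "params": params,               # the settings the network was trained with
        "permanences": permanences,     # nested lists: column -> synapse values
    }

def deserialize_region(blob):
    """Restore a region dict from its JSON text."""
    return json.loads(blob)

region = serialize_region(
    "speech-audio-L1",
    [[0.21, 0.87], [0.43, 0.05]],
    {"columns": 2, "synapses_per_column": 2},
)
blob = json.dumps(region)
restored = deserialize_region(blob)
print(restored["region"])  # prints speech-audio-L1
```

Whatever format we settle on, a version field and an explicit record of
the training parameters seem essential so a downloaded region can be
stacked under a new upper level reproducibly.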

I think you are correct in your assumptions, and if you want to
contribute to the effort to move to a more performant version of the
code, I would love to see someone port some of the critical segments of
the CLA code to OpenCL. For an analysis of the CLA's bottlenecks and
possible hardware solutions, you can start with this paper:
http://www.pdx.edu/sites/www.pdx.edu.sysc/files/SySc.Seminar.Hammestrom.May.2011.pdf

-Doug


On Tue, Aug 20, 2013 at 9:59 PM, Oreste Villa <[email protected]> wrote:

> Hello everybody, this is my first post on this list so please forgive me
> if this has already been addressed before.
>
> I have seen that the current NuPIC source code is mostly Python, and I
> am wondering....
>
> I don't know about the problems people are trying to solve today
> (maybe this is not true for demand-response of power in a building),
> but in the future I believe performance is going to be a central
> issue. Python seems to be a non-optimal choice in this respect (as
> does single-threaded Java, single-threaded C#, or single-threaded C++
> - anything not parallel).
>
> I keep thinking, for instance, that the Large Hadron Collider at CERN
> produces something like 3 GByte/s of raw data, and it would be really
> nice if we were able to feed a full year of experiments in real time
> to a system based on the CLA. Also in robotics, the performance and
> I/O bandwidth requirements for vision, sensing, and motion control are
> impressive.
>
> The question/discussion point I wanted to raise is: where does the
> project stand in terms of performance? More specifically, are there
> any plans to write high-performance code inside NuPIC (OpenMP, CUDA,
> MPI)? Or is this something much less emphasized because the focus of
> the project is more on learning the basic CLA principles?
>
> Thanks,
>
> Oreste
>
> _______________________________________________
> nupic mailing list
> [email protected]
> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>
>