[nupic-discuss] Planning for the extraction of nupic-core

Matthew Taylor Fri, 24 Jan 2014 18:00:56 -0800

I asked Stewart to respond, and he gave me approval to post his response to
the list (he meant to do that anyway). My comments are inline below. If I
haven't commented on one of Stewart's bullet points, I agree with his point.


---------- Forwarded message ----------
> From: Stewart Mackenzie <[email protected]>
> Date: Fri, Jan 24, 2014 at 10:00 AM
> Subject: Re: Fwd: Planning for the extraction of nupic-core
> To: Matthew Taylor <[email protected]>
>
> Fergal, as ever, is bang on the money: don't break the existing NuPIC.
> This is a hard fork, so decisions should be made with a mind to the future of
> neocortical machine learning. Ideally we need to architect this so that it
> lives on beyond our lifetimes.
> GPL3 + C3 will allow this. Once it is ready, Grok will be ported to it.
> Decisions should be made independently of Grok. I echo Fergal: Grok must
> accord with the best decisions for NuPIC. It's easy to say Grok will put food
> on the table, but putting Grok first won't work.
>
> The decisions made now will have a great impact on the future of our
> society. Let's engineer this so that clean, efficient, biomimetic code can go
> _everywhere_.
>
> Some important (in my view) points:
> * no dependencies.
>

The current C++ has dependencies on Boost and APR. If we break off
nupic-core as-is, the community is welcome to remove these dependencies. I
don't think it will be part of the initial effort, but it could be an
incremental change.


> * (strongly) consider using C for nupic-core (or nupic.core, whatever the
> chosen name might be)
> (C keeps you close to the hardware and prevents you from getting lost in
> hierarchies of inheritance. Side note: ZeroMQ regretted not using C, and
> Linus Torvalds thinks C++ is Satan's spawn.)
> ** it'll be faster
>

This would take a complete rewrite of the core C++, so it would need
to be a community effort. Also, I think it would be easier to do this if
the C++ was already extracted into its own codebase, because we could
reuse a lot of infrastructure and tests.


> * ideally we don't create clients. (nupic-core <- python binding <-
> py-client)
> We just create bindings (obviously in library form and in a separate
> repo from core); folks include the binding in their application and get
> neocortical machine learning. Don't bother shipping executable binaries.
> Grok is your only executable. The new API will either sink you (if badly
> designed) or turn you into machine learning superstars (if well designed).
>

Agreed.

There are currently some components written in Python that will need to be
translated into the core:
- temporal pooler
- cla model
- general encoders

If we simply split repos, we'd need to eventually rewrite these components
within the core.


> * it _has_ to be easy to create bindings in different languages. Absolute
> requirement.
> * There should be a select set of popular Numenta-backed bindings; the
> rest are community-backed. (Maybe have a naming convention so that the
> greater community doesn't get confused.)
> * document the new API; absolutely lather it on. Think Body Chocolate.
> * core will most likely be a static library, though I'm noticing more
> modern-day Linux distributions frown on the use of static libraries (Arch
> Linux + NixOS).
>

I believe that it currently is, so that shouldn't change. (Correct me if
I'm wrong, Subutai or Scott.)
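To make the "easy to create bindings" requirement concrete, here is a minimal sketch of one common way to keep a C core easy to bind: a flat C interface built around an opaque handle, explicit status codes, and caller-owned output buffers. Every name in it (nc_status, nc_category_encoder and its functions) is hypothetical and is not the existing nupic-core API; the category encoder is only a stand-in component.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes: errors are returned, never thrown, so no
   exception machinery ever has to cross into a foreign runtime. */
typedef enum { NC_OK = 0, NC_EINVAL = 1, NC_ENOMEM = 2 } nc_status;

/* Opaque handle: bindings only ever hold a pointer. In a real split, only
   this typedef would live in the public header; the struct body below
   would stay private to the .c file. */
typedef struct nc_category_encoder nc_category_encoder;

struct nc_category_encoder {
    size_t num_categories; /* number of distinct categories */
    size_t w;              /* active bits per category */
};

nc_status nc_category_encoder_create(size_t num_categories, size_t w,
                                     nc_category_encoder **out) {
    if (out == NULL || num_categories == 0 || w == 0) return NC_EINVAL;
    nc_category_encoder *enc = malloc(sizeof *enc);
    if (enc == NULL) return NC_ENOMEM;
    enc->num_categories = num_categories;
    enc->w = w;
    *out = enc;
    return NC_OK;
}

/* Total output width in bits, so callers can size their buffers. */
size_t nc_category_encoder_width(const nc_category_encoder *enc) {
    assert(enc != NULL);
    return enc->num_categories * enc->w;
}

/* The caller owns `out`, so no heap memory changes hands across the
   boundary. Category k activates bits [k*w, k*w + w). */
nc_status nc_category_encoder_encode(const nc_category_encoder *enc,
                                     size_t category,
                                     unsigned char *out, size_t out_len) {
    if (enc == NULL || out == NULL) return NC_EINVAL;
    if (category >= enc->num_categories) return NC_EINVAL;
    if (out_len < nc_category_encoder_width(enc)) return NC_EINVAL;
    memset(out, 0, out_len);
    memset(out + category * enc->w, 1, enc->w);
    return NC_OK;
}

void nc_category_encoder_destroy(nc_category_encoder *enc) {
    free(enc); /* free(NULL) is a no-op, so destroying NULL is safe */
}
```

Because only plain functions, integers and pointers cross the boundary, FFI layers such as Python's ctypes, Go's cgo or Rust's extern blocks can call these functions directly, without a hand-written wrapper per language.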


>
> * encoders, regions and classifiers in the core repo.
> ** variations of encoders and regions all go into core. Try to find
> patterns and don't just explode the repo because the community found a cool
> gizmo encoder. Expand component variety using the creative extension
> principle, under guidance from neuroscientists.
>

I think a few generic encoders should be in the core repo (scalars,
vectors, etc.). But many of them are quite domain-specific. I don't think
we want these in the core. Those will either need to be provided by client
libraries, or else stored elsewhere outside core (perhaps a community pool
of encoders).
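A scalar encoder, one of the generic encoders mentioned above, is small enough to sketch. The version below is a simplified, non-periodic illustration of the general idea (a contiguous run of w active bits out of n, positioned by where the value falls between a minimum and a maximum). The function name and parameters are assumptions, and the rounding and edge handling may differ from NuPIC's actual scalar encoder.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encode `value` into `n` bits containing one contiguous run of `w`
   active bits. Values outside [minval, maxval] are clipped to the
   nearest end of the range. Hypothetical sketch, not NuPIC's encoder. */
void scalar_encode(double value, double minval, double maxval,
                   size_t n, size_t w, unsigned char *out) {
    assert(maxval > minval && w > 0 && w <= n);
    if (value < minval) value = minval;
    if (value > maxval) value = maxval;

    /* There are n - w + 1 possible start positions ("buckets"); map the
       value's relative position in the range onto them, rounding to the
       nearest bucket. frac is in [0, 1], so start is in [0, n - w]. */
    double frac = (value - minval) / (maxval - minval);
    size_t start = (size_t)(frac * (double)(n - w) + 0.5);

    memset(out, 0, n);
    memset(out + start, 1, w);
}
```

Nearby values produce runs that overlap, which is what makes the output usable as input to the rest of the CLA. With n = 10, w = 3 and a range of 0 to 7, for example, the value 0 sets bits 0 to 2, the value 7 sets bits 7 to 9, and any value above 7 is clipped to the same encoding as 7.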


> ** testing framework from the get-go; speak to J. Hawkins and more clued-up
> community members to decide on the most efficient testing mechanism. Francisco
> has excellent ideas here. The testing framework must be under the same license.
> ** the execution graph or network will be constructed in the host
> (application).
> ** the host will create loops to feed data to the encoders, and encoders
> dutifully pass their outputs to whichever component they are programmed to
> send them to.
>

Subutai, isn't this how it currently works?
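The host-driven model in Stewart's last two bullets can be sketched as follows: the application owns the loop, encodes each record, and hands the result to whichever component it wired up. The component struct, the one-hot bucket encoder and the bit-counting component are all hypothetical stand-ins (the counter takes the place of a real region); this is not how NuPIC's current network code is organized.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NBITS 4 /* hypothetical encoder output width */

/* Hypothetical downstream component: a function plus its private state.
   The host, not the encoder, decides which component gets the output. */
typedef struct {
    void (*compute)(void *state, const unsigned char *bits, size_t n);
    void *state;
} component;

/* Toy encoder: exactly one active bit, chosen by bucketing a value in
   [0, 1) into n equal-width buckets. */
static void bucket_encode(double value, unsigned char *bits, size_t n) {
    assert(value >= 0.0 && value < 1.0);
    memset(bits, 0, n);
    bits[(size_t)(value * (double)n)] = 1;
}

/* Toy component standing in for a region: counts how often each bit
   was active across all inputs it has seen. */
static void count_bits(void *state, const unsigned char *bits, size_t n) {
    unsigned *counts = state;
    for (size_t i = 0; i < n; i++) counts[i] += bits[i];
}

/* The host owns the loop: take a record, encode it, pass it downstream. */
void run_host_loop(const double *records, size_t count, component next) {
    unsigned char bits[NBITS];
    for (size_t i = 0; i < count; i++) {
        bucket_encode(records[i], bits, NBITS);
        next.compute(next.state, bits, NBITS);
    }
}
```

For example, wiring the counter as the downstream component and feeding the records 0.1, 0.3, 0.35 and 0.9 leaves the counts at 1, 2, 0 and 1 for bits 0 to 3. Swapping in a different component only changes the struct the host passes in; the loop itself stays the same.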


> * SDR creation must have a prominent place in core. don't be surprised if
> people use core solely to get access to SDR creation.
>

We talked about having both a "low-level" and a "high-level" API. SDR creation
could be included in that.
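As one illustration of what a low-level SDR-creation call might look like, the sketch below builds an SDR of fixed sparsity (exactly w of n bits active, chosen at random) and measures the overlap between two SDRs. The function names, the caller-supplied scratch buffer and the choice of the xorshift64 generator are assumptions made to keep the example self-contained; in NuPIC proper, SDRs are normally produced by encoders and the spatial pooler rather than drawn at random.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* xorshift64 (Marsaglia): a tiny PRNG so the sketch needs no library.
   The state must never be zero. */
static uint64_t xorshift64(uint64_t *state) {
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Fill `bits` (length n) with exactly w active bits chosen at random.
   `scratch` must have room for n size_t values. */
void sdr_random(unsigned char *bits, size_t n, size_t w,
                size_t *scratch, uint64_t *rng) {
    assert(w <= n && *rng != 0);
    for (size_t i = 0; i < n; i++) {
        scratch[i] = i;
        bits[i] = 0;
    }
    /* Partial Fisher-Yates shuffle: after i steps, scratch[0..i) is a
       random i-subset of the indices, so w steps pick w distinct bits. */
    for (size_t i = 0; i < w; i++) {
        size_t j = i + (size_t)(xorshift64(rng) % (n - i));
        size_t tmp = scratch[i];
        scratch[i] = scratch[j];
        scratch[j] = tmp;
        bits[scratch[i]] = 1;
    }
}

/* Overlap: the number of bits active in both SDRs, the usual
   similarity measure between two SDRs of the same width. */
size_t sdr_overlap(const unsigned char *a, const unsigned char *b, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) count += (a[i] && b[i]);
    return count;
}
```

The modulo step introduces a slight bias toward smaller offsets; for n far below 2^64 this is negligible in a sketch, but a production version would use rejection sampling instead.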


> * formalize naming. I.e., is it a region or a CLA, etc.? Document and
> elaborate; we're communicating difficult terminology.
>

Another thing we've talked about recently. I think we all agree that some
names need to change.


> * each entity/component (encoder, region, classifier) can snap together
> like Lego bricks; they know how to talk to each other without some external
> entity. (Maybe you'll need the language binding to know about it? Not sure
> yet, but really that's about it: no network manager, etc.)
> ** this means using bounded buffers in between components.
> * The community must allocate more mindshare to nupic during this phase.
>

This is a big point. I've been happy with the amount of contributions we've
gotten from the community so far (thanks!), but to get what you all want
out of NuPIC, there will need to be a significant amount of work done by
community members to reach these goals.
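The bounded-buffer idea from Stewart's "Lego bricks" bullet can be sketched as a fixed-capacity ring buffer of output frames sitting between two components. The frame width, the capacity and the bb_* names are hypothetical, and this version is single-threaded; components running on separate threads would need a lock or atomics around head and count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FRAME_BITS 8 /* hypothetical width of one component output */
#define CAPACITY 4   /* hypothetical number of frames the buffer holds */

typedef struct {
    unsigned char frames[CAPACITY][FRAME_BITS];
    size_t head;  /* index of the oldest stored frame */
    size_t count; /* number of frames currently stored */
} bounded_buffer;

void bb_init(bounded_buffer *bb) {
    assert(bb != NULL);
    bb->head = 0;
    bb->count = 0;
}

/* Producer side: refuses the frame (returns false) when full instead of
   growing, which pushes back on a producer that outpaces its consumer. */
bool bb_push(bounded_buffer *bb, const unsigned char *frame) {
    if (bb->count == CAPACITY) return false;
    size_t tail = (bb->head + bb->count) % CAPACITY;
    memcpy(bb->frames[tail], frame, FRAME_BITS);
    bb->count++;
    return true;
}

/* Consumer side: copies out the oldest frame; returns false when empty. */
bool bb_pop(bounded_buffer *bb, unsigned char *frame) {
    if (bb->count == 0) return false;
    memcpy(frame, bb->frames[bb->head], FRAME_BITS);
    bb->head = (bb->head + 1) % CAPACITY;
    bb->count--;
    return true;
}
```

Refusing to push when full, rather than reallocating, is what keeps the buffer bounded: a fast upstream component has to wait for a slower downstream one instead of consuming unbounded memory, and frames come out in the order they went in.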


>
> Kind regards
> Stewart


Thanks, Stewart. I truly think we can get what you want without a complete
rewrite. It should be a relatively easy task to extract the current C++
core (as-is) from the nupic repo, leaving all the Python behind. Then we
can adjust our build to accommodate the changes. Once that is done, it will
be much simpler to refactor the C++ codebase, because:

- The API will be isolated
- I'm going to set up regression tests for the core no matter what happens
- We'll work on documenting the hell out of it
- We can ask the community to do some of the work, like removing dependencies

I do not think any of these goals are unreachable, no matter which route we
take.

---------
Matt Taylor
OS Community Flag-Bearer
Numenta

_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
