Hi Matt,

I'm sure everyone echoes every word of those sentiments - it is because of our respect for one another that we're all interested in keeping everyone together on this.
I have a suggestion (really a phasing) which appears to match both main camps' arguments:

Step 1. Leave nupic repo as is, except for adding a suite of the newly released regression tests.
Step 2. Perform the Grok side of the fork to nupic.core and nupic.py.
Step 3. Identify core API, document, build more tests.
Step 4. Roadmap for Grok fork (incremental refactoring plan, C++ implementations etc).
Step 5. Plan for Free Fork (overall goals, plan for 2014, UAT specs, etc).
... everyone work on Grok fork until...
Step 6. Perform Orderly Free Fork to nupic2.core and nupic2.py.

If we do it this way, we can get far enough down the road on the Grok fork to identify exactly why a free fork is needed and what is involved. Anything which can be done in the Grok fork should be done there first (the free fork will inherit it), and Grok can say, for example, that change X would need to wait for the free fork.

If we can all agree that "we will do a free fork inside the project, sometime soon" then we free fork people will help you get the whole system as ready as possible for that (by doing everything that can be done with Grok's tests passing), and everyone wins.

Regards

Fergal Byrne

On Mon, Jan 27, 2014 at 1:21 AM, Matthew Taylor <[email protected]> wrote:

> Comments inline.
>
> On Sat, Jan 25, 2014 at 1:43 PM, Fergal Byrne <[email protected]> wrote:
>
>> The problem with the "incremental" approach is that NuPIC will need to be destroyed before it can be rebuilt anew.
>
> How can anyone know that the core must be destroyed before anyone has actually used it? I haven't heard of anyone in the community who's even attempted to use the C++ API directly. Before we talk about the phoenix rising from the ashes, we should all have a chance to extract the existing phoenix from NuPIC and take a good look at it before making this decision.
>
> You and Stewart have brought up a lot of good requirements for the core API, and many of them are already fulfilled (or easily fulfilled with a little elbow grease) in the current API. I would really like the chance to make this extraction as-is so the community can have a better understanding of what currently exists. Then we can talk about ways in which it is deficient.
>
>> Subutai, the problem is not with the Network API or any other apparently large scale API. The problem is that the implementation of big chunks of the CLA and HTM are inside monolithic classes which have execution threads extending over thousands of lines of code (2500 lines of python in the case of the main SP class). These huge classes have everything a monolith needs: opaque mutable data structures, a dozen methods reading and writing various subsets of them, intermediate data being created on line 280* (optionally recreated when unpickling), mutated interdependently in various ways on lines 390, 670, 220, 1800, etc, and then outputted differently depending on which method is called where.
>>
>> *Line numbers are made up and do not correspond to actual code.
>
> I would be happy to have conversations like this about the real code within the C++ core. It will be much easier to do this after the extraction and documentation. At that point, refactoring will be very much welcome, especially after the regression test suite is open source and running against the core.
>
>> NuPIC does (most of) the algorithms outlined in Jeff's theories, but it is not an artefact which is the result of any single process of design. You cannot refactor an implementation with this much coupling incrementally, and you certainly cannot do it if you have a commercial product reading your bleeding edge.
>
> What coupling are you referring to? Coupling of the NuPIC core with Grok? As far as I can tell, the C++ core is extremely generic.
> There is absolutely nothing Grok-specific within the C++. Same with the algorithms in python and the python language bindings, those are entirely generic. There may be some Grok-isms in the OPF or other python helpers, but those won't be a part of the core. And as part of the extraction plan, we'll need to move over some of the python code into the core, which will also be generic.
>
>> As someone else said, the free fork will happen anyway, and you only have the choice of remaining the leader or becoming a bystander.
>
> What I said (or perhaps meant to say) is that if we don't accomplish this core extraction in a fashion that satisfies the community, someone else is going to fork it and do what they wish with it. I don't want the "hardfork" decision to be made until we've done the initial extraction of the core, after which the API and underlying implementation will be more visible and scrutable.
>
>> I don't know what happens when an Open Source project hard forks out when Jeff has 58 patents covering the very basics, that just frightens me.
>
> As long as the fork remains GPLv3, it is safe. (Also, Jeff doesn't have nearly that many patents on core HTM/CLA technology.)
>
>> I asked Matt yesterday had he heard of mSQL. Well, mSQL was the first OSS database on Linux that worked, I used it for production work in 1995. The creator got a community going, and after a while they implored him to allow them to rewrite it from scratch. He refused, clung on, and they forked. They took all his good stuff and built a database, and they named it (so the urban myth goes) to acknowledge the "dog in the manger" who refused to let them fork internally. The fork was reasonably successful, they called it MySQL.
>
> I did a bit of research on this topic after our chat and found some text about it in the book MySQL & mSQL <http://docstore.mik.ua/orelly/linux/sql/ch01_04.htm>:
>
> *Widenius contacted David Hughes -- the author of mSQL -- to see if Hughes would be interested in connecting mSQL to UNIREG's B+ ISAM handler. Hughes was already well on his way to mSQL 2, however, and already had his indexing infrastructure in place. TcX decided to create a database server that was more compatible with its requirements.*
>
> Monty Widenius (of TcX) didn't want a complete rewrite, he wanted to take the indexing infrastructure in a different direction. Hughes was already committed to the current direction of mSQL 2 and wouldn't acquiesce. There is a big difference between that situation and this one: No one is asking for any specific changes in the existing API, I'm guessing because no one has attempted to use it. This is expected because the primary interface for NuPIC today is Python.
>
> How can we decide to rewrite a piece of software before anyone is actually using its interface? Shouldn't we first make it accessible to others to attempt to use before hitting the reset button?
>
>> So I'll say: if you want an incremental approach, then nupic (or a minor refactoring fork) is for you. Just do that. If you want to rebuild NuPIC then a free fork is required, and I would so much prefer that be an internal fork that any other approach is just incomprehensible to me.
>
> It may come to a free fork in the future, but I don't think now is the time to be executing on it. Let us first get a chance to extract the existing core and allow others to use it. If the community is still upset about the API, or the performance, or the whatever, we should first talk about how it can be fixed. If it's unable to be fixed because Grok is impeding on the necessary work, then we should talk about a hard fork.
>
> ---
>
> In closing, I want to say that I hold no hard feelings whatsoever towards anyone involved in this conversation. Discussions like this are essential on FOSS projects. We all have strong opinions, which is good. I respect everyone who has voiced theirs, and I thank you all for your participation in this project.
>
> No matter how this issue turns out, or what state NuPIC is in a year from now, I am dedicated to the primary idea of advancing machine intelligence based on CLA/HTM.
>
> ---------
> Matt Taylor
> OS Community Flag-Bearer
> Numenta
>
> _______________________________________________
> nupic mailing list
> [email protected]
> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org

--

Fergal Byrne, Brenter IT

<http://www.examsupport.ie> http://inbits.com - Better Living through Thoughtful Technology

e:[email protected] t:+353 83 4214179
Formerly of Adnet [email protected] http://www.adnet.ie
