Good to hear that. I've merged in your story-teller feature, a great way to make this project more fun, interactive, and educational. Thanks for the pull request! Playing with it now :)
On Sat, Nov 16, 2013 at 9:03 AM, Marek Otahal <[email protected]> wrote:

> Hi Chetan,
>
> On Fri, Nov 15, 2013 at 11:00 PM, Chetan Surpur <[email protected]> wrote:
>
>> Wonderful! That's a great idea. This would be making the "naive"
>> assumption that sentences are independent of each other, but that's
>> probably a good simplification to make at this point.
>
> Yes, there's a trade-off. I made a story-telling :) application based on
> your Linguist project. Without the resets, and for bigger datasets, the
> learning wouldn't cut it, and only sometimes did the output look like a
> proper English word.
>
> I made a list of sentence terminators ['.', '!', '?', ':'] and call
> reset() after seeing one of these. With this simplification, the
> prediction probabilities are much higher, and the predictions look like
> English sentences.
>
> I have one problem though [*].
>
>> By the way, I'm sorry the Linguist code isn't very clean; it was mostly
>> just a tiny experiment to see what would happen. I didn't think people
>> would still be using it :) If there's enough interest, I would be
>> willing to help design a proper language prediction framework, so we
>> can experiment more quickly and confidently.
>
> The code is just fine for my needs! I'm happy it's low-level enough to
> call the model directly and allows me to play more.
>
> As the hackathon showed, an NLP platform would be a very interesting
> project for future experiments. We could base it on your Linguist,
> Matt's hackathon repos (CEPT), and Subutai's application!
>
>> In fact, I just had the following idea: instead of converting
>> individual letters using CategoryEncoder and predicting the next
>> letters, what if we converted entire words using something like a
>> "CEPTWordEncoder" (that transparently used the CEPT API), and tried to
>> predict the next few words? I bet we would get really cool and possibly
>> useful results.
>> It wouldn't be able to do correction / prediction at the sub-word
>> level, but it might be great for sub-sentence prediction. (Also, we
>> might want to bypass the spatial pooler for this experiment.)
>>
>> What do you think?
>
> I was thinking this too! A WordEncoder would definitely rock and be cool
> for people. Actually, I'm surprised this hasn't been offered upstream
> yet, since these things had to be dealt with for the NLP hack, so we
> could just cut those pieces out.
>
> Generally, I think encoding whole words would make more sense and find
> wider use cases. For my work, though, I'm happy to have letters as the
> basic building blocks. I was inspired by Geoffrey Hinton's deep neural
> network for text prediction, which ate the whole of Wikipedia and then
> produced signs of grammar! (See the Neural Networks course on
> coursera.org for details; it's been talked about on the ML recently as
> well.)
>
> My app works similarly: you give it the start of a sentence, and it
> continues it. The grammar it picked up was, e.g., "a verb after a
> singular subject ends with 's'". It also picked up some (impressive!)
> knowledge that "John" == singular.
>
> So you gave it "John li" ... and it followed with ... "keS"!
>
> To make these observations, it's important to use chars directly.
>
> The other advantage of chars was that there are only some "+-32"
> characters in the alphabet, unlike the 3000+ words of an average active
> vocabulary. This second advantage doesn't really apply to NuPIC, but
> still.
>
> --
> Marek Otahal :o)
>
> _______________________________________________
> nupic mailing list
> [email protected]
> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
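For reference, the terminator-reset trick described above can be sketched roughly like this. The `Model` class is just a toy stand-in for the Linguist model, and `feed()` and its internal state are illustrative names, not the real nupic API:

```python
# Sentence-boundary resets: feed characters one at a time and reset the
# sequence state whenever a terminator is seen, so that sentences are
# treated as independent sequences. `Model` is a toy stand-in, not the
# real Linguist/nupic API.
SENTENCE_TERMINATORS = {'.', '!', '?', ':'}

class Model:
    """Toy model that just records the current character sequence."""
    def __init__(self):
        self.sequence = []
        self.resets = 0

    def feed(self, char):
        self.sequence.append(char)

    def reset(self):
        self.sequence = []
        self.resets += 1

def train_with_resets(model, text):
    for char in text:
        model.feed(char)
        if char in SENTENCE_TERMINATORS:
            model.reset()  # sentence boundary: start a fresh sequence

model = Model()
train_with_resets(model, "Hello world! How are you?")
```

With the real model, the same loop applies: call reset() at each terminator so the temporal memory stops carrying context across sentence boundaries.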
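The word-level idea could look something like the sketch below. I don't know the CEPT API offhand, so `encode_word()` here fakes a fingerprint with a stable hash; a real "CEPTWordEncoder" would return a semantic fingerprint instead. All names in this sketch are hypothetical:

```python
# Word-level pipeline sketch: tokenize text into words, then map each word
# to a fixed-width sparse bit pattern. encode_word() is a placeholder for
# a hypothetical "CEPTWordEncoder"; a real CEPT fingerprint would encode
# word semantics, whereas this just hashes the word deterministically.
import hashlib
import re

def tokenize(text):
    """Split text into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())

def encode_word(word, width=64, active_bits=8):
    """Map a word to a stable, sparse set of active bit positions."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    positions = []
    for byte in digest:  # pick distinct active-bit positions from the hash
        pos = byte % width
        if pos not in positions:
            positions.append(pos)
        if len(positions) == active_bits:
            break
    return sorted(positions)

tokens = tokenize("John likes apples.")
encodings = [encode_word(t) for t in tokens]
```

A prediction step would then feed these word encodings into the temporal memory in place of the single-character CategoryEncoder output, predicting the next word rather than the next letter.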
_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
