On Tue, Feb 5, 2019, 5:23 PM Linas Vepstas <linasveps...@gmail.com> wrote:
>> if there were an experimental results section that told us
>> which ones were worth pursuing.
>
> There's this:
>
> https://github.com/opencog/opencog/raw/master/opencog/nlp/learn/learn-lang-diary/connector-sets-revised.pdf
>
> https://github.com/opencog/opencog/raw/master/opencog/nlp/learn/learn-lang-diary/learn-lang-diary.pdf

Yes, that is what I was looking for. I haven't read all of it, but so far I have learned:

1. It is possible to learn parts of speech and a grammar from unlabeled text.

2. It is possible to learn word boundaries in continuous speech by finding points of low mutual information between adjacent units. (I did similar experiments to restore deleted spaces in text using only n-gram statistics. This is how infants learn to segment speech at 7-10 months old, before learning any words.)

3. Word pairs have a Zipf distribution, just like single words. (I suspect the same is true of word triples representing grammar rules, which would suggest there are around 10^8 rules.)

I hope this work continues. It would be interesting if it advances the state of the art on my large text benchmark or the Hutter Prize.

-- Matt Mahoney, mattmahone...@gmail.com

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/Ta6fce6a7b640886a-M2868c79e3d7fbcd113691a5b
Delivery options: https://agi.topicbox.com/groups/agi/subscription
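As an aside, the low-mutual-information segmentation idea from point 2 can be sketched in a few lines. This is my own toy illustration, not code from the referenced papers or from my space-restoration experiments: it trains character bigram statistics on an unsegmented stream and places a word boundary wherever the pointwise mutual information (PMI) of an adjacent character pair is a local minimum, i.e. dips below the PMI on both sides. The two-"word" training stream below is an invented example.

```python
import math
from collections import Counter

def train(text):
    """Collect character and adjacent-pair counts from unsegmented text."""
    return Counter(text), Counter(zip(text, text[1:])), len(text)

def pmi(pair, unigrams, bigrams, n):
    """Pointwise mutual information of an adjacent character pair."""
    if bigrams[pair] == 0:
        return float("-inf")
    p_pair = bigrams[pair] / (n - 1)
    p_x = unigrams[pair[0]] / n
    p_y = unigrams[pair[1]] / n
    return math.log2(p_pair / (p_x * p_y))

def segment(text, unigrams, bigrams, n):
    """Place a boundary wherever adjacent-pair PMI is a local minimum:
    lower than the PMI of the pairs on either side of it."""
    scores = [pmi((text[i], text[i + 1]), unigrams, bigrams, n)
              for i in range(len(text) - 1)]
    words, start = [], 0
    for i in range(1, len(scores) - 1):
        if scores[i] < scores[i - 1] and scores[i] < scores[i + 1]:
            words.append(text[start:i + 1])  # boundary between i and i+1
            start = i + 1
    words.append(text[start:])
    return words

# Toy "speech stream" with two words, "ab" and "cd", and no separators.
u, b, n = train("ab" * 100 + "cd" * 100)
print(segment("ababcdcd", u, b, n))  # -> ['ab', 'ab', 'cd', 'cd']
```

Within-word pairs like ("a", "b") occur far more often than chance, so their PMI is high; pairs that straddle a word boundary occur only as often as the words happen to be adjacent, so their PMI dips, and the dips mark the boundaries. On real text one would use a smoothed estimate and a tuned criterion rather than a strict local minimum.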
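The Zipf claim in point 3 is easy to check empirically: if frequency is roughly proportional to 1/rank, then log(frequency) falls roughly linearly in log(rank). Here is a small sketch (my illustration, not from the referenced papers) that computes the rank-frequency curves for single words and word pairs and fits the log-log slope; the toy text is an invented stand-in, and only on a large corpus would the slope approach the Zipfian value of about -1.

```python
import math
from collections import Counter

def rank_frequency(tokens, order):
    """Frequencies of n-grams of the given order, most common first."""
    grams = zip(*(tokens[i:] for i in range(order)))
    return [count for _, count in Counter(grams).most_common()]

def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank).
    Zipf's law predicts a slope near -1 on a large corpus."""
    xs = [math.log(rank + 1) for rank in range(len(freqs))]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Invented toy corpus; substitute a real one (e.g. enwik8) for a serious test.
text = ("the cat sat on the mat and the dog sat on the rug "
        "and the cat saw the dog and the dog saw the cat ") * 20
tokens = text.split()
print(loglog_slope(rank_frequency(tokens, 1)))  # single words
print(loglog_slope(rank_frequency(tokens, 2)))  # word pairs
```

Both slopes come out negative, as a Zipf-like distribution requires; the interesting question, which needs a real corpus, is how close the pair slope is to the unigram slope.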