On Tue, Feb 5, 2019, 5:23 PM Linas Vepstas <linasveps...@gmail.com> wrote:

>> if there were an experimental results section that told us
>> which ones were worth pursuing.
>
> There's this:
>
> https://github.com/opencog/opencog/raw/master/opencog/nlp/learn/learn-lang-diary/connector-sets-revised.pdf
> https://github.com/opencog/opencog/raw/master/opencog/nlp/learn/learn-lang-diary/learn-lang-diary.pdf

Yes, that is what I was looking for. I haven't read all of it yet, but so
far I have learned:

1. It is possible to learn parts of speech and a grammar from unlabeled
text (the first sketch after this list illustrates the idea).

2. It is possible to learn word boundaries in continuous speech by placing
boundaries where the mutual information between adjacent units is low
(second sketch below). (I did similar experiments restoring deleted spaces
in text using only n-gram statistics. This is how infants learn to segment
speech at 7-10 months old, before learning any words.)

3. Word pairs have a Zipf distribution just like single words (third
sketch below). (I suspect the same is true of word triples representing
grammar rules, which suggests there are around 10^8 rules.)
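
To make (1) concrete, here is a minimal sketch of the distributional idea:
represent each word by counts of its left and right neighbors, then
cluster. The toy corpus, the feature choice, and plain k-means are my
assumptions for illustration, not the method used in the diary:

    import random

    corpus = ("the cat saw the dog . a dog chased a cat . "
              "the cat ate . a dog ran .").split()

    # Feature vector per word: counts of its left and right neighbors.
    vocab = sorted(set(corpus))
    index = {w: i for i, w in enumerate(vocab)}
    vecs = {w: [0.0] * (2 * len(vocab)) for w in vocab}
    for i, w in enumerate(corpus):
        if i > 0:
            vecs[w][index[corpus[i - 1]]] += 1               # left neighbor
        if i + 1 < len(corpus):
            vecs[w][len(vocab) + index[corpus[i + 1]]] += 1  # right neighbor

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def kmeans(points, k, iters=20):
        random.seed(0)                      # deterministic toy run
        centers = random.sample(points, k)
        for _ in range(iters):
            groups = [[] for _ in range(k)]
            for p in points:
                groups[min(range(k), key=lambda j: dist(p, centers[j]))].append(p)
            centers = [[sum(d) / len(g) for d in zip(*g)] if g else centers[j]
                       for j, g in enumerate(groups)]
        return centers

    centers = kmeans([vecs[w] for w in vocab], 3)
    for w in vocab:  # words in one cluster behave like one part of speech
        print(w, min(range(3), key=lambda j: dist(vecs[w], centers[j])))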
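
Here is a sketch of the low mutual information idea in (2), close to my
space-restoration test: estimate pointwise MI between adjacent characters
from text that still has its spaces, then split a spaceless string
wherever the MI dips below a threshold. The character-bigram model and
the zero threshold are my assumptions, not the exact setup in the diary:

    import math
    from collections import Counter

    train = "the cat sat on the mat and the dog sat on the log " * 50
    unigrams = Counter(train)
    bigrams = Counter(zip(train, train[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    def pmi(a, b):
        # log2 P(a,b) / (P(a) P(b)), add-one smoothing for unseen pairs;
        # assumes both characters appeared in the training text.
        p_ab = (bigrams[(a, b)] + 1) / (n_bi + len(unigrams) ** 2)
        return math.log2(p_ab / ((unigrams[a] / n_uni) * (unigrams[b] / n_uni)))

    def segment(text, threshold=0.0):
        out = [text[0]]
        for a, b in zip(text, text[1:]):
            if pmi(a, b) < threshold:  # weakly associated pair: likely boundary
                out.append(" ")
            out.append(b)
        return "".join(out)

    print(segment("thecatsatonthelog"))

On this toy data the cross-word character pairs never occur in training,
so every true boundary gets a negative PMI and the spaces come back.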
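
And a quick way to check the Zipf claim in (3) on any corpus: count
adjacent word pairs and fit the slope of log frequency against log rank;
a slope near -1 is the Zipf signature. The file name is a placeholder:

    import math
    from collections import Counter

    words = open("corpus.txt").read().split()  # placeholder file name
    pairs = Counter(zip(words, words[1:]))
    freqs = sorted(pairs.values(), reverse=True)

    # Least-squares slope of log(frequency) against log(rank).
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    print("distinct pairs:", len(pairs), "Zipf slope:", round(slope, 2))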

I hope this work continues. It would be interesting to see whether it can
advance the state of the art on my large text benchmark or the Hutter
prize.

-- Matt Mahoney, mattmahone...@gmail.com
