On Thu, Jul 18, 2019, 9:40 PM Costi Dumitrescu <costi.dumitre...@gmx.com>
wrote:

> Write input text - remove spaces in the input text - compress - send -
> decompress - AI - output text including spaces.
>

In 2000 I found that you could locate most of the word boundaries in text
without spaces simply by looking for the high entropy points in n-gram
statistics.

https://cs.fit.edu/~mmahoney/dissertation/lex1.html
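
For concreteness, here is a minimal sketch of that idea (not the original
lex1 code): count character n-grams over space-stripped text, then insert a
boundary wherever the entropy of the next-character distribution spikes. The
model order, threshold, and toy training text are illustrative assumptions.

import math
from collections import defaultdict

def train_counts(text, order=4):
    """Count (context, next-char) pairs for contexts of 1..order characters."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(1, len(text)):
        for n in range(1, order + 1):
            if i - n >= 0:
                counts[text[i - n:i]][text[i]] += 1
    return counts

def next_char_entropy(counts, context, order=4):
    """Entropy in bits of the next character, given the longest matching context."""
    for n in range(min(order, len(context)), 0, -1):
        dist = counts.get(context[-n:])
        if dist:
            total = sum(dist.values())
            return -sum((c / total) * math.log2(c / total) for c in dist.values())
    return 8.0  # unseen context: treat as maximally uncertain

def segment(text, counts, order=4, threshold=1.5):
    """Insert a space wherever uncertainty about the next character spikes."""
    out = [text[0]]
    for i in range(1, len(text)):
        if next_char_entropy(counts, text[max(0, i - order):i], order) > threshold:
            out.append(' ')
        out.append(text[i])
    return ''.join(out)

if __name__ == "__main__":
    # Toy demo only; real text gives much more reliable boundaries.
    training = ("the cat sat on the mat the dog ran to the cat "
                "a cat and a dog sat on a log the dog sat on the mat ") * 25
    counts = train_counts(training.replace(" ", ""))
    print(segment("thedogsatonthemat", counts))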

So yes, you could do this and encode just the locations where the model
makes errors.
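
That error-coding step can be sketched in a few lines: given the original
text and the segmentation the model predicts from the space-stripped version,
only the positions where the two disagree need to be stored; the decompressor
runs the same model and flips those positions. The function names below are
hypothetical, not from any existing codec.

def boundary_positions(text):
    """Offsets in the space-stripped text that are immediately followed by a space."""
    positions, offset = set(), 0
    for ch in text:
        if ch == ' ':
            positions.add(offset)
        else:
            offset += 1
    return positions

def boundary_corrections(original, predicted):
    """Positions where the model's predicted spacing differs from the true spacing.

    `original` is the text with spaces; `predicted` is the model's segmentation
    of the same text with spaces removed. Encoding just this (usually short)
    list plus the space-stripped text is enough to restore the original spacing.
    """
    return sorted(boundary_positions(original) ^ boundary_positions(predicted))

# Example: boundary_corrections("the cat", "th ecat") -> [2, 3]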

But I was more interested in testing language models that simulate language
learning in children. In particular, babies can identify word boundaries in
speech at 7-10 months, before they have learned any words. Children also
learn semantics before grammar, which is the reverse of the order used by
rule based language models.

I wanted to show that language is structured in a way that makes it
possible to learn it completely unsupervised. In a deep neural network, the
layers can be trained one at a time, in the same order that children learn.
And now we have neural language models that compress to one bit per
character, within the uncertainty bounds of Shannon's 1950 estimate of the
entropy of written English based on human prediction tests.
