On Thu, Jul 18, 2019, 9:40 PM Costi Dumitrescu <costi.dumitre...@gmx.com> wrote:
> Write input text - remove spaces in the input text - compress - send -
> decompress - AI - output text including spaces.

In 2000 I found that you can find most of the word boundaries in text without spaces simply by locating the high-entropy boundaries using n-gram statistics: https://cs.fit.edu/~mmahoney/dissertation/lex1.html

So, yes, you could do this and encode just the locations where the model makes errors. But I was more interested in testing language models that simulate language learning in children. In particular, babies can identify word boundaries in speech at 7-10 months, before they learn any words. Children also learn semantics before grammar, which is the reverse of rule-based language models. I wanted to show that language is structured in a way that makes it possible to learn completely unsupervised. Using a deep neural network, the layers are trained one at a time in the order that children learn. We now have neural language models that compress to one bit per character, within the uncertainty bounds of Shannon's 1950 estimate of written English based on human prediction tests.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/Tc1fd5fc7fae0a6a9-Mb9b990dee4c827dce3deba0e
Delivery options: https://agi.topicbox.com/groups/agi/subscription
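As a rough illustration of the entropy-boundary idea (a minimal sketch, not the exact algorithm from the dissertation above): estimate the conditional entropy of the next character from character trigram counts, and treat positions where that entropy peaks as likely word boundaries in spaceless text.

```python
# Sketch: word-boundary detection in text without spaces via next-character
# entropy from character n-gram statistics. All function names here are
# illustrative, not from the cited work.
import math
from collections import Counter

def train_counts(corpus, n=3):
    """Count (n-1)-char contexts and their n-char continuations."""
    ctx, cont = Counter(), Counter()
    for i in range(len(corpus) - n + 1):
        gram = corpus[i:i + n]
        ctx[gram[:-1]] += 1   # context = first n-1 characters
        cont[gram] += 1       # context plus the following character
    return ctx, cont

def next_char_entropy(prefix, ctx, cont):
    """Shannon entropy (bits) of the character following `prefix`.
    Scans all continuations; fine for a sketch, slow for large corpora."""
    total = ctx.get(prefix, 0)
    if total == 0:
        return 0.0
    h = 0.0
    for gram, c in cont.items():
        if gram[:-1] == prefix:
            p = c / total
            h -= p * math.log2(p)
    return h

def boundary_scores(text, ctx, cont, n=3):
    """Entropy of the next character after each position in `text`.
    High values suggest a word boundary after that position."""
    scores = []
    for i in range(n - 2, len(text)):
        prefix = text[i - (n - 2):i + 1]
        scores.append(next_char_entropy(prefix, ctx, cont))
    return scores
```

Trained on ordinary text with the spaces stripped, the entropy is low inside words (the next letter is predictable) and spikes at word ends, so thresholding the scores recovers most boundaries without any word list.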