--- Linas Vepstas <[EMAIL PROTECTED]> wrote:

> On Fri, Nov 02, 2007 at 12:56:14PM -0700, Matt Mahoney wrote:
> > --- Jiri Jelinek <[EMAIL PROTECTED]> wrote:
> > > On Oct 31, 2007 8:53 PM, Matt Mahoney <[EMAIL PROTECTED]> wrote:
> > > > Natural language is a fundamental part of the knowledge base,
> > > > not something you can add on later.
> > >
> > > I disagree. You can start with a KB that contains concepts retrieved
> > > from a well-structured non-NL input format only, get the thinking
> > > algorithms working, and then (possibly much later) let the system
> > > focus on NL analysis/understanding or build some
> > > NL-to-the_structured_format translation tools.
> >
> > Well, good luck with that. Are you aware of how many thousands of
> > times this approach has been tried? You are wading into a swamp.
> > Progress will be rapid at first.
>
> Yes, and in the first email I wrote, the one that started this thread,
> I stated, more or less: "yes, I am aware that many have tried, and that
> it's a swamp, and can anyone elucidate why?" And so far, no one has been
> able to answer that question, even as they firmly assert that surely it
> is a swamp. Nor has anyone attempted to posit any mechanisms that avoid
> that swamp, other than thought bubbles that state things like "starting
> from a clean slate, my system will be magic".
Actually, my research is trying to answer this question. In 1999 I looked
at language model size vs. compression and found data consistent with
Turing's and Landauer's estimates of 10^9 bits (Landauer derived his
figure from measured human learning rates). This is also about the
compressed size of the Cyc database.
http://cs.fit.edu/~mmahoney/dissertation/

But then I started looking at CPU and memory requirements, which turn out
to be much larger. Why does the human brain need 10^15 synapses? When you
plot text compression ratio on the speed-memory surface, the surface is
still very steep, especially along the memory axis.
http://cs.fit.edu/~mmahoney/compression/text.html

Unfortunately the data is still far from clear. The best programs still
model semantics crudely and grammar not at all. From the data I would
guess that an ungrounded language model could run on a 1000-CPU cluster,
plus or minus a couple of orders of magnitude. The fact that Google hasn't
solved the problem with a 10^6-machine cluster does not make me hopeful.

-- Matt Mahoney, [EMAIL PROTECTED]
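
A minimal sketch of the kind of measurement behind the 10^9-bit figure
above: compress an English text sample, compute bits per character, and
extrapolate to the roughly 10^9 characters of language a person is
exposed to by adulthood. The file name "sample.txt" and the
10^9-character exposure figure are illustrative assumptions, not values
from the post.

import lzma
import zlib

def bits_per_char(text: bytes) -> dict:
    # Compressed size in bits per input character, for two
    # general-purpose compressors from the standard library.
    return {
        "zlib-9": 8 * len(zlib.compress(text, 9)) / len(text),
        "lzma-9": 8 * len(lzma.compress(text, preset=9)) / len(text),
    }

with open("sample.txt", "rb") as f:  # any large English text sample
    text = f.read()

for name, bpc in bits_per_char(text).items():
    # At roughly 1 bit/char (Shannon's estimate for English), 10^9
    # characters of lifetime language exposure compress to about 10^9
    # bits, the same order as the Turing and Landauer estimates above.
    print(f"{name}: {bpc:.2f} bits/char -> "
          f"~{bpc * 1e9:.2e} bits per 10^9 chars")

General-purpose compressors like these land well above Shannon's ~1 bit
per character; the specialized models ranked on the benchmark page above
get much closer, at a steep cost in speed and memory.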