--- Linas Vepstas <[EMAIL PROTECTED]> wrote:

> On Fri, Nov 02, 2007 at 12:56:14PM -0700, Matt Mahoney wrote:
> > --- Jiri Jelinek <[EMAIL PROTECTED]> wrote:
> > > On Oct 31, 2007 8:53 PM, Matt Mahoney <[EMAIL PROTECTED]> wrote:
> > > > Natural language is a fundamental part of the knowledge
> > > base, not something you can add on later.
> > > 
> > > I disagree. You can start with a KB that contains concepts retrieved
> > > from a well structured non-NL input format only, get the thinking
> > > algorithms working and then (possibly much later) let the system
> > > focus on NL analysis/understanding or build some
> > > NL-to-the_structured_format translation tools.
> > 
> > Well, good luck with that.  Are you aware of how many thousands of times
> > this approach has been tried?  You are wading into a swamp.  Progress
> > will be rapid at first.
> 
> Yes, and in the first email I wrote, that started this thread, I stated,
> more or less: "yes, I am aware that many have tried, and that it's a
> swamp, and can anyone elucidate why?"  And, so far, no one has been able
> to answer that question, even as they firmly assert that surely it is a
> swamp. Nor has anyone attempted to posit any mechanisms that avoid that
> swamp, other than thought bubbles that state things like "starting from
> a clean slate, my system will be magic".

Actually my research is trying to answer this question.  In 1999 I looked at
language model size vs. compression and found data consistent with Turing's
and Landauer's estimates of 10^9 bits.  This is also about the compressed size
of the Cyc database.  http://cs.fit.edu/~mmahoney/dissertation/
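
For anyone who wants to try the flavor of that measurement themselves,
here is a minimal Python sketch.  The corpus file name is a placeholder,
and the 10^9-character exposure figure is an assumption, not a measurement:

    # Crude upper bound on the information content of text using an
    # off-the-shelf compressor -- the same kind of measurement behind
    # the ~10^9-bit estimates.
    import bz2

    with open("corpus.txt", "rb") as f:   # placeholder: any large text corpus
        raw = f.read()

    compressed = bz2.compress(raw, compresslevel=9)
    bits_per_char = 8 * len(compressed) / len(raw)
    print(f"compressed to {bits_per_char:.2f} bits/char")

    # Extrapolation under assumed figures: if an adult is exposed to
    # roughly 10^9 characters of language, an ideal model near
    # 1 bit/char implies a knowledge base on the order of 10^9 bits.
    exposure_chars = 1e9          # assumption, not from the data above
    assumed_bits_per_char = 1.0   # assumption (Shannon-style estimate)
    print(f"estimated model size: {exposure_chars * assumed_bits_per_char:.0e} bits")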

But then I started looking at CPU and memory requirements, which turn out to
be much larger.  Why does the human brain need 10^15 synapses?  When you plot
text compression ratio as a function of speed and memory, the surface is
still rising steeply, especially along the memory axis.
http://cs.fit.edu/~mmahoney/compression/text.html
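
A toy illustration of the memory effect: bz2's compression level sets its
block size (100 KB per level), so higher levels trade memory for a better
ratio.  This is nothing like the multi-gigabyte models on the benchmark
page, but it shows the direction of the surface:

    # Same placeholder corpus as above; prints compression ratio at
    # three block sizes to show ratio improving as memory grows.
    import bz2

    with open("corpus.txt", "rb") as f:
        raw = f.read()

    for level in (1, 5, 9):
        ratio = len(bz2.compress(raw, compresslevel=level)) / len(raw)
        print(f"block size {level * 100} KB -> ratio {ratio:.4f}")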

Unfortunately the data is still far from clear.  The best programs still model
semantics crudely and grammar not at all.  From the data I would guess that an
ungrounded language model could run on a 1000-CPU cluster, plus or minus a
couple of orders of magnitude.  The fact that Google hasn't solved the problem
with a 10^6-CPU cluster does not make me hopeful.
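
The guess comes from back-of-envelope arithmetic like the following, where
every constant is an assumption picked only to show the order of magnitude:

    # Back-of-envelope estimate of cluster size for an ungrounded
    # language model.  Every constant is an assumption, not a
    # measured value.
    synapses        = 1e15   # rough synapse count in the human brain
    firing_rate_hz  = 10     # assumed average synaptic update rate
    language_share  = 0.01   # assumed fraction of the brain used for language
    cpu_ops_per_sec = 1e11   # assumed useful ops/sec per cluster node

    brain_ops = synapses * firing_rate_hz * language_share
    print(f"~{brain_ops / cpu_ops_per_sec:.0e} CPUs")   # ~1e3

Changing any one of those assumptions by a factor of 100 moves the answer
by the same factor, which is why the error bars span a couple of orders of
magnitude.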


-- Matt Mahoney, [EMAIL PROTECTED]
