Parsing the Web directly runs into nasty problems very quickly, mainly the
lack of trusted sources and the amount of badly formed text to parse.
  You have to restrict it to a much smaller set of sources, such as novels,
news sources, or something like Wikipedia...  These media all provide a good
deal of structured knowledge that yields good statistical results.
  Even preposition disambiguation can be helped a great deal by this knowledge,
and if we add interaction with users, we can catch the tough cases as well.
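
Something like the following rough sketch is what I have in mind: restrict
ingestion to a whitelist of sources, keep only confident readings, and push
the rest to a human review queue.  The trusted-domain list, the
parse_to_logic() stub, and the 0.9 threshold are all hypothetical
placeholders for illustration, not any existing system's API.

from urllib.parse import urlparse

TRUSTED_DOMAINS = {"en.wikipedia.org", "www.gutenberg.org", "www.reuters.com"}

def parse_to_logic(sentence):
    """Stand-in for a real semantic parser.
    Returns (logical_form, confidence); here it just echoes the sentence."""
    return ("raw(%r)" % sentence, 0.5)

def ingest(url, sentences, review_queue):
    """Keep confident readings from trusted sources; escalate the rest."""
    if urlparse(url).netloc not in TRUSTED_DOMAINS:
        return []                      # untrusted source: skip it entirely
    accepted = []
    for s in sentences:
        logic, confidence = parse_to_logic(s)
        if confidence >= 0.9:
            accepted.append(logic)     # confident reading: keep it
        else:
            review_queue.append(s)     # tough case: hand it to a person
    return accepted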

James Ratcliff

Benjamin Goertzel <[EMAIL PROTECTED]> wrote:

Mark Waser wrote:
>     To me, it seems much easier to have an automated system encode the
> facts/rules.

Indeed ... but obviously, no one has solved the NLP problem... there
are no semantic analysis systems out there that can map complex
sentences into logical relationships in an adequately reliable way...

For instance, the problem of preposition disambiguation remains badly
unsolved...
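
(A concrete illustration: the classic PP-attachment ambiguity, written out
as the two candidate logical forms a semantic mapper would have to choose
between.  The relation names are purely illustrative, not any particular
system's schema.)

sentence = "I saw the man with the telescope"

reading_instrument = [               # "with" attaches to the verb:
    ("see", "I", "man"),             # the telescope was used for seeing
    ("instrument", "see", "telescope"),
]

reading_possession = [               # "with" attaches to the noun:
    ("see", "I", "man"),             # the man is the one holding it
    ("has", "man", "telescope"),
]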

Current reference resolution algorithms are pretty unreliable too...

There are plenty of parsing/semantic-mapping systems that work well
for most simple sentences, but that doesn't really help in the
interpretation of the real text that's out there on the Web, in books,
in dictionaries and encyclopedias, etc.

Mark, how do you plan to solve [or work around] this problem?

A possibly viable approach is to simply parse the Web, and throw out
sentences that one's system can't interpret confidently enough.  The
hope is that there is enough information online in suitably simple
form to fill up an artificial mind.  This is essentially PowerSet's
plan, as I understand it: to parse the Web and map it into logic, but
using a parser that can only handle relatively simple syntactic
constructs reliably.  The redundancy of the Web is being relied upon.
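
As a very rough sketch of that kind of pipeline (the simple_parse() stub,
the confidence threshold, and the support count below are placeholders I'm
making up for illustration, not PowerSet's actual method):

from collections import Counter

CONFIDENCE_THRESHOLD = 0.9   # discard any sentence parsed below this
MIN_SUPPORT = 3              # ...and any relation seen fewer times than this

def simple_parse(sentence):
    """Stand-in for a parser limited to simple syntactic constructs.
    Returns (relation_tuple, confidence), or None if it cannot cope."""
    return None  # replace with a real parser

def build_kb(sentences):
    """Keep only relations asserted confidently by several sentences,
    relying on the redundancy of the Web."""
    support = Counter()
    for s in sentences:
        result = simple_parse(s)
        if result is None:
            continue                 # too complex: throw the sentence out
        relation, confidence = result
        if confidence >= CONFIDENCE_THRESHOLD:
            support[relation] += 1
    return {r for r, n in support.items() if n >= MIN_SUPPORT}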

This may well work to get a whole bunch of knowledge into the system,
but there is a lot of stuff it will never get, too ... because not
everything is out there in simple-sentence or simple-clause form....

Note: I don't buy that acquiring a mass of knowledge in logic form is
the key to AI.  But I do think it's useful, and if PowerSet or Mark
Waser or anyone else builds up an awesome repository of knowledge via
savvy NLP, I'd be very happy to partner with them and ingest that
knowledge into Novamente.  I have thought about ingesting Cyc into
Novamente too, but have not, because:

-- My impression is that Cyc's commercial license terms mean that if
we sold a Novamente system that used knowledge derived from Cyc
internally, we would be considered to be reselling Cyc, which is
expensive.  (If this is wrong, I'd like to know about it...)

-- Cyc seems to me to be over-complexly structured (or, to be more
accurate, to be structured complexly in the wrong sort of way), so
that properly making use of Cyc's KB within Novamente would require
substantial effort in hand-coding "intermediary knowledge" mapping
between Cyc's constructs and the natural logical representation of
knowledge coming out of NM's sensors and actuators.  This is not a huge
objection but it's a reason why we haven't tried to deal with Cyc yet.
 I estimate it would take around a man-year of effort to effectively
glue Cyc's KB into the Novamente system...

-- Ben G




_______________________________________
James Ratcliff - http://falazar.com
Looking for something...
       
