Straight parsing of the web runs into nasty problems very quickly, mainly the lack of trusted sources and ungrammatical text. You have to restrict it to a much smaller set of sources, such as novels, news sites, or something like Wikipedia. These media all provide a good deal of structured knowledge that yields good statistical results. Even the preposition problem can be helped a great deal by this knowledge, and if we add in interaction with users, we can catch the tough cases as well.
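The "restrict to trusted sources" idea above could be sketched as a simple whitelist filter in front of the extraction pipeline. This is only a toy illustration; the domain list and URLs are made-up assumptions, not a real crawler configuration.

```python
# Toy sketch: accept pages for extraction only if they come from a small
# whitelist of structured, trusted sources (Wikipedia, news, novels, etc.).
from urllib.parse import urlparse

# Illustrative whitelist -- the actual set would be chosen per project.
TRUSTED = {"en.wikipedia.org", "news.example.com", "gutenberg.org"}

def is_trusted(url):
    """Accept a page only if its host is on the whitelist."""
    return urlparse(url).netloc in TRUSTED

pages = ["https://en.wikipedia.org/wiki/Preposition",
         "https://random-blog.example/rant"]
accepted = [p for p in pages if is_trusted(p)]
# accepted keeps only the Wikipedia URL
```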
James Ratcliff

Benjamin Goertzel <[EMAIL PROTECTED]> wrote:

Mark Waser wrote:
> To me, it seems much easier to have an automated system encode the facts/rules.

Indeed ... but obviously, no one has solved the NLP problem. There are no semantic-analysis systems out there that can map complex sentences into logical relationships in an adequately reliable way. For instance, the problem of preposition disambiguation is very badly unsolved, and current reference-resolution algorithms are pretty unreliable too. There are plenty of parsing/semantic-mapping systems that work well for most simple sentences, but that doesn't really help with interpreting the real text that's out there on the Web, in books, in dictionaries and encyclopedias, etc. Mark, how do you plan to solve [or work around] this problem?

A possibly viable approach is simply to parse the Web and throw out sentences that one's system can't interpret confidently enough. The hope is that there is enough information online, in suitably simple form, to fill up an artificial mind. This is essentially PowerSet's plan, as I understand it: to parse the Web and map it into logic, using a parser that can only handle relatively simple syntactic constructs reliably, and relying on the redundancy of the Web. This may well work to get a whole bunch of knowledge into the system, but there is also a lot of stuff it will never get, because not everything is out there in simple-sentence or simple-clause form.

Note: I don't buy that acquiring a mass of knowledge in logical form is the key to AI. But I do think it's useful, and if PowerSet or Mark Waser or anyone else builds up an awesome repository of knowledge via savvy NLP, I'd be very happy to partner with them and ingest that knowledge into Novamente.
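The "parse the Web and throw out low-confidence sentences" approach described above, including the reliance on redundancy, could be sketched roughly as follows. The parser here is a hypothetical stand-in (a crude length heuristic), not any real semantic-mapping system; only the overall shape of the pipeline is the point.

```python
# Sketch of the parse-and-filter idea: parse each sentence, keep only
# interpretations above a confidence threshold, and let the Web's
# redundancy assign weight by counting how often a relation recurs.
from collections import Counter

def parse_to_logic(sentence):
    """Hypothetical parser: returns (logical_form, confidence).

    Stand-in heuristic: only very short sentences parse "confidently";
    a real system would do syntactic and semantic analysis here.
    """
    words = tuple(sentence.lower().split())
    confidence = 0.9 if len(words) == 3 else 0.3
    return words, confidence

def harvest(sentences, threshold=0.8):
    """Keep relations whose parse confidence clears the threshold."""
    kept = Counter()
    for s in sentences:
        form, conf = parse_to_logic(s)
        if conf >= threshold:
            kept[form] += 1   # redundancy: repeated facts gain weight
    return kept

facts = harvest([
    "cats chase mice",
    "cats chase mice",
    "the cat that my neighbor owns chases mice",   # too complex; discarded
])
# only the simple sentence survives, with a redundancy count of 2
```

The complex-clause sentence is exactly the "stuff it will never get" case: the fact is there, but not in a form the simple parser accepts.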
I have thought about ingesting Cyc into Novamente too, but have not, for two reasons:

-- My impression is that Cyc's commercial license terms mean that if we sold a Novamente system that internally used knowledge derived from Cyc, we would be considered to be reselling Cyc, which is expensive. (If this is wrong, I'd like to know about it...)

-- Cyc seems to me to be over-complexly structured (or, more accurately, structured complexly in the wrong sort of way), so that properly making use of Cyc's KB within Novamente would require substantial effort hand-coding "intermediary knowledge" mapping between Cyc's constructs and the natural logical representation of knowledge coming out of NM's sensors and actuators. This is not a huge objection, but it's a reason why we haven't tried to deal with Cyc yet. I estimate it would take around a man-year of effort to glue Cyc's KB into the Novamente system effectively.

-- Ben G

-----
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?&

_______________________________________
James Ratcliff - http://falazar.com
Looking for something...