I highly doubt it's any faster or easier to manage than recursive ascent-descent parsing coded in register-machine assembly.
On Mon, Mar 18, 2013 at 4:29 PM, Steve Richfield <[email protected]> wrote:

> Get ready for an intelligent Internet, because here it comes...
>
> *The Problem*
>
> Why haven’t computers been able to understand plain English? There have been many attempts over the last 4 decades, and on careful examination they all seem to end the same way. Someone sees the complexity of English as being well within the reach of a good programmer, writes some working code, starts entering rules, and then simultaneously hits two barriers:
>
> 1. While a few simple rules work on test cases, real-world English is REALLY complicated, with enough exceptions to every rule that it would take thousands of rules to pick apart everyday English, and many of those rules are not at all simple. It would take years of work to write a “critical mass” of such rules, and, lacking certain psychiatric conditions, these rules are not at all easy or fun to create. Of course, a developer with a few million dollars could hire a team of linguists to start writing these rules, but...
> 2. As developers enter even enough rules for a good demonstration, their program starts to run SO slowly that they must back out some of their rules just to have the program respond in a timely manner. A little research into the combinatorial nature of the problem shows that they are at least a couple of orders of magnitude short on speed. Hence, they are unmotivated to put together a company to create the needed rules when there is no computer capable of processing them.
>
> So, one by one, NLP developers have published some interesting examples of things their programs were able to do, without mentioning how long it took, or that it was hopeless to extend their methods to everyday English.
> No one wants to publish that their methods are unscalable, so these past efforts have simply faded away, without marking the trap waiting for the next NLP project.
>
> I have talked with several people who were writing yet another program to “understand” English, to try to save them from wasting years of their lives as others have before them, but they invariably just couldn’t believe that a modern gigahertz processor could ever be bogged down by seemingly simple string processing.
>
> This is the path that DrEliza.com was on. Looking to rewrite it, better programming offered a possible order-of-magnitude improvement in speed, which was still not enough to achieve the desired performance. However, instead of walking away from it as past NLP developers had done, I decided to determine just how fast it was conceivably possible to process NL, to see if this speed trap was theoretically unavoidable. In the process, I found a new technique that would probably be ~3-4 orders of magnitude faster than traditional methods, depending on just what it is compared with.
>
> Now I have a way to avoid this speed trap, so people can start writing highly scalable NLP code.
>
> *The Solution*
>
> The details that any competent AI guru could apply to implement REALLY fast parsing, which operates orders of magnitude faster than prior-art methods, and to use it to make the Internet intelligent, are now embodied in U.S. Patent Application 13/836,678, attached to this posting. We are now working out the business details to encourage people to use this technology. Probably involved will be a users’ group, in which participation will earn enough credit toward future royalties that only medium- and large-sized corporations would end up paying anything. Also, we are open to joint ventures, e.g. trading a license to use this technology in return for founders’ stock.
> Earlier thoughts of simply granting exemptions from royalties, rather than granting credit, had some subtle legal problems and have been abandoned. If you think you see a better business approach, one good enough to get YOU involved, then please let me know.
>
> At considerable risk of summarizing 100 pages of legalese into a brief explanation...
>
> *Fast Parsing*
>
> Here it is in a nutshell:
>
> - The input is parsed into tokens.
> - The tokens are hashed into double-precision floating-point (DPFP) numbers.
> - A portion of each DPFP number is then used to access the English lexicon in typical symbol-table fashion, e.g. via a circular table, with the usual collision handling, etc.
> - During initialization, the first few thousand most commonly used words, in order of frequency of use, are preprocessed to seed the lexicon.
> - Lexicon entries contain the string that represents the word (needed only for output), the DPFP hash for the word (used to confirm that the correct entry has been found), and pointers to the rules for which the entry is the least frequently used word. Words are represented as an ordinal indicating their relative frequency of use, e.g. “the” is represented by 1.
> - Rules are then compiled, during which time the least frequently used word in each rule is identified and marked in the lexicon to trigger the queuing of that rule.
> - Higher-level rules are queued as lower-level rules are satisfied.
> - As input words are processed, the rules they trigger are put into appropriate queues. The next rule processed always comes from the highest-priority non-empty queue; higher-level rules go into lower-priority queues.
> - When the last queue is empty, output can be retrieved from variables set by the rules.
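The hashing and lexicon steps in that nutshell can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the choice of MD5 as the token hash, the table size, and linear probing for collisions are all my assumptions, and the names (`dpfp_hash`, `Lexicon`) are hypothetical.

```python
import hashlib
import struct

TABLE_SIZE = 1 << 16  # circular-table size: an assumption, not from the patent

def dpfp_hash(token: str) -> float:
    """Hash a token into a double-precision floating-point (DPFP) number."""
    digest = hashlib.md5(token.lower().encode("utf-8")).digest()
    # Reinterpret 8 digest bytes as the bit pattern of a positive, finite double.
    bits = int.from_bytes(digest[:8], "big") & 0x7FEFFFFFFFFFFFFF
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

class Lexicon:
    """Circular symbol table keyed by a portion of each word's DPFP hash."""

    def __init__(self, seed_words):
        self.table = [None] * TABLE_SIZE
        # Seed with the most common words in frequency order, so the ordinal
        # doubles as the frequency rank: "the" (most frequent) gets ordinal 1.
        for ordinal, word in enumerate(seed_words, start=1):
            self._insert(word, ordinal)

    def _slot(self, h: float) -> int:
        # Use a portion of the hash's bit pattern as the starting slot.
        return struct.unpack(">Q", struct.pack(">d", h))[0] % TABLE_SIZE

    def _insert(self, word, ordinal):
        h = dpfp_hash(word)
        i = self._slot(h)
        while self.table[i] is not None:   # the "usual collision handling":
            i = (i + 1) % TABLE_SIZE       # linear probing around the circle
        self.table[i] = {"word": word,     # string, needed only for output
                         "hash": h,        # confirms the correct entry was found
                         "ordinal": ordinal,
                         "rules": []}      # rules whose least frequent word this is

    def lookup(self, word):
        h = dpfp_hash(word)
        i = self._slot(h)
        while self.table[i] is not None:
            if self.table[i]["hash"] == h:
                return self.table[i]
            i = (i + 1) % TABLE_SIZE
        return None                        # word not in the lexicon
```

Because the seed list is given in frequency order, the lookup for a word returns its frequency ordinal directly, which is what the rule compiler would consult to find each rule's least frequently used word.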
> This method removes the usual scope constraints that most NL processing methods have, so, for example, it becomes possible to disambiguate abbreviations and idioms based on words that occur elsewhere in the same sentence, paragraph, posting, or a prior posting by the same user, without incurring significant additional overhead.
>
> In other methods of parsing NL, >99% of all tests fail to find what they are seeking. In this method a large fraction, approaching half, will find what they are looking for, because a test isn’t performed unless its least likely element is present. Note that any rule accessing the result of a lower-level rule that hasn’t been evaluated simply assumes that the result is false, which it would be if it were evaluated, because that rule’s least likely necessary element MUST be absent for it not to have been evaluated already.
>
> Note that this approach is a method of running really fast. It is NOT a particular parsing methodology. You can invent rules of any kind to parse text in any way you imagine; this is just a way of putting it all together so that it runs really fast.
>
> Note that this same sort of selective processing based on the appearance of least likely elements has all sorts of other applications not mentioned in the patent, e.g. in AGI internals, so it is important to grok how this approach so greatly speeds things up.
>
> *Intelligent Internet*
>
> Fast parsing makes it possible to keep up with everything on the Internet in real time. My plan is to have an AI synthetic user watch everything, making it the most active user on the Internet, and comment on things that it can usefully comment on. There are lots of “little” details that will have to be worked out to make this a reality, including:
> 1. A mechanism for sending emails through human representatives, for review and on to the ultimate recipients, in such a way that the FROM is altered to be the human representative, the TO is altered to be the ultimate recipient, and all markings indicating this processing are removed.
> 2. A mechanism for web crawlers to work through human representatives’ computers to hide their activity in some sensitive domains.
>
> In addition to implementing a synthetic user, this supports on-the-fly custom tailoring of ads to address recipients’ postings, as well as traditional expert sites like the present DrEliza.com.
>
> *Where this Really Shines*
>
> Clearly the best fit between technology and leverage is in political, religious, and other contentious issues, where everyone has an opinion but few opinions have been well thought out. Here, an AI can easily see the common expressions indicating simple flaws in people’s rantings, and tailor responses that strike at the very heart of those flaws, for a price of course. This should be able to grab much of the money now going to political advertising, because it can touch people’s individual points of view right as they are expressing them.
>
> *The Future*
>
> Other methods of parsing NL will soon be abandoned once this method is available. Unlike other computer-related tools and technologies that become obsolete when the next version is released, this one is likely to be around for a while. After all, it took 40 years of NLP stumbling along for someone to think of this, so how long is it going to take to come up with something significantly better?
>
> *Special Thanks*
>
> Special thanks to the technical reviewers who helped make this possible. You are hereby released from your NDAs and are free to discuss all you now know.
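Returning to the *Fast Parsing* section above: the rule-queuing behavior it describes, where a rule is only tested when its least frequently used element appears, and an unevaluated rule simply reads as false, can be sketched like this. The toy rules, their names, and the two-level structure are my inventions for illustration; the post does not give the patent's actual rule format.

```python
from collections import deque

# Toy rules (hypothetical, for illustration only). Each rule is keyed under
# its least frequently used required element ("rare"), so it is queued, and
# hence tested, only when that rarest element actually shows up.
RULES = {
    "r_migraine": {"level": 1, "needs": {"headache", "aura"}, "rare": "aura"},
    "r_diagnose": {"level": 2, "needs": {"r_migraine"}, "rare": "r_migraine"},
}

def parse(tokens):
    seen = set(tokens)
    trigger = {}                                   # rare element -> rule names
    for name, rule in RULES.items():
        trigger.setdefault(rule["rare"], []).append(name)

    levels = max(rule["level"] for rule in RULES.values())
    queues = [deque() for _ in range(levels)]      # queues[0] = highest priority
    results = {}                                   # unevaluated rules read as False

    for tok in tokens:                             # queue rules whose rarest
        for name in trigger.get(tok, []):          # element appeared in the input
            queues[RULES[name]["level"] - 1].append(name)

    while any(queues):
        q = next(q for q in queues if q)           # highest-priority non-empty queue
        name = q.popleft()
        fired = RULES[name]["needs"] <= seen       # all needed elements present?
        results[name] = fired
        if fired:                                  # a satisfied lower-level rule
            seen.add(name)                         # queues its higher-level rules
            for upper in trigger.get(name, []):
                queues[RULES[upper]["level"] - 1].append(upper)
    return results
```

With input containing both “headache” and “aura”, both rules fire; with “headache” alone, neither rule is ever queued, so `results.get("r_migraine", False)` correctly reads as false without any work having been done, which is the source of the claimed speedup.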
> In case you are interested in more details, or just want to see what such a patent looks like after the lawyers have finished with it, I have attached the patent abstract, specification, and drawings to this message. It is a bit big and may not remain on the server, so I recommend that you copy it off and save it on your own computer.
>
> Steve
>
> *AGI* | Archives <https://www.listbox.com/member/archive/303/=now> <https://www.listbox.com/member/archive/rss/303/5037279-a88c7a6d> | Modify <https://www.listbox.com/member/?&> Your Subscription <http://www.listbox.com>
