I highly doubt it's any faster or easier to manage than recursive
ascent-descent parsing coded in register-machine assembly.


On Mon, Mar 18, 2013 at 4:29 PM, Steve Richfield
<[email protected]> wrote:

> Get ready for an intelligent Internet, because here it comes...
> *The Problem*
>
> Why haven’t computers been able to understand plain English? There have
> been many attempts over the last four decades, and on careful examination
> they all seem to end the same way. Someone sees the complexity of English
> as being well within the reach of a good programmer, writes some working
> code, starts entering rules, and then simultaneously hits two barriers:
>
>
>
>    1. While a few simple rules work on test cases, real-world English is
>    REALLY complicated, with enough exceptions to every rule that it would
>    take thousands of rules to pick apart everyday English, and many of
>    those rules are not at all simple. It would take years of work to write
>    a “critical mass” of such rules, and, lacking certain psychiatric
>    conditions, developers find such rules neither easy nor fun to create.
>    Of course, a developer with a few million dollars could hire a team of
>    linguists to start writing these rules, but...
>    2. As developers enter even enough rules for a good demonstration,
>    their program starts to run SO slowly that they must back out some of
>    their rules just to have the program respond in a timely manner. A
>    little research into the combinatorial nature of the problem shows that
>    they are at least a couple of orders of magnitude short on speed. Hence,
>    they are unmotivated to put together a company to create the needed
>    rules when there is no computer capable of processing them.
>
> So, one by one, NLP developers have published some interesting examples of
> things their programs were able to do, without mentioning how long it took,
> or that it was hopeless to extend their methods to everyday English. No one
> wants to publish that their methods are unscalable, so these past efforts
> have simply faded away, without marking the trap waiting for the next NLP
> project.
>
>
> I have talked with several people who were writing yet another program to
> “understand” English, to try to save them from wasting years of their lives
> as others have before them, but they invariably just couldn’t believe that
> a modern gigahertz processor could ever be bogged down by seemingly simple
> string processing.
>
>
> This is the path that DrEliza.com was on. When I looked to rewrite it,
> better programming offered a possible order-of-magnitude improvement in
> speed, which was still not enough to achieve the desired performance.
> However, instead of walking away from it as past NLP developers had done, I
> decided to determine just how fast it was conceivably possible to process
> NL, to see if this speed trap was theoretically unavoidable. In the
> process, I found a new technique that would probably be ~3-4 orders of
> magnitude faster than traditional methods, depending on just what it is
> compared with.
>
>
> Now, I have a way to avoid this speed trap, so people can start writing
> highly scalable NLP code.
>
>
> *The Solution*
>
>
> The details that any competent AI guru could apply to implement REALLY
> fast parsing, operating orders of magnitude faster than prior-art methods,
> and to use it to make the Internet intelligent, are now embodied in U.S.
> Patent Application 13/836,678, which is attached to this posting. We are
> now working out the business details to encourage people to use this
> technology. Probably involved will be a users’ group, in which
> participation will earn enough credit toward future royalties that only
> medium- and large-sized corporations would end up paying anything. Also,
> we are open to joint ventures, e.g. trading a license to use this
> technology in return for founders’ stock. Earlier thoughts of simply
> granting exemptions from royalties, rather than granting credit, had some
> subtle legal problems and have been abandoned. If you think you see a
> better business approach, one good enough to get YOU involved, then please
> let me know.
>
>
> At considerable risk of summarizing 100 pages of legalese into a brief
> explanation...
>
>
> *Fast Parsing*
>
>
> Here it is in a nutshell:
>
>    - The input is parsed into tokens.
>    - The tokens are hashed into double precision floating point (DPFP)
>    numbers.
>    - A portion of each DPFP number is then used to access the English
>    lexicon in typical symbol-table fashion, e.g. via a circular table, with
>    the usual collision handling, etc.
>    - During initialization, the first few thousand most commonly used
>    words, in order of frequency of use, are preprocessed to seed the lexicon.
>    - Lexicon entries will contain the string that represents the word
>    (which is only needed for output), the DPFP hash for the word (used to
>    confirm that the correct entry has been found), and pointers to rules for
>    which the entry is the least frequently used word. Words will be
>    represented as an ordinal indicating the relative frequency of use, e.g.
>    “the” will be represented by 1.
>    - Rules are then compiled, during which time the least frequently used
>    words in the rules are identified and marked in the lexicon to trigger the
>    queuing of those rules.
>    - Higher-level rules will be queued as lower-level rules are satisfied.
>    - As input words are processed, the rules that are triggered will be
>    put into appropriate queues. The next rule processed will always come from
>    the highest priority non-empty queue. Higher-level rules will go into
>    lower-priority queues.
>    - When the last queue is empty, output can then be retrieved from
>    variables set by the rules.
>
> This method removes the usual scope constraints that most NL processing
> methods have, so for example, it will be possible to disambiguate
> abbreviations and idioms based on words that occur elsewhere in the same
> sentence, paragraph, posting, or prior posting by the same user, without
> incurring significant additional overhead.
>
>
> In other methods of parsing NL, >99% of all tests fail to find what they
> are seeking. In this method, a large fraction, approaching half, will find
> what they are looking for, because tests aren’t performed unless the least
> likely element is present. Note that any rule accessing the results of a
> lower-level rule that hasn’t been evaluated simply assumes that its result
> is false, which it would be if it were evaluated, because its least likely
> necessary element MUST be absent for it not to have previously been
> evaluated.
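> A toy back-of-envelope simulation (invented numbers, not from the patent)
> illustrates why gating each rule on its least likely word eliminates almost
> all of the failing tests:

```python
import random

random.seed(0)
# A Zipf-like token stream: low indices are common words, high indices rare.
stream = [min(int(random.paretovariate(1.2)), 999) for _ in range(10_000)]
# Each "rule" needs three words; its least likely word is its largest index.
rules = [random.sample(range(1000), 3) for _ in range(500)]

naive_tests = len(stream) * len(rules)   # test every rule at every token
gated_tests = 0
for idx in stream:
    # Only rules whose least likely word just appeared are tested at all.
    gated_tests += sum(1 for r in rules if max(r) == idx)

print(naive_tests, gated_tests)   # gated count is a tiny fraction of naive
```

> Because common tokens almost never coincide with any rule's rarest word,
> the gated count comes out orders of magnitude below the naive count.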
>
>
> Note that this approach is a method of running really fast. It is NOT a
> particular parsing methodology. You can invent rules of any kind to parse
> text in any way you imagine. This is just a way of putting it all together
> to run really fast.
>
>
> Note that this same sort of selective processing based on the appearance
> of least likely elements has all sorts of other applications not mentioned
> in the patent, e.g. in AGI internals, so it is important to grok how this
> approach so greatly speeds things up.
>
>
> *Intelligent Internet*
>
>
> Fast parsing makes it possible to keep up with everything on the Internet
> in real time. My plan is to have an AI synthetic user watch everything,
> making it the most active user on the Internet, and comment on things that
> it can usefully comment on. There are lots of “little” details that will
> have to be worked out to make this a reality, including:
>
>
>
>    1. A mechanism for sending emails through human representatives for
>    review and on to the ultimate recipients, in a way that the FROM is
>    altered to be the human representative, the TO is altered to be the
>    ultimate recipient, and all markings indicating this processing are
>    removed.
>    2. A mechanism for web crawlers to work through human representatives’
>    computers to hide their activity in some sensitive domains.
>
> In addition to implementing a synthetic user, this supports on-the-fly
> custom tailoring of ads to address recipients' postings, as well as
> traditional expert sites like the present DrEliza.com.
>
>
> *Where this Really Shines*
>
>
> Clearly the best fit between technology and leverage is in political,
> religious, and other contentious issues where everyone has an opinion, but
> few opinions have been well thought out. Here, an AI can easily see the
> common expressions indicating simple flaws in people’s rantings, and tailor
> responses that strike at the very heart of those flaws – for a price, of
> course. This should be able to grab much of the money now going to
> political advertising, because it can touch people’s individual points of
> view right as they are expressing them.
>
>
> *The Future*
>
>
> Other methods of parsing NL will soon be abandoned once this method is
> available. Unlike other computer-related tools and technologies that
> become obsolete when the next version is released, this is likely to be
> around for a while. After all, it took 40 years of NLP stumbling along for
> someone to think of this, so how long is it going to take to come up with
> something significantly better?
>
>
> *Special Thanks*
>
>
> Special thanks to the technical reviewers who helped make this possible.
> You are hereby released from your NDAs and are free to discuss all you now
> know.
>
>
> In case you are interested in more details, or just want to see what such
> a patent looks like after the lawyers have finished with it, I have
> attached the patent abstract, specification, and drawings to this message.
> It is a bit big and may not remain on the server, so I recommend that you
> copy it off and save it on your own computer.
>
>
> Steve
>
>



-------------------------------------------
AGI
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/21088071-f452e424
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=21088071&id_secret=21088071-58d57657
Powered by Listbox: http://www.listbox.com
