CC'ing the opencog mailing list.

On Sun, Sep 29, 2019 at 3:08 PM Amirouche Boubekki
<amirouche.boube...@gmail.com> wrote:
> My goal has not changed in five years! I want to create a mini-opencog
> framework, in the spirit of Scheme, that builds abstractions on top of
> powerful primitives, such as SAT solvers.

One very interesting abstraction on top of SAT is "answer set
programming" (ASP). Now, ASP looks a lot like Prolog, except that ASP
solvers are orders of magnitude faster than traditional Prolog, because
they replace the stack-based backward/forward chainers with the SAT
algorithm. I would love to see an AtomSpace interface into an ASP
solver. In this view, ASP would look like a crisp-logic variant of PLN.
We could even approximate PLN by taking a statistical average of
hundreds of ASP solutions. It might even be faster than the current
rule-engine backward/forward chainers, for the same reason that ASP/SAT
is faster than traditional Prolog backward/forward chaining.

> Like we discussed previously, I think the mix of programming languages
> (C, C++, Python, Scheme, Java, Haskell) is not helping people embrace
> the power of opencog. So I will try to rely only on proven C
> libraries. Link Grammar is such a beast. But:
>
> a) As far as I understand, there are still some moving pieces in how
> LG is implemented, and there are still improvements that could be made
> to the core mechanics that parse natural languages. That is, not
> everything about LG engineering is performance optimization. A
> high-level language is a good candidate for experimenting with new
> features in LG.

Yes, but ... Although LG was originally developed to "parse natural
language", what it actually does is parse any linear (time-ordered)
sequence of tokens, to extract the structural relationships between
those tokens. With Amir's work on the tokenizer, it can even discover
different, competing (contradictory) token boundaries, splitting an
input stream in different ways.
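Returning to the ASP suggestion above, for anyone who hasn't seen the
style: below is a toy answer-set program, written in clingo-style
syntax. It's only a sketch of the flavor (graph 3-coloring is the
stock example; the AtomSpace interface imagined above is hypothetical
and not shown):

```
% Toy ASP program: 3-color a small graph.
% Facts: the graph.
node(1..4).
edge(1,2). edge(2,3). edge(3,4). edge(1,3).
color(red; green; blue).

% Choice rule: every node gets exactly one color.
{ assign(N,C) : color(C) } = 1 :- node(N).

% Integrity constraint: adjacent nodes never share a color.
:- edge(N,M), assign(N,C), assign(M,C).

#show assign/2.
```

The solver grounds this and hands it to a SAT-like search; each answer
set is one consistent coloring. Taking statistics over many answer sets
is exactly the kind of averaging suggested above for approximating PLN.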
There are possible enhancements to LG, such as "multi-color parsing" or
"link crossing", but discussing these properly is beyond the scope of
this email. At the level of this email, LG really is more-or-less
"feature complete", with nothing left to do except performance tuning.
Yet, also, as you suggest, we could build better tools on top of it...

> b) I would like to better understand the link-grammar theory. A bit
> of thinking, goofing around and reading led me to the personal
> discovery that LG is a very peculiar kind of software, because it
> relies upon a programming language (the language in which the
> dictionaries are expressed) that is very broad and powerful (as
> others have noted). It goes along with the idea of a Domain Specific
> Language,

The theory is that of jigsaw-puzzle pieces, as described in the very
first paper. If you knew a lot more category theory, you would
recognize these as monoidal categories. For example, LG resembles
"pregroup grammars" (see Wikipedia), and this is no accident: the same
theory describes both, more or less.

The current text-file dictionaries in LG are a kind of DSL, but that's
misleading, because any system that can describe a monoidal category in
a type-theoretic way can be handled by LG. So there's a whole class of
DSLs that you could layer on top of LG.

> where one builds a programming language to solve a particular task. I
> think LG is the best example of a DSL I know. As such, it calls for
> more study, experimentation and understanding. Even if LG or a
> particular dictionary, e.g. the English dictionary, is flawed
> (somehow?), it is a significant software feat that I am sure will be
> taken as inspiration in future human endeavours.

The "flaw" is that the English language cannot be described by a small
ruleset, no matter what DSL you would care to use.
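To ground the DSL discussion: here is what a minimal dictionary
fragment looks like, in the spirit of the 4.0.dict format. This is a
toy modeled on the example in the original paper, not actual entries
from the shipped English dictionary:

```
% Connectors are the jigsaw-puzzle tabs: an X+ on one word must mate
% with an X- on some word to its right.
the a     : D+;               % determiners link rightward to a noun
cat snake : D- & (O- or S+);  % common nouns: determiner, then subject or object
Mary John : O- or S+;         % proper nouns need no determiner
ran       : S-;               % intransitive verb: subject on the left
chased    : S- & O+;          % transitive verb: subject left, object right
```

This handles "the cat ran" and "Mary chased the snake". Even a toy
this small hints at why hand-maintained rulesets blow up: every new
construction (questions, relative clauses, conjunctions) multiplies
the disjuncts on the existing words.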
This is the general lesson of linguistics: there are dozens of
linguistic theories, and hundreds of variants. They are all appealing
for various reasons, and all adept at demonstrating various linguistic
phenomena, but as soon as you try to craft "the rules of English" by
hand, the number of rules grows exponentially. Pick some theory of
language, I don't care which, and a few dozen rules will capture
simplistic English. A few hundred rules will give reasonable accuracy
at some elementary-school reading level; you need thousands of rules to
begin to get acceptable quality on newspaper English; you need tens of
thousands of rules to start getting broad coverage (science literature,
tweets, 18th-century English, etc.); and tens of thousands of rules can
no longer be managed by hand.

> c) One area where the Link Grammar software will probably need to
> improve is the ability to create, fix, extend and improve the
> dictionaries. That is, it needs a user interface and user experience
> that looks better than notepad

My core claim is that even if you invented some really cool-looking DSL
for describing a monoidal category in a type-theoretic way, and also
developed a nice GUI for it, you would still be faced with the need to
maintain thousands or tens of thousands of rules. At least, for natural
language.

If you wanted to use LG to parse *some other* time series -- I dunno,
something from biochemistry or some network-hacking trace log or
whatever, something that could be described by only dozens or hundreds
of rules -- then yes, a GUI would be great to have. However, all the
linguists who have attempted to build any kind of GUI for any kind of
parser for any natural language have hit a wall. Maintaining tens of
thousands of rules is just too hard.

> or the current REPL cli tool called link-parser. Alas, I have no
> better idea than a REPL of some sort, but using voice...
> One area that could improve the UI/UX of the creation and maintenance
> of the dictionary is better integration with the AtomSpace, and in
> general with the end-user application.

This veers in a different direction. Roughly speaking, the AtomSpace is
"just like" any other database. Have you ever seen a nice UI/UX for the
maintenance of a database? Gee golly, well, why not? Because databases
are too abstract to mash into a GUI or UI/UX. Database tables can be
anything at all. There's no way to UI/UX that, for the same reason that
no general-purpose programming language has a GUI. (I mean, LEGO
Mindstorms is almost a GUI, and COBOL is almost a GUI, but once you
understand either of these, you promptly realize that a plain-old text
editor is just faster and easier.)

> That is, it would be neat to allow the user to fix the dictionary.
> Something that is made difficult by the current microservice-like
> setup of opencog. In order to have a quick feedback loop between LG,
> the knowledge base and the user. To make it more clear: the current
> unit tests obviously are a good thing. One should bring that feature,
> unit testing, into the client. We could have ground-truth knowledge,
> similar to the expected parse trees in the current unit tests, even
> taking into account inferred knowledge based on the parse trees, and
> eventually on a version of the dictionary which will be checked as
> soon as the client user makes a change to that dictionary.

This is impossible. The current LG test sets contain about 7K test
sentences, and it is impossible to fix one without breaking something
else. The best you can do is to fix more than what you break. The
current English dict has approximately 2K rules in it, and it's
impossible to make changes in one of them without carefully
understanding most of the others. This is not like ordinary software.
Human natural language is much, much messier than any software program.
The goal of the language-learning project is to automatically learn the
rules of any natural language, given a sampling of its corpus. The
point being that humans should not be writing these dictionaries.
Machines should be.

-- linas

> Basically, give the user access to some knobs that are frozen right
> now, and give the user the tools necessary to make sure: It Works
> (tm). The reason for that is two-sided: 1) for rare languages, it
> should be possible to build an LG dictionary from the user interface,
> probably with a bootstrap language like Lojban or English. 2) I think
> the LG dictionary language (or something similar) is a good candidate
> for inclusion in projects such as Wikidata, but before that happens
> it must be possible to check the correctness of changes, since it is
> possible to do so (unlike common sense, encyclopedic knowledge and
> so-called lexicographic knowledge, which are ground truth).
>
> d) Like I tried to explain above, I prefer easy-to-code. Fast
> programs, speedy processors et al. have proven numerous times in
> recent years to be false friends. So, like Rob Pike might have said:
> "make it work, then make it fast".
>
> My understanding is that GOFAI has nightmares about slowness. Like I
> tried to explain somewhere else, there are workarounds for slow
> processes, such as a) lazy algorithms or beam search, b)
> probabilistic models, c) a slow overall workflow.
>
> The last point is interesting: my idea is that the problems that AGI
> needs to solve are big and slow, and sometimes not advancing at all
> when humans try to tackle them. So, if the computer is "slow"
> compared to ordering a pizza, the user will be thankful even if it
> replies with a message saying: need more input.
>
> --
> Amirouche ~ amz3 ~ https://hyper.dev
>
> --
> You received this message because you are subscribed to the Google
> Groups "link-grammar" group.
> To unsubscribe from this group and stop receiving emails from it,
> send an email to link-grammar+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/link-grammar/CAL7_Mo81_MyagiBNspt-25x66s-UXD9oerUD2kVQmaiSC4%2B-Bg%40mail.gmail.com

--
cassette tapes - analog TV - film cameras - you