CC'ing the opencog mailing list.

On Sun, Sep 29, 2019 at 3:08 PM Amirouche Boubekki
<amirouche.boube...@gmail.com> wrote:
> My goal has not changed in five years! I want to create a mini-opencog
> framework, in the spirit of Scheme, that builds abstractions on top of
> powerful primitives, such as SAT solvers.

One very interesting abstraction on top of SAT is "answer set
programming" (ASP). Now, ASP looks a lot like Prolog, except that ASP
solvers are orders of magnitude faster than traditional Prolog, because
they replace the stack-based backward/forward chainers with the SAT
algorithm. I would love to see an AtomSpace interface into an ASP
solver. In this view, ASP would look like a crisp-logic variant of PLN.
We could even approximate PLN by taking a statistical average of
hundreds of ASP solutions. It might even be faster than the current
rule-engine backward/forward chainers, for the same reason that ASP/SAT
is faster than traditional Prolog backward/forward chaining.

> Like we discussed previously, I think the mix of programming languages
> (C, C++, Python, Scheme, Java, Haskell) is not helping people embrace
> the power of opencog. So I will try to rely only on proven C
> libraries. Link Grammar is such a beast. But:
>
> a) As far as I understand, there are still some moving pieces in how
> LG is implemented, and there are still improvements that could be made
> to the core mechanics that parse natural languages. That is, not
> everything about LG engineering is performance optimization. A
> high-level language is a good candidate for experimenting with new
> features in LG.

Yes, but ... Although LG was originally developed to "parse natural
language", what it actually does is parse any linear (time-ordered)
sequence of tokens, to extract the structural relationships between
those tokens. With Amir's work on the tokenizer, it can even discover
different, competing (contradictory) token boundaries, splitting an
input stream in different ways.
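Returning to the ASP suggestion above, for anyone who hasn't seen the
style: below is a toy answer-set program, written in clingo-style
syntax. It's only a sketch of the flavor (graph 3-coloring is the
stock example; the AtomSpace interface imagined above is hypothetical
and not shown):

```
% Toy ASP program: 3-color a small graph.
% Facts: the graph.
node(1..4).
edge(1,2). edge(2,3). edge(3,4). edge(1,3).
color(red; green; blue).

% Choice rule: every node gets exactly one color.
{ assign(N,C) : color(C) } = 1 :- node(N).

% Integrity constraint: adjacent nodes never share a color.
:- edge(N,M), assign(N,C), assign(M,C).

#show assign/2.
```

The solver grounds this and hands it to a SAT-like search; each answer
set is one consistent coloring. Taking statistics over many answer sets
is exactly the kind of averaging suggested above for approximating PLN.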
There are possible enhancements to LG, such as "multi-color parsing" or
"link crossing", but discussing these properly is beyond the scope of
this email. At the level of this email, LG really is more-or-less
"feature complete", with nothing left to do except performance tuning.
Yet, also, as you suggest, we could build better tools on top of it...

> b) I would like to better understand the link-grammar theory. A bit
> of thinking, goofing around and reading led me to the personal
> discovery that LG is a very peculiar kind of software, because it
> relies upon a programming language (the language in which the
> dictionaries are expressed) that is very broad and powerful (as
> others have noted). It goes along with the idea of a Domain Specific
> Language,

The theory is that of jigsaw-puzzle pieces, as described in the very
first paper. If you knew a lot more category theory, you would
recognize these as monoidal categories. For example, LG resembles
"pregroup grammars" (see Wikipedia), and this is no accident: the same
theory describes both, more or less.

The current text-file dictionaries in LG are a kind of DSL, but that's
misleading, because any system that can describe a monoidal category in
a type-theoretic way can be handled by LG. So there's a whole class of
DSLs that you could layer on top of LG.

> where one builds a programming language to solve a particular task. I
> think LG is the best example of a DSL I know. As such, it calls for
> more study, experimentation and understanding. Even if LG or a
> particular dictionary, e.g. the English dictionary, is flawed
> (somehow?), it is a significant software feat that I am sure will be
> taken as inspiration in future human endeavours.

The "flaw" is that the English language cannot be described by a small
ruleset, no matter what DSL you would care to use.
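To ground the DSL discussion: here is what a minimal dictionary
fragment looks like, in the spirit of the 4.0.dict format. This is a
toy modeled on the example in the original paper, not actual entries
from the shipped English dictionary:

```
% Connectors are the jigsaw-puzzle tabs: an X+ on one word must mate
% with an X- on some word to its right.
the a     : D+;               % determiners link rightward to a noun
cat snake : D- & (O- or S+);  % common nouns: determiner, then subject or object
Mary John : O- or S+;         % proper nouns need no determiner
ran       : S-;               % intransitive verb: subject on the left
chased    : S- & O+;          % transitive verb: subject left, object right
```

This handles "the cat ran" and "Mary chased the snake". Even a toy
this small hints at why hand-maintained rulesets blow up: every new
construction (questions, relative clauses, conjunctions) multiplies
the disjuncts on the existing words.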
This is the general lesson of linguistics: there are dozens of
linguistic theories, and hundreds of variants. They are all appealing
for various reasons, and all adept at demonstrating various linguistic
phenomena, but as soon as you try to craft "the rules of English" by
hand, the number of rules grows exponentially. Pick some theory of
language, I don't care which, and a few dozen rules will capture
simplistic English. A few hundred rules will give reasonable accuracy
at some elementary-school reading level; you need thousands of rules to
begin to get acceptable quality on newspaper English; you need tens of
thousands of rules to start getting broad coverage (science literature,
tweets, 18th-century English, etc.); and tens of thousands of rules can
no longer be managed by hand.

> c) One area where the Link Grammar software will probably need to
> improve is the ability to create, fix, extend and improve the
> dictionaries. That is, it needs a user interface and user experience
> that looks better than notepad

My core claim is that even if you invented some really cool-looking DSL
for describing a monoidal category in a type-theoretic way, and also
developed a nice GUI for it, you would still be faced with the need to
maintain thousands or tens of thousands of rules. At least, for natural
language.

If you wanted to use LG to parse *some other* time series -- I dunno,
something from biochemistry or some network-hacking trace log or
whatever, something that could be described by only dozens or hundreds
of rules -- then yes, a GUI would be great to have. However, all the
linguists who have attempted to build any kind of GUI for any kind of
parser for any natural language have hit a wall. Maintaining tens of
thousands of rules is just too hard.

> or the current REPL cli tool called link-parser. Alas, I have no
> better idea than a REPL of some sort, but using voice...
> One area that could improve the UI/UX of the creation and maintenance
> of the dictionary is better integration with the AtomSpace, and in
> general with the end-user application.

This veers in a different direction. Roughly speaking, the AtomSpace is
"just like" any other database. Have you ever seen a nice UI/UX for the
maintenance of a database? Gee golly, well, why not? Because databases
are too abstract to mash into a GUI or UI/UX. Database tables can be
anything at all. There's no way to UI/UX that, for the same reason that
no general-purpose programming language has a GUI. (I mean, LEGO
Mindstorms is almost a GUI, and COBOL is almost a GUI, but once you
understand either of these, you promptly realize that a plain-old text
editor is just faster and easier.)

> That is, it would be neat to allow the user to fix the dictionary.
> Something that is made difficult by the current microservice-like
> setup of opencog. In order to have a quick feedback loop between LG,
> the knowledge base and the user. To make it more clear: the current
> unit tests obviously are a good thing. One should bring that feature,
> unit testing, into the client. We could have ground-truth knowledge,
> similar to the expected parse trees in the current unit tests, even
> taking into account inferred knowledge based on the parse trees, and
> eventually on a version of the dictionary which will be checked as
> soon as the client user makes a change to that dictionary.

This is impossible. The current LG test sets contain about 7K test
sentences, and it is impossible to fix one without breaking something
else. The best you can do is to fix more than what you break. The
current English dict has approximately 2K rules in it, and it's
impossible to make changes in one of them without carefully
understanding most of the others. This is not like ordinary software.
Human natural language is much, much messier than any software program.
The goal of the language-learning project is to automatically learn the
rules of any natural language, given a sampling of its corpus. The
point being that humans should not be writing these dictionaries.
Machines should be.

-- linas

> Basically, give the user access to some knobs that are frozen right
> now, and give the user the tools necessary to make sure: It Works
> (tm). The reason for that is two-sided: 1) for rare languages, it
> should be possible to build an LG dictionary from the user interface,
> probably with a bootstrap language like Lojban or English. 2) I think
> the LG dictionary language (or something similar) is a good candidate
> for inclusion in projects such as Wikidata, but before that happens
> it must be possible to check the correctness of changes, since it is
> possible to do so (unlike common sense, encyclopedic knowledge and
> so-called lexicographic knowledge, which are ground truth).
>
> d) Like I tried to explain above, I prefer easy-to-code. Fast
> programs, speedy processors et al. have proven numerous times in
> recent years to be false friends. So, like Rob Pike might have said:
> "make it work, then make it fast".
>
> My understanding is that GOFAI has nightmares about slowness. Like I
> tried to explain somewhere else, there are workarounds for slow
> processes, such as a) lazy algorithms or beam search, b)
> probabilistic models, c) a slow overall workflow.
>
> The last point is interesting: my idea is that the problems that AGI
> needs to solve are big and slow, and sometimes not advancing at all
> when humans try to tackle them. So, if the computer is "slow"
> compared to ordering a pizza, the user will be thankful even if it
> replies with a message saying: need more input.
>
> --
> Amirouche ~ amz3 ~ https://hyper.dev
>
> --
> You received this message because you are subscribed to the Google
> Groups "link-grammar" group.
> To unsubscribe from this group and stop receiving emails from it,
> send an email to link-grammar+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/link-grammar/CAL7_Mo81_MyagiBNspt-25x66s-UXD9oerUD2kVQmaiSC4%2B-Bg%40mail.gmail.com

--
cassette tapes - analog TV - film cameras - you