On Tue, Jun 30, 2020 at 20:04, Linas Vepstas <linasveps...@gmail.com> wrote:
>
> Hi Amirouche,
>
> There are other, far more sophisticated ways of doing spell checking. The 
> best way is to do context-dependent checking... and link-grammar provides 
> extremely precise context. For example, "I gave him teh hammer" -- the LG 
> spelling guesser offers up "the", "then", "ten" as possible fixes. However, 
> only one of these choices leads to a grammatically-correct sentence. Thus, 
> the other spelling guesses can be discarded because they lead to grammatical 
> nonsense.

Exactly! That is my goal.
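
To make the idea concrete, here is a minimal sketch of filtering spelling
guesses by grammaticality. It is a toy, not the link-grammar API: the
lexicon, the tag names, the accepted patterns, and the functions
`is_grammatical` and `filter_guesses` are all hypothetical stand-ins for a
real parser. The list of guesses is the one from the example above.

```python
# Toy illustration only: this is NOT the link-grammar API.
# LEXICON, ACCEPTED and both functions are hypothetical stand-ins.

LEXICON = {
    "i": "PRON", "gave": "VERB", "him": "PRON",
    "the": "DET", "then": "ADV", "ten": "NUM",
    "hammer": "NOUN_SG", "hammers": "NOUN_PL",
}

# Tag sequences that the toy grammar accepts.
ACCEPTED = {
    ("PRON", "VERB", "PRON", "DET", "NOUN_SG"),
    ("PRON", "VERB", "PRON", "DET", "NOUN_PL"),
    ("PRON", "VERB", "PRON", "NUM", "NOUN_PL"),
}


def is_grammatical(words):
    """Return True if the tag sequence of `words` is an accepted pattern."""
    tags = tuple(LEXICON.get(w.lower()) for w in words)
    return tags in ACCEPTED


def filter_guesses(words, position, guesses):
    """Keep only the guesses that yield a grammatical sentence
    when substituted for the misspelled word at `position`."""
    kept = []
    for guess in guesses:
        candidate = words[:position] + [guess] + words[position + 1:]
        if is_grammatical(candidate):
            kept.append(guess)
    return kept


sentence = "I gave him teh hammer".split()
survivors = filter_guesses(sentence, 3, ["the", "then", "ten"])
print(survivors)  # ['the']
```

In a real system the pattern lookup would be replaced by an actual parse,
but the control flow stays the same: guess first, then let the grammar veto.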

> Because of this, spelling checkers and POS taggers are NOT used in the 
> language pipeline, and, based on practical experience, they actually make 
> things worse.  That is, they make suggestions and provide tags that are 
> misleading or wrong, and lower the quality of the results -- It turns out 
> that English grammar provides tight constraints on what is possible, and 
> those constraints are much tighter (and have higher 
> accuracy/recall/precision) than single-word taggers.
>
> Such statistical systems might currently do as well as or better than grammar,

They will always fail on new grammar rules, and re-training the
algorithm on those is painful, I guess. Maybe new grammar rules are
not a use case, since nowadays there is a grammar-police robot in
every browser...

> since, it turns out, writing a complete grammar is impossible; there is 
> always yet one more, extremely rare exception to every rule.  Roughly 
> speaking, grammar rules have a Zipfian distribution.

Probably yes, with the approach taken by link-grammar, where only a
couple of experts can edit the grammar. My goal is to also make it
easy for the user to improve the grammar, hence the single
source-of-truth database that will include the dictionary of words,
the grammar relations between words, and test cases for each
relation. I am still nurturing the idea of letting a user extend the
grammar through the conversational interface itself, starting from an
unambiguous subset of English grammar that would be the seed of the
system. I still need to create a proof of concept of this.

> "A sat solver and an okvs" -- we've played with SAT. Basically, the SAT 
> solvers are slower. They used to be sometimes-faster, but Amir's work fixed 
> that.  I don't know what OKVS is.

How much slower? 2 or 3 times slower, or more like 10 or 100 times
slower? My primary use for parsing sentences is to be able to have
some kind of limited conversation with a human. As a _narrow_ first
step, I do not plan to parse essays by Kant or Goethe for the time
being.
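
For short sentences like the ones I have in mind, the problem is small. As
an illustration only, here is how picking one spelling guess could be
phrased as a SAT problem and solved by brute force. This is not how the
link-grammar SAT parser encodes sentences, and it is not minisat: the
`solve` function, the variable numbering, and the two "grammar rejects"
clauses are assumptions made up for this sketch. Brute force is exponential
in the number of variables; a real solver searches far more cleverly.

```python
from itertools import product


def solve(num_vars, clauses):
    """Brute-force SAT: return the first satisfying assignment, or None.
    A literal k means variable k is true, -k means it is false."""
    for bits in product([False, True], repeat=num_vars):
        if all(
            any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
            for clause in clauses
        ):
            return bits
    return None


# Hypothetical encoding: variable 1 = "the", 2 = "then", 3 = "ten".
names = ["the", "then", "ten"]
clauses = [
    [1, 2, 3],   # at least one guess is chosen
    [-1, -2],    # at most one guess is chosen
    [-1, -3],
    [-2, -3],
    [-2],        # assumed: the grammar rejects "then" in this slot
    [-3],        # assumed: the grammar rejects "ten" in this slot
]

model = solve(len(names), clauses)
chosen = [name for name, bit in zip(names, model) if bit]
print(chosen)  # ['the']
```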

> p.s. LG is written in C, and so it can only use those spelling-guessers that 
> have a C API.  The current choices are aspell and hunspell -- once upon a 
> time, "hunspell" was "better" but seems to no longer be maintained.  Again -- 
> aspell is used to provide suggestions, and parsing determines which of these 
> suggestions is correct (if any).

Thanks for the enlightening conversation. I was imagining such a thing
but did not have time to look into the intricacies of aspell.

I think a clean implementation of the ideas behind Link Grammar can be
useful for OpenCog, at least to help people approach the Link Grammar
C/C++ codebase. You can compare my project to MINIX vs. Linux: MINIX
is pedagogical, Linux is industrial. It seems to me that learning
grammar in an unsupervised way is not ready yet?

(A little bit unrelated to this topic, and I think we already
discussed it: my plan is to do most of the still-fuzzy stuff in Scheme
and re-use existing off-the-shelf libraries like minisat where it
makes sense. Performance is very important, but being able to
comprehend the whole system is much more important to me.
Spellchecking, or candidate selection, is a good example of a place
where one can fine-tune the accuracy and speed of the algorithm, but
in my case I prefer a simple algorithm that can possibly scale and
that is easy to use over complex machinery that handles all the cases
but is difficult to work with. I will learn patience :) )

Thanks!

