Hello Ben and Linas,

Sorry for the delay, I was reading the papers. About additivity: in
Coecke et al.'s program you turn a sentence into a *multilinear* map
that goes from the vectors of the words with elementary syntactic
category to a semantic vector space, the sentence meaning space. So
yes, there is additivity in each of these arguments (which, by the
way, should have a consequence for those beautiful word2vec relations
like France - Paris ~= Spain - Madrid, though I haven't seen that
written up anywhere).
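
To make that additivity concrete, here is a minimal numpy sketch (toy
dimensions and invented numbers, nothing from a real corpus): an
intransitive verb is a linear map from the noun space to the sentence
space, so the sentence meaning is additive in the subject argument.

import numpy as np

rng = np.random.default_rng(0)

N, S = 4, 3                  # toy noun-space and sentence-space dimensions
sleep = rng.random((S, N))   # intransitive verb as a linear map N -> S
dogs = rng.random(N)         # toy noun vectors
cats = rng.random(N)

meaning = lambda subject: sleep @ subject   # sentence meaning in S

# Linearity gives additivity in the subject argument:
assert np.allclose(meaning(dogs + cats), meaning(dogs) + meaning(cats))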

As I understand it, your goal is to go from plain text to logical
forms in a probabilistic logic, and you have two stages: parsing from
plain text to a pregroup grammar parse structure (I'm not sure that
the parse trees I spoke of before are really trees, hence the change
to 'parse structure'), and then going from that parse structure (via
RelEx and RelEx2Logic, if I have that right) to a lambda calculus
term bearing the meaning, with a kind of probability and another
number attached extrinsically.

How does Coecke's program (which from now on unfairly includes all
the et als.) fit in that picture? I think the key observation is
Coecke's remark that his framework can be interpreted, as a
particular case, as Montague semantics. Though adorned with
linguistic considerations, that semantics is well known to be
amenable to computation, and a toy version is shown in chapter 10 of
the NLTK book, where they show how the lambda calculus represents a
logic that has a model theory. That is important because all those
lambda terms have to be actual functions with actual values.
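
For instance, in the style of NLTK book ch. 10 (a sketch from memory,
so the exact calls may need double checking), a logical form gets an
actual truth value when evaluated against a concrete model:

import nltk

# A tiny hand-made model: a domain of individuals plus a valuation
# of the non-logical symbols.
val = nltk.Valuation([('john', 'j'), ('mary', 'm'),
                      ('walk', set(['j'])),
                      ('love', set([('j', 'm')]))])
model = nltk.Model(val.domain, val)
g = nltk.Assignment(val.domain)

print(model.evaluate('love(john, mary)', g))   # True
print(model.evaluate('walk(mary)', g))         # False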

How exactly does Coecke's framework reduce to Montague semantics?
That matters, because if we understand how Montague semantics is a
particular case of Coecke's, we can think in the opposite direction
and see Coecke's semantics as an extension of it.

As a starting point we have the fact that Coecke's semantics can be
summarized as a monoidal functor that sends a morphism of a compact
closed category in syntax-land (the pregroup grammar parse structure
resulting from parsing the plain text of a sentence) to a morphism of
a compact closed category in semantics-land, the category of real
vector spaces, that morphism being a (multi)linear map.
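
Concretely (again a toy numpy sketch with invented numbers), the
image of a pregroup reduction like 'subject verb object' under that
functor is a tensor contraction: the transitive verb is a rank-3
tensor and the cups of the pregroup diagram become the contracted
indices.

import numpy as np

rng = np.random.default_rng(1)

N, S = 4, 3                    # toy noun-space and sentence-space dimensions
mary = rng.random(N)           # noun vectors (pregroup type n)
john = rng.random(N)
likes = rng.random((N, S, N))  # transitive verb tensor (type n^r . s . n^l)

# The reduction n (n^r s n^l) n -> s contracts the subject with the
# first leg of the verb tensor and the object with the last leg:
sentence = np.einsum('i,isj,j->s', mary, likes, john)
print(sentence)                # a vector in the sentence space S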

The definition of Coecke's semantic functor, however, hardly needs
any modification if we use as target the compact closed category of
modules over a fixed semiring. If the semiring is that of the
booleans, we are talking about the category of sets and relations,
with the Peirce relational product (uncle = brother * father)
expressed by the same matrix product formula as in linear algebra,
and with the cartesian product as the tensor product that makes it
monoidal.
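
A minimal numpy illustration of that boolean case (hypothetical
family facts): relations are boolean matrices, the relational product
is the ordinary matrix product computed over (or, and), and uncle =
brother * father comes out of the same formula.

import numpy as np

people = ['tom', 'bob', 'carol']         # hypothetical individuals
ix = {p: k for k, p in enumerate(people)}

brother = np.zeros((3, 3), dtype=bool)   # brother[x, y]: x is a brother of y
father = np.zeros((3, 3), dtype=bool)    # father[y, z]: y is the father of z
brother[ix['bob'], ix['tom']] = True     # bob is tom's brother
father[ix['tom'], ix['carol']] = True    # tom is carol's father

# Peirce relational product = matrix product over the boolean semiring:
uncle = (brother.astype(int) @ father.astype(int)) > 0
print(uncle[ix['bob'], ix['carol']])     # True: bob is carol's uncle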

The idea is that when Coecke's semantic functor has the category of
relations as codomain, one obtains Montague semantics. More exactly,
when one applies the semantic functor to a pregroup grammar parse
structure of a sentence, one obtains the lambda term that Montague
would have attached to it. Naturally the question is how exactly to
unfold that abstract notion. The folk joke about 'abstract nonsense'
forgets that the elevator has a down button.

Well, this would get lengthy here, but the way I started to come to
grips with it was by bringing the CCG linguistic formalism into the
equation. A good, fast-paced slide deck on how one goes from plain
text to CCG derivations, and from derivations to classical
Montague-semantics lambda terms, can be found in [1].

One important feature of CCG is that it is lexicalized, i.e., all the
linguistic data necessary to do both syntactic and semantic parsing
is attached to the words of the dictionary, in contrast with, say,
NLTK book ch. 10, where the linguistic data lives inside the
production rules of an explicit grammar.

Looking closer at the lexicon (dictionary), each word is supplemented
with its syntactic category (N, N/N, ...) and also with a lambda
term, compatible with that syntactic category, which is used in
semantic parsing. Those lambda terms are not magical letters: for the
lambda terms to have a true model-theoretic semantics they must
correspond to specific functions.
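
As a toy illustration (lexicon entries invented for the example and
beta reduction done with NLTK's logic parser, so a sketch rather than
any standard CCG lexicon format), an entry pairs a word with a
category and a lambda term, and combining words is just function
application plus beta reduction:

import nltk

read_expr = nltk.sem.Expression.fromstring

# Hypothetical lexicon: word -> (syntactic category, lambda term)
lexicon = {
    'dog':   ('N',     r'\x.dog(x)'),
    'black': ('N/N',   r'\P.\x.(black(x) & P(x))'),
    'barks': (r'S\NP', r'\x.barks(x)'),
}

# 'black dog': apply the N/N term to the N term and beta-reduce:
black_dog = read_expr('(%s)(%s)' % (lexicon['black'][1],
                                    lexicon['dog'][1])).simplify()
print(black_dog)   # something like \x.(black(x) & dog(x))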

The good thing is that the work of porting Coecke's semantics to CCG
(instead of pregroup grammar) is already done, in [2]. The details
are there, but the thing I want to highlight is that in this case,
when one does Coecke's semantics with CCG parsing, the structure of
the lexicon changes. One retains the words and their associated
syntactic categories, but now, instead of the lambda terms (with
their corresponding interpretation as actual relations/functions),
one has vectors for the simple syntactic categories and tensors for
the compound ones (say N vs. N/N). When those vectors/tensors are
over the booleans, one recovers Montague semantics.

In Coecke's general case, sentences mean vectors in a real vector
space, and the benefits start with its inner product, hence a norm
and a metric, so you can measure sentence similarity quantitatively
(with suitably normalized vectors...).
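
For instance (toy vectors), the similarity measure would be the usual
cosine induced by that inner product:

import numpy as np

def cos_sim(u, v):
    # cosine of the angle between u and v: 1 = same direction, 0 = orthogonal
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(cos_sim(np.array([1.0, 2.0, 0.5]), np.array([0.9, 2.1, 0.4])))  # close to 1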

CCG is very nice in practical terms. An open state-of-the-art parser
implementation is [3], described in [4]; compare with [5] ("The
parser finds the optimal parse for 99.9% of held-out sentences").
OpenCCG is older but does both parsing and generation.

One thing that I don't understand well about the above is that the
category of vector spaces over a fixed field (or even just the
finite-dimensional ones) is *not* cartesian closed. While in the
presentation of Montague semantics in NLTK book ch. 10 the lambda
calculus appears to be untyped, more faithful presentations seem to
require a (simply) typed or even more complex calculus/logic. In that
case the semantic category would perhaps have to be cartesian closed,
supporting in particular higher-order maps.

That's all on the expository front; now some speculation.

Up to now the only tangible enhancement brought by Coecke's semantics
is the motivation of a metric among sentence meanings. What we really
want is a mathematical motivation to probabilize the crisp,
hard-facts character of the interpretation of sentences as Montague
lambda terms. How to attack the problem?

One idea is to experiment with other kinds of semantic category as
the target of Coecke's semantic functor. To be terse, this can be
explored by means of a monad on a vanilla, unstructured base category
such as finite sets. One can make several choices of endofunctor to
specify the corresponding monad, and the proposed semantic category
is then its Kleisli category. These categories are monoidal and have
a revealing diagrammatic notation.

1.- Powerset endofunctor. This gives rise to the category of sets and
relations, with the cartesian product as the monoidal operation.
Coecke's semantics then yields Montagovian hard facts as described
above. Coecke and Kissinger's new book [6] details the particulars of
the diagrammatic language.
2.- Vector space monad (over the reals). Since the sets are finite,
the Kleisli category is that of finite-dimensional real vector
spaces. That is properly Coecke's framework for computing sentence
similarity. Circuit diagrams are tensor networks where boxes are
tensors and wires are contractions of specific indices.
3.- A monad in quantum computing is shown in [7], and quantumly
motivated semantics is specifically addressed by Coecke. The whole
book [8] discusses the connection, though I haven't read it. Circuit
diagrams should be quantum circuits representing possibly unitary
processes. Quantum amplitudes give rise, through measurement, to
classical probabilities.
4.- The Giry monad here results from the functor that produces all
formal convex linear combinations of the elements of a given set. The
Kleisli category is very interesting, having as maps probabilistic
mappings that under the hood are just conditional probabilities.
These maps allow a more user-friendly understanding of Markov chains,
Markov decision processes, HMMs, POMDPs, naive Bayes classifiers and
Kalman filters. Circuit diagrams should correspond to the factor
diagram notation of Bayesian networks [9], and the law of total
probability generalizes in Bayesian networks to the linear-algebra
tensor-network calculations of the corresponding network (this can be
shown in actual Bayesian network software; a small numeric sketch
follows the list).
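
To make bullet 4 concrete, a small numpy sketch with invented
numbers: a Kleisli map sends each element of a finite set to a
distribution over another finite set, i.e. it is a conditional
probability table, and Kleisli composition is exactly the matrix
product that implements the law of total probability.

import numpy as np

# Kleisli maps X -> D(Y) as row-stochastic matrices P[x, y] = P(y | x)
p_weather_given_season = np.array([[0.7, 0.3],   # summer -> (sunny, rainy)
                                   [0.2, 0.8]])  # winter -> (sunny, rainy)
p_mood_given_weather = np.array([[0.9, 0.1],     # sunny -> (happy, grumpy)
                                 [0.4, 0.6]])    # rainy -> (happy, grumpy)

# Kleisli composition = law of total probability = matrix product:
p_mood_given_season = p_weather_given_season @ p_mood_given_weather
print(p_mood_given_season)   # rows still sum to 1: P(mood | season)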

A quote from mathematician Gian Carlo Rota [10]:

"The first lecture by Jack [Schwartz] I listened to was given in the
spring of 1954 in a seminar in functional analysis. A brilliant array
of lecturers had been expounding throughout the spring term on their
pet topics. Jack's lecture dealt with stochastic processes.
Probability was still a mysterious subject cultivated by a few
scattered mathematicians, and the expression "Markov chain" conveyed
more than a hint of mystery. Jack started his lecture with the words,
"A Markov chain is a generalization of a function." His perfect
motivation of the Markov property put the audience at ease. Graduate
students and instructors relaxed and followed his every word to the
end."

The thing I would research is using as semantic category that of the
generalized functions of the quote above and of bullet 4. Basically,
you replace word2vec vectors by probability distributions over the
words meaning something, connect a Bayesian network from the CCG
parse, and apply generalized total probability to obtain probabilized
booleans, i.e. a number 0 <= x <= 1 (instead of just a boolean, as
with Montague semantics). That is, the probability that a sentence
holds depends on the distributions of its syntactically elementary
constituents meaning something, and those distributions are combined
by the factors of a Bayesian net whose conditional independence
relations respect and reflect the sentence syntax and have the local
Markov property. The factors are for words of complex syntactic
category (such as N/N), and their attached tensors are multivariate
conditional probability distributions.
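
To fix ideas with a toy numpy calculation (all numbers invented):
take the phrase 'red car', let the elementary constituent 'car' carry
a distribution over a few candidate referents, let the
complex-category word 'red' (N/N) carry a conditional probability
table P(holds | referent), and the probability that the phrase holds
is one application of generalized total probability, i.e. a single
tensor contraction.

import numpy as np

referents = ['car1', 'car2', 'car3']         # hypothetical candidate referents
p_car = np.array([0.5, 0.3, 0.2])            # P('car' means each referent)
p_red_given_ref = np.array([0.9, 0.1, 0.6])  # P('red' holds | referent), the N/N factor

# Generalized law of total probability: a probabilized boolean in [0, 1]
# instead of Montague's crisp True/False.
p_phrase_holds = np.einsum('r,r->', p_car, p_red_given_ref)
print(p_phrase_holds)                        # 0.5*0.9 + 0.3*0.1 + 0.2*0.6 = 0.60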

Hope this helps somehow. Kind regards,
Jesus.


[1] http://yoavartzi.com/pub/afz-tutorial.acl.2013.pdf
[2] http://www.cl.cam.ac.uk/~sc609/pubs/eacl14types.pdf
[3] http://homepages.inf.ed.ac.uk/s1049478/easyccg.html
[4] http://www.aclweb.org/anthology/D14-1107
[5] https://arxiv.org/abs/1607.01432
[6] ISBN 1108107710
[7] https://bram.westerbaan.name/kleisli.pdf
[8] ISBN 9780199646296
[9] http://helper.ipam.ucla.edu/publications/gss2012/gss2012_10799.pdf
[10] Gian-Carlo Rota, Indiscrete Thoughts.

On 4/2/17, Linas Vepstas <linasveps...@gmail.com> wrote:
> Hi Ben,
>
> On Sun, Apr 2, 2017 at 3:16 PM, Ben Goertzel <b...@goertzel.org> wrote:
>
>>  So e.g. if we find X+Y is roughly equal to Z in the domain
>> of semantic vectors,
>>
>
> But what Jesus is saying (and what we say in our paper, with all that
> fiddle-faddle about categories)  is precisely that while the concept of
> addition is kind-of-ish OK for meanings  it can be even better if replaced
> with the correct categorial generalization.
>
> That is, addition -- the plus sign -- is a certain specific morphism, and
> that this morphism, the addition of vectors, has the unfortunate property
> of being commutative, whereas we know that language is non-commutative. The
> stuff  about pre-group grammars is all about identifying exactly which
> morphism it is that correctly generalizes the addition morphism.
>
> That addition is kind-of OK is why word2vec kind-of works. But I think we
> can do better.
>
> Unfortunately, the pressing needs of having to crunch data, and to write
> the code to crunch that data, prevents me from devoting enough time to this
> issue for at least a few more weeks or a month. I would very much like to
> clarify the theoretical situation here, but need to find a chunk of time
> that isn't taken up by email and various mundane tasks.
>
> --linas
>
