My quick, informal, gut-feel sense of this is that the right answer is to replace the "vector" part of word2vec with the *actual data structure* that *actually occurs in language*. It's kind of hard to explain how to do this, but let me give it a whirl.

Note how vectors are "symmetric", in the sense that the dot product of vectors A and B is the same as that of B and A. Now go to the Wikipedia article on "pregroup grammar", and note the example about half-way down, talking about left and right inverses. Notice that the left and the right inverses are NOT the same.
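To make the contrast concrete, here is the asymmetry in standard pregroup notation; this is just the textbook definition, nothing specific to our pipeline:

    %% Dot products are symmetric:
    \[ A \cdot B \;=\; B \cdot A \]
    %%
    %% Pregroup types are not: each type x has a left adjoint x^l and
    %% a right adjoint x^r, with x^l != x^r in general, and the
    %% contractions only work on one side:
    \[ x^{l}\,x \;\le\; 1, \qquad x\,x^{r} \;\le\; 1 \]
    %%
    %% Example: with noun type n and transitive-verb type n^r s n^l,
    %% "John saw Mary" contracts down to a sentence:
    \[ n\,(n^{r}\,s\,n^{l})\,n \;\le\; s \]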
Language simply does not have the symmetry properties of vectors. There are *some* similarities between vectors and language, and that is why tricks like word2vec partly work. But they break down, because they average together things that should not be averaged together.

A different, better replacement, one that kind-of-ish keeps some of the word2vec ideas, would be to encode portions of the "subgraph isomorphism problem" into vectors. word2vec uses only the very, very simplest subgraphs: the pairs. We could instead use graphs more complex than simple pairs. How can you "encode" a subgraph that is more complex than a word-pair, and still shove it into a vector? Well, you could, for example, state which nodes in the subgraph are connected to which other nodes. How could one do that ... oh hey ... that's the LG disjunct! So the language-learning project *already* contains a word2vec-like stage in it ... and it's a critical stage of the project, and it's the one that Rohit almost reached, a few years back.
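To illustrate the point with a toy sketch (the disjunct strings follow Link Grammar notation, but the data here is hand-written and the function is hypothetical, not actual pipeline code):

    from collections import defaultdict

    # A Link Grammar disjunct records which connectors a word uses to
    # link to its left and right neighbors; e.g. "saw" in "Mary saw
    # the dog" might use the disjunct "S- & O+": a subject link to
    # the left, an object link to the right.  Treating each distinct
    # disjunct as one coordinate gives a word-by-disjunct count
    # matrix: still a vector per word, but the basis now encodes
    # subgraph connectivity rather than bare co-occurrence.

    def count_disjuncts(parses):
        """parses: iterable of (word, disjunct) pairs from parsed text."""
        counts = defaultdict(lambda: defaultdict(int))
        for word, disjunct in parses:
            counts[word][disjunct] += 1
        return counts

    # Toy usage, with hand-written observations:
    observed = [
        ("saw", "S- & O+"),
        ("saw", "S- & O+"),
        ("ate", "S- & O+"),
        ("dog", "D- & O-"),
    ]
    vectors = count_disjuncts(observed)
    print(dict(vectors["saw"]))   # {'S- & O+': 2}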
--linas

On Sun, Mar 26, 2017 at 7:44 PM, Ben Goertzel <b...@goertzel.org> wrote:
> Linas,
>
> I thought a bit about how to use a modified version of the word2vec
> idea in our language learning pipeline...
>
> I'm thinking about the Skip-gram model of word2vec, as summarized
> informally e.g. here:
>
> http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/
>
> Following up the suggestion you made in Addis in our chat with
> Masresha, I'm thinking to replace the "adjacent word-pairs" used in
> word2vec with "word-pairs that are adjacent in the parse tree" (where
> e.g. the parse tree may be the max-weight spanning tree in our
> language learning algorithm)....
>
> This would still produce a vector just like word2vec does, via the
> hidden layer of the NN ... but the vector would likely be more
> meaningful than a typical word2vec vector...
>
> What would the purpose of this be, in the context of our language
> learning algorithm? The purpose would be that clustering should work
> better on the word2vec vectors than on the rawer data regarding "word
> co-occurrence in parse trees." At least, that seems plausible, since
> clustering on word2vec vectors generally works better than on
> co-occurrence vectors.
>
> This would be something that Masresha or someone else in Addis could
> work on, I think...
>
> We can discuss at the office this week...
>
> ben
>
> --
> Ben Goertzel, PhD
> http://goertzel.org
>
> “Our first mothers and fathers … were endowed with intelligence; they
> saw and instantly they could see far … they succeeded in knowing all
> that there is in the world. When they looked, instantly they saw all
> around them, and they contemplated in turn the arch of heaven and the
> round face of the earth. … Great was their wisdom …. They were able to
> know all....
>
> But the Creator and the Maker did not hear this with pleasure. … ‘Are
> they not by nature simple creatures of our making? Must they also be
> gods? … What if they do not reproduce and multiply?’
>
> Then the Heart of Heaven blew mist into their eyes, which clouded
> their sight as when a mirror is breathed upon. Their eyes were covered
> and they could see only what was close, only that was clear to them.”
>
> — Popol Vuh (holy book of the ancient Mayas)
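Circling back to the proposal above: a minimal sketch of the pair-extraction step Ben describes, with parse-tree edges standing in for the usual linear context window (the example parse and all names here are hypothetical, for illustration only):

    # Skip-gram training pairs drawn from parse-tree adjacency rather
    # than linear word adjacency.  In the pipeline, the edge list
    # would come from the max-weight spanning-tree parse; here it is
    # hand-written.

    def tree_skipgram_pairs(edges):
        """Yield (center, context) pairs: each tree edge, both directions."""
        for a, b in edges:
            yield (a, b)
            yield (b, a)

    # Hypothetical MST parse of "the dog chased a cat":
    mst_edges = [
        ("chased", "dog"), ("dog", "the"),
        ("chased", "cat"), ("cat", "a"),
    ]
    pairs = list(tree_skipgram_pairs(mst_edges))
    # These pairs would replace the window-based pairs that the
    # standard skip-gram model trains on; the NN hidden layer still
    # yields one vector per word, as Ben notes.
    print(pairs[:4])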