On Dec 5, 2007 7:13 PM, Richard Loosemore <[EMAIL PROTECTED]> wrote:
>
> Vladimir Nesov wrote:
> > Richard,
> >
> > I'll try to summarize my solutions to these problems, which allow a
> > network to be used without the need for explicit copying of instances
> > (or any other kind of explicit allocation of entities intended to
> > correspond to instances). (Although my model also requires ubiquitous
> > induction between nodes, which disregards network structure.)
> >
> > Basic structure of the network: the network is 'spiking' in the sense
> > that it operates in real time and links between nodes have a delay.
> > Input nodes feed sensory data into the network; output nodes read out
> > actions. All links between nodes can shift over time and with
> > experience through induction. The initial configuration specifies
> > simple pathways from input to output; the shifting of links changes
> > these pathways, making them more intricate so that they reflect
> > experience.
> >
> > A scene (as a graph describing objects) is represented by active
> > nodes: a node being active corresponds to a feature being included in
> > the scene. Not all features present in the scene are active at the
> > same time; some of them can activate periodically, every several
> > ticks or more, and some features can be represented by summarizing,
> > simplified features (the node 'apple' instead of a 3D sketch of its
> > surface).
> >
> > Network edges (links) activate the nodes. If the condition for a link
> > (the configuration of nodes from which the link originates) is
> > satisfied, and the link is active, it activates the target node.
> >
> > Activation in the network follows a variation of the Hebbian rule,
> > the 'induction rule' (which is essential for the mechanism of
> > instance representation): a link becomes active (starts to activate
> > its target node) only if it has observed that node to be activated
> > after the link's condition was satisfied in a large majority of cases
> > (say 90% or more). So, if some node is activated in the network,
> > there are good reasons for that; there is no blind association-seeking.
> >
> > Representation of instances. If a scene contains multiple instances
> > of the same object (or pattern, say an apple), and these instances
> > are not modified within the scene, there is no point in representing
> > them separately: all the places at which instances are located
> > ('instantiation points', say the places where apples lie or hang)
> > refer to the same pattern. The only problem is the modification of
> > instances at specific instantiation points.
> >
> > Such a scene can be implemented by creating links from the
> > instantiation points to the nodes that represent the pattern. As a
> > result, during the activation cycle of the represented scene,
> > activation of an instantiation point leads to activation of the
> > pattern (there is only one pattern for each instantiation point, so
> > the induction rule works in this direction), but not in the other
> > direction (since there are many instantiation points for the pattern,
> > none of them will be the target of a link originating from the
> > pattern).
> >
> > This one-way activation results in the propagation of 'activation
> > waves' from instantiation points to the pattern, so that each wave
> > 'outlines' both the pattern and an instantiation point. These waves
> > effectively represent instances. If there is a modifier associated
> > with a specific instantiation point, it will activate during the same
> > wave as the pattern does, and as a result it can be applied to the
> > pattern. As the other instantiation points refer to the pattern 'by
> > value', the pattern at those points won't change much.
> >
> > Also, this way of representing instances is central to the extraction
> > of similarities: if several objects are similar, they will share some
> > of their nodes, and as a result their structures will influence one
> > another, creating pressure to extract a common pattern.
>
> I have questions at this point.
>
> Your notion of "instantiation point" sounds like what I would call an
> "instance node" which is created on the fly.

No, it's not that; I'll try to clarify using a more detailed example.
Say there are several apples (which I referred to above as the
'pattern'), all represented by a single clump of nodes, the same clump
as would be used for a single apple. Instantiation points are the
actual objects that in some sense 'hold' the apples in the scene, for
example a particular plate on which an apple lies. In the scene there
is (for simplicity) only one plate, and there is always an apple lying
on it. So we can create a PLATE->APPLE link, and this link satisfies
the induction rule, since whenever PLATE is encountered there is an
APPLE on it. Here PLATE is an instantiation point and APPLE is a
pattern. If the scene also contains an apple-tree BRANCH on which an
APPLE is also hanging, we can create a BRANCH->APPLE link. But we
can't create an APPLE->PLATE link, since at one of the instantiation
points (BRANCH), PLATE is not there when APPLE is. Also, these links
are short-term things (as plates don't always have apples on them),
but the scene can be stored long-term if they are duplicated on new
nodes corresponding to these ones (in which case PLATE'->APPLE would
be a universal rule for PLATE', a local copy of PLATE, unique to that
scene with an apple).

Now that I put it this way, that new node which can store the relation
as episodic memory is probably what corresponds to your instance
nodes, but it doesn't figure in the recall and scene-manipulation
dynamics.
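
To make the asymmetry concrete, here is a rough Python sketch of how a
link could track the induction rule. The class, the counters and the
90% threshold are illustrative assumptions only (link delays are
ignored and 'target follows condition' is collapsed to co-activation
within one tick); it is not my actual implementation:

    class Link:
        def __init__(self, source, target, threshold=0.9):
            self.source = source      # condition node, e.g. 'PLATE'
            self.target = target      # node it would activate, e.g. 'APPLE'
            self.threshold = threshold
            self.hits = 0             # condition held and target followed
            self.trials = 0           # condition held at all

        def observe(self, active_nodes):
            """Update the link's statistics for one tick of the scene."""
            if self.source in active_nodes:
                self.trials += 1
                if self.target in active_nodes:
                    self.hits += 1

        def is_active(self):
            """The link fires its target only once the regularity is reliable."""
            return self.trials > 0 and self.hits / self.trials >= self.threshold

    # Two observed situations: an apple on the plate, an apple on the branch.
    scenes = [{'PLATE', 'APPLE'}, {'BRANCH', 'APPLE'}]

    plate_apple = Link('PLATE', 'APPLE')
    apple_plate = Link('APPLE', 'PLATE')
    for scene in scenes:
        plate_apple.observe(scene)
        apple_plate.observe(scene)

    print(plate_apple.is_active())  # True: every time PLATE was there, so was APPLE
    print(apple_plate.is_active())  # False: on the BRANCH tick, APPLE had no PLATE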

It's admittedly extremely sketchy, but it seems to scale well to more
elaborate examples; at least I haven't found anything that can't be
naturally represented in it. And I believe I didn't stick with it
blindly: I enumerated different possibilities, at one point set it
aside as a useless intuition, and returned to it when it happened to
describe a new implementation-level technique (AND and NOT links with
temporal delays).


> There is nothing wrong with this in principle, I believe, but it all
> depends on the details of how these things are handled.  For example, it
> requires a *substantial* modification of the neural network idea to
> allow for the rapid formation of instance nodes, and that modification
> is so substantial that it would dominate the behavior of the system.  I
> don't know if you follow the colloquialism, but there is a sense in
> which the tail is wagging the dog:  the instance nodes are such an
> important mechanism that everything depends on the details of how they
> are handled.

Those short-term links (which are what corresponds in this context to
your instance nodes) are part of a unified learning dynamics. The
network is still largely static, but a small number of links is
created all the time, and they only become active if there are at
least some statistics suggesting that the induction rule holds; they
subsequently become more and more long-term if the rule continues to
hold for longer and longer (so that if a regularity has occurred
reliably for some time, it is expected to hold for some time in the
future).

So the number of modifications is small compared to the total size of
the network (which is mostly dormant), and the system should even work
to some extent if all formation of new links and nodes is suspended,
and almost normally if only ultra-short-term links are enabled.
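
A minimal sketch of what I mean by links becoming "more and more
long-term", assuming a simple doubling schedule (the actual constants
and decay rule are not fixed in my model): a newly created link is
ultra-short-term, and every period for which its regularity keeps
holding extends its expected lifetime, so reliable regularities
persist while accidental ones decay and get pruned.

    class TentativeLink:
        def __init__(self, source, target):
            self.source = source
            self.target = target
            self.lifetime = 10        # ticks left before the link is discarded

        def tick(self, regularity_held):
            """Advance one tick; return False when the link should be pruned."""
            if regularity_held:
                self.lifetime *= 2    # held again: expect it to keep holding
            else:
                self.lifetime -= 1    # idle or violated: drift toward removal
            return self.lifetime > 0

    link = TentativeLink('PLATE', 'APPLE')
    for _ in range(3):
        link.tick(regularity_held=True)   # lifetime grows: 20, 40, 80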


> So, to consider one or two of the details that you mention.  You would
> like there to be only a one-way connection between the generic node (do
> you call this the "pattern" node?) and the instance node (instantiation
> point?), so that the latter can contact the former, but not vice-versa.

The intention is not to distinguish instances and patterns, but to
unify them: the instantiation points are generally patterns
themselves, and temporary instantiation by activation waves can happen
hierarchically. Essentially, it is a sparse representation of a graph
that contains explicit copies of each instance, a representation which
collapses similar substructures (in particular, copies of the same
pattern), while activation waves temporarily extract parts of this
graph back into explicit form. Since all operations are performed on
active nodes, and activation waves consist of active nodes, the waves
temporarily uncompress this sparse representation locally and allow
modifications to operate on the current part.

Links work deductively, allowing inference to be performed blindly by
activating everything that can be activated. The control mechanism is
in learning, not recall: only inferences that actually hold are
learned. Patterns need to be isolated, otherwise each occurrence of a
pattern would bring up all the scenes in which it has been included.
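
A toy sketch of the activation-wave idea, under the simplifying
assumption that links are stored only from instantiation points toward
the shared pattern and that a wave is just the set of nodes reachable
from one instantiation point; the pair (instantiation point, reached
pattern nodes) is the temporarily 'uncompressed' instance on which
modifiers can act:

    # Links point from instantiation points to the one shared pattern,
    # and from the pattern to its own structure.
    links = {
        'PLATE':  ['APPLE'],           # apple on the plate
        'BRANCH': ['APPLE'],           # apple on the branch
        'APPLE':  ['ROUND', 'RED'],    # shared pattern structure
    }

    def wave(start):
        """Nodes outlined by a wave originating at one instantiation point."""
        frontier, reached = [start], set()
        while frontier:
            node = frontier.pop()
            if node not in reached:
                reached.add(node)
                frontier.extend(links.get(node, []))
        return reached

    print(wave('PLATE'))   # one instance: PLATE, APPLE, ROUND, RED
    print(wave('BRANCH'))  # another instance, reusing the same pattern nodes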


>   Does this not contradict the data from psychology (if you care about
> that)?

Up to a point I do. I don't care about specialized subsystems, such as
the earliest stages of visual processing, though.


> For instance, we are able to see a field of patterns, of
> different colors, and then when someone says the phrase "the green
> patterns" we find that the set of green patterns "jumps out at us" from
> the scene.  It is as if we did indeed have links from the generic
> concept [green pattern] to all the instances.

Let the late stage of processing of the phrase "the green patterns" be
denoted PHRASE, a sample instantiation point in the field FIELD_PT,
the original pattern of greenness GREEN, and the similar pattern of
greenness associated with PHRASE, GREEN2. Then there are
FIELD_PT->GREEN links from each point in the field that contains a
green blob, and a PHRASE->GREEN2 link that is triggered by the phrase.
The patterns GREEN and GREEN2 are very similar and share many nodes,
so when the phrase is uttered and GREEN2 activates, it influences the
structure of GREEN, and this influence can be seen through the
FIELD_PT->GREEN links when activation waves originating in FIELD_PT
highlight it (this requires at least ultra-short-term links, to
support the change in the structure of GREEN between the influence
from GREEN2 and the observation from FIELD_PT). As a result of this
observation, the changed structure of GREEN becomes more directly
attached to FIELD_PT, which is a kind of conceptual slippage, as I use
the term in my model.
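
A very rough sketch of the 'jumping out' effect, treating GREEN and
GREEN2 simply as sets of shared sub-nodes (the node names are made up,
and real structural influence would be more gradual than a set
intersection): activating GREEN2 from PHRASE pre-activates the nodes
it shares with GREEN, so a later wave from a FIELD_PT finds part of
GREEN already highlighted.

    GREEN  = {'hue_green', 'saturated', 'blob'}
    GREEN2 = {'hue_green', 'saturated', 'word_green'}  # evoked by the phrase

    pre_activated = GREEN2               # PHRASE -> GREEN2 fires
    highlight = GREEN & pre_activated    # overlap seen via FIELD_PT -> GREEN waves
    print(highlight)                     # {'hue_green', 'saturated'}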


> Another question: what do we do in a situation where we see a field of
> grass, and think about the concept [grass blade]?  Are there individual
> instances for each grass blade?  Are all of these linked to the generic
> concept of [grass blade]?

There is only one [grass blade], which is referenced separately from
[field-of-grass] and from [thought-about-grass-blade]. Certainly this
is an oversimplification: there are actually many kinds of grass
blades (or of any other pattern), which can be assembled from
subpatterns in a combinatorial manner. The process of extracting these
subpatterns is supported by the sparse representation; substitution of
one subpattern for another ('conceptual slippage') corresponds to
variation in the properties of specific instances and can be
influenced by context, in particular to fit an object into an
appropriate place in the scene with the appropriate orientation and
other properties. Activation waves allow these details to be left
without explicit representation most of the time, activating only as a
wave propagates from the nodes that define the context influencing the
choice of those properties to the generic node that represents the
core pattern ('grass blade').
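
A toy sketch of this slot-filling view of a pattern, with made-up
slots and a made-up scoring rule (in the real model the choice would
be driven by context nodes along the wave, not by a dictionary
lookup): the generic [grass blade] carries interchangeable subpattern
variants, and the context picks among them only when an instance is
needed.

    grass_blade_slots = {
        'shape':  ['straight', 'bent'],
        'colour': ['green', 'yellowish'],
    }

    def instantiate(context):
        """Pick, for each slot, the variant most compatible with the context."""
        return {slot: max(variants, key=lambda v: context.get(v, 0.0))
                for slot, variants in grass_blade_slots.items()}

    print(instantiate({'bent': 1.0, 'green': 0.8}))
    # {'shape': 'bent', 'colour': 'green'} -- details appear only on demand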


> I have no big problems with your model so far (if I understand it
> correctly, which I am not sure I do) but it sits at the level of what I
> would call the beginnings of a framework, rather than a complete model.

And it's intended as such (as is most of what I wrote about in this
thread): these intuitive pictures should guide the implementation so
that it provides the capabilities they describe. Only recently did I
start actually implementing it, when I found a technique
(context-sensitive links) that seems to fit it and is simple enough;
you may remember my attempts to describe it before that, when I
struggled with explicit local contexts (sets of nodes).


> My goal is to build a complete framework, and I have been working on
> this for many years.  It is similar to the direction you are heading,
> but goes further along its particular path.

Alas, you are not sharing the details!


> > Creation of new nodes. During a creation phase, each new node
> > corresponds to an existing node (the 'original node') in the network.
> > During this phase (which isn't long), each activated link that
> > connects to the original node (both incoming and outgoing
> > connections) is copied, with the original node substituted by the new
> > node in the copy. As a result, the new node will be active in the
> > situations in which the original node activated during the creation
> > of the new node. The new node can represent an episodic memory or a
> > more specific subcategory of the category represented by the original
> > node. Initially, the new node doesn't influence the behavior of the
> > system (as it activates in a subset of the ticks in which the
> > original node can activate), but because of this difference it can
> > acquire inductive links different from those that fit the original
> > node.
> >
>
> Again, this is broadly along the same lines as my own thinking.
>
> What is different is that I see many, many possible ways to get these
> new-node creation mechanisms to work (and ditto for other mechanisms
> like the instance nodes, etc.) and I feel it is extremely problematic to
> focus on just one mechanism and say that THIS is the one I will
> implement because .... "I think it feels like a good idea".

I thought about node creation for some time, and had to reject many
other possibilities. For example, I considered special nodes that
embody context-sensitivity, but had to change that because node growth
had to be controlled. That led to context-sensitive links, which have
to have an existing target node, so that a link is only created to
reinforce an existing regularity. New nodes have to be able to
contribute to action, so they shouldn't be created in isolation and
should be equipped with outgoing links from the start. Their semantics
has to come from somewhere and should be drawn from existing
regularities. There should be a technique to represent episodic
knowledge. Useless nodes should die off. These considerations and more
converge more or less unambiguously on the creation rule I described.
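
A compressed sketch of that creation rule, assuming links are stored
simply as (source, target) pairs and that we already know which links
are activated at the moment of creation (all names here are
illustrative): every activated link touching the original node is
duplicated with the original node replaced by the new one, so the new
node starts out activating in a subset of the situations the original
did.

    def create_node(links, activated, original, new):
        """Copy activated links of `original`, substituting `new` for it."""
        copies = []
        for (src, dst) in links:
            if (src, dst) in activated and original in (src, dst):
                copies.append((new if src == original else src,
                               new if dst == original else dst))
        return links + copies

    links = [('PLATE', 'APPLE'), ('BRANCH', 'APPLE'), ('APPLE', 'RED')]
    activated = {('PLATE', 'APPLE'), ('APPLE', 'RED')}   # active right now
    print(create_node(links, activated, 'APPLE', 'APPLE_EP'))
    # adds ('PLATE', 'APPLE_EP') and ('APPLE_EP', 'RED'): an episodic copy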

If you have other viable alternatives, please hint at their specifics.


> The reason I think this is a problem is that these mechanisms have
> system-wide consequences (i.e. they give rise to global behaviors) that
> are not necessarily obvious from the definition of the mechanism, so we
> need to build a simulation to find out what those mechanisms *really* do
> when they are put together and allowed to interact.

Yes, it will become a problem once I start teaching the system in a
way that will enable the techniques required to support Hebbian
learning to develop (some form of internal recitation loop, likely on
several levels). The early stages of learning are not as robust as
those accessible to a system with high-level reflection on incoming
information, and they may need to be selected carefully so as to
develop these subsystems. Still, I think the ability of the system to
learn arbitrarily detailed and flexible procedures is what it takes,
even if the early stages of teaching would be somewhat dodgy.


> Rather than decide that I think one particular set of mechanisms is the
> BEST one ahead of time, therefore, my strategy is to find ways to start
> with a general framework, THEN create large numbers of
> mechanism-candidates for each one of these issues (instance node format,
> new node creation, etc etc), then explore the consequences of these
> choices empirically.  Doing large numbers of exploratory simulations
> with different choices for the mechanisms.

It might be the right thing to do, but in my experience it pays off to
start with a little (though not too little) architecture, and then to
clean up interfaces and generalize around the points where multiple
implementations of the same fragment of an abstraction level are
actually being written. Otherwise the system ends up consisting of a
huge palette of artfully configurable modules with which you can
indeed implement 'Hello world' in several lines, but still have to
write just as much code to get it to do anything actually useful...


> I can show you a paper of mine in which I describe my framework in a
> little more detail.  If you have a paper in which you have described
> yours in more detail I would be interested to read it.

If you are talking about the paper you wrote with Trevor Harley, you
sent it to me a while ago and I read it. It clarified some ambiguity
about your model, but I didn't find anything new in it, as you had
already mentioned the principle of allocating concepts to cortical
columns on the list, IIRC somewhere in spring, and the paper doesn't
seem to introduce anything beyond this single principle (the way
concepts interact if they are given this exclusive role during
activation is almost inevitable). I find this principle rather
magical, as AFAIK there is no direct evidence that it is there, and
because I don't see fundamental problems with my way of representing
instances, which is quite consistent with the classical model at the
consensus level of detail (and hence I see no need for a complex
principle where a simple one seems to suffice).

I'll probably write a paper about my model in January, when I'll have
more free time on my hands. It's tricky: I tried to start a couple of
times, but describing it with minimal handwaving takes more than I
expect, and it is in a state of perpetual adjustment, or of changing
perspective on what's important.


-- 
Vladimir Nesov                            mailto:[EMAIL PROTECTED]
