Just got on the list; I'd like to introduce my approach to general AI:
Intelligence is not an empirically specific ability - we can learn anything. Therefore its method can only be derived from a functional definition of intelligence, not from any specific property of inputs. Well, with one exception: the inputs can't be random within a given coordinate system; if they are, intelligence is useless.
The mainstream position of the 'artificial intelligentsia' seems to be that intelligence is a lot of different things, which I think is largely responsible for the pathetic state of AI. Any general term is applicable to different things, but there must be something in common, otherwise we wouldn't use it to describe them all.
Expert Systems simply prepackage & reuse
human-generated knowledge, with no independent ability to generate new
knowledge.
Neural Nets imitate insufficiently understood brain processes - the first functional intelligence produced by evolution, & as such probably the least efficient possible.
Problem-solving algorithms require manually describing a specific environment first, which is what intelligence is really for. The cognitive part of problem solving is discovering compressive patterns to represent the subject/environment; the solution itself is a brute-force search over alternative projections of these patterns to maximize values.
Evolutionary algorithms utilize random changes to the code that could be useful in maximizing a specific value. But I think the real problem is to define what it is that we want to maximize.
In pattern recognition there has been no theoretically consistent way to value & encode matches & resulting patterns, making hierarchical schemes & scaling impractical.
Algorithmic/Computational learning theory, while conceptually very close to my approach, doesn't seem to connect the purpose with the method. 'Learning theorists', being mathematicians, consider all possible cases, & no method can serve them all. I'm only concerned with the real world, characterized by a spatio-temporal continuum. This 'specification' makes the whole idea meaningful.
I define intelligence as an ability to produce expectations of future inputs through recognition & interactive projection of past inputs' patterns. This includes planning, which technically is self-prediction.
A pattern is a set of inputs whose record can be compressively replaced with a record of a vector: a value that converts new inputs into old ones by some operation: identity, addition, multiplication, & so on. Compression here is the reduction of recorded magnitude compared to the magnitude of restorable inputs.
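As a toy illustration of this definition (my own sketch, not part of the original proposal): a run of inputs related by a constant difference can be replaced by its first value, one vector, & a count, which reduces the recorded magnitude while keeping the inputs restorable.

```python
# Toy sketch of "compression by vector", assuming the pattern's operation
# is addition (a constant difference converts each old input into the new one).

def encode_additive_pattern(inputs):
    """If consecutive inputs differ by a constant vector d, replace the
    record with (first input, d, count). Returns None if no single
    additive vector fits, i.e. no pattern here."""
    if len(inputs) < 2:
        return None
    d = inputs[1] - inputs[0]
    if all(inputs[i + 1] - inputs[i] == d for i in range(len(inputs) - 1)):
        return (inputs[0], d, len(inputs))  # 3 values instead of len(inputs)
    return None

def decode(record):
    """Restore the original inputs from the compressed record."""
    first, d, n = record
    return [first + d * i for i in range(n)]

seq = [3, 5, 7, 9, 11]
rec = encode_additive_pattern(seq)  # (3, 2, 5)
assert decode(rec) == seq           # inputs are fully restorable
```

The same scheme generalizes to other operations (identity, multiplication) by swapping the conversion used in the check & in `decode`.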
Compression accumulated over all inputs of a pattern determines its predictive value - the extent & the direction of its projection. Net value of a pattern is its predictive value minus the average predictive value produced by equivalent resources - memory & operations. Predictive power of a system is increased by deleting patterns of negative value.
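A minimal reading of these value definitions (the names, units, & averaging scheme below are my assumptions; the post doesn't specify them):

```python
# Hypothetical bookkeeping for pattern value, assuming predictive value is
# accumulated compression & cost is the memory + operations spent on a pattern.

def net_value(compression, memory, operations, avg_value_per_resource):
    """Net value = predictive value minus the average predictive value
    that the same resources would produce elsewhere in the system."""
    resources = memory + operations
    return compression - avg_value_per_resource * resources

def prune(patterns, avg_value_per_resource):
    """Deleting negative-value patterns raises the system's predictive
    power per unit of resources."""
    return [p for p in patterns
            if net_value(p["compression"], p["memory"], p["ops"],
                         avg_value_per_resource) > 0]

pats = [{"compression": 10, "memory": 2, "ops": 3},  # net = 10 - 5 = 5, kept
        {"compression": 2,  "memory": 3, "ops": 2}]  # net = 2 - 5 = -3, deleted
assert len(prune(pats, avg_value_per_resource=1.0)) == 1
```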
Expanding the range of search for compressive vectors (matches) & increasing syntactic complexity of resulting patterns require resources affordable only to patterns of corresponding predictive value. Thus, recorded patterns should form a hierarchy of compression / search range & syntactic complexity, with each level divided into fixed-range search units.
Higher levels have fewer units with greater range, ultimately leading to a single top-level unit with a global range of search. Patterns are elevated as long as their predictive value exceeds the required resources of the destination level, & deleted when a shift of inputs places them beyond their range of search.
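One way to sketch this elevation rule (the per-level costs & their growth rate are my assumptions, chosen only to show the mechanism):

```python
# Hypothetical hierarchy: each level has a fixed resource cost that grows
# with its range of search, so higher levels hold fewer, more valuable
# patterns, ending in a single global top level.

LEVEL_COSTS = [1, 4, 16, 64]  # assumed: cost grows with search range

def elevate(predictive_value):
    """Return the highest level whose required resources the pattern's
    predictive value still exceeds; -1 means the pattern is deleted."""
    level = -1
    for lvl, cost in enumerate(LEVEL_COSTS):
        if predictive_value > cost:
            level = lvl
        else:
            break
    return level

assert elevate(0.5) == -1   # below level-0 cost: deleted
assert elevate(5) == 1      # exceeds costs 1 & 4, but not 16
assert elevate(100) == 3    # reaches the single top level
```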
A pattern can be predictive only if derived & projected within a given metric system. The only metrics we can use a priori are spatio-temporal: in our environment physical causality is S-T continuous, therefore there is strong compressibility between S-T adjacent elements, which declines with distance/delay.
That means search must proceed in S-T order, starting from minimal-complexity S-T adjacent inputs, such as raw multimedia: single-variable inputs (pixels of a video), ordered initially in one dimension (a scan line). Within an S-T unit the search can be re-ordered by variable types of sufficient predictive value.
This may seem primitive, but that's precisely the point: all the complexity must be learned. Generalization is a reduction, & an intelligent system that starts with higher-level, especially language-level, data can't recreate the semantic context, that is, the generalized real-world 4-dimensional patterns that most words in the language stand for.
I don't believe in combining different methods, because cognition deals with the unknown - we can't a priori split it into different areas, except to the extent that they're sensor/hardware specific, or into levels, except that syntactic complexity of inputs should be sequentially increased.
Any methodological differentiation should be not in kind but in degree, which must be determined by an intelligent system rather than programmed into it. The 'types' of inputs are ultimately the types of empirical objects they represent, & an intelligent system should be able to learn them on its own.
Scalability is of the essence, & a truly scalable method should start from the beginning: the limit of resolution of raw sensory data, from which all higher data types are ultimately derived. I'm currently working on a procedure that would initiate cognition by processing digitized video; here is a (very) rough outline:
Original inputs, representing pixels of a horizontal scan line, consist of 2 variables: brightness (B) & coordinate (C), indicating the order of input. Each input is compared only with the previous input of the same line, because range & dimensionality of search should expand in proportion to a pattern's previous recurrence, which is originally 1 & 0D.
After comparison two more variables are added: length (L) & compression (R), indicating cumulative match of, respectively, dC & B. In case of a match the difference (dB) is preserved & compared with that from the previous comparison, forming higher derivatives. A multi-derivative pattern results: C,L,B,R (C,L,dB,R (C,L,ddB,R... In case of a miss the pattern is terminated & a contrast (a negative pattern) is formed.
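A rough sketch of this first-level procedure as I read it (variable names follow the outline above, but the match criterion, the tolerance, & the exact accumulation of R are my guesses; higher derivatives & contrasts are omitted for brevity):

```python
# Sketch of first-level comparison over one scan line, assuming a "match"
# means the brightness difference dB stays within a fixed tolerance.
# Each pattern accumulates C (start coordinate), L (length), & R
# (cumulative match, standing in for compression).

TOLERANCE = 4  # assumed match threshold on |dB|

def form_patterns(line):
    """Compare each pixel only with the previous pixel of the same line;
    extend the current pattern on a match, terminate it on a miss &
    start a new pattern at the boundary."""
    patterns, current = [], None
    for c, b in enumerate(line):
        if current is None:
            current = {"C": c, "L": 1, "R": 0}
            continue
        dB = b - line[c - 1]
        if abs(dB) <= TOLERANCE:               # match: extend pattern
            current["L"] += 1
            current["R"] += TOLERANCE - abs(dB)
        else:                                   # miss: terminate pattern
            patterns.append(current)
            current = {"C": c, "L": 1, "R": 0}
    if current:
        patterns.append(current)
    return patterns

line = [10, 11, 12, 50, 51, 52, 53]  # two smooth runs split by a sharp edge
ps = form_patterns(line)
assert [(p["C"], p["L"]) for p in ps] == [(0, 3), (3, 4)]
```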
On the next level, 1-dimensional patterns are compared to those of the previous scan line with overlapping horizontal C+L. The same process of encoding repeats here: 2D patterns are formed by adding vertical coordinate & length, as well as R & derivatives for each variable of a 1D pattern.
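The cross-line candidate test (overlapping horizontal C+L) might be checked like this (my guess at the intended comparison; half-open intervals are an assumption):

```python
# Sketch of the second-level candidate test: two 1D patterns from adjacent
# scan lines qualify for 2D pattern formation if their horizontal spans,
# taken as half-open intervals [C, C+L), intersect.

def spans_overlap(p, q):
    """p, q: 1D patterns with start coordinate C & length L."""
    return p["C"] < q["C"] + q["L"] and q["C"] < p["C"] + p["L"]

a = {"C": 0, "L": 3}
b = {"C": 2, "L": 5}
c = {"C": 10, "L": 2}
assert spans_overlap(a, b) is True    # spans [0,3) & [2,7) intersect
assert spans_overlap(a, c) is False   # spans [0,3) & [10,12) don't
```

The same test extends to 2D-to-3D & 3D-to-4D comparison by requiring overlap in each added coordinate.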
Next, 2D patterns are compared with those of adjacent confocal 2D frames of a given base & angle, with overlapping horizontal & vertical C+L, forming 3D patterns with a new set of depth C & L, R, & derivative patterns.
Continuous search is completed by forming temporal 4D patterns from matching 3D patterns of adjacent frames of video with overlapping horizontal, vertical, & depth C+L. Subsequently, completed 4D patterns with above-average compression are compared over discontinuity on higher levels of search, with expanding range of coordinates & increasing degree of compression & syntactic/semantic complexity, eventually achieving & surpassing that of natural languages.
Would appreciate any comments.
Boris.