In response to Ben's paper on goal preservation, I think that identifying 
attractors or fixed points requires that we identify sources of goal drift. 
Here are some:

- Information loss
- Software errors
- Deliberate modification
- Modification through learning
- Evolution
- Noise

Information loss is caused by irreversible operations, for example, the 
assignment statement. The algorithmic complexity of a machine's state (such as 
a set of self-improving processes) cannot increase over time without external 
input. This suggests that the state of having no goals is an attractor.
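
As a toy illustration of this point, here is a small Python sketch (my own, with made-up numbers): a closed, deterministic machine whose update rule contains an irreversible assignment. Without input, the number of distinct states it can occupy only shrinks, and it quickly settles into a small fixed set of states, an attractor.

  # A closed machine with no input. The assignment b = 0 is irreversible,
  # so information about the old value of b is destroyed on every step.
  def step(state):
      a, b = state
      a = (a + b) % 16   # reversible given b
      b = 0              # irreversible assignment: the old b is lost
      return (a, b)

  # Start from every possible 8-bit state and watch the state set shrink.
  states = {(a, b) for a in range(16) for b in range(16)}  # 256 states
  for t in range(5):
      print(t, len(states), "reachable states")
      states = {step(s) for s in states}
  # Drops from 256 states (8 bits) to 16 (4 bits) and stays there.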

Software errors: an agent may have the goal of preserving its goals, but fail 
to do so because of programming errors. Humans make errors, and there is no 
reason to believe that superintelligent beings will be different. Verifying 
nontrivial properties of software is undecidable in general (the halting 
problem reduces to it). This suggests that hard-to-detect bugs will accumulate.
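
To spell out the halting problem point, here is a sketch of the standard reduction in Python (the names are hypothetical, nothing here is a real API): if we had a perfect checker for "this self-modification never touches the goal," we could use it to decide whether an arbitrary program halts.

  def make_update(program, data):
      """Build a self-modification that corrupts the goal iff program halts."""
      def update(agent):
          program(data)              # may run forever
          agent.goal = "corrupted"   # reached only if program(data) halted
      return update

  # A perfect verifier preserves_goal(make_update(p, x)) would tell us
  # whether p(x) halts, so no such verifier can exist for all programs.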

Deliberate modification: friendliness is algorithmically complex, so it will 
require human maintenance. Many situations will arise that were not planned 
for. For example, is it ethical to copy a human and kill the original? Is it 
ethical to turn off or simulate pain in an AI that has some human traits, but 
is not entirely human? Is it ethical to allow wireheading? This suggests a 
dynamic toward increasing algorithmic complexity (like our legal system).

Modification through learning: we could allow the AI to make ethical decisions 
for us on the premise that it is smarter than humans and will therefore make 
more intelligent decisions than we could. But by the same premise, we are not 
intelligent enough to predict where this dynamic will go.

Evolution: some goals may be harmful to the AIs that hold them, and those goals 
will be selected against. Selective pressure will favor goals that promote 
rapid reproduction and acquisition of computing resources.
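
A toy selection model of this (my own assumptions and numbers, not anything from the paper): two kinds of agents share a fixed pool of computing resources. Type A spends most of its cycles pursuing its original goal; type B spends most of its cycles copying itself.

  pool = 1.0                      # total computing resources (normalized)
  share = {"A": 0.99, "B": 0.01}  # initial fraction of the pool held by each
  growth = {"A": 1.1, "B": 1.5}   # copies made per generation (assumed)

  for gen in range(30):
      raw = {k: share[k] * growth[k] for k in share}
      total = sum(raw.values())
      share = {k: pool * raw[k] / total for k in raw}  # renormalize to the pool
  print(share)  # type B ends up holding nearly the whole pool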

Noise results from copying errors. In living organisms there is an equilibrium 
between information gained through selection (about log2(n) bits per birth or 
death, where n is the average number of offspring) and information lost through 
mutation. This equilibrium limits the algorithmic complexity of the genomes of 
higher organisms such as humans to around 10^7 bits.
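
The equilibrium argument can be done as a back-of-the-envelope calculation. Assume (my numbers, for illustration only) that each replication flips each bit independently with probability mu, so mutation destroys about mu*K bits per copy, while selection adds about log2(n) bits per birth or death. Setting the two equal gives K ~ log2(n)/mu:

  from math import log2

  n = 2        # average number of offspring (assumed)
  mu = 1e-7    # per-bit mutation probability per replication (assumed)

  K = log2(n) / mu             # equilibrium complexity in bits
  print(f"K ~ {K:.0e} bits")   # about 1e+07 bits

With these assumed values the equilibrium lands at about 10^7 bits, the same order as the figure above.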

There may be other forces I missed.

To make this concrete, imagine a simple self-improving system, such as a data 
compressor that "wants" to output smaller files, or a linear regression program 
that "wants" to reduce RMS error. Describe an environment where the program 
could rewrite itself (or modify its copies). How would its goals drift, and 
where would they settle?
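
Here is one very stripped-down environment of that kind (my own toy, and it only models the noise source): a regression agent stores its goal, the target weights it tries to fit, as ordinary mutable data, and every generation it is copied with small random errors in the stored goal. The program just measures how far the goal has drifted from the original; where it settles is the open question.

  import random

  random.seed(0)
  original_goal = [1.0, -2.0, 0.5]   # the weights the agent "wants" to fit
  goal = list(original_goal)
  noise = 0.01                       # per-copy perturbation (assumed)

  for gen in range(1001):
      goal = [w + random.gauss(0, noise) for w in goal]  # noisy copy of the goal
      if gen % 200 == 0:
          drift = sum((g - o) ** 2
                      for g, o in zip(goal, original_goal)) ** 0.5
          print(gen, round(drift, 3))
  # The drift behaves like a random walk, growing roughly with sqrt(gen),
  # unless something (selection, error correction) pushes it back.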

 -- Matt Mahoney, [EMAIL PROTECTED]

