Which reward function are you referring to? Reward for what, specifically? 
When the cognitive system is pursuing multiple parallel goals at different 
levels of abstraction, which specific function are you controlling? 

~PM

Date: Tue, 29 Jan 2013 14:58:30 -0600
Subject: Re: [agi] Robots and Slavery
From: [email protected]
To: [email protected]

People have formative years because it's in their genetic best interest to stop 
exploring new options and start exploiting known ones, given a limited lifetime 
and the need to reproduce reliably. We can't reset that explore/exploit 
trade-off in people (yet), but in machines there's no reason to make that 
control inaccessible to ourselves. It's a good thing machines aren't children.

In most RL algorithms there are two key parameters that let learning be 
modulated: the reward-expectation learning/update rate and the exploration 
rate. Raising these two values makes the system learn faster but make more 
mistakes; lowering them makes the system more stable but slower to learn. An 
analysis would have to be done to determine whether the costs/dangers of a 
system's behavioral aberrations due to a misshapen reward function outweigh 
the costs/dangers of raising the learning and exploration rates while the 
system relearns the reward function after modification.
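
To make those two knobs concrete, here is a minimal tabular Q-learning sketch 
(my own illustration, not code from any system discussed here): alpha is the 
reward-expectation update rate and epsilon the exploration rate, both left as 
plain attributes so an operator can raise them after a reward-function change 
and anneal them back down once behavior restabilizes.

import random

class QLearner:
    def __init__(self, actions, alpha=0.1, epsilon=0.05, gamma=0.9):
        self.actions = list(actions)
        self.alpha = alpha      # reward-expectation learning/update rate
        self.epsilon = epsilon  # exploration rate
        self.gamma = gamma      # discount factor
        self.q = {}             # (state, action) -> estimated return

    def choose(self, state):
        # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update; alpha scales how far estimates move.
        best_next = max(self.q.get((next_state, a), 0.0) for a in self.actions)
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + self.alpha * (reward + self.gamma * best_next - old)

# After the reward function is modified, temporarily raise both knobs:
#     agent.alpha, agent.epsilon = 0.5, 0.3   # faster relearning, more mistakes
# then lower them again once behavior restabilizes.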


On Tue, Jan 29, 2013 at 2:39 PM, Piaget Modeler <[email protected]> 
wrote:

People procreate. And for a certain period of time they have influence over 
their creation (children). But then children grow up and take responsibility 
for their own lives, and we no longer have control. It's in those formative 
years that you have influence. 
Similarly, when you create developmental AI, you have some period during the 
formative years to influence the later behavior of the cognitive system. But 
you don't have control, and you wouldn't expect to either. That's why rights 
are important.

Date: Tue, 29 Jan 2013 14:11:05 -0600
Subject: Re: [agi] Robots and Slavery

From: [email protected]
To: [email protected]

If we can build a system capable of determining the value of concepts 
automatically, we can build a system that can readjust those values 
automatically, too. If that's not feasible for the design, it's an unsafe 
design, and you shouldn't expect it to act as you intend. You wouldn't get in 
a car without a steering wheel, would you? Would you trust an even more 
powerful and dangerous machine to just do the right thing, with no controls? 
Let's not build any machines of this uncontrollable nature.
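
As a hedged sketch of what externally readjustable values could look like 
(purely my illustration; the class and names are hypothetical, not from any 
design in this thread): a reward function composed as a weighted sum over 
value features, with the weights exposed as a control surface rather than 
frozen into the system.

class AdjustableReward:
    def __init__(self, weights):
        self.weights = dict(weights)  # value feature -> weight (hypothetical)

    def set_weight(self, feature, value):
        # The "steering wheel": operators can re-weight values at run time.
        self.weights[feature] = value

    def __call__(self, features):
        # Reward is a weighted sum over observed feature magnitudes.
        return sum(self.weights.get(name, 0.0) * x for name, x in features.items())

reward = AdjustableReward({"task_progress": 1.0, "energy_used": -0.1})
print(reward({"task_progress": 2.0, "energy_used": 5.0}))   # 1.5
reward.set_weight("energy_used", -0.5)                      # readjust a value
print(reward({"task_progress": 2.0, "energy_used": 5.0}))   # -0.5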
