Hi Eliezer,

> > It looks like Williams' book is more about the perils of Asimov's
> > Laws than about hard-wiring. As logical constraints, Asimov's Laws
> > suffer from the grounding problem. Any analysis of brains as purely
> > logical runs afoul of the grounding problem. Brains are statistical
> > (or, if you prefer, "fuzzy"), and logic must emerge from statistical
> > processes. That is, symbols must be grounded in sensory experience,
> > reason and planning must be grounded in learning, and goals must be
> > grounded in values.
>
> This solves a *small* portion of the Friendliness problem. It doesn't
> solve all of it.
>
> There is more work to do even after you ground symbols in experience,
> planning in learned models, and goals (what I would call "subgoals")
> in values (what I would call "supergoals"). For example, Prime
> Intellect *does* do reinforcement learning and, indeed, goes on
> evolving its definitions of, for example, "human", as time goes on,
> yet Lawrence is still locked out of the goal system editor and
> humanity is still stuck in a pretty nightmarish system because
> Lawrence picked the *wrong* reinforcement values and didn't give any
> thought about how to fix that. Afterward, of course, Prime Intellect
> locked Lawrence out of editing the reinforcement values, because that
> would have conflicted with the very reinforcement values he wanted to
> edit. This also happens with the class of system designs you propose.
> If "temporal credit assignment" solves this problem I would like to
> know exactly why it does.
The temporal credit assignment problem is the problem whose solution
causes reason and planning to emerge from learning, in order to
simulate the world and hence predict the effect of actions on values.
It isn't specifically about the problem you describe. I'll answer the
question about Lawrence being locked out in my next set of paragraphs.

> > Also, while I advocate hard-wiring certain values of intelligent
> > machines, I also recognize that such machines will evolve (there
> > is a section on "Evolving God" in my book). And as Ben says, once
> > things evolve there can be no absolute guarantees. But I think
> > that a machine whose primary values are for the happiness of all
> > humans will not learn any behaviors to evolve against human
> > interests. Ask any mother whether she would rewire her brain
> > to want to eat her children. Designing machines with primary
> > values for the happiness of all humans essentially defers their
> > values to the values of humans, so that machine values will
> > adapt to evolving circumstances as human values adapt.
>
> Erm... damn. I've been trying to be nice recently, but I can't think
> of any way to phrase my criticism except "Basically we've got a vague
> magical improvement force that fixes all the flaws in your system?"

If you want to be nasty, you'll have to try harder than that. I think
you've been studying friendliness so long you've internalized it.

My approach is not magic. By making machine values depend on human
happiness (I know you don't like the word "machine", but I use it to
mean artifact, and also use "God" to make it clear I'm not talking
about can openers), they are essentially deferred to human values.
There can never be guarantees. So given that I have to trust something,
I put my trust in the happiness expressed by all humans. In fact, I
trust the expression of happiness by all humans a lot more than I trust
any individual (e.g., Lawrence) to modify machine values.
Lawrence may be a good guy, but lots of individuals aren't, and I
certainly won't trust a programmed set of criteria about which
individuals to trust.

> What kind of evolution? How does it work? What does it do?

The world changes through human action, natural action, and in the
future the actions of intelligent machines. Human happiness will change
in response, and the machines will learn new behaviors based on world
changes and human happiness changes. Furthermore, the mental and
physical capabilities of the machines will change, giving them a
broader array of actions for causing human happiness, and more accurate
simulations for predicting human happiness.

> Where does it go?

That's the big question, isn't it? Who can say for sure where
super-intelligent brains responding to the happiness of all humans will
go? In my book I say the machines will simulate all humans and their
interactions (except for those luddites who opt out). I say they will
probably continue the human science program, driven by continuing human
curiosity. They will probably work hard to reduce humans' natural
xenophobia, which is the source of so much unhappiness. And for any
party animals out there, there will probably be lots of really well
produced lowbrow entertainment.

> If you don't know where it ends up, then what forces determine the
> trajectory and why do you trust them?

If I have to trust anything, it's the happiness of all humans. It's
like politics. Winston Churchill said democracy is a terrible form of
government, but it's better than all the others. All the dictators who
killed millions during the twentieth century at least thought they had
good intentions. I don't trust good intentions. I trust collective
decisions, expressed via votes or happiness.

> Why doesn't your system shut off
> the reinforcement mechanism on top-level goals for exactly the same
> reason Prime Intellect locks Lawrence out of the goal system editor.

Goals emerge from values.
Goals will constantly evolve with the situation and with evolving human
emotional responses. I want Lawrence and any individual (fill in the
name of your favorite villain) locked out from special control.

> Why doesn't
> your system wirehead on infinitely increasing the amount of
> "reinforcement" by direct editing its own code?

I don't completely understand your question (is there a typo?), but in
computer systems there is no hard distinction between code and data
(all data needs is an interpreter to become code). So a learning system
essentially re-programs itself by learning new data.

> We are talking about the fate of the human species here.

I'm with you there.

> Someone has to work out the nitty-gritty, not just to
> implement the system, but to even know for any reason beyond pure
> ungrounded hope that Friendliness *can* be made to work. I understand
> that you *hope that* machines will evolve, and that you hope this
> will be beneficial to humanity. Hope is not evidence. As it stands,
> using reinforcement learning alone as a solution to Friendliness can
> be modeled to malfunction in pretty much the same way Prime Intellect
> does. If you have a world model for solving the temporal credit
> assignment problem, exactly the same thing happens. That's the
> straightforward projection. If evolution is supposed to fix this
> problem, you have to explain how.

My approach isn't hope. It's more like the KISS (Keep It Simple,
Stupid) principle. Of course, no one knows how to implement general
reinforcement learning yet, and that won't be simple (there are folks
on this list working hard at it, though). And robustly recognizing
expressions of human happiness and unhappiness will take some work,
too. But the general principle is simple: defer machine values to human
values, as expressed by human happiness.
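The two points above -- that learned data effectively re-programs a
system once an interpreter executes it, and that machine behavior can
defer to expressed human happiness -- can be combined in a minimal,
purely illustrative sketch. All the names, situations, and the reward
rule here are hypothetical, not anyone's actual proposal:

```python
# Purely illustrative sketch: the learned policy is plain data (a dict),
# an interpreter turns that data into behavior, and reinforcement from
# expressed human happiness rewrites the data -- re-programming the
# system without touching its code.

policy = {}  # learned data: situation -> preferred action

def act(situation, options):
    """Interpreter: data becomes behavior; unknown situations fall back
    to the first available option (i.e., the system explores)."""
    return policy.get(situation, options[0])

def reinforce(situation, action, happiness):
    """Learning step: keep only actions humans expressed happiness about."""
    if happiness > 0:
        policy[situation] = action

# A human expresses happiness about "tell a joke" in the "bored" situation.
reinforce("bored", "tell a joke", happiness=+1)
print(act("bored", ["do nothing", "tell a joke"]))  # -> tell a joke
```

The design choice is the one the paragraph describes: nothing in the
system's fixed code names a goal; the goals live in the data, which the
happiness signal keeps rewriting.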
My approach is really a symbiosis between machines, which learn and
simulate the world in ways that are way beyond human capabilities, and
all of humanity, who supply the values by their expressions of
happiness. Hence my interest in the Global Brain Group. Some of its
members think super-intelligence will emerge from collective human
interactions without any need for machines. I think that the limits of
human communication limit the complexity of simulation and learning
that human groups are capable of. But I think the symbiosis of machines
and human values will form a global brain.

Cheers,
Bill