Re: [agi] Re: [singularity] Motivational Systems that are stable
Ben,

I guess the issue I have with your critique is that you say that I have given no details, no rigorous argument, just handwaving, etc.

But you are being contradictory: on the one hand you say that the proposal is vague/underspecified/does not give any arguments, but then, having said that, you go on to make specific criticisms and say that it is wrong on this or that point.

I don't think you can have it both ways. Either you don't see an argument, and rest your case, or you do see an argument and want to critique it. You are trying to do both: you repeatedly make broad accusations about the quality of the proposal ("some very hand-wavy, intuitive suggestions", "you have not given any sort of rigorous argument", "... your intuitive suggestions...", "you did not give any details as to why you think your proposal will 'work'", etc. etc.), but then go on to make specific points about what is wrong with it.

Now, if the specific points you make were valid criticisms, I could perhaps overlook the inconsistency and just address the criticisms. But that is exactly what I just did, and your specific criticisms, as I explained in the last message, were mostly about issues that had nothing to do with the general class of architectures I proposed, but only with weird cases or weird issues that had no bearing on my case.

Since you just dropped most of those issues (except one, which I will address in a moment), I must assume that you accept that I have given a good reply to each of them. But instead of conceding that the argument I gave must therefore have some merit, you repeat -- even more insistently than before -- that there is nothing in the argument, that it is all just vague handwaving etc.

No fair!

This kind of response:

- Your argument is either too vague or I don't understand it.

Would be fine, and I would just try to clarify it in the future.

But this response:

- This is all just handwaving, with no details and no argument.
- It is also a wrong argument, for these reasons:
- [Reasons that are mostly just handwaving or irrelevant].

Is not so good.

*

I will say something about the specific point you make about my claim that as time goes on the system will check new ideas against previous ones to make sure that new ones are consistent with ALL the old ones, so therefore it will become more and more stable.

What you have raised is a minor technical issue, together with some confusion about what exactly I meant:

The ideas being checked against all previous ideas are *not* the incoming general learned concepts (cup, salt, cricket, democracy, sneezes, etc.) but the concepts related to planned actions and the system's base of moral/ethical/motivational concerns. Broadly speaking, it is when there is a new "perhaps I should do this ..." idea that the comparison starts. I did actually say this, but it was a little obscurely worded.

Now, when I said "checked for consistency against all previous ideas" I was speaking rather loosely (my bad). Obviously I would not do this by an exhaustive comparison [please: I don't need to have it explained to me that this is O(n^2)! :-) ]. The mechanism would work something like a parallel terraced scan: issues are represented at different levels of granularity, and if any kind of inconsistency is detected at one of the high (low-granularity) levels, it provokes a focussing on the problem and an elaboration of everything involved in the idea, which then can bring in lots more consideration, potentially resulting in a complete comparison on that one issue. In addition, the system would use various other (monte-carlo-esque) techniques for taking random looks at the implications of some issue, to catch problems that might not get past the top level scan.

Specific example. The system thinks that maybe selling its mother into the white slave trade is a good way to make money.
But this very idea causes simple associations with [white slave trade] to kick in (for example [misery], [brutality], [betrayal], and so on). These simple associations get connected with [mother] and in a moment the system finds that the concept [unhappy mother] sends a big fat negative signal back to the motivational system, waking up the module that is responsible for the [social group attachment] motivation. Pretty soon this kicks in a full-scale reexamination of the entire idea, and when examined in detail it is found to be inconsistent with the system's prime motivations.

So although you made a reasonable point, this is not a technical difficulty that cannot be handled easily.

I note that you did anticipate this reply, when you said "Some heuristic shortcuts must be used to decrease the number of comparisons, and such heuristics introduce the possibility of error...", and then also "The kind of distributed system you are describing seems NOT to solve the
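
The terraced scan and the [mother] example described in this message can be put into a rough sketch. Everything in it is an assumption made for illustration: the association graph, the valence numbers, the alarm threshold, and the function names are hypothetical and do not come from the message, which describes the mechanism only in outline. The sketch does a cheap coarse scan of nearby associations, takes a few random looks at deeper implications, and escalates to a full re-examination only when something alarming turns up.

```python
import random

# Hypothetical association graph: concept -> directly associated concepts.
ASSOCIATIONS = {
    "sell_mother_into_slavery": ["white_slave_trade", "mother", "money"],
    "white_slave_trade": ["misery", "brutality", "betrayal"],
    "mother": ["social_group_attachment"],
    "misery": ["unhappy_mother"],
    "bake_cake_for_mother": ["mother", "cake"],
    "cake": ["happy_mother"],
}

# Hypothetical signals sent back to the motivational system (negative = objection).
VALENCE = {
    "misery": -0.6, "brutality": -0.8, "betrayal": -0.7,
    "unhappy_mother": -1.0, "happy_mother": 0.8, "money": 0.2,
}


def reachable(concept, depth):
    """Return all concepts within `depth` association hops of `concept`."""
    seen = {concept}
    frontier = [concept]
    for _ in range(depth):
        next_frontier = []
        for current in frontier:
            for neighbour in ASSOCIATIONS.get(current, []):
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return seen - {concept}


def terraced_check(idea, alarm=-0.5, max_depth=3, samples=5, rng=None):
    """Coarse scan first; escalate to a full examination only on an alarm."""
    rng = rng or random.Random(0)
    # Coarse level: look only at nearby associations.
    coarse = reachable(idea, 2)
    escalate = any(VALENCE.get(c, 0.0) <= alarm for c in coarse)
    # Random spot checks on deeper implications the coarse scan did not cover.
    if not escalate:
        deeper = sorted(reachable(idea, max_depth) - coarse)
        for c in rng.sample(deeper, min(samples, len(deeper))):
            if VALENCE.get(c, 0.0) <= alarm:
                escalate = True
                break
    if not escalate:
        return "accept"
    # Focused re-examination: elaborate everything involved in the idea.
    total = sum(VALENCE.get(c, 0.0) for c in reachable(idea, max_depth))
    return "reject" if total < 0 else "accept"
```

With this toy data, `terraced_check("sell_mother_into_slavery")` is escalated by the [misery] association and rejected after the full examination, while `terraced_check("bake_cake_for_mother")` passes the coarse scan and is accepted without any deeper work. The sketch says nothing about whether such heuristics stay reliable as the system grows, which is the point under dispute in the thread.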
Re: Re: [agi] Re: [singularity] Motivational Systems that are stable
Hi Richard,

Let me go back to the start of this dialogue...

Ben Goertzel wrote:
> Loosemore wrote:
>> The motivational system of some types of AI (the types you would classify as tainted by complexity) can be made so reliable that the likelihood of them becoming unfriendly would be similar to the likelihood of the molecules of an Ideal Gas suddenly deciding to split into two groups and head for opposite ends of their container.
>
> Wow! This is a very strong hypothesis. I really doubt this kind of certainty is possible for any AI with radically increasing intelligence ... let alone a complex-system-type AI with highly indeterminate internals...
>
> I don't expect you to have a proof for this assertion, but do you have an argument at all?

Your subsequent responses have shown that you do have an argument, but not anything close to a proof.

And, your argument has not convinced me, so far. Parts of it seem vague to me, but based on my limited understanding of your argument, I am far from convinced that AI systems of the type you describe, under conditions of radically improving intelligence, "can be made so reliable that the likelihood of them becoming unfriendly would be similar to the likelihood of the molecules of an Ideal Gas suddenly deciding to split into two groups and head for opposite ends of their container."

At this point, my judgment is that carrying on this dialogue further is not the best expenditure of my time. Your emails are long and complex mixtures of vague and precise statements, and it takes a long time for me to read them and respond to them with even a moderate level of care.

I remain interested in your ideas and if you write a paper or book on your ideas I will read it as my schedule permits. But I will now opt out of this email thread.

Thanks,
Ben

On 10/30/06, Richard Loosemore [EMAIL PROTECTED] wrote:
> Ben, I guess the issue I have with your critique is that you say that I have given no details, no rigorous argument, just handwaving, etc.
Re: [agi] Re: [singularity] Motivational Systems that are stable
Hi,

> There is something about the gist of your response that seemed strange to me, but I think I have put my finger on it: I am proposing a general *class* of architectures for an AI-with-motivational-system. I am not saying that this is a specific instance (with all the details nailed down) of that architecture, but an entire class, an approach.
>
> However, as I explain in detail below, most of your criticisms are that there MIGHT be instances of that architecture that do not work.

No. I don't see why there will be any instances of your architecture that do work (in the sense of providing guaranteeable Friendliness under conditions of radical, intelligence-increasing self-modification).

And you have not given any sort of rigorous argument that such instances will exist.

Just some very hand-wavy, intuitive suggestions, centering on the notion that (to paraphrase) "because there are a lot of constraints, a miracle happens" ;-)

I don't find your intuitive suggestions foolish or anything, just highly sketchy and unconvincing.

I would say the same about Eliezer's attempt to make a Friendly AI architecture in his old, now-repudiated-by-him essay Creating a Friendly AI. A lot in CFAI seemed plausible to me, and the intuitive arguments were more fully fleshed out than yours in your email (naturally, because it was an article, not an email) ... but in the end I felt unconvinced, and Eliezer eventually came to agree with me (though not on the best approach to fixing the problems)...

In a radically self-improving AGI built according to your architecture, the set of constraints would constantly be increasing in number and complexity ... in a pattern based on stimuli from the environment as well as internal stimuli ... and it seems to me you have no way to guarantee, based on the smaller **initial** set of constraints, that the eventual larger set of constraints is going to preserve Friendliness or any other criterion.
> On the contrary, this is a system that grows by adding new ideas whose motivational status must be consistent with ALL of the previous ones, and the longer the system is allowed to develop, the deeper the new ideas are constrained by the sum total of what has gone before.

This does not sound realistic. Within realistic computational constraints, I don't see how an AI system is going to verify that each of its new ideas is consistent with all of its previous ideas.

This is a specific issue that has required attention within the Novamente system. In Novamente, each new idea is specifically NOT required to be verified for consistency against all previous ideas existing in the system, because this would make the process of knowledge acquisition computationally intractable. Rather, it is checked for consistency against those other pieces of knowledge with which it directly interacts. If an inconsistency is noticed, in real-time, during the course of thought, then it is resolved (sometimes by a biased random decision, if there is not enough evidence to choose between two inconsistent alternatives; or sometimes, if the matter is important enough, by explicitly maintaining two inconsistent perspectives in the system, with separate labels, and an instruction to pay attention to resolving the inconsistency as more evidence comes in).

The kind of distributed system you are describing seems NOT to solve the computational problem of verifying the consistency of each new knowledge item with each other knowledge item.

> Thus: if the system has grown up and acquired a huge number of examples and ideas about what constitutes good behavior according to its internal system of values, then any new ideas about new values must, because of the way the system is designed, prove themselves by being compared against all of the old ones.

If each idea must be compared against all other ideas, then cognition has order n^2 where n is the number of ideas. This is not workable.
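
To make the order-n^2 point concrete, here is a small counting sketch. It compares the total number of consistency checks when every new idea is compared against all earlier ideas with the total when each new idea is compared only against a bounded number of items it directly interacts with. The bound `k` and both function names are assumptions for illustration; the message gives no figure for how many items a new idea interacts with, in Novamente or anywhere else.

```python
def exhaustive_comparisons(n):
    """Total checks if each new idea is compared against every earlier idea."""
    return n * (n - 1) // 2


def local_comparisons(n, k):
    """Total checks if each new idea is compared against at most k earlier ideas.

    k is a hypothetical bound on the number of directly interacting items.
    """
    return sum(min(i, k) for i in range(n))
```

For n = 10,000 ideas, the exhaustive scheme needs 49,995,000 comparisons, while the local scheme with a hypothetical k = 20 needs 199,790. The saving comes from skipping most pairs, and any pair that is skipped is a pair in which an inconsistency can go unnoticed.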
Some heuristic shortcuts must be used to decrease the number of comparisons, and such heuristics introduce the possibility of error...

> And I said "ridiculously small chance" advisedly: if 10,000 previous constraints apply to each new motivational idea, and if 9,900 of them say 'Hey, this is inconsistent with what I think is a good thing to do', then it doesn't have a snowball's chance in hell of getting accepted. THIS is the deep potential well I keep referring to.

The problem, as I said, is posing a set of constraints that is both loose enough to allow innovative new behaviors, and tight enough to prevent the wrong behaviors...

> I maintain that we can, during early experimental work, understand the structure of the motivational system well enough to get it up to a threshold of acceptably friendly behavior, and that beyond that point its stability will be self-reinforcing, for the above reasons.

Well, I hope so ;-)

I don't rule out the possibility, but I don't feel you've argued for it convincingly,
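
The 10,000-constraint example quoted above, and the "loose enough / tight enough" worry raised in reply, can both be read off a very small sketch. The acceptance rule and the `max_objection_rate` parameter are hypothetical; neither correspondent states a threshold, so the numbers below only illustrate the trade-off and are not part of either position.

```python
def accepts(objections, total, max_objection_rate=0.05):
    """Accept a new motivational idea only if few applicable constraints object.

    max_objection_rate is a hypothetical tolerance, not a value from the thread.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= objections <= total:
        raise ValueError("objections must be between 0 and total")
    return objections / total <= max_objection_rate
```

With 9,900 objections out of 10,000 the idea is rejected under any plausible tolerance, which is the "deep potential well". The harder cases are near the margin: an idea with 100 objections out of 10,000 is accepted at a tolerance of 0.05 and rejected at a tolerance of 0.0, so the choice of tolerance decides how much novelty gets through and how much risk comes with it.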