Thursday, February 20, 2003, 2:25:54 PM, Ben Goertzel wrote:

BG> The basic situation can be thought of as follows.
<snip>

Thanks, this does clarify things a lot. Your first statement of the problem did leave some things out, though... but, perhaps unsurprisingly, I'm still a bit puzzled. I don't mean to nag, so if you don't have the time, just leave it -- perhaps someone else will volunteer as probability professor...

BG> One thing that complicates the problem is that, in some cases, as well as
BG> inferring probabilities one hasn't been given, one may want to make
BG> corrections to probabilities one HAS been given. For instance, sometimes
BG> one may be given inconsistent information, and one has to choose which
BG> information to accept.

If I'm following this, this corresponds in your second statement to the random point selection leading to *approximations* of probabilities, i.e., we only have samples such as Cliff(Fat, Smelly, Slow, American, Sucks at Math) and Ben(Slender, Smelly, Fast, American, Math Geek), etc., from which the probabilities P(fat|american) etc. are derived.

So, in order to assess the accuracy of the approximations, don't we need to know the number of samples taken, and to have some given confidence level in the randomness of the sampling process? Or is the randomness of sampling more or less a given, and we're dealing with n << t total samples, i.e. we've sampled (n/t) of the population? In that case, all probabilities we've inferred have the same initial "certainty quotient", which depends in a straightforward way on the ratio of n to t...?

Or is the probability *itself* a factor in certainty? I.e., if only 1 in 1,000 people have property X, then the number of people you sample who have property X is "more random" in a 100-person sample than if 1 in 2 people have that property. I.e.,
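To make that "more random" intuition concrete, here's a minimal sketch (my numbers, not from Ben's message) using the plain binomial standard error of an estimated proportion:

```python
import math

def proportion_stderr(k, n):
    """Standard error of the estimated proportion p_hat = k/n,
    using the simple binomial approximation sqrt(p_hat*(1-p_hat)/n)
    (no finite-population correction)."""
    p_hat = k / n
    return math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical sample of 100 people: 48 male, 2 with lung cancer.
se_male = proportion_stderr(48, 100)    # uncertainty of p(male)   ~= 0.48
se_cancer = proportion_stderr(2, 100)   # uncertainty of p(cancer) ~= 0.02

# Relative uncertainty (stderr divided by the estimate itself):
rel_male = se_male / 0.48
rel_cancer = se_cancer / 0.02
```

Interestingly, the *absolute* standard error for the rare property is smaller (the estimate can't wander far from 0), but its *relative* uncertainty is several times worse -- which is one way of cashing out "the 2 gives you less information than the 48".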
if the probability of having lung cancer is 1/300 and the probability of being male is 1/2, then if we have a sample of 100 people (48 male, 2 lung cancer), the 2 is less significant (informative) because it's so small, while the 48 "gives you more information" or "more certainty"... ?

BG> For example, if you're told
BG>    P(male) = .5
BG>    P(young|male) = .4
BG>    P(young) = .1
BG> then something's gotta give, because the first two probabilities imply
BG> P(young) >= .5 * .4 = .2

Right, because out of a population of 1,000 people:

   100 young
   500 males
   200 young males
   ======
   ?? does not compute: 100 young, or >= 200 young ??

So the sample size / population size, and possibly the sampling method, have a large influence here.

BG> Novamente's probabilistic reasoning system handles this problem pretty well,
BG> but one thing we're struggling with now is keeping this "correction of
BG> errors in the premises" under control. If you let the system revise its
BG> premises to correct errors (a necessity in an AGI context), then it can
BG> easily get carried away in cycles of revising premises based on conclusions,
BG> then revising conclusions based on the new premises, and so on, in a chaotic
BG> trajectory leading to meaningless inferred probabilities.

Say we randomly sampled 100 people of the total 1,000 to arrive at the above probabilities: this is what I don't get. In order to find the above probabilities, wouldn't we have to have found, out of 100 samples, that

   10 are young
   50 are males
   20 are young males

?? We couldn't have. So are we separately sampling 100 samples (10 are young, 50 are male), and then sampling a *different* random set of n males, of which 0.2n are young?? Can you (or somebody) sketch me a scenario in which we arrive from some set of samples at "contradictory" probability estimates?

BG> As I said before, this is a very simple incarnation of a problem that takes
BG> a lot of other forms, more complex but posing the same essential challenge.
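One scenario of the kind asked for above (my sketch, hypothetical numbers): estimate the marginals P(male) and P(young) from one small survey, and the conditional P(young|male) from a *separate* survey of males only. Nothing then forces the three noisy estimates to satisfy the hard constraint P(young) >= P(male) * P(young|male), so "contradictory" premises can arise from perfectly honest random sampling:

```python
import random

def survey_estimates(rng):
    """Two independent surveys of a hypothetical population in which
    truly P(male) = 0.5 and P(young) = 0.2, independent of sex."""
    pop = [(rng.random() < 0.5, rng.random() < 0.2) for _ in range(1000)]

    a = rng.sample(pop, 30)                     # survey A: marginals
    p_male = sum(m for m, _ in a) / 30
    p_young = sum(y for _, y in a) / 30

    males = [p for p in pop if p[0]]
    b = rng.sample(males, 15)                   # survey B: males only
    p_young_given_male = sum(y for _, y in b) / 15
    return p_male, p_young, p_young_given_male

def first_contradiction(max_seeds=500):
    """Search random seeds for a run where the estimates violate the
    law-of-total-probability bound P(young) >= P(male)*P(young|male)."""
    for seed in range(max_seeds):
        pm, py, pygm = survey_estimates(random.Random(seed))
        if py < pm * pygm:
            return seed, (pm, py, pygm)
    return None

result = first_contradiction()
```

With these sample sizes a violating seed turns up quickly; no sampling "mistake" is involved, just two samples that don't know about each other.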
From everything you're saying above, I get the sense that:

- there's *some* set of equations governing the *relationship between* the actual values of the probabilities and the estimated probabilities

- these relationships depend on the number of samples and the estimated probabilities (estimated based on those samples), i.e. the certainty of the estimate P(X_i) depends both on the sample size relative to the population, and on how many of the sample exhibited X_i (e.g. 50 males out of 100 "tells you more certainly" about the probability of male than 1 lung cancer out of 100 tells you about the probability of lung cancer)

- you want to converge quickly (computationally cheaply) on some optimum estimate of those relationships (because they involve something nastily nonlinear, somehow)

Somehow I see this ending up as finding a set of bell curves (i.e. their height, spread, and peak) for each estimate. That is to say, I don't see *just* the probability as relevant, but the probability distribution... if I sample 10 people, the curves are all "wider" than if I sample 100 people out of 1,000 total. So the "movability" of the initial estimates/premises depends on sample sizes and the number of hits within those sizes...

Am I completely off-base? Am I saying stupidly obvious and unhelpful things? Stupidly obviously wrong things? Should I shut up and reread basic probability theory, and then some Kolmogorov, before possibly returning to this?

-- Cliff
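P.S. The "wider curves for smaller samples" idea above can be sketched concretely. One standard way (my choice of machinery, not anything from Ben's message) is a Beta posterior over the unknown probability: after k "hits" in n samples, with a uniform prior, the whole curve is Beta(k+1, n-k+1), and its spread shrinks as n grows even when the point estimate k/n stays the same:

```python
import math

def beta_posterior_sd(k, n):
    """Standard deviation of the Beta(k+1, n-k+1) posterior over an
    unknown probability p, after observing k hits in n samples
    (uniform prior). Turns (k, n) into a curve, not just a point."""
    a, b = k + 1, n - k + 1
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

# Same observed frequency (50%), different sample sizes:
wide = beta_posterior_sd(5, 10)      # 5/10   -> wide curve
narrow = beta_posterior_sd(50, 100)  # 50/100 -> much narrower curve
```

On this picture, the "movability" of a premise is just the width of its curve: the 5/10 estimate leaves far more room for later revision than the 50/100 one.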