Thursday, February 20, 2003, 2:25:54 PM, Ben Goertzel wrote:

BG> The basic situation can be thought of as follows.

<snip>

Thanks, this does clarify things a lot.  Your first statement of the
problem did leave some things out, though... but, perhaps
unsurprisingly, I'm still a bit puzzled.

I don't mean to nag, so if you don't have the time, just leave it --
perhaps someone else will volunteer as probability professor...

BG> One thing that complicates the problem is that, in some cases, as well as
BG> inferring probabilities one hasn't been given, one may want to make
BG> corrections to probabilities one HAS been given.  For instance, sometimes
BG> one may be given inconsistent information, and one has to choose which
BG> information to accept.

If I'm following this, this corresponds in your second statement to
the random point selection leading to *approximations* of
probabilities, i.e., we only have samples such as
  Cliff(Fat, Smelly, Slow, American, Sucks at Math)
and
  Ben(Slender, Smelly, Fast, American, Math Geek)
etc.

from which the probabilities P(fat|american) etc. are derived.
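To make sure I'm picturing this right, here's a toy sketch (in Python,
with entirely made-up records and attribute names of my own, nothing to
do with Novamente's internals) of what I mean by "deriving the
probabilities from samples":

def p(pred, sample):
    # fraction of the sampled individuals satisfying pred
    return sum(1 for x in sample if pred(x)) / len(sample)

def p_given(pred, cond, sample):
    # conditional fraction: estimate of P(pred | cond) from the sample
    subset = [x for x in sample if cond(x)]
    return sum(1 for x in subset if pred(x)) / len(subset)

# hypothetical sampled individuals, in the spirit of the records above
sample = [
    {"fat": True,  "american": True},
    {"fat": False, "american": True},
    {"fat": True,  "american": False},
    {"fat": False, "american": True},
]

print("P(american)     ~", p(lambda x: x["american"], sample))
print("P(fat|american) ~", p_given(lambda x: x["fat"],
                                   lambda x: x["american"], sample))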

So, in order to assess the accuracy of the approximations, don't we
need to know the number of samples taken and have some given confidence
in the randomness of the sampling process?

Or is the randomness of sampling more or less a given, and we're
dealing with n << t samples out of a population of t, i.e. we've
sampled a fraction (n/t) of the population?

In that case, all probabilities we've inferred have the same initial
"certainty quotient" which depends in a straightforward way on the
ratio of n to t...?

Or is the probability *itself* a factor in certainty, i.e. if only 1
in 1,000 people have property X, then the number of people in a
100-person sample who have property X is "more random" than if 1 in 2
people have that property.

I.e. if the probability of having lung cancer is 1/300 and the
probability of being male is 1/2, then if we have a sample of
  100 people
   48 male
    2 lung cancer

the 2 is less significant (informative) because it's so small, while
the 48 "gives you more information" or "more certainty"... ?

BG> For example, if you're told

BG> P(male) = .5
BG> P(young|male) = .4
BG> P(young) = .1

BG> then something's gotta give, because the first two probabilities imply
BG> P(young) >= .5*.4 = .2

Right, because out of a population of
  1000 people
   100 young         (P(young) = .1)
   500 males         (P(male) = .5)
   200 young males   (P(young|male) = .4, i.e. .4 * 500)
======
    ?? does not compute: 100 young total, or >= 200 young ??

So the sample size / population size and possibly the sampling method
have a large influence here.
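In code, the check I'm doing in my head is just the identity
P(young) >= P(young & male) = P(young|male) * P(male) (my notation,
nothing more than that):

def check_bound(p_male, p_young_given_male, p_young):
    # P(young) can't be smaller than P(young & male)
    p_young_and_male = p_young_given_male * p_male
    if p_young < p_young_and_male:
        print("inconsistent: P(young) =", p_young,
              "but P(young & male) =", p_young_and_male)
    else:
        print("consistent with the bound")

check_bound(p_male=0.5, p_young_given_male=0.4, p_young=0.1)
# -> inconsistent: P(young) = 0.1 but P(young & male) = 0.2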

BG> Novamente's probabilistic reasoning system handles this problem pretty well,
BG> but one thing we're struggling with now is keeping this "correction of
BG> errors in the premises" under control.  If you let the system revise its
BG> premises to correct errors (a necessity in an AGI context), then it can
BG> easily get carried away in cycles of revising premises based on conclusions,
BG> then revising conclusions based on the new premises, and so on in a chaotic
BG> trajectory leading to meaningless inferred probabilities.

Say we randomly sampled 100 people out of the total 1,000 to arrive
at the above probabilities: this is what I don't get.  In order to
arrive at those probabilities, wouldn't we have had to find, out of
  100 samples      that
   10 are young
   50 are males
   20 are young males
??  We couldn't have -- there can't be more young males than young
people in the sample.

So are we separately sampling
  100 samples
   10 are young
   50 are male

and then sampling a *different* random set of
     n  males  of which
  0.4n  are young
??

Can you (or somebody) sketch me a scenario in which we arrive from
some set of samples at "contradictory" probability estimates?
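Here's my own guess at such a scenario (purely a guess -- not anything
I know about how Novamente actually gathers its premises): if each
probability is estimated from a *different* small random sample, then
even a perfectly consistent population can hand you an
inconsistent-looking set of estimates:

import random

# hypothetical consistent population: 1000 people, 500 male,
# 200 young males, 50 young females
# so P(young) = 0.25, P(male) = 0.5, P(young|male) = 0.4
population = ([{"male": True,  "young": True}]  * 200 +
              [{"male": True,  "young": False}] * 300 +
              [{"male": False, "young": True}]  * 50 +
              [{"male": False, "young": False}] * 450)

def estimate(sample, pred, cond=lambda x: True):
    subset = [x for x in sample if cond(x)]
    return sum(pred(x) for x in subset) / len(subset) if subset else 0.0

violations, trials = 0, 10000
for _ in range(trials):
    sample_a = random.sample(population, 30)  # used only for P(young)
    sample_b = random.sample(population, 30)  # used for P(male), P(young|male)
    p_young = estimate(sample_a, lambda x: x["young"])
    p_male = estimate(sample_b, lambda x: x["male"])
    p_young_given_male = estimate(sample_b, lambda x: x["young"],
                                  cond=lambda x: x["male"])
    if p_young < p_male * p_young_given_male:
        violations += 1

print("bound violated in", violations, "of", trials, "trials")

On my (possibly wrong) reading, runs like this violate the bound a
sizeable fraction of the time -- so "contradictory" premises could just
be what honest small-sample estimates look like.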
   
BG> As I said before, this is a very simple incarnation of a problem that takes
BG> a lot of other forms, more complex but posing the same essential challenge.

From everything you're saying above I get the sense that

- there's *some* set of equations governing the *relationship
  between* the actual values of the probabilities and the estimated
  probabilities
- these relationships depend on the number of samples and the
  estimated probabilities (estimated from those samples),
  i.e. the certainty of the estimate P(X_i) depends both on the
  sample size relative to the population and on how many of the
  sampled individuals exhibited X_i (e.g. 50 males out of 100 "tells
  you more certainly" about the probability of being male than 1 lung
  cancer out of 100 tells you about the probability of lung cancer)
- you want to converge quickly (computationally cheaply) on some
  optimum estimate of those relationships (because they involve
  something nastily nonlinear, somehow)

Somehow I see this ending up as finding a set of bell curves (i.e.
their height, spread and peak) for each estimate.  That is to say, I
don't see *just* the probability as relevant but the whole probability
distribution... if I sample 10 people, the curves are all "wider" than
if I sample 100 people out of 1,000 total.
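One standard way to make that concrete (a Beta-Binomial sketch -- my
assumption, not necessarily how Novamente represents it) is to treat
each estimate as a Beta distribution whose spread shrinks as the sample
grows:

from math import sqrt

def beta_posterior(hits, n, prior_a=1.0, prior_b=1.0):
    # mean and standard deviation of Beta(prior_a + hits, prior_b + misses)
    a = prior_a + hits
    b = prior_b + (n - hits)
    mean = a / (a + b)
    sd = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd

print(beta_posterior(5, 10))    # 5 of 10:   mean 0.5, sd ~0.14  (wide curve)
print(beta_posterior(50, 100))  # 50 of 100: mean 0.5, sd ~0.05  (much narrower)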

So the "movability" of the initial estimates/premises depends on
sample sizes and number of hits within those sizes...

Am I completely off-base?  Am I saying stupidly obvious and unhelpful
things? Stupidly obviously wrong things?  Should I shut up and reread
basic probability theory, and then some Kolmogorov, before possibly
returning to this? 


--
Cliff
