On 11 March 2010 14:21, 'Dragon' Dave McKee <[email protected]> wrote:

Hmmm, I'm tired after a training session this morning and haven't had
time to think this through properly, but I'm not sure that your
approach works.

> So there's N people with a given full name ('Stefan Magdalinski', for 
> example).

... something we don't actually know - unless someone has the relevant
data - but maybe we can guess it.

> There's L registered lobbyists, and V whitehouse visitors.

Ah, but the key thing here is you don't know what V is. You have a
list of names, but you don't know which of them are distinct visitors
- that's part of what we want to be able to estimate.

What you actually want to know is the probability distribution of the
number of visits made by a particular lobbyist. E.g. suppose you know
I am a registered lobbyist and there are 11 "Francis Davey" log
entries. Call the number of times I visited A. You know that:

P(A<0) = P(A>11) = 0

What you want to work out is what the distribution of A is *given* the
data you have. How many of those visits are mine?
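
A minimal sketch of what such a distribution could look like, under an
assumption that is not in the data: that each of the 11 log entries is
independently the lobbyist's with some probability q. Under that
(hypothetical) model A is binomial, and it automatically satisfies
P(A<0) = P(A>11) = 0. The function name and the value q = 0.5 are
illustrative only; estimating q is the hard part.

```python
from math import comb

def visit_distribution(n_entries, q):
    """Return [P(A = 0), ..., P(A = n_entries)].

    Assumes (hypothetically) that each log entry belongs to the
    lobbyist independently with probability q, so A ~ Binomial(n, q).
    """
    return [
        comb(n_entries, k) * q**k * (1 - q) ** (n_entries - k)
        for k in range(n_entries + 1)
    ]

# 11 "Francis Davey" entries; q = 0.5 is a placeholder, not an estimate.
dist = visit_distribution(11, 0.5)
for k, p in enumerate(dist):
    print(f"P(A = {k:2d}) = {p:.4f}")
```

The independence assumption is almost certainly too crude (repeat
visitors make entries correlated), so treat this only as the shape of
the answer being asked for.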

> (the population of america is P, so there's a L/P chance of being a lobbyist,

Agreed - take a random person in the US, their chance of being a
lobbyist is close to L/P.

> and a V/P chance of being a visitor, unless there's a way of reducing this?)

Sadly *this* we can't say, since we don't know how many distinct
people visited (and anyway there are problems with assuming that a
person who isn't a lobbyist has the same chance of being a visitor as
someone who is - this will certainly be false).
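
To see how much the "same chance" assumption matters, here is a rough
sketch with made-up numbers. It assumes N people share a name, exactly
one of them is a lobbyist, and lobbyists and non-lobbyists visit at
different average rates. None of these rates are known; the function
name and values are hypothetical.

```python
def lobbyist_share(n_namesakes, rate_lobbyist, rate_other):
    """Expected fraction of log entries under one name that belong to
    the single lobbyist among n_namesakes people sharing that name.

    rate_lobbyist and rate_other are hypothetical average visit rates
    per person; only their ratio matters.
    """
    lobbyist_weight = rate_lobbyist
    others_weight = (n_namesakes - 1) * rate_other
    return lobbyist_weight / (lobbyist_weight + others_weight)

# Ten namesakes, everyone equally likely to visit: the lobbyist
# accounts for 1 entry in 10.
equal = lobbyist_share(10, 1.0, 1.0)

# Same ten namesakes, but the lobbyist visits 100 times as often:
# the lobbyist now accounts for about 92% of the entries.
skewed = lobbyist_share(10, 100.0, 1.0)

print(equal, skewed)
```

The answer swings from 10% to over 90% purely on the assumed ratio of
visit rates, which is why treating V/P as everyone's chance of being a
visitor is not safe.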

... and I'm afraid it all falls down from there.

I can't see the list of lobbyists' names, though.

-- 
Francis Davey

_______________________________________________
Mailing list [email protected]
Archive, settings, or unsubscribe:
https://secure.mysociety.org/admin/lists/mailman/listinfo/developers-public
