Dear Emanuele,
thank you again for all the very helpful clarifications!
As for the posterior probabilities not summing up to 1, I am afraid I cannot
help much, except to provide some further details. I may very well be doing
something wrong.
As it stands, the exponentials of the log posterior probabilities are far from
summing up to 1 (SUM=0.0622138449).
If it can be of any use, I have uploaded a table here:
https://dl.dropboxusercontent.com/u/58155846/abssom_36s_6cond_Baycvsvm_res.samples.xls
The table reports the log likelihoods and log posterior probabilities for the
203 partitions of my dataset, as given by BayesConfusionHypothesis, and the
posterior probabilities I calculated from them.
All the best,
Marco
------------------------------------------------------------------------
# Code as previously posted:
from mvpa2.suite import *

# fds: the dataset, defined elsewhere
clfsvm = LinearCSVMC()
Baycvsvm = CrossValidation(clfsvm, NFoldPartitioner(), errorfx=None,
    postproc=ChainNode((Confusion(labels=fds.UT), BayesConfusionHypothesis())))
Baycvsvm_res = Baycvsvm(fds)
------------------------------------------------------------------------
Date: Thu, 11 Jul 2013 11:06:03 +0200
From: Emanuele Olivetti <[email protected]>
To: [email protected]
Subject: Re: [pymvpa] Pkg-ExpPsy-PyMVPA Digest, Vol 64, Issue 3
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hi Marco,
Sorry, I missed your reply because of the change in the subject.
The posterior probabilities have to sum up to 1. If that is not the case,
then we should dig into the details.
There might be numerical instabilities in the computation of likelihoods and
posteriors because of the extremely low values involved, but I believe this
is unlikely because I did my best to avoid this problem. So my current best
guess is that the problem may derive from the use of cross-validation. In my
original formulation [0] I did not consider cross-validation (it is now
work in progress). Perhaps Michael (Hanke), who implemented the glue between
the algorithms [1] and PyMVPA, can comment on cross-validation.
With respect to the snippet you sent and according to here
https://github.com/PyMVPA/PyMVPA/blob/master/mvpa2/clfs/transerror.py I
confirm that you are getting the loglikelihood and the log of the
posteriors, as you said.
About the posterior probability of the most likely hypothesis being just
0.014, consider that you have many hypotheses, i.e. 203 (so a problem with 6
classes). If you adopted the uniform prior probability over all hypotheses,
i.e. p(H_i) = 1/203 = 0.00493, then the posterior probability of the most
likely one increased almost 3 times: 0.014 / 0.00493 = 2.84. This means that
the data support that hypothesis more than you believed in your prior.
I don't have the full results of your analysis, but you should check whether
you have a similar increase with other hypotheses or not.
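The arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the variable names are made up for illustration, and the two numbers (203 hypotheses, posterior 0.014) are the ones quoted in this thread.

```python
# Uniform prior over all hypotheses, as assumed in the text above.
n_hypotheses = 203
prior = 1.0 / n_hypotheses

# Posterior probability of the most likely hypothesis, as reported by Marco.
posterior_best = 0.014

# How much the data raised the probability of that hypothesis over the prior.
ratio = posterior_best / prior

print(round(prior, 5))  # 0.00493
print(round(ratio, 2))  # 2.84
```

The same ratio can be computed for every other hypothesis to see which ones gained and which ones lost support relative to the prior.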
About the Bayes factor > 1, consider that different values of the Bayes factor
have different interpretations. In Kass and Raftery (JASA 1995) or here
http://en.wikipedia.org/wiki/Bayes_factor#Interpretation you can find
commonly accepted guidelines for the interpretation of that value. So you
should look at your Bayes factors according to that. If, for example, you
have values not much greater than 1, then the evidence supporting your most
likely hypothesis is weak.
Best,
Emanuele
[0]: http://dx.doi.org/10.1109/prni.2012.14
[1]: https://github.com/emanuele/inference_with_classifiers
PS: yes the docstring may be improved. Consider submitting a pull request ;)
On 07/05/2013 12:30 PM, marco tettamanti wrote:
Dear Emanuele, sorry for the late reply, it took me a while until I could
get back to the data.
Thank you very much for the very helpful clarifications! Shouldn't the
BayesConfusionHypothesis documentation be updated to mention that the
log posterior probabilities are also calculated?
Can you just please confirm that given:
    clfsvm = LinearCSVMC()
    Baycvsvm = CrossValidation(clfsvm, NFoldPartitioner(), errorfx=None,
        postproc=ChainNode((Confusion(labels=fds.UT),
                            BayesConfusionHypothesis())))
    Baycvsvm_res = Baycvsvm(fds)
the 2 columns of values in 'Baycvsvm_res.samples' indeed correspond to,
respectively, the log likelihoods (1st column) and the log posterior
probabilities (2nd column), as in:
    print Baycvsvm_res.fa.stat
    ['log(p(C|H))' 'log(p(H|C))']
I have a couple of further questions: I thought from your reply that the
sum of all p(H_i | CM) should give 1, but this does not seem to be the case
for the inverse log values of the 2nd column. Or is it rather that the sum
of all p(H_i) should give 1?
Also, if the above is correct, and regarding my data specifically: over
203 possible partitions, the most likely hypothesis has a Bayes factor>1
over all competing hypotheses, which I guess should constitute sufficient
evidence to support it. However, the posterior probability of the most
likely hypothesis seems quite small (0.014). Is this something to be
expected?
Thank you a lot again and best wishes, Marco
Date: Tue, 25 Jun 2013 08:48:34 -0400
From: Emanuele Olivetti <[email protected]>
To: [email protected]
Subject: Re: [pymvpa] BayesConfusionHypothesis
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Dear Marco,
Sorry for the late reply, I'm traveling during these days.
BayesConfusionHypothesis, by default, computes the posterior
probabilities of each hypothesis tested on the confusion matrix. As you
correctly report, there is one hypothesis for each possible partition of
the set of the class labels. For example for three class labels, (A,B,C),
there are 5 possible partitions: H_1=((A),(B),(C)), H_2=((A,B),(C)),
H_3=((A,C),(B)), H_4=((A),(B,C)), H_5=((A,B,C)).
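As an illustration of how the number of hypotheses grows, here is a minimal sketch that enumerates all partitions of a set of labels. It is not the enumeration used inside PyMVPA (set_partitions is a hypothetical helper written for this example), and the order of the partitions does not follow the H_1 ... H_5 numbering above.

```python
def set_partitions(labels):
    """Yield every partition of `labels` as a list of blocks (lists)."""
    labels = list(labels)
    if not labels:
        # The empty set has exactly one partition: the empty one.
        yield []
        return
    first, rest = labels[0], labels[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition


three = list(set_partitions("ABC"))
six = list(set_partitions("ABCDEF"))

for p in three:
    print(p)
print(len(three))  # 5
print(len(six))    # 203
```

With 6 class labels the count is 203, which matches the number of partitions mentioned elsewhere in this thread.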
The posterior probability of each hypothesis is computed in the usual way
(let CM be the confusion matrix):
p(H_i | CM) = p(CM | H_i) * p(H_i) / (sum_j p(CM | H_j) * p(H_j))
where p(H_i) is the prior probability of each hypothesis and p(CM | H_i)
is the (integrated) likelihood of each hypothesis. The default value for
p(H_i) is 1/(number of hypotheses), i.e. no hypothesis is preferred. You
can specify a different prior through the "prior_Hs" parameter
of BayesConfusionHypothesis.
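The formula above can be sketched in plain Python, working in log space so that very small likelihoods do not underflow. This is an illustration under stated assumptions, not the PyMVPA implementation: log_posteriors is a hypothetical helper, natural logarithms are assumed, and the log likelihood values are invented.

```python
import math


def log_posteriors(log_likelihoods, log_priors=None):
    """Return log p(H_i | CM) given log p(CM | H_i) and log p(H_i)."""
    n = len(log_likelihoods)
    if log_priors is None:
        # Uniform prior: p(H_i) = 1 / (number of hypotheses).
        log_priors = [-math.log(n)] * n
    # log of the numerator p(CM | H_i) * p(H_i)
    joint = [ll + lp for ll, lp in zip(log_likelihoods, log_priors)]
    # log of the denominator, via the log-sum-exp trick for stability
    m = max(joint)
    log_evidence = m + math.log(sum(math.exp(j - m) for j in joint))
    return [j - log_evidence for j in joint]


# Invented log likelihoods for 5 hypotheses (illustration only).
example = [-1050.2, -1047.9, -1052.3, -1049.0, -1060.7]
log_post = log_posteriors(example)
total = sum(math.exp(lp) for lp in log_post)

print(abs(total - 1.0) < 1e-9)  # the posteriors sum to one
```

Exponentiating the log likelihoods directly (exp(-1050) underflows to 0.0 in double precision) would make the denominator zero, which is why the maximum is subtracted first.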
The measures returned by BayesConfusionHypothesis, i.e. the
posterior probabilities of each hypothesis, quantify how likely each
hypothesis is in the light of the data and of the priors that you assumed.
So those values should be what you are looking for.
If you set "postprob=False" in BayesConfusionHypothesis, you will get
the likelihoods of each model/hypothesis, i.e. p(CM | H_i), instead of
posterior probabilities. This is a different quantity. Note that,
unlike the p(H_i | CM), the p(CM | H_i) do not sum to one. The
likelihoods (each one is an "integrated likelihood", or a
Bayesian likelihood) are useful to compare hypotheses in pairs. For
example, if you want to know how much evidence is in the data in favor of
discriminating all classes, i.e. H_1=((A),(B),(C)), compared to not
discriminating any class, i.e. H_5=((A,B,C)), then you can look at the
ratio B_15 = p(CM|H_1) / p(CM|H_5), which is called the Bayes factor
(similar to the likelihood ratio of the frequentist approach, but note that
the likelihoods are not frequentist likelihoods). If that number is > 1,
then the data support H_1 more than H_5. More detailed
guidelines to interpret the value of the Bayes factor can be found for
example in Kass and Raftery (JASA 1995).
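Since the likelihoods are usually available as logs, a Bayes factor between any two hypotheses a and b can be sketched as below. The function names and the two log likelihood values are invented for illustration, natural logarithms are assumed, and the category boundaries (3, 20, 150) are the ones given by Kass and Raftery (JASA 1995).

```python
import math


def bayes_factor(log_lik_a, log_lik_b):
    """Bayes factor of hypothesis a over b from natural-log likelihoods.

    Note: math.exp overflows for differences above roughly 709; in that
    case compare the log Bayes factor directly instead.
    """
    return math.exp(log_lik_a - log_lik_b)


def kass_raftery_category(bf):
    """Verbal category for the evidence in favor of hypothesis a."""
    if bf < 1:
        return "favors the other hypothesis"
    if bf <= 3:
        return "not worth more than a bare mention"
    if bf <= 20:
        return "positive"
    if bf <= 150:
        return "strong"
    return "very strong"


# Invented log likelihoods of two hypotheses (illustration only).
log_lik_a, log_lik_b = -1047.9, -1050.2
bf_ab = bayes_factor(log_lik_a, log_lik_b)

print(kass_raftery_category(bf_ab))
```

A Bayes factor only slightly above 1 falls in the lowest category, so "greater than 1" on its own is weak support.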
In the paper Olivetti et al (PRNI 2012) I presented the Bayes factor way,
but I believe that looking at the posterior probabilities, which is the
PyMVPA default I proposed, is simpler and clearer, especially in the
case of many hypotheses/partitions. I am describing these things in an
article in preparation.
The parameters "space" and "hypotheses" of BayesConfusionHypothesis have
the following meaning:
- "space" stores the string of the dataset's field where the posterior
probabilities are stored. That dataset is the output of
BayesConfusionHypothesis. You might want to change the default name
"hypothesis". Or not :).
[Marco:] oops, sorry! I should have read a bit further in the documentation
and seen that this is just a name string....
- "hypotheses" may be useful if you want to define your own set of
hypotheses/partitions instead of relying on all possible partitions of
the set of classes. The default value "None" triggers the internal
computation of all possible partitions. If you do not have strong reasons
to change this default behavior, I guess you should stick with the
default value.
Best,
Emanuele Olivetti
On 06/21/2013 08:47 AM, marco tettamanti wrote:
Dear all, first of all I take my first chance to thank the authors for
making such a great software as pymvpa available!
I have some (beginner) questions regarding the
BayesConfusionHypothesis algorithm for multiclass pattern
discrimination.
If I understand it correctly, what the algorithm does is to compare
all possible partitions of classes and it then reports the most likely
partitioning hypothesis to explain the confusion matrix (i.e. highest
log likelihood among those of all possible hypotheses, as stored in the
.samples attribute).
Apart from being happy to see confirmed my hypothesis of all classes
being discriminable from each other, is there any way to obtain or
calculate some measures of how likely it is that the most likely
hypothesis is truly strongly/weakly superior to some or all of the
alternative hypotheses? For instance, Olivetti et al (PRNI 2012) state
that a BF>1 is sufficient to support H1 over H0 and report Bayes Factor
and binomial tests in tables.
I assume I should know the answer, so forgive me for my poor
statistics.
On a related matter: I see from the BayesConfusionHypothesis
documentation, that there should be parameters to define a hypothesis
space (space=) or some specific hypotheses (hypotheses=). Could anybody
please provide some examples on how to fill in these parameters?
Thank you and all the best, Marco
------------------------------
Subject: Digest Footer
_______________________________________________ Pkg-ExpPsy-PyMVPA mailing
list [email protected]
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa
------------------------------
End of Pkg-ExpPsy-PyMVPA Digest, Vol 64, Issue 3
************************************************ .
--
Marco Tettamanti, Ph.D.
Nuclear Medicine Department & Division of Neuroscience
San Raffaele Scientific Institute
Via Olgettina 58
I-20132 Milano, Italy
Phone ++39-02-26434888
Fax ++39-02-26434892
Email: [email protected]
Skype: mtettamanti