[Scikit-learn-general] How should I deal with log(0) when calculating Mutual Information?

Robert Layton Thu, 13 Oct 2011 20:10:56 -0700

I'm working on adding Adjusted Mutual Information, and need to calculate the
Mutual Information.
I think I have the algorithm itself correct, except that whenever an entry
of the contingency matrix is 0, a nan appears and propagates through
the code.
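
A minimal reproduction of the problem, assuming a made-up 2x2 table rather than real clustering output: np.log(0) evaluates to -inf, multiplying that by the zero count gives nan, and a single nan makes the whole sum nan.

```python
import numpy as np

# Hypothetical 2x2 contingency table with one empty cell.
contingency = np.array([[3.0, 0.0],
                        [1.0, 2.0]])

# Silence the RuntimeWarnings NumPy would otherwise emit for this demo.
with np.errstate(divide="ignore", invalid="ignore"):
    logs = np.log(contingency)    # the empty cell becomes -inf
    terms = contingency * logs    # 0 * -inf is nan
    total = np.sum(terms)         # one nan poisons the whole sum
```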



Sample code on the net [1] uses eps = np.finfo(float).eps. Should I do
this, adding eps to anything that is a denominator or an argument to log?
Is there a better way?
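
For concreteness, the eps approach might look like the sketch below. The joint distribution is a made-up example and mi_eps is an illustrative name, not something taken from the sample code in [1]. Adding eps keeps every log argument strictly positive, so a zero cell contributes 0 * (finite number) = 0 rather than nan; nonzero terms shift only slightly as long as the probabilities are much larger than eps.

```python
import numpy as np

eps = np.finfo(float).eps

# Hypothetical joint distribution P(i, j) with one zero cell.
pij = np.array([[0.5, 0.0],
                [0.25, 0.25]])
pi = pij.sum(axis=1)          # row marginals
pj = pij.sum(axis=0)          # column marginals
pi_pj = np.outer(pi, pj)      # product of marginals

# eps keeps both log arguments strictly positive.
mi_eps = np.sum(pij * (np.log(pij + eps) - np.log(pi_pj + eps)))
```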

[1]
http://blog.sun.tc/2010/10/mutual-informationmi-and-normalized-mutual-informationnmi-for-numpy.html
FYI: My current code:
import numpy as np

def mutual_information(labels_true, labels_pred, contingency=None):
    if contingency is None:
        labels_true, labels_pred = check_clusterings(labels_true,
                                                     labels_pred)
        contingency = contingency_matrix(labels_true, labels_pred)
    # Joint distribution P(i, j); dividing by a float avoids integer division
    pij = contingency / float(np.sum(contingency))
    # Calculate P(i) for all i and P'(j) for all j
    pi = np.sum(pij, axis=1)
    pj = np.sum(pij, axis=0)
    # Product of pi and pj for denominator
    pi_pj = np.outer(pi, pj)
    # Remembering that log(x/y) = log(x) - log(y).
    # np.log(pij) is -inf wherever the contingency matrix is 0, and
    # 0 * -inf is nan: this is the problem described above.
    mi = np.sum(pij * (np.log(pij) - np.log(pi_pj)))
    return mi
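
One possible alternative to eps, sketched here under the usual convention that 0 * log(0) is taken as 0, is to sum only over the nonzero cells. The function name mutual_information_from_contingency is hypothetical, and it takes the contingency table directly so that it does not depend on the helpers used above. This is a sketch, not a vetted implementation.

```python
import numpy as np

def mutual_information_from_contingency(contingency):
    """Mutual information (in nats) of a contingency table of counts."""
    contingency = np.asarray(contingency, dtype=float)
    pij = contingency / contingency.sum()   # joint distribution P(i, j)
    pi = pij.sum(axis=1)                    # row marginals P(i)
    pj = pij.sum(axis=0)                    # column marginals P'(j)
    pi_pj = np.outer(pi, pj)
    # Keep only cells with nonzero probability. Wherever pij > 0 the
    # corresponding marginals are also > 0, so neither log sees a zero.
    nz = pij > 0
    return np.sum(pij[nz] * (np.log(pij[nz]) - np.log(pi_pj[nz])))
```

Compared with adding eps, masking leaves the nonzero terms unperturbed; an empty cell simply contributes nothing to the sum.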

