oh you're right, bad example. so it *is* a dangerous function after all ;-)
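To make the failure concrete for myself, here is a minimal sketch of the xlogy-style helper Josef describes below, assuming the convention is only "x*log(y) = 0 where x == 0" and never "log(0) = 0". The names `xlogy` and `entropy` are hypothetical, and I use `math.lgamma` instead of `gammaln` (so `a` and `b` must be scalars here) just to keep the sketch dependent on NumPy alone.

```python
from math import lgamma

import numpy as np


def xlogy(x, y):
    """Return x * log(y), defined as 0 wherever x == 0.

    Hypothetical helper: unlike log0, a bare log(0) is still -inf.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Where x == 0, swap y for 1 so the product is exactly 0 * log(1) = 0.
    safe_y = np.where(x == 0., 1., y)
    # log(0) with x != 0 is meant to give -inf, so silence that warning only.
    with np.errstate(divide='ignore'):
        return x * np.log(safe_y)


def entropy(p):
    """Entropy in nats of a discrete distribution; zero entries contribute 0."""
    p = np.asarray(p, dtype=float)
    return -np.sum(xlogy(p, p))


def log_beta_pdf(x, a, b):
    """Log pdf of Beta(a, b) at x, for scalar a and b (sketch only)."""
    # lgamma stands in for scipy.special.gammaln; that swap is an assumption.
    return (lgamma(a + b) - lgamma(a) - lgamma(b)
            + xlogy(a - 1., x) + xlogy(b - 1., 1. - x))
```

With this version, `log_beta_pdf(0., 0.5, 0.5)` diverges to `inf` instead of staying finite as it does with `log0`, while the `a = 1` case at `x = 0` stays at `log(0.5)`.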
On Sat, Oct 15, 2011 at 9:26 PM, <[email protected]> wrote:
> On Sat, Oct 15, 2011 at 4:12 PM, Pietro Berkes <[email protected]> wrote:
>> On Sat, Oct 15, 2011 at 9:07 PM, <[email protected]> wrote:
>>> On Sat, Oct 15, 2011 at 3:57 PM, Pietro Berkes <[email protected]> wrote:
>>>> I wish there was a native numpy function for this case, which is
>>>> fairly common in information theory quantities.
>>>> As a workaround, I sometimes use these reasonably efficient utility
>>>> functions:
>>>>
>>>> def log0(x):
>>>>     """Robust 'entropy' logarithm: log(0.) = 0."""
>>>>     return np.where(x==0., 0., np.log(x))
>>>>
>>>>
>>>> def log0_no_warning(x):
>>>>     """Robust 'entropy' logarithm: log(0.) = 0.
>>>>
>>>>     This version does not raise any warning when values of x=0. are first
>>>>     encountered. However, it is slightly less efficient."""
>>>>     with np.errstate(divide='ignore'):
>>>>         res = np.where(x==0., 0., np.log(x))
>>>>     return res
>>>>
>>>
>>> I think the function is quite dangerous if you take it out of the
>>> context of information measures
>>>
>>>>>> np.log(0)
>>> -inf
>>>
>>> The equivalent functions that I used were all for xlogy
>>>
>>> res = np.where(x==0., 0., x*np.log(y))
>>>
>>>
>>> Just my 2c from other packages.
>>
>> Well it is useful in other contexts, e.g. to compute the log pdf of a
>> beta distribution:
>>
>> from scipy.special import gammaln
>>
>> def log_beta_pdf(x, a, b):
>>     """Return the natural logarithm of the Beta(a,b) distribution at x."""
>>     return (gammaln(a+b) - gammaln(a) - gammaln(b)
>>             + (a-1.)*log0(x) + (b-1.)*log0(1.-x))
>
> not here:
>
>>>> from scipy import stats
>>>> stats.beta._logpdf(0, 0.5, 0.5)
> inf
>>>> stats.beta._logpdf(1e-15, 0.5, 0.5)
> 16.124658311605941
>>>> stats.beta._logpdf(1e-30, 0.5, 0.5)
> 33.394046509061283
>>>> stats.beta._logpdf(1e-100, 0.5, 0.5)
> 113.98452476385289
>>>> stats.beta._logpdf(1e-500, 0.5, 0.5)
> inf
>>>> stats.beta._logpdf(1e-300, 0.5, 0.5)
> 344.24303406325743
>
> 0log0 only if a=1 or b=1 and x is 0 or 1
>
> or gamma: https://github.com/scipy/scipy/pull/5
>
> (bug in scipy 0.9:
>>>> stats.beta._logpdf(1e-300, 1, 0.5)
> -0.69314718055994529
>>>> stats.beta._logpdf(0, 1, 0.5)
> nan
>>>> np.log(stats.beta._pdf(0, 1, 0.5))
> -0.69314718055994529
> )
>
> Josef
>
>>
>> I agree that it could have a more explicit name, like entropy_log(x).
>>
>>>
>>> Josef
>>>
>>>>
>>>> On Fri, Oct 14, 2011 at 10:31 AM, Olivier Grisel
>>>> <[email protected]> wrote:
>>>>> 2011/10/14 Robert Layton <[email protected]>:
>>>>>> I'm working on adding Adjusted Mutual Information, and need to
>>>>>> calculate the Mutual Information.
>>>>>> I think I have the algorithm itself correct, except for the fact that
>>>>>> whenever the contingency matrix is 0, a nan happens and propagates
>>>>>> through the code.
>>>>>>
>>>>>> Sample code on the net [1] uses an eps=np.finfo(float).eps. Should I do
>>>>>> this, adding eps to anything that is a denominator or parameter to log?
>>>>>> Is there a better way?
>>>>>
>>>>> I would rather filter out any entry that has a 0.0 in the denominator
>>>>> before the final sum using array masking.
>>>>>
>>>>> BTW, thanks for tackling this.
>>>>>
>>>>> --
>>>>> Olivier
>>>>> http://twitter.com/ogrisel - http://github.com/ogrisel
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> All the data continuously generated in your IT infrastructure contains a
>>>>> definitive record of customers, application performance, security
>>>>> threats, fraudulent activity and more. Splunk takes this data and makes
>>>>> sense of it. Business sense. IT sense. Common sense.
>>>>> http://p.sf.net/sfu/splunk-d2d-oct
>>>>> _______________________________________________
>>>>> Scikit-learn-general mailing list
>>>>> [email protected]
>>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
