Oh, you're right, bad example. So it *is* a dangerous function after all ;-)


On Sat, Oct 15, 2011 at 9:26 PM,  <[email protected]> wrote:
> On Sat, Oct 15, 2011 at 4:12 PM, Pietro Berkes <[email protected]> wrote:
>> On Sat, Oct 15, 2011 at 9:07 PM,  <[email protected]> wrote:
>>> On Sat, Oct 15, 2011 at 3:57 PM, Pietro Berkes <[email protected]> wrote:
>>>> I wish there were a native numpy function for this case, which is
>>>> fairly common when computing information-theoretic quantities.
>>>> As a workaround, I sometimes use these reasonably efficient utility
>>>> functions:
>>>>
>>>> import numpy as np
>>>>
>>>> def log0(x):
>>>>    """Robust 'entropy' logarithm: log(0.) = 0."""
>>>>    # np.log is still evaluated at x == 0, so NumPy may warn here.
>>>>    return np.where(x==0., 0., np.log(x))
>>>>
>>>>
>>>> def log0_no_warning(x):
>>>>    """Robust 'entropy' logarithm: log(0.) = 0.
>>>>
>>>>    This version does not raise any warning when values of x=0. are first
>>>>    encountered. However, it is slightly less efficient."""
>>>>    with np.errstate(divide='ignore'):
>>>>        res = np.where(x==0., 0., np.log(x))
>>>>    return res
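
A possible variant, sketched under the assumption that x can be converted to a float array: NumPy ufuncs accept `out=` and `where=` arguments, so the log can be skipped at the zero entries rather than computed and then discarded. The name `log0_masked` is made up for this illustration and is not an existing NumPy function.

```python
import numpy as np

def log0_masked(x):
    """Entropy-style logarithm with log(0.) = 0 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    # Start from zeros; entries where x == 0 are never touched by np.log.
    out = np.zeros_like(x)
    np.log(x, out=out, where=(x != 0.))
    return out
```

Since np.log is not evaluated at the masked entries, the divide-by-zero warning should not be triggered for them, and no errstate block is needed.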
>>>>
>>>
>>> I think the function is quite dangerous if you take it out of the
>>> context of information measures
>>>
>>> >>> np.log(0)
>>> -inf
>>>
>>> The equivalent functions that I used were all for xlogy
>>>
>>> res = np.where(x==0., 0., x*np.log(y))
>>>
>>>
>>> Just my 2c from other packages.
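
To make the xlogy idea concrete, here is a minimal NumPy-only sketch. The names `xlogy_np` and `entropy` are invented for this example. The point is that the zero convention is attached to the product x*log(y), so a bare log(0) with a nonzero coefficient still comes out as -inf.

```python
import numpy as np

def xlogy_np(x, y):
    """x * log(y), with the convention that the result is 0 where x == 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # 0 * log(0) evaluates to nan before np.where discards it,
    # so both the divide and the invalid warnings are silenced here.
    with np.errstate(divide='ignore', invalid='ignore'):
        return np.where(x == 0., 0., x * np.log(y))

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    return -np.sum(xlogy_np(p, p))
```

With this, entropy([0.5, 0.5, 0.]) equals log(2), while xlogy_np(2., 0.) is still -inf.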
>>
>> Well it is useful in other contexts, e.g. to compute the log pdf of a
>> beta distribution:
>>
>> from scipy.special import gammaln
>>
>> def log_beta_pdf(x, a, b):
>>    """Return the natural logarithm of the Beta(a,b) distribution at x."""
>>    return (gammaln(a+b) - gammaln(a) - gammaln(b)
>>            + (a-1.)*log0(x) + (b-1.)*log0(1.-x))
>
> not here:
>
> >>> from scipy import stats
> >>> stats.beta._logpdf(0, 0.5, 0.5)
> inf
> >>> stats.beta._logpdf(1e-15, 0.5, 0.5)
> 16.124658311605941
> >>> stats.beta._logpdf(1e-30, 0.5, 0.5)
> 33.394046509061283
> >>> stats.beta._logpdf(1e-100, 0.5, 0.5)
> 113.98452476385289
> >>> stats.beta._logpdf(1e-500, 0.5, 0.5)
> inf
> >>> stats.beta._logpdf(1e-300, 0.5, 0.5)
> 344.24303406325743
>
> A 0*log(0) term only arises if a=1 and x=0, or b=1 and x=1
>
> or gamma: https://github.com/scipy/scipy/pull/5
>
> (bug in scipy 0.9:
> >>> stats.beta._logpdf(1e-300, 1, 0.5)
> -0.69314718055994529
> >>> stats.beta._logpdf(0, 1, 0.5)
> nan
> >>> np.log(stats.beta._pdf(0, 1, 0.5))
> -0.69314718055994529
> )
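
For the record, a sketch of how the beta example could be written so that it matches the numbers above. It assumes a SciPy release that provides `scipy.special.xlogy` and `scipy.special.xlog1py` (both were added after this thread was written), and it is only an illustration, not a claim about how `stats.beta` is implemented.

```python
import numpy as np
from scipy.special import gammaln, xlogy, xlog1py

def log_beta_pdf(x, a, b):
    """Natural log of the Beta(a, b) density at x, for 0 <= x <= 1."""
    log_norm = gammaln(a + b) - gammaln(a) - gammaln(b)
    # xlogy(a-1, x) is 0 when a == 1 and (a-1)*log(x) otherwise,
    # so log(0) = -inf is kept whenever its coefficient is nonzero.
    # xlog1py(b-1, -x) does the same for (b-1)*log(1 - x).
    return log_norm + xlogy(a - 1., x) + xlog1py(b - 1., -x)
```

Here log_beta_pdf(0., 0.5, 0.5) is inf, as it should be, and log_beta_pdf(0., 1., 0.5) is log(0.5) rather than nan.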
>
> Josef
>
>>
>> I agree that it could have a more explicit name, like entropy_log(x).
>>
>>
>>
>>
>>>
>>> Josef
>>>
>>>>
>>>>
>>>> On Fri, Oct 14, 2011 at 10:31 AM, Olivier Grisel
>>>> <[email protected]> wrote:
>>>>> 2011/10/14 Robert Layton <[email protected]>:
>>>>>> I'm working on adding Adjusted Mutual Information, and need to
>>>>>> calculate the Mutual Information.
>>>>>> I think I have the algorithm itself correct, except for the fact that
>>>>>> whenever an entry of the contingency matrix is 0, a nan appears and
>>>>>> propagates through the code.
>>>>>>
>>>>>> Sample code on the net [1] uses an eps=np.finfo(float).eps. Should I do
>>>>>> this, adding eps to anything that is a denominator or parameter to log?
>>>>>> Is there a better way?
>>>>>
>>>>> I would rather filter out any entry that has a 0.0 in the denominator
>>>>> before the final sum using array masking.
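
A rough sketch of what the masking approach might look like for mutual information computed from a contingency matrix. The function name and signature are assumptions made for this example, not the code that was eventually written for scikit-learn.

```python
import numpy as np

def mutual_information(contingency):
    """Mutual information (in nats) from a 2-D table of co-occurrence counts."""
    contingency = np.asarray(contingency, dtype=float)
    pij = contingency / contingency.sum()      # joint distribution
    pi = pij.sum(axis=1, keepdims=True)        # row marginals, shape (n, 1)
    pj = pij.sum(axis=0, keepdims=True)        # column marginals, shape (1, m)
    outer = pi * pj                            # product of marginals, shape (n, m)
    # Keep only the nonzero joint entries: zero cells contribute nothing
    # to the sum, and a nonzero cell implies nonzero marginals.
    mask = pij > 0
    return float(np.sum(pij[mask] * np.log(pij[mask] / outer[mask])))
```

No eps is needed: the zero cells are dropped before the log and the division are ever evaluated. For example, [[1, 1], [1, 1]] gives 0 and [[2, 0], [0, 2]] gives log(2).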
>>>>>
>>>>> BTW, thanks for tackling this.
>>>>>
>>>>> --
>>>>> Olivier
>>>>> http://twitter.com/ogrisel - http://github.com/ogrisel
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> All the data continuously generated in your IT infrastructure contains a
>>>>> definitive record of customers, application performance, security
>>>>> threats, fraudulent activity and more. Splunk takes this data and makes
>>>>> sense of it. Business sense. IT sense. Common sense.
>>>>> http://p.sf.net/sfu/splunk-d2d-oct
>>>>> _______________________________________________
>>>>> Scikit-learn-general mailing list
>>>>> [email protected]
>>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>>