[ https://issues.apache.org/jira/browse/MATH-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phil Steitz updated MATH-984: ----------------------------- Fix Version/s: 3.3 > Incorrect (bugged) generating function getNextValue() in > .random.EmpiricalDistribution > -------------------------------------------------------------------------------------- > > Key: MATH-984 > URL: https://issues.apache.org/jira/browse/MATH-984 > Project: Commons Math > Issue Type: Bug > Affects Versions: 3.2, 3.1.1 > Environment: all > Reporter: Radoslav Tsvetkov > Assignee: Phil Steitz > Fix For: 3.3 > > > The generating function getNextValue() in > org.apache.commons.math3.random.EmpiricalDistribution > will generate wrong values for all Distributions that are single tailed or > limited. For example Data which are resembling Exponential or Lognormal > distributions. > The problem could be easily seen in code and tested. > In last version code > ... > 490 return getKernel(stats).sample(); > ... > it samples from Gaussian distribution to "smooth" in_the_bin. Obviously > Gaussian Distribution is not limited and sometimes it does generates numbers > outside the bin. In the case when it is the last bin it will generate wrong > numbers. > For example for empirical non-negative data it will generate negative rubbish. > Additionally the proposed algorithm boldly returns only the mean value of > the bin in case of one value! This last makes the generating function > unusable for heavy tailed distributions with small number of values. (for > example computer network traffic) > On the last place usage of Gaussian soothing in the bin will change greatly > some empirical distribution properties. > The proposed method should be reworked to be applicable for real data which > have often limited ranges. (either non-negative or both sides limited) -- This message was sent by Atlassian JIRA (v6.1.5#6160)