Hello.
On Thu, 11 Sep 2014 14:29:49 -0400, Evan Ward wrote:
Hi,
A while ago I brought up the idea of adding residual editing (aka data editing, outlier rejection, robust regression) to our non-linear least squares implementations.[1] As the name suggests, the idea is to de-weight observations that don't match the user's model. There are several ways to do this, including choosing a fixed cutoff, a fixed standard-deviation cutoff, or reducing a residual's weight based on its magnitude.[2]
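To make the three schemes concrete, here is a small sketch in Java. The class and method names are my own for illustration; they are not part of the Commons Math API, and the magnitude-based scheme is shown here as a Huber-style weight, which is one common choice among several.

```java
/** Sketch of three residual-editing schemes (hypothetical names, not Commons Math API). */
public class ResidualEditing {

    /** Fixed cutoff: weight 0 if |r| exceeds the cutoff, else 1. */
    public static double fixedCutoffWeight(double residual, double cutoff) {
        return Math.abs(residual) > cutoff ? 0.0 : 1.0;
    }

    /** Fixed standard-deviation cutoff: reject residuals beyond k sigma. */
    public static double sigmaCutoffWeight(double residual, double sigma, double k) {
        return Math.abs(residual) > k * sigma ? 0.0 : 1.0;
    }

    /** Magnitude-based de-weighting, here a Huber-style weight: full weight
     *  inside the threshold, decaying as threshold/|r| outside it. */
    public static double huberWeight(double residual, double threshold) {
        double a = Math.abs(residual);
        return a <= threshold ? 1.0 : threshold / a;
    }
}
```

The first two schemes are hard accept/reject decisions; the third de-weights smoothly, which tends to behave better when residuals sit near the boundary.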
However we add the data editing feature, I think it will cause backward incompatibilities with the released API. I've outlined below the two options I see. I'm open to other ideas as well.
1. Replace edited residuals with 0's in the residual vector and Jacobian (i.e. apply a 0 weight). This has the advantage of being simple to implement, and our existing optimizers are already able to handle it. The downside is evident when the user tries to obtain the number of residuals that were edited: it is hard to tell the difference between an edited residual, an a priori zero weight, and a model evaluation where the residual and gradient are, in fact, zero. We can provide easy access to the number of edited residuals by adding a method to the Evaluation interface. (This is what I implemented in the patch in the original thread.) Now that the code has been released, though, this would cause a backward incompatibility for some advanced users. Most users will likely use the included factory and builder methods to define their LeastSquaresProblem (LSP), and those users would not be affected by the change. Only users that provide a custom implementation of LSP.Evaluation would be affected.
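A toy sketch of option 1 (hypothetical helper class, not the actual LSP.Evaluation interface): edited residuals are zeroed in place, and the count that the new Evaluation method would expose is tracked alongside.

```java
/** Option 1 sketch: zero out edited residuals and track the count
 *  (hypothetical helper, not the actual LSP.Evaluation interface). */
public class ZeroWeightEdit {
    private final double[] residuals;
    private int editedCount;

    public ZeroWeightEdit(double[] residuals) {
        this.residuals = residuals.clone();
    }

    /** Zero every residual whose magnitude exceeds the cutoff; the same
     *  zeroing would be applied row-wise to the Jacobian. */
    public void edit(double cutoff) {
        editedCount = 0;
        for (int i = 0; i < residuals.length; i++) {
            if (Math.abs(residuals[i]) > cutoff) {
                residuals[i] = 0.0;
                editedCount++;
            }
        }
    }

    /** The extra accessor that option 1 would add to the Evaluation interface. */
    public int getEditedCount() {
        return editedCount;
    }

    public double[] getResiduals() {
        return residuals.clone();
    }
}
```

Without the extra accessor, a zero in getResiduals() is ambiguous, which is exactly the downside described above.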
2. Remove edited residuals from the residual vector and Jacobian, so that the resulting vector and matrix have fewer rows. The advantage here is that the user can compare the length of the residual vector in the Optimum to the number of observations in the LSP to determine the number of edited residuals. The problem is that returning vectors/matrices with different sizes from LSP.evaluate() would violate the contract. Additionally, we would have to modify our existing optimizers to deal with the variable lengths. For GaussNewton the modification would be small, but for LevenbergMarquardt I would likely have to re-write it, since I don't understand the code (not for lack of trying :P ). Users that implement LeastSquaresOptimizer would likely have to modify their code as well.
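A toy sketch of option 2 (again a hypothetical helper, not Commons Math code): edited rows are dropped, so the returned vector is shorter, and the edited count falls out of the length difference.

```java
import java.util.Arrays;

/** Option 2 sketch: drop edited rows so the returned residual vector (and,
 *  analogously, each corresponding Jacobian row) shrinks. Hypothetical
 *  helper, not Commons Math code. */
public class RowDropEdit {

    /** Return a new, shorter residual vector with edited rows removed. */
    public static double[] dropEdited(double[] residuals, double cutoff) {
        double[] kept = new double[residuals.length];
        int n = 0;
        for (double r : residuals) {
            if (Math.abs(r) <= cutoff) {
                kept[n++] = r;
            }
        }
        return Arrays.copyOf(kept, n);
    }
}
```

The number of edited residuals is then simply residuals.length minus the returned length, but every consumer of the vector/matrix must now cope with a size that varies between evaluations.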
To summarize: in both cases, users that only use the provided [math] classes would not have to modify their code, while users that provide custom implementations of [math] interfaces would. I would like to get this feature wrapped up for the next release. Please let me know if you have a preference for either implementation and whether there are any other issues I should consider.
Compatibility breaks cannot occur in minor releases.
The next major release should not occur before deprecated classes are all replaced. [I'm thinking about the optimizers, for which the fluent API should be implemented based on your design of NLLS.]
It would be nice to recode the whole "LevenbergMarquardtOptimizer" in full OO Java, but it should be implemented and tested before any new feature is added to the mix.
Do I understand correctly that in the "robust" fit, the weights are modified during the optimization? If so, would the algorithms still be "standard"?
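For context on that question: the usual formulation of such robust fits is iteratively reweighted least squares (IRLS), where the weights are recomputed from the current residuals before each standard weighted least-squares step. A toy one-parameter illustration (my own example with Huber-style weights, not Commons Math code):

```java
/** Toy IRLS sketch: robust location estimate with Huber-style weights
 *  (illustration only, not Commons Math code). */
public class IrlsDemo {

    public static double robustMean(double[] data, double threshold, int iters) {
        double est = 0.0;
        for (double d : data) {
            est += d;                            // start from the plain mean
        }
        est /= data.length;
        for (int it = 0; it < iters; it++) {
            double num = 0.0;
            double den = 0.0;
            for (double d : data) {
                double r = d - est;              // residual at current estimate
                double a = Math.abs(r);
                double w = a <= threshold ? 1.0 : threshold / a;  // Huber weight
                num += w * d;
                den += w;
            }
            est = num / den;                     // standard weighted LS step
        }
        return est;
    }
}
```

Each inner step is a perfectly standard weighted least-squares solve; only the outer loop that refreshes the weights is non-standard, which is one way to answer the question above.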
At first sight, I'd avoid modifying the sizes of the input data (option 2); from an API usage viewpoint, I imagine that user code would require additional "length" tests. Couldn't the problem you mention in option 1 disappear by having different methods that return the a priori weights and the modified weights?
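That suggestion could look something like the following sketch (hypothetical class and method names, not a proposed Commons Math signature): with both weight sets exposed, an edited residual is distinguishable from an a priori zero weight by comparison.

```java
/** Sketch of exposing both weight sets so edited residuals can be
 *  identified by comparison (hypothetical names, not Commons Math API). */
public class WeightPair {
    private final double[] aprioriWeights;
    private final double[] modifiedWeights;

    public WeightPair(double[] apriori, double[] modified) {
        this.aprioriWeights = apriori.clone();
        this.modifiedWeights = modified.clone();
    }

    public double[] getAprioriWeights() {
        return aprioriWeights.clone();
    }

    public double[] getModifiedWeights() {
        return modifiedWeights.clone();
    }

    /** A residual was edited iff its a priori weight was non-zero but the
     *  editor set its modified weight to zero. */
    public int countEdited() {
        int n = 0;
        for (int i = 0; i < aprioriWeights.length; i++) {
            if (aprioriWeights[i] != 0.0 && modifiedWeights[i] == 0.0) {
                n++;
            }
        }
        return n;
    }
}
```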
Best regards,
Gilles
Best Regards,
Evan
[1] http://markmail.org/message/e53nago3swvu3t52
https://issues.apache.org/jira/browse/MATH-1105
[2] http://www.mathworks.com/help/curvefit/removing-outliers.html
http://www.mathworks.com/help/curvefit/least-squares-fitting.html
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org