Hi Gilles, Luc,

Thanks for all the comments. I'll try to respond to the more fundamental
concerns in this email, and the more practical ones in another email, if we
decide that we want to include data editing in [math].

On Fri, Sep 12, 2014 at 8:55 AM, Gilles <gil...@harfang.homelinux.org>
wrote:

> Hi.
>
>
> On Fri, 12 Sep 2014 09:16:07 +0200, Luc Maisonobe wrote:
>
>> Le 12/09/2014 01:35, Gilles a écrit :
>>
>>> Hello.
>>>
>>> On Thu, 11 Sep 2014 14:29:49 -0400, Evan Ward wrote:
>>>
>>>> Hi,
>>>>
>>>> A while ago I brought up the idea of adding residual editing (aka
>>>> data editing, outlier rejection, robust regression) to our
>>>> non-linear least squares implementations.[1] As the name suggests,
>>>> the idea is to de-weight observations that don't match the user's
>>>> model. There are several ways to do this, including a fixed cutoff,
>>>> a fixed standard deviation cutoff, or reducing a residual's weight
>>>> based on its magnitude.[2]
>>>>
>>>> However we add the data editing feature, I think it will cause
>>>> backward incompatibilities with the released API. I've outlined
>>>> below the two options I see. I'm open to other ideas as well.
>>>>
>>>> 1. Replace edited residuals with 0's in the residual vector and
>>>> Jacobian (i.e. apply a 0 weight). This has the advantage of being
>>>> simple to implement, and our existing optimizers are already able
>>>> to handle it. The downside is evident when the user tries to obtain
>>>> the number of residuals that were edited. It is hard to tell the
>>>> difference between an edited residual, an a priori zero weight, and
>>>> a model evaluation where the residual and gradient are, in fact,
>>>> zero. We can provide easy access to the number of edited residuals
>>>> by adding a method to the Evaluation interface. (This is what I
>>>> implemented in the patch in the original thread.) Now that the code
>>>> has been released, though, this would cause a backward
>>>> incompatibility for some advanced users. Most users will likely use
>>>> the included factory and builder methods to define their
>>>> LeastSquaresProblem (LSP), and these users would not be affected by
>>>> the change. Only users that provide a custom implementation of
>>>> LSP.Evaluation would be affected.
>>>>
>>>> 2. Remove edited residuals from the residual vector and Jacobian,
>>>> so that the resulting vector and matrix have fewer rows. The
>>>> advantage here is that the user can compare the length of the
>>>> residual vector in the Optimum to the number of observations in the
>>>> LSP to determine the number of edited residuals. The problem is
>>>> that returning vectors/matrices with different sizes from
>>>> LSP.evaluate() would violate the contract. Additionally, we would
>>>> have to modify our existing optimizers to deal with the variable
>>>> lengths. For GaussNewton the modification would be small, but for
>>>> LevenbergMarquardt I would likely have to re-write it, since I
>>>> don't understand the code (not for lack of trying :P ). Users that
>>>> implement LeastSquaresOptimizer would likely have to modify their
>>>> code as well.
>>>>
>>>> To summarize: in both cases, users that only use the provided
>>>> [math] classes would not have to modify their code, while users
>>>> that provide custom implementations of [math] interfaces would.
>>>>
>>>> I would like to get this feature wrapped up for the next release.
>>>> Please let me know if you have a preference for either
>>>> implementation, and if there are any other issues I should
>>>> consider.
>>>>
>>>
>>> Compatibility breaks cannot occur in minor releases.
>>> The next major release should not occur before deprecated classes are all
>>> replaced. [I'm thinking about the optimizers, for which the fluent API
>>> should
>>> be implemented based on your design of NLLS.]
>>>
>>
>> I agree in theory, but we are in a sort of chicken-and-egg problem.
>> For instance, when I proposed to switch the ODE to the new fluent
>> API, we decided we should do it at a major release only. We also said
>> we cannot remove deprecated code at minor releases. Now we say a
>> major release should wait until all deprecated classes have
>> replacements.
>>
>> So, as far as I understand, everything should occur at once for major
>> releases: remove deprecated classes, introduce new classes and
>> interfaces. But see next remark.
>>
>
> No, there is actually no chicken and egg problem in this case.
> The issue is only that we should not delete an API before a replacement
> is provided.
> So
> 1. We want to remove the deprecated optimizers API
>    -> must be done in major release (e.g. 4.0)
> 2. We want to let users upgrade their code
>    -> the new API must exist before 4.0
>
> 1 and 2 are not contradictory as the new API will be written from
> scratch (it is not a modification, say, of existing interfaces).
>
>
>>> It would be nice to recode the whole "LevenbergMarquardtOptimizer" in
>>> full OO
>>> Java. But it should be implemented and tested before any new feature is
>>> added
>>> to the mix.
>>>
>>
>> Here comes another problem: we want to have time to stabilize new
>> APIs, which means we should have a few minor releases with trial
>> APIs. Unfortunately, once these trial APIs have been introduced, we
>> are stuck with them, and we end up piling several APIs together (the
>> optimizers are the perfect example).
>>
>
> Piling up, yes, but it is not a problem: all the old APIs are already
> deprecated; "only" the replacements are missing. We must provide them
> in a minor release before 4.0. [This should not need to delay the next
> release (3.4), as we can have as many 3.x as we want...]
>
>
>> So when do we test new APIs: at minor releases just before a major
>> release, keeping all the trials until a big clean-up at the major
>> release?
>>
>
> Yes.
>
>
>>
>>> Do I understand correctly that in the "robust" fit, the weights are
>>> modified
>>> during the optimization?
>>> If so, would the algorithms still be "standard"?
>>>
>>
>> I think performing some modifications between iterations is still a
>> standard way to run an optimization: we can detect the outliers only
>> once we have computed the residuals and some statistics like their
>> standard deviation, so this editing operation should occur somewhere
>> during the optimization process.
>>
>
> I don't see that the editing has to occur during the optimization.
> It could be independent:
>  1. Let the optimization run to completion with all the data
>  2. Compute the residuals
>  3a. If there are no outliers, stop
>  3b. If there are outliers, remove them (or modify their weight)
>  4. Run the optimization with the remaining data
>  5. Goto 2
>
> The advantage I see here is that the weight modification is a user
> decision (and implementation).
>
> However, IIUC the examples from the link provided by Evan, "robust"
> optimization (i.e. handling outliers during optimization) could lead
> to a "better" solution.
>
> Would it always be "better"? Not sure: significant points could be
> mistaken for outliers and discarded before the optimizer could
> figure out the correct solution...
> To avoid a "good but wrong" fit, I'm afraid that we'd have to
> introduce several parameters (like "do not discard outliers before
> iteration n", "do not discard more than m points", etc.).
> The tuning of those parameters will probably not be obvious, and
> they will surely complicate the code (e.g. input data like the
> "target" and "weight" arrays won't be "final").
>

The advantage I see is not correctness (since the algorithm you outline
will converge correctly), but reducing function evaluations. (I don't
have data to back up this assertion.) Without inline data editing, the
optimization algorithm would "waste" the evaluations between the point
where the outliers become obvious and the point where the optimization
converges. With the "inline" scheme, the outliers are deleted as soon
as possible, and the remaining evaluations are used to converge towards
the correct solution.
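
To make this concrete, here is a minimal sketch of the zero-weight
scheme from option 1, written against the [math] linear algebra types.
The ResidualEditor class and the k-sigma rule are only an illustration
of one possible editing policy, not a proposed API:

    import org.apache.commons.math3.linear.RealMatrix;
    import org.apache.commons.math3.linear.RealVector;

    // Illustrative only: zero out residuals (and the matching Jacobian
    // rows) that exceed k residual standard deviations.
    final class ResidualEditor {

        private final double k; // e.g. 3.0 for a 3-sigma cutoff

        ResidualEditor(double k) {
            this.k = k;
        }

        // Edits in place; returns the number of residuals edited.
        int edit(RealVector residuals, RealMatrix jacobian) {
            final int n = residuals.getDimension();
            double sumSq = 0;
            for (int i = 0; i < n; i++) {
                final double r = residuals.getEntry(i);
                sumSq += r * r;
            }
            final double sigma = Math.sqrt(sumSq / n); // RMS about zero
            int edited = 0;
            for (int i = 0; i < n; i++) {
                if (Math.abs(residuals.getEntry(i)) > k * sigma) {
                    residuals.setEntry(i, 0.0); // apply a zero weight
                    for (int j = 0; j < jacobian.getColumnDimension(); j++) {
                        jacobian.setEntry(i, j, 0.0);
                    }
                    edited++;
                }
            }
            return edited;
        }
    }

The optimizer would call edit(...) on each evaluation, and the returned
count is exactly the quantity that the extra Evaluation method in
option 1 would expose.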

Converging to a "good but wrong" solution will always be a risk of any
algorithm that automatically throws away data. As with our other
algorithms, I'm expecting the user to know when the algorithm is a good
fit for their problem. The use case I see is when the observations
contain mostly real data and a few random numbers. The bad observations
can be hard to identify a priori, but become obvious during the fitting
process.
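
For comparison, the "offline" loop you outline would look roughly like
this on the user's side (buildProblem, removeOutliers and the
Observation type here are hypothetical user code, not [math] API):

    import java.util.List;
    import org.apache.commons.math3.fitting.leastsquares.LeastSquaresOptimizer.Optimum;
    import org.apache.commons.math3.fitting.leastsquares.LeastSquaresProblem;
    import org.apache.commons.math3.fitting.leastsquares.LevenbergMarquardtOptimizer;

    // Each pass pays for a complete optimization run; the evaluations
    // spent converging on data that still contains outliers are the
    // "waste" I mention above.
    Optimum fitWithOfflineEditing(List<Observation> data) {
        Optimum optimum;
        do {
            LeastSquaresProblem problem = buildProblem(data); // user factory
            optimum = new LevenbergMarquardtOptimizer().optimize(problem);
            // removeOutliers prunes observations whose residual exceeds
            // the user's cutoff and returns how many were removed
        } while (removeOutliers(data, optimum.getResiduals()) > 0);
        return optimum;
    }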



>
>> I have already seen this, typically in orbit determination, so I
>> would consider it a classical optional feature.
>>
>
> What works in one domain might not in another.
> Thus, the feature should not alter the "standardness", nor decrease
> the robustness of the implementation. Handling special cases
> (for which the feature is useful) can be achieved by, e.g., using the
> standard algorithm as a basic building block that is called
> repeatedly, as hinted above (and tuning the standard parameters and
> input data appropriately for each call).
>

I was not expecting the response that [math] may not want this feature.
I'm OK with that outcome, since I can implement it as an add-on to
[math], though the API won't be as clean.


>
>>> At first sight, I'd avoid modification of the sizes of input data
>>> (option 2);
>>> from an API usage viewpoint, I imagine that user code will require
>>> additional
>>> "length" tests.
>>>
>>> Couldn't the problem you mention in option 1 disappear by having
>>> different
>>> methods that return the a-priori weights and the modified weights?
>>>
>>
>> As we are already much too stringent with our compatibility policy, I
>> would allow cases where only *very* advanced users would have
>> problems. So it would seem fair to me if we make changes where the
>> users of the factory are unaffected, and expert users who decided the
>> factory was not good for them have to put in some new effort.
>>
>
> Before embarking on this, I would like to see examples where "inline"
> outlier rejection leads to a (correct) solution impossible to achieve
> with the approach I've suggested.
>

As discussed above, I don't see any cases where "inline" outlier rejection
will result in a less correct solution than the algorithm you outline. I do
think we can save a significant number of function evaluations by using
"inline" outlier rejection.

Best Regards,
Evan


>
>> Anyway, I would be OK with both solutions, and I also think rewriting
>> Levenberg-Marquardt would be a good thing. It was translated from Minpack
>> with only a change in the QR decomposition and a change in the
>> convergence checker, so basically it is Fortran written with Java syntax.
>>
>
> I just want to point out that quite some effort has already been put
> into the modification of the original port (removal of many "work"
> arrays, "final" data). It's no longer just Fortran in Java. [But the
> core algorithm is certainly not obvious, and I'm all for a clean
> reimplementation (to exist in parallel with the current one, until
> everybody agrees that nothing is lost).]
>
>
> Best,
> Gilles
>
>
>
>>>> [1] http://markmail.org/message/e53nago3swvu3t52
>>>>      https://issues.apache.org/jira/browse/MATH-1105
>>>> [2] http://www.mathworks.com/help/curvefit/removing-outliers.html
>>>>      http://www.mathworks.com/help/curvefit/least-squares-fitting.html
>>>>
>>>
>
>
