Re: [Math] Cleaning up the curve fitters

Phil Steitz Fri, 19 Jul 2013 12:28:47 -0700

On 7/19/13 12:21 PM, Ted Dunning wrote:
> The discussion about how to get something into commons when it is (a) well
> documented and (b) demonstrated better on at least some domains is
> partially procedural, but it hinges on technical factors.
>
> I think that Ajo is being very reserved here.  When I faced similar
> discouragement in the past with commons math contributions, I simply went
> elsewhere.
>
> It still seems to me that it would serve CM well to pay more attention to
> Ajo's comments and suggestions.  Simply saying that we should focus on
> technical discussion when CM's list is filled with esthetic arguments
> really just sounds like a way of pushing people away.


Please read the threads.  This is not "esthetics."  Maybe you can
help. 

Phil
>
>
> On Fri, Jul 19, 2013 at 10:21 AM, Phil Steitz <phil.ste...@gmail.com> wrote:
>
>> As I said above, let's focus on actual technical discussion here.
>> We implement standard, well-documented algorithms.  We need to
>> provide references and convince ourselves that what we release is
>> numerically sound, well-documented and well-tested.  We do our best
>> with the volunteer resources we have.  Your help and contributions
>> are appreciated.
>>
>> Phil
>>
>> On 7/19/13 9:44 AM, Ajo Fod wrote:
>>> Hi,
>>>
>>> I very much appreciate the work that has been done in CM and this is
>>> precisely why I'd like more people to contribute. Even when you didn't
>>> accept my MATH-995 patch, I got useful input from Konstantin and it has
>>> already made my application more efficient.
>>>
>>> What you required of me in the Improper integral example was a comparison
>>> of different methods. This sort of research takes time. I hear that Gilles
>>> is working on it. I appreciate that you guys spent so much effort on this.
>>> However, my contention is that your efforts at researching alternate
>>> solutions to a patch are not justified until you come up with a test that
>>> the patch fails OR you know an alternative that performs better for an
>>> application you have. In the first case, you're losing the efficiency of
>>> open source by reinventing a possibly different wheel without sufficient
>>> marginal reward. In the second case, beware of the fact that numerical
>>> algorithms are hairy beasts, and it takes a while to encode something new.
>>> The efficiency of commons comes from putting the burden of development on
>>> the developers who need the code.
>>>
>>> So, I propose an alternate approach to testing if a submitted patch needs
>>> to be accepted:
>>> 1. Check if the patch fills a gap in existing CM code
>>> 2. if so, check if it passes known tests
>>> 3. if so, write up alternate tests to see if the code breaks.
>>> 4. if it holds up, wrap the code up in a suitable API and accept the patch
>>>
>>> This has two advantages. First, CM will have more capabilities per unit of
>>> your precious time. Second, you give people the feeling that they are
>>> making a difference.
>>>
>>> As far as the debate on AQ (AdaptiveQuadrature) vs
>>> LGQ (IterativeLegendreGaussIntegrator) goes:
>>> The FACTS that support AQ over LGQ are:
>>> 1. An example where LGQ failed and AQ succeeded. I also explained why LGQ
>>> fails and AQ will probably converge more correctly. Generally, adaptive
>>> quadratures are known to be so successful at integration that Konstantin
>>> even wondered why we don't have something yet.
>>> 2. Efficiency improvement: I also showed that AQ is more efficient in at
>>> least one example in terms of accuracy in digits per function evaluation.
>>> So, conversely, it's now your turn to provide concrete examples where LGQ
>>> does better than AQ. You could pose credible objections by providing
>>> examples where:
>>> 1. AQ fails but LGQ passes.
>>> 2. LGQ is more efficient in accuracy per evaluation.
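
For anyone following the AQ versus LGQ argument, here is a minimal, self-contained sketch of what "accuracy per function evaluation" can mean in practice. It is not the MATH-995 patch and it does not use Commons Math. The names QuadratureSketch, Counted, adaptiveSimpson and compositeGaussLegendre2 are hypothetical; adaptive Simpson stands in for an adaptive scheme, and a fixed 2-point composite Gauss-Legendre rule stands in for a non-adaptive one (the iteration strategy of IterativeLegendreGaussIntegrator is not reproduced here).

```java
import java.util.function.DoubleUnaryOperator;

/** Illustrative only: hypothetical names, not the MATH-995 patch, not Commons Math code. */
final class QuadratureSketch {

    /** Wraps an integrand so that every evaluation is counted. */
    static final class Counted implements DoubleUnaryOperator {
        private final DoubleUnaryOperator f;
        int evaluations;
        Counted(DoubleUnaryOperator f) { this.f = f; }
        @Override public double applyAsDouble(double x) {
            evaluations++;
            return f.applyAsDouble(x);
        }
    }

    /** Adaptive Simpson: subdivide only where the local error estimate is too large. */
    static double adaptiveSimpson(DoubleUnaryOperator f, double a, double b, double tol) {
        double fa = f.applyAsDouble(a);
        double fm = f.applyAsDouble(0.5 * (a + b));
        double fb = f.applyAsDouble(b);
        double whole = (b - a) / 6.0 * (fa + 4.0 * fm + fb);
        return refine(f, a, b, fa, fm, fb, whole, tol, 50);
    }

    private static double refine(DoubleUnaryOperator f, double a, double b,
                                 double fa, double fm, double fb,
                                 double whole, double tol, int depth) {
        double m = 0.5 * (a + b);
        double flm = f.applyAsDouble(0.5 * (a + m));
        double frm = f.applyAsDouble(0.5 * (m + b));
        double left = (m - a) / 6.0 * (fa + 4.0 * flm + fm);
        double right = (b - m) / 6.0 * (fm + 4.0 * frm + fb);
        double delta = left + right - whole;
        if (depth <= 0 || Math.abs(delta) <= 15.0 * tol) {
            return left + right + delta / 15.0; // Richardson correction term
        }
        // Split the tolerance between the two halves and recurse.
        return refine(f, a, m, fa, flm, fm, left, 0.5 * tol, depth - 1)
             + refine(f, m, b, fm, frm, fb, right, 0.5 * tol, depth - 1);
    }

    /** Fixed rule: 2-point Gauss-Legendre on n equal subintervals, no adaptation. */
    static double compositeGaussLegendre2(DoubleUnaryOperator f, double a, double b, int n) {
        double h = (b - a) / n;
        double half = 0.5 * h;
        double offset = half / Math.sqrt(3.0); // nodes at +-1/sqrt(3) on [-1, 1]
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double mid = a + (i + 0.5) * h;
            sum += f.applyAsDouble(mid - offset) + f.applyAsDouble(mid + offset);
        }
        return half * sum;
    }

    public static void main(String[] args) {
        // A narrow bump near 0: the kind of integrand where equal spacing wastes evaluations.
        DoubleUnaryOperator peak = x -> Math.exp(-x * x / 2.0e-4);
        Counted c1 = new Counted(peak);
        double aq = adaptiveSimpson(c1, -1.0, 1.0, 1e-9);
        Counted c2 = new Counted(peak);
        // Give the fixed rule the same evaluation budget (2 evaluations per subinterval).
        double gl = compositeGaussLegendre2(c2, -1.0, 1.0, Math.max(1, c1.evaluations / 2));
        System.out.printf("adaptive Simpson: %.12f (%d evaluations)%n", aq, c1.evaluations);
        System.out.printf("fixed GL2:        %.12f (%d evaluations)%n", gl, c2.evaluations);
    }
}
```

Comparing the two results against a reference value at the same evaluation budget, over several integrands rather than one, is the kind of evidence either side of this debate could post.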
>>>
>>> All that to illustrate, with an example, where the perception that it is
>>> hard to convince the gatekeepers of commons of the merits of a patch arises
>>> from. I have a package in my codebase with assorted patches that I just
>>> don't think is worth the time to try to post to commons. I think it is very
>>> inefficient if others have such private patches.
>>>
>>> Cheers,
>>> Ajo
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Jul 18, 2013 at 2:15 PM, Phil Steitz <phil.ste...@gmail.com> wrote:
>>>> On 7/18/13 1:48 PM, Ajo Fod wrote:
>>>>> Hello folks,
>>>>>
>>>>> There is a lot of work in API design. However, Konstantin's point is that
>>>>> it takes a lot of effort to convince Gilles of any alternatives. API design
>>>>> issues should really be second to functionality. This idea seems to be lost
>>>>> in conversations.
>>>> With patience and collaboration you can have both and we *need* to
>>>> have both.  You can't get to a stable API and approachable and
>>>> maintainable code base without thinking carefully about API design.
>>>>> I agree with Gilles that providing tests and benchmarks that exhibit the
>>>>> advantages of a particular method is probably the best way to show other
>>>>> contributors the value of an alternative approach.
>>>> There is some value to this, but honestly much more value in
>>>> carefully researching and presenting the numerical analysis to
>>>> support improvement / performance claims.
>>>>> It is quite depressing to the contributor to see one's contribution be
>>>>> rejected when efficiency/accuracy improvements are demonstrated.
>>>> What you "demonstrated" in one case was better performance in one
>>>> problem instance.  The change of variable approach you implemented
>>>> was, in my admittedly possibly naive numerics view, questionable.  I
>>>> asked to see numerical analysis support and no one provided that.
>>>> Had you provided that, I would have argued to include some version
>>>> of the patch.
>>>>
>>>>> In a better world, rejecting a patch that passes the hurdle of
>>>>> demonstrating an efficiency improvement over existing code should come
>>>>> with a responsibility of showing alternate tests that the patch fails and
>>>>> the original code passes. Otherwise, the patch should be accepted by
>>>>> default. The person who commits or designed the API is free to make
>>>>> changes to fit API design.
>>>> This is essentially what Gilles ended up doing.  You may not agree
>>>> with the approach, but he did in fact address the core issue.
>>>>> Just like API designers are not experts at the underlying math,
>>>>> contributors are not necessarily experts at the underlying API design. To
>>>>> unlock the efficiency of open source, contributor morale needs to be
>>>>> considered and classes that pass tests should really be accepted.
>>>> I agree that we should try to be friendly and encouraging and I
>>>> apologize if we have not been so.  That said, the process of
>>>> contributing here is not just tossing patches over the wall.  First
>>>> you need to get community support for the ideas.  Then work
>>>> collaboratively to get patches that work for the code and community.
>>>>> For example, performance AND accuracy improvements to the existing
>>>>> algorithm were demonstrated for AdaptiveQuadrature in my patches to MATH-995
>>>> Sorry, I was not convinced by the accuracy and performance claims
>>>> and, as I said above, I suspect that the change of variable approach
>>>> may not be the best way to handle improper integrals.  I am not
>>>> claiming authority here - just - again - asking for real numerical
>>>> analysis arguments to support the claims you are making.
>>>>
>>>> It would be a lot better if we focused discussion on the actual
>>>> technical issues and mathematical principles rather than
>>>> generalities about how hard / easy it is to get stuff in.
>>>>
>>>> Phil
>>>>> The only joy I got out of that was Gilles putting a comment in the docs for
>>>>> the existing class:
>>>>> "The Javadoc now draws attention that the [existing] algorithm is not 100%
>>>>> fool-proof."!
>>>>> Also, I was asked to open a new issue about Adaptive Quadratures to figure
>>>>> out what is the best quadrature method ... all while a patch that is a clear
>>>>> improvement over existing code wastes away. Why not accept the patch and
>>>>> make improvements as necessary?
>>>>>
>>>>> My impression since that patch was rejected is that it just seems like a
>>>>> huge hurdle to get any patch past the API design requirements that are
>>>>> frankly not as clear to me as they are to the designer. I can see how others
>>>>> feel the same way.
>>>>>
>>>>> Cheers,
>>>>> Ajo.
>>>>>
>>>>> Gilles: if you don't want to end up spending time developing Gauss-Hermite
>>>>> quadrature or something else you don't really need, perhaps you should
>>>>> consider accepting/modifying code that was shown to work by someone who
>>>>> needed that functionality. It is reasonable to develop alternatives to fix
>>>>> flaws/gaps, but otherwise it's your effort wasted.  If someone's
>>>>> contribution doesn't fit your view of the API, then by all means edit the
>>>>> patch, but if you go about rejecting things that work, there won't be as
>>>>> many contributors to CM.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jul 18, 2013 at 10:08 AM, Roger L. Whitcomb <
>>>>> roger.whitc...@actian.com> wrote:
>>>>>
>>>>>> As an outsider listening to these discussions, it seems like:
>>>>>> a) *IF* there are problems with the current arrangement of packages, APIs,
>>>>>> or whatever, then a constructive approach would be for the one who sees
>>>>>> such problems to take the time to not just criticize and point out "flaws",
>>>>>> but to dig in and rearrange the packages, redo the APIs, provide unit
>>>>>> tests, and submit a patch with these changes, along with quantitative
>>>>>> justification, benchmarks, test cases, etc.  It is quite easy to criticize,
>>>>>> from the sidelines, the one who is actually doing the work, but quite
>>>>>> another matter to roll up your sleeves and join in the work....
>>>>>> b) Since Math is a "library", it seems like there need to be
>>>>>> implementations of many different algorithms, since (quite clearly) not
>>>>>> every algorithm is suited to every problem.  To say that X method doesn't
>>>>>> work well for problem Y is not necessarily a reason to rewrite X method,
>>>>>> if that method is correctly implementing the algorithm.  Maybe the
>>>>>> algorithm is simply not the right one to use for the problem.
>>>>>> c) Comments that imply (or state outright) that someone who has (clearly)
>>>>>> done a lot of work has done it "...without much thinking..." are clearly
>>>>>> out of line.  In my experience, the only reason to resort to name calling
>>>>>> and character assassination is because one has no worthy arguments to put
>>>>>> forward.
>>>>>> d) Kudos to the Commons committers who have been doing the work ...
>>>>>>
>>>>>> My 2 cents...
>>>>>>
>>>>>> ~Roger Whitcomb
>>>>>> Apache Pivot PMC Chair
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Gilles [mailto:gil...@harfang.homelinux.org]
>>>>>> Sent: Thursday, July 18, 2013 9:35 AM
>>>>>> To: dev@commons.apache.org
>>>>>> Subject: Re: [Math] Cleaning up the curve fitters
>>>>>>
>>>>>> On Thu, 18 Jul 2013 11:47:03 -0400, Konstantin Berlin wrote:
>>>>>>> I appreciate the comment. I would like to help, but currently my
>>>>>>> schedule is full. Maybe towards the end of the year.
>>>>>>>
>>>>>>> I think the first approach should be do no harm. The optimization
>>>>>>> package keeps getting refactored every few months without much
>>>>>>> thinking involved. We had the discussion previously, with Gilles
>>>>>>> unilaterally deciding on the current tree, which he now wants to
>>>>>>> change again.
>>>>>> As I said,
>>>>>> as Luc said,
>>>>>> as Phil said,
>>>>>> again and again and again,
>>>>>> we are not optimization (as a scientific field) experts here, but we do
>>>>>> use Commons Math in scientific code that is pretty compute intensive (and
>>>>>> yes, maybe not in the same sense as you'd like it to be for your comfort).
>>>>>> Current code has, and may still have problems, but we see them only
>>>>>> through running unit tests, running our applications, running code examples
>>>>>> submitted by issue reporters.
>>>>>> We improve what we can, given time and motivation constraints.
>>>>>> Other than that, there is nothing.
>>>>>>
>>>>>> Yes, we already had that asymmetrical conversation where _you_ declare
>>>>>> what _we_ should do.
>>>>>>
>>>>>>> As someone who uses optimization regularly, I would say the current API
>>>>>>> state (not just package naming) leaves a lot to be desired, and is not
>>>>>>> amenable to the various modifications that people might need for larger
>>>>>>> problems. So if you are going to modify it, you should at least open
>>>>>>> up the API to the possibility that different optimization steps can be
>>>>>>> done using various techniques, depending on the problem.
>>>>>>>
>>>>>>> We should also accept that not everything can fit neatly into a
>>>>>>> package tree and a single set of APIs. A good example is least
>>>>>>> squares. Linear least squares does not require an initial guess at a
>>>>>>> solution, and by performing decomposition ahead of time you can
>>>>>>> quickly recompute the solution given different input values. However,
>>>>>>> an iterative least squares method might not have these properties.
>>>>>>> There are probably countless other examples.
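
The linear least squares point above (no initial guess, factor once, then re-solve cheaply for new input values) can be illustrated with a small sketch. Everything here is an assumption for illustration: PrefactoredLeastSquares and solve are hypothetical names, not a Commons Math API, and the sketch uses a Cholesky factorization of the normal equations only because it is short. A QR decomposition is usually the numerically safer choice, since forming A^T A squares the condition number.

```java
/** Illustrative only: hypothetical class, not Commons Math code. */
final class PrefactoredLeastSquares {
    private final double[][] a;    // design matrix, m rows by n columns
    private final double[][] chol; // lower-triangular L with L * L^T = A^T * A

    /** Performs the expensive factorization once, independent of any observations. */
    PrefactoredLeastSquares(double[][] a) {
        this.a = a;
        int m = a.length;
        int n = a[0].length;
        chol = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j <= i; j++) {
                // Entry (i, j) of A^T * A.
                double s = 0.0;
                for (int k = 0; k < m; k++) {
                    s += a[k][i] * a[k][j];
                }
                // Subtract the contribution of the columns already factored.
                for (int k = 0; k < j; k++) {
                    s -= chol[i][k] * chol[j][k];
                }
                if (i == j) {
                    if (s <= 0.0) {
                        throw new IllegalArgumentException("columns are not linearly independent");
                    }
                    chol[i][i] = Math.sqrt(s);
                } else {
                    chol[i][j] = s / chol[j][j];
                }
            }
        }
    }

    /** Solves min ||A x - y|| for a new observation vector y, reusing the factorization. */
    double[] solve(double[] y) {
        int m = a.length;
        int n = chol.length;
        double[] x = new double[n];
        // Right-hand side of the normal equations: A^T * y.
        for (int i = 0; i < n; i++) {
            double s = 0.0;
            for (int k = 0; k < m; k++) {
                s += a[k][i] * y[k];
            }
            x[i] = s;
        }
        // Forward substitution: L * z = A^T * y.
        for (int i = 0; i < n; i++) {
            double s = x[i];
            for (int k = 0; k < i; k++) {
                s -= chol[i][k] * x[k];
            }
            x[i] = s / chol[i][i];
        }
        // Back substitution: L^T * x = z.
        for (int i = n - 1; i >= 0; i--) {
            double s = x[i];
            for (int k = i + 1; k < n; k++) {
                s -= chol[k][i] * x[k];
            }
            x[i] = s / chol[i][i];
        }
        return x;
    }
}
```

An iterative nonlinear solver has no equivalent of this constructor/solve split, which is the kind of difference a single shared optimizer interface tends to hide.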
>>>>>>>
>>>>>>> Because optimization problems are really computationally hard, all the
>>>>>>> little specific differences matter. That is why Gilles' approach of
>>>>>>> sweeping everything under the rug and into some rigid, not thought out
>>>>>>> hierarchical API forces these methods to adapt (or drop) numerical
>>>>>>> aspects that should not be there (e.g. polynomial fits). This has
>>>>>>> *huge* performance implications, but the issue is treated as some OO
>>>>>>> design 101 class, with the focus on how to force everything into a
>>>>>>> simple inheritance structure, numerics be damned.
>>>>>>>
>>>>>>> I would gladly help with the feedback when I can. Ajo and I provided
>>>>>>> code for adaptive integration, yet the whole issue was completely
>>>>>>> ignored. So I am not sure how much effort is required for the
>>>>>>> developers to take an idea or mostly completed code and make a change,
>>>>>>> rather than reject even the most basic numerical approaches that are
>>>>>>> taught in introduction classes as something that needs to be
>>>>>>> benchmarked.
>>>>>> As usual, you are mixing everything, from algorithms to implementations,
>>>>>> from proposing new features to denigrating existing ones (with non-existent
>>>>>> or inappropriate use-cases), from numerical to efficiency considerations...
>>>>>> [On top of it, you blatantly affirm that this issue has been ignored, even
>>>>>> as I provided[1] an analysis[2] of what was actually happening.
>>>>>> People like you seem to ignore that we work benevolently on this project!]
>>>>>> Not even speaking of derogatory remarks like "sweeping [...] under the rug"
>>>>>> and "not thought out" and insinuating that everything was better and more
>>>>>> efficient before. Which is simply not true.
>>>>>>
>>>>>> It's an asymmetrical discussion because you declare that half-baked code
>>>>>> is good enough and _we_ have to work even more than if we'd have to
>>>>>> implement the feature from scratch.
>>>>>>
>>>>>>
>>>>>> Gilles
>>>>>>
>>>>>> [1] In the spare time I do _not_ have either.
>>>>>> [2] Which dragged me to the implementation of the Gauss-Hermite quadrature
>>>>>>      scheme (although I had no personal use of it), which seems to be the
>>>>>>      appropriate way to deal with the improper integral reported in the
>>>>>>      issue which you refer to.
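
As background on footnote [2]: a Gauss-Hermite rule approximates integrals over the whole real line of the form exp(-x^2) * g(x) by a weighted sum of g at fixed nodes, so no change of variable or truncation of the range is needed. The following is only a 3-point sketch with hard-coded nodes and weights; GaussHermiteSketch is a hypothetical name and this is not the Commons Math implementation, which the thread does not show.

```java
import java.util.function.DoubleUnaryOperator;

/** Illustrative only: 3-point Gauss-Hermite rule, hypothetical class name. */
final class GaussHermiteSketch {
    // Nodes are the roots of the Hermite polynomial H3(x) = 8x^3 - 12x.
    private static final double[] NODES = {
        -Math.sqrt(1.5), 0.0, Math.sqrt(1.5)
    };
    // Matching weights: sqrt(pi)/6, 2*sqrt(pi)/3, sqrt(pi)/6.
    private static final double[] WEIGHTS = {
        Math.sqrt(Math.PI) / 6.0, 2.0 * Math.sqrt(Math.PI) / 3.0, Math.sqrt(Math.PI) / 6.0
    };

    /**
     * Approximates the integral over (-infinity, +infinity) of exp(-x^2) * g(x).
     * The weight exp(-x^2) is built into the rule, so only g is evaluated.
     * With 3 points the result is exact when g is a polynomial of degree 5 or less.
     */
    static double integrate(DoubleUnaryOperator g) {
        double sum = 0.0;
        for (int i = 0; i < NODES.length; i++) {
            sum += WEIGHTS[i] * g.applyAsDouble(NODES[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Integral of x^2 * exp(-x^2) over the real line, which equals sqrt(pi) / 2.
        System.out.println(integrate(x -> x * x));
        System.out.println(Math.sqrt(Math.PI) / 2.0);
    }
}
```

Whether such a rule suits a given improper integral depends on the integrand actually decaying like exp(-x^2) times something polynomial-like; for other tails it can converge slowly.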
>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>>>>>> For additional commands, e-mail: dev-h...@commons.apache.org


