Gilles wrote: >> To address a few concerns I saw, >> >> Gilles wrote: >>> In this way, it does not; what I inferred from the partial code >>> above was >>> that there would only be _one_ "create" method. But: >>> 1. All "create" methods must be overridden at the next level of the >>> class >>> hierarchy (this is the duplication I was referring to in the first >>> post). >> >> True, but concrete classes should not be extended. > > Why not? > [Anyways, the duplication also occurs in the intermediate (abstract) > levels.] > >> Adding another >> abstract class to the hierarchy would mean implementing the create >> method form the superclass and delegating to a new abstract create >> method that includes the new parameters. The abstract class hierarchy >> would have to parallel the interface hierarchy. > > With a mutable instance, you avoid this; that's the point. > >> >>> 2. When a parameter is added somewhere in the hierarchy, all the >>> classes >>> below must be touched too. >> >> True, but since abstract classes are just skeletal implementations of >> interfaces you can't add any methods without breaking compatibility >> anyway. > > Adding a (concrete) method in an abstract class will not break > compatibility. > >> (You would have to add the same method to the public interface >> too.) > > There you break compatibility; that's why we removed a lot of the > interfaces in CM, because the API is not stable, and abstract classes > allow non-breaking changes. > >> This does make it important to decide on a well written and >> complete API before releasing it. > > When the scope of the software is well circumscribed, that would be > possible. With the whole of [Math]ematics, much less so. :-} > And state-of-the-art in Java is a moving target, aimed at by changing > CM contributors with differing needs and tastes; this adds to the > unstable mix.
That's a good point. I still prefer the interface design (though I may be in the minority) for two reasons. First, if a concrete class only publicly exposes the methods defined in an interface it encourages polymorphism. User code that uses one implementation can be easily switched to another and new implementations are less constrained. Second, it encourages composition over inheritance. I agree with Josh Bloch that composition produces more maintainable code. Adding new methods to an existing interface/class breaks composition. I think the interface/abstract class discussion is partially separable from the immutable/mutable discussion. I see the algorithm as the part that could really benefit from the polymorphism. Perhaps separating the problem definition (data) from the algorithm will improve the flexibility of the API. For example, PointVectorValuePair solveMyNLLSProblem(NLLSOptimizer opt){ //define problem to solve in an independent object NLLSProblem p = new NLLSProblem(/*model functions, weights, convergence checker, ...*/); //provide algorithm with the data it needs //algorithm has no problem specific state return opt.optimize(p); } > >>> And, we must note, that the duplication still does not ensure "real" >>> immutability unless all the passed arguments are themselves immutable. >>> But: >>> 1. We do not have control over them; e.g. in the case of the optimizers >>> the "ConvergenceChecker" interface could possibly be implemented by a >>> non-thread-safe class (I gave one example of such a thing a few weeks >>> ago when a user wanted to track the optimizer's search path) >> >> True. Thread safety is a tricky beast. I think we agree that the only >> way to guarantee thread safety is to only depend on final concrete >> classes that are thread safe themselves. > > I don't think so. When objects are immutable, thread-safety follows It is somewhat off topic, but a counter example would be Vector3D. Since the class is not final, a user could extend it and override all the methods and add some set{X,Y,Z} methods to make it mutable. Even though Vector3D is immutable, there is no _guarantee_ that every instanceof Vector3D is immutable unless it is final. This is why String is final. > (but > note that the current optimizers in CM were never thread-safe). > But thread-safety can exist even with mutable objects; it's just that > more > care must be taken to ensure it. > >> This is directly at odds with >> the inversion of control/dependency injection paradigm. I think a >> reasonable compromise is to depend on interfaces and make sure all the >> provided implementations are thread safe. > > Yes, that a way, but again easier said that done. > >> A simple sequential user won't >> need to care about thread safety. A concurrent user will need to >> understand the implications of Java threading to begin with. Accurate >> documentation of which interfaces and methods are assumed to be thread >> safe goes a long way here. > > I don't think I'm wrong if I say that most concurrent bugs are found in > production rather than in contrived test cases. > [That's why I advocated for introducing "real" applications as use-cases > for CM.] > >>> 2. Some users _complained_ (among other things :-) that we should not >>> force immutability of some input (model function and Jacobian IIRC) >>> because in some use-cases, it would be unnecessarily costly. >> >> I agree that copying any large matrices or arrays is prohibitively >> expensive. For the NLLS package we would be copying a pointer to a >> function that can generate a large matrix. I think adding some >> documentation that functions should be thread safe if you want to use >> them from multiple threads would be sufficient. > > I you pass a "pointer" (i.e. a "reference" in Java), all bets are off: > the > class is not inherently thread-safe. That's why I suggested to mandate a > _deep_ "copy" method (with a stringent contract that should allow a > caller > to be sure that all objects owned by an instance are disconnected from > any > other objects). As someone who has designed a thread safe application based on deep copying I don't think this is route to follow. A deep copy means you have to be able to copy an arbitrary (possibly cyclical) reference graph. Without the graph copy there are many subtle bugs. (References to the same object are now references to different objects.) With the graph copy the implementation is very complex. This is the reason Serialization has a separate "patch up" step after object creation, which leads to some nasty tricks/bugs. Similarly, Cloneable only produces a shallow copy. Opinions may vary, but in my experience immutability is an easier approach to thread safety, especially when you have to depend on user code. Either way there can be some tricky bugs for the user. In the immutable approach the bugs are solved by finding non-final fields and references to objects that are not thread safe. In the deep copy approach it can be hard to track down which reference isn't being copied correctly. My experience is that when I used deep copy I found may bugs in production. Now with immutability it is easier for me to reason about thread safety and I find few. >> >>> Consequently, if (when creating a new instance) we assign a reference >>> passed to the fluent method, we cannot warrant thread-safety anymore; >>> which in turn poses the question of whether this false sense of >>> security >>> warrants the increased code duplication and complexity (compare your >>> code >>> below with the same but without the constructors and "create" methods: >>> even a toy example is already cut by more than half). >> >> Agreed that thread safety can only be guaranteed with the help of the >> user. The immutable+fluent combination does add an additional layer of >> indirection. On the other hand it is much simpler to debug and analyze, >> especially in a concurrent environment. > > Immutable objects can be shared between threads. > Thread-safety is equally obtained if mutable objects are not shared > between > threads, e.g. keep the optimizer confined in one thread (no > implementation > in CM features concurrency anyways). > Even if the optimizers are not immutable, it will be possible to benefit > from efficiency improvement brought by concurrency e.g. by ensuring > that the > objective function is thread-safe, several evaluations could be > performed in > parallel. [Moreover the optimization logic is not really concurrent in > most > optimizers. So why have unnecessarily complicated code?] I agree that the optimization algorithm doesn't need to be concurrent. I think the key is to be able to use the same optimizer (algorithm) on different problems in different threads. E.g. optimizer.optimize(problem) or optimizer.copy().optimize(problem) Both work and both are better than my solution with the current API (see below), but I prefer the immutable algorithm. In the end you are the maintainer. ;) >> >>> Then if we really want to aim at thread-safety, I think that the >>> approach >>> of mandating a "copy()" interface to all participating classes, >>> would be >>> a serious contender to just making all fields "final". Let's also >>> recall >>> that immutability is not a goal but a means (the goal being >>> thread-safety). >>> >>> I still think that the three "tools" mentioned in the subject line >>> do not >>> play along very well, unfortunately. >>> I was, and still am, a proponent of immutability but IMO this >>> discussion >>> indicates that it should not be enforced at all cost. In particular, in >>> small and non-layered objects, it is easy to ensure thread-safety >>> (through >>> "final") but the objects that most benefit from a fluent API are big >>> (with >>> many configuration combinations), and their primary usage does not >>> necessarily benefit from transparent instantiation. >>> >>> Given all these considerations, in the first steps for moving some CM >>> codes >>> to fluent APIs, I would aim for simplicity and _less_ code (if just >>> to be >>> able to make adjustments more quickly). >>> Then, from "simple" code, we can more easily experiment (and compare, >>> with >>> "diff") the merits and drawbacks of various routes towards >>> thread-safety. >>> >>> OK? >> >> Agreed on the goal of thread safety. Is the copy a shallow copy? > > No! (cf. above.) > >> If it >> is, then copy is a complete punt of thread-safety to the user, forcing >> them to them to determine when and what they need to copy. I think the >> mutable+copy option would make it harder to understand client code and >> understand where copying is necessary. > > No; my suggestion, if feasible, is that a user who needs his own, > personal, > unshared instance would simply call "copy()": > > public void myMethodOne(Optimizer optim) { > // I don't trust that "optim" can be safe: I make a (deep) copy. > final Optimizer myOptim = optim.copy(); > > // By contract, "myOptim" is assumed to contain unshared fields. > // I can pass it safely to another thread. > myMethodTwo(myOptim); > } > > private void myMethodTwo(Optimizer optim) { > // Create a new thread, whatever... > } > > But my current point is that this could be developed at a later point, > after the fluent API has been adopted (and more widely tried within CM), > when someone has shown that e.g. the optimizer must be thread-safe to > achieve some actual use-case. In my use case I have a class that solves several related, but different, NLLS problems concurrently. I would like the optimization algorithm to be a configurable dependency (GN or LM). Currently I have a custom (thread safe) interface with two implementations that wrap the commons math optimizers, in order to provide thread safety and polymorphic access to all the options that both optimizers support. Best Regards, Evan >> Ultimately the decision is up to >> the maintainers and I think both options under discussion are a big >> improvement over the current API. :) >> >> Thanks for the great library. > > Thanks for the discussion, > Gilles --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org