On Mon, Oct 29, 2012 at 11:50 AM, Matthias Friedrich <[email protected]> wrote:

> On Monday, 2012-10-29, Josh Wills wrote:
>> On Sun, Oct 28, 2012 at 2:39 AM, Matthias Friedrich <[email protected]> wrote:
> [...]
>>> Good idea, let's first agree on a set of principles. In my opinion,
>>> we should limit the scope for these principles to client-facing
>>> packages, everything else can be changed in any way at any time.
>
>>> My proposal is based on [2], a very short and incomplete summary can
>>> be found at [3]. For us, it boils down to this:
>
>>> * A package must have a clear purpose; it contains either mostly
>>>   abstractions or mostly implementations (this makes it easier
>>>   to explain)
>>> * A package must not depend on a package that is less stable
>>>   than itself (meaning a package containing mostly abstractions
>>>   must not depend on one containing mostly implementations)
>>> * There must be no dependencies from a client-facing package to
>>>   an internal package (that is, javadocs don't have dangling
>>>   references)
>>> * There must be tight cohesion between classes in a package or
>>>   the package should be split (this doesn't apply to .util)
>>> * There must be no dependency cycles between client-facing packages
>
>> I agree with these principles, although I think that the first one (clear
>> purpose for a package) is often in conflict with the last one (no dependency
>> cycles between client-facing packages).
>
> Hmm, I'm not sure. In most cases I've seen it's the mixing of
> abstractions and implementation classes that makes cycles more likely
> because the package has incoming references to its abstractions and
> outgoing references from its implementations (see the .io problem
> below). With just a tiny bit of sloppy programming your package
> becomes part of a larger cycle that you don't even see without tool
> support.
>
>> Is there an implicit priority scheme here? Are we saying that having a
>> clear purpose for a package is more important than avoiding dependency
>> cycles, or are we saying that the two are equal?
>
> It can be really difficult to achieve all goals, sometimes even
> prohibitively expensive because you'd need major refactorings that you
> can't afford. If I really have to choose I'd pick the design
> alternative that is easier to explain in my documentation. Cycles
> aren't nice, but in the end we want an API that is easy to use and to
> understand.
>
>>> You can calculate metrics for all of this but it's really just
>>> common sense. Crunch follows these rules in the vast majority of
>>> cases already. Right now I see the following violations:
>
>>> * The .types package mixes abstractions and implementations and
>>>   is part of a dependency cycle with base.
>>> * The base package references the .io implementation package
>>>   causing a dependency cycle.
>>> * The base package references the .util package causing a
>>>   dependency cycle.
>>> * There are lots of implementations in CombineFn and other Fns
>>>   that shouldn't be in base (which is for abstractions). We should
>>>   move them to .fn, perhaps to Guava-style CombineFns, FilterFns.
>>>   We can even do this in a backwards compatible way.
>
>> So of these, I think that the CombineFn -> CombineFns change is the easiest
>> fix, in that it solves the implementation issue for CombineFn and the
>> dependency of the base API on the util package. I am 100% behind that one.
>
>> Sorting out the cycle between io/types/base seems trickier to me and I
>> think that is the core of the design problem, and it goes right into the
>> tradeoff between clear purpose for a package and the dependency cycles
>> between client-facing packages. Do you agree?
>
> Yes, that's more difficult. Let's validate my original proposal (move
> PType, PTypeFamily, PTableType, Converter, and OutputHandler to base)
> against the principles.
>
> Base has a clear purpose, it's the minimal client-facing facade that
> holds all core abstractions. It doesn't depend on anything else,
> neither client-facing nor internal packages, which also means there
> can be no cycles. With Converter and OutputHandler, some
> implementation details bleed into base; this is collateral damage that
> we frown upon.
>
> The .types package would be a pure implementation package with helper
> functionality for its subpackages. It would no longer be
> client-facing, so the principles don't apply and we can safely hide it
> from javadocs, which is good.
>
> The purpose of .io would be to provide factories for creating Sources,
> Targets etc. Unfortunately, it also contains additional abstractions
> that are referenced from .io's subpackages, causing dependency cycles.
> This is tricky; I think the most promising solution is to split the
> package, but I'm not sure how exactly (which part stays, which part
> moves, and where?). Another solution would be to throw .io's
> abstractions into base, but I would really like to avoid that.
>
> Do you have any ideas?

My feeling is that some of those IO abstractions-- especially the
OutputHandler-- are bad abstractions, i.e., mistakes that were made when I
designed my way into the aforementioned cul-de-sac. So if there are designs
that let us get rid of those "abstractions," I consider that a good thing,
as well as separating the implementation from the interface in the IO
package.

>
> Regards,
> Matthias
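
For reference, the "no dependency cycles between client-facing packages" rule from the thread can be checked mechanically. The following is only a minimal sketch of such a check, not tooling that Crunch ships: the class name PackageCycleCheck and the short package names are hypothetical, and the edges in main are an illustrative reading of the violations listed above (base referencing .io and .util, with assumed back-references), not something extracted from the actual source tree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PackageCycleCheck {

    /** Returns one dependency cycle as a list of package names, or an empty list if there is none. */
    public static List<String> findCycle(Map<String, List<String>> deps) {
        Set<String> done = new HashSet<>();
        for (String start : deps.keySet()) {
            List<String> cycle = visit(start, deps, done, new ArrayList<>());
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        return new ArrayList<>();
    }

    // Depth-first search; 'path' holds the packages on the current dependency chain.
    private static List<String> visit(String pkg, Map<String, List<String>> deps,
                                      Set<String> done, List<String> path) {
        int seenAt = path.indexOf(pkg);
        if (seenAt >= 0) {
            // pkg is already on the current chain, so the chain from there back to pkg is a cycle.
            List<String> cycle = new ArrayList<>(path.subList(seenAt, path.size()));
            cycle.add(pkg);
            return cycle;
        }
        if (done.contains(pkg)) {
            return new ArrayList<>(); // fully explored earlier, no cycle through it
        }
        path.add(pkg);
        for (String dep : deps.getOrDefault(pkg, Collections.<String>emptyList())) {
            List<String> cycle = visit(dep, deps, done, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        path.remove(path.size() - 1);
        done.add(pkg);
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        // Hypothetical edges: "base" -> "io" means base references something in io.
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("base", Arrays.asList("io", "util"));
        deps.put("io", Arrays.asList("base"));
        deps.put("util", Arrays.asList("base"));
        deps.put("fn", Arrays.asList("base"));
        System.out.println(findCycle(deps)); // prints [base, io, base]
    }
}
```

If base depended on nothing else, as in the proposal above, the same check would return an empty list for these packages. A real build would more likely rely on an existing dependency-analysis tool than on a hand-rolled check like this one.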
