[Long] Code Sharing Concepts

Craig R. McClanahan Thu, 08 Feb 2001 13:50:45 -0800
Several times, on several developer lists here within the Jakarta
community and elsewhere, the question comes up -- "why aren't we sharing
more code between projects and subprojects?"  Lots of us have opinions
and experiences about where this has worked (and not worked) for us in
the past.  The purpose of this thread is to discuss how we can create
and/or accumulate some infrastructure within Jakarta to encourage
sharing.

In order to accomplish this, some discussion on a wide ranging set of
concepts would be useful, so we can come to consensus on what steps we'd
like to see the Jakarta community take (if any), and how.  As a
starting point for these discussions, I've outlined a few issues below
we might want to consider.


(1)  Subproject Or Community Resource?

This is a fundamental organizational / process question.  If a shared
library codebase is to be a separate Jakarta subproject, it would need
to be managed in the usual manner (sponsored by a set of initial
committers, have an initial codebase, proposed to PMC, voted and
accepted).  Alternatively, the shared code could be a "resource" to the
entire project, but that raises some interesting process and policy
questions:

* Who gets commit access?  All committers on all subprojects?
  If not all, how do you choose?

* Who decides whether a proposed codebase should be added
  to the shared library?

* Who arbitrates cases of conflict over goals of a particular module
  or set of classes within the shared code?

These issues are all normally handled by committer votes in a normal
Jakarta subproject -- the absence of such processes would seem almost
certain to create conflict.  Therefore, IMHO, we should commission a
regular subproject for this (assuming there are enough interested
committers with enough code to be worthwhile).

NOTE:  As with Ant, there is an issue of whether code in a shared
library like this fits the scope of the Jakarta Project as a whole, as
defined by the Board of the Apache Software Foundation.  We have sort of
established a precedent that "libraries useful in the
construction of server components" (such as the regular expression
libraries) are in scope -- but would we accept a library of Swing GUI
components?  Even if they were going to be used to build admin consoles
for server apps?


(2)  Functional Scope - What Is A Shareable Package?

IMHO we need to identify some basic rules that define what code modules
are reasonable candidates for being included in the shared library
(instead of in a particular subproject).  The following paragraphs
identify some dimensions along which such a decision might be defined.

(2a) Horizontal or Vertical Orientation

Peter Donald spoke of "vertical" integration as a typical model of code
sharing that goes on now -- projects that depend on each other, but
semi-autonomously.  It seems to me that what we are talking about here
is "horizontal" integration -- identifying code that is truly useful in
many domains, but without carrying along the baggage of dependency on
lots of other pieces of an existing subproject.

There are many code modules where it's pretty obvious that the
functionality provided is shareable and independent.  But, IMHO, above a
certain level of complexity you will tend to find dependencies -- either
explicit because of shared APIs or implicit because of shared design
assumptions -- on particular application use cases or design patterns.

We should strive, at least initially, to collect code that is relatively
simple and independent of such assumptions, to avoid concerns about
vertical dependencies.  Over time, we can get better at identifying the
selection criteria for more complex cases.

(2b) External Dependencies

A fairly unambiguous criterion that can be used for admitting proposed
modules to the shared library is which external dependencies that code
has.  The following paragraphs identify some simple cases that we should
create policies for:

Core Java APIs - JDK 1.1 or 1.2? - Many Jakarta subprojects still wish
to (or must) remain compatible with JDK 1.1.  Does that mean the entire
shared codebase must be JDK 1.1-compatible?  Do we support multiple
versions of some functionality to support this?

Optional Java Package APIs - If a shared component provides
functionality around a standard Java API (such as a connection pool
around the JDBC APIs), should it be required to conform to those APIs,
or may it define its own?  What happens when a Java API is standardized
after existing implementations of an API (past example = JDBC data
sources; upcoming example = JSR for a Logging API)?  Should it be
possible to build the shared library (without support for this
particular API) if the API itself is not available on the developer's
environment?

External Package Wrappers - It is reasonable to visualize JavaBeans (or
collections of JavaBeans) that provide insulation wrappers around
complex external packages (such as an XML parser) that make use of those
packages.  If generally useful, such wrappers should be encouraged, but
IMHO they should be arranged as optionally built components (in a manner
similar to the way that Ant builds optional tasks only if the underlying
packages are available).
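
As a rough illustration of the same idea applied at run time rather than at build time, here is a minimal sketch of a check that a wrapper could make before touching an optional package.  The class name OptionalPackage and the method isAvailable are hypothetical, invented for this example; they are not part of Ant or of any existing Jakarta code.

```java
// Hypothetical helper (not an existing Jakarta or Ant class): reports whether
// an optional external package is present on the classpath, so that a wrapper
// can be skipped instead of failing when the package is missing.
final class OptionalPackage {
    private OptionalPackage() {}

    /** Returns true if the named class can be located; the class is not initialized. */
    static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, OptionalPackage.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Either the class is absent or one of its own dependencies is.
            return false;
        }
    }
}
```

A wrapper around an XML parser could, for example, call OptionalPackage.isAvailable("org.xml.sax.XMLReader") and register itself only when that returns true.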

Application or Framework Dependencies - Should code that substantially
depends on internal APIs of existing applications or subprojects be
rejected, or required to be refactored before submission?  Example case
- the Turbine connection pool implementation can be easily built as a
separate JAR file, but it takes along about 40 classes of Turbine
infrastructure.

(2c) API Stability Promises

We've all experienced cases of vertical integration where incompatible
API changes between versions of a package we depend on cause grief.
IMHO this issue becomes much worse if a shared library succeeds in
being widely used, because incompatible changes have a much broader
impact.

IMHO we need to formalize some sort of promise that a particular module
is "stable" with reference to APIs (implementations can, of course,
change underneath).  A module that is under active development (and
therefore hasn't settled its APIs yet) should either be constructed
outside the shared subproject, or clearly marked (somehow) as
"experimental" or "unstable" so that developers unwilling to accept the
risk of API instability are warned about it.

(2d) Overlapping Functionality

There has been an implicit assumption in the discussion about connection
pools that the "obvious" answer is to have a single implementation that
everyone can use, i.e. "one size fits all".  While that is
possibly/probably true in the connection pool case, it is not clear that
this will *always* be true -- sometimes we might want to offer more than
one implementation in the same area of functionality.  Examples within
Jakarta already:
* Regular expression parsers - we have two different libraries with
  two different goal sets and complexity levels.
* Taglibs - we have sets of tags with overlapping functionality
  that are interdependent with their own related tags (micro
  frameworks, if you will).

We should have policies about whether or not we want to accept this kind
of overlap -- IMHO ambiguity here will cause lots of conflict.

(2e) Ongoing Support Commitment

One of the risks of a shared library (versus small stand-alone projects)
is that the usual commitment to ongoing support and enhancement (by the
original authors) can be less.  We do not want to end up with a
bunch of "orphaned" code that was placed in the shared subproject so
that "someone else will have to maintain it" -- there needs to be at
least some level of commitment from original contributors to fixing
bugs and adding enhancements.


(3) Packaging and Versioning

(3a)  Package Naming Conventions

I would suggest "org.apache.share.xxxxx", but am open to any reasonable
suggestion.

(3b) Documentation Requirements

Code created within a particular subproject tends to be better known to
the committers on that subproject.  Code from a shared library is, by
its nature, going to be less well known -- in the same sense that an
external package you depend on is.  Should we have a "higher" standard
for API documentation and usage examples?  How would/could such a
standard be enforced?

(3c)  Individual Packages or Entire Library?

Is the sum total of all the code in the shared subproject a single
"thing", with a single release cycle (and perhaps in a single JAR), or
are all the modules inside independently accessed and versioned?

I suspect we would prefer something of a combination approach:
* Individual JAR files for each independent module
  within the shared library
* Independent versioning/packaging/docs per module
* Periodic "snapshot" releases of the entire suite (version A
  of the suite includes version X of module 1, version Y of
  module 2, and so on) for easy download.


(4)  Initial Codebases To Consider

Just to whet appetites, here are a few codebase chunks from projects I
am familiar with that might or might not fit the criteria we come up with,
and therefore might be candidates for contribution to a shared resource
library.  (I would, of course, be willing to make whatever documentation
and support commitments are required.)

[Struts] Connection pool - One of many candidates for JDBC connection
pools.  (NOTE:  I don't care so much about who "wins" as whether it has
the particular functionality I need -- javax.sql.DataSource
implementation, so that I can plug *any* connection pool implementation
I want into my framework or apps.)
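
To make the "plug in any pool" point concrete, here is a minimal sketch of application code that depends only on the standard javax.sql.DataSource interface.  The class ConnectionChecker is hypothetical and exists only for this illustration; it is not taken from Struts or from any of the candidate pools.

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

// Hypothetical application class: it knows nothing about the pool behind the
// DataSource, so any implementation of the interface can be supplied.
final class ConnectionChecker {
    private final DataSource dataSource;

    ConnectionChecker(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    /**
     * Borrows a connection and reports whether it is open. The connection is
     * closed again on exit, which a pooling DataSource would typically treat
     * as returning it to the pool. Returns false if no connection is available.
     */
    boolean canConnect() {
        try (Connection connection = dataSource.getConnection()) {
            return !connection.isClosed();
        } catch (SQLException e) {
            return false;
        }
    }
}
```

Swapping one pool for another then means changing only the object passed to the constructor.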

[Struts] Digester package - fires events during SAX processing of an XML
document, similar to Tomcat's XmlMapper used to configure itself.
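
For readers who have not seen this style of XML processing, the following is a minimal sketch of the general technique: callbacks registered against element paths and fired as the SAX parser reports each matching start tag.  It is not the Struts Digester API or Tomcat's XmlMapper; the class PathRuleHandler and its methods are hypothetical names chosen for the example.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: fires a registered callback whenever the current
// element path (for example "server/connector") matches a rule.
final class PathRuleHandler extends DefaultHandler {
    private final Map<String, Consumer<Attributes>> rules = new HashMap<>();
    private final Deque<String> path = new ArrayDeque<>();

    /** Registers a callback for an element path such as "server/connector". */
    void addRule(String elementPath, Consumer<Attributes> callback) {
        rules.put(elementPath, callback);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        path.addLast(qName);
        Consumer<Attributes> rule = rules.get(String.join("/", path));
        if (rule != null) {
            rule.accept(attributes);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        path.removeLast();
    }

    /** Parses the XML text, firing matching rules; parser failures are wrapped. */
    void parse(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), this);
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }
}
```

A configuration reader would register one rule per element it cares about and then call parse() once; elements at other paths are simply ignored.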

[Struts] Message Resources - utility classes for packaging resources
(similar to ResourceBundles but Serializable) that include messages for
the same keys in multiple locales.  Extension of the StringManager
concept in Tomcat and others.
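
A minimal sketch of the idea, assuming a simple fallback order (exact locale, then language only, then a default locale) that I have picked for illustration.  SimpleMessages is a hypothetical class, not the Struts implementation.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: a Serializable store of messages that holds text for
// the same key in several locales.
final class SimpleMessages implements Serializable {
    private static final long serialVersionUID = 1L;

    private final Map<String, String> messages = new HashMap<>();
    private final Locale defaultLocale;

    SimpleMessages(Locale defaultLocale) {
        this.defaultLocale = defaultLocale;
    }

    /** Stores the text for a key in one locale. */
    void add(Locale locale, String key, String text) {
        messages.put(locale.toString() + "." + key, text);
    }

    /**
     * Looks the key up in the requested locale, then in its language alone,
     * then in the default locale. Returns null if nothing matches.
     */
    String get(Locale locale, String key) {
        String text = messages.get(locale.toString() + "." + key);
        if (text == null) {
            text = messages.get(locale.getLanguage() + "." + key);
        }
        if (text == null) {
            text = messages.get(defaultLocale.toString() + "." + key);
        }
        return text;
    }
}
```

Because the whole object is Serializable, it can be stored wherever ordinary serializable application state is kept, which a ResourceBundle cannot.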

[Struts] Bean Introspection support - BeanUtils, PropertyUtils, and
ConvertUtils classes have generic support for managing JavaBean property
setting and getting through the use of the Java Reflection APIs,
including support for nested and indexed property expressions.
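
As an illustration of what a nested property expression involves, here is a minimal sketch that resolves a dotted expression such as "address.city" through plain reflection.  It is not the Struts PropertyUtils API: NestedProperties, Address, and Customer are hypothetical names, only "getXxx" style getters are handled, and indexed expressions are left out to keep the example short.

```java
import java.lang.reflect.Method;

// Hypothetical sketch: walks a dotted property expression by calling the
// public getter for each segment in turn.
final class NestedProperties {
    private NestedProperties() {}

    /** Resolves an expression such as "address.city" against the given bean. */
    static Object get(Object bean, String expression) {
        Object current = bean;
        for (String name : expression.split("\\.")) {
            current = readProperty(current, name);
        }
        return current;
    }

    private static Object readProperty(Object bean, String name) {
        // Assumes the JavaBeans "getXxx" naming pattern for the getter.
        String getterName = "get" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
        try {
            Method getter = bean.getClass().getMethod(getterName);
            return getter.invoke(bean);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot read property '" + name + "'", e);
        }
    }
}

// Hypothetical sample beans, present only to exercise the sketch.
final class Address {
    private final String city;
    Address(String city) { this.city = city; }
    public String getCity() { return city; }
}

final class Customer {
    private final Address address;
    Customer(Address address) { this.address = address; }
    public Address getAddress() { return address; }
}
```

With these beans, NestedProperties.get(customer, "address.city") calls getAddress() and then getCity() on the result.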

[Tomcat 4] Naming - Robust implementation of JNDI Context and DirContext
APIs for in-memory resource collections to be accessed via JNDI (with
extensions for things like WAR files that are Tomcat-specific, and
therefore would probably remain in Tomcat).

[Tomcat/Watchdog/Slide/Ant] HTTP test clients for unit tests - I think
there are at least five or six of them across these subprojects; we don't need
more than one (or possibly two).

[Tomcat] - A bunch of low level utility modules, much like any other
subproject accumulates, that might be generalizable and generally
useful.


(5) Summary

What do you think?  Are we willing to work towards a common set of
goals, and then implement a strategy based on them?  Comments are
welcome.

Craig McClanahan


