I think the primary focus is grouping packages according to the
following rules:

        1) Group packages that are strongly connected
        2) Start with the largest group and try to merge into it any
           groups that do not cause additional dependencies
        3) Repeat for all groups (see the sketch below)
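
In code, the merge step could look roughly like this. This is a minimal
sketch with invented names, assuming the dependency graph has been
reduced to package -> imported packages:

    import java.util.*;

    // Sketch of rules 2) and 3): given the strongly connected groups,
    // greedily merge groups into larger ones when the merge adds no
    // imports that the target does not already have.
    public class GreedyGrouping {

        // The external imports of a group: everything its packages
        // import, minus what the group itself contains.
        static Set<String> importsOf(Set<String> group,
                                     Map<String, Set<String>> imports) {
            Set<String> result = new HashSet<>();
            for (String pkg : group)
                result.addAll(imports.getOrDefault(pkg, Set.of()));
            result.removeAll(group); // internal imports are free
            return result;
        }

        static List<Set<String>> merge(List<Set<String>> groups,
                                       Map<String, Set<String>> imports) {
            groups.sort((a, b) -> b.size() - a.size()); // largest first
            for (int i = 0; i < groups.size(); i++) {   // all groups
                Set<String> target = groups.get(i);
                Iterator<Set<String>> it = groups.listIterator(i + 1);
                while (it.hasNext()) {
                    Set<String> candidate = it.next();
                    Set<String> added = importsOf(candidate, imports);
                    added.removeAll(target); // would become internal
                    // the merge is free if it brings no new imports
                    if (importsOf(target, imports).containsAll(added)) {
                        target.addAll(candidate);
                        it.remove();
                    }
                }
            }
            return groups;
        }
    }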

I think you need a cost/entropy function to calculate the optimum
division. If you group packages A and B, the cost is zero if A and B
have identical imports. Even if A and B are not cohesive, they could be
placed in the same bundle because they do not drag in additional
dependencies. So the interesting question is what the cost is when B
adds one additional import. Is it worth it? Or not?
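
To make that question concrete, one possible merge cost (my own
strawman, not a proven metric) is the number of imports the merge adds
minus the number it internalizes; zero or negative means the merge is
clearly worth it, and the right threshold for a positive cost is
exactly the open question:

    import java.util.*;

    public class MergeCost {
        // aImports/bImports: the external imports of each group;
        // aPackages/bPackages: the packages each group contains.
        static int cost(Set<String> aImports, Set<String> aPackages,
                        Set<String> bImports, Set<String> bPackages) {
            Set<String> added = new HashSet<>(bImports);
            added.removeAll(aImports);  // A already pays for these
            added.removeAll(aPackages); // these become internal
            Set<String> internalized = new HashSet<>(aImports);
            internalized.retainAll(bPackages); // A's imports of B
            return added.size() - internalized.size();
        }
    }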

Part of that analysis is then to check whether you could do more
grouping if classes were moved from their packages to other (new?)
packages. I.e. sometimes you have a package where a single class makes
the package ungroupable. I think there should be the concept of a
"dependency cost". If X is imported by 15 packages and 254 classes, it
is likely that you get your money's worth for that dependency. However,
if you find that a single class drags in dependencies that nobody else
uses, that class is likely expensive. It is interesting how much of
this can be automated, but I expect you need people to look at the
details.
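
A first stab at automating that "dependency cost" could be as simple as
counting, for each external dependency, the classes that actually use
it (again a sketch with invented names):

    import java.util.*;

    public class DependencyCost {
        // uses: class -> the external dependencies it references.
        // A dependency with a single user marks that class as a
        // candidate for moving to another (new?) package.
        static Map<String, List<String>> users(
                Map<String, Set<String>> uses) {
            Map<String, List<String>> byDep = new HashMap<>();
            uses.forEach((clazz, deps) -> {
                for (String dep : deps)
                    byDep.computeIfAbsent(dep, d -> new ArrayList<>())
                         .add(clazz);
            });
            return byDep;
        }

        static void report(Map<String, Set<String>> uses) {
            users(uses).forEach((dep, classes) -> {
                if (classes.size() == 1) // one class drags it all in
                    System.out.println(dep + " is only used by "
                            + classes.get(0));
            });
        }
    }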

One of the biggest modularity problems usually shows up with bridge
classes, i.e. someone has a library doing X but wants to make it
available to, for example, Spring. There are then usually a few classes
bridging the library to the Spring world, and these can be extremely
expensive. For example, bnd is coupled to ant, but I made sure that
coupling sits in a separate package.

This all seems closely related to the concept of entropy, and it might
be interesting to take a look at Shannon et al. You have to find a
decomposition that has minimum entropy, where entropy is somehow
defined in terms of imports versus contents. You want to group as much
as possible while minimizing the connections between the groups. Again,
this normally means you need a cost function and then optimize that
cost function.
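
I do not have the right definition, but as a strawman (not Shannon's
formula, just the imports-versus-contents intuition): score a
decomposition by the fraction of references that cross group
boundaries, so 0 means every group is fully self-contained:

    import java.util.*;

    public class DecompositionScore {
        // refs: class -> the classes it references;
        // groups: the disjoint sets of classes of a decomposition.
        static double score(Map<String, Set<String>> refs,
                            List<Set<String>> groups) {
            double total = 0;
            for (Set<String> group : groups) {
                int internal = 0, external = 0;
                for (String clazz : group)
                    for (String t : refs.getOrDefault(clazz, Set.of()))
                        if (group.contains(t)) internal++;
                        else external++;
                if (internal + external > 0)
                    total += (double) external / (internal + external);
            }
            return groups.isEmpty() ? 0 : total / groups.size();
        }
    }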

However, start with the mechanical grouping and apply that idea to open
source projects to see what this would look like. If you could
calculate the "entropy" of existing bundles, that would also be very
interesting.

Kind regards,

        Peter Kriens


On 8 jun 2011, at 10:35, Tiger Gui wrote:

> Hi Peter,
> 
> I am working on the design and implementation of a source code
> dependency analysis algorithm, and I will finish the whole analysis
> algorithm in the coming month. The algorithm has two parts: packages
> and classes.
> 
> 1. Package section
> 
> a. It can analyse package cycles in the project source code
> b. It lists all the packages that each package requires
> c. For each package, it reports which packages use it
> 
> 2. Class section
> 
> a. It reports all class cycles in the project source code (for
> example A -> B -> C -> A); see the sketch below
> b. It lists all the classes that each class requires (for example, it
> can tell us that class A uses classes B, C and D)
> c. For each class, it reports which classes use it (for example, it
> can tell us that class A is used by classes B and C)
> 
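> Roughly, the cycle detection will be based on Tarjan's strongly
> connected components algorithm, something like this simplified sketch
> (the real code will differ):
> 
>     import java.util.*;
> 
>     // Every strongly connected component with more than one member
>     // (or a self-loop) is a cycle like A -> B -> C -> A.
>     class Cycles {
>         Map<String, Set<String>> deps; // class -> classes it uses
>         Map<String, Integer> index = new HashMap<>();
>         Map<String, Integer> low = new HashMap<>();
>         Deque<String> stack = new ArrayDeque<>();
>         Set<String> onStack = new HashSet<>();
>         List<Set<String>> sccs = new ArrayList<>();
>         int counter = 0;
> 
>         Cycles(Map<String, Set<String>> deps) { this.deps = deps; }
> 
>         List<Set<String>> find() {
>             for (String v : deps.keySet())
>                 if (!index.containsKey(v)) visit(v);
>             return sccs;
>         }
> 
>         void visit(String v) {
>             index.put(v, counter); low.put(v, counter); counter++;
>             stack.push(v); onStack.add(v);
>             for (String w : deps.getOrDefault(v, Set.of())) {
>                 if (!index.containsKey(w)) {
>                     visit(w);
>                     low.put(v, Math.min(low.get(v), low.get(w)));
>                 } else if (onStack.contains(w)) {
>                     low.put(v, Math.min(low.get(v), index.get(w)));
>                 }
>             }
>             if (low.get(v).intValue() == index.get(v).intValue()) {
>                 Set<String> scc = new HashSet<>(); // component root
>                 String w;
>                 do {
>                     w = stack.pop(); onStack.remove(w); scc.add(w);
>                 } while (!w.equals(v));
>                 sccs.add(scc);
>             }
>         }
>     }
> 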
> After we get the source code analysis report, we should split the
> project into several OSGi bundles, so the problem is how to split the
> project according to the report.
> 
> My initial opinion:
> 
> A. Classes in a cycle should be in the same bundle
> B. Classes (or interfaces) that are used heavily by other classes but
> do not require any other class can go in the same bundle. (Usually
> these are basic interfaces or abstract classes that define the API.)
> 
> I am fairly clear about these two situations, but there must be many
> other situations. So, your advice?
> 
> -- 
> Best Regards
> ----------------------------------------------------
> Tiger Gui [tigergui1...@gmail.com]
