You may want to filter out small files and files that follow common naming conventions. For example, https://github.com/apache/accumulo/blob/trunk/maven-plugin/src/it/plugin-test/postbuild.groovy and https://github.com/apache/maven-plugins/blob/trunk/maven-invoker-plugin/src/it/script-additional-vars/src/it/groovy/postbuild.groovy are not the same file, but both were probably built from the same example template.
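
A rough sketch of that kind of filter, in Python. The size threshold, the set of template file names, and the (project, path, size) record shape are all assumptions made for illustration; none of them come from the published data set, so tune them against the real numbers.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical values: pick a threshold and name list that fit the real data.
MIN_SIZE_BYTES = 200
TEMPLATE_NAMES = {"postbuild.groovy", "prebuild.groovy", "verify.groovy"}


def is_candidate(path, size):
    """Keep a file only if it is large enough and not a known template name."""
    name = PurePosixPath(path).name
    return size >= MIN_SIZE_BYTES and name not in TEMPLATE_NAMES


def shared_names(records):
    """Map each surviving file name to the sorted projects that contain it.

    records is an iterable of (project, path, size_in_bytes) tuples.
    Only names that occur in two or more projects are returned.
    """
    projects_by_name = defaultdict(set)
    for project, path, size in records:
        if is_candidate(path, size):
            projects_by_name[PurePosixPath(path).name].add(project)
    return {
        name: sorted(projects)
        for name, projects in projects_by_name.items()
        if len(projects) > 1
    }


# Example input; the sizes are made up for illustration.
records = [
    ("accumulo", "maven-plugin/src/it/plugin-test/postbuild.groovy", 900),
    ("maven-plugins", "maven-invoker-plugin/src/it/script-additional-vars/src/it/groovy/postbuild.groovy", 700),
    ("openejb", "jug/client/command/api/AbstractCommand.java", 1500),
    ("tomee", "jug/client/command/api/AbstractCommand.java", 1500),
    ("projectA", "test/Foo.java", 50),
    ("projectB", "test/Foo.java", 60),
]
print(shared_names(records))
```

With this input, the two postbuild.groovy scripts are dropped by name and the two tiny test/Foo.java files are dropped by size, so only AbstractCommand.java is still reported as shared between openejb and tomee.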

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii

On Fri, Mar 21, 2014 at 12:49 AM, Pawel Slusarz <p...@sw7d.com> wrote:
> Greetings,
>
> When looking at the Apache SF Java projects as a group, I noticed that a
> large number of projects have duplicate class names, ie
> both openejb and tomee have a class named
> jug.client.command.api.AbstractCommand
>
> When edge cases, ie test.Foo and tomcat55, tomcat60, tomcat70 get
> eliminated, it still appears that the practice of code sharing by
> drag-drop-modify is quite prevalent. Over 14,000 (out of 165,000)
> classes were shared that way in the ecosystem, and 103 projects (out of
> 300) are affected.
>
> Sometimes a measurement and visualization is all it takes to realize a
> problem and begin fixing it. Below is raw data that can help understand
> better what and how is happening:
>
> http://pslusarz.github.io/archeology3d/research/apache/conflicting-classes/index.html
>
> Hope this is the right place to engage in this sort of conversation.
>
> Paul Slusarz
>
> PS: Who am I and what's my agenda? I am interested in looking at large
> codebases in search of patterns. I picked Apache SF, because, unlike my
> company code, the data can be independently verified. The issue with
> conflicting class names became apparent as I was trying to identify and
> understand classes that are shared in this ecosystem. Some more
> background on this approach can be found on my blog:
> http://10kftcode.blogspot.com/
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: community-unsubscr...@apache.org
> For additional commands, e-mail: community-h...@apache.org
>