FYI, running another mirror is a noble goal, and I'm sure a lot of
good could come of finding innovative ways to index. I did want to let
you know, though, that there is already a searchable index that gives
a number of tools a way to search by class name, GAV coordinates, etc.

It might make more sense to find new information to include in the
Nexus index. See
http://docs.codehaus.org/display/M2ECLIPSE/Nexus+Indexer for more
information about the index and how to generate one from the command
line.

References:

[1] http://weblogs.java.net/blog/kohsuke/archive/2008/05/nexus_index_is.html
[2] http://wiki.netbeans.org/MavenBestPractices


On Tue, Jul 14, 2009 at 4:08 PM, Geoff Clitheroe<[email protected]> wrote:
> Hi,
>
> I'm interested in hosting a maven mirror in New Zealand.  As far as I
> know there is not one available in this region.  Any comments on this
> being a good idea?  If so what is the preferred method (and source)
> for creating a mirror and keeping it in sync?  I work for
> http://www.gns.cri.nz and we could either host here (good because we
> are on research network as well) or at one of our remote sites (good
> because they are on the main NZ peering points).  I've got 'in
> principle' agreement but I need to work out some real numbers (disk,
> bandwidth etc).
>
> Aside from the obvious local mirror reasons I'm also interested in
> adding class name search to a repo, similar to
> http://www.findjar.com/.  I'll include, at the end of this message, a
> discussion I've been having with one of the Tattletale developers
> (http://www.jboss.org/tattletale) about this idea and some testing
> I've been doing.  If anyone has any comments on the idea (validity,
> necessity, obvious pitfalls etc) it would be greatly appreciated.
>
> Cheers,
> Geoff
>
>
> Hi Jesper,
>
> Just following up on searching for classes.  I'm imagining that when
> Tattletale reports on missing classes that it would then be possible
> to provide a link to a list of jars that contain that class (like
> http://www.findjar.com/).  You mentioned profiles for Tattletale which
> I will get back to at the end.
>
> I've done a spike test for implementing class name level searching for
> public repos.  I would hope to develop a search function that is
> embeddable and could also be a web service (it's actually more complex
> to make it embeddable than to provide a webservice).  From what I've
> done I see the main issues as being bandwidth and political and I
> don't think they would be insurmountable.
>
> What I've done is scrape about 2000 jars from
> http://mirrors.ibiblio.org/pub/mirrors/maven2/
> and http://repository.jboss.com/maven2/  I targeted Spring, Hibernate,
> Seam, Webbeans, Tapestry, and a few apache-commons projects.  I then
> analyse the jars using 'jar tf ...' to extract the class names and
> populate a lucene index using solr.  The resulting index is about 3.8M
> (for 880M of jars) with no thought to space saving in the index yet.
> Analysis and indexing takes about 10 mins on my aged laptop and again
> I've done nothing to optimize this as yet.
>
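
Inline note on the 'jar tf' step above: a minimal Java sketch of the
same extraction, assuming the standard java.util.jar API is used
instead of shelling out to the jar tool. The class and method names
(JarClassLister, toClassName, listClassNames) are made up for
illustration and are not taken from the spike code.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: lists the class names found in a jar,
// roughly what "jar tf some.jar" plus a filter would give.
public class JarClassLister {

    // Turns "net/sf/hibernate/Hibernate.class" into
    // "net.sf.hibernate.Hibernate". Returns null for entries that are
    // not class files (manifest, resources, directories).
    // Nested classes keep their '$', e.g. "a.Outer$Inner".
    static String toClassName(String entryName) {
        if (!entryName.endsWith(".class")) {
            return null;
        }
        String stem = entryName.substring(0, entryName.length() - ".class".length());
        return stem.replace('/', '.');
    }

    // Reads the jar's table of contents and collects the class names.
    static List<String> listClassNames(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = toClassName(entries.nextElement().getName());
                if (name != null) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JarClassLister <path-to-jar>");
            return;
        }
        for (String name : listClassNames(args[0])) {
            System.out.println(name);
        }
    }
}
```

Each returned name would then become one document (or one field
value) in whatever index is used. How the spike actually feeds solr
is not shown in the message, so that part is left out here.
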
> I've added two search methods to access the index:
> public List<JarName> findJarsByClassName(String className);
> public List<JarLocation> findJarsByJarName(String jarName);
>
> So findJarsByClassName() returns a list of JarName objects for the
> jars that contain that class, and from that the jar names can be listed:
>
> findJarsByClassName("net.sf.hibernate.Hibernate")
> hibernate-2.0.1.jar
> hibernate-2.0.2.jar
> hibernate-2.0.3.jar
> hibernate-2.0-beta-6.jar
> hibernate-2.0-final.jar
> hibernate-2.0-final.jar
> hibernate-2.0-beta-5.jar
> hibernate-2.1.1.jar
> hibernate-2.1.2.jar
> hibernate-2.1.3.jar
>
> Searching for a class name is case insensitive and can match part of a
> class name down to the punctuation token level (e.g., net.sf matches but
> net.sf.hiber does not).  For implementing search the devil is always in
> the analysis but I think this is a fairly well defined problem.
>
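
Inline note on the matching rule above: one plausible reading of
"punctuation token level" is that names are lowercased, split on
punctuation, and a query matches when its tokens appear as a
contiguous run of whole tokens in the class name. The sketch below
implements that reading in plain Java. It is an assumption about the
behaviour, not the actual solr analyzer configuration from the spike,
and TokenMatch is a made-up name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of case-insensitive, whole-token matching.
public class TokenMatch {

    // Lowercases the text and splits it on anything that is not an
    // ASCII letter or digit, so '.', '$', '_' and '-' all separate tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // True when the query tokens occur, in order and adjacent to each
    // other, somewhere in the class name's tokens. Partial tokens such
    // as "hiber" do not match "hibernate".
    static boolean matches(String query, String className) {
        List<String> queryTokens = tokenize(query);
        if (queryTokens.isEmpty()) {
            return false;
        }
        return Collections.indexOfSubList(tokenize(className), queryTokens) >= 0;
    }

    public static void main(String[] args) {
        String name = "net.sf.hibernate.Hibernate";
        System.out.println(matches("net.sf", name));
        System.out.println(matches("NET.SF.Hibernate", name));
        System.out.println(matches("net.sf.hiber", name));
    }
}
```

With this rule "net.sf" and "NET.SF.Hibernate" match
net.sf.hibernate.Hibernate, while "net.sf.hiber" does not, which
lines up with the example given in the message.
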
> Then findJarsByJarName() can be used to find where a particular jar
> is and return a URL and containing directory URL (often more useful as
> the first thing to do is usually look at the POM).  Ultimately this
> could return links to several repos:
>
> findJarsByJarName("hibernate-2.1.3.jar")
> hibernate-2.1.3.jar
> http://mirrors.ibiblio.org/pub/mirrors/maven2/hibernate/hibernate/2.1.3/hibernate-2.1.3.jar
> http://mirrors.ibiblio.org/pub/mirrors/maven2/hibernate/hibernate/2.1.3
>
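
Inline note on the location lookup above: because the jar sits in a
standard Maven 2 repository layout (group directories, then
artifactId, then version, then the file), both the containing
directory and the GAV coordinates can be derived from the jar URL
alone. A small Java sketch follows. JarLocationSketch and its methods
are hypothetical names, and the repository base URL is assumed to be
known to the caller.

```java
import java.util.Arrays;

// Hypothetical helper for URLs that follow the Maven 2 layout:
//   <base>/<group/as/dirs>/<artifactId>/<version>/<file>
public class JarLocationSketch {

    // Drops the file name, leaving the directory that also holds the POM.
    static String containingDirectory(String jarUrl) {
        int slash = jarUrl.lastIndexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("not a URL with a path: " + jarUrl);
        }
        return jarUrl.substring(0, slash);
    }

    // Returns {groupId, artifactId, version} for a jar under repoBaseUrl.
    static String[] coordinates(String repoBaseUrl, String jarUrl) {
        if (!jarUrl.startsWith(repoBaseUrl)) {
            throw new IllegalArgumentException("jar is not under " + repoBaseUrl);
        }
        String relative = jarUrl.substring(repoBaseUrl.length());
        while (relative.startsWith("/")) {
            relative = relative.substring(1);
        }
        String[] parts = relative.split("/");
        if (parts.length < 4) {
            throw new IllegalArgumentException("path too short for Maven 2 layout: " + relative);
        }
        int n = parts.length;
        // Everything before artifactId/version/file is the group, dot-joined.
        String groupId = String.join(".", Arrays.copyOfRange(parts, 0, n - 3));
        return new String[] { groupId, parts[n - 3], parts[n - 2] };
    }

    public static void main(String[] args) {
        String base = "http://mirrors.ibiblio.org/pub/mirrors/maven2/";
        String jar = base + "hibernate/hibernate/2.1.3/hibernate-2.1.3.jar";
        System.out.println(containingDirectory(jar));
        System.out.println(String.join(":", coordinates(base, jar)));
    }
}
```

Returning coordinates as well as URLs would make the result
independent of any single mirror, which fits the idea of eventually
returning links to several repos.
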
> The search is very fast (milliseconds per query in my spike) and I'm
> going indirectly through solr - this can be made quicker (but more
> complex) by working with lucene directly.
>
> If this looks interesting I'd be happy to provide the spike code for
> any comments, feedback etc.  Do you use Linux (Unix), Windows, or Mac?
> Then I could make sure there are working examples.
>
> For the project I'm proposing there are some implementation questions
> mainly around getting all the jars and syncing out new indexes (but
> these issues have been well addressed by Lucene before).  In the first
> instance I would approach the owners of http://repo2.maven.org/maven2/
> about scraping and hosting a mirror in New Zealand.  I certainly think
> it could be done.  I guess I'd imagine a subproject of lucene but that
> is yet to be thought through.
>
> Back to the Tattletale profile question.  Are you referring to files like 
> this:
> http://fisheye.jboss.org/browse/Tattletale/trunk/src/etc/sunjdk5-jsse.clz?r=trunk
>
> If this API is a snapshot of the classes contained in a jar at some
> release point then it's easy to do with 'jar tf' and a script (as long
> as the jar spec stays about the same and people don't obfuscate, or
> use a custom class loader etc).  I could produce this as a side effect
> of indexing a repo but I wonder how this will scale.
>
> I look forward to hearing from you.
>
> Cheers,
> Geoff
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>

