Hi,

I'm interested in hosting a Maven mirror in New Zealand. As far as I know there is not one available in this region. Any comments on whether this is a good idea? If so, what is the preferred method (and source) for creating a mirror and keeping it in sync?

I work for http://www.gns.cri.nz and we could host either here (good because we are on the research network as well) or at one of our remote sites (good because they are on the main NZ peering points). I've got agreement in principle but I need to work out some real numbers (disk, bandwidth etc).

Aside from the obvious local mirror reasons, I'm also interested in adding class name search to a repo, similar to http://www.findjar.com/. I'll include, at the end of this message, a discussion I've been having with one of the Tattletale developers (http://www.jboss.org/tattletale) about this idea and some testing I've been doing. If anyone has any comments on the idea (validity, necessity, obvious pitfalls etc) they would be greatly appreciated.

Cheers,
Geoff

Hi Jesper,

Just following up on searching for classes. I'm imagining that when Tattletale reports on missing classes it would then be possible to provide a link to a list of jars that contain that class (like http://www.findjar.com/). You mentioned profiles for Tattletale, which I will get back to at the end.

I've done a spike test for implementing class-name-level searching for public repos. I would hope to develop a search function that is embeddable and could also be a web service (it's actually more complex to make it embeddable than to provide a web service). From what I've done, I see the main issues as being bandwidth and politics, and I don't think they would be insurmountable.

What I've done is scrape about 2000 jars from http://mirrors.ibiblio.org/pub/mirrors/maven2/ and http://repository.jboss.com/maven2/. I targeted Spring, Hibernate, Seam, Webbeans, Tapestry, and a few apache-commons projects. I then analyse the jars using 'jar tf ...' to extract the class names and populate a Lucene index using Solr. The resulting index is about 3.8M (for 880M of jars), with no thought given to saving space in the index yet. Analysis and indexing takes about 10 mins on my aged laptop, and again I've done nothing to optimize this as yet.
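
As an illustration of the extraction step only, here is a minimal sketch in Java that does roughly what 'jar tf' plus a filter for .class entries would do. It is not the spike code: the class name JarClassLister and its methods are hypothetical, and it uses the standard java.util.jar API instead of shelling out to the jar tool. Feeding the results into Solr is left out.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

// Hypothetical helper (an assumption for illustration, not the spike code):
// lists the class names contained in a jar file.
final class JarClassLister {

    private JarClassLister() {}

    // Converts an entry path such as "net/sf/hibernate/Hibernate.class"
    // into "net.sf.hibernate.Hibernate". Returns null for non-class entries.
    static String toClassName(String entryName) {
        if (!entryName.endsWith(".class")) {
            return null;
        }
        String withoutSuffix =
                entryName.substring(0, entryName.length() - ".class".length());
        return withoutSuffix.replace('/', '.');
    }

    // Returns the sorted class names found in the given jar.
    // Inner classes keep their '$' separator (e.g. "Outer$Inner").
    static List<String> classNames(Path jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            return jarFile.stream()
                    .map(JarEntry::getName)
                    .map(JarClassLister::toClassName)
                    .filter(name -> name != null)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Prints "jar<TAB>class" for every class in every jar given as an argument.
    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            for (String className : classNames(Paths.get(arg))) {
                System.out.println(arg + "\t" + className);
            }
        }
    }
}
```

Run with one or more jar paths as arguments; each output line pairs a jar with one class it contains, which is the kind of record an indexer would consume.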

I've added two search methods to access the index:

    public List<JarName> findJarsByClassName(String className);
    public List<JarLocation> findJarsByJarName(String jarName);

So findJarsByClassName() returns a list of JarName objects for the jars that contain that class, and from those the jar names can be listed:

    findJarsByClassName("net.sf.hibernate.Hibernate")

    hibernate-2.0.1.jar
    hibernate-2.0.2.jar
    hibernate-2.0.3.jar
    hibernate-2.0-beta-6.jar
    hibernate-2.0-final.jar
    hibernate-2.0-final.jar
    hibernate-2.0-beta-5.jar
    hibernate-2.1.1.jar
    hibernate-2.1.2.jar
    hibernate-2.1.3.jar

Searching for a class name is case insensitive and can match part of a class name down to the punctuation-token level (e.g., net.sf matches but net.sf.hiber does not). For implementing search the devil is always in the analysis, but I think this is a fairly well defined problem.

Then findJarsByJarName() can be used to find where a particular jar is and return its URL and the URL of the containing directory (often more useful, as the first thing to do is usually look at the POM). Ultimately this could return links to several repos:

    findJarsByJarName("hibernate-2.1.3.jar")

    hibernate-2.1.3.jar
    http://mirrors.ibiblio.org/pub/mirrors/maven2/hibernate/hibernate/2.1.3/hibernate-2.1.3.jar
    http://mirrors.ibiblio.org/pub/mirrors/maven2/hibernate/hibernate/2.1.3

The search is very fast (milliseconds per query in my spike) even though I'm going indirectly through Solr. It could be made quicker (but more complex) by working with Lucene directly.

If this looks interesting I'd be happy to provide the spike code for any comments, feedback etc. Do you use Linux (Unix), Windows, or Mac? Then I could make sure there are working examples for your platform.

For the project I'm proposing there are some implementation questions, mainly around getting all the jars and syncing out new indexes (but these issues have been well addressed by Lucene before). In the first instance I would approach the owners of http://repo2.maven.org/maven2/ about scraping and hosting a mirror in New Zealand.
I certainly think it could be done. I guess I'd imagine a subproject of Lucene, but that is yet to be thought through.

Back to the Tattletale profile question. Are you referring to files like this:

http://fisheye.jboss.org/browse/Tattletale/trunk/src/etc/sunjdk5-jsse.clz?r=trunk

If this API is a snapshot of the classes contained in a jar at some release point then it's easy to do with 'jar tf' and a script (as long as the jar spec stays about the same and people don't obfuscate, use a custom class loader etc). I could produce this as a side effect of indexing a repo, but I wonder how this will scale.

I look forward to hearing from you.

Cheers,
Geoff
