Tomcat uses its own, slightly different version of the URL class:
http://tomcat.apache.org/tomcat-5.5-doc/catalina/docs/api/index.html
URL is designed to provide public APIs for parsing and synthesizing Uniform
Resource Locators as similar as possible to the APIs of java.net.URL, but
without the ability to open a stream or connection. One of the consequences
of this is that you can construct URLs for protocols for which a
URLStreamHandler is not available (such as an https URL when JSSE is not
installed).
The synchronized stuff in java.net.URL is URLStreamHandler-related.
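For comparison, here is a minimal sketch using only JDK classes. It shows java.net.URL rejecting a protocol with no registered URLStreamHandler, while java.net.URI parses the same string without any handler lookup. The class name, the helper methods and the "madeup" scheme are made up for illustration, and it assumes no custom URLStreamHandlerFactory is installed in the JVM. It is not taken from Nutch, Bixo or Tomcat.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class UrlVersusUri {

    // Returns true if java.net.URL accepts the spec, which requires that a
    // URLStreamHandler can be found for its protocol.
    static boolean urlHasHandler(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            // Thrown for an unknown protocol, among other parse failures.
            return false;
        }
    }

    // java.net.URI only parses the string; it never looks up a stream handler.
    static String uriScheme(String spec) throws URISyntaxException {
        return new URI(spec).getScheme();
    }

    static String uriHost(String spec) throws URISyntaxException {
        return new URI(spec).getHost();
    }

    public static void main(String[] args) throws Exception {
        // "madeup" is a hypothetical scheme with no handler in a stock JVM.
        String spec = "madeup://example.com/index.html";
        System.out.println("URL accepted: " + urlHasHandler(spec));
        System.out.println("URI scheme:   " + uriScheme(spec));
        System.out.println("URI host:     " + uriHost(spec));
    }
}
```

Whether URI would be an acceptable replacement depends on how strictly the crawled links follow the URI syntax, so treat this as an illustration of the handler lookup only.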
-----Original Message-----
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: December-09-09 5:40 PM
To: nutch-dev@lucene.apache.org
Subject: RE: java.net.URL synchronization
I checked java.net.URL; yes, Nutch and BIXO implicitly use a synchronized
Hashtable:
    public URL(String protocol, String host, int port, String file,
               URLStreamHandler handler) throws MalformedURLException {
        ...
        if (handler == null &&
            (handler = getURLStreamHandler(protocol)) == null) {
            throw new MalformedURLException("unknown protocol: " +
                                            protocol);
        }
        ...
However, I don't think it hurts, because both architectures (at least BIXO)
run a single thread in a single JVM to process, for instance, Outlinks. Only
the Fetch part is multithreaded, but it doesn't use the URL class.
Not sure about Nutch and how the fetch list is generated... if it is
multithreaded, then a RegexUrlNormalizer shared between threads is an even
bigger problem...
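If the worry is a single normalizer instance contended by many threads, one common way around it is to give each thread its own instance. Below is a rough sketch of that idea. SimpleRegexNormalizer is a hypothetical stand-in written for this example, not the Nutch RegexUrlNormalizer, and the jsessionid rule is only an assumed sample rule.

```java
import java.util.regex.Pattern;

public class PerThreadNormalizer {

    // Hypothetical stand-in for a regex-based URL normalizer (not Nutch code).
    static final class SimpleRegexNormalizer {
        // A compiled Pattern is immutable and safe to share between threads.
        private static final Pattern SESSION_ID =
                Pattern.compile(";jsessionid=[^?#]*", Pattern.CASE_INSENSITIVE);

        String normalize(String url) {
            // matcher() creates a new Matcher per call, so no Matcher state
            // is shared across threads.
            return SESSION_ID.matcher(url).replaceAll("");
        }
    }

    // One normalizer per thread, created lazily on first use, so threads
    // never contend on a shared instance.
    private static final ThreadLocal<SimpleRegexNormalizer> NORMALIZER =
            ThreadLocal.withInitial(SimpleRegexNormalizer::new);

    static String normalize(String url) {
        return NORMALIZER.get().normalize(url);
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> System.out.println(
                Thread.currentThread().getName() + " -> "
                + normalize("http://example.com/a;jsessionid=ABC123?x=1"));
        Thread t1 = new Thread(task, "worker-1");
        Thread t2 = new Thread(task, "worker-2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

This trades a little memory per thread for the absence of locking; whether it is needed at all depends on how the real normalizer keeps its state.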
Fuad Efendi
+1 416-993-2060
http://www.tokenizer.ca/
Data Mining, Vertical Search
-----Original Message-----
From: Otis Gospodnetic [mailto:ogjunk-nu...@yahoo.com]
Sent: December-09-09 5:12 PM
To: nutch-dev@lucene.apache.org
Subject: java.net.URL synchronization
Hello,
Has anyone seen this:
http://www.supermind.org/blog/580/java-net-url-synchronization-bottleneck
?
Is this something that needs to be addressed in Nutch (and thus in Bixo,
and thus in the common crawler project)?
Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
Fuad Efendi
+1 416-993-2060
http://www.linkedin.com/in/liferay