Hi,

I was trying to run Benchmark in trunk using MySQL, on a standalone Hadoop cluster. My conf/gora.properties has this:

gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver
gora.sqlstore.jdbc.url=jdbc:mysql://localhost:3306/nutch?user=nutch&password=nutch

Jobs were failing though, with the following:

Exception in thread "main" java.lang.NoSuchMethodError: org.hsqldb.DatabaseURL.parseURL(Ljava/lang/String;ZZ)Lorg/hsqldb/persist/HsqlProperties;
        at org.hsqldb.jdbc.JDBCDriver.getConnection(Unknown Source)
        at org.hsqldb.jdbc.JDBCDriver.connect(Unknown Source)
        at java.sql.DriverManager.getConnection(DriverManager.java:582)
        at java.sql.DriverManager.getConnection(DriverManager.java:207)
        at org.gora.sql.store.SqlStore.getConnection(SqlStore.java:712)
        at org.gora.sql.store.SqlStore.initialize(SqlStore.java:145)
        at org.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:64)
        at org.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:86)
        at org.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:98)
        at org.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:70)
        at org.apache.nutch.storage.StorageUtils.createDataStore(StorageUtils.java:25)
        at org.apache.nutch.storage.StorageUtils.initMapperJob(StorageUtils.java:68)
        at org.apache.nutch.storage.StorageUtils.initMapperJob(StorageUtils.java:50)
        at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:237)
        at org.apache.nutch.tools.Benchmark.benchmark(Benchmark.java:190)
        at org.apache.nutch.tools.Benchmark.run(Benchmark.java:139)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.tools.Benchmark.main(Benchmark.java:32)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)


This was puzzling at first. It turns out that java.sql.DriverManager tries _all_ registered drivers in turn to see which one can handle the JDBC URL. The usual magic of Class.forName(jdbcDriver) doesn't mean we are going to use jdbcDriver; it only makes sure the driver class is loaded and has registered itself on the list of available drivers.
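
To see which drivers DriverManager will actually consider, something like the following can help. It is only a diagnostic sketch: the class name DriverProbe and the method driversAccepting are made up for illustration and are not part of Gora or Nutch. It lists the registered drivers in registration order and asks each one whether it accepts a given URL. Note that acceptsURL() may not exercise the same code path as connect(), so a driver that looks harmless here can still fail when a connection is attempted.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper, not part of Gora or Nutch.
public class DriverProbe {

    /**
     * Returns the class names of the registered JDBC drivers that claim to
     * accept the given URL, in the order DriverManager reports them.
     * Drivers that fail while being asked are listed with the error.
     */
    public static List<String> driversAccepting(String jdbcUrl) {
        List<String> names = new ArrayList<String>();
        Enumeration<Driver> drivers = DriverManager.getDrivers();
        while (drivers.hasMoreElements()) {
            Driver driver = drivers.nextElement();
            String name = driver.getClass().getName();
            try {
                if (driver.acceptsURL(jdbcUrl)) {
                    names.add(name);
                }
            } catch (SQLException | LinkageError e) {
                // LinkageError covers NoSuchMethodError from mismatched jars.
                names.add(name + " (failed: " + e + ")");
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Pass the JDBC URL as the first argument; the default is only an example.
        String url = args.length > 0 ? args[0] : "jdbc:mysql://localhost:3306/nutch";
        System.out.println("Drivers accepting " + url + ": " + driversAccepting(url));
    }
}
```

Running this with the same classpath as the failing job should show whether an HSQLDB driver is registered ahead of the MySQL one.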

Now, I know why this particular error occurred: Hadoop includes HSQLDB 1.8, and we use HSQLDB 2.0. When DriverManager tries each driver in turn, HSQLDB is unfortunately first on the classpath (it comes in Hadoop/lib) and MySQL is last, so it bombs out before even trying the right driver...

For now I changed my build.xml to this:

Index: build.xml
===================================================================
--- build.xml   (revision 983564)
+++ build.xml   (working copy)
@@ -123,7 +123,7 @@
                   excludes="nutch-default.xml,nutch-site.xml"/>
       <zipfileset dir="${conf.dir}" excludes="*.template,hadoop*.*"/>
       <zipfileset dir="${build.lib.dir}" prefix="lib"
-                  includes="**/*.jar" excludes="hadoop-*.jar"/>
+                  includes="**/*.jar" excludes="hadoop-*.jar,hsqldb*.jar"/>
       <zipfileset dir="${build.plugins}" prefix="plugins"/>
     </jar>
   </target>
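
A different way around this, which is NOT what the patch above does and is not how SqlStore currently works as far as this mail shows, would be to skip DriverManager and ask the configured driver class for a connection directly. The sketch below only illustrates that idea; DirectDriverConnect and connectWith are hypothetical names.

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.SQLException;
import java.util.Properties;

// Hypothetical helper, not part of Gora or Nutch.
public class DirectDriverConnect {

    /**
     * Instantiates the named driver class and connects through it directly,
     * so no other registered driver is ever asked to handle the URL.
     */
    public static Connection connectWith(String driverClass, String jdbcUrl, Properties props)
            throws ReflectiveOperationException, SQLException {
        Driver driver = (Driver) Class.forName(driverClass)
                .getDeclaredConstructor().newInstance();
        // Per the JDBC contract, connect() returns null when the driver
        // does not recognise the URL.
        Connection conn = driver.connect(jdbcUrl, props);
        if (conn == null) {
            throw new SQLException("Driver " + driverClass + " does not accept the given URL");
        }
        return conn;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: DirectDriverConnect <driverClass> <jdbcUrl>");
            return;
        }
        Connection conn = connectWith(args[0], args[1], new Properties());
        try {
            System.out.println("Connected via " + conn.getMetaData().getDriverName());
        } finally {
            conn.close();
        }
    }
}
```

This would not remove the conflicting HSQLDB jar from the classpath; it only avoids touching it when a different driver is configured.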



--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
