Hi,

On 6/3/07, Emmanuel JOKE <[EMAIL PROTECTED]> wrote:
> Thanks, but it seems it doesn't work. My data still uses 7 GB of space
> on my disk.
>
> My command line is:
> /usr/local/java/bin/java -Xmx512m -Dhadoop.log.dir=/data/sengine/logs
> -Dhadoop.log.file=hadoop.log
> -Djava.library.path=/data/sengine/lib/native/Linux-i386-32
> -classpath /data/sengine/conf:/usr/local/java/lib/tools.jar:/data/sengine/build:/data/sengine/build/nutch-1.0-dev.job:/data/sengine/build/test/classes:/data/sengine/nutch-*.job:/data/sengine/lib/commons-cli-2.0-SNAPSHOT.jar:/data/sengine/lib/commons-codec-1.3.jar:/data/sengine/lib/commons-httpclient-3.0.1.jar:/data/sengine/lib/commons-lang-2.1.jar:/data/sengine/lib/commons-logging-1.0.4.jar:/data/sengine/lib/commons-logging-api-1.0.4.jar:/data/sengine/lib/hadoop-0.12.2-core.jar:/data/sengine/lib/jakarta-oro-2.0.7.jar:/data/sengine/lib/jets3t-0.5.0.jar:/data/sengine/lib/jetty-5.1.4.jar:/data/sengine/lib/junit-3.8.1.jar:/data/sengine/lib/log4j-1.2.13.jar:/data/sengine/lib/lucene-core-2.1.0.jar:/data/sengine/lib/lucene-misc-2.1.0.jar:/data/sengine/lib/servlet-api.jar:/data/sengine/lib/taglibs-i18n.jar:/data/sengine/lib/xerces-2_6_2-apis.jar:/data/sengine/lib/xerces-2_6_2.jar:/data/sengine/lib/jetty-ext/ant.jar:/data/sengine/lib/jetty-ext/commons-el.jar:/data/sengine/lib/jetty-ext/jasper-compiler.jar:/data/sengine/lib/jetty-ext/jasper-runtime.jar:/data/sengine/lib/jetty-ext/jsp-api.jar
> org.apache.nutch.crawl.CrawlDbMerger /data/sengine/crawlmd/crawldb
> /data/sengine/crawl/crawldb /data/sengine/crawl.b2/crawldb
>
> It looks like it loads the native library, but I cannot see any log
> entry saying so.
>
> The libraries are in the folder:
> # ls /data/sengine/lib/native/Linux-i386-32
> libhadoop.a  libhadoop.so  libhadoop.so.1  libhadoop.so.1.0.0
>
> How can I be sure that the compression works?
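A quick way to check is to open one of the crawldb part files and ask the
reader directly. The sketch below is mine, not from the thread: it assumes
the crawldb layout of that era (MapFile part directories under current/,
whose "data" files are plain SequenceFiles), and the class name and path
argument are illustrative only.

  // Sketch: report whether a crawldb data file is (block-)compressed.
  // Pass a path such as /data/sengine/crawlmd/crawldb/current/part-00000/data
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.SequenceFile;

  public class CheckCrawlDbCompression {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      // A MapFile's "data" file is a SequenceFile, so the reader can tell
      // us which compression (if any) it was written with.
      SequenceFile.Reader reader =
          new SequenceFile.Reader(fs, new Path(args[0]), conf);
      System.out.println("compressed:       " + reader.isCompressed());
      System.out.println("block compressed: " + reader.isBlockCompressed());
      reader.close();
    }
  }

If "block compressed" still prints false after the merge, the BLOCK setting
never reached the job that re-wrote the data.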
Just to be sure, did you change the io.seqfile.compression.type option to
BLOCK? Also, IIRC, if Hadoop loads the native libs it doesn't print a log
message about it; Hadoop only reports when it fails to load them.

>
> Regards,
> E
>
> > Emmanuel JOKE wrote:
> >> Hi Guys,
> >>
> >> I've read an article which explains that we are now able to use the
> >> native lib of hadoop in order to compress our crawled data.
> >>
> >> I'm just wondering how we can compress a crawldb and all the other
> >> stuff that is already saved on the disk. Could you please help me?
> >
> > You can use the *Merger tools to re-write the data. E.g. CrawlDbMerger
> > for crawldb, giving just a single db as the input argument.
> >
> >
> > --
> > Best regards,
> > Andrzej Bialecki <><
> >  ___. ___ ___ ___ _ _   __________________________________
> > [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> > ___|||__||  \|  ||  |  Embedded Unix, System Integration
> > http://www.sigram.com  Contact: info at sigram dot com

--
Doğacan Güney
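For reference, the io.seqfile.compression.type setting mentioned above lives
in the Hadoop configuration; a minimal sketch of the entry in hadoop-site.xml
(which overrides hadoop-default.xml) would be:

  <property>
    <name>io.seqfile.compression.type</name>
    <value>BLOCK</value>
    <description>SequenceFile compression: NONE, RECORD or BLOCK.</description>
  </property>

BLOCK compresses batches of records together and usually shrinks a crawldb
far more than per-record compression; the *Merger job should then pick the
setting up from the configuration when it re-writes the data.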
