Thanks, but it doesn't seem to work. My data still takes up 7 GB on my
disk.
My command line is:
/usr/local/java/bin/java -Xmx512m \
  -Dhadoop.log.dir=/data/sengine/logs \
  -Dhadoop.log.file=hadoop.log \
  -Djava.library.path=/data/sengine/lib/native/Linux-i386-32 \
  -classpath /data/sengine/conf:/usr/local/java/lib/tools.jar:/data/sengine/build:/data/sengine/build/nutch-1.0-dev.job:/data/sengine/build/test/classes:/data/sengine/nutch-*.job:/data/sengine/lib/commons-cli-2.0-SNAPSHOT.jar:/data/sengine/lib/commons-codec-1.3.jar:/data/sengine/lib/commons-httpclient-3.0.1.jar:/data/sengine/lib/commons-lang-2.1.jar:/data/sengine/lib/commons-logging-1.0.4.jar:/data/sengine/lib/commons-logging-api-1.0.4.jar:/data/sengine/lib/hadoop-0.12.2-core.jar:/data/sengine/lib/jakarta-oro-2.0.7.jar:/data/sengine/lib/jets3t-0.5.0.jar:/data/sengine/lib/jetty-5.1.4.jar:/data/sengine/lib/junit-3.8.1.jar:/data/sengine/lib/log4j-1.2.13.jar:/data/sengine/lib/lucene-core-2.1.0.jar:/data/sengine/lib/lucene-misc-2.1.0.jar:/data/sengine/lib/servlet-api.jar:/data/sengine/lib/taglibs-i18n.jar:/data/sengine/lib/xerces-2_6_2-apis.jar:/data/sengine/lib/xerces-2_6_2.jar:/data/sengine/lib/jetty-ext/ant.jar:/data/sengine/lib/jetty-ext/commons-el.jar:/data/sengine/lib/jetty-ext/jasper-compiler.jar:/data/sengine/lib/jetty-ext/jasper-runtime.jar:/data/sengine/lib/jetty-ext/jsp-api.jar \
  org.apache.nutch.crawl.CrawlDbMerger \
  /data/sengine/crawlmd/crawldb \
  /data/sengine/crawl/crawldb /data/sengine/crawl.b2/crawldb
It looks like it loads the native library, but I can't see any log line
that confirms it.
The libraries are in the folder:

# ls /data/sengine/lib/native/Linux-i386-32
libhadoop.a  libhadoop.so  libhadoop.so.1  libhadoop.so.1.0.0
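
If I understand it correctly, Hadoop's NativeCodeLoader logs whether the
native library was picked up, so grepping the log should tell me (the
exact wording may differ between versions; this is just a guess on my
side):

# Look for the native-library message in the Hadoop log.
grep -i native /data/sengine/logs/hadoop.log
# A successful load should print something like:
#   INFO util.NativeCodeLoader - Loaded the native-hadoop library
# while a failure falls back to the built-in Java classes and says so.

But I don't see either message.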
How can I be sure that the compression works?
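The only check I can think of (assuming the crawldb keeps its usual
current/part-NNNNN/data layout, and that a compressed SequenceFile names
its codec in the file header) would be something like:

# Hypothetical check: look for a compression codec class name in the
# header of one crawldb part file (the path is from my setup above).
strings /data/sengine/crawl/crawldb/current/part-00000/data | head
# Seeing e.g. org.apache.hadoop.io.compress.DefaultCodec would suggest
# that compression is actually on.

Is that right, or is there a better way?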
Regards,
E
Emmanuel JOKE wrote:
Hi Guys,
I've read an article explaining that we can now use Hadoop's native
library to compress our crawled data.
I'm just wondering how we can compress a crawldb, and all the other data
that is already saved on disk.
Could you please help me?
You can use the *Merger tools to re-write the data. E.g. CrawlDbMerger
for crawldb, giving just a single db as the input argument.
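
An untested sketch, assuming your bin/nutch has the mergedb alias for
CrawlDbMerger, and with example paths:

# Rewrite the crawldb into a new directory; the merger produces a
# brand-new db, so the compression settings active in your Hadoop
# config at that moment apply to the output.
bin/nutch mergedb /data/sengine/crawldb-recompressed /data/sengine/crawl/crawldb

Make sure compression is enabled in your config before running it
(io.seqfile.compression.type, if I remember the property name
correctly), and compare the output size afterwards.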
--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  || |   Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com