Hi,

On 6/3/07, Emmanuel JOKE <[EMAIL PROTECTED]> wrote:
> Thanks, but it seems it doesn't work. My data still uses 7 GB of space
> on my disk.
>
> My command line is:
> /usr/local/java/bin/java -Xmx512m -Dhadoop.log.dir=/data/sengine/logs
> -Dhadoop.log.file=hadoop.log
> -Djava.library.path=/data/sengine/lib/native/Linux-i386-32
> -classpath
> /data/sengine/conf:/usr/local/java/lib/tools.jar:/data/sengine/build:/data/sengine/build/nutch-1.0-dev.job:/data/sengine/build/test/classes:/data/sengine/nutch-*.job:/data/sengine/lib/commons-cli-2.0-SNAPSHOT.jar:/data/sengine/lib/commons-codec-1.3.jar:/data/sengine/lib/commons-httpclient-3.0.1.jar:/data/sengine/lib/commons-lang-2.1.jar:/data/sengine/lib/commons-logging-1.0.4.jar:/data/sengine/lib/commons-logging-api-1.0.4.jar:/data/sengine/lib/hadoop-0.12.2-core.jar:/data/sengine/lib/jakarta-oro-2.0.7.jar:/data/sengine/lib/jets3t-0.5.0.jar:/data/sengine/lib/jetty-5.1.4.jar:/data/sengine/lib/junit-3.8.1.jar:/data/sengine/lib/log4j-1.2.13.jar:/data/sengine/lib/lucene-core-2.1.0.jar:/data/sengine/lib/lucene-misc-2.1.0.jar:/data/sengine/lib/servlet-api.jar:/data/sengine/lib/taglibs-i18n.jar:/data/sengine/lib/xerces-2_6_2-apis.jar:/data/sengine/lib/xerces-2_6_2.jar:/data/sengine/lib/jetty-ext/ant.jar:/data/sengine/lib/jetty-ext/commons-el.jar:/data/sengine/lib/jetty-ext/jasper-compiler.jar:/data/sengine/lib/jetty-ext/jasper-runtime.jar:/data/sengine/lib/jetty-ext/jsp-api.jar
> org.apache.nutch.crawl.CrawlDbMerger /data/sengine/crawlmd/crawldb
> /data/sengine/crawl/crawldb /data/sengine/crawl.b2/crawldb
>
> It looks like it loads the native library, but I can't see any log
> that says so.
>
> The libraries are in the folder:
> #ls /data/sengine/lib/native/Linux-i386-32
> libhadoop.a  libhadoop.so  libhadoop.so.1  libhadoop.so.1.0.0
>
> How can I be sure that the compression works?


Just to be sure, did you change the io.seqfile.compression.type option
to BLOCK? Also, IIRC, if Hadoop loads the native libs, it doesn't print
a log about it; Hadoop only reports if it fails to load them.
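
For reference, setting it would look roughly like this in your
conf/hadoop-site.xml (the property name is the standard Hadoop one;
treat the snippet as a sketch, not your exact config):

  <property>
    <name>io.seqfile.compression.type</name>
    <value>BLOCK</value>
  </property>

After re-running the merge, one simple check is to compare the on-disk
size of the output with the original, e.g.

  du -sh /data/sengine/crawlmd/crawldb

If block compression took effect, it should come out noticeably smaller
than the 7 GB input.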

>
> Regards,
> E
>
>
>
> > Emmanuel JOKE wrote:
> >> Hi Guys,
> >>
> >> I've read an article which explains that we are now able to use the
> >> native lib of Hadoop in order to compress our crawled data.
> >>
> >> I'm just wondering how we can compress a crawldb and all the other
> >> stuff that is already saved on disk.
> >> Could you please help me?
> >
> > You can use the *Merger tools to re-write the data. E.g. CrawlDbMerger
> > for crawldb, giving just a single db as the input argument.
> >
> >
> > --
> > Best regards,
> > Andrzej Bialecki     <><
> >   ___. ___ ___ ___ _ _   __________________________________
> > [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> > ___|||__||  \|  ||  |  Embedded Unix, System Integration
> > http://www.sigram.com  Contact: info at sigram dot com
> >
> >
>
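
As an aside, for the CrawlDbMerger run Andrzej suggested above, you
don't have to build that long java command line by hand. A sketch,
assuming the standard bin/nutch launcher script from your checkout
(mergedb is the command that maps to CrawlDbMerger; the paths are the
ones from your command line):

  bin/nutch mergedb /data/sengine/crawlmd/crawldb \
      /data/sengine/crawl/crawldb /data/sengine/crawl.b2/crawldb

The first argument is the output db, the rest are inputs, and the
script takes care of the classpath and native library path for you.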


-- 
Doğacan Güney
