Yes, I did:
<property>
  <name>io.seqfile.compression.type</name>
  <value>BLOCK</value>
</property>
Any clues?
Hi,
On 6/3/07, Emmanuel JOKE <[EMAIL PROTECTED]> wrote:
Thanks, but it seems it doesn't work. My data still uses 7 GB of space
on my disk.
My command line is:
/usr/local/java/bin/java -Xmx512m -Dhadoop.log.dir=/data/sengine/logs -Dhadoop.log.file=hadoop.log -Djava.library.path=/data/sengine/lib/native/Linux-i386-32 -classpath /data/sengine/conf:/usr/local/java/lib/tools.jar:/data/sengine/build:/data/sengine/build/nutch-1.0-dev.job:/data/sengine/build/test/classes:/data/sengine/nutch-*.job:/data/sengine/lib/commons-cli-2.0-SNAPSHOT.jar:/data/sengine/lib/commons-codec-1.3.jar:/data/sengine/lib/commons-httpclient-3.0.1.jar:/data/sengine/lib/commons-lang-2.1.jar:/data/sengine/lib/commons-logging-1.0.4.jar:/data/sengine/lib/commons-logging-api-1.0.4.jar:/data/sengine/lib/hadoop-0.12.2-core.jar:/data/sengine/lib/jakarta-oro-2.0.7.jar:/data/sengine/lib/jets3t-0.5.0.jar:/data/sengine/lib/jetty-5.1.4.jar:/data/sengine/lib/junit-3.8.1.jar:/data/sengine/lib/log4j-1.2.13.jar:/data/sengine/lib/lucene-core-2.1.0.jar:/data/sengine/lib/lucene-misc-2.1.0.jar:/data/sengine/lib/servlet-api.jar:/data/sengine/lib/taglibs-i18n.jar:/data/sengine/lib/xerces-2_6_2-apis.jar:/data/sengine/lib/xerces-2_6_2.jar:/data/sengine/lib/jetty-ext/ant.jar:/data/sengine/lib/jetty-ext/commons-el.jar:/data/sengine/lib/jetty-ext/jasper-compiler.jar:/data/sengine/lib/jetty-ext/jasper-runtime.jar:/data/sengine/lib/jetty-ext/jsp-api.jar
org.apache.nutch.crawl.CrawlDbMerger /data/sengine/crawlmd/crawldb
/data/sengine/crawl/crawldb /data/sengine/crawl.b2/crawldb
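
For what it's worth, the same merge can be launched through the bin/nutch
wrapper script, which assembles this classpath by itself; assuming a
standard Nutch checkout, it would look something like:

bin/nutch mergedb /data/sengine/crawlmd/crawldb /data/sengine/crawl/crawldb /data/sengine/crawl.b2/crawldb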
It looks like it loads the native library, but I cannot see any log
message saying so.
The libraries are in the folder:
#ls /data/sengine/lib/native/Linux-i386-32
libhadoop.a libhadoop.so libhadoop.so.1 libhadoop.so.1.0.0
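
One generic sanity check (my suggestion, nothing Hadoop-specific): make
sure the shared object really is a 32-bit ELF binary matching your JVM, e.g.

file /data/sengine/lib/native/Linux-i386-32/libhadoop.so.1.0.0

should report something like "ELF 32-bit LSB shared object, Intel 80386".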
How can I be sure that the compression works?
Just to be sure, did you change the io.seqfile.compression.type option
to BLOCK?
Also, IIRC, if Hadoop loads the native libs, it doesn't print a log
message about it; Hadoop only reports when it fails to load them.
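
If you want to check the output files directly, a small sketch along
these lines should do it (this assumes the SequenceFile.Reader API as I
remember it from the 0.12 branch, so double-check against your version;
the example path is just the usual crawldb layout):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

public class CheckCompression {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Open one of the part files directly,
    // e.g. /data/sengine/crawlmd/crawldb/current/part-00000/data
    SequenceFile.Reader reader =
        new SequenceFile.Reader(fs, new Path(args[0]), conf);
    // The reader exposes how the file was written.
    System.out.println("compressed:       " + reader.isCompressed());
    System.out.println("block compressed: " + reader.isBlockCompressed());
    reader.close();
  }
}

If isBlockCompressed() prints true, the merge rewrote the data with
BLOCK compression.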
Regards,
E
> Emmanuel JOKE wrote:
>> Hi Guys,
>>
>> I've read an article which explains that we can now use the native
>> Hadoop libraries to compress our crawled data.
>>
>> I'm just wondering how we can compress a crawldb and all the other
>> data that is already saved on disk.
>> Could you please help me?
>
> You can use the *Merger tools to re-write the data. E.g. CrawlDbMerger
> for crawldb, giving just a single db as the input argument.
>
>
> --
> Best regards,
> Andrzej Bialecki <><
>   ___. ___ ___ ___ _ _   __________________________________
>  [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
>  ___|||__||  \|  || |   Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com
>
>
--
Doğacan Güney