Yes, I did:
<property>
  <name>io.seqfile.compression.type</name>
  <value>BLOCK</value>
</property>
Any clues?
Hi,
On 6/3/07, Emmanuel JOKE <[EMAIL PROTECTED]> wrote:
Thanks, but it seems it doesn't work. My data still uses 7 GB of space
on my disk.
My command line is:
/usr/local/java/bin/java -Xmx512m -Dhadoop.log.dir=/data/sengine/logs -Dhadoop.log.file=hadoop.log -Djava.library.path=/data/sengine/lib/native/Linux-i386-32 -classpath /data/sengine/conf:/usr/local/java/lib/tools.jar:/data/sengine/build:/data/sengine/build/nutch-1.0-dev.job:/data/sengine/build/test/classes:/data/sengine/nutch-*.job:/data/sengine/lib/commons-cli-2.0-SNAPSHOT.jar:/data/sengine/lib/commons-codec-1.3.jar:/data/sengine/lib/commons-httpclient-3.0.1.jar:/data/sengine/lib/commons-lang-2.1.jar:/data/sengine/lib/commons-logging-1.0.4.jar:/data/sengine/lib/commons-logging-api-1.0.4.jar:/data/sengine/lib/hadoop-0.12.2-core.jar:/data/sengine/lib/jakarta-oro-2.0.7.jar:/data/sengine/lib/jets3t-0.5.0.jar:/data/sengine/lib/jetty-5.1.4.jar:/data/sengine/lib/junit-3.8.1.jar:/data/sengine/lib/log4j-1.2.13.jar:/data/sengine/lib/lucene-core-2.1.0.jar:/data/sengine/lib/lucene-misc-2.1.0.jar:/data/sengine/lib/servlet-api.jar:/data/sengine/lib/taglibs-i18n.jar:/data/sengine/lib/xerces-2_6_2-apis.jar:/data/sengine/lib/xerces-2_6_2.jar:/data/sengine/lib/jetty-ext/ant.jar:/data/sengine/lib/jetty-ext/commons-el.jar:/data/sengine/lib/jetty-ext/jasper-compiler.jar:/data/sengine/lib/jetty-ext/jasper-runtime.jar:/data/sengine/lib/jetty-ext/jsp-api.jar
org.apache.nutch.crawl.CrawlDbMerger /data/sengine/crawlmd/crawldb
/data/sengine/crawl/crawldb /data/sengine/crawl.b2/crawldb
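
For what it's worth, the same merge can be launched through the bin/nutch
wrapper script, which assembles this classpath by itself; assuming a
standard Nutch checkout, it would look something like:

bin/nutch mergedb /data/sengine/crawlmd/crawldb /data/sengine/crawl/crawldb /data/sengine/crawl.b2/crawldb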
It looks like it loads the native library, but I cannot see any log
message saying so.
The libraries are in the folder:
#ls /data/sengine/lib/native/Linux-i386-32
libhadoop.a libhadoop.so libhadoop.so.1 libhadoop.so.1.0.0
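
One generic sanity check (my suggestion, nothing Hadoop-specific): make
sure the shared object really is a 32-bit ELF binary matching your JVM, e.g.

file /data/sengine/lib/native/Linux-i386-32/libhadoop.so.1.0.0

should report something like "ELF 32-bit LSB shared object, Intel 80386".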
How can I be sure that the compression works?
Just to be sure, did you change the io.seqfile.compression.type option
to BLOCK?
Also, IIRC, if Hadoop loads the native libs, it doesn't print a log
message about it; Hadoop only reports when it fails to load them.
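
If you want to check the output files directly, a small sketch along
these lines should do it (this assumes the SequenceFile.Reader API as I
remember it from the 0.12 branch, so double-check against your version;
the example path is just the usual crawldb layout):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

public class CheckCompression {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Open one of the part files directly,
    // e.g. /data/sengine/crawlmd/crawldb/current/part-00000/data
    SequenceFile.Reader reader =
        new SequenceFile.Reader(fs, new Path(args[0]), conf);
    // The reader exposes how the file was written.
    System.out.println("compressed:       " + reader.isCompressed());
    System.out.println("block compressed: " + reader.isBlockCompressed());
    reader.close();
  }
}

If isBlockCompressed() prints true, the merge rewrote the data with
BLOCK compression.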
Regards,
E
> Emmanuel JOKE wrote:
>> Hi Guys,
>>
>> I've read an article which explains that we can now use the native
>> Hadoop libraries to compress our crawled data.
>>
>> I'm just wondering how we can compress a crawldb and all the other
>> data that is already saved on disk.
>> Could you please help me?
>
> You can use the *Merger tools to re-write the data. E.g. CrawlDbMerger
> for crawldb, giving just a single db as the input argument.
>
>
> --
> Best regards,
> Andrzej Bialecki <><
>   ___. ___ ___ ___ _ _   __________________________________
>  [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
>  ___|||__||  \|  || |   Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com
>
>
--
Doğacan Güney