Re: How can I get one plugin's root dir

Andrzej Bialecki Tue, 16 Jan 2007 11:27:31 -0800

Scott Green wrote:

Thank you for the detailed explanation, Andrzej.


My plugin contains one language model (a configuration file) whose size
is 40 MB. Could you please suggest where the model file should be
put?
a) Put it into the nutch/conf dir, like the "regex-urlfilter.txt" file.
b) Put it into the plugin's jar package.

From a purely theoretical point of view, either way should work fine: the content of the conf/ dir is packed into the job jar too.

One comment though, and I hope I'm not confusing you too much ;) If the file is that large, AND you execute your jobs using jobtracker/tasktrackers, AND you run on Hadoop DFS, you may want to do exactly the opposite from what I advocated ;) I.e. keep this file in a well-known external location on DFS, where it's accessible to all tasks. You should also set its replication factor equal to the number of datanodes, and then load this file directly from DFS. Still, you wouldn't use java.io.File, but FileSystem.open(Path).

The reason is that if you pack this file into your job JAR, the job jar would become very large (presumably this 40MB is already compressed?). The job jar needs to be copied to each tasktracker for each task, so you will experience a performance hit just because of the size of the job jar ... whereas if this file sits on DFS and is highly replicated, its content will always be available locally.
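
To illustrate the "no java.io.File" point, here is a minimal sketch of a loader that reads the model from a plain InputStream, so the same code works whether the stream comes from DFS or from a test. The class name ModelLoader, the line-per-entry model format, and the path in the comment are assumptions made up for this example, not anything from Nutch itself. The Hadoop calls are shown only in comments so that the sketch compiles without Hadoop on the classpath.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads a model from a stream instead of a java.io.File.
class ModelLoader {

    // Assumes (for illustration only) a text model with one entry per line.
    // Blank lines are skipped; the stream is closed when reading finishes.
    static List<String> load(InputStream in) {
        List<String> entries = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {
                    entries.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    public static void main(String[] args) {
        // On a cluster the stream would come from DFS, roughly like this
        // (the path is a made-up example):
        //   FileSystem fs = FileSystem.get(conf);
        //   InputStream in = fs.open(new Path("/models/language.model"));
        // Here an in-memory stream stands in so the sketch runs anywhere.
        InputStream in = new ByteArrayInputStream(
                "ab 10\ncd 7\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(load(in).size() + " entries loaded");
    }
}
```

Inside a task you would typically do this once, when the plugin or task is configured, and keep the result in memory rather than reopening the file per record.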


--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
