Re: nutch file content limit

m.harig Thu, 05 Jun 2008 00:27:52 -0700

Thanks.

My situation is this: I have 100 MS Word files, each about 15 MB in size.


If I set file.content.limit to 5 MB, Nutch cannot parse the content when it
fetches the files. It says "Can't be handled as Microsoft document" and the
parse fails. How do I index partial content of those documents? Can anyone
help me out with this?
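
For reference, the setting described above would look roughly like the
fragment below. This is only a sketch: the property name file.content.limit
is the one from this message, but the file location (conf/nutch-site.xml) and
the exact value are assumptions. The limit is given in bytes, so 5242880
assumes "5 MB" means 5 * 1024 * 1024.

```xml
<!-- conf/nutch-site.xml (assumed location for local overrides) -->
<configuration>
  <property>
    <name>file.content.limit</name>
    <!-- value is in bytes; 5242880 = 5 * 1024 * 1024 (assumed reading of "5 MB") -->
    <value>5242880</value>
  </property>
</configuration>
```

The stock nutch-default.xml documents a negative value for this property as
disabling truncation entirely.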


This is my error:

Can't be handled as Microsoft document. java.io.IOException: Cannot remove
block[ 20839 ]; out of range
-- 
View this message in context: 
http://www.nabble.com/nutch-file-content-limit-tp17640376p17663787.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.
