Thanks. My situation is this: I have 100 MS Word files, each 15 MB in size.
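The setting I am working with is file.content.limit. As a rough sketch only, this is how a 5 MB limit might look in conf/nutch-site.xml. The file location is an assumption about a typical setup, and the value assumes the property is given in bytes (5 * 1024 * 1024 = 5242880):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumed override in conf/nutch-site.xml; adjust to your own setup. -->
  <property>
    <name>file.content.limit</name>
    <!-- 5 MB expressed in bytes: 5 * 1024 * 1024 -->
    <value>5242880</value>
  </property>
</configuration>
```

With a limit like this, each 15 MB file would be cut off after roughly the first third of its bytes.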
If I set file.content.limit to 5 MB, Nutch cannot parse the content when it fetches these files. It reports "Can't be handled as Microsoft document" and the parse fails. How do I index the partial content of those documents? Can anyone help me out?

This is my error:

Can't be handled as Microsoft document. java.io.IOException: Cannot remove block[ 20839 ]; out of range

--
View this message in context: http://www.nabble.com/nutch-file-content-limit-tp17640376p17663787.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.