Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "TikaPlugin" page has been changed by JulienNioche.
http://wiki.apache.org/nutch/TikaPlugin?action=diff&rev1=1&rev2=2

--------------------------------------------------
- =Tika Plugin=
+ = Tika Plugin =
+ The Tika plugin in http://issues.apache.org/jira/browse/NUTCH-766 is a first attempt at delegating the parsing to Tika instead of having to maintain the parser plugins in Nutch. This page will list the differences in coverage or functionality between the Tika plugin and the existing Nutch parsers. Tika also has more formats not covered by Nutch which are not described here.
+ '''html''': ?
- The Tika plugin in http://issues.apache.org/jira/browse/NUTCH-766 is a first attempt at delegating the parsing to Tika instead of having to maintain the parser plugins in Nutch.
- This page will list the differences in coverage or functionality between the Tika plugin and the existing Nutch parsers.
+ '''js''': ?
+
+ '''mp3''': ?
+
+ '''msexcel''': ?
+
+ '''mspowerpoint''': ?
+
+ '''msword''': ?
+
+ '''openoffice''': ?
+
+ '''pdf''': ?
+
+ '''rss''': ?
+
+ '''rtf''': ?
+
+ '''swf''' : not yet covered in Tika (see https://issues.apache.org/jira/browse/TIKA-337)
+
+ '''text''': ?
+
+ '''zip''': ? not covered in Tika
+