Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change notification.
The "CompositeParserDiscussion" page has been changed by NickBurch:
https://wiki.apache.org/tika/CompositeParserDiscussion?action=diff&rev1=3&rev2=4

Comment:
Config

        <mime>application/pdf</mime>
      </parser>

-     <!-- JPEG needs special handling -->
+     <!-- JPEG needs special handling - try+combine everything -->
-     <!-- XML needs special handling -->
+     <parser class="org.apache.tika.parser.(suppliment)">
+       <parser class="org.apache.tika.parser.ocr.TesseractOCRParser" />
+       <parser class="org.apache.tika.parser.image.ImageParser" />
+       <parser class="org.apache.tika.parser.jpeg.JpegParser" />
+       <parser class="org.apache.tika.parser.gdal.GDALParser" />
+       <!-- TODO Do we need to give mimetypes here too? Or can we get implicitly? -->
+     </parser>
+
+     <!-- XML needs special handling - use fallbacks to get something -->
+     <parser class="org.apache.tika.parser.(fallback)">
+       <parser class="my.custom.xml.parser" />
+       <parser class="org.apache.tika.parser.xml.XMLParser" />
+       <parser class="org.apache.tika.parser.html.HTMLParser" />
+       <parser class="org.apache.tika.parser.txt.TXTParser" />
+       <mime>application/xml</mime>
+     </parser>
    </parsers>
  }}}

  == In Code ==
- ''TODO''
+ Whatever we do, this must be available from code too, much as how today people can create custom {{{CompositeParser}}} instances, or wrap things up with custom {{{ParserDecorator}}} instances
+
+ We also need an example for all of these, not only in unit tests, but also in the examples package
+
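
The two strategies named in the config above, "try+combine everything" for JPEG and "use fallbacks to get something" for XML, could look roughly like this when built from code. This is only a minimal sketch of the intended behaviour. `SimpleParser`, `fallback`, `supplemental` and `CompositeStrategiesSketch` are hypothetical names invented for illustration; none of them are Tika's `Parser` API or the still-undecided `(fallback)` / `(suppliment)` classes. It also assumes that, when combining, the first parser to supply a key wins, which the page does not specify.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CompositeStrategiesSketch {

    /** Hypothetical stand-in for a parser: text in, metadata out. Not Tika's Parser interface. */
    public interface SimpleParser {
        Map<String, String> parse(String input) throws Exception;
    }

    /** Fallback strategy: try each parser in order, return the first successful result. */
    public static SimpleParser fallback(List<SimpleParser> parsers) {
        return input -> {
            Exception last = null;
            for (SimpleParser parser : parsers) {
                try {
                    return parser.parse(input);
                } catch (Exception failure) {
                    last = failure; // remember the failure and move on to the next parser
                }
            }
            throw new Exception("All parsers failed", last);
        };
    }

    /** Supplemental strategy: run every parser and merge the results. */
    public static SimpleParser supplemental(List<SimpleParser> parsers) {
        return input -> {
            Map<String, String> merged = new LinkedHashMap<>();
            for (SimpleParser parser : parsers) {
                try {
                    // Assumption: earlier parsers take precedence on conflicting keys.
                    for (Map.Entry<String, String> entry : parser.parse(input).entrySet()) {
                        merged.putIfAbsent(entry.getKey(), entry.getValue());
                    }
                } catch (Exception failure) {
                    // A failing parser contributes nothing; the others still run.
                }
            }
            return merged;
        };
    }

    public static void main(String[] args) throws Exception {
        SimpleParser broken = input -> { throw new Exception("cannot parse"); };
        SimpleParser asText = input -> Map.of("content", input);
        SimpleParser sizer = input -> Map.of("length", String.valueOf(input.length()));

        System.out.println(fallback(List.of(broken, asText, sizer)).parse("hello"));
        System.out.println(supplemental(List.of(broken, asText, sizer)).parse("hello"));
    }
}
```

In this sketch the fallback composite stops at the first parser that does not throw, while the supplemental composite keeps going and skips failures. A real implementation would also have to decide how content handlers and mime types are shared between the child parsers, which is the open TODO in the config.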
