[ https://issues.apache.org/jira/browse/TIKA-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210429#comment-13210429 ]
Jukka Zitting commented on TIKA-866:
------------------------------------

Actually, scrap the above rationale. The DefaultParser is fine to include in a configuration file (that is actually what it was designed for, see TIKA-527); it's just the AutoDetectParser that wouldn't work well with that mechanism. The infinite loop triggered by DefaultParser was instead the result of an unnecessary getDefaultConfig() call in MediaTypeRegistry.getDefaultRegistry(). I replaced that call and restored the ability to use DefaultParser in a configuration file in revision 1245692.

As discussed above, I also improved the config code to use the default parser or detector loading mechanism when no explicit <parser> or <detector> entries are present in a configuration file. A missing mimetypes entry was already being handled by loading the default settings, which was the original cause of the OOM as explained above.

> Invalid configuration file causes OutOfMemoryException
> ------------------------------------------------------
>
>                 Key: TIKA-866
>                 URL: https://issues.apache.org/jira/browse/TIKA-866
>             Project: Tika
>          Issue Type: Bug
>          Components: config
>    Affects Versions: 1.0
>            Reporter: Stephan Mühlstrasser
>            Assignee: Jukka Zitting
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: ConfigFile.java
>
>
> I tried to override a built-in parser according to the method described in
> issue TIKA-527.
> While testing this approach, I used an incomplete configuration file (from a
> discussion on the mailing list I learned that mimetypes and a detector should
> also be specified):
>
> $ cat tika-config.xml
> <properties>
>   <parsers>
>     <parser class="org.apache.tika.parser.DefaultParser"/>
>   </parsers>
> </properties>
>
> Using this configuration file causes an OutOfMemoryError:
>
> $ java -Dtika.config=tika-config.xml -jar tika-app-1.0.jar --list-parsers
> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
>         at java.util.Arrays.copyOfRange(Arrays.java:3209)
>         at java.lang.String.<init>(String.java:216)
>         at java.lang.StringBuilder.toString(StringBuilder.java:430)
>         at org.apache.tika.mime.MediaType.toString(MediaType.java:237)
>         at org.apache.tika.detect.MagicDetector.<init>(MagicDetector.java:142)
>         at org.apache.tika.mime.MimeTypesReader.readMatch(MimeTypesReader.java:254)
>         at org.apache.tika.mime.MimeTypesReader.readMatches(MimeTypesReader.java:202)
>         at org.apache.tika.mime.MimeTypesReader.readMagic(MimeTypesReader.java:186)
>         at org.apache.tika.mime.MimeTypesReader.readMimeType(MimeTypesReader.java:152)
>         at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:124)
>         at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:107)
>         at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:63)
>         at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:91)
>         at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:147)
>         at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:455)
>         at org.apache.tika.config.TikaConfig.typesFromDomElement(TikaConfig.java:273)
>         at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:161)
>         at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:237)
>         at org.apache.tika.mime.MediaTypeRegistry.getDefaultRegistry(MediaTypeRegistry.java:42)
>         at org.apache.tika.parser.DefaultParser.<init>(DefaultParser.java:52)
>         at sun.reflect.GeneratedConstructorAccessor4.newInstance(Unknown Source)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>         at java.lang.Class.newInstance0(Class.java:355)
>         at java.lang.Class.newInstance(Class.java:308)
>         at org.apache.tika.config.TikaConfig.parserFromDomElement(TikaConfig.java:288)
>         at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:162)
>         at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:237)
>         at org.apache.tika.mime.MediaTypeRegistry.getDefaultRegistry(MediaTypeRegistry.java:42)
>         at org.apache.tika.parser.DefaultParser.<init>(DefaultParser.java:52)
>         at sun.reflect.GeneratedConstructorAccessor4.newInstance(Unknown Source)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>
> Expected behavior: If the configuration file is not valid, an appropriate
> exception should be produced.
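The cycle in the stack trace (DefaultParser.<init> -> MediaTypeRegistry.getDefaultRegistry() -> TikaConfig.getDefaultConfig() -> TikaConfig.<init> -> DefaultParser.<init>) can be reproduced in miniature. The Java sketch below is a toy model, not Tika's code: ConfigCycleDemo, Config, Registry, and the flag registryUsesDefaultConfig are hypothetical, and its DefaultParser, getDefaultConfig() and getDefaultRegistry() only borrow names from the trace. It assumes the fix amounts to building the registry without going through the full configuration, as the comment describes, and it also models the fallback to the default parser when a configuration lists no parser entries. A load counter with a hypothetical limit makes the pre-fix path fail fast instead of exhausting memory.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of the TIKA-866 recursion. Every class and method here is a
 * hypothetical stand-in named after a frame in the stack trace; none of it
 * is Tika's actual implementation.
 */
public class ConfigCycleDemo {

    // Parser entries of the simulated tika.config file from the report.
    static List<String> configFileParsers = List.of("DefaultParser");

    // true  = pre-fix behaviour: the registry lookup loads the full config.
    // false = post-fix behaviour: the registry is built from built-in types only.
    static boolean registryUsesDefaultConfig = true;

    // Counts simulated config loads; the limit stops the buggy path early.
    static int configLoads = 0;
    static final int MAX_CONFIG_LOADS = 50;

    /** Stand-in for a media type registry. */
    static final class Registry {
        static Registry fromBuiltInTypes() {
            return new Registry();
        }
    }

    /** Stand-in for a config object that instantiates the listed parsers. */
    static final class Config {
        final Registry registry = Registry.fromBuiltInTypes();
        final List<Object> parsers = new ArrayList<>();

        Config(List<String> parserEntries) {
            configLoads++;
            if (configLoads > MAX_CONFIG_LOADS) {
                throw new IllegalStateException("recursive config loading");
            }
            // With no explicit parser entries, fall back to the default parser.
            List<String> entries =
                    parserEntries.isEmpty() ? List.of("DefaultParser") : parserEntries;
            for (String name : entries) {
                // Only the parser from the report is modelled; other names stay as labels.
                parsers.add(name.equals("DefaultParser") ? new DefaultParser() : name);
            }
        }
    }

    /** Stand-in for a parser whose constructor needs the default registry. */
    static final class DefaultParser {
        final Registry registry = getDefaultRegistry();
    }

    static Config getDefaultConfig() {
        return new Config(configFileParsers);
    }

    static Registry getDefaultRegistry() {
        if (registryUsesDefaultConfig) {
            // Pre-fix path: loading the full config instantiates DefaultParser
            // again, whose constructor calls back into this method.
            return getDefaultConfig().registry;
        }
        // Post-fix path: no config is loaded, so the cycle is broken.
        return Registry.fromBuiltInTypes();
    }

    public static void main(String[] args) {
        registryUsesDefaultConfig = true;
        configLoads = 0;
        try {
            getDefaultConfig();
        } catch (IllegalStateException e) {
            System.out.println("before fix: " + e.getMessage()
                    + " after " + configLoads + " loads");
        }

        registryUsesDefaultConfig = false;
        configLoads = 0;
        Config config = getDefaultConfig();
        System.out.println("after fix: " + config.parsers.size()
                + " parser(s), " + configLoads + " load(s)");
    }
}
```

In this model the pre-fix path trips the guard on the 51st simulated load, while the post-fix path loads the configuration once and ends up with a single parser. The real code had no such guard; judging from the trace, each level of the recursion also loaded the default MIME types, which fits the comment's explanation of why the JVM ran out of heap rather than stack.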