[jira] [Commented] (TIKA-1558) Create a Parser Blacklist
[ https://issues.apache.org/jira/browse/TIKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389656#comment-14389656 ]

Hudson commented on TIKA-1558:

SUCCESS: Integrated in tika-trunk-jdk1.7 #592 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/592/])
TIKA-1558. Better error message and fix typo. (tpalsulich: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1670490)
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/config/TikaParserConfigTest.java
TIKA-1558. Refactor Parser blacklisting. (tpalsulich: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1670487)
* /tika/trunk/CHANGES.txt
* /tika/trunk/tika-core/src/main/java/org/apache/tika/config/ServiceLoader.java
* /tika/trunk/tika-core/src/main/java/org/apache/tika/parser/CompositeParser.java
* /tika/trunk/tika-core/src/test/java/org/apache/tika/parser/BlacklistedParser.java
* /tika/trunk/tika-core/src/test/java/org/apache/tika/parser/BlacklistedParserSubclass.java
* /tika/trunk/tika-core/src/test/java/org/apache/tika/parser/BlacklistedParserTest.java
* /tika/trunk/tika-core/src/test/resources/META-INF
* /tika/trunk/tika-core/src/test/resources/org/apache/tika/parser/blacklist2_file.blacklist2
* /tika/trunk/tika-core/src/test/resources/org/apache/tika/parser/blacklist_file.blacklist
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/config/TikaParserConfigTest.java
* /tika/trunk/tika-parsers/src/test/resources/org/apache/tika/config/TIKA-1558-blacklistsub.xml

Create a Parser Blacklist

Key: TIKA-1558
URL: https://issues.apache.org/jira/browse/TIKA-1558
Project: Tika
Issue Type: New Feature
Reporter: Tyler Palsulich
Assignee: Tyler Palsulich
Fix For: 1.8

As talked about in TIKA-1555 and TIKA-1557, it would be nice to be able to disable Parsers without pulling their dependencies out. In some cases (e.g. disable all ExternalParsers), there may not be an easy way to exclude the dependencies via Maven.

-So, an initial design would be to include another file like {{META-INF/services/org.apache.tika.parser.Parser.blacklist}}. We create a new method {{ServiceLoader#loadServiceProviderBlacklist}}. Then, in {{ServiceLoader#loadServiceProviders}}, we remove all elements of the list that are assignable to an element in {{ServiceLoader#loadServiceProviderBlacklist}}.-

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
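
The filtering step in the issue description above (drop every loaded provider that is assignable to a blacklisted class) could be sketched in plain Java roughly as follows. This is only an illustration, not the code that was committed to Tika: the class BlacklistSketch, the method applyBlacklist, and the stand-in types Parser, TextParser, ExternalParser and CustomExternalParser are all hypothetical names, and the blacklist is passed in directly rather than read from a META-INF/services file.

```java
import java.util.ArrayList;
import java.util.List;

final class BlacklistSketch {
    // Hypothetical stand-ins for a Tika service interface and some providers.
    interface Parser {}
    static class TextParser implements Parser {}
    static class ExternalParser implements Parser {}
    static class CustomExternalParser extends ExternalParser {}

    // Returns the providers that are NOT assignable to any blacklisted class.
    // Because Class#isAssignableFrom is used, blacklisting a class also
    // removes all of its subclasses.
    static <T> List<T> applyBlacklist(List<T> providers, List<Class<?>> blacklist) {
        List<T> kept = new ArrayList<>();
        for (T provider : providers) {
            boolean blocked = false;
            for (Class<?> banned : blacklist) {
                if (banned.isAssignableFrom(provider.getClass())) {
                    blocked = true;
                    break;
                }
            }
            if (!blocked) {
                kept.add(provider);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Parser> loaded = new ArrayList<>();
        loaded.add(new TextParser());
        loaded.add(new ExternalParser());
        loaded.add(new CustomExternalParser());

        // Assumption: in the proposed design these classes would come from the
        // names listed in a ".blacklist" file; here they are hard-coded.
        List<Class<?>> blacklist = new ArrayList<>();
        blacklist.add(ExternalParser.class);

        // Only the TextParser instance remains after filtering.
        for (Parser p : applyBlacklist(loaded, blacklist)) {
            System.out.println(p.getClass().getSimpleName());
        }
    }
}
```

Matching on assignability rather than on exact class names is what would let a single entry cover "all ExternalParsers", including subclasses, as the description asks.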
[jira] [Commented] (TIKA-1558) Create a Parser Blacklist
[ https://issues.apache.org/jira/browse/TIKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389464#comment-14389464 ]

ASF GitHub Bot commented on TIKA-1558:

Github user tpalsulich closed the pull request at: https://github.com/apache/tika/pull/39
[jira] [Commented] (TIKA-1558) Create a Parser Blacklist
[ https://issues.apache.org/jira/browse/TIKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341529#comment-14341529 ]

Hudson commented on TIKA-1558:

SUCCESS: Integrated in tika-trunk-jdk1.7 #513 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/513/])
Start on unit testing for the new TIKA-1558 style parser blacklisting (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1662927)
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/config
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/config/TikaParserConfigTest.java
* /tika/trunk/tika-parsers/src/test/resources/org/apache/tika/config
* /tika/trunk/tika-parsers/src/test/resources/org/apache/tika/config/TIKA-1558-blacklist.xml
[jira] [Commented] (TIKA-1558) Create a Parser Blacklist
[ https://issues.apache.org/jira/browse/TIKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341556#comment-14341556 ]

Hudson commented on TIKA-1558:

SUCCESS: Integrated in tika-trunk-jdk1.7 #514 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/514/])
TIKA-1558 Support excluding (blacklisting) parsers from config, so you can use DefaultParser for all except certain parsers. Also supports child parsers of a composite parser from config, towards TIKA-1509 (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1662940)
* /tika/trunk/tika-core/src/main/java/org/apache/tika/config/TikaConfig.java
* /tika/trunk/tika-core/src/main/java/org/apache/tika/parser/CompositeParser.java
* /tika/trunk/tika-core/src/main/java/org/apache/tika/parser/DefaultParser.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/config/TikaParserConfigTest.java
[jira] [Commented] (TIKA-1558) Create a Parser Blacklist
[ https://issues.apache.org/jira/browse/TIKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340624#comment-14340624 ]

Tyler Palsulich commented on TIKA-1558:

[~gagravarr], that sounds good! IMO, a consolidated configuration file for all Parsers is better than splitting/duplicating functionality with what I implemented here.
[jira] [Commented] (TIKA-1558) Create a Parser Blacklist
[ https://issues.apache.org/jira/browse/TIKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334779#comment-14334779 ]

Nick Burch commented on TIKA-1558:

I've updated the Tika Config example in https://wiki.apache.org/tika/CompositeParserDiscussion#With_a_Tika_Configuration_file to show how I see a parser-exclude (blacklist) working in config with DefaultParser. If people think that looks OK, I can add support for that pretty easily and quickly, I think!
[jira] [Commented] (TIKA-1558) Create a Parser Blacklist
[ https://issues.apache.org/jira/browse/TIKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333266#comment-14333266 ]

Tim Allison commented on TIKA-1558:

I agree with Nick that I'd prefer to migrate more and more control to a config file than relying on SPI in the long term. As Nick observes on TIKA-1557, there is a way to do this now with the config file, but more work remains before we're ready to fully move to a config file.

[~thetaphi], from a Solr/DIH perspective, would Solr users prefer SPI or a config file?
[jira] [Commented] (TIKA-1558) Create a Parser Blacklist
[ https://issues.apache.org/jira/browse/TIKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333400#comment-14333400 ]

Uwe Schindler commented on TIKA-1558:

Hi, Lucene uses SPI for its index codecs, so we are familiar with SPI. But we have no problems with order of classpath. We just preserve what Java delivers in ClassLoader.getResources(). But order is not really important (it was important for testing in Lucene 4.x, but that's history since last Friday).

We already have a custom TikaConfig class so I am happy to use that. In our case we would only put the SPI exclusion into our test classpath. But TikaConfig is also fine.
[jira] [Commented] (TIKA-1558) Create a Parser Blacklist
[ https://issues.apache.org/jira/browse/TIKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1494#comment-1494 ]

Chris A. Mattmann commented on TIKA-1558:

I agree that longer term we should move more to a config file, but there is a lot of work that needs to be done between now and then. This is a good interim solution and the code can keep evolving, so if someone comes up with a better patch by all means. Nick's solution was great; we now have a solution that Tyler added; and later maybe we can trump both of them with the config file.
[jira] [Commented] (TIKA-1558) Create a Parser Blacklist
[ https://issues.apache.org/jira/browse/TIKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14331975#comment-14331975 ]

Tyler Palsulich commented on TIKA-1558:

This has the added benefit of working for any Tika service -- Translator, Parser, etc. And, regardless of how/where the services are loaded, the blacklist is applied.

In order for the blacklist to apply in all cases for TIKA-1509, the ServiceLoader would need to be passed the TikaConfig with the Parser (or whatever service?) strategy. Unless I'm missing something (very well could be!) I thought of TIKA-1509 as configuration when multiple Parsers are available. But, it could definitely apply as a blacklist feature. I'm happy to iterate. :)
[jira] [Commented] (TIKA-1558) Create a Parser Blacklist
[ https://issues.apache.org/jira/browse/TIKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330171#comment-14330171 ]

Nick Burch commented on TIKA-1558:

Is it not better to do this via a custom tika config, once we get TIKA-1509?
[jira] [Commented] (TIKA-1558) Create a Parser Blacklist
[ https://issues.apache.org/jira/browse/TIKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330010#comment-14330010 ]

Hudson commented on TIKA-1558:

SUCCESS: Integrated in tika-trunk-jdk1.7 #501 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/501/])
TIKA-1558. Enable blacklisting of Parsers and other services with a servicename.blacklist META-INF file. (tpalsulich: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1661284)
* /tika/trunk/CHANGES.txt
* /tika/trunk/tika-core/src/main/java/org/apache/tika/config/ServiceLoader.java
* /tika/trunk/tika-core/src/test/java/org/apache/tika/parser/BlacklistedParser.java
* /tika/trunk/tika-core/src/test/java/org/apache/tika/parser/BlacklistedParserSubclass.java
* /tika/trunk/tika-core/src/test/java/org/apache/tika/parser/BlacklistedParserTest.java
* /tika/trunk/tika-core/src/test/resources/META-INF
* /tika/trunk/tika-core/src/test/resources/META-INF/services
* /tika/trunk/tika-core/src/test/resources/META-INF/services/org.apache.tika.parser.Parser
* /tika/trunk/tika-core/src/test/resources/META-INF/services/org.apache.tika.parser.Parser.blacklist
* /tika/trunk/tika-core/src/test/resources/org/apache/tika/mime/custom-mimetypes.xml
* /tika/trunk/tika-core/src/test/resources/org/apache/tika/parser/blacklist2_file.blacklist2
* /tika/trunk/tika-core/src/test/resources/org/apache/tika/parser/blacklist_file.blacklist