[jira] [Commented] (TIKA-1558) Create a Parser Blacklist

2015-03-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389656#comment-14389656
 ] 

Hudson commented on TIKA-1558:
--

SUCCESS: Integrated in tika-trunk-jdk1.7 #592 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/592/])
TIKA-1558. Better error message and fix typo. (tpalsulich: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1670490)
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/config/TikaParserConfigTest.java
TIKA-1558. Refactor Parser blacklisting. (tpalsulich: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1670487)
* /tika/trunk/CHANGES.txt
* /tika/trunk/tika-core/src/main/java/org/apache/tika/config/ServiceLoader.java
* /tika/trunk/tika-core/src/main/java/org/apache/tika/parser/CompositeParser.java
* /tika/trunk/tika-core/src/test/java/org/apache/tika/parser/BlacklistedParser.java
* /tika/trunk/tika-core/src/test/java/org/apache/tika/parser/BlacklistedParserSubclass.java
* /tika/trunk/tika-core/src/test/java/org/apache/tika/parser/BlacklistedParserTest.java
* /tika/trunk/tika-core/src/test/resources/META-INF
* /tika/trunk/tika-core/src/test/resources/org/apache/tika/parser/blacklist2_file.blacklist2
* /tika/trunk/tika-core/src/test/resources/org/apache/tika/parser/blacklist_file.blacklist
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/config/TikaParserConfigTest.java
* /tika/trunk/tika-parsers/src/test/resources/org/apache/tika/config/TIKA-1558-blacklistsub.xml


 Create a Parser Blacklist
 -

 Key: TIKA-1558
 URL: https://issues.apache.org/jira/browse/TIKA-1558
 Project: Tika
  Issue Type: New Feature
Reporter: Tyler Palsulich
Assignee: Tyler Palsulich
 Fix For: 1.8


 As talked about in TIKA-1555 and TIKA-1557, it would be nice to be able to 
 disable Parsers without pulling their dependencies out. In some cases (e.g. 
 disable all ExternalParsers), there may not be an easy way to exclude the 
 dependencies via Maven.
 -So, an initial design would be to include another file like 
 {{META-INF/services/org.apache.tika.parser.Parser.blacklist}}. We create a 
 new method {{ServiceLoader#loadServiceProviderBlacklist}}. Then, in 
 {{ServiceLoader#loadServiceProviders}}, we remove all elements of the list 
 that are assignable to an element in 
 {{ServiceLoader#loadServiceProviderBlacklist}}.-
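
The filtering step in the initial design above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code committed to Tika: the class name BlacklistSketch and the methods isBlacklisted and removeBlacklisted are hypothetical, and plain JDK classes stand in for Parser implementations. The point it shows is that "assignable to an element of the blacklist" means a blacklisted class also removes its subclasses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names are taken from Tika's ServiceLoader.
final class BlacklistSketch {

    /** True if provider is the same class as, or a subclass of, any blacklisted class. */
    static boolean isBlacklisted(Class<?> provider, List<Class<?>> blacklist) {
        for (Class<?> banned : blacklist) {
            if (banned.isAssignableFrom(provider)) {
                return true;
            }
        }
        return false;
    }

    /** Returns the providers that survive the blacklist, preserving their order. */
    static <T> List<T> removeBlacklisted(List<T> providers, List<Class<?>> blacklist) {
        List<T> kept = new ArrayList<>();
        for (T provider : providers) {
            if (!isBlacklisted(provider.getClass(), blacklist)) {
                kept.add(provider);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Stand-ins for loaded providers: Integer and Double are Numbers, String is not.
        List<Object> providers = Arrays.<Object>asList(1, "text", 2.5);
        List<Class<?>> blacklist = Arrays.<Class<?>>asList(Number.class);
        System.out.println(removeBlacklisted(providers, blacklist)); // prints [text]
    }
}
```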



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TIKA-1558) Create a Parser Blacklist

2015-03-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389464#comment-14389464
 ] 

ASF GitHub Bot commented on TIKA-1558:
--

Github user tpalsulich closed the pull request at:

https://github.com/apache/tika/pull/39




[jira] [Commented] (TIKA-1558) Create a Parser Blacklist

2015-02-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341529#comment-14341529
 ] 

Hudson commented on TIKA-1558:
--

SUCCESS: Integrated in tika-trunk-jdk1.7 #513 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/513/])
Start on unit testing for the new TIKA-1558 style parser blacklisting (nick: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1662927)
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/config
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/config/TikaParserConfigTest.java
* /tika/trunk/tika-parsers/src/test/resources/org/apache/tika/config
* /tika/trunk/tika-parsers/src/test/resources/org/apache/tika/config/TIKA-1558-blacklist.xml




[jira] [Commented] (TIKA-1558) Create a Parser Blacklist

2015-02-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341556#comment-14341556
 ] 

Hudson commented on TIKA-1558:
--

SUCCESS: Integrated in tika-trunk-jdk1.7 #514 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/514/])
TIKA-1558 Support excluding (blacklisting) parsers from config, so you can use 
DefaultParser for all except certain parsers. Also supports child parsers of a 
composite parser from config, towards TIKA-1509 (nick: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1662940)
* /tika/trunk/tika-core/src/main/java/org/apache/tika/config/TikaConfig.java
* /tika/trunk/tika-core/src/main/java/org/apache/tika/parser/CompositeParser.java
* /tika/trunk/tika-core/src/main/java/org/apache/tika/parser/DefaultParser.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/config/TikaParserConfigTest.java




[jira] [Commented] (TIKA-1558) Create a Parser Blacklist

2015-02-27 Thread Tyler Palsulich (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340624#comment-14340624
 ] 

Tyler Palsulich commented on TIKA-1558:
---

[~gagravarr], that sounds good! IMO, a consolidated configuration file for all 
Parsers is better than splitting/duplicating functionality with what I 
implemented here.



[jira] [Commented] (TIKA-1558) Create a Parser Blacklist

2015-02-24 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334779#comment-14334779
 ] 

Nick Burch commented on TIKA-1558:
--

I've updated the Tika Config example in 
https://wiki.apache.org/tika/CompositeParserDiscussion#With_a_Tika_Configuration_file
 to show how I see a parser-exclude (blacklist) working in config with 
DefaultParser.

If people think that looks OK, I can add support for that pretty easily and 
quickly, I think!
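
To make the config-based exclusion idea concrete, here is a rough sketch of what such a configuration might look like and how the exclusions could be read out of it. It is an assumption-laden illustration, not Tika's TikaConfig code: the element layout in EXAMPLE_CONFIG is a guess at the shape of the wiki example, org.example.UnwantedParser is a made-up class name, and ParserExcludeSketch and excludedClassNames are hypothetical. Only the JDK's DOM parser is used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch; this is not how TikaConfig itself is implemented.
final class ParserExcludeSketch {

    // Assumed layout: DefaultParser handles everything except the excluded class.
    static final String EXAMPLE_CONFIG =
          "<properties>"
        + "<parsers>"
        + "<parser class=\"org.apache.tika.parser.DefaultParser\">"
        + "<parser-exclude class=\"org.example.UnwantedParser\"/>"
        + "</parser>"
        + "</parsers>"
        + "</properties>";

    /** Collects the class attribute of every parser-exclude element, in document order. */
    static List<String> excludedClassNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList excludes = doc.getElementsByTagName("parser-exclude");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < excludes.getLength(); i++) {
                names.add(((Element) excludes.item(i)).getAttribute("class"));
            }
            return names;
        } catch (Exception e) {
            // Malformed XML or parser setup failure: report it as a bad argument.
            throw new IllegalArgumentException("Could not read config", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(excludedClassNames(EXAMPLE_CONFIG));
    }
}
```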



[jira] [Commented] (TIKA-1558) Create a Parser Blacklist

2015-02-23 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333266#comment-14333266
 ] 

Tim Allison commented on TIKA-1558:
---

I agree with Nick that, in the long term, I'd prefer to migrate more and more 
control to a config file rather than rely on SPI.  As Nick observes on 
TIKA-1557, there is a way to do this now with the config file, but more work 
remains before we're ready to fully move to a config file.

[~thetaphi], from a Solr/DIH perspective, would Solr users prefer SPI or a 
config file? 



[jira] [Commented] (TIKA-1558) Create a Parser Blacklist

2015-02-23 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333400#comment-14333400
 ] 

Uwe Schindler commented on TIKA-1558:
-

Hi,
Lucene uses SPI for its index codecs, so we are familiar with SPI. But we have 
no problems with the order of the classpath. We just preserve what Java 
delivers in ClassLoader.getResources(). But order is not really important (it 
was important for testing in Lucene 4.x, but that's history since last Friday).

We already have a custom TikaConfig class so I am happy to use that. In our 
case we would only put the SPI exclusion into our test classpath. But 
TikaConfig is also fine.
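
As an illustration of "preserving what Java delivers", a provider list can be built by walking ClassLoader.getResources() in the order it returns and keeping the first occurrence of each name. This is a sketch only, not code from Lucene or Tika; SpiOrderSketch and providerNames are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical sketch; not taken from Lucene or Tika.
final class SpiOrderSketch {

    /** Lists provider class names for a service in the order the class loader reports them. */
    static List<String> providerNames(ClassLoader loader, String serviceName) {
        List<String> names = new ArrayList<>();
        try {
            Enumeration<URL> resources =
                    loader.getResources("META-INF/services/" + serviceName);
            while (resources.hasMoreElements()) {
                URL url = resources.nextElement();
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        // Drop comments, which start at '#'.
                        int hash = line.indexOf('#');
                        if (hash >= 0) {
                            line = line.substring(0, hash);
                        }
                        line = line.trim();
                        // Keep the first occurrence only, so order stays stable.
                        if (!line.isEmpty() && !names.contains(line)) {
                            names.add(line);
                        }
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }
}
```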



[jira] [Commented] (TIKA-1558) Create a Parser Blacklist

2015-02-23 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1494#comment-1494
 ] 

Chris A. Mattmann commented on TIKA-1558:
-

I agree that longer term we should move more to a config file, but there is a 
lot of work that needs to be done between now and then. This is a good interim 
solution and the code can keep evolving, so if someone comes up with a better 
patch by all means. Nick's solution was great; we now have a solution that 
Tyler added; and later maybe we can trump both of them with the config file.



[jira] [Commented] (TIKA-1558) Create a Parser Blacklist

2015-02-21 Thread Tyler Palsulich (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14331975#comment-14331975
 ] 

Tyler Palsulich commented on TIKA-1558:
---

This has the added benefit of working for any Tika service -- Translator, 
Parser, etc. And, regardless of how/where the services are loaded, the 
blacklist is applied. In order for the blacklist to apply in all cases for 
TIKA-1509, the ServiceLoader would need to be passed the TikaConfig with the 
Parser (or whatever service?) strategy. Unless I'm missing something (very well 
could be!)

I thought of TIKA-1509 as configuration when multiple Parsers are available. 
But, it could definitely apply as a blacklist feature. I'm happy to iterate. :)



[jira] [Commented] (TIKA-1558) Create a Parser Blacklist

2015-02-21 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330171#comment-14330171
 ] 

Nick Burch commented on TIKA-1558:
--

Is it not better to do this via a custom tika config, once we get TIKA-1509?



[jira] [Commented] (TIKA-1558) Create a Parser Blacklist

2015-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330010#comment-14330010
 ] 

Hudson commented on TIKA-1558:
--

SUCCESS: Integrated in tika-trunk-jdk1.7 #501 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/501/])
TIKA-1558. Enable blacklisting of Parsers and other services with a 
servicename.blacklist META-INF file. (tpalsulich: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1661284)
* /tika/trunk/CHANGES.txt
* /tika/trunk/tika-core/src/main/java/org/apache/tika/config/ServiceLoader.java
* /tika/trunk/tika-core/src/test/java/org/apache/tika/parser/BlacklistedParser.java
* /tika/trunk/tika-core/src/test/java/org/apache/tika/parser/BlacklistedParserSubclass.java
* /tika/trunk/tika-core/src/test/java/org/apache/tika/parser/BlacklistedParserTest.java
* /tika/trunk/tika-core/src/test/resources/META-INF
* /tika/trunk/tika-core/src/test/resources/META-INF/services
* /tika/trunk/tika-core/src/test/resources/META-INF/services/org.apache.tika.parser.Parser
* /tika/trunk/tika-core/src/test/resources/META-INF/services/org.apache.tika.parser.Parser.blacklist
* /tika/trunk/tika-core/src/test/resources/org/apache/tika/mime/custom-mimetypes.xml
* /tika/trunk/tika-core/src/test/resources/org/apache/tika/parser/blacklist2_file.blacklist2
* /tika/trunk/tika-core/src/test/resources/org/apache/tika/parser/blacklist_file.blacklist

