Fwd: tika install fail on os x 10.9.2

2014-05-15 Thread Annie Burgess
Hi all,

I have a new computer running  OS X 10.9.2 (13C64).  I am attempting to get
Tika up and running, but am getting errors in the Maven install phase.  My
steps are as follows:


[annies-mbp:~/tika/] % svn co https://svn.apache.org/repos/asf/tika/trunk tmp
[annies-mbp:~/tika/tmp]% setenv MAVEN_OPTS "-Xms128m -Xmx256m"
[annies-mbp:~/tika/tmp]% mvn install

Results :

Tests in error:

  testiBooksParser(org.apache.tika.parser.ibooks.iBooksParserTest):
Premature end of file.

Tests run: 506, Failures: 0, Errors: 1, Skipped: 1

[INFO]

[INFO] Reactor Summary:
[INFO] Apache Tika parent  SUCCESS [  0.626 s]
[INFO] Apache Tika core .. SUCCESS [  6.631 s]
[INFO] Apache Tika parsers ... FAILURE [ 23.323 s]

.
.
.

[INFO]

[INFO] BUILD FAILURE
[INFO]


My Maven version is:

[annies-mbp:~/Development/tika/tmp]% mvn --version
Apache Maven 3.2.1 (ea8b2b07643dbb1b84b6d16e1f08391b666bc1e9;
2014-02-14T08:37:52-09:00)
Maven home: /usr/local/Cellar/maven/3.2.1/libexec
Java version: 1.8.0_05, vendor: Oracle Corporation
Java home:
/Library/Java/JavaVirtualMachines/jdk1.8.0_05.jdk/Contents/Home/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "mac os x", version: "10.9.2", arch: "x86_64", family: "mac"


Does anyone have any insight as to why this is failing at
'iBooksParserTest'?
Thanks!
Annie

--
Ann Bryant Burgess, PhD

Postdoctoral Fellow
Computer Science Department
University of Southern California
Viterbi School of Engineering
Los Angeles, CA

Alaska Science Center/USGS
Anchorage, AK

Cell:  (585) 738-7549
Office:  (907) 786-7059
Fax:  (907) 786-7150
E-mail: anniebryant.burg...@gmail.com
Office Address: 4210 University Dr., Anchorage, AK 99508-4626
---


[jira] [Commented] (TIKA-1204) DWFX files detection

2014-05-15 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13992899#comment-13992899
 ] 

Nick Burch commented on TIKA-1204:
--

Mimetype and detector logic added in r1593322.

Leaving the issue open for now, pending a small file which can be used to add 
unit tests for this

> DWFX files detection
> 
>
> Key: TIKA-1204
> URL: https://issues.apache.org/jira/browse/TIKA-1204
> Project: Tika
>  Issue Type: Improvement
>  Components: detector, mime
>Affects Versions: 1.4
>Reporter: Marco Quaranta
>Priority: Minor
> Attachments: General assembly filter.dwfx
>
>
> DWFX are AutoCAD [Design web 
> format|http://en.wikipedia.org/wiki/Design_Web_Format] files and follow [Open 
> Packaging 
> Conventions|http://en.wikipedia.org/wiki/Open_Packaging_Conventions]. 
> Tika "correctly" detects these files as application/zip. 
> It would be better if Tika could recognize the true mimetype: 
> model/vnd.dwfx+xps. (y)
> Please add logic to ZipContainerDetector so that it can detect dwfx. We 
> need a method that behaves like detectOfficeOpenXML(OPCPackage pkg): 
> {noformat}
> PackageRelationshipCollection core = 
> pkg.getRelationshipsByType("http://schemas.autodesk.com/dwfx/2007/relationships/documentsequence");
> if (core.size() != 1) {
>  // Invalid DWFX Package received
>  return null;
> }
> PackagePart corePart = pkg.getPart(core.getRelationship(0));
> String coreType = corePart.getContentType();
> return MediaType.parse(coreType);
> {noformat}
> Thank you,
> Marco
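
The detection idea in the description above can be tried without POI. The following is a minimal sketch using only java.util.zip; it is not the logic added to ZipContainerDetector. It assumes the package-level relationships live in the standard OPC part `_rels/.rels`, and it only checks for the relationship type as a substring instead of resolving the target part's content type as the quoted snippet does. The class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of Tika.
final class DwfxSniffer {
    // Relationship type quoted in the issue description.
    static final String DWFX_REL =
        "http://schemas.autodesk.com/dwfx/2007/relationships/documentsequence";

    /** Returns model/vnd.dwfx+xps if the zip declares the DWFX relationship, else application/zip. */
    static String detect(InputStream in) {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // OPC packages keep their root relationships in _rels/.rels
                if (entry.getName().equalsIgnoreCase("_rels/.rels")) {
                    String rels = new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                    if (rels.contains(DWFX_REL)) {
                        return "model/vnd.dwfx+xps";
                    }
                }
            }
            return "application/zip";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a one-entry zip in memory, only used to exercise detect(). */
    static byte[] zipOf(String name, String content) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(name));
            zip.write(content.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

A real detector would also want to confirm that exactly one such relationship exists, as the quoted snippet does with `core.size() != 1`.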



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (TIKA-941) Detecting KML / KMZ files

2014-05-15 Thread Nick Burch (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Burch resolved TIKA-941.
-

   Resolution: Fixed
Fix Version/s: 1.6

Thanks, additional namespaces added in r1593315.

> Detecting KML / KMZ files
> -
>
> Key: TIKA-941
> URL: https://issues.apache.org/jira/browse/TIKA-941
> Project: Tika
>  Issue Type: Improvement
>  Components: mime
>Affects Versions: 1.1
>Reporter: Marco Quaranta
>Assignee: Jukka Zitting
>Priority: Minor
>  Labels: google, kml, kmz
> Fix For: 1.6, 1.2
>
> Attachments: ZipContainerDetector.java
>
>
> KML format is a subtype of application/xml with a "kml" root node and (an 
> optional?) "http://www.opengis.net/kml/2.2" namespace.
>   
> 
> http://www.opengis.net/kml/2.2" localName="kml"/> 
>
> KML
> <_comment>Keyhole Markup Language
> 
> 
>   
> KMZ files (https://developers.google.com/kml/documentation/kmzarchives) are 
> zip archives with a KML file inside (the file should be called doc.kml) and 
> one or more folders. A naive approach consists of adding a further check in 
> ZipContainerDetector (see attached). 
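
The root-node rule described above (a "kml" root element, with the namespace treated as optional) can be sketched with the JDK's StAX reader. This is an illustration only, not Tika's root-XML matching code. The class name is hypothetical, and only the single namespace quoted in the issue is accepted.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Tika.
final class KmlRootSniffer {
    static final String KML_NS = "http://www.opengis.net/kml/2.2";

    /** True if the first element is "kml", either without a namespace or in KML_NS. */
    static boolean looksLikeKml(InputStream in) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Sniffing untrusted input: do not process DTDs.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(in);
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        String ns = reader.getNamespaceURI();
                        boolean namespaceOk = ns == null || ns.isEmpty() || ns.equals(KML_NS);
                        return namespaceOk && "kml".equals(reader.getLocalName());
                    }
                }
                return false;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Not well-formed XML, so not KML either.
            return false;
        }
    }

    /** Convenience overload for trying the check on a literal string. */
    static boolean looksLikeKml(String xml) {
        return looksLikeKml(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Only the root element is inspected, so the check stops after the first start tag and never reads the rest of the file.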





[jira] [Resolved] (TIKA-1175) MS Money files wrongly detected as True Type Font

2014-05-15 Thread Nick Burch (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Burch resolved TIKA-1175.
--

   Resolution: Fixed
Fix Version/s: 1.6

Thanks for this, magic added in r1593311.

> MS Money files wrongly detected as True Type Font
> -
>
> Key: TIKA-1175
> URL: https://issues.apache.org/jira/browse/TIKA-1175
> Project: Tika
>  Issue Type: Bug
>  Components: mime
>Affects Versions: 1.3, 1.4
>Reporter: Boris Naguet
>Priority: Minor
> Fix For: 1.6
>
>
> TTF magic is probably not specific enough, because it incorrectly detects MS 
> Money files as TTF files, and then the parsing generates an Exception.
> {quote}
> Caused by: ! java.io.IOException: head is mandatory
> ! at 
> org.apache.fontbox.ttf.AbstractTTFParser.parseTables(AbstractTTFParser.java:107)
>  
> {quote}
> Here is the magic detection code that I added to {{custom-mimetypes.xml}}, 
> which solves it:
> {code:xml}
> 
>   
>   
>   
>type="string" offset="0" />
>   
>   
> {code}
> It can replace the existing {{application/x-msmoney}} empty mime-type in 
> {{tika-mimetypes.xml}}.
> magic comes from
> http://filesignatures.net/index.php?search=mny&mode=EXT
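
Why a short TTF magic can swallow MS Money files is easy to show with a prefix matcher. The sketch below is not Tika's magic implementation (which uses priorities in tika-mimetypes.xml). It assumes the MS Money signature is the four bytes 00 01 00 00 followed by the ASCII text "MSISAM Database"; that value is an assumption here and is not taken from the stripped XML above. "Longest matching magic wins" is a simplification chosen for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Tika.
final class MagicSniffer {
    // TrueType fonts start with the sfnt version 0x00010000.
    static final byte[] TTF_MAGIC = {0x00, 0x01, 0x00, 0x00};
    // ASSUMPTION: MS Money files start with the same 4 bytes plus "MSISAM Database".
    static final byte[] MSMONEY_MAGIC =
        concat(TTF_MAGIC, "MSISAM Database".getBytes(StandardCharsets.US_ASCII));

    static final Map<String, byte[]> MAGICS = new LinkedHashMap<>();
    static {
        MAGICS.put("application/x-font-ttf", TTF_MAGIC);
        MAGICS.put("application/x-msmoney", MSMONEY_MAGIC);
    }

    /** Returns the type whose magic is the longest prefix of the header bytes. */
    static String detect(byte[] header) {
        String best = "application/octet-stream";
        int bestLength = 0;
        for (Map.Entry<String, byte[]> candidate : MAGICS.entrySet()) {
            byte[] magic = candidate.getValue();
            if (magic.length > bestLength && startsWith(header, magic)) {
                best = candidate.getKey();
                bestLength = magic.length;
            }
        }
        return best;
    }

    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    static byte[] concat(byte[] a, byte[] b) {
        byte[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }
}
```

With only the 4-byte TTF entry registered, a file starting with the longer signature would still match as a font, which is the misdetection reported here.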





[jira] [Comment Edited] (TIKA-1294) Add ability to turn off extraction of PDXObjectImages (TIKA-1268) from PDFs

2014-05-15 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995491#comment-13995491
 ] 

Tim Allison edited comment on TIKA-1294 at 5/12/14 7:16 PM:


Great. Just to make sure that I understand correctly...I think I was going to 
head this route at one point via subclassing EmbeddedResourceHandler.  Can your 
MediaTypeDisablingDocumentSelector tell the difference between a jpeg that was 
attached to a PDF (basic attachment) and one that was derived from a 
PDXObjectImage?


was (Author: talli...@mitre.org):
Great. Just to make sure that I understand correctly...I think I was going to 
head this route at one point.  Can your MediaTypeDisablingDocumentSelector tell 
the difference between a jpeg that was attached to a PDF (basic attachment) and 
one that was derived from a PDXObjectImage?

> Add ability to turn off extraction of PDXObjectImages (TIKA-1268) from PDFs
> ---
>
> Key: TIKA-1294
> URL: https://issues.apache.org/jira/browse/TIKA-1294
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Priority: Trivial
> Attachments: TIKA-1294.patch
>
>
> TIKA-1268 added the capability to extract embedded images as regular embedded 
> resources...a great feature!
> However, for some use cases, it might not be desirable to extract those types 
> of embedded resources.  I see two ways of allowing the client to choose 
> whether or not to extract those images:
> 1) set a value in the metadata for the extracted images that identifies them 
> as embedded PDXObjectImages vs regular image attachments.  The client can 
> then choose not to process embedded resources with a given metadata value.
> 2) allow the client to set a parameter in the PDFConfig object.
> My initial proposal is to go with option 2, and I'll attach a patch shortly.
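
Option 2 above amounts to a boolean on the parser config that gates one category of embedded resource. The sketch below is a guess at the shape, not the attached patch: `Config`, `Embedded`, the setter name, and the default value of the flag are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a config flag that gates image extraction.
final class InlineImageToggle {
    /** Stand-in for a parser config object; the default of true is an assumption. */
    static final class Config {
        private boolean extractInlineImages = true;

        void setExtractInlineImages(boolean extract) {
            this.extractInlineImages = extract;
        }

        boolean getExtractInlineImages() {
            return extractInlineImages;
        }
    }

    /** Stand-in for an embedded resource found while parsing. */
    static final class Embedded {
        final String name;
        final boolean fromXObjectImage; // true if derived from a PDXObjectImage

        Embedded(String name, boolean fromXObjectImage) {
            this.name = name;
            this.fromXObjectImage = fromXObjectImage;
        }
    }

    /** Returns the names of the resources that would be handed to the embedded-document handler. */
    static List<String> resourcesToEmit(Config config, List<Embedded> found) {
        List<String> names = new ArrayList<>();
        for (Embedded item : found) {
            if (item.fromXObjectImage && !config.getExtractInlineImages()) {
                continue; // skipped by configuration; regular attachments are unaffected
            }
            names.add(item.name);
        }
        return names;
    }
}
```

Option 1 would keep the `fromXObjectImage` information as a metadata value on each resource and leave the filtering to the client.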





[jira] [Commented] (TIKA-1295) Make some Dublin Core items multi-valued

2014-05-15 Thread Ray Gauss II (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995945#comment-13995945
 ] 

Ray Gauss II commented on TIKA-1295:


+1 for the data model more accurately reflecting the standard and for 
multilingual fields, but with a simple text bag how would you know which value 
corresponds to which language?

I think this is another example that highlights the need for a more structured 
underlying metadata store as mentioned in section IV of the [metadata 
roadmap|http://wiki.apache.org/tika/MetadataRoadmap].
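
The question above can be made concrete with a small sketch. It assumes language alternatives are keyed by language tag, with "x-default" as the fallback key as in XMP; the class and method names are hypothetical and this is not Tika's Metadata API. Flattening to a bag keeps the values but drops the keys, which is exactly the information the question asks about.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Tika's Metadata class.
final class LangAltSketch {
    /** What a plain text bag keeps: the values, without their language tags. */
    static List<String> asTextBag(Map<String, String> titlesByLanguage) {
        return new ArrayList<>(titlesByLanguage.values());
    }

    /** What a structured store allows: lookup by language, falling back to x-default. */
    static String titleFor(Map<String, String> titlesByLanguage, String language) {
        String title = titlesByLanguage.get(language);
        return title != null ? title : titlesByLanguage.get("x-default");
    }
}
```

For example, with the entries {"x-default": "Report", "de": "Bericht"} the bag is ["Report", "Bericht"], and nothing in it says which entry is German.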

> Make some Dublin Core items multi-valued
> 
>
> Key: TIKA-1295
> URL: https://issues.apache.org/jira/browse/TIKA-1295
> Project: Tika
>  Issue Type: Bug
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Minor
> Fix For: 1.6
>
>
> According to: http://www.pdfa.org/2011/08/pdfa-metadata-xmp-rdf-dublin-core, 
> dc:title, dc:description and dc:rights should allow multiple values because 
> of language alternatives.  Unless anyone objects in the next few days, I'll 
> switch those to Property.toInternalTextBag() from Property.toInternalText().  
> I'll also modify PDFParser to extract dc:rights.





[jira] [Commented] (TIKA-1278) Expose PDF Avg Char and Spacing Tolerance Config Params

2014-05-15 Thread Ray Gauss II (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995298#comment-13995298
 ] 

Ray Gauss II commented on TIKA-1278:


Hi [~tallison],

I thought about adding to {{PDFParser.properties}} but decided against it since 
PDFBox could change the default values or change the properties' scale or use, 
and if we weren't aware of that change we'd be inadvertently overriding those 
defaults.

Similarly with {{PDFParserConfig.configure}}, PDFBox's defaults seem to work 
well for most people.

We can certainly reconsider setting those defaults and/or adding other config 
if there are particular parameters people would find useful.
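
The reasoning above (do not restate the library's defaults, only pass on values the user actually set) can be sketched with nullable fields. This is an illustration, not the PDFParserConfig code: `Stripper` is a stand-in for the PDFBox text stripper, and the default values in it are placeholders.

```java
// Hypothetical illustration of "override only when explicitly configured".
final class ToleranceOverrides {
    /** Stand-in for the library class; its defaults are placeholders and may change upstream. */
    static final class Stripper {
        float averageCharTolerance = 0.3f;
        float spacingTolerance = 0.5f;
    }

    /** Config that remembers whether each value was set at all. */
    static final class Config {
        private Float averageCharTolerance; // null means: keep the library default
        private Float spacingTolerance;     // null means: keep the library default

        void setAverageCharTolerance(float value) {
            this.averageCharTolerance = value;
        }

        void setSpacingTolerance(float value) {
            this.spacingTolerance = value;
        }

        /** Applies only the values that were explicitly set. */
        void configure(Stripper stripper) {
            if (averageCharTolerance != null) {
                stripper.averageCharTolerance = averageCharTolerance;
            }
            if (spacingTolerance != null) {
                stripper.spacingTolerance = spacingTolerance;
            }
        }
    }
}
```

Because unset fields are never written, a later change to the library's defaults flows through without any change on this side.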

> Expose PDF Avg Char and Spacing Tolerance Config Params
> ---
>
> Key: TIKA-1278
> URL: https://issues.apache.org/jira/browse/TIKA-1278
> Project: Tika
>  Issue Type: Improvement
>  Components: parser
>Affects Versions: 1.5
>Reporter: Ray Gauss II
>Assignee: Ray Gauss II
> Fix For: 1.6
>
>
> {{PDFParserConfig}} should allow for override of PDFBox's 
> {{averageCharTolerance}} and {{spacingTolerance}} settings as noted by a TODO 
> comment in {{PDF2XHTML}}.
> Additionally, {{PDF2XHTML}}'s use of {{PDFParserConfig}} should be changed 
> slightly to allow for extension of that config class and its configuration 
> behavior.





[jira] [Commented] (TIKA-1294) Add ability to turn off extraction of PDXObjectImages (TIKA-1268) from PDFs

2014-05-15 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997801#comment-13997801
 ] 

Tim Allison commented on TIKA-1294:
---

https://github.com/kryton/flaming-sailor/blob/master/src/main/java/com/zilbo/flamingSailor/TE/PDFParser.java

Code below this comment looks just like ours
{quote}
 /* The following code is REALLY raw. initial testing seemed to 
show memory leaks, and was REALLY slow*/
{quote}

> Add ability to turn off extraction of PDXObjectImages (TIKA-1268) from PDFs
> ---
>
> Key: TIKA-1294
> URL: https://issues.apache.org/jira/browse/TIKA-1294
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Priority: Trivial
> Attachments: TIKA-1294.patch
>
>
> TIKA-1268 added the capability to extract embedded images as regular embedded 
> resources...a great feature!
> However, for some use cases, it might not be desirable to extract those types 
> of embedded resources.  I see two ways of allowing the client to choose 
> whether or not to extract those images:
> 1) set a value in the metadata for the extracted images that identifies them 
> as embedded PDXObjectImages vs regular image attachments.  The client can 
> then choose not to process embedded resources with a given metadata value.
> 2) allow the client to set a parameter in the PDFConfig object.
> My initial proposal is to go with option 2, and I'll attach a patch shortly.





Re: JAXRS, endpoints and a / welcome page - any ideas why it's broken?

2014-05-15 Thread David Meikle
Hi Nick,

On 7 May 2014, at 12:48, Nick Burch  wrote:

> Hi All
> 
> One for our JAXRS gurus here…

OK, not a guru here but I have a hunch.

> At ApacheCon, we came up with the idea of having a welcome page on the Tika 
> Server, so that we could point people to it to try Tika, and let them 
> discover what it offered. Based on that, and the mailing list discussions, we 
> raised TIKA-1269.
> 
> (Related to that is TIKA-1270, which aims to add endpoints similar to the 
> --list- ones the Tika CLI has, which is in progress)
> 
> While we work out the best way to allow users to discover + learn about + try 
> the various REST endpoints on TIKA-1269, I've started with something basic. 
> This is done with the simple TikaWelcome class, which has a Path of /
> 
> The problem - when the MetadataEP and UnpackerResource are enabled, it 
> doesn't work! With those two there, when you request / you get a 404 and the 
> server logs:
> org.apache.cxf.jaxrs.utils.JAXRSUtils findTargetMethod
> WARNING: No operation matching request path "/" is found, Relative Path: /, 
> HTTP Method: GET, ContentType: */*, Accept: */*,. Please enable FINE/TRACE 
> log level for more details.
> 
> However, if you comment out those two endpoint classes from the 
> sf.setResourceClasses() call in TikaServerCLI, then the request gets 
> correctly routed to the welcome page.
> 
> Neither MetadataEP nor UnpackerResource have a path that clashes, so I've no 
> idea why having them active stops / working. Any ideas?

I am having a look quickly whilst traveling but from peeking at the code it 
looks like the following to me:

* MetadataEP - we have no @Produces which will fail in the introspection code 
on the TikaWelcome class
* UnpackerResource - as there is no class level @Path I am suspecting this is 
clashing with the TikaWelcome as it builds the routes with the method ones 
being placed on the root as well.

I don’t have time to test it just now but I wonder what would happen if you 
reordered TikaWelcome to the top, above UnpackerResource?  If my hunch is 
correct it should make the / request work using the self-generated 
documentation.
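
The hunch above can be played through with a toy router. To be clear, this is not CXF's selection algorithm; it is a simplified model that assumes a class without @Path sits on "/", that the class with the longest matching path is chosen first (registration order breaking ties), and that only the chosen class's methods are then considered. The method paths "unpacker" and "all" are simplified stand-ins.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model of two-phase routing; hypothetical, not CXF code.
final class RootResourceRouting {
    static final class Resource {
        final String name;
        final String classPath;         // "/" when the class has no @Path
        final List<String> methodPaths; // paths relative to classPath, "" for the class root

        Resource(String name, String classPath, List<String> methodPaths) {
            this.name = name;
            this.classPath = classPath;
            this.methodPaths = methodPaths;
        }
    }

    /** Returns the name of the resource that handles the request, or "404". */
    static String route(List<Resource> registered, String requestPath) {
        List<Resource> candidates = new ArrayList<>();
        for (Resource resource : registered) {
            if (requestPath.startsWith(resource.classPath)) {
                candidates.add(resource);
            }
        }
        // Longest class path first; List.sort is stable, so ties keep registration order.
        candidates.sort(Comparator.comparingInt((Resource r) -> r.classPath.length()).reversed());
        if (candidates.isEmpty()) {
            return "404";
        }
        Resource chosen = candidates.get(0);
        String rest = requestPath.substring(chosen.classPath.length());
        if (rest.startsWith("/")) {
            rest = rest.substring(1);
        }
        // Only the chosen class is consulted, which is the behaviour being hypothesised.
        return chosen.methodPaths.contains(rest) ? chosen.name : "404";
    }
}
```

In this model, registering UnpackerResource (on "/", methods "unpacker" and "all") before TikaWelcome (on "/", method "") sends a request for / to UnpackerResource and yields a 404. Putting TikaWelcome first fixes / but then breaks /unpacker. Giving UnpackerResource its own class-level path removes the tie altogether.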


> (Patch below if you want to try disabling them yourself to investigate)
> 
> Nick
> 

Cheers,
Dave



[jira] [Updated] (TIKA-1275) Upgrade Commons compress to 1.8.1

2014-05-15 Thread Fabian Lange (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Lange updated TIKA-1275:
---

Description: 
Hi,
I am using Tika to detect content also from archives. But because the raw input 
stream is a CipherInputStream I ran into 
https://issues.apache.org/jira/browse/COMPRESS-277
which compress kindly solved for me.
To be able to use Tika without patching my stack, I would like to see an 
upgrade of commons compress to 1.8.1 as soon as it is out.
This may, or may not be in 1.6 timeframe.

Thanks!


  was:
Hi,
I am using Tika to detect content also from archives. But because the raw input 
stream is a CipherInputStream I ran into 
https://issues.apache.org/jira/browse/COMPRESS-277
which compress kindly solved for me.
To be able to use Tika without patching my stack, I would like to see an 
upgrade of commons compress to 1.9 as soon as it is out.
This may, or may not be in 1.6 timeframe.

Thanks!


Summary: Upgrade Commons compress to 1.8.1  (was: Upgrade Commons 
compress (to 1.9))

Hi Nick,
compress 1.8.1 was released:
http://markmail.org/message/rkwsqhs76hwrhrrw

this contains the fixes to the compressed streams. I updated the ticket to 
reflect the 1.8.1 version number.
Would be nice to upgrade, so that the detection support works nicely for all 
archives.

Fabian

> Upgrade Commons compress to 1.8.1
> -
>
> Key: TIKA-1275
> URL: https://issues.apache.org/jira/browse/TIKA-1275
> Project: Tika
>  Issue Type: Bug
>Reporter: Fabian Lange
>
> Hi,
> I am using Tika to detect content also from archives. But because the raw 
> input stream is a CipherInputStream I ran into 
> https://issues.apache.org/jira/browse/COMPRESS-277
> which compress kindly solved for me.
> To be able to use Tika without patching my stack, I would like to see an 
> upgrade of commons compress to 1.8.1 as soon as it is out.
> This may, or may not be in 1.6 timeframe.
> Thanks!





Re: JAXRS, endpoints and a / welcome page - any ideas why it's broken?

2014-05-15 Thread Nick Burch

On Wed, 14 May 2014, Sergey Beryozkin wrote:

> UnpackerResource has no Path annotation so it is defaulted to "/".


Every endpoint method within the class does have one though. I would've 
expected it to match based on those, is that not the case?


> However, the selection between multiple root resources with the same 
> top-level Path is more expensive so ideally we could introduce a 
> dedicated @Path to UnpackerResource.


As we add more endpoints, that would seem to make sense to me. I'm not 
sure how widely used the unpacker resource is, so I don't know how much 
disruption it would be if we added a /unpacker/ prefix to the path?


> The other option is to consider implementing a Welcome functionality in 
> a JAX-RS 2.0 ContainerRequestFilter (supported in CXF 2.7.x), build a 
> welcome info there and abort/block a request


Is that the more "normal" way to handle it in JAX-RS, or is what we've 
got so far a generally known practice?


Nick


[jira] [Resolved] (TIKA-1112) Parsing for OGV file with invalid checksum

2014-05-15 Thread Nick Burch (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Burch resolved TIKA-1112.
--

   Resolution: Fixed
Fix Version/s: 1.6

Fixed with upgrade to 0.6 in r1593570.

> Parsing for OGV file with invalid checksum
> --
>
> Key: TIKA-1112
> URL: https://issues.apache.org/jira/browse/TIKA-1112
> Project: Tika
>  Issue Type: Bug
>  Components: metadata, parser
>Affects Versions: 1.3
> Environment: OS X 10.8.3
> JDK 1.6.0_45 64-bit
>Reporter: Alexander Chow
> Fix For: 1.6
>
>
> When parsing any OGV file (e.g., 
> [Typing_example.ogv|http://commons.wikimedia.org/wiki/File:Typing_example.ogv]),
>  log will output something like the following:
> {code}
> Warning - invalid checksum on page 2 of stream 155f (5471)
> Warning - invalid checksum on page 3 of stream 155f (5471)
> Warning - invalid checksum on page 4 of stream 155f (5471)
> Warning - invalid checksum on page 5 of stream 155f (5471)
> Warning - invalid checksum on page 6 of stream 155f (5471)
> Warning - invalid checksum on page 7 of stream 155f (5471)
> ...
> Warning - invalid checksum on page 3071 of stream 155f (5471)
> Warning - invalid checksum on page 3072 of stream 155f (5471)
> Warning - invalid checksum on page 3073 of stream 155f (5471)
> Warning - invalid checksum on page 3074 of stream 155f (5471)
> Exception in thread "main" java.io.IOException: Asked to read 4228 bytes from 
> 0 but hit EoF at 2884
>   at org.gagravarr.ogg.IOUtils.readFully(IOUtils.java:39)
>   at org.gagravarr.ogg.IOUtils.readFully(IOUtils.java:31)
>   at org.gagravarr.ogg.OggPage.<init>(OggPage.java:82)
>   at 
> org.gagravarr.ogg.OggPacketReader.getNextPacket(OggPacketReader.java:116)
>   at org.gagravarr.tika.OggDetector.detect(OggDetector.java:79)
>   at 
> org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
>   at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:113)
>   at com.test.OGVTest.main(OGVTest.java:31)
> {code}
> My test code was the following:
> {code:java}
>   void parse(String fileName) throws Exception {
>   InputStream inputStream = new FileInputStream(fileName);
>   
>   Metadata metadata = new Metadata();
>   
>   Parser parser = new AutoDetectParser();
>   
>   ParseContext parserContext = new ParseContext();
>   parserContext.set(Parser.class, parser);
>   ContentHandler contentHandler = new WriteOutContentHandler(
>   new DummyWriter());
>   parser.parse(inputStream, contentHandler, metadata, 
> parserContext);
>   
>   System.out.println(metadata);
>   }
> {code}





JAXRS, endpoints and a / welcome page - any ideas why it's broken?

2014-05-15 Thread Nick Burch

Hi All

One for our JAXRS gurus here...

At ApacheCon, we came up with the idea of having a welcome page on the 
Tika Server, so that we could point people to it to try Tika, and let them 
discover what it offered. Based on that, and the mailing list discussions, 
we raised TIKA-1269.


(Related to that is TIKA-1270, which aims to add endpoints similar to the 
--list- ones the Tika CLI has, which is in progress)


While we work out the best way to allow users to discover + learn about + 
try the various REST endpoints on TIKA-1269, I've started with something 
basic. This is done with the simple TikaWelcome class, which has a Path of 
/


The problem - when the MetadataEP and UnpackerResource are enabled, it 
doesn't work! With those two there, when you request / you get a 404 and 
the server logs:

org.apache.cxf.jaxrs.utils.JAXRSUtils findTargetMethod
WARNING: No operation matching request path "/" is found, Relative Path: 
/, HTTP Method: GET, ContentType: */*, Accept: */*,. Please enable 
FINE/TRACE log level for more details.


However, if you comment out those two endpoint classes from the 
sf.setResourceClasses() call in TikaServerCLI, then the request gets 
correctly routed to the welcome page.


Neither MetadataEP nor UnpackerResource have a path that clashes, so I've 
no idea why having them active stops / working. Any ideas?


(Patch below if you want to try disabling them yourself to investigate)

Nick


Index: src/main/java/org/apache/tika/server/TikaServerCli.java
===
--- src/main/java/org/apache/tika/server/TikaServerCli.java	(revision 
1592656)
+++ src/main/java/org/apache/tika/server/TikaServerCli.java	(working 
copy)

@@ -92,10 +92,20 @@
   JAXRSServerFactoryBean sf = new JAXRSServerFactoryBean();
   // Note - at least one of these stops TikaWelcome matching on /
   // This prevents TikaWelcome acting as a partial solution to 
TIKA-1269

-  sf.setResourceClasses(MetadataEP.class, MetadataResource.class,
-  TikaResource.class, UnpackerResource.class,
-  TikaDetectors.class, TikaMimeTypes.class,
-  TikaVersion.class, TikaWelcome.class);
+//  sf.setResourceClasses(MetadataEP.class, MetadataResource.class,
+//  TikaResource.class, UnpackerResource.class,
+//  TikaDetectors.class, TikaMimeTypes.class,
+//  TikaVersion.class, TikaWelcome.class);
+  sf.setResourceClasses(
+//  MetadataEP.class,
+  MetadataResource.class,
+  TikaResource.class,
+//  UnpackerResource.class,
+  TikaDetectors.class,
+  TikaMimeTypes.class,
+  TikaVersion.class,
+  TikaWelcome.class
+  );

   List providers = new ArrayList();
   providers.add(new TarWriter());