Hi,

I looked at tika-server in a bit more detail, and I'm somewhat concerned about the dependency overhead it brings in for the JAX-RS support:

+- org.apache.cxf:cxf-rt-frontend-jaxrs:jar:2.5.2
|  +- org.apache.cxf:cxf-common-utilities:jar:2.5.2
|  |  +- org.apache.ws.xmlschema:xmlschema-core:jar:2.0.1
|  |  \- org.codehaus.woodstox:woodstox-core-asl:jar:4.1.1
|  |     \- org.codehaus.woodstox:stax2-api:jar:3.1.1
|  +- org.apache.cxf:cxf-api:jar:2.5.2
|  |  +- org.apache.neethi:neethi:jar:3.0.1
|  |  \- wsdl4j:wsdl4j:jar:1.6.2
|  +- org.apache.cxf:cxf-rt-core:jar:2.5.2
|  |  +- com.sun.xml.bind:jaxb-impl:jar:2.2.4-1
|  |  \- org.apache.geronimo.specs:geronimo-javamail_1.4_spec:jar:1.7.1
|  +- org.springframework:spring-core:jar:3.0.6.RELEASE
|  |  \- org.springframework:spring-asm:jar:3.0.6.RELEASE
|  +- javax.ws.rs:jsr311-api:jar:1.1.1
|  +- org.apache.cxf:cxf-rt-bindings-xml:jar:2.5.2
|  +- org.apache.cxf:cxf-rt-transports-http:jar:2.5.2
|  |  +- org.apache.cxf:cxf-rt-transports-common:jar:2.5.2
|  |  \- org.springframework:spring-web:jar:3.0.6.RELEASE
|  |     +- aopalliance:aopalliance:jar:1.0
|  |     +- org.springframework:spring-beans:jar:3.0.6.RELEASE
|  |     \- org.springframework:spring-context:jar:3.0.6.RELEASE
|  |        +- org.springframework:spring-aop:jar:3.0.6.RELEASE
|  |        \- org.springframework:spring-expression:jar:3.0.6.RELEASE
|  \- org.codehaus.jettison:jettison:jar:1.3.1
+- org.apache.cxf:cxf-rt-transports-http-jetty:jar:2.5.2
|  +- org.eclipse.jetty:jetty-server:jar:7.5.4.v20111024
|  |  +- org.eclipse.jetty:jetty-continuation:jar:7.5.4.v20111024
|  |  \- org.eclipse.jetty:jetty-http:jar:7.5.4.v20111024
|  |     \- org.eclipse.jetty:jetty-io:jar:7.5.4.v20111024
|  |        \- org.eclipse.jetty:jetty-util:jar:7.5.4.v20111024
|  +- org.eclipse.jetty:jetty-security:jar:7.5.4.v20111024
|  \- org.apache.geronimo.specs:geronimo-servlet_2.5_spec:jar:1.1.2

That's about 7MB of middleware code. Do we really need all this? If so, who's going to review the licensing of all these dependencies and come up with appropriate LICENSE/NOTICE files to include in the tika-server jar?

The services exposed by tika-server are pretty simple and straightforward, so I'm wondering whether we could replace all of the above with just an embedded Jetty server, or even only the HttpCore library [1].

[1] http://hc.apache.org/httpcomponents-core-ga/

BR,

Jukka Zitting
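
PS. To make the "pretty simple" point concrete, here is a rough sketch of how small a single document-in, text-out endpoint can be. It is only an illustration under stated assumptions: it uses the JDK's built-in com.sun.net.httpserver package instead of Jetty or HttpCore so that it needs no extra jars at all, and the class name, the /tika path, the PUT-only handling, the port and the extractText() placeholder are all hypothetical, not tika-server's actual API.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical minimal server: one handler, no JAX-RS, no Spring, no servlet API.
public class MinimalTikaServer {

    // Hypothetical placeholder for the real extraction step. A real server
    // would hand this stream to a Tika parser; here we just decode it as UTF-8.
    static String extractText(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    // Starts the server on the given port (0 lets the OS pick a free port).
    public static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/tika", MinimalTikaServer::handle);
        server.start();
        return server;
    }

    // Accepts PUT with the document as the request body and returns plain text.
    static void handle(HttpExchange exchange) throws IOException {
        try {
            if (!"PUT".equals(exchange.getRequestMethod())) {
                exchange.sendResponseHeaders(405, -1); // Method Not Allowed, no body
                return;
            }
            String text = extractText(exchange.getRequestBody());
            byte[] body = text.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=UTF-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        } finally {
            exchange.close();
        }
    }

    public static void main(String[] args) throws IOException {
        start(8080); // port chosen arbitrarily for this sketch
    }
}
```

An embedded Jetty Handler or an HttpCore request handler would likely follow the same shape: one handler that streams the request body into the parser and writes the extracted text back.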