Hi,
I took a closer look at tika-server, and I'm a bit concerned about the
dependency overhead that its JAX-RS support brings in:
+- org.apache.cxf:cxf-rt-frontend-jaxrs:jar:2.5.2
|  +- org.apache.cxf:cxf-common-utilities:jar:2.5.2
|  |  +- org.apache.ws.xmlschema:xmlschema-core:jar:2.0.1
|  |  \- org.codehaus.woodstox:woodstox-core-asl:jar:4.1.1
|  |     \- org.codehaus.woodstox:stax2-api:jar:3.1.1
|  +- org.apache.cxf:cxf-api:jar:2.5.2
|  |  +- org.apache.neethi:neethi:jar:3.0.1
|  |  \- wsdl4j:wsdl4j:jar:1.6.2
|  +- org.apache.cxf:cxf-rt-core:jar:2.5.2
|  |  +- com.sun.xml.bind:jaxb-impl:jar:2.2.4-1
|  |  \- org.apache.geronimo.specs:geronimo-javamail_1.4_spec:jar:1.7.1
|  +- org.springframework:spring-core:jar:3.0.6.RELEASE
|  |  \- org.springframework:spring-asm:jar:3.0.6.RELEASE
|  +- javax.ws.rs:jsr311-api:jar:1.1.1
|  +- org.apache.cxf:cxf-rt-bindings-xml:jar:2.5.2
|  +- org.apache.cxf:cxf-rt-transports-http:jar:2.5.2
|  |  +- org.apache.cxf:cxf-rt-transports-common:jar:2.5.2
|  |  \- org.springframework:spring-web:jar:3.0.6.RELEASE
|  |     +- aopalliance:aopalliance:jar:1.0
|  |     +- org.springframework:spring-beans:jar:3.0.6.RELEASE
|  |     \- org.springframework:spring-context:jar:3.0.6.RELEASE
|  |        +- org.springframework:spring-aop:jar:3.0.6.RELEASE
|  |        \- org.springframework:spring-expression:jar:3.0.6.RELEASE
|  \- org.codehaus.jettison:jettison:jar:1.3.1
+- org.apache.cxf:cxf-rt-transports-http-jetty:jar:2.5.2
|  +- org.eclipse.jetty:jetty-server:jar:7.5.4.v20111024
|  |  +- org.eclipse.jetty:jetty-continuation:jar:7.5.4.v20111024
|  |  \- org.eclipse.jetty:jetty-http:jar:7.5.4.v20111024
|  |     \- org.eclipse.jetty:jetty-io:jar:7.5.4.v20111024
|  |        \- org.eclipse.jetty:jetty-util:jar:7.5.4.v20111024
|  +- org.eclipse.jetty:jetty-security:jar:7.5.4.v20111024
|  \- org.apache.geronimo.specs:geronimo-servlet_2.5_spec:jar:1.1.2
That's about 7MB of middleware code. Do we really need all this? If so,
who's going to review the licenses of all these dependencies and put
together the appropriate LICENSE/NOTICE files to include in the
tika-server jar?
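One way to double-check the size figure (a workflow assumption, not something tika-server itself provides) is to run `mvn dependency:copy-dependencies`, which by default copies the resolved jars into target/dependency, and then total up the jar sizes. A minimal Java sketch of the totalling step; the class name JarSizeTotal is hypothetical:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class JarSizeTotal {

    /** Sums the sizes (in bytes) of all *.jar entries directly inside dir (not recursive). */
    static long totalJarBytes(Path dir) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : jars) {
                total += Files.size(jar);
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: default to the output directory of `mvn dependency:copy-dependencies`.
        Path dir = Paths.get(args.length > 0 ? args[0] : "target/dependency");
        long bytes = totalJarBytes(dir);
        System.out.printf("%d bytes (%.1f MB)%n", bytes, bytes / (1024.0 * 1024.0));
    }
}
```

Note that copy-dependencies copies every resolved dependency, including Tika's own parser libraries, so the CXF/Spring/Jetty share would still have to be picked out of that total.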
The services exposed by tika-server are pretty simple, so I'm wondering
if we could replace all of the above with a plain embedded Jetty server,
or even just the HttpCore library [1].
[1] http://hc.apache.org/httpcomponents-core-ga/
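To give a feel for how little plumbing such a simple service needs, here is a minimal sketch of a single PUT endpoint. To keep it dependency-free it uses the JDK's built-in com.sun.net.httpserver package as a stand-in for embedded Jetty or HttpCore; the /tika path, port 9998 and the extract() placeholder (which just reports the body size instead of calling Tika) are all assumptions for illustration.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class MiniTikaServer {

    /** Hypothetical placeholder for the real Tika call: just reports the body size. */
    static String extract(byte[] document) {
        return "received " + document.length + " bytes\n";
    }

    /** Reads the whole stream into memory. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    /** Accepts a document via PUT and replies with plain text; other methods get 405. */
    static void handle(HttpExchange exchange) throws IOException {
        if (!"PUT".equals(exchange.getRequestMethod())) {
            exchange.sendResponseHeaders(405, -1); // -1 means no response body
            exchange.close();
            return;
        }
        byte[] body = readAll(exchange.getRequestBody());
        byte[] reply = extract(body).getBytes(StandardCharsets.UTF_8);
        exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=UTF-8");
        exchange.sendResponseHeaders(200, reply.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(reply); // closing the body stream also ends the exchange
        }
    }

    /** Starts the server; port 0 picks a free port. The /tika path is an assumption. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/tika", MiniTikaServer::handle);
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        start(9998); // hypothetical port
        System.out.println("Listening on http://localhost:9998/tika");
    }
}
```

Started via main(), it can be exercised with something like `curl -T some.pdf http://localhost:9998/tika`. An embedded Jetty or HttpCore version would have roughly the same shape: a handler per endpoint that reads the request body and writes back the result, with no JAX-RS, Spring or CXF layer in between.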
BR,
Jukka Zitting