[jira] [Commented] (TIKA-1332) Create "eval" code

Tim Allison (JIRA) Mon, 24 Nov 2014 04:43:37 -0800

    [ https://issues.apache.org/jira/browse/TIKA-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222947#comment-14222947 ]


Tim Allison commented on TIKA-1332:
-----------------------------------

In a personal communication, I asked [~sergey_beryozkin] for recommendations
for handling static content in the JAX-RS framework. For the UI component of
the eval code (how the user interacts with the results of the eval), is
there an easy equivalent in JAX-RS that lets the user browse a directory of
files and click on the desired files to download them, as easily as one can
with Jetty's ResourceHandler?

With permission, I'm posting/summarizing [~sergey_beryozkin]'s responses. If
anyone else has a recommendation for leveraging the JAX-RS framework for dynamic
data while still using something as easy as Jetty's ResourceHandler for static
content, please let us know.

Option 1:
Hand-code a JAX-RS handler that mimics Jetty's ResourceHandler.
> That can be done easily enough with JAX-RS, though, if you'd like to explore
> this path; something like this, I guess:
>
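{noformat}
import java.io.File;
import java.net.URI;
import java.util.LinkedList;
import java.util.List;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.Response;
import javax.ws.rs.core.UriInfo;

@Path("eval")
public class TikaEvaluation {

    // Injected by the JAX-RS runtime; describes the current request URI.
    @Context
    private UriInfo ui;

    @GET
    @Path("list")
    @Produces("text/html")
    public Response getListOfResultURIs() {
        List<URI> uris = new LinkedList<URI>();
        // getResultFiles() is not shown here; it returns the result files to list.
        for (File f : getResultFiles()) {
            // Appends the file name to the request path (.../eval/list/<name>),
            // which is the URI template handled by getFile() below.
            uris.add(ui.getAbsolutePathBuilder().path(f.getName()).build());
        }
        // uris now holds a list of links to individual files.
        // Next we need to decide how to convert that to HTML:
        // one option is to return the list as is and redirect that to
        // a JSP, another option is to build a basic HTML string right here
        // in the method, and another option is to register a MessageBodyWriter
        // that will convert the list into HTML.
        // The individual links will be managed by the getFile() method.

        return Response.ok(uris).build();
    }

    @GET
    @Path("list/{name}")
    @Produces({"application/json", "multipart/mixed"})
    public Response getFile(@PathParam("name") String name) {
        ...
    }
}
{noformat}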
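One of the options named in the comments above is to build a basic HTML string right in the resource method. A minimal sketch of that option in plain Java follows, assuming the input is the `List<URI>` that `getListOfResultURIs()` collects; the class and method names (`EvalHtml`, `toHtmlList`, `escape`) and the example URIs are hypothetical, and no JAX-RS API is used in the sketch itself.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

public class EvalHtml {

    // Escapes the characters that are significant in HTML text and attribute values.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Renders one <li><a href="...">name</a></li> per result URI; the link text is
    // the last path segment, i.e. the file name appended in getListOfResultURIs().
    static String toHtmlList(List<URI> uris) {
        StringBuilder html = new StringBuilder("<html><body><ul>\n");
        for (URI uri : uris) {
            String path = uri.getPath();
            String name = path.substring(path.lastIndexOf('/') + 1);
            html.append("<li><a href=\"").append(escape(uri.toString())).append("\">")
                .append(escape(name)).append("</a></li>\n");
        }
        return html.append("</ul></body></html>\n").toString();
    }

    public static void main(String[] args) {
        // Hypothetical result URIs; host and port are placeholders.
        List<URI> uris = Arrays.asList(
            URI.create("http://localhost:9998/eval/list/run1.json"),
            URI.create("http://localhost:9998/eval/list/run2.json"));
        System.out.print(toHtmlList(uris));
    }
}
```

Inside the resource, the method could then return `Response.ok(EvalHtml.toHtmlList(uris)).build()`. Because JAX-RS implementations ship a built-in entity provider for `String`, this variant should not need a custom MessageBodyWriter; the trade-off is that the markup lives in Java code rather than in a JSP.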

Option 2:
Run Jetty's ResourceHandler from the same embedded Jetty server that is hosting 
the JAX-RS code.
> This link would probably be the best one: [link|https://git-wip-us.apache.org/repos/asf?p=cxf.git;a=blob_plain;f=distribution/src/main/release/samples/jax_rs/search/src/main/java/demo/jaxrs/search/server/Server.java;hb=HEAD]

> Tika JAX-RS server actually runs on top of Jetty right now too, but in
> this case we have a direct Jetty server setup.
>
> The server registers a CXF servlet and Jetty handlers too. The CXF servlet
> also redirects to default handlers, such as a default handler for serving
> static content. This is not needed if the result files are accessible
> over a URI that does not overlap with the CXF servlet's URI pattern.
> In fact, I wonder whether the Tika JAX-RS style of registration would also
> do? If you register a CXF endpoint at /eval and the results are accessible
> over /results, then it should work. Unless a Jetty ContentHandler is not
> installed by default; in that case the linked-to code would definitely do :-)

> The only possible downside here is that, as far as consistent URI
> space management is concerned, we'd have one part of it (the static
> resources) controlled natively by Jetty and the rest by JAX-RS, so it
> can be trickier to provide support for searching the results or
> enforcing common security rules (when/if needed).
> That said, maybe it is not a real concern; it can always be removed
> in the future if needed.
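The suggestion above hinges on the CXF endpoint (/eval) and the static results (/results) occupying non-overlapping parts of the URI space. The plain-Java sketch below illustrates only that prefix logic; it is not Jetty or CXF API, and the names (`UriSpaceSketch`, `under`, `overlaps`, `route`) and mount points are hypothetical. Matching is done on whole path segments, so "/evaluation" does not fall under "/eval", while a root mapping ("/") overlaps everything.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UriSpaceSketch {

    // Hypothetical mount points: "/eval" for the CXF/JAX-RS endpoint and
    // "/results" for the static result files.
    static final Map<String, String> MOUNTS = new LinkedHashMap<>();
    static {
        MOUNTS.put("/eval", "CXF servlet (JAX-RS)");
        MOUNTS.put("/results", "static resource handler");
    }

    // True if path is the prefix itself or lies under it on a segment boundary.
    static boolean under(String path, String prefix) {
        if (prefix.equals("/")) {
            return true; // a root mapping covers every path
        }
        return path.equals(prefix) || path.startsWith(prefix + "/");
    }

    // Two mount points overlap if either lies under the other; a request in
    // the shared part could then be claimed by both handlers.
    static boolean overlaps(String a, String b) {
        return under(a, b) || under(b, a);
    }

    // Returns the handler responsible for a request path, or null if none is.
    static String route(String path) {
        for (Map.Entry<String, String> e : MOUNTS.entrySet()) {
            if (under(path, e.getKey())) {
                return e.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(overlaps("/eval", "/results"));       // false
        System.out.println(overlaps("/", "/results"));           // true
        System.out.println(route("/eval/list"));                 // CXF servlet (JAX-RS)
        System.out.println(route("/results/run1/report.html"));  // static resource handler
        System.out.println(route("/other"));                     // null
    }
}
```

If the CXF servlet were mapped at the root instead of /eval, `overlaps("/", "/results")` shows why the static handler would then have to be reached through the servlet's redirect to default handlers, as described above.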


Other options?


> Create "eval" code
> ------------------
>
>                 Key: TIKA-1332
>                 URL: https://issues.apache.org/jira/browse/TIKA-1332
>             Project: Tika
>          Issue Type: Sub-task
>          Components: cli, general, server
>            Reporter: Tim Allison
>
> For this issue, we can start with code to gather statistics on each run (# of 
> exceptions per file type, most common exceptions per file type, number of 
> metadata items, total text extracted, etc).  We should also be able to 
> compare one run against another.  Going forward, there's plenty of room to 
> improve.
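As a starting point for the per-run statistics described in the issue, the sketch below counts exceptions per file type, picks the most common exception for a type, and compares two runs by the change in exception totals per type. It assumes a hypothetical per-file record (`FileResult`) holding a MIME type and the simple class name of the exception thrown (null on success); none of these names come from Tika, and metadata counts and extracted-text totals are left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class EvalStatsSketch {

    // Hypothetical per-file record from one batch run.
    static final class FileResult {
        final String mimeType;
        final String exception; // simple exception class name, or null on success
        FileResult(String mimeType, String exception) {
            this.mimeType = mimeType;
            this.exception = exception;
        }
    }

    // mimeType -> (exception class -> count); files without exceptions are skipped.
    static Map<String, Map<String, Integer>> exceptionsByType(List<FileResult> run) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (FileResult r : run) {
            if (r.exception == null) continue;
            counts.computeIfAbsent(r.mimeType, k -> new TreeMap<>())
                  .merge(r.exception, 1, Integer::sum);
        }
        return counts;
    }

    // Most common exception for one file type (ties go to the alphabetically first).
    static String mostCommon(Map<String, Integer> perType) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : perType.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    static int sum(Map<String, Integer> m) {
        int total = 0;
        for (int v : m.values()) total += v;
        return total;
    }

    // Compares two runs: mimeType -> (exceptions in run b) - (exceptions in run a).
    static Map<String, Integer> exceptionDelta(List<FileResult> a, List<FileResult> b) {
        Map<String, Integer> delta = new TreeMap<>();
        exceptionsByType(a).forEach((type, m) -> delta.merge(type, -sum(m), Integer::sum));
        exceptionsByType(b).forEach((type, m) -> delta.merge(type, sum(m), Integer::sum));
        return delta;
    }

    public static void main(String[] args) {
        List<FileResult> runA = Arrays.asList(
            new FileResult("application/pdf", "TikaException"),
            new FileResult("application/pdf", "TikaException"),
            new FileResult("application/pdf", "IOException"),
            new FileResult("application/msword", null));
        List<FileResult> runB = Arrays.asList(
            new FileResult("application/pdf", "IOException"),
            new FileResult("application/msword", "TikaException"));

        Map<String, Map<String, Integer>> byType = exceptionsByType(runA);
        System.out.println(byType);                                   // {application/pdf={IOException=1, TikaException=2}}
        System.out.println(mostCommon(byType.get("application/pdf"))); // TikaException
        System.out.println(exceptionDelta(runA, runB));               // {application/msword=1, application/pdf=-2}
    }
}
```

A negative delta means run b threw fewer exceptions for that file type than run a; a real comparison would probably also want per-exception deltas and the other statistics listed in the issue.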



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
