[ https://issues.apache.org/jira/browse/SOLR-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973058#action_12973058 ]
Tomás Fernández Löbbe commented on SOLR-1526:
---------------------------------------------

I have a possible implementation for this issue. I created a class, SolrFileInputDocument, that extends SolrInputDocument. The main difference is that it adds two methods:

    public void addFile(InputStream file)
    public void addFile(InputStream file, Metadata metadata)

These two methods use Tika to extract the content and end up creating fields (this.addField(...)) on the parent class SolrInputDocument.

SolrFileInputDocument accepts a Map instance that maps the extracted metadata to Solr fields, something like this:

    Map<String, String> map = new HashMap<String, String>();
    map.put("content", "text");
    map.put("keywords", "cat");
    map.put("creator", "manu");
    SolrFileInputDocument document = new SolrFileInputDocument(map);

I added the classes to a separate "contrib" directory. I don't know if this is the right way to do it; I just didn't want to add a dependency on Tika that might not always be needed. A client application using this code would need the SolrJ jar plus the "clientextraction" jar.

I still haven't done anything to keep the "prefix" feature of the ExtractingRequestHandler (which I don't think will be difficult), and I still don't handle non-text fields like dates, but I could do both if you think this is a good approach.

Do you think this could work? I can upload the code tomorrow.

> Client Side Tika integration
> ----------------------------
>
>                 Key: SOLR-1526
>                 URL: https://issues.apache.org/jira/browse/SOLR-1526
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java
>            Reporter: Grant Ingersoll
>            Priority: Minor
>             Fix For: Next
>
>
> Often times it is cost prohibitive to send full, rich documents over the
> wire. The contrib/extraction library has server side integration with Tika,
> but it would be nice to have a client side implementation as well. It should
> support both metadata and content or just metadata.
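
To make the field mapping described in the comment concrete, here is a minimal sketch of how a map from extracted metadata names to Solr field names might be applied. It is not the SolrFileInputDocument class from the proposed patch: FieldMappingSketch, addExtracted and getFields are hypothetical names, the extracted values are passed in as a plain Map instead of coming from Tika, and the choice to drop unmapped metadata is an assumption (the comment leaves the "prefix" behaviour open).

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; it uses neither SolrJ nor Tika.
class FieldMappingSketch {
    // Maps an extracted metadata name (e.g. "creator") to a Solr field name (e.g. "manu").
    private final Map<String, String> fieldMap;
    // Fields that a real implementation would pass to addField(...) on the document.
    private final Map<String, String> fields = new LinkedHashMap<>();

    FieldMappingSketch(Map<String, String> fieldMap) {
        this.fieldMap = new HashMap<>(fieldMap);
    }

    // Stand-in for the extraction step: the caller supplies the name/value
    // pairs that a real client would obtain from Tika.
    void addExtracted(Map<String, String> extracted) {
        for (Map.Entry<String, String> entry : extracted.entrySet()) {
            String solrField = fieldMap.get(entry.getKey());
            if (solrField != null) {
                fields.put(solrField, entry.getValue());
            }
            // Assumption: metadata without a mapping is dropped.
        }
    }

    Map<String, String> getFields() {
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("content", "text");
        map.put("keywords", "cat");
        map.put("creator", "manu");

        Map<String, String> extracted = new HashMap<>();
        extracted.put("content", "Body text of the file");
        extracted.put("creator", "Some Author");
        extracted.put("producer", "not mapped, so ignored");

        FieldMappingSketch document = new FieldMappingSketch(map);
        document.addExtracted(extracted);
        System.out.println(document.getFields());
    }
}
```

Keeping the mapping in the constructor mirrors the usage shown in the comment, where the Map is handed to the document once and applied to every file added afterwards.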