[jira] [Commented] (SOLR-3439) Make SolrCell easier to use out of the box

JIRA Mon, 23 Jul 2012 07:57:38 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420692#comment-13420692
 ]


Jan Høydahl commented on SOLR-3439:
-----------------------------------

bq. 1. Any reason to limit it to 5.0 and not backport to 4.0?

It is already marked for both 4.0 and 5.0.

bq. The one hope I hold out is that maybe we should modify the post tool to 
recognize that the file type is not ".xml" and then send rich documents to 
SolrCell with an explicit literal to initialize the "filename" field - which 
itself needs to be added.

I have a new patch that uses the {{resource.name}} request parameter, which is 
the official way to send the file name to ERH (ExtractingRequestHandler). It 
propagates out as the Tika metadata key resourceName, which is then lowercased 
to the field {{resourcename}}.
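
As a rough illustration of the request described above, the sketch below builds 
an extract URL that passes the file name via {{resource.name}} and shows the 
lowercasing step. It is not taken from the patch: the base URL, the 
{{/update/extract}} handler path, the {{literal.id}} value and both helper 
function names are assumptions chosen for the example.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, filename, doc_id):
    """Build a URL for posting a rich document to the extracting handler.

    Assumptions (not from the patch): the handler is mounted at
    /update/extract and the unique key is supplied through literal.id.
    """
    params = {
        "literal.id": doc_id,        # hypothetical document id
        "resource.name": filename,   # file name hint passed on to Tika
        "commit": "true",
    }
    return f"{base_url}/update/extract?{urlencode(params)}"


def lowercased_field(metadata_name):
    """Mimic only the lowercasing mentioned in the comment."""
    return metadata_name.lower()


# Hypothetical local Solr instance and document id.
url = build_extract_url(
    "http://localhost:8983/solr", "Lincoln-Gettysburg-Address.pdf", "doc1"
)
print(url)
print(lowercased_field("resourceName"))
```

The file itself would be sent as the body of a POST to that URL; the sketch 
stops at building the URL so that it runs without a Solr server.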

bq. It would be nice to include my sample Word and PDF documents, or other 
equivalent sample rich documents

Agreed. There should be an exampledocs folder with rich docs. Alternatively, we 
could simply describe in the tutorial how to index Solr's own documentation, 
both the PDFs and the HTML JavaDocs.
                
> Make SolrCell easier to use out of the box
> ------------------------------------------
>
>                 Key: SOLR-3439
>                 URL: https://issues.apache.org/jira/browse/SOLR-3439
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - Solr Cell (Tika extraction), Schema and 
> Analysis
>            Reporter: Jack Krupansky
>            Assignee: Jan Høydahl
>            Priority: Minor
>             Fix For: 4.0, 5.0
>
>         Attachments: Lincoln-Gettysburg-Address.docx, 
> Lincoln-Gettysburg-Address.pdf, SOLR-3439.patch, SOLR-3439.patch, 
> SOLR-3439.patch
>
>
> Currently, SolrCell is configured to map Tika "content" (the main body of a 
> document) to the "text" field which is the indexed-only (not stored) 
> catch-all for default queries. That searches fine, but doesn't show the 
> document content in the results, sometimes leading users to think that 
> something is wrong. Sure, the user can easily add the field (and this is 
> documented), but it would be a better user experience to have such a basic 
> feature work right out of the box without any config editing and without the 
> need for the user to read the fine print in the documentation.
> I propose that we add the "content" field to the example schema in the 
> section of fields already defined to support SolrCell metadata.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
