[ 
https://issues.apache.org/jira/browse/SOLR-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-3439:
------------------------------

    Attachment: SOLR-3439.patch

New patch with these improvements:

* The new "content" field is now indexed="false" for performance reasons; you 
can still search using "text"
* Included changes to /browse RH:
** Added the SolrCell fields to qf
** Added facets for author and content_type
** Turned on highlighting for "content"
* Changes to Velocity templates
** Detects whether a result doc is a product, product-join or rich-text doc
** The rich-text display shows the "title" instead of "name", with fallback to 
the ID if the title is missing
** We display a nice little icon for PDF, DOC, PPT, XLS
** For rich-text, we display the highlighted content field, with HTML-encoded 
fallback if there are no hits
** Fixed #field() macro to display all snippets of highlighting and to 
HTML-encode fallback result
** Hide facets for which there are no results
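
To make the fallback rules above concrete, here is a rough sketch of the display logic in Python rather than Velocity. It is not the code in the patch; the function names are made up for illustration, and it assumes the usual shape of Solr's highlighting section (doc ID, then field, then a list of snippets).

```python
from html import escape


def display_title(doc):
    """Return the title, falling back to the ID when the title is missing."""
    title = doc.get("title")
    if isinstance(title, list):  # the field may be multi-valued
        title = title[0] if title else None
    return title or doc["id"]


def display_content(doc, highlighting, field="content"):
    """Return all highlight snippets, or the HTML-encoded stored value if there are no hits."""
    snippets = highlighting.get(doc["id"], {}).get(field, [])
    if snippets:
        # Snippets already carry highlight markup, so they are not escaped again.
        return " ... ".join(snippets)
    raw = doc.get(field, "")
    if isinstance(raw, list):
        raw = " ".join(raw)
    # No hits: escape the raw stored text so it cannot break the page.
    return escape(raw)
```

The escaping is only applied on the fallback path, which is the same split the #field() macro change describes.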

I have tested with a mix of office docs and the other example docs and it looks 
nice here. Please test it.
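
For anyone who wants to test without the /browse UI, the handler changes listed above correspond roughly to the request parameters below. This is only a sketch: the host, port and path are the stock example defaults, and the qf field list is my guess at "the SolrCell fields", not the exact value in the patch.

```python
from urllib.parse import urlencode


def browse_params(query):
    """Build request parameters mirroring the /browse changes described above."""
    return [
        ("q", query),
        ("defType", "edismax"),
        ("qf", "text title author"),      # assumed field list, not taken from the patch
        ("facet", "true"),
        ("facet.field", "author"),        # new facets from the patch
        ("facet.field", "content_type"),
        ("hl", "true"),
        ("hl.fl", "content"),             # highlight the stored "content" field
    ]


# Assumed local example server; adjust host and path to your setup.
url = "http://localhost:8983/solr/browse?" + urlencode(browse_params("gettysburg"))
print(url)
```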

Todo:
* It would be natural to display the file name for SolrCell docs - where should we 
pick it from?
* Should fix SOLR-2730 to avoid the HTML-encoding hack in the template
* Should download the filetype graphics locally instead of linking to GitHub.
                
> Make SolrCell easier to use out of the box
> ------------------------------------------
>
>                 Key: SOLR-3439
>                 URL: https://issues.apache.org/jira/browse/SOLR-3439
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - Solr Cell (Tika extraction), Schema and 
> Analysis
>            Reporter: Jack Krupansky
>            Assignee: Jan Høydahl
>            Priority: Minor
>             Fix For: 4.0, 5.0
>
>         Attachments: Lincoln-Gettysburg-Address.docx, 
> Lincoln-Gettysburg-Address.pdf, SOLR-3439.patch, SOLR-3439.patch
>
>
> Currently, SolrCell is configured to map Tika "content" (the main body of a 
> document) to the "text" field which is the indexed-only (not stored) 
> catch-all for default queries. That searches fine, but doesn't show the 
> document content in the results, sometimes leading users to think that 
> something is wrong. Sure, the user can easily add the field (and this is 
> documented), but it would be a better user experience to have such a basic 
> feature work right out of the box without any config editing and without the 
> need for the user to read the fine print in the documentation.
> I propose that we add the "content" field to the example schema in the 
> section of fields already defined to support SolrCell metadata.

