[
https://issues.apache.org/jira/browse/SOLR-8590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Erik Hatcher updated SOLR-8590:
-------------------------------
Attachment: SOLR-8590.patch
This patch fixes the email_ss and url_ss field names, hardens the update script
so "content" isn't required, sets a fallback language, and increases the
threshold on language detection.
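
For readers unfamiliar with how a detection threshold and a fallback language
interact, here is a minimal sketch of the idea. It is not the code in the
patch: the function name, the candidate format, and the default values are
all hypothetical. Solr's language identification processor exposes
langid.threshold and langid.fallback parameters, which are presumably what
the patch adjusts, but the values it sets are not stated in this message.

```python
def choose_language(candidates, threshold=0.5, fallback="en"):
    """Pick the most confident detected language, or use the fallback.

    candidates: list of (language_code, confidence) pairs, with confidence
    in the range [0, 1]. The threshold and fallback defaults here are
    illustrative assumptions, not the values used by the patch.
    """
    if not candidates:
        # Nothing was detected at all, so use the fallback language.
        return fallback
    lang, score = max(candidates, key=lambda pair: pair[1])
    # A higher threshold means weak detections are rejected more often.
    return lang if score >= threshold else fallback
```

Raising the threshold trades coverage for precision: fewer documents get a
detected language, and the rest are assigned the fallback.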
> example/files improvements
> --------------------------
>
> Key: SOLR-8590
> URL: https://issues.apache.org/jira/browse/SOLR-8590
> Project: Solr
> Issue Type: Bug
> Components: examples
> Reporter: Erik Hatcher
> Assignee: Erik Hatcher
> Priority: Minor
> Fix For: 6.0
>
> Attachments: SOLR-8590.patch
>
>
> There are several example/files improvements/fixes that are warranted:
> * Fix e-mail and URL field names ({{<email>_ss}} and {{<url>_ss}}, with angle
> brackets in field names); also add display of these fields in /browse results
> rendering
> * Improve quality of extracted phrases
> * Extract, facet, and display acronyms
> * Add sorting controls, possibly all or some of these: last modified date,
> created date, relevancy, and title
> * Add grouping by doc_type perhaps
> * Fix debug mode: it currently does not update the parsed query debug output
> (this is probably a bug in data driven /browse as well)
> * Harden update-script: it currently errors if documents do not have a
> "content" field (e.g. indexing basic CSV), but should instead skip extraction
> of e-mail addresses and URLs when there is no "content". Documents without
> "content" are not quite the use case for example/files, but there is no
> reason for the update script to error on them.
> * Filter out bogus e-mail addresses. I'm seeing {{email_ss =
> "?@[^],\,/^@[$_a-z]"}} for some documents (using Solr docs/ directory as the
> dataset)
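
The hardening and filtering described in the last two items can be sketched
as follows. This is an illustration only, not the update script shipped with
example/files: the function name and both regular expressions are assumptions,
and "content" is assumed to be a single string. The point is that extraction
is skipped when "content" is absent, and that a stricter e-mail pattern
rejects regex-like fragments such as the bogus value quoted above.

```python
import re

# Deliberately simple patterns for illustration (assumptions, not the
# patterns used by the real update script).
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_contacts(doc):
    """Return a copy of doc with email_ss / url_ss added when possible."""
    result = dict(doc)
    content = doc.get("content")
    if not content:
        # No "content" (e.g. a basic CSV row): skip extraction, do not fail.
        return result
    emails = sorted(set(EMAIL_RE.findall(content)))
    urls = sorted(set(URL_RE.findall(content)))
    # Only add the fields when something was actually found.
    if emails:
        result["email_ss"] = emails
    if urls:
        result["url_ss"] = urls
    return result
```

Requiring a letter, digit, or one of a few punctuation characters before the
"@" and a dotted host name after it is what filters out the bogus match; a
production pattern would likely need more care than this.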