[ https://issues.apache.org/jira/browse/SOLR-12376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16505146#comment-16505146 ]
ASF subversion and git services commented on SOLR-12376:
--------------------------------------------------------

Commit c01287d7b34293d9ae7b0abcd1bf66334f9d5138 in lucene-solr's branch refs/heads/branch_7x from [~ctargett]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c01287d ]

SOLR-12376: Add some links to other Ref Guide pages; minor format & typo cleanup

> New TaggerRequestHandler (aka SolrTextTagger)
> ---------------------------------------------
>
>                 Key: SOLR-12376
>                 URL: https://issues.apache.org/jira/browse/SOLR-12376
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Major
>             Fix For: 7.4
>
>         Attachments: SOLR-12376.patch, SOLR-12376.patch, SOLR-12376.patch
>
>
> This issue introduces a new RequestHandler: {{TaggerRequestHandler}}, AKA the
> SolrTextTagger from the OpenSextant project
> [https://github.com/OpenSextant/SolrTextTagger]. It's used for named entity
> recognition (NER) of text passed to it. It doesn't do any NLP (outside of
> Lucene text analysis) so it's said to be a "naive tagger", but it's
> definitely useful as-is and a more complete NER or ERD (entity recognition
> and disambiguation) system can be built with this as a key component. The
> SolrTextTagger has been used on queries for query-understanding, and it's
> been used on full-text, and it's been used on dictionaries that number tens
> of millions in size. Since it's small and has been used a bunch (including
> helping win an ERD competition and in [Apache
> Stanbol|https://stanbol.apache.org/]), several people have asked me when or
> why isn't this in Solr yet. So here it is.
> To use it, first you need a collection of documents that have a name-like
> field (short text) indexed with the ConcatenateFilter (LUCENE-8323) at the
> end. We call this the dictionary.
> Once that's in place, you simply post text
> to a {{TaggerRequestHandler}} and it returns the offset pairs into that text
> for matches in the dictionary along with the uniqueKey of the matching
> documents. It can also return other document data desired. That's the gist;
> I'll add more details on use to the Solr Reference Guide.
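
To make the "offset pairs plus uniqueKey" idea concrete, here is a toy sketch of dictionary-based tagging in plain Python. It is not the TaggerRequestHandler implementation and does not call Solr; the function `tag`, the `DICTIONARY` contents, and the document ids are all hypothetical, and the whitespace-joined lowercase key is only a loose stand-in for a name field whose tokens have been concatenated at index time.

```python
import re


def tag(text, dictionary):
    """Return (start, end, ids) for the longest dictionary match at each position.

    `dictionary` maps a lowercased, space-joined name to a list of document ids
    (a stand-in for the uniqueKey values of matching dictionary documents).
    """
    # Tokenize while remembering each token's character offsets in `text`.
    tokens = [(m.group().lower(), m.start(), m.end())
              for m in re.finditer(r"\w+", text)]
    # Longest name in the dictionary, measured in tokens.
    max_len = max((len(name.split(" ")) for name in dictionary), default=0)

    tags = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first so "new york city"
        # wins over "new york" when both are present.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(t[0] for t in tokens[i:i + n])
            if phrase in dictionary:
                start = tokens[i][1]
                end = tokens[i + n - 1][2]
                tags.append((start, end, dictionary[phrase]))
                i += n  # skip past the matched tokens (no overlapping tags)
                matched = True
                break
        if not matched:
            i += 1
    return tags


# Hypothetical dictionary: name -> ids of the documents carrying that name.
DICTIONARY = {
    "new york": ["doc-1"],
    "new york city": ["doc-2"],
    "boston": ["doc-3"],
}

result = tag("Flights from Boston to New York City", DICTIONARY)
print(result)
```

Each tuple holds the start and end character offsets into the posted text and the ids of the dictionary entries that matched. Preferring the longest match and skipping overlaps is just one simple policy chosen for this sketch; a real tagger may expose other overlap-handling choices.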