[jira] Commented: (SOLR-461) Highlighting TokenStream Truncation capability
[ https://issues.apache.org/jira/browse/SOLR-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560069#action_12560069 ]

Mike Klaas commented on SOLR-461:
---------------------------------

The maximum-characters limit comes directly from the Lucene contrib highlighter. It is based on token offsets, so it counts whitespace and doesn't cut off a token in the middle. It is also analogous to the RegexFragmenter's maxAnalyzedChars parameter, which can't be token-based.

I'm not sure it is wise to add two APIs with virtually the same functionality: anyone who wants to set a high limit would have to set both. However, it might be nice to make the token filter a pluggable component, so that users can insert this token filter if they want.

> Highlighting TokenStream Truncation capability
> ----------------------------------------------
>
>                 Key: SOLR-461
>                 URL: https://issues.apache.org/jira/browse/SOLR-461
>             Project: Solr
>          Issue Type: Improvement
>          Components: highlighter
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>
> It is sometimes the case when generating snippets that one need not
> fragment/analyze the whole document (especially for large documents) in order
> to show meaningful snippet highlights.
> Patch to follow that adds a counting TokenFilter that returns null after X
> number of Tokens have been seen. This filter will then be hooked into the
> SolrHighlighter and configurable via solrconfig.xml. The default value will
> be Integer.MAX_VALUE or, I suppose, it could be set to whatever Max Field
> Length is set to, as well.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
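To make the offset-based behaviour described in this comment concrete, here is a minimal, self-contained sketch. It does not use the Lucene highlighter; OffsetLimitSketch, OffsetToken, and withinCharLimit are hypothetical names, tokenization is plain whitespace splitting, and the cutoff rule (stop at the first token whose start offset is at or past the limit) is an assumption chosen to match "counts whitespace, and doesn't cut off a token in the middle".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of a character limit applied via token offsets. */
final class OffsetLimitSketch {

    /** A token together with its character offsets in the original text. */
    static final class OffsetToken {
        final String text;
        final int start;
        final int end;

        OffsetToken(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Toy tokenizer: every run of non-whitespace characters is one token. */
    static List<OffsetToken> tokenize(String text) {
        List<OffsetToken> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("\\S+").matcher(text);
        while (m.find()) {
            tokens.add(new OffsetToken(m.group(), m.start(), m.end()));
        }
        return tokens;
    }

    /**
     * Keeps tokens that start before maxChars. Offsets are positions in the
     * raw text, so whitespace counts toward the limit, and a token that
     * straddles the limit is kept whole rather than truncated.
     */
    static List<String> withinCharLimit(String text, int maxChars) {
        List<String> kept = new ArrayList<>();
        for (OffsetToken t : tokenize(text)) {
            if (t.start >= maxChars) {
                break; // assumed cutoff rule, see lead-in
            }
            kept.add(t.text);
        }
        return kept;
    }

    public static void main(String[] args) {
        // "quick" starts at offset 4 (< 6) and is kept whole; "brown" starts at 10.
        System.out.println(withinCharLimit("the quick brown fox", 6)); // prints [the, quick]
    }
}
```

Under these assumptions a character limit and a token limit differ only in what is being counted, which is the overlap in functionality the comment points out.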
[jira] Commented: (SOLR-461) Highlighting TokenStream Truncation capability
[ https://issues.apache.org/jira/browse/SOLR-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560057#action_12560057 ]

Grant Ingersoll commented on SOLR-461:
--------------------------------------

I suppose it is similar, but I don't find counting characters all that intuitive. A token-based approach doesn't cut off in the middle of a word, whereas with a character limit it isn't clear to me whether whitespace characters are counted, etc. Plus, a token limit is analogous to Lucene's Max Field Length, which is token-based as well.
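For illustration, the token-counting idea from the issue description (a filter that returns null after X tokens have been seen) might look roughly like the sketch below. This is not the patch attached to the issue and does not use Lucene's TokenFilter class; SimpleTokenStream, CountingTokenStream, and TokenCountSketch are hypothetical stand-ins, with next() returning null at end of stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

/** Hypothetical stand-in for a token stream: next() returns null when exhausted. */
interface SimpleTokenStream {
    String next();
}

/** Wraps another stream and reports end-of-stream once maxTokens tokens have passed through. */
final class CountingTokenStream implements SimpleTokenStream {
    private final SimpleTokenStream input;
    private final int maxTokens;
    private int seen = 0;

    CountingTokenStream(SimpleTokenStream input, int maxTokens) {
        this.input = input;
        this.maxTokens = maxTokens;
    }

    @Override
    public String next() {
        if (seen >= maxTokens) {
            return null; // limit reached: behave as if the input had ended
        }
        String token = input.next();
        if (token != null) {
            seen++;
        }
        return token;
    }
}

final class TokenCountSketch {

    /** Toy analyzer: whitespace-delimited tokens, standing in for a real one. */
    static SimpleTokenStream whitespaceTokens(String text) {
        Scanner scanner = new Scanner(text);
        return () -> scanner.hasNext() ? scanner.next() : null;
    }

    /** Reads a stream to the end and collects its tokens. */
    static List<String> drain(SimpleTokenStream stream) {
        List<String> out = new ArrayList<>();
        for (String t = stream.next(); t != null; t = stream.next()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        SimpleTokenStream limited =
                new CountingTokenStream(whitespaceTokens("the quick brown fox jumps"), 3);
        System.out.println(drain(limited)); // prints [the, quick, brown]
    }
}
```

Passing Integer.MAX_VALUE as the limit leaves the stream effectively unlimited, which corresponds to the default proposed in the issue description.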
[jira] Commented: (SOLR-461) Highlighting TokenStream Truncation capability
[ https://issues.apache.org/jira/browse/SOLR-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560053#action_12560053 ]

Mike Klaas commented on SOLR-461:
---------------------------------

Isn't this essentially the same thing as the hl.maxAnalyzedChars parameter?

http://wiki.apache.org/solr/HighlightingParameters