[jira] Commented: (SOLR-461) Highlighting TokenStream Truncation capability

2008-01-17 Mike Klaas (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560069#action_12560069 ]

Mike Klaas commented on SOLR-461:
-

The max characters setting comes directly from the Lucene contrib highlighter.  It 
is based on token offsets, so it counts whitespace, and it doesn't cut off a token 
in the middle.

It is also analogous to the RegexFragmenter's maxAnalyzedChars parameter, which 
can't be token-based.
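To make the offset-based behaviour concrete, here is a minimal Java sketch. It is not the contrib highlighter's code: the whitespace tokenizer, the class name CharOffsetLimit, and the rule that a token is kept when its start offset falls below the limit are all assumptions for illustration. It shows the two properties described above: whitespace counts toward the limit, and a token that straddles the limit is kept whole.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: limit analysis by character offset instead of token count. */
class CharOffsetLimit {

    /** A token plus its offsets in the original text (end is exclusive). */
    record Token(String text, int start, int end) {}

    /** Naive whitespace tokenizer that records offsets; stands in for a real Analyzer. */
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Whitespace is not emitted as a token, but it still advances the offset.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) tokens.add(new Token(text.substring(start, i), start, i));
        }
        return tokens;
    }

    /**
     * Keeps every token whose start offset is below maxChars (an assumed rule).
     * A token that begins before the limit is kept whole, never split.
     */
    static List<String> analyzeUpTo(String text, int maxChars) {
        List<String> kept = new ArrayList<>();
        for (Token t : tokenize(text)) {
            if (t.start() >= maxChars) break;
            kept.add(t.text());
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(analyzeUpTo("the quick   brown fox", 13)); // [the, quick, brown]
    }
}
```

With a limit of 13 the cutoff falls inside "brown", which is kept whole because it starts at offset 12; the three spaces before it count toward the limit.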

I'm not sure it is wise to add two APIs with virtually the same functionality.  
Anyone who wants to set a high limit would have to set both.
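A hypothetical sketch of why both limits would need raising: if a token cap and a character cap are applied to the same stream, analysis stops at whichever is reached first. The class TwoLimits, the method tokensAnalyzed, and the whitespace tokenization are illustrative assumptions, not Solr code.

```java
/** Hypothetical sketch: a token cap and a character cap applied to the same text. */
class TwoLimits {

    /** Returns how many whitespace-separated tokens are analyzed before either cap is hit. */
    static int tokensAnalyzed(String text, int maxTokens, int maxChars) {
        int count = 0;
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i == start) break; // only trailing whitespace was left
            // Whichever limit is reached first ends analysis.
            if (count >= maxTokens || start >= maxChars) break;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String doc = "a b c d e f g h i j"; // 10 tokens starting at offsets 0, 2, ..., 18
        System.out.println(tokensAnalyzed(doc, 3, Integer.MAX_VALUE)); // 3
        System.out.println(tokensAnalyzed(doc, 100, 6));               // 3: raising only the token cap is not enough
        System.out.println(tokensAnalyzed(doc, 100, 100));             // 10
    }
}
```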

However, it might be nice to make the token filter a pluggable component, so 
that users can insert this token filter if they want.
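One way such a hook might look, as a sketch only: the highlighter accepts a user-supplied wrapper around its token stream, and the token limit becomes one wrapper among others. Here Iterator<String> stands in for a Lucene TokenStream, and PluggableFilterDemo, highlightTokens, and limit are hypothetical names, not existing Solr APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.UnaryOperator;

class PluggableFilterDemo {

    /** Hypothetical highlighter entry point: applies a user-supplied wrapper to the token stream. */
    static List<String> highlightTokens(List<String> tokens, UnaryOperator<Iterator<String>> wrapper) {
        Iterator<String> stream = wrapper.apply(tokens.iterator());
        List<String> analyzed = new ArrayList<>();
        while (stream.hasNext()) {
            analyzed.add(stream.next()); // a real highlighter would score and fragment here
        }
        return analyzed;
    }

    /** A pluggable wrapper that ends the stream after maxTokens tokens. */
    static UnaryOperator<Iterator<String>> limit(int maxTokens) {
        return in -> new Iterator<String>() {
            private int emitted = 0;

            @Override
            public boolean hasNext() {
                return emitted < maxTokens && in.hasNext();
            }

            @Override
            public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                emitted++;
                return in.next();
            }
        };
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("solr", "highlights", "large", "documents");
        System.out.println(highlightTokens(doc, UnaryOperator.identity())); // [solr, highlights, large, documents]
        System.out.println(highlightTokens(doc, limit(2)));                 // [solr, highlights]
    }
}
```

Users who don't configure a wrapper get the identity, so the existing behaviour is unchanged.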

> Highlighting TokenStream Truncation capability
> --
>
> Key: SOLR-461
> URL: https://issues.apache.org/jira/browse/SOLR-461
> Project: Solr
>  Issue Type: Improvement
>  Components: highlighter
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
>
> It is sometimes the case when generating snippets that one need not 
> fragment/analyze the whole document (especially for large documents) in order 
> to show meaningful snippet highlights. 
> Patch to follow that adds a counting TokenFilter that returns null after X 
> number of Tokens have been seen.  This filter will then be hooked into the 
> SolrHighlighter and configurable via solrconfig.xml.  The default value will 
> be Integer.MAX_VALUE or, I suppose, it could be set to whatever Max Field 
> Length is set to, as well.
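A minimal sketch of the proposed counting filter, assuming a simplified stream whose next() returns null at end of stream (in the spirit of the TokenStream API of the time, but not Lucene's actual classes). TokenLimitSketch, SimpleTokenStream, CountingFilter, fromList, and drain are hypothetical names. With the proposed default of Integer.MAX_VALUE, the filter passes everything through.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class TokenLimitSketch {

    /** Hypothetical minimal stream: next() returns the next token, or null when exhausted. */
    interface SimpleTokenStream {
        String next();
    }

    /** Adapts a list of strings to SimpleTokenStream; stands in for a real tokenizer. */
    static SimpleTokenStream fromList(List<String> tokens) {
        Iterator<String> it = tokens.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    /** Passes tokens through until maxTokens have been seen, then returns null. */
    static final class CountingFilter implements SimpleTokenStream {
        private final SimpleTokenStream input;
        private final int maxTokens;
        private int seen = 0;

        CountingFilter(SimpleTokenStream input, int maxTokens) {
            this.input = input;
            this.maxTokens = maxTokens;
        }

        @Override
        public String next() {
            if (seen >= maxTokens) {
                return null; // limit reached: signal end of stream without reading further
            }
            String token = input.next();
            if (token != null) {
                seen++;
            }
            return token;
        }
    }

    /** Drains a stream into a list. */
    static List<String> drain(SimpleTokenStream stream) {
        List<String> out = new ArrayList<>();
        for (String t = stream.next(); t != null; t = stream.next()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("a", "very", "long", "document", "body");
        System.out.println(drain(new CountingFilter(fromList(doc), 3)));                 // [a, very, long]
        System.out.println(drain(new CountingFilter(fromList(doc), Integer.MAX_VALUE))); // [a, very, long, document, body]
    }
}
```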

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-461) Highlighting TokenStream Truncation capability

2008-01-17 Grant Ingersoll (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560057#action_12560057 ]

Grant Ingersoll commented on SOLR-461:
--

I suppose it is similar, but I don't find counting characters all that 
intuitive.  A token-based approach doesn't cut off in the middle of a word, and 
it isn't clear to me whether the character count includes whitespace, etc.  
Plus, it is analogous to Lucene's Max Field Length, which is token-based as well.




[jira] Commented: (SOLR-461) Highlighting TokenStream Truncation capability

2008-01-17 Mike Klaas (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560053#action_12560053 ]

Mike Klaas commented on SOLR-461:
-

Isn't this essentially the same thing as the hl.maxAnalyzedChars parameter?

http://wiki.apache.org/solr/HighlightingParameters

