Re: Extracting span terms using WeightedSpanTermExtractor

Mark Miller Wed, 06 Jul 2011 18:41:23 -0700

Sorry - kind of my fault. When I fixed this to use maxDocCharsToAnalyze, I 
didn't set a default other than 0 because I didn't really count on this being 
used beyond how it is in the Highlighter - which always sets 
maxDocCharsToAnalyze with its default.
 
You've got to explicitly set it higher than 0 for now.
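
To illustrate why a limit of 0 leaves the result empty, here is a minimal, self-contained Java sketch. It is a toy model, not Lucene's code: the class OffsetLimitSketch and its analyze method are hypothetical, and it assumes the extractor stops consuming tokens once the number of analyzed characters reaches maxDocCharsToAnalyze.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetLimitSketch {

    // Hypothetical stand-in for a character-limited token stream: emit
    // whitespace-separated tokens until the characters consumed so far
    // reach maxDocCharsToAnalyze (assumed behavior, not Lucene's source).
    public static List<String> analyze(String text, int maxDocCharsToAnalyze) {
        List<String> tokens = new ArrayList<>();
        int consumed = 0;
        for (String token : text.split("\\s+")) {
            if (token.isEmpty()) {
                continue; // leading whitespace yields an empty first element
            }
            if (consumed >= maxDocCharsToAnalyze) {
                break; // limit reached; with a limit of 0 this fires immediately
            }
            tokens.add(token);
            consumed += token.length();
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "quick brown fox";
        System.out.println(analyze(text, 0));     // limit 0: no tokens are analyzed
        System.out.println(analyze(text, 51200)); // generous limit: all tokens kept
    }
}
```

In the quoted code, the corresponding change would presumably be to give wste itself a positive limit before calling getWeightedSpanTerms, the same way the scorer gets scorer.setMaxDocCharsToAnalyze(51200). Whether the extractor exposes that setter publicly depends on the Lucene version, so check the class you are compiling against.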


Feel free to create a JIRA issue and we can give it its own default greater 
than 0.

- Mark Miller
lucidimagination.com


On Jul 6, 2011, at 5:34 PM, Jahangir Anwari wrote:

> I have a CustomHighlighter that extends the SolrHighlighter and overrides
> the doHighlighting() method. Then for each document I am trying to extract
> the span terms so that later I can use it to get the span Positions. I tried
> to get the weightedSpanTerms using WeightedSpanTermExtractor but was
> unsuccessful. Below is the code that I have. Is there something missing
> that needs to be added to get the span terms?
> 
> // in CustomHighlighter.java
> @Override
> public NamedList doHighlighting(DocList docs, Query query,
>     SolrQueryRequest req, String[] defaultFields) throws IOException {
> 
>   NamedList highlightedSnippets = super.doHighlighting(docs, query, req,
>       defaultFields);
> 
>   IndexReader reader = req.getSearcher().getIndexReader();
> 
>   String[] fieldNames = getHighlightFields(query, req, defaultFields);
>   for (String fieldName : fieldNames) {
>     QueryScorer scorer = new QueryScorer(query, null);
>     scorer.setExpandMultiTermQuery(true);
>     scorer.setMaxDocCharsToAnalyze(51200);
> 
>     DocIterator iterator = docs.iterator();
>     for (int i = 0; i < docs.size(); i++) {
>       int docId = iterator.nextDoc();
>       System.out.println("DocId: " + docId);
>       TokenStream tokenStream = TokenSources.getTokenStream(reader, docId,
>           fieldName);
>       WeightedSpanTermExtractor wste =
>           new WeightedSpanTermExtractor(fieldName);
>       wste.setExpandMultiTermQuery(true);
>       wste.setWrapIfNotCachingTokenFilter(true);
> 
>       // this is always empty
>       Map<String,WeightedSpanTerm> weightedSpanTerms =
>           wste.getWeightedSpanTerms(query, tokenStream, fieldName);
>       System.out.println("weightedSpanTerms: " + weightedSpanTerms.values());
>     }
>   }
>   return highlightedSnippets;
> }
> 
> Thanks,
> Jahangir










