Hi,
I am going to keep this question pretty short and save the technical 
details for the end.
I suspect that some folks may be seeing this issue without the particular 
configuration we are using.

What our problem is:

  1.  Correctly spelled words are reported as misspelled, and the suggestions 
are several copies of the original, correctly spelled word, each with a single 
oddball character appended.
  2.  Incorrectly spelled words return the correct spelling as several 
suggestions, each with a single oddball character appended.
  3.  We’re seeing this in Solr 4.5.x and 4.7.x.

Example:

Each returned value is the word plus a single appended character (Unicode code 
point shown in square brackets).

correction=attitude[2d]
correction=attitude[2f]
correction=attitude[2026]

Spurious characters:

  *   Unicode Character 'HYPHEN-MINUS' (U+002D)
  *   Unicode Character 'SOLIDUS' (U+002F)
  *   Unicode Character 'HORIZONTAL ELLIPSIS' (U+2026)
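
For anyone who wants to check their own results for the same thing, here is a minimal sketch of how the bracketed notation above can be produced. The class and method names are made up for this illustration and are not part of our application or of Solr; it simply renders the last code point of a suggestion in hex.

```java
class SuggestionDebug {
    /**
     * Replaces the last code point of a suggestion with its hex value in
     * square brackets, e.g. "attitude-" becomes "attitude[2d]".
     * Hypothetical helper, for inspection only.
     */
    static String showTrailingCodePoint(String suggestion) {
        if (suggestion.isEmpty()) {
            return suggestion;
        }
        // codePointBefore handles characters outside the BMP correctly.
        int last = suggestion.codePointBefore(suggestion.length());
        int start = suggestion.length() - Character.charCount(last);
        return suggestion.substring(0, start)
                + "[" + Integer.toHexString(last) + "]";
    }

    public static void main(String[] args) {
        String[] corrections = {"attitude-", "attitude/", "attitude\u2026"};
        for (String c : corrections) {
            System.out.println("correction=" + showTrailingCodePoint(c));
        }
    }
}
```

Running it on the three corrections we get back prints the three lines shown in the example.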

Anybody see anything like this?  Anybody fix something like this?

Thanks!
—Ed

========================================================================
OK, here are the gory details:


What we are doing:
We have developed an application that returns “did you mean” spelling 
alternatives for a specific (presumably misspelled) word.
We use the vocabulary of the indexed pages of a specified book as the source 
of the alternatives, so this is not a general dictionary spell check; we 
return only matching alternatives.
So when I say “correctly spelled” I mean words found on at least one page.  
We use the collations so that we restrict ourselves to the pages in one book.
We are having to check for and “fix up” these faulty results, which is not a 
robust or desirable solution.
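
To give an idea of the kind of fix-up I mean, here is a rough sketch, not our actual code. It assumes that any trailing character that is not a letter or digit is spurious, which is only safe if the indexed vocabulary never legitimately ends in punctuation. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SuggestionFixUp {
    /** Strips trailing code points that are not letters or digits. */
    static String stripTrailingNonWord(String s) {
        int end = s.length();
        while (end > 0) {
            int cp = s.codePointBefore(end);
            if (Character.isLetterOrDigit(cp)) {
                break;
            }
            end -= Character.charCount(cp);
        }
        return s.substring(0, end);
    }

    /**
     * Cleans each correction and drops duplicates, keeping the original
     * order. Assumption: corrections that differ only by a trailing
     * oddball character are really the same word.
     */
    static List<String> fixUp(List<String> corrections) {
        Set<String> seen = new LinkedHashSet<>();
        for (String c : corrections) {
            String cleaned = stripTrailingNonWord(c);
            if (!cleaned.isEmpty()) {
                seen.add(cleaned);
            }
        }
        return new ArrayList<>(seen);
    }
}
```

With this, the three corrections from the example collapse into the single word "attitude". It hides the symptom but does nothing about the cause, which is why I would rather not rely on it.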

We are using SolrJ to get the collations:

    private static final String DID_YOU_MEAN_REQUEST_HANDLER = "/spell";
    ….
    SolrQuery query = new SolrQuery(q);
    query.set("spellcheck", true);
    query.set(SpellingParams.SPELLCHECK_COUNT, 10);
    query.set(SpellingParams.SPELLCHECK_COLLATE, true);
    query.set(SpellingParams.SPELLCHECK_COLLATE_EXTENDED_RESULTS, true);
    query.set("wt", "json");
    query.setRequestHandler(DID_YOU_MEAN_REQUEST_HANDLER);
    query.set("shards.qt", DID_YOU_MEAN_REQUEST_HANDLER);
    query.set("shards.tolerant", "true");
    etc……

but we can duplicate the behavior without SolrJ; see the collations / 
misspellingsAndCorrections in the response below, e.g.:
solr/pg1/spell?q=+doc-id:(810500)+AND+attitudex&spellcheck=true&spellcheck.count=10&spellcheck.collate=true&spellcheck.collateExtendedResults=true&wt=json&qt=%2Fspell&shards.qt=%2Fspell&shards.tolerant=true.out.print


{
  "responseHeader": {"status": 0, "QTime": 60},
  "response": {"numFound": 0, "start": 0, "maxScore": 0.0, "docs": []},
  "spellcheck": {
    "suggestions": [
      "attitudex", {
        "numFound": 6, "startOffset": 21, "endOffset": 30, "origFreq": 0,
        "suggestion": [
          {"word": "attitudes", "freq": 362486},
          {"word": "attitu dex", "freq": 4819},
          {"word": "atti tudex", "freq": 3254},
          {"word": "attit udex", "freq": 159},
          {"word": "attitude-", "freq": 1080},
          {"word": "attituden", "freq": 261}]},
      "correctlySpelled", false,
      "collation", ["collationQuery", " doc-id:(810500) AND attitude-",
                    "hits", 2,
                    "misspellingsAndCorrections", ["attitudex", "attitude-"]],
      "collation", ["collationQuery", " doc-id:(810500) AND attitude/",
                    "hits", 2,
                    "misspellingsAndCorrections", ["attitudex", "attitude/"]],
      "collation", ["collationQuery", " doc-id:(810500) AND attitude…",
                    "hits", 2,
                    "misspellingsAndCorrections", ["attitudex", "attitude…"]]]}}

The configuration is:

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">text</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.alternativeTermCount">5</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">10</str>
    <str name="spellcheck.maxCollations">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>


<lst name="spellchecker">
  <str name="name">wordbreak</str>
  <str name="classname">solr.WordBreakSolrSpellChecker</str>
  <str name="field">text</str>
  <str name="combineWords">true</str>
  <str name="breakWords">true</str>
  <int name="maxChanges">25</int>
  <int name="minBreakLength">3</int>
</lst>


<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">text</str>
  <str name="classname">solr.DirectSolrSpellChecker</str>
  <str name="distanceMeasure">internal</str>
  <float name="accuracy">0.2</float>
  <int name="maxEdits">2</int>
  <int name="minPrefix">1</int>
  <int name="maxInspections">25</int>
  <int name="minQueryLength">4</int>
  <float name="maxQueryFrequency">1</float>
</lst>

--

Ed Smiley, Senior Software Architect, eBooks
ProQuest | 161 E Evelyn Ave|
Mountain View, CA 94041 | USA |
+1 650 475 8700 extension 3772
ed.smi...@proquest.com
www.proquest.com | www.ebrary.com | www.eblib.com
ebrary and EBL, ProQuest businesses.
