On Thu, Aug 9, 2012 at 11:43 PM, Chris Hostetter
<[email protected]> wrote:
>
> : What many of us not familiar with the tokenizing rules of the standard
> : tokenizer just realized is that it's not a good default for English
> : and probably most other European languages.
>
> Jira is down for reindexing at the moment, so i can't file this suggestion
> as a new Feature proposal (or comment on its relevance in SOLR-3723) and
> i probably won't be online for another few days, so i wanted to get this
> idea out there now for discussion instead of waiting.
>
>         ---
>
> Based on the link steven mentioned clarifying why exactly
> StandardTokenizer works the way it does...
>
>         http://unicode.org/reports/tr29/#Word_Boundaries
>
> ...I think it would be a good idea to add some new customization options
> to StandardTokenizer (and StandardTokenizerFactory) to "tailor" the
> behavior based on the various "tailored improvement" notes...
>

Use a CharFilter.
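The sketch below is a rough illustration of the idea. It uses plain java.io rather than Lucene's own CharFilter classes, so it does not show Lucene's API. The class name HyphenToUnderscoreReader is hypothetical, and so is the choice of tailoring. The filter rewrites '-' to '_' before a tokenizer reads the text. Under UAX#29, '_' is ExtendNumLet, so "wi_fi" should stay one word where "wi-fi" is split. Because the mapping is one character for one character, offsets do not shift. A real Lucene CharFilter that changes the text length would also have to map offsets back to the original input.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/**
 * Hypothetical character-level filter in the spirit of a CharFilter:
 * rewrites '-' to '_' before a tokenizer sees the text. Under UAX#29
 * '_' is ExtendNumLet, so "wi_fi" is not broken the way "wi-fi" is.
 */
public class HyphenToUnderscoreReader extends FilterReader {

    public HyphenToUnderscoreReader(Reader in) {
        super(in);
    }

    // One-to-one character mapping, so token offsets stay valid.
    private static char map(char c) {
        return c == '-' ? '_' : c;
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        return c == -1 ? -1 : map((char) c);
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int n = super.read(cbuf, off, len);
        // At end of stream n == -1, so the loop body never runs.
        for (int i = off; i < off + n; i++) {
            cbuf[i] = map(cbuf[i]);
        }
        return n;
    }

    // Convenience helper: run a string through the filter and collect the result.
    public static String filter(String text) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new HyphenToUnderscoreReader(new StringReader(text))) {
            char[] buf = new char[256];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(filter("wi-fi hot-spot"));
    }
}
```

Here filter("wi-fi hot-spot") returns "wi_fi hot_spot". Any such mapping has to be applied at both index and query time, or queries will not match the indexed tokens.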

-- 
lucidimagination.com
