[ 
https://issues.apache.org/jira/browse/SOLR-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034176#comment-13034176
 ] 

Hoss Man commented on SOLR-2519:
--------------------------------

bq. Also: existing users would be unaffected by this? They've already copied 
over / edited their own schema.xml? This is mainly about new users?

The trap we've seen with this type of thing in the past (ie: the numeric 
fields) is that people who tend to use the example configs w/o changing them 
much refer to the example field types by name when talking about them on the 
mailing list, not considering that those names can have different meanings 
depending on version.

if we make radical changes to a {{<fieldType/>}} but leave the name alone, it 
could confuse a lot of people, ie: "i tried using the 'text' field but it 
didn't work"; "which version of solr are you using?"; "Solr 4.1"; "that should 
work, what exactly does your schema look like"; "..."; "that's the schema from 
3.6"; "yeah, i started with 3.6 and then upgraded to 4.1 later", etc...

Bottom line: it's less confusing to *remove* {{<fieldType/>}} declarations and 
add new ones with new names than to make radical changes to existing ones.
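
A minimal sketch of that approach, for illustration only: the name {{text_general}} is a hypothetical choice, not something decided in this issue, and the lowercase filter is an assumption added to make the analyzer chain plausible. The tokenizer and the auto-phrase setting simply follow the two changes proposed in the issue description, and the sketch assumes a Solr version whose {{<fieldType/>}} supports the {{autoGeneratePhraseQueries}} attribute.

```xml
<!-- Hypothetical new name: the old "text" type is removed, not redefined. -->
<fieldType name="text_general" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="false">
  <analyzer>
    <!-- StandardTokenizer instead of WhitespaceTokenizer, per the proposal -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Assumed for illustration; not part of the proposal itself -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a new name, a schema copied from an older release that still declares "text" is recognizable as old, because the name alone says which definition is meant.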

> Improve the defaults for the "text" field type in default schema.xml
> --------------------------------------------------------------------
>
>                 Key: SOLR-2519
>                 URL: https://issues.apache.org/jira/browse/SOLR-2519
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.2, 4.0
>
>         Attachments: SOLR-2519.patch
>
>
> Spinoff from: http://lucene.markmail.org/thread/ww6mhfi3rfpngmc5
> The text fieldType in schema.xml is unusable for non-whitespace
> languages, because it has the dangerous auto-phrase feature (of
> Lucene's QP -- see LUCENE-2458) enabled.
> Lucene leaves this off by default, as does ElasticSearch
> (http://www.elasticsearch.org/).
> Furthermore, the "text" fieldType uses WhitespaceTokenizer when
> StandardTokenizer is a better cross-language default.
> Until we have language specific field types, I think we should fix
> the "text" fieldType to work well for all languages, by:
>   * Switching from WhitespaceTokenizer to StandardTokenizer
>   * Turning off auto-phrase

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
