autoGeneratePhraseQueries sort of silently set to false
Another thing I noticed when upgrading from Solr 1.4 to Solr 3.5 had to do with results when there were hyphenated words: aaa-bbb. Erik Hatcher pointed me to the autoGeneratePhraseQueries attribute now available on fieldtype definitions in schema.xml. This is a great feature, and everything is peachy if you start with Solr 3.4. But many of us started earlier and are upgrading, and that's a different story.

It was surprising to me that
a. the default for this new feature caused different search results than Solr 1.4
b. it wasn't documented clearly, IMO

http://wiki.apache.org/solr/SchemaXml makes no mention of it.

In the schema.xml example, there is this at the top:

<!-- attribute "name" is the name of this schema and is only used for display purposes.
     Applications should change this to reflect the nature of the search collection.
     version="1.4" is Solr's version number for the schema syntax and semantics. It should
     not normally be changed by applications.
     1.0: multiValued attribute did not exist, all fields are multiValued by nature
     1.1: multiValued attribute introduced, false by default
     1.2: omitTermFreqAndPositions attribute introduced, true by default except for text fields.
     1.3: removed optional field compress feature
     1.4: default auto-phrase (QueryParser feature) to off
-->

And there was this in a couple of field definitions:

<fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
<fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">

But that was it.
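To make the hyphenated-word difference concrete, here is a minimal Java sketch of the two query shapes a term such as aaa-bbb can produce. It is an illustration only, not Solr's query parser: the class AutoPhraseDemo and both of its methods are hypothetical, the hyphen split merely stands in for an analyzer such as WordDelimiterFilter, and the OR form follows the wording of the SOLR-2015 CHANGES.txt entry (the operator actually used may depend on the parser configuration).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Illustration only: mimics the two query shapes, not Solr's actual query parser. */
final class AutoPhraseDemo {

    /** Hypothetical stand-in for an analyzer that splits one input term on hyphens. */
    static List<String> splitOnHyphen(String term) {
        return Arrays.asList(term.split("-"));
    }

    /** Builds a query string for a single unquoted term, in either of the two modes. */
    static String buildQuery(String field, String term, boolean autoGeneratePhraseQueries) {
        List<String> tokens = splitOnHyphen(term);
        if (tokens.size() == 1) {
            // Analysis produced a single token, so the setting makes no difference.
            return field + ":" + tokens.get(0);
        }
        if (autoGeneratePhraseQueries) {
            // Phrase query: the tokens must appear next to each other, in order.
            return field + ":\"" + String.join(" ", tokens) + "\"";
        }
        // Otherwise each token becomes its own clause.
        return tokens.stream()
                .map(token -> field + ":" + token)
                .collect(Collectors.joining(" OR ", "(", ")"));
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("text", "aaa-bbb", true));
        System.out.println(buildQuery("text", "aaa-bbb", false));
    }
}
```

The phrase form matches only documents where the two parts are adjacent, while the clause-per-token form can match documents containing either part, which is why the same search can return a different result set after an upgrade.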
RE: autoGeneratePhraseQueries sort of silently set to false
Seems like a change in default behavior like this should be included in the changes.txt for Solr 3.5. Not sure how to do that.

Tom

-----Original Message-----
From: Naomi Dushay [mailto:ndus...@stanford.edu]
Sent: Thursday, February 23, 2012 1:57 PM
To: solr-user@lucene.apache.org
Subject: autoGeneratePhraseQueries sort of silently set to false
Re: autoGeneratePhraseQueries sort of silently set to false
There's this (for 3.1, but in the 3.x CHANGES.txt):

* SOLR-2015: Add a boolean attribute autoGeneratePhraseQueries to TextField.
  autoGeneratePhraseQueries="true" (the default) causes the query parser to
  generate phrase queries if multiple tokens are generated from a single
  non-quoted analysis string. For example WordDelimiterFilter splitting
  text:pdp-11 will cause the parser to generate text:"pdp 11" rather than
  (text:PDP OR text:11). Note that autoGeneratePhraseQueries="true" tends
  to not work well for non whitespace delimited languages. (yonik)

with a ton of useful, though back and forth, commentary here: https://issues.apache.org/jira/browse/SOLR-2015

Note that the behavior, as Naomi pointed out so succinctly, is adjustable based off the *schema* version setting (look at your <schema> line in schema.xml). The code is simply this:

    if (schema.getVersion() > 1.3f) {
      autoGeneratePhraseQueries = false;
    } else {
      autoGeneratePhraseQueries = true;
    }

on TextField. Specifying autoGeneratePhraseQueries explicitly on a field type overrides whatever the default may be.

Erik

On Feb 23, 2012, at 14:45, Burton-West, Tom wrote:

> Seems like a change in default behavior like this should be included in the changes.txt for Solr 3.5. Not sure how to do that.
>
> Tom
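The two rules described in this message (a default derived from the schema version, and an explicit attribute that wins over it) can be sketched as a small standalone Java method. This is a paraphrase under stated assumptions, not the TextField source: the class PhraseDefaultSketch and the method resolve are hypothetical names, and a null argument is used here to mean "attribute not specified on the field type".

```java
/** Illustration only: restates the defaulting rule from the thread, not Solr's TextField code. */
final class PhraseDefaultSketch {

    /**
     * @param schemaVersion   value of the version attribute on the schema element, e.g. 1.4f
     * @param explicitSetting value of autoGeneratePhraseQueries on the field type,
     *                        or null when the attribute is absent (assumed convention)
     */
    static boolean resolve(float schemaVersion, Boolean explicitSetting) {
        if (explicitSetting != null) {
            // An explicit attribute overrides whatever the default may be.
            return explicitSetting;
        }
        // Schema versions above 1.3 default to off; 1.3 and earlier keep the old behavior.
        return !(schemaVersion > 1.3f);
    }

    public static void main(String[] args) {
        System.out.println(resolve(1.4f, null));         // no attribute, newer schema version
        System.out.println(resolve(1.3f, null));         // no attribute, older schema version
        System.out.println(resolve(1.4f, Boolean.TRUE)); // explicit attribute on the field type
    }
}
```

Under this reading, an upgrade can be made behavior-preserving either by leaving the schema version at its old value or by setting the attribute explicitly on each affected field type; the second option seems the more visible of the two.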
RE: autoGeneratePhraseQueries sort of silently set to false
Thanks Erik,

The 3.1 changes document the ability to set this and the default being set to true. However, apparently between 3.4 and 3.5 the default was changed to false. Since this will change the behavior of any field where autoGeneratePhraseQueries is not explicitly set, it could easily surprise users who update to 3.5. That's why I think the change in the default behavior (i.e. when not explicitly set) should be called out explicitly in the changes.txt for 3.5. True, everyone should read the notes in the example schema.xml, but I think it would help if the change were also noted in changes.txt.

Is it possible to revise the changes.txt for 3.5? Do you by any chance know where the change in the default behavior was discussed? I know it has been a contentious issue.

Tom

-----Original Message-----
From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
Sent: Thursday, February 23, 2012 2:53 PM
To: solr-user@lucene.apache.org
Subject: Re: autoGeneratePhraseQueries sort of silently set to false
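Since the concern in this thread is any field type where autoGeneratePhraseQueries is not explicitly set, one way to check before upgrading is to list those field types. The following is a rough Java sketch using only the JDK's DOM parser; the class PhraseQueryAudit, its method, and the text_legacy field type in the sample are hypothetical, and it assumes field types are declared as fieldType elements with class="solr.TextField" (a lowercase fieldtype spelling or a custom subclass would be missed).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Illustration only: a hypothetical pre-upgrade check, not part of Solr. */
final class PhraseQueryAudit {

    /** Returns the names of solr.TextField types that rely on the implicit default. */
    static List<String> implicitTextFieldTypes(String schemaXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(schemaXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse schema XML", e);
        }
        List<String> names = new ArrayList<>();
        NodeList types = doc.getElementsByTagName("fieldType");
        for (int i = 0; i < types.getLength(); i++) {
            Element type = (Element) types.item(i);
            boolean isTextField = "solr.TextField".equals(type.getAttribute("class"));
            if (isTextField && !type.hasAttribute("autoGeneratePhraseQueries")) {
                names.add(type.getAttribute("name"));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Tiny sample schema; text_legacy is a made-up field type with no explicit setting.
        String schema =
                "<schema name=\"example\" version=\"1.4\"><types>"
              + "<fieldType name=\"text_en_splitting\" class=\"solr.TextField\""
              + " autoGeneratePhraseQueries=\"true\"/>"
              + "<fieldType name=\"text_legacy\" class=\"solr.TextField\"/>"
              + "<fieldType name=\"string\" class=\"solr.StrField\"/>"
              + "</types></schema>";
        System.out.println(implicitTextFieldTypes(schema));
    }
}
```

Each name it reports is a field type whose phrase-query behavior is decided by the schema version rather than by the field type definition itself.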