[jira] [Commented] (SOLR-3442) Example schema switch to DisMax instead of CopyField

2012-05-14 Thread Jack Krupansky (JIRA)

Jack Krupansky commented on SOLR-3442:

When I initially read this issue I mistakenly read it as edismax rather than dismax. So I would ask that the intent be made crystal clear: is it reasonable to switch the default query parser to edismax, or is the suggestion that the more limited dismax query parser become the new default? If the latter, we won't even be able to query specific fields without config changes.
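
As a rough sketch of the difference, the Python below builds the same request for each parser. The host URL and the field names `name` and `features` are assumptions for illustration only; `q`, `defType` and `qf` are standard Solr request parameters. As I understand it, edismax parses `name:ipod` as a query against the `name` field, while dismax does not honor the fielded syntax and treats the input as plain text to match against the `qf` fields.

```python
from urllib.parse import urlencode

# Assumed for illustration: a local Solr instance and these two field names.
SOLR_SELECT = "http://localhost:8983/solr/select"
SEARCH_FIELDS = ("name", "features")


def build_query_url(user_query, def_type, qf=SEARCH_FIELDS):
    """Return a select URL that searches several fields through qf."""
    params = {
        "q": user_query,
        "defType": def_type,  # "dismax" or "edismax"
        "qf": " ".join(qf),   # fields searched when the query names none
    }
    return SOLR_SELECT + "?" + urlencode(params)


# Same user input, two parsers: only edismax interprets the "name:" prefix.
edismax_url = build_query_url("name:ipod", "edismax")
dismax_url = build_query_url("name:ipod", "dismax")
```

The two URLs differ only in `defType`, which is why the choice of default parser decides whether field-specific queries work out of the box.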

Some of the discussion over on SOLR-2368 might be relevant, as to whether the default query for the example should be severely "locked down" as opposed to highly functional (fields, Lucene syntax, etc.).

I was going to proceed with an edismax-based patch, but now I am not so sure.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira





-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3442) Example schema switch to DisMax instead of CopyField

2012-05-07 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269794#comment-13269794
 ] 

Jack Krupansky commented on SOLR-3442:
--

bq. The lucene query parser generally shouldn't be used for user queries...

If that is the general sentiment, then having the default example *user* query 
parser be edismax makes perfect sense.


> Example schema switch to DisMax instead of CopyField
> 
>
> Key: SOLR-3442
> URL: https://issues.apache.org/jira/browse/SOLR-3442
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Reporter: Jan Høydahl
>  Labels: dismax
>
> Spinoff from SOLR-3439:
> The use of copyField in today's example schema is an anti-pattern since we 
> indirectly teach people to duplicate most of their content, while most would 
> be better off using DisMax, or at least a combination.




[jira] [Commented] (SOLR-3442) Example schema switch to DisMax instead of CopyField

2012-05-07 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269710#comment-13269710
 ] 

Yonik Seeley commented on SOLR-3442:


bq. I would cringe a little if we change the schema so that it doesn't work 
very well if the user does drop back to the lucene query parser 

The lucene query parser generally shouldn't be used for user queries, only 
programmatically generated ones.  Using explicit fieldnames (or specifying df) 
for that case should be fine.
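
A minimal sketch of that programmatic case, in Python. The helper name `lucene_params` and the field `name` are hypothetical; `q`, `defType` and `df` are standard Solr request parameters. Terms are assumed to be already escaped for the lucene syntax.

```python
def lucene_params(terms, field=None, df=None):
    """Build request parameters for a machine-generated lucene-parser query.

    Either prefix every term with an explicit field name, or leave the
    terms bare and name the default field through df.
    """
    if field is not None:
        clauses = [f"{field}:{term}" for term in terms]
    else:
        clauses = list(terms)
    params = {"q": " AND ".join(clauses), "defType": "lucene"}
    if df is not None:
        params["df"] = df
    return params


# Explicit field names in the query string itself.
explicit = lucene_params(["ipod", "nano"], field="name")

# Bare terms, with the default field supplied per request.
with_df = lucene_params(["ipod", "nano"], df="name")
```

Either way the generated query does not depend on a catch-all default field being present in the schema.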




[jira] [Commented] (SOLR-3442) Example schema switch to DisMax instead of CopyField

2012-05-07 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269684#comment-13269684
 ] 

Jack Krupansky commented on SOLR-3442:
--

I don't disagree with the gist of your argument, but I would cringe a little if 
we change the schema so that it doesn't work very well if the user does drop 
back to the lucene query parser with &defType=lucene which has only a single 
default field.

OTOH, maybe that is simply the cost of making the example schema (and config) 
be more representative of "best practices". But, that sort of implies that the 
Lucene query parser is not a "best practice", at least when searchable text 
content is spread over multiple fields.





[jira] [Commented] (SOLR-3442) Example schema switch to DisMax instead of CopyField

2012-05-07 Thread Jan Høydahl (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269603#comment-13269603
 ] 

Jan Høydahl commented on SOLR-3442:
---

I'm not saying anything is "dead". Both the "lucene" query parser and copyField 
have their purpose and are supported, and you can mix and match them with DisMax 
to fit your needs. But for the example we should select the most useful and 
flexible way to show indexing and search, and that is no longer a "text" 
catch-all fed by copyField. Aside from doubling the size of your index, it is 
inflexible in that adding or removing a field from search requires a schema 
update and re-indexing. Catch-all fields with copyField can sometimes be used 
as a performance optimization, but that is not where you start.
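
To illustrate the flexibility point with a sketch: with DisMax the set of searched fields lives in the `qf` request parameter, so it can change per request without touching the schema. The helper `qf_param` and the fields and boosts below are hypothetical; the `field^boost` form is the syntax `qf` accepts.

```python
def qf_param(boosts):
    """Render a {field: boost} mapping as a qf value, e.g. 'name^2.0 features^1.0'."""
    return " ".join(f"{field}^{boost}" for field, boost in boosts.items())


# Hypothetical fields and boosts, chosen only for illustration.
searched = {"name": 2.0, "features": 1.0, "cat": 0.5}
before = qf_param(searched)

# Dropping a field from search is a request-time change:
# no schema edit and no re-index, unlike removing a copyField source.
searched.pop("cat")
after = qf_param(searched)
```

With a copyField catch-all, the equivalent change would mean editing the schema and re-indexing so the catch-all field no longer contains the dropped content.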

Maintaining many examples has proven not to be a very good strategy. Look at the 
multi-core and DIH examples: they lag several versions behind when it comes to 
schema version and new solrconfig syntax. Instead, a single schema which can 
do both the product search and document search use cases well is easy to 
achieve. The Velocity GUI can be extended with two tabs if need be, one 
"products" tab and one "documents" tab. If we choose the example documents to 
index wisely, e.g. user guides for the products, we get a nice 
connection: you can search for "ipod" and see both products and user guides 
matching your search.




[jira] [Commented] (SOLR-3442) Example schema switch to DisMax instead of CopyField

2012-05-07 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269584#comment-13269584
 ] 

Jack Krupansky commented on SOLR-3442:
--

Maybe Solr has outgrown the concept of a single example schema/config. "Full 
function" and "maximal performance" conflict to some degree, and picking one 
arbitrary point on the design spectrum does a disservice to those who have 
varying requirements. The current example already has performance tips and a 
warning not to use it for benchmarking. And handling SolrCell documents, which 
share "core" common metadata, is somewhat distinct from fully custom schema design.

The copyField to "text" pattern is more clearly targeted at non-dismax users, 
where "text" is the single default search field.

This issue essentially raises the question: Is non-dismax query parsing dead? 
If not, the copyField/text pattern still seems relevant.

Maybe it would be worth having a modest library of schema/config files that the 
user can select from when running "example". OTOH, maintaining a lot of 
somewhat similar files can be a pain. A way to configure the schema/config 
files (conditionals) would be helpful.





[jira] [Commented] (SOLR-3442) Example schema switch to DisMax instead of CopyField

2012-05-07 Thread Jan Høydahl (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269509#comment-13269509
 ] 

Jan Høydahl commented on SOLR-3442:
---

Sure, I've seen it successfully used too, and I use it myself now and then to 
reduce the number of fields required in "qf".

For very small indexes without much need for tuning analysis or relevancy it 
does not matter very much. But I'm arguing that copyField is the legacy way of 
searching multiple fields in one go, while DisMax is the current 
recommendation. So why stick to the legacy in the default example?




[jira] [Commented] (SOLR-3442) Example schema switch to DisMax instead of CopyField

2012-05-07 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269489#comment-13269489
 ] 

Chris Male commented on SOLR-3442:
--

I think it's a pretty bold claim to call it an anti-pattern.  I've seen it 
successfully used in many projects and it continues to fulfill user needs.
