[jira] [Commented] (SOLR-2338) improved per-field similarity integration into schema.xml
[ https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024811#comment-13024811 ] Fuad Efendi commented on SOLR-2338: --- test-files/solr/conf/schema.xml contains sample of per-field definitions; example/solr/schema.xml doesn't have it yet improved per-field similarity integration into schema.xml - Key: SOLR-2338 URL: https://issues.apache.org/jira/browse/SOLR-2338 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 4.0 Reporter: Robert Muir Assignee: Robert Muir Fix For: 4.0 Attachments: SOLR-2338.patch, SOLR-2338.patch, SOLR-2338.patch Currently since LUCENE-2236, we can enable Similarity per-field, but in schema.xml there is only a 'global' factory for the SimilarityProvider. In my opinion this is too low-level because to customize Similarity on a per-field basis, you have to set your own CustomSimilarityProvider with similarity class=.../ and manage the per-field mapping yourself in java code. Instead I think it would be better if you just specify the Similarity in the FieldType, like after analyzer. As far as the example, one idea from LUCENE-1360 was to make a short_text or metadata_text used by the various metadata fields in the example that has better norm quantization for its shortness... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2338) improved per-field similarity integration into schema.xml
[ https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013618#comment-13013618 ] Hoss Man commented on SOLR-2338: i was confused by some of roberts comments, and clarified them with him on IRC. summary (from my perspective) * global default similarity(factory) (using existing {{similarity/}} tag) is a good idea as a fall back for fieldTypes that don't define custom similarity * {{similarity/}} should probably not be advertised in the example configs .. but maybe, depends * SimilarityProvider should use a distinct config tag ({{similarityProvider/}} because it really is distinct, and people should (in theory) be able to use both) * SolrSimilarityProvider's get(field) method (which i didn't realize was final, hence part of my confusion) should be changed to use the {{similarity/}} as a default if it was specified. * SolrSimilarityProvider's get(field) method really needs to stay final, and should have docs explain why (consistency with schema) * SimilarityProviderFactory.init can be changed to using NamedList, but the docs should warn people about the possibility of performance penalties for using it directly in their SolrSimilarityProvider improved per-field similarity integration into schema.xml - Key: SOLR-2338 URL: https://issues.apache.org/jira/browse/SOLR-2338 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 4.0 Reporter: Robert Muir Assignee: Robert Muir Attachments: SOLR-2338.patch, SOLR-2338.patch Currently since LUCENE-2236, we can enable Similarity per-field, but in schema.xml there is only a 'global' factory for the SimilarityProvider. In my opinion this is too low-level because to customize Similarity on a per-field basis, you have to set your own CustomSimilarityProvider with similarity class=.../ and manage the per-field mapping yourself in java code. Instead I think it would be better if you just specify the Similarity in the FieldType, like after analyzer. As far as the example, one idea from LUCENE-1360 was to make a short_text or metadata_text used by the various metadata fields in the example that has better norm quantization for its shortness... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2338) improved per-field similarity integration into schema.xml
[ https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013626#comment-13013626 ] Yonik Seeley commented on SOLR-2338: bq. similarity/ should probably not be advertised in the example configs .. Do you mean the expert-level SimilarityProvider, which is typically only needed to change coord() or queryNorm()? If so, I'd agree - it's so expert level that most people shouldn't worry about it, hence it can simply be documented elsewhere. improved per-field similarity integration into schema.xml - Key: SOLR-2338 URL: https://issues.apache.org/jira/browse/SOLR-2338 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 4.0 Reporter: Robert Muir Assignee: Robert Muir Attachments: SOLR-2338.patch, SOLR-2338.patch Currently since LUCENE-2236, we can enable Similarity per-field, but in schema.xml there is only a 'global' factory for the SimilarityProvider. In my opinion this is too low-level because to customize Similarity on a per-field basis, you have to set your own CustomSimilarityProvider with similarity class=.../ and manage the per-field mapping yourself in java code. Instead I think it would be better if you just specify the Similarity in the FieldType, like after analyzer. As far as the example, one idea from LUCENE-1360 was to make a short_text or metadata_text used by the various metadata fields in the example that has better norm quantization for its shortness... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2338) improved per-field similarity integration into schema.xml
[ https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013633#comment-13013633 ] Hoss Man commented on SOLR-2338: bq. Do you mean the expert-level SimilarityProvider, no i ment the existing global default {{similarity/}} mentioned in the previous bullet (but i was definitley vague -- sorry). we probably shouldn't go out of our way to advertise the global option anymore, we should encourage people to use the fieldType specific similarity instead. whether we should promote customizing the SimilarityProvider in the example .. i dunno, i can see it either way. but it seems like the order of importance for visibility is probably something like: * customize per fieldType similarity (new hotness) * customize similarityprovider (not something you are likely to need to do, but if you do there's really only one way so maybe we should advertise it's an option) * customize global default similarity (odds are you really don't need this, you can probably do it using per fieldType similarity) improved per-field similarity integration into schema.xml - Key: SOLR-2338 URL: https://issues.apache.org/jira/browse/SOLR-2338 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 4.0 Reporter: Robert Muir Assignee: Robert Muir Attachments: SOLR-2338.patch, SOLR-2338.patch Currently since LUCENE-2236, we can enable Similarity per-field, but in schema.xml there is only a 'global' factory for the SimilarityProvider. In my opinion this is too low-level because to customize Similarity on a per-field basis, you have to set your own CustomSimilarityProvider with similarity class=.../ and manage the per-field mapping yourself in java code. Instead I think it would be better if you just specify the Similarity in the FieldType, like after analyzer. As far as the example, one idea from LUCENE-1360 was to make a short_text or metadata_text used by the various metadata fields in the example that has better norm quantization for its shortness... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2338) improved per-field similarity integration into schema.xml
[ https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013652#comment-13013652 ] Robert Muir commented on SOLR-2338: --- I agree with your order, i think to keep the example simple we should just have a commented out example for a fieldType that customizes its similarity? I think we can support customizing the similarityprovider (in case you want to change how coord is calculated), and support changing the default-unless-otherwise-specified-in-the-schema similarity (for easier backwards transition), but I'm thinking it would be actually more confusing than helpful to advertise these in the example. improved per-field similarity integration into schema.xml - Key: SOLR-2338 URL: https://issues.apache.org/jira/browse/SOLR-2338 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 4.0 Reporter: Robert Muir Assignee: Robert Muir Attachments: SOLR-2338.patch, SOLR-2338.patch Currently since LUCENE-2236, we can enable Similarity per-field, but in schema.xml there is only a 'global' factory for the SimilarityProvider. In my opinion this is too low-level because to customize Similarity on a per-field basis, you have to set your own CustomSimilarityProvider with similarity class=.../ and manage the per-field mapping yourself in java code. Instead I think it would be better if you just specify the Similarity in the FieldType, like after analyzer. As far as the example, one idea from LUCENE-1360 was to make a short_text or metadata_text used by the various metadata fields in the example that has better norm quantization for its shortness... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2338) improved per-field similarity integration into schema.xml
[ https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013686#comment-13013686 ] Yonik Seeley commented on SOLR-2338: I dunno - it seems much more likely that someone would want to easily change the default sim for all fields rather than change coord. improved per-field similarity integration into schema.xml - Key: SOLR-2338 URL: https://issues.apache.org/jira/browse/SOLR-2338 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 4.0 Reporter: Robert Muir Assignee: Robert Muir Attachments: SOLR-2338.patch, SOLR-2338.patch Currently since LUCENE-2236, we can enable Similarity per-field, but in schema.xml there is only a 'global' factory for the SimilarityProvider. In my opinion this is too low-level because to customize Similarity on a per-field basis, you have to set your own CustomSimilarityProvider with similarity class=.../ and manage the per-field mapping yourself in java code. Instead I think it would be better if you just specify the Similarity in the FieldType, like after analyzer. As far as the example, one idea from LUCENE-1360 was to make a short_text or metadata_text used by the various metadata fields in the example that has better norm quantization for its shortness... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2338) improved per-field similarity integration into schema.xml
[ https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13012538#comment-13012538 ] Robert Muir commented on SOLR-2338: --- I'll commit this in a few days unless anyone objects. improved per-field similarity integration into schema.xml - Key: SOLR-2338 URL: https://issues.apache.org/jira/browse/SOLR-2338 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 4.0 Reporter: Robert Muir Attachments: SOLR-2338.patch, SOLR-2338.patch Currently since LUCENE-2236, we can enable Similarity per-field, but in schema.xml there is only a 'global' factory for the SimilarityProvider. In my opinion this is too low-level because to customize Similarity on a per-field basis, you have to set your own CustomSimilarityProvider with similarity class=.../ and manage the per-field mapping yourself in java code. Instead I think it would be better if you just specify the Similarity in the FieldType, like after analyzer. As far as the example, one idea from LUCENE-1360 was to make a short_text or metadata_text used by the various metadata fields in the example that has better norm quantization for its shortness... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2338) improved per-field similarity integration into schema.xml
[ https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13012787#comment-13012787 ] Hoss Man commented on SOLR-2338: skimming the patch, the one thing i'm not clear on is what happens when someone who has been using a custom similarity (or similarity factory) in Solr 1.4 or 3.1 will be affected on upgrading. the patch seems to remove the code that allows for a (global) {{similarity/}} element in schema.xml (replacing it with a check for {{similarityProvider/}} i'm not clear on whether there is really a compelling reason for this (if there is we should have a nice fat warning in the upgrading section of CHANGES.txt) or if we could still continue to respect the {{similarity/}} syntax ... it seems like that one tag could be used to refrence (by classname) a global SimilarityProviderFactory, or a default Similarity instance or default. even if there's a really good reason not to keep using what we might find in {{similarity/}}, we should check for it and log a nice fat error message saying it's being ignored. improved per-field similarity integration into schema.xml - Key: SOLR-2338 URL: https://issues.apache.org/jira/browse/SOLR-2338 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 4.0 Reporter: Robert Muir Assignee: Robert Muir Attachments: SOLR-2338.patch, SOLR-2338.patch Currently since LUCENE-2236, we can enable Similarity per-field, but in schema.xml there is only a 'global' factory for the SimilarityProvider. In my opinion this is too low-level because to customize Similarity on a per-field basis, you have to set your own CustomSimilarityProvider with similarity class=.../ and manage the per-field mapping yourself in java code. Instead I think it would be better if you just specify the Similarity in the FieldType, like after analyzer. As far as the example, one idea from LUCENE-1360 was to make a short_text or metadata_text used by the various metadata fields in the example that has better norm quantization for its shortness... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2338) improved per-field similarity integration into schema.xml
[ https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13012792#comment-13012792 ] Robert Muir commented on SOLR-2338: --- Hoss, thanks for looking at the patch. As far as the current form, Similarity is purely per-field, dictated by the schema (like Analyzers). For simplicity i removed the global one completely. If we want, the following options are available: * (global) similarity/ configures the default, which is used unless overridden by the fieldtype. * (global) similarity/ triggers an error. The only reason I yanked it in the patch, was that I felt it could be confusing to have this inheritance so to speak... but if you think its not confusing, we could support the global similarity/, which would always be used unless you supply a different similarity/ for a specific field. improved per-field similarity integration into schema.xml - Key: SOLR-2338 URL: https://issues.apache.org/jira/browse/SOLR-2338 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 4.0 Reporter: Robert Muir Assignee: Robert Muir Attachments: SOLR-2338.patch, SOLR-2338.patch Currently since LUCENE-2236, we can enable Similarity per-field, but in schema.xml there is only a 'global' factory for the SimilarityProvider. In my opinion this is too low-level because to customize Similarity on a per-field basis, you have to set your own CustomSimilarityProvider with similarity class=.../ and manage the per-field mapping yourself in java code. Instead I think it would be better if you just specify the Similarity in the FieldType, like after analyzer. As far as the example, one idea from LUCENE-1360 was to make a short_text or metadata_text used by the various metadata fields in the example that has better norm quantization for its shortness... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2338) improved per-field similarity integration into schema.xml
[ https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13012802#comment-13012802 ] Hoss Man commented on SOLR-2338: bq. but if you think its not confusing, we could support the global similarity/, which would always be used unless you supply a different similarity/ for a specific field. i don't personally think it would be confusing, but i also don't think we need to advertise it in the example. we should definitely encourage using similarity per field type, but for people who have used it in the past, having it continue to work as a global default when fieldTypes don't define a similarity gives us nice back-compatibility. More generally though, i'm thinking that the same {{similarity/}} tag can be used for both the old style (global default) Similarity/SimilarityFactory and the new SimilarityProviderFactory using instanceof checks... * instanceOf SimilarityProviderFactory ** instantiate it and use it. * instanceOf SimilarityFactory ** instantiate and wrap it in a SolrSimilarityProvider that delegates to it when the field type has no similarity set on it. return an anonymous SimilarityProviderFactory that uses this SolrSimilarityProvider * instanceOf Similarity ** instantiate and wrap it in a SolrSimilarityProvider that delegates to it when the field type has no similarity set on it. return an anonymous SimilarityProviderFactory that uses this SolrSimilarityProvider ...that way there is only one global option that can be specified, and we don't have to deal with weird edge cases of what the default should be for a fieldTYpe w/o a similarity if the schema.xml specifies both {{similarity/}} and {{similarityProvider/}} The one other thing i just noticed is that you have SimilarityProviderFactory.init(SolrParams) ... my vote would be to start using NamedList based initialization for all new types of solr plugins. it requires more verbosity in the config, but it supports a lot more types of information (multivalued keys, nested lists/maps, etc...) and could eventually lead us to actually being able to validate our config files using an XMLSchema and/or DTD (since the element/node names are finite) improved per-field similarity integration into schema.xml - Key: SOLR-2338 URL: https://issues.apache.org/jira/browse/SOLR-2338 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 4.0 Reporter: Robert Muir Assignee: Robert Muir Attachments: SOLR-2338.patch, SOLR-2338.patch Currently since LUCENE-2236, we can enable Similarity per-field, but in schema.xml there is only a 'global' factory for the SimilarityProvider. In my opinion this is too low-level because to customize Similarity on a per-field basis, you have to set your own CustomSimilarityProvider with similarity class=.../ and manage the per-field mapping yourself in java code. Instead I think it would be better if you just specify the Similarity in the FieldType, like after analyzer. As far as the example, one idea from LUCENE-1360 was to make a short_text or metadata_text used by the various metadata fields in the example that has better norm quantization for its shortness... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2338) improved per-field similarity integration into schema.xml
[ https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13012809#comment-13012809 ] Robert Muir commented on SOLR-2338: --- {quote} i don't personally think it would be confusing, but i also don't think we need to advertise it in the example. we should definitely encourage using similarity per field type, but for people who have used it in the past, having it continue to work as a global default when fieldTypes don't define a similarity gives us nice back-compatibility. {quote} I agree here, this is a good compromise and by not advertising it in the example, I won't have concerns about the example being confusing. {quote} More generally though, i'm thinking that the same similarity/ tag can be used for both the old style (global default) Similarity/SimilarityFactory and the new SimilarityProviderFactory using instanceof checks... {quote} I have to disagree on this one. The new SimilarityProvider serves a totally different purpose, its not a global sim: it answers to requests for sims for specific fields. The only reason I provided a factory for it, is so that users can tune the parts of lucene's relevance ranking system that are not per-field: coord() and queryNorm(). But its not a way to configure tf() or idf() or anything like that. In the patch I added this with expert to the example, though we could remove it from the example entirely if its too expert (might be?) So I think we should do as you suggest and allow a global similarity/ that is the default term weighting unless otherwise specified by a field, but we shouldn't confuse this with the parts that arent field-specific... {quote} The one other thing i just noticed is that you have SimilarityProviderFactory.init(SolrParams) {quote} I configured it this way, because this is how similarity/ worked before (and it was just enough XML to not scare me away). Is it possible we can defer this improvement to a later issue? I think we should give this a little more thought, for example if we do this its a break in the API for similarityFactory/, which this patch does not actually do: it only MOVES it to the fieldType. improved per-field similarity integration into schema.xml - Key: SOLR-2338 URL: https://issues.apache.org/jira/browse/SOLR-2338 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 4.0 Reporter: Robert Muir Assignee: Robert Muir Attachments: SOLR-2338.patch, SOLR-2338.patch Currently since LUCENE-2236, we can enable Similarity per-field, but in schema.xml there is only a 'global' factory for the SimilarityProvider. In my opinion this is too low-level because to customize Similarity on a per-field basis, you have to set your own CustomSimilarityProvider with similarity class=.../ and manage the per-field mapping yourself in java code. Instead I think it would be better if you just specify the Similarity in the FieldType, like after analyzer. As far as the example, one idea from LUCENE-1360 was to make a short_text or metadata_text used by the various metadata fields in the example that has better norm quantization for its shortness... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2338) improved per-field similarity integration into schema.xml
[ https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13012812#comment-13012812 ] Robert Muir commented on SOLR-2338: --- By the way on the last part, its probably hard to see in the patch: this is because SimilarityFactory is currently (temporarily) backwards broken versus say, Solr 1.4, because I didn't want to take away any Solr capabilities until we resolved this issue... so in trunk right now it returns SimilarityProvider (which i know makes looking at the patch confusing). improved per-field similarity integration into schema.xml - Key: SOLR-2338 URL: https://issues.apache.org/jira/browse/SOLR-2338 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 4.0 Reporter: Robert Muir Assignee: Robert Muir Attachments: SOLR-2338.patch, SOLR-2338.patch Currently since LUCENE-2236, we can enable Similarity per-field, but in schema.xml there is only a 'global' factory for the SimilarityProvider. In my opinion this is too low-level because to customize Similarity on a per-field basis, you have to set your own CustomSimilarityProvider with similarity class=.../ and manage the per-field mapping yourself in java code. Instead I think it would be better if you just specify the Similarity in the FieldType, like after analyzer. As far as the example, one idea from LUCENE-1360 was to make a short_text or metadata_text used by the various metadata fields in the example that has better norm quantization for its shortness... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2338) improved per-field similarity integration into schema.xml
[ https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13012823#comment-13012823 ] Robert Muir commented on SOLR-2338: --- I see, as far as using NamedList for the new SimilarityProviderFactory, that would be easy. However, at the moment my vote is against this, at least until SOLR-2292 is completed. At the moment NamedList contains methods such as get(String) which have a slow linear runtime, and just out of paranoia I think we should keep NamedList as far away from the scoring system as possible. improved per-field similarity integration into schema.xml - Key: SOLR-2338 URL: https://issues.apache.org/jira/browse/SOLR-2338 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 4.0 Reporter: Robert Muir Assignee: Robert Muir Attachments: SOLR-2338.patch, SOLR-2338.patch Currently since LUCENE-2236, we can enable Similarity per-field, but in schema.xml there is only a 'global' factory for the SimilarityProvider. In my opinion this is too low-level because to customize Similarity on a per-field basis, you have to set your own CustomSimilarityProvider with similarity class=.../ and manage the per-field mapping yourself in java code. Instead I think it would be better if you just specify the Similarity in the FieldType, like after analyzer. As far as the example, one idea from LUCENE-1360 was to make a short_text or metadata_text used by the various metadata fields in the example that has better norm quantization for its shortness... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2338) improved per-field similarity integration into schema.xml
[ https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12992614#comment-12992614 ] Yonik Seeley commented on SOLR-2338: Yep, sounds like a great idea! Should we specify the similarity class in each fieldType that want's to use a non-default similarity: {code} fieldType analyzer.../analyzer similarity class=.../similarity /fieldType {code} Or use named similarities and refer to them: {code} fieldType analyzer.../analyzer similarity name=short_text/ /fieldType similarity name=short_text class=.../similarity {code} improved per-field similarity integration into schema.xml - Key: SOLR-2338 URL: https://issues.apache.org/jira/browse/SOLR-2338 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 4.0 Reporter: Robert Muir Currently since LUCENE-2236, we can enable Similarity per-field, but in schema.xml there is only a 'global' factory for the SimilarityProvider. In my opinion this is too low-level because to customize Similarity on a per-field basis, you have to set your own CustomSimilarityProvider with similarity class=.../ and manage the per-field mapping yourself in java code. Instead I think it would be better if you just specify the Similarity in the FieldType, like after analyzer. As far as the example, one idea from LUCENE-1360 was to make a short_text or metadata_text used by the various metadata fields in the example that has better norm quantization for its shortness... -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2338) improved per-field similarity integration into schema.xml
[ https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12992670#comment-12992670 ] Robert Muir commented on SOLR-2338: --- doesn't matter to me really, but what is the advantage of the named similarities? this would be a bit inconsistent from how you configure analyzers (and an additional level of indirection that might be confusing)... or am I missing something? improved per-field similarity integration into schema.xml - Key: SOLR-2338 URL: https://issues.apache.org/jira/browse/SOLR-2338 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 4.0 Reporter: Robert Muir Currently since LUCENE-2236, we can enable Similarity per-field, but in schema.xml there is only a 'global' factory for the SimilarityProvider. In my opinion this is too low-level because to customize Similarity on a per-field basis, you have to set your own CustomSimilarityProvider with similarity class=.../ and manage the per-field mapping yourself in java code. Instead I think it would be better if you just specify the Similarity in the FieldType, like after analyzer. As far as the example, one idea from LUCENE-1360 was to make a short_text or metadata_text used by the various metadata fields in the example that has better norm quantization for its shortness... -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2338) improved per-field similarity integration into schema.xml
[ https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12992676#comment-12992676 ] Yonik Seeley commented on SOLR-2338: Other components in solrconfig use that indirection, but I'm fine w/ the approach taken by tokenizer / token filter config. improved per-field similarity integration into schema.xml - Key: SOLR-2338 URL: https://issues.apache.org/jira/browse/SOLR-2338 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 4.0 Reporter: Robert Muir Currently since LUCENE-2236, we can enable Similarity per-field, but in schema.xml there is only a 'global' factory for the SimilarityProvider. In my opinion this is too low-level because to customize Similarity on a per-field basis, you have to set your own CustomSimilarityProvider with similarity class=.../ and manage the per-field mapping yourself in java code. Instead I think it would be better if you just specify the Similarity in the FieldType, like after analyzer. As far as the example, one idea from LUCENE-1360 was to make a short_text or metadata_text used by the various metadata fields in the example that has better norm quantization for its shortness... -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2338) improved per-field similarity integration into schema.xml
[ https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12992728#comment-12992728 ] Hoss Man commented on SOLR-2338: Most existing situations where plugins are dereferenced by name are so we can reuse the exact same object instance (ie: for recording stats, or because they are heavyweight to construct on the fly) in the case of similarity, the main advantage i can think of would be if we wanted true per-field similiarity declaration, not just per field type ie... {code} similarity name=S_XX class=.../similarity similarity name=S_YY class=.../similarity ... fieldType name=FT_AA analyzer.../analyzer similarity name=S_XX/ /fieldType ... field name=F_111 type=FT_AA /!-- implied S_XX -- field name=F_222 type=FT_AA similarity=S_YY / {code} ...but even if we don't do that, i suppose it's also conceivable that someone might have their own Similarity implementation that is expensive to instantiate (ie: maintains some big in memory data structures?) and might want to be able to declare one instance and then refer to it by name in many different fieldType declarations. I think for now just supporting the first example yonik cited... {code} fieldType analyzer.../analyzer similarity class=.../similarity /fieldType {code} would be a huge win, and we can always enhance to add name derefrencing later. improved per-field similarity integration into schema.xml - Key: SOLR-2338 URL: https://issues.apache.org/jira/browse/SOLR-2338 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 4.0 Reporter: Robert Muir Currently since LUCENE-2236, we can enable Similarity per-field, but in schema.xml there is only a 'global' factory for the SimilarityProvider. In my opinion this is too low-level because to customize Similarity on a per-field basis, you have to set your own CustomSimilarityProvider with similarity class=.../ and manage the per-field mapping yourself in java code. Instead I think it would be better if you just specify the Similarity in the FieldType, like after analyzer. As far as the example, one idea from LUCENE-1360 was to make a short_text or metadata_text used by the various metadata fields in the example that has better norm quantization for its shortness... -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2338) improved per-field similarity integration into schema.xml
[ https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12992735#comment-12992735 ] Robert Muir commented on SOLR-2338: --- {quote} ...but even if we don't do that, i suppose it's also conceivable that someone might have their own Similarity implementation that is expensive to instantiate (ie: maintains some big in memory data structures?) and might want to be able to declare one instance and then refer to it by name in many different fieldType declarations. {quote} I don't think this is really a use case we need to support: the purpose of Similarity today is to do term weighting, not to be a huge data-structure holder. While I know Mike's original patch went this way with LUCENE-2392 (e.g. norms), I'm not sure i like it being in Similarity in the future either. Otherwise concepts like lazy-loading norms and all this other stuff get pushed onto the sim, which is an awkward place (imagine if you have many fields). So, I think we shouldn't really design for abuses of the API. If there are other use cases for named similarity that have to do with term weighting, I'm interested. improved per-field similarity integration into schema.xml - Key: SOLR-2338 URL: https://issues.apache.org/jira/browse/SOLR-2338 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 4.0 Reporter: Robert Muir Currently since LUCENE-2236, we can enable Similarity per-field, but in schema.xml there is only a 'global' factory for the SimilarityProvider. In my opinion this is too low-level because to customize Similarity on a per-field basis, you have to set your own CustomSimilarityProvider with similarity class=.../ and manage the per-field mapping yourself in java code. Instead I think it would be better if you just specify the Similarity in the FieldType, like after analyzer. As far as the example, one idea from LUCENE-1360 was to make a short_text or metadata_text used by the various metadata fields in the example that has better norm quantization for its shortness... -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org