[jira] [Commented] (SOLR-2338) improved per-field similarity integration into schema.xml

2011-04-25 Thread Fuad Efendi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024811#comment-13024811
 ] 

Fuad Efendi commented on SOLR-2338:
---

test-files/solr/conf/schema.xml contains a sample of per-field definitions;
example/solr/schema.xml doesn't have one yet.

 improved per-field similarity integration into schema.xml
 -

 Key: SOLR-2338
 URL: https://issues.apache.org/jira/browse/SOLR-2338
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.0
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.0

 Attachments: SOLR-2338.patch, SOLR-2338.patch, SOLR-2338.patch


 Currently, since LUCENE-2236, we can enable Similarity per-field, but in
 schema.xml there is only a 'global' factory for the SimilarityProvider.
 In my opinion this is too low-level, because to customize Similarity on a
 per-field basis you have to set your own CustomSimilarityProvider with
 <similarity class=.../> and manage the per-field mapping yourself in Java code.
 Instead I think it would be better if you could just specify the Similarity in
 the FieldType, like after analyzer.
 As far as the example, one idea from LUCENE-1360 was to make a short_text
 or metadata_text used by the various metadata fields in the example that has
 better norm quantization for its shortness...
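
 To make the complaint concrete, here is a minimal sketch of what "managing the per-field mapping yourself" can look like. All type and method names (Sim, HandMappedSimilarityProvider, register) are hypothetical stand-ins invented for illustration; they are not the Lucene or Solr classes discussed in this issue.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a scoring model; not the Lucene Similarity class.
interface Sim {
    String name();
}

// Sketch of the hand-maintained approach: the field-to-similarity mapping
// lives in Java code instead of in schema.xml.
class HandMappedSimilarityProvider {
    private final Map<String, Sim> perField = new HashMap<>();
    private final Sim fallback;

    HandMappedSimilarityProvider(Sim fallback) {
        this.fallback = fallback;
    }

    // Every field that needs special treatment has to be registered in code.
    void register(String fieldName, Sim sim) {
        perField.put(fieldName, sim);
    }

    // Returns the registered similarity, or the fallback for unknown fields.
    Sim get(String fieldName) {
        return perField.getOrDefault(fieldName, fallback);
    }
}

class HandMappedDemo {
    public static void main(String[] args) {
        HandMappedSimilarityProvider provider =
            new HandMappedSimilarityProvider(() -> "default");
        provider.register("title", () -> "short_text");
        System.out.println(provider.get("title").name()); // prints "short_text"
        System.out.println(provider.get("body").name());  // prints "default"
    }
}
```

 Declaring the similarity on the fieldType would move the register(...) calls out of Java and into the schema, which is the change the issue asks for.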

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2338) improved per-field similarity integration into schema.xml

2011-03-30 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013618#comment-13013618
 ] 

Hoss Man commented on SOLR-2338:


i was confused by some of robert's comments, and clarified them with him on IRC.

summary (from my perspective)

* global default similarity(factory) (using existing {{<similarity/>}} tag)
is a good idea as a fallback for fieldTypes that don't define a custom similarity
* {{<similarity/>}} should probably not be advertised in the example configs ..
but maybe, depends
* SimilarityProvider should use a distinct config tag
({{<similarityProvider/>}}) because it really is distinct, and people should (in
theory) be able to use both
* SolrSimilarityProvider's get(field) method (which i didn't realize was final,
hence part of my confusion) should be changed to use the {{<similarity/>}} as a
default if it was specified.
* SolrSimilarityProvider's get(field) method really needs to stay final, and
should have docs explaining why (consistency with schema)
* SimilarityProviderFactory.init can be changed to use NamedList, but the
docs should warn people about the possibility of performance penalties for
using it directly in their SolrSimilarityProvider
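
The get(field) points in the summary above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch: FieldSim and SchemaBackedProvider are hypothetical stand-ins, the map stands in for the similarities declared on fieldTypes in the schema, and the coord formula is only a placeholder for "something that is not per-field".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a per-field scoring model; not a Solr/Lucene class.
interface FieldSim {
    String describe();
}

class SchemaBackedProvider {
    // field name -> similarity declared on that field's fieldType (assumed input)
    private final Map<String, FieldSim> fieldTypeSims;
    // the global default, used when the fieldType declares nothing
    private final FieldSim globalDefault;

    SchemaBackedProvider(Map<String, FieldSim> fieldTypeSims, FieldSim globalDefault) {
        this.fieldTypeSims = new HashMap<>(fieldTypeSims);
        this.globalDefault = globalDefault;
    }

    // final: subclasses cannot return something that disagrees with the schema.
    public final FieldSim get(String field) {
        FieldSim declared = fieldTypeSims.get(field);
        return declared != null ? declared : globalDefault;
    }

    // A non-per-field hook that a subclass may still override.
    public float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }
}

class SchemaBackedDemo {
    public static void main(String[] args) {
        Map<String, FieldSim> declared = new HashMap<>();
        declared.put("title", () -> "short_text");
        SchemaBackedProvider provider = new SchemaBackedProvider(declared, () -> "global");
        System.out.println(provider.get("title").describe());
        System.out.println(provider.get("body").describe());
    }
}
```

Keeping get(field) final while leaving coord overridable mirrors the split described in the bullets: per-field behaviour comes from the schema, and only the global parts are open to customization.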





[jira] [Commented] (SOLR-2338) improved per-field similarity integration into schema.xml

2011-03-30 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013626#comment-13013626
 ] 

Yonik Seeley commented on SOLR-2338:



bq. <similarity/> should probably not be advertised in the example configs ..

Do you mean the expert-level SimilarityProvider, which is typically only needed 
to change coord() or queryNorm()?  If so, I'd agree - it's so expert level that 
most people shouldn't worry about it, hence it can simply be documented 
elsewhere.




[jira] [Commented] (SOLR-2338) improved per-field similarity integration into schema.xml

2011-03-30 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013633#comment-13013633
 ] 

Hoss Man commented on SOLR-2338:


bq. Do you mean the expert-level SimilarityProvider, 

no, i meant the existing global default {{<similarity/>}} mentioned in the
previous bullet (but i was definitely vague -- sorry).  we probably shouldn't
go out of our way to advertise the global option anymore, we should encourage
people to use the fieldType specific similarity instead.

whether we should promote customizing the SimilarityProvider in the example ..
i dunno, i can see it either way.  but it seems like the order of importance
for visibility is probably something like:

* customize per fieldType similarity (new hotness)
* customize SimilarityProvider (not something you are likely to need to do, but
if you do there's really only one way so maybe we should advertise that it's an
option)
* customize global default similarity (odds are you really don't need this, you
can probably do it using per fieldType similarity)




[jira] [Commented] (SOLR-2338) improved per-field similarity integration into schema.xml

2011-03-30 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013652#comment-13013652
 ] 

Robert Muir commented on SOLR-2338:
---

I agree with your order. To keep the example simple, I think we should just have
a commented-out example for a fieldType that customizes its similarity?

I think we can support customizing the SimilarityProvider (in case you want to
change how coord is calculated), and support changing the
default-unless-otherwise-specified-in-the-schema similarity (for easier
backwards transition), but I'm thinking it would actually be more confusing
than helpful to advertise these in the example.





[jira] [Commented] (SOLR-2338) improved per-field similarity integration into schema.xml

2011-03-30 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013686#comment-13013686
 ] 

Yonik Seeley commented on SOLR-2338:


I dunno - it seems much more likely that someone would want to easily change 
the default sim for all fields rather than change coord.




[jira] [Commented] (SOLR-2338) improved per-field similarity integration into schema.xml

2011-03-29 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13012538#comment-13012538
 ] 

Robert Muir commented on SOLR-2338:
---

I'll commit this in a few days unless anyone objects.




[jira] [Commented] (SOLR-2338) improved per-field similarity integration into schema.xml

2011-03-29 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13012787#comment-13012787
 ] 

Hoss Man commented on SOLR-2338:


skimming the patch, the one thing i'm not clear on is how someone who has been
using a custom similarity (or similarity factory) in Solr 1.4 or 3.1 will be
affected on upgrading.

the patch seems to remove the code that allows for a (global) {{<similarity/>}}
element in schema.xml (replacing it with a check for {{<similarityProvider/>}})

i'm not clear on whether there is really a compelling reason for this (if there
is we should have a nice fat warning in the upgrading section of CHANGES.txt)
or if we could still continue to respect the {{<similarity/>}} syntax ... it
seems like that one tag could be used to reference (by classname) a global
SimilarityProviderFactory, or a default Similarity instance or default.

even if there's a really good reason not to keep using what we might find in
{{<similarity/>}}, we should check for it and log a nice fat error message
saying it's being ignored.




[jira] [Commented] (SOLR-2338) improved per-field similarity integration into schema.xml

2011-03-29 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13012792#comment-13012792
 ] 

Robert Muir commented on SOLR-2338:
---

Hoss, thanks for looking at the patch.

In the patch's current form, Similarity is purely per-field, dictated by the
schema (like Analyzers). For simplicity I removed the global one completely.

If we want, the following options are available:
* (global) <similarity/> configures the default, which is used unless
overridden by the fieldType.
* (global) <similarity/> triggers an error.

The only reason I yanked it in the patch was that I felt it could be confusing
to have this inheritance, so to speak... but if you think it's not confusing,
we could support the global <similarity/>, which would always be used unless
you supply a different <similarity/> for a specific field.




[jira] [Commented] (SOLR-2338) improved per-field similarity integration into schema.xml

2011-03-29 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13012802#comment-13012802
 ] 

Hoss Man commented on SOLR-2338:


bq. but if you think its not confusing, we could support the global
<similarity/>, which would always be used unless you supply a different
<similarity/> for a specific field.

i don't personally think it would be confusing, but i also don't think we need 
to advertise it in the example.

we should definitely encourage using similarity per field type, but for people 
who have used it in the past, having it continue to work as a global default 
when fieldTypes don't define a similarity gives us nice back-compatibility.

More generally though, i'm thinking that the same {{<similarity/>}} tag can be
used for both the old style (global default) Similarity/SimilarityFactory and
the new SimilarityProviderFactory using instanceof checks...

* instanceof SimilarityProviderFactory
** instantiate it and use it.
* instanceof SimilarityFactory
** instantiate and wrap it in a SolrSimilarityProvider that delegates to it
when the field type has no similarity set on it.  return an anonymous
SimilarityProviderFactory that uses this SolrSimilarityProvider
* instanceof Similarity
** instantiate and wrap it in a SolrSimilarityProvider that delegates to it
when the field type has no similarity set on it.  return an anonymous
SimilarityProviderFactory that uses this SolrSimilarityProvider

...that way there is only one global option that can be specified, and we
don't have to deal with weird edge cases of what the default should be for a
fieldType w/o a similarity if the schema.xml specifies both {{<similarity/>}}
and {{<similarityProvider/>}}
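
The instanceof dispatch proposed above could look roughly like the sketch below. It only illustrates the proposal as described in this comment, not what the patch does: the four interfaces are hypothetical stand-ins named after the roles in the bullets, and instantiating the configured class by name is left out (the method receives an already-built instance).

```java
// Hypothetical stand-ins for the roles named in the comment; not Solr's classes.
interface TermSim { String id(); }                 // role of Similarity
interface TermSimFactory { TermSim create(); }     // role of SimilarityFactory
interface Provider { TermSim defaultSim(); }       // role of SolrSimilarityProvider
interface ProviderFactory { Provider create(); }   // role of SimilarityProviderFactory

class GlobalTagResolver {
    // Decide what the single global tag refers to by inspecting the instance.
    static ProviderFactory resolve(Object configured) {
        if (configured instanceof ProviderFactory) {
            // Already a provider factory: use it as-is.
            return (ProviderFactory) configured;
        }
        if (configured instanceof TermSimFactory) {
            // Old-style factory: build the similarity, then wrap it.
            return wrap(((TermSimFactory) configured).create());
        }
        if (configured instanceof TermSim) {
            // Old-style plain similarity: wrap it directly.
            return wrap((TermSim) configured);
        }
        throw new IllegalArgumentException("unsupported global similarity: " + configured);
    }

    // Wrap a similarity in a provider that hands it out as the default,
    // and return an anonymous factory for that provider.
    private static ProviderFactory wrap(TermSim sim) {
        Provider provider = () -> sim;
        return () -> provider;
    }
}

class GlobalTagResolverDemo {
    public static void main(String[] args) {
        TermSim legacy = () -> "legacy-global";
        ProviderFactory factory = GlobalTagResolver.resolve(legacy);
        System.out.println(factory.create().defaultSim().id());
    }
}
```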


The one other thing i just noticed is that you have 
SimilarityProviderFactory.init(SolrParams) ... my vote would be to start using 
NamedList based initialization for all new types of solr plugins.  it requires 
more verbosity in the config, but it supports a lot more types of information 
(multivalued keys, nested lists/maps, etc...) and could eventually lead us to 
actually being able to validate our config files using an XMLSchema and/or DTD 
(since the element/node names are finite)






[jira] [Commented] (SOLR-2338) improved per-field similarity integration into schema.xml

2011-03-29 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13012809#comment-13012809
 ] 

Robert Muir commented on SOLR-2338:
---

{quote}
i don't personally think it would be confusing, but i also don't think we need 
to advertise it in the example.

we should definitely encourage using similarity per field type, but for people 
who have used it in the past, having it continue to work as a global default 
when fieldTypes don't define a similarity gives us nice back-compatibility.
{quote}

I agree here, this is a good compromise and by not advertising it in the 
example, I won't have concerns about the example being confusing.

{quote}
More generally though, i'm thinking that the same <similarity/> tag can be used
for both the old style (global default) Similarity/SimilarityFactory and the
new SimilarityProviderFactory using instanceof checks...
{quote}

I have to disagree on this one. The new SimilarityProvider serves a totally
different purpose. It's not a global sim: it answers requests for sims for
specific fields. The only reason I provided a factory for it is so that users
can tune the parts of Lucene's relevance ranking system that are not per-field:
coord() and queryNorm(). But it's not a way to configure tf() or idf() or
anything like that. In the patch I added this to the example, marked as expert,
though we could remove it from the example entirely if it's too expert (might
be?)

So I think we should do as you suggest and allow a global <similarity/> that is
the default term weighting unless otherwise specified by a field, but we
shouldn't confuse this with the parts that aren't field-specific...

{quote}
The one other thing i just noticed is that you have 
SimilarityProviderFactory.init(SolrParams) 
{quote}

I configured it this way because this is how <similarity/> worked before (and
it was just enough XML to not scare me away). Is it possible we can defer this
improvement to a later issue? I think we should give this a little more
thought. For example, if we do this, it's a break in the API for
<similarityFactory/>, which this patch does not actually do: it only MOVES it
to the fieldType.





[jira] [Commented] (SOLR-2338) improved per-field similarity integration into schema.xml

2011-03-29 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13012812#comment-13012812
 ] 

Robert Muir commented on SOLR-2338:
---

By the way, on the last part, it's probably hard to see in the patch: this is
because SimilarityFactory is currently (temporarily) backwards broken versus,
say, Solr 1.4, because I didn't want to take away any Solr capabilities until
we resolved this issue... so in trunk right now it returns SimilarityProvider
(which I know makes looking at the patch confusing).





[jira] [Commented] (SOLR-2338) improved per-field similarity integration into schema.xml

2011-03-29 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13012823#comment-13012823
 ] 

Robert Muir commented on SOLR-2338:
---

I see, as far as using NamedList for the new SimilarityProviderFactory, that 
would be easy.

However, at the moment my vote is against this, at least until SOLR-2292 is 
completed.

At the moment NamedList contains methods such as get(String) that run in
linear time, and just out of paranoia I think we should keep NamedList as far
away from the scoring system as possible.
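
A rough illustration of that concern, and of the usual way around it: read the settings once at init time and keep them in fields, so the scoring path never touches the list. PairList below is a toy stand-in written for this sketch, not Solr's NamedList, and CachedSettingsScorer with its "boost" setting is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy ordered name/value list with a linear-time lookup; not Solr's NamedList.
class PairList {
    private final List<String> names = new ArrayList<>();
    private final List<Object> values = new ArrayList<>();

    void add(String name, Object value) {
        names.add(name);
        values.add(value);
    }

    // Scans from the front, so the cost grows with the number of entries.
    Object get(String name) {
        for (int i = 0; i < names.size(); i++) {
            if (names.get(i).equals(name)) {
                return values.get(i);
            }
        }
        return null;
    }
}

// Hypothetical scoring component that reads its settings once, at init time.
class CachedSettingsScorer {
    private float boost = 1.0f;

    void init(PairList args) {
        Object configured = args.get("boost"); // the only list lookup
        if (configured != null) {
            boost = ((Number) configured).floatValue();
        }
    }

    // Hot path: reads a field, never the list.
    float score(float raw) {
        return raw * boost;
    }
}

class CachedSettingsDemo {
    public static void main(String[] args) {
        PairList config = new PairList();
        config.add("boost", 2.0f);
        CachedSettingsScorer scorer = new CachedSettingsScorer();
        scorer.init(config);
        System.out.println(scorer.score(3.0f));
    }
}
```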




[jira] Commented: (SOLR-2338) improved per-field similarity integration into schema.xml

2011-02-09 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12992614#comment-12992614
 ] 

Yonik Seeley commented on SOLR-2338:


Yep, sounds like a great idea!
Should we specify the similarity class in each fieldType that wants to use a
non-default similarity:

{code}
<fieldType>
  <analyzer>...</analyzer>
  <similarity class="..."></similarity>
</fieldType>
{code}

Or use named similarities and refer to them:
{code}
<fieldType>
  <analyzer>...</analyzer>
  <similarity name="short_text"/>
</fieldType>

<similarity name="short_text" class="..."></similarity>
{code}




[jira] Commented: (SOLR-2338) improved per-field similarity integration into schema.xml

2011-02-09 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12992670#comment-12992670
 ] 

Robert Muir commented on SOLR-2338:
---

It doesn't really matter to me, but what is the advantage of the named 
similarities?

This would be a bit inconsistent with how you configure analyzers (and an 
additional level of indirection that might be confusing)... or am I missing 
something?





[jira] Commented: (SOLR-2338) improved per-field similarity integration into schema.xml

2011-02-09 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12992676#comment-12992676
 ] 

Yonik Seeley commented on SOLR-2338:


Other components in solrconfig.xml use that indirection, but
I'm fine with the approach taken by the tokenizer / token filter config.






[jira] Commented: (SOLR-2338) improved per-field similarity integration into schema.xml

2011-02-09 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12992728#comment-12992728
 ] 

Hoss Man commented on SOLR-2338:


Most existing situations where plugins are dereferenced by name exist so that 
we can reuse the exact same object instance (e.g. for recording stats, or 
because they are heavyweight to construct on the fly).

In the case of similarity, the main advantage I can think of would be if we 
wanted true per-field similarity declaration, not just per field type, i.e.:

{code}
<similarity name="S_XX" class="..."></similarity>
<similarity name="S_YY" class="..."></similarity>
...
<fieldType name="FT_AA">
  <analyzer>...</analyzer>
  <similarity name="S_XX"/>
</fieldType>
...
<field name="F_111" type="FT_AA" /><!-- implied S_XX -->
<field name="F_222" type="FT_AA" similarity="S_YY" />
{code}

...but even if we don't do that, I suppose it's also conceivable that someone 
might have their own Similarity implementation that is expensive to instantiate 
(e.g. one that maintains some big in-memory data structures) and might want to 
declare one instance and then refer to it by name in many different 
fieldType declarations.
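
To make the name-dereferencing idea concrete, here is a minimal sketch of how such a lookup could behave. It is not Solr code: the types `Sim`, `FieldType`, `Field` and the methods `declare` and `resolve` are hypothetical names invented for illustration. It only assumes that similarities are declared once by name, that a fieldType refers to one of them, and that a field may optionally override it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these types are Solr or Lucene classes.
class NamedSimilaritySketch {

    /** Stand-in for a similarity implementation; it only carries a label. */
    static final class Sim {
        final String label;
        Sim(String label) { this.label = label; }
    }

    /** A field type that refers to a declared similarity by name. */
    static final class FieldType {
        final String name;
        final String similarityName;
        FieldType(String name, String similarityName) {
            this.name = name;
            this.similarityName = similarityName;
        }
    }

    /** A field; similarityOverride is null when the field inherits from its type. */
    static final class Field {
        final String name;
        final FieldType type;
        final String similarityOverride;
        Field(String name, FieldType type, String similarityOverride) {
            this.name = name;
            this.type = type;
            this.similarityOverride = similarityOverride;
        }
    }

    // Similarities declared once at the top level, keyed by name.
    private final Map<String, Sim> declared = new HashMap<>();

    void declare(String name, Sim sim) {
        declared.put(name, sim);
    }

    /** A field-level name wins; otherwise the field type's name is used. */
    Sim resolve(Field field) {
        String name = field.similarityOverride != null
                ? field.similarityOverride
                : field.type.similarityName;
        Sim sim = declared.get(name);
        if (sim == null) {
            throw new IllegalArgumentException("Undeclared similarity: " + name);
        }
        return sim;
    }

    public static void main(String[] args) {
        NamedSimilaritySketch schema = new NamedSimilaritySketch();
        schema.declare("S_XX", new Sim("S_XX"));
        schema.declare("S_YY", new Sim("S_YY"));

        FieldType ftAA = new FieldType("FT_AA", "S_XX");
        Field f111 = new Field("F_111", ftAA, null);   // inherits S_XX from FT_AA
        Field f222 = new Field("F_222", ftAA, "S_YY"); // overrides at the field level

        System.out.println(f111.name + " -> " + schema.resolve(f111).label);
        System.out.println(f222.name + " -> " + schema.resolve(f222).label);
    }
}
```

Because every reference to a given name resolves to the same declared object, a single expensive instance would be shared by all fieldTypes that name it.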

I think for now just supporting the first example Yonik cited...

{code}
<fieldType>
  <analyzer>...</analyzer>
  <similarity class="..."></similarity>
</fieldType>
{code}

would be a huge win, and we can always enhance it to add name dereferencing later.
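
As a rough illustration of the simpler inline approach, the sketch below models a schema where each fieldType may carry its own similarity class and every other fieldType falls back to the global one. The class `FieldTypeSimilaritySketch`, its methods, and the `com.example.*` class names are all hypothetical, and the fallback to a global default is an assumption based on the "non-default similarity" wording earlier in the thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the class names and the lookup API are illustrative only.
class FieldTypeSimilaritySketch {

    // Similarity class used when a fieldType declares none of its own.
    private final String globalSimilarityClass;

    // fieldType name -> similarity class declared inside that fieldType.
    private final Map<String, String> perFieldType = new HashMap<>();

    FieldTypeSimilaritySketch(String globalSimilarityClass) {
        this.globalSimilarityClass = globalSimilarityClass;
    }

    /** Registers a fieldType; pass null when it has no similarity element. */
    void declareFieldType(String fieldTypeName, String similarityClassOrNull) {
        if (similarityClassOrNull != null) {
            perFieldType.put(fieldTypeName, similarityClassOrNull);
        }
    }

    /** Returns the fieldType's own similarity class, or the global default. */
    String similarityClassFor(String fieldTypeName) {
        return perFieldType.getOrDefault(fieldTypeName, globalSimilarityClass);
    }

    public static void main(String[] args) {
        FieldTypeSimilaritySketch schema =
                new FieldTypeSimilaritySketch("com.example.GlobalSimilarity");
        schema.declareFieldType("short_text", "com.example.ShortTextSimilarity");
        schema.declareFieldType("text", null);

        System.out.println("short_text -> " + schema.similarityClassFor("short_text"));
        System.out.println("text -> " + schema.similarityClassFor("text"));
    }
}
```

There is no name registry here, so the configuration stays parallel to how analyzers are declared inside a fieldType; the cost is that two fieldTypes naming the same class would each get their own instance.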




[jira] Commented: (SOLR-2338) improved per-field similarity integration into schema.xml

2011-02-09 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12992735#comment-12992735
 ] 

Robert Muir commented on SOLR-2338:
---

{quote}
...but even if we don't do that, I suppose it's also conceivable that someone 
might have their own Similarity implementation that is expensive to instantiate 
(e.g. one that maintains some big in-memory data structures) and might want to 
declare one instance and then refer to it by name in many different 
fieldType declarations.
{quote}

I don't think this is really a use case we need to support: the purpose of 
Similarity today is to do term weighting, not to be a huge data-structure 
holder.

While I know Mike's original patch went this way with LUCENE-2392 (e.g. norms), 
I'm not sure I like it being in Similarity in the future either.

Otherwise concepts like lazy-loading norms and all this other stuff get pushed 
onto the sim, which is an awkward place (imagine if you have many fields). 

So, I think we shouldn't really design for abuses of the API. If there are 
other use cases for named similarity that have to do with term weighting, I'm 
interested.

