It's not possible to decide at run-time which similarity class to use, right?

2011-06-16 Thread Gabriele Kahlout
Hello,

I'm testing different Similarity implementations. To do that, I change the
class attribute of the similarity element in schema.xml and restart Solr
each time I want to try a different similarity class. Besides running
multiple cores, each with its own schema, is there a way to tell the
RequestHandler which similarity class to use?

-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains [LON] or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with X.
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


Re: It's not possible to decide at run-time which similarity class to use, right?

2011-06-16 Thread Erik Hatcher
No, there's not a way to control Similarity on a per-request basis.  

Some factors from Similarity are computed at index-time though.

What factors are you trying to tweak that way and why?  Maybe doing boosting 
using some other mechanism (boosting functions, boosting clauses) would be a 
better way to go?

Erik







Re: It's not possible to decide at run-time which similarity class to use, right?

2011-06-16 Thread Gabriele Kahlout
On Thu, Jun 16, 2011 at 9:14 PM, Erik Hatcher erik.hatc...@gmail.com wrote:

 No, there's not a way to control Similarity on a per-request basis.

 Some factors from Similarity are computed at index-time though.


You got me on this.


 What factors are you trying to tweak that way and why?  Maybe doing
 boosting using some other mechanism (boosting functions, boosting clauses)
 would be a better way to go?

I'm trying to assess the impact of coord (search-time) on Qtime. In one
implementation coord returns 1, while in another it's actually computed.

Running multiple cores adds considerable complication (the cores must be set
up to share data but not conf).
Patching the request handler to change the similarity (I haven't looked into
this yet) would only change the 'search-time' similarity. How about breaking
up Similarity into run-time (search-time) and compile-time (index-time)
parts? Then the request handler could take a parameter to 'safely' set the
run-time similarity.
I think many would welcome such a separation of responsibilities.






-- 
Regards,
K. Gabriele



Re: It's not possible to decide at run-time which similarity class to use, right?

2011-06-16 Thread Robert Muir
On Thu, Jun 16, 2011 at 3:23 PM, Gabriele Kahlout
gabri...@mysimpatico.com wrote:
 I'm trying to assess the impact of coord (search-time) on Qtime. In one
 implementation coord returns 1, while in another it's actually computed.

At query time?

coord should be really cheap (unless your impl does something like
calculate a million digits of pi), as it is not actually computed
per-document.
Instead, all possible coord factors (e.g. 1/5, 2/5, 3/5, 4/5, 5/5) are
computed up front by BooleanQuery's scorers and stored in a table.

See 
http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/src/java/org/apache/lucene/search/BooleanScorer.java
and 
http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/src/java/org/apache/lucene/search/BooleanScorer2.java
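
To illustrate the point about the table, here is a minimal standalone sketch.
It is not Lucene's actual code: the Coord interface, the CoordTable class and
its method names are hypothetical stand-ins, and only the
coord(overlap, maxOverlap) shape is borrowed from Similarity. It shows the two
variants discussed in this thread (coord always returning 1, and coord
actually computed) and why either one is called only once per possible
overlap count rather than once per document.

```java
// Hypothetical stand-in for the coord hook of a Similarity implementation.
interface Coord {
    float coord(int overlap, int maxOverlap);
}

final class CoordTable {
    // Computed variant: the fraction of query clauses that matched.
    static final Coord COMPUTED = (overlap, maxOverlap) -> overlap / (float) maxOverlap;

    // Constant variant: coord is effectively disabled and always returns 1.
    static final Coord CONSTANT = (overlap, maxOverlap) -> 1.0f;

    // Calls coord once for each possible overlap count, 0..maxOverlap,
    // before any document is scored.
    static float[] build(Coord coord, int maxOverlap) {
        float[] table = new float[maxOverlap + 1];
        for (int overlap = 0; overlap <= maxOverlap; overlap++) {
            table[overlap] = coord.coord(overlap, maxOverlap);
        }
        return table;
    }

    // Per-document work is then an array lookup and a multiply,
    // regardless of how expensive the coord implementation is.
    static float score(float rawScore, int matchedClauses, float[] table) {
        return rawScore * table[matchedClauses];
    }
}
```

For a five-clause query, CoordTable.build(CoordTable.COMPUTED, 5) makes six
calls to coord in total. Under this assumption, swapping in the constant
variant saves those six calls per query, which is unlikely to show up in
Qtime.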