Re: DIH use of the ?command=full-import entity= command option

2009-03-15 Thread Shalin Shekhar Mangar
On Fri, Mar 13, 2009 at 9:56 PM, Jon Baer jonb...@gmail.com wrote:

 Bear in mind (and correct me if I'm wrong) that a full-import is still a
 full-import no matter what entity you tack onto the param.

 Thus I think clean=false should be appended (a friend starting off in Solr
 was really confused by this and could not understand why it did a delete on
 all documents).

 I'm not sure if that is clearly stated in the Wiki ...


Yes, it is confusing, and even more so now that we have preImportDeleteQuery.

For a full-import command, the default is clean=true. If clean=false is
specified, then no cleanup is done (not even pre/postImportDeleteQuery).
Even if a pre/postImportDeleteQuery is configured, all documents are
deleted when the first root entity does not have a preImportDeleteQuery
(which I guess is a bug). For a delta-import command, the default is
clean=false (and no pre/postImportDeleteQuery is run).
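
To make the rules above easier to follow, here is a minimal Java sketch that models them. It is not DataImportHandler source code: the class, enum, and method names are hypothetical, and the handler URL passed to fullImportUrl is an assumption about how your request handler is registered. The case of delta-import with an explicit clean=true is not described above, so the sketch rejects it rather than guess.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical model of the clean-up rules described in this message. */
final class DihCleanupModel {

    enum Cleanup { NONE, PRE_IMPORT_DELETE_QUERY, DELETE_ALL }

    /**
     * @param command           "full-import" or "delta-import"
     * @param cleanParam        value of the clean request parameter, or null if absent
     * @param firstRootHasPreDq whether the first root entity defines preImportDeleteQuery
     */
    static Cleanup cleanupFor(String command, Boolean cleanParam, boolean firstRootHasPreDq) {
        if ("delta-import".equals(command)) {
            // Default is clean=false, and no pre/postImportDeleteQuery is run.
            if (cleanParam == null || !cleanParam) {
                return Cleanup.NONE;
            }
            throw new IllegalArgumentException("delta-import with clean=true is not covered here");
        }
        if (!"full-import".equals(command)) {
            throw new IllegalArgumentException("unknown command: " + command);
        }
        boolean clean = (cleanParam == null) || cleanParam; // full-import defaults to clean=true
        if (!clean) {
            return Cleanup.NONE; // not even pre/postImportDeleteQuery
        }
        // The surprising case: without a preImportDeleteQuery on the first
        // root entity, every document is deleted.
        return firstRootHasPreDq ? Cleanup.PRE_IMPORT_DELETE_QUERY : Cleanup.DELETE_ALL;
    }

    /** Builds a full-import request for one entity with an explicit clean value. */
    static String fullImportUrl(String handlerUrl, String entity, boolean clean) {
        try {
            return handlerUrl + "?command=full-import&entity="
                    + URLEncoder.encode(entity, "UTF-8") + "&clean=" + clean;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

Passing clean=false explicitly, as in fullImportUrl(handlerUrl, "item", false), avoids relying on the default when only one entity is being imported.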

I think we should open an issue to figure out and implement an acceptable
behavior before we release 1.4.
-- 
Regards,
Shalin Shekhar Mangar.


Re: Adding authentication Token to the CommonsHttpSolrServer

2009-03-15 Thread Shalin Shekhar Mangar
On Fri, Mar 13, 2009 at 12:03 AM, Narayanan, Karthikeyan 
karthikeyan.naraya...@gs.com wrote:

 Hi,
  We have installed Solr in a Tomcat server and enabled a
 security constraint at the Tomcat level. We need to pass the
 authentication token (cookie) with the search call that is made using
 CommonsHttpSolrServer. I would like to know how I can add the token to
 the CommonsHttpSolrServer. I would appreciate any ideas on this.


I took a look at the commons-httpclient javadocs. This should work:

Create an instance of HttpState and use HttpState#addCookie to add your
auth cookie. Then create an instance of HttpClient and use the
HttpClient#setState method to pass it the HttpState object. Finally, pass this
HttpClient object to the constructor of CommonsHttpSolrServer.
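
The steps above might look like the following sketch. It assumes commons-httpclient 3.x and SolrJ are on the classpath. The class name, the helper method names, and the cookie path "/" are illustrative assumptions; the cookie domain must match the host that serves Solr, and the cookie name and value depend on your Tomcat security setup.

```java
import java.net.MalformedURLException;

import org.apache.commons.httpclient.Cookie;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpState;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

/** Hypothetical helper showing how to hand an auth cookie to CommonsHttpSolrServer. */
final class AuthCookieSolrClient {

    /** Creates an HttpClient whose state already contains the auth cookie. */
    static HttpClient clientWithCookie(String domain, String name, String value) {
        HttpState state = new HttpState();
        // A null expiry date makes this a session cookie; "/" is an assumed path.
        state.addCookie(new Cookie(domain, name, value, "/", null, false));

        // A multi-threaded connection manager lets the client be shared across threads.
        HttpClient client = new HttpClient(new MultiThreadedHttpConnectionManager());
        client.setState(state);
        return client;
    }

    /** Wraps the prepared HttpClient in a SolrJ server object. */
    static CommonsHttpSolrServer solrServer(String solrUrl, HttpClient client)
            throws MalformedURLException {
        return new CommonsHttpSolrServer(solrUrl, client);
    }
}
```

Whether the cookie is actually sent depends on the cookie policy matching the request host and path, so it is worth checking the outgoing request headers once.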

-- 
Regards,
Shalin Shekhar Mangar.


Re: DIH use of the ?command=full-import entity= command option

2009-03-15 Thread Jon Baer
I think it could be as simple as this: if you have one or more entities in
the param, then clean=false applies as well (because you are specifically
interested in just targeting that entity import) ...
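
As a rough illustration of the suggestion above, the following Java sketch computes an effective clean value from the request parameters. This is only a model of the proposal, not existing DataImportHandler behavior; the class and method names are hypothetical, and treating an explicit clean parameter as always winning is an assumption.

```java
import java.util.Map;

/** Hypothetical model of the proposed default: naming an entity implies clean=false. */
final class ProposedCleanDefault {

    static boolean effectiveClean(Map<String, String> params) {
        String explicit = params.get("clean");
        if (explicit != null) {
            return Boolean.parseBoolean(explicit); // assumed: an explicit value always wins
        }
        boolean fullImport = "full-import".equals(params.get("command"));
        String entity = params.get("entity");
        boolean entityTargeted = entity != null && !entity.isEmpty();
        // Only an untargeted full-import would clean by default under the proposal.
        return fullImport && !entityTargeted;
    }
}
```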


- Jon

On Mar 15, 2009, at 3:07 AM, Shalin Shekhar Mangar wrote:


On Fri, Mar 13, 2009 at 9:56 PM, Jon Baer jonb...@gmail.com wrote:

Bear in mind (and correct me if I'm wrong) that a full-import is still a
full-import no matter what entity you tack onto the param.

Thus I think clean=false should be appended (a friend starting off in Solr
was really confused by this and could not understand why it did a delete on
all documents).

I'm not sure if that is clearly stated in the Wiki ...

Yes, it is confusing, and even more so now that we have preImportDeleteQuery.

For a full-import command, the default is clean=true. If clean=false is
specified, then no cleanup is done (not even pre/postImportDeleteQuery).
Even if a pre/postImportDeleteQuery is configured, all documents are
deleted when the first root entity does not have a preImportDeleteQuery
(which I guess is a bug). For a delta-import command, the default is
clean=false (and no pre/postImportDeleteQuery is run).

I think we should open an issue to figure out and implement an acceptable
behavior before we release 1.4.
--
Regards,
Shalin Shekhar Mangar.




RE: How to correctly boost results in Solr Dismax query

2009-03-15 Thread Dean Missikowski (Consultant), CLSA
Hi,

My experience is that the bq parameter can be used with any query type.
You can define boosts on the query fields (qf) that are used with the
query terms (q) in your query, AND you can define additional boosts for
fields that are not used with the query terms through the bq or bf
parameters.

I think the relative weight that a particular bq boost has on the overall
score depends on the other fields in your query. If you're searching on
titles, you might want to consider setting omitNorms=true (means don't
store length normalization factors) for title in your schema.xml, and if
you're using Solr 1.4, omitTf=true (means don't index term frequencies), so
that results aren't skewed by short and long titles, or titles that
contain multiple occurrences of the same term (changing these requires
you to reindex). I think this should have the effect of making bq boosts
like bq=media:DVD^2&bq=media:BLU-RAY^1.5 more effective.
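
For reference, here is a small Java sketch of how such a request could be assembled, with each bq value sent as its own URL-encoded parameter. The class name is hypothetical, and the choice of q=indiana and qf=title simply follows the example discussed in this thread; how the dismax handler is selected depends on your solrconfig.xml and is left out.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that builds a URL-encoded query string with repeated parameters. */
final class DismaxQueryString {

    private final List<String> pairs = new ArrayList<String>();

    /** Adds one name=value pair; call repeatedly for multi-valued parameters such as bq. */
    DismaxQueryString add(String name, String value) {
        pairs.add(encode(name) + "=" + encode(value));
        return this;
    }

    String build() {
        return String.join("&", pairs);
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        String qs = new DismaxQueryString()
                .add("q", "indiana")
                .add("qf", "title")
                .add("bq", "media:DVD^2")
                .add("bq", "media:BLU-RAY^1.5")
                .build();
        // prints q=indiana&qf=title&bq=media%3ADVD%5E2&bq=media%3ABLU-RAY%5E1.5
        System.out.println(qs);
    }
}
```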

-- Dean

-----Original Message-----
From: Pete Smith [mailto:pete.sm...@lovefilm.com] 
Sent: 13/03/2009 7:11 PM
To: solr-user@lucene.apache.org
Subject: Re: How to correctly boost results in Solr Dismax query

Hi,

On Fri, 2009-03-13 at 03:57 -0700, dabboo wrote:
 bq works only with q.alt query and not with q queries. So, in your case you
 would be using qf parameter for field boosting, you will have to give both
 the fields in qf parameter i.e. both title and media.
 
 try this
 
 <str name="qf">media^1.0 title^100.0</str>

But with that, how will it know to rank media:DVD higher than
media:BLU-RAY?

Cheers,
Pete


 Pete Smith-3 wrote:
  
  Hi Amit,
  
  Thanks again for your reply. I am understanding it a bit better but I
  think it would help if I posted an example. Say I have three records:
  
  <doc>
    <long name="id">1</long>
    <str name="media">BLU-RAY</str>
    <str name="title">Indiana Jones and the Kingdom of the Crystal Skull</str>
  </doc>
  <doc>
    <long name="id">2</long>
    <str name="media">DVD</str>
    <str name="title">Indiana Jones and the Kingdom of the Crystal Skull</str>
  </doc>
  <doc>
    <long name="id">3</long>
    <str name="media">DVD</str>
    <str name="title">Casino Royale</str>
  </doc>
  
  Now, if I search for indiana: select?q=indiana
  
  I want the first two rows to come back (not the third as it does not
  contain 'indiana'). I would like record 2 to be scored higher than
  record 1 as its media type is DVD.
  
  At the moment I have in my config:
  
  <str name="qf">title</str>
  
  And I was trying to boost by media having a specific value by using 'bq'
  but from what you told me that is incorrect.
  
  Cheers,
  Pete
  
  
  On Fri, 2009-03-13 at 03:21 -0700, dabboo wrote:
  Pete,
  
  Sorry if I wasn't clear. Here is the explanation.
  
  Suppose you have 2 records and they have films and media as 2 columns.
  
  Now the first record has values like films=Indiana and media=blue ray
  and the 2nd record has values like films=Bond and media=Indiana
  
  Values for qf parameters
  
  <str name="qf">media^2.0 films^1.0</str>
  
  Now, search for q=Indiana .. it should display both of the records but
  record #2 will be displayed above the 1st.
  
  Let me know if you still have questions.
  
  Cheers,
  amit
  
  
  Pete Smith-3 wrote:
   
   Hi Amit,
   
   Thanks very much for your reply. What you said makes things a bit
   clearer but I am still a bit confused.
   
   On Thu, 2009-03-12 at 23:14 -0700, dabboo wrote:
   If you want to boost the records with their field value then you must use
   q query parameter instead of q.alt. 'q' parameter actually uses qf
   parameters from solrConfig for field boosting.
   
   From the documentation for Dismax queries, I thought that q is simply
   a keyword parameter:
   
   From http://wiki.apache.org/solr/DisMaxRequestHandler:
   q
   The guts of the search defining the main query. This is designed to
   support raw input strings provided by users with no special escaping.
   '+' and '-' characters are treated as mandatory and prohibited
   modifiers for the subsequent terms. Text wrapped in balanced quote
   characters '"' are treated as phrases; any query containing an odd
   number of quote characters is evaluated as if there were no quote
   characters at all. Wildcards in this q parameter are not supported.
   
   And I thought 'qf' is a list of fields and boost scores:
   
   From http://wiki.apache.org/solr/DisMaxRequestHandler:
   qf (Query Fields)
   List of fields and the boosts to associate with each of them when
   building DisjunctionMaxQueries from the user's query. The format
   supported is fieldOne^2.3 fieldTwo fieldThree^0.4, which indicates that
   fieldOne has a boost of 2.3, fieldTwo has the default boost, and
   fieldThree has a boost of 0.4 ... this indicates that matches in
   fieldOne are much more significant than matches in fieldTwo, which are
   more significant than matches in fieldThree.
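
To illustrate the qf format quoted above, here is a minimal Java sketch that splits such a string into field names and boosts. It is not Solr's own parser; the class name is hypothetical, and using 1.0 as the default boost is an assumption based on Lucene's usual default.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for a qf-style string such as "fieldOne^2.3 fieldTwo fieldThree^0.4". */
final class QfParser {

    /** Assumed default boost for fields listed without an explicit ^boost. */
    static final float DEFAULT_BOOST = 1.0f;

    static Map<String, Float> parse(String qf) {
        Map<String, Float> boosts = new LinkedHashMap<String, Float>(); // keeps field order
        for (String token : qf.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty input yields one empty token
            }
            int caret = token.indexOf('^');
            if (caret < 0) {
                boosts.put(token, DEFAULT_BOOST);
            } else {
                boosts.put(token.substring(0, caret),
                        Float.parseFloat(token.substring(caret + 1)));
            }
        }
        return boosts;
    }
}
```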
   
   But if I want to, say, search for films with 'indiana' in the title,
   with media=DVD scoring higher than