Re: Having trouble getting boosting queries to work with multiple terms

2012-10-17 Thread Asfand Qazi
Several ideas there, thanks guys.  I'll re-evaluate how I build and use 
the index from now on.


Asfand Qazi

On 16/10/12 17:35, Tomás Fernández Löbbe wrote:

I think sorting should work too, as I suggested before. In this case
(because, by coincidence, you need an alphabetic sort on the type),
sort=type asc, score desc should work.

If you need to add other types, maybe add an int field that represents how
you would like them to be sorted. Within each type, the regular score will
be used for sorting.

Tomás
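
A minimal sketch of the sort suggestion above, assuming the 'type' field is
indexed in a way that allows sorting (single-valued, not tokenized). The
function name is hypothetical, and quoting the ID value is an assumption made
here because the ID itself contains a colon.

```python
from urllib.parse import urlencode

# Base URL taken from the thread.
BASE_URL = "http://ikmc.vm.bytemark.co.uk:8983/solr/allele/search"

def build_sorted_query(mgi_accession_id):
    """Return a search URL that sorts by type first, then by relevance score."""
    params = {
        # Assumption: the ID is quoted because it contains a colon.
        "q": 'mgi_accession_id:"%s"' % mgi_accession_id,
        # allele < mi_attempt < phenotype_attempt happens to be alphabetical.
        "sort": "type asc, score desc",
    }
    return BASE_URL + "?" + urlencode(params)

print(build_sorted_query("MGI:1315204"))
```

If more types are added later and the alphabetical coincidence no longer
holds, the same code would sort on the int field instead of 'type'.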

On Tue, Oct 16, 2012 at 1:18 PM, Walter Underwood wun...@wunderwood.org wrote:


Here is an approach that avoids the IDF problem.

Add another field, perhaps named priority. In that field, put a boost
value, like 100 for allele docs, 10 for mi_attempt docs, and so on. In the
boost part of the query, use the value of that field: boost=priority.

If you cannot change the index, you may be able to do the same thing with if
statements in the function query; see
http://wiki.apache.org/solr/FunctionQuery
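
A rough sketch of the priority-field idea, assuming the edismax query parser
is available (its boost parameter multiplies the score by a function, here
the value of the priority field). The function names are hypothetical, and
the value 1 for phenotype_attempt is an assumption; only 100 and 10 are given
above.

```python
from urllib.parse import urlencode

# 100 and 10 come from the suggestion above; 1 is an assumed value.
PRIORITY_BY_TYPE = {"allele": 100, "mi_attempt": 10, "phenotype_attempt": 1}

def add_priority(doc):
    """Return a copy of doc with a numeric 'priority' field, set before indexing."""
    enriched = dict(doc)
    enriched["priority"] = PRIORITY_BY_TYPE.get(doc.get("type"), 1)
    return enriched

def build_boost_params(query):
    """Encode parameters that multiply each score by the priority field."""
    return urlencode({
        "defType": "edismax",  # assumption: edismax is available on the server
        "q": query,
        "boost": "priority",   # function query: the value of the priority field
    })

print(add_priority({"id": "a1", "type": "allele"}))
print(build_boost_params("Cbx1"))
```

Because the boost does not depend on how rare the word "allele" or
"mi_attempt" is in the index, idf no longer affects the ordering by type.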

This is a common design request, to show all results of type A before all
results of type B, and it has a common and severe problem. If your query
term is common, the user will see 10,000 hits of type A, all the way to the
least relevant, before they see the highly-relevant first hit of type B.
So, the search is broken for all common query terms and there is nothing
the user can do to fix it.

Instead, use a smaller boost, maybe a bit more than a tiebreaker, but not
enough to force a total ordering. You may also want to use facets or fixed
filters, so that users can select only alleles or only mi_attempts.

wunder

--
Regards,
  Asfand Yar Qazi
  Team 87 - High Throughput Gene Targeting
  Wellcome Trust Sanger Institute



--
The Wellcome Trust Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 


Having trouble getting boosting queries to work with multiple terms

2012-10-16 Thread Asfand Qazi

Hello,

The Solr server I am driving is publicly available at
http://ikmc.vm.bytemark.co.uk:8983/solr/allele/search . It contains
freely available information from science research establishments.


It holds many documents, and what I usually do is look up all documents
where the 'mgi_accession_id' field matches what I want it to.  This
returns several documents, each one having a 'type' field.  The value
can be either 'allele', 'mi_attempt' or 'phenotype_attempt'.


What I want to do is return all documents where the 'mgi_accession_id' 
matches what I want, and I want the documents ordered such that 
'type:allele' docs are at the top, followed by 'type:mi_attempt' docs, 
followed last by 'type:phenotype_attempt' docs.


Here is an example of a query I fire at it:

http://ikmc.vm.bytemark.co.uk:8983/solr/allele/search?q=mgi_accession_id:MGI:1315204&bq=type:allele^100
type:mi_attempt^10 type:phenotype_attempt^1


All the docs end up with the same score!

I'm clearly doing something wrong, but what?  Help is appreciated.

Thanks in advance.



Re: Having trouble getting boosting queries to work with multiple terms

2012-10-16 Thread Asfand Qazi

Hi, thanks for the reply.

I tried that:

http://ikmc.vm.bytemark.co.uk:8983/solr/allele/search?q=mgi_accession_id:MGI:1315204&bq=type:allele^100
OR type:mi_attempt^10 OR type:phenotype_attempt^1


(forgive the wrapping)

and I got mi_attempt at the top, then the allele, then the
phenotype_attempt.  It should be allele first, then mi_attempt, then
phenotype_attempt.  You can replicate it with the above URL, as it is a
publicly available index.


Thanks

On 16/10/12 15:37, Tomás Fernández Löbbe wrote:

You are missing the OR between the clauses of the bq. Try with:

bq=type:allele^100 OR type:mi_attempt^10 OR type:phenotype_attempt^1

or set OR as your default operator in schema.xml.

Tomás
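
A small sketch of how the request above could be assembled so that q and bq
stay separate parameters and the spaces and carets are URL-encoded. It
assumes the request handler uses a parser that understands bq (dismax or
edismax); the function name is hypothetical.

```python
from urllib.parse import urlencode

# Base URL taken from the thread.
BASE_URL = "http://ikmc.vm.bytemark.co.uk:8983/solr/allele/search"

def build_bq_url(main_query, boosts):
    """Join the boost clauses with OR and encode q and bq separately."""
    bq = " OR ".join(
        "type:%s^%d" % (doc_type, weight) for doc_type, weight in boosts
    )
    return BASE_URL + "?" + urlencode([("q", main_query), ("bq", bq)])

url = build_bq_url(
    "mgi_accession_id:MGI:1315204",
    [("allele", 100), ("mi_attempt", 10), ("phenotype_attempt", 1)],
)
print(url)
```

Letting urlencode do the escaping avoids the line-wrapping and separator
problems that are easy to introduce when the URL is typed by hand.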



Re: Having trouble getting boosting queries to work with multiple terms

2012-10-16 Thread Asfand Qazi

On 16/10/12 16:15, Walter Underwood wrote:

Why do you want that ordering? That isn't what Solr is designed to do. It is designed for 
relevance. I expect that idf (the rarity of the terms) is being used in the ordering. 
mi_attempt is probably much more rare than allele.

If you want that strict ordering, I recommend doing three queries and 
concatenating the three result sets.

wunder
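
The three-query approach could look roughly like the sketch below. The fetch
callable is hypothetical: in real use it would send one request per type, for
example with a filter such as fq=type:allele. Here a fake in-memory index
stands in for Solr so the sketch runs without a server.

```python
TYPE_ORDER = ["allele", "mi_attempt", "phenotype_attempt"]

def search_in_type_order(fetch, query):
    """Run one search per type and concatenate the result lists in TYPE_ORDER.

    fetch(query, doc_type) is a hypothetical callable returning the matching
    documents of a single type, already ordered by relevance.
    """
    results = []
    for doc_type in TYPE_ORDER:
        results.extend(fetch(query, doc_type))
    return results

# Stand-in data and fetch function (assumptions for illustration only).
FAKE_INDEX = [
    {"id": 1, "type": "mi_attempt"},
    {"id": 2, "type": "allele"},
    {"id": 3, "type": "phenotype_attempt"},
    {"id": 4, "type": "allele"},
]

def fake_fetch(query, doc_type):
    return [doc for doc in FAKE_INDEX if doc["type"] == doc_type]

ordered = search_in_type_order(fake_fetch, "mgi_accession_id:MGI:1315204")
print([doc["id"] for doc in ordered])
```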


I want that ordering because alleles are more 'important' to a biologist 
than an mi_attempt, which is 'more important' than a phenotype_attempt.


If Solr isn't designed for this kind of stuff, then I will do the 
sorting manually after I have received all the documents.  I could give 
huge boost values to each term, but then I guess I'm just using a 
sledgehammer to crack a nut.
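
For reference, sorting manually on the client might look like the sketch
below. It assumes all matching documents have already been fetched as a list
of dicts; the names are hypothetical. Python's sort is stable, so documents
of the same type keep the order Solr returned them in.

```python
# Lower rank sorts first; unknown types go last.
TYPE_RANK = {"allele": 0, "mi_attempt": 1, "phenotype_attempt": 2}

def sort_by_type(docs):
    """Order docs by type rank, preserving the original order within a type."""
    return sorted(docs, key=lambda doc: TYPE_RANK.get(doc.get("type"), len(TYPE_RANK)))

docs = [
    {"id": "a", "type": "mi_attempt"},
    {"id": "b", "type": "phenotype_attempt"},
    {"id": "c", "type": "allele"},
    {"id": "d", "type": "mi_attempt"},
]
print([doc["id"] for doc in sort_by_type(docs)])
```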


Thanks



Possible to have Solr documents with deeply nested data structures (i.e. 'hashes within hashes')?

2012-08-30 Thread Asfand Qazi

Hi,

Is it possible to have Solr documents with deeply nested data structures?

e.g. (in JSON)

{
  "name": "Fred",
  "measurements": {
    "chest": 15,
    "legs": 32,
    ...
  }
}

?

Thanks



Re: Possible to have Solr documents with deeply nested data structures (i.e. 'hashes within hashes')?

2012-08-30 Thread Asfand Qazi

On 30/08/12 15:19, Jack Krupansky wrote:

The general rule is that you need to flatten your data. So, you would
have chest_measurement and leg_measurement fields.
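
One way to picture the flattening is the sketch below, which turns nested
hashes into top-level fields before the document is sent to Solr. The
function name and the parent_child naming scheme are assumptions; it produces
measurements_chest rather than the chest_measurement naming mentioned above.

```python
def flatten(doc, parent_key="", sep="_"):
    """Collapse nested dicts into a single level, joining keys with sep."""
    flat = {}
    for key, value in doc.items():
        name = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            # Recurse into the nested hash, carrying the prefix along.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat

nested = {"name": "Fred", "measurements": {"chest": 15, "legs": 32}}
print(flatten(nested))
# {'name': 'Fred', 'measurements_chest': 15, 'measurements_legs': 32}
```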

-- Jack Krupansky


Ah.  What if I cannot flatten it because I have an array of hashes?

Thanks

Asfand Yar Qazi





Re: Possible to have Solr documents with deeply nested data structures (i.e. 'hashes within hashes')?

2012-08-30 Thread Asfand Qazi

On 30/08/12 15:51, Alexandre Rafalovitch wrote:

Don't treat SOLR as your primary database with complex structure, it
is not built for that.


On 30/08/12 16:01, Jack Krupansky wrote:

Maybe start by focusing on what you expect that a user query will look
like in Solr.



Yeah, thanks guys - I'll remember that.  Maybe I'm getting ahead of myself.

The consensus around here seems to be to start using multiple cores to 
hold the different bits of a deeply nested record anyway, but I'll 
remember the flattening suggestions made.


Thanks



Re: Cannot get highlighting to work

2012-06-01 Thread Asfand Qazi

On 31/05/12 21:10, Jack Krupansky wrote:

Try a query that uses a term that doesn't split an alphanumeric term
into two terms.

Then check to see what field type you used for the symbol and
marker_symbol fields and whether the analyzer for that field type has
changed in 3.6.



Aha - yes, not using number fields makes the highlighter work.  The 
analyzer had been changed by another dev (helpfully) for the fields I 
was trying to highlight to solr.KeywordTokenizerFactory - I changed it 
back to solr.WhitespaceTokenizerFactory, as it was in the 1.4 config.


With a lot of hope I tried to fire the same query, but the exact same 
thing happened - the highlighting for a document is an empty document 
(i.e. { } ) just like before.


Any other clues?

Thanks









Re: Cannot get highlighting to work

2012-06-01 Thread Asfand Qazi
Ah... on further inspection of the schema, I saw that the field type was 
a custom one that had been configured differently from the standard 
'text' one.  I simply got rid of the custom field type and set it back 
to text.  Then as you said I reindexed the data (another blunder on my 
part before).  Now it works!  Thanks


On 01/06/12 13:43, Jack Krupansky wrote:

I got confused in the last paragraph - does a purely alphabetic term get
highlighted properly or not? I am trying to figure out if the problem
relates only to terms that decompose into phrases (as alphanumeric terms
do) or for all terms. Thanks.

If the analyzer changes, the data must be reindexed.

-- Jack Krupansky



Cannot get highlighting to work

2012-05-31 Thread Asfand Qazi

Hello,

I am having problems getting highlighting to work on a Solr 3.6 instance, while it
was working just fine before on our 1.4 instance.


The solrconfig.xml and schema.xml files are located here:

https://github.com/mpi2/mpi2_solr/blob/master/multicore/main/conf/schema.xml

(please note the incorrect line wrapping - it should be on one line)


https://github.com/mpi2/mpi2_solr/blob/master/multicore/main/conf/solrconfig.xml

(please note the incorrect line wrapping - it should be on one line)


The query I fire off (which worked on the 1.4 instance) is:

/solr/main/select?q=Cbx1&wt=json&hl=true&hl.fl=*&hl.usePhraseHighlighter=true

(please note the incorrect line wrapping - it should be on one line)

I expect a section like:
{
  "MGI:105369": {
    "symbol": [
      "<em>Cbx</em><em>1</em>"
    ],
    "marker_symbol": [
      "<em>Cbx</em><em>1</em>"
    ]
  }
}


I get:
{
  "MGI:105369": { }
}
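
A small sketch for reproducing this, assuming the JSON response carries a
top-level "highlighting" section keyed by document ID, as in the output
above. The function names are hypothetical; the first builds the query
string, the second lists the documents whose highlighting entry came back
empty.

```python
from urllib.parse import urlencode

def build_highlight_url(term):
    """Return the select path with highlighting enabled on all fields."""
    params = [
        ("q", term),
        ("wt", "json"),
        ("hl", "true"),
        ("hl.fl", "*"),
        ("hl.usePhraseHighlighter", "true"),
    ]
    return "/solr/main/select?" + urlencode(params)

def docs_without_highlights(response):
    """Return the IDs whose entry in the 'highlighting' section is empty."""
    highlighting = response.get("highlighting", {})
    return [doc_id for doc_id, fields in highlighting.items() if not fields]

print(build_highlight_url("Cbx1"))

# Sample shaped like the output shown above.
sample = {"highlighting": {"MGI:105369": {}}}
print(docs_without_highlights(sample))
```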


Can anyone help?

Thanks

