Re: Having trouble getting boosting queries to work with multiple terms
Several ideas there, thanks guys. I'll re-evaluate how I build and use the index from now on.

Asfand Qazi

On 16/10/12 17:35, Tomás Fernández Löbbe wrote:

I think sorting should work too, as I suggested before. In this case (because by coincidence you need alphabetic sort on the type) sort=type asc, score desc should work. If you need to add other types, maybe add an int field that represents how you would like those to be sorted. Within each type, the regular score will be used for sorting.

Tomás

On Tue, Oct 16, 2012 at 1:18 PM, Walter Underwood wun...@wunderwood.org wrote:

Here is an approach that avoids the IDF problem. Add another field, perhaps named priority. In that field, put a boost value, like 100 for allele docs, 10 for mi_attempt docs, and so on. In the boost part of the query, use the value of that field: boost=priority.

If you cannot change the index, you may be able to do the same thing with if statements in the function query, see http://wiki.apache.org/solr/FunctionQuery

This is a common design request, to show all results of type A before all results of type B, and it has a common and severe problem. If your query term is common, the user will see 10,000 hits of type A, all the way to the least relevant, before they see the highly-relevant first hit of type B. So, the search is broken for all common query terms and there is nothing the user can do to fix it.

Instead, use a smaller boost, maybe a bit more than a tiebreaker, but not enough to force a total ordering. You may also want to use facets or fixed filters, so that users can select only alleles or only mi_attempts.

wunder

On Oct 16, 2012, at 8:21 AM, Asfand Qazi wrote:

On 16/10/12 16:15, Walter Underwood wrote:

Why do you want that ordering? That isn't what Solr is designed to do. It is designed for relevance.

I expect that idf (the rarity of the terms) is being used in the ordering. mi_attempt is probably much more rare than allele.

If you want that strict ordering, I recommend doing three queries and concatenating the three result sets.

wunder

I want that ordering because alleles are more 'important' to a biologist than an mi_attempt, which is 'more important' than a phenotype_attempt.

If Solr isn't designed for this kind of stuff, then I will do the sorting manually after I have received all the documents. I could give huge boost values to each term, but then I guess I'm just using a sledgehammer to crack a nut.

Thanks

-- Regards, Asfand Yar Qazi Team 87 - High Throughput Gene Targeting Wellcome Trust Sanger Institute -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

-- Walter Underwood wun...@wunderwood.org
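As a rough illustration of the sort-based suggestion in this thread (sort=type asc, score desc), the request URL could be assembled as in the Python sketch below. The base URL and the query field come from the thread; the function name build_sorted_query and the wt=json parameter are assumptions made for the example, and whether the /search handler honours these parameters unchanged has not been verified.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Base URL as given in the thread; treat its availability as an assumption.
BASE_URL = "http://ikmc.vm.bytemark.co.uk:8983/solr/allele/search"


def build_sorted_query(mgi_accession_id: str) -> str:
    """Return a request URL that orders hits by type first, then by score.

    build_sorted_query is a hypothetical helper, not part of any Solr client.
    """
    params = {
        # Same query shape as used in the thread.
        "q": f"mgi_accession_id:{mgi_accession_id}",
        # 'allele' < 'mi_attempt' < 'phenotype_attempt' happens to be
        # alphabetical, so an ascending sort on type gives the wanted order.
        "sort": "type asc, score desc",
        # Assumed response format, added only for convenience.
        "wt": "json",
    }
    # urlencode inserts the '&' separators and escapes spaces.
    return BASE_URL + "?" + urlencode(params)


url = build_sorted_query("MGI:1315204")
print(url)
```

If further types are added later and alphabetical order no longer matches the desired order, the same sketch would sort on a dedicated numeric field instead, as suggested above.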
Having trouble getting boosting queries to work with multiple terms
Hello,

The Solr server I am driving is found publicly at http://ikmc.vm.bytemark.co.uk:8983/solr/allele/search ; it contains freely available information from science research establishments.

It contains many documents, and what I usually do is look up all documents where the 'mgi_accession_id' field matches what I want it to. This returns several documents, each one having a 'type' field. The value can be either 'allele', 'mi_attempt' or 'phenotype_attempt'.

What I want to do is return all documents where the 'mgi_accession_id' matches what I want, ordered such that 'type:allele' docs are at the top, followed by 'type:mi_attempt' docs, followed last by 'type:phenotype_attempt' docs.

Here is an example of a query I fire at it:

http://ikmc.vm.bytemark.co.uk:8983/solr/allele/search?q=mgi_accession_id:MGI:1315204&bq=type:allele^100 type:mi_attempt^10 type:phenotype_attempt^1

All the docs end up with the same score! I'm clearly doing something wrong, but what?

Help is appreciated. Thanks in advance.

-- Regards, Asfand Yar Qazi
Re: Having trouble getting boosting queries to work with multiple terms
Hi, thanks for the reply. I tried that:

http://ikmc.vm.bytemark.co.uk:8983/solr/allele/search?q=mgi_accession_id:MGI:1315204&bq=type:allele^100 OR type:mi_attempt^10 OR type:phenotype_attempt^1

(forgive the wrapping) and I got mi_attempt at the top, then the allele, then the phenotype_attempt. It should be allele first, then mi_attempt, then phenotype_attempt.

You can replicate it with the above URL; it is a publicly available index.

Thanks

On 16/10/12 15:37, Tomás Fernández Löbbe wrote:

You are missing the OR between the clauses of the bq. Try with:

bq=type:allele^100 OR type:mi_attempt^10 OR type:phenotype_attempt^1

or set OR as your default operator in the schema.xml

Tomás

On Tue, Oct 16, 2012 at 10:37 AM, Asfand Qazi a...@sanger.ac.uk wrote:

Hello,

The Solr server I am driving is found publicly at http://ikmc.vm.bytemark.co.uk:8983/solr/allele/search ; it contains freely available information from science research establishments.

It contains many documents, and what I usually do is look up all documents where the 'mgi_accession_id' field matches what I want it to. This returns several documents, each one having a 'type' field. The value can be either 'allele', 'mi_attempt' or 'phenotype_attempt'.

What I want to do is return all documents where the 'mgi_accession_id' matches what I want, ordered such that 'type:allele' docs are at the top, followed by 'type:mi_attempt' docs, followed last by 'type:phenotype_attempt' docs.

Here is an example of a query I fire at it:

http://ikmc.vm.bytemark.co.uk:8983/solr/allele/search?q=mgi_accession_id:MGI:1315204&bq=type:allele^100 type:mi_attempt^10 type:phenotype_attempt^1

All the docs end up with the same score! I'm clearly doing something wrong, but what?

Help is appreciated. Thanks in advance.

-- Regards, Asfand Yar Qazi
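The bq string with explicit OR clauses can be generated rather than typed by hand, which also avoids losing the '&' between parameters. The Python sketch below is only an illustration: build_boosted_query and the boosts mapping are hypothetical names, the base URL is the one quoted in the thread, and bq only takes effect if the request handler uses the DisMax or Extended DisMax query parser, which is assumed here rather than confirmed. As reported in the thread, this boost alone did not produce the wanted ordering.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Base URL as quoted in the thread (assumed reachable).
BASE_URL = "http://ikmc.vm.bytemark.co.uk:8983/solr/allele/search"


def build_boosted_query(mgi_accession_id: str, boosts: dict) -> str:
    """Return a request URL whose bq joins one boosted clause per type with OR.

    build_boosted_query is a hypothetical helper written for this example.
    """
    # e.g. "type:allele^100 OR type:mi_attempt^10 OR type:phenotype_attempt^1"
    bq = " OR ".join(f"type:{doc_type}^{boost}" for doc_type, boost in boosts.items())
    params = {
        "q": f"mgi_accession_id:{mgi_accession_id}",
        "bq": bq,
    }
    # urlencode escapes '^' and spaces and separates parameters with '&'.
    return BASE_URL + "?" + urlencode(params)


url = build_boosted_query(
    "MGI:1315204",
    {"allele": 100, "mi_attempt": 10, "phenotype_attempt": 1},
)
print(url)
```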
Re: Having trouble getting boosting queries to work with multiple terms
On 16/10/12 16:15, Walter Underwood wrote:

Why do you want that ordering? That isn't what Solr is designed to do. It is designed for relevance.

I expect that idf (the rarity of the terms) is being used in the ordering. mi_attempt is probably much more rare than allele.

If you want that strict ordering, I recommend doing three queries and concatenating the three result sets.

wunder

I want that ordering because alleles are more 'important' to a biologist than an mi_attempt, which is 'more important' than a phenotype_attempt.

If Solr isn't designed for this kind of stuff, then I will do the sorting manually after I have received all the documents. I could give huge boost values to each term, but then I guess I'm just using a sledgehammer to crack a nut.

Thanks

-- Regards, Asfand Yar Qazi
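The manual sorting mentioned above could look something like the following Python sketch. It assumes the documents have already been fetched and parsed into dictionaries that carry a 'type' key; TYPE_RANK and order_by_type are hypothetical names, and the sample documents are made up for illustration.

```python
# Desired display order of the three document types named in the thread.
TYPE_RANK = {"allele": 0, "mi_attempt": 1, "phenotype_attempt": 2}


def order_by_type(docs):
    """Return docs grouped by type in TYPE_RANK order.

    sorted() is stable, so documents of the same type keep the order in
    which Solr returned them (normally by relevance score).
    Documents with a missing or unknown type are placed last.
    """
    return sorted(docs, key=lambda doc: TYPE_RANK.get(doc.get("type"), len(TYPE_RANK)))


# Made-up sample documents, in the order a search might return them.
docs = [
    {"id": 1, "type": "mi_attempt"},
    {"id": 2, "type": "allele"},
    {"id": 3, "type": "phenotype_attempt"},
    {"id": 4, "type": "allele"},
]
ordered = order_by_type(docs)
print([doc["id"] for doc in ordered])
```

This only works well when the full result set is small enough to fetch in one go, which seems to be the case for lookups by a single accession ID.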
Possible to have Solr documents with deeply nested data structures (i.e. 'hashes within hashes')?
Hi,

Is it possible to have Solr documents with deeply nested data structures? e.g. (in JSON)

{ "name": "Fred", "measurements": { "chest": 15, "legs": 32, ... } }

?

Thanks

-- Regards, Asfand Yar Qazi
Re: Possible to have Solr documents with deeply nested data structures (i.e. 'hashes within hashes')?
On 30/08/12 15:19, Jack Krupansky wrote:

The general rule is that you need to flatten your data. So, you would have chest_measurement and leg_measurement fields.

-- Jack Krupansky

Ah. What if I cannot flatten it because I have an array of hashes?

Thanks

Asfand Yar Qazi

-----Original Message-----
From: Asfand Qazi
Sent: Thursday, August 30, 2012 6:03 AM
To: solr-user@lucene.apache.org
Subject: Possible to have Solr documents with deeply nested data structures (i.e. 'hashes within hashes')?

Hi,

Is it possible to have Solr documents with deeply nested data structures? e.g. (in JSON)

{ "name": "Fred", "measurements": { "chest": 15, "legs": 32, ... } }

?

Thanks

-- Regards, Asfand Yar Qazi
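The flattening rule can be sketched in Python as below. This is only one possible naming scheme: the flatten helper is hypothetical, it joins parent and child keys with an underscore (giving measurements_chest rather than the chest_measurement name suggested above), and it handles nested hashes only. Arrays of hashes, the case raised in the follow-up question, are deliberately left untouched here.

```python
def flatten(record, parent_key="", sep="_"):
    """Collapse nested dictionaries into a single level of field names.

    Nested keys are joined with `sep`, e.g. {"a": {"b": 1}} -> {"a_b": 1}.
    Non-dict values (including lists) are copied through unchanged.
    """
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into the nested hash, carrying the prefix along.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Example record modelled on the one in the question.
doc = {"name": "Fred", "measurements": {"chest": 15, "legs": 32}}
flat_doc = flatten(doc)
print(flat_doc)
```

The resulting field names would still need matching field or dynamicField definitions in the schema.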
Re: Possible to have Solr documents with deeply nested data structures (i.e. 'hashes within hashes')?
On 30/08/12 15:51, Alexandre Rafalovitch wrote:

Don't treat SOLR as your primary database with complex structure, it is not built for that.

On 30/08/12 16:01, Jack Krupansky wrote:

Maybe start by focusing on what you expect that a user query will look like in Solr.

Yeah, thanks guys - I'll remember that. Maybe I'm getting ahead of myself. The consensus around here seems to be to start using multiple cores to hold the different bits of a deeply nested record anyway, but I'll remember the flattening suggestions made.

Thanks

-- Regards, Asfand Yar Qazi
Re: Cannot get highlighting to work
On 31/05/12 21:10, Jack Krupansky wrote:

Try a query that uses a term that doesn't split an alphanumeric term into two terms. Then check to see what field type you used for the symbol and marker_symbol fields and whether the analyzer for that field type has changed in 3.6.

Aha - yes, not using number fields makes the highlighter work.

The analyzer had been changed by another dev (helpfully) for the fields I was trying to highlight to solr.KeywordTokenizerFactory - I changed it back to solr.WhitespaceTokenizerFactory, as it was in the 1.4 config. With a lot of hope I tried to fire the same query, but the exact same thing happened - the highlighting for a document is an empty document (i.e. { } ) just like before.

Any other clues?

Thanks

-- Jack Krupansky

-----Original Message-----
From: Asfand Qazi
Sent: Thursday, May 31, 2012 12:32 PM
To: solr-user@lucene.apache.org
Subject: Cannot get highlighting to work

Hello,

I am having problems doing highlighting on a Solr 3.6 instance, while it was working just fine before on our 1.4 instance.

The solrconfig.xml and schema.xml files are located here:

https://github.com/mpi2/mpi2_solr/blob/master/multicore/main/conf/schema.xml (please note the incorrect line wrapping - it should be on one line)

https://github.com/mpi2/mpi2_solr/blob/master/multicore/main/conf/solrconfig.xml (please note the incorrect line wrapping - it should be on one line)

The query I fire off (which worked on the 1.4 instance) is:

/solr/main/select?q=Cbx1&wt=json&hl=true&hl.fl=*&hl.usePhraseHighlighter=true (please note the incorrect line wrapping - it should be on one line)

I expect a section like:

{ "MGI:105369": { "symbol": [ "<em>Cbx</em><em>1</em>" ], "marker_symbol": [ "<em>Cbx</em><em>1</em>" ] } }

I get:

{ "MGI:105369": { } }

Can anyone help?

Thanks

-- Regards, Asfand Yar Qazi
Re: Cannot get highlighting to work
Ah... on further inspection of the schema, I saw that the field type was a custom one that had been configured differently from the standard 'text' one. I simply got rid of the custom field type and set it back to text. Then, as you said, I reindexed the data (another blunder on my part before).

Now it works!

Thanks

On 01/06/12 13:43, Jack Krupansky wrote:

I got confused in the last paragraph - does a purely alphabetic term get highlighted properly or not? I am trying to figure out if the problem relates only to terms that decompose into phrases (as alphanumeric terms do) or for all terms. Thanks.

If the analyzer changes, the data must be reindexed.

-- Jack Krupansky

-----Original Message-----
From: Asfand Qazi
Sent: Friday, June 01, 2012 5:08 AM
To: solr-user@lucene.apache.org
Subject: Re: Cannot get highlighting to work

On 31/05/12 21:10, Jack Krupansky wrote:

Try a query that uses a term that doesn't split an alphanumeric term into two terms. Then check to see what field type you used for the symbol and marker_symbol fields and whether the analyzer for that field type has changed in 3.6.

Aha - yes, not using number fields makes the highlighter work.

The analyzer had been changed by another dev (helpfully) for the fields I was trying to highlight to solr.KeywordTokenizerFactory - I changed it back to solr.WhitespaceTokenizerFactory, as it was in the 1.4 config. With a lot of hope I tried to fire the same query, but the exact same thing happened - the highlighting for a document is an empty document (i.e. { } ) just like before.

Any other clues?

Thanks

-- Jack Krupansky

-----Original Message-----
From: Asfand Qazi
Sent: Thursday, May 31, 2012 12:32 PM
To: solr-user@lucene.apache.org
Subject: Cannot get highlighting to work

Hello,

I am having problems doing highlighting on a Solr 3.6 instance, while it was working just fine before on our 1.4 instance.

The solrconfig.xml and schema.xml files are located here:

https://github.com/mpi2/mpi2_solr/blob/master/multicore/main/conf/schema.xml (please note the incorrect line wrapping - it should be on one line)

https://github.com/mpi2/mpi2_solr/blob/master/multicore/main/conf/solrconfig.xml (please note the incorrect line wrapping - it should be on one line)

The query I fire off (which worked on the 1.4 instance) is:

/solr/main/select?q=Cbx1&wt=json&hl=true&hl.fl=*&hl.usePhraseHighlighter=true (please note the incorrect line wrapping - it should be on one line)

I expect a section like:

{ "MGI:105369": { "symbol": [ "<em>Cbx</em><em>1</em>" ], "marker_symbol": [ "<em>Cbx</em><em>1</em>" ] } }

I get:

{ "MGI:105369": { } }

Can anyone help?

Thanks

-- Regards, Asfand Yar Qazi
Cannot get highlighting to work
Hello,

I am having problems doing highlighting on a Solr 3.6 instance, while it was working just fine before on our 1.4 instance.

The solrconfig.xml and schema.xml files are located here:

https://github.com/mpi2/mpi2_solr/blob/master/multicore/main/conf/schema.xml (please note the incorrect line wrapping - it should be on one line)

https://github.com/mpi2/mpi2_solr/blob/master/multicore/main/conf/solrconfig.xml (please note the incorrect line wrapping - it should be on one line)

The query I fire off (which worked on the 1.4 instance) is:

/solr/main/select?q=Cbx1&wt=json&hl=true&hl.fl=*&hl.usePhraseHighlighter=true (please note the incorrect line wrapping - it should be on one line)

I expect a section like:

{ "MGI:105369": { "symbol": [ "<em>Cbx</em><em>1</em>" ], "marker_symbol": [ "<em>Cbx</em><em>1</em>" ] } }

I get:

{ "MGI:105369": { } }

Can anyone help?

Thanks

-- Regards, Asfand Yar Qazi
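For reference, the highlighting request in this post can be assembled programmatically so that each parameter stays separate. The Python sketch below only builds the request path; build_highlight_query is a hypothetical helper, the parameter names and values are the ones from the post, and no claim is made about what a given Solr version returns for it.

```python
from urllib.parse import urlencode, parse_qs

# Select handler path as used in the post.
SELECT_PATH = "/solr/main/select"


def build_highlight_query(term: str) -> str:
    """Return the select path with highlighting switched on for all fields.

    build_highlight_query is a hypothetical helper written for this example.
    """
    params = {
        "q": term,
        "wt": "json",                       # JSON response
        "hl": "true",                       # enable highlighting
        "hl.fl": "*",                       # highlight every eligible field
        "hl.usePhraseHighlighter": "true",  # phrase-aware highlighting
    }
    # urlencode escapes '*' and joins the parameters with '&'.
    return SELECT_PATH + "?" + urlencode(params)


path = build_highlight_query("Cbx1")
print(path)
```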