Faceting within groups
Hi, I am not sure if faceting within groups is supported. The documentation does seem to suggest it works, but I can't seem to get the intended results. My request parameters:

<str name="q">(Amazon Cloud OR (IBM Cloud)</str>
<str name="group.field">sourceId</str>
<str name="facet.field">sentiment</str>
<str name="group">true</str>
<str name="group.facet">true</str>

Also, if it works, does SolrCloud support it? Regards, Ayush
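For reference, the same parameters expressed as a single request URL might look like this (host and collection name are assumptions, and the query is shown unencoded for readability); note that group.facet=true also needs faceting itself enabled via facet=true:

```
http://localhost:8983/solr/collection1/select?q=(Amazon Cloud) OR (IBM Cloud)&group=true&group.field=sourceId&facet=true&facet.field=sentiment&group.facet=true
```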
Re: Function query matching
But for your specific goal Peter: Yes, if the whole point of a function you have is to generate a scaled score of your base $qq, ... Thanks for the confirmation, Chris. So, to do this efficiently, I think I need to implement a custom Collector that performs the scaling (and other math) after collecting the matching dismax query docs. I started a separate thread asking about the state of configurable collectors. Thanks, Peter

On Sat, Dec 7, 2013 at 1:45 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

I had to do a double take when i read this sentence...

: Even with any improvements to 'scale', all function queries will add a
: linear increase to the Qtime as index size increases, since they match all
: docs.

...because that smelled like either a bug in your methodology, or a bug in Solr. To convince myself there wasn't a bug in Solr, i wrote a test case (i'll commit tomorrow, bunch of churn in svn right now making ant precommit unhappy) to prove that when wrapping boost functions around queries, Solr will only evaluate the functions for docs matching the wrapped query -- so there is no linear increase as the index size increases, just the (necessary) linear increase as the number of *matching* docs grows. (For most functions anyway -- as mentioned, scale is special.) BUT! ... then i remembered how this thread started, and your goal of scaling the scores from a wrapped query. I want to be clear for the 99% of people reading this: if you find yourself writing a query structure like this...

q={!func}..functions involving wrapping $qq ...
qq={!edismax ...lots of stuff but still only matching subset of the index...}
fq={!query v=$qq}

...try to restructure the match you want to do into the form of a multiplier...

q={!boost b=$b v=$qq}
b=...functions producing a score multiplier...
qq={!edismax ...lots of stuff but still only matching subset of the index...}

...because the latter case is much more efficient and Solr will only compute the function values for the docs it needs to (that match the wrapped $qq query). But for your specific goal Peter: Yes, if the whole point of a function you have is to generate a scaled score of your base $qq, then the function (wrapping the scale(), wrapping the query()) is going to have to be evaluated for every doc -- that will definitely be linear based on the size of the index. -Hoss http://www.lucidworks.com/
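Filled in with concrete (invented) field names and functions, the two shapes Hoss contrasts might look like:

```
# Function query as the main query: the function itself matches all
# docs, so it is evaluated across the whole index
q={!func}scale(query($qq),0,100)
qq={!edismax qf='title^2 body'}ipod nano
fq={!query v=$qq}

# Boost form: the multiplier b is computed only for docs matching $qq
q={!boost b=$b v=$qq}
b=recip(ms(NOW,last_modified),3.16e-11,1,1)
qq={!edismax qf='title^2 body'}ipod nano
```

The recip()/ms() multiplier here is just an example of a boost function; the point is the {!boost} structure, not the particular function.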
Re: [Spellcheck] NullPointerException on QueryComponent.mergeIds
James, Sorry for the late response. The shards.qt parameter actually solved my problem! Thanks Jean-Marc 2013/11/12 Dyer, James james.d...@ingramcontent.com Jean-Marc, This might not solve the particular problem you're having, but to get spellcheck to work properly in a distributed environment, be sure to set the shards.qt parameter to the name of your request handler. See http://wiki.apache.org/solr/SpellCheckComponent#Distributed_Search_Support. James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: Jean-Marc Desprez [mailto:jm.desp...@gmail.com] Sent: Tuesday, November 12, 2013 8:57 AM To: solr-user@lucene.apache.org Subject: [Spellcheck] NullPointerException on QueryComponent.mergeIds Hello, I'm following this tutorial: http://wiki.apache.org/solr/SolrCloud with Solr 4.5.0. I'm at the very first step, only two replicas and two shards, and I have only *one* document in the index. When I try to get a spellcheck, I get this error: java.lang.NullPointerException at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:843) I do not understand what I'm doing wrong, or how I can get an error in mergeIds with only one document in the index (merge this doc with ... ??). Some technical details: URL: http://127.0.0.1:8983/solr/bench/select?shards.qt=ri_spell_fr_FR&q=sistem&distrib=true If I set distrib to false, no error.
My uniqueKey is indexed and stored:

<field name="ref" type="string" indexed="true" stored="true" multiValued="false"/>
<uniqueKey>ref</uniqueKey>

My conf:

<requestHandler name="ri_spell_fr_FR" class="solr.SearchHandler" lazy="true">
  <lst name="defaults">
    <bool name="spellcheck">true</bool>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">3</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.dictionary">ri_spell_fr_FR</str>
    <str name="spellcheck.build">false</str>
  </lst>
  <arr name="components">
    <str>spellcheck_fr_FR</str>
  </arr>
</requestHandler>

<searchComponent name="spellcheck_fr_FR" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">suggest_fr_FR</str>
  <lst name="spellchecker">
    <str name="name">ri_spell_fr_FR</str>
    <str name="field">spell_fr_FR</str>
    <str name="spellcheckIndexDir">./spellchecker_fr_FR</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
  </lst>
  ...
</searchComponent>

With this URL: http://127.0.0.1:8983/solr/bench/select?qt=ri_spell_fr_FR&q=sistem I have no error but the response is empty:

<response><lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst></response>

Thanks Jean-Marc
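One way to bake James's advice into the configuration itself (a sketch reusing the handler name from this thread) is to set shards.qt as a default on the handler, so distributed requests don't depend on the client passing it:

```
<requestHandler name="ri_spell_fr_FR" class="solr.SearchHandler" lazy="true">
  <lst name="defaults">
    <str name="shards.qt">ri_spell_fr_FR</str>
    <!-- ... spellcheck defaults as above ... -->
  </lst>
</requestHandler>
```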
luke 4.6.0 released
Just released luke 4.6.0 for the latest Lucene 4.6.0: https://github.com/DmitryKey/luke/releases/tag/4.6.0 -- Dmitry Kan Blog: http://dmitrykan.blogspot.com Twitter: twitter.com/dmitrykan
Re: post filtering for boolean filter queries
On Thu, Dec 5, 2013 at 4:49 PM, Yonik Seeley yo...@heliosearch.com wrote: On Thu, Dec 5, 2013 at 7:39 AM, Dmitry Kan solrexp...@gmail.com wrote: Thanks Erick! To be sure, we are using cost 101 and no cache. It seems to affect searches as we expected. Basically, with cache on we see fatter spikes around commit points, as the cache is getting flushed (we don't rerun too many entries from the old cache). But when post-filtering is involved, those spikes are thinner, though the rest of the queries take about 2 seconds longer (our queries are pretty heavy-duty stuff). So post-filtering gives an option of making trade-offs between query times for all users during normal execution and query times during commits. To rephrase, we have 2 options: 1. Make all searches somewhat slower for all users and avoid really slow searches around commit points: post-filtering option OR 2. Make the majority of searches really fast, but really slow around commit points: normal with-cache option OR 3. Use warming queries or auto-warming of caches to make all searches fast but the commits themselves slow. thanks Yonik. This is indeed what we tried originally. But, as I briefly described at Dublin's Stump the Chump, auto-warming takes way too long and does not complete within up to an hour, so the next commit kicks in, and so on. So we opted for an external automatic warming. -Yonik http://heliosearch.com -- making solr shine -- Dmitry Blog: http://dmitrykan.blogspot.com Twitter: twitter.com/dmitrykan
Re: post filtering for boolean filter queries
bq. How slow is around commit points really slow? You could at least lessen the pain here by committing less often if you can stand the latency They are shamelessly slow, like 60-70 seconds, while normal searches are within the 1-3 second range. And, yes, your idea is right and is what we are pursuing: fewer commits. However, we do have shards that are hot because we need to keep them that hot, i.e. we commit as often as data arrives. This is where the slow searches pop up. bq. Often users are more disturbed by getting (numbers from thin air) 2 second responses occasionally spiking to 20 seconds with an average of 3 seconds than getting all responses between 4 and 6 seconds with an average of 5. yes, I believe so too. So at the moment, the call for using post-filtering or cache is more or less for business folks to make. We have been looking into other things, like making our shards as small as possible. This is a parallel route to making our cache efficient. Thanks, Dmitry On Thu, Dec 5, 2013 at 3:59 PM, Erick Erickson erickerick...@gmail.com wrote: bq: To be sure we are using cost 101 and no cache The guy who wrote the code is really good, but I'm paranoid too so I use 101. Based on the number of off-by-one errors I've coded :)... How slow is around commit points really slow? You could at least lessen the pain here by committing less often if you can stand the latency But otherwise you've pretty much nailed your options. One approach is to give users _predictable_ responses, not necessarily the best average. Often users are more disturbed by getting (numbers from thin air) 2 second responses occasionally spiking to 20 seconds with an average of 3 seconds than getting all responses between 4 and 6 seconds with an average of 5. FWIW, Erick On Thu, Dec 5, 2013 at 7:39 AM, Dmitry Kan solrexp...@gmail.com wrote: Thanks Erick! To be sure we are using cost 101 and no cache. It seems to affect searches as we expected.
Basically with cache on we see more fat spikes around commit points, as cache is getting flushed (we don't rerun too many entries from old cache). But when the post-filtering is involved, those spikes are thinner, but the rest of the queries take about 2 seconds longer (our queries are pretty heavy duty stuff). So the post-filtering gives an option of making trade-offs between query times for all users during normal execution and query times during commits. To rephrase we have 2 options: 1. Make all searches somewhat slower for all users and avoid really slow searches around commit points: post-filtering option OR 2. Make majority of searches really fast, but around commit points really slow: normal with cache option Dmitry On Wed, Dec 4, 2013 at 3:34 PM, Erick Erickson erickerick...@gmail.com wrote: OK, so cache=false and cost=100 should do it, see: http://searchhub.org/2012/02/22/custom-security-filtering-in-solr/ Best, Erick On Wed, Dec 4, 2013 at 5:56 AM, Dmitry Kan solrexp...@gmail.com wrote: Thanks Yonik. For our use case, we would like to skip caching only one particular filter cache, yet apply a high cost for it to make sure it executes last of all filter queries. So this means, the rest of the fqs will execute and cache as usual. On Tue, Dec 3, 2013 at 9:58 PM, Yonik Seeley yo...@heliosearch.com wrote: On Tue, Dec 3, 2013 at 4:45 AM, Dmitry Kan solrexp...@gmail.com wrote: ok, we were able to confirm the behavior regarding not caching the filter query. It works as expected. It does not cache with {!cache=false}. We are still looking into clarifying the cost assignment: i.e. whether it works as expected for long boolean filter queries. 
Yes, filters should be ordered by cost (cheapest first) whenever you use {!cache=false} -Yonik http://heliosearch.com -- making solr shine -- Dmitry Blog: http://dmitrykan.blogspot.com Twitter: twitter.com/dmitrykan
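Pulling the thread's conclusion together, the resulting filter setup looks roughly like this (field names and clauses invented for illustration):

```
# Cheap filters: cached and executed first, as usual
fq=type:document
fq=lang:en

# Expensive boolean filter: {!cache=false} skips the filterCache, and a
# cost of 100 or more turns it into a post filter, consulted only for
# docs that already match the query and the other filters
fq={!cache=false cost=101}(acl:groupA OR acl:groupB OR acl:groupC)
```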
simple tokenizer question
Hi, I am new to Solr and I guess this is a basic tokenizer question, so please bear with me. I am trying to use Solr to index a few (Indian) legal judgments in text form and search against them. One of the key points with these documents is that the sections/provisions of law usually have punctuation/special characters in them. For example, search queries will TYPICALLY be section 12AA, section 80-IA, section 9(1)(vii), and the text of the judgments themselves will contain this sort of text, with section references all over the place. Now, using a default schema setup with StandardTokenizer, which seems to delimit on whitespace AND punctuation, I get really bad results because it looks like 12AA is split and results having 12 and AA in them turn up. It becomes worse with 9(1)(vii), with results containing 9 and 1 etc. being turned up. What is the best solution here? I really just want to index the document as-is and also to do whitespace tokenizing on the search and nothing more. So in other words: a) I would like the text document to be indexed as-is, with say 12AA and 9(1)(vii) stored in the document as mentioned. b) I would like to be able to search for 12AA and for 9(1)(vii) and get proper full matches on them without any splitting up/munging etc. Any suggestions are appreciated. Thank you for your time. Thanks Vulcanoid
Re: Function query matching
(This is why i shouldn't send emails just before going to bed.) I woke up this morning realizing that of course I was completely wrong when i said this...

: I want to be clear for 99% of the people reading this, if you find
: yourself writing a query structure like this...
:
: q={!func}..functions involving wrapping $qq ...
...
: ...Try to restructure the match you want to do into the form of a
: multiplier
...
: Because the latter case is much more efficient and Solr will only compute
: the function values for the docs it needs to (that match the wrapped $qq
: query)

The reason i was wrong... Even though function queries do by default match all documents, and even if the main query is a function query (ie: q={!func}...), if there is an fq that filters down the set of documents, then the (main) function query will only be calculated for the documents that match the filter. It was trivial to amend the test i mentioned last night to show this (and i feel silly for not doing that last night and stopping myself from saying something foolish)... https://svn.apache.org/viewvc?view=revision&revision=r1548955 The bottom line for Peter is still the same: using scale() wrapped around a function/query does involve computing the results for every document, and that is going to scale linearly as the size of the index grows -- but it is *only* because of the scale function. -Hoss http://www.lucidworks.com/
solr.xml
We are having issues with SWAP CoreAdmin in 4.5.1 and 4.6. Using legacy solr.xml we issue a SWAP, and we want it persistent. It has been running flawlessly since 4.5. Now it creates duplicate lines in solr.xml. Even the example multi-core schema in 4.5.1 doesn't work with persistent=true - it creates duplicate lines in solr.xml:

<cores adminPath="/admin/cores">
  <core name="autosuggest" loadOnStartup="true" instanceDir="autosuggest" transient="false"/>
  <core name="citystateprovider" loadOnStartup="true" instanceDir="citystateprovider" transient="false"/>
  <core name="collection1" loadOnStartup="true" instanceDir="collection1" transient="false"/>
  <core name="facility" loadOnStartup="true" instanceDir="facility" transient="false"/>
  <core name="inactiveproviders" loadOnStartup="true" instanceDir="inactiveproviders" transient="false"/>
  <core name="linesvcgeo" instanceDir="linesvcgeo" loadOnStartup="true" transient="false"/>
  <core name="linesvcgeofull" instanceDir="linesvcgeofull" loadOnStartup="true" transient="false"/>
  <core name="locationgeo" loadOnStartup="true" instanceDir="locationgeo" transient="false"/>
  <core name="market" loadOnStartup="true" instanceDir="market" transient="false"/>
  <core name="portalprovider" loadOnStartup="true" instanceDir="portalprovider" transient="false"/>
  <core name="practice" loadOnStartup="true" instanceDir="practice" transient="false"/>
  <core name="provider" loadOnStartup="true" instanceDir="provider" transient="false"/>
  <core name="providersearch" loadOnStartup="true" instanceDir="providersearch" transient="false"/>
  <core name="tridioncomponents" loadOnStartup="true" instanceDir="tridioncomponents" transient="false"/>
  <core name="linesvcgeo" instanceDir="linesvcgeo" loadOnStartup="true" transient="false"/>
  <core name="linesvcgeofull" instanceDir="linesvcgeofull" loadOnStartup="true" transient="false"/>
</cores>

(Note the duplicated linesvcgeo and linesvcgeofull entries.) -- Bill Bell billnb...@gmail.com cell 720-256-8076
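For context, the SWAP in question is the standard CoreAdmin call, along these lines (host and port are assumptions; the core names are taken from the solr.xml above):

```
http://localhost:8983/solr/admin/cores?action=SWAP&core=linesvcgeo&other=linesvcgeofull
```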
Re: simple tokenizer question
Have you tried a WhitespaceTokenizerFactory followed by the WordDelimiterFilterFactory? The latter is perhaps more configurable at what it does. Alternatively, you could use a RegexFilterFactory to remove extraneous punctuation that wasn't removed by the Whitespace Tokenizer. Upayavira On Sat, Dec 7, 2013, at 06:15 PM, Vulcanoid Developer wrote: ...
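A minimal schema sketch along the lines Upayavira suggests (the type name is invented, and the WordDelimiterFilterFactory options shown are one possible starting point, not a recommendation):

```
<fieldType name="text_legal" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- preserveOriginal="1" keeps "12AA" and "9(1)(vii)" intact as
         single tokens; the generate*/split* options are switched off so
         no extra sub-tokens are produced -->
    <filter class="solr.WordDelimiterFilterFactory"
            preserveOriginal="1"
            generateWordParts="0"
            generateNumberParts="0"
            splitOnNumerics="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With everything switched off this behaves close to plain whitespace tokenization; re-enabling options like generateWordParts later would additionally let partial tokens such as 12 or AA match, if that ever becomes desirable.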
How to boost documents with all the query terms
Hi: I'm using Solr 3.6 with the dismax query parser. I've found that docs that don't have all the query terms get ranked above others that contain all the terms in the search query. Using debugQuery I could see that most of the score in these cases comes from the coord(q,d) factor. Is there any way I could boost the documents that contain all the search query terms? Greetings! III International Winter School at UCI, February 17-28, 2014. See www.uci.cu
Re: How to boost documents with all the query terms
Hi, Jorge, Here is a similar discussion: http://search-lucene.com/m/nK6t9j1fuc2/ On Sunday, December 8, 2013 2:48 AM, Ing. Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu wrote: ...
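One common way to get this effect with (e)dismax is the mm (minimum-should-match) parameter; a sketch (query and field names invented):

```
q=section 12AA deduction
defType=dismax
qf=title^2 body
mm=100%
```

mm=100% requires every term to match, which excludes partial matches rather than merely demoting them; a softer setting such as mm=2<75% relaxes the requirement for longer queries. If the goal is to boost full matches rather than exclude partial ones, a bq (boost query) on the full phrase is another option.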