Re: block join and atomic updates
But isn't query-time join much slower when it comes to a large number of documents? Quoting Mikhail Khludnev mkhlud...@griddynamics.com: Hello, It sounds like you need to switch to query-time join. On 15.02.2014 at 21:57, m...@preselect-media.com wrote: Any suggestions? Quoting m...@preselect-media.com: Yonik Seeley yo...@heliosearch.com: On Thu, Feb 13, 2014 at 8:25 AM, m...@preselect-media.com wrote: Is there any workaround to perform atomic updates on blocks, or do I have to re-index the parent document and all its children again every time I want to update a field? The latter, unfortunately. Is there any plan to change this behavior in the near future? So, I'm thinking of alternatives that don't lose the benefit of block join. Let me explain an idea I just thought of: let's say I have a parent document A with a number of fields I want to update regularly, and a number of child documents AC_1 ... AC_n which are indexed only once and aren't going to change anymore. If I index A and AC_* in a block and then update A, the block is gone. But if I create an additional document AF which contains only something like a foreign key to A, and index AF + AC_* as a block (not A + AC_* anymore), could I perform a {!parent ...} query on AF + AC_* and then do a join from the results to get A? Does this make any sense, and is it even possible? ;-) And if it's possible, how can I do it? Thanks, - Moritz
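A sketch of how the two query parsers could be combined for Moritz's AF scheme — the field names a_id, doc_type and child_field are hypothetical, and this assumes Solr's {!join} and {!parent} parsers:

```
q={!join from=a_id to=id v=$child_q}
child_q={!parent which="doc_type:AF"}child_field:some_value
```

The {!parent} part selects the AF block parents whose children match; the outer {!join} then maps their a_id values onto the real A documents, which can be updated atomically because they live outside the block. Whether this performs acceptably at scale is exactly the concern raised above.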
RE: query parameters
It seems that fq doesn't accept OR, because: (organisations:(150 OR 41) AND roles:(174)) OR (-organisations:[* TO *] AND -roles:[* TO *]) only returns docs that match the first condition; it doesn't return any docs with empty organisations and roles fields. -----Original Message----- From: Andreas Owen [mailto:a...@conx.ch] Sent: Monday, 17 February 2014 05:08 To: solr-user@lucene.apache.org Subject: query parameters In the solrconfig of my Solr 4.3 I have a user-defined requestHandler. I would like to use fq to force the following conditions: 1: organisations is empty and roles is empty 2: organisations contains one of the comma-delimited list in variable $org 3: roles contains one of the comma-delimited list in variable $r 4: rules 2 and 3. Snippet of what I have (I haven't checked whether there is an IN operator like in SQL for the list value):

<lst name="defaults">
  <str name="echoParams">explicit</str>
  <int name="rows">10</int>
  <str name="defType">edismax</str>
  <str name="synonyms">true</str>
  <str name="qf">plain_text^10 editorschoice^200 title^20 h_*^14 tags^10 thema^15 inhaltstyp^6 breadcrumb^6 doctype^10 contentmanager^5 links^5 last_modified^5 url^5</str>
  <str name="fq">(organisations='' roles='') or (organisations=$org roles=$r) or (organisations='' roles=$r) or (organisations=$org roles='')</str>
  <str name="bq">(expiration:[NOW TO *] OR (*:* -expiration:*))^6</str> <!-- tested: now or newer or empty gets small boost -->
  <str name="bf">div(clicks,max(displays,1))^8</str> <!-- tested -->
Re: Facet cache issue when deleting documents from the index
In the end the problem was actually in my code... sorry for the noise. The documents were deleted from my database but not from the Solr index, and I have a display filter that filters out search results corresponding to documents that no longer exist in the database, but this filter doesn't update the facets. Thanks for the help, Marius On Mon, Feb 17, 2014 at 10:52 PM, Marius Dumitru Florea mariusdumitru.flo...@xwiki.com wrote: I tried to set the expungeDeletes flag but it didn't fix the problem. The SolrServer doesn't expose a way to set this flag, so I had to use: new UpdateRequest().setAction(UpdateRequest.ACTION.COMMIT, true, true, 1, true).process(solrServer); Any other hints? Note that I managed to run my test in my real environment at runtime and it passed, so it seems the behaviour depends on the size of the documents that are committed (added to or deleted from the index). Thanks, Marius On Mon, Feb 17, 2014 at 2:32 PM, Marius Dumitru Florea mariusdumitru.flo...@xwiki.com wrote: On Mon, Feb 17, 2014 at 2:00 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi, Also I noticed that in your code snippet you have server.delete(foo), which does not exist; deleteById and deleteByQuery are the methods defined in the SolrServer implementation. Yes, sorry, I have a wrapper over the SolrInstance that doesn't do much; in the case of delete it just forwards the call to deleteById. I'll check the expungeDeletes=true flag and post back the results. Thanks, Marius On Monday, February 17, 2014 1:42 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi Marius, Facets are computed from indexed terms. Can you commit with the expungeDeletes=true flag? Ahmet On Monday, February 17, 2014 12:17 PM, Marius Dumitru Florea mariusdumitru.flo...@xwiki.com wrote: Hi guys, I'm using Solr 4.6.1 (embedded) and for some reason the facet cache is not invalidated when documents are deleted from the index.
Sadly, for me, I cannot reproduce this issue with an integration test like this:

--8<--
SolrInstance server = getSolrInstance();
SolrInputDocument document = new SolrInputDocument();
document.setField("id", "foo");
document.setField("locale", "en");
server.add(document);
server.commit();
document = new SolrInputDocument();
document.setField("id", "bar");
document.setField("locale", "en");
server.add(document);
server.commit();
SolrQuery query = new SolrQuery("*:*");
query.set("facet", "on");
query.set("facet.field", "locale");
QueryResponse response = server.query(query);
Assert.assertEquals(2, response.getResults().size());
FacetField localeFacet = response.getFacetField("locale");
Assert.assertEquals(1, localeFacet.getValues().size());
Count en = localeFacet.getValues().get(0);
Assert.assertEquals("en", en.getName());
Assert.assertEquals(2, en.getCount());
server.delete("foo");
server.commit();
response = server.query(query);
Assert.assertEquals(1, response.getResults().size());
localeFacet = response.getFacetField("locale");
Assert.assertEquals(1, localeFacet.getValues().size());
en = localeFacet.getValues().get(0);
Assert.assertEquals("en", en.getName());
Assert.assertEquals(1, en.getCount());
--8<--

Nevertheless, when I do the 'same' in my real environment, the count for the locale facet remains 2 after one of the documents is deleted. The search result count is fine, which is why I think it's a facet cache issue. Note that the facet count remains 2 even after I restart the server, so the cache is persisted on the file system. Strangely, the facet count is updated correctly if I modify the document instead of deleting it (i.e. removing a keyword from the content so that it isn't matched by the search query any more). So it looks like only delete triggers the issue. Now, an interesting fact is that if, in my real environment, I delete one of the documents and then add a new one, the facet count becomes 3.
So the last commit to the index, which inserts a new document, doesn't trigger a re-computation of the facet cache; the previous facet cache is simply incremented, so the error is perpetuated. At this point I don't even know how to fix the facet cache without deleting the Solr data folder so that the full index is rebuilt. I'm still trying to figure out what the difference is between the integration test and my real environment (as I used the same schema and configuration). Do you know what might be wrong? Thanks, Marius
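For reference, the expungeDeletes flag mentioned in this thread can also be passed as a raw parameter on an update request, without going through UpdateRequest.setAction; a sketch, assuming a core named collection1:

```
/solr/collection1/update?commit=true&expungeDeletes=true
```

Facets are computed from indexed terms, and the terms of deleted documents only disappear when the segments containing them are merged, which is what expungeDeletes forces on commit.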
Re: query parameters
That could be because the second condition does not do what you think it does... have you tried running the second condition separately? You may have to add a base term to the second condition, like what you have for the bq parameter in your config file; i.e., something like (*:* -organisations:[* TO *] -roles:[* TO *]) On Tue, Feb 18, 2014 at 12:16 PM, Andreas Owen a...@conx.ch wrote: It seems that fq doesn't accept OR, because: (organisations:(150 OR 41) AND roles:(174)) OR (-organisations:[* TO *] AND -roles:[* TO *]) only returns docs that match the first condition; it doesn't return any docs with empty organisations and roles fields.
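To make that suggestion concrete: purely negative clauses select nothing on their own inside a disjunction, so the empty-fields branch needs the *:* base term. A sketch of the corrected filter, using the field names from this thread:

```
fq=(organisations:(150 OR 41) AND roles:(174)) OR (*:* -organisations:[* TO *] -roles:[* TO *])
```

The *:* matches all documents, and the negative clauses then subtract everything that has a value in either field, leaving exactly the docs where both fields are empty.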
Re: block join and atomic updates
Absolutely. On Tue, Feb 18, 2014 at 1:20 PM, m...@preselect-media.com wrote: But isn't query-time join much slower when it comes to a large number of documents? -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Indexed a new big database while the old is running?
Dear Solr Users, We currently have a Solr db with around 88 000 000 docs. All works fine :) Each year we receive a new backfile with the same content (but improved). Indexing these docs takes several days in Solr, so is it possible to create a new collection (restarting Solr) and index these new 88 000 000 docs without stopping the current collection? We have around 1 million connections per month. Do you think this new indexing run may cause problems for the Solr in use? Note: the new database will not be used until the current collection is stopped. Thanks for your comments, Bruno
Fault Tolerant Technique of Solr Cloud
Hi All, I want to have a clear idea about the fault-tolerance capability of SolrCloud. Consider that I have set up SolrCloud with an external ZooKeeper, 2 shards, each having a replica, with a single collection, as given in the official Solr documentation: https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud

         *Collection1*
        /            \
   *Shard 1*      *Shard 2*
localhost:8983  localhost:7574
localhost:8900  localhost:7500

I indexed some documents, and then if I shut down any of the replicas or leaders, say for example *localhost:8900*, I can't query the collection on that particular port: http://*localhost:8900*/solr/collection1/select?q=*:* Then how is it fault tolerant, or how does the query have to be made? Regards
Re: Best way to copy data from SolrCloud to standalone Solr?
Hi Shawn, Michael, Many thanks for your responses - we're going to try the replication/backup command, as we're thinking this is a 'two birds with one stone' approach which will not only allow us to copy the indexes, but also help with backups in SolrCloud as well. Thanks again to you both! Best wishes, Daniel On 17/02/2014 20:25, Michael Della Bitta wrote: I do know for certain that the backup command on a cloud core still works. We have a script like this running on a cron to snapshot indexes: curl -s 'http://localhost:8080/solr/#{core}/replication?command=backup&numberToKeep=4&location=/tmp' (not really using /tmp for this, parameters changed to protect the guilty) The admin handler for replication doesn't seem to be there, but the actual API seems to work normally. Michael Della Bitta Applications Developer o: +1 646 532 3062 appinions inc. The Science of Influence Marketing 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts w: appinions.com http://www.appinions.com/ On Mon, Feb 17, 2014 at 2:02 PM, Shawn Heisey s...@elyograg.org wrote: On 2/17/2014 8:32 AM, Daniel Bryant wrote: I have a production SolrCloud server which has multiple sharded indexes, and I need to copy all of the indexes to a (non-cloud) Solr server within our QA environment. Can I ask for advice on the best way to do this please? I've searched the web and found solr2solr (https://github.com/dbashford/solr2solr), but the author states that this is best for small indexes, and ours are rather large at ~20Gb each. I've also looked at replication, but can't find a definitive reference on how this should be done between SolrCloud and Solr? Any guidance is very much appreciated. If the master index isn't changing at the time of the copy, and you're on a non-Windows platform, you should be able to copy the index directory directly.
On a Windows platform, whether you can copy the index while Solr is using it would depend on how Solr/Lucene opens the files. A typical Windows file open will prevent anything else from opening them, and I do not know whether Lucene is smarter than that. SolrCloud requires the replication handler to be enabled in all configs, but during normal operation it does not actually use replication. This is a confusing thing for some users. I *think* you can configure the replication handler on slave cores with a non-cloud config that points at the master cores, and it should replicate the main Lucene index, but not the config files. I have no idea whether things will work right if you configure other master options like replicateAfter and config files, and I also don't know if those options might cause problems for SolrCloud itself. Those options shouldn't be necessary for just getting the data into a dev environment, though. Thanks, Shawn -- *Daniel Bryant | Software Development Consultant | www.tai-dev.co.uk http://www.tai-dev.co.uk/* daniel.bry...@tai-dev.co.uk mailto:daniel.bry...@tai-dev.co.uk | +44 (0) 7799406399 | Twitter: @taidevcouk https://twitter.com/taidevcouk
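A sketch of the slave-side replication config Shawn describes, for a non-cloud core pulling from a SolrCloud core; the master URL and poll interval here are placeholders:

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- placeholder: point at the SolrCloud core to copy from -->
    <str name="masterUrl">http://prod-solr:8983/solr/core1</str>
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>
```

As noted above, this only pulls the main Lucene index, not the config files, and the replicateAfter/confFiles master options are best left off the SolrCloud side.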
Re: Increasing number of SolrIndexSearcher (Leakage)?
On Mon, Feb 17, 2014 at 1:34 AM, Nguyen Manh Tien tien.nguyenm...@gmail.com wrote: - *But after I index some docs and run softCommit or hardCommit with openSearcher=false, the number of SolrIndexSearchers increases by 1* This is fine... it's more of an internal implementation detail (we open what is called a real-time searcher so we can drop some other data structures, like the list of non-visible document updates, etc). If you did the commit again, the count should not continue to increase. If the number of searchers continues to increase, you have a searcher leak due to something else. Are you using any custom components or anything else that isn't stock Solr? -Yonik http://heliosearch.org - native off-heap filters and fieldcache for solr
RE: Boost Query Example
Hi Michael, Thanks for the information. Now I am trying with the query below, but I am not getting the sequence in order: the SKU with 223-CL10V3 should list first (exact match); the ManufactureNumber with 223-CL10V3 should list second (exact match) if the first is available, and if not, the ManufactureNumber doc should be first in the list; SKU with 223-CL10V3* should list third (starts with the number), and if neither SKU nor ManufactureNumber matches exactly, this should be first. Can you check the query below, rewrite it, or point me at some references? The query below is not returning results the way it should: http://localhost:8983/solr/SRSFR_ProductCollection/select?q=SKU:223-CL10V3^10%20OR%20ManufactureNumber:223-CL10V3^5%20OR%20SKU:223-CL10V3*^1&wt=json&indent=true -----Original Message----- From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com] Sent: Monday, February 17, 2014 4:12 PM To: solr-user@lucene.apache.org Subject: Re: Boost Query Example Hi, Filter queries don't affect score, so boosting won't have an effect there. If you want those query terms to get boosted, move them into the q parameter. http://wiki.apache.org/solr/CommonQueryParameters#fq Hope that helps! Michael Della Bitta Applications Developer o: +1 646 532 3062 appinions inc. The Science of Influence Marketing 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts w: appinions.com http://www.appinions.com/ On Mon, Feb 17, 2014 at 3:49 PM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote: Hi, can someone help me with a boost sort query example? http://localhost:8983/solr/ProductCollection/select?q=*%3A*&wt=json&indent=true&fq=SKU:223-CL10V3^100 OR SKU:223-CL1^90 There is no difference in the query order; let me know if I am missing something. Also, I'd like to order with the exact match for SKU:223-CL10V3^100 first. Thanks Ravi
Re: Boost Query Example
Add debugQuery=true to your queries and look at the scoring in the explain section. From the intermediate scoring by field, you should be able to do the math to figure out what boost would be required to rank your exact match high enough. -- Jack Krupansky
Re: Limit amount of search result
You are welcome! On Mon, Feb 17, 2014 at 11:07 PM, rachun rachun.c...@gmail.com wrote: Hi Samee, Thank you very much for your suggestion. I got it working now ;) Chun.
RE: Boost Query Example
I don't have much experience with this boosting; can you explain with an example? Your help is really appreciated. --Ravi -----Original Message----- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Tuesday, February 18, 2014 9:58 AM To: solr-user@lucene.apache.org Subject: Re: Boost Query Example Add debugQuery=true to your queries and look at the scoring in the explain section. From the intermediate scoring by field, you should be able to do the math to figure out what boost would be required to rank your exact match high enough. -- Jack Krupansky
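A hedged sketch of what moving the clauses into q (per Michael) plus boost tuning (per Jack) might look like; the boost values are illustrative, and the exact-match clauses are quoted as phrases so the hyphenated SKU survives analysis:

```
/select?q=SKU:"223-CL10V3"^100 OR ManufactureNumber:"223-CL10V3"^50 OR SKU:223-CL10V3*&debugQuery=true&wt=json&indent=true
```

The debugQuery explain output shows each clause's contribution to the score, from which the boosts can be adjusted until the exact SKU match reliably ranks first.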
Re: Indexed a new big database while the old is running?
On 2/18/2014 5:28 AM, Bruno Mannina wrote: We currently have a Solr db with around 88 000 000 docs. All works fine :) Each year we receive a new backfile with the same content (but improved). Indexing these docs takes several days in Solr, so is it possible to create a new collection (restarting Solr) and index these new 88 000 000 docs without stopping the current collection? We have around 1 million connections per month. Do you think this new indexing run may cause problems for the Solr in use? Note: the new database will not be used until the current collection is stopped. You can instantly switch between collections by using the alias feature. To do this, you would have collections named something like test201302 and test201402, then you would create an alias named 'test' that points to one of these collections. Your code can use 'test' as the collection name. Without a lot more information, it's impossible to say whether building a new collection will cause performance problems for the existing collection. It does seem like a problem that rebuilding the index takes several days; you might already be having performance problems. It's also possible that there's an aspect to this that I am not seeing, and that several days is perfectly normal for YOUR index. Not enough RAM is the most common reason for performance issues on a large index: http://wiki.apache.org/solr/SolrPerformanceProblems Thanks, Shawn
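A sketch of the alias calls Shawn describes, using his example collection names against the Collections API:

```
/solr/admin/collections?action=CREATEALIAS&name=test&collections=test201302
/solr/admin/collections?action=CREATEALIAS&name=test&collections=test201402
```

Re-running CREATEALIAS with the new collection name repoints 'test' once the new index is fully built, so clients querying 'test' switch over without a restart.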
Re: Fault Tolerant Technique of Solr Cloud
On 2/18/2014 6:05 AM, Vineet Mishra wrote: *Shard 1*: localhost:8983, localhost:8900; *Shard 2*: localhost:7574, localhost:7500. I indexed some documents, and then if I shut down any of the replicas or leaders, say for example *localhost:8900*, I can't query the collection on that particular port: http://localhost:8900/solr/collection1/select?q=*:* Then how is it fault tolerant, or how does the query have to be made? What is the complete error you are getting? If you don't see the error in the response, you'll need to find your Solr logfile and look for the error (including a large Java stacktrace) there. Thanks, Shawn
Re: Fault Tolerant Technique of Solr Cloud
If localhost:8900 is down but localhost:8983 contains replicas of the same shard(s) that 8900 was running, all data/documents are still available. You cannot query the shut-down server (port 8900), but you can query any of the other servers (8983, 7574 or 7500). If you make a distributed query to collection1 you should still be able to find all of your documents, even though 8900 is down. It is cumbersome to keep a list of crashed/shut-down servers in order to make sure you are always querying a server that is not down. The information about which servers are running (and which are not), and which replicas they run, is all in ZooKeeper. So basically, just go look in ZooKeeper :-) Ahh, Solr has tooling to help you do that - at least if you are running your client in Java code. Solr implements different kinds of clients (called XXXSolrServer - yes, an obvious name for a client). There is HttpSolrServer, which queries a particular server (won't help you); there is LBHttpSolrServer, which load-balances over several HttpSolrServers (ahh, still not there); and there is CloudSolrServer, which watches ZooKeeper in order to know what is running and where to send requests. CloudSolrServer uses LBHttpSolrServer behind the scenes. If you use CloudSolrServer as a client, everything should be smooth and transparent with respect to querying when servers are down; CloudSolrServer will figure out where to (and not to) route your requests. Regards, Per Steffensen On 18/02/14 14:05, Vineet Mishra wrote: Hi All, I want to have clear idea about the Fault Tolerant Capability of SolrCloud Considering I have setup the SolrCloud with a external Zookeeper, 2 shards, each having a replica with single collection as given in the official Solr Documentation.
Re: Fault Tolerant Technique of Solr Cloud
Solr will complain only if you bring down both the replica and the leader of the same shard. It would be difficult to have a highly available environment if you have only a small number of physical servers. Rgds AJ On 18-Feb-2014, at 18:35, Vineet Mishra clearmido...@gmail.com wrote: Hi All, I want to have clear idea about the Fault Tolerant Capability of SolrCloud Considering I have setup the SolrCloud with a external Zookeeper, 2 shards, each having a replica with single collection as given in the official Solr Documentation.
Re: Fault Tolerant Technique of Solr Cloud
On 2/18/2014 8:32 AM, Shawn Heisey wrote: What is the complete error you are getting? If you don't see the error in the response, you'll need to find your Solr logfile and look for the error (including a large Java stacktrace) there. Good catch by Per. I did not notice that you were trying to send the query to the server that you took down. This isn't going to work -- if the software you're trying to reach is not running, it won't respond. Think about what happens if you are sending requests to a server and it crashes completely. If you want to always send to the same host/port, you will need a load balancer listening on that port. You'll also want something that maintains a shared IP address, so that if the machine dies, the IP address and the load balancer move to another machine. Haproxy and Pacemaker work very well as a combination for this. There are many other choices, both hardware and software. Per also mentioned the other option - you can write code that knows about multiple URLs and can switch between them. This is something you get for free with CloudSolrServer when writing Java code with SolrJ. Thanks, Shawn
Additive boost function
Hi Guys, I'm facing a problem with additive boosting. There are two fields: last_name and first_name. The user searches for mike t. Query: (last_name:mike^15 last_name:mike*^7 first_name:mike^10 first_name:mike*^5) AND (last_name:t^15 last_name:t*^7 first_name:t^10 first_name:*^5) The search result does not meet expectations because the scoring model also factors in other statistics about the search terms from the Solr index. According to the scoring formula of DefaultSimilarity, the resulting score is multiplicative. The question is: how do I implement an additive scoring model based on my boost values?
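One possible direction, sketched here rather than as a definitive answer: let edismax add boost-query scores on top of the main query, since bq contributions are summed into the final score rather than multiplied. The boost values are carried over from the question:

```
defType=edismax
q=mike t
qf=last_name first_name
bq=last_name:mike^15 first_name:mike^10 last_name:mike*^7 first_name:mike*^5
```

This does not remove the term statistics from each clause's own score, but it does make the clause contributions combine additively, which may be close enough to the intended model.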
Re: Preventing multiple on-deck searchers without causing failed commits
On 02/17/2014 09:46 PM, Shawn Heisey wrote: I think I put too much information in my reply. Apologies. Here's the most important information to deal with first: Don't send hard commits at all. Configure autoCommit in your server config, with the all-important openSearcher parameter set to false. That will take care of all your hard commit needs, but those commits will never open a new searcher, so they cannot cause an overlap with the soft commits that DO open a new searcher. Thanks, Shawn

I'll describe a bit more about our setup, so I can explain why I don't think that will work for us:

* Our web servers send update requests to Solr via a background thread, so HTTP requests don't have to wait for the request to complete.
* That background thread has a small chance of failing. If it does, the update request won't happen until our hard commit job runs.
* Other scheduled jobs can send update requests to Solr. Some jobs suppress this, because they do a lot of updating, instead relying on the hard commit job.
* The hard commit job does a batch of updates, waits for the commit to complete, then sets some flags in our database to indicate that the content has been successfully indexed.

It's that last point that leads us to want to do explicit hard commits. By setting those flags in our database, we're assuring ourselves that, no matter whether any other steps failed along the way, the content was indexed properly. If there's no other way to do this, I'm okay with filing an RFE in JIRA and continuing to ignore the multiple on-deck searchers warning for now.
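For readers following the thread, the autoCommit configuration Shawn recommends looks like this in solrconfig.xml (the interval shown is illustrative):

```xml
<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
```

With this in place, only the soft commits open searchers, which avoids the overlapping on-deck searchers; it just doesn't provide the commit-then-set-flags handshake described above.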
Re: Best way to copy data from SolrCloud to standalone Solr?
There's a related issue: SOLR-5340 - Add support for named snapshots. I think we'd want this in SolrCloud soon. https://issues.apache.org/jira/browse/SOLR-5340 On Tue, Feb 18, 2014 at 7:23 PM, Daniel Bryant daniel.bry...@tai-dev.co.uk wrote: Hi Shawn, Michael, Many thanks for your responses - we're going to try the replication/backup command, as we're thinking this is a 'two birds with one stone' approach which will not only allow us to copy the indexes, but also help with backups in SolrCloud as well. Thanks again to you both! Best wishes, Daniel On 17/02/2014 20:25, Michael Della Bitta wrote: I do know for certain that the backup command on a cloud core still works. We have a script like this running on a cron to snapshot indexes: curl -s 'http://localhost:8080/solr/#{core}/replication?command=backup&numberToKeep=4&location=/tmp' (not really using /tmp for this, parameters changed to protect the guilty) The admin handler for replication doesn't seem to be there, but the actual API seems to work normally. Michael Della Bitta Applications Developer o: +1 646 532 3062 appinions inc. The Science of Influence Marketing 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions w: appinions.com On Mon, Feb 17, 2014 at 2:02 PM, Shawn Heisey s...@elyograg.org wrote: On 2/17/2014 8:32 AM, Daniel Bryant wrote: I have a production SolrCloud server which has multiple sharded indexes, and I need to copy all of the indexes to a (non-cloud) Solr server within our QA environment. Can I ask for advice on the best way to do this please? I've searched the web and found solr2solr (https://github.com/dbashford/solr2solr), but the author states that this is best for small indexes, and ours are rather large at ~20GB each.
I've also looked at replication, but can't find a definite reference on how this should be done between SolrCloud and Solr? Any guidance is very much appreciated. If the master index isn't changing at the time of the copy, and you're on a non-Windows platform, you should be able to copy the index directory directly. On a Windows platform, whether you can copy the index while Solr is using it would depend on how Solr/Lucene opens the files. A typical Windows file open will prevent anything else from opening them, and I do not know whether Lucene is smarter than that. SolrCloud requires the replication handler to be enabled on all configs, but during normal operation, it does not actually use replication. This is a confusing thing for some users. I *think* you can configure the replication handler on slave cores with a non-cloud config that point at the master cores, and it should replicate the main Lucene index, but not the config files. I have no idea whether things will work right if you configure other master options like replicateAfter and config files, and I also don't know if those options might cause problems for SolrCloud itself. Those options shouldn't be necessary for just getting the data into a dev environment, though. Thanks, Shawn -- Daniel Bryant | Software Development Consultant | www.tai-dev.co.uk | daniel.bry...@tai-dev.co.uk | +44 (0) 7799406399 | Twitter: @taidevcouk -- Regards, Shalin Shekhar Mangar.
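A rough sketch of the slave-side setup Shawn describes — a non-cloud core whose replication handler points at the SolrCloud core to copy from. The host, core name, and pollInterval here are placeholders, not values from this thread:

```xml
<!-- Hypothetical slave-side config in the QA core's solrconfig.xml;
     masterUrl host and core name are placeholders -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- Point at the replication handler of the source SolrCloud core -->
    <str name="masterUrl">http://prod-host:8983/solr/collection1_shard1_replica1/replication</str>
    <!-- Poll periodically; omit to replicate only on demand via command=fetchindex -->
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>
```

Deliberately omitting confFiles matches Shawn's note that only the main Lucene index, not the config files, would be replicated this way.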
Re: Preventing multiple on-deck searchers without causing failed commits
On 2/18/2014 10:59 AM, Colin Bartolome wrote: I'll describe a bit more about our setup, so I can say why I don't think that'll work for us: * Our web servers send update requests to Solr via a background thread, so HTTP requests don't have to wait for the request to complete. * That background thread has a small chance of failing. If it does, the update request won't happen until our hard commit job runs. * Other scheduled jobs can send update requests to Solr. Some jobs suppress this, because they do a lot of updating, instead relying on the hard commit job. * The hard commit job does a batch of updates, waits for the commit to complete, then sets some flags in our database to indicate that the content has been successfully indexed. It's that last point that leads us to want to do explicit hard commits. By setting those flags in our database, we're assuring ourselves that, no matter if any other steps failed along the way, we're absolutely sure the content was indexed properly. If you want to be completely in control like that, get rid of the automatic soft commits and just do the hard commits. I would personally choose another option for your setup -- get rid of *all* explicit commits entirely, and just configure autoCommit and autoSoftCommit in the server config. Since you're running 4.x, you really should have the transaction log (updateLog in the config) enabled. You can rely on the transaction log to replay updates since the last hard commit if there's ever a crash. I would also recommend upgrading to 4.6.1, but that's a completely separate item. Thanks, Shawn
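A minimal sketch of the solrconfig.xml setup Shawn describes: hard commits that never open a searcher, soft commits for visibility, and the transaction log enabled. The maxTime values here are illustrative, not recommendations:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Transaction log: lets Solr replay updates since the last hard commit after a crash -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  <!-- Hard commit: flushes segments to disk and truncates the tlog,
       but never opens a new searcher, so it cannot overlap a soft commit -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit: the only commit that opens a searcher and makes docs visible -->
  <autoSoftCommit>
    <maxTime>5000</maxTime>
  </autoSoftCommit>
</updateHandler>
```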
Re: Solr Autosuggest - Strange issue with leading numbers in query
Thanks a lot for your response Erik. I was trying to find if I have any suggestions starting with numbers using the terms component, but I couldn't find any. It's very strange! Anyways, thanks again for your response. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Autosuggest-Strange-issue-with-leading-numbers-in-query-tp4116751p4118072.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Suggester not working in sharding (distributed search)
Try this: http://solr:8983/solr/select?q=*:*&spellcheck=true&spellcheck.build=true&spellcheck.q=toyata&qt=spell&shards.qt=/spell&shards=solr-shard1:8983/solr,solr-shard2:8983/solr -- View this message in context: http://lucene.472066.n3.nabble.com/using-distributed-search-with-the-suggest-component-tp3197651p4118075.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Additive boost function
The edismax query parser bf parameter gives you an additive boost. See: http://wiki.apache.org/solr/ExtendedDisMax#bf_.28Boost_Function.2C_additive.29 -- Jack Krupansky -Original Message- From: Zwer Sent: Tuesday, February 18, 2014 12:52 PM To: solr-user@lucene.apache.org Subject: Additive boost function Hi Guys, I faced with a problem of additive boosting. 2 fields: last_name and first_name. User is searching for mike t Query: (last_name:mike^15 last_name:mike*^7 first_name:mike^10 first_name:mike*^5) AND (last_name:t^15 last_name:t*^7 first_name:t^10 first_name:*^5) The search result does not meet the expectations because score model includes others statics of searching terms on the SOLR index. According to scoring formula of DefaultSimilarity the result score is a multiplication. The question is how to implement additive score model based on my boost values ? -- View this message in context: http://lucene.472066.n3.nabble.com/Additive-boost-function-tp4118066.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Preventing multiple on-deck searchers without causing failed commits
On 02/18/2014 10:15 AM, Shawn Heisey wrote: If you want to be completely in control like that, get rid of the automatic soft commits and just do the hard commits. I would personally choose another option for your setup -- get rid of *all* explicit commits entirely, and just configure autoCommit and autoSoftCommit in the server config. Since you're running 4.x, you really should have the transaction log (updateLog in the config) enabled. You can rely on the transaction log to replay updates since the last hard commit if there's ever a crash. I would also recommend upgrading to 4.6.1, but that's a completely separate item. Thanks, Shawn We use the automatic soft commits to get search index updates to our users faster, via Near Realtime Searching. We have the updateLog enabled. I'm not worried that the Solr side of the equation will lose data; I'm worried that the communication from our web servers and scheduled jobs to the Solr servers will break down and nothing will come along to make sure everything is up to date. It sounds like what we're picturing is not currently supported, so I'll file the RFE. Will upgrading to 4.6.1 help at all with this issue?
JOB @ Sematext: Professional Services Lead = Head
Hello, We have what I think is a great opening at Sematext. Ideal candidate would be in New York, but that's not an absolute must. More info below + on http://sematext.com/about/jobs.html in job-ad-speak, but I'd be happy to describe what we are looking for, what we do, and what types of companies we work with in regular-human-speak off-line. DESCRIPTION Sematext is hiring a technical, hands-on Professional Services Lead to join, lead, and grow the Professional Services side of Sematext and potentially grow into the Head role. REQUIREMENTS * Experience working with Solr or Elasticsearch * Plan and coordinate customer engagements from business and technical perspective * Identify customer pain points, needs, and success criteria at the onset of each engagement * Provide expert-level consulting and support services and strive to be a trustworthy advisor to a wide range of customers * Resolve complex search issues involving Solr or Elasticsearch * Identify opportunities to provide customers with additional value through our products or services * Communicate high-value use cases and customer feedback to our Product teams * Participate in open source community by contributing bug fixes, improvements, answering questions, etc. EXPERIENCE * BS or higher in Engineering or Computer Science preferred * 2 or more years of IT Consulting and/or Professional Services experience required * Exposure to other related open source projects (Hadoop, Nutch, Kafka, Storm, Mahout, etc.) a plus * Experience with other commercial and open source search technologies a plus * Enterprise Search, eCommerce, and/or Business Intelligence experience a plus * Experience working in a startup a plus Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch Support * http://sematext.com/
Re: Slow 95th-percentile
Thanks for the suggestions. I was thinking GC too, but it doesn’t feel like it is. Running jstat -gcutil only shows a 10-50ms parnew collection every 10 or 15 seconds and almost no full CMS collections. Any other places to look for GC activity I might be missing? I did a little investigation this morning and found that if I run a query once a second, every 10th query is slow. Looks suspiciously like the soft commits are causing the slowdowns. I could make them further apart. Anything else I can look at to make those commits less costly? Here are the java options: -server -XX:+AggressiveOpts -XX:+UseCompressedOops -Xmx3G -Xms3G -Xss256k -XX:MaxPermSize=128m -XX:PermSize=96m -XX:NewSize=1024m -XX:MaxNewSize=1024m -XX:MaxTenuringThreshold=1 -XX:SurvivorRatio=6 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Xloggc:/var/log/tomcat7/gc-tomcat.log -verbose:gc -XX:GCLogFileSize=10M -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCTimeStamps -XX:+PrintClassHistogram -XX:+PrintTenuringDistribution -XX:-PrintGCApplicationStoppedTime -DzkHost=xx.xx.xx.xx:2181,xx.xx.xx.xx:2181,xx.xx.xx.xx:2181/solr -Dcom.sun.management.jmxremote -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.endorsed.dirs=/usr/share/tomcat7/endorsed I’m using tomcat, though I’ve heard that jetty can be a better choice. I’ve also attached my solrconfig. -Allan On February 17, 2014 at 6:06:03 PM, Shawn Heisey (s...@elyograg.org) wrote: On 2/17/2014 6:12 PM, Allan Carroll wrote: I'm having trouble getting my Solr setup to get consistent performance. Average select latency is great, but 95% is dismal (10x average). It's probably something slightly misconfigured. I’ve seen it have nice, low variance latencies for a few hours here and there, but can’t figure out what’s different during those times.
* I’m running 4.1.0 using SolrCloud. 3 replicas of 1 shard on 3 EC2 boxes (8proc, 30GB RAM, SSDs). Load peaks around 30 selects per second and about 150 updates per second. * The index has about 11GB of data in 14M docs, the other 10MB of data in 3K docs. Stays around 30 segments. * Soft commits after 10 seconds, hard commits after 120 seconds. Though, turning off the update traffic doesn’t seem to have any effect on the select latencies. * I think GC latency is low. Running 3GB heaps with 1G new size. GC time is around 3ms per second. Here’s a typical select query: fl=*,sortScore:textScore&sort=textScore desc&start=0&q=text:((soccer OR MLS OR premier league OR FIFA OR world cup) OR (sorority OR fraternity OR greek life OR dorm OR campus))&wt=json&fq=startTime:[139265640 TO 139271754]&fq={!frange l=2 u=3}timeflag(startTime)&fq={!frange l=139265640 u=139269594 cache=false}timefix(startTime,-2160)&fq=privacy:OPEN&defType=edismax&rows=131 The first thing to say is that it's fairly normal for the 95th and 99th percentile values to be quite a lot higher than the median and average values. I don't have actual values so I don't know if it's bad or not. You're good on the most important performance-related resource, which is memory for the OS disk cache. The only thing that stands out as a possible problem from what I know so far is garbage collection. It might be a case of full garbage collections happening too frequently, or it might be a case of garbage collection pauses taking too long. It might even be a combination of both. To fix frequent full collections, increase the heap size. To fix the other problem, use the CMS collector and tune it. Two bits of information will help with recommendations: Your java startup options, and your solrconfig.xml. You're using an option in your query that I've never seen before. I don't know if frange is slow or not. One last thing that might cause problems is super-frequent commits. I could also be completely wrong! Thanks, Shawn
Caching Solr boost functions?
We're testing out a new handler that uses edismax with three different boost functions. One has a random() function in it, so is not very cacheable, but the other two boost functions do not change from query to query. I'd like to tell Solr to cache those boost queries for the life of the Searcher so they don't get recomputed every time. Is there any way to do that out of the box? In a different custom QParser we have we wrote a CachingValueSource that wrapped a ValueSource with a custom ValueSource cache. Would it make sense to implement that as a standard Solr function so that one could do: boost=cache(expensiveFunctionQuery()) Thanks. --Gregg
RE: query parameters
I tried it in the solr admin query and it showed me all the docs without a value in organisations and roles. It didn't matter if I used a base term; isn't that given through the q parameter? -Original Message- From: Raymond Wiker [mailto:rwi...@gmail.com] Sent: Dienstag, 18. Februar 2014 13:19 To: solr-user@lucene.apache.org Subject: Re: query parameters That could be because the second condition does not do what you think it does... have you tried running the second condition separately? You may have to add a base term to the second condition, like what you have for the bq parameter in your config file; i.e, something like (*:* -organisations:[* TO *] -roles:[* TO *]) On Tue, Feb 18, 2014 at 12:16 PM, Andreas Owen a...@conx.ch wrote: It seems that fq doesn't accept OR because: (organisations:(150 OR 41) AND roles:(174)) OR (-organisations:[* TO *] AND -roles:[* TO *]) only returns docs that match the first condition. It doesn't return any docs with the empty fields organisations and roles. -Original Message- From: Andreas Owen [mailto:a...@conx.ch] Sent: Montag, 17. Februar 2014 05:08 To: solr-user@lucene.apache.org Subject: query parameters in solrconfig of my solr 4.3 i have a userdefined requestHandler.
i would like to use fq to force the following conditions: 1: organisations is empty and roles is empty 2: organisations contains one of the comma-delimited list in variable $org 3: roles contains one of the comma-delimited list in variable $r 4: rule 2 and 3. Snippet of what I've got (haven't checked whether there is an IN operator like in SQL for the list value):
<lst name="defaults">
  <str name="echoParams">explicit</str>
  <int name="rows">10</int>
  <str name="defType">edismax</str>
  <str name="synonyms">true</str>
  <str name="qf">plain_text^10 editorschoice^200 title^20 h_*^14 tags^10 thema^15 inhaltstyp^6 breadcrumb^6 doctype^10 contentmanager^5 links^5 last_modified^5 url^5</str>
  <str name="fq">(organisations='' roles='') or (organisations=$org roles=$r) or (organisations='' roles=$r) or (organisations=$org roles='')</str>
  <str name="bq">(expiration:[NOW TO *] OR (*:* -expiration:*))^6</str> <!-- tested: now or newer or empty gets small boost -->
  <str name="bf">div(clicks,max(displays,1))^8</str> <!-- tested -->
Using payloads for expanded query terms
Hello, I'm trying to handle a situation with taxonomy search - that is, for each taxonomy I have a list of words with their boosts. These taxonomies are updated frequently, so I retrieve these scored lists at query time from an external service. My expectation would be: q={!some_query_parser}Cities_France OR Cities_England = q=max(Paris^0.5 Lyon^0.4 La Defense^0.3) OR max(London^0.5, Oxford^4) Implementation possibilities I thought about: 1. An adapted synonym filter, where query term boosts are encoded as payloads. 2. A query parser that handles the term expansion and weighting. The main drawback is the fact it forces me to stick to my own query parser. 3. Building the query outside Solr. What would you recommend? Thanks, Manuel
Re: Slow 95th-percentile
On 2/18/2014 11:51 AM, Allan Carroll wrote: I was thinking GC too, but it doesn’t feel like it is. Running jstat -gcutil only shows a 10-50ms parnew collection every 10 or 15 seconds and almost no full CMS collections. Anything other places to look for GC activity I might be missing? I did a little investigation this morning and found that if I run a query once a second, every 10th query is slow. Looks suspiciously like the soft commits are causing the slow downs. I could make it further in between. Anything else I can look at to make those commits less costly? It does indeed sound like the 10 second soft commit is the problem. The opening a new searcher part of a commit tends to be fairly expensive. The impact is even greater when combined with flushing data to disk, which is why soft commits can be faster than hard commits ... but building a new searcher is not cheap even then. Do you have autoCommit configured, with openSearcher=false? If not, you should. If you are using Solr caches, reducing (or eliminating) the autowarmCount values on each cache (particularly the filterCache) can make commits happen quite a lot faster. With a commit potentially happening every ten seconds, you might want to configure those caches so they are pretty small. Frequent commits mean that the caches are frequently invalidated. If commit frequency is high and autowarmCount values are low, a large cache is just a waste of memory. The cache config was the main thing I was interested in seeing when I asked for solrconfig.xml. You have a lot of GC tuning going on, which is good - untuned GC and Solr do NOT get along. I'll just show you what I use and let you make your own decision. http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning Thanks, Shawn
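With a soft commit potentially every ten seconds, the small-cache/low-autowarm setup Shawn suggests might look roughly like this in solrconfig.xml (the sizes are illustrative, not tuned values):

```xml
<!-- Small caches with no autowarming: cheap to throw away and rebuild
     every time a commit opens a new searcher -->
<filterCache class="solr.FastLRUCache" size="256" initialSize="256" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="256" initialSize="256" autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
```

The trade-off: larger autowarmCount values make searches fast immediately after a commit, but make the commit itself (searcher warm-up) slower; with frequent commits, low or zero is usually the better side of that trade.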
RE: Solr4 performance
Hi, Thanks much for all suggestions. We're looking into reducing the allocated heap size of the Solr4 JVM. We're using NRTCachingDirectoryFactory. Does it use MMapDirectory internally? Can someone please confirm? Would optimization help with performance? We did that in QA (took about 13 hours for 700 mil documents) Thanks! -Original Message- From: Roman Chyla [mailto:roman.ch...@gmail.com] Sent: Wednesday, February 12, 2014 3:17 PM To: solr-user@lucene.apache.org Subject: Re: Solr4 performance And perhaps one other, but very pertinent, recommendation is: allocate only as little heap as is necessary. By allocating more, you are working against the OS caching. To know how much is enough is bit tricky, though. Best, roman On Wed, Feb 12, 2014 at 2:56 PM, Shawn Heisey s...@elyograg.org wrote: On 2/12/2014 12:07 PM, Greg Walters wrote: Take a look at http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html as it's a pretty decent explanation of memory mapped files. I don't believe that the default configuration for solr is to use MMapDirectory but even if it does my understanding is that the entire file won't be forcibly cached by solr. The OS's filesystem cache should control what's actually in ram and the eviction process will depend on the OS. I only have a little bit to add. Here's the first thing that Uwe's blog post (linked above) says: Since version 3.1, Apache Lucene and Solr use MMapDirectory by default on 64bit Windows and Solaris systems; since version 3.3 also for 64bit Linux systems. The default in Solr 4.x is NRTCachingDirectory, which uses MMapDirectory by default under the hood. A summary about all this that should be relevant to the original question: It's the *operating system* that handles memory mapping, including any caching that happens. Assuming that you don't have a badly configured virtual machine setup, I'm fairly sure that only real memory gets used, never swap space on the disk.
If something else on the system makes a memory allocation, the operating system will instantly give up memory used for caching and mapping. One of the strengths of mmap is that it can't exceed available resources unless it's used incorrectly. Thanks, Shawn
Re: Solr4 performance
On 2/18/2014 2:14 PM, Joshi, Shital wrote: Thanks much for all suggestions. We're looking into reducing allocated heap size of Solr4 JVM. We're using NRTCachingDirectoryFactory. Does it use MMapDirectory internally? Can someone please confirm? In Solr, NRTCachingDirectory does indeed use MMapDirectory as its default delegate. That's probably also the case with Lucene -- these are Lucene classes, after all. MMapDirectory is almost always the most efficient way to handle on-disk indexes. Thanks, Shawn
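For reference, the stock Solr 4.x example solrconfig.xml declares this wrapping explicitly; NRTCachingDirectoryFactory delegates to the platform default, which is MMapDirectory on 64-bit JVMs:

```xml
<!-- Default in the Solr 4.x example config: NRT caching over the
     OS-appropriate directory implementation (MMapDirectory on 64-bit) -->
<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
```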
Cluster state ranges are all null after reboot
We've got a 15 shard cluster spread across 3 hosts. This morning our puppet software rebooted them all and afterwards the 'range' for each shard has become null in zookeeper. Is there any way to restore this value short of rebuilding a fresh index? I've read various questions from people with a similar problem, although in those cases it is usually a single shard that has become null allowing them to infer what the value should be and manually fix it in ZK. In this case I have no idea what the ranges should be. This is our test cluster, and checking production I can see that the ranges don't appear to be predictable based on the shard number. I'm also not certain why it even occurred. Our test cluster only has a single replica per shard, so when a JVM is rebooted the cluster is unavailable... would that cause this? Production has 3 replicas so we can do rolling reboots.
SOLR Suggester - return matched suggestion along with other suggestions
Hi, Is there a way to make the suggester return the matched suggestion too? http://localhost:8983/solr/core1/suggest?q=name:iphone The above query should return: iphone iphone5c iphone4g Currently it returns only: iphone5c iphone4g I can use an edge N-gram filter to implement the above feature but not sure how to achieve it when using the suggester. -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-Suggester-return-matched-suggestion-along-with-other-suggestions-tp4118132.html Sent from the Solr - User mailing list archive at Nabble.com.
SolrJ 3.4 Client compatible with Solr 4.6 Server?
I'm in the process of updating from Solr 3.4 to Solr 4.6. Is the SolrJ 3.4 client forward compatible with Solr 4.6? This isn't mentioned in the documentation on the http://wiki.apache.org/solr/javabin page. In a test environment, I did some indexing and querying with a SolrJ 3.4 client and a Solr 4.6 server and there were no errors. I'm using the javabin format for updates and sharded queries. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrJ-3-4-Client-compatible-with-Solr-4-6-Server-tp4118134.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Escape \\n from getting highlighted - highlighter component
Your search expression means 'talk' OR 'n' OR 'text'. I think you want to do a phrase search. To do that, quote the whole thing with double-quotes: "talk n text", if you are using one of the Solr standard query parsers. On 02/17/2014 03:53 PM, Developer wrote: Hi, When searching for a text like 'talk n text' the highlighter component also adds the <em> tags to the special characters like \n. Is there a way to avoid highlighting the special characters? \\r\\n Family Messaging is getting replaced as \\r\\<em>n</em> Family Messaging Kuro
Re: Slow 95th-percentile
Slowing the soft commits to every 100 seconds helped. The main culprit was a bad query that was coming through every few seconds. Something about the empty fq param and the q=* slowed everything else down. INFO: [event] webapp=/solr path=/select params={start=0&q=*&wt=javabin&fq=&fq=startTime:139283643&version=2} hits=1894 status=0 QTime=6943 Thanks for all your help. -Allan On February 18, 2014 at 12:24:37 PM, Shawn Heisey (s...@elyograg.org) wrote: On 2/18/2014 11:51 AM, Allan Carroll wrote: I was thinking GC too, but it doesn’t feel like it is. Running jstat -gcutil only shows a 10-50ms parnew collection every 10 or 15 seconds and almost no full CMS collections. Anything other places to look for GC activity I might be missing? I did a little investigation this morning and found that if I run a query once a second, every 10th query is slow. Looks suspiciously like the soft commits are causing the slow downs. I could make it further in between. Anything else I can look at to make those commits less costly? It does indeed sound like the 10 second soft commit is the problem. The opening a new searcher part of a commit tends to be fairly expensive. The impact is even greater when combined with flushing data to disk, which is why soft commits can be faster than hard commits ... but building a new searcher is not cheap even then. Do you have autoCommit configured, with openSearcher=false? If not, you should. If you are using Solr caches, reducing (or eliminating) the autowarmCount values on each cache (particularly the filterCache) can make commits happen quite a lot faster. With a commit potentially happening every ten seconds, you might want to configure those caches so they are pretty small. Frequent commits mean that the caches are frequently invalidated. If commit frequency is high and autowarmCount values are low, a large cache is just a waste of memory. The cache config was the main thing I was interested in seeing when I asked for solrconfig.xml.
You have a lot of GC tuning going on, which is good - untuned GC and Solr do NOT get along. I'll just show you what I use and let you make your own decision. http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning Thanks, Shawn
Re: SOLR Suggester - return matched suggestion along with other suggestions
Nevermind, I added a space to the end of all the field values (keywords) supplied to the suggester and it works!!! iphone is indexed as iphone (with an additional space at the end). I trim the value passed to the search after selecting the keyword from the dropdown suggestion, so it will again be passed as iphone (without the space) when querying SOLR. -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-Suggester-return-matched-suggestion-along-with-other-suggestions-tp4118132p4118137.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Slow 95th-percentile
: Slowing the soft commits to every 100 seconds helped. The main culprit : was a bad query that was coming through every few seconds. Something : about the empty fq param and the q=* slowed everything else down. : : INFO: [event] webapp=/solr path=/select : params={start=0&q=*&wt=javabin&fq=&fq=startTime:139283643&version=2} : hits=1894 status=0 QTime=6943 1) if you are using Solr 4.1 or earlier, then q=* is an expensive useless query that doesn't mean what you think it does... https://issues.apache.org/jira/browse/SOLR-2996 2) an empty fq doesn't cost anything -- if you use debugQuery=true you should see that it's not even included in parsed_filter_queries because it's totally ignored. 3) if that startTime value changes at some fixed and regular interval, that could explain some anomalies if it's normally the same and cached, but changes once a day/hour/minute or whatever and is a bit slow to cache. bottom line: a softCommit is going to re-open a searcher, which is going to wipe your caches. if you don't have any (auto)warming configured, that means any fqs, or qs that you run regularly are going to pay the price of being slow the first time they are run after a new searcher is opened. If your priority is low response time, you really want to open new searchers as infrequently as your SLA for visibility allows, and use (auto)warming for those common queries. -Hoss http://www.lucidworks.com/
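The (auto)warming Hoss mentions can also be done explicitly with a newSearcher listener in solrconfig.xml, so common queries pay their first-run cost before the new searcher goes live. The queries below are placeholders (the privacy:OPEN filter is borrowed from the query earlier in this thread):

```xml
<!-- Hypothetical warming queries; replace with the q/fq patterns your app actually sends -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="fq">privacy:OPEN</str>
    </lst>
  </arr>
</listener>
```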
Weird behavior of stopwords in search query
Hi, I'm observing a weird behavior while using stopwords as part of the search query. I'm able to replicate it in a standalone Solr instance as well. The issue pops up when I'm trying to use the stopwords other and and together in a query string. The query doesn't return any result, but it works with any other combination. For e.g. 1. query yields no result -- http://localhost:8983/solr/collection1/browse?q=AWS+other+and+Search&debugQuery=true&wt=xml Debug Query : <str name="rawquerystring">AWS other and Search</str> <str name="querystring">AWS other and Search</str> <str name="parsedquery">(+(DisjunctionMaxQuery((id:AWS^10.0 | author:aws^2.0 | title:aws^10.0 | text:aws^0.5 | cat:AWS^1.4 | keywords:aws^5.0 | manu:aws^1.1 | description:aws^5.0 | resourcename:aws | name:aws^1.2 | features:aws | sku:aw^1.5)) +DisjunctionMaxQuery((id:other^10.0 | cat:other^1.4 | sku:other^1.5)) +DisjunctionMaxQuery((id:Search^10.0 | author:search^2.0 | title:search^10.0 | text:search^0.5 | cat:Search^1.4 | keywords:search^5.0 | manu:search^1.1 | description:search^5.0 | resourcename:search | name:search^1.2 | features:search | sku:search^1.5))))/no_coord</str> <str name="parsedquery_toString">+((id:AWS^10.0 | author:aws^2.0 | title:aws^10.0 | text:aws^0.5 | cat:AWS^1.4 | keywords:aws^5.0 | manu:aws^1.1 | description:aws^5.0 | resourcename:aws | name:aws^1.2 | features:aws | sku:aw^1.5) +(id:other^10.0 | cat:other^1.4 | sku:other^1.5) +(id:Search^10.0 | author:search^2.0 | title:search^10.0 | text:search^0.5 | cat:Search^1.4 | keywords:search^5.0 | manu:search^1.1 | description:search^5.0 | resourcename:search | name:search^1.2 | features:search | sku:search^1.5))</str> 2.
query yields result -- http://localhost:8983/solr/collection1/browse?q=AWS+other+an+Search&debugQuery=true&wt=xml Debug Query - <str name="rawquerystring">AWS other an Search</str> <str name="querystring">AWS other an Search</str> <str name="parsedquery">(+(DisjunctionMaxQuery((id:AWS^10.0 | author:aws^2.0 | title:aws^10.0 | text:aws^0.5 | cat:AWS^1.4 | keywords:aws^5.0 | manu:aws^1.1 | description:aws^5.0 | resourcename:aws | name:aws^1.2 | features:aws | sku:aw^1.5)) DisjunctionMaxQuery((id:other^10.0 | cat:other^1.4 | sku:other^1.5)) DisjunctionMaxQuery((id:an^10.0 | cat:an^1.4)) DisjunctionMaxQuery((id:Search^10.0 | author:search^2.0 | title:search^10.0 | text:search^0.5 | cat:Search^1.4 | keywords:search^5.0 | manu:search^1.1 | description:search^5.0 | resourcename:search | name:search^1.2 | features:search | sku:search^1.5))))/no_coord</str> <str name="parsedquery_toString">+((id:AWS^10.0 | author:aws^2.0 | title:aws^10.0 | text:aws^0.5 | cat:AWS^1.4 | keywords:aws^5.0 | manu:aws^1.1 | description:aws^5.0 | resourcename:aws | name:aws^1.2 | features:aws | sku:aw^1.5) (id:other^10.0 | cat:other^1.4 | sku:other^1.5) (id:an^10.0 | cat:an^1.4) (id:Search^10.0 | author:search^2.0 | title:search^10.0 | text:search^0.5 | cat:Search^1.4 | keywords:search^5.0 | manu:search^1.1 | description:search^5.0 | resourcename:search | name:search^1.2 | features:search | sku:search^1.5))</str> Both other and and are part of the stopwords list. I ran an analysis on the text_general field; both stopwords were shown as ignored during indexing and query time, but that's not happening during the actual search. Not sure what I'm missing here, any pointers will be appreciated. - Thanks, Shamik
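For anyone checking the same thing: stop filtering is per field type, so only analyzed types that include StopFilterFactory remove stopwords — roughly like the stock text_general sketch below — while fields such as id, cat, and sku (which are the ones still matching other in the parsed query) use types that keep every token:

```xml
<!-- Sketch based on the stock 4.x example schema; stopword removal only
     happens in analyzers that include StopFilterFactory. String-like
     fields (id, sku, cat) have no such analyzer and never drop stopwords. -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```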
Re: SolrJ 3.4 Client compatible with Solr 4.6 Server?
On 2/18/2014 5:13 PM, Lan wrote: I'm in the process of updating from Solr 3.4 to Solr 4.6. Is the SolrJ 3.4 client forward compatible with a Solr 4.6 server? This isn't mentioned on the documentation page at http://wiki.apache.org/solr/javabin. In a test environment, I did some indexing and querying with a SolrJ 3.4 client and a Solr 4.6 server, and there were no errors. I'm using the javabin format for updates and sharded queries. Almost everything you can do with the 3.x client will work without problems. If you're trying to do something unusual, you might have some trouble. Technically we don't recommend mixing versions, but I was running mixed versions for a number of months without problems. You mentioned javabin -- both versions of SolrJ use javabin for responses, but requests are still XML in SolrJ 3.x. You should avoid switching to BinaryRequestWriter until after you upgrade SolrJ, because the 3.x client will try to use a different URL path for binary update requests, one that is not compatible with a typical 4.x configuration. Are you in a position where you can make quick changes to the code and recompile? If you are, I can definitely help you work through any problems. I can't make promises about others, but I'm sure I'm not the only one willing to help. There are a fair number of jarfile changes required to upgrade SolrJ, but the number of required code changes is usually small. Upgrading SolrJ should be fairly high on your priority list, especially if you plan to use SolrCloud. Thanks, Shawn
Re: Preventing multiple on-deck searchers without causing failed commits
Colin: Stop. Back up. The automatic soft commits will make updates available to your users every second. Those documents _include_ anything from your hard commit jobs. What could be faster? Parenthetically I'll add that 1 second soft commits are rarely an actual requirement, but that's your decision. For the hard commits. Fine. Do them if you insist. Just set openSearcher=false. The documents will be searchable the next time the soft commit happens, within one second. The key is openSearcher=false. That prevents starting a brand new searcher. BTW, your commits are not failing. It's just that _after_ the commit happens, the warming searcher limit is exceeded. You can even wait until the segments are flushed to disk. All without opening a searcher. Shawn is spot on in his recommendations to not fixate on the commits. Solr handles that. Here's a long blog about all the details of durability .vs. visibility. http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ You're over-thinking the problem here, trying to control commits with a sledgehammer when you don't need to, just use the built-in capabilities. Best, Erick On Tue, Feb 18, 2014 at 10:33 AM, Colin Bartolome co...@e-e.com wrote: On 02/18/2014 10:15 AM, Shawn Heisey wrote: If you want to be completely in control like that, get rid of the automatic soft commits and just do the hard commits. I would personally choose another option for your setup -- get rid of *all* explicit commits entirely, and just configure autoCommit and autoSoftCommit in the server config. Since you're running 4.x, you really should have the transaction log (updateLog in the config) enabled. You can rely on the transaction log to replay updates since the last hard commit if there's ever a crash. I would also recommend upgrading to 4.6.1, but that's a completely separate item. Thanks, Shawn We use the automatic soft commits to get search index updates to our users faster, via Near Realtime Searching. 
We have the updateLog enabled. I'm not worried that the Solr side of the equation will lose data; I'm worried that the communication from our web servers and scheduled jobs to the Solr servers will break down and nothing will come along to make sure everything is up to date. It sounds like what we're picturing is not currently supported, so I'll file the RFE. Will upgrading to 4.6.1 help at all with this issue?
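Erick's recipe, translated into a solrconfig.xml fragment (the maxTime values here are illustrative, not recommendations):

```xml
<!-- Hard commit: flush and fsync segments for durability,
     but do not open a new searcher. -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<!-- Soft commit: make new documents visible to searches. -->
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>
```

With this in place, durability is handled by the hard commits and the transaction log, while visibility is handled entirely by the soft commits.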
Re: query parameters
Solr/Lucene query language is NOT strictly boolean; see Chris's excellent blog here: http://searchhub.org/dev/2011/12/28/why-not-and-or-and-not/ Best, Erick On Tue, Feb 18, 2014 at 11:54 AM, Andreas Owen a...@conx.ch wrote: I tried it in the solr admin query and it showed me all the docs without a value in organisations and roles. It didn't matter if i used a base term; isn't that given through the q-parameter? -Original Message- From: Raymond Wiker [mailto:rwi...@gmail.com] Sent: Dienstag, 18. Februar 2014 13:19 To: solr-user@lucene.apache.org Subject: Re: query parameters That could be because the second condition does not do what you think it does... have you tried running the second condition separately? You may have to add a base term to the second condition, like what you have for the bq parameter in your config file; i.e., something like (*:* -organisations:[* TO *] -roles:[* TO *]) On Tue, Feb 18, 2014 at 12:16 PM, Andreas Owen a...@conx.ch wrote: It seems that fq doesn't accept OR, because: (organisations:(150 OR 41) AND roles:(174)) OR (-organisations:[* TO *] AND -roles:[* TO *]) only returns docs that match the first condition. It doesn't return any docs with the empty fields organisations and roles. -Original Message- From: Andreas Owen [mailto:a...@conx.ch] Sent: Montag, 17. Februar 2014 05:08 To: solr-user@lucene.apache.org Subject: query parameters in the solrconfig of my solr 4.3 i have a user-defined requestHandler.
i would like to use fq to force the following conditions: 1: organisations is empty and roles is empty 2: organisations contains one of the comma-delimited list in variable $org 3: roles contains one of the comma-delimited list in variable $r 4: rules 2 and 3 snippet of what i've got (haven't checked whether there is an "in" operator like in sql for the list value):

<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<str name="defType">edismax</str>
<str name="synonyms">true</str>
<str name="qf">plain_text^10 editorschoice^200 title^20 h_*^14 tags^10 thema^15 inhaltstyp^6 breadcrumb^6 doctype^10 contentmanager^5 links^5 last_modified^5 url^5</str>
<str name="fq">(organisations='' roles='') or (organisations=$org roles=$r) or (organisations='' roles=$r) or (organisations=$org roles='')</str>
<str name="bq">(expiration:[NOW TO *] OR (*:* -expiration:*))^6</str> <!-- tested: now or newer or empty gets small boost -->
<str name="bf">div(clicks,max(displays,1))^8</str> <!-- tested -->
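Following Raymond's point about pure-negative clauses, a working version of that fq would need real Lucene syntax (colons and range queries, not equals signs) plus a *:* base term for the empty-field cases — a sketch only, with $org and $r standing in for however the lists actually get substituted:

```xml
<str name="fq">(*:* -organisations:[* TO *] -roles:[* TO *]) OR (organisations:($org) AND roles:($r)) OR ((*:* -organisations:[* TO *]) AND roles:($r)) OR (organisations:($org) AND (*:* -roles:[* TO *]))</str>
```

Note the "(150 OR 41)" style of list already used earlier in the thread is what $org and $r would need to expand to.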
Re: block join and atomic updates
Thinking in terms of normalized data in the context of a Lucene index is dangerous. It is not a relational data model technology, and the join behaviors available to you have limited use. Each approach requires compromises that are likely impermissible for certain use cases. If it is at all reasonable to consider, you will likely be best served by de-normalizing the data. Of course, your specific details may prove an exception to this rule…but generally this approach works very well. On Feb 18, 2014, at 4:19 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: absolutely. On Tue, Feb 18, 2014 at 1:20 PM, m...@preselect-media.com wrote: But isn't query time join much slower when it comes to a large amount of documents? Quoting Mikhail Khludnev mkhlud...@griddynamics.com: Hello, It sounds like you need to switch to query time join. On 15.02.2014 21:57, m...@preselect-media.com wrote: Any suggestions? Quoting m...@preselect-media.com: Yonik Seeley yo...@heliosearch.com: On Thu, Feb 13, 2014 at 8:25 AM, m...@preselect-media.com wrote: Is there any workaround to perform atomic updates on blocks, or do I have to re-index the parent document and all its children again every time I want to update a field? The latter, unfortunately. Is there any plan to change this behavior in the near future? So, I'm thinking of alternatives without losing the benefit of block join. Let me explain an idea I just thought about: Let's say I have a parent document A with a number of fields I want to update regularly, and a number of child documents AC_1 ... AC_n which are only indexed once and aren't going to change anymore. So, if I index A and AC_* in a block and I update A, the block is gone. But if I create an additional document AF which only contains something like a foreign key to A, and index AF + AC_* as a block (not A + AC_* anymore), could I perform a {!parent ... } query on AF + AC_* and then join from the results to get A?
Does this make any sense, and is it even possible? ;-) And if it's possible, how can I do it? Thanks, - Moritz -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
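For reference, Moritz's two-step idea would combine the two join flavors roughly like this (the is_parent flag and parent_key field are hypothetical names for this sketch, not anything from the thread):

```
# Step 1: block-join query against the immutable AF + AC_* block,
#         matching parents (AF) by their children's fields
q={!parent which="is_parent:true"}child_text:foo

# Step 2: query-time join over the foreign key, from the AF documents
#         to the separately updatable A documents
q={!join from=parent_key to=id}child_text:foo
```

The trade-off Mikhail raises later in the thread applies to step 2: the query-time join keeps A cheap to update atomically, but is slower to search than a pure block join.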
Re: block join and atomic updates
Listen to that advice. Denormalize, denormalize, denormalize. Think about the results page and work backwards from that. Flat data model. wunder Search guy at Infoseek, Inktomi, Verity, Autonomy, Netflix, and Chegg On Feb 18, 2014, at 7:37 PM, Jason Hellman jhell...@innoventsolutions.com wrote: Thinking in terms of normalized data in the context of a Lucene index is dangerous. It is not a relational data model technology, and the join behaviors available to you have limited use. Each approach requires compromises that are likely impermissible for certain uses cases. If it is at all reasonable to consider you will likely be best served de-normalizing the data. Of course, your specific details may prove an exception to this rule…but generally approach works very well. On Feb 18, 2014, at 4:19 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: absolutely. On Tue, Feb 18, 2014 at 1:20 PM, m...@preselect-media.com wrote: But isn't query time join much slower when it comes to a large amount of documents? Zitat von Mikhail Khludnev mkhlud...@griddynamics.com: Hello, It sounds like you need to switch to query time join. 15.02.2014 21:57 пользователь m...@preselect-media.com написал: Any suggestions? Zitat von m...@preselect-media.com: Yonik Seeley yo...@heliosearch.com: On Thu, Feb 13, 2014 at 8:25 AM, m...@preselect-media.com wrote: Is there any workaround to perform atomic updates on blocks or do I have to re-index the parent document and all its children always again if I want to update a field? The latter, unfortunately. Is there any plan to change this behavior in near future? So, I'm thinking of alternatives without loosing the benefit of block join. I try to explain an idea I just thought about: Let's say I have a parent document A with a number of fields I want to update regularly and a number of child documents AC_1 ... AC_n which are only indexed once and aren't going to change anymore. So, if I index A and AC_* in a block and I update A, the block is gone. 
But if I create an additional document AF which only contains something like an foreign key to A and indexing AF + AC_* as a block (not A + AC_* anymore), could I perform a {!parent ... } query on AF + AC_* and make an join from the results to get A? Does this makes any sense and is it even possible? ;-) And if it's possible, how can I do it? Thanks, - Moritz -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com -- Walter Underwood wun...@wunderwood.org
Re: Preventing multiple on-deck searchers without causing failed commits
Inline quoting ahead, sorry: Colin: Stop. Back up. The automatic soft commits will make updates available to your users every second. Those documents _include_ anything from your hard commit jobs. What could be faster? Parenthetically I'll add that 1 second soft commits are rarely an actual requirement, but that's your decision. The one-second commits not my decision, per se; it's the default value in solrconfig.xml and is also suggested as a common configuration in the Near Real Time Searching section of the reference guide. (Our users at Experts Exchange used to have to wait up to five minutes before the search index updated with the latest content. While switching to Solr, we saw that the recommended configuration would refresh the index in seconds, rather than minutes, and rejoiced. We'd rather not increase the latency too far to solve this problem.) For the hard commits. Fine. Do them if you insist. Just set openSearcher=false. The documents will be searchable the next time the soft commit happens, within one second. The key is openSearcher=false. That prevents starting a brand new searcher. Are you saying that the automatic soft commit will trigger, no matter what, even after our code has explicitly requested a hard commit? If so, that is, if the automatic soft commit triggers, even if no additional update requests have come in since the hard commit, then great! We'll do that! BTW, your commits are not failing. It's just that _after_ the commit happens, the warming searcher limit is exceeded. My commits may indeed be succeeding, but the server is returning a HTTP 503 response, which leads to SolrJ throwing a SolrServerException with the message No live SolrServers available to handle this request. Our code, understandably, interprets that as a failed request. This causes our job to abort and try again the next time it runs. You can even wait until the segments are flushed to disk. All without opening a searcher. 
We will go with this if the automatic soft commit does indeed trigger after the explicit hard commit, thanks. Shawn is spot on in his recommendations to not fixate on the commits. Solr handles that. Here's a long blog about all the details of durability .vs. visibility. http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ You're over-thinking the problem here, trying to control commits with a sledgehammer when you don't need to, just use the built-in capabilities. I get what you both are saying. If the problem is that I'm doing explicit hard commits, the solution is that I should stop doing explicit hard commits. That's not really a solution, though. What if, for whatever reason, I absolutely *had to* perform explicit hard commits? (I know you're saying I *don't* have to, but please indulge me for a moment.) Fortunately, the SolrJ client provides a way I can do this. But now my Solr server logs are full of Overlapping onDeckSearchers performance warnings. Fine, I'll turn maxWarmingSearchers down to 1. Now the server returns HTTP 503 responses every now and then and SolrJ throws an exception. I think that's a problem that the servers can solve: just queue up the request until the number of warming searchers is under the limit. So I filed that RFE. Even when all the above suggestions work perfectly and fix our issues, it's still a valid RFE.
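For the "absolutely had to hard commit" case: the openSearcher flag can also ride along on an explicit commit request rather than living only in solrconfig.xml — a sketch against a local 4.x server (host and core name illustrative):

```
curl 'http://localhost:8983/solr/collection1/update?commit=true&openSearcher=false'
```

That flushes segments durably without spinning up a warming searcher, so it should not contribute to the maxWarmingSearchers limit; visibility still comes from the next soft commit.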
Re: Weird behavior of stopwords in search query
Does other appear in the id, cat, or sku fields? This clause requires it to appear in at least one of those fields: +DisjunctionMaxQuery((id:other^10.0 | cat:other^1.4 | sku:other^1.5)) The and is treated as the AND operator. What query parser are you using? Without and, the terms are OR'ed, which is the default query operator. -- Jack Krupansky -Original Message- From: Shamik Bandopadhyay Sent: Tuesday, February 18, 2014 8:53 PM To: solr-user@lucene.apache.org Subject: Weird behavior of stopwords in search query Hi, I'm observing a weird behavior while using stopwords as part of the search query. I'm able to replicate it in standalone Solr instance well. The issue pops up when I'm trying to use other and and stopword together in a query string. The query doesn't return any result. But it works with any other combination. For e.g. 1. query yields no result -- http://localhost:8983/solr/collection1/browse?q=AWS+other+and+SearchdebugQuery=truewt=xml Debug Query : str name=rawquerystringAWS other and Search/str str name=querystringAWS other and Search/strstr name=parsedquery(+(DisjunctionMaxQuery((id:AWS^10.0 | author:aws^2.0 | title:aws^10.0 | text:aws^0.5 | cat:AWS^1.4 | keywords:aws^5.0 | manu:aws^1.1 | description:aws^5.0 | resourcename:aws | name:aws^1.2 | features:aws | sku:aw^1.5)) +DisjunctionMaxQuery((id:other^10.0 | cat:other^1.4 | sku:other^1.5)) +DisjunctionMaxQuery((id:Search^10.0 | author:search^2.0 | title:search^10.0 | text:search^0.5 | cat:Search^1.4 | keywords:search^5.0 | manu:search^1.1 | description:search^5.0 | resourcename:search | name:search^1.2 | features:search | sku:search^1.5/no_coord/str str name=parsedquery_toString+((id:AWS^10.0 | author:aws^2.0 | title:aws^10.0 | text:aws^0.5 | cat:AWS^1.4 | keywords:aws^5.0 | manu:aws^1.1 | description:aws^5.0 | resourcename:aws | name:aws^1.2 | features:aws | sku:aw^1.5) +(id:other^10.0 | cat:other^1.4 | sku:other^1.5) +(id:Search^10.0 | author:search^2.0 | title:search^10.0 | text:search^0.5 | 
cat:Search^1.4 | keywords:search^5.0 | manu:search^1.1 | description:search^5.0 | resourcename:search | name:search^1.2 | features:search | sku:search^1.5))/str 2. query yields result -- http://localhost:8983/solr/collection1/browse?q=AWS+other+an+SearchdebugQuery=truewt=xml Debug Query - str name=rawquerystringAWS other an Search/str str name=querystringAWS other an Search/strstr name=parsedquery(+(DisjunctionMaxQuery((id:AWS^10.0 | author:aws^2.0 | title:aws^10.0 | text:aws^0.5 | cat:AWS^1.4 | keywords:aws^5.0 | manu:aws^1.1 | description:aws^5.0 | resourcename:aws | name:aws^1.2 | features:aws | sku:aw^1.5)) DisjunctionMaxQuery((id:other^10.0 | cat:other^1.4 | sku:other^1.5)) DisjunctionMaxQuery((id:an^10.0 | cat:an^1.4)) DisjunctionMaxQuery((id:Search^10.0 | author:search^2.0 | title:search^10.0 | text:search^0.5 | cat:Search^1.4 | keywords:search^5.0 | manu:search^1.1 | description:search^5.0 | resourcename:search | name:search^1.2 | features:search | sku:search^1.5/no_coord/str str name=parsedquery_toString+((id:AWS^10.0 | author:aws^2.0 | title:aws^10.0 | text:aws^0.5 | cat:AWS^1.4 | keywords:aws^5.0 | manu:aws^1.1 | description:aws^5.0 | resourcename:aws | name:aws^1.2 | features:aws | sku:aw^1.5) (id:other^10.0 | cat:other^1.4 | sku:other^1.5) (id:an^10.0 | cat:an^1.4) (id:Search^10.0 | author:search^2.0 | title:search^10.0 | text:search^0.5 | cat:Search^1.4 | keywords:search^5.0 | manu:search^1.1 | description:search^5.0 | resourcename:search | name:search^1.2 | features:search | sku:search^1.5))/str Both other and and are part of the stopwords list. I ran an analysis on text_general field, both stopwords were shows as ignored during indexing and query time, but not happening during actual search. Not sure what I'm missing here, any pointers will be appreciated. - Thanks, Shamik
Re: Weird behavior of stopwords in search query
Jack, thanks for the pointer. I should have checked this more closely. I'm using edismax and here's my qf entry:

<str name="qf">id^10.0 cat^1.4 text^0.5 features^1.0 name^1.2 sku^1.5 manu^1.1 title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0</str>

As you can see, I was boosting id and cat, which are of type string and of course don't go through the stopwords filter. Removing them returned one result, which is based on the AND operator. The part I'm not clear on is how "and" is being treated as an operator even though it's a stopword and the default operator is OR. Shouldn't it be ignored? -- View this message in context: http://lucene.472066.n3.nabble.com/Weird-behavior-of-stopwords-in-search-query-tp4118156p4118188.html Sent from the Solr - User mailing list archive at Nabble.com.
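On the "and" question: edismax has a lowercaseOperators parameter which, when true (I believe it defaults to true in this era of Solr), promotes lowercase and/or to the AND/OR operators at query-parsing time — before any field's stopword filter ever sees the token. If that behavior is unwanted, it can be switched off in the handler defaults, e.g.:

```xml
<str name="lowercaseOperators">false</str>
```

With it off, only uppercase AND/OR act as operators, and a lowercase "and" would flow into the analysis chain and be dropped as a stopword on the analyzed fields.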
Re: block join and atomic updates
Colleagues, You are definitely right regarding denorm & collapse. It works fine in most cases, but look at this case more precisely: Moritz needs to update the parent's fields. If those are copied during denormalization, the price of an update is the same as with block join. With query-time join, updates are way cheaper — but search time suffers, as you know. On 19.02.2014 8:15, Walter Underwood wun...@wunderwood.org wrote: Listen to that advice. Denormalize, denormalize, denormalize. Think about the results page and work backwards from that. Flat data model. wunder Search guy at Infoseek, Inktomi, Verity, Autonomy, Netflix, and Chegg On Feb 18, 2014, at 7:37 PM, Jason Hellman jhell...@innoventsolutions.com wrote: Thinking in terms of normalized data in the context of a Lucene index is dangerous. It is not a relational data model technology, and the join behaviors available to you have limited use. Each approach requires compromises that are likely impermissible for certain use cases. If it is at all reasonable to consider, you will likely be best served by de-normalizing the data. Of course, your specific details may prove an exception to this rule...but generally this approach works very well. On Feb 18, 2014, at 4:19 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: absolutely. On Tue, Feb 18, 2014 at 1:20 PM, m...@preselect-media.com wrote: But isn't query time join much slower when it comes to a large amount of documents? Quoting Mikhail Khludnev mkhlud...@griddynamics.com: Hello, It sounds like you need to switch to query time join. On 15.02.2014 21:57, m...@preselect-media.com wrote: Any suggestions? Quoting m...@preselect-media.com: Yonik Seeley yo...@heliosearch.com: On Thu, Feb 13, 2014 at 8:25 AM, m...@preselect-media.com wrote: Is there any workaround to perform atomic updates on blocks or do I have to re-index the parent document and all its children always again if I want to update a field? The latter, unfortunately.
Is there any plan to change this behavior in near future? So, I'm thinking of alternatives without loosing the benefit of block join. I try to explain an idea I just thought about: Let's say I have a parent document A with a number of fields I want to update regularly and a number of child documents AC_1 ... AC_n which are only indexed once and aren't going to change anymore. So, if I index A and AC_* in a block and I update A, the block is gone. But if I create an additional document AF which only contains something like an foreign key to A and indexing AF + AC_* as a block (not A + AC_* anymore), could I perform a {!parent ... } query on AF + AC_* and make an join from the results to get A? Does this makes any sense and is it even possible? ;-) And if it's possible, how can I do it? Thanks, - Moritz -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com -- Walter Underwood wun...@wunderwood.org
Re: Fault Tolerant Technique of Solr Cloud
Thanks for all your responses, but my doubt is: to which server:port should the query be made, since we don't know which server has crashed or which server might crash in the future (any server can go down)? The intention behind this question is to understand how the query format for distributed search works if any shard or replica goes down. Thanks On Tue, Feb 18, 2014 at 11:22 PM, Shawn Heisey s...@elyograg.org wrote: On 2/18/2014 8:32 AM, Shawn Heisey wrote: On 2/18/2014 6:05 AM, Vineet Mishra wrote: Shard 1: localhost:8983, localhost:8900. Shard 2: localhost:7574, localhost:7500. I indexed some documents, and then if I shut down any of the replicas or the leader, say for example localhost:8900, I can't query the collection on that particular port: http://localhost:8900/solr/collection1/select?q=*:* Then how is it fault tolerant, or how should the query be made? What is the complete error you are getting? If you don't see the error in the response, you'll need to find your Solr logfile and look for the error (including a large java stacktrace) there. Good catch by Per. I did not notice that you were trying to send the query to the server that you took down. This isn't going to work -- if the software you're trying to reach is not running, it won't respond. Think about what happens if you are sending requests to a server and it crashes completely. If you want to always send to the same host/port, you will need a load balancer listening on that port. You'll also want something that maintains a shared IP address, so that if the machine dies, the IP address and the load balancer move to another machine. Haproxy and Pacemaker work very well as a combination for this. There are many other choices, both hardware and software. Per also mentioned the other option - you can write code that knows about multiple URLs and can switch between them. This is something you get for free with CloudSolrServer when writing Java code with SolrJ. Thanks, Shawn
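Shawn's load-balancer suggestion could look roughly like this with haproxy (addresses and names are made up; a real deployment needs health checks tuned and high availability via something like Pacemaker on top):

```
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend solr_front
    bind *:8983
    default_backend solr_nodes

backend solr_nodes
    balance roundrobin
    server solr1 192.168.0.11:8983 check
    server solr2 192.168.0.12:8983 check
```

Clients then always query the one frontend address, and haproxy stops routing to a node as soon as its health check fails.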
Re: Increasing number of SolrIndexSearcher (Leakage)?
I found a custom component causing that issue. It creates a SolrQueryRequest but doesn't close it at the end, so the reference count on the SolrIndexSearcher never drops to 0 and the searcher is not released. On Tue, Feb 18, 2014 at 9:31 PM, Yonik Seeley yo...@heliosearch.com wrote: On Mon, Feb 17, 2014 at 1:34 AM, Nguyen Manh Tien tien.nguyenm...@gmail.com wrote: But after i index some docs and run softCommit or hardCommit with openSearcher=false, the number of SolrIndexSearchers increases by 1 This is fine... it's more of an internal implementation detail (we open what is called a real-time searcher so we can drop some other data structures, like the list of non-visible document updates, etc). If you did the commit again, the count should not continue to increase. If the number of searchers continues to increase, you have a searcher leak due to something else. Are you using any custom components or anything else that isn't stock Solr? -Yonik http://heliosearch.org - native off-heap filters and fieldcache for solr
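The leak mechanism described above is plain reference counting: the searcher is only released when every request that borrowed it has been closed. A toy model of that contract (not Solr code, just the idea, with made-up class names):

```python
class Searcher:
    """Stands in for SolrIndexSearcher: freed only when refcount hits 0."""

    def __init__(self):
        self.refcount = 0
        self.released = False

    def incref(self):
        self.refcount += 1

    def decref(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.released = True


class QueryRequest:
    """Stands in for SolrQueryRequest: borrows the searcher on creation."""

    def __init__(self, searcher):
        self.searcher = searcher
        searcher.incref()

    def close(self):
        # The buggy custom component skipped this call, so the count
        # never returned to zero and old searchers stayed alive.
        self.searcher.decref()


searcher = Searcher()
leaky = QueryRequest(searcher)         # never closed: one reference leaks
well_behaved = QueryRequest(searcher)
well_behaved.close()

print(searcher.refcount)   # 1 -- the leaked reference
print(searcher.released)   # False
```

Each unclosed request pins one old searcher, which is why the searcher count in the admin UI keeps climbing commit after commit until the component is fixed.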
Re: Fault Tolerant Technique of Solr Cloud
As Shawn had pointed out, if you are using the CloudSolrServer client, then you are insulated from the scenario where a shard leader or its replica(s) go down. The communication should ideally be with the ZooKeeper ensemble, not with the Solr servers directly. One thing you need to make sure of is to add the shards.tolerant parameter, so that the query returns results from the shards that are alive, though it'll fetch a partial result set.
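A query using that parameter, with shards.tolerant spelled out (host and collection name illustrative), might look like:

```
http://localhost:8983/solr/collection1/select?q=*:*&shards.tolerant=true
```

With it set, a distributed query returns whatever the live shards can provide instead of failing outright when one shard has no live replicas.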