Re: Performance on faceting using docValues
Hello, I have one consideration on top of my head; would you mind showing a brief snapshot from a sampler? On Thu, Mar 5, 2015 at 10:18 PM, lei simpl...@gmail.com wrote: Hi there, I'm testing facet performance with vs without docValues in Solr 4.7, and found that on first request, performance with docValues is much faster than non-docValues. However, for subsequent requests (where the queries are cached), the performance is slower for docValues than non-docValues. Is this an expected behavior? Any idea or solution is appreciated. Thanks. -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Performance on faceting using docValues
Here are the specs of an example query faceting on three fields (all string type): first call: 1+ sec (with docValues) vs. 4+ sec (w/o docValues); subsequent calls: 30+ ms (with docValues) vs. 100+ ms (w/o docValues), consistently. The total # of docs returned is around 600,000. On Thu, Mar 5, 2015 at 11:18 AM, lei simpl...@gmail.com wrote: Hi there, I'm testing facet performance with vs without docValues in Solr 4.7, and found that on first request, performance with docValues is much faster than non-docValues. However, for subsequent requests (where the queries are cached), the performance is slower for docValues than non-docValues. Is this an expected behavior? Any idea or solution is appreciated. Thanks.
Performance on faceting using docValues
Hi there, I'm testing facet performance with vs without docValues in Solr 4.7, and found that on first request, performance with docValues is much faster than non-docValues. However, for subsequent requests (where the queries are cached), the performance is slower for docValues than non-docValues. Is this an expected behavior? Any idea or solution is appreciated. Thanks.
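For anyone reproducing the comparison: docValues are turned on per field in schema.xml, and a full reindex is required after changing the setting. A minimal sketch, using the string facet fields mentioned later in this thread as illustrative names:

```xml
<!-- schema.xml: string fields used for faceting, with docValues enabled.
     Field names here are examples only; adapt them to your schema. -->
<field name="manufacturer" type="string" indexed="true" stored="true" docValues="true"/>
<field name="seller"       type="string" indexed="true" stored="true" docValues="true"/>
<field name="material"     type="string" indexed="true" stored="true" docValues="true"/>
```

With docValues="true", faceting reads the on-disk column-oriented structure instead of un-inverting the field into heap memory on first use.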
Re: Performance on faceting using docValues
A correction to the previous email - the subsequent-call numbers were swapped. Here are the specs of an example query faceting on three fields (all string type): first call: 1+ sec (with docValues) vs. 4+ sec (w/o docValues); subsequent calls: 100+ ms (with docValues) vs. 30+ ms (w/o docValues), consistently. The total # of docs returned is around 600,000. The query looks like this: q=*:*&fq=country:US&fq=category:112&facet=on&facet.sort=index&facet.mincount=1&facet.limit=2000&facet.field=manufacturer&facet.field=seller&facet.field=material&f.manufacturer.facet.mincount=1&f.manufacturer.facet.sort=count&f.manufacturer.facet.limit=100&f.seller.facet.mincount=1&f.seller.facet.sort=count&f.seller.facet.limit=100&f.material.facet.mincount=1&sort=score+desc Thanks, On Thu, Mar 5, 2015 at 11:42 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hello, I have one consideration on top of my head; would you mind showing a brief snapshot from a sampler? [earlier quoted text snipped]
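Written out with explicit '&' separators, the query above can be assembled and issued with curl; a sketch, where the host and collection name are placeholders, not from the original thread:

```shell
#!/bin/sh
# Assemble the three-field facet query from the thread with explicit
# '&' separators. SOLR_URL (host and collection) is an assumption.
SOLR_URL="http://localhost:8983/solr/products/select"

PARAMS="q=*:*&fq=country:US&fq=category:112"
PARAMS="$PARAMS&facet=on&facet.sort=index&facet.mincount=1&facet.limit=2000"
PARAMS="$PARAMS&facet.field=manufacturer&facet.field=seller&facet.field=material"
PARAMS="$PARAMS&f.manufacturer.facet.mincount=1&f.manufacturer.facet.sort=count&f.manufacturer.facet.limit=100"
PARAMS="$PARAMS&f.seller.facet.mincount=1&f.seller.facet.sort=count&f.seller.facet.limit=100"
PARAMS="$PARAMS&f.material.facet.mincount=1&sort=score+desc"

URL="$SOLR_URL?$PARAMS"
# To actually run it:  curl "$URL"
echo "$URL"
```

Note the per-field `f.<field>.facet.*` parameters override the global `facet.*` defaults for that field only.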
RE: Performance on faceting using docValues
This is consistent with my experience. DocValues is faster for the first call (compared to UnInvertedField, which is what is used when there are no DocValues), but slower on subsequent calls. I'm curious about this as well; I hadn't heard anyone else mention it before you. I thought maybe I was the only one... -Michael -Original Message- From: lei [mailto:simpl...@gmail.com] Sent: Thursday, March 05, 2015 2:40 PM To: solr-user@lucene.apache.org Subject: Re: Performance on faceting using docValues Here are the specs of an example query faceting on three fields (all string type): first call: 1+ sec (with docValues) vs. 4+ sec (w/o docValues); subsequent calls: 30+ ms (with docValues) vs. 100+ ms (w/o docValues), consistently. The total # of docs returned is around 600,000. [earlier quoted text snipped]
Re: Solrcloud Index corruption
Hi Erick, Thank you for your detailed reply. You say in our case some docs didn't make it to the node, but that's not really true: the docs can be found on the corrupted nodes when I search on ID. The docs are also complete. The problem is that the docs do not appear when I filter on certain fields (however the fields are in the doc and have the right value when I search on ID). So something seems to be corrupt in the filter index. We will try CheckIndex; hopefully it is able to identify the problematic cores. I understand there is not a master in SolrCloud. In our case we use haproxy as a load balancer for every request. So when indexing, every document will be sent to a different solr server, immediately after each other. Maybe SolrCloud is not able to handle that correctly? Thanks, Martin Erick Erickson schreef op 05.03.2015 19:00: Wait up. There's no master index in SolrCloud. Raw documents are forwarded to each replica, indexed and put in the local tlog. If a replica falls too far out of sync (say you take it offline), then the entire index _can_ be replicated from the leader and, if the leader's index was incomplete, then that might propagate the error. The practical consequence of this is that if _any_ replica has a complete index, you can recover. Before going there though, the brute-force approach is to just re-index everything from scratch. That's likely easier, especially on indexes this size. Here's what I'd do. Assuming you have the Collections API calls for ADDREPLICA and DELETEREPLICA, then: 0 Identify the complete replicas. If you're lucky you have at least one for each shard. 1 Copy 1 good index from each shard somewhere just to have a backup. 2 DELETEREPLICA on all the incomplete replicas 2.5 I might shut down all the nodes at this point and check that all the cores I'd deleted were gone. If any remnants exist, 'rm -rf deleted_core_dir'. 3 ADDREPLICA to get the ones you removed back in. This should copy the entire index from the leader for each replica. 
As you do, the leadership will change, and after you've deleted all the incomplete replicas, one of the complete ones will be the leader and you should be OK. If you don't want to/can't use the Collections API, then: 0 Identify the complete replicas. If you're lucky you have at least one for each shard. 1 Shut 'em all down. 2 Copy the good index somewhere just to have a backup. 3 'rm -rf data' for all the incomplete cores. 4 Bring up the good cores. 5 Bring up the cores that you deleted the data dirs from. What they should do is replicate the entire index from the leader. When you restart the good cores (step 4 above), they'll _become_ the leader. bq: Is it possible to make Solrcloud invulnerable for network problems I'm a little surprised that this is happening. It sounds like the network problems were such that some nodes weren't out of touch long enough for Zookeeper to sense that they were down and put them into recovery. Not sure there's any way to secure against that. bq: Is it possible to see if a core is corrupt? There's CheckIndex, here's at least one link: http://java.dzone.com/news/lucene-and-solrs-checkindex What you're describing, though, is that docs just didn't make it to the node, _not_ that the index has unexpected bits, bad disk sectors and the like, so CheckIndex can't detect that. How would it know what _should_ have been in the index? bq: I noticed a difference in the Gen column on Overview - Replication. Does this mean there is something wrong? You cannot infer anything from this. In particular, the merging will be significantly different between a single full-reindex and what the state of segment merges is in an incrementally built index. The admin UI screen is rooted in the pre-cloud days; the Master/Slave thing is entirely misleading. 
In SolrCloud, since all the raw data is forwarded to all replicas, and any auto commits that happen may very well be slightly out of sync, the index size, number of segments, generations, and all that are pretty safely ignored. Best, Erick On Thu, Mar 5, 2015 at 6:50 AM, Martin de Vries mar...@downnotifier.com wrote: Hi Andrew, Even our master index is corrupt, so I'm afraid this won't help in our case. Martin Andrew Butkus schreef op 05.03.2015 16:45: Force a fetchindex on slave from master command: http://slave_host:port/solr/replication?command=fetchindex - from http://wiki.apache.org/solr/SolrReplication [1] The above command will download the whole index from master to slave, there are configuration options in solr to make this problem happen less often (allowing it to recover from new documents added and only send the changes with a wider gap) - but I cant remember what those were. Links: -- [1] http://wiki.apache.org/solr/SolrReplication
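The Collections API steps Erick outlines can be sketched as a dry-run script. The collection, shard, replica, and node names below are placeholder assumptions, and the script only prints the curl calls rather than executing them; strip the surrounding echo quoting to run the calls for real:

```shell
#!/bin/sh
# Dry-run sketch of the replica recovery steps above (4.x Collections API).
# SOLR, COLLECTION, shard/replica/node names are all assumed placeholders.
SOLR="http://localhost:8983/solr"
COLLECTION="mycollection"

# Step 2: drop an incomplete replica (repeat once per bad replica).
DELETE_CALL="$SOLR/admin/collections?action=DELETEREPLICA&collection=$COLLECTION&shard=shard1&replica=core_node2"

# Step 3: add the replica back; it replicates its full index from the leader.
ADD_CALL="$SOLR/admin/collections?action=ADDREPLICA&collection=$COLLECTION&shard=shard1&node=host2:8983_solr"

echo "curl '$DELETE_CALL'"
echo "curl '$ADD_CALL'"
```

Doing the deletes first, as Erick describes, leaves a complete replica as leader before any new replica pulls its index.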
Re: How to start solr in solr cloud mode using external zookeeper ?
The other way you can do that is to specify the startup parameters in solr.in.sh. Example : SOLR_MODE=solrcloud ZK_HOST=zoohost1:2181,zoohost2:2181,zoohost3:2181 SOLR_PORT=4567 You can simply start solr by running ./solr start -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-start-solr-in-solr-cloud-mode-using-external-zookeeper-tp4190630p4191286.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Solrcloud Index corruption
For updates, the document will always get routed to the leader of the appropriate shard, no matter what server first receives the request. -Original Message- From: Martin de Vries [mailto:mar...@downnotifier.com] Sent: Thursday, March 05, 2015 4:14 PM To: solr-user@lucene.apache.org Subject: Re: Solrcloud Index corruption [quoted text snipped]
Re: solr cloud does not start with many collections
I've tried a few variations, with 3 x ZK, 6 x nodes, Solr 4.10.3 and Solr 5.0, without any success and no real difference. There is a tipping point at around 3,000-4,000 cores (varies depending on hardware): below it I can restart the cloud OK within ~4 min; above it the cloud stops working, with continuous 'conflicting information about the leader of shard' warnings. On 5 March 2015 at 14:15, Shawn Heisey apa...@elyograg.org wrote: On 3/4/2015 5:37 PM, Damien Kamerman wrote: I'm running on Solaris x86, I have plenty of memory and no real limits:

# plimit 15560
15560: /opt1/jdk/bin/java -d64 -server -Xss512k -Xms32G -Xmx32G -XX:MaxMetasp
   resource              current     maximum
  time(seconds)          unlimited   unlimited
  file(blocks)           unlimited   unlimited
  data(kbytes)           unlimited   unlimited
  stack(kbytes)          unlimited   unlimited
  coredump(blocks)       unlimited   unlimited
  nofiles(descriptors)   65536       65536
  vmemory(kbytes)        unlimited   unlimited

I've been testing with 3 nodes, and that seems OK up to around 3,000 cores total. I'm thinking of testing with more nodes. I have opened an issue for the problems I encountered while recreating a config similar to yours, which I have been doing on Linux. https://issues.apache.org/jira/browse/SOLR-7191 It's possible that the only thing the issue will lead to is improvements in the documentation, but I'm hopeful that there will be code improvements too. Thanks, Shawn -- Damien Kamerman
Re: Solrcloud Index corruption
If you google "replication can cause index corruption" there are two jira issues that are the most likely cause of corruption in a solrcloud env. - Mark On Mar 5, 2015, at 2:20 PM, Garth Grimm garthgr...@averyranchconsulting.com wrote: For updates, the document will always get routed to the leader of the appropriate shard, no matter what server first receives the request. [earlier quoted text snipped]
Re: Solrcloud Index corruption
On 3/5/2015 3:13 PM, Martin de Vries wrote: I understand there is not a master in SolrCloud. In our case we use haproxy as a load balancer for every request. So when indexing every document will be sent to a different solr server, immediately after each other. Maybe SolrCloud is not able to handle that correctly? SolrCloud can handle that correctly, but currently sending index updates to a core that is not the leader of the shard will incur a significant performance hit, compared to always sending updates to the correct core. A small performance penalty would be understandable, because the request must be redirected, but what actually happens is a much larger penalty than anyone expected. We have an issue in Jira to investigate that performance issue and make it work as efficiently as possible. Indexing batches of documents is recommended, not sending one document per update request. General performance problems with Solr itself can lead to extremely odd and unpredictable behavior from SolrCloud. Most often these kinds of performance problems are related in some way to memory, either the java heap or available memory in the system. http://wiki.apache.org/solr/SolrPerformanceProblems Thanks, Shawn
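Shawn's advice to index in batches means putting many documents into a single update request rather than sending one document per request. A minimal sketch using Solr's JSON update format; the host, collection, and field names are assumptions, not from the thread:

```shell
#!/bin/sh
# Build one update request carrying a batch of documents instead of
# issuing one HTTP request per document. Fields/host/collection assumed.
BATCH='[
  {"id": "1", "country": "US", "category": "112"},
  {"id": "2", "country": "US", "category": "112"}
]'

# To send it for real (ideally to the shard leader, per the thread):
#   echo "$BATCH" | curl -X POST -H 'Content-Type: application/json' \
#     --data-binary @- 'http://localhost:8983/solr/mycollection/update'
echo "$BATCH"
```

Batches of a few hundred to a few thousand documents per request are a common starting point; the right size depends on document size and heap.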
Re: Admin UI doesn't show logs?
On 3/5/2015 6:01 PM, Jakov Sosic wrote: I'm running 4.10.3 under tomcat 7, and I have an issue with Admin UI. When I click on Logging I don't see actual entries but only: No Events available The Logging tab in the admin UI only shows log entries where the severity of the log is at least WARN. The default file-level logging setup in the example logs a lot more -- it is normally set to INFO, and a normal startup will generate hundreds or thousands of log entries at the INFO level, which would be overwhelming to view in a web browser. That's why they are only logged to a file named ./logs/solr.log, if you have the log4j.properties file included in the example. I believe there is a way to configure Solr so that the admin UI will show you everything, but trust me when I say that you most likely do not want those log entries in the admin UI, because there are a LOT of them. Thanks, Shawn
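For reference, the file-level INFO logging Shawn describes is configured in log4j.properties; a sketch of the relevant lines, close to the 4.x example defaults (your file path and pattern may differ):

```properties
# log4j.properties: INFO and above go to the log file. The admin UI
# Logging tab independently shows only WARN and above by default.
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=logs/solr.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss.SSS} %-5p %c - %m%n
```

Under Tomcat, make sure this file (and the log4j jars) are actually on the webapp's classpath, or no solr.log will be written at all, which is the sanity check Alex suggests below.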
Admin UI doesn't show logs?
Hi, I'm running 4.10.3 under tomcat 7, and I have an issue with Admin UI. When I click on a Logging - I don't see actual entries but only: No Events available and round icon circling non stop. When I click on Level, I see the same icon, and message Loading Is there a hint or something you could point me to, so I could fix it?
Re: Admin UI doesn't show logs?
And given that you configured it under Tomcat, I'd check that the logs are generated at all first. Just as a sanity check. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 5 March 2015 at 20:15, Shawn Heisey apa...@elyograg.org wrote: [quoted text snipped]
Re: How to start solr in solr cloud mode using external zookeeper ?
Thanks shamik :) With Regards Aman Tandon On Fri, Mar 6, 2015 at 3:30 AM, shamik sham...@gmail.com wrote: The other way you can do that is to specify the startup parameters in solr.in.sh. Example : SOLR_MODE=solrcloud ZK_HOST=zoohost1:2181,zoohost2:2181,zoohost3:2181 SOLR_PORT=4567 You can simply start solr by running ./solr start -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-start-solr-in-solr-cloud-mode-using-external-zookeeper-tp4190630p4191286.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLR query parameters
Whew! I was afraid that my memory was failing since I'd no memory of ever seeing anything remotely like that! Erick On Thu, Mar 5, 2015 at 6:04 AM, phi...@free.fr wrote: Please ignore my question. These are form field names which I created a couple of months ago, not SOLR query parameters. Philippe - Mail original - De: phi...@free.fr À: solr-user@lucene.apache.org Envoyé: Jeudi 5 Mars 2015 14:54:26 Objet: SOLR query parameters Hello, could someone please explain what these SOLR query parameter keywords stand for: - ppcdb - srbycb - as For instance, http://searcharchives.iht.com:8983/solr/inytapdf0/browse?ppdcb=srbycb=as=q=kaisersort= I could not find them in the SOLR documentation. Many thanks. Philippe
Re: Help needed to understand zookeeper in solrcloud
I start out with 5 zk's. All good. One zk fails - I'm left with four. Are they guaranteed to split 4/0 or 3/1 - because if they split 2/2 I'm screwed, right? Surely to start with 5 zk's (or in fact any odd number - it could be 21 even), and from a single failure you drop to an even number - then there is the danger of NOT getting quorum. So ... I can only assume that there is a mechanism in place inside zk to guarantee this cannot happen, right? -- Cheers Jules. On 05/03/2015 06:47, svante karlsson wrote: Yes, as long as it is three (the majority of 5) or more. This is why there is no point of having a 4 node cluster. This would also require 3 nodes for majority thus giving it the fault tolerance of a 3 node cluster but slower and more expensive. 2015-03-05 7:41 GMT+01:00 Aman Tandon amantandon...@gmail.com: Thanks svante. What if in the cluster of 5 zookeeper only 1 zookeeper goes down, will zookeeper election can occur with 4 / even number of zookeepers alive? With Regards Aman Tandon On Tue, Mar 3, 2015 at 6:35 PM, svante karlsson s...@csi.se wrote: synchronous update of state and a requirement of more than half the zookeepers alive (and in sync) this makes it impossible to have a split brain situation ie when you partition a network and get let's say 3 alive on one side and 2 on the other. In this case the 2 node networks stops serving request since it's not in majority. 2015-03-03 13:15 GMT+01:00 Aman Tandon amantandon...@gmail.com: But how they handle the failure? With Regards Aman Tandon On Tue, Mar 3, 2015 at 5:17 PM, O. Klein kl...@octoweb.nl wrote: Zookeeper requires a majority of servers to be available. For example: Five machines ZooKeeper can handle the failure of two machines. That's why odd numbers are recommended.
Re: Help needed to understand zookeeper in solrcloud
The network will only split if you get errors on your network hardware (or fiddle with iptables). Let's say you placed your zookeepers in separate racks and someone pulls the network cable between them - that will leave you with 5 working servers, but they can't all reach each other. This is the split-brain scenario. "Are they guaranteed to split 4/0" Yes. A node failure will not partition the network. "any odd number - it could be 21 even" Since all writes are synchronous, you don't want too large a number of zookeepers, since that would slow down the cluster. Use a reasonable number to meet your SLA (3 or 5 are common choices). "and from a single failure you drop to an even number - then there is the danger of NOT getting quorum" No, see above. BUT, if you first lose most of your nodes due to a network partition and then lose another due to node failure - then you are out of quorum. /svante 2015-03-05 9:29 GMT+01:00 Julian Perry ju...@limitless.co.uk: [earlier quoted text snipped]
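The majority rule discussed in this thread is plain arithmetic: an ensemble of n servers needs floor(n/2)+1 for quorum, so it tolerates floor((n-1)/2) failures - which is why 4 nodes tolerate no more failures than 3. A quick sketch:

```shell
#!/bin/sh
# Quorum size and tolerated failures for common ZooKeeper ensemble sizes.
for n in 3 4 5; do
  quorum=$(( n / 2 + 1 ))
  tolerated=$(( n - quorum ))
  echo "ensemble=$n quorum=$quorum tolerated_failures=$tolerated"
done
# Prints:
#   ensemble=3 quorum=2 tolerated_failures=1
#   ensemble=4 quorum=3 tolerated_failures=1
#   ensemble=5 quorum=3 tolerated_failures=2
```

Note ensemble size 4 gives the same fault tolerance as 3, but with the extra cost and write latency svante mentions.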
Re: How to start solr in solr cloud mode using external zookeeper ?
Thanks Erick. For other readers who get stuck in the same situation, here is the solution. If you have a remote/local zookeeper ensemble running, you can create the Solr cluster as follows. Suppose you have a zookeeper ensemble of 3 servers running on three different machines with the IP addresses 192.168.11.12, 192.168.101.12 and 192.168.101.92, all using zookeeper client port 2181 (as mentioned in zoo.cfg); in my case I am using solr-5.0.0. Now go to the bin directory of your extracted solr tar/zip file and run this command for each solr server of your SolrCloud cluster: ./solr start -c -z 192.168.11.12:2181,192.168.101.12:2181,192.168.101.92:2181 -p 4567 -p - to specify a port other than 8983 (4567 in my case) -c - to start the server in cloud mode -z - to specify the zookeeper host addresses With Regards Aman Tandon On Wed, Mar 4, 2015 at 5:18 AM, Erick Erickson erickerick...@gmail.com wrote: Have you seen this page?: https://cwiki.apache.org/confluence/display/solr/Solr+Start+Script+Reference This is really the new way Best, Erick On Tue, Mar 3, 2015 at 7:18 AM, Aman Tandon amantandon...@gmail.com wrote: Thanks Shawn, also thanks for sharing info about chroot. I am trying to implement solr cloud with solr-5.0.0. I also checked the documentation https://wiki.apache.org/solr/SolrCloud ; the method shown there uses start.jar, but after a few updates start.jar (jetty) will not work, so I want a way that will keep working even after an upgrade. So how could I start it from the bin directory with all these parameters for an external zookeeper, or is there any other best way you can suggest? With Regards Aman Tandon On Tue, Mar 3, 2015 at 8:09 PM, Shawn Heisey apa...@elyograg.org wrote: On 3/3/2015 4:21 AM, Aman Tandon wrote: I am new to solr-cloud, i have connected the zookeepers located on 3 remote servers. 
All the configs are uploaded and linked successfully. Now I am stuck on how to start solr in cloud mode using these external zookeepers, which are remotely located. Zookeeper is installed on 3 servers, using 2181 as the client port. On all three servers, a solr server along with an external zookeeper is present. solrcloud1.com (solr + zookeeper is present) solrcloud2.com solrcloud3.com Now I have to start solr by telling it to use the external zookeeper. So how should I do that? You simply tell Solr about all your zookeeper servers on startup, using the zkHost property. Here's the format of that property: server1:port,server2:port,server3:port/solr1 The /solr1 part (the ZK chroot) is optional, but I recommend it ... it can be just about any text you like, starting with a forward slash. What this does is put all of SolrCloud's information inside a path in zookeeper, sort of like a filesystem. With no chroot, that information is placed at the root of zookeeper. If you want to use a zookeeper ensemble for multiple applications, you're going to need a chroot. Even when multiple applications are not required, I recommend it to keep the zookeeper root clean. You can see some examples of zkHost values in the javadoc for SolrJ: http://lucene.apache.org/solr/5_0_0/solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrClient.html#CloudSolrClient%28java.lang.String%29 Thanks, Shawn
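Putting Shawn's zkHost advice and the startup command from the thread together, here is a minimal sketch (the IPs and the port 4567 are the ones from this thread; the /solr1 chroot is an optional addition per Shawn's recommendation). The script only builds and prints the command so the pieces are visible; the real call would be run from the Solr bin directory:

```shell
# Zookeeper ensemble from the thread, with an optional chroot (/solr1).
ZK_HOST="192.168.11.12:2181,192.168.101.12:2181,192.168.101.92:2181/solr1"
SOLR_PORT=4567

# The real invocation (from the extracted Solr's bin directory) would be:
#   ./solr start -c -z "$ZK_HOST" -p "$SOLR_PORT"
echo "./solr start -c -z $ZK_HOST -p $SOLR_PORT"
```

Note there must be no spaces inside the comma-separated -z list, or the shell will split it into separate arguments.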
Re: [ANNOUNCE] Apache Solr 4.10.4 released
Hello Mike, How are you? This is Oded Sofer from IBM Guardium. We have moved to SolrCloud, and I thought you may be able to help me find something. The facet search is very slow, and I do not know how to check what the size of our facets is (GB / count). Do you know how I can check it? On Thursday, March 5, 2015 5:28 PM, Michael McCandless luc...@mikemccandless.com wrote: March 2015, Apache Solr™ 4.10.4 available The Lucene PMC is pleased to announce the release of Apache Solr 4.10.4 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing fault tolerant distributed search and indexing, and powers the search and navigation features of many of the world's largest internet sites. Solr 4.10.4 is available for immediate download at: http://www.apache.org/dyn/closer.cgi/lucene/solr/4.10.4 Solr 4.10.4 includes 24 bug fixes, as well as Lucene 4.10.4 and its 13 bug fixes. See the CHANGES.txt file included with the release for a full list of changes and further details. Please report any feedback to the mailing lists (http://lucene.apache.org/solr/discussion.html) Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access. Mike McCandless http://blog.mikemccandless.com
Re: Issue while enabling clustering/integrating carrot2 with solr 4.4.0 and tomcat under ubuntu
Class cast exceptions are usually the result of having a mix of old and new jars in your classpath, or even of having the same jar in two different places. Is this possible here? Best, Erick On Wed, Mar 4, 2015 at 6:44 PM, sthita sthit...@gmail.com wrote:

1. My solr.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true" sharedLib="/solr/lib">
  <cores defaultCoreName="rn0" hostContext="/solr" adminPath="/admin/cores" hostPort="8980">
    <core schema="schema.xml" shard="shard1" instanceDir="rn0/" name="rn0" config="solrconfig.xml" collection="rn"/>
    ..
    ..
  </cores>
</solr>

2. My solrconfig.xml changes to integrate carrot2:

<searchComponent class="org.apache.solr.handler.clustering.ClusteringComponent" enable="${solr.clustering.enabled:false}" name="clustering">
  <lst name="engine">
    <str name="name">default</str>
    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
    <str name="LingoClusteringAlgorithm.desiredClusterCountBase">20</str>
  </lst>
</searchComponent>

<requestHandler name="/clustering" startup="lazy" enable="${solr.clustering.enabled:false}" class="solr.SearchHandler">
  . . . .
</requestHandler>

<lib dir="/solr/lib" regex=".*\.jar" />

3. Copied all the required jars to the /solr/lib folder: solr-clustering-4.4.0.jar, carrot2-mini-3.6.2.jar, hppc-0.4.1.jar, jackson-core-asl-1.7.4.jar, jackson-mapper-asl-1.7.4.jar, mahout-collections-1.0.jar, mahout-math-0.6.jar, simple-xml-2.6.4.jar

4. Created a file named setenv.sh under /usr/share/tomcat/bin/ with clustering enabled: CATALINA_OPTS = -Dsolr.clustering.enabled=true

5. Restarted tomcat and I am getting the following error while starting the solr server after -Dsolr.clustering.enabled=true on CATALINA_OPTS:

ERROR org.apache.solr.servlet.SolrDispatchFilter – null:org.apache.solr.common.SolrException: SolrCore 'rn0' is not available due to init failure: Error Instantiating SearchComponent, org.apache.solr.handler.clustering.ClusteringComponent failed to instantiate org.apache.solr.handler.component.SearchComponent
    at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:860)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:251)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:936)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: Error Instantiating SearchComponent, org.apache.solr.handler.clustering.ClusteringComponent failed to instantiate org.apache.solr.handler.component.SearchComponent
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:835)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:629)
    at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:622)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:657)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:364)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:1)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    ... 3 more
Caused by: org.apache.solr.common.SolrException: Error Instantiating SearchComponent, org.apache.solr.handler.clustering.ClusteringComponent failed to instantiate org.apache.solr.handler.component.SearchComponent
    at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:551)
    at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:586)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2173)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2167)
    at
Re: Solrcloud Index corruption
Wait up. There's no master index in SolrCloud. Raw documents are forwarded to each replica, indexed and put in the local tlog. If a replica falls too far out of synch (say you take it offline), then the entire index _can_ be replicated from the leader and, if the leader's index was incomplete, then that might propagate the error. The practical consequence of this is that if _any_ replica has a complete index, you can recover. Before going there though, the brute-force approach is to just re-index everything from scratch. That's likely easier, especially on indexes this size.

Here's what I'd do. Assuming you have the Collections API calls for ADDREPLICA and DELETEREPLICA, then:

0> Identify the complete replicas. If you're lucky you have at least one for each shard.
1> Copy 1 good index from each shard somewhere just to have a backup.
2> DELETEREPLICA on all the incomplete replicas.
2.5> I might shut down all the nodes at this point and check that all the cores I'd deleted were gone. If any remnants exist, 'rm -rf deleted_core_dir'.
3> ADDREPLICA to get the ones removed in 2 back.

Step 3 should copy the entire index from the leader for each replica. As you do 2 the leadership will change, and after you've deleted all the incomplete replicas, one of the complete ones will be the leader and you should be OK.

If you don't want to/can't use the Collections API, then:

0> Identify the complete replicas. If you're lucky you have at least one for each shard.
1> Shut 'em all down.
2> Copy the good index somewhere just to have a backup.
3> 'rm -rf data' for all the incomplete cores.
4> Bring up the good cores.
5> Bring up the cores that you deleted the data dirs from.

What step 5 should do is replicate the entire index from the leader. When you restart the good cores (step 4 above), they'll _become_ the leader.

bq: Is it possible to make Solrcloud invulnerable for network problems

I'm a little surprised that this is happening.
It sounds like the network problems were such that some nodes weren't out of touch long enough for Zookeeper to sense that they were down and put them into recovery. Not sure there's any way to secure against that. bq: Is it possible to see if a core is corrupt? There's CheckIndex, here's at least one link: http://java.dzone.com/news/lucene-and-solrs-checkindex What you're describing, though, is that docs just didn't make it to the node, _not_ that the index has unexpected bits, bad disk sectors and the like so CheckIndex can't detect that. How would it know what _should_ have been in the index? bq: I noticed a difference in the Gen column on Overview - Replication. Does this mean there is something wrong? You cannot infer anything from this. In particular, the merging will be significantly different between a single full-reindex and what the state of segment merges is in an incrementally built index. The admin UI screen is rooted in the pre-cloud days, the Master/Slave thing is entirely misleading. In SolrCloud, since all the raw data is forwarded to all replicas, and any auto commits that happen may very well be slightly out of sync, the index size, number of segments, generations, and all that are pretty safely ignored. Best, Erick On Thu, Mar 5, 2015 at 6:50 AM, Martin de Vries mar...@downnotifier.com wrote: Hi Andrew, Even our master index is corrupt, so I'm afraid this won't help in our case. Martin Andrew Butkus schreef op 05.03.2015 16:45: Force a fetchindex on slave from master command: http://slave_host:port/solr/replication?command=fetchindex - from http://wiki.apache.org/solr/SolrReplication The above command will download the whole index from master to slave, there are configuration options in solr to make this problem happen less often (allowing it to recover from new documents added and only send the changes with a wider gap) - but I cant remember what those were.
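Erick's DELETEREPLICA/ADDREPLICA steps translate into Collections API calls roughly like the ones below. The collection, shard, and replica names are placeholders (check your cluster state for the real core_node names); the script only constructs and prints the URLs so the parameters are visible:

```shell
SOLR="http://localhost:8983/solr"
COLLECTION="mycollection"   # placeholder: your collection name
SHARD="shard1"              # placeholder: the shard with a bad replica
BAD_REPLICA="core_node3"    # placeholder: the incomplete replica's core_node name

# Step 2: remove the incomplete replica from the shard.
DELETE_URL="$SOLR/admin/collections?action=DELETEREPLICA&collection=$COLLECTION&shard=$SHARD&replica=$BAD_REPLICA"
# Step 3: add a fresh replica; it pulls its entire index from the shard leader.
ADD_URL="$SOLR/admin/collections?action=ADDREPLICA&collection=$COLLECTION&shard=$SHARD"

# The real calls would be: curl "$DELETE_URL" and curl "$ADD_URL"
echo "$DELETE_URL"
echo "$ADD_URL"
```

Repeat the pair once per incomplete replica, leaving a known-good replica in place on each shard so the new one has a complete leader to replicate from.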
Labels for facets on Velocity
Hello, I’ve been trying to get pretty names for my facets in the Velocity Response Writer. Do you know how I can do that? For example, suppose that I am faceting on field1. My query returns 3 facets: uglyfacet1, uglyfacet2 and uglyfacet3. I want to show them to the user with pretty names, like Pretty Facet 1, Pretty Facet 2 and Pretty Facet 3. The thing is that linking on velocity should still work, so the user can navigate the results. Thank you. Henrique.
RE: Cores and and ranking (search quality)
Hello - faceting will be the same, and distributed MoreLikeThis is also possible since 5.0 (there is a working patch for 4.10.3). Regular search will work as well since 5.0 because of distributed IDF, which you need to enable manually. Behaviour will not be the same if you rely on average document length statistics, which is the case when you use BM25 instead of the default TF-IDF similarity. Solr will do the result merging so everything is transparent, awesome! Markus -Original message- From:johnmu...@aol.com johnmu...@aol.com Sent: Thursday 5th March 2015 14:38 To: solr-user@lucene.apache.org Subject: Cores and and ranking (search quality) Hi, I have data in which I will index and search on. This data is well define such that I can index into a single core or multiple cores like so: core_1:Jan2015, core_2:Feb2015, core_3:Mar2015, etc. My question is this: if I put my data in multiple cores and use distributed search will the ranking be different if I had all my data in a single core? If yes, how will it be different? Also, will facet and more-like-this quality / result be the same? Also, reading the distributed search wiki (http://wiki.apache.org/solr/DistributedSearch) it looks like Solr does the search and result merging (all I have to do is issue a search), is this correct? Thanks! - MJ
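The "enable manually" part is a solrconfig.xml setting. A sketch for the 5.x distributed IDF feature follows — the class name is my recollection of that feature's stats-cache plugin, so verify it against your version's documentation before relying on it:

```xml
<!-- solrconfig.xml: turn on distributed IDF (off by default).
     ExactStatsCache fetches exact global term statistics per request;
     other statsCache implementations trade accuracy for fewer round trips. -->
<statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>
```

Without a statsCache configured, each shard scores with its own local IDF, which is the source of the ranking differences discussed in this thread.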
SOLR query parameters
Hello, could someone please explain what these SOLR query parameter keywords stand for: - ppcdb - srbycb - as For instance, http://searcharchives.iht.com:8983/solr/inytapdf0/browse?ppdcb=srbycb=as=q=kaisersort= I could not find them in the SOLR documentation. Many thanks. Philippe
Re: SOLR query parameters
Please ignore my question. These are form field names which I created a couple of months ago, not SOLR query parameters. Philippe - Mail original - De: phi...@free.fr À: solr-user@lucene.apache.org Envoyé: Jeudi 5 Mars 2015 14:54:26 Objet: SOLR query parameters Hello, could someone please explain what these SOLR query parameter keywords stand for: - ppcdb - srbycb - as For instance, http://searcharchives.iht.com:8983/solr/inytapdf0/browse?ppdcb=srbycb=as=q=kaisersort= I could not find them in the SOLR documentation. Many thanks. Philippe
Re: Cores and and ranking (search quality)
On Thu, 2015-03-05 at 14:34 +0100, johnmu...@aol.com wrote: My question is this: if I put my data in multiple cores and use distributed search will the ranking be different if I had all my data in a single core? Yes, it will be different. The practical impact depends on how homogeneous your data are across the shards and how large your shards are. If you have small and dissimilar shards, your ranking will suffer a lot. Work is being done to remedy this: https://issues.apache.org/jira/browse/SOLR-1632 Also, will facet and more-like-this quality / result be the same? It is not formally guaranteed, but for most practical purposes, faceting on multi-shards will give you the same results as single-shards. I don't know about more-like-this. My guess is that it will be affected in the same way that standard searches are. Also, reading the distributed search wiki (http://wiki.apache.org/solr/DistributedSearch) it looks like Solr does the search and result merging (all I have to do is issue a search), is this correct? Yes. From a user-perspective, searches are no different. - Toke Eskildsen, State and University Library, Denmark
[ANNOUNCE] Apache Solr 4.10.4 released
March 2015, Apache Solr™ 4.10.4 available The Lucene PMC is pleased to announce the release of Apache Solr 4.10.4 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing fault tolerant distributed search and indexing, and powers the search and navigation features of many of the world's largest internet sites. Solr 4.10.4 is available for immediate download at: http://www.apache.org/dyn/closer.cgi/lucene/solr/4.10.4 Solr 4.10.4 includes 24 bug fixes, as well as Lucene 4.10.4 and its 13 bug fixes. See the CHANGES.txt file included with the release for a full list of changes and further details. Please report any feedback to the mailing lists (http://lucene.apache.org/solr/discussion.html) Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access. Mike McCandless http://blog.mikemccandless.com
Re: solr 4.7.2 mergeFactor/ Merge policy issue
I would, BTW, either just get rid of the maxBufferedDocs altogether or make it much higher, i.e. 10. I don't think this is really your problem, but you're creating a lot of segments here. But I'm kind of at a loss as to what would be different about your setup. Is there _any_ chance that you have some secondary process looking at your index that's maintaining open searchers? Any custom code that's perhaps failing to close searchers? Is this a Unix or Windows system? And just to be really clear, you're _only_ seeing more segments being added, right? If you're only counting files in the index directory, it's _possible_ that merging is happening and you're just seeing new files take the place of old ones. Best, Erick On Wed, Mar 4, 2015 at 7:12 PM, Shawn Heisey apa...@elyograg.org wrote: On 3/4/2015 4:12 PM, Erick Erickson wrote: I _think_, but don't know for sure, that the merging stuff doesn't get triggered until you commit, it doesn't just happen. Shot in the dark... I believe that new segments are created when the indexing buffer (ramBufferSizeMB) fills up, even without commits. I'm pretty sure that anytime a new segment is created, the merge policy is checked to see whether a merge is needed. Thanks, Shawn
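Erick's suggestion — dropping maxBufferedDocs and letting the RAM buffer drive segment creation, as Shawn describes — would look something like this in solrconfig.xml. This is a sketch; the 100 MB value is an illustrative choice, not a number from the thread:

```xml
<indexConfig>
  <!-- No maxBufferedDocs element: segments are then flushed only when
       the in-memory indexing buffer reaches ramBufferSizeMB, so far
       fewer tiny segments are created between commits. -->
  <ramBufferSizeMB>100</ramBufferSizeMB>
</indexConfig>
```

With both settings present, whichever threshold is hit first triggers a flush, which is why a low maxBufferedDocs can produce many small segments regardless of the RAM buffer size.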
Parsing cluster result's docs
Hi, I have a Solr instance using the clustering component (with the Lingo algorithm) working perfectly. However, when I get back the cluster results, only the IDs of the documents come back with them. What is the easiest way to retrieve full documents instead? Should I parse these IDs into a new query to Solr, or is there some configuration I am missing to return full docs instead of IDs? If it matters, I am using Solr 4.10. Thanks.
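The follow-up-query approach the poster suggests can be sketched like this. It is an assumption, not an official clustering option: the unique-key field name `id`, the host, and the collection name are all placeholders. The script builds the boolean ID query and prints it; the real fetch would be a second select request:

```shell
# IDs returned by the clustering component (placeholders).
IDS="doc1 doc42 doc99"

# Build an id:(doc1 OR doc42 OR doc99) query to fetch the full documents.
Q="id:($(echo $IDS | sed 's/ / OR /g'))"

# The real call would be something like:
#   curl "http://localhost:8983/solr/collection1/select?q=$Q&wt=json"
echo "$Q"
```

Remember to URL-encode the query (spaces and parentheses) before putting it on a request line.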
Solrcloud Index corruption
Hi, We have index corruption on some cores on our Solrcloud running version 4.8.1. The index is corrupt on several servers. (For example: when we do an fq search we get results on some servers, on other servers we don't, while the stored document contains the field on all servers.) A full re-index of the content didn't help, so we created a new core and did the reindex on that one. We think the index corruption is caused by network issues we had a few weeks ago. I hope someone can help us with some questions:

- Is it possible to make Solrcloud invulnerable for network problems like packet loss or connection errors? Will it for example help to use an SSL connection between the Solr servers?
- Is it possible to see if a core is corrupt? We now noticed because we didn't find some documents while searching on the website, but don't know if other cores are corrupt. I noticed a difference in the Gen column on Overview - Replication. Does this mean there is something wrong? Or is there any other way to see the corruption?

Corrupt core:
                     Version        Gen      Size
Master (Searching)   1425565575249  2023309  472.41 MB
Master (Replicable)  1425566098510  2023310  -
Slave (Searching)    1425565575253  2023308  472.38 MB

Re-created core:
                     Version        Gen  Size
Master (Searching)   1425566108174  35   283.98 MB
Master (Replicable)  1425566108174  35   -
Slave (Searching)    1425566106674  35   288.24 MB

Kind regards, Martin
Re: Solrcloud Index corruption
We had a similar issue, when this happened we did a fetch index on each core out of sync to put them back right again Sent from my iPhone On 5 Mar 2015, at 14:40, Martin de Vries mar...@downnotifier.com wrote: Hi, We have index corruption on some cores on our Solrcloud running version 4.8.1. The index is corrupt on several servers. (for example: when we do an fq search we get results on some servers, on other servers we don't, while the stored document contains the field on all servers). A full re-index of the content didn't help, so we created a new core and did the reindex on that one. We think the index corruption is caused by network issues we had a few weeks ago. I hope someone can help us with some questions: - Is it possible to make Solrcloud invulnerable for network problems like packet loss or connection errors? Will it for example help to use an SSL connection between the Solr servers? - Is it possible to see if a core is corrupt? We now noticed because we didn't find some documents while searching on the website, but don't know if other cores are corrupt. I noticed a difference in the Gen column on Overview - Replication. Does this mean there is something wrong? Or is there any other way to see the corruption?

Corrupt core:
                     Version        Gen      Size
Master (Searching)   1425565575249  2023309  472.41 MB
Master (Replicable)  1425566098510  2023310  -
Slave (Searching)    1425565575253  2023308  472.38 MB

Re-created core:
                     Version        Gen  Size
Master (Searching)   1425566108174  35   283.98 MB
Master (Replicable)  1425566108174  35   -
Slave (Searching)    1425566106674  35   288.24 MB

Kind regards, Martin
RE: Solrcloud Index corruption
Force a fetchindex on slave from master command: http://slave_host:port/solr/replication?command=fetchindex - from http://wiki.apache.org/solr/SolrReplication The above command will download the whole index from master to slave, there are configuration options in solr to make this problem happen less often (allowing it to recover from new documents added and only send the changes with a wider gap) - but I cant remember what those were. -Original Message- From: Andrew Butkus [mailto:andrew.but...@c6-intelligence.com] Sent: 05 March 2015 14:42 To: solr-user@lucene.apache.org Subject: Re: Solrcloud Index corruption We had a similar issue, when this happened we did a fetch index on each core out of sync to put them back right again Sent from my iPhone On 5 Mar 2015, at 14:40, Martin de Vries mar...@downnotifier.com wrote: Hi, We have index corruption on some cores on our Solrcloud running version 4.8.1. The index is corrupt on several servers. (for example: when we do an fq search we get results on some servers, on other servers we don't, while the stored document contains the field on all servers). A full re-index of the content didn't help, so we created a new core and did the reindex on that one. We think the index corruption is caused by network issues we had a few weeks ago. I hope someone can help us with some questions: - Is it possible to make Solrcloud invulnerable for network problems like packet loss or connection errors? Will it for example help to use an SSL connection between the Solr servers? - Is it possible to see if a core is corrupt? We now noticed because we didn't find some documents while searching on the website, but don't know if other cores are corrupt. I noticed a difference in the Gen column on Overview - Replication. Does this mean there is something wrong? Or is there any other way to see the corruption? 
Corrupt core:
                     Version        Gen      Size
Master (Searching)   1425565575249  2023309  472.41 MB
Master (Replicable)  1425566098510  2023310  -
Slave (Searching)    1425565575253  2023308  472.38 MB

Re-created core:
                     Version        Gen  Size
Master (Searching)   1425566108174  35   283.98 MB
Master (Replicable)  1425566108174  35   -
Slave (Searching)    1425566106674  35   288.24 MB

Kind regards, Martin
Cores and and ranking (search quality)
Hi, I have data which I will index and search on. This data is well defined such that I can index it into a single core or multiple cores like so: core_1:Jan2015, core_2:Feb2015, core_3:Mar2015, etc. My question is this: if I put my data in multiple cores and use distributed search, will the ranking be different than if I had all my data in a single core? If yes, how will it be different? Also, will facet and more-like-this quality / results be the same? Also, reading the distributed search wiki (http://wiki.apache.org/solr/DistributedSearch) it looks like Solr does the search and result merging (all I have to do is issue a search), is this correct? Thanks! - MJ
RE: Solrcloud Index corruption
Hi Andrew, Even our master index is corrupt, so I'm afraid this won't help in our case. Martin Andrew Butkus schreef op 05.03.2015 16:45: Force a fetchindex on slave from master command: http://slave_host:port/solr/replication?command=fetchindex - from http://wiki.apache.org/solr/SolrReplication The above command will download the whole index from master to slave, there are configuration options in solr to make this problem happen less often (allowing it to recover from new documents added and only send the changes with a wider gap) - but I cant remember what those were.
Re: Performance on faceting using docValues
On Thu, 2015-03-05 at 21:14 +0100, lei wrote: You present a very interesting observation. I have not noticed what you describe, but on the other hand we have not done comparative speed tests. q=*:*&fq=country:US&fq=category:112 First observation: Your query is *:*, which is a magic query. Non-DV faceting has optimizations both for this query (although that ought to be disabled due to the fq) and for the inverse case where there are more hits than non-hits. Perhaps you could test with a handful of queries which have different result sizes? facet=on&facet.sort=index&facet.mincount=1&facet.limit=2000 The combination of index order and a high limit might be an explanation: When resolving the Strings of the facet result, non-DV will perform ordinal lookup, which is fast when done in monotonically rising order (sort=index) and when the values are close (limit=2000). I do not know if DV benefits the same way. On the other hand, your limit seems to apply only to material, so it could be that the real number of unique values is low and you just set the limit to 2000 to be sure you get everything? facet.field=manufacturer&facet.field=seller&facet.field=material &f.manufacturer.facet.mincount=1&f.manufacturer.facet.sort=count&f.manufacturer.facet.limit=100 &f.seller.facet.mincount=1&f.seller.facet.sort=count&f.seller.facet.limit=100 &f.material.facet.mincount=1&sort=score+desc How large is your index in bytes, how many documents does it contain, and is it single-shard or cloud? Could you paste the log lines containing "UnInverted field", which describe the number of unique values and the size of your facet fields? - Toke Eskildsen, State and University Library, Denmark
Re: problem with tutorial
Do you deploy your Solr in Tomcat? Which port is Tomcat using? 2014-12-16 15:45 GMT+08:00 Xin Cai xincai2...@gmail.com: hi Everyone, I am a complete noob when it comes to Solr, and when I try to follow the tutorial and run Solr I get the error message: Waiting to see Solr listening on port 8983 [-] Still not seeing Solr listening on 8983 after 30 seconds! I did some googling and all I found were instructions for removing grep commands, which doesn't sound right to me... I have checked my ports and currently I don't have any service listening on port 8983, and my firewall is not on, so I am not sure what is happening. Any help would be appreciated. Thanks Xin Cai