Re: Filtering Search Cloud
I cannot emphasize strongly enough that you need to _prove_ you have a problem before you decide on a solution! Do you have any evidence that SolrCloud can't handle the load you intend? Might a better approach be simply to create more shards, thus spreading the load, and get all the HA/DR goodness of SolrCloud?

So far you've said you'll have a heavy load without giving us any numbers. 10,000 updates/second? 10 updates/second? 1 query/second? 100,000 queries/second? 100,000 documents? 1,000,000,000,000 documents?

Best,
Erick

On Wed, Apr 3, 2013 at 5:15 PM, Shawn Heisey s...@elyograg.org wrote:
> I don't understand the question. I will attempt to give you more information, but it might not answer your question. If not, you'll have to try to improve your question.
>
> Your master and each of that master's slaves will have the same index as soon as replication is done. A query on the slave has no idea that the master exists.
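One way to gather the kind of numbers Erick asks for is a small timing harness. The following is only a sketch under stated assumptions: `measure` and `fake_query` are hypothetical names, and `fake_query` stands in for whatever call actually sends a request to Solr. It reports throughput plus median and 95th-percentile latency for a list of queries.

```python
import math
import statistics
import time


def measure(run_query, queries):
    """Time each call to run_query(q) and summarise the results."""
    if not queries:
        raise ValueError("need at least one query to measure")
    latencies = []
    start = time.perf_counter()
    for q in queries:
        t0 = time.perf_counter()
        run_query(q)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    # Nearest-rank index of the 95th-percentile sample.
    p95_index = max(0, math.ceil(0.95 * len(latencies)) - 1)
    return {
        "count": len(latencies),
        "queries_per_second": len(latencies) / elapsed if elapsed > 0 else float("inf"),
        "median_seconds": statistics.median(latencies),
        "p95_seconds": latencies[p95_index],
    }


def fake_query(q):
    # Hypothetical stand-in for a real HTTP request to a Solr node.
    time.sleep(0.001)


if __name__ == "__main__":
    print(measure(fake_query, ["solr"] * 50))
```

Running the same harness while indexing is in progress and while it is idle would give a first indication of whether indexing load affects query latency on a given setup.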
Re: Filtering Search Cloud
OK, I will test and send you a detailed report. Thanks for your help.

2013/4/5 Erick Erickson erickerick...@gmail.com
> I cannot emphasize strongly enough that you need to _prove_ you have a problem before you decide on a solution! Do you have any evidence that SolrCloud can't handle the load you intend? Might a better approach be simply to create more shards, thus spreading the load, and get all the HA/DR goodness of SolrCloud?
>
> So far you've said you'll have a heavy load without giving us any numbers. 10,000 updates/second? 10 updates/second? 1 query/second? 100,000 queries/second? 100,000 documents? 1,000,000,000,000 documents?
Re: Filtering Search Cloud
On 4/1/2013 3:02 PM, Furkan KAMACI wrote:
> I want to separate my cloud into two logical parts: an indexer cloud and a searcher cloud, both running SolrCloud. My first question: does separating the cloud this way make sense as a performance improvement? I think that searching takes longer to respond while indexing is going on, so separating them should improve performance. On the other hand, maybe SolrCloud can do better load balancing if all the Solr machines are used as a whole (that is, without the partitioning I described). I would like to know which is the case.
>
> My second question: assume I have separated my machines as described. Can I filter some indexes so that they are or are not searchable from the searcher SolrCloud?

SolrCloud gets rid of the master and slave designations. It also gets rid of the line between indexing and querying. Each shard has a replica that is designated the leader, but that has no real impact on searching and indexing, only on deciding which data to use when replicas get out of sync.

In the old master-slave architecture, you indexed to the master and the updated index files were replicated to the slave. The slave did not handle the analysis for indexing, so it was usually better to send queries to slaves and let the master only do indexing.

SolrCloud is very different. When you index, the documents are indexed on all replicas at about the same time. When you query, the requests are load balanced across all replicas. During normal operation, SolrCloud does not use replication at all. The replication feature is only used when a replica gets out of sync with the leader, and in that case, the entire index is replicated.

Thanks,
Shawn
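The behaviour Shawn describes (every replica indexes every update, and queries are spread across replicas) can be pictured with a toy model. This is a sketch for illustration only, not Solr's implementation: the `Shard` class, the replica names, and the round-robin policy are all assumptions made for the example.

```python
import itertools


class Shard:
    """Toy model of one SolrCloud shard; not Solr's actual implementation."""

    def __init__(self, replica_names):
        # Each replica holds its own full copy of the shard's documents.
        self.replicas = {name: {} for name in replica_names}
        # Treat the first replica as the leader (an arbitrary choice here).
        self.leader = replica_names[0]
        self._next_replica = itertools.cycle(replica_names)

    def index(self, doc_id, doc):
        # An update is applied to every replica, not only the leader.
        for docs in self.replicas.values():
            docs[doc_id] = dict(doc)

    def query(self, doc_id):
        # Queries are spread across replicas (simple round-robin here).
        name = next(self._next_replica)
        return name, self.replicas[name].get(doc_id)


shard = Shard(["replica1", "replica2"])
shard.index("1", {"title": "hello"})
served_by = [shard.query("1")[0] for _ in range(4)]
print(served_by)  # ['replica1', 'replica2', 'replica1', 'replica2']
```

In this model there is no node that only indexes or only serves queries, which is why the indexer/searcher split asked about in the thread does not map onto SolrCloud.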
Re: Filtering Search Cloud
Shawn, thanks for your detailed explanation. My system will run under high load: I will always be indexing something, and something will always be queried. That is why I am considering physically separating the indexing machines from the query machines. Imagine a machine that is both indexing (one kind of disk I/O; I don't know the internals, maybe Solr does sequential I/O) and trying to answer queries (another kind of I/O). That is the main reason I am considering separating them.

The next question is: if I separate them, can I filter the data from the indexer machines before it is returned in a response? (I don't have any filtering requirement right now; I just think I may need it in the future.)

2013/4/3 Shawn Heisey s...@elyograg.org
> SolrCloud gets rid of the master and slave designations. It also gets rid of the line between indexing and querying. Each shard has a replica that is designated the leader, but that has no real impact on searching and indexing, only on deciding which data to use when replicas get out of sync.
>
> In the old master-slave architecture, you indexed to the master and the updated index files were replicated to the slave. The slave did not handle the analysis for indexing, so it was usually better to send queries to slaves and let the master only do indexing.
>
> SolrCloud is very different. When you index, the documents are indexed on all replicas at about the same time. When you query, the requests are load balanced across all replicas. During normal operation, SolrCloud does not use replication at all. The replication feature is only used when a replica gets out of sync with the leader, and in that case, the entire index is replicated.
Re: Filtering Search Cloud
On 4/3/2013 1:13 PM, Furkan KAMACI wrote:
> Shawn, thanks for your detailed explanation. My system will run under high load: I will always be indexing something, and something will always be queried. That is why I am considering physically separating the indexing machines from the query machines. Imagine a machine that is both indexing (one kind of disk I/O; I don't know the internals, maybe Solr does sequential I/O) and trying to answer queries (another kind of I/O). That is the main reason I am considering separating them.
>
> The next question is: if I separate them, can I filter the data from the indexer machines before it is returned in a response? (I don't have any filtering requirement right now; I just think I may need it in the future.)

We do seem to have a language barrier, so let me try to be very clear: If you use SolrCloud, you can't separate querying and indexing. You will have to use the master-slave replication that has been part of Solr since at least 1.4, possibly earlier.

Thanks,
Shawn
Re: Filtering Search Cloud
Thanks for your explanation; you explained everything I need. Just one more question. I see that I cannot do this with SolrCloud, but I can do something like it with Solr's master-slave replication. If I use master-slave replication, can I exclude (filter) something that was indexed on the master from the responses to queries sent to the slaves?

2013/4/3 Shawn Heisey s...@elyograg.org
> We do seem to have a language barrier, so let me try to be very clear: If you use SolrCloud, you can't separate querying and indexing. You will have to use the master-slave replication that has been part of Solr since at least 1.4, possibly earlier.
Re: Filtering Search Cloud
On 4/3/2013 1:52 PM, Furkan KAMACI wrote:
> Thanks for your explanation; you explained everything I need. Just one more question. I see that I cannot do this with SolrCloud, but I can do something like it with Solr's master-slave replication. If I use master-slave replication, can I exclude (filter) something that was indexed on the master from the responses to queries sent to the slaves?

I don't understand the question. I will attempt to give you more information, but it might not answer your question. If not, you'll have to try to improve your question.

Your master and each of that master's slaves will have the same index as soon as replication is done. A query on the slave has no idea that the master exists.

Thanks,
Shawn
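Since the slave holds the same index as the master, any exclusion has to happen at query time. The thread does not suggest a specific mechanism; one possibility is Solr's `fq` (filter query) request parameter. The sketch below only builds such a request URL. It assumes a hypothetical boolean field named `hidden` in the schema, and the host and core names are placeholders.

```python
from urllib.parse import urlencode


def build_select_url(base_url, user_query, hidden_field="hidden"):
    """Build a /select URL that excludes documents flagged as hidden."""
    params = [
        ("q", user_query),
        # fq narrows the result set without changing relevance scores.
        ("fq", f"-{hidden_field}:true"),
        ("wt", "json"),
    ]
    return f"{base_url}/select?{urlencode(params)}"


# Placeholder host and core name, for illustration only.
print(build_select_url("http://slave1:8983/solr/collection1", "title:solr"))
# http://slave1:8983/solr/collection1/select?q=title%3Asolr&fq=-hidden%3Atrue&wt=json
```

With this approach the documents still exist in the replicated index; they are only left out of responses, and only for clients that send the filter.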
Filtering Search Cloud
I want to separate my cloud into two logical parts: an indexer cloud and a searcher cloud, both running SolrCloud.

My first question: does separating the cloud this way make sense as a performance improvement? I think that searching takes longer to respond while indexing is going on, so separating them should improve performance. On the other hand, maybe SolrCloud can do better load balancing if all the Solr machines are used as a whole (that is, without the partitioning I described). I would like to know which is the case.

My second question: assume I have separated my machines as described. Can I filter some indexes so that they are or are not searchable from the searcher SolrCloud?