Re: Filtering Search Cloud

2013-04-05 Thread Erick Erickson
I cannot emphasize strongly enough that you need to _prove_ you have
a problem before you decide on a solution! Do you have any evidence
that SolrCloud can't handle the load you intend? Might a better approach
be just to create more shards, thus spreading the load, and get all the
HA/DR goodness of SolrCloud?

So far you've said you'll have a heavy load without giving us any numbers.
10,000 updates/second? 10 updates/second? 1 query/second? 100,000
queries/second? 100,000 documents? 1,000,000,000,000 documents?
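To see why those numbers matter, here is a rough back-of-envelope sketch in Python. The function name and every capacity figure are hypothetical assumptions: real per-shard and per-replica limits can only come from load-testing your own hardware, schema and queries. The idea is that document count drives the number of shards, while query rate drives the number of replicas per shard.

```python
import math


def estimate_cluster(num_docs, peak_qps, docs_per_shard, qps_per_replica, min_replicas=2):
    """Rough shard/replica count for a sharded, replicated Solr setup.

    docs_per_shard and qps_per_replica are placeholders that have to be
    measured by load-testing your own hardware, schema and queries.
    Update load is ignored here, although every replica also indexes.
    """
    # Split the corpus so that no shard holds more than docs_per_shard documents.
    shards = max(1, math.ceil(num_docs / docs_per_shard))
    # A distributed query touches one replica of every shard, so each shard
    # sees the full query rate; add replicas until each one's share fits,
    # keeping at least min_replicas for high availability.
    replicas_per_shard = max(min_replicas, math.ceil(peak_qps / qps_per_replica))
    return shards, replicas_per_shard, shards * replicas_per_shard


# Hypothetical inputs, for illustration only.
print(estimate_cluster(num_docs=100_000_000, peak_qps=500,
                       docs_per_shard=20_000_000, qps_per_replica=100))
# prints (5, 5, 25)
```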

Best
Erick

On Wed, Apr 3, 2013 at 5:15 PM, Shawn Heisey s...@elyograg.org wrote:
 On 4/3/2013 1:52 PM, Furkan KAMACI wrote:
 Thanks for your explanation; you explained everything I needed. Just
 one more question. I see that I cannot do this with SolrCloud, but I can
 do something like it with Solr's master-slave replication. If I use
 master-slave replication, can I eliminate (filter) something that was
 indexed on the master from the responses to queries sent to the
 slaves?

 I don't understand the question.  I will attempt to give you more
 information, but it might not answer your question.  If not, you'll have
 to try to improve your question.

 Your master and each of that master's slaves will have the same index as
 soon as replication is done.  A query on the slave has no idea that the
 master exists.

 Thanks,
 Shawn



Re: Filtering Search Cloud

2013-04-05 Thread Furkan KAMACI
OK, I will test it and give you a detailed report. Thanks for your help.





Re: Filtering Search Cloud

2013-04-03 Thread Shawn Heisey
On 4/1/2013 3:02 PM, Furkan KAMACI wrote:
 I want to separate my cloud into two logical parts: an indexer cloud
 and a searcher cloud, both running SolrCloud.
 
 My first question: does separating my cloud this way improve
 performance? I think that indexing slows down query responses, so
 separating the two should improve performance. On the other hand, maybe
 SolrCloud can balance the load better if I use all of my Solr machines
 as a whole (I mean without partitioning them as described). I would like
 to know which is better.
 
 My second question: assume that I have separated my machines as
 described. Can I filter, on the searcher SolrCloud, which indexes are
 searchable?

SolrCloud gets rid of the master and slave designations.  It also gets
rid of the line between indexing and querying.  Each shard has a replica
that is designated the leader, but that has no real impact on searching
and indexing, only on deciding which data to use when replicas get out
of sync.

In the old master-slave architecture, you indexed to the master and the
updated index files were replicated to the slave.  The slave did not
handle the analysis for indexing, so it was usually better to send
queries to slaves and let the master only do indexing.
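A toy Python model of the master-slave flow described above. It is not Solr code; the class and method names are illustrative assumptions, and deletes are not modeled. Only the master analyzes and indexes documents, and a slave pulls the master's committed index, so queries sent to the slave never pay the indexing cost.

```python
class Master:
    """Toy master: the only node that analyzes and indexes documents."""

    def __init__(self):
        self.pending = {}     # indexed but not yet committed
        self.committed = {}   # what replication is allowed to copy
        self.generation = 0   # incremented on every commit

    def add(self, doc_id, doc):
        self.pending[doc_id] = doc

    def commit(self):
        self.committed = {**self.committed, **self.pending}
        self.pending = {}
        self.generation += 1


class Slave:
    """Toy slave: copies committed index data from the master and answers queries."""

    def __init__(self, master):
        self.master = master
        self.index = {}
        self.generation = 0

    def poll(self):
        # Pull the master's committed index only if it is newer; no analysis here.
        if self.master.generation > self.generation:
            self.index = dict(self.master.committed)
            self.generation = self.master.generation

    def search(self, predicate):
        # Queries only ever look at the slave's own copy of the index.
        return sorted(doc_id for doc_id, doc in self.index.items() if predicate(doc))


master = Master()
slave = Slave(master)
master.add("1", {"title": "solr"})
slave.poll()
print(slave.search(lambda doc: True))  # prints [] (nothing committed yet)
master.commit()
slave.poll()
print(slave.search(lambda doc: True))  # prints ['1']
```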

SolrCloud is very different.  When you index, the documents are indexed
on all replicas at about the same time.  When you query, the requests
are load balanced across all replicas.  During normal operation,
SolrCloud does not use replication at all.  The replication feature is
only used when a replica gets out of sync with the leader, and in that
case, the entire index is replicated.
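For contrast, a toy Python model of one SolrCloud shard as described above. It is not Solr's implementation; the replica names, the choice of the first replica as leader, and round-robin query balancing are illustrative assumptions. Every update is indexed on every replica, each query is answered by one replica, and a full copy from the leader is used only to resync a replica that fell out of sync.

```python
import itertools


class Replica:
    """One copy of a shard's index (toy model)."""

    def __init__(self, name):
        self.name = name
        self.docs = {}  # doc id -> document


class Shard:
    """Toy model of one SolrCloud shard, not Solr's real implementation."""

    def __init__(self, replica_names):
        self.replicas = [Replica(name) for name in replica_names]
        # Illustrative choice: the first replica acts as leader.
        self.leader = self.replicas[0]
        # Illustrative load-balancing policy: simple round-robin.
        self._next_replica = itertools.cycle(self.replicas)

    def index(self, doc_id, doc):
        # The leader (first in the list) indexes the update, then the others do;
        # every replica does its own indexing work, so there is no
        # indexing-only or query-only node.
        for replica in self.replicas:
            replica.docs[doc_id] = doc

    def query(self, predicate):
        # Each query is answered by one replica, chosen round-robin.
        replica = next(self._next_replica)
        hits = sorted(doc_id for doc_id, doc in replica.docs.items() if predicate(doc))
        return replica.name, hits

    def recover(self, replica):
        # Only for a replica that is out of sync: copy the leader's whole index.
        replica.docs = dict(self.leader.docs)


shard = Shard(["replica_a", "replica_b"])
shard.index("1", {"title": "solr"})
print(shard.query(lambda doc: True))  # prints ('replica_a', ['1'])
print(shard.query(lambda doc: True))  # prints ('replica_b', ['1'])
```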

Thanks,
Shawn



Re: Filtering Search Cloud

2013-04-03 Thread Furkan KAMACI
Shawn, thanks for your detailed explanation. My system will run under a
high load: I will always be indexing something, and something will always
be queried. That is why I am considering physically separating the
indexing machines from the query machines. Imagine a machine that both
does indexing (disk I/O for that; I don't know the internals, maybe Solr
does sequential I/O) and tries to answer queries (another kind of I/O).
That is the main reason I am thinking of separating them. The next step:
if I separate them, can I filter the data from the indexer machines
before it reaches the responses? (I don't have any filtering issues right
now; I just think I may need it in the future.)





Re: Filtering Search Cloud

2013-04-03 Thread Shawn Heisey
On 4/3/2013 1:13 PM, Furkan KAMACI wrote:
 Shawn, thanks for your detailed explanation. My system will run under a
 high load: I will always be indexing something, and something will always
 be queried. That is why I am considering physically separating the
 indexing machines from the query machines. Imagine a machine that both
 does indexing (disk I/O for that; I don't know the internals, maybe Solr
 does sequential I/O) and tries to answer queries (another kind of I/O).
 That is the main reason I am thinking of separating them. The next step:
 if I separate them, can I filter the data from the indexer machines
 before it reaches the responses? (I don't have any filtering issues right
 now; I just think I may need it in the future.)

We do seem to have a language barrier, so let me try to be very clear:
If you use SolrCloud, you can't separate querying and indexing.  To do
that, you will have to use the master-slave replication that has been
part of Solr since at least 1.4, possibly earlier.

Thanks,
Shawn



Re: Filtering Search Cloud

2013-04-03 Thread Furkan KAMACI
Thanks for your explanation; you explained everything I needed. Just
one more question. I see that I cannot do this with SolrCloud, but I can
do something like it with Solr's master-slave replication. If I use
master-slave replication, can I eliminate (filter) something that was
indexed on the master from the responses to queries sent to the
slaves?





Re: Filtering Search Cloud

2013-04-03 Thread Shawn Heisey
On 4/3/2013 1:52 PM, Furkan KAMACI wrote:
 Thanks for your explanation; you explained everything I needed. Just
 one more question. I see that I cannot do this with SolrCloud, but I can
 do something like it with Solr's master-slave replication. If I use
 master-slave replication, can I eliminate (filter) something that was
 indexed on the master from the responses to queries sent to the
 slaves?

I don't understand the question.  I will attempt to give you more
information, but it might not answer your question.  If not, you'll have
to try to improve your question.

Your master and each of that master's slaves will have the same index as
soon as replication is done.  A query on the slave has no idea that the
master exists.
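Because a slave serves exactly the index it copied from the master, one possible way to keep some documents out of slave responses is to exclude them at query time with a Solr filter query (`fq`). A minimal sketch in Python, assuming a hypothetical field `searchable` that is set when documents are indexed on the master; the host, core name and helper function are placeholders. Values are not escaped, so this only suits simple terms.

```python
from urllib.parse import urlencode


def build_select_url(base_url, q, exclude_field, exclude_value, rows=10):
    """Build a Solr /select URL that hides matching documents via a filter query."""
    params = {
        "q": q,
        # Negative filter query: drop documents whose exclude_field matches exclude_value.
        "fq": f"-{exclude_field}:{exclude_value}",
        "rows": rows,
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical slave URL and field name.
print(build_select_url("http://slave-host:8983/solr/collection1",
                       "title:solr", "searchable", "false"))
# prints http://slave-host:8983/solr/collection1/select?q=title%3Asolr&fq=-searchable%3Afalse&rows=10&wt=json
```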

Thanks,
Shawn



Filtering Search Cloud

2013-04-01 Thread Furkan KAMACI
I want to separate my cloud into two logical parts: an indexer cloud
and a searcher cloud, both running SolrCloud.

My first question: does separating my cloud this way improve
performance? I think that indexing slows down query responses, so
separating the two should improve performance. On the other hand, maybe
SolrCloud can balance the load better if I use all of my Solr machines
as a whole (I mean without partitioning them as described). I would like
to know which is better.

My second question: assume that I have separated my machines as
described. Can I filter, on the searcher SolrCloud, which indexes are
searchable?