Re: solr in distributed mode

2009-06-11 Thread Rakhi Khatwani
Hi,
I went through the document:
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr

I have a couple of questions:

1. The document mentions:
"There will be a 'master' server for each shard and then 1-n 'slaves' that
are replicated from the master."

How is the replication process done?

Suppose I have two machines, nodeA and nodeB.
I edited scripts.conf in solr/conf on both nodeA and nodeB to point to the
master (i.e. nodeA).
   i) Is this the right approach for setting up a master/slave configuration?
   ii) To start the master/slave configuration, should I execute start.jar on
both nodes, or only on the master node?
   iii) Are indexes replicated automatically when I insert or update documents
on the master, or do we have to run a script for that?
   iv) How do I know whether the replication process completed successfully?
   v) Suppose the master goes down. How do I perform a node failover, for
example by making one of the slaves the master, without disrupting my
application?


2. It has also been mentioned that:

With distribution and replication, none of the master shards know about
each other. You index to each master, the index is replicated to each slave,
and then searches are distributed across the slaves, using one slave from
each master/slave shard.

   i) Are slaves used only for index replication? I mean, can't I have
indexes distributed across the slaves, so that a search runs across all of
them?
   ii) Since none of the shards has any information about the others, if I
update or delete documents based on a term, how does the index get updated
across all shards? Or do we have to merge, update/delete, and then distribute
the index across the shards again?

Regards,
Rakahi
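
On question 2(ii): because the master shards do not know about each other, a
delete by term presumably has to be sent to every master separately, with no
merge and re-split needed. Below is a minimal sketch of preparing those
requests, assuming Solr's XML update messages (<delete><query>...</query>
</delete> followed by <commit/>). The host names, port 8983, the /solr path
and the field name in the usage lines are made up for illustration, and the
code only builds the requests; it does not send them.

```python
from xml.sax.saxutils import escape


def delete_by_query_payload(query):
    """Return the XML update message that deletes all documents matching query."""
    # escape() protects characters such as & and < inside the query text.
    return "<delete><query>%s</query></delete>" % escape(query)


def update_requests(masters, query):
    """Return one delete and one commit (url, body) pair per master shard."""
    body = delete_by_query_payload(query)
    requests = []
    for master in masters:
        update_url = "http://%s/update" % master
        requests.append((update_url, body))
        # The delete only becomes visible to searchers after a commit.
        requests.append((update_url, "<commit/>"))
    return requests


# Hypothetical masters, one per shard.
masters = ["shard1-master:8983/solr", "shard2-master:8983/solr"]
for url, body in update_requests(masters, "category:obsolete"):
    print(url, body)
```

Each body would then be POSTed to its URL as XML. The slaves pick up the
change the next time they replicate from their master.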





In a distributed configuration, one server 'shard' will get a query request
and then search itself, as well as the other shards in the configuration,
and return the combined results from each shard.
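
As a rough illustration of the paragraph above: in Solr's distributed search
the request goes to one node and carries a "shards" parameter listing every
shard to search, and that node merges the results. The sketch below only
builds such a URL. nodeA and nodeB are the machines from the question; port
8983 and the /solr path are assumed example defaults, and the query is
invented.

```python
from urllib.parse import urlencode


def distributed_query_url(coordinator, shards, query, rows=10):
    """Build a select URL asking one Solr node to search all listed shards."""
    params = {
        "q": query,
        "rows": rows,
        # Comma-separated host:port/path entries, written without "http://".
        "shards": ",".join(shards),
    }
    return "http://%s/select?%s" % (coordinator, urlencode(params))


shards = ["nodeA:8983/solr", "nodeB:8983/solr"]
# Any of the shards can receive the request; here it is the first one.
url = distributed_query_url(shards[0], shards, "title:solr")
print(url)
```

With master/slave replication in place, the entries in the shards list would
normally be one slave per shard rather than the masters.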







solr in distributed mode

2009-06-09 Thread Rakhi Khatwani
Hi,
I am looking for ways to use Solr in distributed mode.
Is there any way to use Solr indexes across machines, or by using the Hadoop
Distributed File System?

The wiki mentions:
"When an index becomes too large to fit on a single system, or when a single
query takes too long to execute, an index can be split into multiple shards,
and Solr can query and merge results across those shards."

My understanding is that a shard is a partition of the index. Are the shards
on the same machine, or can they be on different machines? Do we have to
manually split the index to store it in different shards?

Do you have an example or tutorial that demonstrates distributed index
searching/storing using shards?

Regards,
Raakhi
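
On the "do we have to manually split" question: as far as the
DistributedSearch wiki of this period describes it, Solr does not choose a
shard for a document at index time; the indexing client decides where each
document goes. A common approach is to hash the unique key. The following is
only a sketch of that idea. The shard addresses and the "id" field name are
assumptions, and CRC32 is just one stable hash that could be used.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Map a document's unique key to a shard number in [0, num_shards)."""
    # crc32 gives the same value in every process, unlike Python's built-in
    # hash() for strings, so a document always lands on the same shard.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def partition(docs, masters):
    """Group documents by the master they should be indexed on."""
    buckets = {master: [] for master in masters}
    for doc in docs:
        master = masters[shard_for(doc["id"], len(masters))]
        buckets[master].append(doc)
    return buckets


masters = ["nodeA:8983/solr", "nodeB:8983/solr"]
docs = [{"id": "doc-%d" % i} for i in range(20)]
for master, batch in partition(docs, masters).items():
    print(master, len(batch))
```

Because the mapping depends on the number of shards, adding a shard later
changes where documents belong, which usually means reindexing.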


Re: solr in distributed mode

2009-06-09 Thread Mark Miller

  
You might check out this article to get an idea of how Solr scales (lot 
of extra stuff in Lucene in there too, just skip to around)

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr

You can also check out the wiki: 
http://wiki.apache.org/solr/DistributedSearch


Also see:

Solr 1.4 : http://wiki.apache.org/solr/SolrReplication
Solr 1.3,1.4: http://wiki.apache.org/solr/CollectionDistribution
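
For the Solr 1.4 replication linked above, the replication handler can be
driven over HTTP, which gives one way to check on or trigger replication.
The sketch below only builds the URLs, using the command names documented on
the SolrReplication wiki page (indexversion, details, fetchindex). The
nodeA/nodeB roles, port 8983 and the /solr path are assumptions. With the
script-based CollectionDistribution of Solr 1.3, the snapshot scripts are run
instead (typically from cron), so this does not apply there.

```python
from urllib.parse import urlencode


def replication_url(core_url, command):
    """Build a replication handler URL for the given command."""
    return "%s/replication?%s" % (core_url.rstrip("/"), urlencode({"command": command}))


master = "http://nodeA:8983/solr"  # assumed master
slave = "http://nodeB:8983/solr"   # assumed slave

# Index version currently published by the master.
print(replication_url(master, "indexversion"))
# Replication status as seen by the slave.
print(replication_url(slave, "details"))
# Ask the slave to pull the index now instead of waiting for its next poll.
print(replication_url(slave, "fetchindex"))
```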

--
- Mark

http://www.lucidimagination.com





Re: solr in distributed mode

2009-06-09 Thread Otis Gospodnetic

Hello,

All of this is covered on the Wiki, search for: distributed search

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message -----
 From: Rakhi Khatwani rkhatw...@gmail.com
 To: solr-user@lucene.apache.org
 Cc: ninad.r...@germinait.com; ranjit.n...@germinait.com; 
 saurabh.maha...@germinait.com
 Sent: Tuesday, June 9, 2009 4:55:55 AM
 Subject: solr in distributed mode
 