RE: Re: SolrCloud distributed indexing (Re: anyone use hadoop+solr?)

2010-09-06 Thread Markus Jelsma
The remainder of an arithmetic division

http://en.wikipedia.org/wiki/Modulo_operation
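
To make the term concrete for this thread: a "simple MOD" mapping picks a shard by taking the remainder of a document hash divided by the number of shards. Below is a minimal sketch, not Solr code. The names stable_hash, mod_shard and moved_fraction, the SHA-256 based hash and the "doc-N" IDs are all assumptions made for illustration.

```python
import hashlib


def stable_hash(doc_id: str) -> int:
    """Deterministic 64-bit hash; Python's built-in hash() of str varies per process."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def mod_shard(doc_id: str, num_shards: int) -> int:
    """Simple MOD: the shard index is the remainder of hash / num_shards."""
    return stable_hash(doc_id) % num_shards


def moved_fraction(doc_ids, old_n: int, new_n: int) -> float:
    """Fraction of documents whose shard index changes when the shard count changes."""
    moved = sum(1 for d in doc_ids if mod_shard(d, old_n) != mod_shard(d, new_n))
    return moved / len(doc_ids)


doc_ids = [f"doc-{i}" for i in range(10_000)]  # made-up document IDs
fraction = moved_fraction(doc_ids, 3, 4)
print(f"{fraction:.0%} of documents map to a different shard going from 3 to 4 shards")
```

With a uniform hash, a document keeps its shard index only when hash % 12 is 0, 1 or 2, so about three quarters of the documents should be expected to change shard in this 3-to-4 case.
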
-Original message-
From: Dennis Gearon 
Sent: Mon 06-09-2010 22:04
To: solr-user@lucene.apache.org; 
Subject: Re: SolrCloud distributed indexing (Re: anyone use hadoop+solr?)

What is a 'simple MOD'?

Dennis Gearon

Signature Warning

EARTH has a Right To Life,
 otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Mon, 9/6/10, Andrzej Bialecki  wrote:

> From: Andrzej Bialecki 
> Subject: Re: SolrCloud distributed indexing (Re: anyone use hadoop+solr?)
> To: solr-user@lucene.apache.org
> Date: Monday, September 6, 2010, 11:30 AM
> On 2010-09-06 16:41, Yonik Seeley wrote:
> > On Mon, Sep 6, 2010 at 10:18 AM, MitchK wrote:
> > [...consistent hashing...]
> >> But it doesn't solve the problem at all, correct me if I am wrong, but: If
> >> you add a new server, let's call him IP3-1, and IP3-1 is nearer to the
> >> current resource X, then doc x will be indexed at IP3-1 - even if IP2-1
> >> holds the older version.
> >> Am I right?
> > 
> > Right.  You still need code to handle migration.
> > 
> > Consistent hashing is a way for everyone to be able to agree on the
> > mapping, and for the mapping to change incrementally.  i.e. you add a
> > node and it only changes the docid->node mapping of a limited percent
> > of the mappings, rather than changing the mappings of potentially
> > everything, as a simple MOD would do.
> 
> Another strategy to avoid excessive reindexing is to keep
> splitting the largest shards, and then your mapping becomes
> a regular MOD plus a list of these additional splits.
> Really, there's an infinite number of ways you could
> implement this...
> 
> > 
> > For SolrCloud, I don't think we'll end up using consistent hashing -
> > we don't need it (although some of the concepts may still be useful).
> 
> I imagine there could be situations where a simple MOD
> won't do ;) so I think it would be good to hide this
> strategy behind an interface/abstract class. It costs
> nothing, and gives you flexibility in how you implement this
> mapping.
> 
> -- Best regards,
> Andrzej Bialecki     <><
> 
> 
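
The two points quoted above, that consistent hashing changes only a limited share of the docid->node mappings and that the strategy could sit behind an interface/abstract class, can be sketched together. This is only an illustration under stated assumptions, not SolrCloud's design: ShardMapper, ModMapper, ConsistentHashMapper, the 100 virtual points per node and the node name IP1-1 are all hypothetical (only IP2-1 and IP3-1 appear in the thread).

```python
import bisect
import hashlib
from abc import ABC, abstractmethod


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash (assumed; any uniform hash would do)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class ShardMapper(ABC):
    """Hypothetical interface that hides the docid -> node strategy."""

    @abstractmethod
    def node_for(self, doc_id: str) -> str: ...


class ModMapper(ShardMapper):
    """Simple MOD over the list of nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def node_for(self, doc_id: str) -> str:
        return self.nodes[stable_hash(doc_id) % len(self.nodes)]


class ConsistentHashMapper(ShardMapper):
    """Hash ring with several virtual points per node."""

    def __init__(self, nodes, replicas: int = 100):
        self.ring = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self.points = [point for point, _ in self.ring]

    def node_for(self, doc_id: str) -> str:
        # First ring point clockwise from the document's hash, wrapping around.
        idx = bisect.bisect_right(self.points, stable_hash(doc_id)) % len(self.ring)
        return self.ring[idx][1]


def moved_fraction(mapper_cls, old_nodes, new_nodes, doc_ids) -> float:
    """Fraction of documents whose node changes when the node list changes."""
    old, new = mapper_cls(old_nodes), mapper_cls(new_nodes)
    return sum(old.node_for(d) != new.node_for(d) for d in doc_ids) / len(doc_ids)


doc_ids = [f"doc-{i}" for i in range(10_000)]  # made-up document IDs
old_nodes = ["IP1-1", "IP2-1"]
new_nodes = old_nodes + ["IP3-1"]
mod_moved = moved_fraction(ModMapper, old_nodes, new_nodes, doc_ids)
ring_moved = moved_fraction(ConsistentHashMapper, old_nodes, new_nodes, doc_ids)
print(f"MOD: {mod_moved:.0%} remapped, consistent hashing: {ring_moved:.0%} remapped")
```

In this sketch the ring only ever hands documents to the newly added node, while MOD reshuffles documents between the existing nodes as well. Either way, as noted above, the documents that do get remapped still need migration code.
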

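
The "regular MOD plus a list of these additional splits" idea is not spelled out in the thread, so the following is one possible reading rather than the intended design. It assumes that a split divides a shard's documents in two using further hash bits, and that the ".0" child stays in place on the original machine while the ".1" child is new. SplitModMapper and its shard naming are hypothetical.

```python
import hashlib


def stable_hash(doc_id: str) -> int:
    """Deterministic 64-bit hash (assumed; any uniform hash would do)."""
    return int.from_bytes(hashlib.sha256(doc_id.encode("utf-8")).digest()[:8], "big")


class SplitModMapper:
    """Regular MOD over base_shards, plus a list of shards that have been split."""

    def __init__(self, base_shards: int):
        self.base_shards = base_shards
        self.splits = {}  # shard name -> (child kept in place, new child)

    def split(self, shard: str) -> None:
        """Record that `shard` is now divided into two child shards."""
        self.splits[shard] = (shard + ".0", shard + ".1")

    def shard_for(self, doc_id: str) -> str:
        h = stable_hash(doc_id)
        shard = str(h % self.base_shards)  # the regular MOD
        rest = h // self.base_shards       # leftover hash bits pick a side per split
        while shard in self.splits:
            shard = self.splits[shard][rest % 2]
            rest //= 2
        return shard


doc_ids = [f"doc-{i}" for i in range(10_000)]  # made-up document IDs
mapper = SplitModMapper(base_shards=3)
before = {d: mapper.shard_for(d) for d in doc_ids}

mapper.split("0")  # pretend shard "0" has grown to be the largest
after = {d: mapper.shard_for(d) for d in doc_ids}

relocated = sum(1 for d in doc_ids if after[d] == "0.1") / len(doc_ids)
untouched = all(after[d] == before[d] for d in doc_ids if before[d] != "0")
print(f"{relocated:.0%} of documents land in the new shard; others untouched: {untouched}")
```

Under these assumptions only about half of the split shard's documents need to be reindexed elsewhere, and documents in the other base shards keep their mapping.
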

RE: Re: SolrCloud distributed indexing (Re: anyone use hadoop+solr?)

2010-09-06 Thread Dennis Gearon
Oh, THAT MOD! LOL!

I thought it was some search engine specific acronym.
Dennis Gearon



--- On Mon, 9/6/10, Markus Jelsma  wrote:

> From: Markus Jelsma 
> Subject: RE: Re: SolrCloud distributed indexing (Re: anyone use hadoop+solr?)
> To: solr-user@lucene.apache.org
> Date: Monday, September 6, 2010, 2:53 PM
> The remainder of an arithmetic division
> 
> http://en.wikipedia.org/wiki/Modulo_operation
