On Thu, Jul 10, 2014 at 1:00 PM, Ravikumar Govindarajan <[email protected]> wrote:
> Aaron,
>
> This is a lengthy post. Please bear...
>
> We are looking at Blur slightly differently. No Map-Red ops, No immutable
> RowId data etc... Just plain online-search like regular lucene/SOLR/ES
>
> Our use-case mandates that Documents for a RowId will arrive incrementally.
> We don't have the luxury of dropping the whole-row and re-indexing it, as a
> given Row will have hundreds of thousands of docs...
>
> A single row-id will always be found in one shard, but spread across
> segments. We have modified blur sources on both indexing/search side to
> support this requirement
>
> In other words, we support ADD_RECORDS thrift-op to an existing Row..
>
> We actually are now testing a sharding strategy similar to databases in Blur
>
> 1. Initially we start with lets say 300 shards per table aka base-shards
> 2. Each shard has a fixed size lets say 16 GB. Client will watch for this
> and spawn a new shard when size exceeds. {An alias-shard in ES terms}
> 3. ZK will hold the Base --> List-of-Alias shards
> 4. A RowId will be allocated a shard that has least number of alias shards.
> This mapping will never change in the lifetime of a Row
> 5. ADD_RECORDS op will go the latest alias, while DEL/UPDATE will go to
> all aliases+base shards.
> 6. Once all 300 base-shards have spawned aliases, admins can create new
> base shards on the cluster. Newer RowIds will auto-allocate to freshly
> created shards
> 7. Both horizontal & vertical scaling of shards can be supported easily by
> this approach
>
> Now all these are possible only if the RowId -> Base-Shard mapping is
> maintained externally.

Hi Ravi,

Can you explain how searching across the records in a row works in this
case? For example, the row query example in the docs[1]?

Thanks,
--tim

[1] - http://incubator.apache.org/blur/docs/0.2.2/data-model.html#row_query
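
P.S. To check my reading of steps 1-6 in the quoted post, here is a rough
Python sketch of the routing as I understand it. It is not Blur code and not
the modified sources. Every name in it (ShardRouter, BaseShard, route_add,
the "-alias-N" naming) is made up for illustration, a plain dict stands in
for the mapping that ZK would hold, and the size check happens on the write
path instead of in a watching client. Ties between base shards with the same
number of aliases are broken by creation order, which the post does not
specify.

```python
from dataclasses import dataclass, field

# Example size limit taken from the post (16 GB per shard).
SHARD_SIZE_LIMIT = 16 * 1024**3


@dataclass
class BaseShard:
    """Hypothetical record for one base shard and its alias shards."""
    name: str
    aliases: list = field(default_factory=list)  # oldest alias first
    latest_size: int = 0  # bytes held by the newest shard in the chain

    def latest(self) -> str:
        # Newest alias if any exist, otherwise the base shard itself.
        return self.aliases[-1] if self.aliases else self.name

    def all_shards(self) -> list:
        return [self.name] + self.aliases


class ShardRouter:
    """In-memory stand-in for the externally maintained mappings."""

    def __init__(self, base_names):
        self.bases = {n: BaseShard(n) for n in base_names}
        self.row_to_base = {}  # RowId -> base shard name, never rewritten

    def base_for(self, row_id) -> BaseShard:
        # Step 4: a new RowId goes to the base shard with the fewest aliases.
        if row_id not in self.row_to_base:
            chosen = min(self.bases.values(), key=lambda b: len(b.aliases))
            self.row_to_base[row_id] = chosen.name
        return self.bases[self.row_to_base[row_id]]

    def route_add(self, row_id, nbytes) -> str:
        # Step 2 (simplified): spawn an alias when the write would not fit.
        base = self.base_for(row_id)
        if base.latest_size + nbytes > SHARD_SIZE_LIMIT:
            base.aliases.append(f"{base.name}-alias-{len(base.aliases) + 1}")
            base.latest_size = 0
        base.latest_size += nbytes
        # Step 5: ADD_RECORDS goes to the latest alias only.
        return base.latest()

    def route_delete_or_update(self, row_id) -> list:
        # Step 5: DEL/UPDATE goes to the base shard and all of its aliases.
        return self.base_for(row_id).all_shards()

    def add_base(self, name) -> None:
        # Step 6: admins create a fresh base shard; new RowIds prefer it
        # once the existing base shards have spawned aliases.
        self.bases[name] = BaseShard(name)


if __name__ == "__main__":
    router = ShardRouter(["shard-0", "shard-1"])
    gb = 1024**3
    print(router.route_add("row-a", 10 * gb))
    print(router.route_add("row-a", 10 * gb))
    print(router.route_delete_or_update("row-a"))
```

If that matches what you have built, then a row's records end up spread over
the base shard and its aliases, which is the part I am trying to square with
the row query in [1].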
