Well, we don't have any need for a row-query in our model. All queries return individual records...
Ex: We always assume RowId=userId. So we are only interested in getting records for a matching row-id/user-id. In terms of SQL, it will always be

"SELECT * from .... WHERE... AND RowId=<XYZ> LIMIT N"

Forming row-query based scoring should also be possible, no? If I remember correctly, I submitted a very rough draft of row-query scoring in https://issues.apache.org/jira/browse/BLUR-290 [RowDocsCollector, BlurRowCodec etc...]

Do you think such a Codec-based approach will work for row-queries?

--
Ravi

On Fri, Jul 11, 2014 at 5:58 PM, Tim Williams <[email protected]> wrote:

> On Thu, Jul 10, 2014 at 1:00 PM, Ravikumar Govindarajan
> <[email protected]> wrote:
> > Aaron,
> >
> > This is a lengthy post. Please bear...
> >
> > We are looking at Blur slightly differently. No Map-Red ops, No immutable
> > RowId data etc... Just plain online-search like regular lucene/SOLR/ES
> >
> > Our use-case mandates that Documents for a RowId will arrive incrementally.
> > We don't have the luxury of dropping the whole-row and re-indexing it, as a
> > given Row will have hundreds of thousands of docs...
> >
> > A single row-id will always be found in one shard, but spread across
> > segments. We have modified blur sources on both indexing/search side to
> > support this requirement
> >
> > In other words, we support ADD_RECORDS thrift-op to an existing Row..
> >
> > We actually are now testing a sharding strategy similar to databases in Blur
> >
> > 1. Initially we start with lets say 300 shards per table aka base-shards
> > 2. Each shard has a fixed size lets say 16 GB. Client will watch for this
> >    and spawn a new shard when size exceeds. {An alias-shard in ES terms}
> > 3. ZK will hold the Base --> List-of-Alias shards
> > 4. A RowId will be allocated a shard that has least number of alias shards.
> >    This mapping will never change in the lifetime of a Row
> > 5. ADD_RECORDS op will go the latest alias, while DEL/UPDATE will go to
> >    all aliases+base shards.
> > 6. Once all 300 base-shards have spawned aliases, admins can create new
> >    base shards on the cluster. Newer RowIds will auto-allocate to freshly
> >    created shards
> > 7. Both horizontal & vertical scaling of shards can be supported easily by
> >    this approach
> >
> > Now all these are possible only if the RowId -> Base-Shard mapping is
> > maintained externally.
>
> Hi Ravi,
> Can you explain how searching across a records in a row works in this
> case? For example, the row query example in the docs[1]?
>
> Thanks,
> --tim
>
> [1] - http://incubator.apache.org/blur/docs/0.2.2/data-model.html#row_query
>
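
To make the routing in steps 1-7 of the quoted sharding strategy concrete, here is a rough Python sketch. It is only an illustration and not Blur code: the class and method names (ShardRouter, BaseShard, report_size, targets) and the alias naming scheme are made up for this example, and plain in-memory dicts stand in for the ZK-held Base --> List-of-Alias data and for the externally maintained RowId -> Base-Shard mapping.

```python
from dataclasses import dataclass, field

# 16 GB per shard, the example figure from step 2 of the thread.
SHARD_SIZE_LIMIT = 16 * 1024**3


@dataclass
class BaseShard:
    name: str
    aliases: list = field(default_factory=list)  # alias shard names, oldest first

    def latest(self) -> str:
        """Shard that takes new records: the newest alias, else the base itself."""
        return self.aliases[-1] if self.aliases else self.name

    def all_shards(self) -> list:
        """Base shard followed by every alias spawned from it."""
        return [self.name] + self.aliases


class ShardRouter:
    """Hypothetical router; dicts stand in for ZK and the external row mapping."""

    def __init__(self, base_names):
        self.bases = {n: BaseShard(n) for n in base_names}
        self.row_to_base = {}  # RowId -> base shard name, never changed once set

    def add_base_shard(self, name):
        # Step 6: admins add fresh base shards; new RowIds will prefer them.
        self.bases[name] = BaseShard(name)

    def base_for(self, row_id):
        # Step 4: allocate the base shard with the fewest aliases, then pin it.
        if row_id not in self.row_to_base:
            chosen = min(self.bases.values(), key=lambda b: len(b.aliases))
            self.row_to_base[row_id] = chosen.name
        return self.bases[self.row_to_base[row_id]]

    def report_size(self, shard_name, size_bytes):
        # Step 2: spawn a new alias once the shard taking writes is over the limit.
        for base in self.bases.values():
            if base.latest() == shard_name and size_bytes > SHARD_SIZE_LIMIT:
                alias = f"{base.name}-alias-{len(base.aliases) + 1}"
                base.aliases.append(alias)
                return alias
        return None

    def targets(self, op, row_id):
        # Step 5: ADD_RECORDS goes to the latest alias only;
        # DELETE/UPDATE fan out to the base shard and all of its aliases.
        base = self.base_for(row_id)
        if op == "ADD_RECORDS":
            return [base.latest()]
        if op in ("DELETE", "UPDATE"):
            return base.all_shards()
        raise ValueError(f"unsupported op: {op}")
```

For example, with base shards s0 and s1, a row first allocated to s0 keeps that base even after s0 spawns an alias; only its ADD_RECORDS target moves to the alias, while a later new row lands on s1 because s1 has fewer aliases. The sketch does not cover how a search for a row fans out across the base shard and its aliases.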
