First off, let me say welcome! Hopefully I can answer your questions inline below.

On Tue, Sep 17, 2013 at 6:52 AM, Ravikumar Govindarajan <
[email protected]> wrote:

> I am quite new to Blur and need some help with the following questions
>
> 1. Lets say I have a replication_factor=3 for all HDFS indexes. In case one
> of the server hosting HDFS indexes goes down [temporary or take-down], what
> will happen to writes? Some kind-of HintedHandoff [as in Cassandra] is
> supported?
>

When a Blur shard server fails, its state in ZooKeeper changes and the
other shard servers take action to bring the affected shard(s) back
online. This is similar to the HBase region model. While the shard(s)
are being relocated (which really means being reopened from HDFS), writes
to those shard(s) are not available. However, the bulk load capability,
which is used through Hadoop MapReduce, remains available as long as HDFS
is available.

>
> To re-phrase, what is the Consistency Vs Availability trade-off in Blur,
> with replication_factor>1 for HDFS indexes?
>

Of the two, Consistency is favored over Availability. However, we are
starting development (in 0.3.0) to increase availability during failures.

>
> 2. Since HDFSInputStream is used underneath, will this result in too much
> of data-transfer back-and-forth? A case of multi-segment-merge or even
> wild-card search could trigger it.
>

Blur uses an in-process file system cache (Block Cache is the term used
in the code) to reduce the IO from HDFS. During index merges, data that
is not in the Block Cache is read from HDFS and the output is written
back to HDFS. Overall, once an index is hot (has been online for some
time), the IO for any given search is fairly small, assuming that the
cluster has enough memory configured in the Block Cache.

>
> 3. Does Blur also support foreign-key like semantics to search across
> column-families as well as delete using row_id?
>

Blur supports something called Row Queries, which allow for searches
across column families within a single Row.
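
As a rough illustration of the idea, here is a toy in-memory model in
Python. It is not Blur's API or query syntax; the Record, Row,
family_matches, and row_query names and the sample data are made up for
this sketch. It only shows the semantics: a Row matches when each clause
is satisfied by some record in that same Row, even if the records live
in different column families.

```python
from collections import namedtuple

# Hypothetical model, not Blur's API: a Row is a row id plus a list of
# Records, and each Record belongs to exactly one column family.
Record = namedtuple("Record", ["family", "columns"])
Row = namedtuple("Row", ["row_id", "records"])


def family_matches(row, family, column, value):
    """True if any record of `family` in this row has column == value."""
    return any(
        rec.family == family and rec.columns.get(column) == value
        for rec in row.records
    )


def row_query(rows, clauses):
    """Return ids of rows where every (family, column, value) clause is
    satisfied by some record within the same row."""
    return [
        row.row_id
        for row in rows
        if all(family_matches(row, f, c, v) for f, c, v in clauses)
    ]


# Made-up sample data: only r1 has records in both families.
rows = [
    Row("r1", [Record("author", {"name": "Jon"}),
               Record("docs", {"body": "Hadoop"})]),
    Row("r2", [Record("author", {"name": "Jon"})]),
    Row("r3", [Record("docs", {"body": "Hadoop"})]),
]

matches = row_query(rows, [("author", "name", "Jon"),
                           ("docs", "body", "Hadoop")])
print(matches)  # ['r1']
```

Only r1 is returned, because it is the only Row with a matching record
in both the author and docs families.
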
Take a look at this page for a better explanation:

http://incubator.apache.org/blur/docs/0.2.0/data-model.html#querying

And yes, Blur supports deletes by Row. Check out:

http://incubator.apache.org/blur/docs/0.2.0/Blur.html#Fn_Blur_mutate

and

http://incubator.apache.org/blur/docs/0.2.0/Blur.html#Struct_RowMutation

Hopefully this answers some of your questions. Let us know if you have
any more.

Thanks,
Aaron

>
> --
> Ravi
>
