Re: Few Questions on Blur Architecture...

Aaron McCurry Tue, 17 Sep 2013 06:32:09 -0700

First off, let me say welcome!  Hopefully I can answer your questions inline
below.

On Tue, Sep 17, 2013 at 6:52 AM, Ravikumar Govindarajan <
[email protected]> wrote:

> I am quite new to Blur and need some help with the following questions
>
> 1. Lets say I have a replication_factor=3 for all HDFS indexes. In case one
> of the server hosting HDFS indexes goes down [temporary or take-down], what
> will happen to writes? Some kind-of HintedHandoff [as in Cassandra] is
> supported?
>

When a Blur Shard Server fails, the state in ZooKeeper changes and the
other shard servers take action to bring the down shard(s) back online.
This is similar to the HBase region model.  While the shard(s) are being
relocated (which really means being reopened from HDFS), writes to those
shard(s) are not available.  However, the bulk load capability, which is
used through Hadoop MapReduce, remains available as long as HDFS is
available.

>
> To re-phrase, what is the Consistency Vs Availability trade-off in Blur,
> with replication_factor>1 for HDFS indexes?
>

Of the two, Consistency is favored over Availability.  However, we are
starting development (in 0.3.0) to increase availability during failures.

>
> 2. Since HDFSInputStream is used underneath, will this result in too much
> of data-transfer back-and-forth? A case of multi-segment-merge or even
> wild-card search could trigger it.
>

Blur uses an in-process file system cache (Block Cache is the term used in
the code) to reduce the IO from HDFS.  During index merges, data that is not
in the Block Cache is read from HDFS and the output is written back to
HDFS.  Overall, once an index is hot (has been online for some time), the IO
for any given search is fairly small, assuming that the cluster has enough
memory configured in the Block Cache.
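
To make the caching idea concrete, here is a minimal sketch of an LRU cache
of fixed-size file blocks sitting in front of a slow reader.  This is not
Blur's Block Cache code: the class name, the 8-byte block size, the LRU
eviction policy, the file path, and the in-memory dict standing in for HDFS
are all assumptions made for illustration.

```python
from collections import OrderedDict

BLOCK_SIZE = 8  # bytes per block; tiny here for illustration only


class BlockCache:
    """Toy LRU cache of fixed-size file blocks (hypothetical, not Blur's code)."""

    def __init__(self, read_block, capacity):
        self.read_block = read_block   # slow backing read, e.g. from HDFS
        self.capacity = capacity       # maximum number of cached blocks
        self.blocks = OrderedDict()    # (path, block_index) -> bytes
        self.hits = 0
        self.misses = 0

    def get(self, path, index):
        key = (path, index)
        if key in self.blocks:
            self.blocks.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return self.blocks[key]
        self.misses += 1
        data = self.read_block(path, index)  # cache miss: go to backing store
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used block
        return data


def make_fake_hdfs(files):
    """Return a read_block function over an in-memory dict standing in for HDFS."""
    def read_block(path, index):
        start = index * BLOCK_SIZE
        return files[path][start:start + BLOCK_SIZE]
    return read_block


PATH = "/blur/table/shard-0/_0.tim"          # hypothetical index file path
files = {PATH: bytes(range(32))}
cache = BlockCache(make_fake_hdfs(files), capacity=2)
cache.get(PATH, 0)  # miss: read from the backing store
cache.get(PATH, 0)  # hit: served from memory
```

Once the blocks a search touches are resident, repeated reads never reach
the backing store, which is why a hot index with enough cache memory does
little HDFS IO per search.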

>
> 3. Does Blur also support foreign-key like semantics to search across
> column-families as well as delete using row_id?
>

Blur supports something called Row Queries, which allow for searches across
column families within a single Row.  Take a look at this page for a better
explanation:

http://incubator.apache.org/blur/docs/0.2.0/data-model.html#querying

And yes, Blur supports deletes by Row.  Check out:

http://incubator.apache.org/blur/docs/0.2.0/Blur.html#Fn_Blur_mutate
and
http://incubator.apache.org/blur/docs/0.2.0/Blur.html#Struct_RowMutation
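
As a rough illustration of the semantics only, the sketch below models a
Row as a list of records grouped by column family, matches a query whose
clauses must all be satisfied within the same Row, and deletes by row id.
It does not use the Blur API: the dict layout, the function names, and the
sample data are assumptions, and the real query syntax and mutate call are
described at the links above.

```python
# Toy in-memory model (hypothetical): row_id -> list of records,
# where each record belongs to one column family.
rows = {
    "row-1": [
        {"family": "author", "columns": {"name": "jon"}},
        {"family": "docs", "columns": {"body": "hadoop notes"}},
    ],
    "row-2": [
        {"family": "author", "columns": {"name": "jon"}},
        {"family": "docs", "columns": {"body": "lucene notes"}},
    ],
}


def family_matches(records, family, column, term):
    """True if some record in the given family has the term in that column."""
    return any(
        r["family"] == family and term in r["columns"].get(column, "").split()
        for r in records
    )


def row_query(rows, clauses):
    """Return ids of rows where every (family, column, term) clause matches
    some record of that same row, possibly in different column families."""
    return sorted(
        row_id
        for row_id, records in rows.items()
        if all(family_matches(records, *clause) for clause in clauses)
    )


def delete_row(rows, row_id):
    """Remove a whole row, with all of its records, by row id."""
    rows.pop(row_id, None)


query = [("author", "name", "jon"), ("docs", "body", "hadoop")]
matches = row_query(rows, query)   # only row-1 satisfies both clauses
```

Both rows have an author named jon, but only row-1 also has a matching
docs record, so the join-like behavior is scoped to a single Row.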

Hopefully this answers some of your questions.  Let us know if you have
any more.

Thanks,
Aaron

>
> --
> Ravi
>
