Re: Few Questions on Blur Architecture...

Ravikumar Govindarajan Fri, 27 Sep 2013 06:15:12 -0700

Thanks Aaron, for the clarification

Detecting row hotspots and isolating them to private shards is a pretty
useful feature in production.


--
Ravi


On Fri, Sep 27, 2013 at 5:08 PM, Aaron McCurry <[email protected]> wrote:

> Ravi,
>
> You can definitely walk through the row ids in blur and detect how many
> records are within the given row.  But there is no built-in way to detect
> the size of a given row including the size of a record without fetching the
> row for inspection.  This sounds like it could be a useful feature to
> detect row hotspots in the shards.
>
> The current scheme for distributing data is through the BlurPartitioner and
> it is a basic hash of the rowid.  So that should provide a fairly even
> distribution of rows but with no regard to the row's size or the size of
> the shard.  In 0.3.0 we will be beginning to expose internal APIs that
> should allow for more control of the shard layout in the cluster (meaning
> what machine it is beginning served from).  We also have a task to deal
> with large rows in a better way.
>
> https://issues.apache.org/jira/browse/BLUR-220
>
> Aaron
>
>
> On Thu, Sep 26, 2013 at 9:30 PM, Ravikumar Govindarajan <
> [email protected]> wrote:
>
> > Hi,
> >
> > The statistics are as follows
> >
> > 1. Few hundreds of users - 1-10 GB of indexes
> > 2. Few thousands of users - 100 MB - 1 GB of indexes
> > 3. All others - < 100 MB of indexes
> >
> > The system has very few heavy-weight users, a little more middle-weight
> and
> > then lots of light-weight users.
> >
> > Yes, like you had said I might need to split my row-keys internally for
> > handling large data per-user.
> >
> > The problem with such an approach is that IDF will get distributed for a
> > rowId, impacting either scores or response times.
> >
> > Alternatively, I was looking at periodically running schedules to detect
> > the size of a row for a given user and then isolate the heavy-weights to
> a
> > separate shard.
> >
> > Is such an operation supported in Blur?
> >
> > --
> > Ravi
> >
> >
> > On Thu, Sep 26, 2013 at 5:56 PM, Garrett Barton <
> [email protected]
> > >wrote:
> >
> > > First thing I can think of is to not use your userid key directly for
> the
> > > rowid. Instead hash/encrypt or a combination of the two to guarantee
> more
> > > evenly partitioned keys thus making the shards more even.
> > >
> > > Second thing that comes to mind is to periodically rebuild the index
> from
> > > scratch and up the number of shards.
> > >
> > > How many rows are you expecting per user? Avg row width?
> > >
> > > ~Garrett
> > >
> > >
> > >
> > >
> > > On Thu, Sep 26, 2013 at 5:25 AM, Ravikumar Govindarajan <
> > > [email protected]> wrote:
> > >
> > > > Rahul & Aaron,
> > > >
> > > > I will take a look at the alternate approach of having different
> > > clusters.
> > > > Sounds like a very promising method.
> > > >
> > > > I have another question related to BlurPartitioner.
> > > >
> > > > I assume that rowId and it's data resides in only one shard. Is this
> > > > correct? In that case, how to handle a single-shard that become too
> > > > unwieldy over a period of time? [serving too much data and/or too
> many
> > > > rowIds]. What are your suggestions.
> > > >
> > > > --
> > > > Ravi
> > > >
> > > >
> > > > On Tue, Sep 24, 2013 at 6:36 AM, Aaron McCurry <[email protected]>
> > > wrote:
> > > >
> > > > > Ravi,
> > > > >
> > > > > See my comments below.
> > > > >
> > > > > On Mon, Sep 23, 2013 at 8:48 AM, Ravikumar Govindarajan <
> > > > > [email protected]> wrote:
> > > > >
> > > > > > Rahul,
> > > > > >
> > > > > >     "When a search request is issued to the Blur Controller it
> > > searches
> > > > > > though all the shard servers in parallel and each shard server
> > > searches
> > > > > > through all of its shards.
> > > > > >     Unlike database partitioning, I believe we cannot direct a
> > search
> > > > to
> > > > > a
> > > > > > particular shard.
> > > > > >     However
> > > > > >         1. Upon shard server start, all the shards are warmed up
> ie
> > > > Index
> > > > > > Reader's for each shard is loaded into memory
> > > > > >         2. Blur uses a block level cache. With sufficient memory
> > > > > allocated
> > > > > > to the cache, performance will be greatly enhanced"
> > > > > >
> > > > > > You are spot-on that what I was referring was a DB-Sharding like
> > > > > technique
> > > > > > and it definitely has some advantages, at least in our set-up.
> > > > > >
> > > > > > I think, what it translates in Blur, is to create N identical
> > tables
> > > > and
> > > > > > maintain user-wise mappings in our app.
> > > > > >
> > > > > > I mean lets say I create a table for every 10k users. I will end
> up
> > > > with
> > > > > > 300 tables for 3 million users. What are the problems you foresee
> > > with
> > > > > that
> > > > > > large number of tables? I know for sure some K-V stores prohibit
> > such
> > > > an
> > > > > > approach.
> > > > > >
> > > > >
> > > > > I think that this is valid approach, however I wouldn't optimize
> too
> > > > soon.
> > > > >  I think the performance will likely be better with one larger
> table
> > > (or
> > > > at
> > > > > least fewer) than hundreds of smaller tables.  However if you need
> to
> > > > split
> > > > > into separate tables there is a feature in Blur that you will
> likely
> > be
> > > > > interested in using.
> > > > >
> > > > > In Blur the controllers act as a gateway/router for the shard
> > cluster.
> > > > >  However they can access more than just a single shard cluster.  If
> > you
> > > > > name each shard cluster a different name (blur-site.properties
> file)
> > > the
> > > > > controllers see all of them and make all the tables accessible
> > through
> > > > the
> > > > > controller cluster.
> > > > >
> > > > > For example:
> > > > >
> > > > > Given:
> > > > >
> > > > > Shard Cluster A (100 servers) 10 tables
> > > > > Shard Cluster B (100 servers) 10 tables
> > > > > Shard Cluster C (100 servers) 10 tables
> > > > > Shard Cluster D (100 servers) 10 tables
> > > > >
> > > > > The controllers would present all 40 tables to the clients.  I have
> > > > > normally used this feature to do full replacements of MapReduced
> > > indexes
> > > > > onto a off cluster while another was presenting the data users were
> > > > > accessing.  Once the indexing was complete the application was soft
> > > > > switched to use the new table.
> > > > >
> > > > > Just some more ideas to think about.
> > > > >
> > > > > Aaron
> > > > >
> > > > >
> > > > >
> > > > > >
> > > > > > --
> > > > > > Ravi
> > > > > >
> > > > > >
> > > > > > On Thu, Sep 19, 2013 at 11:16 PM, rahul challapalli <
> > > > > > [email protected]> wrote:
> > > > > >
> > > > > > > I will attempt to answer some of your questions below. Aaron or
> > > > someone
> > > > > > > else can correct me if I am wrong
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Sep 19, 2013 at 6:15 AM, Ravikumar Govindarajan <
> > > > > > > [email protected]> wrote:
> > > > > > >
> > > > > > > > Thanks Aaron. I think, it has answered my question. I have a
> > few
> > > > more
> > > > > > and
> > > > > > > > would be great if you can clarify them
> > > > > > > >
> > > > > > > > 1. Is the number of shards per-table fixed during
> > table-creation
> > > or
> > > > > we
> > > > > > > can
> > > > > > > > dynamically allocate shards?
> > > > > > > >
> > > > > > >
> > > > > > >    I believe we cannot dynamically allocate shards. The only
> > thing
> > > we
> > > > > can
> > > > > > > dynamically add to an existing table is new columns
> > > > > > >
> > > > > > > >
> > > > > > > > 2. Assuming I have 10k shards with each shard-size=2GB, for a
> > > total
> > > > > of
> > > > > > 20
> > > > > > > > TB table size. I typically use RowId = UserId and there are
> > > approx
> > > > 3
> > > > > > > > million users, in our system
> > > > > > > >
> > > > > > > >     How do I ensure that when a user issues a query, I should
> > not
> > > > > > end-up
> > > > > > > > searching all these 10k shards, but rather search only a very
> > > small
> > > > > set
> > > > > > > of
> > > > > > > > shards?
> > > > > > > >
> > > > > > >
> > > > > > >     When a search request is issued to the Blur Controller it
> > > > searches
> > > > > > > though all the shard servers in parallel and each shard server
> > > > searches
> > > > > > > through all of its shards.
> > > > > > >     Unlike database partitioning, I believe we cannot direct a
> > > search
> > > > > to
> > > > > > a
> > > > > > > particular shard.
> > > > > > >     However
> > > > > > >         1. Upon shard server start, all the shards are warmed
> up
> > ie
> > > > > Index
> > > > > > > Reader's for each shard is loaded into memory
> > > > > > >         2. Blur uses a block level cache. With sufficient
> memory
> > > > > > allocated
> > > > > > > to the cache, performance will be greatly enhanced
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > 3. Are there any advantages of running shard-server and
> > > data-nodes
> > > > > > {HDFS}
> > > > > > > > in the same machine?
> > > > > > > >
> > > > > > > >    Someone else can provide a better answer here.
> > > > > > >    In a typical Hadoop installation Task Trackers and Data
> Nodes
> > > run
> > > > > > > alongside each other on the same machine. Since data nodes
> store
> > > the
> > > > > > first
> > > > > > > block replica on the
> > > > > > >    same  machine, shard servers might see an advantage in terms
> > of
> > > > > > network
> > > > > > > latency. However I think it is not a good idea to run Blur
> > > alongside
> > > > > Task
> > > > > > > Trackers for
> > > > > > >    performance reasons
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > --
> > > > > > > > Ravi
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Sep 19, 2013 at 2:36 AM, Aaron McCurry <
> > > [email protected]
> > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > I will attempt to answer below:
> > > > > > > > >
> > > > > > > > > On Wed, Sep 18, 2013 at 1:54 AM, Ravikumar Govindarajan <
> > > > > > > > > [email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Thanks a bunch for a concise and quick reply. Few more
> > > > questions
> > > > > > > > > >
> > > > > > > > > > 1. Any pointers/links on how you plan to tackle the
> > > > availability
> > > > > > > > problem?
> > > > > > > > > >
> > > > > > > > > > Lets say we store-forward hints to the failed
> shard-server.
> > > > Won't
> > > > > > the
> > > > > > > > > HDFS
> > > > > > > > > > index-files differ in shard replicas?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I am in the process of documenting the strategy and will be
> > > > adding
> > > > > it
> > > > > > > to
> > > > > > > > > JIRA soon.  The way I am planning on solving this problem
> > > doesn't
> > > > > > > involve
> > > > > > > > > storing the indexes in more than once in HDFS (which of
> > course
> > > is
> > > > > > > > > replicated).
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 2. I did not phrase my question on cross-join correctly.
> > Let
> > > me
> > > > > > > clarify
> > > > > > > > > >
> > > > > > > > > > RowKey = 123
> > > > > > > > > >
> > > > > > > > > >    RecId = 1000
> > > > > > > > > >    Family = "ACCOUNTS"
> > > > > > > > > >      Col-Name = "NAME"
> > > > > > > > > >      Col-Value = "ABC"
> > > > > > > > > >      ......
> > > > > > > > > >
> > > > > > > > > >    RecId = 1001
> > > > > > > > > >    Family = "CONTACTS"
> > > > > > > > > >      Col-Name = "NAME"
> > > > > > > > > >      Col-Value = "XYZ"
> > > > > > > > > >      Col-Name = "ACCOUNTS-NAME" [FK to RecId=1000]
> > > > > > > > > >      Col-Value = "1000"
> > > > > > > > > >      .......
> > > > > > > > > >
> > > > > > > > > > Lets say the user specifies the search query as
> > > > > > > > > > key=123 AND name:(ABC OR XYZ)
> > > > > > > > > >
> > > > > > > > > > Initially I will apply this query to each of the Family
> > > types,
> > > > > > namely
> > > > > > > > > > "ACCOUNTS", "CONTACTS" etc.... and get their RecIds..
> > > > > > > > > >
> > > > > > > > > > After this, I will have to filter "CONTACTS" family
> > results,
> > > > > based
> > > > > > on
> > > > > > > > > > RecIds received from "ACCOUNTS" [Join within records of
> > > > different
> > > > > > > > family,
> > > > > > > > > > based on FK]
> > > > > > > > > >
> > > > > > > > > > Is something like this achievable? Can I design it
> > > differently
> > > > to
> > > > > > > > satisfy
> > > > > > > > > > my requirements?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I may not fully understand your scenario.
> > > > > > > > >
> > > > > > > > > As I understand your example above:
> > > > > > > > >
> > > > > > > > > Row {
> > > > > > > > >   "id" => "123",
> > > > > > > > >   "records" => [
> > > > > > > > >     Record {
> > > > > > > > >       "recordId" => "1000", "family" => "accounts",
> > > > > > > > >       "columns" => [Column {"name" => "abc"}]
> > > > > > > > >     },
> > > > > > > > >     Record {
> > > > > > > > >       "recordId" => "1001", "family" => "contacts",
> > > > > > > > >       "columns" => [Column {"name" => "abc"}]
> > > > > > > > >     }
> > > > > > > > >   ]
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > Let me go through some example queries that we support:
> > > > > > > > >
> > > > > > > > > +<accounts.name:abc> +<contacts.name:abc>
> > > > > > > > >
> > > > > > > > > Another way of writing it would be:
> > > > > > > > >
> > > > > > > > > <accounts.name:abc> AND <contacts.name:abc>
> > > > > > > > >
> > > > > > > > > Would yield a hit on the Row, there aren't any FKs in Blur.
> > > > > > > > >
> > > > > > > > > However if there are some interesting queries that be done
> > with
> > > > > more
> > > > > > > > > examples:
> > > > > > > > >
> > > > > > > > > Row {
> > > > > > > > >   "id" => "123",
> > > > > > > > >   "records" => [
> > > > > > > > >     Record {
> > > > > > > > >       "recordId" => "1000", "family" => "accounts",
> > > > > > > > >       "columns" => [Column {"name" => "abc"}]
> > > > > > > > >     },
> > > > > > > > >     Record {
> > > > > > > > >       "recordId" => "1001", "family" => "contacts",
> > > > > > > > >       "columns" => [Column {"name" => "abc"}]
> > > > > > > > >     }
> > > > > > > > >   ]
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > Row {
> > > > > > > > >   "id" => "456",
> > > > > > > > >   "records" => [
> > > > > > > > >     Record {
> > > > > > > > >       "recordId" => "1000", "family" => "accounts",
> > > > > > > > >       "columns" => [Column {"name" => "abc"}]
> > > > > > > > >     },
> > > > > > > > >     Record {
> > > > > > > > >       "recordId" => "1001", "family" => "contacts",
> > > > > > > > >       "columns" => [Column {"name" => "abc"}]
> > > > > > > > >     },
> > > > > > > > >     Record {
> > > > > > > > >       "recordId" => "1002", "family" => "contacts",
> > > > > > > > >       "columns" => [Column {"name" => "def"}]
> > > > > > > > >     }
> > > > > > > > >   ]
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Row {
> > > > > > > > >   "id" => "789",
> > > > > > > > >   "records" => [
> > > > > > > > >     Record {
> > > > > > > > >       "recordId" => "1000", "family" => "accounts",
> > > > > > > > >       "columns" => [Column {"name" => "abc"}]
> > > > > > > > >     },
> > > > > > > > >     Record {
> > > > > > > > >       "recordId" => "1002", "family" => "contacts",
> > > > > > > > >       "columns" => [Column {"name" => "def"}]
> > > > > > > > >     }
> > > > > > > > >   ]
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > For the given query: "<accounts.name:abc> AND <
> contacts.name
> > :
> > > > abc>"
> > > > > > > would
> > > > > > > > > yield 2 Row hits. 123 and 456
> > > > > > > > > For the given query: "<accounts.name:abc> AND <
> contacts.name
> > :
> > > > def>"
> > > > > > > would
> > > > > > > > > yield 2 Row hits. 456 and 789
> > > > > > > > > For the given query: "<contacts.name:abc> AND <
> contacts.name
> > :
> > > > def>"
> > > > > > > would
> > > > > > > > > yield 1 Row hit of 456.  NOTICE that the family is the same
> > > here
> > > > > > > > > "contacts".
> > > > > > > > >
> > > > > > > > > Also in Blur you can turn off the Row query and just query
> > the
> > > > > > records.
> > > > > > > > > This would be your typical Document like access.
> > > > > > > > >
> > > > > > > > > I fear that this has not answered your question, so if it
> > > hasn't
> > > > > > please
> > > > > > > > let
> > > > > > > > > me know.
> > > > > > > > >
> > > > > > > > > Thanks!
> > > > > > > > >
> > > > > > > > > Aaron
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Ravi
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Tue, Sep 17, 2013 at 7:01 PM, Aaron McCurry <
> > > > > [email protected]
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > First off let me say welcome!  Hopefully I can answer
> > your
> > > > > > > questions
> > > > > > > > > > inline
> > > > > > > > > > > below.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Sep 17, 2013 at 6:52 AM, Ravikumar
> Govindarajan <
> > > > > > > > > > > [email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > I am quite new to Blur and need some help with the
> > > > following
> > > > > > > > > questions
> > > > > > > > > > > >
> > > > > > > > > > > > 1. Lets say I have a replication_factor=3 for all
> HDFS
> > > > > indexes.
> > > > > > > In
> > > > > > > > > case
> > > > > > > > > > > one
> > > > > > > > > > > > of the server hosting HDFS indexes goes down
> [temporary
> > > or
> > > > > > > > > take-down],
> > > > > > > > > > > what
> > > > > > > > > > > > will happen to writes? Some kind-of HintedHandoff [as
> > in
> > > > > > > Cassandra]
> > > > > > > > > is
> > > > > > > > > > > > supported?
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > When there is a Blur Shard Server failure state in
> > > ZooKeeper
> > > > > will
> > > > > > > > > change
> > > > > > > > > > > and the other shard servers will take action to bring
> the
> > > > down
> > > > > > > > shard(s)
> > > > > > > > > > > online.  This is similar to the HBase region model.
> >  While
> > > > the
> > > > > > > > shard(s)
> > > > > > > > > > are
> > > > > > > > > > > being relocated (which really means being reopened from
> > > HDFS)
> > > > > > > writes
> > > > > > > > to
> > > > > > > > > > the
> > > > > > > > > > > shard(s) being moved are not available.  However the
> bulk
> > > > load
> > > > > > > > > capability
> > > > > > > > > > > is always available as long as HDFS is available, this
> > can
> > > be
> > > > > > used
> > > > > > > > > > through
> > > > > > > > > > > Hadoop MapReduce.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > To re-phrase, what is the Consistency Vs Availability
> > > > > trade-off
> > > > > > > in
> > > > > > > > > > Blur,
> > > > > > > > > > > > with replication_factor>1 for HDFS indexes?
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Of the two Consistency is favored over Availability,
> > > however
> > > > we
> > > > > > are
> > > > > > > > > > > starting development (in 0.3.0) to increase
> availability
> > > > during
> > > > > > > > > failures.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > 2. Since HDFSInputStream is used underneath, will
> this
> > > > result
> > > > > > in
> > > > > > > > too
> > > > > > > > > > much
> > > > > > > > > > > > of data-transfer back-and-forth? A case of
> > > > > multi-segment-merge
> > > > > > or
> > > > > > > > > even
> > > > > > > > > > > > wild-card search could trigger it.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Blur uses an in process file system cache (Block Cache
> is
> > > the
> > > > > > term
> > > > > > > > used
> > > > > > > > > > in
> > > > > > > > > > > the code) to reduce the IO from HDFS.  During index
> > merges
> > > > data
> > > > > > > that
> > > > > > > > is
> > > > > > > > > > not
> > > > > > > > > > > in the Block Cache is read from HDFS and the output is
> > > > written
> > > > > > back
> > > > > > > > to
> > > > > > > > > > > HDFS.  Overall once an index is hot (been online for
> some
> > > > time)
> > > > > > the
> > > > > > > > IO
> > > > > > > > > > for
> > > > > > > > > > > any given search is fairly small assuming that the
> > cluster
> > > > has
> > > > > > > enough
> > > > > > > > > > > memory configured in the Block Cache.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > 3. Does Blur also support foreign-key like semantics
> to
> > > > > search
> > > > > > > > across
> > > > > > > > > > > > column-families as well as delete using row_id?
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Blur supports something called Row Queries that allow
> for
> > > > > > searches
> > > > > > > > > across
> > > > > > > > > > > column families within single Rows.  Take a look at
> this
> > > page
> > > > > > for a
> > > > > > > > > > better
> > > > > > > > > > > explanation:
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > >
> > > http://incubator.apache.org/blur/docs/0.2.0/data-model.html#querying
> > > > > > > > > > >
> > > > > > > > > > > And yes Blur supports deletes by Row check out:
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > >
> > > http://incubator.apache.org/blur/docs/0.2.0/Blur.html#Fn_Blur_mutate
> > > > > > > > > > > and
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > >
> http://incubator.apache.org/blur/docs/0.2.0/Blur.html#Struct_RowMutation
> > > > > > > > > > >
> > > > > > > > > > > Hopefully this can answer so of your questions.  Let us
> > > know
> > > > if
> > > > > > you
> > > > > > > > > have
> > > > > > > > > > > any more.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Aaron
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Ravi
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Few Questions on Blur Architecture...

Reply via email to