Hi bookkeepers,
I'm using BookKeeper for several projects. Every project has its own
workload characteristics, and I would like to be able to assign bookies
depending on the client type. It is quite common to share a BookKeeper
cluster between different applications.

For instance, I am using bookies to store database logs and task broker
logs, and recently I have started to use BookKeeper as data storage.

Within the cluster I would like to use specific bookies for mid-term
storage, others for logs, and so on, but the current placement
policies are not able to "distinguish" bookies.

Currently I can achieve my goal by using a custom policy + custom
metadata + out-of-band bookie metadata.

I would like to introduce a first step, following the work on
"Resource aware data placement" [1], and introduce a list of "labels"
to be assigned to every bookie.

For instance: bookies for long term storage will have label
"long-term", bookies for transaction logs may have label "wals".

Another use case is to be able to ask BookKeeper to write ledger
data on specific sets of bookies depending on the "customer" who is
the owner of the data (I have customers already grouped by labels/tags).

I would like to have a simple "standard" policy which uses some
"standard" metadata to select bookies.

Things to add:
- a set of "labels" configurable for bookies
- enrich the API (getBookieInfo) to query for labels, and have the
BookKeeper client keep a local cache of label-to-bookie assignments
- add a standard "custom metadata field" which is a list of labels to
use to select bookies; a bookie would be used only if it currently
"has" all of the labels requested

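To make the matching rule in the last point concrete, here is a minimal
sketch in Java. It is not BookKeeper API: the class LabelSelector, the
method selectBookies and the bookie addresses are hypothetical names used
only for illustration, and the map stands in for the client-side cache of
labels per bookie mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper, not part of BookKeeper: illustrates the proposed
// "a bookie is eligible only if it has all of the requested labels" rule.
public class LabelSelector {

    // bookieLabels: assumed client-side cache, bookie address -> its labels.
    // requestedLabels: labels taken from the ledger's custom metadata.
    // Returns the eligible bookie addresses, sorted by address.
    static List<String> selectBookies(Map<String, Set<String>> bookieLabels,
                                      Set<String> requestedLabels) {
        List<String> eligible = new ArrayList<>();
        // Copy into a TreeMap so the iteration order is deterministic.
        for (Map.Entry<String, Set<String>> e
                : new TreeMap<>(bookieLabels).entrySet()) {
            if (e.getValue().containsAll(requestedLabels)) {
                eligible.add(e.getKey());
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> cache = Map.of(
            "bookie1:3181", Set.of("long-term"),
            "bookie2:3181", Set.of("wals"),
            "bookie3:3181", Set.of("wals", "long-term"));
        // Only bookies carrying the "wals" label are returned.
        System.out.println(selectBookies(cache, Set.of("wals")));
    }
}
```

With this rule an empty list of requested labels matches every bookie,
so ledgers created without labels would keep today's behaviour. A real
policy would still have to apply the usual ensemble size and rack
constraints on top of this filter.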

[1] 
https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-2+-+Resource+aware+data+placement

All comments are welcome

-- Enrico
