[jira] [Commented] (ACCUMULO-3842) [UMBRELLA] Remove non-transient data from ZooKeeper

Josh Elser (JIRA) Fri, 22 May 2015 13:34:13 -0700

    [ https://issues.apache.org/jira/browse/ACCUMULO-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556790#comment-14556790 ]


Josh Elser commented on ACCUMULO-3842:
--------------------------------------

{quote}
bq.    ZooKeeper doesn't keep us from accomplishing this. We would need to 
write code to actually get the strong consensus for ourselves.

Isn't this statement true for HDFS also? Whether using ZooKeeper or HDFS, 
config needs to be cached on tservers for efficiency. For most operations 
config is not changing, and we do not want to make a synchronous read to ZK or HDFS 
before servicing each RPC. I think we need something at the API level to 
address this, regardless of implementation.
{quote}
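To make the caching point concrete, here is a minimal Java sketch, assuming config changes arrive as whole versioned snapshots. `ConfigSnapshot` and `VersionedConfigCache` are hypothetical names, not existing Accumulo classes, and how a tserver learns about a new snapshot (a ZooKeeper watch, a table scan, an RPC) is deliberately left open. The RPC path only reads a local reference, and an update is applied only if its version is newer than the cached one, so the cached config never moves backwards.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical immutable snapshot of a table's configuration at one version. */
final class ConfigSnapshot {
    final long version;
    final Map<String, String> props;

    ConfigSnapshot(long version, Map<String, String> props) {
        this.version = version;
        this.props = Map.copyOf(props); // defensive, unmodifiable copy
    }
}

/** Hypothetical per-tserver cache: RPCs read locally, updates only move forward. */
final class VersionedConfigCache {
    private final AtomicReference<ConfigSnapshot> current;

    VersionedConfigCache(ConfigSnapshot initial) {
        current = new AtomicReference<>(initial);
    }

    /** RPC path: a local lookup, no synchronous ZooKeeper or HDFS read. */
    String get(String key) {
        return current.get().props.get(key);
    }

    long version() {
        return current.get().version;
    }

    /**
     * Called by whatever delivers config changes. Older or duplicate versions
     * are ignored; returns true only if the update was installed.
     */
    boolean offer(ConfigSnapshot update) {
        while (true) {
            ConfigSnapshot seen = current.get();
            if (update.version <= seen.version) {
                return false;
            }
            if (current.compareAndSet(seen, update)) {
                return true;
            }
        }
    }
}
```

Because snapshots are immutable and swapped atomically, an RPC always sees one consistent version of all table properties (constraints included) rather than a mix of old and new values.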

I haven't put nearly enough thought into this (nor done the necessary 
reading), but my original thought was to use Accumulo tables as much as 
possible for the persistence (and wrap the access in some classes to make 
everything more natural).

If we keep a notion of monotonically increasing versions for configs, a FATE op 
could wait for each server to report at least a minimum version. Discerning the 
"end" state might be difficult in the face of servers dying and starting...
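A rough sketch of that wait, assuming each server reports the highest config version it has applied and that the operation can obtain the current set of live servers from somewhere (e.g. their ZooKeeper locks). `ConfigVersionTracker` and its methods are hypothetical; this is not an existing FATE API.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical tracker a FATE-like operation could poll: servers report the
 * highest config version they have applied, and the operation waits until
 * every currently live server has reached the target version.
 */
final class ConfigVersionTracker {
    private final Map<String, Long> reported = new ConcurrentHashMap<>();

    /** A server reports its applied version; a stale (lower) report is ignored. */
    void report(String server, long version) {
        reported.merge(server, version, Long::max);
    }

    /** Forget a dead server so a restart under the same name starts unreported. */
    void serverLost(String server) {
        reported.remove(server);
    }

    /**
     * True when every live server has reported at least targetVersion.
     * A live server that has not reported yet counts as not caught up.
     */
    boolean allAtLeast(Set<String> liveServers, long targetVersion) {
        for (String server : liveServers) {
            Long v = reported.get(server);
            if (v == null || v < targetVersion) {
                return false;
            }
        }
        return true;
    }
}
```

A FATE step would poll `allAtLeast` with a freshly read live-server set on each attempt, so servers that die drop out of the check and newly started servers must report before the step completes. This does not solve the "end" state problem; it only makes the check explicit.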

> [UMBRELLA] Remove non-transient data from ZooKeeper
> ---------------------------------------------------
>
>                 Key: ACCUMULO-3842
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3842
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: client, tserver
>            Reporter: Josh Elser
>             Fix For: 1.8.0
>
>
> Wanted to start brainstorming about this.
> We store a lot of persistent data in ZooKeeper that would be better stored in 
> something backed by HDFS. ZooKeeper can be a very convenient place to store 
> persisted data so that it's available to all nodes, but it comes at a price 
> and often must be accessed asynchronously to achieve good performance.
> * Table/Namespace configuration
> * Users/Authorizations
> * Problem reports (maybe?)
> * System configuration overrides (maybe?)
> Some benefits we'd see from this:
> * Loss of ZooKeeper doesn't lose table configuration and users.
> * Greatly reduce ZooKeeper watchers (assume watchers=50*num_tables*num_tservers)
> * Consistent updates of table constraints and all other table properties
> The last note is the most important one IMO. Judging by the number of test 
> issues alone that we've had with constraints not being seen on all servers, 
> this is bound to affect users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
