Re: Review Request 19790: ACCUMULO-378 Design document

Josh Elser Mon, 31 Mar 2014 09:40:25 -0700


> On March 31, 2014, 4:21 p.m., kturner wrote:
> > docs/src/main/resources/design/ACCUMULO-378-design.mdtext, line 34
> > <https://reviews.apache.org/r/19790/diff/1/?file=539855#file539855line34>
> >
> >     Can a table be replicated to multiple clusters?
> 
> kturner wrote:
>     More specifically, can a table on one cluster be replicated to multiple 
> clusters directly?  The graph described seemed to imply only one outgoing 
> edge.  I am just wondering about multiple outgoing edges from a single 
> cluster.  It seems like this would impact the implementation of the 
> bookkeeping for which files were replicated where.

No, a single outgoing edge was not the intent; the intent was to support 
replication from one cluster to N clusters. We could make this detail 
transparent by including the destination in the table in which we store 
references to data to be replicated, at the cost of storing N*M records 
instead of just M records. N is the number of clusters the source is 
replicating to, while M is the number of references to data that needs to be 
replicated. The more I think about it, the more I think it's definitely worth 
it.
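
To make the N*M cost concrete, here is a minimal Python sketch of that 
bookkeeping, assuming one record per (file reference, destination cluster) 
pair. The function names, file names, and peer names are all hypothetical and 
are not taken from the design document.

```python
from itertools import product


def build_records(file_refs, destinations):
    """Create one pending record per (file reference, destination) pair."""
    return {(ref, dest): False for ref, dest in product(file_refs, destinations)}


def mark_replicated(records, ref, dest):
    """Record that `ref` has been replicated to `dest`."""
    records[(ref, dest)] = True


def fully_replicated(records, ref):
    """A reference is done only when every destination has received it."""
    return all(done for (r, _), done in records.items() if r == ref)


# Hypothetical example: M = 3 references, N = 2 destination clusters.
refs = ["wal-0001", "wal-0002", "wal-0003"]
peers = ["peer-a", "peer-b"]
records = build_records(refs, peers)
print(len(records))  # N * M = 6 records instead of M = 3

mark_replicated(records, "wal-0001", "peer-a")
print(fully_replicated(records, "wal-0001"))  # False: peer-b is still pending
```

Keying on the pair is what lets each destination make progress independently; 
a reference can be cleaned up only once every destination has it.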

> On March 31, 2014, 4:21 p.m., kturner wrote:
> > docs/src/main/resources/design/ACCUMULO-378-design.mdtext, line 80
> > <https://reviews.apache.org/r/19790/diff/1/?file=539855#file539855line80>
> >
> >     What's the rationale for replicating WAL as opposed to replicating minor 
> > compacted rfiles?  What are the pros and cons? One con w/ WALs is that they 
> > could possibly contain a lot of data for tables that are not being 
> > replicated.  This data would need to be filtered.

The biggest reason for using them is that they drastically reduce the latency 
for data to *begin* the replication process. We certainly could use RFiles for 
everything, which would simplify things, but I'm worried about the latency that 
would incur. If we used RFiles, the only solution I can come up with to reduce 
the latency before replication even begins would be to increase the minc 
frequency. Maybe that's sufficient for a first pass? I think I need to quantify 
this opinion with some numbers.

Right now, we tend to recommend a bigger in-memory map for increased ingest 
performance. The worry here would be that recommendation now comes with 
increased replication latency.
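
As a rough back-of-the-envelope for that worry, the sketch below assumes a 
minor compaction starts only when the in-memory map fills and that ingest is 
steady. Both are simplifications, and the map sizes and ingest rate are made 
up for illustration, not measured.

```python
def seconds_until_minc(map_size_mb, ingest_mb_per_sec):
    """Approximate time to fill the in-memory map at a steady ingest rate.

    Assumption: a minor compaction starts only once the map is full, so this
    is roughly the longest a mutation waits before an RFile exists for it.
    """
    if ingest_mb_per_sec <= 0:
        raise ValueError("ingest rate must be positive")
    return map_size_mb / ingest_mb_per_sec


# Hypothetical sizes and rate, for illustration only.
for size_mb in (256, 1024, 4096):
    wait = seconds_until_minc(size_mb, ingest_mb_per_sec=8)
    print(f"{size_mb} MB map -> ~{wait:.0f} s before replication could begin")
```

Under these assumptions the wait grows linearly with the map size, which is 
the trade-off against the ingest recommendation.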

- Josh

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19790/#review39051
-----------------------------------------------------------

On March 28, 2014, 5:54 p.m., kturner wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19790/
> -----------------------------------------------------------
> 
> (Updated March 28, 2014, 5:54 p.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-378
>     https://issues.apache.org/jira/browse/ACCUMULO-378
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> ACCUMULO-378 Design document.  Posting for review here, not meant for commit. 
>  Final version of document should be posted on issue.
> 
> 
> Diffs
> -----
> 
>   docs/src/main/resources/design/ACCUMULO-378-design.mdtext PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/19790/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> kturner
> 
>
