Re: Review Request 19790: ACCUMULO-378 Design document

keith Mon, 31 Mar 2014 10:26:37 -0700


> On March 28, 2014, 6:53 p.m., kturner wrote:
> > docs/src/main/resources/design/ACCUMULO-378-design.mdtext, line 119
> > <https://reviews.apache.org/r/19790/diff/1/?file=539855#file539855line119>
> >
> >     Seems like Accumulo should have a public API for querying what needs to 
> > be replicated, notifying it when something has been replicated, and methods 
> > for importing replicated data.  I am thinking of something different than a 
> > plugin, more like the import/export table API.  How the replication happens 
> > is up to the user.  We could provide a default implementation that does 
> > replication as you mentioned.  Some users may want to occasionally 
> > replicate large batches using MapReduce.  Others may want to continually 
> > replicate files using distributed queueing solutions.
> 
> Josh Elser wrote:
>     My initial thoughts were to provide something at a public API layer due 
> to the likely desire to integrate WALs as a part of said API. Opening up an 
> API might prove difficult to implement well -- we would have to design 
> something that scales out to adequately support the ingest rates Accumulo 
> will support.
>     
>     Not saying I'm against it, but it would be difficult to get right. 
> Hooking into it would also likely be difficult to implement.
> 
> kturner wrote:
>     I agree; we would not want to expose internals of walogs to users.  However, 
> I think this API would just expose URIs that need to be replicated.  The user 
> would not have to care about what the actual data pointed to by the URI is.
>     
>     I am going about this all wrong.  I should outline what I would like to 
> see Accumulo do instead of some incomplete "how" to do it.  Stepping back, I 
> would like to see this feature designed to empower admins.
>     
>     ZFS is a file system I really like that empowers admins.  One way it 
> empowers admins is by providing a really flexible, easy-to-use mechanism for 
> replicating file systems. With ZFS an admin can do something like the following 
> to initially replicate a file system.
>     
>      # zfs snapshot tank/home@snap1 
>      # zfs send tank/home@snap1 | ssh host2 zfs recv newtank/home
>     
>     After some period of time they can easily replicate the changes to the file 
> system with the following commands.
>     
>      # zfs snapshot tank/home@snap2 
>      # zfs send -i tank/home@snap1 tank/home@snap2 | ssh host2 zfs recv newtank/home
>     
>     What I like about this is that zfs send writes to stdout, so the admin 
> could write to a file, send over the network, write to tape, etc.   Whenever 
> and however the admin wants to move the data, the ZFS API makes it super easy 
> for them to do it.    Of course we cannot do exactly what ZFS does, but we 
> can make it easy for admins to move data between clusters in different ways 
> and on different schedules.
> 
> Josh Elser wrote:
>     So, wrapping something around (ranges of) WALs and RFile is definitely 
> desirable here. I believe with that, we can better separate the logic into 
> discrete pieces: 1) Generate data 2) Transmit data 3) Apply data
>     
>     The more agnostic we can make the implementations of the underlying 
> data, the better. The wrapper around WALs and RFiles would need to 
> support some semantics like ordering (WAL1 needs to be applied before WAL2), 
> verification/validation on the remote side (checksum?), and the ability to 
> efficiently replay this data.
>     
>     Thinking further, you could even generalize the problem of how to get 
> from #1 to #2 as a FIFO queue backed by a table.


We need to call out at least one more step: 4) Report data applied, so the 
source can GC.
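
To make the four steps concrete, here is a rough sketch in Python. It is only 
an illustration under stated assumptions, not a proposed Accumulo API: the 
names ReplicationUnit, Source, Peer, and replicate_all are hypothetical, the 
in-memory deque stands in for the FIFO queue (which could be backed by a 
table), and SHA-256 is just one possible choice for the remote-side check.

```python
import hashlib
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationUnit:
    """Hypothetical wrapper around a WAL or RFile reference."""
    seq: int        # ordering: a lower seq must be applied first
    uri: str        # opaque pointer to the data to replicate
    payload: bytes  # stands in for the file contents
    checksum: str   # SHA-256 hex digest of payload, verified by the peer


class Source:
    """Source cluster: generates units and retains them until applied."""

    def __init__(self):
        self.pending = deque()  # FIFO queue of units awaiting transmission
        self.retained = {}      # uri -> unit, not eligible for GC yet
        self._next_seq = 0

    def generate(self, uri, payload):
        # Step 1: generate data.
        digest = hashlib.sha256(payload).hexdigest()
        unit = ReplicationUnit(self._next_seq, uri, payload, digest)
        self._next_seq += 1
        self.pending.append(unit)
        self.retained[uri] = unit
        return unit

    def transmit(self):
        # Step 2: hand the oldest pending unit to the transport, if any.
        return self.pending.popleft() if self.pending else None

    def report_applied(self, uri):
        # Step 4: the peer confirmed the unit, so the source may GC it.
        self.retained.pop(uri, None)


class Peer:
    """Remote cluster: validates and applies units in order."""

    def __init__(self):
        self.applied = []
        self._expected_seq = 0

    def apply(self, unit):
        # Step 3: verify the checksum and the ordering, then apply.
        if hashlib.sha256(unit.payload).hexdigest() != unit.checksum:
            raise ValueError("checksum mismatch for " + unit.uri)
        if unit.seq != self._expected_seq:
            raise ValueError("out-of-order unit " + unit.uri)
        self.applied.append(unit.uri)
        self._expected_seq += 1
        return unit.uri


def replicate_all(source, peer):
    """Drain the queue, reporting each unit back once it is applied."""
    unit = source.transmit()
    while unit is not None:
        source.report_applied(peer.apply(unit))
        unit = source.transmit()
```

The point of the sketch is that the source keeps every unit in retained until 
report_applied is called for it. Without step 4 the source has no way to know 
when it is safe to GC the underlying files.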


- kturner


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19790/#review38927
-----------------------------------------------------------


On March 28, 2014, 5:54 p.m., kturner wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19790/
> -----------------------------------------------------------
> 
> (Updated March 28, 2014, 5:54 p.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-378
>     https://issues.apache.org/jira/browse/ACCUMULO-378
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> ACCUMULO-378 Design document.  Posting for review here, not meant for commit. 
>  Final version of document should be posted on issue.
> 
> 
> Diffs
> -----
> 
>   docs/src/main/resources/design/ACCUMULO-378-design.mdtext PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/19790/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> kturner
> 
>
