> On March 28, 2014, 6:53 p.m., kturner wrote:
> > docs/src/main/resources/design/ACCUMULO-378-design.mdtext, line 119
> > <https://reviews.apache.org/r/19790/diff/1/?file=539855#file539855line119>
> >
>     Seems like Accumulo should have a public API for querying what needs to 
> > be replicated, notifying it when something has been replicated, and methods 
> > for importing replicated data.  I am thinking of something different than a 
> > plugin, more like the import/export table API.  How the replication happens 
> > is up to the user.  We could provide a default implementation that does 
> > replication as you mentioned.  Some users may want to occasionally 
> > replicate large batches using map reduce.  Others may want to continually 
> > replicate files using distributed queueing solutions.

My initial thoughts were to provide something at a public API layer due to the 
likely desire to integrate WALs as a part of said API. Opening up an API might 
prove difficult to implement well -- we would have to design something that 
scales out to adequately support the ingest rates Accumulo will support.

Not saying I'm against it, but it would be difficult to get right. Hooking into 
it would also likely be difficult to implement.
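
To make the discussion concrete, here is a minimal sketch of the shape such an 
API could take. Everything in it is an assumption for illustration: the names 
ReplicationOperations, pendingFiles, markReplicated and recordNewFile do not 
exist in Accumulo, and the in-memory implementation ignores WALs and the 
scaling concerns above entirely. The import side on the peer is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical API shape; none of these names exist in Accumulo.
interface ReplicationOperations {
  /** Files for a table that still need to be replicated. */
  List<String> pendingFiles(String table);

  /** Notify the system that the caller has finished replicating a file. */
  void markReplicated(String table, String file);
}

// Toy in-memory implementation, only to show the intended call pattern.
final class InMemoryReplicationOperations implements ReplicationOperations {
  // table -> (file -> already replicated?)
  private final Map<String, Map<String, Boolean>> status = new LinkedHashMap<>();

  /** Stand-in for the system noticing a new file that needs replication. */
  synchronized void recordNewFile(String table, String file) {
    status.computeIfAbsent(table, t -> new LinkedHashMap<>()).putIfAbsent(file, Boolean.FALSE);
  }

  @Override
  public synchronized List<String> pendingFiles(String table) {
    List<String> pending = new ArrayList<>();
    Map<String, Boolean> files =
        status.getOrDefault(table, Collections.<String, Boolean>emptyMap());
    for (Map.Entry<String, Boolean> e : files.entrySet()) {
      if (!e.getValue()) {
        pending.add(e.getKey());
      }
    }
    return pending;
  }

  @Override
  public synchronized void markReplicated(String table, String file) {
    Map<String, Boolean> files = status.get(table);
    // replace() returns null when the file was never recorded for this table.
    if (files == null || files.replace(file, Boolean.TRUE) == null) {
      throw new IllegalArgumentException("Unknown file " + file + " for table " + table);
    }
  }
}
```

A MapReduce job or a queue consumer would poll pendingFiles, copy the data 
however it likes, and then call markReplicated. How the transfer happens stays 
with the user, as suggested in the comment.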


> On March 28, 2014, 6:53 p.m., kturner wrote:
> > docs/src/main/resources/design/ACCUMULO-378-design.mdtext, line 139
> > <https://reviews.apache.org/r/19790/diff/1/?file=539855#file539855line139>
> >
> >     A walog or bulk imported file could be referenced by multiple tablets.  
> > I am wondering if it would be better to move this info out of the tablet 
> > and do something like ~del markers in the metadata table.  Like a 
> > ~repl_hdfs://foo/a.rf row in the metadata table.  This row could store 
> > replication status.  If the ~repl row exists, then the file would not be 
> > deleted.  The ~repl marker could not be removed until the file is 
> > replicated and there are no more refs in the tablet metadata (is this 
> > sufficient to prevent adding a repl marker for something that was already 
> > replicated?).  Could possibly update repl markers using conditional 
> > mutations, since multiple tablets and the master may mutate it.

Yeah, this ties into what Mike had asked about. Having it in a completely 
separate table would be best from a "screwing up other things" perspective. I 
don't have any example of why these markers would need to be in the same row as 
the tablets. I need to read some of that code again.
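
A rough sketch of the marker lifecycle described in the comment, assuming a 
plain in-memory map in place of the metadata (or a separate) table. The 
ReplMarkers and Marker classes and their fields are hypothetical, and the 
compare-and-set on the map only imitates what a conditional mutation would do; 
it is not Accumulo's ConditionalWriter.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for "~repl_<file>" rows; not Accumulo's metadata schema.
final class ReplMarkers {

  /** Immutable marker value: replication status plus the number of tablet refs. */
  static final class Marker {
    final boolean replicated;
    final int tabletRefs;

    Marker(boolean replicated, int tabletRefs) {
      this.replicated = replicated;
      this.tabletRefs = tabletRefs;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Marker)) {
        return false;
      }
      Marker m = (Marker) o;
      return replicated == m.replicated && tabletRefs == m.tabletRefs;
    }

    @Override
    public int hashCode() {
      return 31 * Boolean.hashCode(replicated) + tabletRefs;
    }
  }

  private final ConcurrentMap<String, Marker> rows = new ConcurrentHashMap<>();

  private static String row(String file) {
    return "~repl_" + file;
  }

  /** Applies the update only if the row still holds `expected` (null means absent). */
  boolean conditionalPut(String file, Marker expected, Marker updated) {
    if (expected == null) {
      return rows.putIfAbsent(row(file), updated) == null;
    }
    return rows.replace(row(file), expected, updated);
  }

  /** Removes the marker only once the file is replicated and no tablet references it. */
  boolean removeIfDone(String file) {
    Marker m = rows.get(row(file));
    return m != null && m.replicated && m.tabletRefs == 0 && rows.remove(row(file), m);
  }

  /** A file may be deleted only when no marker row exists for it. */
  boolean mayDelete(String file) {
    return !rows.containsKey(row(file));
  }
}
```

A writer holding a stale view of the marker has its conditionalPut rejected and 
has to re-read and retry, which is the property that matters when multiple 
tablets and the master mutate the same row.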


> On March 28, 2014, 6:53 p.m., kturner wrote:
> > docs/src/main/resources/design/ACCUMULO-378-design.mdtext, line 157
> > <https://reviews.apache.org/r/19790/diff/1/?file=539855#file539855line157>
> >
> >     Are you thinking a FATE operation per file?  FATE uses zookeeper, and 
> > zookeeper keeps everything in memory.

Not sure. Using FATE when appropriate is mostly what I was thinking of right 
now - I don't have explicit examples of where we would want to use FATE. The 
obvious place is that we don't want multiple hosts sending the same data more 
than once, but we also want to make sure we re-send data that failed to send 
the first time around.

Some more thought is needed here, I believe.
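
As a starting point for that thinking, here is a small sketch of the two 
requirements (send each file once, but re-send after a failure) as a 
claim/release state machine. It does not use FATE or ZooKeeper; the SendTracker 
class and its states are hypothetical, and a single-process map stands in for 
whatever shared store would coordinate real hosts.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical tracker; a real version would need state shared across hosts.
final class SendTracker {
  enum State { PENDING, IN_PROGRESS, DONE }

  private final ConcurrentMap<String, State> states = new ConcurrentHashMap<>();

  /** Register a file that needs to be sent; a no-op if it is already known. */
  void add(String file) {
    states.putIfAbsent(file, State.PENDING);
  }

  /** Atomically claim a pending file; only one caller wins the claim. */
  boolean tryClaim(String file) {
    return states.replace(file, State.PENDING, State.IN_PROGRESS);
  }

  /** Record the outcome of a send; a failure makes the file claimable again. */
  void finish(String file, boolean success) {
    states.replace(file, State.IN_PROGRESS, success ? State.DONE : State.PENDING);
  }

  State state(String file) {
    return states.get(file);
  }
}
```

This sketch does not handle a sender that dies while holding a claim. That 
case is where something like FATE's persisted, resumable operations would 
come in.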


> On March 28, 2014, 6:53 p.m., kturner wrote:
> > docs/src/main/resources/design/ACCUMULO-378-design.mdtext, line 184
> > <https://reviews.apache.org/r/19790/diff/1/?file=539855#file539855line184>
> >
> >     How will the locality group config changing be handled?

Given the current replication configuration, the updated locality group 
configuration would use that replication configuration to determine which new 
data should be replicated (and to where). Splitting out the data from the WAL 
into the aforementioned replication records may prove to be difficult, at which 
point I may drop down to only supporting table-wide replication rules.

Supporting locality group replication might require changes in how WALs work 
before this really becomes feasible.


> On March 28, 2014, 6:53 p.m., kturner wrote:
> > docs/src/main/resources/design/ACCUMULO-378-design.mdtext, line 254
> > <https://reviews.apache.org/r/19790/diff/1/?file=539855#file539855line254>
> >
> >     Not gonna work.  It may be worthwhile to consider having the 
> > ConditionalWriter detect unsupported replication configurations and throw 
> > an exception.

Haha, I figured this to be the case. Throwing an exception is what I had 
planned.


> On March 28, 2014, 6:53 p.m., kturner wrote:
> > docs/src/main/resources/design/ACCUMULO-378-design.mdtext, line 261
> > <https://reviews.apache.org/r/19790/diff/1/?file=539855#file539855line261>
> >
>     Tablets could support an atomic operation that marks all of their current 
> > files as needing replication and appropriately handle new data coming in.  
> > The master would go through all tablets in a table calling this operation.  
> > Tablets could write something to the metadata table when the operation is 
> > successful.  This allows the master to know which tablets are done.

That's a possibility. Like you mentioned earlier, depending on the amount of 
data to be replicated, exporttable and distcp might be (wildly) more efficient. 
Worth it to try to do this now, or leave pre-existing data replication as a 
follow-on?


- Josh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19790/#review38927
-----------------------------------------------------------


On March 28, 2014, 5:54 p.m., kturner wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19790/
> -----------------------------------------------------------
> 
> (Updated March 28, 2014, 5:54 p.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-378
>     https://issues.apache.org/jira/browse/ACCUMULO-378
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> ACCUMULO-378 Design document.  Posting for review here, not meant for commit. 
>  Final version of document should be posted on issue.
> 
> 
> Diffs
> -----
> 
>   docs/src/main/resources/design/ACCUMULO-378-design.mdtext PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/19790/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> kturner
> 
>
