Thanks Christopher. I haven't yet hit any blockers to using EC with Accumulo. There's still a lot of work to be done testing performance and rooting out any hidden gotchas. The only big issue I've run across, which I mention in my blog post, is that the WAL absolutely cannot be written to an erasure-coded directory. It might be a good idea to add some guard code to the DfsLogger that checks the policies set on the WAL directory and logs at least a warning if EC is detected.
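
A minimal sketch of what such a guard could look like, under stated assumptions: WalEcGuard, PolicyLookup, and check are hypothetical names, not existing Accumulo code, and the policy lookup sits behind a small interface so the check itself does not depend on HDFS classes. It only builds a warning message; how DfsLogger would actually report it is left open.

```java
import java.io.IOException;
import java.util.Optional;

// Hypothetical helper for illustration only; not part of Accumulo.
final class WalEcGuard {

  /** Hypothetical abstraction over the file system's EC policy lookup. */
  @FunctionalInterface
  interface PolicyLookup {
    /** Returns the EC policy name set on dir, or null if dir is replicated. */
    String policyName(String dir) throws IOException;
  }

  private WalEcGuard() {}

  /**
   * Returns a warning message if the WAL directory appears to be erasure
   * coded, or if the policy could not be determined. Returns empty when the
   * directory is plainly replicated.
   */
  static Optional<String> check(String walDir, PolicyLookup lookup) {
    try {
      String policy = lookup.policyName(walDir);
      if (policy == null) {
        return Optional.empty(); // no EC policy: safe for WAL writes
      }
      return Optional.of("WAL directory " + walDir
          + " has erasure coding policy " + policy
          + "; WALs must not be written to an erasure coded directory");
    } catch (IOException e) {
      // Assumption: a failed lookup is worth a warning rather than a hard failure.
      return Optional.of("Could not determine erasure coding policy for "
          + walDir + ": " + e.getMessage());
    }
  }
}
```

On HDFS the lookup would presumably wrap DistributedFileSystem.getErasureCodingPolicy(Path) and return the policy's name, or null when no policy is set; a caller would then log whatever message check() returns.
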
I've been working on usability improvements to make working with EC easier. Right now, setting the policy for a table requires running the "hdfs ec" command and setting policies on /accumulo/tables/<id> and its children manually. I'm trying to add per-namespace/table properties to control EC (and storage policy), the idea being that a user sets an encoding policy for a namespace, and then any tables created within that namespace will have their tablet directories set to that policy. I'm also trying to implement changing the EC policy at the directory level whenever the encoding policy property is changed via the shell.

I'd like to invite any interested parties to check out my fork at

https://github.com/etseidl/accumulo/tree/ecprops-2.1

It's already out of date since Keith just checked in some conflicting changes, but you can at least see what I'm trying to accomplish. I'd appreciate some feedback to let me know if I'm on a reasonable track. In particular, the propagation of changes is pretty raw (in the past I used the table config observer to detect changes, but that disappeared). I'd also like to know whether my approach would work with how you envision abstracting the filesystem. I currently check for DistributedFileSystem in VolumeManagerImpl, but am not keen on using instanceof.

I don't know if any of this is baked enough for a pull request, but I will open one if you'd prefer.

Thanks,
Ed

________________________________
From: Christopher <[email protected]>
Sent: Wednesday, October 30, 2019 3:07 PM
To: accumulo-dev <[email protected]>
Subject: Re: new contributor intro

Awesome! Thanks for the intro Ed. I'm very curious if there are any improvements or features we need to change in Accumulo to better support erasure coding in HDFS (and especially if we can do so without increasing our entanglement with Hadoop HDFS, specifically, as it is a long-term goal of mine to abstract our DFS-related code, to more easily use alternative implementations).
