Yeah, I like the idea of compressing them. Are you thinking of rewriting them with the WAL compression feature enabled, or just something simple like running the whole file through a compressor? Maybe I should poke at what the difference in resulting file size looks like.
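
For the "whole file through a compressor" case, a rough way to get numbers might look like the sketch below. It assumes a WAL has already been copied out of HDFS to local disk (the path passed to `report` is hypothetical) and uses only Python's standard-library compressors, so it says nothing about what the built-in WAL compression feature would produce.

```python
import bz2
import gzip
import lzma


def compressed_sizes(data: bytes) -> dict:
    """Return the byte count of `data` raw and after each stdlib compressor."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data)),
        "bz2": len(bz2.compress(data)),
        "xz": len(lzma.compress(data)),
    }


def report(path: str) -> None:
    """Print sizes for one local file, e.g. a WAL copied out of the archive.

    Reads the whole file into memory, which is fine for a quick check on a
    single file but not something to run over a large directory.
    """
    with open(path, "rb") as f:
        sizes = compressed_sizes(f.read())
    raw = max(sizes["raw"], 1)  # avoid dividing by zero on an empty file
    for name, size in sizes.items():
        print(f"{name:>5}: {size:>12} bytes ({size / raw:.1%} of raw)")
```

Running `report` on a handful of WALs from different workloads would give a first idea of how much of the repeated information a general-purpose compressor recovers.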

IIRC things already get moved out to the archive before being deleted. There's a default TTL of something like 10 minutes before a WAL can be deleted from the archive area.

The disadvantage to always compressing archived WALs would be overhead for the replication process? Anything else?

On Sat, Mar 16, 2019 at 10:51 AM Andrew Purtell <andrew.purt...@gmail.com> wrote:
>
> How about an option that tells the cleaner to archive them, with compression?
> There’s a lot of wastage in WAL files due to repeated information, and
> reasons to not enable WAL compression for live files, but I think little
> reason not to rewrite an archived WAL file with a typical and standard
> archival compression format like BZIP if retaining it for only possible
> debugging purposes. (Or maybe a home grown incremental backup solution built
> on snapshots and log replay. Or...)
>
> So, a switch that tells the cleaner to archive rather than delete, and maybe
> another toggle that starts a background task to find archived WALs that are
> uncompressed and compress them, only removing them once the compressed
> version is in place. Compress, optionally, in a temporary location with final
> atomic rename like compaction.
>
> ?
>
> On Mar 16, 2019, at 7:01 AM, Sean Busbey <bus...@apache.org> wrote:
> >
> > Hi folks!
> >
> > Sometimes while working to diagnose an HBase failure in production settings
> > I need to ensure WALs stick around so that I can examine or possibly replay
> > them. For difficult problems on clusters with plenty of HDFS space relative
> > to the HBase write workload sometimes that might mean for days or a week.
> >
> > The way I've always done this is by setting up placeholder replication
> > information for a peer that's disabled. It nicely makes the cleaner chore
> > pass over things, doesn't require a restart of anything, and has a
> > relatively straight forward way to go back to normal.
> >
> > Lately I've been thinking that I do this often enough that a command for it
> > would be better (kind of like how we can turn the balancer on and off).
> >
> > How do other folks handle this operational need? Am I just missing an
> > easier way?
> >
> > If a new command is needed, what do folks think the minimally useful
> > version is? Keep all WALs until told otherwise? Limit to most recent/oldest
> > X bytes? Limit to files that include edits to certain
> > namespace/table/region?
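
On the compress-then-rename step Andrew describes in the quoted text, here is a minimal sketch of the ordering: write the compressed copy to a temporary file, atomically rename it into place, and only then remove the original. It is an illustration on a local filesystem using Python's standard library, not HBase or HDFS code. The function name `compress_archived_file` and the `.bz2` suffix are assumptions, and it puts the temporary file in the same directory rather than a separate temporary location so that the rename stays on one filesystem.

```python
import bz2
import os
import shutil
import tempfile


def compress_archived_file(path: str) -> str:
    """Compress `path` to `path + '.bz2'`, then delete the original.

    The compressed bytes go to a temp file first, so a reader never sees a
    partially written `.bz2` under the final name.
    """
    final_path = path + ".bz2"
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as raw_out, \
                bz2.open(raw_out, "wb") as out, \
                open(path, "rb") as src:
            shutil.copyfileobj(src, out)  # stream, do not load the file in memory
        os.replace(tmp_path, final_path)  # atomic rename within one filesystem
    except BaseException:
        # Leave the original untouched and clean up the partial temp file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    # Only remove the original once the compressed version is in place.
    os.remove(path)
    return final_path
```

If the process dies mid-compression, the worst case is a stray `.tmp` file next to an intact original, which a later pass could clean up.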