[ https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13266307#comment-13266307 ]
jirapos...@reviews.apache.org commented on HBASE-5547: ------------------------------------------------------ ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4633/#review7458 ----------------------------------------------------------- Thanks for the review. Going to do a bit of investigation as to more commonality/interface hiding, but unfortunately we handle hfiles in both the RS and Master (and need to actuate from the client), so we need to touch a lot. src/main/java/org/apache/hadoop/hbase/backup/HFileArchiveMonitor.java <https://reviews.apache.org/r/4633/#comment16469> hfiles, tables are another patch :) src/main/java/org/apache/hadoop/hbase/backup/HFileArchiveMonitor.java <https://reviews.apache.org/r/4633/#comment16468> top-level as in o.a.h.hbase? Seems like a bit of clutter and this is just a specialization on that packaging and IMO doesn't preclude either master or rs packages from usage. But I can move it if there are strong feelings. src/main/java/org/apache/hadoop/hbase/backup/HFileArchiveTableTracker.java <https://reviews.apache.org/r/4633/#comment16470> your right, fubbing my commenting a bit. src/main/java/org/apache/hadoop/hbase/backup/HFileArchiveTableTracker.java <https://reviews.apache.org/r/4633/#comment16471> But its does tracking! And it doesn't actually actuate any changes on the rest of the cluster so Manager seems bad too. How about Monitor? src/main/java/org/apache/hadoop/hbase/backup/HFileArchiveTableTracker.java <https://reviews.apache.org/r/4633/#comment16472> Started thinking about your later comment about a factory in HRS.java and think I'm going with the static factory method there. This still needs to be public for testing, but in general we will use the the ServerArchiveTableTracker (and I'll add docs to that effect as well). src/main/java/org/apache/hadoop/hbase/backup/HFileArchiveTableTracker.java <https://reviews.apache.org/r/4633/#comment16473> It won't be used except for at startup, but seemed bad practice to prematurely read ZK before we get the start call on the parent. At the very least we can say that we are removing the previous tables though. src/main/java/org/apache/hadoop/hbase/backup/HFileArchiveTracker.java <https://reviews.apache.org/r/4633/#comment16474> Should be fixed with the naming change above. src/main/java/org/apache/hadoop/hbase/backup/HFileArchiveTracker.java <https://reviews.apache.org/r/4633/#comment16475> Yeah, its definitely a bit odd. Going with a factory here and only having the pass in option for testing exposure. See above (tabletracker) and below (hrs) for more info on the factory stuff. src/main/java/org/apache/hadoop/hbase/backup/HFileArchiveTracker.java <https://reviews.apache.org/r/4633/#comment16476> done. src/main/java/org/apache/hadoop/hbase/backup/HFileArchiveTracker.java <https://reviews.apache.org/r/4633/#comment16477> Yeah, its pretty gross. I've been thinking about the ways we can fix this/overhaul the zookeeper infrastructure. src/main/java/org/apache/hadoop/hbase/backup/HFileArchiveTracker.java <https://reviews.apache.org/r/4633/#comment16478> This is actually a little nicer in that it doesn't preclude you from having a table named the same thing as the archive directory. Otherwise, you are checking the node vs. the node name of watcher.archiveHFileZNode, which could be the same. src/main/java/org/apache/hadoop/hbase/backup/HFileArchiveTracker.java <https://reviews.apache.org/r/4633/#comment16479> Done. Seemed to be the prevailing style, but this goes back to the fact that the watcher stuff if crufty. src/main/java/org/apache/hadoop/hbase/backup/HFileArchiveTracker.java <https://reviews.apache.org/r/4633/#comment16480> correct! Though anything in them would naturally be pb-ed :) src/main/java/org/apache/hadoop/hbase/backup/ServerHFileTableArchiveTracker.java <https://reviews.apache.org/r/4633/#comment16481> Inherit from tabletracker so we know when we should archive. However, seems like a bit cleaner to just let the subclasses just override registerTable() for testing. src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java <https://reviews.apache.org/r/4633/#comment16482> what do you mean by backup? Its moving them to another directory, rather than being deleted. Sounds like backup to me :) src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java <https://reviews.apache.org/r/4633/#comment16483> just throws a switch. should note its async, so you could have some extra backup files, but ZK is fast, so it shouldn' be that many. src/main/java/org/apache/hadoop/hbase/client/HFileArchiveManager.java <https://reviews.apache.org/r/4633/#comment16484> yup. src/main/java/org/apache/hadoop/hbase/master/CatalogJanitor.java <https://reviews.apache.org/r/4633/#comment16485> Only need one - the other is going to be unused in revised version. src/main/java/org/apache/hadoop/hbase/master/CatalogJanitor.java <https://reviews.apache.org/r/4633/#comment16486> Cruft naming - fixed. src/main/java/org/apache/hadoop/hbase/master/HMaster.java <https://reviews.apache.org/r/4633/#comment16487> cruft - just needs the tracker now. Paradigm seems to be to _not_ have that be an interface. src/main/java/org/apache/hadoop/hbase/master/HMaster.java <https://reviews.apache.org/r/4633/#comment16488> per-table hfile archiving tracker. So... both and neither? Name seems to be ok...maybe something like TableHFileArchiveTracker? src/main/java/org/apache/hadoop/hbase/master/HMaster.java <https://reviews.apache.org/r/4633/#comment16490> master didn't, but didn't want to add all the zk dependencies into the CJ. And if the archive stuff gets more complex, it might need to be in the master (weak excuse, but still...) src/main/java/org/apache/hadoop/hbase/master/HMaster.java <https://reviews.apache.org/r/4633/#comment16489> truth. src/main/java/org/apache/hadoop/hbase/master/HMaster.java <https://reviews.apache.org/r/4633/#comment16491> nope, and its a private field. src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java <https://reviews.apache.org/r/4633/#comment16492> Not really - it only cares about if the HFiles should be archived or not, which is what the monitor is doing. src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java <https://reviews.apache.org/r/4633/#comment16493> The monitor is what is keeping track of the archive/not logic, but yeah, this could be encapsulated in another class. Wanted to avoid all the redirection, since the code doesn't seem all that complex but moving it out could be reasonable. src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java <https://reviews.apache.org/r/4633/#comment16494> see above. src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java <https://reviews.apache.org/r/4633/#comment16495> much cleaned up. src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java <https://reviews.apache.org/r/4633/#comment16466> I was actually initially considering using a factory here, but its definitely not. Rather, just creating the ServerHFileArchiveTracker with a "Server" instance that will give it the correct server name to register, so its kinda factory-ish, but there is only ever one instance (rather than a tabletracker creating class) and is super light weight, as opposed to all the double-checked locking, special creating etc that goes around using a factory in this situation. The current impl seems to be the simplest, though I guess we could use an ArchiveTracker factory that takes a zk and Server; seems a bit overkill, though it does hide some of the impl details... src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java <https://reviews.apache.org/r/4633/#comment16467> fat-fingered the save on HRS.java. Sorry for the cruft. I'll fix it in the next review. src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java <https://reviews.apache.org/r/4633/#comment16496> No, this is for how the delete method is currently setup - takes a regionServerServices, which hides impl details, meaning we need to have this method. But its nice for testing too. src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java <https://reviews.apache.org/r/4633/#comment16497> This ends up being cleaner - otherwise each region actually needs to keep around a reference to the ArchiveMonitor and then pass it into deleteRegion (in HRegion.merge). But if the predominant opinion is to keep that around instead, I can make the change. src/main/java/org/apache/hadoop/hbase/regionserver/Store.java <https://reviews.apache.org/r/4633/#comment16498> This is situation 2 where hfiles need to be archived. Tried to keep as much in common between the two situations, but there is limited commonality between the "Archive everything here" and "just archive these couple files" functionality. And either case, you need to know if you keep around the hfiles. I'll try to do some more combining, but we'll see how much more I can pull out. src/main/java/org/apache/hadoop/hbase/util/HFileArchiveCleanup.java <https://reviews.apache.org/r/4633/#comment16499> Nope, eclipse just didn't want to format this to wrap. Fixed. And moved into backup package. src/main/java/org/apache/hadoop/hbase/util/HFileArchiveUtil.java <https://reviews.apache.org/r/4633/#comment16500> Just for static methods. Makes it easier for the method interfaces, but I'll look into if simplifying makes more sense. src/test/java/org/apache/hadoop/hbase/master/MockRegionServer.java <https://reviews.apache.org/r/4633/#comment16501> regionserverservices - yup. see above comment and see if you think differently. - Jesse On 2012-05-01 02:42:34, Jesse Yates wrote: bq. bq. ----------------------------------------------------------- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/4633/ bq. ----------------------------------------------------------- bq. bq. (Updated 2012-05-01 02:42:34) bq. bq. bq. Review request for hbase, Michael Stack and Lars Hofhansl. bq. bq. bq. Summary bq. ------- bq. bq. Essentially, whenever an hfile would be deleted, it is instead moved to the archive directory. In this impl, the archive directory is on a per table basis, but defaults to '.archive'. Removing hfiles occurs in three places - compaction, merge and catalog janitor. The former and two latter are distinctly different code paths, but but did pull out some similarities. The latter two end up calling the same method, so there should be a reasonable amount of overlap. bq. bq. Implementation wise: bq. Updated the HMasterInterface to pass the calls onto the zookeeper. bq. Added a zk listener to handle updates from the master to the RS to backup. bq. Added a utility for removing files and finding archive directories bq. Added tests for the regionserver and catalogjanitor approaches. bq. Added creation of manager in regionserver. bq. bq. bq. This addresses bug HBASE-5547. bq. https://issues.apache.org/jira/browse/HBASE-5547 bq. bq. bq. Diffs bq. ----- bq. bq. src/main/java/org/apache/hadoop/hbase/HConstants.java a9d80a0 bq. src/main/java/org/apache/hadoop/hbase/backup/HFileArchiveMonitor.java PRE-CREATION bq. src/main/java/org/apache/hadoop/hbase/backup/HFileArchiveTableTracker.java PRE-CREATION bq. src/main/java/org/apache/hadoop/hbase/backup/HFileArchiveTracker.java PRE-CREATION bq. src/main/java/org/apache/hadoop/hbase/backup/ServerHFileTableArchiveTracker.java PRE-CREATION bq. src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java ee16e72 bq. src/main/java/org/apache/hadoop/hbase/client/HFileArchiveManager.java PRE-CREATION bq. src/main/java/org/apache/hadoop/hbase/master/CatalogJanitor.java 79d5fdd bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java d47b83a bq. src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 7858846 bq. src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 61a5988 bq. src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java 6884d53 bq. src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java ea12da4 bq. src/main/java/org/apache/hadoop/hbase/regionserver/Store.java bf1618e bq. src/main/java/org/apache/hadoop/hbase/util/HFileArchiveCleanup.java PRE-CREATION bq. src/main/java/org/apache/hadoop/hbase/util/HFileArchiveUtil.java PRE-CREATION bq. src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 4fc105f bq. src/main/resources/hbase-default.xml f54b345 bq. src/test/java/org/apache/hadoop/hbase/master/MockRegionServer.java a59e152 bq. src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java cedf31e bq. src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionHFileArchiving.java PRE-CREATION bq. src/test/java/org/apache/hadoop/hbase/util/HFileArchiveTestingUtil.java PRE-CREATION bq. src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java 7d02759 bq. src/test/java/org/apache/hadoop/hbase/util/TestHFileArchivingCleanup.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/4633/diff bq. bq. bq. Testing bq. ------- bq. bq. Added two tests for the separate cases - archiving via the regionserver and for the catalog tracker. Former runs in a mini cluster and also touches the changes to HMasterInterface and zookeeper. bq. bq. bq. Thanks, bq. bq. Jesse bq. bq. > Don't delete HFiles when in "backup mode" > ----------------------------------------- > > Key: HBASE-5547 > URL: https://issues.apache.org/jira/browse/HBASE-5547 > Project: HBase > Issue Type: New Feature > Reporter: Lars Hofhansl > Assignee: Jesse Yates > Attachments: java_HBASE-5547_v4.patch, java_HBASE-5547_v5.patch, > java_HBASE-5547_v6.patch > > > This came up in a discussion I had with Stack. > It would be nice if HBase could be notified that a backup is in progress (via > a znode for example) and in that case either: > 1. rename HFiles to be delete to <file>.bck > 2. rename the HFiles into a special directory > 3. rename them to a general trash directory (which would not need to be tied > to backup mode). > That way it should be able to get a consistent backup based on HFiles (HDFS > snapshots or hard links would be better options here, but we do not have > those). > #1 makes cleanup a bit harder. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira