[ https://issues.apache.org/jira/browse/HBASE-23326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997737#comment-16997737 ]
Duo Zhang commented on HBASE-23326:
-----------------------------------

{quote}
Thanks for putting up the doc. Can we have 'comment' access. Here are a few notes in meantime (some carried over from github comments):
{quote}
Done.

{quote}
On layout, could have a 'master' namespace so master Region is in same place in filesystem (Might be too awkward excluding this namespace from consideration in general processing). Or, here you make a MasterProcs directory. Old system had a MasterProcWALs dir. Make instead a generic 'master' dir at top-level into which we put all stuff master wants to persist to filesystem of which these new procedures WALs would be first.
{quote}
The design here is to make the procedure store self-managed. IMO, most data should be stored in hbase:meta or other system tables. The reason we need this special store is that initializing and assigning meta depend on the procedure framework, so we cannot use hbase:meta to store these things. This should not be a common case, so let's isolate it from the normal region store.

{quote}
You cannot pass a RegionServerServices that has the special implementations of flush/compaction/rolling? Just to minimize how this Region implementation deviates from the norm.
{quote}
It is a 'local' HRegion, so in general it should not have a RegionServerServices along with it. In fact, if we pass a RegionServerServices in, lots of other features will be activated, such as quota, metrics, etc. This will cause problems, because if we do not enable tables on master, some of the components are not initialized. Of course, metrics are useful here, but that can be a follow-on. In general, I think we can do some refactoring on HRegion to decouple it from RegionServerServices, and for the optional features we can add some interfaces to make them pluggable; then the code here will be cleaner. But anyway, as said above, this should not be a common case, so there is no need to hurry on the refactoring.
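To make the "pluggable optional features" idea concrete, here is a minimal sketch in plain Java. It is not HBase code: `RegionFeature`, `LocalRegion`, and `FlushMetrics` are hypothetical names invented for illustration, and the real HRegion refactoring could look quite different. It only shows how a region could run with no optional features attached, and gain metrics later without depending on a RegionServerServices-like object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not HBase API: all names here are assumptions.
final class PluggableRegionSketch {

    /** Hypothetical hook for an optional feature such as quota or metrics. */
    interface RegionFeature {
        void onFlush(long flushedBytes);
    }

    /** Hypothetical region that only knows about the features plugged into it. */
    static final class LocalRegion {
        private final List<RegionFeature> features = new ArrayList<>();
        private long totalFlushedBytes;

        /** Plugs in one optional feature and returns this region for chaining. */
        LocalRegion withFeature(RegionFeature feature) {
            features.add(feature);
            return this;
        }

        /** Records a flush and notifies every plugged-in feature. */
        void flush(long bytes) {
            totalFlushedBytes += bytes;
            for (RegionFeature feature : features) {
                feature.onFlush(bytes);
            }
        }

        long totalFlushedBytes() {
            return totalFlushedBytes;
        }
    }

    /** Hypothetical metrics feature that counts flushes and flushed bytes. */
    static final class FlushMetrics implements RegionFeature {
        private long flushCount;
        private long flushedBytes;

        @Override
        public void onFlush(long bytes) {
            flushCount++;
            flushedBytes += bytes;
        }

        long flushCount() {
            return flushCount;
        }

        long flushedBytes() {
            return flushedBytes;
        }
    }

    public static void main(String[] args) {
        // A 'local' region with no optional features still flushes fine.
        LocalRegion bare = new LocalRegion();
        bare.flush(128);

        // The same region type with metrics plugged in as a follow-on.
        FlushMetrics metrics = new FlushMetrics();
        LocalRegion measured = new LocalRegion().withFeature(metrics);
        measured.flush(10);
        measured.flush(20);
        System.out.println(metrics.flushCount() + " flushes, " + metrics.flushedBytes() + " bytes");
    }
}
```

In this sketch, the region works with an empty feature list, so nothing has to be initialized on the master unless someone explicitly plugs it in.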
{quote}
WAL dirs will be deleted/cleaned up after WALs are moved to recovered.edits? There'll be no accumulation of WALs? What about archiving? Peter figured how to get the MasterProcWALs into the general WAL archive. Maybe no archiving of these WALs?
{quote}
I do not think we can archive the WALs to the general place, because if we enable regions on master, it will mess things up. I haven't taken care of this part yet; the intention is to just delete the WALs in the first version. Later, we could implement our own archiving logic, which I think is easy. Anyway, the design here is to be self-managed. As for tracing problems, if we assume that the HRegion and WAL frameworks are fine (if they are not, we should find out on the normal read/write path), then the problem should be in our code which reads from and writes to the HRegion. So maybe we could enable multiple versions and keep deleted cells on this region, to make it more debug friendly.

{quote}
For recovered.edits, their content is supposed to be 'sorted'. When we move WALs to recovered.edits, they will be 'sorted' because we write in procedure order? Is there anything we need to do to ensure edits go into the WAL 'ordered'?
{quote}
Technically, they do not need to be 'sorted'. As there are sequence ids in the WALEntry and we do not do compaction when replaying, order is not important. We make them sorted for performance: when splitting, we know all the sequence ids of the WALEntries contained in a split WAL file, so we can just name it with the sequence ids. Then, when replaying, we can quickly filter out the unnecessary WAL files. But here, since we do not need to split, it is not necessary to read the files again and rename them. That's why I use a different directory name for these WAL files. You can see the modification in HRegion: I added a special config to specify the special directory in which to place recovered edits.
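The file selection at replay time can be sketched roughly as follows. This is an illustration only, not the HRegion code: the class and method names are hypothetical, the sketch assumes a recovered edits file is named purely by digits encoding the largest sequence id it contains, and the `specialDir` flag stands in for the special-directory config mentioned in the comment, whose real key is not given here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not HBase code: names and the file naming scheme are assumptions.
final class RecoveredEditsFilterSketch {

    /** Returns the sequence id encoded in an all-digit file name, or -1 if the name does not match. */
    static long maxSeqIdFromName(String fileName) {
        if (!fileName.matches("\\d+")) {
            return -1L;
        }
        try {
            return Long.parseLong(fileName);
        } catch (NumberFormatException e) {
            return -1L; // too many digits to fit in a long
        }
    }

    /**
     * Picks the files that need replaying.
     * Normal mode: skip files whose name does not match the pattern, and files whose
     * encoded sequence id is already covered by flushedSeqId.
     * Special-directory mode: keep every file, without looking at its name.
     */
    static List<String> filesToReplay(List<String> fileNames, long flushedSeqId, boolean specialDir) {
        List<String> result = new ArrayList<>();
        for (String name : fileNames) {
            if (specialDir || maxSeqIdFromName(name) > flushedSeqId) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("0000000000000000010", "0000000000000000025", "master-wal.1");
        System.out.println(filesToReplay(names, 20L, false));
        System.out.println(filesToReplay(names, 20L, true));
    }
}
```

The name-based filter lets normal replay skip files without opening them. In the special-directory mode every file is kept, which is still correct because each WALEntry carries its own sequence id.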
If this option is set, the logic is a bit different: we will not filter out any WAL files and will not check their name pattern.

Thanks.

> Implement a ProcedureStore which stores procedures in a HRegion
> ---------------------------------------------------------------
>
>                 Key: HBASE-23326
>                 URL: https://issues.apache.org/jira/browse/HBASE-23326
>             Project: HBase
>          Issue Type: Improvement
>          Components: proc-v2
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Major
>
> So we can reuse the code in HRegion for persisting the procedures, and also
> the optimized WAL implementation for better performance.
> This requires we merge the hbase-procedure module to hbase-server, which is
> an anti-pattern as we make the hbase-server module more overloaded. But I
> think later we can first try to move the WAL stuff out.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)