On Sat, Jan 14, 2017 at 9:50 PM, Lars George <lars.geo...@gmail.com> wrote:
> I think that makes sense. The tool with its custom code dates back to
> when we had no built-in version. I am all for removing all of the tools
> and leaving only the API call. For an admin, that is then the same as
> calling flush or split.
>
> No?

Sounds good to me.
St.Ack

> Lars
>
> Sent from my iPhone
>
> On 15 Jan 2017, at 04:25, Stephen Jiang <syuanjiang...@gmail.com> wrote:
>
>>> If you remove the util.Merge tool, how then does an operator ask for a
>>> merge in its absence?
>>
>> We have a shell command to merge regions. In the past, it called the same
>> RS-side code. I don't think there is a need to keep util.Merge (even if we
>> really want it, we can have this utility call HBaseAdmin.mergeRegions,
>> which is the same path the merge command takes through 'hbase shell').
>>
>> Thanks
>> Stephen
>>
>>> On Fri, Jan 13, 2017 at 11:29 PM, Stack <st...@duboce.net> wrote:
>>>
>>> On Fri, Jan 13, 2017 at 7:16 PM, Stephen Jiang <syuanjiang...@gmail.com>
>>> wrote:
>>>
>>>> Revive this thread.
>>>>
>>>> I am in the process of removing the Region Server side merge (and split)
>>>> transaction code in the master branch, as we now have merge (and split)
>>>> procedures from the master doing the same thing.
>>>
>>> Good (Issue?)
>>>
>>>> The Merge tool depends on the RS-side merge code. I'd like to use this
>>>> chance to remove the util.Merge tool. This is for the 2.0 and up
>>>> releases only. Deprecation does not work here, as keeping the RS-side
>>>> merge code would leave duplicate logic in the source code and make the
>>>> new AssignmentManager code more complicated.
>>>
>>> Could util.Merge be changed to ask the Master to run the merge (via AMv2)?
>>>
>>> If you remove the util.Merge tool, how then does an operator ask for a
>>> merge in its absence?
>>>
>>> Thanks Stephen
>>>
>>> S
>>>
>>>> Please let me know whether you have any objections.
>>>>
>>>> Thanks
>>>> Stephen
>>>>
>>>> PS.
>>>> I could deprecate the HMerge code if anyone is really using it. It has
>>>> its own logic and is standalone (it is supposed to work offline,
>>>> dangerously, and to merge more than 2 regions; util.Merge and the shell
>>>> do not support this functionality for now).
>>>>
>>>> On Wed, Nov 16, 2016 at 11:04 AM, Enis Söztutar <enis....@gmail.com>
>>>> wrote:
>>>>
>>>>> @Appy what is not clear from the above?
>>>>>
>>>>> I think we should get rid of both Merge and HMerge.
>>>>>
>>>>> We should not have any tool that works in offline mode by going over
>>>>> the HDFS data. It seems very brittle, likely to break when things get
>>>>> changed. The only use case I can think of is that you somehow end up
>>>>> with a lot of regions and cannot bring the cluster back up because of
>>>>> OOMs, etc., and you have to reduce the number of regions in offline
>>>>> mode. However, we have not seen this kind of thing at any of our
>>>>> customers for the last couple of years.
>>>>>
>>>>> I think we should seriously look into improving the normalizer and
>>>>> enabling it by default for all tables. Ideally, the normalizer should
>>>>> run much more frequently, and should be configured with higher-level
>>>>> goals and heuristics, such as the average number of regions per node,
>>>>> and should look at the global state (like the balancer does) to decide
>>>>> on split/merge points.
>>>>>
>>>>> Enis
>>>>>
>>>>> On Wed, Nov 16, 2016 at 1:17 AM, Apekshit Sharma <a...@cloudera.com>
>>>>> wrote:
>>>>>
>>>>>> bq. HMerge can merge multiple regions by going over the list of
>>>>>> regions and checking their sizes.
>>>>>> bq. But both of these tools (Merge and HMerge) are very dangerous
>>>>>>
>>>>>> I came across HMerge and it looks like dead code. It isn't referenced
>>>>>> from anywhere except one test. (This is what Lars also pointed out in
>>>>>> the first email too.)
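For reference, the normalizer Enis describes can already be driven from the shell; a minimal sketch, assuming a shell from a release (HBase 1.2 or later) where these commands exist:

```shell
# Turn the region normalizer chore on cluster-wide; it computes
# split/merge plans from region sizes, much like the balancer.
hbase> normalizer_switch true

# Check whether the normalizer is currently enabled.
hbase> normalizer_enabled

# Opt an individual table in (or out, with 'false').
hbase> alter 'testtable', {NORMALIZATION_ENABLED => 'true'}

# Trigger a normalization run immediately instead of waiting
# for the periodic chore.
hbase> normalize
```

The table name 'testtable' is just the example table used elsewhere in this thread; any table works.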
>>>>>> It would make perfect sense if it were a tool or were being
>>>>>> referenced from somewhere, but lacking either of those, I am a bit
>>>>>> confused here. @Enis, you seem to know everything about them, please
>>>>>> educate me.
>>>>>>
>>>>>> Thanks
>>>>>> - Appy
>>>>>>
>>>>>> On Thu, Sep 29, 2016 at 12:43 AM, Enis Söztutar <enis....@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Merge has very limited usability since it can do only a single merge
>>>>>>> and can only run when HBase is offline.
>>>>>>> HMerge can merge multiple regions by going over the list of regions
>>>>>>> and checking their sizes.
>>>>>>> And of course we have the "supported" online merge, which is the
>>>>>>> shell command.
>>>>>>>
>>>>>>> But both of these tools (Merge and HMerge) are very dangerous, I
>>>>>>> think. I would say we should deprecate both, to be replaced by the
>>>>>>> online merge tool. We should not allow offline merge at all. I fail
>>>>>>> to see a use case where you have to use an offline merge.
>>>>>>>
>>>>>>> Enis
>>>>>>>
>>>>>>> On Wed, Sep 28, 2016 at 7:32 AM, Lars George <lars.geo...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hey,
>>>>>>>>
>>>>>>>> Sorry to resurrect this old thread, but working on the book update
>>>>>>>> I came across the same today, i.e. we have Merge and HMerge. I
>>>>>>>> tried them, and Merge works fine now. It is also the only one of
>>>>>>>> the two flagged as being a tool. Should HMerge be removed? At least
>>>>>>>> deprecated?
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Lars
>>>>>>>>
>>>>>>>> On Thu, Jul 7, 2011 at 2:03 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>>> there is already an issue to do this but not revamp of these
>>>>>>>>>> Merge classes
>>>>>>>>>
>>>>>>>>> I guess the issue is HBASE-1621.
>>>>>>>>>
>>>>>>>>> On Wed, Jul 6, 2011 at 2:28 PM, Stack <st...@duboce.net> wrote:
>>>>>>>>>
>>>>>>>>>> Yeah, can you file an issue, Lars? This stuff is ancient and
>>>>>>>>>> needs to be redone, AND redone so we can do merging while the
>>>>>>>>>> table is online (there is already an issue to do this, but not to
>>>>>>>>>> revamp these Merge classes). The unit tests for Merge are also
>>>>>>>>>> all JUnit 3 and do whacky stuff to put up multiple regions. This
>>>>>>>>>> should be redone too (they are often the first thing broken by a
>>>>>>>>>> major change, and putting them back together is a headache since
>>>>>>>>>> they do not follow the usual pattern).
>>>>>>>>>>
>>>>>>>>>> St.Ack
>>>>>>>>>>
>>>>>>>>>> On Sun, Jul 3, 2011 at 12:38 AM, Lars George
>>>>>>>>>> <lars.geo...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Ted,
>>>>>>>>>>>
>>>>>>>>>>> The log is from an earlier attempt; I tried this a few times.
>>>>>>>>>>> This is all local, after rm'ing /hbase. So the files are all
>>>>>>>>>>> pretty empty, but since I put data in I was assuming it should
>>>>>>>>>>> work.
>>>>>>>>>>> Once you've gotten into this state, you also get funny error
>>>>>>>>>>> messages in the shell:
>>>>>>>>>>>
>>>>>>>>>>> hbase(main):001:0> list
>>>>>>>>>>> TABLE
>>>>>>>>>>> 11/07/03 09:36:21 INFO ipc.HBaseRPC: Using
>>>>>>>>>>> org.apache.hadoop.hbase.ipc.WritableRpcEngine for
>>>>>>>>>>> org.apache.hadoop.hbase.ipc.HMasterInterface
>>>>>>>>>>>
>>>>>>>>>>> ERROR: undefined method `map' for nil:NilClass
>>>>>>>>>>>
>>>>>>>>>>> Here is some help for this command:
>>>>>>>>>>> List all tables in hbase. Optional regular expression parameter
>>>>>>>>>>> could be used to filter the output. Examples:
>>>>>>>>>>>
>>>>>>>>>>> hbase> list
>>>>>>>>>>> hbase> list 'abc.*'
>>>>>>>>>>>
>>>>>>>>>>> hbase(main):002:0>
>>>>>>>>>>>
>>>>>>>>>>> I am assuming this is collateral, but why? The UI works, but the
>>>>>>>>>>> table is gone too.
>>>>>>>>>>>
>>>>>>>>>>> Lars
>>>>>>>>>>>
>>>>>>>>>>> On Jul 2, 2011, at 10:55 PM, Ted Yu wrote:
>>>>>>>>>>>
>>>>>>>>>>>> There is TestMergeTool, which tests Merge.
>>>>>>>>>>>>
>>>>>>>>>>>> From the log you provided, I am a little confused as to why
>>>>>>>>>>>> 'testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419e0.'
>>>>>>>>>>>> didn't appear in your command line or in the output from
>>>>>>>>>>>> scanning .META.
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Jul 2, 2011 at 10:36 AM, Lars George
>>>>>>>>>>>> <lars.geo...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> These two both seem to be in a bit of a weird state: HMerge is
>>>>>>>>>>>>> scoped package local, therefore no one but the package can
>>>>>>>>>>>>> call the merge() functions... and no one does that but the
>>>>>>>>>>>>> unit test.
>>>>>>>>>>>>> But it would be good to have this on the CLI and in the shell
>>>>>>>>>>>>> as a command (and in the shell maybe with a confirmation
>>>>>>>>>>>>> message?), but it is not available AFAIK.
>>>>>>>>>>>>>
>>>>>>>>>>>>> HMerge can merge regions of tables that are disabled. It also
>>>>>>>>>>>>> merges all that qualify, i.e. where the merged region would be
>>>>>>>>>>>>> less than or equal to half the configured max file size.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Merge, on the other hand, does have a main(), so it can be
>>>>>>>>>>>>> invoked:
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ hbase org.apache.hadoop.hbase.util.Merge
>>>>>>>>>>>>> Usage: bin/hbase merge <table-name> <region-1> <region-2>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Note how the help insinuates that you can use it as a tool,
>>>>>>>>>>>>> but that is not correct. Also, it only merges the two given
>>>>>>>>>>>>> regions, and the cluster must be shut down (only the HBase
>>>>>>>>>>>>> daemons). So that is a step back.
>>>>>>>>>>>>>
>>>>>>>>>>>>> What is worse is that I cannot get it to work.
>>>>>>>>>>>>> I tried in the shell:
>>>>>>>>>>>>>
>>>>>>>>>>>>> hbase(main):001:0> create 'testtable', 'colfam1', {SPLITS =>
>>>>>>>>>>>>>   ['row-10','row-20','row-30','row-40','row-50']}
>>>>>>>>>>>>> 0 row(s) in 0.2640 seconds
>>>>>>>>>>>>>
>>>>>>>>>>>>> hbase(main):002:0> for i in '0'..'9' do for j in '0'..'9' do
>>>>>>>>>>>>>   put 'testtable', "row-#{i}#{j}", "colfam1:#{j}", "#{j}" end end
>>>>>>>>>>>>> 0 row(s) in 1.0450 seconds
>>>>>>>>>>>>>
>>>>>>>>>>>>> hbase(main):003:0> flush 'testtable'
>>>>>>>>>>>>> 0 row(s) in 0.2000 seconds
>>>>>>>>>>>>>
>>>>>>>>>>>>> hbase(main):004:0> scan '.META.', {COLUMNS => ['info:regioninfo']}
>>>>>>>>>>>>> ROW                                  COLUMN+CELL
>>>>>>>>>>>>> testtable,,1309614509037.612d1e0112406e6c2bb482eeaec57322.
>>>>>>>>>>>>>   column=info:regioninfo, timestamp=130...
>>>>>>>>>>>>>   STARTKEY => '', ENDKEY => 'row-10'
>>>>>>>>>>>>> testtable,row-10,1309614509040.2fbafcc9bc6afac94c465ce5dcabc5d1.
>>>>>>>>>>>>>   column=info:regioninfo, timestamp=130...
>>>>>>>>>>>>>   STARTKEY => 'row-10', ENDKEY => 'row-20'
>>>>>>>>>>>>> testtable,row-20,1309614509041.e7c16267eb30e147e5d988c63d40f982.
>>>>>>>>>>>>>   column=info:regioninfo, timestamp=130...
>>>>>>>>>>>>>   STARTKEY => 'row-20', ENDKEY => 'row-30'
>>>>>>>>>>>>> testtable,row-30,1309614509041.a9cde1cbc7d1a21b1aca2ac7fda30ad8.
>>>>>>>>>>>>>   column=info:regioninfo, timestamp=130...
>>>>>>>>>>>>>   STARTKEY => 'row-30', ENDKEY => 'row-40'
>>>>>>>>>>>>> testtable,row-40,1309614509041.d458236feae097efcf33477e7acc51d4.
>>>>>>>>>>>>>   column=info:regioninfo, timestamp=130...
>>>>>>>>>>>>>   STARTKEY => 'row-40', ENDKEY => 'row-50'
>>>>>>>>>>>>> testtable,row-50,1309614509041.74a57dc7e3e9602d9229b15d4c0357d1.
>>>>>>>>>>>>>   column=info:regioninfo, timestamp=130...
>>>>>>>>>>>>>   STARTKEY => 'row-50', ENDKEY => ''
>>>>>>>>>>>>> 6 row(s) in 0.0440 seconds
>>>>>>>>>>>>>
>>>>>>>>>>>>> hbase(main):005:0> exit
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ ./bin/stop-hbase.sh
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ hbase org.apache.hadoop.hbase.util.Merge testtable \
>>>>>>>>>>>>>   testtable,row-20,1309614509041.e7c16267eb30e147e5d988c63d40f982. \
>>>>>>>>>>>>>   testtable,row-30,1309614509041.a9cde1cbc7d1a21b1aca2ac7fda30ad8.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But I consistently get errors:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 11/07/02 07:20:49 INFO util.Merge: Merging regions
>>>>>>>>>>>>> testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419e0. and
>>>>>>>>>>>>> testtable,row-30,1309613053987.3664920956c30ac5ff2a7726e4e6 in
>>>>>>>>>>>>> table testtable
>>>>>>>>>>>>> 11/07/02 07:20:49 INFO wal.HLog: HLog configuration:
>>>>>>>>>>>>> blocksize=32 MB, rollsize=30.4 MB, enabled=true,
>>>>>>>>>>>>> optionallogflushinternal=1000ms
>>>>>>>>>>>>> 11/07/02 07:20:49 INFO wal.HLog: New hlog
>>>>>>>>>>>>> /Volumes/Macintosh-HD/Users/larsgeorge/.logs_1309616449171/hlog.1309616449181
>>>>>>>>>>>>> 11/07/02 07:20:49 INFO wal.HLog: getNumCurrentReplicas--HDFS-826
>>>>>>>>>>>>> not available;
>>>>>>>>>>>>> hdfs_out=org.apache.hadoop.fs.FSDataOutputStream@25961581,
>>>>>>>>>>>>> exception=org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
>>>>>>>>>>>>> 11/07/02 07:20:49 INFO regionserver.HRegion: Setting up
>>>>>>>>>>>>> tabledescriptor config now ...
>>>>>>>>>>>>> 11/07/02 07:20:49 INFO regionserver.HRegion: Onlined
>>>>>>>>>>>>> -ROOT-,,0.70236052; next sequenceid=1
>>>>>>>>>>>>> info: null
>>>>>>>>>>>>> region1: [B@48fd918a
>>>>>>>>>>>>> region2: [B@7f5e2075
>>>>>>>>>>>>> 11/07/02 07:20:49 FATAL util.Merge: Merge failed
>>>>>>>>>>>>> java.io.IOException: Could not find meta region for
>>>>>>>>>>>>> testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419e0.
>>>>>>>>>>>>>     at org.apache.hadoop.hbase.util.Merge.mergeTwoRegions(Merge.java:211)
>>>>>>>>>>>>>     at org.apache.hadoop.hbase.util.Merge.run(Merge.java:111)
>>>>>>>>>>>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>>>>>>>>>>>     at org.apache.hadoop.hbase.util.Merge.main(Merge.java:386)
>>>>>>>>>>>>> 11/07/02 07:20:49 INFO regionserver.HRegion: Setting up
>>>>>>>>>>>>> tabledescriptor config now ...
>>>>>>>>>>>>> 11/07/02 07:20:49 INFO regionserver.HRegion: Onlined
>>>>>>>>>>>>> .META.,,1.1028785192; next sequenceid=1
>>>>>>>>>>>>> 11/07/02 07:20:49 INFO regionserver.HRegion: Closed
>>>>>>>>>>>>> -ROOT-,,0.70236052
>>>>>>>>>>>>> 11/07/02 07:20:49 INFO wal.HLog: main.logSyncer exiting
>>>>>>>>>>>>> 11/07/02 07:20:49 ERROR util.Merge: exiting due to error
>>>>>>>>>>>>> java.lang.NullPointerException
>>>>>>>>>>>>>     at org.apache.hadoop.hbase.util.Merge$1.processRow(Merge.java:119)
>>>>>>>>>>>>>     at org.apache.hadoop.hbase.util.MetaUtils.scanMetaRegion(MetaUtils.java:229)
>>>>>>>>>>>>>     at org.apache.hadoop.hbase.util.MetaUtils.scanMetaRegion(MetaUtils.java:258)
>>>>>>>>>>>>>     at org.apache.hadoop.hbase.util.Merge.run(Merge.java:116)
>>>>>>>>>>>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>>>>>>>>>>>     at org.apache.hadoop.hbase.util.Merge.main(Merge.java:386)
>>>>>>>>>>>>>
>>>>>>>>>>>>> After which, most of the time, I have shot .META., with an
>>>>>>>>>>>>> error:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2011-07-02 06:42:10,763 WARN
>>>>>>>>>>>>> org.apache.hadoop.hbase.master.HMaster: Failed getting all
>>>>>>>>>>>>> descriptors
>>>>>>>>>>>>> java.io.FileNotFoundException: No status for
>>>>>>>>>>>>> hdfs://localhost:8020/hbase/.corrupt
>>>>>>>>>>>>>     at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888)
>>>>>>>>>>>>>     at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122)
>>>>>>>>>>>>>     at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149)
>>>>>>>>>>>>>     at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1429)
>>>>>>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>>>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>>>>>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>>>>>>>>     at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:312)
>>>>>>>>>>>>>     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1065)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Lars
>>>>>>
>>>>>> --
>>>>>> -- Appy
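For reference, the "supported" online merge the thread settles on needs no cluster shutdown; it goes through the Master, the same HBaseAdmin.mergeRegions path Stephen mentions. A minimal sketch of Lars's offline attempt redone online, assuming a release where the merge_region shell command exists (it takes the encoded region names, i.e. the hash suffix of the full region names shown in the .META. scan earlier in the thread):

```shell
# Online merge of two adjacent regions on a live cluster.
# The arguments are the encoded region names taken from the
# testtable,row-20 and testtable,row-30 regions above.
hbase> merge_region 'e7c16267eb30e147e5d988c63d40f982', \
                    'a9cde1cbc7d1a21b1aca2ac7fda30ad8'
```

An optional third argument of true forces a merge of non-adjacent regions, which is normally refused because it creates overlapping key ranges.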