--- pc3 as RegionServer and datanode: you only have 1 RS (split into 100 regions) while HDFS replicates to 3? Perhaps compaction cannot keep up with the flush rate, so the number of store files hits "hbase.hstore.blockingStoreFiles" and flushes get blocked for a while (90s by default). During this period the memstore continues to grow and finally reaches the blocking size (128M * 4) ...
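The thresholds mentioned above map to a few hbase-site.xml properties. A minimal sketch for reference; the property names are the standard HBase ones, but defaults vary by version, so verify them against the CDH 5.1.0 docs (the multiplier of 4 here is inferred from blockingMemStoreSize = 536870912 = 128M * 4 reported in the error, not necessarily the stock default):

```xml
<!-- Flushes stall (for up to blockingWaitTime) once a store has this many HFiles -->
<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>7</value>
</property>

<!-- How long a blocked flush waits for compaction to catch up: 90s -->
<property>
  <name>hbase.hstore.blockingWaitTime</name>
  <value>90000</value>
</property>

<!-- Per-region memstore flush threshold: 128MB -->
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>134217728</value>
</property>

<!-- Writes get RegionTooBusyException at flush.size * multiplier = 512MB -->
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>4</value>
</property>
```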
Turning on debug logging will show more messages. PS: "hbase.hregion.majorcompaction"=0 cannot turn off major compaction completely.

On Fri, Nov 21, 2014 at 12:55 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Louis:
> See this thread:
> http://search-hadoop.com/m/DHED4XrURi2
>
> On Thu, Nov 20, 2014 at 7:33 PM, mail list <louis.hust...@gmail.com> wrote:
>
> > If I set the target, YCSB will sleep to control the flow, so it looks the
> > same as using fewer threads.
> > But I want to know why, under heavy writes, some regions that exceed the
> > limit are not flushed.
> > Maybe, as you said, they wait in a flush queue; I will try to reproduce
> > the scenario and look at the flush queue.
> >
> > On Nov 21, 2014, at 1:44, Vladimir Rodionov <vladrodio...@gmail.com> wrote:
> >
> > > Could you please rerun your tests with -target?
> > > You should limit the # of transactions per second and find the maximum
> > > your cluster can sustain.
> > >
> > > -Vladimir Rodionov
> > >
> > > On Wed, Nov 19, 2014 at 10:53 PM, mail list <louis.hust...@gmail.com> wrote:
> > >
> > >> Hi all,
> > >>
> > >> I built an HBase test environment on three PC servers with CDH 5.1.0:
> > >>
> > >> pc1 pc2 pc3
> > >>
> > >> pc1 and pc2 as HMaster and Hadoop NameNode
> > >> pc3 as RegionServer and DataNode
> > >>
> > >> Then I created the table as follows:
> > >>
> > >> create 'usertable', 'family', {SPLITS => (1..100).map {|i| "user#{1000+i*(9999-1000)/100}"} }
> > >>
> > >> and used YCSB to load data:
> > >>
> > >> ./bin/ycsb load hbase -P workloads/workloadc -p columnfamily=family \
> > >>     -p recordcount=1000000000 -p threadcount=32 -s > result/workloadc
> > >>
> > >> But after a while, YCSB returned the following error:
> > >>
> > >> 14/11/20 12:23:44 INFO client.AsyncProcess: #15, table=usertable,
> > >> attempt=35/35 failed 715 ops, last exception:
> > >> org.apache.hadoop.hbase.RegionTooBusyException:
> > >> org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit,
> > >> regionName=usertable,user9099,1416453519676.2552d36eb407a8af12d2b58c973d68a9.,
> > >> server=l-hbase10.dba.cn1.qunar.com,60020,1416451280772,
> > >> memstoreSize=536897120, blockingMemStoreSize=536870912
> > >>     at org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:2822)
> > >>     at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2234)
> > >>     at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2201)
> > >>     at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2205)
> > >>     at org.apache.hadoop.hbase.regionserver.HRegionServer.doBatchOp(HRegionServer.java:4253)
> > >>     at org.apache.hadoop.hbase.regionserver.HRegionServer.doNonAtomicRegionMutation(HRegionServer.java:3469)
> > >>     at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3359)
> > >>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29503)
> > >>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
> > >>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
> > >>     at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
> > >>     at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
> > >>     at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
> > >>     at java.lang.Thread.run(Thread.java:744)
> > >> on l-hbase10.dba.cn1.qunar.com,60020,1416451280772, tracking started Thu
> > >> Nov 20 12:15:07 CST 2014, retrying after 20051 ms, replay 715 ops.
> > >>
> > >> It seems the user9099 region is too busy, so I looked up the memstore
> > >> metrics in the web UI.
> > >>
> > >> As you can see, user9099 is bigger than the other regions. I think it is
> > >> flushing, but after a while it does not shrink to a smaller size, and
> > >> YCSB finally quits.
> > >>
> > >> The region server is configured as below:
> > >>
> > >> Summary: HP DL380p Gen8, 1 x Xeon E5-2630 v2 2.60GHz, 126GB / 128GB 1600MHz DDR3
> > >> System: HP ProLiant DL380p Gen8
> > >> Processors: 1 (of 2) x Xeon E5-2630 v2 2.60GHz 100MHz FSB (HT enabled, 6 cores, 24 threads)
> > >> Memory: 126GB / 128GB 1600MHz DDR3 == 16 x 8GB, 8 x empty
> > >> Disk: sda (scsi2): 450GB (1%) JBOD == 1 x HP-LOGICAL-VOLUME
> > >> Disk: sdb (scsi2): 27TB (0%) JBOD == 1 x HP-LOGICAL-VOLUME
> > >> Disk-Control: ata_piix0: Intel C600/X79 series chipset 4-Port SATA IDE Controller
> > >> Disk-Control: hpsa0: Hewlett-Packard Company Smart Array Gen8 Controllers
> > >> Disk-Control: shannon0: Device 1cb0:0275
> > >> Network: eth0 (tg3): Broadcom NetXtreme BCM5719 Gigabit PCIe, 40:a8:f0:23:55:fc, 1000Mb/s <full-duplex>
> > >> Network: eth1 (tg3): Broadcom NetXtreme BCM5719 Gigabit PCIe, 40:a8:f0:23:55:fd, no carrier
> > >> Network: eth2 (tg3): Broadcom NetXtreme BCM5719 Gigabit PCIe, 40:a8:f0:23:55:fe, no carrier
> > >> Network: eth3 (tg3): Broadcom NetXtreme BCM5719 Gigabit PCIe, 40:a8:f0:23:55:ff, no carrier
> > >> OS: CentOS 6.4 (Final), Linux 2.6.32-358.23.2.el6.x86_64 x86_64, 64-bit
> > >> BIOS: HP P70 02/10/2014
> > >> Hostname: l-hbase10.dba.cn1.qunar.com
> > >>
> > >> I have attached the HBase configuration file.
> > >>
> > >> I am new to HBase; any ideas will be appreciated!
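For anyone reproducing this thread: the split expression from the `create 'usertable', ...` statement can be evaluated in plain Ruby (the HBase shell is JRuby) to see exactly which region boundaries it produces, including the user9099 region named in the exception:

```ruby
# Same expression as in the create statement above.
# Integer arithmetic, so the keys step by roughly 90.
splits = (1..100).map { |i| "user#{1000 + i * (9999 - 1000) / 100}" }

puts splits.first                 # "user1089"
puts splits.last                  # "user9999"
puts splits.length                # 100 split keys, i.e. 101 regions on the single RS
puts splits.include?("user9099")  # true: the region from RegionTooBusyException
```

As for throttling: Vladimir's suggestion corresponds to YCSB's `-target` option (total operations per second across all client threads); adding something like `-target 10000` to the load command and raising it until RegionTooBusyException reappears would bracket what this single RegionServer can sustain.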