Re: Strange HBase failure
Ok, thanks, we'll check it.

2015-01-12 11:28 GMT+03:00 Esteban Gutierrez:

> Hi Serega,
>
> Do you have enough resources allocated for each VM? Even a little swapping on
> the VMs or the host can make things unstable. Also, judging from the number of
> services on each VM, your host should have at least 12GB of free RAM just to
> run things smoothly; otherwise you might want to try fewer VMs with more RAM
> each.
>
> cheers,
> esteban.
>
> --
> Cloudera, Inc.
Re: Strange HBase failure
Hi Serega,

Do you have enough resources allocated for each VM? Even a little swapping on
the VMs or the host can make things unstable. Also, judging from the number of
services on each VM, your host should have at least 12GB of free RAM just to
run things smoothly; otherwise you might want to try fewer VMs with more RAM
each.

cheers,
esteban.

--
Cloudera, Inc.

On Sun, Jan 11, 2015 at 11:55 PM, Serega Sheypak wrote:

> Hi, HBase was down from 08:25 to 09:15.
> I was looking into the logs and thinking; I tried to find something more
> clever than a dumb restart.
> We are using the Cloudera distro; each daemon runs in its own JVM.
> I'll try to find the CPU load logs.
> The load is really low:
>
> Finished memstore flush of ~7.7 K/7840,
>
> Flushed , sequenceid=229369, memsize=16.3 K
>
> Completed major compaction of 4 file(s) in CF of epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457. into 8bf8e92031834676b5d40b352120c5f2, size=76.6 M; total size for store is 76.6 M
>
> As you can see, there is less than 100 MB of data across 3 VMs. It's nothing.
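A quick way to act on the swapping question above is to look at swap usage inside each VM. The following is only a minimal sketch, assuming the VMs run Linux and expose `/proc/meminfo`; the helper names `parse_meminfo` and `swap_used_kb` are made up for illustration and are not part of HBase or Cloudera tooling.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_used_kb(meminfo):
    """Swap currently in use, in kB (SwapTotal minus SwapFree)."""
    return meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0)


# Only read the real file where it exists (Linux); elsewhere this is a no-op.
if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print("MemFree (kB):", info.get("MemFree"))
    print("Swap used (kB):", swap_used_kb(info))
```

A non-zero "Swap used" value on a node running several JVMs is a hint, not proof, that memory is tight; it would need to be checked on the hypervisor host as well.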
Re: Strange HBase failure
Hi, HBase was down from 08:25 to 09:15.
I was looking into the logs and thinking; I tried to find something more
clever than a dumb restart.
We are using the Cloudera distro; each daemon runs in its own JVM.
I'll try to find the CPU load logs.
The load is really low:

Finished memstore flush of ~7.7 K/7840,

Flushed , sequenceid=229369, memsize=16.3 K

Completed major compaction of 4 file(s) in CF of epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457. into 8bf8e92031834676b5d40b352120c5f2, size=76.6 M; total size for store is 76.6 M

As you can see, there is less than 100 MB of data across 3 VMs. It's nothing.

2015-01-12 6:38 GMT+03:00 Ted Yu:

> Serega:
> Was the log snippet from NODE01? It looks like NODE01 may have been under
> heavy load, considering the number of daemons running on that node.
>
> Please check the GC log.
>
> Cheers
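The silent window discussed in this thread (nothing logged between 08:25 and 09:15) can be located mechanically rather than by eye. Below is a small sketch that assumes each log entry starts with a `HH:MM:SS.mmm` timestamp, as in the excerpts quoted here, and that all entries fall on the same day; `find_gaps` and its 10-minute default threshold are illustrative choices, not anything HBase provides.

```python
import re
from datetime import datetime, timedelta

# Timestamp at the start of a log entry, e.g. "08:25:06.274"
TS = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3})")


def find_gaps(lines, threshold=timedelta(minutes=10)):
    """Return (previous, next, delta) for consecutive timestamps further
    apart than `threshold`. Lines without a leading timestamp are skipped."""
    gaps = []
    prev = None
    for line in lines:
        m = TS.match(line.lstrip())
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%H:%M:%S.%f")
        if prev is not None and ts - prev > threshold:
            gaps.append((prev, ts, ts - prev))
        prev = ts
    return gaps


if __name__ == "__main__":
    # The two timestamps around the gap are taken from the thread.
    sample = [
        "08:25:06.274 INFO org.apache.hadoop.hbase.regionserver.wal.HLog",
        "09:15:52.020 INFO org.apache.hadoop.hbase.regionserver.HRegionServer",
    ]
    for start, end, delta in find_gaps(sample):
        print(start.time(), "->", end.time(), "silent for", delta)
```

Running the same function over the HMaster, ZooKeeper and NameNode logs for the same window would show whether only the RegionServer went quiet or the whole VM did.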
Re: Strange HBase failure
Serega:
Was the log snippet from NODE01? It looks like NODE01 may have been under
heavy load, considering the number of daemons running on that node.

Please check the GC log.

Cheers

On Sun, Jan 11, 2015 at 6:57 PM, Shuai Lin wrote:

> From the log I see that nothing was logged between 08:25 and 09:15. Why did
> this happen?
>
> 08:25:06.274 INFO org.apache.hadoop.hbase.regionserver.wal.HLog
> moving old hlog file /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020 whose highest sequenceid is 229359 to /hbase/.oldlogs/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
>
> 09:15:52.020 INFO org.apache.hadoop.hbase.regionserver.HRegionServer
>
> Regards,
> Shuai
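Checking the GC log, as suggested above, mostly means looking for long stop-the-world pauses, since a JVM that is frozen for long enough can lose its ZooKeeper session. The sketch below assumes the RegionServer JVM was started with HotSpot GC logging that includes `-XX:+PrintGCApplicationStoppedTime`; the thread does not say whether it was. `long_pauses` and the one-second default are hypothetical names and values chosen for illustration.

```python
import re

# Line printed by HotSpot when -XX:+PrintGCApplicationStoppedTime is enabled.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: (\d+\.\d+) seconds"
)


def long_pauses(lines, min_seconds=1.0):
    """Return (line_number, pause_seconds) for every pause >= min_seconds."""
    found = []
    for number, line in enumerate(lines, start=1):
        m = STOPPED.search(line)
        if m:
            pause = float(m.group(1))
            if pause >= min_seconds:
                found.append((number, pause))
    return found


if __name__ == "__main__":
    # Made-up sample lines; real input would be the lines of the GC log file.
    sample = [
        "Total time for which application threads were stopped: 0.0001230 seconds",
        "Total time for which application threads were stopped: 42.5000000 seconds",
    ]
    for number, pause in long_pauses(sample):
        print("line", number, "pause of", pause, "seconds")
```

If no pause comes anywhere near the configured ZooKeeper session timeout, GC is less likely to be the cause and host-level stalls (swapping, VM scheduling) become the more interesting suspects.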
Re: Strange HBase failure
From the log I see that nothing was logged between 08:25 and 09:15. Why did
this happen?

08:25:06.274 INFO org.apache.hadoop.hbase.regionserver.wal.HLog
moving old hlog file /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020 whose highest sequenceid is 229359 to /hbase/.oldlogs/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020

09:15:52.020 INFO org.apache.hadoop.hbase.regionserver.HRegionServer

Regards,
Shuai

On Mon, Jan 12, 2015 at 3:47 AM, Serega Sheypak wrote:

> Hi, I have a PoC HBase cluster running on 3 VMs.
> The deployment layout is:
> NODE01: NN, SN, HMaster (HM), RegionServer (RS), ZooKeeper server (ZK), DN
> NODE02: RegionServer, DN
> NODE03: RegionServer, DN
>
> Suddenly ONLY HBase went offline, all of its services: HM and RS.
> HDFS was working, with no alerts.
> The ZK server was working, with no alerts.
> VMware didn't publish any alerts.
> Only a restart of the HBase service helped.
>
> We are using this:
> http://www.cloudera.com/content/cloudera/en/downloads/cdh/cdh-4-7-0.html
> hbase-0.94.15+113
>
> I made a deep dive into the logs and found this:
>
> 08:15:51.968 INFO org.apache.hadoop.hbase.regionserver.HRegionServer
> regionserver60020.periodicFlusher requesting flush for region epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457. after a delay of 3026
>
> 08:15:55.011 INFO org.apache.hadoop.hbase.regionserver.StoreFile
> Bloom filter type for hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb: ROW, CompoundBloomFilterWriter
>
> 08:15:55.012 INFO org.apache.hadoop.hbase.regionserver.StoreFile
> Delete Family Bloom filter type for hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb: CompoundBloomFilterWriter
>
> 08:15:55.035 INFO org.apache.hadoop.hbase.regionserver.StoreFile
> General Bloom and NO DeleteFamily was added to HFile (hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb)
>
> 08:15:55.035 INFO org.apache.hadoop.hbase.regionserver.Store
> Flushed , sequenceid=229362, memsize=7.7 K, into tmp file hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb
>
> 08:15:55.053 INFO org.apache.hadoop.hbase.regionserver.StoreFile$Reader
> Loaded ROW (CompoundBloomFilter) metadata for 8e68424066dc4c02a60ca57ec98128fb
>
> 08:15:55.072 INFO org.apache.hadoop.hbase.regionserver.StoreFile$Reader
> Loaded ROW (CompoundBloomFilter) metadata for 8e68424066dc4c02a60ca57ec98128fb
>
> 08:15:55.073 INFO org.apache.hadoop.hbase.regionserver.Store
> Added hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/CF/8e68424066dc4c02a60ca57ec98128fb, entries=8, sequenceid=229362, filesize=2.7 K
>
> 08:15:55.076 INFO org.apache.hadoop.hbase.regionserver.HRegion
> Finished memstore flush of ~7.7 K/7840, currentsize=0/0 for region epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457. in 80ms, sequenceid=229362, compaction requested=true
>
> 08:15:55.077 INFO org.apache.hadoop.hbase.regionserver.HRegion
> Starting compaction on CF in region epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
>
> 08:15:55.077 INFO org.apache.hadoop.hbase.regionserver.Store
> Starting compaction of 4 file(s) in CF of epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457. into tmpdir=hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp, seqid=229362, totalSize=76.6 M
>
> 08:15:55.096 INFO org.apache.hadoop.hbase.regionserver.StoreFile
> Bloom filter type for hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2: ROW, CompoundBloomFilterWriter
>
> 08:15:55.097 INFO org.apache.hadoop.hbase.regionserver.StoreFile
> Delete Family Bloom filter type for hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2: CompoundBloomFilterWriter
>
> 08:15:59.245 INFO org.apache.hadoop.hbase.regionserver.StoreFile
> General Bloom and NO DeleteFamily was added to HFile (hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2)
>
> 08:15:59.255 INFO org.apache.hadoop.hbase.regionserver.StoreFile$Reader
> Loaded ROW (CompoundBloomFilter) metadata for 8bf8e92031834676b5d40b352120c5f2
>
> 08:15:59.255 INFO org.apache.hadoop.hbase.regionserver.Store
> Renaming compacted file at hdfs://etp-hdfs-n1-sg.passp