Re: Strange HBase failure

2015-01-12 Thread Serega Sheypak
Ok, thanks, we'll check it.

2015-01-12 11:28 GMT+03:00 Esteban Gutierrez :

> Hi Serega,
>
> Do you have enough resources allocated for each VM? Even a little swapping
> on the VMs or on the host can make things unstable. Also, judging by the
> number of services on each VM, it sounds like your host should have at
> least 12 GB of free RAM just to run things smoothly; otherwise you might
> want to try with fewer VMs and more RAM for each.
>
> cheers,
> esteban.
>
>
>
> --
> Cloudera, Inc.

Re: Strange HBase failure

2015-01-12 Thread Esteban Gutierrez
Hi Serega,

Do you have enough resources allocated for each VM? Even a little swapping
on the VMs or on the host can make things unstable. Also, judging by the
number of services on each VM, it sounds like your host should have at
least 12 GB of free RAM just to run things smoothly; otherwise you might
want to try with fewer VMs and more RAM for each.
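
One way to test the swapping theory, as a minimal sketch: assuming the VMs are Linux guests, /proc/vmstat exposes the cumulative counters pswpin and pswpout (pages swapped in and out since boot). Taking two snapshots some seconds apart and comparing them shows whether the guest is actively swapping. The function names and the sample numbers below are made up for illustration; on a real node you would pass in the contents of /proc/vmstat instead.

```python
def swap_counters(vmstat_text):
    """Extract pswpin/pswpout (pages swapped in/out since boot) from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swapped_between(before, after):
    """Return True if either counter grew between the two snapshots."""
    return any(after.get(key, 0) > before.get(key, 0) for key in ("pswpin", "pswpout"))


# Made-up snapshots standing in for two reads of /proc/vmstat taken a few seconds apart.
first = swap_counters("nr_free_pages 1000\npswpin 10\npswpout 20\n")
second = swap_counters("nr_free_pages 900\npswpin 10\npswpout 25\n")
print("swapping detected:", swapped_between(first, second))
```

This only covers swapping inside the guest; swapping on the host has to be checked on the hypervisor side.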

cheers,
esteban.



--
Cloudera, Inc.


On Sun, Jan 11, 2015 at 11:55 PM, Serega Sheypak 
wrote:

> Hi, HBase was down from 08:25 to 09:15.
> I've been going through the logs, trying to find a smarter fix than a
> blind restart.
> We are using the Cloudera distro; each daemon runs in its own JVM.
> I'll try to find the CPU load logs.
> The load is really low:
> Finished memstore flush of ~7.7 K/7840,
>
> Flushed , sequenceid=229369, memsize=16.3 K
>
>
> Completed major compaction of 4 file(s) in CF of
>
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> into 8bf8e92031834676b5d40b352120c5f2, size=76.6 M; total size for
> store is 76.6 M
>
>
> As you can see, there is less than 100 MB of data across 3 VMs. That's nothing.

Re: Strange HBase failure

2015-01-11 Thread Serega Sheypak
Hi, HBase was down from 08:25 to 09:15.
I've been going through the logs, trying to find a smarter fix than a
blind restart.
We are using the Cloudera distro; each daemon runs in its own JVM.
I'll try to find the CPU load logs.
The load is really low:
Finished memstore flush of ~7.7 K/7840,

Flushed , sequenceid=229369, memsize=16.3 K


Completed major compaction of 4 file(s) in CF of
epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
into 8bf8e92031834676b5d40b352120c5f2, size=76.6 M; total size for
store is 76.6 M


As you can see, there is less than 100 MB of data across 3 VMs. That's nothing.



2015-01-12 6:38 GMT+03:00 Ted Yu :

> Serega:
> Was the log snippet from NODE01? It looks like NODE01 may have been under
> heavy load, considering the number of daemons running on that node.
>
> Please check the GC log.
>
> Cheers

Re: Strange HBase failure

2015-01-11 Thread Ted Yu
Serega:
Was the log snippet from NODE01? It looks like NODE01 may have been under
heavy load, considering the number of daemons running on that node.

Please check the GC log.

Cheers

On Sun, Jan 11, 2015 at 6:57 PM, Shuai Lin  wrote:

> From the log I see that nothing was logged between 08:25 and 09:15. Why did
> this happen?
>
> 08:25:06.274 INFO org.apache.hadoop.hbase.regionserver.wal.HLog
> moving old hlog file
> /hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
> whose highest sequenceid is 229359 to
> /hbase/.oldlogs/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
>
> 09:15:52.020 INFO org.apache.hadoop.hbase.regionserver.HRegionServer
>
> Regards,
> Shuai

Re: Strange HBase failure

2015-01-11 Thread Shuai Lin
From the log I see that nothing was logged between 08:25 and 09:15. Why did
this happen?

08:25:06.274 INFO org.apache.hadoop.hbase.regionserver.wal.HLog
moving old hlog file
/hbase/.logs/etp-hdfs-n1-sg.passport.local,60020,1414102905372/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020
whose highest sequenceid is 229359 to
/hbase/.oldlogs/etp-hdfs-n1-sg.passport.local%2C60020%2C1414102905372.1420856706020

09:15:52.020 INFO org.apache.hadoop.hbase.regionserver.HRegionServer
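
A gap like this can be found mechanically. The following is only a rough sketch, assuming every log entry starts with an HH:MM:SS.mmm timestamp as in the excerpt above; the function name and the 5-minute threshold are arbitrary choices for illustration, and it does not handle logs that cross midnight.

```python
import re
from datetime import datetime, timedelta

# Leading HH:MM:SS.mmm timestamp; whatever follows it on the line is ignored.
TIMESTAMP = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3})")


def find_gaps(lines, threshold=timedelta(minutes=5)):
    """Return (start, end, gap) for consecutive timestamps further apart than threshold."""
    gaps = []
    previous = None
    for line in lines:
        match = TIMESTAMP.match(line.strip())
        if not match:
            continue  # continuation line of a multi-line entry
        current = datetime.strptime(match.group(1), "%H:%M:%S.%f")
        if previous is not None and current - previous > threshold:
            gaps.append((previous.time(), current.time(), current - previous))
        previous = current
    return gaps


# The two entries quoted above, with the message text in between.
sample = [
    "08:25:06.274 INFO org.apache.hadoop.hbase.regionserver.wal.HLog",
    "moving old hlog file",
    "09:15:52.020 INFO org.apache.hadoop.hbase.regionserver.HRegionServer",
]
for start, end, gap in find_gaps(sample):
    print(f"no log entries between {start} and {end} ({gap})")
```

Running the same check over the logs of the other daemons on the node would show whether only the RegionServer went quiet or the whole VM did.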

Regards,
Shuai

On Mon, Jan 12, 2015 at 3:47 AM, Serega Sheypak 
wrote:

> Hi, I have a PoC HBase cluster running on 3 VMs.
> The deployment layout is:
> NODE01 NN, SN, HMaster (HM), RegionServer (RS), Zookeeper server (ZK), DN
> NODE02 RegionServer, DN
> NODE03 RegionServer, DN
>
> Suddenly ONLY HBase went offline, all of its services: HM and RS.
> HDFS was working, with no alerts.
> The ZK server was working, with no alerts.
> VMware didn't raise any alerts.
> Only a restart of the HBase service helped.
>
> We are using this:
> http://www.cloudera.com/content/cloudera/en/downloads/cdh/cdh-4-7-0.html
> hbase-0.94.15+113
>
> I dug into the logs and found this:
> 08:15:51.968 INFO org.apache.hadoop.hbase.regionserver.HRegionServer
> regionserver60020.periodicFlusher requesting flush for region
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> after a delay of 3026
>
> 08:15:55.011 INFO org.apache.hadoop.hbase.regionserver.StoreFile
> Bloom filter type for
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb:
> ROW, CompoundBloomFilterWriter
>
> 08:15:55.012 INFO org.apache.hadoop.hbase.regionserver.StoreFile
> Delete Family Bloom filter type for
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb:
> CompoundBloomFilterWriter
>
> 08:15:55.035 INFO org.apache.hadoop.hbase.regionserver.StoreFile
> General Bloom and NO DeleteFamily was added to HFile
> (hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb)
>
> 08:15:55.035 INFO org.apache.hadoop.hbase.regionserver.Store
> Flushed , sequenceid=229362, memsize=7.7 K, into tmp file
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8e68424066dc4c02a60ca57ec98128fb
>
> 08:15:55.053 INFO org.apache.hadoop.hbase.regionserver.StoreFile$Reader
> Loaded ROW (CompoundBloomFilter) metadata for
> 8e68424066dc4c02a60ca57ec98128fb
>
> 08:15:55.072 INFO org.apache.hadoop.hbase.regionserver.StoreFile$Reader
> Loaded ROW (CompoundBloomFilter) metadata for
> 8e68424066dc4c02a60ca57ec98128fb
>
> 08:15:55.073 INFO org.apache.hadoop.hbase.regionserver.Store
> Added
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/CF/8e68424066dc4c02a60ca57ec98128fb,
> entries=8, sequenceid=229362, filesize=2.7 K
>
> 08:15:55.076 INFO org.apache.hadoop.hbase.regionserver.HRegion
> Finished memstore flush of ~7.7 K/7840, currentsize=0/0 for region
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> in 80ms, sequenceid=229362, compaction requested=true
>
> 08:15:55.077 INFO org.apache.hadoop.hbase.regionserver.HRegion
> Starting compaction on CF in region
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
>
> 08:15:55.077 INFO org.apache.hadoop.hbase.regionserver.Store
> Starting compaction of 4 file(s) in CF of
> epd_documents,403ded58-45fa-4526-ae5f-da69683bc620,1418822716508.f2cca08a8628d1660a4143f4383a5457.
> into
> tmpdir=hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp,
> seqid=229362, totalSize=76.6 M
>
> 08:15:55.096 INFO org.apache.hadoop.hbase.regionserver.StoreFile
> Bloom filter type for
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2:
> ROW, CompoundBloomFilterWriter
>
> 08:15:55.097 INFO org.apache.hadoop.hbase.regionserver.StoreFile
> Delete Family Bloom filter type for
> hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2:
> CompoundBloomFilterWriter
>
> 08:15:59.245 INFO org.apache.hadoop.hbase.regionserver.StoreFile
> General Bloom and NO DeleteFamily was added to HFile
> (hdfs://etp-hdfs-n1-sg.passport.local:8020/hbase/epd_documents/f2cca08a8628d1660a4143f4383a5457/.tmp/8bf8e92031834676b5d40b352120c5f2)
>
> 08:15:59.255 INFO org.apache.hadoop.hbase.regionserver.StoreFile$Reader
> Loaded ROW (CompoundBloomFilter) metadata for
> 8bf8e92031834676b5d40b352120c5f2
>
> 08:15:59.255 INFO org.apache.hadoop.hbase.regionserver.Store
> Renaming compacted file at
>
> hdfs://etp-hdfs-n1-sg.passp