Re: StochasticLoadBalancer contra-productive

2024-08-29 Thread Wellington Chevreuil
The balancer logs shared suggest it's deciding to move regions because of
the following factors:

   - Data locality (hdfs blocks for the regions files)
   - Read/Write load
   - Memstore size/utilisation

So you need to look into those stats. It could be that the cluster
is under a "hotspot" situation, where a subset of your regions handle most
of the requests.
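The `funtionCost=` log excerpts quoted in the original message make those weights concrete. A rough sketch (assumption: the balancer's overall cost is approximately a multiplier-weighted combination of the per-function imbalance values; the numbers below are copied from the first log excerpt):

```shell
# Weighted contribution of each cost function = multiplier * imbalance.
# Values taken from the 09:57:54 log excerpt quoted in this thread.
awk 'BEGIN {
  region_count = 500.0 * 0.004313540707257566   # ~2.16
  locality     = 25.0  * 0.39761170318154926    # ~9.94
  table_skew   = 35.0  * 11.404401695266312     # ~399.15
  printf "regionCount=%.2f locality=%.2f tableSkew=%.2f\n", region_count, locality, table_skew
}'
```

Note how the TableSkewCostFunction contribution dwarfs everything else here, which would be consistent with the balancer moving regions aggressively even while the raw region-count skew is tiny.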


On Thu, 29 Aug 2024 at 21:22, Frens Jan Rumph
 wrote:

> Dear HBase users/devs!
>
>
> *Summary*
>
> After a node outage, the HBase balancer was switched off. When turning it
> on later again, the StochasticLoadBalancer increased/created the region
> count skew, which, given the mostly default configuration, is unexpected.
> Any help is much appreciated!
>
>
> *Details:*
>
> I’m fighting an issue with HBase 2.5.7 on an 11-node cluster with ~15,000
> regions from ~1000 tables. I’m hoping that someone has a pointer.
>
> *Incident -> turned balancer off*
>
> We’ve recently lost one of the nodes and ran into severe data imbalance
> issues at the level of the HDFS disks while the cluster was ‘only’ 80%
> full. Some nodes were filling up to over 98%, causing YARN to take these
> nodes out of rotation. We were unable to identify the cause of this
> imbalance. In an attempt to mitigate this, the HBase region balancer was
> disabled.
>
> *Manually under control -> turned balancer on again*
>
> Two region servers had a hard restart after the initial incident, so
> regions were reassigned, but not yet balanced. I didn’t dare turn on the
> balancer right away, fearing to get back into the situation of imbalanced
> disk usage. So regions were manually (with some scripting) re-assigned to
> get back to a balanced situation with ~1500 regions per node; in a naive
> way, similar to the SimpleLoadBalancer.
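
The naive scripted re-assignment described here could look something like the sketch below (region encoded names and server names are made-up placeholders; `move` is the standard HBase shell command):

```shell
# Round-robin a list of region encoded names across servers, emitting
# HBase shell 'move' commands. All names are hypothetical placeholders.
servers=(rs1,16020,1 rs2,16020,1 rs3,16020,1)
regions=(aaa111 bbb222 ccc333 ddd444)
i=0
for r in "${regions[@]}"; do
  s=${servers[$(( i % ${#servers[@]} ))]}
  echo "move '$r', '$s'"      # pipe the emitted commands into: hbase shell -n
  i=$((i + 1))
done
```

In practice the region list would come from something like `list_regions` or the master UI rather than a hard-coded array.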
>
> We’ve got the disk usage fairly balanced right now. So I turned the
> balancer back on.
>
> *Region count skew increased*
>
> However, it started moving regions away from a few nodes quite
> aggressively. Every run it moved 2000 to 4000 regions, expecting a cost
> decrease. But then at the next run, the initial computed cost was higher
> than before. I gave the balancer some rounds, but stopped it as some
> servers had only ~400 regions and others were responsible for 2000+
> regions. Above this limit, splits are prevented.
>
> This chart shows the effect of switching the balancer on from ~09:30; I
> stopped it at ~11:30:
>
> [image: Screenshot 2024-08-29 at 22.04.58.png]
>
>
> Some (formatted) example logging from the Balancer chore:
>
> 2024-08-28 09:57:54,678 INFO  [master/m1:16000.Chore.5] 
> balancer.StochasticLoadBalancer: ...
> Going from a computed imbalance of 1.4793890584018785 to a new imbalance 
> of 0.69336982505148. funtionCost=
> RegionCountSkewCostFunction : (multiplier=500.0, 
> imbalance=0.004313540707257566);
> PrimaryRegionCountSkewCostFunction : (not needed);
> MoveCostFunction : (multiplier=7.0, imbalance=0.1888262494457465, need 
> balance);
> ServerLocalityCostFunction : (multiplier=25.0, 
> imbalance=0.39761170318154926, need balance);
> RackLocalityCostFunction : (multiplier=15.0, imbalance=0.0);
> TableSkewCostFunction : (multiplier=35.0, imbalance=11.404401695266312, 
> need balance);
> RegionReplicaHostCostFunction : (not needed);
> RegionReplicaRackCostFunction : (not needed);
> ReadRequestCostFunction : (multiplier=5.0, 
> imbalance=0.028254565577063396, need balance);
> WriteRequestCostFunction : (multiplier=5.0, imbalance=0.7593874996431397, 
> need balance);
> MemStoreSizeCostFunction : (multiplier=5.0, 
> imbalance=0.16192309175499753, need balance);
> StoreFileCostFunction : (multiplier=5.0, imbalance=0.01758057650125178);
>
> ...
>
> 2024-08-28 10:26:34,946 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=63,queue=3,port=16000] 
> balancer.StochasticLoadBalancer: ...
> Going from a computed imbalance of 1.5853428527425468 to a new imbalance 
> of 0.6737463520617091. funtionCost=
> RegionCountSkewCostFunction : (multiplier=500.0, 
> imbalance=0.023543776971639504);
> PrimaryRegionCountSkewCostFunction : (not needed);
> MoveCostFunction : (multiplier=7.0, imbalance=0.20349610488314648, need 
> balance);
> ServerLocalityCostFunction : (multiplier=25.0, 
> imbalance=0.41889718087643735, need balance);
> RackLocalityCostFunction : (multiplier=15.0, imbalance=0.0);
> TableSkewCostFunction : (multiplier=35.0, imbalance=10.849642781445127, 
> need balance);
> RegionReplicaHostCostFunction : (not needed);
> RegionReplicaRackCostFunction : (not needed);
> ReadRequestCostFunction : (multiplier=5.0, imbalance=0.02832763401695891, 
> need balance);
> WriteRequestCostFunction : (multiplier=5.0, imbalance=0.2960273848432453, 
> need balance);
> MemStoreSizeCostFunction : (multiplier=5.0, 
> imbalance=0.08973896446650413, need balance);
> StoreFileCostFunction : (multiplier=5.0, imbalance=0.02370918640463713);
>
>
> The balancer 

Re: Question regards sudden drop of compactionQueueLength metric

2024-05-20 Thread Wellington Chevreuil
By the description, it seems compaction of a specific subset of regions is
taking a long time to complete, filling all the compaction threads whilst
all other compaction requests are queued waiting for these long ones to
complete. Once these long compactions are finished, the queued ones are
processed very quickly. I'm not sure about hbase 1.1.3 (this is quite old,
btw), but newer versions do log the total time and size of compacted data
at the end of each compaction. The only other way compaction queues would
be cleaned out is if RSes got restarted.
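
If a few huge compactions really are starving the queue, one mitigation is to keep separate thread lanes for small and large compactions. A hedged config sketch (property names as documented for later 1.x/2.x releases; values are illustrative, verify both against your version):

```shell
# hbase-site.xml fragment, shown as comments (not runnable on its own):
#
#   hbase.regionserver.thread.compaction.small = 4   # threads for small compactions
#   hbase.regionserver.thread.compaction.large = 2   # threads for large compactions
#   hbase.regionserver.thread.compaction.throttle = 2684354560
#       # ~2.5 GB: compactions above this total size go to the "large" pool,
#       # so they cannot block every thread while small ones queue up.
```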

On Sun, 19 May 2024 at 22:30, Rural Hunter 
wrote:

> Hi,
>
> We are experiencing periodic slow response issue. We investigated the issue
> and found it's related to hfile compaction. The slow down happens when
> there are many compaction activities in log. So we tuned some compaction
> parameters and also started to monitor the metric: compactionQueueLength.
> When the slow response happens, we can see the compactionQueueLength keeps
> increasing. In the log there is one item of major compaction completion
> every several minutes. One interesting finding is that
> compactionQueueLength
> keeps increasing to more than 1000 or even 3000 on some servers until, at
> some point, it drops to 0 suddenly, as if it were cleared by something. There
> is nothing special in the log at the time and after that there is not much
> compaction activity.  I searched the doc and web but couldn't find any
> explanation for that. Can anyone explain what happened? Thanks in advance.
> btw, our hbase version is 1.1.3
>


Re: Re: INDEX_BLOCK_ENCODING=> PREFIX_TREE cannot be used properly

2024-03-26 Thread Wellington Chevreuil
That implementation is still incomplete, and PREFIX_TREE for index block
encoding is still unavailable.
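
Given that, a working variant of the CREATE from the quoted message is simply the same DDL with the unsupported `'INDEX_BLOCK_ENCODING' => 'PREFIX_TREE'` attribute dropped. A sketch (untested against 2.5.8) that prints the statement for piping into the HBase shell:

```shell
# The quoted CREATE minus the unsupported INDEX_BLOCK_ENCODING attribute.
ddl="create 'trade_test_02_t', {NAME => 'cf', VERSIONS => 1,
  COMPRESSION => 'ZSTD', PREFETCH_BLOCKS_ON_OPEN => 'false',
  DATA_BLOCK_ENCODING => 'ROW_INDEX_V1', IN_MEMORY_COMPACTION => 'ADAPTIVE',
  BLOCKCACHE => 'true', METADATA => {'COMPRESSION_COMPACT' => 'ZSTD'}},
  {NUMREGIONS => 200, SPLITALGO => 'HexStringSplit'}"
echo "$ddl"   # pipe into:  hbase shell -n
```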

On Tue, 26 Mar 2024 at 10:59, Bryan Beaudreault <
bbeaudrea...@apache.org> wrote:

> INDEX_BLOCK_ENCODING is a new feature, but just the configuration exists.
> No actual encodings have been committed. Development on the PR stalled. See
> https://github.com/apache/hbase/pull/4782
>
> It would be great if someone picked up this work again.
>
> On Tue, Mar 26, 2024 at 6:05 AM lisoda  wrote:
>
> > No. INDEX_BLOCK_ENCODING is a new feature introduced by HBASE-27329 .
> >
> >
> > On 2024-03-26 17:59:50, "Wellington Chevreuil" <
> > wellington.chevre...@gmail.com> wrote:
> > >PREFIX_TREE encoding support has been removed in HBase 2. Please see:
> > >https://hbase.apache.org/book.html#upgrade2.0.prefix-tree.removed
> > >
> > >On Tue, 26 Mar 2024 at 09:39, lisoda 
> wrote:
> > >
> > >> HI.
> > >>
> > >> I am testing HBase version 2.5.8. I found that I can't use
> > >> INDEX_BLOCK_ENCODING=> PREFIX_TREE properly. Can anyone help me with
> > this
> > >> problem?
> > >>
> > >>
> > >> HBase create table:
> > >> create 'trade_test_02_t', {NAME => 'cf', VERSIONS => 1, COMPRESSION =>
> > >> 'ZSTD', PREFETCH_BLOCKS_ON_OPEN => 'false', DATA_BLOCK_ENCODING =>
> > >> 'ROW_INDEX_V1',IN_MEMORY_COMPACTION => 'ADAPTIVE',BLOCKCACHE =>
> 'true',
> > >> METADATA => {'COMPRESSION_COMPACT' =>
> > >> 'ZSTD','INDEX_BLOCK_ENCODING'=>'PREFIX_TREE'}}, { NUMREGIONS => 200,
> > >> SPLITALGO => 'HexStringSplit'}
> > >>
> > >>
> > >>
> > >>
> > >> Error:
> > >> 2024-03-26 17:20:00,324 ERROR [MemStoreFlusher.0]
> > >> regionserver.StoreEngine: Failed to open store file :
> > >>
> >
> hdfs://spacex-dc-hbase/apps/hbase/data/data/default/trade_test_02_t/f392e6fe841adbc338a29fa3481d5f85/.tmp/cf/cac2fcabf39a4097b3e3ab99d894a803,
> > >> keeping it in tmp location
> > >> org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem
> reading
> > >> data index and meta index from file
> > >>
> >
> hdfs://spacex-dc-hbase/apps/hbase/data/data/default/trade_test_02_t/f392e6fe841adbc338a29fa3481d5f85/.tmp/cf/cac2fcabf39a4097b3e3ab99d894a803
> > >> at org.apache.hadoop.hbase.io.hfile.HFileInfo.initMetaAndIndex(HFileInfo.java:392)
> > >> at
> > >>
> > org.apache.hadoop.hbase.regionserver.HStoreFile.open(HStoreFile.java:394)
> > >> at
> > >>
> >
> org.apache.hadoop.hbase.regionserver.HStoreFile.initReader(HStoreFile.java:518)
> > >> at
> > >>
> >
> org.apache.hadoop.hbase.regionserver.StoreEngine.createStoreFileAndReader(StoreEngine.java:225)
> > >> at
> > >>
> >
> org.apache.hadoop.hbase.regionserver.StoreEngine.createStoreFileAndReader(StoreEngine.java:218)
> > >> at
> > >>
> >
> org.apache.hadoop.hbase.regionserver.StoreEngine.validateStoreFile(StoreEngine.java:237)
> > >> at
> > >>
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:828)
> > >> at
> > >>
> >
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1963)
> > >> at
> > >>
> >
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2840)
> > >> at
> > >>
> >
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2582)
> > >> at
> > >>
> >
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2554)
> > >> at
> > >>
> >
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2424)
> > >> at
> > >>
> >
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:603)
> > >> at
> > >>
> >
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:572)
> > >> at
> > 

Re: INDEX_BLOCK_ENCODING=> PREFIX_TREE cannot be used properly

2024-03-26 Thread Wellington Chevreuil
PREFIX_TREE encoding support has been removed in HBase 2. Please see:
https://hbase.apache.org/book.html#upgrade2.0.prefix-tree.removed

On Tue, 26 Mar 2024 at 09:39, lisoda  wrote:

> HI.
>
> I am testing HBase version 2.5.8. I found that I can't use
> INDEX_BLOCK_ENCODING=> PREFIX_TREE properly. Can anyone help me with this
> problem?
>
>
> HBase create table:
> create 'trade_test_02_t', {NAME => 'cf', VERSIONS => 1, COMPRESSION =>
> 'ZSTD', PREFETCH_BLOCKS_ON_OPEN => 'false', DATA_BLOCK_ENCODING =>
> 'ROW_INDEX_V1',IN_MEMORY_COMPACTION => 'ADAPTIVE',BLOCKCACHE => 'true',
> METADATA => {'COMPRESSION_COMPACT' =>
> 'ZSTD','INDEX_BLOCK_ENCODING'=>'PREFIX_TREE'}}, { NUMREGIONS => 200,
> SPLITALGO => 'HexStringSplit'}
>
>
>
>
> Error:
> 2024-03-26 17:20:00,324 ERROR [MemStoreFlusher.0]
> regionserver.StoreEngine: Failed to open store file :
> hdfs://spacex-dc-hbase/apps/hbase/data/data/default/trade_test_02_t/f392e6fe841adbc338a29fa3481d5f85/.tmp/cf/cac2fcabf39a4097b3e3ab99d894a803,
> keeping it in tmp location
> org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading
> data index and meta index from file
> hdfs://spacex-dc-hbase/apps/hbase/data/data/default/trade_test_02_t/f392e6fe841adbc338a29fa3481d5f85/.tmp/cf/cac2fcabf39a4097b3e3ab99d894a803
> at org.apache.hadoop.hbase.io.hfile.HFileInfo.initMetaAndIndex(HFileInfo.java:392)
> at
> org.apache.hadoop.hbase.regionserver.HStoreFile.open(HStoreFile.java:394)
> at
> org.apache.hadoop.hbase.regionserver.HStoreFile.initReader(HStoreFile.java:518)
> at
> org.apache.hadoop.hbase.regionserver.StoreEngine.createStoreFileAndReader(StoreEngine.java:225)
> at
> org.apache.hadoop.hbase.regionserver.StoreEngine.createStoreFileAndReader(StoreEngine.java:218)
> at
> org.apache.hadoop.hbase.regionserver.StoreEngine.validateStoreFile(StoreEngine.java:237)
> at
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:828)
> at
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1963)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2840)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2582)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2554)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2424)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:603)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:572)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:65)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:344)
> Caused by: java.lang.NullPointerException
> at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$CellBasedKeyBlockIndexReaderV2.readMultiLevelIndexRoot(HFileBlockIndex.java:553)
> at org.apache.hadoop.hbase.io.hfile.HFileInfo.initMetaAndIndex(HFileInfo.java:373)
> ... 15 more
> 2024-03-26 17:20:00,329 WARN  [MemStoreFlusher.0] regionserver.HStore:
> Failed validating store file
> hdfs://spacex-dc-hbase/apps/hbase/data/data/default/trade_test_02_t/f392e6fe841adbc338a29fa3481d5f85/.tmp/cf/cac2fcabf39a4097b3e3ab99d894a803,
> retrying num=9
> org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading
> data index and meta index from file
> hdfs://spacex-dc-hbase/apps/hbase/data/data/default/trade_test_02_t/f392e6fe841adbc338a29fa3481d5f85/.tmp/cf/cac2fcabf39a4097b3e3ab99d894a803
> at org.apache.hadoop.hbase.io.hfile.HFileInfo.initMetaAndIndex(HFileInfo.java:392)
> at
> org.apache.hadoop.hbase.regionserver.HStoreFile.open(HStoreFile.java:394)
> at
> org.apache.hadoop.hbase.regionserver.HStoreFile.initReader(HStoreFile.java:518)
> at
> org.apache.hadoop.hbase.regionserver.StoreEngine.createStoreFileAndReader(StoreEngine.java:225)
> at
> org.apache.hadoop.hbase.regionserver.StoreEngine.createStoreFileAndReader(StoreEngine.java:218)
> at
> org.apache.hadoop.hbase.regionserver.StoreEngine.validateStoreFile(StoreEngine.java:237)
> at
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:828)
> at
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1963)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2840)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2582)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2554)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2424)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.

Re: How to do wire encryption without using kerberos.

2022-01-25 Thread Wellington Chevreuil
AFAIK, the answer is no, currently, there's no way to encrypt rpc requests
if not using Kerberos. There were some discussions about it in HBASE-26548
and in a Slack thread. The patch in HBASE-26548 is a very rough PoC taking
advantage of netty TLS capabilities, hardcoding it in the netty RPC layer.
It can be used as a reference, or refined further to the point of being
merge-ready, if you are interested in investing some effort in this.
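
For reference, when Kerberos *is* in use, wire encryption itself is a single setting: the SASL quality-of-protection level. A hedged config sketch (the property is standard, but must match on clients and servers):

```shell
# hbase-site.xml fragment, shown as comments (not runnable on its own).
# With SASL/Kerberos authentication enabled, QoP "privacy" turns on
# RPC encryption:
#
#   <property>
#     <name>hbase.rpc.protection</name>
#     <value>privacy</value>   <!-- authentication | integrity | privacy -->
#   </property>
```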


Re: Regions per regionserver

2021-09-29 Thread Wellington Chevreuil
>
> My query and interest centres around how it happens that two HBASE setups
> with a big discrepancy in the number of nodes can end up with regions in
> the 400-500 range.
>

It depends on factors like (total size of your data / # of RSes), the
SplitPolicy configured (as Sergey mentioned previously) and, mostly, the
hbase.hregion.max.filesize config value (the lower the value, the sooner
regions split, contributing to more regions in general).
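
A back-of-envelope sketch of that relationship (the numbers are assumptions for illustration, chosen near cluster 1's shape, not figures from the thread):

```shell
# With a fixed per-region size cap, steady-state region count tracks
# (data per RS) / hbase.hregion.max.filesize, before accounting for the
# split policy keeping most regions under the cap.
data_per_rs_gb=4240        # hypothetical table data per regionserver
max_filesize_gb=10         # hbase.hregion.max.filesize = 10GB (cluster 1)
echo $(( data_per_rs_gb / max_filesize_gb ))
```

Under these assumed numbers that yields roughly the ~424 regions per RS reported for cluster 1; lowering the cap to 6GB, as in cluster 2, pushes the count up for the same data volume.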

On Wed, 29 Sep 2021 at 19:03, Sergey Soldatov <
sergeysolda...@gmail.com> wrote:

> Hey Marc,
> Possibly that happens because of your split policy which relies on the
> memstore flush size and the number of regions for the table hosted by the
> particular region server. That would lead to the first cluster having
> more regions of smaller size.
>
> Thanks,
> Sergey
>
> On Wed, Sep 29, 2021 at 8:41 AM Marc Hoppins 
> wrote:
>
> > Hi all,
> >
> > I would guess that this topic has probably been raised umpteen times in
> > the past, so I apologise in advance if some of you are miffed.  I am not
> a
> > 'big data' database person so all this tuning mucky-muck has me a bit
> > confused.
> >
> > We currently have two clusters:
> >
> > Cluster 1
> > 76 regionservers, each with 7TB of available HDFS and 64GB of RAM
> > Approx 424 regions per RS
> >
> > Config
> > hbase.client.write.buffer = 4MB
> > hbase.regionserver.handler.count = 30
> > hbase.hregion.memstore.flush.size = 128MB
> > hbase.hregion.memstore.block.multiplier = 2
> > hbase.hregion.max.filesize = 10GB
> >
> > Cluster 2
> > 10 regionservers, each with 7TB of available HDFS and (min) 128GB of RAM
> > Approx. 483 regions per RS
> >
> > Config:
> >
> > hbase.client.write.buffer = 2MB
> > hbase.regionserver.handler.count = 100
> > hbase.hregion.memstore.flush.size = 1GB
> > hbase.hregion.memstore.block.multiplier = 32
> > hbase.hregion.max.filesize = 6GB
> >
> > The number of regions per region server seems to be approximately the
> > same on both clusters, despite the big difference in node count and the
> > differing configuration.  To begin with, the two clusters
> > (cloudera) were setup using defaults but cluster 2 has been recently
> > altered as the main entity using it complained of "too many regions".
> >
> > My query and interest centres around how it happens that two HBASE setups
> > with a big discrepancy in the number of nodes can end up with regions in
> > the 400-500 range.
> >
> > Yours, in ignorance
> >
> > Marc
> >
>


Re: Upgrading cdh5.16.2 to apache hbase 2.4 using replication

2021-05-20 Thread Wellington Chevreuil
Yes, replication interfaces are compatible between these two major
versions.

So I created two clusters in AWS and tried to enable replication between HBase
> 1.4.13 and 2.2.5. But got the error "table exists but descriptors are not
> the same" (I will put a screenshot in the attachment but am not sure it will
> work here).
>
Can you describe in detail which steps (commands/configurations) you
executed to get the mentioned error? After which exact action did you see
this message? And are the table schemas identical in both clusters?
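
A quick way to check that last point is to dump and diff the descriptors. A sketch (table name is a placeholder; the `-n` non-interactive shell flag is assumed available in your versions):

```shell
# Shown as comments since each command runs on a different cluster:
#
#   on the source cluster:  echo "describe 'my_table'" | hbase shell -n > desc_src.txt
#   on the peer cluster:    echo "describe 'my_table'" | hbase shell -n > desc_peer.txt
#   then compare:           diff desc_src.txt desc_peer.txt
#
# Replication requires the column families to exist with matching names
# on both sides; the diff highlights any attribute drift.
```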

On Wed, 19 May 2021 at 21:00, Sergey Semenoff <
box4semen...@gmail.com> wrote:

> We are thinking about a similar issue. Our clusters are much smaller - 4 by
> 100 RS - but we need to process data continuously too. So I created two
> clusters in AWS and tried to enable replication between HBase 1.4.13 and
> 2.2.5. But got the error "table exists but descriptors are not the same" (I
> will put a screenshot in the attachment but am not sure it will work here).
>
> I have some ideas on how to make the upgrade another way and would be glad
> to discuss it with you. You could write me at box4semen...@gmail.com to
> dig into the details.
>
> On Wed, 19 May 2021 at 15:50, Bryan Beaudreault
> :
>
>> We are running about 40 HBase clusters, with over 5000 regionservers
>> total.
>> These are all running cdh5.16.2. We also have thousands of clients (from
>> APIs to kafka workers to hadoop jobs, etc) hitting these various clusters,
>> also running cdh5.16.2.
>>
>> We are starting to plan an upgrade to hbase 2.x and hadoop 3.x. I've read
>> through the docs on https://hbase.apache.org/book.html#_upgrade_paths,
>> and
>> am starting to plan our approach. More than a few seconds of downtime is
>> not an option, but rolling upgrade also seems risky (if not impossible for
>> our version).
>>
>> One thought I had is whether replication is compatible between these two
>> versions. If so, we probably would consider swapping onto upgraded
>> clusters
>> using backup/restore + replication. If we were to go this route we'd
>> probably want to consider bi-directional replication so that we can roll
>> back to the old cluster if there's a regression.
>>
>> Does anyone have any experience with this approach? Is the replication
>> protocol compatible across these versions? Any concerns, tips or other
>> considerations to keep in mind? We do the backup/restore + replication
>> approach pretty regularly to move tables between clusters.
>>
>> Thanks!
>>
>


Re: HBASE WALs

2021-03-23 Thread Wellington Chevreuil
>
> I am still not certain what will happen.  masterProcWALs contain info for
> all (running) tables, yes?
>
masterProcWALs only contain info for running procedures, not user table
data. User table data go on "normal" WALs, not "masterProcWALs".

 If all tables are disabled and I remove the master wals, how will that
> affect the other tables? When I disabled all tables, hundreds of master
> WALs are now created. This means there is a bunch of pending operations,
> yes?  Is it going to make some other things inconsistent?

Table disabling involves the unassignment of all these tables' regions. Each
of these "unassign" operations comprises a set of sequential phases. These
internal operations are called "procedures". Information about the progress
of each operation, as it moves through its different phases, is stored in
these masterProcWALs files. That's why triggering the "disable" command will
create some data under masterProcWALs. If all the disable commands finished
successfully, and all your procedures are finished (apart from that rogue
one that has existed for a while already), you would be good to clean out
masterProcWALs.
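
A hedged sketch of that cleanup (the path assumes the default hbase.rootdir of /hbase; adjust for your deployment, and side-line rather than delete):

```shell
# Shown as comments since this runs against a live cluster:
#
#   1) stop the active HMaster
#   2) move the proc WAL dir aside instead of deleting it:
#        hdfs dfs -mv /hbase/MasterProcWALs /hbase/MasterProcWALs.bak
#   3) start the HMaster again, then confirm the rogue procedure is gone:
#        echo "list_procedures" | hbase shell
```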

I did try to set the table state manually to see if the faulty table would
> fire up, and I restarted hbase... the state was the same: a locked table
> state due to pending disable and a stuck region.
>
That's because of the rogue procedure. When you restarted the master, it went
through masterProcWALs and resumed the rogue procedure from the unfinished
state it was in when you restarted hbase. If you had removed masterProcWALs
prior to the restart, the rogue procedure would now be gone.

We may have the go-ahead to remove this table - I assume we cannot clone it
> while it is in a state of (DISABLED) flux but, once again, messing with
> master WALs has me on edge.

From what I understand, you already have the tables disabled, and no
unfinished procs apart from the rogue one, so just clean out masterProcWALs
and restart the master.

On Tue, 23 Mar 2021 at 11:13, Marc Hoppins 
wrote:

> I am still not certain what will happen.  masterProcWALs contain info for
> all (running) tables, yes?
>
> If all tables are disabled and I remove the master wals, how will that
> affect the other tables? When I disabled all tables, hundreds of master
> WALs are now created. This means there is a bunch of pending operations,
> yes?  Is it going to make some other things inconsistent?
>
> I did try to set the table state manually to see if the faulty table would
> fire up, and I restarted hbase... the state was the same: a locked table
> state due to pending disable and a stuck region.
>
> We may have the go-ahead to remove this table - I assume we cannot clone
> it while it is in a state of (DISABLED) flux but, once again, messing with
> master WALs has me on edge.
>
>
> -Original Message-
> From: Wellington Chevreuil 
> Sent: Tuesday, March 16, 2021 4:50 PM
> To: Hbase-User 
> Subject: Re: HBASE WALs
>
> EXTERNAL
>
> >
> > To be clear, if the other tables are stopped, I assume all pending and
> > current operations will finish. How long will it take to write all
> > data - if indeed the data does get permanently written - so that we
> > can safely remove WALs?
> >
> If by "tables stopped" you mean your tables are disabled, then yeah, all
> related data would already have been flushed into hfiles and wouldn't be on
> your wals. But please be aware that what you really need here to get rid of
> the rogue proc is to remove master proc wals, not normal wals.
>
> On Tue, 16 Mar 2021 at 07:12, Marc Hoppins 
> wrote:
>
> > Overall, I am mystified as to how this could happen.  If Hadoop has a
> > replication factor (I believe we use the default) of 3 and we have two
> > datacenters with masters and workers in both, how can a network outage
> > affect Hadoop operation? Surely it should have used available
> > resources to continue operations...or have I misinterpreted entirely?
> >
> > -Original Message-
> > From: Stack 
> > Sent: Tuesday, March 16, 2021 7:16 AM
> > To: Hbase-User 
> > Subject: Re: HBASE WALs
> >
> > EXTERNAL
> >
> > On Fri, Mar 12, 2021 at 2:17 AM Marc Hoppins 
> wrote:
> >
> > > Hi, all,
> > >
> > > For our stuck region, this exists in meta.  Could we alter the state
> > > to CLOSED (maybe via intermediate OPEN, CLOSING, CLOSED)?
> > >
> > > You could but IIRC, in that version of HBase, you may need to
> > > restart the
> > Master after the change (changing hbase:meta does not update the
> > Master's in-memory state). On restart, Master will read hbase:meta to
>

Re: HBASE WALs

2021-03-16 Thread Wellington Chevreuil
>
> To be clear, if the other tables are stopped, I assume all pending and
> current operations will finish. How long will it take to write all data -
> if indeed the data does get permanently written - so that we can safely
> remove WALs?
>
If by "tables stopped" you mean your tables are disabled, then yeah, all
related data would already have been flushed into hfiles and wouldn't be on
your wals. But please be aware that what you really need here to get rid of
the rogue proc is to remove master proc wals, not normal wals.

On Tue, 16 Mar 2021 at 07:12, Marc Hoppins 
wrote:

> Overall, I am mystified as to how this could happen.  If Hadoop has a
> replication factor (I believe we use the default) of 3 and we have two
> datacenters with masters and workers in both, how can a network outage
> affect Hadoop operation? Surely it should have used available resources to
> continue operations...or have I misinterpreted entirely?
>
> -Original Message-
> From: Stack 
> Sent: Tuesday, March 16, 2021 7:16 AM
> To: Hbase-User 
> Subject: Re: HBASE WALs
>
> EXTERNAL
>
> On Fri, Mar 12, 2021 at 2:17 AM Marc Hoppins  wrote:
>
> > Hi, all,
> >
> > For our stuck region, this exists in meta.  Could we alter the state
> > to CLOSED (maybe via intermediate OPEN, CLOSING, CLOSED)?
> >
> > You could but IIRC, in that version of HBase, you may need to restart
> > the
> Master after the change (changing hbase:meta does not update the Master's
> in-memory state). On restart, Master will read hbase:meta to discover
> Region state.
>
> S
>
>
> > hds2_md5,BDFFEEF,1535957697205.f25fe93e24b34cb2f7fffddee1d89eec.
> > column=info:regioninfo, timestamp=1613580024017, value={ENCODED =>
> > f25fe93e24b34cb2f7fffddee1d89eec, NAME =>
> > 'hds2_md5,BDFFEEF,1535957697205.f25fe93e24b34cb2f7fffddee1d89eec.',
> > STARTKEY => 'BDFFEEF', ENDKEY => 'BEAA821D2'}
> > hds2_md5,BDFFEEF,1535957697205.f25fe93e24b34cb2f7fffddee1d89eec.
> > column=info:seqnumDuringOpen, timestamp=1611787189839,
> > value=\x00\x00\x00\x00\x00\x00\x04\x8F
> >  hds2_md5,BDFFEEF,1535957697205.f25fe93e24b34cb2f7fffddee1d89eec.
> > column=info:server, timestamp=1611787189839, value=
> > dr1-hbase18.jumbo.hq.eset.com:16020
> >  hds2_md5,BDFFEEF,1535957697205.f25fe93e24b34cb2f7fffddee1d89eec.
> > column=info:serverstartcode, timestamp=1611787189839,
> > value=1611785264032
> hds2_md5,BDFFEEF,1535957697205.f25fe93e24b34cb2f7fffddee1d89eec.
> > column=info:sn, timestamp=1613580024017, value=
> > ba-hbase25.jumbo.hq.eset.com,16020,1604475904456
> >  hds2_md5,BDFFEEF,1535957697205.f25fe93e24b34cb2f7fffddee1d89eec.
> > column=info:state, timestamp=1613580024017, value=OPENING
> >
> > -Original Message-
> > From: Wellington Chevreuil 
> > Sent: Wednesday, March 10, 2021 10:56 AM
> > To: Hbase-User 
> > Subject: Re: HBASE WALs
> >
> > EXTERNAL
> >
> > >
> > > Sorry if I seem stupid but this is still all new to me.
> > >
> > Forgot to mention, there's no stupid questions here. Don't be shy and
> > keep'em coming.
> >
> > On Wed, 10 Mar 2021 at 09:48, Wellington Chevreuil <
> > wellington.chevre...@gmail.com> wrote:
> >
> > > However, how would that help anyway?  If we cannot fix this at this
> > > time
> > >> then any upgrade would have inconsistencies also, yes?
> > >>
> > > The upgrade on its own wouldn't fix existing inconsistencies, but
> > > you would now have support for additional tooling
> > > (hbase-operators-tool) to help you with this.
> > >
> > > As all the 'SUCCESS' procedures have a parent ID 73587, does this
> > > mean
> > >> that they were successfully and fully moved from hbase25 to each
> > >> server mentioned in that procedure?  Or does it just mean that the
> > >> region was successfully unassigned from hbase25 but the data still
> > >> resides on hbase25?  I see locality 0.
> > >>
> > > IIRC, those were all UnassignProcedures, so it means the
> > > unassignment of the related region has completed and the region for
> > > that particular procedure went offline.
> > >
> > > If we change the table state in meta to 'ENABLED', could this
> > > kickstart
> > >> all these things or will it just lead to further problems?
> > >
> > > Masters work with its own memory cache of meta, so manually updating
> > > it will just make mas

Re: HBASE WALs

2021-03-10 Thread Wellington Chevreuil
>
> Sorry if I seem stupid but this is still all new to me.
>
Forgot to mention, there's no stupid questions here. Don't be shy and
keep'em coming.

On Wed, 10 Mar 2021 at 09:48, Wellington Chevreuil <
wellington.chevre...@gmail.com> wrote:

> However, how would that help anyway?  If we cannot fix this at this time
>> then any upgrade would have inconsistencies also, yes?
>>
> The upgrade on its own wouldn't fix existing inconsistencies, but you
> would now have support for additional tooling (hbase-operators-tool) to
> help you with this.
>
> As all the 'SUCCESS' procedures have a parent ID 73587, does this mean
>> that they were successfully and fully moved from hbase25 to each server
>> mentioned in that procedure?  Or does it just mean that the region was
>> successfully unassigned from hbase25 but the data still resides on
>> hbase25?  I see locality 0.
>>
> IIRC, those were all UnassignProcedures, so it means the unassignment of
> the related region has completed and the region for that particular
> procedure went offline.
>
> If we change the table state in meta to 'ENABLED', could this kickstart
>> all these things or will it just lead to further problems?
>
> Masters work with their own in-memory cache of meta, so manually updating
> meta will just make the masters' cache inconsistent with it. You would need
> to restart masters to get the cache reloaded from meta. The main problem is
> that you still have the rogue procedures, which you can't get rid of
> without stopping the cluster. One alternative to a full cluster outage
> would be to identify all RSes running the rogue procs (you can find that
> from the active master logs), then stop only those and the master, clean
> masterprocwals, then start it again.
>
>
>> I suppose it means I am asking, the 73587 DisableTableProcedure, does it
>> mean that the table is waiting to be disabled?  HBASE master declares that
>> table is NOT enabled.
>>
> The table state may have been already updated to disabled, most of its
> regions may already be offline, but the 73587 DisableTableProcedure cannot
> be considered "done" until all its sub procedures are indeed completed.
>
>
> On Tue, Mar 9, 2021 at 13:40, Marc Hoppins
> wrote:
>
>> Thanks for that.
>>
>> Alas, we are (currently) constrained by using Cloudera (CDH) 6.3.1 and do
>> not have a viable business use to pay the extortionate amount of money
>> required to upgrade, which would give these clusters access to newer
>> versions.
>>
>> However, how would that help anyway?  If we cannot fix this at this time
>> then any upgrade would have inconsistencies also, yes?
>>
>> As all the 'SUCCESS' procedures have a parent ID 73587, does this mean
>> that they were successfully and fully moved from hbase25 to each server
>> mentioned in that procedure?  Or does it just mean that the region was
>> successfully unassigned from hbase25 but the data still resides on
>> hbase25?  I see locality 0.
>>
>> If we change the table state in meta to 'ENABLED', could this kickstart
>> all these things or will it just lead to further problems?  I suppose it
>> means I am asking, the 73587 DisableTableProcedure, does it mean that the
>> table is waiting to be disabled?  HBASE master declares that table is NOT
>> enabled.
>>
>> Sorry if I seem stupid but this is still all new to me.
>>
>> I appreciate the help.
>>
>> -Original Message-
>> From: Wellington Chevreuil 
>> Sent: Tuesday, March 9, 2021 1:20 PM
>> To: Hbase-User 
>> Subject: Re: HBASE WALs
>>
>> EXTERNAL
>>
>> >
>> > All fails are waiting on the same PID (73587), a DISABLE TABLE
>> procedure.
>> > The offending region (f25fe93e24b34cb2f7fffddee1d89eec) seems to be
>> > the problem.
>> >
>> Per your list_procedures output attached, it seems the proc states are
>> all inconsistent. There's a WAIT_TIMEOUT subproc of 73587 with PID 73827,
>> which is the UnassignProcedure for this region. The problem is that there
>> are already 5 APs for the same region, which may be causing deadlocks. If
>> this cluster were on an hbck2-supported version, you could get rid of this
>> state using the bypass command on all these proc ids, then manually get the
>> table/region states consistent again using the
>> setRegionState/setTableState/assigns/unassigns methods.
>>
>> Without tooling, the only option I can think of is to stop cluster, clean
>> out masterprocwals, restart cluster, then use hbase shell to
>> enable/disable/assign regions.

Re: HBASE WALs

2021-03-10 Thread Wellington Chevreuil
>
> However, how would that help anyway?  If we cannot fix this at this time
> then any upgrade would have inconsistencies also, yes?
>
The upgrade on its own wouldn't fix existing inconsistencies, but you
would now have support for additional tooling (hbase-operator-tools) to
help you with this.

As all the 'SUCCESS' procedures have a parent ID 73587, does this mean that
> they were successfully and fully moved from hbase25 to each server
> mentioned in that procedure?  Or does it just mean that the region was
> successfully unassigned from hbase25 but the data still resides on
> hbase25?  I see locality 0.
>
IIRC, those were all UnassignProcedures, so it means the unassignment of
the related region has completed and the region for that particular
procedure went offline.

If we change the table state in meta to 'ENABLED', could this kickstart all
> these things or will it just lead to further problems?

Masters keep their own in-memory cache of meta, so manually updating meta
will just make the masters' cache inconsistent with it. You would need to
restart the masters to get the cache reloaded from meta. The main problem is
that you still have the rogue procedures, which you can't get rid of
without stopping the cluster. One alternative to a full cluster outage
would be to identify all RSes running the rogue procs (you can find that
in the active master's logs), then stop only those and the master, clean
the masterprocwals, then start everything again.
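
The targeted-restart alternative described here could be sketched roughly as
below. This is a hypothetical outline, not a tested recipe: the log path, the
daemon commands, and the HDFS locations are assumptions that must be adapted
to your deployment.

```shell
# 1. From the active master's log, find which region servers host the rogue
#    procedures (PID taken from this thread's example).
grep "pid=73587" /var/log/hbase/hbase-master-*.log | grep -o "regionserver=[^,]*" | sort -u

# 2. Stop only those region servers, then the master(s), e.g. on each host:
#      hbase-daemon.sh stop regionserver
#      hbase-daemon.sh stop master

# 3. Move the master procedure WALs aside so the rogue procedures are not
#    resumed on restart (default location under the hbase root dir).
hdfs dfs -mv /hbase/MasterProcWALs /hbase/MasterProcWALs.bak

# 4. Start the masters again, then the stopped region servers.
```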


> I suppose it means I am asking, the 73587 DisableTableProcedure, does it
> mean that the table is waiting to be disabled?  HBASE master declares that
> table is NOT enabled.
>
The table state may have been already updated to disabled, most of its
regions may already be offline, but the 73587 DisableTableProcedure cannot
be considered "done" until all its sub procedures are indeed completed.


On Tue, Mar 9, 2021 at 13:40, Marc Hoppins
wrote:

> Thanks for that.
>
> Alas, we are (currently) constrained by using Cloudera (CDH) 6.3.1 and do
> not have a viable business use to pay the extortionate amount of money
> required to upgrade, which would give these clusters access to newer
> versions.
>
> However, how would that help anyway?  If we cannot fix this at this time
> then any upgrade would have inconsistencies also, yes?
>
> As all the 'SUCCESS' procedures have a parent ID 73587, does this mean
> that they were successfully and fully moved from hbase25 to each server
> mentioned in that procedure?  Or does it just mean that the region was
> successfully unassigned from hbase25 but the data still resides on
> hbase25?  I see locality 0.
>
> If we change the table state in meta to 'ENABLED', could this kickstart
> all these things or will it just lead to further problems?  I suppose it
> means I am asking, the 73587 DisableTableProcedure, does it mean that the
> table is waiting to be disabled?  HBASE master declares that table is NOT
> enabled.
>
> Sorry if I seem stupid but this is still all new to me.
>
> I appreciate the help.
>
> -Original Message-
> From: Wellington Chevreuil 
> Sent: Tuesday, March 9, 2021 1:20 PM
> To: Hbase-User 
> Subject: Re: HBASE WALs
>
> EXTERNAL
>
> >
> > All fails are waiting on the same PID (73587), a DISABLE TABLE procedure.
> > The offending region (f25fe93e24b34cb2f7fffddee1d89eec) seems to be
> > the problem.
> >
> Per your list_procedures output attached, it seems the proc states are
> all inconsistent. There's a WAIT_TIMEOUT subproc of 73587 with PID 73827,
> which is the UnassignProcedure for this region. The problem is that there
> are already 5 APs for the same region, which may be causing deadlocks. If
> this cluster were on an hbck2-supported version, you could get rid of this
> state using the bypass command on all these proc ids, then manually get the
> table/region states consistent again using the
> setRegionState/setTableState/assigns/unassigns methods.
>
> Without tooling, the only option I can think of is to stop cluster, clean
> out masterprocwals, restart cluster, then use hbase shell to
> enable/disable/assign regions. You may also need to manually update
> table/region states in meta table. Of course, you can automate these manual
> steps into your own tooling, but it may be a better strategy in the long term
> to upgrade to a more stable version that also benefits from more tooling
> supported by the community.
>
>
>
>
>
> On Mon, Mar 8, 2021 at 07:50, Marc Hoppins
> wrote:
>
> > Hi, Wellington,
> >
> > I was on 'vacation' (no road trip or overseas anything) for a week.
> >
> > All fails are waiting on the same PID (73587), a DISABLE TABLE procedure.
> The offending region (f25fe93e24b34cb2f7fffddee1d89eec) seems to be the
> problem.

Re: HBASE WALs

2021-03-09 Thread Wellington Chevreuil
>
> All fails are waiting on the same PID (73587), a DISABLE TABLE procedure.
> The offending region (f25fe93e24b34cb2f7fffddee1d89eec) seems to be the
> problem.
>
Per your list_procedures output attached, it seems the proc states are all
inconsistent. There's a WAIT_TIMEOUT subproc of 73587 with PID 73827,
which is the UnassignProcedure for this region. The problem is that there are
already 5 APs for the same region, which may be causing deadlocks. If
this cluster were on an hbck2-supported version, you could get rid of this
state using the bypass command on all these proc ids, then manually get the
table/region states consistent again using the
setRegionState/setTableState/assigns/unassigns methods.
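
On an hbck2-supported release, that sequence could look roughly like the
following. The jar path, PID, and region hash are taken from this thread; the
table name is a placeholder, and the whole block is a sketch rather than a
verified procedure.

```shell
HBCK2_JAR=/tmp/hbase-hbck2-1.1.0-SNAPSHOT.jar   # jar path used elsewhere in this archive

# Bypass the stuck parent procedure and its children (-r walks sub-procedures).
hbase hbck -j "$HBCK2_JAR" bypass -r 73587

# Then force the region and table into consistent states.
hbase hbck -j "$HBCK2_JAR" setRegionState f25fe93e24b34cb2f7fffddee1d89eec CLOSED
hbase hbck -j "$HBCK2_JAR" setTableState my_ns:my_table DISABLED   # placeholder table name
```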

Without tooling, the only option I can think of is to stop cluster, clean
out masterprocwals, restart cluster, then use hbase shell to
enable/disable/assign regions. You may also need to manually update
table/region states in meta table. Of course, you can automate these manual
steps into your own tooling, but it may be a better strategy in the long term
to upgrade to a more stable version that also benefits from more tooling
supported by the community.
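
The manual hbase shell repairs mentioned here might look like the following
sketch. The table name and region encoded name are placeholders; run this
only after the masterprocwals cleanup, and verify each step first.

```shell
hbase shell <<'EOF'
# Re-drive table state (placeholder table name):
disable 'my_ns:my_table'
enable 'my_ns:my_table'
# Assign a region that stays offline, by its encoded name (placeholder):
assign 'f25fe93e24b34cb2f7fffddee1d89eec'
EOF
```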





On Mon, Mar 8, 2021 at 07:50, Marc Hoppins
wrote:

> Hi, Wellington,
>
> I was on 'vacation' (no road trip or overseas anything) for a week.
>
> All fails are waiting on the same PID (73587), a DISABLE TABLE procedure.
> The offending region (f25fe93e24b34cb2f7fffddee1d89eec) seems to be the
> problem.
>
> I am still mystified about the HBCK2-tools. I have attached a previous
> thread that you commented on at the time.
>
> > I did build the tools for our HBASE 2.1.0...or rather, I built it on Ubuntu
> 20.04 with openJDK8 (1.8.0_212), then successfully ran it on Ubuntu 16.04
> with a slightly different java (Oracle Java 8, 1.8.0_181).  I used it to
> help fix a similar problem with an offline table and RITs.  Both HBASE
> versions are the same.
>
> I attach a 'sheet' with the current procs/locks.
>
> -Original Message-
> From: Marc Hoppins 
> Sent: Wednesday, March 3, 2021 9:51 AM
> To: user@hbase.apache.org
> Cc: Martin Oravec 
> Subject: RE: HBASE WALs
>
> EXTERNAL
>
> Thanks, Wellington,
>
> I have already built hbck1-tools for 2.1.0 using the method described in
> other topics. All the HBASE and JDK versions here are the same, so if it
> worked for fixing one cluster's HBASE then it should work for other installs.
>
> Fiddling with masterprocWALs will require a complete shutdown of hbase
> operations to prevent incoming reads/writes on other tables, and I am not
> sure how disruptive that will be other than "probably a lot".
>
> -Original Message-
> From: Wellington Chevreuil 
> Sent: Tuesday, March 2, 2021 10:57 AM
> To: Hbase-User 
> Subject: Re: HBASE WALs
>
> EXTERNAL
>
> Sorry, I missed your previous email. I was hoping you were not on a
> non-stable version, so that you would benefit from hbck2 tool support.
> Unfortunately, 2.1.0 is among the early releases that don't work with this
> tool (it requires at least 2.0.3, 2.1.1 or 2.2.0).
>
> Multiple locks exist for DISABLE/ENABLE/UNASSIGN but the system seems
> > mostly unhappy with one region in particular, and is reporting on that.
> >
> Are the other regions for the table properly closed, and this is the only
> one stuck? If you do a list_procedures, are you able to identify an
> 'unassign' procedure still running for this table? Or if you grep master
> logs for this region, do you see any messages suggesting there's still
> ongoing attempts to bring the region offline? If there's apparently no
> procedure/no ongoing attempts to offline the region, you might try to
> manually update its state in meta table, then flip masters (assuming you
> have master HA), so that the new active loads an up to date state from meta
> table.
>
> Otherwise, if there's still a rogue procedure trying to offline the
> region, unfortunately, due to the lack of hbck support, you would most
> likely need a more disruptive intervention similar to what you had
> described in your first email, but instead of the normal WAL folder, the
> master proc WALs are what you would really need to clean out here, as that
> is where procedure state is persisted, and you wouldn't want the rogue
> procedure to be resumed.
>
> On Mon, Mar 1, 2021 at 10:22, Marc Hoppins
> wrote:
>
> > If you know of anything that will help I would appreciate it.
> >
> > If you need any log output let me know.
> >
> > Thanks
> >
> >
> > -Original Message-
> > From: Wellington Chevreuil 
> > Sent: Thursday, February 25, 2021 4:08 PM
> > To: Hbase-User 
> > Subject: Re: HBASE WALs

Re: HBCK2 Tools

2021-03-09 Thread Wellington Chevreuil
There are three modules in hbase-operator-tools:
- hbase-hbck2: I believe most of the hbck2 commands, apart from
assigns/unassigns, wouldn't require tables to be enabled.
- hbase-tools: RegionsMerger would require the table to be
enabled. MissingRegionDirsRepairTool doesn't require tables to be enabled.
- hbase-table-reporter: this produces general reports about tables, and
requires the target table to be enabled.
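
As a rough illustration of how the modules are invoked (jar names and the
RegionsMerger arguments are assumptions based on the project's README; check
your build's actual output):

```shell
# hbase-hbck2: commands run through the hbck facade, e.g. on a disabled table:
hbase hbck -j hbase-hbck2-<version>.jar setTableState my_ns:my_table ENABLED

# hbase-tools: RegionsMerger runs as a plain class with the jar prefixed on
# the classpath (table must be enabled):
HBASE_CLASSPATH_PREFIX=hbase-tools-<version>.jar \
  hbase org.apache.hbase.RegionsMerger my_ns:my_table 5
```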

On Tue, Mar 9, 2021 at 08:01, Marc Hoppins
wrote:

> Hi all,
>
> I have been looking for more info on this matter but cannot seem to find
> anything.
>
> What (if any) of hbck2 operator tools can be used on a disabled table?
>
> Thanks
>
> M
>


Re: HBASE WALs

2021-03-02 Thread Wellington Chevreuil
Sorry, I missed your previous email. I was hoping you were not on a
non-stable version, so that you would benefit from hbck2 tool support.
Unfortunately, 2.1.0 is among the early releases that don't work with this
tool (it requires at least 2.0.3, 2.1.1 or 2.2.0).

Multiple locks exist for DISABLE/ENABLE/UNASSIGN but the system seems
> mostly unhappy with one region in particular, and is reporting on that.
>
Are the other regions for the table properly closed, and this is the only
one stuck? If you do a list_procedures, are you able to identify an
'unassign' procedure still running for this table? Or if you grep master
logs for this region, do you see any messages suggesting there's still
ongoing attempts to bring the region offline? If there's apparently no
procedure/no ongoing attempts to offline the region, you might try to
manually update its state in the meta table, then flip masters (assuming you
have master HA), so that the new active master loads an up-to-date state
from the meta table.
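
The checks suggested here can be scripted; a sketch (the log path is an
assumption, the region hash comes from this thread):

```shell
# Any unassign procedure still running for the table?
echo "list_procedures" | hbase shell -n | grep -i unassign

# Any ongoing attempts on the stuck region in the active master's log?
grep f25fe93e24b34cb2f7fffddee1d89eec /var/log/hbase/hbase-master-*.log | tail -n 20
```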

Otherwise, if there's still a rogue procedure trying to offline the region,
unfortunately, due to the lack of hbck support, you would most likely need
a more disruptive intervention similar to what you had described in your
first email, but instead of the normal WAL folder, the master proc WALs are
what you would really need to clean out here, as that is where procedure
state is persisted, and you wouldn't want the rogue procedure to be resumed.

On Mon, Mar 1, 2021 at 10:22, Marc Hoppins
wrote:

> If you know of anything that will help I would appreciate it.
>
> If you need any log output let me know.
>
> Thanks
>
>
> -Original Message-
> From: Wellington Chevreuil 
> Sent: Thursday, February 25, 2021 4:08 PM
> To: Hbase-User 
> Subject: Re: HBASE WALs
>
> EXTERNAL
>
> >
> > Do WAL files contain information for multiple regions per WAL or is
> > one WAL associated with one region?
> >
> Edits for multiple regions would be present in a single WAL file. That's
> why, upon an RS crash and WAL processing, there's a WAL split phase.
>
> I am trying to find a way to clear a RIT for a disabled table. A similar
> > problem (but on a test cluster) involved me clearing znode info,
> > deleting HDFS data for the table and deleting WALs/MasterProcWAL
> > files, finally restarting HBASE service.
> >
> Which hbase version are you on?
>
> On Thu, Feb 25, 2021 at 11:51, Marc Hoppins
> wrote:
>
> > Hi all,
> >
> > Do WAL files contain information for multiple regions per WAL or is
> > one WAL associated with one region?
> >
> > I am trying to find a way to clear a RIT for a disabled table. A
> > similar problem (but on a test cluster) involved me clearing znode
> > info, deleting HDFS data for the table and deleting WALs/MasterProcWAL
> > files, finally restarting HBASE service.
> >
> > Table cannot be enabled.
> >
> > Multiple locks exist for DISABLE/ENABLE/UNASSIGN but the system seems
> > mostly unhappy with one region in particular, and is reporting on that.
> >
> > There are many tables that are very active so I don't think it is
> > possible to stop the entire service without a lot of forewarning to
> users.
> >
> > Thanks in advance.
> >
>


Re: HBASE WALs

2021-02-25 Thread Wellington Chevreuil
>
> Do WAL files contain information for multiple regions per WAL or is one
> WAL associated with one region?
>
Edits for multiple regions would be present in a single WAL file. That's
why, upon an RS crash and WAL processing, there's a WAL split phase.
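
This layout is visible directly in HDFS (paths assume the default `/hbase`
root directory):

```shell
# One WAL directory per region server; each file mixes edits from many regions.
hdfs dfs -ls /hbase/WALs

# Fully-processed WALs are archived here before eventual deletion.
hdfs dfs -ls /hbase/oldWALs
```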

I am trying to find a way to clear a RIT for a disabled table. A similar
> problem (but on a test cluster) involved me clearing znode info, deleting
> HDFS data for the table and deleting WALs/MasterProcWAL files, finally
> restarting HBASE service.
>
Which hbase version are you on?

On Thu, Feb 25, 2021 at 11:51, Marc Hoppins
wrote:

> Hi all,
>
> Do WAL files contain information for multiple regions per WAL or is one
> WAL associated with one region?
>
> I am trying to find a way to clear a RIT for a disabled table. A similar
> problem (but on a test cluster) involved me clearing znode info, deleting
> HDFS data for the table and deleting WALs/MasterProcWAL files, finally
> restarting HBASE service.
>
> Table cannot be enabled.
>
> Multiple locks exist for DISABLE/ENABLE/UNASSIGN but the system seems
> mostly unhappy with one region in particular, and is reporting on that.
>
> There are many tables that are very active so I don't think it is possible
> to stop the entire service without a lot of forewarning to users.
>
> Thanks in advance.
>


Re: Removal of table rows.

2020-12-18 Thread Wellington Chevreuil
>
> So, once again, I ask: is a method to remove rows via namespace or table
> name part of any development plan? Or part of hbase operator tools
> development?

For normal operations, this could be done via the "hbase shell" *drop*
command. In this specific case of yours, you can use it to remove the
remaining cells with table state.
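
For reference, that normal-path removal through hbase shell would be (using
the table name from this thread; a table must be disabled before it can be
dropped):

```shell
hbase shell <<'EOF'
disable 'alfa:rfilenameext'
drop 'alfa:rfilenameext'
EOF
```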

As for the fix: apparently all is well. Balancing has restarted, no RITs
> anywhere.
>
Glad to hear that!

 Can anyone clarify what other operations are carried out other than
> removing meta rows?  Eg., is zookeeper shell called to deal with any
> outstanding items?  I saw zk information on screen when I ran tools.
>
ZK is used for orchestration of several internal hbase activities that are
carried out in a distributed way. Sometimes, some of these actions can get
into an inconsistent state and fail to progress, so cleaning out the ZK
info entirely basically does a *reset* of hbase's internal admin state. In
general, it's safe to clean out hbase info in ZK, but there are a few
features that may break if you do so, such as replication.
The meta table is a different story, in short, it's much less safe to
manipulate it manually. It works as an index of all tables in hbase,
containing info about which RegionServer is hosting which portions of
different tables, so that clients can be redirected accordingly when
querying specific tables. Mistakes while manipulating it could cause
temporary/indefinite partial/total user data unavailability, so it's only
advisable to do so if you have a deep understanding of the implications.
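
For completeness, the ZK reset mentioned here is usually done with the
ZooKeeper CLI while HBase is fully stopped. The client name, server address,
and znode path are assumptions (CDH ships a `zookeeper-client` wrapper, and
the parent znode defaults to `/hbase` via `zookeeper.znode.parent`):

```shell
# With HBase stopped, remove its parent znode so state is rebuilt on restart.
# On ZooKeeper 3.4 the command is rmr; newer CLIs use deleteall instead.
zookeeper-client -server zk-host:2181 rmr /hbase
```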

On Fri, Dec 18, 2020 at 07:55, Marc Hoppins
wrote:

> As for the fix: apparently all is well. Balancing has restarted, no RITs
> anywhere.
>
> Can anyone clarify what other operations are carried out other than
> removing meta rows?  Eg., is zookeeper shell called to deal with any
> outstanding items?  I saw zk information on screen when I ran tools.
>
> -Original Message-
> From: Wellington Chevreuil 
> Sent: Thursday, December 17, 2020 8:15 PM
> To: Hbase-User 
> Subject: Re: Removal of table rows.
>
> EXTERNAL
>
> >
> > alfa:extgen,F,1608197544264.ba22f6113a0bb9520cee1f7b30050fa7.
> > column=info:state, timestamp=1608197544904, value=OPEN
> > alfa:rfilenameext column=table:state, timestamp=1604493139455,
> > value=\x08\x00
> > alfa:rfiles column=table:state, timestamp=1602600776355,
> > value=\x08\x00
> >
> These are all table state rows, not table region rows, so
> extraRegionsInMeta would not remove them.
>
>
> > I did restart 'instances' on master(s) so I was surprised to see a
> > lingering row.
> >
> The restart should have cleared out the RITs, but not the table state rows.
> Can you confirm if you still see those regions in Master UI even after
> restarting masters?
>
> On Thu, Dec 17, 2020 at 15:19,  wrote:
>
> Marc, what type of data are you loading into hbase?  Curious..
> > > On 12/17/2020 8:46 AM Marc Hoppins  wrote:
> > >
> > >
> > > I did restart 'instances' on master(s) so I was surprised to see a
> > lingering row.
> > >
> > > -Original Message-
> > > From: Wellington Chevreuil 
> > > Sent: Thursday, December 17, 2020 1:34 PM
> > > To: Hbase-User 
> > > Subject: Re: Removal of table rows.
> > >
> > > EXTERNAL
> > >
> > > Yes, depending on hbase version, you would need a master restart to
> > effectively clean the master cache.
> > >
> > > On Thu, 17 Dec 2020, 10:12 Marc Hoppins,  wrote:
> > >
> > > > OK. I had a meandering circuit to get a version of operator-tools
> > > > built and running.
> > > >
> > > > After running it on the meta table
> > > >
> > > > sudo hbase hbck -j /tmp/hbase-hbck2-1.1.0-SNAPSHOT.jar
> > > > extraRegionsInMeta alfa:rfilenameext --fix
> > > >
> > > > I have ended up with:-
> > > >
> > > > alfa:extgen,F,1608197544264.ba22f6113a0bb9520cee1f7b30050fa7.
> > > > column=info:state, timestamp=1608197544904, value=OPEN
> > > > alfa:rfilenameext column=table:state, timestamp=1604493139455,
> > > > value=\x08\x00
> > > > alfa:rfiles column=table:state, timestamp=1602600776355,
> > > > value=\x08\x00
> > > >
> > > > I still have one row referring to the missing table.  Do I need to
> > > > restart hbase service to remove this or will it vanish at some time?
> > > >
> > > > HBASE on the master still shows the 48 regions in transition when
> > > > I open the interface.

Re: Removal of table rows.

2020-12-17 Thread Wellington Chevreuil
>
> alfa:extgen,F,1608197544264.ba22f6113a0bb9520cee1f7b30050fa7.
> column=info:state, timestamp=1608197544904, value=OPEN
> alfa:rfilenameext column=table:state, timestamp=1604493139455,
> value=\x08\x00
> alfa:rfiles column=table:state, timestamp=1602600776355, value=\x08\x00
>
These are all table state rows, not table region rows, so
extraRegionsInMeta would not remove them.
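
The difference shows up in the meta row keys: a table state row is keyed by
the table name alone and carries a `table:state` column, while region rows
embed start key, timestamp, and encoded name and carry `info:` columns. A
read-only sketch using names from this thread:

```shell
hbase shell <<'EOF'
# Table state row (row key is just the table name):
get 'hbase:meta', 'alfa:rfilenameext', 'table:state'
# Region rows for the same table (row keys start with "<table>,"):
scan 'hbase:meta', {ROWPREFIXFILTER => 'alfa:rfilenameext,', COLUMNS => ['info:state']}
EOF
```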


> I did restart 'instances' on master(s) so I was surprised to see a
> lingering row.
>
The restart should have cleared out the RITs, but not the table state rows.
Can you confirm if you still see those regions in the Master UI even after
restarting masters?

On Thu, Dec 17, 2020 at 15:19,  wrote:

> Marc, what type of data are you loading into hbase?  Curious..
> > On 12/17/2020 8:46 AM Marc Hoppins  wrote:
> >
> >
> > I did restart 'instances' on master(s) so I was surprised to see a
> lingering row.
> >
> > -Original Message-
> > From: Wellington Chevreuil 
> > Sent: Thursday, December 17, 2020 1:34 PM
> > To: Hbase-User 
> > Subject: Re: Removal of table rows.
> >
> > EXTERNAL
> >
> > Yes, depending on hbase version, you would need a master restart to
> effectively clean the master cache.
> >
> > On Thu, 17 Dec 2020, 10:12 Marc Hoppins,  wrote:
> >
> > > OK. I had a meandering circuit to get a version of operator-tools
> > > built and running.
> > >
> > > After running it on the meta table
> > >
> > > sudo hbase hbck -j /tmp/hbase-hbck2-1.1.0-SNAPSHOT.jar
> > > extraRegionsInMeta alfa:rfilenameext --fix
> > >
> > > I have ended up with:-
> > >
> > > alfa:extgen,F,1608197544264.ba22f6113a0bb9520cee1f7b30050fa7.
> > > column=info:state, timestamp=1608197544904, value=OPEN
> > > alfa:rfilenameext column=table:state, timestamp=1604493139455,
> > > value=\x08\x00
> > > alfa:rfiles column=table:state, timestamp=1602600776355,
> > > value=\x08\x00
> > >
> > > I still have one row referring to the missing table.  Do I need to
> > > restart hbase service to remove this or will it vanish at some time?
> > >
> > > HBASE on the master still shows the 48 regions in transition when I
> > > open the interface.  I assume this is because the service has not been
> restarted.
> > >
> > > M
> > >
> > > -Original Message-
> > > From: Wellington Chevreuil 
> > > Sent: Wednesday, December 16, 2020 3:39 PM
> > > To: Hbase-User 
> > > Subject: Re: Removal of table rows.
> > >
> > > EXTERNAL
> > >
> > > >
> > > > Do I build hbck2-tools on similar OS & java version?  I have been
> > > > informed we have 'maven' installed on one host on a cluster, which
> > > > is centos and
> > > > (probably) a different java, and the build itself is needed for
> Ubuntu16.
> > >
> > > I would stick to the same java major version. OS-wise, this module
> > > doesn't rely on any OS native call, AFAIK, so it should work fine between
> > > these different versions.
> > >
> > > On Wed, Dec 16, 2020 at 13:30, Marc Hoppins
> > > 
> > > wrote:
> > >
> > > > Thanks. Once again, a newbie in this regard but,
> > > >
> > > > Do I build hbck2-tools on similar OS & java version?  I have been
> > > > informed we have 'maven' installed on one host on a cluster, which
> > > > is centos and
> > > > (probably) a different java, and the build itself is needed for
> Ubuntu16.
> > > >
> > > > -Original Message-
> > > > From: Wellington Chevreuil 
> > > > Sent: Wednesday, December 16, 2020 10:08 AM
> > > > To: Hbase-User 
> > > > Subject: Re: Removal of table rows.
> > > >
> > > > EXTERNAL
> > > >
> > > > >
> > > > > Hbase:meta has info for one table region which is NOT on a master.
> > > > > Is that correct? I would have expected all meta info to be stored
> > > > > on a
> > > > master.
> > > > >
> > > > meta table is an "hbase system table" that has info about which
> > > > regions are assigned to which region servers in your cluster. As any
> > > > user table, it also has a region, and this region must be hosted on
> > > > a region server in the cluster. Master is just responsible for the
> > > > coordination of regions and some other housekeeping actions.

Re: Removal of table rows.

2020-12-17 Thread Wellington Chevreuil
Yes, depending on hbase version, you would need a master restart to
effectively clean the master cache.

On Thu, 17 Dec 2020, 10:12 Marc Hoppins,  wrote:

> OK. I had a meandering circuit to get a version of operator-tools built
> and running.
>
> After running it on the meta table
>
> sudo hbase hbck -j /tmp/hbase-hbck2-1.1.0-SNAPSHOT.jar extraRegionsInMeta
> alfa:rfilenameext --fix
>
> I have ended up with:-
>
> alfa:extgen,F,1608197544264.ba22f6113a0bb9520cee1f7b30050fa7.
> column=info:state, timestamp=1608197544904, value=OPEN
> alfa:rfilenameext column=table:state, timestamp=1604493139455,
> value=\x08\x00
> alfa:rfiles column=table:state, timestamp=1602600776355, value=\x08\x00
>
> I still have one row referring to the missing table.  Do I need to restart
> hbase service to remove this or will it vanish at some time?
>
> HBASE on the master still shows the 48 regions in transition when I open
> the interface.  I assume this is because the service has not been restarted.
>
> M
>
> -Original Message-
> From: Wellington Chevreuil 
> Sent: Wednesday, December 16, 2020 3:39 PM
> To: Hbase-User 
> Subject: Re: Removal of table rows.
>
> EXTERNAL
>
> >
> > Do I build hbck2-tools on similar OS & java version?  I have been
> > informed we have 'maven' installed on one host on a cluster, which is
> > centos and
> > (probably) a different java, and the build itself is needed for Ubuntu16.
>
> I would stick to the same java major version. OS-wise, this module doesn't
> rely on any OS native call, AFAIK, so it should work fine between these
> different versions.
>
> On Wed, Dec 16, 2020 at 13:30, Marc Hoppins
> wrote:
>
> > Thanks. Once again, a newbie in this regard but,
> >
> > Do I build hbck2-tools on similar OS & java version?  I have been
> > informed we have 'maven' installed on one host on a cluster, which is
> > centos and
> > (probably) a different java, and the build itself is needed for Ubuntu16.
> >
> > -Original Message-
> > From: Wellington Chevreuil 
> > Sent: Wednesday, December 16, 2020 10:08 AM
> > To: Hbase-User 
> > Subject: Re: Removal of table rows.
> >
> > EXTERNAL
> >
> > >
> > > Hbase:meta has info for one table region which is NOT on a master.
> > > Is that correct? I would have expected all meta info to be stored on
> > > a
> > master.
> > >
> > meta table is an "hbase system table" that has info about which
> > regions are assigned to which region servers in your cluster. As any
> > user table, it also has a region, and this region must be hosted on a
> > region server in the cluster. Master is just responsible for the
> > coordination of regions and some other housekeeping actions.
> >
> > As I state below, a simple method to remove everything connected to
> > > alfa:rfilenameext would seem the easiest of tasks as the
> > > namespace:name consistently appears on every relevant row.
> >
> > Looks like hbck2 *extraRegionsInMeta* is what you need here, and it
> > should work with hbase 2.1. It's not available in the hbck2 1.0 release,
> > but you can build a new hbck2 jar out of the current master branch, and
> > that would give you an hbck2 with extraRegionsInMeta. You can do it by
> > cloning the repo below, then do a *mvn install* from the main module:
> >
> > https://github.com/apache/hbase-operator-tools.git
> >
> > On Wed, Dec 16, 2020 at 08:02, Marc Hoppins
> > 
> > wrote:
> >
> > > Hbase is 2.1.0 (via Cloudera CDH 6.3.2)
> > >
> > > The UI page for the master shows NO report links/tabs.
> > >
> > > Hbase:meta has info for one table region which is NOT on a master.
> > > Is that correct? I would have expected all meta info to be stored on
> > > a
> > master.
> > >
> > > NameRegion Server
> > > Read RequestsWrite Requests
> > > Num.Storefiles  MemSize Locality
> > > hbase:meta,,1.1588230740ba-wtmp05.asgardalfa.hq.com:16030
> > >  219,168,603 35,419  1 MB  4 0 B
> > >  1.0
> > >
> > > -Original Message-
> > > From: Wellington Chevreuil 
> > > Sent: Tuesday, December 15, 2020 7:10 PM
> > > To: Hbase-User 
> > > Subject: Re: Removal of table rows.
> > >
> > > EXTERNAL
> > >
> > > >
> > > > I am an HBASE newbie so I apologise if I am being repetitious.

Re: Removal of table rows.

2020-12-16 Thread Wellington Chevreuil
>
> Do I build hbck2-tools on similar OS & java version?  I have been informed
> we have 'maven' installed on one host on a cluster, which is centos and
> (probably) a different java, and the build itself is needed for Ubuntu16.

I would stick to the same java major version. OS-wise, this module doesn't
rely on any OS native call, AFAIK, so it should work fine between these
different versions.
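
The build itself is short; a sketch using the repository this thread points
at (run with the same JDK major version as the target cluster; the exact
snapshot version in the jar name depends on the branch):

```shell
git clone https://github.com/apache/hbase-operator-tools.git
cd hbase-operator-tools
mvn install -DskipTests
# Result lands under e.g. hbase-hbck2/target/hbase-hbck2-<version>-SNAPSHOT.jar
```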

On Wed, Dec 16, 2020 at 13:30, Marc Hoppins
wrote:

> Thanks. Once again, a newbie in this regard but,
>
> Do I build hbck2-tools on similar OS & java version?  I have been informed
> we have 'maven' installed on one host on a cluster, which is centos and
> (probably) a different java, and the build itself is needed for Ubuntu16.
>
> -Original Message-
> From: Wellington Chevreuil 
> Sent: Wednesday, December 16, 2020 10:08 AM
> To: Hbase-User 
> Subject: Re: Removal of table rows.
>
> EXTERNAL
>
> >
> > Hbase:meta has info for one table region which is NOT on a master. Is
> > that correct? I would have expected all meta info to be stored on a
> master.
> >
> meta table is an "hbase system table" that has info about which regions
> are assigned to which region servers in your cluster. As any user table, it
> also has a region, and this region must be hosted on a region server in the
> cluster. Master is just responsible for the coordination of regions and
> some other housekeeping actions.
>
> As I state below, a simple method to remove everything connected to
> > alfa:rfilenameext would seem the easiest of tasks as the
> > namespace:name consistently appears on every relevant row.
>
> Looks like hbck2 *extraRegionsInMeta* is what you need here, and it
> should work with hbase 2.1. It's not available in the hbck2 1.0 release, but
> you can build a new hbck2 jar out of the current master branch, and that
> would give you an hbck2 with extraRegionsInMeta. You can do it by cloning the
> repo below, then do a *mvn install* from the main module:
>
> https://github.com/apache/hbase-operator-tools.git
>
> On Wed, Dec 16, 2020 at 08:02, Marc Hoppins
> wrote:
>
> > Hbase is 2.1.0 (via Cloudera CDH 6.3.2)
> >
> > The UI page for the master shows NO report links/tabs.
> >
> > Hbase:meta has info for one table region which is NOT on a master. Is
> > that correct? I would have expected all meta info to be stored on a
> master.
> >
> > NameRegion Server
> > Read RequestsWrite Requests
> > Num.Storefiles      MemSize Locality
> > hbase:meta,,1.1588230740ba-wtmp05.asgardalfa.hq.com:16030
> >  219,168,603 35,419  1 MB  4 0 B
> >  1.0
> >
> > -Original Message-
> > From: Wellington Chevreuil 
> > Sent: Tuesday, December 15, 2020 7:10 PM
> > To: Hbase-User 
> > Subject: Re: Removal of table rows.
> >
> > EXTERNAL
> >
> > >
> > > I am an HBASE newbie so I apologise if I am being repetitious.
> > >
> > > Apologies also if this is not the right group. Am not sure if this
> > > may be more suited to 'dev' list.
> > >
> > Welcome, this is the right channel for this kind of question.
> >
> >
> > > The solution offered by hbase-operator-tools  - extraRegionsinMeta -
> > > offered hope.  Once again, however, another problem has surfaced:
> > > this tools command for extra regions is incompatible with the hbase
> > > version we are running.
> > >
> >  This command does not rely on any Master/RegionServer interface, so
> > it should not have any incompatibility issues. It only uses the public
> > client API to clean up the meta table, so maybe you just need to build
> > the latest hbck2 master branch version. It should work for any HBase 2;
> > in theory it could even work with HBase 1, but that was never tested.
> > Luckily, you got Stack's attention, so if you answer his previous
> > questions we might be able to help with further directions.
> >
> > On Tue, Dec 15, 2020 at 17:23, Stack wrote:
> >
> > > On Tue, Dec 15, 2020 at 7:46 AM Marc Hoppins 
> > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I am an HBASE newbie so I apologise if I am being repetitious.
> > > >
> > > > Apologies also if this is not the right group. Am not sure if this
> > > > may be more suited to 'dev' list.  However,
> > > >
> > > > A

Re: Removal of table rows.

2020-12-16 Thread Wellington Chevreuil
>
> Hbase:meta has info for one table region which is NOT on a master. Is that
> correct? I would have expected all meta info to be stored on a master.
>
The meta table is an HBase system table that has info about which regions
are assigned to which region servers in your cluster. Like any user table,
it also has a region, and this region must be hosted on a region server in
the cluster. The Master is just responsible for the coordination of regions
and some other housekeeping actions.

As I state below, a simple method to remove everything connected to
> alfa:rfilenameext would seem the easiest of tasks as the namespace:name
> consistently appears on every relevant row.

 Looks like hbck2 *extraRegionsinMeta* is what you need here, and it should
work with HBase 2.1. It's not available in the hbck2 1.0 release, but you
can build a new hbck2 jar from the current master branch, which would give
you an hbck2 with extraRegionsinMeta. You can do that by cloning the repo
below, then doing a *mvn install* from the main module:

https://github.com/apache/hbase-operator-tools.git
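Putting those steps together, a build-and-run sketch (untested; the jar
path, version, and the `default:my_table` target are assumptions — check
the artifact actually produced under hbase-hbck2/target/, and note the
command is spelled extraRegionsInMeta in recent hbck2 releases):

```shell
# Clone and build hbase-operator-tools (needs Maven and a JDK)
git clone https://github.com/apache/hbase-operator-tools.git
cd hbase-operator-tools
mvn install -DskipTests

# Report hbase:meta rows for regions that no longer exist on HDFS;
# add --fix to actually delete those rows from hbase:meta.
hbase hbck -j hbase-hbck2/target/hbase-hbck2-1.1.0-SNAPSHOT.jar \
  extraRegionsInMeta default:my_table
```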

On Wed, Dec 16, 2020 at 08:02, Marc Hoppins wrote:

> Hbase is 2.1.0 (via Cloudera CDH 6.3.2)
>
> The UI page for the master shows NO report links/tabs.
>
> Hbase:meta has info for one table region which is NOT on a master. Is that
> correct? I would have expected all meta info to be stored on a master.
>
> NameRegion Server
> Read RequestsWrite Requests
> Num.Storefiles  MemSize Locality
> hbase:meta,,1.1588230740ba-wtmp05.asgardalfa.hq.com:16030
>  219,168,603 35,419  1 MB  4     0 B
>  1.0
>
> -Original Message-
> From: Wellington Chevreuil 
> Sent: Tuesday, December 15, 2020 7:10 PM
> To: Hbase-User 
> Subject: Re: Removal of table rows.
>
> EXTERNAL
>
> >
> > I am an HBASE newbie so I apologise if I am being repetitious.
> >
> > Apologies also if this is not the right group. Am not sure if this may
> > be more suited to 'dev' list.
> >
> Welcome, this is the right channel for this kind of question.
>
>
> > The solution offered by hbase-operator-tools  - extraRegionsinMeta -
> > offered hope.  Once again, however, another problem has surfaced: this
> > tools command for extra regions is incompatible with the hbase version
> > we are running.
> >
>  This command does not rely on any Master/RegionServer interface, so it
> should not have any incompatibility issues. It only uses the public client
> API to clean up the meta table, so maybe you just need to build the latest
> hbck2 master branch version. It should work for any HBase 2; in theory it
> could even work with HBase 1, but that was never tested. Luckily, you got
> Stack's attention, so if you answer his previous questions we might be
> able to help with further directions.
>
> On Tue, Dec 15, 2020 at 17:23, Stack wrote:
>
> > On Tue, Dec 15, 2020 at 7:46 AM Marc Hoppins 
> wrote:
> >
> > > Hi all,
> > >
> > > I am an HBASE newbie so I apologise if I am being repetitious.
> > >
> > > Apologies also if this is not the right group. Am not sure if this
> > > may be more suited to 'dev' list.  However,
> > >
> > > A problem question and a technical/wishlist question.
> > >
> > > Problem:
> > >
> > > I have inherited a problem with an HBASE table. The original issue
> > > may have erupted due to a network outage.  A table has 48 region in
> > transition
> > > operations, stuck that way for several weeks.  A previous attempt to
> > > fix things with hbck failed. An attempt to DISABLE then DROP the
> > > table also failed. The four or five attempts to work the table ALSO
> > > now had stuck procedures.  Subsequent DFS and ZOO operations left
> > > the situation where there was no data and no real table: just a 6K
> > > file in an empty
> > structure.
> > >
> > >
> > Which version of hbase, do you know Marc? (Look at the base of the
> > master UI. It'll tell you. Sounds like it is an hbase-2.x).
> >
> > What is the name of the 6k file? (I am trying to understand what the
> > file you are referencing is).
> >
> > The table directory was removed from hdfs and zookeeper?
> >
> Yes.   The data had already been removed from the previous tech attempts
> to fix the issue.
>
> Thus, the HDFS 6K file structure contained no actual information, just
> empty files and directories.  Zkshell had been used to remove znodes.
>
>

Re: Removal of table rows.

2020-12-15 Thread Wellington Chevreuil
>
> I am an HBASE newbie so I apologise if I am being repetitious.
>
> Apologies also if this is not the right group. Am not sure if this may be
> more suited to 'dev' list.
>
Welcome, this is the right channel for this kind of question.


> The solution offered by hbase-operator-tools  - extraRegionsinMeta -
> offered hope.  Once again, however, another problem has surfaced: this
> tools command for extra regions is incompatible with the hbase version we
> are running.
>
 This command does not rely on any Master/RegionServer interface, so it
should not have any incompatibility issues. It only uses the public client
API to clean up the meta table, so maybe you just need to build the latest
hbck2 master branch version. It should work for any HBase 2; in theory it
could even work with HBase 1, but that was never tested. Luckily, you got
Stack's attention, so if you answer his previous questions we might be able
to help with further directions.

On Tue, Dec 15, 2020 at 17:23, Stack wrote:

> On Tue, Dec 15, 2020 at 7:46 AM Marc Hoppins  wrote:
>
> > Hi all,
> >
> > I am an HBASE newbie so I apologise if I am being repetitious.
> >
> > Apologies also if this is not the right group. Am not sure if this may be
> > more suited to 'dev' list.  However,
> >
> > A problem question and a technical/wishlist question.
> >
> > Problem:
> >
> > I have inherited a problem with an HBASE table. The original issue may
> > have erupted due to a network outage.  A table has 48 region in
> transition
> > operations, stuck that way for several weeks.  A previous attempt to fix
> > things with hbck failed. An attempt to DISABLE then DROP the table also
> > failed. The four or five attempts to work the table ALSO now had stuck
> > procedures.  Subsequent DFS and ZOO operations left the situation where
> > there was no data and no real table: just a 6K file in an empty
> structure.
> >
> >
> Which version of hbase, do you know Marc? (Look at the base of the master
> UI. It'll tell you. Sounds like it is an hbase-2.x).
>
> What is the name of the 6k file? (I am trying to understand what the file
> you are referencing is).
>
> The table directory was removed from hdfs and zookeeper?
>
>
>
> > When I got to the problem my knowledge of HBASE was nil. It is little
> > better than that now but anyway...
> >
> > Fortunately for me this is a testing/dev cluster. The 'owner' was content
> > that the table can be removed - and appeared to have already been
> done...of
> > a kind.
> >
> > Reading and reading of others' similar issues lead me to the point I also
> > was going to clean the HDFS data and ZK data for this table.  I shut down
> > HBASE, cleaned HDFS and ZK node data, deleted the masterprocwals and
> > restarted HBASE.
> >
> > When all came up I was happy to see that the affected table appeared
> > nowhere and that the procedures had all disappeared.
> >
> > However, when I hopped to hbase master, even though no table of that name
> > existed, 48 regions were still in transition.  Further research steered
> me
> > toward hbase:meta and sure enough, the references to the RITs lived
> happily
> > among other data for other tables.
> >
> >
> Do you have a 'Procedures and Locks' tab on your master home page? If so,
> what does it report? Is there an "HBCK Report" tab? If so, what does it
> show?
>
>
>
> > The solution offered by hbase-operator-tools  - extraRegionsinMeta -
> > offered hope.  Once again, however, another problem has surfaced: this
> > tools command for extra regions is incompatible with the hbase version we
> > are running.
> >
> > So...
> >
> > How can I remove the references to namespace:kaput_table from hbase:meta?
> >
> >
> > Sounds like an hbase-2.1.x or a 2.0.x.
>
> Will wait on your answers to the above... It might be a crass delete of
> each row from the hbase:meta table then restart (even then, if procedures
> in the procedure store, you may have to clear it again as you did above
> before the restart to purge the procedures as you don't have tooling to do
> it from cmdline... do you have an 'hbck2 bypass --override'?).
>
>
>
>
> > Technical:
> >
> > Is there to be any implementation of such a fix within HBASe itself where
> > table manipulation can be performed by Eg.,
> >
> > delete hbase:meta namespace
> > delete hbase:meta  namespace:table
> >
> > or even
> >
> > scan hbase:meta filter = namespace:table | deleterow
> >
> > or some such?
> >
> >
> Scan doesn't return result (unfortunately) so you can assign to a shell
> variable; it dumps scan output2 on stdout/stderr.
>
> I'm sure there is a better way but something like below (don't laugh! and
> I've not really tried it so be careful):
>
> # Get rows for the BAD_TABLENAME from hbase:meta table
> $ echo 'scan "hbase:meta", {ROWPREFIXFILTER => "BAD_TABLENAME"}'| hbase
> shell > /tmp/out.txt
> # Need to get the row only. Above got rows and columns. We don't want to
> filter on column
> # because we do not know which columns are still in hbase:meta. Ne
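A hedged completion of the truncated sketch above (untested; BAD_TABLENAME
stands in for the dropped table's name, `-n` is the hbase shell's
non-interactive flag, and the awk pattern assumes the default scan output
format where each result line begins with a space followed by the row key):

```shell
# 1. Dump the hbase:meta rows for the dead table
echo 'scan "hbase:meta", {ROWPREFIXFILTER => "BAD_TABLENAME"}' \
  | hbase shell -n > /tmp/out.txt

# 2. Keep only the unique row keys (first whitespace-separated column)
awk '/^ BAD_TABLENAME/ {print $1}' /tmp/out.txt | sort -u > /tmp/rows.txt

# 3. Delete each row from hbase:meta, then restart as discussed above
while read -r row; do
  echo "deleteall \"hbase:meta\", \"$row\"" | hbase shell -n
done < /tmp/rows.txt
```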

Re: [ANNOUNCE] New HBase committer Yulin Niu

2020-12-03 Thread Wellington Chevreuil
Congratulations!

On Thu, Dec 3, 2020 at 14:56, zheng wang <18031...@qq.com> wrote:

> Congratulations!
>
>
>
>
> -- Original Message --
> From: "user" <palomino...@gmail.com>;
> Sent: Thursday, Dec 3, 2020, 6:18 PM
> To: "HBase Dev List"; Cc: "Hbase-User"; Subject: Re: [ANNOUNCE] New HBase
> committer Yulin Niu
>
>
>
> Congratulations!
>
> Guanghao Zhang 
> > Folks,
> >
> > On behalf of the Apache HBase PMC I am pleased to announce that Yulin
> Niu
> > has accepted the PMC's invitation to become a committer on the
> project.
> >
> > We appreciate all of the great contributions Yulin has made to the
> > community thus far and we look forward to his continued involvement.
> >
> > Allow me to be the first to congratulate Yulin on his new role!
> >
> > Thanks.
> >


Re: [ANNOUNCE] New HBase committer Xin Sun

2020-12-03 Thread Wellington Chevreuil
Congratulations and welcome!

On Thu, Dec 3, 2020 at 09:13, Guanghao Zhang wrote:

> Folks,
>
> On behalf of the Apache HBase PMC I am pleased to announce that Xin Sun has
> accepted the PMC's invitation to become a committer on the project.
>
> We appreciate all of the great contributions Xin Sun has made to the
> community thus far and we look forward to his continued involvement.
>
> Allow me to be the first to congratulate Xin Sun on his new role!
>
> Thanks.
>


Re: Ghost Regions Problem

2020-08-05 Thread Wellington Chevreuil
I mentioned assigns as a possible solution for your original issue (before
you dropped/recreated/bulkloaded the original table). It obviously will
never work for these "ghost" regions, because they don't belong to any
table.

Yes, a rolling restart of the masters will make them read state from meta
again. Can you confirm how you originally cleaned up the original problem,
especially whether you manually deleted regions from meta?
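For the rolling-restart question, the stock script can cycle only the
master processes (a sketch; it assumes the standard $HBASE_HOME/bin scripts
and passwordless SSH between the nodes listed in your conf):

```shell
# Restart active and backup HMasters one at a time; on startup each
# rebuilds its in-memory region state from hbase:meta.
"$HBASE_HOME"/bin/rolling-restart.sh --master-only
```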

On Wed, 5 Aug 2020, 08:43 jackie macmillian, 
wrote:

> Thanks for your response Wellington.
>
> hbck2 assigns method does not work here unfortunately due to lack of table
> descriptor, both in meta table and in-memory. The actual table and most of
> the regions that table had been dropped successfully. When you try to
> assign those remaining ghost regions, they are stuck as on HBASE-22780.
>
> One way to get rid of those regions is to create a new table with its old
> name. Suppose you have 4 ghost regions. If you create a 1-region table,
> those 4 ghosts go after that 1 region composing a 5-regioned table. After
> that you are able to disable that table and drop the table successfully. On
> the contrary, as we have many tables & regions like this, it is so hard to
> explore them.
>
> To cut a long story short, the hmaster is assuming its in-memory
> representation of the meta table is intact, but in fact it is not. I need a
> way to force all masters to rebuild their in-memory representations from
> clean hbase:meta table. Does a rolling restart of all masters do that or do
> I have to shut all masters down to force them to proceed with
> initialization on startup?
>
> Wellington Chevreuil wrote on Tue, Aug 4, 2020 at 16:42:
>
> > >
> > >  if you use hbck2 to bypass
> > > those locks but leave them as they are, it would be only a cosmetic
> move,
> > > regions won't become online in real
> > >
> > You can use the hbck2 *assigns* method to bring those regions online
> > (it accepts multiple regions as input)
> >
> >  i've read that master processes have some in-memory representation
> > > of hbase:meta table
> > >
> > Yes, masters read the meta table only during initialisation; from there
> > onwards, since every change to meta is orchestrated by the active
> > master, it assumes its in-memory representation of the meta table is the
> > truth. What exact steps did you follow when you say you had dropped
> > those ghost regions? If that involved any manual deletion of region
> > dirs/files in HDFS, or direct manipulation of the meta table via the
> > client API, then that explains the master inconsistency.
> >
> >
> > On Tue, Aug 4, 2020 at 12:51, jackie macmillian <
> > jackie.macmill...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > we have a cluster with hbase 2.2.0 installed on hadoop 2.9.2.
> > > a few weeks ago, we had some issues on our active/standby namenode
> > > selection due to some network problems and their zkfc services'
> > competition
> > > to select the active namenode. as a result, both our namenodes became
> > > active for a short time and all region server services restarted
> > > themselves. we achieved to solve that issue with some arrangements on
> > > timeout parameters. but the story began afterwards.
> > > after the region servers completed their reset tasks, we saw that all
> our
> > > hbase tables became unstable. for example, think about a 200
> regions-wide
> > > table. 196 regions of that table got online, but 4 regions stuck at an
> > > intermediate state like closing/opening. at the end, the tables stuck
> at
> > > disabling/enabling states. on the other hand, hbase had lots of
> procedure
> > > locks and masterprocwals directory kept enlarging.
> > > to overcome that issue, i used hbck2 to release stuck regions and once
> i
> > > managed to enable the table, i created an empty copy of that table from
> > its
> > > descriptor and bulk loaded all hfiles of that corrupt table to the new
> > one.
> > > at this point, you would ask why i did not use that enabled table. i
> > > couldn't because although i was able to bypass the locked procedures
> > there
> > > were so many of them to resolve one by one. if you use hbck2 to bypass
> > > those locks but leave them as they are, it would be only a cosmetic
> move,
> > > regions won't become online in real. so i thought it would be much more
> > > faster to create a brand new one and load all the data to that table.
> > bulk
> > > load was successfu

Re: Ghost Regions Problem

2020-08-04 Thread Wellington Chevreuil
>
>  if you use hbck2 to bypass
> those locks but leave them as they are, it would be only a cosmetic move,
> regions won't become online in real
>
You can use the hbck2 *assigns* method to bring those regions online (it
accepts multiple regions as input)
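As a sketch of that invocation (the jar path and the two encoded region
names are placeholders, not real values):

```shell
# assigns takes one or more *encoded* region names (the 32-character hex
# hash from the region name / region directory) and schedules assign
# procedures for them.
hbase hbck -j hbase-hbck2.jar assigns \
  de00010733901a05f5a2a3a382e27dd4 1588230740aabbccddeeff001122334455
```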

 i've read that master processes have some in-memory representation
> of hbase:meta table
>
Yes, masters read the meta table only during initialisation; from there
onwards, since every change to meta is orchestrated by the active master,
it assumes its in-memory representation of the meta table is the truth.
What exact steps did you follow when you say you had dropped those ghost
regions? If that involved any manual deletion of region dirs/files in HDFS,
or direct manipulation of the meta table via the client API, then that
explains the master inconsistency.


On Tue, Aug 4, 2020 at 12:51, jackie macmillian <
jackie.macmill...@gmail.com> wrote:

> Hi all,
>
> we have a cluster with hbase 2.2.0 installed on hadoop 2.9.2.
> a few weeks ago, we had some issues on our active/standby namenode
> selection due to some network problems and their zkfc services' competition
> to select the active namenode. as a result, both our namenodes became
> active for a short time and all region server services restarted
> themselves. we achieved to solve that issue with some arrangements on
> timeout parameters. but the story began afterwards.
> after the region servers completed their reset tasks, we saw that all our
> hbase tables became unstable. for example, think about a 200 regions-wide
> table. 196 regions of that table got online, but 4 regions stuck at an
> intermediate state like closing/opening. at the end, the tables stuck at
> disabling/enabling states. on the other hand, hbase had lots of procedure
> locks and masterprocwals directory kept enlarging.
> to overcome that issue, i used hbck2 to release stuck regions and once i
> managed to enable the table, i created an empty copy of that table from its
> descriptor and bulk loaded all hfiles of that corrupt table to the new one.
> at this point, you would ask why i did not use that enabled table. i
> couldn't because although i was able to bypass the locked procedures there
> were so many of them to resolve one by one. if you use hbck2 to bypass
> those locks but leave them as they are, it would be only a cosmetic move,
> regions won't become online in real. so i thought it would be much more
> faster to create a brand new one and load all the data to that table. bulk
> load was successful and the new table became online and scannable. the next
> point was to disable the old one and drop it. but, as hmaster was dealing
> lots of locks and procedures, i wasn't able to disable the old table. some
> regions remain in disabling state again. so i decided to set that table's
> state to disabled with hbck2 and then i succeeded to drop them.
> after i put all my tables to online and all my old tables dropped
> successfully, masterprocwals was the last stop to a clean hbase, i thought
> :) i moved aside masterprocwals directory and restarted the active master.
> the new master took control and voila! master procedures & locks became
> clear, and all my tables were online as needed! i scanned hbase:meta table
> and saw there is no other regions than the ones online.
> until now.. remember those regions who were stuck and forced to close to
> disable and drop the tables? when a region server is crashed and restarted
> for some reason now, those regions are tried to be assigned by the master
> to region servers. but region servers decline that assignment as there is
> no table descriptor for those regions. Take a look at HBASE-22780:
> exactly the same problem is described there.
> i tried to create a 1-regioned table with the same name as the old table.
> it succeeded. and the ghost region followed that table. then disabled and
> dropped them again successfully. and again explored that hbase:meta doesn't
> have that region anymore. but after a region server crash it comes again
> from nowhere. so i figured out that when a region server comes down hmaster
> does not read hbase:meta table to assign that server's regions to other
> servers. i've read that master processes have some in-memory representation
> of hbase:meta table in order to perform assignment issues as fast as
> possible. i would clean hbase:meta from those ghost regions as explained,
> but i have to force the masters to get this clean copy of hbase:meta to
> their in-memory representations. how can i achieve that? assume that i have
> cleared meta table and now what? rolling restart of hmasters? do standby
> masters share the same in-memory meta table with the active one? if that's
> the case i think rolling restart wouldn't solve that problem.. or should i
> shut all masters down and then start them again in order to force them to
> rebuild their in-memories from meta table?
> any helps would be appreciated.
> thank y

Re: Violating strong consistency after the fact

2020-07-03 Thread Wellington Chevreuil
For details about the HDFS write process:
https://blog.cloudera.com/understanding-hdfs-recovery-processes-part-1/

On Fri, Jul 3, 2020 at 15:21, Paul Carey wrote:

> That's very helpful, many thanks.
>
> On Fri, Jul 3, 2020 at 2:36 PM 张铎(Duo Zhang) 
> wrote:
> >
> > You can see my design doc for async dfs output
> >
> >
> https://docs.google.com/document/d/11AyWtGhItQs6vsLRIx32PwTxmBY3libXwGXI25obVEY/edit#heading=h.2jvw6cxnmirr
> >
> >
> > See the footnote below section 3.4. For the current HDFS pipeline
> > implementation, it could be a problem for replication in HBase, though it
> > rarely happens.
> >
> > And now HBase has its own AsyncFSWAL implementation; HBASE-14004 was
> > used to resolve the problem (although later we got things wrong, and
> > HBASE-24625 is the fix).
> >
> > And for WAL recovery, it will not be a problem. We will only return
> > success to the client after all the replicas have been successfully
> > committed, so if DN2 goes offline, we will close the current file and
> > commit it, and open a new file to write the WAL.
> >
> > Thanks.
> >
> > Paul Carey wrote on Fri, Jul 3, 2020 at 7:40 PM:
> >
> > > >  If the HDFS write succeeded while you had only one DN available,
> > > > then the other replica on the offline DN would be invalid now.
> > >
> > > Interesting, I wasn't aware of this. Are there any docs you could
> > > point me towards where this is described? I've had a look in Hadoop:
> > > The Definitive Guide and the official docs, but hadn't come across
> > > this.
> > >
> > > On Fri, Jul 3, 2020 at 11:19 AM Wellington Chevreuil
> > >  wrote:
> > > >
> > > > This is actually an HDFS consistency question, not HBase. If the
> > > > HDFS write succeeded while you had only one DN available, then the
> > > > other replica on the offline DN would be invalid now. What you have
> > > > then is an under-replicated block, and if your only available DN
> > > > goes offline before it could be replicated, the file that block
> > > > belongs to is now corrupt. If you turn the previously offline DN
> > > > back on, it would still be corrupt, as the replica it has is no
> > > > longer valid (the NN knows which is the last valid version of the
> > > > replica), so unless you can bring back the DN that has the only
> > > > valid replica, your HFile is corrupt and your data is lost.
> > > >
> > > > On Fri, 3 Jul 2020, 09:12 Paul Carey, 
> wrote:
> > > >
> > > > > Hi
> > > > >
> > > > > I'd like to understand how HBase deals with the situation where the
> > > > > only available DataNodes for a given offline Region contain stale
> > > > > data. Will HBase allow the Region to be brought online again,
> > > > > effectively making the inconsistency permanent, or will it refuse
> to
> > > > > do so?
> > > > >
> > > > > My question is motivated by seeing how Kafka and Elasticsearch
> > > > > handle this scenario. They both allow the inconsistency to become
> > > > > permanent, Kafka via unclean leader election, and Elasticsearch via
> > > > > the allocate_stale_primary command.
> > > > >
> > > > > To better understand my question, please consider the following
> > > example:
> > > > >
> > > > > - HDFS is configured with `dfs.replication=2` and
> > > > > `dfs.namenode.replication.min=1`
> > > > > - DataNodes DN1 and DN2 contain the blocks for Region R1
> > > > > - DN2 goes offline
> > > > > - R1 receives a write, which succeeds as it can be written
> > > > > successfully to DN1
> > > > > - DN1 goes offline before the NameNode can replicate the
> > > > > under-replicated block containing the write to another DataNode
> > > > > - At this point the R1 is offline
> > > > > - DN2 comes back online, but it does not contain the missed write
> > > > >
> > > > > There are now two options:
> > > > >
> > > > > - R1 is brought back online, violating consistency
> > > > > - R1 remains offline, indefinitely, until DN1 is brought back
> online
> > > > >
> > > > > How does HBase deal with this situation?
> > > > >
> > > > > Many thanks
> > > > >
> > > > > Paul
> > > > >
> > >
>


Re: Violating strong consistency after the fact

2020-07-03 Thread Wellington Chevreuil
This is actually an HDFS consistency question, not HBase. If the HDFS write
succeeded while you had only one DN available, then the other replica on
the offline DN would be invalid now. What you have then is an
under-replicated block, and if your only available DN goes offline before
it could be replicated, the file that block belongs to is now corrupt. If
you turn the previously offline DN back on, it would still be corrupt, as
the replica it has is no longer valid (the NN knows which is the last valid
version of the replica), so unless you can bring back the DN that has the
only valid replica, your HFile is corrupt and your data is lost.
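On the HDFS side, the block states described above can be inspected with
fsck (a sketch; the paths are assumptions — adjust to your HBase root dir
and table):

```shell
# List files with corrupt blocks (no valid replica) under the HBase root
hdfs fsck /hbase -list-corruptfileblocks

# Detailed per-block replica placement for a suspect region directory
hdfs fsck /hbase/data/default/my_table -files -blocks -locations
```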

On Fri, 3 Jul 2020, 09:12 Paul Carey,  wrote:

> Hi
>
> I'd like to understand how HBase deals with the situation where the
> only available DataNodes for a given offline Region contain stale
> data. Will HBase allow the Region to be brought online again,
> effectively making the inconsistency permanent, or will it refuse to
> do so?
>
> My question is motivated by seeing how Kafka and Elasticsearch
> handle this scenario. They both allow the inconsistency to become
> permanent, Kafka via unclean leader election, and Elasticsearch via
> the allocate_stale_primary command.
>
> To better understand my question, please consider the following example:
>
> - HDFS is configured with `dfs.replication=2` and
> `dfs.namenode.replication.min=1`
> - DataNodes DN1 and DN2 contain the blocks for Region R1
> - DN2 goes offline
> - R1 receives a write, which succeeds as it can be written successfully
> to DN1
> - DN1 goes offline before the NameNode can replicate the
> under-replicated block containing the write to another DataNode
> - At this point the R1 is offline
> - DN2 comes back online, but it does not contain the missed write
>
> There are now two options:
>
> - R1 is brought back online, violating consistency
> - R1 remains offline, indefinitely, until DN1 is brought back online
>
> How does HBase deal with this situation?
>
> Many thanks
>
> Paul
>


Re: [DISCUSS] Normalizer and pre-split tables

2020-06-29 Thread Wellington Chevreuil
>
> Nice. I didn't know about this. I see the tool wants you to specify a
> desired number of regions. In my particular case I don't have a view on the
> number of regions I want. I just know that all the post compaction 0 sized
> regions should go. Can I still make use of this tool?


For optimal results, you would want more insight into your current region
deployment, such as how many regions you have and the average region size.
But you could try passing a small target, such as '1', while also
decreasing "hbase.tools.merge.upper.mark" to something like 0.5 (so that
you don't merge regions that would result in a region with size >= half of
the "hbase.hregion.max.filesize" value), and decreasing
"hbase.tools.max.iterations.blocked" to 2, so that you don't waste cycles
trying to reach the passed target (which may never be achievable once you
have decreased "hbase.tools.merge.upper.mark"). Also bear in mind that this
is a maintenance operation, and there would be an impact on client
applications, as regions are made unavailable while being merged.
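Putting that advice together, a possible invocation (a sketch — the main
class follows the hbase-operator-tools README, but the jar path/version and
table name are placeholders; verify against your build before running):

```shell
# RegionsMerger ships in the hbase-tools module of hbase-operator-tools.
# Lower the merge-size guard and iteration budget as suggested, then ask
# the tool to converge my_table toward 1 region.
HBASE_CLASSPATH_PREFIX=hbase-tools/target/hbase-tools-1.1.0-SNAPSHOT.jar \
  hbase org.apache.hbase.RegionsMerger \
  -Dhbase.tools.merge.upper.mark=0.5 \
  -Dhbase.tools.max.iterations.blocked=2 \
  my_table 1
```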

On Mon, Jun 29, 2020 at 16:04, Nick Dimiduk wrote:

> On Mon, Jun 29, 2020 at 7:13 AM Whitney Jackson 
> wrote:
>
> > > The trouble is we ship defaults for all of the `*min*` configs, and
> right
> > now there's no way to "unset" them, disable the functionality.
> >
> > Why is that the case? Can I not just set
> > hbase.normalizer.merge.min_region_size.mb to 0? Do I risk blowing away
> > regions from pre-splits or something?
> >
>
> Yes, the idea was to guard against merging away intentional pre-splits.
>
> On Mon, Jun 29, 2020 at 2:42 AM Wellington Chevreuil <
> > wellington.chevre...@gmail.com> wrote:
> >
> > > >
> > > > The trouble is we ship defaults for all of the `*min*` configs, and
> > right
> > > > now there's no way to "unset" them, disable the functionality. Which
> > > means
> > > > there still isn't a way to support the empty regions use-case without
> > > > awkward special-case checks.
> > > >
> > >
> > > HBASE-23562 added a RegionsMerger tool to the hbase-operator-tools
> > > project, as a means to allow multiple merges without checking minimum
> > > size. Of course it's not as convenient as the normalizer, but it at
> > > least gives an alternative for such edge cases where users ended up
> > > with lots of empty regions.
> > >
> > > On Fri, Jun 26, 2020 at 22:30, Nick Dimiduk <ndimi...@apache.org>
> > > wrote:
> > >
> > > > Heya,
> > > >
> > > > I've seen a lot of use-cases where the normalizer would be a nice
> > > solution
> > > > for operators and application developers. I've been trying to beef it
> > up
> > > a
> > > > bit to handle these cases. However, some of these considerations are
> at
> > > > odds, so I want to vet the ideas here.
> > > >
> > > > The normalizer is a background chore in the HMaster that attempts to
> > > > converge region sizes within a table toward the average region size.
> It
> > > has
> > > > a pretty wide error bar, but that's the overall goal.
> > > >
> > > > Early on, it was observed that an operator needs to pre-split a
> table,
> > so
> > > > special considerations were included, by way of
> > > > `hbase.normalizer.min.region.count`,
> > > > `hbase.normalizer.merge.min_region_age.days`, and
> > > > `hbase.normalizer.merge.min_region_size.mb`. All these nobs are
> > designed
> > > to
> > > > give an operator means of controlling this behavior.
> > > >
> > > > We have (what I see as) a competing objective: doing away with empty,
> > or
> > > > nearly-empty regions. The use-case is pretty common when there's a
> TTL
> > > > applied to a table, especially if there's also a timestamp component
> in
> > > the
> > > > rowkey. In this case, we want the normalizer to "merge away" these
> > empty
> > > > regions.
> > > >
> > > > The trouble is we ship defaults for all of the `*min*` configs, and
> > right
> > > > now there's no way to "unset" them, disable the functionality. Which
> > > means
> > > > there still isn't a way to support the empty regions use-case without
> > > > awkward special-case checks. This is where I'm looking for
> suggestions
> > > from
> > > > the community. There's some discussion under way over on the PR for
> > > > HBASE-24583. Please take a look.
> > > >
> > > > Thanks in advance,
> > > > Nick
> > > >
> > >
> >
>


Re: [DISCUSS] Normalizer and pre-split tables

2020-06-29 Thread Wellington Chevreuil
>
> The trouble is we ship defaults for all of the `*min*` configs, and right
> now there's no way to "unset" them, disable the functionality. Which means
> there still isn't a way to support the empty regions use-case without
> awkward special-case checks.
>

HBASE-23562 added a RegionsMerger tool to the hbase-operator-tools project,
as a means to allow multiple merges without checking minimum size. Of course
it's not as convenient as the normalizer, but at least it gives an
alternative for such edge cases where users ended up with lots of empty
regions.
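To make the behaviour concrete, the heart of such a multi-merge pass is just selecting adjacent regions whose combined size stays under a target. The sketch below shows only that selection logic in plain Java, with no HBase dependencies; the real RegionsMerger works against the Admin API and hbase:meta, so treat this as an illustration of the idea, not its actual implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class MergePairSketch {
    /**
     * Given region sizes in row-key order, greedily pick adjacent pairs
     * whose combined size stays under targetMb. Each region joins at
     * most one pair per pass, since a merge consumes both participants.
     */
    public static List<int[]> pickMergePairs(long[] sizesMb, long targetMb) {
        List<int[]> pairs = new ArrayList<>();
        int i = 0;
        while (i < sizesMb.length - 1) {
            if (sizesMb[i] + sizesMb[i + 1] <= targetMb) {
                pairs.add(new int[] { i, i + 1 });
                i += 2; // both regions consumed by this merge
            } else {
                i += 1;
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Empty regions (size 0) left behind by a TTL'd table merge away freely.
        long[] sizes = { 0, 0, 512, 10, 20, 900 };
        for (int[] p : pickMergePairs(sizes, 100)) {
            // prints "merge regions 0 and 1", then "merge regions 3 and 4"
            System.out.println("merge regions " + p[0] + " and " + p[1]);
        }
    }
}
```

A pass like this has to be re-planned after the merges complete, since each merge produces a new region that may itself be mergeable.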

On Fri, Jun 26, 2020 at 10:30 PM, Nick Dimiduk wrote:

> Heya,
>
> I've seen a lot of use-cases where the normalizer would be a nice solution
> for operators and application developers. I've been trying to beef it up a
> bit to handle these cases. However, some of these considerations are at
> odds, so I want to vet the ideas here.
>
> The normalizer is a background chore in the HMaster that attempts to
> converge region sizes within a table toward the average region size. It has
> a pretty wide error bar, but that's the overall goal.
>
> Early on, it was observed that an operator needs to pre-split a table, so
> special considerations were included, by way of
> `hbase.normalizer.min.region.count`,
> `hbase.normalizer.merge.min_region_age.days`, and
> `hbase.normalizer.merge.min_region_size.mb`. All these knobs are designed to
> give an operator means of controlling this behavior.
>
> We have (what I see as) a competing objective: doing away with empty, or
> nearly-empty regions. The use-case is pretty common when there's a TTL
> applied to a table, especially if there's also a timestamp component in the
> rowkey. In this case, we want the normalizer to "merge away" these empty
> regions.
>
> The trouble is we ship defaults for all of the `*min*` configs, and right
> now there's no way to "unset" them, disable the functionality. Which means
> there still isn't a way to support the empty regions use-case without
> awkward special-case checks. This is where I'm looking for suggestions from
> the community. There's some discussion under way over on the PR for
> HBASE-24583. Please take a look.
>
> Thanks in advance,
> Nick
>


Re: hbck2 doesn't support fixing region holes

2020-05-28 Thread Wellington Chevreuil
The risks are high; there are currently no plans to add those to hbck2.
HBCK2 does provide several methods that can be used in conjunction to fix
region holes. This is in line with the current *philosophy* for hbck2, where
repairs should be applied iteratively by operators.

Depending on the nature of your region holes, you may be able to get them
sorted by running the hbck2 methods below:
- reportMissingRegionsInMeta
- assigns/unassigns
- addMissingRegionsInMeta
- filesystem

You may also benefit from the hbase shell *merge_region* command to fix your
holes as well, but again, it all depends on what's causing such holes.

On Thu, May 28, 2020 at 6:34 PM, Nand kishor Bansal wrote:

> Hi,
>
> I'm trying to use hbck2 (
> https://downloads.apache.org/hbase/hbase-operator-tools-1.0.0/) but can't
> find a way to fix the region holes.
> The most appropriate option to fix the region holes appears to be
> "filesystem --fix".
> However, it does not fix the region holes.
> Looking at the source it appears that fixing region holes is not enabled
> out of the box
>
>
>
> https://github.com/apache/hbase-operator-tools/blob/master/hbase-hbck2/src/main/java/org/apache/hbase/FileSystemFsck.java
> :
> /*
>   The below are too radical for hbck2. They are filesystem changes
> only.
>   Need to connect them to hbase:meta and master; master should repair
>   holes and overlaps and adopt regions.
> hbaseFsck.setFixHdfsOrphans(fix);
>   hbaseFsck.setFixHdfsHoles(fix);
>   hbaseFsck.setFixHdfsOverlaps(fix);
>   hbaseFsck.setFixTableOrphans(fix);
>   */
>
>
> Any reason why these are not enabled? What would be the risk of enabling
> these fixes?
> HBase version 2.1.4 (http://archive.apache.org/dist/hbase/2.1.4/)
>
> Thanks,
>
> Nand
>


Re: [ANNOUNCE] New HBase committer Wei-Chiu Chuang

2020-05-14 Thread Wellington Chevreuil
Congratulations, Wei-Chiu! Welcome!

On Thu, May 14, 2020 at 10:12 AM, Jan Hentschel <jan.hentsc...@ultratendency.com> wrote:

> Congratulations Wei-Chiu and welcome!
>
> From: Sean Busbey 
> Reply-To: "d...@hbase.apache.org" 
> Date: Wednesday, May 13, 2020 at 9:10 PM
> To: dev , Hbase-User 
> Subject: [ANNOUNCE] New HBase committer Wei-Chiu Chuang
>
> Folks,
>
> On behalf of the Apache HBase PMC I am pleased to announce that Wei-Chiu
> Chuang has accepted the PMC's invitation to become a committer on the
> project.
>
> We appreciate all of the great contributions Wei-Chiu has made to the
> community thus far and we look forward to his continued involvement.
>
> Allow me to be the first to congratulate Wei-Chiu on his new role!
>
> thanks,
> busbey
>
>


Re: How to delete row with Long.MAX_VALUE timestamp

2020-05-13 Thread Wellington Chevreuil
Yeah, creating hfiles manually with Long.MAX_VALUE Delete markers for those
cells would be my next suggestion. It would be nice to confirm how those
cells could get through with a Long.MAX_VALUE timestamp; it would be
surprising if it was WAL replay, as I would expect it to reuse the
timestamp checks from the client write path.

On Wed, May 13, 2020 at 6:33 AM, Bharath Vissapragada <bhara...@apache.org> wrote:

> Interesting behavior, I just tried it out on my local setup (master/HEAD)
> out of curiosity to check if we can trick HBase into deleting this bad row
> and the following worked for me. I don't know how you ended up with that
> row though (bad bulk load? just guessing).
>
> To have a table with the Long.MAX timestamp, I commented out some pieces of
> HBase code so that it doesn't override the timestamp with the current
> millis on the region server (otherwise, I just see the expected behavior of
> current ms).
>
> *Step1: Create a table and generate the problematic row*
>
> hbase(main):002:0> create 't1', 'f'
> Created table t1
>
> -- patch hbase to accept Long.MAX_VALUE ts ---
>
> hbase(main):005:0> put 't1', 'row1', 'f:a', 'val', 9223372036854775807
> Took 0.0054 seconds
>
> -- make sure the put with the ts is present --
> hbase(main):006:0> scan 't1'
> ROW  COLUMN+CELL
>
>  row1column=f:a, timestamp=
> *9223372036854775807*, value=val
>
> 1 row(s)
> Took 0.0226 seconds
>
> *Step 2: Hand craft an HFile with the delete marker*
>
>  ...with this row/col/max ts [Let me know if you want the code, I can put
> it somewhere. I just used the StoreFileWriter utility ]
>
> -- dump the contents of hfile using the utility ---
>
> $ bin/hbase hfile -f file:///tmp/hfiles/f/bf84f424544f4675880494e09b750ce8
> -p
> ..
> Scanned kv count -> 1
> K: row1/f:a/LATEST_TIMESTAMP/Delete/vlen=0/seqid=0 V:  < Delete marker
>
> *Step 3: Bulk load this HFile with the delete marker *
>
> bin/hbase completebulkload file:///tmp/hfiles t1
>
> *Step 4: Make sure the delete marker is inserted correctly.*
>
> hbase(main):001:0> scan 't1'
> ..
>
> 0 row(s)
> Took 0.1387 seconds
>
> -- Raw scan to make sure the delete marker is inserted and nothing funky is
> happening ---
>
> hbase(main):003:0> scan 't1', {RAW=>true}
> ROW  COLUMN+CELL
>
>
>  row1column=f:a,
> timestamp=9223372036854775807, type=Delete
>
>  row1column=f:a,
> timestamp=9223372036854775807, value=val
>
> 1 row(s)
> Took 0.0044 seconds
>
> Thoughts?
>
> On Tue, May 12, 2020 at 2:00 PM Alexander Batyrshin <0x62...@gmail.com>
> wrote:
>
> > Table is ~ 10TB SNAPPY data. I don’t have such a big time window on
> > production for re-inserting all data.
> >
> > I don’t know how we got those cells. I can only assume that this is
> > phoenix and/or replaying from WAL after region server crash.
> >
> > > On 12 May 2020, at 18:25, Wellington Chevreuil <
> > wellington.chevre...@gmail.com> wrote:
> > >
> > > How large is this table? Can you afford re-insert all current data on a
> > > new, temp table? If so, you could write a mapreduce job that scans this
> > > table and rewrite all its cells to this new, temp table. I had verified
> > > that 1.4.10 does have the timestamp replacing logic here:
> > >
> >
> https://github.com/apache/hbase/blob/branch-1.4/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L3395
> > >
> > > So if you re-insert all this table cells into a new one, the timestamps
> > > would be inserted correctly and you would then be able to delete those.
> > > Now, how those cells managed to get inserted with max timestamp? Was
> this
> > > cluster running on an old version that then got upgraded to 1.4.10?
> > >
> > >
> > > On Tue, May 12, 2020 at 1:49 PM, Alexander Batyrshin <0x62...@gmail.com> wrote:
> > >
> > >> Any ideas how to delete these rows?
> > >>
> > >> I see only this way:
> > >> - backup data from region that contains “damaged” rows
> > >> - close regio

Re: How to delete row with Long.MAX_VALUE timestamp

2020-05-12 Thread Wellington Chevreuil
How large is this table? Can you afford to re-insert all current data into a
new, temp table? If so, you could write a mapreduce job that scans this
table and rewrites all its cells to this new, temp table. I had verified
that 1.4.10 does have the timestamp replacing logic here:
https://github.com/apache/hbase/blob/branch-1.4/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L3395

So if you re-insert all this table's cells into a new one, the timestamps
would be inserted correctly and you would then be able to delete those.
Now, how did those cells manage to get inserted with the max timestamp? Was
this cluster running on an old version that then got upgraded to 1.4.10?


On Tue, May 12, 2020 at 1:49 PM, Alexander Batyrshin <0x62...@gmail.com> wrote:

> Any ideas how to delete these rows?
>
> I see only this way:
> - backup data from region that contains “damaged” rows
> - close region
> - remove region files from HDFS
> - assign region
> - copy needed rows from backup to recreated region
>
> > On 30 Apr 2020, at 21:00, Alexander Batyrshin <0x62...@gmail.com> wrote:
> >
> > The same effect for CF:
> >
> > d =
> org.apache.hadoop.hbase.client.Delete.new("\x0439d58wj434dd".to_s.to_java_bytes)
> > d.deleteFamily("d".to_s.to_java_bytes,
> 9223372036854775807.to_java(Java::long))
> > table.delete(d)
> >
> > ROW  COLUMN+CELL
> >  \x0439d58wj434dd        column=d:,
> timestamp=1588269277879, type=DeleteFamily
> >
> >
> >> On 29 Apr 2020, at 18:30, Wellington Chevreuil <
> wellington.chevre...@gmail.com>
> wrote:
> >>
> >> Well, it's weird that puts with such TS values were allowed, according
> to
> >> current code state. Can you afford delete the whole CF for those rows?
> >>
> >> On Wed, Apr 29, 2020 at 2:41 PM, junhyeok park <runnerren...@gmail.com> wrote:
> >>
> >>> I've been through the same thing. I use 2.2.0
> >>>
> >>> On Wed, Apr 29, 2020 at 10:32 PM, Alexander Batyrshin <0x62...@gmail.com> wrote:
> >>>
> >>>> As you can see in example I already tried DELETE operation with
> timestamp
> >>>> = Long.MAX_VALUE without any success.
> >>>>
> >>>>> On 29 Apr 2020, at 12:41, Wellington Chevreuil <
> >>>> wellington.chevre...@gmail.com>
> wrote:
> >>>>>
> >>>>> That's expected behaviour [1]. If you are "travelling to the future",
> >>> you
> >>>>> need to do a delete specifying Long.MAX_VALUE timestamp as the
> >>> timestamp
> >>>>> optional parameter in the delete operation [2], if you don't specify
> >>>>> timestamp on the delete, it will assume current time for the delete
> >>>> marker,
> >>>>> which will be smaller than the Long.MAX_VALUE set to your cells, so
> >>> scans
> >>>>> wouldn't filter it.
> >>>>>
> >>>>> [1] https://hbase.apache.org/book.html#version.delete
> >>>>> [2]
> >>>>>
> >>>>
> >>>
> https://github.com/apache/hbase/blob/branch-1.4/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Delete.java#L98
> >>>>>
> >>>>> On Wed, Apr 29, 2020 at 8:57 AM, Alexander Batyrshin <0x62...@gmail.com> wrote:
> >>>>>
> >>>>>> Hello all,
> >>>>>> We had faced with strange situation: table has rows with
> >>> Long.MAX_VALUE
> >>>>>> timestamp.
> >>>>>> These rows impossible to delete, because DELETE mutation uses
> >>>>>> System.currentTimeMillis() timestamp.
> >>>>>> Is there any way to delete these rows?
> >>>>>> We use HBase-1.4.10
> >>>>>>
> >>>>>> Example:
> >>>>>>
> >>>>>> hbase(main):037:0> scan 'TRACET', { ROWPREFIXFILTER

Re: how to scan for all values which don't have given timestamps?

2020-05-11 Thread Wellington Chevreuil
I don't think there's any built-in filter for that. You would need to
implement your own filter with this logic.
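There's no built-in inverse of TimestampsFilter, but the decision such a custom filter has to make per cell is small. Below is a plain-Java sketch of just that logic; a real filter would extend org.apache.hadoop.hbase.filter.FilterBase and return an include/skip return code from its cell-filtering method, and the class and method names here are otherwise made up for illustration:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ExcludeTimestampsSketch {
    /**
     * The per-cell decision an "inverse TimestampsFilter" would make:
     * keep the cell unless its timestamp is in the excluded set.
     * (A real implementation would wrap this in a FilterBase subclass
     * and map true/false to ReturnCode.INCLUDE / ReturnCode.SKIP.)
     */
    public static boolean keep(long cellTs, Set<Long> excludedTimestamps) {
        return !excludedTimestamps.contains(cellTs);
    }

    public static void main(String[] args) {
        Set<Long> excluded = new HashSet<>(Arrays.asList(100L, 200L));
        System.out.println(keep(100L, excluded)); // false - filtered out
        System.out.println(keep(150L, excluded)); // true - returned to client
    }
}
```

Remember to also set the scan's max versions high enough, otherwise older versions never reach the filter at all.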

On Sat, May 9, 2020 at 9:56 PM, Vitaliy Semochkin wrote:

> Hello everyone,
>
> I need to scan for all values which don't have given timestamps.
>
> Does anyone know how to scan for all rows, with all versions, except those
> whose timestamps are in a given set?
> (i.e. the opposite of TimestampsFilter)
> https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/TimestampsFilter.html
> Which filter should I use?
>
> Regards,
> Vitaliy
>


Re: How to delete row with Long.MAX_VALUE timestamp

2020-04-29 Thread Wellington Chevreuil
Well, it's weird that puts with such TS values were allowed, according to
the current code state. Can you afford to delete the whole CF for those rows?

On Wed, Apr 29, 2020 at 2:41 PM, junhyeok park wrote:

> I've been through the same thing. I use 2.2.0
>
> On Wed, Apr 29, 2020 at 10:32 PM, Alexander Batyrshin <0x62...@gmail.com> wrote:
>
> > As you can see in example I already tried DELETE operation with timestamp
> > = Long.MAX_VALUE without any success.
> >
> > > On 29 Apr 2020, at 12:41, Wellington Chevreuil <
> > wellington.chevre...@gmail.com> wrote:
> > >
> > > That's expected behaviour [1]. If you are "travelling to the future",
> you
> > > need to do a delete specifying Long.MAX_VALUE timestamp as the
> timestamp
> > > optional parameter in the delete operation [2], if you don't specify
> > > timestamp on the delete, it will assume current time for the delete
> > marker,
> > > which will be smaller than the Long.MAX_VALUE set to your cells, so
> scans
> > > wouldn't filter it.
> > >
> > > [1] https://hbase.apache.org/book.html#version.delete
> > > [2]
> > >
> >
> https://github.com/apache/hbase/blob/branch-1.4/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Delete.java#L98
> > >
> > > Em qua., 29 de abr. de 2020 às 08:57, Alexander Batyrshin <
> > 0x62...@gmail.com>
> > > escreveu:
> > >
> > >> Hello all,
> > >> We had faced with strange situation: table has rows with
> Long.MAX_VALUE
> > >> timestamp.
> > >> These rows impossible to delete, because DELETE mutation uses
> > >> System.currentTimeMillis() timestamp.
> > >> Is there any way to delete these rows?
> > >> We use HBase-1.4.10
> > >>
> > >> Example:
> > >>
> > >> hbase(main):037:0> scan 'TRACET', { ROWPREFIXFILTER =>
> > "\x0439d58wj434dd",
> > >> RAW=>true, VERSIONS=>10}
> > >> ROW
> COLUMN+CELL
> > >> \x0439d58wj434dd   column=d:_0,
> > >> timestamp=9223372036854775807, value=x
> > >>
> > >>
> > >> hbase(main):045:0* delete 'TRACET', "\x0439d58wj434dd", "d:_0"
> > >> 0 row(s) in 0.0120 seconds
> > >>
> > >> hbase(main):046:0> scan 'TRACET', { ROWPREFIXFILTER =>
> > "\x0439d58wj434dd",
> > >> RAW=>true, VERSIONS=>10}
> > >> ROW
> COLUMN+CELL
> > >> \x0439d58wj434dd   column=d:_0,
> > >> timestamp=9223372036854775807, value=x
> > >> \x0439d58wj434dd   column=d:_0,
> > >> timestamp=1588146570005, type=Delete
> > >>
> > >>
> > >> hbase(main):047:0> delete 'TRACET', "\x0439d58wj434dd", "d:_0",
> > >> 9223372036854775807
> > >> 0 row(s) in 0.0110 seconds
> > >>
> > >> hbase(main):048:0> scan 'TRACET', { ROWPREFIXFILTER =>
> > "\x0439d58wj434dd",
> > >> RAW=>true, VERSIONS=>10}
> > >> ROW
> COLUMN+CELL
> > >> \x0439d58wj434dd   column=d:_0,
> > >> timestamp=9223372036854775807, value=x
> > >> \x0439d58wj434dd   column=d:_0,
> > >> timestamp=1588146678086, type=Delete
> > >> \x0439d58wj434dd   column=d:_0,
> > >> timestamp=1588146570005, type=Delete
> > >>
> > >>
> > >>
> > >>
> >
> >
>


Re: How to delete row with Long.MAX_VALUE timestamp

2020-04-29 Thread Wellington Chevreuil
That's expected behaviour [1]. If you are "travelling to the future", you
need to do a delete specifying Long.MAX_VALUE as the optional timestamp
parameter in the delete operation [2]. If you don't specify a timestamp on
the delete, it will assume the current time for the delete marker, which
will be smaller than the Long.MAX_VALUE set on your cells, so scans
wouldn't filter them out.

[1] https://hbase.apache.org/book.html#version.delete
[2]
https://github.com/apache/hbase/blob/branch-1.4/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Delete.java#L98
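The rule behind this can be stated in one line: a delete marker masks a cell only when the cell's timestamp is less than or equal to the marker's. A small self-contained sketch of that comparison (plain Java, illustrative names) shows why a default, current-time delete leaves a Long.MAX_VALUE cell visible:

```java
public class DeleteMaskingSketch {
    /**
     * A delete marker hides a cell only when the cell's timestamp is
     * <= the marker's timestamp (see the "version.delete" section of
     * the HBase book, linked above).
     */
    public static boolean isMasked(long cellTs, long deleteMarkerTs) {
        return cellTs <= deleteMarkerTs;
    }

    public static void main(String[] args) {
        long cellTs = Long.MAX_VALUE;                     // the "future" cell
        long defaultDeleteTs = System.currentTimeMillis(); // ts the shell uses
        System.out.println(isMasked(cellTs, defaultDeleteTs));  // false: cell survives
        System.out.println(isMasked(cellTs, Long.MAX_VALUE));   // true: cell masked
    }
}
```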

On Wed, Apr 29, 2020 at 8:57 AM, Alexander Batyrshin <0x62...@gmail.com> wrote:

>  Hello all,
> We had faced with strange situation: table has rows with Long.MAX_VALUE
> timestamp.
> These rows impossible to delete, because DELETE mutation uses
> System.currentTimeMillis() timestamp.
> Is there any way to delete these rows?
> We use HBase-1.4.10
>
> Example:
>
> hbase(main):037:0> scan 'TRACET', { ROWPREFIXFILTER => "\x0439d58wj434dd",
> RAW=>true, VERSIONS=>10}
> ROW  COLUMN+CELL
>  \x0439d58wj434dd   column=d:_0,
> timestamp=9223372036854775807, value=x
>
>
> hbase(main):045:0* delete 'TRACET', "\x0439d58wj434dd", "d:_0"
> 0 row(s) in 0.0120 seconds
>
> hbase(main):046:0> scan 'TRACET', { ROWPREFIXFILTER => "\x0439d58wj434dd",
> RAW=>true, VERSIONS=>10}
> ROW  COLUMN+CELL
>  \x0439d58wj434dd   column=d:_0,
> timestamp=9223372036854775807, value=x
>  \x0439d58wj434dd   column=d:_0,
> timestamp=1588146570005, type=Delete
>
>
> hbase(main):047:0> delete 'TRACET', "\x0439d58wj434dd", "d:_0",
> 9223372036854775807
> 0 row(s) in 0.0110 seconds
>
> hbase(main):048:0> scan 'TRACET', { ROWPREFIXFILTER => "\x0439d58wj434dd",
> RAW=>true, VERSIONS=>10}
> ROW  COLUMN+CELL
>  \x0439d58wj434dd   column=d:_0,
> timestamp=9223372036854775807, value=x
>  \x0439d58wj434dd   column=d:_0,
> timestamp=1588146678086, type=Delete
>  \x0439d58wj434dd   column=d:_0,
> timestamp=1588146570005, type=Delete
>
>
>
>


Re: How to use Pluggable RPC authentication

2020-04-27 Thread Wellington Chevreuil
>
> how to use or config the custom authentication ?
>
Assuming your authentication solution properly implements the key
interfaces *SaslServerAuthenticationProvider*,
*SaslClientAuthenticationProvider* and *BuiltInProviderSelector*, you need
to specify each of these implementations in the
*hbase.client.sasl.provider.class*, *hbase.client.sasl.provider.extras* and
*hbase.server.sasl.provider.extras* properties you mentioned. These must be
set on both the client and server side, so your custom classes must be
present on both client and server classpaths.

> I checked the test code; why do we need to set up Kerberos first?
>
HBase security here means an authentication system is set. Right now, HBase
only supports Kerberos as an authentication mechanism. So the RPC
sub-system would only apply any of its authentication checks if security is
enabled, and by that we mean *hbase.security.authentication* is set to
*kerberos*.


> if I add an extras provider but the client uses simple
>
> the auth can be bypassed
>
This would only happen if you had explicitly set
*hbase.ipc.server.fallback-to-simple-auth-allowed* property to *true* on
the server configuration.

what i missed?
>
Can you share both your client and server configs? Do you see any
suggestive messages in the client/server logs (TRACE level might be helpful
here)? Would it be feasible for you to share your implementations of
*SaslServerAuthenticationProvider*, *SaslClientAuthenticationProvider* and
*BuiltInProviderSelector* (I understand this might not be desirable; maybe
just some snippets of specific method implementations)? For instance, both
the server and client provider implementations should return the very same
type in their getTokenKind() method implementations.

On Mon, Apr 27, 2020 at 5:59 AM, 陈叶超 wrote:

> Hi all:
>
> in https://issues.apache.org/jira/browse/HBASE-23347 a pluggable
> RPC authentication mechanism was introduced
>
> https://github.com/apache/hbase/pull/884
>
> I want to use this pr to create a custom auth
>
> how to use or config the custom authentication ?
>
> I check the test code ,why we need setUp kerberos first?
>
> and it seems the server offers three built-in providers (simple/gss/digest),
> plus the extras auth providers
>
> if I add an extras provider but the client uses simple
>
> the auth can be bypassed
>
> now i just set three properties :
>
> hbase.client.sasl.provider.class
>
> hbase.client.sasl.provider.extras
>
> hbase.server.sasl.provider.extras
>
> what i missed?
>
> I check the code
>
> in ServerRpcConnection.java
>
>
> ```java
> //from me: if the client passes the simple authbyte, do we just use simple???
>  this.provider = this.saslProviders.selectProvider(authbyte);
> if (this.provider == null) {
>   String msg = getFatalConnectionString(version, authbyte);
>   doBadPreambleHandling(msg, new BadAuthException(msg));
>   return false;
> }
> //from me : don't understand here
> // TODO this is a wart while simple auth'n doesn't go through sasl.
>if (this.rpcServer.isSecurityEnabled && isSimpleAuthentication()) {
>   if (this.rpcServer.allowFallbackToSimpleAuth) {
> this.rpcServer.metrics.authenticationFallback();
> authenticatedWithFallback = true;
>   } else {
> AccessDeniedException ae = new
> AccessDeniedException("Authentication is required");
> doRespond(getErrorResponse(ae.getMessage(), ae));
> return false;
>   }
> }
>   //from me : don't understand here ?
>  if (!this.rpcServer.isSecurityEnabled && !isSimpleAuthentication()) {
>   doRawSaslReply(SaslStatus.SUCCESS, new
> IntWritable(SaslUtil.SWITCH_TO_SIMPLE_AUTH), null,
> null);
>   provider = saslProviders.getSimpleProvider();
>   // client has already sent the initial Sasl message and we
>   // should ignore it. Both client and server should fall back
>   // to simple auth from now on.
>   skipInitialSaslHandshake = true;
> }
> useSasl = true;
>
> ```
>


Re: Empty REGIONINFO after Region Server Restart

2020-04-22 Thread Wellington Chevreuil
>
> I am running HBase 1.4.10 on AWS EMR 5.29.0
>
Which file system is the hbase root dir under? If it's S3, it's very likely
one of S3's eventual consistency issues leading to missing .regioninfo files.

On Wed, Apr 22, 2020 at 2:21 AM, Stack wrote:

> On Mon, Apr 20, 2020 at 7:59 AM Mike Linsinbigler 
> wrote:
>
> > Hello,
> >
> > I am running HBase 1.4.10 on AWS EMR 5.29.0 and have had issues after
> > some of our Region Servers restart (due to a variety of reasons) and the
> > cluster enters an inconsistent state after they come back up.
> >
> > Running hbck, I am presented with many instances of:
> >
> > ERROR: Empty REGIONINFO_QUALIFIER found in hbase:meta
> >
> >
> > I am able to resolve this issue by running:
> >
> > "hbase hbck -fixEmptyMetaCells"
> >
> >
> > However, this is only a fix until the next time one of our region
> > servers restart which is currently a daily event. Does anyone know how
> > to prevent this issue from occurring in the first place? It looks like
> > our Region Server was in the middle of splitting and compaction
> > operations before aborting.
> >
> > I've noticed that writing to hbase while these errors are present can
> > result in ingest issues within our application so I'd really like to
> > understand how the meta table can get into this state.
> >
> >
> Rows in hbase:meta w/ empty REGIONINFO_QUALIFIER will mess you up. Shoudn't
> be happening. Have you tried tracing the lifecycle of one of these empty
> rows? The row name is the region name. Search master logs over time using
> the regionname. See if you can get a sense of what is happening
> manufacturing empty rows.
>
> Yours,
> S
>
>
>
> > Thanks,
> >
> > Mike
> >
> >
>


Re: region not found on an online server, hbase 2.1.4, hadoop 3.1.2

2020-02-07 Thread Wellington Chevreuil
Hi Michael,

That indicates some inconsistency in your hbase:meta, where that region's
info is not up to date, showing the region online on a region server (RS)
instance that doesn't exist anymore (the 1580999341692 is the timestamp of
when that RS instance was started; every time you restart an RS, a new
instance is created and that "code" receives the timestamp of the new start
time).

What are the (possible) reasons for the error?
>
This looks similar to a few known issues, such as HBASE-23594 or
HBASE-21344, but a thorough investigation would be needed to determine what
actually caused it in this specific case. Were there any RS
crashes/restarts, or any sort of manual intervention applied to this hbase
cluster?

How do I get the region online
> again?
>
You would need to force this region to get re-assigned. If you can afford
temporary unavailability of this table region, you could try the hbase shell
disable/enable commands for this table. If disable fails to complete because
of this inconsistency, the next resort would be to use the hbck2
unassigns/assigns command pair on that region. HBCK2 is shipped as a
separate project; you can download it from:
https://hbase.apache.org/downloads.html

On Fri, Feb 7, 2020 at 10:47 AM, Michael Wohlwend <mich...@fantasymail.de> wrote:

> Hi,
>
> I have the following error in the logs:
>
> Not running balancer because 1 regions found not on an online server
> {...} state=OPEN ts=... server=hadoop-data04, 16020, 1580999341692 's
> server
> is not in the online server list
>
> All five region servers are working, it's this one region which doesn't
> seem to
> be found. No other errors are reported.
>
> In the log, the server is identified with
> server=hadoop-data04, 16020, 1580999341692
>
> On the webinterface the server is identified with
> server=hadoop-data04, 16020, 1581068616107
>
> Is this ok, that the last number is different?
>
> What are the (possible) reasons for the error? How do I get the region
> online
> again?
>
> Thanks
>  Michael
>
>
>
>
>


Re: [ANNOUNCE] New HBase committer Bharath Vissapragada

2020-02-06 Thread Wellington Chevreuil
Congratulations and welcome to the team, Bharath!

On Thu, Feb 6, 2020 at 7:25 PM, Esteban Gutierrez wrote:

> Yay! Congratulations Bharath!
>
> esteban.
> --
> Cloudera, Inc.
>
>
>
> On Thu, Feb 6, 2020 at 12:07 PM Andrew Purtell 
> wrote:
>
> > Congratulations and welcome, Bharath!
> >
> > On Wed, Feb 5, 2020 at 7:36 PM Nick Dimiduk  wrote:
> >
> > > On behalf of the Apache HBase PMC I am pleased to announce that Bharath
> > > Vissapragada has accepted the PMC's invitation to become a committer on
> > the
> > > project. We appreciate all of Bharath's generous contributions thus far
> > and
> > > look forward to his continued involvement.
> > >
> > > Allow me to be the first to congratulate and welcome Bharath into his
> new
> > > role!
> > >
> > > Thanks,
> > > Nick
> > >
> >
> >
> > --
> > Best regards,
> > Andrew
> >
> > Words like orphans lost among the crosstalk, meaning torn from truth's
> > decrepit hands
> >- A23, Crosstalk
> >
>


Re: data replication if cluster replication is enabled later

2020-01-20 Thread Wellington Chevreuil
Data will be targeted for replication only once you have a replication peer
added and the related table column family's REPLICATION_SCOPE is set to '1'.
If you had the column family's REPLICATION_SCOPE set to '1' but no
replication peer added, then no data inserted/modified prior to adding the
peer would get replicated.

On Mon, Jan 20, 2020 at 12:08 AM, sudhir patil wrote:

> Have query regarding data replication in hbase, in case when cluster
> replication is enabled after data is inserted to table.
>
> If we have existing hbase table with replication enabled & later point in
> time hbase cluster replication is enabled. Is only data saved after cluster
> replication enabled will be copied over? or all the data saved prior to
> cluster replication enabling will also be replicated?
>


Re: Any issues, if we enable table level replication when cluster replication is disabled

2020-01-20 Thread Wellington Chevreuil
If you are planning to enable it using the 'alter' command or the Java admin
API, there are no issues; replication will not be initiated until you add a
replication peer. If you try to do it using the hbase shell
"enable_table_replication" command, the command will fail.

On Mon, Jan 20, 2020 at 12:00 AM, sudhir patil wrote:

> Hi,
>
> Will there be any issues, if we enable table level replication
> (REPLICATION_SCOPE=>1), when cluster replication is disabled? Will there be
> any issues with WAL log rollover or accumulation?
>
> Thanks,
> Sudhir
>


Re: [ANNOUNCE] New HBase committer Viraj Jasani

2019-12-27 Thread Wellington Chevreuil
Congratulations, Viraj! Welcome aboard!

On Fri, Dec 27, 2019 at 1:02 PM, Peter Somogyi wrote:

> On behalf of the Apache HBase PMC I am pleased to announce that
> Viraj Jasani has accepted the PMC's invitation to become a
> committer on the project.
>
> Thanks so much for the work you've been contributing. We look forward
> to your continued involvement.
>
> Congratulations and welcome!
>


Re: [ANNOUNCE] Please welcome Guangxu Cheng the HBase PMC

2019-12-09 Thread Wellington Chevreuil
Congratulations Guangxu!

On Mon, Dec 9, 2019 at 10:08 AM, Balazs Meszaros wrote:

> Congratulations and welcome Guangxu!
>
> On Mon, Dec 9, 2019 at 10:51 AM 宾莉金(binlijin)  wrote:
>
> > Congratulations!
> >
> > > On Mon, Dec 9, 2019 at 5:47 PM, Duo Zhang wrote:
> >
> > > On behalf of the Apache HBase PMC I am pleased to announce that Guangxu
> > > Cheng has accepted our invitation to become a PMC member on the Apache
> > > HBase project. We appreciate Guangxu Cheng stepping up to take more
> > > responsibility in the HBase project.
> > >
> > > Please join me in welcoming Guangxu Cheng to the HBase PMC!
> > >
> >
> >
> > --
> > *Best Regards,*
> >  lijin bin
> >
>


Re: Deleting a (contiguous) subset of the columns in a row

2019-11-11 Thread Wellington Chevreuil
I don't think you would have an easier way to do this without having to
redefine your table layout, so that you split these two groups into
separate column families, and apply this "classification" logic at
insertion time to determine which column family a given cell should go.

Another possibility, if you are able to calculate the possible column label
values in advance, is to add all possible column name values that should
get deleted into the "Delete" operation using "Delete.addColumns" method:
https://hbase.apache.org/1.2/apidocs/org/apache/hadoop/hbase/client/Delete.html#addColumns(byte[],%20byte[])
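For illustration, a minimal sketch of the read-then-delete approach (group boundaries 0x00–0x03 vs. 0x04–0x05 are taken from the question below; the table name and method names around the snippet are placeholders, and it of course needs a running cluster):

```java
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.ColumnRangeFilter;

public class GroupDelete {
  public static void deleteGroupA(Connection conn, byte[] row) throws Exception {
    try (Table table = conn.getTable(TableName.valueOf("my_table"))) {
      // Read back only the group A qualifiers (first byte 0x00..0x03)
      Get get = new Get(row);
      get.setFilter(new ColumnRangeFilter(
          new byte[]{0x00}, true,    // min qualifier, inclusive
          new byte[]{0x04}, false)); // max qualifier, exclusive
      Result result = table.get(get);
      if (result.isEmpty()) {
        return;
      }
      // Build a single Delete covering every group A column found
      Delete delete = new Delete(row);
      for (Cell cell : result.rawCells()) {
        delete.addColumns(CellUtil.cloneFamily(cell), CellUtil.cloneQualifier(cell));
      }
      table.delete(delete);
    }
  }
}
```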

Em seg, 11 de nov de 2019 às 05:00, Wilson, Huon (Data61, Eveleigh) <
huon.wil...@data61.csiro.au> escreveu:

> We've got a data model where columns have a logical association, and this
> is encoded into the column qualifiers by having each group be a contiguous
> range of qualifiers. For instance, columns with first byte 0x00, 0x01, 0x02
> or 0x03 form group A and columns with first byte 0x04 or 0x05 form group B.
>
> We'd like to efficiently delete just group A from a row, while leaving
> everything in group B, which currently seems to require two steps: read the
> row to find the column qualifiers that exist in group A (we can use a
> ColumnRangeFilter to at least ignore everything in group B), and then doing
> a delete after .addColumns-ing those qualifiers.
>
> Is there a better way to do this? For instance, a similar way to apply
> filters to a delete?
>
> ---
> Huon Wilson
> CSIRO | Data61
> https://www.data61.csiro.au


Re: Completing a bulk load from HFiles stored in S3

2019-11-01 Thread Wellington Chevreuil
Ah yeah, didn't realise it would assume same FS, internally. Indeed, no way
to have rename working between different FSes.

Em qui, 31 de out de 2019 às 16:25, Josh Elser  escreveu:

> Short answer: no, it will not work and you need to copy it to HDFS first.
>
> IIRC, the bulk load code is ultimately calling a filesystem rename from
> the path you provided to the proper location in the hbase.rootdir's
> filesystem. I don't believe that an `fs.rename` is going to work across
> filesystems because you can't do this atomically, which HDFS guarantees
> for the rename method [1]
>
> Additionally, for Kerberos-secured clusters, the server-side bulk load
> logic expects that the filesystem hosting your hfiles is HDFS (in order
> to read the files with the appropriate authentication). This fails right
> now, but is something our PeterS is looking at.
>
> [1]
>
> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/filesystem.html#boolean_rename.28Path_src.2C_Path_d.29
>
> On 10/31/19 6:55 AM, Wellington Chevreuil wrote:
> > I believe you can specify your s3 path for the hfiles directly, as hdfs
> > FileSystem does support s3a scheme, but you would need to add your s3
> > access and secret key to your completebulkload configuration.
> >
> > Em qua, 30 de out de 2019 às 19:43, Gautham Acharya <
> > gauth...@alleninstitute.org> escreveu:
> >
> >> If I have Hfiles stored in S3, can I run CompleteBulkLoad and provide an
> >> S3 Endpoint to run a single command, or do I need to first copy the S3
> >> Hfiles to HDFS first? The documentation is not very clear.
> >>
> >
>


Re: Completing a bulk load from HFiles stored in S3

2019-10-31 Thread Wellington Chevreuil
I believe you can specify your S3 path for the hfiles directly, as the
Hadoop FileSystem API does support the s3a scheme, but you would need to add
your S3 access and secret keys to your completebulkload configuration.
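As a hedged sketch of that idea (bucket path, table name, and credential values are placeholders; LoadIncrementalHFiles lives under org.apache.hadoop.hbase.mapreduce in HBase 1.x). Note the follow-up in this thread: the server side ultimately does a filesystem rename, so this may still fail across filesystems and require copying the hfiles to HDFS first:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.util.ToolRunner;

public class S3BulkLoad {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // s3a credentials so the bulk load tool can read the hfiles from S3
    conf.set("fs.s3a.access.key", "<ACCESS_KEY>");
    conf.set("fs.s3a.secret.key", "<SECRET_KEY>");
    // tool args: <hfile dir> <table name>
    int exit = ToolRunner.run(conf, new LoadIncrementalHFiles(conf),
        new String[]{"s3a://my-bucket/hfiles", "my_table"});
    System.exit(exit);
  }
}
```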

Em qua, 30 de out de 2019 às 19:43, Gautham Acharya <
gauth...@alleninstitute.org> escreveu:

> If I have Hfiles stored in S3, can I run CompleteBulkLoad and provide an
> S3 Endpoint to run a single command, or do I need to first copy the S3
> Hfiles to HDFS first? The documentation is not very clear.
>


Re: [ANNOUNCE] Please welcome Wellington Chevreuil to the Apache HBase PMC

2019-10-24 Thread Wellington Chevreuil
Thank you, folks!

Em qui, 24 de out de 2019 às 12:46, Reid Chan 
escreveu:

>
> Welcome Wellington! Congratulations!
>
>
>
> --
>
> Best regards,
> R.C
>
>
>
> 
> From: Salvatore LaMendola (BLOOMBERG/ 731 LEX) 
> Sent: 24 October 2019 04:19
> To: d...@hbase.apache.org
> Cc: user@hbase.apache.org
> Subject: Re: [ANNOUNCE] Please welcome Wellington Chevreuil to the Apache
> HBase PMC
>
> Congrats Sakthi and Wellington!
>
> From: d...@hbase.apache.org At: 10/23/19 16:17:58To:  d...@hbase.apache.org
> Cc:  user@hbase.apache.org
> Subject: Re: [ANNOUNCE] Please welcome Wellington Chevreuil to the Apache
> HBase PMC
>
> Congrats Wellington!
>
> Sakthi
>
> On Wed, Oct 23, 2019 at 1:16 PM Sean Busbey  wrote:
>
> > On behalf of the Apache HBase PMC I am pleased to announce that
> > Wellington Chevreuil has accepted our invitation to become a PMC member
> on
> > the
> > HBase project. We appreciate Wellington stepping up to take more
> > responsibility in the HBase project.
> >
> > Please join me in welcoming Wellington to the HBase PMC!
> >
> >
> >
> > As a reminder, if anyone would like to nominate another person as a
> > committer or PMC member, even if you are not currently a committer or
> > PMC member, you can always drop a note to priv...@hbase.apache.org to
> > let us know.
> >
>
>
>


Re: [ANNOUNCE] Please welcome Sakthi to the Apache HBase PMC

2019-10-24 Thread Wellington Chevreuil
Congratulations, Sakthi!

Em qui, 24 de out de 2019 às 12:38, Reid Chan 
escreveu:

> Welcome Sakthi! Congratulations!
>
>
> --
>
> Best regards,
> R.C
>
>
>
> 
> From: Sean Busbey 
> Sent: 24 October 2019 04:14
> To: dev; Hbase-User
> Subject: [ANNOUNCE] Please welcome Sakthi to the Apache HBase PMC
>
> On behalf of the Apache HBase PMC I am pleased to announce that
> Sakthi has accepted our invitation to become a PMC member on the
> HBase project. We appreciate Sakthi stepping up to take more
> responsibility in the HBase project.
>
> Please join me in welcoming Sakthi to the HBase PMC!
>
>
>
> As a reminder, if anyone would like to nominate another person as a
> committer or PMC member, even if you are not currently a committer or
> PMC member, you can always drop a note to priv...@hbase.apache.org to
> let us know.
>


Re: [ANNOUNCE] Please welcome Balazs Meszaros to the Apache HBase PMC

2019-10-24 Thread Wellington Chevreuil
Congratulations, Balazs!

Em qui, 24 de out de 2019 às 15:34, Sean Busbey 
escreveu:

> On behalf of the Apache HBase PMC I am pleased to announce that
> Balazs Meszaros has accepted our invitation to become a PMC member on the
> HBase project. We appreciate Balazs stepping up to take more
> responsibility in the HBase project.
>
> Please join me in welcoming Balazs to the HBase PMC!
>
>
>
> As a reminder, if anyone would like to nominate another person as a
> committer or PMC member, even if you are not currently a committer or
> PMC member, you can always drop a note to priv...@hbase.apache.org to
> let us know.
>


Re: zookeeper - cluster redirection not happening

2019-08-22 Thread Wellington Chevreuil
That's not the correct zookeeper quorum property for HBase; you should have
"hbase.zookeeper.quorum" in your hbase-site.xml file.
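A sketch of the corrected hbase-site.xml entry (hostnames taken from the message below; note the quorum value takes hostnames only, with the client port set in its own property):

```xml
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>server1,server2,server3</value>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>
```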

On Thu, 22 Aug 2019, 18:33 Jignesh Patel,  wrote:

> This is what I see in core-site.xml
>
> 
>
> ha.zookeeper.quorum
>
> server1:2181,server2:2181,server3:2181
>
> 
>
>
> This is what I see in hbase-site.xml
>
>  
>
>  ha.zookeeper.quorum
>
>  server1:2181,server2:2181,server3:2181
>
>  
>
>
>
>
> On Thu, Aug 22, 2019 at 5:21 PM Wellington Chevreuil <
> wellington.chevre...@gmail.com> wrote:
>
> > The shared log snippet mentions a single server in the quorum connection
> > string:
> >
> > 16:38:12,838 ERROR [org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher]
> > > (default task-27) hconnection-0x2f95012, quorum=zookeeper1:2181
> > > <http://us-east-1-zookeeper-aws-1.icare.com:2181/>, baseZNode=/hbase
> > > Received unexpected KeeperException, re-throwing exception:
> > > org.apache.zookeeper.KeeperException$ConnectionLossException:
> > >
> >
> > So it might be that your client app config has only one of the quorum
> > servers
> > hbase-site.xml file?
> >
> > Em qui, 22 de ago de 2019 às 18:09, Jignesh Patel <
> jigneshmpa...@gmail.com
> > >
> > escreveu:
> >
> > > We are running Quorum of three zookeepers to connect to our hadoop
> 2.6.0
> > > setup.
> > > However, surprisingly, if one of the zookeepers goes down, our system
> > > goes down as well. Below is a log from our wildfly server, which stopped
> > > responding as one of the zookeepers went down and it only tried to
> > > connect to that particular zookeeper.
> > > How do we ensure that if one goes down then also our system should
> > redirect
> > > to next one and keep going?
> > >
> > > 16:38:12,838 ERROR
> > [org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper]
> > > (default task-27) ZooKeeper getData failed after 4 attempts
> > >
> > > 16:38:12,838 ERROR [org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher]
> > > (default task-27) hconnection-0x2f95012, quorum=zookeeper1:2181
> > > <http://us-east-1-zookeeper-aws-1.icare.com:2181/>, baseZNode=/hbase
> > > Received unexpected KeeperException, re-throwing exception:
> > > org.apache.zookeeper.KeeperException$ConnectionLossException:
> > > KeeperErrorCode = ConnectionLoss for /hbase/meta-region-server
> > >
> > > at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> > > [zookeeper-3.4.5.jar:3.4.5-1392090]
> > >
> > > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> > > [zookeeper-3.4.5.jar:3.4.5-1392090]
> > >
> > > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
> > > [zookeeper-3.4.5.jar:3.4.5-1392090]
> > >
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:337)
> > > [hbase-client-0.98.1-hadoop2.jar:0.98.1-hadoop2]
> > >
> > > at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:683)
> > > [hbase-client-0.98.1-hadoop2.jar:0.98.1-hadoop2]
> > >
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.zookeeper.ZKUtil.blockUntilAvailable(ZKUtil.java:1835)
> > > [hbase-client-0.98.1-hadoop2.jar:0.98.1-hadoop2]
> > >
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.zookeeper.MetaRegionTracker.blockUntilAvailable(MetaRegionTracker.java:183)
> > > [hbase-client-0.98.1-hadoop2.jar:0.98.1-hadoop2]
> > >
> >
>


Re: zookeeper - cluster redirection not happening

2019-08-22 Thread Wellington Chevreuil
The shared log snippet mentions a single server in the quorum connection
string:

16:38:12,838 ERROR [org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher]
> (default task-27) hconnection-0x2f95012, quorum=zookeeper1:2181
> , baseZNode=/hbase
> Received unexpected KeeperException, re-throwing exception:
> org.apache.zookeeper.KeeperException$ConnectionLossException:
>

So it might be that your client app config has only one of the quorum servers
defined in the connection string. Can you check your client application
hbase-site.xml file?

Em qui, 22 de ago de 2019 às 18:09, Jignesh Patel 
escreveu:

> We are running Quorum of three zookeepers to connect to our hadoop 2.6.0
> setup.
> However, surprisingly, if one of the zookeepers goes down, our system goes
> down as well. Below is a log from our wildfly server, which stopped
> responding as one of the zookeepers went down and it only tried to connect
> to that particular zookeeper.
> How do we ensure that if one goes down then also our system should redirect
> to next one and keep going?
>
> 16:38:12,838 ERROR [org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper]
> (default task-27) ZooKeeper getData failed after 4 attempts
>
> 16:38:12,838 ERROR [org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher]
> (default task-27) hconnection-0x2f95012, quorum=zookeeper1:2181
> , baseZNode=/hbase
> Received unexpected KeeperException, re-throwing exception:
> org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /hbase/meta-region-server
>
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> [zookeeper-3.4.5.jar:3.4.5-1392090]
>
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> [zookeeper-3.4.5.jar:3.4.5-1392090]
>
> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
> [zookeeper-3.4.5.jar:3.4.5-1392090]
>
> at
>
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:337)
> [hbase-client-0.98.1-hadoop2.jar:0.98.1-hadoop2]
>
> at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:683)
> [hbase-client-0.98.1-hadoop2.jar:0.98.1-hadoop2]
>
> at
>
> org.apache.hadoop.hbase.zookeeper.ZKUtil.blockUntilAvailable(ZKUtil.java:1835)
> [hbase-client-0.98.1-hadoop2.jar:0.98.1-hadoop2]
>
> at
>
> org.apache.hadoop.hbase.zookeeper.MetaRegionTracker.blockUntilAvailable(MetaRegionTracker.java:183)
> [hbase-client-0.98.1-hadoop2.jar:0.98.1-hadoop2]
>


Re: [ANNOUNCE] New HBase committer Tak-Lon (Stephen) Wu

2019-08-06 Thread Wellington Chevreuil
Congratulations Stephen!

Em ter, 6 de ago de 2019 às 10:32, Anoop John 
escreveu:

> Congratulations Stephen.
>
> -Anoop-
>
>
> On Tue, Aug 6, 2019 at 11:19 AM Pankaj kr  wrote:
>
> > Congratulations Stephen..!!
> >
> > Regards,
> > Pankaj
> >
> > -Original Message-
> > From: Sean Busbey [mailto:bus...@apache.org]
> > Sent: 06 August 2019 00:27
> > To: dev ; user@hbase.apache.org
> > Subject: [ANNOUNCE] New HBase committer Tak-Lon (Stephen) Wu
> >
> > On behalf of the Apache HBase PMC I am super pleased to announce that
> > Tak-Lon (Stephen) Wu has accepted the PMC's invitation to become a
> committer
> > on the project.
> >
> > Thanks so much for the work you've been contributing. We look forward to
> > your continued involvement.
> >
> > Congratulations and welcome!
> >
> > -busbey
> >
>


Re: [ANNOUNCE] new HBase committer Sakthi

2019-08-01 Thread Wellington Chevreuil
Congratulations, well deserved!!

Em qui, 1 de ago de 2019 às 10:17, Szalay-Beko Mate
 escreveu:

> Congratulations!! :)
>
> On Thu, Aug 1, 2019 at 9:56 AM kevin su  wrote:
>
> > Congratulations 🎉🎉🎉
> >
> > OpenInx 於 2019年8月1日 週四,下午3:17寫道:
> >
> > > Congratulations, Sakthi.
> > >
> > > On Thu, Aug 1, 2019 at 3:09 PM Jan Hentschel <
> > > jan.hentsc...@ultratendency.com> wrote:
> > >
> > > > Congrats Sakthi
> > > >
> > > > From: Reid Chan 
> > > > Reply-To: "user@hbase.apache.org" 
> > > > Date: Thursday, August 1, 2019 at 9:04 AM
> > > > To: "user@hbase.apache.org" , dev <
> > > > d...@hbase.apache.org>
> > > > Subject: Re: [ANNOUNCE] new HBase committer Sakthi
> > > >
> > > >
> > > > Congratulations and welcome, Sakthi!
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Best regards,
> > > > R.C
> > > >
> > > >
> > > >
> > > > 
> > > > From: Sean Busbey mailto:bus...@apache.org>>
> > > > Sent: 01 August 2019 08:04
> > > > To: user@hbase.apache.org; dev
> > > > Subject: [ANNOUNCE] new HBase committer Sakthi
> > > >
> > > > On behalf of the HBase PMC, I'm pleased to announce that Sakthi has
> > > > accepted our invitation to become an HBase committer.
> > > >
> > > > We'd like to thank Sakthi for all of his diligent contributions to
> the
> > > > project thus far. We look forward to his continued participation in
> our
> > > > community.
> > > >
> > > > Congrats and welcome Sakthi!
> > > >
> > > >
> > >
> >
>


Re: Failed to create assembly

2019-07-15 Thread Wellington Chevreuil
Do you need all these extra maven params? If all you want is to build a tar
ball, have you tried below?

$ mvn package assembly:single -DskipTests

Em seg, 15 de jul de 2019 às 10:33, Kang Minwoo 
escreveu:

> I forgot to attach the build command.
> I try to build HBase 2.1.5 using below command.
>
> ---
> mvn -Dmaven.test.skip.exec=true -Dslf4j.version=1.7.25
> -Dhadoop.profile=3.0 -Dhadoop-three.version=3.1.2
> -Dzookeeper.version=3.4.14 -Dcheckstyle.skip=true
> -Dadditionalparam=-Xdoclint:none install site assembly:single
> ---
>
> 
> From: Kang Minwoo 
> Sent: Monday, 15 July 2019 18:31
> To: user@hbase.apache.org
> Subject: Failed to create assembly
>
> Hello Users,
>
> I try to build HBase 2.1.5 but I got a failure on Apache HBase - Assembly
> Project.
> Error Message is below.
>
> ---
> [INFO] Apache HBase - External Block Cache  SUCCESS [
> 1.750 s]
> [INFO] Apache HBase - Assembly  FAILURE [
> 11.643 s]
> [INFO] Apache HBase Shaded Packaging Invariants ... SKIPPED
> [INFO] Apache HBase Shaded Packaging Invariants (with Hadoop bundled)
> SKIPPED
> [INFO] Apache HBase - Archetypes .. SKIPPED
> [INFO] Apache HBase - Exemplar for hbase-client archetype . SKIPPED
> [INFO] Apache HBase - Exemplar for hbase-shaded-client archetype SKIPPED
> [INFO] Apache HBase - Archetype builder ... SKIPPED
> [INFO]
> 
> [INFO] BUILD FAILURE
> [INFO]
> 
> [INFO] Total time:  15:13 min
> [INFO] Finished at: 2019-07-15T18:20:54+09:00
> [INFO]
> 
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-assembly-plugin:3.0.0:single (default-cli)
> on project hbase-assembly: Failed to create assembly: Error adding file
> 'org.apache.hbase:hbase-common:test-jar:tests:2.1.5' to archive:
> /ws/hbase-2.1.5-src/hbase-common/target/test-classes isn't a file. -> [Help
> 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the
> -e switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions,
> please read the following articles:
> [ERROR] [Help 1]
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the
> command
> [ERROR]   mvn  -rf :hbase-assembly
> ---
>
> I didn't know why the assembly plugin tries to add
> 'hbase-common/target/test-classes'.
> I didn’t know what was wrong.
>
> Best regards,
> Minwoo Kang
>


Re: replication - how to change peer cluster key (zookeeper)

2019-07-11 Thread Wellington Chevreuil
Hi Marjana,

I guess OpenInx's (very valid) point here is that between steps #2 and #3 you
need to make sure there is no lag for ORIGINAL_PEER_ID, because if it has a
huge lag, it might be that some of the edits pending on its queue came in
before you added NEW_PEER_ID in step #1. In that case, since
ORIGINAL_PEER_ID will never find the slave cluster anymore, those potentially
old edits that didn't make it into the NEW_PEER_ID queue would be lost once
ORIGINAL_PEER_ID is deleted.

Em qui, 11 de jul de 2019 às 11:06, marjana  escreveu:

> Hi OpenInx,
> Correct, only ZK is being moved, hbase slave stays the same. I moved that
> earlier effortlessly.
> In order to move ZK, I will have to stop hbase. While it's down, hlogs will
> accumulate for the NEW_ID and ORIGINAL_ID peers. Once I start hbase, hlogs
> for NEW_ID will start replicating. hlogs for ORIGINAL_ID I hope will be
> disregarded when I drop that peer. I won't miss any data as long as I
> add_peer (step 1) before I disable_peer (step 2).
> Thanks
>
>
>
> --
> Sent from:
> http://apache-hbase.679495.n3.nabble.com/HBase-User-f4020416.html
>


Re: replication - how to change peer cluster key (zookeeper)

2019-07-10 Thread Wellington Chevreuil
Yep, that's a better, concise description of what I meant. You could even
do #2 after #4; it doesn't really matter, as long as the source cluster is
already trying to replicate to the new peer id.
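Sketched with the HBase 2.x Admin API (peer IDs and the quorum string are placeholders; on 1.x the equivalent calls live on ReplicationAdmin), the sequence quoted below would look roughly like:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.replication.ReplicationPeerConfig;

public class MovePeer {
  public static void main(String[] args) throws Exception {
    try (Connection conn =
             ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      // 1. add a peer pointing at the new ZK quorum; edits start queueing for it
      ReplicationPeerConfig newPeer = new ReplicationPeerConfig()
          .setClusterKey("newzk1,newzk2,newzk3:2181:/hbase");
      admin.addReplicationPeer("NEW_ID", newPeer);
      // 2. stop shipping to the old quorum address
      admin.disableReplicationPeer("ORIGINAL_ID");
      // 3-4. move ZK and restart the slave cluster (outside this sketch),
      //      then confirm NEW_ID is draining its queue
      // 5. only after verifying ORIGINAL_ID has no remaining lag:
      admin.removeReplicationPeer("ORIGINAL_ID");
    }
  }
}
```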

Em qua, 10 de jul de 2019 às 13:03, marjana  escreveu:

> You were thinking something like:
>
> 1. add_peer NEW_ID 'newZK'
> 2. disable_peer ORIGINAL_ID 'originalZK'.
> 3. stop slave hbase. move ZK.
> 4. start slave hbase. Data starts coming in for NEW_ID peer.
> 5. drop_peer ORIGINAL_ID
>
> Not sure about drop_peer, if I should do it at the end (in case something
> goes wrong with the ZK move) or after I disable it at step 2.
> Thanks
>
>
>
>
>
> --
> Sent from:
> http://apache-hbase.679495.n3.nabble.com/HBase-User-f4020416.html
>


Re: replication - how to change peer cluster key (zookeeper)

2019-07-09 Thread Wellington Chevreuil
How about adding it as a new peer, where you define a new peer ID for the
new ZK quorum? Until your new ZK quorum address is effective, replication
would accumulate edits for it; then once you complete the ZK move, it will
resume replication to that one, and the original peer ID could be removed.

Em ter, 9 de jul de 2019 às 15:18, marjana  escreveu:

> Hello,
> I have master-slave replication configured. My slave cluster's ZK needs to
> be moved. Is there a way to alter peer on my master cluster so it points to
> the new ZK?
> If I disable_peer then remove_peer, I am afraid my replication will stop
> and
> all my tables will have replication disabled.
> There is no "alter_peer" command. Any idea how to move ZK and update my
> peer's cluster key
>
> "hbase.zookeeper.quorum:hbase.zookeeper.property.clientPort:zookeeper.znode.parent"
> so we don't miss on replicating any data from the master?
> Thanks
>
>
>
> --
> Sent from:
> http://apache-hbase.679495.n3.nabble.com/HBase-User-f4020416.html
>


Re: Merged daughter region

2019-07-05 Thread Wellington Chevreuil
Hi Austin, which hbase version is in use? Is this
'7b0d3c6836cf417b007771ab48d0450f' the parent region, or one of the
daughter regions that got merged? What does meta scan show for the parent
region and original two daughters, plus the resulting region from the merge?

Em sex, 5 de jul de 2019 às 15:25, Austin Heyne  escreveu:

> Hey,
>
> We're continuing to get some heavy use out of our normalizers (split
> point pr still to come), and we're seeing an issue with daughters
> getting merged. The situation is a region is splitting and becoming the
> parent to regions A and B, one of these regions, say B is then getting
> compacted out but region A is still waiting to be compacted. Region B is
> then getting merged by our normalization plan with it's neighbor region,
> C, producing region D. This leaves us with regions A and D and the
> metadata for the parent.
>
> Now when a query comes in that spans A through D, the query planner sees
> the reference to B in the parent's metadata and tries to find that
> region. After the timeout period we get an exception that says the child
> region (B) of the parent isn't online yet but should be soon.
>
> 19/07/03 18:33:06 ERROR Executor: Exception in task 3.0 in stage 9.0
> (TID 43)
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the
> location
>  at
>
> org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:316)
>  at
>
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:156)
>  at
>
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
>  at
>
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:212)
>  at
> org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:314)
>  at
>
> org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:289)
>  at
>
> org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:164)
>  at
> org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:159)
>  at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:796)
>
> ...
> Caused by: org.apache.hadoop.hbase.client.RegionOfflineException: the
> only available region for the required row is a split parent, the
> daughters should be online soon:
>
> table_z3_v2,\x01\x09\xFC}\x18\xFA\xD2o\x99\x8D\xE5,1550014268044.7b0d3c6836cf417b007771ab48d0450f.
>  at
>
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1307)
>  at
>
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1183)
>  at
>
> org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:305)
>
>  From what I gather we should be removing the reference to B from the
> parent's metadata when B gets merged with C.
>
> Is this something people deal with or is there a good way to prevent this?
>
> Thanks,
> Austin
>
>


Re: TableSnapshotInputFormat failing to delete files under recovered.edits

2019-06-18 Thread Wellington Chevreuil
Thanks for clarifying. So given the region was already open for a while, I
guess those were just empty recovered.edits dirs under the region dir, and
my previous assumption does not really apply here. I had also checked
further on TableSnapshotInputFormat, and realised it actually performs a
copy of the table dir to a temporary *restoreDir*, which should be passed as
a parameter to the *TableSnapshotInputFormat.setInput* initialisation method:

https://github.com/apache/hbase/blob/branch-1.4/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormat.java#L212

Note the method comments on this *restoreDir *param:


>
> *restoreDir: a temporary directory to restore the snapshot into. The
> current user should have write permissions to this directory, and this
> should not be a subdirectory of rootdir. After the job is finished,
> restoreDir can be deleted.*
>

Here's the point where the snapshot data gets copied to restoreDir:

https://github.com/apache/hbase/blob/branch-1.4/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormatImpl.java#L509

So as long as we follow the javadoc advice, our concerns about potential data
loss are not valid. I guess the problem here is that when the table dir is
recreated/copied to *restoreDir*, the original ownership/permissions are
preserved for the subdirs, such as the regions' recovered.edits.


Em ter, 18 de jun de 2019 às 01:03, Jacob LeBlanc <
jacob.lebl...@microfocus.com> escreveu:

> First of all, thanks for the reply! I appreciate the time taken addressing
> our issues.
>
> > It seems the mentioned "hiccup" caused RS(es) crash(es), as you got RITs
> and recovered edits under these regions dirs.
>
> To give more context, I was making changes to increase snapshot timeout on
> region servers and did a graceful restart, so I didn't mean to crash
> anything, but it seems like I did this to too many region servers at once
> (did about half the cluster) which seemed to result in some number of
> regions getting stuck in transition. This was attempted on a live
> production cluster so the hope was to do this without downtime but it
> resulted in an outage to our application instead. Unfortunately master and
> region server logs have since rolled and aged out so I don't have them
> anymore.
>
> > The fact there was a "recovered" dir under some regions dirs means that
> when the snapshot was taken, crashed RS(es) WAL(s) had been split, but not
> completely replayed yet.
>
> Snapshot was taken many days later. File timestamps under recovered.edits
> directory were from June 6th and snapshot from the pastebin was taken on
> June 14th, but actually snapshots were taken many times with the same
> result (ETL jobs are launched at least daily in oozie). Do you mean that if
> a snapshot was taken before region was fully recovered it could result in
> this state even if snapshot was subsequently deleted?
>
> > Would you know which specific hbase version is this?
>
> It is EMR 5.22 which runs HBase 1.4.9 (with some Amazon-specific edits
> maybe? I noticed line numbers in HRegion.java in stack trace don't quite
> line up with those in the 1.4.9 tag in github).
>
> > Could your job restore the snapshot into a temp table and then read from
> this temp table using TableInputFormat, instead?
>
> Maybe we could do this, but it will take us some effort to make the
> changes, test, release, etc... Of course we'd rather not jump through hoops
> like this.
>
> > In this case, it's finding "recovered" folder under regions dir, so it
> will replay the edits there. Looks like a problem with
> TableSnapshotInputFormat, seems weird that it tries to delete edits on a
> non-staging dir (your path suggests it's trying to delete the actual edit
> folder), that could cause data loss if it would succeed to delete edits
> before RSes actually replay it.
>
> I agree that this "seems weird" to me given that I am not intimately
> familiar with all of the inner workings of hbase code. The potential data
> loss is what I'm wondering about - would data loss have occurred if we
> happened to execute our job under a user that had delete permissions in
> HDFS directories? Or did the edits actually get replayed when regions were
> in stuck and transition and the files just didn't get cleaned up? Is this
> something for which I should file a defect in JIRA?
>
> Thanks again,
>
> --Jacob LeBlanc
>
>
> -Original Message-
> From: Wellington Chevreuil [mailto:wellington.chevre...@gmail.com]
> Sent: Monday, June 17, 2019 3:55 PM
> To: user@hbase.apache.org
> Subject: Re: TableSnapshotInputFormat failing to delete files under
> recovered.edits
>
> It seems the mentio

Re: TableSnapshotInputFormat failing to delete files under recovered.edits

2019-06-17 Thread Wellington Chevreuil
It seems the mentioned "hiccup" caused RS(es) crash(es), as you got RITs
and recovered edits under these regions dirs. The fact there was a
"recovered" dir under some regions dirs means that when the snapshot was
taken, crashed RS(es) WAL(s) had been split, but not completely replayed
yet.

Since you are facing error when reading from table snapshot, and the stack
trace shows TableSnapshotInputFormat is using "HRegion.openHRegion" code
path to read snapshotted data, it will basically do the same as an RS would
when trying to assign a region. In this case, it's finding "recovered"
folder under regions dir, so it will replay the edits there. Looks like a
problem with TableSnapshotInputFormat, seems weird that it tries to delete
edits on a non-staging dir (your path suggests it's trying to delete the
actual edit folder), that could cause data loss if it would succeed to
delete edits before RSes actually replay it. Would you know which specific
hbase version is this? Could your job restore the snapshot into a temp
table and then read from this temp table using TableInputFormat, instead?

Em seg, 17 de jun de 2019 às 17:22, Jacob LeBlanc <
jacob.lebl...@microfocus.com> escreveu:

> Hi,
>
> We periodically execute Spark jobs to run ETL from some of our HBase
> tables to another data repository. The Spark jobs read data by taking a
> snapshot and then using the TableSnapshotInputFormat class. Lately we've
> been having some failures because when the jobs try to read the data, it is
> trying to delete files under the recovered.edits directory for some regions
> and the user under which we run the jobs doesn't have permissions to do
> that. Pastebin of the error and stack trace from one of our job logs is
> here: https://pastebin.com/MAhVc9JB
>
> This has started happening since upgrading to EMR 5.22 where the
> recovered.edits directory is collocated with the WALs in HDFS where it used
> to be in S3-backed EMRFS.
>
> I have two questions regarding this:
>
>
> 1)  First of all, why are these files under the recovered.edits directory?
> The timestamp of the files coincides with a hiccup we had with our cluster
> where I had to use "hbase hbck -fixAssignments" to fix regions that were
> stuck in transition. But that command seemed to work just fine and all
> regions were assigned and there have since been no inconsistencies. Does
> this mean the WALs were not replayed correctly? Does "hbase hbck
> -fixAssignments" not recover regions properly?
>
> 2)  Why is our job trying to delete these files? I don't know enough
> to say for sure, but it seems like using TableSnapshotInputFormat to read
> snapshot data should not be trying recover or delete edits.
>
> I've fixed the problems by running "assign ''" in hbase shell for
> every region that had files under the recovered.edits directory and those
> files seemed to be cleaned up when the assignment completed. But I'd like
> to understand this better especially if something is interfering with
> replaying edits from WALs (also making sure our ETL jobs don't start
> failing would be nice).
>
> Thanks!
>
> --Jacob LeBlanc
>
>


[ANNOUNCE] HBase FileSystem 1.0.0-alpha1 is now available for download

2019-06-13 Thread Wellington Chevreuil
The HBase team is happy to announce the immediate availability of Apache
HBase FileSystem 1.0.0-alpha1.

Apache HBase FileSystem provides an additional implementation of Apache
Hadoop's FileSystem interface, in order to enforce HBase's expected FileSystem
semantics on top of object-store implementations of the Hadoop FileSystem,
such as s3a.

HBase FileSystem 1.0.0-alpha1 is the first alpha release of
hbase-filesystem. It is compatible with:
- HBase 2.1.4
- Hadoop 2.9.2
- Hadoop 3.2.0

The full list of features can be found in the included CHANGES.md and
RELEASENOTES.md,
or via our issue tracker:
https://s.apache.org/hbase-filesystem-1.0.0-alpha1

Download through an ASF mirror near you:
http://www.apache.org/dist/hbase/hbase-filesystem-1.0.0-alpha1/

Relevant checksum file is available at:
http://www.apache.org/dist/hbase/hbase-filesystem-1.0.0-alpha1/hbase-filesystem-1.0.0-alpha1-src.tar.gz.sha512

PGP signature available at:
http://www.apache.org/dist/hbase/hbase-filesystem-1.0.0-alpha1/hbase-filesystem-1.0.0-alpha1-src.tar.gz.asc

Project member signature keys can be found at:
https://www.apache.org/dist/hbase/KEYS

Questions, comments, and problems are always welcome at:
d...@hbase.apache.org.

Thanks to all who contributed and made this release possible.

Cheers,
The HBase Dev Team


Re: [ANNOUNCE] New Committer: Wellington Chevreuil

2019-06-08 Thread Wellington Chevreuil
Thanks for the warm greetings, everyone. Looking forward to keep
contributing the best I can!

Em sex, 7 de jun de 2019 às 22:01, Xu Cang 
escreveu:

> Congrats Wellington and welcome!
>
>
> On Fri, Jun 7, 2019 at 1:21 PM Sakthi  wrote:
>
> > Hurray! Congrats Wellington. Well deserved one!
> >
> > Sakthi
> >
> > On Fri, Jun 7, 2019 at 12:58 PM Wei-Chiu Chuang
> >  wrote:
> >
> > > Yay!! Congrats for the achievement!
> > >
> > > On Fri, Jun 7, 2019 at 12:56 PM Andrew Purtell <
> andrew.purt...@gmail.com
> > >
> > > wrote:
> > >
> > > > Congratulations and welcome, Wellington.
> > > >
> > > > > On Jun 7, 2019, at 3:11 AM, Peter Somogyi 
> > wrote:
> > > > >
> > > > > On behalf of the HBase PMC, I'm pleased to announce that Wellington
> > > > > Chevreuil has accepted our invitation to become an HBase committer.
> > > > >
> > > > > Thanks for all your hard work Wellington; we look forward to more
> > > > > contributions!
> > > > >
> > > > > Please join me in extending congratulations to Wellington!
> > > >
> > >
> >
>


Re: How does HBase deal with master switch?

2019-06-06 Thread Wellington Chevreuil
Hey Zili,

Besides what Duo explained previously, just clarifying on some concepts to
your previous description:

1) RegionServer started full gc and timeout on ZooKeeper. Thus ZooKeeper
> regarded it as failed.
>
ZK just knows about sessions and clients, not the type of client connecting
to it. Clients open a session in ZK, then keep pinging back ZK
periodically, to keep the session alive. In the case of long full GC
pauses, the client (RS, in this case), will fail to ping back within the
required period. At this point, ZK will *expire* the session.

2) ZooKeeper launched a new RegionServer, and the new one started to serve.
>
ZK doesn't launch a new RS; it doesn't know about RSes, only client
sessions. With the session expiration, the Master will be notified that an
RS is potentially gone, and will start the process explained by Duo.

3) The old RegionServer finished gc and thought itself was still active and
> serving.
>
What really happens here is that once the RS is back from GC, it will try to
ping ZK again for that session; ZK will reject it because the session has
already expired, and the RS will then kill itself.





Em qui, 6 de jun de 2019 às 14:58, 张铎(Duo Zhang) 
escreveu:

> Once a RS is started, it will create its wal directory and start to write
> wal into it. And if master thinks a RS is dead, it will rename the wal
> directory of the RS and call recover lease on all the wal files under the
> directory to make sure that they are all closed. So even after the RS is
> back after a long GC, before it kills itself because of the
> SessionExpiredException, it can not accept any write requests any more
> since its old wal file is closed and the wal directory is also gone so it
> can not create new wal files either.
>
> Of course, you may still read from the dead RS at this moment
> so theoretically you could read a stale data, which means HBase can not
> guarantee ‘external consistency’.
>
> Hope this solves your problem.
>
> Thanks.
>
> Zili Chen  于2019年6月6日周四 下午9:38写道:
>
> > Hi,
> >
> > Recently from the book, ZooKeeper: Distributed Process Coordination, I
> find
> > a paragraph mentions that, HBase once suffered by
> >
> > 1) RegionServer started full gc and timeout on ZooKeeper. Thus ZooKeeper
> > regarded it as failed.
> > 2) ZooKeeper launched a new RegionServer, and the new one started to
> serve.
> > 3) The old RegionServer finished gc and thought itself was still active
> and
> > serving.
> >
> > in Chapter 5 section 5.3.
> >
> > I'm interested on it and would like to know how HBase community overcame
> > this issue.
> >
> > Best,
> > tison.
> >
>


Re: Difference of n columns with 1 version vs 1 column with n versions

2019-03-31 Thread Wellington Chevreuil
I would agree with JMS, to ideally avoid wide tables. Plus, there is still
some inconsistent behaviour in the versions feature (see HBASE-21596, for
example). I would also favour option "a" over "b", as it seems to give more
flexibility in the way you can access/delete these columns.
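For illustration, the two options from the question could be declared like this in hbase shell (the table and family names here are hypothetical):

```
# Option (a): wide row, n qualifiers, 1 version each
create 'urls_wide', {NAME => 'd', VERSIONS => 1}
put 'urls_wide', 'user1', 'd:urls_hour1', '<url-set-1>'
put 'urls_wide', 'user1', 'd:urls_hour2', '<url-set-2>'

# Option (b): 1 qualifier, n versions
create 'urls_deep', {NAME => 'd', VERSIONS => 100}
put 'urls_deep', 'user1', 'd:urls', '<url-set-1>'   # each put adds a version
put 'urls_deep', 'user1', 'd:urls', '<url-set-2>'
```

With option (a) a single qualifier can be read or deleted directly; with option (b) individual values must be addressed by timestamp.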

Em dom, 31 de mar de 2019 às 00:12, Jean-Marc Spaggiari <
jean-m...@spaggiari.org> escreveu:

> Hi Serkan,
>
> This is my personal opinion and some might not share it ;)
>
> I tried to go with the deep versions approach for one project and I found
> issues on some of the calls (pagination over versions as an example). So if
> for you both (The deep version and wide columns) are the same, I will say,
> better go with the wide columns.
>
> Also, why not go with a tall table instead of a wide one?
>
> JMS
>
> Le sam. 30 mars 2019 à 01:14, Serkan Uzunbaz  a écrit :
>
> > Hi all,
> > I have a question regarding the difference between storing a set of data
> > as:
> > *a) n columns with 1 version each*
> > *b) 1 column with n versions*
> >
> > Since the storage unit in hbase is a cell (rowkey, column family, column
> > qualifier, timestamp), is there a difference between the above two
> storage
> > options in terms of read/write performance, compaction/GC time, etc?
> >
> > I know it is not recommended to use a high number of versions if you do not
> > really need them. However, if those n versions of data are really needed
> > for reading, then will it cause any problem to store the data in a single
> > column with n versions. Also, even if max versions is set to 1 for a
> column
> > (option a), new values are still stored as a new cell and old cell is
> > deleted at compaction time. So, I also feel like compaction-wise two
> > options are identical.
> > I wonder if there is anything that makes one option superior to the
> other.
> >
> > *Example*: To clarify more, say the data to be stored is set of urls
> > visited in certain time ranges and we want to keep the last 100 hours of
> > url sets:
> >
> > *a) store each hour as column name with one url set in it (column names
> > will be used in cyclic manner (data for hour 101 will be written into
> > column 1))*
> > column_qualifier: value
> > ---
> > urls_hour1: 
> > urls_hour2: 
> > urls_hour3: 
> > ...
> > urls_hour100: 
> >
> >
> > *b) store in a single column with 100 versions (one for each hour) (max
> > versions for column will be 100 and hbase will do the auto-compaction for
> > old versions)*
> > column_qualifier: value @ timestamp
> > ---
> > urls:  @ ts_hour1,  @ ts_hour2,  @
> > ts_hour3,  ,  @ ts_hour100
> >
> > Thanks,
> > -Serkan
> >
>


Re: Region flush delayed even memstore size above MEMSTORE_FLUSH config.

2019-03-11 Thread Wellington Chevreuil
Well, there would already be a flush request for the region in the queue
anyway, and this would force a flush of all stores, so it doesn't look like
adding an additional request would have any benefit, does it?

Em seg, 11 de mar de 2019 às 04:22, Kang Minwoo 
escreveu:

> Hello Users.
>
> ---
> HBase version is 1.2.9
> ---
>
> I wonder this region operation is intended.
>
> I set "hbase.regionserver.optionalcacheflushinterval" slightly shorter
> than the default setting.
> So cf has old edit, they flush after a random delay.
>
> If the flush queue already has a flush request triggered by old edits, a
> flush request triggered by the memstore size exceeding the MEMSTORE_FLUSH
> config for the same region is ignored, because the region is already in
> regionsInQueue.
>
> As a result, the memstore size increases until the random delay elapses.
>
> I think a flush request triggered by the memstore size exceeding the
> MEMSTORE_FLUSH config should have a higher priority than a flush request
> triggered by old edits.
>
>
> Here are related codes.
>
> ---
> @Override
> public void requestFlush(Region r, boolean forceFlushAllStores) {
> synchronized (regionsInQueue) {
> if (!regionsInQueue.containsKey(r)) { // <- Here
> FlushRegionEntry fqe = new FlushRegionEntry(r,
> forceFlushAllStores);
> this.regionsInQueue.put(r, fqe);
> this.flushQueue.add(fqe);
> }
> }
> }
>
> @Override
> public void requestDelayedFlush(Region r, long delay, boolean
> forceFlushAllStores) {
> synchronized (regionsInQueue) {
> if (!regionsInQueue.containsKey(r)) { // <- Here
> FlushRegionEntry fqe = new FlushRegionEntry(r,
> forceFlushAllStores);
> fqe.requeue(delay);
> this.regionsInQueue.put(r, fqe);
> this.flushQueue.add(fqe);
> }
> }
> }
> ---
>
> Best regards,
> Minwoo Kang
>


Re: HBase Coprocessor calls fail with RegionNotFoundException during region splits

2019-02-19 Thread Wellington Chevreuil
The client RPC properties are not hardcoded: "pause" is defined by
"hbase.client.pause", and the number of retries by
"hbase.client.retries.number". Since the attempted region got split and
will never be online again, maybe you can decrease these properties in your
coprocessor client code to "fail fast", and skip the given region in case
there's a "NotServingRegionException" wrapped in the response.
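For reference, those two properties in hbase-site.xml form — the values below are only illustrative of a fail-fast setup, not recommendations:

```xml
<!-- Illustrative fail-fast client settings -->
<property>
  <name>hbase.client.pause</name>
  <value>100</value> <!-- ms base pause between retries; 100 is also the default -->
</property>
<property>
  <name>hbase.client.retries.number</name>
  <value>3</value> <!-- down from the 1.x default of 35 -->
</property>
```

Since these are client-side settings, they can also be set on the Configuration object used to create the Connection that issues the coprocessor calls.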

Em ter, 19 de fev de 2019 às 15:57, Ben Watson 
escreveu:

> Hello,
>
> I’m running HBase 1.4.4. I’ve got a simple endpoint coprocessor that sums
> records when called. Whenever a split occurs, it fails when called,
> throwing a RegionNotFoundException. The error manifests itself by spending
> 10 minutes retrying the connection 35 times:
>
> 2019-02-19 09:42:34 INFO  o.a.h.h.c.RpcRetryingCaller
> [hconnection-0x100f9a76-shared--pool3-t215]: Call exception, tries=25,
> retries=35, started=331810 ms ago, cancelled=false,
> msg=org.apache.hadoop.hbase.NotServingRegionException: Region
> coprocessor-test,1,1550568604433.63f03f2a494dc5756238ba08af437af6. is not
> online on ,16020,1550568101996
>
> at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3082)
>
> at
>
> org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1275)
>
> at
>
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2201)
>
> at
>
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36617)
>
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2354)
>
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>
> at
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
>
> at
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)
>
> row '1_pfx-cfb0e548-f399-4059-af80-54fe9b7a828f' on table
> 'coprocessor-test' at
>
> region=coprocessor-test,1_pfx-7b2b6071-7d2c-4282-9645-31ca027327dc6549,1550568988094.f6cc0c6245702c544fb7fe65c1e3299b.,
> hostname=l,16020,1550568101996, seqNum=630
>
> before eventually failing:
>
> Tue Feb 19 09:37:02 UTC 2019,
> RpcRetryingCaller{globalStartTime=1550569022304, pause=100, retries=35},
> org.apache.hadoop.hbase.NotServingRegionException:
> org.apache.hadoop.hbase.NotServingRegionException: Region
> coprocessor-test,9,1550568604433.2d98945e85cca401a2c5d8bd777a0451. is not
> online on ,16020,1550568099593
>
> at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3082)
>
> at
>
> org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1275)
>
> at
>
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2201)
>
> at
>
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36617)
>
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2354)
>
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>
> at
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
>
> at
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)
>
> If I then re-run the coprocessor, it works without any issues. So, I need a
> way to quickly catch this error and manually retry it until it works. I
> can't see a way to change any useful parameter – the 35 retries and the
> time between retries seem to be hardcoded.
>
> Can anyone suggest how I can go about solving this?
>
> Regards,
>
> Ben
>


Re: Throttle major compaction for version 1.1.2

2019-01-23 Thread Wellington Chevreuil
I can see in branch-1.1 that CompactSplitThread does load a
CompactionThroughputController here [1], and it's possible to set
PressureAwareCompactionThroughputController [2] via the
"hbase.regionserver.throughput.controller" property.
PressureAwareCompactionThroughputController defines additional
configurations to control, and if necessary slow down, compaction throughput.

HStore also considers off-peak hours when creating the compaction request
[3]. The off-peak configurations available are "hbase.offpeak.start.hour"
and "hbase.offpeak.end.hour".

[1]
https://github.com/apache/hbase/blob/branch-1.1/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java#L162
[2]
https://github.com/apache/hbase/blob/branch-1.1/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/PressureAwareCompactionThroughputController.java
[3]
https://github.com/apache/hbase/blob/branch-1.1/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java#L1577
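Putting the above together, a sketch of the relevant hbase-site.xml entries (the controller class is the one from [2]; the off-peak hours are example values only):

```xml
<!-- Throttle compactions via the pressure-aware controller -->
<property>
  <name>hbase.regionserver.throughput.controller</name>
  <value>org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController</value>
</property>
<!-- Example: treat midnight to 06:00 as off-peak -->
<property>
  <name>hbase.offpeak.start.hour</name>
  <value>0</value>
</property>
<property>
  <name>hbase.offpeak.end.hour</name>
  <value>6</value>
</property>
```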

Em ter, 22 de jan de 2019 às 04:28, Abhishek Gupta 
escreveu:

> We are seeing frequent long GC pause whenever major compaction gets
> triggered on region servers.
> I wanted to know how to throttle major compaction on HBase 1.1.2.
>
> I see a number of throttling parameters added on versions 1.2+ including
> offpeak hours params, what are the options for HBase 1.1.2
>


Re: how do disable encryption on a CF

2019-01-23 Thread Wellington Chevreuil
I believe this is a limitation of the hbase shell, which provides no means
to set a null value for a column family property. An alternative would be to
use the Java client API; below is a code example:

Configuration configuration = HBaseConfiguration.create();

// args[0]: table name, args[1]: column family name
try (Connection connection = ConnectionFactory.createConnection(configuration);
     Admin admin = connection.getAdmin()) {
  ColumnFamilyDescriptorBuilder descriptorBuilder =
      ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes(args[1]));
  // A null encryption type clears the ENCRYPTION attribute on the family
  descriptorBuilder.setEncryptionType(null);
  admin.modifyColumnFamily(TableName.valueOf(args[0]),
      descriptorBuilder.build());
}

If you really want/need to have this functionality working from the shell,
then an alternative would be to create your own Master coprocessor, where in
the preModifyTable method you would check the ColumnFamilyDescriptor value
for the ENCRYPTION property, replacing 'NONE' or '' values with null.

Em qua, 23 de jan de 2019 às 16:04, marjana  escreveu:

> Hello,
>
> I setup hbase encryption and enabled it on a subset of tables.
> I am looking for a way to revert back to non encrypted tables without
> having
> to recreate a new set of tables and copy data from encrypted to
> nonencrypted.
>
> I have tried in hbase shell a few ways but can't get it to work:
>
> alter 't1',{NAME => 'e', ENCRYPTION => ''}
>
> ERROR: java.io.IOException: Cipher  previously failed test
> at
>
> org.apache.hadoop.hbase.util.EncryptionTest.testEncryption(EncryptionTest.java:155)
>
>
> alter 't1',{NAME => 'e', ENCRYPTION => 'NONE'}
>
> ERROR: java.io.IOException: Cipher NONE previously failed test
> at
>
> org.apache.hadoop.hbase.util.EncryptionTest.testEncryption(EncryptionTest.java:155)
>
>
>
> Is there a way to make this work?
> Thanks!
>
>
>
>
>
> --
> Sent from:
> http://apache-hbase.679495.n3.nabble.com/HBase-User-f4020416.html
>


Re: tables in archive directory

2019-01-22 Thread Wellington Chevreuil
No, u can't manually delete files from archive folder. Hbase would remove
those automatically once it doesn't need it anymore. The fact those files
are still there means those are probably still referenced by snapshots or
cloned tables.
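A quick way to check from hbase shell (the snapshot name below is a placeholder):

```
hbase> list_snapshots
# Only if a listed snapshot is truly no longer needed:
hbase> delete_snapshot 'old_snapshot'
```

Once no snapshot or cloned table references an archived HFile anymore, the HFile cleaner chore removes it from the archive automatically.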

On Tue, 22 Jan 2019, 16:44 Munoz, Robert wrote:

> Hello,
>
> I am performing maintenance on an HBase cluster to free up storage, let's
> call this the target cluster. The tables that are in target cluster were
> exported from source cluster via export snapshot and subsequent import
> snapshot. When importing snapshot, the files were placed in archive folder.
> Is this folder now part of table space? If not, can I safely purge the
> imported snapshots on the target cluster.
>
>
> Thanks,
>
> Robert Munoz
> 949-220-5169
>


Re: coprocessor and hbase.regionserver.handler.count

2019-01-21 Thread Wellington Chevreuil
> I am not familiar with thread-dumping or profiler. Is there any reference
> that I can study about how to use those tools to analyze this kind of issue?

You can start by collecting jstack [1] frames of the RS process while your
test is running. The jstack command dumps the state of all threads of the
given java process; you can then check which threads are blocked/running most
of the time. The sample script below collects 20 frames at 1-second
intervals; note that you would need to replace RSPID with the actual RS pid,
and it assumes the RS process is running as the hbase user (jstack needs to
be run as the same user as the java process).

for i in $(seq 1 20); do sudo -u hbase /usr/java/default/bin/jstack $RSPID > /tmp/rs-jstack-$i.txt; sleep 1s; done
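Once the frames are collected, tallying thread states across them quickly shows whether the handlers are mostly RUNNABLE or BLOCKED. A small sketch (the dump text below is synthetic so the snippet runs standalone; in practice, read the /tmp/rs-jstack-*.txt files):

```python
import re
from collections import Counter

def summarize_jstack(dump_text):
    """Count thread states (RUNNABLE, BLOCKED, WAITING, ...) in one jstack dump."""
    # jstack emits lines like: java.lang.Thread.State: BLOCKED (on object monitor)
    return Counter(re.findall(r"java\.lang\.Thread\.State: (\w+)", dump_text))

# Synthetic excerpt of a dump; thread names are made up for illustration
sample = '''
"RpcServer.handler=1" #42 daemon prio=5
   java.lang.Thread.State: BLOCKED (on object monitor)
"RpcServer.handler=2" #43 daemon prio=5
   java.lang.Thread.State: RUNNABLE
"RpcServer.handler=3" #44 daemon prio=5
   java.lang.Thread.State: BLOCKED (on object monitor)
'''

print(summarize_jstack(sample))  # Counter({'BLOCKED': 2, 'RUNNABLE': 1})
```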



>

> I just observed that all resources (CPU, memory, I/O) are still idle, but
> the avg response time for a single request (put/get) becomes slower when
> the concurrency increases.
>
Interesting, is the CP code using any type of locks? How is the response
time for single requests being measured? Is it possible to run this same
client with the CP disabled, and see if the performance issue still
manifests?

[1] https://docs.oracle.com/javase/8/docs/technotes/tools/unix/jstack.html



Em qui, 17 de jan de 2019 às 16:14, Ankit Singhal 
escreveu:

> I hope you have configured 200+ handlers in server config and not creating
> a new connection in every thread. Sharing your java code, hbase version,
> machine configuration and hbase-site.xml may help us comment more.
>
> Regards,
> Ankit Singhal
>
> On Wed, Jan 16, 2019 at 11:35 PM ming.liu  wrote:
>
> > Thanks, Stack,
> >
> > I still cannot figure out the root cause. I am not familiar with
> > thread-dumping or profiler. Is there any reference that I can study about
> > how to use those tools to analyze this kind of issue?
> > Or is there some similar performance test that I can refer to? Maybe my
> > test is wrong.
> > I just write a simple java program to get a row from a table in a loop.
> > And by spawning this program from 1 to 200, and check the response time.
> > I just observed that all resources (CPU, memory, I/O) are still idle, but
> > the avg response time for a single request (put/get) becomes slower when
> > the concurrency increases. All the requests go to the same region in my
> > test.
> > So I am trying to understand those parameters.
> >
> > Thanks,
> > Ming
> >
> > -Original Message-
> > From: Stack 
> > Sent: Thursday, January 17, 2019 3:17 AM
> > To: Hbase-User 
> > Subject: Re: coprocessor and hbase.regionserver.handler.count
> >
> > On Mon, Jan 14, 2019 at 5:37 PM ming.liu  wrote:
> >
> > > Hi, all,
> > >
> > >
> > >
> > > Our application is using coprocessor to communicate with HBase, not
> using
> > > the get()/put() client API. When there are large concurrency, the
> > > performance is degrading very fast.
> > >
> > >
> > Can you figure why? Thread-dumping, profiler?
> >
> >
> > > I checked there is hbase.regionserver.handler.count which control how
> > many
> > > thread in RS to handle client request. Will coprocessor use those same
> > > threads?
> > >
> > >
> > Yes. CPs decorate the existing read/write paths serviced by handlers.
> >
> > S
> >
> >
> >
> > >
> > >
> > > thanks,
> > >
> > > Ming
> > >
> > >
> > >
> > >
> >
> >
>


Re: HBase top activity

2019-01-09 Thread Wellington Chevreuil
I don't think it is possible to track individual scans' resource usage. The
currently available JMX metrics are an aggregation of all request types
(scanNext, append, get, delete) at the per-region level on individual RSes,
and these are basically counts of the number of those requests. Resource
usage is measured at the RS process level, which also makes it hard to
account for it per individual scan. Sure, you could break those down by
looking at each thread's usage with some monitoring tool such as VisualVM,
but then you would still need to map each RPC handler to a specific scan,
and depending on how large the result is, a single scan may be served by
different handlers every time it calls "next" to fetch another batch of
results it's iterating through.
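That said, the aggregate per-region counts mentioned above are exposed over each RS's HTTP /jmx endpoint and are easy to sum up. The sketch below parses a synthetic response; the bean attribute naming is an assumption modeled on the per-region metrics source, so check your own /jmx output for the exact keys:

```python
import json

# Synthetic excerpt of a RegionServer /jmx response (attribute names assumed)
jmx_payload = json.dumps({
    "beans": [{
        "name": "Hadoop:service=HBase,name=RegionServer,sub=Regions",
        "Namespace_default_table_t1_region_abc_metric_scanCount": 120,
        "Namespace_default_table_t1_region_abc_metric_getCount": 30,
        "Namespace_default_table_t2_region_def_metric_scanCount": 5,
    }]
})

def total_by_metric(payload, metric):
    """Sum one per-region counter (e.g. scanCount) across all beans/regions."""
    return sum(v for bean in json.loads(payload)["beans"]
               for k, v in bean.items()
               if isinstance(v, int) and k.endswith("_metric_" + metric))

print(total_by_metric(jmx_payload, "scanCount"))  # 125
```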

Em seg, 7 de jan de 2019 às 14:27, shicheng31...@gmail.com <
shicheng31...@gmail.com> escreveu:

Maybe a coprocessor can help you in some way.
>
>
>
> shicheng31...@gmail.com
>
> From: Meirav Malka
> Date: 2019-01-07 19:22
> To: Hbase User Group; dev
> Subject: HBase top activity
> Hi,
>
> Does anyone know if there's a way to get the current operations running in
> an HBase cluster?
> We want to be able to detect the running scans, their execution time,
> number of executions, CPU, memory, and IO (as can be found in an Oracle
> database).
> Is there any tracking of this data in HBase?
>
> Thanks
>


Re: Getting old value for the cell after TTL expiry of latest value cell value set with TTL.

2018-12-20 Thread Wellington Chevreuil
I believe this is the same issue as when deleting with a specific timestamp
(HBASE-21596). The scanner triggered during flush will let the 1st version
go through because it's still the only version at that time. The second
flush will make the second version (with TTL) also go through. TTL
expiration will then put a delete marker. However, the scanner
implementations seem to have a problem when counting deleted cells and
versions. In this situation, there are actually two cell versions in the
store. The latest one is deleted, so the scanner will correctly mark it as
deleted, but will not increase its version counter for that cell. Then when
it gets to the older version, it's not deleted and, because the scanner had
not increased the version counter before, it will (erroneously) allow it to
go through. More details on HBASE-21596.

If you try the following sequence:

> put 'sepTest', '18', 'data:value', '18'

> put 'sepTest', '18', 'data:value', 'update_18', {TTL => 10}

> flush 'sepTest'
That would cause only the second version to be flushed. After TTL
expiration, your get would not return any value. This is because the scan
run during the flush only allowed the second version to go through.

Em qui, 20 de dez de 2018 às 17:06, Pankaj Birat 
escreveu:

> Hi,
>
> I am using HBase version 1.2.7
>
> Getting old value for the cell after TTL expiry of latest cell value set
> with TTL.
>
> Table:
>
> COLUMN FAMILIES DESCRIPTION
>
> {NAME => 'data', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY =>
> 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL
> => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE =>
> 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '1'}
>
> First time I am putting cell value without TTL and second time I am puting
> value with TTL.
>
> Eg command sequencee
>
> > put 'sepTest', '18', 'data:value', '18'
>
> > flush 'sepTest'
>
> > get 'sepTest', '18'
>
> value : 18
>
> > put 'sepTest', '18', 'data:value', 'update_18', {TTL => 10}
>
> > get 'sepTest', '18'
>
> value : update_18
>
> >  flush 'sepTest'
>
> > get 'sepTest', '18'
>
> value : update_18
>
> after around TTL expiry ( 10 )
>
> > get 'sepTest', '18'
>
> value : 18
>
> > major_compact 'sepTest'
>
> Since my max version is set to 1, I am not able to understand why I am
> getting value 18?
>
> As per my understanding, the second put with TTL should have overridden
> the old value, and after the expiry of the second put, I should not get
> any value.
>
> Attaching screenshot for reference.
>


Re: hadoop hdfs ha QJM namenode failover causes dead regionservers (IOE in log roller) , master server abort, and needed hbck -fixAssignments

2015-12-15 Thread Wellington Chevreuil
Hi Colin,

The timeout messages are usually a consequence of other issues with the
connectivity between the Namenode and the QJM. Assuming the Regionservers are
properly configured for HDFS HA, pointing to an HDFS nameservice instead of a
direct namenode address, they should also be resilient to a failover.

Considering the ZooKeeper session timeout message in the Regionserver log
below, I would look first for a network issue on the cluster, but it's just
an initial guess:

…
> 2015-12-09 04:11:35,413 INFO org.apache.zookeeper.ClientCnxn: Unable to
> reconnect to ZooKeeper service, session 0x44e6c2f20980003 has expired,
> closing socket connection
...



> On 15 Dec 2015, at 01:17, Colin Kincaid Williams  wrote:
> 
> We had a namenode go down due to timeout with the hdfs ha qjm journal:
> 
> 
> 
> 2015-12-09 04:10:42,723 WARN
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 19016
> ms (timeout=2 ms) for a response for sendEdits
> 
> 2015-12-09 04:10:43,708 FATAL
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for
> required journal (JournalAndStream(mgr=QJM to [10.42.28.221:8485,
> 10.42.28.222:8485, 10.42.28.223:8485], stream=QuorumOutputStream starting
> at txid 8781293))
> 
> java.io.IOException: Timed out waiting 2ms for a quorum of nodes to
> respond.
> 
> at
> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
> 
> at
> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
> 
> at
> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
> 
> at
> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
> 
> at
> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
> 
> at
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
> 
> at
> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
> 
> at
> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
> 
> at
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
> 
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:1695)
> 
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1669)
> 
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:409)
> 
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:205)
> 
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44068)
> 
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
> 
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
> 
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
> 
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
> 
> at java.security.AccessController.doPrivileged(Native Method)
> 
> at javax.security.auth.Subject.doAs(Subject.java:415)
> 
> 
> While this is disturbing in its own right, I'm further annoyed that HBase
> shut down 2 region servers. Furthermore, we had to run hbck -fixAssignments
> to repair HBase, and I'm not sure that the data from the shut-down regions
> was available, and if our hbase service itself was available afterwards:
> 
> 
> 2015-12-09 04:10:44,320 ERROR org.apache.hadoop.hbase.master.HMaster:
> Region server ^@^@hbase008r09.comp.prod.local,60020,1436412712133 reported
> a fatal error:
> 
> ABORTING region server hbase008r09.comp.prod.local,60020,1436412712133: IOE
> in log roller
> 
> Cause:
> 
> java.io.IOException: cannot get log writer
> 
>  at
> org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:716)
> 
>  at
> org.apache.hadoop.hbase.regionserver.wal.HLog.createWriterInstance(HLog.java:663)
> 
>  at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:595)
> 
>  at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
> 
>  at java.lang.Thread.run(Thread.java:722)
> 
> Caused by: java.io.IOException: java.io.IOException: Failed on local
> exception: java.io.IOException: Response is null.; Host Details : local
> host is: "hbase008r09.comp.prod.local/10.42.28.192"; destination host is:
> "hbasenn001.comp.prod.local":8020;
> 
>  at
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:106)
> 
>  at
> org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:713)
> 
>  ... 4 more
> 
> Caused by: java.io.IOException: Failed on local exception:
> java.io.IOException: Response is null.; Host Details : local host is:
> "hbase008r09.comp.prod.local/10.42.28.192"; destination hos

Re: I can't start cluster due to zookeeper

2015-04-27 Thread Wellington Chevreuil
Hi,

Have you checked if your ZK quorum is properly running, before trying to start 
HBase? Also, the hostnames defined for ZK quorum nodes seem quite unusual. 
Shouldn't these be as follows?

 
 <property>
   <name>hbase.zookeeper.quorum</name>
   <value>pc225.emulab.net,pc273.emulab.net,pc210.emulab.net</value>
   <description>The directory shared by RegionServers.</description>
 </property>


Regards,
Wellington.

On 26 Apr 2015, at 19:08, Bo Fu  wrote:

> Hi all,
> 
> I have a problem starting a cluster of 1 master and 3 region servers. When I 
> start the cluster, the HMaster and HRegionServer processes automatically exit.
> 
> My hbase-site.xml:
> 
> <configuration>
>   <property>
>     <name>hbase.master</name>
>     <value>hadoopmaster:6</value>
>   </property>
>   <property>
>     <name>hbase.rootdir</name>
>     <value>hdfs://hadoopmaster:9000/hbase</value>
>   </property>
>   <property>
>     <name>hbase.zookeeper.property.dataDir</name>
>     <value>/proj/ucare/bo/hadoop_data/zookeeper</value>
>   </property>
>   <property>
>     <name>hbase.zookeeper.quorum</name>
>     <value>pc225.emulab.net,pc273.emulab.net,pc210.emulab.net</value>
>     <description>The directory shared by RegionServers.</description>
>   </property>
>   <property>
>     <name>hbase.zookeeper.property.clientPort</name>
>     <value>2181</value>
>   </property>
>   <property>
>     <name>hbase.cluster.distributed</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>dfs.replication</name>
>     <value>1</value>
>   </property>
> </configuration>
> 
> 
> Log file:
> 
> 2015-04-26 11:38:45,938 INFO  
> [main-SendThread(pc273.emulab.net:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server 
> pc273.emulab.net/155.98.39.73:2181.
>  Will not attempt to authenticate using SASL (unknown error)
> 2015-04-26 11:38:45,939 INFO  
> [main-SendThread(pc273.emulab.net:2181)] 
> zookeeper.ClientCnxn: Socket connection established to 
> pc273.emulab.net/155.98.39.73:2181,
>  initiating session
> 2015-04-26 11:38:45,940 INFO  
> [main-SendThread(pc273.emulab.net:2181)] 
> zookeeper.ClientCnxn: Unable to read additional data from server sessionid 
> 0x0, likely server has closed socket, closing socket connection and 
> attempting reconnect
> 2015-04-26 11:38:47,022 INFO  
> [main-SendThread(pc225.emulab.net:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server 
> pc225.emulab.net/155.98.39.25:2181.
>  Will not attempt to authenticate using SASL (unknown error)
> 2015-04-26 11:38:47,023 INFO  
> [main-SendThread(pc225.emulab.net:2181)] 
> zookeeper.ClientCnxn: Socket connection established to 
> pc225.emulab.net/155.98.39.25:2181,
>  initiating session
> 2015-04-26 11:38:47,025 INFO  
> [main-SendThread(pc225.emulab.net:2181)] 
> zookeeper.ClientCnxn: Unable to read additional data from server sessionid 
> 0x0, likely server has closed socket, closing socket connection and 
> attempting reconnect
> 2015-04-26 11:38:47,994 INFO  
> [main-SendThread(pc332.emulab.net:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server 
> pc332.emulab.net/155.98.39.132:2181.
>  Will not attempt to authenticate using SASL (unknown error)
> 2015-04-26 11:39:17,150 INFO  
> [main-SendThread(pc332.emulab.net:2181)] 
> zookeeper.ClientCnxn: Client session timed out, have not heard from server in 
> 30025ms for sessionid 0x0, closing socket connection and attempting reconnect
> 2015-04-26 11:39:17,251 WARN  [main] zookeeper.RecoverableZooKeeper: Possibly 
> transient ZooKeeper, 
> quorum=pc273.emulab.net:2181,pc225.emulab.net:2181,pc332.emulab.net:2181,
>  exception=org.apache.zookeeper.KeeperException$ConnectionLossException: 
> KeeperErrorCode = ConnectionLoss for /hbase
> 2015-04-26 11:39:17,251 ERROR [main] zookeeper.RecoverableZooKeeper: 
> ZooKeeper create failed after 4 attempts
> 2015-04-26 11:39:17,255 ERROR [main] master.HMasterCommandLine: Master exiting
> java.lang.RuntimeException: Failed construction of Master: class 
> org.apache.hadoop.hbase.master.HMaster
>at 
> org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:1982)
>at 
> org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:198)
>at 
> org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:139)
>at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>at 
> org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
>at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1996)
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
> KeeperErrorCode = ConnectionLoss for /hbase
>at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:7

Re: write availability

2015-04-07 Thread Wellington Chevreuil
The data is stored in files on HDFS. If a RS goes down, the master knows which 
regions were on that RS and which HDFS files contain data for those regions, so 
it will just assign the regions to other RSs, and those RSs will have access to 
the regions' data because it's stored on HDFS. The RS does not "own" the disk; 
that is HDFS's job, so recovery in this case is transparent.
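
As a toy illustration of that reassignment (an editorial sketch, not actual HBase code — the class and method names are made up): because the HFiles live on HDFS, the master can simply hand a dead server's regions to the survivors.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy sketch of what the Master conceptually does when a RegionServer dies:
// its regions are reassigned round-robin to the surviving servers, since the
// underlying files are on HDFS and any server can serve them.
public class ReassignSketch {
    public static Map<String, List<String>> reassign(
            Map<String, List<String>> assignments, String deadServer) {
        List<String> orphaned = assignments.remove(deadServer);
        List<String> survivors = new ArrayList<>(assignments.keySet());
        int i = 0;
        for (String region : orphaned) {
            // spread the orphaned regions over the remaining servers
            assignments.get(survivors.get(i++ % survivors.size())).add(region);
        }
        return assignments;
    }

    public static void main(String[] args) {
        Map<String, List<String>> a = new LinkedHashMap<>();
        a.put("rs1", new ArrayList<>(List.of("r1", "r2")));
        a.put("rs2", new ArrayList<>(List.of("r3")));
        a.put("rs3", new ArrayList<>(List.of("r4")));
        reassign(a, "rs3"); // r4 is now hosted by a surviving server
        System.out.println(a);
    }
}
```

No data moves in this step; only the assignment metadata changes, which is why the recovery is transparent to clients.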


On 7 Apr 2015, at 16:51, Marcelo Valle (BLOOMBERG/ LONDON) 
 wrote:

> So if a RS goes down, it's assumed you lost the data on it, right?
> HBase has replication on HDFS, so if a RS goes down it doesn't mean I lost 
> all the data, as I could still have the replicas... But what happens if all the RSs 
> hosting a specific region go down? 
> What if one RS from this group comes back again, with its disk intact, with 
> all the data it had before crashing?
> 
> 
> From: user@hbase.apache.org 
> Subject: Re: write availability
> 
> When a RS goes down, the Master will try to assign the regions on the 
> remaining RSes. When the RS comes back, after a while, the Master balancer 
> process will re-distribute regions between RS, so the given RS will be 
> hosting regions, but not necessarily the one it used to host before it went 
> down.
> 
> 
> On 7 Apr 2015, at 16:31, Marcelo Valle (BLOOMBERG/ LONDON) 
>  wrote:
> 
>>> So if the cluster is up, then you can insert records in to HBase even 
>>> though you lost a RS that was handing a specific region. 
>> 
>> What happens when the RS goes down? Writes to that region will be written to 
>> another region server? Another RS assumes the region "range" while the RS is 
>> down?
>> 
>> What happens when the RS that was down goes up again? 
>> 
>> 
>> From: user@hbase.apache.org 
>> Subject: Re: write availability
>> 
>> I don’t know if I would say that… 
>> 
>> I read Marcelo’s question of “if the cluster is up, even though a RS may be 
>> down, can I still insert records in to HBase?”
>> 
>> So if the cluster is up, then you can insert records in to HBase even though 
>> you lost a RS that was handing a specific region. 
>> 
>> But because he talked about syncing nodes… I could be misreading his initial 
>> question… 
>> 
>>> On Apr 7, 2015, at 9:02 AM, Serega Sheypak  wrote:
>>> 
 If I have an application that writes to a HBase cluster, can I count that
>>> the cluster will always available to receive writes?
>>> No, it's CP, not AP system.
 so everything get in sync when the other nodes get up again
>>> There is no hinted backoff, It's not Cassandra.
>>> 
>>> 
>>> 
>>> 2015-04-07 14:48 GMT+02:00 Marcelo Valle (BLOOMBERG/ LONDON) <
>>> mvallemil...@bloomberg.net>:
>>> 
 If I have an application that writes to a HBase cluster, can I count that
 the cluster will always available to receive writes?
 I might not be able to read if a region server which handles a range of
 keys is down, but will I be able to keep writing to other nodes, so
 everything get in sync when the other nodes get up again?
 Or I might get no write availability for a while?
>> 
>> The opinions expressed here are mine, while they may reflect a cognitive 
>> thought, that is purely accidental. 
>> Use at your own risk. 
>> Michael Segel
>> michael_segel (AT) hotmail.com
> 
> 



Re: write availability

2015-04-07 Thread Wellington Chevreuil
When a RS goes down, the Master will try to assign the regions on the remaining 
RSes. When the RS comes back, after a while, the Master balancer process will 
re-distribute regions between RS, so the given RS will be hosting regions, but 
not necessarily the one it used to host before it went down.


On 7 Apr 2015, at 16:31, Marcelo Valle (BLOOMBERG/ LONDON) 
 wrote:

>> So if the cluster is up, then you can insert records in to HBase even though 
>> you lost a RS that was handing a specific region. 
> 
> What happens when the RS goes down? Writes to that region will be written to 
> another region server? Another RS assumes the region "range" while the RS is 
> down?
> 
> What happens when the RS that was down goes up again? 
> 
> 
> From: user@hbase.apache.org 
> Subject: Re: write availability
> 
> I don’t know if I would say that… 
> 
> I read Marcelo’s question of “if the cluster is up, even though a RS may be 
> down, can I still insert records in to HBase?”
> 
> So if the cluster is up, then you can insert records in to HBase even though 
> you lost a RS that was handing a specific region. 
> 
> But because he talked about syncing nodes… I could be misreading his initial 
> question… 
> 
>> On Apr 7, 2015, at 9:02 AM, Serega Sheypak  wrote:
>> 
>>> If I have an application that writes to a HBase cluster, can I count that
>> the cluster will always available to receive writes?
>> No, it's CP, not AP system.
>>> so everything get in sync when the other nodes get up again
>> There is no hinted backoff, It's not Cassandra.
>> 
>> 
>> 
>> 2015-04-07 14:48 GMT+02:00 Marcelo Valle (BLOOMBERG/ LONDON) <
>> mvallemil...@bloomberg.net>:
>> 
>>> If I have an application that writes to a HBase cluster, can I count that
>>> the cluster will always available to receive writes?
>>> I might not be able to read if a region server which handles a range of
>>> keys is down, but will I be able to keep writing to other nodes, so
>>> everything get in sync when the other nodes get up again?
>>> Or I might get no write availability for a while?
> 
> The opinions expressed here are mine, while they may reflect a cognitive 
> thought, that is purely accidental. 
> Use at your own risk. 
> Michael Segel
> michael_segel (AT) hotmail.com



Re: HMaster does not start when Upgrading Hbase 0.94 to 0.98 (needed by new version of Drill)

2015-02-14 Thread Wellington Chevreuil
Hi,

It seems there's already a process using the ZooKeeper port, which is preventing 
ZooKeeper from starting properly. Does "netstat -nalp | grep 2181" return any 
result?
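
As an editorial aside, the same check can be done programmatically by attempting to bind the port (the class and method names here are illustrative, not part of any HBase tool):

```java
import java.io.IOException;
import java.net.ServerSocket;

// Quick programmatic equivalent of the netstat check: try to bind the
// ZooKeeper client port; if the bind fails, some other process is already
// listening on it (the "Address already in use" case in the trace below).
public class PortCheck {
    public static boolean isFree(int port) {
        try (ServerSocket s = new ServerSocket(port)) {
            return true;  // bind succeeded, nothing else owns the port
        } catch (IOException e) {
            return false; // port already taken
        }
    }

    public static void main(String[] args) {
        System.out.println("port 2181 free? " + isFree(2181));
    }
}
```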

 
On 13 Feb 2015, at 21:31, Alexander Zarei  wrote:

> Hi,
> 
> I was wondering if you could help me solve this issue I am facing in setting 
> up a new Hbase 0.98.
> 
> So we had Hbase 0.94 running on our Drillbit machine. I stopped the .94, and 
> installed a fresh 0.98 from Apache website. 
> 
> I configured hbase-site.xml based on the Apache getting started page, the 
> file with the same name from 0.94 installation and this tutorial.
> 
> 
> The problem is that HMaster does not show up in the "jps" processes list. In 
> addition, when start_hbase.sh tries to start zookeeper the following 
> exception happens:
> 
> java.net.BindException: Address already in use
> at sun.nio.ch.Net.bind0(Native Method)
> at sun.nio.ch.Net.bind(Net.java:444)
> at sun.nio.ch.Net.bind(Net.java:436)
> at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
> at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
> at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
> at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
> at 
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:111)
> at 
> org.apache.hadoop.hbase.zookeeper.HQuorumPeer.runZKServer(HQuorumPeer.java:91)
> at 
> org.apache.hadoop.hbase.zookeeper.HQuorumPeer.main(HQuorumPeer.java:76) 
> 
> 
> Also when I open "./bin/hbase shell"
> and run "list"
> the following error shows up:
> 
> ERROR: Can't get master address from ZooKeeper; znode data == null
> 
> I have also attached my zookeeper configuration file and hbase-site.xml file.
> 
>  I should also add that I do not need to upgrade files on the previous Hbase 
> or use anything from it. I just need to set up this new Hbase and then I will 
> populate the new tables. I am not sure if I should remove the 
> "maprfs:///hbase" which holds the information from previous installation or 
> not.
> 
> I will really appreciate it if you could help me set the new Hbase up.
> 
> Thanks,
> Alex
> 
> Alexander Zarei
> 
> Computer Scientist | Simba Technologies Inc.
> 
> +1.604.633.0008 | alexand...@simba.com
> 
>  
> 938 West 8th Avenue | Vancouver, BC | Canada | V5Z 1E5
> 
> The Big Data Connectivity Experts | www.simba.com
> 
> 



Re: Access hbase remotely from java client

2014-09-29 Thread Wellington Chevreuil
Hi,

You should not do this, as localhost should resolve to the local host itself. This 
is probably a missing property in the client's HBase configuration (make sure 
you have a proper hbase-site.xml on the client's classpath, or set the configuration 
programmatically). As a start, check whether you have set the properties below in 
your client's HBase config. 

 
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>ZK_HOSTS</value>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>


On 29 Sep 2014, at 13:43, SACHINGUPTA  wrote:

> Hello guys
> 
> I am using the hbase java api to connect to hbase remotely, but when I 
> executed the java code, got |MasterNotRunningException|. When I debugged the 
> code, I came to know that zookeeper was returning the address of hmaster as 
> localhost.localdomain, so the client was trying to search for the hmaster 
> locally. When I changed the |/etc/hosts| file of my local machine from where 
> i am running the java client as:
> 
> |  localhost.localdomain|
> 
> then it worked fine.
> 
> However, I think that this is not the right way. I think I have to change the 
> addresses somewhere in the configuration of zookeeper, but I did not get it.
> 
> please help
> thanks
> 
> 
> -- 
> Thanks
> Sachin Gupta
> 



Re: How to monitor HBase Region Server ?

2014-07-24 Thread Wellington Chevreuil
The Master and RegionServer UIs on HBase 0.98 provide all these details. You 
can access your Master UI and from there navigate to each RS for specific 
details. 

http://YOUR-MASTER-HOST:60010/


On 24 Jul 2014, at 12:12, Pham Phuong Tu  wrote:

> Hi guys,
> 
> I want to monitor deep aspects of HBase like Hannibal (
> https://github.com/sentric/hannibal), but this tool doesn't support HBase
> 0.98.
> 
> So, how can monitor these things:
> - Region distributon
> - Region split per table
> - Number of storefile
> - Size of the memstore
> - Size of the storefiles
> - Compactions
> 
> Thanks a lot, bro !
> -- 
> *-*
> 
> 
> *Pham Phuong Tu*
> *Back-end & Big data developer*
> *Skype: phamphuongtu*



Re: Replication in Hbase

2014-07-22 Thread Wellington Chevreuil
I think you need to run a ZK instance apart from HBase. But if your main goal 
is to copy data from one cluster to another, you may use other options, such as 
CopyTable, bulk load, or the Export/Import tools. Replication will not copy data 
already present in your source HBase; it only replicates transactions as they 
happen, from source to destination, and only once replication has been enabled.

http://hbase.apache.org/book/ops_mgt.html#copytable

http://hbase.apache.org/book/ops_mgt.html#export

http://hbase.apache.org/book/ops_mgt.html#completebulkload
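
To illustrate that last point with an editorial sketch (not HBase internals; the names are made up): replication behaves like tailing a log, shipping only the entries written after it was enabled, which is why pre-existing rows never reach the peer.

```java
import java.util.List;

// Toy model: the WAL is a list of edits; replication ships only the suffix
// that starts at the point where replication was enabled.
public class ReplicationSketch {
    public static List<String> shipped(List<String> walEntries, int enabledAtIndex) {
        // everything before enabledAtIndex predates replication and is never shipped
        return walEntries.subList(enabledAtIndex, walEntries.size());
    }

    public static void main(String[] args) {
        List<String> wal = List.of("put-a", "put-b", "put-c", "put-d");
        // only put-c and put-d reach the peer; put-a/put-b need CopyTable or Export
        System.out.println(shipped(wal, 2));
    }
}
```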





On 22 Jul 2014, at 07:09, Vimal Jain  wrote:

> One more info ,
> After putting some data in master cluster , i am getting below in its
> regionserver log.
> 
> 2014-07-22 11:38:14,032 INFO
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Getting
> 1 rs from peer cluster # 1
> 2014-07-22 11:38:14,032 INFO
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
> Choosing peer XYZ,60020,1406009110004
> 2014-07-22 11:38:15,032 DEBUG
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Since
> we are unable to replicate, sleeping 1000 times 10
> 
> where XYZ is my slave cluster's machine.
> 
> 
> 
> On Tue, Jul 22, 2014 at 11:21 AM, Vimal Jain  wrote:
> 
>> Hi,
>> I have 2 Hbase clusters setup in different data center.
>> Both are configured in pseudo-distributed mode.
>> I followed the steps in Hbase Replication
>> 
>> .
>> But i am getting following logs in master cluster's region server log.
>> 
>> 2014-07-22 11:19:19,186 DEBUG
>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening
>> log for replication ip-10-14-24-19%2C60020%2C1405945008796.1406006236991 at
>> 134
>> 2014-07-22 11:19:19,193 DEBUG
>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
>> currentNbOperations:0 and seenEntries:0 and size: 0
>> 2014-07-22 11:19:19,193 DEBUG
>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Nothing
>> to replicate, sleeping 1000 times 10
>> 
>> 
>> What i am missing here ?
>> Also in one of the requirements mentioned in the above post , it says
>> zookeeper should not be managed by Hbase.But i have Hbase managing
>> zookeeper in both clusters.
>> 
>> 
>> Please help here.
>> Basically , i want to copy all data from one cluster to another which are
>> geographically distant.
>> 
>> 
>> --
>> Thanks and Regards,
>> Vimal Jain
>> 
> 
> 
> 
> -- 
> Thanks and Regards,
> Vimal Jain



Re: Hive and HBase doubts

2014-07-17 Thread Wellington Chevreuil
For more info on Hbase bulk loads:

http://hbase.apache.org/book/arch.bulk.load.html


On 17 Jul 2014, at 07:32, Nitin Pawar  wrote:

> 1) How can i update my data in hive table
> there is no update concept as of now in hive. Its write once and read many
> times. Though the work is in progress.
> For update there is a hack suggested here
> http://hortonworks.com/blog/four-step-strategy-incremental-updates-hive/
> 
> 2)can i load bulk data into HBase
> Yes you can load bulk data into hbase.
> 
> 
> On Thu, Jul 17, 2014 at 11:05 AM, Vinod Sirigina 
> wrote:
> 
>> Hi apache team Good  Morning,
>> 
>> 1) How can i update my data in hive table
>> 2)can i load bulk data into HBase
>> 
> 
> 
> 
> -- 
> Nitin Pawar



Re: Copy data from one cluster to another cluster on different lan

2014-07-17 Thread Wellington Chevreuil
Hi,

If you are planning to copy HBase data between your clusters, you may find useful the 
CopyTable and Export/Import tools described in the book:

http://hbase.apache.org/book/ops_mgt.html#tools

If running hbase 0.94.6 onwards, there is also the option to use snapshots:

http://hbase.apache.org/book/ops.snapshots.html


On 17 Jul 2014, at 09:19, sudhakara st  wrote:

> Use *DistCp* for inter-cluster data copy.
> You can install HBase; the HBase setup will not impact your inter-cluster
> data copy as long as you have sufficient resources
> 
> 
> On Thu, Jul 17, 2014 at 1:19 PM, Vimal Jain  wrote:
> 
>> Hi,
>> I have hadoop-1.2.1 and hbase-0.94.17 in pseudo distributed mode ( single
>> node ) in one data center ( lets say cluster1 in dataceneter1).
>> I have setup the same cluster in a different data center ( lets say
>> cluster2 in dataceneter2).
>> Whats the best way to copy data from cluster1 to cluster2 ?
>> Also If i choose to install hbase 0.98.3 on cluster2 , would there be any
>> issues ?
>> Please help here.
>> 
>> 
>> --
>> Thanks and Regards,
>> Vimal Jain
>> 
> 
> 
> 
> -- 
> 
> Regards,
> ...sudhakara



Re: HBase table is in LIMBO its neither Enabled Nor Disabled !!

2014-07-14 Thread Wellington Chevreuil
Hi Vikram,

You may be facing this issue: https://issues.apache.org/jira/browse/HBASE-6469

Have you already tried to restart your HMaster, as suggested in this jira? 

Thanks,
Wellington.

On 14 Jul 2014, at 10:32, Vikram Singh Chandel  
wrote:

> Hi
> 
> Cluster size : 35 nodes
> Table Split on Regions : 231
> 
> one of the tables, when described using the shell, shows the Enabled flag as FALSE
> 
> *when I am trying to disable it, it gives the following error*
> 
> ERROR: org.apache.hadoop.hbase.*TableNotDisabledException:*
> org.apache.hadoop.hbase.TableNotDisabledException:
> imsi_30YR_84TO13_ONLYAUTHORS_Author
>at
> org.apache.hadoop.hbase.master.handler.EnableTableHandler.(EnableTableHandler.java:82)
>at
> org.apache.hadoop.hbase.master.HMaster.enableTable(HMaster.java:1346)
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>at java.lang.reflect.Method.invoke(Method.java:601)
>at
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
>at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1428)
> 
> *When I am trying to enable it, it gives this error*
> 
> ERROR: org.apache.hadoop.hbase.*TableNotEnabledException:*
> org.apache.hadoop.hbase.TableNotEnabledException:
> imsi_30YR_84TO13_ONLYAUTHORS_Author
>at
> org.apache.hadoop.hbase.master.handler.DisableTableHandler.(DisableTableHandler.java:75)
>at
> org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:1359)
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>at java.lang.reflect.Method.invoke(Method.java:601)
>at
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
>at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1428)
> 
> 
> I am able to perform scan/get on the table
> but unable to attach or remove a coprocessor because i can't disable it
> 
> -- 
> *Regards*
> 
> *VIKRAM SINGH CHANDEL*
> 
> Please do not print this email unless it is absolutely necessary,Reduce.
> Reuse. Recycle. Save our planet.



Re: HBase Master cannot be contacted by Region Servers

2014-07-09 Thread Wellington Chevreuil
I guess you may also need to set the ZooKeeper client port in your hbase-site.xml:

 
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>


On 9 Jul 2014, at 14:18, Cosmin Cătălin Sanda  wrote:

> I have a Hadoop cluster made of 3 slaves and 1 master on top of which
> there is an HBase cluster with 3 RS and 1 master respectively.
> Additionally there is a Zookeeper ensemble on 3 machines.
> 
> The Hadoop cluster is functioning correctly as well as the Zookeeper
> ensemble. However, the HBase cluster fails to initialize correctly.
> 
> I start HBase by running ./bin/start-hbase.sh. This correctly
> starts the HBase Master and the Region Servers. The hbase folder in
> hdfs is set-up correctly.
> 
> jps on master
> 
> hduser@master:~/hbase$ jps
> 5694 HMaster
> 3934 JobHistoryServer
> 3786 NameNode
> 3873 ResourceManager
> 6025 Jps
> 
> jps on slaves
> 
> 5737 Jps
> 5499 HRegionServer
> 3736 DataNode
> 3820 NodeManager
> 
> However, the HBase master does not register the Region Servers as it
> is also apparent from looking at the logs:
> 
> master log
> 
> [master:master:6] master.ServerManager: Waiting for region servers
> count to settle; currently checked in 0, slept for 1511 ms, expecting
> minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of
> 1500 ms.
> 
> slave log
> 
> [regionserver60020] regionserver.HRegionServer: reportForDuty to
> master=master,6,1404856451890 with port=60020,
> startcode=1404856453874
> [regionserver60020] regionserver.HRegionServer: error telling master we are up
> com.google.protobuf.ServiceException:
> org.apache.hadoop.net.ConnectTimeoutException: 2 millis timeout
> while waiting for channel to be ready for connect. ch :
> java.nio.channels.SocketChannel[connection-pending
> local=/10.0.2.15:53939 remote=master/192.168.66.60:6]
> 
> Here are the configuration details:
> 
> /etc/hosts on master
> 
> 192.168.66.63   slave-3 # Data Node and Region Server
> 192.168.66.60   master # Name Node and HBase Master
> 192.168.66.73   zookeeper-3 # Zookeeper node
> 192.168.66.71   zookeeper-1 # Zookeeper node
> 192.168.66.72   zookeeper-2 # Zookeeper node
> 192.168.66.62   slave-2 # Data Node and Region Server
> 192.168.66.61   slave-1 # Data Node and Region Server
> 
> /etc/hosts on slave-1
> 
> 192.168.66.60   master
> 192.168.66.73   zookeeper-3
> 192.168.66.71   zookeeper-1
> 192.168.66.72   zookeeper-2
> 
> hbase-site.xml on ALL cluster nodes
> 
> <configuration>
>   <property>
>     <name>hbase.tmp.dir</name>
>     <value>/home/hduser/hbase/tmp</value>
>   </property>
>   <property>
>     <name>hbase.rootdir</name>
>     <value>hdfs://master/hbase</value>
>   </property>
>   <property>
>     <name>hbase.cluster.distributed</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>hbase.local.dir</name>
>     <value>/home/hduser/hbase/local</value>
>   </property>
>   <property>
>     <name>hbase.master.info.port</name>
>     <value>6010</value>
>   </property>
>   <property>
>     <name>hbase.zookeeper.quorum</name>
>     <value>zookeeper-1,zookeeper-2,zookeeper-3,</value>
>   </property>
> </configuration>
> 
> regionservers file on master and slaves
> 
> slave-3
> slave-1
> slave-2
> 
> hbase-env.sh on master and slaves
> 
> export JAVA_HOME=$(readlink -f /usr/bin/javac | sed "s:/bin/javac::")
> export HBASE_OPTS="-XX:+UseConcMarkSweepGC"
> export HBASE_MANAGES_ZK=false
> 
> What am I doing wrong so that the nodes cannot talk to each other?
> 
> I am using Hadoop 2.4.0 and HBase 0.98.3 along with Zookeeper 3.4.6 on
> Ubuntu Trusty Tahr x64.



Re: HBase Master cannot be contacted by Region Servers

2014-07-09 Thread Wellington Chevreuil
Hi,

Have you already tried setting the NameNode port in hbase.rootdir in hbase-site.xml, like 
below:

   
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://master:8020/hbase</value>
</property>

Cheers.

On 9 Jul 2014, at 14:18, Cosmin Cătălin Sanda  wrote:

> I have a Hadoop cluster made of 3 slaves and 1 master on top of which
> there is an HBase cluster with 3 RS and 1 master respectively.
> Additionally there is a Zookeeper ensemble on 3 machines.
> 
> The Hadoop cluster is functioning correctly as well as the Zookeeper
> ensemble. However, the HBase cluster fails to initialize correctly.
> 
> I start HBase by running ./bin/start-hbase.sh. This correctly
> starts the HBase Master and the Region Servers. The hbase folder in
> hdfs is set-up correctly.
> 
> jps on master
> 
> hduser@master:~/hbase$ jps
> 5694 HMaster
> 3934 JobHistoryServer
> 3786 NameNode
> 3873 ResourceManager
> 6025 Jps
> 
> jps on slaves
> 
> 5737 Jps
> 5499 HRegionServer
> 3736 DataNode
> 3820 NodeManager
> 
> However, the HBase master does not register the Region Servers as it
> is also apparent from looking at the logs:
> 
> master log
> 
> [master:master:6] master.ServerManager: Waiting for region servers
> count to settle; currently checked in 0, slept for 1511 ms, expecting
> minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of
> 1500 ms.
> 
> slave log
> 
> [regionserver60020] regionserver.HRegionServer: reportForDuty to
> master=master,6,1404856451890 with port=60020,
> startcode=1404856453874
> [regionserver60020] regionserver.HRegionServer: error telling master we are up
> com.google.protobuf.ServiceException:
> org.apache.hadoop.net.ConnectTimeoutException: 2 millis timeout
> while waiting for channel to be ready for connect. ch :
> java.nio.channels.SocketChannel[connection-pending
> local=/10.0.2.15:53939 remote=master/192.168.66.60:6]
> 
> Here are the configuration details:
> 
> /etc/hosts on master
> 
> 192.168.66.63   slave-3 # Data Node and Region Server
> 192.168.66.60   master # Name Node and HBase Master
> 192.168.66.73   zookeeper-3 # Zookeeper node
> 192.168.66.71   zookeeper-1 # Zookeeper node
> 192.168.66.72   zookeeper-2 # Zookeeper node
> 192.168.66.62   slave-2 # Data Node and Region Server
> 192.168.66.61   slave-1 # Data Node and Region Server
> 
> /etc/hosts on slave-1
> 
> 192.168.66.60   master
> 192.168.66.73   zookeeper-3
> 192.168.66.71   zookeeper-1
> 192.168.66.72   zookeeper-2
> 
> hbase-site.xml on ALL cluster nodes
> 
> <configuration>
>   <property>
>     <name>hbase.tmp.dir</name>
>     <value>/home/hduser/hbase/tmp</value>
>   </property>
>   <property>
>     <name>hbase.rootdir</name>
>     <value>hdfs://master/hbase</value>
>   </property>
>   <property>
>     <name>hbase.cluster.distributed</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>hbase.local.dir</name>
>     <value>/home/hduser/hbase/local</value>
>   </property>
>   <property>
>     <name>hbase.master.info.port</name>
>     <value>6010</value>
>   </property>
>   <property>
>     <name>hbase.zookeeper.quorum</name>
>     <value>zookeeper-1,zookeeper-2,zookeeper-3,</value>
>   </property>
> </configuration>
> 
> regionservers file on master and slaves
> 
> slave-3
> slave-1
> slave-2
> 
> hbase-env.sh on master and slaves
> 
> export JAVA_HOME=$(readlink -f /usr/bin/javac | sed "s:/bin/javac::")
> export HBASE_OPTS="-XX:+UseConcMarkSweepGC"
> export HBASE_MANAGES_ZK=false
> 
> What am I doing wrong so that the nodes cannot talk to each other?
> 
> I am using Hadoop 2.4.0 and HBase 0.98.3 along with Zookeeper 3.4.6 on
> Ubuntu Trusty Tahr x64.



Re: Store data in HBase with a MapReduce.

2014-06-26 Thread Wellington Chevreuil
Hi Guillermo,

You can use TableOutputFormat as the output format for your job; then in 
your reducer, you just need to write Put objects. 

On your driver:

Job job = new Job(conf);
…
job.setReducerClass(AverageReducer.class); // your reducer class
job.setOutputFormatClass(TableOutputFormat.class);
job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "table");
job.setOutputKeyClass(ImmutableBytesWritable.class);
job.setOutputValueClass(Writable.class);
...

On your reducer, just create the related Puts and write them:

byte[] rowKey = ...; // derive the row key from your reduce key
Put put = new Put(rowKey); // Put requires the row key in its constructor
ImmutableBytesWritable key = new ImmutableBytesWritable(rowKey);
...
context.write(key, put);


Cheers,
Wellington.


On 26 Jun 2014, at 16:24, Guillermo Ortiz  wrote:

> I have a question.
> I want to execute a MapReduce job, and the output of my reduce is going to
> be stored in HBase.
> 
> So, it's a MapReduce job with an output which is going to be stored in HBase.
> I can do a Map and use HFileOutputFormat.configureIncrementalLoad(pJob,
> table); but I don't know how I could do it if I have a Reduce as well,
> since configureIncrementalLoad generates a reducer.



Re: No of column qualifiers in a column family

2014-06-15 Thread Wellington Chevreuil
Hi Vimal,

Adding to Dima's comment, just be aware of how large your rows will
become in bytes with so many CQs. If you end up with a row larger than the
configured region size limit, the RegionServer will have problems splitting
the region. Also, if your schema is defined by very wide rows, where most
of your data is stored in a few rows, then you may face performance problems,
because most of your data will be stored in a few regions, thus being
served by the same RegionServer.

To understand a little more about how data is split in hbase, and the
implications it can have to performance, you can look at the links below:

http://hbase.apache.org/book/regions.arch.html#arch.region.splits
http://hbase.apache.org/book/ops.capacity.html#ops.capacity.regions.count
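
As a rough editorial aid (an approximation, not the exact on-disk KeyValue format), you can estimate row size like this; the key point is that every cell repeats the full row key, family, and qualifier next to its value, so wide rows grow faster than the raw values suggest:

```java
// Back-of-the-envelope cell and row size estimate. The 8-byte timestamp and
// 1-byte type are ballpark figures; real KeyValues carry a few extra length
// fields as well, so treat this as a lower bound.
public class RowSizeEstimate {
    public static long cellBytes(int rowKeyLen, int familyLen, int qualifierLen, int valueLen) {
        return rowKeyLen + familyLen + qualifierLen + 8 /* timestamp */ + 1 /* type */ + valueLen;
    }

    public static long rowBytes(int rowKeyLen, int familyLen, int qualifierLen,
                                int valueLen, int numQualifiers) {
        return numQualifiers * cellBytes(rowKeyLen, familyLen, qualifierLen, valueLen);
    }

    public static void main(String[] args) {
        // e.g. 200 qualifiers of 10-byte values under a 16-byte row key
        System.out.println(rowBytes(16, 2, 8, 10, 200) + " bytes (roughly)");
    }
}
```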

Cheers.


2014-06-15 12:08 GMT+01:00 Dima Spivak :

> Hi Vimal,
>
> There's no limit on how many qualifiers a particular column family can
> have, nor should there be any performance degradation due to simply having
> different numbers of qualifiers for different families (unless this leads
> to huge differences in row numbers between families). For more details,
> please take a look at http://hbase.apache.org/book/number.of.cfs.html
>
> Cheers,
>Dima
>
> On Sunday, June 15, 2014, Vimal Jain  wrote:
>
> > Hi,
> > I am planning to have one table with 3 column families(cf) ,having around
> > 200,100,200 column qualifiers (cq) in each of them resp.
> > Whats the number of cq a cf can hold ?
> > Also having different numbers of cqs in family ( as in above 2 cfs have
> 200
> > while the other one has 100 ) will have any impact on performance ?
> >
> >
> > --
> > Thanks and Regards,
> > Vimal Jain
> >
>


Re: high load average in one region server

2014-06-13 Thread Wellington Chevreuil
It might be worth checking the HBase UI (http://hbase-host:60010/): there is a 
page with a “Region Servers” table where you can check whether regions are 
evenly spread across your RSs. From there you can click on the link for each 
RS and find more information about every region it manages, such as read and 
write request counts and the start and end key of each region. If some regions 
are receiving far more requests than others, or their store files are much 
bigger, you can consider splitting those regions, and also changing your 
row-key design to spread your records. You can find some useful information 
about rowkey design here: http://hbase.apache.org/book/rowkey.design.html.
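One common rowkey-design trick to spread sequentially written keys is salting.
A minimal sketch (the bucket count and key format are assumptions, not from
this thread):

```python
# Sketch of row-key salting: a stable hash-derived prefix spreads keys
# that would otherwise be written sequentially onto a single region.
import hashlib

SALT_BUCKETS = 8  # assumed; roughly match your region count

def salted_key(rowkey: str) -> str:
    # The salt is derived from the key itself, so readers can recompute it.
    bucket = int(hashlib.md5(rowkey.encode()).hexdigest(), 16) % SALT_BUCKETS
    return f"{bucket}-{rowkey}"
```

Reads then need to either recompute the salt for point gets, or fan scans out
over all buckets.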

Cheers 

 
On 12 Jun 2014, at 02:49, Ted Yu  wrote:

> Which hbase release are you using ?
> 
> Could this be related to how your schema is designed ?
> 
> Have you run jstack for region server on mphbase2 ?
> 
> BTW the tables are not easy to read.
> If you have pictures, you can put them on some website and include links.
> 
> Cheers
> 
> 
> On Wed, Jun 11, 2014 at 6:38 PM, Li Li  wrote:
> 
>>   I have a 5 region server HBase cluster. Today I found one RS's
>> load average is above 100 while the other 4 are less than 1.
>>   I used vmstat and dstat and found that this high-load machine has a
>> large amount of reads (about 30 MB/s) and network sends.
>>   Does that mean the cluster suffers from a hotspot? The slow machine is
>> mphbase2.
>>   1. base statistics
>>  ServerName Start time Requests Per Second Num. Regions
>> mphbase1,60020,1402298228045 Mon Jun 09 15:17:08 CST 2014 586 35
>> mphbase2,60020,1402298228527 Mon Jun 09 15:17:08 CST 2014 539 32
>> mphbase3,60020,1402298228361 Mon Jun 09 15:17:08 CST 2014 966 32
>> mphbase4,60020,1402298159826 Mon Jun 09 15:15:59 CST 2014 518 35
>> mphbase5,60020,1402298228382 Mon Jun 09 15:17:08 CST 2014 442 36
>> Total:5 3051 170
>> 
>>2. storefiles
>> ServerName Num. Stores Num. Storefiles Storefile Size Uncompressed
>> Storefile Size Index Size Bloom Size
>> mphbase1,60020,1402298228045 35 67 11872m 11874mb 8783k 34818k
>> mphbase2,60020,1402298228527 32 61 11976m 11977mb 8882k 34976k
>> mphbase3,60020,1402298228361 32 66 18321m 18325mb 13470k 54872k
>> mphbase4,60020,1402298159826 35 72 13842m 13848mb 10753k 31784k
>> mphbase5,60020,1402298228382 36 78 15021m 15027mb 15329k 29321k
>> 
>>3. hdfs info(from hdfs)
>> Live Datanodes : 5
>> Node  Last Contact  Admin State  Configured Capacity (GB)  Used (GB)
>> Non DFS Used (GB)  Remaining (GB)  Used (%)  Remaining (%)  Blocks
>> mphbase1 0 In Service 457.55 53.28 52.23 352.04 11.64 76.94 1150
>> mphbase2 1 In Service 457.55 46.56 48.89 362.1 10.18 79.14 971
>> mphbase3 0 In Service 457.55 52.05 55.6 349.89 11.38 76.47 1128
>> mphbase4 1 In Service 457.55 50.25 36.88 370.42 10.98 80.96 1254
>> mphbase5 2 In Service 457.55 55.2 49.29 353.06 12.06 77.16 1338
>> 
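As an illustration, the "Requests Per Second" figures quoted above can be
checked for skew with a few lines of script (values copied from the table):

```python
# Illustrative skew check over the per-RS request rates quoted above.
requests = {
    "mphbase1": 586, "mphbase2": 539, "mphbase3": 966,
    "mphbase4": 518, "mphbase5": 442,
}

avg = sum(requests.values()) / len(requests)   # 3051 / 5 = 610.2
skew = {rs: round(rps / avg, 2) for rs, rps in requests.items()}
print(skew)  # mphbase3 is at ~1.58x the average
```

Note that by request rate alone the outlier is mphbase3, not the slow
mphbase2, which is why the per-region stats and a jstack (as suggested
above) are worth checking too.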



Re: HBase and HDFS HA failover with QJM

2014-06-08 Thread Wellington Chevreuil
Hi Jerry,

There's no need for additional HBase configuration when using HDFS HA.
HBase acts as an HDFS client, so region servers reach the NameNode through
the nameservice configured in hdfs-site.xml. Since both the active and
standby NNs keep an up-to-date image of the HDFS state, the transition
between NNs during a failover is transparent to clients (including HBase).
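For reference, a minimal sketch of what that wiring looks like (the
`mycluster` nameservice id and NN ids are placeholders; the property names
are the standard HDFS HA ones):

```xml
<!-- hdfs-site.xml (must be on HBase's classpath) -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

<!-- hbase-site.xml: point hbase.rootdir at the nameservice, not a NN host -->
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://mycluster/hbase</value>
</property>
```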


2014-06-08 2:46 GMT+01:00 Jerry He :

> Hi, guys
>
> Does anybody have experience with HBase and HDFS HA failover with QJM?
> Any recommended settings for HBase to make region servers ride over the NN
> failover smoothly under load?
>
> Particularly, is there a need to set these in hbase-site.xml?
>
> dfs.client.retry.policy.enabled
> dfs.client.retry.policy.spec
>
> Thanks in advance for the help!
>
> Jerry
>