Re: Major compaction

2016-04-04 Thread Sumit Nigam
Hello all,
Thanks a lot for your replies.
@Frank - I will try the compactor you wrote and let you know how it goes.
@Esteban - I am trying to understand how to reduce major compaction load. In my 
cluster, whenever it happens, it takes down a region server or two. My settings 
are: time-based compaction disabled, blocking store files = 1000, min files to 
compact = 5, max files to compact = 10. Other settings are defaults. So, a 
question I had is: if a major compaction of a certain table was performed a few 
mins back, would another major compaction of the same table take a lot of time? 
If not, why not process delete markers / version-related expulsions more often by 
kicking in major compaction more often? Assuming splits are not happening after 
every major compaction cycle, we should be able to manage major compaction better 
by spreading this load more evenly rather than doing it once a day. Or am I 
missing something?
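
(For reference, a minimal sketch of the thresholds described above. Mapping the 
settings named here to these exact property names is my assumption, and on a real 
cluster they belong in hbase-site.xml on the region servers rather than in client 
code; the Java form is only to show the knobs in one place.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class CompactionSettings {
        // Builds a Configuration mirroring the settings described above.
        public static Configuration build() {
            Configuration conf = HBaseConfiguration.create();
            // "disable time based compaction": 0 switches off periodic major compactions
            conf.setLong("hbase.hregion.majorcompaction", 0L);
            // "blocking files = 1000": updates block only once a store has 1000 files
            conf.setInt("hbase.hstore.blockingStoreFiles", 1000);
            // "min files to compact = 5" / "max files to compact = 10"
            conf.setInt("hbase.hstore.compaction.min", 5);
            conf.setInt("hbase.hstore.compaction.max", 10);
            return conf;
        }
    }
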
@Vladimir - By the same reasoning as above, would every major compaction really 
lead to more disk I/O, network, or HBase process load? I assumed that the main 
things done by major compaction are processing deletes, creating one huge file per 
store, and region splits. If this has already been done, say, a few mins back (or 
even an hour back), then the next compaction should ideally have less to do. Not 
sure if I am misunderstanding major compaction altogether.
Please let me know your inputs.
Best regards,
Sumit

  From: Frank Luo 
 To: "user@hbase.apache.org"  
Cc: Sumit Nigam 
 Sent: Monday, April 4, 2016 11:07 PM
 Subject: RE: Major compaction
   
I wrote a small program to do MC in a "smart" way here: 
https://github.com/jinyeluo/smarthbasecompactor/

Instead of blindly running MC at the table level, the program finds the non-hot 
regions that have the most store files, on a per-region-server basis, and runs MC 
on them. Once done, it finds the next candidates... It just keeps going until 
time is up.

I am sure it has a lot of room for improvement if someone wants to go crazy on it. 
But the code has been running for about half a year and it seems to be working well.

-Original Message-
From: Vladimir Rodionov [mailto:vladrodio...@gmail.com]
Sent: Monday, April 04, 2016 12:15 PM
To: user@hbase.apache.org
Cc: Sumit Nigam 
Subject: Re: Major compaction

>> Why I am trying to understand this is because HBase also sets it to a 24 hour
>> default (for time based compaction) and I am looking to lower it to say 20
>> mins to reduce stress by spreading the load.

The more frequently you run major compaction, the more IO (disk/network) you 
consume.

Usually, in a production environment, periodic major compactions are disabled and 
run manually to avoid major compaction storms.

To control major compaction completely you will also need to disable promotion of 
minor compactions to major ones. You can do this by setting the maximum compaction 
size for minor compactions:
*hbase.hstore.compaction.max.size*

-Vlad


On Mon, Apr 4, 2016 at 8:55 AM, Esteban Gutierrez 
wrote:

> Hello Sumit,
>
> Ideally you shouldn't be triggering major compactions that frequently
> since minor compactions should be taking care of reducing the number
> of store files. The caveat of doing it more frequently is the
> additional disk/network I/O.
>
> Can you please elaborate more on "reduce stress by spreading the
> load." Is there anything else you are seeing in your cluster that is
> suggesting to you to lower the period for major compactions?
>
> esteban.
>
> --
> Cloudera, Inc.
>
>
> On Mon, Apr 4, 2016 at 8:35 AM, Sumit Nigam
> 
> wrote:
>
> > Hi,
> > Are there major overheads to running major compaction frequently? As
> > much as I know, it produces one HFile for a region and processes delete
> > markers and version-related drops. So, if this process has happened
> > once, say, a few mins back, then another major compaction should
> > ideally not cause much harm.
> > Why I am trying to understand this is because HBase also sets it to a
> > 24 hour default (for time based compaction) and I am looking to
> > lower it to say 20 mins to reduce stress by spreading the load.
> > Or am I completely off-track?
> > Thanks,
> > Sumit
>

Re: hbase custom scan

2016-04-04 Thread Shushant Arora
The table will have ~100 regions.

I didn't get the advantage of the top rows coming from the same vs. different regions?
They will come from different regions.

On Tue, Apr 5, 2016 at 9:10 AM, Ted Yu  wrote:

> How many regions does your table have ?
>
> After sorting, is there a chance that the top N rows come from distinct
> regions ?
>
> On Mon, Apr 4, 2016 at 8:27 PM, Shushant Arora 
> wrote:
>
> > Hi
> >
> > I have a requirement to scan an HBase table based on insertion timestamp.
> > I need to fetch the keys sorted by insertion timestamp, not by key.
> >
> > I can't make the timestamp a prefix of the key, as that would cause hot spotting.
> > Is there any efficient way to meet this requirement?
> >
> > Thanks!
> >
>


Re: hbase custom scan

2016-04-04 Thread Ted Yu
How many regions does your table have ?

After sorting, is there a chance that the top N rows come from distinct
regions ?

On Mon, Apr 4, 2016 at 8:27 PM, Shushant Arora 
wrote:

> Hi
>
> I have a requirement to scan an HBase table based on insertion timestamp.
> I need to fetch the keys sorted by insertion timestamp, not by key.
>
> I can't make the timestamp a prefix of the key, as that would cause hot spotting.
> Is there any efficient way to meet this requirement?
>
> Thanks!
>


hbase custom scan

2016-04-04 Thread Shushant Arora
Hi

I have a requirement to scan an HBase table based on insertion timestamp.
I need to fetch the keys sorted by insertion timestamp, not by key.

I can't make the timestamp a prefix of the key, as that would cause hot spotting.
Is there any efficient way to meet this requirement?

Thanks!


RE: Major compaction

2016-04-04 Thread Liu, Ming (Ming)
Thanks Frank, this is something I have been looking for. I would like to give it a 
try.

Thanks,
Ming

-Original Message-
From: Frank Luo [mailto:j...@merkleinc.com]
Sent: Tuesday, April 05, 2016 01:38
To: user@hbase.apache.org
Cc: Sumit Nigam
Subject: RE: Major compaction

I wrote a small program to do MC in a "smart" way here: 
https://github.com/jinyeluo/smarthbasecompactor/

Instead of blindly running MC at the table level, the program finds the non-hot 
regions that have the most store files, on a per-region-server basis, and runs MC 
on them. Once done, it finds the next candidates... It just keeps going until 
time is up.

I am sure it has a lot of room for improvement if someone wants to go crazy on it. 
But the code has been running for about half a year and it seems to be working well.

-Original Message-
From: Vladimir Rodionov [mailto:vladrodio...@gmail.com]
Sent: Monday, April 04, 2016 12:15 PM
To: user@hbase.apache.org
Cc: Sumit Nigam 
Subject: Re: Major compaction

>> Why I am trying to understand this is because HBase also sets it to a 24 hour
>> default (for time based compaction) and I am looking to lower it to say 20
>> mins to reduce stress by spreading the load.

The more frequently you run major compaction, the more IO (disk/network) you 
consume.

Usually, in a production environment, periodic major compactions are disabled and 
run manually to avoid major compaction storms.

To control major compaction completely you will also need to disable promotion of 
minor compactions to major ones. You can do this by setting the maximum compaction 
size for minor compactions:
*hbase.hstore.compaction.max.size*

-Vlad


On Mon, Apr 4, 2016 at 8:55 AM, Esteban Gutierrez 
wrote:

> Hello Sumit,
>
> Ideally you shouldn't be triggering major compactions that frequently 
> since minor compactions should be taking care of reducing the number 
> of store files. The caveat of doing it more frequently is the 
> additional disk/network I/O.
>
> Can you please elaborate more on "reduce stress by spreading the 
> load." Is there anything else you are seeing in your cluster that is 
> suggesting to you to lower the period for major compactions?
>
> esteban.
>
> --
> Cloudera, Inc.
>
>
> On Mon, Apr 4, 2016 at 8:35 AM, Sumit Nigam 
> 
> wrote:
>
> > Hi,
> > Are there major overheads to running major compaction frequently? As
> > much as I know, it produces one HFile for a region and processes delete
> > markers and version-related drops. So, if this process has happened
> > once, say, a few mins back, then another major compaction should
> > ideally not cause much harm.
> > Why I am trying to understand this is because HBase also sets it to a
> > 24 hour default (for time based compaction) and I am looking to
> > lower it to say 20 mins to reduce stress by spreading the load.
> > Or am I completely off-track?
> > Thanks,
> > Sumit
>


Re: HBase table map to hive

2016-04-04 Thread Wojciech Indyk
Hi!
You can use a Hive MAP type on your column family, or on a prefix of the
column qualifier.
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration#HBaseIntegration-HiveMAPtoHBaseColumnFamily

--
Kind regards/ Pozdrawiam,
Wojciech Indyk
http://datacentric.pl


2016-04-04 14:13 GMT+02:00 ram kumar :
> Hi,
>
> I have an HBase table whose column names change (increase) over time.
> Is there a way to map such an HBase table to a Hive table,
> inferring the schema from the HBase table?
>
> Thanks


RE: Major compaction

2016-04-04 Thread Frank Luo
I wrote a small program to do MC in a "smart" way here: 
https://github.com/jinyeluo/smarthbasecompactor/

Instead of blindly running MC at the table level, the program finds the non-hot 
regions that have the most store files, on a per-region-server basis, and runs MC 
on them. Once done, it finds the next candidates... It just keeps going until 
time is up.

I am sure it has a lot of room for improvement if someone wants to go crazy on it. 
But the code has been running for about half a year and it seems to be working well.
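
(Not Frank's actual code -- see the GitHub link above for that. Below is only a rough, 
untested sketch of the same idea against the HBase 1.x Admin API, to show the shape of 
it: per region server, pick the region with the most store files and major-compact it. 
The "hot region" check and the time budget from the real tool are omitted.)

    import org.apache.hadoop.hbase.*;
    import org.apache.hadoop.hbase.client.*;

    public class NaiveSmartCompactor {
        public static void compactBusiestRegions(Connection conn) throws Exception {
            try (Admin admin = conn.getAdmin()) {
                ClusterStatus status = admin.getClusterStatus();
                for (ServerName server : status.getServers()) {
                    byte[] busiest = null;
                    int maxFiles = -1;
                    // RegionLoad reports how many store files each region carries.
                    for (RegionLoad rl : status.getLoad(server).getRegionsLoad().values()) {
                        if (rl.getStorefiles() > maxFiles) {
                            maxFiles = rl.getStorefiles();
                            busiest = rl.getName();
                        }
                    }
                    if (busiest != null) {
                        // Asynchronous request; the region server does the actual work.
                        admin.majorCompactRegion(busiest);
                    }
                }
            }
        }
    }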

-Original Message-
From: Vladimir Rodionov [mailto:vladrodio...@gmail.com]
Sent: Monday, April 04, 2016 12:15 PM
To: user@hbase.apache.org
Cc: Sumit Nigam 
Subject: Re: Major compaction

>> Why I am trying to understand this is because HBase also sets it to a 24 hour
>> default (for time based compaction) and I am looking to lower it to say 20
>> mins to reduce stress by spreading the load.

The more frequently you run major compaction, the more IO (disk/network) you 
consume.

Usually, in a production environment, periodic major compactions are disabled and 
run manually to avoid major compaction storms.

To control major compaction completely you will also need to disable promotion of 
minor compactions to major ones. You can do this by setting the maximum compaction 
size for minor compactions:
*hbase.hstore.compaction.max.size*

-Vlad


On Mon, Apr 4, 2016 at 8:55 AM, Esteban Gutierrez 
wrote:

> Hello Sumit,
>
> Ideally you shouldn't be triggering major compactions that frequently
> since minor compactions should be taking care of reducing the number
> of store files. The caveat of doing it more frequently is the
> additional disk/network I/O.
>
> Can you please elaborate more on "reduce stress by spreading the
> load." Is there anything else you are seeing in your cluster that is
> suggesting to you to lower the period for major compactions?
>
> esteban.
>
> --
> Cloudera, Inc.
>
>
> On Mon, Apr 4, 2016 at 8:35 AM, Sumit Nigam
> 
> wrote:
>
> > Hi,
> > Are there major overheads to running major compaction frequently? As
> > much as I know, it produces one HFile for a region and processes delete
> > markers and version-related drops. So, if this process has happened
> > once, say, a few mins back, then another major compaction should
> > ideally not cause much harm.
> > Why I am trying to understand this is because HBase also sets it to a
> > 24 hour default (for time based compaction) and I am looking to
> > lower it to say 20 mins to reduce stress by spreading the load.
> > Or am I completely off-track?
> > Thanks,
> > Sumit
>


Re: Major compaction

2016-04-04 Thread Vladimir Rodionov
>> Why I am trying to understand this is because HBase also sets it to a 24 hour
>> default (for time based compaction) and I am looking to lower it to say 20
>> mins to reduce stress by spreading the load.

The more frequently you run major compaction, the more IO (disk/network) you
consume.

Usually, in a production environment, periodic major compactions are disabled
and run manually to avoid major compaction storms.

To control major compaction completely you will also need to disable promotion
of minor compactions to major ones. You can do this by setting the maximum
compaction size for minor compactions:
*hbase.hstore.compaction.max.size*

-Vlad
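
(A minimal sketch of what the advice above can look like in practice; the two property 
values are placeholders, not recommendations. The compaction properties belong in 
hbase-site.xml on the region servers and are shown here only for illustration.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;

    public class ManualMajorCompaction {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // Disable periodic (time-based) major compactions...
            conf.setLong("hbase.hregion.majorcompaction", 0L);
            // ...and keep minor compactions from selecting very large files, which is
            // how a minor compaction ends up being promoted to a major one.
            conf.setLong("hbase.hstore.compaction.max.size", 512L * 1024 * 1024);

            // Major compaction is then requested explicitly, e.g. from a nightly job.
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Admin admin = conn.getAdmin()) {
                admin.majorCompact(TableName.valueOf(args[0]));  // asynchronous request
            }
        }
    }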


On Mon, Apr 4, 2016 at 8:55 AM, Esteban Gutierrez 
wrote:

> Hello Sumit,
>
> Ideally you shouldn't be triggering major compactions that frequently since
> minor compactions should be taking care of reducing the number of store
> files. The caveat of doing it more frequently is the additional
> disk/network I/O.
>
> Can you please elaborate more on "reduce stress by spreading the load." Is
> there anything else you are seeing in your cluster that is suggesting to
> you to lower the period for major compactions?
>
> esteban.
>
> --
> Cloudera, Inc.
>
>
> On Mon, Apr 4, 2016 at 8:35 AM, Sumit Nigam 
> wrote:
>
> > Hi,
> > Are there major overheads to running major compaction frequently? As much
> > as I know, it produces one HFile for a region and processes delete markers
> > and version-related drops. So, if this process has happened once, say, a
> > few mins back, then another major compaction should ideally not cause much
> > harm.
> > Why I am trying to understand this is because HBase also sets it to a 24
> > hour default (for time based compaction) and I am looking to lower it to
> > say 20 mins to reduce stress by spreading the load.
> > Or am I completely off-track?
> > Thanks,
> > Sumit
>


Re: Connecting to hbase 1.0.3 via java client stuck at zookeeper.ClientCnxn: Session establishment complete on server

2016-04-04 Thread Sachin Mittal
There is additional information I would like to share with you, which points to the
region server dying, or to the client connecting to / resolving the wrong region
server.

Here is the log when trying to connect to server:

[main-EventThread] zookeeper.ZooKeeperWatcher:
hconnection-0x1e67b872-0x153e135af570008 connected
[main] client.ZooKeeperRegistry: Looking up meta region location in
ZK, connection=org.apache.hadoop.hbase.client.ZooKeeperRegistry@69b794e2
[main] client.ZooKeeperRegistry: Looked up meta region location,
connection=org.apache.hadoop.hbase.client.ZooKeeperRegistry@69b794e2;
serverName=sachin-pc,55964,1459772310378[main] client.MetaCache:
Cached location: [region=hbase:meta,,1.1588230740,
hostname=sachin-pc,55964,1459772310378, seqNum=0]
[hconnection-0x1e67b872-shared--pool1-t1] ipc.AbstractRpcClient:
Connecting to Sachin-PC/127.0.0.1:55964

java.net.SocketException: Socket is closed
at sun.nio.ch.SocketAdaptor.getOutputStream(Unknown Source)
at 
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.closeConnection(RpcClientImpl.java:429)
at 
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.handleConnectionFailure(RpcClientImpl.java:477)


java.net.SocketException: Socket is closed
at sun.nio.ch.SocketAdaptor.getInputStream(Unknown Source)
at 
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.closeConnection(RpcClientImpl.java:436)
at 
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.handleConnectionFailure(RpcClientImpl.java:477)


[hconnection-0x1e67b872-shared--pool1-t1] ipc.AbstractRpcClient: IPC
Client (1890187342) connection to Sachin-PC/127.0.0.1:55964 from
Sachin: marking at should close, reason: Connection refused: no
further information


So you can see that it gets the right region server and tries to connect to
Sachin-PC/127.0.0.1:55964.

However here it is getting socket exception.

Also, via the master web UI I found a region server with the name:
sachin-pc,55964,1459772310378
Now in the logs I see a connection to Sachin-PC/127.0.0.1:55964 which is failing.
Also, on netstat I sometimes see
TCP 192.168.1.102:60737 Sachin-PC:0 LISTENING
but most of the time I see nothing.
This suggests the region servers are failing a lot, and that the server is listening
on the PC IP address 192.168.1.102 and not on localhost 127.0.0.1. Maybe that is also
causing an issue.

But again, the question is why the client is trying to connect to
Sachin-PC/127.0.0.1:55964, which matches the registered region server name, when the
region server is actually running on a different port, say 55950 or something.


Thanks
Sachin
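
(In case it helps with the diagnosis above: a small sketch, assuming the same 1.0.x
client jars listed below, that prints which region servers the master actually has
registered -- hostname, port and startcode -- so it can be compared against what
netstat shows listening.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.ClusterStatus;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.ServerName;
    import org.apache.hadoop.hbase.client.*;

    public class ListRegionServers {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Admin admin = conn.getAdmin()) {
                ClusterStatus status = admin.getClusterStatus();
                for (ServerName sn : status.getServers()) {
                    // e.g. "sachin-pc,55964,1459772310378" -> host sachin-pc, port 55964
                    System.out.println(sn.getHostname() + ":" + sn.getPort()
                            + " (startcode " + sn.getStartcode() + ")");
                }
            }
        }
    }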



On Sun, Apr 3, 2016 at 10:36 PM, Sachin Mittal  wrote:

> I am stuck on connecting to hbase 1.0.3 via simple java client.
> The program hangs at:
>
> [main] zookeeper.ZooKeeper: Initiating client connection,
> connectString=127.0.0.1:2181 sessionTimeout=9
> watcher=hconnection-0x1e67b8720x0, quorum=127.0.0.1:2181,
> baseZNode=/hbase
> [main-SendThread(127.0.0.1:2181)] zookeeper.ClientCnxn: Opening
> socket connection to server 127.0.0.1/127.0.0.1:2181. Will not attempt to
> authenticate using SASL (unknown error)
> [main-SendThread(127.0.0.1:2181)] zookeeper.ClientCnxn: Socket
> connection established to 127.0.0.1/127.0.0.1:2181, initiating session
> [main-SendThread(127.0.0.1:2181)] zookeeper.ClientCnxn: Session
> establishment complete on server 127.0.0.1/127.0.0.1:2181, sessionid =
> 0x153d8383c530008, negotiated timeout = 4
>
> The code is very simple and standard:
>
>   public static void main(String args[]) throws IOException{
> // Instantiating Configuration class
> Configuration config = HBaseConfiguration.create();
> Connection connection = ConnectionFactory.createConnection(config);
>   // Instantiating Table class
> Table  table =
> connection.getTable(TableName.valueOf(HBaseTables.APPLICATION_TRACE_INDEX));
>// Instantiating the Scan class
> Scan scan = new Scan();
>  // Getting the scan result
> ResultScanner scanner = table.getScanner(scan);
> // Reading values from scan result
> for (Result result = scanner.next(); result != null; result =
> scanner.next()) {
> System.out.println("Found row : " + result);
> }
> //closing the scanner
> scanner.close();
> table.close();
> connection.close();
>  }
>
>
> The jars I am using are:
>
> commons-collections-3.2.1.jar
> commons-configuration-1.6.jar
> commons-lang-2.6.jar
> commons-logging-1.2.jar
> guava-12.0.1.jar
> hadoop-auth-2.5.1.jar
> hadoop-client-2.5.1.jar
> hadoop-common-2.5.1.jar
> hbase-client-1.0.3.jar
> hbase-common-1.0.3.jar
> hbase-hadoop-compat-1.0.3.jar
> hbase-hadoop2-compat-1.0.3.jar
> hbase-it-1.0.3.jar
> hbase-protocol-1.0.3.jar
> hbase-resource-bundle-1.0.3.jar
> hbase-rest-1.0.3.jar
> htrace-core-3.0.4.jar
> htrace-core-3.1.0-incubating.jar
> log4j-1.2.17.jar
> 

Re: Major compaction

2016-04-04 Thread Esteban Gutierrez
Hello Sumit,

Ideally you shouldn't be triggering major compactions that frequently since
minor compactions should be taking care of reducing the number of store
files. The caveat of doing it more frequently is the additional
disk/network I/O.

Can you please elaborate more on "reduce stress by spreading the load." Is
there anything else you are seeing in your cluster that is suggesting to
you to lower the period for major compactions?

esteban.

--
Cloudera, Inc.


On Mon, Apr 4, 2016 at 8:35 AM, Sumit Nigam 
wrote:

> Hi,
> Are there major overheads to running major compaction frequently? As much
> as I know, it produces one HFile for a region and processes delete markers
> and version-related drops. So, if this process has happened once, say, a few
> mins back, then another major compaction should ideally not cause much harm.
> Why I am trying to understand this is because HBase also sets it to a 24
> hour default (for time based compaction) and I am looking to lower it to
> say 20 mins to reduce stress by spreading the load.
> Or am I completely off-track?
> Thanks,
> Sumit


Major compaction

2016-04-04 Thread Sumit Nigam
Hi,
Are there major overheads to running major compaction frequently? As much as I 
know, it produces one HFile for a region and processes delete markers and 
version-related drops. So, if this process has happened once, say, a few mins 
back, then another major compaction should ideally not cause much harm.
Why I am trying to understand this is because HBase also sets it to a 24 hour 
default (for time based compaction) and I am looking to lower it to say 20 mins 
to reduce stress by spreading the load.
Or am I completely off-track?
Thanks,
Sumit

Re: Retiring empty regions

2016-04-04 Thread Nick Dimiduk
> Crazy idea, but you might be able to take a stripped-down version of the region
> normalizer code and make a Tool to run? Requesting a split or merge is done
> through the client API, and the only weighing information you need is
> whether a region is empty or not, and that you could find out too?

Yeah, that's the direction I'm headed.

> A bit off topic, but I think unfortunately region normalizer now ignores
> empty regions to avoid undoing pre-split on the table.

Unfortunate indeed. Maybe we should be keeping around the initial splits
list as a metadata attribute on the table?

> With the right row-key design you will never have empty regions due to TTL.

I'd love to hear your thoughts on this design, Vlad. Maybe you'd like to
write up a post for the blog? Meanwhile, I'm sure of a couple of us on here
on the list would appreciate your Cliff's Notes version. I can take this
into account for my v2 schema design.

> So Nick, merge on 1.1 is not recommended??? It was working very well on
> previous versions. Does ProcV2 really impact it that badly??

How to answer here carefully... I have no reason to believe merge is not
working on 1.1. I've been on the wrong end of enough "regions stuck in
transition" support tickets that I'm not keen to put undue stress on my
master. ProcV2 insures against many scenarios that cause master trauma,
hence my interest in the implementation details and my preference for
cluster administration tasks that use it as their source of authority.

Thanks for the thoughts folks.
-n
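
(Following up on the "stripped down normalizer as a Tool" idea quoted above -- a rough,
untested sketch only, against the 1.x Admin API. It assumes getTableRegions returns
regions in start-key order and that an empty store-file/memstore footprint is a good
enough "empty" signal; as discussed, online merge still deserves a quiet period.)

    import java.util.*;
    import org.apache.hadoop.hbase.*;
    import org.apache.hadoop.hbase.client.*;

    public class EmptyRegionMerger {
        // One pass: merge each empty region into its left-hand neighbour.
        public static void mergeEmptyRegions(Connection conn, TableName table) throws Exception {
            try (Admin admin = conn.getAdmin()) {
                // Collect regions that currently hold no data on disk or in the memstore.
                Set<String> empty = new HashSet<>();
                ClusterStatus status = admin.getClusterStatus();
                for (ServerName sn : status.getServers()) {
                    for (RegionLoad rl : status.getLoad(sn).getRegionsLoad().values()) {
                        if (rl.getStorefileSizeMB() == 0 && rl.getMemStoreSizeMB() == 0) {
                            empty.add(rl.getNameAsString());
                        }
                    }
                }
                // Walk the table's regions in order and request merges for empty ones.
                List<HRegionInfo> regions = admin.getTableRegions(table);
                for (int i = 1; i < regions.size(); i++) {
                    HRegionInfo prev = regions.get(i - 1);
                    HRegionInfo cur = regions.get(i);
                    if (empty.contains(cur.getRegionNameAsString())) {
                        admin.mergeRegions(prev.getEncodedNameAsBytes(),
                                           cur.getEncodedNameAsBytes(), false);
                        i++; // skip ahead: prev is now part of an in-flight merge
                    }
                }
            }
        }
    }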

On Fri, Apr 1, 2016 at 10:52 AM, Jean-Marc Spaggiari <
jean-m...@spaggiari.org> wrote:

> ;) That was not the question ;)
>
> So Nick, merge on 1.1 is not recommended??? It was working very well on
> previous versions. Does ProcV2 really impact it that badly??
>
> JMS
>
> 2016-04-01 13:49 GMT-04:00 Vladimir Rodionov :
>
> > >> This is something which makes it far less useful for time-series
> > >> databases with short TTL on the tables.
> >
> > With the right row-key design you will never have empty regions due to TTL.
> >
> > -Vlad
> >
> > On Thu, Mar 31, 2016 at 10:31 PM, Mikhail Antonov 
> > wrote:
> >
> > > Crazy idea, but you might be able to take a stripped-down version of the
> > > region normalizer code and make a Tool to run? Requesting a split or merge
> > > is done through the client API, and the only weighing information you need
> > > is whether a region is empty or not, and that you could find out too?
> > >
> > >
> > > "Short of upgrading to 1.2 for the region normalizer,"
> > >
> > > A bit off topic, but I think unfortunately the region normalizer now
> > > ignores empty regions to avoid undoing the pre-split on the table. This is
> > > something which makes it far less useful for time-series databases with
> > > short TTL on the tables. We'll need to address that.
> > >
> > > -Mikhail
> > >
> > > On Thu, Mar 31, 2016 at 9:56 PM, Nick Dimiduk 
> > wrote:
> > >
> > > > Hi folks,
> > > >
> > > > I have a table with TTL enabled. It's been receiving data for a while
> > > > beyond the TTL and I now have a number of empty regions. I'd like to
> > > > drop those empty regions to free up heap space on the region servers
> > > > and reduce master load. I'm running a 1.1 derivative.
> > > >
> > > > The only threads I found on this topic are from circa the 0.92 timeframe.
> > > >
> > > > Short of upgrading to 1.2 for the region normalizer, what's the
> > > > recommended method of cleaning up this cruft? Should I be merging empty
> > > > regions into their neighbors'? Looks like region merge hasn't been
> > > > migrated to ProcV2 yet, so it would be wise to reduce online table
> > > > activity, or at least aim for a "quiet period"? Is there a documented
> > > > process for off-lining and deleting a region by name? I don't see
> > > > anything in the book about it.
> > > >
> > > > I experimented with online merge on pseudo-dist; looks like it's working
> > > > fine for the most basic case. I'll probably pursue this unless someone
> > > > has some other ideas.
> > > >
> > > > Thanks,
> > > > Nick
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks,
> > > Michael Antonov
> > >
> >
>


HBase table map to hive

2016-04-04 Thread ram kumar
Hi,

I have an HBase table whose column names change (increase) over time.
Is there a way to map such an HBase table to a Hive table,
inferring the schema from the HBase table?

Thanks