Re: REST Vs RPC

2014-11-17 Thread Dhaval Shah
Almost always RPC. It's more optimized for this use case.
Regards,
Dhaval
  From: Jignesh Patel jigneshmpa...@gmail.com
 To: user@hbase.apache.org 
 Sent: Monday, 17 November 2014 12:05 PM
 Subject: REST Vs RPC
   
Which one is faster and better?
1. REST
2. RPC

I am not looking at this in the context of technology independence, but if we are
using Java
as a client, which is more robust?


  

Re: Duplicate Value Inserts in HBase

2014-10-21 Thread Dhaval Shah
You can achieve what you want using versions and some hackery with timestamps


Sent from my T-Mobile 4G LTE Device


 Original message 
From: Jean-Marc Spaggiari jean-m...@spaggiari.org 
Date:10/21/2014  9:02 AM  (GMT-05:00) 
To: user user@hbase.apache.org 
Cc:  
Subject: Re: Duplicate Value Inserts in HBase 

You can do check-and-puts to validate whether the value is already there, but it's
slower...
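A minimal sketch of that check-and-put idea in Java (the table, family, qualifier and value names below are made up, and 'table' is assumed to be an existing HTableInterface):

byte[] row = Bytes.toBytes("row1");
byte[] cf = Bytes.toBytes("cf");
byte[] qual = Bytes.toBytes("q");
byte[] newValue = Bytes.toBytes("some value");

// Read the current cell first and skip the write when nothing changed.
byte[] current = table.get(new Get(row).addColumn(cf, qual)).getValue(cf, qual);
if (current == null || !Bytes.equals(current, newValue)) {
    Put put = new Put(row);
    put.add(cf, qual, newValue);
    // checkAndPut only applies the Put if the cell still holds 'current',
    // so a concurrent writer cannot sneak in between the get and the put.
    table.checkAndPut(row, cf, qual, current, put);
}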

2014-10-21 8:50 GMT-04:00 Krishna Kalyan krishnakaly...@gmail.com:

 Thanks Jean,
 If I put the same value in my table for a particular column of a rowkey, I
 want HBase to reject this value and retain the old value with its old timestamp.
 In other words, update only when the value changes.

 Regards,
 Krishna

 On Tue, Oct 21, 2014 at 6:02 PM, Jean-Marc Spaggiari 
 jean-m...@spaggiari.org wrote:

  Hi Krishna,
 
  HBase will store them in the same row, same cell, but you will have 2
  versions. If you want to keep just one, set VERSIONS=1 on the table
  side and only one will be stored. Is that what you mean?
 
  JM
 
  2014-10-21 8:29 GMT-04:00 Krishna Kalyan krishnakaly...@gmail.com:
 
   Hi,
   I have an HBase table which is populated from Pig using PigStorage.
   While inserting, suppose for a rowkey I have a duplicate value.
   Is there a way to prevent an update?
   I want to maintain the version history for my values, which are unique.
  
   Regards,
   Krishna
  
 



Re: Connection pool Concurrency in HBase

2014-08-04 Thread Dhaval Shah
HConnection connection = HConnectionManager.createConnection(config); will give 
you the shared HConnection. 


Do not close the connection object until all your threads are done using it. In
your use case you should not close it when you close the table, since other
threads may be using it or may need to use it in the future.
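A rough sketch of that pattern (names are illustrative): the connection is created once and shared, and only the table handle is closed per request.

// Created once (e.g. in the servlet's init()) and shared by every request thread.
HConnection connection = HConnectionManager.createConnection(config);

// Per request/thread: grab a lightweight table handle, use it, close only the handle.
HTableInterface table = connection.getTable("table1");
try {
    table.put(new Put(Bytes.toBytes("row1"))
        .add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v")));
} finally {
    table.close();       // cheap; releases the handle only
    // do NOT close the connection here
}

// Only in the servlet's destroy(), once no thread needs HBase any more:
connection.close();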


Regards,
Dhaval



From: Serega Sheypak serega.shey...@gmail.com
To: user@hbase.apache.org 
Sent: Monday, 4 August 2014 1:44 PM
Subject: Connection pool Concurrency in HBase


Hi, I'm trying to understand how connection pooling works in HBase.
I've seen that
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HConnectionManager.html
is recommended to use.

I have a servlet; its instance is shared among many threads.
What is a good way to use connection pooling in this case?

Is this

HConnection connection = HConnectionManager.createConnection(config);
HTableInterface table = connection.getTable("table1");
try {
   // Use the table as needed, for a single operation and a single thread
} finally {
   table.close();
   connection.close();
}


1. enough to reuse the connection so that connections wouldn't be opened each time?
2. Why do I have to close BOTH the table and the connection? Is that by design?


Re: hbase cluster working bad

2014-07-22 Thread Dhaval Shah
We just solved a very similar issue with our cluster (yesterday!). I would 
suggest you look at 2 things in particular:
- Is the network on your region server saturated? That would prevent 
connections from being made
- See if the region server has any RPC handlers available when you get this
error. It's possible that all RPC handlers are busy servicing other requests (or
stuck due to a combination of load and bad configs).
 

Regards,
Dhaval



 From: Павел Мезенцев pa...@mezentsev.org
To: user@hbase.apache.org 
Sent: Tuesday, 22 July 2014 7:46 AM
Subject: Re: hbase cluster working bad
 

Jobs running on this cluster print exceptions:

java.util.concurrent.ExecutionException: java.net.SocketTimeoutException:
Call to ds-hadoop-wk01p.tcsbank.ru/10.218.64.11:60020 failed on socket
timeout exception: java.net.SocketTimeoutException: 6 millis timeout
while waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/10.218.64.14:38621 remote=
ds-hadoop-wk01p.tcsbank.ru/10.218.64.11:60020]

    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:188)
    at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1569)
    at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1421)
    at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:739)
    at org.apache.hadoop.hbase.client.HTable.get(HTable.java:708)
    at 
org.apache.hadoop.hbase.client.HTablePool$PooledHTable.get(HTablePool.java:367)
    at ru.tcsbank.hbase.HBasePersonDao.getUsersBatch(HBasePersonDao.java:306)
    at 
ru.tcsbank.matching.PersonMatcher.performSolrRequest(PersonMatcher.java:153)
    at ru.tcsbank.matching.PersonMatcher.search(PersonMatcher.java:135)
    at 
ru.tcsbank.personmatcher.mr.PersonMatcherJob$MapClass.map(PersonMatcherJob.java:80)
    at 
ru.tcsbank.personmatcher.mr.PersonMatcherJob$MapClass.map(PersonMatcherJob.java:65)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)





Best regards,
Pavel Mezentsev





2014-07-22 14:59 GMT+04:00 Павел Мезенцев pa...@mezentsev.org:

 Hello all!

 We have trouble with HBase.
 Our Hadoop cluster has 4 nodes (plus 1 client node).
 CDH 4.6 + CM 4.7 Hadoop is installed.
 Hadoop versions are:
  - hadoop-hdfs : 2.0.0+1475
  - hadoop-0.20-mapreduce : 2.0.0+1475
  - hbase : 0.94.6+132
 Hadoop and HBase configs are in the attachment.

 We have several tables in HBase with a total volume of 2 TB.
 We run mapReduce ETL jobs and analytics queries over them.

 There are a lot of warnings like
 - *The health test result for REGION_SERVER_READ_LATENCY has become bad:
 The moving average of HDFS read latency is 162 millisecond(s) over the
 previous 5 minute(s). Critical threshold: 100*.
 - *The health test result for REGION_SERVER_SYNC_LATENCY has become bad:
 The moving average of HDFS sync latency is 8.2 second(s) over the previous
 5 minute(s). Critical threshold: 5,000*.
 *- HBase region health: 442 unhealthy regions *
 *- HDFS_DATA_NODES_HEALTHY has become bad*
 *- HBase Region Health Canary is running slowly **on the cluster*

 MapReduce jobs over HBase with random queries to HBase are working very slowly
 (a job is only 20% complete after 18 hours, versus 100% after 12 hours on an
 analogous cluster).

 Please help us find the reasons for these alerts and speed up the cluster.
 Could you give us some good advice on what we should do?

 Cheers,
 Mezentsev Pavel



Re: multiple region servers at one machine

2014-07-16 Thread Dhaval Shah
It's certainly possible (at least from the command line) but probably very messy. You
will need different ports, different log files, different pid files, and
possibly even different configs on the same machine.
 

Regards,
Dhaval



 From: Jane Tao jiao@oracle.com
To: user@hbase.apache.org 
Sent: Wednesday, 16 July 2014 6:06 PM
Subject: multiple region servers at one machine
 

Hi there,

Is it possible to run multiple region servers on one machine/node? If
this is possible, how do I start multiple region servers from the command
line or Cloudera Manager?

Thanks,
Jane


-- 

Re: HBase cluster design

2014-05-26 Thread Dhaval Shah
A few things pop out to me on cursory glance:
- You are using CMSIncrementalMode which, after a long chain of events, has a
tendency to result in the famous Juliet pause of death. Can you try the ParNew GC
instead and see if that helps?
- You should try to reduce the CMSInitiatingOccupancyFraction to avoid a full GC
- Your hbase-env.sh is not setting the Xmx at all. Do you know how much RAM you
are giving to your region servers? It may be too small or too large given your
use case and machine size.
- Your client scanner caching is 1 which may be too small depending on your row 
sizes. You can also override that setting in your scan for the MR job
- You only have 2 zookeeper instances which is not at all recommended. 
Zookeeper needs a quorum to operate and generally works best with an odd number 
of zookeeper servers. This probably isn't related to your crashes but it would 
help stability if you had 1 or 3 zookeepers
- I am not 100% sure if the version of HBase you are using has MSLAB enabled.
If not, you should enable it.
- You can try increasing/decreasing the amount of RAM you provide to block 
caches and memstores to suit your use case. I see that you are using the 
defaults here

On top of these, when you kick off your MR job to scan HBase you should 
setCacheBlocks to false
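A small sketch of that scan setup for the MR job (the table name, mapper class and caching value are placeholders to tune for your rows):

Scan scan = new Scan();
scan.setCaching(500);        // raise the default of 1; tune to your row size
scan.setCacheBlocks(false);  // a full-table scan should not evict the block cache
TableMapReduceUtil.initTableMapperJob("mytable", scan, MyMapper.class,
    ImmutableBytesWritable.class, Result.class, job);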



Regards,
Dhaval
 


 From: Flavio Pompermaier pomperma...@okkam.it
To: user@hbase.apache.org; Dhaval Shah prince_mithi...@yahoo.co.in 
Sent: Friday, 23 May 2014 3:16 AM
Subject: Re: HBase cluster design
  


The hardware specs are: 4 nodes with 48 GB RAM, 24 cores and 1 TB of disk per server.
My HBase config files are attached.

Thanks,
Flavio





On Fri, May 23, 2014 at 3:33 AM, Dhaval Shah prince_mithi...@yahoo.co.in 
wrote:

Can you share your hbase-env.sh and hbase-site.xml? And hardware specs of your 
cluster?


 

Regards,
Dhaval



 From: Flavio Pompermaier pomperma...@okkam.it
To: user@hbase.apache.org
Sent: Saturday, 17 May 2014 2:49 AM
Subject: Re: HBase cluster design



Could you please tell me in detail which parameters you'd like to see, so I
can look for them and learn the important ones? I'm using Cloudera, CDH4 in
one cluster and CDH5 in the other.

Best,
Flavio

On May 17, 2014 2:48 AM, prince_mithi...@yahoo.co.in 
prince_mithi...@yahoo.co.in wrote:

 Can you describe your setup in more detail? Specifically the amount of
 heap your HBase region servers have and your GC settings. Is your server
 swapping when your MR jobs are running? Also, do your regions go down, or your
 region servers?

 We run many MR jobs simultaneously on our hbase tables (size is in TBs)
 along with serving real time requests at the same time. So I can vouch for
 the fact that a well tuned hbase cluster definitely supports this use case
 (well-tuned is the key word here)

 Sent from Yahoo Mail on Android

 

Re: HBase cluster design

2014-05-22 Thread Dhaval Shah
Can you share your hbase-env.sh and hbase-site.xml? And hardware specs of your 
cluster?


 

Regards,
Dhaval



 From: Flavio Pompermaier pomperma...@okkam.it
To: user@hbase.apache.org 
Sent: Saturday, 17 May 2014 2:49 AM
Subject: Re: HBase cluster design
 

Could you please tell me in detail which parameters you'd like to see, so I
can look for them and learn the important ones? I'm using Cloudera, CDH4 in
one cluster and CDH5 in the other.

Best,
Flavio

On May 17, 2014 2:48 AM, prince_mithi...@yahoo.co.in 
prince_mithi...@yahoo.co.in wrote:

 Can you describe your setup in more detail? Specifically the amount of
 heap your HBase region servers have and your GC settings. Is your server
 swapping when your MR jobs are running? Also, do your regions go down, or your
 region servers?

 We run many MR jobs simultaneously on our hbase tables (size is in TBs)
 along with serving real time requests at the same time. So I can vouch for
 the fact that a well tuned hbase cluster definitely supports this use case
 (well-tuned is the key word here)

 Sent from Yahoo Mail on Android



Re: SCDynamicStore

2014-03-19 Thread Dhaval Shah
I don't think it's an error. It's an annoying warning message but it does not affect
functionality.
 
Regards, 
Dhaval



 From: Fabrice fchap...@ip-worldcom.ch
To: user@hbase.apache.org user@hbase.apache.org 
Sent: Wednesday, 19 March 2014 10:51 AM
Subject: SCDynamicStore
 

Good afternoon,

I'm new to HBase. I have installed the product and started the HBase daemon;
however, when I create a table, the system sends me this error message:

Unable to load realm info from SCDynamicStore

Does somebody know what this message means and how it could be solved?

Thanks for your help

Fabrice Chapuis

Re: Need some information over WAL

2014-02-25 Thread Dhaval Shah
Inline
 
Regards,
Dhaval



 From: Upendra Yadav upendra1...@gmail.com
To: user@hbase.apache.org 
Sent: Tuesday, 25 February 2014 1:00 PM
Subject: Need some information over WAL
 

I also have a doubt about the WAL (write-ahead log).

In HDFS we can write a new file or we can append to an old file.

Is this correct:
1. The WAL only logs operations, not the data... just like disk journaling (ext4)
- No, the WAL is a log of all new data, not just the operations

2. In case of a region server failure... WAL replay will depend on the client to
get each operation's data that is not yet committed/flushed to HDFS.
- No, the client will generally not even know of a region server failure. HBase gets the
data from the WAL and replays it. The client may not even exist when a region
server crashes

3. HBase uses the append operation in HDFS to store/log the WAL.
- Yes

Thanks...

Re: Need some information over WAL

2014-02-25 Thread Dhaval Shah
Yes, the WAL is a single, common file for all regions on a region server.
Yes to 1.
Re 2: HBase will roll over WAL files and eventually delete them when they are
no longer needed.
 
Regards, 
Dhaval



 From: Upendra Yadav upendra1...@gmail.com
To: user@hbase.apache.org; Dhaval Shah prince_mithi...@yahoo.co.in 
Sent: Tuesday, 25 February 2014 1:18 PM
Subject: Re: Need some information over WAL
 

Thanks Dhaval

Oh... whatever I assumed from reading the documents (partially) was
wrong...

With your answer... I have more questions...
So the WAL is a single, common file for all regions of a region server.

Is this correct:
1. So all operations (including data) will go to the WAL... operation by
operation

2. Then on HDFS, HBase will always have to perform delete operations on some set of
WAL files...




On Tue, Feb 25, 2014 at 11:35 PM, Dhaval Shah
prince_mithi...@yahoo.co.inwrote:

 Inline

 Regards,
 Dhaval


 
  From: Upendra Yadav upendra1...@gmail.com
 To: user@hbase.apache.org
 Sent: Tuesday, 25 February 2014 1:00 PM
 Subject: Need some information over WAL


 I have also doubt over WAL(write ahead log).

 In hdfs we can write a new file or we can append to old file.

 Is that correct :
 1. WAL only logs operations not its data... just like disk journaling(ext4)
 - No WAL is a log of all new data, not just the operations

 2. In case of Region Server failure... WAL replay will depends on Client to
 get each operation's data that yet not committed/flushed to hdfs.
 - No client will not generally know of a region server failure. It gets
 the data from the WAL and replays it. The client may not even exist when a
 region server crashes

 3. Hbase uses append operation in hdfs to store/log WAL.
 - Yes

 Thanks...

Re: RegionServer unable to connect to master

2014-01-29 Thread Dhaval Shah
Do you have a firewall between the master and the slaves?

Regards, 



Dhaval



From: Fernando Iwamoto - Plannej fernando.iwam...@plannej.com.br
To: user@hbase.apache.org 
Sent: Wednesday, 29 January 2014 3:11 PM
Subject: Re: RegionServer unable to connect to master


I am new to HBase too, but I had the same problem a long time ago and I don't
remember how I fixed it; I will keep troubleshooting with you...
How about ZooKeeper? Have you uncommented HBASE_MANAGES_ZK (something
like that) in hbase-env.sh and set it to true?



2014-01-29 Guang Gao birdeey...@gmail.com

 You mean the SSH key? Yes, any two nodes can ssh to each other without a
 password.

 On Wed, Jan 29, 2014 at 2:10 PM, Fernando Iwamoto - Plannej
 fernando.iwam...@plannej.com.br wrote:
  Did you try to pass the key to the machines?
 
 
  2014-01-29 birdeeyore birdeey...@gmail.com
 
  Thanks for your reply. Here's some additional info. Thanks.
 
  $ cat hbase-site.xml
  <configuration>
    <property>
      <name>hbase.cluster.distributed</name>
      <value>true</value>
    </property>
    <property>
      <name>hbase.rootdir</name>
      <value>hdfs://obelix8.local:9001/hbase</value>
    </property>
    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>obelix105.local,obelix106.local,obelix107.local</value>
    </property>
    <property>
      <name>hbase.zookeeper.property.clientPort</name>
      <value>2183</value>
    </property>
    <property>
      <name>hbase.zookeeper.peerport</name>
      <value>2890</value>
    </property>
    <property>
      <name>hbase.zookeeper.leaderport</name>
      <value>3890</value>
    </property>
    <property>
      <name>hbase.zookeeper.property.dataDir</name>
      <value>/ssd/hbase/hbase-0.94.16/zookeeper</value>
    </property>
    <property>
      <name>hbase.master</name>
      <value>obelix8.local:6</value>
    </property>
    <property>
      <name>hbase.master.info.port</name>
      <value>50070</value>
    </property>
    <property>
      <name>hbase.client.scanner.caching</name>
      <value>200</value>
    </property>
  </configuration>
 
  ==
 
  $ cat regionservers
  obelix105.local
  obelix106.local
  obelix107.local
  obelix108.local
  obelix109.local
  obelix110.local
  obelix111.local
  obelix112.local
  obelix113.local
  obelix114.local
 
  =
  On my master node:
 
  $ cat /etc/hosts
  127.0.0.1       localhost
  192.168.245.8      obelix8.local xx.yy.net      obelix8
 
  # The following lines are desirable for IPv6 capable hosts
  ::1     ip6-localhost ip6-loopback
  fe00::0 ip6-localnet
  ff00::0 ip6-mcastprefix
  ff02::1 ip6-allnodes
  ff02::2 ip6-allrouters
  192.168.245.1   obelix.local
 
  ===
 
  On one of my slave nodes:
 
  $ cat /etc/hosts
  127.0.0.1       localhost
  127.0.1.1       obelix105.local xx.yy.net    obelix105
 
  # The following lines are desirable for IPv6 capable hosts
  ::1     ip6-localhost ip6-loopback
  fe00::0 ip6-localnet
  ff00::0 ip6-mcastprefix
  ff02::1 ip6-allnodes
  ff02::2 ip6-allrouters
  192.168.245.1   obelix.local
 
  ==
 
  The error of HBase 0.94.16+Hadoop 1.2.1:
 
  2014-01-29 12:58:30,922 INFO
  org.apache.hadoop.hbase.regionserver.HRegionServer: Attempting connect
  to Master server at obelix8.local,6,1391018303918
  2014-01-29 12:58:40,960 WARN
  org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to connect
  to master. Retrying. Error was:
  java.net.SocketException: Invalid argument
          at sun.nio.ch.Net.connect(Native Method)
          at
 sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:532)
          at
 
 org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
          at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:511)
          at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:481)
          at
 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:392)
          at
 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:438)
          at
 
 org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1141)
          at
  org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:988)
          at
 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:87)
          at $Proxy9.getProtocolVersion(Unknown Source)
          at
 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:141)
          at
  org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
          at
 
 org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:2043)
          at
 
 org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2089)
          at
 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:747)
          at java.lang.Thread.run(Thread.java:662)
 
  Best,
 
  Boduo
 
 
  On Wed, Jan 29, 2014 at 8:21 AM, Jean-Marc Spaggiari
  jean-m...@spaggiari.org wrote:
   Hi,
  
   

Re: HBase Design : Column name v/s Version

2014-01-24 Thread Dhaval Shah
Versions in HBase are timestamps by default. If you intend to continue using 
the timestamps, what will happen when someone writes value_1 and value_2 at the 
exact same time?
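For illustration, if you do go the version route with your own version numbers rather than the wall clock, the Put API lets you pass the version explicitly (row/column names below are made up, 'table' is an existing HTableInterface):

// Put.add(family, qualifier, ts, value): the "timestamp" argument is what HBase
// stores as the version, so passing an application-managed counter instead of
// the wall clock means two writes can never collide on the same version.
long version = 1L;   // e.g. an auto-incrementing counter on the client side
Put put = new Put(Bytes.toBytes("rowA"));
put.add(Bytes.toBytes("cf"), Bytes.toBytes("data"), version, Bytes.toBytes("value_1"));
table.put(put);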
 
Regards,

Dhaval


- Original Message -
From: Sagar Naik sn...@splunk.com
To: user@hbase.apache.org user@hbase.apache.org
Cc: 
Sent: Friday, 24 January 2014 12:27 PM
Subject: HBase Design : Column name v/s Version

Hi,

I have a choice to maintain the data either in column values or as
versioned data.
This data is not a versioned copy per se.

The access pattern on this is to get all the data every time.

So the schema choices are :
Schema 1:
1. column_name/qualifier = data_1. column_value = value_1
1.a. column_name/qualifier = data_2. column_value = value_2,value_2.a

1.b. column_name/qualifier = data_3. column_value = value_3

To get all the values for data, I will have to use ColumnPrefixFilter
with prefix set data

Schema 2:
2. column_name/qualifier = data. version= 1, column_value = value_1

2.a. column_name/qualifier = data. version= 2, column_value =
value_2,value_2.a

2.b. column_name/qualifier = data. version= 3, column_value = value_3
To get all the values for data , I will do a simple get operation to get
all the versions.

Number of versions can go from: 10 to 100K

Get operation perf should beat the Filter perf.
Comparing 100K values will be costly as the # versions increase.

I would like to know if there are drawbacks in going the version route.




-Sagar



Re: HBase Design : Column name v/s Version

2014-01-24 Thread Dhaval Shah
Theoretically that could work. However, it does seem like a weird way of doing 
what you want to do and you might run into unforeseen issues. One issue I see 
is that 100k versions sounds a bit scary. You can paginate through columns but 
not through versions on the same column for example.
 
Regards,

Dhaval


- Original Message -
From: Sagar Naik sn...@splunk.com
To: user@hbase.apache.org user@hbase.apache.org; Dhaval Shah 
prince_mithi...@yahoo.co.in
Cc: 
Sent: Friday, 24 January 2014 1:46 PM
Subject: Re: HBase Design : Column name v/s Version

Thanks for clarifying,

I will be using custom version numbers (auto incrementing on the client
side) and not timestamps.
Two clients do not update the same row


-Sagar


On 1/24/14 10:33 AM, Dhaval Shah prince_mithi...@yahoo.co.in wrote:

I am talking about schema 2. Schema 1 would definitely work. Schema 2 can
have the version collisions if you decide to use timestamps as versions
 
Regards,

Dhaval


- Original Message -
From: Sagar Naik sn...@splunk.com
To: user@hbase.apache.org user@hbase.apache.org; Dhaval Shah
prince_mithi...@yahoo.co.in
Cc: 
Sent: Friday, 24 January 2014 1:07 PM
Subject: Re: HBase Design : Column name v/s Version

I am not sure I understand you correctly.
I assume you are talking about schema 1.
In this case I am appending the version number to the column name.

The column_names are different (data_1/data_2) for value_1 and value_2
respectively.


-Sagar


On 1/24/14 9:47 AM, Dhaval Shah prince_mithi...@yahoo.co.in wrote:

Versions in HBase are timestamps by default. If you intend to continue
using the timestamps, what will happen when someone writes value_1 and
value_2 at the exact same time?
 
Regards,

Dhaval


- Original Message -
From: Sagar Naik sn...@splunk.com
To: user@hbase.apache.org user@hbase.apache.org
Cc: 
Sent: Friday, 24 January 2014 12:27 PM
Subject: HBase Design : Column name v/s Version

Hi,

I have a choice to maintain to data either in column values or as
versioned data.
This data is not a versioned copy per se.

The access pattern on this get all the data every time

So the schema choices are :
Schema 1:
1. column_name/qualifier = data_1. column_value = value_1
1.a. column_name/qualifier = data_2. column_value = value_2,value_2.a

1.b. column_name/qualifier = data_3. column_value = value_3

To get all the values for data, I will have to use ColumnPrefixFilter
with prefix set data

Schema 2:
2. column_name/qualifier = data. version= 1, column_value = value_1

2.a. column_name/qualifier = data. version= 2, column_value =
value_2,value_2.a

2.b. column_name/qualifier = data. version= 3, column_value = value_3
To get all the values for data , I will do a simple get operation to
get
all the versions.

Number of versions can go from: 10 to 100K

Get operation perf should beat the Filter perf.
Comparing 100K values will be costly as the # versions increase.

I would like to know if there are drawbacks in going the version route.




-Sagar





Re: Easiest way to get a random sample of keys

2014-01-24 Thread Dhaval Shah
The HBase shell is a JRuby shell and wraps all Java classes in a Ruby interface.
You can actually use a RandomRowFilter with a 5% configuration to achieve what
you need.
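A small sketch from the Java API (the shell can construct the same filter object, since it is just JRuby); 0.05f is the 5% sampling chance and 't1' is assumed to be an existing HTable:

Scan scan = new Scan();
// Each row passes with probability 0.05, which gives a ~5% random sample.
scan.setFilter(new RandomRowFilter(0.05f));
ResultScanner scanner = t1.getScanner(scan);
for (Result r : scanner) {
    System.out.println(Bytes.toString(r.getRow()));
}
scanner.close();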

Regards,

Dhaval



From: Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN) skada...@bloomberg.net
To: user@hbase.apache.org 
Sent: Friday, 24 January 2014 6:15 PM
Subject: Easiest way to get a random sample of keys


Something like count 't1', {INTERVAL => 20} should give me every 20th row in
table 't1'. Is there an easy way to get a random sample via the shell using
filters?


Re: Rebuild HBASE Table to reduce Regions per RS

2014-01-14 Thread Dhaval Shah
If you can afford downtime for your table, there are ways to do it. You can:
- Merge regions (this requires the table to be disabled, at least in some older versions
and probably in newer ones too)
- Go brute force by doing an export, truncate, import (this is a little more
manageable when you have a large number of regions). This however is way more
resource intensive and will take longer.

If you can't afford downtime, I would suggest creating another table which
mirrors this one and then switching to the new one.

Regards,

Dhaval



From: Upender Nimbekar nimbekar.upen...@gmail.com
To: d...@hbase.apache.org; user@hbase.apache.org 
Sent: Tuesday, 14 January 2014 10:21 AM
Subject: Rebuild HBASE Table to reduce Regions per RS


Hi,
Does anyone have any experience rebuilding an HBase table to reduce the
number of regions? I am currently dealing with a situation where the no. of
regions per RS has gone up quite significantly (500 per RS), thereby
causing some performance issues. This is how I am thinking of bringing it
down:

increase hbase.hregion.max.filesize from 500 MB to 2 GB

and then rebuild the HBase table. I am assuming that after the table rebuild, I
should see the no. of regions come down by more than half. I would
basically like to stay within the HBase-suggested no. of regions per RS, which
is about 50-200.

Please suggest if someone has any experience doing it.

Thanks
Upen 


Re: Large Puts in MR job

2014-01-13 Thread Dhaval Shah
If you are creating 1 big Put object, how would auto flush help you? In theory
you would run out of memory before you do a table.put() anyway. Am I missing
something?

Why don't you split your put into smaller puts and let the deferred flush do
its job (see the sketch below)? Do you need all the KVs to be flushed at the same time?
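A rough sketch of that idea (the row key, family, qualifiers and buffer size below are made up; 'table' is assumed to be an HTable created in setup()):

table.setAutoFlush(false);                  // let the client-side write buffer batch puts
table.setWriteBufferSize(8 * 1024 * 1024);  // flushed automatically when full

// Instead of one huge Put holding every KV, emit many small Puts (here one per
// column); each stays small in memory and the buffer ships them in batches.
for (int i = 0; i < qualifiers.length; i++) {
    Put put = new Put(rowKey);
    put.add(family, qualifiers[i], values[i]);
    table.put(put);                         // buffered, not one RPC per call
}
table.flushCommits();                       // push whatever is left in the buffer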

Technically you can create your own HBase client in setup(), but I don't know if
that's going to solve your issue.

Sent from Yahoo Mail on Android



Re: Schema Design Newbie Question

2013-12-23 Thread Dhaval Shah
1000 CFs with HBase does not sound like a good idea.

category + timestamp sounds like the better of the 2 options you have thought 
of. 

Can you tell us a little more about your data? 
 
Regards,

Dhaval



 From: Kamal Bahadur mailtoka...@gmail.com
To: user@hbase.apache.org 
Sent: Monday, 23 December 2013 6:01 PM
Subject: Schema Design Newbie Question
 

Hello,

I am just starting to use HBase and I am coming from the Cassandra world. Here
is some quick background regarding my data:

My system will be storing data that belongs to a certain category.
Currently I have around 1000 categories.  Also note that some categories
produce lot more data than others. To be precise, 10% of the categories
provide more than 65% of the total data in the system.

Data access queries always contain this category. I have
listed 2 options to design the schema:

1. Add category as first component of the row key [category + timestamp] so
that my data is sorted based on category for fast retrieval.
2. Add category as column family so that I can just use timestamp as
rowkey. This option will however create more hfiles since I have more
categories.

I am leaning towards option 2. I like the idea that HBase separates the data for
each CF into its own HFiles. However, I am still worried about the number of
HFiles that will be created on the server. Will it cause any other side
effects? I would like to hear from the user community as to which option
will be best in my case.

Kamal

Re: One Region Server fails - all M/R jobs crash.

2013-11-25 Thread Dhaval Shah
Hmm, ok. You are right. The principle of Hadoop/HBase is to do big data on
commodity hardware, but that's not to say you can do it without enough hardware.
Get 8 commodity disks and see your performance/throughput numbers improve
substantially. Before jumping into buying anything though, I would suggest you
look at hardware utilization when the problem happens. That would tell you what
your most pressing need is.
 
Regards, 
Dhaval



 From: David Koch ogd...@googlemail.com
To: user@hbase.apache.org; Dhaval Shah prince_mithi...@yahoo.co.in 
Sent: Monday, 25 November 2013 3:36 AM
Subject: Re: One Region Server fails - all M/R jobs crash.
 

Hi Dhaval,

Yes, rows can get very big; that's why we filter them. The filter lets KVs
pass as long as the KV count is < MAX_LIMIT and skips the row entirely once
the count exceeds this limit. KV size is about constant. Alternatively, we
could use batching, you are right.

Also, with regard to the Java version used. Cloudera 4 installs its own JVM
which happens to be Java 7 so it's not a choice we made.

I always thought the principle of Hadoop/HBase was to do big data on
commodity hardware. You suggest we get 1 disk per CPU? I am by no means an
expert in setting up this kind of system.

Thanks again for your response,

/David



On Fri, Nov 22, 2013 at 9:06 PM, Dhaval Shah prince_mithi...@yahoo.co.inwrote:

 How big can your rows get? If you have a million columns on a row, you
 might run your region server out of memory. Can you try setBatch to a
 smaller number and test if that works?

 10k regions is too many. Can you try and increase your max file size and
 see if that helps.

 8 cores / 1 disk is a bad combination. Can you look at disk IO during the
 time of crash and see if you find anything there.

 You might also be swapping. Can you look at your GC logs?

 You are running dangerously close to the fence with the kind of hardware
 you have.

 Regards,
 Dhaval


 
  From: David Koch ogd...@googlemail.com
 To: user@hbase.apache.org
 Sent: Friday, 22 November 2013 2:43 PM
 Subject: Re: One Region Server fails - all M/R jobs crash.


 Hello,

 Thank you for your replies.

 Not that it matters, but cache is 1 and batch is -1 on the scan, i.e. each RPC
 call returns one row. The jobs don't write any data back to HBase;
 compaction is deactivated and done manually. At the time of the crash all
 datanodes were fine, hbchk showed no inconsistencies. Table size is about
 10k regions/3 billion records on the largest tables and we do a lot of
 server side filtering to limit what's sent across the network.

 Our machines may not be the most powerful, 32GB RAM, 8 cores, 1 disk. It's
 also true that when we took a closer look in the past it turned out that
 most of the issues we had were somehow rooted in the fact that CPUs were
 overloaded, not enough memory available - hardware stuff.

 What I don't get is why HBase always crashes. I mean if it's slow ok - the
 hardware is a bottleneck but at least you'd expect it to pull through
 eventually. Some days all jobs work fine, some days they don't and there is
 no telling why. HBase's erratic behavior has been causing us a lot of
 headache and we have been spending way too much time fiddling with HBase
 configuration settings over the past 18 months.

 /David



 On Fri, Nov 22, 2013 at 7:05 PM, Ted Yu yuzhih...@gmail.com wrote:

  Thanks Dhaval for the analysis.
 
  bq. The HBase version is 0.94.6
 
  David:
  Please upgrade to 0.94.13, if possible. There have been several JIRAs
  backporting patches from trunk where jdk 1.7 is supported.
 
  Please also check your DataNode log to see whether there was problem
 there
  (likely there was).
 
  Cheers
 
 
  On Sat, Nov 23, 2013 at 2:00 AM, Dhaval Shah 
 prince_mithi...@yahoo.co.in
  wrote:
 
   Your logs suggest that you are overloading resources
   (servers/network/memory). How much data are you scanning with your MR
  job,
   how much are you writing back to HBase? What values are you setting for
   setBatch, setCaching, setCacheBlocks? How much memory do you have on
 your
   region servers? 1 server crashing should not cause a job to fail
 because
  it
   will move on to the next one (given the right params for retries and
  retry
   interval are set). Your region server logs suggest that it's way more
   complicated than that.
  
   2013-11-17 09:58:37,513 WARN
   org.apache.hadoop.hbase.regionserver.HRegionServer: Received close for
   region we are already opening or closing;
  e54b8e16ffbe2187b9017fef596c62aa
  
   looks like some state inconsistency issue
  
   I also see that you are using Java 7. Though some people have had
 success
   using it, I am not sure if Java 7 is currently the recommended version
   (most people use Java 6!)
  
   2013-11-18 18:01:47,959 INFO org.apache.zookeeper.ClientCnxn: Unable to
   read additional data from server sessionid 0x342654dfdd30017, likely
  server
   has closed socket

Re: RegionServer crash without any errors (compaction?)

2013-11-07 Thread Dhaval Shah
Did you look at your GC logs? Probably the compaction process is running your 
region server out of memory. Can you provide more details on your setup? Max 
heap size? Max Region HFile size?
 
Regards,
Dhaval



 From: John johnnyenglish...@gmail.com
To: user@hbase.apache.org 
Sent: Thursday, 7 November 2013 10:51 AM
Subject: RegionServer crash without any errors (compaction?)
 

Hi,

I have a cluster with 7 region servers. Some of them are crashing from time
to time without any error message in the HBase log. If I take a look at the
log at that time I find this:

2013-11-07 15:29:02,511 INFO org.apache.hadoop.hbase.regionserver.Store:
Starting compaction of 2 file(s) in 1 of P_SO,
http://xmlns.com/foaf/0.1/homepage,1383188177383.59d0259c87c07dc666a5600ba4d6c916.
i$
2013-11-07 15:29:10,471 INFO
org.apache.hadoop.hbase.regionserver.StoreFile: Delete Family Bloom filter
type for hdfs://
pc08.pool.ifis.uni-luebeck.de:8020/hbase/P_SO/59d0259c87c07dc666a5600ba4d6c916/.tmp/f$
2013-11-07 15:31:05,944 INFO org.apache.hadoop.hbase.util.VersionInfo:
HBase 0.94.6-cdh4.4.0
 restart

At this time 2 of the 7 RS crashed; both had this compaction message before
they crashed. I don't know exactly what compaction is, but it seems that
this compaction has something to do with the crash. What can I do to avoid this
restart/crash?

best regards

Re: RegionServer crash without any errors (compaction?)

2013-11-07 Thread Dhaval Shah
Operation too slow is generally in the .log file, while the GC logs (if you
enabled GC logging) are in the .out file. You have a very small heap for a 1 GB
HFile size. You are probably running your region server out of memory. Try
increasing the heap size and see if that helps.
 
Regards,
Dhaval



 From: John johnnyenglish...@gmail.com
To: user@hbase.apache.org; Dhaval Shah prince_mithi...@yahoo.co.in 
Sent: Thursday, 7 November 2013 11:09 AM
Subject: Re: RegionServer crash without any errors (compaction?)
 


There are really no other logs before. There is an operationTooSlow message
earlier, but that log entry is ~50 mins before the other: http://pastebin.com/EAAubqGB




2013/11/7 John johnnyenglish...@gmail.com

Hi,

Thanks for your fast answer. If I take a look at Cloudera Manager, the % of time
spent in GC increases at this time, so I think you are
right. The max heap size is 1 GB for this node. The hbase.hregion.max.filesize
is also 1 GB.

regards




2013/11/7 Dhaval Shah prince_mithi...@yahoo.co.in

Did you look at your GC logs? Probably the compaction process is running your 
region server out of memory. Can you provide more details on your setup? Max 
heap size? Max Region HFile size?
 
Regards,
Dhaval



 From: John johnnyenglish...@gmail.com
To: user@hbase.apache.org
Sent: Thursday, 7 November 2013 10:51 AM
Subject: RegionServer crash without any errors (compaction?)



Hi,

I have a cluster with 7 regionserver. Some of them are crashing from time
to time wihtout any error message in the hbase log. If I take a look at the
log at the time I found this:

2013-11-07 15:29:02,511 INFO org.apache.hadoop.hbase.regionserver.Store:
Starting compaction of 2 file(s) in 1 of P_SO,
http://xmlns.com/foaf/0.1/homepage,1383188177383.59d0259c87c07dc666a5600ba4d6c916.
i$
2013-11-07 15:29:10,471 INFO
org.apache.hadoop.hbase.regionserver.StoreFile: Delete Family Bloom filter
type for hdfs://
pc08.pool.ifis.uni-luebeck.de:8020/hbase/P_SO/59d0259c87c07dc666a5600ba4d6c916/.tmp/f$
2013-11-07 15:31:05,944 INFO org.apache.hadoop.hbase.util.VersionInfo:
HBase 0.94.6-cdh4.4.0
 restart

At this time 2 of the 7 RS crashed, both has this compaction message before
they crashed. I don't know exactly what compaction is, but it seems that
this compaction has to do with the crash. What can I do to avoid this
restart/crash?

best regards


Re: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.util.Bytes

2013-11-04 Thread Dhaval Shah
You need to add the Hadoop and HBase libraries to the Hadoop classpath. You
successfully added them to the classpath of your main project, but when it submits
the job to Hadoop, that classpath is lost. The easiest way is to modify
hadoop-env.sh. Another way would be to submit the jars for HBase and Hadoop
with your job submission.
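A small sketch of the job-submission route (the driver class name is a placeholder):

Job job = new Job(conf, "my hbase job");
job.setJarByClass(MyJobDriver.class);
// Ships the HBase (and other dependency) jars referenced by the job's classes
// along with the job, so task JVMs can resolve org.apache.hadoop.hbase.util.Bytes
// without touching hadoop-env.sh on every node.
TableMapReduceUtil.addDependencyJars(job);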
 
Regards,
Dhaval



 From: fateme Abiri fateme.ab...@yahoo.com
To: user@hbase.apache.org user@hbase.apache.org 
Sent: Monday, 4 November 2013 11:32 AM
Subject: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.util.Bytes
 

Hi all,
I'm running a MapReduce job in my HBase project.
My Hadoop and HBase are remote and I run my code with this command in my terminal:
$ java  -cp  myproject.jar:/user/HadoopAndHBaseLibrary/*  mainproject

but i get this error:
attempt_201207261322_0002_m_00_0, Status : FAILED
Error: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.util.Bytes

My other project, which doesn't use import org.apache.hadoop.hbase.util.Bytes,
has run successfully!!!
But when I
use this class in my MapReduce job I get this error...

What could I do? Can anyone help me?
My HBase version is 0.94.11 and its jar files and Hadoop's are completely
in HadoopAndHBaseLibrary.

--tanx 

Re: RE: Add Columnsize Filter for Scan Operation

2013-10-26 Thread Dhaval Shah
Mapper.cleanup is always called after all map calls are over

Sent from Yahoo Mail on Android



Re: RE: Add Columnsize Filter for Scan Operation

2013-10-25 Thread Dhaval Shah
John, an important point to note here is that even though rows will get split 
over multiple calls to scanner.next(), all batches of 1 row will always reach 1 
mapper. Another important point to note is that these batches will appear in 
consecutive calls to mapper.map()

What this means is that you don't need to send your data to the reducer (and be 
more efficient by not writing to disk, no shuffle/sort phases and so on). You 
can just keep the state in memory for a particular row being processed 
(effectively a running count on the number of columns) and make the final 
decision when the row ends (effectively you encounter a different row or all 
rows are exhausted and you reach the cleanup function).

The way I would do it is a map-only MR job which keeps the state in memory as
described above and uses the KeyOnlyFilter to reduce the amount of data flowing
to the mapper.
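A rough sketch of such a mapper (imports omitted; the 25000 threshold comes from the original question, everything else is illustrative):

public class ColumnCountMapper extends TableMapper<ImmutableBytesWritable, LongWritable> {
  private byte[] currentRow = null;
  private long count = 0;

  @Override
  protected void map(ImmutableBytesWritable key, Result value, Context context)
      throws IOException, InterruptedException {
    // Batches of the same row arrive in consecutive map() calls, so a change of
    // row key means the previous row is complete and can be decided on.
    byte[] rowKey = Arrays.copyOfRange(key.get(), key.getOffset(), key.getOffset() + key.getLength());
    if (currentRow != null && !Bytes.equals(currentRow, rowKey)) {
      emitIfAboveThreshold(context);
      count = 0;
    }
    currentRow = rowKey;
    count += value.size();   // number of KVs (columns) in this batch
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    emitIfAboveThreshold(context);   // finish the running count for the last row
  }

  private void emitIfAboveThreshold(Context context) throws IOException, InterruptedException {
    if (currentRow != null && count > 25000) {
      context.write(new ImmutableBytesWritable(currentRow), new LongWritable(count));
    }
  }
}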
 
Regards,
Dhaval



 From: John johnnyenglish...@gmail.com
To: user@hbase.apache.org; lars hofhansl la...@apache.org 
Sent: Friday, 25 October 2013 8:02 AM
Subject: Re: RE: Add Columnsize Filter for Scan Operation
 

One thing I could do is to drop every batch-row where the column size is
smaller than the batch size. Something like: if (rowsize < batchsize-1) drop
row. The problem with this version is that the last batch of a big row is
also dropped. Here is a little example:
There is one row:
row1: 3500 columns

If I set the batch to 1000, the mapper function gets, for the first row:

1. Iteration: map function got 1000 columns - write to disk for the reducer
2. Iteration: map function got 1000 columns - write to disk for the reducer
3. Iteration: map function got 1000 columns - write to disk for the reducer
4. Iteration: map function got 500 columns - drop, because it's smaller
than the batch size

Is there a way to count the columns across different map calls?

regards



2013/10/25 John johnnyenglish...@gmail.com

 I tried to build an MR job, but in my case that doesn't work, because if I
 set, for example, the batch to 1000 there can be 5000 columns in a row. I
 want to generate something for rows where the column size is bigger
 than 2500, BUT since the map function is executed for every batch-row I
 can't tell whether the row has a size bigger than 2500.

 any ideas?


 2013/10/25 lars hofhansl la...@apache.org

 We need to finish up HBASE-8369



 
  From: Dhaval Shah prince_mithi...@yahoo.co.in
 To: user@hbase.apache.org user@hbase.apache.org
 Sent: Thursday, October 24, 2013 4:38 PM
 Subject: Re: RE: Add Columnsize Filter for Scan Operation


 Well that depends on your use case ;)

 There are many nuances/code complexities to keep in mind:
 - merging results of various HFiles (each region can have.more than one)
 - merging results of WAL
 - applying delete markers
 - how about data which is only in memory of region servers and no where
 else
 - applying bloom filters for efficiency
 - what about hbase filters?

 At some point you would basically start rewriting an hbase region server
 on you map reduce job which is not ideal for maintainability.

 Do we ever read MySQL data files directly or issue a SQL query? Kind of
 goes back to the same argument ;)

 Sent from Yahoo Mail on Android




Re: RE: Add Columnsize Filter for Scan Operation

2013-10-25 Thread Dhaval Shah
Cool

Sent from Yahoo Mail on Android



Re: Add Columnsize Filter for Scan Operation

2013-10-24 Thread Dhaval Shah
Jean, if we don't add setBatch to the scan, the MR job does cause HBase to crash
due to an OOME. We have run into this in the past as well. Basically the problem
is - say I have a region server with 12GB of RAM and a row of size 20GB (an
extreme example; in practice, HBase runs out of memory way before 20GB). If I
query the entire row, HBase does not have enough memory to hold/process it for
the response.

In practice, if your setCaching > 1, then the aggregate of all rows growing too
big can also cause the same issue.
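For reference, a tiny sketch of bounding both knobs (the numbers are just examples to tune):

Scan scan = new Scan();
scan.setBatch(1000);   // at most 1000 KVs of a row per Result, so one giant row
                       // never has to be materialized in one piece
scan.setCaching(10);   // and only a few Results per RPC, bounding memory further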

I think 1 way we can solve this issue is making the HBase server serve 
responses in a streaming fashion somehow (not exactly sure about the details on 
how this can work, but if it has to hold the entire row in memory, it's going to
be bound by the HBase heap size)
 
Regards,
Dhaval



 From: Jean-Marc Spaggiari jean-m...@spaggiari.org
To: user user@hbase.apache.org 
Sent: Thursday, 24 October 2013 12:37 PM
Subject: Re: Add Columnsize Filter for Scan Operation
 

If the MR job crashes because of the number of columns, then we have an issue
that we need to fix ;) Please open a JIRA and provide details if you are facing
that.

Thanks,

JM



2013/10/24 John johnnyenglish...@gmail.com

 @Jean-Marc: Sure, I can do that, but that's a little bit complicated because
 the rows sometimes have millions of columns and I have to handle them
 in different batches because otherwise HBase crashes. Maybe I will try it
 later, but first I want to try the API version. It works okay so far, but I
 want to improve it a little bit.

 @Ted: I tried to modify it, but I have no idea how exactly to do this. I have to
 count the number of columns in that filter (that works obviously with the
 count field). But there is no method that is called after iterating over
 all elements, so I cannot return the Drop ReturnCode in the filterKeyValue
 method because I don't know when it was the last one. Any ideas?

 regards


 2013/10/24 Ted Yu yuzhih...@gmail.com

  Please take a look
  at
 src/main/java/org/apache/hadoop/hbase/filter/ColumnCountGetFilter.java :
 
   * Simple filter that returns first N columns on row only.
 
  You can modify the filter to suit your needs.
 
  Cheers
 
 
  On Thu, Oct 24, 2013 at 7:52 AM, John johnnyenglish...@gmail.com
 wrote:
 
   Hi,
  
    I'm currently writing an HBase Java program which iterates over every row
   in
    a table. I have to modify some rows if the column size (the number of
    columns in this row) is bigger than 25000.
  
   Here is my sourcode: http://pastebin.com/njqG6ry6
  
   Is there any way to add a Filter to the scan Operation and load only
 rows
   where the size is bigger than 25k?
  
    Currently I check the size at the client, but for that I have to load
    every row to the client side. It would be better if the wrong rows were
   already
    filtered at the server side.
  
   thanks
  
   John
  
 


Re: Add Columnsize Filter for Scan Operation

2013-10-24 Thread Dhaval Shah
Interesting!! Can't wait to see this in action. I am already imagining huge 
performance gains
 
Regards,
Dhaval



 From: Ted Yu yuzhih...@gmail.com
To: user@hbase.apache.org user@hbase.apache.org; Dhaval Shah 
prince_mithi...@yahoo.co.in 
Sent: Thursday, 24 October 2013 1:06 PM
Subject: Re: Add Columnsize Filter for Scan Operation
 

For streaming responses, there is this JIRA:

HBASE-8691 High-Throughput Streaming Scan API



On Thu, Oct 24, 2013 at 9:53 AM, Dhaval Shah prince_mithi...@yahoo.co.inwrote:

 Jean, if we don't add setBatch to the scan, MR job does cause HBase to
 crash due to OOME. We have run into this in the past as well. Basically the
 problem is - Say I have a region server with 12GB of RAM and a row of size
 20GB (an extreme example, in practice, HBase runs out of memory way before
 20GB). If I query the entire row, HBase does not have enough memory to
 hold/process it for the response.

 In practice, if your setCaching  1, then the aggregate of all rows
 growing too big can also cause the same issue.

 I think 1 way we can solve this issue is making the HBase server serve
 responses in a streaming fashion somehow (not exactly sure about the
 details on how this can work but if it has to hold the entire row in
 memory, its going to be bound by HBase heap size)

 Regards,
 Dhaval


 
  From: Jean-Marc Spaggiari jean-m...@spaggiari.org
 To: user user@hbase.apache.org
 Sent: Thursday, 24 October 2013 12:37 PM
 Subject: Re: Add Columnsize Filter for Scan Operation


 If the MR crash because of the number of columns, then we have an issue
 that we need to fix ;) Please open a JIRA provide details if you are facing
 that.

 Thanks,

 JM



 2013/10/24 John johnnyenglish...@gmail.com

  @Jean-Marc: Sure, I can do that, but thats a little bit complicated
 because
  the the rows has sometimes Millions of Columns and I have to handle them
  into different batches because otherwise hbase crashs. Maybe I will try
 it
  later, but first I want to try the API version. It works okay so far,
 but I
  want to improve it a little bit.
 
  @Ted: I try to modify it, but I have no idea how exactly do this. I've to
  count the number of columns in that filter (that works obviously with the
  count field). But there is no Method that is caleld after iterating over
  all elements, so I can not return the Drop ReturnCode in the
 filterKeyValue
  Method because I did'nt know when it was the last one. Any ideas?
 
  regards
 
 
  2013/10/24 Ted Yu yuzhih...@gmail.com
 
   Please take a look
   at
  src/main/java/org/apache/hadoop/hbase/filter/ColumnCountGetFilter.java :
  
    * Simple filter that returns first N columns on row only.
  
   You can modify the filter to suit your needs.
  
   Cheers
  
  
   On Thu, Oct 24, 2013 at 7:52 AM, John johnnyenglish...@gmail.com
  wrote:
  
Hi,
   
I'm write currently a HBase Java programm which iterates over every
 row
   in
a table. I have to modiy some rows if the column size (the amount of
columns in this row) is bigger than 25000.
   
Here is my sourcode: http://pastebin.com/njqG6ry6
   
Is there any way to add a Filter to the scan Operation and load only
  rows
where the size is bigger than 25k?
   
Currently I check the size at the client, but therefore I have to
 load
every row to the client site. It would be better if the wrong rows
   already
filtered at the server site.
   
thanks
   
John
   
  
 


Re: RE: Add Columnsize Filter for Scan Operation

2013-10-24 Thread Dhaval Shah
Well that depends on your use case ;)

There are many nuances/code complexities to keep in mind:
- merging results of various HFiles (each region can have more than one)
- merging results of the WAL
- applying delete markers
- how about data which is only in memory of region servers and nowhere else
- applying bloom filters for efficiency
- what about HBase filters?

At some point you would basically start rewriting an HBase region server in your
map reduce job, which is not ideal for maintainability.

Do we ever read MySQL data files directly or issue a SQL query? Kind of goes 
back to the same argument ;)

Sent from Yahoo Mail on Android



Re: How can I export HBase table using start and stop row key

2013-10-22 Thread Dhaval Shah
Hi Karunakar. Unfortunately, due to organizational restrictions I am not allowed
to share my code. However, it's a very simple modification.

Basically, look at Export.java within the HBase mapreduce package. Look for the
function getConfiguredScanForJob (might be named differently based on your
version) and add the required HBase Filter to your scan, or you can also add a
start row/stop row to your scan. It should not be more than 3 lines of code to do
what you need.
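For illustration, the kind of change meant here (the config keys are hypothetical; conf is the job Configuration and s the Scan already built inside Export):

// Inside Export's scan construction (getConfiguredScanForJob or its equivalent),
// read optional boundaries from the job configuration and narrow the scan:
String startRow = conf.get("export.start.row");   // hypothetical config keys
String stopRow = conf.get("export.stop.row");
if (startRow != null) {
    s.setStartRow(Bytes.toBytes(startRow));
}
if (stopRow != null) {
    s.setStopRow(Bytes.toBytes(stopRow));
}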
 
Regards,
Dhaval



 From: karunakar lkarunaka...@gmail.com
To: user@hbase.apache.org 
Sent: Monday, 21 October 2013 7:36 PM
Subject: Re: How can I export HBase table using start and stop row key
 

Hi Dhaval,

Can you please share your code if possible? It would benefit others as
well.

Thanks,
karunakar.



--
View this message in context: 
http://apache-hbase.679495.n3.nabble.com/How-can-I-export-HBase-table-using-start-and-stop-row-key-tp4051961p4051972.html

Sent from the HBase User mailing list archive at Nabble.com.

Re: How can I export HBase table using start and stop row key

2013-10-21 Thread Dhaval Shah
The version you are using only supports PrefixFilter and RegexFilter for scans.
Unless your start and stop row have the same prefix (or you can somehow get it
into a regex), you won't be able to do it as is. You can always write your own
export (we did that to support some more functionality like batching, etc., and
it's very easy to do).
 
Regards,
Dhaval



 From: karunakar lkarunaka...@gmail.com
To: user@hbase.apache.org 
Sent: Monday, 21 October 2013 5:40 PM
Subject: How can I export HBase table using start and stop row key
 

Hi,

I would like to fetch data from an HBase table using the MapReduce Export API. I
see that I can fetch data using start and stop time, but I don't see any
information regarding start and stop row keys. Can any expert guide me or
give me an example in order to fetch the first 1000 rows (or a start and stop row
key) using the Export API?

Hadoop 2.0.0-cdh4.1.2 HBase 0.92.1-cdh4.1.2

Please let me know if you need more information.

Thank you.



--
View this message in context: 
http://apache-hbase.679495.n3.nabble.com/How-can-I-export-HBase-table-using-start-and-stop-row-key-tp4051960.html
Sent from the HBase User mailing list archive at Nabble.com.

Re: Multi-master info missing from book?

2013-09-25 Thread Dhaval Shah
Yes. Just start HMaster on 2 different servers and they will fight it out

Regards,
Dhaval



From: Otis Gospodnetic otis.gospodne...@gmail.com
To: user@hbase.apache.org 
Sent: Wednesday, 25 September 2013 1:53 PM
Subject: Re: Multi-master info missing from book?


Thanks Ted.  That's good, but shouldn't there be some info about how
to run multiple masters?  Is it as simple as starting hmaster on 2
different servers and have them fight it out?

Otis
--
HBASE Performance Monitoring
http://sematext.com/spm/hbase-performance-monitoring/



On Wed, Sep 25, 2013 at 1:11 PM, Ted Yu yuzhih...@gmail.com wrote:
 I found:

 2.5.1.2. If a backup Master, making primary Master fail fast

 Cheers


 On Wed, Sep 25, 2013 at 10:08 AM, Otis Gospodnetic 
 otis.gospodne...@gmail.com wrote:

 Hi,

 I was looking for info about running multiple HBase masters on
 http://hbase.apache.org/book.html and wasn't able to find any
 references to it.  I think I spotted one mention of active in the
 context of master, but nothing else.

 Either I'm not seeing it there, or I'm not looking at the right place,
 or the info about this is lacking?

 Thanks,
 Otis
 --
 HBASE Performance Monitoring
 http://sematext.com/spm/hbase-performance-monitoring/
 


Re: HBase Region Server crash if column size become to big

2013-09-11 Thread Dhaval Shah
John, can you check the .out file as well? We used to have a similar issue, and it
turned out that the query for such a large row ran the region server out of memory,
causing the crash; an OOME does not show up in the .log files but rather in the
.out files.

In such a situation, setBatch for scans or a ColumnPaginationFilter for gets can
help your case.
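For gets, a small sketch of that pagination idea (the row name and page size are made up; 'table' is an existing HTableInterface):

// Fetch the big row a page of 1000 columns at a time instead of all at once.
int pageSize = 1000;
int offset = 0;
Get get = new Get(Bytes.toBytes("bigRow"));
get.setFilter(new ColumnPaginationFilter(pageSize, offset));
Result page = table.get(get);
// loop, bumping offset by pageSize, until fewer than pageSize KVs come back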

Sent from Yahoo! Mail on Android



Re: HBase Region Server crash if column size become to big

2013-09-11 Thread Dhaval Shah
John, OOME is an out-of-memory error. Your log file structure is a bit different
from ours. We see the kind of messages you get in .log files and GC/JVM-related
logs in .out files, but everything is in /var/log/hbase.

Sent from Yahoo! Mail on Android



Re: HBase Region Server crash if column size become to big

2013-09-11 Thread Dhaval Shah
@Mike, rows can't span multiple regions, but that does not cause crashes. HBase simply
won't allow the region to split, and it will continue to function like a huge region. We
had a similar situation a while back (when we were on 256 MB region sizes) and it
worked (it just didn't split the region).

Sent from Yahoo! Mail on Android



Re: How is pig so much faster than my java MR job?

2013-09-02 Thread Dhaval Shah
Java MR code is not optimized/efficiently written while Pig is highly 
optimized? Can you give us more details on what exactly you are trying to do 
and how your Java MR code is written, how many MR jobs for Java vs Pig and so on

Sent from Yahoo! Mail on Android



Re: Lease Exception Errors When Running Heavy Map Reduce Job

2013-08-28 Thread Dhaval Shah
A couple of things:
- Can you check the resources on the region server for which you get the lease
exception? It seems like the server is heavily thrashed.
- What are your values for scan.setCaching and scan.setBatch?



The lease does not exist exception generally happens when the client goes back
to the region server after the lease expires (in your case 90). If your
setCaching is really high, for example, the client gets enough data in one call
to scanner.next and keeps processing it for > 90 ms, and when it eventually
goes back to the region server, the lease on the region server has already
expired. Setting your setCaching value lower might help in this case.
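As a rough rule of thumb (the numbers are only an example):

// If each row takes roughly T ms of client-side work and the lease is L ms,
// keep setCaching well under L / T so the client returns to the region server
// before the lease expires. E.g. at ~100 ms per row, a caching of 100 means
// only ~10 seconds between scanner.next() calls:
Scan scan = new Scan();
scan.setCaching(100);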

Regards,
Dhaval



From: Ameya Kanitkar am...@groupon.com
To: user@hbase.apache.org 
Sent: Wednesday, 28 August 2013 11:00 AM
Subject: Lease Exception Errors When Running Heavy Map Reduce Job


HI All,

We have a very heavy map reduce job that goes over an entire table with over
1 TB of data in HBase and exports all the data (similar to the Export job but with
some additional custom code built in) to HDFS.

However, this job is not very stable, and oftentimes we get the following error
and the job fails:

org.apache.hadoop.hbase.regionserver.LeaseException:
org.apache.hadoop.hbase.regionserver.LeaseException: lease
'-4456594242606811626' does not exist
    at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
    at 
org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2429)
    at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at 
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
    at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1400)

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.


Here are more detailed logs on the RS: http://pastebin.com/xaHF4ksb

We have changed following settings in HBase to counter this problem
but issue persists:

property
!-- Loaded from hbase-site.xml --
namehbase.regionserver.lease.period/name
value90/value
/property

property
!-- Loaded from hbase-site.xml --
namehbase.rpc.timeout/name
value90/value
/property


We also reduced number of mappers per RS less than available CPU's on the box.

We also observed that problem once happens, happens multiple times on
the same RS. All other regions are unaffected. But different RS
observes this problem on different days. There is no particular region
causing this either.

We are running: 0.94.2 with cdh4.2.0

Any ideas?


Ameya 


Re: Will hbase automatically distribute the data across region servers or NOT..??

2013-08-23 Thread Dhaval Shah
Vamshi, max value for hbase.hregion.max.filesize to 10MB seems too small. Did 
you mean 10GB?


Regards,
Dhaval



From: Vamshi Krishna vamshi2...@gmail.com
To: user@hbase.apache.org; zhoushuaifeng zhoushuaif...@gmail.com 
Sent: Friday, 23 August 2013 9:38 AM
Subject: Re: Will hbase automatically distribute the data across region servers 
or NOT..??


Thanks for the clarifications.
I am using hbase-0.94.10 and zookeepr-3.4.5
But I am running into different issues .
I set  hbase.hregion.max.filesize to 10Mb and i am inserting 10 million
rows in to hbase table. During the insertion after some time, suddenly
master is going down. I don't know what is the reason for such peculiar
behavior.
I found in master log below content and not able to make out what exactly
the mistake. Please somebody help.

master-log:

2013-08-23 18:56:36,865 FATAL org.apache.hadoop.hbase.master.HMaster:
Master server abort: loaded coprocessors are: []
2013-08-23 18:56:36,866 FATAL org.apache.hadoop.hbase.master.HMaster:
Unexpected state :
scores,\x00\x00\x00\x00\x00\x02\xC8t,1377264003140.a564f31795091b6513880c5db49ec90f.
state=PENDING_OPEN, ts=1377264396861, server=vamshi,60020,1377263789273 ..
Cannot transit it to OFFLINE.
java.lang.IllegalStateException: Unexpected state :
scores,\x00\x00\x00\x00\x00\x02\xC8t,1377264003140.a564f31795091b6513880c5db49ec90f.
state=PENDING_OPEN, ts=1377264396861, server=vamshi,60020,1377263789273 ..
Cannot transit it to OFFLINE.
    at
org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1879)
    at
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1688)
    at
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424)
    at
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399)
    at
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394)
    at
org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105)
    at
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)
2013-08-23 18:56:36,867 INFO org.apache.hadoop.hbase.master.HMaster:
Aborting
2013-08-23 18:56:36,867 DEBUG org.apache.hadoop.hbase.master.HMaster:
Stopping service threads
2013-08-23 18:56:36,867 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
server on 6
2013-08-23 18:56:36,867 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 0 on 6: exiting
2013-08-23 18:56:36,867 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 5 on 6: exiting
2013-08-23 18:56:36,867 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 3 on 6: exiting
2013-08-23 18:56:36,873 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
IPC Server listener on 6
2013-08-23 18:56:36,873 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC
Server handler 2 on 6: exiting
2013-08-23 18:56:36,873 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC
Server handler 1 on 6: exiting
2013-08-23 18:56:36,873 INFO org.apache.hadoop.hbase.master.HMaster$2:
vamshi,6,1377263788019-BalancerChore exiting
2013-08-23 18:56:36,873 INFO org.apache.hadoop.hbase.master.HMaster:
Stopping infoServer
2013-08-23 18:56:36,873 INFO
org.apache.hadoop.hbase.master.cleaner.HFileCleaner:
master-vamshi,6,1377263788019.archivedHFileCleaner exiting
2013-08-23 18:56:36,873 INFO org.apache.hadoop.hbase.master.CatalogJanitor:
vamshi,6,1377263788019-CatalogJanitor exiting
2013-08-23 18:56:36,873 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC
Server handler 0 on 6: exiting
2013-08-23 18:56:36,873 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 9 on 6: exiting
2013-08-23 18:56:36,874 INFO org.mortbay.log: Stopped
SelectChannelConnector@0.0.0.0:60010
2013-08-23 18:56:36,874 INFO
org.apache.hadoop.hbase.master.cleaner.LogCleaner:
master-vamshi,6,1377263788019.oldLogCleaner exiting
2013-08-23 18:56:36,874 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 1 on 6: exiting
2013-08-23 18:56:36,874 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 7 on 6: exiting
2013-08-23 18:56:36,874 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 6 on 6: exiting
2013-08-23 18:56:36,874 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 8 on 6: exiting
2013-08-23 18:56:36,874 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
IPC Server Responder
2013-08-23 18:56:36,876 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
IPC Server Responder
2013-08-23 18:56:36,874 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 2 on 6: exiting
2013-08-23 18:56:36,873 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 4 on 6: exiting
2013-08-23 

Re: Will hbase automatically distribute the data across region servers or NOT..??

2013-08-23 Thread Dhaval Shah
Ok. The balancer runs as a separate thread (there is a config to set how often 
the thread wakes up but can't remember off the top of my head). Maybe if you 
wait long enough, it will balance eventually. Another thing you can try is run 
the balancer from hbase shell and see what you get back. If you get back a 
true, it means it should balance. If you get back a false, look at hbase master 
logs to see whats happening. I once had a scenario where my Unix accounts were 
messed up (2 users - hbase and another user mapped to the same unix ID and HDFS 
thought the user did not have the permissions to write to the HBase files on 
HDFS) and balancer did not run due to this exception. 

Another thing is (I think!) balancer generally does not run when regions are 
splitting. So its possible in your case that your regions are splitting so 
often (due to 10MB limit) that the balancer cannot be run since your regions 
are not stationary


Regards,
Dhaval



From: Vamshi Krishna vamshi2...@gmail.com
To: user@hbase.apache.org; Dhaval Shah prince_mithi...@yahoo.co.in 
Sent: Friday, 23 August 2013 10:21 AM
Subject: Re: Will hbase automatically distribute the data across region servers 
or NOT..??


No that is 10MB itself. Just to observe the region splitting with respect
to the amount of data i am inserting in to hbase.
So, here i am inserting 40-50mb data and fixing that property to 10mb and
checking the region splitting happening.
But the intersting thing is regions got split BUT they are not being
distributed across other servers.
Whatever regions formed from the created tables on machine-1, all of them
are residing on the same machine-1 not being moved to other machine.




On Fri, Aug 23, 2013 at 7:40 PM, Dhaval Shah prince_mithi...@yahoo.co.inwrote:

 Vamshi, max value for hbase.hregion.max.filesize to 10MB seems too small.
 Did you mean 10GB?


 Regards,
 Dhaval


 
 From: Vamshi Krishna vamshi2...@gmail.com
 To: user@hbase.apache.org; zhoushuaifeng zhoushuaif...@gmail.com
 Sent: Friday, 23 August 2013 9:38 AM
 Subject: Re: Will hbase automatically distribute the data across region
 servers or NOT..??


 Thanks for the clarifications.
 I am using hbase-0.94.10 and zookeepr-3.4.5
 But I am running into different issues .
 I set  hbase.hregion.max.filesize to 10Mb and i am inserting 10 million
 rows in to hbase table. During the insertion after some time, suddenly
 master is going down. I don't know what is the reason for such peculiar
 behavior.
 I found in master log below content and not able to make out what exactly
 the mistake. Please somebody help.

 master-log:

 2013-08-23 18:56:36,865 FATAL org.apache.hadoop.hbase.master.HMaster:
 Master server abort: loaded coprocessors are: []
 2013-08-23 18:56:36,866 FATAL org.apache.hadoop.hbase.master.HMaster:
 Unexpected state :

 scores,\x00\x00\x00\x00\x00\x02\xC8t,1377264003140.a564f31795091b6513880c5db49ec90f.
 state=PENDING_OPEN, ts=1377264396861, server=vamshi,60020,1377263789273 ..
 Cannot transit it to OFFLINE.
 java.lang.IllegalStateException: Unexpected state :

 scores,\x00\x00\x00\x00\x00\x02\xC8t,1377264003140.a564f31795091b6513880c5db49ec90f.
 state=PENDING_OPEN, ts=1377264396861, server=vamshi,60020,1377263789273 ..
 Cannot transit it to OFFLINE.
     at

 org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1879)
     at

 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1688)
     at

 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424)
     at

 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399)
     at

 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394)
     at

 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105)
     at
 org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
     at

 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
     at

 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
     at java.lang.Thread.run(Thread.java:662)
 2013-08-23 18:56:36,867 INFO org.apache.hadoop.hbase.master.HMaster:
 Aborting
 2013-08-23 18:56:36,867 DEBUG org.apache.hadoop.hbase.master.HMaster:
 Stopping service threads
 2013-08-23 18:56:36,867 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
 server on 6
 2013-08-23 18:56:36,867 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
 handler 0 on 6: exiting
 2013-08-23 18:56:36,867 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
 handler 5 on 6: exiting
 2013-08-23 18:56:36,867 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
 handler 3 on 6: exiting
 2013-08-23 18:56:36,873 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
 IPC Server listener on 6
 2013-08-23 18:56:36,873 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC

Re: about rowkey prefix search

2013-08-15 Thread Dhaval Shah
Did you try setting start and e-d rows on your scan?

Sent from Yahoo! Mail on Android



Re: Memory distribution for Hadoop/Hbase processes

2013-08-07 Thread Dhaval Shah
You are way underpowered. I don't think you are going to get reasonable 
performance out of this hardware with so many processes running on it 
(specially memory heavy processes like HBase), obviously severity depends on 
your use case

I would say you can decrease memory allocation to namenode/datanodes/secondary 
namenode/hbase master/zookeeper and increase allocation to region servers
 
Regards,
Dhaval



 From: Vimal Jain vkj...@gmail.com
To: user@hbase.apache.org 
Sent: Wednesday, 7 August 2013 12:47 PM
Subject: Re: Memory distribution for Hadoop/Hbase processes
 

Hi Ted,
I am using centOS.
I could not get output of ps aux | grep pid as currently the hbase/hadoop
is down in production due to some internal reasons.

Can you please help me in figuring out memory distribution for my single
node cluster ( pseudo-distributed mode)  ?
Currently its just 4GB  RAM .Also i can try and  make it up to 6 GB.
So i have come up with following distribution :-

Name node - 512 MB
Data node - 1024MB
Secondary Name node - 512 MB

HMaster - 512 MB
HRegion - 2048 MB
Zookeeper - 512 MB

So total memory allocation is 5 GB and i still have 1 GB left for OS.

1) So is it fine  to go ahead with this configuration in production ? ( I
am asking this because i had long GC pause  problems in past when i did
not change JVM memory allocation configuration in hbase-env.sh and
hadoop-env.sh so it was taking default values . i.e. 1 GB for each of the 6
process so total allocation was 6 GB and i had only 4 GB of RAM. After this
i just assigned 1.5 GB to HRegion and 512 MB each to HMaster and Zookeeper
. I forgot to change it for Hadoop processes.Also i changed kernel
parameter vm.swappiness to 0. After this , it was working fine).

2) Currently i am running pseudo-distributed mode as my data size is at max
10-15GB at present.How easy it is to migrate from pseudo-distributed mode
to Fully distributed mode in future if my data size increases ? ( which
will be the case for sure ) .

Thanks for your help . Really appreciate it .




On Sun, Aug 4, 2013 at 8:12 PM, Kevin O'dell kevin.od...@cloudera.comwrote:

 My questions are :
 1) How this thing is working ? It is working because java can over allocate
 memory. You will know you are using too much memory when the kernel starts
 killing processes.
 2) I just have one table whose size at present is about 10-15 GB , so what
 should be ideal memory distribution ? Really you should get a box with more
 memory. You can currently only hold about ~400 MB in memory.
 On Aug 4, 2013 9:58 AM, Ted Yu yuzhih...@gmail.com wrote:

  What OS are you using ?
 
  What is the output from the following command ?
   ps aux | grep pid
  where pid is the process Id for Namenode, Datanode, etc.
 
  Cheers
 
  On Sun, Aug 4, 2013 at 6:33 AM, Vimal Jain vkj...@gmail.com wrote:
 
   Hi,
   I have configured Hbase in pseudo distributed mode with HDFS as
  underlying
   storage.I am not using map reduce framework as of now
   I have 4GB RAM.
   Currently i have following distribution of memory
  
   Data Node,Name Node,Secondary Name Node each :1000MB(default
   HADOOP_HEAPSIZE
   property)
  
   Hmaster - 512 MB
   HRegion - 1536 MB
   Zookeeper - 512 MB
  
   So total heap allocation becomes - 5.5 GB which is absurd as my total
 RAM
   is only 4 GB , but still the setup is working fine on production. :-0
  
   My questions are :
   1) How this thing is working ?
   2) I just have one table whose size at present is about 10-15 GB , so
  what
   should be ideal memory distribution ?
   --
   Thanks and Regards,
   Vimal Jain
  
 




-- 
Thanks and Regards,
Vimal Jain

NoRouteToHostException when zookeeper crashes

2013-08-06 Thread Dhaval Shah
I have a weird (and a pretty serious) issue on my HBase cluster. Whenever one 
of my zookeeper server goes down, already running services work fine for a few 
hours but when I try to restart any service (be it region servers or clients), 
they fail with a NoRouteToHostException while trying to connect to zookeeper 
and I cannot restart any service successfully at all. I do realize that No 
Route to host is coming from my network infrastructure (ping gives the same 
error) but why would 1 zookeeper server going down bring down the entire HBase 
cluster. Why doesn't HBase ride over the exception and try some other zookeeper 
server? 

Is this an issue other people face or its just me? We are running these on DHCP 
(but the IPs don't change because we have long leases). Do you guys think its a 
DHCP specific issue? Do you have pointers to avoid this issue with DHCP or do I 
have to move to static IPs?
 
Regards,
Dhaval


Re: NoRouteToHostException when zookeeper crashes

2013-08-06 Thread Dhaval Shah
HBase - 0.92.1
Zookeeper - 3.4.3
 
Regards,
Dhaval


- Original Message -
From: Ted Yu yuzhih...@gmail.com
To: user@hbase.apache.org; Dhaval Shah prince_mithi...@yahoo.co.in
Cc: 
Sent: Tuesday, 6 August 2013 11:08 AM
Subject: Re: NoRouteToHostException when zookeeper crashes

What HBase / zookeeper versions are you using ?

On Tue, Aug 6, 2013 at 7:48 AM, Dhaval Shah prince_mithi...@yahoo.co.inwrote:

 I have a weird (and a pretty serious) issue on my HBase cluster. Whenever
 one of my zookeeper server goes down, already running services work fine
 for a few hours but when I try to restart any service (be it region servers
 or clients), they fail with a NoRouteToHostException while trying to
 connect to zookeeper and I cannot restart any service successfully at all.
 I do realize that No Route to host is coming from my network infrastructure
 (ping gives the same error) but why would 1 zookeeper server going down
 bring down the entire HBase cluster. Why doesn't HBase ride over the
 exception and try some other zookeeper server?

 Is this an issue other people face or its just me? We are running these on
 DHCP (but the IPs don't change because we have long leases). Do you guys
 think its a DHCP specific issue? Do you have pointers to avoid this issue
 with DHCP or do I have to move to static IPs?

 Regards,
 Dhaval




Re: NoRouteToHostException when zookeeper crashes

2013-08-06 Thread Dhaval Shah
Thanks Stack. Do you have any specific pointers as to what configs would help 
mitigate this issue with a DHCP setup (I am not a networking expert, other 
teams manage the network and if I have specific pointers that would help guide 
the discussion)
 
Regards,
Dhaval



 From: Stack st...@duboce.net
To: Hbase-User user@hbase.apache.org; Dhaval Shah 
prince_mithi...@yahoo.co.in 
Sent: Tuesday, 6 August 2013 1:29 PM
Subject: Re: NoRouteToHostException when zookeeper crashes
 

On Tue, Aug 6, 2013 at 7:48 AM, Dhaval Shah prince_mithi...@yahoo.co.inwrote:

 I have a weird (and a pretty serious) issue on my HBase cluster. Whenever
 one of my zookeeper server goes down, already running services work fine
 for a few hours but when I try to restart any service (be it region servers
 or clients), they fail with a NoRouteToHostException while trying to
 connect to zookeeper and I cannot restart any service successfully at all.
 I do realize that No Route to host is coming from my network infrastructure
 (ping gives the same error) but why would 1 zookeeper server going down
 bring down the entire HBase cluster. Why doesn't HBase ride over the
 exception and try some other zookeeper server?

 Is this an issue other people face or its just me? We are running these on
 DHCP (but the IPs don't change because we have long leases). Do you guys
 think its a DHCP specific issue? Do you have pointers to avoid this issue
 with DHCP or do I have to move to static IPs?




All bets are off in the face of NoRouteToHost.  Please fixup your
networking (My guess is first lookup works and gets cached.  On restart, we
run into your network issue).

St.Ack

Re: help on key design

2013-07-31 Thread Dhaval Shah
Looking at https://issues.apache.org/jira/browse/HBASE-6136 it seems like the 
500 Gets are executed sequentially on the region server. 

Also 3k requests per minute = 50 requests per second. Assuming your requests 
take 1 sec (which seems really long but who knows) then you need atleast 50 
threads/region server handlers to handle these. Defaults for that number on 
some older versions of hbase is 10 which means you are running out of threads. 
Which brings up the following questions - 
What version of HBase are you running?
How many region server handlers do you have?
 
Regards,
Dhaval


- Original Message -
From: Demian Berjman dberj...@despegar.com
To: user@hbase.apache.org
Cc: 
Sent: Wednesday, 31 July 2013 11:12 AM
Subject: Re: help on key design

Thanks for the responses!

  why don't you use a scan
I'll try that and compare it.

 How much memory do you have for your region servers? Have you enabled
 block caching? Is your CPU spiking on your region servers?
Block caching is enabled. Cpu and memory dont seem to be a problem.

We think we are saturating a region because the quantity of keys requested.
In that case my question will be if asking 500+ keys per request is a
normal scenario?

Cheers,


On Wed, Jul 31, 2013 at 11:24 AM, Pablo Medina pablomedin...@gmail.comwrote:

 The scan can be an option if the cost of scanning undesired cells and
 discarding them trough filters is better than accessing those keys
 individually. I would say that as the number of 'undesired' cells decreases
 the scan overall performance/efficiency gets increased. It all depends on
 how the keys are designed to be grouped together.

 2013/7/30 Ted Yu yuzhih...@gmail.com

  Please also go over http://hbase.apache.org/book.html#perf.reading
 
  Cheers
 
  On Tue, Jul 30, 2013 at 3:40 PM, Dhaval Shah 
 prince_mithi...@yahoo.co.in
  wrote:
 
   If all your keys are grouped together, why don't you use a scan with
   start/end key specified? A sequential scan can theoretically be faster
  than
   MultiGet lookups (assuming your grouping is tight, you can also use
  filters
   with the scan to give better performance)
  
   How much memory do you have for your region servers? Have you enabled
   block caching? Is your CPU spiking on your region servers?
  
   If you are saturating the resources on your *hot* region server then
 yes
   having more region servers will help. If no, then something else is the
   bottleneck and you probably need to dig further
  
  
  
  
   Regards,
   Dhaval
  
  
   
   From: Demian Berjman dberj...@despegar.com
   To: user@hbase.apache.org
   Sent: Tuesday, 30 July 2013 4:37 PM
   Subject: help on key design
  
  
   Hi,
  
   I would like to explain our use case of HBase, the row key design and
 the
   problems we are having so anyone can give us a help:
  
   The first thing we noticed is that our data set is too small compared
 to
   other cases we read in the list and forums. We have a table containing
 20
   million keys splitted automatically by HBase in 4 regions and balanced
  in 3
   region servers. We have designed our key to keep together the set of
 keys
   requested by our app. That is, when we request a set of keys we expect
  them
   to be grouped together to improve data locality and block cache
  efficiency.
  
   The second thing we noticed, compared to other cases, is that we
  retrieve a
   bunch keys per request (500 aprox). Thus, during our peaks (3k requests
  per
   minute), we have a lot of requests going to a particular region servers
  and
   asking a lot of keys. That results in poor response times (in the order
  of
   seconds). Currently we are using multi gets.
  
   We think an improvement would be to spread the keys (introducing a
   randomized component on it) in more region servers, so each rs will
 have
  to
   handle less keys and probably less requests. Doing that way the multi
  gets
   will be spread over the region servers.
  
   Our questions:
  
   1. Is it correct this design of asking so many keys on each request?
 (if
   you need high performance)
   2. What about splitting in more region servers? It's a good idea? How
 we
   could accomplish this? We thought in apply some hashing...
  
   Thanks in advance!
  
 




Re: help on key design

2013-07-31 Thread Dhaval Shah
Yup that issue definitely seems relevant. Unfortunately you might have to wait 
till you can upgrade or patch your version. In the time being depending on how 
well your rows are grouped (and if you are using Bloomfilters) the scan might 
give you a short term solution
 
Regards,
Dhaval


- Original Message -
From: Demian Berjman dberj...@despegar.com
To: user@hbase.apache.org; Dhaval Shah prince_mithi...@yahoo.co.in
Cc: 
Sent: Wednesday, 31 July 2013 2:41 PM
Subject: Re: help on key design

Dhaval,

 What version of HBase are you running?
0.94.7

 How many region server handlers do you have?
100

We are following this issue:
https://issues.apache.org/jira/browse/HBASE-9087

Ted, we think too that splitting may incur in a better performance. But
like you said, it must be done manually.

Thanks!


On Wed, Jul 31, 2013 at 2:14 PM, Dhaval Shah prince_mithi...@yahoo.co.inwrote:

 Looking at https://issues.apache.org/jira/browse/HBASE-6136 it seems like
 the 500 Gets are executed sequentially on the region server.

 Also 3k requests per minute = 50 requests per second. Assuming your
 requests take 1 sec (which seems really long but who knows) then you need
 atleast 50 threads/region server handlers to handle these. Defaults for
 that number on some older versions of hbase is 10 which means you are
 running out of threads. Which brings up the following questions -
 What version of HBase are you running?
 How many region server handlers do you have?

 Regards,
 Dhaval


 - Original Message -
 From: Demian Berjman dberj...@despegar.com
 To: user@hbase.apache.org
 Cc:
 Sent: Wednesday, 31 July 2013 11:12 AM
 Subject: Re: help on key design

 Thanks for the responses!

   why don't you use a scan
 I'll try that and compare it.

  How much memory do you have for your region servers? Have you enabled
  block caching? Is your CPU spiking on your region servers?
 Block caching is enabled. Cpu and memory dont seem to be a problem.

 We think we are saturating a region because the quantity of keys requested.
 In that case my question will be if asking 500+ keys per request is a
 normal scenario?

 Cheers,


 On Wed, Jul 31, 2013 at 11:24 AM, Pablo Medina pablomedin...@gmail.com
 wrote:

  The scan can be an option if the cost of scanning undesired cells and
  discarding them trough filters is better than accessing those keys
  individually. I would say that as the number of 'undesired' cells
 decreases
  the scan overall performance/efficiency gets increased. It all depends on
  how the keys are designed to be grouped together.
 
  2013/7/30 Ted Yu yuzhih...@gmail.com
 
   Please also go over http://hbase.apache.org/book.html#perf.reading
  
   Cheers
  
   On Tue, Jul 30, 2013 at 3:40 PM, Dhaval Shah 
  prince_mithi...@yahoo.co.in
   wrote:
  
If all your keys are grouped together, why don't you use a scan with
start/end key specified? A sequential scan can theoretically be
 faster
   than
MultiGet lookups (assuming your grouping is tight, you can also use
   filters
with the scan to give better performance)
   
How much memory do you have for your region servers? Have you enabled
block caching? Is your CPU spiking on your region servers?
   
If you are saturating the resources on your *hot* region server then
  yes
having more region servers will help. If no, then something else is
 the
bottleneck and you probably need to dig further
   
   
   
   
Regards,
Dhaval
   
   

From: Demian Berjman dberj...@despegar.com
To: user@hbase.apache.org
Sent: Tuesday, 30 July 2013 4:37 PM
Subject: help on key design
   
   
Hi,
   
I would like to explain our use case of HBase, the row key design and
  the
problems we are having so anyone can give us a help:
   
The first thing we noticed is that our data set is too small compared
  to
other cases we read in the list and forums. We have a table
 containing
  20
million keys splitted automatically by HBase in 4 regions and
 balanced
   in 3
region servers. We have designed our key to keep together the set of
  keys
requested by our app. That is, when we request a set of keys we
 expect
   them
to be grouped together to improve data locality and block cache
   efficiency.
   
The second thing we noticed, compared to other cases, is that we
   retrieve a
bunch keys per request (500 aprox). Thus, during our peaks (3k
 requests
   per
minute), we have a lot of requests going to a particular region
 servers
   and
asking a lot of keys. That results in poor response times (in the
 order
   of
seconds). Currently we are using multi gets.
   
We think an improvement would be to spread the keys (introducing a
randomized component on it) in more region servers, so each rs will
  have
   to
handle less keys and probably less requests. Doing that way the multi
   gets
will be spread over the region servers

Re: help on key design

2013-07-30 Thread Dhaval Shah
If all your keys are grouped together, why don't you use a scan with start/end 
key specified? A sequential scan can theoretically be faster than MultiGet 
lookups (assuming your grouping is tight, you can also use filters with the 
scan to give better performance)

How much memory do you have for your region servers? Have you enabled block 
caching? Is your CPU spiking on your region servers?

If you are saturating the resources on your *hot* region server then yes having 
more region servers will help. If no, then something else is the bottleneck and 
you probably need to dig further




Regards,
Dhaval



From: Demian Berjman dberj...@despegar.com
To: user@hbase.apache.org 
Sent: Tuesday, 30 July 2013 4:37 PM
Subject: help on key design


Hi,

I would like to explain our use case of HBase, the row key design and the
problems we are having so anyone can give us a help:

The first thing we noticed is that our data set is too small compared to
other cases we read in the list and forums. We have a table containing 20
million keys splitted automatically by HBase in 4 regions and balanced in 3
region servers. We have designed our key to keep together the set of keys
requested by our app. That is, when we request a set of keys we expect them
to be grouped together to improve data locality and block cache efficiency.

The second thing we noticed, compared to other cases, is that we retrieve a
bunch keys per request (500 aprox). Thus, during our peaks (3k requests per
minute), we have a lot of requests going to a particular region servers and
asking a lot of keys. That results in poor response times (in the order of
seconds). Currently we are using multi gets.

We think an improvement would be to spread the keys (introducing a
randomized component on it) in more region servers, so each rs will have to
handle less keys and probably less requests. Doing that way the multi gets
will be spread over the region servers.

Our questions:

1. Is it correct this design of asking so many keys on each request? (if
you need high performance)
2. What about splitting in more region servers? It's a good idea? How we
could accomplish this? We thought in apply some hashing...

Thanks in advance! 


Re: Writing unit tests against HBase

2013-06-24 Thread Dhaval Shah
Why don't you spin up a mini cluster for your tests (there is a 
MiniHBaseCluster which brings up an in-memory cluster for testing and you can 
tear it down at the end of your test)? The benefit you get is that you no 
longer need to mock HBase responses and you will be talking to an actual 
cluster running similar code to the one you will have running in prod, so will 
be more reliable. Obviously the downside is that instead of mocking responses, 
you will have to populate data in HBase tables but I still feel this is more 
intuitive and reliable.

Regards,
Dhaval



From: Adam Phelps a...@opendns.com
To: user@hbase.apache.org 
Sent: Monday, 24 June 2013 5:14 PM
Subject: Re: Writing unit tests against HBase


On 6/18/13 4:22 PM, Stack wrote:
 On Tue, Jun 18, 2013 at 4:17 PM, Varun Sharma va...@pinterest.com wrote:
 
 Hi,

 If I wanted to write to write a unit test against HTable/HBase, is there an
 already available utility to that for unit testing my application logic.

 I don't want to write code that either touches production or requires me to
 mock an HTable. I am looking for a test htable object which behaves pretty
 close to a real HTable.

 
 
 Would this help if we included it?
 https://github.com/kijiproject/fake-hbase/

I figured I'd take a look as I was about to try using Mockito
(https://code.google.com/p/mockito/) to try to implement unit testing of
some of our code that accesses HBase.  The example tests in there are
all Scala, and I'm not having much success using them in Java.  Do you
know if there's any example Java tests that make use of fake-hbase?

- Adam 


Re: Writing unit tests against HBase

2013-06-24 Thread Dhaval Shah
Yup I hear ya. MiniHBaseCluster adds an extra minute to the tests which kind of 
sucks. It gives me peace of mind though
 
Regards,
Dhaval


- Original Message -
From: Adam Phelps a...@opendns.com
To: user@hbase.apache.org
Cc: 
Sent: Monday, 24 June 2013 6:00 PM
Subject: Re: Writing unit tests against HBase

What I'm currently looking for is a method of adding quick unit tests
(ie preferably run time of a few seconds) to test some algorithms that
read hbase data and perform some operations on it.  Mocking seems a much
better way to handle this, though I'm open to other suggestions.  I'll
try out MiniHBaseCluster anyway since I can't seem to get FakeHBase to work.

- Adam

On 6/24/13 2:39 PM, Dhaval Shah wrote:
 Why don't you spin up a mini cluster for your tests (there is a 
 MiniHBaseCluster which brings up an in-memory cluster for testing and you can 
 tear it down at the end of your test)? The benefit you get is that you no 
 longer need to mock HBase responses and you will be talking to an actual 
 cluster running similar code to the one you will have running in prod, so 
 will be more reliable. Obviously the downside is that instead of mocking 
 responses, you will have to populate data in HBase tables but I still feel 
 this is more intuitive and reliable.
 
 Regards,
 Dhaval
 
 
 
 From: Adam Phelps a...@opendns.com
 To: user@hbase.apache.org 
 Sent: Monday, 24 June 2013 5:14 PM
 Subject: Re: Writing unit tests against HBase
 
 
 On 6/18/13 4:22 PM, Stack wrote:
 On Tue, Jun 18, 2013 at 4:17 PM, Varun Sharma va...@pinterest.com wrote:

 Hi,

 If I wanted to write to write a unit test against HTable/HBase, is there an
 already available utility to that for unit testing my application logic.

 I don't want to write code that either touches production or requires me to
 mock an HTable. I am looking for a test htable object which behaves pretty
 close to a real HTable.



 Would this help if we included it?
 https://github.com/kijiproject/fake-hbase/
 
 I figured I'd take a look as I was about to try using Mockito
 (https://code.google.com/p/mockito/) to try to implement unit testing of
 some of our code that accesses HBase.  The example tests in there are
 all Scala, and I'm not having much success using them in Java.  Do you
 know if there's any example Java tests that make use of fake-hbase?
 
 - Adam 



Re: Is there a way to view multiple versions of a cell in the HBase shell?

2013-03-07 Thread Dhaval Shah
I think you can. Try specifying the following VERSIONS = 4|

Its also documented in the HBase shell documentation for Get (and I am assuming 
the same would apply for scans)|

get   Get row or cell contents; pass table name, row, and optionally a 
dictionary of column(s), timestamp and versions.  Examples: hbase get 't1', 
'r1' hbase get 't1', 'r1', {COLUMN = 'c1'} hbase get 't1', 'r1', {COLUMN = 
['c1', 'c2', 'c3']} hbase get 't1', 'r1', {COLUMN = 'c1', TIMESTAMP = ts1} 
hbase get 't1', 'r1', {COLUMN = 'c1', TIMESTAMP = ts1, \ VERSIONS = 4}
 
Regards,
Dhaval



 From: Jonathan Natkins na...@wibidata.com
To: user@hbase.apache.org 
Sent: Thursday, 7 March 2013 12:08 PM
Subject: Is there a way to view multiple versions of a cell in the HBase shell?
 
It seems that the answer is no, but I just wanted to make sure I didn't
miss something. As far as I can tell, scanning a column on a time range
returns just the most recent value within that time range, rather than all
the values in the range.

Thanks,
Natty

-- 
http://www.wibidata.com
office: 1.415.496.9424 x208
cell: 1.609.577.1600
twitter: @nattyice http://www.twitter.com/nattyice

Re: Json+hbase

2013-02-04 Thread Dhaval Shah
JSON object is nothing but a String representation.. You can call 
json.toBytes() to get the byte representation and put that into HBase
 
Regards,
Dhaval



 From: ranjin...@polarisft.com ranjin...@polarisft.com
To: user@hbase.apache.org 
Sent: Monday, 4 February 2013 5:14 AM
Subject: Json+hbase
 
Hi,

        Need to create JSON Object and put the data into Hbase table.

How to but Json object in hbase table. Please guide me to complete the 
task.

Thanks in advance.

Regards,
Ranjini R

BSC-1,Nxt-Lvl 4th Floor East Wing
Polaris.FT-Navallur-Chennai

Mobile No:9003048194


This e-Mail may contain proprietary and confidential information and is sent 
for the intended recipient(s) only.  If by an addressing or transmission error 
this mail has been misdirected to you, you are requested to delete this mail 
immediately. You are also hereby notified that any use, any form of 
reproduction, dissemination, copying, disclosure, modification, distribution 
and/or publication of this e-mail message, contents or its attachment other 
than by its intended recipient/s is strictly prohibited.

Visit us at http://www.polarisFT.com

Re: Controlling TableMapReduceUtil table split points

2013-01-22 Thread Dhaval Shah

Hi David.. We successfully use the logical schema approach and have not seen 
issues yet.. Ofcourse it all depends on the use case and saying it would work 
for you because it works for us would be naive.. However, if it does work, it 
will make your life much easier because with a logical schema other problems 
become simpler (like you can be sure that 1 map function will process an entire 
row rather than a row going to multiple mappers, or if you are using filters 
that restrict queries to only a small subset of the data, even setBatch won't 
be needed for those use cases).. I did run into issues where I did not use 
setBatch and my mappers ran out of memory but that was a simpler one to solve 
(and by the way if you are on CDH4, the HBase export utility also does not use 
setBatch and your mapper will run out of memory if you have a large row.. Its 
easy to put that line in though as a config param and this feature is available 
in future releases of HBase
 trunk)

Regards,
Dhaval
 


 From: David Koch ogd...@googlemail.com
To: user@hbase.apache.org 
Sent: Sunday, 6 January 2013 12:53 PM
Subject: Re: Controlling TableMapReduceUtil table split points
  
Hi Dhaval,

Good call on the setBatch. I had forgotten about it. Just like changing the
schema it would involve changing the map(...) to reflect the fact that only
part of the user's data is returned in each call but I would not have to
manipulate table splits.

The HBase book does suggest that it's bad practice to use the logical
schema of lumping all user data into a single row(*) but I'll do some
testing to see what works.

Thank you,

/David

(*) Chapter 9, section Tall-Narrow Versus Flat-Wide Tables, 3rd ed., page
359)


On Sun, Jan 6, 2013 at 6:29 PM, Dhaval Shah prince_mithi...@yahoo.co.inwrote:

 Another option to avoid the timeout/oome issues is to use scan.setBatch()
 so that the scanner would function normally for small rows but would break
 up large rows in multiple Result objects which you can now use in
 conjunction with scan.setCaching() to control how much data you get back..

 This approach would not need a change in your schema design and would
 ensure that only 1 mapper processes the entire row (but in multiple calls
 to the map function)


Re: HDFS disk space requirements

2013-01-11 Thread Dhaval Shah

Also depending on compression type chosen it might take less disk space


--
On Fri 11 Jan, 2013 3:53 PM IST Mesika, Asaf wrote:

130 GB raw data will take in HBase since it adds the family name, qualifier 
and timestamp to each value, so it can even be 150GB. You can check it 
exactly, by loading only one row with one column and see how much it takes on 
the HDFS file system (run compaction first).

Next, you 5 times that since you have 5 times replication, so 5x150=750GB

On Jan 11, 2013, at 5:07 AM, Panshul Whisper wrote:

 Hello,
 
 I have a 5 node hadoop cluster and a fully distributed Hbase setup on the
 cluster with 130 GB of HDFS space avaialble. HDFS replication is set to 5.
 
 I have a total of 115 GB of JSON files that need to be loaded into the
 Hbase database and then they have to processed.
 
 So is the available HDFS space sufficient for the operations?? considering
 the replication and all factors?
 or should I increase the space and by how much?
 
 Thanking You,
 
 -- 
 Regards,
 Ouch Whisper
 010101010101




Re: Controlling TableMapReduceUtil table split points

2013-01-06 Thread Dhaval Shah

Another option to avoid the timeout/oome issues is to use scan.setBatch() so 
that the scanner would function normally for small rows but would break up 
large rows in multiple Result objects which you can now use in conjunction with 
scan.setCaching() to control how much data you get back.. 

This approach would not need a change in your schema design and would ensure 
that only 1 mapper processes the entire row (but in multiple calls to the map 
function)



--
On Sun 6 Jan, 2013 10:07 PM IST David Koch wrote:

Hi Ted,

Thank you for your response. I will take a look.

With regards to the timeouts: I think changing the key design as outlined
above would ameliorate the situation since each map call only requests a
small amount of data as opposed to what could be a large chunk. I remember
that simply doing a get on one of the large outlier rows (~500mb) brought
down the region server involved.

/David

On Sun, Jan 6, 2013 at 5:11 PM, Ted Yu yuzhih...@gmail.com wrote:

 If events for one user are processed by a single mapper, I think you would




Re: disable table

2012-09-26 Thread Dhaval Shah
I have had similar problems and it seems like zookeeper and hbase master have 
different notions of whether the table is enabled or not.. Stopping the 
cluster, deleting zookeeper data and then starting it worked for me in this 
scenario

Regards,
Dhaval



From: Mohit Anchlia mohitanch...@gmail.com
To: user@hbase.apache.org 
Sent: Wednesday, 26 September 2012 4:54 PM
Subject: disable table

When I try to disable table I get:

hbase(main):011:0 disable 'SESSIONID_TIMELINE'
ERROR: org.apache.hadoop.hbase.TableNotEnabledException:
org.apache.hadoop.hbase.TableNotEnabledException: SESSIONID_TIMELINE
Here is some help for this command:
Start disable of named table: e.g. hbase disable 't1'

But then I try to enable I get:

hbase(main):012:0 enable 'SESSIONID_TIMELINE'
ERROR: org.apache.hadoop.hbase.TableNotDisabledException:
org.apache.hadoop.hbase.TableNotDisabledException: SESSIONID_TIMELINE
Here is some help for this command:
Start enable of named table: e.g. hbase enable 't1'

I've tried flush, major_compaction also. I tseems it's stuck in
inconsistent state. Could someone point me to correct direction? I am using
92.1 


Re:: Hregionserver instance runs endlessly

2012-09-25 Thread Dhaval Shah


Try killing the old process manually ( ps -ef )


--
On Tue 25 Sep, 2012 11:28 AM IST iwannaplay games wrote:

Hi,

My hbase was working properly.But now it shows two instances of
hregionserver , the starting time of one is of 4 days back.If i try
stopping hbase it doesnt stop by stop-hbase.sh.If i go to slave n stop
it,it stops the new instance .The old instance is not getting deleted and
its not showing up when i do jps in that slave.
What to do please help me.I restarted the cluster but the problem remains
the same.

here is the snapshot



Re: Required a sample Java program to delete a row from Hbase

2012-09-22 Thread Dhaval Shah

Delete d = new Delete(rowKey);
HTable t = new HTable(tableName);
t.delete(d);

Regards,
Dhaval



From: Ramasubramanian Narayanan ramasubramanian.naraya...@gmail.com
To: user@hbase.apache.org 
Sent: Saturday, 22 September 2012 10:15 AM
Subject: Required a sample Java program to delete a row from Hbase

Hi,

Can someone send a sample Java program to delete a row from Hbase.

Regards,
Rams 


Re: Required a sample Java program to delete a row from Hbase

2012-09-22 Thread Dhaval Shah
HTable and Delete are the only 2 I remember
 
Regards,
Dhaval


- Original Message -
From: Ramasubramanian Narayanan ramasubramanian.naraya...@gmail.com
To: user@hbase.apache.org; Dhaval Shah prince_mithi...@yahoo.co.in
Cc: 
Sent: Saturday, 22 September 2012 10:47 AM
Subject: Re: Required a sample Java program to delete a row from Hbase

Dhaval,

Thanks!!

What are all the classes that we need to import? My requirement is to use
java script in Pentaho. So I could not write a full Java program there... I
may require to import required classes before using these functions.

regards,
Rams

On Sat, Sep 22, 2012 at 7:48 PM, Dhaval Shah prince_mithi...@yahoo.co.inwrote:


 Delete d = new Delete(rowKey);
 HTable t = new HTable(tableName);
 t.delete(d);

 Regards,
 Dhaval


 
 From: Ramasubramanian Narayanan ramasubramanian.naraya...@gmail.com
 To: user@hbase.apache.org
 Sent: Saturday, 22 September 2012 10:15 AM
 Subject: Required a sample Java program to delete a row from Hbase

 Hi,

 Can someone send a sample Java program to delete a row from Hbase.

 Regards,
 Rams




Re:: HMaster construction failing over SASL authentication issue.

2012-09-14 Thread Dhaval Shah


Looking at your life, it seems like SASL is just.a warning/info message.. your 
real issue is invalid zookeeper sessions.. Can you try stopping everything, 
delete zookeeper data dir and data log dir and start.. Also are you running a 
version of zookeeper compatible with your hbase version?


--
On Fri 14 Sep, 2012 11:48 AM IST Arati ! wrote:

Hi,

I was running my HBase-Hadoop setup just fine with HBase 0.92.0 and Hadoop
1.0.1.on 3 nodes

Recently upgraded to HBase-0.94.1 and Hadoop-1.0.3.

Since i was running a trial environment I hadnt added my machines to a DNS.
Now after the version change, the HRegionServer processes began failing
over not being able to reverse look up the IPs of my nodes.

So I got that worked out and added the servers onto the DNS. After which im
getting the foloowing SASL exceptions.

Am I missing something?

Upon start-up all the processes are up but the logs indicate failed
construction of HMaster.

Can I disable SASL authentication?

2012-09-13 19:27:12,720 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server /10.12.19.110:2181
2012-09-13 19:27:12,722 INFO
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of
this process is 17868@test110
2012-09-13 19:27:12,731 WARN
org.apache.zookeeper.client.ZooKeeperSaslClient: SecurityException:
java.lang.SecurityException: Unable to locate a login configuration
occurred when trying to find JAAS configuration.
2012-09-13 19:27:12,731 INFO
org.apache.zookeeper.client.ZooKeeperSaslClient: Client will not
SASL-authenticate because the default JAAS configuration section 'Client'
could not be found. If you are not using SASL, you may ignore this. On the
other hand, if you expected SASL to work, please fix your JAAS
configuration.
2012-09-13 19:27:12,737 WARN org.apache.zookeeper.ClientCnxn: Session 0x0
for server null, unexpected error, closing socket connection and attempting
reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1035)


Thanks,

Arati Patro



Re: Enabling compression

2012-07-24 Thread Dhaval Shah
I bet that your compression libraries are not available to HBase.. Run the 
compression test utility and see if it can find LZO


Regards,
Dhaval


- Original Message -
From: Mohit Anchlia mohitanch...@gmail.com
To: user@hbase.apache.org
Cc: 
Sent: Tuesday, 24 July 2012 4:39 PM
Subject: Re: Enabling compression

Thanks! I was trying it out and I see this message when I use COMPRESSION,
but it works when I don't use it. Am I doing something wrong?


hbase(main):012:0 create 't2', {NAME = 'f1', VERSIONS = 1, COMPRESSION
= 'LZO'}

ERROR: org.apache.hadoop.hbase.client.RegionOfflineException: Only 0 of 1
regions are online; retries exhausted.

hbase(main):014:0 create 't3', {NAME = 'f1', VERSIONS = 1}

0 row(s) in 1.1260 seconds


On Tue, Jul 24, 2012 at 1:37 PM, Jean-Daniel Cryans jdcry...@apache.orgwrote:

 On Tue, Jul 24, 2012 at 1:34 PM, Jean-Marc Spaggiari
 jean-m...@spaggiari.org wrote:
  Also, if I understand it correctly, this will enable the compression
  for the new put but will not compresse the actual cells already stored
  right? For that, we need to run a major compaction of the table which
  will rewrite all the cells and so compact them?

 Yeah, although you may not want to recompact everything all at once in
 a live system. You can just let it happen naturally through cycles of
 flushes and compactions, it's all fine.

 J-D




Re: Enabling compression

2012-07-24 Thread Dhaval Shah


Yes you need to add the snappy libraries to hbase path (i think the variable to 
set is called HBASE_LIBRARY_PATH)

--
On Wed 25 Jul, 2012 3:46 AM IST Mohit Anchlia wrote:

On Tue, Jul 24, 2012 at 2:04 PM, Dhaval Shah 
prince_mithi...@yahoo.co.inwrote:

 I bet that your compression libraries are not available to HBase.. Run the
 compression test utility and see if it can find LZO

 That seems to be the case for SNAPPY. However, I do have snappy installed
and it works with hadoop just fine and HBase is running on the same
cluster. Is there something special I need to do for HBase?


 Regards,
 Dhaval


 - Original Message -
 From: Mohit Anchlia mohitanch...@gmail.com
 To: user@hbase.apache.org
 Cc:
 Sent: Tuesday, 24 July 2012 4:39 PM
 Subject: Re: Enabling compression

 Thanks! I was trying it out and I see this message when I use COMPRESSION,
 but it works when I don't use it. Am I doing something wrong?


 hbase(main):012:0 create 't2', {NAME = 'f1', VERSIONS = 1, COMPRESSION
 = 'LZO'}

 ERROR: org.apache.hadoop.hbase.client.RegionOfflineException: Only 0 of 1
 regions are online; retries exhausted.

 hbase(main):014:0 create 't3', {NAME = 'f1', VERSIONS = 1}

 0 row(s) in 1.1260 seconds


 On Tue, Jul 24, 2012 at 1:37 PM, Jean-Daniel Cryans jdcry...@apache.org
 wrote:

  On Tue, Jul 24, 2012 at 1:34 PM, Jean-Marc Spaggiari
  jean-m...@spaggiari.org wrote:
   Also, if I understand it correctly, this will enable the compression
   for the new put but will not compresse the actual cells already stored
   right? For that, we need to run a major compaction of the table which
   will rewrite all the cells and so compact them?
 
  Yeah, although you may not want to recompact everything all at once in
  a live system. You can just let it happen naturally through cycles of
  flushes and compactions, it's all fine.
 
  J-D
 





RE: Applying QualifierFilter to one column family only.

2012-07-20 Thread Dhaval Shah


Alternately you can use a filter list and say first column family and qualifier 
filter or second column family.. 


--
On Fri 20 Jul, 2012 8:40 AM IST Anoop Sam John wrote:

Yes I was having  this doubt. So if you know exactly the qualifier names in 
advance you can use this scan way.
Else filter only u can use.
QualifierFilter just checks the qualifier name only which CF it is part of is 
not checked.
So the similar qualifier names in both T and S will get filtered out.
You can create a simple filter of your own and plugin the same into HBase and 
use? Here you can pass the CF name also and in the filterKeyValue() u can 
consider the CF name too. I think it should be an easy job :)

-Anoop-
_
From: David Koch [ogd...@googlemail.com]
Sent: Thursday, July 19, 2012 1:57 PM
To: user@hbase.apache.org
Subject: Re: Applying QualifierFilter to one column family only.

Hello Anoop,

Thank you for your answer. The QualifierFilter on T specifies a minimum
value not one that has to be matched exactly, so merely adding a specific
qualifier value directly to the scan does not work if I understand
correctly :-/

/David


On Thu, Jul 19, 2012 at 7:05 AM, Anoop Sam John anoo...@huawei.com wrote:

 Hi David,
   You want the below use case in scan
 Table :T1
 --
 CF : T   CF: S
 q1   q2..q1  q2 ..

 Now in Scan u want to scan all the qualifiers under S and one qualifier
 under T. (I think I got ur use case correctly)

 Well this use case u can achieve with out using any filter also.
 Scan s = new Scan()
 s.addFamily(S); // Tells to add all the qualifier(KVs) under this CF in
 the result
 s.addColumn(T,q1)
 Use this scan object for your getScanner.
 Using the addColumn you can add more than one qualifier under one CF too.

 Hope this helps u.

 -Anoop-
 
 From: David Koch [ogd...@googlemail.com]
 Sent: Thursday, July 19, 2012 3:36 AM
 To: user@hbase.apache.org
 Subject: Applying QualifierFilter to one column family only.

 Hello,

 When scanning a table with 2 column families, is it possible to apply
 a QualifierFilter selectively to one family but still include the other
 family in the scan?

 The layout of my table is as follows:

 rowkeyT:timestamp -- data,S:summary_item -- value

 For each rowkey family T contains timestamp/data key/value pairs. Column
 S contains summary information about this row key.

 I want to apply a QualifierFilter to column T only - i.e filter by
 timestamp but return also all of S whenever the set of key/values matched
 in T is not empty. Is this doable using standard HBase filters? If so, how?
 If not could I implement such a filter myself using FilterBase?

 Thank you,

 /David



Re: HBase shell

2012-07-20 Thread Dhaval Shah
Mohit, HBase shell is a JRuby wrapper and as such has all functions available 
which are available using Java API.. So you can import the Bytes class and the 
do a Bytes.toString() similar to what you'd do in Java

Regards,
Dhaval



From: Mohit Anchlia mohitanch...@gmail.com
To: user@hbase.apache.org 
Sent: Friday, 20 July 2012 8:39 PM
Subject: HBase shell

Is there a command on the shell that convert byte into char array when
using HBase shell command line? It's all in hex format


hbase(main):004:0 scan 'SESSION_TIMELINE'

ROW COLUMN+CELL

\x00\x00\x00\x01\x7F\xFF\xE8\x034\x04\xCF\xFF column=S_T_MTX:\x07A\xB8\xB1,
timestamp=1342826789668, value=Hello

\x00\x00\x00\x01\x7F\xFF\xE81\xDC\xE4\x07\xFF column=S_T_MTX:\x04@\xBB\x94,
timestamp=1342826589226, value=Hello

\x00\x00\x00\x01\x7F\xFF\xFE\xC7Y\x09\xA2\x7F column=S_T_MTX:\x00\x00O?,
timestamp=1342830980018, value=Hello

\x00\x00\x00\x01\x7F\xFF\xFE\xC7Y\x1B\xF1\xFF
column=S_T_MTX:\x00\x00\x82\x19, timestamp=1342829793047, value=Hello

\x00\x00\x00\x01\x7F\xFF\xFE\xC7Y\x1C\xDC_ column=S_T_MTX:\x00\x00S,
timestamp=1342829721025, value=Hello

\x00\x00\x00\x01\x7F\xFF\xFE\xC7Y\x1D\xC6\xBF column=S_T_MTX:\x00\x00\x8Az,
timestamp=1342829675205, value=Hello

\x00\x00\x00\x01\x7F\xFF\xFE\xC7Y \x85\xDF column=S_T_MTX:\x00\x00\x89\xDE,
timestamp=1342829495072, value=Hello

\x00\x00\x00\x01\x7F\xFF\xFE\xC7Y!p? column=S_T_MTX:\x00\x00b\xEA,
timestamp=1342829425086, value=Hello


Re: Improvement: Provide better feedback on Put to unknown CF

2012-07-10 Thread Dhaval Shah

+1 a proper error message always helps IMHO



--
On Tue 10 Jul, 2012 5:58 PM IST Jean-Marc Spaggiari wrote:

Hi Michael,

I agree that in the code we have access to all the information to
access the right column.

However, let's imagine the column family name is dynamically retrieved
from a property file, and there is a typo. Or, another process removed
the column family. Or there is a bug in the code, and so on.

There is many possibilities why an application might try to access a
CF which, at the end, doesn't exist in the table.  I agree it should
have been checked from the meta before, but skeeping that step might
be required to improve performances.

Adding such exception will not have any negative impact on perfs,
readability, etc. It will simply help a lot the defect tracking when
someone will face the issue and see the stack trace.

JM

2012/7/9, Michael Segel michael_se...@hotmail.com:
 Jean-Marc,

 I think you mis understood.
 At run time, you can query HBase to find out the table schema and its column
 families.

 While I agree that you are seeing poorly written exceptions, IMHO its easier
 to avoid the problem in the first place.

 In a Map/Reduce in side the mapper class, you have everything you need to
 get the table's schema.
 From that you can see the column families.


 HTH

 -Mike

 On Jul 9, 2012, at 8:42 AM, Jean-Marc Spaggiari wrote:

 In my case it was a codding issue. Used the wrong final byte array to
 access the CF. So I agree, the CF is well known since you create the
 table based on them. But maybe you have added some other CFs later and
 something went wrong?

 It's just that based on the exception received, there is no indication
 that there might be some issues with the CF. So you might end trying
 to figure what the issue is far from where it's really.

 2012/7/9, Michael Segel michael_se...@hotmail.com:
 This may beg the question ...
 Why do you not know the CF?

 Your table schemas only consist of tables and CFs. So you should know
 them
 at the start of your job or m/r Mapper.setup();


 On Jul 9, 2012, at 7:25 AM, Jean-Marc Spaggiari wrote:

 Hi,

 When we try to add a value to a CF which does not exist on a table, we
 are getting the error below. I think this is not really giving the
 right information about the issue.

 Should it not be better to provide an exception like
 UnknownColumnFamillyException?

 JM

 org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
 Failed 1 action: DoNotRetryIOException: 1 time, servers with issues:
 phenom:60020,
 at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1591)
 at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1367)
 at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:945)
 at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:801)
 at org.apache.hadoop.hbase.client.HTable.put(HTable.java:776)
 at org.myapp.app.Integrator.main(Integrator.java:162)









Re: HBASE -- YCSB ?

2012-07-09 Thread Dhaval Shah
This exception is generally caused when one of your server names returned does 
not map to a valid IP address on that host.. The services being up or not does 
not matter but the hostname should resolve to a valid IP 

Regards,
Dhaval



From: registrat...@circle-cross-jn.com registrat...@circle-cross-jn.com
To: user@hbase.apache.org 
Sent: Monday, 9 July 2012 5:30 PM
Subject: Re: HBASE -- YCSB ?



     Thank you Amandeep for your input.

    I go into hbase shell to create a table from my HMaster, which
isn't running a DN process and I get the following.  Could this be
caused by a number of my DNs being offline, by the fact that the node
isn't running a DN process, or something else?

    hbase(main):013:0 create 'usertable', 'testcol'

ERROR: java.net.NoRouteToHostException:
java.netNoRouteToHostException: No route to host

Here is some help for this command:
Create table; pass table name, a dictionary of specifications per
column family, and optionally a dictionary of table configuration.
Dictionaries are described below in the GENERAL NOTES section.
Examples:

hbase create 't1', {NAME = 'f1', VERSIONS = 5}
hbase create 't1', {NAME = 'f1'}, {NAME = 'f2'}, {NAME = 'f3'}
hbase # The above in shorthand would be the following:
hbase create 't1', 'f1', 'f2', 'f3'
hbase create 't1', {NAME = 'f1', VERSIONS = 1, TTL = 2592000,
BLOCKCACHE = true}
hbase create 't1', 'f1', {SPLITS = ['10', '20', '30', '40']}
hbase create 't1', 'f1', {SPLITS_FILE = 'splits.txt'}

    I can see in the ZK logs and the RS logs that they talk to the
shell, so I know that communication is good and I find no errors or
exceptions in them.

    Also I can do a hbase shell status, hbase shell zk_dump, and hadoop
dfsadmin -report all from the node I am trying to create the table
from with no issue.

    If I get on a node with the DataNode process running on it and try,
I get the following:

    [hadoop@srack0-11 ~]$ hbase shell
HBase Shell; enter 'help' for list of supported commands.
Type exit to leave the HBase Shell
Version 0.90.6-cdh3u4, r, Mon May 7 13:14:00 PDT 2012

hbase(main):001:0 status
3 servers, 0 dead, 0.6667 average load

hbase(main):002:0 create 'usertable', 'tempcol'

ERROR: java.io.IOException: java.io.IOException: Bad connect ack with
firstBadLink as 172.18.0.9:50010

    I assume this means it is trying to talk to a DN process on a node
that I know is down.

 ---

    Jay Wilson

- Original Message -
From: user@hbase.apache.org
To:, 
Cc:
Sent:Mon, 9 Jul 2012 12:21:22 -0700
Subject:Re: HBASE -- YCSB ?

Inline. 

On Monday, July 9, 2012 at 12:17 PM,
registrat...@circle-cross-jn.com [1] wrote:

 
 
 Now that I have a stable cluster, I would like to use YCSB to test
 its performance; however, I am a bit confused after reading
several
 different website posting about YCSB.
 
 1) Be default will YCSB read my hbase-site.xml [2] file or do I
have to
 copy it into the YCSB conf directory? I plan on using on of my
nodes
 with no Hadoop/HBASE processes running on it, but it has all the
 environmental stuff in place.
 
 

You have to put the hbase-site.xml [3] in YCSB/hbase/src/main/conf/.

 
 2) Does the hbase.master [4] property have to be site in the
 hbase-site.xml [5] file for YCSB to work?
 
 

The only property that has to be there is the zookeeper quorum list.
That's what the HBase client needs to talk to the cluster. 
 
 3) After working through all the workloads is there a script/tool
 that will clean up my HBase?
 
 

Nope. You'll need to go in and disable, drop the table you wrote
too. You can do that from the shell.

disable 'mytable'
drop 'mytable'

That's all you'll need to do to clean it up. 
 
 Thank You
 
 ---
 
 Jay Wilson 



Links:
--
[1] mailto:registrat...@circle-cross-jn.com
[2] http://hbase-site.xml
[3] http://hbase-site.xml
[4] http://hbase.master
[5] http://hbase-site.xml


Re: HBASE -- YCSB ?

2012-07-09 Thread Dhaval Shah
There is definitely a debug flag on hbase.. You can find out details 
on http://hbase.apache.org/shell.html.. I am not sure how much details would it 
log though.. I have never used it personally
 
Regards,
Dhaval


- Original Message -
From: registrat...@circle-cross-jn.com registrat...@circle-cross-jn.com
To: 'user@hbase.apache.org'
Cc: 
Sent: Monday, 9 July 2012 5:56 PM
Subject: Re: HBASE -- YCSB ?



     Is there a debug flag I can use with hbase shell that will tell
me the name it's trying to resolve?

    Thank you

    ---

    Jay Wilson 

- Original Message -
From: 
To:user@hbase.apache.org , registrat...@circle-cross-jn.com 
Cc:
Sent:Tue, 10 Jul 2012 05:36:44 +0800 (SGT)
Subject:Re: HBASE -- YCSB ?

This exception is generally caused when one of your server names
returned does not map to a valid IP address on that host.. The
services being up or not does not matter but the hostname should
resolve to a valid IP 

Regards,
Dhaval


From: registrat...@circle-cross-jn.com [1] 
To: user@hbase.apache.org [2] 
Sent: Monday, 9 July 2012 5:30 PM
Subject: Re: HBASE -- YCSB ?

     Thank you Amandeep for your input.

    I go into hbase shell to create a table from my
HMaster, which
isn't running a DN process and I get the following.  Could this
be
caused by a number of my DNs being offline, by the fact that the
node
isn't running a DN process, or something else?

    hbase(main):013:0 create 'usertable', 'testcol'

ERROR: java.net.NoRouteToHostException: [3]
java.netNoRouteToHostException: [4] No route to host

Here is some help for this command:
Create table; pass table name, a dictionary of specifications per
column family, and optionally a dictionary of table configuration.
Dictionaries are described below in the GENERAL NOTES section.
Examples:

hbase create 't1', {NAME = 'f1', VERSIONS = 5}
hbase create 't1', {NAME = 'f1'}, {NAME = 'f2'}, {NAME = 'f3'}
hbase # The above in shorthand would be the following:
hbase create 't1', 'f1', 'f2', 'f3'
hbase create 't1', {NAME = 'f1', VERSIONS = 1, TTL = 2592000,
BLOCKCACHE = true}
hbase create 't1', 'f1', {SPLITS = ['10', '20', '30', '40']}
hbase create 't1', 'f1', {SPLITS_FILE = 'splits.txt'} [5]

    I can see in the ZK logs and the RS logs that they talk
to the
shell, so I know that communication is good and I find no errors or
exceptions in them.

    Also I can do a hbase shell status, hbase shell
zk_dump, and hadoop
dfsadmin -report all from the node I am trying to create the table
from with no issue.

    If I get on a node with the DataNode process running on
it and try,
I get the following:

    [hadoop@srack0-11 ~]$ hbase shell
HBase Shell; enter 'help' for list of supported commands.
Type exit to leave the HBase Shell
Version 0.90.6-cdh3u4, r, Mon May 7 13:14:00 PDT 2012

hbase(main):001:0 status
3 servers, 0 dead, 0.6667 average load

hbase(main):002:0 create 'usertable', 'tempcol'

ERROR: java.io.IOException: [6] java.io.IOException: [7] Bad connect
ack with
firstBadLink as 172.18.0.9:50010

    I assume this means it is trying to talk to a DN
process on a node
that I know is down.

     ---

    Jay Wilson

- Original Message -
From: user@hbase.apache.org [8]
To:, 
Cc:
Sent:Mon, 9 Jul 2012 12:21:22 -0700
Subject:Re: HBASE -- YCSB ?

Inline. 

On Monday, July 9, 2012 at 12:17 PM,
registrat...@circle-cross-jn.com [9] [1] wrote:

 
 
 Now that I have a stable cluster, I would like to use YCSB to test
 its performance; however, I am a bit confused after reading
several
 different website posting about YCSB.
 
 1) Be default will YCSB read my hbase-site.xml [10] [2] file or do
I
have to
 copy it into the YCSB conf directory? I plan on using on of my
nodes
 with no Hadoop/HBASE processes running on it, but it has all the
 environmental stuff in place.
 
 

You have to put the hbase-site.xml [11] [3] in
YCSB/hbase/src/main/conf/.

 
 2) Does the hbase.master [12] [4] property have to be site in the
 hbase-site.xml [13] [5] file for YCSB to work?
 
 

The only property that has to be there is the zookeeper quorum list.
That's what the HBase client needs to talk to the cluster. 
 
 3) After working through all the workloads is there a script/tool
 that will clean up my HBase?
 
 

Nope. You'll need to go in and disable, drop the table you wrote
too. You can do that from the shell.

disable 'mytable'
drop 'mytable'

That's all you'll need to do to clean it up. 
 
 Thank You
 
 ---
 
 Jay Wilson 

Links:
--
[1] mailto:registrat...@circle-cross-jn.com [14]
[2] http://hbase-site.xml [15]
[3] http://hbase-site.xml [16]
[4] http://hbase.master [17]
[5] http://hbase-site.xml [18]


Links:
--
[1] mailto:registrat...@circle-cross-jn.com
[2] mailto:user@hbase.apache.org
[3] http://java.net.NoRouteToHostException
[4] http://java.netNoRouteToHostException
[5] http://sitemail.hostway.com/http:
[6] http://java.io.IOException
[7] http://java.io.IOException
[8] 

Re: Hmaster and HRegionServer disappearance reason to ask

2012-07-05 Thread Dhaval Shah
 Pablo, instead of CMSIncrementalMode try UseParNewGC.. That seemed to be the 
silver bullet when I was dealing with HBase region server crashes

Regards,
Dhaval



From: Pablo Musa pa...@psafe.com
To: user@hbase.apache.org user@hbase.apache.org 
Sent: Thursday, 5 July 2012 5:37 PM
Subject: RE: Hmaster and HRegionServer disappearance reason to ask

I am having the same problem. I tried N different things but I cannot solve the 
problem.

hadoop-0.20.noarch      0.20.2+923.256-1
hadoop-hbase.noarch     0.90.6+84.29-1
hadoop-zookeeper.noarch 3.3.5+19.1-1

I already set:

        property
                namehbase.hregion.memstore.mslab.enabled/name
                valuetrue/value
        /property
        property
                namehbase.regionserver.handler.count/name
                value20/value
        /property 

But it does not seem to work.
How can I check if this variables are really set in the HRegionServer?

I am starting the server with the following:
-Xmx8192m -XX:NewSize=64m -XX:MaxNewSize=64m -ea -XX:+UseConcMarkSweepGC 
-XX:+CMSIncrementalMode -XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails 
-XX:+PrintGCTimeStamps

I am also having trouble to read reagionserver.out
[GC 72004.406: [ParNew: 55830K-2763K(59008K), 0.0043820 secs] 
886340K-835446K(1408788K) icms_dc=0 , 0.0044900 secs] [Times: user=0.04 
sys=0.00, real=0.00 secs]
[GC 72166.759: [ParNew: 55192K-6528K(59008K), 135.1102750 secs] 
887876K-839688K(1408788K) icms_dc=0 , 135.1103920 secs] [Times: user=1045.58 
sys=138.11, real=135.09 secs]
[GC 72552.616: [ParNew: 58977K-6528K(59008K), 0.0083040 secs] 
892138K-847415K(1408788K) icms_dc=0 , 0.0084060 secs] [Times: user=0.05 
sys=0.01, real=0.01 secs]
[GC 72882.991: [ParNew: 58979K-6528K(59008K), 151.4924490 secs] 
899866K-853931K(1408788K) icms_dc=0 , 151.4925690 secs] [Times: user=0.07 
sys=151.48, real=151.47 secs]

What does each part means?
Each line is a GC cicle?

Thanks,
Pablo


-Original Message-
From: Lars George [mailto:lars.geo...@gmail.com] 
Sent: segunda-feira, 2 de julho de 2012 06:43
To: user@hbase.apache.org
Subject: Re: Hmaster and HRegionServer disappearance reason to ask

Hi lztaomin,

 org.apache.zookeeper.KeeperException$SessionExpiredException: 
 KeeperErrorCode = Session expired

indicates that you have experienced the Juliet Pause issue, which means you 
ran into a JVM garbage collection that lasted longer than the configured 
ZooKeeper timeout threshold. 

If you search for it on Google 
http://www.google.com/search?q=juliet+pause+hbase you will find quite a few 
pages explaining the problem, and what you can do to avoid this.

Lars

On Jul 2, 2012, at 10:30 AM, lztaomin wrote:

 HI ALL
 
      My HBase group a total of 3 machine, Hadoop HBase mounted in the same 
machine, zookeeper using HBase own. Operation 3 months after the reported 
abnormal as follows. Cause hmaster and HRegionServer processes are gone. 
Please help me. 
 Thanks
 
 The following is a log
 
 ABORTING region server serverName=datanode1,60020,1325326435553, 
 load=(requests=332, regions=188, usedHeap=2741, maxHeap=8165): 
 regionserver:60020-0x3488dec38a02b1 
 regionserver:60020-0x3488dec38a02b1 received expired from ZooKeeper, 
 aborting
 Cause:
 org.apache.zookeeper.KeeperException$SessionExpiredException: 
 KeeperErrorCode = Session expired at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(Zoo
 KeeperWatcher.java:343) at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWa
 tcher.java:261) at 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.ja
 va:530) at 
 org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
 2012-07-01 13:45:38,707 INFO 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: 
 Splitting logs for datanode1,60020,1325326435553
 2012-07-01 13:45:38,756 INFO 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting 32 
 hlog(s) in 
 hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553
 2012-07-01 13:45:38,764 INFO 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 
 1 of 32: 
 hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553/datanod
 e1%3A60020.1341006689352, length=5671397
 2012-07-01 13:45:38,764 INFO org.apache.hadoop.hbase.util.FSUtils: 
 Recovering file 
 hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553/datanod
 e1%3A60020.1341006689352
 2012-07-01 13:45:39,766 INFO org.apache.hadoop.hbase.util.FSUtils: 
 Finished lease recover attempt for 
 hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553/datanod
 e1%3A60020.1341006689352
 2012-07-01 13:45:39,880 INFO 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using 
 syncFs -- HDFS-200
 2012-07-01 13:45:39,925 INFO 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using 
 syncFs -- HDFS-200
 
 ABORTING region server serverName=datanode2,60020,1325146199444, 
 load=(requests=614, 

Re:: HBase dies shortly after starting.

2012-06-30 Thread Dhaval Shah


Try cleaning up your zookeeper data.. I have had similar issues before due to 
corrupt zookeeper data/bad zookeeper state


--
On Sat 30 Jun, 2012 4:12 AM IST Jay Wilson wrote:

I somewhat have HBase up and running in a distributed mode.  It starts
fine, I can use hbase shell to create, disable, and drop tables;
however, after a short period of time HMaster and the HRegionalservers
terminate.  Decoding the error messages is a bit bewildering and the
O'Reilly HBase book hasn't helped much with message decoding.

 

Here is a snippet of the messages from a regionalserver log:

~~~ 

U Stats: total=6.68 MB, free=807.12 MB, max=813.8 MB, blocks=2,
accesses=19, hits=17, hitRatio=89.47%%, cachingAccesses=17,
cachingHits=15, cachingHitsRatio=88.

23%%, evictions=0, evicted=0, evictedPerRun=NaN

2012-06-27 12:36:47,103 DEBUG
org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=6.68
MB, free=807.12 MB, max=813.8 MB, blocks=2, accesses=19, hits=17,
hitRatio=89.47%%, cachingAccesses=17, cachingHits=15, cachingHitsRatio=88.

23%%, evictions=0, evicted=0, evictedPerRun=NaN

2012-06-27 12:40:02,106 INFO org.apache.zookeeper.ClientCnxn: Unable to
read additional data from server sessionid 0x382f6861690003, likely
server has closed socket, closing socket connection and attempting
reconnect

2012-06-27 12:40:02,112 INFO org.apache.zookeeper.ClientCnxn: Unable to
read additional data from server sessionid 0x382f6861690004, likely
server has closed socket, closing socket connection and attempting
reconnect

2012-06-27 12:40:02,245 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server devrackA-01/172.18.0.2:2181

2012-06-27 12:40:02,247 WARN org.apache.zookeeper.ClientCnxn: Session
0x382f6861690003 for server null, unexpected error, closing socket
connection and attempting reconnect

java.net.NoRouteToHostException: No route to host

~~~

No route to host would imply it can't reach one of my HQuorumpeers, but
it talks to them when I first run start-hase.sh. Also there is no DNS
involved, the /etc/hosts files are identical on all nodes,  and it's
currently a closed cluster.  All nodes are on the same subnet 172.18/16

 

Do I have something wrong in one of my xml files:

 

Core-site.xml:

?xml version=1.0?
?xml-stylesheet type=text/xsl href=configuration.xsl?
 
!-- Put site-specific property overrides in this file. --
 
configuration
property
   namehadoop.tmp.dir/name
   value/var/hbase-hadoop/tmp/value
/property
property
   namefs.default.name/name
   valuehdfs://devrackA-00:8020/value
   finaltrue/final
/property
/configuration

 

Hdfs-site.xml:

?xml version=1.0?
?xml-stylesheet type=text/xsl href=configuration.xsl?
 
!-- Put site-specific property overrides in this file. -- 
configuration
property
   namedfs.replication/name
   value3/value
/property
property
   namedfs.name.dir/name
   value/var/hbase-hadoop/name/value
/property
property
   namedfs.data.dir/name
   value/var/hbase-hadoop/data/value
/property
property
   namefs.checkpoint.dir/name
   value/var/hbase-hadoop/namesecondary/value
/property
property
   namedfs.datanode.max.xcievers/name
   value4096/value
/property
/configuration

 

Hbase-site.xml:

?xml version=1.0?
?xml-stylesheet type=text/xsl href=configuration.xsl?
!--
/**
 * Copyright 2010 The Apache Software Foundation
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * License); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an AS IS BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
--
configuration
property
   namehbase.rootdir/name
   valuehdfs://devrackA-00:8020/var/hbase-hadoop/hbase/value
/property
property
   namedfs.datanode.max.xcievers/name
   value4096/value
/property
property
   namehbase.cluster.distributed/name
   valuetrue/value
/property
property
   namehbase.regionserver.handler.count/name
   value20/value
/property
property
   namehbase.zookeeper.quorum/name
   valuedevrackA-00,devrackA-01,devrackA-25/value
/property
property
   

Re:: Rows count

2012-06-24 Thread Dhaval Shah

Instead of the shell rowcount you can use the MR job for rowcount.. something 
like
hadoop jar path_to_hbase.jar rowcount your_table

The MR job is much faster than the shell


--
On Mon 25 Jun, 2012 4:52 AM IST Jean-Marc Spaggiari wrote:

Hi,

In HBASE-1512 (https://issues.apache.org/jira/browse/HBASE-1512) there
is the implementation of co-processor for count and others.

Is there anywhere an example of the way to use them? Because the shell
count is very slow with there is to many rows.

Thanks,

JM



Re:: Pseudo Distributed: ERROR org.apache.hadoop.hbase.HServerAddress: Could not resolve the DNS name of localhost.localdomain

2012-06-07 Thread Dhaval Shah

Have you restarted zookeeper? Also clearing zookeeper data dir and data log dir 
might also help.. it seems that localhost.localdomain is being cached somewhere



--
On Thu 7 Jun, 2012 2:48 PM IST Manu S wrote:

Hi All,

In pseudo distributed node HBaseMaster is stopping automatically when we
starts HbaseRegion.

I have changed all the configuration files of Hadoop,Hbase  Zookeeper to
set the exact hostname of the machine. Also commented the localhost entry
from /etc/hosts  cleared the cache as well. There is no entry of
localhost.localdomain entry in these configurations, but this it is
resolving to localhost.localdomain.

Please find the error:
2012-06-07 12:13:11,995 INFO
org.apache.hadoop.hbase.master.MasterFileSystem: No logs to split
*2012-06-07 12:13:12,103 ERROR org.apache.hadoop.hbase.HServerAddress:
Could not resolve the DNS name of localhost.localdomain
2012-06-07 12:13:12,104 FATAL org.apache.hadoop.hbase.master.HMaster:
Unhandled exception. Starting shutdown.*
*java.lang.IllegalArgumentException: hostname can't be null*
at java.net.InetSocketAddress.init(InetSocketAddress.java:121)
at
org.apache.hadoop.hbase.HServerAddress.getResolvedAddress(HServerAddress.java:108)
at
org.apache.hadoop.hbase.HServerAddress.init(HServerAddress.java:64)
at
org.apache.hadoop.hbase.zookeeper.RootRegionTracker.dataToHServerAddress(RootRegionTracker.java:82)
at
org.apache.hadoop.hbase.zookeeper.RootRegionTracker.waitRootRegionLocation(RootRegionTracker.java:73)
at
org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRoot(CatalogTracker.java:222)
at
org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:240)
at
org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:487)
at
org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:455)
at
org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:406)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:293)
2012-06-07 12:13:12,106 INFO org.apache.hadoop.hbase.master.HMaster:
Aborting
2012-06-07 12:13:12,106 DEBUG org.apache.hadoop.hbase.master.HMaster:
Stopping service threads

Thanks,
Manu S



Re: HBase server treating my emails as spam (FW: Failure Notice)

2012-06-03 Thread Dhaval Shah

Got ya.. will try that next time.. thanks



--
On Sun 3 Jun, 2012 3:59 PM IST Harsh J wrote:

Hey Dhaval,

Sending plaintext email replies helps in such cases. Rich formatting
may have caused this (With too many HTML links, etc.).

On Sat, Jun 2, 2012 at 5:22 AM, Dhaval Shah prince_mithi...@yahoo.co.in 
wrote:

 Hi guys. When I send an email from my yahoo account (from a PC/laptop), the 
 hbase mail servers are treating it as spam.. if I send it from my cell using 
 the same yahoo account it goes through (like this one).. my last email got 
 marked as spam by hbase servers as you can read below..



 Forwarded Message
 From: mailer-dae...@yahoo.com
 To: prince_mithi...@yahoo.co.in
 Sent: Fri 1 Jun, 2012 8:18 PM IST
 Subject: Failure Notice

 Sorry, we were unable to deliver your message to the following address.

 user@hbase.apache.org:
 Remote host said: 552 spam score (5.4) exceeded threshold 
 (FREEMAIL_FORGED_REPLYTO,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_NEUTRAL ) [BODY]

 --- Below this line is a copy of the message.

 Received: from [106.10.166.116] by nm2.bullet.mail.sg3.yahoo.com with NNFMP; 
 01 Jun 2012 14:48:07 -
 Received: from [106.10.151.251] by tm5.bullet.mail.sg3.yahoo.com with NNFMP; 
 01 Jun 2012 14:48:07 -
 Received: from [127.0.0.1] by omp1022.mail.sg3.yahoo.com with NNFMP; 01 Jun 
 2012 14:48:07 -
 X-Yahoo-Newman-Property: ymail-3
 X-Yahoo-Newman-Id: 151978.45173...@omp1022.mail.sg3.yahoo.com
 Received: (qmail 69489 invoked by uid 60001); 1 Jun 2012 14:48:07 -
 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.co.in; 
 s=s1024; t=1338562087; bh=4UyEMs/mYUWEloKWPm8TNHpf0XUq1hS6shr2jtC2png=; 
 h=X-YMail-OSG:Received:X-Mailer:References:Message-ID:Date:From:Reply-To:Subject:To:In-Reply-To:MIME-Version:Content-Type;
  
 b=Idtey84AMgbZ1B0L9sZFUeXVnhL4qrmUCjIVia5g6jKmTDtZ3QS5Qg8VbHpWAVzIORmWqOx1ia3u9WqCIwuMhHNDI9fXcHjJ0V6E5GX0dCXXarubdolBxCWAvodQr9ZXrj8JO+zsSDMFCDSW83ZmpPoFptt5NQWdNE1wNVUEKN8=
 DomainKey-Signature:a=rsa-sha1; q=dns; c=nofws;
  s=s1024; d=yahoo.co.in;
  h=X-YMail-OSG:Received:X-Mailer:References:Message-ID:Date:From:Reply-To:Subject:To:In-Reply-To:MIME-Version:Content-Type;
  b=yn52KmZA19Tre13iYbmt0H4NRfudP7x7xrlGehPDMUU7OXCOWfKtfyaNZ5e7x0lI1A4mjdEmeEwaNkEFV4MYDcBl8LmuV3HQsX4NZl15VuPYS7GbDGxDdeoR9CZinzJhWHlzPuhPi+g4MGDWbTev7FtayVKJrLQSCrQCmw5WAFI=;
 X-YMail-OSG: O1MD734VM1nUIwivrAyqVmVyGupyuKq4og1.Ui6RaHpyrCi
  oqTdLss0wdr4WW5pfNFAxC4oK36jwewpOMhWLixlxazByEySyIIPKzBdakrK
  z0IlUE4QclLH6g3LjSc_JYtH9M4tHgSCN4WTuBS0s34F7xcTAFPdXhm2v.xu
  dv.7wJ1N_3QHrLvZh9XeEQxm721CMic72Yk.PtcEg_aSljOiZVd_MdLbQkyq
  80l2H98OCHLqfvgbs2qO40x5_RwJ.pzUtRmx_gs.GxsfCuIvQMiA7XkXbKAs
  ZFQSJR0EDVRFCc5QSjCKPkK25hgjzkAzQ8MqPpc7o44O1az8bQzWPQbqVHaS
  89m8QcqJ.R43KIRDRrdausENT199M0HvqygTrcPhkzUhSW73RXiOQyxJP_BP
  lRx2245t8bkU4Rm34LqkkyKTtPnhK7VWHCi8V..yq0qUKMoN1_KN6Y1XhVID
  dEucmLKc3PRe2z_BEAEr_hsh7HB.xmkXNM6JCFE3hXh1k9NCtnNm.7EhdS8S
  9mw--
 Received: from [199.172.169.86] by web192504.mail.sg3.yahoo.com via HTTP; 
 Fri, 01 Jun 2012 22:48:06 SGT
 X-Mailer: YahooMailWebService/0.8.118.349524
 References: 4fc4e89f.3090...@free.fr 4fc7c20c.4040...@free.fr 
 CAGpTDNcDBGF=v+fx7vssvpdx91qg_4_vq0a8fxzee+70gue...@mail.gmail.com 
 4fc7ddf1.8080...@free.fr 
 CAGpTDNe94LkkvLcHP1nCucXzb6Dapb-X+tfQEBw+55Tnx=5e=a...@mail.gmail.com 
 4fc8c2c6.7060...@free.fr 4fc8d456.6060...@free.fr
 Message-ID: 1338562086.60488.yahoomail...@web192504.mail.sg3.yahoo.com
 Date: Fri, 1 Jun 2012 22:48:06 +0800 (SGT)
 From: Dhaval Shah prince_mithi...@yahoo.co.in
 Reply-To: Dhaval Shah prince_mithi...@yahoo.co.in
 Subject: Re: hosts unreachables
 To: user@hbase.apache.org user@hbase.apache.org
 In-Reply-To: 4fc8d456.6060...@free.fr
 MIME-Version: 1.0
 Content-Type: multipart/alternative; 
 boundary=1766475574-631287525-1338562086=:60488

 --1766475574-631287525-1338562086=:60488
 Content-Type: text/plain; charset=iso-8859-1
 Content-Transfer-Encoding: quoted-printable

 Can you try removing=A0-XX:+CMSIncrementalMode from your GC settings.. That=
  had caused a lot of pain a few months back=0A=A0=0ARegards,=0ADhaval=0A=0A=
 =0A=0A From: Cyril Scetbon cyril.scetbon@f=
 ree.fr=0ATo: user@hbase.apache.org =0ASent: Friday, 1 June 2012 10:40 AM=
 =0ASubject: Re: hosts unreachables=0A =0AI've another regionserver (hb-d2) =
 that crashed (I can easily reproduce the issue by continuing injections), a=
 nd as I see in master log, it gets information about hb-d2 every 5 minutes.=
  I suppose it's what helps him to note if a node is dead or not. However it=
  adds hb-d2 to the dead node list at 13:32:20, so before 5 minutes since th=
 e last time it got the server information. Is it normal ?=0A=0A2012-06-01 1=
 3:02:36,309 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Server infor=
 mation: hb-d5,60020,1338553124247=3D47, hb-d4,60020,1338553126577=3D47, hb-=
 d7,60020,1338553124279=3D46, hb-d10,60020,1338553126695=3D47, hb-d6,60020,1=
 33=0A8553124588

HBase server treating my emails as spam (FW: Failure Notice)

2012-06-01 Thread Dhaval Shah

Hi guys. When I send an email from my yahoo account (from a PC/laptop), the 
hbase mail servers are treating it as spam.. if I send it from my cell using 
the same yahoo account it goes through (like this one).. my last email got 
marked as spam by hbase servers as you can read below.. 



Forwarded Message
From: mailer-dae...@yahoo.com
To: prince_mithi...@yahoo.co.in
Sent: Fri 1 Jun, 2012 8:18 PM IST
Subject: Failure Notice

Sorry, we were unable to deliver your message to the following address.

user@hbase.apache.org:
Remote host said: 552 spam score (5.4) exceeded threshold 
(FREEMAIL_FORGED_REPLYTO,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_NEUTRAL ) [BODY]

--- Below this line is a copy of the message.

Received: from [106.10.166.116] by nm2.bullet.mail.sg3.yahoo.com with NNFMP; 01 
Jun 2012 14:48:07 -
Received: from [106.10.151.251] by tm5.bullet.mail.sg3.yahoo.com with NNFMP; 01 
Jun 2012 14:48:07 -
Received: from [127.0.0.1] by omp1022.mail.sg3.yahoo.com with NNFMP; 01 Jun 
2012 14:48:07 -
X-Yahoo-Newman-Property: ymail-3
X-Yahoo-Newman-Id: 151978.45173...@omp1022.mail.sg3.yahoo.com
Received: (qmail 69489 invoked by uid 60001); 1 Jun 2012 14:48:07 -
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.co.in; s=s1024; 
t=1338562087; bh=4UyEMs/mYUWEloKWPm8TNHpf0XUq1hS6shr2jtC2png=; 
h=X-YMail-OSG:Received:X-Mailer:References:Message-ID:Date:From:Reply-To:Subject:To:In-Reply-To:MIME-Version:Content-Type;
 
b=Idtey84AMgbZ1B0L9sZFUeXVnhL4qrmUCjIVia5g6jKmTDtZ3QS5Qg8VbHpWAVzIORmWqOx1ia3u9WqCIwuMhHNDI9fXcHjJ0V6E5GX0dCXXarubdolBxCWAvodQr9ZXrj8JO+zsSDMFCDSW83ZmpPoFptt5NQWdNE1wNVUEKN8=
DomainKey-Signature:a=rsa-sha1; q=dns; c=nofws;
  s=s1024; d=yahoo.co.in;
  
h=X-YMail-OSG:Received:X-Mailer:References:Message-ID:Date:From:Reply-To:Subject:To:In-Reply-To:MIME-Version:Content-Type;
  
b=yn52KmZA19Tre13iYbmt0H4NRfudP7x7xrlGehPDMUU7OXCOWfKtfyaNZ5e7x0lI1A4mjdEmeEwaNkEFV4MYDcBl8LmuV3HQsX4NZl15VuPYS7GbDGxDdeoR9CZinzJhWHlzPuhPi+g4MGDWbTev7FtayVKJrLQSCrQCmw5WAFI=;
X-YMail-OSG: O1MD734VM1nUIwivrAyqVmVyGupyuKq4og1.Ui6RaHpyrCi
 oqTdLss0wdr4WW5pfNFAxC4oK36jwewpOMhWLixlxazByEySyIIPKzBdakrK
 z0IlUE4QclLH6g3LjSc_JYtH9M4tHgSCN4WTuBS0s34F7xcTAFPdXhm2v.xu
 dv.7wJ1N_3QHrLvZh9XeEQxm721CMic72Yk.PtcEg_aSljOiZVd_MdLbQkyq
 80l2H98OCHLqfvgbs2qO40x5_RwJ.pzUtRmx_gs.GxsfCuIvQMiA7XkXbKAs
 ZFQSJR0EDVRFCc5QSjCKPkK25hgjzkAzQ8MqPpc7o44O1az8bQzWPQbqVHaS
 89m8QcqJ.R43KIRDRrdausENT199M0HvqygTrcPhkzUhSW73RXiOQyxJP_BP
 lRx2245t8bkU4Rm34LqkkyKTtPnhK7VWHCi8V..yq0qUKMoN1_KN6Y1XhVID
 dEucmLKc3PRe2z_BEAEr_hsh7HB.xmkXNM6JCFE3hXh1k9NCtnNm.7EhdS8S
 9mw--
Received: from [199.172.169.86] by web192504.mail.sg3.yahoo.com via HTTP; Fri, 
01 Jun 2012 22:48:06 SGT
X-Mailer: YahooMailWebService/0.8.118.349524
References: 4fc4e89f.3090...@free.fr 4fc7c20c.4040...@free.fr 
CAGpTDNcDBGF=v+fx7vssvpdx91qg_4_vq0a8fxzee+70gue...@mail.gmail.com 
4fc7ddf1.8080...@free.fr 
CAGpTDNe94LkkvLcHP1nCucXzb6Dapb-X+tfQEBw+55Tnx=5e=a...@mail.gmail.com 
4fc8c2c6.7060...@free.fr 4fc8d456.6060...@free.fr
Message-ID: 1338562086.60488.yahoomail...@web192504.mail.sg3.yahoo.com
Date: Fri, 1 Jun 2012 22:48:06 +0800 (SGT)
From: Dhaval Shah prince_mithi...@yahoo.co.in
Reply-To: Dhaval Shah prince_mithi...@yahoo.co.in
Subject: Re: hosts unreachables
To: user@hbase.apache.org user@hbase.apache.org
In-Reply-To: 4fc8d456.6060...@free.fr
MIME-Version: 1.0
Content-Type: multipart/alternative; 
boundary=1766475574-631287525-1338562086=:60488

--1766475574-631287525-1338562086=:60488
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable

Can you try removing=A0-XX:+CMSIncrementalMode from your GC settings.. That=
 had caused a lot of pain a few months back=0A=A0=0ARegards,=0ADhaval=0A=0A=
=0A=0A From: Cyril Scetbon cyril.scetbon@f=
ree.fr=0ATo: user@hbase.apache.org =0ASent: Friday, 1 June 2012 10:40 AM=
=0ASubject: Re: hosts unreachables=0A =0AI've another regionserver (hb-d2) =
that crashed (I can easily reproduce the issue by continuing injections), a=
nd as I see in master log, it gets information about hb-d2 every 5 minutes.=
 I suppose it's what helps him to note if a node is dead or not. However it=
 adds hb-d2 to the dead node list at 13:32:20, so before 5 minutes since th=
e last time it got the server information. Is it normal ?=0A=0A2012-06-01 1=
3:02:36,309 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Server infor=
mation: hb-d5,60020,1338553124247=3D47, hb-d4,60020,1338553126577=3D47, hb-=
d7,60020,1338553124279=3D46, hb-d10,60020,1338553126695=3D47, hb-d6,60020,1=
33=0A8553124588=3D47, hb-d8,60020,1338553124113=3D47, hb-d2,60020,133855312=
6560=3D47, hb-d11,60020,1338553124329=3D47, hb-d12,60020,1338553126567=3D47=
, hb-d1,60020,1338553126474=3D47, hb-d9,60020,1338553124179=3D47=0A..=0A201=
2-06-01 13:07:36,319 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Ser=
ver information: hb-d5,60020,1338553124247=3D47, hb-d4,60020,1338553126577=
=3D47, hb-d7,60020,1338553124279=3D46, hb

Re:: Cannot post or reply to hbase mailing list anymore - please unblock

2012-05-26 Thread Dhaval Shah

I am seeing a very similar behavior. The funny part is that if I reply from my 
android it goes through (like right now) but if I send it from my browser its 
classified as spam (for the exact same email account)



--
On Sat 26 May, 2012 2:00 PM IST Christian Schäfer wrote:

Hi

again I cannot post to hbase mailing list anymore.
Like today, that seems to happen when I answer mails as I just wanted to it.
My mail provider (yahoo) wrote me that it's not classifying my mail as spam on 
their side but on the hbase mailing server.
Could someone unblock me or something?


regards
Chris


Re: : question on filters

2012-05-25 Thread Dhaval Shah

Yes instead of a single Get you can supply a list of Get's to the same 
htable.get call. It will sort and partition the list on a per region basis, 
make requests in parallel, aggregate the responses and return an array of 
Result. Make sure you apply your filter to each Get



--
On Fri 25 May, 2012 11:18 PM IST jack chrispoo wrote:

Thanks Dhaval, and is there a way to get multiple rows (their keys not
contiguous) from HBase server with only one request? it seems to me it's
expensive to send one get request for each one row.

jack

On Thu, May 24, 2012 at 5:40 PM, Dhaval Shah 
prince_mithi...@yahoo.co.inwrote:


 Jack, you can use filters on Get's too..



 --
 On Fri 25 May, 2012 5:36 AM IST jack chrispoo wrote:

 Hi,
 
 I'm new to HBase and I have a question about using filters. I know that I
 can use filters with scan, say scan start-key=key1  end-key=key2 and with
 a SingleColumnValueFilter: columnA=valueA. But in my java program I need
 to
 do filtering on a set of rows which are not contiguous, my client needs to
 get all rows with rowid in a setString and with columnA=valueA. I don't
 know how this can be done efficiently. I can imagine that I can do a scan
 of the entire table and set filers rowid=... and columnA=valueA; or I can
 use get function to get the rows with rowid in my set all to my client and
 do the filtering on my client side. But I think neither way is efficient.
 Can anyone give me a hint on this?
 
 Thanks
 Yixiao





Re:: question on filters

2012-05-24 Thread Dhaval Shah

Jack, you can use filters on Get's too.. 



--
On Fri 25 May, 2012 5:36 AM IST jack chrispoo wrote:

Hi,

I'm new to HBase and I have a question about using filters. I know that I
can use filters with scan, say scan start-key=key1  end-key=key2 and with
a SingleColumnValueFilter: columnA=valueA. But in my java program I need to
do filtering on a set of rows which are not contiguous, my client needs to
get all rows with rowid in a setString and with columnA=valueA. I don't
know how this can be done efficiently. I can imagine that I can do a scan
of the entire table and set filers rowid=... and columnA=valueA; or I can
use get function to get the rows with rowid in my set all to my client and
do the filtering on my client side. But I think neither way is efficient.
Can anyone give me a hint on this?

Thanks
Yixiao



Re: RegionServer silently stops (only issue: CMS-concurrent-mark ~80sec)

2012-05-01 Thread Dhaval Shah


Not sure if its related (or even helpful) but we were using cdh3b4 (which is 
0.90.1) and we saw similar issues with region servers going down.. we didn't 
look at GC logs but we had very high zookeeper leases so its unlikely that the 
GC could have caused the issue.. this problem went away when we upgraded to 
cdh3u3 which is rock steady in terms of region servers.. (havent had a single 
region server crash in a month where on the older version I used to have 1 
crash every couple of days).. the only other difference between the two is that 
we use snappy on the newer one and gz on the old

We also noticed that having replication enabled also contributed to the 
issues.. 


--
On Tue 1 May, 2012 3:15 PM IST N Keywal wrote:

Hi Alex,

On the same idea, note that hbase is launched with
-XX:OnOutOfMemoryError=kill -9 %p.

N.

On Tue, May 1, 2012 at 10:41 AM, Igal Shilman ig...@wix.com wrote:

 Hi Alex, just to rule out, oom killer,
 Try this:

 http://stackoverflow.com/questions/624857/finding-which-process-was-killed-by-linux-oom-killer


 On Mon, Apr 30, 2012 at 10:48 PM, Alex Baranau alex.barano...@gmail.com
 wrote:

  Hello,
 
  During recent weeks I constantly see some RSs *silently* dying on our
 HBase
  cluster. By silently I mean that process stops, but no errors in logs
  [1].
 
  The only thing I can relate to it is long CMS-concurrent-mark: almost 80
  seconds. But this should not cause issues as it is not a stop-the-world
  process.
 
  Any advice?
 
  HBase: hbase-0.90.4-cdh3u3
  Hadoop: 0.20.2-cdh3u3
 
  Thank you,
  Alex Baranau
 
  [1]
 
  last lines from RS log (no errors before too, and nothing written in
 *.out
  file):
 
  2012-04-30 18:52:11,806 DEBUG
  org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction
  requested for agg-sa-1.3,0011|
 
 
 te|dtc|\x00\x00\x00\x00\x00\x00\x1E\x002\x00\x00\x00\x015\x9C_n\x00\x00\x00\x00\x00\x00\x00\x00\x00,1334852280902.4285f9339b520ee617c087c0fd0dbf65.
  because regionserver60020.cacheFlusher; priority=-1, compaction queue
  size=0
  2012-04-30 18:54:58,779 DEBUG
  org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: using new
  createWriter -- HADOOP-6840
  2012-04-30 18:54:58,779 DEBUG
  org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter:
 
 
 Path=hdfs://xxx.ec2.internal/hbase/.logs/xxx.ec2.internal,60020,1335706613397/xxx.ec2.internal%3A60020.1335812098651,
  syncFs=true, hflush=false
  2012-04-30 18:54:58,874 INFO
 org.apache.hadoop.hbase.regionserver.wal.HLog:
  Roll
 
 
 /hbase/.logs/xxx.ec2.internal,60020,1335706613397/xxx.ec2.internal%3A60020.1335811856672,
  entries=73789, filesize=63773934. New hlog
 
 
 /hbase/.logs/xxx.ec2.internal,60020,1335706613397/xxx.ec2.internal%3A60020.1335812098651
  2012-04-30 18:56:31,867 INFO
  org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke
 up
  with memory above low water.
  2012-04-30 18:56:31,867 INFO
  org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region
  agg-sa-1.3,s_00I4|
 
 
 tdqc\x00docs|mrtdocs|\x00\x00\x00\x00\x00\x03\x11\xF4\x00none\x00|1334692562\x00\x0D\xE0\xB6\xB3\xA7c\xFF\xBC|26837373\x00\x00\x00\x016\xC1\xE0D\xBE\x00\x00\x00\x00\x00\x00\x00\x00,1335761291026.30b127193485342359eadf1586819805.
  due to global heap pressure
  2012-04-30 18:56:31,867 DEBUG
 org.apache.hadoop.hbase.regionserver.HRegion:
  Started memstore flush for agg-sa-1.3,s_00I4|
 
 
 tdqc\x00docs|mrtdocs|\x00\x00\x00\x00\x00\x03\x11\xF4\x00none\x00|1334692562\x00\x0D\xE0\xB6\xB3\xA7c\xFF\xBC|26837373\x00\x00\x00\x016\xC1\xE0D\xBE\x00\x00\x00\x00\x00\x00\x00\x00,1335761291026.30b127193485342359eadf1586819805.,
  current region memstore size 138.1m
  2012-04-30 18:56:31,867 DEBUG
 org.apache.hadoop.hbase.regionserver.HRegion:
  Finished snapshotting, commencing flushing stores
  2012-04-30 18:56:56,303 DEBUG
  org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=322.84
 MB,
  free=476.34 MB, max=799.17 MB, blocks=5024, accesses=12189396,
 hits=127592,
  hitRatio=1.04%%, cachingAccesses=132480, cachingHits=126949,
  cachingHitsRatio=95.82%%, evictions=0, evicted=0, evictedPerRun=NaN
  2012-04-30 18:56:59,026 INFO org.apache.hadoop.hbase.regionserver.Store:
  Renaming flushed file at
 
 
 hdfs://zzz.ec2.internal/hbase/agg-sa-1.3/30b127193485342359eadf1586819805/.tmp/391890051647401997
  to
 
 
 hdfs://zzz.ec2.internal/hbase/agg-sa-1.3/30b127193485342359eadf1586819805/a/1139737908876846168
  2012-04-30 18:56:59,034 INFO org.apache.hadoop.hbase.regionserver.Store:
  Added
 
 
 hdfs://zzz.ec2.internal/hbase/agg-sa-1.3/30b127193485342359eadf1586819805/a/1139737908876846168,
  entries=476418, sequenceid=880198761, memsize=138.1m, filesize=5.7m
  2012-04-30 18:56:59,097 INFO
 org.apache.hadoop.hbase.regionserver.HRegion:
  Finished memstore flush of ~138.1m for region agg-sa-1.3,s_00I4|
 
 
 

HBase Thrift for CDH3U3 leaking file descriptors/socket connections to Zookeeper

2012-04-27 Thread Dhaval Shah
We have an app written in Ruby which uses HBase as the backing store.. It uses 
Thrift to connect to it.. We were using HBase from Cloudera's CDH3B4 distro 
until 
now and it worked fine.. I just upgraded our Hadoop install to CDH3U3 (which is 
the latest stable CDH release at this point) and in a matter of hours all 
Thrift 
servers went down..

Upon further investigation I realized that it was hitting the limit on the 
number 
of allowed file descriptors (which is pretty high at 32k).. This problem occurs 
if 
I use thrift in any configuration (hsha, framed transport, threadpool) except 
the 
nonblocking mode.. Digging further I realized a couple of things:
1. Even with light load (1-2 processes hitting the thrift server in quick 
succession), thrift is spinning up new threads and each of the threads is 
maintaining a socket connection to zookeeper.. In a matter on minutes (with 
this 
load test), thrift has  32k open connections with  8k threads having 
connection 
to zookeeper which do not seem to die even after a day..
2. The logs show approx 3-4 open connections (presumably for each thread):
java53588 hbase 4135r  FIFO0,6 177426 pipe
java53588 hbase 4136w  FIFO0,6 177426 pipe
java53588 hbase 4137r     0,11 0   177427 eventpoll
java53588 hbase 4138u  IPv4 177428TCP 
njhaddev05:49729-njhaddev01:2181 (ESTABLISH
ED)

CDH3B4 with the exact same configurations and the exact same setup works fine 
but 
CDH3U3 does not.. Using Thrift in nonblocking mode isn't really an option 
because 
of the low throughput and single threaded nature..

Can someone help please?