Re: Hbase clustering

2012-09-27 Thread Venkateswara Rao Dokku
How can we verify that the data (tables) is distributed across the cluster?
Is there a way to confirm that the data is distributed across all the
nodes in the cluster?
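(One rough way to check, for what it's worth, assuming the shell works on the
master node: the status command lists every region server together with the
regions it hosts, and the master web UI, on port 60010 by default, shows the
same per-server region counts.)

  hbase> status 'detailed'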

On Thu, Sep 27, 2012 at 12:26 PM, Venkateswara Rao Dokku 
dvrao@gmail.com wrote:

 Hi,
 I am completely new to Hbase. I want to cluster Hbase on two
 nodes. I installed Hadoop and HBase on the two nodes & my conf files are as
 given below.
 *cat  conf/regionservers *
 hbase-regionserver1
 hbase-master
 *cat conf/masters *
 hadoop-namenode
 * cat conf/slaves *
 hadoop-datanode1
 *vim conf/hdfs-site.xml *
 <?xml version="1.0"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

 <!-- Put site-specific property overrides in this file. -->

 <configuration>
   <property>
     <name>dfs.replication</name>
     <value>2</value>
     <description>Default block replication. The actual number of
     replications can be specified when the file is created. The default is used
     if replication is not specified at create time.
     </description>
   </property>
   <property>
     <name>dfs.support.append</name>
     <value>true</value>
     <description>Default block replication. The actual number of
     replications can be specified when the file is created. The default is used
     if replication is not specified at create time.
     </description>
   </property>
 </configuration>
 * finally my /etc/hosts file is *
 127.0.0.1   localhost
 127.0.0.1   oc-PowerEdge-R610
 10.2.32.48  hbase-master hadoop-namenode
 10.240.13.35 hbase-regionserver1  hadoop-datanode1
  The above files are identical on both of the machines. The following are
 the processes that are running on my machines when I ran the start scripts in
 Hadoop as well as HBase:
 *hadoop-namenode:*
 HQuorumPeer
 HMaster
 Main
 HRegionServer
 SecondaryNameNode
 Jps
 NameNode
 JobTracker
 *hadoop-datanode1:*

 TaskTracker
 Jps
 DataNode
 -- process information unavailable
 Main
 NC
 HRegionServer

 I am able to create, list & scan tables on the *hadoop-namenode* machine
 using the HBase shell. But while trying to run the same on the
 *hadoop-datanode1* machine I couldn't do it, as I am getting the
 following error.
 hbase(main):001:0> list
 TABLE


 ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 times

 Here is some help for this command:
 List all tables in hbase. Optional regular expression parameter could
 be used to filter the output. Examples:

   hbase> list
   hbase> list 'abc.*'
 How can I list/scan the tables that are created by the *hadoop-namenode*
 from the *hadoop-datanode1* machine? Similarly, can I create some tables
 on *hadoop-datanode1* & access them from the *hadoop-namenode* &
 vice-versa, as the data is distributed since this is a cluster?



 --
 Thanks & Regards,
 Venkateswara Rao Dokku,
 Software Engineer, One Convergence Devices Pvt Ltd.,
 Jubilee Hills, Hyderabad.




-- 
Thanks & Regards,
Venkateswara Rao Dokku,
Software Engineer, One Convergence Devices Pvt Ltd.,
Jubilee Hills, Hyderabad.


Re: H-base Bulk insert

2012-09-27 Thread Sonal Goyal
Check http://hbase.apache.org/book/arch.bulk.load.html

Best Regards,
Sonal
Crux: Reporting for HBase https://github.com/sonalgoyal/crux
Nube Technologies http://www.nubetech.co

http://in.linkedin.com/in/sonalgoyal





On Thu, Sep 27, 2012 at 12:45 PM, Venkateswara Rao Dokku 
dvrao@gmail.com wrote:

 How can I insert bulk data into HBase? Can you please provide me with
 the links?

 --
 Thanks  Regards,
 Venkateswara Rao Dokku,
 Software Engineer,One Convergence Devices Pvt Ltd.,
 Jubille Hills,Hyderabad.



RE: H-base Bulk insert

2012-09-27 Thread Ramkrishna.S.Vasudevan
You can use MapReduce.  We have a utility called ImportTsv that allows
you to bulk load data from a flat file.  Is this your use case?

Pls refer to http://hbase.apache.org/book.html#arch.bulk.load
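For illustration only (table name, column mapping and HDFS paths below are
made up), an ImportTsv run looks roughly like this:

  # load tab-separated data straight into an existing table
  hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
    -Dimporttsv.columns=HBASE_ROW_KEY,cf1:col1 mytable /user/hadoop/input

  # or generate HFiles first and hand them to the cluster afterwards
  hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
    -Dimporttsv.columns=HBASE_ROW_KEY,cf1:col1 \
    -Dimporttsv.bulk.output=/user/hadoop/hfiles mytable /user/hadoop/input
  hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
    /user/hadoop/hfiles mytable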

Regards
Ram

 -Original Message-
 From: Venkateswara Rao Dokku [mailto:dvrao@gmail.com]
 Sent: Thursday, September 27, 2012 12:45 PM
 To: user@hbase.apache.org
 Subject: H-base Bulk insert
 
 How can I insert bulk data into HBase? Can you please provide me with
 the links?
 
 --
 Thanks  Regards,
 Venkateswara Rao Dokku,
 Software Engineer,One Convergence Devices Pvt Ltd.,
 Jubille Hills,Hyderabad.



Re: Hbase clustering

2012-09-27 Thread n keywal
Hi,

I would like to direct you to the reference guide, but I must acknowledge
that, well, it's a reference guide, hence not really easy for a plain new
start.
You should have a look at Lars' blog (and maybe buy his book), and
especially this entry:
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html

Some hints however:
- the replication occurs at the hdfs level, not the hbase level: hbase
writes files that are split into hdfs blocks that are replicated across the
datanodes. If you want to check the replication, you must look at what
files are written by hbase, how they have been split into blocks by hdfs,
and how these blocks have been replicated. That will be in the hdfs
interface (see the fsck sketch just after these hints). As a side note,
it's not the easiest thing to learn when you start :-)
- The error "ERROR: org.apache.hadoop.hbase.MasterNotRunningException:
Retried 7 times" is not linked to replication or whatever. It means that the
second machine cannot find the master. You need to fix this first (googling &
checking the logs).
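As a rough illustration of the first hint (assuming the default hbase.rootdir
of /hbase in HDFS; adjust the path if yours differs), fsck shows how HBase's
files were split into blocks and which datanodes hold the replicas:

  # run on a node with the hadoop client configured for the cluster
  hadoop fsck /hbase -files -blocks -locations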


Good luck,

Nicolas




On Thu, Sep 27, 2012 at 9:07 AM, Venkateswara Rao Dokku dvrao@gmail.com
 wrote:

 How can we verify that the data (tables) is distributed across the cluster?
 Is there a way to confirm that the data is distributed across all the
 nodes in the cluster?




Re: Hbase clustering

2012-09-27 Thread Venkateswara Rao Dokku
I can see that HMaster is not started on the data-node machine when the
start scripts in hadoop & hbase ran on the hadoop-namenode. My doubt is:
shall we have to start that master on the hadoop-datanode1 too, or will the
hadoop-datanode1 access the HMaster that is running on the
hadoop-namenode to create/list/scan tables, as the two nodes are in the
cluster as namenode & datanode?

On Thu, Sep 27, 2012 at 1:02 PM, n keywal nkey...@gmail.com wrote:

 Hi,

 I would like to direct you to the reference guide, but I must acknowledge
 that, well, it's a reference guide, hence not really easy for a plain new
 start.
 You should have a look at Lars' blog (and may be buy his book), and
 especially this entry:
 http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html




-- 
Thanks & Regards,
Venkateswara Rao Dokku,
Software Engineer, One Convergence Devices Pvt Ltd.,
Jubilee Hills, Hyderabad.


Re: Hbase clustering

2012-09-27 Thread Venkateswara Rao Dokku
On Thu, Sep 27, 2012 at 1:09 PM, Venkateswara Rao Dokku dvrao@gmail.com
 wrote:


Re: Hbase clustering

2012-09-27 Thread n keywal
You should launch the master only once, on whatever machine you like. Then
you will be able to access it from any other machine.
Please have a look at the blog I mentioned in my previous mail.
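As an illustration (not something stated in this thread): if the shell on
hadoop-datanode1 still cannot find the master, one thing worth checking is that
its conf/hbase-site.xml points at the cluster's ZooKeeper rather than at
localhost. Assuming the quorum runs on hbase-master, as the HQuorumPeer in the
earlier jps output suggests, a minimal client-side config might look like:

  <configuration>
    <property>
      <name>hbase.cluster.distributed</name>
      <value>true</value>
    </property>
    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>hbase-master</value>
    </property>
  </configuration>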

On Thu, Sep 27, 2012 at 9:39 AM, Venkateswara Rao Dokku dvrao@gmail.com
 wrote:

 I can see that HMaster is not started on the data-node machine when the
 start scripts in hadoop  hbase ran on the hadoop-namenode. My doubt is
 that,Shall we have to start that master on the hadoop-datanode1 too or the
 hadoop-datanode1 will access the Hmaster that is running on the
 hadoop-namenode to create,list,scan tables as the two nodes are in the
 cluster as namenode  datanode.


Re: Does hbase 0.90 client work with 0.92 server?

2012-09-27 Thread Damien Hardy
Hello,

Corollary, what is the better way to migrate data from a 0.90 cluster to a
0.92 cluster?

HBase 0.90 => client 0.90 => stdout | stdin => client 0.92 => HBase 0.92

All the data must transit through a single host where the two clients run.

It might be parallelized with multiple instances working on different range
scanners, but that's not so easy.

Is there a CopyTable version that can read from 0.90 and write to 0.92 as a
MapReduce job?

Maybe there is some sort of namespacing available for Java classes, so that we
could use two versions of the same package and go for a MapReduce job?
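For reference, the CopyTable job shipped with 0.92 is invoked roughly as below
(peer ZooKeeper address and table name are placeholders); whether a mixed
0.90/0.92 run works at all is exactly the open question here:

  hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
    --peer.adr=new-zk-host:2181:/hbase --new.name=mytable mytable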

Cheers,

-- 
Damien

2012/9/25 Jean-Daniel Cryans jdcry...@apache.org

 It's not compatible. Like the guide says[1]:

 replace your hbase 0.90.x with hbase 0.92.0 binaries (be sure you
 clear out all 0.90.x instances) and restart (You cannot do a rolling
 restart from 0.90.x to 0.92.x -- you must restart)

 This includes the client.

 J-D

 1. http://hbase.apache.org/book.html#upgrade0.92

 On Tue, Sep 25, 2012 at 11:16 AM, Agarwal, Saurabh
 saurabh.agar...@citi.com wrote:
  Hi,
 
  We recently upgraded hbase 0.90.4 to HBase 0.92. Our HBase app worked
 fine in hbase 0.90.4.
 
  Our new setup has HBase 0.92 server and hbase 0.90.4 client. And throw
 following exception when client would like to connect to server.
 
  Is anyone running HBase 0.92 server and hbase 0.90.4 client? Let me know,
 
  Thanks,
  Saurabh.
 
 
  12/09/24 14:58:31 INFO zookeeper.ClientCnxn: Session establishment
 complete on server vm-3733-969C.nam.nsroot.net/10.49.217.56:2181,
 sessionid = 0x139f61977650034, negotiated timeout = 6
 
  java.lang.IllegalArgumentException: Not a host:port pair: ?
 
at
 org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:60)
 
at
 org.apache.hadoop.hbase.zookeeper.RootRegionTracker.dataToHServerAddress(RootRegionTracker.java:82)
 
at
 org.apache.hadoop.hbase.zookeeper.RootRegionTracker.waitRootRegionLocation(RootRegionTracker.java:73)
 
at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:786)
 
at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:766)
 
at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:895)
 
at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:797)
 
at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:766)
 
at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:895)
 
at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
 
at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:766)
 
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:179)
 
at
 org.apache.hadoop.hbase.HBaseTestingUtility.truncateTable(HBaseTestingUtility.java:609)
 
at
 com.citi.sponge.flume.sink.ELFHbaseSinkTest.testAppend2(ELFHbaseSinkTest.java:221)
 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
 
at java.lang.reflect.Method.invoke(Unknown Source)
 
at junit.framework.TestCase.runTest(TestCase.java:168)
 
at junit.framework.TestCase.runBare(TestCase.java:134)
 
at junit.framework.TestResult$1.protect(TestResult.java:110)
 
at junit.framework.TestResult.runProtected(TestResult.java:128)
 
at junit.framework.TestResult.run(TestResult.java:113)
 
at junit.framework.TestCase.run(TestCase.java:124)
 
at junit.framework.TestSuite.runTest(TestSuite.java:232)
 
at junit.framework.TestSuite.run(TestSuite.java:227)
 
at
 org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81)
 
at
 org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
 
at
 org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
 
at
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
 
at
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
 
at
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
 
at
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)



Re: disable table

2012-09-27 Thread Mohammad Tariq
Hello Mohit,

It should be /hbase/hbase/table/SESSIONID_TIMELINE. Apologies for the
typo. For the rest of the things, I feel Ramkrishna sir has provided a good and
proper explanation. Please let us know if you still have any doubt or
question.
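For reference, a sketch of that removal from the ZooKeeper CLI, using the
connect string and the znode path that appear later in this thread (take a
backup of the ZK data first, as suggested below, and adjust the parent if your
zookeeper.znode.parent differs):

  # zkCli.sh -server pprfdaaha303:5181
  ls /hbase/table
  delete /hbase/table/SESSIONID_TIMELINE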

Ramkrishna.S.Vasudevan : You are welcome sir. It's my pleasure to share
space with you people.

Regards,
Mohammad Tariq



On Thu, Sep 27, 2012 at 9:59 AM, Ramkrishna.S.Vasudevan 
ramkrishna.vasude...@huawei.com wrote:

 Hi Mohith
 First of all thanks to Tariq for his replies.

 Just to add on,
 Basically HBase uses ZooKeeper to know the status of the cluster, like
 the number of tables enabled, disabled and deleted.
 Enabled and deleted states are handled a bit differently in the 0.94 version.

 ZK is used for various region assignments.

 Also the ZK is used to track the Active master and standby master.

 As you understand correctly that the master is responsible for the overall
 maintenance of the no of tables and their respective states, it seeks the
 help of ZK to do it and that is where the states are persisted.

 Also there are a few cases where the enable and disable table operations have
 some issues due to race conditions in the 0.92 versions; in the latest
 version we are trying to resolve them.
 You can attach the master and RS logs to identify exactly what caused this
 problem in your case, which will be really helpful so that it can be fixed in
 the kernel.

 Regards
 Ram
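As a rough illustration of the state Ram describes, these znodes can be
inspected from the ZooKeeper CLI (znode names below are typical for 0.92/0.94
and may vary slightly between versions):

  ls /hbase            # master, rs, table, root-region-server, unassigned, ...
  get /hbase/master    # which host currently holds the active master
  ls /hbase/table      # per-table state znodes (the table from this thread shows up here)
  ls /hbase/rs         # ephemeral znodes for the live region servers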

  -Original Message-
  From: Mohit Anchlia [mailto:mohitanch...@gmail.com]
  Sent: Thursday, September 27, 2012 5:09 AM
  To: user@hbase.apache.org
  Subject: Re: disable table
 
  I did /hbase/table/SESSIONID_TIMELINE and that seem to work. I'll
  restart
  hbase and see if it works.
 
  One thing I don't understand is why is zookeeper holding information
  about
  this table if it is enabled or disabled? Wouldn't this information be
  with
  master?
 
  On Wed, Sep 26, 2012 at 4:27 PM, Mohit Anchlia
  mohitanch...@gmail.comwrote:
 
   I don't see path like /hbase/SESSIONID_TIMELINE
   This is what I see
  
   [zk: pprfdaaha303:5181(CONNECTED) 5] ls /hbase/table
   [SESSIONID_TIMELINE]
   [zk: pprfdaaha303:5181(CONNECTED) 6] get /hbase/table
  
   cZxid = 0x100fe
   ctime = Mon Sep 10 15:31:45 PDT 2012
   mZxid = 0x100fe
   mtime = Mon Sep 10 15:31:45 PDT 2012
   pZxid = 0x508f1
   cversion = 3
   dataVersion = 0
   aclVersion = 0
   ephemeralOwner = 0x0
   dataLength = 0
   numChildren = 1
  
On Wed, Sep 26, 2012 at 3:57 PM, Mohammad Tariq
  donta...@gmail.comwrote:
  
   In order to delete a znode you have to go to the ZK shell and issue
  the
   delete command along with the required path. For example :
   delete /hbase/SESSIONID_TIMELINE. For detailed info you can visit
  the ZK
   homepage at : zookeeper.apache.org
  
   Actually when we try to fetch data from an Hbase table, the client
  or app
   first contacts the ZK to get the location of server holding the
   -ROOT- table. From this we come to know about the server hosting the
   .META.
   table. This tells us the location of the server which actually holds
  the
   rows of interest. Because of some reasons the znode which was
  holding this
   info has either faced some catastrophe or lost the info associated
  with
   this particular table. Or sometimes the znode remains unable to keep
   itself
   updated with the latest changes. That could also be a probable
  reason. We
   should always keep in mind that ZK is the centralized service that
   actually
   coordinating everything behind the scene. As a result, any problem
  to the
   ZK quorum means problem with Hbase custer.
  
   Regards,
   Mohammad Tariq
  
  
  
   On Thu, Sep 27, 2012 at 3:39 AM, Mohit Anchlia
  mohitanch...@gmail.com
   wrote:
  
Thanks! I do see Inconsistency. How do I remove the znode. And
  also
   could
you please help me understand how this might have happened?
   
   
ERROR: Region
   
  SESSIONID_TIMELINE,,1348689726526.0e200aace5e81cead8d8714ed8076050. not
deployed on any region server.
   
   
On Wed, Sep 26, 2012 at 2:36 PM, Mohammad Tariq
  donta...@gmail.com
wrote:
   
 A possible reason could be that the znode associated with this
   particular
 table is not behaving properly. In such case, you can try the
   following:

 Stop Hbase
 Stop ZK
 Take a backup of ZK data
 Restart ZK
 Remove the znode
 Start Hbase again

 After this hopefully your table would be enabled.

 Regards,
 Mohammad Tariq



 On Thu, Sep 27, 2012 at 2:59 AM, Mohammad Tariq
  donta...@gmail.com
 wrote:

  Yes. Also have a look at the logs of the problematic region if
  hbck
shows
  any inconsistency.
 
  Regards,
  Mohammad Tariq
 
 
 
  On Thu, Sep 27, 2012 at 2:55 AM, Mohit Anchlia 
   mohitanch...@gmail.com
 wrote:
 
  Which node should I look at for logs? Is this the 

Re: Does hbase 0.90 client work with 0.92 server?

2012-09-27 Thread n keywal
You don't have to migrate the data when you upgrade, it's done on the fly.
But it seems you want to do something more complex? A kind of realtime
replication between two clusters in two different versions?

On Thu, Sep 27, 2012 at 9:56 AM, Damien Hardy dha...@viadeoteam.com wrote:


Re: Distribution of regions to servers

2012-09-27 Thread Eugeny Morozov
Dan, see inlined.

On Thu, Sep 27, 2012 at 5:30 AM, Dan Han dannahan2...@gmail.com wrote:

 Hi, Eugeny ,

Thanks for your response. I answered your questions inline in Blue.
 And I'd like to give an example to describe my problem.

 Let's think about two data schemas for the same dataset.
 The two data schemas have different composite row keys.


Just the first idea. If you have different schemas, then it would be much
simpler to have two different tables with these schemas. Because in this
case HBase itself automatically distributes each table's regions
evenly across the cluster. You could actually use the same coprocessor for
both of the tables.

In case you're using two different column families, you could specify a
different BLOCKSIZE (the default value is '65536'). You could set this option
to differ by 10x between the CFs (matching the size difference between your
schemas). I believe this would decrease the number of reads for larger data
chunks.
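For example, from the shell (hypothetical table and family names; on 0.92-era
versions the table generally has to be disabled around the alter):

  hbase> disable 't1'
  hbase> alter 't1', {NAME => 'cf_large', BLOCKSIZE => '655360'}
  hbase> alter 't1', {NAME => 'cf_small', BLOCKSIZE => '65536'}
  hbase> enable 't1'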

In general it is actually not good to have two (or more) column families
that differ greatly in size, because compaction and flushing are done per
region, which means that if HBase starts compacting the small column family it
will do the same for the big one.
http://hbase.apache.org/book.html#number.of.cfs

BTW, I don't think that coprocessors are a good choice for data mining.
The reason is that it is kind of dangerous. Since coprocessors are server-side
creatures - they live in the Region Server - they could simply bring the
whole system down. Expensive analysis creates heap and CPU pressure, which
in turn leads to GC pauses and even more CPU pressure.

Consider to use PIG and HBaseStorage to load data from HBase.

But there is
 a same part in both schemas, which represents a sequence ID.
 In 1st schema, one row contains 1KB information;
 while in 2nd schema, one row contains 10KB information.
 So the number of rows in one region in 1st schema is more than
 that in 2nd schema, right? If the queried data is based on the sequence ID,
 as one region in 1st schema is responsible for more number of rows than
 that in 2nd schema,
 there would be more computation and long execution time for the
 corresponding coprocessor.
 So in this case, if the regions are not distributed well,
 some region servers will suffer in excess workload.
 That is why I want to do some management of regions to get better load
 balance based on large queries.

 Hope it makes sense to you.

 Best Wishes
 Dan Han


 On Wed, Sep 26, 2012 at 3:19 PM, Eugeny Morozov
 emoro...@griddynamics.comwrote:

  Dan,
 
  I have additional questions.
  What is the access pattern of your queries? I mean that f.e.
 PrefixFilters
  have to be applied for all KeyValue pairs in HFiles, which could be slow.
  Or f.e. scanner setCaching option is able to decrease number of network
  hops to get data from RegionServer.
 

 I set the range of the rows and the related columns to narrow down the
 scan scope,
 and I used PrefixFilter/ColumnFilter/BinaryFilter to get the rows.
 I set a little cache (5KB), but I kept it the same for all evaluated
 data schema.
 Because I mainly focus on evaluate the performance of queries under the
 different data schemas.


  Additionally, coprocessors are able to use InternalScanner instead of
  ResultScanner, which is also could help greatly.
 

 yes, I used InternalScanner.

 
  Also, the more dimension you specify, the more precise your query is, the
  less data is about to be processed - family, columns, timeranges, etc.
 
 
  On Wed, Sep 26, 2012 at 7:39 PM, Dan Han dannahan2...@gmail.com wrote:
 
 Thanks for your swift response, Ramkrishna and Anoop. And I will
   explicate what we are doing now below.
  
  We are trying to explore a systematic way to design the appropriate
  data
   schema for various applications in HBase. So we first designed several
  data
   schemas for each dataset and evaluate them with the same queries.  The
   queries are designed based on the requirements, such as selecting the
  data
   with a matching expression, finding the difference between two
   snapshots. The queries were processed with user-level Coprocessor.
  
  In our experiments, we found that under some data schemas, the
 queries
   cannot get any results because of the connection timeout and RS crash
   sometimes. We observed that in this case, the queried data were
 centered
  in
   a few regions locating in a few region servers. We think the failure
  might
   be caused by the excess workload in these few region servers and the
   inappropriate load balance. To our best knowledge, this case can be
  avoided
   and improved by the well-distributed regions across the region servers.
  
 Therefore, we have been thinking to add a monitoring and management
   component between the client and server, which can schedule the
   queries/jobs from client side and distribute the regions dynamically
   according to the current workload of each region server, the incoming
   

Re: Does hbase 0.90 client work with 0.92 server?

2012-09-27 Thread Damien Hardy
Actually, I have an old cluster in prod with the 0.90.3 version installed
manually, and I am working on a new CDH4 cluster deployed fully automatically
with Puppet.
Since the migration is not reversible (according to the pointer given by
Jean-Daniel), I would like to keep the old cluster safe on the side, to be able
to revert the operation.
Switching from an old vanilla version to a Cloudera one is another risk
introduced in migrating the actual cluster, and I'm not feeling comfortable
with it.
My idea is to copy the data from old to new, switch the clients to the new
cluster, and I am looking for the best strategy to manage it.

A scanner based on timestamps should be enough to get the last updates
after switching (but trying to keep it short).
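For what it's worth, one sketch of such a copy that lets each cluster keep its
own client version is Export on the old side, distcp of the files, Import on
the new side (table name, paths and ports below are assumptions, not from this
thread). Export also takes optional versions/starttime/endtime arguments, which
could cover the timestamp-based catch-up scan mentioned above:

  # on the 0.90 cluster: dump the table to sequence files in HDFS
  hbase org.apache.hadoop.hbase.mapreduce.Export mytable /tmp/mytable-export
  # copy between the clusters (hftp is the usual trick across Hadoop versions)
  hadoop distcp hftp://old-namenode:50070/tmp/mytable-export \
    hdfs://new-namenode:8020/tmp/mytable-export
  # on the 0.92/CDH4 cluster: create the table with the same families, then
  hbase org.apache.hadoop.hbase.mapreduce.Import mytable /tmp/mytable-export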

Cheers,

-- 
Damien

2012/9/27 n keywal nkey...@gmail.com

 You don't have to migrate the data when you upgrade, it's done on the fly.
 But it seems you want to do something more complex? A kind of realtime
 replication between two clusters in two different versions?

 On Thu, Sep 27, 2012 at 9:56 AM, Damien Hardy dha...@viadeoteam.com
 wrote:


Re: Random Read Performance

2012-09-27 Thread Zhimao Guo
Anyone had a rough measurement of random read/write perf and throughput?

Assume typical machines/workload: the region server has 5GB for the memstore,
and further assume each key (20 bytes) has a 100-byte value (for simplicity,
just one cf, one column).
Further assume workload is against a single region on this region server.
What is the latency/throughput of write-only workload? (assume all are
update ops, no insert ops, and updates on different keys are uniformly
distributed)
What is the latency/throughput of read-only workload? (assume lookup on
different keys are uniformly distributed)

Really want to get more sense of perf number of hbase for key/value
scenario.

Sorry, further assume clients are multi-processed, multi-threaded, which
can generate large enough concurrent requests before the network is
saturated.

best,
-zhimao
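Not numbers, but for generating exactly this kind of workload YCSB is the
usual tool; a rough sketch (binding name, workload file and parameters are
assumptions about a stock YCSB checkout; 'usertable' is YCSB's default table
name and 'f1' an arbitrary family, neither is from this thread):

  # create the target table first, e.g. in the hbase shell: create 'usertable', 'f1'
  # load phase: insert the keys
  bin/ycsb load hbase -P workloads/workloadc -p columnfamily=f1 \
    -p recordcount=1000000 -s
  # run phase: workload C is 100% reads; force a uniform key distribution
  bin/ycsb run hbase -P workloads/workloadc -p columnfamily=f1 \
    -p operationcount=1000000 -p requestdistribution=uniform -p threadcount=50 -s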

On Wed, Sep 26, 2012 at 9:14 PM, Kevin O'dell kevin.od...@cloudera.comwrote:

 -scm-users scm-us...@cloudera.org
 +hbase-u...@hadoop.apache.org

 I think YCSB can handle that, but I am not sure about the 100% random part.

 On Wed, Sep 26, 2012 at 4:25 AM, Dalia Hassan daliahass...@gmail.com
 wrote:

  Hello,
 
  Could anyone help me how to measure Hbase random read performance on a
  cluster
 
  Please reply asap
 
  Thanks,
 



 --
 Kevin O'Dell
 Customer Operations Engineer, Cloudera



RE: When I create one new table, there is .oldlogs dir in region dir of the table

2012-09-27 Thread Ramkrishna.S.Vasudevan
Hi 

That is not needed; in fact it has been fixed in the latest trunk version as
part of HBASE-6327.

We can back-port the issue, I feel. Thanks for bringing this to notice.

Regards
Ram

 -Original Message-
 From: jlei liu [mailto:liulei...@gmail.com]
 Sent: Thursday, September 27, 2012 2:39 PM
 To: user@hbase.apache.org
 Subject: When I create one new table, there is .oldlogs dir in region
 dir of the table
 
 I use hbase0.94.1 version.
 
 When I create one new table,   there is .oldlogs dir in region dir of
 the
 table, example:
 drwxr-xr-x   - musa.ll supergroup  0 2012-09-27 16:41
 /hbase0.94.1/scantable1/0a0ff04f7b3db8b898a1312cc672ab24/.oldlogs
 -rw-r--r--   3 musa.ll supergroup134 2012-09-27 16:41
 /hbase0.94.1/scantable1/0a0ff04f7b3db8b898a1312cc672ab24/.oldlogs/hlog.
 1348735282237
 
  HLogs should be stored in the <hbase.rootdir>/<host,port,startcode>/ dir as
  <host>%2C<port>%2C<startcode>.<timestamp> files, and old HLogs should be
  stored in the <hbase.rootdir>/.oldlogs dir.  Why does the region dir also
  contain a .oldlogs dir?
 
 Thanks,
 
 LiuLei



Re: Hbase clustering

2012-09-27 Thread Venkateswara Rao Dokku
I started the HMaster on the hadoop-namenode. But I was not able to access
it from the hadoop-datanode. Could you please help me solve this problem
by sharing what the possible causes are.

On Thu, Sep 27, 2012 at 1:21 PM, n keywal nkey...@gmail.com wrote:

 You should launch the master only once, on whatever machine you like. Then
 you will be able to access it from any other machine.
 Please have a look at the blog I mentioned in my previous mail.


Re: Hbase clustering

2012-09-27 Thread Stas Maksimov
Rao,

Can you make sure your region server is actually running? You can use jps
command to see Java processes, or a ps ax |grep region.

Thanks,
Stas

On Thu, Sep 27, 2012 at 12:25 PM, Venkateswara Rao Dokku 
dvrao@gmail.com wrote:

 When I try to scan the table that is created by hadoop-namenode in the
 hadoop-datanode, I am getting the following error
 12/09/27 16:47:55 INFO ipc.HBaseRPC: Problem connecting to server:
 localhost/127.0.0.1:60020

 Could you please help me out in overcoming this problem.
 Thanks for replying.

 On Thu, Sep 27, 2012 at 4:02 PM, Venkateswara Rao Dokku 
 dvrao@gmail.com
  wrote:

  I started the Hmaster on the hadoop-namenode. But I was not able to
 access
  it from the hadoop-datanode. Could you please help me solving this
 problem
  by sharing what are the possibilities for this to happen.
 
 
  On Thu, Sep 27, 2012 at 1:21 PM, n keywal nkey...@gmail.com wrote:
 
  You should launch the master only once, on whatever machine you like.
 Then
  you will be able to access it from any other machine.
  Please have a look at the blog I mentioned in my previous mail.
 
  On Thu, Sep 27, 2012 at 9:39 AM, Venkateswara Rao Dokku 
  dvrao@gmail.com
   wrote:
 
   I can see that HMaster is not started on the data-node machine when
 the
   start scripts in hadoop  hbase ran on the hadoop-namenode. My doubt
 is
   that,Shall we have to start that master on the hadoop-datanode1 too or
  the
   hadoop-datanode1 will access the Hmaster that is running on the
   hadoop-namenode to create,list,scan tables as the two nodes are in the
   cluster as namenode  datanode.
  
   On Thu, Sep 27, 2012 at 1:02 PM, n keywal nkey...@gmail.com wrote:
  
Hi,
   
I would like to direct you to the reference guide, but I must
  acknowledge
that, well, it's a reference guide, hence not really easy for a
 plain
  new
start.
You should have a look at Lars' blog (and may be buy his book), and
especially this entry:
   
 http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
   
Some hints however:
- the replication occurs at the hdfs level, not the hbase level:
 hbase
writes files that are split in hdfs blocks that are replicated
 accross
   the
datanodes. If you want to check the replications, you must look at
  what
files are written by hbase and how they have been split in blocks by
  hdfs
and how these blocks have been replicated. That will be in the hdfs
interface. As a side note, it's not the easiest thing to learn when
  you
start :-)
- The error  ERROR:
  org.apache.hadoop.hbase.MasterNotRunningException:
Retried 7 times
  this is not linked to replication or whatever. It means that
 second
machine cannot find the master. You need to fix this first.
 (googling
  
checking the logs).
   
   
Good luck,
   
Nicolas
   
   
   
   
On Thu, Sep 27, 2012 at 9:07 AM, Venkateswara Rao Dokku 
dvrao@gmail.com
 wrote:
   
 How can we verify that the data(tables) is distributed across the
cluster??
 Is there a way to confirm it that the data is distributed across
 all
   the
 nodes in the cluster.?

 On Thu, Sep 27, 2012 at 12:26 PM, Venkateswara Rao Dokku 
 dvrao@gmail.com wrote:

  Hi,
  I am completely new to Hbase. I want to cluster the Hbase on
  two
  nodes.I installed hadoop,hbase on the two nodes  my conf files
  are
   as
  given below.
  *cat  conf/regionservers *
  hbase-regionserver1
  hbase-master
  *cat conf/masters *
  hadoop-namenode
  * cat conf/slaves *
  hadoop-datanode1
  *vim conf/hdfs-site.xml *
  <?xml version="1.0"?>
  <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

  <!-- Put site-specific property overrides in this file. -->

  <configuration>
  <property>
  <name>dfs.replication</name>
  <value>2</value>
  <description>Default block replication. The actual number of
  replications can be specified when the file is created. The default is
  used if replication is not specified in create time.
  </description>
  </property>
  <property>
  <name>dfs.support.append</name>
  <value>true</value>
  <description>Default block replication. The actual number of
  replications can be specified when the file is created. The default is
  used if replication is not specified in create time.
  </description>
  </property>
  </configuration>
  * finally my /etc/hosts file is *
  127.0.0.1   localhost
  127.0.0.1   oc-PowerEdge-R610
  10.2.32.48  hbase-master hadoop-namenode
  10.240.13.35 hbase-regionserver1  hadoop-datanode1
   The above files are identical on both of the machines. The
  following
are
  the processes that are running on my m/c's when I ran start
  scripts
   in
  hadoop as well as hbase
 

Re: Hbase clustering

2012-09-27 Thread Venkateswara Rao Dokku
Yes, I can see the region server running. The output of the jps command is
given below
*Hadoop-namenode:*
 HQuorumPeer
 Main
 HMaster
 HRegionServer
 SecondaryNameNode
 Jps
 NameNode
 JobTracker
hadoop-datanode1:
TaskTracker
DataNode
 Jps
 Main
NC
HRegionServer

The complete error is given below.
hbase(main):003:0> scan 't1'
ROW                         COLUMN+CELL

12/09/27 17:54:42 INFO ipc.HBaseRPC: Problem connecting to server:
localhost/127.0.0.1:60020
12/09/27 17:55:44 INFO ipc.HBaseRPC: Problem connecting to server:
localhost/127.0.0.1:60020
12/09/27 17:56:46 INFO ipc.HBaseRPC: Problem connecting to server:
localhost/127.0.0.1:60020
12/09/27 17:57:48 INFO ipc.HBaseRPC: Problem connecting to server:
localhost/127.0.0.1:60020
12/09/27 17:58:52 INFO ipc.HBaseRPC: Problem connecting to server:
localhost/127.0.0.1:60020
12/09/27 17:59:55 INFO ipc.HBaseRPC: Problem connecting to server:
localhost/127.0.0.1:60020
12/09/27 18:01:00 INFO ipc.HBaseRPC: Problem connecting to server:
localhost/127.0.0.1:60020
12/09/27 18:02:01 INFO ipc.HBaseRPC: Problem connecting to server:
localhost/127.0.0.1:60020
12/09/27 18:03:03 INFO ipc.HBaseRPC: Problem connecting to server:
localhost/127.0.0.1:60020
12/09/27 18:04:05 INFO ipc.HBaseRPC: Problem connecting to server:
localhost/127.0.0.1:60020
12/09/27 18:05:07 INFO ipc.HBaseRPC: Problem connecting to server:
localhost/127.0.0.1:60020
12/09/27 18:06:10 INFO ipc.HBaseRPC: Problem connecting to server:
localhost/127.0.0.1:60020
12/09/27 18:07:13 INFO ipc.HBaseRPC: Problem connecting to server:
localhost/127.0.0.1:60020
12/09/27 18:08:19 INFO ipc.HBaseRPC: Problem connecting to server:
localhost/127.0.0.1:60020

ERROR: java.net.SocketTimeoutException: Call to
localhost/127.0.0.1:60020 failed on socket timeout exception:
java.net.SocketTimeoutException: 60000 millis timeout while waiting for
channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/127.0.0.1:33970
remote=localhost/127.0.0.1:60020]

Here is some help for this command:
Scan a table; pass table name and optionally a dictionary of scanner
specifications.  Scanner specifications may include one or more of:
TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, TIMESTAMP, MAXLENGTH,
or COLUMNS.

If no columns are specified, all columns will be scanned.
To scan all members of a column family, leave the qualifier empty as in
'col_family:'.

The filter can be specified in two ways:
1. Using a filterString - more information on this is available in the
Filter Language document attached to the HBASE-4176 JIRA
2. Using the entire package name of the filter.

Some examples:

  hbase> scan '.META.'
  hbase> scan '.META.', {COLUMNS => 'info:regioninfo'}
  hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
  hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}
  hbase> scan 't1', {FILTER => "(PrefixFilter ('row2') AND (QualifierFilter (>=, 'binary:xyz'))) AND (TimestampsFilter ( 123, 456))"}
  hbase> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}

For experts, there is an additional option -- CACHE_BLOCKS -- which
switches block caching for the scanner on (true) or off (false).  By
default it is enabled.  Examples:

  hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}


On Thu, Sep 27, 2012 at 6:11 PM, Stas Maksimov maksi...@gmail.com wrote:

 Rao,

 Can you make sure your region server is actually running? You can use jps
 command to see Java processes, or a ps ax |grep region.

 Thanks,
 Stas

 On Thu, Sep 27, 2012 at 12:25 PM, Venkateswara Rao Dokku 
 dvrao@gmail.com wrote:

  When I try to scan the table that is created by hadoop-namenode in the
  hadoop-datanode, I am getting the following error
  12/09/27 16:47:55 INFO ipc.HBaseRPC: Problem connecting to server:
  localhost/127.0.0.1:60020
 
  Could you please help me out in overcoming this problem.
  Thanks for replying.
 
  On Thu, Sep 27, 2012 at 4:02 PM, Venkateswara Rao Dokku 
  dvrao@gmail.com
   wrote:
 
   I started the Hmaster on the hadoop-namenode. But I was not able to
  access
   it from the hadoop-datanode. Could you please help me solving this
  problem
   by sharing what are the possibilities for this to happen.
  
  
   On Thu, Sep 27, 2012 at 1:21 PM, n keywal nkey...@gmail.com wrote:
  
   You should launch the master only once, on whatever machine you like.
  Then
   you will be able to access it from any other machine.
   Please have a look at the blog I mentioned in my previous mail.
  
   On Thu, Sep 27, 2012 at 9:39 AM, Venkateswara Rao Dokku 
   dvrao@gmail.com
wrote:
  
I can see that HMaster is not started on the data-node machine when
  the
start scripts in hadoop  hbase ran on the hadoop-namenode. My doubt
  is
that,Shall we have to start that master on the hadoop-datanode1 too
 or
   the
hadoop-datanode1 will access the Hmaster that is running on the
hadoop-namenode to create,list,scan tables as the 

Region server not finding Zookeeper

2012-09-27 Thread Bai Shen
I'm setting up HBase using CDH4.

https://ccp.cloudera.com/display/CDH4DOC/HBase+Installation#HBaseInstallation-DeployingHBaseinaDistributedCluster

I installed Zookeeper on my namenode, which is also my HBase master.
hbase-master now starts and runs.  My understanding from the above guide is
that I only need the one Zookeeper node running on the namenode.  However,
my region servers aren't seeing the Zookeeper server.

What am I missing?

Thanks.


Re: Region server not finding Zookeeper

2012-09-27 Thread Mohammad Tariq
Hello Bai Shen,

It is not compulsory to run ZK on the same machine where the NN is
running. You can run it anywhere, and if that is the case you have to
specify the location of your ZK node through the hbase-site.xml file.
In fact, in real-world scenarios people create a separate ZK cluster and
avoid running ZK along with the NN on the same machine.

A possible reason for your problem could be that your RS is not able to find
the location of your ZK node. Make sure you have done the configuration
properly.
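
For example, a minimal hbase-site.xml entry on each region server would look
something like this (the host name below is just a placeholder for wherever
your ZK quorum actually runs):

  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>your-zk-host</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>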

Regards,
Mohammad Tariq



On Thu, Sep 27, 2012 at 7:37 PM, Bai Shen baishen.li...@gmail.com wrote:

 I'm setting up HBase using CDH4.


 https://ccp.cloudera.com/display/CDH4DOC/HBase+Installation#HBaseInstallation-DeployingHBaseinaDistributedCluster

 I installed Zookeeper on my namenode, which is also my HBase master.
 hbase-master now starts and runs.  My understanding from the above guide is
 that I only need the one Zookeeper node running on the namenode.  However,
 my region servers aren't seeing the Zookeeper server.

 What am I missing?

 Thanks.



Re: disable table

2012-09-27 Thread Mohit Anchlia
Thanks everyone for the input, it's helpful. I did remove the znode from
/hbase/table/SESSIONID_TIMELINE and after that I was able to list the
table. At that point I tried to do a put but when I did a put I got a
message NoRegionServer online. I looked in the logs and it says the Failed
to open region server at nodexxx. When I went to nodexxx it complains
something about unable to run testcompression.

I setup SNAPPY compression on my table and I also ran SNAPPY compression
test which was successful. Not sure what's going on in the cluster.
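
For reference, a SNAPPY check like the one I mentioned can be run with the
standard CompressionTest utility, along these lines (the file path here is
just an example):

  hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/snappy-test.txt snappy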
On Thu, Sep 27, 2012 at 1:10 AM, Mohammad Tariq donta...@gmail.com wrote:

 Hello Mohit,

 It should be /hbase/hbase/table/SESSIONID_TIMELINE..Apologies for the
 typo. For rest of the things, I feel Ramkrishna sir has provided a good and
 proper explanation. Please let us know if you still have any doubt or
 question.

 Ramkrishna.S.Vasudevan : You are welcome sir. It's my pleasure to share
 space with you people.

 Regards,
 Mohammad Tariq



 On Thu, Sep 27, 2012 at 9:59 AM, Ramkrishna.S.Vasudevan 
 ramkrishna.vasude...@huawei.com wrote:

  Hi Mohith
  First of all thanks to Tariq for his replies.
 
  Just to add on,
  Basically HBase  uses the Zookeeper to know the status of the cluster
 like
  the no of tables enabled, disabled and deleted.
  Enabled and deleted states are handled bit different in the 0.94 version.
 
  ZK is used for various region assignments.
 
  Also the ZK is used to track the Active master and standby master.
 
  As you understand correctly that the master is responsible for the
 overall
  maintenance of the no of tables and their respective states, it seeks the
  help of ZK to do it and that is where the states are persisted.
 
  Also there are few cases where the enable and disable table are having
 some
  issues due to some race conditions in the 0.92 versions, In the latest
  version we are trying to resolve them.
  You can attach the master and RS logs to identify exactly what caused
 this
  problem in your case which will be really help ful so that I can be fixed
  in
  the kernel.
 
  Regards
  Ram
 
   -Original Message-
   From: Mohit Anchlia [mailto:mohitanch...@gmail.com]
   Sent: Thursday, September 27, 2012 5:09 AM
   To: user@hbase.apache.org
   Subject: Re: disable table
  
I did /hbase/table/SESSIONID_TIMELINE and that seem to work. I'll
   restart
   hbase and see if it works.
  
   One thing I don't understand is why is zookeeper holding information
   about
   this table if it is enabled or disabled? Wouldn't this information be
   with
   master?
  
   On Wed, Sep 26, 2012 at 4:27 PM, Mohit Anchlia
   mohitanch...@gmail.comwrote:
  
I don't see path like /hbase/SESSIONID_TIMELINE
This is what I see
   
[zk: pprfdaaha303:5181(CONNECTED) 5] ls /hbase/table
[SESSIONID_TIMELINE]
[zk: pprfdaaha303:5181(CONNECTED) 6] get /hbase/table
   
cZxid = 0x100fe
ctime = Mon Sep 10 15:31:45 PDT 2012
mZxid = 0x100fe
mtime = Mon Sep 10 15:31:45 PDT 2012
pZxid = 0x508f1
cversion = 3
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 1
   
 On Wed, Sep 26, 2012 at 3:57 PM, Mohammad Tariq
   donta...@gmail.comwrote:
   
In order to delete a znode you have to go to the ZK shell and issue
   the
delete command along with the required path. For example :
delete /hbase/SESSIONID_TIMELINE. For detailed info you can visit
   the ZK
homepage at : zookeeper.apache.org
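
 A minimal ZK shell session for what is being described would look roughly
 like this (the -server address is taken from the ZK shell output quoted
 further down; note the caution elsewhere in the thread about deleting
 znodes by hand):

   $ zkCli.sh -server pprfdaaha303:5181
   [zk: pprfdaaha303:5181(CONNECTED) 0] ls /hbase/table
   [zk: pprfdaaha303:5181(CONNECTED) 1] delete /hbase/table/SESSIONID_TIMELINE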
   
Actually when we try to fetch data from an Hbase table, the client
   or app
first contacts the ZK to get the location of server holding the
-ROOT- table. From this we come to know about the server hosting the
.META.
table. This tells us the location of the server which actually holds
   the
rows of interest. Because of some reasons the znode which was
   holding this
info has either faced some catastrophe or lost the info associated
   with
this particular table. Or sometimes the znode remains unable to keep
itself
updated with the latest changes. That could also be a probable
   reason. We
should always keep in mind that ZK is the centralized service that
actually
coordinating everything behind the scene. As a result, any problem
   to the
ZK quorum means problem with Hbase custer.
   
Regards,
Mohammad Tariq
   
   
   
On Thu, Sep 27, 2012 at 3:39 AM, Mohit Anchlia
   mohitanch...@gmail.com
wrote:
   
 Thanks! I do see Inconsistency. How do I remove the znode. And
   also
could
 you please help me understand how this might have happened?


 ERROR: Region

   SESSIONID_TIMELINE,,1348689726526.0e200aace5e81cead8d8714ed8076050. not
 deployed on any region server.


 On Wed, Sep 26, 2012 at 2:36 PM, Mohammad Tariq
   donta...@gmail.com
 wrote:

  A possible reason could be that the znode associated with this

Re: disable table

2012-09-27 Thread rajesh babu chintaguntla
Hi Mohit,

We should not delete znodes manually; doing so causes inconsistencies, e.g. a
region may be shown as online on the master, but it won't be online on any
region server. That's why the put is failing in your case. A master restart
will bring your cluster back to a normal state (recovering any failures in
enable/disable). Even hbck won't solve this problem.

FYI,
there is an ongoing discussion of this issue. You can follow the JIRA
associated with it at
https://issues.apache.org/jira/browse/HBASE-6469

 On Thu, Sep 27, 2012 at 8:11 PM, Mohit Anchlia mohitanch...@gmail.comwrote:

 Thanks everyone for the input, it's helpful. I did remove the znode from
 /hbase/table/SESSIONID_TIMELINE and after that I was able to list the
 table. At that point I tried to do a put but when I did a put I got a
 message NoRegionServer online. I looked in the logs and it says the Failed
 to open region server at nodexxx. When I went to nodexxx it complains
 something about unable to run testcompression.

 I setup SNAPPY compression on my table and I also ran SNAPPY compression
 test which was successful. Not sure what's going on in the cluster.
 On Thu, Sep 27, 2012 at 1:10 AM, Mohammad Tariq donta...@gmail.com
 wrote:

  Hello Mohit,
 
  It should be /hbase/hbase/table/SESSIONID_TIMELINE..Apologies for the
  typo. For rest of the things, I feel Ramkrishna sir has provided a good
 and
  proper explanation. Please let us know if you still have any doubt or
  question.
 
  Ramkrishna.S.Vasudevan : You are welcome sir. It's my pleasure to share
  space with you people.
 
  Regards,
  Mohammad Tariq
 
 
 
  On Thu, Sep 27, 2012 at 9:59 AM, Ramkrishna.S.Vasudevan 
  ramkrishna.vasude...@huawei.com wrote:
 
   Hi Mohith
   First of all thanks to Tariq for his replies.
  
   Just to add on,
   Basically HBase  uses the Zookeeper to know the status of the cluster
  like
   the no of tables enabled, disabled and deleted.
   Enabled and deleted states are handled bit different in the 0.94
 version.
  
   ZK is used for various region assignments.
  
   Also the ZK is used to track the Active master and standby master.
  
   As you understand correctly that the master is responsible for the
  overall
   maintenance of the no of tables and their respective states, it seeks
 the
   help of ZK to do it and that is where the states are persisted.
  
   Also there are few cases where the enable and disable table are having
  some
   issues due to some race conditions in the 0.92 versions, In the latest
   version we are trying to resolve them.
   You can attach the master and RS logs to identify exactly what caused
  this
   problem in your case which will be really help ful so that I can be
 fixed
   in
   the kernel.
  
   Regards
   Ram
  
-Original Message-
From: Mohit Anchlia [mailto:mohitanch...@gmail.com]
Sent: Thursday, September 27, 2012 5:09 AM
To: user@hbase.apache.org
Subject: Re: disable table
   
 I did /hbase/table/SESSIONID_TIMELINE and that seem to work. I'll
restart
hbase and see if it works.
   
One thing I don't understand is why is zookeeper holding information
about
this table if it is enabled or disabled? Wouldn't this information be
with
master?
   
On Wed, Sep 26, 2012 at 4:27 PM, Mohit Anchlia
mohitanch...@gmail.comwrote:
   
 I don't see path like /hbase/SESSIONID_TIMELINE
 This is what I see

 [zk: pprfdaaha303:5181(CONNECTED) 5] ls /hbase/table
 [SESSIONID_TIMELINE]
 [zk: pprfdaaha303:5181(CONNECTED) 6] get /hbase/table

 cZxid = 0x100fe
 ctime = Mon Sep 10 15:31:45 PDT 2012
 mZxid = 0x100fe
 mtime = Mon Sep 10 15:31:45 PDT 2012
 pZxid = 0x508f1
 cversion = 3
 dataVersion = 0
 aclVersion = 0
 ephemeralOwner = 0x0
 dataLength = 0
 numChildren = 1

  On Wed, Sep 26, 2012 at 3:57 PM, Mohammad Tariq
donta...@gmail.comwrote:

 In order to delete a znode you have to go to the ZK shell and
 issue
the
 delete command along with the required path. For example :
 delete /hbase/SESSIONID_TIMELINE. For detailed info you can visit
the ZK
 homepage at : zookeeper.apache.org

 Actually when we try to fetch data from an Hbase table, the client
or app
 first contacts the ZK to get the location of server holding the
 -ROOT- table. From this we come to know about the server hosting
 the
 .META.
 table. This tells us the location of the server which actually
 holds
the
 rows of interest. Because of some reasons the znode which was
holding this
 info has either faced some catastrophe or lost the info associated
with
 this particular table. Or sometimes the znode remains unable to
 keep
 itself
 updated with the latest changes. That could also be a probable
reason. We
 should always keep in mind that ZK is the centralized service that
 actually
 coordinating everything behind the 

Re: disable table

2012-09-27 Thread Mohit Anchlia
I did restart the entire cluster and still that didn't help. Looks like once I
get into this race condition there is no way to come out of it?

On Thu, Sep 27, 2012 at 8:00 AM, rajesh babu chintaguntla 
chrajeshbab...@gmail.com wrote:

 Hi Mohit,

 We should not delete znode's manually which will cause inconsistencies like
 region may be shown as online on master, but it wont be on region server.
 That's put is failing in your case. Master restart will bring back your
 cluster to normal state(recovery any failures in enable/disable). Even hbck
 also wont solve this problem.

 FYI,
 Presently discussion is going on this issue. You can follow jira associated
 with this issue at
 https://issues.apache.org/jira/browse/HBASE-6469

  On Thu, Sep 27, 2012 at 8:11 PM, Mohit Anchlia mohitanch...@gmail.com
 wrote:

  Thanks everyone for the input, it's helpful. I did remove the znode from
  /hbase/table/SESSIONID_TIMELINE and after that I was able to list the
  table. At that point I tried to do a put but when I did a put I got a
  message NoRegionServer online. I looked in the logs and it says the
 Failed
  to open region server at nodexxx. When I went to nodexxx it complains
  something about unable to run testcompression.
 
  I setup SNAPPY compression on my table and I also ran SNAPPY compression
  test which was successful. Not sure what's going on in the cluster.
  On Thu, Sep 27, 2012 at 1:10 AM, Mohammad Tariq donta...@gmail.com
  wrote:
 
   Hello Mohit,
  
   It should be /hbase/hbase/table/SESSIONID_TIMELINE..Apologies for
 the
   typo. For rest of the things, I feel Ramkrishna sir has provided a good
  and
   proper explanation. Please let us know if you still have any doubt or
   question.
  
   Ramkrishna.S.Vasudevan : You are welcome sir. It's my pleasure to share
   space with you people.
  
   Regards,
   Mohammad Tariq
  
  
  
   On Thu, Sep 27, 2012 at 9:59 AM, Ramkrishna.S.Vasudevan 
   ramkrishna.vasude...@huawei.com wrote:
  
Hi Mohith
First of all thanks to Tariq for his replies.
   
Just to add on,
Basically HBase  uses the Zookeeper to know the status of the cluster
   like
the no of tables enabled, disabled and deleted.
Enabled and deleted states are handled bit different in the 0.94
  version.
   
ZK is used for various region assignments.
   
Also the ZK is used to track the Active master and standby master.
   
As you understand correctly that the master is responsible for the
   overall
maintenance of the no of tables and their respective states, it seeks
  the
help of ZK to do it and that is where the states are persisted.
   
Also there are few cases where the enable and disable table are
 having
   some
issues due to some race conditions in the 0.92 versions, In the
 latest
version we are trying to resolve them.
You can attach the master and RS logs to identify exactly what caused
   this
problem in your case which will be really help ful so that I can be
  fixed
in
the kernel.
   
Regards
Ram
   
 -Original Message-
 From: Mohit Anchlia [mailto:mohitanch...@gmail.com]
 Sent: Thursday, September 27, 2012 5:09 AM
 To: user@hbase.apache.org
 Subject: Re: disable table

  I did /hbase/table/SESSIONID_TIMELINE and that seem to work. I'll
 restart
 hbase and see if it works.

 One thing I don't understand is why is zookeeper holding
 information
 about
 this table if it is enabled or disabled? Wouldn't this information
 be
 with
 master?

 On Wed, Sep 26, 2012 at 4:27 PM, Mohit Anchlia
 mohitanch...@gmail.comwrote:

  I don't see path like /hbase/SESSIONID_TIMELINE
  This is what I see
 
  [zk: pprfdaaha303:5181(CONNECTED) 5] ls /hbase/table
  [SESSIONID_TIMELINE]
  [zk: pprfdaaha303:5181(CONNECTED) 6] get /hbase/table
 
  cZxid = 0x100fe
  ctime = Mon Sep 10 15:31:45 PDT 2012
  mZxid = 0x100fe
  mtime = Mon Sep 10 15:31:45 PDT 2012
  pZxid = 0x508f1
  cversion = 3
  dataVersion = 0
  aclVersion = 0
  ephemeralOwner = 0x0
  dataLength = 0
  numChildren = 1
 
   On Wed, Sep 26, 2012 at 3:57 PM, Mohammad Tariq
 donta...@gmail.comwrote:
 
  In order to delete a znode you have to go to the ZK shell and
  issue
 the
  delete command along with the required path. For example :
  delete /hbase/SESSIONID_TIMELINE. For detailed info you can
 visit
 the ZK
  homepage at : zookeeper.apache.org
 
  Actually when we try to fetch data from an Hbase table, the
 client
 or app
  first contacts the ZK to get the location of server holding the
  -ROOT- table. From this we come to know about the server hosting
  the
  .META.
  table. This tells us the location of the server which actually
  holds
 the
  rows of interest. Because of some reasons the znode which was
 holding this

Re: Region server not finding Zookeeper

2012-09-27 Thread Bai Shen
What property do I set in hbase-site.xml?  That's what I'm having trouble
finding.

Thanks.

On Thu, Sep 27, 2012 at 10:30 AM, Mohammad Tariq donta...@gmail.com wrote:

 Hello Bai Shen,

 It is not a compulsion to run ZK on the same machine where NN is
 running. You can run it anywhere and if this is the case you have to
 specify the location of you ZK node through the hbase-site.xml file.
 Infact, in real world scenarios people create a separate ZK cluster and
 avoid running ZK along with NN on the same machine.

 A possible reason of your problem could be that your RS is not able to find
 the location of your ZK node. Make sure you have done the configuration
 properly.

 Regards,
 Mohammad Tariq



 On Thu, Sep 27, 2012 at 7:37 PM, Bai Shen baishen.li...@gmail.com wrote:

  I'm setting up HBase using CDH4.
 
 
 
 https://ccp.cloudera.com/display/CDH4DOC/HBase+Installation#HBaseInstallation-DeployingHBaseinaDistributedCluster
 
  I installed Zookeeper on my namenode, which is also my HBase master.
  hbase-master now starts and runs.  My understanding from the above guide
 is
  that I only need the one Zookeeper node running on the namenode.
  However,
  my region servers aren't seeing the Zookeeper server.
 
  What am I missing?
 
  Thanks.
 



Re: Region server not finding Zookeeper

2012-09-27 Thread Bai Shen
NM. Turned out that I had screwed up the property setting.  Everything is
working now.

Thanks.

On Thu, Sep 27, 2012 at 1:28 PM, Bai Shen baishen.li...@gmail.com wrote:

 What property do I set in hbase-site.xml?  That's what I'm having trouble
 finding.

 Thanks.


 On Thu, Sep 27, 2012 at 10:30 AM, Mohammad Tariq donta...@gmail.comwrote:

 Hello Bai Shen,

 It is not a compulsion to run ZK on the same machine where NN is
 running. You can run it anywhere and if this is the case you have to
 specify the location of you ZK node through the hbase-site.xml file.
 Infact, in real world scenarios people create a separate ZK cluster and
 avoid running ZK along with NN on the same machine.

 A possible reason of your problem could be that your RS is not able to
 find
 the location of your ZK node. Make sure you have done the configuration
 properly.

 Regards,
 Mohammad Tariq



 On Thu, Sep 27, 2012 at 7:37 PM, Bai Shen baishen.li...@gmail.com
 wrote:

  I'm setting up HBase using CDH4.
 
 
 
 https://ccp.cloudera.com/display/CDH4DOC/HBase+Installation#HBaseInstallation-DeployingHBaseinaDistributedCluster
 
  I installed Zookeeper on my namenode, which is also my HBase master.
  hbase-master now starts and runs.  My understanding from the above
 guide is
  that I only need the one Zookeeper node running on the namenode.
  However,
  my region servers aren't seeing the Zookeeper server.
 
  What am I missing?
 
  Thanks.
 





Getting scans to timeout

2012-09-27 Thread Espinoza,Carlos
Hi

Thanks for your help. I've been doing this in a pseudo-distributed
hbase-0.92.1 environment with one region server. I'm trying to scan a
table and see it timeout. I'm trying to recreate a scenario where the RS
is not responding (for instance due to NIC failure). So I've been
issuing a 'kill -STOP' to the region server, and I expected the client
to timeout but instead it just blocks at HTable.getScanner(). There is
no output, no retries, nothing. I understand that I'm pausing the
execution on the region server, but from a client perspective, I'm
thinking that this should not matter.

 

My question is, is this a fair test? And if it is, any idea on how I can
get it to not block? I've been playing around with client side settings,
but no success. I've tried these settings (10sec)

 

conf.setInt("hbase.rpc.timeout", 10000);

conf.setInt("hbase.client.operation.timeout", 10000);

I've also tried these

HBaseClient.setSocketTimeout(this.conf, 10000);

HBaseClient.setPingInterval(this.conf, 10000);

 

This is the jstack output of my application after I STOP the region
server

 

"main" prio=10 tid=0x5c812000 nid=0x594e in Object.wait()
[0x410c4000]

   java.lang.Thread.State: WAITING (on object monitor)

  at java.lang.Object.wait(Native Method)

  - waiting on 0x2aaae205ee80 (a
org.apache.hadoop.hbase.ipc.HBaseClient$Call)

  at java.lang.Object.wait(Object.java:485)

  at
org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:904)

  - locked 0x2aaae205ee80 (a
org.apache.hadoop.hbase.ipc.HBaseClient$Call)

  at
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpc
Engine.java:150)

  at $Proxy4.openScanner(Unknown Source)

  at
org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallab
le.java:120)

  at
org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java
:76)

  at
org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java
:39)

  at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementat
ion.getRegionServerWithRetries(HConnectionManager.java:1325)

  at
org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.j
ava:1246)

  at
org.apache.hadoop.hbase.client.HTable$ClientScanner.initialize(HTable.ja
va:1169)

  at
org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:670)

  at
org.apache.hadoop.hbase.client.HTablePool$PooledHTable.getScanner(HTable
Pool.java:381)

  at
org.oclc.higgins.hbase.util.HBaseUtils.getHBaseRegions(HBaseUtils.java:9
5)

  at
org.oclc.higgins.hbase.snoop.Snoop.getCatalogRowsGroupedByRegionServer(S
noop.java:392)

  at org.oclc.higgins.hbase.snoop.Snoop.watch(Snoop.java:318)

  at org.oclc.higgins.hbase.snoop.Snoop.main(Snoop.java:278)

  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

  at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.jav
a:39)

  at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor
Impl.java:25)

  at java.lang.reflect.Method.invoke(Method.java:597)

  at org.apache.hadoop.util.RunJar.main(RunJar.java:197)

 



RE: Getting scans to timeout

2012-09-27 Thread Espinoza,Carlos
Including dev mailing list. So I let it run, and after about 43 minutes
I finally got some exceptions (Sorry for the long paste)

 

org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
attempts=10, exceptions:

Thu Sep 27 14:59:29 EDT 2012,
org.apache.hadoop.hbase.client.ScannerCallable@6c6742d0,
java.net.SocketTimeoutException: Call to
finddev07.dev.oclc.org/192.168.215.7:29319 failed on socket timeout
exception: java.net.SocketTimeoutException: 1 millis timeout while
waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/192.168.215.7:42711
remote=finddev07.dev.oclc.org/192.168.215.7:29319]

Thu Sep 27 15:01:49 EDT 2012,
org.apache.hadoop.hbase.client.ScannerCallable@6c6742d0,
java.net.SocketTimeoutException: Call to
finddev07.dev.oclc.org/192.168.215.7:29319 failed on socket timeout
exception: java.net.SocketTimeoutException: 1 millis timeout while
waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/192.168.215.7:55858
remote=finddev07.dev.oclc.org/192.168.215.7:29319]

Thu Sep 27 15:04:09 EDT 2012,
org.apache.hadoop.hbase.client.ScannerCallable@6c6742d0,
java.net.SocketTimeoutException: Call to
finddev07.dev.oclc.org/192.168.215.7:29319 failed on socket timeout
exception: java.net.SocketTimeoutException: 1 millis timeout while
waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/192.168.215.7:40588
remote=finddev07.dev.oclc.org/192.168.215.7:29319]

Thu Sep 27 15:06:29 EDT 2012,
org.apache.hadoop.hbase.client.ScannerCallable@6c6742d0,
java.net.SocketTimeoutException: Call to
finddev07.dev.oclc.org/192.168.215.7:29319 failed on socket timeout
exception: java.net.SocketTimeoutException: 1 millis timeout while
waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/192.168.215.7:51399
remote=finddev07.dev.oclc.org/192.168.215.7:29319]

Thu Sep 27 15:08:50 EDT 2012,
org.apache.hadoop.hbase.client.ScannerCallable@6c6742d0,
java.net.SocketTimeoutException: Call to
finddev07.dev.oclc.org/192.168.215.7:29319 failed on socket timeout
exception: java.net.SocketTimeoutException: 1 millis timeout while
waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/192.168.215.7:41546
remote=finddev07.dev.oclc.org/192.168.215.7:29319]

Thu Sep 27 15:11:11 EDT 2012,
org.apache.hadoop.hbase.client.ScannerCallable@6c6742d0,
java.net.SocketTimeoutException: Call to
finddev07.dev.oclc.org/192.168.215.7:29319 failed on socket timeout
exception: java.net.SocketTimeoutException: 1 millis timeout while
waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/192.168.215.7:43072
remote=finddev07.dev.oclc.org/192.168.215.7:29319]

Thu Sep 27 15:13:34 EDT 2012,
org.apache.hadoop.hbase.client.ScannerCallable@6c6742d0,
java.net.SocketTimeoutException: Call to
finddev07.dev.oclc.org/192.168.215.7:29319 failed on socket timeout
exception: java.net.SocketTimeoutException: 1 millis timeout while
waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/192.168.215.7:43809
remote=finddev07.dev.oclc.org/192.168.215.7:29319]

Thu Sep 27 15:15:58 EDT 2012,
org.apache.hadoop.hbase.client.ScannerCallable@6c6742d0,
java.net.SocketTimeoutException: Call to
finddev07.dev.oclc.org/192.168.215.7:29319 failed on socket timeout
exception: java.net.SocketTimeoutException: 1 millis timeout while
waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/192.168.215.7:53426
remote=finddev07.dev.oclc.org/192.168.215.7:29319]

Thu Sep 27 15:18:25 EDT 2012,
org.apache.hadoop.hbase.client.ScannerCallable@6c6742d0,
java.net.SocketTimeoutException: Call to
finddev07.dev.oclc.org/192.168.215.7:29319 failed on socket timeout
exception: java.net.SocketTimeoutException: 1 millis timeout while
waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/192.168.215.7:33724
remote=finddev07.dev.oclc.org/192.168.215.7:29319]

Thu Sep 27 15:21:00 EDT 2012,
org.apache.hadoop.hbase.client.ScannerCallable@6c6742d0,
java.net.SocketTimeoutException: Call to
finddev07.dev.oclc.org/192.168.215.7:29319 failed on socket timeout
exception: java.net.SocketTimeoutException: 1 millis timeout while
waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/192.168.215.7:51604
remote=finddev07.dev.oclc.org/192.168.215.7:29319]

 

   at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementat
ion.getRegionServerWithRetries(HConnectionManager.java:1345)

   at
org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.j
ava:1246)

   at
org.apache.hadoop.hbase.client.HTable$ClientScanner.initialize(HTable.ja
va:1169)

   at
org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:670)

   

HBase and Lily?

2012-09-27 Thread Jason Huang
Hello,

I am exploring HBase & Lily and I have a few starter questions, hoping
to get some help from users in this group who have tried that before:

(1) Do I need to post all the HBase table contents to Lily (treat Lily
as another DataStore) in order to enable the index and search
functionality? If so, that's going to be another big storage issue
(and duplicate storage?)

(2) Should I always only allow one-way updates from Clients -> HBase ->
Lily? I want to use HBase as the data store and use Lily only as a
plug-in tool to help search. I want HBase to only accept updates from
Clients (not from Lily). Is there any update from Lily to HBase
required (in order to enable the search and index functionality)?

(3) Since Lily is not an Apache project - do you know if it's under
Apache 2.0 license? We may need to extend it with our own APIs. Do we
have to give our APIs back to them? We love sharing but some of our
APIs may be under different agreements and can't be shared.

thanks!

Jason


Re: Distribution of regions to servers

2012-09-27 Thread Dan Han
Thanks for your advice, Eugeny.

Best Wishes
Dan Han

On Thu, Sep 27, 2012 at 2:34 AM, Eugeny Morozov
emoro...@griddynamics.comwrote:

 Dan, see inlined.

 On Thu, Sep 27, 2012 at 5:30 AM, Dan Han dannahan2...@gmail.com wrote:

  Hi, Eugeny ,
 
 Thanks for your response. I answered your questions inline in Blue.
  And I'd like to give an example to describe my problem.
 
  Let's think about two data schemas for the same dataset.
  The two data schemas have different composite row keys.


 Just the first idea: if you have different schemas, then it would be much
 simpler to have two different tables with these schemas, because in that
 case HBase itself automatically distributes each table's regions evenly
 across the cluster. You could actually use the same coprocessor for both of
 the tables.

 In case you're using two different column families, you could specify a
 different BLOCKSIZE (the default value is '65536'). You could set this
 option to differ by a factor of 10 between the CFs (matching the size
 difference between your schemas). I believe this would decrease the number
 of reads for the larger data chunks.
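
 A rough hbase shell sketch of that idea (table and family names are made up;
 the second family gets a block size 10x the default):

   hbase> create 'mytable', {NAME => 'small_cf', BLOCKSIZE => '65536'}, {NAME => 'big_cf', BLOCKSIZE => '655360'}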

 In general it is actually not good to have two (or more) column families of
 really different sizes, because compaction and flushing are done per region,
 which means that if HBase starts compacting the small column family it will
 do the same for the big one.
 http://hbase.apache.org/book.html#number.of.cfs

 BTW, I don't think coprocessors are a good choice for data mining. The
 reason is that it is kind of dangerous: since coprocessors are server-side
 creatures - they live in the RegionServer - they can simply bring the whole
 system down. Expensive analysis creates heap and CPU pressure, which in turn
 leads to GC pauses and even more CPU pressure.

 Consider using Pig and HBaseStorage to load data from HBase.

 But there is
  a same part in both schemas, which represents a sequence ID.
  In 1st schema, one row contains 1KB information;
  while in 2nd schema, one row contains 10KB information.
  So the number of rows in one region in 1st schema is more than
  that in 2nd schema, right? If the queried data is based on the sequence
 ID,
  as one region in 1st schema is responsible for more number of rows than
  that in 2nd schema,
  there would be more computation and long execution time for the
  corresponding coprocessor.
  So in this case, if the regions are not distributed well,
  some region servers will suffer in excess workload.
  That is why I want to do some management of regions to get better load
  balance based on large queries.
 
  Hope it makes sense to you.
 
  Best Wishes
  Dan Han
 
 
  On Wed, Sep 26, 2012 at 3:19 PM, Eugeny Morozov
  emoro...@griddynamics.comwrote:
 
   Dan,
  
   I have additional questions.
   What is the access pattern of your queries? I mean that f.e.
  PrefixFilters
   have to be applied for all KeyValue pairs in HFiles, which could be
 slow.
   Or f.e. scanner setCaching option is able to decrease number of network
   hops to get data from RegionServer.
  
 
  I set the range of the rows and the related columns to narrow down
 the
  scan scope,
  and I used PrefixFilter/ColumnFilter/BinaryFilter to get the rows.
  I set a little cache (5KB), but I kept it the same for all evaluated
  data schema.
  Because I mainly focus on evaluate the performance of queries under
 the
  different data schemas.
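
  For reference, the kind of scan I am describing looks roughly like this
  (table, family and row-key names here are placeholders, not our real
  schema):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.PrefixFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ScanSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");             // placeholder table name
        Scan scan = new Scan(Bytes.toBytes("seq00100"),         // start row (placeholder)
                             Bytes.toBytes("seq00200"));        // stop row (placeholder)
        scan.addFamily(Bytes.toBytes("cf"));                    // restrict to one family
        scan.setFilter(new PrefixFilter(Bytes.toBytes("seq001")));  // row-key prefix filter
        scan.setCaching(100);                                   // rows fetched per RPC (arbitrary value)
        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result r : scanner) {
            System.out.println(Bytes.toString(r.getRow()));
          }
        } finally {
          scanner.close();
          table.close();
        }
      }
    }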
 
 
   Additionally, coprocessors are able to use InternalScanner instead of
   ResultScanner, which is also could help greatly.
  
 
  yes, I used InternalScanner.
 
  
   Also, the more dimension you specify, the more precise your query is,
 the
   less data is about to be processed - family, columns, timeranges, etc.
  
  
   On Wed, Sep 26, 2012 at 7:39 PM, Dan Han dannahan2...@gmail.com
 wrote:
  
  Thanks for your swift response, Ramkrishna and Anoop. And I will
explicate what we are doing now below.
   
   We are trying to explore a systematic way to design the
 appropriate
   data
schema for various applications in HBase. So we first designed
 several
   data
schemas for each dataset and evaluate them with the same queries.
  The
queries are designed based on the requirements, such as selecting the
   data
with a matching expression, finding the difference between two
snapshots. The queries were processed with user-level Coprocessor.
   
   In our experiments, we found that under some data schemas, the
  queries
cannot get any results because of the connection timeout and RS crash
sometimes. We observed that in this case, the queried data were
  centered
   in
a few regions locating in a few region servers. We think the failure
   might
be caused by the excess workload in these few region servers and the
inappropriate load balance. To our best knowledge, this case can be
   avoided
and improved by the well-distributed regions across the region
 servers.
   
  Therefore, we have been 

Re: Distribution of regions to servers

2012-09-27 Thread Dan Han
Hi Ramkrishna,

  I think relocating regions should be based on the queries and the queried
data. The relocation can scatter the regions involved in a query across the
region servers, which might give large queries better load balance. For small
queries, the distribution of regions can also impact the throughput.

To this point, I actually have a question here: can a region be redundant?
For example, can there be two regions which are responsible for the same
range of data?

I don't quite understand this: "when you go with coprocessor on a collocated
regions, the caching and rpc timeout needs to be set accordingly."
Could you please explain it further? Thanks in advance.

Best Wishes
Dan Han


On Wed, Sep 26, 2012 at 10:49 PM, Ramkrishna.S.Vasudevan 
ramkrishna.vasude...@huawei.com wrote:

 Just trying out here,

 Is it possible for you to collocate the region of the 1st schema and the
 region of the 2nd schema so that overall the total query execution happens
 on single RS and there is not much
 IO.
 Also when you go with coprocessor on a collocated regions, the caching and
 rpc timeout needs to be set accordingly.

 Regards
 Ram
  -Original Message-
  From: Dan Han [mailto:dannahan2...@gmail.com]
  Sent: Thursday, September 27, 2012 7:00 AM
  To: user@hbase.apache.org
  Subject: Re: Distribution of regions to servers
 
  Hi, Eugeny ,
 
 Thanks for your response. I answered your questions inline in Blue.
  And I'd like to give an example to describe my problem.
 
  Let's think about two data schemas for the same dataset.
  The two data schemas have different composite row keys. But there is
  a same part in both schemas, which represents a sequence ID.
  In 1st schema, one row contains 1KB information;
  while in 2nd schema, one row contains 10KB information.
  So the number of rows in one region in 1st schema is more than
  that in 2nd schema, right? If the queried data is based on the sequence
  ID,
  as one region in 1st schema is responsible for more number of rows than
  that in 2nd schema,
  there would be more computation and long execution time for the
  corresponding coprocessor.
  So in this case, if the regions are not distributed well,
  some region servers will suffer in excess workload.
  That is why I want to do some management of regions to get better load
  balance based on large queries.
 
  Hope it makes sense to you.
 
  Best Wishes
  Dan Han
 
 
  On Wed, Sep 26, 2012 at 3:19 PM, Eugeny Morozov
  emoro...@griddynamics.comwrote:
 
   Dan,
  
   I have additional questions.
   What is the access pattern of your queries? I mean that f.e.
  PrefixFilters
   have to be applied for all KeyValue pairs in HFiles, which could be
  slow.
   Or f.e. scanner setCaching option is able to decrease number of
  network
   hops to get data from RegionServer.
  
 
  I set the range of the rows and the related columns to narrow down
  the
  scan scope,
  and I used PrefixFilter/ColumnFilter/BinaryFilter to get the rows.
  I set a little cache (5KB), but I kept it the same for all
  evaluated
  data schema.
  Because I mainly focus on evaluate the performance of queries under
  the
  different data schemas.
 
 
   Additionally, coprocessors are able to use InternalScanner instead of
   ResultScanner, which is also could help greatly.
  
 
  yes, I used InternalScanner.
 
  
   Also, the more dimension you specify, the more precise your query is,
  the
   less data is about to be processed - family, columns, timeranges,
  etc.
  
  
   On Wed, Sep 26, 2012 at 7:39 PM, Dan Han dannahan2...@gmail.com
  wrote:
  
  Thanks for your swift response, Ramkrishna and Anoop. And I will
explicate what we are doing now below.
   
   We are trying to explore a systematic way to design the
  appropriate
   data
schema for various applications in HBase. So we first designed
  several
   data
schemas for each dataset and evaluate them with the same queries.
  The
queries are designed based on the requirements, such as selecting
  the
   data
with a matching expression, finding the difference between two
snapshots. The queries were processed with user-level Coprocessor.
   
   In our experiments, we found that under some data schemas, the
  queries
cannot get any results because of the connection timeout and RS
  crash
sometimes. We observed that in this case, the queried data were
  centered
   in
a few regions locating in a few region servers. We think the
  failure
   might
be caused by the excess workload in these few region servers and
  the
inappropriate load balance. To our best knowledge, this case can be
   avoided
and improved by the well-distributed regions across the region
  servers.
   
  Therefore, we have been thinking to add a monitoring and
  management
component between the client and server, which can schedule the
queries/jobs from client side and distribute the regions
  dynamically
according to the current 

RE: Problem with Hadoop and /etc/hosts file

2012-09-27 Thread Artem Ervits
I confirm, once I removed the localhost entry, HBase started working.

My hosts file now contains only:
x.x.x.1 Machine1
x.x.x.2 Machine2
x.x.x.3 Machine3
x.x.x.N MachineN



-Original Message-
From: Artem Ervits [mailto:are9...@nyp.org] 
Sent: Friday, September 21, 2012 11:50 AM
To: 'user@hbase.apache.org'
Subject: Re: Problem with Hadoop and /etc/hosts file

I removed the reference to 127.0.0.1 from every node. Hadoop started as 
necessary and I didn't test hbase yet.



Artem Ervits
Data Analyst
New York Presbyterian Hospital

- Original Message -
From: Alberto Cordioli [mailto:cordioli.albe...@gmail.com]
Sent: Friday, September 21, 2012 10:56 AM
To: user@hbase.apache.org user@hbase.apache.org
Subject: Re: Problem with Hadoop and /etc/hosts file

Artem, it's the exact problem I have.
How did you solve it?

Alberto

On 21 September 2012 14:18, Artem Ervits are9...@nyp.org wrote:
 Actually, it is an hbase question. I faced same issue when I was testing 
 recovery with hadoop 1 and I had a semi finished hbase cluster setup. The 
 start up guide for hbase says to add 127.0.0.1 so when I did that on another 
 node and started hadoop using that node as name node, hadoop would only see 
 that node running on localhost and not on the static ip. The datanodes would 
 not see the name node either. It is kind of confusing that hadoop wants it 
 one way and hbase requires it the way hadoop won't work.


 Artem Ervits
 Data Analyst
 New York Presbyterian Hospital

 - Original Message -
 From: Stack [mailto:st...@duboce.net]
 Sent: Tuesday, September 18, 2012 01:08 PM
 To: user@hbase.apache.org user@hbase.apache.org
 Subject: Re: Problem with Hadoop and /etc/hosts file

 On Tue, Sep 18, 2012 at 12:01 AM, Alberto Cordioli 
 cordioli.albe...@gmail.com wrote:
 Sorry, maybe I didn't explain well.
 I don't know hot to set up rDNS. I'd just know if this problem could 
 generate the error I reported in the first post (since I get in any 
 case the correct results).


 No need to apologize.  I'm just suggesting that probably better lists 
 for figuring how to set up your networking.  You could start with 
 this? http://en.wikipedia.org/wiki/Reverse_DNS_lookup

 St.Ack


 







--
Alberto Cordioli











Re: HBase and Lily?

2012-09-27 Thread Deepak Vohra
Lily is based on HBase. 

--- On Thu, 9/27/12, Jason Huang jason.hu...@icare.com wrote:

From: Jason Huang jason.hu...@icare.com
Subject: HBase and Lily?
To: user@hbase.apache.org
Date: Thursday, September 27, 2012, 1:58 PM

Hello,

I am exploring HBase  Lily and I have a few starter questions hoping
to get some help from users in this group who had tried that before:

(1) Do I need to post all the HBase table contents to Lily (treat Lily
as another DataStore) in order to enable the index and search
functionality? If so, that's going to be another big storage issue
(and duplicate storage?)

(2) Should I always only allow one way update from Clients - HBase -
Lily? I want to use HBase as the data store and use Lily only as a
plug-in tool to help search. I want HBase to only accept updates from
Clients (not from Lily). Is there any update from Lily to HBase
required (in order to enable the search and index functionality)?

(3) Since Lily is not an Apache project - do you know if it's under
Apache 2.0 license? We may need to extend it with our own APIs. Do we
have to give our APIs back to them? We love sharing but some of our
APIs may be under different agreements and can't be shared.

thanks!

Jason


RE: disable table

2012-09-27 Thread Ramkrishna.S.Vasudevan
Hi Mohith

Before restarting again, just disable the compression (i.e. set it back to the
default) and restart the cluster.  This is just to ensure that the cluster is
able to come back from the enable/disable problem.
  The Snappy problem could be different.  I am suggesting this so that we
can isolate whether the problem is caused by the compression.
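
Something along these lines in the shell (the column family name 'cf' below is
just a placeholder for whatever family was created with SNAPPY):

  hbase> disable 'SESSIONID_TIMELINE'
  hbase> alter 'SESSIONID_TIMELINE', {NAME => 'cf', COMPRESSION => 'NONE'}
  hbase> enable 'SESSIONID_TIMELINE'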

Regards
Ram

 -Original Message-
 From: Mohit Anchlia [mailto:mohitanch...@gmail.com]
 Sent: Thursday, September 27, 2012 9:55 PM
 To: user@hbase.apache.org
 Subject: Re: disable table
 
 I did restart entire cluster and still that didn't help. Looks like
 once I
 get in this Race condition there is no way to come out of it?
 
 On Thu, Sep 27, 2012 at 8:00 AM, rajesh babu chintaguntla 
 chrajeshbab...@gmail.com wrote:
 
  Hi Mohit,
 
  We should not delete znode's manually which will cause
 inconsistencies like
  region may be shown as online on master, but it wont be on region
 server.
  That's put is failing in your case. Master restart will bring back
 your
  cluster to normal state(recovery any failures in enable/disable).
 Even hbck
  also wont solve this problem.
 
  FYI,
  Presently discussion is going on this issue. You can follow jira
 associated
  with this issue at
  https://issues.apache.org/jira/browse/HBASE-6469
 
   On Thu, Sep 27, 2012 at 8:11 PM, Mohit Anchlia
 mohitanch...@gmail.com
  wrote:
 
   Thanks everyone for the input, it's helpful. I did remove the znode
 from
   /hbase/table/SESSIONID_TIMELINE and after that I was able to list
 the
   table. At that point I tried to do a put but when I did a put I
 got a
   message NoRegionServer online. I looked in the logs and it says the
  Failed
   to open region server at nodexxx. When I went to nodexxx it
 complains
   something about unable to run testcompression.
  
   I setup SNAPPY compression on my table and I also ran SNAPPY
 compression
   test which was successful. Not sure what's going on in the cluster.
   On Thu, Sep 27, 2012 at 1:10 AM, Mohammad Tariq
 donta...@gmail.com
   wrote:
  
Hello Mohit,
   
It should be /hbase/hbase/table/SESSIONID_TIMELINE..Apologies
 for
  the
typo. For rest of the things, I feel Ramkrishna sir has provided
 a good
   and
proper explanation. Please let us know if you still have any
 doubt or
question.
   
Ramkrishna.S.Vasudevan : You are welcome sir. It's my pleasure to
 share
space with you people.
   
Regards,
Mohammad Tariq
   
   
   
On Thu, Sep 27, 2012 at 9:59 AM, Ramkrishna.S.Vasudevan 
ramkrishna.vasude...@huawei.com wrote:
   
 Hi Mohith
 First of all thanks to Tariq for his replies.

 Just to add on,
 Basically HBase  uses the Zookeeper to know the status of the
 cluster
like
 the no of tables enabled, disabled and deleted.
 Enabled and deleted states are handled bit different in the
 0.94
   version.

 ZK is used for various region assignments.

 Also the ZK is used to track the Active master and standby
 master.

 As you understand correctly that the master is responsible for
 the
overall
 maintenance of the no of tables and their respective states, it
 seeks
   the
 help of ZK to do it and that is where the states are persisted.

 Also there are few cases where the enable and disable table are
  having
some
 issues due to some race conditions in the 0.92 versions, In the
  latest
 version we are trying to resolve them.
 You can attach the master and RS logs to identify exactly what
 caused
this
 problem in your case which will be really help ful so that I
 can be
   fixed
 in
 the kernel.

 Regards
 Ram

  -Original Message-
  From: Mohit Anchlia [mailto:mohitanch...@gmail.com]
  Sent: Thursday, September 27, 2012 5:09 AM
  To: user@hbase.apache.org
  Subject: Re: disable table
 
   I did /hbase/table/SESSIONID_TIMELINE and that seem to work.
 I'll
  restart
  hbase and see if it works.
 
  One thing I don't understand is why is zookeeper holding
  information
  about
  this table if it is enabled or disabled? Wouldn't this
 information
  be
  with
  master?
 
  On Wed, Sep 26, 2012 at 4:27 PM, Mohit Anchlia
  mohitanch...@gmail.comwrote:
 
   I don't see path like /hbase/SESSIONID_TIMELINE
   This is what I see
  
   [zk: pprfdaaha303:5181(CONNECTED) 5] ls /hbase/table
   [SESSIONID_TIMELINE]
   [zk: pprfdaaha303:5181(CONNECTED) 6] get /hbase/table
  
   cZxid = 0x100fe
   ctime = Mon Sep 10 15:31:45 PDT 2012
   mZxid = 0x100fe
   mtime = Mon Sep 10 15:31:45 PDT 2012
   pZxid = 0x508f1
   cversion = 3
   dataVersion = 0
   aclVersion = 0
   ephemeralOwner = 0x0
   dataLength = 0
   numChildren = 1
  
On Wed, Sep 26, 2012 at 3:57 PM, Mohammad Tariq
  donta...@gmail.comwrote:
  
   In 

RE: Distribution of regions to servers

2012-09-27 Thread Ramkrishna.S.Vasudevan
Hi Dan

I am not very sure whether my answer was in fact relevant to your problem.
Anyway, I can try answering the 'region being redundant' question:
no two regions can be responsible for the same range of data in one table.
That is why, if any region is not available, that portion of the data is not
available to the clients.

"when you go with coprocessor on a collocated regions, the caching and rpc
timeout needs to be set accordingly."
What I meant here was that now every scan will hit two regions, and as per your
use case one is going to be dense and the other one will return quickly.
We may need to make sure that the overall scan does not time out.

Regards
Ram


 -Original Message-
 From: Dan Han [mailto:dannahan2...@gmail.com]
 Sent: Friday, September 28, 2012 3:05 AM
 To: user@hbase.apache.org
 Subject: Re: Distribution of regions to servers
 
 Hi Ramkrishna,
 
   I think relocating regions is based on the queries and the queried data.
 The relocation can scatter the regions involved in the query across
 region servers, which might enable large queries to get better load
 balance. For small queries, the distribution of regions can also impact
 the throughput.

 On this point, I actually have a question: can a region be redundant?
 For example, can there be two regions which are responsible for the same
 range of data?
 
 I don't quite understand this: "when you go with coprocessor on
 a collocated regions, the caching and rpc timeout needs to be set
 accordingly." Could you please explain it further? Thanks in advance.
 
 Best Wishes
 Dan Han
 
 
 On Wed, Sep 26, 2012 at 10:49 PM, Ramkrishna.S.Vasudevan 
 ramkrishna.vasude...@huawei.com wrote:
 
  Just trying out here,
 
  Is it possible for you to collocate the region of the 1st schema and
  the region of the 2nd schema, so that overall the total query execution
  happens on a single RS and there is not much IO?
  Also, when you go with a coprocessor on collocated regions, the caching
  and rpc timeout need to be set accordingly.
 
  Regards
  Ram
   -Original Message-
   From: Dan Han [mailto:dannahan2...@gmail.com]
   Sent: Thursday, September 27, 2012 7:00 AM
   To: user@hbase.apache.org
   Subject: Re: Distribution of regions to servers
  
   Hi Eugeny,

  Thanks for your response. I answered your questions inline in blue.
   And I'd like to give an example to describe my problem.
  
   Let's think about two data schemas for the same dataset.
   The two data schemas have different composite row keys, but there is
   a common part in both schemas, which represents a sequence ID.
   In the 1st schema, one row contains 1KB of information, while in the
   2nd schema, one row contains 10KB of information.
   So the number of rows in one region in the 1st schema is more than
   that in the 2nd schema, right? If the queried data is based on the
   sequence ID, as one region in the 1st schema is responsible for a
   larger number of rows than one in the 2nd schema, there would be more
   computation and a longer execution time for the corresponding
   coprocessor.
   So in this case, if the regions are not distributed well, some region
   servers will suffer from excess workload.
   That is why I want to do some management of regions to get better
   load balance based on large queries.
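
   A rough sketch of that kind of manual region management, assuming the
   0.92/0.94-era Java API (the encoded region name and the destination
   server string below are placeholders, not real values):

       import org.apache.hadoop.hbase.HBaseConfiguration;
       import org.apache.hadoop.hbase.client.HBaseAdmin;
       import org.apache.hadoop.hbase.util.Bytes;

       public class ManualRegionMove {
         public static void main(String[] args) throws Exception {
           HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
           // Stop the automatic balancer so it does not undo the manual placement.
           admin.balanceSwitch(false);
           // Encoded region name (the hash shown in the web UI) and the
           // destination "host,port,startcode" are placeholders.
           admin.move(Bytes.toBytes("0123456789abcdef0123456789abcdef"),
                      Bytes.toBytes("rs-host,60020,1348700000000"));
         }
       }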
  
   Hope it makes sense to you.
  
   Best Wishes
   Dan Han
  
  
   On Wed, Sep 26, 2012 at 3:19 PM, Eugeny Morozov
   emoro...@griddynamics.comwrote:
  
Dan,
   
I have additional questions.
What is the access pattern of your queries? I mean that, for example,
PrefixFilters have to be applied to all KeyValue pairs in HFiles, which
could be slow. Or, for example, the scanner setCaching option is able to
decrease the number of network hops needed to get data from the
RegionServer.
   
  
   I set the range of the rows and the related columns to narrow down
   the scan scope, and I used PrefixFilter/ColumnFilter/BinaryFilter to
   get the rows.
   I set a small cache (5KB), but I kept it the same for all evaluated
   data schemas, because I mainly focus on evaluating the performance of
   queries under the different data schemas.
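
   A minimal sketch of that kind of narrowed-down scan (the row-key
   prefix, family and qualifier below are hypothetical; only the pattern
   matters):

       import org.apache.hadoop.hbase.client.Scan;
       import org.apache.hadoop.hbase.filter.PrefixFilter;
       import org.apache.hadoop.hbase.util.Bytes;

       public class SchemaEvalScan {
         public static Scan build() {
           // Row range narrows the scan to the sequence IDs of interest.
           Scan scan = new Scan(Bytes.toBytes("seq00010"), Bytes.toBytes("seq00020"));
           // Only the related column is read.
           scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("payload"));
           // Prefix match on the row key.
           scan.setFilter(new PrefixFilter(Bytes.toBytes("seq0001")));
           // Small caching value, kept identical across the evaluated schemas.
           scan.setCaching(5);
           return scan;
         }
       }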
  
  
Additionally, coprocessors are able to use InternalScanner instead of
ResultScanner, which could also help greatly.
   
  
   Yes, I used InternalScanner.
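
   For reference, a region-side coprocessor typically drives an
   InternalScanner roughly like this (a sketch against the 0.92/0.94-era
   API; the class name and the aggregation are made up for illustration):

       import java.io.IOException;
       import java.util.ArrayList;
       import java.util.List;
       import org.apache.hadoop.hbase.KeyValue;
       import org.apache.hadoop.hbase.client.Scan;
       import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
       import org.apache.hadoop.hbase.regionserver.InternalScanner;

       public class RegionSideCount {
         // Counts matching KeyValues inside the region, so only the
         // count crosses the wire instead of the rows themselves.
         static long count(RegionCoprocessorEnvironment env, Scan scan)
             throws IOException {
           InternalScanner scanner = env.getRegion().getScanner(scan);
           long total = 0;
           try {
             List<KeyValue> row = new ArrayList<KeyValue>();
             boolean more;
             do {
               more = scanner.next(row);  // fills 'row' with the next row's KeyValues
               total += row.size();
               row.clear();
             } while (more);
           } finally {
             scanner.close();
           }
           return total;
         }
       }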
  
   
Also, the more dimensions you specify, the more precise your query is,
and the less data is about to be processed - family, columns,
timeranges, etc.
   
   
On Wed, Sep 26, 2012 at 7:39 PM, Dan Han dannahan2...@gmail.com
   wrote:
   
   Thanks for your swift response, Ramkrishna and Anoop. I will
 explicate what we are doing now below.

 We are trying to explore a systematic way to design the appropriate
 data schema for various applications in HBase. So we first designed
 several data schemas for each dataset and evaluated them with the same
 queries. The queries are designed based on the 

RE: HBase and Lily?

2012-09-27 Thread Anoop Sam John
Hi
   Lily is an indexing solution for HBase. This indexing happens purely at
the client side. If you look at Lily, it sits in between the client app and
HBase. The app needs to insert/delete data via Lily only. Lily will write the
user data into the HBase table. There is also another table, an index table,
which it creates for storing the index details; Lily writes this index data
too. On deletion it will handle both the tables. Scans also go via Lily only:
it will fetch the index data first and, accordingly, fetch the required rows
only from the actual table. Hope this makes it clear for you.. :)
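
To give a rough idea of the pattern (this is not Lily's actual API; the
index-table layout and the names below are made up), an index-first read
in plain HBase client code could look roughly like this:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class IndexThenFetch {
      // Hypothetical index layout: index row key = "<field value>|<data row key>".
      static Result[] query(HTable indexTable, HTable dataTable, String value)
          throws IOException {
        // 1st: scan the index table for the value of interest.
        Scan idx = new Scan(Bytes.toBytes(value + "|"), Bytes.toBytes(value + "|~"));
        List<Get> gets = new ArrayList<Get>();
        ResultScanner rs = indexTable.getScanner(idx);
        try {
          for (Result r : rs) {
            String indexKey = Bytes.toString(r.getRow());
            String dataKey = indexKey.substring(indexKey.indexOf('|') + 1);
            gets.add(new Get(Bytes.toBytes(dataKey)));  // only rows the index points at
          }
        } finally {
          rs.close();
        }
        // 2nd: fetch just those rows from the actual data table.
        return dataTable.get(gets);
      }
    }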

-Anoop-

From: Jason Huang [jason.hu...@icare.com]
Sent: Friday, September 28, 2012 2:28 AM
To: user@hbase.apache.org
Subject: HBase and Lily?

Hello,

I am exploring HBase and Lily, and I have a few starter questions hoping
to get some help from users in this group who have tried this before:

(1) Do I need to post all the HBase table contents to Lily (treat Lily
as another DataStore) in order to enable the index and search
functionality? If so, that's going to be another big storage issue
(and duplicate storage?)

(2) Should I always only allow one-way updates from Clients -> HBase ->
Lily? I want to use HBase as the data store and use Lily only as a
plug-in tool to help search. I want HBase to only accept updates from
Clients (not from Lily). Is there any update from Lily to HBase
required (in order to enable the search and index functionality)?

(3) Since Lily is not an Apache project - do you know if it's under
Apache 2.0 license? We may need to extend it with our own APIs. Do we
have to give our APIs back to them? We love sharing but some of our
APIs may be under different agreements and can't be shared.

thanks!

Jason