HBase User Group in Paris
Hi all, I was wondering how many HBase users there are in Paris (France...). Would you guys be interested in participating in a Paris-based user group? The idea would be to share HBase practices, with something like a meet-up per quarter. Reply to me directly or on the list, as you prefer. Cheers, Nicolas
Re: Does hbase 0.90 client work with 0.92 server?
Depending on what you're doing with the data, I guess you might have some corner cases, especially after a major compaction. That may be a non trivial piece of code to write (again, it depends on how you use HBase. May be it is actually trivial). And, if you're pessimistic, the regression in 0.92 can be one of those that corrupts the data, so you will need manual data fixes as well during the rollback. It may be simpler to secure the migration by investing more in the testing process (dry/parallel runs). As well, if you find bugs while a release is in progress, it increases your chances to get your bugs fixed... Nicolas On Thu, Sep 27, 2012 at 10:37 AM, Damien Hardy dha...@viadeoteam.comwrote: Actually, I have an old cluster on on prod with 0.90.3 version installed manually and I am working on a CDH4 new cluster deployed full automatic with puppet. While migration is not reversible (according to the pointer given by Jean-Daniel) I would like to keep he old cluster safe by side to be able to revert operation Switching from an old vanilla version to a Cloudera one is an other risk introduced in migrating the actual cluster and I'm not feeling confortable with. My idea is to copy data from old to new and switch clients the new cluster and I am lookin for the best strategy to manage it. A scanner based on timestamp should be enougth to get the last updates after switching (But trying to keep it short). Cheers, -- Damien 2012/9/27 n keywal nkey...@gmail.com You don't have to migrate the data when you upgrade, it's done on the fly. But it seems you want to do something more complex? A kind of realtime replication between two clusters in two different versions? On Thu, Sep 27, 2012 at 9:56 AM, Damien Hardy dha...@viadeoteam.com wrote: Hello, Corollary, what is the better way to migrate data from a 0.90 cluster to a 0.92 cluser ? Hbase 0.90 = Client 0.90 = stdout | stdin = client 0.92 = Hbase 0.92 All the data must tansit on a single host where compute the 2 clients. It may be paralalize with mutiple version working with different range scanner maybe but not so easy. Is there a copytable version that should read on 0.90 to write on 0.92 with mapreduce version ? maybe there is some sort of namespace available for Java Classes that we may use 2 version of a same package and go for a mapreduce ? Cheers, -- Damien 2012/9/25 Jean-Daniel Cryans jdcry...@apache.org It's not compatible. Like the guide says[1]: replace your hbase 0.90.x with hbase 0.92.0 binaries (be sure you clear out all 0.90.x instances) and restart (You cannot do a rolling restart from 0.90.x to 0.92.x -- you must restart) This includes the client. J-D 1. http://hbase.apache.org/book.html#upgrade0.92 On Tue, Sep 25, 2012 at 11:16 AM, Agarwal, Saurabh saurabh.agar...@citi.com wrote: Hi, We recently upgraded hbase 0.90.4 to HBase 0.92. Our HBase app worked fine in hbase 0.90.4. Our new setup has HBase 0.92 server and hbase 0.90.4 client. And throw following exception when client would like to connect to server. Is anyone running HBase 0.92 server and hbase 0.90.4 client? Let me know, Thanks, Saurabh. 12/09/24 14:58:31 INFO zookeeper.ClientCnxn: Session establishment complete on server vm-3733-969C.nam.nsroot.net/10.49.217.56:2181, sessionid = 0x139f61977650034, negotiated timeout = 6 java.lang.IllegalArgumentException: Not a host:port pair: ? 
at org.apache.hadoop.hbase.HServerAddress.init(HServerAddress.java:60) at org.apache.hadoop.hbase.zookeeper.RootRegionTracker.dataToHServerAddress(RootRegionTracker.java:82) at org.apache.hadoop.hbase.zookeeper.RootRegionTracker.waitRootRegionLocation(RootRegionTracker.java:73) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:786) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:766) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:895) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:797) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:766) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:895) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion
Re: Does hbase 0.90 client work with 0.92 server?
I understood that you were targeting a backup plan to go back from 0.92 -- 0.90 if anything goes wrong? But in any case, it might work, it depends on the data you're working with and the downtime you're ready to accept. It's not simple to ensure you won't miss any operation and to manage the deletes mixed with compactions. Not taking into account the root issue you may have with the source cluster. For example, if you're migrating back because your 0.92 cluster cannot handle the load, adding a map reduce task to do an export world on top of this might bring this extra little workload that will put it down completely. On Fri, Sep 28, 2012 at 11:59 AM, Damien Hardy dha...@viadeoteam.comwrote: And what about hbase 0.90 export distcp hftp://hdfs0.20/ dfs://hdfs1.0/ hbase 0.92 import ? Then switch client (a rest interface), then recorver the few last update with the same approch limiting export on starttime http://hadoop.apache.org/docs/hdfs/current/hftp.html This way could be safe with a minimal downtime ? Cheers, 2012/9/28 n keywal nkey...@gmail.com Depending on what you're doing with the data, I guess you might have some corner cases, especially after a major compaction. That may be a non trivial piece of code to write (again, it depends on how you use HBase. May be it is actually trivial). And, if you're pessimistic, the regression in 0.92 can be one of those that corrupts the data, so you will need manual data fixes as well during the rollback. It may be simpler to secure the migration by investing more in the testing process (dry/parallel runs). As well, if you find bugs while a release is in progress, it increases your chances to get your bugs fixed... Nicolas On Thu, Sep 27, 2012 at 10:37 AM, Damien Hardy dha...@viadeoteam.com wrote: Actually, I have an old cluster on on prod with 0.90.3 version installed manually and I am working on a CDH4 new cluster deployed full automatic with puppet. While migration is not reversible (according to the pointer given by Jean-Daniel) I would like to keep he old cluster safe by side to be able to revert operation Switching from an old vanilla version to a Cloudera one is an other risk introduced in migrating the actual cluster and I'm not feeling confortable with. My idea is to copy data from old to new and switch clients the new cluster and I am lookin for the best strategy to manage it. A scanner based on timestamp should be enougth to get the last updates after switching (But trying to keep it short). Cheers, -- Damien 2012/9/27 n keywal nkey...@gmail.com You don't have to migrate the data when you upgrade, it's done on the fly. But it seems you want to do something more complex? A kind of realtime replication between two clusters in two different versions? On Thu, Sep 27, 2012 at 9:56 AM, Damien Hardy dha...@viadeoteam.com wrote: Hello, Corollary, what is the better way to migrate data from a 0.90 cluster to a 0.92 cluser ? Hbase 0.90 = Client 0.90 = stdout | stdin = client 0.92 = Hbase 0.92 All the data must tansit on a single host where compute the 2 clients. It may be paralalize with mutiple version working with different range scanner maybe but not so easy. Is there a copytable version that should read on 0.90 to write on 0.92 with mapreduce version ? maybe there is some sort of namespace available for Java Classes that we may use 2 version of a same package and go for a mapreduce ? Cheers, -- Damien 2012/9/25 Jean-Daniel Cryans jdcry...@apache.org It's not compatible. 
Like the guide says[1]: replace your hbase 0.90.x with hbase 0.92.0 binaries (be sure you clear out all 0.90.x instances) and restart (You cannot do a rolling restart from 0.90.x to 0.92.x -- you must restart) This includes the client. J-D 1. http://hbase.apache.org/book.html#upgrade0.92 On Tue, Sep 25, 2012 at 11:16 AM, Agarwal, Saurabh saurabh.agar...@citi.com wrote: Hi, We recently upgraded hbase 0.90.4 to HBase 0.92. Our HBase app worked fine in hbase 0.90.4. Our new setup has HBase 0.92 server and hbase 0.90.4 client. And throw following exception when client would like to connect to server. Is anyone running HBase 0.92 server and hbase 0.90.4 client? Let me know, Thanks, Saurabh. 12/09/24 14:58:31 INFO zookeeper.ClientCnxn: Session establishment complete on server vm-3733-969C.nam.nsroot.net/10.49.217.56:2181 , sessionid = 0x139f61977650034, negotiated timeout = 6 java.lang.IllegalArgumentException: Not a host:port pair
Re: Hbase clustering
Hi, I would like to direct you to the reference guide, but I must acknowledge that, well, it's a reference guide, hence not really easy for a plain new start. You should have a look at Lars' blog (and maybe buy his book), and especially this entry: http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html Some hints however:
- the replication occurs at the hdfs level, not the hbase level: hbase writes files that are split into hdfs blocks that are replicated across the datanodes. If you want to check the replication, you must look at what files are written by hbase, how they have been split into blocks by hdfs, and how these blocks have been replicated. That will be in the hdfs interface. As a side note, it's not the easiest thing to learn when you start :-)
- The error "ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 times" is not linked to replication or whatever. It means that the second machine cannot find the master. You need to fix this first (googling and checking the logs).
Good luck, Nicolas

On Thu, Sep 27, 2012 at 9:07 AM, Venkateswara Rao Dokku dvrao@gmail.com wrote: How can we verify that the data (tables) is distributed across the cluster? Is there a way to confirm that the data is distributed across all the nodes in the cluster?

On Thu, Sep 27, 2012 at 12:26 PM, Venkateswara Rao Dokku dvrao@gmail.com wrote: Hi, I am completely new to Hbase. I want to cluster Hbase on two nodes. I installed hadoop and hbase on the two nodes; my conf files are as given below.

cat conf/regionservers
hbase-regionserver1
hbase-master

cat conf/masters
hadoop-namenode

cat conf/slaves
hadoop-datanode1

vim conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
  </property>
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
  </property>
</configuration>

finally my /etc/hosts file is
127.0.0.1 localhost
127.0.0.1 oc-PowerEdge-R610
10.2.32.48 hbase-master hadoop-namenode
10.240.13.35 hbase-regionserver1 hadoop-datanode1

The above files are identical on both of the machines. The following are the processes that are running on my machines when I ran the start scripts in hadoop as well as hbase:
hadoop-namenode: HQuorumPeer, HMaster, Main, HRegionServer, SecondaryNameNode, Jps, NameNode, JobTracker
hadoop-datanode1: TaskTracker, Jps, DataNode -- process information unavailable, Main, NC, HRegionServer

I am able to create, list and scan tables on the hadoop-namenode machine using the Hbase shell. But while trying to run the same on the hadoop-datanode1 machine I couldn't do it, as I am getting the following error:

hbase(main):001:0> list
TABLE
ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 times
Here is some help for this command:
List all tables in hbase. Optional regular expression parameter could be used to filter the output.
Examples:
  hbase> list
  hbase> list 'abc.*'

How can I list and scan the tables that are created by the hadoop-namenode from the hadoop-datanode1 machine? Similarly, can I create some tables on hadoop-datanode1 and access them from the hadoop-namenode and vice-versa, as the data is distributed since this is a cluster? -- Thanks & Regards, Venkateswara Rao Dokku, Software Engineer, One Convergence Devices Pvt Ltd., Jubilee Hills, Hyderabad. -- Thanks & Regards, Venkateswara Rao Dokku, Software Engineer, One Convergence Devices Pvt Ltd., Jubilee Hills, Hyderabad.
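A small sketch of one way to answer the distribution question above from a client. The getRegionLocations() call is the 0.94-era HTable API (older releases expose getRegionsInfo() instead), so treat the exact method as an assumption. If every region of the table maps to the same region server, the data has simply not split across the cluster yet:

import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HTable;

public class ShowRegionLocations {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, args[0]); // table name passed on the command line
    // Print which region server hosts each region of the table.
    for (Map.Entry<HRegionInfo, ServerName> e : table.getRegionLocations().entrySet()) {
      System.out.println(e.getKey().getRegionNameAsString() + " -> " + e.getValue().getHostname());
    }
    table.close();
  }
}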
Re: Hbase clustering
You should launch the master only once, on whatever machine you like. Then you will be able to access it from any other machine. Please have a look at the blog I mentioned in my previous mail. On Thu, Sep 27, 2012 at 9:39 AM, Venkateswara Rao Dokku dvrao@gmail.com wrote: I can see that HMaster is not started on the data-node machine when the start scripts in hadoop hbase ran on the hadoop-namenode. My doubt is that,Shall we have to start that master on the hadoop-datanode1 too or the hadoop-datanode1 will access the Hmaster that is running on the hadoop-namenode to create,list,scan tables as the two nodes are in the cluster as namenode datanode. On Thu, Sep 27, 2012 at 1:02 PM, n keywal nkey...@gmail.com wrote: Hi, I would like to direct you to the reference guide, but I must acknowledge that, well, it's a reference guide, hence not really easy for a plain new start. You should have a look at Lars' blog (and may be buy his book), and especially this entry: http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html Some hints however: - the replication occurs at the hdfs level, not the hbase level: hbase writes files that are split in hdfs blocks that are replicated accross the datanodes. If you want to check the replications, you must look at what files are written by hbase and how they have been split in blocks by hdfs and how these blocks have been replicated. That will be in the hdfs interface. As a side note, it's not the easiest thing to learn when you start :-) - The error ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 times this is not linked to replication or whatever. It means that second machine cannot find the master. You need to fix this first. (googling checking the logs). Good luck, Nicolas On Thu, Sep 27, 2012 at 9:07 AM, Venkateswara Rao Dokku dvrao@gmail.com wrote: How can we verify that the data(tables) is distributed across the cluster?? Is there a way to confirm it that the data is distributed across all the nodes in the cluster.? On Thu, Sep 27, 2012 at 12:26 PM, Venkateswara Rao Dokku dvrao@gmail.com wrote: Hi, I am completely new to Hbase. I want to cluster the Hbase on two nodes.I installed hadoop,hbase on the two nodes my conf files are as given below. *cat conf/regionservers * hbase-regionserver1 hbase-master *cat conf/masters * hadoop-namenode * cat conf/slaves * hadoop-datanode1 *vim conf/hdfs-site.xml * ?xml version=1.0? ?xml-stylesheet type=text/xsl href=configuration.xsl? !-- Put site-specific property overrides in this file. -- configuration property namedfs.replication/name value2/value descriptionDefault block replication.The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time. /description /property property namedfs.support.append/name valuetrue/value descriptionDefault block replication.The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time. /description /property /configuration * finally my /etc/hosts file is * 127.0.0.1 localhost 127.0.0.1 oc-PowerEdge-R610 10.2.32.48 hbase-master hadoop-namenode 10.240.13.35 hbase-regionserver1 hadoop-datanode1 The above files are identical on both of the machines. 
The following are the processes that are running on my m/c's when I ran start scripts in hadoop as well as hbase *hadoop-namenode:* HQuorumPeer HMaster Main HRegionServer SecondaryNameNode Jps NameNode JobTracker *hadoop-datanode1:* TaskTracker Jps DataNode -- process information unavailable Main NC HRegionServer I can able to create,list scan tables on the *hadoop-namenode* machine using Hbase shell. But while trying to run the same on the * hadoop-datanode1 *machine I couldn't able to do it as I am getting following error. hbase(main):001:0 list TABLE ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 times Here is some help for this command: List all tables in hbase. Optional regular expression parameter could be used to filter the output. Examples: hbase list hbase list 'abc.*' How can I list,scan the tables that are created by the *hadoop-namenode * from the *hadoop-datanode1* machine. Similarly Can I create some tables on *hadoop-datanode1 * can I access them from the *hadoop-namenode * vice-versa as the data is distributed as this is a cluster. -- Thanks Regards
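A minimal client sketch of the point above: a shell or Java client on any machine finds the single master through ZooKeeper, so it only needs the quorum address in its configuration. The hostnames below are the ones used in this thread; the table name is illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class ConnectFromAnyNode {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Point the client at the ZooKeeper quorum; the master is discovered from there.
    conf.set("hbase.zookeeper.quorum", "hbase-master");
    conf.set("hbase.zookeeper.property.clientPort", "2181");
    HTable table = new HTable(conf, "mytable"); // "mytable" is illustrative
    System.out.println("Connected to " + Bytes.toString(table.getTableName()));
    table.close();
  }
}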
Re: Does hbase 0.90 client work with 0.92 server?
You don't have to migrate the data when you upgrade, it's done on the fly. But it seems you want to do something more complex? A kind of realtime replication between two clusters in two different versions? On Thu, Sep 27, 2012 at 9:56 AM, Damien Hardy dha...@viadeoteam.com wrote: Hello, Corollary, what is the better way to migrate data from a 0.90 cluster to a 0.92 cluser ? Hbase 0.90 = Client 0.90 = stdout | stdin = client 0.92 = Hbase 0.92 All the data must tansit on a single host where compute the 2 clients. It may be paralalize with mutiple version working with different range scanner maybe but not so easy. Is there a copytable version that should read on 0.90 to write on 0.92 with mapreduce version ? maybe there is some sort of namespace available for Java Classes that we may use 2 version of a same package and go for a mapreduce ? Cheers, -- Damien 2012/9/25 Jean-Daniel Cryans jdcry...@apache.org It's not compatible. Like the guide says[1]: replace your hbase 0.90.x with hbase 0.92.0 binaries (be sure you clear out all 0.90.x instances) and restart (You cannot do a rolling restart from 0.90.x to 0.92.x -- you must restart) This includes the client. J-D 1. http://hbase.apache.org/book.html#upgrade0.92 On Tue, Sep 25, 2012 at 11:16 AM, Agarwal, Saurabh saurabh.agar...@citi.com wrote: Hi, We recently upgraded hbase 0.90.4 to HBase 0.92. Our HBase app worked fine in hbase 0.90.4. Our new setup has HBase 0.92 server and hbase 0.90.4 client. And throw following exception when client would like to connect to server. Is anyone running HBase 0.92 server and hbase 0.90.4 client? Let me know, Thanks, Saurabh. 12/09/24 14:58:31 INFO zookeeper.ClientCnxn: Session establishment complete on server vm-3733-969C.nam.nsroot.net/10.49.217.56:2181, sessionid = 0x139f61977650034, negotiated timeout = 6 java.lang.IllegalArgumentException: Not a host:port pair: ? 
at org.apache.hadoop.hbase.HServerAddress.init(HServerAddress.java:60) at org.apache.hadoop.hbase.zookeeper.RootRegionTracker.dataToHServerAddress(RootRegionTracker.java:82) at org.apache.hadoop.hbase.zookeeper.RootRegionTracker.waitRootRegionLocation(RootRegionTracker.java:73) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:786) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:766) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:895) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:797) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:766) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:895) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:766) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:179) at org.apache.hadoop.hbase.HBaseTestingUtility.truncateTable(HBaseTestingUtility.java:609) at com.citi.sponge.flume.sink.ELFHbaseSinkTest.testAppend2(ELFHbaseSinkTest.java:221) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at junit.framework.TestSuite.runTest(TestSuite.java:232) at junit.framework.TestSuite.run(TestSuite.java:227) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81) at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50) at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467) at
Re: H-base Master/Slave replication
Hi, I think there is a confusion between hbase replication (replication between clusters) and hdfs replication (replication between datanodes). hdfs replication is (more or less) hidden and done for you. Nicolas

On Wed, Sep 26, 2012 at 9:20 AM, Venkateswara Rao Dokku dvrao@gmail.com wrote: Hi, I wanted to cluster Hbase on 2 nodes. I put one of my nodes as hadoop-namenode as well as hbase-master, and the other node as hadoop-datanode1 as well as hbase-region-server1. I started the hadoop cluster as well as Hbase on the name-node side. They started fine. I created tables and it went fine on the master. Now I am trying to replicate the data across the nodes. On some of the sites it is mentioned that we have to maintain zookeeper by ourselves. How to do it? Currently my hbase is maintaining the zookeeper. What are the changes I need to make to the conf/ files in order to replicate data between Master/Slave nodes? -- Thanks & Regards, Venkateswara Rao Dokku, Software Engineer, One Convergence Devices Pvt Ltd., Jubilee Hills, Hyderabad.
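To make the distinction concrete, a small sketch that reads back the per-file HDFS replication factor for the files HBase has written (assuming the default /hbase root directory); this is the datanode-level replication that HDFS manages by itself:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowHdfsReplication {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // List the top-level HBase directories and their HDFS replication factor.
    for (FileStatus status : fs.listStatus(new Path("/hbase"))) {
      System.out.println(status.getPath() + " replication=" + status.getReplication());
    }
    fs.close();
  }
}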
Re: RetriesExhaustedWithDetailsException while puting in Table
DoNotRetryIOException means that the error is considered as permanent: it's not a missing regionserver, but for example a table that's not enabled. I would expect a more detailed exception (a 'caused by' or something similar). If it's missing, you should have more info in the regionserver logs.

On Wed, Sep 19, 2012 at 11:54 AM, Dhirendra Singh dps...@gmail.com wrote: I am getting this exception while trying to insert an entry into the table. The table has its secondary index and its coprocessors defined properly. I suspect this error is because the inserted row didn't have all the columns which were required in the secondary index, but I am not sure. Could someone tell me a way to debug this scenario, as the exception is also a bit vague; it actually doesn't tell what went wrong.

org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: DoNotRetryIOException: 1 time, servers with issues: tserver.corp.nextag.com:60020,
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1641)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1409)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:943)
at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:820)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:795)
at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.put(HTablePool.java:397)

-- Warm Regards, Dhirendra Pratap +91. 9717394713
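A sketch of pulling more detail out of the exception on the client side before digging into the region server logs; the per-action causes usually name the row, the server, and the underlying DoNotRetryIOException (for example one thrown by a coprocessor). The wrapper itself is illustrative:

import java.util.List;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException;

public class PutWithDiagnostics {
  public static void putAll(HTable table, List<Put> puts) throws Exception {
    try {
      table.put(puts);
    } catch (RetriesExhaustedWithDetailsException e) {
      // One entry per failed action: row, server and root cause.
      for (int i = 0; i < e.getNumExceptions(); i++) {
        System.err.println("row=" + e.getRow(i) + " server=" + e.getHostnamePort(i)
            + " cause=" + e.getCause(i));
      }
      throw e;
    }
  }
}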
Re: Performance of scan setTimeRange VS manually doing it
For each file, there is a time range. When you scan/search, the file is skipped if there is no overlap between the file timerange and the timerange of the query. As there are other parameters as well (row distribution, compaction effects, cache, bloom filters, ...), it's difficult to know in advance what's going to happen exactly. But specifying a timerange does no harm for sure, if it matches your functional needs... This said, if you already have the rowkey, the time range is less interesting, as you will skip a lot of files already.

On Wed, Sep 12, 2012 at 11:52 PM, Tom Brown tombrow...@gmail.com wrote: When I query HBase, I always include a time range. This has not been a problem when querying recent data, but it seems to be an issue when I query older data (a few hours old). All of my row keys include the timestamp as part of the key (this value is the same as the HBase timestamp for the row). I recently tried an experiment where I manually re-seek to the possible row (based on the timestamp as part of the row key) instead of using setTimeRange on my scan object, and was amazed to see that there was no degradation for older data. Can someone postulate a theory as to why this might be happening? I'm happy to provide extra data if it will help you theorize... Is there a downside to no longer using setTimeRange? --Tom
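For illustration, a minimal sketch of the two approaches discussed in this thread, assuming for simplicity that the rowkey is just the big-endian timestamp (the real key layout will differ, so adapt the start/stop rows accordingly):

import java.io.IOException;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class TimeRangeScans {
  // Option 1: bound the scan by rowkey only, relying on the timestamp embedded in the key.
  public static Scan byRowKey(long startTs, long endTs) {
    return new Scan(Bytes.toBytes(startTs), Bytes.toBytes(endTs));
  }

  // Option 2: same rowkey bounds plus a timerange hint, which lets HBase skip whole
  // store files whose [min,max] timestamp range does not overlap the query.
  public static Scan byRowKeyAndTimeRange(long startTs, long endTs) throws IOException {
    Scan scan = new Scan(Bytes.toBytes(startTs), Bytes.toBytes(endTs));
    scan.setTimeRange(startTs, endTs); // maxStamp is exclusive
    return scan;
  }
}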
Re: Local debugging (possibly with Maven and HBaseTestingUtility?)
Hi, You can use HBase in standalone mode? Cf. http://hbase.apache.org/book.html#standalone_dist? I guess you already tried and it didn't work? Nicolas On Fri, Sep 7, 2012 at 9:57 AM, Jeroen Hoek jer...@lable.org wrote: Hello, We are developing a web-application that uses HBase as database, with Tomcat as application server. Currently, our server-side code can act as a sort of NoSQL abstraction-layer for either HBase or Google AppEngine. HBase is used in production, AppEngine mainly for testing and demo deployments. Our current development setup is centred around Eclipse, and local testing and debugging is done by running the application from Eclipse, which launches the Jetty application server and connects to a local AppEngine database persisted to a single file in the WEB-INF directory. This allows the developers to easily test new features against an existing (local) database that is persisted as long you don't throw away the binary file yourself. We would like to be able to do the same thing with HBase. So far I have seen examples of HBaseTestingUtility being used in unit tests (usually with Maven), but while that covers unit-testing, I have not been able to find a way to run a local, persistent faux-HBase cluster like AppEngine does. Is there a recommended way of doing this? The reason for wanting to be able to test locally like this is to avoid the overhead of running a local VM with HBase or having to connect to a remote test-cluster when developing. Kind regards, Jeroen Hoek
Re: Extremely slow when loading small amount of data from HBase
Hi, With 8 regionservers, yes, you can. Target a few hundreds by default imho. N. On Wed, Sep 5, 2012 at 4:55 AM, 某因幡 tewil...@gmail.com wrote: +HBase users. -- Forwarded message -- From: Dmitriy Ryaboy dvrya...@gmail.com Date: 2012/9/4 Subject: Re: Extremely slow when loading small amount of data from HBase To: u...@pig.apache.org u...@pig.apache.org I think the hbase folks recommend something like 40 regions per node per table, but I might be misremembering something. Have you tried emailing the hbase users list? On Sep 4, 2012, at 3:39 AM, 某因幡 tewil...@gmail.com wrote: After merging ~8000 regions to ~4000 on an 8-node cluster the things is getting better. Should I continue merging? 2012/8/29 Dmitriy Ryaboy dvrya...@gmail.com: Can you try the same scans with a regular hbase mapreduce job? If you see the same problem, it's an hbase issue. Otherwise, we need to see the script and some facts about your table (how many regions, how many rows, how big a cluster, is the small range all on one region server, etc) On Aug 27, 2012, at 11:49 PM, 某因幡 tewil...@gmail.com wrote: When I load a range of data from HBase simply using row key range in HBaseStorageHandler, I find that the speed is acceptable when I'm trying to load some tens of millions rows or more, while the only map ends up in a timeout when it's some thousands of rows. What is going wrong here? Tried both Pig-0.9.2 and Pig-0.10.0. -- language: Chinese, Japanese, English -- language: Chinese, Japanese, English -- language: Chinese, Japanese, English
Re: HBase and unit tests
Hi Cristopher, HBase starts a minicluster for many of its tests because we have a lot of destructive tests. Or the non destructive tests would be impacted by the destructive tests. When writing a client application, you usually don't need to do that: you can rely on the same instance for all your tests. As well, it's useful to write the tests in a way compatible with a real cluster or a pseudo distributed one. Sometimes, when the test fails, you want to have a look at what the code wrote or found in HBase: you won't have this in a mini cluster. And it saves a start. I don't know if there is a blog entry on this; but it's not very difficult to do (but as usual not that easy when you start). I've personally done it with a singleton class + prefixing the table names by a random key (to allow multiple tests in parallel on the same cluster without relying on cleanup) + getProperty to decide between starting a mini cluster or connecting to a cluster. HTH, Nicolas On Fri, Aug 31, 2012 at 12:28 PM, Cristofer Weber cristofer.we...@neogrid.com wrote: Hi Sonal, Stack and Ulrich! Yes, I should provide more details :$ I reached the links you provided when I was searching for a way to start HBase with JUnit. From default, the only params I have changed are Zookeeper port and the amount of nodes, which is 1 in my case. Based on logs I suspect that most of time are spent with HDFS and that's why I asked if there is a way to start a standalone instance of HBase. The amount of data written at each test case would probably fit in memstore anyway, and table cleansing between each test method is managed by a loop of deletes. At least 15 seconds are spent on starting the mini cluster for each test case. Right now I reminded that I should turn off WAL when running unit tests :-), but this will not reflect on startup time. Thanks!! Best regards, Cristofer De: Ulrich Staudinger [ustaudin...@gmail.com] Enviado: sexta-feira, 31 de agosto de 2012 2:21 Para: user@hbase.apache.org Assunto: Re: HBase and unit tests As a general advice, although you probably do take care of this, instantiate the mini cluster only once in your junit test constructor and not in every test method. at the end of each test, either cleanup your hbase or use a different area per test. best regards, ulrich -- connect on xing or linkedin. sent from my tablet. On 31.08.2012, at 06:46, Stack st...@duboce.net wrote: On Thu, Aug 30, 2012 at 4:44 PM, Cristofer Weber cristofer.we...@neogrid.com wrote: Hi there! After I started studying HBase, I've searched for open source projects backed by HBase and I found Titan distributed graph database (you probably heard about it). As soon as I read in their documentation that HBase adapter is experimental and suboptimal (disclaimer here: https://github.com/thinkaurelius/titan/wiki/Using-HBase) I volunteered to help improving this adapter and since then I made a few changes to improve on running tests (reduced from hours to minutes) and also an improvement on search feature. Now I'm trying to break the dependency on a pre-installed HBase for unit tests and found miniCluster inside HBase tests, but minicluster demands too much time to start and I don't know if tweaking on configs will improve significantly. Is there a way to start a 'lightweight' instance, like programatically starting a standalone instance? How much is 'too much time' Cristofer? Do you want a standalone cluster at all? St.Ack P.S. 
If digging in this area, you might find the blog post by the sematextians of use: http://blog.sematext.com/2010/08/30/hbase-case-study-using-hbasetestingutility-for-local-testing-development/
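A rough sketch of the pattern described above: a singleton that starts a mini cluster at most once per JVM, or connects to an existing (pseudo-)distributed cluster, decided by a system property, plus a per-run prefix for table names. Class, property and helper names are illustrative, not an HBase API:

import java.util.Random;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HBaseTestingUtility;

public final class TestClusterHolder {
  private static final String RUN_PREFIX = "t" + Long.toHexString(new Random().nextLong());
  private static HBaseTestingUtility util;
  private static Configuration conf;

  // Called from every test; the mini cluster is started at most once per JVM.
  public static synchronized Configuration getTestConfiguration() throws Exception {
    if (conf == null) {
      if (Boolean.getBoolean("test.useMiniCluster")) { // assumed property name
        util = new HBaseTestingUtility();
        util.startMiniCluster();
        conf = util.getConfiguration();
      } else {
        conf = HBaseConfiguration.create(); // real or pseudo-distributed cluster
      }
    }
    return conf;
  }

  // Random prefix so parallel test runs on a shared cluster do not collide.
  public static String tableName(String base) {
    return RUN_PREFIX + "_" + base;
  }
}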
Re: HBase and unit tests
On Fri, Aug 31, 2012 at 2:33 PM, Cristofer Weber cristofer.we...@neogrid.com wrote: For the other adapters (Cassandra, Cassandra + Thrift, Cassandra + Astyanax, etc) they managed to run tests as Internal and External for unit tests and also have a profile for Performance and Concurrent tests, where External and Performance/Concurrent runs over a live database instance and only with Internal tests it is expected to start a database per test case, remaining the same tests as in External. HBase adapter already have External and Performance/Concurrent so I'm trying to provide the Internal set where the objective is to test Titan|HBase interaction. Understood, thanks for sharing the context. And my goal is to achieve better times than Cassandra :-) Singleton seems to be a good option, but I have to check if Maven Surefire can keep same process between JUnit Test Cases. It should be ok with the parameter forkMode=once in surefire. Because Titan work with adapters for different databases and manage table/CF creation when not exists, I think it will not be possible to prefix table names per test without changing some core components of Titan, and it seems to be too invasive to change this now, and deletion is fast enough so we can keep same table. It's useful on an external cluster, as you can't fully rely on the clean up when a test fails nastily, or if you want to analyse the content. It won't be such an issue on a mini cluster, as it's recreated between the test runs. Thanks!! You're welcome. Keep us updated, and tell us if you have issues.
Re: HBase Is So Slow To Save Data?
Hi Bing, You should expect HBase to be slower in the generic case: 1) it writes much more data (see the hbase data model), with extra column qualifiers, timestamps and so on. 2) the data is written multiple times: once in the write-ahead-log, once per replica on a datanode, and so on again. 3) there are inter-process calls and inter-machine calls on the critical path. This is the cost of the atomicity, reliability and scalability features. With these features in mind, HBase is reasonably fast to save data on a cluster. In your specific case (without points 2 and 3 above), the performance seems to be very bad. You should first look at: - how much time is spent in the put vs. preparing the list - do you have garbage collection going on? even swap? - what's the size of your final Array vs. the available memory? Cheers, N.

On Wed, Aug 29, 2012 at 4:08 PM, Bing Li lbl...@gmail.com wrote: Dear all, By the way, my HBase is in the pseudo-distributed mode. Thanks! Best regards, Bing

On Wed, Aug 29, 2012 at 10:04 PM, Bing Li lbl...@gmail.com wrote: Dear all, According to my experience, it is very slow for HBase to save data. Am I right? For example, today I needed to save the data in a HashMap to HBase. It took more than three hours. However, when saving the same HashMap in a file in text format with the redirected System.out, it took only 4.5 seconds! Why is HBase so slow? Is it indexing? My code to save data in HBase is as follows. I think the code must be correct.

..
public synchronized void AddVirtualOutgoingHHNeighbors(ConcurrentHashMap<String, ConcurrentHashMap<String, Set<String>>> hhOutNeighborMap, int timingScale)
{
    List<Put> puts = new ArrayList<Put>();
    String hhNeighborRowKey;
    Put hubKeyPut;
    Put groupKeyPut;
    Put topGroupKeyPut;
    Put timingScalePut;
    Put nodeKeyPut;
    Put hubNeighborTypePut;
    for (Map.Entry<String, ConcurrentHashMap<String, Set<String>>> sourceHubGroupNeighborEntry : hhOutNeighborMap.entrySet())
    {
        for (Map.Entry<String, Set<String>> groupNeighborEntry : sourceHubGroupNeighborEntry.getValue().entrySet())
        {
            for (String neighborKey : groupNeighborEntry.getValue())
            {
                hhNeighborRowKey = NeighborStructure.HUB_HUB_NEIGHBOR_ROW + Tools.GetAHash(sourceHubGroupNeighborEntry.getKey() + groupNeighborEntry.getKey() + timingScale + neighborKey);

                hubKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
                hubKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY), Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_HUB_KEY_COLUMN), Bytes.toBytes(sourceHubGroupNeighborEntry.getKey()));
                puts.add(hubKeyPut);

                groupKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
                groupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY), Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_GROUP_KEY_COLUMN), Bytes.toBytes(groupNeighborEntry.getKey()));
                puts.add(groupKeyPut);

                topGroupKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
                topGroupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY), Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TOP_GROUP_KEY_COLUMN), Bytes.toBytes(GroupRegistry.WWW().GetParentGroupKey(groupNeighborEntry.getKey())));
                puts.add(topGroupKeyPut);

                timingScalePut = new Put(Bytes.toBytes(hhNeighborRowKey));
                timingScalePut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY), Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TIMING_SCALE_COLUMN), Bytes.toBytes(timingScale));
                puts.add(timingScalePut);

                nodeKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
                nodeKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY), Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_NODE_KEY_COLUMN), Bytes.toBytes(neighborKey));
                puts.add(nodeKeyPut);

                hubNeighborTypePut = new Put(Bytes.toBytes(hhNeighborRowKey));
                hubNeighborTypePut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY), Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TYPE_COLUMN), Bytes.toBytes(SocialRole.VIRTUAL_NEIGHBOR));
                puts.add(hubNeighborTypePut);
            }
        }
    }
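A rough sketch of the first check suggested above (how much time goes into HTable.put() versus building the Put list), combined with writing in bounded batches so the whole map does not sit in one huge client-side list; the batching and method names are illustrative, not from the original code:

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;

public class TimedBatchPut {
  public static void putInBatches(HTable table, List<Put> puts, int batchSize) throws Exception {
    long start = System.currentTimeMillis();
    List<Put> batch = new ArrayList<Put>(batchSize);
    for (Put p : puts) {
      batch.add(p);
      if (batch.size() >= batchSize) {
        table.put(batch); // one round-trip per batch instead of one per Put
        batch.clear();
      }
    }
    if (!batch.isEmpty()) {
      table.put(batch);
    }
    table.flushCommits(); // make sure buffered mutations reach the servers
    System.out.println("put time: " + (System.currentTimeMillis() - start) + " ms");
  }
}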
Re: Client receives SocketTimeoutException (CallerDisconnected on RS)
Totally random (even on keys that do not exist). It's worth checking whether that matches your real use cases. I expect that reads by row key are most of the time on existing rows (as in a traditional db relationship, or UI- or workflow-driven stuff), even if I'm sure it's possible to have something totally different. It's not going to have an impact all the time. But I can easily imagine scenarios with better performance when the row exists vs. does not exist. For example, you have to read more files to check that the row key is really not there. This will be even more true if you're inserting a lot of data simultaneously (i.e. the files won't be major compacted). On the opposite side, bloom filters may be more efficient in this case. But again, I'm not sure they're going to be efficient on random data. It's like compression algorithms: on really random data, they will all have similar bad results. It does not mean they are equivalent, nor useless. I'm working on it ! Thanks, If you can reproduce a 'bad behavior' or a performance issue, we will try to fix it for sure. Have a nice day, N.
Re: Client receives SocketTimeoutException (CallerDisconnected on RS)
Hi Adrien, What do you think about that hypothesis ? Yes, there is something fishy to look at here. Difficult to say without more logs as well. Are your gets totally random, or are you doing gets on rows that do exist? That would explain the number of requests vs. empty/full regions. It does not explain everything you're seeing, however. So if you're not exhausting the system resources, there may be a bug somewhere. If you can reproduce the behaviour on a pseudo distributed cluster it could be interesting; as I understand from your previous mail, you have a single client, and maybe a single working server at the end... Nicolas
Re: How to avoid stop-the-world GC for HBase Region Server under big heap size
Hi, For a possible future, there is as well this to monitor: http://docs.oracle.com/javase/7/docs/technotes/guides/vm/G1.html More or less requires JDK 1.7 See HBASE-2039 Cheers, N. On Thu, Aug 23, 2012 at 8:16 AM, J Mohamed Zahoor jmo...@gmail.com wrote: Slab cache might help http://www.cloudera.com/blog/2012/01/caching-in-hbase-slabcache/ ./zahoor On Thu, Aug 23, 2012 at 11:36 AM, Gen Liu ge...@zynga.com wrote: Hi, We are running Region Server on big memory machine (70G) and set Xmx=64G. Most heap is used as block cache for random read. Stop-the-world GC is killing the region server, but using less heap (16G) doesn't utilize our machines well. Is there a concurrent or parallel GC option that won't block all threads? Any thought is appreciated. Thanks. Gen Liu
Re: Client receives SocketTimeoutException (CallerDisconnected on RS)
Hi Adrien, As well, if you can share the client code (number of threads, regions, is it a set of single get, or are they multi gets, this kind of stuff). Cheers, N. On Thu, Aug 23, 2012 at 7:40 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: Hi Adrien, I would love to see the region server side of the logs while those socket timeouts happen, also check the GC log, but one thing people often hit while doing pure random read workloads with tons of clients is running out of sockets because they are all stuck in CLOSE_WAIT. You can check that by using lsof. There are other discussion on this mailing list about it. J-D On Thu, Aug 23, 2012 at 10:24 AM, Adrien Mogenet adrien.moge...@gmail.com wrote: Hi there, While I'm performing read-intensive benchmarks, I'm seeing storm of CallerDisconnectedException in certain RegionServers. As the documentation says, my client received a SocketTimeoutException (6ms etc...) at the same time. It's always happening and I get very poor read-performances (from 10 to 5000 reads/sc) in a 10 nodes cluster. My benchmark consists in several iterations launching 10, 100 and 1000 Get requests on a given random rowkey with a single CF/qualifier. I'm using HBase 0.94.1 (a few commits before the official stable release) with Hadoop 1.0.3. Bloom filters have been enabled (at the rowkey level). I do not find very clear informations about these exceptions. From the reference guide : (...) you should consider digging in a bit more if you aren't doing something to trigger them. Well... could you help me digging? :-) -- AM.
Re: Problem - Bringing up the HBase cluster
Hi, Please use the user mailing list (added at dest) for this type of questions instead of the dev list (now in bcc). It's a little bit strange to use the full distributed mode with a single region server. Is the Pseudo-distributed mode working? Check the number of datanodes vs. dfs.replication (default 3). If you have less datanodes then dfs.replication value, it won't work properly. Check as well that the region server is connected to the master. Cheers, On Wed, Aug 22, 2012 at 3:16 AM, kbmkumar kbmku...@gmail.com wrote: Hi, I am trying to bring up a HBase cluster with 1 master and 1 one region server. I am using Hadoop 1.0.3 Hbase 0.94.1 Starting the hdfs was straight forward and i could see the namenode up and running successfully. But the problem is with Hbase. I followed all the guidelines given in the Hbase cluster setup (fully distributed mode) and ran the start-hbase.sh It started the Master, Region server and zookeeper (in the region server) as per my configuration. But i am not sure the master is fully functional. When i try to connect hbase shell and create table, it errors out saying PleaseHoldException- Master is initializing In UI HMaster status shows like this *Assigning META region (since 18mins, 39sec ago)* and i see the Hmaster logs are flowing with the following debug prints, the log file is full of below prints, * 2012-08-22 01:08:19,637 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Looked up root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@49586cbd; serverName=hadoop-datanode1,60020,1345596463277 2012-08-22 01:08:19,638 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Looked up root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@49586cbd; serverName=hadoop-datanode1,60020,1345596463277 2012-08-22 01:08:19,639 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Looked up root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@49586cbd; serverName=hadoop-datanode1,60020,1345596463277* Please help me in debugging this. -- View this message in context: http://apache-hbase.679495.n3.nabble.com/Problem-Bringing-up-the-HBase-cluster-tp4019948.html Sent from the HBase - Developer mailing list archive at Nabble.com.
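A quick sketch of the datanodes-vs-dfs.replication check from Java (normally the NameNode web UI or 'hadoop dfsadmin -report' gives the same information); getDataNodeStats() is the Hadoop 1.x DistributedFileSystem call:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class CheckReplication {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    if (fs instanceof DistributedFileSystem) {
      int liveDatanodes = ((DistributedFileSystem) fs).getDataNodeStats().length;
      int replication = conf.getInt("dfs.replication", 3);
      System.out.println("datanodes=" + liveDatanodes + " dfs.replication=" + replication);
      if (liveDatanodes < replication) {
        // Writes cannot be fully replicated; HBase will not work properly.
        System.out.println("Fewer datanodes than dfs.replication");
      }
    }
    fs.close();
  }
}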
Re: Hbase Shell: UnsatisfiedLinkError
Hi, Well the first steps would be: 1) Use the JDK 1.6 from Oracle. 1.7 is not supported yet. 2) Check the content of http://hbase.apache.org/book.html#configuration to set up your first cluster. Worth reading the whole guide imho. 3) Start with the last released version (.94), except if you have a good reason to use the .90 of course. 4) Use the user mailing list for this type of questions and not the dev one. :-). I kept dev in bcc. Good luck, N. On Wed, Aug 22, 2012 at 12:25 PM, o brbrs obr...@gmail.com wrote: Hi, I'm new at hbase. I installed Hadoop 1.0.3 and Hbase 0.90.6 with Java 1.7.0 on Ubuntu 12.04. When I run hbase shell command, this error occures: $ /usr/local/hbase/bin/hbase shell java.lang.RuntimeException: java.lang.UnsatisfiedLinkError: Could not locate stub library in jar file. Tried [jni/ı386-Linux/libjffi-1.0.so, /jni/ı386-Linux/libjffi-1.0.so] at com.kenai.jffi.Foreign$InValidInstanceHolder.getForeign(Foreign.java:90) at com.kenai.jffi.Foreign.getInstance(Foreign.java:95) at com.kenai.jffi.Library.openLibrary(Library.java:151) at com.kenai.jffi.Library.getCachedInstance(Library.java:125) at com.kenai.jaffl.provider.jffi.Library.loadNativeLibraries(Library.java:66) at com.kenai.jaffl.provider.jffi.Library.getNativeLibraries(Library.java:56) at com.kenai.jaffl.provider.jffi.Library.getSymbolAddress(Library.java:35) at com.kenai.jaffl.provider.jffi.Library.findSymbolAddress(Library.java:45) at com.kenai.jaffl.provider.jffi.AsmLibraryLoader.generateInterfaceImpl(AsmLibraryLoader.java:188) at com.kenai.jaffl.provider.jffi.AsmLibraryLoader.loadLibrary(AsmLibraryLoader.java:110) . What is the reason of this error? Please help. Thanks... -- ... Obrbrs
Re: Problem - Bringing up the HBase cluster
If you have a single datanode with a replication of two, it will (basically) won't work, as it will try to replicate the blocks on two datanodes while there is only one available. Note that I'm speaking about datanodes (i.e. hdfs) and not region servers (i.e. hbase). pastebin the full logs with the region server, may be someone will have an idea of the root issue. But I think it's safer to start with the pseudo distributed, it's easier to setup and it's documented. A distributed config with a single node is not really standard, it's better to start with the easiest path imho. On Wed, Aug 22, 2012 at 5:43 PM, Jothikumar Ekanath kbmku...@gmail.com wrote: Hi, Thanks for the response, sorry i put this email in the dev space. My data replication is 2. and yes the region and master server connectivity is good Initially i started with 4 data nodes and 1 master, i faced the same problem. So i reduced the data nodes to 1 and wanted to test it. I see the same issue. I haven't tested the pseudo distribution mode, i can test that. But my objective is to test the full distributed mode and do some testing. I can send my configuration for review. Please let me know if i am missing any basic setup configuration. On Wed, Aug 22, 2012 at 12:00 AM, N Keywal nkey...@gmail.com wrote: Hi, Please use the user mailing list (added at dest) for this type of questions instead of the dev list (now in bcc). It's a little bit strange to use the full distributed mode with a single region server. Is the Pseudo-distributed mode working? Check the number of datanodes vs. dfs.replication (default 3). If you have less datanodes then dfs.replication value, it won't work properly. Check as well that the region server is connected to the master. Cheers, On Wed, Aug 22, 2012 at 3:16 AM, kbmkumar kbmku...@gmail.com wrote: Hi, I am trying to bring up a HBase cluster with 1 master and 1 one region server. I am using Hadoop 1.0.3 Hbase 0.94.1 Starting the hdfs was straight forward and i could see the namenode up and running successfully. But the problem is with Hbase. I followed all the guidelines given in the Hbase cluster setup (fully distributed mode) and ran the start-hbase.sh It started the Master, Region server and zookeeper (in the region server) as per my configuration. But i am not sure the master is fully functional. When i try to connect hbase shell and create table, it errors out saying PleaseHoldException- Master is initializing In UI HMaster status shows like this *Assigning META region (since 18mins, 39sec ago)* and i see the Hmaster logs are flowing with the following debug prints, the log file is full of below prints, * 2012-08-22 01:08:19,637 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Looked up root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@49586cbd; serverName=hadoop-datanode1,60020,1345596463277 2012-08-22 01:08:19,638 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Looked up root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@49586cbd; serverName=hadoop-datanode1,60020,1345596463277 2012-08-22 01:08:19,639 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Looked up root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@49586cbd; serverName=hadoop-datanode1,60020,1345596463277* Please help me in debugging this. 
-- View this message in context: http://apache-hbase.679495.n3.nabble.com/Problem-Bringing-up-the-HBase-cluster-tp4019948.html Sent from the HBase - Developer mailing list archive at Nabble.com.
Re: after region split, client didnt get result after timeout setting,so the cachedLocation didnot update, client still query the old region id
Hi, What are your queries exactly? What's the HBase version? The mechanism is:
- There is a location cache, per HConnection, on the client
- The client first tries the region server in its cache
- If it fails, the client removes this entry from the cache and enters the retry loop
- There is a limited number of retries and a sleep between the retries
- Most of the time, the client will connect to meta to get the new location
When there are multiple queries, before HBASE-5924, the errors will be analyzed after the other region servers have returned as well. It could be an explanation. HBASE-5877 exists as well, but only for moves, not for splits... Cheers, N.

On Fri, Aug 10, 2012 at 11:26 AM, deanforwever2010 deanforwever2...@gmail.com wrote: In the region server's log: 2012-08-10 11:49:50,796 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; Region is not online: test_list,zWPpyme,1342510667492.91486e7fa0ac39048276848a2618479b. After the region split, the client didn't get a result within the timeout setting (1.5 seconds), then the task was canceled by my program, so the HConnectionManager didn't delete the cachedLocation; the client still queries the old region id, which no longer exists. Moreover, part of my processes updated the region location info, part did not. I'm sure the network is fine; how to fix the problem? Why does it take so long to detect the new regions?
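If the application enforces its own 1.5 s deadline, the default retry settings cannot finish the retry loop described above inside that deadline. A sketch of the relevant client-side knobs (the values are illustrative, not recommendations):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class FastFailClientConf {
  public static Configuration create() {
    Configuration conf = HBaseConfiguration.create();
    conf.setInt("hbase.client.retries.number", 3); // fewer retries so failures surface quickly
    conf.setLong("hbase.client.pause", 100);       // ms slept between retries
    conf.setInt("hbase.rpc.timeout", 1000);        // ms allowed per RPC
    return conf;
  }
}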
Re: after region split, client didnt get result after timeout setting,so the cachedLocation didnot update, client still query the old region id
If it's a single row, I would expect the server to return the error immediately. Then you will have the sleep I was mentioning previously, but the cache should be cleaned before the sleep... On Fri, Aug 10, 2012 at 1:32 PM, deanforwever2010 deanforwever2...@gmail.com wrote: hi, Keywal my hbase version is 0.94, my query is just to get limited columns of a row, I make a callable task of 1.5 seconds, so maybe it didnot fail but canceled by my process,so the region cache didnot clear after many requests happened. my question is why should it take so long time for failure? and it behave different between my servers, and there is no problem with network. 2012/8/10 N Keywal nkey...@gmail.com Hi, What are your queries exactly? What's the HBase version? The mechanism is: - There is a location cache, per HConnection, on the client - The client first tries the region server in its cache - if it fails, the client removes this entry from the cache and enters the retry loop - there is a limited amount of retries and a sleep between the retries - most of the times, the client will connect to meta to get the new location When there are multiple queries, before HBASE-5924, the errors will be analyzed after the other regions servers has returned as well. It could be an explanation. HBASE-5877 exists as well, but only for moves, not for splits... Cheers, N. On Fri, Aug 10, 2012 at 11:26 AM, deanforwever2010 deanforwever2...@gmail.com wrote: on the region server's log :2012-08-10 11:49:50,796 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; Region is not online: test_list,zWPpyme,1342510667492.91486e7fa0ac39048276848a2618479b. after region split, client didnt get result after timeout setting(1.5 second),then the task is canceled by my program, so the HConnectionManager didnt delete the cachedLocation; the client still query the old region id which is no more exists And more, part of my processes updated the region location info, part not.I'm sure the network is fine; how to fix the problem?why does it need so long time to detect the new regions?
Re: HBaseTestingUtility on windows
Hi Mohit, For simple cases, it works for me for hbase 0.94 at least. But I'm not sure it works for all features. I've never tried to run hbase unit tests on windows for example. N. On Fri, Aug 3, 2012 at 6:01 AM, Mohit Anchlia mohitanch...@gmail.com wrote: I am trying to run mini cluster using HBaseTestingUtility Class from hbase tests on windows, but I get bash command error. Is it not possible to run this utility class on windows? I followed this example: http://blog.sematext.com/2010/08/30/hbase-case-study-using-hbasetestingutility-for-local-testing-development/
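For reference, the kind of "simple case" meant above, as it works on 0.94 on Linux (a sketch; whether the mini cluster scripts behave on Windows is exactly the open question here, and the table/column names are illustrative):

import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class MiniClusterSmokeTest {
  public static void main(String[] args) throws Exception {
    HBaseTestingUtility util = new HBaseTestingUtility();
    util.startMiniCluster();
    try {
      HTable table = util.createTable(Bytes.toBytes("t"), Bytes.toBytes("f"));
      Put put = new Put(Bytes.toBytes("row1"));
      put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("value"));
      table.put(put);
      byte[] value = table.get(new Get(Bytes.toBytes("row1"))).getValue(Bytes.toBytes("f"), Bytes.toBytes("q"));
      System.out.println(Bytes.toString(value)); // prints "value"
      table.close();
    } finally {
      util.shutdownMiniCluster();
    }
  }
}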
Re: hbase can't start:KeeperErrorCode = NoNode for /hbase
Hi, The issue is in ZooKeeper, not directly HBase. It seems its data is corrupted, so it cannot start. You can configure zookeeper to another data directory to make it start. N.

On Thu, Aug 2, 2012 at 11:11 AM, abloz...@gmail.com abloz...@gmail.com wrote: I even move /hbase to hbase2, and create a new dir /hbase1, modify hbase-site.xml to:
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://Hadoop48:54310/hbase1</value>
</property>
<property>
  <name>zookeeper.znode.parent</name>
  <value>/hbase1</value>
</property>
But the error message still KeeperErrorCode = NoNode for /hbase Any body can give any help? Thanks! Andy zhou

2012/8/2 abloz...@gmail.com abloz...@gmail.com hi all, After I killed all java process, I can't restart hbase, it reports:
Hadoop46: starting zookeeper, logging to /home/zhouhh/hbase-0.94.0/logs/hbase-zhouhh-zookeeper-Hadoop46.out
Hadoop47: starting zookeeper, logging to /home/zhouhh/hbase-0.94.0/logs/hbase-zhouhh-zookeeper-Hadoop47.out
Hadoop48: starting zookeeper, logging to /home/zhouhh/hbase-0.94.0/logs/hbase-zhouhh-zookeeper-Hadoop48.out
Hadoop46: java.lang.RuntimeException: Unable to run quorum server
Hadoop46: at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:454)
Hadoop46: at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
Hadoop46: at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
Hadoop46: at org.apache.hadoop.hbase.zookeeper.HQuorumPeer.runZKServer(HQuorumPeer.java:74)
Hadoop46: at org.apache.hadoop.hbase.zookeeper.HQuorumPeer.main(HQuorumPeer.java:64)
Hadoop46: Caused by: java.io.IOException: Failed to process transaction type: 1 error: KeeperErrorCode = NoNode for /hbase
Hadoop46: at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:151)
Hadoop46: at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
Hadoop46: at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
Hadoop47: java.lang.RuntimeException: Unable to run quorum server
Hadoop47: at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:454)
Hadoop47: at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
Hadoop47: at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
Hadoop47: at org.apache.hadoop.hbase.zookeeper.HQuorumPeer.runZKServer(HQuorumPeer.java:74)
Hadoop47: at org.apache.hadoop.hbase.zookeeper.HQuorumPeer.main(HQuorumPeer.java:64)
Hadoop47: Caused by: java.io.IOException: Failed to process transaction type: 1 error: KeeperErrorCode = NoNode for /hbase
Hadoop47: at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:151)
Hadoop47: at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
Hadoop47: at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
while Hadoop48 is HMaster. but hdfs://xxx/hbase is existed.
[zhouhh@Hadoop47 ~]$ hadoop fs -ls /hbase
Found 113 items
drwxr-xr-x - zhouhh supergroup 0 2012-07-03 19:24 /hbase/-ROOT-
drwxr-xr-x - zhouhh supergroup 0 2012-07-03 19:24 /hbase/.META.
...
So what's the problem? Thanks! andy
Re: Region Server failure due to remote data node errors
Hi Jay, Yes, the whole log would be interesting, plus the logs of the datanode on the same box as the dead RS. What's your hbase hdfs versions? The RS should be immune to hdfs errors. There are known issues (see HDFS-3701), but it seems you have something different... This: java.nio.channels.SocketChannel[connected local=/10.128.204.225:52949 remote=/10.128.204.225:50010] Seems to say that the error was between the datanode on the same box as the RS? Nicolas On Mon, Jul 30, 2012 at 6:43 PM, Jay T jay.pyl...@gmail.com wrote: A couple of our region servers (in a 16 node cluster) crashed due to underlying Data Node errors. I am trying to understand how errors on remote data nodes impact other region server processes. *To briefly describe what happened: * 1) Cluster was in operation. All 16 nodes were up, reads and writes were happening extensively. 2) Nodes 7 and 8 were shutdown for maintenance. (No graceful shutdown DN and RS service were running and the power was just pulled out) 3) Nodes 2 and 5 flushed and DFS client started reporting errors. From the log it seems like DFS blocks were being replicated to the nodes that were shutdown (7 and 8) and since replication could not go through successfully DFS client raised errors on 2 and 5 and eventually the RS itself died. The question I am trying to get an answer for is : Is a Region Server immune from remote data node errors (that are part of the replication pipeline) or not. ? * Part of the Region Server Log:* (Node 5) 2012-07-26 18:53:15,245 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 10.128.204.225:50010 java.io.IOException: Bad connect ack with firstBadLink as 10.128.204.228:50010 2012-07-26 18:53:15,245 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-316956372096761177_489798 2012-07-26 18:53:15,246 INFO org.apache.hadoop.hdfs.DFSClient: Excluding datanode 10.128.204.228:50010 2012-07-26 18:53:16,903 INFO org.apache.hadoop.hbase.regionserver.StoreFile: NO General Bloom and NO DeleteFamily was added to HFile (hdfs://Node101:8020/hbase/table/754de060 c9d96286e0c8cd200716ffde/.tmp/26f5cd1fb2cb4547972a31073d2da124) 2012-07-26 18:53:16,903 INFO org.apache.hadoop.hbase.regionserver.Store: Flushed , sequenceid=4046717645, memsize=256.5m, into tmp file hdfs://Node101:8020/hbase/table/754de0 60c9d96286e0c8cd200716ffde/.tmp/26f5cd1fb2cb4547972a31073d2da1242012-07-26 18:53:16,907 DEBUG org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://Node101:8020/hbase/table/754de060c9d96286e0c8cd200716ffde/.tmp/26f5c d1fb2cb4547972a31073d2da124 to hdfs://Node101:8020/hbase/table/754de060c9d96286e0c8cd200716ffde/CF/26f5cd1fb2cb4547972a31073d2da124 2012-07-26 18:53:16,921 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://Node101:8020/hbase/table/754de060c9d96286e0c8cd200716ffde/CF/26f5cd1fb2cb4547972a31073d2d a124, entries=1137956, sequenceid=4046717645, filesize=13.2m2012-07-26 18:53:32,048 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.net.SocketTimeoutException: 15000 millis timeout while waiting for channel to be ready for write. 
ch : java.nio.channels.SocketChannel[connected local=/10.128.204.225:52949 remote=/10.128.204.225:50010] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105) at java.io.DataOutputStream.write(DataOutputStream.java:90) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2857) 2012-07-26 18:53:32,049 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_5116092240243398556_489796 bad datanode[0] 10.128.204.225:50010 2012-07-26 18:53:32,049 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_5116092240243398556_489796 in pipeline 10.128.204.225:50010, 10.128.204.221:50010, 10.128.204.227:50010: bad datanode 10.128.204.225:50010 I can pastebin the entire log but this is when things started going wrong for Node 5 and eventually shutdown hook for RS started and the RS was shutdown. Any help in troubleshooting this is greatly appreciated. Thanks, Jay
Re: Region Server failure due to remote data node errors
Hi Jay, As you said already, the pipeline for blk_5116092240243398556_489796 contains only dead nodes, and this is likely the cause of the wrong behavior. This block is used by an hlog file, created just before the error. I don't get why there are 3 nodes in the pipeline, I would expect only 2. Do you have a specific setting for dfs.replication? Log files are specific: HBase checks that the replication really occurs by checking the replication count, and closes them if it's not ok. But it seems that all the nodes are dead from the start, and this could be ill-managed in HBase. Reproducing this may be difficult, but should be possible. Then the region server is stopped, but I didn't see in the logs what the path for this was, so it's surprising to say the least. After this, all the errors on 'already closed' are not that critical imho: the close will fail as hdfs closes the file when it cannot recover from an error. I guess your question is still open. But from what I see it could be an HBase bug. I will be interested to know the conclusions of your analysis... Nicolas On Mon, Jul 30, 2012 at 8:01 PM, Jay T jay.pyl...@gmail.com wrote: Thanks for the quick reply Nicolas. We are using HBase 0.94 on Hadoop 1.0.3. I have uploaded the logs here: Region Server log: http://pastebin.com/QEQ22UnU Data Node log: http://pastebin.com/DF0JNL8K Appreciate your help in figuring this out. Thanks, Jay On 7/30/12 1:02 PM, N Keywal wrote: Hi Jay, Yes, the whole log would be interesting, plus the logs of the datanode on the same box as the dead RS. What's your hbase hdfs versions? The RS should be immune to hdfs errors. There are known issues (see HDFS-3701), but it seems you have something different... This: java.nio.channels.SocketChannel[connected local=/10.128.204.225:52949 remote=/10.128.204.225:50010] Seems to say that the error was between the datanode on the same box as the RS? Nicolas On Mon, Jul 30, 2012 at 6:43 PM, Jay Tjay.pyl...@gmail.com wrote: A couple of our region servers (in a 16 node cluster) crashed due to underlying Data Node errors. I am trying to understand how errors on remote data nodes impact other region server processes. *To briefly describe what happened: * 1) Cluster was in operation. All 16 nodes were up, reads and writes were happening extensively. 2) Nodes 7 and 8 were shutdown for maintenance. (No graceful shutdown DN and RS service were running and the power was just pulled out) 3) Nodes 2 and 5 flushed and DFS client started reporting errors. From the log it seems like DFS blocks were being replicated to the nodes that were shutdown (7 and 8) and since replication could not go through successfully DFS client raised errors on 2 and 5 and eventually the RS itself died. The question I am trying to get an answer for is : Is a Region Server immune from remote data node errors (that are part of the replication pipeline) or not. ?
* Part of the Region Server Log:* (Node 5) 2012-07-26 18:53:15,245 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 10.128.204.225:50010 java.io.IOException: Bad connect ack with firstBadLink as 10.128.204.228:50010 2012-07-26 18:53:15,245 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-316956372096761177_489798 2012-07-26 18:53:15,246 INFO org.apache.hadoop.hdfs.DFSClient: Excluding datanode 10.128.204.228:50010 2012-07-26 18:53:16,903 INFO org.apache.hadoop.hbase.regionserver.StoreFile: NO General Bloom and NO DeleteFamily was added to HFile (hdfs://Node101:8020/hbase/table/754de060 c9d96286e0c8cd200716ffde/.tmp/26f5cd1fb2cb4547972a31073d2da124) 2012-07-26 18:53:16,903 INFO org.apache.hadoop.hbase.regionserver.Store: Flushed , sequenceid=4046717645, memsize=256.5m, into tmp file hdfs://Node101:8020/hbase/table/754de0 60c9d96286e0c8cd200716ffde/.tmp/26f5cd1fb2cb4547972a31073d2da1242012-07-26 18:53:16,907 DEBUG org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://Node101:8020/hbase/table/754de060c9d96286e0c8cd200716ffde/.tmp/26f5c d1fb2cb4547972a31073d2da124 to hdfs://Node101:8020/hbase/table/754de060c9d96286e0c8cd200716ffde/CF/26f5cd1fb2cb4547972a31073d2da124 2012-07-26 18:53:16,921 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://Node101:8020/hbase/table/754de060c9d96286e0c8cd200716ffde/CF/26f5cd1fb2cb4547972a31073d2d a124, entries=1137956, sequenceid=4046717645, filesize=13.2m2012-07-26 18:53:32,048 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.net.SocketTimeoutException: 15000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.128.204.225:52949 remote=/10.128.204.225:50010] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146
Re: Lowering HDFS socket timeouts
Hi Bryan, It's a difficult question, because dfs.socket.timeout is used all over the place in hdfs. I'm currently documenting this. Especially: - it's used for connections between datanodes, and not only for connections between hdfs clients and hdfs datanodes. - It's also used for the two types of datanode connections (ports being 50010 and 50020 by default). - It's used as a connect timeout, but also as a read timeout (socket is connected, but the application does not write for a while). - It's used with various extensions, so when you're seeing stuff like 69000 or 66000 it's often the same setting: timeout + 3s (hardcoded) * #replicas. For a single datanode issue, with everything going well, it will make the cluster much more reactive: hbase will go to another node immediately instead of waiting. But it will make it much more sensitive to gc and network issues. If you have a major hardware issue, something like 10% of your cluster going down, this setting will multiply the number of retries, and will add a lot of workload to your already damaged cluster, and this could make things worse. This said, I think we will need to make it shorter sooner or later, so if you do it on your cluster, it will be helpful... N. On Tue, Jul 17, 2012 at 7:11 PM, Bryan Beaudreault bbeaudrea...@gmail.com wrote: Today I needed to restart one of my region servers, and did so without gracefully shutting down the datanode. For the next 1-2 minutes we had a bunch of failed queries from various other region servers trying to access that datanode. Looking at the logs, I saw that they were all socket timeouts after 60000 milliseconds. We use HBase mostly as an online datastore, with various APIs powering various web apps and external consumers. Writes come from both the API in some cases, but we have continuous hadoop jobs feeding data in as well. Since we have web app consumers, this 60 second timeout seems unreasonably long. If a datanode goes down, ideally the impact would be much smaller than that. I want to lower the dfs.socket.timeout to something like 5-10 seconds, but do not know the implications of this. In googling I did not find much precedent for this, but I did find some people talking about upping the timeout to much longer than 60 seconds.
Re: Lowering HDFS socket timeouts
I don't know. The question is mainly for the read time out: you will connect to the ipc.Client with a read timeout of let say 10s. Server side the implementation may do something with another server, with a connect read timeout of 60s. So if you have: HBase -- live DN -- dead DN The timeout will be triggered in HBase while the live DN is still waiting for the answer from the dead dn. It could even retry on another node. On paper, this should work, as this could happen in real life without changing the dfs timeouts.. And may be this case does not even exist. But as the extension mechanism is designed to add some extra seconds, it could exist for this reason or something alike. Worth asking on the hdfs mailing list I would say. On Wed, Jul 18, 2012 at 4:28 PM, Bryan Beaudreault bbeaudrea...@hubspot.com wrote: Thanks for the response, N. I could be wrong here, but since this problem is in the HDFS client code, couldn't I set this dfs.socket.timeout in my hbase-site.xml and it would only affect hbase connections to hdfs? I.e. we wouldn't have to worry about affecting connections between datanodes, etc. -- Bryan Beaudreault On Wednesday, July 18, 2012 at 4:38 AM, N Keywal wrote: Hi Bryan, It's a difficult question, because dfs.socket.timeout is used all over the place in hdfs. I'm currently documenting this. Especially: - it's used for connections between datanodes, and not only for connections between hdfs clients hdfs datanodes. - It's also used for the two types of datanodes connection (ports beeing 50010 50020 by default). - It's used as a connect timeout, but as well as a read timeout (socket is connected, but the application does not write for a while) - It's used with various extensions, so when your seeing stuff like 69000 or 66000 it's often the same setting timeout + 3s (hardcoded) * #replica For a single datanode issue, with everything going well, it will make the cluster much more reactive: hbase will go to another node immediately instead of waiting. But it will make it much more sensitive to gc and network issues. If you have a major hardware issue, something like 10% of your cluster going down, this setting will multiply the number of retries, and will add a lot of workload to your already damaged cluster, and this could make the things worse. This said, I think we will need to make it shorter sooner or later, so if you do it on your cluster, it will be helpful... N. On Tue, Jul 17, 2012 at 7:11 PM, Bryan Beaudreault bbeaudrea...@gmail.com (mailto:bbeaudrea...@gmail.com) wrote: Today I needed to restart one of my region servers, and did so without gracefully shutting down the datanode. For the next 1-2 minutes we had a bunch of failed queries from various other region servers trying to access that datanode. Looking at the logs, I saw that they were all socket timeouts after 6 milliseconds. We use HBase mostly as an online datastore, with various APIs powering various web apps and external consumers. Writes come from both the API in some cases, but we have continuous hadoop jobs feeding data in as well. Since we have web app consumers, this 60 second timeout seems unreasonably long. If a datanode goes down, ideally the impact would be much smaller than that. I want to lower the dfs.socket.timeout to something like 5-10 seconds, but do not know the implications of this. In googling I did not find much precedent for this, but I did find some people talking about upping the timeout to much longer than 60 seconds. 
Is it generally safe to lower this timeout dramatically if you want faster failures? Are there any downsides to this? Thanks -- Bryan Beaudreault
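If the goal is to change only HBase's own DFSClient, as Bryan suggests, a hedged sketch of the client-side override in the hbase-site.xml used by the region servers (the 10s value is illustrative; datanode-to-datanode timeouts configured in hdfs-site.xml are not touched, which is exactly the caveat discussed above):

    <property>
      <name>dfs.socket.timeout</name>
      <value>10000</value> <!-- milliseconds, read by HBase's HDFS client -->
    </property>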
Re: Maximum number of tables ?
Hi, There is no real limit as far as I know. As you will have one region per table (at least :-), the number of regions will be something to monitor carefully if you need thousands of tables. See http://hbase.apache.org/book.html#arch.regions.size. Don't forget that you can add as many columns as you want, and that an empty cell costs nothing. For example, a class hierarchy is often mapped to multiple tables in an RDBMS, while in HBase having a single table for the same hierarchy makes much more sense. Moreover, there are no transactions between tables, so sometimes a 'uml composition' will go to a single table. And so on. N. On Fri, Jul 13, 2012 at 9:04 AM, Adrien Mogenet adrien.moge...@gmail.com wrote: Hi there, I read some good practices about the number of columns / column families, but nothing about the number of tables. What if I need to spread my data among hundreds or thousands of (big) tables? What should I care about? I guess I should keep a tight number of storeFiles per RegionServer? -- Adrien Mogenet http://www.mogenet.me
Re: HBaseClient recovery from .META. server power down
Thanks for the jira. The client can be connected to multiple RS, depending on the rows is working on. So yes it's initial, but it's a dynamic initial :-). This said there is a retry on error... On Tue, Jul 10, 2012 at 6:46 PM, Suraj Varma svarma...@gmail.com wrote: I will create a JIRA ticket ... The only side-effect I could think of is ... if a RS is having a GC of a few seconds, any _new_ client trying to connect would get connect failures. So ... the _initial_ connection to the RS is what would suffer from a super-low setting of the ipc.socket.timeout. This was my read of the code. So - was hoping to get a confirmation if this is the only side effect. Again - this is on the client side - I wouldn't risk doing this on the cluster side ... --Suraj On Mon, Jul 9, 2012 at 9:44 AM, N Keywal nkey...@gmail.com wrote: Hi, What you're describing -the 35 minutes recovery time- seems to match the code. And it's a bug (still there on trunk). Could you please create a jira for it? If you have the logs it even better. Lowering the ipc.socket.timeout seems to be an acceptable partial workaround. Setting it to 10s seems ok to me. Lower than this... I don't know. N. On Mon, Jul 9, 2012 at 6:16 PM, Suraj Varma svarma...@gmail.com wrote: Hello: I'd like to get advice on the below strategy of decreasing the ipc.socket.timeout configuration on the HBase Client side ... has anyone tried this? Has anyone had any issues with configuring this lower than the default 20s? Thanks, --Suraj On Mon, Jul 2, 2012 at 5:51 PM, Suraj Varma svarma...@gmail.com wrote: By power down below, I mean powering down the host with the RS that holds the .META. table. (So - essentially, the host IP is unreachable and the RS/DN is gone.) Just wanted to clarify my below steps ... --S On Mon, Jul 2, 2012 at 5:36 PM, Suraj Varma svarma...@gmail.com wrote: Hello: We've been doing some failure scenario tests by powering down a .META. holding region server host and while the HBase cluster itself recovers and reassigns the META region and other regions (after we tweaked down the default timeouts), our client apps using HBaseClient take a long time to recover. hbase-0.90.6 / cdh3u4 / JDK 1.6.0_23 Process: 1) Apply load via client app on HBase cluster for several minutes 2) Power down the region server holding the .META. server 3) Measure how long it takes for cluster to reassign META table and for client threads to re-lookup and re-orient to the lesser cluster (minus the RS and DN on that host). What we see: 1) Client threads spike up to maxThread size ... and take over 35 mins to recover (i.e. for the thread count to go back to normal) - no calls are being serviced - they are all just backed up on a synchronized method ... 2) Essentially, all the client app threads queue up behind the HBaseClient.setupIOStreams method in oahh.ipc.HBaseClient (http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hbase/hbase/0.90.2/org/apache/hadoop/hbase/ipc/HBaseClient.java#312). http://tinyurl.com/7js53dj After taking several thread dumps we found that the thread within this synchronized method was blocked on NetUtils.connect(this.socket, remoteId.getAddress(), getSocketTimeout(conf)); Essentially, the thread which got the lock would try to connect to the dead RS (till socket times out), retrying, and then the next thread gets in and so forth. Solution tested: --- So - the ipc.HBaseClient code shows ipc.socket.timeout default is 20s. We dropped this down to a low number (1000 ms, 100 ms, etc) and the recovery was much faster (in a couple of minutes). 
So - we're thinking of setting the HBase client side hbase-site.xml with an ipc.socket.timeout of 100ms. Looking at the code, it appears that this is only ever used during the initial HConnection setup via the NetUtils.connect and should only ever be used when connectivity to a region server is lost and needs to be re-established. i.e. it does not affect the normal RPC activity as this is just the connect timeout. Am I reading the code right? Any thoughts on whether this is too low for comfort? (Our internal tests did not show any errors during normal operation related to timeouts etc ... but, I just wanted to run this by the experts.). Note that this above timeout tweak is only on the HBase client side. Thanks, --Suraj
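A hedged sketch of the client-side workaround discussed here, in the hbase-site.xml shipped with the client application only (10000 ms is the value Nicolas considers safe below; how much lower one can go is the open question):

    <property>
      <name>ipc.socket.timeout</name>
      <value>10000</value> <!-- milliseconds; connect timeout used by the HBase client IPC layer -->
    </property>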
Re: HBaseClient recovery from .META. server power down
I expect (without double checking the path in the code ;-) that the code in HConnectionManager will retry. On Tue, Jul 10, 2012 at 7:22 PM, Suraj Varma svarma...@gmail.com wrote: Yes. On the maxRetries, though ... I saw the code (http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hbase/hbase/0.90.2/org/apache/hadoop/hbase/ipc/HBaseClient.java#677) show this.maxRetries = conf.getInt(hbase.ipc.client.connect.max.retries, 0); So - looks like by default, the maxRetries is set to 0? So ... there is effectively no retry (i.e. it is fail-fast) --Suraj On Tue, Jul 10, 2012 at 10:12 AM, N Keywal nkey...@gmail.com wrote: Thanks for the jira. The client can be connected to multiple RS, depending on the rows is working on. So yes it's initial, but it's a dynamic initial :-). This said there is a retry on error... On Tue, Jul 10, 2012 at 6:46 PM, Suraj Varma svarma...@gmail.com wrote: I will create a JIRA ticket ... The only side-effect I could think of is ... if a RS is having a GC of a few seconds, any _new_ client trying to connect would get connect failures. So ... the _initial_ connection to the RS is what would suffer from a super-low setting of the ipc.socket.timeout. This was my read of the code. So - was hoping to get a confirmation if this is the only side effect. Again - this is on the client side - I wouldn't risk doing this on the cluster side ... --Suraj On Mon, Jul 9, 2012 at 9:44 AM, N Keywal nkey...@gmail.com wrote: Hi, What you're describing -the 35 minutes recovery time- seems to match the code. And it's a bug (still there on trunk). Could you please create a jira for it? If you have the logs it even better. Lowering the ipc.socket.timeout seems to be an acceptable partial workaround. Setting it to 10s seems ok to me. Lower than this... I don't know. N. On Mon, Jul 9, 2012 at 6:16 PM, Suraj Varma svarma...@gmail.com wrote: Hello: I'd like to get advice on the below strategy of decreasing the ipc.socket.timeout configuration on the HBase Client side ... has anyone tried this? Has anyone had any issues with configuring this lower than the default 20s? Thanks, --Suraj On Mon, Jul 2, 2012 at 5:51 PM, Suraj Varma svarma...@gmail.com wrote: By power down below, I mean powering down the host with the RS that holds the .META. table. (So - essentially, the host IP is unreachable and the RS/DN is gone.) Just wanted to clarify my below steps ... --S On Mon, Jul 2, 2012 at 5:36 PM, Suraj Varma svarma...@gmail.com wrote: Hello: We've been doing some failure scenario tests by powering down a .META. holding region server host and while the HBase cluster itself recovers and reassigns the META region and other regions (after we tweaked down the default timeouts), our client apps using HBaseClient take a long time to recover. hbase-0.90.6 / cdh3u4 / JDK 1.6.0_23 Process: 1) Apply load via client app on HBase cluster for several minutes 2) Power down the region server holding the .META. server 3) Measure how long it takes for cluster to reassign META table and for client threads to re-lookup and re-orient to the lesser cluster (minus the RS and DN on that host). What we see: 1) Client threads spike up to maxThread size ... and take over 35 mins to recover (i.e. for the thread count to go back to normal) - no calls are being serviced - they are all just backed up on a synchronized method ... 
2) Essentially, all the client app threads queue up behind the HBaseClient.setupIOStreams method in oahh.ipc.HBaseClient (http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hbase/hbase/0.90.2/org/apache/hadoop/hbase/ipc/HBaseClient.java#312). http://tinyurl.com/7js53dj After taking several thread dumps we found that the thread within this synchronized method was blocked on NetUtils.connect(this.socket, remoteId.getAddress(), getSocketTimeout(conf)); Essentially, the thread which got the lock would try to connect to the dead RS (till socket times out), retrying, and then the next thread gets in and so forth. Solution tested: --- So - the ipc.HBaseClient code shows ipc.socket.timeout default is 20s. We dropped this down to a low number (1000 ms, 100 ms, etc) and the recovery was much faster (in a couple of minutes). So - we're thinking of setting the HBase client side hbase-site.xml with an ipc.socket.timeout of 100ms. Looking at the code, it appears that this is only ever used during the initial HConnection setup via the NetUtils.connect and should only ever be used when connectivity to a region server is lost and needs to be re-established. i.e it does not affect the normal RPC actiivity as this is just the connect timeout. Am I reading the code right? Any thoughts on how whether this is too low for comfort? (Our internal tests did not show any errors during normal operation related to timeouts etc ... but, I just wanted to run this by the experts.). Note that this above
Re: HBaseClient recovery from .META. server power down
Hi, What you're describing -the 35 minutes recovery time- seems to match the code. And it's a bug (still there on trunk). Could you please create a jira for it? If you have the logs it even better. Lowering the ipc.socket.timeout seems to be an acceptable partial workaround. Setting it to 10s seems ok to me. Lower than this... I don't know. N. On Mon, Jul 9, 2012 at 6:16 PM, Suraj Varma svarma...@gmail.com wrote: Hello: I'd like to get advice on the below strategy of decreasing the ipc.socket.timeout configuration on the HBase Client side ... has anyone tried this? Has anyone had any issues with configuring this lower than the default 20s? Thanks, --Suraj On Mon, Jul 2, 2012 at 5:51 PM, Suraj Varma svarma...@gmail.com wrote: By power down below, I mean powering down the host with the RS that holds the .META. table. (So - essentially, the host IP is unreachable and the RS/DN is gone.) Just wanted to clarify my below steps ... --S On Mon, Jul 2, 2012 at 5:36 PM, Suraj Varma svarma...@gmail.com wrote: Hello: We've been doing some failure scenario tests by powering down a .META. holding region server host and while the HBase cluster itself recovers and reassigns the META region and other regions (after we tweaked down the default timeouts), our client apps using HBaseClient take a long time to recover. hbase-0.90.6 / cdh3u4 / JDK 1.6.0_23 Process: 1) Apply load via client app on HBase cluster for several minutes 2) Power down the region server holding the .META. server 3) Measure how long it takes for cluster to reassign META table and for client threads to re-lookup and re-orient to the lesser cluster (minus the RS and DN on that host). What we see: 1) Client threads spike up to maxThread size ... and take over 35 mins to recover (i.e. for the thread count to go back to normal) - no calls are being serviced - they are all just backed up on a synchronized method ... 2) Essentially, all the client app threads queue up behind the HBaseClient.setupIOStreams method in oahh.ipc.HBaseClient (http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hbase/hbase/0.90.2/org/apache/hadoop/hbase/ipc/HBaseClient.java#312). http://tinyurl.com/7js53dj After taking several thread dumps we found that the thread within this synchronized method was blocked on NetUtils.connect(this.socket, remoteId.getAddress(), getSocketTimeout(conf)); Essentially, the thread which got the lock would try to connect to the dead RS (till socket times out), retrying, and then the next thread gets in and so forth. Solution tested: --- So - the ipc.HBaseClient code shows ipc.socket.timeout default is 20s. We dropped this down to a low number (1000 ms, 100 ms, etc) and the recovery was much faster (in a couple of minutes). So - we're thinking of setting the HBase client side hbase-site.xml with an ipc.socket.timeout of 100ms. Looking at the code, it appears that this is only ever used during the initial HConnection setup via the NetUtils.connect and should only ever be used when connectivity to a region server is lost and needs to be re-established. i.e it does not affect the normal RPC actiivity as this is just the connect timeout. Am I reading the code right? Any thoughts on how whether this is too low for comfort? (Our internal tests did not show any errors during normal operation related to timeouts etc ... but, I just wanted to run this by the experts.). Note that this above timeout tweak is only on the HBase client side. Thanks, --Suraj
Re: distributed log splitting aborted
Hi Cyril, BTW, have you checked dfs.datanode.max.xcievers and ulimit -n? When underconfigured they can cause this type of errors, even if it seems it's not the case here... Cheers, N. On Fri, Jul 6, 2012 at 11:31 AM, Cyril Scetbon cyril.scet...@free.fr wrote: The file is now missing but I have tried with another one and you can see the error : shell hdfs dfs -ls /hbase/.logs/hb-d11,60020,1341097456894-splitting/hb-d11%2C60020%2C1341097456894.1341421613446 Found 1 items -rw-r--r-- 4 hbase supergroup 0 2012-07-04 17:06 /hbase/.logs/hb-d11,60020,1341097456894-splitting/hb-d11%2C60020%2C1341097456894.1341421613446 shell hdfs dfs -cat /hbase/.logs/hb-d11,60020,1341097456894-splitting/hb-d11%2C60020%2C1341097456894.1341421613446 12/07/06 09:27:51 WARN hdfs.DFSClient: Last block locations not available. Datanodes might not have reported blocks completely. Will retry for 3 times 12/07/06 09:27:55 WARN hdfs.DFSClient: Last block locations not available. Datanodes might not have reported blocks completely. Will retry for 2 times 12/07/06 09:27:59 WARN hdfs.DFSClient: Last block locations not available. Datanodes might not have reported blocks completely. Will retry for 1 times cat: Could not obtain the last block locations. I'm using hadoop 2.0 from Cloudera package (CDH4) with hbase 0.92.1 Regards Cyril SCETBON On Jul 5, 2012, at 11:44 PM, Jean-Daniel Cryans wrote: Interesting... Can you read the file? Try a hadoop dfs -cat on it and see if it goes to the end of it. It could also be useful to see a bigger portion of the master log, for all I know maybe it handles it somehow and there's a problem elsewhere. Finally, which Hadoop version are you using? Thx, J-D On Thu, Jul 5, 2012 at 1:58 PM, Cyril Scetbon cyril.scet...@free.fr wrote: yes : /hbase/.logs/hb-d12,60020,1341429679981-splitting/hb-d12%2C60020%2C1341429679981.134143064971 I did a fsck and here is the report : Status: HEALTHY Total size:618827621255 B (Total open files size: 868 B) Total dirs:4801 Total files: 2825 (Files currently being written: 42) Total blocks (validated): 11479 (avg. block size 53909541 B) (Total open file blocks (not validated): 41) Minimally replicated blocks: 11479 (100.0 %) Over-replicated blocks:1 (0.008711561 %) Under-replicated blocks: 0 (0.0 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor:4 Average block replication: 4.873 Corrupt blocks:0 Missing replicas: 0 (0.0 %) Number of data-nodes: 12 Number of racks: 1 FSCK ended at Thu Jul 05 20:56:35 UTC 2012 in 795 milliseconds The filesystem under path '/hbase' is HEALTHY Cyril SCETBON Cyril SCETBON On Jul 5, 2012, at 7:59 PM, Jean-Daniel Cryans wrote: Does this file really exist in HDFS? hdfs://hb-zk1:54310/hbase/.logs/hb-d12,60020,1341429679981-splitting/hb-d12%2C60020%2C1341429679981.1341430649711 If so, did you run fsck in HDFS? It would be weird if HDFS doesn't report anything bad but somehow the clients (like HBase) can't read it. J-D On Thu, Jul 5, 2012 at 12:45 AM, Cyril Scetbon cyril.scet...@free.fr wrote: Hi, I can nolonger start my cluster correctly and get messages like http://pastebin.com/T56wrJxE (taken on one region server) I suppose Hbase is not done for being stopped but only for having some nodes going down ??? HDFS is not complaining, it's only HBase that can't start correctly :( I suppose some data has not been flushed and it's not really important for me. Is there a way to fix theses errors even if I will lose data ? thanks Cyril SCETBON
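For reference, the usual checks here are the open-file limit for the user running the datanode and region server (ulimit -n, commonly raised to 10240 or more) and, in hdfs-site.xml, something like the following; 4096 is the value commonly recommended for HBase clusters of this era, so treat it as illustrative:

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>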
Re: Hmaster and HRegionServer disappearance reason to ask
Hi, It's a ZK expiry on sunday 1st. Root cause could be the leap second bug? N. On Thu, Jul 5, 2012 at 8:59 AM, lztaomin lztao...@163.com wrote: HI ALL My HBase group a total of 3 machine, Hadoop HBase mounted in the same machine, zookeeper using HBase own. Operation 3 months after the reported abnormal as follows. Cause hmaster and HRegionServer processes are gone. Please help me. Thanks The following is a log ABORTING region server serverName=datanode1,60020,1325326435553, load=(requests=332, regions=188, usedHeap=2741, maxHeap=8165): regionserver:60020-0x3488dec38a02b1 regionserver:60020-0x3488dec38a02b1 received expired from ZooKeeper, aborting Cause: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:343) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:261) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506) 2012-07-01 13:45:38,707 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for datanode1,60020,1325326435553 2012-07-01 13:45:38,756 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting 32 hlog(s) in hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553 2012-07-01 13:45:38,764 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 1 of 32: hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553/datanode1%3A60020.1341006689352, length=5671397 2012-07-01 13:45:38,764 INFO org.apache.hadoop.hbase.util.FSUtils: Recovering file hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553/datanode1%3A60020.1341006689352 2012-07-01 13:45:39,766 INFO org.apache.hadoop.hbase.util.FSUtils: Finished lease recover attempt for hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553/datanode1%3A60020.1341006689352 2012-07-01 13:45:39,880 INFO org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs -- HDFS-200 2012-07-01 13:45:39,925 INFO org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs -- HDFS-200 ABORTING region server serverName=datanode2,60020,1325146199444, load=(requests=614, regions=189, usedHeap=3662, maxHeap=8165): regionserver:60020-0x3488dec38a0002 regionserver:60020-0x3488dec38a0002 received expired from ZooKeeper, aborting Cause: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:343) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:261) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506) 2012-07-01 13:24:10,308 INFO org.apache.hadoop.hbase.util.FSUtils: Finished lease recover attempt for hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553/datanode1%3A60020.1341075090535 2012-07-01 13:24:10,918 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 21 of 32: hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553/datanode1%3A60020.1341078690560, length=11778108 2012-07-01 13:24:29,809 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Closed path hdfs://namenode:9000/hbase/t_speakfor_relation_chapter/ffd2057b46da227e078c82ff43f0f9f2/recovered.edits/00660951991 
(wrote 8178 edits in 403ms) 2012-07-01 13:24:29,809 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: hlog file splitting completed in -1268935 ms for hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553 2012-07-01 13:24:29,824 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Received exception accessing META during server shutdown of datanode1,60020,1325326435553, retrying META read org.apache.hadoop.ipc.RemoteException: java.io.IOException: Server not running, aborting at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2408) at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionInfo(HRegionServer.java:1649) at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039) lztaomin
Re: HMASTER -- odd messages ?
Would Datanode issues impact the HMaster stability? Yes and no. If you have only a few datanodes down, there should be no issue. When there are enough missing datanodes to make some blocks not available at all in the cluster, there are many tasks that cannot be done anymore (to say the least, and depending on the blocks), for the master or for the region server. In this case the ideal contract for the master would be to survive, do the tasks it can, and log the tasks it can't. Today, the contract for the master in such a situation is more 'do your best but don't corrupt anything'. Note that there is an autorestart option in the scripts in the planned 0.96, so the master can be asked to restart automatically if not stopped properly. N. On Tue, Jul 3, 2012 at 7:08 PM, Jay Wilson registrat...@circle-cross-jn.com wrote: My HMaster and HRegionservers start and run for awhile. Looking at the messages, there appear to be some Datanodes with some issues, HLogSplitter has some block issues, the HMaster appears to drop off the network (I know, bad), then it comes back, and then the cluster runs for about 10 more minutes before everything aborts. Questions: . Are HLogSplitter block error messages common? . Would Datanode issues impact the HMaster stability? . Other than an actual network issue is there anything that can cause a 'No route to host'? Thank you --- Jay Wilson 2012-07-03 09:04:58,266 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Split writers finished 2012-07-03 09:04:58,273 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Archived processed log hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.logs/devrackA-03,60020,1341328322971-splitting/devrackA-03%3A60020.1341328323503 to hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.oldlogs/devrackA-03%3A60020.1341328323503 2012-07-03 09:04:58,275 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: hlog file splitting completed in 1052 ms for hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.logs/devrackA-03,60020,1341328322971-splitting 2012-07-03 09:04:58,277 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting 1 hlog(s) in hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.logs/devrackA-04,60020,1341328322988-splitting 2012-07-03 09:04:58,277 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread Thread[WriterThread-0,5,main]: starting 2012-07-03 09:04:58,277 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread Thread[WriterThread-1,5,main]: starting 2012-07-03 09:04:58,278 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread Thread[WriterThread-2,5,main]: starting 2012-07-03 09:04:58,278 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 1 of 1: hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.logs/devrackA-04,60020,1341328322988-splitting/devrackA-04%3A60020.1341328323517, length=124 2012-07-03 09:04:58,278 INFO org.apache.hadoop.hbase.util.FSUtils: Recovering file hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.logs/devrackA-04,60020,1341328322988-splitting/devrackA-04%3A60020.1341328323517 2012-07-03 09:04:59,282 INFO org.apache.hadoop.hbase.util.FSUtils: Finished lease recover attempt for hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.logs/devrackA-04,60020,1341328322988-splitting/devrackA-04%3A60020.1341328323517 2012-07-03 09:04:59,339 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Pushed=0 entries from
hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.logs/devrackA-04,60020,1341328322988-splitting/devrackA-04%3A60020.1341328323517 2012-07-03 09:04:59,341 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Waiting for split writer threads to finish 2012-07-03 09:04:59,342 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Split writers finished 2012-07-03 09:04:59,347 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Archived processed log hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.logs/devrackA-04,60020,1341328322988-splitting/devrackA-04%3A60020.1341328323517 to hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.oldlogs/devrackA-04%3A60020.1341328323517 2012-07-03 09:04:59,349 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: hlog file splitting completed in 1073 ms for hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.logs/devrackA-04,60020,1341328322988-splitting 2012-07-03 09:04:59,352 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting 1 hlog(s) in hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.logs/devrackA-05,60020,1341328322976-splitting 2012-07-03 09:04:59,352 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread Thread[WriterThread-0,5,main]: starting 2012-07-03 09:04:59,352 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread Thread[WriterThread-1,5,main]: starting 2012-07-03 09:04:59,352 INFO
Re: Stargate: ScannerModel
(moving this to the user mailing list, with the dev one in bcc) From what you said it should be customerid_MIN_TX_ID to customerid_MAX_TX_ID. But only if the customerid size is constant. Note that with this rowkey design there will be very few regions involved, so it's unlikely to be parallelized. N. On Thu, Jun 28, 2012 at 7:43 AM, sameer sameer_therat...@infosys.com wrote: Hello, I want to know what the parameters for scan.setStartRow and scan.setStopRow are. My requirement is that I have a table, with the key as customerid_transactionId. I want to scan all the rows, the key of which contains the customer Id that I have. I tried using rowFilter but it is quite slow. If I am using the scan - setStartRow and setStopRow then what would I give as parameters? Thanks, Sameer -- View this message in context: http://apache-hbase.679495.n3.nabble.com/Stargate-ScannerModel-tp2975161p4019139.html Sent from the HBase - Developer mailing list archive at Nabble.com.
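A hedged sketch of the bounded scan described above, assuming the key layout customerid_transactionId with a fixed-size customerid; customerId, MIN_TX_ID and MAX_TX_ID are placeholders for whatever the smallest and largest transaction ids look like in your encoding, and table is an open HTable:

    // Illustrative only; note that the stop row is exclusive, so MAX_TX_ID must sort
    // just past the last transaction id you want returned.
    Scan scan = new Scan();
    scan.setStartRow(Bytes.toBytes(customerId + "_" + MIN_TX_ID));  // inclusive
    scan.setStopRow(Bytes.toBytes(customerId + "_" + MAX_TX_ID));   // exclusive
    ResultScanner scanner = table.getScanner(scan);
    for (Result result : scanner) {
      // process result
    }
    scanner.close();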
Re: Scan vs Put vs Get
that appears on the UI? In your random scan how many Regions are scanned whereas in gets may be many due to randomness. Regards Ram -Original Message- From: N Keywal [mailto:nkey...@gmail.com] Sent: Thursday, June 28, 2012 2:00 PM To: user@hbase.apache.org Subject: Re: Scan vs Put vs Get Hi Jean-Marc, Interesting :-) Added to Anoop questions: What's the hbase version you're using? Is it repeatable, I mean if you try twice the same gets with the same client do you have the same results? I'm asking because the client caches the locations. If the locations are wrong (region moved) you will have a retry loop, and it includes a sleep. Do you have anything in the logs? Could you share as well the code you're using to get the ~100 ms time? Cheers, N. On Thu, Jun 28, 2012 at 6:56 AM, Anoop Sam John anoo...@huawei.com wrote: Hi How many Gets you batch together in one call? Is this equal to the Scan#setCaching () that u are using? If both are same u can be sure that the the number of NW calls is coming almost same. Also you are giving random keys in the Gets. The scan will be always sequential. Seems in your get scenario it is very very random reads resulting in too many reads of HFile block from HDFS. [Block caching is enabled?] Also have you tried using Bloom filters? ROW blooms might improve your get performance. -Anoop- From: Jean-Marc Spaggiari [jean-m...@spaggiari.org] Sent: Thursday, June 28, 2012 5:04 AM To: user Subject: Scan vs Put vs Get Hi, I have a small piece of code, for testing, which is putting 1B lines in an existing table, getting 3000 lines and scanning 1. The table is one family, one column. Everything is done randomly. Put with Random key (24 bytes), fixed family and fixed column names with random content (24 bytes). Get (batch) is done with random keys and scan with RandomRowFilter. And here are the results. Time to insert 100 lines: 43 seconds (23255 lines/seconds) That's correct for my needs based on the poor performances of the servers in the cluster. I'm fine with the results. Time to read 3000 lines: 11444.0 mseconds (262 lines/seconds) This is way to low. I don't understand why. So I tried the random scan because I'm not able to figure the issue. Time to read 1 lines: 108.0 mseconds (92593 lines/seconds) This it impressive! I have added that after I failed with the get. I moved from 262 lines per seconds to almost 100K lines/seconds!!! It's awesome! However, I'm still wondering what's wrong with my gets. The code is very simple. I'm using Get objects that I'm executing in a Batch. I tried to add a filter but it's not helping. Here is an extract of the code. for (long l = 0; l linesToRead; l++) { byte[] array1 = new byte[24]; for (int i = 0; i array1.length; i++) array1[i] = (byte)Math.floor(Math.random() * 256); Get g = new Get (array1); gets.addElement(g); } Object[] results = new Object[gets.size()]; System.out.println(new java.util.Date () + \gets\ created.); long timeBefore = System.currentTimeMillis(); table.batch(gets, results); long timeAfter = System.currentTimeMillis(); float duration = timeAfter - timeBefore; System.out.println (Time to read + gets.size() + lines : + duration + mseconds ( + Math.round(((float)linesToRead / (duration / 1000))) + lines/seconds)); What's wrong with it? I can't add the setBatch neither I can add setCaching because it's not a scan. I tried with different numbers of gets but it's almost always the same speed. Am I using it the wrong way? Does anyone have any advice to improve that? Thanks, JM
Re: Scan vs Put vs Get
Thank you. It's clearer now. From the code you sent, RandomRowFilter is not used. You're only using the KeyOnlyFilter (the second setFilter replaces the first one; you need to use like FilterList to combine filters). (Note as well that you would need to initialize RandomRowFilter#chance, if not all the rows will be filtered out.) So, in one case -list of gets-, you're reading a well defined set of rows (defined randomly, but well defined :-), and this set spreads all other the regions. In the second one (KeyOnlyFilter), you're reading the first 1K rows you could get from the cluster. This explains the difference between the results. Activating RandomRowFilter should not change much the results, as it's different to select a random set of rows and to get a set of rows defined randomly (don't know if I'm clear here...). Unfortunately you're likely to be more interested of the performance when there is a real selection. Your code for list of gets was correct imho. I'm interested by the results if you activate bloomfilters. Cheers, N. On Thu, Jun 28, 2012 at 3:45 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi N Keywal, This result: Time to read 1 lines : 122.0 mseconds (81967 lines/seconds) Is obtain with this code: HTable table = new HTable(config, test3); final int linesToRead = 1; System.out.println(new java.util.Date () + Processing iteration + iteration + ... ); RandomRowFilter rrf = new RandomRowFilter(); KeyOnlyFilter kof = new KeyOnlyFilter(); Scan scan = new Scan(); scan.setFilter(rrf); scan.setFilter(kof); scan.setBatch(Math.min(linesToRead, 1000)); scan.setCaching(Math.min(linesToRead, 1000)); ResultScanner scanner = table.getScanner(scan); processed = 0; long timeBefore = System.currentTimeMillis(); for (Result result : scanner.next(linesToRead)) { if (result != null) processed++; } scanner.close(); long timeAfter = System.currentTimeMillis(); float duration = timeAfter - timeBefore; System.out.println (Time to read + linesToRead + lines : + duration + mseconds ( + Math.round(((float)linesToRead / (duration / 1000))) + lines/seconds)); table.close (); This is with the scan. scan 80 000 lines/seconds put 20 000 lines/seconds get 300 lines/seconds 2012/6/28, Jean-Marc Spaggiari jean-m...@spaggiari.org: Hi Anoop, Are Bloom filters for columns? If I add g.setFilter(new KeyOnlyFilter()); that mean I can't use bloom filters, right? Basically, what I'm doing here is something like existKey(byte[]):boolean where I try to see if a key exist in the database whitout taking into consideration if there is any column content or not. This should be very fast. Even faster than the scan which need to keep some tracks of where I'm reading for the next row. JM 2012/6/28, Anoop Sam John anoo...@huawei.com: blockCacheHitRatio=69% Seems blocks you are getting from cache. You can check with Blooms also once. You can enable the usage of bloom using the config param io.storefile.bloom.enabled set to true . This will enable the usage of bloom globally Now you need to set the bloom type for your CF HColumnDescriptor#setBloomFilterType() U can check with type BloomType.ROW -Anoop- _ From: Jean-Marc Spaggiari [jean-m...@spaggiari.org] Sent: Thursday, June 28, 2012 5:42 PM To: user@hbase.apache.org Subject: Re: Scan vs Put vs Get Oh! I never looked at this part ;) Ok. I have it. 
Here are the numbers for one server before the read: blockCacheSizeMB=186.28 blockCacheFreeMB=55.4 blockCacheCount=2923 blockCacheHitCount=195999 blockCacheMissCount=89297 blockCacheEvictedCount=69858 blockCacheHitRatio=68% blockCacheHitCachingRatio=72% And here are the numbers after 100 iterations of 1000 gets for the same server: blockCacheSizeMB=194.44 blockCacheFreeMB=47.25 blockCacheCount=3052 blockCacheHitCount=232034 blockCacheMissCount=103250 blockCacheEvictedCount=83682 blockCacheHitRatio=69% blockCacheHitCachingRatio=72% Don't forget that there is between 40B and 50B of lines in the table, so I don't think the servers can store all of them in memory. And since I'm accessing based on a random key, odds to have the right row in memory are small I think. JM 2012/6/28, Ramkrishna.S.Vasudevan ramkrishna.vasude...@huawei.com: In 0.94 The UI of the RS has a metrics table. In that you can see blockCacheHitCount, blockCacheMissCount etc. May be there is a variation when you do scan() and get() here. Regards Ram -Original Message- From: Jean-Marc Spaggiari [mailto:jean-m...@spaggiari.org] Sent: Thursday, June 28, 2012 4:44 PM To: user@hbase.apache.org Subject: Re: Scan vs Put vs Get Wow. First, thanks a lot all for jumping into this. Let me try to reply to everyone in a single post. How many Gets you batch together in one call I tried with multiple different values from 10 to 3000 with similar results. Time to read
Re: Scan vs Put vs Get
For the filter list my guess is that you're filtering out all rows because RandomRowFilter#chance is not initialized (it should be something like RandomRowFilter rrf = new RandomRowFilter(0.5);) But note that this test will never be comparable to the test with a list of gets. You can make it as slow/fast as you want by playing with the 'chance' parameter. The results with gets and bloom filter are also in the interesting category, hopefully an expert will get in the loop... On Thu, Jun 28, 2012 at 6:04 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Oh! I see! KeyOnlyFilter is overwriting the RandomRowFilter! Bad. I mean, bad I did not figured that. Thanks for pointing that. That definitively explain the difference in the performances. I have activated the bloomfilters with this code: HBaseAdmin admin = new HBaseAdmin(config); HTable table = new HTable(config, test3); System.out.println (table.getTableDescriptor().getColumnFamilies()[0]); HColumnDescriptor cd = table.getTableDescriptor().getColumnFamilies()[0]; cd.setBloomFilterType(BloomType.ROW); admin.disableTable(test3); admin.modifyColumn(test3, cd); admin.enableTable(test3); System.out.println (table.getTableDescriptor().getColumnFamilies()[0]); And here is the result for the first attempt (using gets): {NAME = 'cf', DATA_BLOCK_ENCODING = 'NONE', BLOOMFILTER = 'NONE', REPLICATION_SCOPE = '0', VERSIONS = '3', COMPRESSION = 'NONE', MIN_VERSIONS = '0', TTL = '2147483647', KEEP_DELETED_CELLS = 'false', BLOCKSIZE = '65536', IN_MEMORY = 'false', ENCODE_ON_DISK = 'true', BLOCKCACHE = 'true'} {NAME = 'cf', DATA_BLOCK_ENCODING = 'NONE', BLOOMFILTER = 'ROW', REPLICATION_SCOPE = '0', VERSIONS = '3', COMPRESSION = 'NONE', MIN_VERSIONS = '0', TTL = '2147483647', KEEP_DELETED_CELLS = 'false', BLOCKSIZE = '65536', IN_MEMORY = 'false', ENCODE_ON_DISK = 'true', BLOCKCACHE = 'true'} Thu Jun 28 11:08:59 EDT 2012 Processing iteration 0... Time to read 1000 lines : 40177.0 mseconds (25 lines/seconds) 2nd: Time to read 1000 lines : 7621.0 mseconds (131 lines/seconds) 3rd: Time to read 1000 lines : 7659.0 mseconds (131 lines/seconds) After few more iterations (about 30), I'm between 200 and 250 lines/seconds, like before. Regarding the filterList, I tried, but now I'm getting this error from the servers: org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '-6376193724680783311' does not exist Here is the code: final int linesToRead = 1; System.out.println(new java.util.Date () + Processing iteration + iteration + ... ); RandomRowFilter rrf = new RandomRowFilter(); KeyOnlyFilter kof = new KeyOnlyFilter(); Scan scan = new Scan(); ListFilter filters = new ArrayListFilter(); filters.add(rrf); filters.add(kof); FilterList filterList = new FilterList(filters); scan.setFilter(filterList); scan.setBatch(Math.min(linesToRead, 1000)); scan.setCaching(Math.min(linesToRead, 1000)); ResultScanner scanner = table.getScanner(scan); processed = 0; long timeBefore = System.currentTimeMillis(); for (Result result : scanner.next(linesToRead)) { System.out.println(Result: + result); // if (result != null) processed++; } scanner.close(); It's failing when I try to do for (Result result : scanner.next(linesToRead)). I tried with linesToRead=1000, 100, 10 and 1 with the same result :( I will try to find the root cause, but if you have any hint, it's welcome. JM
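A hedged sketch of the combination described above, with the chance parameter set explicitly (0.5f is illustrative; the constructor takes a float, so the literal needs the f suffix):

    // Keep roughly half of the rows at random, and return keys only (no values).
    FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL);
    filters.addFilter(new RandomRowFilter(0.5f));
    filters.addFilter(new KeyOnlyFilter());
    Scan scan = new Scan();
    scan.setFilter(filters);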
Re: HBase first steps: Design a table
Hi, Usually I'm inserting about 40 000 rows at a time. Should I do 40 000 calls to put? Or is there any bulk insert method? There is this chapter on bulk loading: http://hbase.apache.org/book.html#arch.bulk.load. But for 40K rows you may just want to use void put(final List<Put> puts) in HTableInterface, that will save a lot of rpc calls. Cheers, N.
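A hedged sketch of the batched-put approach, assuming an open HTable named table; the family, qualifier and row naming are illustrative, and the client splits the list into one RPC per region server:

    List<Put> puts = new ArrayList<Put>(40000);
    for (int i = 0; i < 40000; i++) {
      Put put = new Put(Bytes.toBytes("row-" + i));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
      puts.add(put);
    }
    table.put(puts);       // one client call instead of 40 000
    table.flushCommits();  // only needed if autoFlush has been turned off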
Re: Region is not online Exceptions
Hi, You can have this if the region moved, i.e. it was previously managed by this region server and is now managed by another. The client keeps a cache of the locations, so after a move it will first contact the wrong server. Then the client will update its cache. By default there are 10 internal retries, so the next retry will be the right one, and this error should not be seen in the client code. In 0.96 the region server will send back a RegionMovedException with the new location if the move is not too old (less than around 5 minutes, if I remember correctly). N. On Thu, Jun 7, 2012 at 9:36 PM, arun sirimalla arunsi...@gmail.com wrote: Hi, My Hbase cluster seems to work fine, but I see some exceptions in one of the RegionServers with the below message 2012-06-07 19:24:48,809 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; Region is not online: -ROOT-,,0 2012-06-07 19:24:56,154 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; Region is not online: -ROOT-,,0 Though this regionserver is not hosting the ROOT region. The -ROOT- region is hosted by another RegionServer. Can someone please tell me why these exceptions occur? Thanks Arun
Re: hosts unreachables
Yes, this is the balance process (as its name says: keeps the cluster balanced), and it's not related to the process of looking after dead nodes. The nodes are monitored by ZooKeeper, the timeout is by default 180 seconds (setting: zookeeper.session.timeout) On Fri, Jun 1, 2012 at 4:40 PM, Cyril Scetbon cyril.scet...@free.fr wrote: I've another regionserver (hb-d2) that crashed (I can easily reproduce the issue by continuing injections), and as I see in master log, it gets information about hb-d2 every 5 minutes. I suppose it's what helps him to note if a node is dead or not. However it adds hb-d2 to the dead node list at 13:32:20, so before 5 minutes since the last time it got the server information. Is it normal ? 2012-06-01 13:02:36,309 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Server information: hb-d5,60020,1338553124247=47, hb-d4,60020,1338553126577=47, hb-d7,60020,1338553124279=46, hb-d10,60020,1338553126695=47, hb-d6,60020,133 8553124588=47, hb-d8,60020,1338553124113=47, hb-d2,60020,1338553126560=47, hb-d11,60020,1338553124329=47, hb-d12,60020,1338553126567=47, hb-d1,60020,1338553126474=47, hb-d9,60020,1338553124179=47 .. 2012-06-01 13:07:36,319 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Server information: hb-d5,60020,1338553124247=47, hb-d4,60020,1338553126577=47, hb-d7,60020,1338553124279=46, hb-d10,60020,1338553126695=47, hb-d6,60020,133 8553124588=47, hb-d8,60020,1338553124113=47, hb-d2,60020,1338553126560=47, hb-d11,60020,1338553124329=47, hb-d12,60020,1338553126567=47, hb-d1,60020,1338553126474=47, hb-d9,60020,1338553124179=47 .. 2012-06-01 13:12:36,328 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Server information: hb-d5,60020,1338553124247=47, hb-d4,60020,1338553126577=47, hb-d7,60020,1338553124279=46, hb-d10,60020,1338553126695=47, hb-d6,60020,133 8553124588=47, hb-d8,60020,1338553124113=47, hb-d2,60020,1338553126560=47, hb-d11,60020,1338553124329=47, hb-d12,60020,1338553126567=47, hb-d1,60020,1338553126474=47, hb-d9,60020,1338553124179=47 .. 2012-06-01 13:17:36,337 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Server information: hb-d5,60020,1338553124247=47, hb-d4,60020,1338553126577=47, hb-d7,60020,1338553124279=46, hb-d10,60020,1338553126695=47, hb-d6,60020,133 8553124588=47, hb-d8,60020,1338553124113=47, hb-d2,60020,1338553126560=47, hb-d11,60020,1338553124329=47, hb-d12,60020,1338553126567=47, hb-d1,60020,1338553126474=47, hb-d9,60020,1338553124179=47 .. 2012-06-01 13:22:36,346 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Server information: hb-d5,60020,1338553124247=47, hb-d4,60020,1338553126577=47, hb-d7,60020,1338553124279=46, hb-d10,60020,1338553126695=47, hb-d6,60020,133 8553124588=47, hb-d8,60020,1338553124113=47, hb-d2,60020,1338553126560=47, hb-d11,60020,1338553124329=47, hb-d12,60020,1338553126567=47, hb-d1,60020,1338553126474=47, hb-d9,60020,1338553124179=47 .. 2012-06-01 13:27:36,353 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Server information: hb-d5,60020,1338553124247=47, hb-d4,60020,1338553126577=47, hb-d7,60020,1338553124279=46, hb-d10,60020,1338553126695=47, hb-d6,60020,133 8553124588=47, hb-d8,60020,1338553124113=47, hb-d2,60020,1338553126560=47, hb-d11,60020,1338553124329=47, hb-d12,60020,1338553126567=47, hb-d1,60020,1338553126474=47, hb-d9,60020,1338553124179=47 .. 
2012-06-01 13:32:20,048 INFO org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [hb-d2,60020,1338553126560] 2012-06-01 13:32:20,048 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=hb-d2,60020,1338553126560 to dead servers, submitted shutdown handler to be executed, root=false, meta=false 2012-06-01 13:32:20,048 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for hb-d2,60020,1338553126560 On 6/1/12 3:25 PM, Cyril Scetbon wrote: I've added hbase.hregion.memstore.mslab.enabled = true to the configuration of all regionservers and add flags -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:CMSInitiatingOccupancyFraction=60 to the hbase environment However my regionservers are still crashing when I load data into the cluster Here are the logs for the node hb-d3 that crashed at 12:56 - GC logs : http://pastebin.com/T0d0y8pZ - regionserver logs : http://pastebin.com/n6v9x3XM thanks On 5/31/12 11:12 PM, Jean-Daniel Cryans wrote: Both, also you could bigger log snippets (post them on something like pastebin.com) and we could see more evidence of the issue. J-D On Thu, May 31, 2012 at 2:09 PM, Cyril Scetboncyril.scet...@free.fr wrote: On 5/31/12 11:00 PM, Jean-Daniel Cryans wrote: What I'm seeing looks more like GC issues. Start reading this: http://hbase.apache.org/book.html#gc J-D Hi, Really not sure cause I've enabled gcc's verbose option and I don't see anything taking a long time. Maybe I can check again on one node. On which node do
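Regarding the zookeeper.session.timeout point at the top of this thread: the 13:32:20 expiration above is ZooKeeper deleting the region server's ephemeral node after the session timeout, and the 5-minute LoadBalancer lines are unrelated to dead-node detection. If you need to tolerate longer pauses, the timeout is set in hbase-site.xml; a minimal snippet (value in milliseconds, 180000 being the default mentioned above):

    <property>
      <name>zookeeper.session.timeout</name>
      <value>180000</value> <!-- milliseconds: the 180 s default mentioned above -->
    </property>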
Re: Null rowkey with empty get operation
There is a one to one mapping between the result and the get arrays; so the result for rowkeys[i] is in results[i]. That's not what you want? On Tue, May 29, 2012 at 9:34 AM, Ben Kim benkimkim...@gmail.com wrote: Maybe I showed you a bad example. This makes more sense when it comes to using ListGet For instance, ListGet gets = new ArrayList(); for(String rowkey : rowkeys){ Get get = new Get(Bytes.toBytes(rowkey)); get.addFamily(family); Filter filter = new QualifierFilter(CompareOp.NOT_EQUAL, new BinaryComparator(item)); get.setFilter(filter); gets.add(get); } Result[] results = table.get(get); Now I have multiple results, I need to find the rowkey of the result that has no keyvalue. but results[0].getRow() is null if results[0] has no keyvalue. so it's hard to derive which row the empty result belongs to :( Thank you for your response, Ben On Tue, May 29, 2012 at 2:33 PM, Anoop Sam John anoo...@huawei.com wrote: Hi Ben, In HBase rowkey exists with KVs only. As in your case there is no KVs in the result, and so no rowkey. What is the use case that you are referring here? When you issued Get with a rowkey and empty result for that , you know the rowkey already right? I mean any specific reason why you try to find the rowkey from the result object? -Anoop- From: Ben Kim [benkimkim...@gmail.com] Sent: Tuesday, May 29, 2012 6:42 AM To: user@hbase.apache.org Subject: Null rowkey with empty get operation I have following Get code with HBase 0.92.0 Get get = new Get(Bytes.toBytes(rowkey)); get.addFamily(family); Filter filter = new QualifierFilter(CompareOp.NOT_EQUAL, new BinaryComparator(item)); get.setFilter(filter); Result r = table.get(get); System.out.println(r); // (1) prints keyvalues=NONE System.out.println(Bytes.toString(r.getRow())); // (2) throws NullpointerException printing out the result shows that all columns in a row was filtered out. but i still want to print out the row key of the empty result. But the value of r.getRow() is null Shouldn't r.getRow() return the rowkey even if the keyvalues are emtpy? -- *Benjamin Kim** benkimkimben at gmail* -- *Benjamin Kim* **Mo : +82 10.5357.0521* benkimkimben at gmail*
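A sketch of using that one-to-one mapping to recover the row key of an empty result, assuming the table, family, item and rowkeys (a List<String>) from the quoted snippet:

    List<Get> gets = new ArrayList<Get>();
    for (String rowkey : rowkeys) {
      Get get = new Get(Bytes.toBytes(rowkey));
      get.addFamily(family);
      get.setFilter(new QualifierFilter(CompareOp.NOT_EQUAL, new BinaryComparator(item)));
      gets.add(get);
    }
    Result[] results = table.get(gets);   // pass the list, not a single Get
    for (int i = 0; i < results.length; i++) {
      if (results[i].isEmpty()) {
        // results[i] corresponds to gets.get(i), so the row key comes from the request side
        System.out.println("no matching columns for row " + rowkeys.get(i));
      }
    }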
Re: Issues with Java sample for connecting to remote Hbase
From http://hbase.apache.org/book/os.html: HBase expects the loopback IP address to be 127.0.0.1. Ubuntu and some other distributions, for example, will default to 127.0.1.1 and this will cause problems for you. It worths reading the whole section ;-). You also don't need to set the master address: it will be read from zookeeper. I.e. you can remove this line from your client code: config.set(hbase.master, 10.78.32.131:60010); N. On Tue, May 29, 2012 at 3:46 PM, AnandaVelMurugan Chandra Mohan ananthu2...@gmail.com wrote: Thanks for the response. It still errors out. On Tue, May 29, 2012 at 7:05 PM, Mohammad Tariq donta...@gmail.com wrote: change the name from localhost to something else in the line 10.78.32.131 honeywel-4a7632 localhost and see if it works Regards, Mohammad Tariq On Tue, May 29, 2012 at 6:59 PM, AnandaVelMurugan Chandra Mohan ananthu2...@gmail.com wrote: I have HBase version 0.92.1 running in standalone mode. I created a table and added few rows using hbase shell. Now I am developing a standalone java application to connect to Hbase and retrieve the data from the table. * This is the code I am using * Configuration config = HBaseConfiguration.create(); config.clear(); config.set(hbase.zookeeper.quorum, 10.78.32.131); config.set(hbase.zookeeper.property.clientPort,2181); config.set(hbase.master, 10.78.32.131:60010); HBaseAdmin.checkHBaseAvailable(config); // This instantiates an HTable object that connects you to the myTable // table. HTable table = new HTable(config, asset); Get g = new Get(Bytes.toBytes(APU 331-350)); Result r = table.get(g); *This is the content of my /etc/hosts file* #127.0.0.1 localhost.localdomain localhost #10.78.32.131 honeywel-4a7632 #127.0.1.1 honeywel-4a7632 ::1 honeywel-4a7632 localhost6.localdomain6 localhost6 10.78.32.131 honeywel-4a7632 localhost * This is part of my error stack trace* 12/05/29 18:53:33 INFO client.HConnectionManager$HConnectionImplementation: getMaster attempt 0 of 1 failed; no more retrying. 
java.net.ConnectException: Connection refused: no further information at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:328) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:362) at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1045) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:897) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150) at $Proxy5.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:642) at org.apache.hadoop.hbase.client.HBaseAdmin.init(HBaseAdmin.java:106) at org.apache.hadoop.hbase.client.HBaseAdmin.checkHBaseAvailable(HBaseAdmin.java:1553) at hbaseMain.main(hbaseMain.java:27) 12/05/29 18:53:33 INFO client.HConnectionManager$HConnectionImplementation: Closed zookeeper sessionid=0x13798c3ce190003 12/05/29 18:53:33 INFO zookeeper.ZooKeeper: Session: 0x13798c3ce190003 closed 12/05/29 18:53:33 INFO zookeeper.ClientCnxn: EventThread shut down Can some one help me fix this? Thanks a lot. -- Regards, Anand -- Regards, Anand
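Following the advice above, the quoted client code with the hbase.master line removed; the IP, client port and table name are simply the values taken from that code:

    Configuration config = HBaseConfiguration.create();
    config.set("hbase.zookeeper.quorum", "10.78.32.131");
    config.set("hbase.zookeeper.property.clientPort", "2181");
    // no hbase.master entry: the client reads the active master's address from ZooKeeper
    HBaseAdmin.checkHBaseAvailable(config);
    HTable table = new HTable(config, "asset");
    Result r = table.get(new Get(Bytes.toBytes("APU 331-350")));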
Re: understanding the client code
Hi, If you're speaking about preparing the query it's in HTable and HConnectionManager. If you're on the pure network level, then, on trunk, it's now done with a third party called protobuf. See the code from HConnectionManager#createCallable to see how it's used. Cheers, N. On Tue, May 29, 2012 at 4:15 PM, S Ahmed sahmed1...@gmail.com wrote: I'm looking at the client code here: https://github.com/apache/hbase/tree/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client Is this the high level operations, and the actual sending of this data over the network is done somewhere else? For example, during a PUT, you may want it to write to n nodes, where is the code that does that? And the actual network connection etc?
Re: understanding the client code
There are two levels: - communication between hbase client and hbase cluster: this is the code you have in hbase client package. As a end user you don't really care, but you care if you want to learn hbase internals. - communication between customer code and hbase as a whole if you don't want to use the hbase client. Then several options are available, thrift being one of them (I'm not sure of avro status). What do you want to do exactly? On Tue, May 29, 2012 at 4:33 PM, S Ahmed sahmed1...@gmail.com wrote: So how does thrift and avro fit into the picture? (I believe I saw references to that somewhere, are those alternate connection libs?) I know protobuf is just generating types for various languages... On Tue, May 29, 2012 at 10:26 AM, N Keywal nkey...@gmail.com wrote: Hi, If you're speaking about preparing the query it's in HTable and HConnectionManager. If you're on the pure network level, then, on trunk, it's now done with a third party called protobuf. See the code from HConnectionManager#createCallable to see how it's used. Cheers, N. On Tue, May 29, 2012 at 4:15 PM, S Ahmed sahmed1...@gmail.com wrote: I'm looking at the client code here: https://github.com/apache/hbase/tree/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client Is this the high level operations, and the actual sending of this data over the network is done somewhere else? For example, during a PUT, you may want it to write to n nodes, where is the code that does that? And the actual network connection etc?
Re: understanding the client code
So it's the right place for the internals :-). The main use case for the thrift api is when you have non java client code. On Tue, May 29, 2012 at 5:07 PM, S Ahmed sahmed1...@gmail.com wrote: I don't really want any, I just want to learn the internals :) So why would someone not want to use the client, for data intensive tasks like mapreduce etc. where they want direct access to the files? On Tue, May 29, 2012 at 11:00 AM, N Keywal nkey...@gmail.com wrote: There are two levels: - communication between hbase client and hbase cluster: this is the code you have in hbase client package. As a end user you don't really care, but you care if you want to learn hbase internals. - communication between customer code and hbase as a whole if you don't want to use the hbase client. Then several options are available, thrift being one of them (I'm not sure of avro status). What do you want to do exactly? On Tue, May 29, 2012 at 4:33 PM, S Ahmed sahmed1...@gmail.com wrote: So how does thrift and avro fit into the picture? (I believe I saw references to that somewhere, are those alternate connection libs?) I know protobuf is just generating types for various languages... On Tue, May 29, 2012 at 10:26 AM, N Keywal nkey...@gmail.com wrote: Hi, If you're speaking about preparing the query it's in HTable and HConnectionManager. If you're on the pure network level, then, on trunk, it's now done with a third party called protobuf. See the code from HConnectionManager#createCallable to see how it's used. Cheers, N. On Tue, May 29, 2012 at 4:15 PM, S Ahmed sahmed1...@gmail.com wrote: I'm looking at the client code here: https://github.com/apache/hbase/tree/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client Is this the high level operations, and the actual sending of this data over the network is done somewhere else? For example, during a PUT, you may want it to write to n nodes, where is the code that does that? And the actual network connection etc?
Re: HBase (BigTable) many to many with students and courses
Hi, For the multiget, if it's small enough, it will be: - parallelized on all region servers concerned. i.e. you will be as fast as the slowest region server. - there will be one query per region server (i.e. gets are grouped by region server). If there are too many gets, it will be split in small subsets and the strategy above will be used for each subset, doing one subset after another (and blocking between them). so Large set -- Small set will be ok from this point of view. Large -- Large won't. N. On Tue, May 29, 2012 at 5:54 PM, Em mailformailingli...@yahoo.de wrote: Ian, thanks for your detailed response! Let me give you feedback to each point: 1. You could denormalize the additional information (e.g. course name) into the students table. Then, you're simply reading the student row, and all the info you need is there. That places an extra burden of write time and disk space, and does make you do a lot more work when a course name changes. That's exactly what I thought about and that's why I avoid it. The students and courses example is an example you find at several points on the web, when describing the differences and translations of relations from an RDBMS into a Key-Value-store. In fact, everything you model with a Key-Value-storage like HBase, Cassandra etc. can be modeled as an RDMBS-scheme. Since a lot of people, like me, are coming from that edge, we must re-learn several basic things. It starts with understanding that you model a K-V-storage the way you want to access the data, not as the data relates to eachother (in general terms) and ends with translating the connections of data into a K-V-schema as good as possible. 2. You could do what you're talking about in your HBase access code: find the list of course IDs you need for the student, and do a multi get on the course table. Fundamentally, this won't be much more efficient to do in batch mode, because the courses are likely to be evenly spread out over the region servers (orthogonal to the students). You're essentially doing a hash join, except that it's a lot less pleasant than on a relational DB b/c you've got network round trips for each GET. The disk blocks from the course table (I'm assuming it's the smaller side) will likely be cached so at least that part will be fast--you'll be answering those questions from memory, not via disk IO. Whow, what? I thought a Multiget would reduce network-roundtrips as it only accesses each region *one* time, fetching all the queried keys and values from there. If your data is randomly distributed, this could result in the same costs as with doing several Gets in a loop, but should work better if several Keys are part of the same region. Am I right or did I missunderstood the concept??? 3. You could also let a higher client layer worry about this. For example, your data layer query just returns a student with a list of their course IDs, and then another process in your client code looks up each course by ID to get the name. You can then put an external caching layer (like memcached) in the middle and make things a lot faster (though that does put the burden on you to have the code path for changing course info also flush the relevant cache entries). In your example, it's unlikely any institution would have more than a few thousand courses, so they'd probably all stay in memory and be served instantaneously. Hm, in what way does this give me an advantage over using HBase - assuming that the number of courses is small enough to fit in RAM - ? 
I know that Memcached is optimized for this purpose and might have much faster response times - no doubts. However, from a conceptual point of view: Why does Memcached handles the K-V-distribution more efficiently than a HBase with warmed caches? Hopefully this question isn't that hard :). This might seem laborious, and to a degree it is. But note that it's difficult to see the utility of HBase with toy examples like this; if you're really storing courses and students, don't use HBase (unless you've got billions of students and courses, which seems unlikely). The extra thought you have to put in to making schemas work for you in HBase is only worth it when it gives you the ability to scale to gigantic data sets where other solutions wouldn't. Well, the background is a private project. I know that it's a lot easier to do what I want in a RDBMS and there is no real need for using a highly scalable beast like HBase. However, I want to learn something new and since I do not break someone's business by trying out new technology privately, I want to go with HStack. Without ever doing it, you never get a real feeling of when to use the right tool. Using a good tool for the wrong problem can be an interesting experience, since you learn some of the do's and don'ts of the software you use. Since I am a reader of the MEAP-edition of HBase in Action, I am aware of the
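On the multiget behaviour described at the top of this thread (gets grouped per region server, very large sets split into subsets): a client-side sketch, assuming 'courses' is an open HTable and courseIds a List<byte[]> of row keys for one student; the chunk size is an illustrative assumption, not an HBase constant:

    List<Get> gets = new ArrayList<Get>();
    for (byte[] courseId : courseIds) {
      gets.add(new Get(courseId));
    }
    int chunkSize = 1000;  // illustrative: keeps each multiget small
    List<Result> all = new ArrayList<Result>();
    for (int i = 0; i < gets.size(); i += chunkSize) {
      List<Get> chunk = gets.subList(i, Math.min(i + chunkSize, gets.size()));
      // the gets in a chunk are grouped by region server, so each chunk costs roughly
      // one round trip per region server it touches
      Collections.addAll(all, courses.get(chunk));
    }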
Re: batch insert performance
Hi, What version are you using? On trunk, put(Put) and put(List<Put>) call the same code, so I would expect comparable performance when autoflush is set to false. However, with 250K small puts you may have the gc playing a role. What are the results if you do the inserts with 50 times 5K rows? N. On Sun, May 27, 2012 at 1:58 AM, Faruk Berksöz fberk...@gmail.com wrote: Codes and their results:
Code 1 (List<Put> batchAllRows, 250,000 rows): table.setAutoFlush(false); for (Put mRow : batchAllRows) { table.put(mRow); } table.flushCommits(); -- average elapsed time 27 sec
Code 2 (List<Put> batchAllRows, 250,000 rows): table.setAutoFlush(false); table.put(batchAllRows); table.flushCommits(); -- average elapsed time 103 sec
Code 3 (List<Row> batchAllRows, 250,000 rows): table.setAutoFlush(false); Object[] results = new Object[batchAllRows.size()]; table.batch(batchAllRows, results); /* table.batch(batchAllRows) already tried */ table.flushCommits(); -- average elapsed time 105 sec
-- Forwarded message -- From: Faruk Berksöz fberk...@gmail.com Date: 2012/5/27 Subject: batch insert performance To: user@hbase.apache.org Hi, HBase users, I have 250,000 rows in a list. I want to insert all the rows into an HTable as fast as possible. I have 3 different pieces of code and 3 different elapsed times. Why are HTable.batch(List<? extends Row> actions, Object[] results) and HTable.put(List<Put> puts) 4 times slower than code 1, which inserts the records in a simple loop? Codes and their results: Faruk
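A sketch of the "50 times 5K rows" suggestion above, assuming the same HTable and the batchAllRows list (a List<Put>) from the quoted test; the chunk size is illustrative:

    table.setAutoFlush(false);
    int chunk = 5000;
    for (int i = 0; i < batchAllRows.size(); i += chunk) {
      List<Put> slice = batchAllRows.subList(i, Math.min(i + chunk, batchAllRows.size()));
      table.put(slice);      // buffered client-side because autoflush is off
    }
    table.flushCommits();    // flush whatever is left in the write buffer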
Re: Important Undefined Error
Hi, There could be multiple issues, but it's strange to have in hbase-site.xml valuehdfs://namenode:9000/hbase/value while the core-site.xml says: valuehdfs://namenode:54310//value The two entries should match. I would recommend to: - use netstat to check the ports (netstat -l) - do the check recommended by Harsh J previously. N. On Mon, May 14, 2012 at 3:21 PM, Dalia Sobhy dalia.mohso...@hotmail.com wrote: pleas hel From: dalia.mohso...@hotmail.com To: user@hbase.apache.org Subject: RE: Important Undefined Error Date: Mon, 14 May 2012 12:20:18 +0200 Hi, I tried what you told me, but nothing worked:((( First when I run this command:dalia@namenode:~$ host -v -t A `hostname`Output:Trying namenodeHost namenode not found: 3(NXDOMAIN)Received 101 bytes from 10.0.2.1#53 in 13 ms My core-site.xml:configurationproperty namefs.default.name/name !--valuehdfs://namenode:8020/value-- valuehdfs://namenode:54310//value/property/configuration My hdfs-site.xmlconfigurationpropertynamedfs.name.dir/namevalue/data/1/dfs/nn,/nfsmount/dfs/nn/value/property!--propertynamedfs.data.dir/namevalue/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn/value/property--propertynamedfs.datanode.max.xcievers/namevalue4096/value/propertypropertynamedfs.replication/namevalue3/value/propertyproperty namedfs.permissions.superusergroup/name valuehadoop/value/property My Mapred-site.xmlconfigurationnamemapred.local.dir/namevalue/data/1/mapred/local,/data/2/mapred/local,/data/3/mapred/local/value/configuration My Hbase-site.xmlconfigurationpropertynamehbase.cluster.distributed/name valuetrue/value/propertyproperty namehbase.rootdir/name valuehdfs://namenode:9000/hbase/value/propertypropertynamehbase.zookeeper.quorun/name valuenamenode/value/propertypropertynamehbase.regionserver.port/namevalue60020/valuedescriptionThe host and port that the HBase master runs at./description/propertypropertynamedfs.replication/namevalue1/value/propertypropertynamehbase.zookeeper.property.clientPort/namevalue2181/valuedescriptionProperty from ZooKeeper's config zoo.cfg.The port at which the clients will connect./description/property/configuration Please Help I am really disappointed I have been through all that for two weeks From: dwivedishash...@gmail.com To: user@hbase.apache.org Subject: RE: Important Undefined Error Date: Sat, 12 May 2012 23:31:49 +0530 The problem is your hbase is not able to connect to Hadoop, can you put your hbase-site.xml content here.. have you specified localhost somewhere, if so remove localhost from everywhere and put your hdfsl namenode address suppose your namenode is running on master:9000 then put your hbase file system setting as master:9000/hbase here I am sending you the configuration which I am using in hbase and is working My hbase-site.xml content is ?xml version=1.0? ?xml-stylesheet type=text/xsl href=configuration.xsl? !-- /** * Copyright 2010 The Apache Software Foundation * * Licensed to the Apache Software Foundation (ASF) under one * or more contributor license agreements. See the NOTICE file * distributed with this work for additional information * regarding copyright ownership. The ASF licenses this file * to you under the Apache License, Version 2.0 (the * License); you may not use this file except in compliance * with the License. 
You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an AS IS BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ -- configuration property namehbase.rootdir/name valuehdfs://master:9000/hbase/value /property property namehbase.master/name valuemaster:6/value descriptionThe host and port that the HBase master runs at./description /property property namehbase.regionserver.port/name value60020/value descriptionThe host and port that the HBase master runs at./description /property !--property namehbase.master.port/name value6/value descriptionThe host and port that the HBase master runs at./description /property-- property namehbase.cluster.distributed/name valuetrue/value /property property namehbase.tmp.dir/name value/home/shashwat/Hadoop/hbase-0.90.4/temp/value /property property namehbase.zookeeper.quorum/name valuemaster/value /property property namedfs.replication/name value1/value /property property namehbase.zookeeper.property.clientPort/name value2181/value descriptionProperty from ZooKeeper's config zoo.cfg. The port at which the clients will connect. /description /property
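To illustrate the "two entries should match" point above: whichever host and port the HDFS namenode actually listens on has to appear in both files. Using the namenode:54310 value from the quoted core-site.xml (an assumption about that particular setup):

    core-site.xml:
    <property>
      <name>fs.default.name</name>
      <value>hdfs://namenode:54310/</value>
    </property>

    hbase-site.xml:
    <property>
      <name>hbase.rootdir</name>
      <value>hdfs://namenode:54310/hbase</value>
    </property>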
Re: Important Undefined Error
In core-file.xml, do you have this? configuration property namefs.default.name/name valuehdfs://namenode:8020/hbase/value /property If you want hbase to connect to 8020 you must have hdfs listening on 8020 as well. On Mon, May 14, 2012 at 5:17 PM, Dalia Sobhy dalia.mohso...@hotmail.com wrote: H I have tried to make both ports the same. But the prob is the hbase cannot connect to port 8020. When i run nmap hostname, port 8020 wasnt with the list of open ports. I have tried what harsh told me abt. I used the same port he used but same error occurred. Another aspect in cloudera doc it says that i have to canonical name for the host ex: namenode.example.com as the hostname, but i didnt find it in any tutorial. No one makes it. Note that i am deploying my cluster in fully distributed mode i.e am using 4 machines.. So any ideas??!! Sent from my iPhone On 2012-05-14, at 4:07 PM, N Keywal nkey...@gmail.com wrote: Hi, There could be multiple issues, but it's strange to have in hbase-site.xml valuehdfs://namenode:9000/hbase/value while the core-site.xml says: valuehdfs://namenode:54310//value The two entries should match. I would recommend to: - use netstat to check the ports (netstat -l) - do the check recommended by Harsh J previously. N. On Mon, May 14, 2012 at 3:21 PM, Dalia Sobhy dalia.mohso...@hotmail.com wrote: pleas hel From: dalia.mohso...@hotmail.com To: user@hbase.apache.org Subject: RE: Important Undefined Error Date: Mon, 14 May 2012 12:20:18 +0200 Hi, I tried what you told me, but nothing worked:((( First when I run this command:dalia@namenode:~$ host -v -t A `hostname`Output:Trying namenodeHost namenode not found: 3(NXDOMAIN)Received 101 bytes from 10.0.2.1#53 in 13 ms My core-site.xml:configurationproperty namefs.default.name/name !--valuehdfs://namenode:8020/value-- valuehdfs://namenode:54310//value/property/configuration My hdfs-site.xmlconfigurationpropertynamedfs.name.dir/namevalue/data/1/dfs/nn,/nfsmount/dfs/nn/value/property!--propertynamedfs.data.dir/namevalue/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn/value/property--propertynamedfs.datanode.max.xcievers/namevalue4096/value/propertypropertynamedfs.replication/namevalue3/value/propertyproperty namedfs.permissions.superusergroup/name valuehadoop/value/property My Mapred-site.xmlconfigurationnamemapred.local.dir/namevalue/data/1/mapred/local,/data/2/mapred/local,/data/3/mapred/local/value/configuration My Hbase-site.xmlconfigurationpropertynamehbase.cluster.distributed/name valuetrue/value/propertyproperty namehbase.rootdir/name valuehdfs://namenode:9000/hbase/value/propertypropertynamehbase.zookeeper.quorun/name valuenamenode/value/propertypropertynamehbase.regionserver.port/namevalue60020/valuedescriptionThe host and port that the HBase master runs at./description/propertypropertynamedfs.replication/namevalue1/value/propertypropertynamehbase.zookeeper.property.clientPort/namevalue2181/valuedescriptionProperty from ZooKeeper's config zoo.cfg.The port at which the clients will connect./description/property/configuration Please Help I am really disappointed I have been through all that for two weeks From: dwivedishash...@gmail.com To: user@hbase.apache.org Subject: RE: Important Undefined Error Date: Sat, 12 May 2012 23:31:49 +0530 The problem is your hbase is not able to connect to Hadoop, can you put your hbase-site.xml content here.. 
have you specified localhost somewhere, if so remove localhost from everywhere and put your hdfsl namenode address suppose your namenode is running on master:9000 then put your hbase file system setting as master:9000/hbase here I am sending you the configuration which I am using in hbase and is working My hbase-site.xml content is ?xml version=1.0? ?xml-stylesheet type=text/xsl href=configuration.xsl? !-- /** * Copyright 2010 The Apache Software Foundation * * Licensed to the Apache Software Foundation (ASF) under one * or more contributor license agreements. See the NOTICE file * distributed with this work for additional information * regarding copyright ownership. The ASF licenses this file * to you under the Apache License, Version 2.0 (the * License); you may not use this file except in compliance * with the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an AS IS BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ -- configuration property namehbase.rootdir/name valuehdfs://master:9000/hbase/value /property property namehbase.master/name valuemaster
Re: RegionServer silently stops (only issue: CMS-concurrent-mark ~80sec)
Hi Alex, On the same idea, note that hbase is launched with -XX:OnOutOfMemoryError=kill -9 %p. N. On Tue, May 1, 2012 at 10:41 AM, Igal Shilman ig...@wix.com wrote: Hi Alex, just to rule out, oom killer, Try this: http://stackoverflow.com/questions/624857/finding-which-process-was-killed-by-linux-oom-killer On Mon, Apr 30, 2012 at 10:48 PM, Alex Baranau alex.barano...@gmail.com wrote: Hello, During recent weeks I constantly see some RSs *silently* dying on our HBase cluster. By silently I mean that process stops, but no errors in logs [1]. The only thing I can relate to it is long CMS-concurrent-mark: almost 80 seconds. But this should not cause issues as it is not a stop-the-world process. Any advice? HBase: hbase-0.90.4-cdh3u3 Hadoop: 0.20.2-cdh3u3 Thank you, Alex Baranau [1] last lines from RS log (no errors before too, and nothing written in *.out file): 2012-04-30 18:52:11,806 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for agg-sa-1.3,0011| te|dtc|\x00\x00\x00\x00\x00\x00\x1E\x002\x00\x00\x00\x015\x9C_n\x00\x00\x00\x00\x00\x00\x00\x00\x00,1334852280902.4285f9339b520ee617c087c0fd0dbf65. because regionserver60020.cacheFlusher; priority=-1, compaction queue size=0 2012-04-30 18:54:58,779 DEBUG org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: using new createWriter -- HADOOP-6840 2012-04-30 18:54:58,779 DEBUG org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Path=hdfs://xxx.ec2.internal/hbase/.logs/xxx.ec2.internal,60020,1335706613397/xxx.ec2.internal%3A60020.1335812098651, syncFs=true, hflush=false 2012-04-30 18:54:58,874 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /hbase/.logs/xxx.ec2.internal,60020,1335706613397/xxx.ec2.internal%3A60020.1335811856672, entries=73789, filesize=63773934. New hlog /hbase/.logs/xxx.ec2.internal,60020,1335706613397/xxx.ec2.internal%3A60020.1335812098651 2012-04-30 18:56:31,867 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up with memory above low water. 2012-04-30 18:56:31,867 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region agg-sa-1.3,s_00I4| tdqc\x00docs|mrtdocs|\x00\x00\x00\x00\x00\x03\x11\xF4\x00none\x00|1334692562\x00\x0D\xE0\xB6\xB3\xA7c\xFF\xBC|26837373\x00\x00\x00\x016\xC1\xE0D\xBE\x00\x00\x00\x00\x00\x00\x00\x00,1335761291026.30b127193485342359eadf1586819805. 
due to global heap pressure 2012-04-30 18:56:31,867 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for agg-sa-1.3,s_00I4| tdqc\x00docs|mrtdocs|\x00\x00\x00\x00\x00\x03\x11\xF4\x00none\x00|1334692562\x00\x0D\xE0\xB6\xB3\xA7c\xFF\xBC|26837373\x00\x00\x00\x016\xC1\xE0D\xBE\x00\x00\x00\x00\x00\x00\x00\x00,1335761291026.30b127193485342359eadf1586819805., current region memstore size 138.1m 2012-04-30 18:56:31,867 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting, commencing flushing stores 2012-04-30 18:56:56,303 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=322.84 MB, free=476.34 MB, max=799.17 MB, blocks=5024, accesses=12189396, hits=127592, hitRatio=1.04%%, cachingAccesses=132480, cachingHits=126949, cachingHitsRatio=95.82%%, evictions=0, evicted=0, evictedPerRun=NaN 2012-04-30 18:56:59,026 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://zzz.ec2.internal/hbase/agg-sa-1.3/30b127193485342359eadf1586819805/.tmp/391890051647401997 to hdfs://zzz.ec2.internal/hbase/agg-sa-1.3/30b127193485342359eadf1586819805/a/1139737908876846168 2012-04-30 18:56:59,034 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://zzz.ec2.internal/hbase/agg-sa-1.3/30b127193485342359eadf1586819805/a/1139737908876846168, entries=476418, sequenceid=880198761, memsize=138.1m, filesize=5.7m 2012-04-30 18:56:59,097 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~138.1m for region agg-sa-1.3,s_00I4| tdqc\x00docs|mrtdocs|\x00\x00\x00\x00\x00\x03\x11\xF4\x00none\x00|1334692562\x00\x0D\xE0\xB6\xB3\xA7c\xFF\xBC|26837373\x00\x00\x00\x016\xC1\xE0D\xBE\x00\x00\x00\x00\x00\x00\x00\x00,1335761291026.30b127193485342359eadf1586819805. in 27230ms, sequenceid=880198761, compaction requested=false ~ [2] last lines from GC log: 2012-04-30T18:58:46.683+: 105717.791: [GC 105717.791: [ParNew: 35638K-1118K(38336K), 0.0548970 secs] 3145651K-3111412K(4091776K) icms_dc=6 , 0.0550360 secs] [Times: user=0.08 sys=0.00, real=0.09 secs] 2012-04-30T18:58:46.961+: 105718.069: [GC 105718.069: [ParNew: 35230K-2224K(38336K), 0.0802440 secs] 3145524K-3112533K(4091776K) icms_dc=6 , 0.0803810 secs] [Times: user=0.06 sys=0.00, real=0.13 secs] 2012-04-30T18:58:47.114+: 105718.222: [CMS-concurrent-mark: 8.770/80.230 secs] [Times: user=61.34 sys=5.69, real=80.23 secs]
Re: HBaseAdmin needs a close methord
Hi, fwiw, the close method was added in HBaseAdmin for HBase 0.90.5. N. On Thu, Apr 19, 2012 at 8:09 AM, Eason Lee softse@gmail.com wrote: I don't think this issue can resovle the problem ZKWatcher is removed,but the configuration and HConnectionImplementation objects are still in HConnectionManager this may still cause memery leak but calling HConnectionManager.**deleteConnection may resolve HBASE-5073 problem. I can see if (this.zooKeeper != null) { LOG.info(Closed zookeeper sessionid=0x + Long.toHexString(this.**zooKeeper.getZooKeeper().** getSessionId())); this.zooKeeper.close(); this.zooKeeper = null; } in HConnectionImplementation.**close which is called by HConnectionManager.**deleteConnection Hi Lee Is HBASE-5073 resolved in that release? Regards Ram -Original Message- From: Eason Lee [mailto:softse@gmail.com] Sent: Thursday, April 19, 2012 10:40 AM To: user@hbase.apache.org Subject: Re: HBaseAdmin needs a close methord I am using cloudera's cdh3u3 Hi Lee Which version of HBase are you using? Regards Ram -Original Message- From: Eason Lee [mailto:softse@gmail.com] Sent: Thursday, April 19, 2012 9:36 AM To: user@hbase.apache.org Subject: HBaseAdmin needs a close methord Resently, my app meets a problem list as follows Can't construct instance of class org/apache/hadoop/hbase/**client/HBaseAdmin Exception in thread Thread-2 java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.**java:640) at org.apache.zookeeper.**ClientCnxn.start(ClientCnxn.**java:414) at org.apache.zookeeper.**ZooKeeper.init(ZooKeeper.**java:378) at org.apache.hadoop.hbase.**zookeeper.ZKUtil.connect(** ZKUtil.java:97) at org.apache.hadoop.hbase.**zookeeper.ZooKeeperWatcher.** init(ZooKeeperWatc her.java:119) at org.apache.hadoop.hbase.**client.HConnectionManager$** HConnectionImplementa tion.getZooKeeperWatcher(**HConnectionManager.java:1002) at org.apache.hadoop.hbase.**client.HConnectionManager$** HConnectionImplementa tion.setupZookeeperTrackers(**HConnectionManager.java:304) at org.apache.hadoop.hbase.**client.HConnectionManager$** HConnectionImplementa tion.init(**HConnectionManager.java:295) at org.apache.hadoop.hbase.**client.HConnectionManager.** getConnection(HConnec tionManager.java:157) at org.apache.hadoop.hbase.**client.HBaseAdmin.init(** HBaseAdmin.java:90) Call to org.apache.hadoop.hbase.**HBaseAdmin::HBaseAdmin failed! My app create HBaseAdmin every 30s,and the threads used by my app increases about 1thread/30s.See from the stack, there is only one HBaseAdmin in Memory, but lots of Configuration and HConnectionImplementation instances. I can see from the sources, everytime when HBaseAdmin is created, a new Configuration and HConnectionImplementation is created and added to HConnectionManager.HBASE_**INSTANCES.Sohttp://HConnectionManager.HBASE_INSTANCES.Sothey are not collected by gc when HBaseAdmin is collected. So i think we need to add a close methord to remove the Configuration**HConnectionImplementation from HConnectionManager.HBASE_**INSTANCES.Just as follows: public void close(){ HConnectionManager.**deleteConnection(**getConfiguration(), true); }
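A minimal sketch for clients stuck on a release older than 0.90.5 (where HBaseAdmin has no close()): drop the cached connection explicitly when done, as proposed in the thread. On 0.90.5 and later, calling admin.close() is the simpler option.

    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      // ... admin calls ...
    } finally {
      // drops the cached HConnectionImplementation (and its ZooKeeper watcher)
      // that was registered for this Configuration in HConnectionManager
      HConnectionManager.deleteConnection(conf, true);
    }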
Re: TIMERANGE performance on uniformly distributed keyspace
Hi, For the filtering part, every HFile is associated to a set of meta data. This meta data includes the timerange. So if there is no overlap between the time range you want and the time range of the store, the HFile is totally skipped. This work is done in StoreScanner#selectScannersFrom Cheers, N. On Sat, Apr 14, 2012 at 5:11 PM, Doug Meil doug.m...@explorysmedical.comwrote: Hi there- With respect to: * Does it need to hit every memstore and HFile to determine if there isdata available? And if so does it need to do a full scan of that file to determine the records qualifying to the timerange, since keys are stored lexicographically? And... Using scan 'table', {TIMERANGE = [t, t+x]} : See... http://hbase.apache.org/book.html#regions.arch 8.7.5.4. KeyValue The timestamp is an attribute of the KeyValue, but unless you perform a restriction using start/stop row it have to process every row. Major compactions don't change this fact, they just change the number of HFiles that have to get processed. On 4/14/12 10:38 AM, Rob Verkuylen r...@verkuylen.net wrote: I'm trying to find a definitive answer to the question if scans on timerange alone will scale when you use uniformly distributed keys like UUIDs. Since the keys are randomly generated that would mean the keys will be spread out over all RegionServers, Regions and HFiles. In theory, assuming enough writes, that would mean that every HFile will contain the entire timerange of writes. Now before a major compaction, data is in the memstores and (non max.filesize) flushedmerged HFiles. I can imagine that a scan using a TIMERANGE can quickly serve from memstores and the smaller files, but how does it perform after a major compaction? Using scan 'table', {TIMERANGE = [t, t+x]} : * How does HBase handle this query in this case(UUIDs)? * Does it need to hit every memstore and HFile to determine if there is data available? And if so does it need to do a full scan of that file to determine the records qualifying to the timerange, since keys are stored lexicographically? I've run some tests on 300+ region tables, on month old data(so after major compaction) and performance/response seems fairly quick. But I'm trying to understand why that is, because hitting every HFile on every region seems to be ineffective. Lars' book figure 9-3 seems to indicate this as well, but cant seem to get the answer from the book or anywhere else. Thnx, Rob
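For reference, the Java equivalent of the shell scan discussed above (t and x are the hypothetical bounds from the thread; the usual client imports are assumed). The HFile time-range check done in StoreScanner#selectScannersFrom applies to this scan as well:

    Scan scan = new Scan();                 // no start/stop row, so the whole keyspace is considered
    scan.setTimeRange(t, t + x);            // [t, t + x): min inclusive, max exclusive, in ms
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        // only rows with at least one KeyValue in the range are returned; HFiles whose
        // time-range metadata does not overlap [t, t + x) are skipped entirely
      }
    } finally {
      scanner.close();
    }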
Re: Zookeeper available but no active master location found
Hi, Literally, it means that ZooKeeper is there but the hbase client can't find the hbase master address in it. By default, the node used is /hbase/master, and it contains the hostname and port of the master. You can check its content in ZK by doing a get /hbase/master in bin/zkCli.sh (see http://zookeeper.apache.org/doc/r3.4.3/zookeeperStarted.html#sc_ConnectingToZooKeeper ). There should be a root cause for this, so it worths looking for other error messages in the logs (master especially). N. On Fri, Apr 13, 2012 at 1:23 AM, Henri Pipe henri.p...@gmail.com wrote: client.HConnectionManager$HConnectionImplementation: ZooKeeper available but no active master location found Having a problem with master startup that I have not seen before. running the following packages: hadoop-hbase-0.90.4+49.137-1 hadoop-0.20-secondarynamenode-0.20.2+923.197-1 hadoop-hbase-thrift-0.90.4+49.137-1 hadoop-zookeeper-3.3.4+19.3-1 hadoop-0.20-datanode-0.20.2+923.197-1 hadoop-0.20-namenode-0.20.2+923.197-1 hadoop-0.20-tasktracker-0.20.2+923.197-1 hadoop-hbase-regionserver-0.90.4+49.137-1 hadoop-zookeeper-server-3.3.4+19.3-1 hadoop-0.20-0.20.2+923.197-1 hadoop-0.20-jobtracker-0.20.2+923.197-1 hadoop-hbase-master-0.90.4+49.137-1 [root@ip-10-251-27-130 logs]# java -version java version 1.6.0_31 Java(TM) SE Runtime Environment (build 1.6.0_31-b04) Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode) I start master and region server on another node. Master is initialized, but as soon as I try to check the master_status or do a zkdump via web interface, it blows up with: 2012-04-12 19:16:10,453 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: ZooKeeper available but no active master location found 2012-04-12 19:16:10,453 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: getMaster attempt 10 of 10 failed; retrying after sleep of 16000 I am running three zookeepers: # The number of milliseconds of each tick tickTime=2000 # The number of ticks that the initial # synchronization phase can take initLimit=10 # The number of ticks that can pass between # sending a request and getting an acknowledgement syncLimit=5 # the directory where the snapshot is stored. dataDir=/mnt/zookeeper # The maximum number of zookeeper client connections maxClientCnxns=2000 # the port at which the clients will connect clientPort=2181 server.1=10.251.27.130:2888:3888 server.2=10.250.9.220:2888:3888 server.3=10.251.110.50:2888:3888 I can telnet to the zookeepers just fine. Here is my hbase-site.xml file: configuration property namehbase.rootdir/name valuehdfs://namenode:9000/hbase/value /property property namehbase.cluster.distributed/name valuetrue/value /property property namehbase.zookeeper.quorum/name value10.251.27.130,10.250.9.220,10.251.110.50/value /property property namehbase.zookeeper.property.dataDir/name value/hadoop/zookeeper/data/value /property property namehbase.zookeeper.property.maxClientCnxns/name value2000/value finaltrue/final /property /configuration Any thoughts? Any help is greatly appreciated. Thanks Henri Pipe
Re: can hbase-0.90.2 work with zookeeper-3.3.4?
Hi, It should. I haven't tested 0.90, but I tested hbase trunk a few months ago against ZK 3.4.x and ZK 3.3.x and it was working. N. 2012/4/5 lulynn_2008 lulynn_2...@163.com Hi, I found that hbase-0.90.2 uses zookeeper-3.4.2. Can this version of hbase work with zookeeper-3.3.4? Thank you.
Re: Hbase RegionServer stalls on initialization
Then you should have an error in the master logs. If not, it worths checking that the master the region servers speak to the same ZK... As it's hbase related, I redirect the question to hbase user mailing list (hadoop common is in bcc). On Wed, Mar 28, 2012 at 8:03 PM, Nabib El-Rahman nabib.elrah...@tubemogul.com wrote: The master is up. is it possible that zookeeper might not know about it? *Nabib El-Rahman *| Senior Sofware Engineer *M:* 734.846.25 734.846.2529 www.tubemogul.com | *twitter: @nabiber* http://www.tubemogul.com/ http://www.tubemogul.com/ On Mar 28, 2012, at 10:42 AM, N Keywal wrote: It must be waiting for the master. Have you launched the master? On Wed, Mar 28, 2012 at 7:40 PM, Nabib El-Rahman nabib.elrah...@tubemogul.com wrote: Hi Guys, I'm starting up an region server and it stalls on initialization. I took a thread dump and found it hanging on this spot: regionserver60020 prio=10 tid=0x7fa90c5c4000 nid=0x4b50 in Object.wait() [0x7fa9101b4000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0xbc63b2b8 (a org.apache.hadoop.hbase.MasterAddressTracker) at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:122) - locked 0xbc63b2b8 (a org.apache.hadoop.hbase.MasterAddressTracker) at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:516) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:493) at org.apache.hadoop.hbase.regionserver.HRegionServer.initialize(HRegionServer.java:461) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:560) at java.lang.Thread.run(Thread.java:662) Any Idea on who or what its being blocked on? *Nabib El-Rahman *| Senior Sofware Engineer *M:* 734.846.2529 www.tubemogul.com | *twitter: @nabiber* http://www.tubemogul.com/ http://www.tubemogul.com/
Re: HBase schema model question.
Hi, Just a few... See http://hbase.apache.org/book.html#number.of.cfs N. On Tue, Mar 20, 2012 at 12:39 PM, Manish Bhoge manishbh...@rocketmail.com wrote: Very basic question: how many column families are possible in a table in HBase? I know you can have thousands of columns in a family, but I don't know how many families are possible. So far I haven't seen more than one in the examples. Thanks Manish Sent from my BlackBerry, pls excuse typo
Re: Streaming data processing and hBase
Hi, The way you describe the in memory caching component, it looks very similar to HBase memstore. Any reason for not relying on it? N. On Fri, Mar 16, 2012 at 4:21 PM, Kleegrewe, Christian christian.kleegr...@siemens.com wrote: Dear all, We are currently working on an architecture for a system that should be serve as an archive for 1000+ measuring components that frequently (~30/s) send messages containing measurement values (~300 bytes/message). The archiving system should be capable of not only serving as a long term storage but also as a kind of streaming data processing and caching component. There are several functions that should be computed on the incoming data before finally storing it. We suggested an architecture that comprises of: A message routing component that could route data to calculations and route calculation results to other components that are interested in these data. An in memory caching component that is used for storing up to 10 - 20 minutes of data before it is written to the long term archive. An hBase database that is used for the long term storage. MapReduce framework for doing analytics on the data stored in the hBase database. The complete system should be failsafe and reliable regarding component failures and it should scale with the number of computers that are utilized. Are there any suggestions or feedback to this approach from the community? and are there any suggestions which tools or systems to use for the message routing component and the in memory cache. Thanks for any help and suggestions all the best Christian 8--- Siemens AG Corporate Technology Corporate Research and Technologies CT T DE IT3 Otto-Hahn-Ring 6 81739 Munich, Germany Tel.: +49 89 636-42722 Fax: +49 89 636-41423 mailto:christian.kleegr...@siemens.com Siemens Aktiengesellschaft: Chairman of the Supervisory Board: Gerhard Cromme; Managing Board: Peter Loescher, Chairman, President and Chief Executive Officer; Roland Busch, Brigitte Ederer, Klaus Helmrich, Joe Kaeser, Barbara Kux, Hermann Requardt, Siegfried Russwurm, Peter Y. Solmssen, Michael Suess; Registered offices: Berlin and Munich, Germany; Commercial registries: Berlin Charlottenburg, HRB 12300, Munich, HRB 6684; WEEE-Reg.-No. DE 23691322
Re: Streaming data processing and hBase
Hi Christian, It's a component internal to HBase, so you don't have to use it directly. See http://hbase.apache.org/book/wal.html on how writes are handled by HBase to ensure reliability data distribution... Cheers, N. On Fri, Mar 16, 2012 at 7:39 PM, Kleegrewe, Christian christian.kleegr...@siemens.com wrote: Hi Is this memstore replicated? Since we store a significant amount of data in the memory cache we need a replicated solution. Also I can't find lots of information besides a java api doc for the MemStore class. I will continue searching for this, but if you have any URL with more documentation please send it. Thanks in advance regards Christian 8-- Siemens AG Corporate Technology Corporate Research and Technologies CT T DE IT3 Otto-Hahn-Ring 6 81739 München, Deutschland Tel.: +49 89 636-42722 Fax: +49 89 636-41423 mailto:christian.kleegr...@siemens.com Siemens Aktiengesellschaft: Vorsitzender des Aufsichtsrats: Gerhard Cromme; Vorstand: Peter Löscher, Vorsitzender; Roland Busch, Brigitte Ederer, Klaus Helmrich, Joe Kaeser, Barbara Kux, Hermann Requardt, Siegfried Russwurm, Peter Y. Solmssen, Michael Süß; Sitz der Gesellschaft: Berlin und München, Deutschland; Registergericht: Berlin Charlottenburg, HRB 12300, München, HRB 6684; WEEE-Reg.-Nr. DE 23691322 -Ursprüngliche Nachricht- Von: N Keywal [mailto:nkey...@gmail.com] Gesendet: Freitag, 16. März 2012 18:02 An: user@hbase.apache.org Betreff: Re: Streaming data processing and hBase Hi, The way you describe the in memory caching component, it looks very similar to HBase memstore. Any reason for not relying on it? N. On Fri, Mar 16, 2012 at 4:21 PM, Kleegrewe, Christian christian.kleegr...@siemens.com wrote: Dear all, We are currently working on an architecture for a system that should be serve as an archive for 1000+ measuring components that frequently (~30/s) send messages containing measurement values (~300 bytes/message). The archiving system should be capable of not only serving as a long term storage but also as a kind of streaming data processing and caching component. There are several functions that should be computed on the incoming data before finally storing it. We suggested an architecture that comprises of: A message routing component that could route data to calculations and route calculation results to other components that are interested in these data. An in memory caching component that is used for storing up to 10 - 20 minutes of data before it is written to the long term archive. An hBase database that is used for the long term storage. MapReduce framework for doing analytics on the data stored in the hBase database. The complete system should be failsafe and reliable regarding component failures and it should scale with the number of computers that are utilized. Are there any suggestions or feedback to this approach from the community? and are there any suggestions which tools or systems to use for the message routing component and the in memory cache. Thanks for any help and suggestions all the best Christian 8--- Siemens AG Corporate Technology Corporate Research and Technologies CT T DE IT3 Otto-Hahn-Ring 6 81739 Munich, Germany Tel.: +49 89 636-42722 Fax: +49 89 636-41423 mailto:christian.kleegr...@siemens.com Siemens Aktiengesellschaft: Chairman of the Supervisory Board: Gerhard Cromme; Managing Board: Peter Loescher, Chairman, President and Chief Executive Officer; Roland Busch, Brigitte Ederer, Klaus Helmrich, Joe Kaeser, Barbara Kux, Hermann Requardt, Siegfried Russwurm, Peter Y. 
Solmssen, Michael Suess; Registered offices: Berlin and Munich, Germany; Commercial registries: Berlin Charlottenburg, HRB 12300, Munich, HRB 6684; WEEE-Reg.-No. DE 23691322
RE: Java Programming and Hbase
You will need the hadoop jar for this. Hbase uses hadoop for common stuff like the configuration you've seen, so even a simple client needs it. N. Le 12 mars 2012 12:06, Mahdi Negahi negahi.ma...@hotmail.com a écrit : Is it necessary to install hadoop for hbase, if want use Hbase in my laptop and use it via Java ? Date: Mon, 12 Mar 2012 10:43:44 +0100 Subject: Re: Java Programming and Hbase From: khi...@googlemail.com To: user@hbase.apache.org you also need to import hadoop.jar, since hbase runs on hahoop On Mon, Mar 12, 2012 at 9:45 AM, Mahdi Negahi negahi.ma...@hotmail.com wrote: Dear Friends I try to write a simple application with Java and manipulate my Hbase table. so I read this post and try to follow it. http://hbase.apache.org/docs/current/api/index.html I use eclipse and add hbase-092.0.jar as external jar file for my project. but i have problem in the first line of guideline. the following code line Configuration config = HBaseConfiguration.create(); has a following error The type org.apache.hadoop.conf.Configuration cannot be resolved. It is indirectly referenced from required .class files and Configuration's package that eclipse want to add to my project is import javax.security.auth.login.Configuration; i think it is not an appropriate package. please advice me and refer me to new guideline.
Re: Java Programming and Hbase
only jar files. They are already in the hbase distrib (i.e. if you download hbase, you get the hadoop jar files you need). You just need to import them in your IDE. On Mon, Mar 12, 2012 at 1:05 PM, Mahdi Negahi negahi.ma...@hotmail.comwrote: I so confused. I must install Hadoop or use only jar files ? Date: Mon, 12 Mar 2012 12:46:09 +0100 Subject: RE: Java Programming and Hbase From: nkey...@gmail.com To: user@hbase.apache.org You will need the hadoop jar for this. Hbase uses hadoop for common stuff like the configuration you've seen, so even a simple client needs it. N. Le 12 mars 2012 12:06, Mahdi Negahi negahi.ma...@hotmail.com a écrit : Is it necessary to install hadoop for hbase, if want use Hbase in my laptop and use it via Java ? Date: Mon, 12 Mar 2012 10:43:44 +0100 Subject: Re: Java Programming and Hbase From: khi...@googlemail.com To: user@hbase.apache.org you also need to import hadoop.jar, since hbase runs on hahoop On Mon, Mar 12, 2012 at 9:45 AM, Mahdi Negahi negahi.ma...@hotmail.com wrote: Dear Friends I try to write a simple application with Java and manipulate my Hbase table. so I read this post and try to follow it. http://hbase.apache.org/docs/current/api/index.html I use eclipse and add hbase-092.0.jar as external jar file for my project. but i have problem in the first line of guideline. the following code line Configuration config = HBaseConfiguration.create(); has a following error The type org.apache.hadoop.conf.Configuration cannot be resolved. It is indirectly referenced from required .class files and Configuration's package that eclipse want to add to my project is import javax.security.auth.login.Configuration; i think it is not an appropriate package. please advice me and refer me to new guideline.
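A minimal sketch of the point above, showing the import that matters: the Configuration class used by HBaseConfiguration.create() lives in the hadoop jar shipped with the hbase distribution, not in javax.security.auth.login:

    import org.apache.hadoop.conf.Configuration;   // not javax.security.auth.login.Configuration
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    Configuration config = HBaseConfiguration.create();
    HTable table = new HTable(config, "mytable");   // placeholder table name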
Re: Retrieve Column Family and Column with Java API
Hi, Yes and no. No, because as a table can have millions of columns and these columns can be different for every row, the only way to get all the columns is to scan the whole table. Yes, because if you scan the table you can have the columns names. See Result#getMap: it's organized by family -- qualifier -- version -- value And yes, because you can get the column families from the HTableDescriptor. N. On Mon, Mar 12, 2012 at 3:10 PM, Mahdi Negahi negahi.ma...@hotmail.comwrote: Dear All friends Is there any way to retrieve a table's column families and columns with Java. for example, i want to scan a table that i know only its name.
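A small sketch of both answers above, assuming an open HTable and the usual client imports: the families come from the HTableDescriptor, while the qualifiers can only be discovered by scanning and walking Result#getMap (family -- qualifier -- version -- value):

    HTableDescriptor desc = table.getTableDescriptor();
    for (HColumnDescriptor family : desc.getFamilies()) {
      System.out.println("family: " + family.getNameAsString());
    }
    // qualifiers are per row, so a full scan is needed to list them
    ResultScanner scanner = table.getScanner(new Scan());
    for (Result r : scanner) {
      for (Map.Entry<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>> fam : r.getMap().entrySet()) {
        for (byte[] qualifier : fam.getValue().keySet()) {
          System.out.println(Bytes.toString(fam.getKey()) + ":" + Bytes.toString(qualifier));
        }
      }
    }
    scanner.close();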
Re: HBase-0.92.0 removed HBaseClusterTestCase, is there any replacement for this class
Hi, It's replaced by HBaseTestingUtility. Cheers, N. 2012/3/8 lulynn_2008 lulynn_2...@163.com Hi All, I am integrating flume-0.9.4 with hbase-0.92.0. And I find hbase-0.92.0 removed HBaseClusterTestCase which is used in flume-0.9.4. My question is: Is there any replacement for HBaseClusterTestCase? Thank you.
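A minimal sketch of what the replacement looks like (method names as in the 0.92 test jar; the table and family names are placeholders):

    HBaseTestingUtility util = new HBaseTestingUtility();
    util.startMiniCluster();                               // in-process ZK + HDFS + HBase
    HTable t = util.createTable(Bytes.toBytes("test"), Bytes.toBytes("f"));
    // ... run the code under test against 't' ...
    util.shutdownMiniCluster();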
Re: 0.92 in mvn repository somewhere?
You cannot use the option -D*skipTests* ? On Wed, Feb 15, 2012 at 5:27 PM, Stack st...@duboce.net wrote: On Tue, Feb 14, 2012 at 11:18 PM, Ulrich Staudinger ustaudin...@activequant.com wrote: Hi St.Ack, i don't wanna be a pain in the back, but any progress on this? You are not being a pain. I'm fumbling the mvn publishing, repeatedly. Its a little embarrassing which is why I'm not talking to much about it (smile). To publish to maven, we need to build ~3 (perhaps 4 times). Each build takes ~two hours. They can fail on an odd flakey test. Also, maven release can fail w/ an error code 1 and thats all she wrote so I try a few things to try and get over the error code 1.. it doesn't always happen (then I restart the two hour build). I'm doing this task in background so I forget about it from time to time (until you email me above). I promise to doc all I do to get it up there this time. I half did it last time: http://hbase.apache.org/book.html#mvn_repo Also, our build gets more sane in next versions taking 1/4 time. Sorry its taking so long, St.Ack
Re: Is it possible to connect HBase remotely?
Hi, The client needs to connect to zookeeper as well. You haven't set the parameters for zookeeper, so it goes with the default settings (localhost/2181), hence the error you're seeing. Set the zookeeper connection property in the client, it should work. This should do it: conf .set(hbase.zookeeper.quorum, 192.168.2.122); conf .set(hbase.zookeeper.property.clientPort, 2181); Cheers, N. On Wed, Feb 8, 2012 at 3:26 PM, shashwat shriparv dwivedishash...@gmail.com wrote: I have two machine on same network IPs are like *192.168.2.122* and * 192.168.2.133*, suppose hbase (stand alone mode) running on *192.168.2.122, *and i have eclipse or netbeans running on *192.168.2.133,* so i need to retrieve and put data to hbase running on other ip, till now what i have tried is creating a configuration for hbase inside my code like : Configuration conf = HBaseConfiguration.create(); conf.set(hbase.master, *192.168.2.122:9000*); HTable hTable = new HTable(conf, table); java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119) 12/02/08 19:44:28 INFO zookeeper.ClientCnxn: Opening socket connection to server* localhost/127.0.0.1:2181* 12/02/08 19:44:28 WARN zookeeper.ClientCnxn: Session 0x1355d44ae6f0003 for server null, unexpected error, closing socket connection and attempting reconnect I a not able to understand why its trying to go to *localhost/ 127.0.0.1:2181 .* * * My host file configuration is follows : == 127.0.0.1 localhost 127.0.0.1 ubuntu.ubuntu-domain ubuntu 192.168.2.126 ubuntu 192.168.2.125 ubuntu1 192.168.2.106 ubuntu2 192.168.2.56 ubuntu3 # The following lines are desirable for IPv6 capable hosts ::1 ip6-localhost ip6-loopback fe00::0 ip6-localnet ff00::0 ip6-mcastprefix ff02::1 ip6-allnodes ff02::2 ip6-allrouters == I am able to telnet to localhost:9000, 127.0.0.1:9000, myhostname:9000, but if i am trying to connect to my ip which is 1982.168.2.125 its not connecting : its saying connection reffused. What method should follow to achieve this(connect to HBase running on another pc on the same network). any tutorial link will be appreciated.
zookeeper 3.3/3.4 on hbase trunk
Hi, FYI. I've been doing some tests mixing zookeeper client/server versions on hbase trunk, by executing medium category unit tests against a standalone zookeeper server (mixing versions 3.3 and 3.4 is officially supported by Zookeeper, but it was worth checking). I tested Zookeeper server 3.3.4 and 3.4.2 with Zookeeper client API 3.3.4 and 3.4.2 (with some changes in hbase to make it build with the 3.3 API), meaning: Client 3.4.2 -- Server 3.4.2, Client 3.3.4 -- Server 3.4.2, Client 3.3.4 -- Server 3.3.4, Client 3.4.2 -- Server 3.3.4. Conclusions: - It works, except of course if you're activating secure login (the related unit tests will hang). - I had a strange random error with the 3.3.4 server (whatever the client version), but it seems to be linked only to the start/stop phase (the zookeeper server surviving a stop request). - It's difficult, from the client, to know what the zookeeper server version is. A zookeeper jira was created for this (ZOOKEEPER-1381). - If you use a 3.4.2 feature like multi on a 3.3 server, it hangs: once again, it's up to the developer/administrator to make sure he's not using something specific to the 3.4 server, hence jira 1381 if we want things like warnings or implementations optimized for a given server. Cheers, N.
Re: sequence number
Hi, Yes, each cell is associated with a long. By default it's a timestamp, but you can set it yourself when you create the put. It's stored everywhere. You've got a lot of information and links on this in the hbase book (http://hbase.apache.org/book.html#versions) Cheers, N. On Mon, Jan 30, 2012 at 9:38 PM, Noureddine BOUYAHIAOUI nour.bouyahia...@free.fr wrote: Hi, In my reading about HBase, I understand that the HRegionServer (n times HRegion) uses a sequence number (AtomicLong) to version each key/value stored in the WAL. Please can you give me some details about this notion, for example how the HRegionServer creates its sequence number, and why it is used. Is it considered a version identifier? Best regards. Noureddine Bouyahiaoui
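As an illustration of default versus explicit versions (a sketch, not from the thread; the table and column names are invented), a Put can either let the server timestamp the cell or carry an explicit long:

// Writing two versions of the same cell: one with the server-assigned
// timestamp, one with an explicit long chosen by the caller.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class VersionExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "t");   // hypothetical table

    // Implicit version: the current time is used as the timestamp.
    Put p1 = new Put(Bytes.toBytes("row1"));
    p1.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v1"));
    table.put(p1);

    // Explicit version: the caller supplies the long (here 42L).
    Put p2 = new Put(Bytes.toBytes("row1"));
    p2.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), 42L, Bytes.toBytes("v2"));
    table.put(p2);

    // Read several versions back.
    Get g = new Get(Bytes.toBytes("row1"));
    g.setMaxVersions(3);
    Result r = table.get(g);
    System.out.println(r);
    table.close();
  }
}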
Re: How to implement tests for python based application using Hbase-thrift interface
Hi Damien, Can't say for the Python stuff. You can reuse or extract what you need from HBaseTestingUtility in the hbase test package; this will allow you to start a full HBase mini cluster in a few lines of Java code. Cheers, N. On Mon, Jan 30, 2012 at 11:10 AM, Damien Hardy dha...@figarocms.fr wrote: Hello, I wrote some code in python using HBase as image storage. I want my code to be tested independently of a full external HBase architecture, so my question is: is there a howto on instantiating a temporary local mini cluster + thrift interface, in order to run python (or maybe other language) hbase-thrift based tests easily? Cheers, -- Damien
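A minimal sketch of those "few lines of Java" (the class and table names are hypothetical; starting the Thrift gateway itself is not shown here and would typically be done separately against the ZooKeeper port printed below):

// Start an in-process HBase mini cluster that external tests (python/thrift
// or otherwise) could be pointed at, then keep it alive until killed.
import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.apache.hadoop.hbase.util.Bytes;

public class LocalMiniCluster {
  public static void main(String[] args) throws Exception {
    HBaseTestingUtility util = new HBaseTestingUtility();
    util.startMiniCluster();
    util.createTable(Bytes.toBytes("images"), Bytes.toBytes("cf"));

    // External clients need the ZooKeeper client port the mini cluster chose.
    System.out.println("zookeeper client port: "
        + util.getConfiguration().get("hbase.zookeeper.property.clientPort"));

    // Block forever so the cluster stays up for the external test run.
    Thread.currentThread().join();
  }
}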
Re: hbase heap size beyond 16G ?
If you're interested, some good slides on GC (slide 45 and after): http://www.azulsystems.com/sites/www.azulsystems.com/SpringOne2011_UnderstandingGC.pdf On Tue, Nov 8, 2011 at 11:25 PM, Mikael Sitruk mikael.sit...@gmail.com wrote: Concurrent GC (a.k.a. CMS) does not mean that there are no more pauses. The pauses are reduced to a minimum but can still happen, especially if the concurrent threads cannot finish their work under high pressure. The G1 collector in JDK 7.0 claims to be a better collector than CMS, but I presume tests will need to be done to validate this. BTW the CMS collector is the one that is recommended in the book. Mikael.S On Tue, Nov 8, 2011 at 11:57 PM, Sujee Maniyam su...@sujee.net wrote: Hi All, the HBase book by Lars warns that it is not recommended to set the heap size above 16G, because of 'stop the world' GC. Does this still apply? Especially with 'concurrent GC'? thanks Sujee http://sujee.net -- Mikael.S